This final project, which is worth 45% of your final grade, is your opportunity to test the progress you’ve made as an R programmer. Once you’ve completed it, you’ll be able to say that you have experience in the following:
Your task for the final project is to construct a small analysis of a real-world dataset relevant to economics, finance, or business. The main deliverable will be one or more R scripts (you are more than welcome to have, for example, a get_data.R and a separate analyze_data.R) that accomplish all of the following tasks:
{stargazer}, perhaps) showing comparisons across models

While it is certainly possible to complete the tasks listed above with base R alone, you will likely find that external libraries make your code more powerful and expressive. The ability to use external packages is another key learning outcome for the course. For the remainder of this assignment, “the package list” refers to this package list I’ve hosted on GitHub. In this assignment, you must use at least one external package from each of the three sections in the package list:

- Data Retrieval and Transformation
- Math and Statistics
- Visualization, Presentation, and Reporting
You are, of course, welcome to use any other packages you deem appropriate in addition to this minimum requirement.
See the following sections for submission details on each of the four parts of this assignment.
The first deliverable for this project is a 1-2 page written report detailing your plans for the final project. This is not meant to lock you into a particular dataset, set of packages, or approach; all of those details can change between this report and your final project submission in Week 5. It is just meant to get you thinking about the project and to ensure that you’re on the right track.
This report should answer the following items:
This component of the project is due prior to our Week 4 session.
Please submit your report (as a Word doc, PDF, or HTML doc) to the “Final Project Proposal” dropbox on D2L.
This code is the main deliverable for the class; it is worth 20% of your final grade. This is the part of the assignment where you get to show off what you’ve learned! Your goal is to create one or more R scripts that meet the requirements listed in the project description above.
This component of the project is due prior to our Week 5 session.
Please submit your code to the “Final Project (script + report)” dropbox on D2L. If you use multiple scripts (totally acceptable!), please include an additional script called “build.R” that specifies the order in which to run the scripts.
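A build.R can be as short as a few source() calls in the right order. The sketch below wraps those calls in a small helper that first checks that every script exists, so a missing file fails immediately with a clear message instead of partway through the run. The script names are the example names used earlier in this assignment, and run_project() is a hypothetical helper for illustration, not something the course requires.

```r
# build.R
# Runs the project scripts in the order they depend on each other.
# get_data.R and analyze_data.R are the example names used in this
# assignment; replace them with your own script names, in run order.
project_scripts <- c("get_data.R", "analyze_data.R")

# Hypothetical helper: check that every script exists, then run them in order
run_project <- function(scripts) {
  missing_scripts <- scripts[!file.exists(scripts)]
  if (length(missing_scripts) > 0) {
    stop("Script(s) not found: ", paste(missing_scripts, collapse = ", "))
  }
  for (script in scripts) {
    message("Running ", script)
    # source() evaluates in the global environment by default, so objects
    # created by an earlier script (e.g., cleaned data) are visible to later ones
    source(script)
  }
  invisible(scripts)
}
```

The last line of build.R would then be run_project(project_scripts). Keeping the run order in one character vector means that adding a new script later is a one-line change.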
Your script will be scored out of 100 points, using the following rubric:
Grade Item | Total Possible Points |
---|---|
All-or-Nothing Grade Items | |
1. All code runs without error (with no reliance on local data) | 25 |
2. Uses at least one Data Retrieval and Transformation package | 4 |
3. Uses at least one Math and Statistics package | 4 |
4. Uses at least one Visualization, Presentation, and Reporting package | 4 |
5. Every external package used is imported with library() | 3 |
6. All library() calls are at the top of submitted script(s) | 1 |
7. Code does not call install.packages(), install_github(), etc. | 4 |
Code Quality Items | |
1. Code commenting | 5 |
(4-5) Clearly commented | |
(1-3) Minimal comments, code is hard to understand | |
(0) No comments | |
2. Code Organization | 15 |
(12-15) Well-organized, intuitive flow | |
(8-11) Difficult to understand without comments | |
(0-8) Takes significant effort to understand | |
3. Use of External Functions | 15 |
(11-15) All/most external functions are called with :: | |
(6-10) Some external function calls use :: | |
(0-5) Unclear use of external functions | |
4. Problem Solving | 20 |
(15-20) Excellent problem decomposition, R solution | |
(8-14) Good solution, meets minimum requirements | |
(0-7) Solution is significantly incomplete | |
There are definitely good and bad ways to comment code. For some tips, see this code-commenting tutorial I really like.
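The skeleton below sketches one way to lay out a script so that it lines up with the rubric above: every library() call at the top, no install.packages(), comments that explain what each section is for, and functions from non-base packages called with their package prefix (::). It is an illustration only. stats, graphics, and datasets ship with R and stand in here for whatever you pick from the package list, and the built-in mtcars data is used only so the example runs anywhere without local files.

```r
# analyze_data.R
# Purpose: compare two simple models of fuel economy.
# NOTE: stats, graphics, and datasets ship with R; they stand in here for the
# packages you choose from the course package list.

# ---- Packages (all library() calls at the top; no install.packages()) ----
library(datasets)
library(graphics)
library(stats)

# ---- Data ----
# mtcars is built into R, so this script does not depend on any local file
car_data <- datasets::mtcars

# ---- Models ----
# Model 1: fuel economy (mpg) explained by weight alone
model_weight <- stats::lm(mpg ~ wt, data = car_data)

# Model 2: add horsepower to see whether it explains additional variation
model_weight_hp <- stats::lm(mpg ~ wt + hp, data = car_data)

# ---- Comparison table ----
# R-squared: share of the variation in mpg that each model explains
comparison <- data.frame(
  model = c("mpg ~ wt", "mpg ~ wt + hp"),
  r_squared = c(
    summary(model_weight)$r.squared,
    summary(model_weight_hp)$r.squared
  )
)
print(comparison)

# ---- Visualization ----
# Scatter plot of the raw data with the Model 1 regression line on top
graphics::plot(
  car_data$wt, car_data$mpg,
  xlab = "Weight (1,000 lbs)", ylab = "Miles per gallon",
  main = "Heavier cars get fewer miles per gallon"
)
graphics::abline(model_weight)
```

If you choose {stargazer} from the package list, stargazer::stargazer(model_weight, model_weight_hp, type = "text") prints both models side by side, which is one way to build the model-comparison table.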
In professional data science teams, it is common for team members to present their work in internal “code reviews”, small meetings where a data scientist shares brief background on the problem he/she sought to solve and then invites criticism of his/her code.
This can be a nerve-wracking experience in the context of a new job (trust me), so I’d like to give you the opportunity to practice sharing code with others in the safe setting of this introductory course. In the code review component of this project, you will give a 5-10 minute live presentation of your final project.
Code reviews will be done in-class during the Week 5 session.
You do not need to present slides or turn in anything on D2L.
When presentations begin, a Google Doc will be shared with the class.
While you present, everyone in the class will have this doc up on their machines and use it to give you comments and questions. The benefit of this practice, in professional settings, is that days after your code review you’ll have a written record of your audience’s feedback.
Be prepared to show your code + report in front of the class.
Your code review should consist of the following:
Your presentation will be scored out of 100 points, using the following rubric:
Grade Item | Total Possible Points |
---|---|
All-or-Nothing Grade Items | |
1. Shows at least one data visualization | 10 |
2. Describes at least one learning or piece of advice for classmates | 10 |
Other Presentation Items | |
1. Problem Introduction | 10 |
(5-10) introduction is clear and concise | |
(1-4) introduction is confusing or rambling | |
(0) no introduction of the problem | |
2. Explanation of the dataset | 30 |
(20-30) explains source and real-world meaning of the data | |
(10-19) literal description with no connection to problem | |
(0-9) inaccurate or confusing explanation | |
3. Explanation of the Code | 40 |
(30-40) clear explanation of how the code solves the problem | |
(10-29) literal description of the code as-is | |
(0-9) inaccurate or confusing explanation | |
Avoid these common code review issues:
You’ll do great!
The final deliverable for this project is a written report with your findings. This should be a 2-4 page “executive briefing”, the type that you would write if you were doing this analysis for a consulting client.
Your report should focus on the problem, not the specifics of the code.
It should describe the problem, a brief summary of the work that was done (including the data that was used), and the result.
The problem should be stated as a falsifiable research question.
Bad
Everyone knows inflation is a big problem. I looked at how high inflation is.
Good
This project explored whether the relationship between observed inflation and consumer sentiment in the U.S. changed in the period 2021-2023.
Your report should NOT include any raw R code, but it should include the output of the visualization step of your script (e.g., a table, plot, or other visualization).
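Since the report shows your script's output but none of its code, it can help to have the script write its table and plot to files that you then insert into the document. The sketch below is one minimal way to do that with base R's grDevices and utils packages, again using the built-in mtcars data as a stand-in for your own. The "output" folder and the file names are arbitrary choices for illustration, and png() relies on your R installation having a bitmap graphics device available (most do).

```r
# Save a table and a plot from the script so they can be inserted into the report.
# The "output" folder and file names are illustrative; use whatever you like.
library(grDevices)
library(graphics)
library(stats)
library(utils)

output_dir <- "output"
dir.create(output_dir, showWarnings = FALSE)

# Fit a simple model on R's built-in mtcars data (a stand-in for your own data)
model <- stats::lm(mpg ~ wt, data = mtcars)

# Table: coefficient estimates, rounded for readability, written to CSV
coef_table <- data.frame(
  term = names(stats::coef(model)),
  estimate = round(unname(stats::coef(model)), 2)
)
utils::write.csv(coef_table, file.path(output_dir, "model_coefficients.csv"),
                 row.names = FALSE)

# Plot: open a PNG file, draw the chart, then close the file with dev.off()
grDevices::png(file.path(output_dir, "mpg_vs_weight.png"), width = 800, height = 600)
graphics::plot(mtcars$wt, mtcars$mpg,
               xlab = "Weight (1,000 lbs)", ylab = "Miles per gallon")
graphics::abline(model)
invisible(grDevices::dev.off())
```

Forgetting dev.off() is a common cause of empty or unreadable image files, because the plot is only fully written to disk once the device is closed.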
This report should be free from statistical jargon, or such jargon should be clearly and concisely explained in the report.
Bad
I used a Breusch-Pagan test to check for heteroskedasticity, and it had a Chi-squared statistic of 46.98 with a p-value of 0.00715.
Good
I observed larger model errors for higher-value stocks, so I’m not confident that these results would hold for a portfolio with a different mix of company sizes.
For example, none of the following phrases should be used (unless they’re clearly explained in jargon-free language):
This component of the project is due prior to our Week 5 session. Please upload your report to the “Final Project (script + report)” dropbox on D2L.
Your report will be scored out of 100 points, using the following rubric:
Grade Item | Total Possible Points |
---|---|
All-or-Nothing Grade Items | |
1. Report does not contain any raw R code | 5 |
2. Report contains data visualizations created by the code | 10 |
Other Report Items | |
1. Problem Explanation | 10 |
(5-10) introduction is clear and concise | |
(1-4) introduction is not clear | |
(0) no introduction of the problem | |
2. Explanation of where the data came from | 10 |
(8-10) clear explanation | |
(0-7) unclear or inaccurate explanation | |
3. Explanation of what the dataset contains | 20 |
(15-20) clear, executive-level explanation | |
(10-14) overly-technical description or somewhat unclear or inaccurate | |
(0-9) very unclear, inaccurate, or confusing explanation or no explanation | |
4. Explanation of the result | 25 |
(20-25) result of the project clearly expressed in business terms | |
(10-19) overly-technical explanation of the result | |
(0-10) inaccurate, confusing, or missing explanation of the result | |
5. Grammar and Formatting | 20 |
(18-20) at most two minor grammar and spelling issues | |
(8-17) some grammar and spelling issues | |
(0-7) many grammar and spelling issues | |