Proposal
Your proposal is to be submitted in print or electronically. Once you decide on a topic that interests you, think about what you would like to end up with as a final result without worrying about how to get there. Try to visualize what your end product will look like. Will it be an interactive map? A predictive model? Don’t think about coding, or a particular data set, or what you know how to do now. If you come up with something ambitious and original, you’ll be more motivated to learn new things as you go in order to accomplish your goal. The topic is completely open to your choice, but keep in mind the rules listed above.
Content
Your proposals should contain the following content:
- Title: The title of your project
- Purpose: Describe the general topic/phenomenon you want to explore, as well as some carefully considered questions that you hope to address. You should make an argument motivating your work. Why should someone be interested in what you are doing? What do you hope people will learn from your project?
- Data: As best you can, describe where you will find your data, and what kind of data it is. Will you be working with spatial data in shapefiles? Where will you be accessing your data? Be as specific as you can, listing URLs and file formats if possible.
- Variables: List, and briefly describe, each variable that you plan to incorporate. If you can, be specific about units, scale, etc.
- End Product: Describe what you hope to deliver as a final product.
  - Will it be an interactive application that will be posted on the Internet?
  - Will it be a paper that draws some statistical conclusions?
  - Will it be a predictive model that forecasts future values?
- Honor Code: Indicate if any component of this project overlaps with work for another class/thesis. If this is the case, please speak to your professor/advisor and have them email me their consent by the due date of the proposal.
Presentation
An effective oral presentation is an integral part of this project. One of the objectives of this class is to give you experience conveying the results of a technical investigation to a non-technical audience in a way that they can understand. Whether you choose to stay in academia or pursue a career in industry, the ability to communicate clearly is of paramount importance. As a data scientist, you bear the burden of proof: it is up to you to convince your audience that what you are saying is true. If your audience (who may very well be less knowledgeable about statistics than you are) cannot understand your results or their interpretations, then the technical merit of your project is irrelevant.
During the last four lectures, you will each give a 12-minute presentation of your work. Your goal should be to convey to your audience a clear understanding of your research topic, along with a basic understanding of your project, and how well it addresses the research question you posed. You should not tell us everything that you did, or show a bunch of things that you tried that didn’t work well. After hearing your talk, each student in the class should be able to answer:
- What was your project about?
- What was your data like, and what techniques did you apply to it?
- What were your findings?
You should prepare electronic slides for your talk. PowerPoint/Keynote is fine, but you might also want to consider:
- RStudio tools: R Markdown slides
- Beamer (LaTeX)
- Google Slides
Advice
- Budget your time. You only have 12 minutes and we will be running a very tight schedule. Plan for 10 minutes to talk, and 2 minutes to answer questions. Rehearse your talk several times beforehand to get a better feel for your timing.
- As a rule of thumb, I allow one minute per slide.
- Don’t write too much on each slide. You don’t want people to have to read your slides, because if the audience is reading your slides, then they aren’t listening to you.
- Put your problem in context. Remember that most of your audience will have little or no knowledge of your subject matter. The easiest way to lose people is to dive right into technical details that require prior domain knowledge. Spend a few minutes at the beginning of your talk introducing your audience to the most basic aspects of your topic and present some motivation for what you are studying.
- Speak loudly and clearly. Remember that you know more about your topic than anyone else in the room, so speak and act with confidence!
Write-Up
Your write-up must be a reproducible R Markdown HTML document that is no more than 15 pages long when printed. That is, I should be able to recreate your entire analysis with one click of the mouse.
In your write-up, you should tell a data science audience about your project, why they should care about it, and what you have discovered. Your audience will be people like you: current or aspiring data scientists. Keep in mind that this audience is extraordinarily diverse in terms of skills and abilities, so you should assume very little about what they might know. However, your audience is reasonably tech-savvy, so you need not “dumb down” your analysis. Your write-up should make it clear to me and any other student in the class what methods and techniques you have used to produce your finished product.
Content
Do not present all of the R code that you wrote throughout the process of working on this project. In fact:
- The amount of R code in the rendered document should be minimal. The less R code the better.
- Important conclusions should appear in the main text, not in comments in the code.
- The R Markdown file should contain the minimal set of R code that is necessary and sufficient to understand your results and findings. If you make a claim, it must be justified by explicit calculation. A knowledgeable reviewer should be able to reproduce your analysis, that is:
  - Compile your .Rmd file without modification
  - Verify every statement that you have made.
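One way to keep the full code in the .Rmd file while showing very little of it in the rendered HTML is to set this in the document header. The following is only a sketch, not a required template: the title and author are placeholders, and the choice of a table of contents and of `code_folding: hide` is an assumption about what suits your project.

```yaml
---
title: "Your project title"      # placeholder
author: "Your name"              # placeholder
date: "`r Sys.Date()`"           # filled in each time the document is knit
output:
  html_document:
    toc: true                    # table of contents for easier navigation
    code_folding: hide           # code chunks are collapsed by default
---
```

With a header like this, knitting the .Rmd file in RStudio rebuilds the HTML document from scratch, and a reviewer can expand any code chunk to check the calculation behind a claim.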
Motivation
Be sure to motivate your topic at the beginning of your write-up. You should try to hook the reader early on. Assume that your audience is a skeptical data scientist who has stumbled across your report but has very little time to read it. Can you give them a reason to continue reading? A cool visualization or result can help.
Style
The write-up can have interactive components. Take advantage of this by including hyperlinks, figures, videos, etc. to provide context for the reader. You can even include a bibliography, and your references should be embedded via links. Use Markdown elements like links, lists, LaTeX, and images as needed.
Visualizations, particularly interactive ones, will be well-received. That said, do not overuse visualizations. You may be better off with one complicated but well-crafted visualization as opposed to many quick-and-dirty plots. Any plots should be well thought out, properly labelled, informative, and visually appealing!