With the internet, we are in a new age of data:
Wed Oct 12, 2016
Jenny Bryan said: "Classroom data are like teddy bears and real data are like a grizzly bear with salmon blood dripping out its mouth."
| Traditional Classroom Data | Real Data |
|---|---|
Some attributes of real data:
Inconsistent formatting is a real pain:
To take this on, we now officially introduce the dplyr package: a grammar of data manipulation.
Were it not for this package, I probably wouldn't be taking a data-centric view to this course.
Why do I have a dplyr sticker on my laptop? Why is dplyr so good IMO?
function() you use.

Say hello to the 5MV: the five main verbs
- `select()` columns by variable name
- `filter()` rows matching criteria
- `mutate()` existing variables to create new ones
- `arrange()` rows
- `summarise()` numerical variables that are `group_by()` categorical variables
- `_join()` two separate data frames by corresponding variables

Where to find them:

- `select()` columns by variable name: front of cheatsheet, bottom right
- `filter()` rows matching criteria: front of cheatsheet, bottom middle. We've already used this in Chapter 3 on Data Viz.

Keep looking back and forth between book and cheatsheet!
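To make the verbs concrete, here is a minimal sketch that runs each one once on a tiny made-up data set. The data frames `toy_flights` and `toy_airlines` and all their values are invented for illustration; they are not from the book or the cheatsheet. `left_join()` stands in for the `_join()` family here, and the other join functions follow the same pattern.

```r
library(dplyr)

# Hypothetical toy data, invented for this example only
toy_flights <- tibble(
  carrier   = c("AA", "AA", "UA", "UA", "DL"),
  dep_delay = c(10, -5, 30, 0, 15),
  distance  = c(500, 800, 1200, 300, 700)
)
toy_airlines <- tibble(
  carrier = c("AA", "UA", "DL"),
  name    = c("American", "United", "Delta")
)

# select() columns by variable name
delays <- toy_flights %>% select(carrier, dep_delay)

# filter() rows matching criteria: keep only flights that left late
late <- toy_flights %>% filter(dep_delay > 0)

# mutate() existing variables to create new ones: miles to kilometres
with_km <- toy_flights %>% mutate(distance_km = distance * 1.609)

# arrange() rows: largest delay first
sorted <- toy_flights %>% arrange(desc(dep_delay))

# summarise() a numerical variable, group_by() a categorical one
by_carrier <- toy_flights %>%
  group_by(carrier) %>%
  summarise(mean_delay = mean(dep_delay))

# one member of the _join() family: match rows on the shared carrier column
named <- by_carrier %>% left_join(toy_airlines, by = "carrier")
```

Each verb takes a data frame as its first argument and returns a new data frame, which is why they chain together with `%>%`. Print any of the results (for example `named`) and compare what you see with the matching panel on the cheatsheet.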