With the internet, we are in a new age of data:
Wed Oct 12, 2016
Jenny Bryan said: "Classroom data are like teddy bears and real data are like a grizzly bear with salmon blood dripping out its mouth."
Traditional Classroom Data | Real Data |
---|---|
Some attributes of real data:
Inconsistent formatting is a real pain:
To take this on, we now officially introduce the `dplyr` package: a grammar of data manipulation.
Were it not for this package, I probably wouldn't be taking a data-centric view of this course.
Why do I have a `dplyr` sticker on my laptop? Why is `dplyr` so good IMO?
`function()` you use.

Say hello to the 5MV: the five main verbs
- `select()` columns by variable name
- `filter()` rows matching criteria
- `mutate()` existing variables to create new ones
- `arrange()` rows
- `summarise()` numerical variables that are `group_by()` categorical variables
- `_join()` two separate data frames by corresponding variables

- `select()` columns by variable name: front of cheatsheet, bottom right
- `filter()` rows matching criteria: front of cheatsheet, bottom middle. We've already used this in Chapter 3 on Data Viz.

Keep looking back and forth between book and cheatsheet!
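
To make the first two verbs concrete, here is a minimal sketch of `select()` and `filter()` in action. The data frame `flights_toy` and its columns are made up for illustration; they are an assumption, not a dataset from the book or the cheatsheet.

```r
library(dplyr)

# Hypothetical toy data, invented for this example only
flights_toy <- data.frame(
  carrier   = c("AA", "UA", "AA", "DL"),
  origin    = c("JFK", "EWR", "LGA", "JFK"),
  dep_delay = c(5, 42, -3, 17)
)

# select(): keep columns by variable name
carrier_delay <- select(flights_toy, carrier, dep_delay)

# filter(): keep rows matching criteria (criteria separated by commas are AND-ed)
late_from_jfk <- filter(flights_toy, origin == "JFK", dep_delay > 10)

# Verbs chain with the pipe: filter rows first, then select columns
late_carriers <- flights_toy %>%
  filter(dep_delay > 10) %>%
  select(carrier)
```

Each verb takes a data frame as its first argument and returns a new data frame, which is why they chain so naturally with `%>%`.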