library(dplyr)
library(ggplot2)
library(mosaic)
resample(c(1:6), 2)
coin_flips <- do(500) * rflip(10)
coin_flips <- coin_flips %>%
  as_tibble()
If we View(coin_flips) and look at the first 6 rows, we see that the data are in tidy format:
n | heads | tails | prop |
---|---|---|---|
10 | 4 | 6 | 0.4 |
10 | 6 | 4 | 0.6 |
10 | 3 | 7 | 0.3 |
10 | 4 | 6 | 0.4 |
10 | 6 | 4 | 0.6 |
10 | 5 | 5 | 0.5 |
So we plot a histogram of the heads variable with binwidth = 1, since we are dealing with integers, i.e. whole numbers:
ggplot(coin_flips, aes(x = heads)) +
  geom_histogram(binwidth = 1)
Let’s unpack resample(c(1:6), 2):
c(1:6) in the console returns six values, 1 2 3 4 5 6, one for each possible die roll value.
resample(c(1:6), 2) says: sample a value from 1 to 6 twice, with replacement. This is akin to rolling a die twice. (Plain sample(c(1:6), 2) draws without replacement, so it could never produce doubles.)
two_dice <- do(500) * resample(c(1:6), 2)
two_dice <- two_dice %>%
  as_tibble()
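To see why sampling with replacement matters for dice, here is a minimal base R sketch. It assumes nothing beyond base R: mosaic's resample(x, 2) corresponds to sample(x, 2, replace = TRUE), and the object names and the seed below are made up for illustration.

```r
set.seed(2016)  # arbitrary seed, only so the sketch is reproducible
die <- 1:6

# 500 pairs of draws; each column of the resulting matrix is one pair
without_replacement <- replicate(500, sample(die, 2))
with_replacement <- replicate(500, sample(die, 2, replace = TRUE))

# Count how many of the 500 pairs are doubles (both values equal)
doubles_without <- sum(without_replacement[1, ] == without_replacement[2, ])
doubles_with <- sum(with_replacement[1, ] == with_replacement[2, ])

doubles_without  # 0: doubles are impossible without replacement
doubles_with     # varies with the seed; about 500/6 on average
```

Without replacement, sums of 2 and 12 can never occur, which is not how two real dice behave.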
If we View(two_dice) and look at the first 6 rows, we see that the data are in tidy format:
V1 | V2 |
---|---|
6 | 3 |
6 | 5 |
4 | 1 |
2 | 3 |
6 | 1 |
3 | 5 |
So to get the sum of the two dice, we use mutate() to create a new variable sum by adding V1 and V2:
two_dice <- two_dice %>%
  mutate(sum = V1 + V2)
V1 | V2 | sum |
---|---|---|
6 | 3 | 9 |
6 | 5 | 11 |
4 | 1 | 5 |
2 | 3 | 5 |
6 | 1 | 7 |
3 | 5 | 8 |
And now we plot it:
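A minimal sketch of this plot, assuming the same histogram recipe used for coin_flips above. The aesthetic mapping and binwidth are assumptions, the seed is arbitrary, and the data are regenerated with base R as a stand-in for the simulated two_dice so that the block runs on its own.

```r
library(ggplot2)

set.seed(76)  # arbitrary seed, only so the sketch is reproducible
# Base R stand-in for the simulated two_dice data: 500 rolls of two dice
two_dice <- data.frame(
  V1 = sample(1:6, 500, replace = TRUE),
  V2 = sample(1:6, 500, replace = TRUE)
)
two_dice$sum <- two_dice$V1 + two_dice$V2

# Histogram of the sums, one bin per whole number
p <- ggplot(two_dice, aes(x = sum)) +
  geom_histogram(binwidth = 1)
p
```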
What’s the deal with the ugly axis tick marks? This is again because computers are stupid: ggplot does not know we are dealing only with whole numbers, i.e. integers. We could cheat and treat the sum, a numerical variable, as a categorical variable using geom_bar().
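The geom_bar() idea can be sketched as follows. This assumes sum is still mapped to x as a number, so geom_bar() draws one bar per distinct value while the axis itself stays continuous. As before, the seed is arbitrary and the data are a base R stand-in for two_dice.

```r
library(ggplot2)

set.seed(76)  # arbitrary seed, only so the sketch is reproducible
# Base R stand-in for the simulated two_dice data: 500 rolls of two dice
two_dice <- data.frame(
  V1 = sample(1:6, 500, replace = TRUE),
  V2 = sample(1:6, 500, replace = TRUE)
)
two_dice$sum <- two_dice$V1 + two_dice$V2

# One bar per distinct value of sum, with the bar height equal to its count
p <- ggplot(two_dice, aes(x = sum)) +
  geom_bar()
p
```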
Here we can at least separate out the individual whole numbers, but we still have the axis tick marks problem. How do I fix this? That’s for the more advanced data science class, MATH 216 Data Science.
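For the curious, here is one possible approach, offered as a hedged sketch rather than the method taught in that class: tell ggplot explicitly where to put the tick marks with scale_x_continuous(). The choice of breaks = 2:12 is an assumption based on the possible sums of two dice, and the seed and base R data are again stand-ins.

```r
library(ggplot2)

set.seed(76)  # arbitrary seed, only so the sketch is reproducible
# Base R stand-in for the simulated two_dice data: 500 rolls of two dice
two_dice <- data.frame(
  V1 = sample(1:6, 500, replace = TRUE),
  V2 = sample(1:6, 500, replace = TRUE)
)
two_dice$sum <- two_dice$V1 + two_dice$V2

# Place a tick mark at every possible sum, 2 through 12
p <- ggplot(two_dice, aes(x = sum)) +
  geom_bar() +
  scale_x_continuous(breaks = 2:12)
p
```

Another option along the same lines is mapping x to factor(sum), which makes the variable genuinely categorical so that each level gets its own labelled tick.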