library(dplyr)
library(ggplot2)
library(mosaic)
  • LC1: Create a histogram of the number illustrating the long-run behavior of flipping a coin 10 times.
    • Where is it centered?
    • Describe the shape of the distribution of values
  • LC2: Tie all elements of Line 13 of code below to the Attributes of Powerball seen in the slides above.
  • LC3: Try to replicate the above, but for the sum of two die rolls. Hint: resample(c(1:6), 2)

LC1

coin_flips <- do(500)*rflip(10)
coin_flips <- coin_flips %>% 
  as_data_frame()

If we View(coin_flips) the first 6 rows, we see that we have in tidy format:

n heads tails prop
10 4 6 0.4
10 6 4 0.6
10 3 7 0.3
10 4 6 0.4
10 6 4 0.6
10 5 5 0.5

So we plot a histogram of the heads variable with binwidth=1 since we are dealing with integers i.e. whole numbers.

ggplot(coin_flips, aes(x=heads)) +
  geom_histogram(binwidth = 1)

  • Where is it centered? Answer: At 5 i.e. half of 10.
  • Describe the shape of the distribution of values. Answer: bell-shaped. i.e. like a Normal distribution.

LC2

  • Atrributes of the Lottery Machine:
    • How many balls do you have? 2
    • What are written on the balls? Heads and tails
    • Do the balls have equal probability of being picked? Yes
  • Attributes of the Drawing:
    • How are you drawing the balls? At random
    • How many balls do you draw? 10
    • What are you recording about each drawn ball? If the ball has heads or tails
    • What do you do with drawn balls? You put them back in the machine
  • Number of Lotteries:
    • How many times do you repeat the lottery? Many many many times. In the above case, 500 times

LC3

Let’s unpack sample(c(1:6), 2):

  • Running c(1:6) in the console returns six values, 1 2 3 4 5 6, one for each possible die roll value.
  • sample(c(1:6), 2) says: sample a value from 1 to 6 twice. This is akin to rolling a die twice.
two_dice <- do(500) * sample(c(1:6), 2)
two_dice <- two_dice %>% 
  as_data_frame() 

If we View(two_dice) the first 6 rows, we see that we have in tidy format:

V1 V2
6 3
6 5
4 1
2 3
6 1
3 5

So to get the sum of the two dice, we mutate() a new variable sum based on the sum of the two die:

two_dice <- two_dice %>% 
  mutate(sum = V1 + V2)
V1 V2 sum
6 3 9
6 5 11
4 1 5
2 3 5
6 1 7
3 5 8

And now we plot it:

  • Atrributes of the Lottery Machine:
    • How many balls do you have? 6
    • What are written on the balls? Integers 1 through 6
    • Do the balls have equal probability of being picked? Yes
  • Attributes of the Drawing:
    • How are you drawing the balls? At random
    • How many balls do you draw? 2
    • What are you recording about each drawn ball? The sum of the balls
    • What do you do with drawn balls? You put them back in the machine
  • Number of Lotteries:
    • How many times do you repeat the lottery? Many many many times. In the above case, 500 times

Advanced

What’s the deal with the ugly axes tick marks? This is again b/c computers are stupid, and ggplot does not know we are dealing only with whole numbers i.e. integers. We could cheat it and treat the sum, a numerical variable, as a categorical variable using geom_bar()

Here we can at least separate out the individual whole numbers, but we still have the axes tick marks problem? How do I fix these? That’s for the more advanced data science class MATH 216 Data Science.