Probability via Simulation

library(dplyr)
library(ggplot2)
library(mosaic)

LC1: Create a histogram of the number of heads, illustrating the long-run behavior of flipping a coin 10 times.
- Where is it centered?
- Describe the shape of the distribution of values
LC2: Tie all elements of Line 13 of code below to the Attributes of Powerball seen in the slides above.
LC3: Try to replicate the above, but for the sum of two die rolls. Hint: resample(c(1:6), 2)

LC1

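# Flip a coin 10 times, and repeat that 500 times
coin_flips <- do(500) * rflip(10)
# as_tibble() replaces the now-deprecated as_data_frame()
coin_flips <- coin_flips %>% 
  as_tibble()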

If we View(coin_flips) and look at the first 6 rows, we see that the data are in tidy format:

n	heads	tails	prop
10	4	6	0.4
10	6	4	0.6
10	3	7	0.3
10	4	6	0.4
10	6	4	0.6
10	5	5	0.5

So we plot a histogram of the heads variable with binwidth = 1, since we are dealing with integers, i.e. whole numbers.

ggplot(coin_flips, aes(x=heads)) +
  geom_histogram(binwidth = 1)

Where is it centered? Answer: At 5, i.e. half of 10.
Describe the shape of the distribution of values. Answer: Bell-shaped, i.e. like a Normal distribution.
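
As a rough numerical check of these two answers, the sketch below redoes the simulation in base R and compares it with the exact Binomial(10, 0.5) probabilities. It uses rbinom() as a stand-in for mosaic's rflip(), and the seed and the names heads, simulated and exact are arbitrary choices made for this illustration, not part of the original exercise.

```r
# Assumption: base R's rbinom() stands in for do(500) * rflip(10);
# both count heads in 10 fair coin flips, repeated 500 times.
set.seed(2016)  # arbitrary seed, only for reproducibility
heads <- rbinom(500, size = 10, prob = 0.5)

# Center: the average number of heads should be close to 5
mean(heads)

# Shape: simulated proportions next to the exact binomial probabilities
simulated <- as.vector(table(factor(heads, levels = 0:10))) / length(heads)
exact <- dbinom(0:10, size = 10, prob = 0.5)
data.frame(heads = 0:10, simulated = simulated, exact = round(exact, 3))
```

Both columns should peak at 5 heads and fall off roughly symmetrically on either side, which is the bell shape seen in the histogram.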

LC2

Attributes of the Lottery Machine:
- How many balls do you have? 2
- What is written on the balls? Heads and tails
- Do the balls have equal probability of being picked? Yes
Attributes of the Drawing:
- How are you drawing the balls? At random
- How many balls do you draw? 10
- What are you recording about each drawn ball? If the ball has heads or tails
- What do you do with drawn balls? You put them back in the machine
Number of Lotteries:
- How many times do you repeat the lottery? Many many many times. In the above case, 500 times
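
The attributes above can be mapped onto code one by one. The following is a minimal sketch in base R rather than the mosaic code used for LC1; the names balls, one_lottery, draws and heads_per_lottery are made up for this illustration.

```r
set.seed(2016)  # arbitrary seed, only for reproducibility

# Lottery machine: 2 balls, labeled heads and tails, equally likely
balls <- c("heads", "tails")

# One lottery: draw 10 balls at random, putting each ball back
# (replace = TRUE), and record how many came up heads
one_lottery <- function() {
  draws <- sample(balls, size = 10, replace = TRUE)
  sum(draws == "heads")
}

# Number of lotteries: repeat 500 times
heads_per_lottery <- replicate(500, one_lottery())
head(heads_per_lottery)
```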

LC3

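Let’s unpack resample(c(1:6), 2):

Running c(1:6) in the console returns six values, 1 2 3 4 5 6, one for each possible die roll value.
resample(c(1:6), 2) says: sample a value from 1 to 6 twice, with replacement. This is akin to rolling a die twice. Plain sample(c(1:6), 2) draws without replacement, so it could never return doubles such as 3 and 3.

# Roll two dice, and repeat that 500 times
two_dice <- do(500) * resample(c(1:6), 2)
two_dice <- two_dice %>% 
  as_tibble()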
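
One detail worth checking when simulating die rolls: base R's sample() draws without replacement by default, so two values drawn that way can never match, whereas two real die rolls can. The sketch below, in base R with arbitrary names and seed, counts how often doubles occur under each setting.

```r
set.seed(2016)  # arbitrary seed, only for reproducibility

# 1000 pairs drawn WITHOUT replacement (the default for sample())
without <- replicate(1000, sample(1:6, 2))
# 1000 pairs drawn WITH replacement, like two real die rolls
with_repl <- replicate(1000, sample(1:6, 2, replace = TRUE))

# Proportion of doubles in each case
mean(without[1, ] == without[2, ])      # exactly 0: doubles are impossible
mean(with_repl[1, ] == with_repl[2, ])  # should be near 1/6
```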

If we View(two_dice) and look at the first 6 rows, we see that the data are in tidy format:

V1	V2
6	3
6	5
4	1
2	3
6	1
3	5

So to get the sum of the two dice, we mutate() a new variable sum that adds the two rolls together:

two_dice <- two_dice %>% 
  mutate(sum = V1 + V2)

V1	V2	sum
6	3	9
6	5	11
4	1	5
2	3	5
6	1	7
3	5	8

And now we plot it:
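
The plotting code is not shown here; one plausible version, mirroring the LC1 histogram, is sketched below. To keep the block self-contained it rebuilds a two_dice data frame with base R's sample(..., replace = TRUE) instead of the mosaic code above, which is an assumption made for this sketch, as are the seed and the name dice_plot.

```r
library(ggplot2)

set.seed(2016)  # arbitrary seed, only for reproducibility
# Stand-in for the two_dice data frame built above: 500 rolls of two dice
two_dice <- data.frame(
  V1 = sample(1:6, 500, replace = TRUE),
  V2 = sample(1:6, 500, replace = TRUE)
)
two_dice$sum <- two_dice$V1 + two_dice$V2

# Histogram of the sum; binwidth = 1 because the sums are whole numbers
dice_plot <- ggplot(two_dice, aes(x = sum)) +
  geom_histogram(binwidth = 1)
dice_plot
```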

Attributes of the Lottery Machine:
- How many balls do you have? 6
- What is written on the balls? Integers 1 through 6
- Do the balls have equal probability of being picked? Yes
Attributes of the Drawing:
- How are you drawing the balls? At random
- How many balls do you draw? 2
- What are you recording about each drawn ball? The number on it, so that we can take the sum of the two balls
- What do you do with drawn balls? You put them back in the machine
Number of Lotteries:
- How many times do you repeat the lottery? Many many many times. In the above case, 500 times
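
Because this lottery has only 6 x 6 = 36 equally likely outcomes, the long-run behavior of the sum can also be listed exactly, which gives something to compare the simulation against. A sketch in base R, with all_sums and exact_probs as names chosen for this illustration:

```r
# All 36 equally likely (first roll, second roll) pairs and their sums
all_sums <- outer(1:6, 1:6, "+")

# Exact probability of each possible sum, 2 through 12
exact_probs <- table(all_sums) / 36
round(exact_probs, 3)

# The most likely sum is 7, with probability 6/36
names(which.max(exact_probs))
```

With 500 simulated lotteries, the proportions in the simulation should land near these values without matching them exactly.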

Advanced

What’s the deal with the ugly axis tick marks? This is again because computers are stupid, and ggplot does not know we are dealing only with whole numbers, i.e. integers. We could cheat and treat the sum, a numerical variable, as a categorical variable using geom_bar().
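
A minimal sketch of that cheat is below. Wrapping the variable in factor() is one common way to make ggplot treat it as categorical; the data frame is rebuilt with base R so the block runs on its own, and the seed and the name bar_plot are assumptions made for this illustration.

```r
library(ggplot2)

set.seed(2016)  # arbitrary seed, only for reproducibility
# Stand-in for two_dice: the sum of two dice, rolled 500 times
two_dice <- data.frame(
  sum = sample(1:6, 500, replace = TRUE) + sample(1:6, 500, replace = TRUE)
)

# factor(sum) turns the numerical sum into a categorical variable,
# so geom_bar() draws one bar per distinct value
bar_plot <- ggplot(two_dice, aes(x = factor(sum))) +
  geom_bar()
bar_plot
```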

Here we can at least separate out the individual whole numbers, but we still have the axis tick marks problem. How do we fix these? That’s for the more advanced data science class, MATH 216 Data Science.
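
For the curious, one possible fix (a sketch only, not necessarily what MATH 216 covers) is to keep the sum numerical and tell ggplot explicitly where to put the tick marks, using scale_x_continuous() with its breaks argument. The data frame, seed and the name tick_plot are again stand-ins made up for this illustration.

```r
library(ggplot2)

set.seed(2016)  # arbitrary seed, only for reproducibility
# Stand-in for two_dice: the sum of two dice, rolled 500 times
two_dice <- data.frame(
  sum = sample(1:6, 500, replace = TRUE) + sample(1:6, 500, replace = TRUE)
)

# breaks = 2:12 puts one tick mark at each possible whole-number sum
tick_plot <- ggplot(two_dice, aes(x = sum)) +
  geom_histogram(binwidth = 1) +
  scale_x_continuous(breaks = 2:12)
tick_plot
```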

Albert Y. Kim

Mon Oct 31, 2016
