Wed Nov 2, 2016

Recall

The mosaic package has the following 4 functions that will give us (most of) the random simulation tools we need. We've already seen rflip() and do().

  1. rflip(): Flip a coin
  2. shuffle(): Shuffle a set of values
  3. do(): Do the same thing many, many, many times
  4. resample(): The Swiss Army knife of functions
  • Today we do shuffle() and resample()

Key Distinction

A huge distinction in types of sampling:

  1. Sampling with replacement
  2. Sampling without replacement

In the Powerball analogy, this translates to:

  1. After picking a ball, putting it back into the machine
  2. After picking a ball, leaving it out. This is what the lottery does in real life.
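
To make the distinction concrete, here is a minimal sketch using base R's sample() on a toy machine. The 5-ball machine and the names balls, with_repl and without_repl are made up for illustration; a real Powerball machine has far more balls.

```r
# A toy "machine" with 5 numbered balls (hypothetical, for illustration only)
balls <- 1:5
set.seed(2016)  # only so that repeated runs give the same draws

# 1. With replacement: each ball goes back in, so repeats are possible
with_repl <- sample(balls, size = 5, replace = TRUE)
with_repl

# 2. Without replacement: each ball is left out, so no repeats
without_repl <- sample(balls, size = 5, replace = FALSE)
without_repl

# Drawing all 5 balls without replacement uses every ball exactly once
sort(without_repl)

# Asking for more draws than there are balls only works with replacement
more_draws <- sample(balls, size = 8, replace = TRUE)
more_draws
```

Try sample(balls, size = 8, replace = FALSE) as well: it stops with an error, because the machine runs out of balls.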

Shuffling AKA Permuting

Shuffling, re-ordering, and permuting are all synonyms. I'm going to use all three terms interchangeably.

Run the following in your console:

library(mosaic)
# Define a vector fruit
fruit <- c("apple", "orange", "mango")

# Do this multiple times:
shuffle(fruit)
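
A quick way to convince yourself that shuffling only changes the order: the sketch below checks that a shuffled vector holds exactly the same values as the original. The name shuffled is just for illustration, and the last line uses base R's sample(), which also returns a random permutation when given nothing but a vector.

```r
library(mosaic)

fruit <- c("apple", "orange", "mango")

# One random re-ordering of fruit
shuffled <- shuffle(fruit)
shuffled

# Same values and same length as the original; only the order can differ
setequal(shuffled, fruit)
length(shuffled) == length(fruit)

# Base R counterpart: with no other arguments, sample() permutes its input
sample(fruit)
```

With 3 fruits there are only 3 × 2 × 1 = 6 possible orderings, so you will sometimes get the original order back.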

Shuffling AKA Permuting

This works with the do() operator…

do(5) * shuffle(fruit)

… as well as within a mutate()

# tibble() replaces the now-deprecated data_frame(); dplyr re-exports it
library(dplyr)
example_data <- tibble(
  name = c("Ilana", "Abbi", "Hannibal"),
  fruit = c("apple", "orange", "mango")
)

# Run this multiple times: 
example_data %>% 
  mutate(fruit = shuffle(fruit))
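
To see what the mutate() step changes and what it leaves alone, here is a small sketch that saves the shuffled result and compares it with the original. The name shuffled_data is made up for illustration, and tibble() is used in place of the older data_frame().

```r
library(mosaic)
library(dplyr)

example_data <- tibble(
  name = c("Ilana", "Abbi", "Hannibal"),
  fruit = c("apple", "orange", "mango")
)

# Shuffle only the fruit column; the name column is not touched
shuffled_data <- example_data %>%
  mutate(fruit = shuffle(fruit))
shuffled_data

# The names stay in their original order
identical(shuffled_data$name, example_data$name)

# The same three fruits are still there, possibly assigned to different people
setequal(shuffled_data$fruit, example_data$fruit)
```

In other words, shuffling one column randomly re-assigns its values across the rows without adding or dropping any.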

Resampling

At its most basic, resample() resamples the input vector with replacement. Run this in the console multiple times:

resample(fruit)
  • You can get the same fruit all three times, i.e. this is sampling with replacement.
  • resample() has default settings that we can change to fit our needs; it is a Swiss Army knife.
  • Let's unpack the defaults:

Resampling

resample(x=fruit, size=length(fruit), replace=TRUE, 
         prob=rep(1/length(fruit), length(fruit)) )
  • x is the input. In this case fruit.
  • size: size of output vector. By default the same size as x.
  • replace: Sample with or without replacement. By default with replacement.
  • prob: Probability of sampling each input value. By default, equal probability
  • Run rep(1/length(fruit), length(fruit)) in your console. In the case of fruit, this vector is rep(1/3, 3) i.e. repeat 1/3 three times.
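
As an illustration of overriding the defaults, the sketch below changes size and prob at the same time. The weights 0.6, 0.3 and 0.1 and the name draws are arbitrary choices for the example, not anything the course prescribes.

```r
library(mosaic)

fruit <- c("apple", "orange", "mango")
set.seed(76)  # only so that repeated runs give the same draws

# 1000 draws with replacement, with unequal probabilities:
# apple 60%, orange 30%, mango 10% (matched to fruit by position)
draws <- resample(fruit, size = 1000, prob = c(0.6, 0.3, 0.1))

# Observed proportions; these should land near the weights, not exactly on them
table(draws) / length(draws)
```

The prob vector must have one entry per element of x, in the same order as x.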

Learning Checks

  1. Rewrite rflip(10) using the resample() command. Hint: coin <- c("H", "T")
  2. Rewrite the shuffle() command by changing the minimal number of default settings of resample(). Test this on fruit
  3. Write code that will allow you to generate a sample of 15 fruit without replacement.
  4. Write code that will allow you to generate a sample of 15 fruit with replacement.
  5. What's the fastest way to do the above 5 times? Write it out.

Learning Check

  • Say it's the early 1900s, you are a statistician, and you meet someone who claims to be able to tell by tasting whether the tea or the milk was added first to a cup.
  • You call BS and think they are just guessing.
  • Say you have 8 cups, tea, and milk handy. How would you design an experiment to test whether a) they can really tell which came first or b) they are just guessing?
  • Brainstorm all the components of this experiment with your seatmates.
  • Then think about how you can implement this with resample().