1 Flipping a Coin (and Basic Probabilities)

U.S. Quarter

1.1 Overview

The simplest game of chance I can imagine is flipping a coin. A fair coin will have a 50% chance of coming up heads and 50% chance of coming up tails. A fair bet is one that pays even money. That means that if you bet $1 on a particular outcome (heads or tails), you win $1 if you’re right and lose $1 if you’re wrong. In gambling terms the “odds” are 1-to-1, meaning there is an equal probability of each outcome, and a fair bet will pay 1-to-1, meaning that the dollar you bet is matched with 1 dollar you win.

The expected value of such a wager is calculated as:

\[ Expected Value = Pr(win) * (\$1) + Pr(lose) * (-\$1) = 0.5 * (\$1) + 0.5 * (-\$1) = \$0 \]

This is the basic means of calculating any expectation in statistics. First, you find the probabilities or each event. Next, you multiply each probability by the expected return in each scenario. And then you add those all together. The difficult part is usually finding the probabilities or each outcome, especially in games that involve player’s making choices where you also must work to find the optimal choice in any scenario.

1.2 Payouts for Unfair Coins

So what if the coin is not a fair coin and is instead biased towards landing on one side more than the other? If we know the probability of it coming up heads, we can adjust our expectation of a bet on heads. Let’s say we have a 60% chance of getting “heads”. We still bet $1 and win $1 if we’re right. The expected value of our bet is now:

\[ Expected Value = Pr(win) * (\$1) + Pr(lose) * (-\$1) = 0.6 * (\$1) + 0.4 * (-\$1) = \$0.20 \]

We can now expect to win $0.20, on average, every time we bet $1 on a flip of the coin.

If both players know this, you won’t find anyone who is willing to take the bet. Instead, they will want to modify the payout to make it a fair game. The payout that does this will be a 3-to-2 bet. This can easily be inferred from knowing that a game with a 60% probability of winning is providing 3-to-2 odds of winning. That is we expect to win 3 times for every 2 times we lose (which is the same as saying we expect to win 60 times for every 40 we lose, but reduced as a fraction to simplest terms). If an event occurs with odds 3-to-2, a fair payout is the inverse: 2-to-3. This means that we have to wager $3 in order to win $2:

\[ Expected Value = 0.6 * (\$2) + 0.4 * (-\$3) = \$0 \]

Or if we want to stick with our original bet, we can still wager $1, but the payout if we win will only be 2/3 of that:

\[ Expected Value = 0.6 * (\$1 * 2/3) + 0.4 * (-\$1) = \$0 \]

NOTE: Understanding the differences between probabilities of events occurring, the odds of events occurring, and the payouts for different bets are going to be key in our analysis of pretty much any game. Make sure you understand why an event that has a 60% chance of occurring also has 3-to-2 “odds” of occurring. This can be somewhat confusing at first.

1.3 Inferring the Bias of a Coin

1.3.1 Maximum Likelihood Estimates

Inferring the bias of a coin can be as easy or as complex as you would like to make it. The easiest way to estimate the bias of a coin is to flip it a large number of time (maybe a thousand or a million) and record the percentage of times it comes up heads and the percentage of times it comes up tails. The law of large numbers says that in the long run over a large number of trials the overall proportion of events should come increasingly close to their underlying probability of occurring (assuming those probabilities are constant).

But what if you only flip a coin 10 times and observe it come up heads 7 out of those 10 times? Now you are in a tricky spot where you must determine if you truly have enough information from those 10 trials to reliably infer the probability of heads or tails. If we want to estimate the probability of heads in this case, the “maximum likelihood estimate” is 70%. This means that if we assume the probability of heads is 70% and then calculate the likelihood of getting 7 heads and 3 tails in 10 trials, the likelihood will be higher with this estimate of the coin’s bias than with any other estimate. The maximum likelihood estimate for the probability will always be:

\[ Pr(X=x) = \frac{\sum\limits_{n=1}^{N}{(X_n=x)}}{N} \]

That is, we count the number of times the event occurs in N trials and express that as a percentage of the total number of trials.

But is this past evidence really enough to say this is the best answer? What if we only have 1 coin flip and it came up heads? The maximum likelihood estimate now is 100% that heads will come up every time. This yields the highest likelihood for our observed data (our 1 trial), but it intuitively is a very bad guess.

1.3.2 Getting Bayesian With It

The best way to deal with this type of uncertainty is to use Bayesian statistics. These allow us to combine what we have observed in the data with whatever pre-conceived beliefs we might have about the situation. If we believe that a 100% probability of heads is unlikely, we can include that bias in our calculations. We might be wrong, but it will take a large amount of evidence to convince us. The amount of evidence it takes will correlate with how strong our bias is that this should not happen. But even if we start with the belief that the coin will not always flip to heads, if we perform 1,000 trials and see heads every time, we will increasingly become convinced that this is true.

Although it is a powerful way to analyze probabilities, Bayesian methods can be tricky, however, The hardest part is finding a way to quantify your prior expectations mathematically. How sure are you that the coin is likely to be fair? Maybe you distrust the person you are playing against and actually suspect them to have an unfair coin. In that case, how unfair do you think the coin is likely to be? There is no right answer to these questions. They are all subjective. Therefore there is no “correct” way to solve this problem. Every solution depends on our pre-conceived bias or beliefs and the manner in which we choose to quantify those, and it is possible for us to influence the outcome as little or as much as we’d like with those prior beliefs.

There is a good example of what this looks like here, including R code to reproduce the results. On this page they use a discrete set of possibilities that model the probability of flipping heads as either 0%, 10%, 20%, … up to 100%. They start with a triangular probability distribution that favors a 50% probability and rules out the 0% and 100% altogether, setting their probabilities to zero. They then update this prior periodically with new experimental data and show you how the prior probabilities start to evolve. In general, the probability distribution will begin to shift in a way that it begins to favor the maximium likelihood estimator, but in a way that still incorporates our prior beliefs and allows for other possibilities.

In a similar manner, let’s see what happens if we start with a uniform prior, meaning that we have no idea whether the coin is fair or completely biased.

x_values <- seq(0, 1, 0.1)  # discrete values we will model
pdf_prior <- rep(1/length(x_values), length(x_values))

avg_value <- weighted.mean(x_values, pdf_prior)
print(paste0("Expecte Value = ", avg_value))
## [1] "Expecte Value = 0.5"

As we can see, this simple prior also essentially assumes the coin is fair.

Now, let’s build a function to update our prior based on new data. First, we will need to calculate the likelihood of our data under each of our models (each of our possible values of x):

get_likelihoods <- function(x_values, num_heads, num_tails) {
  dbinom(num_heads, num_heads+num_tails, x_values)
}

Next, we will need to build an update function that applies Bayes’ Theorem:

\[ Pr(\Theta | D) = \frac{Pr(D | \Theta) Pr(\Theta)}{Pr(D)} \]

This tells us that the probability of our model (\(\Theta\)) given the data (D), can be inferred from the probability (or likelihood) or our data given our model, our prior beliefs about the probability of of the model being right, and the probability of the data itself. In our case, this function will be:

update_priors <- function(x_values, priors, num_heads, num_tails) {
  likelihoods <- get_likelihoods(x_values, num_heads, num_tails)
  priors <- (priors * likelihoods) / sum(priors * likelihoods)
  priors
}
trials <- tibble::tribble(
  ~heads, ~tails,
  1, 0,
  0, 1,
  2, 3,
  1, 4,
  5, 5,
  6, 4,
  4, 6,
  9, 1,
  10, 10,
  15, 5,
  19, 1,
  20, 0,
  50, 50,
  60, 40,
  90, 10,
  100, 0
)

avg_posterior <- rep(NA_real_, nrow(trials))
for (i_row in 1:nrow(trials)) {
  pdf_posterior <- update_priors(x_values, pdf_prior, trials$heads[i_row], trials$tails[i_row])
  avg_posterior[i_row] <- weighted.mean(x_values, pdf_posterior)
}

knitr::kable(data.frame(
  n = trials$heads + trials$tails,
  heads = trials$heads,
  tails = trials$tails,
  mle = trials$heads / (trials$heads + trials$tails),
  avg_posterior = avg_posterior
))
n heads tails mle avg_posterior
1 1 0 1.00 0.7000000
1 0 1 0.00 0.3000000
5 2 3 0.40 0.4287129
5 1 4 0.20 0.2928934
10 5 5 0.50 0.5000000
10 6 4 0.60 0.5833635
10 4 6 0.40 0.4166365
10 9 1 0.90 0.8180324
20 10 10 0.50 0.5000000
20 15 5 0.75 0.7273362
20 19 1 0.95 0.8782598
20 20 0 1.00 0.9870205
100 50 50 0.50 0.5000000
100 60 40 0.60 0.5976159
100 90 10 0.90 0.8975115
100 100 0 1.00 0.9999973

We can see that Bayesian updates swing pretty quickly with only 1 observation, moving toward a 70% probability for whichever side of the coin it saw come up first. With more data it starts to go closer to the maximum likelihood estimate. In this case even after 5 flips of the coin we are pretty close to the MLE, 42.8% versus 40% when we see 2 out of 5 come up heads and 29.3% versus 20% when we see 1 out of 4 come up heads. In this case we move quickly towards the MLE because we did not specify a strong preference for any of the models in our prior. We do see that the model is rather unwilling to go too far to the extremes though. When 20 out of 20 trials all turn up heads, it is only willing to go as far as a 98.7% probability of heads, leaving room for tails to still appear on the next toss. Even after 100 out of 100 trials all turn up heads, it still leaves the tiniest probability that tails still might occur.

My brother told me once that he was reading the book “Probability Theory: The Logic of Science” by E.T. Jaynes and that it started with an opening chapter on probabilities for flipping coins. One interesting question Jaynes asked was: what is the probability that the coin will land on its edge and be neither heads nor tails? Now, we might toss a coin a million times and never see these happen, but that doesn’t mean that it is impossible. Jaynes proposed a Bayesian approach that could begin with some assumption about the probability of this event. Perhaps we give it a 1% probability. Then we can run our million tests, update our probabilities, and even if we never see it occur, there will still be a non-zero probability that it could occur on the next roll. This is an interesting way to include potential outcomes in your model even when they have never been witnessed before.

Ultimately, we can be sure of one thing: flipping a coin is not as trivial mathematically as it might first appear. There are all kinds of good statistical lessons that can be learned from this simple experiment.