Bayesian Reasoning

Bayesian reasoning (the application of Bayes' theorem) is incredibly important, yet it is virtually unknown, let alone understood, in the general population. Politicians, advertisers, and salespeople take advantage of this lack of understanding to extract votes and money, while doctors, lawyers, and engineers can make life-threatening mistakes if they don't apply the process correctly. It's a concept so vital to the proper interpretation of the world around us that I believe it should be a mandatory subject in high school.

Let's start with an example to motivate the subject.  Let's say you're not feeling well and you go to your doctor.  Your doctor can't determine what's wrong and decides to run some blood tests.  The results come back and your doctor informs you that the test for a terminal disease came back positive.  Do you panic?  Probably.  Should you panic? Not necessarily.  There is vital information which we don't know and need to know in order to understand the situation properly.

First, we need to know how accurate the test is. We need to know the rates of type I and type II errors ("false positive" and "false negative", respectively). Let's say you ask this question and your doctor tells you the test has a false positive rate of 1 in 100,000 and a false negative rate of 1 in 200,000. One might believe that there is a 99.999% chance that you have the disease, but we need to interpret these error rates correctly. This is where Bayes' theorem comes in, and to apply it we also need to know how rare the disease is.

Let's say it's an extremely rare disease, affecting 1 in 200 million people. Knowing this, you can update the probability that you have the disease to something more accurate. Applying Bayes' theorem to these numbers, the probability of having the disease works out to roughly 0.05%, or about 1 in 2,000. That's a pretty big difference.
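For anyone who wants to check the arithmetic, here is a sketch of the calculation using only the figures quoted above. The symbols are shorthand introduced here for illustration: D means "has the disease" and + means "tested positive". The false negative rate of 1 in 200,000 gives P(+ | D) = 0.999995, the false positive rate of 1 in 100,000 gives P(+ | not D) = 0.00001, and the 1 in 200 million occurrence rate gives P(D) = 0.000000005.

```latex
\begin{aligned}
P(D \mid +)
  &= \frac{P(+ \mid D)\,P(D)}
          {P(+ \mid D)\,P(D) + P(+ \mid \neg D)\,P(\neg D)} \\[4pt]
  &= \frac{0.999995 \times 5 \times 10^{-9}}
          {0.999995 \times 5 \times 10^{-9} + 10^{-5} \times \left(1 - 5 \times 10^{-9}\right)} \\[4pt]
  &\approx \frac{5.0 \times 10^{-9}}{1.0005 \times 10^{-5}}
   \approx 5.0 \times 10^{-4}
\end{aligned}
```

In words: out of everyone who tests positive, the few true cases are swamped by the far more numerous false alarms from healthy people, so a positive result here means roughly a 0.05% chance of having the disease.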

There is a caveat, though: we're ignoring any other factors that led the doctor to run the test in the first place. This calculation assumes that we had no particular reason to run the test, so we use the occurrence rate in the general population (1 in 200 million). If you have specific risk factors that narrow the base group, then the probability of having the disease will increase.

For example, let's suppose that the doctor chose to run this test specifically because you have high cholesterol levels and a family history of diabetes. Suppose that the disease occurs in roughly 1 in 1 million people with those risk factors. Now, instead of using 1 in 200 million as our occurrence rate, we'd use 1 in 1 million. In that case, the probability of having the disease rises to 9.1%.

Suppose another risk factor increases the likelihood from 1 in 1 million to 1 in 100.  Now the probability of having the disease shoots up to 99.9%.  It's really important to understand what a positive result from a medical test actually means.  If you don't understand Bayes' theorem then you can wildly misinterpret reality and make some pretty serious mistakes.
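To see how much the starting occurrence rate matters, here is a minimal Python sketch that runs the same positive test result against each of the three base rates mentioned above. The function name `posterior`, the variable names, and the dictionary labels are made up for this illustration. It assumes the test's error rates are the same for every group, which the example implies but never states.

```python
def posterior(prior, false_positive_rate, false_negative_rate):
    """Probability of having the disease given a positive test (Bayes' theorem)."""
    sensitivity = 1.0 - false_negative_rate           # P(positive | disease)
    true_positives = sensitivity * prior               # sick and flagged
    false_positives = false_positive_rate * (1.0 - prior)  # healthy but flagged
    return true_positives / (true_positives + false_positives)


# Error rates quoted by the doctor in the example.
FALSE_POSITIVE_RATE = 1 / 100_000
FALSE_NEGATIVE_RATE = 1 / 200_000

# Occurrence rates for the three groups discussed in the text.
priors = {
    "general population (1 in 200 million)": 1 / 200_000_000,
    "with risk factors (1 in 1 million)": 1 / 1_000_000,
    "with a further risk factor (1 in 100)": 1 / 100,
}

results = {
    label: posterior(prior, FALSE_POSITIVE_RATE, FALSE_NEGATIVE_RATE)
    for label, prior in priors.items()
}

for label, probability in results.items():
    print(f"{label}: {probability:.2%}")
```

The test itself never changes between the three runs. Only the base rate does, and that alone moves the answer from a fraction of a percent to about 9.1% and then to about 99.9%.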

Let's do another example.  This one is a little easier to understand than the medical testing example.

When I bought my car in 2007, the dealership tried to tack on a charge for something which amounted to a small insurance policy which would pay out if the car were stolen.  The argument used to push this charge was that Honda Civics are the number one stolen car.  Of course, our questions are:  Is that true?  And does it mean what we think it means?

According to the Insurance Bureau of Canada's 2006 list of the top ten most-stolen cars, the 2000, 1999, 1996 and 1994 Honda Civics took 4 of the 10 places.  Hmmm...sounds bad, huh?  But I'm sure you can guess by now that there's a catch.

It's vitally important to our analysis to know how many 2000, 1999, 1996, and 1994 Honda Civics exist in the first place and how many actually got stolen.  But we don't have any of these numbers.  What we have is someone taking a list of stolen cars and adding up each type and declaring the Honda Civic as the most stolen car.  We need to know how many are stolen compared to how many of them exist.

Luckily for us, the Highway Loss Data Institute understands the difference and correctly reports the likelihood of a vehicle getting stolen using the "'theft claim frequency,' which is the number of thefts reported for every 1,000 of each vehicle on the road." When we look at these numbers, the picture changes entirely. Its top-ten list is filled with expensive cars like the Cadillac Escalade and several high-end pickup trucks. The Honda Civic is nowhere to be found.
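The difference between "most thefts" and "most likely to be stolen" is easy to see with a toy calculation. The sketch below uses two imaginary models with numbers invented purely for illustration. They are not real theft statistics for any vehicle, and the names `fleet` and `thefts_per_thousand` are hypothetical.

```python
# HYPOTHETICAL numbers, invented only to illustrate the idea.
fleet = {
    "common_model": {"on_road": 500_000, "stolen": 1_000},
    "rare_model": {"on_road": 20_000, "stolen": 200},
}


def thefts_per_thousand(stolen, on_road):
    """Thefts for every 1,000 vehicles of that model on the road."""
    return 1000 * stolen / on_road


# Ranking by raw count: which model shows up most often in a list of stolen cars?
most_stolen_by_count = max(fleet, key=lambda model: fleet[model]["stolen"])

# Ranking by rate: which model is an individual owner most likely to lose?
rates = {
    model: thefts_per_thousand(data["stolen"], data["on_road"])
    for model, data in fleet.items()
}
most_stolen_by_rate = max(rates, key=rates.get)

print("Most thefts overall:", most_stolen_by_count)
print("Highest theft rate: ", most_stolen_by_rate)
for model, rate in rates.items():
    print(f"  {model}: {rate:.1f} thefts per 1,000 on the road")
```

With these made-up figures the common model tops the raw count simply because there are so many of them, while the rare model is five times more likely to be stolen per vehicle. That per-vehicle rate is the number that matters when deciding whether a theft policy is worth the money.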

If you don't understand Bayes' theorem, you can be manipulated into making bad decisions. This applies all over the place in our lives: in our airport security procedures, our medical exams, our insurance decisions, our political decisions, and our general level of fear about life.

You can read more about Bayes' theorem in its Wikipedia article. I'm not going to try to teach it here (unless I hear a demand for it in the comments) because it's not an entirely intuitive concept and it's a little tricky to wrap your head around (which is why we get it wrong so often).
