Bayesian Reasoning

May 10, 2012 4:10 pm

Bayesian reasoning (the application of Bayes’ theorem) is incredibly important, yet virtually unknown and poorly understood in the general population.  Politicians, advertisers, and salespeople take advantage of this gap to extract votes and money, while doctors, lawyers, and engineers can make life-threatening mistakes if they don’t apply the process correctly.  It’s a concept so vital to the proper interpretation of the world around us that I believe it should be a mandatory subject in high school.

Let’s start with an example to motivate the subject.  Let’s say you’re not feeling well and you go to your doctor.  Your doctor can’t determine what’s wrong and decides to run some blood tests.  The results come back and your doctor informs you that the test for a terminal disease came back positive.  Do you panic?  Probably.  Should you panic? Not necessarily.  There is vital information which we don’t know and need to know in order to understand the situation properly.

First, we need to know how accurate the test is.  We need to know the rates of type I and type II errors (“false positive” and “false negative”, respectively).  Let’s say you ask this question and your doctor tells you the test has a false positive rate of 1 in 100,000 and a false negative rate of 1 in 200,000.  One might believe that there is a 99.999% chance that you have this rare disease, but we need to interpret these error rates correctly.  This is where Bayes’ theorem comes in and to apply it we also need to know how rare the disease is.

Let’s say it’s an extremely rare disease, affecting 1 in 200 million people.  Knowing this you can update the probability that you have the disease to something more accurate.  Applying Bayes’ theorem, the probability that you actually have the disease given a positive test works out to only about 0.05%, roughly 1 in 2,000.  That’s a pretty big difference from 99.999%.

There is a caveat, though: we’re ignoring whatever led the doctor to run the test in the first place.  This calculation assumes that we had no particular reason to run the test, so we use the occurrence rate in the general population (1 in 200 million).  If you have specific risk factors that narrow the base group, then the probability of having the disease will increase.

For example, suppose the doctor chose to run this test specifically because you have high cholesterol and a family history of diabetes, and that the disease occurs in roughly 1 in 1 million people with those risk factors.  Now, instead of using 1 in 200 million as our occurrence rate, we’d use 1 in 1 million, in which case the probability of having the disease rises to 9.1%.

Suppose another risk factor increases the likelihood from 1 in 1 million to 1 in 100.  Now the probability of having the disease shoots up to 99.9%.  It’s really important to understand what a positive result from a medical test actually means.  If you don’t understand Bayes’ theorem then you can wildly misinterpret reality and make some pretty serious mistakes.
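The arithmetic above is easy to check yourself.  Here’s a minimal sketch of the Bayes’ theorem calculation using the error rates and the three prevalence figures from the example (the function and variable names are my own, not from any library):

```python
def posterior(prior, false_positive, false_negative):
    """P(disease | positive test) via Bayes' theorem."""
    sensitivity = 1 - false_negative  # P(positive | disease)
    # Total probability of a positive test, with or without the disease
    p_positive = sensitivity * prior + false_positive * (1 - prior)
    return sensitivity * prior / p_positive

fpr = 1 / 100_000   # false positive rate from the example
fnr = 1 / 200_000   # false negative rate from the example

# The three prevalence figures discussed above
for prior in (1 / 200_000_000, 1 / 1_000_000, 1 / 100):
    print(f"prevalence 1 in {round(1 / prior):,}: "
          f"P(disease | positive) = {posterior(prior, fpr, fnr):.3%}")
```

Running it prints roughly 0.050% for the 1-in-200-million prior, 9.091% for 1 in 1 million, and 99.901% for 1 in 100: the same test result, wildly different meanings depending on the base rate.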

Let’s do another example.  This one is a little easier to understand than the medical testing example.

When I bought my car in 2007, the dealership tried to tack on a charge for what amounted to a small insurance policy that would pay out if the car were stolen.  The argument used to push this charge was that Honda Civics are the number one stolen car.  Of course, our questions are:  Is that true?  And does it mean what we think it means?

According to the Insurance Bureau of Canada’s 2006 list of the top ten most-stolen cars, the 2000, 1999, 1996 and 1994 Honda Civics took 4 of the 10 places.  Hmmm…sounds bad, huh?  But I’m sure you can guess by now that there’s a catch.

It’s vitally important to our analysis to know how many 2000, 1999, 1996, and 1994 Honda Civics exist in the first place and how many of them actually got stolen.  But we don’t have those numbers.  All we have is someone taking a list of stolen cars, tallying up each model, and declaring the Honda Civic the most stolen car.  What matters is how many are stolen compared to how many exist.

Luckily for us, the Highway Loss Data Institute understands the difference and correctly reports the likelihood of a vehicle getting stolen by comparing the “‘theft claim frequency,’ which is the number of thefts reported for every 1,000 of each vehicle on the road.”  When we look at these numbers the picture changes entirely.  This top-ten list is filled with expensive cars like the Cadillac Escalade and several high-end pickup trucks.  The Honda Civic is nowhere to be found. 
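The difference between raw counts and rates is easy to see with a toy calculation.  These counts are made up purely for illustration (not real IBC or HLDI data), but they show how the most-stolen car by raw count can be among the least-stolen by rate:

```python
# Hypothetical fleet sizes and theft counts (invented numbers)
on_the_road = {"Honda Civic": 1_000_000, "Cadillac Escalade": 50_000}
thefts      = {"Honda Civic": 5_000,     "Cadillac Escalade": 500}

for model, fleet in on_the_road.items():
    # "Theft claim frequency": thefts per 1,000 vehicles on the road
    per_thousand = thefts[model] / fleet * 1000
    print(f"{model}: {thefts[model]:,} thefts, {per_thousand:.0f} per 1,000 vehicles")
```

With these numbers the Civic suffers ten times as many total thefts (5,000 vs. 500) yet has half the theft rate (5 vs. 10 per 1,000), simply because there are twenty times as many Civics on the road.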

If you don’t understand Bayes’ Theorem you can be manipulated into making bad decisions.  It applies all over the place in our lives: to our airport security procedures, our medical exams, our insurance decisions, our political decisions, and our general level of fear about life.

You can read more about Bayes’ Theorem in its Wikipedia article.  I’m not going to try to teach it here (unless I hear a demand for it in the comments) because it’s not an entirely intuitive concept and it’s a little tricky to wrap your head around (which is why we get it wrong so often).

When good security is a problem itself

April 11, 2012 2:10 pm

NPR’s article, “Spate Of Bomb Threats Annoys Pittsburgh Students”, got me thinking about the unintended consequences of implementing good security, even setting aside other issues like civil rights violations and the creation of easily attacked queues.

Reacting to every threat has at least two detrimental effects: denial of service and complacency.

The first, and most immediate, is the ability for an adversary to shut down a system without doing anything but writing a letter, making a phone call, or posting something on the Internet.

In computer security we call this type of attack a denial-of-service (DoS) attack.  Against a computer it is usually carried out by making legitimate requests so frequently that the machine bogs down and cannot respond to normal users.

In this case, however, it’s making threats and forcing law enforcement to respond.  This has two effects.  The first is that it pulls law enforcement away from legitimate calls (denying those people the service of law enforcement).  The second occurs when law enforcement responds by shutting down or drastically reducing the functionality of the threatened target (denying service to that target’s customers).

In the article the students are queued up waiting to go through a security checkpoint in order to get on campus.  In airports they might clear the gates and require everyone to go through security again.  In either case massive amounts of time and money are wasted.  The attacker has done nothing, but still managed to mess with their target.

In this manner terrorists could cause billions of dollars in losses to our economy simply by calling in threats to airports, shopping malls, schools, stadiums, etc.  And given our level of unwarranted fear, what law enforcement agency is going to do nothing when they receive a threat like that?  If they’re wrong no one will listen to arguments about likelihood, corroborating evidence, etc.

The second detrimental effect is complacency or “the boy who cried wolf” effect.  One technique used to bypass an alarm system is to repeatedly trip the alarm, but do nothing else.  Eventually the people responding to the alarm may begin to delay responding presuming it’s another false alarm.  Or in the best case (from the attacker’s view) they may turn off the alarm altogether.

If they do continue responding to the alarm then they’re faced with a dilemma: How many times can you respond to an alarm at a cost of $X per response before you can no longer afford to respond?  How many airports do you shut down and flights do you cancel before the airlines begin going bankrupt, or flying becomes so unreliable that people just stop trying?

In the case of the school in the article, the University of Pittsburgh, how many more of these threats will it evacuate buildings and run security checkpoints for before students start leaving in search of schools that actually have time for education?

These are two of the problems that exist from treating every threat seriously and not using risk management techniques to handle threat response.  But given that everyone involved would be fired, if not prosecuted, if they were wrong, what alternative do they have?

If we shut down our society because we’re afraid then haven’t the terrorists won without ever doing a thing?

You have _got_ to be kidding me

April 2, 2012 10:00 am

I am enraged right now.

I just got our AT&T phone bill for this month.  Once again showing their complete disregard for their customers, they’ve taken it upon themselves to increase my bill by 32% with no explanation or prior warning.

I hate AT&T with the fiery passion of one thousand suns.

And still my only recourse would be to switch to Comcast which is easily as abusive as AT&T, but charges more for the same privilege.

I’ve sent an email begging Sonic.net to bring their Fusion DSL service, which offers 20 Mbps service plus phone for $39.95 a month, to Livermore.  After our introductory offer expires we’ll be paying $69 a month for 6 Mbps service plus phone.

It is insane.

Post hoc ergo propter hoc

March 31, 2012 3:24 pm

The Economist hosted a debate between security expert Bruce Schneier and former TSA administrator Kip Hawley on the topic of whether the changes to airport security since 9/11 have done more harm than good.

It was well done and consisted of opening statements, rebuttals, and closing statements from each participant.

Hawley’s opening statement begins with:

More than 6 billion consecutive safe arrivals of airline passengers since the attacks on America on September 11th 2001 mean that whatever the annoying and seemingly obtuse airport-security measures may have been, they have been ultimately successful.

He continues on and on using the reasoning that because no airplanes have been successfully attacked it means the TSA has been effective and therefore worth its inconvenience, cost, and violation of civil rights.

This is a clear-cut case of post hoc ergo propter hoc reasoning.  His only evidence is that first the TSA was created and then no successful attacks occurred; from that sequence alone he concludes the TSA is the cause.

Post hoc ergo propter hoc can be phrased as follows: First A occurred, then B occurred, therefore A caused B.  This, however, is frequently not true and is not valid reasoning without further evidence better tying together the events of A and B.

Using post hoc ergo propter hoc reasoning as the sole basis of maintaining the current absurdity of the TSA is unacceptable.

But let’s rephrase the relationship and re-examine the reasoning.  Let’s phrase the relationship like this:

If the TSA is effective then there will be no successful attacks on U.S. airplanes.

Now let’s include the knowledge that no successful attacks have occurred.  What can we say about the TSA’s efficacy?

Interestingly enough, nothing.  If the if-then relationship is true, knowing the “then” clause is true tells us nothing about the “if” clause.

Logically, if-then statements can be rewritten.  “If A then B” is equivalent to “B or not A.”  Using an example:  “If it is raining then the ground is wet” is equivalent to “The ground is wet or it is not raining.”

The rewritten form makes it easy to see that knowing the ground is wet doesn’t actually tell us whether it is raining.  It might be raining, and the wet ground provides evidence for that hypothesis, but the ground may be wet because a lawn sprinkler is running, or someone spilled a cup of water.  We don’t know why the ground is wet, only that it is.
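For the skeptical, this equivalence (and the fact that knowing the consequent alone pins down nothing about the antecedent) can be checked mechanically with a truth table.  A quick sketch in Python, my own illustration rather than anything from the formal-logic literature:

```python
from itertools import product

# Check that "if A then B" (material implication) is equivalent to "B or not A"
for a, b in product([False, True], repeat=2):
    if_then = (not a) or b     # truth-functional reading of "if A then B"
    rewritten = b or (not a)   # the rewritten form
    assert if_then == rewritten

# Given that B is true, which values of A keep "if A then B" true?
consistent_a = [a for a in (False, True) if ((not a) or True)]
print(consistent_a)  # [False, True] -- both! B alone tells us nothing about A
```

Both rows of the truth table with B true satisfy the implication, whether A is true or false, which is exactly why "the ground is wet" cannot, by itself, establish that it is raining.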

So let’s rewrite our proposed relationship between the TSA and airplane safety:

There will be no successful attacks on U.S. airplanes or the TSA is not effective.

Knowing that there have been no successful attacks tells us nothing about whether the TSA is effective or not.

This is an incredibly important piece of formal logic to understand because it is almost always misused in common practice.

So the relationship Hawley is actually providing evidence for is this:

If there are no successful attacks on U.S. airplanes then the TSA is effective.

But this statement doesn’t mean what Hawley wants it to mean.  That there have been no successful attacks on U.S. airplanes does mean the TSA achieved its operational goal, but it tells us nothing about whether the TSA’s actions contributed to that result, because the causation is backwards.

It is equivalent to saying: “If I got an A on the test then I learned the material.”  Which is not necessarily true (you may have cheated or made lucky guesses).  The correct causation should be, “If I learned the material then I will have gotten an A on the test.”

In order for the test->learned form to tell us something meaningful about the consequent (the “then” part) we need additional criteria: “If I got an A on the test, and I did not cheat, and I did not make lucky guesses then I learned the material.”

[Updated 4/14 with more obvious example]
Another example would be to say I have a magic wand that causes things to fall to the ground when you let go of them.  I’m holding the wand, you let go of something, it falls to the ground.  So I posit, “If the object falls to the ground, then my wand works.”  Knowing that the object falls to the ground doesn’t really tell you anything about whether my magic wand had anything to do with it.  The stated purpose of the wand was achieved, but it had nothing to do with the wand.

The point is that in this reverse-causation form we have to account for every possible cause in the antecedent (the “if” part) in order to arrive at the consequent, an impossible task given the number of things that are unknowable regarding airplane security.

There may be no successful attacks on U.S. airplanes for many reasons and we would need to account for all of them before declaring the TSA effective.  A silly one is simply that there might be no U.S. airplanes (in which case there could be no attacks against them, successful or otherwise, regardless of the TSA’s efficacy).

A more serious reason could be that no one is trying to attack.  The number of terrorists the TSA has actually caught (zero) is consistent with that.  If your argument is that would-be attackers were deterred from even trying, then the burden is on you to provide evidence that this deterrence has occurred.

My position remains that if terrorists were intent on blowing something up and decided that an airplane was too difficult they would not just give up and go home.  If the TSA is deterring terrorists from attacking airplanes then those same terrorists would be blowing up grocery stores, malls, schools, dams, airport security lines (like the terrorist attack in Russia in 2011), or any of thousands of other completely unprotected targets.

Bruce Schneier argued for the responsible action: Disassemble the TSA, return airport security to pre-9/11 levels, and divert the TSA’s budget to intelligence gathering, law enforcement, and emergency response.

Following this plan would provide greater protection for all targets with no meaningful reduction to the security of airplanes.  And, as a bonus, you would waste less time in airport security lines, have fewer Constitutional rights violated when traveling, and your ticket would cost less.