
Quoting Rodney Brown (rdbrown@pacific.net.au):
*Bayesian Decision Making in the Real World*
[...]
*/Abstract:/* Bayesian networks, extended to Bayesian decision networks (BDNs, aka influence diagrams), support decision making under uncertainty in an ideal (normative) way. That means they can provide guidance to people prepared to learn how to build and test them and then meticulously analyse a decision problem and tailor a BDN to it. In other words, BDNs are used by almost no one for real decision making. In this talk I'll outline what's involved in all of that in case you really want to do it (or just understand what's involved).
But for the vast majority of people there's a better way: pay (directly or indirectly) someone to build a GUI front-end that will hide the details from the user.
Interesting idea (and apparently the basis for the speaker's shiny new business that he'd like to promote). But Bayes's Theorem isn't really all _that_ difficult for laymen to apply in the real world without software front-ends. Let me quote an example I posted to a different mailing list this past February 17th (just as I was arriving in Sydney on cruise ship Crystal Serenity):

I'm just starting to catch up on old threads, having needed to keep Internet usage sparse during our ocean crossing. A week ago, seeing just the Subject header, I'd wondered if this thread were about the accuracy of surveillance-type facial recognition by machines. I see it wasn't, but expect it's OK if I digress onto that.

So: Facebook, Google, Twitter, and such companies with huge collections of other people's tagged digital photos are monetising them. (Facebook's collection comprises something like 13 _trillion_ photos.) FBI has a database of 52 million faces, and describes its integration of facial recognition software with that database as 'fully operational'. The agency's director claims its database wouldn't include photos of ordinary citizens, though this is demonstrably contradicted by its own documents (https://www.eff.org/deeplinks/2014/04/fbi-plans-have-52-million-photos-its-n... -database-next-year).

Everyone appears to be rah-rah about how successful this is going to be in every possible application, if not today in year n, then surely in year n+1 -- and indeed in some applications it works well enough. However, when I heard that DHS [USA Department of Homeland Security] seriously expected to use automated facial recognition as the reason to detain Bad People in airports and elsewhere (the 'FAST program' - Future Attribute Screening Technology, started in 2012), I thought 'Guys, you've never heard of the base rate fallacy, have you?' Or, to put it another way, DHS is yet another institution needing to learn Bayes's Theorem.

Base rate fallacy is the fallacy of ignoring the probability-skewing effect of a low base rate. I will explain: For the terrorists-in-airports example, the base rate would be the probability that any random person walking through an airport is actually a terrorist. Let's say an example airport has 1 million persons walking through it in a year (it's a small regional), and it's very popular with terrorists, such that we expect 100 terrorists to walk its halls in that year. So, the base rate of being a terrorist in the scenario is 0.0001. The base rate of being a non-terrorist in the scenario is 0.9999.

DHS gets the 'FAST program' going at the airport, and stocks its database with super-studly spook-approved photos. And DHS claims the software is really, really good! 1% error rate! Specifically, it says:

o Actual terrorists fail to trigger the klaxon 1% of the time (false negative).

And...

o Non-terrorists trigger the klaxon 1% of the time (false positive).

(These are invented example numbers of mine, but I think within a realistic ballpark.)

DHS sends out a press release reporting glowingly positive results, because the system is '99% accurate'. But what does '99% accurate' really mean in this context? It merely means a low error rate, not high accuracy. The accuracy is actually piss-poor, because, observe: 9,999 non-terrorist travelers during the studied year got slammed up against the wall by the brute squad -- along with 99 terrorists, for a total of 10,098 klaxon soundings.
So, the probability that a person triggering the alarm actually is a terrorist is only about 99 in 10,098, which is 0.98% accuracy. I call _accuracy_, here, the probability of terrorist given klaxon, which we'll call 'p(terrorist|K)', where p() means 'probability of', and the | character means 'given'. Bayes's Theorem says:

  p(terrorist|K) = p(K|terrorist) times p(terrorist) divided by p(K)

  p(K|terrorist) = 99 / 100        = .99000000   (1% false negative)
  p(terrorist)   = 100 / 1000000   = .00010000
  p(K)           = 10098 / 1000000 = .01009800

Probability of terrorist given klaxon is thus .00980392, or only 0.98% accuracy -- less than 1% accurate, though I have little doubt DHS would call it '99% accurate' (ignoring the low base rate).

And the point is, this sort of fallacy occurs _all the time_ when people talk about probabilities and rates of success for infrequent events and large amounts of data.
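For anyone who wants to fiddle with the arithmetic, here's a quick Python sketch of that same calculation. The figures are my invented example numbers from above, not real DHS data:

  # Invented example figures from the airport scenario above -- not real DHS data.
  travellers     = 1_000_000   # people through the airport per year
  terrorists     = 100         # actual terrorists among them
  false_neg_rate = 0.01        # fraction of terrorists the klaxon misses
  false_pos_rate = 0.01        # fraction of non-terrorists the klaxon flags anyway

  true_alarms  = terrorists * (1 - false_neg_rate)            # 99
  false_alarms = (travellers - terrorists) * false_pos_rate   # 9,999

  # Equivalent to Bayes's Theorem: p(terrorist|K) = p(K|terrorist) * p(terrorist) / p(K)
  p_terrorist_given_klaxon = true_alarms / (true_alarms + false_alarms)
  print(f"p(terrorist|K) = {p_terrorist_given_klaxon:.4%}")   # about 0.98%

Drop false_pos_rate to a heroic 0.1% and the posterior still only reaches about 9%; the low base rate dominates no matter how good the detector is.

Quoting Dave Palmer: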
http://arstechnica.co.uk/security/2016/02/the-nsas-skynet-program-may-be-kil...
Quoting from the article:

The 0.008 percent false positive rate would be remarkably low for traditional business applications. This kind of rate is acceptable where the consequences are displaying an ad to the wrong person, or charging someone a premium price by accident. However, even 0.008 percent of the Pakistani population still corresponds to 15,000 people potentially being misclassified as "terrorists" and targeted by the military -- not to mention innocent bystanders or first responders who happen to get in the way.

Once again, classic base rate fallacy. Citing that 'failure rate of 0.008%' as if it measured real-world accuracy is totally wrong: genuine couriers are so rare in the data that even a tiny false positive rate means the people flagged are overwhelmingly innocent.

"On whether the use of SKYNET is a war crime, I defer to lawyers," Ball said. "It's bad science, that's for damn sure, because classification is inherently probabilistic...."

Worse than that, it's classification on the grounds of a mathematically incompetent calculation of that probability.

A jury of random citizens in 1996 UK rape trial Regina v. Denis Adams (https://en.wikipedia.org/wiki/R_v_Adams) learned to correctly apply Bayes's Theorem -- but their use of Bayesian inference was ultimately overturned by an appellate judge (http://www.bailii.org/cgi-bin/markup.cgi?doc=/ew/cases/EWCA/Crim/2006/222.ht... - a horrific scientific blunder on the judge's part, IMO):

Quoting Greg B (cyclopasaurus@gmail.com):

[I only barely remember this case, but:]
A proper Bayesian approach to the evidence should have led the jury to at least a reasonable-doubt conclusion, but the calculations are both involved and require counter-intuitive thinking, and further the last thing the prosecution wanted was a bunch of thinking jurors.
The Appeal Court ruling calling Bayes's Theorem inappropriate in the courtroom was appalling, and is the sort of thing jurists are going to be embarrassed about in the future. The jurors' scenario was a classic situation where frequentist notions of probability were IMO even more idiotic than usual. (Fortunately, my understanding is that the Appeal Court judge's comments on that matter have no binding power for the future.)

Here's the nub of the problem, as quoted from the magazine article cited in the Wikipedia article's Notes section:

However, there was very strong DNA evidence linking him with the crime and when the case came to trial in 1995, effectively the only incriminating evidence was that his DNA profile matched the DNA evidence found at the scene of the crime. The prosecution forensic scientist had calculated what is called a match probability, that is, the probability that if you pick someone at random, their DNA would match the DNA sample of the assailant. That, according to him, was 1 in 200 million.

It's tempting for people not trained in statistics to get muddled, and to confuse two different probabilities. The first is the probability that a person would match the criminal's DNA profile given that they are innocent. The second is the probability that they are innocent given that they match the DNA profile. The forensic scientist's 1 in 200 million refers to the first probability. But jurors may wrongly think that this is the probability that the defendant is innocent. This misunderstanding is called the prosecutor's fallacy, and can be extremely prejudicial to the defendant.

http://onlinelibrary.wiley.com/doi/10.1111/j.1740-9713.2005.00089.x/epdf

The forensic scientist (testifying for the prosecution) said that, if Denis Adams is innocent, there's only a 1 in 200 million chance of his DNA matching that of the real assailant. In other words, there were probably only a few other men in all of the UK who might have matched. Which is actually pretty suggestive of guilt, but would have to be weighed against all the other evidence (all of which pointed the other way). However, what the jury tends to hear the forensic scientist say is that, given that Denis Adams matched the DNA sample, there's only a 1 in 200 million chance he's innocent -- a completely different statement, _not_ supported by the testimony, and very prejudicial against the defence.
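To make the difference concrete, here's a toy Python calculation. The candidate-pool sizes are invented purely for illustration (they are not figures argued at the trial); the point is that the probability of innocence given a match depends entirely on the prior, and is nowhere near 1 in 200 million under any of them:

  # Toy illustration of the prosecutor's fallacy.  The match probability is
  # the one the prosecution cited; the candidate-pool sizes are invented
  # assumptions, not figures from R v Adams.
  p_match_given_innocent = 1 / 200_000_000
  p_match_given_guilty   = 1.0   # assume the true assailant always matches

  for pool in (10_000, 1_000_000, 30_000_000):   # plausible-culprit pool sizes
      p_guilty   = 1 / pool      # prior probability, before the DNA evidence
      p_innocent = 1 - p_guilty
      # Bayes's Theorem: p(innocent | match)
      p_innocent_given_match = (p_match_given_innocent * p_innocent) / (
          p_match_given_innocent * p_innocent + p_match_given_guilty * p_guilty)
      print(f"pool {pool:>10,}: p(innocent | match) = {p_innocent_given_match:.4%}")

  # Prints roughly 0.005%, 0.5%, and 13% respectively -- none of them anywhere
  # near 1 in 200 million, and all before the exculpatory evidence (the alibi
  # and the failed identification) is even factored in.

Quoting Laurie Forbes: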
I don't understand why "friend of the court" expert witnesses are not used more frequently in these types of cases.
The article I cited was in fact written by an Oxford statistician called in to advise the court (and jury) at the original trial. He taught the jury how to do Bayesian inference -- which the Appeal Court judge later ignorantly deprecated.