
Quoting Rodney Brown (rdbrown@pacific.net.au):
*Bayesian Decision Making in the Real World*
[...]
*/Abstract:/* Bayesian networks, extended to Bayesian decision networks (BDNs, aka influence diagrams), support decision making under uncertainty in an ideal (normative) way. That means they can provide guidance to people prepared to learn how to build and test them and then meticulously analyse a decision problem and tailor a BDN to it. In other words, BDNs are used by almost no one for real decision making. In this talk I'll outline what's involved in all of that in case you really want to do it (or just understand what's involved).
But for the vast majority of people there's a better way: pay (directly or indirectly) someone to build a GUI front-end that will hide the details from the user.
Interesting idea (and apparently the basis for the speaker's shiny new business that he'd like to promote). But Bayes's Theorem isn't really all _that_ difficult for laymen to apply in the real world without software front-ends. Let me quote an example I posted to a different mailing list this past February 17th (just as I was arriving in Sydney on cruise ship Crystal Serenity):

I'm just starting to catch up on old threads, having needed to keep Internet usage sparse during our ocean crossing. A week ago, seeing just the Subject header, I'd wondered if this thread were about the accuracy of surveillance-type facial recognition by machines. I see it wasn't, but expect it's OK if I digress onto that.

So: Facebook, Google, Twitter, and such companies with huge collections of other people's tagged digital photos are monetising them. (Facebook's collection comprises something like 13 _trillion_ photos.) FBI has a database of 52 million faces, and describes its integration of facial recognition software with that database as 'fully operational'. The agency's director claims its database wouldn't include photos of ordinary citizens, though this is demonstrably contradicted by its own documents (https://www.eff.org/deeplinks/2014/04/fbi-plans-have-52-million-photos-its-n... -database-next-year).

Everyone appears to be rah-rah about how successful this is going to be in every possible application, if not today in year n, then surely in year n+1 -- and indeed in some applications it works well enough. However, when I heard that DHS [USA Department of Homeland Security] seriously expected to use automated facial recognition as the reason to detain Bad People in airports and elsewhere (the 'FAST program' - Future Attribute Screening Technology, started in 2012), I thought 'Guys, you've never heard of the base rate fallacy, have you?' Or, to put it another way, DHS is yet another institution needing to learn Bayes's Theorem.

Base rate fallacy is the fallacy of ignoring the probability-skewing effect of a low base rate. I will explain: For the terrorists-in-airports example, the base rate would be the probability that any random person walking through an airport is actually a terrorist. Let's say an example airport has 1 million persons walking through it in a year (it's a small regional), and it's very popular with terrorists, such that we expect 100 terrorists to walk its halls in that year. So, the base rate of being a terrorist in the scenario is 0.0001. The base rate of being a non-terrorist in the scenario is 0.9999.

DHS gets the 'FAST program' going at the airport, and stocks its database with super-studly spook-approved photos. And DHS claims the software is really, really good! 1% error rate! Specifically, it says:

o Actual terrorists fail to trigger the klaxon 1% of the time (false negative).

And...

o Non-terrorists trigger the klaxon 1% of the time (false positive).

(These are invented example numbers of mine, but I think within a realistic ballpark.)

DHS sends out a press release reporting glowingly positive results, because the system is '99% accurate'. But what does '99% accurate' really mean in this context? It merely means a low error rate, not high accuracy. The accuracy is actually piss-poor, because, observe: 9,999 non-terrorist travelers during the studied year got slammed up against the wall by the brute squad -- along with 99 terrorists, for a total of 10,098 klaxon soundings.
So, the probability that a person triggering the alarm actually is a terrorist is only about 99 in 10,098, which is 0.98% accuracy. I call _accuracy_, here, the probability of terrorist given klaxon, which we'll call 'p(terrorist|K)', where p() means 'probability of', and the | character means 'given'. Bayes's Theorem says:

  p(terrorist|K) = p(K|terrorist) times p(terrorist) divided by p(K)

  p(K|terrorist) = 99 / 100        = .99000000   (1% false negative)
  p(terrorist)   = 100 / 1000000   = .00010000
  p(K)           = 10098 / 1000000 = .01009800

Probability of terrorist given klaxon is thus .00980392, or only 0.98% accuracy -- less than 1% accurate, though I have little doubt DHS would call it '99% accurate' (ignoring the low base rate).

And the point is, this sort of fallacy occurs _all the time_ when people talk about probabilities and rates of success for infrequent events and large amounts of data.
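For anyone who wants to fiddle with the arithmetic, here's a quick Python sketch of that same calculation. The figures are my invented example numbers from above, not real DHS data:

  # Invented example figures from the airport scenario above -- not real DHS data.
  travellers     = 1_000_000   # people through the airport per year
  terrorists     = 100         # actual terrorists among them
  false_neg_rate = 0.01        # fraction of terrorists the klaxon misses
  false_pos_rate = 0.01        # fraction of non-terrorists the klaxon flags anyway

  true_alarms  = terrorists * (1 - false_neg_rate)            # 99
  false_alarms = (travellers - terrorists) * false_pos_rate   # 9,999

  # Equivalent to Bayes's Theorem: p(terrorist|K) = p(K|terrorist) * p(terrorist) / p(K)
  p_terrorist_given_klaxon = true_alarms / (true_alarms + false_alarms)
  print(f"p(terrorist|K) = {p_terrorist_given_klaxon:.4%}")   # about 0.98%

Drop false_pos_rate to a heroic 0.1% and the posterior still only reaches about 9%; the low base rate dominates no matter how good the detector is.

Quoting Dave Palmer: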
http://arstechnica.co.uk/security/2016/02/the-nsas-skynet-program-may-be-kil...
Quoting from the article:

The 0.008 percent false positive rate would be remarkably low for traditional business applications. This kind of rate is acceptable where the consequences are displaying an ad to the wrong person, or charging someone a premium price by accident. However, even 0.008 percent of the Pakistani population still corresponds to 15,000 people potentially being misclassified as "terrorists" and targeted by the military -- not to mention innocent bystanders or first responders who happen to get in the way.

Once again, classic base rate fallacy. Citing that 'failure rate of 0.008%' as if it measured real-world accuracy is totally wrong: genuine couriers are so rare in the data that even a tiny false positive rate means the people flagged are overwhelmingly innocent.

"On whether the use of SKYNET is a war crime, I defer to lawyers," Ball said. "It's bad science, that's for damn sure, because classification is inherently probabilistic...."

Worse than that, it's classification on the grounds of a mathematically incompetent calculation of that probability.

A jury of random citizens in 1996 UK rape trial Regina v. Denis Adams (https://en.wikipedia.org/wiki/R_v_Adams) learned to correctly apply Bayes's Theorem -- but their use of Bayesian inference was ultimately overturned by an appellate judge (http://www.bailii.org/cgi-bin/markup.cgi?doc=/ew/cases/EWCA/Crim/2006/222.ht... - a horrific scientific blunder on the judge's part, IMO):

Quoting Greg B (cyclopasaurus@gmail.com):

[I only barely remember this case, but:]
A proper Bayesian approach to the evidence should have led the jury to at least a reasonable-doubt conclusion, but the calculations are both involved and require counter-intuitive thinking, and further the last thing the prosecution wanted was a bunch of thinking jurors.
The Appeal Court ruling calling Bayes's Theorem inappropriate in the courtroom was appalling, and is the sort of thing jurists are going to be embarrassed about in the future. The jurors' scenario was a classic situation where frequentist notions of probability were IMO even more idiotic than usual. (Fortunately, my understanding is that the Appeal Court judge's comments on that matter have no binding power for the future.)

Here's the nub of the problem, as quoted from the magazine article cited in the Wikipedia article's Notes section:

However, there was very strong DNA evidence linking him with the crime and when the case came to trial in 1995, effectively the only incriminating evidence was that his DNA profile matched the DNA evidence found at the scene of the crime. The prosecution forensic scientist had calculated what is called a match probability, that is, the probability that if you pick someone at random, their DNA would match the DNA sample of the assailant. That, according to him, was 1 in 200 million.

It's tempting for people not trained in statistics to get muddled, and to confuse two different probabilities. The first is the probability that a person would match the criminal's DNA profile given that they are innocent. The second is the probability that they are innocent given that they match the DNA profile. The forensic scientist's 1 in 200 million refers to the first probability. But jurors may wrongly think that this is the probability that the defendant is innocent. This misunderstanding is called the prosecutor's fallacy, and can be extremely prejudicial to the defendant.

http://onlinelibrary.wiley.com/doi/10.1111/j.1740-9713.2005.00089.x/epdf

The forensic scientist (testifying for the prosecution) said that, if Denis Adams is innocent, there's only a 1 in 200 million chance of his DNA matching that of the real assailant. In other words, there were probably only a few other men in all of the UK who might have matched. Which is actually pretty suggestive of guilt, but would have to be weighed against all the other evidence (all of which pointed the other way). However, what the jury tends to hear the forensic scientist say is that, given that Denis Adams matched the DNA sample, there's only a 1 in 200 million chance he's innocent -- a completely different statement, _not_ supported by the testimony, and very prejudicial against the defence.
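To make the difference concrete, here's a toy Python calculation. The candidate-pool sizes are invented purely for illustration (they are not figures argued at the trial); the point is that the probability of innocence given a match depends entirely on the prior, and is nowhere near 1 in 200 million under any of them:

  # Toy illustration of the prosecutor's fallacy.  The match probability is
  # the one the prosecution cited; the candidate-pool sizes are invented
  # assumptions, not figures from R v Adams.
  p_match_given_innocent = 1 / 200_000_000
  p_match_given_guilty   = 1.0   # assume the true assailant always matches

  for pool in (10_000, 1_000_000, 30_000_000):   # plausible-culprit pool sizes
      p_guilty   = 1 / pool      # prior probability, before the DNA evidence
      p_innocent = 1 - p_guilty
      # Bayes's Theorem: p(innocent | match)
      p_innocent_given_match = (p_match_given_innocent * p_innocent) / (
          p_match_given_innocent * p_innocent + p_match_given_guilty * p_guilty)
      print(f"pool {pool:>10,}: p(innocent | match) = {p_innocent_given_match:.4%}")

  # Prints roughly 0.005%, 0.5%, and 13% respectively -- none of them anywhere
  # near 1 in 200 million, and all before the exculpatory evidence (the alibi
  # and the failed identification) is even factored in.

Quoting Laurie Forbes: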
I don't understand why "friend of the court" expert witnesses are not used more frequently in these types of cases.
The article I cited was in fact written by an Oxford statistician called in to advise the court (and jury) at the original trial. He taught the jury how to do Bayesian inference -- which the Appeal Court judge later ignorantly deprecated.