
Bayesian inference


 

Bayesian inference is a form of statistical inference in which probabilities are interpreted not as frequencies or proportions, but as degrees of belief. The name comes from the frequent use of Bayes' theorem in this discipline.

Simple examples of Bayesian inference

From which bowl is the cookie?

To illustrate, suppose there are two bowls full of cookies. Bowl #1 has 10 chocolate chip and 30 plain cookies, while bowl #2 has 20 of each. Our friend Fred picks a bowl at random, and then picks a cookie at random. We may assume there is no reason to believe Fred treats one bowl differently from another, likewise for the cookies. The cookie turns out to be a plain one. How probable is it that Fred picked it out of bowl #1?

~ ~ ~ ~ ~ ~ ~ ~ ~ ~

Intuitively, it seems clear that the answer should be more than a half, since there are more plain cookies in bowl #1. The precise answer is given by Bayes' theorem. Let H1 correspond to bowl #1, and H2 to bowl #2.

~ ~ ~ ~ ~ ~ ~ ~ ~ ~

It is given that the bowls are identical from Fred's point of view, thus P(H1) = P(H2), and the two must add up to 1, so both are equal to 0.5.

~ ~ ~ ~ ~ ~ ~ ~ ~ ~

The datum D is the observation of a plain cookie. From the contents of the bowls, we know that P(D | H1) = 30/40 = 0.75 and P(D | H2) = 20/40 = 0.5. Bayes' formula then yields

~ ~ ~ ~ ~ ~ ~ ~ ~ ~

:\begin{matrix} P(H_1 | D) &=& \frac{P(H_1) \cdot P(D | H_1)}{P(H_1) \cdot P(D | H_1) + P(H_2) \cdot P(D | H_2)} \\ \\ & =& \frac{0.5 \times 0.75}{0.5 \times 0.75 + 0.5 \times 0.5} \\ \\ & =& 0.6. \end{matrix}

~ ~ ~ ~ ~ ~ ~ ~ ~ ~

Before observing the cookie, the probability that Fred chose bowl #1 is the prior probability, P(H1), which is 0.5.

~ ~ ~ ~ ~ ~ ~ ~ ~ ~

After observing the cookie, we revise the probability to P(H1|D), which is 0.6.
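
The calculation above can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original example: the function name posterior and the variable names are chosen here for convenience, and the only inputs are the priors and likelihoods already stated.

```python
def posterior(priors, likelihoods):
    """Return P(H_i | D) for each hypothesis, given P(H_i) and P(D | H_i)."""
    # Joint probability of each hypothesis together with the datum.
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Total probability of the datum (the denominator of Bayes' formula).
    evidence = sum(joint)
    return [j / evidence for j in joint]


priors = [0.5, 0.5]                # P(H1), P(H2): the bowls are equally likely
likelihoods = [30 / 40, 20 / 40]   # P(D | H1), P(D | H2): chance of a plain cookie
cookie_posterior = posterior(priors, likelihoods)
print(cookie_posterior)            # approximately [0.6, 0.4]
```

The same function works for any number of competing hypotheses, provided the priors sum to 1.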

~ ~ ~ ~ ~ ~ ~ ~ ~ ~

False positives in a medical test

False positives are a problem in any kind of test: no test is perfect, and sometimes a test will incorrectly report a positive result. For example, if a test for a particular disease is performed on a patient, there is a chance (usually small) that the test will return a positive result even if the patient does not have the disease. The problem lies, however, not just in the chance of a false positive before testing, but in determining the chance that a given positive result is in fact a false positive. As Bayes' theorem demonstrates, if a condition is rare, then the majority of positive results may be false positives, even if the test for that condition is otherwise reasonably accurate.

~ ~ ~ ~ ~ ~ ~ ~ ~ ~

Suppose that a test for a particular disease has a very high success rate:

~ ~ ~ ~ ~ ~ ~ ~ ~ ~

  • if a tested patient has the disease, the test accurately reports this, a 'positive', 99% of the time (or, with probability 0.99), and
  • if a tested patient does not have the disease, the test accurately reports that, a 'negative', 95% of the time (i.e. with probability 0.95).

    Suppose also, however, that only 0.1% of the population have the disease (i.e. a randomly chosen person has it with probability 0.001). We now have all the information required to use Bayes' theorem to calculate the probability that a positive test result is a false positive.

    ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

    Let A be the event that the patient has the disease, and B be the event that the test returns a positive result. Then, using Bayes' theorem with the probability of a positive result expanded over the two cases (disease present and disease absent), the probability that a patient who tests positive actually has the disease is

    ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

    :\begin{matrix}P(A|B) &= & \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.05 \times 0.999}\, ,\\ ~\\ &\approx &0.019\, .\end{matrix}

    ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

    and hence the probability that a positive result is a false positive is about 1 − 0.019 = 0.981.

    ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

    Despite the apparent high accuracy of the test, the incidence of the disease is so low (one in a thousand) that the vast majority of patients who test positive (about 98 in a hundred) do not have the disease. (Nonetheless, the proportion of patients who test positive and do have the disease is about 20 times the proportion before the outcome of the test was known. Thus the test is not useless, and re-testing may improve the reliability of the result.) In particular, a test must be very reliable in reporting a negative result when the patient does not have the disease if it is to avoid the problem of false positives. In mathematical terms, this ensures that the second term in the denominator of the above calculation is small relative to the first term. For example, if the test reported a negative result in patients without the disease with probability 0.999, then using this value in the calculation yields a probability of a false positive of roughly 0.5: 1 − (0.99 × 0.001)/(0.99 × 0.001 + 0.001 × 0.999) ≈ 0.502.

    ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

    In this example, Bayes' theorem helps show that the accuracy of tests for rare conditions must be very high in order to produce reliable results from a single test, due to the possibility of false positives. (The probability of a 'false negative' could also be calculated using Bayes' theorem, to completely characterise the possible errors in the test results.)
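
The arithmetic of this example can be checked with a short Python sketch. The function name and the parameter names prevalence, sensitivity and specificity are labels introduced here for the three quantities stated above (0.001, 0.99 and 0.95); the code only restates the calculation and is not a model of any real test.

```python
def prob_disease_given_positive(prevalence, sensitivity, specificity):
    """P(A | B): probability of disease given a positive result.

    prevalence  -- P(A), fraction of the population with the disease
    sensitivity -- P(B | A), chance of a positive result when diseased
    specificity -- chance of a negative result when not diseased
    """
    true_positive = sensitivity * prevalence
    false_positive = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)


# Compare the test described above (0.95) with the more reliable one (0.999).
for specificity in (0.95, 0.999):
    p_disease = prob_disease_given_positive(0.001, 0.99, specificity)
    print(f"specificity {specificity}: "
          f"P(disease | positive) = {p_disease:.3f}, "
          f"P(false positive | positive) = {1.0 - p_disease:.3f}")
```

With a specificity of 0.95 the false-positive share is about 0.981; raising it to 0.999 brings the share down to about one half.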

    ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

In the courtroom

Bayesian inference can be used in a court setting by an individual juror to coherently accumulate the evidence for and against the guilt of the defendant, and to see whether, in totality, it meets their personal threshold for 'beyond a reasonable doubt'.

~ ~ ~ ~ ~ ~ ~ ~ ~ ~

  • Let G be the event that the defendant is guilty.
  • Let E be the event that the defendant's DNA matches DNA found at the crime scene.
  • Let p(E | G) be the probability of seeing event E assuming that the defendant is guilty. (Usually this would be taken to be unity.)
  • Let p(G | E) be the probability that the defendant is guilty assuming the DNA match event E.
  • Let p(G) be the juror's personal estimate of the probability that the defendant is guilty, based on the evidence other than the DNA match. This could be based on his responses under questioning, or previously presented evidence.

    Bayesian inference tells us that if we can assign a probability p(G) to the defendant's guilt before we take the DNA evidence into account, then we can revise this probability to the conditional probability p(G | E), since

    ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

    :p(G | E) = p(G) p(E | G) / p(E)

    ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

    Suppose, on the basis of other evidence, a juror decides that there is a 30% chance that the defendant is guilty. Suppose also that the forensic evidence is that the probability that a person chosen at random would have DNA matching that at the crime scene is 1 in a million, or 10⁻⁶.

    ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

    The event E can occur in two ways. Either the defendant is guilty (with prior probability 0.3) and thus his DNA is present with probability 1, or he is innocent (with prior probability 0.7) and he is unlucky enough to be one of the 1 in a million matching people.

    ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

    Thus the juror could coherently revise his opinion to take into account the DNA evidence as follows:

    ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

    :p(G | E) = (0.3 × 1.0) / (0.3 × 1.0 + 0.7 × 10⁻⁶) ≈ 0.99999766667.

    ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

    The benefit of adopting a Bayesian approach is that it gives the juror a formal mechanism for combining the evidence presented. The approach can be applied successively to all the pieces of evidence presented in court, with the posterior from one stage becoming the prior for the next.

    ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

    The juror would still need a prior for the guilt probability before the first piece of evidence is considered. It has been suggested that this could be the guilt probability of a random person of the appropriate sex taken from the town where the crime occurred. Thus, for a crime committed by an adult male in a town containing 50,000 adult males, the appropriate initial prior probability might be 1/50,000.
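
As a rough illustration of the updating described above, the following Python sketch applies the DNA evidence to two different starting priors: the juror's 0.3 and the suggested population prior of 1/50,000. The function update and its parameter names are assumptions made for this sketch, and pairing the 1/50,000 prior with the DNA figures is an illustrative choice made here, not a calculation given in the text.

```python
def update(prior, p_evidence_if_guilty, p_evidence_if_innocent):
    """Return p(G | E) from p(G), p(E | G) and p(E | not G)."""
    guilty_and_evidence = prior * p_evidence_if_guilty
    innocent_and_evidence = (1.0 - prior) * p_evidence_if_innocent
    return guilty_and_evidence / (guilty_and_evidence + innocent_and_evidence)


P_MATCH_IF_GUILTY = 1.0      # taken to be unity, as in the text
P_MATCH_IF_INNOCENT = 1e-6   # one-in-a-million random match

# Prior of 0.3 from the other evidence: posterior is about 0.9999977.
from_juror_prior = update(0.3, P_MATCH_IF_GUILTY, P_MATCH_IF_INNOCENT)

# Population prior of 1/50,000: posterior is roughly 0.952.
from_population_prior = update(1 / 50_000, P_MATCH_IF_GUILTY, P_MATCH_IF_INNOCENT)

print(from_juror_prior, from_population_prior)
```

To process a further piece of evidence, the returned posterior would be passed back in as the prior of the next call.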

    ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

    For the purpose of explaining Bayes' theorem to jurors, it will usually be appropriate to give it in the form of betting odds rather than probabilities, as these are more widely understood. In this form Bayes' theorem states that

    ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

    :Posterior odds = prior odds × Bayes factor

    ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

    In the example above, the juror who has a prior probability of 0.3 for the defendant being guilty would express that as odds of 3:7 in favour of guilt. The Bayes factor is one million (the ratio of p(E | G) = 1 to the one-in-a-million chance of a match for an innocent person), so the resulting posterior odds are 3 million to 7, or about 429,000 to one in favour of guilt.
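
The odds form can be checked with exact arithmetic. The sketch below uses Python's standard fractions module; the function name posterior_odds and the variable names are chosen here for illustration, and the Bayes factor is computed as p(E | G) divided by the probability of a match for an innocent person.

```python
from fractions import Fraction


def posterior_odds(prior_odds, bayes_factor):
    """Odds form of Bayes' theorem: posterior odds = prior odds x Bayes factor."""
    return prior_odds * bayes_factor


prior_odds = Fraction(3, 7)                          # probability 0.3 as odds 3:7
bayes_factor = Fraction(1, 1) / Fraction(1, 10**6)   # 1 / 10^-6 = one million
odds = posterior_odds(prior_odds, bayes_factor)      # 3,000,000 to 7

# Convert the odds back to a probability: odds / (1 + odds).
probability = odds / (1 + odds)

print(odds, float(odds), float(probability))
```

Converting the posterior odds back to a probability gives 3,000,000/3,000,007, which agrees with the value of p(G | E) obtained from the probability form.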

    ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

    In the United Kingdom, Bayes' theorem was explained to the jury in the odds form by a statistician expert witness in the rape case of Regina versus Denis John Adams. A conviction was secured but the case went to Appeal, as no means of accumulating evidence had been provided for those jurors who did not want to use Bayes' theorem. The Court of Appeal upheld the conviction and gave their opinion that "To introduce Bayes' Theorem, or any similar method, into a criminal trial plunges the Jury into inappropriate and unnecessary realms of theory and complexity, deflecting them from their proper task." No further appeal was allowed and the issue of Bayesian assessment of forensic DNA data remains controversial.

    ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

    Gardner-Medwin argues that the criterion on which a verdict in a criminal trial should be based is not the probability of guilt, but rather the probability of the evidence, given that the defendant is innocent. He argues that if the posterior probability of guilt is to be computed by Bayes' theorem, the prior probability of guilt must be known. This will depend on the incidence of the crime and this is an odd piece of evidence to consider in a criminal trial. Consider the following three propositions:

    ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

    A: The known facts and testimony could have arisen if the defendant is guilty,

    ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

    B: The known facts and testimony could have arisen if the defendant is innocent,

    ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

    C: The defendant is guilty.

    ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

    Gardner-Medwin argues that the jury should believe both A and not-B in order to convict. A and not-B implies the truth of C, but the reverse is not true. It is possible that B and C are both true, but in this case he argues that a jury should acquit, even though they know that they are probably acquitting a guilty person.

    ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

    Other court cases in which probabilistic arguments played some role were the Howland will forgery trial and the Sally Clark case.

    ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

Search theory

In May 1968 the US nuclear submarine USS Scorpion (SSN-589) failed to arrive as expected at her home port of Norfolk, Virginia. The US Navy was convinced that the vessel had been lost off the Eastern seaboard, but an extensive search failed to discover the wreck. The US Navy's deep water expert, John Craven, believed that it was elsewhere, and he organised a search south-west of the Azores based on a controversial approximate triangulation by hydrophones. He was allocated only a single ship, the USNS Mizar, and he took advice from a firm of consultant mathematicians in order to make the most of his resources.

A Bayesian search methodology was adopted. Experienced submarine commanders were interviewed to construct hypotheses about what could have caused the loss of the Scorpion. The sea area was divided into grid squares and a probability was assigned to each square under each of the hypotheses, giving one probability grid per hypothesis. These were then added together to produce an overall probability grid, in which the probability attached to each square was the probability that the wreck was in that square. A second grid was constructed whose entries were the probability of successfully finding the wreck if that square were searched and the wreck were actually there; this was a known function of water depth. Combining the two grids gives, for each square, the probability of finding the wreck if that square were searched.

The sea grid was searched systematically, starting with the high-probability regions and working down to the low-probability regions. Each time a grid square was searched and found to be empty, its probability was reassessed using Bayes' theorem. This then forced the probabilities of all the other grid squares to be reassessed (upwards), also by Bayes' theorem. The use of this approach was a major computational challenge for the time, but it was eventually successful and the Scorpion was found in October of that year.

Suppose a grid square has a probability p of containing the wreck and that the probability of successfully detecting the wreck if it is there is q. If the square is searched and no wreck is found, then, by Bayes' theorem, the revised probability of the wreck being in the square is given by

~ ~ ~ ~ ~ ~ ~ ~ ~ ~

: p' = \frac{p(1-q)}{(1-p)+p(1-q)}.
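
The search procedure can be sketched in Python as follows. This is a toy illustration under stated assumptions, not a reconstruction of the 1968 computation: the three-square grid and its numbers are hypothetical, the function names are invented here, and the rule for the unsearched squares (divide by the same denominator) is the step implied by applying Bayes' theorem to them. Note that the denominator (1-p) + p(1-q) equals 1 - pq, the probability that the search of that square finds nothing.

```python
def next_square(p, q):
    """Pick the square with the highest chance of finding the wreck, p * q."""
    return max(p, key=lambda square: p[square] * q[square])


def update_after_failed_search(p, q, searched):
    """Revise every square's probability after `searched` was found empty."""
    # Probability of finding nothing there: (1 - p) + p(1 - q) = 1 - p*q.
    miss = 1.0 - p[searched] * q[searched]
    revised = {}
    for square, prob in p.items():
        if square == searched:
            revised[square] = prob * (1.0 - q[square]) / miss  # revised downwards
        else:
            revised[square] = prob / miss                      # revised upwards
    return revised


# Hypothetical grid: p = chance the wreck is in each square,
# q = chance of detecting it there if the square is searched.
p = {"A": 0.5, "B": 0.3, "C": 0.2}
q = {"A": 0.8, "B": 0.9, "C": 0.5}

first = next_square(p, q)                          # square searched first
updated = update_after_failed_search(p, q, first)  # nothing found there
second = next_square(updated, q)                   # square to search next
print(first, updated, second)
```

The revised probabilities still sum to 1, so the same two functions can be applied repeatedly until the wreck is found.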

~ ~ ~ ~ ~ ~ ~ ~ ~ ~