A simple explanation of Bayes' theorem. The total probability formula and Bayes' formula

Who is Bayes, and what does he have to do with management? - a perfectly fair question may follow. For now, take my word for it: this is very important!.. and interesting (at least to me).

Here is the paradigm in which most managers operate: if I observe something, what conclusions can I draw from it? Bayes teaches the opposite question: what must actually be the case for me to observe this something? This is exactly how all sciences develop, and it has been written (I quote from memory): a person who has no theory in his head will lurch from one idea to another under the influence of various events (observations). It is not for nothing that they say: there is nothing more practical than a good theory.

An example from practice. My subordinate makes a mistake, and my colleague (the head of another department) says that it would be necessary to exert managerial influence on the negligent employee (in other words, punish/scold him). And I know that this employee performs 4-5 thousand operations of the same type per month, and during this time makes no more than 10 mistakes. Do you feel the difference in paradigm? My colleague reacts to the observation, while I have a priori knowledge that the employee makes a certain number of mistakes, so one more did not change that knowledge... Now, if at the end of the month there turn out to be, say, 15 such mistakes!.. That will already be a reason to study the causes of non-compliance with the standards.

Convinced of the importance of the Bayesian approach? Intrigued? I hope so. And now the fly in the ointment. Unfortunately, Bayesian ideas rarely come easily on the first try. I was frankly unlucky, since I became acquainted with these ideas through popular literature that left many questions open. When planning this note, I gathered everything I had previously written down about Bayes and also studied what is written on the Internet. I present to you my best attempt at an introduction to Bayesian probability.

Derivation of Bayes' theorem

Consider the following experiment: we name any number lying on the segment [0; 1] and record when this number falls, for example, between 0.1 and 0.4 (Fig. 1a). The probability of this event is equal to the ratio of the length of the segment [0.1; 0.4] to the total length of the segment [0; 1], provided that numbers appear on the segment with equal probability. Mathematically this can be written as p(0.1 <= x <= 0.4) = 0.3, or briefly p(X) = 0.3, where p is a probability and X is the event that the random variable falls within the range [0.1; 0.4]. That is, the probability of hitting the segment is 30%.

Fig. 1. Graphical interpretation of probabilities

Now consider the unit square (Fig. 1b). Suppose we have to name pairs of numbers (x, y), each of which is greater than zero and less than one. The probability that x (the first number) falls within the segment [0.1; 0.4] (blue area 1) is equal to the ratio of the area of the blue region to the area of the entire square, that is, (0.4 - 0.1) * (1 - 0) / (1 * 1) = 0.3, the same 30%. The probability that y falls within the segment [0.5; 0.7] (green area 2) is equal to the ratio of the area of the green region to the area of the entire square: p(0.5 <= y <= 0.7) = 0.2, or briefly p(Y) = 0.2.

What can we learn about x and y jointly? For example, what is the probability that x and y are in their given segments at the same time? To find out, we calculate the ratio of the area of region 3 (the intersection of the green and blue stripes) to the area of the entire square: p(X, Y) = (0.4 - 0.1) * (0.7 - 0.5) / (1 * 1) = 0.06.

Now suppose we want to know the probability that y is in the interval [0.5; 0.7] given that x is already in the range [0.1; 0.4]. That is, we have a filter: when we name pairs (x, y), we immediately discard the pairs that do not satisfy the condition on x, and then among the filtered pairs we count those for which y satisfies our condition, taking the probability as the ratio of the number of pairs for which y lies in [0.5; 0.7] to the total number of filtered pairs (that is, those for which x lies in [0.1; 0.4]). We write this probability as p(Y|X), read as "the probability of Y given that X has hit its range." Obviously, this probability equals the ratio of the area of region 3 to the area of blue region 1. The area of region 3 is (0.4 - 0.1) * (0.7 - 0.5) = 0.06, and the area of blue region 1 is (0.4 - 0.1) * (1 - 0) = 0.3, so their ratio is 0.06 / 0.3 = 0.2. In other words, the probability of finding y in the segment [0.5; 0.7] given that x belongs to [0.1; 0.4] is p(Y|X) = 0.2.
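These geometric probabilities are easy to check numerically. Below is a minimal Monte Carlo sketch in Python (only the interval bounds come from the text; the function and variable names are mine):

```python
import random

def estimate(n=1_000_000, seed=42):
    rng = random.Random(seed)
    hits_x = hits_y = hits_xy = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()  # a uniform pair in the unit square
        in_x = 0.1 <= x <= 0.4
        in_y = 0.5 <= y <= 0.7
        hits_x += in_x
        hits_y += in_y
        hits_xy += in_x and in_y
    # p(X) ≈ 0.3, p(Y) ≈ 0.2, p(X, Y) ≈ 0.06, p(Y|X) = p(X, Y) / p(X) ≈ 0.2
    return hits_x / n, hits_y / n, hits_xy / n, hits_xy / hits_x

print(estimate())
```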

In the previous paragraph we actually formulated the identity p(Y|X) = p(X, Y) / p(X). It reads: "the probability that y hits the range [0.5; 0.7], given that x hits the range [0.1; 0.4], equals the ratio of the probability that x and y hit their ranges simultaneously to the probability that x hits its range."

By analogy, consider the probability p(X|Y). We name pairs (x, y) and filter those for which y lies between 0.5 and 0.7; then the probability that x is in [0.1; 0.4] given that y belongs to [0.5; 0.7] equals the ratio of the area of region 3 to the area of green region 2: p(X|Y) = p(X, Y) / p(Y).

Note that the probabilities p(X, Y) and p(Y, X) are equal (both equal the ratio of the area of region 3 to the area of the entire square), but the probabilities p(Y|X) and p(X|Y) are not: p(Y|X) is the ratio of the area of region 3 to region 1, while p(X|Y) is the ratio of region 3 to region 2. Note also that p(X, Y) is often denoted p(X&Y).

So we have introduced two definitions: p(Y|X) = p(X, Y) / p(X) and p(X|Y) = p(X, Y) / p(Y)

Let us rewrite these equalities in the form: p(X, Y) = p(Y|X) * p(X) and p(X, Y) = p(X|Y) * p(Y)

Since the left sides are equal, so are the right sides: p(Y|X) * p(X) = p(X|Y) * p(Y)

Or we can rewrite the last equality as: p(X|Y) = p(Y|X) * p(X) / p(Y)

This is Bayes' theorem!

Do such simple (almost tautological) transformations really give rise to a great theorem? Don't rush to conclusions. Let's talk through what we got. There was a certain initial (a priori) probability p(X) that the random variable, uniformly distributed on the segment [0; 1], falls within the range X. An event Y occurred, and as a result we obtained the posterior probability of the same random variable: p(X|Y), and this probability differs from p(X) by the factor p(Y|X) / p(Y). The event Y is called evidence, more or less confirming or refuting X. The factor is sometimes called the power of the evidence. The stronger the evidence, the more the fact of observing Y changes the prior probability and the more the posterior probability differs from the prior. If the evidence is weak, the posterior probability is almost equal to the prior.

Bayes' formula for discrete random variables

In the previous section we derived Bayes' formula for continuous random variables x and y defined on the interval [0; 1]. Now consider an example with discrete random variables, each taking two possible values. During routine medical examinations it was found that, at the age of forty, 1% of women suffer from breast cancer; 80% of women with cancer receive positive mammogram results; 9.6% of healthy women also receive positive mammogram results. During an examination, a woman in this age group received a positive mammography result. What are the chances that she actually has breast cancer?

The line of reasoning/calculation is as follows. Among the 1% of women who are sick, mammography gives a positive result 80% of the time: 1% * 80% = 0.8% of all women. Among the 99% of healthy women, it gives a positive result 9.6% of the time: 99% * 9.6% = 9.504% of all women. In total, 10.304% (9.504% + 0.8%) receive positive mammography results, of whom only 0.8% are sick and the remaining 9.504% are healthy. Thus, the probability that a woman with a positive mammography result has cancer is 0.8% / 10.304% = 7.764%. Did you expect 80% or so?
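For those who like to verify with code, here is a minimal sketch of the same computation (all numbers are from the text above):

```python
def posterior(prior, p_pos_given_sick, p_pos_given_healthy):
    """Posterior probability of illness given a positive test, via Bayes' formula."""
    # Denominator: total probability of a positive result
    p_pos = prior * p_pos_given_sick + (1 - prior) * p_pos_given_healthy
    return prior * p_pos_given_sick / p_pos

print(posterior(0.01, 0.80, 0.096))  # ≈ 0.0776, i.e. about 7.8%
```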

In our example, Bayes' formula takes the following form:

p(X1|Y1) = p(Y1|X1) * p(X1) / (p(Y1|X1) * p(X1) + p(Y1|X2) * p(X2))

Let's talk about the "physical" meaning of this formula once again. X is a random variable (the diagnosis) taking the values X1 (sick) and X2 (healthy); Y is a random variable (the measurement result, i.e. mammography) taking the values Y1 (positive result) and Y2 (negative result); p(X1) is the probability of illness before the mammography (the a priori probability), equal to 1%; p(Y1|X1) is the probability of a positive result if the patient is sick (a conditional probability, since it must be specified in the conditions of the problem), equal to 80%; p(Y1|X2) is the probability of a positive result if the patient is healthy (also a conditional probability), equal to 9.6%; p(X2) is the probability that the patient is healthy before the mammography (a priori), equal to 99%; p(X1|Y1) is the probability that the patient is sick given a positive mammography result (the posterior probability).

It can be seen that the posterior probability (what we are looking for) is proportional to the prior (initial) probability with the somewhat more complex coefficient p(Y1|X1) / p(Y1). Let me emphasize again: in my opinion, this is a fundamental aspect of the Bayesian approach. The measurement (Y) added a certain amount of information to what was available a priori, which refined our knowledge about the object.

Examples

To consolidate the material you have covered, try solving several problems.

Example 1. There are 3 urns; in the first there are 3 white balls and 1 black; in the second - 2 white balls and 3 black; in the third there are 3 white balls. Someone approaches one of the urns at random and takes out 1 ball from it. This ball turned out to be white. Find the posterior probabilities that the ball is drawn from the 1st, 2nd, 3rd urn.

Solution. We have three hypotheses: H1 = (the first urn is selected), H2 = (the second urn is selected), H3 = (the third urn is selected). Since the urn is chosen at random, the a priori probabilities of the hypotheses are equal: P(H1) = P(H2) = P(H3) = 1/3.

As a result of the experiment, the event A = (a white ball was drawn from the selected urn) occurred. The conditional probabilities of event A under the hypotheses H1, H2, H3 are: P(A|H1) = 3/4, P(A|H2) = 2/5, P(A|H3) = 1. For example, the first equality reads: "the probability of drawing a white ball if the first urn is chosen is 3/4 (since there are 4 balls in the first urn, and 3 of them are white)."

Using Bayes' formula, we find the posterior probabilities of the hypotheses:

P(H1|A) = (1/3 * 3/4) / P(A) = 15/43 ≈ 0.35, P(H2|A) = (1/3 * 2/5) / P(A) = 8/43 ≈ 0.19, P(H3|A) = (1/3 * 1) / P(A) = 20/43 ≈ 0.47, where P(A) = 1/3 * (3/4 + 2/5 + 1) = 43/60 is the total probability of drawing a white ball.

Thus, in the light of the information about the occurrence of event A, the probabilities of the hypotheses have changed: hypothesis H3 has become the most probable, and hypothesis H2 the least probable.
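A quick check of these numbers with exact fractions (a sketch; the names are illustrative):

```python
from fractions import Fraction

priors = [Fraction(1, 3)] * 3                                # the urn is chosen at random
likelihoods = [Fraction(3, 4), Fraction(2, 5), Fraction(1)]  # P(white | urn i)

p_a = sum(p * l for p, l in zip(priors, likelihoods))        # total probability of white
posteriors = [p * l / p_a for p, l in zip(priors, likelihoods)]
print(p_a, *posteriors)  # 43/60 15/43 8/43 20/43
```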

Example 2. Two shooters independently shoot at the same target, each firing one shot. The probability of hitting the target is 0.8 for the first shooter and 0.4 for the second. After the shooting, one hole is found in the target. Find the probability that this hole belongs to the first shooter. (The outcome in which both holes coincide is discarded as negligibly unlikely.)

Solution. Before the experiment, the following hypotheses are possible: H1 = (neither the first nor the second shooter hits), H2 = (both shooters hit), H3 = (the first shooter hits, but the second does not), H4 = (the first shooter does not hit, but the second does). The prior probabilities of the hypotheses:

P(H1) = 0.2 * 0.6 = 0.12; P(H2) = 0.8 * 0.4 = 0.32; P(H3) = 0.8 * 0.6 = 0.48; P(H4) = 0.2 * 0.4 = 0.08.

The conditional probabilities of the observed event A = (there is one hole in the target) under these hypotheses are: P(A|H1) = P(A|H2) = 0; P(A|H3) = P(A|H4) = 1.

After the experiment, hypotheses H1 and H2 become impossible, and the posterior probabilities of hypotheses H3 and H4 by Bayes' formula are:

P(H3|A) = 0.48 / (0.48 + 0.08) = 6/7 ≈ 0.857, P(H4|A) = 0.08 / (0.48 + 0.08) = 1/7 ≈ 0.143.

Bayes against spam

Bayes' formula has found wide application in the development of spam filters. Suppose you want to train a computer to determine which letters are spam. We will work from a dictionary of words and phrases, using Bayesian estimates. Let us first create a space of hypotheses. For any letter we have 2 hypotheses: H_A: it is spam; H_B: it is not spam, but a normal, needed letter.

First, let's "train" our future anti-spam system. Take all the letters we have and divide them into two piles of 10 letters each. Put the spam letters into one pile and call it H_A, and the needed correspondence into the other and call it H_B. Now let's see what words and phrases occur in spam and in needed letters, and with what frequency. We will call these words and phrases evidence and denote them E1, E2, ... It turns out that commonly used words (for example, "like", "your") occur in the piles H_A and H_B with approximately the same frequency. Thus, the presence of these words in a letter says nothing about which pile to assign it to (weak evidence). Let's assign such words a neutral "spamminess" score, say 0.5.

Suppose the phrase "spoken English" occurs in only 10 letters: more often in spam (say, in 7 spam letters out of 10) than in needed ones (in 3 out of 10). Let's give this phrase a higher spam score of 7/10 and a lower score for normal letters of 3/10. Conversely, it turned out that the word "buddy" occurs more often in normal letters (6 out of 10). And then we receive a short letter: "My buddy! How is your spoken English?". Let's try to evaluate its "spamminess". We give the overall estimates P(H_A), P(H_B) of the letter belonging to each pile using a somewhat simplified Bayes formula and our approximate scores:

P(H_A) = A / (A + B), where A = p_a1 * p_a2 * ... * p_an and B = p_b1 * p_b2 * ... * p_bn = (1 - p_a1) * (1 - p_a2) * ... * (1 - p_an).

Table 1. Simplified (and incomplete) Bayes estimate of the letter.

Thus, our hypothetical letter received a score leaning toward "spam". Can we decide to throw the letter into one of the piles? Let's set decision thresholds:

  • We will assume that the letter belongs to the pile H_i if P(H_i) ≥ T.
  • The letter does not belong to the pile if P(H_i) ≤ L.
  • If L ≤ P(H_i) ≤ T, then no decision can be made.

We can take T = 0.95 and L = 0.05. Since for the letter in question 0.05 < P(H_A) < 0.95 and 0.05 < P(H_B) < 0.95, we cannot decide where to assign this letter: to spam (H_A) or to the needed letters (H_B). Can the estimate be improved by using more information?

Yes. Let's calculate the score for each piece of evidence in a different way, just as Bayes actually proposed. Let:

F_a is the total number of spam letters;

F_ai is the number of spam letters containing evidence i;

F_b is the total number of needed letters;

F_bi is the number of needed letters containing evidence i.

Then: p_ai = F_ai / F_a, p_bi = F_bi / F_b, and P(H_A) = A / (A + B), P(H_B) = B / (A + B), where A = p_a1 * p_a2 * ... * p_an, B = p_b1 * p_b2 * ... * p_bn.

Note that the scores p_ai and p_bi of the evidence words have now become objective and can be computed without human intervention.

Table 2. A more accurate (but incomplete) Bayes estimate based on the features available in the letter.

We obtained a very definite result: by a large margin, the letter can be classified as a needed letter, since P(H_B) = 0.997 > T = 0.95. Why did the result change? Because we used more information: we took into account the number of letters in each pile and, importantly, determined the estimates p_ai and p_bi much more correctly. They were determined the way Bayes himself did it, by computing conditional probabilities. In other words, p_a3 is the probability that the word "buddy" appears in a letter, given that the letter belongs to the spam pile H_A. The result was not long in coming: it seems we can make a decision with much greater certainty.
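Here is a minimal sketch of this scoring scheme. The text gives the counts for "spoken English" (7 of 10 spam letters, 3 of 10 needed ones) and for "buddy" in the needed pile (6 of 10); the spam count for "buddy" is an assumed value, so the numbers below will not reproduce Table 2 exactly:

```python
def spam_score(evidence, f_a, f_b, counts_spam, counts_ham):
    """Simplified Bayes score: P(H_A) = A / (A + B), with independent evidence."""
    a = b = 1.0
    for word in evidence:
        a *= counts_spam[word] / f_a   # p_ai = F_ai / F_a
        b *= counts_ham[word] / f_b    # p_bi = F_bi / F_b
    return a / (a + b)                 # P(H_A); P(H_B) = 1 - P(H_A)

counts_spam = {"spoken English": 7, "buddy": 2}  # 'buddy': 2 is assumed for illustration
counts_ham  = {"spoken English": 3, "buddy": 6}

p_spam = spam_score(["spoken English", "buddy"], 10, 10, counts_spam, counts_ham)
print(p_spam, 1 - p_spam)  # with these assumed counts: 0.4375 0.5625, leaning 'needed'
```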

Bayes against corporate fraud

An interesting application of the Bayesian approach was described by MAGNUS8.

My current project (an information system for detecting fraud at a manufacturing enterprise) uses Bayes' formula to determine the probability of fraud given the presence/absence of several facts that indirectly testify in favor of the hypothesis that fraud may have been committed. The algorithm is self-learning (with feedback): it recalculates its coefficients (the conditional probabilities) upon actual confirmation or non-confirmation of fraud during an inspection by the economic security service.

It is probably worth saying that designing algorithms with such methods requires a fairly high mathematical culture of the developer, because the slightest error in the derivation and/or implementation of the computational formulas will nullify and discredit the whole method. Probabilistic methods are especially prone to this, since human thinking is not adapted to working with probabilistic categories, so there is no "visibility" of, and no intuition for, the "physical meaning" of intermediate and final probabilistic parameters. Such understanding exists only for the basic concepts of probability theory; after that one simply has to combine and derive complex things very carefully according to the laws of probability theory, since for composite objects common sense no longer helps. This, in particular, is connected with the quite serious methodological battles taking place on the pages of modern books on the philosophy of probability, as well as with the large number of sophisms, paradoxes and curious puzzles on this topic.

Another nuance that I had to face is that, unfortunately, almost everything even more or less USEFUL IN PRACTICE on this topic is written in English. In Russian-language sources there is mainly only a well-known theory with demonstration examples only for the most primitive cases.

I completely agree with the last remark. For example, Google, when I searched for something like "the book Bayesian Probability", produced nothing intelligible. True, it reported that a book on Bayesian statistics had been banned in China. (Statistics professor Andrew Gelman reported on the Columbia University blog that his book, Data Analysis Using Regression and Multilevel/Hierarchical Models, was banned from publication in China; the publisher there reported that "the book was not approved by the authorities due to various politically sensitive material in the text.") I wonder whether a similar reason led to the lack of books on Bayesian probability in Russia?

Conservatism in human information processing

Probabilities quantify degrees of uncertainty. Probability, both according to Bayes and according to our intuition, is simply a number between zero and one that represents the degree to which a somewhat idealized person believes a statement to be true. The person is "somewhat idealized" because the sum of his probabilities for two mutually exclusive events must equal his probability that either of the events occurs. The additivity property has consequences that few real people can satisfy in full.

Bayes' theorem is a trivial consequence of the additivity property, indisputable and agreed upon by all probabilists, Bayesian and otherwise. One way to write it is as follows. If P(H_A|D) is the posterior probability of hypothesis A after the datum D has been observed, P(H_A) is its prior probability before D was observed, P(D|H_A) is the probability that the datum D will be observed if H_A is true, and P(D) is the unconditional probability of the datum D, then

(1) P(H_A|D) = P(D|H_A) * P(H_A) / P(D)

P(D) is best thought of as a normalizing constant that makes the posterior probabilities add up to one over the exhaustive set of mutually exclusive hypotheses under consideration. If it needs to be calculated, it can be done like this:

P(D) = P(D|H_A) * P(H_A) + P(D|H_B) * P(H_B) + ... over all the hypotheses under consideration.

But more often P(D) is eliminated rather than calculated. A convenient way to eliminate it is to transform Bayes' theorem into its odds-likelihood form.

Consider another hypothesis, H_B, mutually exclusive with H_A, and change your opinion about it on the basis of the same datum that changed your opinion about H_A. Bayes' theorem says that

(2) P(H_B|D) = P(D|H_B) * P(H_B) / P(D)

Now let's divide Equation 1 by Equation 2; the result is:

(3) Ω1 = Ω0 * L,

where Ω1 = P(H_A|D) / P(H_B|D) are the posterior odds in favor of H_A over H_B, Ω0 = P(H_A) / P(H_B) are the prior odds, and L = P(D|H_A) / P(D|H_B) is the quantity familiar to statisticians as the likelihood ratio. Equation 3 is just as relevant a version of Bayes' theorem as Equation 1, and it is often considerably more useful, especially in experiments involving hypotheses. Bayesians argue that Bayes' theorem is a formally optimal rule for revising opinions in the light of new evidence.

We are interested in comparing the ideal behavior defined by Bayes' theorem with the actual behavior of people. To give you some idea of what this means, let's try an experiment with you as the subject. This bag contains 1000 poker chips. I have two such bags: one contains 700 red and 300 blue chips, the other 300 red and 700 blue. I tossed a coin to determine which one to use. So, if our opinions coincide, your current probability that the bag containing more red chips was chosen is 0.5. Now you sample at random, with replacement after each chip. In 12 chips you get 8 red and 4 blue. Now, on the basis of everything you know, what is the probability that this is the bag with more reds? It is clearly higher than 0.5. Please do not continue reading until you have written down your estimate.

If you are like a typical subject, your estimate fell in the range 0.7 to 0.8. The corresponding calculation, however, gives the answer 0.97. It is indeed very rare for a person who has not previously been shown the influence of conservatism to arrive at such a high estimate, even if he was familiar with Bayes' theorem.
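The calculation behind that 0.97, as a short sketch (the bag composition is from the text; the names are mine):

```python
def posterior_red_majority(n_red, n_blue, p_red=0.7, prior_odds=1.0):
    """Posterior probability that the mostly-red bag was chosen,
    after drawing n_red red and n_blue blue chips with replacement."""
    # For two symmetric bags the likelihood ratio collapses to (p/q)^(n_red - n_blue)
    likelihood_ratio = (p_red / (1 - p_red)) ** (n_red - n_blue)
    odds = prior_odds * likelihood_ratio   # posterior odds: Ω1 = Ω0 * L
    return odds / (1 + odds)

print(posterior_red_majority(8, 4))  # ≈ 0.967, i.e. about 0.97
```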

If the proportion of red chips in the bag is p, then the probability of obtaining r red chips and (n - r) blue ones in n draws with replacement is p^r * (1 - p)^(n - r). So, in a typical experiment with a bag and poker chips, if H_A means that the proportion of red chips is p_A and H_B means that it is p_B, then the likelihood ratio is L = [p_A^r * (1 - p_A)^(n - r)] / [p_B^r * (1 - p_B)^(n - r)].

When applying Bayes' formula, one must consider only the probability of the observation actually made, and not the probabilities of the other observations one might have made but did not. This principle has broad implications for all statistical and non-statistical applications of Bayes' theorem; it is the most important technical tool of Bayesian reasoning.

Bayesian revolution

Your friends and colleagues are talking about something called "Bayes' theorem" or "Bayes' rule", or about something called Bayesian reasoning. They are really keen on it, so you go online and find a page about Bayes' theorem, and... it's an equation. And that's it... Why does a mathematical concept create such enthusiasm in people's minds? What kind of "Bayesian revolution" is happening among scientists, such that it is claimed that even the experimental approach itself can be described as its special case? What is the secret that Bayesians know? What kind of light do they see?

The Bayesian revolution in science did not happen because more and more cognitive scientists suddenly began to notice that mental phenomena have a Bayesian structure, nor because scientists in every field began to use the Bayesian method, but because science itself is a special case of Bayes' theorem: experimental evidence is Bayesian evidence. Bayesian revolutionaries argue that when you perform an experiment and obtain evidence that "confirms" or "disproves" your theory, that confirmation or refutation occurs according to the Bayesian rules. For example, you must take into account not only that your theory can explain a phenomenon, but also that there are other possible explanations that can also predict that phenomenon.

Previously, the most popular philosophy of science was the old philosophy of falsificationism, which the Bayesian revolution has displaced. Karl Popper's idea that theories can be completely falsified but never fully verified is another special case of the Bayesian rules: if p(X|A) ≈ 1, i.e. the theory makes correct predictions, then observing ~X falsifies A very strongly. On the other hand, if p(X|A) ≈ 1 and we observe X, this does not strongly confirm the theory: perhaps there is some other condition B such that p(X|B) ≈ 1, under which the observation X does not testify in favor of A but in favor of B. For the observation X to definitely confirm A, we would have to know not that p(X|A) ≈ 1, but that p(X|~A) ≈ 0, which we cannot know, because we cannot consider all possible alternative explanations. For example, when Einstein's theory of general relativity overtook Newton's well-supported theory of gravity, it made all the predictions of Newton's theory a special case of its own.

In a similar way, Popper's claim that an idea must be falsifiable can be interpreted as a manifestation of the Bayesian rule of conservation of probability: if result X is positive evidence for a theory, then result ~X must disprove the theory to some extent. If you try to interpret both X and ~X as "confirming" the theory, the Bayesian rules say this is impossible! To increase the likelihood of a theory you must subject it to tests that can potentially reduce its likelihood; this is not just a rule for exposing charlatans in science, but a corollary of Bayesian probability theory. On the other hand, Popper's idea that only falsification is needed and confirmation counts for nothing is incorrect. Bayes' theorem shows that falsification is very strong evidence compared with confirmation, but falsification is still probabilistic in nature; it is not governed by fundamentally different rules, and in this way it does not differ from confirmation, contrary to what Popper claimed.

Thus, we find that many phenomena in the cognitive sciences, plus the statistical methods used by scientists, plus the scientific method itself, are all special cases of Bayes' theorem. This is the Bayesian revolution.

Welcome to the Bayesian Conspiracy!

Literature on Bayesian probability

2. Many different applications of Bayes are described by the Nobel laureate in economics Kahneman (and his colleagues) in a wonderful book; in my brief summary of that very large book alone, I counted 27 mentions of the name of a certain Presbyterian minister. Minimal formulas. (...) I really liked it. True, it is a bit complicated, with a lot of mathematics (but where would we be without it), yet individual chapters (for example, Chapter 4, Information) are clearly on topic. I recommend it to everyone, even if the mathematics is hard for you: read every other line, skipping the math and fishing out the useful grains...

14. (Addition dated January 15, 2017.) A chapter from the book by Tony Crilly, 50 Mathematical Ideas You Really Need to Know.

Nobel laureate physicist Richard Feynman, speaking of one philosopher of particularly great self-importance, once said: "What irritates me is not philosophy as a science, but the pomposity that is created around it. If only philosophers could laugh at themselves! If only they could say: 'I say it is like this, but von Leipzig thought it was different, and he also knows something about it.' If only they remembered to clarify that it is just their conjecture."

Let the event A occur only as a result of one of the hypotheses H1, H2, ..., Hn, and let their probabilities and the corresponding conditional probabilities P(A|H1), P(A|H2), ..., P(A|Hn) be known. Then the probability that the event A occurs is:

P(A) = P(H1)*P(A|H1) + P(H2)*P(A|H2) + ... + P(Hn)*P(A|Hn)

This formula is called the total probability formula. In textbooks it is formulated as a theorem whose proof is elementary: according to the algebra of events, (event A occurs) = (hypothesis H1 occurs and after it comes event A, or hypothesis H2 occurs and after it comes event A, or ... or hypothesis Hn occurs and after it comes event A). Since the hypotheses are incompatible and the event A is dependent on them, by the theorem of addition of probabilities of incompatible events (first step) and the theorem of multiplication of probabilities of dependent events (second step):

P(A) = P(H1)*P(A|H1) + P(H2)*P(A|H2) + ... + P(Hn)*P(A|Hn)

Many people probably anticipate the content of the first example =)

Wherever you spit, there is an urn:

Problem 1

There are three identical urns. The first urn contains 4 white and 7 black balls, the second contains only white balls, and the third contains only black balls. One urn is selected at random and a ball is drawn from it at random. What is the probability that this ball is black?

Solution: consider the event A: a black ball will be drawn from a randomly chosen urn. This event can occur only as a result of one of the following hypotheses:
– H1: the 1st urn will be selected;
– H2: the 2nd urn will be selected;
– H3: the 3rd urn will be selected.

Since the urn is chosen at random, the choice of any of the three urns is equally probable, hence: P(H1) = P(H2) = P(H3) = 1/3.

Please note that the above hypotheses form a full group of events; that is, by the condition, a black ball can appear only from these urns and, for example, cannot arrive from a billiard table. Let's do a simple intermediate check:
1/3 + 1/3 + 1/3 = 1, OK, let's move on:

The first urn contains 4 white + 7 black = 11 balls, so by the classical definition:
P(A|H1) = 7/11 is the probability of drawing a black ball given that the 1st urn is selected.

The second urn contains only white balls, so if it is chosen, the appearance of a black ball becomes impossible: P(A|H2) = 0.

And finally, the third urn contains only black balls, which means the corresponding conditional probability of drawing a black ball is P(A|H3) = 1 (the event is certain).



By the total probability formula:
P(A) = 1/3 * 7/11 + 1/3 * 0 + 1/3 * 1 = 7/33 + 11/33 = 18/33 = 6/11 is the probability that a black ball will be drawn from a randomly chosen urn.

Answer: P(A) = 6/11 ≈ 0.545
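The same computation as a small reusable helper (a sketch with exact fractions):

```python
from fractions import Fraction

def total_probability(priors, conditionals):
    """Total probability formula: P(A) = sum of P(Hi) * P(A|Hi)."""
    return sum(p * c for p, c in zip(priors, conditionals))

priors = [Fraction(1, 3)] * 3
cond_black = [Fraction(7, 11), Fraction(0), Fraction(1)]  # P(black | urn i)
print(total_probability(priors, cond_black))  # 6/11
```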

The analyzed example again suggests how important it is to read the CONDITION carefully. Take the same problems with urns and balls: despite their outward similarity, the solution methods can be completely different; somewhere you only need the classical definition of probability, somewhere the events are independent, somewhere dependent, and somewhere it is a matter of hypotheses. At the same time, there is no clear formal criterion for choosing a solution path; you almost always need to think it over. How to improve your skills? We solve, we solve and we solve again!

Problem 2

The shooting range has 5 rifles of varying accuracy. For a given shooter, the probabilities of hitting the target with these rifles are respectively 0.5; 0.55; 0.7; 0.75 and 0.4. What is the probability of hitting the target if the shooter fires one shot from a randomly selected rifle?

A short solution and answer at the end of the lesson.

In most thematic problems, the hypotheses are, of course, not equally probable:

Problem 3

There are 5 rifles in the rack, three of which are equipped with a telescopic sight. The probability that a shooter hits the target when firing a rifle with a telescopic sight is 0.95; for a rifle without a telescopic sight, this probability is 0.7. Find the probability that the target will be hit if the shooter fires one shot from a rifle taken at random.

Solution: in this problem the number of rifles is exactly the same as in the previous one, but there are only two hypotheses:
– H1: the shooter will select a rifle with a telescopic sight;
– H2: the shooter will select a rifle without a telescopic sight.
By the classical definition of probability: P(H1) = 3/5 = 0.6, P(H2) = 2/5 = 0.4.
Check: 0.6 + 0.4 = 1.

Consider the event A: the shooter hits the target with a rifle taken at random.
By the condition: P(A|H1) = 0.95, P(A|H2) = 0.7.

By the total probability formula:
P(A) = 0.6 * 0.95 + 0.4 * 0.7 = 0.57 + 0.28 = 0.85

Answer: 0.85

In practice, a shortened way of writing up the solution, which you are also familiar with, is quite acceptable:

Solution: by the classical definition: P(H1) = 0.6 and P(H2) = 0.4 are the probabilities of choosing a rifle with and without a telescopic sight, respectively.

By the condition, P(A|H1) = 0.95 and P(A|H2) = 0.7 are the probabilities of hitting the target with the corresponding type of rifle.

By the total probability formula:
P(A) = 0.6 * 0.95 + 0.4 * 0.7 = 0.85 is the probability that the shooter hits the target with a randomly selected rifle.

Answer: 0.85

The following task is for you to solve on your own:

Problem 4

The engine operates in three modes: normal, forced, and idle. In idle mode the probability of failure is 0.05, in normal mode 0.1, and in forced mode 0.7. The engine operates in normal mode 70% of the time and in forced mode 20%. What is the probability of engine failure during operation?

Just in case, let me remind you that to get probability values the percentages must be divided by 100. Be very careful! According to my observations, problem setters often try to muddle the conditions of problems on the total probability formula, and I specifically chose this example. I'll tell you a secret: I almost got confused myself =)

Solution at the end of the lesson (formatted in a short way)
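If you want to check yourself afterwards, here is a sketch of the computation (the idle share is the remaining 100% - 70% - 20% = 10%):

```python
# Mode: (share of operating time, probability of failure in that mode)
modes = {
    "normal": (0.70, 0.10),
    "forced": (0.20, 0.70),
    "idle":   (0.10, 0.05),
}

p_failure = sum(share * p_fail for share, p_fail in modes.values())
print(p_failure)  # 0.07 + 0.14 + 0.005 = 0.215
```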

Problems using Bayes' formulas

The material is closely related to the content of the previous paragraph. Let the event A occur as a result of the realization of one of the hypotheses H1, H2, ..., Hn. How do we determine the probability that a particular hypothesis took place?

Given that the event A has already happened, the probabilities of the hypotheses are re-estimated by the formulas named after the English priest Thomas Bayes:

P(H1|A) = P(H1) * P(A|H1) / P(A) is the probability that hypothesis H1 took place;

P(H2|A) = P(H2) * P(A|H2) / P(A) is the probability that hypothesis H2 took place;

...

P(Hn|A) = P(Hn) * P(A|Hn) / P(A) is the probability that hypothesis Hn took place.

At first glance it seems completely absurd: why recalculate the probabilities of the hypotheses if they are already known? But in fact there is a difference:

P(Hi) are the a priori probabilities (estimated before the trial).

P(Hi|A) are the a posteriori probabilities of the same hypotheses (estimated after the trial), recalculated in connection with "newly discovered circumstances", that is, taking into account the fact that the event A definitely happened.

Let's look at this difference with a specific example:

Problem 5

2 batches of products arrived at the warehouse: the first - 4000 pieces, the second - 6000 pieces. The average percentage of non-standard products in the first batch is 20%, and in the second – 10%. The product taken from the warehouse at random turned out to be standard. Find the probability that it is: a) from the first batch, b) from the second batch.

The first part of the solution uses the total probability formula. In other words, the calculations are carried out under the assumption that the trial has not yet been performed and the event "the product turned out to be standard" has not yet occurred.

Let's consider two hypotheses:
– H1: a product taken at random will be from the 1st batch;
– H2: a product taken at random will be from the 2nd batch.

In total there are 4000 + 6000 = 10000 products in the warehouse. By the classical definition:
P(H1) = 4000/10000 = 0.4, P(H2) = 6000/10000 = 0.6.

Check: 0.4 + 0.6 = 1.

Let's consider the dependent event A: a product taken at random from the warehouse is standard.

The first batch contains 100% – 20% = 80% standard products, therefore P(A|H1) = 0.8 is the probability that a product taken at random from the warehouse is standard, given that it belongs to the 1st batch.

Similarly, the second batch contains 100% – 10% = 90% standard products, so P(A|H2) = 0.9 is the probability that a product taken at random from the warehouse is standard, given that it belongs to the 2nd batch.

By the total probability formula:
P(A) = 0.4 * 0.8 + 0.6 * 0.9 = 0.32 + 0.54 = 0.86 is the probability that a product taken at random from the warehouse is standard.

Part two. Suppose a product taken at random from the warehouse turned out to be standard. This phrase is stated directly in the condition, and it asserts the fact that the event A has happened.

By Bayes' formulas:

a) P(H1|A) = P(H1) * P(A|H1) / P(A) = 0.32 / 0.86 ≈ 0.372 is the probability that the selected standard product belongs to the 1st batch;

b) P(H2|A) = P(H2) * P(A|H2) / P(A) = 0.54 / 0.86 ≈ 0.628 is the probability that the selected standard product belongs to the 2nd batch.

After the re-estimation, the hypotheses, of course, still form a full group:
0.372 + 0.628 = 1 (check ;-))

Answer: P(H1|A) ≈ 0.372, P(H2|A) ≈ 0.628
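The same re-estimation as a tiny sketch:

```python
def bayes_posteriors(priors, likelihoods):
    """Re-estimate hypothesis probabilities given that the event has occurred."""
    p_a = sum(p * l for p, l in zip(priors, likelihoods))  # total probability
    return [p * l / p_a for p, l in zip(priors, likelihoods)]

print(bayes_posteriors([0.4, 0.6], [0.8, 0.9]))  # [0.3720..., 0.6279...]
```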

Ivan Vasilyevich, who has once again changed his profession and become the director of the plant, will help us understand the meaning of this re-estimation of the hypotheses. He knows that today the 1st workshop shipped 4000 products to the warehouse and the 2nd workshop 6000 products, and he comes to make sure of this. Let's assume that all the products are of the same type and sit in a common container. Naturally, Ivan Vasilyevich has calculated in advance that the product he is about to remove for inspection was produced by the 1st workshop with probability 0.4 and by the 2nd with probability 0.6. But after the chosen product turns out to be standard, he exclaims: "What a cool bolt! It was most likely made by the 2nd workshop." Thus, the probability of the second hypothesis is re-estimated upward and the probability of the first downward: P(H2|A) ≈ 0.628 > 0.6, P(H1|A) ≈ 0.372 < 0.4. And this re-estimation is not unfounded: after all, the 2nd workshop not only produced more products, but also works twice as well!

Pure subjectivism, you say? Partly, yes; moreover, Bayes himself interpreted a posteriori probabilities as degrees of belief. However, not everything is so simple: there is an objective grain in the Bayesian approach too. After all, the probabilities that a product is standard (0.8 and 0.9 for the 1st and 2nd workshops, respectively) are preliminary (a priori) and average estimates. But, speaking philosophically, everything flows and everything changes, including probabilities. It is quite possible that at the time of the study the more successful 2nd workshop had increased its percentage of standard products (and/or the 1st workshop had reduced its own), and if you check a larger number of products, or all 10 thousand in the warehouse, the re-estimated values will turn out to be much closer to the truth.

By the way, if Ivan Vasilyevich extracts a non-standard part, then, on the contrary, he will be more "suspicious" of the 1st workshop and less of the 2nd. I suggest you check this for yourself:

Problem 6

2 batches of products arrived at the warehouse: the first of 4000 pieces, the second of 6000 pieces. The average percentage of non-standard products in the first batch is 20%, in the second 10%. A product taken from the warehouse at random turned out to be NON-standard. Find the probability that it is: a) from the first batch; b) from the second batch.

The condition differs only in the highlighted prefix. The problem can be solved from scratch or by using the results of the previous calculations. In the sample I carried out a complete solution, but in order to avoid formal overlap with Problem 5, the event "a product taken at random from the warehouse is non-standard" is denoted by a different letter.

The Bayesian scheme of re-estimating probabilities is found everywhere, and it is also actively exploited by all kinds of scammers. Consider the three-letter joint-stock company that has become a household name, which attracts deposits from the public, supposedly invests them somewhere, regularly pays dividends, and so on. What is happening? Day after day, month after month, more and more new facts, conveyed through advertising and word of mouth, only increase the level of trust in the financial pyramid (an a posteriori Bayesian re-estimation driven by past events!). That is, in the eyes of the investors the probability that "this is a serious company" constantly grows, while the probability of the opposite hypothesis ("these are just more scammers") naturally decreases and decreases. What follows, I think, is clear. It is noteworthy that the earned reputation gives the organizers time to hide successfully from Ivan Vasilyevich, who is left not only without a batch of bolts, but also without his pants.

We will return to equally interesting examples a little later; for now, let us consider perhaps the most common case, with three hypotheses:

Problem 7

Electric lamps are manufactured at three factories. The 1st plant produces 30% of the total number of lamps, the 2nd - 55%, and the 3rd - the rest. The products of the 1st plant contain 1% of defective lamps, the 2nd - 1.5%, the 3rd - 2%. The store receives products from all three factories. The purchased lamp turned out to be defective. What is the probability that it was produced by plant 2?

Note that in problems on Bayes' formulas the condition necessarily contains a certain event that has happened, in this case the purchase of a lamp.

There are more events now, and it is more convenient to arrange the solution in the "quick" style.

The algorithm is exactly the same: at the first step we find the probability that the purchased lamp turns out to be defective.

Using the initial data, we convert the percentages into probabilities:
P(H1) = 0.3, P(H2) = 0.55, P(H3) = 1 – 0.3 – 0.55 = 0.15 are the probabilities that the lamp was produced by the 1st, 2nd and 3rd factories, respectively.
Check: 0.3 + 0.55 + 0.15 = 1.

Similarly: P(A|H1) = 0.01, P(A|H2) = 0.015, P(A|H3) = 0.02 are the probabilities that a lamp produced by the corresponding factory is defective.

By the total probability formula:

P(A) = 0.3 * 0.01 + 0.55 * 0.015 + 0.15 * 0.02 = 0.003 + 0.00825 + 0.003 = 0.01425 is the probability that the purchased lamp is defective.

Step two. Let the purchased lamp be defective (the event A occurred).

By Bayes' formula:
P(H2|A) = P(H2) * P(A|H2) / P(A) = 0.00825 / 0.01425 = 11/19 ≈ 0.579 is the probability that the purchased defective lamp was manufactured by the 2nd plant.

Answer: P(H2|A) = 11/19 ≈ 0.579

Why did the probability of the 2nd hypothesis increase after the re-estimation? After all, the second plant produces lamps of average quality (the first is better, the third worse). So why did the a posteriori probability that the defective lamp is from the 2nd plant increase? This is explained not by "reputation" but by size. Since plant No. 2 produced the largest number of lamps, the blame (at least subjectively) falls on it: "most likely, this defective lamp is from there".

It is interesting to note that the probabilities of the 1st and 3rd hypotheses were re-estimated in the expected directions and became equal:
P(H1|A) = 0.003 / 0.01425 = 4/19 ≈ 0.211, P(H3|A) = 0.003 / 0.01425 = 4/19 ≈ 0.211.

Check: 4/19 + 11/19 + 4/19 = 1, which was what needed to be verified.
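The same three-hypothesis re-estimation with exact fractions (a sketch):

```python
from fractions import Fraction

priors = [Fraction(30, 100), Fraction(55, 100), Fraction(15, 100)]
defect = [Fraction(1, 100), Fraction(15, 1000), Fraction(2, 100)]  # P(defective | plant i)

p_a = sum(p * d for p, d in zip(priors, defect))
posteriors = [p * d / p_a for p, d in zip(priors, defect)]
print(p_a, *posteriors)  # 57/4000 (= 0.01425) and 4/19 11/19 4/19
```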

By the way, about underestimated and overestimated estimates:

Problem 8

In a student group, 3 people have a high level of preparation, 19 people an average level, and 3 people a low level. The probabilities of successfully passing the exam for these students are respectively 0.95, 0.7 and 0.4. It is known that some student passed the exam. What is the probability that:

a) he was prepared very well;
b) was moderately prepared;
c) was poorly prepared.

Perform calculations and analyze the results of re-evaluating the hypotheses.

The task is close to reality and is especially plausible for a group of part-time students, where the teacher knows virtually nothing about the abilities of a particular student. In this case the result can have quite unexpected consequences (especially for exams in the 1st semester). If a poorly prepared student is lucky with the ticket, the teacher will likely consider him a good or even a strong student, which will bring dividends in the future (naturally, he will need to "raise the bar" and maintain the image). If a student studied, crammed and reviewed for 7 days and 7 nights but was simply unlucky, further events can develop in the worst possible way, with numerous retakes and balancing on the brink of expulsion.

Needless to say, reputation is the most important capital; it is no coincidence that many corporations bear the names of their founding fathers, who led the business 100-200 years ago and became famous for their impeccable reputation.

Yes, the Bayesian approach is to a certain extent subjective, but... that’s how life works!

Let's consolidate the material with a final industrial example, in which I will describe some technical subtleties of the solution not mentioned so far:

Problem 9

Three workshops of the plant produce the same type of parts, which are sent to a common container for assembly. It is known that the first workshop produces 2 times more parts than the second workshop, and 4 times more than the third workshop. In the first workshop, defects are 12%, in the second – 8%, in the third – 4%. For control, one part is taken from the container. What is the probability that it will be defective? What is the probability that the extracted defective part was produced by the 3rd workshop?

Ivan Vasilyevich is on horseback again =) The film must have a happy ending =)

Solution: unlike Problems 5-8, here a question is explicitly asked that is resolved with the total probability formula. But the condition is a little "encrypted", and the school skill of setting up simple equations will help us crack this puzzle. It is convenient to take the smallest value as "x":

Let x be the share of parts produced by the third workshop.

By the condition, the first workshop produces 4 times more parts than the third, so its share is 4x.

In addition, the first workshop produces 2 times more products than the second, which means the share of the latter is 4x / 2 = 2x.

Let's set up and solve the equation:
4x + 2x + x = 1, hence 7x = 1 and x = 1/7.

Thus: P(H1) = 4/7, P(H2) = 2/7, P(H3) = 1/7 are the probabilities that a part removed from the container was produced by the 1st, 2nd and 3rd workshops, respectively.

Check: 4/7 + 2/7 + 1/7 = 1. In addition, it is worth looking again at the phrase "It is known that the first workshop produces 2 times more parts than the second workshop, and 4 times more than the third workshop" and making sure that the obtained probability values actually match this condition.

Initially, one could take the share of the 1st or the 2nd workshop as "x"; the probabilities would come out the same. One way or another, the hardest part is behind us, and the solution is on track:

From the condition we find:
P(A|H1) = 0.12, P(A|H2) = 0.08, P(A|H3) = 0.04 are the probabilities that a part produced by the corresponding workshop is defective.

By the total probability formula:
P(A) = 4/7 * 0.12 + 2/7 * 0.08 + 1/7 * 0.04 = 0.68/7 = 17/175 ≈ 0.097 is the probability that a part randomly removed from the container turns out to be defective.

Question two: what is the probability that the extracted defective part was produced by the 3rd workshop? This question assumes that the part has already been removed and has turned out to be defective. We re-estimate the hypothesis by Bayes' formula:
P(H3|A) = P(H3) * P(A|H3) / P(A) = (1/7 * 0.04) / (17/175) = 1/17 ≈ 0.059 is the desired probability. Completely expected: after all, the third workshop not only produces the smallest share of parts, but also leads in quality!

In this case we had to simplify a four-story fraction, which you have to do quite often in problems on Bayes' formulas. For this lesson, though, I happened to pick examples in which many calculations can be carried out without ordinary fractions.

Since the condition contains no items "a" and "b", it is better to provide the answer with text comments:

Answer: P(A) = 17/175 ≈ 0.097 is the probability that a part removed from the container is defective; P(H3|A) = 1/17 ≈ 0.059 is the probability that the extracted defective part was produced by the 3rd workshop.
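And a sketch of Problem 9 in code, with the shares derived from the equation 4x + 2x + x = 1:

```python
from fractions import Fraction

x = Fraction(1, 7)                       # share of the 3rd workshop
priors = [4 * x, 2 * x, x]               # shares of workshops 1, 2, 3
defect = [Fraction(12, 100), Fraction(8, 100), Fraction(4, 100)]

p_defective = sum(p * d for p, d in zip(priors, defect))
posterior_3 = priors[2] * defect[2] / p_defective
print(p_defective, posterior_3)          # 17/175 1/17
```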

As you can see, problems on the total probability formula and Bayes' formula are quite simple, and probably for this reason problem setters so often try to complicate the condition, which I already mentioned at the beginning of the article.

Additional examples can be found in the file of ready-made solutions on the total probability formula and Bayes' formulas; in addition, there will surely be those who want to get acquainted with this topic more deeply in other sources. And the topic is really very interesting: consider alone the "Bayes paradox", which justifies the everyday advice that if a person is diagnosed with a rare disease, it makes sense to conduct a repeat, or even two repeat, independent examinations. It would seem that this is done solely out of desperation... but no! Still, let's not talk about sad things.


P(A) = 3/25 * 0.95 + 19/25 * 0.7 + 3/25 * 0.4 = 0.114 + 0.532 + 0.048 = 0.694 is the probability that a randomly selected student will pass the exam.
Let the student pass the exam. By Bayes' formulas:
a) P(H1|A) = 0.114 / 0.694 ≈ 0.164 is the probability that the student who passed the exam was very well prepared. The objective initial probability (3/25 = 0.12) turns out to be re-estimated upward, since almost always some "average" students are lucky with the questions and answer very strongly, which gives the erroneous impression of impeccable preparation.
b) P(H2|A) = 0.532 / 0.694 ≈ 0.767 is the probability that the student who passed the exam was moderately prepared. The initial probability (19/25 = 0.76) turns out to be only slightly re-estimated upward, because students with an average level of preparation are usually the majority; in addition, the teacher will include here the "excellent" students who answered unsuccessfully and, occasionally, a poorly performing student who was very lucky with the ticket.
c) P(H3|A) = 0.048 / 0.694 ≈ 0.069 is the probability that the student who passed the exam was poorly prepared. The initial probability (3/25 = 0.12) is re-estimated downward. No wonder.
Check: 0.164 + 0.767 + 0.069 = 1
Answer: P(H1|A) ≈ 0.164, P(H2|A) ≈ 0.767, P(H3|A) ≈ 0.069

Formulate and prove the formula for total probability. Give an example of its application.

If the events H1, H2, ..., Hn are pairwise incompatible and at least one of these events necessarily occurs in each trial, then for any event A the following equality holds:

P(A) = P_H1(A)P(H1) + P_H2(A)P(H2) + ... + P_Hn(A)P(Hn), the total probability formula. Here H1, H2, ..., Hn are called hypotheses, and P_Hi(A) denotes the conditional probability of A given Hi.

Proof: Event A splits into the variants AH1, AH2, ..., AHn (A occurs together with H1, and so on). In other words, A = AH1 + AH2 + ... + AHn. Since H1, H2, ..., Hn are pairwise incompatible, the events AH1, AH2, ..., AHn are also incompatible. Applying the addition rule, we find: P(A) = P(AH1) + P(AH2) + ... + P(AHn). Replacing each term P(AHi) on the right-hand side with the product P_Hi(A)P(Hi), we obtain the required equality.

Example:

Suppose we have two sets of parts. The probability that a part from the first set is standard is 0.8, and from the second 0.9. A set is chosen at random (each with probability 0.5), and a part is taken from it at random. Let's find the probability that this part is standard.

P(A) = 0.5*0.8 + 0.5*0.9 = 0.85.

Formulate and prove Bayes' formula. Give an example of its application.

Bayes' formula:

P_A(Hi) = P(Hi)P_Hi(A) / P(A), i = 1, 2, ..., n.

It allows one to re-estimate the probabilities of the hypotheses after the result of the trial in which event A appeared becomes known.

Proof: Let event A occur subject to the occurrence of one of the incompatible events H1, H2, ..., Hn forming a complete group. Since it is not known in advance which of these events will occur, they are called hypotheses.

The probability of occurrence of event A is determined by the total probability formula:

(1) P(A) = P_H1(A)P(H1) + P_H2(A)P(H2) + ... + P_Hn(A)P(Hn)

Suppose a trial has been carried out, as a result of which event A appeared. Let us determine how the probabilities of the hypotheses have changed, given that event A has already occurred. In other words, we look for the conditional probabilities

P_A(H1), P_A(H2), ..., P_A(Hn).

By the multiplication theorem we have:

P(AHi) = P(A)P_A(Hi) = P(Hi)P_Hi(A)

Dividing by P(A) and replacing P(A) according to formula (1), we obtain:

P_A(Hi) = P(Hi)P_Hi(A) / [P_H1(A)P(H1) + P_H2(A)P(H2) + ... + P_Hn(A)P(Hn)]

Example:

There are three identical-looking boxes. In the first box there are n = 12 white balls, in the second m = 4 white and n – m = 8 black balls, in the third n = 12 black balls. A white ball is drawn from a box chosen at random. Find the probability P that the ball was drawn from the second box.

Solution. The priors are P(H1) = P(H2) = P(H3) = 1/3, and the conditional probabilities of drawing a white ball are P_H1(A) = 1, P_H2(A) = 4/12 = 1/3, P_H3(A) = 0. By the total probability formula, P(A) = 1/3 * (1 + 1/3 + 0) = 4/9, and by Bayes' formula P = P_A(H2) = (1/3 * 1/3) / (4/9) = 1/4 = 0.25.

4) Derive the formula for the probability of k successes in a series of n trials according to the Bernoulli scheme.

Consider the case of n identical and independent trials, each of which has only 2 outcomes (A; Ā); i.e., some experiment is repeated n times, and in each trial some event A may appear with probability P(A) = p or not appear with probability P(Ā) = 1 – p = q.

The space of elementary events of each series of trials contains 2^n points, i.e. sequences of the symbols A and Ā. Such a probability space is called a Bernoulli scheme. The task is, for a given k, to find the probability that in the n-fold repetition of the experiment the event A occurs k times.

For greater clarity, let us agree to regard each occurrence of event A as a success and each non-occurrence of A as a failure. Our goal is to find the probability that exactly k of the n trials are successful; let us denote this event temporarily by B.

The event B is represented as the sum of a number of variants. To record a specific variant, one needs to indicate the numbers of those trials that end in success; for example, one possible variant is "the first k trials succeed and the remaining n – k fail". The number of all variants is obviously C(n, k), the number of ways to choose k trial numbers out of n, and the probability of each variant, due to the independence of the trials, is p^k * q^(n–k). Hence the probability of the event B equals C(n, k) * p^k * q^(n–k). To emphasize the dependence of this expression on n and k, we denote it P_n(k). So, P_n(k) = C(n, k) * p^k * q^(n–k).
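A minimal sketch of this formula in Python (math.comb is the binomial coefficient C(n, k)):

```python
from math import comb

def bernoulli(n, k, p):
    """P_n(k) = C(n, k) * p^k * q^(n-k): exactly k successes in n trials."""
    q = 1 - p
    return comb(n, k) * p**k * q**(n - k)

print(bernoulli(12, 8, 0.7))                          # e.g. 8 successes out of 12
print(sum(bernoulli(12, k, 0.7) for k in range(13)))  # sanity check: sums to 1.0
```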

5) Using the integral approximate Laplace formula, derive a formula for estimating the deviation of the relative frequency of event A from the probability p of the occurrence of A in one experiment.

Under the conditions of the Bernoulli scheme with given values of n and p, for a given ε > 0 we estimate the probability of the event |k/n – p| ≤ ε, where k is the number of successes in n trials. This inequality is equivalent to |k – np| ≤ εn, i.e. –εn ≤ k – np ≤ εn, or np – εn ≤ k ≤ np + εn. Thus, we are talking about estimating the probability of the event k1 ≤ k ≤ k2, where k1 = np – εn, k2 = np + εn. Applying the integral approximate Laplace formula, we obtain: P(k1 ≤ k ≤ k2) ≈ Φ((k2 – np)/√(npq)) – Φ((k1 – np)/√(npq)). Taking into account the oddness of the Laplace function Φ, we obtain the approximate equality P(|k/n – p| ≤ ε) ≈ 2Φ(ε√(n/(pq))).

Note: if by the condition n = 1, then we substitute one for n and obtain the final answer.

6) Let X be a discrete random variable that takes only non-negative values and has mathematical expectation m. Prove that P(X ≥ 4) ≤ m/4.

m = Σ_k x_k * P(X = x_k) (since every term is non-negative, dropping the terms with x_k < 4 cannot increase the sum) ≥ Σ_{x_k ≥ 4} x_k * P(X = x_k) (replacing each x_k by 4 makes it smaller still) ≥ 4 * Σ_{x_k ≥ 4} P(X = x_k) = 4 * P(X ≥ 4). From here P(X ≥ 4) ≤ m/4.

(Instead of 4 there can be any positive number.)

7) Prove that if X and Y are independent discrete random variables that take a finite set of values, then M(XY) = M(X)M(Y).

The mathematical expectation of a discrete random variable X with distribution law

X: x1, x2, ...
P: p1, p2, ...

is the number M(X) = x1*p1 + x2*p2 + ...

If the random variables X and Y are independent, then the mathematical expectation of their product equals the product of their mathematical expectations (the theorem on the multiplication of mathematical expectations).

Proof: Denote the possible values of X by x1, x2, ..., the possible values of Y by y1, y2, ..., and let p_ij = P(X = x_i, Y = y_j). The variable XY takes the values x_i*y_j, so M(XY) = Σ_{i,j} x_i*y_j*p_ij. Due to the independence of X and Y we have P(X = x_i, Y = y_j) = P(X = x_i) * P(Y = y_j). Denoting P(X = x_i) = r_i and P(Y = y_j) = s_j, we rewrite this equality as p_ij = r_i*s_j.

Thus, M(XY) = Σ_{i,j} x_i*y_j*r_i*s_j = (Σ_i x_i*r_i) * (Σ_j y_j*s_j) = M(X)M(Y), Q.E.D.

8) Prove that if X and Y are discrete random variables that take a finite set of values, then M(X + Y) = M(X) + M(Y).

The mathematical expectation of a discrete random variable X with distribution law

X: x1, x2, ...
P: p1, p2, ...

is the number M(X) = x1*p1 + x2*p2 + ...

The mathematical expectation of the sum of two random variables equals the sum of the mathematical expectations of the summands: M(X + Y) = M(X) + M(Y).

Proof: Denote the possible values of X by x1, x2, ..., the possible values of Y by y1, y2, ..., and let p_ij = P(X = x_i, Y = y_j). The distribution law of X + Y can be written in the corresponding table, and M(X + Y) = Σ_{i,j} (x_i + y_j)*p_ij. This formula can be rewritten as M(X + Y) = Σ_{i,j} x_i*p_ij + Σ_{i,j} y_j*p_ij. The first sum on the right side can be represented as Σ_i x_i*(Σ_j p_ij). The expression Σ_j p_ij is the probability that any of the events (X = x_i, Y = y1), (X = x_i, Y = y2), ... occurs, so it equals P(X = x_i). Hence Σ_{i,j} x_i*p_ij = Σ_i x_i*P(X = x_i) = M(X). Likewise, Σ_{i,j} y_j*p_ij = M(Y). As a result, M(X + Y) = M(X) + M(Y), which is what needed to be proved.

9) Let X be a discrete random variable distributed according to the binomial law with parameters n and p. Prove that M(X) = np, D(X) = np(1 – p).

Let n independent trials be performed, in each of which event A can occur with probability p, so that the probability of the opposite event Ā is q = 1 – p. Consider the random variable X, the number of occurrences of event A in the n trials. Represent X as the sum of the indicators of event A in each trial: X = X1 + X2 + ... + Xn. Let us first show that M(Xi) = p and D(Xi) = pq. To do this, consider the distribution law of the indicator, which looks like:

Xi: 1, 0
P: p, q

It is obvious that M(Xi) = p. The random variable Xi² has the same distribution law, therefore D(Xi) = M(Xi²) – M²(Xi) = p – p² = p(1 – p) = pq. Thus, M(Xi) = p and D(Xi) = pq. By the theorem on the addition of mathematical expectations, M(X) = M(X1) + ... + M(Xn) = np. Since the random variables Xi are independent, the variances also add up: D(X) = D(X1) + ... + D(Xn) = npq = np(1 – p).

10) Let X be a discrete random variable distributed according to Poisson's law with parameter λ. Prove that M(X) = λ.

Poisson's law is given by the table: P(X = k) = λ^k * e^(–λ) / k!, k = 0, 1, 2, ...

From here we have:
M(X) = Σ_{k≥0} k * λ^k * e^(–λ) / k! = λ * e^(–λ) * Σ_{k≥1} λ^(k–1) / (k–1)! = λ * e^(–λ) * e^λ = λ.

Thus, the parameter λ that characterizes the Poisson distribution is nothing more than the mathematical expectation of the value X.

11) Let X be a discrete random variable distributed according to the geometric law with parameter p. Prove that M(X) = 1/p.

The geometric distribution arises in a sequence of Bernoulli trials carried out until the first occurrence of event A. The probability of event A in a single trial is p, and that of the opposite event is q = 1−p. The distribution law of the random variable X, the number of trials performed, has the form:

X | 1 | 2  | … | n        | …
p | p | pq | … | pq^(n−1) | …

Hence M(X) = Σ_{n=1}^∞ n·p·q^(n−1) = p·(1 + 2q + 3q² + …). The series written in brackets is obtained by term-by-term differentiation of the geometric progression 1 + q + q² + … = 1/(1−q), and therefore equals 1/(1−q)².

Hence, M(X) = p/(1−q)² = p/p² = 1/p.
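A quick numerical check of the series (p is an arbitrary choice; terms beyond n = 2000 are negligible):

p = 0.25
q = 1 - p
m = sum(n * p * q ** (n - 1) for n in range(1, 2000))
print(m, 1 / p)  # ~4.0 and 4.0: M(X) = 1/p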

12) Prove that the correlation coefficient of random variables X and Y satisfies the condition |ρ| ≤ 1.

Definition: The correlation coefficient of two random variables is the ratio of their covariance to the product of the standard deviations of these variables: ρ = cov(X, Y)/(σ_X·σ_Y).

Proof: Consider the random variables Z± = X/σ_X ± Y/σ_Y and calculate their variance: D(Z±) = D(X)/σ_X² ± 2·cov(X, Y)/(σ_X·σ_Y) + D(Y)/σ_Y² = 2 ± 2ρ. Since the left-hand side is a variance, it is non-negative, so the right-hand side is non-negative as well: 2 ± 2ρ ≥ 0. Therefore −1 ≤ ρ ≤ 1, i.e. |ρ| ≤ 1.
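As a numerical illustration (a sketch; the data are randomly generated purely for the demonstration), the sample correlation coefficient of any paired dataset indeed lands in [−1, 1]:

import random

random.seed(0)
x = [random.gauss(0, 1) for _ in range(10_000)]
y = [xi + random.gauss(0, 1) for xi in x]  # deliberately correlated with x

mx, my = sum(x) / len(x), sum(y) / len(y)
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
sx = (sum((a - mx) ** 2 for a in x) / len(x)) ** 0.5
sy = (sum((b - my) ** 2 for b in y) / len(y)) ** 0.5
rho = cov / (sx * sy)
print(rho)  # ~0.7, and always within [-1, 1]
assert -1 <= rho <= 1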

13) How is the variance calculated in the case of a continuous distribution with density f(x)? Prove that there is a random variable X with a density for which the dispersion D(X) does not exist while the mathematical expectation M(X) does exist.

The variance of an absolutely continuous random variable X with density function f(x) and mathematical expectation m = M(X) is defined by the same kind of equality as for a discrete variable:

D(X) = ∫_{−∞}^{+∞} (x − m)²·f(x) dx.

In the case when an absolutely continuous random variable X is concentrated on an interval, the integral is taken over that interval. For the density in question the second-moment integral ∫ x²·f(x) dx = ∞ (the integral diverges); consequently the dispersion does not exist, even though the integral defining M(X) converges.
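The specific density was lost from the problem statement above. A standard textbook density with exactly the stated property (an assumed, purely illustrative choice on my part) is f(x) = 2/x³ on [1, ∞):

% Assumed illustrative density: finite mean, divergent second moment
f(x) = \begin{cases} 2/x^{3}, & x \ge 1, \\ 0, & x < 1, \end{cases}
\qquad
M(X) = \int_{1}^{\infty} x \cdot \frac{2}{x^{3}} \, dx = \int_{1}^{\infty} \frac{2}{x^{2}} \, dx = 2,
\qquad
M(X^{2}) = \int_{1}^{\infty} x^{2} \cdot \frac{2}{x^{3}} \, dx = \int_{1}^{\infty} \frac{2}{x} \, dx = \infty,

so M(X) exists while D(X) = M(X²) − M²(X) does not.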

14) Prove that for a normal random variable X with the distribution density function f(x) = (1/(σ·√(2π)))·e^(−(x−μ)²/(2σ²)) the mathematical expectation is M(X) = μ.

Let us prove that μ is the mathematical expectation.

By the definition of the mathematical expectation of a continuous random variable,

M(X) = ∫_{−∞}^{+∞} x·f(x) dx = (1/(σ·√(2π)))·∫_{−∞}^{+∞} x·e^(−(x−μ)²/(2σ²)) dx.

Let us introduce the new variable z = (x − μ)/σ. From here x = σz + μ and dx = σ·dz. Taking into account that the new limits of integration are equal to the old ones, we obtain

M(X) = (σ/√(2π))·∫_{−∞}^{+∞} z·e^(−z²/2) dz + (μ/√(2π))·∫_{−∞}^{+∞} e^(−z²/2) dz.

The first of the terms is equal to zero due to the oddness of the integrand. The second of the terms is equal to μ, since ∫_{−∞}^{+∞} e^(−z²/2) dz = √(2π) (the Poisson integral).

So, M(X)=μ, i.e. the mathematical expectation of a normal distribution is equal to the parameter μ.

15) Prove that for a normal random variable X with the distribution density above, the dispersion is D(X) = σ².

The formula above describes the density of the normal probability distribution of a continuous random variable.

Let us prove that σ is the standard deviation of the normal distribution. By definition,

D(X) = ∫_{−∞}^{+∞} (x − μ)²·f(x) dx = (1/(σ·√(2π)))·∫_{−∞}^{+∞} (x − μ)²·e^(−(x−μ)²/(2σ²)) dx.

Let us introduce the new variable z = (x − μ)/σ. From here x − μ = σz and dx = σ·dz. Taking into account that the new limits of integration are equal to the old ones, we obtain

D(X) = (σ²/√(2π))·∫_{−∞}^{+∞} z²·e^(−z²/2) dz.

Integrating by parts with u = z and dv = z·e^(−z²/2)·dz, we find that the remaining integral equals √(2π), so D(X) = σ². Therefore, the standard deviation of the normal distribution is equal to the parameter σ.
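A numerical cross-check of both results by direct integration (a minimal sketch assuming SciPy is available; μ and σ are chosen arbitrarily for the check):

from math import exp, pi, sqrt
from scipy.integrate import quad

mu, sigma = 1.5, 2.0
f = lambda x: exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

m, _ = quad(lambda x: x * f(x), -50, 50)              # mean
d, _ = quad(lambda x: (x - mu) ** 2 * f(x), -50, 50)  # variance about mu
print(m, d)  # ~1.5 and ~4.0, i.e. M(X) = mu and D(X) = sigma^2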

16) Prove that for a continuous random variable distributed according to the exponential law with parameter λ, the mathematical expectation is M(X) = 1/λ.

A random variable X taking only non-negative values is said to be distributed according to the exponential law if, for some positive parameter λ > 0, its density function has the form:

f(x) = λ·e^(−λx) for x ≥ 0, f(x) = 0 for x < 0.

To find the mathematical expectation we use the formula

M(X) = ∫₀^{+∞} x·λ·e^(−λx) dx.

Integrating by parts with u = x and dv = λ·e^(−λx)·dx gives M(X) = [−x·e^(−λx)]₀^{+∞} + ∫₀^{+∞} e^(−λx) dx = 0 + 1/λ = 1/λ.
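The same kind of numerical cross-check for the exponential law (a sketch assuming SciPy; λ is arbitrary):

from math import exp
from scipy.integrate import quad

lam = 2.0
m, _ = quad(lambda x: x * lam * exp(-lam * x), 0, float("inf"))
print(m, 1 / lam)  # ~0.5 and 0.5: M(X) = 1/lam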

Bayes formula

Bayes' theorem is one of the main theorems of elementary probability theory; it determines the probability of an event in a situation where only partial information about the events, obtained from observations, is available. Using Bayes' formula one can recalculate a probability more accurately, taking into account both previously known information and the data of new observations.

"Physical meaning" and terminology

Bayes' formula allows one to "swap cause and effect": from the known fact that an event occurred, to calculate the probability that it was caused by a given cause.

Events reflecting the action of "causes" are in this case usually called hypotheses, since they are the presumed events that led to the observed one. The unconditional probability that a hypothesis is true is called a priori (how likely the cause is at all), and the conditional probability, taking the fact of the event into account, a posteriori (how likely the cause turns out to be given the event data).

Corollary

An important consequence of Bayes' formula is the formula for the total probability of an event that depends on several mutually exclusive hypotheses (and only on them!):

P(B) = Σᵢ P(B | Aᵢ)·P(Aᵢ): the probability of occurrence of the event B that depends on a number of hypotheses Aᵢ, when the degrees of reliability of those hypotheses are known (for example, measured experimentally).

Derivation of the formula

If the event B depends only on the causes Aᵢ, then, given that it has occurred, some one of those causes has necessarily occurred with it, i.e.

Σᵢ P(Aᵢ | B) = 1.

According to Bayes' formula,

P(Aᵢ | B) = P(B | Aᵢ)·P(Aᵢ)/P(B).

Transferring P(B) to the right-hand side and summing over i, we obtain the desired expression P(B) = Σᵢ P(B | Aᵢ)·P(Aᵢ).

Spam filtering method

A method based on Bayes' theorem has found successful application in spam filtering.

Description

When training the filter, for each word encountered in letters its "weight" is computed and stored: an estimate of the probability that a letter containing this word is spam (in the simplest case, by the classical definition of probability: "appearances in spam / appearances in total").

When checking a newly arrived letter, the probability that it is spam is computed by the formula above over a set of hypotheses. Here the "hypotheses" are words: for each word, the "reliability of the hypothesis" is the share of that word in the letter, and the "dependence of the event on the hypothesis" P(B | Aᵢ) is the previously computed "weight" of the word. That is, in this scheme the "weight" of a letter is nothing other than the average "weight" of all its words.

A letter is classified as "spam" or "non-spam" according to whether its "weight" exceeds a certain threshold set by the user (usually 60–80%). After a decision has been made on a letter, the "weights" of the words occurring in it are updated in the database.
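A minimal sketch of the scheme just described. Everything here (the training messages, the threshold, the function name) is invented for illustration; a real filter would also need tokenization, smoothing, and feedback updates:

# Word "weights" from training counts: appearances in spam / appearances total.
spam_msgs = ["buy cheap pills now", "cheap pills online now"]
ham_msgs = ["meeting notes attached", "lunch plans for now"]

weights = {}
for w in set(" ".join(spam_msgs + ham_msgs).split()):
    in_spam = sum(m.split().count(w) for m in spam_msgs)
    in_all = sum(m.split().count(w) for m in spam_msgs + ham_msgs)
    weights[w] = in_spam / in_all

def letter_weight(text, default=0.5):
    # The letter's "weight" is the average weight of its words;
    # unseen words get a neutral default.
    ws = [weights.get(w, default) for w in text.split()]
    return sum(ws) / len(ws)

threshold = 0.7  # the user-set level, e.g. somewhere in 60-80%
for letter in ("cheap pills now", "meeting for lunch"):
    verdict = "spam" if letter_weight(letter) > threshold else "non-spam"
    print(f"{letter!r}: weight={letter_weight(letter):.2f} -> {verdict}")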

Characteristics

This method is simple (the algorithms are elementary), convenient (it lets one do without "blacklists" and similar artificial techniques), and effective (after training on a sufficiently large sample it cuts out up to 95–97% of spam, and in the case of errors it can be retrained). In short, everything points to its widespread use, which is exactly what happens in practice: almost all modern spam filters are built on its basis.

However, the method also has a fundamental drawback: it is based on the assumption that some words occur more often in spam while others occur more often in regular mail, and it is ineffective when this assumption is wrong. As practice shows, however, even a human cannot detect such spam "by eye"; it takes reading the letter and understanding its meaning.

Another drawback, not fundamental but related to implementation, is that the method only works with text. Knowing about this limitation, spammers began to put the advertising information into an image, while the text of the letter was either absent or meaningless. To counter this, one has to use either text-recognition tools (an "expensive" procedure, applied only when absolutely necessary) or old filtering methods, such as "blacklists" and regular expressions (since such letters often have a stereotyped form).




Siberian State University of Telecommunications and Informatics

Department of Higher Mathematics

in the discipline: “Probability Theory and Mathematical Statistics”

“The Total Probability Formula and the Bayes Formula and Their Application”

Completed:

Head: Professor B.P. Zelentsov

Novosibirsk, 2010


Introduction

1. Total probability formula

2. Bayes formula

3. Problems with solutions

4. The main areas of application of the Bayes formula

Conclusion

Literature


Introduction

Probability theory is one of the classical branches of mathematics, with a long history. The foundations of this branch of science were laid by great mathematicians; I will name, for example, Fermat, Bernoulli and Pascal.
Later, the development of probability theory continued in the works of many scientists.
A great contribution to probability theory was made by scientists of our country:
P. L. Chebyshev, A. M. Lyapunov, A. A. Markov, A. N. Kolmogorov. Probabilistic and statistical methods have now penetrated deeply into applications. They are used in physics, technology, economics, biology and medicine. Their role has grown especially in connection with the development of computer technology.

For example, to study physical phenomena, observations or experiments are made. Their results are usually recorded in the form of the values of some observable quantities. When experiments are repeated, we discover a scattering of their results. For example, by repeating measurements of the same quantity with the same device while maintaining certain conditions (temperature, humidity, etc.), we obtain results that differ at least slightly from each other. Even repeated measurements do not make it possible to predict exactly the result of the next measurement. In this sense one says that the result of a measurement is a random variable. An even more obvious example of a random variable is the number of a winning ticket in a lottery. Many other examples of random variables can be given. Nevertheless, in the world of chance certain patterns are revealed. The mathematical apparatus for studying such patterns is provided by probability theory.
Thus, probability theory deals with the mathematical analysis of random events and associated random variables.

1. Total probability formula.

Let there be a group of events H₁, H₂, …, H_n having the following properties:

1) all the events are pairwise incompatible: Hᵢ ∩ Hⱼ = ∅; i, j = 1, 2, …, n; i ≠ j;

2) their union forms the space of elementary outcomes Ω:

H₁ ∪ H₂ ∪ … ∪ H_n = Ω.

Fig. 8 (a Venn diagram: Ω partitioned by the events Hᵢ)

In this case we say that H₁, H₂, …, H_n form a full group of events. Such events are sometimes called hypotheses.

Let A be some event: A ⊂ Ω (the Venn diagram is shown in Fig. 8). Then the total probability formula holds:

P(A) = P(A/H₁)·P(H₁) + P(A/H₂)·P(H₂) + … + P(A/H_n)·P(H_n) = Σᵢ P(A/Hᵢ)·P(Hᵢ).

Proof. Obviously A = (A∩H₁) ∪ (A∩H₂) ∪ … ∪ (A∩H_n), and all the events A∩Hᵢ (i = 1, 2, …, n) are pairwise incompatible. From here, using the addition theorem of probabilities, we obtain

P(A) = P(A∩H₁) + P(A∩H₂) + … + P(A∩H_n).

If we take into account that by the multiplication theorem P(A∩Hᵢ) = P(A/Hᵢ)·P(Hᵢ) (i = 1, 2, …, n), then the total probability formula above follows from the last formula.

Example. A store sells electric lamps produced by three factories: the share of the first factory is 30%, of the second 50%, and of the third 20%. The defect rates in their products are 5%, 3% and 2% respectively. What is the probability that a lamp randomly selected in the store turns out to be defective?

Let H₁ be the event that the selected lamp was produced at the first factory, H₂ at the second, H₃ at the third. Obviously:

P(H₁) = 3/10, P(H₂) = 5/10, P(H₃) = 2/10.

Let A be the event that the selected lamp turned out to be defective; A/Hᵢ denotes the event that a defective lamp is selected among the lamps produced at the i-th factory. From the problem statement it follows that:

P(A/H₁) = 5/100; P(A/H₂) = 3/100; P(A/H₃) = 2/100.

Using the total probability formula we get

P(A) = (5/100)·(3/10) + (3/100)·(5/10) + (2/100)·(2/10) = 0.015 + 0.015 + 0.004 = 0.034.

2. Bayes formula

Let H₁, H₂, …, H_n be a complete group of events and A ⊂ Ω some event. Then, according to the formula for conditional probability,

P(Hₖ/A) = P(Hₖ∩A) / P(A).    (1)

Here P(Hₖ/A) is the conditional probability of the event (hypothesis) Hₖ, that is, the probability that Hₖ is realized given that the event A has occurred.

According to the probability multiplication theorem, the numerator of formula (1) can be represented as

P(Hₖ∩A) = P(A∩Hₖ) = P(A/Hₖ)·P(Hₖ).

To represent the denominator of formula (1), one can use the total probability formula:

P(A) = Σᵢ P(A/Hᵢ)·P(Hᵢ).

Now from (1) we can obtain the formula called the Bayes formula:

P(Hₖ/A) = P(A/Hₖ)·P(Hₖ) / Σᵢ P(A/Hᵢ)·P(Hᵢ).

The Bayes formula calculates the probability that the hypothesis Hₖ is realized given that the event A has occurred. The Bayes formula is also called the formula for the probability of hypotheses. The probability P(Hₖ) is called the prior probability of the hypothesis Hₖ, and the probability P(Hₖ/A) its posterior probability.

Theorem. The probability of a hypothesis after the test is equal to the product of the probability of the hypothesis before the test and the corresponding conditional probability of the event that occurred during the test, divided by the total probability of this event.

Example. Let us consider the lamp problem above, changing only the question. Suppose a customer bought an electric lamp in this store and it turned out to be defective. Find the probability that the lamp was manufactured at the second factory. The value P(H₂) = 0.5 is in this case the a priori probability of the event that the purchased lamp was manufactured at the second factory. Having received the information that the purchased lamp is defective, we can correct our assessment by calculating the a posteriori probability of this event:

P(H₂/A) = P(A/H₂)·P(H₂) / P(A) = (0.03·0.5) / 0.034 = 0.015/0.034 ≈ 0.44.

So the news that the lamp is defective slightly lowers the probability that it came from the second factory (from 0.50 to about 0.44).
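The same computation as a minimal sketch in code (the numbers are exactly those of the lamp example; the variable names are my own):

# Total probability of a defective lamp, then the posterior of each factory.
priors = [0.3, 0.5, 0.2]     # P(H1), P(H2), P(H3): the factories' shares
defect = [0.05, 0.03, 0.02]  # P(A/Hi): per-factory defect rates

p_a = sum(d * h for d, h in zip(defect, priors))  # = 0.034
posterior = [d * h / p_a for d, h in zip(defect, priors)]
print(p_a)        # 0.034
print(posterior)  # ~[0.441, 0.441, 0.118]; P(H2/A) ~ 0.44 vs. prior 0.5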