Cyberchondria: Why Bayes is a must when looking for web-based diagnoses

3 min read · Jul 18, 2016

By Mario Alemi, PhD, Chief Data Officer, Your.MD

London: 18 July 2016

The NY Times (and a few others) ran a story entitled Microsoft Finds Cancer Clues in Search Queries:

Microsoft scientists have demonstrated that by analyzing large samples of search engine queries they may in some cases be able to identify internet users who are suffering from pancreatic cancer, even before they have received a diagnosis of the disease.
The scientists said they hoped their work could lead to early detection of cancer.

A little further down, you read:

The researchers reported that they could identify from 5 to 15 percent of pancreatic cases with false positive rates of as low as one in 100,000.

Before you run to Bing hoping for a miraculous diagnosis, ask yourself: what’s the probability that, having tested “positive” on the Bing test, you actually have pancreatic cancer?

(Spoiler: it’s 50%, and in the indented paragraphs below there is some math you can skip.)

There is a simple formula for that, called the Bayes formula (or theorem):

P(Cancer | Positive) = P(Positive | Cancer) × P(Cancer) / P(Positive)

P(Positive) is the probability that you test positive on the Bing test, regardless of whether you have cancer or not. This is the probability that you test positive and have cancer (“true positive” probability times the probability of having cancer), plus the probability that you test positive without having cancer (“false positive” probability times 1 minus the probability of having cancer).

The article states that the “false positive rates [are] of as low as one in 100,000”. The “true positive” is reported as 5–15%, which I will approximate to 10%.

The probability that a random person will be diagnosed with pancreatic cancer is roughly 10 in 100,000 (see cancer.org).

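We now have all the numbers to compute the probability that, if Microsoft diagnoses cancer, a user actually has it:

P(Cancer | Positive) = (0.10 × 0.0001) / (0.10 × 0.0001 + 0.00001 × 0.9999) ≈ 0.00001 / 0.00002 = 0.5, i.e. 50%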
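As a cross-check, the calculation can be sketched in a few lines of Python. The function name `posterior` and its parameter names are illustrative and not from the original article; the inputs are the figures quoted above (a 10% true positive rate, a false positive rate of one in 100,000, and a prevalence of 10 in 100,000).

```python
def posterior(true_positive_rate, false_positive_rate, prevalence):
    """Probability of having the disease given a positive test (Bayes formula)."""
    # Positive AND sick: "true positive" probability times prevalence.
    positive_and_sick = true_positive_rate * prevalence
    # Positive AND healthy: "false positive" probability times (1 - prevalence).
    positive_and_healthy = false_positive_rate * (1 - prevalence)
    # P(Positive) is the sum of the two ways of testing positive.
    return positive_and_sick / (positive_and_sick + positive_and_healthy)


# Figures quoted in the article (10% is the author's approximation of 5-15%).
p = posterior(
    true_positive_rate=0.10,
    false_positive_rate=1 / 100_000,
    prevalence=10 / 100_000,
)
print(f"P(cancer | positive) = {p:.1%}")
```

With these inputs the result comes out at about 50%.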

This is not difficult to understand; it’s common sense. Out of 100,000 people, 10 have pancreatic cancer, and of these 10 only one is flagged by Bing, so the probability of being rightfully diagnosed by Bing is 1 in 100,000. Of the remaining healthy people (practically 100,000), one is flagged by mistake. Therefore, out of 100,000 people, Bing will diagnose two: one is the “true positive” and the other the “false positive”. The probability that a positive result is a true positive is therefore 50%.
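The counting argument can also be sketched in Python, which makes it easy to see what happens at the two ends of the reported 5–15% range rather than at the 10% approximation. All names below are illustrative assumptions, not from the original article, and the fractional "people" are expected counts rather than real head counts.

```python
POPULATION = 100_000              # people searching on Bing (illustrative)
SICK = 10                         # roughly 10 in 100,000 have the cancer
FALSE_POSITIVE_RATE = 1 / 100_000


def expected_counts(true_positive_rate):
    """Expected number of true and false positives in the population."""
    true_positives = SICK * true_positive_rate
    false_positives = (POPULATION - SICK) * FALSE_POSITIVE_RATE
    return true_positives, false_positives


shares = {}
for rate in (0.05, 0.10, 0.15):
    tp, fp = expected_counts(rate)
    # Share of all positives that are true positives.
    shares[rate] = tp / (tp + fp)
    print(f"true positive rate {rate:.0%}: "
          f"{tp:.1f} true vs {fp:.1f} false positives, "
          f"chance of cancer given a positive = {shares[rate]:.0%}")
```

Under these assumptions, a positive result corresponds to a chance of roughly one in three at 5%, one in two at 10%, and three in five at 15%.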

Now, there is some difference between the original article by the two Microsoft researchers (Ryen White and Eric Horvitz) and the NY Times’ version. White and Horvitz are not sensationalist in their publication for a reason: in 2008, they wrote an article on “cyberchondria”, or “unfounded escalation of concerns about common symptomatology, based on the review of search results and literature on the Web”. Their view, I believe, is that the Web is a dangerous place for diagnosing yourself.

Nonetheless, the tone of the NY Times article is slightly sensationalist. I do believe that 50% is better than nothing. But I also believe that writing about “false positive rates of as low as one in 100,000” is misleading, particularly when the article does not report that the final confidence of the diagnosis is 50%.

Articles like this, IMHO, are prone to lead to cyberchondria, and it would be a pity for White and Horvitz to achieve exactly the opposite of the result they (supposedly) had in mind when writing their original piece.

— Originally posted by Mario Alemi on 9 June 2016

Your.MD is free and available on iOS and Android apps, popular messenger platforms (such as Facebook Messenger, Kik, Skype, Slack and Telegram), and via the web.

Written by Healthily
