Wednesday, March 29, 2006

Battle of Britannica

In this week’s science section we follow the fight between the journal Nature and the encyclopedia Britannica. The argument is about a comparison that Nature did between two encyclopedias: Wikipedia and Encyclopedia Britannica[1].

I don’t intend to weigh in either way on the debate about whether an open-source encyclopedia can approach the accuracy of an edited one. I'm not an information scientist. But as a science journalist, I'm interested in two issues: one is whether Britannica really is as error-laden as the Nature study seems to suggest; the other is whether Britannica really is only 30% more accurate than Wikipedia.

This has been a fractious debate, so let me set out my stall. That way it is clear where I’m coming from and anyone who suspects I have an axe to grind in either direction will at least have some facts to base their conclusions on. Firstly, I think Wikipedia and Britannica are both great. Very different. But wonderful resources.

At the outset, I also ought to declare that I worked at Nature as a news journalist in 2000. So I have a lot of respect for the idea that led to this study, and the efforts that were taken to make it happen. It’s a fascinating question that none of us had thought to ask before: how accurate are science articles in Wikipedia when compared to their equivalents in the gold-standard Britannica?

Unfortunately, answering this question is not as straightforward as it may seem. And while experts may have been involved in the reviews, the methodology was designed, and the data collated, by journalists, not scientists. The results may have been published in Nature, but they were published in its front half as a news story—not the back half where the original science is published. So the first thing to emphasise is that the study is not a scientific one in the sense that one would normally expect of a “study in Nature”.

For anyone interested in the minutiae of the debate, it is worth looking at Britannica’s 20-page review of the Nature study[2]. Ignore the moans about the availability of study data and misleading headlines, and go straight to the Appendix. Here you will find that Britannica has taken issue with 58 of the “errors” that reviewers reported finding in their tome. Reading these it is difficult to avoid concluding that about half of the errors Nature reviewers have flagged up are actually points of legitimate debate rather than mistakes. This is important because if 47% of the "errors" in Britannica can actually be ignored, then rather than having 123 errors in 42 articles (about 3 per article), it has only 65 (about 1.5 per article). A big difference for Britannica’s reputation.
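The arithmetic behind those figures is simple enough to check. A minimal sketch, using only the numbers above (this is my own illustration, not a calculation from the Nature study or Britannica's rebuttal):

```python
# Back-of-envelope check of the disputed-error arithmetic.
total_errors = 123   # errors Nature reviewers reported across the Britannica articles
articles = 42        # number of Britannica articles reviewed
disputed = 58        # "errors" Britannica contests in its Appendix

remaining = total_errors - disputed          # 65 errors left if all disputes are upheld
print(round(disputed / total_errors, 2))     # ~0.47: about 47% of the "errors" are disputed
print(round(total_errors / articles, 1))     # ~2.9 errors per article as reported
print(round(remaining / articles, 1))        # ~1.5 errors per article after disputes
```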

Nature has a good answer to this[3,4]. Look, they say, we were not trying to count the actual number of mistakes but to compare the relative accuracy of each encyclopedia. Sure, reviewers will have flagged some points that we counted as errors but which are in fact correct or debatable. But because each reviewer was not told which article came from which encyclopedia, they would have been just as likely to make the same kind of mistakes (and the same number) when reviewing Wikipedia articles. In other words, because the study was done “blind”, there is no evidence of systematic bias that would alter the conclusion that Britannica was 30% more accurate.

The next question is whether the study was done in a way that would not introduce any systematic bias. Here I had a few concerns. One of the first things that bothered me was that some data, by necessity, had to be thrown away: the comments of reviewers who declared that an article was badly written. My worry is that one of the things being counted was misleading statements. Was there a point at which a pile of misleading statements simply became a badly written article, and therefore unmeasurable? Wikipedia articles are often badly written, so I wondered if it was simply easier to count misleading statements in Britannica's better-written articles. I don't know the answer, so I raise it only to flag that making these comparisons is not straightforward.

Britannica disputes 58 errors[5]. At The Economist, a researcher and I counted roughly 20 of these as errors of omission[5]. Although Nature says it took articles of "comparable" length, this would not always have been possible. Articles on the Thyroid Gland, Prion and Haber Process are all simply shorter in Britannica. Yet in all of these subjects the reviewer cited an error of critical omission, something they would have noticed when comparing the longer Wikipedia article side-by-side with the Britannica article. In another case, Paul Dirac, Britannica claims that reviewers were sent an 825-word article. If true, this would have been compared with a 1,500-word article in Wikipedia--yet Britannica is given another error for not mentioning the monopole in its 825-word article, while in Wikipedia the monopole gets a 44-word mention in an article of twice the length. This doesn't seem fair. However, I can't replicate this finding because, as of writing, the articles available online on both sites are about 1,500 words and both mention the monopole.

A reviewer is faced with two articles: one slightly longer and a bit more rambling, where facts have been bolted on over the years with no eye for coherence; the other edited for style and readability, and streamlined to include the facts that are vital in an article of that length. The problem is that when these two articles are compared side-by-side, it will be hard for a reviewer not to notice that the rambling article contains a fact or a citation that the more concise one doesn't. What makes this more problematic is that there is no control for this test: we don't know whether a reviewer would even have noticed that this "critical fact" was missing had it not been in the Wikipedia article. So my concern is that in the Nature study Britannica is more liable to be unfairly criticised for errors of omission--simply because of the different nature of the information sources.

This kind of error is important because it would have an impact on the relative accuracy of each encyclopedia. For example, assuming for the sake of argument that all Britannica's errors of omission were the product of a systematic bias, this would mean that Britannica was 50% more accurate, not 30%. Warning: this is an example, not my estimation of the true difference between the two encyclopedias.
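To see how a bias of that size could move the headline comparison, here is a rough sketch. It uses the error totals the Nature news story reported (162 for Wikipedia against the 123 for Britannica cited above; the Wikipedia figure is quoted from the published study, not from this article) and treats the ~20 disputed omissions, purely for illustration, as systematic bias:

```python
# Illustration only: how removing suspected omission-bias errors from
# Britannica's total changes the relative-accuracy comparison.
wikipedia_errors = 162   # total errors the Nature study reported for Wikipedia
britannica_errors = 123  # total errors reported for Britannica
omission_bias = 20       # the ~20 omissions counted from Britannica's rebuttal

as_published = wikipedia_errors / britannica_errors
adjusted = wikipedia_errors / (britannica_errors - omission_bias)

print(f"{as_published:.2f}")  # ~1.32: Wikipedia has ~30% more errors
print(f"{adjusted:.2f}")      # ~1.57: the gap widens to 50% or more
```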

Another concern is in the way the data were compiled. Nature’s journalists were not always able to find identical subject entries in both encyclopedias. When that happened, they did an online search on the subject at both websites and, if necessary, bolted together bits from several articles. (Britannica complains that in some cases they used paragraphs from a children’s encyclopedia and a Yearbook article.)

Unfortunately, this collation of data was not done blind. When Nature’s journalists compiled the material, they knew which encyclopedia it came from and could well have introduced an unconscious but systematic bias while collating the material. In addition, because Wikipedia has 1m entries and Britannica has only 124,000[6], it is possible that reviewers were more often sent complete Wikipedia articles to compare with cobbled-together Britannica articles than the other way round. This might be particularly important when counting omissions in articles. How do we know whether the omissions counted against Britannica are truly its mistakes, and not artefacts of the journalists' collation?

Finally, it is important to note that this isn’t, technically, a comparison of encyclopedias but rather a comparison of what the average user would find if they searched for a subject online at Wikipedia or Britannica. (This is actually only made clear in the supplementary information that was published a week after the initial article.) It may be true that the average online user is just looking for information, but it is also true that we rate that information according to where we found it. If I downloaded something from a children’s encyclopedia or a Yearbook article I’d know to judge it in that context. The reviewers were not aware of this distinction, so there is no way of knowing how it might have affected their judgement.

In conclusion, I’d say that I don’t think the Nature study has proven, to my satisfaction at least, that Wikipedia is 30% less accurate than Britannica. I was also left with the impression that the study was trying to compare apples with oranges.

I have great respect for the fine editors and journalists at Nature, and I don’t think for one minute that they deliberately fixed or cooked the results[7]. I just think the study doesn't quite demonstrate what it says it does.

Nor is this a critique of the fine work of the Wikipedians. I’ll continue to read it, and cite it[8]--but I’ll also continue to check everything I use. The Economist article, free only to subscribers, is available here.

1. Internet encyclopaedias go head to head. Jim Giles, Nature 438, 900-901. December 15, 2005.
2. Fatally flawed. Refuting the recent study on encyclopedic accuracy by the journal Nature.
3. Encyclopedia Britannica and Nature: a response. March 23, 2006.
4. Britannica attacks… and we respond. Nature, Vol 440. March 30 2006.
5. 58 debated errors, about 20 errors of omission. March 2006.
In total, a researcher at The Economist and I counted (independently of each other), in the Britannica report, 58 examples of “errors” which Nature reviewers identified and which Britannica contests. We also looked at each of these 58 errors and counted the number of errors of different kinds (factual, misleading and omissions). This is an inexact science. The crucial figure was how many errors of omission were found, as this is one of the things I identify as a source of potential bias in the study. I counted 17 errors of omission; the Economist researcher (who is far better at this sort of thing than I am) counted 21. We decided to say about 20 were errors of omission. The point is not to give an exact figure--because neither of us is qualified to say whether these were really errors of omission or not--but to give readers an idea of how a small but arguable systematic bias could have a big impact on the comparison.
6. 124,000 articles in Britannica
Tom Panelas: “Our big online sites have about 124,000 articles in total, but that includes all of the reference sources, including our student encyclopedia, elementary encyclopedia, and archived yearbook articles. The electronic version of the Encyclopaedia Britannica proper has about 72,000 articles. The print set has about 65,000.”
7. Nature mag cooked Wikipedia study. Andrew Orlowski, The Register. March 23, 2006.
8. Small wonders. Natasha Loder, The Economist. December 29th, 2004.

Other further reading:
Supplementary information to accompany Nature news article “Internet encyclopaedias go head to head” (Nature 438, 900-901; 2005).

In a war of words, famed encyclopedia defends its turf—At Britannica, comparisons to an online upstart are a bad work of ‘Nature’. Sarah Ellison, The Wall Street Journal. March 24, 2006.
Encyclopedia Britannica, Wikipedia.
"Encyclopaedia Britannica." Encyclopædia Britannica. 2006. Encyclopædia Britannica Premium Service. 24 Mar. 2006 (no sub required).
Wikipedia, Wikipedia.

On the cutting-room floor: quotes from sources

Nature only wanted to respond on the record in writing to queries, so I have a much shorter collection of quotes from reporter Jim Giles. Ted Pappas agreed to be interviewed on the record, and we spent a lively 45 minutes talking about the study and the nature of information.

Jim Giles, senior reporter, Nature
"Britannica is complaining that we combined material from more than one Britannica article and sent it to reviewers. This was deliberate and was clearly acknowledged in material published alongside our original story. In a small number of cases, Britannica's search engine returned two links to substantive amounts of material on the subject we wanted to review. In these cases, we sent reviewers the relevant information from both links. This could, of course, make the combined article sound disjointed. But we asked reviewers to look only for errors, not issues of style. When reviewers commented on style problems, we ignored those comments. More importantly, we feel that if the review identified an error, it is irrelevant whether that error came from a single article or the combination of two entries."

"When we combined Britannica articles, the point was to more fairly represent what it offered. Let's imagine you search on 'plastic' and get two Britannica articles: 'plastic' and 'uses of plastic'. But Wikipedia covers both aspects of this topic in a single entry. In that case, it is fairer to combine the two Britannica entries when making a comparison with Wikipedia. When we combined the entries we were careful to provide what we thought was a fair representation of what Britannica had to offer on a topic. This actually reduced the chance of generating errors of omission, not increased it."

Britannica’s executive editor, Ted Pappas.
“The premise is that Wikipedia and Britannica can be compared. Wikipedia has shown that if you let people contribute they will. But all too often they contribute their bias, opinions and outright falsehoods. Too often Wikipedia is a forum that launches into character defamation. These are the sorts of fundamental publishing vices that no vetted publication would fall prey to, which is why it is so absurd to lump gross offenses in publishing together with typos in Britannica articles. They are fundamentally different.”

“Comparing articles from children’s products to Wikipedia, and comparing Yearbook articles, something written about a 12-month period, to a Wikipedia article, is absurd. There is also this ludicrous attempt to justify this by saying that Nature was doing a comparison of websites. Nothing in their study intimates it was a study of websites."

“The idea of excerpting a 350-word intro from a 6,000-word article [on lipids] and sending only this is absurd. And to allow reviewers to criticise us for omissions in an article when they have only been given a 350-word introduction is a major flaw in the methodology. Who chooses the excerpts? Nature’s editors had no right to change these and pass them off as Britannica articles. Anyone who had to write 350 words on lipids would approach this very differently to someone having to write 6,000.”

“We’ll never know if they were biased from the beginning. Everything from day one--their refusal to give us the data, through the heights of criticism, to the pro-Wikipedia material that accompanied the release--gives us reason to suspect there was bias from the beginning.”

“I’m sure there will be a continuing discussion about the nature and future of knowledge and how it should be codified and consumed. There is room in this world for many sources of information, and we are confident about our place in that world. We know we are not the only source, and never have been.”