Google At How Many Billions? 9? 11?

By Nathan Weinberg

Jean Véronis has done a mathematical study showing that, for a random selection of popular words, Google’s index grew by almost exactly 13% in the last two months. He compared search result counts for 16 words on 11/22/2004 (just over a week after Google upped its public index number from 4 billion to 8 billion) with the counts on 1/22/2005. As shown in this graph, all 16 queries increased in nearly the same proportion, falling almost perfectly on a straight line (coefficient of determination > 0.999), numbers that are too statistically significant to ignore:

If all the queries fit a regression slope of 1.13, that means Google’s index has increased roughly 13% over the two-month period, which Jean says translates to an index of 9,105,590,456 pages, up from the 8,058,044,651 that has been reported over that period.
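
The arithmetic behind that estimate can be sketched in a few lines. This is only an illustration: the three sample counts are hypothetical (not Véronis’s data), the function names are made up here, and fitting the line through the origin is an assumption about how the 1.13 slope was obtained.

```python
# Sketch only: the sample counts below are hypothetical, not Véronis's data.

def slope_through_origin(before, after):
    """Least-squares slope k for the model after = k * before."""
    return sum(b * a for b, a in zip(before, after)) / sum(b * b for b in before)


def scale_index(reported_pages, percent_growth):
    """Scale a page count by a whole-number percentage, rounded to the nearest page."""
    return (reported_pages * (100 + percent_growth) + 50) // 100


# Hypothetical result counts for three queries, each exactly 13% higher later.
before = [1_000_000, 2_500_000, 40_000_000]
after = [1_130_000, 2_825_000, 45_200_000]

k = slope_through_origin(before, after)

# Google's reported index size from the post, scaled by the 13% growth.
estimated_index = scale_index(8_058_044_651, 13)

print(round(k, 2), f"{estimated_index:,}")
```

Integer arithmetic is used for the scaling step so the rounding to a whole page is exact.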

Of course, faithful readers know that Google is lying.

This screenshot, which I took on November 10th, shows that Google then had, at a minimum, 10,980,000,000 pages. See, that night, the new MSN Search entered beta, and Google felt the need to up its page count to make sure MSN couldn’t claim to have more pages. Since Google never reports the correct page count, all it had to do was recount its index, and post any random number higher than MSN’s 6 billion. Apparently, marketing decided 8 billion was a good number, but not before I snagged a screenshot of the results page in mid-count.

Let’s make a few assumptions. First, assume that 11 billion was as high as it got, and that had I come back an hour later, it wouldn’t have been 12. Second, when Google reports only 8 billion pages today for “the”, it’s lying. The actual number would have to be 12,407,400,000, or 10,980,000,000×1.13. Now, any number under 8 billion, we can trust. So, narrowing it down with my best list of 32 very popular terms on Google, excluding each previous one (i.e. searching for the, then a -the, then and -a -the…):

  • the = 12,407,400,000
  • a = 60,000,000
  • and = 40,200,000
  • of = 34,400,000
  • i = 39,700,000
  • this = 19,800,000
  • that = 1,660,000
  • or = 17,300,000
  • you = 8,470,000
  • e = 40,000,000
  • s = 28,300,000
  • your = 9,520,000
  • page = 25,600,000
  • not = 4,120,000
  • d = 28,200,000
  • t = 17,300,000
  • us = 16,400,000
  • l = 17,400,000
  • c = 27,700,000
  • can = 1,310,000
  • http = 19,500,000
  • if = 1,030,000
  • do = 10,300,000
  • other = 4,090,000
  • m = 13,600,000
  • o = 11,300,000
  • but = 395,000
  • n = 17,500,000
  • y = 12,200,000
  • my = 6,550,000
  • news = 19,000,000
  • b = 9,120,000
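
The query pattern and the tally can be reproduced with a short script. This is a sketch rather than the exact procedure used for the post: the function name `exclusion_queries` is made up for illustration, and the figure for “the” is the adjusted estimate from above, not a count Google returned.

```python
# Terms and counts exactly as listed in the post, in search order.
# The "the" figure is the post's adjusted estimate, not a reported count.
COUNTS = [
    ("the", 12_407_400_000), ("a", 60_000_000), ("and", 40_200_000),
    ("of", 34_400_000), ("i", 39_700_000), ("this", 19_800_000),
    ("that", 1_660_000), ("or", 17_300_000), ("you", 8_470_000),
    ("e", 40_000_000), ("s", 28_300_000), ("your", 9_520_000),
    ("page", 25_600_000), ("not", 4_120_000), ("d", 28_200_000),
    ("t", 17_300_000), ("us", 16_400_000), ("l", 17_400_000),
    ("c", 27_700_000), ("can", 1_310_000), ("http", 19_500_000),
    ("if", 1_030_000), ("do", 10_300_000), ("other", 4_090_000),
    ("m", 13_600_000), ("o", 11_300_000), ("but", 395_000),
    ("n", 17_500_000), ("y", 12_200_000), ("my", 6_550_000),
    ("news", 19_000_000), ("b", 9_120_000),
]


def exclusion_queries(terms):
    """Build one query per term that excludes every earlier term."""
    queries = []
    for i, term in enumerate(terms):
        # Most recently searched term is excluded first: "and -a -the".
        excluded = " ".join("-" + t for t in reversed(terms[:i]))
        queries.append(f"{term} {excluded}".strip())
    return queries


terms = [term for term, _ in COUNTS]
queries = exclusion_queries(terms)

# 10,980,000,000 x 1.13, done in integers to keep the result exact.
adjusted_the = 10_980_000_000 * 113 // 100

# Sum of every figure in the list above.
total = sum(count for _, count in COUNTS)

print(queries[:3])
print(len(queries[-1].split()), f"{adjusted_the:,}", f"{total:,}")
```

The last query contains exactly 32 words, which is why the 32-word limit caps the list at this length. Summed as listed, the figures come to 12,969,365,000, which is 200 million less than the total quoted in the post, so one of the entries may have been transcribed differently.
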
I have Google’s new 32-word limit to thank for being able to find just those 13,169,365,000 results. I think we can safely assume there are anywhere from 13.5 to 14 billion pages in Google’s index, far more than the 8 billion currently reported. Any other search engine is going to have to play catch-up, big time.

Oh, and assuming a 13% trend for the rest of the year, Google on January 22, 2006 would have around 15.5 billion pages.
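
A quick check of that projection, as a sketch: the “around 15.5 billion” figure is consistent with applying 13% once to the 13.5 to 14 billion range. The function name `grow` is hypothetical, and the compounded case is shown only for comparison, since the post does not say whether the 13% would repeat every two months.

```python
def grow(pages, rate, periods=1):
    """Apply a growth rate to a page count for a number of periods."""
    return pages * (1 + rate) ** periods


# The 13.5 to 14 billion range estimated in the post.
low, high = 13.5e9, 14.0e9

# 13% applied once over the year.
once = (grow(low, 0.13), grow(high, 0.13))

# For comparison only: 13% repeated for each of six two-month periods.
compounded = (grow(low, 0.13, 6), grow(high, 0.13, 6))

print([f"{p / 1e9:.1f} billion" for p in once])
print([f"{p / 1e9:.1f} billion" for p in compounded])
```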

Posted: January 23, 2005 by Nathan Weinberg

5 Responses to “Google At How Many Billions? 9? 11?”

  1. Dirson Says:

    Nathan: when you search ‘the’ on Google, the figure you get is the number of indexed *DOCUMENTS*. This includes HTML documents, but also PDF/DOC/.. files.

    The number shown by Google on their main page is the number of indexed *WEBPAGES* (or HTML documents).

    This is the reason for the difference.

  2. Jean Véronis Says:

    I wasn’t aware of that! It’s fascinating.

    I am not sure that you can trust Google’s Boolean operators too much, though. I checked recently, and the numbers just don’t add up:

    http://aixtal.blogspot.com/2005/01/web-google-perd-la-boole.html

  3. Philipp Lenssen Says:

    I would think Google includes PDF in their number of “web pages,” not that it would matter much, as these numbers do not seem to be too accurate. In a past press release Google Inc. stated:
    “Google Web Search: The company’s flagship search service now offers 4.28 billion web pages. Google’s powerful and scalable technology searches this information and delivers a list of relevant results in an instant. Google Web Search also enables users to search for numerous non-HTML files, including PDF, Microsoft Office, and Corel documents.”
    http://www.google.com/intl/en/press/pressrel/6billion.html

  4. Nathan Weinberg Says:

    Google doesn’t really use anything to determine that front page number. As I said, it’s chosen by marketing, so they could be talking about the number of pages, HTML documents, unique web addresses, .com domains, pretty much anything. And there is no disparity between the “the” search and the front page; that, in a way, proves the “the” numbers are false, because it’s impossible.

  5. Dan Says:

    This is sort of silly. When you search, you get a VERY rough estimate because Google’s algorithms are just not good at coming up with a precise number. You can more or less ignore those figures. Sometimes that estimate is messed up even when there are only 40 results. When there are 5 to 10 billion, the figure is so rough you can’t trust it.

    The 8 billion number is an estimate as well, but a bit more time went into that estimate. Perhaps a little marketing as well, but the estimate on the search results page doesn’t mean a thing.
