The paper is very interesting looking back on it 17-18 years after it was published. I thought I'd comment on some of the fun things I read.
Improved Search Quality
"November 1997, only one of the top four commercial search engines finds itself (returns its own search page in response to its name in the top ten results)"
If the above is true, it is truly comical by today's standards of web search quality.
Major Data Structures
Throughout this section, Brin & Page continually do "bit stuffing" (packing several small fields into as few bits as possible) to save storage space. That kind of thing is typically only done by people working on firmware, so I find it a little ironic that they had to go to such lengths. Given the amount of data they had to deal with and the limited hardware resources they had, it was obviously justified. But it's sort of funny to think about given the data sizes and hardware resources that Google, Facebook, Yahoo, Bing, etc. have today.
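To make the bit-level packing concrete, here is a minimal sketch of how a "plain hit" could be squeezed into two bytes. As I read the paper, a plain hit holds a capitalization bit, 3 bits of font size, and 12 bits of word position. The bit order, the function names `pack_hit` and `unpack_hit`, and the choice to clamp large positions to 4095 are my own assumptions for illustration, not the authors' code.

```python
# Assumed layout of a 16-bit "plain hit" (the bit order is an illustrative choice):
#   bit 15      capitalization flag
#   bits 12-14  font size (0-7)
#   bits 0-11   word position in the document (0-4095)
FONT_BITS = 3
POS_BITS = 12
MAX_POSITION = (1 << POS_BITS) - 1  # 4095


def pack_hit(capitalized: bool, font_size: int, position: int) -> int:
    """Pack one hit into a single 16-bit integer."""
    if not 0 <= font_size < (1 << FONT_BITS):
        raise ValueError("font_size must fit in 3 bits")
    if position < 0:
        raise ValueError("position must be non-negative")
    # Assumption: positions too large for 12 bits are clamped to the maximum.
    position = min(position, MAX_POSITION)
    return (int(capitalized) << 15) | (font_size << POS_BITS) | position


def unpack_hit(hit: int) -> tuple[bool, int, int]:
    """Recover (capitalized, font_size, position) from a packed hit."""
    capitalized = bool(hit >> 15)
    font_size = (hit >> POS_BITS) & ((1 << FONT_BITS) - 1)
    position = hit & MAX_POSITION
    return capitalized, font_size, position


hit = pack_hit(capitalized=True, font_size=3, position=42)
print(hex(hit))                 # 0xb02a
print(hit.to_bytes(2, "big"))   # the whole hit fits in two bytes
print(unpack_hit(hit))          # (True, 3, 42)
```

Three fields that would naively take three separate integers end up in two bytes, which adds up quickly when there is one hit per word occurrence across millions of pages.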
Servers to Crawl the Web
The original Google used a single URL server to serve lists of URLs to 3 web crawlers. Insanely tiny by today's standards. Of course, it was a much tinier web in the 1990s.
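To picture how small that setup is, here is a toy sketch of one URL server handing batches of URLs to 3 crawlers. It is only an illustration of the shape of the architecture: the names `url_server`, `crawler`, and `run_crawl`, the batch size, and the use of threads and an in-process queue are all my assumptions, and the "fetch" is simulated rather than a real download.

```python
import queue
import threading


def url_server(urls, work_queue, batch_size, n_crawlers):
    """Hand out URLs in batches, then send one stop signal (None) per crawler."""
    for i in range(0, len(urls), batch_size):
        work_queue.put(urls[i:i + batch_size])
    for _ in range(n_crawlers):
        work_queue.put(None)


def crawler(name, work_queue, fetched, lock):
    """Pull batches until the stop signal arrives; fetching is simulated."""
    while True:
        batch = work_queue.get()
        if batch is None:
            break
        for url in batch:
            with lock:
                # A real crawler would download and store the page here.
                fetched.append((name, url))


def run_crawl(urls, n_crawlers=3, batch_size=2):
    """Run one URL server and n_crawlers crawler threads over the given URLs."""
    work_queue = queue.Queue()
    fetched, lock = [], threading.Lock()
    workers = [
        threading.Thread(target=crawler, args=(f"crawler-{i}", work_queue, fetched, lock))
        for i in range(n_crawlers)
    ]
    for worker in workers:
        worker.start()
    url_server(urls, work_queue, batch_size, n_crawlers)
    for worker in workers:
        worker.join()
    return fetched


if __name__ == "__main__":
    pages = [f"http://example.com/page{i}" for i in range(10)]
    for crawler_name, url in run_crawl(pages):
        print(crawler_name, url)
```

Which crawler ends up with which URL varies from run to run, but every URL is handed out exactly once.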
Social Consequences to Web Crawling
Perhaps the best part of the paper is where Brin & Page talk about the social consequences of running their crawler. Most notably, some website owners were confused about what a web crawler was and why it was looking at their pages. Some would e-mail them asking questions ... some even called them.
Apparently the original Google had a compressed repository of just 53GB of data. Insanely puny by today's standards.
In addition, it took only about 9 days to download their entire crawl of the web at the time. It's not clear how many machines were at their disposal, but it did not appear to be more than maybe a dozen (as said above, they only used 3 for web crawling, and they note they used 4 for sorting the index).
"Advertising and Mixed Motives"
In this appendix section, Brin & Page talk about the conflict of interest that search engines have when advertising is involved. They specifically cite a search for "cellular phone" as an example and say:
"It is clear that a search engine which was taking money for showing cellular phone ads would have difficulty justifying the page that our system returned to its paying advertisers. For this type of reason and historical experience with other media [Bagdikian 83], we expect that advertising funded search engines will be inherently biased towards the advertisers and away from the needs of the consumers."
It's ironic, of course, because this stance is nearly the exact opposite of how modern-day Google works. A search for "cellular phone" on the site returned for me (in order):
- An iPhone ad on apple.com
- An ad for cell phones from a retailer's site
- An ad for Sprint
- A Google Maps result for several retailers that sell cell phones
- The Wikipedia article for "Mobile Phone"