Pandas, Penguins & a dose of Caffeine: a brief history of the Google Algorithm


{NB: for a more truncated list of the major updates to Google’s search algorithms, please see our Algorithmic Updates section}

Most of us use the Internet. In fact, over 36 million Britons – or 73% of the population – go online every day. The latest figures represent a huge increase from 16.2 million web users in 2006. Even more striking, however, is that Google accounts for 90% of UK-based desktop searches and 92% of mobile searches. Yet, for most of us, the way in which Google’s search engine works remains a bit of an enigma.


Photo courtesy of moneyblognewz (Flickr Creative Commons)

So what do we know? Well, some of the basics. If you ask Google Search a question or type in a keyword, Google decides what results to show you by calculating over two hundred “signals”. Traditionally, search engines tended to rely on the frequency of certain words on a web page. Google Search, however, uses its constantly evolving PageRank algorithm, determining the most important pages by examining the entire link structure of the web. Hypertext-matching analysis then determines the most relevant pages for your specific search – all in a fraction of a second. If you type in “asparagus”, for example, Google produces results for a webpage dedicated to British asparagus, its Wikipedia page, BBC Good Food recipes and a YouTube video on how to cook the vegetable. If I ask Google a more challenging (and indeed subjective) question such as “What’s the best Gabriel Garcia Marquez novel?”, it refers me to a Goodreads forum discussing magical realism in Latin American literature, the Wikipedia page of his seminal novel One Hundred Years of Solitude and several articles by the Guardian. So, in essence, Google produces the most relevant and reliable results by combining overall importance with query-specific relevance (here is Google’s useful infographic resource).
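For the curious, the core idea behind PageRank can be sketched in a few lines of code. This is a deliberately simplified illustration – the toy link graph, damping factor and iteration count below are our own assumptions, and Google’s production system layers hundreds of other signals on top:

```python
# Simplified PageRank via power iteration on a toy link graph.
# Illustrative sketch only -- not Google's production algorithm.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline of rank...
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # ...and passes the rest on, split across its outbound links.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: distribute its rank evenly across the web.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Toy web: A and C both link to B, so B ends up the most "important" page.
toy_web = {"A": ["B"], "B": ["C"], "C": ["A", "B"]}
ranks = pagerank(toy_web)
print(max(ranks, key=ranks.get))  # prints "B"
```

The intuition matches the article: a page is important if important pages link to it, and that importance is computed from the link structure alone, before any query-specific relevance is applied.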

To stay at the top of the game, Google makes over five hundred changes to its search algorithm every year – some are minor tweaks, others are major overhauls. For the average web user, most algorithm changes will go unnoticed. But for SEOs, search marketers and online businesses, updates to Google’s search algorithms are major events – they can significantly alter search rankings and web traffic, which can have a huge impact on return on investment.

At DSM, we take a great interest in developments in Google’s search algorithms. After all, they shape the services we offer to our clients. We’ve therefore decided to look more fully into the history of Google’s algorithm changes. We will look, for instance, at how Google has adapted its algorithms to thwart “black hat” webpage tactics and how it has continually refined its search engine to create the indispensable service millions of Britons use today.


The Early Years: 1998-2002

Initially conceived as a research project in early 1996, Google was officially launched in 1998 by founders Sergey Brin and Larry Page, marking the birth of PageRank – the company’s best-known algorithm and the keystone of all its search engine results. Google Toolbar and Toolbar PageRank (TBPR) were launched in December 2000 (the year lead engineer and SEO guru Matt Cutts joined the team) and, throughout 2001, record numbers of web users abandoned rival search engines such as AltaVista, Excite and HotBot, while SEOs realised that Google was the future of online search.

Google started to update its search engine in bits and pieces. Before its first officially named update, “Boston”, in February 2003, there were a number of major renovations in autumn 2002. It was a shaky start – 404 pages showed up in the top 10 of the SERPs (search engine results pages) – but Google recovered with a series of pioneering changes the following year.

“Hang on, I’ll just Google it”. Indicative of the search engine’s success, “Google” also emerged as a widely used verb (check out this BBC News article from June 2003).


Google targets website over-optimisation: 2003-2006

For a brief period in 2002 and 2003 Google updated its algorithms monthly – known as Google Dance. By mid-2003, however, Google ditched the idea of a monthly batch update in favour of a system of incremental updates. Matt Cutts explains that Google decided to “move away from monolithic, big Google Dance updates to a larger number of small updates…so in one year, for example, we had four hundred search quality changes go out”. Google Dance gave rise to a series of index refreshes in spring and summer 2003. In April, Cassandra cracked down on basic link-quality issues, as well as hidden text and links. Reports suggest that Dominic (May 2003) changed the way Google reported backlinks. Finally, in June, Esmeralda – the last of Google’s monthly updates – prefigured significant infrastructure changes, as Dance was replaced by “Everflux” to help make results more relevant and fresh.

In September 2003, Google introduced its Supplemental Index to help its engine index more documents without having to sacrifice performance. SEOs were displeased and the index was later reintegrated. In November, the SEO world was turned upside down again following the release of the Florida update. It was the first step that brought about the demise of tactics such as keyword stuffing – which SEOs often used in the 1990s to manipulate the rankings of particular websites. From then on, Google was to emphasise that quality of content was the most important factor in determining the PageRank of a website.
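To see what “keyword stuffing” actually looks like in practice, a naive density check makes the point. This is purely illustrative – the example phrases are invented, and Google’s real spam detection is far more sophisticated and has never been published:

```python
# A naive keyword-density check, purely to illustrate what "keyword
# stuffing" means -- Google's actual detection is unpublished and
# far more sophisticated than counting words.
import re

def keyword_density(text, keyword):
    """Fraction of the words in `text` that are exactly `keyword`."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)

stuffed = "cheap flights cheap flights book cheap flights now cheap flights"
natural = "compare airline prices and book flights that suit your budget"

print(keyword_density(stuffed, "cheap"))  # 0.4 -- a suspiciously high share
print(keyword_density(natural, "cheap"))  # 0.0
```

Pages written like the first string once ranked well on keyword-frequency engines; post-Florida, quality of content mattered far more than raw repetition.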

Google’s initial public offering (IPO) took place in August 2004. The company sold 19 million shares, raising $1.67 billion in capital at $85 a share – and by January 2005 share prices had almost doubled (see this revealing CNN report). By 2004, Google had a market value of about $24 billion; in comparison, Yahoo! was valued at around $39 billion. In January that year, Google had taken another measure against “black hat” SEO tactics such as meta-tag stuffing, hidden text and cloaking with its Austin update. The Brandy update in February made a number of infrastructural modifications, including a huge expansion of the index and the introduction of Latent Semantic Indexing (LSI). LSI was designed to inhibit keyword stuffing tactics by giving greater importance to “link neighbourhoods” and text relevance. The marketing analytics site Moz – a truly excellent resource if you want to find out more about Google algorithms – explains how LSI “expanded Google’s ability to understand synonyms and took keyword analysis to the next level”.

By 2005 Google had reached new technological and financial heights. Increasingly, Google emphasised the importance of anchor text in links and the quality of both inbound and outbound links. Moreover, common SEO tactics that attempted to highlight keywords – using bold font, italics, or the heading or title – were reduced in importance. A series of gradual changes throughout 2005 continued to refine and define the search engine we use today. The Nofollow attribute (January 2005) – a collective measure developed by Google, Microsoft and Yahoo! – helped to combat spam and control the quality of outbound links. Allegra (February) and Bourbon (May) supposedly penalised suspicious links and assessed duplicate content and non-canonical URLs. Similarly, Jagger (October) targeted link farms as well as paid and reciprocal links. In December, Google started to roll out a major infrastructure/software update – Big Daddy. As Matt Cutts explained, it was not a typical change to the search algorithms (nor was it paying homage to Adam Sandler’s painfully awful film!); rather, it changed how Google handled technical issues such as URL canonicalisation.

In terms of algorithm updates, 2006 was a much quieter year. In fact, despite webmasters claiming that extensive changes were made to ranking formulas (Google seemed to make continual alterations to the supplemental index), no official updates were announced. Nevertheless, it was still an important year. Google Webmaster Central was launched; Google announced support for the Sitemaps protocol in collaboration with MSN and Yahoo!; and in October it purchased YouTube for $1.65 billion (£883 million) – which would, remarkably, become the world’s second largest search engine.


Building the search engine of today: 2007-2008

In May 2007, Google launched Universal Search. Again, it was not a typical algorithm change. Instead, in a radical move, it integrated the traditional search engine with News, Images, Videos, Books, Maps and Local results. Google Product Search was also incorporated – or, to be more accurate, Froogle was just renamed.

Google has always tried to modernise its search engine to cater for evolving search habits and technological advances. In April 2008, it released the Dewey update – although its specifics are unclear, Moz calls it “a large-scale shuffle”. Like most algorithm updates, Google kept the particulars under wraps, but some webmasters “suspected Google was pushing its own internal properties, including Google Books”. After years of testing, Google Suggest was launched in August 2008. A fundamental part of how we now use Google Search, Suggest was essentially just a display algorithm update – one which now powers Google Instant. For instance, if you type “Manchester” into Google you are provided with suggestions for Manchester United, Manchester Airport, Manchester Evening News and Manchester City, which you can scroll through accordingly. The Google which both web users and SEOs know today was beginning to take shape.
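The mechanics behind that “Manchester” example can be sketched as a simple prefix match over past queries. This toy version is our own illustration – the real Suggest ranks candidates by query popularity, freshness and personalisation, none of which is modelled here:

```python
# A minimal prefix-based autocomplete, sketching the idea behind
# Google Suggest. Illustrative only: the real system ranks
# suggestions by popularity and freshness, which this toy ignores.
from bisect import bisect_left

def suggest(queries, prefix, limit=4):
    """Return up to `limit` past queries that start with `prefix`."""
    queries = sorted(queries)
    # All matches for a prefix sit in one contiguous run of the
    # sorted list, starting at the insertion point of the prefix.
    start = bisect_left(queries, prefix)
    results = []
    for q in queries[start:]:
        if not q.startswith(prefix):
            break
        results.append(q)
        if len(results) == limit:
            break
    return results

past_queries = [
    "manchester united", "manchester airport", "manchester evening news",
    "manchester city", "maps", "marathon training",
]
print(suggest(past_queries, "manchester"))
# ['manchester airport', 'manchester city',
#  'manchester evening news', 'manchester united']
```

Production systems use precomputed tries or n-gram indexes so that each keystroke is answered in milliseconds, but the contract is the same: prefix in, ranked completions out.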


Google gives more importance to brand ranking signals: 2009-2010

In February 2009, Google released the Vince update, which SEOs claimed strongly favoured major brands. The superb SEM industry resource Search Engine Watch believes that understanding Vince is vital to getting the best out of SEO programmes. Following Vince, it became even more important for companies to build an online brand, and to market that brand across multiple channels. In the same month, Google, Yahoo! and Microsoft (Bing) all announced support for the rel-canonical tag, which enabled webmasters to send canonicalisation signals to search bots identifying the webpage that should be credited as the original.

In August 2009, Google previewed its new Caffeine release. It was a huge infrastructural change that expanded the index, made crawling faster and more efficient, and integrated ranking and indexing in near real time. Indeed, in June 2010, Google reported a 50% fresher index. Furthermore, Google’s new Real Time Search integrated Twitter feeds, Rich Snippets, Google News and newly indexed web content to generate real-time SERP feeds.

In recent years Google has sought to integrate more results relevant to individual users into its SERPs – particularly “local” results. In April 2010, for instance, it launched Google Places, which closely combined web pages with local search results to help people “make more informed decisions about where to go, from restaurants and hotels to dry cleaners and bike shops, as well as non-business places like museums, schools and parks”. In May, Google released a new algorithm update – the May Day update – which appeared to target websites with weak content. Webmasters reported a decline in long-tail traffic, prompting Matt Cutts to admit that the update “changes how we assess which sites are the best match for long tail queries”. Affected sites needed to determine whether they had “great content” relevant to the search query, whether they were an “authority” on the matter, and whether the site was doing more than simply matching keywords in the query. Accordingly, sites with relevant and useful content were rewarded with higher search rankings.

To round off 2010, Google introduced two further updates in December. It tweaked its ranking algorithm to target sites that had received negative reviews, following a report by the New York Times which explained that a glasses reseller had gained high search rankings precisely because it had received dozens of angry complaints on online forums. The Guardian writes that the “negative review” case underlines the growing importance of social search. In other words, SERPs should be based on what other web users recommend, rather than simply on what they link to. Both Google and Bing confirmed that they had started to use “social signals” – particularly from Facebook and Twitter – to determine more suitable search rankings. “But, in addition”, added Matt Cutts, “we are also trying to work out a bit more about the reputation of an author or creator on Twitter or Facebook.”


The Big-Hitters: Panda and Penguin

Since 2011, social signals have become an increasingly important feature of Google’s ranking algorithm. As a precursor to the Panda updates, in January 2011 Google released its Attribution update in an attempt to reduce the number of spam results in SERPs. It targeted sites with low levels of original content and high levels of copied content. On 23rd February 2011, Google released arguably the biggest ever modification to its search engine – the Panda update. Google claimed the initial changes affected up to 12% of search results by cracking down on poor content, content farms and sites with high advert-to-content ratios. Panda was gradually rolled out to all English queries worldwide in a succession of incremental updates over the course of the year: 2.0, 2.1, 2.2, 2.3, 2.4, 2.5… you get the picture. Following Panda 2.5, in September Google entered a period of what Moz refers to as “Panda Flux”. In other words, although Google continued to adjust the Panda algorithm frequently, the updates tended to be relatively minor and their impact difficult to ascertain.

However, 2011 was not exclusively the year of the Panda. In response to competition from major social networking sites such as Facebook and Twitter, for example, Google launched the +1 Button directly next to result links. It attempted to replicate the “Like” effect of rival social sites and enabled users to influence SERPs within their own social circle. Google+ itself revolves around what the company calls “social layers” – relying on content-sharing circles and integration with products such as Gmail and YouTube. In October 2013, reports indicated that Google+ had 540 million active users – making it one of the web’s most popular networks.

Google also developed a number of new features to help achieve richer and fresher search results. The SERP Freshness update in November supposedly impacted upon 35% of queries, while a number of other new “schemas” – developed in collaboration with Bing and Yahoo! – helped search engines to better understand websites, and businesses to promote their sites more effectively. Following the Expanding Sitelinks and Query Encryption measures, in December 2011 Google released its 10-Pack update – a conscious effort to be more transparent. Moz writes that the updates “included related query refinements, parked domain protection, blog search freshness and image search freshness”. Instead of signing off the year with a typical boozy party, Google kept it fresh.

Google celebrated the start of 2012 with a radical shift in the personalisation of its search engine. Search+ Your World aggressively pushed user profile data from Google+ into SERPs (although a new toggle button was added to limit personalisation). Google explained the decision in its official blog: “search is still limited to a universe of webpages created publicly, mostly by people you’ve never met, [so] we’re changing that by bringing your world, rich with people and information, into search… We’re transforming Google into a search engine that understands not only content, but also people and relationships”. In February 2012, the Venice algorithm update integrated local results more tightly into SERPs to provide web users with results more relevant to their location. Venice made it increasingly important for SEOs to develop more localised plans. At the same time, a new Page Layout Algorithm update sought to devalue webpages with too much ad space above the fold – the visible section of the page before scrolling down.

Penguin was released in April 2012 following speculation in the SEO world about a new over-optimisation penalty. Otherwise known as the “Webspam update”, Penguin targeted “black hat” tactics such as keyword stuffing, sites with poor quality content and unnatural link profiles – affecting 3.1% of Google queries. Penguin continued the work of Panda (and page layout algorithms) in downgrading websites that provide a poor user experience and promoting sites with high quality content. Moreover, in August, the DMCA Penalty Pirate update started to penalise pages with repeat copyright violations. Good news for the entertainment industry; bad news for users who watch movies and TV shows on dodgy sharing websites.

Photo courtesy of EAWB (Flickr Creative Commons)

Throughout 2013 Google continued to roll out incremental updates to its Panda and Penguin algorithms. In June, it released the “Payday Loan” update, targeting spammy results such as payday loan and pornographic websites. Matt Cutts said the update sought to detect untrustworthy, and often illegal, link schemes – impacting less than 1% of queries in the US, though up to 4% in countries where web spam is typically higher. The most important update of last year, however, was announced in August – Hummingbird. It has drawn comparisons with Caffeine (2009–10) as it seemed to affect Google’s core algorithms and infrastructure (here’s a typically perceptive blog by Moz). Hummingbird marked Google’s 15th birthday, but was specifically tailored to meet the demands of today’s web user. For example, it sought to benefit those using more modern forms of search, such as conversational and voice search. The Guardian reports how Google’s latest major algorithm update focuses on the Knowledge Graph: “an encyclopaedia of about 570m concepts and relationships that allows Google to anticipate facts and figures you might want to know about your search term”. Essentially, Hummingbird was developed to generate results that are more relevant to the actual meaning of a particular search, rather than to a few of its keywords. Amit Singhal, a senior vice-president of the company, believes the update will affect about 90% of search queries and, in summing up the change, he echoes his illustrious colleagues Brin and Page: “we want to keep getting better at helping you make the most of your life”.

In October 2013, Matt Cutts suggested that webpages heavy on ads needed to start making changes to sustain their PageRank. And in February 2014 he announced a new modification to Google’s page layout algorithm (initially introduced in early 2012). Web users have speculated that it targets sites that use the endless-scrolling technique, lots of white space and unsuitably large fonts. Obviously, the results of this latest update will not be fully felt until later this year; however, Google is clearly trying to reward high quality content (in December 2013 it also issued a new program to weed out authorship abuses). The relationship between web content and ads is symbiotic: ad space provides money for quality content, which in turn encourages new users and more exposure. Google wants to emphasise the importance of this relationship by promoting sites with quality content and downgrading exploitative sites.


What does the future have in store?

The future of online search is incredibly exciting. Our search habits are constantly evolving and so are Google’s search algorithms. Mobile search has a big future and Matt Cutts has recently stressed the importance of mobile site usability. YouTube traffic on mobile, for example, increased from just 6% in 2011 to 40% last year. Moreover, Google is working on the next generation of hacked site detection to prevent users from finding results for nasty search queries. “If you type in really nasty search queries, we don’t want you to find it in Google”, said Cutts.

Google has always been on top of its game, making it the global standard for online search. However, much of its future innovation lies elsewhere. Following the company’s recent sale of Motorola, its engineers are free to focus on other revolutionary technologies such as robotics and the next generation of cyber security. Google has already accelerated the rollout of its new “smart glasses”, which businesses believe could be ideal for the corporate market. What’s more, reports speculate that the company is designing hot air balloons that carry internet connections to remote parts of the world, digital tattoo-like identity tags, humanoid robots and vehicles that drive themselves.

In reality, only Google’s engineers truly know what’s in store for the future, yet we can be assured that the company will continue to revolutionise the way we live and work.

NEXT: in our next paper, we will take a more detailed look at the impact of these iterations, analysing whether the algorithm is actually fit for purpose and the extraordinary power that Google wields over whether most SMEs succeed or fail – be sure to keep your eyes peeled for that instalment.


What are YOUR thoughts?  We’re always keen on feedback so please feel free to comment below – or via our social media (@smg_uk) – and tell us if there’s anything you’d like us to discuss.

March 5th, 2014 | Articles