There’s more to Google Instant than Ajax. It’s the n-grams, stupid.
With the release of Google Instant, developers everywhere have sprung into action, building their own versions of Instant for their favourite web services. This avalanche of development was undoubtedly spurred on by one developer’s job offer from YouTube based on his YouTube Instant work. But what many of the copycats fail to recognize is that an asynchronous results page is not what makes Google Instant effective; it’s Google’s ability to predict your search. And how do they do that? N-grams, my friend, n-grams.
What’s an n-gram? It’s simply a sequence of ‘n’ consecutive tokens (here, words). For example, a tri-gram (or 3-gram) would be “how are you” or “las vegas vacation”. A bi-gram would be “hey there” or “justin bieber”. Google stores every single n-gram it can get its hands on (via Google Search, Gmail, Google Voice, Google Voice Search, etc.) in order to improve its accuracy in predicting a human’s search intent. In fact, a while back Google published a list of the most commonly used n-grams in its corpus.
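To make the definition concrete, here is a minimal sketch in Python of pulling word n-grams out of a string. The function name `ngrams` and the naive whitespace tokenizer are our own assumptions for illustration; a real tokenizer would also deal with punctuation and the like.

```python
def ngrams(text, n):
    """Return every run of n consecutive words in text, as tuples."""
    # Assumption: tokens are lowercase, whitespace-separated words.
    tokens = text.lower().split()
    # Slide a window of width n across the token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


print(ngrams("las vegas vacation deals", 3))
# [('las', 'vegas', 'vacation'), ('vegas', 'vacation', 'deals')]
```

Asking for more words than the text contains simply gives back an empty list.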
So what does this all mean to you, the endeavouring developer? Repustate has added a call to its API that generates n-grams. With this call, you can generate your own n-grams over any data set or web page you like and come up with your own prediction algorithms. For a quick demo, head over to our home page and, using the demo box on the right, choose “Generate n-grams” from the drop-down, enter a URL, and you’ll get a quick sampling of our n-gram generator.
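As a rough idea of what a prediction algorithm built on n-grams could look like, here is a small Python sketch that counts which word tends to follow a given context and suggests the most frequent ones. It does not call the Repustate API; the names `build_model` and `predict_next` and the sample queries are made up for illustration, and this is certainly not how Google does it.

```python
from collections import Counter, defaultdict


def build_model(queries, n=2):
    """Count, for each (n-1)-word context, the words seen right after it."""
    model = defaultdict(Counter)
    for query in queries:
        # Assumption: tokens are lowercase, whitespace-separated words.
        tokens = query.lower().split()
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            next_word = tokens[i + n - 1]
            model[context][next_word] += 1
    return model


def predict_next(model, typed, n=2, k=3):
    """Suggest up to k likely next words, given what has been typed so far."""
    tokens = typed.lower().split()
    # The context is the last n-1 words typed (empty for unigrams).
    context = tuple(tokens[-(n - 1):]) if n > 1 else ()
    # .get avoids adding unseen contexts to the model as a side effect.
    counts = model.get(context, Counter())
    return [word for word, _ in counts.most_common(k)]


# Hypothetical sample data standing in for a real query log.
queries = [
    "las vegas vacation",
    "las vegas hotels",
    "las vegas vacation deals",
    "justin bieber tickets",
]
model = build_model(queries, n=2)
print(predict_next(model, "las vegas"))  # ['vacation', 'hotels']
```

With bi-grams only the last word typed matters; raising `n` gives the model more context to work with, at the cost of needing far more data before the counts mean anything.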