Powerset's Q & A vs. Keyword Search
Powerset would likely be the first to promote Natural Language Processing, NLP, as the future of search. Their recent blog post provokes a few interesting debates about the premise of their approach to improving Web search as we know it today. In theory, natural language processing is a very attractive method of human-computer interaction. In practice, it still has its limitations.
English is particularly challenging in this regard because it has little inflectional morphology to distinguish between parts of speech. Wikipedia has a simple little example to illustrate this point:
English and several other languages don't specify which word an adjective applies to. For example, consider the string "pretty little girls' school":
- Does the school look little?
- Do the girls look little?
- Do the girls look pretty?
- Does the school look pretty?
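To make the ambiguity concrete, here is a minimal Python sketch that lists every way the four words could be grouped in pairs. It is purely structural and makes no claim about how Powerset, or any real parser, works; the function name `bracketings` is made up for illustration.

```python
def bracketings(words):
    """Return every binary grouping of a word sequence as a string."""
    if len(words) == 1:
        return [words[0]]
    results = []
    # Try every split point, then combine each grouping of the left
    # part with each grouping of the right part.
    for split in range(1, len(words)):
        for left in bracketings(words[:split]):
            for right in bracketings(words[split:]):
                results.append(f"({left} {right})")
    return results


phrase = ["pretty", "little", "girls'", "school"]
readings = bracketings(phrase)
for reading in readings:
    print(reading)
```

Even this short phrase has five possible groupings, and nothing in the words themselves says which one the writer meant. A real system needs context or world knowledge to choose.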
Powerset will attempt to solve this problem with NLP and what must be an insanely massive library of ontologies, in an attempt to contextualize the entire Web. A bold undertaking indeed. But let's set aside their pending solution and look at the potential impact an NLP-based system would have on the user experience. NLP works best with well-formed questions, phrases, and 'contextual' descriptions. You'd be hard-pressed to find NLP improving the results returned for some typical queries, such as "weather 94107", "paris hilton", or "the police concert tour dates".
So the question becomes this: What percentage of all Web searches would truly benefit from NLP-style queries? Is it enough to make NLP universal, or to let it stand on its own? Or is it better served as an enhancement or feature add-on to existing Web search offerings? Methinks it is the latter. Feature, product, business. Remember the FPB test: all technologies and ideas fall into one of the three.
NLP prefers the user to formulate semi-structured sentences in order to produce the best, or most noticeably improved, results compared to traditional keyword searches. As stated above, this can be very handy for certain types of searches, without question. But what happens if your sentence is poorly written? What if your English, French, or Spanish language skills are not up to par? What if you are unfamiliar with the host's ontologies and vocabularies for a new research topic you want to explore? Can NLP produce better results in the absence of accurate or sufficient natural language input? And what of the content being retrieved? What if it, too, is miscategorized or made up of poorly structured text?
A common solution is categorization, classification, and taxonomic organization of content. Another is to predetermine a vocabulary for a given topic of information. Ontologies, as they are better (or lesser) known, for any genre of information, be it politics, sports, or nanotechnology, are thereby subject to the interpretation of the authors who create them. These authors assign meaning in ways that other people, cultures, and languages could interpret quite differently. This could create incongruence between the question and the answer, er... between the query and the results.
Another interesting data point to bear in mind: Web searchers today are actually quite efficient and effective with keyword searching, enhanced further by increasing fluency with boolean and other advanced search operators. As such, keyword searching is often (but not always) hyper-efficient at getting users precisely what they are looking for. Let us also remember that "keyword search" per se doesn't necessarily equate to "keyword matching" as the sole or even primary means by which related content is returned from a traditional Web search index. Today's top search engine algorithms are far more complex than simple keyword matching, counting, and/or extraction. In fact, some components of page ranking, relevance, and ordering of results pages are language/text independent. Rather, they rely on the organic substructure of the Web and the interconnections between pieces of information, which help to paint a picture of related or important subject matter. This helps tremendously in dealing with the Wild, Wild Web, which is fraught with unstructured text, spelling errors, inconsistent or incomplete grammar, and the like, found in millions of web pages around the world.
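For the curious, here is a rough Python sketch of what a language-independent ranking signal can look like: a simplified link-based score in the spirit of PageRank, computed from nothing but which page links to which. It is an illustration under stated assumptions, not any search engine's actual algorithm. The page names, the `link_rank` function, and the damping value of 0.85 are all assumptions chosen for the example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages from link structure alone; no page text is read."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score to everyone.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "c", and "c" links to "a".
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = link_rank(links)
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

The page that everyone links to comes out on top, and the code never looks at a single word on any page. Misspellings and broken grammar simply don't enter into this part of the score.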
A lot of people in the industry like to assert that "not much has changed with web search over the past several years," which couldn't be further from the truth. In fact, the major search engines enhance their core search algorithms multiple times per week. The problem is that some of these people (non-search experts, journalists, analysts) base their assessments on what they read or don't read about search in the press. Others (new search upstarts and old dying breeds in search and enterprise search) are simply in denial, and keep telling themselves that search hasn't changed to help justify a withering existence.
But do not fret (too much, anyway); all is not lost. I do believe there are a few definitive paths to success in the web search industry for new companies with the right idea, but only for those that come prepared with their eyes wide open and a very realistic view of where search truly is today and how end users are influencing where it goes from here. Without an accurate view, let's be honest, they're pretty much dead.
As for Powerset, I have to believe they've embraced this exercise, but only time will tell. Let's see how they debut later this year.