Showing posts with label google. Show all posts

Thursday, June 7, 2007

iSearch, uSearch

Just returned from a month in Asia and Australia (inset pix of Shanghai nightlife). Fascinating centers of innovation from Tokyo to Singapore to Sydney. All have some interesting twists to next-gen Web applications and search. But that's for another time, another post. Today it's time we revisit the death of federated search (aka metasearch, single search, etc.) as we know it, and share a glimpse of what the future holds for finally solving the very elusive problem of getting at all of our information as easily as we should be able to. Ok, just a moment...ok, yep just checked, and it's still dead, dead as a doornail.

My friends and colleagues at the Stanford University library and info-sciences department have been researching this problem head-on with over 700 databases of searchable research and academic content at their disposal. They are not alone. Countless universities, companies, and web services at large have found themselves at the end of the same dead-end road.

Single search doesn't work, nor does traditional metasearch, or any other twist on federated search. Clustering metasearched results from multiple sources into artificial categories or groups only exacerbates the problem, and thoroughly confuses the end user. (Sidebar: clustering has a very long way to go before it is anywhere near ready for prime-time public consumption. Until then, it has no business being in search.) These approaches have, in fact, proven pointless and only further delay any attempts to arrive at an acceptable user experience for effectively accessing a multitude of content sources simultaneously. I would go as far as saying that these so-called solutions are robbing these important customers of their youth, costing not just hundreds of thousands in license fees, but years of setbacks and distractions dealing with totally ineffective solutions. Everybody seems to have an angle that to me is nothing short of amusing. You'll notice through that link several spins on the same broken solution. I reviewed everything listed in those results. RIP.

There is a reason why basic search remains so widely popular, effective, and accepted by the vast majority of info seekers. Because it works. Because it is simple and intuitive. People get it. What people don't get are kludgey attempts to mash a bunch of square pegs into a round hole. If you look at the quality of search results from any of the tens or hundreds of enterprise search vendors, metasearch peddlers, and then say, Google, what you'll find might surprise you. Or maybe it won't. Yes, obviously Google.com works, and Google Search Appliance is no different. GSA stuck to its roots from Google.com for a reason: simple and intuitive user experience and high quality results-- from ONE source. Today GSA can crawl and index virtually any type of info object or database in existence. Why bother promoting new content in separate databases? This only adds to the problem. And with Google OneBox, we go even further, wiring competing content management systems to a better Google-controlled search experience.

So just what am I getting at? No, I'm not pimping Google's wares, but I am using them as one of only a few early examples of how to correctly begin to approach this problem. The answer is simple. One source. One index. One search interface. The fact that 700 databases sit in front of the info seeker is the real problem. There is no cohesive data model to support any meaningful metasearch whatsoever. "Normalizing" the boolean structure of the query language for each source's retrieval method was thought to 'standardize' the results that come back from all these random content sources. Not so. For it is not the query that matters; rather, it is how the content is indexed. Just because the genre or subject nature of two content databases appears to be 'related' does not imply that the returned results will be the best combination of the two sources. Why? Because they have completely independent relational structures, metadata schemas, and ontologies.

Federated search, as we knew it before it died, did nothing more than mask this problem with a bland search interface wrapped around a broken and discontinuous distributed data model. Despite the cold reality, many of you still employ this type of solution at an increasingly expensive cost to your company and to your users' productivity.

But let's get back to the answer. Google introduced Universal Search after quietly testing the concept under an alias website: searchmash.com. Yep, they really did. Universal Search is not there yet, but it is a move in the right direction. Yes, even Google faced a minor federation/metasearch problem as they continued to grow laterally into new content categories, e.g. News, Photos, Videos, Blogs, Products, Scholar, etc. As a result, it became increasingly unclear whether Google.com was the right place to start a search, with so many alternate entry points that may be more appropriate for certain searches, e.g. blogsearch.google.com, news.google.com, and many more.

Universal Search is an early attempt to give the user a little taste of everything: pictures, videos, blogs, news, and web search results in one result page. Check out this basic example here for Steve Jobs. You get what I'm saying. Now, this doesn't exactly scale if you have 20, 30, or 700 types of content, or content sources to display on a page. They simply wouldn't fit. Additionally, Universal Search is more about displaying content of different types or formats versus merely different sources of content. For example, web pages, news articles, pictures, and videos are all very different types of content. I have designed two unique ways to address this problem, following some of the principles of Universal Search. Enter Integrated Search.

The integration of content sources is where we begin. The devil is most certainly in the details for this design and implementation, but here is the gist:

Recipe for Integrated Search

Ingredients

n parts of unique content sources
1 part really nice crawler/indexer (Nutch, GSA, or Lucene)
1 part high quality query interface with boolean translators, NLP, and auto completion and suggestion. (See CiteSeer or ACM for several)

Frappé all ingredients until smooth. Let stand and cool for 10 minutes.
Season to taste with one or both of the following:

1 search index inverter (yes, the secret sauce)
A dash of user intent interpolation at the point of query


This solves 3 problems at once. A single index, so that no sources need be considered at query time, ever. Smart pre-query processing to help guide the search query to match the users' intent. (We'll discuss intent-driven searches, or lack thereof, in an upcoming post.) And a powerful index/ranker to ensure that every content object in the index, from every original source is uniformly considered when ordering and displaying the results that best match the query.
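To make the single-index idea concrete, here is a minimal Python sketch. It is an illustration under assumptions, not the design of any product named above: the `UnifiedIndex` class, the whitespace tokenizer, and the simple TF-IDF style scoring are all hypothetical stand-ins for whatever crawler/indexer you pick. The point is only that the source of a document is stored as metadata and plays no part at query time.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: naive lowercase/whitespace tokenizer, for illustration only.
    return text.lower().split()


class UnifiedIndex:
    """One inverted index over documents from any number of sources (hypothetical)."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.docs = {}                     # doc_id -> (source, text)

    def add(self, doc_id, source, text):
        # The source is kept as metadata; it never influences ranking.
        self.docs[doc_id] = (source, text)
        for term, tf in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = tf

    def search(self, query):
        # Every document is scored with the same statistics, whatever its origin.
        n = len(self.docs)
        scores = defaultdict(float)
        for term in tokenize(query):
            matches = self.postings.get(term, {})
            if not matches:
                continue
            idf = math.log(1 + n / len(matches))
            for doc_id, tf in matches.items():
                scores[doc_id] += tf * idf
        # Highest score first; ties broken by doc_id for a stable order.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


index = UnifiedIndex()
index.add("a1", "journals", "federated search is dead")
index.add("b1", "theses", "search search search interfaces")
index.add("c1", "news", "shanghai nightlife")
results = index.search("search")
ranked_ids = [doc_id for doc_id, _ in results]
```

Here the thesis outranks the journal article simply because it matches the query more strongly under one shared scoring function, not because of which database it came from.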

This is NOT the case with traditional federators, which do nothing more than combine search results from hundreds of different indexing methodologies, with absolutely no way to 'honestly' or intelligently rank and order results that come from different indexers and ranking algos.
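A small sketch of why merged scores can't be compared honestly. Assume two independent sources that each compute an inverse document frequency (IDF) weight over their own collection only; the corpora, the `idf` helper, and the formula below are made up for illustration and are not taken from any particular vendor's ranker.

```python
import math


def idf(term, corpus):
    # Hypothetical IDF: log(1 + N / number of documents containing the term).
    containing = sum(1 for doc in corpus if term in doc.lower().split())
    return math.log(1 + len(corpus) / containing) if containing else 0.0


# Two made-up sources with independent collection statistics.
source_a = ["search engines", "search interfaces", "search ranking"]
source_b = ["protein folding", "gene search", "cell biology", "enzyme kinetics"]

idf_a = idf("search", source_a)               # term is common in source A
idf_b = idf("search", source_b)               # term is rare in source B
idf_all = idf("search", source_a + source_b)  # one index, one statistic

# A federator that merges by raw score favors source B's hit purely because
# "search" happens to be rare there, not because the document is a better match.
b_wins_naive_merge = idf_b > idf_a
```

With a single index the same term gets one weight (`idf_all`) for every document, so the ordering reflects the documents rather than the quirks of each source's collection.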

So even without revealing the secret sauce, you can see how this approach is fast, simple, and aligned with traditional search user experiences. The hard part? Crawling all the content sources means writing system adapters to connect to the weirdest of old-school flat-file DBs, obscure object databases, and a whole lot worse. But if you pick a good crawler or general search product, much of that hacking has been done for you, as with Google's Search Appliance and its 220+ adapters that work pretty well out of the Box, pun intended.

So about that secret sauce? Well, with a good inference about the user's intent we can bias the search results to better cater to the user's objective. And as for index inverting, it's really about inverting the results that come from the index for a given query. Ever curious what results actually appear at the end of a big web search with 5,400,000 results? How about the dead middle of those 5.4 mil? Curious, aren't we? Yes, it's all about discovery, and those deeper results can be more useful than you might think.
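The inverter itself stays under wraps, so what follows is only one plausible reading of the description above, not the actual secret sauce: take an already-ranked result list and surface a few items from the top, the dead middle, and the end (in inverted order). The function name `sample_depths` and the parameter `k` are hypothetical.

```python
def sample_depths(ranked, k=3):
    """Return k results each from the top, the dead middle, and the end of a ranked list."""
    n = len(ranked)
    if n <= 3 * k:
        # Too few results to sample separate regions; show everything as the top.
        return {"top": list(ranked), "middle": [], "tail": []}
    mid_start = (n - k) // 2
    return {
        "top": ranked[:k],
        "middle": ranked[mid_start:mid_start + k],
        "tail": ranked[::-1][:k],  # inverted order: the very last result comes first
    }


# Stand-in for a ranked result list of 100 hits.
ranked = [f"doc{i}" for i in range(1, 101)]
view = sample_depths(ranked, k=2)
```

A results page could then lay the three groups out side by side, which is one way those deeper results might get discovered at all.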

As screen real estate continues to increase on the desktop/laptop, we'll no doubt continue to see search results get 'fatter', as in wider across the page. Yes, two and three column search results are on the way. And wait till you see where the ads turn up. For search, it's just the beginning. For federated search, well, maybe we'll call it a new beginning. But for them, this means starting over. Completely.

So far I've yet to see any legitimate newcomers enter the arena to take up this challenge/opportunity head-on. In the meantime, partial solutions are manifesting within Web search while Google, Yahoo, and Ask continue to advance some good ideas in this arena. Yes, even Ask has been doing 'Unified' Search on their home page for a while now, and it's actually a reasonably clean UI. Try out this query: iPhone. Be sure to stretch your browser as wide as it will go...not bad.

Integrated Search, iSearch. Coming to a theater near you? We'll soon find out...


Wednesday, March 21, 2007

gPhone rumors squashed...for now

This gPhone news picked up by Engadget...

This confirms my previous post. Software is the logical start for Google, and while a phone isn't out of the question, the hardware business, particularly mobile phones, is an entirely different animal for the company. Yes, they do design their own data center servers and their own GSA hardware, so they are not without experience in the space.





Monday, March 12, 2007

iPhone and gPhone - together as one?

I am pretty sure that this is not the phone that Google is going to ship. However, this 'rumored to be' insider snapshot of the 'gPhone' prototype is more about the software integration with the OS at this stage, and less about the hardware enclosing it.


Either way, the working name probably won't last long given the gPhone from GlobalPhone Corporation, and the gPhone from Gnome-o-Phone, the open source Skype-like software. However, Google would by far have the most interesting use for the gPhone moniker. In fact, when you consider the power of the g-Apps that come out of Google.com, it is quite compelling to see the potential, even with just the first few 1.0 mobile phone apps from Google: search, Gmail, maps, and news. The Java MIDlets for Gmail and maps are particularly impressive for mere 1.0 applications. Satellite imagery straight to your Samsung Blackjack on the 3G network is the bomb.

But the killer 'app' per se is not going to be any one g-App, rather it will be the seamless integration of these applications to the phone that will make them compelling. Google almost doesn't need to market a phone at all, just the platform. However, if they do, it will only be to make the integration airtight. Windows Mobile 5 is just not there yet. Apple's iPhone? Not here yet, but coming. I think iPhone will be great, and the multi-touch interface will be killer. But will iLife be too much overhead, and thus overkill for mobile productivity? Will we really want iPhoto on our phones? Definitely a nice to have either way, but hardcore productivity remains to be seen. Yes, I'm sure to buy at least one iPhone in June, but I'd really like to see what Google can do here as well especially for true mobile productivity. Their notoriously lightweight, super-fast Web apps just work. And at 1.0, they work better than most of the 4th generation WM5 apps on the Blackjack.

Best of both worlds? How about a 'native' Google mobile suite for iPhone, to include all the apps from gmail to Jot. Yes, I know, Steve already mentioned that there would be some Google integration with iPhone at launch time. But how much integration is the question, especially in light of all this gPhone speculation. iPhone seems like a more straightforward entry point for Google, but in this business nothing is straightforward and anything is possible...should be an interesting summer.
