Showing posts with label web 2.0. Show all posts

Monday, August 20, 2007

Being Verity in a Google World

"Real Source Content / Result Federation is alive and well"
- J.W. Lehman, founder of Verity


I would be remiss in not taking the opportunity to respond to an interesting comment posted to my blog back in June on Federated Search. The response came from a founder of Verity, a leading enterprise search vendor acquired by Autonomy. The comments revitalize the debate surrounding the evolution of information retrieval and the evolution of information storage. To best clarify my position, and provide a rebuttal to the points raised by J.W., I'll provide my comments in-line and in bold following the poster's comments. The debate is on, right after the jump!

From J.W. Lehman,
founder of Verity

Real Source Content / Result Federation is alive and well

“Old Federated searchers never die, they just become…..”
anon

1. The poster hasn’t a clue about the purpose of federated search in information retrieval / research. Should “federated search” take the blame for slow/poor collection access? Of course not. Federated search is NOT, as the poster claims, an “interactive” single collection search mechanism, ala google, verity or any like…it’s a “watcher-monitor” of what is going on in the info-world in specific subject areas. If the poster told his enterprise customers they were getting google-for-the-deep-web, the poster just didn’t understand their requirements….typical for IR technology vendors, and VCs. Who cares if the answer takes 5 minutes or 5 hours? The purpose of federated search is sending-alerting new relevant material as it’s generated. Federated search is a very powerful, and quick, research assistant WHEN IT IS APPLIED PROPERLY.


Well, I think the opening paragraph pretty much says it all: "Who cares if the answer takes 5 minutes or 5 hours?" Who indeed...hmmm. How about everyone. Everyone who's grown up with today's superior web search engines at their disposal. I would love to take a poll to see how many people would be willing to wait 5 minutes, much less 5 hours, for ANY form of search. But let us carry on.

We must first correct the poster's bold premise because federated search clearly does not belong in the 'watcher/monitor' category. Watcher/monitors have a very distinct membership that is quite different from federated search. RSS/Atom feed readers, dashboards, and RSS aggregators such as iGoogle, NetVibes, NewsGator, Bloglines, and OriginalSignal, are watcher/monitors. They are not federated search players whatsoever. Search is pull not push. It is active not passive. Feed aggregation is passive, and it pushes. Apples and oranges here.


Federated search supports COMMUNITIES OF INTEREST by replacing the incredibly complex need to individually access and merge content from all appropriate sources in the search for answers (regardless of their “fun-ness” to access), with a process that does it on command.

J.W. obviously hasn't read my other posting on Yahoo Pipes and Google CSE. I offer it up as prerequisite reading before anyone claims that big search engines can't address 'communities of interest' in a far easier and more powerful way.

If the user can’t wait 5 minutes or 5 whatevers for results that he/she couldn’t obtain in 5 weeks-months of manual effort, then the sources themselves must be unnecessary.

This remark invokes the proverbial 'wake up and smell the coffee' response. Every search engine in existence has invested millions in R & D and usability studies to unanimously confirm and conclude that speed matters. And it doesn't just matter, it is vital to achieving widespread adoption and utility; it is vital to survival. It is often the difference between #1 in the industry and #100.

See this link for empirical evidence on this point. You'll find that user adoption and user satisfaction are of paramount importance to the search experience. A 500ms increase in response time results in millions of abandoned searches and unsatisfied users. Can you imagine what would happen if these users had to wait 100 times as long? That would be 50 seconds. How about, as J.W. suggests, 1000 times as long? I think we can all predict the outcome.
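The arithmetic is easy to check. Here is a minimal sketch that takes the 500ms figure as the baseline; the function name and the constant are my own labels for illustration, not anything taken from the linked study.

```python
# Baseline added delay quoted above, in milliseconds.
BASELINE_DELAY_MS = 500


def scaled_delay_seconds(multiplier, baseline_ms=BASELINE_DELAY_MS):
    """Return the wait in seconds when the baseline delay is multiplied."""
    return baseline_ms * multiplier / 1000


# Print the wait for the baseline and for the two multipliers discussed.
for multiplier in (1, 100, 1000):
    print(f"{multiplier:>5}x -> {scaled_delay_seconds(multiplier):g} s")
```

At 1000 times the baseline the wait is 500 seconds, a little over eight minutes, which is the same order of magnitude as J.W.'s 5 minutes. His 5 hours would be 36,000 times the baseline.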


The poster, and most of the rest of us, have fallen under the google-spell that time to first result and time-to-answer are the same. Not! How long does it take to find the fact/assumption/relationship in google/convera/verity/zylab/inxight result # 870? We’ll Never Find It, because we gave up after result 25.

This is no spell. This is reality. The world has evolved, people. The majority of web surfers are a few of us Gen-Xers plus Gen-Y, Gen-Z, and Millennials. Most were born into this world with a cell phone in hand, and broadband and Wi-Fi everywhere. The expectation of always-on access, instant gratification, and real-time computing convenience is not a nice-to-have in today's world; it is an assumed, necessary requirement. And they are the best and brightest generations of our time.


2. “keyword” search? What century is the poster from? If you can’t explore content via explicit taxonomies with the searchrules to back them up, of course you’re going to get poor, mixed up results. [and not only is clustering is dead, dead, dead…, it was never alive!]

We do agree on one point above: clustering is not ready for prime time. Beyond that, perhaps our differences are simply generational. I am part of the Internet generation, and not a day earlier. Let's be real, folks: keyword search works, and it works really, really well. It is undisputedly the fastest, most popular, and most effective universal mechanism for finding information today.

Today's keyword search engines are about anything but just keywords. But my discussion is not (and has never been) about keyword searching. It is about federated search, its shortcomings, and why we must index everything. For the sake of discussion, though, here's my quick take on the state of keyword search technology: today's 'keyword search interpretation' technologies are more intelligent, proactive, interpretative, interpolative, and extrapolative than ever before. They are capable of much more than meets the eye. But that is the point: to keep it simple for the user, to make the system appear 'idiot proof', as if all it takes are a few simple keywords and magic happens. This is increasingly the case today. There is more to do, that is certain. However, keyword search is still by far the most effective input mechanism for matching information with your intent, even if you aren't fully aware of your intent or fully knowledgeable about the subject you pursue. See an upcoming post titled: "Browsing the Web for Knowledge Using Keyword Search."

The industry deadpool is full of vendors that once hawked taxonomies, directories, and other structured content browsers. Taxonomies are great for very specialized collections of content, but they totally implode when mashed together by a federated search engine with 10 other content sources that have totally different ontologies, categories, and metadata. It just doesn't work when blended together from completely different sources.


Index everything!!!!!!!!! Why bother? Keyword search will give you the same mess on an indexed collection…actually worse, because it’s only the rare and to-date, unpopular engine that recognized the presence of evidence at the meaningful text unit (i.e. paragraph) level….so instead of federated search telling you your “KEY-WORD” is actually in the title/snippet/abstract, you now get to discover the 1000x list of content where it’s anywhere in the full-text. What an advancement!

Why bother, hmmm...why indeed... Well, let's see... the last time someone got the idea to do this the right way, out popped a couple of life-changing web companies with worldwide adoption and sustained valuations in the tens and hundreds of billions of dollars.


But here's a better reason: It just plain works.

The real problem here is that my counterpart is mixing metaphors for comparison's sake by effectively equating federated search with concept search, and earlier with watcher/monitors, both of which are false equations. I'm not comparing methods of retrieval. I am focused on the virtues of storing all content in a single index. And just because we've indexed everything into a single source does not mean that we are limited to mere keyword searching for information retrieval.

Every federated search engine, including Verity, when plugged into multiple sources for keyword searching does at least this much: pass the keyword queries to each content source wired to the federated search, and get results back from each, the keyword way. We know there are many other ways to retrieve content from a source, but this topic is and has always been about federated searching, not federated browsing, nor conceptual matching. All of which can still be done better with a single index of content anyway.


3. Result Federation…..The ability to de-dupe, de-mystify and normalize results from multiple relevancy determination techniques has been available for years…where have you been? All that’s necessary is to make a practical relevance determination of each result based upon the search request; and order it.

Regarding the existence of de-duping, etc., I distinctly don't recall saying anything to the contrary. I merely maintain that no implementation to date works very well. Not one federated search engine can possibly make a reliable relevance determination based on the search query, for one simple reason: it is not up to the federated engine to decide! The results that come in from each disparate content source are determined by that source's proprietary ranking and relevancy algorithm. Thus, even if the federated engine could magically infer the inter-source ranking with some degree of usefulness (though doubtful), the net results would only be as good as the worst ranking algo from the worst content source. Let's look at a simple illustration to clarify, shall we:

Step 1: Example query: nanotechnology fabrication

Step 2: Sources 1-5 are selected to 'federate' - assume sources 3-5 have terrible ranking engines

Step 3: The above keywords (yes, keywords, J.W.) are passed to each source's query engine

Step 4: The "top ten" results are returned from each source's relevancy engine

Step 5: The 50 results are somehow re-ranked based on the nature of the query? I'd like to see that, especially since the results returned are merely title, snippet, and URL, NOT full text, as is the case with every standard enterprise and web search engine.

Step 6: Regardless, the poorly ranked documents from sources 3-5 make it impossible to unify the ranking in anything but a largely arbitrary way, which lends only arbitrary credibility to the results list.

Step 7: Because the federation technology has no way to evaluate how well a given source is ranking its own documents, it is impossible to establish a consistently high quality set of ordered results, using this antiquated yet widely suggested way of federating.
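
For the code-minded, the steps above can be played out in a toy simulation. This is a minimal sketch under loud assumptions: the five sources, their documents, the hidden "true relevance" numbers, the .example URLs, and the round-robin merge are all made up for illustration. No real federated engine or vendor API is being modeled.

```python
from dataclasses import dataclass
from itertools import zip_longest


@dataclass(frozen=True)
class Hit:
    """What the federator receives: no full text, no comparable score."""
    source: str
    rank: int      # position assigned by the source's own engine (1 = best)
    title: str
    snippet: str
    url: str


def top_ten(source, docs, good_ranker):
    """Simulate one source answering the query (Steps 3 and 4).

    docs is a list of (title, true_relevance) pairs. true_relevance is hidden
    ground truth used only to simulate the source; it never reaches the Hit.
    """
    if good_ranker:
        ordered = sorted(docs, key=lambda d: d[1], reverse=True)
    else:
        # A terrible engine: alphabetical by title, ignoring relevance.
        ordered = sorted(docs, key=lambda d: d[0])
    return [
        Hit(source, i, title, f"...{title}...", f"http://{source}.example/{i}")
        for i, (title, _) in enumerate(ordered[:10], start=1)
    ]


def federate(result_lists):
    """Step 5: interleave by per-source rank, the only signal on hand."""
    merged = []
    for tier in zip_longest(*result_lists):
        merged.extend(hit for hit in tier if hit is not None)
    return merged


QUERY = "nanotechnology fabrication"  # Step 1 (the simulated sources ignore it)
POOR_SOURCES = {"source3", "source4", "source5"}  # Step 2

truth = {}
result_lists = []
for n in range(1, 6):
    name = f"source{n}"
    # 20 made-up documents per source; a higher j means more relevant.
    docs = [(f"{name}-doc-{j:02d}", j / 20) for j in range(20)]
    truth.update(docs)
    result_lists.append(top_ten(name, docs, good_ranker=name not in POOR_SOURCES))

merged = federate(result_lists)
poor_in_top_ten = sum(hit.source in POOR_SOURCES for hit in merged[:10])

print("query:", QUERY)
print(len(merged), "results merged;", poor_in_top_ten, "of the top 10 come from poor rankers")
for hit in merged[:10]:
    print(hit.rank, hit.source, hit.title, "true relevance:", truth[hit.title])
```

In this toy run, 6 of the first 10 merged results come from sources 3-5, and they are the least relevant documents those sources hold. A smarter merge could score titles and snippets against the query, but it would still be working without the full text and without any way to tell which source's ranking to trust.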


4. In any subject, google-yahoo-ms-altavista-etc, lets you find out what everyone
else already knows…..the ability to find out what nobody else knows/surmises is
virtually denied.

This belief makes one heck of a gross assumption about the way in which any of the aforementioned engines employ page ranking. Discovery is purely a function of the nature of the access methods to the information source, all other things being equal. With a single index of content I can create discovery, knowledge, connectedness, and relatedness of concepts, sentences, subjects, and more without the need for federating a single thing. It was called Grokker 2.3 Desktop for Google, back in 2004. Today it's called Google CSE for a single source, and for multiple sources it's called Yahoo Pipes.


That is what federated search is for … multi-disciplined
communities of interest seeking answers to advance knowledge, as opposed to
wikipedias-google results.

June 11, 2007 3:35 PM


Federated search as it exists today is not a social medium, and it was never intended to be. Collaborative filtering and collective intelligence, on the other hand, are the future today. Has someone slept through the web2.0 phenom? digg, delicious, feedburner, flickr, Wize, Yelp, Google Reader, iGoogle. Web 2.0 companies have already categorically taken this aging notion of 'communities of interest' via metasearch tools and turned it upside down, and actually made it work for the first time. And while all of these new web services aggregate content from a huge number of sources, they are not federated search engines in any sense of the word, as I have described in all of my postings.

What's more, equating or limiting the definition of federated search to apply only to research/enterprise content versus searching public WWW content, is a significant misnomer.

For if the best of today's web search engines were to index ALL of the available high quality, structured enterprise/research content behind the firewall (which now a few of them are doing, btw), I could then profess the end of old-school federated search, that has plagued enterprises, universities, and the world at large for over a decade now. Giving way to entirely new ways of federating, classifying, categorizing content-- but from a universal index of content with standardized metadata and shared ranking algorithms.

So my position remains unchanged, if not reinforced. The doctor has checked the patient for a pulse, and she's still dead as a doornail. Good night and good bye my dear federator...


Tuesday, April 3, 2007

Personalized Google Mashups - On The Fly

If you haven't used JSON, you're missing out. If you haven't heard of it, you're just out of it, period. JSON is a great data interchange format that Google utilizes to streamline their first mashup wizard for Google Maps. It's a simple alternative to coding (certain) server-side proxies for HTTP requests to get to data in the form of JSON feeds. JSON liberated this extremely cool mashup wizard at Google a few days ago. Zero coding required to build very useful Google Maps mashups of your own from your own Google Spreadsheet table. Reminds me of XQuery's thin client-side data extraction properties. Not surprising. Hmmm...XQuery for JSON...we could really be on to something. At any rate, for this example, you have to get your data into Google's Spreadsheet first, but that's far simpler than coding a mashup from scratch. This is the power of great front-side middleware, making custom app building truly user friendly. An excellent step forward that will no doubt unleash a new bevy of corporate, personal, and startup mashups. My first mashup to follow...
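
If you haven't seen JSON in the wild, here is roughly what consuming a feed looks like. This is a minimal sketch with a made-up payload: the "rows", "name", "lat", and "lng" keys and the marker shape are my own assumptions for illustration, not the format Google's spreadsheet feeds or the mashup wizard actually use.

```python
import json

# A hypothetical JSON feed of spreadsheet rows (the shape is assumed, not Google's).
feed_text = """
{"rows": [
  {"name": "Office", "lat": 37.422, "lng": -122.084},
  {"name": "Cafe", "lat": 37.7749, "lng": -122.4194}
]}
"""


def rows_to_markers(text):
    """Parse the feed and turn each row into a simple map-marker dict."""
    data = json.loads(text)
    return [
        {"title": row["name"], "position": (float(row["lat"]), float(row["lng"]))}
        for row in data["rows"]
    ]


markers = rows_to_markers(feed_text)
for marker in markers:
    print(marker["title"], marker["position"])
```

The point is how little there is to it: one call to parse the text, and the data is already in native lists and dictionaries ready to hand to a map.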


Tuesday, February 20, 2007

Design Simple, Part 1: Enterprise 2.0

A question I am constantly being asked by up-and-coming tech entrepreneurs: "How can we better leverage the web, and new web2.0 models to spark growth or create a new market segment or revenue channel altogether?" But now I am being asked the same question by serial entrepreneurs and seasoned Fortune 1000 executives alike. The answer is not the same for all scenarios, but indeed there are some battle-tested rules that everyone should know. In a multi-part blog series, I will be addressing various aspects of this new era and the dilemmas that most companies, old and new, will come to face at some point in the not too distant future.

In my mind, this question cannot be answered unless you've built online businesses in both web worlds, Web 1.0 and Web 2.0. I've sat on countless panel discussions and debated with a wide variety of journalists, analysts, bloggers, and 'pundits' who all have their own 'unique' spin on the subject. Sideline referee versus quarterback perspective, I suppose. Mostly what you hear from this crowd are catch phrases like: the long tail, viral marketing, scalability, network effects, social mediums, participation age, the list goes on and on. Judging by the ever expanding definition for Web 2.0 at Wikipedia, it's no wonder that many joining the fray quickly become shrouded in the vagueness of their own Web 2.0 execution.

As you might expect, Tim O'Reilly presents a reasonably sound analysis of the web2.0 conceptual framework, and the new 'levers' that can influence the adoption of a web business, product, or service. It is a good primer to get you oriented before attempting the recommendations that follow. Bear in mind that there is a big difference between conceptualizing web2 strategies and executing them in the real world to produce tangible, measurable results. Don't be fooled into thinking there are shortcuts and quick hits to be had by simply regurgitating what you've read online. The list here is by no means a complete how-to; rather, it is dialed in to address the most common problems with executing the vision.

1. Get focused. Converge on a single idea. Don't zoom out. Resist the temptation to lump 10 separate features, 5 unique value propositions, or even just 2 different products into something you defend as your sole mission.

2. Repeat step #1

3. Repeat step #2, this time have a colleague or customer test your focus for clarity.

Let the value proposition sell itself. Too often people equate Web 2.0 with Hype 2.0. A marketing blitz will only dilute the DNA of the product's key value and keep it from reaching the user in a meaningful, sustainable way. You need to get the product right out in front of the audience. If you cannot articulate or demonstrate the core value proposition with one picture, or one sentence, all subsequent strategies to leverage the Web effect will be ineffective. I see this all the time with the companies I am advising today. Great products, great teams, great ideas, but all of the value creation gets buried behind logins, passwords, downloads, and other forms of "friction in the adoption curve", as I have coined it. This friction does nothing but keep the best assets and value from being discovered and adopted in the marketplace. It's sort of like self-inflicted wounds that never heal quite right.

This is most prevalent with enterprise companies trying to become enterprise 2.0 companies overnight. I've watched Cisco try it, Sun try it, and hordes of other companies large and small. Each with varying (read: minimal) degrees of success thus far. And without some serious reprogramming, none will truly garner the uptake of the web2.0 pros like Jot (which at the moment is offline for new customers because of the recent acquisition).

What constitutes a good example? First, it is the company whose product or service is the website itself, and vice versa. Think about that for a moment. Second, the Web must be an integral part of the value proposition, and an integral growth driver. More on this topic in part 2 of the series to follow.

What constitutes a weak example? Brochureware is not 2.0. Demoware is not 2.0. Screenshots, Flash movies, and 30-day trials are so not 2.0.

The challenges most companies face with embracing the Web and '2.0' as a new market for growth can be complex and subtle. Conventional sales and marketing techniques only create surface-level awareness. They don't spark adoption and they don't promote viral uptake. Only an immediately recognizable value proposition and a legitimate social incentive will put 2.0 in motion. Instant gratification is vital. Without it, it's only a matter of time before you're back to the drawing board.

Enterprise companies today have tremendous potential for growth in a 2.0 world because of the vast asset bases they are sitting on. Unlocking new potential and new markets for these assets can conceivably give them a tremendous unfair advantage. Getting it right is indeed both an art and a science, though, and producing tangible, scalable results doesn't come easy. What's more, the process can easily backfire without a qualified team in place to execute. Brands, credibility, and market share can easily be swept away with ill-fated attempts to join the "in" crowd on a whim. The same can be said for start-ups. Though their assets are not necessarily vast, they are potent, high-value, fresh concepts that risk becoming the greatest technical innovation that never took off. 2.0 is a powerful concept that, when used responsibly, can bring critical mass closer to reality. More to come. Stay tuned.
