Next Generation Search Challenges
Coinciding with the announcement of the Sun/Google partnership, it’s looking like “next generation search” is getting a bit of attention over at Microsoft at the moment. The issue is that, despite the free availability of search engines like Google, MSN and Yahoo!, people often have quite a hard time finding what they’re looking for on the Internet.
This is a fundamental problem with simple “enter a keyword or two” approaches to search. Such search terms are, of course, ambiguous. So, to take one of Scoble’s examples - if you enter HDTV as your search term, the search engine can’t tell if you’re interested in buying an HDTV, or trying to find out about the technology.
Are there solutions to this problem? Sure. Are there solutions that are cost-effective to implement? That’s much more difficult. The widely held view is that the key to solving these problems is having the right meta data available both to search, and to provide context for the search. That’s how Google image search works. You first tell Google you want to search images - that provides context. And then, you don’t search the images themselves - instead, you search meta data (in this case, plain text that relates to the images). The search results are returned to you, and visualised in a way that makes sense for the type of data you’re searching. In the case of images, thumbnails are a good way to present the results.
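To make the "context plus meta data" idea concrete, here is a minimal sketch in Python. It is an illustration only, not how Google or any real search engine is implemented: the Item record, the search function, the CATALOGUE entries and the file paths are all hypothetical. The caller first picks a context (the kind of content), and the query is then matched against plain-text meta data, never against the content itself.

```python
from dataclasses import dataclass


@dataclass
class Item:
    kind: str      # the search context, e.g. "image" or "page" (hypothetical labels)
    location: str  # hypothetical identifier for the underlying content
    meta: str      # plain-text meta data describing the content


def search(items, kind, query):
    """Return locations of items of the given kind whose meta data
    contains every term in the query (case-insensitive)."""
    terms = query.lower().split()
    hits = []
    for item in items:
        if item.kind != kind:
            continue  # the chosen context filters out other kinds of content
        words = set(item.meta.lower().split())
        if all(term in words for term in terms):
            hits.append(item.location)
    return hits


# Made-up catalogue: the images themselves are never inspected, only their meta data.
CATALOGUE = [
    Item("image", "img/hdtv-front.jpg", "hdtv television front view"),
    Item("image", "img/beach.jpg", "beach sunset holiday"),
    Item("page", "pages/hdtv-explained.html", "hdtv technology explained"),
]

print(search(CATALOGUE, "image", "HDTV"))
print(search(CATALOGUE, "page", "HDTV"))
```

The same keyword gives different results depending on the context chosen up front, which is the disambiguation that a bare keyword cannot supply. The sketch also shows where the cost lies: the search is only as good as the text in the meta field.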
Now, a lot of really smart people have spent a lot of time figuring out good ways to handle meta data. They’ve produced some fabulous work on knowledge representations. But storing meta data isn’t the hard part. The real problem is that high-quality meta data can be very challenging and expensive to produce. Five years ago, I had a lot of fun building computer systems to extract detailed, accurate meta data from unstructured text documents, in a fully automated (i.e. cost-effective) way. And, coincidentally, talking about next generation search, one of the things another company of mine is doing at the moment is building software to automatically extract detailed meta data from video footage, including from features that can’t even be seen by the naked eye… but that’s another story. So, I’m a big believer in software as a great way to create high-quality meta data.
This kind of software is quite difficult to write, though. It needs both clear thinking and a deep understanding of unstructured, error-rich data to work out and implement good algorithms. I’ve seen many exceptionally smart, highly talented software developers not get it. However, the bottom line is: you can do amazing things with search, if you have the right meta data. With the resources at their disposal, it would be surprising if at least one of the major search players didn’t take a step forward some time soon.
Simon Brocklehurst's Weblog on 04 Nov 2005 at 9:10 pm
Stuff That’s Impossible To Google
Lots of people think search is done. Google has won the war. Yahoo! and Microsoft are but a poor second. However, Microsoft’s Robert Scoble is pretty vocal about the need for continued innovation in search. And just recently, I was reminded o…