The Two Big Problems For Powerset
Powerset, a start-up search company focused on so-called “natural language search”, is now open for business (see TechCrunch story). So far, you can search only the content on Wikipedia. There’s a reason for that: the Powerset technology doesn’t simply index content by keyword; rather, it runs more computationally intensive software that attempts to figure out the semantic meaning of the content, and then makes that content searchable through a natural language interface. So, despite $12.5M of VC funding, they don’t have the computational resources to index the web. Sound great to you? Well, I’m not so sure it is…
Unfortunately, I think Powerset has two major problems: firstly, the kind of approach Powerset appears to be taking performs worse than keyword search; and secondly, Google is, in a low-key way, already implementing a “natural language” approach to search that does work better than keyword search.
Back in the late 1990s, I developed a natural language search system, based on a model of how the human brain understands language, for use in the biotechnology and pharmaceutical industries (you can read a bit about it in this publication on the American Chemical Society website; it’s an easy read, with no overly technical content in it!). The key thing that made our system work was that we built in detailed knowledge about the particular domain we were interested in, and the system was designed from the ground up to make use of that domain knowledge. General purpose, or domain-neutral, approaches to natural language understanding software, by themselves, simply don’t work very well compared to keyword search.
What does this mean in practice? Well, when you type a natural language query into Powerset, you’re unlikely to get a clear answer to your question, because Powerset’s domain-neutral software doesn’t really understand the question. However, if you type a natural language query into Google, and it’s a specific type of question that the Google software understands, then you get a great answer back. Here’s an example query: “What is Elton John’s real name?”
Powerset Results
On Powerset, the results are as follows:
- Elton John
  (A spoonerism for “Budget Rent-O-Van” and a reference to his real name Reg.) … Spots related to Elton John’s childhood and career
- Elton John discography
  Elton John’s Milestones (1980) … | 1994 | “Ain’t Nothing Like the Real Thing” (with Marcella Detroit) (Nickolas Ashford/Valerie Simpson) | Duets | 24 | (N/R) | (N/R) | – | – |
- Elton John’s Greatest Hits
  Though the album includes many hits from Elton John’s early era, it is noted for leaving off several of his more popular songs.
- List of albums (E)
  Elton John – Elton John’s Greatest Hits Volume II (1977) … Evermore – Real Life (2006)
- Elton John’s Greatest Hits Volume II
  Elton John chronology Blue Moves (1976) Elton John’s Greatest Hits, Vol. 2 (1977) A Single Man (1978)
Not exactly what someone asking that question is looking for. It’s worse than a keyword search.
Google Results
On the other hand, if you type the same question into Google, you get the following answer:
Elton John — Birth Name: Reginald Kenneth Dwight
According to http://www.movietome.com/people/3666/elton-john/index.html - More sources »
That’s a perfect answer. Google’s software understood what I was asking, and gave me a clear answer back. How did it understand the question? It didn’t take a domain-neutral “subject verb object” style approach to trying to understand the question. Instead, the software “knows” that a search with “the name of a person” and the phrase “real name” is most likely to mean, “What is this person’s birth name?”.
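To make the idea concrete, here is a minimal Python sketch of that kind of question-specific rule. It is purely illustrative and is not a description of how Google actually implements this: the pattern `REAL_NAME_QUERY`, the lookup table `BIRTH_NAMES` and the function `answer` are all hypothetical names, and the single fact in the table is the one from the example above.

```python
import re

# Hypothetical fact table (an assumption for illustration); the one entry
# is taken from the example result quoted above.
BIRTH_NAMES = {"elton john": "Reginald Kenneth Dwight"}

# One narrow question type: "<person>'s real name", optionally preceded
# by "what is". Both straight and curly apostrophes are accepted.
REAL_NAME_QUERY = re.compile(
    r"^(?:what\s+is\s+)?(?P<person>.+?)['’]s\s+real\s+name\??$",
    re.IGNORECASE,
)


def answer(query):
    """Return a direct answer for a recognised question type, else None."""
    match = REAL_NAME_QUERY.match(query.strip())
    if match is None:
        return None  # not a recognised question: fall back to keyword search
    person = match.group("person").strip().lower()
    birth_name = BIRTH_NAMES.get(person)
    if birth_name is None:
        return None  # question understood, but no fact available
    return f"{person.title()}: birth name {birth_name}"


print(answer("What is Elton John’s real name?"))
```

The point of the sketch is the design choice rather than the code: the rule knows nothing about grammar in general, only that a person’s name followed by “real name” probably means a request for a birth name, and anything it doesn’t recognise simply falls through to ordinary keyword search.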
I really think Powerset has a lot of work to do if it wants to be taken seriously as a player in search. The truth is, as of now, Google gives better results for most searches, including some natural language searches. That’s not to say Powerset can’t find a niche where it can succeed - but, in my opinion, promoting general natural language queries isn’t the right way forward for the company.
Mark Johnson wrote:
I agree entirely that promoting natural language searches is not the way to succeed. Powerset tries to add value to keyword searches as well as NL queries (try “elton john” and look at the Factz, some cool stuff in there). Note also that we bring our semantic technology into our enhanced Wikipedia articles with tools that help users to read the content. Play around with us for awhile and then see what you think.
-mark, powerset product manager
Posted 12 May 2008 at 5:46 pm
simon wrote:
Hi Mark, Thank you for commenting.
So, I agree with you: keyword-based searches that lead to your summaries of the Wikipedia and Freebase content are more promising; the results are certainly relevant to the query with keyword searches.
This is where it gets tricky though. One of the questions I suspect people will be asking is: how much value are you adding to the data?
For example, presenting the results of a query to Freebase (i.e. your tabbed summaries that often appear at the top of a results page) possibly doesn’t add too much value; and presenting the list of keyword “hits” from an index search of Wikipedia (that is, the list of links in the last section of the results page) possibly doesn’t either. That’s because you can get broadly equivalent results from Freebase and Google, respectively.
The middle section “Factz we found”, which appears in some results pages, is, I think, genuinely unique to Powerset. That is, the “subject verb object” triplets. People’s mileage is going to vary with this part of your results page. Without doubt, there are some interesting nuggets of information sometimes there; but there’s quite a bit of noise too.
Having said all that, it’s easily possible I’m being overly critical (I spend quite a lot of time thinking about adding genuine value to data in vertical search engines, because of some of the work we do, and that makes me look at these kinds of things more closely than your typical user might). Your results pages present all the information in an easy-to-digest way, and there is value in that. And there is value in a type of meta search that gives equivalent results to searches of Freebase and Google combined. At this point, it’s just not obvious to me how *much* (time will tell!). I wish you every success though!
Posted 12 May 2008 at 9:09 pm