Friday, June 6, 2008

At Google, a search guru's dream comes true

Q&A Search has become central to the functioning of the Internet, but Udi Manber isn't the kind of person who takes that for granted.

"I don't have to tell anybody around here that search is important. That's a very nice luxury to have," said Manber, the Google vice president in charge of search quality.

Udi Manber, Google VP, engineering

(Credit: Google)

Search quality may seem like an unassuming element of Google's operations, but in fact it's at the core. Manber oversees the company's search algorithm--all the different inputs Google weighs to judge which Web sites to rank highest in search results.

Manber's work has been highly secret, partly because search is central to Google's competitive advantage and partly because Google doesn't want people gaming the system to get artificially prominent results. But the company has begun sharing a smidgen, including an opening blog post by Manber in May. I talked to him at Google headquarters recently.

How mature is search today on the Internet? Are we 5 percent of the way done with the problem? Ninety percent?
My best analogy is that a 15-year-old thinks he's very mature. A 19-year-old thinks he's extremely mature. Every few years you learn that you were not mature before. Search on the Web is about 15 years old, and obviously we are much more mature than we were 5 years ago and 10 years ago and 15 years ago. One way to put it is that it's science fiction every 5 years. What's possible today to me was science fiction 5, or definitely 10 years ago. What was (ordinary) 10 years ago was science fiction 15 years ago. The development is really pretty amazing. It surprised even me. I expect a certain level of progress, and we're actually surpassing it.

You were at the University of Arizona, then Yahoo and Amazon, then A9, then you moved to Google in 2006. Is there anything you've learned from looking at it from different perspectives, or have you been just tackling the same thing with different phone numbers on your business card?
It's the same problem, and I've looked at it from many different angles. It's bigger here, and it's better here. We have a team that's beyond any other team I've ever been with. We put more resources into it. I don't have to tell anybody around here that search is important, and that's a very nice luxury to have.

I remember the old days of AltaVista and HotBot and WebCrawler and some of these other search engines--days when search was really very primitive.
I remember starting those things. They looked very sophisticated and mature at the time, which is my point about the 15-year-old.

It's clearly become a lot more usable. But even 10 years ago, everybody hadn't been trained to think the way we get information is we go to a search box and type something in. Now that seems abundantly obvious. What 10 years from now is going to look stunningly obvious as having a search box is today?
It was clear to some people. I don't want to brag too much, but it was clear to me. That's why I moved to search in the early 1990s, because everybody was talking about the information revolution. It was very clear that to have an information revolution, it's not enough to store the information and move it around, you have to find it. I know a lot of people at the time who were talking in those terms--that's going to be the revolution. The ability to find things among huge amounts of information is the key factor. So while nowadays it's completely obvious, even 6 or 7 years ago it was not obvious. I think the reason Google is so successful now is because it was obvious to (co-founders) Larry (Page) and Sergey (Brin) 10 years ago, they put in all the effort, and they're still doing it now.

Don't take that for granted. It was not that well understood, but it was understood by some people. When I started working on search when I was in academia and I said I'm working on search, they looked at me and said, "What do you mean you're working on search? Did you lose something?" In the early 1990s, even, very few people worked on search, because search was done by professionals in various limited domains. There was legal search, there was medical search, there was chemical search, and some limited news search. And it was done by a searcher--professional people. You tell them, "This is what I want to find," and they find it for you. I went to trade conferences with searchers. The idea that people will do the search themselves--that it'll democratize the whole thing and you don't have to go to a professional--that's the revolution.

I think that'll advance much more because you'll do more searches. There are a lot of things you don't search for now, because you don't expect Google will know or that the search engine will find out. We are finding that user expectations grow. The kind of searches people do now are more complicated than the kinds they were doing five years ago. People expect a lot more from us.

Ten years ago, if you actually found an answer to some specific question, it was, "Hey, look at this, it's so cool!" It was an event. Nowadays if you don't find exactly what you want in the first or second result, something is wrong. That's nice. The expectation is that we'll do it.

What do you understand that everybody else is missing five years from now--the sci-fi time frame?
I would say it's more of the same, but it's more in-depth, easier, and allowing you to control more of what you're looking for, giving you more input, finding more things. There will be lots of rocket-science things that will come along, but those I can't talk about.

One interesting trend is personalization. I like cameras, so if I do a year's worth of searching, Google understands I tend to like cameras, if I specifically enable it. How mature is that?
Yes. If you specifically give us permission and enable search history, then we'll store your search history and use it to improve your search later on. It's not something that will completely change your results. It's something that may take something from position five and move it to position three once in a while, sometimes higher, sometimes lower. Even if you really care about cameras, many of your searches are not about cameras. A lot of searches are about things where you don't know the answer. If you search about the history of the Renaissance and I tell you about somebody who took pictures of books about the Renaissance, it's not a good result, just because you like cameras. We have to know when to use it and when not to use it.
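
Manber gives no implementation details, so the following is only an illustrative Python sketch of the idea he describes: a small, capped personalization boost that is applied when the query itself relates to one of the user's interests and ignored otherwise. The `Result` type, the topic labels, the `personalize` function, and the `max_boost` value are all hypothetical and are not taken from Google.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    base_score: float  # relevance to the query, independent of the user
    topic: str         # hypothetical topic label for the page


def personalize(results, query_topics, interests, max_boost=0.06):
    """Re-rank results with a small boost for the user's known interests.

    results:      list of Result, already ranked by base_score
    query_topics: set of topics the query could plausibly be about
    interests:    dict mapping topic -> strength in [0, 1], from search history
    """
    def score(result):
        # Use the signal only when the query itself relates to the interest;
        # otherwise the history is ignored and the base ranking stands.
        if result.topic in query_topics:
            weight = interests.get(result.topic, 0.0)
        else:
            weight = 0.0
        return result.base_score + max_boost * weight

    # sorted() is stable, so results with equal scores keep their order.
    return sorted(results, key=score, reverse=True)


results = [
    Result("a", 0.90, "eyewear"),
    Result("b", 0.85, "optics"),
    Result("c", 0.80, "eyewear"),
    Result("d", 0.78, "optics"),
    Result("e", 0.75, "cameras"),
]
camera_fan = {"cameras": 1.0}

# Ambiguous query such as "lens": the camera page moves from fifth to third.
lens_ranking = personalize(results, {"cameras", "eyewear", "optics"}, camera_fan)

# Query about Renaissance history: the camera interest is not used at all.
history_ranking = personalize(results, {"history"}, camera_fan)
```

Keeping the boost small relative to the spread of base scores is what limits the effect to a nudge of a position or two rather than a wholesale reordering.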

A related question: Everybody is into social networks these days. Is there some intermediate tier between the entire Internet searching and my personal searches, where you might have a search results input of me and my friends or me and my family?
I'm looking at it the same way I'm looking at personal search. It's one more signal we can use to improve your search. Just as if you've done the same search 15 times before, if you've done 100 searches on cameras, that tells us something. If a lot of your friends do a particular search, it's just one more signal. You have to figure out when to use it and not use it--for what kind of queries it's going to give you better results and when you ignore it.

Is that something you have under development? I don't think it's actually out there now.
It's not out there now, and we're not talking about (any future plans).

I wonder how deterministic Google is--how reproducible one search is from one day to the next. With this vast constellation of servers you have, do the changes propagate slowly across the system? If I do a search in Boston tomorrow, or with servers that are out of sync, will I get different results?
It does percolate through the system. It doesn't do it slowly. It does it very, very fast. But it's definitely the case that if you do the same search on a different cluster, you may get slightly different results at a given time. It's also the case that if you do the same search on different days you may get different results, because some of the results are things we indexed five minutes ago.

We are really, really fast. If something new happens in the world and you search for it, I'm not going to give you an exact time, but within an hour you will see in the direct results pages that relate to that story. Freshness is extremely important to us.

The other difference is it depends on location. If you do the same search from a different country, you get different results, even if it's the same language. We will tune the results by the country in which you're searching. It's by language and location.

How is universal search working? To what extent do people use it?
What we really want is for you to go to one place, Google.com, search for whatever you want, and we'll try to figure out whether you wanted a video, a book, local information. Based on the query, we will insert different media types into the main search results. You don't have to remember to go to images.google.com when you're looking for images. The idea is you don't have to think about it.
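
As a rough illustration of inserting media types into the main results based on the query, here is a minimal Python sketch. It assumes some upstream component has already estimated, per query, how likely the searcher is to want each media type; that `intent` estimate, the `threshold`, the insertion `slot`, and the `blend` function are hypothetical choices for the example, not a description of Google's system.

```python
def blend(web_results, verticals, intent, threshold=0.5, slot=2, block_size=3):
    """Insert a block for each media type the query seems to want.

    web_results: ranked list of web pages
    verticals:   dict mapping media type -> ranked list of items
    intent:      dict mapping media type -> estimated probability (0 to 1)
                 that this query wants that media type
    """
    page = list(web_results)

    # Keep only media types with results and a strong enough intent estimate.
    chosen = [
        media for media, items in verticals.items()
        if items and intent.get(media, 0.0) >= threshold
    ]
    # Strongest intent is placed first.
    chosen.sort(key=lambda media: intent[media], reverse=True)

    position = min(slot, len(page))
    for media in chosen:
        page.insert(position, (media, verticals[media][:block_size]))
        position += 1
    return page


web = ["w1", "w2", "w3"]
verticals = {"images": ["i1", "i2", "i3", "i4"], "video": ["v1"]}

# A query with strong image intent gets an image block in the main list.
blended = blend(web, verticals, {"images": 0.8, "video": 0.2})

# A query with no strong media intent gets plain web results.
plain = blend(web, verticals, {"images": 0.1, "video": 0.2})
```

The searcher never picks a media type here; the decision is driven entirely by the query, which is the point Manber makes about not having to think about where to search.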

How hard is it to keep the porn out? With two large classes of things people don't want--spam and porn--a lot of energy goes into blocking it or into getting around whoever is blocking it.
We put quite a bit of energy into it. There is a whole team dedicated to removing porn, not just in the U.S., but internationally. I think we're doing a pretty good job. It can't be 100 percent, but it's a very small number. Whenever we evaluate anything, that's one of the things. Does it introduce more porn or less porn?

Is that as active a cat-and-mouse game as spam?
Not really. My impression is that most porn sites do not try to trick you. Spam is the idea of tricking you--showing you one thing when you're looking for something else. I think most porn sites have enough clientele that they don't need to do that. They pretty much identify themselves.

I'm curious about what I think of as the return of the command-line interface: Google's onebox idea. How do you see that as different from conducting a regular search? Are people going to be trained to type in specific commands to get the result they want, like "time London"? That's in the gray area between a search term and an actual command. Or do you want it to work just when people type in what they want?
We don't want to force people to learn syntax or any special thing. We want to be able to understand what they mean without them having to learn. That doesn't mean we won't allow people to type in special commands to get something special. We want to put advanced tools in there without requiring everybody to use them. "Time London" is a good example--that's what you would think if you're looking for what is now the time in London. You can also say "what is the time now in London." The idea is you're in charge, we should figure out what you want and get you that answer.

There's a solid business already with vertical search--chemical, medical, health, legal. Are you looking at some sort of search subset for specific areas?
Our approach is universal search. We want to get everything into Google search. Having said that, there are specific searches--the kind of operations you want to do in the results. For example when you do local search, geography matters, where it is on the map matters, and you want to see the results on a map. Or when you do product search, price matters. If you have cases of searches where some things matter more and you want to allow people to operate on those or navigate those parameters, then we'll give you more tools to do that. But you shouldn't have to navigate to a specific site to do that.

With video and photo, are we going to get to a point where the computer can know the content without convenient text labels?
I think we can do better at it. The typical question people pose--"Can you tell that's a tree?"--I think that's the wrong question. We can tell that's a tree with the text. We'll get you good pictures of trees. The problem is you want to look at the Hearst Building with the sign from the right angle with the sun up above. That's the kind of question that's very hard to tell, because the image doesn't say it's the Hearst Building and whether the sun is shining. You're getting into a lot of depth there. That's going to require some combination of some image processing and some information about it. The metadata around the image is going to get more important.

Can user-generated content give you an entree into that, where you have somebody who does say specifically this is the Hearst Building? So you mine Flickr and Picasa Web albums.
We have that now. And we can say I want pictures that are high quality. Or just line drawings. We can tell that, and move to more signals and more features.

For a lot of searches, I get Wikipedia entries. Do you think that's good or bad? Wikipedia has a huge number of internal links to itself; I don't know if that increases its rank in the search results.
I look at it in terms of whether people find what they're looking for. If they find what they're looking for, it's good.
