Search Engine Strategies: Indexing Summit
Now onto the main event, or at least the session I was most interested in: the indexing summit. Danny Sullivan got together employees of the top four search engines, with the plan being to hash out some ideas on how to prevent web spam, which is primarily aimed at getting search engine rankings. Will it work? Let’s listen in…
Google’s Matt Cutts kicked things off by looking at “nofollow”. While Danny was scheduling this summit, Google was looking for a way for site owners to say they can’t vouch for a link, and rel=”nofollow” was their solution. In just six weeks, Google has discovered 40 million nofollow links. Signed up so far: Yahoo, MSN, Google, AOL Journals, MSN Spaces, LiveJournal, Scripting News, Six Apart, Blogger, Technorati, Blosxom, Blojsom, Modblog, drupal, leonardo, pebble, netdoc, bblog, and others.
Most of the opponents of nofollow, he claimed, are spammers who are afraid of it. He also claimed that spammers are already moving to different types of spam now that nofollow is in their way. Matt also mentioned Yahoo’s Web Spam Squashing Summit, held last week.
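To make the nofollow idea concrete, here is a minimal sketch of how a crawler might separate links a site vouches for from links marked rel="nofollow". The class and function names (LinkSorter, sort_links) are my own, and nothing here reflects how any of the engines actually do it; it only shows that the attribute is easy to detect.

```python
from html.parser import HTMLParser


class LinkSorter(HTMLParser):
    """Collects hrefs, separating links marked rel="nofollow" from the rest."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is None:
            return
        # rel is a space-separated token list, e.g. rel="external nofollow"
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollowed.append(href)
        else:
            self.followed.append(href)


def sort_links(html):
    """Return (followed, nofollowed) href lists for an HTML string."""
    parser = LinkSorter()
    parser.feed(html)
    parser.close()
    return parser.followed, parser.nofollowed
```

A blog's comment template would add rel="nofollow" to every link a visitor submits, and a crawler doing something like the above would simply leave those links out of its ranking calculations.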
Next up was Tim Mayer, Director of Product Management at Yahoo. He calls nofollow a good first step, and notes that Yahoo implemented it just last night. He talked about Yahoo’s summit, noting that it was the first time all the major players met up outside a conference.
Yahoo is also working on tags that can block off publicly modifiable content, so that links inside it don’t count as links in a search engine. The format is: div class=”content-public”…/div. They have also come up with a class for navigational elements: div class=”content-nav”. In addition, there is a tag for the actual content of the page, to let it stand out and not fall under the other rules: div class=”content-default”. The public and nav tags also work at the link level, as rel=”content-public” and rel=”content-nav”.
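As a rough illustration of how an indexer could honor these classes, the sketch below skips any link that sits inside a div with class content-public, or that carries rel="content-public" itself. The class names come from the talk; everything else (PublicContentFilter, split_links, the decision to leave content-nav links in the counted list) is my own assumption, since nobody said how the markers would be weighted.

```python
from html.parser import HTMLParser


class PublicContentFilter(HTMLParser):
    """Splits links into counted and ignored, based on content-public markers."""

    def __init__(self):
        super().__init__()
        self.div_stack = []    # one bool per open div: did it open a public region?
        self.public_depth = 0  # how many enclosing divs are content-public
        self.counted = []
        self.ignored = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "div":
            classes = (attr_map.get("class") or "").split()
            opens_public = "content-public" in classes
            self.div_stack.append(opens_public)
            if opens_public:
                self.public_depth += 1
        elif tag == "a" and attr_map.get("href"):
            rel_tokens = (attr_map.get("rel") or "").split()
            if self.public_depth > 0 or "content-public" in rel_tokens:
                self.ignored.append(attr_map["href"])
            else:
                # Assumption: content-nav and content-default links still count.
                self.counted.append(attr_map["href"])

    def handle_endtag(self, tag):
        # Close the most recently opened div; ignore stray closing tags.
        if tag == "div" and self.div_stack:
            if self.div_stack.pop():
                self.public_depth -= 1


def split_links(html):
    """Return (counted, ignored) href lists for an HTML string."""
    parser = PublicContentFilter()
    parser.feed(html)
    parser.close()
    return parser.counted, parser.ignored
```

Tracking only div nesting keeps the sketch short, but it assumes reasonably well-formed markup; a real crawler would need to be far more forgiving.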
He was followed by Kaushal Kurapati, product manager for search technology at Ask Jeeves. While discussing their spidering strategy, he noted that Jeeves is working to make sure their spiders don’t hurt a site too much, and suggested sites use gzip compression to save up to 75% of bandwidth. Servers can also help out by time-stamping the content and simplifying site navigation.
Kaushal suggested sites use more descriptive anchor text and watch out for JavaScript and dynamic pages, which can be difficult to index.
Finally, we heard from Eytan Seidman, program manager for MSN Search. He noted that MSN started supporting nofollow two weeks ago, and that they are also working on reducing crawl loads. One thing MSN is concerned about is that pages are properly indexed based on their country of interest and language, but metatext is not very useful for this, because of abuse.
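If you want a feel for what gzip buys you, this small sketch compresses a page body with Python's standard gzip module and reports the fraction of bytes saved. The function name and the sample page are made up for illustration, and real savings depend heavily on the content; the 75% figure is the one quoted in the talk, not something this snippet proves.

```python
import gzip


def gzip_savings(body: bytes) -> float:
    """Return the fraction of bytes saved by gzip-compressing body."""
    if not body:
        return 0.0
    compressed = gzip.compress(body)
    return 1 - len(compressed) / len(body)


# Hypothetical sample page: repetitive markup, which compresses very well.
page = ("<p>Search engine spiders request a lot of repetitive markup.</p>\n" * 200).encode("utf-8")
print(f"saved {gzip_savings(page):.0%} of the original size")
```

In practice the web server does this for you when a client sends an Accept-Encoding: gzip header, so it is usually a configuration change rather than code.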
Eytan also explained that it might be useful if users could say what they thought was web spam, much like they do with email spam.
After this, Danny came up and read some suggestions from the Search Engine Watch forums, including finally letting publishers see what the search engines think of their pages, along with the engines’ metrics for those pages, so they can see what they’re doing wrong and what needs to be fixed. Danny wants the engines to unify the things they are doing.
During the questions, Yahoo’s Mayer reiterated Yahoo’s policy of marking down page elements, and surprisingly (to me), it seemed very popular with the crowd.
One questioner raised a very interesting idea, where content providers would be able to identify their content all over the web. No matter where you post, your statements would be linked to you, and you personally would gain relevancy points. It’s a very interesting idea, but it’s more likely about 10 years away.
I asked the panel what they thought about editorial nofollow, where, for example, a blogger links to a site he hates with nofollow so he isn’t helping it. Matt Cutts said it was a suitable use of the tag. Someone else asked about creating a negative nofollow tag, as opposed to an ignore one, and Matt suggested it would be abused by sites to give their competitors negative link juice.
Danny suggests that site networks should be sure to indicate navigational elements, so they have nothing to worry about linking to themselves on every page.


