I found it as a result of doing some research on how to get Google search results programmatically. It turns out you have to cheat, as Google really is working overtime to obfuscate and deny programmatic search results. They do this through Terms of Service that object to programmatic access and by limiting their supported API to custom search for sites, which doesn’t return the search results you’d typically see. Let’s leave aside the impact of personalization, which is also very hard to turn off, but which I think of as a relatively good thing according to my White Hat definition below.
This business of fighting against SEO is a bad thing born out of weak algorithms.
Okay, calling the algorithms bad and weak has got to be picking a fight with a company that prides itself on not doing evil and on algorithmic prowess. What the heck am I on about now?
Let’s start with a very odd analogy. Assume you’re trying to encrypt data. At the same time, you want to create a tool that measures the strength of the encryption as simply as possible. What would you do and what else would the tool be good for?
I would create a tool that measures the apparent randomness of the encrypted string. The other thing the tool would be good for is as a metric for compression algorithms. If a string is truly random, there is no apparent intelligence there–it is well encrypted. If a string has any patterns to it whatsoever, those patterns could be exploited to learn something valuable about how to decrypt or further compress the string. For example, “e” is the letter that appears most frequently in English text. If we don’t obscure that so letter frequencies appear random, we have a valuable clue for decryption. If we don’t take advantage of the relative frequency of various characters, we have an inferior compression algorithm. Either way, randomness is not a bad proxy to measure encryption or compression even though it doesn’t measure it directly.
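To make the randomness proxy concrete, here is a minimal sketch of such a tool in Python. It assumes the simplest possible measure, the Shannon entropy of single-byte frequencies, and the function name `shannon_entropy` is just an illustrative choice. It only captures the letter-frequency clue described above: a string can score the maximum of 8 bits per byte and still contain longer-range patterns, so treat it as a rough proxy and not a real test of encryption strength.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the entropy of the byte frequencies in bits per byte (0.0 to 8.0).

    Assumption: only single-byte frequencies are considered, so patterns that
    span several bytes are invisible to this measure.
    """
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    # Sum p * log2(p) over every distinct byte value, then flip the sign.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Plain English text: some letters (such as "e") dominate, so entropy is low.
english = b"the quick brown fox jumps over the lazy dog and then sleeps in the sun"

# Every byte value equally frequent: the highest score this measure can give.
uniform = bytes(range(256)) * 4

print(f"english: {shannon_entropy(english):.2f} bits per byte")
print(f"uniform: {shannon_entropy(uniform):.2f} bits per byte")
```

The English sample scores well below the uniform one. That gap is the exploitable pattern: a compressor can squeeze it out, and a codebreaker can use it as a clue.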
What does this have to do with SEO, for Heaven’s sake?
Let’s differentiate White Hat (Good) from Black Hat (Evil) SEO strategies in a particular way to make the point:
Black Hat is gaming the system to provide an unfair advantage to search results that otherwise would be considered undesirable by searchers.
White Hat is understanding the system in order to gain insight into what searchers are doing to make it easier to find your valuable content.
Can you see where I’m going?
Content Farms are Black Hat. They throw together a mishmash of relatively low-value content in order to brute-force their way to the top of search results. We all know one when we click through their links. With Panda, Google is working hard to push them back down the results.
But why, OTOH, is Google acting like it is Black Hat to want to understand how your pages are ranking against various queries so you can do better at designing pages users can find more easily? The only way it could be Black Hat is if Google’s algorithms are not very good at understanding what good content really is. So they have to keep changing things up to prevent people from gaming the results with slick strategies that can promote any old crap.
Getting back to my encryption example, Google needs some sort of simple test to identify good content. Perhaps all this personalization and +1’ing will do the trick. But Google, while you’re wrestling with the problem, recognize one important thing: Computers don’t understand Language!
Yes, I know you of all organizations must know that well, but you’re threatening to show a lack of understanding while throwing out the baby with the bath water in this war on SEO. You’re denying legitimate content creators the tools they need to help you get their content into the right hands. Meanwhile, the Bad Guys are not going to listen to your Terms and Conditions anyway. They’ll figure out how to game you over and over again. Why penalize the Good Guys in the process?
Lest you think you can win this arms race in any meaningful way, consider the difficulty of truly stopping the analysis of keyword rank. All the players have to do is put their application in the Amazon Cloud or go to a P2P system to distribute the load across many machines, and you’ll have no idea whether you’re facing one SEO Tool you could try to block or a zillion hand-typed queries from legit searchers.
This talk of Clouds and P2P brings me to another thought–search engines of all kinds should take advantage of elasticity to add value.
The infrastructure of any search provider must have elasticity as search demand is not constant–it has ups and downs. I first thought of this watching Amazon’s algorithms for making recommendations for what I should read next on my Kindle. They’re better than nothing, but they’re actually not all that good. I’m sure they’re missing out on the opportunity to sell me a lot more books based on how often I go root out books searching by hand, and how I usually go through a lot of their recommendations before I find something I really like. I don’t know whether Netflix does a markedly better job, but I have been impressed that they run contests to see if anyone outside Netflix can come up with better results and then they try to add what they learn back into their engine.
Search engines should be asking what they could do with the extra cycles to improve search results. There won’t be enough elasticity to improve all results. Some of the problem with search engines is likely not that they don’t know better algorithms, but that they’re too expensive to implement at scale. Perhaps the secret is knowing how and when to implement them to the extent they can in order to bring up the poor user experiences to better standards (or to optimize user experiences that are particularly valuable by some metric). For an Amazon-style E-tailer, should they apply the extra cycles finding things for shoppers who are known to spend more? Google could do the same, but that would be somewhat Evil in that organizing search results to increase ad click-through seems fraught with peril. OTOH, what if Google invested the elastic spare time in more expensive algorithms focused on high volume searches that are known to produce lower quality results?
In terms of measuring result quality, they have a variety of proxies for that too. They’re known to use live human reviewers sort of like Secret Shoppers (only I guess they’re Secret Searchers). With all the toolbars and other gizmos out there measuring our every move, they can determine how long people spend on a search result’s page before popping back into search to look at the next one. Let’s not forget Personalization either. Surely all of those signals can be put together to identify trouble spots where more powerful algorithms might be put to good use.
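As a rough illustration of how those signals might be combined, here is a small Python sketch. Everything in it is hypothetical: the `QueryStats` fields, the 60-second "good dwell" threshold, the equal weighting of the two signals, and the idea of a fixed `budget` of queries that spare capacity can cover. I have no knowledge of how any search engine actually does this. The point is only that high volume multiplied by low apparent quality gives a simple way to decide where the expensive algorithms should go first.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    """Hypothetical per-query signals; field names are illustrative only."""
    query: str
    daily_volume: int             # searches per day
    median_dwell_seconds: float   # time on a result page before returning
    bounce_back_rate: float       # fraction of clicks that pop back to the results


def quality_proxy(stats: QueryStats, good_dwell_seconds: float = 60.0) -> float:
    """Blend the two signals into a 0.0 (poor) to 1.0 (good) score.

    Assumption: both signals are weighted equally and dwell time saturates
    at good_dwell_seconds. Real weights would need to be tuned.
    """
    dwell_score = min(stats.median_dwell_seconds / good_dwell_seconds, 1.0)
    return 0.5 * dwell_score + 0.5 * (1.0 - stats.bounce_back_rate)


def pick_trouble_spots(all_stats: list[QueryStats], budget: int) -> list[str]:
    """Return the queries most worth spending spare cycles on.

    Priority is volume times (1 - quality): popular queries with poor results.
    """
    ranked = sorted(
        all_stats,
        key=lambda s: s.daily_volume * (1.0 - quality_proxy(s)),
        reverse=True,
    )
    return [s.query for s in ranked[:budget]]


sample = [
    QueryStats("popular but poor", 100_000, 10.0, 0.8),
    QueryStats("popular and good", 500_000, 90.0, 0.1),
    QueryStats("rare and poor", 200, 5.0, 0.9),
]
print(pick_trouble_spots(sample, budget=2))
```

With these made-up numbers, the popular query with poor results comes first, ahead of a far more popular query that is already well served. The rare query falls outside the budget even though its results are the worst of the three.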
There’s got to be a better way than going to war against SEO in general.