Archive for January, 2008

My Problem with the Google Algorithm

I generally love Google, and I think they have created some of the best services that the web has to offer. I am an avid user of Gmail and obviously use their search engine about 100 times a day to research and expand my knowledge. However, I have one big issue with Google and with their world-famous “PageRank” algorithm. To put it simply, it has a flaw. Before you call me an idiot or laugh at me – “How can you call the world’s largest and arguably most successful internet search engine flawed?” – let me explain.

The problem I have with the Google PageRank algorithm is the manner in which content is ranked on the engine itself. Let’s say you run a massive website with a high PageRank – predominantly derived from the number of incoming and outgoing links on your site – and you write a story on a topic. Your post is immediately ranked at the top of the search results, even if it is inaccurate or written without full knowledge, simply because of the PageRank your site has already built up. This means that when a user searches for content on a particular topic, or for information about a particular company, your post ranks at the top of the Google results based on your site’s existing PageRank, regardless of whether it is accurate or not. Is this fair?

Of course it’s not fair, and it gives sites that have a lot of traffic an enormous amount of power. It means that any site with a high PageRank can write a biased and overtly critical review of a topic, company or situation, and it will receive preferential treatment within the Google search engine based solely on the site’s existing PageRank. The consequential problem is that Google is then not “organising the world’s information” correctly, because there is no way for humans to “demote” the post in Google’s search results or to let Google know it’s wrong, inaccurate or poorly researched. In my opinion, this is a critical flaw in Google’s search ideology, and one that needs fixing. The question is – how can it be done?

Well, obviously, the most effective way is to utilize the power of human search to some degree. Companies such as Mahalo allow users to influence the results of their search to the “nth” degree, and so to judge the relevance of the content a search delivers. Sure, there are going to be people who disagree about the content’s relevance – but the majority will always rule. (Before you start on about “automated influences”: it is becoming increasingly difficult to register, log in, vote repeatedly and then log out without being detected as a “robotic” influence on these results.) So why is this human intervention or “voting” good? Well, it would help solve the biggest problem (in my opinion) that Google faces with content that is heavily influenced by PageRank. Just because a website has a high PageRank, its posts should not immediately be assigned a high rank based solely on the overall PageRank of the site itself. The post must “earn” (to some degree) the equivalent PageRank through other websites commenting on the post and determining whether it is subjectively fair and/or correct.
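To make the “earn it” idea a little more concrete, here is a rough Python sketch. It is purely my own illustration and not how Google or Mahalo actually work: the function name `post_score`, the `inherited` share and the simple up/down vote counts are all assumptions. The post inherits only part of the site’s rank up front and has to earn the rest through reader approval.

```python
def post_score(site_rank, upvotes, downvotes, inherited=0.5):
    """Blend a site's rank with how readers voted on one post.

    Hypothetical scoring rule: the post gets `inherited` (0..1) of the
    site's rank for free and earns the remainder through approval.
    """
    # Smoothed approval ratio in (0, 1); a post with no votes sits at 0.5.
    approval = (upvotes + 1) / (upvotes + downvotes + 2)
    return site_rank * (inherited + (1 - inherited) * approval)


if __name__ == "__main__":
    # Same site rank, very different reader verdicts.
    print(post_score(8.0, upvotes=98, downvotes=0))
    print(post_score(8.0, upvotes=0, downvotes=98))
```

Under this rule a heavily down-voted post on a big site ends up well below a well-received post on the same site, which is the whole point.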

It is clear that there is no way Google can efficiently determine the relevance of content, or whether a post is subjectively fair and/or correct – and yes, I mean the content specifically. Sure, Google can determine the relevance and popularity of a website perfectly well by looking at the number of inbound and outbound links, but it is limited in determining whether a specific article is accurately and/or fairly written. It gives the post the PageRank of the site itself and then effectively keeps the post at that level, regardless of what other websites linking to it say about the validity of the post. Put simply, Google have yet to develop a method – without human intervention – to specifically determine whether an article is “accurate or fairly written”.

The only way to work this out automatically is to analyze the words that appear not only on the original article page, but on all the other websites that link to and have written about the original post. If Google could dynamically determine the “mood” of a post, by identifying phrases like “agree” or “good write up” versus “disagree”, “got it wrong” or “inaccurate” on all sites linking to and from the original post, and weigh these words when assigning the overall ranking to the page, then they would be able to determine more effectively the relevance not only of the website and the post, but also of the “content within the post”.
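As a very rough illustration of this “mood” idea, here is a small Python sketch. Everything in it is an assumption on my part: the cue phrases are just the examples mentioned above, and `mood_score`, `adjusted_rank` and the `weight` parameter are made-up names, not anything Google has published. Each linking page counts as agreeing, disagreeing or neutral, and the net mood nudges the post’s rank up or down.

```python
import re

# Hypothetical cue phrases, taken from the examples in this post.
POSITIVE_CUES = ("agree", "good write up")
NEGATIVE_CUES = ("disagree", "got it wrong", "inaccurate")


def count_cues(text, cues):
    """Count whole-phrase matches, so "agree" does not match inside "disagree"."""
    lowered = text.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(cue) + r"\b", lowered)) for cue in cues
    )


def mood_score(text):
    """Return +1, -1 or 0 for a linking page that agrees, disagrees or is neutral."""
    net = count_cues(text, POSITIVE_CUES) - count_cues(text, NEGATIVE_CUES)
    return (net > 0) - (net < 0)


def adjusted_rank(base_rank, linking_texts, weight=0.5):
    """Scale a post's rank by the average mood of the pages that link to it."""
    if not linking_texts:
        return base_rank
    average_mood = sum(mood_score(t) for t in linking_texts) / len(linking_texts)
    return base_rank * (1 + weight * average_mood)


if __name__ == "__main__":
    pages = [
        "I agree, good write up",
        "They got it wrong, totally inaccurate",
        "I disagree with most of this",
    ]
    print(adjusted_rank(10.0, pages))
```

A real system would need far more than keyword counting (sarcasm, negation and spam all break it), but it shows how the opinions of linking sites could be weighed into a post’s ranking.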

I think if Google can solve this problem, mathematically or otherwise, it will be a key step in ensuring they remain the dominant search engine. Of course, humans can judge this quickly and easily, and Google have tested incorporating human search on their Labs page, although they have since removed this experiment. Either way, I really believe it’s something that needs to be considered and resolved.

Leave a note in the comments with your thoughts. Do you agree with me or not?
