Panda

Google’s algorithm for recognising the characteristics of unnatural pages is updated periodically by a machine-learning background job. This means it is not a live algorithm! The much-reported Panda versions 1.0 to 2.5 are algorithm changes that are first computed on a training dataset, combined with the existing learnings, and then exported to the live Google environment as largely static algorithm tests.
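To make the "trained offline, applied statically" idea concrete, here is a minimal sketch. It is not Google's pipeline. The attribute name `ad_ratio`, the `Rule` type and the JSON export format are all hypothetical. A batch job learns a single threshold from labelled pages and exports it, and the live side only loads and applies that frozen rule without learning anything at query time.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Rule:
    """A frozen test exported by an offline training run (hypothetical format)."""
    attribute: str
    threshold: float


def train_offline(pages, labels, attribute):
    """Batch job: pick the threshold on one attribute that best matches the labels.

    pages: list of dicts of numeric attributes.
    labels: list of bools, True meaning "low quality" (hypothetical labelling).
    """
    best = None
    for t in sorted({p[attribute] for p in pages}):
        # Count pages where "attribute >= t" agrees with the label.
        correct = sum((p[attribute] >= t) == y for p, y in zip(pages, labels))
        if best is None or correct > best[0]:
            best = (correct, t)
    return Rule(attribute, best[1])


def export(rule):
    """Serialise the learned rule so the live system can load it as-is."""
    return json.dumps(asdict(rule))


def load_and_apply(blob, page):
    """Live side: no learning here, just evaluate the exported static test."""
    rule = Rule(**json.loads(blob))
    return page[rule.attribute] >= rule.threshold


pages = [{"ad_ratio": 0.1}, {"ad_ratio": 0.2}, {"ad_ratio": 0.6}, {"ad_ratio": 0.8}]
labels = [False, False, True, True]
blob = export(train_offline(pages, labels, "ad_ratio"))
print(blob)
```

Until the next batch run produces a new blob, the live behaviour stays fixed, which matches the post's point that each Panda version is a periodic export rather than a continuously learning system.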

This means that while bounce rate (in this case: visitors quickly returning to the search results) isn’t used as a direct ranking factor, it is used to teach the Panda new tricks. Signals like bounce rate are fed as bamboo to the Panda background system, with the instruction to find out which patterns can be derived from the characteristics that make up thin content, unnatural text and excessive on-page advertising. The system tries various combinations of attributes until it reaches a high degree of certainty about someone’s spammy activities.
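A small sketch of using bounce rate as a teacher rather than a ranking factor: the quick-return rate only produces training labels, and the search looks for combinations of attribute conditions that predict those labels with high certainty (precision). The attribute names, the 0.5 cutoff and the precision/support thresholds are hypothetical choices for illustration, not known Panda parameters.

```python
from itertools import combinations


def quick_return_label(site, cutoff=0.5):
    """Hypothetical label: True if the share of quick returns to the results exceeds cutoff."""
    return site["quick_return_rate"] > cutoff


def rule_precision(sites, conditions):
    """Return (precision, support) for sites matching every (attribute, threshold) condition."""
    matched = [s for s in sites if all(s[a] >= t for a, t in conditions)]
    if not matched:
        return 0.0, 0
    hits = sum(quick_return_label(s) for s in matched)
    return hits / len(matched), len(matched)


def find_confident_rules(sites, candidates, min_precision=0.9, min_support=2, max_size=2):
    """Try single conditions and combinations of them; keep those with high certainty."""
    rules = []
    for size in range(1, max_size + 1):
        for combo in combinations(candidates, size):
            precision, support = rule_precision(sites, combo)
            if precision >= min_precision and support >= min_support:
                rules.append((combo, precision, support))
    return rules


sites = [
    {"ad_ratio": 0.7, "dup_text": 0.8, "quick_return_rate": 0.9},
    {"ad_ratio": 0.8, "dup_text": 0.6, "quick_return_rate": 0.7},
    {"ad_ratio": 0.7, "dup_text": 0.1, "quick_return_rate": 0.2},
    {"ad_ratio": 0.1, "dup_text": 0.7, "quick_return_rate": 0.3},
    {"ad_ratio": 0.2, "dup_text": 0.1, "quick_return_rate": 0.1},
]
candidates = [("ad_ratio", 0.5), ("dup_text", 0.5)]
print(find_confident_rules(sites, candidates))
```

In this toy data neither heavy advertising nor duplicated text alone is a reliable predictor (2 of 3 matches each), but the combination of both matches only poorly received sites, which is the kind of pattern the paragraph describes.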

For those familiar with “distributed tree learning”, look up the work of Google engineer Biswanath Panda, after whom the Panda update was named. His work explains how repeatedly splitting sites into groups with similar attribute values lets you afterwards derive which attributes affected a certain outcome (like a high bounce rate) the most. It also gives some indication of which thresholds to use, and it can signal when false positives or negatives are likely to occur.
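The splitting idea can be illustrated with a plain single-machine regression tree. This is a sketch, not Google's code and not the distributed version: it repeatedly picks the attribute and threshold that most reduce the variance of the outcome (here a bounce rate), and it sums each attribute's reductions as a rough measure of how much that attribute affected the outcome. The attribute names and data are hypothetical.

```python
def variance(values):
    """Population variance; 0.0 for an empty list."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_split(rows, target, attributes):
    """Return (gain, attribute, threshold) with the largest drop in summed squared error."""
    parent = variance([r[target] for r in rows]) * len(rows)
    best = (0.0, None, None)
    for a in attributes:
        # Skip the smallest value so both sides of the split are non-empty.
        for t in sorted({r[a] for r in rows})[1:]:
            left = [r[target] for r in rows if r[a] < t]
            right = [r[target] for r in rows if r[a] >= t]
            child = variance(left) * len(left) + variance(right) * len(right)
            gain = parent - child
            if gain > best[0]:
                best = (gain, a, t)
    return best


def grow(rows, target, attributes, importance, depth=0, max_depth=2, min_rows=2):
    """Recursively split rows into groups; accumulate per-attribute variance reduction."""
    if depth >= max_depth or len(rows) < min_rows:
        return
    gain, a, t = best_split(rows, target, attributes)
    if a is None:  # no split improves the groups
        return
    importance[a] = importance.get(a, 0.0) + gain
    grow([r for r in rows if r[a] < t], target, attributes, importance, depth + 1, max_depth, min_rows)
    grow([r for r in rows if r[a] >= t], target, attributes, importance, depth + 1, max_depth, min_rows)


rows = [
    {"ad_ratio": 0.1, "word_count": 300, "bounce": 0.2},
    {"ad_ratio": 0.2, "word_count": 900, "bounce": 0.2},
    {"ad_ratio": 0.7, "word_count": 300, "bounce": 0.8},
    {"ad_ratio": 0.9, "word_count": 900, "bounce": 0.8},
]
importance = {}
grow(rows, "bounce", ["ad_ratio", "word_count"], importance)
print(best_split(rows, "bounce", ["ad_ratio", "word_count"]), importance)
```

The chosen threshold (here `ad_ratio` at 0.7) is the kind of cut-off value the paragraph mentions, and `word_count` never appears in `importance` because splitting on it does not separate high-bounce sites from low-bounce ones.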

This entry was posted on Sunday, January 29th, 2012 at 9:54 am and is filed under Internet.

CLS Solutions Technology Blog
