Google’s algorithm for recognising the characteristics of unnatural pages is periodically updated by a machine learning background job. This means it is not a live algorithm! The much-reported Panda versions 1.0 to 2.5 are algorithm changes that are first calculated on a training dataset; combined with the existing learnings, they are then exported to the live Google environment as more static algorithm tests.

This means that while bounce rate (in this case: visitors returning to the search results quickly) isn’t used as a direct ranking factor, it is used to teach the Panda new tricks. Signals like bounce rate are fed as bamboo to the Panda background system, with the instruction to find out which patterns can be derived from the characteristics that make up thin content, unnatural text and excessive on-page advertising. The system then picks the combinations of attributes that together give a high degree of certainty that a site is engaged in spammy activities.

For those familiar with “distributed tree learning”, look up the work of Google engineer Biswanath Panda, after whom the Panda update was named. His work explains how continuously splitting sites into groups with similar attribute values helps you derive afterwards which attributes affected a certain outcome (like a high bounce rate) the most. It also gives some indication of the thresholds to use, and it can signal when false positives or negatives are likely to occur.
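
To make the splitting idea a bit more concrete, here is a minimal single-machine sketch of one step of tree learning. It is not Google's implementation and it is not distributed. The site attributes (`ad_ratio`, `words_per_page`), the bounce-rate numbers and the function names are all invented for illustration. For each attribute it tries every threshold that splits the sites into two groups, and keeps the one that makes bounce rate most uniform within each group (the largest variance reduction).

```python
from statistics import mean

# Hypothetical toy data: attribute names and all values are made up.
SITES = [
    {"ad_ratio": 0.10, "words_per_page": 900, "bounce_rate": 0.20},
    {"ad_ratio": 0.15, "words_per_page": 300, "bounce_rate": 0.25},
    {"ad_ratio": 0.20, "words_per_page": 700, "bounce_rate": 0.30},
    {"ad_ratio": 0.60, "words_per_page": 800, "bounce_rate": 0.80},
    {"ad_ratio": 0.70, "words_per_page": 250, "bounce_rate": 0.85},
    {"ad_ratio": 0.65, "words_per_page": 650, "bounce_rate": 0.75},
]


def variance(values):
    """Population variance; 0.0 for an empty group."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def best_split(rows, attribute, target="bounce_rate"):
    """Return (threshold, variance_reduction) for the best two-way split."""
    parent = variance([r[target] for r in rows])
    values = sorted({r[attribute] for r in rows})
    best = (None, 0.0)
    # Candidate thresholds sit halfway between neighbouring attribute values.
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [r[target] for r in rows if r[attribute] <= threshold]
        right = [r[target] for r in rows if r[attribute] > threshold]
        weighted = (len(left) * variance(left)
                    + len(right) * variance(right)) / len(rows)
        gain = parent - weighted
        if gain > best[1]:
            best = (threshold, gain)
    return best


def rank_attributes(rows, attributes, target="bounce_rate"):
    """Order attributes by how much their best split explains the target."""
    results = {a: best_split(rows, a, target) for a in attributes}
    return sorted(results.items(), key=lambda item: item[1][1], reverse=True)


if __name__ == "__main__":
    for name, (threshold, gain) in rank_attributes(
            SITES, ["ad_ratio", "words_per_page"]):
        print(f"{name}: split at {threshold}, variance reduction {gain:.4f}")
```

On this invented data the advertising ratio separates low-bounce from high-bounce sites far better than page length does, and the chosen threshold falls in the gap between the two groups. A full tree repeats the same step inside each resulting group, which is where the thresholds and the likely false positives or negatives start to show up.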
