Is the Page Rank Calculation still following the “BackRub” Algorithm? An examination of the PR algorithm, with examples from real-world sites.
Many a time and oft, we hear the speculation that the Page Rank Calculation has changed. Disappointingly, it is voiced by people who should know better. Perhaps the green monster of jealousy at seeing a spammy (competitor) site or a “Google bomb” taking a site to PR-7 gets the better of their judgement. We will proceed to shatter a few urban legends:
1. Home page is necessarily a higher PR than the rest of the site.
2. Older pages get highest PR.
3. Google considers “on page” factors or contents to judge PR.
4. Pages with most external link backs get the highest PR.
To justify our assertions more rigorously, let’s first quickly introduce the Page Rank Calculation.
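Page rank measures the importance of a page using the Google page rank formula:
PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
Here T1 ... Tn are the pages that link to page A, C(T) is the number of outbound links on page T, and d is a damping factor between 0 and 1 (0.85 is the value usually quoted).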
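To make the formula concrete, here is a minimal sketch that applies it repeatedly to a tiny, made-up three-page site until the values settle. The page names, the link structure, the damping factor of 0.85 and the iteration count are all assumptions chosen for illustration; this is not Google's implementation.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over a small link graph.

    links maps each page to the list of pages it links to.
    Every page in this sketch is assumed to have at least one outbound link.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    pr = {page: 1.0 for page in pages}  # arbitrary starting guess
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Sum PR(T)/C(T) over every page T that links to this page.
            incoming = sum(
                pr[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Hypothetical site: the home page links to two inner pages,
# and both inner pages feed link weight into "events".
site = {
    "home": ["events", "about"],
    "about": ["events"],
    "events": ["home"],
}

for page, score in sorted(pagerank(site).items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph the "events" page should settle slightly above "home", because it collects links from both other pages: a home page is not automatically the strongest page on a site.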
The page rank shown on a tool bar is actually a scaled number between 0 and 10. The scaling is logarithmic to accommodate the widely different numbers of inbound links (IBLs), where a linear scale would not convey the information appropriately.
While we are not privy to the log factor “lf” that Google uses for its Page Rank tool bar normalization, we take a stab at the number by making a few assumptions. We assume that Google is the highest ranking PR10 page. We use the number of links incoming to Google as reported by Google itself: on March 29, 2006 it reported 3,750,000 links. We further assume that the average Page rank of the links pointing to the Google home page is PR 1.
Therefore, to obtain the upper bound of the log factor (“lf”), we just take the appropriate root, i.e.
lf = (Incoming Links to Google)^(1/(Google PR - Average Incoming PR))
This boils down to (3.75*10^6)^(1/9), or lf=5.38. Naturally, as the size of the web increases, or the number of back links that Google reports to the outside world increases, “lf” is likely to increase. Our tool allows you to set a different “lf”.
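The arithmetic can be checked in a couple of lines. This is only a sketch of the estimate described above; the function name and parameter names are ours, and the defaults encode the stated assumptions (Google at PR 10, incoming links averaging PR 1).

```python
def log_factor(incoming_links, top_pr=10, avg_incoming_pr=1):
    """Estimate the toolbar log factor "lf" by taking the appropriate root."""
    return incoming_links ** (1 / (top_pr - avg_incoming_pr))


lf = log_factor(3_750_000)
print(round(lf, 2))  # 5.38
```

Plugging in a larger link count or a different average incoming PR shows how sensitive the estimate is to those two assumptions.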
We use the following formula to scale a tool bar value n to the “RPR” (the real, non-normalized PR) and obtain the lower and upper bounds:
lf^n <= RPRn < lf^(n+1)
As an example, when n=5 and lf=5.38, 4507 <= RPR5 < 24,248. Naturally, some pages are a very strong PR-5 (i.e. almost a PR-6), or a very weak PR-6.
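The bounds are easy to reproduce. The sketch below computes the range for a given tool bar value and also goes the other way, from a raw value back to a tool bar value; that inverse mapping is our own reading of the inequality, not something Google has published, and both function names are hypothetical.

```python
import math


def rpr_bounds(n, lf=5.38):
    """Return (lower, upper) for tool bar value n: lf^n <= RPR < lf^(n+1)."""
    return lf ** n, lf ** (n + 1)


def toolbar_pr(rpr, lf=5.38):
    """Map a raw value (must be > 0) back to a 0-10 tool bar value (assumed inverse)."""
    return min(10, max(0, math.floor(math.log(rpr, lf))))


low, high = rpr_bounds(5)
print(int(low), int(high))  # 4507 24248
print(toolbar_pr(10_000))   # 5
```

With a different “lf” the bands shift, which is why the bounds should be read as rough estimates rather than exact thresholds.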
Therefore, we allow a user to set the strength of the page. The default is 0.5 (i.e. it’s the equivalent of a PR5.5 page).
Now, if the above holds, then items “1”, “2” and “3” in the list of urban legends are irrelevant. Our contention is that Google has not changed the Page Rank calculation in any significant way, aside from tweaking the constants.
The proof is obviously in the pudding. Let’s take three sample sites and see how the update of April 6, 2006 has affected them.
We turn to our Event Tickets Website and examine two pages on the site:
1. Tickets Website's Home Page, PR-4
2. New Tickets Events Page, PR-6