The Implementation of PageRank in the Google Search
Engine:
To understand the implementation of PageRank, it is first
of all important to see how PageRank is integrated into the general
ranking of web pages by the Google search engine. The procedure
has been described by Lawrence Page and Sergey Brin in several
publications. Initially, the ranking of web pages by the Google
search engine was determined by three factors:
- Page specific factors
- Anchor text of inbound links
- PageRank
Page specific factors are, besides the body text,
for instance the content of the title tag or the URL of the document.
It is more than likely that, since the publications of Page and Brin,
further factors have been added to the ranking methods of the Google search
engine. They are not considered here.
In order to provide search results, Google computes
an IR score from the page specific factors and the anchor text of
inbound links of a page, weighted by the position and accentuation
of the search term within the document. This way the relevance of
a document for a query is determined. The IR score is then combined
with PageRank as an indicator of the general importance of the
page. To combine the IR score with PageRank, the two values are multiplied.
They cannot simply be added, since otherwise pages with
a very high PageRank would rank high in search results even if the
page is not related to the search query.
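The difference between multiplying and adding the two values can be illustrated with a small sketch. The page names, IR scores and PageRank values below are invented for illustration only; Google's actual scoring is not public.

```python
# Hypothetical pages: (name, IR score for the query, PageRank).
# All numbers are made up to illustrate the argument.
pages = [
    ("popular-but-unrelated", 0.0, 1000.0),
    ("relevant-niche-page", 0.8, 2.0),
]


def rank(pages, combine):
    """Return page names sorted by the combined score, best first."""
    ordered = sorted(pages, key=lambda p: combine(p[1], p[2]), reverse=True)
    return [name for name, _ir, _pr in ordered]


# Multiplication: a page with an IR score of 0 can never outrank a relevant one.
by_product = rank(pages, lambda ir, pr: ir * pr)

# Addition: the high PageRank dominates, although the page is unrelated.
by_sum = rank(pages, lambda ir, pr: ir + pr)

print(by_product)
print(by_sum)
```

With multiplication the unrelated page receives a combined score of zero, whereas with addition its high PageRank alone would carry it to the top.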
Especially for queries consisting of two or more
search terms, there is a far bigger influence of the content related
ranking criteria, whereas the impact of PageRank is mainly visible
for unspecific single word queries. If webmasters target search
phrases of two or more words it is possible for them to achieve
better rankings than pages with high PageRank by means of classical
search engine optimisation.
The PageRank Display of the Google
Toolbar:
PageRank became widely known through the PageRank display
of the Google Toolbar. The Google Toolbar is a browser plug-in for
Microsoft Internet Explorer which can be downloaded from the Google
web site. The Google Toolbar provides some features for searching
Google more conveniently.

The Google Toolbar displays PageRank on a scale from 0 to 10. First
of all, the PageRank of the currently visited page can be estimated
from the width of the green bar within the display. If the user holds
the mouse over the display, the Toolbar also shows the PageRank
value.
Caution: The PageRank display is one of the advanced features of
the Google Toolbar. And if those advanced features are enabled,
Google collects usage data. Additionally, the Toolbar is self-updating
and the user is not informed about updates. So, Google has access
to the user's hard drive.
If we take into account that PageRank can theoretically
have a maximum value of up to dN+(1-d), where N is the total number
of web pages and d is usually set to 0.85, PageRank has to be scaled
for the display on the Google Toolbar. It is generally assumed that
the scaling is not linear but logarithmic. At a damping
factor of 0.85 and, therefore, a minimum PageRank of 0.15, and at
an assumed logarithmic base of 6, we get a scaling as follows:
| Toolbar PageRank | Real PageRank |
| 0/10 | 0.15 - 0.9 |
| 1/10 | 0.9 - 5.4 |
| 2/10 | 5.4 - 32.4 |
| 3/10 | 32.4 - 194.4 |
| 4/10 | 194.4 - 1,166.4 |
| 5/10 | 1,166.4 - 6,998.4 |
| 6/10 | 6,998.4 - 41,990.4 |
| 7/10 | 41,990.4 - 251,942.4 |
| 8/10 | 251,942.4 - 1,511,654.4 |
| 9/10 | 1,511,654.4 - 9,069,926.4 |
| 10/10 | 9,069,926.4 - 0.85 × N + 0.15 |
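The ranges in the table follow from the lower bounds 0.15 × 6^k. A minimal sketch of this assumed scaling is given below; the function name is hypothetical, and the base of 6 and the minimum of 0.15 are the assumptions made above, not values confirmed by Google.

```python
from bisect import bisect_right

MIN_PAGERANK = 0.15  # 1 - d at a damping factor of d = 0.85
LOG_BASE = 6         # assumed logarithmic base

# Lower bounds of the Toolbar values 1/10 to 10/10: 0.9, 5.4, 32.4, ...
THRESHOLDS = [MIN_PAGERANK * LOG_BASE ** k for k in range(1, 11)]


def toolbar_pagerank(real_pagerank):
    """Map a real PageRank to the assumed Toolbar value from 0 to 10."""
    if real_pagerank < MIN_PAGERANK:
        raise ValueError("PageRank cannot be lower than 1 - d")
    # Count how many lower bounds the value has reached.
    return bisect_right(THRESHOLDS, real_pagerank)


for value in (0.15, 1.0, 100.0, 1000.0, 1e7):
    print(value, "->", toolbar_pagerank(value))
```

Everything at or above the last lower bound is mapped to 10, which corresponds to the open upper range of the table.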
It is uncertain whether a logarithmic scaling
in a strictly mathematical sense actually takes place. There is likely a
manual scaling which follows a logarithmic scheme, so that Google
has control over the number of pages within the individual Toolbar PageRank
ranges. The logarithmic base for this scheme should be between
6 and 7, which can, for instance, be roughly deduced from the
number of inbound links of pages with a high Toolbar PageRank from
pages with a Toolbar PageRank higher than 4, which are shown by
Google using the link command.
The Toolbar's PageRank Files:
Even webmasters who do not want to use the Google
Toolbar or Internet Explorer permanently because of security and privacy
concerns can check the PageRank values of their
pages. Google submits PageRank values to the Toolbar in simple text files.
Formerly, this happened via XML. The switch to text
files occurred in August 2002.
The PageRank files can be requested directly from
the domain www.google.com. Basically, the URLs for those files look
as follows (without line breaks):
http://www.google.com/search? client=navclient-auto&
ch=0123456789& features=Rank& q=info:http://www.domain.com/
There is only one line of text in the PageRank
files. The last number in this line is the PageRank.
The parameters in the URL shown above
are required for the display of the PageRank files in a browser.
The value "navclient-auto" for the parameter "client"
identifies the Toolbar. The URL of the page is submitted
via the parameter "q". The value "Rank" for the parameter "features"
determines that the PageRank files are requested. If it is omitted,
Google's servers still transmit XML files. The parameter "ch"
transfers a checksum for the URL to Google, whereby this checksum
can only change when the Toolbar version is updated by Google.
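How the parameters fit together can be sketched as follows. The function names are hypothetical, the checksum is the placeholder from the URL above, and the sample line passed to the parser is only an assumed example of a one-line file. The sketch merely builds the URL string and reads a number from a line of text; it sends no request.

```python
import re
from urllib.parse import urlencode


def pagerank_file_url(page_url, checksum):
    """Build the URL of the PageRank file for page_url (no request is sent)."""
    params = {
        "client": "navclient-auto",  # identifies the Toolbar
        "ch": checksum,              # checksum of the URL, taken from the Toolbar
        "features": "Rank",          # requests the text file instead of XML
        "q": "info:" + page_url,     # the page whose PageRank is requested
    }
    # Keep ':' and '/' unescaped so the result matches the form shown above.
    return "http://www.google.com/search?" + urlencode(params, safe=":/")


def pagerank_from_line(line):
    """Return the last number in the single line of a PageRank file, or None."""
    match = re.search(r"(\d+)\s*$", line)
    return int(match.group(1)) if match else None


url = pagerank_file_url("http://www.domain.com/", "0123456789")
print(url)
print(pagerank_from_line("Rank_1:1:5"))  # assumed sample line, for illustration
```

The parser looks for trailing digits rather than a single character, since a Toolbar value of 10 has two digits.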
Thus, it is necessary to install the Toolbar at
least once to find out the checksums of one's URLs. To track
the communication between the Toolbar and Google, the use
of packet sniffers, local proxies and similar tools is often suggested.
But this is not strictly necessary, since the PageRank files are
cached by Internet Explorer. So, the checksums can simply be
found by having a look at the folder Temporary Internet Files.
Knowing the checksums of your URLs, you can view the PageRank files
in your browser and you do not have to accept Google's cookies,
which last 36 years.
Since the PageRank files are kept in the browser
cache and, thus, are clearly visible, and as long as requests are
not automated, watching the PageRank files in a browser should not
be a violation of Google's Terms of Service. However, you should
be cautious. The Toolbar submits its own User-Agent to Google. It
is:
Mozilla/4.0 (compatible; GoogleToolbar
1.1.60-deleon; OS SE 4.10)
1.1.60-deleon is a Toolbar version which may of
course change. OS is the operating system that you have installed.
So, Google is able to identify requests by browsers, if they do
not go out via a proxy and if the User-Agent is not modified accordingly.
Taking a look at IE's cache, one will normally
notice that the PageRank files are not requested from the domain
www.google.com but from IP addresses like 216.239.33.102. Additionally,
the PageRank files' URLs often contain a parameter "failedip"
that is set to values like "216.239.35.102;1111" (its
function is not entirely clear). Each of the IP addresses is related
to one of Google's seven data centers, and the reason for the Toolbar
querying IP addresses is most likely to control the PageRank display
in a better way, especially in times of the "Google Dance".
The PageRank Display at the Google
Directory:
Webmasters who do not want to check the PageRank
files that are used by the Toolbar have another way to obtain
information about the PageRank of their sites: the Google
Directory (directory.google.com).
The Google Directory is a dump of the Open Directory Project (dmoz.org),
which shows the PageRank of listed documents in scaled form by means of
a green bar, similarly to the Google Toolbar display. In contrast
to the Toolbar, the scale is from 1 to 7. The exact value is not
displayed, but if one is not sure by looking at the bar, it can be
determined from the divisions of the bar or from the width of the
individual graphics in the source code of the page.
By comparing the Toolbar PageRank of a document
with its Directory PageRank, a more exact estimate of a page's
PageRank can be deduced, if the page is listed with the ODP. This
connection was first mentioned by Chris
Raimondi.

Especially for pages with a Toolbar PageRank of
5 or 6, one can assess whether the page is at the upper or the lower
end of its Toolbar range. It should be noted that the Toolbar PageRank
of 0 was not taken into account for the comparison. It can easily
be verified that this is appropriate by looking at pages with a
Toolbar PageRank of 3. However, it has to be considered that for
such a verification, pages of the Google Directory or the ODP
with a Toolbar PageRank of 4 or lower have to be chosen, since otherwise
no pages linked from there with a Toolbar PageRank of 3 will be
found.