Google has spidered my website. If you list what Google has indexed with site:in-my-opinion.org, Google lists all my pages sorted by relevance/importance, most important pages first. Since the search term "site:in-my-opinion.org" doesn't include any keywords, it's highly interesting that Google considers IMO → Michael Jackson paid million ("Michael Jackson paid $23 million") the most important page on my site. Here is the result:
Does anyone have a clue why? Is it because the topic is current? posted by knn
in-my-opinion.org > Technology, Computers, Science, Internet > Computers and Internet > How does Google work?
Based on its own records of search requests? This is just a guess, but what about word scores? I.e., historically more people have searched for "Michael Jackson" than for "Jesus", so a spidered page containing "Michael Jackson" would have a higher value than one containing "Jesus". posted by Marl64
The Google explanation includes the term "proprietary technologies" and mentions a particular article for further reading.

Quote: 2.1.1 Description of PageRank Calculation
Academic citation literature has been applied to the web, largely by counting citations or backlinks to a given page. This gives some approximation of a page's importance or quality. PageRank extends this idea by not counting links from all pages equally, and by normalizing by the number of links on a page. PageRank is defined as follows:

We assume page A has pages T1...Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d in the next section. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:

PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages' PageRanks will be one. PageRank or PR(A) can be calculated using a simple iterative algorithm, and corresponds to the principal eigenvector of the normalized link matrix of the web. Also, a PageRank for 26 million web pages can be computed in a few hours on a medium size workstation. There are many other details which are beyond the scope of this paper.

Quote: 4.5.1 The Ranking System
Google maintains much more information about web documents than typical search engines. Every hitlist includes position, font, and capitalization information. Additionally, we factor in hits from anchor text and the PageRank of the document. Combining all of this information into a rank is difficult. We designed our ranking function so that no particular factor can have too much influence.

First, consider the simplest case -- a single word query. In order to rank a document with a single word query, Google looks at that document's hit list for that word. Google considers each hit to be one of several different types (title, anchor, URL, plain text large font, plain text small font, ...), each of which has its own type-weight. The type-weights make up a vector indexed by type. Google counts the number of hits of each type in the hit list. Then every count is converted into a count-weight. Count-weights increase linearly with counts at first but quickly taper off so that more than a certain count will not help. We take the dot product of the vector of count-weights with the vector of type-weights to compute an IR score for the document. Finally, the IR score is combined with PageRank to give a final rank to the document.

For a multi-word search, the situation is more complicated. Now multiple hit lists must be scanned through at once so that hits occurring close together in a document are weighted higher than hits occurring far apart. The hits from the multiple hit lists are matched up so that nearby hits are matched together. For every matched set of hits, a proximity is computed. The proximity is based on how far apart the hits are in the document (or anchor) but is classified into 10 different value "bins" ranging from a phrase match to "not even close". Counts are computed not only for every type of hit but for every type and proximity. Every type and proximity pair has a type-prox-weight. The counts are converted into count-weights and we take the dot product of the count-weights and the type-prox-weights to compute an IR score. All of these numbers and matrices can all be displayed with the search results using a special debug mode. These displays have been very helpful in developing the ranking system.

posted by Marl64
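The single-word case from the 4.5.1 quote can be sketched roughly as below. This is only an illustration: the quote does not give the real type-weights or the exact taper of the count-weights, so `TYPE_WEIGHTS`, `COUNT_CAP` and the simple "linear, then flat" `count_weight` function are all made-up assumptions. How the IR score is then combined with PageRank is not described in the quote, so the sketch stops at the IR score.

```python
from collections import Counter

# Hypothetical type-weights; the real values are not given in the quote.
TYPE_WEIGHTS = {
    "title": 5.0,
    "anchor": 4.0,
    "url": 3.0,
    "large_font": 2.0,
    "small_font": 1.0,
}

# Assumed cut-off after which additional hits no longer help.
COUNT_CAP = 3


def count_weight(count, cap=COUNT_CAP):
    """Grow linearly with the count, then stay flat (a crude assumed taper)."""
    return float(min(count, cap))


def ir_score(hit_types):
    """Dot product of count-weights and type-weights for a single-word query.

    hit_types is the document's hit list for the word, one type name per hit.
    Hit types without an entry in TYPE_WEIGHTS are ignored.
    """
    counts = Counter(hit_types)
    return sum(count_weight(counts[t]) * w for t, w in TYPE_WEIGHTS.items())


# One title hit plus two small-font hits.
print(ir_score(["title", "small_font", "small_font"]))

# Repeating a word 50 times in small font scores no higher than 3 times.
print(ir_score(["small_font"] * 50))
```

With these assumed weights, a single hit in the title counts for more than any number of repetitions in small body text, which is the "no particular factor can have too much influence" idea from the quote.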
Yes, I know this system, but although PageRank is part of Google's "importance measurement", in this case some of Google's (200!) other algorithms must have been triggered, since I am sure I don't link to Michael Jackson more often than to other pages. posted by knn
knn: ...since I am sure, I don't link to Michael Jackson more often than to other pages.
No, but perhaps other people (external to this site) do. Or it could be something stupid, like something elsewhere on the page, not necessarily related to MJ, that triggers some importance factor. Perhaps there's a secret magic word which you can use to get the ranking up, and you just happen to have it somewhere on that page. posted by Marl64
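To see how much a single link from outside can matter, here is a minimal sketch of the PageRank formula quoted earlier in the thread, iterated on a tiny made-up link graph. The page names, the graph in `LINKS` and the number of iterations are assumptions for illustration only and have nothing to do with the real site. Because the formula is iterated exactly as quoted, the values in this sketch sum to the number of pages rather than to one.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1-d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)).

    links maps each page to the list of pages it links to.
    C(T) is the number of links going out of page T.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {page: 1.0 for page in pages}  # arbitrary starting values
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Sum the contributions of every page that links to this one.
            inbound = sum(
                pr[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_pr[page] = (1 - d) + d * inbound
        pr = new_pr
    return pr


# Hypothetical site: the home page links to three pages equally,
# and one outside page links only to the "mj" page.
LINKS = {
    "home": ["mj", "jesus", "gallery"],
    "mj": ["home"],
    "jesus": ["home"],
    "gallery": ["home"],
    "external": ["mj"],
}

ranks = pagerank(LINKS)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:10s} {score:.3f}")
```

In this toy graph "mj", "jesus" and "gallery" get identical treatment from the home page, yet "mj" ends up ranked above the other two purely because of the one outside link.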
Agent Zero's personal gallery is, astonishingly, on the first result page too. posted by knn
knn: Agent Zero's personal gallery is astonishingly on the first result page too
It's gotta be links then, and he's been out spamming all the other forums with links to his pictures here. posted by Marl64
Here is a "revelation picture". In other words: if you design web pages, you will find this picture very helpful. [picture] posted by knn
The time now is 1 December 2008, 19:24 php B.B. |