The Google Sandbox Effect has been discussed at length in our case study of a new website first crawled in May by Googlebot. We can now further the case study with indexing comparisons and discuss interesting Googlebot crawler behavior after the study website's release, at the 75-day mark, from that very confining Sandbox.

This case study is not for the faint of heart. Those just launching a new web business on a new domain name with hopes of instant indexing and immediate traffic may find their website very lonely for two and a half months if it is in a competitive market segment. You may as well plan to stay in the Google Sandbox for at least 45 days on average. If some early release stories are to be believed, search phrases nobody wants to play with are taken pity on by Google and sent home for early release.

Those non-competitive or obscure search phrases seem to be seen as good, quiet little children, playing by themselves in the Sandbox playground, and are sent home early on good behavior.

Googlebot probably sees good behavior as playing well with others, like a good little baby domain, and NOT being as competitive as some young domains can be. Throwing sand in other children's faces, insisting on having your site indexed, and throwing sand out of the Sandbox with your bright plastic toy shovel and bucket will not be allowed.

Now that the site discussed in this study is out of the Sandbox, it still lingers on the playground, unable to escape the community park and leave for the business world to play with the big boys. It does indeed take time to grow up and be the model citizen in this new search playground. Even so, on the first full day after its first week out of the Sandbox, the site received 68 visitors referred by searches done at Google, the first search referrals from Google to reach the site. MSN has sent 8 visitors, Yahoo has sent 6, 4 came from AOL searches, 2 from Netscape and 1 from Dogpile.

The indexing behavior of Yahoo and MSN has been nothing short of bizarre, with the numbers of indexed pages increasing rapidly over the first two months to reach 6,941 pages indexed 8 weeks into this study. We outlined previously how those numbers changed as you clicked through the results pages, first upward, then downward to about half of the highest totals listed along the top of the results pages.

It appears that Yahoo and MSN are playing on the "slippery slide" in this playground, climbing to the top of the results ladder at about the 10-week mark, showing 8,210 and 6,941 pages indexed respectively, then sliding down again to 3,510 for Yahoo and 373 for MSN as of this writing, two weeks later, on August 6. Still, Yahoo will show you only 1,000 of those results (100 results pages), and MSN will show you only 250 results, or 25 pages, no matter how many they claim to index. MSNbot is crawling the site faster and more consistently than any of the other engines' crawlers, yet MSN shows by far the fewest pages indexed.

One of the interesting comparisons between Google and MSN in our Sandbox study is that Google will show you most of what it claims to have indexed. When you use the "site:Publish101.com" query operator, the first page shows only 3 or 4 results. Go to the bottom of that page and click the link under the line reading, "In order to show you the most relevant results, we have omitted some entries very similar to the 3 already displayed. If you like, you can repeat the search with the omitted results included." Go ahead and click that link, and you'll be presented with the claimed total of indexed pages. That number has increased very steadily since the Sandbox release, 75 days after this study site was first crawled. At Google, the number of indexed pages goes upward, and ONLY upward, with VERY distinct timing patterns noted from raw log files.

Crawling schedules seem to have been established for this site by Google, and indexing changes occur on a very regular schedule.

The first observation of Sandbox release was at noon on Thursday, July 28, seventy-five days after first crawling by Googlebot, when a "site:Publish101.com" query turned up 379 pages indexed. That number increased later the same evening to 3,660 pages in a search done around the dinner hour, Pacific time. Oddly, the next day, Friday, July 29, the number took only a slight hop upward to 3,700 pages, and on the following Monday it showed 3,770 pages indexed.

That schedule and pattern repeated in the second week of Sandbox release, when a "site:Publish101.com" query produced 5,660 results from Google for the site on Thursday, August 4, just after noon, and then nearly doubled to 10,700 pages on that same query at around the dinner hour. A final check just now, on Saturday, shows 12,100 pages indexed by Google. For those who wonder about the total number of pages, it should be pointed out that this is a dynamic site with a very large archive of articles that grows daily as new submissions are contributed by member authors.

Those articles are added through a content management system on a daily basis by an editor who reviews submissions and processes them for approval or rejection. Those approved are made live from the home page nightly. We've started timing this around the crawlers' schedules: we've noted very regular visits by Yahoo's Slurp crawler to the site home page just once daily, at around 5pm each evening, and Googlebot visiting the home page only once, at near 11pm nightly. So we've instituted a midnight activation of each day's new article submissions on the home page of the site so that none of the new pages are missed by those crawlers. MSNbot seems to hit the home page multiple times through the day, so timing is less important for MSN.

Crawler activity has been heated overall, with Yahoo crawling the least and the slowest, barely seeming to attempt any updates, and its total of indexed pages has not changed for over three weeks since it peaked at 8,210 pages and then dropped to its current level of 3,510. As previously stated, Slurp seems to be unhindered by any form of consistency in indexing or crawling behavior. MSNbot has crawled extensively and fairly regularly for weeks, but MSN's odd indexing behavior is a serious flaw in its utility as a search tool.

It should be mentioned here that AskJeeves had been noted to crawl the site extensively early in this case study and displayed a very regular and consistent crawl, but stopped abruptly three weeks ago, on July 13, after hitting most of the pages then available on the site. Teoma, their spider, has been absent ever since, and they have not indexed this domain at all since first crawling it on May 23, over 10 weeks ago. Teoma appears to have the longest Sandbox of all the search engines.

Much has been learned in this Sandbox case study about crawler behavior, indexing delays, robots.txt requirements and index updates at each of the top three search engines. Where that knowledge leads will, of course, change as algorithms and crawling schedules are adjusted by MSN, Yahoo and Google. But valuable information has been shared that may help other webmasters better understand each of the factors that determine the success of any website.

Follow-up articles at the 3-, 6- and 9-month marks will explore the search referrals gained as Google adds more pages and as ranking fluctuations begin to level off.

Meanwhile, we'd like to encourage others to publicly review their crawler traffic through their logs, compare crawler behavior on new domains, verify these findings, disclose indexing behavior and timing for new domains, and further document search engine indexing as well as crawling behavior.