The Google Sandbox Effect has been discussed at length in our case study of a new website first crawled in May by Googlebot. We can now further the case study with indexing comparisons and discuss interesting Googlebot crawler behavior after the release, at the 75 day mark, of the study website from that very confining Sandbox.

This case study is not for the faint of heart. Those just launching a new web business on a new domain name with hopes of instant indexing and immediate traffic may find their website very lonely for two and a half months if it is in a competitive market segment. You may as well plan to stay in the Google Sandbox for at least 45 days on average. If some early release stories are to be believed, search phrases nobody wants to play with are taken pity on by Google and sent home for early release.

Those non-competitive or obscure search phrases seem to be seen as good, quiet little children, playing by themselves in the Sandbox playground, and are sent home early on good behavior. Googlebot probably sees good behavior as playing well with others, like a good little baby domain, and NOT being competitive as some young domains can be. Throwing sand in other children's faces and insisting on having your site indexed, throwing sand out of the Sandbox with your bright plastic toy shovel and bucket, will not be allowed.

Now that the site discussed in this study is out of the Sandbox, it still lingers on the playground, unable to escape the community park and leave for the business world to play with the big boys in the outside world. It does indeed take time to grow up and be the model citizen in this new search playground. Still, on the first full day after this first week of being released from the Sandbox, the site got 68 visitors referred by searches done at Google, the first referred search traffic coming into the site. MSN has sent 8 visitors, Yahoo has sent 6, 4 came from AOL searches, 2 from Netscape and 1 from Dogpile.

The indexing behavior of Yahoo and MSN has been nothing short of bizarre, with numbers of indexed pages increasing rapidly over the first two months to reflect 6,941 pages indexed until 8 weeks into this study. We outlined previously how the numbers changed as you click through results pages, first upward, then downward to about half of the highest totals listed along the top of the results pages.

It appears that Yahoo and MSN are playing on the 'slippery slide' in this playground, climbing to the top of the ladder of results at about the 10 week mark, showing 8,210 and 6,941 pages respectively indexed, then sliding down again to 3,510 for Yahoo and 373 for MSN as of this writing two weeks later on August 6. Still, Yahoo will show you only 1,000 of those results (100 pages) and MSN will show you only 250 results, or 25 pages, no matter how many they claim to index. MSNbot is crawling the site faster and more consistently than any of the engines, yet shows by far fewer pages indexed than the others.

One of the interesting comparisons between Google and MSN in our Sandbox study is that Google will show you most of what they claim to have indexed. When you use the "site:Publish101.com" query operator, the first page shows only 3 or 4 results. Go to the bottom of that page and find the link under the line reading, "In order to show you the most relevant results, we have omitted some entries very similar to the 3 already displayed. If you like, you can repeat the search with the omitted results included."

Go ahead and click that link, and you'll be presented with the claimed total of indexed pages. That number has very steadily increased since Sandbox release, 75 days from first crawling of this Sandbox study site. The timing and numbers of indexed pages at Google go upward, and ONLY upward, with VERY distinct patterns noted from raw log files.

Crawling schedules seem to have been established for this site by Google, and indexing changes occur on a very regular schedule.

The first observation of Sandbox release was at noon on Thursday July 28, seventy-five days from first crawling by Googlebot, when a search turned up 379 pages indexed with a "site:Publish101.com" query. That number increased later the same evening to 3,660 pages at a search done around the dinner hour Pacific time. Oddly, the next day, Friday July 29, the number took a slight hop upward to 3,700 pages and on the following Monday showed 3,770 pages indexed.

That schedule and pattern have repeated in the second week of Sandbox release, when a "site:Publish101.com" query produced 5,660 results from Google for the site on Thursday August 4 at just after noon and then nearly doubled at around the dinner hour to 10,700 pages on that same query. A final check just now on Saturday shows it at 12,100 pages indexed by Google. It should be pointed out to those who wonder about the total number of pages that this is a dynamic site with a very large archive of articles that increases daily as new submissions are contributed by member authors at the site.

Those articles are added through a content management system on a daily basis by an editor who reviews submissions and processes them for approvals or rejections. Those approved are made live from the home page nightly. We've started doing this on the crawlers' schedules, as we've noted very regular visits by Yahoo's Slurp crawler to the site home page just once daily at around 5pm each evening and Googlebot visiting the home page only once, at near 11pm nightly. So we've instituted a midnight activation of each day's new article submissions on the home page of the site so that none of the new pages are missed by those crawlers. MSNbot seems to hit the home page multiple times through the day, so timing is less important for MSN.

Crawler activity has been heated, with Yahoo crawling the least and the slowest, barely seeming to attempt any updates. Its total of indexed pages has not changed for over three weeks since it peaked at 8,210 pages indexed and then dropped to its current level of 3,510. As previously stated, Slurp seems to be unhindered by any form of consistency in indexing or crawling behavior. MSNbot has crawled extensively and fairly regularly for weeks, but MSN's odd indexing behavior is a serious flaw in its utility as a search tool.

It should be mentioned here that AskJeeves had been noted to crawl the site extensively early in this case study and displayed a very regular and consistent crawl, but stopped abruptly three weeks ago on July 13, after hitting most of the pages then available on the site. Teoma, their spider, has been absent ever since and they have not indexed this domain at all since first crawling on May 23, over 10 weeks ago. Clearly, Teoma appears to have the longest Sandbox of all the search engines.

Much has been learned in this Sandbox case study about crawler behavior, indexing delays, robots.txt requirements and index updates at each of the top three search engines. Where that knowledge leads will, of course, change as algorithms and crawling schedules are adjusted by MSN, Yahoo and Google. But valuable information has been shared that may help other webmasters to better understand each of the factors that determine the success of any website.

Further findings in follow-up articles at the 3, 6 and 9 month marks explore search referrals gained as Google adds more pages and ranking fluctuations begin to level. Meanwhile, we'd like to encourage others to publicly review their crawler traffic through logs, to compare behavior on new domains, verify these findings, disclose indexing behavior and timing for new domains, and further document search engine indexing as well as crawling behavior.
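For webmasters who want to review their own crawler traffic through logs, the following is a minimal sketch of one way to tally home page visits by crawler and hour of day, which is the kind of pattern behind the 5pm Slurp and 11pm Googlebot observations in this study. It is not the tooling used for the study itself. It assumes the server writes the common Apache "combined" log format, that crawlers can be recognized by the substrings `googlebot`, `slurp`, `msnbot` and `teoma` in the user-agent field, and that the home page is logged as `/`. The sample log lines, addresses and function names are invented for illustration.

```python
import re
from collections import Counter, defaultdict

# Assumption: Apache/NCSA "combined" log format, e.g.
# host - - [28/Jul/2005:23:04:11 -0700] "GET / HTTP/1.1" 200 5120 "-" "agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Assumption: these lowercase substrings identify each crawler's user agent.
CRAWLER_TOKENS = {
    "Googlebot": "googlebot",
    "Yahoo Slurp": "slurp",
    "MSNbot": "msnbot",
    "Teoma": "teoma",
}


def crawler_name(agent):
    """Return the crawler label for a user-agent string, or None."""
    lowered = agent.lower()
    for name, token in CRAWLER_TOKENS.items():
        if token in lowered:
            return name
    return None


def home_page_hits_by_hour(lines, home_path="/"):
    """Count crawler requests for the home page, grouped by crawler and hour."""
    hits = defaultdict(Counter)
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        if match.group("path") != home_path:
            continue  # only home page requests are of interest here
        name = crawler_name(match.group("agent"))
        if name is None:
            continue  # ordinary visitors and unknown agents
        # The timestamp looks like 28/Jul/2005:23:04:11 -0700, so the hour
        # (in the server's logged time zone) follows the first colon.
        hour = int(match.group("time").split(":")[1])
        hits[name][hour] += 1
    return hits


# Invented sample lines using documentation-only IP addresses.
SAMPLE_LINES = [
    '192.0.2.10 - - [28/Jul/2005:23:04:11 -0700] "GET / HTTP/1.1" 200 5120 '
    '"-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.20 - - [28/Jul/2005:17:01:45 -0700] "GET / HTTP/1.0" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Yahoo! Slurp; '
    'http://help.yahoo.com/help/us/ysearch/slurp)"',
    '192.0.2.30 - - [28/Jul/2005:09:15:02 -0700] "GET / HTTP/1.0" 200 5120 '
    '"-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.30 - - [28/Jul/2005:14:22:40 -0700] "GET /articles/123 HTTP/1.0" '
    '200 8040 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.40 - - [28/Jul/2005:12:00:00 -0700] "GET / HTTP/1.1" 200 5120 '
    '"-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]

if __name__ == "__main__":
    for name, by_hour in sorted(home_page_hits_by_hour(SAMPLE_LINES).items()):
        for hour, count in sorted(by_hour.items()):
            print(f"{name}: {count} home page hit(s) in hour {hour:02d}")
```

To run it against a real log, pass an open file in place of `SAMPLE_LINES`, for example `home_page_hits_by_hour(open("access.log"))`, where `access.log` is a placeholder for your own log path. Hours are reported in whatever time zone the server writes to the log, and a user-agent string can be spoofed, so treat the tallies as a starting point for comparison and not as proof of which engine visited.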