Release from Google Sandbox Only to Search the Playground

The Google Sandbox Effect has been discussed at length in our case study of a new website first crawled in May by Googlebot. We can now further the case study with indexing comparisons and discuss interesting Googlebot crawler behavior after release, at the 75 day mark, of the study website from that very confining Sandbox.

This case study is not for the faint of heart - those just launching a new web business on a new domain name with hopes of instant indexing and immediate traffic may find their website very lonely for two and a half months - if it is in a competitive market segment. You may as well plan to stay in the Google Sandbox for at least 45 days on average. If some early release stories are to be believed, search phrases nobody wants to play with are taken pity on by Google and sent home for early release.

Those non-competitive or obscure search phrases seem to be seen as good, quiet little children, playing by themselves in the Sandbox playground, and are sent home early on good behavior. Googlebot probably sees good behavior as playing well with others, like a good little baby domain, and NOT being competitive as some young domains can be. Throwing sand in other children's faces and insisting on having your site indexed, throwing sand out of the Sandbox with your bright plastic toy shovel and bucket, will not be allowed.

Now that the site discussed in this study is out of the Sandbox, it still lingers on the playground, unable to escape the community park and leave for the business world to play with the big boys in the outside world. It does indeed take time to grow up and be the model citizen in this new search playground. Still, on the first full day after this first week of release from the Sandbox, the site received 68 visitors referred by searches done at Google, the first referred search traffic coming into the site. MSN has sent 8 visitors, Yahoo has sent 6, 4 came from AOL searches, 2 from Netscape and 1 from Dogpile.

The indexing behavior of Yahoo and MSN has been nothing short of bizarre, with numbers of indexed pages increasing rapidly over the first two months to reflect 6,941 pages indexed until 8 weeks into this study. We outlined previously how numbers changed as you click through results pages, first upward, then downward to about half the total of the highest numbers listed along the top of the results pages.

It appears that Yahoo and MSN are playing on the 'slippery slide' in this playground, climbing to the top of the ladder of results at about the 10 week mark, showing 8,210 and 6,941 pages respectively indexed, then sliding down again to 3,510 for Yahoo and 373 for MSN, as of this writing two weeks later on August 6. Still, Yahoo will show you only 1,000 (100 pages) of those results and MSN will show you only 250 results, or 25 pages, no matter how many they claim to index. MSNbot is crawling the site faster and more consistently than any of the engines, yet shows by far fewer pages indexed than the others.

One of the interesting comparisons between Google and MSN in our Sandbox study is that Google will show you most of what they claim to have indexed. The first page shows only 3 or 4 results when you use the "site:Publish101.com" query operator. Go to the bottom of that page and find the link under the line reading, "In order to show you the most relevant results, we have omitted some entries very similar to the 3 already displayed. If you like, you can repeat the search with the omitted results included."

Go ahead and click that link, then you'll be presented with the claimed total of indexed pages. That number has very steadily increased since Sandbox release, 75 days from first crawling of this Sandbox study site. The number of indexed pages at Google goes upward, and ONLY upward, with VERY distinct timing patterns noted from raw log files. Crawling schedules seem to have been established for this site by Google, and indexing changes occur on a very regular schedule.

The first observation of Sandbox release was at noon on Thursday July 28, seventy-five days from first crawling by Googlebot, when a search turned up 379 pages indexed with a "site:Publish101.com" query. That number increased later the same evening to 3,660 pages at a search done around the dinner hour Pacific time. Oddly, the next day, Friday July 29, the number took a slight hop upward to 3,700 pages and on the following Monday showed 3,770 pages indexed.

That schedule and pattern repeated in the second week of Sandbox release, when a "site:Publish101.com" query produced 5,660 results from Google for the site on Thursday August 4 at just after noon and then nearly doubled at around the dinner hour to 10,700 pages on that same query. A final check just now on Saturday shows it at 12,100 pages indexed by Google. It should be pointed out to those who wonder about the total number of pages that this is a dynamic site with a very large archive of articles that increases daily as new submissions are contributed by member authors at the site.

Those articles are added through a content management system on a daily basis by an editor who reviews submissions and processes them for approvals or rejections. Those approved are made live from the home page nightly. We've started doing this on the crawlers' schedules, as we've noted very regular visits by Yahoo's Slurp crawler to the site home page just once daily at around 5pm each evening and Googlebot visiting the home page only once, at near 11pm nightly. So we've instituted a midnight activation of each day's new article submissions on the home page of the site so that none of the new pages are missed by those crawlers. MSNbot seems to hit the home page multiple times through the day, so timing is less important for MSN.

Crawler activity has been heated, with Yahoo crawling the least and the slowest, barely seeming to attempt any updates. Its total of indexed pages has not changed for over three weeks since it peaked at 8,210 pages indexed and then dropped to its current level of 3,510. As previously stated, Slurp seems to be unhindered by any form of consistency in indexing or crawling behavior. MSNbot has crawled extensively and fairly regularly for weeks, but that odd indexing behavior is a serious flaw in their utility as a search tool.

It should be mentioned here that AskJeeves had been noted to crawl the site extensively early in this case study and displayed a very regular and consistent crawl, but stopped abruptly three weeks ago on July 13, after hitting most of the pages then available on the site. Teoma, their spider, has been absent ever since and they have not indexed this domain at all since first crawling on May 23, over 10 weeks ago. Clearly, Teoma appears to have the longest Sandbox of all the search engines.

Much has been learned in this Sandbox case study about crawler behavior, indexing delays, robots.txt requirements and index updates at each of the top three search engines. Where that knowledge leads will, of course, change as algorithms and crawling schedules are adjusted by MSN, Yahoo and Google. But valuable information has been shared that may help other webmasters to better understand each of the factors that determine the success of any website.

Further findings in follow-up articles at the 3, 6 and 9 month marks explore search referrals gained as Google adds more pages and ranking fluctuations begin to level. Meanwhile, we'd like to encourage others to publicly review their crawler traffic through logs, to compare behavior on new domains, verify these findings, and further document search engine indexing and crawling behavior and timing.
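
For webmasters who want to review their own crawler traffic as suggested, here is a minimal sketch of tallying home page visits per crawler by hour of day from raw log files, the kind of count behind the "Slurp around 5pm, Googlebot near 11pm" observations. It is not the tooling used in this study. It assumes the server writes the Apache "combined" log format, the user-agent substrings in CRAWLERS are illustrative guesses, and the sample log lines (IP addresses, times, sizes) are invented for demonstration.

```python
import re
from collections import Counter, defaultdict

# Assumed User-Agent substrings for each crawler (illustrative, not exhaustive).
CRAWLERS = {
    "Googlebot": "googlebot",
    "Yahoo Slurp": "slurp",
    "MSNbot": "msnbot",
    "Teoma": "teoma",
}

# Assumes the Apache "combined" log format; adjust for other servers.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^:]+:(?P<hour>\d{2}):\d{2}:\d{2} [^\]]+\] '
    r'"(?:GET|HEAD) (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Invented sample lines, for demonstration only.
SAMPLE_LINES = [
    '10.0.0.1 - - [28/Jul/2005:23:02:11 -0700] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '10.0.0.1 - - [28/Jul/2005:23:02:15 -0700] "GET /articles/1 HTTP/1.1" 200 900 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '10.0.0.2 - - [28/Jul/2005:17:01:40 -0700] "GET / HTTP/1.0" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '10.0.0.3 - - [28/Jul/2005:09:15:00 -0700] "GET / HTTP/1.0" 200 5120 "-" '
    '"msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '10.0.0.3 - - [28/Jul/2005:14:40:09 -0700] "GET / HTTP/1.0" 200 5120 "-" '
    '"msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '10.0.0.4 - - [28/Jul/2005:12:00:00 -0700] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]


def home_page_hits_by_hour(lines, path="/"):
    """Return {crawler name: Counter({hour of day: hits})} for requests to `path`."""
    hits = defaultdict(Counter)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or match.group("path") != path:
            continue  # skip unparseable lines and requests for other pages
        agent = match.group("agent").lower()
        for name, marker in CRAWLERS.items():
            if marker in agent:
                hits[name][int(match.group("hour"))] += 1
                break  # count each request for at most one crawler
    return dict(hits)


if __name__ == "__main__":
    for crawler, hours in sorted(home_page_hits_by_hour(SAMPLE_LINES).items()):
        for hour, count in sorted(hours.items()):
            print(f"{crawler}: {count} home page hit(s) during hour {hour:02d}")
```

To run this against a real log, pass an open file object instead of SAMPLE_LINES. A crawler whose hits cluster in a single hour each day suggests a fixed visit schedule, which is the kind of evidence behind timing new content to go live before that visit.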