‘NASA Search 1.0’ – Something Google should worry about?
Having written my own WordPress logging / statistics plug-in over the weekend (it is still a prototype, so consider it ‘coming soon’), I have started to notice more and more peculiar User-Agents visiting my blog.
I quite like to keep an eye on which spiders / bots visit my sites and how often they return, and to try to infer something about how they were designed by watching them visit.
I was surprised recently to see that the big three (Yahoo!, MSN & Google) actually pull RSS feeds as well as HTML pages. Of course, this makes sense from an efficiency and bandwidth point of view: the RSS feed is the interesting stuff, already extracted from the surrounding markup.
Today’s one is a real winner though, coming from the following net block and advertising itself as “NASA Search 1.0”.
Comcast Cable Communications, Inc. NJ-SOUTH-4 (NET-68-46-128-0-1)
68.46.128.0 - 68.46.191.255
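As an aside, a range like this is easy to turn into CIDR notation and test addresses against. Here is a minimal sketch in Python (not the plug-in's code, which does not appear in this post); the helper name `in_block` is my own invention for illustration.

```python
import ipaddress

# Net block boundaries exactly as quoted above.
first = ipaddress.ip_address("68.46.128.0")
last = ipaddress.ip_address("68.46.191.255")

# summarize_address_range yields the CIDR networks that cover the range.
networks = list(ipaddress.summarize_address_range(first, last))


def in_block(ip: str) -> bool:
    """Hypothetical helper: True if ip falls inside the quoted net block."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)
```

The range collapses to a single network, 68.46.128.0/18, so any logged address can be checked with one call to `in_block`.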
The bot / spider crawled my entire site within a few minutes, starting from my ‘changes-in-wordpress-152’ post and was completely oblivious to my robots.txt (it didn’t even request it).
Also, it appeared to be quite a primitive HTTP client, providing no referrer information and none of the usual headers (“Connection: close”, “Accept: */*”), even though it was sending an “HTTP/1.1” request. Surprisingly, though, it did persist a session cookie for the duration of its visit.
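The tell-tale signs above could be checked mechanically. What follows is only a rough sketch in Python, not what my plug-in does: the function name, its arguments and the wording of the reasons are all assumptions made for illustration, and the cookie value in the test data is made up.

```python
def crude_client_signs(headers, fetched_robots_txt):
    """Hypothetical check: list the reasons a client looks like a crude bot.

    headers             dict of header name -> value as sent by the client
    fetched_robots_txt  True if this client requested /robots.txt at any point
    """
    present = {name.lower() for name in headers}
    signs = []

    # Headers noted as absent in the post. A missing Referer on its own is a
    # weak signal, since many legitimate crawlers never send one.
    for name in ("accept", "connection", "referer"):
        if name not in present:
            signs.append("missing " + name)

    if not fetched_robots_txt:
        signs.append("never requested robots.txt")

    return signs


# Roughly what this visitor sent: a User-Agent and a session cookie, little else.
nasa = {"User-Agent": "NASA Search 1.0", "Cookie": "session=example"}
report = crude_client_signs(nasa, fetched_robots_txt=False)
```

None of these signs proves anything alone; the idea is simply that several of them together are worth a second look.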
I Google’d for the phrase “NASA Search 1.0” and only seemed to find results where auto-generated-stats pages list visiting User-Agents.
It would be quite interesting (and maybe even fun, in a very geeky way) to write a WordPress plug-in that watches for these peculiar bots and pings their details to a centralised stats database, forming a sort of spider-spider.
Anyway, I will be keeping a keen eye out for the return of “NASA Search 1.0” … Could it be the next great NASA-funded project? Or is it just some smart a** who has figured out how to change the User-Agent string in his favourite spider/bot?
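To make the spider-spider idea a little more concrete, here is a sketch of the kind of record such a plug-in might ping to the central database. It is written in Python rather than PHP purely for brevity, and everything in it (the `BotSighting` class, its field names, the JSON format) is hypothetical; no such service exists yet.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class BotSighting:
    """Hypothetical record describing one visit by a peculiar bot."""
    user_agent: str
    net_block: str
    first_path: str
    fetched_robots_txt: bool


def to_payload(sighting: BotSighting) -> str:
    """Serialise a sighting as JSON, ready to send to a stats server."""
    return json.dumps(asdict(sighting), sort_keys=True)


# The visit described in this post, expressed as a sighting.
sighting = BotSighting(
    user_agent="NASA Search 1.0",
    net_block="68.46.128.0 - 68.46.191.255",
    first_path="changes-in-wordpress-152",
    fetched_robots_txt=False,
)
payload = to_payload(sighting)
```

Actually sending the payload is left out on purpose, since where it would go and how the server would deduplicate sightings are open design questions.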
Stay tuned!
2 Comments so far
Same bot, different Net Block:
2/15/2006 6:43:59 AM Visited the Sign Guestbook page.
Visitor 64.229.148.29 was allowed access.
Visitor’s country of origin is Canada.
(64.229.128.0 – 64.229.159.255)
***** Visitor NOT in United States *****
HTTP_USER_AGENT = NASA Search 1.0
Protected by 419buster Anti-Scam Software V2.6.3
© 2003-2006 .NExT Web Security
——————————————–
One that has definitely learned to change the User-Agent string:
2/14/2006 9:31:26 PM Visited the Guestbook. Page 4
Visitor 64.86.167.236 was allowed access.
Visitor’s country of origin is Brazil
***** Visitor NOT in United States *****
HTTP_USER_AGENT = dsNjxdeyNmxmfaemymhNh yg
HTTP_ACCEPT = text/html, text/plain
Protected by 419buster Anti-Scam Software V2.6.3
© 2003-2006 .NExT Web Security
By 419buster on 02.15.06 14:41
Got it as well from this IP block:
65.0.0.0 – 65.15.255.255
65.2.253.93
Resolves to:
OrgName: BellSouth.net Inc.
OrgID: BELL
Address: 575 Morosgo Drive
City: Atlanta
StateProv: GA
PostalCode: 30324
Country: US
Has to be some smart a**, IMO.
By Turbo on 06.24.06 07:33