GOOGLE
SECURITY
Soda_Popinsky has very kindly allowed this tutorial of his to be hosted
on the TAZ.
You can find the original post here:
http://www.antionline.com/showthread.php?s=&threadid=260714
Some background first...
Google as a Hacking Tool by 3rr0r:
http://www.antionline.com/showthrea...threadid=257512
Google Hacking Honeypots:
http://www.antionline.com/showthrea...threadid=260050
Google hacking and Credit Card Security:
http://www.antionline.com/showthrea...threadid=260580
Google: Net Hacker Tool:
http://www.antionline.com/showthrea...threadid=240791
Google Aids Hackers:
http://www.antionline.com/showthrea...threadid=240734
Google is watching you:
http://www.antionline.com/showthrea...threadid=260700
It seems that Google is becoming a problem for some webmasters. I wanted
to check out what Google knew about a site I took over, so I wrote this
tut as a reference while I worked.
Control the Spiders
Nearly all well-behaved crawlers honor something called the Robots
Exclusion Standard, which lets webmasters say which parts of their
website may be crawled and indexed.
To do this, we stick a text file called robots.txt at the top level of
our document root folder. Here is an example file:
Code:
User-agent: *
Disallow:
This code sucks. The empty Disallow line allows all crawlers to index
whatever they want. Let's write code to deny all crawlers.
Code:
User-agent: *
Disallow: /
Notice the slash: it tells all crawlers to ignore everything under the
document root folder.
Code:
User-agent: *
Disallow: /admin
Disallow: /cgi-bin
This code tells crawlers to ignore anything whose path starts with
/admin or /cgi-bin under the document root folder. Now let's define
which crawlers we like and don't like. Each User-agent block is called a
record, and hard returns matter: put one blank line between records.
Code:
# Denies access to Google's spiders
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
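If you want to sanity-check records like these before uploading them, Python's standard urllib.robotparser module can parse robots.txt text and answer "may this crawler fetch this URL?". This is only a rough sketch: the robots.txt text, the SomeOtherBot name, and the example.com URL are made up for illustration, and real crawlers may read edge cases differently than Python's parser does.

Code:
```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one record for Googlebot, one for everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
"""


def can_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "http://www.example.com/index.html"  # placeholder URL
    for agent in ("Googlebot", "SomeOtherBot"):
        verdict = "allowed" if can_fetch(ROBOTS_TXT, agent, url) else "denied"
        print(agent, verdict)
```

Run as a script, it should report Googlebot as denied and the made-up SomeOtherBot as allowed, since the second record applies to every crawler the first one does not name.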
You can also deny a single file:
Code:
User-agent: *
Disallow: /admin/index.html
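Note that in the original standard the * wildcard only works in the
"User-agent" line. Disallow values are plain path prefixes, though some
crawlers (Google's included) accept wildcards there as an extension.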
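Since Disallow values are matched as path prefixes, a rule can cover more than you expect. Here is a small sketch, again using Python's standard urllib.robotparser, that lists which paths a rule set blocks. The sample paths and the AnyBot crawler name are assumptions picked for the demo.

Code:
```python
from urllib.robotparser import RobotFileParser

# The folder rules from earlier in this tut.
PATH_RULES = """\
User-agent: *
Disallow: /admin
Disallow: /cgi-bin
"""


def blocked_paths(robots_txt, paths, agent="AnyBot"):
    """Return the subset of `paths` that `agent` is not allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, p)]


if __name__ == "__main__":
    sample = [
        "/index.html",
        "/admin/index.html",
        "/administrator/",   # also starts with /admin, so it is blocked too
        "/cgi-bin/test.cgi",
        "/public/admin",     # /admin is not at the start, so it is allowed
    ]
    for path in blocked_paths(PATH_RULES, sample):
        print("blocked:", path)
```

If you only want the folder itself and not lookalikes such as /administrator/, writing the rule as "Disallow: /admin/" with a trailing slash is the usual way to narrow it.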
Meta Tag Crawler Denial
You may not have permission to put a robots.txt file in the document
root of your webserver. In that case the robots meta tag is available,
though crawlers do not support it as well. It is simple: place one of
these meta tags in the head section of your pages:
Permission to index, permission to follow links:
<meta name="robots" content="index,follow">
Do not index, permission to follow links:
<meta name="robots" content="noindex,follow">
Permission to index, do not follow links:
<meta name="robots" content="index,nofollow">
Do not index, do not follow links:
<meta name="robots" content="noindex,nofollow">
This method is a lot more work, since every page needs its own tag, and
it is not as well supported, but it requires no special permission to
set up.
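Because the tag has to be on every page, it is easy to miss one. A rough sketch of a checker using Python's standard html.parser module is below. The RobotsMetaParser class and robots_directives function are names invented here, and the sample page is made up.

Code:
```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None if missing.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                part = part.strip().lower()
                if part:
                    self.directives.append(part)


def robots_directives(html):
    """Return the list of robots meta directives in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex,follow"></head></html>'
    print(robots_directives(page))
```

An empty list means the page carries no robots meta tag at all, so crawlers will fall back to their defaults for it.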
Dumping info in Google
This is an easy trick, though not practical for large sites. Enter this
into the Google search box:
site:www.YOURSITEHERE.com
You'll see that it dumps all it knows about your site. If you aren't
too popular, you can skim through it to see what it knows.
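If you check more than one domain, you can build the search link instead of typing it each time. A tiny sketch follows. The site_query_url name is made up, and it assumes Google's usual /search?q= URL form.

Code:
```python
from urllib.parse import urlencode


def site_query_url(domain):
    """Build a Google search URL for the query site:<domain>."""
    return "https://www.google.com/search?" + urlencode({"q": f"site:{domain}"})


if __name__ == "__main__":
    # Prints https://www.google.com/search?q=site%3Awww.example.com
    print(site_query_url("www.example.com"))
```

Open the printed link in a browser. This only saves typing; it does not fetch or parse any results.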
Foundstone's SiteDigger
In order to use this great tool, you need to register for a Google
license key. Get it done here:
https://www.google.com/accounts/NewAccount
SiteDigger can be found here:
http://www.foundstone.com/resources.../sitedigger.htm
Install SiteDigger and enter your license key in the bottom right
corner. After that, update your signatures by clicking Options, then
Update Signatures. Enter your domain where it says, "please enter your
domain", and click Search.
What SiteDigger does is run automated searches on your domain with
signatures, looking for common indexing mistakes left behind by
webmasters. Hackers use this, so should you. Anything it finds should
be handled accordingly.
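To get a feel for what a signature-based search looks like, here is a minimal sketch that pairs a site: restriction with a few search patterns. The three signatures are illustrative guesses built from ordinary Google operators (intitle:, filetype:, inurl:). They are not taken from SiteDigger's signature database, and this is not how SiteDigger itself is implemented.

Code:
```python
# Hypothetical signatures for illustration only; not SiteDigger's database.
SIGNATURES = {
    "directory listing": 'intitle:"index of"',
    "backup files": "filetype:bak",
    "admin pages": "inurl:admin",
}


def build_queries(domain, signatures=SIGNATURES):
    """Return one Google query string per signature, restricted to `domain`."""
    return {name: f"site:{domain} {pattern}" for name, pattern in signatures.items()}


if __name__ == "__main__":
    for name, query in build_queries("www.YOURSITEHERE.com").items():
        print(f"{name}: {query}")
```

Paste each printed query into Google by hand and see what turns up for your own domain.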
In short, learn to protect your public files. Learn to use .htaccess
files for Apache web servers here:
http://www.antionline.com/showthrea...threadid=231380
All done.
Comments and criticisms encouraged.
SOURCES:
http://www.robotstxt.org/
http://www.antionline.com/
Original Tutorial
Submitted by nokia for TheTAZZone-TAZForum
Originally posted on March 6th, 2006 here
Do not use, republish, in whole or in part, without the consent of
the Author. TheTAZZone policy is that Authors retain the rights to the
work they submit and/or post...we do not sell, publish, transmit, or
have the right to give permission for such...TheTAZZone merely retains
the right to use, retain, and publish submitted work within its
Network.

