如何擋掉 google, 大陸百度 搜尋引擎

發現google實在強了，他不只可以讓你網站曝光，如果你上傳一些重要的檔案放在網路上，只要沒有經過帳號密碼的機制，讓google知道你的絕對路徑，那個檔案就完蛋了，因為google還會暫存到他的機器，順便還幫你轉成 html ，夠屌了吧，所以我認為唯一最終解決辦法，就是自己寫一隻下載檔案的function，然後下載檔案都要透過該程式然後在header出來，這樣才不會被google弄到檔案，囧。當然要如何擋掉搜尋引擎，方法如下，在自己網站底下新增 .htaccess #擋掉百度 SetEnvIfNoCase User-Agent “^Baidu” bad_bot SetEnvIfNoCase User-Agent “^sogou” bad_bot SetEnvIfNoCase User-Agent “^Bloghoo” bad_bot SetEnvIfNoCase User-Agent “^Scooter” bad_bot Deny from env=bad_bot #擋掉google SetEnvIf User-Agent “^Googlebot” google Deny from env=google 其實還有另外一種方法，那就是用 robots.txt 如何攔截 Googlebot？，這個方法也不錯擋掉的結果如下：

66.249.70.107 - - [20/Jun/2007:15:30:11 +0800] "GET /store/market_list.php??bid=&year=2009&month=10 HTTP/1.1" 403 999 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


61.135.162.53 - - [20/Jun/2007:14:41:00 +0800] "GET /web/news_show.php?bid=30&newsid=189&list= HTTP/1.1" 403 1003 "-" "Baiduspider+(+http://www.baidu.com/search/spider.htm)"

看到 403 就對了，不過會看到比較多百度的可以參考底下這篇討論，相當不錯

[求助]Bot 如果要測試的話，可以利用 fx 的 user.agent 功能，這招超級好用的喔，方法如下 1.在FX網址列輸入about:config 2.然後新增general.useragent.override 選 string『字串』 3.然後看要輸入什麼值，例如 Googlebot/2.1 (+http://www.googlebot.com/bot.html) 都可以然後就可以測試了參考這篇：用 htaccess 擋 spam