
A roundup of PHP code for telling search engine spiders from ordinary visitors

2024-05-04 22:35:04
Source: reprinted
Contributed by: a site user

1. Recommended method: PHP code that decides whether a request comes from a search engine spider or a human visitor, taken from Discuz X3.2

<?php
// Returns true when the user agent looks like a crawler, false otherwise.
function checkrobot($useragent = '') {
    static $kw_spiders = array('bot', 'crawl', 'spider', 'slurp', 'sohu-search', 'lycos', 'robozilla');
    static $kw_browsers = array('msie', 'netscape', 'opera', 'konqueror', 'mozilla');

    // Fall back to the request header when no user agent is passed in.
    $useragent = strtolower(empty($useragent) ? $_SERVER['HTTP_USER_AGENT'] : $useragent);
    // A browser keyword with no embedded URL is treated as a human visitor.
    if (strpos($useragent, 'http://') === false && dstrpos($useragent, $kw_browsers)) return false;
    if (dstrpos($useragent, $kw_spiders)) return true;
    return false;
}

// Returns true (or the matched keyword when $returnvalue is set) if $string contains any entry of $arr.
function dstrpos($string, $arr, $returnvalue = false) {
    if (empty($string)) return false;
    foreach ((array)$arr as $v) {
        if (strpos($string, $v) !== false) {
            $return = $returnvalue ? $v : true;
            return $return;
        }
    }
    return false;
}

if (checkrobot()) {
    echo 'Robot / crawler';
} else {
    echo 'Human';
}
?>

In practice you can use it like this, running the action only when the visitor is not a search engine:

<?php
if (!checkrobot()) {
    // do something
}
?>
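The `http://` test in `checkrobot()` matters because many crawlers send a user agent that begins with `Mozilla` but also embeds a contact URL. The sketch below restates the same keyword logic in two standalone functions, `contains_any()` and `looks_like_robot()`, and runs it on a few sample strings. Both function names and the sample user agents are hypothetical and chosen for illustration; they are not part of Discuz.

```php
<?php
// Hypothetical helper: true if $haystack contains any of the $needles.
function contains_any(string $haystack, array $needles): bool {
    foreach ($needles as $needle) {
        if (strpos($haystack, $needle) !== false) {
            return true;
        }
    }
    return false;
}

// Hypothetical restatement of the checkrobot() keyword logic for an explicit user agent.
function looks_like_robot(string $userAgent): bool {
    $spiders  = array('bot', 'crawl', 'spider', 'slurp', 'sohu-search', 'lycos', 'robozilla');
    $browsers = array('msie', 'netscape', 'opera', 'konqueror', 'mozilla');
    $ua = strtolower($userAgent);

    // Browser keyword and no embedded URL: treat as a human visitor.
    if (strpos($ua, 'http://') === false && contains_any($ua, $browsers)) {
        return false;
    }
    return contains_any($ua, $spiders);
}

// Sample user agents (illustrative assumptions, not taken from the article).
$samples = array(
    'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'curl/8.4.0',
);
foreach ($samples as $ua) {
    echo (looks_like_robot($ua) ? 'robot' : 'human'), "\t", $ua, "\n";
}
```

Note that a non-browser client such as curl is reported as "human" here: the check only looks for crawler keywords, so it identifies well-behaved spiders rather than every automated client.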

2. Second method:

Logging spider visits with PHP for access statistics

<?php
// Lowercase the user agent once so every keyword below can be matched in lowercase.
$useragent = strtolower(isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '');

// More specific keywords must come before the generic ones that contain them.
if (strpos($useragent, 'googlebot') !== false) { $bot = 'Google'; }
elseif (strpos($useragent, 'mediapartners-google') !== false) { $bot = 'Google Adsense'; }
elseif (strpos($useragent, 'baiduspider') !== false) { $bot = 'Baidu'; }
elseif (strpos($useragent, 'sogou spider') !== false) { $bot = 'Sogou'; }
elseif (strpos($useragent, 'sogou web') !== false) { $bot = 'Sogou web'; }
elseif (strpos($useragent, 'sosospider') !== false) { $bot = 'SOSO'; }
elseif (strpos($useragent, '360spider') !== false) { $bot = '360Spider'; }
elseif (strpos($useragent, 'yahoo') !== false) { $bot = 'Yahoo'; }
elseif (strpos($useragent, 'msnbot') !== false) { $bot = 'msnbot'; }
elseif (strpos($useragent, 'msn') !== false) { $bot = 'MSN'; }
elseif (strpos($useragent, 'sohu') !== false) { $bot = 'Sohu'; }
elseif (strpos($useragent, 'yodaobot') !== false) { $bot = 'Yodao'; }
elseif (strpos($useragent, 'twiceler') !== false) { $bot = 'Twiceler'; }
elseif (strpos($useragent, 'ia_archiver') !== false) { $bot = 'Alexa_'; }
elseif (strpos($useragent, 'iaarchiver') !== false) { $bot = 'Alexa'; }
elseif (strpos($useragent, 'slurp') !== false) { $bot = 'Yahoo Slurp'; }
elseif (strpos($useragent, 'bot') !== false) { $bot = 'Other spider'; }

// Append one tab-separated line per spider hit: time, IP, spider label, URL.
if (isset($bot)) {
    $fp = @fopen('bot.txt', 'a');
    if ($fp) {
        fwrite($fp, date('Y-m-d H:i:s') . "\t" . $_SERVER['REMOTE_ADDR'] . "\t" . $bot . "\t" . 'http://' . $_SERVER['SERVER_NAME'] . $_SERVER['REQUEST_URI'] . "\r\n");
        fclose($fp);
    }
}
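The logger only appends lines, so the "statistics" part still has to be derived from the file. A minimal sketch of one way to tally hits per spider follows. It assumes the log is tab-separated with one hit per line (time, IP, spider label, URL); the function name `count_bot_hits()` is hypothetical and not from the original article.

```php
<?php
// Hypothetical helper: count logged hits per spider label.
// Assumes each line is "time<TAB>ip<TAB>label<TAB>url".
function count_bot_hits(string $path): array {
    $counts = array();
    $lines = @file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return $counts; // missing or unreadable log
    }
    foreach ($lines as $line) {
        // Drop a trailing carriage return left over from "\r\n" line endings.
        $fields = explode("\t", rtrim($line, "\r"));
        if (count($fields) < 4) {
            continue; // skip malformed lines
        }
        $bot = $fields[2];
        $counts[$bot] = isset($counts[$bot]) ? $counts[$bot] + 1 : 1;
    }
    arsort($counts); // most frequent spider first
    return $counts;
}

// Print one "label<TAB>hits" line per spider found in bot.txt.
foreach (count_bot_hits('bot.txt') as $bot => $hits) {
    echo $bot, "\t", $hits, "\n";
}
```

Reading the whole file with `file()` is fine for a small log; for a large one, iterating with `fgets()` would avoid loading every line into memory at once.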

3. Third method:

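We can use HTTP_USER_AGENT to decide whether a visitor is a spider. Each search engine's spider carries its own distinctive identifier; some of them are listed below.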

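function is_crawler() {
    // Treat a missing User-Agent header as an empty string.
    $userAgent = strtolower(isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '');
    $spiders = array(
        'Googlebot',    // Google crawler
        'Baiduspider',  // Baidu crawler
        'Yahoo! Slurp', // Yahoo crawler
        'YodaoBot',     // Youdao crawler
        'msnbot'        // Bing crawler
        // more crawler keywords
    );
    foreach ($spiders as $spider) {
        // Compare in lowercase on both sides so the match is case-insensitive.
        $spider = strtolower($spider);
        if (strpos($userAgent, $spider) !== false) {
            return true;
        }
    }
    return false;
}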
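Because `is_crawler()` reads `$_SERVER` directly, it is awkward to try out from the command line. A minimal sketch of a variant that takes the user agent as an argument and returns the matched keyword is shown below. `match_spider()` and the sample user-agent strings are hypothetical, introduced only for illustration.

```php
<?php
// Hypothetical variant of is_crawler(): returns the matched keyword, or null if none matches.
function match_spider(string $userAgent, array $spiders): ?string {
    foreach ($spiders as $spider) {
        // stripos() is case-insensitive, so neither side needs strtolower().
        if (stripos($userAgent, $spider) !== false) {
            return $spider;
        }
    }
    return null;
}

// Same keyword list as the article's third method.
$spiders = array('Googlebot', 'Baiduspider', 'Yahoo! Slurp', 'YodaoBot', 'msnbot');

// Sample user agents (illustrative assumptions).
$ua = 'Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)';
var_dump(match_spider($ua, $spiders));          // the matched keyword
var_dump(match_spider('curl/8.4.0', $spiders)); // no keyword matches
```

Returning the keyword instead of a boolean lets the caller both test for a spider (`!== null`) and record which one it was.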