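Recently I've been working on a crawler, but the target site limits access by IP: an address that makes too many requests gets blocked. Since the site restricts by IP address, the obvious workaround is to change the IP the site sees. Once you know how to spoof an IP, per-IP limits on a website, or one-vote-per-IP polls, no longer hold you back.
Ways sites restrict by IP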
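To spoof an IP, you first need to understand how a site obtains the client's IP and enforces limits on it.
Taking PHP as an example, there are three main ways to read the client IP: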
$_SERVER["REMOTE_ADDR"];
$_SERVER["HTTP_CLIENT_IP"];
$_SERVER["HTTP_X_FORWARDED_FOR"];
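REMOTE_ADDR: the address of the machine that opened the TCP connection to the server. With no proxy this is the client's IP; behind a proxy it is the proxy's IP.
HTTP_CLIENT_IP: taken from the Client-IP request header, which some transparent proxies add to pass on the client's IP.
HTTP_X_FORWARDED_FOR: taken from the X-Forwarded-For request header. When a request passes through several proxies it holds several comma-separated IPs; since it is just a header, it can also hold an arbitrary, made-up address.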
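The last two are ordinary request headers that the client controls, so a site that reads them without checking where they came from can be fooled simply by sending them, and many sites do exactly that. REMOTE_ADDR cannot be forged with headers: changing it means actually sending the request from a different address, for example through a proxy.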
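To make the trust problem concrete, here is a minimal Python sketch of the lookup order many PHP snippets use: HTTP_CLIENT_IP first, then HTTP_X_FORWARDED_FOR, then REMOTE_ADDR. The function names and the plain dict standing in for `$_SERVER` are assumptions for illustration; real sites vary. It shows why header spoofing works against this pattern and does nothing against a site that reads only REMOTE_ADDR.

```python
def resolve_client_ip(server: dict) -> str:
    """Hypothetical mirror of a common PHP pattern for picking the client IP.

    `server` stands in for PHP's $_SERVER array.
    """
    # Client-IP header: added by some transparent proxies, but any client can send it.
    client_ip = server.get("HTTP_CLIENT_IP")
    if client_ip:
        return client_ip.strip()

    # X-Forwarded-For: "client, proxy1, proxy2". The left-most entry is the
    # claimed original client, and it is just as client-controlled.
    forwarded = server.get("HTTP_X_FORWARDED_FOR")
    if forwarded:
        return forwarded.split(",")[0].strip()

    # TCP peer address: request headers cannot change it.
    return server["REMOTE_ADDR"]


def resolve_remote_only(server: dict) -> str:
    """Hypothetical stricter variant that trusts only the TCP peer address."""
    return server["REMOTE_ADDR"]


if __name__ == "__main__":
    spoofed = {
        "REMOTE_ADDR": "203.0.113.7",                 # the crawler's real address
        "HTTP_X_FORWARDED_FOR": "1.2.3.4, 10.0.0.1",  # header sent by the crawler
    }
    print(resolve_client_ip(spoofed))    # 1.2.3.4
    print(resolve_remote_only(spoofed))  # 203.0.113.7
```

The first function is fooled by the header; the second is not, which is exactly the case where only a proxy helps.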
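Spoofing the client IP in PHP
Here we use the PHP scraping class Snoopy as an example, because it makes it easy to set request headers, which is all that header-based IP spoofing needs.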
include "Snoopy.class.php";
$snoopy = new Snoopy;
// Send CLIENT-IP and X-Forwarded-For; PHP exposes them as
// $_SERVER["HTTP_CLIENT_IP"] and $_SERVER["HTTP_X_FORWARDED_FOR"]
$snoopy->rawheaders["CLIENT-IP"] = "1.2.3.4";       // spoofed IP
$snoopy->rawheaders["X-Forwarded-For"] = "1.2.3.4"; // spoofed IP
$snoopy->fetch("/");
print $snoopy->results;
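Why the header name matters: PHP (following the CGI convention in RFC 3875) builds each `$_SERVER` key from the header name by upper-casing it, replacing `-` with `_`, and prefixing `HTTP_`. The Python helper below is a hypothetical approximation of that rule, included only to show the mapping.

```python
def php_server_key(header_name: str) -> str:
    """Approximate the $_SERVER key PHP uses for an HTTP request header.

    CGI convention: upper-case the name, replace '-' with '_', prefix 'HTTP_'.
    """
    return "HTTP_" + header_name.upper().replace("-", "_")


if __name__ == "__main__":
    for name in ("CLIENT-IP", "X-Forwarded-For", "HTTP_X_FORWARDED_FOR"):
        print(f"{name} -> $_SERVER[{php_server_key(name)!r}]")
```

A header literally named `HTTP_X_FORWARDED_FOR` would therefore surface as `HTTP_HTTP_X_FORWARDED_FOR`, or be dropped before reaching PHP (nginx, for instance, ignores header names containing underscores by default), so the raw header should be named `X-Forwarded-For`.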
Besides the IP, you can set other request headers as well.
// Cookies
$snoopy->cookies["cookie"] = "value";
$snoopy->cookies["SessionID"] = "238472834723489"; // fixed session ID (quoted: PHP has no "l" integer suffix)
// Browser identity
$snoopy->agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36"; // spoofed browser
$snoopy->referer = "/"; // spoofed referring page (HTTP_REFERER)
$snoopy->rawheaders["Pragma"] = "no-cache"; // ask caches not to serve a stored copy
$snoopy->rawheaders["Accept-Language"] = "zh-cn"; // preferred page language
$snoopy->rawheaders["Content-Type"] = "text/html; charset=utf-8"; // describes the request body; does not change the response encoding
// HTTP Basic authentication credentials
$snoopy->user = "username";
$snoopy->pass = "password";
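This way you can spoof not only the client IP but also the browser, the referring page, and other details. If the site ties its CAPTCHA to the session, pinning the SessionID cookie to a fixed value may also make every request receive the same CAPTCHA.
Spoofing in Python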
import requests

headers = {
    "Accept-Language": "zh-CN,zh;q=0.9",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36",
    "X-Forwarded-For": "1.2.3.4",  # spoofed IP
    "Referer": "/",                # spoofed referring page
}
# "/" stands in for the target URL; requests needs an absolute URL such as http://host/path
base_req = requests.get(url="/", headers=headers)
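If `requests` is not installed, roughly the same request can be built with the standard library's `urllib.request`. This is a minimal sketch: `build_request` and the example.com URL are placeholders, not from the original, and adding `Client-IP` alongside `X-Forwarded-For` is an assumption that mirrors the PHP version. The request is only constructed, not sent.

```python
import urllib.request


def build_request(url: str, fake_ip: str) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying spoofed IP headers.

    `build_request` is a hypothetical helper for illustration.
    """
    headers = {
        "Accept-Language": "zh-CN,zh;q=0.9",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36",
        "X-Forwarded-For": fake_ip,  # read by PHP as HTTP_X_FORWARDED_FOR
        "Client-IP": fake_ip,        # read by PHP as HTTP_CLIENT_IP
    }
    return urllib.request.Request(url, headers=headers, method="GET")


if __name__ == "__main__":
    req = build_request("http://example.com/", "1.2.3.4")  # placeholder URL
    for name, value in req.header_items():
        print(f"{name}: {value}")
    # To actually send it: urllib.request.urlopen(req, timeout=10)
```

urllib stores header names via `str.capitalize()` (e.g. `X-forwarded-for`); HTTP header names are case-insensitive, so the server still reads it as X-Forwarded-For.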
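The principle is the same, only the syntax differs; the key is to set the header that carries the spoofed IP.
Spoofing with a proxy
When the header tricks above don't work (typically because the site only trusts REMOTE_ADDR), a proxy is the only option left. One recommended overseas proxy list: https://free-proxy-/.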
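A proxy changes what the server sees in REMOTE_ADDR, because the TCP connection now comes from the proxy. A minimal stdlib sketch using `urllib.request.ProxyHandler`; `build_proxy_opener` is a hypothetical helper, and `my.proxy.host:8080` is the placeholder from the Snoopy example. Keep in mind that a transparent proxy may pass your real IP along in Client-IP or X-Forwarded-For, as described earlier.

```python
import urllib.request


def build_proxy_opener(proxy_host: str, proxy_port: int) -> urllib.request.OpenerDirector:
    """Return an opener that sends http and https traffic through one proxy.

    `build_proxy_opener` is a hypothetical helper; host and port are placeholders.
    """
    proxy_url = f"http://{proxy_host}:{proxy_port}"
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    # Passing our own ProxyHandler replaces the default environment-based one.
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    opener = build_proxy_opener("my.proxy.host", 8080)
    # Sending needs a reachable proxy, so the call is left commented out:
    # print(opener.open("http://example.com/", timeout=10).read()[:200])
```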
Full Snoopy example
include "Snoopy.class.php";
$snoopy = new Snoopy;
// Cookies
$snoopy->cookies["cookie"] = "value";
$snoopy->cookies["SessionID"] = "238472834723489"; // fixed session ID
// Browser identity
$snoopy->agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36"; // spoofed browser
$snoopy->referer = "/"; // spoofed referring page (HTTP_REFERER)
$snoopy->rawheaders["Pragma"] = "no-cache"; // ask caches not to serve a stored copy
$snoopy->rawheaders["Accept-Language"] = "zh-cn"; // preferred page language
$snoopy->rawheaders["Content-Type"] = "text/html; charset=utf-8"; // describes the request body, not the response encoding
// Spoofed IP headers, read by PHP as HTTP_CLIENT_IP and HTTP_X_FORWARDED_FOR
$snoopy->rawheaders["CLIENT-IP"] = "1.2.3.4";
$snoopy->rawheaders["X-Forwarded-For"] = "1.2.3.4";
// Route the request through a proxy (this is what changes REMOTE_ADDR)
$snoopy->proxy_host = "my.proxy.host";
$snoopy->proxy_port = 8080;
// HTTP Basic authentication credentials
$snoopy->user = "username";
$snoopy->pass = "password";
$snoopy->fetch("/");
print $snoopy->results;