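The libraries used for web crawling differ considerably between Python 3 and Python 2. Python 3 merged urllib2 and urllib into a single package, urllib, which contains four modules: request, parse, error and robotparser. The request module provides the urlopen function. Its basic form is urlopen(url, data=None), and it also takes further optional parameters such as timeout. It returns a response object. The url argument can be either a Request object or a string, so calling urlopen with a string is equivalent to: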
import urllib.request
req = urllib.request.Request(url)
response = urllib.request.urlopen(req)
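The sketch below shows both ways of calling urlopen side by side. To keep it runnable without network access, it assumes a `data:` URL (handled by urllib.request's built-in DataHandler) in place of a real web page. The `fetch` helper and the `my-crawler/0.1` User-Agent string are hypothetical, included for illustration only.

```python
import urllib.request

def fetch(url_or_request):
    """Hypothetical helper: open a URL string or Request and return the body bytes."""
    # urlopen accepts either a plain URL string or a urllib.request.Request object
    with urllib.request.urlopen(url_or_request) as response:
        return response.read()

# A data: URL lets this run offline; substitute a real http(s) URL in practice.
url = "data:text/plain,hello"

# Form 1: pass the URL string directly
body_from_string = fetch(url)

# Form 2: build a Request first, which is where headers, data or a method go.
# The User-Agent value is a made-up example.
req = urllib.request.Request(url, headers={"User-Agent": "my-crawler/0.1"})
body_from_request = fetch(req)

print(body_from_string, body_from_request)
```

Building the Request explicitly is the form to use when a crawler needs custom headers. For a bare GET, the string form is shorter.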
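Now back to urlopen's data parameter. First build a dictionary, then convert it into the required format with urllib.parse.urlencode() (urllib.urlencode() in Python 2). In Python 3 the resulting string must also be encoded to bytes, for example with .encode('utf-8'), before it is passed as data. Supplying data turns the request into a POST.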
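The steps above can be sketched as follows. The form fields and the `http://example.com/search` endpoint are placeholder assumptions. The request is only constructed, never sent, so the example runs offline.

```python
import urllib.parse
import urllib.request

# Placeholder form fields; any dict of str -> str works here
form = {"q": "python urllib", "page": "1"}

# Step 1: dict -> "key=value&key=value" string (spaces become '+')
encoded = urllib.parse.urlencode(form)

# Step 2: Python 3 requires the request body to be bytes, not str
data = encoded.encode("utf-8")

# Step 3: attaching data makes the request a POST instead of a GET
req = urllib.request.Request("http://example.com/search", data=data)

print(encoded)
print(req.get_method())
# urllib.request.urlopen(req) would send the POST; not executed here.
```

Forgetting step 2 is a common porting mistake from Python 2. urlopen rejects a str body with a TypeError.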
Since my machine has Python 2.7 installed, I needed to convert this code, so I looked up the corresponding names online:
Python 2 name | Python 3 name |
urllib.urlretrieve() | urllib.request.urlretrieve() |
urllib.urlcleanup() | urllib.request.urlcleanup() |
urllib.quote() | urllib.parse.quote() |
urllib.quote_plus() | urllib.parse.quote_plus() |
urllib.unquote() | urllib.parse.unquote() |
urllib.unquote_plus() | urllib.parse.unquote_plus() |
urllib.urlencode() | urllib.parse.urlencode() |
urllib.pathname2url() | urllib.request.pathname2url() |
urllib.url2pathname() | urllib.request.url2pathname() |
urllib.getproxies() | urllib.request.getproxies() |
urllib.URLopener | urllib.request.URLopener |
urllib.FancyURLopener | urllib.request.FancyURLopener |
urllib.ContentTooShortError | urllib.error.ContentTooShortError |
urllib2.urlopen() | urllib.request.urlopen() |
urllib2.install_opener() | urllib.request.install_opener() |
urllib2.build_opener() | urllib.request.build_opener() |
urllib2.URLError | urllib.error.URLError |
urllib2.HTTPError | urllib.error.HTTPError |
urllib2.Request | urllib.request.Request |
urllib2.OpenerDirector | urllib.request.OpenerDirector |
urllib2.BaseHandler | urllib.request.BaseHandler |
urllib2.HTTPDefaultErrorHandler | urllib.request.HTTPDefaultErrorHandler |
urllib2.HTTPRedirectHandler | urllib.request.HTTPRedirectHandler |
urllib2.HTTPCookieProcessor | urllib.request.HTTPCookieProcessor |
urllib2.ProxyHandler | urllib.request.ProxyHandler |
urllib2.HTTPPasswordMgr | urllib.request.HTTPPasswordMgr |
urllib2.HTTPPasswordMgrWithDefaultRealm | urllib.request.HTTPPasswordMgrWithDefaultRealm |
urllib2.AbstractBasicAuthHandler | urllib.request.AbstractBasicAuthHandler |
urllib2.HTTPBasicAuthHandler | urllib.request.HTTPBasicAuthHandler |
urllib2.ProxyBasicAuthHandler | urllib.request.ProxyBasicAuthHandler |
urllib2.AbstractDigestAuthHandler | urllib.request.AbstractDigestAuthHandler |
urllib2.HTTPDigestAuthHandler | urllib.request.HTTPDigestAuthHandler |
urllib2.ProxyDigestAuthHandler | urllib.request.ProxyDigestAuthHandler |
urllib2.HTTPHandler | urllib.request.HTTPHandler |
urllib2.HTTPSHandler | urllib.request.HTTPSHandler |
urllib2.FileHandler | urllib.request.FileHandler |
urllib2.FTPHandler | urllib.request.FTPHandler |
urllib2.CacheFTPHandler | urllib.request.CacheFTPHandler |
urllib2.UnknownHandler | urllib.request.UnknownHandler |
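One common way to apply this table without maintaining two copies of a script is an import shim. The code first tries the Python 3 locations and falls back to the Python 2 names on ImportError. This is a minimal sketch covering only a few of the names above; extend the import lists as needed.

```python
try:
    # Python 3: names live in urllib.request / urllib.parse / urllib.error
    from urllib.request import urlopen, Request
    from urllib.parse import urlencode, quote
    from urllib.error import URLError, HTTPError
except ImportError:
    # Python 2.7 fallback: the same names from urllib2 and urllib
    from urllib2 import urlopen, Request, URLError, HTTPError
    from urllib import urlencode, quote

# The rest of the script uses the short names and runs on either version.
print(quote("a b/c"))
print(urlencode({"k": "v w"}))
```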