
Example: a Python 3 crawler that scrapes data and stores it in a MySQL database

2020-02-15 21:40:25
Source: reposted
Contributed by: a reader

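This article walks through an example of a Python 3 crawler that scrapes data and stores it in a MySQL database. It is shared here for reference; the details follow.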

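The target is the order list of a desktop client. At Mr. Luo's recommendation I used HttpAnalyzerStdV7 as the packet-capture tool; it works much like the F12 developer tools built into Chrome. The client has an order hall that lists brief information about every open order, and once an order is taken it disappears from the hall. My goal is to scrape each newly posted order as soon as it appears and record it in my database, zyc.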
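Because the hall only shows orders that have not been taken yet, two consecutive polls overlap, so "record only new orders" comes down to remembering which order numbers have already been seen. The sketch below shows one way to do that; it assumes `order_no` uniquely identifies an order, and `filter_new_orders` is a hypothetical helper, not part of the original code. An in-memory set is lost when the program restarts; a UNIQUE key on `order_no` in the MySQL table would persist the same check.

```python
from typing import Dict, Iterable, List, Set


def filter_new_orders(orders: Iterable[Dict[str, str]], seen: Set[str]) -> List[Dict[str, str]]:
    """Hypothetical helper: return orders whose order_no is not in `seen`.

    Assumes order_no is unique per order. New order numbers are added to
    `seen`, so duplicates within the same batch are also dropped.
    """
    new_orders = []
    for order in orders:
        order_no = order["order_no"]
        if order_no not in seen:
            seen.add(order_no)
            new_orders.append(order)
    return new_orders


if __name__ == "__main__":
    seen: Set[str] = set()
    print(filter_new_orders([{"order_no": "A1"}, {"order_no": "A2"}], seen))
    # The second poll still shows A2 (not taken yet), plus a new order A3.
    print(filter_new_orders([{"order_no": "A2"}, {"order_no": "A3"}], seen))
```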

The crawler is set to run every 10 seconds.
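A fixed 10-second cadence can be written as a plain loop. This is a sketch, not the author's scheduling code: `poll` is a hypothetical helper, `task` stands for one crawl-and-store round (for example a call to `GetResults` followed by saving the results), and the `sleep` and `clock` parameters exist only so the loop can be tested without actually waiting.

```python
import time
from typing import Callable, Optional


def poll(task: Callable[[], None], interval: float = 10.0,
         max_rounds: Optional[int] = None,
         sleep: Callable[[float], None] = time.sleep,
         clock: Callable[[], float] = time.monotonic) -> int:
    """Hypothetical helper: run `task` about every `interval` seconds.

    Runs forever when max_rounds is None; returns the number of rounds run.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        started = clock()
        try:
            task()
        except Exception as exc:  # one failed round should not stop the poller
            print(f"round {rounds} failed: {exc!r}")
        rounds += 1
        if max_rounds is None or rounds < max_rounds:
            # Sleep only for what is left of the interval, so a slow round
            # does not push every later round back.
            elapsed = clock() - started
            sleep(max(0.0, interval - elapsed))
    return rounds
```

In real use this would be `poll(crawl_once)` with a hypothetical `crawl_once` function; subtracting the elapsed time keeps the rounds close to 10 seconds apart instead of "10 seconds plus however long the requests took".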

The packet-capture tool's interface is shown in the figure:

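First comes the crawler: locate the page (API endpoint) that serves the order data, then extract the fields with regular expressions.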
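The extraction step relies on how `re.findall` behaves: a pattern with one group returns a list of strings, while a pattern with several groups (like the `game_area` pattern, split on `///`) returns a list of tuples. The sample string below is made up; only the field names come from the article's regex list. Since the endpoints return JSON, the same fields can also be read with the standard `json` module, which is usually more robust than regex because it does not depend on field order or trailing commas.

```python
import json
import re

# Invented sample shaped like the JSON the article scrapes (values are made up).
SAMPLE = ('{"id":101,"order_no":"A1001","order_title":"Rank boost",'
          '"game_area":"QQ///Server 1///Zone 3","order_price":"25.00"}')

# One capture group: findall returns a list of strings.
order_no = re.findall(r'"order_no":"(.*?)"', SAMPLE)
# Three capture groups: findall returns a list of 3-tuples, one per match.
game_area = re.findall(r'"game_area":"(.*?)///(.*?)///(.*?)"', SAMPLE)

print(order_no)   # ['A1001']
print(game_area)  # [('QQ', 'Server 1', 'Zone 3')]

# Alternative: parse the JSON directly and split the area string.
data = json.loads(SAMPLE)
area_parts = data["game_area"].split("///")
print(data["order_no"], area_parts)  # A1001 ['QQ', 'Server 1', 'Zone 3']
```

The tuple result is why the crawler treats the `game_area` pattern (index 4 in its list) specially and flattens each tuple before appending it.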

# -*- coding: utf-8 -*-
import re
import time
import datetime  # not used in this excerpt
import pymysql  # MySQL driver for Python 3 (Python 2 used MySQLdb); not used in this excerpt
import requests
from requests.adapters import HTTPAdapter


def GetResults():
    # Regexes applied to each order-detail page, one per field
    reg = [r'"id":(.*?),',
           r'"order_no":"(.*?)",',
           r'"order_title":"(.*?)",',
           r'"publish_desc":"(.*?)",',
           r'"game_area":"(.*?)///(.*?)///(.*?)",',
           r'"order_current":"(.*?)",',
           r'"order_content":"(.*?)",',
           r'"order_hours":(.*?),',
           r'"order_price":"(.*?)",',
           r'"add_price":"(.*?)",',
           r'"safe_money":"(.*?)",',
           r'"speed_money":"(.*?)",',
           r'"order_status_desc":"(.*?)",',
           r'"order_lock_desc":"(.*?)",',
           r'"cancel_type_desc":"(.*?)",',
           r'"kf_status_desc":"(.*?)",',
           r'"is_show_pwd":(.*?),',
           r'"game_pwd":"(.*?)",',
           r'"game_account":"(.*?)",',
           r'"game_actor":"(.*?)",',
           r'"left_hours":"(.*?)",',
           r'"created_at":"(.*?)",',
           r'"account_id":"(.*?)",',
           r'"mobile":"(.*?)",',
           r'"contact":"(.*?)",',
           r'"qq":"(.*?)"},']
    # Retry failed connections up to 5 times. Assigning
    # requests.adapters.DEFAULT_RETRIES after import has no effect,
    # so the retries are set on the adapters instead.
    session = requests.Session()
    session.mount('http://', HTTPAdapter(max_retries=5))
    session.mount('https://', HTTPAdapter(max_retries=5))
    # Proxy IP; requests matches proxy keys against the lowercase URL scheme
    proxy = {'http': 'http://61.135.155.82:443', 'https': 'http://61.135.155.82:443'}
    results = []
    try:
        for page in range(1, 2):  # page numbers
            # Order hall (list page), fetched with GET and decoded as UTF-8
            html = session.get(
                'https://www.dianjingbaozi.com/api/dailian/soldier/hall?access_token=3ef3abbea1f6cf16b2420eb962cf1c9a&dan_end=&dan_start=&game_id=2&kw=&order=price_desc&page=%d' % page
                + '&pagesize=30&price_end=0&price_start=0&server_code=000200000000&sign=ca19072ea0acb55a2ed2486d6ff6c5256c7a0773&timestamp=1511235791&type=public&type_id=',
                proxies=proxy)
            html = html.content.decode('utf-8')
            # Order numbers; the order-detail URL is built from the order number
            order_nos = re.findall(r'"order_no":"(.*?)","game_area"', html)
            for order_no in order_nos:
                # Order-detail page
                html_order = session.get(
                    'http://www.lpergame.com/api/dailian/order/detail?access_token=eb547a14bad97e1ee5d835b32cb83ff1&order_no=' + order_no
                    + '&sign=c9b503c0e4e8786c2945dc0dca0fabfa1ca4a870&timestamp=1511146154',
                    proxies=proxy)
                html_order = html_order.content.decode('utf-8')
                outcome_reg = []  # all fields of this order
                for i, pattern in enumerate(reg):
                    outcome = re.findall(pattern, html_order)
                    if i == 4:
                        # game_area has three groups, so findall returns tuples; flatten them
                        for groups in outcome:
                            outcome_reg.extend(groups)
                    else:
                        outcome_reg.extend(outcome)
                results.append(outcome_reg)  # result set
    except (requests.RequestException, UnicodeDecodeError):
        time.sleep(5)  # requests sometimes fail when sent too frequently
        print("Failed")
    return results
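The title promises storage in MySQL, which `GetResults` does not do by itself. Below is a minimal sketch of that step under stated assumptions: the connection comes from pymysql (already imported by the crawler), the `orders` table and its four columns are hypothetical, and picking those fields out of each `outcome_reg` list is assumed to happen beforehand. It uses a parameterized query rather than string formatting, and `INSERT IGNORE` (MySQL-specific) together with the primary key on `order_no` so an order seen on several polls is stored only once.

```python
from typing import Iterable, Sequence

# Hypothetical schema; the article's actual table layout is not shown.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS orders (
    order_no    VARCHAR(64) NOT NULL PRIMARY KEY,
    order_title VARCHAR(255),
    order_price DECIMAL(10, 2),
    created_at  DATETIME
) DEFAULT CHARSET = utf8mb4
"""

# pymysql uses the %s paramstyle; INSERT IGNORE skips rows whose
# order_no already exists instead of raising a duplicate-key error.
INSERT_SQL = (
    "INSERT IGNORE INTO orders (order_no, order_title, order_price, created_at) "
    "VALUES (%s, %s, %s, %s)"
)


def save_orders(conn, rows: Iterable[Sequence]) -> int:
    """Hypothetical helper: insert rows, return the affected-row count.

    `conn` is assumed to be a pymysql connection; each row is a tuple
    (order_no, order_title, order_price, created_at).
    """
    rows = list(rows)
    if not rows:
        return 0
    with conn.cursor() as cur:
        cur.execute(CREATE_SQL)  # no-op once the table exists
        inserted = cur.executemany(INSERT_SQL, rows)
    conn.commit()
    return inserted
```

With a real server, `conn` would come from something like `pymysql.connect(host="localhost", user="root", password="...", database="zyc", charset="utf8mb4")`, where the host, user and password are placeholders.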