
Scraping Neihan Duanzi with Python

Time: 2019-12-12 09:54:08

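#coding:utf-8
# Python 2 script (urllib2, print statement, raw_input).
import urllib2
import re


class Spider:
    def __init__(self):
        self.page = 1        # number of the next list page to fetch
        self.switch = True   # keep crawling while this is True

    def loadPage(self):
        """Download one list page and extract the joke blocks from it."""
        print "Downloading data...."
        # The scheme and host are missing from the source; only the path survives.
        url = "/article/list_5_" + str(self.page) + ".html"
        headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"}
        request = urllib2.Request(url, headers=headers)
        response = urllib2.urlopen(request)
        html = response.read()
        # Decode the page as gb2312 and re-encode it as UTF-8.
        html = unicode(html, "gb2312").encode("utf8")
        # re.S lets "." match newlines, so a block may span several lines.
        pattern = re.compile(r'<div\sclass="f18 mb20">(.*?)</div>', re.S)
        content_list = pattern.findall(html)
        self.dealPage(content_list)

    def dealPage(self, content_list):
        """Strip all whitespace from each item, then write it out."""
        p = re.compile(r'\s+')
        for item in content_list:
            item = p.sub('', item) + "\n\r"
            self.writePage(item)

    def writePage(self, item):
        """Append one item to duanzi.txt."""
        print "Writing data..."
        with open("duanzi.txt", "a") as f:
            f.write(item)

    def startWork(self):
        """Fetch one page per iteration until the user types quit."""
        while self.switch:
            self.loadPage()
            command = raw_input("Press Enter to keep crawling (type quit to exit)")
            if command == 'quit':
                self.switch = False
            self.page += 1
        print "Thanks for using"


if __name__ == '__main__':
    duanziSpider = Spider()
    duanziSpider.startWork()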
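
The extraction step of the script above can be tried without any network access. The following is a minimal Python 3 sketch, not part of the original script: the function name extract_items and the sample HTML are hypothetical and invented for illustration, while the two regular expressions are the ones the script uses.

```python
import re

# The same two patterns the script uses, compiled once at module level.
ITEM_PATTERN = re.compile(r'<div\sclass="f18 mb20">(.*?)</div>', re.S)
WHITESPACE = re.compile(r'\s+')


def extract_items(html):
    """Return the body of every matching <div>, with all whitespace removed.

    extract_items is a hypothetical helper name, not part of the original.
    """
    return [WHITESPACE.sub('', body) for body in ITEM_PATTERN.findall(html)]


# Hypothetical sample page, invented for illustration only.
sample_html = '''
<div class="f18 mb20">
    first   joke
</div>
<div class="other">ignored</div>
<div class="f18 mb20">second
joke</div>
'''

items = extract_items(sample_html)
print(items)
```

Note that the whitespace pattern deletes every run of whitespace instead of collapsing it. That suits Chinese text, where words are not separated by spaces, but it joins words in space-delimited text such as the English sample here.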
