Crawler Example: A Targeted Price-Comparison Crawler for Taobao Products

Time: 2021-02-28 09:00:45

Functional description

Goal: fetch Taobao's search result pages and extract the name and price of each product listed on them.

Key points: work out Taobao's search interface and how it handles pagination.
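The pagination idea can be sketched as follows. The article's code selects each result page with an `s` offset that grows by 44 per page. The base URL here is a placeholder (the host does not appear in the source), and `build_page_urls` is a hypothetical helper name, so treat this as an illustration rather than the site's actual interface.

```python
from urllib.parse import urlencode

# Placeholder base URL: the article omits the host, so this value is an assumption.
BASE_URL = "https://example.com/search"
# Offset step taken from the article's code (44 * i).
OFFSET_STEP = 44


def build_page_urls(keyword, depth, base_url=BASE_URL):
    """Return one search URL per result page (hypothetical helper)."""
    urls = []
    for page in range(depth):
        # urlencode percent-encodes non-ASCII keywords such as '书包'.
        query = urlencode({"q": keyword, "s": page * OFFSET_STEP})
        urls.append(f"{base_url}?{query}")
    return urls


if __name__ == "__main__":
    for url in build_page_urls("书包", 2):
        print(url)
```

Building the query with `urlencode` instead of string concatenation keeps the keyword correctly escaped when it contains spaces or non-ASCII characters.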

Technical approach: requests + re
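To illustrate the `re` half of this approach without touching the network, here is a minimal sketch that pulls prices and titles out of a string. The sample text is made up to resemble the `view_price` and `raw_title` fields that the article's regular expressions target; `extract_fields` is a hypothetical name.

```python
import re

# Made-up sample shaped like the fields the article's regexes look for.
SAMPLE = (
    '{"raw_title":"Canvas backpack","view_price":"59.00"},'
    '{"raw_title":"School bag","view_price":"128.50"}'
)

# Capture groups return only the value, without the key or the quotes.
PRICE_RE = re.compile(r'"view_price":"([\d.]*)"')
TITLE_RE = re.compile(r'"raw_title":"(.*?)"')


def extract_fields(html):
    """Return a list of (price, title) tuples found in html."""
    prices = PRICE_RE.findall(html)
    titles = TITLE_RE.findall(html)
    # zip stops at the shorter list, so a missing field cannot raise IndexError.
    return list(zip(prices, titles))


if __name__ == "__main__":
    for price, title in extract_fields(SAMPLE):
        print(price, title)
```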

Note: checking the relevant crawler protocol shows that Taobao does not allow any crawler to fetch these pages.
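Assuming the protocol in question is the site's robots.txt, a check like this can be done with the standard library. The rules below are a hypothetical robots.txt that disallows everything, not a copy of Taobao's real file, and `is_allowed` is an illustrative helper.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content that blocks every crawler from every path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed(ROBOTS_TXT, "Mozilla/5.0", "https://example.com/search?q=bag"))
```

For a live site, `RobotFileParser.set_url()` followed by `read()` downloads the real file instead of parsing a local string.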

Program design

Crawling Taobao's pages only works if the crawler simulates a Taobao login.
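The article does not say how the login is simulated. One common approach, shown here purely as an assumption, is to copy the cookie string from a browser session that is already logged in and send it alongside the user-agent header. `build_headers` is a hypothetical helper and the cookie value is a placeholder.

```python
def build_headers(cookie, user_agent="Mozilla/5.0"):
    """Return request headers that carry a logged-in session cookie.

    cookie is assumed to be copied by hand from the browser's developer
    tools; this function does not log in by itself.
    """
    if not cookie:
        raise ValueError("a cookie string from a logged-in session is required")
    return {"user-agent": user_agent, "cookie": cookie}


if __name__ == "__main__":
    # Placeholder value, not a real session cookie.
    headers = build_headers("name=value")
    print(headers)
```

The resulting dictionary would be passed as the `headers=` argument of `requests.get`, in the same position as the `kv` dictionary in the article's code.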

Code:

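import re

import requests

# Request header that makes the crawler look like a browser.
kv = {'user-agent': 'Mozilla/5.0'}


def getHTMLText(url):
    """Fetch a page and return its text, or "" if the request fails."""
    try:
        r = requests.get(url, timeout=30, headers=kv)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        return ""


def parsePage(ilt, html):
    """Append every [price, title] pair found in html to ilt."""
    try:
        plt = re.findall(r'"view_price":"[\d.]*"', html)
        tlt = re.findall(r'"raw_title":".*?"', html)
        for i in range(len(plt)):
            # Split on the first colon only, so a title containing ':' stays whole.
            # eval strips the surrounding quotes from the matched value.
            price = eval(plt[i].split(':', 1)[1])
            title = eval(tlt[i].split(':', 1)[1])
            ilt.append([price, title])
    except Exception:
        print("")


def printGoodsList(ilt):
    """Print the collected products as a numbered table."""
    tplt = "{:4}\t{:8}\t{:16}"
    print(tplt.format("No.", "Price", "Product name"))
    count = 0
    for g in ilt:
        count += 1
        print(tplt.format(count, g[0], g[1]))


def main():
    goods = '书包'  # search keyword ("school bag")
    depth = 2  # number of result pages to crawl
    # The host part of the URL is missing in the source; only the path remains.
    start_url = '/search?q=' + goods
    infolist = []
    for i in range(depth):
        try:
            # The offset parameter s advances by 44 for each page.
            url = start_url + '&s=' + str(44 * i)
            html = getHTMLText(url)
            parsePage(infolist, html)
        except Exception:
            continue
    printGoodsList(infolist)


main()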
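`parsePage` relies on `eval` to strip the quotes from each matched value, which executes whatever text the page returns. A possible alternative, offered as a sketch and not as the author's method, is to decode the fragment with `json.loads`, since each match has the shape of a JSON `"key":"value"` pair. `decode_field` is a hypothetical helper name.

```python
import json


def decode_field(fragment):
    """Turn a '"key":"value"' fragment into a (key, value) tuple without eval."""
    # Split on the first colon only, so values that contain ':' stay intact.
    key, value = fragment.split(":", 1)
    # json.loads removes the quotes and decodes escapes such as \uXXXX.
    return json.loads(key), json.loads(value)


if __name__ == "__main__":
    print(decode_field('"view_price":"59.00"'))
    print(decode_field('"raw_title":"Bag: large"'))
```

Unlike `eval`, `json.loads` raises a `ValueError` subclass on anything that is not valid JSON, so unexpected page content cannot run as code.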
