A simple Taobao review scraper with Python requests

Time: 2022-03-29 01:16:59

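Review data for Taobao items is well worth studying; for example, it could serve as training material for an RNN. With Python's requests library we can scrape the reviews directly, with no other framework and no browser required.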

1. Start from the Taobao item page we want to scrape. In the address bar, the value shown after id=************* is the item's id in the database.

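2. Find the real URL that serves the reviews. For the item above it is /feedRateList.htm?auctionNumId=553063221972&currentPageNum=1. This URL carries both the item's ID and the current page number of the reviews.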

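3. Use the requests library to fetch the real review URL. The first step in our code is to derive the review URL from the item page URL. We then request it, repeat until the last page, and finally parse the content and store the data in a pandas DataFrame.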

import re
import json
import requests
import pandas as pd

def getCommodityComments(url):
    # Read the item id from the id= query parameter (any number of digits).
    item_id = re.search(r'[?&]id=(\d+)', url).group(1)
    # The host part of the endpoint is omitted here, as in the original listing.
    url = '/feedRateList.htm?auctionNumId=' + item_id + '&currentPageNum=1'
    res = requests.get(url)
    # The response is JSON wrapped in parentheses, so strip them before parsing.
    jc = json.loads(res.text.strip().strip('()'))
    total = jc['total']
    users = []
    comments = []
    count = 0
    page = 1
    while count < total:
        # Swap the trailing page number for the current page.
        res = requests.get(url[:-1] + str(page))
        page = page + 1
        jc = json.loads(res.text.strip().strip('()'))
        jc = jc['comments']
        if not jc:
            break  # stop if a page comes back empty, to avoid looping forever
        for j in jc:
            users.append(j['user']['nick'])
            comments.append(j['content'])
            # print(count + 1, '>>', users[count], '\n ', comments[count])
            count = count + 1
    # One row per review, numbered from 1.
    comment_dic = {'count': list(range(1, count + 1)), 'user': users, 'comments': comments}
    return pd.DataFrame(comment_dic)

getCommodityComments('/item.htm?spm=a21bo.7929913.198967.23.5b274174WTT4T8&id=553063221972')
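The three steps above (read the id from the item URL, build the review URL, decode the wrapped JSON) can also be sketched as small helpers that are easy to check without touching the network. This is only a sketch under assumptions: the function names `extract_item_id`, `build_comment_url` and `parse_jsonp` are made up for illustration, and the `endpoint` argument stands in for the review URL prefix, whose host is not given in this article.

```python
import json
import re
from urllib.parse import parse_qs, urlencode, urlparse


def extract_item_id(item_url):
    """Return the value of the id= query parameter, whatever its length."""
    query = parse_qs(urlparse(item_url).query)
    ids = query.get("id")
    if not ids:
        raise ValueError("no id= parameter in URL: " + item_url)
    return ids[0]


def build_comment_url(endpoint, item_id, page):
    """Append the two query parameters from step 2 to `endpoint`.

    `endpoint` is a hypothetical argument: pass the full review URL prefix.
    """
    params = urlencode({"auctionNumId": item_id, "currentPageNum": page})
    return endpoint + "?" + params


def parse_jsonp(text):
    """Strip an optional wrapper such as `cb({...})` or `({...})`, then decode."""
    match = re.search(r"\((.*)\)\s*$", text.strip(), re.DOTALL)
    payload = match.group(1) if match else text
    return json.loads(payload)
```

For the item used in this article, `extract_item_id` returns `'553063221972'`, and `build_comment_url('/feedRateList.htm', '553063221972', 1)` reproduces the URL from step 2. Parsing the query string avoids counting characters after `id=`, so ids of any length are handled the same way.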

As for the output: Taobao has in fact already done some filtering for us, and the system-default reviews have all sunk to the bottom...
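If the reviews are meant as training text, the default entries at the bottom may be worth dropping before use. Below is a minimal sketch, assuming the default reviews can be recognised by a fixed marker string. The markers listed here are assumptions and should be checked against what the endpoint actually returns; `drop_default_reviews` is a hypothetical helper, not part of the code above.

```python
# Assumed placeholder texts for auto-generated reviews; verify against real data.
DEFAULT_REVIEW_MARKERS = ("系统默认好评", "此用户没有填写评价")


def drop_default_reviews(users, comments, markers=DEFAULT_REVIEW_MARKERS):
    """Return (users, comments) without empty or default-looking reviews."""
    kept_users, kept_comments = [], []
    for user, comment in zip(users, comments):
        text = comment.strip()
        # Skip blank reviews and any review containing a default marker.
        if not text or any(marker in text for marker in markers):
            continue
        kept_users.append(user)
        kept_comments.append(text)
    return kept_users, kept_comments
```

The two lists collected by the scraper can be passed through this function before building the DataFrame.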
