Hello, a question about the Sina Weibo crawler #4

Open
myron520 opened this issue Feb 10, 2020 · 3 comments
Comments

@myron520

Hello, when I run the Create_all.py file from your Sina Weibo crawler, I get ModuleNotFoundError: No module named 'sqlalchemy'. How can I fix this? Thanks!

@Eternity666666

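Hello, I downloaded the code and ran it, but when there are multiple user ids, the following error appears after the first user's posts have been crawled:
Traceback (most recent call last):
  File "D:/code/Spider-master/weibo/sina_spider.py", line 223, in <module>
    main(use_proxies=False)  # do not use proxy IPs by default
  File "D:/code/Spider-master/weibo/sina_spider.py", line 216, in main
    getmain(resmain, uid, wb_data, conn, mainurl, user_agents, cookies,conf,use_proxies)
  File "D:/code/Spider-master/weibo/sina_spider.py", line 148, in getmain
    pagenums=pages[0]
IndexError: list index out of range
Why does this error occur?
I searched online and found two possible causes: the index is out of bounds, or the list is empty. But the first user is crawled successfully, so why would the list be empty? Is something causing the page fetch to fail? The crawl interval is not short, and I also set the Connection header to close so that it would not fail from too many open connections. How can I solve this? Thanks!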

@starFalll
Owner

@hoho-yin This is most likely because the dependencies are not installed. Check whether you have run pip3 install -r requirements.txt.
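To confirm whether the dependency is visible to the interpreter that runs Create_all.py, a small check like the following may help. This is only a sketch: the helper name `missing_modules` is hypothetical, and the list contains just `sqlalchemy` because that is the module named in the error; the full set of requirements is whatever requirements.txt lists.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    # find_spec returns None when a top-level module is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "sqlalchemy" is the module from the reported ModuleNotFoundError;
    # add other names as needed (assumption: top-level names only)
    missing = missing_modules(["sqlalchemy"])
    if missing:
        print("Missing modules:", ", ".join(missing))
        print("Try: pip3 install -r requirements.txt")
    else:
        print("All listed modules are importable.")
```

If the module is reported missing even after installation, pip3 and the interpreter running the script may belong to different Python environments.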

@starFalll
Owner

starFalll commented Mar 29, 2020

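@Eternity666666 The second user's page probably has no page_number attribute, so the extracted list was empty and indexing it went out of range. This is now fixed: if page_number is missing here, the code reports an error and skips that user.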
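The kind of guard described in this reply can be sketched as follows. This is not the actual code in sina_spider.py: the function names, the `extracted` mapping, and the assumption that the first list element is a number or numeric string are all hypothetical, used only to illustrate checking for an empty list before indexing it.

```python
def parse_page_count(pages):
    """Return the page count from an extracted list, or None if it is empty."""
    if not pages:
        # Nothing was extracted for page_number, so pages[0] would raise IndexError
        return None
    return int(pages[0])


def crawl_users(extracted):
    """Collect page counts per uid, skipping users with no page_number.

    `extracted` maps uid -> list extracted for page_number (hypothetical shape).
    """
    results = {}
    for uid, pages in extracted.items():
        pagenums = parse_page_count(pages)
        if pagenums is None:
            print(f"uid {uid}: no page_number found, skipping")
            continue
        results[uid] = pagenums
    return results


if __name__ == "__main__":
    # The second user has an empty list and is skipped instead of crashing the run
    print(crawl_users({"1001": ["12"], "1002": []}))
```

Skipping the user keeps a multi-user run going, at the cost of silently missing that user's posts unless the printed message is checked.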

Repository owner deleted a comment Feb 2, 2024