Add another parsing method #53
Comments
What do you propose, that they distribute BeautifulSoup with Qbit? |
@hannsen Yes, I was thinking of something like that. I don't think it will require too much effort, and it may help a lot of people. What do you think? |
I think beautifulsoup4 is perfect for this, but we have to include the package in the qbittorrent repository so the user doesn't have to install external packages. I can do it but I think @sledgehammer999 will oppose... |
I understand, never mind. Mine was just a suggestion |
Could you please share a couple of sites with this issue? nindogo |
I was thinking about all the sites whose content is inside a table. It
would be easy to find all tr elements and then iterate over all td elements
inside each tr, instead of parsing the whole page with flags. My request was
intended as a suggestion, in case it could be done easily. I don't want to
give you hard work.
Maurizio Ricci
On Sat, Dec 1, 2018 at 20:26, Ni Ndogo <[email protected]> wrote:
Could you please share a couple of sites with this issue?
nindogo
|
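The table case described above (find every tr, then iterate over the td cells inside it) can be sketched with only the standard library. This is a minimal illustration, not code from any existing plugin: `TableRowParser`, `parse_rows`, and the sample HTML are hypothetical names and data made up for the example.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the <tr> being read, or None
        self._cell = None  # text chunks of the <td> being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <td>.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop a cell left open by sloppy markup


def parse_rows(html):
    """Return a list of rows, each row being a list of cell texts."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample page, invented for this sketch.
SAMPLE = """
<table>
  <tr><td>Ubuntu 18.04</td><td>1.9 GB</td><td>120</td></tr>
  <tr><td>Debian 9</td><td>3.4 GB</td><td>45</td></tr>
</table>
"""

for name, size, seeds in parse_rows(SAMPLE):
    print(name, size, seeds)
```

Even for this simple case the parser has to carry two pieces of state (`_row` and `_cell`), which is the bookkeeping the suggestion is trying to avoid.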
Hi, I have found that for many of those, the re module can usually help. |
Yeah, I have used regex a lot, too. It's faster than parsing but not very readable; then again, neither is the standard HTML parser. |
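For comparison, the regex approach mentioned in the two comments above might look roughly like this. It is a sketch under assumptions: the patterns, the `regex_rows` helper, and the sample HTML are hypothetical, and regexes like these only hold up on simple, regular markup (no nested tables, no `>` inside attribute values).

```python
import re

# Non-greedy matches for one row and one cell; \b keeps <tr> from matching <track>.
ROW_RE = re.compile(r"<tr\b[^>]*>(.*?)</tr>", re.DOTALL | re.IGNORECASE)
CELL_RE = re.compile(r"<td\b[^>]*>(.*?)</td>", re.DOTALL | re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")  # any remaining tag inside a cell


def regex_rows(html):
    """Return a list of rows, each row being a list of cell texts."""
    rows = []
    for row_html in ROW_RE.findall(html):
        cells = [TAG_RE.sub("", cell).strip() for cell in CELL_RE.findall(row_html)]
        if cells:  # header rows made only of <th> yield no cells and are skipped
            rows.append(cells)
    return rows


# Hypothetical sample page, invented for this sketch.
SAMPLE = """
<table>
<tr><th>Name</th><th>Size</th></tr>
<tr class="odd"><td><a href="/t/1">Ubuntu 18.04</a></td><td>1.9 GB</td></tr>
<tr><td><a href="/t/2">Debian 9</a></td><td>3.4 GB</td></tr>
</table>
"""

for name, size in regex_rows(SAMPLE):
    print(name, size)
```

This is short, but the patterns encode assumptions about the page layout that are easy to miss when reading them later, which matches the readability concern raised above.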
Right now I am trying to write a wrapper module for HTMLParser: https://github.com/imDMG/HTMLSelector |
I like your module. It looks like pyquery or similar, plus it's based on HTMLParser, which is a standard module. |
Often the desired information on a website is grouped under a class name. For example, on some sites the list of torrents is a list of div elements with a particular class. So in some cases it would be better to find elements by ID, class name, or type instead of parsing the page with the standard HTML parser and using various flags or variables to remember the state during parsing.
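To make the state-tracking problem concrete, here is a minimal sketch of what selecting div elements by class looks like with the standard `html.parser` module alone. The names `ClassTextParser` and `find_by_class`, the class name `torrent`, and the sample HTML are all assumptions made up for this illustration.

```python
from html.parser import HTMLParser


class ClassTextParser(HTMLParser):
    """Collect the text inside every <div> that carries a given class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.results = []
        self._depth = 0    # 0 = outside a matching div, >0 = div nesting level inside one
        self._chunks = []  # text pieces of the matching div being read

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:
            # Nested div inside a match: count it so its </div> is not mistaken for the end.
            self._depth += 1
        else:
            classes = (dict(attrs).get("class") or "").split()
            if self.class_name in classes:
                self._depth = 1
                self._chunks = []

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace into single spaces.
                self.results.append(" ".join("".join(self._chunks).split()))


def find_by_class(html, class_name):
    """Return the text of each <div> whose class list contains class_name."""
    parser = ClassTextParser(class_name)
    parser.feed(html)
    parser.close()
    return parser.results


# Hypothetical sample page, invented for this sketch.
SAMPLE = """
<div id="results">
  <div class="torrent featured"><a href="/t/1">Ubuntu 18.04</a> <span>1.9 GB</span></div>
  <div class="ad">Buy now</div>
  <div class="torrent">Debian 9 <div class="meta">3.4 GB</div></div>
</div>
"""

print(find_by_class(SAMPLE, "torrent"))
```

The depth counter and the chunk buffer are exactly the kind of flags and variables described above; a selector-style library would hide them behind a single lookup call.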
The question is, what about adding another parsing method? Something like jQuery, maybe pyquery or BeautifulSoup:
https://pythonhosted.org/pyquery/
https://www.crummy.com/software/BeautifulSoup/bs4/doc/
What do you think? @Chocobo1 @sledgehammer999 @Piccirello @zeule @ngosang @hannsen