Local dev #27

Open: wants to merge 6 commits into base: master



scotmatson

Made changes that fix an error caused by mixing tabs with spaces. The biggest change, though, adds support for both Python 2 and Python 3. This included dropping the unicode encoding in html_parser, adding an updated form of the StringIO module, and adding the parentheses that were missing from various print statements.
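For reference, a minimal sketch of the kind of Python 2/3 compatibility pattern described above. It is not the exact code in this pull request: the try/except import is one common way to get a working `StringIO` on both versions, and `read_all` is a hypothetical helper used only for illustration.

```python
# Compatibility sketch: intended to run unchanged on Python 2.7 and Python 3.
from __future__ import print_function  # makes print() a function on Python 2

try:
    # Python 2: StringIO lives in its own module
    from StringIO import StringIO
except ImportError:
    # Python 3: the StringIO module is gone; the class is provided by io
    from io import StringIO


def read_all(text):
    """Hypothetical helper: wrap text in a file-like object and read it back."""
    buf = StringIO(text)
    return buf.read()


# Parenthesized print works on both versions thanks to the __future__ import.
print(read_all("example.com"))
```

With this shape, the rest of the module can use `StringIO` and `print(...)` without checking the interpreter version.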

I would like to note that I've only been testing the HTML parsing with this pull request. Nothing else should be affected, but I am still learning how this utility is built. I've been testing HTML pages from malware-traffic-analysis.net, which pull in many false positives (FPs), something I plan on playing with in the future.

Finally, I made PyPDF2 the default PDF parser, as it supports both Python 2 and Python 3 where pdfminer does not.
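One way a default could be picked without hard-coding a single library is to probe for whichever parser is importable. The sketch below is an assumption for illustration, not what this pull request does: `pick_pdf_backend` is a hypothetical function, and it only returns a module name rather than calling either library's API.

```python
import importlib


def pick_pdf_backend(candidates=("PyPDF2", "pdfminer")):
    """Hypothetical helper: return the first importable module name, or None.

    Candidates are tried in order, so the first entry acts as the default.
    """
    for name in candidates:
        try:
            importlib.import_module(name)
        except (ImportError, SyntaxError):
            # ImportError: the package is not installed.
            # SyntaxError: a Python 2-only package imported under Python 3.
            continue
        return name
    return None
```

Calling `pick_pdf_backend()` prefers PyPDF2 and falls back to pdfminer; the caller still has to handle a `None` result when neither is installed.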

@scotmatson
Author

I was reading the issue response regarding PyPDF2 vs. pdfminer. I understand the reason for sticking with pdfminer, but feel it would be worthwhile to implement a solution that addresses the Python 3 problems out of the box as well.

@@ -35,12 +35,16 @@
#
###################################################################################################

#from __future__ import unicode_literals

Remnant of a change that was not kept. This line no longer serves any purpose and should not be merged.

