Frontend support #65

Open: wants to merge 6 commits into master


johipsum commented Dec 19, 2016

Solves #62.

Because I needed it quickly, I transformed the txt files into JSON files and introduced a stopwords require function. This can be bundled with webpack, browserify, etc., and you can use it in browsers or, like me, in an AWS Lambda function (bundled with webpack). Works for me 🙂
But let me know if you have a better idea or a cleaner way to do it.
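A minimal sketch of that txt-to-JSON conversion, assuming each stopwords .txt file lists one word per line. The helper name txtToStopwordsJson is hypothetical and not taken from the PR's commits:

```javascript
// Hypothetical helper (not from the PR): convert the contents of one
// stopwords .txt file, assumed to hold one word per line, into a JSON array.
function txtToStopwordsJson(text) {
  const words = text
    .split(/\r?\n/) // accept both Unix and Windows line endings
    .map((line) => line.trim())
    .filter((line) => line.length > 0); // drop blank lines
  return JSON.stringify(words);
}

console.log(txtToStopwordsJson("a\nabout\r\n\nabove\n")); // ["a","about","above"]
```

In a real build script you would read each file with fs.readFileSync and write the result next to it as a .json file.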

johipsum mentioned this pull request Dec 19, 2016

knod commented Dec 21, 2016

Did you try bundling this with browserify? I don't think it'll work; I tried the same thing. I haven't gotten plain browserify to handle require() calls whose path is built from a variable, only ones with literal string paths, so a stopwords loader like the one I see in the commits won't work.

johipsum (Author) commented Dec 21, 2016

@knod Unfortunately, I only tested it with webpack. It works just fine there because of webpack's context feature. I just created a repository with a working example: johipsum/unfluff-browser-test.

I didn't know that browserify can't handle dynamic requires... Even if adding tons of require statements, one for every single stopwords JSON, sounds bad, this is probably the "best"/easiest way to support browserify... or does someone have a better idea?
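To illustrate the distinction being discussed: as described in this thread, plain browserify resolves require() calls whose argument is a literal string, so a loader that names every file explicitly can be bundled, while one that builds the path from a variable cannot. A minimal sketch, assuming a hypothetical loadStopwords(lang) helper; the inline arrays are placeholders standing in for the per-language JSON files so the sketch runs on its own:

```javascript
// Sketch of a browserify-friendly stopwords loader. In a real loader each
// value would be a literal require, e.g. en: require("./stopwords/en.json")
// (hypothetical path), so the bundler can see every file at build time.
// Placeholder arrays stand in for those JSON files here.
const STOPWORDS = {
  en: ["a", "about", "above"],
  de: ["aber", "alle", "allem"],
};

// Contrast: require("./stopwords/" + lang + ".json") builds the path at
// runtime, which plain browserify cannot resolve while bundling.
function loadStopwords(lang) {
  if (!Object.prototype.hasOwnProperty.call(STOPWORDS, lang)) {
    throw new Error("No stopwords bundled for language: " + lang);
  }
  return STOPWORDS[lang];
}

console.log(loadStopwords("en").length); // 3
```

The trade-off from the comment above is visible here: every language needs its own entry in the table, which is tedious to maintain by hand but easy for a build step to generate.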

knod commented Dec 26, 2016

@johipsum: From what I understand, the only way to support browserify in this kind of situation is by using additional modules. I think there's no ideal solution here, unfortunately, but it'd be great to hear any additional ideas, or even a confirmation that this is the case.

I'd also like thoughts on whether converting the stopwords files to JSON so they can be included in the loop is very different from just adding them, as arrays, to one object. Do stopwords libraries often offer their data as JSON? Or are they usually .txt files that would need to be converted?

Another possible option (I haven't really used makefiles much) may be to convert the .txt files and combine them into a single JSON object file during make, since make has to be run every time the solution is changed anyway. Is that feasible?

johipsum (Author) commented Jan 6, 2017

I updated the stopwords-loader to support browserify in 10c9ac9...
An additional make task to create the JSONs would be great! We could also generate the stopwords-loader via make...
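Generating the loader during make could work along these lines: emit a CommonJS module with one literal require per language, so that webpack and browserify both see every file statically. This is only a sketch; the function name generateLoaderSource, the output layout, and the ./stopwords/stopwords-<lang>.json paths are assumptions:

```javascript
// Hypothetical generator for the stopwords-loader: given the language codes
// found at build time, emit a module with one literal require per language.
function generateLoaderSource(langs, dir = "./stopwords") {
  const entries = [...langs]
    .sort() // stable order keeps the generated file's diffs small
    .map((lang) => {
      const file = `${dir}/stopwords-${lang}.json`;
      return `  ${JSON.stringify(lang)}: require(${JSON.stringify(file)}),`;
    });
  return [
    "// Generated file: do not edit by hand.",
    "module.exports = {",
    ...entries,
    "};",
    "",
  ].join("\n");
}

console.log(generateLoaderSource(["en", "de"]));
```

For ["en", "de"] the generated module contains entries such as "de": require("./stopwords/stopwords-de.json"), listed in sorted order.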

knod commented Jan 7, 2017

I also made a couple of pull requests with different options. One was very similar to your implementation. Great minds...

mikhailbot commented

@johipsum Care to share how you got your unfluff fork to run in Lambda? I keep getting timeouts when installing your fork via NPM.

johipsum (Author) commented

@mikhaildelport The default timeout of a Lambda function is 3 seconds. Maybe your unfluff function needs more. Have you tried increasing the allowed time for your Lambda execution?

mikhailbot commented Mar 20, 2017

@johipsum Yeah, I bumped it up to 5 seconds with no luck, and if it's that slow it's also mostly useless, sadly! Running locally, it finishes in under a second. Here's the quick and dirty code I used to test it:

https://gist.github.com/mikhaildelport/28060909bbe276d537b328e36142f23b

Edit: So I bumped the timeout to 30 seconds just to see, and it finally completed in 5.5 seconds. Not sure why it's so slow. Is it getting the HTML (in which case I'll move that to the client) or is it the unfluff process?

Edit 2: The logs show it's the unfluff process, sadly.

2017-03-20T12:39:59.329Z	54440e07-0d6a-11e7-aec5-0be0bb66f6c7	unfluffing...
2017-03-20T12:39:59.743Z	54440e07-0d6a-11e7-aec5-0be0bb66f6c7	Got HTML
2017-03-20T12:40:04.902Z	54440e07-0d6a-11e7-aec5-0be0bb66f6c7	Got unfluffed
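Working out the intervals from the three log timestamps above makes the split explicit: about 0.4 s between the first two lines (presumably fetching the HTML) and about 5.2 s between the last two (the unfluff step), 5.573 s in total, consistent with the ~5.5 s run. A small sketch of that arithmetic; the helper name stageDurations is illustrative only:

```javascript
// Stage labels and timestamps copied from the Lambda log lines above.
const stages = [
  ["unfluffing...", "2017-03-20T12:39:59.329Z"],
  ["Got HTML", "2017-03-20T12:39:59.743Z"],
  ["Got unfluffed", "2017-03-20T12:40:04.902Z"],
];

// Milliseconds elapsed between each logged line and the one before it.
function stageDurations(entries) {
  const result = [];
  for (let i = 1; i < entries.length; i++) {
    const ms = Date.parse(entries[i][1]) - Date.parse(entries[i - 1][1]);
    result.push([entries[i][0], ms]);
  }
  return result;
}

for (const [label, ms] of stageDurations(stages)) {
  console.log(`${label}: ${ms} ms`);
}
// Got HTML: 414 ms
// Got unfluffed: 5159 ms
```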

johipsum (Author) commented

@mikhaildelport Maybe you can try the lazy extractors... My Lambda looks more or less like yours, except that I use the lazy functions, and it's almost as fast as on my local machine.
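As we understand it, unfluff's README documents a lazy mode, extractor.lazy(html, language), which returns an object whose methods such as title() and text() do their extraction only when called, so fields you never ask for cost nothing. The sketch below is not unfluff's implementation; it is a generic illustration of lazy, memoized extraction, and the extractor functions in it are hypothetical stand-ins:

```javascript
// Generic lazy-extraction sketch (not unfluff's code): each field is computed
// on first access and cached, so unrequested fields are never computed.
function lazy(html, extractors) {
  const cache = new Map();
  const api = {};
  for (const name of Object.keys(extractors)) {
    api[name] = () => {
      if (!cache.has(name)) {
        cache.set(name, extractors[name](html));
      }
      return cache.get(name);
    };
  }
  return api;
}

// Hypothetical extractors standing in for real (and expensive) ones;
// `calls` counts how many times any extractor actually runs.
let calls = 0;
const extractors = {
  title: (html) => {
    calls++;
    const match = /<title>(.*?)<\/title>/i.exec(html);
    return match ? match[1] : "";
  },
  text: (html) => {
    calls++;
    return html.replace(/<[^>]*>/g, " ").trim();
  },
};

const data = lazy("<html><title>Hello</title><p>Body</p></html>", extractors);
console.log(data.title()); // Hello
console.log(data.title()); // Hello (cached, the extractor does not run again)
console.log(calls); // 1, because text() was never called
```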

mikhailbot commented

@johipsum I'll keep that in mind! I found a web parser API that works for what I want, so I'm going with it for now. Thanks for your help!
