implement domainExtractor for image and title, with a single implementation wikipedia #30

danielgranat · 2015-02-24T23:04:54Z

First, thanks for sharing this code. It's exactly what I needed, and I hadn't found anything else I liked for Node.js.

This is more a request for feedback than an actual pull request.
The problem I encountered is that Wikipedia does not work with the image and title extraction as currently implemented.
Image: it is not in the header; it is the first image in the '.infobox'.
Title: when the title is split, the longest part is usually not the important part, as in 'Thomas Edison - Wikipedia, the free encyclopedia'.
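To make the title problem concrete, here is a minimal sketch comparing a "longest part wins" heuristic with simply taking the first part. Both helper names (`longestPart`, `firstPart`) and the `' - '` separator are assumptions made for illustration; this is not the library's actual extraction code.

```javascript
// Hypothetical helpers for illustration only; not the library's real code.

// Split the title and keep the longest piece (the heuristic described above).
function longestPart(title, separator) {
  return title
    .split(separator)
    .map((part) => part.trim())
    .reduce((best, part) => (part.length > best.length ? part : best), '');
}

// Split the title and keep the first piece, which suits Wikipedia titles.
function firstPart(title, separator) {
  return title.split(separator)[0].trim();
}

const exampleTitle = 'Thomas Edison - Wikipedia, the free encyclopedia';

console.log(longestPart(exampleTitle, ' - ')); // Wikipedia, the free encyclopedia
console.log(firstPart(exampleTitle, ' - '));   // Thomas Edison
```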
Trying to tackle this problem, I saw 2 options:

1. Refactor the current extraction implementation to support Wikipedia's structure. I don't think that's a good option: first, it will make the code less readable; second, what happens when I need more customization?
2. Have something like domain-specific plugins.

Obviously, I decided to go with the second option.
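As a rough illustration of the plugin idea, the sketch below keeps per-domain overrides in a registry and falls back to a generic extractor when no plugin is registered. Every name here (`domainExtractors`, `defaultExtractors`, `extractTitle`) is hypothetical and chosen for the example; the code in this PR may be organized differently.

```javascript
// Hypothetical registry of per-domain overrides; names are assumptions.
const domainExtractors = new Map([
  [
    'en.wikipedia.org',
    {
      // Wikipedia titles look like "<Subject> - Wikipedia, the free encyclopedia",
      // so keep only the part before the first separator.
      title: (rawTitle) => rawTitle.split(' - ')[0].trim(),
    },
  ],
]);

// Stand-in for the existing generic extraction logic.
const defaultExtractors = {
  title: (rawTitle) => rawTitle.trim(),
};

// Use the domain plugin when one is registered, otherwise the default.
function extractTitle(hostname, rawTitle) {
  const plugin = domainExtractors.get(hostname);
  const extractor = (plugin && plugin.title) || defaultExtractors.title;
  return extractor(rawTitle);
}

console.log(
  extractTitle('en.wikipedia.org', 'Thomas Edison - Wikipedia, the free encyclopedia')
); // Thomas Edison
```

An image override (for example, the first image inside '.infobox') could sit next to `title` on the same plugin object, which keeps site-specific rules out of the main extraction code.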

There is still work to be done and issues to address, but I would like to get your input on the proposed solution.

Thank you for your time!

ageitgey · 2015-02-24T23:34:16Z

Hey, thanks for the PR. I do agree that domain-specific plugins are a better path here than hacks on top of hacks in the main code.

I'll take a look at the PR in detail when I have a little free time and let you know what I think.

Thanks!

implement domainExtractor for image and title, with a single implementation wikipedia

2e8b2a7

danielgranat added 4 commits February 25, 2015 10:32

Change matching domain so tht wikipedia.org works for en.wikipedia.org and it.wikipedia.org

278a761
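One way to get the behavior this commit title describes is suffix matching on the hostname. The sketch below is an assumption about how such matching could work, not the commit's actual code, and `matchesDomain` is a hypothetical name.

```javascript
// Hypothetical helper: true when hostname is the domain itself or a subdomain of it.
function matchesDomain(hostname, domain) {
  const host = hostname.toLowerCase();
  const base = domain.toLowerCase();
  // Require a dot before the base domain so "notwikipedia.org" does not match.
  return host === base || host.endsWith('.' + base);
}

console.log(matchesDomain('en.wikipedia.org', 'wikipedia.org')); // true
console.log(matchesDomain('it.wikipedia.org', 'wikipedia.org')); // true
console.log(matchesDomain('notwikipedia.org', 'wikipedia.org')); // false
```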

update build files

37de0c0

detect domain extractors file extention based on running file (coffee/js)

3c74e4b

Add hebrew stopwords

3283ae4
