de.tudarmstadt.ukp.wikipedia.parser.Link.getText may return empty string #90

Open
daxenberger opened this issue Jul 31, 2015 · 3 comments

@daxenberger
Member

Originally reported on Google Code with ID 96

I noticed that getText() returns an empty string for a category link that has
nothing after the | character. Take, for example, the 'Anarchism' page. It has six
categories defined in its wikitext:
[[Category:Anarchism| ]]
[[Category:Political culture]]
[[Category:Political ideologies]]
[[Category:Social theories]]
[[Category:Anti-fascism]]
[[Category:Greek loanwords]]

The following code 
for (Link link : page.getCategories()) {
  System.out.println(">" + link.getText() + "<");
}

will print:
><
>Category:Political culture<
>Category:Political ideologies<
>Category:Social theories<
>Category:Anti-fascism<
>Category:Greek loanwords<

Note the first line. We get an empty text because the string after the | character
is empty.

I suggest that in such a case, we return the category "target" itself or the target
without the "Category:" string.
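
Until the parser itself changes, the same fallback could be applied on the caller's
side. The following is only a minimal sketch of the two options suggested above, not
JWPL code: the class CategoryLinkText and its methods are hypothetical names, and it
assumes the caller already has both the link text and the link target as plain
strings (how the target is obtained from a Link is left open here).

```java
// Hypothetical helper, not part of JWPL. It only post-processes two strings
// the caller is assumed to have already: the link text and the link target.
final class CategoryLinkText {

    private static final String CATEGORY_PREFIX = "Category:";

    /** Returns the link text, or the target when the text is null or blank. */
    static String textOrTarget(String text, String target) {
        if (text == null || text.trim().isEmpty()) {
            return target;
        }
        return text;
    }

    /** Removes a leading "Category:" from the given string, if present. */
    static String stripCategoryPrefix(String value) {
        if (value != null && value.startsWith(CATEGORY_PREFIX)) {
            return value.substring(CATEGORY_PREFIX.length());
        }
        return value;
    }

    public static void main(String[] args) {
        // Simulates [[Category:Anarchism| ]], where the parsed text is empty.
        String text = "";
        String target = "Category:Anarchism";

        // Option 1: fall back to the target itself.
        System.out.println(">" + textOrTarget(text, target) + "<");
        // Option 2: fall back to the target without the "Category:" prefix.
        System.out.println(">" + stripCategoryPrefix(textOrTarget(text, target)) + "<");
    }
}
```

With the inputs shown in main, the first line prints >Category:Anarchism< and the
second prints >Anarchism<. Links whose text is not blank are returned unchanged by
textOrTarget, so the other five categories would print as before.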


What version of the product are you using? On what operating system?
Running latest release (0.9.1) on Linux.



Reported by jbabooa on 2012-05-20 12:28:29

@daxenberger
Member Author

This is more of an enhancement request than a bug.

Reported by jbabooa on 2012-05-20 12:28:59

@daxenberger
Member Author

Thanks for the report. I will look into it and make the suggested change.

However, be aware that as of the next release of JWPL, the parser will no longer be
supported. It has been moved into its own module.
We will still apply patches provided by the community, but we will not develop the
parser any further.
We now use the Sweble parser (www.sweble.org), which we have also integrated into
JWPL Core.

Reported by oliver.ferschke on 2012-05-29 10:17:10

  • Status changed: Accepted

@daxenberger
Member Author

Reported by oliver.ferschke on 2012-05-29 10:23:00

@reckart reckart added this to the Bug backlog milestone Jan 4, 2021