I have been having problems executing LinkProbability and PU.
When I execute LinkProbability I get this error:
It seemed to have something to do with some of the unusual characters in the info-measure.txt file, so I deleted everything in that file except the first 400 or so lines, and with that it works (if I leave Chinese characters in my reduced info-measure.txt file it fails, so I think it's safe to say the error is caused by those characters).
How do I avoid this error? Am I doing something wrong, perhaps some configuration related to character encoding? I find it strange that the failure is caused by the provided info-measure.txt file and that I can only make it work after altering that file.
Next, when I execute PU I keep getting this error:
I thought it had something to do with permissions, but even when I run cmd as an administrator I keep getting the error. I have searched but can't find anything useful beyond someone saying that this is a Windows problem and that Spark doesn't provide much support for Windows. I would prefer not to install another OS just to run the program.
Lastly, I would like to ask whether there are minimum requirements for running some of the algorithms, since I noticed they use up to 95% of my memory. I have a laptop with 8 GB of RAM, but my intention is to run the program on a lower-end computer, and I am worried that it simply will not have enough resources.
Fixed. There was an encoding issue when reading info-measure.txt.
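For anyone hitting this before updating, a minimal sketch of the kind of fix, assuming info-measure.txt is UTF-8 encoded (the path and object name here are illustrative, not the project's actual code), is to pass the charset explicitly instead of relying on the platform default:

```scala
import scala.io.Source
import java.nio.charset.StandardCharsets

object ReadInfoMeasure {
  def main(args: Array[String]): Unit = {
    // Read with an explicit charset instead of the platform default,
    // which on Windows is typically not UTF-8 and breaks on Chinese characters.
    val source = Source.fromFile("info-measure.txt", StandardCharsets.UTF_8.name())
    try {
      source.getLines().take(5).foreach(println)
    } finally {
      source.close()
    }
  }
}
```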
Can't fix this; I tried multiple solutions, but it does seem there is a bug in Spark for Windows.
It depends heavily on the size of the datasets you are going to use and on the particular methods. I didn't measure memory consumption precisely, but with an Xmx of 12 GB it could process datasets of up to 64M words. I tried to use iterators wherever possible, but there is some lower bound: for example, the word2vec model used for KeyConceptRelatedness has to be kept in memory, which costs about 1 GB.
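If you want to check how much heap the JVM actually gets on the lower-end machine, a small sketch (the jar name in the comment is hypothetical):

```scala
object HeapCheck {
  def main(args: Array[String]): Unit = {
    // Report the maximum heap the JVM was started with (controlled by -Xmx).
    val maxGb = Runtime.getRuntime.maxMemory.toDouble / (1024 * 1024 * 1024)
    println(f"Max JVM heap: $maxGb%.1f GB")
    // Launching with a larger heap would look something like (hypothetical jar name):
    //   java -Xmx12g -jar atr4s.jar ...
  }
}
```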
With low memory and big datasets, I'd suggest first trying the faster methods (see Table 6 in the paper): they tend to use less memory, and in some cases their quality is no worse, especially on big datasets where good statistics on word occurrences are available. For example, try playing with the parameters of ComboBasic: increase alpha if you want more 'representative' or generic terms, and increase beta if you want more specific terms.
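To illustrate the role of those two parameters, here is a hedged sketch of a ComboBasic-style score; it is not the actual ATR4S implementation, and the exact formulation and names are assumptions based on the paper. The idea is that alpha weights how many candidate terms contain the candidate (pushing generic terms up), while beta weights how many candidate terms it contains (pushing specific terms up).

```scala
object ComboBasicSketch {
  // Simplified ComboBasic-style score (assumed formulation):
  //   score(t) = |t| * log(freq(t)) + alpha * e(t) + beta * e'(t)
  // where e(t)  = number of candidate terms that contain t (larger alpha favours generic terms)
  //       e'(t) = number of candidate terms contained in t (larger beta favours specific terms)
  def score(termLength: Int,
            frequency: Int,
            containedInCount: Int, // e(t)
            containsCount: Int,    // e'(t)
            alpha: Double,
            beta: Double): Double =
    termLength * math.log(frequency.toDouble) + alpha * containedInCount + beta * containsCount

  def main(args: Array[String]): Unit = {
    // Same candidate scored with generic-leaning vs. specific-leaning parameters.
    println(score(termLength = 2, frequency = 40, containedInCount = 5, containsCount = 1, alpha = 0.75, beta = 0.1))
    println(score(termLength = 2, frequency = 40, containedInCount = 5, containsCount = 1, alpha = 0.1, beta = 0.75))
  }
}
```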