This project aims to analyse a French Wikipedia dump using two different approaches:
- text-mining: building a vector representation of the corpus using well-known vector space model (VSM) and word embedding methods.
- graph-mining: building an atlas based on the cross-references.
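To make the two approaches concrete, here is a minimal Java sketch, not the project's actual code. It assumes TF-IDF as the VSM weighting (the project may use another scheme) and treats `[[Target]]` wikilinks as the cross-references. The class name `WikiSketch` and the methods `tfIdf` and `outgoingLinks` are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WikiSketch {

    // Matches [[Target]], [[Target|label]] and [[Target#section]]; captures the target only.
    // Assumption: namespaced links (files, categories) are not filtered out here.
    private static final Pattern WIKILINK =
            Pattern.compile("\\[\\[([^\\]|#]+)[^\\]]*\\]\\]");

    /** Text-mining side: TF-IDF weights for one tokenized document within a corpus. */
    public static Map<String, Double> tfIdf(List<String> doc, List<List<String>> corpus) {
        Map<String, Double> weights = new HashMap<>();
        for (String term : doc) {
            weights.merge(term, 1.0, Double::sum); // raw term count
        }
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            // Document frequency: number of documents containing the term.
            long df = corpus.stream().filter(d -> d.contains(e.getKey())).count();
            double tf = e.getValue() / doc.size();
            // Guard against df == 0 when the document is not part of the corpus.
            double idf = Math.log((double) corpus.size() / Math.max(1, df));
            e.setValue(tf * idf);
        }
        return weights;
    }

    /** Graph-mining side: titles of the articles that a page's wikitext links to. */
    public static Set<String> outgoingLinks(String wikitext) {
        Set<String> targets = new LinkedHashSet<>();
        Matcher m = WIKILINK.matcher(wikitext);
        while (m.find()) {
            targets.add(m.group(1).trim());
        }
        return targets;
    }

    public static void main(String[] args) {
        // Toy corpus standing in for tokenized articles.
        List<List<String>> corpus = List.of(
                List.of("lyon", "ville", "france"),
                List.of("paris", "ville", "france"),
                List.of("rhône", "fleuve"));
        System.out.println(tfIdf(corpus.get(0), corpus));
        System.out.println(outgoingLinks("[[Lyon]] est une ville de [[France|la France]]."));
    }
}
```

Calling `outgoingLinks` on every page gives the edges of a directed article graph, while `tfIdf` gives one sparse vector per article. A real dump would also need a streaming XML parser and proper tokenization, which this sketch leaves out.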
Before installing the project, you'll need Maven and Java.
You can check the installed versions of both tools with the following Linux commands:
mvn --version
java -version
To build the Maven project:
mvn clean install
ArcToScience Team, M2 Data Mining, University Lyon 2, France.
This project is licensed under the MIT License; see the LICENSE.md file for details.