Piotr Wendykier edited this page Sep 19, 2013 · 8 revisions


COntent ANalysis SYStem (CoAnSys) is a framework for mining scientific publications using Apache Hadoop. It is developed primarily by employees of the Centre for Open Science (CeON) at the Interdisciplinary Centre for Mathematical and Computational Modelling (ICM), University of Warsaw (UW).

Citation Matching

Citation matching creates links from bibliography entries to the publications they reference. Such links indicate topical similarity between the linked texts, are used to assess the impact of the referenced document, and improve navigation in the user interfaces of digital libraries. The citation matching module in CoAnSys scales to large amounts of data by combining appropriate indexing with the MapReduce paradigm.
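The combination of indexing and MapReduce can be illustrated with a minimal, single-process sketch. It is not the CoAnSys implementation: the token-overlap heuristic, the `min_shared` threshold, and all function names below are assumptions chosen for illustration only. The map step emits index keys for documents and citations, the shuffle step groups records by key, and the reduce step counts shared keys for each candidate pair.

```python
import re
from collections import defaultdict


def tokens(text):
    """Return the set of lowercased alphanumeric tokens in a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def map_phase(documents, citations):
    """Emit (token, record) pairs; each token serves as an index key."""
    for doc_id, title in documents.items():
        for tok in tokens(title):
            yield tok, ("doc", doc_id)
    for cit_id, raw in citations.items():
        for tok in tokens(raw):
            yield tok, ("cit", cit_id)


def shuffle(pairs):
    """Group emitted values by key, as a MapReduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Count shared tokens for each (citation, document) pair under a key."""
    shared = defaultdict(int)
    for values in groups.values():
        docs = [i for kind, i in values if kind == "doc"]
        cits = [i for kind, i in values if kind == "cit"]
        for c in cits:
            for d in docs:
                shared[(c, d)] += 1
    return shared


def match_citations(documents, citations, min_shared=3):
    """Link each citation to the document sharing the most tokens with it.

    min_shared is a hypothetical threshold, not a value taken from CoAnSys.
    """
    shared = reduce_phase(shuffle(map_phase(documents, citations)))
    best = {}
    for (c, d), n in shared.items():
        if n >= min_shared and n > best.get(c, (None, 0))[1]:
            best[c] = (d, n)
    return {c: d for c, (d, _) in best.items()}


# Toy data: two document titles and one raw bibliography entry.
documents = {
    "d1": "Large Scale Citation Matching Using Apache Hadoop",
    "d2": "Taming the zoo about algorithms implementation in the ecosystem of Apache Hadoop",
}
citations = {
    "c1": "Fedoryszak M. et al. Large scale citation matching using Apache Hadoop. 2013",
}
links = match_citations(documents, citations)
```

Because candidate pairs are only formed among records that share an index key, a citation is never compared against every document, which is the property that lets this kind of approach scale on a cluster. A production system would likely use a more selective index and a stronger similarity measure than raw token overlap.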


  1. Fedoryszak, M., Tkaczyk, D., and Bolikowski, Ł. Large Scale Citation Matching Using Apache Hadoop, Research and Advanced Technology for Digital Libraries, Springer Berlin Heidelberg, 2013, vol. 8092, pp. 362-365

  2. Dendek, P. J., Czeczko, A., Fedoryszak, M., Kawa, A., Wendykier, P., and Bolikowski, Ł. Taming the zoo - about algorithms implementation in the ecosystem of Apache Hadoop, arXiv, 2013

  3. Dendek, P. J., Czeczko, A., Fedoryszak, M., Kawa, A., Wendykier, P., and Bolikowski, Ł. How to perform research in Hadoop environment not losing mental equilibrium - case study, arXiv, 2013
