NomenMatch

This branch builds on the original version of NomenMatch. The steps below run the original version (v1) and the new version (v2) side by side; if you need only one version, refer to its respective branch.

Main changes compared to original version (v1)

matching names of higher ranks, such as order, family, and genus
matching common names (in Traditional Chinese)

Installing

build image

 $ docker-compose build

run devel

 $ docker-compose up -d

create the Solr core and apply the custom config (first time only)

  $ docker-compose exec solr bash
  $ ./bin/solr create_core -c taxa
  $ cp solrconfig.xml /var/solr/data/taxa/conf
  $ cp managed-schema /var/solr/data/taxa/conf
  $ exit
  $ docker-compose restart solr

prepare data for v1

prepare the source data CSV and put it in the source-data folder
copy conf/sources.csv to the source-data folder if source-data doesn't already have a sources.csv
modify sources.csv to map each source id to its source info

  $ cp conf/sources.csv source-data

import data (example: TaiCOL) for v1

  $ docker-compose exec php bash
  $ cd /code/workspace
  $ php ./importChecklistToSolr.php ../source-data/<taicol-checklist.csv> taicol

prepare data for v2

prepare the source data CSV and put it in the source-data folder
copy v2/conf/sources.csv to the source-data folder if source-data doesn't already have a sources.csv
modify sources.csv to map each source id to its source info

  $ cp v2/conf/sources.csv source-data

import data (example: TaiCOL) for v2

  $ docker-compose exec php bash
  $ cd /code/v2/workspace
  $ php ./importChecklistToSolr.php ../source-data/<taicol-checklist.csv> taicol_2

Note: to distinguish the original TaiCOL from the new TaiCOL, the two use different source_id values (taicol for the original TaiCOL and taicol_2 for the new TaiCOL).

Source data format

Tab-separated; see v2/workspace/data/example.csv
Column definition:

namecode
accepted_namecode
scientific_name (full name or canonical form is ok)
name_url_id (the id which can be used to create a valid url to a taxon name page)
accepted_name_url_id (the id which can be used to create a valid url to an accepted taxon name page, if the name is a synonym)
family
order
class
phylum
kingdom
simple_name
name_status

Note: example code for generating source data can be found in the ./scripts folder.
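Before importing, it can help to confirm that every row has the expected number of tab-separated fields. The following is a minimal sketch, not part of the repository: the function name check_source_columns is hypothetical, and it assumes each row carries all 12 columns listed above, with empty values still delimited by tabs.

```shell
#!/usr/bin/env bash
# check_source_columns FILE
# Hypothetical helper: reports every row whose tab-separated field count
# is not 12 (the number of columns listed in "Source data format").
# Returns non-zero if at least one such row is found.
check_source_columns() {
  awk -F'\t' -v expected=12 '
    NF != expected { printf "line %d: %d fields\n", NR, NF; bad = 1 }
    END { exit bad + 0 }
  ' "$1"
}

# Example (file name is a placeholder):
#   check_source_columns source-data/my-checklist.csv
```

A row that the check reports usually points to a missing tab for an empty column, such as accepted_name_url_id on an accepted name.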

Describe source data

Edit conf/sources.example.csv and rename it to sources.csv
Column definition:

source_id
source_name
url_base (combined with name_url_id or accepted_name_url_id, it forms a valid URL for the taxon)
for example,
citation format
source data page
date (of source data fetched, downloaded, or created)
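To illustrate how url_base and name_url_id fit together, here is a rough sketch that looks up a source in sources.csv and builds a taxon URL. It rests on several assumptions that the README does not confirm: the file is comma-delimited with no quoted commas in the first three columns, the columns appear in the order listed above, and "combined" means plain concatenation. The function name taxon_url and the example.org URL are hypothetical.

```shell
#!/usr/bin/env bash
# taxon_url SOURCES_FILE SOURCE_ID NAME_URL_ID [DELIMITER]
# Hypothetical helper: finds the row whose first column equals SOURCE_ID,
# takes url_base from the third column, and appends NAME_URL_ID to it.
# DELIMITER defaults to a comma (assumption). Returns non-zero if the
# source id is not present in the file.
taxon_url() {
  local file=$1 source_id=$2 name_id=$3 delim=${4:-,}
  awk -F"$delim" -v sid="$source_id" -v nid="$name_id" '
    $1 == sid { print $3 nid; found = 1; exit }
    END { exit !found }
  ' "$file"
}

# Example, given a row such as
#   demo,Demo Checklist,https://example.org/name/,...
#   taxon_url source-data/sources.csv demo 123
```

If sources.csv turns out to be tab-separated like the source data, pass $'\t' as the fourth argument.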

Delete source data in solr

Under v2/workspace dir, run

php clean_source.php {source_id}

to remove a specific source from solr, or run

php clean_source.php all

to remove all sources at once.
If the script fails, Solr has usually run out of Java heap space. Restart Solr and run the script again.
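The restart-and-retry advice can be wrapped in a small helper that runs from the host. This is only a sketch: run_with_recovery and restart_solr are hypothetical names, and it assumes clean_source.php exits with a non-zero status when it fails, which the README does not state.

```shell
#!/usr/bin/env bash
# run_with_recovery RECOVERY_CMD CMD [ARGS...]
# Hypothetical helper: runs CMD; if it fails, runs RECOVERY_CMD (a single
# command or function name) and then retries CMD exactly once.
run_with_recovery() {
  local recovery=$1
  shift
  "$@" && return 0
  echo "command failed; running recovery step and retrying once" >&2
  "$recovery" || return 1
  "$@"
}

# Example (run on the host, not inside a container):
#   restart_solr() { docker-compose restart solr; }
#   run_with_recovery restart_solr \
#     docker-compose exec -T php sh -c \
#     'cd /code/v2/workspace && php clean_source.php taicol_2'
```

Solr may need a few seconds after a restart before it accepts requests, so a short pause inside restart_solr may be necessary.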
