MajoBerger edited this page Jun 14, 2023 · 5 revisions

Note: check README.md for up-to-date instructions

How to migrate CLARIN-DSpace5.* to CLARIN-DSpace7.*

Prerequisites:

  • CLARIN-DSpace7.* installed, with the database, Solr, and Tomcat running

Steps:

  1. Clone python-api: https://github.com/dataquest-dev/dspace-python-api (branch internal/data-migration-items, still in progress) and DSpace: https://github.com/dataquest-dev/DSpace (branch internal/migrate-clarin-dspace5-to-clarin-dspace7)

  1. Get the database dump of the old CLARIN-DSpace and unzip it into <PSQL_PATH>/bin (or any other location)

  1. Create the CLARIN-DSpace5.* databases (dspace, utilities) from the dump.

// clarin-dspace database

  • createdb --username=postgres --owner=dspace --encoding=UNICODE clarin-dspace // create a clarin database with owner

// If this fails the first time, run it again (it succeeded on the second try):

  • psql -U postgres clarin-dspace < <CLARIN_DUMP_FILE_PATH>

// clarin-utilities database

  • createdb --username=postgres --owner=dspace --encoding=UNICODE clarin-utilities // create a utilities database with owner

// If this fails the first time, run it again (it succeeded on the second try):

  • psql -U postgres clarin-utilities < <UTILITIES_DUMP_FILE_PATH>
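
The two create-and-load sequences above can be scripted. The following is a minimal sketch, not part of the migration tooling: the helper names build_restore_commands and restore are hypothetical, and it passes the dump to psql with --file instead of shell redirection. By default it only prints the commands, so they can be reviewed before anything is executed.

```python
import subprocess


def build_restore_commands(db_name, dump_path, owner="dspace", superuser="postgres"):
    """Return the createdb and psql argument lists for one database.

    Hypothetical helper: it mirrors the manual commands from this page,
    except that the dump is passed with --file instead of `<` redirection.
    """
    create = [
        "createdb",
        f"--username={superuser}",
        f"--owner={owner}",
        "--encoding=UNICODE",
        db_name,
    ]
    load = [
        "psql",
        f"--username={superuser}",
        f"--dbname={db_name}",
        f"--file={dump_path}",
    ]
    return [create, load]


def restore(db_name, dump_path, dry_run=True):
    """Print the commands (dry run) or execute them, stopping on the first failure."""
    for cmd in build_restore_commands(db_name, dump_path):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling restore("clarin-dspace", "<CLARIN_DUMP_FILE_PATH>") and restore("clarin-utilities", "<UTILITIES_DUMP_FILE_PATH>") prints the four commands; pass dry_run=False to run them.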

  1. Recreate your local CLARIN-DSpace7.* database. NOTE: all existing data will be deleted.
  • createdb --username=postgres --owner=dspace --encoding=UNICODE dspace // create database
  • psql --username=postgres dspace -c "CREATE EXTENSION pgcrypto;" // Add pgcrypto extension

If psql warns that the -c parameter was ignored, run the command CREATE EXTENSION pgcrypto; directly in the database command line.

// Now the clarin database for DSpace7 should be created

  • Start the database with the command: pg_ctl start -D "<PSQL_PATH>\data\"

  1. (Your DSpace project must be installed.) Go to dspace/bin and run the command dspace database migrate force // force is needed because of local types. NOTE: dspace database migrate force creates default database data that may not be present in the database dump, so after migration some tables may contain more data than the dump. Data from the dump that already exists in the database is not migrated.

  1. Create an administrator account by running the command dspace create-administrator in dspace/bin

  1. Prepare the dspace-python-api project for migration. IMPORTANT: if the data folder doesn't exist in the project, create it.

Update const.py

  • user = "<ADMIN_NAME>"

  • password = "<ADMIN_PASSWORD>"

  • # http or https

  • use_ssl = False

  • host = "<YOUR_SERVER>" e.g., localhost

  • # host = "dev-5.pc"

  • fe_port = "<YOUR_FE_PORT>"

  • # fe_port = ":4000"

  • be_port = "<YOUR_BE_PORT>"

  • # be_port = ":8080"

  • be_location = "/server/"
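
To sanity-check the values above, it helps to see the URLs they would produce. This is a minimal sketch under the assumption that the settings are simply concatenated (which is why the commented port examples include the leading colon); build_base_url is a hypothetical helper, not a function from dspace-python-api.

```python
def build_base_url(use_ssl, host, port, location=""):
    """Concatenate the const.py-style settings into a base URL.

    Assumption: the port value already carries its leading colon (":8080"),
    as in the commented examples on this page.
    """
    scheme = "https" if use_ssl else "http"
    return f"{scheme}://{host}{port}{location}"


# Example values taken from the commented defaults above.
be_url = build_base_url(False, "localhost", ":8080", "/server/")
fe_url = build_base_url(False, "localhost", ":4000")
print(be_url)  # http://localhost:8080/server/
print(fe_url)  # http://localhost:4000
```

If the printed backend URL does not open in a browser, the import step is unlikely to reach the server either.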

Update migration_const.py

  • REPOSITORY_PATH = "<PROJECT_PATH>"
  • DATA_PATH = REPOSITORY_PATH + "data/"
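
Because DATA_PATH is built by plain string concatenation, REPOSITORY_PATH has to end with a path separator. The sketch below shows one way to build the same path defensively and create the data folder if it is missing; ensure_data_dir is a hypothetical helper, not part of the project.

```python
import os


def ensure_data_dir(repository_path):
    """Return "<repository_path>/data/" and create the folder if needed.

    os.path.join inserts the separator itself, so it does not matter
    whether repository_path ends with one. The empty last component
    keeps the trailing separator that DATA_PATH has on this page.
    """
    data_path = os.path.join(repository_path, "data", "")
    os.makedirs(data_path, exist_ok=True)
    return data_path
```

For example, ensure_data_dir("<PROJECT_PATH>") can be run once before the export step so that the JSON files have a place to go.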

  1. Create JSON files from the database tables. NOTE: this must be done for both databases, clarin-dspace and clarin-utilities (the JSON files are stored in the data folder)
  • Go to dspace-python-api in the cmd
  • Run pip install -r requirements.txt
  • Run python data_migration.py <DATABASE NAME> <HOST> postgres <PASSWORD FOR POSTGRES> for BOTH databases, e.g., python data_migration.py clarin-dspace localhost postgres pass (the arguments are the database connection settings: database, host, user, password) // NOTE: the data folder must exist in the project structure
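
Since the export has to be run once per database, a small wrapper can make sure neither run is forgotten. This is a sketch only: export_commands and export_all are hypothetical names, and the argument order (database, host, user, password) is the one shown in the example command above.

```python
import subprocess
import sys

# Both CLARIN-DSpace5 databases created from the dumps.
DATABASES = ("clarin-dspace", "clarin-utilities")


def export_commands(host, user, password):
    """Build one data_migration.py invocation per database."""
    return [
        [sys.executable, "data_migration.py", db, host, user, password]
        for db in DATABASES
    ]


def export_all(host, user, password, project_dir="."):
    """Run the exports from the dspace-python-api folder, stopping on the first error."""
    for cmd in export_commands(host, user, password):
        subprocess.run(cmd, cwd=project_dir, check=True)
```

Calling export_all("localhost", "postgres", "pass", project_dir="<PROJECT_PATH>") would then be equivalent to running the command twice by hand.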

  1. Copy the assetstore from DSpace5 to DSpace7 (needed for the bitstream import)

  1. Import data from the JSON files (python-api/data/) into the dspace database (CLARIN-DSpace7.*)
  • NOTE: the database must be up to date (dspace database migrate force must have been run in dspace/bin)
  • NOTE: the DSpace server must be running
  • From the dspace-python-api run command python dspace_import.py
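
Before starting the import, it can be worth confirming that the exported files are present and parse as JSON. The following is a minimal sketch, assuming the exports use the .json extension; check_json_exports is a hypothetical helper and does not validate the content against what dspace_import.py expects.

```python
import json
from pathlib import Path


def check_json_exports(data_dir):
    """Return (valid, broken) lists of .json file names found in data_dir."""
    valid, broken = [], []
    for path in sorted(Path(data_dir).glob("*.json")):
        try:
            with path.open(encoding="utf-8") as fh:
                json.load(fh)
            valid.append(path.name)
        except (json.JSONDecodeError, UnicodeDecodeError):
            # The file exists but cannot be parsed; re-export it before importing.
            broken.append(path.name)
    return valid, broken
```

An empty valid list usually means the export step was skipped or wrote its files somewhere other than the data folder.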

Migration notes:

  • Table attributes that record the last modification time of a DSpace object (for example, the last_modified attribute in the Item table) hold the time at which the object was migrated, not the value from the migrated database dump.