Document how to start zbMATH Open data import #87

Open
physikerwelt opened this issue Jul 7, 2023 · 5 comments

@physikerwelt
Member

Could you document how the zbMATH Open data import can be started?

@LizzAlice
Contributor

Sure, in this issue, or somewhere else?

@physikerwelt
Member Author

If you can think of a better place, put it there and reference it from this issue. As long as it is documented somewhere, we can move the documentation to another place later.

@LizzAlice transferred this issue from MaRDI4NFDI/MaRDIRoadmap on Jul 7, 2023

@LizzAlice
Contributor

I'll document it here for now, as it will change anyway as soon as we introduce Airflow or something similar.

  1. Enter the `mardi-importer` container
  2. Create a `data` dir
  3. Copy the config file and the data dump file from `data/zbmath` on `mardi02` into the `data` dir
  4. Add a line in `/usr/local/lib/python3.9/dist-packages/wikibaseintegrator/models/claims.py`, in `from_json`, directly after `if 'datatype' in claim['mainsnak']:` --> `if claim['mainsnak']['datatype'] == 'contentmath': continue`
  5. Get the name of the latest uploaded paper
  6. In `mardi/import/zbmath/ZBMathSource.py`, in `push()`: un-comment the code from `if not found` onward (inclusive); substitute the paper title with that of the latest uploaded paper (or a paper a few before that, to be safe)
  7. Start the import
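
Step 4 can be illustrated in isolation. The following is a minimal sketch of what the added line appears to do, assuming claim JSON shaped like Wikibase's (a `mainsnak` with an optional `datatype`). `iter_supported_claims` and the sample data are hypothetical; this is not wikibaseintegrator's actual code.

```python
def iter_supported_claims(json_data, skipped_datatypes=("contentmath",)):
    """Yield (property_id, claim) pairs, skipping claims with a listed datatype.

    Hypothetical helper: it mirrors the effect of the one-line patch
    (`if claim['mainsnak']['datatype'] == 'contentmath': continue`),
    not the library's real `from_json` implementation.
    """
    for property_id, claims in json_data.items():
        for claim in claims:
            mainsnak = claim["mainsnak"]
            if "datatype" in mainsnak:
                # Same condition as the patched line in step 4.
                if mainsnak["datatype"] in skipped_datatypes:
                    continue
            yield property_id, claim


# Made-up sample data for illustration only.
sample = {
    "P1": [{"mainsnak": {"datatype": "string", "property": "P1"}}],
    "P2": [{"mainsnak": {"datatype": "contentmath", "property": "P2"}}],
    "P3": [{"mainsnak": {"property": "P3"}}],  # no datatype key at all
}

kept = [property_id for property_id, _ in iter_supported_claims(sample)]
print(kept)
```

Claims without a `datatype` key pass through unchanged, which matches the placement of the patched line inside the `if 'datatype' in claim['mainsnak']:` branch.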

@physikerwelt
Member Author

I suggest adding those steps to the Dockerfile or the entry point of the container.

@physikerwelt
Member Author

> I'll document it here for now, as it will change anyway as soon as we introduce Airflow or something similar.

I think you should try the already existing job-runner first. It was good for creating DLMF pages.

> 1. Enter `mardi-importer` container
>
> 2. Create `data` dir
>
> 3. Copy config file and data dump file from `data/zbmath` on `mardi02` into `data` dir

This can be avoided by mounting the directory in the docker-compose file.

> 4. add line in `/usr/local/lib/python3.9/dist-packages/wikibaseintegrator/models/claims.py` in `from_json` after `if 'datatype' in claim['mainsnak']:` --> `if claim['mainsnak']['datatype'] == 'contentmath': continue`

This can be changed in the container.

> 5. Get name of latest uploaded paper

This requires some work. The script should be changed to write the name of the latest successfully imported paper to a directory mounted to the host machine.

> 6. In `mardi/import/zbmath/ZBMathSource.py` in `push()`: un-comment the code after and including `if not found`; substitute paper title with the latest uploaded paper (or a few papers before that, to be safe)

This can also be included in the container, but it should read the paper name from the file written in step 5 instead of a hard-coded title.
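
Steps 5 and 6 could be combined into a small checkpoint mechanism. Below is a minimal sketch, assuming the importer processes papers in a stable order and that the checkpoint directory (a temporary one in this demo) is mounted from the host. `write_checkpoint`, `read_checkpoint`, and `resume_from` are hypothetical helpers, not functions from `ZBMathSource.py`, and the resume logic is only a guess at what the commented-out `if not found` block does.

```python
import os
import tempfile
from pathlib import Path


def write_checkpoint(path, title):
    """Atomically store the title of the latest successfully imported paper."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then rename over the target,
    # so a crash mid-write never leaves a truncated checkpoint behind.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(title + "\n")
    os.replace(tmp_name, path)


def read_checkpoint(path):
    """Return the stored title, or None if no checkpoint exists yet."""
    path = Path(path)
    if not path.exists():
        return None
    return path.read_text(encoding="utf-8").strip() or None


def resume_from(titles, last_title):
    """Yield titles starting at `last_title` (inclusive); all of them if it is None.

    If `last_title` never occurs in `titles`, nothing is yielded.
    """
    if last_title is None:
        yield from titles
        return
    found = False
    for title in titles:
        if not found:
            if title != last_title:
                continue
            found = True
        yield title


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as mounted_dir:
        checkpoint = Path(mounted_dir) / "last_imported.txt"
        papers = ["Paper A", "Paper B", "Paper C"]  # made-up titles
        write_checkpoint(checkpoint, "Paper B")
        print(list(resume_from(papers, read_checkpoint(checkpoint))))
```

Resuming at the checkpointed title itself, rather than after it, re-imports one paper; that follows the "to be safe" remark in step 6, assuming a repeated import is harmless.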

> 7. start import

How exactly?
