Experts@Minnesota Database Schema

Schema for the Experts@Minnesota project database.

Overview

UMN Libraries created this database to complement the Elsevier product, Pure, which we have branded Experts@Minnesota. We did this both to provide alternative means of access to some of the data, as well as to maintain a vendor-independent, UMN-local copy of that data.

Access

Read-only direct SQL access is available to anyone with a UMN internet ID. This database is on the OIT Oracle Hotel, currently only on the tst instance, with access granted via the oit_expert_rd_all role, which OIT manages. Each role member must be an Oracle Hotel account, or "schema" in Oracle-speak. These accounts can be personal internet IDs or departmental/functional accounts. Note that personal internet IDs must use Duo two-factor authentication, so a functional/departmental account is a better fit for non-interactive, automated access.

For access to the role and database connection information, and to optionally create an Oracle Hotel account, send email to: [email protected]. The DBAs at that address will probably direct you to the Access Request Forms on Service Now, where you fill out the "IT Hosted Databases" form with the information described above. If you already have an Oracle Hotel account you want to use, it may be faster to go to that form directly.

Also, Elsevier has just added to Pure a new JSON-based web services API, which we are currently evaluating. We had originally planned to design and implement our own web services API. However, it seems the Pure JSON-based API may meet user needs, so we are deferring that plan for now.

Announcements & Discussion

Announcements and discussion about this database and other Experts@Minnesota services happen on both the Experts Data Warehouse-API Google Group and on the #research-activity channel on Tech-People UMN Slack. Both are open to anyone with a UMN internet ID.

Maturity

It may be charitable to assign this database an alpha level of maturity, and it is probably obvious that this repository is a bit of a mess. We want your feedback to help make it better! Our goal was to release a useful, usable product to users as soon as possible. We recognize that we can't give you exactly what you need or want without working directly with you to find out what that is. Please join the user community discussion by following the instructions above.

Data Model Design

Of course the Pure data model heavily influenced the design of this data model, but Citation Style Language (CSL) was at least as big an influence. Again, one goal was vendor-independence. CSL is a popular standard used by many citation managers. Another goal was to find a robust data model already implemented in popular web services APIs. csl-data.json was the best such model we could find. Our upcoming web services will adhere to it as closely as possible.

Organizational Complexity

As the diagrams and documentation below should make painfully obvious, the data model is most complex wherever organizations are involved. We especially welcome feedback from users about these parts of the data model. There are likely many improvements we could make to allow for more convenient and performant queries, as well as conceptual clarity and simplicity.

Entity Relationship Diagram (ERD)

The following diagrams are exports from Oracle SQL Developer.

Scalable version (PDF)

Data Dictionary

The following information, in a separate section for each table, is also stored in the database itself, as comments on all tables and columns. We reproduce it here for convenience.

PERSON

A person, usually an author of research outputs. May be internal or external to UMN.

Column	Description
UUID	Universally-unique ID for the person, generated for this Experts@Minnesota database.
PURE_UUID	Universally-unique ID for the person in our Elsevier Pure database.
PURE_ID	Unique ID for the person in our Elsevier Pure database. For UMN persons whose data we loaded into the Elsevier predecessor product, SciVal, this will be the SciVal ID. For other UMN persons whose data we have loaded into Pure, this will be the UMN employee ID (emplid). For UMN-external persons, this will be NULL. Note that because we have not loaded data for all UMN persons into Pure, some UMN persons will be classified as external in Pure.
ORCID	Open Researcher and Contributor ID for the person.
SCOPUS_ID	Unique ID for the person in the Elsevier Scopus database.
HINDEX	An index that attempts to measure both the productivity and impact of the published work of a scientist or scholar. Used only in some disciplines, so for many persons this will be NULL. More info on blog.scopus.com.
EMPLID	UMN employee ID (emplid).
INTERNET_ID	UMN internet ID.
FIRST_NAME	The given name for the person.
LAST_NAME	The family name for the person.
PURE_INTERNAL	"Y" if Pure classifies the person as UMN-internal, "N" otherwise. Note that, because we have not loaded data for all UMN persons into Pure, some UMN persons will be classified as external in Pure.

PUB

Research output. Named "pub", short for "publication", due to Oracle character-length limits.

Column	Description
UUID	Universally-unique ID for the item, generated for this Experts@Minnesota database.
PURE_UUID	Universally-unique ID for the item in our Elsevier Pure database.
SCOPUS_ID	Unique ID for the item in the Elsevier Scopus database.
PMID	Unique ID for the item in the NCBI PubMed database.
DOI	Digital Object Identifier for the item.
TYPE	Publication type or format of the item. See the CSL spec for a list of values.
TITLE	Primary title of the item.
CONTAINER_TITLE	Title of the container holding the item (e.g. the book title for a book chapter, the journal title for a journal article).
ISSUED	Date the item was issued/published.
ISSUED_PRECISION	Precision of the ISSUED column, in days: 366 (year), 31 (month), 1 (day).
VOLUME	Volume holding the item (e.g. “2” when citing a chapter from book volume 2).
ISSUE	Issue holding the item (e.g. “5” when citing a journal article from journal volume 2, issue 5).
PAGES	Range of pages the item (e.g. a journal article) covers in a container (e.g. a journal issue).
CITATION_TOTAL	Number of citations of the item.
ISSN	International Standard Serial Number.
OWNER_PURE_ORG_UUID	Unique ID for the organization that owns the item in our Elsevier Pure database.
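Because ISSUED is a full date while ISSUED_PRECISION says how much of it is meaningful, consumers will usually want to truncate the date before displaying it. The following is a minimal sketch of that idea, not code from this package. The function name format_issued is hypothetical, and the sketch assumes that the unknown parts of a less precise date are stored as some placeholder value (for example, January 1) that should simply be hidden.

```python
from datetime import date

# Precision values documented for PUB.ISSUED_PRECISION, in days.
YEAR, MONTH, DAY = 366, 31, 1


def format_issued(issued: date, precision: int) -> str:
    """Render ISSUED no more precisely than ISSUED_PRECISION allows.

    Hypothetical helper: assumes the unused month/day parts of ISSUED
    are placeholders that should not be shown.
    """
    if precision == YEAR:
        return f"{issued.year:04d}"
    if precision == MONTH:
        return f"{issued.year:04d}-{issued.month:02d}"
    if precision == DAY:
        return issued.isoformat()
    raise ValueError(f"unexpected ISSUED_PRECISION: {precision!r}")


print(format_issued(date(2015, 1, 1), YEAR))    # 2015
print(format_issued(date(2015, 3, 1), MONTH))   # 2015-03
print(format_issued(date(2015, 3, 17), DAY))    # 2015-03-17
```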

PUB_PERSON

Associates research outputs with persons (authors).

Column	Description
PUB_UUID	Foreign key to PUB.
PERSON_UUID	Foreign key to PERSON.
PERSON_ORDINAL	The position of the person in the author list for the research output in Pure.
PERSON_ROLE	"author" or "editor". Need to find Pure documentation on any other possible values.
PERSON_PURE_INTERNAL	"Y" if Pure classified the person as UMN-internal at the time of publication of the research output, "N" otherwise. Note that, because we have not loaded data for all UMN persons into Pure, some UMN persons will be classified as external in Pure.
FIRST_NAME	The given name for the person as it appears in the author list for the research output in Pure. Note that this may differ from PERSON.FIRST_NAME.
LAST_NAME	The family name for the person as it appears in the author list for the research output in Pure. Note that this may differ from PERSON.LAST_NAME.
EMPLID	De-normalization column. See the description in PERSON.
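As an illustration of how PERSON_ORDINAL, PERSON_ROLE, and the per-publication name columns fit together, here is a small sketch that rebuilds an ordered author list from PUB_PERSON rows already fetched into Python. The PubPerson tuple, the author_list function, and the sample rows are all hypothetical; only the column names come from the table above.

```python
from typing import List, NamedTuple


class PubPerson(NamedTuple):
    """A subset of PUB_PERSON columns, as a plain Python record."""
    pub_uuid: str
    person_ordinal: int
    person_role: str
    first_name: str
    last_name: str


def author_list(rows: List[PubPerson], pub_uuid: str) -> List[str]:
    """Return 'Last, First' names for one pub, ordered by PERSON_ORDINAL."""
    authors = sorted(
        (r for r in rows if r.pub_uuid == pub_uuid and r.person_role == "author"),
        key=lambda r: r.person_ordinal,
    )
    return [f"{r.last_name}, {r.first_name}" for r in authors]


# Made-up sample data, not real records.
rows = [
    PubPerson("pub-1", 2, "author", "Ada", "Lovelace"),
    PubPerson("pub-1", 1, "author", "Alan", "Turing"),
    PubPerson("pub-1", 3, "editor", "Grace", "Hopper"),
    PubPerson("pub-2", 1, "author", "Ada", "Lovelace"),
]

print(author_list(rows, "pub-1"))  # ['Turing, Alan', 'Lovelace, Ada']
```

The names come from PUB_PERSON rather than PERSON because, as noted above, the name recorded in a given author list may differ from the name on the person record.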

PURE_ORG

An organization (e.g. university, college, department, etc.) in Pure. May be internal or external to UMN. Pure requires all UMN-internal organizations to be part of a single hierarchy, with UMN itself as the root. Note that sometimes we combine multiple UMN departments into one Pure organization. UMN-external organizations are never part of a hierarchy in Pure, and Pure gives us limited information for them in general.

Column	Description
PURE_UUID	Universally-unique ID for the organization in our Elsevier Pure database.
PURE_ID	Unique ID for the organization in our Elsevier Pure database. NULL for UMN-external organizations, and some UMN-internal organizations.
PARENT_PURE_UUID	Universally-unique ID for the parent organization in our Elsevier Pure database. NULL for UMN-external organizations.
PARENT_PURE_ID	Unique ID for the parent organization in our Elsevier Pure database. NULL for UMN-external organizations, and some UMN-internal organizations.
PURE_INTERNAL	"Y" if Pure classifies the organization as UMN-internal, "N" otherwise.
TYPE	"academic", "college", "corporate", "department", "government", "initiative", "institute", "medical", "private non-profit", "university", or "unknown"
NAME_EN	Name of the organization. Called "name_en" to be consistent with Pure naming, and to indicate that this is an English name.
NAME_VARIANT_EN	An alternative name of the organization. Called "name_variant_en" to be consistent with Pure naming, and to indicate that this is an English name.
URL	The website for the organization.
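Since each UMN-internal row points to its parent through PARENT_PURE_UUID, the hierarchy can be walked as an adjacency list. The sketch below does this in plain Python over a mapping of PURE_UUID to PARENT_PURE_UUID that is assumed to have been loaded from PURE_ORG already. The ancestors function and the short placeholder IDs are hypothetical, used only for illustration.

```python
from typing import Dict, List, Optional


def ancestors(pure_uuid: str, parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Return the chain of parents from the given org up to the root.

    parent_of maps PURE_UUID -> PARENT_PURE_UUID (None when there is no parent).
    """
    chain: List[str] = []
    seen = {pure_uuid}
    current = parent_of.get(pure_uuid)
    while current is not None:
        if current in seen:
            # Guard against bad data; a valid hierarchy has no cycles.
            raise ValueError(f"cycle detected at {current!r}")
        chain.append(current)
        seen.add(current)
        current = parent_of.get(current)
    return chain


# Placeholder IDs standing in for real PURE_UUID values.
parent_of = {"umn": None, "college": "umn", "department": "college"}

print(ancestors("department", parent_of))  # ['college', 'umn']
```

UMN-external organizations have a NULL PARENT_PURE_UUID, so for them this function returns an empty list.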

PURE_INTERNAL_ORG

The hierarchy (tree) of Pure UMN-internal organizations. This tree uses nested sets, as implemented by the Python package sqlalchemy_mptt. However, because Oracle supports recursive queries, this may not be the best implementation. Because parent-child relationships (adjacency lists) already exist in the PURE_ORG table, this entire table may be unnecessary and may go away.

Column	Description
ID	The unique ID for the node. Defined by sqlalchemy_mptt.
LFT	The left number for the node. Defined by sqlalchemy_mptt.
PARENT_ID	The unique ID for the parent of the node. Defined by sqlalchemy_mptt.
level	The depth (i.e. generation) of this node in the tree. Defined by sqlalchemy_mptt.
RGT	The right number for the node. Defined by sqlalchemy_mptt.
TREE_ID	The unique ID of the tree that contains the node. Defined by sqlalchemy_mptt, which supports multiple trees in a single table.
PURE_UUID	See the description in PURE_ORG.
PURE_ID	See the description in PURE_ORG.
NAME_EN	See the description in PURE_ORG.
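For readers unfamiliar with nested sets, the following sketch shows how the LFT and RGT numbers encode the tree: a node is a descendant of another when its interval lies strictly inside the other's interval. This is a plain-Python illustration of the general technique, not a use of sqlalchemy_mptt itself. The Node tuple, the helper functions, and the sample tree are hypothetical, and the sketch assumes all nodes share one TREE_ID.

```python
from typing import List, NamedTuple


class Node(NamedTuple):
    """A subset of PURE_INTERNAL_ORG columns."""
    name_en: str
    lft: int
    rgt: int


def descendants(nodes: List[Node], ancestor: Node) -> List[Node]:
    """Nodes whose (lft, rgt) interval is strictly inside the ancestor's."""
    return [n for n in nodes if ancestor.lft < n.lft and n.rgt < ancestor.rgt]


def is_leaf(node: Node) -> bool:
    """A leaf has no room between its left and right numbers."""
    return node.rgt == node.lft + 1


# Made-up tree:
#   University (1, 8)
#     College A (2, 5)
#       Department X (3, 4)
#     College B (6, 7)
university = Node("University", 1, 8)
college_a = Node("College A", 2, 5)
dept_x = Node("Department X", 3, 4)
college_b = Node("College B", 6, 7)
nodes = [university, college_a, dept_x, college_b]

print([n.name_en for n in descendants(nodes, university)])
# ['College A', 'Department X', 'College B']
print([n.name_en for n in descendants(nodes, college_a)])
# ['Department X']
```

The same interval comparison can be written as a single SQL predicate, which is the main appeal of nested sets over walking parent links one level at a time.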

UMN_DEPT_PURE_ORG

Associates UMN departments with Pure organizations. Note that many UMN departments may map to one Pure organization.

Column	Description
UMN_DEPT_ID	Unique ID for the UMN department in PeopleSoft.
UMN_DEPT_NAME	Name of the UMN department in PeopleSoft. De-normalization column.
PURE_ORG_UUID	Foreign key to PURE_ORG.
PURE_ORG_ID	De-normalization column. See the description in PURE_ORG.

UMN_PERSON_PURE_ORG

Associates persons that Pure classifies as UMN-internal with Pure organizations. We use this table, in addition to PERSON_PURE_ORG, because Pure attaches far more data to UMN-internal persons, some of which we use to ensure row uniqueness. Note that there are four columns in the primary key: PURE_ORG_UUID, PERSON_UUID, JOB_DESCRIPTION, and START_DATE. This is because UMN-internal persons may change positions, and also organization affiliations, over time. There may be multiple rows for the same person in this table.

Column	Description
PERSON_UUID	Foreign key to PERSON.
PURE_ORG_UUID	Foreign key to PURE_ORG.
JOB_DESCRIPTION	The description of this job in PeopleSoft. It may be better to use a job code here instead.
EMPLOYED_AS	Always "Academic" for the data we have loaded so far. Uncertain whether we will have other values in the future.
STAFF_TYPE	"academic" or "nonacademic".
START_DATE	The date the person started this job with this organization.
END_DATE	The date the person ended this job with this organization.
PRIMARY	"Y" if this is the person's primary organization affiliation, otherwise "N".
EMPLID	De-normalization column. See the description in PERSON.
PURE_PERSON_ID	De-normalization column. See the description for PERSON.PURE_ID.
PURE_ORG_ID	De-normalization column. See the description for PURE_ORG.PURE_ID.
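Because one person can have several rows in this table over time, a common need is to find the affiliations in effect on a particular date. Here is a rough sketch of that filter over rows already fetched into Python. The Affiliation tuple, the active_on function, and the sample data are hypothetical. The sketch also assumes that a NULL END_DATE (None here) means the job is ongoing, which this documentation does not confirm.

```python
from datetime import date
from typing import List, NamedTuple, Optional


class Affiliation(NamedTuple):
    """A subset of UMN_PERSON_PURE_ORG columns."""
    person_uuid: str
    pure_org_uuid: str
    job_description: str
    start_date: date
    end_date: Optional[date]  # ASSUMPTION: None means the job has not ended


def active_on(rows: List[Affiliation], person_uuid: str, when: date) -> List[Affiliation]:
    """Return the person's affiliations whose date range contains `when`."""
    return [
        r
        for r in rows
        if r.person_uuid == person_uuid
        and r.start_date <= when
        and (r.end_date is None or when <= r.end_date)
    ]


# Made-up sample rows with placeholder IDs.
old_job = Affiliation("p1", "org1", "Professor", date(2010, 8, 30), date(2015, 5, 31))
new_job = Affiliation("p1", "org2", "Professor", date(2015, 6, 1), None)
other = Affiliation("p2", "org1", "Lecturer", date(2012, 1, 1), None)
rows = [old_job, new_job, other]

print(active_on(rows, "p1", date(2012, 1, 1)) == [old_job])  # True
print(active_on(rows, "p1", date(2020, 1, 1)) == [new_job])  # True
```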

PERSON_PURE_ORG

Associates persons with their organizations.

Column	Description
PERSON_UUID	Foreign key to PERSON.
PURE_ORG_UUID	Foreign key to PURE_ORG.

PUB_PERSON_PURE_ORG

Associates persons with their organization affiliations at the time of publication of a research output.

Column	Description
PUB_UUID	Foreign key to PUB.
PERSON_UUID	Foreign key to PERSON.
PURE_ORG_UUID	Foreign key to PURE_ORG.

Python API

Required environment variables

This package uses these environment variables to configure Oracle connections:

Variable	Description
EXPERTS_DB_USER	Database connection username
EXPERTS_DB_PASS	Database connection password
EXPERTS_DB_SERVICE_NAME	Oracle database service name (e.g. TNSNAMES definition)
EXPERTS_DB_HOSTNAME	Oracle database domain name. Necessary for making cx_Oracle connections.
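To show how these four variables could combine into a single connection string, here is a minimal sketch that builds a SQLAlchemy-style URL for the cx_Oracle dialect. This is not necessarily how the package connects internally. The function name build_oracle_url is hypothetical, and the port 1521 is an assumption (Oracle's conventional listener port), since no port variable is documented above.

```python
import os
from typing import Mapping
from urllib.parse import quote_plus


def build_oracle_url(environ: Mapping[str, str] = os.environ, port: int = 1521) -> str:
    """Build an oracle+cx_oracle URL from the EXPERTS_DB_* variables.

    Raises KeyError if a required variable is missing.
    ASSUMPTION: port 1521; adjust to match your connection information.
    """
    user = quote_plus(environ["EXPERTS_DB_USER"])
    password = quote_plus(environ["EXPERTS_DB_PASS"])  # escape special characters
    host = environ["EXPERTS_DB_HOSTNAME"]
    service = quote_plus(environ["EXPERTS_DB_SERVICE_NAME"])
    return f"oracle+cx_oracle://{user}:{password}@{host}:{port}/?service_name={service}"


# Example with placeholder values instead of real credentials.
example_env = {
    "EXPERTS_DB_USER": "some_user",
    "EXPERTS_DB_PASS": "p@ss",
    "EXPERTS_DB_HOSTNAME": "db.example.edu",
    "EXPERTS_DB_SERVICE_NAME": "svc",
}
print(build_oracle_url(example_env))
# oracle+cx_oracle://some_user:p%40ss@db.example.edu:1521/?service_name=svc
```

The resulting string can be passed to SQLAlchemy's create_engine, provided cx_Oracle and the Oracle client libraries are installed.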

Contributing

Running Migrations

This project uses SQLAlchemy with Alembic to track database schema changes in migrations.

Alembic has been configured to read the required environment variables from the environment if already present, or to load them automatically from dotenv files using dotenv_switch. Create dotenv files in the project directory, e.g., .env.test, .env.prod, as needed.

dotenv_switch will use the test environment by default; instruct it to load a different environment by setting the APP_ENV variable. For instance APP_ENV=prod will cause dotenv_switch to load .env.prod.
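The selection rule described above amounts to a small lookup. The sketch below illustrates it in plain Python. It is not the dotenv_switch implementation, and the function name dotenv_filename is hypothetical.

```python
import os
from typing import Mapping


def dotenv_filename(environ: Mapping[str, str] = os.environ, default_env: str = "test") -> str:
    """Return the dotenv file name implied by APP_ENV, defaulting to test."""
    app_env = environ.get("APP_ENV") or default_env
    return f".env.{app_env}"


print(dotenv_filename({}))                   # .env.test
print(dotenv_filename({"APP_ENV": "prod"}))  # .env.prod
```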

Using Alembic with poetry and dotenv_switch

This project uses poetry to manage dependencies, so Alembic must be invoked within a poetry-created virtual environment. One way to do that is by preceding commands with poetry run. Examples, including the use of the APP_ENV variable described above:

# Run migrations against test environment (default)
$ poetry run alembic upgrade head

# Run migrations against prod environment
$ APP_ENV=prod poetry run alembic upgrade head

# Upgrade to a specific revision from alembic/versions
$ poetry run alembic upgrade fd1239632c06

# Downgrade to a previous revision
$ poetry run alembic downgrade fd1239632c06
