Utilities for parsing the ITS DataHub Metadata Questionnaire and ingesting dataset metadata into ITS DataHub.
These instructions will get you a copy of the project up and running on your local machine for use, development, and testing purposes.
- Have access to Python 3.6+. You can check your Python version by entering
python --version
and
python3 --version
in the command line.
- Have access to the command line of a machine. If you're using a Mac, the command line can be accessed via the Terminal, which comes with Mac OS. If you're using a PC, the command line can be accessed via the Command Prompt, which comes with Windows, or via Cygwin64, a suite of open source tools that allow you to run something similar to Linux on Windows.
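The version requirement can also be checked from inside Python, which helps when it is unclear which interpreter a script will actually run under. The following is a minimal sketch; the helper name python_is_supported is made up for illustration and is not part of this package.

```python
import sys

MIN_VERSION = (3, 6)  # minimum version stated in the prerequisites


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    # Report the running interpreter's version and whether it qualifies.
    print("Python", sys.version.split()[0], "supported:", python_is_supported())
```

Comparing tuples rather than version strings avoids the pitfall where "3.10" sorts before "3.6" as text.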
- Download the script by cloning the module's code repository on GitHub. You can do so by running one of the following commands in the command line. If you are unfamiliar with how to clone a repository, follow the official GitHub guide.
- via HTTP:
git clone https://github.com/usdot-its-jpo-data-portal/metadata_ingest.git
- via SSH (if using 2-factor authentication):
git clone git@github.com:usdot-its-jpo-data-portal/metadata_ingest.git
- Navigate into the repository folder by entering
cd metadata_ingest
in the command line.
- Run
pip install -e .
to install the metadata_ingest Python package.
- Install the required packages by running
pip install -r requirements.txt
- Update the metadata_ingest/const.py file with your credentials, OR copy the CONFIG SECTION of the const.py file to create metadata_ingest/const_local.py and update the const_local.py file with your Socrata credentials.
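The const.py and const_local.py arrangement suggests that local settings override the committed defaults, which keeps credentials out of version control. How this package actually loads const_local.py is not described here, so the following is only a sketch of one common pattern. The function load_settings and the setting names (SOCRATA_DOMAIN, SOCRATA_USERNAME, SOCRATA_PASSWORD) are hypothetical; check the CONFIG SECTION of const.py for the real names.

```python
import importlib


def load_settings(defaults, local_module="const_local"):
    """Return defaults overlaid with upper-case names from an optional module."""
    settings = dict(defaults)
    try:
        local = importlib.import_module(local_module)
    except ImportError:
        return settings  # no local override module is present
    for name in dir(local):
        if name.isupper():  # treat only UPPER_CASE names as settings
            settings[name] = getattr(local, name)
    return settings


# Hypothetical setting names, for illustration only.
DEFAULTS = {"SOCRATA_DOMAIN": "", "SOCRATA_USERNAME": "", "SOCRATA_PASSWORD": ""}
settings = load_settings(DEFAULTS)
```

With this pattern, const_local.py can be listed in .gitignore so that credentials stay on the local machine.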
Run python metadata_ingest/form_parsers.py to test parsing the sample metadata questionnaire included in the forms folder of this repository. The parsed information will be shown in the command line interface.
Run python metadata_ingest/socrata_ingestor.py to test creating a dataset in the Socrata platform of your choice, using the metadata information from the sample Metadata Questionnaire at forms/ITSJPO_MetadataQuestionnaire_fillable_sample.pdf.
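To give a sense of what parsing a fillable PDF involves, the sketch below reads form field values directly with PyPDF2. It is not this package's implementation: the helper names field_values and read_pdf_fields are invented for illustration, and the sketch assumes PyPDF2 is installed and that the form stores each answer under the standard "/V" key.

```python
def field_values(fields):
    """Map each form field name to its filled-in value ('/V'), or None if blank."""
    return {name: field.get("/V") for name, field in (fields or {}).items()}


def read_pdf_fields(path):
    """Read field values from a fillable PDF using PyPDF2 (must be installed)."""
    import PyPDF2  # imported lazily so field_values works without it

    with open(path, "rb") as handle:
        if hasattr(PyPDF2, "PdfReader"):
            # Newer PyPDF2 releases expose PdfReader.get_fields().
            fields = PyPDF2.PdfReader(handle).get_fields()
        else:
            # Older PyPDF2 releases expose PdfFileReader.getFields().
            fields = PyPDF2.PdfFileReader(handle).getFields()
        # Resolve the values while the file is still open.
        return field_values(fields)
```

Calling read_pdf_fields on the sample questionnaire in the forms folder should return a dictionary keyed by field name. The package's own parser classes add questionnaire-specific structure on top of raw fields like these.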
The form parsers and Socrata dataset creator can be imported into your own code by adding the following statements:
from metadata_ingest.form_parsers import ITSMetadataQuestionnaire, PDFQuestionnaire
from metadata_ingest.socrata_ingestor import SocrataDataset
Sample usage is provided in the demo.ipynb file in this repository.
- Python 3.6+
- PyPDF2: Python package used to parse the fillable PDF.
- socrata-py: Python SDK for the Socrata Data Management API.
- requests: elegant and simple HTTP library for Python.
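A quick way to confirm that the dependencies above are installed in the active environment is to look up each module without importing it. This is a minimal sketch; missing_modules is a hypothetical helper, and "socrata" is assumed to be the import name that socrata-py installs.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Top-level import names; "socrata" is the assumed import name for socrata-py.
REQUIRED = ["metadata_ingest", "PyPDF2", "socrata", "requests"]

missing = missing_modules(REQUIRED)
print("Missing modules:", ", ".join(missing) if missing else "none")
```

If anything is reported missing, rerunning pip install -e . and pip install -r requirements.txt in the repository folder is the first thing to try.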
- Fork it
- Create your feature branch (git checkout -b feature/fooBar)
- Commit your changes (git commit -am 'Add some fooBar')
- Push to the branch (git push origin feature/fooBar)
- Create a new Pull Request
Please read CONTRIBUTING.md for general good practices on code of conduct, and the process for submitting pull requests.
This project is licensed under the Apache 2.0 License. See the LICENSE file for details.
- 1.1.0
- Added "Section D. Promotion" section for promotion/communication questions to metadata questionnaire.
- 1.0.0
- Initial version
ITS DataHub Team: [email protected]
Distributed under the Apache 2.0 License. See LICENSE for more information.
Thank you to the Department of Transportation for funding to develop this project.
- Agency: DOT
- Short Description: Python package for parsing ITS DataHub Metadata Questionnaire and ingesting metadata of datasets to the U.S. DOT's Socrata platform (data.transportation.gov).
- Status: Beta
- Tags: transportation, fillable PDF, intelligent transportation systems, python, ITS DataHub
- Labor Hours: 0
- Contact Name: Brian Brotsos
- Contact Phone: 202-366-9013