This repository contains the Elasticsearch index definition for the datasets.
The definition for the dataassets index is contained in the file index-dataassets-sts.json.
Sample data is provided under the data folder in the file datahub-dataassets-all-sample.txt.
Bash shell scripts are provided to create the index and load the sample data into the created index.
The provided scripts run in a bash shell and call curl, so the following steps assume that both tools are already installed.
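A quick way to confirm that assumption before running anything is a small pre-flight check. The sketch below is not part of the provided scripts; the function name require_tools is hypothetical and chosen only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of the provided scripts.
# Returns non-zero and names the first tool that is missing from PATH.
require_tools() {
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      return 1
    fi
  done
}
```

Calling require_tools bash curl before the steps below stops early with a readable message instead of failing partway through a script.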
Command Format
create-indexes.sh INDEX HOST
- INDEX : Name of the Index to be created
- HOST : Host address including port and protocol (http/https)
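For orientation, creating an index in Elasticsearch 7.x comes down to one PUT request that carries the definition file as its body. The sketch below is not the contents of create-indexes.sh; the function name create_index and the default definition path are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not the repository's create-indexes.sh.
# Usage: create_index INDEX HOST [DEFINITION_FILE]
create_index() {
  local index="$1" host="$2"
  # Assumed default path; pass the real definition file as the third argument.
  local definition="${3:-schemas/${index}-index.json}"

  if [ -z "$index" ] || [ -z "$host" ]; then
    echo "usage: create_index INDEX HOST [DEFINITION_FILE]" >&2
    return 2
  fi

  # PUT /<index> creates the index with the settings and mappings in the file.
  # ${host%/} drops a trailing slash so the URL has exactly one separator.
  curl -sS -X PUT "${host%/}/${index}" \
    -H 'Content-Type: application/json' \
    --data-binary "@${definition}"
}
```

With a local instance, a call might look like create_index dataassets http://localhost:9200, where the host and port are placeholders for your own Elasticsearch address.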
Command Format
load_sample_data.sh INDEX HOST DATAFILE
- INDEX: Name of the Index where the data will be loaded.
- HOST: Host address including port and protocol (http/https)
- DATAFILE: File in text format that includes the data to bulk insert into the Index.
- See the Elasticsearch Bulk API documentation for the required file format.
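As a rough illustration of that format, a bulk file is newline-delimited JSON in which each document takes two lines: an action line followed by the document source, with a newline after the last line. The sketch below writes a tiny example file and posts it to the _bulk endpoint. It is not the contents of load_sample_data.sh; the function names and the field names ("name", "description") are made up for illustration and are not taken from the real sample data.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not the repository's load_sample_data.sh.

# Write a two-document bulk file. The document fields are illustrative only.
write_example_bulk_file() {
  local datafile="$1"
  cat > "$datafile" <<'EOF'
{"index":{"_id":"example-1"}}
{"name":"Example asset one","description":"Illustrative record"}
{"index":{"_id":"example-2"}}
{"name":"Example asset two","description":"Illustrative record"}
EOF
}

# Usage: bulk_load INDEX HOST DATAFILE
bulk_load() {
  local index="$1" host="$2" datafile="$3"
  # --data-binary keeps the newlines that the Bulk API requires;
  # plain -d would strip them when reading from a file.
  curl -sS -X POST "${host%/}/${index}/_bulk" \
    -H 'Content-Type: application/x-ndjson' \
    --data-binary "@${datafile}"
}
```

Because the index name is part of the URL, the action lines do not need to repeat it.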
The required version of Elasticsearch is 7.x.
To install Elasticsearch, refer to the Installing Elasticsearch instructions.
Amazon Web Services (AWS) offers Amazon Elasticsearch Service, which could be a good alternative for hosting an Elasticsearch instance; see the developer guide for instructions.
- create-indexes.sh : Creates an Index in an Elasticsearch instance.
- schemas/dataassets-index.json : Elasticsearch index definition for the data assets index; holds metadata information on data assets.
- schemas/configurations-index.json : Elasticsearch index definition for the configurations index; holds configuration information on ITS DataHub.
- schemas/related-index.json : Elasticsearch index definition for the related index; holds linkage information on related source code assets from ITS CodeHub.
- schemas/metrics-index.json : Elasticsearch index definition for the metrics index; holds usage metrics on ITS DataHub assets.
- data/load_sample_data.sh : Loads data into an existing index in Elasticsearch.
- data/datahub-dataassets-all-sample.txt : Contains sample data to be loaded into a data assets Index.
- data/datahub-related-sample.txt : Contains sample data to be loaded into a related Index.
- data/configurations-document.json : Contains sample data to be loaded into a configurations Index.
- data/datahub-metrics-sample.txt : Contains sample data to be loaded into a metrics Index.
Requires a running and reachable Elasticsearch instance.
- Open a command line window
./create-indexes.sh dataassets [http://elasticsearch-url:port]
- Validate that the index was created by listing the indexes in Elasticsearch.
curl -X GET [http://elasticsearch-url:port]/_cat/indices?v
- The dataassets index should be listed in the response of the previous command.
- Load sample data
./data/load_sample_data.sh dataassets [http://elasticsearch-url:port] ./data/datahub-dataassets-all-sample.txt
- Validate that the data was loaded
curl -X GET [http://elasticsearch-url:port]/dataassets/_search
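The two validation steps above can also be scripted rather than read by eye. The sketch below is one possible approach, not something shipped in this repository: the function names index_exists and document_count are hypothetical. It relies on two standard Elasticsearch endpoints, a HEAD request on the index (200 when it exists, 404 otherwise) and GET /<index>/_count.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for scripting the validation steps.

# Succeeds when HEAD /<index> answers 200, meaning the index exists.
index_exists() {
  local index="$1" host="$2" status
  # -I sends a HEAD request; -w prints only the HTTP status code.
  status="$(curl -s -o /dev/null -w '%{http_code}' -I "${host%/}/${index}")"
  [ "$status" = "200" ]
}

# Prints the number of documents reported by GET /<index>/_count.
document_count() {
  local index="$1" host="$2"
  # Extract the integer after "count" from the JSON response.
  curl -sS -X GET "${host%/}/${index}/_count" |
    sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}
```

A count taken immediately after loading may lag until the index has been refreshed, so a short wait or a second check can be useful.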
- 0.1.0
- Initial version
ITS CodeHub Support Team : [email protected]
Distributed under XYZ license. See LICENSE for more information.
- Fork it (https://github.com/usdot-its-jpo-data-portal/datahub-search/fork)
- Create your feature branch (git checkout -b feature/fooBar)
- Commit your changes (git commit -am 'Add some fooBar')
- Push to the branch (git push origin feature/fooBar)
- Create a new Pull Request
Thank you to the Department of Transportation for funding the development of this project.
- Agency: DOT
- Short Description: Defines and creates Elasticsearch indexes for ITS DataHub.
- Status: Beta
- Tags: transportation, connected vehicles, intelligent transportation systems, python, DMP, Sufficiency Checklist, Elasticsearch
- Labor Hours:
- Contact Name:
- Contact Phone: