Requires the Go programming language binaries, with the GOPATH environment variable set and $GOPATH/bin in your PATH.
go get github.com/uncharted-distil/distil-ingest
Clone the repository:
mkdir -p $GOPATH/src/github.com/unchartedsoftware
cd $GOPATH/src/github.com/unchartedsoftware
git clone git@github.com:uncharted-distil/distil-ingest.git
Install dependencies:
cd distil-ingest
make install
Build executable:
make build
The repository contains CLIs used to parse and ingest D3M OpenML datasets (those with a name beginning with o_) into Elasticsearch.
- Download D3M datasets of interest from https://datadrivendiscovery.org/data and unzip.
- Update and ensure the arguments in ./merge_all.sh are correct.
- Run ./merge_all.sh
- Update and ensure the arguments in ./classify_all.sh are correct.
- Run ./classify_all.sh
- Update and ensure the arguments in ./ingest_all.sh are correct.
- Run ./ingest_all.sh
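
The three scripts above depend on each other's output, so it can help to run them in order and stop at the first failure. The following is a minimal sketch; the helper name run_stages is hypothetical and not part of the repository, and it assumes you run it from the repository root after editing each script's arguments.

```shell
# Run each command given as an argument, in order.
# Stops and returns 1 as soon as one of them exits with a non-zero status.
run_stages() {
  for stage in "$@"; do
    echo "running $stage"
    "$stage" || { echo "$stage failed" >&2; return 1; }
  done
}

# Usage, from the repository root (script names taken from the steps above):
#   run_stages ./merge_all.sh ./classify_all.sh ./ingest_all.sh
```

Stopping early avoids classifying or ingesting data from a merge step that did not complete.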
- The Elasticsearch instance does not have http.compression enabled.
- The mappings JSON argument is invalid, most likely missing a closing bracket.
- You are accessing an Elasticsearch instance that requires a VPN and it is not on.
- The Elasticsearch instance is temporarily down.
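
Two of the causes listed above can be checked from the shell before re-running the ingest. This is a rough sketch rather than part of the repository: the helper names, the file name mappings.json, and the URL http://localhost:9200 are assumptions, so substitute your own values. It relies on python3 and curl being installed.

```shell
# Return 0 if the file at $1 parses as JSON.
# Uses Python's standard json.tool module, which fails on a missing bracket.
is_valid_json() {
  python3 -m json.tool "$1" > /dev/null 2>&1
}

# Return 0 if an HTTP request to the URL in $1 succeeds within 5 seconds.
# A failure here can mean the instance is down or the VPN is off.
es_reachable() {
  curl --silent --fail --max-time 5 "$1" > /dev/null
}

# Usage (file name and URL are placeholders):
#   is_valid_json mappings.json || echo "mappings JSON is invalid"
#   es_reachable http://localhost:9200 || echo "Elasticsearch is unreachable"
```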
- Cause: $GOPATH/bin has not been added to your $PATH.
- Solution: Add export PATH=$PATH:$GOPATH/bin to your .bash_profile or .bashrc.
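
To confirm whether this cause applies, you can test if a directory is already an entry in PATH. The sketch below uses a hypothetical helper, path_contains, which matches whole colon-separated entries so that a longer path sharing the same prefix does not count as a match.

```shell
# Return 0 if directory $1 is an entry in the colon-separated list $2.
# Wrapping both sides in colons makes the match apply to whole entries only.
path_contains() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (assumes GOPATH is set, as the prerequisites require):
#   path_contains "$GOPATH/bin" "$PATH" || echo "GOPATH/bin is not on PATH"
```

After editing .bash_profile or .bashrc, open a new shell or source the file so the change takes effect.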