DataAssistant

DataAssistant is an artificial intelligence library that can generate GraphQL queries from natural language prompts, with the help of a specifically trained neural network.

The project uses a chat version of TinyLlama to retrain the model using a provided dataset.

Why?

The goal of this project is to create a tool that can help users, developers and database administrators to easily query databases without having to learn SQL or other query languages.

Advantages

Easy to use: The user only needs to write a natural language prompt, and the library will generate the corresponding query.
Single point of entry: Using the recommended Apollo Graphql Server as a backend, the user can query multiple databases using a single endpoint, without having to migrate data or change the database schema.
Flexibility: The library can be used with any database, parts of your database schema should be passed to the model to get better results.

Setup

Preparation

By default the model tries to use an ROCm instance. Change your pip packages accordingly.

Install Python requirements using:

pip install -r requirements.txt

Commands

App Arguments:

--action (evaluate/serve/train/visualize) What action to perform with the model
- evaluate: Run some examples on the model and store the results in results.json
- serve: Run a simple REST API that can generate a query for a request
- train: Train the model using the 'data/training_data.json' dataset
- visualize: Create a head view and a model view using BertViz
--dtype (float16/float32) What data type to use with your model
--device (gpu/cpu) Which device to run the model on

Before training the model will download a version of TinyLlama-Chat model to fine tune, then it will save the model to the 'model' directory.

After training the 'evaluate', 'serve' and 'visualize' actions will load the trained model from the 'model' directory.

Training Data

The library uses a neural network to generate queries from natural language prompts, and the neural network needs to be trained with a dataset of natural language prompts and their corresponding queries.

You can find the training data used for the example in the data/training_data.json file.

Documentation

Thesis and technical documentation

Name		Name	Last commit message	Last commit date
Latest commit History 36 Commits
.docs		.docs
data		data
dbassistant/exceptions		dbassistant/exceptions
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
finetune.sh		finetune.sh
main.py		main.py
requirements.txt		requirements.txt
results.json		results.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DataAssistant

Why?

Advantages

Setup

Preparation

Commands

Training Data

Documentation

About

Languages

License

RawEnchilada/DataAssistant-Thesis

Folders and files

Latest commit

History

Repository files navigation

DataAssistant

Why?

Advantages

Setup

Preparation

Commands

Training Data

Documentation

About

Topics

Resources

License

Stars

Watchers

Forks

Languages