GitHub - the-sea-ink/data-science-pipelines

This project is part of a Master's thesis at TU Berlin. It was created and implemented for research and educational purposes only, not for commercial use.

To begin with, install the requirements:

python -m pip install -r requirements.txt

Activate your virtual environment and install Regraph by running the following command inside the third_library/Regraph folder:

python setup.py install
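The two install steps above can also be scripted. The following is a minimal sketch, not part of the project: the helper name `install_steps` is hypothetical, and the Regraph folder path is taken verbatim from the instructions above. It only builds the commands and does not run them.

```python
import sys
from pathlib import Path


def install_steps(project_root):
    """Return the two install steps as (argv, working_dir) pairs.

    Hypothetical helper for illustration only. Nothing is executed here;
    each pair can be passed to subprocess.run(argv, cwd=working_dir, check=True)
    from inside an activated virtual environment.
    """
    root = Path(project_root)
    return [
        # Step 1: install the Python requirements from the project root.
        ([sys.executable, "-m", "pip", "install", "-r", "requirements.txt"], root),
        # Step 2: install Regraph from its folder (path as given in the text).
        ([sys.executable, "setup.py", "install"], root / "third_library" / "Regraph"),
    ]


if __name__ == "__main__":
    for argv, working_dir in install_steps("."):
        print(working_dir, argv)
```

Using `sys.executable` instead of a bare `python` keeps both steps on the interpreter of the currently active virtual environment.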

To use the CLI:

Activate the virtual environment.
cd into the project folder data-science-pipelines.
Run any of the commands listed below.

Commands:

initialize database:    python data_science_pipelines init_db 
create pipeline:        python data_science_pipelines create_pipeline path_to_script language --hpath hook_path --wtfile True --opath output_file_path 
extract rule:           python data_science_pipelines extract_rule path_to_g1 path_to_g2 rule_type 
confirm rule:           python data_science_pipelines path_to_pattern path_to_result rule_name rule_description language rule_type rule_priority
list rules by language: python data_science_pipelines list_rules language 
visualize rule:         python data_science_pipelines visualize_rule rule_name 
delete a rule:          python data_science_pipelines delete_rule rule_name
add a rule:             python data_science_pipelines add_rule path_to_rule
add new module to kb:   python data_science_pipelines add_module module_name module_version date(yyyy-mm-dd) language
add function to kb:     python data_science_pipelines add_function module_name function_title function_description function_language ds_task --dlink link_to_documentation
add data science task:  python data_science_pipelines add_ds_task module_name function_title language ds_task
add description:        python data_science_pipelines add_description module_name function_title language description
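To call the CLI from a script, the `create_pipeline` line above can be assembled as an argument list. This is a rough sketch under assumptions: the helper name `create_pipeline_argv` is hypothetical, and it assumes that `--hpath`, `--wtfile`, and `--opath` are optional, which the command list does not state. The argument values are the placeholders from the command list, not real paths.

```python
import shlex
import sys


def create_pipeline_argv(script_path, language, hpath=None, wtfile=None, opath=None):
    """Build the argument list for the create_pipeline command.

    Hypothetical helper. Assumption: the three flags are optional, so a flag
    is only appended when a value is supplied for it.
    """
    argv = [sys.executable, "data_science_pipelines", "create_pipeline",
            script_path, language]
    if hpath is not None:
        argv += ["--hpath", hpath]
    if wtfile is not None:
        # The command list passes the literal value True for this flag.
        argv += ["--wtfile", str(wtfile)]
    if opath is not None:
        argv += ["--opath", opath]
    return argv


if __name__ == "__main__":
    # Placeholder arguments copied from the command list; replace with real values.
    argv = create_pipeline_argv("path_to_script", "language",
                                wtfile=True, opath="output_file_path")
    print(shlex.join(argv))
```

The resulting list can be handed to `subprocess.run(argv, check=True)` from the project folder. Passing a list rather than a single string avoids shell quoting problems when paths contain spaces.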

To populate our knowledge base, we used publicly available information from Pandas (BSD-3-Clause license) and Scikit-learn (BSD-3-Clause license) documentation. We also used some of the data provided by the Data Science Ontology project (CC-BY-4.0 license).
