GitHub - splicing-ai/splicing: Splicing: Gen-AI Copilot for Data Engineering

Open-Source AI Copilot for Effortless Data Pipeline Building

Key Features

Notebook-style interface with chat capabilities in a web UI: Work on your data pipelines in a familiar Jupyter notebook interface, while the AI copilot assists and guides you by generating, executing, and debugging data engineering code throughout the process.
No vendor lock-in: Build your data pipelines with any data stack of your choice, and select the LLM you prefer for your copilot, with full flexibility.
Fully customizable: Break down your pipeline into multiple components—such as data movement, transformation, and more—and tailor each component to your specific needs. Splicing then seamlessly assembles these components into a complete, functional data pipeline.
Secure and manageable: Host Splicing on your own infrastructure, with full control over your data and LLMs. Your data and secret keys are never shared with LLM providers at any time.

Quick Start

The easiest way to run Splicing is in Docker:

Install Docker.
Run the following command to start Splicing:

docker run -v $(pwd)/.splicing:/app/.splicing \
  -p 3000:3000 \
  -p 8000:8000 \
  -it --rm splicingai/splicing:latest

By default, all application data is stored in the ./.splicing folder in the directory where you run the command above. To keep your data, back up this folder.

Navigate to http://localhost:3000/ to access the web UI.
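
Since all application data lives in the ./.splicing folder, one simple way to back it up is to archive it with tar. The following is a minimal sketch, not part of Splicing itself: the function name backup_splicing and the archive naming scheme are hypothetical, and it assumes you run it from the same directory where you started the container.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Splicing): archive the data folder
# into a gzip-compressed tarball and print the archive name on success.
backup_splicing() {
  # $1: data folder to archive (defaults to the Quick Start location)
  src="${1:-.splicing}"
  # $2: archive name (defaults to a timestamped file in the current directory)
  dest="${2:-splicing-backup-$(date +%Y%m%d-%H%M%S).tar.gz}"

  # Refuse to create an empty archive if the folder is missing.
  if [ ! -d "$src" ]; then
    echo "No data folder found at $src" >&2
    return 1
  fi

  tar -czf "$dest" "$src" && echo "$dest"
}

# Example usage (run from the directory that contains .splicing):
#   backup_splicing
#   backup_splicing .splicing /path/to/backups/splicing.tar.gz
```

To restore, extract the archive in the directory where you plan to run the docker command, for example with tar -xzf followed by the archive name. Stopping the container before archiving is a reasonable precaution so that files are not being written mid-backup.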

You can also install Splicing without Docker for development by following the instructions in the CONTRIBUTING guide.

Roadmap

Data pipeline deployment: Support deploying data pipelines to your production environments with a push-to-deploy experience.
More data pipeline components: Support for more essential components in data pipelines, such as data quality checks and data lineage.
More integrations:
- Support a wide range of data integrations in data pipelines (e.g., various data sources and warehouses).
- Support more LLMs as copilots (e.g., Claude and local models).
- Streamline the source code structure to make it easier for the community to add integrations.
Smarter copilot: Enhance the copilot with more capabilities, such as automatically generating semantic models and ER diagrams for data in warehouses, making it easier to build data pipelines.

Resources

Tech Stack

Frontend: Next.js, Tailwind CSS, and shadcn/ui
Backend: FastAPI and Redis
Agentic framework: LangGraph

Contributing

Please refer to CONTRIBUTING.md for more details.

FAQs

What are the primary use cases for Splicing?

Splicing assists in building data pipelines, including tasks like data ingestion, transformation, and orchestration, to prepare your data for downstream processes such as data analysis and machine learning.

Who is Splicing for?

Splicing is designed for data engineers, data scientists, and anyone who needs to build data pipelines. Even if you have limited data engineering experience, the Splicing copilot guides you step by step, and you can ask for help at any time in natural language.

How is Splicing different from other code generation tools and AI copilots?

Splicing is specifically designed for data engineering, a field with many complex choices that hasn't fully adopted generative AI for productivity. Unlike generic tools, Splicing focuses on optimizing language models for the fixed steps common in data pipelines. It's also deeply integrated with data sources and tools, allowing the copilot to understand your project's context—your configurations, data, and more—leading to more accurate and useful code generation compared to general-purpose copilots.

How secure is Splicing? Will my data be shared?

Splicing is open-source and can be hosted on your own infrastructure. Your data and secret keys are never shared with us or any LLM providers by design. Additionally, the Splicing Copilot doesn't automatically execute generated code—you control when and how it's run.

Can I run data pipelines built with Splicing elsewhere?

Yes! Splicing generates code using your preferred data integrations and tools. You can export the code with a single click and run or deploy it anywhere you like. There's no vendor lock-in.
