Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update NL to workflow code generation #270

Merged
merged 5 commits into from
Oct 8, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
* Couler is included in [CNCF Cloud Native Landscape](https://landscape.cncf.io/) and [LF AI Landscape](https://landscape.lfai.foundation).
* Check out our technical report published on ICDE 2024 [here](https://arxiv.org/abs/2403.07608).

> Note that while one of ambitious goals of Couler is to support multiple workflow engines, Couler currently only supports Argo Workflows as the workflow orchestration backend. An ambitious goal of Couler is to provide support for multiple workflow engines. While it initially supported only Argo Workflows for workflow orchestration, efforts are now underway to extend support to other workflow engines such as Tekton Pipelines and Apache Airflow.
> Note that while one of ambitious goals of Couler is to support multiple workflow engines, Couler currently only supports Argo Workflows as the workflow orchestration backend. An ambitious goal of Couler is to provide support for multiple workflow engines. While it initially supported only Argo Workflows for workflow orchestration, we are actively working on enhancing our support for Airflow and the current system supports about 40-50% of the Airflow API.
> In addition, if you are looking for a Python SDK that provides access to all the available features from Argo Workflows, you might want to check out [the low-level Python SDK maintained by the Argo Workflows team](https://argoproj.github.io/argo-workflows/client-libraries/).


Expand Down
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
21 changes: 21 additions & 0 deletions docs/NL-to-Unified-Programming-Interface/Method.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
In this file, we introduce the application of LLMs for converting Natural Language (NL) to Unified Programming Interface. Traditional methods involve defining workflows using various techniques and submitting them to a cluster. Lately, LLMs have demonstrated remarkable performance across a wide array of inference tasks. However, upon direct application of LLMs for unified programming code generation, certain challenges arise: Firstly, the overall workflow complexity hampers the performance of LLMs in complete workflow conversion. Secondly, LLMs possess limited knowledge regarding Couler's unified programming interface.

To address these challenges, we introduce a method that leverages LLMs to automatically translate natural language into unified programming code via the crafting of task-specific prompts. This approach enables users to articulate their desired workflows in natural language, which are then automatically translated into executable unified programming code. As a result, our method simplifies the Couler workflow creation process and improves usability for individuals with limited programming experience. The transition from NL descriptions to Couler code encompasses four pivotal steps:

**Step 1: Modular Decomposition:**

Initially, we employ a chain of thought strategy to decompose natural language descriptions into smaller, more concise task modules, such as data loading, data processing, model generation, and evaluation metrics. Each module should encapsulate a singular, coherent task to ensure the precision and correctness of the generated Couler code. A series of predefined task types can be established to identify and extract pertinent tasks based on the input of natural language descriptions automatically. They provide a structured approach to ensure the precision and correctness of the code generated.

**Step 2: Code Generation:**

For each independent subtask, we utilize LLMs to generate code. Considering that LLMs have limited knowledge about Couler, we construct a Code Lake containing code for various functions. We search for relevant code from the Code Lake for each subtask and provide it to LLMs for reference. This significantly improves the ability for unified programming code generation.

**Step 3: Self-calibration:**

After generating the code for each subtask, we integrate a self-calibration strategy to optimize the generated code. This strategy evaluates the generated code by having LLMs critique it. Initially, we define a baseline score as the standard evaluation score. We use LLMs to evaluate the generated code for a score between 0 and 1, and if, we will provide feedback of LLMs and repeat the code generation. After this self-calibration, we will have improved code for each subtask.

**Step 4: User Feedback:**

Finally, users can review and validate the generated workflow code. If the generated code fails to meet the users' requirements, they have the opportunity to provide feedback and suggestions in textual format. The system will leverage this feedback to optimize the code and enhance the precision of code generation.


Binary file not shown.
Binary file not shown.
Loading