Rename to CoML (#3)
ultmaster authored Sep 18, 2023
1 parent e432607 commit d036405
Showing 25 changed files with 167 additions and 150 deletions.
41 changes: 21 additions & 20 deletions README.md
@@ -1,45 +1,46 @@
# MLCopilot
# CoML

MLCopilot is a tool to help you find the best models/hyperparameters for your task. It uses Large Language Models (LLMs) to suggest models and hyperparameters based on your task description and previous experiments.
CoML (formerly MLCopilot) assists users in generating practical ML solutions based on historical experiences, streamlining complex ML challenges. Users input specific ML tasks they want to solve, such as classifying emails as spam or not. The system provides suggested solutions, including recommended ML models, data processing methods, and explanations that are easy for humans to understand. [Paper](https://arxiv.org/abs/2304.14979)

![](assets/demo.gif)

## Quickstart
(TODO: The demo needs an update.)

1. [Get an OpenAI API Key](#get-an-openai-api-key)
2. [Install requirements](#install-requirements)
3. [Run](#run)
### Installation

### Get an OpenAI API Key
Installation from PyPI is not supported yet. Please follow the steps below to install CoML:

1. Create an account [here](https://beta.openai.com/signup)
2. Create an API key [here](https://beta.openai.com/account/api-keys)
1. Clone this repo: `git clone REPO_URL; cd coml`
2. Put `assets/coml.db` in the `.coml` directory under your home directory: `cp assets/coml.db ~/.coml/coml.db`
3. Copy `coml/.env.template` to `~/.coml/.env` and put your API keys in the file.
4. Install the package via `pip install -e .`.
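
The `cp` command in step 2 assumes that `~/.coml` already exists. The following is a minimal sketch of steps 2 and 3 in Python that creates the directory first. The function name `install_assets` and its parameters are hypothetical and are not part of CoML; only the file locations come from the steps above.

```python
import shutil
from pathlib import Path


def install_assets(repo_root: Path, home: Path) -> Path:
    """Copy the bundled database and the .env template into <home>/.coml.

    Hypothetical helper for illustration; CoML does not ship this function.
    """
    config_dir = home / ".coml"
    # Create ~/.coml if it is missing, so the copies below cannot fail on it.
    config_dir.mkdir(parents=True, exist_ok=True)

    shutil.copy(repo_root / "assets" / "coml.db", config_dir / "coml.db")

    env_file = config_dir / ".env"
    # Keep an existing .env so API keys that were already filled in survive.
    if not env_file.exists():
        shutil.copy(repo_root / "coml" / ".env.template", env_file)
    return config_dir
```

From the root of the cloned repository, this could be called as `install_assets(Path.cwd(), Path.home())`, after which the API keys still need to be written into `~/.coml/.env` by hand.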

### Install requirements
### Command line utility

0. Clone this repo: `git clone REPO_URL; cd mlcopilot`
1. Put assets/mlcopilot.db in your home directory: `cp assets/mlcopilot.db ~/.mlcopilot/mlcopilot.db`
2. Install Python 3.8 or higher
3. Install: `pip install .`. If you want to develop, use `pip install -e .[dev]` instead.
CoML can suggest an ML configuration within a specific space, for a specific task. Use the following command:

### Run
```
coml --space <space> --task <task>
```

Command line: `mlcopilot`
If you feel uncertain about what to put into `<space>` or `<task>`, see the demo above, or try the interactive usage below:

```
coml --interactive
```

### API Usage

```python
from mlcopilot.suggest import suggest
from coml.space import import_space
from coml.suggest import suggest

space = import_space("YOUR_SPACE_ID")
task_desc = "YOUR_TASK_DESCRIPTION_FOR_NEW_TASK"
suggest_configs, knowledge = suggest(space, task_desc)
```



## Citation

If you find this work useful in your method, you can cite the paper as below:

@article{zhang2023mlcopilot,
@@ -51,4 +52,4 @@ If you find this work useful in your method, you can cite the paper as below:

## License

The entire codebase is under [MIT license](LICENSE).
The entire codebase is under [MIT license](LICENSE).
1 change: 1 addition & 0 deletions assets/.gitignore
@@ -0,0 +1 @@
/private
File renamed without changes.
19 changes: 19 additions & 0 deletions coml/.env.template
@@ -0,0 +1,19 @@
### OPENAI
## OPENAI_API_KEY - OpenAI API Key (Example: my-openai-api-key)
OPENAI_API_KEY=your-openai-api-key

### DB
## COML_DB_BACKEND - Database backend (Example: sqlite)
COML_DB_BACKEND=sqlite
## COML_DB_PATH - Path to database file (Example: ~/.coml/coml.db) - Only for sqlite
COML_DB_PATH=~/.coml/coml.db
## COML_DB_NAME - Database name (Example: coml)
COML_DB_NAME=coml
## COML_DB_HOST - Database host (Example: localhost)
COML_DB_HOST=localhost
## COML_DB_PORT - Database port (Example: 5432)
COML_DB_PORT=5432
## COML_DB_USER - Database user (Example: postgres)
COML_DB_USER=postgres
## COML_DB_PASSWORD - Database password (Example: '')
COML_DB_PASSWORD=''
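
The package loads this file through `load_dotenv` from python-dotenv. As a rough illustration of how the template is read, the sketch below parses only plain `KEY=VALUE` lines and treats every line starting with `#` as a comment. It is a simplified stand-in, not python-dotenv's actual parser, and the name `parse_env_template` is hypothetical.

```python
from typing import Dict


def parse_env_template(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment; ignored in this sketch
        value = value.strip()
        # Strip one pair of matching quotes, so '' becomes an empty string.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


sample = """\
### DB
## COML_DB_BACKEND - Database backend (Example: sqlite)
COML_DB_BACKEND=sqlite
COML_DB_PORT=5432
COML_DB_PASSWORD=''
"""
settings = parse_env_template(sample)
print(settings)
```

Every value comes back as a string, including the port, so numeric settings need an explicit conversion wherever a number is expected.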
2 changes: 1 addition & 1 deletion mlcopilot/__init__.py → coml/__init__.py
@@ -4,7 +4,7 @@

from dotenv import load_dotenv

dotenv_dir = Path.home() / ".mlcopilot"
dotenv_dir = Path.home() / ".coml"
dotenv_path = (dotenv_dir / ".env").resolve()

if not os.path.exists(dotenv_dir):
4 changes: 4 additions & 0 deletions coml/__main__.py
@@ -0,0 +1,4 @@
from . import cli

if __name__ == "__main__":
    cli.main()
37 changes: 28 additions & 9 deletions mlcopilot/cli.py → coml/cli.py
@@ -1,18 +1,37 @@
from typing import Optional

import click

from mlcopilot.orm import database_proxy
from .orm import database_proxy


@click.group(invoke_without_command=True)
@click.option("--space", help="Space ID.")
@click.option("--task", help="Task description.")
@click.option("--interactive", help="Interactive mode.", is_flag=True)
@click.pass_context
def main(
    ctx: click.Context,
    space: Optional[str] = None,
    task: Optional[str] = None,
    interactive: bool = False,
) -> None:
    if ctx.invoked_subcommand is None:
        from mlcopilot.suggest import suggest_interactive
        if ctx.params["interactive"]:
            from .suggest import suggest_interactive

            suggest_interactive()
            database_proxy.close()
        else:
            if ctx.params["space"] is None or ctx.params["task"] is None:
                print("Please specify space ID and a task description.")
                return
            from .space import import_space
            from .suggest import print_suggested_configs, suggest

        suggest_interactive()
        database_proxy.close()
            results = suggest(import_space(ctx.params["space"]), ctx.params["task"])
            print_suggested_configs(*results)
            database_proxy.close()


@main.command()
@@ -36,9 +55,9 @@ def create(
    space: str
        The ID of the space to identify the space.
    history: str
        The path to the history of configurations. A csv file, format see `mlcopilot::experience::ingest_experience`.
        The path to the history of configurations. A csv file, format see `coml.experience.ingest_experience`.
    task_desc: str
        The JSON path to the task description. A json file, format see `mlcopilot::experience::ingest_experience`.
        The JSON path to the task description. A json file, format see `coml.experience.ingest_experience`.
    space_desc: str
        The text path to the space description. Optional.
    no_knowledge: bool
@@ -48,15 +67,15 @@ def create(
    -------
    None
    """
    from mlcopilot.space import create_space
    from .space import create_space

    create_space(space, history, task_desc, space_desc, no_knowledge)
    database_proxy.close()


@main.command()
def list() -> None:
    from mlcopilot.space import print_space
    from .space import print_space

    print_space()
    database_proxy.close()
@@ -65,7 +84,7 @@ def list() -> None:
@main.command()
@click.argument("space", nargs=1)
def delete(space: str) -> None:
    from mlcopilot.space import delete_space
    from .space import delete_space

    delete_space(space)
    database_proxy.close()
30 changes: 15 additions & 15 deletions mlcopilot/constants.py → coml/constants.py
@@ -2,18 +2,18 @@
from pathlib import Path

__all__ = [
    "MLCOPILOT_DB_PATH",
    "COML_DB_PATH",
    "TOP_K",
    "EMBED_DIM",
    "bin_map",
    "inverse_bin_map",
    "q_num",
    "MLCOPILOT_DB_BACKEND",
    "MLCOPILOT_DB_NAME",
    "MLCOPILOT_DB_HOST",
    "MLCOPILOT_DB_PORT",
    "MLCOPILOT_DB_USER",
    "MLCOPILOT_DB_PASSWORD",
    "COML_DB_BACKEND",
    "COML_DB_NAME",
    "COML_DB_HOST",
    "COML_DB_PORT",
    "COML_DB_USER",
    "COML_DB_PASSWORD",
    "PROMPT_FORMATS",
    "DEFAULT_PROMPT_PREFIX",
    "DEFAULT_PROMPT_SUFFIX",
@@ -28,17 +28,17 @@
TOKEN_COMPLETION_LIMIT = 800
RELAX_TOKEN = 500  # RELAX_TOKEN is the number of tokens to avoid hitting the token limit

MLCOPILOT_DB_BACKEND = os.environ.get("MLCOPILOT_DB_BACKEND", "sqlite")
COML_DB_BACKEND = os.environ.get("COML_DB_BACKEND", "sqlite")

MLCOPILOT_DB_PATH = Path(
    os.environ.get("MLCOPILOT_DB_PATH", Path.home() / ".mlcopilot" / "mlcopilot.db")
COML_DB_PATH = Path(
    os.environ.get("COML_DB_PATH", Path.home() / ".coml" / "coml.db")
).expanduser()

MLCOPILOT_DB_NAME = os.environ.get("MLCOPILOT_DB_NAME", "mlcopilot")
MLCOPILOT_DB_HOST = os.environ.get("MLCOPILOT_DB_HOST", "localhost")
MLCOPILOT_DB_PORT = os.environ.get("MLCOPILOT_DB_PORT", 5432)
MLCOPILOT_DB_USER = os.environ.get("MLCOPILOT_DB_USER", "postgres")
MLCOPILOT_DB_PASSWORD = os.environ.get("MLCOPILOT_DB_PASSWORD", "")
COML_DB_NAME = os.environ.get("COML_DB_NAME", "coml")
COML_DB_HOST = os.environ.get("COML_DB_HOST", "localhost")
COML_DB_PORT = os.environ.get("COML_DB_PORT", 5432)
COML_DB_USER = os.environ.get("COML_DB_USER", "postgres")
COML_DB_PASSWORD = os.environ.get("COML_DB_PASSWORD", "")
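
The renamed settings all follow the same pattern: read a `COML_DB_*` variable and fall back to a default. The sketch below collects that pattern into one function that takes the environment as an argument, which makes the defaults easy to check. The function `resolve_db_settings` and the dictionary keys are hypothetical. One deliberate difference from the module is labeled in the code: the port is converted with `int()`, because `os.environ.get` returns a string whenever the variable is set and the integer default only when it is not.

```python
import os
from pathlib import Path
from typing import Any, Dict, Mapping, Optional


def resolve_db_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Resolve COML_DB_* settings from `env`, falling back to the defaults."""
    if env is None:
        env = os.environ
    default_path = Path.home() / ".coml" / "coml.db"
    return {
        "backend": env.get("COML_DB_BACKEND", "sqlite"),
        # expanduser() lets a value such as "~/.coml/coml.db" work as written.
        "path": Path(env.get("COML_DB_PATH", default_path)).expanduser(),
        "name": env.get("COML_DB_NAME", "coml"),
        "host": env.get("COML_DB_HOST", "localhost"),
        # Assumption (not in the module): coerce to int so the type is the
        # same whether or not the variable is set.
        "port": int(env.get("COML_DB_PORT", 5432)),
        "user": env.get("COML_DB_USER", "postgres"),
        "password": env.get("COML_DB_PASSWORD", ""),
    }
```

Calling `resolve_db_settings({})` yields the defaults, while `resolve_db_settings({"COML_DB_PORT": "6543"})` overrides only the port.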

bin_map = {
    0.1: "very small",
6 changes: 3 additions & 3 deletions mlcopilot/experience.py → coml/experience.py
@@ -10,9 +10,9 @@
from langchain.cache import InMemoryCache
from peewee import ModelSelect, fn

from mlcopilot.constants import *
from mlcopilot.orm import Knowledge, Solution, Space, Task, database_proxy
from mlcopilot.utils import format_config, get_llm
from .constants import *
from .orm import Knowledge, Solution, Space, Task, database_proxy
from .utils import format_config, get_llm

SAVE_OPTIONS = orjson.OPT_SERIALIZE_NUMPY | orjson.OPT_SERIALIZE_DATACLASS

12 changes: 6 additions & 6 deletions mlcopilot/knowledge.py → coml/knowledge.py
@@ -6,12 +6,12 @@
from langchain.prompts import FewShotPromptTemplate, PromptTemplate
from langchain.prompts.example_selector import LengthBasedExampleSelector

from mlcopilot.constants import *
from mlcopilot.constants import TOKEN_COMPLETION_LIMIT, TOKEN_LIMIT
from mlcopilot.experience import gen_experience
from mlcopilot.orm import Knowledge, Solution, Space, Task, database_proxy
from mlcopilot.surrogate_utils import evaluate_configs
from mlcopilot.utils import get_llm, get_token_count_func, parse_configs
from .constants import *
from .constants import TOKEN_COMPLETION_LIMIT, TOKEN_LIMIT
from .experience import gen_experience
from .orm import Knowledge, Solution, Space, Task, database_proxy
from .surrogate_utils import evaluate_configs
from .utils import get_llm, get_token_count_func, parse_configs

prefix_sep = "__DUMM_SEP__"

32 changes: 14 additions & 18 deletions mlcopilot/orm.py → coml/orm.py
@@ -24,8 +24,8 @@
except ImportError:
    from_db = to_db = None

from mlcopilot.constants import *
from mlcopilot.utils import get_llm
from .constants import *
from .utils import get_llm


class ArrayField(BlobField):
@@ -72,34 +72,32 @@ def cosine_distance(self, text: str):

database_proxy = DatabaseProxy()

if MLCOPILOT_DB_BACKEND == "sqlite":
if COML_DB_BACKEND == "sqlite":
    from peewee import SqliteDatabase

    init_db_func = lambda: SqliteDatabase(MLCOPILOT_DB_PATH)
elif MLCOPILOT_DB_BACKEND == "postgres":
    init_db_func = lambda: SqliteDatabase(COML_DB_PATH)
elif COML_DB_BACKEND == "postgres":
    from peewee import PostgresqlDatabase

    init_db_func = lambda: PostgresqlDatabase(
        MLCOPILOT_DB_NAME,
        host=MLCOPILOT_DB_HOST,
        port=MLCOPILOT_DB_PORT,
        user=MLCOPILOT_DB_USER,
        password=MLCOPILOT_DB_PASSWORD,
        COML_DB_NAME,
        host=COML_DB_HOST,
        port=COML_DB_PORT,
        user=COML_DB_USER,
        password=COML_DB_PASSWORD,
    )
else:
    raise NotImplementedError(
        f"MLCOPILOT_DB_BACKEND {MLCOPILOT_DB_BACKEND} not supported."
    )
    raise NotImplementedError(f"COML_DB_BACKEND {COML_DB_BACKEND} not supported.")


def init_db():
    database_proxy.initialize(init_db_func())
    conn = database_proxy.connection()
    if MLCOPILOT_DB_BACKEND == "postgres":
    if COML_DB_BACKEND == "postgres":
        register_vector(conn)
    database_proxy.create_tables([Space, Task, Solution, Knowledge])

    if MLCOPILOT_DB_BACKEND == "sqlite":
    if COML_DB_BACKEND == "sqlite":
        _cache = {}

        @database_proxy.func()
@@ -128,9 +126,7 @@ class Space(BaseModel):

class Task(BaseModel):
    task_id: str = TextField(primary_key=True)
    embedding = (
        ArrayField() if MLCOPILOT_DB_BACKEND == "sqlite" else VectorField(EMBED_DIM)
    )
    embedding = ArrayField() if COML_DB_BACKEND == "sqlite" else VectorField(EMBED_DIM)
    desc = TextField()
    row_desc = TextField()
14 changes: 7 additions & 7 deletions mlcopilot/space.py → coml/space.py
@@ -4,9 +4,9 @@

import pandas as pd

from mlcopilot.experience import ingest_experience
from mlcopilot.knowledge import get_knowledge
from mlcopilot.orm import Knowledge, Solution, Space, Task, database_proxy
from .experience import ingest_experience
from .knowledge import get_knowledge
from .orm import Knowledge, Solution, Space, Task, database_proxy


def gen_space_description(
@@ -64,9 +64,9 @@ def create_space(
    space_id: str
        The ID of the space to identify the space.
    history: str
        The path to the history of configurations. A csv file, format see `mlcopilot::experience::ingest_experience`.
        The path to the history of configurations. A csv file, format see `coml.experience.ingest_experience`.
    task_desc: str
        The JSON path to the task description. A json file, format see `mlcopilot::experience::ingest_experience`.
        The JSON path to the task description. A json file, format see `coml.experience.ingest_experience`.
    space_desc: str
        The text path to the space description. Optional.
    no_knowledge: bool
@@ -93,8 +93,8 @@ def create_space(
    space = ingest_experience(history_df, task_desc, space_desc, space_id)

    if not no_knowledge and get_knowledge(space) == "":
        from mlcopilot.knowledge import post_validation
        from mlcopilot.surrogate_utils import process_history_df, train_surrogate
        from .knowledge import post_validation
        from .surrogate_utils import process_history_df, train_surrogate

        history_df_processed, config_names = process_history_df(history_df)
        surrogate_fn = train_surrogate(history_df_processed)