diff --git a/README.md b/README.md index 53210ef7..b44e63b0 100644 --- a/README.md +++ b/README.md @@ -33,7 +33,7 @@ The benefits of db-ally can be described in terms of its four main characteristi ## Quickstart -In db-ally, developers define their use cases by implementing [**views**](https://db-ally.deepsense.ai/concepts/views) and **filters**. A list of possible filters is presented to the LLM in terms of [**IQL**](https://db-ally.deepsense.ai/concepts/iql) (Intermediate Query Language). Views are grouped and registered within a [**collection**](https://db-ally.deepsense.ai/concepts/views), which then serves as an entry point for asking questions in natural language. +In db-ally, developers define their use cases by implementing [**views**](https://db-ally.deepsense.ai/concepts/views), **filters** and **aggregations**. A list of possible filters and aggregations is presented to the LLM in terms of [**IQL**](https://db-ally.deepsense.ai/concepts/iql) (Intermediate Query Language). Views are grouped and registered within a [**collection**](https://db-ally.deepsense.ai/concepts/views), which then serves as an entry point for asking questions in natural language. This is a basic implementation of a db-ally view for an example HR application, which retrieves candidates from an SQL database: @@ -60,8 +60,10 @@ class CandidateView(SqlAlchemyBaseView): """ return Candidate.country == country -engine = create_engine('sqlite:///examples/recruiting/data/candidates.db') + llm = LiteLLM(model_name="gpt-3.5-turbo") +engine = create_engine("sqlite:///examples/recruiting/data/candidates.db") + my_collection = create_collection("collection_name", llm) my_collection.add(CandidateView, lambda: CandidateView(engine)) diff --git a/docs/about/roadmap.md b/docs/about/roadmap.md index a6f5312e..600f6cf2 100644 --- a/docs/about/roadmap.md +++ b/docs/about/roadmap.md @@ -9,7 +9,7 @@ Below you can find a list of planned features and integrations. 
## Planned Features -- [ ] **Support analytical queries**: support for exposing operations beyond filtering. +- [x] **Support analytical queries**: support for exposing operations beyond filtering. - [x] **Few-shot prompting configuration**: allow users to configure the few-shot prompting in View definition to improve IQL generation accuracy. - [ ] **Request contextualization**: allow to provide extra context for db-ally runs, such as user asking the question. diff --git a/docs/concepts/iql.md b/docs/concepts/iql.md index c4496aca..5f79c5ec 100644 --- a/docs/concepts/iql.md +++ b/docs/concepts/iql.md @@ -1,12 +1,45 @@ # Concept: IQL -Intermediate Query Language (IQL) is a simple language that serves as an abstraction layer between natural language and data source-specific query syntax, such as SQL. With db-ally's [structured views](./structured_views.md), LLM utilizes IQL to express complex queries in a simplified way. +Intermediate Query Language (IQL) is a simple language that serves as an abstraction layer between natural language and data source-specific query syntax, such as SQL. With db-ally's [structured views](structured_views.md), LLM utilizes IQL to express complex queries in a simplified way. IQL allows developers to model operations such as filtering and aggregation on the underlying data. + +## Filtering For instance, an LLM might generate an IQL query like this when asked "Find me French candidates suitable for a senior data scientist position": +```python +from_country("France") AND senior_data_scientist_position() ``` -from_country('France') AND senior_data_scientist_position() + +The capabilities made available to the AI model via IQL differ between projects. Developers control these by defining special [views](structured_views.md). db-ally automatically exposes special methods defined in structured views, known as "filters", via IQL. 
For instance, the expression above suggests that the specific project contains a view that includes the `from_country` and `senior_data_scientist_position` methods (and possibly others that the LLM did not choose to use for this particular question). Additionally, the LLM can use boolean operators (`AND`, `OR`, `NOT`) to combine individual filters into more complex expressions. + +## Aggregation + +Similar to filtering, developers can define special methods in [structured views](structured_views.md) that perform aggregation. These methods are also exposed to the LLM via IQL. For example, an LLM might generate the following IQL query when asked "What's the average salary for each country?": + +```python +average_salary_by_country() ``` -The capabilities made available to the AI model via IQL differ between projects. Developers control these by defining special [Views](structured_views.md). db-ally automatically exposes special methods defined in structured views, known as "filters", via IQL. For instance, the expression above suggests that the specific project contains a view that includes the `from_country` and `senior_data_scientist_position` methods (and possibly others that the LLM did not choose to use for this particular question). Additionally, the LLM can use Boolean operators (`and`,`or`, `not`) to combine individual filters into more complex expressions. +The `average_salary_by_country` method groups candidates by country and calculates the average salary for each group. + +The aggregation IQL call has access to the raw query, so it can perform even more complex aggregations, such as grouping by different columns or applying custom functions.
We can ask db-ally to generate a candidate report with the following IQL query: + +```python +candidate_report() +``` + +In this case, the `candidate_report` method is defined in a structured view, and it performs a series of aggregations and calculations to produce a report with the average salary, number of candidates, and other metrics, by country. + +## Operation chaining + +Some queries require both filtering and aggregation. For example, to calculate the average salary for a senior data scientist in the US, we first need to filter the data to include only US candidates who are senior data scientists, and then calculate the average salary. In this case, db-ally will first generate an IQL query to filter the data, and then another IQL query to calculate the average salary. + +```python +from_country("USA") AND senior_data_scientist_position() +``` + +```python +average_salary() +``` +In this case, db-ally will execute queries sequentially to build a single query plan to execute on the data source. diff --git a/docs/concepts/structured_views.md b/docs/concepts/structured_views.md index db048c8f..aab4970f 100644 --- a/docs/concepts/structured_views.md +++ b/docs/concepts/structured_views.md @@ -7,7 +7,7 @@ Structured views are a type of [view](../concepts/views.md), which provide a way Given different natural language queries, a db-ally view will produce different responses while maintaining a consistent data structure. This consistency offers a reliable interface for integration - the code consuming responses from a particular structured view knows what data structure to expect and can utilize this knowledge when displaying or processing the data. This feature of db-ally makes it stand out in terms of reliability and stability compared to standard text-to-SQL approaches. -Each structured view can contain one or more “filters”, which the LLM may decide to choose and apply to the extracted data so that it meets the criteria specified in the natural language query.
Given such a query, LLM chooses which filters to use, provides arguments to the filters, and connects the filters with Boolean operators. The LLM expresses these filter combinations using a special language called [IQL](iql.md), in which the defined view filters provide a layer of abstraction between the LLM and the raw syntax used to query the data source (e.g., SQL). +Each structured view can contain one or more **filters** or **aggregations**, which the LLM may decide to choose and apply to the extracted data so that it meets the criteria specified in the natural language query. Given such a query, the LLM chooses which filters to use, provides arguments to the filters, and connects the filters with boolean operators. For aggregations, the LLM selects an appropriate aggregation method and applies it to the data. The LLM expresses these filter combinations and aggregations using a special language called [IQL](iql.md), in which the defined view filters and aggregations provide a layer of abstraction between the LLM and the raw syntax used to query the data source (e.g., SQL). !!! example For instance, this is a simple [view that uses SQLAlchemy](../how-to/views/sql.md) to select data from specific columns in a SQL database. It contains a single filter, that the LLM may optionally use to control which table rows to fetch: @@ -18,14 +18,14 @@ Each structured view can contain one or more “filters”, which the LLM may de A view for retrieving candidates from the database. """ - def get_select(self): + def get_select(self) -> Select: """ Defines which columns to select """ return sqlalchemy.select(Candidate.id, Candidate.name, Candidate.country) @decorators.view_filter() - def from_country(self, country: str): + def from_country(self, country: str) -> ColumnElement: """ Filter candidates from a specific country. """ diff --git a/docs/index.md b/docs/index.md index 18f11b05..64471909 100644 --- a/docs/index.md +++ b/docs/index.md @@ -10,8 +10,8 @@ hide:
@@ -49,7 +49,7 @@ The benefits of db-ally can be described in terms of its four main characteristi
## Quickstart
-In db-ally, developers define their use cases by implementing [**views**](https://db-ally.deepsense.ai/concepts/views) and **filters**. A list of possible filters is presented to the LLM in terms of [**IQL**](https://db-ally.deepsense.ai/concepts/iql) (Intermediate Query Language). Views are grouped and registered within a [**collection**](https://db-ally.deepsense.ai/concepts/views), which then serves as an entry point for asking questions in natural language.
+In db-ally, developers define their use cases by implementing [**views**](https://db-ally.deepsense.ai/concepts/views), **filters** and **aggregations**. A list of possible filters and aggregations is presented to the LLM in terms of [**IQL**](https://db-ally.deepsense.ai/concepts/iql) (Intermediate Query Language). Views are grouped and registered within a [**collection**](https://db-ally.deepsense.ai/concepts/views), which then serves as an entry point for asking questions in natural language.
This is a basic implementation of a db-ally view for an example HR application, which retrieves candidates from an SQL database:
@@ -76,8 +76,10 @@ class CandidateView(SqlAlchemyBaseView):
"""
return Candidate.country == country
-engine = create_engine('sqlite:///examples/recruiting/data/candidates.db')
+
llm = LiteLLM(model_name="gpt-3.5-turbo")
+engine = create_engine("sqlite:///examples/recruiting/data/candidates.db")
+
my_collection = create_collection("collection_name", llm)
my_collection.add(CandidateView, lambda: CandidateView(engine))
diff --git a/docs/quickstart/aggregations.md b/docs/quickstart/aggregations.md
new file mode 100644
index 00000000..951543fb
--- /dev/null
+++ b/docs/quickstart/aggregations.md
@@ -0,0 +1,93 @@
+# Quickstart: Aggregations
+
+This guide is a continuation of the [Intro](./intro.md) guide. It assumes that you have already set up the views and the collection. If not, please refer to the complete Part 1 code on [GitHub](https://github.com/deepsense-ai/db-ally/blob/main/examples/intro.py){:target="_blank"}.
+
+In this guide, we will add aggregations to our view to calculate general metrics about the candidates.
+
+## View Definition
+
+To add aggregations to our [structured view](../concepts/structured_views.md), we'll define new methods. These methods will allow the LLM to perform calculations and summarize data across multiple rows. Let's add three aggregation methods to our `CandidateView`:
+
+```python
+class CandidateView(SqlAlchemyBaseView):
+    """
+    A view for retrieving candidates from the database.
+    """
+
+    def get_select(self) -> sqlalchemy.Select:
+        """
+        Creates the initial SqlAlchemy select object, which will be used to build the query.
+        """
+        return sqlalchemy.select(Candidate)
+
+    @decorators.view_aggregation()
+    def average_years_of_experience(self) -> sqlalchemy.Select:
+        """
+        Calculates the average years of experience of candidates.
+        """
+        return self.select.with_only_columns(
+            sqlalchemy.func.avg(Candidate.years_of_experience).label("average_years_of_experience")
+        )
+
+    @decorators.view_aggregation()
+    def positions_per_country(self) -> sqlalchemy.Select:
+        """
+        Returns the number of candidates per position per country.
+        """
+        return (
+            self.select.with_only_columns(
+                sqlalchemy.func.count(Candidate.position).label("number_of_positions"),
+                Candidate.position,
+                Candidate.country,
+            )
+            .group_by(Candidate.position, Candidate.country)
+            .order_by(sqlalchemy.desc("number_of_positions"))
+        )
+
+    @decorators.view_aggregation()
+    def candidates_per_country(self) -> sqlalchemy.Select:
+        """
+        Returns the number of candidates per country.
+        """
+        return (
+            self.select.with_only_columns(
+                sqlalchemy.func.count(Candidate.id).label("number_of_candidates"),
+                Candidate.country,
+            )
+            .group_by(Candidate.country)
+        )
+```
+
+By setting up these aggregations, you enable the LLM to calculate metrics about the average years of experience, the number of candidates per position per country, and the number of candidates per country.
+
+## Query Execution
+
+Having already defined and registered the view with the collection, we can now execute the query:
+
+```python
+result = await collection.ask("What is the average years of experience of candidates?")
+print(result.results)
+```
+
+This will return the average years of experience of candidates.
+
+The expected output:
+```
+The generated SQL query is: SELECT avg(candidates.years_of_experience) AS average_years_of_experience
+FROM candidates
+
+Number of rows: 1
+{'average_years_of_experience': 4.98}
+```
+