What is PyPika?
PyPika is a Python API for building SQL queries. The motivation behind PyPika is to provide a simple interface for building SQL queries without limiting the flexibility of handwritten SQL. Designed with data analysis in mind, PyPika leverages the builder design pattern to construct queries to avoid messy string formatting and concatenation. It is also easily extended to take full advantage of specific features of SQL database vendors.
PyPika is a fast, expressive and flexible way to replace handwritten SQL (or even ORM for the courageous souls amongst you). Validation of SQL correctness is not an explicit goal of PyPika. With such a large number of SQL database vendors providing a robust validation of input data is difficult. Instead you are encouraged to check inputs you provide to PyPika or appropriately handle errors raised from your SQL database - just as you would have if you were writing SQL yourself.
Read the docs: http://pypika.readthedocs.io/en/latest/
PyPika supports python 3.5+
. It may also work on pypy, cython, and jython, but is not being tested for these versions.
To install PyPika run the following command:
pip install pypika
The main classes in pypika are pypika.Query
, pypika.Table
, and pypika.Field
.
from pypika import Query, Table, Field
The entry point for building queries is pypika.Query
. In order to select columns from a table, the table must
first be added to the query. For simple queries with only one table, tables and columns can be references using
strings. For more sophisticated queries a pypika.Table
must be used.
q = Query.from_('customers').select('id', 'fname', 'lname', 'phone')
To convert the query into raw SQL, it can be cast to a string.
str(q)
Alternatively, you can use the Query.get_sql() function:
q.get_sql()
In simple queries like the above example, columns in the "from" table can be referenced by passing string names into
the select
query builder function. In more complex examples, the pypika.Table
class should be used. Columns can be
referenced as attributes on instances of pypika.Table
.
from pypika import Table, Query
customers = Table('customers')
q = Query.from_(customers).select(customers.id, customers.fname, customers.lname, customers.phone)
Both of the above examples result in the following SQL:
SELECT id,fname,lname,phone FROM customers
An alias for the table can be given using the .as_
function on pypika.Table
Table('x_view_customers').as_('customers')
q = Query.from_(customers).select(customers.id, customers.phone)
SELECT id,phone FROM x_view_customers customers
A schema can also be specified. Tables can be referenced as attributes on the schema.
from pypika import Table, Query, Schema
views = Schema('views')
q = Query.from_(views.customers).select(customers.id, customers.phone)
SELECT id,phone FROM views.customers
Also references to databases can be used. Schemas can be referenced as attributes on the database.
from pypika import Table, Query, Database
my_db = Database('my_db')
q = Query.from_(my_db.analytics.customers).select(customers.id, customers.phone)
SELECT id,phone FROM my_db.analytics.customers
Results can be ordered by using the following syntax:
from pypika import Order
Query.from_('customers').select('id', 'fname', 'lname', 'phone').orderby('id', order=Order.desc)
This results in the following SQL:
SELECT "id","fname","lname","phone" FROM "customers" ORDER BY "id" DESC
Arithmetic expressions can also be constructed using pypika. Operators such as +, -, *, and / are implemented
by pypika.Field
which can be used simply with a pypika.Table
or directly.
from pypika import Field
q = Query.from_('account').select(
Field('revenue') - Field('cost')
)
SELECT revenue-cost FROM accounts
Using pypika.Table
accounts = Table('accounts')
q = Query.from_(accounts).select(
accounts.revenue - accounts.cost
)
SELECT revenue-cost FROM accounts
An alias can also be used for fields and expressions.
q = Query.from_(accounts).select(
(accounts.revenue - accounts.cost).as_('profit')
)
SELECT revenue-cost profit FROM accounts
More arithmetic examples
table = Table('table')
q = Query.from_(table).select(
table.foo + table.bar,
table.foo - table.bar,
table.foo * table.bar,
table.foo / table.bar,
(table.foo+table.bar) / table.fiz,
)
SELECT foo+bar,foo-bar,foo*bar,foo/bar,(foo+bar)/fiz FROM table
Queries can be filtered with pypika.Criterion
by using equality or inequality operators
customers = Table('customers')
q = Query.from_(customers).select(
customers.id, customers.fname, customers.lname, customers.phone
).where(
customers.lname == 'Mustermann'
)
SELECT id,fname,lname,phone FROM customers WHERE lname='Mustermann'
Query methods such as select, where, groupby, and orderby can be called multiple times. Multiple calls to the where method will add additional conditions as
customers = Table('customers')
q = Query.from_(customers).select(
customers.id, customers.fname, customers.lname, customers.phone
).where(
customers.fname == 'Max'
).where(
customers.lname == 'Mustermann'
)
SELECT id,fname,lname,phone FROM customers WHERE fname='Max' AND lname='Mustermann'
Filters such as IN and BETWEEN are also supported
customers = Table('customers')
q = Query.from_(customers).select(
customers.id,customers.fname
).where(
customers.age[18:65] & customers.status.isin(['new', 'active'])
)
SELECT id,fname FROM customers WHERE age BETWEEN 18 AND 65 AND status IN ('new','active')
Filtering with complex criteria can be created using boolean symbols &
, |
, and ^
.
AND
customers = Table('customers')
q = Query.from_(customers).select(
customers.id, customers.fname, customers.lname, customers.phone
).where(
(customers.age >= 18) & (customers.lname == 'Mustermann')
)
SELECT id,fname,lname,phone FROM customers WHERE age>=18 AND lname='Mustermann'
OR
customers = Table('customers')
q = Query.from_(customers).select(
customers.id, customers.fname, customers.lname, customers.phone
).where(
(customers.age >= 18) | (customers.lname == 'Mustermann')
)
SELECT id,fname,lname,phone FROM customers WHERE age>=18 OR lname='Mustermann'
XOR
customers = Table('customers')
q = Query.from_(customers).select(
customers.id, customers.fname, customers.lname, customers.phone
).where(
(customers.age >= 18) ^ customers.is_registered
)
SELECT id,fname,lname,phone FROM customers WHERE age>=18 XOR is_registered
In the Criterion class, there are the static methods any and all that allow building chains AND and OR expressions with a list of terms.
from pypika import Criterion
customers = Table('customers')
q = Query.from_(customers).select(
customers.id,
customers.fname
).where(
Criterion.all([
customers.is_registered,
customers.age >= 18,
customers.lname == "Jones",
])
)
SELECT id,fname FROM customers WHERE is_registered AND age>=18 AND lname = "Jones"
Grouping allows for aggregated results and works similar to SELECT
clauses.
from pypika import functions as fn
customers = Table('customers')
q = Query \
.from_(customers) \
.where(customers.age >= 18) \
.groupby(customers.id) \
.select(customers.id, fn.Sum(customers.revenue))
SELECT id,SUM("revenue") FROM "customers" WHERE "age">=18 GROUP BY "id"
After adding a GROUP BY
clause to a query, the HAVING
clause becomes available. The method
Query.having()
takes a Criterion
parameter similar to the method Query.where()
.
from pypika import functions as fn
payments = Table('payments')
q = Query \
.from_(payments) \
.where(payments.transacted[date(2015, 1, 1):date(2016, 1, 1)]) \
.groupby(payments.customer_id) \
.having(fn.Sum(payments.total) >= 1000) \
.select(payments.customer_id, fn.Sum(payments.total))
SELECT customer_id,SUM(total) FROM payments
WHERE transacted BETWEEN '2015-01-01' AND '2016-01-01'
GROUP BY customer_id HAVING SUM(total)>=1000
Tables and subqueries can be joined to any query using the Query.join()
method. Joins can be performed with either
a USING
or ON
clauses. The USING
clause can be used when both tables/subqueries contain the same field and
the ON
clause can be used with a criterion. To perform a join, ...join()
can be chained but then must be
followed immediately by ...on(<criterion>)
or ...using(*field)
.
All join types are supported by PyPika.
Query \
.from_(base_table)
...
.join(join_table, JoinType.left)
...
Query \
.from_(base_table)
...
.left_join(join_table) \
.right_join(join_table) \
.inner_join(join_table) \
.outer_join(join_table) \
.cross_join(join_table) \
...
See the list of join types here pypika.enums.JoinTypes
history, customers = Tables('history', 'customers')
q = Query \
.from_(history) \
.join(customers) \
.on(history.customer_id == customers.id) \
.select(history.star) \
.where(customers.id == 5)
SELECT "history".* FROM "history" JOIN "customers" ON "history"."customer_id"="customers"."id" WHERE "customers"."id"=5
As a shortcut, the Query.join().on_field()
function is provided for joining the (first) table in the FROM
clause
with the joined table when the field name(s) are the same in both tables.
history, customers = Tables('history', 'customers')
q = Query \
.from_(history) \
.join(customers) \
.on_field('customer_id', 'group') \
.select(history.star) \
.where(customers.group == 'A')
SELECT "history".* FROM "history" JOIN "customers" ON "history"."customer_id"="customers"."customer_id" AND "history"."group"="customers"."group" WHERE "customers"."group"='A'
history, customers = Tables('history', 'customers')
q = Query \
.from_(history) \
.join(customers) \
.using('customer_id') \
.select(history.star) \
.where(customers.id == 5)
SELECT "history".* FROM "history" JOIN "customers" USING "customer_id" WHERE "customers"."id"=5
history, customers = Tables('history', 'customers')
last_purchase_at = Query.from_(history).select(
history.purchase_at
).where(history.customer_id==customers.customer_id).orderby(
history.purchase_at, order=Order.desc
).limit(1)
q = Query.from_(customers).select(
customers.id, last_purchase_at._as('last_purchase_at')
)
SELECT
"id",
(SELECT "history"."purchase_at"
FROM "history"
WHERE "history"."customer_id" = "customers"."customer_id"
ORDER BY "history"."purchase_at" DESC
LIMIT 1) "last_purchase_at"
FROM "customers"
Both UNION
and UNION ALL
are supported. UNION DISTINCT
is synonomous with "UNION`` so and PyPika does not
provide a separate function for it. Unions require that queries have the same number of SELECT
clauses so
trying to cast a unioned query to string with through a UnionException
if the column sizes are mismatched.
To create a union query, use either the Query.union()
method or + operator with two query instances. For a
union all, use Query.union_all()
or the * operator.
provider_a, provider_b = Tables('provider_a', 'provider_b')
q = Query.from_(provider_a).select(
provider_a.created_time, provider_a.foo, provider_a.bar
) + Query.from_(provider_b).select(
provider_b.created_time, provider_b.fiz, provider_b.buz
)
SELECT "created_time","foo","bar" FROM "provider_a" UNION SELECT "created_time","fiz","buz" FROM "provider_b"
Using pypika.Interval
, queries can be constructed with date arithmetic. Any combination of intervals can be
used except for weeks and quarters, which must be used separately and will ignore any other values if selected.
from pypika import functions as fn
fruits = Tables('fruits')
q = Query.from_(fruits) \
.select(fruits.id, fruits.name) \
.where(fruits.harvest_date + Interval(months=1) < fn.Now())
SELECT id,name FROM fruits WHERE harvest_date+INTERVAL 1 MONTH<NOW()
Tuples are supported through the class pypika.Tuple
but also through the native python tuple wherever possible.
Tuples can be used with pypika.Criterion
in WHERE clauses for pairwise comparisons.
from pypika import Query, Tuple
q = Query.from_(self.table_abc) \
.select(self.table_abc.foo, self.table_abc.bar) \
.where(Tuple(self.table_abc.foo, self.table_abc.bar) == Tuple(1, 2))
SELECT "foo","bar" FROM "abc" WHERE ("foo","bar")=(1,2)
Using pypika.Tuple
on both sides of the comparison is redundant and PyPika supports native python tuples.
from pypika import Query, Tuple
q = Query.from_(self.table_abc) \
.select(self.table_abc.foo, self.table_abc.bar) \
.where(Tuple(self.table_abc.foo, self.table_abc.bar) == (1, 2))
SELECT "foo","bar" FROM "abc" WHERE ("foo","bar")=(1,2)
Tuples can be used in IN clauses.
Query.from_(self.table_abc) \
.select(self.table_abc.foo, self.table_abc.bar) \
.where(Tuple(self.table_abc.foo, self.table_abc.bar).isin([(1, 1), (2, 2), (3, 3)]))
SELECT "foo","bar" FROM "abc" WHERE ("foo","bar") IN ((1,1),(2,2),(3,3))
There are several string operations and function wrappers included in PyPika. Function wrappers can be found in the
pypika.functions
package. In addition, LIKE and REGEX queries are supported as well.
from pypika import functions as fn
customers = Tables('customers')
q = Query.from_(customers).select(
customers.id,
customers.fname,
customers.lname,
).where(
customers.lname.like('Mc%')
)
SELECT id,fname,lname FROM customers WHERE lname LIKE 'Mc%'
from pypika import functions as fn
customers = Tables('customers')
q = Query.from_(customers).select(
customers.id,
customers.fname,
customers.lname,
).where(
customers.lname.regex(r'^[abc][a-zA-Z]+&')
)
SELECT id,fname,lname FROM customers WHERE lname REGEX '^[abc][a-zA-Z]+&';
from pypika import functions as fn
customers = Tables('customers')
q = Query.from_(customers).select(
customers.id,
fn.Concat(customers.fname, ' ', customers.lname).as_('full_name'),
)
SELECT id,CONCAT(fname, ' ', lname) full_name FROM customers
Custom Functions allows us to use any function on queries, as some functions are not covered by PyPika as default, we can appeal to Custom functions.
from pypika import CustomFunction
customers = Tables('customers')
DateDiff = CustomFunction('DATE_DIFF', ['interval', 'start_date', 'end_date'])
q = Query.from_(customers).select(
customers.id,
customers.fname,
customers.lname,
DateDiff('day', customers.created_date, customers.updated_date)
)
SELECT id,fname,lname,DATE_DIFF('day',created_date,updated_date) FROM customers
Case statements allow fow a number of conditions to be checked sequentially and return a value for the first condition
met or otherwise a default value. The Case object can be used to chain conditions together along with their output
using the when
method and to set the default value using else_
.
from pypika import Case, functions as fn
customers = Tables('customers')
q = Query.from_(customers).select(
customers.id,
Case()
.when(customers.fname == "Tom", "It was Tom")
.when(customers.fname == "John", "It was John")
.else_("It was someone else.").as_('who_was_it')
)
SELECT "id",CASE WHEN "fname"='Tom' THEN 'It was Tom' WHEN "fname"='John' THEN 'It was John' ELSE 'It was someone else.' END "who_was_it" FROM "customers"
With clause allows give a sub-query block a name, which can be referenced in several places within the main SQL query. The SQL WITH clause is basically a drop-in replacement to the normal sub-query.
from pypika import Table, AliasedQuery, Query
customers = Table('customers')
sub_query = (Query
.from_(customers)
.select('*'))
test_query = (Query
.with_(sub_query, "an_alias")
.from_(AliasedQuery("an_alias"))
.select('*'))
You can use as much as .with_() as you want.
WITH an_alias AS (SELECT * FROM "customers") SELECT * FROM an_alias
Data can be inserted into tables either by providing the values in the query or by selecting them through another query.
By default, data can be inserted by providing values for all columns in the order that they are defined in the table.
customers = Table('customers')
q = Query.into(customers).insert(1, 'Jane', 'Doe', '[email protected]')
INSERT INTO customers VALUES (1,'Jane','Doe','[email protected]')
customers = Table('customers')
q = customers.insert(1, 'Jane', 'Doe', '[email protected]')
INSERT INTO customers VALUES (1,'Jane','Doe','[email protected]')
Multiple rows of data can be inserted either by chaining the insert
function or passing multiple tuples as args.
customers = Table('customers')
q = Query.into(customers).insert(1, 'Jane', 'Doe', '[email protected]').insert(2, 'John', 'Doe', '[email protected]')
customers = Table('customers')
q = Query.into(customers).insert((1, 'Jane', 'Doe', '[email protected]'),
(2, 'John', 'Doe', '[email protected]'))
customers = Table('customers')
q = Query.into(customers)\
.insert(1, 'Jane', 'Doe', '[email protected]')\
.on_duplicate_key_update(customers.email, Values(customers.email))
INSERT INTO customers VALUES (1,'Jane','Doe','[email protected]') ON DUPLICATE KEY UPDATE `email`=VALUES(`email`)
.on_duplicate_key_update
works similar to .set
for updating rows, additionally it provides the Values
wrapper to update to the value specified in the INSERT
clause.
INSERT INTO customers VALUES (1,'Jane','Doe','[email protected]'),(2,'John','Doe','[email protected]')
To specify the columns and the order, use the columns
function.
customers = Table('customers')
q = Query.into(customers).columns('id', 'fname', 'lname').insert(1, 'Jane', 'Doe')
INSERT INTO customers (id,fname,lname) VALUES (1,'Jane','Doe','[email protected]')
Inserting data with a query works the same as querying data with the additional call to the into
method in the
builder chain.
customers, customers_backup = Tables('customers', 'customers_backup')
q = Query.into(customers_backup).from_(customers).select('*')
INSERT INTO customers_backup SELECT * FROM customers
customers, customers_backup = Tables('customers', 'customers_backup')
q = Query.into(customers_backup).columns('id', 'fname', 'lname')
.from_(customers).select(customers.id, customers.fname, customers.lname)
INSERT INTO customers_backup SELECT "id", "fname", "lname" FROM customers
The syntax for joining tables is the same as when selecting data
customers, orders, orders_backup = Tables('customers', 'orders', 'orders_backup')
q = Query.into(orders_backup).columns('id', 'address', 'customer_fname', 'customer_lname')
.from_(customers)
.join(orders).on(orders.customer_id == customers.id)
.select(orders.id, customers.fname, customers.lname)
INSERT INTO "orders_backup" ("id","address","customer_fname","customer_lname")
SELECT "orders"."id","customers"."fname","customers"."lname" FROM "customers"
JOIN "orders" ON "orders"."customer_id"="customers"."id"
PyPika allows update queries to be constructed with or without where clauses.
customers = Table('customers')
Query.update(customers).set(customers.last_login, '2017-01-01 10:00:00')
Query.update(customers).set(customers.lname, 'smith').where(customers.id == 10)
UPDATE "customers" SET "last_login"='2017-01-01 10:00:00'
UPDATE "customers" SET "lname"='smith' WHERE "id"=10
The syntax for joining tables is the same as when selecting data
customers, profiles = Tables('customers', 'profiles')
Query.update(customers)
.join(profiles).on(profiles.customer_id == customers.id)
.set(customers.lname, profiles.lname)
UPDATE "customers"
JOIN "profiles" ON "profiles"."customer_id"="customers"."id"
SET "customers"."lname"="profiles"."lname"
Using pypika.Table
alias to perform the update
customers = Table('customers')
customers.update()
.set(customers.lname, 'smith')
.where(customers.id == 10)
UPDATE "customers" SET "lname"='smith' WHERE "id"=10
Using limit
for performing update
customers = Table('customers')
customers.update()
.set(customers.lname, 'smith')
.limit(2)
UPDATE "customers" SET "lname"='smith' LIMIT 2
PyPika allows you to use Parameter(str)
term as a placeholder for parametrized queries.
customers = Table('customers')
q = Query.into(customers).columns('id', 'fname', 'lname')
.insert(Parameter(':1'), Parameter(':2'), Parameter(':3'))
INSERT INTO customers (id,fname,lname) VALUES (:1,:2,:3)
This allows you to build prepared statements, and/or avoid SQL-injection related risks.
Due to the mix of syntax for parameters, depending on connector/driver, it is required that you specify the parameter token explicitly.
An example of some common SQL parameter styles used in Python drivers are:
- PostgreSQL:
$number
OR%s
+:name
(depending on driver)- MySQL:
%s
- SQLite:
?
- Vertica:
:name
- Oracle:
:number
+:name
- MSSQL:
%(name)s
OR:name
+:number
(depending on driver)
You can find out what parameter style is needed for DBAPI compliant drivers here: https://www.python.org/dev/peps/pep-0249/#paramstyle or in the DB driver documentation.
Copyright 2016 KAYAK Germany, GmbH
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Crafted with ♥ in Berlin.