PurpleQuery

Overview

PurpleQuery is a partial implementation of BigQuery for testing. All table creation and querying happens in-memory. The parser is a straightforward translation of the published BigQuery syntax, and the backend is based on Pandas. The provided API client is a drop-in replacement for the Google Python BigQuery client.

PurpleQuery also integrates smoothly with Verily's analysis-py-utils BigQuery wrapper, which provides a simplified API for interacting with BigQuery.

PurpleQuery is compatible with Python 2 and 3.

Installation

pip install git+https://github.com/verilylifesciences/purplequery.git

Supported Features

PurpleQuery supports

WITH ... SELECT [DISTINCT] [EXCEPT] [REPLACE] ... FROM ... JOIN ... WHERE ... GROUP BY ... ORDER BY ... LIMIT ...
INNER, OUTER, CROSS joins
UNION ALL
Analytic functions
Arithmetic expressions

Functions including

ARRAY
ARRAY_AGG
CASE
CAST
CONCAT
COUNT
EXISTS
EXTRACT
IF
(NOT) IN
IS (NOT) NULL
MAX
MIN
MOD
NOT
ROW_NUMBER
STRUCT
SUM
TIMESTAMP

Unsupported features

UNNEST columns (UNNEST array expressions are supported)
many functions
set operations besides UNION ALL
UPDATE and other mutation operations in queries (these are supported via the Python API)
Window identifiers and window frames for analytic function calls

Usage

from google.cloud.bigquery import Dataset, DatasetReference, Table, TableReference
from google.cloud.bigquery.job import QueryJobConfig
from google.cloud.bigquery.schema import SchemaField
from purplequery import Client

bq_client = Client('my_project')

dataset_ref = DatasetReference('my_project', 'dataset1')
bq_client.create_dataset(Dataset(dataset_ref))
table = Table(TableReference(dataset_ref, 'table1'),
              [SchemaField(name='num', field_type='INTEGER'),
               SchemaField(name='ID', field_type='STRING')])
bq_client.create_table(table)
bq_client.insert_rows(table, [{'num': 1, 'ID': 'first'},
                              {'num': 2, 'ID': 'second'}])
job = bq_client.query('SELECT * FROM `my_project.dataset1.table1`',
                      QueryJobConfig())
rows = [list(row.values()) for row in job.result()]

print rows

[[1, 'first'], [2, 'second']]

Usage with bigquery_wrapper

To use with the Verily BigQuery wrapper, use the alternate_bq_client_class parameter to use PurpleQuery instead, and the self.bq_client object available to test methods will use PurpleQuery instead of the Google BigQuery client.

from verily.bigquery_wrapper import bq_test_case
from purplequery import Client as FakeClient

class MyTest(bq_test_case.BQTestCase):

    @classmethod
    def setUpClass(cls, use_mocks=False):
        super(Mytest, cls).setUpClass(
            use_mocks=use_mocks,
            alternate_bq_client_class=FakeClient)

    ...

License

See the license

Contributing

See instructions for contributing to this project

Troubleshooting

PurpleQuery is not a complete implementation of BigQuery. For (rare) cases where an SQL statement is not understood by PurpleQuery's parser, the result will be an exception Could not fully parse query, followed by the tokens following the section that could be parsed. Typically the beginning of this section will be the syntactic element that PurpleQuery does not understand. See grammar.py for the grammar understood; feel free to extend it!

NotImplementedError is raised when a syntactic element is parsed but is not implemented in the backend, e.g. a function. See evaluatable_node.py to add a new function or other expression element.

Running tests

Clone the repository:

git clone https://github.com/verilylifesciences/purplequery.git

Build the docker container:

docker build -t purplequery purplequery/

Run the tests in the docker container:

docker run --rm -ti -v `pwd`/purplequery:/pq -w /pq purplequery /pq/run_tests.sh

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
purplequery		purplequery
.dockerignore		.dockerignore
.gitignore		.gitignore
.isort.cfg		.isort.cfg
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
WORKSPACE		WORKSPACE
class_hierarchy.md		class_hierarchy.md
lint-requirements.txt		lint-requirements.txt
mypy.ini		mypy.ini
run_tests.sh		run_tests.sh
setup.cfg		setup.cfg
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PurpleQuery

Overview

Installation

Supported Features

Unsupported features

Usage

Usage with bigquery_wrapper

License

Contributing

Troubleshooting

Running tests

About

Releases

Packages

Contributors 4

Languages

License

verilylifesciences/purplequery

Folders and files

Latest commit

History

Repository files navigation

PurpleQuery

Overview

Installation

Supported Features

Unsupported features

Usage

Usage with bigquery_wrapper

License

Contributing

Troubleshooting

Running tests

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

Packages