Uml 3111 script for monthly reports (#2441)
* refactored in prep to add polling functionality

* basic polling is working, and athena functions WIP

* athena flag now working

* athena query now runs

* show results of athena query

* put s3 path in dictionary ready for this to be used for location

* drop database first so we don't have tables with old data in

* add ddl file

* query all 4 tables

* sql result now prints out

* export stats table also

* temp (WIP) describe table

* ddl files, refactoring, stats table commented for now until working

* stats table too

* take in date range, with default

* now runs 4 queries

* tidy up output

* tidy up output further

* date substitution in sql string

* refactor, tidy up, update README

* refactoring

* fix

* add polling to query

* print query output tidily

* fix csv output

* results

* gitignore results files

* run black to format python properly

* get all results using token

* maxresults and filename

* fix dates
nickdavis2001 authored Nov 21, 2023
1 parent a9f7fe5 commit 9c84967
Showing 10 changed files with 395 additions and 234 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -16,10 +16,11 @@ scripts/getStatsResultDemo.json
scripts/analysis_scripts/*.csv

query/s3_objects/**
query/results/*.csv
terraform/**/modules/**.terraform.lock.hcl

.structurizr
docs/diagrams/dsl/**/workspace.json

*.pem
tests/vendor
tests/vendor
147 changes: 13 additions & 134 deletions query/README.md
@@ -1,4 +1,4 @@
# Queries using DynamoDB data
# Queries using DynamoDB data and Athena

Returns plaintext or json data of accounts matching either an LPA ID or a user's email address.

@@ -15,157 +15,36 @@ Install pip modules
pip install -r ./requirements.txt
```

## Export DynamoDB data
## Export DynamoDB data, load Athena

Run the following script to request a DynamoDB export to S3
Run the following script to request a DynamoDB export to S3, drop and re-create the Athena database and tables, and run queries against Athena

```shell
aws-vault exec identity -- python ./dynamodb_export.py --environment demo

DynamoDB Table ARN: arn:aws:dynamodb:eu-west-1:367815980639:table/demo-ActorCodes
S3 Bucket Name: use-a-lpa-dynamodb-exports-development
IN_PROGRESS s3://use-a-lpa-dynamodb-exports-development/demo-ActorCodes/AWSDynamoDB/01617269934602-162d6355/data/


DynamoDB Table ARN: arn:aws:dynamodb:eu-west-1:367815980639:table/demo-ActorUsers
S3 Bucket Name: use-a-lpa-dynamodb-exports-development
IN_PROGRESS s3://use-a-lpa-dynamodb-exports-development/demo-ActorUsers/AWSDynamoDB/01617269934827-b65efa47/data/


DynamoDB Table ARN: arn:aws:dynamodb:eu-west-1:367815980639:table/demo-ViewerCodes
S3 Bucket Name: use-a-lpa-dynamodb-exports-development
IN_PROGRESS s3://use-a-lpa-dynamodb-exports-development/demo-ViewerCodes/AWSDynamoDB/01617269935076-45fe3523/data/


DynamoDB Table ARN: arn:aws:dynamodb:eu-west-1:367815980639:table/demo-ViewerActivity
S3 Bucket Name: use-a-lpa-dynamodb-exports-development
IN_PROGRESS s3://use-a-lpa-dynamodb-exports-development/demo-ViewerActivity/AWSDynamoDB/01617269935310-6968000e/data/


DynamoDB Table ARN: arn:aws:dynamodb:eu-west-1:367815980639:table/demo-UserLpaActorMap
S3 Bucket Name: use-a-lpa-dynamodb-exports-development
IN_PROGRESS s3://use-a-lpa-dynamodb-exports-development/demo-UserLpaActorMap/AWSDynamoDB/01617269935547-9e1e9b04/data/
```

You can check sthe status of the last export by running the command again with the `--check_exports` flag
## Script options
TODO TODO
You can check the status of the last dynamo export by running the command again with the `--check_exports` flag

```shell
aws-vault exec identity -- python ./dynamodb_export.py --environment demo --check_exports

DynamoDB Table ARN: arn:aws:dynamodb:eu-west-1:367815980639:table/demo-ActorCodes
S3 Bucket Name: use-a-lpa-dynamodb-exports-development
COMPLETED s3://use-a-lpa-dynamodb-exports-development/demo-ActorCodes/AWSDynamoDB/01617269934602-162d6355/data/

Queries will be run for date range 2023-11-01 to 2023-11-30
Waiting for DynamoDb export to be complete ( if run with Athena only option, this is just checking the previous export is complete )
.

DynamoDB Table ARN: arn:aws:dynamodb:eu-west-1:367815980639:table/demo-ActorUsers
S3 Bucket Name: use-a-lpa-dynamodb-exports-development
COMPLETED s3://use-a-lpa-dynamodb-exports-development/demo-ActorUsers/AWSDynamoDB/01617269934827-b65efa47/data/
DynamoDB export is complete


DynamoDB Table ARN: arn:aws:dynamodb:eu-west-1:367815980639:table/demo-ViewerCodes
S3 Bucket Name: use-a-lpa-dynamodb-exports-development
COMPLETED s3://use-a-lpa-dynamodb-exports-development/demo-ViewerCodes/AWSDynamoDB/01617269935076-45fe3523/data/


DynamoDB Table ARN: arn:aws:dynamodb:eu-west-1:367815980639:table/demo-ViewerActivity
S3 Bucket Name: use-a-lpa-dynamodb-exports-development
COMPLETED s3://use-a-lpa-dynamodb-exports-development/demo-ViewerActivity/AWSDynamoDB/01617269935310-6968000e/data/


DynamoDB Table ARN: arn:aws:dynamodb:eu-west-1:367815980639:table/demo-UserLpaActorMap
S3 Bucket Name: use-a-lpa-dynamodb-exports-development
COMPLETED s3://use-a-lpa-dynamodb-exports-development/demo-UserLpaActorMap/AWSDynamoDB/01617269935547-9e1e9b04/data/
```
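
The "waiting for export" behaviour shown in the output above can be sketched as a simple polling loop. This is a minimal illustration, not the code in `dynamodb_export.py`: the function name `wait_for_exports`, its parameters, and the timeout behaviour are assumptions. It expects a client object with a `describe_export` method shaped like the boto3 DynamoDB client's, which reports `ExportStatus` as `IN_PROGRESS`, `COMPLETED` or `FAILED`.

```python
import time


def wait_for_exports(client, export_arns, poll_seconds=10, max_polls=60, sleep=time.sleep):
    """Poll describe_export until no export is IN_PROGRESS.

    `client` is assumed to be a boto3 DynamoDB client (or any object with a
    compatible describe_export method). Returns {export_arn: final_status}.
    """
    pending = list(export_arns)
    statuses = {}
    for _ in range(max_polls):
        still_pending = []
        for arn in pending:
            description = client.describe_export(ExportArn=arn)["ExportDescription"]
            status = description["ExportStatus"]  # IN_PROGRESS, COMPLETED or FAILED
            statuses[arn] = status
            if status == "IN_PROGRESS":
                still_pending.append(arn)
        pending = still_pending
        if not pending:
            return statuses
        # Print a progress dot between polls, similar to the output above.
        print(".", end="", flush=True)
        sleep(poll_seconds)
    raise TimeoutError(f"exports still in progress: {pending}")
```

Passing the client and the `sleep` function in as arguments keeps the loop testable without AWS credentials; in real use the client would come from `boto3.client("dynamodb")` and the ARNs from the export requests.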
## AWS Athena
We can use AWS Athena to create a database and tables of the exported DynamoDB data so that we can use SQL to further explore and query our data.

### Getting started

See the getting started guide and follow Step 1: Creating a Database here <https://docs.aws.amazon.com/athena/latest/ug/getting-started.html>

After this you are able to write and run queries. Queries can be used to create tables.

### Creating tables

Here are some example SQL statements for creating tables from each DynamoDB Export

Note:

- the location for each export can be copied from the output of the dynamodb_export.py script
- These queries create tables if they don't already exist. If the query is changed to add some new data, either delete and recreate the table or use an UPDATE query.

Examples:

Creating the viewer activity Table

```SQL
CREATE EXTERNAL TABLE IF NOT EXISTS viewer_activity (
Item struct <ViewerCode:struct<S:string>,
ViewedBy:struct<S:string>,
Viewed:struct<S:date>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1' )
LOCATION 's3://use-a-lpa-dynamodb-exports-development/demo-ViewerActivity/AWSDynamoDB/01616672353743-e52c5c67/data/'
TBLPROPERTIES ('has_encrypted_data'='true');
```

Creating the viewer codes Table

```SQL
CREATE EXTERNAL TABLE IF NOT EXISTS viewer_codes (
Item struct <ViewerCode:struct<S:string>,
Added:struct<S:date>,
Expires:struct<S:date>,
Organisation:struct<S:string>,
SiriusUid:struct<S:string>,
UserLpaActor:struct<S:string>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1' )
LOCATION 's3://use-a-lpa-dynamodb-exports-development/demo-ViewerCodes/AWSDynamoDB/01616672353584-6ff1f666/data/'
TBLPROPERTIES ('has_encrypted_data'='true');
```

Creating the actor users Table

```SQL
CREATE EXTERNAL TABLE IF NOT EXISTS actor_users (
Item struct <Id:struct<S:string>,
Email:struct<S:string>,
LastLogin:struct<S:date>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1' )
LOCATION 's3://use-a-lpa-dynamodb-exports-development/demo-ViewerCodes/AWSDynamoDB/01616672353584-6ff1f666/data/'
TBLPROPERTIES ('has_encrypted_data'='true');
```

Creating the user-lpa-actor map Table

```SQL
CREATE EXTERNAL TABLE IF NOT EXISTS user_lpa_actor_map (
Item struct <Id:struct<S:string>,
ActorId:struct<S:string>,
Added:struct<S:date>,
SiriusUid:struct<S:string>,
UserId:struct<S:string>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1' )
LOCATION 's3://use-a-lpa-dynamodb-exports-development/demo-ViewerCodes/AWSDynamoDB/01616672353584-6ff1f666/data/'
TBLPROPERTIES ('has_encrypted_data'='true');
```
The idea going forward is for the dynamodb_export script to provide regularly run Athena queries. For ad-hoc queries that are one-shot or aren't yet automated, we can access Athena via the AWS Console and run SQL queries against the ual database that this script creates.
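
As a rough sketch of what an automated Athena query involves (start the query, poll until it finishes, then page through the results with the continuation token), something like the following could be used. It is not the script's actual implementation: the function name `run_athena_query` and its parameters are assumptions, and the output location passed in is hypothetical. The client is expected to expose `start_query_execution`, `get_query_execution` and `get_query_results` in the shape of the boto3 Athena client.

```python
import time


def run_athena_query(client, sql, database, output_location, poll_seconds=2, sleep=time.sleep):
    """Start a query, wait for it to finish, and return every row as a list of strings."""
    execution_id = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Poll until the query leaves the QUEUED/RUNNING states.
    while True:
        execution = client.get_query_execution(QueryExecutionId=execution_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Athena query {execution_id} ended in state {state}")
        sleep(poll_seconds)

    # Results are paginated; keep requesting pages until no NextToken is returned.
    rows = []
    kwargs = {"QueryExecutionId": execution_id}
    while True:
        page = client.get_query_results(**kwargs)
        for row in page["ResultSet"]["Rows"]:
            # Null cells have no VarCharValue key, so fall back to an empty string.
            rows.append([cell.get("VarCharValue", "") for cell in row["Data"]])
        token = page.get("NextToken")
        if not token:
            return rows
        kwargs["NextToken"] = token
```

For a `SELECT` query the first returned row normally holds the column headers, so a caller writing CSV output can treat `rows[0]` as the header line. The database argument would be `ual`, the database this script creates.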
### Querying the newly created tables
### Querying the Athena tables
After creating tables, you can run queries. Here is an example Select Query for Athena.
Here is an example Select Query for Athena which can be run in the AWS console.
```SQL
-- issues SELECT query