-
Notifications
You must be signed in to change notification settings - Fork 28
[Design] Reports From Druid
This document describes the design to generate the data for the portal dashboards from Druid OLAP data store and export the report data to cloud storage. This mainly consists of following modules
- Configure Report API - This API will be used to submit a request for configuration of a new report.
- Job Scheduler Engine - This Scheduler will submit the reports for execution based on execution frequency.
- Disable Report API - This API will mark an existing report as disabled and will be excluded from the list of reports to be executed.
- Report Data Generator - The report data generator will be a spark job which will generate report data file by executing the query configured in the report configuration against the druid data store. The report data file will then be exported to cloud storage to complete the report execution.
-
Input:
- ReportName
- QueryEngine - druid/cassandra/ES
- ExecutionFrequency - Daily/Weekly/Monthly
- ChannelId
- ReportInterval
- JsonQuery
- OutputFormat - json/csv
Request Object
{ "id":"sunbird.analytics.report.submit", "ver":"1.0", "ts":"2019-03-07T12:40:40+05:30", "params":{ "msgid":"4406df37-cd54-4d8a-ab8d-3939e0223580", "client_key":"analytics-team" }, "request":{ "channel_id":"in.ekstep", "report_name":"avg_collection_downloads", "query_engine": "druid", "execution_frequency": "Daily", "report_interval":"LAST_7_DAYS", "output_format": "json" "query_json":{ "queryType":"groupBy", "dataSource":"telemetry-events", "granularity":"day", "dimensions":[ "eid" ], "aggregations":[ { "type":"count", "name":"context_did", fieldName":"context_did" } ], "filter":{ "type":"and", "fields":[ { "type":"selector", "name":"eid", fieldName":"IMPRESSION" }, { "type":"selector", "name":"edata_type", fieldName":"detail" }, { "type":"selector", "name":"edata_pageid", fieldName":"collection-detail" }, { "type":"selector", "name":"context_pdata_id", fieldName":"prod.diksha.app" } ] }, "postAggregations":[ { "type":"arithmetic", "name":"avg__edata_value", "fn":"/", "fields":[ { "type":"fieldAccess", "name":"total_edata_value", "fieldName":"total_edata_value" }, { "type":"fieldAccess", "name":"rows", "fieldName":"rows" } ] } ], "intervals":[ "2019-02-20T00:00:00.000/2019-01-27T23:59:59.000" ] } } }
-
Output:
The individual report configurations can be saved to a Cassandra table. The druid query JSON will be saved to Azure blob storage and the following will be the fields in the report configuration table.
# Schema of table TABLE platform_db.druid_report_config ( report_id text, report_name text, report_interval text, query_engine text, execution_frequency text, is_enabled boolean, report_query_location text, report_output_format text, report_output_location text, report_output_filename text, PRIMARY KEY (report_id) ); )
-
Output location and file format of query:
/druid-reports/query/druid/report-id.json /druid-reports/query/cassandra/report-id.cql
-
Report Intervals:
YESTERDAY LAST_7_DAYS LAST_30_DAYS LAST_MONTH LAST_6_MONTHS LAST_YEAR StartDate(DD/MM/YYYY)-EndDate(DD/MM/YYYY)
-
Input:
-
druid_report_config
cassandra table
-
-
Output:
- Based on execution_frequency in
platform_db.druid_report_config
submits a request for execution by inserting a record inplatform_db.job_request
Cassandra table.
- Based on execution_frequency in
-
Input:
- report-id
-
Output:
- The report will be disabled in the
platform_db.druid_report_config
Cassandra table
- The report will be disabled in the
-
Input:
-
Set of Requests
i.e
All records inplatform_db.job_request
where status=SUBMITTED
and job_name=druid-reports
-
-
Output:
- Report data file will be saved in Azure with specified format
-
platform_db.job_request
table will be updated with job status and output file details will be updated inplatform_db.druid_report_config
-
Output location and file format in Azure:
Once a request has been submitted and processing complete, the report data file with the name of the file being the report name suffixed with genaration date saved under :
/druid-reports/report-id/report-name-mmddyyyy.csv /druid-reports/report-id/report-name-mmddyyyy.json