Solution based on the Google Megalista project.
Sample integration code for onboarding offline/CRM data from BigQuery as custom audiences or offline conversions in Google Ads, Google Analytics 360, Google Display & Video 360, Google Campaign Manager and Facebook Ads.
-
Google Ads
-
Google Analytics (Universal analytics)
-
Campaign Manager
- Offline Conversions API (user id, device id, match id, gclid, dclid) [details]
-
Google Analytics 4
- Measurement protocol (Web + App) [details]
-
Appsflyer
- S2S Offline events API (conversion upload), to be used for audience creation and in-app events with Google Ads and DV360 [details]
MDS was design to separate the configuration of conversion/audience upload rules from the engine, giving more freedom for non-technical teams (i.e. Media and Business Inteligence) to setup multiple upload rules on their own.
The solution consists in #1 a Google Spreadsheet (template) in which all rules are defined by mapping a data source (BigQuery Table) to a destination (data upload endpoint) and #2, an apache beam workflow running on Google Dataflow, scheduled to upload the data in batch mode.
- Google Cloud Platform account
- Billing enabled
- BigQuery enabled
- Dataflow enabled
- Cloud storage enabled
- Cloud scheduler enabled
- At least one of:
- Google Ads API Access
- Campaign Manager API Access
- Google Analytics API Access
- Python3
- Google Cloud SDK
Those are the minimum roles necessary to deploy MDS:
- OAuth Config Editor
- BigQuery User
- BigQuery Job User
- BigQuery Data Viewer
- Cloud Scheduler Admin
- Storage Admin
- Dataflow Admin
- Service Account Admin
- Logs Viewer
- Service Consumer
Required APIs will depend on upload endpoints in use. We recomend you to enable all of them:
- Google Sheets (required for any use case) [link]
- Google Analytics [link]
- Google Analytics Reporting [link]
- Google Ads [link]
- Campaign Manager [link]
WIP
To access campaigns and user lists on Google's platforms, this dataflow will need OAuth tokens for a account that can authenticate in those systems.
In order to create it, follow these steps:
- Access GCP console
- Go to the API & Services section on the top-left menu.
- On the OAuth Consent Screen and configure an Application name
- Then, go to the Credentials and create an OAuth client Id with Application type set as Desktop App
- This will generate a Client Id and a Client secret
- Run the generate_mds_token.sh script in this folder providing these two values and follow the instructions
- Sample:
./generate_mds_token.sh client_id client_secret
- Sample:
- This will generate the Access Token and the Refresh token
This bucket will hold the deployed code for this solution. To create it, navigate to the Storage link on the top-left menu on GCP and click on Create bucket. You can use Regional location and Standard data type for this bucket.
We recommend first running it locally and make sure that everything works. Make some sample tables on BigQuery for one of the uploaders and make sure that the data is getting correctly to the destination. After that is done, upload the Dataflow template to GCP and try running it manually via the UI to make sure it works. Lastly, configure the Cloud Scheduler to run MDS in the frequency desired and you'll have a fully functional data integration pipeline.
python3 mds_dataflow/main.py \
--runner DirectRunner \
--developer_token ${GOOGLE_ADS_DEVELOPER_TOKEN} \
--setup_sheet_id ${CONFIGURATION_SHEET_ID} \
--refresh_token ${REFRESH_TOKEN} \
--access_token ${ACCESS_TOKEN} \
--client_id ${CLIENT_ID} \
--client_secret ${CLIENT_SECRET} \
--project ${GCP_PROJECT_ID} \
--region us-central1 \
--temp_location gs://{$GCS_BUCKET}/tmp
To deploy, use the following commands from the root folder:
cd terraform
./scripts/deploy_cloud.sh project_id bucket_name region_name
To execute the pipeline, use the following steps:
- Go to Dataflow on GCP console
- Click on Create job from template
- On the template selection dropdown, select Custom template
- Find the mds file on the bucket you've created, on the templates folder
- Fill in the parameters required and execute
To schedule daily/hourly runs, go to Cloud Scheduler:
- Click on create job
- Add a name and frequency as desired
- For target set as HTTP
- Configure a POST for url: https://dataflow.googleapis.com/v1b3/projects/${YOUR_PROJECT_ID}/locations/${LOCATION}/templates:launch?gcsPath=gs://${BUCKET_NAME}/templates/mds, replacing the params with the actual values
- For a sample on the body of the request, check cloud_config/scheduler.json
- Add OAuth Headers
- Scope: https://www.googleapis.com/auth/cloud-platform
It's recommended to create a new Service Account to be used with the Cloud Scheduler
- Go to IAM & Admin > Service Accounts
- Create a new Service Account with the following roles:
- Cloud Dataflow Service Agent
- Dataflow Admin
- Storage Objects Viewer
Every upload method expects as source a BigQuery data with specific fields, in addition to specific configuration metadata. For details on how to setup your upload routines, refer to the MDS Wiki or the MDS user guide.
Only contributions that meet the following requirements will be accepted:
DP6 Koopa-troopa Team
e-mail: [email protected]