A starter set of examples for writing Google Cloud Dataflow programs using Cloud Bigtable.
- Follow the Cloud Dataflow getting started instructions (if required), including:
- Create a project
- Enable Billing
- Enable APIs
- Create a Google Cloud Storage Bucket
- Development Environment Setup
- Install Google Cloud SDK
- Install Java
- Install Maven
- You may also wish to Run an Example Pipeline
- Create a Cloud Bigtable instance using the Developer Console: click Storage > Cloud Bigtable > New Instance, enter the instance name, ID, zone, and number of nodes, then click Create.
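  If you prefer the command line, recent versions of the Cloud SDK can also create the instance with gcloud; the instance ID, display name, cluster ID, zone, and node count below are placeholders, and the exact flags vary by gcloud version:
  gcloud bigtable instances create my-instance --display-name="My Instance" --cluster-config=id=my-instance-c1,zone=us-central1-b,nodes=3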
- Using the Developer Console, click Storage > Cloud Storage > Browser, then click the Create Bucket button. You will need a globally unique name for your bucket, such as your projectID.
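  The bucket can also be created from the command line with gsutil (substitute your own bucket name and project):
  gsutil mb -p <projectID> gs://my_bucket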
The next step is required only for the Pub/Sub sample.
- Using the Developer Console, click Big Data > Pub/Sub, then click the New topic button. 'shakes' is a good topic name.
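  With a recent Cloud SDK, the topic can also be created from the command line (the topic name matches the console step above):
  gcloud pubsub topics create shakes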
- Create a table using the HBase shell:
  create 'Dataflow_test', 'cf'
Note - you may wish to keep the HBase shell open in a tab throughout.
These examples are configured with the following command-line options for Cloud Bigtable:
-Dbigtable.projectID=<projectID>
- this will also be used for your Dataflow projectID
-Dbigtable.instanceID=<instanceID>
-Dgs=gs://my_bucket
- A Google Cloud Storage bucket.
Optional Arguments
-Dbigtable.table=<Table to Read / Write>
- defaults to 'Dataflow_test'
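Putting these together, a full invocation of one of the examples (with the optional table flag spelled out and placeholder values for the IDs and bucket) looks like:
mvn package exec:exec -DHelloWorldWrite -Dbigtable.projectID=<projectID> -Dbigtable.instanceID=<instanceID> -Dgs=gs://my_bucket -Dbigtable.table=Dataflow_test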
The HelloWorld examples take two strings, convert them to their upper-case representation, and write them to Cloud Bigtable.
HelloWorldWrite does a few Puts to show the basics of writing to Cloud Bigtable through Cloud Dataflow.
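The shape of such a write pipeline is sketched below, assuming the Dataflow 1.x SDK and the CloudBigtableIO connector from the bigtable-hbase-dataflow artifact; the class name, column family/qualifier, and hard-coded IDs are illustrative, and the sample's actual code may differ.

```java
import com.google.cloud.bigtable.dataflow.CloudBigtableIO;
import com.google.cloud.bigtable.dataflow.CloudBigtableTableConfiguration;
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.transforms.Create;
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;
import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class HelloWorldWriteSketch {
  // Illustrative column family/qualifier; the table created above uses family 'cf'.
  private static final byte[] FAMILY = Bytes.toBytes("cf");
  private static final byte[] QUALIFIER = Bytes.toBytes("value");

  public static void main(String[] args) {
    // In the real sample these IDs come from the -Dbigtable.* command-line options.
    CloudBigtableTableConfiguration config = new CloudBigtableTableConfiguration.Builder()
        .withProjectId("<projectID>")
        .withInstanceId("<instanceID>")
        .withTableId("Dataflow_test")
        .build();

    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    CloudBigtableIO.initializeForWrite(p);  // registers coders needed for Mutations

    p.apply(Create.of("Hello", "World"))
     // Turn each string into an HBase Put whose value is the upper-cased string.
     .apply(ParDo.of(new DoFn<String, Mutation>() {
        @Override
        public void processElement(ProcessContext c) {
          String word = c.element();
          c.output(new Put(Bytes.toBytes(word))
              .addColumn(FAMILY, QUALIFIER, Bytes.toBytes(word.toUpperCase())));
        }
      }))
     .apply(CloudBigtableIO.writeToTable(config));

    p.run();
  }
}
```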
mvn package exec:exec -DHelloWorldWrite -Dbigtable.projectID=<projectID> -Dbigtable.instanceID=<instanceID> -Dgs=<Your bucket>
You can verify that the data was written by opening the HBase shell and typing scan 'Dataflow_test'. You can also remove the data, if you wish, using:
deleteall 'Dataflow_test', 'Hello'
deleteall 'Dataflow_test', 'World'
SourceRowCount shows the use of a Cloud Bigtable Source - a construct that knows how to scan a Cloud Bigtable table. SourceRowCount performs a simple row count using the Cloud Bigtable Source and writes the count to a file in Google Cloud Storage.
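A rough sketch of such a count pipeline, under the same assumptions as above (Dataflow 1.x SDK plus the CloudBigtableIO connector; the class name, IDs, and output path are placeholders):

```java
import com.google.cloud.bigtable.dataflow.CloudBigtableIO;
import com.google.cloud.bigtable.dataflow.CloudBigtableScanConfiguration;
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.Read;
import com.google.cloud.dataflow.sdk.io.TextIO;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.transforms.Count;
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;
import org.apache.hadoop.hbase.client.Result;

public class SourceRowCountSketch {
  public static void main(String[] args) {
    // Scan configuration: which project, instance, and table to read.
    CloudBigtableScanConfiguration config = new CloudBigtableScanConfiguration.Builder()
        .withProjectId("<projectID>")
        .withInstanceId("<instanceID>")
        .withTableId("Dataflow_test")
        .build();

    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply(Read.from(CloudBigtableIO.read(config)))   // scan the table as a bounded source
     .apply(Count.<Result>globally())                  // count the rows
     .apply(ParDo.of(new DoFn<Long, String>() {        // format the count as text
        @Override
        public void processElement(ProcessContext c) {
          c.output(c.element().toString());
        }
      }))
     .apply(TextIO.Write.to("gs://my_bucket/count"));  // emitted as count-XXXXXX-of-YYYYYY

    p.run();
  }
}
```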
mvn package exec:exec -DSourceRowCount -Dbigtable.projectID=<projectID> -Dbigtable.instanceID=<instanceID> -Dgs=<Your bucket>
You can verify the results by first typing:
gsutil ls gs://my_bucket/**
There should be a file that looks like count-XXXXXX-of-YYYYYY. Type:
gsutil cp gs://my_bucket/count-XXXXXX-of-YYYYYY .
cat count-XXXXXX-of-YYYYYY