This Greengrass accelerator demonstrates the use of the AWS IoT Greengrass stream manager (stream manager) and the Greengrass Docker Application Deployment Connector features as depicted in this blog post.
Common use cases for stream manager include:
- Local video processing
- Image recognition
- High-volume data collection
- Handles disconnected or intermittent connectivity scenarios
- Prioritization of data path to the cloud for multiple streams
In this use case, we create two local data streams, with the first stream, LocalDataStream, configured for low-priority export to AWS IoT Analytics and a set amount of local disk space and overwriting the old data. The second stream, AggregateDataStream, exports data with high priority to an Amazon Kinesis Data Stream, again using local resources for data persistence.
There are three deployed Lambda functions and a Greengrass managed Docker container that serve different purposes:
- Sensor Source Function - Generates simulated data for use by the rest of the accelerator.
- Stream Producer Function - Reads sensor data at high volume (20 readings per-second) and writes it to the LocalDataStream.
- Stream Aggregator Function - Reads the LocalDataStream, aggregates the data by creating an average per sensor every five seconds, then writes the aggregate message to the AggregateDataStream.
- Greengrass Managed Docker Container - A Flask web application running in a Docker container that reads the aggregate values from the Stream Aggregator Function, displaying the aggregated information.
The two streams are used locally and also export the data to AWS IoT Analytics for the raw data, and to Amazon Kinesis Data Streams for the aggregated data.
To demonstrate both Stream Manager and the Greengrass Docker application deployment connector, a Lambda function is created to generate test sensor data, which is then read by the Producer Function and processed as described above. The data is sent AWS IoT Analytics and an Amazon Kinesis Data Stream in the cloud.
The Stream Manager creates and persists the data locally, and sends to the cloud when there is connectivity. The Greengrass managed container, a web Flask application, displays the aggregate data from the Stream Aggregator function.
Greengrass can execute as either daemon on a host (Docker optional) or as a Docker container. All functionality is available in both methods, with the exception that when running as a container, all Docker-related actions are managed by the dockerd
daemon at the host level.
This accelerator has been designed to run within a Docker container and utilizing the hosts Docker daemon to run Greengrass and the contents of the docker-compose
file sent to Greengrass. In this manner, the accelerator can run on any host with Docker support installed. The main difference in running both Greengrass and the managed containers is that they container(s) are running side-by-side on the host system.
The accelerator can also be modified to run on an existing Greengrass 1.10.0 or newer host installation by not running the docker-compose up -d
steps below, but by using the contents of the gg_docker/config/config.json
and gg_docker/certs/
with you own host system.
The Cloud Development Kit (CDK) is used to deploy a CloudFront stack, and a Python script creates the local assets required, such as the certificate and key pair used by the Greengrass core. The second component is a Docker compose file that runs Greengrass locally.
stream_manager/
├── README.md <-- This file!
├── cdk
├── docker_compose_stack <-- Compose stack to be managed by Greengrass
│ ├── docker-compose.yml
├── gg_docker <-- Creates Greengrass instance as container
│ ├── Dockerfile-greengrass
│ ├── certs
│ ├── config
│ ├── deployment
│ ├── docker-compose.yml
│ └── log
├── docs
└── test
There are two main components to using the accelerator. The cdk/
directory contains the CDK assets to create the Greengrass configuration and long-lived Lambda functions as a Greengrass deployment.
The gg_docker/
directory container the assets to create a Docker image and run Greengrass as a container locally. It also has directories to hold the Greengrass configuration and credentials for the Greengrass Core, along with the log files.
To launch this accelerator as a Docker container, there are a few prerequisites and steps to complete. It is assumed you have basic experience with AWS IoT via the console and CLI.
The following is a list of prerequisites to deploy the accelerator:
- Ensure you have an AWS user account with permissions to create and manage
iot
,greengrass
,lambda
,cloudwatch
, and other services used by CloudFormation (CDK). - Install the AWS CLI locally and create a named profile with credentials for the AWS user account. For Cloud9, there is a profile already created named default.
- Install the AWS Cloud Development Kit and perform a bootstrap in the region you will be working.
- Verify Docker Desktop or Docker Machine installed, and you have the ability to create or download images locally and run containers.
There are two methods describe below:
- If you are familiar with Nodejs, Python and working with the command line on your local system, select the Create and Launch the Accelerator Locally method.
- For all others, use the Step-by-Step: Create and Launch the Accelerator via AWS Cloud9 method.
Click here to show/hide the steps to run locally
💡These steps assume familiarity with installation of NPM packages, Python, and working at the command line. For a getting-started deployment process, see the next section for a step-by-step deploying via AWS Cloud9.
-
Install and bootstrap the CDK:
npm install -g aws-cdk export ACCOUNT=$(aws sts get-caller-identity --query Account --output text) # Set REGION to where the accelerator will be installed export REGION=us-east-1 cdk bootstrap aws://$ACCOUNT/$REGION
-
Clone the repository and change into the
cdk/
directory, build and deploy the CloudFormation stack:git clone https://github.com/awslabs/aws-iot-greengrass-accelerators.git cd aws-iot-greengrass-accelerators/accelerators/stream_manager/cdk npm install npm run build # Set region for deployment (optional if the AWS CLI profile has the region set to where you want to deploy) export CDK_DEPLOY_REGION=us-east-1 # replace PROFILE_NAME with your specific AWS CLI profile that has username and region cdk --profile PROFILE_NAME deploy
-
At this point the CloudFormation stack is deployed. Next, run the
deploy_resources.py
script, which will:- Read the local CDK output to determine the CloudFormation stack name
- From the CloudFormation stack output, read the values to:
- Create the
config.json
file for Greengrass in thestream_manager/gg_docker/config
directory - Create the
certificate.pem
andprivate_key.pem
files in thestream_manager/gg_docker/certs
directory
- Create the
- And finally upload the
stream_manager/docker_compose_stack/docker-compose.yml
file to the S3 directory referenced by the Greengrass Docker Application Deployment Connector.
python3 deploy_resources.py -p default
-
Next, change to the
stream_manager/gg_docker
directory and start Greengrass running as Docker container. If you intend to run Greengrass on a physical device, copy the contents of thestream_manager/gg_docker/certs
and thestream_manager/gg_docker/config
directories to your core (e.g.,/greengrass/certs
and/greengrass/config
).cd ../gg_docker # Build from the latest published version of Greengrass docker pull amazon/aws-iot-greengrass:1.11.5-1-amazonlinux-x86-64 docker-compose build docker-compose up -d
At this point, the CloudFormation stack has been deployed and the Greengrass container is running. The CloudFormation stack will also trigger an initial deployment of all resources to the Greengrass Core, so the Lambda functions, Stream Manager, and docker containers are also running.
Greengrass will start to write files into the gg_docker/log
directory, and the web interface to the Flask application can be locally accessed via http://localhost:8082 (IP address dependent on your local Docker process).
Click here to show/hide the steps to run via AWS Cloud9
💡 All steps below use a Cloud9 IDE in the same account and region where the accelerator will be run. If running locally, ensure you have the AWS CLI installed, and change the AWS named profile from default to one you have created with proper permissions.
Prior to launching the accelerator container locally, the AWS CDK is used to generate a CloudFormation template and deploy it. From Cloud9, follow the steps to create and launch the stack via the CDK.
-
Create a new Cloud9 IDE and make sure that Amazon Linux 2 and t3.small are selected.
❗The following steps will not work if Ubuntu is selected.
-
Once the Cloud9 environment starts, follow these steps to resize the disk. Create the
resize.sh
file and runbash resize.sh 40
to extend the disk to 40GiB. -
Pre-requisites (only needs be run once and the environment will reboot) - Open a new Terminal window and run these commands:
# Cloud9 Commands - change as needed for local development environment # Install pre-requisites, bootstrap CDK for use in account/region, and reboot npm uninstall -g cdk npm install -g aws-cdk@latest sudo yum install iptables-services -y # Bootstrap CDK for current AWS account and region where Cloud9 runs ACCOUNT=$(aws sts get-caller-identity --query Account --output text) REGION=$(aws configure get region) cdk bootstrap aws://$ACCOUNT/$REGION sudo curl -L "https://github.com/docker/compose/releases/download/1.25.3/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose sudo chmod +x /usr/local/bin/docker-compose pip3 install --user boto3 # Enable soft/hard links sudo cat <<EOF | sudo tee /etc/sysctl.d/98-cloud9-greengrass.conf fs.protected_hardlinks = 1 fs.protected_symlinks = 1 EOF # Allow access to Cloud9 instance for local Flask app # Add inbound for port 80 to the flask app (80->8082) sudo systemctl enable iptables sudo /sbin/iptables -A PREROUTING -t nat -i eth0 -p tcp --dport 80 -j REDIRECT --to-port 8082 sudo service iptables save aws ec2 authorize-security-group-ingress --group-name $(curl -s http://169.254.169.254/latest/meta-data/security-groups) --protocol tcp --port 80 --cidr 0.0.0.0/0 sudo reboot
-
Build and deploy. Once Cloud9 has restarted, issue these commands. If the terminal window is unresponsive, open a new one.
# After reboot open a new terminal window and issue these commands # NOTE: If terminal window spins when restarted, close the terminal window and launch a new one cd ~/environment # Clone the repository git clone https://github.com/awslabs/aws-iot-greengrass-accelerators.git cd ~/environment/aws-iot-greengrass-accelerators/accelerators/stream_manager/cdk # Build and deploy the CDK (CloudFormation stack) npm install npm run build cdk --profile default deploy # Acknowledge the creation above, then run python3 deploy_resources.py -p default # Build and start the Greengrass docker container cd ../gg_docker docker pull amazon/aws-iot-greengrass:1.11.3-amazonlinux-x86-64 docker-compose build docker-compose up -d # Get the IP address for accessing the Flask docker container once it is operational MY_IP=$(curl -s ifconfig.co) echo "This is the URL to access the Flask Container: http://$MY_IP" # Done!
-
At this point, the CloudFormation stack has been deployed and the Greengrass container is running. The CloudFormation stack will also trigger an initial deployment of all resources to the Greengrass Core, so the Lambda functions, Stream Manager, and docker containers are also running.
Greengrass will start to write files into the
gg_docker/log
directory, and the web interface to the Flask application can be accessed via the returned URL on the This is the URL to access... line above. -
❗ The Docker containers run as the root process in Cloud9 (and other Linux environments). If you wish to look at log or deployment files locally, it is easiest to launch another terminal tab and set the user to root:
sudo su - cd ~ec2-user/environment/aws-iot-greengrass-accelerators/accelerators/stream_manager/gg_docker/ # You can now cat|more|less|tail files from here tail -F log/system/runtime.log ...
With the accelerator running via either method, there are few ways to see how the Greengrass Core is creating and processing data both locally and via the Cloud.
In the gg_docker/log
directory, there will be both system/
and user/
directories where log files are stored. Review the contents of the various Lambda log files for both the deployed Lambda functions and the Docker Application Deployment connector.
💡 Using a local IDE such as Visual Studio Code is an easy way to navigate to and review the contents of the log files. From the command line, the tail -F
command will follow files as the filenames rotate.
From the stream_manager/gg_docker/log
directory, review the following in order:
Directory | Filename | Contents |
---|---|---|
user/REGION/ACCOUNT/ |
greengrass-stream-mgr-accel-GreengrassLambda-SensorSource.log |
Indication that Flask is running (not currently used). |
user/REGION/ACCOUNT/ |
greengrass-stream-mgr-accel-GreengrassLambda-StreamProducer.log |
20 messages per second of accepting and publishing to the LocalDataStream (approx 100 messages for each second). |
user/REGION/ACCOUNT/ |
greengrass-stream-mgr-accel-GreengrassLambda-StreamAggregator.log |
Reading batches of LocalDataStream messages, then aggregating them to the AggregateDataStream. |
aws |
DockerApplicationDeployment.log |
Details on the deployment and running state of Docker containers. |
In normal operation, these log files will write and different rates. By changing the log level settings via the Greengrass Group. These are initially defined in the CDK index.ts
file and set to an informational (INFO) level.
As with the local log files, the CDK stack also enables sending log files to CloudWatch Logs under log groups starting with the prefix of /aws/greengrass/GreengrassSystem
for system logs, and /aws/greengrass/Lambda
for AWS and user Lambda functions. The log level setting for CloudWatch Logs is set to a warning (WARN) level.
When running, data from both the LocalDataStream and the AggregateDataStream and being sent to AWS IoT Analytics and Amazon Kinesis Data Streams respective. While this accelerator does not utilize these services, you can review the data being sent by investigating the Analytics data set and create a Kinesis Data Analytics application to review the Data Stream content (or by using a local Kinesis client).
For AWS IoT Analytics, go to the AWS Console->AWS IoT Analytics->Data sets, and select the greengrass_stream_mgr_accel_sensor_datatset. This will show the Detail configuration for the dataset along with a Result preview of the values being sent. Select the Content menu and download and open the CSV file to see the raw sensor data.
For Amazon Kinesis Data Streams, go to the AWS Console->Kinesis->Data Analytics. Create an application and for the Source, select the AggregateData data stream then click on Discover schema. This will read the data stream and extract the message attributes such as avg_temperature, avg_hertz, COL_timestamp, etc. This demonstrates that the data is available in the Kinesis Data Stream for use in other applications.
The Docker managed application starts a Docker container running a web server (Flask) that reads the aggregated simulated data and charts it. If running locally, you can access via http://localhost:8082 or http://0.0.0.0:8082).
When using Cloud9, the flask app runs on the Cloud9 instance and can be accessed from the URL returned at the end of the deployment steps. Part of the Cloud9 setup is to open inbound port 80 on the Cloud9 instance, then map that to the port the Flask app container runs on (TCP port 8082).
To make changes to this application, you can modify and upload the docker_compose_stack/docker-compose.yml
file and deploy. Note: for the Flask app to access the Greengrass running lambda functions, the docker-compose.ynl
file references and uses the network configuration initiated by the Greengrass docker-compose configuration.
To stop and completely remove this accelerator, follow these steps:
-
From the command line, either locally on in Cloud 9, stop the Greengrass container:
docker-compose down
-
With the container stopped, change to the CDK directory and issue the command to destroy the CloudFormation stack:
cd ../cdk # For Cloud9 cdk --profile default destroy # For locally running (replace PROFILE_NAME with one used to create stack) cdk --profile PROFILE_NAME destroy
If there are any errors abut not being able to completely destroy the stack, review the error messages and resolve. In most cases it will be related to assets that may have been modified. Once resolve, run the
cdk destroy
command again, or delete the stack from the CloudFormation console.Also, at this point the certificate and private key are no longer valid in the Greengrass
certs/
directory. If you wish to redeploy the stack, clear out all of thecerts/
andconfig/
directories, -
If you created any assets that references or used AWS IoT Analytics or Amazon Kinesis Data Streams such as a Kinesis Analytics Application, delete these also.
-
Review any CloudWatch Logs log groups and delete these if needed.
-
Finally, change out of the GitHub repository and fully delete the directory.
That's it! Fully deployed, ran, and cleaned up!
When stopping Greengrass via docker-compose down
, you make see this error:
$ docker-compose down
Stopping greengrass-stream-manager-accelerator ... done
Removing greengrass-stream-manager-accelerator ... done
Removing network greengrass_network
ERROR: error while removing network: network greengrass_network id 4517716a2d362a334ecf7901845a5e69645baf867a607ee7175353f93cb65c7a has active endpoints
This is the result of the Flask application container still referencing the Greengrass container's network.
-
Issue the command again:
$ docker-compose down Removing network greengrass_network
Technical details on how the different accelerator components run.
Different components use either MQTT topics or direct IPC calls to query and command each other.
Details on Lambda Functions and How They Operate
This is a long-running Lambda that generates and publishes simulated sensor data to a local topic, sensor_data
. It also exposes a simplistic API for turning on and off the sensor data which is currently unused, but made available on port 0.0.0.0:8180 .
This is a long-running Lambda that is invoked for every message published by the sensor-source Lambda function on the sensor_data
topic, and then appends the message to the LocalDataStream and 20 messages per second.
This is a long-running Lambda that reads the LocalDataStream for 5 seconds with of raw data, aggregate and average the data, and then append the message to the AggregateDataStream. It also exposes an API call that returns the latest aggregate data, which is used by the Flask app running on the Docker Managed Application Container.