GATD is a cloud-based system for managing and storing data streams. It was born out of a need to handle data generated by disparate sensors with varying data types, transmission protocols, and end-use goals.
GATD has three major design goals:
- Modularity. GATD is a loose collection of modules connected by unbounded queues and a database layer. Each module belongs to one block of the system, and a block can contain many modules. For example, the receiver block has one module that listens for UDP packets and another that listens for HTTP requests. This makes GATD easy to extend as requirements change and new sensors come online.
- Flexibility. GATD makes virtually no assumptions about the format, type, or content of incoming data. The only requirement is that a sensor identify its data stream to the system so the data can be processed properly. Each data stream has a custom parser that knows how to interpret that stream's data. The parser simply returns key-value pairs, with no restrictions on key names or value types. GATD is designed to adapt to the sensors, not vice versa.
- Timeliness. GATD is designed to support real-time streaming applications in which data arrives as it is generated and is sent to interested clients immediately. Every component is optimized for this workflow. All data is also stored, so it can be retrieved and processed later if necessary.
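The per-stream parser idea can be illustrated with a small dispatch table. This is only a sketch under stated assumptions: the registry, the `parser` decorator, the stream identifiers, and both wire formats are hypothetical and are not taken from the GATD source. It shows one way a formatter could look up the right parser from a stream identifier and get back unrestricted key-value pairs.

```python
import json
import struct

# Hypothetical registry mapping a stream identifier to its parser.
PARSERS = {}


def parser(stream_id):
    """Register the decorated function as the parser for one data stream."""
    def register(func):
        PARSERS[stream_id] = func
        return func
    return register


@parser("temp-sensor")
def parse_temperature(raw):
    # Assumed wire format: one big-endian signed 16-bit integer,
    # holding tenths of a degree Celsius.
    (tenths,) = struct.unpack(">h", raw)
    return {"temperature_c": tenths / 10.0}


@parser("door-sensor")
def parse_door(raw):
    # Assumed wire format: a UTF-8 encoded JSON object.
    return json.loads(raw.decode("utf-8"))


def parse(stream_id, raw):
    """Dispatch raw bytes to the parser registered for this stream."""
    try:
        handler = PARSERS[stream_id]
    except KeyError:
        raise ValueError("no parser registered for stream %r" % stream_id)
    return handler(raw)


print(parse("temp-sensor", struct.pack(">h", 215)))
print(parse("door-sensor", b'{"open": true, "count": 3}'))
```

Supporting a new sensor in this sketch means adding one decorated function; nothing else in the dispatch path changes, which is the property the flexibility goal describes.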
The major blocks of GATD are as follows:
- Receiver. Accepts data from any sensor and records all relevant metadata alongside the data before passing both to the formatter.
- Formatter. A stateless block that converts raw sensor data into key-value pairs. It calls the appropriate parser to interpret the raw data, stores the result in a database, and passes it on to any streamers.
- Streamer. Sends data to interested clients. A client registers a query with a streamer, and any matching packets are sent to that client.
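The flow through the three blocks can be sketched in a single process. This is an illustration only: in-memory deques stand in for the queues between blocks, the database write is omitted, and the function names, metadata fields, and exact-match query format are all hypothetical rather than GATD's actual interfaces.

```python
import json
import time
from collections import deque

# In-memory deques stand in for the queues between blocks.
received = deque()
formatted = deque()


def receive(raw, source_addr):
    """Receiver: attach metadata to the raw payload and enqueue it."""
    received.append({"raw": raw, "address": source_addr, "time": time.time()})


def format_next(parse):
    """Formatter: turn one queued raw packet into key-value pairs.

    It keeps no state between calls; everything it needs arrives in the
    queued item. A real formatter would also store the record here.
    """
    item = received.popleft()
    record = dict(parse(item["raw"]))
    record["address"] = item["address"]
    record["time"] = item["time"]
    formatted.append(record)


def matches(query, record):
    """Assumed query format: every key in the query must equal the record's value."""
    return all(record.get(key) == value for key, value in query.items())


def stream_next(clients):
    """Streamer: deliver one record to every client whose query matches."""
    record = formatted.popleft()
    for query, outbox in clients:
        if matches(query, record):
            outbox.append(record)


def parse_json(raw):
    """Hypothetical parser for a stream that sends UTF-8 JSON objects."""
    return json.loads(raw.decode("utf-8"))


kitchen, everything = [], []
clients = [({"room": "kitchen"}, kitchen), ({}, everything)]

receive(b'{"room": "kitchen", "watts": 42}', "10.0.0.5")
receive(b'{"room": "lab", "watts": 7}', "10.0.0.6")
for _ in range(2):
    format_next(parse_json)
    stream_next(clients)

print("kitchen client got %d record(s)" % len(kitchen))
print("unfiltered client got %d record(s)" % len(everything))
```

Because each block only reads from one queue and writes to the next, a block could be replaced or duplicated without touching the others.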
The current version of GATD is a research-oriented implementation designed for speed of development and ease of experimentation rather than performance. Most modules are written in Python, although the loose, modular design means some are written in Node.js and C as well.
GATD uses RabbitMQ for the inter-module queues and MongoDB for data storage.
- Python 2.7.*
- MongoDB
- RabbitMQ
- Node.js
- tup
- Install MongoDB and RabbitMQ Server.
- Install dependencies.

        sudo apt-get install python-pip git python-dev screen

    or

        sudo yum install python-pip git python-devel screen

    or

        sudo port install py27-pip git-core

- Set up a gatd user and check out GATD. You will also want to add yourself to the gatd group and then log out and back in. You can probably skip this step on a Mac.

        sudo adduser gatd
        cd /opt
        sudo git clone https://github.com/lab11/gatd.git
        sudo chown -R gatd:gatd gatd
        sudo chmod -R g+w gatd
        sudo usermod -a -G gatd <username>

- Copy the example GATD config file and set the necessary values. Make sure any passwords you set in the following steps are reflected in this file.

        cd /opt/gatd/config
        cp gatd.config.example gatd.config

- Configure MongoDB using the template config file in the mongo folder.
    - Copy the config file to /etc/mongodb.conf.

            sudo cp /opt/gatd/mongo/mongodb.conf /etc/mongodb.conf

    - Edit the config file with the port you want to use.
    - Create a directory for the database.

            sudo mkdir -p /data/mongodb
            sudo chown mongodb:mongodb /data/mongodb

    - Restart the MongoDB daemon.

            sudo service mongod restart

    - Add the GATD user to the Mongo database. The user name and password shown are example values; substitute your own and make sure gatd.config reflects them.

            mongo --port <mongo db port>
            use getallthedata
            db.createUser({
                user: "reportsUser",
                pwd: "12345678",
                roles: [ { role: "dbAdmin", db: "getallthedata" } ]
            })

- Configure RabbitMQ using the config files in the rabbitmq folder.
    - Copy the config files to /etc/rabbitmq.

            sudo cp /opt/gatd/rabbitmq/rabbitmq* /etc/rabbitmq

    - Edit rabbitmq-gatd.config with the port you want to use.
    - Restart the RabbitMQ server.

            sudo rabbitmqctl stop
            sudo service rabbitmq-server start

    - Delete the default RabbitMQ user, create a GATD user, and set permissions.

            sudo rabbitmqctl delete_user guest
            sudo rabbitmqctl add_user gatd <password>
            sudo rabbitmqctl set_user_tags gatd administrator
            sudo rabbitmqctl set_permissions -p / gatd ".*" ".*" ".*"

- Set up the Python environment.

        sudo pip2 install virtualenv
        cd /opt/gatd
        virtualenv .
        source ./bin/activate
        pip2 install -r requirements.pip

- Set up the database in MongoDB.

        cd /opt/gatd/mongo
        ./init_mongo.py

- Run GATD.
    - Start the receivers.

            cd /opt/gatd/receiver
            ./run_receiver.sh

    - Run the formatter.

            cd /opt/gatd/formatter
            ./run_formatter.sh

    - Run the streamers.

            cd /opt/gatd/streamer
            ./run_streamer.sh

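If a module fails to start, a reasonable first check is whether MongoDB and RabbitMQ are accepting connections on the ports configured earlier. The following is a minimal sketch using only the Python standard library. The `port_is_open` helper is hypothetical and not part of GATD, and the ports listed are the stock MongoDB and RabbitMQ defaults, which are an assumption here; replace them with the ports set in mongodb.conf and rabbitmq-gatd.config.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        return False
    conn.close()
    return True


# Assumed ports: the MongoDB and RabbitMQ defaults. Replace them with
# the ports chosen during configuration.
SERVICES = {"MongoDB": 27017, "RabbitMQ": 5672}

for name, port in sorted(SERVICES.items()):
    state = "listening" if port_is_open("localhost", port) else "not reachable"
    print("%s on port %d: %s" % (name, port, state))
```

A successful connection only shows that something is listening on the port. It does not verify credentials or that the GATD database has been initialized.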