Releases: MeltwaterArchive/datasift-connector
1.0.44
This is a critical bug fix update for the Connector.
Upgrade steps
To apply the fixes and take advantage of the changes below, it will be necessary to create a new AMI or Vagrant VM using packer or vagrant, following the Quick Deployment steps in README.md.
If you are currently running an instance of the Connector, it is strongly recommended that you provision a new AMI and launch a new instance in parallel. Once the new Connector instance has been configured and is running correctly, stop the gnip-reader service on the old Connector: sudo supervisorctl stop gnip-reader. This allows datasift-writer to fully consume and send the data remaining in the Kafka queue. The read and send metrics within the datasift-writer row in Grafana will indicate the point at which no more items are available.
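For reference, the drain procedure on the old instance amounts to the following (a sketch; the supervisorctl status check is shown only as a convenience, and Grafana remains the authoritative view of the queue draining):

```bash
# On the OLD Connector instance, once the new instance is running correctly:
sudo supervisorctl stop gnip-reader   # stop ingesting new data from Gnip
sudo supervisorctl status             # confirm gnip-reader is STOPPED

# Leave datasift-writer running until the read and send metrics in the
# datasift-writer row in Grafana show that no more items are available.
```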
Unfortunately a manual upgrade of an existing provisioned Connector is not advised, due to a change of Java runtime and significant changes to the Chef provisioning recipes made to address the issues below.
Bug fixes
- Fixes an issue where datasift-writer did not commit Kafka offsets correctly. Issue #46
- Fixes an issue where datasift-writer would throw KeyException errors on handshake with https://in.datasift.com. Issue #45
- Fixes an issue where datasift-writer would not attempt reconnection to Kafka when the VM was rebooted, and required a restart to operate correctly. Issue #32
Changes
- datasift-writer now automatically applies a back-off policy when the DataSift ingestion endpoint responds with HTTP status 429, in addition to 413.
- The default EC2 instance type for building an AMI with Packer has been changed from t2.micro to t2.small, to accommodate increased memory usage during provisioning. The Connector launched from the built AMI can still run on a t2.micro instance. Please refer to the Pricing section in README.md for details.
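For anyone maintaining a customised Packer template, the relevant setting is the builder's instance_type. A minimal illustrative fragment follows; other required builder fields (region, source_ami, credentials and so on) are omitted, and the template file name in this repository may differ:

```json
{
  "builders": [
    {
      "type": "amazon-ebs",
      "instance_type": "t2.small"
    }
  ]
}
```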
1.0.19: Initial Historics Reader & Twitter API Reader Release
This release includes support for Gnip Historics and use of the Twitter API.
Changes
- The configuration file for the Gnip Reader has changed format. retries, buffer_size and buffer_timeout are now children of a hosebird property, which is a sibling of the gnip property. See README.md for an example, and the sketch after this list.
- A new Twitter API reader is provided. Note that either the Gnip reader or the Twitter API reader can currently run, but not both at the same time. A new Twitter API managed source is required to take advantage of this reader; contact DataSift to have this enabled on your account. The default configured Kafka topic is now twitter rather than twitter_gnip, and both the Gnip reader and the Twitter API reader will write to it.
- Gnip Historics components are included in 1.0.19. A Historics API service will run on port 8888 of the provisioned machine, to which Gnip Historics job IDs may be sent as per the README instructions (see the sketch after this list). A Historics Reader service will also be installed; it will execute every 5 minutes and process any Gnip Historics jobs which have completed and are available for download. Interactions within the Historics files will be sent to Kafka, as with the other reader components. A web frontend to this API is in development, which will simplify the submission of Gnip jobs and allow easy listing of in-progress and completed job processing.
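As a rough illustration of the new Gnip Reader configuration shape, the fragment below shows hosebird as a sibling of gnip. The field values and the contents of the gnip block are placeholders; treat README.md as the authoritative example:

```json
{
  "gnip": {
    "account": "YOUR_GNIP_ACCOUNT"
  },
  "hosebird": {
    "retries": 10,
    "buffer_size": 10000,
    "buffer_timeout": 500
  }
}
```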
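Submitting a Historics job ID to the API service might look like the following. The endpoint path and payload shown here are hypothetical; follow the README instructions for the actual request format:

```bash
# Hypothetical example only: the real path and payload are in README.md
curl -X POST http://localhost:8888/historics \
  -H "Content-Type: application/json" \
  -d '{"id": "YOUR_GNIP_HISTORICS_JOB_ID"}'
```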
Upgrade steps
To take advantage of the new features, it will be necessary to create a new AMI/VM using packer or vagrant, following the Quick Deployment steps in README.md, and switch to using the new instance. Unfortunately a manual upgrade is not supported, due to the new components and heavy modifications to the Chef provisioning processes.
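The Quick Deployment route boils down to a single build command in each case; the template name below is illustrative, so use the one given in README.md:

```bash
# Build a new AMI with packer (template file name is illustrative):
packer build connector.json

# Or bring up a local VM with vagrant from the repository root:
vagrant up
```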
1.0.15
This release fixes issues with the Chef recipes and is only required if you are creating new instances of the Connector.
1.0.11
This release contains a new version of datasift-writer and a new metric added to the default Grafana dashboard.
Changes
- The DataSift Writer now supports a sent-items metric, which measures the rate of interactions sent to the DataSift Ingestion Endpoint. Previously the only metric available was the number of bulk posts, each of which could contain many interactions.
- The default Grafana dashboard includes this new metric.
Upgrade steps
On your instance(s) of the DataSift Connector:
- Download the latest version of the datasift-writer RPM from GitHub
- Change /etc/datasift/datasift-writer/writer.json
- Upgrade datasift-writer: sudo yum --nogpgcheck localinstall datasift-writer-1.0.x.1.noarch.rpm
- sudo supervisorctl restart datasift-writer
- Add the new sent-items metric to the DataSift Writer row in Grafana, plotted on the right axis. See the Grafana documentation for details.
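Taken together, the command-line portion of the upgrade looks roughly like this (the download URL is illustrative, and 1.0.x stands in for the actual release version; use the latest release asset from GitHub):

```bash
# Download the RPM from the GitHub releases page (URL is illustrative)
curl -LO https://github.com/MeltwaterArchive/datasift-connector/releases/download/1.0.x/datasift-writer-1.0.x.1.noarch.rpm

# Review and update the writer configuration
sudo vi /etc/datasift/datasift-writer/writer.json

# Install the new package and restart the service
sudo yum --nogpgcheck localinstall datasift-writer-1.0.x.1.noarch.rpm
sudo supervisorctl restart datasift-writer
```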
Or alternatively:
- Create a new instance using packer or vagrant.
1.0.9
This release contains a new version of datasift-writer and requires a change to the config file (writer.json): add "bulk_items": 1000 and change "bulk_size" to 100000.
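In writer.json the two settings look like this (a fragment only; all other fields in the file are omitted, and the settings are shown at top level for illustration, so place them wherever bulk_size already lives in your file):

```json
{
  "bulk_items": 1000,
  "bulk_size": 100000
}
```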
Changes
- Auto retry is disabled in the DataSift Writer HTTP client, allowing more accurate monitoring via the metrics.
- A bulk_size option has been introduced in the DataSift Writer configuration to help improve the bulk upload rate.
Upgrade steps
On your instance(s) of the DataSift Connector:
- Download the latest version of the datasift-writer RPM from GitHub
- Edit /etc/datasift/datasift-writer/writer.json as described above
- Upgrade datasift-writer: sudo yum --nogpgcheck localinstall datasift-writer-1.0.x.1.noarch.rpm
- Restart it: sudo supervisorctl restart datasift-writer
Alternatively:
- Create a new instance using packer or vagrant.
or
- Run Chef to reprovision your instance.
1.0.4
Initial release.