NOTICE
This library was moved to yuuna/fluent-plugin-hbase in October 2014. The original author is no longer a member of this organization and has handed maintenance of the library over to active users and maintainers.
The HBase output plugin buffers event logs in a local file and periodically puts them to HBase.
Simply use RubyGems:
gem install fluent-plugin-hbase
<match pattern>
  type hbase

  include_time_key TRUE_OR_FALSE
  time_key KEY_IN_RECORD

  include_tag_key TRUE_OR_FALSE
  tag_key KEY_IN_RECORD

  fields_to_columns_mapping MAPPING_FROM_JSON_FIELDS_TO_HBASE_COLUMNS

  hbase_host YOUR_HBASE_HOST
  hbase_port YOUR_HBASE_PORT
  hbase_table YOUR_HBASE_TABLE_NAME

  # DEPRECATED
  # The configuration keys below are deprecated in order to
  # stay consistent with fluentd's well-known configuration keys.
  # Instead, use:
  # - include_time_key
  # - time_key
  # - include_tag_key
  # - tag_key
  # - fields_to_columns_mapping
  tag_column_name HBASE_COLUMN
  time_column_name HBASE_COLUMN
</match>
- include_time_key (optional)
-
If you need to write each event’s time to HBase, set this to true
and add a mapping from the ‘time_key’ to an HBase column in the ‘fields_to_columns_mapping’ configuration.
The default is false.
- time_key (optional)
-
The record key under which each event’s time is stored so that it can be mapped to an HBase column.
You MUST add a mapping from the key configured here to an HBase column to actually write times to HBase.
The default is “time”.
- include_tag_key (optional)
-
If you need to write each event’s tag to HBase, set this to true
and add a mapping from the ‘tag_key’ to an HBase column in the ‘fields_to_columns_mapping’ configuration.
The default is false.
- tag_key (optional)
-
The record key under which each event’s tag is stored so that it can be mapped to an HBase column.
You MUST add a mapping from the key configured here to an HBase column to actually write tags to HBase.
The default is “tag”.
- tag_column_name (deprecated)
-
The HBase column in which to save the tag attached to each Fluentd event log,
in the format “[Column family name]:[Column name]”. For example, to save tags to the column “c” in the column family “cf”, use “cf:c”.
- time_column_name (deprecated)
-
The HBase column in which to save the time at which each Fluentd event log was sent,
in the format “[Column family name]:[Column name]”. For example, to save the time to the column “t” in the column family “cf”, use “cf:t”.
- fields_to_columns_mapping (required)
-
The mapping from JSON fields to HBase columns,
in the format “[JSON_FIELD1]=>[HBASE_COLUMN1],[JSON_FIELD2]=>[HBASE_COLUMN2],…”.
Each JSON_FIELD is a sequence of field names separated by dots (.), e.g. “a.b.c”. Each HBASE_COLUMN is a triple of a column family name, a colon (:), and a column name, e.g. “cf:c”.
- hbase_host (required)
-
HBase host
- hbase_port (required)
-
HBase port
- hbase_table (required)
-
HBase table name
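To illustrate how ‘fields_to_columns_mapping’ interacts with ‘include_time_key’ and ‘include_tag_key’, here is a minimal Ruby sketch. It is not the plugin's actual implementation: the helper names (parse_mapping, dig_field, to_columns) are hypothetical, and keeping the time as a raw integer is an assumption made only for illustration.

```ruby
# Hypothetical helpers for illustration; NOT the plugin's real code.

# Parse "a.b=>cf:c1,time=>cf:t" into { "a.b" => ["cf", "c1"], "time" => ["cf", "t"] }.
def parse_mapping(mapping)
  mapping.split(",").each_with_object({}) do |pair, result|
    field, column = pair.split("=>", 2).map(&:strip)
    family, qualifier = column.to_s.split(":", 2)
    if [field, family, qualifier].any? { |part| part.nil? || part.empty? }
      raise ArgumentError, "invalid mapping entry: #{pair.inspect}"
    end
    result[field] = [family, qualifier]
  end
end

# Follow a dotted path such as "a.b.c" through nested hashes.
# Returns nil if any step along the path is missing.
def dig_field(record, path)
  path.split(".").reduce(record) do |node, key|
    node.is_a?(Hash) ? node[key] : nil
  end
end

# Build { "cf:c" => value } for one event. The time and tag are copied
# into the record first, but only when the corresponding include_* flag is set.
def to_columns(tag, time, record, mapping,
               include_time_key: false, time_key: "time",
               include_tag_key: false, tag_key: "tag")
  record = record.dup
  record[time_key] = time if include_time_key
  record[tag_key] = tag if include_tag_key

  mapping.each_with_object({}) do |(field, (family, qualifier)), columns|
    value = dig_field(record, field)
    columns["#{family}:#{qualifier}"] = value unless value.nil?
  end
end

mapping = parse_mapping("user.name=>cf:name,time=>cf:t")
columns = to_columns("app.access", 1350000000,
                     { "user" => { "name" => "alice" } },
                     mapping, include_time_key: true)
# columns == { "cf:name" => "alice", "cf:t" => 1350000000 }
```

Note that in this sketch the “time” entry in the mapping has no effect unless include_time_key is true, which mirrors the requirement above that both the flag and the mapping be configured.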
See example/fluent.conf for a configuration that should work.
You can also test the configuration by running a fluentd instance:
fluentd -c example/fluent.conf --plugin lib/fluent/plugin
You must set up your own Hadoop and HBase clusters and open the appropriate ports so that the plugin can access HBase via the HBase Thrift Server.
So far, the plugin has been tested only on a system with:
-
Hadoop 1.0.4
-
HBase 0.94.0
-
Java 1.6.0_37
-
Mac OS X 10.8 Mountain Lion
Please let me know if you find that the plugin works in any other environment.
To make the plugin work, you need running instances of:
-
Hadoop
-
HBase
-
HBase Thrift Server w/ the compact (buffered) protocol (not the framed protocol)
A typical procedure is:
-
Start Hadoop
$ start-all.sh
-
Start HBase
$ start-hbase.sh
-
Start HBase Thrift Server with the compact protocol
Use the thread pool server, as it is the only server that supports the compact protocol:
$ hbase thrift start -threadpool
-
Run Fluentd
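Before running Fluentd, it can help to confirm that the Thrift server port is actually reachable from the host where the plugin runs. The following is a minimal Ruby sketch using only the standard library; the helper name thrift_reachable? is hypothetical, and the host and port in the usage comment are assumptions (9090 is the usual HBase Thrift default, but use whatever you set for hbase_host and hbase_port). It only checks that a TCP connection can be opened, not that the server speaks the expected protocol.

```ruby
require "socket"

# Hypothetical helper: returns true if a TCP connection to host:port
# can be opened within `timeout` seconds, false otherwise.
def thrift_reachable?(host, port, timeout: 3)
  # Socket.tcp closes the socket automatically when the block returns.
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  # Connection refused, timed out, or the host could not be resolved.
  false
end

# Example usage (host and port are assumptions; adjust to your setup):
#   thrift_reachable?("localhost", 9090)
```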
- Copyright
-
Copyright © 2012 FURYU CORPORATION
- License
-
Apache License, Version 2.0