Unable to set directory or filename dynamically based on the information in a tuple? #26
Comments
It will also take care of our use cases where we want to partition the data in HDFS based on the time the event was generated. Event generation and entry into the data ingestion pipeline can be delayed by several hours, and in some cases a few days. Flume already has this feature. The lack of it is blocking us from using Storm for the time being.
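The event-time partitioning described above comes down to deriving the target directory from a timestamp carried in the tuple itself, rather than from the wall clock at write time, so late-arriving events still land in the partition for when they were generated. A minimal sketch of that idea; the class and layout are hypothetical, not part of storm-hdfs:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper: builds an HDFS partition path from the event's
// own timestamp. An event generated days ago still maps to the
// partition for its generation time, not the ingestion time.
class EventTimePartitioner {
    private static final SimpleDateFormat FMT = new SimpleDateFormat("yyyy/MM/dd/HH");
    static {
        FMT.setTimeZone(TimeZone.getTimeZone("UTC"));
    }

    public static String partitionPath(String basePath, long eventTimeMillis) {
        return basePath + "/" + FMT.format(new Date(eventTimeMillis));
    }
}
```

In a topology, the bolt would read the event timestamp out of each tuple and pass it to a helper like this when choosing where to write.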
Hello, thanks for your contribution. Could you help me find the fault? |
Hi there,
Thanks very much for this contribution.
We are designing a log-aggregation system that collects logs from many sources. The system puts all log lines into a single Kafka topic. With the help of storm-kafka, we can consume each log line right now, but we run into a problem when transforming each line into an HDFS file.
It sounds like storm-hdfs can only specify the directory and file name up front, at topology setup time. We cannot route log lines from different log sources to different HDFS files.
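Per-source routing like this amounts to deriving the file path from a field in each tuple instead of from the fixed configuration. A hypothetical resolver for that mapping (the class, directory layout, and sanitization rule are all illustrative, not part of storm-hdfs):

```java
// Hypothetical resolver: maps a log line's source (a field extracted
// from the tuple) to a distinct HDFS file path.
class SourcePathResolver {
    private final String baseDir;

    public SourcePathResolver(String baseDir) {
        this.baseDir = baseDir;
    }

    // e.g. source "nginx" -> "<baseDir>/nginx/nginx.log"
    public String resolve(String source) {
        // Sanitize the source name so it cannot escape the base directory.
        String safe = source.replaceAll("[^A-Za-z0-9_-]", "_");
        return baseDir + "/" + safe + "/" + safe + ".log";
    }
}
```

A bolt supporting this scenario would call such a resolver per tuple, rather than consulting a file name format only when a file is opened or rotated.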
By the way, we rewrote a whole framework like the one you offer to work around this problem, but ran into a performance issue from frequently appending to and closing HDFS files, which made us give up.
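The append/close churn mentioned above is commonly tamed by keeping a bounded pool of open writers keyed by target path, and closing only the least-recently-used one when the pool overflows. A sketch under that assumption; StringWriter stands in for a real HDFS output stream, and the class and sizes are hypothetical:

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a bounded LRU cache of open writers. Instead of opening and
// closing an HDFS file per batch, hot files stay open and only the
// coldest writer is closed on eviction.
class WriterCache {
    private final int maxOpen;
    private final LinkedHashMap<String, Writer> open;

    public WriterCache(int maxOpen) {
        this.maxOpen = maxOpen;
        // access-order LinkedHashMap gives LRU iteration order
        this.open = new LinkedHashMap<String, Writer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Writer> eldest) {
                if (size() > WriterCache.this.maxOpen) {
                    try {
                        eldest.getValue().close(); // close only the LRU file
                    } catch (IOException ignored) {
                    }
                    return true;
                }
                return false;
            }
        };
    }

    public Writer writerFor(String path) {
        Writer w = open.get(path);
        if (w == null) {
            w = new StringWriter(); // real code would open an HDFS stream here
            open.put(path, w);      // may evict and close the LRU writer
        }
        return w;
    }

    public int openCount() {
        return open.size();
    }
}
```

With a cache like this, write amplification from open/close cycles is bounded by the cache size rather than by the number of distinct target files.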
Is there any plan for storm-hdfs to support this scenario in the future? Thanks!