forked from rcrowley/bashreduce
Home
jweslley edited this page Sep 13, 2010 · 15 revisions
br can process a directory full of files rather than a single file. Any of the files may be compressed with gzip; when -i is specified, bashreduce detects this and handles decompression transparently. Gzipped stdin is not supported, however; pipe the data through zcat instead.
$ ls input_directory
file1.gz file2.gz file3.gz
$ br -m "grep abc" -i input_directory -o output
or
$ br -m "grep abc" -i input.gz -o output
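The zcat workaround for gzipped stdin can be sketched as follows. This is a minimal, self-contained illustration rather than a bashreduce invocation: `grep abc` stands in for the map command, the file names are hypothetical, and `gzip -dc` is used because it is what zcat does. It also assumes that br reads stdin when -i is omitted, which the text implies but does not state outright.

```shell
# Stand-in for the map command; with bashreduce this would be the -m argument.
map_cmd='grep abc'

# Build a small gzipped input in a temporary directory (hypothetical data).
tmp_dir=$(mktemp -d)
printf 'abc 1\nxyz 2\nabc 3\n' | gzip -c > "$tmp_dir/input.gz"

# Decompress to stdout first, then pipe the plain text to the consumer.
# With bashreduce installed, the consumer would presumably be:
#   br -m "grep abc" -o output
matched=$(gzip -dc "$tmp_dir/input.gz" | sh -c "$map_cmd")
echo "$matched"

rm -rf "$tmp_dir"
```

The point is only that decompression happens before the data reaches stdin, so the consumer never sees gzip bytes.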
The -M option lets you specify your own merge program instead of the default (sort -m). This makes it possible to add a re-reduce step at the end. Keep the work done in the merge step to a minimum, because it serializes the output of map and reduce: if the merge step is significant, it will dominate the run time and reduce the performance gains.
$ br -m "cut -f2 | sort" -r "uniq -c" -M "merge.rb"
(merge.rb: https://gist.github.com/eeaf80f29d0d27342feb)
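To make the idea of a re-reduce merge step concrete, here is a hedged sketch of what such a program might do for the `uniq -c` reducer above: sum the counts of keys that appear in more than one partial output. It is not the contents of merge.rb, and how br passes data to the merge program is an assumption; the hypothetical `merge_counts` function accepts either file arguments or stdin to cover both cases.

```shell
# Hypothetical re-reduce merge: sums the counts of "uniq -c"-style lines
# ("<count> <key>"). Reads the files given as arguments, or stdin if none.
merge_counts() {
  awk '{
    count = $1
    # Strip the leading count and surrounding blanks to recover the key.
    sub(/^[ \t]*[0-9]+[ \t]+/, "")
    total[$0] += count
  } END {
    for (key in total) print total[key], key
  }' "$@" | sort -k2
}

# Two partial outputs that both counted the key "apple" (made-up data).
tmp_dir=$(mktemp -d)
printf '   2 apple\n   1 pear\n' > "$tmp_dir/part1"
printf '   3 apple\n   4 plum\n' > "$tmp_dir/part2"

merged=$(merge_counts "$tmp_dir/part1" "$tmp_dir/part2")
echo "$merged"

rm -rf "$tmp_dir"
```

The merge does a single pass over its input and keeps one counter per key, which is in line with the advice to keep the serialized step as light as possible.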