jweslley edited this page Sep 13, 2010 · 15 revisions

User Guide

Taking directories as input and handling gzipped files

br can process a directory of files as well as a single file. Any of the files may be compressed with gzip; when the input is given with -i, bashreduce detects this and decompresses it transparently. Gzipped stdin is not supported, however; pipe it through zcat instead.

$ ls input_directory
file1.gz file2.gz file3.gz

$ br -m "grep abc" -i input_directory -o output

or

$ br -m "grep abc" -i input.gz -o output
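For input that has to arrive on stdin, decompression must happen before the data reaches br. The sketch below is only an illustration of that idea, not br's own detection logic: cat_maybe_gzip is a hypothetical helper, and the trailing grep merely stands in for a map step so the example runs without br installed.

```shell
# Hypothetical helper (not part of br): print a file, decompressing it
# first when it is gzip-compressed.
cat_maybe_gzip() {
  if gzip -t "$1" 2>/dev/null; then
    gzip -dc "$1"   # same effect as zcat
  else
    cat "$1"
  fi
}

# Build a small directory that mixes plain and gzipped files.
demo_dir=$(mktemp -d)
printf 'abc 1\nxyz 2\n' > "$demo_dir/plain.txt"
printf 'abc 3\n' | gzip -c > "$demo_dir/packed.gz"

# Stream every file as plain text; grep stands in for the map step.
result=$(for f in "$demo_dir"/*; do cat_maybe_gzip "$f"; done | grep abc | sort)
echo "$result"

rm -rf "$demo_dir"
```

With br installed, the same kind of decompressed stream would presumably be piped into it on stdin, for example zcat input.gz | br -m "grep abc" -o output.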

Applying a re-reduce, the merge option

The -M option lets you specify your own merge program instead of the default (sort -m), which makes it possible to add a re-reduce step at the end. Keep the work done in the merge step as small as possible, because it serializes the output of map and reduce. If the merge step is expensive, it will dominate the run time and erase much of the gain from running in parallel.

$ br -m "cut -f2 | sort" -r "uniq -c" -M "merge.rb"

merge.rb is available at https://gist.github.com/eeaf80f29d0d27342feb
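As a rough illustration of what a re-reduce can look like, the sketch below sums per-key counts from several partial "uniq -c" outputs. It is not the merge.rb from the gist: merge_counts is a hypothetical function, and it assumes the partial results arrive concatenated on stdin, which may differ from how br actually hands data to the merge program.

```shell
# Hypothetical re-reduce (not the merge.rb from the gist): sum the counts
# of lines shaped like `uniq -c` output, i.e. "<count> <key>".
merge_counts() {
  awk '{ count = $1; $1 = ""; sub(/^ +/, ""); totals[$0] += count }
       END { for (key in totals) print totals[key], key }' | sort -k2
}

# Two partial reduce outputs, as two workers might produce them.
partial_a=$(printf 'apple\napple\npear\n' | sort | uniq -c)
partial_b=$(printf 'apple\npear\npear\n' | sort | uniq -c)

# Concatenate the partial results and re-reduce them into final totals.
merged=$(printf '%s\n%s\n' "$partial_a" "$partial_b" | merge_counts)
echo "$merged"
```

The merge does a single pass with one addition per input line, which keeps the serial portion of the job small.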