pig-hive-wordcount

Wordcount is the "Hello World" of Hadoop, yet most of the Pig and Hive wordcount examples I've seen either require UDFs or external scripts, or they just don't do a very good job of counting words.

So, my goal here was not efficiency, but merely to create Pig and Hive scripts that:

- Use only stock functions that ship with the language (no UDFs or external scripts)
- Are short and simple
- Do a pretty good job of counting words
- Produce diff-able output

To make the output diff-able, I reformat the Hive output to look like the output of the Pig DUMP operator. In my few tests, the output of the two scripts has been identical, or very close to it, most of the time, though Hive still occasionally counts some invisible character.
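
As a rough illustration of the reformatting idea (this is a sketch, not the contents of `wordcount.hql`), a Hive query can wrap each result row in parentheses with the stock `concat` function so that it resembles a tuple printed by Pig's DUMP, such as `(word,3)`. The table `docs` and its column `line` are hypothetical names chosen for this example, and splitting on whitespace alone is a simplification that will not strip punctuation.

```sql
-- Hypothetical input: a table `docs` with a single STRING column `line`.
SELECT concat('(', word, ',', CAST(cnt AS STRING), ')') AS pig_style
FROM (
  SELECT word, count(*) AS cnt
  FROM (
    -- Lowercase each line and split it on runs of whitespace.
    SELECT explode(split(lower(line), '\\s+')) AS word
    FROM docs
  ) tokens
  WHERE word <> ''   -- drop empty tokens produced by leading whitespace
  GROUP BY word
) counts
ORDER BY pig_style;  -- a stable order, so two result files can be diffed
```

A diff is only meaningful if the Pig output is sorted the same way, so sorting both sides is assumed here.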

