COMP5349A1

Analyse a large dataset with Hadoop MapReduce.

Refer to assignment1_handout.pdf for the detailed requirements.

## How to run

### Requirement

Hadoop 2.6.0

### Steps

1. Create an HDFS directory named place in your HDFS home directory and upload place.txt into it.
2. Create another HDFS directory named photo in your HDFS home directory and upload n01.txt into it.
3. Set the A1_HOME environment variable to the location where the intermediate output of each job will be stored.
4. In the directory containing pom.xml, run `mvn package`.
5. Change to the directory containing task1.sh (or task2.sh) for the task you want to run. Each script must stay in the same directory as MRDriverTask1.class (or MRDriverTask2.class).
6. Pass an integer argument to task1.sh (or task2.sh) indicating which job to start from. On the first run, the argument is always 1.
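
The steps above can be sketched as a single shell script. This is a minimal sketch under several assumptions, not the project's own tooling: `run`, `DRY_RUN`, and `setup_inputs` are hypothetical helpers introduced here, the `A1_HOME` value is a placeholder, and place.txt and n01.txt are assumed to be in the current local directory. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: with DRY_RUN=1 (the default) each command is printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Steps 1-2: relative HDFS paths resolve against the HDFS home directory.
setup_inputs() {
    run hdfs dfs -mkdir -p place
    run hdfs dfs -put place.txt place/
    run hdfs dfs -mkdir -p photo
    run hdfs dfs -put n01.txt photo/
}

# Step 3: placeholder value (an assumption); choose a location suited to
# intermediate job output in your environment.
export A1_HOME="${A1_HOME:-a1_intermediate}"

setup_inputs

# Step 4: build from the directory that contains pom.xml.
run mvn package

# Steps 5-6: from the directory holding task1.sh and MRDriverTask1.class,
# start from job 1 on the first run.
run ./task1.sh 1
```

Setting `DRY_RUN=0` executes the commands instead of printing them. Passing a different integer to task1.sh or task2.sh starts from that job instead of job 1.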
