changes to use hs2 interface and to run suite in a loop #5

epkalyanr · 2016-10-13T17:43:02Z

No description provided.

epkalyanr · 2016-10-13T17:44:01Z

epkalyanr · 2016-10-13T17:46:44Z

@abhijith31

dharmeshkakadia · 2016-10-14T20:56:59Z

tpch-scripts/CollectPerfData.sh


 echo "Completed Running PerfData Collection Scripts"

-zip -r $BENCH_HOME/$BENCHMARK/PerfData.zip $PERFDATA_OUTPUTDIR
+zip -r $BENCH_HOME/$BENCHMARK/PerfData_$RUN_ID.zip $PERFDATA_OUTPUTDIR


We currently store the full path inside the zip (e.g. home/hdiuser/hive-testbench/PerfData_2/pat/tpch_query_2/.... ). Can we fix the zip step so that it does not include the unnecessary /hdiuser/hive-testbench/ prefix?
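One way to get relative entries is to run zip from the parent of the directory being archived. The sketch below assumes `$PERFDATA_OUTPUTDIR` is an absolute path; `zip_relative` is a hypothetical helper name, not something in this PR.

```shell
#!/bin/bash
# Sketch only: archive a directory so that entries are stored relative to
# its parent instead of with the full filesystem path.
# zip_relative is a hypothetical helper, not part of the PR.
zip_relative() {
    local src_dir=$1    # directory to archive, e.g. $PERFDATA_OUTPUTDIR
    local zip_file=$2   # absolute path of the archive to create
    # Run zip from the parent directory in a subshell so the caller's
    # working directory is untouched and only the last path component
    # appears in the archive.
    (cd "$(dirname "$src_dir")" && zip -r "$zip_file" "$(basename "$src_dir")")
}

# Intended call site, using the variable names from the diff:
# zip_relative "$PERFDATA_OUTPUTDIR" "$BENCH_HOME/$BENCHMARK/PerfData_$RUN_ID.zip"
```

With this, an entry would start at the archived directory's own name rather than at home/.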

dharmeshkakadia · 2016-10-14T20:58:27Z

tpch-scripts/RunQueriesAndCollectPATData.sh


-LOG_DIR=$BENCH_HOME/$BENCHMARK/logs/
+LOG_DIR=$BENCH_HOME/$BENCHMARK/logs_$RUN_ID/


Can we include everything about one run under a single dir?
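A possible layout, as a rough sketch: derive every per-run path from one `RUN_DIR`. The name `RUN_DIR`, the `run_$RUN_ID` pattern, and the fallback defaults are assumptions for illustration; the other variable names come from the diffs in this PR.

```shell
#!/bin/bash
# Sketch only: one directory per run, with logs, results and plans beneath it.
# RUN_DIR and the default values are assumptions for illustration.
BENCH_HOME=${BENCH_HOME:-$(mktemp -d)}   # placeholder so the sketch runs standalone
BENCHMARK=${BENCHMARK:-hive-testbench}
RUN_ID=${RUN_ID:-1}

RUN_DIR=$BENCH_HOME/$BENCHMARK/run_$RUN_ID
LOG_DIR=$RUN_DIR/logs/
RESULT_DIR=$RUN_DIR/results/
PLAN_DIR=$RUN_DIR/plans/

# Create the whole tree up front so later redirects cannot fail on a missing dir.
mkdir -p "$LOG_DIR" "$RESULT_DIR" "$PLAN_DIR"
echo "Artifacts for run $RUN_ID go under $RUN_DIR"
```

Collecting or deleting a run then means handling a single directory.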

dharmeshkakadia · 2016-10-14T20:58:59Z

tpch-scripts/RunQueriesAndCollectPATData.sh

 BENCH_HOME=$( cd "$( dirname "${BASH_SOURCE[0]}" )/../../" && pwd );
 echo "\$BENCH_HOME is set to $BENCH_HOME";

 BENCHMARK=hive-testbench

-RESULT_DIR=$BENCH_HOME/$BENCHMARK/results/
+RESULT_DIR=$BENCH_HOME/$BENCHMARK/results_$RUN_ID/



Can we include everything about one run under a single dir?

dharmeshkakadia · 2016-10-14T20:59:20Z

tpch-scripts/RunSuiteLoop.sh

@@ -0,0 +1,22 @@
+#!/bin/bash
+#usage: ./RunSingleQueryLoop QUERY_NUMBER REPEAT_COUNT SCALCE_FACTOR CLUSTER_SSH_PASSWORD


Wrong usage. The comment still refers to RunSingleQueryLoop, but this script is RunSuiteLoop.sh.

dharmeshkakadia · 2016-10-14T21:00:44Z

tpch-scripts/TpchQueryExecute.sh


-PLAN_DIR=$BENCH_HOME/$BENCHMARK/plans/
+PLAN_DIR=$BENCH_HOME/$BENCHMARK/plans_$RUN_ID/



Same as above. Can this go under a single dir?

dharmeshkakadia · 2016-10-14T21:01:05Z

tpch-scripts/TpchQueryExecute.sh


-		timeout ${TIMEOUT} hive -i ${HIVE_SETTING} --database ${DATABASE} -d EXPLAIN="" -f ${QUERY_DIR}/tpch_query${2}.sql > ${RESULT_DIR}/${DATABASE}_query${j}.txt 2>&1
+		 beeline -u ${CONNECTION_STRING} -i ${HIVE_SETTING} --hivevar EXPLAIN="" -f ${QUERY_DIR}/tpch_query${2}.sql > ${RESULT_DIR}/${DATABASE}_query${j}.txt 2>&1


nit: extra space at the start.

dharmeshkakadia · 2016-10-14T21:02:54Z

tpch-scripts/ValidateDataGen.sh

@@ -13,6 +13,8 @@ fi

 >${STATS_DIR}/tableinfo_${DATABASE}.txt;

-hive -d DB=${DATABASE} -f gettpchtablecounts.sql > ${STATS_DIR}/tablecounts_${DATABASE}.txt ;
-hive -d DB=${DATABASE} -f gettpchtableinfo.sql >> ${STATS_DIR}/tableinfo_${DATABASE}.txt ;
+CONNECTION_STRING="jdbc:hive2://localhost:10001/${DATABASE};transportMode=http"


Will this work in case of failover?
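For reference, Hive's JDBC driver also accepts a URL that resolves HiveServer2 through ZooKeeper service discovery instead of a fixed host and port. Whether this cluster has dynamic service discovery enabled is not stated in this thread, so treat the following as a sketch; the ZooKeeper host names and the namespace value are placeholders.

```shell
#!/bin/bash
# Sketch only: build a HiveServer2 JDBC URL that goes through ZooKeeper
# service discovery rather than a fixed host and port.
# The ZK_QUORUM hosts and the namespace value are placeholders.
ZK_QUORUM=${ZK_QUORUM:-zk0.example.com:2181,zk1.example.com:2181,zk2.example.com:2181}
ZK_NAMESPACE=${ZK_NAMESPACE:-hiveserver2}
DATABASE=${DATABASE:-default}

CONNECTION_STRING="jdbc:hive2://${ZK_QUORUM}/${DATABASE};serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=${ZK_NAMESPACE}"
echo "$CONNECTION_STRING"
```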

dharmeshkakadia · 2016-10-14T21:03:46Z

tpch-setup.sh

@@ -53,7 +53,7 @@ hdfs dfs -mkdir -p ${DIR}
 hdfs dfs -ls ${DIR}/${SCALE}/lineitem > /dev/null
 if [ $? -ne 0 ]; then
 	echo "Generating data at scale factor $SCALE."
-	(cd tpch-gen; hadoop jar target/*.jar -d ${DIR}/${SCALE}/ -s ${SCALE})
+	(cd tpch-gen; hadoop jar target/*.jar -D mapreduce.map.memory.mb=8192 -d ${DIR}/${SCALE}/ -s ${SCALE})


We should not hard-code settings here. Maybe use a global variable or something similar if you really want this.
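Something along these lines could work: read the value from a variable that callers can override. `TPCH_GEN_MAP_MEMORY_MB` is a hypothetical name, and the 8192 default only mirrors the value in the diff. The command is printed rather than executed so the sketch runs without a cluster.

```shell
#!/bin/bash
# Sketch only: take the map memory from an overridable variable.
# TPCH_GEN_MAP_MEMORY_MB is a hypothetical name; 8192 mirrors the diff.
TPCH_GEN_MAP_MEMORY_MB=${TPCH_GEN_MAP_MEMORY_MB:-8192}
DIR=${DIR:-/tmp/tpch-generate}   # placeholder defaults so the sketch runs standalone
SCALE=${SCALE:-2}

# Assemble the arguments in an array so each option stays one word.
gen_args=(-D "mapreduce.map.memory.mb=${TPCH_GEN_MAP_MEMORY_MB}" -d "${DIR}/${SCALE}/" -s "${SCALE}")

# Printed rather than executed here; tpch-setup.sh would run:
#   (cd tpch-gen; hadoop jar target/*.jar "${gen_args[@]}")
echo "hadoop jar target/*.jar ${gen_args[*]}"
```

A run that needs a different size would then set `TPCH_GEN_MAP_MEMORY_MB` in the environment instead of editing the script.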

dharmeshkakadia · 2016-10-14T21:05:03Z

tpch-setup.sh

-runcommand "hive -i settings/load-flat.sql -f ddl-tpch/bin_flat/alltables.sql -d DB=tpch_text_${SCALE} -d LOCATION=${DIR}/${SCALE}"
+
+DATABASE=tpch_text_${SCALE}
+CONNECTION_STRING="jdbc:hive2://localhost:10001/$DATABASE;transportMode=http"


Same as above.
Also, maybe we should keep all of these settings in a config file rather than repeating them every time. Repeating them is error-prone.
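A minimal sketch of the config-file idea: put the HiveServer2 settings in one file and have each script source it. The `HS2_*` variables and `connection_string` are hypothetical names; the defaults repeat the values from the diffs. A temp file stands in for a checked-in config so the sketch runs on its own.

```shell
#!/bin/bash
# Sketch only: keep shared settings in one file and source it everywhere.
# The HS2_* variables and connection_string are hypothetical names.
CONFIG_FILE=$(mktemp)   # stands in for a checked-in config file
cat > "$CONFIG_FILE" <<'EOF'
HS2_HOST=${HS2_HOST:-localhost}
HS2_PORT=${HS2_PORT:-10001}
HS2_TRANSPORT=${HS2_TRANSPORT:-http}

# Print the JDBC URL for the database given as $1.
connection_string() {
    echo "jdbc:hive2://${HS2_HOST}:${HS2_PORT}/$1;transportMode=${HS2_TRANSPORT}"
}
EOF

# Each script would begin with this line instead of redefining the URL.
source "$CONFIG_FILE"

SCALE=${SCALE:-2}
CONNECTION_STRING=$(connection_string "tpch_text_${SCALE}")
echo "$CONNECTION_STRING"
rm -f "$CONFIG_FILE"
```

Changing the host, port, or transport mode would then be a one-line edit in a single place.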

changes to use hs2 interface and to run suite in a loop

542659b

dharmeshkakadia suggested changes Oct 14, 2016
