Need some CI pipelines to validate the scripts to avoid any mistake #27

Open
GaryShen2008 opened this issue Jun 22, 2022 · 2 comments

Comments

@GaryShen2008
Collaborator

No description provided.

@GaryShen2008 changed the title from "Need some CI pipelines to validate the scripts to avoid any breaking change" to "Need some CI pipelines to validate the scripts to avoid any mistake" on Jun 22, 2022
@wjxiz1992
Collaborator

We can add a pre-merge CI job for this repo.

@wjxiz1992 self-assigned this on Jun 23, 2022
@wjxiz1992
Collaborator

| Step | Command |
| --- | --- |
| Generate base data (local) | `python3 nds_gen_data.py local 1 2 $PWD/raw_sf1 --overwrite_output` |
| Generate base data (hdfs) | `python3 nds_gen_data.py hdfs 1 2 hdfs:/nds2.0_ci/raw_sf1 --overwrite_output` |
| Generate refresh data (local) | `python3 nds_gen_data.py local 1 2 /user/$USER/raw_refresh_sf1 --overwrite_output --update` |
| Generate refresh data (hdfs) | `python3 nds_gen_data.py hdfs 1 2 hdfs:/nds2.0_ci/raw_refresh_sf1 --overwrite_output --update` |
| Convert refresh data to parquet (hdfs) | `./spark-submit-template convert_submits_gpu.template nds_transcode.py hdfs:/nds2.0_ci/raw_refresh_sf1 hdfs:/nds2.0_ci/parquet_refresh_sf1 report.txt --output_format parquet --output_mode overwrite --update` |
| Convert base data to iceberg (hdfs) | `./spark-submit-template convert_submits_gpu.template nds_transcode.py hdfs:/nds2.0_ci/raw_sf1 hdfs:/nds2.0_ci/iceberg_sf1 report.txt --output_format iceberg --output_mode overwrite` |
| Generate query stream | `python nds_gen_query_stream.py $TPCDS_HOME/query_templates 3000 ./query_streams --streams 1` |
| Power run | `./spark-submit-template power_run_gpu.template nds_power.py hdfs:/nds2.0_ci/iceberg_sf1 ./nds_query_streams/query_0.sql time.csv --property_file properties/aqe-on.properties --input_format iceberg --output_prefix hdfs:/nds2.0_ci/gpu_output_sf1` |
| Data validation | `python nds_validate.py hdfs:/nds2.0_ci/gpu_output_sf1 hdfs:/nds2.0_ci/cpu_output_sf1 ./nds_query_streams/query_0.sql --ignore_ordering` |
| Data maintenance | `./spark-submit-template convert_submit_gpu_iceberg.template nds_maintenance.py hdfs:/nds2.0_ci/parquet_refresh_sf1 ./data_maintenance time.csv` |
| Throughput run | `./nds-throughput 1,2 ./spark-submit-template power_run_gpu.template nds_power.py hdfs:/nds2.0_ci/iceberg_sf1 ./nds_query_streams/query_'{}'.sql Time_'{}'.csv --input_format iceberg --output_prefix hdfs:/nds2.0_ci/gpu_output_sf1` |
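
One possible way to wire steps like these into a pre-merge job is a small driver that runs them in order and stops at the first failure. The sketch below is only an illustration, not the repo's actual CI: the `STEPS` list and the `run_steps` helper are hypothetical names, and only two of the commands above are included. It assumes the commands are run from the directory that contains the scripts and that `$PWD` and `$TPCDS_HOME` are set in the environment.

```python
import os
import shlex
import subprocess

# Hypothetical subset of the steps listed above; a real job would list all of them.
STEPS = [
    ("Generate base data (local)",
     "python3 nds_gen_data.py local 1 2 $PWD/raw_sf1 --overwrite_output"),
    ("Generate query stream",
     "python nds_gen_query_stream.py $TPCDS_HOME/query_templates 3000 ./query_streams --streams 1"),
]


def run_steps(steps, runner=subprocess.run):
    """Run (name, command) pairs in order.

    Returns the name of the first step whose exit code is non-zero,
    or None if every step succeeds. Later steps are skipped after a failure.
    """
    for name, command in steps:
        # Expand $VAR references ourselves because no shell is involved.
        argv = shlex.split(os.path.expandvars(command))
        print(f"[ci] {name}: {' '.join(argv)}", flush=True)
        result = runner(argv)
        if result.returncode != 0:
            print(f"[ci] FAILED: {name} (exit code {result.returncode})", flush=True)
            return name
    return None
```

A CI entry point could then call `run_steps(STEPS)` and exit with a non-zero status when it returns a step name, so that the merge check fails on the first broken script. The `runner` parameter exists only so the loop can be exercised without Spark or HDFS available.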
