
Spark job to run PDGF in parallel on a Spark cluster


bankmark/pdgf-spark-exec


PDGF Spark Exec

Execute PDGF (Parallel Data Generation Framework) on top of Spark.

Build

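sbt package

This produces target/scala-2.11/pdgf-spark-exec_2.11-1.0.1.jar, the JAR submitted in the commands below.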

Usage

export PDGF_HOME=/dir/to/pdgf
spark-submit --jars "$PDGF_HOME/pdgf.jar,$PDGF_HOME/extlib/*" --files "$PDGF_HOME/dicts/*" target/scala-2.11/pdgf-spark-exec_2.11-1.0.1.jar [PDGF OPTIONS]
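spark-submit expects --jars and --files as comma-separated lists with no spaces around the commas. If you prefer to expand extlib/* and dicts/* in the shell rather than rely on spark-submit to resolve the globs, a minimal bash sketch is shown here. comma_list is a hypothetical helper, not part of pdgf-spark-exec or Spark.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of pdgf-spark-exec or Spark).
# comma_list PATH... : print the arguments that are existing regular files,
# joined with "," and no spaces, as spark-submit's --jars/--files expect.
# Pass globs unquoted (e.g. "$PDGF_HOME"/extlib/*) so the shell expands them;
# a glob that matches nothing stays literal and is skipped by the -f check.
comma_list() {
  local files=() f
  for f in "$@"; do
    if [ -f "$f" ]; then
      files+=("$f")
    fi
  done
  # "${files[*]}" joins the array with the first character of IFS.
  local IFS=,
  printf '%s\n' "${files[*]}"
}
```

Usage, keeping the globs outside the quotes so the shell expands them before comma_list sees them:

spark-submit --jars "$(comma_list "$PDGF_HOME/pdgf.jar" "$PDGF_HOME"/extlib/*)" --files "$(comma_list "$PDGF_HOME"/dicts/*)" target/scala-2.11/pdgf-spark-exec_2.11-1.0.1.jar [PDGF OPTIONS]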

Example - TPCx-AI

Generate the training data sets for the customer and CUSTOMER_IMAGES tables using the PDGF copy shipped with the TPCx-AI kit (TPCXAI_HOME must point at the kit):

export PDGF_HOME="$TPCXAI_HOME/lib/pdgf"
spark-submit --jars "$PDGF_HOME/pdgf.jar,$PDGF_HOME/extlib/*" --files "$PDGF_HOME/dicts/*" target/scala-2.11/pdgf-spark-exec_2.11-1.0.1.jar -ns -sp MY_SEED 1234.0 -sf 1 -sp includeLabels 1.0 -sp TTVF 1.0 -s customer  
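The command above repeats -sp NAME VALUE once per PDGF property. When properties such as MY_SEED change between runs, it can help to build those options as a bash array. The following is a minimal sketch assuming bash 4+ (for mapfile); sp_options is a hypothetical helper, not part of pdgf-spark-exec or PDGF.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of pdgf-spark-exec or PDGF).
# sp_options NAME VALUE [NAME VALUE ...] : print "-sp NAME VALUE" for each
# pair, one token per line, so the result can be loaded into a bash array
# with mapfile and passed on without word-splitting problems.
sp_options() {
  if (( $# % 2 != 0 )); then
    echo "sp_options: expected NAME VALUE pairs" >&2
    return 1
  fi
  while (( $# > 0 )); do
    printf '%s\n' -sp "$1" "$2"
    shift 2
  done
}
```

Usage, assuming PDGF does not depend on the relative order of -sf and the -sp options:

mapfile -t SP_OPTS < <(sp_options MY_SEED 1234.0 includeLabels 1.0 TTVF 1.0)
spark-submit --jars "$PDGF_HOME/pdgf.jar,$PDGF_HOME/extlib/*" --files "$PDGF_HOME/dicts/*" target/scala-2.11/pdgf-spark-exec_2.11-1.0.1.jar -ns -sf 1 "${SP_OPTS[@]}" -s customer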
