[template] create templates for use in generating actions #1282

cjac · 2024-12-20T02:38:51Z

This PR should resolve #1276 and is an attempt to better address the problem space of #1030.

I believe that #1259 could be implemented more easily with this change.

The mlvm/mlvm.sh action is generated using the templates defined here. There is also code to generate other actions in the templates committed with this PR, but only this action is generated and tested here. See other PRs for each individual action.

cjac · 2024-12-29T22:08:45Z

templates/generate-action.pl

Hello @davorg - As I was reading through the literature, bringing myself back up to speed on the state of the art of template toolkits, I saw that there was a book with one of my friends' names on it that I had glanced at many times over the last couple of decades. I did not realize until just a few days ago that the dlc who was in charge of desk allocation in my cube farm when I started was the same dlc who wrote the book on this particular subject.

Anyway, I've been thinking of you and our peers as I've been hacking away at this installer. If you felt like looking things over and picking some nits, I'd love to hear your feedback. I hope your holidays are merry and all that!

@shlomif oh, hey I see that you are actively participating in Template.pm development. I'm not doing a lot with it in this repository; everything is pretty straightforward, I think. If you had some spare time to take a peek at the new templates/ directory in this repo, and especially the templates/generate-action.pl, it might be fun to chat about it. I hope your holidays went well!

@cjac: hi! Where can I find the templates directory? Please give a url.

@shlomif thanks for your prompt response! That URL is

https://github.com/LLC-Technologies-Collier/initialization-actions/tree/template-gpu-20241219

cjac · 2025-01-02T20:03:24Z

/gcbrun

cjac · 2025-01-02T20:10:50Z

/gcbrun

cjac · 2025-01-02T20:26:39Z

/gcbrun

cjac · 2025-01-02T20:31:40Z

/gcbrun

cjac

added some comments to address issues with documentation

cjac · 2025-01-02T20:37:02Z

templates/spark-rapids/mig.sh.in

+# --metadata=ENABLE_MIG can be used to enable or disable MIG. The default is to enable it.
+# The script does a reboot to fully enable MIG and then configures the MIG device based on the
+# user specified MIG_CGI profiles specified via: --metadata=^:^MIG_CGI='9,9'. If MIG_CGI
+# is not specified it assumes it's using an A100 and configures 2 instances with profile id 9.


s/A100/H100/
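
For anyone reading along, here is a rough sketch of the defaulting behavior that header comment describes: when MIG_CGI is not supplied, fall back to two instances of profile id 9. This is not the code in mig.sh.in. The function name `mig_profiles` is hypothetical, and reading the value from a `MIG_CGI` environment variable is an assumption made only to keep the example self-contained; the real script gets it from instance metadata.

```shell
#!/usr/bin/env bash
# Hypothetical helper, for illustration only: expands a MIG_CGI value such as
# '9,9' into one profile id per line. An empty or missing argument falls back
# to '9,9', the default described in the header comment.
mig_profiles() {
  local cgi="${1:-9,9}"
  local IFS=','          # split the value on commas, only inside this function
  local -a ids
  read -r -a ids <<< "${cgi}"
  printf '%s\n' "${ids[@]}"
}

# Assumption: the value arrives in an environment variable named MIG_CGI.
mig_profiles "${MIG_CGI:-}"
```

With MIG_CGI unset this prints `9` twice, one per line. Keeping the default in a single place like this would also make the A100/H100 wording in the comment easier to keep in sync with what the script does.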

cjac · 2025-01-02T20:37:28Z

templates/spark-rapids/mig.sh.in

+#
+# This script should be specified in --metadata=startup-script-url= option and
+# --metadata=ENABLE_MIG can be used to enable or disable MIG. The default is to enable it.
+# The script does a reboot to fully enable MIG and then configures the MIG device based on the


does not ever reboot, and neither should you


cjac · 2025-01-02T20:44:59Z

/gcbrun

cjac · 2025-01-03T00:45:59Z

/gcbrun

cjac · 2025-01-03T02:17:03Z

/gcbrun

cjac · 2025-01-03T02:33:57Z

using the test suite I just cleaned up for #1275

cjac · 2025-01-03T03:04:37Z

/gcbrun

cjac · 2025-01-03T03:25:02Z

2.1-debian11 failure:

2025-01-03T03:18:35.157639402Z AssertionError: 1 != 0 : Failed to execute command:
2025-01-03T03:18:35.157650162Z gcloud dataproc jobs submit spark --cluster=test-gpu-standard-2-1-20250103-030909-kdee --region=us-central1 --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar --class=org.apache.spark.examples.ml.JavaIndexToStringExample --properties=spark.executor.resource.gpu.amount=1,spark.executor.cores=6,spark.executor.memory=4G,spark.task.resource.gpu.amount=0.333,spark.task.cpus=2,spark.yarn.unmanagedAM.enabled=false
2025-01-03T03:18:35.157660172Z STDOUT:
2025-01-03T03:18:35.157694322Z 
2025-01-03T03:18:35.157706472Z STDERR:
2025-01-03T03:18:35.157715992Z Job [474683bad64a45e8af6cc00ccc9695ae] submitted.
2025-01-03T03:18:35.157726222Z Waiting for job output...
2025-01-03T03:18:35.157735722Z 25/01/03 03:14:42 INFO SparkEnv: Registering MapOutputTracker
2025-01-03T03:18:35.157768422Z 25/01/03 03:14:42 INFO SparkEnv: Registering BlockManagerMaster
2025-01-03T03:18:35.157778362Z 25/01/03 03:14:42 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
2025-01-03T03:18:35.157787802Z 25/01/03 03:14:42 INFO SparkEnv: Registering OutputCommitCoordinator
2025-01-03T03:18:35.157797152Z 25/01/03 03:14:43 INFO DataprocSparkPlugin: Registered 128 driver metrics
2025-01-03T03:18:35.157805932Z 25/01/03 03:14:43 INFO ShimLoader: Loading shim for Spark version: 3.3.2
2025-01-03T03:18:35.157815022Z 25/01/03 03:14:43 INFO ShimLoader: Complete Spark build info: 3.3.2, https://bigdataoss-internal.googlesource.com/third_party/apache/spark, dataproc-branch-3.3.2, 5672c094ffe3ff9aa967db7b81163e1cc586a093, 2024-10-23T22:06:45Z
2025-01-03T03:18:35.157824862Z 25/01/03 03:14:43 INFO ShimLoader: findURLClassLoader found a URLClassLoader org.apache.spark.util.MutableURLClassLoader@61ab89b0
2025-01-03T03:18:35.157836082Z 25/01/03 03:14:43 INFO ShimLoader: Updating spark classloader org.apache.spark.util.MutableURLClassLoader@61ab89b0 with the URLs: jar:file:/usr/lib/spark/jars/rapids-4-spark_2.12-23.08.2.jar!/spark3xx-common/, jar:file:/usr/lib/spark/jars/rapids-4-spark_2.12-23.08.2.jar!/spark332/
2025-01-03T03:18:35.157845492Z 25/01/03 03:14:43 INFO ShimLoader: Spark classLoader org.apache.spark.util.MutableURLClassLoader@61ab89b0 updated successfully
2025-01-03T03:18:35.157869502Z 25/01/03 03:14:43 INFO ShimLoader: Updating spark classloader org.apache.spark.util.MutableURLClassLoader@61ab89b0 with the URLs: jar:file:/usr/lib/spark/jars/rapids-4-spark_2.12-23.08.2.jar!/spark3xx-common/, jar:file:/usr/lib/spark/jars/rapids-4-spark_2.12-23.08.2.jar!/spark332/
2025-01-03T03:18:35.157880132Z 25/01/03 03:14:43 INFO ShimLoader: Spark classLoader org.apache.spark.util.MutableURLClassLoader@61ab89b0 updated successfully
2025-01-03T03:18:35.157891322Z 25/01/03 03:14:43 INFO RapidsPluginUtils: RAPIDS Accelerator build: {date=2023-10-05T09:57:39Z, cudf_version=23.08.0, version=23.08.2, user=, branch=HEAD, url=https://github.com/NVIDIA/spark-rapids.git, revision=56da18a1be0148025cb00ced2ffe039fbf9c3391}
2025-01-03T03:18:35.157900352Z 25/01/03 03:14:43 INFO RapidsPluginUtils: RAPIDS Accelerator JNI build: {date=2023-08-10T03:31:37Z, version=23.08.0, user=, branch=HEAD, url=https://github.com/NVIDIA/spark-rapids-jni.git, revision=73fcd5ce22a622e5937a613bc5c4a1b32a40aec1}
2025-01-03T03:18:35.157909062Z 25/01/03 03:14:43 INFO RapidsPluginUtils: cudf build: {date=2023-08-10T03:31:37Z, version=23.08.0, user=, branch=HEAD, url=https://github.com/rapidsai/cudf.git, revision=8150d38e080c8fb021921ade83fe3aa3be04b47d}
2025-01-03T03:18:35.157917332Z 25/01/03 03:14:43 WARN RapidsPluginUtils: RAPIDS Accelerator 23.08.2 using cudf 23.08.0.
2025-01-03T03:18:35.157926612Z 25/01/03 03:14:43 WARN RapidsPluginUtils: spark.rapids.sql.multiThreadedRead.numThreads is set to 20.
2025-01-03T03:18:35.157935632Z 25/01/03 03:14:43 WARN RapidsPluginUtils: The current setting of spark.task.resource.gpu.amount (0.333) is not ideal to get the best performance from the RAPIDS Accelerator plugin. It's recommended to be 1/{executor core count} unless you have a special use case.
2025-01-03T03:18:35.157944382Z 25/01/03 03:14:43 WARN RapidsPluginUtils: RAPIDS Accelerator is enabled, to disable GPU support set `spark.rapids.sql.enabled` to false.
2025-01-03T03:18:35.157954512Z 25/01/03 03:14:43 WARN RapidsPluginUtils: spark.rapids.sql.explain is set to `NOT_ON_GPU`. Set it to 'NONE' to suppress the diagnostics logging about the query placement on the GPU.
2025-01-03T03:18:35.157963672Z 25/01/03 03:14:44 INFO DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at test-gpu-standard-2-1-20250103-030909-kdee-m.us-central1-f.c.cloud-dataproc-ci.internal./10.128.0.50:8032
2025-01-03T03:18:35.157972112Z 25/01/03 03:14:44 INFO AHSProxy: Connecting to Application History server at test-gpu-standard-2-1-20250103-030909-kdee-m.us-central1-f.c.cloud-dataproc-ci.internal./10.128.0.50:10200
2025-01-03T03:18:35.157991762Z 25/01/03 03:14:44 INFO Configuration: found resource resource-types.xml at file:/etc/hadoop/conf.empty/resource-types.xml
2025-01-03T03:18:35.158000832Z 25/01/03 03:14:44 INFO ResourceUtils: Adding resource type - name = yarn.io/gpu, units = , type = COUNTABLE
2025-01-03T03:18:35.158009842Z 25/01/03 03:14:46 INFO YarnClientImpl: Submitted application application_1735873832026_0001
2025-01-03T03:18:35.158020202Z 25/01/03 03:14:56 INFO GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state.
2025-01-03T03:18:35.158029782Z 25/01/03 03:15:00 WARN GpuOverrides: 
2025-01-03T03:18:35.158038982Z !Exec <ObjectHashAggregateExec> cannot run on GPU because not all expressions can be replaced
2025-01-03T03:18:35.158050092Z   @Expression <AggregateExpression> stringindexeraggregator(org.apache.spark.ml.feature.StringIndexerAggregator@f243d5c, Some(createexternalrow(category#1.toString, StructField(category,StringType,false))), Some(interface org.apache.spark.sql.Row), Some(StructType(StructField(category,StringType,false))), encodeusingserializer(input[0, java.lang.Object, true], true), decodeusingserializer(input[0, binary, true], Array[org.apache.spark.util.collection.OpenHashMap], true), encodeusingserializer(input[0, java.lang.Object, true], true), BinaryType, true, 0, 0) could run on GPU
2025-01-03T03:18:35.158059502Z     ! <ComplexTypedAggregateExpression> StringIndexerAggregator(org.apache.spark.sql.Row) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.aggregate.ComplexTypedAggregateExpression
2025-01-03T03:18:35.158069402Z       ! <CreateExternalRow> createexternalrow(category#1.toString, StructField(category,StringType,false)) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.CreateExternalRow
2025-01-03T03:18:35.158101482Z         ! <Invoke> category#1.toString cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.Invoke
2025-01-03T03:18:35.158112172Z           @Expression <AttributeReference> category#1 could run on GPU
2025-01-03T03:18:35.158123202Z       ! <EncodeUsingSerializer> encodeusingserializer(input[0, java.lang.Object, true], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.EncodeUsingSerializer
2025-01-03T03:18:35.158131912Z         ! <BoundReference> input[0, java.lang.Object, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.158140402Z       ! <DecodeUsingSerializer> decodeusingserializer(input[0, binary, true], Array[org.apache.spark.util.collection.OpenHashMap], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.DecodeUsingSerializer
2025-01-03T03:18:35.158148742Z         ! <BoundReference> input[0, binary, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.158158062Z       ! <EncodeUsingSerializer> encodeusingserializer(input[0, java.lang.Object, true], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.EncodeUsingSerializer
2025-01-03T03:18:35.158166272Z         ! <BoundReference> input[0, java.lang.Object, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.158186702Z   @Expression <AttributeReference> StringIndexerAggregator(org.apache.spark.sql.Row)#14 could run on GPU
2025-01-03T03:18:35.158195922Z   @Expression <Alias> StringIndexerAggregator(org.apache.spark.sql.Row)#14 AS StringIndexerAggregator(org.apache.spark.sql.Row)#15 could run on GPU
2025-01-03T03:18:35.158204532Z     @Expression <AttributeReference> StringIndexerAggregator(org.apache.spark.sql.Row)#14 could run on GPU
2025-01-03T03:18:35.158213622Z   !Exec <ShuffleExchangeExec> cannot run on GPU because Columnar exchange without columnar children is inefficient
2025-01-03T03:18:35.158222172Z     @Partitioning <SinglePartition$> could run on GPU
2025-01-03T03:18:35.158230522Z     !Exec <ObjectHashAggregateExec> cannot run on GPU because not all expressions can be replaced
2025-01-03T03:18:35.158239022Z       @Expression <AggregateExpression> partial_stringindexeraggregator(org.apache.spark.ml.feature.StringIndexerAggregator@f243d5c, Some(createexternalrow(category#1.toString, StructField(category,StringType,false))), Some(interface org.apache.spark.sql.Row), Some(StructType(StructField(category,StringType,false))), encodeusingserializer(input[0, java.lang.Object, true], true), decodeusingserializer(input[0, binary, true], Array[org.apache.spark.util.collection.OpenHashMap], true), encodeusingserializer(input[0, java.lang.Object, true], true), BinaryType, true, 0, 0) could run on GPU
2025-01-03T03:18:35.158247572Z         ! <ComplexTypedAggregateExpression> StringIndexerAggregator(org.apache.spark.sql.Row) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.aggregate.ComplexTypedAggregateExpression
2025-01-03T03:18:35.158256122Z           ! <CreateExternalRow> createexternalrow(category#1.toString, StructField(category,StringType,false)) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.CreateExternalRow
2025-01-03T03:18:35.158265132Z             ! <Invoke> category#1.toString cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.Invoke
2025-01-03T03:18:35.158284682Z               @Expression <AttributeReference> category#1 could run on GPU
2025-01-03T03:18:35.158293312Z           ! <EncodeUsingSerializer> encodeusingserializer(input[0, java.lang.Object, true], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.EncodeUsingSerializer
2025-01-03T03:18:35.158302412Z             ! <BoundReference> input[0, java.lang.Object, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.158311202Z           ! <DecodeUsingSerializer> decodeusingserializer(input[0, binary, true], Array[org.apache.spark.util.collection.OpenHashMap], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.DecodeUsingSerializer
2025-01-03T03:18:35.158327352Z             ! <BoundReference> input[0, binary, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.158336902Z           ! <EncodeUsingSerializer> encodeusingserializer(input[0, java.lang.Object, true], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.EncodeUsingSerializer
2025-01-03T03:18:35.158345412Z             ! <BoundReference> input[0, java.lang.Object, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.158353802Z       @Expression <AttributeReference> buf#19 could run on GPU
2025-01-03T03:18:35.158372552Z       @Expression <AttributeReference> buf#20 could run on GPU
2025-01-03T03:18:35.158381252Z       ! <LocalTableScanExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.LocalTableScanExec
2025-01-03T03:18:35.158390802Z         @Expression <AttributeReference> category#1 could run on GPU
2025-01-03T03:18:35.158399782Z 
2025-01-03T03:18:35.158408922Z 25/01/03 03:15:00 INFO GpuOverrides: Plan conversion to the GPU took 82.60 ms
2025-01-03T03:18:35.158418282Z 25/01/03 03:15:00 WARN GpuOverrides: 
2025-01-03T03:18:35.158427332Z !Exec <ObjectHashAggregateExec> cannot run on GPU because not all expressions can be replaced
2025-01-03T03:18:35.158436242Z   @Expression <AggregateExpression> stringindexeraggregator(org.apache.spark.ml.feature.StringIndexerAggregator@f243d5c, Some(createexternalrow(category#1.toString, StructField(category,StringType,false))), Some(interface org.apache.spark.sql.Row), Some(StructType(StructField(category,StringType,false))), encodeusingserializer(input[0, java.lang.Object, true], true), decodeusingserializer(input[0, binary, true], Array[org.apache.spark.util.collection.OpenHashMap], true), encodeusingserializer(input[0, java.lang.Object, true], true), BinaryType, true, 0, 0) could run on GPU
2025-01-03T03:18:35.158444812Z     ! <ComplexTypedAggregateExpression> StringIndexerAggregator(org.apache.spark.sql.Row) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.aggregate.ComplexTypedAggregateExpression
2025-01-03T03:18:35.158453752Z       ! <CreateExternalRow> createexternalrow(category#1.toString, StructField(category,StringType,false)) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.CreateExternalRow
2025-01-03T03:18:35.158462122Z         ! <Invoke> category#1.toString cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.Invoke
2025-01-03T03:18:35.158470662Z           @Expression <AttributeReference> category#1 could run on GPU
2025-01-03T03:18:35.158478612Z       ! <EncodeUsingSerializer> encodeusingserializer(input[0, java.lang.Object, true], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.EncodeUsingSerializer
2025-01-03T03:18:35.158487022Z         ! <BoundReference> input[0, java.lang.Object, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.158495642Z       ! <DecodeUsingSerializer> decodeusingserializer(input[0, binary, true], Array[org.apache.spark.util.collection.OpenHashMap], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.DecodeUsingSerializer
2025-01-03T03:18:35.158504672Z         ! <BoundReference> input[0, binary, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.158513622Z       ! <EncodeUsingSerializer> encodeusingserializer(input[0, java.lang.Object, true], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.EncodeUsingSerializer
2025-01-03T03:18:35.158522272Z         ! <BoundReference> input[0, java.lang.Object, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.158531532Z   @Expression <AttributeReference> StringIndexerAggregator(org.apache.spark.sql.Row)#14 could run on GPU
2025-01-03T03:18:35.158540392Z   @Expression <Alias> StringIndexerAggregator(org.apache.spark.sql.Row)#14 AS StringIndexerAggregator(org.apache.spark.sql.Row)#15 could run on GPU
2025-01-03T03:18:35.158563802Z     @Expression <AttributeReference> StringIndexerAggregator(org.apache.spark.sql.Row)#14 could run on GPU
2025-01-03T03:18:35.158572982Z   !Exec <ShuffleExchangeExec> cannot run on GPU because Columnar exchange without columnar children is inefficient
2025-01-03T03:18:35.158581902Z     @Partitioning <SinglePartition$> could run on GPU
2025-01-03T03:18:35.158591402Z     !Exec <ObjectHashAggregateExec> cannot run on GPU because not all expressions can be replaced
2025-01-03T03:18:35.158600182Z       @Expression <AggregateExpression> partial_stringindexeraggregator(org.apache.spark.ml.feature.StringIndexerAggregator@f243d5c, Some(createexternalrow(category#1.toString, StructField(category,StringType,false))), Some(interface org.apache.spark.sql.Row), Some(StructType(StructField(category,StringType,false))), encodeusingserializer(input[0, java.lang.Object, true], true), decodeusingserializer(input[0, binary, true], Array[org.apache.spark.util.collection.OpenHashMap], true), encodeusingserializer(input[0, java.lang.Object, true], true), BinaryType, true, 0, 0) could run on GPU
2025-01-03T03:18:35.158609412Z         ! <ComplexTypedAggregateExpression> StringIndexerAggregator(org.apache.spark.sql.Row) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.aggregate.ComplexTypedAggregateExpression
2025-01-03T03:18:35.158617432Z           ! <CreateExternalRow> createexternalrow(category#1.toString, StructField(category,StringType,false)) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.CreateExternalRow
2025-01-03T03:18:35.158626352Z             ! <Invoke> category#1.toString cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.Invoke
2025-01-03T03:18:35.158635312Z               @Expression <AttributeReference> category#1 could run on GPU
2025-01-03T03:18:35.158644272Z           ! <EncodeUsingSerializer> encodeusingserializer(input[0, java.lang.Object, true], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.EncodeUsingSerializer
2025-01-03T03:18:35.158653102Z             ! <BoundReference> input[0, java.lang.Object, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.158661752Z           ! <DecodeUsingSerializer> decodeusingserializer(input[0, binary, true], Array[org.apache.spark.util.collection.OpenHashMap], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.DecodeUsingSerializer
2025-01-03T03:18:35.158690872Z             ! <BoundReference> input[0, binary, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.158701652Z           ! <EncodeUsingSerializer> encodeusingserializer(input[0, java.lang.Object, true], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.EncodeUsingSerializer
2025-01-03T03:18:35.158710832Z             ! <BoundReference> input[0, java.lang.Object, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.158719952Z       @Expression <AttributeReference> buf#19 could run on GPU
2025-01-03T03:18:35.158729072Z       @Expression <AttributeReference> buf#20 could run on GPU
2025-01-03T03:18:35.158738432Z       ! <LocalTableScanExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.LocalTableScanExec
2025-01-03T03:18:35.158759192Z         @Expression <AttributeReference> category#1 could run on GPU
2025-01-03T03:18:35.158768182Z 
2025-01-03T03:18:35.158777272Z 25/01/03 03:15:00 INFO GpuOverrides: Plan conversion to the GPU took 7.28 ms
2025-01-03T03:18:35.158786472Z 25/01/03 03:15:01 INFO GpuOverrides: Plan conversion to the GPU took 1.26 ms
2025-01-03T03:18:35.158795052Z 25/01/03 03:15:01 WARN GpuOverrides: 
2025-01-03T03:18:35.158804352Z !Exec <ObjectHashAggregateExec> cannot run on GPU because not all expressions can be replaced
2025-01-03T03:18:35.158827882Z   @Expression <AggregateExpression> stringindexeraggregator(org.apache.spark.ml.feature.StringIndexerAggregator@f243d5c, Some(createexternalrow(category#1.toString, StructField(category,StringType,false))), Some(interface org.apache.spark.sql.Row), Some(StructType(StructField(category,StringType,false))), encodeusingserializer(input[0, java.lang.Object, true], true), decodeusingserializer(input[0, binary, true], Array[org.apache.spark.util.collection.OpenHashMap], true), encodeusingserializer(input[0, java.lang.Object, true], true), BinaryType, true, 0, 0) could run on GPU
2025-01-03T03:18:35.158838842Z     ! <ComplexTypedAggregateExpression> StringIndexerAggregator(org.apache.spark.sql.Row) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.aggregate.ComplexTypedAggregateExpression
2025-01-03T03:18:35.158848372Z       ! <CreateExternalRow> createexternalrow(category#1.toString, StructField(category,StringType,false)) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.CreateExternalRow
2025-01-03T03:18:35.158857722Z         ! <Invoke> category#1.toString cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.Invoke
2025-01-03T03:18:35.158867352Z           @Expression <AttributeReference> category#1 could run on GPU
2025-01-03T03:18:35.158876662Z       ! <EncodeUsingSerializer> encodeusingserializer(input[0, java.lang.Object, true], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.EncodeUsingSerializer
2025-01-03T03:18:35.158888172Z         ! <BoundReference> input[0, java.lang.Object, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.158897632Z       ! <DecodeUsingSerializer> decodeusingserializer(input[0, binary, true], Array[org.apache.spark.util.collection.OpenHashMap], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.DecodeUsingSerializer
2025-01-03T03:18:35.158906542Z         ! <BoundReference> input[0, binary, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.158915172Z       ! <EncodeUsingSerializer> encodeusingserializer(input[0, java.lang.Object, true], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.EncodeUsingSerializer
2025-01-03T03:18:35.158923792Z         ! <BoundReference> input[0, java.lang.Object, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.158932872Z   @Expression <AttributeReference> StringIndexerAggregator(org.apache.spark.sql.Row)#14 could run on GPU
2025-01-03T03:18:35.158941622Z   @Expression <Alias> StringIndexerAggregator(org.apache.spark.sql.Row)#14 AS StringIndexerAggregator(org.apache.spark.sql.Row)#15 could run on GPU
2025-01-03T03:18:35.158962062Z     @Expression <AttributeReference> StringIndexerAggregator(org.apache.spark.sql.Row)#14 could run on GPU
2025-01-03T03:18:35.158971042Z   !Exec <ShuffleExchangeExec> cannot run on GPU because Columnar exchange without columnar children is inefficient
2025-01-03T03:18:35.158980162Z     @Partitioning <SinglePartition$> could run on GPU
2025-01-03T03:18:35.158988692Z     !Exec <ObjectHashAggregateExec> cannot run on GPU because not all expressions can be replaced
2025-01-03T03:18:35.158997872Z       @Expression <AggregateExpression> partial_stringindexeraggregator(org.apache.spark.ml.feature.StringIndexerAggregator@f243d5c, Some(createexternalrow(category#1.toString, StructField(category,StringType,false))), Some(interface org.apache.spark.sql.Row), Some(StructType(StructField(category,StringType,false))), encodeusingserializer(input[0, java.lang.Object, true], true), decodeusingserializer(input[0, binary, true], Array[org.apache.spark.util.collection.OpenHashMap], true), encodeusingserializer(input[0, java.lang.Object, true], true), BinaryType, true, 0, 0) could run on GPU
2025-01-03T03:18:35.159030592Z         ! <ComplexTypedAggregateExpression> StringIndexerAggregator(org.apache.spark.sql.Row) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.aggregate.ComplexTypedAggregateExpression
2025-01-03T03:18:35.159043982Z           ! <CreateExternalRow> createexternalrow(category#1.toString, StructField(category,StringType,false)) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.CreateExternalRow
2025-01-03T03:18:35.159053702Z             ! <Invoke> category#1.toString cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.Invoke
2025-01-03T03:18:35.159062732Z               @Expression <AttributeReference> category#1 could run on GPU
2025-01-03T03:18:35.159081082Z           ! <EncodeUsingSerializer> encodeusingserializer(input[0, java.lang.Object, true], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.EncodeUsingSerializer
2025-01-03T03:18:35.159091482Z             ! <BoundReference> input[0, java.lang.Object, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.159100782Z           ! <DecodeUsingSerializer> decodeusingserializer(input[0, binary, true], Array[org.apache.spark.util.collection.OpenHashMap], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.DecodeUsingSerializer
2025-01-03T03:18:35.159109962Z             ! <BoundReference> input[0, binary, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.159119532Z           ! <EncodeUsingSerializer> encodeusingserializer(input[0, java.lang.Object, true], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.EncodeUsingSerializer
2025-01-03T03:18:35.159128692Z             ! <BoundReference> input[0, java.lang.Object, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.159138382Z       @Expression <AttributeReference> buf#19 could run on GPU
2025-01-03T03:18:35.159147352Z       @Expression <AttributeReference> buf#20 could run on GPU
2025-01-03T03:18:35.159156532Z       ! <LocalTableScanExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.LocalTableScanExec
2025-01-03T03:18:35.159177462Z         @Expression <AttributeReference> category#1 could run on GPU
2025-01-03T03:18:35.159186652Z 
2025-01-03T03:18:35.159195722Z 25/01/03 03:15:01 INFO GpuOverrides: Plan conversion to the GPU took 4.75 ms
2025-01-03T03:18:35.159204872Z 25/01/03 03:15:01 INFO GpuOverrides: GPU plan transition optimization took 13.66 ms
2025-01-03T03:18:35.159214022Z 25/01/03 03:15:01 WARN GpuOverrides: 
2025-01-03T03:18:35.159223582Z !Exec <ShuffleExchangeExec> cannot run on GPU because Columnar exchange without columnar children is inefficient
2025-01-03T03:18:35.159233342Z   @Partitioning <SinglePartition$> could run on GPU
2025-01-03T03:18:35.159242602Z   !Exec <ObjectHashAggregateExec> cannot run on GPU because not all expressions can be replaced
2025-01-03T03:18:35.159252392Z     @Expression <AggregateExpression> partial_stringindexeraggregator(org.apache.spark.ml.feature.StringIndexerAggregator@f243d5c, Some(createexternalrow(category#1.toString, StructField(category,StringType,false))), Some(interface org.apache.spark.sql.Row), Some(StructType(StructField(category,StringType,false))), encodeusingserializer(input[0, java.lang.Object, true], true), decodeusingserializer(input[0, binary, true], Array[org.apache.spark.util.collection.OpenHashMap], true), encodeusingserializer(input[0, java.lang.Object, true], true), BinaryType, true, 0, 0) could run on GPU
2025-01-03T03:18:35.159261702Z       ! <ComplexTypedAggregateExpression> StringIndexerAggregator(org.apache.spark.sql.Row) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.aggregate.ComplexTypedAggregateExpression
2025-01-03T03:18:35.159282182Z         ! <CreateExternalRow> createexternalrow(category#1.toString, StructField(category,StringType,false)) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.CreateExternalRow
2025-01-03T03:18:35.159293432Z           ! <Invoke> category#1.toString cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.Invoke
2025-01-03T03:18:35.159302532Z             @Expression <AttributeReference> category#1 could run on GPU
2025-01-03T03:18:35.159311742Z         ! <EncodeUsingSerializer> encodeusingserializer(input[0, java.lang.Object, true], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.EncodeUsingSerializer
2025-01-03T03:18:35.159321662Z           ! <BoundReference> input[0, java.lang.Object, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.159333852Z         ! <DecodeUsingSerializer> decodeusingserializer(input[0, binary, true], Array[org.apache.spark.util.collection.OpenHashMap], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.DecodeUsingSerializer
2025-01-03T03:18:35.159343432Z           ! <BoundReference> input[0, binary, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.159353252Z         ! <EncodeUsingSerializer> encodeusingserializer(input[0, java.lang.Object, true], true) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.EncodeUsingSerializer
2025-01-03T03:18:35.159362232Z           ! <BoundReference> input[0, java.lang.Object, true] cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.BoundReference
2025-01-03T03:18:35.159371352Z     @Expression <AttributeReference> buf#19 could run on GPU
2025-01-03T03:18:35.159390812Z     @Expression <AttributeReference> buf#20 could run on GPU
2025-01-03T03:18:35.159400042Z     ! <LocalTableScanExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.LocalTableScanExec
2025-01-03T03:18:35.159408912Z       @Expression <AttributeReference> category#1 could run on GPU
2025-01-03T03:18:35.159417562Z 
2025-01-03T03:18:35.159426412Z 25/01/03 03:15:01 INFO GpuOverrides: Plan conversion to the GPU took 4.15 ms
2025-01-03T03:18:35.159435922Z 25/01/03 03:15:01 INFO GpuOverrides: GPU plan transition optimization took 7.25 ms
2025-01-03T03:18:35.159446272Z 25/01/03 03:15:43 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Requesting driver to remove executor 2 for reason Container from a bad node: container_1735873832026_0001_01_000003 on host: test-gpu-standard-2-1-20250103-030909-kdee-w-0.us-central1-f.c.cloud-dataproc-ci.internal. Exit status: 1. Diagnostics: [2025-01-03 03:15:43.085]Exception from container-launch.
2025-01-03T03:18:35.159455422Z Container id: container_1735873832026_0001_01_000003
2025-01-03T03:18:35.159464442Z Exit code: 1
2025-01-03T03:18:35.159472832Z Exception message: Launch container failed
2025-01-03T03:18:35.159481212Z Shell error output: Nonzero exit code=1, error message='Invalid argument number'

cjac · 2025-01-04T03:46:06Z

/gcbrun

cjac · 2025-01-04T05:17:29Z

/gcbrun

cjac · 2025-01-04T05:33:08Z

/gcbrun

cjac · 2025-01-04T07:24:59Z

/gcbrun

cjac · 2025-01-04T07:37:31Z

well that's good news, then.

cjac · 2025-01-04T07:39:48Z

/gcbrun

cjac · 2025-01-04T09:19:40Z

/gcbrun

templates/dask/util_functions, templates/gpu/install_gpu_driver.sh.in, templates/gpu/util_functions, templates/rapids/rapids.sh.in, templates/spark-rapids/spark-rapids.sh.in:
* cleaned up definition of RAPIDS_RUNTIME; default to SPARK and use DASK only for dask-rapids

templates/dask/util_functions, templates/gpu/util_functions, templates/common/util_functions:
* added utility functions to check whether a phase has been completed, mark a phase complete, and mark a phase incomplete

templates/dask/util_functions:
* conda environment is now archived from the environment directory rather than from /

templates/rapids/rapids.sh.in:
* now executing gpu installer logic before installing dask-rapids
* now exiting if rapids runtime is not DASK
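
For readers unfamiliar with the phase-tracking idea mentioned in the commit message above, here is a minimal sketch of how such helpers could work. It assumes one empty sentinel file per completed phase. The function names, the `PHASE_DIR` variable, and the `run_phase` wrapper are hypothetical and are not copied from templates/common/util_functions.

```shell
# Hypothetical sketch: names and the sentinel directory are assumptions,
# not the helpers defined in the templates of this PR.

# Directory holding one empty sentinel file per completed phase.
PHASE_DIR="${PHASE_DIR:-${TMPDIR:-/tmp}/install-phases}"

# Succeeds if the named phase has a sentinel file.
is_complete() {
  [ -f "${PHASE_DIR}/$1" ]
}

# Records that the named phase finished.
mark_complete() {
  mkdir -p "${PHASE_DIR}" && touch "${PHASE_DIR}/$1"
}

# Forgets that the named phase finished, so it will run again.
mark_incomplete() {
  rm -f "${PHASE_DIR}/$1"
}

# Runs a command once per phase: skipped if the phase is already complete,
# marked complete only if the command succeeds.
run_phase() {
  local phase="$1"
  shift
  if is_complete "${phase}"; then
    echo "skipping ${phase}: already complete"
    return 0
  fi
  "$@" && mark_complete "${phase}"
}
```

Something like `run_phase install-driver some_install_function` would then make a re-run of the action skip work that already succeeded, which is presumably the point of tracking phases in an installer that may be invoked more than once.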

* increased minimum memory threshold for ram disk
* moved apt_add_repo and friends to common/install_functions

templates/dask/util_functions:
* validating conda tarball before caching to GCS

templates/generate-action.pl:
* improved usage documentation a little

templates/gpu/install_functions:
* using /opt/conda/miniconda3/bin/python3 instead of /usr/bin/ for venv pre-install
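
The tarball validation mentioned above could be sketched roughly as follows. This is an illustration under assumptions, not the code in templates/dask/util_functions: `tarball_is_valid` and `cache_if_valid` are hypothetical names, and the upload command is supplied by the caller so that no particular storage CLI is assumed.

```shell
# Hypothetical sketch: function names are illustrative only.

# Succeeds only if the file is non-empty and tar can list its contents.
# Listing output is discarded; only the exit status matters.
tarball_is_valid() {
  local tarball="$1"
  [ -s "${tarball}" ] && tar -tf "${tarball}" > /dev/null 2>&1
}

# Runs the caller-supplied upload command on the tarball only when the
# tarball validates; otherwise reports the problem and fails.
cache_if_valid() {
  local tarball="$1"
  shift
  if tarball_is_valid "${tarball}"; then
    "$@" "${tarball}"
  else
    echo "refusing to cache invalid tarball: ${tarball}" >&2
    return 1
  fi
}
```

A call such as `cache_if_valid /tmp/conda-env.tar.gz upload_to_bucket`, where `upload_to_bucket` is a hypothetical wrapper around whatever copies files to the bucket, keeps a truncated or corrupt archive from being cached and reused by later cluster builds.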

* increase wait time for scheduler to come online
* reduce noise from tar -t

templates/gpu/yarn_functions, templates/gpu/install_functions:
* protect many functions from running without attached accelerator

templates/gpu/install_gpu_driver.sh.in:
* set +e in exit handler

templates/gpu/spark_functions:
* re-factor new function into this template

templates/spark-rapids/spark-rapids.sh.in:
* removed redundant call to configure_gpu_script
* set +e in exit handler
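
As a small illustration of why `set +e` belongs in an exit handler: when the main script runs under `set -e`, a failing command inside the EXIT trap would otherwise abort the handler and skip the remaining cleanup. The sketch below is hypothetical; `exit_handler`, `main`, and `CLEANUP_LOG` are illustrative names, and `false` stands in for a cleanup step that fails.

```shell
# Hypothetical sketch: names are illustrative, not taken from the templates.

# File that records that cleanup ran to the end (assumed path, for demonstration).
CLEANUP_LOG="${CLEANUP_LOG:-${TMPDIR:-/tmp}/cleanup-demo.log}"

exit_handler() {
  # Cleanup is best-effort: turn off errexit so that one failing step
  # does not prevent the steps after it from running.
  set +e
  false  # stand-in for a cleanup step that fails
  echo "cleanup finished" >> "${CLEANUP_LOG}"
}

main() {
  set -e                  # the installer body stops on the first error
  trap exit_handler EXIT  # the handler runs on every exit path
  echo "installing"
}
```

Running `( main )` in a subshell prints `installing` and still appends `cleanup finished` to the log even though the step before it failed.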

cjac · 2025-01-10T01:14:05Z

templates/generate-action.pl

@dlc - Can I get a review of the templates/ directory in this repository, please? I tried to keep it simple for the initial implementation, but if you have any advice about how we can further reduce duplication, I'd be all ears. I'm thinking about picking up your book and getting into the minutiae, but the PR will be closed long before then, I hope!

cjac self-assigned this Dec 20, 2024

cjac mentioned this pull request Dec 20, 2024

[gpu][spark-rapids] Fix MIG script #1269

Open

cjac requested review from bsidhom, abmodi, ahzaz, aalexx-S and bradmiro December 22, 2024 03:26

cjac mentioned this pull request Dec 24, 2024

update rapids version for 24.10 release #1248

Merged

cjac commented Dec 29, 2024

cjac force-pushed the template-gpu-20241219 branch from 8d28938 to e511a6e Compare January 2, 2025 20:23

cjac commented Jan 2, 2025

cjac mentioned this pull request Jan 4, 2025

[FEA]Support pyspark.ml.feature.OneHotEncoder, StringIndexer NVIDIA/spark-rapids-ml#63

Open

cjac marked this pull request as ready for review January 4, 2025 07:59

cjac added 12 commits January 8, 2025 22:09

moved knox dask config to templates/dask/util_functions

2e45a75

added copyright to templates/legal/license_header

33fdd38

latest generated action

4f974c5

removed redundant template disclaimer

75d8e32

setup and tear-down for actions which work with conda

34fce25

* refactored common conda installer functionality from dask.sh.in and

bbe062e

rapids.sh.in into a function install_conda_packages * removed redundant yarn service restarts in rapids.sh.in * added conda prep and exit handlers

tested rapids.sh init action with dataproc-repro

10f1698

refactor yarn functions into their own template

b01b867

refactor mig functions into their own template

c6c09db

state before gpu rebranch

88f9f7f

cjac force-pushed the template-gpu-20241219 branch from 019f562 to 119f1b1 Compare January 9, 2025 06:09

cjac added 10 commits January 8, 2025 22:10

refactored spark variable definition and reduced excess lines by bulk…

a7b4707

…ing the readonly operations

development on these scripts will happen in the spark-rapids-template…

35ca704

…-20241225 branch

revert dask/ to master

43232b2

moving that .in suffix to the correct variable

4b6e520

reverted to master ; changes ended up in gpu-template-20250107

4a024e0

including libtemplate-perl as a dependency

f00e2f8

moved to dask-template-20250104

7118ebf

moved to gpu-template-20250107

f2b50f7

* include version in template disclaimer

900c10a

* include version in action generator

cjac force-pushed the template-gpu-20241219 branch from 763e1ff to 900c10a Compare January 9, 2025 22:03

cjac commented Jan 10, 2025

migrated rapids.sh base template to rapids-template-20250106

bef08b1

cjac force-pushed the template-gpu-20241219 branch from cea2aa3 to 2afff45 Compare January 10, 2025 03:20

script to generate all actions from templates

aa792c3

cjac force-pushed the template-gpu-20241219 branch from 2afff45 to aa792c3 Compare January 10, 2025 03:23

spark prepare steps belong in common

824bcf8
