Can't run GRPC on dataproc due to error: java.lang.NoClassDefFoundError: io/opentelemetry/api/OpenTelemetry #1269

Open
surjikal opened this issue Nov 7, 2024 · 1 comment

surjikal commented Nov 7, 2024

For a Spark cluster running on Dataproc 2.2 with GCS connector version 3.0.3, I'm getting this error:

java.lang.NoClassDefFoundError: io/opentelemetry/api/OpenTelemetry
        at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.storage.GrpcStorageOptions.resolveSettingsAndOpts(GrpcStorageOptions.java:298)
        at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.storage.GrpcStorageOptions.access$1400(GrpcStorageOptions.java:105)
        at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.storage.GrpcStorageOptions$GrpcStorageFactory.create(GrpcStorageOptions.java:757)
        at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.storage.GrpcStorageOptions$GrpcStorageFactory.create(GrpcStorageOptions.java:726)
        at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.ServiceOptions.getService(ServiceOptions.java:582)
        at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageClientImpl.createStorage(GoogleCloudStorageClientImpl.java:266)
        at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageClientImpl.<init>(GoogleCloudStorageClientImpl.java:115)
        at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.AutoBuilder_GoogleCloudStorageClientImpl_Builder.build(AutoBuilder_GoogleCloudStorageClientImpl_Builder.java:96)
        at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystemImpl.createCloudStorage(GoogleCloudStorageFileSystemImpl.java:139)
        at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystemImpl.<init>(GoogleCloudStorageFileSystemImpl.java:175)
        at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem.createGcsFs(GoogleHadoopFileSystem.java:431)
        at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem.initializeGcsFs(GoogleHadoopFileSystem.java:381)
        at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem.initialize(GoogleHadoopFileSystem.java:311)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3612)
        at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:175)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3713)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3664)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:558)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
        at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:53)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:366)
        at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:229)
        at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:211)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
        at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:362)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
        at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.ClassNotFoundException: io.opentelemetry.api.OpenTelemetry
        at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
        at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
        ... 38 more

The same error happens when I configure the connector at the cluster-level or at the job-level.

When using core:fs.gs.client.type=STORAGE_CLIENT (cluster-level), I get the error during the initialization actions (when HDFS is being set up).

When using spark.hadoop.fs.gs.client.type=STORAGE_CLIENT (job-level), it happens when I try to read a file from GCS.
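
For reference, the two settings above are the same Hadoop key, `fs.gs.client.type`, written with different prefixes. A minimal sketch of that mapping, assuming the usual Dataproc `core:` cluster-property prefix and Spark's `spark.hadoop.` passthrough prefix; the helper names are hypothetical and not part of any library:

```python
# Hypothetical helpers (not part of any library) showing how one Hadoop key
# maps to the cluster-level and job-level property strings used in this report.

HADOOP_KEY = "fs.gs.client.type"
VALUE = "STORAGE_CLIENT"


def cluster_property(key: str, value: str) -> str:
    """Cluster-level form: the 'core:' prefix targets core-site.xml."""
    return f"core:{key}={value}"


def job_property(key: str, value: str) -> str:
    """Job-level form: 'spark.hadoop.' forwards the key to the Hadoop conf."""
    return f"spark.hadoop.{key}={value}"


if __name__ == "__main__":
    print(cluster_property(HADOOP_KEY, VALUE))
    print(job_property(HADOOP_KEY, VALUE))
```

Both forms end up setting the same connector option, which fits with the error appearing in both cases, only at different points in time.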

surjikal changed the title from "java.lang.NoClassDefFoundError: io/opentelemetry/api/OpenTelemetry" to "Can't run GRPC on dataproc due to error: java.lang.NoClassDefFoundError: io/opentelemetry/api/OpenTelemetry" on Nov 7, 2024
@irajhedayati

I had the same issue, and it turned out to be related to dependencies. Do you have google-cloud-storage or any io.grpc artifacts in your dependency list? If not, can you find out which effective versions of them you end up with?
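
One way to start on that question is to check which jars on the machine actually contain the class named in the error. Below is a minimal sketch using only the Python standard library. The function name is hypothetical, and the directory to scan is an assumption you would replace with wherever your cluster or job keeps its jars:

```python
import zipfile
from pathlib import Path
from typing import List

# Class named in the NoClassDefFoundError above, written as a jar entry path.
MISSING_CLASS = "io/opentelemetry/api/OpenTelemetry.class"


def jars_containing(class_entry: str, jar_dir: str) -> List[str]:
    """Return sorted paths of jars under jar_dir that contain class_entry."""
    hits = []
    for jar in sorted(Path(jar_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if class_entry in zf.namelist():
                    hits.append(str(jar))
        except (zipfile.BadZipFile, OSError):
            # Skip unreadable files and files that are not valid archives.
            continue
    return hits
```

Calling `jars_containing(MISSING_CLASS, "/path/to/jars")` with a real directory returns the matching jar paths. An empty list would mean nothing in that directory provides the class. In the trace the storage classes sit under `com.google.cloud.hadoop.repackaged.gcs` while the missing class is looked up under its original `io.opentelemetry` package, so it may need to come from a separate jar, though that is only an inference from the trace.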
