Run IIS experiments by relying on spark 3.4 version #1426
Comments
marekhorst
added a commit
that referenced
this issue
Sep 22, 2023
Upgrading dependencies in pom.xml files, aligning with scala 2.12 and spark 3.4.1.
marekhorst
added a commit
that referenced
this issue
Sep 27, 2023
WIP. Commenting out avro-related methods in scala sources that relied on the avro deserializer class, which was made private in spark3. Aligning sources with those changes by changing the way dataframes are constructed from collections of avro records, along with other refactoring required to compile IIS sources successfully. This does not mean the code is already operational: some tests fail and still need to be fixed. Upgrading logging system dependencies to match the sharelib log4j dependency version. Upgrading maven-plugin-plugin version to solve a build bug induced by the upgraded log4j version.
marekhorst
added a commit
that referenced
this issue
Sep 29, 2023
WIP. Fixing the task serialization issue by upgrading the avro dependency from 1.8.10 to 1.11.1, which is already a part of sharelib342. This required aligning JsonConverter with the new code and moving it to a different package due to the limited visibility of one of the crucial methods. Further logging system dependency alignment to make unit test output visible on the console.
marekhorst
added a commit
that referenced
this issue
Oct 2, 2023
WIP. Replacing scala source code in iis-common module with java-based counterpart. Simplifying the code, aligning other classes with changes in avro read/write code.
marekhorst
added a commit
that referenced
this issue
Oct 3, 2023
WIP. Removing `provided` scope from the `spark-avro_2.12` dependency until it becomes part of sharelib342. Introducing required fixes for the `eu/dnetlib/iis/wf/export/actionmanager/relation/citation/default` integration test to let it run on spark3:
* setting `spark.extraListeners` and `spark.sql.queryExecutionListeners` explicitly to empty values in order to avoid relying on incompatible, spark2-compliant cloudera listeners
* setting `spark.shuffle.useOldFetchProtocol=true` to address the `2.4 to 3.0 migration guide` requirement regarding backward compatibility of the protocol for fetching shuffle blocks (and to avoid `IllegalArgumentException: Unexpected message type: <number>` kind of errors)
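For illustration only, the three settings named in this commit message could be rendered as `--conf` pairs, the form typically passed to `spark-submit` or placed in the spark options of a workflow.xml spark action. This is a minimal sketch, not the actual IIS change: the class `Spark3CompatOpts` and its helper methods are hypothetical names, and only the property names and values come from the commit message.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of IIS: collects the spark3 compatibility
// settings listed in the commit message and renders them as --conf pairs.
public class Spark3CompatOpts {

    // Property names and values as described in the commit message.
    public static Map<String, String> compatSettings() {
        Map<String, String> conf = new LinkedHashMap<>();
        // Empty values so that no spark2-only listeners get registered.
        conf.put("spark.extraListeners", "");
        conf.put("spark.sql.queryExecutionListeners", "");
        // Keep the pre-3.0 protocol for fetching shuffle blocks.
        conf.put("spark.shuffle.useOldFetchProtocol", "true");
        return conf;
    }

    // Renders each entry as "--conf key=value", separated by single spaces.
    public static String toSparkOpts(Map<String, String> conf) {
        StringJoiner opts = new StringJoiner(" ");
        for (Map.Entry<String, String> entry : conf.entrySet()) {
            opts.add("--conf " + entry.getKey() + "=" + entry.getValue());
        }
        return opts.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSparkOpts(compatSettings()));
    }
}
```

A `LinkedHashMap` is used so the options come out in insertion order, which keeps the generated string stable when it is compared in tests or diffs.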
marekhorst
added a commit
that referenced
this issue
Oct 10, 2023
WIP. Fixing the changed results order in patent and software entity exporter integration tests. Introducing required fixes for various `iis-wf-export-actionmanager` exporters relying on spark3 to let their integration tests succeed:
* setting `spark.extraListeners` and `spark.sql.queryExecutionListeners` explicitly to empty values in order to avoid relying on incompatible, spark2-compliant cloudera listeners
* setting `spark.shuffle.useOldFetchProtocol=true` in order to address the `2.4 to 3.0 migration guide` requirement regarding backward compatibility of the protocol for fetching shuffle blocks (and to avoid `IllegalArgumentException: Unexpected message type: <number>` kind of errors)
The following modules were covered with similar workflow.xml related changes but their spark3 compatibility has not been fully tested yet:
* `iis-wf-affmatching`
* `iis-wf-citationmatching-direct`
* `iis-wf-citationmatching`
* `iis-wf-documentsclassification`
* `iis-wf-import` (`content_url/core_parquet`, `infospace`, `patent`)
* `iis-wf-referenceextraction` (`community`, `concept`, `covid19`, `patent`, `project/funder_report`, `researchinitiative`, `softwareurl`)
* `iis-wf-transformers` (`avro2json`)
marekhorst
added a commit
that referenced
this issue
Oct 11, 2023
WIP. Introducing required workflow.xml fixes for various workflows relying on spark3 to let their integration tests succeed:
* setting `spark.extraListeners` and `spark.sql.queryExecutionListeners` explicitly to empty values in order to avoid relying on incompatible, spark2-compliant cloudera listeners
* setting `spark.shuffle.useOldFetchProtocol=true` in order to address the `2.4 to 3.0 migration guide` requirement regarding backward compatibility of the protocol for fetching shuffle blocks (and to avoid `IllegalArgumentException: Unexpected message type: <number>` kind of errors)
The following modules were covered with workflow.xml related changes which resulted in successful integration test execution:
* `iis-wf-affmatching`
* `iis-wf-citationmatching-direct`
* `iis-wf-documentsclassification`
This was introduced to avoid the following exception: `java.lang.NoClassDefFoundError: org/apache/spark/internal/Logging$class`
Adding `hadoop-mapreduce-client-core` and `hadoop-common` dependencies in the `iis-wf-affmatching` and `iis-wf-citationmatching-direct` modules to reflect the dependencies set in `iis-wf-export-actionmanager` and to avoid the exception: `IncompatibleClassChangeError: Class org.apache.hadoop.fs.AvroFSInput does not implement the requested interface org.apache.avro.file.SeekableInput`
marekhorst
added a commit
that referenced
this issue
Oct 13, 2023
WIP. Introducing required workflow.xml fixes for various workflows relying on spark3 to let their integration tests succeed:
* setting `spark.extraListeners` and `spark.sql.queryExecutionListeners` explicitly to empty values in order to avoid relying on incompatible, spark2-compliant cloudera listeners
* setting `spark.shuffle.useOldFetchProtocol=true` in order to address the `2.4 to 3.0 migration guide` requirement regarding backward compatibility of the protocol for fetching shuffle blocks (and to avoid `IllegalArgumentException: Unexpected message type: <number>` kind of errors)
The following modules were covered with workflow.xml related changes which resulted in successful integration test execution:
* `iis-wf-referenceextraction`
marekhorst
added a commit
that referenced
this issue
Oct 16, 2023
WIP. Introducing required workflow.xml fixes for various workflows relying on spark3 to let their integration tests succeed:
* setting `spark.extraListeners` and `spark.sql.queryExecutionListeners` explicitly to empty values in order to avoid relying on incompatible, spark2-compliant cloudera listeners
* setting `spark.shuffle.useOldFetchProtocol=true` in order to address the `2.4 to 3.0 migration guide` requirement regarding backward compatibility of the protocol for fetching shuffle blocks (and to avoid `IllegalArgumentException: Unexpected message type: <number>` kind of errors)
The following modules were covered with workflow.xml related changes which resulted in successful integration test execution:
* `iis-wf-documentssimilarity` (the explicitly excluded `hadoop-mapreduce-client-app` is still among the spark342 sharelib dependencies, which causes test failures)
* `iis-wf-import` (infospace importer still fails due to a spark3 regression, more details in #8941#note-35)
marekhorst
added a commit
that referenced
this issue
Jan 16, 2024
WIP. Upgrading spark dependency version from 3.4.1 to 3.4.2.
marekhorst
added 10 commits
that referenced
this issue
May 8, 2024
Same commit messages as the ten distinct entries above, from "Upgrading dependencies in pom.xml files, aligning with scala 2.12 and spark 3.4.1." through "WIP. Upgrading spark dependency version from 3.4.1 to 3.4.2."
marekhorst
added 10 commits
that referenced
this issue
Oct 24, 2024
Same commit messages as the ten distinct entries above, from "Upgrading dependencies in pom.xml files, aligning with scala 2.12 and spark 3.4.1." through "WIP. Upgrading spark dependency version from 3.4.1 to 3.4.2."
marekhorst
added a commit
that referenced
this issue
Oct 24, 2024
Making the citation matching module compilable with spark3 after it was rewritten from spark1.6 to spark2.4 (git#1475). Whether it actually works has not been tested.
marekhorst
added a commit
that referenced
this issue
Oct 24, 2024
Making SoftwareHeritage Origins importer spark3 compatible.
This task is related to running a subset of IIS modules currently written for spark 2.4 on the newly available spark 3.4 version.
This may require:
* removing the `iis-common` scala sources, which are difficult to maintain (and result in false-positive errors reported by the Eclipse IDE), and replacing those simple utility classes with java equivalents