spark-eclib is a framework written in Scala to support the development of distributed population-based metaheuristics and their application to the global optimization of large-scale problems on Spark clusters.
The spark-eclib framework is being developed by the Computer Architecture Group (GAC) at Universidade da Coruña (UDC) in collaboration with the Computational Biology Lab (formerly the (Bio)Process Engineering Group) at Misión Biológica de Galicia (MBG-CSIC). Both groups maintain a long-term research collaboration in the field of global optimization of large-scale problems from Computational Biology using distributed frameworks on computational clusters. After implementing and evaluating several ad-hoc distributed population-based metaheuristics on frameworks such as Hadoop and Spark (e.g. SiPDE, eSS), the groups started developing spark-eclib with two main objectives: to avoid reinventing the wheel every time a new metaheuristic is implemented, and to improve the automation and reproducibility of optimization experiments.
The framework provides a small set of abstractions that represent the general structure of population-based metaheuristics as templates, from which different algorithm variants can be instantiated by implementing strategies. Strategies can be reused across metaheuristics, which promotes code reuse. To validate the approach, a template for Particle Swarm Optimization (PSO) was implemented using the general abstractions provided by the framework. The template supports the instantiation of different variants of the PSO algorithm, a long list of configurable topologies, and several execution models (i.e. sequential, master-worker and island-based).
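The template-and-strategy idea can be illustrated with a minimal, sequential sketch in plain Scala. It is not the spark-eclib API: every name in it (Particle, Topology, VelocityUpdate, PsoTemplate and so on) is hypothetical, the inertia-weight update is just one common PSO variant chosen for illustration, and no Spark code is involved. It only shows how a fixed algorithm skeleton can yield different variants by swapping strategy objects.

```scala
import scala.util.Random

// All names below are hypothetical and chosen for illustration only;
// they are NOT taken from the spark-eclib code base.
final case class Particle(pos: Vector[Double], vel: Vector[Double],
                          bestPos: Vector[Double], bestFit: Double)

// Strategy: which particles inform particle i.
trait Topology {
  def neighbours(i: Int, size: Int): Seq[Int]
}
object GlobalTopology extends Topology {
  def neighbours(i: Int, size: Int): Seq[Int] = 0 until size
}
object RingTopology extends Topology {
  def neighbours(i: Int, size: Int): Seq[Int] =
    Seq((i - 1 + size) % size, i, (i + 1) % size)
}

// Strategy: how the velocity of a particle is updated.
trait VelocityUpdate {
  def apply(p: Particle, guide: Vector[Double], rnd: Random): Vector[Double]
}
final class InertiaWeightUpdate(w: Double, c1: Double, c2: Double) extends VelocityUpdate {
  def apply(p: Particle, guide: Vector[Double], rnd: Random): Vector[Double] =
    p.pos.indices.toVector.map { d =>
      w * p.vel(d) +
        c1 * rnd.nextDouble() * (p.bestPos(d) - p.pos(d)) +
        c2 * rnd.nextDouble() * (guide(d) - p.pos(d))
    }
}

// Template: the loop is fixed, the strategies are pluggable.
final class PsoTemplate(topology: Topology, update: VelocityUpdate) {
  // Minimizes f and returns the best fitness found.
  def run(f: Vector[Double] => Double, dim: Int, size: Int, iters: Int, seed: Long): Double = {
    val rnd = new Random(seed)
    var swarm = Vector.fill(size) {
      val x = Vector.fill(dim)(rnd.nextDouble() * 10 - 5) // start in [-5, 5)
      Particle(x, Vector.fill(dim)(0.0), x, f(x))
    }
    for (_ <- 1 to iters) {
      val current = swarm // synchronous update: all particles read the same snapshot
      swarm = current.indices.toVector.map { i =>
        val p = current(i)
        val guide = topology.neighbours(i, size).map(j => current(j)).minBy(_.bestFit).bestPos
        val v = update(p, guide, rnd)
        val x = p.pos.zip(v).map { case (a, b) => a + b }
        val fit = f(x)
        if (fit < p.bestFit) Particle(x, v, x, fit) else p.copy(pos = x, vel = v)
      }
    }
    swarm.map(_.bestFit).min
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val sphere: Vector[Double] => Double = _.map(x => x * x).sum
    val rule = new InertiaWeightUpdate(0.72, 1.49, 1.49)
    // Same template, two variants: only the topology strategy differs.
    println(new PsoTemplate(GlobalTopology, rule).run(sphere, dim = 5, size = 20, iters = 100, seed = 42L))
    println(new PsoTemplate(RingTopology, rule).run(sphere, dim = 5, size = 20, iters = 100, seed = 42L))
  }
}
```

In this sketch the same InertiaWeightUpdate instance is shared by both variants, which mirrors the reuse of strategies described above. A distributed execution model (master-worker or island-based) would replace the sequential loop inside the template, not the strategies.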
This repository contains a snapshot of the source code of the framework as described in the article with DOI 10.1016/j.swevo.2024.101483.
If you use spark-eclib, please cite our work using the following reference:
Xoán C. Pardo, Patricia González, Julio R. Banga, Ramón Doallo. Population based metaheuristics in Spark: towards a general framework using PSO as a case study. Swarm and Evolutionary Computation, 85 (2024), article 101483, 10.1016/j.swevo.2024.101483
To build the project use the Maven command:
mvn clean package
The resulting fat jar eclib-0.0.1-test-jar-with-dependencies.jar will be placed in the target folder of the project.
The simplest command to submit a spark-eclib job to a Spark cluster is:
spark-submit --class gal.udc.gac.eclib.EclibTest \
--master <master-url> \
eclib-0.0.1-test-jar-with-dependencies.jar \
<configuration_file>
Refer to the README.md files in the testbed\kubernetes and testbed\cluster directories.
Besides master, there are branches with different combinations of the optimizations implemented to reduce the number of Spark jobs per iteration. These branches were used to profile the performance of the parallel implementations with and without optimizations.
This code is open source software licensed under the GPLv3 License.
Xoán C. Pardo <[email protected]>
Computer Architecture Group / CITIC
Universidade da Coruña (UDC)
Spain