-Uses of Class org.apache.sysds.resource.CloudUtils.InstanceType (Apache SystemDS 3.3.0-SNAPSHOT API)
+Uses of Class org.apache.sysds.resource.CloudUtils.InstanceFamily (Apache SystemDS 3.3.0-SNAPSHOT API)
@@ -94,7 +94,7 @@
-
Uses of Class org.apache.sysds.resource.CloudUtils.InstanceType
+
Uses of Class org.apache.sysds.resource.CloudUtils.InstanceFamily
diff --git a/pom.xml b/pom.xml
index 8e68fa2de7e..64616b94de9 100644
--- a/pom.xml
+++ b/pom.xml
@@ -64,6 +64,8 @@
3.5.03.11.03.1.0
+
+
11{java.level}
@@ -274,7 +276,7 @@
truelib/
- org.apache.sysds.api.ropt.Executor
+ org.apache.sysds.resource.ResourceOptimizerSystemDS.jar ${project.artifactId}-${project.version}.jar
@@ -413,6 +415,18 @@
run
+
+ rename-ropt-jar
+ package
+
+
+
+
+
+
+ run
+
+
@@ -1337,6 +1351,18 @@
+
+
+
+
commons-loggingcommons-logging
diff --git a/scripts/resource/README.md b/scripts/resource/README.md
new file mode 100644
index 00000000000..dadcfae77b0
--- /dev/null
+++ b/scripts/resource/README.md
@@ -0,0 +1,169 @@
+# Resource Optimizer
+The **Resource Optimizer** is an extension that allows for automatic generation of (near) optimal cluster configurations for
+executing a given SystemDS script in a cloud environment - currently only AWS.
+The target execution platform on AWS is EMR (Elastic MapReduce), while single-node executions run on EC2 (Elastic Compute Cloud). In both cases, files are expected to be pulled from Amazon S3 (Simple Storage Service).
+
+## Functionality
+
+The Resource Optimizer is an extension of SystemDS. While it employs the system's general tools for program compilation, it does not operate at execution time but is a separate executable program meant to be used before running a DML script (program) in the cloud (AWS). The extension runs an optimization algorithm that finds the optimal cluster configuration for executing the target SystemDS program, considering the following properties:
+
+* The input dataset properties (providing metadata files is required)
+* A defined set of available machine types for running the program in the cloud
+* The resulting execution plan of the program for the corresponding combination of dataset characteristics and hardware characteristics
+* The optimization objective: the user-defined goal of the optimization, with options to aim for the shortest execution time, the lowest monetary cost, or (by default) a configuration that fairly balances both objectives.
+
+The result is a set of configuration files with all options needed for launching SystemDS in the cloud environment.
+
+To complete the automation of running a SystemDS program in the cloud, the extension is complemented with corresponding scripts that allow the user to launch the program with the generated optimal configurations with minimal additional effort.
+
+## User Manual
+
+### Installation
+The extension, although not part of the SystemDS runtime, is fully integrated into the project and is compiled
+as part of the general Maven build process. Check [here](../../README.md) for more information about that.
+
+Maven compiles a JAR file which is to be used for running the extension.
+The path to the file is `target/ResourceOptimizer.jar`,
+but for easier usage you can complete the following steps from the project root folder to put it on your path:
+```shell
+# general step for SystemDS
+export SYSTEMDS_ROOT=$(pwd)
+# specific steps for the extension
+export PATH=$PATH:$SYSTEMDS_ROOT/scripts/resource/bin
+```
+Proper execution requires JDK 11, so make sure the `$JAVA_HOME` environment variable points to a matching JDK installation.
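+
+For example, on a typical Linux setup this could look as follows (the JDK path is only an illustration and differs per system):
+```shell
+# point JAVA_HOME to a JDK 11 installation (example path, adjust to your system)
+export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
+export PATH=$JAVA_HOME/bin:$PATH
+```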
+
+### Usage
+
+The extension is built as a separate Java executable JAR file, so it can be launched with `java -jar target/ResourceOptimizer.jar ...`. If you completed the additional steps from the **Installation** section, you can run the extension directly via the `systemds-ropt` command without specifying the target JAR file.
+
+The executable takes the following arguments:
+```txt
+ -args specifies positional parameters; first value will replace $1 in DML program, $2 will replace 2nd and so on
+
+ -f specifies DML file to execute; path should be local
+
+ -help shows usage message
+
+ -nvargs parameterizes DML script with named parameters of the form <name>=<value>; <name> should be a valid identifier in DML
+
+ -options specifies options file for the resource optimization
+
+```
+Either `-help` or `-f` is required, where `-f` leads to the actual program execution. In that case, if `-options` is not provided, the program looks for a default `options.properties` file in the current directory and fails if no such file is found. As for regular SystemDS program execution, `-args` and `-nvargs` provide additional script arguments and cannot be specified simultaneously. The `-options` argument points to a file with options that further customize the optimization process; these options provide paths to further properties files as well as optional optimization configurations and constraints.
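+
+A minimal invocation could look as follows (all file and bucket names are placeholders for illustration; the `-nvargs` pairs are assumed to follow the same space-separated `key=value` convention as the regular SystemDS CLI):
+```shell
+# run the Resource Optimizer on a DML script with named arguments
+# and an explicit options file (paths and bucket names are examples)
+java -jar target/ResourceOptimizer.jar \
+    -f scripts/my_algorithm.dml \
+    -nvargs X=s3://my-bucket/data/X.csv Y=s3://my-bucket/data/Y.csv \
+    -options ./scripts/resource/options.properties
+```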
+
+It is important that program arguments which are dataset file paths are specified with the URI addresses that will actually be used by the SystemDS program later. Currently, S3 is the only supported and tested option for storing inputs and outputs on AWS. Check [Required Options](#required-options) for further details.
+
+## Providing Options
+
+Although automatic, the optimization process requires certain properties beyond the SystemDS program and the dataset characteristics. These are, in general, hardware characteristics of the potential resources available on the target cloud platform. In addition, the user can (and in some cases should) provide a further set of constraint options to ensure that the resulting configuration is feasible for the target scenario. All properties and constraints are provided with the options file (using the `-options` argument), and the full set of possible options can be found in the default options file: `./scripts/resource/options.properties`
+
+### Required Options
+
+For proper operation, the following options should always be provided:
+
+* `REGION` - AWS region which is crucial for the monetary estimations
+* `INFO_TABLE` - a `.csv` table providing the needed hardware and price characteristics for each of the potential server machines (instances) for consideration
+* `REGION_TABLE` - another `.csv` table providing additional pricing parameters (EMR fee, EBS Storage prices)
+* `OUTPUT_FOLDER` - path for the resulting configuration files
+
+Depending on the chosen optimization objective, further options may be required:
+* `MAX_TIME` - required when optimizing for monetary cost only
+* `MAX_PRICE` - required when optimizing for execution time only
+
+The enumeration strategy and the optimization function have default values and do not need to be set explicitly in the options:
+* Default for the enumeration strategy is: grid
+* Default for the optimization function is: costs
+
+We provide a table comprising the characteristics of the EC2 instances currently supported by the Resource Optimizer
+and a table with the pricing parameters for all regions supported by EMR. The Python script `update_prcies.py` automates
+updating the prices of the EC2 instances in the first-mentioned table for a chosen target AWS region.
+
+As mentioned in [Usage](#usage), the file paths for the input datasets should be the URI addresses of the distributed S3 files. This allows the Resource Optimizer to account for the cost of fetching these external files. To allow greater flexibility when using the extension, the user can omit this requirement in certain scenarios: the program arguments are provided as S3 URI paths with an imaginary name (and the leading S3 URI schema - `s3://`), and the `LOCAL_INPUTS` property is filled in. This property holds key-value pairs where the key is the imaginary S3 file path and the value is the local path of the file. The local path may even point to a non-existing file as long as a corresponding metadata (`.mtd`) file is locally available at this path.
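+
+A minimal options file could then look roughly as follows (bucket names, paths and the exact `LOCAL_INPUTS` syntax are illustrative assumptions; the shipped `ec2_stats.csv` and `aws_regional_prices.csv` are presumably the intended values for the two tables; consult `./scripts/resource/options.properties` for the authoritative format):
+```properties
+# required options
+REGION=us-east-1
+INFO_TABLE=scripts/resource/ec2_stats.csv
+REGION_TABLE=scripts/resource/aws_regional_prices.csv
+OUTPUT_FOLDER=output
+# map imaginary S3 inputs to local files (or local .mtd files) - illustrative syntax
+LOCAL_INPUTS=s3://placeholder-bucket/X.csv=data/X.csv
+```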
+
+### Further Options
+
+As mentioned above, the user can switch between different options for the enumeration strategy (option `ENUMERATION`) and the optimization function (option `OPTIMIZATION_FUNCTION`).
+
+The enumeration strategy mostly influences how fast the optimal configuration is found. The default value is `grid`, which sets a grid-based enumeration where every configuration combination within the configured constraints is evaluated. The next possible value is `prune`, which dynamically prunes certain configurations during the enumeration process based on the intermediate results of the already evaluated configurations. In theory this delivers the same optimal result as the grid-based enumeration, and the experiments so far confirmed that while showing a great speed-up. The last possible value is `interest`, which uses an interest-based enumeration applying several (configurable) criteria to statically reduce the search space for the optimal configuration based on program and hardware properties.
+
+Here is a list of all the remaining options available (an example combining several of them follows the list):
+
+* `CPU_QUOTA` (default 1152) - specifies the limit of (virtual) CPU cores allowed for evaluation. This corresponds to the EC2 service quota limiting the number of instances running within the same region at the same time.
+* `COSTS_WEIGHT` (default 0.01) - specifies the weighting factor for the multi-objective optimization function
+* `MIN_EXECUTORS` (default 0) - specifies the minimum desired number of executors, where 0 includes single-node execution. Allows configuring a minimum cluster size.
+* `MAX_EXECUTORS` (default 200) - specifies the maximum desired number of executors. The maximum number of executors may be further limited dynamically if the CPU quota is reached.
+* `INSTANCE_FAMILIES` - specifies the VM instance types (families) to consider when searching for the optimal configuration. If not specified, all instances from the instance metadata table are considered
+* `INSTANCE_SIZES` - specifies the VM instance sizes to consider when searching for the optimal configuration. If not specified, all instances from the instance metadata table are considered
+* `STEP_SIZE` (default 1) - specific to the grid-based enumeration strategy: specifies the step size for enumerating the number of executors
+* `EXPONENTIAL_BASE` - specific to the grid-based enumeration strategy: specifies the exponential base for increasing the number of executors exponentially if a value greater than 1 is given
+* `USE_LARGEST_ESTIMATE` (default *true*) - specific to the interest-based enumeration strategy: boolean ('true'/'false') indicating whether single-node execution should be considered only in case of a sufficient memory budget for the driver
+* `USE_CP_ESTIMATES` (default *true*) - specific to the interest-based enumeration strategy: boolean ('true'/'false') indicating whether the CP memory is of interest for the enumeration
+* `USE_BROADCASTS` (default *true*) - specific to the interest-based enumeration strategy: boolean ('true'/'false') indicating whether the sizes of potential broadcast variables are of interest for the driver and executor memory budgets
+* `USE_OUTPUTS` (default *false*) - specific to the interest-based enumeration strategy: boolean ('true'/'false') indicating whether the size of the (potentially cached) outputs is of interest for the enumerated number of executors. False by default since the caching process is not considered by the current version of the Resource Optimizer.
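+
+The sketch below shows how several of these options could be combined in the options file (the concrete values and the list syntax for `INSTANCE_FAMILIES`/`INSTANCE_SIZES` are illustrative assumptions, not verified defaults):
+```properties
+# use the pruning-based enumeration and restrict the search space
+ENUMERATION=prune
+MIN_EXECUTORS=0
+MAX_EXECUTORS=48
+# consider only selected instance families and sizes (illustrative syntax)
+INSTANCE_FAMILIES=m5,r5
+INSTANCE_SIZES=xlarge,2xlarge,4xlarge
+```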
+
+## Launching Program in the Cloud
+
+If the optimal solution is a single node, the target environment on AWS is an EC2 instance and the optimization process produces a single file: `ec2_configurations.json`. This file stores all values required for launching an EC2 instance and running SystemDS on it. The file has a custom format that is not supported by the AWS CLI.
+
+If the optimal solution is a Spark cluster, the target environment is an EMR cluster. In this case the Resource Optimizer generates two files:
+* `emr_instance_groups.json` - contains hardware configurations for machines in the cluster
+* `emr_configurations.json` - contains Spark-specific configurations for the cluster
+
+Both files follow a structure supported by the AWS CLI, so they can be passed directly to the command for launching an EMR cluster, in combination with all further required options.
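+
+Conceptually, the launch script described below passes these files straight to the AWS CLI, roughly like this (simplified sketch; the actual `cluster_launch.sh` call adds roles, subnet, logging, step and termination options):
+```shell
+aws emr create-cluster \
+    --release-label emr-7.3.0 \
+    --instance-groups file://emr_instance_groups.json \
+    --configurations file://emr_configurations.json \
+    --region us-east-1
+```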
+
+*Note: For proper execution of all the shell scripts mentioned below, please do not delete any of the optional variables in the corresponding `.env` files but just leave them empty; otherwise the scripts could fail with an "unbound variable" error.*
+
+### Launching in Single-node Mode Automatically
+
+To save additional user effort and the need for background knowledge about Amazon EC2, the project includes scripts that automate the process of launching an instance based on the output configurations and running the target SystemDS script on it.
+
+The shell script `./scripts/resource/launch/single_node_launch.sh` is used for launching the EC2 instance based on a corresponding `ec2_configurations.json`. All configurations and options should be specified in `./scripts/resource/launch/single_node.env`; the script expects no arguments. The options file includes all required and optional configurations and explanations of their functionality. The launch process uses the AWS CLI and automatically executes the following steps:
+1. Query the AWS API for an OS image with Ubuntu 24.04 for the corresponding processor architecture and AWS region.
+2. Generates an instance profile for the instance to access S3
+3. Launches the target EC2 instance with all additionally needed options, providing a bootstrap script for the SystemDS installation
+4. Waits for the instance to enter state `RUNNING`
+5. Waits for the completion of the SystemDS installation
+
+*Note: the SSH key generation is not automated and an existing key should be provided in the options file before executing the scripts, otherwise the launch will fail.*
+
+After the script completes without errors, the instance is fully prepared for running SystemDS programs. However, before running the script for automated program execution, the user needs to manually upload the needed files (including the DML script) to S3.
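+
+For example, uploading the script and the input data (including metadata files) could look like this, with placeholder bucket and file names:
+```shell
+aws s3 cp ./scripts/my_algorithm.dml s3://my-bucket/scripts/my_algorithm.dml
+aws s3 cp ./data/X.csv s3://my-bucket/data/X.csv
+aws s3 cp ./data/X.csv.mtd s3://my-bucket/data/X.csv.mtd
+```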
+
+Once all files are uploaded to S3, the user can execute the `./scripts/resource/launch/single_node_run_script.sh` script. All configurations are again provided via the same options file as for the launch. The script does the following (an example run follows the list):
+1. Prepares a command for executing the SystemDS program with all required arguments and with optimal JVM configurations.
+2. Submits the command to the target machine and additionally sets up a simple logging mechanism: the execution writes directly to log files that are compressed and uploaded to S3 after program completion or failure.
+3. Optionally (depending on the option `AUTO_TERMINATION`), configures the instance to be stopped after the program completes/fails and the log files are uploaded to S3. In that case the script waits for the machine to enter state `STOPPED` and triggers its termination.
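+
+A typical run could then look as follows (the log location is an assumption and depends on the options set in `single_node.env`):
+```shell
+# submit the program to the launched instance; all settings come from single_node.env
+./scripts/resource/launch/single_node_run_script.sh
+# afterwards, inspect the compressed logs uploaded to S3 (placeholder bucket/prefix)
+aws s3 ls s3://my-bucket/logs/
+```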
+
+The provided URI addresses for S3 files should always use the `s3a://` prefix to allow for the proper functionality
+of the Hadoop-AWS S3 connector.
+
+*Note 1: if automatic termination is disabled the user should manually check for program completion and terminate the EC2 instance*
+
+*Note 2: if automatic termination is enabled the user should ensure that all the output files are written to S3 because the EC2 instance storage is always configured to be ephemeral.*
+
+### Launching in Cluster (Hybrid) Mode Automatically
+
+The project also includes equivalent script files to automate the launch of an EMR cluster
+and the submission of steps for executing SystemDS programs.
+
+The shell script `./scripts/resource/launch/cluster_launch.sh` is used for launching the cluster
+based on the files auto-generated by the Resource Optimizer. Additional configurations regarding the launch
+process or the submitted steps should be defined in `./scripts/resource/launch/cluster.env`; the script does not
+expect any passed arguments. As for the EC2 launch, the script uses the AWS CLI and executes the following steps (an example invocation follows the list):
+1. Queries for the default subnet if the user has not defined one
+2. If a SystemDS script for execution is provided in the configuration file, prepares the whole step definition
+3. Launches the cluster with all provided configurations. Depending on the value of `AUTO_TERMINATION_TIME`, the cluster can be set
+to terminate automatically after the completion of the initially provided step (if one was provided at all) or
+after staying in idle state for a given period of time.
+4. Waits until the cluster enters state `RUNNING` and completes.
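+
+A typical launch could look as follows (the paths to the generated configuration files and all other settings come from `cluster.env`, not from command-line arguments):
+```shell
+# edit ./scripts/resource/launch/cluster.env beforehand and fill in
+# KEYPAIR_NAME, INSTANCE_CONFIGS, SPARK_CONFIGS, LOG_URI, etc.
+# then launch; CLUSTER_ID and CLUSTER_URL are written back into cluster.env
+./scripts/resource/launch/cluster_launch.sh
+```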
+
+The script `./scripts/resource/launch/cluster_run_script.sh` can be used for submitting
+steps to a running EMR cluster, again taking all of its arguments from the file `./scripts/resource/launch/cluster.env`.
+If `AUTO_TERMINATION_TIME` is set to 0, the script waits for the completion of the step by polling the step's state
+and then terminates the cluster automatically.
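+
+For example, after pointing `SYSTEMDS_PROGRAM` (and optionally `SYSTEMDS_NVARGS` or `SYSTEMDS_ARGS`) in `cluster.env` to the desired DML script on S3, a step can be submitted like this:
+```shell
+# submit a new step to the cluster referenced by CLUSTER_ID in cluster.env
+./scripts/resource/launch/cluster_run_script.sh
+```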
+
+
+The provided URI addresses for S3 files should always use the `s3://` prefix to allow for the proper functionality
+of the EMR-specific S3 connector.
+
+*The same notes as for launching programs on EC2 apply here as well!*
\ No newline at end of file
diff --git a/scripts/resource/aws_regional_prices.csv b/scripts/resource/aws_regional_prices.csv
new file mode 100644
index 00000000000..fa41d69e8c2
--- /dev/null
+++ b/scripts/resource/aws_regional_prices.csv
@@ -0,0 +1,31 @@
+Region,Fee Ratio,EBS Price
+af-south-1,0.195918367,0.1047
+ap-east-1,0.181818182,0.1056
+ap-northeast-1,0.193548387,0.096
+ap-northeast-2,0.203389831,0.0912
+ap-northeast-3,0.193548387,0.096
+ap-south-1,0.237623762,0.0912
+ap-south-2,0.237623762,0.0912
+ap-southeast-1,0.2,0.096
+ap-southeast-2,0.2,0.096
+ap-southeast-3,0.2,0.096
+ap-southeast-4,0.2,0.096
+ap-southeast-5,0.235294118,0.0864
+ca-central-1,0.224299065,0.088
+ca-west-1,0.224299065,0.088
+eu-central-1,0.208695652,0.0952
+eu-central-2,0.18972332,0.1142
+eu-north-1,0.235294118,0.0836
+eu-south-1,0.214285714,0.0924
+eu-south-2,0.224299065,0.088
+eu-west-1,0.224299065,0.088
+eu-west-2,0.216216216,0.0928
+eu-west-3,0.214285714,0.0928
+il-central-1,0.213333333,0.1056
+me-central-1,0.204255319,0.0968
+me-south-1,0.204255319,0.0968
+sa-east-1,0.156862745,0.152
+us-east-1,0.25,0.08
+us-east-2,0.25,0.08
+us-west-1,0.214285714,0.096
+us-west-2,0.25,0.08
\ No newline at end of file
diff --git a/scripts/resource/bin/systemds-ropt b/scripts/resource/bin/systemds-ropt
new file mode 100755
index 00000000000..1318c264722
--- /dev/null
+++ b/scripts/resource/bin/systemds-ropt
@@ -0,0 +1,7 @@
+#!/usr/bin/env bash
+
+ROPT_JAR_FILE="${SYSTEMDS_ROOT}/target/ResourceOptimizer.jar"
+DEFAULT_PROPERTIES="${SYSTEMDS_ROOT}/scripts/resource/options.properties"
+
+java -jar "$ROPT_JAR_FILE" "$@" -options "$DEFAULT_PROPERTIES"
+
diff --git a/scripts/resource/ec2_stats.csv b/scripts/resource/ec2_stats.csv
new file mode 100644
index 00000000000..796d38ca846
--- /dev/null
+++ b/scripts/resource/ec2_stats.csv
@@ -0,0 +1,373 @@
+API_Name,Price,Memory,vCPUs,Cores,GFLOPS,memBandwidth,NVMe,storageVolumes,sizePerVolume,readStorageBandwidth,writeStorageBandwidth,networkBandwidth
+c5.12xlarge,2.0400000000,96.0,48,24,2304,131230,false,8,96,1000,1000,1500
+c5.18xlarge,3.0600000000,144.0,72,36,3456,196845,false,8,128,1000,1000,2500
+c5.24xlarge,4.0800000000,192.0,96,48,4608,262460,false,8,192,1000,1000,3125
+c5.2xlarge,0.3400000000,16.0,8,4,384,21871.66667,false,4,32,287.5,287.5,312.5
+c5.4xlarge,0.6800000000,32.0,16,8,768,43743.33333,false,4,64,500,500,625
+c5.9xlarge,1.5300000000,72.0,36,18,1728,98422.5,false,8,64,850,850,1250
+c5.xlarge,0.1700000000,8.0,4,2,192,10935.83333,false,2,32,143.72,143.72,156.25
+c5a.12xlarge,1.8480000000,96.0,48,24,2150.4,102400,false,8,96,593.75,593.75,1500
+c5a.16xlarge,2.4640000000,122.0,64,32,2867.2,136533.3333,false,8,128,787.5,787.5,2500
+c5a.24xlarge,3.6960000000,183.0,96,48,4300.8,204800,false,8,192,1000,1000,3125
+c5a.2xlarge,0.3080000000,15.25,8,4,358.4,17066.66667,false,4,32,100,100,312.5
+c5a.4xlarge,0.6160000000,30.5,16,8,716.8,34133.33333,false,4,64,197.5,197.5,625
+c5a.8xlarge,1.2320000000,61.0,32,16,1433.6,68266.66667,false,8,64,396.25,396.25,1250
+c5a.xlarge,0.1540000000,7.5,4,2,179.2,8533.333333,false,2,32,50,50,156.25
+c5ad.12xlarge,2.0640000000,91.5,48,24,2150.4,102400,true,2,900,1689.6,737.28,1500
+c5ad.16xlarge,2.7520000000,122.0,64,32,2867.2,136533.3333,true,2,1200,2134.228992,931.299328,2500
+c5ad.24xlarge,4.1280000000,183.0,96,48,4300.8,204800,true,2,1900,3379.2,1474.56,3125
+c5ad.2xlarge,0.3440000000,15.25,8,4,358.4,17066.66667,true,1,300,266.780672,116.412416,312.5
+c5ad.4xlarge,0.6880000000,30.5,16,8,716.8,34133.33333,true,2,300,533.553152,232.824832,625
+c5ad.8xlarge,1.3760000000,61.0,32,16,1433.6,68266.66667,true,2,600,1067.114496,465.649664,1250
+c5ad.xlarge,0.1720000000,7.5,4,2,179.2,8533.333333,true,1,150,133.390336,58.208256,156.25
+c5d.12xlarge,2.3040000000,96.0,48,24,2304,131230,true,2,900,2867.2,1392.64,1500
+c5d.18xlarge,3.4560000000,144.0,72,36,3456,196845,true,2,900,2867.2,1392.64,2500
+c5d.24xlarge,4.6080000000,192.0,96,48,4608,262460,true,4,900,5734.4,2785.28,3125
+c5d.2xlarge,0.3840000000,16.0,8,4,384,21871.66667,true,1,200,327.68,151.552,312.5
+c5d.4xlarge,0.7680000000,32.0,16,8,768,43743.33333,true,1,400,716.8,307.2,625
+c5d.9xlarge,1.7280000000,72.0,36,18,1728,98422.5,true,1,900,1433.6,696.32,1250
+c5d.xlarge,0.1920000000,8.0,4,2,192,10935.83333,true,1,100,163.84,73.728,156.25
+c5n.18xlarge,3.8880000000,192.0,72,36,3456,238420,false,8,128,1000,1000,12500
+c5n.2xlarge,0.4320000000,21.0,8,4,384,26491.11111,false,4,32,287.5,287.5,1250
+c5n.4xlarge,0.8640000000,42.0,16,8,768,52982.22222,false,4,64,500,500,1875
+c5n.9xlarge,1.9440000000,96.0,36,18,1728,119210,false,8,64,850,850,6250
+c5n.xlarge,0.2160000000,10.5,4,2,192,13245.55556,false,2,32,143.72,143.72,625
+c6a.12xlarge,1.8360000000,91.5,48,24,2035.2,102400,false,8,96,1000,1000,2312.5
+c6a.16xlarge,2.4480000000,122.0,64,32,2713.6,136533.3333,false,8,128,1000,1000,3125
+c6a.24xlarge,3.6720000000,183.0,96,48,4070.4,204800,false,8,192,1000,1000,4687.5
+c6a.2xlarge,0.3060000000,15.25,8,4,339.2,17066.66667,false,4,32,312.5,312.5,390.625
+c6a.32xlarge,4.8960000000,244.0,128,64,5427.2,273066.6667,false,8,256,1000,1000,6250
+c6a.48xlarge,7.3440000000,366.0,192,96,8140.8,409600,false,12,256,1500,1500,6250
+c6a.4xlarge,0.6120000000,30.5,16,8,678.4,34133.33333,false,4,64,500,500,781.25
+c6a.8xlarge,1.2240000000,61.0,32,16,1356.8,68266.66667,false,8,64,1000,1000,1562.5
+c6a.xlarge,0.1530000000,7.5,4,2,169.6,8533.333333,false,2,32,156.25,156.25,195.25
+c6g.12xlarge,1.6320000000,91.5,48,48,1920,153600,false,8,96,1000,1000,1500
+c6g.16xlarge,2.1760000000,122.0,64,64,2560,204800,false,8,128,1000,1000,2500
+c6g.2xlarge,0.2720000000,15.25,8,8,320,25600,false,4,32,296.88,296.88,312.5
+c6g.4xlarge,0.5440000000,30.5,16,16,640,51200,false,4,64,500,500,625
+c6g.8xlarge,1.0880000000,61.0,32,8,1280,102400,false,8,64,1000,1000,1250
+c6g.xlarge,0.1360000000,7.5,4,4,160,12800,false,2,32,148.5,148.5,156.25
+c6gd.12xlarge,1.8432000000,91.5,48,48,1920,153600,true,2,1425,2641.92,1105.92,1500
+c6gd.16xlarge,2.4576000000,122.0,64,64,2560,204800,true,2,1900,3522.56,1474.56,2500
+c6gd.2xlarge,0.3072000000,15.25,8,8,320,25600,true,1,474,440.32,184.32,312.5
+c6gd.4xlarge,0.6144000000,30.5,16,16,640,51200,true,1,950,880.64,368.64,625
+c6gd.8xlarge,1.2288000000,61.0,32,8,1280,102400,true,1,1900,1761.28,737.28,1250
+c6gd.xlarge,0.1536000000,7.5,4,4,160,12800,true,1,237,220.16,92.16,156.25
+c6gn.12xlarge,2.0736000000,91.5,48,48,1920,153600,false,8,96,1000,1000,9375
+c6gn.16xlarge,2.7648000000,122.0,64,64,2560,204800,false,8,128,1000,1000,12500
+c6gn.2xlarge,0.3456000000,15.25,8,8,320,25600,false,4,32,500,500,1562.5
+c6gn.4xlarge,0.6912000000,30.5,16,16,640,51200,false,4,64,500,500,3125
+c6gn.8xlarge,1.3824000000,61.0,32,32,1280,102400,false,8,64,1000,1000,6250
+c6gn.xlarge,0.1728000000,7.5,4,4,160,12800,false,2,32,250,250,787.5
+c6i.12xlarge,2.0400000000,91.5,48,24,2227.2,153600,false,8,96,1000,1000,2312.5
+c6i.16xlarge,2.7200000000,122.0,64,32,2969.6,204800,false,8,128,1000,1000,3125
+c6i.24xlarge,4.0800000000,183.0,96,48,4454.4,307200,false,8,192,1000,1000,4687.5
+c6i.2xlarge,0.3400000000,15.25,8,4,371.2,25600,false,4,32,312.5,312.5,390.625
+c6i.32xlarge,5.4400000000,244.0,128,64,5939.2,409600,false,8,256,1000,1000,6250
+c6i.4xlarge,0.6800000000,30.5,16,8,742.4,51200,false,4,64,500,500,781.25
+c6i.8xlarge,1.3600000000,61.0,32,16,1484.8,102400,false,8,64,1000,1000,1562.5
+c6i.xlarge,0.1700000000,7.5,4,2,185.6,12800,false,2,32,156.25,156.25,195.25
+c6id.12xlarge,2.4192000000,91.5,48,24,2227.2,153600,true,2,1425,2312.5,3297.271808,2312.5
+c6id.16xlarge,3.2256000000,122.0,64,32,2969.6,204800,true,2,1900,3125,4396.367872,3125
+c6id.24xlarge,4.8384000000,183.0,96,48,4454.4,307200,true,4,1425,4687.5,6594.543616,4687.5
+c6id.2xlarge,0.4032000000,15.25,8,4,371.2,25600,true,1,474,390.625,549.548032,390.625
+c6id.32xlarge,6.4512000000,244.0,128,64,5939.2,409600,true,4,1900,6250,8792.735744,6250
+c6id.4xlarge,0.8064000000,30.5,16,8,742.4,51200,true,1,950,781.25,1099.091968,781.25
+c6id.8xlarge,1.6128000000,61.0,32,16,1484.8,102400,true,1,1900,1562.5,2198.183936,1562.5
+c6id.xlarge,0.2016000000,7.5,4,2,185.6,12800,true,1,237,195.25,274.771968,195.25
+c6in.12xlarge,2.7216000000,91.5,48,24,2227.2,153600,false,8,96,1000,1000,9375
+c6in.16xlarge,3.6288000000,122.0,64,32,2969.6,204800,false,8,128,1000,1000,12500
+c6in.24xlarge,5.4432000000,183.0,96,48,4454.4,307200,false,8,192,1000,1000,18750
+c6in.2xlarge,0.4536000000,15.25,8,4,371.2,25600,false,4,32,312.5,312.5,1562.5
+c6in.32xlarge,7.2576000000,244.0,128,64,5939.2,409600,false,8,256,1000,1000,25000
+c6in.4xlarge,0.9072000000,30.5,16,8,742.4,51200,false,4,64,500,500,3125
+c6in.8xlarge,1.8144000000,61.0,32,16,1484.8,102400,false,8,64,1000,1000,6250
+c6in.xlarge,0.2268000000,7.5,4,2,185.6,12800,false,2,32,156.25,156.25,781.25
+c7a.12xlarge,2.4633600000,91.5,48,24,1996.8,115200,false,8,96,1000,1000,2312.5
+c7a.16xlarge,3.2844800000,122.0,64,32,2662.4,153600,false,8,128,1000,1000,3125
+c7a.24xlarge,4.9267200000,183.0,96,48,3993.6,230400,false,8,192,1000,1000,4687.5
+c7a.2xlarge,0.4105600000,15.25,8,4,332.8,19200,false,4,32,312.5,312.5,390.625
+c7a.32xlarge,6.5689600000,244.0,128,64,5324.8,307200,false,8,256,1000,1000,6250
+c7a.48xlarge,9.8534400000,366.0,192,96,7987.2,460800,false,12,256,1500,1500,6250
+c7a.4xlarge,0.8211200000,30.5,16,8,665.6,38400,false,4,64,500,500,781.25
+c7a.8xlarge,1.6422400000,61.0,32,16,1331.2,76800,false,8,64,1000,1000,1562.5
+c7a.xlarge,0.2052800000,7.5,4,2,166.4,9600,false,2,32,156.25,156.25,195.25
+c7g.12xlarge,1.7400000000,91.5,48,48,1996.8,230400,false,8,96,1000,1000,2812.5
+c7g.16xlarge,2.3200000000,122.0,64,64,2662.4,307200,false,8,128,1000,1000,3750
+c7g.2xlarge,0.2900000000,15.25,8,8,332.8,38400,false,4,32,312.5,312.5,468.75
+c7g.4xlarge,0.5800000000,30.5,16,16,665.6,76800,false,4,64,500,500,937.5
+c7g.8xlarge,1.1600000000,61.0,32,32,1331.2,153600,false,8,64,1000,1000,1875
+c7g.xlarge,0.1450000000,7.5,4,4,166.4,19200,false,2,32,156.25,156.25,234.5
+c7gd.12xlarge,2.1773000000,91.5,48,48,1996.8,230400,true,2,1425,3297.271808,1648.64,2812.5
+c7gd.16xlarge,2.9030000000,122.0,64,64,2662.4,307200,true,2,1900,4396.367872,2198.192128,3750
+c7gd.2xlarge,0.3629000000,15.25,8,8,332.8,38400,true,1,474,549.548032,274.776064,468.75
+c7gd.4xlarge,0.7258000000,30.5,16,16,665.6,76800,true,1,950,1099.091968,549.548032,937.5
+c7gd.8xlarge,1.4515000000,61.0,32,32,1331.2,153600,true,1,1900,2198.183936,1099.096064,1875
+c7gd.xlarge,0.1814000000,7.5,4,4,166.4,19200,true,1,237,274.771968,137.388032,234.5
+c7gn.12xlarge,2.9952000000,91.5,48,48,1996.8,230400,false,8,96,1000,1000,18750
+c7gn.16xlarge,3.9936000000,122.0,64,64,2662.4,307200,false,8,128,1000,1000,25000
+c7gn.2xlarge,0.4992000000,15.25,8,8,332.8,38400,false,4,32,500,500,3125
+c7gn.4xlarge,0.9984000000,30.5,16,16,665.6,76800,false,4,64,500,500,6250
+c7gn.8xlarge,1.9968000000,61.0,32,32,1331.2,153600,false,8,64,1000,1000,12500
+c7gn.xlarge,0.2496000000,7.5,4,4,166.4,19200,false,2,32,250,250,1562.5
+c7i.12xlarge,2.1420000000,91.5,48,24,1843.2,153600,false,8,96,1000,1000,2312.5
+c7i.16xlarge,2.8560000000,122.0,64,32,2457.6,204800,false,8,128,1000,1000,3125
+c7i.24xlarge,4.2840000000,183.0,96,48,3686.4,307200,false,8,192,1000,1000,4687.5
+c7i.2xlarge,0.3570000000,15.25,8,4,307.2,25600,false,4,32,312.5,312.5,390.625
+c7i.48xlarge,8.5680000000,366.0,192,96,7372.8,614400,false,12,256,1500,1500,6250
+c7i.4xlarge,0.7140000000,30.5,16,8,614.4,51200,false,4,64,500,500,781.25
+c7i.8xlarge,1.4280000000,61.0,32,16,1228.8,102400,false,8,64,1000,1000,1562.5
+c7i.xlarge,0.1785000000,7.5,4,2,153.6,12800,false,2,32,156.25,156.25,195.25
+m5.12xlarge,2.3040000000,192.0,48,24,1920,119210,false,8,96,1000,1000,1500
+m5.16xlarge,3.0720000000,256.0,64,32,2560,158946.6667,false,8,128,1000,1000,2500
+m5.24xlarge,4.6080000000,384.0,96,48,3840,238420,false,8,192,1000,1000,3125
+m5.2xlarge,0.3840000000,32.0,8,4,320,19868.33333,false,4,32,287.5,287.5,312.5
+m5.4xlarge,0.7680000000,64.0,16,8,640,39736.66667,false,4,64,500,500,625
+m5.8xlarge,1.5360000000,128.0,32,16,1280,79473.33333,false,8,64,850,850,1250
+m5.xlarge,0.1920000000,16.0,4,2,160,9934.166667,false,2,32,143.72,143.72,156.25
+m5a.12xlarge,2.0640000000,192.0,48,24,1689.6,119210,false,8,96,1000,1000,1500
+m5a.16xlarge,2.7520000000,256.0,64,32,2252.8,158946.6667,false,8,128,1000,1000,2500
+m5a.24xlarge,4.1280000000,384.0,96,48,3379.2,238420,false,8,192,1000,1000,3125
+m5a.2xlarge,0.3440000000,32.0,8,4,281.6,19868.33333,false,4,32,287.5,287.5,312.5
+m5a.4xlarge,0.6880000000,64.0,16,8,563.2,39736.66667,false,4,64,500,500,625
+m5a.8xlarge,1.3760000000,128.0,32,16,1126.4,79473.33333,false,8,64,850,850,1250
+m5a.xlarge,0.1720000000,16.0,4,2,140.8,9934.166667,false,2,32,143.72,143.72,156.25
+m5ad.12xlarge,2.4720000000,183.0,48,24,1689.6,119210,true,2,900,2734.375,1328.125,1500
+m5ad.16xlarge,3.2960000000,244.0,64,32,2252.8,158946.6667,true,4,600,3645.828125,1822.921875,2500
+m5ad.24xlarge,4.9440000000,366.0,96,48,3379.2,238420,true,4,900,5468.75,2656.25,3125
+m5ad.2xlarge,0.4120000000,30.5,8,4,281.6,19868.33333,true,1,300,457.03125,222.65625,312.5
+m5ad.4xlarge,0.8240000000,61.0,16,8,563.2,39736.66667,true,2,300,914.0625,445.3125,625
+m5ad.8xlarge,1.6480000000,122.0,32,16,1126.4,79473.33333,true,2,600,1822.914063,911.4609375,1250
+m5ad.xlarge,0.2060000000,15.25,4,2,140.8,9934.166667,true,1,150,230.46875,113.28125,156.25
+m5d.12xlarge,2.7120000000,192.0,48,24,1920,119210,true,2,900,2734.375,1328.125,1500
+m5d.16xlarge,3.6160000000,256.0,64,32,2560,158946.6667,true,4,600,3645.828125,1822.921875,2500
+m5d.24xlarge,5.4240000000,384.0,96,48,3840,238420,true,4,900,5468.75,2656.25,3125
+m5d.2xlarge,0.4520000000,32.0,8,4,320,19868.33333,true,1,300,457.03125,222.65625,312.5
+m5d.4xlarge,0.9040000000,64.0,16,8,640,39736.66667,true,2,300,914.0625,445.3125,625
+m5d.8xlarge,1.8080000000,128.0,32,16,1280,79473.33333,true,2,600,1822.914063,911.4609375,1250
+m5d.xlarge,0.2260000000,16.0,4,2,160,9934.166667,true,1,150,230.46875,113.28125,156.25
+m5dn.12xlarge,3.2640000000,192,48,24,1920,119210,true,2,900,2734.375,1328.125,6250
+m5dn.16xlarge,4.3520000000,256,64,32,2560,158946.6667,true,4,600,3645.828125,1822.921875,9375
+m5dn.24xlarge,6.5280000000,384,96,48,3840,238420,true,4,900,5468.75,2656.25,12500
+m5dn.2xlarge,0.5440000000,32,8,4,320,19868.33333,true,1,300,457.03125,222.65625,1015.625
+m5dn.4xlarge,1.0880000000,64,16,8,640,39736.66667,true,2,300,914.0625,445.3125,2031.25
+m5dn.8xlarge,2.1760000000,128,32,16,1280,79473.33333,true,2,600,1822.914063,911.4609375,3125
+m5dn.xlarge,0.2720000000,16,4,2,160,9934.166667,true,1,150,230.46875,113.28125,512.5
+m5n.12xlarge,2.8560000000,192,48,24,1920,119210,false,8,96,1000,1000,6250
+m5n.16xlarge,3.8080000000,256,64,32,2560,158946.6667,false,8,128,1000,1000,9375
+m5n.24xlarge,5.7120000000,384,96,48,3840,238420,false,8,192,1000,1000,12500
+m5n.2xlarge,0.4760000000,32,8,4,320,19868.33333,false,4,32,287.5,287.5,1015.625
+m5n.4xlarge,0.9520000000,64,16,8,640,39736.66667,false,4,64,500,500,2031.25
+m5n.8xlarge,1.9040000000,128,32,16,1280,79473.33333,false,8,64,850,850,3125
+m5n.xlarge,0.2380000000,16,4,2,160,9934.166667,false,2,32,143.72,143.72,512.5
+m5zn.12xlarge,3.9641000000,183.0,48,24,2918.4,262460,false,8,96,1000,1000,12500
+m5zn.2xlarge,0.6607000000,30.5,8,4,486.4,43743.33333,false,4,32,396.25,396.25,1250
+m5zn.3xlarge,0.9910000000,45.75,12,6,972.8,87486.66667,false,4,64,500,500,1875
+m5zn.6xlarge,1.9820000000,91.5,24,12,1459.2,131230,false,8,64,1000,1000,6250
+m5zn.xlarge,0.3303000000,15.0,4,2,243.2,21871.66667,false,2,32,195.5,195.5,625
+m6a.12xlarge,2.0736000000,183.0,48,24,2035.2,102400,false,8,96,1000,1000,2312.5
+m6a.16xlarge,2.7648000000,244.0,64,32,2713.6,136533.3333,false,8,128,1000,1000,3125
+m6a.24xlarge,4.1472000000,366.0,96,48,4070.4,204800,false,8,192,1000,1000,4687.5
+m6a.2xlarge,0.3456000000,30.5,8,4,339.2,17066.66667,false,4,32,500,312.5,390.625
+m6a.32xlarge,5.5296000000,488.0,128,64,5427.2,273066.6667,false,8,256,1000,1000,6250
+m6a.48xlarge,8.2944000000,732.0,192,96,8140.8,409600,false,12,256,1500,1500,6250
+m6a.4xlarge,0.6912000000,61.0,16,8,678.4,34133.33333,false,4,64,500,500,781.25
+m6a.8xlarge,1.3824000000,122.0,32,4,1356.8,68266.66667,false,8,64,1000,1000,1562.5
+m6a.xlarge,0.1728000000,15.25,4,2,169.6,8533.333333,false,2,32,250,156.25,195.25
+m6g.12xlarge,1.8480000000,185.0,48,48,1920,153600,false,8,96,1000,1000,1500
+m6g.16xlarge,2.4640000000,244.0,64,64,2560,204800,false,8,128,1000,1000,2500
+m6g.2xlarge,0.3080000000,30.5,8,8,320,25600,false,4,32,296.88,296.88,312.5
+m6g.4xlarge,0.6160000000,61.0,16,16,640,51200,false,4,64,500,500,625
+m6g.8xlarge,1.2320000000,122.0,32,32,1280,102400,false,8,64,1000,1000,1250
+m6g.xlarge,0.1540000000,15.25,4,4,160,12800,false,2,32,148.5,148.5,156.25
+m6gd.12xlarge,2.1696000000,185.0,48,48,1920,153600,true,2,1425,2641.92,1105.92,1500
+m6gd.16xlarge,2.8928000000,244.0,64,64,2560,204800,true,2,1900,3522.56,1474.56,2500
+m6gd.2xlarge,0.3616000000,30.5,8,8,320,25600,true,1,474,440.32,184.32,312.5
+m6gd.4xlarge,0.7232000000,61.0,16,16,640,51200,true,1,950,880.64,368.64,625
+m6gd.8xlarge,1.4464000000,122.0,32,32,1280,102400,true,1,1900,1761.28,737.28,1250
+m6gd.xlarge,0.1808000000,15.25,4,4,160,12800,true,1,237,220.16,92.16,156.25
+m6i.12xlarge,2.3040000000,185.0,48,24,2227.2,153600,false,8,96,1000,1000,2312.5
+m6i.16xlarge,3.0720000000,244.0,64,32,2969.6,204800,false,8,128,1000,1000,3125
+m6i.24xlarge,4.6080000000,366.0,96,48,4454.4,307200,false,8,192,1000,1000,4687.5
+m6i.2xlarge,0.3840000000,30.5,8,4,371.2,25600,false,4,32,312.5,312.5,390.625
+m6i.32xlarge,6.1440000000,488.0,128,64,5939.2,409600,false,8,256,1000,1000,6250
+m6i.4xlarge,0.7680000000,61.0,16,8,742.4,51200,false,4,64,500,500,781.25
+m6i.8xlarge,1.5360000000,122.0,32,16,1484.8,102400,false,8,64,1000,1000,1562.5
+m6i.xlarge,0.1920000000,15.25,4,2,185.6,12800,false,2,32,156.25,156.25,195.25
+m6id.12xlarge,2.8476000000,183.0,48,24,2227.2,153600,true,2,1425,3297.271808,1648.64,2312.5
+m6id.16xlarge,3.7968000000,244.0,64,32,2969.6,204800,true,2,1900,4396.367872,2198.192128,3125
+m6id.24xlarge,5.6952000000,366.0,96,48,4454.4,307200,true,4,1425,6594.543616,3297.28,4687.5
+m6id.2xlarge,0.4746000000,30.5,8,4,371.2,25600,true,1,474,549.548032,274.776064,390.625
+m6id.32xlarge,7.5936000000,488.0,128,64,5939.2,409600,true,4,1900,8792.735744,4396.384256,6250
+m6id.4xlarge,0.9492000000,61.0,16,8,742.4,51200,true,1,950,1099.091968,549.548032,781.25
+m6id.8xlarge,1.8984000000,122.0,32,16,1484.8,102400,true,1,1900,2198.183936,1099.096064,1562.5
+m6id.xlarge,0.2373000000,15.25,4,2,185.6,12800,true,1,237,274.771968,137.388032,195.25
+m6idn.12xlarge,3.8188800000,183.0,48,24,2227.2,153600,true,2,1425,3297.271808,1648.64,9375
+m6idn.16xlarge,5.0918400000,244.0,64,32,2969.6,204800,true,2,1900,4396.367872,2198.192128,12500
+m6idn.24xlarge,7.6377600000,366.0,96,48,4454.4,307200,true,4,1425,6594.543616,3297.28,18750
+m6idn.2xlarge,0.6364800000,30.5,8,4,371.2,25600,true,1,474,549.548032,274.776064,1562.5
+m6idn.32xlarge,10.1836800000,488.0,128,64,5939.2,409600,true,4,1900,8792.735744,4396.384256,25000
+m6idn.4xlarge,1.2729600000,61.0,16,8,742.4,51200,true,1,950,1099.091968,549.548032,3125
+m6idn.8xlarge,2.5459200000,122.0,32,16,1484.8,102400,true,1,1900,2198.183936,1099.096064,6250
+m6idn.xlarge,0.3182400000,15.25,4,2,185.6,12800,true,1,237,274.771968,137.388032,781.25
+m6in.12xlarge,3.3415200000,185.0,48,24,2227.2,153600,false,8,96,1000,1000,9375
+m6in.16xlarge,4.4553600000,244.0,64,32,2969.6,204800,false,8,128,1000,1000,12500
+m6in.24xlarge,6.6830400000,366.0,96,48,4454.4,307200,false,8,192,1000,1000,18750
+m6in.2xlarge,0.5569200000,30.5,8,4,371.2,25600,false,4,32,312.5,312.5,1562.5
+m6in.32xlarge,8.9107200000,488.0,128,64,5939.2,409600,false,8,256,1000,1000,25000
+m6in.4xlarge,1.1138400000,61.0,16,8,742.4,51200,false,4,64,500,500,3125
+m6in.8xlarge,2.2276800000,122.0,32,16,1484.8,102400,false,8,64,1000,1000,6250
+m6in.xlarge,0.2784600000,15.25,4,2,185.6,12800,false,2,32,156.25,156.25,781.25
+m7a.12xlarge,2.7820800000,183.0,48,24,1996.8,115200,false,8,96,1000,1000,2312.5
+m7a.16xlarge,3.7094400000,244.0,64,32,2662.4,153600,false,8,128,1000,1000,3125
+m7a.24xlarge,5.5641600000,366.0,96,48,3993.6,230400,false,8,192,1000,1000,4687.5
+m7a.2xlarge,0.4636800000,30.5,8,4,332.8,19200,false,4,32,312.5,312.5,390.625
+m7a.32xlarge,7.4188800000,488.0,128,64,5324.8,307200,false,8,256,1000,1000,6250
+m7a.48xlarge,11.1283200000,732.0,192,96,7987.2,460800,false,12,256,1500,1500,6250
+m7a.4xlarge,0.9273600000,61.0,16,8,665.6,38400,false,4,64,500,500,781.25
+m7a.8xlarge,1.8547200000,122.0,32,4,1331.2,76800,false,8,64,1000,1000,1562.5
+m7a.xlarge,0.2318400000,15.25,4,2,166.4,9600,false,2,32,156.25,156.25,195.25
+m7g.12xlarge,1.9584000000,183.0,48,48,1996.8,230400,false,8,96,1000,1000,2812.5
+m7g.16xlarge,2.6112000000,244.0,64,64,2662.4,307200,false,8,128,1000,1000,3750
+m7g.2xlarge,0.3264000000,30.5,8,8,332.8,38400,false,4,32,312.5,312.5,468.75
+m7g.4xlarge,0.6528000000,61.0,16,16,665.6,76800,false,4,64,500,500,937.5
+m7g.8xlarge,1.3056000000,122.0,32,32,1331.2,153600,false,8,64,1000,1000,1875
+m7g.xlarge,0.1632000000,15.25,4,4,166.4,19200,false,2,32,156.25,156.25,234.5
+m7gd.12xlarge,2.5628000000,183.0,48,48,1996.8,230400,true,2,1425,3297.271808,1648.64,2812.5
+m7gd.16xlarge,3.4171000000,244.0,64,64,2662.4,307200,true,2,1900,4396.367872,2198.192128,3750
+m7gd.2xlarge,0.4271000000,30.5,8,8,332.8,38400,true,1,474,549.548032,274.776064,468.75
+m7gd.4xlarge,0.8543000000,61.0,16,16,665.6,76800,true,1,950,1099.091968,549.548032,937.5
+m7gd.8xlarge,1.7086000000,122.0,32,32,1331.2,153600,true,1,1900,2198.183936,1099.096064,1875
+m7gd.xlarge,0.2136000000,15.25,4,4,166.4,19200,true,1,237,274.771968,137.388032,234.5
+m7i.12xlarge,2.4192000000,185.0,48,24,1996.8,76800,false,8,96,1000,1000,2312.5
+m7i.16xlarge,3.2256000000,244.0,64,32,2662.4,102400,false,8,128,1000,1000,3125
+m7i.24xlarge,4.8384000000,366.0,96,48,3993.6,153600,false,8,192,1000,1000,4687.5
+m7i.2xlarge,0.4032000000,30.5,8,4,332.8,12800,false,4,32,312.5,312.5,390.625
+m7i.48xlarge,9.6768000000,732.0,192,96,7987.2,307200,false,12,256,1500,1500,6250
+m7i.4xlarge,0.8064000000,61.0,16,8,665.6,25600,false,4,64,500,500,781.25
+m7i.8xlarge,1.6128000000,122.0,32,16,1331.2,51200,false,8,64,1000,1000,1562.5
+m7i.xlarge,0.2016000000,15.25,4,2,166.4,6400,false,2,32,156.25,156.25,195.25
+r5.12xlarge,3.0240000000,384.0,48,24,1920,119210,false,8,96,1000,1000,1500
+r5.16xlarge,4.0320000000,512.0,64,32,2560,158946.6667,false,8,128,1000,1000,2500
+r5.24xlarge,6.0480000000,768.0,96,48,3840,238420,false,8,192,1000,1000,3125
+r5.2xlarge,0.5040000000,64.0,8,4,320,19868.33333,false,4,32,287.5,287.5,312.5
+r5.4xlarge,1.0080000000,128.0,16,8,640,39736.66667,false,4,64,500,500,625
+r5.8xlarge,2.0160000000,256.0,32,16,1280,79473.33333,false,8,64,850,850,1250
+r5.xlarge,0.2520000000,32.0,4,2,160,9934.166667,false,2,32,143.72,143.72,156.25
+r5a.12xlarge,2.7120000000,384.0,48,24,1689.6,119210,false,8,96,1000,1000,1500
+r5a.16xlarge,3.6160000000,512.0,64,32,2252.8,158946.6667,false,8,128,1000,1000,2500
+r5a.24xlarge,5.4240000000,768.0,96,48,3379.2,238420,false,8,192,1000,1000,3125
+r5a.2xlarge,0.4520000000,64.0,8,4,281.6,19868.33333,false,4,32,287.5,287.5,312.5
+r5a.4xlarge,0.9040000000,128.0,16,8,563.2,39736.66667,false,4,64,500,500,625
+r5a.8xlarge,1.8080000000,256.0,32,16,1126.4,79473.33333,false,8,64,850,850,1250
+r5a.xlarge,0.2260000000,32.0,4,2,140.8,9934.166667,false,2,32,143.72,143.72,156.25
+r5ad.12xlarge,3.1440000000,366.0,48,24,1689.6,119210,true,2,900,2734.375,1328.125,1500
+r5ad.16xlarge,4.1920000000,496.0,64,32,2252.8,158946.6667,true,4,600,3645.828125,1822.921875,2500
+r5ad.24xlarge,6.2880000000,732.0,96,48,3379.2,238420,true,4,900,5468.75,2656.25,3125
+r5ad.2xlarge,0.5240000000,61.0,8,4,281.6,19868.33333,true,1,300,457.03125,222.65625,312.5
+r5ad.4xlarge,1.0480000000,122.0,16,8,563.2,39736.66667,true,2,300,914.0625,445.3125,625
+r5ad.8xlarge,2.0960000000,244.0,32,16,1126.4,79473.33333,true,2,600,1822.914063,911.4609375,1250
+r5ad.xlarge,0.2620000000,30.5,4,2,140.8,9934.166667,true,1,150,230.46875,113.28125,156.25
+r5d.12xlarge,3.4560000000,384.0,48,24,1920,119210,true,2,900,2734.375,1328.125,1500
+r5d.16xlarge,4.6080000000,512.0,64,32,2560,158946.6667,true,4,600,3645.828125,1822.921875,2500
+r5d.24xlarge,6.9120000000,768.0,96,48,3840,238420,true,4,900,5468.75,2656.25,3125
+r5d.2xlarge,0.5760000000,64.0,8,4,320,19868.33333,true,1,300,457.03125,222.65625,312.5
+r5d.4xlarge,1.1520000000,128.0,16,8,640,39736.66667,true,2,300,914.0625,445.3125,625
+r5d.8xlarge,2.3040000000,256.0,32,16,1280,79473.33333,true,2,600,1822.914063,911.4609375,1250
+r5d.xlarge,0.2880000000,32.0,4,2,160,9934.166667,true,1,150,230.46875,113.28125,156.25
+r5dn.12xlarge,4.0080000000,384,48,24,1920,119210,true,2,900,2734.375,1328.125,6250
+r5dn.16xlarge,5.3440000000,512,64,32,2560,158946.6667,true,4,600,3645.828125,1822.921875,9375
+r5dn.24xlarge,8.0160000000,768,96,48,3840,238420,true,4,900,5468.75,2656.25,12500
+r5dn.2xlarge,0.6680000000,64,8,4,320,19868.33333,true,1,300,457.03125,222.65625,1015.625
+r5dn.4xlarge,1.3360000000,128,16,8,640,39736.66667,true,2,300,914.0625,445.3125,2031.25
+r5dn.8xlarge,2.6720000000,256,32,16,1280,79473.33333,true,2,600,1822.914063,911.4609375,3125
+r5dn.xlarge,0.3340000000,32,4,2,160,9934.166667,true,1,150,230.46875,113.28125,512.5
+r5n.12xlarge,3.5760000000,384,48,24,1920,119210,false,8,96,1000,1000,6250
+r5n.16xlarge,4.7680000000,512,64,32,2560,158946.6667,false,8,128,1000,1000,9375
+r5n.24xlarge,7.1520000000,768,96,48,3840,238420,false,8,192,1000,1000,12500
+r5n.2xlarge,0.5960000000,64,8,4,320,19868.33333,false,4,32,287.5,287.5,1015.625
+r5n.4xlarge,1.1920000000,128,16,8,640,39736.66667,false,4,64,500,500,2031.25
+r5n.8xlarge,2.3840000000,256,32,16,1280,79473.33333,false,8,64,850,850,3125
+r5n.xlarge,0.2980000000,32,4,2,160,9934.166667,false,2,32,143.72,143.72,512.5
+r6a.12xlarge,2.7216000000,366.0,48,24,2035.2,102400,false,8,96,1000,1000,2312.5
+r6a.16xlarge,3.6288000000,488.0,64,32,2713.6,136533.3333,false,8,128,1000,1000,3125
+r6a.24xlarge,5.4432000000,732.0,96,48,4070.4,204800,false,8,192,1000,1000,4687.5
+r6a.2xlarge,0.4536000000,61.0,8,4,339.2,17066.66667,false,4,32,500,312.5,390.625
+r6a.32xlarge,7.2576000000,976.0,128,64,5427.2,273066.6667,false,8,256,1000,1000,6250
+r6a.48xlarge,10.8864000000,1464.0,192,96,8140.8,409600,false,12,256,1500,1500,6250
+r6a.4xlarge,0.9072000000,122.0,16,8,678.4,34133.33333,false,4,64,500,500,781.25
+r6a.8xlarge,1.8144000000,244.0,32,16,1356.8,68266.66667,false,8,64,1000,1000,1562.5
+r6a.xlarge,0.2268000000,30.5,4,2,169.6,8533.333333,false,2,32,250,156.25,195.25
+r6g.12xlarge,2.4192000000,366.0,48,48,1920,153600,false,8,96,1000,1000,1500
+r6g.16xlarge,3.2256000000,488.0,64,64,2560,204800,false,8,128,1000,1000,2500
+r6g.2xlarge,0.4032000000,61.0,8,8,320,25600,false,4,32,296.88,296.88,312.5
+r6g.4xlarge,0.8064000000,122.0,16,16,640,51200,false,4,64,500,500,625
+r6g.8xlarge,1.6128000000,244.0,32,32,1280,102400,false,8,64,1000,1000,1250
+r6g.xlarge,0.2016000000,30.5,4,4,160,12800,false,2,32,148.5,148.5,156.25
+r6gd.12xlarge,2.7648000000,366.0,48,48,1920,153600,true,2,1425,2641.92,1105.92,1500
+r6gd.16xlarge,3.6864000000,488.0,64,64,2560,204800,true,2,1900,3522.56,1474.56,2500
+r6gd.2xlarge,0.4608000000,61.0,8,8,320,25600,true,1,474,440.32,184.32,312.5
+r6gd.4xlarge,0.9216000000,122.0,16,16,640,51200,true,1,950,880.64,368.64,625
+r6gd.8xlarge,1.8432000000,244.0,32,32,1280,102400,true,1,1900,1761.28,737.28,1250
+r6gd.xlarge,0.2304000000,30.5,4,4,160,12800,true,1,237,220.16,92.16,156.25
+r6i.12xlarge,3.0240000000,366.0,48,24,2227.2,153600,false,8,96,1000,1000,2312.5
+r6i.16xlarge,4.0320000000,488.0,64,32,2969.6,204800,false,8,128,1000,1000,3125
+r6i.24xlarge,6.0480000000,732.0,96,48,4454.4,307200,false,8,192,1000,1000,4687.5
+r6i.2xlarge,0.5040000000,61.0,8,4,371.2,25600,false,4,32,312.5,312.5,390.625
+r6i.32xlarge,8.0640000000,950.0,128,64,5939.2,409600,false,8,256,1000,1000,6250
+r6i.4xlarge,1.0080000000,122.0,16,8,742.4,51200,false,4,64,500,500,781.25
+r6i.8xlarge,2.0160000000,244.0,32,16,1484.8,102400,false,8,64,1000,1000,1562.5
+r6i.xlarge,0.2520000000,30.5,4,2,185.6,12800,false,2,32,156.25,156.25,195.25
+r6id.12xlarge,3.6288000000,366.0,48,24,2227.2,153600,true,2,1425,3297.271808,1648.64,2312.5
+r6id.16xlarge,4.8384000000,488.0,64,32,2969.6,204800,true,2,1900,4396.367872,2198.192128,3125
+r6id.24xlarge,7.2576000000,732.0,96,48,4454.4,307200,true,4,1425,6594.543616,3297.28,4687.5
+r6id.2xlarge,0.6048000000,61.0,8,4,371.2,25600,true,1,474,549.548032,274.776064,390.625
+r6id.32xlarge,9.6768000000,976.0,128,64,5939.2,409600,true,4,1900,8792.735744,4396.384256,6250
+r6id.4xlarge,1.2096000000,122.0,16,8,742.4,51200,true,1,950,1099.091968,549.548032,781.25
+r6id.8xlarge,2.4192000000,244.0,32,16,1484.8,102400,true,1,1900,2198.183936,1099.096064,1562.5
+r6id.xlarge,0.3024000000,30.5,4,2,185.6,12800,true,1,237,274.771968,137.388032,195.25
+r6idn.12xlarge,4.6893600000,366.0,48,24,2227.2,153600,true,2,1425,3297.271808,1648.64,9375
+r6idn.16xlarge,6.2524800000,488.0,64,32,2969.6,204800,true,2,1900,4396.367872,2198.192128,12500
+r6idn.24xlarge,9.3787200000,732.0,96,48,4454.4,307200,true,4,1425,6594.543616,3297.28,18750
+r6idn.2xlarge,0.7815600000,61.0,8,4,371.2,25600,true,1,474,549.548032,274.776064,1562.5
+r6idn.32xlarge,12.5049600000,976.0,128,64,5939.2,409600,true,4,1900,8792.735744,4396.384256,25000
+r6idn.4xlarge,1.5631200000,122.0,16,8,742.4,51200,true,1,950,1099.091968,549.548032,3125
+r6idn.8xlarge,3.1262400000,244.0,32,16,1484.8,102400,true,1,1900,2198.183936,1099.096064,6250
+r6idn.xlarge,0.3907800000,30.5,4,2,185.6,12800,true,1,237,274.771968,137.388032,781.25
+r6in.12xlarge,4.1839200000,366.0,48,24,2227.2,153600,false,8,96,1000,1000,9375
+r6in.16xlarge,5.5785600000,488.0,64,32,2969.6,204800,false,8,128,1000,1000,12500
+r6in.24xlarge,8.3678400000,732.0,96,48,4454.4,307200,false,8,192,1000,1000,18750
+r6in.2xlarge,0.6973200000,61.0,8,4,371.2,25600,false,4,32,312.5,312.5,1562.5
+r6in.32xlarge,11.1571200000,976.0,128,64,5939.2,409600,false,8,256,1000,1000,25000
+r6in.4xlarge,1.3946400000,122.0,16,8,742.4,51200,false,4,64,500,500,3125
+r6in.8xlarge,2.7892800000,244.0,32,16,1484.8,102400,false,8,64,1000,1000,6250
+r6in.xlarge,0.3486600000,30.5,4,2,185.6,12800,false,2,32,156.25,156.25,781.25
+r7a.12xlarge,3.6516000000,366.0,48,24,1996.8,115200,false,8,96,1000,1000,2312.5
+r7a.16xlarge,4.8688000000,488.0,64,32,2662.4,153600,false,8,128,1000,1000,3125
+r7a.24xlarge,7.3032000000,732.0,96,48,3993.6,230400,false,8,192,1000,1000,4687.5
+r7a.2xlarge,0.6086000000,61.0,8,4,332.8,19200,false,4,32,312.5,312.5,390.625
+r7a.32xlarge,9.7376000000,976.0,128,64,5324.8,307200,false,8,256,1000,1000,6250
+r7a.48xlarge,14.6064000000,1464.0,192,96,7987.2,460800,false,12,256,1500,1500,6250
+r7a.4xlarge,1.2172000000,122.0,16,8,665.6,38400,false,4,64,500,500,781.25
+r7a.8xlarge,2.4344000000,244.0,32,16,1331.2,76800,false,8,64,1000,1000,1562.5
+r7a.xlarge,0.3043000000,30.5,4,2,166.4,9600,false,2,32,156.25,156.25,195.25
+r7g.12xlarge,2.5704000000,366.0,48,48,1996.8,230400,false,8,96,1000,1000,2812.5
+r7g.16xlarge,3.4272000000,488.0,64,64,2662.4,307200,false,8,128,1000,1000,3750
+r7g.2xlarge,0.4284000000,61.0,8,8,332.8,38400,false,4,32,312.5,312.5,468.75
+r7g.4xlarge,0.8568000000,122.0,16,16,665.6,76800,false,4,64,500,500,937.5
+r7g.8xlarge,1.7136000000,244.0,32,32,1331.2,153600,false,8,64,1000,1000,1875
+r7g.xlarge,0.2142000000,30.5,4,4,166.4,19200,false,2,32,156.25,156.25,234.5
+r7gd.12xlarge,3.2659000000,366.0,48,48,1996.8,230400,true,2,1425,3297.271808,1648.64,2812.5
+r7gd.16xlarge,4.3546000000,488.0,64,64,2662.4,307200,true,2,1900,4396.367872,2198.192128,3750
+r7gd.2xlarge,0.5443000000,61.0,8,8,332.8,38400,true,1,474,549.548032,274.776064,468.75
+r7gd.4xlarge,1.0886000000,122.0,16,16,665.6,76800,true,1,950,1099.091968,549.548032,937.5
+r7gd.8xlarge,2.1773000000,244.0,32,32,1331.2,153600,true,1,1900,2198.183936,1099.096064,1875
+r7gd.xlarge,0.2722000000,30.5,4,4,166.4,19200,true,1,237,274.771968,137.388032,234.5
+r7i.12xlarge,3.1752000000,366.0,48,24,1843.2,153600,false,8,96,1000,1000,2312.5
+r7i.16xlarge,4.2336000000,488.0,64,32,2457.6,204800,false,8,128,1000,1000,3125
+r7i.24xlarge,6.3504000000,732.0,96,48,3686.4,307200,false,8,192,1000,1000,4687.5
+r7i.2xlarge,0.5292000000,61.0,8,4,307.2,25600,false,4,32,312.5,312.5,390.625
+r7i.48xlarge,12.7008000000,1464.0,192,96,7372.8,614400,false,12,256,1500,1500,6250
+r7i.4xlarge,1.0584000000,122.0,16,8,614.4,51200,false,4,64,500,500,781.25
+r7i.8xlarge,2.1168000000,244.0,32,16,1228.8,102400,false,8,64,1000,1000,1562.5
+r7i.xlarge,0.2646000000,30.5,4,2,153.6,12800,false,2,32,156.25,156.25,195.25
diff --git a/scripts/resource/launch/cluster.env b/scripts/resource/launch/cluster.env
new file mode 100644
index 00000000000..4a48f880347
--- /dev/null
+++ b/scripts/resource/launch/cluster.env
@@ -0,0 +1,77 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+# Configurations for EMR launch
+
+# User-defined configurations --------------------------------
+
+# Program specific --------------------------------
+
+# URI address for the SystemDS jar file on S3
+SYSTEMDS_JAR_URI=
+# DML script path (use s3a:// URI schema for remote scripts in S3)
+SYSTEMDS_PROGRAM=s3://systemds-testing/dml_scripts/Algorithm_L2SVM.dml
+# Set the file path arguments with URI addresses adapted
+# to the actual file locations, always using the s3a:// schema
+# comma separated values
+SYSTEMDS_ARGS=
+# comma separated key=value pairs
+SYSTEMDS_NVARGS=m=200000,n=10000
+#Y=s3://systemds-testing/data/Y.csv,B=s3a://systemds-testing/data/B.csv
+
+# AWS specific -------------------------
+
+# Inspect the version differences before changing to a version different from 7.3.0
+EMR_VERSION="emr-7.3.0"
+# output file of the resource optimization: hardware configurations
+INSTANCE_CONFIGS=
+# output file of the resource optimization: Spark configurations
+SPARK_CONFIGS=
+# existing SSH key (not created automatically)
+KEYPAIR_NAME=
+# Choose the same region as at executing resource optimizer
+REGION=us-east-1
+# Optionally provide a (single) security group id to be attached as an additional group to the master node
+# If the value is empty, the option won't be used, AWS won't attach an additional group, and SSH access may be blocked
+# Multiple additional groups are not supported by the launch script; this one is attached to the master node only
+SECURITY_GROUP_ID=
+# Provide already created names
+# or desired names for generation with 'generate_instance_profile.sh'
+INSTANCE_PROFILE_NAME=
+IAM_ROLE_NAME=
+# Desired subnet to be used by the cluster, if not defined a default one will be used
+TARGET_SUBNET=
+# S3 folder URI for landing of log files
+LOG_URI=
+
+# Execution specific -------------------------
+
+# (number) - if 0 the cluster will be terminated automatically after program execution
+# - if greater than 0 the cluster will be terminated automatically after the given number of seconds in idle state
+# - if less than 0 no automatic termination rules will be applied
+AUTO_TERMINATION_TIME=-1
+
+# Automatic configurations (read only for users) -------------
+
+# Current EMR Cluster ID
+CLUSTER_ID=
+# Public DNS name of the master node in the current cluster
+CLUSTER_URL=
\ No newline at end of file
diff --git a/scripts/resource/launch/cluster_launch.sh b/scripts/resource/launch/cluster_launch.sh
new file mode 100755
index 00000000000..7ffe0329125
--- /dev/null
+++ b/scripts/resource/launch/cluster_launch.sh
@@ -0,0 +1,86 @@
+#!/usr/bin/env bash
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+# exit in case of error or unbound var
+set -euo pipefail
+
+# get file directory to allow finding the file with the utils
+SCRIPT_DIR="$(dirname "$(realpath "$0")")"
+
+source cluster.env
+source "$SCRIPT_DIR/cluster_utils.sh"
+
+if [ -n "$TARGET_SUBNET" ]; then
+ SUBNET=$TARGET_SUBNET
+else
+ #Get the first available subnet in the default VPC of the configured region
+ SUBNET=$(aws ec2 describe-subnets --region $REGION \
+ --filter "Name=defaultForAz,Values=true" --query "Subnets[0].SubnetId" --output text)
+fi
+
+# generate the step definition into STEP variable
+generate_step_definition
+
+echo -e "\nLaunching EMR cluster via AWS CLI and adding a step to run $SYSTEMDS_PROGRAM with SystemDS"
+CLUSTER_INFO=$(aws emr create-cluster \
+ --applications Name=AmazonCloudWatchAgent Name=Spark \
+ --ec2-attributes '{
+ "KeyName":"'${KEYPAIR_NAME}'",
+ "InstanceProfile":"EMR_EC2_DefaultRole",
+ '"$( [ -n "$SECURITY_GROUP_ID" ] && echo '"AdditionalMasterSecurityGroups": ["'${SECURITY_GROUP_ID}'"],' )"'
+ "SubnetId": "'${SUBNET}'"
+ }'\
+ --service-role EMR_DefaultRole \
+ --enable-debugging \
+ --release-label $EMR_VERSION \
+ --log-uri $LOG_URI \
+ --name "SystemDS cluster" \
+ --instance-groups file://$INSTANCE_CONFIGS \
+ --configurations file://$SPARK_CONFIGS \
+ --scale-down-behavior TERMINATE_AT_TASK_COMPLETION \
+ --no-termination-protected \
+ $( [ -n "$STEP" ] && echo "--steps $STEP" ) \
+ $( [ "$AUTO_TERMINATION_TIME" = 0 ] && echo "--auto-terminate" ) \
+ $( [ "$AUTO_TERMINATION_TIME" -gt 0 ] && echo "--auto-termination-policy IdleTimeout=$AUTO_TERMINATION_TIME" ) \
+ --region $REGION)
+
+CLUSTER_ID=$(echo $CLUSTER_INFO | jq .ClusterId | tr -d '"')
+echo "Cluster successfully initialized with cluster ID: "${CLUSTER_ID}
+set_config "CLUSTER_ID" $CLUSTER_ID
+
+# Wait for cluster to start
+echo -e "\nWaiting for cluster to enter running state..."
+aws emr wait cluster-running --cluster-id $CLUSTER_ID --region $REGION
+
+CLUSTER_URL=$(aws emr describe-cluster --cluster-id $CLUSTER_ID --region $REGION | jq .Cluster.MasterPublicDnsName | tr -d '"')
+set_config "CLUSTER_URL" "$CLUSTER_URL"
+
+echo "...launching process has finished and the cluster is not in state running."
+
+if [ "$AUTO_TERMINATION_TIME" = 0 ]; then
+ echo -e "\nImmediate automatic termination was enabled so the cluster will terminate directly after the step completion"
+elif [ "$AUTO_TERMINATION_TIME" -gt 0 ]; then
+ echo -e "\nDelayed automatic termination was enabled so the cluster will terminate $AUTO_TERMINATION_TIME
+ seconds after entering idle state"
+else
+ echo -e "\nAutomatic termination was not enabled so you should manually terminate the cluster"
+fi
\ No newline at end of file
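Assuming `cluster.env` has been filled in and the AWS CLI and `jq` are installed and configured, launching boils down to the following (paths illustrative):

```shell
cd scripts/resource/launch
./cluster_launch.sh   # writes CLUSTER_ID and CLUSTER_URL back into cluster.env
```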
diff --git a/scripts/resource/launch/cluster_run_script.sh b/scripts/resource/launch/cluster_run_script.sh
new file mode 100755
index 00000000000..3434dadbd2e
--- /dev/null
+++ b/scripts/resource/launch/cluster_run_script.sh
@@ -0,0 +1,55 @@
+#!/usr/bin/env bash
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+# exit in case of error or unbound var
+set -euo pipefail
+
+# get file directory to allow finding the file with the utils
+SCRIPT_DIR="$(dirname "$(realpath "$0")")"
+
+source cluster.env
+source "$SCRIPT_DIR/cluster_utils.sh"
+
+# generate the step definition into STEP variable
+generate_step_definition
+if [ -z "$STEP" ]; then
+ echo "Error: Empty step definition, probably due to empty SYSTEMDS_PROGRAM option."
+ exit 1
+fi
+
+echo "Adding a step to run $SYSTEMDS_PROGRAM with SystemDS"
+STEP_INFO=$(aws emr add-steps --cluster-id $CLUSTER_ID --region $REGION --steps $STEP)
+
+if [ "$AUTO_TERMINATION_TIME" = 0 ]; then
+ STEP_ID=$(echo $STEP_INFO | jq .StepIds | tr -d '"' | tr -d ']' | tr -d '[' | tr -d '[:space:]' )
+ echo "Waiting for the step to finish before termination (immediate automatic termination enabled)"
+ aws emr wait step-complete --cluster-id $CLUSTER_ID --step-id $STEP_ID --region $REGION
+ echo "The step has finished and now the cluster will before immediately terminated"
+ aws emr terminate-clusters --cluster-ids $CLUSTER_ID
+elif [ "$AUTO_TERMINATION_TIME" -gt 0 ]; then
+ echo "Delayed automatic termination will apply only in case this option was set on cluster launch."
+ echo "You should manually track the step completion"
+else
+ echo "Automatic termination was not enabled so you should manually track the step completion and terminate the cluster"
+fi
+
+
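Once the cluster is running (i.e. `CLUSTER_ID` has been written to `cluster.env`), further programs can be submitted as additional EMR steps:

```shell
cd scripts/resource/launch
./cluster_run_script.sh   # reads SYSTEMDS_PROGRAM and CLUSTER_ID from cluster.env
```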
diff --git a/scripts/resource/launch/cluster_utils.sh b/scripts/resource/launch/cluster_utils.sh
new file mode 100644
index 00000000000..ea476e1dbef
--- /dev/null
+++ b/scripts/resource/launch/cluster_utils.sh
@@ -0,0 +1,58 @@
+#!/usr/bin/env bash
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+# $1 key, $2 value
+function set_config(){
+ sed -i "" "s/\($1 *= *\).*/\1$2/" cluster.env
+}
+
+function generate_step_definition() {
+ # return an empty step definition in case no program is given
+ if [ -z "$SYSTEMDS_PROGRAM" ]; then
+ STEP=""
+ return 0
+ fi
+ # define ActionOnFailure
+ if [ "$AUTO_TERMINATION_TIME" = 0 ]; then
+ echo "The cluster will be terminated bin case of failure cat step execution
+ (immediate automatic termination enabled)"
+ ACTION_ON_FAILURE="TERMINATE_CLUSTER"
+ else
+ ACTION_ON_FAILURE="CANCEL_AND_WAIT"
+ fi
+ STEP=$(cat < output.log.gz &&
+ aws s3 cp output.log.gz $LOG_URI/output_$INSTANCE_ID.log.gz --content-type \"text/plain\" --content-encoding \"gzip\" &&
+ gzip -c error.log > error.log.gz &&
+ aws s3 cp error.log.gz $LOG_URI/error_$INSTANCE_ID.log.gz --content-type \"text/plain\" --content-encoding \"gzip\" &&
+ { if [ \"$AUTO_TERMINATION\" = true ]; then sudo shutdown now; fi; }' >> output.log 2>> error.log &"
+
+echo "... the program has been launched"
+
+if [ "$AUTO_TERMINATION" != true ]; then
+ echo -e "\nYou need to check for its completion and stop/terminate the instance manually (automatic termination disabled)"
+ exit 0
+fi
+
+echo -e "\nWaiting for the instance being stopped upon program completion (automatic termination enabled)..."
+set +e # avoid exiting on error to achieve waiting in a loop
+while true; do
+ # aws ec2 wait polls every 15 seconds, up to 40 times, before timing out
+ aws ec2 wait instance-stopped --instance-ids "$INSTANCE_ID" --region "$REGION"
+ # get the status to decide for the next loop
+ status=$?
+ if [ "$status" -eq 0 ]; then
+ # the instance was indeed stopped
+ break
+ elif [ "$status" -eq 255 ]; then
+ echo "Restart the waiting mechanism..."
+ sleep 15 # wait another window before retrying
+ else
+ echo "Unknown error '$status' occurred while waiting for the instance to stop"
+ exit 1
+ fi
+done
+set -e # restore the state
+echo "...the DML finished, the logs where written to $LOG_URI and the EC2 instance was stopped"
+
+echo "The instance will be terminated directly now..."
+aws ec2 terminate-instances --instance-ids "$INSTANCE_ID" --region "$REGION" >/dev/null
+
+echo "... termination was successful!"
\ No newline at end of file
diff --git a/scripts/resource/launch/single_node_utils.sh b/scripts/resource/launch/single_node_utils.sh
new file mode 100644
index 00000000000..0ad7db359af
--- /dev/null
+++ b/scripts/resource/launch/single_node_utils.sh
@@ -0,0 +1,137 @@
+#!/usr/bin/env bash
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+# $1 key, $2 value
+function set_config(){
+ sed -i "" "s/\($1 *= *\).*/\1$2/" single_node.env
+}
+
+# expects $INSTANCE_TYPE loaded
+function get_image_details() {
+ echo "Getting a suitable image for the target EC2 instance: $INSTANCE_TYPE ..."
+ if [[ ${INSTANCE_TYPE:2:1} == "g" ]]; then
+ ARCHITECTURE="arm64"
+ else
+ ARCHITECTURE="x86_64"
+ fi
+ # get the latest Amazon Linux 2023 minimal image for the target CPU architecture
+ IMAGE_DETAILS=$(aws ec2 describe-images \
+ --owners 137112412989 \
+ --region "$REGION" \
+ --filters "Name=name,Values=al2023-ami-minimal-2023.*.2024*-$ARCHITECTURE" \
+ --query "Images | sort_by(@, &CreationDate) | [-1].[ImageId,RootDeviceName]" \
+ )
+ UBUNTU_IMAGE_ID=$(echo "$IMAGE_DETAILS" | jq -r '.[0]')
+ ROOT_DEVICE=$(echo "$IMAGE_DETAILS" | jq -r '.[1]')
+ echo "... using image with id '$UBUNTU_IMAGE_ID' for $ARCHITECTURE architecture"
+ echo ""
+}
+
+# expects $ROOT_VOLUME_SIZE and $ROOT_VOLUME_TYPE loaded
+function generate_ebs_configs() {
+
+ EBS_CONFIGS="{VolumeSize=$ROOT_VOLUME_SIZE,VolumeType=$ROOT_VOLUME_TYPE,DeleteOnTermination=true}"
+ echo "Using the following EBS_CONFIGS configurations:"
+ echo $EBS_CONFIGS
+ echo
+}
+
+# expects $CONFIGURATIONS loaded
+function generate_jvm_configs() {
+ JVM_MAX_MEM=$(jq -r '.JvmMaxMemory' $CONFIGURATIONS)
+ JVM_START_MEM=$(echo "$JVM_MAX_MEM * 0.7" | bc | awk '{print int($1)}')
+ JVM_YOUNG_GEN_MEM=$(echo "$JVM_MAX_MEM * 0.1" | bc | awk '{print int($1)}')
+ echo "The target instance $INSTANCE_TYPE will be setup to use ${JVM_MAX_MEM}MB at executing SystemDS programs"
+}
+
+# create EC2 instance profile with corresponding IAM role for S3 access
+function generate_instance_profile() {
+ if aws iam get-role --role-name "$IAM_ROLE_NAME" >/dev/null 2>&1; then
+ echo "Role $IAM_ROLE_NAME already exists."
+ else
+ echo "Role $IAM_ROLE_NAME does not exist. Creating role..."
+
+ # temp trust policy for EC2 to allow assuming the role
+ cat > trust-policy.json </dev/null
+
+ # 2. attach the relevant policies to the role
+ aws iam attach-role-policy --role-name "$IAM_ROLE_NAME" --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
+ if [ -n "$CLOUDWATCH_CONFIGS" ]; then
+ aws iam attach-role-policy --role-name "$IAM_ROLE_NAME" \
+ --policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
+ fi
+
+ echo "Role $IAM_ROLE_NAME has been created and AmazonS3FullAccess policy attached."
+
+ # delete the temp trust policy
+ rm trust-policy.json
+ fi
+
+ # create a corresponding IAM instance profile if not created in previous runs
+ if aws iam get-instance-profile --instance-profile-name "$INSTANCE_PROFILE_NAME" >/dev/null; then
+ echo "Instance profile $INSTANCE_PROFILE_NAME already exists."
+ else
+ echo "Instance profile $INSTANCE_PROFILE_NAME does not exist. Creating..."
+ # 1. create the instance profile
+ aws iam create-instance-profile --instance-profile-name "$INSTANCE_PROFILE_NAME" >/dev/null
+
+ # 2. attach the IAM role for S3 to the instance profile
+ aws iam add-role-to-instance-profile --instance-profile-name "$INSTANCE_PROFILE_NAME" --role-name "$IAM_ROLE_NAME"
+
+ echo "Instance profile $INSTANCE_PROFILE_NAME created"
+ fi
+}
+
+function check_installation_status() {
+ ssh -o StrictHostKeyChecking=no -i "$KEYPAIR_NAME".pem "ec2-user@$PUBLIC_DNS_NAME" \
+ 'while [ ! -f /tmp/systemds_installation_completed ]; do sleep 5; done;'
+}
+
+function start_cloudwatch_agent() {
+ # launch the ssm agent first
+ ssh -i "$KEYPAIR_NAME".pem "ec2-user@$PUBLIC_DNS_NAME" sudo systemctl start amazon-ssm-agent
+ sleep 5
+ # configure and launch the CloudWatch agent via the pre-defined SSM command
+ aws ssm send-command --document-name "AmazonCloudWatch-ManageAgent" \
+ --targets "Key=InstanceIds,Values=$INSTANCE_ID" \
+ --parameters "action=configure,mode=ec2,optionalConfigurationSource=ssm,optionalConfigurationLocation=$CLOUDWATCH_CONFIGS,optionalRestart=yes" \
+ --region $REGION >/dev/null
+ sleep 5 # five seconds should be enough for the underlying command to complete
+}
\ No newline at end of file
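Note that `get_image_details` infers the CPU architecture from the third character of the instance type name; this matches current Graviton families (m6g, c7g, r6g, ...) but is a naming-convention assumption rather than an API guarantee. A quick sketch of the check:

```shell
INSTANCE_TYPE="m6g.xlarge"; echo "${INSTANCE_TYPE:2:1}"   # prints 'g' -> arm64
INSTANCE_TYPE="m5.xlarge";  echo "${INSTANCE_TYPE:2:1}"   # prints '.' -> x86_64
```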
diff --git a/scripts/resource/options.properties b/scripts/resource/options.properties
new file mode 100644
index 00000000000..64d7c8ae7dd
--- /dev/null
+++ b/scripts/resource/options.properties
@@ -0,0 +1,86 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+# Options for executing the Resource Optimizer
+
+# AWS specific options ---------------------------------------
+
+# specifies cloud region (using the corresponding abbreviation)
+REGION=us-east-1
+# specifies filename of CSV table containing the metadata about all available cloud VM instances
+INFO_TABLE=scripts/resource/ec2_stats.csv
+# specifies filename of CSV table containing the extra price metrics depending on the target cloud region
+REGION_TABLE=scripts/resource/aws_regional_prices.csv
+# output folder for configurations files; existing configurations files will be overwritten
+OUTPUT_FOLDER=scripts/resource/output
+# local input files that would later be read from S3;
+# allows skipping access to S3 objects and their metadata for speedup and avoids the need to set local AWS credentials;
+# define them as comma separated key=value pairs (no spaces)
+LOCAL_INPUTS=
+
+# Options for the enumeration process ------------------------
+
+# specifies enumeration strategy; it should be one of the following: 'grid', 'interest', 'prune'; default 'grid'
+ENUMERATION=prune
+# specifies optimization strategy (scoring function);
+# it should be one of the following: 'costs', 'time', 'price'; default 'costs'
+OPTIMIZATION_FUNCTION=price
+# specifies the weighting factor for the optimization function 'costs',
+# value should be always between 0 and 1 (default is 0.01),
+# bigger values prioritize the time over the price
+COSTS_WEIGHT=0.01
+# specifies constraint for maximum execution time; required and only relevant for OPTIMIZATION_FUNCTION=price
+MAX_TIME=1000000
+# specifies constraint for maximum price on AWS for execution; required and only relevant for OPTIMIZATION_FUNCTION=time
+MAX_PRICE=
+# specifies the limit of (virtual) CPU cores allowed for evaluation;
+# this corresponds to the most common VM service quota set by cloud providers
+CPU_QUOTA=256
+# specifies minimum desired executors; default 0 (single node execution allowed);
+# a negative value leads to setting the default
+MIN_EXECUTORS=
+# specifies maximum desired executors; default 200; a negative value leads to setting the default
+MAX_EXECUTORS=
+# specifies VM instance types to consider when searching for the optimal configuration;
+# if not specified, all instances from the table with instance metadata are considered;
+# define them as comma separated values (no spaces)
+INSTANCE_FAMILIES=
+# specifies VM instance sizes to consider when searching for the optimal configuration;
+# if not specified, all instances from the table with instance metadata are considered;
+# define them as comma separated values (no spaces)
+INSTANCE_SIZES=
+# specific to grid-based enum. strategy; specifies step size for enumerating number of executors; default 1
+STEP_SIZE=
+# specific to grid-based strategy; specifies exponential base for increasing the number of executors exponentially;
+# applied only if specified as larger than 1
+EXPONENTIAL_BASE=
+# specific to interest-based enum. strategy; boolean ('true'/'false') to indicate if single node execution should be
+# considered only in case of sufficient memory budget for the driver; default true
+USE_LARGEST_ESTIMATE=
+# specific to interest-based enum. strategy; boolean ('true'/'false') to indicate if the CP memory is an interest
+# for the enumeration; default true
+USE_CP_ESTIMATES=
+# specific to interest-based enum. strategy; boolean ('true'/'false') to indicate if potential broadcast variables'
+# size is an interest for driver and executors memory budget; default true
+USE_BROADCASTS=
+# specific to interest-based enum. strategy; boolean ('true'/'false') to indicate if the size of the outputs
+# (potentially cached) is an interest for the enumerated number of executors; default false
+USE_OUTPUTS=
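With such a properties file in place, the optimizer can be invoked roughly as sketched below; the DML script path and S3 URI are placeholders, and the option names (`-f`, `-options`, `-nvargs`) correspond to the CLI options defined in `ResourceOptimizer.java` further down in this diff:

```shell
java -cp target/ResourceOptimizer.jar org.apache.sysds.resource.ResourceOptimizer \
  -f scripts/my_algorithm.dml \
  -options scripts/resource/options.properties \
  -nvargs X=s3://my-bucket/data/X.csv
```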
diff --git a/scripts/resource/requirements.txt b/scripts/resource/requirements.txt
new file mode 100644
index 00000000000..b650973470d
--- /dev/null
+++ b/scripts/resource/requirements.txt
@@ -0,0 +1,2 @@
+pandas
+boto3
diff --git a/scripts/resource/update_prices.py b/scripts/resource/update_prices.py
new file mode 100644
index 00000000000..a633c34b278
--- /dev/null
+++ b/scripts/resource/update_prices.py
@@ -0,0 +1,81 @@
+import argparse
+import csv
+import json
+import os
+import pandas as pd
+import boto3
+
+
+def update_prices(region: str, table_file: str):
+ price_table = dict()
+
+ # init target instance types
+ with open(table_file, 'r') as file:
+ reader = csv.reader(file)
+ for row in reader:
+ col1 = row[0]
+ if col1 != "API_Name":
+ # NaN to indicate types not supported for the target region
+ price_table[col1] = pd.NA
+
+ # now get the actual prices from the AWS API
+ client = boto3.client('pricing', region_name=region)
+
+ # fetch all products using pagination
+ print(f"Fetching current priced for the target instances in region '{region}'...")
+ next_token = None
+ while True:
+ filters = [
+ {"Type": "TERM_MATCH", "Field": "operatingSystem", "Value": "Linux"},
+ {"Type": "TERM_MATCH", "Field": "preInstalledSw", "Value": "NA"},
+ {"Type": "TERM_MATCH", "Field": "regionCode", "Value": region},
+ {"Type": "TERM_MATCH", "Field": "capacitystatus", "Value": "Used"},
+ {"Type": "TERM_MATCH", "Field": "tenancy", "Value": "Shared"},
+ {"Type": "TERM_MATCH", "Field": "gpuMemory", "Value": "NA"}
+ ]
+
+ if next_token:
+ response = client.get_products(ServiceCode="AmazonEC2", Filters=filters, MaxResults=100, NextToken=next_token)
+ print("\tanother 100 records have been fetched...")
+ else:
+ response = client.get_products(ServiceCode="AmazonEC2", Filters=filters, MaxResults=100)
+ print("\t100 records have been fetched...")
+
+ # extract the price from the response
+ for product in response.get("PriceList", []):
+ product_data = json.loads(product)
+ instance_type = product_data["product"]["attributes"]["instanceType"]
+ # get price only for target instances
+ if instance_type in price_table:
+ price = next(iter(next(iter(product_data["terms"]["OnDemand"].values()))["priceDimensions"].values()))["pricePerUnit"]["USD"]
+ price_table[instance_type] = float(price)
+
+ # handle pagination
+ next_token = response.get('NextToken')
+ if not next_token:
+ break
+
+ print(f"...all prices has been fetched successfully.")
+ # update the csv table
+ ec2_df = pd.read_csv(table_file)
+ for instance_type, price in price_table.items():
+ ec2_df.loc[ec2_df["API_Name"] == instance_type, "Price"] = price
+ ec2_df.to_csv(table_file, index=False, na_rep="N/A")
+ print(f"Prices have been updated to file {table_file}")
+
+
+def main():
+ parser = argparse.ArgumentParser(description='Update prices in table with EC2 instance stats')
+ parser.add_argument('region', help='Target AWS region (e.g., us-east-1).')
+ parser.add_argument('table_file', help='CSV file to be updated')
+
+ args = parser.parse_args()
+
+ if not os.path.exists(args.table_file):
+ print(f"The given file for update does not exists")
+ exit(1)
+ # the actual price update logic
+ update_prices(args.region, args.table_file)
+
+if __name__ == "__main__":
+ main()
\ No newline at end of file
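The pricing script can be run standalone to refresh the instance table consumed by the optimizer; it needs AWS credentials allowed to call the Pricing API (`get_products`). For example (file paths illustrative, matching the defaults in `options.properties`):

```shell
pip install -r scripts/resource/requirements.txt
python scripts/resource/update_prices.py us-east-1 scripts/resource/ec2_stats.csv
```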
diff --git a/src/main/java/org/apache/sysds/hops/recompile/Recompiler.java b/src/main/java/org/apache/sysds/hops/recompile/Recompiler.java
index a56c630c524..63d0f1eab60 100644
--- a/src/main/java/org/apache/sysds/hops/recompile/Recompiler.java
+++ b/src/main/java/org/apache/sysds/hops/recompile/Recompiler.java
@@ -318,7 +318,7 @@ public static ArrayList recompileHopsDagInstructions( Hop hop )
* @param tid thread id, 0 for main or before worker creation
* @return modified list of instructions
*/
- private static ArrayList recompile(StatementBlock sb, ArrayList hops, ExecutionContext ec, RecompileStatus status,
+ public static ArrayList recompile(StatementBlock sb, ArrayList hops, ExecutionContext ec, RecompileStatus status,
boolean inplace, boolean replaceLit, boolean updateStats, boolean forceEt, boolean pred, ExecType et, long tid )
{
boolean codegen = ConfigurationManager.isCodegenEnabled()
diff --git a/src/main/java/org/apache/sysds/lops/compile/Dag.java b/src/main/java/org/apache/sysds/lops/compile/Dag.java
index f67cb74cd34..b26c539e9a8 100644
--- a/src/main/java/org/apache/sysds/lops/compile/Dag.java
+++ b/src/main/java/org/apache/sysds/lops/compile/Dag.java
@@ -147,12 +147,6 @@ public static String getNextUniqueVarname(DataType dt) {
dt.isFrame() ? Lop.FRAME_VAR_NAME_PREFIX :
Lop.SCALAR_VAR_NAME_PREFIX) + var_index.getNextID();
}
-
- // to be used only resource optimization
- public static void resetUniqueMembers() {
- job_id.reset(-1);
- var_index.reset(-1);
- }
///////
// Dag modifications
diff --git a/src/main/java/org/apache/sysds/parser/DataExpression.java b/src/main/java/org/apache/sysds/parser/DataExpression.java
index f0be3f1eb64..142cc806aa8 100644
--- a/src/main/java/org/apache/sysds/parser/DataExpression.java
+++ b/src/main/java/org/apache/sysds/parser/DataExpression.java
@@ -984,7 +984,8 @@ public void validateExpression(HashMap ids, HashMap no extra fee)
+ // price = EC2 price + storage price (EBS); use only half of the extra storage price
+ // since only half of the storage will be automatically configured,
+ // because in single-node mode SystemDS does not utilize HDFS
+ // (only a minimal root EBS volume when an Instance Store is available)
+ CloudInstance singleNode = config.driverInstance;
+ pricePerSecond = singleNode.getPrice() + (singleNode.getExtraStoragePrice() / 2);
+ } else {
+ // price = EC2 price + EMR fee + extra storage (EBS) price
+ CloudInstance masterNode = config.driverInstance;
+ CloudInstance coreNode = config.executorInstance;
+ pricePerSecond = masterNode.getPrice() + masterNode.getExtraFee() + masterNode.getExtraStoragePrice();
+ pricePerSecond += config.numberExecutors * (coreNode.getPrice() + coreNode.getExtraFee() + coreNode.getExtraStoragePrice());
+ }
+ } else {
+ throw new IllegalArgumentException("AWS is the only cloud provider supported at the moment");
+ }
+ return time * pricePerSecond;
+ }
+
+ /**
+ * Performs read of csv file filled with relevant AWS fees/prices per region.
+ * Each record in the csv should carry the following information (including header):
+ * <ul>
+ * <li>Region - AWS region abbreviation</li>
+ * <li>Fee Ratio - Ratio of EMR fee per instance to EC2 price per instance per hour</li>
+ * <li>EBS Price - Price for EBS per month per GB</li>
+ * </ul>
+ * @param feeTablePath csv file path
+ * @param region AWS region abbreviation
+ * @return static array of doubles with 2 elements: [EMR fee ratio, EBS price]
+ * @throws IOException in case of invalid file format
+ */
+ public static double[] loadRegionalPrices(String feeTablePath, String region) throws IOException {
+ try(BufferedReader br = new BufferedReader(new FileReader(feeTablePath))) {
+ // validate the file header
+ String parsedLine = br.readLine();
+ if (!parsedLine.equals("Region,Fee Ratio,EBS Price"))
+ throw new IOException("Fee Table: invalid CSV header: " + parsedLine);
+ while ((parsedLine = br.readLine()) != null) {
+ String[] values = parsedLine.split(",");
+ if (values.length != 3)
+ throw new IOException(String.format("Fee Table: invalid CSV line '%s' inside: %s", parsedLine, feeTablePath));
+ if (region.equals(values[0])) {
+ return new double[] { Double.parseDouble(values[1]), Double.parseDouble(values[2]) };
+ }
+ }
+ throw new IOException(String.format("Fee Table: region '%s' not found in the CSV table: %s", region, feeTablePath));
+ } catch (FileNotFoundException e) {
+ throw new RemoteException(feeTablePath+" read failed: "+e);
+ }
+ }
/**
* Performs read of csv file filled with VM instance characteristics.
* Each record in the csv should carry the following information (including header):
 * <ul>
 * <li>API_Name - naming for VM instance used by the provider</li>
+ * <li>Price - price for instance per hour</li>
 * <li>Memory - floating number for the instance memory in GBs</li>
 * <li>vCPUs - number of physical threads</li>
+ * <li>Cores - number of physical cores (not relevant at the moment)</li>
 * <li>gFlops - FLOPS capability of the CPU in GFLOPS (Giga)</li>
- * <li>ramSpeed - memory bandwidth in MB/s</li>
- * <li>diskSpeed - memory bandwidth in MB/s</li>
- * <li>networkSpeed - memory bandwidth in MB/s</li>
- * <li>Price - price for instance per hour</li>
+ * <li>memoryBandwidth - memory bandwidth in MB/s</li>
+ * <li>NVMe - flag if NVMe storage volume(s) are attached</li>
+ * <li>storageVolumes - number of NVMe or EBS (to be additionally configured) volumes</li>
+ * <li>sizeVolumes - size of each NVMe or EBS (to be additionally configured) volume</li>
+ * <li>diskReadBandwidth - disk read bandwidth in MB/s</li>
+ * <li>diskWriteBandwidth - disk write bandwidth in MB/s</li>
+ * <li>networkBandwidth - network bandwidth in MB/s</li>
 * </ul>
- * @param instanceTablePath csv file
+ *
+ * @param instanceTablePath csv file path
+ * @param emrFeeRatio EMR fee as fraction of the instance price (depends on the region)
+ * @param ebsStoragePrice EBS price per GB per month (depends on the region)
* @return map with filtered instances
* @throws IOException in case problem at reading the csv file
*/
- public HashMap loadInstanceInfoTable(String instanceTablePath) throws IOException {
+ public static HashMap loadInstanceInfoTable(
+ String instanceTablePath, double emrFeeRatio, double ebsStoragePrice) throws IOException {
+ // store as mapping the instance type name to the instance object
HashMap result = new HashMap<>();
int lineCount = 1;
// try to open the file
- try(BufferedReader br = new BufferedReader(new FileReader(instanceTablePath))){
+ try(BufferedReader br = new BufferedReader(new FileReader(instanceTablePath))) {
String parsedLine;
// validate the file header
parsedLine = br.readLine();
- if (!parsedLine.equals("API_Name,Memory,vCPUs,gFlops,ramSpeed,diskSpeed,networkSpeed,Price"))
- throw new IOException("Invalid CSV header inside: " + instanceTablePath);
+ if (!parsedLine.equals("API_Name,Price,Memory,vCPUs,Cores,GFLOPS,memBandwidth,NVMe,storageVolumes,sizePerVolume,readStorageBandwidth,writeStorageBandwidth,networkBandwidth"))
+ throw new IOException("Instance info table: invalid CSV header inside: " + instanceTablePath);
while ((parsedLine = br.readLine()) != null) {
String[] values = parsedLine.split(",");
- if (values.length != 8 || !validateInstanceName(values[0]))
- throw new IOException(String.format("Invalid CSV line(%d) inside: %s", lineCount, instanceTablePath));
+ if (values.length != 13 || !validateInstanceName(values[0]))
+ throw new IOException(String.format("Instance info table: invalid CSV line(%d) inside: %s, instance of type %s", lineCount, instanceTablePath, values[0]));
- String API_Name = values[0];
- long Memory = (long) (Double.parseDouble(values[1])*1024)*1024*1024;
- int vCPUs = Integer.parseInt(values[2]);
- double gFlops = Double.parseDouble(values[3]);
- double ramSpeed = Double.parseDouble(values[4]);
- double diskSpeed = Double.parseDouble(values[5]);
- double networkSpeed = Double.parseDouble(values[6]);
- double Price = Double.parseDouble(values[7]);
+ String name = values[0];
+ double price = Double.parseDouble(values[1]);
+ double extraFee = price * emrFeeRatio;
+ long memory = GBtoBytes(Double.parseDouble(values[2]));
+ int vCPUs = Integer.parseInt(values[3]);
+ double GFlops = Double.parseDouble(values[5]);
+ double memBandwidth = Double.parseDouble(values[6]);
+ double diskReadBandwidth = Double.parseDouble(values[10]);
+ double diskWriteBandwidth = Double.parseDouble(values[11]);
+ double networkBandwidth = Double.parseDouble(values[12]);
+ boolean NVMeStorage = Boolean.parseBoolean(values[7]);
+ int numberStorageVolumes = Integer.parseInt(values[8]);
+ double sizeStorageVolumes = Double.parseDouble(values[9]);
CloudInstance parsedInstance = new CloudInstance(
- API_Name,
- Memory,
- vCPUs,
- gFlops,
- ramSpeed,
- diskSpeed,
- networkSpeed,
- Price
+ name, price, extraFee, ebsStoragePrice,
+ memory, vCPUs, GFlops, memBandwidth,
+ diskReadBandwidth, diskWriteBandwidth,
+ networkBandwidth, NVMeStorage,
+ numberStorageVolumes, sizeStorageVolumes
);
- result.put(API_Name, parsedInstance);
+ result.put(name, parsedInstance);
lineCount++;
}
+
+ return result;
+ }
+ catch(Exception ex) {
+ throw new IOException("Read failed", ex);
+ }
+ }
+
+ /**
+ * Generates json file storing the instance type and relevant characteristics
+ * for single node executions.
+ * The resulting file is to be used only for parsing the attributes and
+ * is not suitable for direct options input to AWS CLI.
+ *
+ * @param instance EC2 instance object (always set one)
+ * @param filePath path for the json file
+ */
+ public static void generateEC2ConfigJson(CloudInstance instance, String filePath) {
+ try {
+ JSONObject ec2Config = new JSONObject();
+
+ ec2Config.put("InstanceType", instance.getInstanceName());
+ // EBS size of the root volume (only one volume in this case)
+ int ebsRootSize = EBS_DEFAULT_ROOT_SIZE_EC2;
+ if (!instance.isNVMeStorage()) // plan for only half of the EMR storage budget
+ ebsRootSize += (int) Math.ceil(instance.getNumStorageVolumes()*instance.getSizeStoragePerVolume()/2);
+ ec2Config.put("VolumeSize", ebsRootSize);
+ ec2Config.put("VolumeType", "gp3");
+ ec2Config.put("EbsOptimized", true);
+ // JVM memory budget used at resource optimization
+ int cpMemory = (int) (instance.getMemory()/ (1024*1024) * JVM_MEMORY_FACTOR);
+ ec2Config.put("JvmMaxMemory", cpMemory);
+
+ try (FileWriter file = new FileWriter(filePath)) {
+ file.write(ec2Config.write(true));
+ System.out.println("EC2 configuration JSON file: " + filePath);
+ }
+ } catch (Exception e) {
+ e.printStackTrace();
}
+ }
+
+ /**
+ * Generates json file with instance groups argument for
+ * launching AWS EMR cluster
+ *
+ * @param clusterConfig object representing EMR cluster configurations
+ * @param filePath path for the output json file
+ */
+ public static void generateEMRInstanceGroupsJson(ConfigurationPoint clusterConfig, String filePath) {
+ try {
+ JSONArray instanceGroups = new JSONArray();
+
+ // Master (Primary) instance group
+ JSONObject masterGroup = new JSONObject();
+ masterGroup.put("InstanceCount", 1);
+ masterGroup.put("InstanceGroupType", "MASTER");
+ masterGroup.put("InstanceType",clusterConfig.driverInstance.getInstanceName());
+ masterGroup.put("Name", "Master Instance Group");
+ attachEBSConfigsIfNeeded(clusterConfig.driverInstance, masterGroup);
+ instanceGroups.add(masterGroup);
+
+ // Core instance group
+ JSONObject coreGroup = new JSONObject();
+ coreGroup.put("InstanceCount", clusterConfig.numberExecutors);
+ coreGroup.put("InstanceGroupType", "CORE");
+ coreGroup.put("InstanceType", clusterConfig.executorInstance.getInstanceName());
+ coreGroup.put("Name", "Core Instance Group");
+ attachEBSConfigsIfNeeded(clusterConfig.executorInstance, coreGroup);
+ instanceGroups.add(coreGroup);
+
+ try (FileWriter file = new FileWriter(filePath)) {
+ file.write(instanceGroups.write(true));
+ System.out.println("Instance Groups JSON file created: " + filePath);
+ }
+ } catch (Exception e) {
+ e.printStackTrace();
+ }
+ }
+
+ private static void attachEBSConfigsIfNeeded(CloudInstance instance, JSONObject instanceGroup) {
+ // in AWS CLI the root EBS volume is configured with a separate optional flag (default 15GB)
+ if (!instance.isNVMeStorage()) {
+ try {
+ JSONObject volumeSpecification = new JSONObject();
+ volumeSpecification.put("SizeInGB", (int) instance.getSizeStoragePerVolume());
+ volumeSpecification.put("VolumeType", "gp3");
+
+ JSONObject ebsBlockDeviceConfig = new JSONObject();
+ ebsBlockDeviceConfig.put("VolumesPerInstance", instance.getNumStorageVolumes());
+ ebsBlockDeviceConfig.put("VolumeSpecification", volumeSpecification);
+ JSONArray ebsBlockDeviceConfigsArray = new JSONArray();
+ ebsBlockDeviceConfigsArray.add(ebsBlockDeviceConfig);
+
+ JSONObject ebsConfiguration = new JSONObject();
+ ebsConfiguration.put("EbsOptimized", true);
+ ebsConfiguration.put("EbsBlockDeviceConfigs", ebsBlockDeviceConfigsArray);
+ instanceGroup.put("EbsConfiguration", ebsConfiguration);
+ } catch (Exception e) {
+ e.printStackTrace();
+ }
+ }
+ }
+
+ /**
+ * Generate json file with configurations attribute for
+ * launching AWS EMR cluster with Spark
+ *
+ * @param clusterConfig object representing EMR cluster configurations
+ * @param filePath path for the output json file
+ */
+ public static void generateEMRConfigurationsJson(ConfigurationPoint clusterConfig, String filePath) {
+ try {
+ JSONArray configurations = new JSONArray();
+
+ // Spark Configuration
+ JSONObject sparkConfig = new JSONObject();
+ sparkConfig.put("Classification", "spark");
+ // do not use the automatic EMR cluster configurations
+ sparkConfig.put("Properties", new JSONObject().put("maximizeResourceAllocation", "false"));
+
+ // set custom defined cluster configurations
+ JSONObject sparkDefaultsConfig = new JSONObject();
+ sparkDefaultsConfig.put("Classification", "spark-defaults");
+
+ JSONObject sparkDefaultsProperties = new JSONObject();
+ long driverMemoryBytes = calculateEffectiveDriverMemoryBudget(clusterConfig.driverInstance.getMemory(),
+ clusterConfig.numberExecutors * clusterConfig.executorInstance.getVCPUs());
+ int driverMemory = (int) (driverMemoryBytes / (1024*1024));
+ sparkDefaultsProperties.put("spark.driver.memory", (driverMemory)+"m");
+ sparkDefaultsProperties.put("spark.driver.maxResultSize", String.valueOf(0));
+ // calculate the exact resource limits for YARN containers to maximize the utilization
+ int[] executorResources = getEffectiveExecutorResources(
+ clusterConfig.executorInstance.getMemory(),
+ clusterConfig.executorInstance.getVCPUs(),
+ clusterConfig.numberExecutors
+ );
+ sparkDefaultsProperties.put("spark.executor.memory", (executorResources[0])+"m");
+ sparkDefaultsProperties.put("spark.executor.cores", Integer.toString(executorResources[1]));
+ sparkDefaultsProperties.put("spark.executor.instances", Integer.toString(executorResources[2]));
+ // values copied from SparkClusterConfig.analyzeSparkConfiguation
+ sparkDefaultsProperties.put("spark.storage.memoryFraction", String.valueOf(0.6));
+ sparkDefaultsProperties.put("spark.memory.storageFraction", String.valueOf(0.5));
+ sparkDefaultsProperties.put("spark.executor.memoryOverheadFactor", String.valueOf(0.1));
+ // set the custom AM configurations
+ sparkDefaultsProperties.put("spark.yarn.am.memory", (executorResources[3])+"m");
+ sparkDefaultsProperties.put("spark.yarn.am.cores", Integer.toString(executorResources[4]));
+ sparkDefaultsConfig.put("Properties", sparkDefaultsProperties);
+
+ // Spark-env and export JAVA_HOME Configuration
+ JSONObject sparkEnvConfig = new JSONObject();
+ sparkEnvConfig.put("Classification", "spark-env");
+
+ JSONObject jvmVersion = new JSONObject().put("JAVA_HOME", "/usr/lib/jvm/jre-11");
+ JSONObject exportConfig = new JSONObject();
+ exportConfig.put("Classification", "export");
+ exportConfig.put("Properties", jvmVersion);
+ JSONArray jvmArray = new JSONArray();
+ jvmArray.add(exportConfig);
+ sparkEnvConfig.put("Configurations", jvmArray);
+
+ configurations.add(sparkConfig);
+ configurations.add(sparkDefaultsConfig);
+ configurations.add(sparkEnvConfig);
+
+ try (FileWriter file = new FileWriter(filePath)) {
+ file.write(configurations.write(true));
+ System.out.println("Configurations JSON file created: " + filePath);
+ }
+ } catch (Exception e) {
+ e.printStackTrace();
+ }
+ }
+
+ /**
+ * Calculates the effective resource values for a Spark cluster managed by YARN.
+ * It considers the resource limits for scheduling containers by YARN
+ * and the need to fit an Application Master (AM) container in addition to the executor ones.
+ *
+ * @param memory total node memory in bytes
+ * @param cores total node available virtual cores
+ * @param numExecutors number of available worker nodes
+ * @return array of length 5:
+ * [executor mem. in MB, executor cores, num. executors, AM mem. in MB, AM cores]
+ */
+ public static int[] getEffectiveExecutorResources(long memory, int cores, int numExecutors) {
+ int effectiveExecutorMemoryMB, effectiveAmMemoryMB;
+ int effectiveExecutorCores, effectiveAmCores;
+ int effectiveNumExecutors;
+ // YARN reserves 25% of the total memory for other resources (OS, node management, etc.)
+ long yarnAllocationMemory = (long) (memory * 0.75);
+ // plan for resource allocation for YARN Application Master (AM) container
+ int totalExecutorCores = cores * numExecutors;
+ // scale with the cluster size to allow allocating sufficient AM resources
+ int amMemoryMB = calculateAmMemoryMB(totalExecutorCores);
+ int amMemoryOverheadMB = Math.max(384, (int) (amMemoryMB * 0.1)); // Spark default config
+ long amTotalMemory = (long) (amMemoryMB + amMemoryOverheadMB) * 1024 * 1024;
+ int amCores = calculateAmCores(totalExecutorCores);
+
+ // decide if it is more effective to launch the AM alongside an executor or on a dedicated node
+ // plan for executor memory overhead -> 10% of the executor memory (division by 1.1, always over 384MB)
+ if (amTotalMemory * numExecutors >= yarnAllocationMemory) {
+ // the case only for a large cluster of small instances
+ // in this case dedicate a whole node for the AM
+ effectiveExecutorMemoryMB = (int) Math.floor(yarnAllocationMemory / (1.1 * 1024 * 1024));
+ effectiveExecutorCores = cores;
+ // maximize the AM resource since no resource will be left for an executor
+ effectiveAmMemoryMB = effectiveExecutorMemoryMB;
+ effectiveAmCores = cores;
+ effectiveNumExecutors = numExecutors - 1;
+ } else {
+ // in this case leave room in each worker node for executor + AM containers
+ effectiveExecutorMemoryMB = (int) Math.floor((yarnAllocationMemory - amTotalMemory) / (1.1 * 1024 * 1024));
+ effectiveExecutorCores = cores - amCores;
+ effectiveAmMemoryMB = amMemoryMB;
+ effectiveAmCores = amCores;
+ effectiveNumExecutors = numExecutors;
+ }
+
+ // always 5 return values
+ return new int[] {
+ effectiveExecutorMemoryMB,
+ effectiveExecutorCores,
+ effectiveNumExecutors,
+ effectiveAmMemoryMB,
+ effectiveAmCores
+ };
+ }
+
+ public static int calculateAmMemoryMB(int totalExecutorCores) {
+ // 512MB base Application Master memory budget + 256MB for each 16 cores extra
+ return 512 + (int) Math.floor((double) totalExecutorCores / 16) * 256;
+ }
+
+ public static int calculateAmCores(int totalExecutorCores) {
+ // at least 1 core per 64 cores in cluster
+ int scaledCores = (int) Math.ceil((totalExecutorCores) / 64.0);
+ // cap to 8 cores for large clusters (cores > 512)
+ return Math.min(8, scaledCores);
+ }
- return result;
+ public static long calculateEffectiveDriverMemoryBudget(long driverMemory, int totalExecutorCores) {
+ // 1GB Resource Manager memory budget + 256MB for each 16 cores extra
+ int effectiveBudgetMB = 1024 + (int) Math.floor((double) totalExecutorCores / 16) * 256;
+ long effectiveBudgetBytes = ((long) effectiveBudgetMB * 1024 * 1024);
+ // validation if the memory is negative or insufficient is to be done separately
+ // return value in bytes
+ return Math.min((long) (driverMemory * JVM_MEMORY_FACTOR),
+ driverMemory - effectiveBudgetBytes);
}
}
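As a rough worked example of `getEffectiveExecutorResources` (numbers chosen purely for illustration): with 4 executors of 32 GB and 8 vCPUs each, YARN keeps 75% of the node memory (24576 MB); the AM is sized to 512 + floor(32/16)*256 = 1024 MB plus a 384 MB overhead and 1 core. Since 4 * 1408 MB is well below 24576 MB, the AM shares a node with an executor, yielding roughly floor((24576 - 1408) / 1.1) = 21061 MB and 7 cores per executor, 4 executor instances, and a 1024 MB / 1-core AM.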
diff --git a/src/main/java/org/apache/sysds/resource/ResourceCompiler.java b/src/main/java/org/apache/sysds/resource/ResourceCompiler.java
index 4ddc381b7c8..eaa538339ca 100644
--- a/src/main/java/org/apache/sysds/resource/ResourceCompiler.java
+++ b/src/main/java/org/apache/sysds/resource/ResourceCompiler.java
@@ -23,26 +23,28 @@
import org.apache.sysds.api.DMLOptions;
import org.apache.sysds.api.DMLScript;
import org.apache.sysds.common.Types;
-import org.apache.sysds.conf.CompilerConfig;
import org.apache.sysds.conf.ConfigurationManager;
import org.apache.sysds.hops.Hop;
+import org.apache.sysds.hops.OptimizerUtils;
import org.apache.sysds.hops.recompile.Recompiler;
-import org.apache.sysds.lops.Lop;
-import org.apache.sysds.lops.compile.Dag;
-import org.apache.sysds.lops.rewrite.LopRewriter;
import org.apache.sysds.parser.*;
import org.apache.sysds.runtime.controlprogram.*;
+import org.apache.sysds.runtime.controlprogram.context.ExecutionContextFactory;
import org.apache.sysds.runtime.controlprogram.context.SparkExecutionContext;
import org.apache.sysds.runtime.instructions.Instruction;
import org.apache.sysds.utils.stats.InfrastructureAnalyzer;
import java.io.IOException;
import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import static org.apache.sysds.api.DMLScript.*;
+import static org.apache.sysds.parser.DataExpression.IO_FILENAME;
+import static org.apache.sysds.resource.CloudUtils.*;
/**
* This class does full or partial program recompilation
@@ -57,15 +59,12 @@ public class ResourceCompiler {
public static final long DEFAULT_EXECUTOR_MEMORY = 512*1024*1024; // 0.5GB
public static final int DEFAULT_EXECUTOR_THREADS = 2; // avoids creating spark context
public static final int DEFAULT_NUMBER_EXECUTORS = 2; // avoids creating spark context
- static {
- // TODO: consider moving to the executable of the resource optimizer once implemented
- // USE_LOCAL_SPARK_CONFIG = true; -> needs to be false to trigger evaluating the default parallelism
- ConfigurationManager.getCompilerConfig().set(CompilerConfig.ConfigType.ALLOW_DYN_RECOMPILATION, false);
- ConfigurationManager.getCompilerConfig().set(CompilerConfig.ConfigType.RESOURCE_OPTIMIZATION, true);
- }
- private static final LopRewriter _lopRewriter = new LopRewriter();
public static Program compile(String filePath, Map args) throws IOException {
+ return compile(filePath, args, null);
+ }
+
+ public static Program compile(String filePath, Map args, HashMap replaceVars) throws IOException {
// setting the dynamic recompilation flags during resource optimization is obsolete
DMLOptions dmlOptions =DMLOptions.defaultOptions;
dmlOptions.argVals = args;
@@ -73,14 +72,13 @@ public static Program compile(String filePath, Map args) throws
String dmlScriptStr = readDMLScript(true, filePath);
Map argVals = dmlOptions.argVals;
- Dag.resetUniqueMembers();
- // NOTE: skip configuring code generation
- // NOTE: expects setting up the initial cluster configs before calling
ParserWrapper parser = ParserFactory.createParser();
DMLProgram dmlProgram = parser.parse(null, dmlScriptStr, argVals);
DMLTranslator dmlTranslator = new DMLTranslator(dmlProgram);
dmlTranslator.liveVariableAnalysis(dmlProgram);
dmlTranslator.validateParseTree(dmlProgram);
+ if (replaceVars != null && !replaceVars.isEmpty()) {
+ replaceFilename(dmlProgram, replaceVars);
+ }
dmlTranslator.constructHops(dmlProgram);
dmlTranslator.rewriteHopsDAG(dmlProgram);
dmlTranslator.constructLops(dmlProgram);
@@ -88,61 +86,61 @@ public static Program compile(String filePath, Map args) throws
return dmlTranslator.getRuntimeProgram(dmlProgram, ConfigurationManager.getDMLConfig());
}
- private static ArrayList recompile(StatementBlock sb, ArrayList hops) {
- // construct new lops
- ArrayList lops = new ArrayList<>(hops.size());
- Hop.resetVisitStatus(hops);
- for( Hop hop : hops ){
- Recompiler.rClearLops(hop);
- lops.add(hop.constructLops());
- }
- // apply hop-lop rewrites to cover the case of changed lop operators
- _lopRewriter.rewriteLopDAG(sb, lops);
+ public static void replaceFilename(DMLProgram dmlp, HashMap replaceVars)
+ {
+ for (int i = 0; i < dmlp.getNumStatementBlocks(); i++) {
+ StatementBlock sb = dmlp.getStatementBlock(i);
+ for (Statement statement: sb.getStatements()) {
+ if (!(statement instanceof AssignmentStatement ||
+ statement instanceof OutputStatement)) continue;
- Dag dag = new Dag<>();
- for (Lop l : lops) {
- l.addToDag(dag);
- }
+ StringIdentifier stringIdentifier;
+ if (statement instanceof AssignmentStatement) {
+ Expression assignExpression = ((AssignmentStatement) statement).getSource();
+ if (!(assignExpression instanceof StringIdentifier ||
+ assignExpression instanceof DataExpression)) continue;
- return dag.getJobs(sb, ConfigurationManager.getDMLConfig());
- }
+ if (assignExpression instanceof DataExpression) {
+ Expression filenameExpression = ((DataExpression) assignExpression).getVarParam(IO_FILENAME);
+ if (!(filenameExpression instanceof StringIdentifier)) continue;
- /**
- * Recompiling a given program for resource optimization for single node execution
- * @param program program to be recompiled
- * @param driverMemory target driver memory
- * @param driverCores target driver threads/cores
- * @return the recompiled program as a new {@code Program} instance
- */
- public static Program doFullRecompilation(Program program, long driverMemory, int driverCores) {
- setDriverConfigurations(driverMemory, driverCores);
- setSingleNodeExecution();
- return doFullRecompilation(program);
+ stringIdentifier = (StringIdentifier) filenameExpression;
+ } else {
+ stringIdentifier = (StringIdentifier) assignExpression;
+ }
+ } else {
+ Expression filenameExpression = ((OutputStatement) statement).getExprParam(IO_FILENAME);
+ if (!(filenameExpression instanceof StringIdentifier)) continue;
+
+ stringIdentifier = (StringIdentifier) filenameExpression;
+ }
+
+ if (!(replaceVars.containsKey(stringIdentifier.getValue()))) continue;
+ String valToReplace = replaceVars.get(stringIdentifier.getValue());
+ stringIdentifier.setValue(valToReplace);
+ }
+ }
}
/**
- * Recompiling a given program for resource optimization for Spark execution
+ * Recompiling a given program for resource optimization.
+ * This method should always be called after setting the target resources
+ * to {@code InfrastructureAnalyzer} and {@code SparkExecutionContext}
+ *
* @param program program to be recompiled
- * @param driverMemory target driver memory
- * @param driverCores target driver threads/cores
- * @param numberExecutors target number of executor nodes
- * @param executorMemory target executor memory
- * @param executorCores target executor threads/cores
* @return the recompiled program as a new {@code Program} instance
*/
- public static Program doFullRecompilation(Program program, long driverMemory, int driverCores, int numberExecutors, long executorMemory, int executorCores) {
- setDriverConfigurations(driverMemory, driverCores);
- setExecutorConfigurations(numberExecutors, executorMemory, executorCores);
- return doFullRecompilation(program);
- }
-
- private static Program doFullRecompilation(Program program) {
- Dag.resetUniqueMembers();
- Program newProgram = new Program();
+ public static Program doFullRecompilation(Program program) {
+ // adjust defaults for memory estimates of variables with unknown dimensions
+ OptimizerUtils.resetDefaultSize();
+ // init new Program object for the output
+ Program newProgram = new Program(program.getDMLProg());
+ // collect program blocks from all layers
ArrayList B = Stream.concat(
program.getProgramBlocks().stream(),
program.getFunctionProgramBlocks().values().stream())
.collect(Collectors.toCollection(ArrayList::new));
+ // recompile each of the program blocks and attach them to the new program object
doRecompilation(B, newProgram);
return newProgram;
}
@@ -157,46 +155,59 @@ private static void doRecompilation(ProgramBlock originBlock, Program target) {
if (originBlock instanceof FunctionProgramBlock)
{
FunctionProgramBlock fpb = (FunctionProgramBlock)originBlock;
- doRecompilation(fpb.getChildBlocks(), target);
- }
- else if (originBlock instanceof WhileProgramBlock)
- {
- WhileProgramBlock wpb = (WhileProgramBlock)originBlock;
- WhileStatementBlock sb = (WhileStatementBlock) originBlock.getStatementBlock();
- if(sb!=null && sb.getPredicateHops()!=null ){
- ArrayList inst = Recompiler.recompileHopsDag(sb.getPredicateHops(), null, null, true, true, 0);
- wpb.setPredicate(inst);
- target.addProgramBlock(wpb);
+ Recompiler.recompileProgramBlockHierarchy(fpb.getChildBlocks(), new LocalVariableMap(), 0, true, Recompiler.ResetType.NO_RESET);
+ String functionName = ((FunctionStatement) fpb.getStatementBlock().getStatement(0)).getName();
+ String namespace = null;
+ for (Map.Entry> pairNS: target.getDMLProg().getNamespaces().entrySet()) {
+ if (pairNS.getValue().containsFunction(functionName)) {
+ namespace = pairNS.getKey();
+ }
}
- doRecompilation(wpb.getChildBlocks(), target);
+ target.addFunctionProgramBlock(namespace, functionName, fpb);
}
else if (originBlock instanceof IfProgramBlock)
{
IfProgramBlock ipb = (IfProgramBlock)originBlock;
IfStatementBlock sb = (IfStatementBlock) ipb.getStatementBlock();
if(sb!=null && sb.getPredicateHops()!=null ){
- ArrayList inst = Recompiler.recompileHopsDag(sb.getPredicateHops(), null, null, true, true, 0);
+ ArrayList hopAsList = new ArrayList<>(Collections.singletonList(sb.getPredicateHops()));
+ ArrayList inst = Recompiler.recompile(null , hopAsList, null, null, true, false, true, false, false, null, 0);
ipb.setPredicate(inst);
target.addProgramBlock(ipb);
}
doRecompilation(ipb.getChildBlocksIfBody(), target);
doRecompilation(ipb.getChildBlocksElseBody(), target);
}
+ else if (originBlock instanceof WhileProgramBlock)
+ {
+ WhileProgramBlock wpb = (WhileProgramBlock)originBlock;
+ WhileStatementBlock sb = (WhileStatementBlock) originBlock.getStatementBlock();
+ if(sb!=null && sb.getPredicateHops()!=null ){
+ ArrayList hopAsList = new ArrayList<>(Collections.singletonList(sb.getPredicateHops()));
+ ArrayList inst = Recompiler.recompile(null , hopAsList, null, null, true, false, true, false, false, null, 0);
+ wpb.setPredicate(inst);
+ target.addProgramBlock(wpb);
+ }
+ doRecompilation(wpb.getChildBlocks(), target);
+ }
else if (originBlock instanceof ForProgramBlock) //incl parfor
{
ForProgramBlock fpb = (ForProgramBlock)originBlock;
ForStatementBlock sb = (ForStatementBlock) fpb.getStatementBlock();
if(sb!=null){
if( sb.getFromHops()!=null ){
- ArrayList inst = Recompiler.recompileHopsDag(sb.getFromHops(), null, null, true, true, 0);
+ ArrayList hopAsList = new ArrayList<>(Collections.singletonList(sb.getFromHops()));
+ ArrayList inst = Recompiler.recompile(null , hopAsList, null, null, true, false, true, false, false, null, 0);
fpb.setFromInstructions( inst );
}
if(sb.getToHops()!=null){
- ArrayList inst = Recompiler.recompileHopsDag(sb.getToHops(), null, null, true, true, 0);
+ ArrayList hopAsList = new ArrayList<>(Collections.singletonList(sb.getToHops()));
+ ArrayList inst = Recompiler.recompile(null , hopAsList, null, null, true, false, true, false, false, null, 0);
fpb.setToInstructions( inst );
}
if(sb.getIncrementHops()!=null){
- ArrayList inst = Recompiler.recompileHopsDag(sb.getIncrementHops(), null, null, true, true, 0);
+ ArrayList hopAsList = new ArrayList<>(Collections.singletonList(sb.getIncrementHops()));
+ ArrayList inst = Recompiler.recompile(null , hopAsList, null, null, true, false, true, false, false, null, 0);
fpb.setIncrementInstructions(inst);
}
target.addProgramBlock(fpb);
@@ -208,56 +219,76 @@ else if (originBlock instanceof ForProgramBlock) //incl parfor
{
BasicProgramBlock bpb = (BasicProgramBlock)originBlock;
StatementBlock sb = bpb.getStatementBlock();
- ArrayList inst = recompile(sb, sb.getHops());
+ ArrayList inst = Recompiler.recompile(sb, sb.getHops(), ExecutionContextFactory.createContext(target), null, true, false, true, false, false, null, 0);
bpb.setInstructions(inst);
target.addProgramBlock(bpb);
}
}
/**
- * Sets resource configurations for the node executing the control program.
+ * Sets resource configurations for executions in single-node mode
+ * including the hardware configurations for the node running the CP.
*
- * @param nodeMemory memory in Bytes
- * @param nodeNumCores number of CPU cores
+ * @param nodeMemory memory budget for the node running CP
+ * @param nodeCores number of CPU cores for the node running CP
*/
- public static void setDriverConfigurations(long nodeMemory, int nodeNumCores) {
+ public static void setSingleNodeResourceConfigs(long nodeMemory, int nodeCores) {
+ DMLScript.setGlobalExecMode(Types.ExecMode.SINGLE_NODE);
// use 90% of the node's memory for the JVM heap -> rest needed for the OS
- InfrastructureAnalyzer.setLocalMaxMemory((long) (0.9 * nodeMemory));
- InfrastructureAnalyzer.setLocalPar(nodeNumCores);
+ long effectiveSingleNodeMemory = (long) (nodeMemory * JVM_MEMORY_FACTOR);
+ // CPU core would be shared with OS -> no further limitation
+ InfrastructureAnalyzer.setLocalMaxMemory(effectiveSingleNodeMemory);
+ InfrastructureAnalyzer.setLocalPar(nodeCores);
}
/**
- * Sets resource configurations for the cluster of nodes
- * executing the Spark jobs.
+ * Sets resource configurations for executions in hybrid mode
+ * including the hardware configurations for the node running the CP
+ * and the worker nodes running Spark executors
*
- * @param numExecutors number of nodes in cluster
- * @param nodeMemory memory in Bytes per node
- * @param nodeNumCores number of CPU cores per node
+ * @param driverMemory memory budget for the node running CP
+ * @param driverCores number of CPU cores for the node running CP
+ * @param numExecutors number of nodes in cluster
+ * @param executorMemory memory budget for the nodes running executors
+ * @param executorCores number of CPU cores for the nodes running executors
*/
- public static void setExecutorConfigurations(int numExecutors, long nodeMemory, int nodeNumCores) {
- // TODO: think of reasonable factor for the JVM heap as prt of the node's memory
- if (numExecutors > 0) {
- DMLScript.setGlobalExecMode(Types.ExecMode.HYBRID);
- SparkConf sparkConf = SparkExecutionContext.createSystemDSSparkConf();
- // ------------------ Static Configurations -------------------
- // TODO: think how to avoid setting them every time
- sparkConf.set("spark.master", "local[*]");
- sparkConf.set("spark.app.name", "SystemDS");
- sparkConf.set("spark.memory.useLegacyMode", "false");
- // ------------------ Static Configurations -------------------
- // ------------------ Dynamic Configurations -------------------
- sparkConf.set("spark.executor.memory", (nodeMemory/(1024*1024))+"m");
- sparkConf.set("spark.executor.instances", Integer.toString(numExecutors));
- sparkConf.set("spark.executor.cores", Integer.toString(nodeNumCores));
- // not setting "spark.default.parallelism" on purpose -> allows re-initialization
- // ------------------ Dynamic Configurations -------------------
- SparkExecutionContext.initLocalSparkContext(sparkConf);
- } else {
- throw new RuntimeException("The given number of executors was 0");
+ public static void setSparkClusterResourceConfigs(long driverMemory, int driverCores, int numExecutors, long executorMemory, int executorCores) {
+ if (numExecutors <= 0) {
+ throw new RuntimeException("The given number of executors was non-positive");
}
- }
+ // ------------------- CP (driver) configurations -------------------
+ // use at most 90% of the node's memory for the JVM heap -> rest needed for the OS and resource management
+ // adapt the minimum based on the needs of the YARN RM
+ long effectiveDriverMemory = calculateEffectiveDriverMemoryBudget(driverMemory, numExecutors*executorCores);
+ // require that the effective driver memory is at least 1GB and at least half of the total driver memory budget
+ if (effectiveDriverMemory <= GBtoBytes(1) || driverMemory > 2*effectiveDriverMemory) {
+ throw new IllegalArgumentException("Driver resources are not sufficient to handle the cluster");
+ }
+ // CPU core would be shared -> no further limitation
+ InfrastructureAnalyzer.setLocalMaxMemory(effectiveDriverMemory);
+ InfrastructureAnalyzer.setLocalPar(driverCores);
- public static void setSingleNodeExecution() {
- DMLScript.setGlobalExecMode(Types.ExecMode.SINGLE_NODE);
+ // ---------------------- Spark Configurations -----------------------
+ DMLScript.setGlobalExecMode(Types.ExecMode.HYBRID);
+ SparkConf sparkConf = SparkExecutionContext.createSystemDSSparkConf();
+
+ // ------------------ Static Spark Configurations --------------------
+ sparkConf.set("spark.master", "local[*]");
+ sparkConf.set("spark.app.name", "SystemDS");
+ sparkConf.set("spark.memory.useLegacyMode", "false");
+
+ // ------------------ Dynamic Spark Configurations -------------------
+ // calculate the effective resource that would be available for the executor containers in YARN
+ int[] effectiveValues = getEffectiveExecutorResources(executorMemory, executorCores, numExecutors);
+ int effectiveExecutorMemory = effectiveValues[0];
+ int effectiveExecutorCores = effectiveValues[1];
+ int effectiveNumExecutor = effectiveValues[2];
+ sparkConf.set("spark.executor.memory", (effectiveExecutorMemory)+"m");
+ sparkConf.set("spark.executor.instances", Integer.toString(effectiveNumExecutor));
+ sparkConf.set("spark.executor.cores", Integer.toString(effectiveExecutorCores));
+ // not setting "spark.default.parallelism" on purpose -> allows re-initialization
+
+ // ------------------- Load Spark Configurations ---------------------
+ SparkExecutionContext.initLocalSparkContext(sparkConf);
}
}
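
As a reading aid, here is a minimal standalone sketch of how such dynamic Spark settings can be assembled from raw node resources. The 90% heap fraction mirrors the comment in the hunk above, all concrete numbers are invented, and the helper is not the patch's `calculateEffectiveDriverMemoryBudget` or `getEffectiveExecutorResources`:

```java
// Minimal standalone sketch (not part of the patch): derive per-executor Spark settings
// from raw node resources under an assumed 90% JVM heap fraction.
import org.apache.spark.SparkConf;

public class SparkConfSketch {
	private static final double JVM_HEAP_FRACTION = 0.9; // assumed share of node memory for the JVM heap

	static long effectiveHeapBytes(long nodeMemoryBytes) {
		return (long) (nodeMemoryBytes * JVM_HEAP_FRACTION);
	}

	public static void main(String[] args) {
		long executorMemory = 32L * 1024 * 1024 * 1024; // 32GB worker node (example value)
		int executorCores = 8;                          // example value
		int numExecutors = 4;                           // example value

		SparkConf conf = new SparkConf()
			.set("spark.master", "local[*]")
			.set("spark.app.name", "SystemDS")
			.set("spark.executor.memory", (effectiveHeapBytes(executorMemory) / (1024 * 1024)) + "m")
			.set("spark.executor.instances", Integer.toString(numExecutors))
			.set("spark.executor.cores", Integer.toString(executorCores));
		System.out.println(conf.toDebugString());
	}
}
```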
diff --git a/src/main/java/org/apache/sysds/resource/ResourceOptimizer.java b/src/main/java/org/apache/sysds/resource/ResourceOptimizer.java
new file mode 100644
index 00000000000..8a2427561a1
--- /dev/null
+++ b/src/main/java/org/apache/sysds/resource/ResourceOptimizer.java
@@ -0,0 +1,487 @@
+package org.apache.sysds.resource;
+
+import org.apache.commons.cli.*;
+import org.apache.commons.configuration2.PropertiesConfiguration;
+import org.apache.commons.configuration2.ex.ConfigurationException;
+import org.apache.sysds.conf.CompilerConfig;
+import org.apache.sysds.conf.ConfigurationManager;
+import org.apache.sysds.resource.enumeration.EnumerationUtils;
+import org.apache.sysds.resource.enumeration.Enumerator;
+import org.apache.sysds.runtime.controlprogram.Program;
+import org.apache.commons.configuration2.io.FileHandler;
+
+import java.io.IOException;
+import java.nio.file.*;
+import java.util.HashMap;
+import java.util.Map;
+
+
+import static org.apache.sysds.resource.CloudUtils.DEFAULT_CLUSTER_LAUNCH_TIME;
+
+public class ResourceOptimizer {
+ private static final String DEFAULT_OPTIONS_FILE = "./options.properties";
+
+ static {
+ ConfigurationManager.getCompilerConfig().set(CompilerConfig.ConfigType.RESOURCE_OPTIMIZATION, true);
+ ConfigurationManager.getCompilerConfig().set(CompilerConfig.ConfigType.REJECT_READ_WRITE_UNKNOWNS, true);
+ }
+ private static final String EMR_INSTANCE_GROUP_FILENAME = "emr_instance_groups.json";
+ private static final String EMR_CONFIGURATIONS_FILENAME = "emr_configurations.json";
+ private static final String EC2_ARGUMENTS_FILENAME = "ec2_configurations.json";
+
+ @SuppressWarnings("static-access")
+ public static Options createOptions() {
+ Options options = new Options();
+
+ Option fileOpt = OptionBuilder.withArgName("filename")
+ .withDescription("specifies DML file to execute; path should be local")
+ .hasArg().create("f");
+ Option optionsOpt = OptionBuilder
+ .withDescription("specifies options file for the resource optimization")
+ .hasArg().create("options");
+ Option nvargsOpt = OptionBuilder.withArgName("key=value")
+ .withDescription("parameterizes DML script with named parameters of the form ; " +
+ " should be a valid identifier in DML")
+ .hasArgs().create("nvargs");
+ Option argsOpt = OptionBuilder.withArgName("argN")
+ .withDescription("specifies positional parameters; " +
+ "first value will replace $1 in DML program, $2 will replace 2nd and so on")
+ .hasArgs().create("args");
+ Option helpOpt = OptionBuilder
+ .withDescription("shows usage message")
+ .create("help");
+
+ options.addOption(fileOpt);
+ options.addOption(optionsOpt);
+ options.addOption(nvargsOpt);
+ options.addOption(argsOpt);
+ options.addOption(helpOpt);
+
+ OptionGroup helpOrFile = new OptionGroup()
+ .addOption(helpOpt)
+ .addOption(fileOpt);
+ helpOrFile.setRequired(true);
+ options.addOptionGroup(helpOrFile);
+
+ options.addOptionGroup(new OptionGroup()
+ .addOption(nvargsOpt).addOption(argsOpt));
+
+ return options;
+ }
+
+ public static Enumerator initEnumerator(CommandLine line, PropertiesConfiguration options) throws ParseException, IOException {
+ // parse script arguments
+ HashMap<String, String> argsMap = new HashMap<>();
+ if (line.hasOption("args")){
+ String[] argValues = line.getOptionValues("args");
+ for (int k=0; k<argValues.length; k++)
+ argsMap.put("$" + (k+1), argValues[k]);
+ }
+ // ... (named-argument parsing and loading of the *Opt option values omitted) ...
+ HashMap<String, String> localInputMap = new HashMap<>();
+ if (!localInputsOpt.isEmpty()) {
+ String[] inputParts = localInputsOpt.split(",");
+ for (String var : inputParts){
+ String[] varParts = var.split("=");
+ if (varParts.length != 2) {
+ throw new RuntimeException("Invalid local variable pairs declaration: " + var);
+ }
+ if (!argsMap.containsValue(varParts[0])) {
+ throw new RuntimeException("Option for local input does not match any given argument: " + varParts[0]);
+ }
+ String argName = getKeyByValue(argsMap, varParts[0]);
+ // update variables for compilation
+ argsMap.put(argName, varParts[1]);
+ // fill a map for later replacement back after first compilation
+ localInputMap.put(varParts[1], varParts[0]);
+ }
+ }
+ // replace S3 filesystem identifier to match the available hadoop connector if needed
+ if (argsMap.values().stream().anyMatch(var -> var.startsWith("s3"))) {
+ String s3Filesystem = getAvailableHadoopS3Filesystem();
+ replaceS3Filesystem(argsMap, s3Filesystem);
+ }
+
+ // materialize the options
+
+ Enumerator.EnumerationStrategy strategy;
+ if (enumerationOpt.isEmpty()) {
+ strategy = Enumerator.EnumerationStrategy.GridBased; // default
+ } else {
+ switch (enumerationOpt) {
+ case "grid":
+ strategy = Enumerator.EnumerationStrategy.GridBased;
+ break;
+ case "interest":
+ strategy = Enumerator.EnumerationStrategy.InterestBased;
+ break;
+ case "prune":
+ strategy = Enumerator.EnumerationStrategy.PruneBased;
+ break;
+ default:
+ throw new ParseException("Unsupported identifier for enumeration strategy: " + line.getOptionValue("enum"));
+ }
+ }
+
+ Enumerator.OptimizationStrategy optimizedFor;
+ if (optimizationOpt.isEmpty()) {
+ optimizedFor = Enumerator.OptimizationStrategy.MinCosts;
+ } else {
+ switch (optimizationOpt) {
+ case "costs":
+ optimizedFor = Enumerator.OptimizationStrategy.MinCosts;
+ break;
+ case "time":
+ optimizedFor = Enumerator.OptimizationStrategy.MinTime;
+ break;
+ case "price":
+ optimizedFor = Enumerator.OptimizationStrategy.MinPrice;
+ break;
+ default:
+ throw new ParseException("Unsupported identifier for optimization strategy: " + line.getOptionValue("optimizeFor"));
+ }
+ }
+
+ if (optimizedFor == Enumerator.OptimizationStrategy.MinCosts && !costsWeightOpt.isEmpty()) {
+ double costsWeighFactor = Double.parseDouble(costsWeightOpt);
+ if (costsWeighFactor < 0.0 || costsWeighFactor > 1.0) {
+ throw new ParseException("The provided option 'price' for -enum requires additionally an option for -maxTime");
+ }
+ Enumerator.setCostsWeightFactor(costsWeighFactor);
+ } else if (!costsWeightOpt.isEmpty()) {
+ System.err.println("Warning: option MAX_PRICE is relevant only for OPTIMIZATION_FUNCTION 'time'");
+ }
+
+ if (optimizedFor == Enumerator.OptimizationStrategy.MinTime) {
+ if (maxPriceOpt.isEmpty()) {
+ throw new ParseException("Providing the option MAX_PRICE value is required " +
+ "when OPTIMIZATION_FUNCTION is set to 'time'");
+ }
+ double priceConstraint = Double.parseDouble(maxPriceOpt);
+ if (priceConstraint <= 0) {
+ throw new ParseException("Invalid value for option MIN_PRICE " +
+ "when option OPTIMIZATION_FUNCTION is set to 'time'");
+ }
+ Enumerator.setMinPrice(priceConstraint);
+ } else if (!maxPriceOpt.isEmpty()) {
+ System.err.println("Warning: option MAX_PRICE is relevant only for OPTIMIZATION_FUNCTION 'time'");
+ }
+
+ if (optimizedFor == Enumerator.OptimizationStrategy.MinPrice) {
+ if (maxTimeOpt.isEmpty()) {
+ throw new ParseException("Providing the option MAX_TIME value is required " +
+ "when OPTIMIZATION_FUNCTION is set to 'price'");
+ }
+ double timeConstraint = Double.parseDouble(maxTimeOpt);
+ if (timeConstraint <= 0) {
+ throw new ParseException("Missing or invalid value for option MIN_TIME " +
+ "when option OPTIMIZATION_FUNCTION is set to 'price'");
+ }
+ Enumerator.setMinTime(timeConstraint);
+ } else if (!maxTimeOpt.isEmpty()) {
+ System.err.println("Warning: option MAX_TIME is relevant only for OPTIMIZATION_FUNCTION 'price'");
+ }
+
+ if (!cpuQuotaOpt.isEmpty()) {
+ int quotaForNumCores = Integer.parseInt(cpuQuotaOpt);
+ if (quotaForNumCores < 32) {
+ throw new ParseException("CPU quota of under 32 number of cores is not allowed");
+ }
+ Enumerator.setCpuQuota(quotaForNumCores);
+ }
+
+ int minExecutors = minExecutorsOpt.isEmpty()? -1 : Integer.parseInt(minExecutorsOpt);
+ int maxExecutors = maxExecutorsOpt.isEmpty()? -1 : Integer.parseInt(maxExecutorsOpt);
+ String[] instanceFamilies = instanceFamiliesOpt.isEmpty()? null : instanceFamiliesOpt.split(",");
+ String[] instanceSizes = instanceSizesOpt.isEmpty()? null : instanceSizesOpt.split(",");
+ // parse arguments specific to enumeration strategies
+ int stepSize = 1;
+ int expBase = -1;
+ if (strategy == Enumerator.EnumerationStrategy.GridBased) {
+ if (!stepSizeOpt.isEmpty())
+ stepSize = Integer.parseInt(stepSizeOpt);
+ if (!expBaseOpt.isEmpty())
+ expBase = Integer.parseInt(expBaseOpt);
+ } else {
+ if (!stepSizeOpt.isEmpty())
+ System.err.println("Warning: option STEP_SIZE is relevant only for option ENUMERATION 'grid'");
+ if (!expBaseOpt.isEmpty())
+ System.err.println("Warning: option EXPONENTIAL_BASE is relevant only for option ENUMERATION 'grid'");
+ }
+ boolean interestLargestEstimate = true;
+ boolean interestEstimatesInCP = true;
+ boolean interestBroadcastVars = true;
+ boolean interestOutputCaching = false;
+ if (strategy == Enumerator.EnumerationStrategy.InterestBased) {
+ if (!useLargestEstOpt.isEmpty())
+ interestLargestEstimate = Boolean.parseBoolean(useLargestEstOpt);
+ if (!useCpEstOpt.isEmpty())
+ interestEstimatesInCP = Boolean.parseBoolean(useCpEstOpt);
+ if (!useBroadcastOpt.isEmpty())
+ interestBroadcastVars = Boolean.parseBoolean(useBroadcastOpt);
+ if (!useOutputsOpt.isEmpty())
+ interestOutputCaching = Boolean.parseBoolean(useOutputsOpt);
+ } else {
+ if (!useLargestEstOpt.isEmpty())
+ System.err.println("Warning: option -useLargestEst is relevant only for -enum 'interest'");
+ if (!useCpEstOpt.isEmpty())
+ System.err.println("Warning: option -useCpEstimates is relevant only for -enum 'interest'");
+ if (!useBroadcastOpt.isEmpty())
+ System.err.println("Warning: option -useBroadcasts is relevant only for -enum 'interest'");
+ if (!useOutputsOpt.isEmpty())
+ System.err.println("Warning: option -useOutputs is relevant only for -enum 'interest'");
+ }
+
+ double[] regionalPrices = CloudUtils.loadRegionalPrices(regionTablePathOpt, regionOpt);
+ HashMap allInstances = CloudUtils.loadInstanceInfoTable(infoTablePathOpt, regionalPrices[0], regionalPrices[1]);
+
+ // step 2: compile the initial runtime program
+ Program sourceProgram = ResourceCompiler.compile(line.getOptionValue("f"), argsMap, localInputMap);
+ // step 3: initialize the enumerator
+ // set the mandatory setting
+ Enumerator.Builder builder = new Enumerator.Builder()
+ .withRuntimeProgram(sourceProgram)
+ .withAvailableInstances(allInstances)
+ .withEnumerationStrategy(strategy)
+ .withOptimizationStrategy(optimizedFor);
+ // set min and max number of executors
+ if (maxExecutors >= 0 && minExecutors > maxExecutors) {
+ throw new ParseException("Option for MAX_EXECUTORS should be always greater or equal the option for -minExecutors");
+ }
+ builder.withNumberExecutorsRange(minExecutors, maxExecutors);
+ // set range of instance types
+ try {
+ if (instanceFamilies != null)
+ builder.withInstanceFamilyRange(instanceFamilies);
+ } catch (IllegalArgumentException e) {
+ throw new ParseException("Not all provided options for INSTANCE_FAMILIES are supported or valid. Error thrown at:\n"+e.getMessage());
+ }
+ // set range of instance sizes
+ try {
+ if (instanceSizes != null)
+ builder.withInstanceSizeRange(instanceSizes);
+ } catch (IllegalArgumentException e) {
+ throw new ParseException("Not all provided options for INSTANCE_SIZES are supported or valid. Error thrown at:\n"+e.getMessage());
+ }
+
+ // set step size for grid-based enum.
+ if (strategy == Enumerator.EnumerationStrategy.GridBased && stepSize > 1) {
+ builder.withStepSizeExecutor(stepSize);
+ } else if (stepSize < 1) {
+ throw new ParseException("Invalid option for -stepSize");
+ }
+ // set exponential base for grid-based enum.
+ if (strategy == Enumerator.EnumerationStrategy.GridBased) {
+ builder.withExpBaseExecutors(expBase);
+ }
+ // set flags for interest-based enum.
+ if (strategy == Enumerator.EnumerationStrategy.InterestBased) {
+ builder.withInterestLargestEstimate(interestLargestEstimate)
+ .withInterestEstimatesInCP(interestEstimatesInCP)
+ .withInterestBroadcastVars(interestBroadcastVars)
+ .withInterestOutputCaching(interestOutputCaching);
+
+ }
+ // build the enumerator
+ return builder.build();
+ }
+
+ public static void execute(CommandLine line, PropertiesConfiguration options) throws ParseException, IOException {
+ String outputPath = getOrDefault(options, "OUTPUT_FOLDER", "");
+ // validate the given output path now to avoid errors after the whole optimization process
+ Path folderPath;
+ try {
+ folderPath = Paths.get(outputPath);
+ } catch (InvalidPathException e) {
+ throw new RuntimeException("Given value for option 'OUTPUT_FOLDER' is not a valid path");
+ }
+ try {
+ Files.createDirectory(folderPath);
+ } catch (FileAlreadyExistsException e) {
+ System.err.printf("Folder '%s' already exists on the given path. Files will be overwritten!\n", folderPath);
+ } catch (IOException e) {
+ throw new RuntimeException("Given value for option 'OUTPUT_FOLDER' is not a valid path: "+e);
+ }
+
+ // initialize the enumerator (including initial program compilation)
+ Enumerator enumerator = initEnumerator(line, options);
+ if (enumerator == null) {
+ // help requested
+ return;
+ }
+ System.out.println("Number instances to be used for enumeration: " + enumerator.getInstances().size());
+ System.out.println("All options are set! Enumeration is now running...");
+
+ long startTime = System.currentTimeMillis();
+ // pre-processing (generating search space according to the enumeration strategy)
+ enumerator.preprocessing();
+ // processing (finding the optimal solution)
+ enumerator.processing();
+ // postprocessing (currently only fetching the optimal solution)
+ EnumerationUtils.SolutionPoint optConfig = enumerator.postprocessing();
+ long endTime = System.currentTimeMillis();
+ System.out.println("...enumeration finished for " + ((double) (endTime-startTime))/1000 + " seconds\n");
+ System.out.println("The resulted runtime plan for the optimal configurations is the following:");
+
+ // generate configuration files according to the optimal solution (if the solution is not empty)
+ if (optConfig.getTimeCost() < Double.MAX_VALUE) {
+ if (optConfig.numberExecutors == 0) {
+ String filePath = Paths.get(folderPath.toString(), EC2_ARGUMENTS_FILENAME).toString();
+ CloudUtils.generateEC2ConfigJson(optConfig.driverInstance, filePath);
+ } else {
+ String instanceGroupsPath = Paths.get(folderPath.toString(), EMR_INSTANCE_GROUP_FILENAME).toString();
+ String configurationsPath = Paths.get(folderPath.toString(), EMR_CONFIGURATIONS_FILENAME).toString();
+ CloudUtils.generateEMRInstanceGroupsJson(optConfig, instanceGroupsPath);
+ CloudUtils.generateEMRConfigurationsJson(optConfig, configurationsPath);
+ }
+ } else {
+ System.err.println("Error: The provided combination of target instances and constraints leads to empty solution.");
+ return;
+ }
+ // provide final info to the user
+ String prompt = String.format(
+ "\nEstimated optimal execution time: %.2fs (%.1fs static bootstrap time), price: %.2f$" +
+ "\n\nCluster configuration:\n" + optConfig +
+ "\n\nGenerated configurations stored in folder %s\n",
+ optConfig.getTimeCost(), DEFAULT_CLUSTER_LAUNCH_TIME, optConfig.getMonetaryCost(), outputPath);
+ System.out.println(prompt);
+ System.out.println("Execution suggestions:\n");
+ String executionSuggestions;
+ if (optConfig.numberExecutors == 0) {
+ executionSuggestions = String.format(
+ "Launch the EC2 instance using the script %s.\nUse -help to check the options.\n\n" +
+ "SystemDS relies on memory only for all computations, so in debug mode or for longer estimated execution times,\n" +
+ "please adapt the root EBS volume in case no NVMe storage is attached.\n" +
+ "Note that the storage configurations for EBS from the instance info table are relevant only for EMR cluster executions.\n" +
+ "Increasing the EBS root volume size for larger instances is also recommended.\n" +
+ "Adjusting the root volume configurations is done manually in the %s file.\n" +
+ "\nMore details can be found in the README.md file."
+ , "scripts/resource/launch_ec2.sh", EC2_ARGUMENTS_FILENAME
+ );
+ } else {
+ executionSuggestions = String.format(
+ "Launch the EMR cluster using the script %s.\nUse -help to check the options.\n\n" +
+ "If you decide to run in debug mode and/or the estimated execution time is significantly long,\n" +
+ "please adjust the default EBS root volume size to account for larger log files!\n" +
+ "Currently the Resource Optimizer does not adapt the storage configurations\n" +
+ "and the defaults from the instance info table are used.\n" +
+ "If you constrained the available instances for enumeration via the optional arguments and the input datasets are large,\n" +
+ "please adjust the EBS configurations in the %s file manually following the instructions from the README.md file!\n\n" +
+ "Disable the automatic cluster termination if you want to access the cluster logs or any files not exported to S3 by the DML script.\n" +
+ "\nMore details can be found in the README.md file."
+ , "scripts/resource/launch_emr.sh", EMR_INSTANCE_GROUP_FILENAME
+ );
+ }
+ System.out.println(executionSuggestions);
+ }
+
+ public static void main(String[] args) throws ParseException, IOException, ConfigurationException {
+ // load directly passed options
+ Options cliOptions = createOptions();
+ CommandLineParser clParser = new PosixParser();
+ CommandLine line = clParser.parse(cliOptions, args);
+ if (line.hasOption("help")) {
+ (new HelpFormatter()).printHelp("SystemDS Resource Optimizer", cliOptions);
+ return;
+ }
+ String optionsFile;
+ if (line.hasOption("options")) {
+ optionsFile = line.getOptionValue("options");
+ } else {
+ Path defaultOptions = Paths.get(DEFAULT_OPTIONS_FILE);
+ if (Files.exists(defaultOptions)) {
+ optionsFile = defaultOptions.toString();
+ } else {
+ throw new ParseException("File with options was neither provided or " +
+ "found in the current execution directory: "+DEFAULT_OPTIONS_FILE);
+ }
+ }
+ // load options
+ PropertiesConfiguration options = new PropertiesConfiguration();
+ FileHandler handler = new FileHandler(options);
+ handler.load(optionsFile);
+ // execute the actual main logic
+ execute(line, options);
+ }
+
+ // Helpers ---------------------------------------------------------------------------------------------------------
+
+ private static String getKeyByValue(HashMap<String, String> hashmap, String value) {
+ for (Map.Entry<String, String> pair : hashmap.entrySet()) {
+ if (pair.getValue().equals(value)) {
+ return pair.getKey();
+ }
+ }
+ return null;
+ }
+
+ private static void replaceS3Filesystem(HashMap<String, String> argsMap, String filesystem) {
+ for (Map.Entry<String, String> pair : argsMap.entrySet()) {
+ String[] currentFileParts = pair.getValue().split(":");
+ if (currentFileParts.length != 2) continue;
+ if (!currentFileParts[0].startsWith("s3")) continue;
+ pair.setValue(String.format("%s:%s", filesystem, currentFileParts[1]));
+ }
+ }
+
+ private static String getAvailableHadoopS3Filesystem() {
+ try {
+ Class.forName("org.apache.hadoop.fs.s3a.S3AFileSystem");
+ return "s3a";
+ } catch (ClassNotFoundException ignored) {}
+ try {
+ Class.forName("org.apache.hadoop.fs.s3.S3AFileSystem");
+ return "s3";
+ } catch (ClassNotFoundException ignored) {}
+ throw new RuntimeException("No Hadoop S3 Filesystem connector installed");
+ }
+
+ public static String getOrDefault(PropertiesConfiguration config, String key, String defaultValue) {
+ return config.containsKey(key) ? config.getString(key) : defaultValue;
+ }
+}
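
For orientation on the command-line surface defined in `createOptions`, the following sketch shows the same commons-cli 1.x parsing flow in isolation; the script path and argument values are invented for the example:

```java
// Illustrative sketch of the commons-cli 1.x flow used by ResourceOptimizer.main:
// build the options, parse an argument vector, then read the -f and -nvargs values.
import org.apache.commons.cli.*;

public class CliSketch {
	@SuppressWarnings("static-access")
	public static void main(String[] unused) throws ParseException {
		Options options = new Options();
		options.addOption(OptionBuilder.withArgName("filename")
			.withDescription("DML file to execute").hasArg().create("f"));
		options.addOption(OptionBuilder.withArgName("key=value")
			.withDescription("named script parameters").hasArgs().create("nvargs"));

		String[] args = {"-f", "scripts/algorithm.dml", "-nvargs", "X=s3://bucket/X.csv", "Y=s3://bucket/Y.csv"};
		CommandLine line = new PosixParser().parse(options, args);

		System.out.println("script: " + line.getOptionValue("f"));
		for (String nv : line.getOptionValues("nvargs"))
			System.out.println("nvarg:  " + nv);
	}
}
```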
diff --git a/src/main/java/org/apache/sysds/resource/cost/CPCostUtils.java b/src/main/java/org/apache/sysds/resource/cost/CPCostUtils.java
index 7d82422050c..422dc443725 100644
--- a/src/main/java/org/apache/sysds/resource/cost/CPCostUtils.java
+++ b/src/main/java/org/apache/sysds/resource/cost/CPCostUtils.java
@@ -26,6 +26,7 @@
import org.apache.sysds.runtime.instructions.cp.*;
import org.apache.sysds.runtime.matrix.data.MatrixBlock;
import org.apache.sysds.runtime.matrix.operators.CMOperator;
+import org.apache.sysds.utils.stats.InfrastructureAnalyzer;
import static org.apache.sysds.resource.cost.IOCostUtils.IOMetrics;
import static org.apache.sysds.runtime.instructions.cp.CPInstruction.CPType;
@@ -121,6 +122,9 @@ public static double getUnaryInstTime(UnaryCPInstruction inst, VarStats input, V
long nflop = getInstNFLOP(instructionType, opcode, output, input);
if (includeWeights)
return getCPUTime(nflop, metrics, output, input, weights);
+ if (!opcodeRequiresScan(opcode)) {
+ return getCPUTime(nflop, metrics, output);
+ }
return getCPUTime(nflop, metrics, output, input);
}
@@ -196,6 +200,13 @@ public static double getMultiReturnBuiltinInstTime(MultiReturnBuiltinCPInstructi
}
// HELPERS
+ public static boolean opcodeRequiresScan(String opcode) {
+ return !opcode.equals("ncol") &&
+ !opcode.equals("nrow") &&
+ !opcode.equals("length") &&
+ !opcode.equals("exists") &&
+ !opcode.equals("lineage");
+ }
public static void assignOutputMemoryStats(CPInstruction inst, VarStats output, VarStats...inputs) {
CPType instType = inst.getCPInstructionType();
String opcode = inst.getOpcode();
@@ -722,12 +733,12 @@ public static long getInstNFLOP(
if (opcode.contains("_tl")) costs = inputs[0].getCellsWithSparsity();
if (opcode.contains("_tr")) costs = inputs[1].getCellsWithSparsity();
// else ba+*/pmm (or any of cpmm/rmm/mapmm from the Spark instructions)
- // reduce by factor of 2: matrix multiplication better than average FLOP count: 2*m*n*p=m*n*p
+ // reduce by a factor of 2: matrix multiplication performs better than the average FLOP count suggests: 2*m*n*p -> m*n*p
return (long) (inputs[0].getN() * inputs[0].getSparsity()) * output.getCells() + (long) costs;
case Append:
if (inputs.length < 2)
throw new RuntimeException("Not all required arguments for Append operation is passed initialized");
- return inputs[0].getCellsWithSparsity() * inputs[1].getCellsWithSparsity();
+ return inputs[0].getCellsWithSparsity() + inputs[1].getCellsWithSparsity();
case Covariance:
if (inputs.length < 1)
throw new RuntimeException("Not all required arguments for Covariance operation is passed initialized");
@@ -753,8 +764,9 @@ public static long getInstNFLOP(
switch (opcode) {
case "+*":
case "-*":
- case "ifelse":
return 2 * output.getCells();
+ case "ifelse":
+ return output.getCells();
case "_map":
throw new RuntimeException("Specific Frame operation with opcode '" + opcode + "' is not supported yet");
default:
@@ -767,7 +779,6 @@ public static long getInstNFLOP(
return 6 * inputs[0].getCellsWithSparsity();
throw new DMLRuntimeException("AggregateTernary operation with opcode '" + opcode + "' is not supported by SystemDS");
case Quaternary:
- //TODO pattern specific and all inputs required
if (inputs.length < 1)
throw new RuntimeException("Not all required arguments for Quaternary operation is passed initialized");
if (opcode.equals("wsloss") || opcode.equals("wdivmm") || opcode.equals("wcemm")) {
@@ -886,7 +897,10 @@ public static long getInstNFLOP(
default:
throw new DMLRuntimeException(" MultiReturnBuiltin operation with opcode '" + opcode + "' is not supported by SystemDS");
}
- return (long) (costs * inputs[0].getCells() * inputs[0].getN());
+ // scale up the nflop value to represent that the operations are executed by a single thread only
+ // adapt later for fft/fft_linearized since they utilize all threads
+ int cpuFactor = InfrastructureAnalyzer.getLocalParallelism();
+ return (long) (cpuFactor * costs * inputs[0].getCells() * inputs[0].getN());
case Prefetch:
case EvictLineageCache:
case Broadcast:
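
The opcodes excluded by the new `opcodeRequiresScan` helper are pure metadata lookups, so no input scan time needs to be charged for them. A tiny re-expression of the same predicate using a set, purely illustrative and not the patch code:

```java
// Sketch with the same semantics as CPCostUtils.opcodeRequiresScan: these opcodes only
// inspect metadata (dimensions, existence, lineage), so no data scan time is charged.
import java.util.Set;

public class ScanCheckSketch {
	private static final Set<String> METADATA_OPCODES =
		Set.of("ncol", "nrow", "length", "exists", "lineage");

	static boolean opcodeRequiresScan(String opcode) {
		return !METADATA_OPCODES.contains(opcode);
	}

	public static void main(String[] args) {
		System.out.println(opcodeRequiresScan("nrow")); // false -> no input read time charged
		System.out.println(opcodeRequiresScan("+"));    // true  -> input load time estimated
	}
}
```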
diff --git a/src/main/java/org/apache/sysds/resource/cost/CostEstimationWrapper.java b/src/main/java/org/apache/sysds/resource/cost/CostEstimationWrapper.java
deleted file mode 100644
index 259fb19f228..00000000000
--- a/src/main/java/org/apache/sysds/resource/cost/CostEstimationWrapper.java
+++ /dev/null
@@ -1,28 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-
-package org.apache.sysds.resource.cost;
-
-
-public class CostEstimationWrapper
-{
-
-
-
-}
diff --git a/src/main/java/org/apache/sysds/resource/cost/CostEstimator.java b/src/main/java/org/apache/sysds/resource/cost/CostEstimator.java
index 0f7fc5d71f3..bf6afbe9d23 100644
--- a/src/main/java/org/apache/sysds/resource/cost/CostEstimator.java
+++ b/src/main/java/org/apache/sysds/resource/cost/CostEstimator.java
@@ -39,6 +39,7 @@
import static org.apache.sysds.lops.Data.PREAD_PREFIX;
import static org.apache.sysds.lops.DataGen.*;
+import static org.apache.sysds.resource.cost.CPCostUtils.opcodeRequiresScan;
import static org.apache.sysds.resource.cost.IOCostUtils.*;
import static org.apache.sysds.resource.cost.SparkCostUtils.getRandInstTime;
@@ -52,16 +53,18 @@
public class CostEstimator
{
private static final long MIN_MEMORY_TO_TRACK = 1024 * 1024; // 1MB
- private static final int DEFAULT_NUM_ITER = 15;
+ private static final int DEFAULT_NUM_ITER = 10;
+ // leaves room for GC overhead
+ private static final double MEM_ALLOCATION_LIMIT_FRACTION = 0.9;
// Non-static members
private final Program _program;
- private final IOCostUtils.IOMetrics driverMetrics;
- private final IOCostUtils.IOMetrics executorMetrics;
// declare here the hashmaps
private final HashMap _stats;
private final HashSet _functions;
private final long localMemoryLimit; // refers to the drivers JVM memory
private long freeLocalMemory;
+ private final IOCostUtils.IOMetrics driverMetrics;
+ private final IOCostUtils.IOMetrics executorMetrics;
/**
* Entry point for estimating the execution time of a program.
@@ -78,13 +81,33 @@ public static double estimateExecutionTime(Program program, CloudInstance driver
}
public CostEstimator(Program program, CloudInstance driverNode, CloudInstance executorNode) {
_program = program;
- driverMetrics = new IOCostUtils.IOMetrics(driverNode);
- executorMetrics = executorNode != null? new IOCostUtils.IOMetrics(executorNode) : null;
// initialize here the hashmaps
_stats = new HashMap<>();
_functions = new HashSet<>();
- localMemoryLimit = (long) OptimizerUtils.getLocalMemBudget();
+ localMemoryLimit = (long) (OptimizerUtils.getLocalMemBudget() * MEM_ALLOCATION_LIMIT_FRACTION);
freeLocalMemory = localMemoryLimit;
+ driverMetrics = new IOMetrics(driverNode);
+ if (executorNode == null) {
+ // estimation for single node execution -> no executor resources
+ executorMetrics = null;
+ } else {
+ // estimation for hybrid execution
+ // adapt the CPU related metrics needed
+ int dedicatedExecutorCores =
+ SparkExecutionContext.getDefaultParallelism(false) / SparkExecutionContext.getNumExecutors();
+ long effectiveExecutorFlops = (long) (executorNode.getFLOPS() *
+ ((double) executorNode.getVCPUs() / dedicatedExecutorCores));
+ // adapting the rest of the metrics not needed since the OS and resource management tasks
+ // would not consume large portion of the memory/storage/network bandwidth in the general case
+ executorMetrics = new IOMetrics(
+ effectiveExecutorFlops,
+ dedicatedExecutorCores,
+ executorNode.getMemoryBandwidth(),
+ executorNode.getDiskReadBandwidth(),
+ executorNode.getDiskWriteBandwidth(),
+ executorNode.getNetworkBandwidth()
+ );
+ }
}
/**
@@ -274,6 +297,10 @@ public void maintainStats(Instruction inst) {
switch (opcode) {
case "createvar":
DataCharacteristics dataCharacteristics = vinst.getMetaData().getDataCharacteristics();
+ if (!dataCharacteristics.nnzKnown()) {
+ // assign NNZ if unknown (-1) to avoid negative results when calculating the estimated object size
+ dataCharacteristics.setNonZeros(dataCharacteristics.getLength());
+ }
VarStats varStats = new VarStats(varName, dataCharacteristics);
if (vinst.getInput1().getName().startsWith(PREAD_PREFIX)) {
// NOTE: add I/O here although at execution the reading is done when the input is needed
@@ -372,7 +399,7 @@ public double getTimeEstimateInst(Instruction inst) throws CostEstimationExcepti
*
* @param inst instruction for estimation
* @return estimated time in seconds
- * @throws CostEstimationException ?
+ * @throws CostEstimationException when the hardware configuration is not sufficient
*/
public double getTimeEstimateCPInst(CPInstruction inst) throws CostEstimationException {
double time = 0;
@@ -423,8 +450,9 @@ else if (inst instanceof UnaryCPInstruction) {
} else {
CPCostUtils.assignOutputMemoryStats(inst, output, input);
}
-
- time += loadCPVarStatsAndEstimateTime(input);
+ if (opcodeRequiresScan(inst.getOpcode())) {
+ time += loadCPVarStatsAndEstimateTime(input);
+ } // else -> no read required
time += weights == null ? 0 : loadCPVarStatsAndEstimateTime(weights);
time += CPCostUtils.getUnaryInstTime(uinst, input, weights, output, driverMetrics);
}
@@ -551,6 +579,12 @@ else if (inst instanceof BuiltinNaryCPInstruction) {
if (output != null)
putInMemory(output);
+ // detection for functionality bugs
+ if (time < 0) {
+ throw new RuntimeException("Unexpected negative value at estimating CP instruction execution time");
+ } else if (time == Double.POSITIVE_INFINITY) {
+ throw new RuntimeException("Unexpected infinity value at estimating CP instruction execution time");
+ }
return time;
}
@@ -599,6 +633,8 @@ public double parseSPInst(SPInstruction inst) throws CostEstimationException {
output = getStats(cinst.getOutputVariableName());
SparkCostUtils.assignOutputRDDStats(inst, output, input);
+
+ output.fileInfo = input.fileInfo;
output.rddStats.checkpoint = true;
// assume the rdd object is only marked as checkpoint;
// adding spilling or serializing cost is skipped
@@ -649,7 +685,7 @@ public double parseSPInst(SPInstruction inst) throws CostEstimationException {
if (ixdinst.getOpcode().equals(LeftIndex.OPCODE)) {
loadTime += loadRDDStatsAndEstimateTime(input2);
} else { // mapLeftIndex
- loadTime += loadCPVarStatsAndEstimateTime(input2);
+ loadTime += loadBroadcastVarStatsAndEstimateTime(input2);
}
} else {
input1 = getStats(ixdinst.input1.getName());
@@ -717,7 +753,7 @@ public double parseSPInst(SPInstruction inst) throws CostEstimationException {
// handle input rdd loading
double loadTime = loadRDDStatsAndEstimateTime(input1);
if (inst instanceof BinaryMatrixBVectorSPInstruction) {
- loadTime += loadCPVarStatsAndEstimateTime(input2);
+ loadTime += loadBroadcastVarStatsAndEstimateTime(input2);
} else {
loadTime += loadRDDStatsAndEstimateTime(input2);
}
@@ -733,7 +769,7 @@ public double parseSPInst(SPInstruction inst) throws CostEstimationException {
double loadTime = loadRDDStatsAndEstimateTime(input1);
VarStats input2 = getStats(ainst.input2.getName());
if (ainst instanceof AppendMSPInstruction) {
- loadTime += loadCPVarStatsAndEstimateTime(input2);
+ loadTime += loadBroadcastVarStatsAndEstimateTime(input2);
} else {
loadTime += loadRDDStatsAndEstimateTime(input2);
}
@@ -757,7 +793,7 @@ public double parseSPInst(SPInstruction inst) throws CostEstimationException {
input2 = getStats(binst.input1.getName());
}
loadTime += loadRDDStatsAndEstimateTime(input1);
- loadTime += loadCPVarStatsAndEstimateTime(input2);
+ loadTime += loadBroadcastVarStatsAndEstimateTime(input2);
} else {
input1 = getStats(binst.input1.getName());
input2 = getStats(binst.input2.getName());
@@ -774,10 +810,10 @@ public double parseSPInst(SPInstruction inst) throws CostEstimationException {
VarStats input1 = getStats(mmchaininst.input1.getName());
VarStats input2 = getStats(mmchaininst.input1.getName());
VarStats input3 = null;
- double loadTime = loadRDDStatsAndEstimateTime(input1) + loadCPVarStatsAndEstimateTime(input2);
+ double loadTime = loadRDDStatsAndEstimateTime(input1) + loadBroadcastVarStatsAndEstimateTime(input2);
if (mmchaininst.input3 != null) {
input3 = getStats(mmchaininst.input3.getName());
- loadTime += loadCPVarStatsAndEstimateTime(input3);
+ loadTime += loadBroadcastVarStatsAndEstimateTime(input3);
}
output = getStats(mmchaininst.output.getName());
SparkCostUtils.assignOutputRDDStats(inst, output, input1, input2, input3);
@@ -810,7 +846,7 @@ public double parseSPInst(SPInstruction inst) throws CostEstimationException {
case "rmempty":
input2 = getParameterizedBuiltinParamStats("offset", paramInst.getParameterMap(), false);
if (Boolean.parseBoolean(paramInst.getParameterMap().get("bRmEmptyBC"))) {
- loadTime += input2 != null? loadCPVarStatsAndEstimateTime(input2) : 0; // broadcast
+ loadTime += input2 != null? loadBroadcastVarStatsAndEstimateTime(input2) : 0;
} else {
loadTime += input2 != null? loadRDDStatsAndEstimateTime(input2) : 0;
}
@@ -827,8 +863,36 @@ public double parseSPInst(SPInstruction inst) throws CostEstimationException {
output = getStatsWithDefaultScalar(paramInst.getOutputVariableName());
SparkCostUtils.assignOutputRDDStats(inst, output, input1);
- output.rddStats.cost = loadTime + SparkCostUtils.getParameterizedBuiltinInstTime(paramInst, input1, input2, output,
- driverMetrics, executorMetrics);
+ output.rddStats.cost = loadTime + SparkCostUtils.getParameterizedBuiltinInstTime(paramInst,
+ input1, input2, output, driverMetrics, executorMetrics);
+ } else if (inst instanceof TernarySPInstruction) {
+ TernarySPInstruction tInst = (TernarySPInstruction) inst;
+ VarStats input1 = getStatsWithDefaultScalar(tInst.input1.getName());
+ VarStats input2 = getStatsWithDefaultScalar(tInst.input2.getName());
+ VarStats input3 = getStatsWithDefaultScalar(tInst.input3.getName());
+ double loadTime = loadRDDStatsAndEstimateTime(input1) +
+ loadRDDStatsAndEstimateTime(input2) + loadRDDStatsAndEstimateTime(input3);
+
+ output = getStats(tInst.getOutputVariableName());
+ SparkCostUtils.assignOutputRDDStats(inst, output, input1, input2, input3);
+
+ output.rddStats.cost = loadTime + SparkCostUtils.getTernaryInstTime(tInst,
+ input1, input2, input3, output, executorMetrics);
+ } else if (inst instanceof QuaternarySPInstruction) {
+ // NOTE: not all quaternary instructions supported yet; only
+ // mapwdivmm, mapsigmoid, mapwumm, mapwsloss, mapwcemm
+ QuaternarySPInstruction quatInst = (QuaternarySPInstruction) inst;
+ VarStats input1 = getStats(quatInst.input1.getName());
+ VarStats input2 = getStats(quatInst.input2.getName());
+ VarStats input3 = getStats(quatInst.input3.getName());
+ double loadTime = loadRDDStatsAndEstimateTime(input1) +
+ loadBroadcastVarStatsAndEstimateTime(input2) + loadBroadcastVarStatsAndEstimateTime(input3);
+
+ output = getStatsWithDefaultScalar(quatInst.getOutputVariableName()); // matrix or aggregated scalar
+ SparkCostUtils.assignOutputRDDStats(inst, output, input1, input2, input3);
+
+ output.rddStats.cost = loadTime + SparkCostUtils.getQuaternaryInstTime(quatInst,
+ input1, input2, input3, output, driverMetrics, executorMetrics);
} else if (inst instanceof WriteSPInstruction) {
WriteSPInstruction wInst = (WriteSPInstruction) inst;
VarStats input = getStats(wInst.input1.getName());
@@ -847,12 +911,8 @@ public double parseSPInst(SPInstruction inst) throws CostEstimationException {
//
// } else if (inst instanceof QuantilePickSPInstruction) {
//
-// } else if (inst instanceof TernarySPInstruction) {
-//
// } else if (inst instanceof AggregateTernarySPInstruction) {
//
-// } else if (inst instanceof QuaternarySPInstruction) {
-//
// }
else {
throw new RuntimeException("Unsupported instruction: " + inst.getOpcode());
@@ -891,11 +951,13 @@ public double getTimeEstimateSparkJob(VarStats varToCollect) {
varToCollect.rddStats = null;
}
+ // detection for functionality bugs
if (computeTime < 0 || collectTime < 0) {
- // detection for functionality bugs
throw new RuntimeException("Unexpected negative value at estimating Spark Job execution time");
+ } else if (computeTime == Double.POSITIVE_INFINITY || collectTime == Double.POSITIVE_INFINITY) {
+ throw new RuntimeException("Unexpected infinity value at estimating Spark Job execution time");
}
- return computeTime + computeTime;
+ return collectTime + computeTime;
}
//////////////////////////////////////////////////////////////////////////////////////////////
@@ -915,15 +977,15 @@ private double loadCPVarStatsAndEstimateTime(VarStats input) throws CostEstimati
double loadTime;
// input.fileInfo != null output of reblock inst. -> execution not triggered
// input.rddStats.checkpoint for output of checkpoint inst. -> execution not triggered
- if (input.rddStats != null && (input.fileInfo == null || !input.rddStats.checkpoint)) {
+ if (input.rddStats != null && ((input.fileInfo == null && !input.rddStats.checkpoint) || (input.fileInfo != null && input.rddStats.checkpoint))) {
// loading from RDD
loadTime = getTimeEstimateSparkJob(input);
} else {
// loading from a file
if (input.fileInfo == null || input.fileInfo.length != 2) {
- throw new DMLRuntimeException("Time estimation is not possible without file info.");
- } else if (!input.fileInfo[0].equals(HDFS_SOURCE_IDENTIFIER) && !input.fileInfo[0].equals(S3_SOURCE_IDENTIFIER)) {
- throw new DMLRuntimeException("Time estimation is not possible for data source: " + input.fileInfo[0]);
+ throw new RuntimeException("Time estimation is not possible without file info.");
+ } else if (isInvalidDataSource((String) input.fileInfo[0])) {
+ throw new RuntimeException("Time estimation is not possible for data source: " + input.fileInfo[0]);
}
loadTime = IOCostUtils.getFileSystemReadTime(input, driverMetrics);
}
@@ -932,10 +994,31 @@ private double loadCPVarStatsAndEstimateTime(VarStats input) throws CostEstimati
return loadTime;
}
+ /**
+ * This method emulates the SystemDS mechanism of loading objects into
+ * the CP memory from a file or an existing RDD object and
+ * their preparation for broadcasting - create the broadcast object in-memory.
+ *
+ * @param input variable for broadcasting
+ * @return estimated time in seconds for loading into memory
+ */
+ private double loadBroadcastVarStatsAndEstimateTime(VarStats input) throws CostEstimationException {
+ // step 1: load CP variable as usual
+ double time = loadCPVarStatsAndEstimateTime(input);
+ // step 2: ensure the current memory is sufficient for creating the broadcast var.
+ // use the in-memory size for simplification
+ if (freeLocalMemory - input.allocatedMemory < 0) {
+ throw new CostEstimationException("Insufficient local memory for broadcasting");
+ }
+ // currently time for creating the broadcast var. is not considered
+ return time;
+ }
+
private void putInMemory(VarStats output) throws CostEstimationException {
if (output.isScalar() || output.allocatedMemory <= MIN_MEMORY_TO_TRACK) return;
- if (freeLocalMemory - output.allocatedMemory < 0)
+ if (freeLocalMemory - output.allocatedMemory < 0) {
throw new CostEstimationException("Insufficient local memory");
+ }
freeLocalMemory -= output.allocatedMemory;
}
@@ -988,10 +1071,9 @@ private double loadRDDStatsAndEstimateTime(VarStats input) {
// if input RDD size is initiated -> cost should be calculated
// transfer the cost to the output rdd for lineage proper handling
ret = input.rddStats.cost;
- if (input.rddStats.checkpoint) {
- // cost of checkpoint var transferred only once
- input.rddStats.cost = 0;
- }
+ // NOTE: currently all variables are considered as reusable (via lineage)
+ // or cached/persisted in memory
+ input.rddStats.cost = 0;
} else {
throw new RuntimeException("Initialized RDD stats without initialized data characteristics is undefined behaviour");
}
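
The executor-metrics adaptation in the constructor rescales the node-level FLOPS figure by the ratio of the node's vCPUs to the cores Spark dedicates per executor. A small standalone sketch of that arithmetic, with all numbers invented for illustration:

```java
// Standalone sketch of the executor-metrics adaptation in the CostEstimator constructor.
public class ExecutorFlopsSketch {
	public static void main(String[] args) {
		long nodeFlops = 200_000_000_000L; // advertised FLOPS of the worker node (example)
		int vcpus = 16;                    // vCPUs of the worker node (example)
		int defaultParallelism = 32;       // Spark default parallelism (example)
		int numExecutors = 4;              // executors in the cluster (example)

		int dedicatedExecutorCores = defaultParallelism / numExecutors;     // 8 cores per executor
		long effectiveExecutorFlops =
			(long) (nodeFlops * ((double) vcpus / dedicatedExecutorCores)); // 400 GFLOPS here
		System.out.println(dedicatedExecutorCores + " " + effectiveExecutorFlops);
	}
}
```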
diff --git a/src/main/java/org/apache/sysds/resource/cost/IOCostUtils.java b/src/main/java/org/apache/sysds/resource/cost/IOCostUtils.java
index a04030c1f68..84abe189e78 100644
--- a/src/main/java/org/apache/sysds/resource/cost/IOCostUtils.java
+++ b/src/main/java/org/apache/sysds/resource/cost/IOCostUtils.java
@@ -29,16 +29,17 @@
public class IOCostUtils {
+ // empirical factor to scale down theoretical peak CPU performance
+ private static final double COMPUTE_EFFICIENCY = 0.5;
private static final double READ_DENSE_FACTOR = 0.5;
private static final double WRITE_DENSE_FACTOR = 0.3;
private static final double SPARSE_FACTOR = 0.5;
- private static final double TEXT_FACTOR = 0.3;
- // NOTE: skip using such factors for now
- // private static final double WRITE_MEMORY_FACTOR = 0.9;
- // private static final double WRITE_DISK_FACTOR = 0.5;
+ private static final double TEXT_FACTOR = 0.5;
+ // empirical value for data transfer between S3 and EC2 instances
+ private static final double S3_COMPUTE_BOUND = 1.2 * 1E+9; // FLOP/MB (i.e., 1.2 GFLOP per MB)
private static final double SERIALIZATION_FACTOR = 0.5;
private static final double DESERIALIZATION_FACTOR = 0.8;
- public static final long DEFAULT_FLOPS = 2L * 1024 * 1024 * 1024; // 2 gFLOPS
+ public static final long DEFAULT_FLOPS = 2L * 1024 * 1024 * 1024; // 2 GFLOPS
public static class IOMetrics {
// FLOPS value not directly related to I/O metrics,
@@ -57,11 +58,7 @@ public static class IOMetrics {
double hdfsReadTextSparseBandwidth;
double hdfsWriteTextDenseBandwidth;
double hdfsWriteTextSparseBandwidth;
- // no s3 read/write metrics since it will not be used for any intermediate operations
- double s3ReadTextDenseBandwidth;
- double s3ReadTextSparseBandwidth;
- double s3WriteTextDenseBandwidth;
- double s3WriteTextSparseBandwidth;
+ double s3BandwidthEfficiency;
// Metrics for main memory I/O operations
double memReadBandwidth;
double memWriteBandwidth;
@@ -72,33 +69,38 @@ public static class IOMetrics {
double deserializationBandwidth;
public IOMetrics(CloudInstance instance) {
- this(instance.getFLOPS(), instance.getVCPUs(), instance.getMemorySpeed(), instance.getDiskSpeed(), instance.getNetworkSpeed());
+ this(
+ instance.getFLOPS(),
+ instance.getVCPUs(),
+ instance.getMemoryBandwidth(),
+ instance.getDiskReadBandwidth(),
+ instance.getDiskWriteBandwidth(),
+ instance.getNetworkBandwidth()
+ );
}
- public IOMetrics(long flops, int cores, double memorySpeed, double diskSpeed, double networkSpeed) {
- cpuFLOPS = flops;
+ public IOMetrics(long flops, int cores, double memoryBandwidth, double diskReadBandwidth, double diskWriteBandwidth, double networkBandwidth) {
+ // CPU metrics
+ cpuFLOPS = (long) (flops * COMPUTE_EFFICIENCY);
cpuCores = cores;
+ // Metrics for main memory I/O operations
+ memReadBandwidth = memoryBandwidth;
+ memWriteBandwidth = memoryBandwidth;
+ // Metrics for networking operations
+ networkingBandwidth = networkBandwidth;
// Metrics for disk I/O operations
- localDiskReadBandwidth = diskSpeed;
- localDiskWriteBandwidth = diskSpeed;
+ localDiskReadBandwidth = diskReadBandwidth;
+ localDiskWriteBandwidth = diskWriteBandwidth;
// Assume that the HDFS I/O operations is done always by accessing local blocks
- hdfsReadBinaryDenseBandwidth = diskSpeed * READ_DENSE_FACTOR;
+ hdfsReadBinaryDenseBandwidth = diskReadBandwidth * READ_DENSE_FACTOR;
hdfsReadBinarySparseBandwidth = hdfsReadBinaryDenseBandwidth * SPARSE_FACTOR;
- hdfsWriteBinaryDenseBandwidth = diskSpeed * WRITE_DENSE_FACTOR;
+ hdfsWriteBinaryDenseBandwidth = diskWriteBandwidth * WRITE_DENSE_FACTOR;
hdfsWriteBinarySparseBandwidth = hdfsWriteBinaryDenseBandwidth * SPARSE_FACTOR;
hdfsReadTextDenseBandwidth = hdfsReadBinaryDenseBandwidth * TEXT_FACTOR;
hdfsReadTextSparseBandwidth = hdfsReadBinarySparseBandwidth * TEXT_FACTOR;
hdfsWriteTextDenseBandwidth = hdfsWriteBinaryDenseBandwidth * TEXT_FACTOR;
hdfsWriteTextSparseBandwidth = hdfsWriteBinarySparseBandwidth * TEXT_FACTOR;
- s3ReadTextDenseBandwidth = networkingBandwidth * READ_DENSE_FACTOR * TEXT_FACTOR;
- s3ReadTextSparseBandwidth = s3ReadTextDenseBandwidth * SPARSE_FACTOR;
- s3WriteTextDenseBandwidth = networkingBandwidth * WRITE_DENSE_FACTOR * TEXT_FACTOR;
- s3WriteTextSparseBandwidth = s3WriteTextDenseBandwidth * SPARSE_FACTOR;
- // Metrics for main memory I/O operations
- memReadBandwidth = memorySpeed;
- memWriteBandwidth = memorySpeed;
- // Metrics for networking operations
- networkingBandwidth = networkSpeed;
- // Metrics for (de)serialization,
+ s3BandwidthEfficiency = (S3_COMPUTE_BOUND / cpuFLOPS); // [s/MB]
+ // Metrics for (de)serialization
double currentFlopsFactor = (double) DEFAULT_FLOPS / cpuFLOPS;
serializationBandwidth = memReadBandwidth * SERIALIZATION_FACTOR * currentFlopsFactor;
deserializationBandwidth = memWriteBandwidth * DESERIALIZATION_FACTOR * currentFlopsFactor;
@@ -109,18 +111,14 @@ public IOMetrics(long flops, int cores, double memorySpeed, double diskSpeed, do
//IO Read
public static final double DEFAULT_MBS_MEMORY_BANDWIDTH = 21328.0; // e.g. DDR4-2666
public static final double DEFAULT_MBS_DISK_BANDWIDTH = 600; // e.g. m5.4xlarge, baseline bandwidth: 4750Mbps = 593.75 MB/s
- public static final double DEFAULT_MBS_NETWORK_BANDWIDTH = 640; // e.g. m5.4xlarge, baseline speed bandwidth: 5Gbps = 640MB/s
+ public static final double DEFAULT_MBS_NETWORK_BANDWIDTH = 640; // e.g. m5.4xlarge, baseline bandwidth: 5Gbps = 640MB/s
public static final double DEFAULT_MBS_HDFS_READ_BINARY_DENSE = 150;
public static final double DEFAULT_MBS_HDFS_READ_BINARY_SPARSE = 75;
- public static final double DEFAULT_MBS_S3_READ_TEXT_DENSE = 50;
- public static final double DEFAULT_MBS_S3_READ_TEXT_SPARSE = 25;
public static final double DEFAULT_MBS_HDFS_READ_TEXT_DENSE = 75;
public static final double DEFAULT_MBS_HDFS_READ_TEXT_SPARSE = 50;
// IO Write
public static final double DEFAULT_MBS_HDFS_WRITE_BINARY_DENSE = 120;
public static final double DEFAULT_MBS_HDFS_WRITE_BINARY_SPARSE = 60;
- public static final double DEFAULT_MBS_S3_WRITE_TEXT_DENSE = 30;
- public static final double DEFAULT_MBS_S3_WRITE_TEXT_SPARSE = 20;
public static final double DEFAULT_MBS_HDFS_WRITE_TEXT_DENSE = 40;
public static final double DEFAULT_MBS_HDFS_WRITE_TEXT_SPARSE = 30;
@@ -143,10 +141,7 @@ public IOMetrics() {
hdfsReadTextSparseBandwidth = DEFAULT_MBS_HDFS_READ_TEXT_SPARSE;
hdfsWriteTextDenseBandwidth = DEFAULT_MBS_HDFS_WRITE_TEXT_DENSE;
hdfsWriteTextSparseBandwidth = DEFAULT_MBS_HDFS_WRITE_TEXT_SPARSE;
- s3ReadTextDenseBandwidth = DEFAULT_MBS_S3_READ_TEXT_DENSE;
- s3ReadTextSparseBandwidth = DEFAULT_MBS_S3_READ_TEXT_SPARSE;
- s3WriteTextDenseBandwidth = DEFAULT_MBS_S3_WRITE_TEXT_DENSE;
- s3WriteTextSparseBandwidth = DEFAULT_MBS_S3_WRITE_TEXT_SPARSE;
+ s3BandwidthEfficiency = (S3_COMPUTE_BOUND / cpuFLOPS);
// Metrics for main memory I/O operations
memReadBandwidth = DEFAULT_MBS_MEMORY_BANDWIDTH;
memWriteBandwidth = DEFAULT_MBS_MEMORY_BANDWIDTH;
@@ -273,25 +268,33 @@ public static double getHadoopReadTime(VarStats stats, IOMetrics metrics) {
// since getDiskReadTime() computes the write time utilizing the whole executor resources
// use the fact that / = * /
long hdfsBlockSize = InfrastructureAnalyzer.getHDFSBlockSize();
+
double numPartitions = Math.ceil((double) size / hdfsBlockSize);
- double sizePerExecutorMB = (double) (metrics.cpuCores * hdfsBlockSize) / (1024*1024);
+ double sizePerExecutorMB;
+ if (size < hdfsBlockSize) {
+ // emulate full executor utilization
+ sizePerExecutorMB = (double) (metrics.cpuCores * size) / (1024 * 1024);
+ } else {
+ sizePerExecutorMB = (double) (metrics.cpuCores * hdfsBlockSize) / (1024 * 1024);
+ }
boolean isSparse = MatrixBlock.evalSparseFormatOnDisk(stats.getM(), stats.getN(), stats.getNNZ());
double timePerCore = getStorageReadTime(sizePerExecutorMB, isSparse, sourceType, format, metrics); // same as time per executor
// number of execution waves (maximum task to execute per core)
- double numWaves = Math.ceil(numPartitions / (SparkExecutionContext.getNumExecutors() * metrics.cpuCores));
+ double numWaves = Math.ceil(numPartitions / (SparkExecutionContext.getDefaultParallelism(false)));
return numWaves * timePerCore;
}
private static double getStorageReadTime(double sizeMB, boolean isSparse, String source, Types.FileFormat format, IOMetrics metrics)
{
double time;
- // TODO: consider if the text or binary should be default if format == null
if (format == null || format.isTextFormat()) {
if (source.equals(S3_SOURCE_IDENTIFIER)) {
- if (isSparse)
- time = sizeMB / metrics.s3ReadTextSparseBandwidth;
- else // dense
- time = sizeMB / metrics.s3ReadTextDenseBandwidth;
+ if (isSparse) {
+ time = SPARSE_FACTOR * metrics.s3BandwidthEfficiency * sizeMB;
+ }
+ else {// dense
+ time = metrics.s3BandwidthEfficiency * sizeMB;
+ }
} else { // HDFS
if (isSparse)
time = sizeMB / metrics.hdfsReadTextSparseBandwidth;
@@ -359,16 +362,16 @@ public static double getHadoopWriteTime(VarStats stats, IOMetrics metrics) {
}
protected static double getStorageWriteTime(double sizeMB, boolean isSparse, String source, Types.FileFormat format, IOMetrics metrics) {
- if (format == null || !(source.equals(HDFS_SOURCE_IDENTIFIER) || source.equals(S3_SOURCE_IDENTIFIER))) {
+ if (format == null || isInvalidDataSource(source)) {
throw new RuntimeException("Estimation not possible without source identifier and file format");
}
double time;
if (format.isTextFormat()) {
if (source.equals(S3_SOURCE_IDENTIFIER)) {
if (isSparse)
- time = sizeMB / metrics.s3WriteTextSparseBandwidth;
+ time = SPARSE_FACTOR * metrics.s3BandwidthEfficiency * sizeMB;
else // dense
- time = sizeMB / metrics.s3WriteTextDenseBandwidth;
+ time = metrics.s3BandwidthEfficiency * sizeMB;
} else { // HDFS
if (isSparse)
time = sizeMB / metrics.hdfsWriteTextSparseBandwidth;
@@ -391,7 +394,7 @@ protected static double getStorageWriteTime(double sizeMB, boolean isSparse, Str
}
/**
- * Estimates the time ro parallelize a local object to Spark.
+ * Estimates the time to parallelize a local object to Spark.
*
* @param output RDD statistics for the object to be collected/transferred.
* @param driverMetrics I/O metrics for the receiver - driver node
@@ -399,7 +402,6 @@ protected static double getStorageWriteTime(double sizeMB, boolean isSparse, Str
* @return estimated time in seconds
*/
public static double getSparkParallelizeTime(RDDStats output, IOMetrics driverMetrics, IOMetrics executorMetrics) {
- // TODO: ensure the object related to stats is read in memory already ot add logic to account for its read time
// it is assumed that the RDD object is already created/read
// general idea: time = + ;
// NOTE: currently it is assumed that the serialized data has the same size as the original data, which may not be true in the general case
@@ -525,7 +527,6 @@ protected static double getSparkBroadcastTime(VarStats stats, IOMetrics driverMe
// TODO: ensure the object related to stats is read in memory already or add logic to account for its read time
// it is assumed that the Cp broadcast object is already created/read
// general idea: time = + ;
- // NOTE: currently it is assumed that ht serialized data has the same size as the original data what may not be true in the general case
double sizeMB = (double) OptimizerUtils.estimatePartitionedSizeExactSparsity(stats.characteristics) / (1024 * 1024);
// 1. serialization time
double serializationTime = sizeMB / driverMetrics.serializationBandwidth;
@@ -545,7 +546,10 @@ protected static double getSparkBroadcastTime(VarStats stats, IOMetrics driverMe
public static String getDataSource(String fileName) {
String[] fileParts = fileName.split("://");
if (fileParts.length > 1) {
- return fileParts[0].toLowerCase();
+ String filesystem = fileParts[0].toLowerCase();
+ if (filesystem.matches("\\b(s3|s3a)\\b"))
+ return S3_SOURCE_IDENTIFIER;
+ return filesystem;
}
return HDFS_SOURCE_IDENTIFIER;
}
@@ -571,12 +575,16 @@ private static long getPartitionedFileSize(VarStats fileStats) {
Types.FileFormat format = (Types.FileFormat) fileStats.fileInfo[1];
long size;
if (format == Types.FileFormat.BINARY) {
- size = MatrixBlock.estimateSizeOnDisk(fileStats.getM(), fileStats.getM(), fileStats.getNNZ());
+ size = MatrixBlock.estimateSizeOnDisk(fileStats.getM(), fileStats.getN(), fileStats.getNNZ());
} else if (format.isTextFormat()) {
- size = OptimizerUtils.estimateSizeTextOutput(fileStats.getM(), fileStats.getM(), fileStats.getNNZ(), format);
+ size = OptimizerUtils.estimateSizeTextOutput(fileStats.getM(), fileStats.getN(), fileStats.getNNZ(), format);
} else { // compressed
throw new RuntimeException("Format " + format + " is not supported for estimation yet.");
}
return size;
}
+
+ public static boolean isInvalidDataSource(String identifier) {
+ return !identifier.equals(HDFS_SOURCE_IDENTIFIER) && !identifier.equals(S3_SOURCE_IDENTIFIER);
+ }
}
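
The reworked S3 model above ties the per-MB transfer time to the node's derated compute power instead of a fixed S3 bandwidth. A worked sketch of the resulting read-time estimate; the constants mirror the hunk, while the peak FLOPS figure is an example, not a real instance measurement:

```java
// Worked sketch of the compute-bound S3 transfer model introduced in IOCostUtils.
public class S3TimeSketch {
	private static final double COMPUTE_EFFICIENCY = 0.5;
	private static final double SPARSE_FACTOR = 0.5;
	private static final double S3_COMPUTE_BOUND = 1.2 * 1E+9; // FLOP per MB

	public static void main(String[] args) {
		long rawFlops = 100_000_000_000L;                  // 100 GFLOPS peak (example)
		double cpuFlops = rawFlops * COMPUTE_EFFICIENCY;   // 50 GFLOPS effective
		double secondsPerMB = S3_COMPUTE_BOUND / cpuFlops; // 0.024 s/MB here

		double denseTime  = secondsPerMB * 1024;                 // ~24.6 s for 1 GB of dense text data
		double sparseTime = SPARSE_FACTOR * secondsPerMB * 1024; // sparse data is modeled as ~2x faster
		System.out.println(denseTime + " " + sparseTime);
	}
}
```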
diff --git a/src/main/java/org/apache/sysds/resource/cost/RDDStats.java b/src/main/java/org/apache/sysds/resource/cost/RDDStats.java
index 01ca8f1fc95..0bd72d3cf3e 100644
--- a/src/main/java/org/apache/sysds/resource/cost/RDDStats.java
+++ b/src/main/java/org/apache/sysds/resource/cost/RDDStats.java
@@ -28,6 +28,7 @@ public class RDDStats {
long distributedSize;
int numPartitions;
boolean hashPartitioned;
+ // NOTE: checkpointing not fully considered by the cost estimator currently
boolean checkpoint;
double cost;
boolean isCollected;
diff --git a/src/main/java/org/apache/sysds/resource/cost/SparkCostUtils.java b/src/main/java/org/apache/sysds/resource/cost/SparkCostUtils.java
index 9d9851907a1..66445d20d7a 100644
--- a/src/main/java/org/apache/sysds/resource/cost/SparkCostUtils.java
+++ b/src/main/java/org/apache/sysds/resource/cost/SparkCostUtils.java
@@ -44,7 +44,7 @@ public static double getReblockInstTime(String opcode, VarStats input, VarStats
double readTime = getHadoopReadTime(input, executorMetrics);
long sizeTextFile = OptimizerUtils.estimateSizeTextOutput(input.getM(), input.getN(), input.getNNZ(), (Types.FileFormat) input.fileInfo[1]);
RDDStats textRdd = new RDDStats(sizeTextFile, -1);
- double shuffleTime = getSparkShuffleTime(textRdd, executorMetrics, true);
+ double shuffleTime = getSparkShuffleTime(textRdd, executorMetrics, false);
double timeStage1 = readTime + shuffleTime;
// new stage: transform partitioned shuffled text object into partitioned binary object
long nflop = getInstNFLOP(SPType.Reblock, opcode, output);
@@ -256,7 +256,7 @@ public static double getReorgInstTime(UnarySPInstruction inst, VarStats input, V
output.rddStats.numPartitions = input.rddStats.numPartitions;
output.rddStats.hashPartitioned = input.rddStats.hashPartitioned;
break;
- default: // rsort
+ default: // rsort
String ixretAsString = InstructionUtils.getInstructionParts(inst.getInstructionString())[4];
boolean ixret = ixretAsString.equalsIgnoreCase("true");
int shuffleFactor;
@@ -498,6 +498,69 @@ public static double getParameterizedBuiltinInstTime(ParameterizedBuiltinSPInstr
return dataTransmissionTime + mapTime;
}
+ public static double getTernaryInstTime(TernarySPInstruction tInst, VarStats input1, VarStats input2, VarStats input3, VarStats output, IOMetrics executorMetrics) {
+ RDDStats[] inputRddStats = {}; // to be used later at CPU time estimation (mem. scanning)
+ double dataTransmissionTime = 0;
+ if (!input1.isScalar() && !input2.isScalar()) {
+ inputRddStats = new RDDStats[]{input1.rddStats, input2.rddStats};
+ // input1.join(input2)
+ dataTransmissionTime += getSparkShuffleTime(input1.rddStats, executorMetrics,
+ input1.rddStats.hashPartitioned);
+ dataTransmissionTime += getSparkShuffleTime(input2.rddStats, executorMetrics,
+ input2.rddStats.hashPartitioned);
+ } else if (!input1.isScalar() && !input3.isScalar()) {
+ inputRddStats = new RDDStats[]{input1.rddStats, input3.rddStats};
+ // input1.join(input3)
+ dataTransmissionTime += getSparkShuffleTime(input1.rddStats, executorMetrics,
+ input1.rddStats.hashPartitioned);
+ dataTransmissionTime += getSparkShuffleTime(input3.rddStats, executorMetrics,
+ input3.rddStats.hashPartitioned);
+ } else if (!input2.isScalar() || !input3.isScalar()) {
+ inputRddStats = new RDDStats[]{input2.rddStats, input3.rddStats};
+ // input2.join(input3)
+ dataTransmissionTime += getSparkShuffleTime(input2.rddStats, executorMetrics,
+ input2.rddStats.hashPartitioned);
+ dataTransmissionTime += getSparkShuffleTime(input3.rddStats, executorMetrics,
+ input3.rddStats.hashPartitioned);
+ } else if (!input1.isScalar() && !input2.isScalar() && !input3.isScalar()) {
+ inputRddStats = new RDDStats[]{input1.rddStats, input2.rddStats, input3.rddStats};
+ // input1.join(input2).join(input3)
+ dataTransmissionTime += getSparkShuffleTime(input1.rddStats, executorMetrics,
+ input1.rddStats.hashPartitioned);
+ dataTransmissionTime += getSparkShuffleTime(input2.rddStats, executorMetrics,
+ input2.rddStats.hashPartitioned);
+ dataTransmissionTime += getSparkShuffleTime(input3.rddStats, executorMetrics,
+ input3.rddStats.hashPartitioned);
+ }
+
+ long nflop = getInstNFLOP(SPType.Ternary, tInst.getOpcode(), output, input1, input2, input3);
+ double mapTime = getCPUTime(nflop, output.rddStats.numPartitions, executorMetrics,
+ output.rddStats, inputRddStats);
+
+ return dataTransmissionTime + mapTime;
+ }
+
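+ /**
+ * Estimates the execution time of a map-side quaternary Spark instruction as the time for
+ * broadcasting the second and third input plus the CPU (map) time; "mapwdivmm" adds an
+ * additional shuffle of the output, while "mapwsloss" and "mapwcemm" collect their output.
+ */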
+ public static double getQuaternaryInstTime(QuaternarySPInstruction quatInst, VarStats input1, VarStats input2, VarStats input3, VarStats output, IOMetrics driverMetrics, IOMetrics executorMetrics) {
+ String opcode = quatInst.getOpcode();
+ if (opcode.startsWith("red")) {
+ throw new RuntimeException("Spark Quaternary reduce-operations are not supported yet");
+ }
+ double dataTransmissionTime;
+ dataTransmissionTime = getSparkBroadcastTime(input2, driverMetrics, executorMetrics)
+ + getSparkBroadcastTime(input3, driverMetrics, executorMetrics); // for map-side ops only
+ if (opcode.equals("mapwsloss") || opcode.equals("mapwcemm")) {
+ output.rddStats.isCollected = true;
+ } else if (opcode.equals("mapwdivmm")) {
+ dataTransmissionTime += getSparkShuffleTime(output.rddStats, executorMetrics, true);
+ }
+
+ long nflop = getInstNFLOP(quatInst.getSPInstructionType(), opcode, output, input1);
+ double mapTime = getCPUTime(nflop, input1.rddStats.numPartitions, executorMetrics,
+ output.rddStats, input1.rddStats);
+
+ return dataTransmissionTime + mapTime;
+ }
+
/**
* Computes an estimate for the time needed by the CPU to execute (including memory access)
* an instruction by providing number of floating operations.
@@ -674,139 +737,16 @@ private static long getInstNFLOP(
return CPCostUtils.getInstNFLOP(CPType.Ctable, opcode, output, inputs);
case ParameterizedBuiltin:
return CPCostUtils.getInstNFLOP(CPType.ParameterizedBuiltin, opcode, output, inputs);
+ case Ternary:
+ // only the output is relevant for the calculation
+ return CPCostUtils.getInstNFLOP(CPType.Ternary, opcode, output);
+ case Quaternary:
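+ // strip the "map" prefix (e.g. "mapwdivmm" -> "wdivmm") to reuse the CP NFLOP model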
+ String opcodeRoot = opcode.substring(3);
+ return CPCostUtils.getInstNFLOP(CPType.Quaternary, opcodeRoot, output, inputs);
default:
// all existing cases should have been handled above
throw new DMLRuntimeException("Spark operation type'" + instructionType + "' is not supported by SystemDS");
}
throw new RuntimeException();
}
-
-
-// //ternary aggregate operators
-// case "tak+*":
-// break;
-// case "tack+*":
-// break;
-// // Neural network operators
-// case "conv2d":
-// case "conv2d_bias_add":
-// case "maxpooling":
-// case "relu_maxpooling":
-// case RightIndex.OPCODE:
-// case LeftIndex.OPCODE:
-// case "mapLeftIndex":
-// case "_map",:
-// break;
-// // Spark-specific instructions
-// case Checkpoint.DEFAULT_CP_OPCODE,:
-// break;
-// case Checkpoint.ASYNC_CP_OPCODE,:
-// break;
-// case Compression.OPCODE,:
-// break;
-// case DeCompression.OPCODE,:
-// break;
-// // Parameterized Builtin Functions
-// case "autoDiff",:
-// break;
-// case "contains",:
-// break;
-// case "groupedagg",:
-// break;
-// case "mapgroupedagg",:
-// break;
-// case "rmempty",:
-// break;
-// case "replace",:
-// break;
-// case "rexpand",:
-// break;
-// case "lowertri",:
-// break;
-// case "uppertri",:
-// break;
-// case "tokenize",:
-// break;
-// case "transformapply",:
-// break;
-// case "transformdecode",:
-// break;
-// case "transformencode",:
-// break;
-// case "mappend",:
-// break;
-// case "rappend",:
-// break;
-// case "gappend",:
-// break;
-// case "galignedappend",:
-// break;
-// //ternary instruction opcodes
-// case "ctable",:
-// break;
-// case "ctableexpand",:
-// break;
-//
-// //ternary instruction opcodes
-// case "+*",:
-// break;
-// case "-*",:
-// break;
-// case "ifelse",:
-// break;
-//
-// //quaternary instruction opcodes
-// case WeightedSquaredLoss.OPCODE,:
-// break;
-// case WeightedSquaredLossR.OPCODE,:
-// break;
-// case WeightedSigmoid.OPCODE,:
-// break;
-// case WeightedSigmoidR.OPCODE,:
-// break;
-// case WeightedDivMM.OPCODE,:
-// break;
-// case WeightedDivMMR.OPCODE,:
-// break;
-// case WeightedCrossEntropy.OPCODE,:
-// break;
-// case WeightedCrossEntropyR.OPCODE,:
-// break;
-// case WeightedUnaryMM.OPCODE,:
-// break;
-// case WeightedUnaryMMR.OPCODE,:
-// break;
-// case "bcumoffk+":
-// break;
-// case "bcumoff*":
-// break;
-// case "bcumoff+*":
-// break;
-// case "bcumoffmin",:
-// break;
-// case "bcumoffmax",:
-// break;
-//
-// //central moment, covariance, quantiles (sort/pick)
-// case "cm" ,:
-// break;
-// case "cov" ,:
-// break;
-// case "qsort" ,:
-// break;
-// case "qpick" ,:
-// break;
-//
-// case "binuaggchain",:
-// break;
-//
-// case "write" ,:
-// break;
-//
-//
-// case "spoof":
-// break;
-// default:
-// throw RuntimeException("No complexity factor for op. code: " + opcode);
-// }
}
diff --git a/src/main/java/org/apache/sysds/resource/enumeration/EnumerationUtils.java b/src/main/java/org/apache/sysds/resource/enumeration/EnumerationUtils.java
index 69002269472..fa22f6c4f73 100644
--- a/src/main/java/org/apache/sysds/resource/enumeration/EnumerationUtils.java
+++ b/src/main/java/org/apache/sysds/resource/enumeration/EnumerationUtils.java
@@ -21,9 +21,7 @@
import org.apache.sysds.resource.CloudInstance;
-import java.util.HashMap;
-import java.util.LinkedList;
-import java.util.TreeMap;
+import java.util.*;
public class EnumerationUtils {
/**
@@ -64,6 +62,8 @@ public void initSpace(HashMap instances) {
LinkedList currentList = currentSubTree.get(instance.getVCPUs());
currentList.add(instance);
+ // ensure total order based on price (ascending)
+ currentList.sort(Comparator.comparingDouble(CloudInstance::getPrice));
}
}
}
@@ -87,6 +87,23 @@ public ConfigurationPoint(CloudInstance driverInstance, CloudInstance executorIn
this.executorInstance = executorInstance;
this.numberExecutors = numberExecutors;
}
+
+ @Override
+ public String toString() {
+ StringBuilder builder = new StringBuilder();
+ builder.append("Driver: ").append(driverInstance.getInstanceName());
+ builder.append("\n mem: ").append((double) driverInstance.getMemory()/(1024*1024*1024));
+ builder.append(", v. cores: ").append(driverInstance.getVCPUs());
+ builder.append("\nExecutors: ");
+ if (numberExecutors > 0) {
+ builder.append(numberExecutors).append(" x ").append(executorInstance.getInstanceName());
+ builder.append("\n mem: ").append((double) executorInstance.getMemory()/(1024*1024*1024));
+ builder.append(", v. cores: ").append(executorInstance.getVCPUs());
+ } else {
+ builder.append("-");
+ }
+ return builder.toString();
+ }
}
/**
@@ -109,5 +126,13 @@ public void update(ConfigurationPoint point, double timeCost, double monetaryCos
this.timeCost = timeCost;
this.monetaryCost = monetaryCost;
}
+
+ public double getTimeCost() {
+ return timeCost;
+ }
+
+ public double getMonetaryCost() {
+ return monetaryCost;
+ }
}
}
diff --git a/src/main/java/org/apache/sysds/resource/enumeration/Enumerator.java b/src/main/java/org/apache/sysds/resource/enumeration/Enumerator.java
index 9a51c78a433..b2de8410a5e 100644
--- a/src/main/java/org/apache/sysds/resource/enumeration/Enumerator.java
+++ b/src/main/java/org/apache/sysds/resource/enumeration/Enumerator.java
@@ -20,9 +20,11 @@
package org.apache.sysds.resource.enumeration;
-import org.apache.sysds.resource.AWSUtils;
import org.apache.sysds.resource.CloudInstance;
import org.apache.sysds.resource.CloudUtils;
+import org.apache.sysds.resource.CloudUtils.InstanceFamily;
+import org.apache.sysds.resource.CloudUtils.InstanceSize;
+import org.apache.sysds.resource.CloudUtils.CloudProvider;
import org.apache.sysds.resource.ResourceCompiler;
import org.apache.sysds.resource.cost.CostEstimationException;
import org.apache.sysds.resource.cost.CostEstimator;
@@ -31,23 +33,29 @@
import org.apache.sysds.resource.enumeration.EnumerationUtils.ConfigurationPoint;
import org.apache.sysds.resource.enumeration.EnumerationUtils.SolutionPoint;
-import java.io.IOException;
import java.util.*;
+import java.util.Map.Entry;
+import java.util.concurrent.atomic.AtomicReference;
public abstract class Enumerator {
public enum EnumerationStrategy {
- GridBased, // considering all combination within a given range of configuration
- InterestBased, // considering only combinations of configurations with memory budge close to memory estimates
+ // considering all combinations within a given range of configurations
+ GridBased,
+ // considering only combinations of configurations with memory budget close to the program memory estimates
+ InterestBased,
+ // considering potentially all combinations within a given range of configurations,
+ // but pruning the search space following pre-defined rules
+ PruneBased
}
public enum OptimizationStrategy {
- MinTime, // always prioritize execution time minimization
- MinPrice, // always prioritize operation price minimization
+ MinCosts, // use linearized scoring function to minimize both time and price, no constraints apply
+ MinTime, // minimize execution time constrained to a given price limit
+ MinPrice, // minimize monetary price constrained to a given time limit
}
// Static variables ------------------------------------------------------------------------------------------------
-
public static final int DEFAULT_MIN_EXECUTORS = 0; // Single Node execution allowed
/**
* A reasonable upper bound for the possible number of executors
@@ -56,136 +64,55 @@ public enum OptimizationStrategy {
* have too high distribution overhead
*/
public static final int DEFAULT_MAX_EXECUTORS = 200;
-
- // limit for the ratio number of executor and number
- // of executor per executor
- public static final int MAX_LEVEL_PARALLELISM = 1000;
-
/** Time/Monetary delta for considering optimal solutions as fraction */
public static final double COST_DELTA_FRACTION = 0.02;
+ // Static configurations -------------------------------------------------------------------------------------------
+ static double LINEAR_OBJECTIVE_RATIO = 0.01; // weight of the time cost against the monetary cost
+ static double MAX_TIME = Double.MAX_VALUE; // no limit by default
+ static double MAX_PRICE = Double.MAX_VALUE; // no limit by default
+ /**
+ * A generally applied service quota in AWS - 1152:
+ * the maximum number of vCPUs running at the same time for an account in a single region.
+ */
+ static int CPU_QUOTA = 1152;
+ // setters allowing to overwrite the default static configuration values
+ public static void setCostsWeightFactor(double newFactor) { LINEAR_OBJECTIVE_RATIO = newFactor; }
+ public static void setMinTime(double maxTime) { MAX_TIME = maxTime; }
+ public static void setMinPrice(double maxPrice) { MAX_PRICE = maxPrice; }
+ public static void setCpuQuota(int newQuotaValue) { CPU_QUOTA = newQuotaValue; }
// Instance variables ----------------------------------------------------------------------------------------------
- HashMap instances = null;
+ HashMap instances;
Program program;
- CloudUtils utils;
EnumerationStrategy enumStrategy;
OptimizationStrategy optStrategy;
- private final double maxTime;
- private final double maxPrice;
protected final int minExecutors;
protected final int maxExecutors;
- protected final Set instanceTypesRange;
+ protected final Set instanceTypesRange;
protected final Set instanceSizeRange;
protected final InstanceSearchSpace driverSpace = new InstanceSearchSpace();
protected final InstanceSearchSpace executorSpace = new InstanceSearchSpace();
- protected ArrayList solutionPool = new ArrayList<>();
+ protected AtomicReference optimalSolution = new AtomicReference<>(null);
// Initialization functionality ------------------------------------------------------------------------------------
public Enumerator(Builder builder) {
- if (builder.provider.equals(CloudUtils.CloudProvider.AWS)) {
- utils = new AWSUtils();
- } // as of now no other provider is supported
this.program = builder.program;
+ this.instances = builder.instances;
this.enumStrategy = builder.enumStrategy;
this.optStrategy = builder.optStrategy;
- this.maxTime = builder.maxTime;
- this.maxPrice = builder.maxPrice;
this.minExecutors = builder.minExecutors;
this.maxExecutors = builder.maxExecutors;
- this.instanceTypesRange = builder.instanceTypesRange;
+ this.instanceTypesRange = builder.instanceFamiliesRange;
this.instanceSizeRange = builder.instanceSizeRange;
- }
-
- /**
- * Meant to be used for testing purposes
- * @return ?
- */
- public HashMap getInstances() {
- return instances;
- }
-
- /**
- * Meant to be used for testing purposes
- * @return ?
- */
- public InstanceSearchSpace getDriverSpace() {
- return driverSpace;
- }
-
- /**
- * Meant to be used for testing purposes
- * @param inputSpace ?
- */
- public void setDriverSpace(InstanceSearchSpace inputSpace) {
- driverSpace.putAll(inputSpace);
- }
-
- /**
- * Meant to be used for testing purposes
- * @return ?
- */
- public InstanceSearchSpace getExecutorSpace() {
- return executorSpace;
- }
-
- /**
- * Meant to be used for testing purposes
- * @param inputSpace ?
- */
- public void setExecutorSpace(InstanceSearchSpace inputSpace) {
- executorSpace.putAll(inputSpace);
- }
-
- /**
- * Meant to be used for testing purposes
- * @return ?
- */
- public ArrayList getSolutionPool() {
- return solutionPool;
- }
-
- /**
- * Meant to be used for testing purposes
- * @param solutionPool ?
- */
- public void setSolutionPool(ArrayList solutionPool) {
- this.solutionPool = solutionPool;
- }
-
- /**
- * Setting the available VM instances manually.
- * Meant to be used for testing purposes.
- * @param inputInstances initialized map of instances
- */
- public void setInstanceTable(HashMap inputInstances) {
- instances = new HashMap<>();
- for (String key: inputInstances.keySet()) {
- if (instanceTypesRange.contains(utils.getInstanceType(key))
- && instanceSizeRange.contains(utils.getInstanceSize(key))) {
- instances.put(key, inputInstances.get(key));
- }
- }
- }
-
- /**
- * Loads the info table for the available VM instances
- * and filters out the instances that are not contained
- * in the set of allowed instance types and sizes.
- *
- * @param path csv file with instances' info
- * @throws IOException in case the loading part fails at reading the csv file
- */
- public void loadInstanceTableFile(String path) throws IOException {
- HashMap allInstances = utils.loadInstanceInfoTable(path);
- instances = new HashMap<>();
- for (String key: allInstances.keySet()) {
- if (instanceTypesRange.contains(utils.getInstanceType(key))
- && instanceSizeRange.contains(utils.getInstanceSize(key))) {
- instances.put(key, allInstances.get(key));
- }
- }
+ // init the optimal solution here to avoid errors when comparing before the first update
+ SolutionPoint initSolutionPoint = new SolutionPoint(
+ new ConfigurationPoint(null, null, -1),
+ Double.MAX_VALUE,
+ Double.MAX_VALUE
+ );
+ optimalSolution.set(initSolutionPoint);
}
// Main functionality ----------------------------------------------------------------------------------------------
@@ -206,48 +133,61 @@ public void loadInstanceTableFile(String path) throws IOException {
* dynamically for each parsed executor instance.
*/
public void processing() {
+ long driverMemory, executorMemory;
+ int driverCores, executorCores;
ConfigurationPoint configurationPoint;
- SolutionPoint optSolutionPoint = new SolutionPoint(
- new ConfigurationPoint(null, null, -1),
- Double.MAX_VALUE,
- Double.MAX_VALUE
- );
- for (Map.Entry>> dMemoryEntry: driverSpace.entrySet()) {
+
+ for (Entry>> dMemoryEntry: driverSpace.entrySet()) {
+ driverMemory = dMemoryEntry.getKey();
// loop over the search space to enumerate the driver configurations
- for (Map.Entry> dCoresEntry: dMemoryEntry.getValue().entrySet()) {
+ for (Entry> dCoresEntry: dMemoryEntry.getValue().entrySet()) {
+ driverCores = dCoresEntry.getKey();
// single node execution mode
- if (evaluateSingleNodeExecution(dMemoryEntry.getKey())) {
- program = ResourceCompiler.doFullRecompilation(
- program,
- dMemoryEntry.getKey(),
- dCoresEntry.getKey()
- );
+ if (evaluateSingleNodeExecution(driverMemory, driverCores)) {
+ ResourceCompiler.setSingleNodeResourceConfigs(driverMemory, driverCores);
+ program = ResourceCompiler.doFullRecompilation(program);
+ // the recompiled program is reused for all single node instances with identical memory budget and #v. cores
for (CloudInstance dInstance: dCoresEntry.getValue()) {
+ // iterate over all driver nodes with the currently evaluated memory and #cores values
configurationPoint = new ConfigurationPoint(dInstance);
- updateOptimalSolution(optSolutionPoint, configurationPoint);
+ double[] newEstimates = getCostEstimate(configurationPoint);
+ updateOptimalSolution(newEstimates[0], newEstimates[1], configurationPoint);
}
}
// enumeration for distributed execution
- for (Map.Entry>> eMemoryEntry: executorSpace.entrySet()) {
+ for (Entry>> eMemoryEntry: executorSpace.entrySet()) {
+ executorMemory = eMemoryEntry.getKey();
// loop over the search space to enumerate the executor configurations
- for (Map.Entry> eCoresEntry: eMemoryEntry.getValue().entrySet()) {
- List numberExecutorsSet = estimateRangeExecutors(eMemoryEntry.getKey(), eCoresEntry.getKey());
- // Spark execution mode
+ for (Entry> eCoresEntry: eMemoryEntry.getValue().entrySet()) {
+ executorCores = eCoresEntry.getKey();
+ List numberExecutorsSet =
+ estimateRangeExecutors(driverCores, executorMemory, executorCores);
+ // for Spark execution mode
for (int numberExecutors: numberExecutorsSet) {
- // TODO: avoid full recompilation when the driver memory is not changed
- program = ResourceCompiler.doFullRecompilation(
- program,
- dMemoryEntry.getKey(),
- dCoresEntry.getKey(),
- numberExecutors,
- eMemoryEntry.getKey(),
- eCoresEntry.getKey()
- );
- // TODO: avoid full program cost estimation when the driver instance is not changed
+ try {
+ ResourceCompiler.setSparkClusterResourceConfigs(
+ driverMemory,
+ driverCores,
+ numberExecutors,
+ executorMemory,
+ executorCores
+ );
+ } catch (IllegalArgumentException e) {
+ // insufficient driver memory detected
+ break;
+ }
+ program = ResourceCompiler.doFullRecompilation(program);
+ // the recompiled program is reused for all clusters with identical #executors and
+ // identical memory and #v. cores for driver and executor nodes
for (CloudInstance dInstance: dCoresEntry.getValue()) {
+ // iterate over all driver nodes with the currently evaluated memory and #cores values
for (CloudInstance eInstance: eCoresEntry.getValue()) {
+ // iterate over all executor nodes for the evaluated cluster size
+ // with the currently evaluated memory and #cores values
configurationPoint = new ConfigurationPoint(dInstance, eInstance, numberExecutors);
- updateOptimalSolution(optSolutionPoint, configurationPoint);
+ double[] newEstimates = getCostEstimate(configurationPoint);
+ updateOptimalSolution(newEstimates[0], newEstimates[1], configurationPoint);
}
}
}
@@ -258,72 +198,55 @@ public void processing() {
}
/**
- * Deciding in the overall best solution out
- * of the filled pool of potential solutions
- * after processing.
- * @return single optimal cluster configuration
+ * Retrieves the estimated optimal cluster configuration after processing.
+ *
+ * @return optimal cluster configuration and corresponding costs
*/
public SolutionPoint postprocessing() {
- if (solutionPool.isEmpty()) {
- throw new RuntimeException("Calling postprocessing() should follow calling processing()");
+ if (optimalSolution.get().getTimeCost() == Double.MAX_VALUE) {
+ throw new RuntimeException("No solution has met the constraints. " +
+ "Try adjusting the time/price constraints or switch to the 'MinCosts' optimization strategy");
}
- SolutionPoint optSolution = solutionPool.get(0);
- double bestCost = Double.MAX_VALUE;
- for (SolutionPoint solution: solutionPool) {
- double combinedCost = solution.monetaryCost * solution.timeCost;
- if (combinedCost < bestCost) {
- optSolution = solution;
- bestCost = combinedCost;
- } else if (combinedCost == bestCost) {
- // the ascending order of the searching spaces for driver and executor
- // instances ensures that in case of equally good optimal solutions
- // the first one has at least resource characteristics.
- // This, however, is not valid for the number of executors
- if (solution.numberExecutors < optSolution.numberExecutors) {
- optSolution = solution;
- bestCost = combinedCost;
- }
- }
- }
- return optSolution;
+ return optimalSolution.get();
}
// Helper methods --------------------------------------------------------------------------------------------------
- public abstract boolean evaluateSingleNodeExecution(long driverMemory);
+ public abstract boolean evaluateSingleNodeExecution(long driverMemory, int cores);
/**
* Estimates the minimum and maximum number of
- * executors based on given VM instance characteristics
- * and on the enumeration strategy
+ * executors based on given VM instance characteristics,
+ * the enumeration strategy and the user-defined configurations
*
- * @param executorMemory memory of currently considered executor instance
- * @param executorCores CPU of cores of currently considered executor instance
+ * @param driverCores CPU cores for the currently evaluated driver node
+ * @param executorMemory memory of currently evaluated executor node
+ * @param executorCores CPU cores of currently evaluated executor node
* @return - [min, max]
*/
- public abstract ArrayList estimateRangeExecutors(long executorMemory, int executorCores);
+ public abstract ArrayList estimateRangeExecutors(int driverCores, long executorMemory, int executorCores);
/**
* Estimates the time cost for the current program based on the
* given cluster configurations and following this estimation
* it calculates the corresponding monetary cost.
- * @param point - cluster configuration used for (re)compiling the current program
+ * @param point cluster configuration used for (re)compiling the current program
* @return - [time cost, monetary cost]
*/
- private double[] getCostEstimate(ConfigurationPoint point) {
+ protected double[] getCostEstimate(ConfigurationPoint point) {
// get the estimated time cost
double timeCost;
+ double monetaryCost;
try {
// estimate execution time of the current program
- // TODO: pass further relevant cluster configurations to cost estimator after extending it
- // like for example: FLOPS, I/O and networking speed
timeCost = CostEstimator.estimateExecutionTime(program, point.driverInstance, point.executorInstance)
+ CloudUtils.DEFAULT_CLUSTER_LAUNCH_TIME;
+ monetaryCost = CloudUtils.calculateClusterPrice(point, timeCost, CloudProvider.AWS);
} catch (CostEstimationException e) {
- throw new RuntimeException(e.getMessage());
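+ // mark the configuration as invalid instead of aborting the whole enumeration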
+ timeCost = Double.MAX_VALUE;
+ monetaryCost = Double.MAX_VALUE;
}
// calculate monetary cost
- double monetaryCost = utils.calculateClusterPrice(point, timeCost);
return new double[] {timeCost, monetaryCost}; // time cost, monetary cost
}
@@ -334,54 +257,66 @@ private double[] getCostEstimate(ConfigurationPoint point) {
* and the new cost estimation, it decides if the given cluster configuration
* can be potential optimal solution having lower cost or such a cost
* that is negligibly higher than the current lowest one.
- * @param currentOptimal solution point with the lowest cost
- * @param newPoint new cluster configuration for estimation
+ *
+ * @param newTimeEstimate estimated time cost for the given configurations
+ * @param newMonetaryEstimate estimated monetary cost for the given configurations
+ * @param newPoint new cluster configuration for estimation
*/
- private void updateOptimalSolution(SolutionPoint currentOptimal, ConfigurationPoint newPoint) {
- // TODO: clarify if setting max time and max price simultaneously makes really sense
- SolutionPoint newPotentialSolution;
- boolean replaceCurrentOptimal = false;
- double[] newCost = getCostEstimate(newPoint);
- if (optStrategy == OptimizationStrategy.MinTime) {
- if (newCost[1] > maxPrice || newCost[0] >= currentOptimal.timeCost * (1 + COST_DELTA_FRACTION)) {
+ public void updateOptimalSolution(double newTimeEstimate, double newMonetaryEstimate, ConfigurationPoint newPoint) {
+ SolutionPoint currentOptimal = optimalSolution.get();
+ if (optStrategy == OptimizationStrategy.MinCosts) {
+ double optimalScore = linearScoringFunction(currentOptimal.timeCost, currentOptimal.monetaryCost);
+ double newScore = linearScoringFunction(newTimeEstimate, newMonetaryEstimate);
+ if (newScore > optimalScore) {
+ return;
+ }
+ if (newScore == optimalScore && newMonetaryEstimate >= currentOptimal.monetaryCost) {
+ // on equal scores prefer the configuration with the lower monetary cost
+ return;
+ }
+ } else if (optStrategy == OptimizationStrategy.MinTime) {
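+ // MinTime: enforce the price limit and keep the faster configuration, breaking ties by lower price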
+ if (newMonetaryEstimate > MAX_PRICE || newTimeEstimate > currentOptimal.timeCost) {
+ return;
+ }
+ if (newTimeEstimate == currentOptimal.timeCost && newMonetaryEstimate > currentOptimal.monetaryCost) {
return;
}
- if (newCost[0] < currentOptimal.timeCost) replaceCurrentOptimal = true;
} else if (optStrategy == OptimizationStrategy.MinPrice) {
- if (newCost[0] > maxTime || newCost[1] >= currentOptimal.monetaryCost * (1 + COST_DELTA_FRACTION)) {
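+ // MinPrice: enforce the time limit and keep the cheaper configuration, breaking ties by lower time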
+ if (newTimeEstimate > MAX_TIME || newMonetaryEstimate > currentOptimal.monetaryCost) {
+ return;
+ }
+ if (newMonetaryEstimate == currentOptimal.monetaryCost && newTimeEstimate > currentOptimal.timeCost) {
return;
}
- if (newCost[1] < currentOptimal.monetaryCost) replaceCurrentOptimal = true;
- }
- newPotentialSolution = new SolutionPoint(newPoint, newCost[0], newCost[1]);
- solutionPool.add(newPotentialSolution);
- if (replaceCurrentOptimal) {
- currentOptimal.update(newPoint, newCost[0], newCost[1]);
}
+ SolutionPoint newSolution = new SolutionPoint(newPoint, newTimeEstimate, newMonetaryEstimate);
+ optimalSolution.set(newSolution);
+ }
+
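+ /** Weighted sum of time and monetary cost; LINEAR_OBJECTIVE_RATIO controls how strongly the time cost is weighted against the price (lower score is better) */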
+ static double linearScoringFunction(double time, double price) {
+ return LINEAR_OBJECTIVE_RATIO * time + (1 - LINEAR_OBJECTIVE_RATIO) * price;
}
// Class builder ---------------------------------------------------------------------------------------------------
public static class Builder {
- private final CloudUtils.CloudProvider provider = CloudUtils.CloudProvider.AWS; // currently default and only choice
- private Program program;
+ private Program program = null;
+ private HashMap instances = null;
private EnumerationStrategy enumStrategy = null;
private OptimizationStrategy optStrategy = null;
- private double maxTime = -1d;
- private double maxPrice = -1d;
private int minExecutors = DEFAULT_MIN_EXECUTORS;
private int maxExecutors = DEFAULT_MAX_EXECUTORS;
- private Set instanceTypesRange = null;
- private Set instanceSizeRange = null;
+ private Set instanceFamiliesRange = null;
+ private Set instanceSizeRange = null;
// GridBased specific ------------------------------------------------------------------------------------------
private int stepSizeExecutors = 1;
private int expBaseExecutors = -1; // flag for exp. increasing number of executors if -1
// InterestBased specific --------------------------------------------------------------------------------------
- private boolean fitDriverMemory = true;
- private boolean fitBroadcastMemory = true;
- private boolean checkSingleNodeExecution = false;
- private boolean fitCheckpointMemory = false;
+ private boolean interestLargestEstimate = true;
+ private boolean interestEstimatesInCP = true;
+ private boolean interestBroadcastVars = true;
+ private boolean interestOutputCaching = false; // caching not fully considered by the cost estimator
public Builder() {}
public Builder withRuntimeProgram(Program program) {
@@ -389,6 +324,11 @@ public Builder withRuntimeProgram(Program program) {
return this;
}
+ public Builder withAvailableInstances(HashMap instances) {
+ this.instances = instances;
+ return this;
+ }
+
public Builder withEnumerationStrategy(EnumerationStrategy strategy) {
this.enumStrategy = strategy;
return this;
@@ -399,31 +339,14 @@ public Builder withOptimizationStrategy(OptimizationStrategy strategy) {
return this;
}
- public Builder withTimeLimit(double time) {
- if (time < CloudUtils.MINIMAL_EXECUTION_TIME) {
- throw new IllegalArgumentException(CloudUtils.MINIMAL_EXECUTION_TIME +
- "s is the minimum target execution time.");
- }
- this.maxTime = time;
- return this;
- }
-
- public Builder withBudget(double price) {
- if (price <= 0) {
- throw new IllegalArgumentException("The given budget (target price) should be positive");
- }
- this.maxPrice = price;
- return this;
- }
-
public Builder withNumberExecutorsRange(int min, int max) {
- this.minExecutors = min;
- this.maxExecutors = max;
+ this.minExecutors = min < 0? DEFAULT_MIN_EXECUTORS : min;
+ this.maxExecutors = max < 0? DEFAULT_MAX_EXECUTORS : max;
return this;
}
- public Builder withInstanceTypeRange(String[] instanceTypes) {
- this.instanceTypesRange = typeRangeFromStrings(instanceTypes);
+ public Builder withInstanceFamilyRange(String[] instanceFamilies) {
+ this.instanceFamiliesRange = typeRangeFromStrings(instanceFamilies);
return this;
}
@@ -437,94 +360,180 @@ public Builder withStepSizeExecutor(int stepSize) {
return this;
}
-
- public Builder withFitDriverMemory(boolean fitDriverMemory) {
- this.fitDriverMemory = fitDriverMemory;
+ public Builder withInterestLargestEstimate(boolean fitSingleNodeMemory) {
+ this.interestLargestEstimate = fitSingleNodeMemory;
return this;
}
- public Builder withFitBroadcastMemory(boolean fitBroadcastMemory) {
- this.fitBroadcastMemory = fitBroadcastMemory;
+ public Builder withInterestEstimatesInCP(boolean fitDriverMemory) {
+ this.interestEstimatesInCP = fitDriverMemory;
return this;
}
- public Builder withCheckSingleNodeExecution(boolean checkSingleNodeExecution) {
- this.checkSingleNodeExecution = checkSingleNodeExecution;
+ public Builder withInterestBroadcastVars(boolean fitExecutorMemory) {
+ this.interestBroadcastVars = fitExecutorMemory;
return this;
}
- public Builder withFitCheckpointMemory(boolean fitCheckpointMemory) {
- this.fitCheckpointMemory = fitCheckpointMemory;
+ public Builder withInterestOutputCaching(boolean fitCheckpointMemory) {
+ this.interestOutputCaching = fitCheckpointMemory;
return this;
}
public Builder withExpBaseExecutors(int expBaseExecutors) {
if (expBaseExecutors != -1 && expBaseExecutors < 2) {
- throw new IllegalArgumentException("Given exponent base for number of executors should be -1 or bigger than 1.");
+ throw new IllegalArgumentException(
+ "Given exponent base for number of executors should be -1 or bigger than 1."
+ );
}
this.expBaseExecutors = expBaseExecutors;
return this;
}
public Enumerator build() {
- if (this.program == null) {
+ if (program == null) {
throw new IllegalArgumentException("Providing runtime program is required");
}
- if (instanceTypesRange == null) {
- instanceTypesRange = EnumSet.allOf(CloudUtils.InstanceType.class);
+ if (instances == null) {
+ throw new IllegalArgumentException("Providing available instances is required");
}
+ if (instanceFamiliesRange == null) {
+ instanceFamiliesRange = EnumSet.allOf(InstanceFamily.class);
+ }
if (instanceSizeRange == null) {
- instanceSizeRange = EnumSet.allOf(CloudUtils.InstanceSize.class);
+ instanceSizeRange = EnumSet.allOf(InstanceSize.class);
}
-
- switch (optStrategy) {
- case MinTime:
- if (this.maxPrice < 0) {
- throw new IllegalArgumentException("Budget not specified but required " +
- "for the chosen optimization strategy: " + optStrategy);
- }
- break;
- case MinPrice:
- if (this.maxTime < 0) {
- throw new IllegalArgumentException("Time limit not specified but required " +
- "for the chosen optimization strategy: " + optStrategy);
- }
- break;
- default: // in case optimization strategy was not configured
- throw new IllegalArgumentException("Setting an optimization strategy is required.");
+ // filter instances that are not supported or not of the desired type/size
+ HashMap instancesWithinRange = new HashMap<>();
+ for (String key: instances.keySet()) {
+ if (instanceFamiliesRange.contains(CloudUtils.getInstanceFamily(key))
+ && instanceSizeRange.contains(CloudUtils.getInstanceSize(key))) {
+ instancesWithinRange.put(key, instances.get(key));
+ }
}
+ instances = instancesWithinRange;
switch (enumStrategy) {
case GridBased:
return new GridBasedEnumerator(this, stepSizeExecutors, expBaseExecutors);
case InterestBased:
- if (fitCheckpointMemory && expBaseExecutors != -1) {
- throw new IllegalArgumentException("Number of executors cannot be fitted on the checkpoint estimates and increased exponentially simultaneously.");
- }
- return new InterestBasedEnumerator(this, fitDriverMemory, fitBroadcastMemory, checkSingleNodeExecution, fitCheckpointMemory);
+ return new InterestBasedEnumerator(this,
+ interestLargestEstimate,
+ interestEstimatesInCP,
+ interestBroadcastVars,
+ interestOutputCaching
+ );
+ case PruneBased:
+ return new PruneBasedEnumerator(this);
default:
throw new IllegalArgumentException("Setting an enumeration strategy is required.");
}
}
- protected static Set typeRangeFromStrings(String[] types) {
- Set result = EnumSet.noneOf(CloudUtils.InstanceType.class);
+ protected static Set typeRangeFromStrings(String[] types) throws IllegalArgumentException {
+ Set result = EnumSet.noneOf(InstanceFamily.class);
for (String typeAsString: types) {
- CloudUtils.InstanceType type = CloudUtils.InstanceType.customValueOf(typeAsString); // can throw IllegalArgumentException
+ InstanceFamily type = InstanceFamily.customValueOf(typeAsString);
result.add(type);
}
return result;
}
- protected static Set sizeRangeFromStrings(String[] sizes) {
- Set result = EnumSet.noneOf(CloudUtils.InstanceSize.class);
+ protected static Set sizeRangeFromStrings(String[] sizes) throws IllegalArgumentException {
+ Set result = EnumSet.noneOf(InstanceSize.class);
for (String sizeAsString: sizes) {
- CloudUtils.InstanceSize size = CloudUtils.InstanceSize.customValueOf(sizeAsString); // can throw IllegalArgumentException
+ InstanceSize size = InstanceSize.customValueOf(sizeAsString);
result.add(size);
}
return result;
}
}
+
+ // Public Getters and Setter meant for testing purposes only -------------------------------------------------------
+
+ /**
+ * Meant to be used for testing purposes
+ * @return the available instances for enumeration
+ */
+ public HashMap getInstances() {
+ return instances;
+ }
+
+ /**
+ * Meant to be used for testing purposes
+ * @return the object representing the driver search space
+ */
+ public InstanceSearchSpace getDriverSpace() {
+ return driverSpace;
+ }
+
+ /**
+ * Meant to be used for testing purposes
+ * @param inputSpace the object representing the driver search space
+ */
+ public void setDriverSpace(InstanceSearchSpace inputSpace) {
+ driverSpace.putAll(inputSpace);
+ }
+
+ /**
+ * Meant to be used for testing purposes
+ * @return the object representing the executor search space
+ */
+ public InstanceSearchSpace getExecutorSpace() {
+ return executorSpace;
+ }
+
+ /**
+ * Meant to be used for testing purposes
+ * @param inputSpace the object representing the executor search space
+ */
+ public void setExecutorSpace(InstanceSearchSpace inputSpace) {
+ executorSpace.putAll(inputSpace);
+ }
+
+ /**
+ * Meant to be used for testing purposes
+ * @return applied enumeration strategy
+ */
+ public EnumerationStrategy getEnumStrategy() { return enumStrategy; }
+
+ /**
+ * Meant to be used for testing purposes
+ * @return applied optimization strategy
+ */
+ public OptimizationStrategy getOptStrategy() { return optStrategy; }
+
+ /**
+ * Meant to be used for testing purposes
+ * @return configured weight factor for the linearized 'costs' optimization function
+ */
+ public double getCostsWeightFactor() {
+ return Enumerator.LINEAR_OBJECTIVE_RATIO;
+ }
+
+ /**
+ * Meant to be used for testing purposes
+ * @return configured max time for consideration (seconds)
+ */
+ public double getMaxTime() {
+ return Enumerator.MAX_TIME;
+ }
+
+ /**
+ * Meant to be used for testing purposes
+ * @return configured max price for consideration (dollars)
+ */
+ public double getMaxPrice() {
+ return Enumerator.MAX_PRICE;
+ }
+
+ /**
+ * Meant to be used for testing purposes
+ * @return current optimal solution
+ */
+ public SolutionPoint getOptimalSolution() {
+ return optimalSolution.get();
+ }
}
diff --git a/src/main/java/org/apache/sysds/resource/enumeration/GridBasedEnumerator.java b/src/main/java/org/apache/sysds/resource/enumeration/GridBasedEnumerator.java
index aa71aba139b..571fc929b5c 100644
--- a/src/main/java/org/apache/sysds/resource/enumeration/GridBasedEnumerator.java
+++ b/src/main/java/org/apache/sysds/resource/enumeration/GridBasedEnumerator.java
@@ -22,13 +22,10 @@
import java.util.*;
public class GridBasedEnumerator extends Enumerator {
- // marks if the number of executors should
- // be increased by a given step
+ /** step size for iterating over the number of executors during enumeration */
private final int stepSizeExecutors;
- // marks if the number of executors should
- // be increased exponentially
- // (single node execution mode is not excluded)
- // -1 marks no exp. increasing
+ /** exponential base for exp. iteration over the number of executors during enumeration
+ * (-1 indicates not to use exponential increments - the default behaviour) */
private final int expBaseExecutors;
public GridBasedEnumerator(Builder builder, int stepSizeExecutors, int expBaseExecutors) {
super(builder);
@@ -48,19 +45,20 @@ public void preprocessing() {
}
@Override
- public boolean evaluateSingleNodeExecution(long driverMemory) {
+ public boolean evaluateSingleNodeExecution(long driverMemory, int cores) {
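+ // a single node exceeding the vCPU quota cannot be evaluated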
+ if (cores > CPU_QUOTA) return false;
return minExecutors == 0;
}
@Override
- public ArrayList estimateRangeExecutors(long executorMemory, int executorCores) {
- // consider the maximum level of parallelism and
+ public ArrayList estimateRangeExecutors(int driverCores, long executorMemory, int executorCores) {
+ // consider the cpu quota (limit) for cloud instances and
// based on the initiated flags decides for the following methods
// for enumeration of the number of executors:
// 1. Increasing the number of executor with given step size (default 1)
- // 2. Exponentially increasing number of executors based on
- // a given exponent base - with additional option for 0 executors
- int currentMax = Math.min(maxExecutors, MAX_LEVEL_PARALLELISM / executorCores);
+ // 2. Exponentially increasing number of executors based on a given exponent base
+ int maxAchievableLevelOfParallelism = CPU_QUOTA - driverCores;
+ int currentMax = Math.min(maxExecutors, maxAchievableLevelOfParallelism / executorCores);
ArrayList result;
if (expBaseExecutors > 1) {
int maxCapacity = (int) Math.floor(Math.log(currentMax) / Math.log(2));
@@ -86,4 +84,22 @@ public ArrayList estimateRangeExecutors(long executorMemory, int execut
return result;
}
+
+ // Public Getters and Setter meant for testing purposes only -------------------------------------------------------
+
+ /**
+ * Meant to be used for testing purposes
+ * @return the set step size for the grid search
+ */
+ public int getStepSize() {
+ return stepSizeExecutors;
+ }
+
+ /**
+ * Meant to be used for testing purposes
+ * @return the set exponential base for the grid search
+ */
+ public int getExpBase() {
+ return expBaseExecutors;
+ }
}
diff --git a/src/main/java/org/apache/sysds/resource/enumeration/InterestBasedEnumerator.java b/src/main/java/org/apache/sysds/resource/enumeration/InterestBasedEnumerator.java
index ea6971a967a..349d44312f5 100644
--- a/src/main/java/org/apache/sysds/resource/enumeration/InterestBasedEnumerator.java
+++ b/src/main/java/org/apache/sysds/resource/enumeration/InterestBasedEnumerator.java
@@ -28,44 +28,67 @@
import java.util.*;
import java.util.stream.Collectors;
+import static org.apache.sysds.resource.CloudUtils.JVM_MEMORY_FACTOR;
+
public class InterestBasedEnumerator extends Enumerator {
+ // Static configurations -------------------------------------------------------------------------------------------
public final static long MINIMUM_RELEVANT_MEM_ESTIMATE = 2L * 1024 * 1024 * 1024; // 2GB
- // different instance families can have slightly different memory characteristics (e.g. EC2 Graviton (arm) instances)
- // and using memory delta allows not ignoring such instances
- // TODO: enable usage of memory delta when FLOPS and bandwidth characteristics added to cost estimator
- public final static boolean USE_MEMORY_DELTA = false;
- public final static double MEMORY_DELTA_FRACTION = 0.05; // 5%
- public final static double BROADCAST_MEMORY_FACTOR = 0.21; // fraction of the minimum memory fraction for storage
- // marks if memory estimates should be used at deciding
- // for the search space of the instance for the driver nodes
- private final boolean fitDriverMemory;
- // marks if memory estimates should be used at deciding
- // for the search space of the instance for the executor nodes
- private final boolean fitBroadcastMemory;
- // marks if the estimation of the range of number of executors
- // for consideration should exclude single node execution mode
- // if any of the estimates cannot fit in the driver's memory
- private final boolean checkSingleNodeExecution;
- // marks if the estimated output size should be
- // considered as interesting point at deciding the
- // number of executors - checkpoint storage level
- private final boolean fitCheckpointMemory;
- // largest full memory estimate (scaled)
- private long largestMemoryEstimateCP;
- // ordered set ot output memory estimates (scaled)
- private TreeSet memoryEstimatesSpark;
+ /** different instance families can have slightly different memory characteristics
+ * and using memory delta allows not ignoring equivalent instances of different families
+ * (e.g. newer generations use 7.5/15.25/30.5/... GB memory instead of 8/16/32/...) */
+ public final static boolean USE_MEMORY_DELTA = true; // NOTE: requires the delta handling logic below to stay consistent
+ // 10% -> account for the deltas in between equivalent Amazon EC2 instances from different generations
+ public final static double MEMORY_DELTA_FRACTION = 0.1;
+ public final static double MEMORY_FACTOR = OptimizerUtils.MEM_UTIL_FACTOR * JVM_MEMORY_FACTOR;
+ // Represents an approximation for the fraction of the whole node's memory budget available to the executor
+ // since the exact value is not static <- for more info check CloudUtils.getEffectiveExecutorResources
+ private final static double EXECUTOR_MEMORY_FACTOR = 0.6;
+ // fraction of the available memory budget for broadcast variables
+ // 0.21 -> represents 70% of the storage fraction of the executors memory which is 30% of the whole executor memory
+ public final static double BROADCAST_MEMORY_FACTOR = 0.21 * EXECUTOR_MEMORY_FACTOR;
+ // fraction of the minimum available memory budget for storing data in-memory
+ public final static double CACHE_MEMORY_FACTOR = 0.3 * EXECUTOR_MEMORY_FACTOR;
+
+ // User-defined configurations (flag for enabling/disabling the different available options) -----------------------
+ /**
+ * enables the use of the largest memory estimate (inputs + intermediates + output)
+ * as threshold for considering single node execution as a possible option ->
+ * only if the largest estimate fits in the current CP memory */
+ private final boolean interestLargestEstimate;
+ /**
+ * enables the use of memory estimates (inputs + intermediates + output)
+ * as interest points for defining the search space of the driver/CP node ->
+ * nodes with memory budget close to the size of the estimates */
+ private final boolean interestEstimatesInCP;
+ /**
+ * enables the use of output estimates (potential broadcast variables) as interest points
+ * for defining the search space of all nodes in a cluster ->
+ * driver/CP nodes with memory budget close to twice the broadcast size
+ * and executor nodes with broadcast memory fraction close to the broadcast size */
+ private final boolean interestBroadcastVars;
+ /**
+ * enables the use of output memory estimates as interest point
+ * for defining the range of number of executor nodes in a cluster ->
+ * number of nodes which leads to combined memory budget close to the output size */
+ private final boolean interestOutputCaching;
+
+ // Instance variables ----------------------------------------------------------------------------------------------
+ private long largestMemoryEstimateCP; // largest full memory estimate (scaled)
+ private TreeSet memoryEstimatesSpark; // ordered set of output memory estimates (scaled)
+
+ // Instance methods ------------------------------------------------------------------------------------------------
public InterestBasedEnumerator(
Builder builder,
+ boolean interestLargestEstimate,
boolean fitDriverMemory,
- boolean fitBroadcastMemory,
- boolean checkSingleNodeExecution,
- boolean fitCheckpointMemory
+ boolean interestBroadcastVars,
+ boolean interestOutputCaching
) {
super(builder);
- this.fitDriverMemory = fitDriverMemory;
- this.fitBroadcastMemory = fitBroadcastMemory;
- this.checkSingleNodeExecution = checkSingleNodeExecution;
- this.fitCheckpointMemory = fitCheckpointMemory;
+ this.interestLargestEstimate = interestLargestEstimate;
+ this.interestEstimatesInCP = fitDriverMemory;
+ this.interestBroadcastVars = interestBroadcastVars;
+ this.interestOutputCaching = interestOutputCaching;
}
@Override
@@ -73,69 +96,70 @@ public void preprocessing() {
InstanceSearchSpace fullSearchSpace = new InstanceSearchSpace();
fullSearchSpace.initSpace(instances);
- if (fitDriverMemory) {
+ if (interestEstimatesInCP || interestLargestEstimate) {
// get full memory estimates and scale according ot the driver memory factor
- TreeSet memoryEstimatesForDriver = getMemoryEstimates(program, false, OptimizerUtils.MEM_UTIL_FACTOR);
+ TreeSet memoryEstimatesForDriver = getMemoryEstimates(program, false, MEMORY_FACTOR);
setInstanceSpace(fullSearchSpace, driverSpace, memoryEstimatesForDriver);
- if (checkSingleNodeExecution) {
+ if (interestLargestEstimate) {
largestMemoryEstimateCP = !memoryEstimatesForDriver.isEmpty()? memoryEstimatesForDriver.last() : -1;
}
}
- if (fitBroadcastMemory) {
+ if (interestBroadcastVars) {
// get output memory estimates and scaled according the broadcast memory factor
// for executors' memory search space and driver memory factor for driver's memory search space
TreeSet memoryEstimatesOutputSpark = getMemoryEstimates(program, true, BROADCAST_MEMORY_FACTOR);
+ setInstanceSpace(fullSearchSpace, executorSpace, memoryEstimatesOutputSpark);
// avoid calling getMemoryEstimates with different factor but rescale: output should fit twice in the CP memory
TreeSet memoryEstimatesOutputCP = memoryEstimatesOutputSpark.stream()
- .map(mem -> 2 * (long) (mem * BROADCAST_MEMORY_FACTOR / OptimizerUtils.MEM_UTIL_FACTOR))
+ .map(mem -> 2 * (long) (mem * BROADCAST_MEMORY_FACTOR / MEMORY_FACTOR))
.collect(Collectors.toCollection(TreeSet::new));
setInstanceSpace(fullSearchSpace, driverSpace, memoryEstimatesOutputCP);
- setInstanceSpace(fullSearchSpace, executorSpace, memoryEstimatesOutputSpark);
- if (checkSingleNodeExecution) {
- largestMemoryEstimateCP = !memoryEstimatesOutputCP.isEmpty()? memoryEstimatesOutputCP.last() : -1;
- }
- if (fitCheckpointMemory) {
- memoryEstimatesSpark = memoryEstimatesOutputSpark;
+
+ if (interestOutputCaching) {
+ // rescale the existing estimates from the broadcast to the cache memory factor to avoid recomputing them
+ memoryEstimatesSpark = memoryEstimatesOutputSpark.stream()
+ .map(estimate -> (long) (estimate * BROADCAST_MEMORY_FACTOR / CACHE_MEMORY_FACTOR))
+ .collect(Collectors.toCollection(TreeSet::new));
}
} else {
executorSpace.putAll(fullSearchSpace);
- if (fitCheckpointMemory) {
- memoryEstimatesSpark = getMemoryEstimates(program, true, BROADCAST_MEMORY_FACTOR);
+ if (interestOutputCaching) {
+ memoryEstimatesSpark = getMemoryEstimates(program, true, CACHE_MEMORY_FACTOR);
}
}
- if (!fitDriverMemory && !fitBroadcastMemory) {
+ if (!interestEstimatesInCP && !interestBroadcastVars) {
driverSpace.putAll(fullSearchSpace);
- if (checkSingleNodeExecution) {
- TreeSet memoryEstimatesForDriver = getMemoryEstimates(program, false, OptimizerUtils.MEM_UTIL_FACTOR);
- largestMemoryEstimateCP = !memoryEstimatesForDriver.isEmpty()? memoryEstimatesForDriver.last() : -1;
- }
}
}
@Override
- public boolean evaluateSingleNodeExecution(long driverMemory) {
- // Checking if single node execution should be excluded is optional.
- if (checkSingleNodeExecution && minExecutors == 0 && largestMemoryEstimateCP > 0) {
+ public boolean evaluateSingleNodeExecution(long driverMemory, int cores) {
+ if (cores > CPU_QUOTA) return false;
+ if (interestLargestEstimate /* enabled? */
+ && minExecutors == 0 /* single node exec. allowed */
+ && largestMemoryEstimateCP > 0 /* at least one memory estimate above the threshold */
+ ) {
return largestMemoryEstimateCP <= driverMemory;
}
return minExecutors == 0;
}
@Override
- public ArrayList estimateRangeExecutors(long executorMemory, int executorCores) {
- // consider the maximum level of parallelism and
+ public ArrayList estimateRangeExecutors(int driverCores, long executorMemory, int executorCores) {
+ // consider the CPU limit/quota and
// based on the initiated flags decides on the following methods
// for enumeration of the number of executors:
// 1. Such a number that leads to combined distributed memory
- // close to the output size of the HOPs
- // 3. Enumerating all options with the established range
+ // close to the output size of large HOPs
+ // 2. Enumerating all options within the established range
int min = Math.max(1, minExecutors);
- int max = Math.min(maxExecutors, (MAX_LEVEL_PARALLELISM / executorCores));
+ int maxAchievableLevelOfParallelism = CPU_QUOTA - driverCores;
+ int max = Math.min(maxExecutors, (maxAchievableLevelOfParallelism / executorCores));
ArrayList result;
- if (fitCheckpointMemory) {
+ if (interestOutputCaching) {
result = new ArrayList<>(memoryEstimatesSpark.size() + 1);
int previousNumber = -1;
for (long estimate: memoryEstimatesSpark) {
@@ -169,8 +193,13 @@ public ArrayList estimateRangeExecutors(long executorMemory, int execut
return result;
}
- // Static helper methods -------------------------------------------------------------------------------------------
- private static void setInstanceSpace(InstanceSearchSpace inputSpace, InstanceSearchSpace outputSpace, TreeSet memoryEstimates) {
+ // Static (helper) methods -----------------------------------------------------------------------------------------
+
+ private static void setInstanceSpace(
+ InstanceSearchSpace inputSpace,
+ InstanceSearchSpace outputSpace,
+ TreeSet memoryEstimates
+ ) {
TreeSet memoryPoints = getMemoryPoints(memoryEstimates, inputSpace.keySet());
for (long memory: memoryPoints) {
outputSpace.put(memory, inputSpace.get(memory));
@@ -207,10 +236,11 @@ private static TreeSet getMemoryPoints(TreeSet estimates, Set
long smallestOfTheLarger = relevantPoints.isEmpty()? -1 : relevantPoints.get(0);
if (USE_MEMORY_DELTA) {
- // Delta memory of 5% of the node's memory allows not ignoring
+ // Delta memory of 10% of the node's memory allows not ignoring
// memory points with potentially equivalent values but not exactly the same values.
// This is the case for example in AWS for instances of the same type but with
- // different additional capabilities: m5.xlarge (16GB) vs m5n.xlarge (15.25GB).
+ // different additional capabilities:
+ // c5.xlarge (8GB) vs c5n.xlarge (8GB) or m5.xlarge (16GB) vs m5n.xlarge (15.25GB).
// Get points smaller than the current memory estimate within the memory delta
long memoryDelta = Math.round(estimate * MEMORY_DELTA_FRACTION);
for (long point : smallerPoints) {
@@ -312,4 +342,26 @@ private static void getMemoryEstimates(Hop hop, TreeSet mem, boolean outpu
}
hop.setVisited();
}
+
+ // Public Getters and Setter meant for testing purposes only -------------------------------------------------------
+
+ // Meant to be used for testing purposes
+ public boolean interestEstimatesInCPEnabled() {
+ return interestEstimatesInCP;
+ }
+
+ // Meant to be used for testing purposes
+ public boolean interestBroadcastVars() {
+ return interestBroadcastVars;
+ }
+
+ // Meant to be used for testing purposes
+ public boolean interestLargestEstimateEnabled() {
+ return interestLargestEstimate;
+ }
+
+ // Meant to be used for testing purposes
+ public boolean interestOutputCachingEnabled() {
+ return interestOutputCaching;
+ }
}
diff --git a/src/main/java/org/apache/sysds/resource/enumeration/PruneBasedEnumerator.java b/src/main/java/org/apache/sysds/resource/enumeration/PruneBasedEnumerator.java
new file mode 100644
index 00000000000..bf9f49bcda2
--- /dev/null
+++ b/src/main/java/org/apache/sysds/resource/enumeration/PruneBasedEnumerator.java
@@ -0,0 +1,339 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.sysds.resource.enumeration;
+
+import org.apache.hadoop.util.Lists;
+import org.apache.sysds.resource.CloudInstance;
+import org.apache.sysds.resource.ResourceCompiler;
+import org.apache.sysds.runtime.controlprogram.*;
+import org.apache.sysds.runtime.instructions.Instruction;
+
+import java.util.HashMap;
+import java.util.TreeMap;
+import java.util.Map;
+import java.util.Map.Entry;
+import java.util.List;
+import java.util.LinkedList;
+import java.util.ArrayList;
+
+public class PruneBasedEnumerator extends Enumerator {
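+ // pruning state:
+ // insufficientSingleNodeMemory - CP memory budget already found insufficient for single node execution
+ // singleNodeOnlyMemory - CP memory budget at which the compiled plan contains no Spark instructions
+ // maxExecutorsPerInstanceMap - upper bound for the number of executors worth evaluating per executor configuration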
+ long insufficientSingleNodeMemory;
+ long singleNodeOnlyMemory;
+ HashMap maxExecutorsPerInstanceMap;
+
+ public PruneBasedEnumerator(Builder builder) {
+ super(builder);
+ insufficientSingleNodeMemory = -1;
+ singleNodeOnlyMemory = Long.MAX_VALUE;
+ maxExecutorsPerInstanceMap = new HashMap<>();
+ }
+
+ @Override
+ public void preprocessing() {
+ driverSpace.initSpace(instances);
+ executorSpace.initSpace(instances);
+ for (Map.Entry>> eMemoryEntry: executorSpace.entrySet()) {
+ for (Integer eCores: eMemoryEntry.getValue().keySet()) {
+ long combinationHash = combineHash(eMemoryEntry.getKey(), eCores);
+ maxExecutorsPerInstanceMap.put(combinationHash, maxExecutors);
+ }
+ }
+ }
+
+ @Override
+ public void processing() {
+ long driverMemory, executorMemory;
+ int driverCores, executorCores;
+ EnumerationUtils.ConfigurationPoint configurationPoint;
+
+ for (Entry>> dMemoryEntry: driverSpace.entrySet()) {
+ driverMemory = dMemoryEntry.getKey();
+ // loop over the search space to enumerate the driver configurations
+ for (Entry> dCoresEntry: dMemoryEntry.getValue().entrySet()) {
+ driverCores = dCoresEntry.getKey();
+ // single node execution mode
+ if (evaluateSingleNodeExecution(driverMemory, driverCores)) {
+ ResourceCompiler.setSingleNodeResourceConfigs(driverMemory, driverCores);
+ program = ResourceCompiler.doFullRecompilation(program);
+ // the recompiled program is reused for all single node instances with identical memory budget and #v. cores
+ for (CloudInstance dInstance: dCoresEntry.getValue()) {
+ // iterate over all driver nodes with the currently evaluated memory and #cores values
+ configurationPoint = new EnumerationUtils.ConfigurationPoint(dInstance);
+ double[] newEstimates = getCostEstimate(configurationPoint);
+ if (isInvalidConfiguration(newEstimates)) {
+ // mark the current CP memory budget as insufficient for single node execution
+ insufficientSingleNodeMemory = driverMemory;
+ break;
+ }
+ updateOptimalSolution(newEstimates[0], newEstimates[1], configurationPoint);
+ }
+ }
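+ // prune: at or above this CP memory budget the compiled plan contains no Spark instructions,
+ // so distributed enumeration cannot improve the solution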
+ if (driverMemory >= singleNodeOnlyMemory) continue;
+ // enumeration for distributed execution
+ for (Entry>> eMemoryEntry: executorSpace.entrySet()) {
+ if (driverMemory >= singleNodeOnlyMemory) continue;
+ executorMemory = eMemoryEntry.getKey();
+ // loop over the search space to enumerate the executor configurations
+ for (Entry<Integer, LinkedList<CloudInstance>> eCoresEntry: eMemoryEntry.getValue().entrySet()) {
+ if (driverMemory >= singleNodeOnlyMemory) continue;
+ executorCores = eCoresEntry.getKey();
+ List<Integer> numberExecutorsSet = estimateRangeExecutors(driverCores, eMemoryEntry.getKey(), eCoresEntry.getKey());
+ // variables for tracking the best possible number of executors for each executor instance type
+ double localBestCostScore = Double.MAX_VALUE;
+ int newLocalBestNumberExecutors = -1;
+ // for Spark execution mode
+ for (int numberExecutors: numberExecutorsSet) {
+ try {
+ ResourceCompiler.setSparkClusterResourceConfigs(
+ driverMemory,
+ driverCores,
+ numberExecutors,
+ executorMemory,
+ executorCores
+ );
+ } catch (IllegalArgumentException e) {
+ // insufficient driver memory detected
+ break;
+ }
+ program = ResourceCompiler.doFullRecompilation(program);
+ if (!hasSparkInstructions(program)) {
+ // mark the current CP memory budget as dominant for the global optimal solution
+ // -> higher CP memory could not introduce Spark operations
+ singleNodeOnlyMemory = driverMemory;
+ break;
+ }
+ // no need for recompilation for a cluster with identical #executors and
+ // with identical memory and #v. cores for driver and executor nodes
+ for (CloudInstance dInstance: dCoresEntry.getValue()) {
+ // iterate over all driver nodes with the currently evaluated memory and #cores values
+ for (CloudInstance eInstance: eCoresEntry.getValue()) {
+ // iterate over all executor nodes for the evaluated cluster size
+ // with the currently evaluated memory and #cores values
+ configurationPoint = new EnumerationUtils.ConfigurationPoint(
+ dInstance,
+ eInstance,
+ numberExecutors
+ );
+ double[] newEstimates = getCostEstimate(configurationPoint);
+ updateOptimalSolution(
+ newEstimates[0],
+ newEstimates[1],
+ configurationPoint
+ );
+ // now checking for cost improvements regarding the current executor instance type
+ // this is not necessarily part of the optimal solution because other
+ // solutions with 0 executors or with executors of another instance type could be better
+ if (optStrategy == OptimizationStrategy.MinCosts) {
+ double optimalScore = linearScoringFunction(newEstimates[0], newEstimates[1]);
+ if (localBestCostScore > optimalScore) {
+ localBestCostScore = optimalScore;
+ newLocalBestNumberExecutors = configurationPoint.numberExecutors;
+ }
+ } else if (optStrategy == OptimizationStrategy.MinTime) {
+ if (localBestCostScore > newEstimates[0]) {
+ // do not check for max. price here
+ localBestCostScore = newEstimates[0];
+ newLocalBestNumberExecutors = configurationPoint.numberExecutors;
+ }
+ } else { // minPrice
+ if (localBestCostScore > newEstimates[1]) {
+ // do not check for max. time here
+ localBestCostScore = newEstimates[1];
+ newLocalBestNumberExecutors = configurationPoint.numberExecutors;
+ }
+ }
+ }
+ }
+ }
+ // record the locally best number of executors to avoid evaluating solutions with
+ // more executors in the following iterations
+ if (localBestCostScore < Double.MAX_VALUE && newLocalBestNumberExecutors > 0) {
+ long combinationHash = combineHash(executorMemory, executorCores);
+ maxExecutorsPerInstanceMap.put(combinationHash, newLocalBestNumberExecutors);
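+ // subsequent calls to estimateRangeExecutors() for the same (memory, #cores) shape
+ // will then cap the enumerated cluster sizes at this locally best value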
+ }
+ }
+ }
+ }
+ }
+ }
+
+ @Override
+ public boolean evaluateSingleNodeExecution(long driverMemory, int cores) {
+ if (cores > CPU_QUOTA || minExecutors > 0) return false;
+ return insufficientSingleNodeMemory != driverMemory;
+ }
+
+ @Override
+ public ArrayList<Integer> estimateRangeExecutors(int driverCores, long executorMemory, int executorCores) {
+ // consider the CPU quota (limit) for cloud instances and the 'local' best number of executors
+ // to decide on the range of executor counts to be evaluated next
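+ // illustrative walk-through (for a hypothetical CPU quota of 1152 vCores, the limit referenced in
+ // the updated tests): for driverCores=4 and executorCores=8 the max. achievable parallelism is 1148,
+ // so at most min(maxExecutors, 1148/8=143, locally best count for this shape) executors are enumerated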
+ int maxAchievableLevelOfParallelism = CPU_QUOTA - driverCores;
+ int currentMax = Math.min(maxExecutors, maxAchievableLevelOfParallelism / executorCores);
+ long combinationHash = combineHash(executorMemory, executorCores);
+ int maxExecutorsToConsider = maxExecutorsPerInstanceMap.get(combinationHash);
+ currentMax = Math.min(currentMax, maxExecutorsToConsider);
+ ArrayList<Integer> result = new ArrayList<>();
+ for (int i = 1; i <= currentMax; i++) {
+ result.add(i);
+ }
+ return result;
+ }
+
+ // Helpers ---------------------------------------------------------------------------------------------------------
+
+ /**
+ * Ensures unique mapping for a combination of node memory and number of
+ * executor cores due to the discrete nature of the node memory given in bytes.
+ * The smallest memory margin between cloud instances is around 500MB (~500*1024^2 bytes),
+ * which is by far larger than the maximum number of virtual cores physically possible per node.
+ *
+ * @param executorMemory node memory in bytes
+ * @param cores number of virtual cores (physical threads) for the node
+ * @return hash value
+ */
+ public static long combineHash(long executorMemory, int cores) {
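+ // e.g. (illustrative values): an 8 GiB node with 4 vCores maps to 8_589_934_592 + 4 = 8_589_934_596;
+ // memory values differ by at least ~500MB while core counts stay far below that, so sums are unique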
+ return executorMemory + cores;
+ }
+
+
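+ /**
+ * A configuration is considered invalid when the cost estimator returned
+ * {@code Double.MAX_VALUE} for both time and monetary cost, i.e. the estimation
+ * failed, e.g. due to insufficient memory for the given resource configuration.
+ */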
+ public static boolean isInvalidConfiguration(double[] estimates) {
+ return estimates[0] == Double.MAX_VALUE && estimates[1] == Double.MAX_VALUE;
+ }
+
+ /**
+ * Checks for Spark instructions in the given runtime program.
+ * It excludes reblock and caching instructions from the check since these
+ * are not always removed from the runtime program even when their outputs
+ * are never used and are ignored at execution.
+ *
+ * @param program runtime program
+ * @return true if the program would execute any Spark operation
+ */
+ public static boolean hasSparkInstructions(Program program) {
+ boolean hasSparkInst;
+ Map<String, FunctionProgramBlock> funcMap = program.getFunctionProgramBlocks();
+ if( funcMap != null && !funcMap.isEmpty() )
+ {
+ for( Map.Entry<String, FunctionProgramBlock> e : funcMap.entrySet() ) {
+ String fkey = e.getKey();
+ FunctionProgramBlock fpb = e.getValue();
+ for(ProgramBlock pb : fpb.getChildBlocks()) {
+ hasSparkInst = hasSparkInstructions(pb);
+ if (hasSparkInst) return true;
+ }
+ if(program.containsFunctionProgramBlock(fkey, false) ) {
+ FunctionProgramBlock fpb2 = program.getFunctionProgramBlock(fkey, false);
+ for(ProgramBlock pb : fpb2.getChildBlocks()) {
+ hasSparkInst = hasSparkInstructions(pb);
+ if (hasSparkInst) return true;
+ }
+ }
+ }
+ }
+
+ for(ProgramBlock pb : program.getProgramBlocks()) {
+ hasSparkInst = hasSparkInstructions(pb);
+ if (hasSparkInst) return true;
+ }
+ return false;
+ }
+
+ private static boolean hasSparkInstructions(ProgramBlock pb) {
+ boolean hasSparkInst;
+ if (pb instanceof FunctionProgramBlock ) {
+ FunctionProgramBlock fpb = (FunctionProgramBlock)pb;
+ for(ProgramBlock pbc : fpb.getChildBlocks()) {
+ hasSparkInst = hasSparkInstructions(pbc);
+ if (hasSparkInst) return true;
+ }
+ }
+ else if (pb instanceof WhileProgramBlock) {
+ WhileProgramBlock wpb = (WhileProgramBlock) pb;
+ hasSparkInst = hasSparkInstructions(wpb.getPredicate());
+ if (hasSparkInst) return true;
+ for(ProgramBlock pbc : wpb.getChildBlocks()) {
+ hasSparkInst = hasSparkInstructions(pbc);
+ if (hasSparkInst) return true;
+ }
+ if(wpb.getExitInstruction() != null) {
+ hasSparkInst = hasSparkInstructions(Lists.newArrayList(wpb.getExitInstruction()));
+ return hasSparkInst;
+ }
+ }
+ else if (pb instanceof IfProgramBlock) {
+ IfProgramBlock ipb = (IfProgramBlock) pb;
+ hasSparkInst = hasSparkInstructions(ipb.getPredicate());
+ if (hasSparkInst) return true;
+ for(ProgramBlock pbc : ipb.getChildBlocksIfBody()) {
+ hasSparkInst = hasSparkInstructions(pbc);
+ if (hasSparkInst) return true;
+ }
+ if(!ipb.getChildBlocksElseBody().isEmpty()) {
+ for(ProgramBlock pbc : ipb.getChildBlocksElseBody()) {
+ hasSparkInst = hasSparkInstructions(pbc);
+ if (hasSparkInst) return true;
+ }
+ }
+ if(ipb.getExitInstruction() != null) {
+ hasSparkInst = hasSparkInstructions(Lists.newArrayList(ipb.getExitInstruction()));
+ return hasSparkInst;
+ }
+ }
+ else if (pb instanceof ForProgramBlock) { // for and parfor loops
+ ForProgramBlock fpb = (ForProgramBlock) pb;
+ hasSparkInst = hasSparkInstructions(fpb.getFromInstructions());
+ if (hasSparkInst) return true;
+ hasSparkInst = hasSparkInstructions(fpb.getToInstructions());
+ if (hasSparkInst) return true;
+ hasSparkInst = hasSparkInstructions(fpb.getIncrementInstructions());
+ if (hasSparkInst) return true;
+ for(ProgramBlock pbc : fpb.getChildBlocks()) {
+ hasSparkInst = hasSparkInstructions(pbc);
+ if (hasSparkInst) return true;
+ }
+ if (fpb.getExitInstruction() != null) {
+ hasSparkInst = hasSparkInstructions(Lists.newArrayList(fpb.getExitInstruction()));
+ return hasSparkInst;
+ }
+ }
+ else if( pb instanceof BasicProgramBlock ) {
+ BasicProgramBlock bpb = (BasicProgramBlock) pb;
+ hasSparkInst = hasSparkInstructions(bpb.getInstructions());
+ return hasSparkInst;
+ }
+ return false;
+ }
+
+ private static boolean hasSparkInstructions(List<Instruction> instructions) {
+ for (Instruction inst : instructions) {
+ Instruction.IType iType = inst.getType();
+ if (iType.equals(Instruction.IType.SPARK)) {
+ String opcode = inst.getOpcode();
+ if (!(opcode.contains("rblk") || opcode.contains("chkpoint"))) {
+ // reblock and checkpoint instructions may occur in a program
+ // compiled for hybrid execution mode but without effective Spark instructions
+ return true;
+ }
+ }
+ }
+ return false;
+ }
+}
diff --git a/src/test/java/org/apache/sysds/test/AutomatedTestBase.java b/src/test/java/org/apache/sysds/test/AutomatedTestBase.java
index 4c0f25d74a1..6b280301afb 100644
--- a/src/test/java/org/apache/sysds/test/AutomatedTestBase.java
+++ b/src/test/java/org/apache/sysds/test/AutomatedTestBase.java
@@ -136,7 +136,7 @@ public abstract class AutomatedTestBase {
protected static final String DATASET_DIR = "./src/test/resources/datasets/";
protected static final String INPUT_DIR = "in/";
protected static final String OUTPUT_DIR = "out/";
- protected static final String EXPECTED_DIR = "expected/";
+ protected static final String EXPECTED_DIR = "expectedInstances/";
/** Location where this class writes files for inspection if DEBUG is set to true. */
private static final String DEBUG_TEMP_DIR = "./tmp/";
@@ -1146,7 +1146,7 @@ protected void runRScript(boolean newWay) {
// *** HACK ALERT *** HACK ALERT *** HACK ALERT ***
// Some of the R scripts will fail if the "expected" directory doesn't exist.
// Make sure the directory exists.
- File expectedDir = new File(baseDirectory, "expected" + "/" + cacheDir);
+ File expectedDir = new File(baseDirectory, "expectedInstances" + "/" + cacheDir);
expectedDir.mkdirs();
// *** END HACK ***
diff --git a/src/test/java/org/apache/sysds/test/component/compress/mapping/MappingTests.java b/src/test/java/org/apache/sysds/test/component/compress/mapping/MappingTests.java
index c777a9bfd39..dc64e64f41d 100644
--- a/src/test/java/org/apache/sysds/test/component/compress/mapping/MappingTests.java
+++ b/src/test/java/org/apache/sysds/test/component/compress/mapping/MappingTests.java
@@ -270,7 +270,7 @@ public void replaceMin() {
public void getUnique() {
int u = m.getUnique();
if(max != u)
- fail("incorrect number of unique " + m + "\n expected" + max + " got" + u);
+ fail("incorrect number of unique " + m + "expectedInstances" + max + " got" + u);
}
@Test
diff --git a/src/test/java/org/apache/sysds/test/component/resource/CloudUtilsTests.java b/src/test/java/org/apache/sysds/test/component/resource/CloudUtilsTests.java
index b7e08ae35d7..3cd39148a14 100644
--- a/src/test/java/org/apache/sysds/test/component/resource/CloudUtilsTests.java
+++ b/src/test/java/org/apache/sysds/test/component/resource/CloudUtilsTests.java
@@ -19,39 +19,37 @@
package org.apache.sysds.test.component.resource;
-import org.apache.sysds.resource.AWSUtils;
import org.apache.sysds.resource.CloudInstance;
-import org.apache.sysds.resource.CloudUtils.InstanceType;
+import org.apache.sysds.resource.CloudUtils;
+import org.apache.sysds.resource.CloudUtils.InstanceFamily;
import org.apache.sysds.resource.CloudUtils.InstanceSize;
+import org.junit.Assert;
import org.junit.Test;
import java.io.File;
import java.io.IOException;
-import java.nio.file.Files;
import java.util.HashMap;
-import static org.apache.sysds.test.component.resource.TestingUtils.assertEqualsCloudInstances;
-import static org.apache.sysds.test.component.resource.TestingUtils.getSimpleCloudInstanceMap;
+import static org.apache.sysds.resource.CloudUtils.*;
+import static org.apache.sysds.test.component.resource.ResourceTestUtils.*;
import static org.junit.Assert.*;
@net.jcip.annotations.NotThreadSafe
public class CloudUtilsTests {
@Test
- public void getInstanceTypeAWSTest() {
- AWSUtils utils = new AWSUtils();
+ public void getInstanceFamilyTest() {
+ InstanceFamily expectedValue = InstanceFamily.M5;
+ CloudUtils.InstanceFamily actualValue;
- InstanceType expectedValue = InstanceType.M5;
- InstanceType actualValue;
-
- actualValue = utils.getInstanceType("m5.xlarge");
+ actualValue = CloudUtils.getInstanceFamily("m5.xlarge");
assertEquals(expectedValue, actualValue);
- actualValue = utils.getInstanceType("M5.XLARGE");
+ actualValue = CloudUtils.getInstanceFamily("M5.XLARGE");
assertEquals(expectedValue, actualValue);
try {
- utils.getInstanceType("NON-M5.xlarge");
+ CloudUtils.getInstanceFamily("NON-M5.xlarge");
fail("Throwing IllegalArgumentException was expected");
} catch (IllegalArgumentException e) {
// this block ensures correct execution of the test
@@ -59,20 +57,18 @@ public void getInstanceTypeAWSTest() {
}
@Test
- public void getInstanceSizeAWSTest() {
- AWSUtils utils = new AWSUtils();
-
+ public void getInstanceSizeTest() {
InstanceSize expectedValue = InstanceSize._XLARGE;
InstanceSize actualValue;
- actualValue = utils.getInstanceSize("m5.xlarge");
+ actualValue = CloudUtils.getInstanceSize("m5.xlarge");
assertEquals(expectedValue, actualValue);
- actualValue = utils.getInstanceSize("M5.XLARGE");
+ actualValue = CloudUtils.getInstanceSize("M5.XLARGE");
assertEquals(expectedValue, actualValue);
try {
- utils.getInstanceSize("m5.nonxlarge");
+ CloudUtils.getInstanceSize("m5.nonxlarge");
fail("Throwing IllegalArgumentException was expected");
} catch (IllegalArgumentException e) {
// this block ensures correct execution of the test
@@ -80,39 +76,145 @@ public void getInstanceSizeAWSTest() {
}
@Test
- public void validateInstanceNameAWSTest() {
- AWSUtils utils = new AWSUtils();
-
+ public void validateInstanceNameTest() {
// basic intel instance (old)
- assertTrue(utils.validateInstanceName("m5.2xlarge"));
- assertTrue(utils.validateInstanceName("M5.2XLARGE"));
+ assertTrue(CloudUtils.validateInstanceName("m5.2xlarge"));
+ assertTrue(CloudUtils.validateInstanceName("M5.2XLARGE"));
// basic intel instance (new)
- assertTrue(utils.validateInstanceName("m6i.xlarge"));
+ assertTrue(CloudUtils.validateInstanceName("m6i.xlarge"));
// basic amd instance
- assertTrue(utils.validateInstanceName("m6a.xlarge"));
+ assertTrue(CloudUtils.validateInstanceName("m6a.xlarge"));
// basic graviton instance
- assertTrue(utils.validateInstanceName("m6g.xlarge"));
+ assertTrue(CloudUtils.validateInstanceName("m6g.xlarge"));
// invalid values
- assertFalse(utils.validateInstanceName("v5.xlarge"));
- assertFalse(utils.validateInstanceName("m5.notlarge"));
- assertFalse(utils.validateInstanceName("m5xlarge"));
- assertFalse(utils.validateInstanceName(".xlarge"));
- assertFalse(utils.validateInstanceName("m5."));
+ assertFalse(CloudUtils.validateInstanceName("v5.xlarge"));
+ assertFalse(CloudUtils.validateInstanceName("m5.notlarge"));
+ assertFalse(CloudUtils.validateInstanceName("m5xlarge"));
+ assertFalse(CloudUtils.validateInstanceName(".xlarge"));
+ assertFalse(CloudUtils.validateInstanceName("m5."));
}
@Test
- public void loadCSVFileAWSTest() throws IOException {
- AWSUtils utils = new AWSUtils();
+ public void loadDefaultFeeTableTest() {
+ // test that the provided default file is accepted as valid by the loading function
+ String[] regions = {
+ "us-east-1",
+ "us-east-2",
+ "us-west-1",
+ "us-west-2",
+ "ca-central-1",
+ "ca-west-1",
+ "af-south-1",
+ "ap-east-1",
+ "ap-south-2",
+ "ap-southeast-3",
+ "ap-southeast-5",
+ "ap-southeast-4",
+ "ap-south-1",
+ "ap-northeast-3",
+ "ap-northeast-2",
+ "ap-southeast-1",
+ "ap-southeast-2",
+ "ap-northeast-1",
+ "eu-central-1",
+ "eu-west-1",
+ "eu-west-2",
+ "eu-south-1",
+ "eu-west-3",
+ "eu-south-2",
+ "eu-north-1",
+ "eu-central-2",
+ "il-central-1",
+ "me-south-1",
+ "me-central-1",
+ "sa-east-1"
+ };
+
+ for (String region : regions) {
+ try {
+ double[] prices = CloudUtils.loadRegionalPrices(DEFAULT_REGIONAL_PRICE_TABLE, region);
+ double feeRatio = prices[0];
+ double ebsPrice = prices[1];
+ Assert.assertTrue(feeRatio >= 0.15 && feeRatio <= 0.25);
+ Assert.assertTrue(ebsPrice >= 0.08);
+ } catch (IOException e) {
+ Assert.fail("Throwing IOException not expected: " + e);
+ }
+ }
+ }
- File tmpFile = TestingUtils.generateTmpInstanceInfoTableFile();
+ @Test
+ public void loadingInstanceInfoTest() throws IOException {
+ // test the proper loading of the table
+ File file = ResourceTestUtils.getMinimalInstanceInfoTableFile();
- HashMap<String, CloudInstance> actual = utils.loadInstanceInfoTable(tmpFile.getPath());
+ HashMap<String, CloudInstance> actual = CloudUtils.loadInstanceInfoTable(file.getPath(), TEST_FEE_RATIO, TEST_STORAGE_PRICE);
HashMap<String, CloudInstance> expected = getSimpleCloudInstanceMap();
for (String instanceName: expected.keySet()) {
assertEqualsCloudInstances(expected.get(instanceName), actual.get(instanceName));
}
+ }
- Files.deleteIfExists(tmpFile.toPath());
+ @Test
+ public void loadDefaultInstanceInfoTableFileTest() throws IOException {
+ // test that the provided default file is accepted as valid by the loading function
+ HashMap<String, CloudInstance> instanceMap = CloudUtils.loadInstanceInfoTable(DEFAULT_INSTANCE_INFO_TABLE, TEST_FEE_RATIO, TEST_STORAGE_PRICE);
+ // test that all instances are from the 'M', 'C' or 'R' families
+ // and that the minimum size is xlarge, as required for EMR
+ for (String instanceType : instanceMap.keySet()) {
+ Assert.assertTrue(instanceType.startsWith("m") || instanceType.startsWith("c") || instanceType.startsWith("r"));
+ Assert.assertTrue(instanceType.contains("xlarge"));
+ }
+ }
+
+ @Test
+ public void getEffectiveExecutorResourcesGeneralCaseTest() {
+ long inputMemory = GBtoBytes(16);
+ int inputCores = 4;
+ int inputNumExecutors = 4;
+
+ int expectedAmMemoryMB = 768; // 512 + 256
+ int expectedAmMemoryOverhead = 384; // using the absolute minimum
+ int expectedExecutorMemoryMB = (int) (((0.75 * inputMemory / (1024 * 1024))
+ - (expectedAmMemoryMB + expectedAmMemoryOverhead)) / 1.1);
+ int expectedAmCores = 1;
+ int expectedExecutorCores = inputCores - expectedAmCores;
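+ // worked numbers for this case (same factors as the expressions above): 0.75 * 16384 MB = 12288 MB
+ // per node; minus the AM container (768 + 384 = 1152 MB) leaves 11136 MB; divided by the 1.1
+ // memory-overhead factor this yields 10123 MB per executor container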
+
+ int[] result = getEffectiveExecutorResources(inputMemory, inputCores, inputNumExecutors);
+ int resultExecutorMemoryMB = result[0];
+ int resultExecutorCores = result[1];
+ int resultNumExecutors = result[2];
+ int resultAmMemoryMB = result[3];
+ int resultAmCores = result[4];
+
+ Assert.assertEquals(resultExecutorMemoryMB, expectedExecutorMemoryMB);
+ Assert.assertEquals(resultExecutorCores, expectedExecutorCores);
+ Assert.assertEquals(resultNumExecutors, inputNumExecutors);
+ Assert.assertEquals(resultAmMemoryMB, expectedAmMemoryMB);
+ Assert.assertEquals(resultAmCores, expectedAmCores);
+ }
+
+ @Test
+ public void getEffectiveExecutorResourcesEdgeCaseTest() {
+ // edge case -> large cluster with small machines -> dedicated machine for the AM
+ long inputMemory = GBtoBytes(8);
+ int inputCores = 4;
+ int inputNumExecutors = 48;
+
+ int expectedContainerMemoryMB = (int) (((0.75 * inputMemory / (1024 * 1024))) / 1.1);
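+ // worked numbers: 0.75 * 8192 MB = 6144 MB per node; with a dedicated AM machine no AM container
+ // is subtracted here, so 6144 / 1.1 = 5585 MB per container (used for both executors and the AM)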
+
+ int[] result = getEffectiveExecutorResources(inputMemory, inputCores, inputNumExecutors);
+ int resultExecutorMemoryMB = result[0];
+ int resultExecutorCores = result[1];
+ int resultNumExecutors = result[2];
+ int resultAmMemoryMB = result[3];
+ int resultAmCores = result[4];
+
+ Assert.assertEquals(resultExecutorMemoryMB, expectedContainerMemoryMB);
+ Assert.assertEquals(resultExecutorCores, inputCores);
+ Assert.assertEquals(resultNumExecutors, inputNumExecutors - 1);
+ Assert.assertEquals(resultAmMemoryMB, expectedContainerMemoryMB);
+ Assert.assertEquals(resultAmCores, inputCores);
}
}
diff --git a/src/test/java/org/apache/sysds/test/component/resource/CostEstimatorTest.java b/src/test/java/org/apache/sysds/test/component/resource/CostEstimatorTest.java
index f7ceaf9d50b..c9ef1b7109e 100644
--- a/src/test/java/org/apache/sysds/test/component/resource/CostEstimatorTest.java
+++ b/src/test/java/org/apache/sysds/test/component/resource/CostEstimatorTest.java
@@ -23,8 +23,10 @@
import java.io.FileReader;
import java.util.HashMap;
+import org.apache.sysds.conf.CompilerConfig;
import org.apache.sysds.resource.CloudInstance;
import org.apache.sysds.resource.ResourceCompiler;
+import org.apache.sysds.resource.cost.CostEstimationException;
import org.apache.sysds.utils.Explain;
import org.junit.Assert;
import org.junit.Test;
@@ -41,11 +43,12 @@
import org.apache.sysds.test.TestConfiguration;
import scala.Tuple2;
-import static org.apache.sysds.test.component.resource.TestingUtils.getSimpleCloudInstanceMap;
+import static org.apache.sysds.test.component.resource.ResourceTestUtils.*;
-public class CostEstimatorTest extends AutomatedTestBase
-{
- private static final boolean DEBUG_MODE = true;
+public class CostEstimatorTest extends AutomatedTestBase {
+ static {
+ ConfigurationManager.getCompilerConfig().set(CompilerConfig.ConfigType.RESOURCE_OPTIMIZATION, true);
+ }
private static final String TEST_DIR = "component/resource/";
private static final String HOME = SCRIPT_DIR + TEST_DIR;
private static final String TEST_CLASS_DIR = TEST_DIR + CostEstimatorTest.class.getSimpleName() + "/";
@@ -56,34 +59,193 @@ public class CostEstimatorTest extends AutomatedTestBase
public void setUp() {}
@Test
- public void testL2SVMSingleNode() { runTest("Algorithm_L2SVM.dml", "m5.xlarge", null); }
+ public void L2SVMSingleNodeTest() {
+ try { // single node configuration
+ runTest("Algorithm_L2SVM.dml", "m5.xlarge", null);
+ } catch (CostEstimationException e) {
+ Assert.fail("Memory is expected to be sufficient, but exception thrown: " + e);
+ }
+ }
@Test
- public void testL2SVMHybrid() { runTest("Algorithm_L2SVM.dml", "m5.xlarge", "m5.xlarge"); }
+ public void L2SVMHybridTest() {
+ // m and n values force Spark operations
+ Tuple2<String, String> mVar = new Tuple2<>("$m", "100000");
+ Tuple2<String, String> nVar = new Tuple2<>("$n", "15000");
+ try {
+ runTest("Algorithm_L2SVM.dml", "m5.xlarge", "m5.xlarge", mVar, nVar);
+ } catch (CostEstimationException e) {
+ Assert.fail("Memory is expected to be sufficient, but exception thrown: " + e);
+ }
+ }
@Test
- public void testLinregSingleNode() { runTest("Algorithm_Linreg.dml", "m5.xlarge", null); }
+ public void L2SVMSingleNodeOverHybridTest() {
+ // m and n values do NOT force Spark operations (4GB input)
+ Tuple2<String, String> mVar = new Tuple2<>("$m", "50000");
+ Tuple2<String, String> nVar = new Tuple2<>("$n", "10000");
+ double singleNodeTimeCost, clusterTimeCost;
+ try { // single node configuration
+ singleNodeTimeCost = runTest("Algorithm_L2SVM.dml", "m5.xlarge", null, mVar, nVar);
+ } catch (CostEstimationException e) {
+ Assert.fail("Memory is expected to be sufficient, but exception thrown: " + e);
+ return;
+ }
+
+ try { // cluster configuration
+ clusterTimeCost = runTest("Algorithm_L2SVM.dml", "m5.xlarge", "m5.xlarge", mVar, nVar);
+ } catch (CostEstimationException e) {
+ Assert.fail("Memory is expected to be sufficient, but exception thrown: " + e);
+ return;
+ }
+ // not equal because some operations are directly scheduled on spark in hybrid mode
+ Assert.assertTrue(singleNodeTimeCost <= clusterTimeCost);
+ }
+
+
@Test
- public void testLinregHybrid() { runTest("Algorithm_Linreg.dml", "m5.xlarge", "m5.xlarge"); }
+ public void LinregSingleNodeTest() {
+ try { // single node configuration
+ runTest("Algorithm_Linreg.dml", "m5.xlarge", null);
+ } catch (CostEstimationException e) {
+ Assert.fail("Memory is expected to be sufficient, but exception thrown: " + e);
+ }
+ }
@Test
- public void testPCASingleNode() { runTest("Algorithm_PCA.dml", "m5.xlarge", null); }
+ public void LinregHybridTest() {
+ // m and n values force Spark operations
+ Tuple2<String, String> mVar = new Tuple2<>("$m", "100000");
+ Tuple2<String, String> nVar = new Tuple2<>("$n", "15000");
+
+ try { // cluster configuration
+ runTest("Algorithm_Linreg.dml", "m5.xlarge", "m5.xlarge", mVar, nVar);
+ } catch (CostEstimationException e) {
+ Assert.fail("Memory is expected to be sufficient, but exception thrown: " + e);
+ }
+ }
+
@Test
- public void testPCAHybrid() { runTest("Algorithm_PCA.dml", "m5.xlarge", "m5.xlarge"); }
+ public void LinregSingleNodeOverHybridTest() {
+ // m and n values do NOT force Spark operations (4GB input)
+ Tuple2<String, String> mVar = new Tuple2<>("$m", "50000");
+ Tuple2<String, String> nVar = new Tuple2<>("$n", "10000");
+ double singleNodeTimeCost, clusterTimeCost;
+ try { // single node configuration
+ singleNodeTimeCost = runTest("Algorithm_Linreg.dml", "m5.xlarge", null, mVar, nVar);
+ } catch (CostEstimationException e) {
+ Assert.fail("Memory is expected to be sufficient, but exception thrown: " + e);
+ return;
+ }
+
+ try { // cluster configuration
+ clusterTimeCost = runTest("Algorithm_Linreg.dml", "m5.xlarge", "m5.xlarge", mVar, nVar);
+ } catch (CostEstimationException e) {
+ Assert.fail("Memory is expected to be sufficient, but exception thrown: " + e);
+ return;
+ }
+ // not equal because some operations are directly scheduled on spark in hybrid mode
+ Assert.assertTrue(singleNodeTimeCost <= clusterTimeCost);
+ }
@Test
- public void testPNMFSingleNode() { runTest("Algorithm_PNMF.dml", "m5.xlarge", null); }
+ public void testPCASingleNode() {
+ try { // single node configuration
+ runTest("Algorithm_PCA.dml", "m5.xlarge", null);
+ } catch (CostEstimationException e) {
+ Assert.fail("Memory is expected to be sufficient, but exception thrown: " + e);
+ }
+ }
+ @Test
+ public void testPCAHybrid() {
+ // m and n values force Spark operations
+ Tuple2<String, String> mVar = new Tuple2<>("$m", "100000");
+ Tuple2<String, String> nVar = new Tuple2<>("$n", "15000");
+ try { // cluster configuration
+ runTest("Algorithm_PCA.dml", "m5.xlarge", "m5.xlarge", mVar, nVar);
+ } catch (CostEstimationException e) {
+ Assert.fail("Memory is expected to be sufficient, but exception thrown: " + e);
+ }
+ }
@Test
- public void testPNMFHybrid() { runTest("Algorithm_PNMF.dml", "m5.xlarge", "m5.xlarge"); }
+ public void testPCASingleOverHybrid() {
+ // m and n values do NOT force Spark operations
+ Tuple2<String, String> mVar = new Tuple2<>("$m", "40000");
+ Tuple2<String, String> nVar = new Tuple2<>("$n", "10000");
+ double singleNodeTimeCost, clusterTimeCost;
+ try { // single node configuration
+ singleNodeTimeCost = runTest("Algorithm_PCA.dml", "m5.xlarge", null, mVar, nVar);
+ } catch (CostEstimationException e) {
+ Assert.fail("Memory is expected to be sufficient, but exception thrown: " + e);
+ return;
+ }
+
+ try { // cluster configuration
+ clusterTimeCost = runTest("Algorithm_PCA.dml", "m5.xlarge", "m5.xlarge", mVar, nVar);
+ } catch (CostEstimationException e) {
+ Assert.fail("Memory is expected to be sufficient, but exception thrown: " + e);
+ return;
+ }
+ // not equal because some operations are directly scheduled on spark in hybrid mode
+ Assert.assertTrue(singleNodeTimeCost <= clusterTimeCost);
+ }
+
+ @Test
+ public void testPNMFSingleNode() {
+ try { // single node configuration
+ runTest("Algorithm_PNMF.dml", "m5.xlarge", null);
+ } catch (CostEstimationException e) {
+ Assert.fail("Memory is expected to be sufficient, but exception thrown: " + e);
+ }
+ }
+
+ @Test
+ public void testPNMFHybrid() {
+ // m and n values force Spark operations (80GB input)
+ Tuple2<String, String> mVar = new Tuple2<>("$m", "1000000");
+ Tuple2<String, String> nVar = new Tuple2<>("$n", "10000");
+ try { // cluster configuration
+ runTest("Algorithm_PNMF.dml", "m5.xlarge", "m5.xlarge", mVar, nVar);
+ } catch (CostEstimationException e) {
+ Assert.fail("Memory is expected to be sufficient, but exception thrown: " + e);
+ }
+ }
+
+ @Test
+ public void testPNMFSingleNodeOverHybrid() {
+ // m and n values do NOT force Spark operations (4GB input)
+ Tuple2<String, String> mVar = new Tuple2<>("$m", "500000");
+ Tuple2<String, String> nVar = new Tuple2<>("$n", "1000");
+ double singleNodeTimeCost, clusterTimeCost;
+ try { // single node configuration
+ singleNodeTimeCost = runTest("Algorithm_PNMF.dml", "m5.xlarge", null, nVar, mVar);
+ } catch (CostEstimationException e) {
+ Assert.fail("Memory is expected to be sufficient, but exception thrown: " + e);
+ return;
+ }
+
+ try {
+ clusterTimeCost = runTest("Algorithm_PNMF.dml", "m5.xlarge", "m5.xlarge", mVar, nVar);
+ } catch (CostEstimationException e) {
+ Assert.fail("Memory is expected to be sufficient, but exception thrown: " + e);
+ return;
+ }
+ // not equal because some operations are directly scheduled on spark in hybrid mode
+ Assert.assertTrue(singleNodeTimeCost <= clusterTimeCost);
+ }
@Test
public void testReadAndWriteSingleNode() {
Tuple2<String, String> arg1 = new Tuple2<>("$fileA", HOME+"data/A.csv");
Tuple2<String, String> arg2 = new Tuple2<>("$fileA_Csv", HOME+"data/A_copy.csv");
Tuple2<String, String> arg3 = new Tuple2<>("$fileA_Text", HOME+"data/A_copy_text.text");
- runTest("ReadAndWrite.dml", "m5.xlarge", null, arg1, arg2, arg3);
+ try {
+ runTest("ReadAndWrite.dml", "m5.xlarge", null, arg1, arg2, arg3);
+ } catch (CostEstimationException e) {
+ Assert.fail("Memory is expected to be sufficient, but exception thrown: " + e);
+ }
}
@Test
@@ -91,30 +253,46 @@ public void testReadAndWriteHybrid() {
Tuple2<String, String> arg1 = new Tuple2<>("$fileA", HOME+"data/A.csv");
Tuple2<String, String> arg2 = new Tuple2<>("$fileA_Csv", HOME+"data/A_copy.csv");
Tuple2<String, String> arg3 = new Tuple2<>("$fileA_Text", HOME+"data/A_copy_text.text");
- runTest("ReadAndWrite.dml", "c5.xlarge", "m5.xlarge", arg1, arg2, arg3);
+ try {
+ runTest("ReadAndWrite.dml", "c5.xlarge", "m5.xlarge", arg1, arg2, arg3);
+ } catch (CostEstimationException e) {
+ Assert.fail("Memory is expected to be sufficient, but exception thrown: " + e);
+ }
}
+ @Test
+ public void withInsufficientMem() {
+ // m and n values do NOT force Spark operations
+ Tuple2<String, String> mVar = new Tuple2<>("$m", "100000");
+ Tuple2<String, String> nVar = new Tuple2<>("$n", "10000");
+ try { // cluster configuration
+ runTest("Algorithm_Linreg.dml", "m5.xlarge", "m5.xlarge", mVar, nVar);
+ Assert.fail("Memory is expected to be insufficient, but no exception thrown: ");
+ } catch (CostEstimationException e) {
+ Assert.assertEquals(e.getMessage(), "Insufficient local memory");
+ }
+ }
+ // Helpers ---------------------------------------------------------------------------------------------------------
@SafeVarargs
- private void runTest(String scriptFilename, String driverInstance, String executorInstance, Tuple2<String, String>...args) {
+ private double runTest(String scriptFilename, String driverInstance, String executorInstance, Tuple2<String, String>...args) throws CostEstimationException {
CloudInstance driver;
CloudInstance executor;
try {
- // setting driver node is required
+ // setting CP (driver) node is required
driver = INSTANCE_MAP.get(driverInstance);
- ResourceCompiler.setDriverConfigurations(driver.getMemory(), driver.getVCPUs());
- // setting executor node is optional: no executor -> single node execution
+ // setting executor node is optional: no executors -> single node execution
if (executorInstance == null) {
executor = null;
- ResourceCompiler.setSingleNodeExecution();
+ ResourceCompiler.setSingleNodeResourceConfigs(driver.getMemory(), driver.getVCPUs());
} else {
executor = INSTANCE_MAP.get(executorInstance);
- ResourceCompiler.setExecutorConfigurations(DEFAULT_NUM_EXECUTORS, executor.getMemory(), executor.getVCPUs());
+ ResourceCompiler.setSparkClusterResourceConfigs(driver.getMemory(), driver.getVCPUs(), DEFAULT_NUM_EXECUTORS, executor.getMemory(), executor.getVCPUs());
}
} catch (Exception e) {
e.printStackTrace();
- throw new RuntimeException("Resource initialization for teh current test failed.");
+ throw new RuntimeException("Resource initialization for the current test failed.");
}
try
{
@@ -129,23 +307,23 @@ private void runTest(String scriptFilename, String driverInstance, String execut
DMLConfig conf = new DMLConfig(getCurConfigFile().getPath());
ConfigurationManager.setLocalConfig(conf);
-
- String dmlScriptString="";
+
// assign arguments
HashMap<String, String> argVals = new HashMap<>();
for (Tuple2<String, String> arg : args)
argVals.put(arg._1, arg._2);
//read script
+ StringBuilder dmlScriptString = new StringBuilder();
try( BufferedReader in = new BufferedReader(new FileReader(HOME + scriptFilename)) ) {
- String s1 = null;
+ String s1;
while ((s1 = in.readLine()) != null)
- dmlScriptString += s1 + "\n";
+ dmlScriptString.append(s1).append("\n");
}
//simplified compilation chain
ParserWrapper parser = ParserFactory.createParser();
- DMLProgram prog = parser.parse(DMLScript.DML_FILE_PATH_ANTLR_PARSER, dmlScriptString, argVals);
+ DMLProgram prog = parser.parse(DMLScript.DML_FILE_PATH_ANTLR_PARSER, dmlScriptString.toString(), argVals);
DMLTranslator dmlt = new DMLTranslator(prog);
dmlt.liveVariableAnalysis(prog);
dmlt.validateParseTree(prog);
@@ -153,13 +331,18 @@ private void runTest(String scriptFilename, String driverInstance, String execut
dmlt.rewriteHopsDAG(prog);
dmlt.constructLops(prog);
Program rtprog = dmlt.getRuntimeProgram(prog, ConfigurationManager.getDMLConfig());
- if (DEBUG_MODE) System.out.println(Explain.explain(rtprog));
+ if (DEBUG) System.out.println(Explain.explain(rtprog));
double timeCost = CostEstimator.estimateExecutionTime(rtprog, driver, executor);
- if (DEBUG_MODE) System.out.println("Estimated execution time: " + timeCost + " seconds.");
+ if (DEBUG) System.out.println("Estimated execution time: " + timeCost + " seconds.");
// check error-free cost estimation and meaningful result
Assert.assertTrue(timeCost > 0);
+ // return time cost for further assertions
+ return timeCost;
}
catch(Exception e) {
+ if (e instanceof CostEstimationException)
+ throw new CostEstimationException(e.getMessage());
+ // else
e.printStackTrace();
throw new RuntimeException("Error at parsing the return program for cost estimation");
}
diff --git a/src/test/java/org/apache/sysds/test/component/resource/EnumeratorTests.java b/src/test/java/org/apache/sysds/test/component/resource/EnumeratorTests.java
index e3332643c21..5436193caf3 100644
--- a/src/test/java/org/apache/sysds/test/component/resource/EnumeratorTests.java
+++ b/src/test/java/org/apache/sysds/test/component/resource/EnumeratorTests.java
@@ -19,77 +19,94 @@
package org.apache.sysds.test.component.resource;
+import org.apache.sysds.conf.CompilerConfig;
+import org.apache.sysds.conf.ConfigurationManager;
import org.apache.sysds.hops.OptimizerUtils;
import org.apache.sysds.resource.CloudInstance;
-import org.apache.sysds.resource.enumeration.Enumerator;
-import org.apache.sysds.resource.enumeration.EnumerationUtils.InstanceSearchSpace;
import org.apache.sysds.resource.enumeration.EnumerationUtils.ConfigurationPoint;
import org.apache.sysds.resource.enumeration.EnumerationUtils.SolutionPoint;
+import org.apache.sysds.resource.enumeration.Enumerator;
+import org.apache.sysds.resource.enumeration.EnumerationUtils.InstanceSearchSpace;
import org.apache.sysds.resource.enumeration.InterestBasedEnumerator;
import org.apache.sysds.runtime.controlprogram.Program;
+import org.apache.sysds.test.AutomatedTestBase;
import org.junit.Assert;
import org.junit.Test;
import org.mockito.MockedStatic;
import org.mockito.Mockito;
-import java.io.File;
-import java.io.IOException;
-import java.nio.file.Files;
import java.util.ArrayList;
-import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
+import java.util.stream.Collectors;
import static org.apache.sysds.resource.CloudUtils.GBtoBytes;
+import static org.apache.sysds.test.component.resource.ResourceTestUtils.*;
import static org.junit.Assert.*;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.ArgumentMatchers.eq;
@net.jcip.annotations.NotThreadSafe
-public class EnumeratorTests {
-
- @Test
- public void loadInstanceTableTest() throws IOException {
- // loading the table is entirely implemented by the abstract class
- // use any enumerator
- Enumerator anyEnumerator = getGridBasedEnumeratorPrebuild()
- .withInstanceTypeRange(new String[]{"m5"})
- .withInstanceSizeRange(new String[]{"xlarge"})
- .build();
+public class EnumeratorTests extends AutomatedTestBase {
+ static {
+ ConfigurationManager.getCompilerConfig().set(CompilerConfig.ConfigType.RESOURCE_OPTIMIZATION, true);
+ }
- File tmpFile = TestingUtils.generateTmpInstanceInfoTableFile();
- anyEnumerator.loadInstanceTableFile(tmpFile.toString());
+ @Override
+ public void setUp() {}
- HashMap<String, CloudInstance> actualInstances = anyEnumerator.getInstances();
+ @Test
+ public void builderWithInstanceRangeTest() {
+ // test the parsing mechanism for instance family and instance size ranges
+ HashMap<String, CloudInstance> availableInstances = getSimpleCloudInstanceMap();
- Assert.assertEquals(1, actualInstances.size());
- Assert.assertNotNull(actualInstances.get("m5.xlarge"));
+ Enumerator defaultEnumerator = getGridBasedEnumeratorPrebuild().build();
+ Assert.assertEquals(availableInstances.size(), defaultEnumerator.getInstances().size());
- Files.deleteIfExists(tmpFile.toPath());
+ Enumerator enumeratorWithInstanceRanges = getGridBasedEnumeratorPrebuild()
+ .withInstanceFamilyRange(new String[]{"m5", "c5"})
+ .withInstanceSizeRange(new String[]{"xlarge"})
+ .build();
+ List<CloudInstance> expectedInstancesList = availableInstances.values().stream()
+ .filter(instance -> instance.getInstanceName().startsWith("m5.")
+ || instance.getInstanceName().startsWith("c5."))
+ .filter(instance -> instance.getInstanceName().endsWith(".xlarge"))
+ .collect(Collectors.toList());
+ HashMap<String, CloudInstance> actualInstancesMap = enumeratorWithInstanceRanges.getInstances();
+ for (CloudInstance expectedInstance : expectedInstancesList) {
+ Assert.assertTrue(
+ actualInstancesMap.containsKey(expectedInstance.getInstanceName())
+ );
+ }
}
@Test
public void preprocessingGridBasedTest() {
Enumerator gridBasedEnumerator = getGridBasedEnumeratorPrebuild().build();
- HashMap<String, CloudInstance> instances = TestingUtils.getSimpleCloudInstanceMap();
- gridBasedEnumerator.setInstanceTable(instances);
-
gridBasedEnumerator.preprocessing();
// assertions for driver space
InstanceSearchSpace driverSpace = gridBasedEnumerator.getDriverSpace();
- assertEquals(3, driverSpace.size());
+ assertEquals(4, driverSpace.size());
assertInstanceInSearchSpace("c5.xlarge", driverSpace, 8, 4, 0);
+ assertInstanceInSearchSpace("c5d.xlarge", driverSpace, 8, 4, 1);
+ assertInstanceInSearchSpace("c5n.xlarge", driverSpace, 10.5, 4, 0);
assertInstanceInSearchSpace("m5.xlarge", driverSpace, 16, 4, 0);
+ assertInstanceInSearchSpace("m5d.xlarge", driverSpace, 16, 4, 1);
+ assertInstanceInSearchSpace("m5n.xlarge", driverSpace, 16, 4, 2);
assertInstanceInSearchSpace("c5.2xlarge", driverSpace, 16, 8, 0);
assertInstanceInSearchSpace("m5.2xlarge", driverSpace, 32, 8, 0);
// assertions for executor space
InstanceSearchSpace executorSpace = gridBasedEnumerator.getDriverSpace();
- assertEquals(3, executorSpace.size());
+ assertEquals(4, executorSpace.size());
assertInstanceInSearchSpace("c5.xlarge", executorSpace, 8, 4, 0);
+ assertInstanceInSearchSpace("c5d.xlarge", executorSpace, 8, 4, 1);
+ assertInstanceInSearchSpace("c5n.xlarge", executorSpace, 10.5, 4, 0);
assertInstanceInSearchSpace("m5.xlarge", executorSpace, 16, 4, 0);
+ assertInstanceInSearchSpace("m5d.xlarge", executorSpace, 16, 4, 1);
+ assertInstanceInSearchSpace("m5n.xlarge", executorSpace, 16, 4, 2);
assertInstanceInSearchSpace("c5.2xlarge", executorSpace, 16, 8, 0);
assertInstanceInSearchSpace("m5.2xlarge", executorSpace, 32, 8, 0);
}
@@ -97,13 +114,10 @@ public void preprocessingGridBasedTest() {
@Test
public void preprocessingInterestBasedDriverMemoryTest() {
Enumerator interestBasedEnumerator = getInterestBasedEnumeratorPrebuild()
- .withFitDriverMemory(true)
- .withFitBroadcastMemory(false)
+ .withInterestEstimatesInCP(true)
+ .withInterestBroadcastVars(false)
.build();
- HashMap<String, CloudInstance> instances = TestingUtils.getSimpleCloudInstanceMap();
- interestBasedEnumerator.setInstanceTable(instances);
-
// use 10GB (scaled) memory estimate to be between the available 8GB and 16GB driver node's memory
TreeSet<Long> mockingMemoryEstimates = new TreeSet<>(Set.of(GBtoBytes(10)));
try (MockedStatic<InterestBasedEnumerator> mockedEnumerator =
@@ -119,16 +133,18 @@ public void preprocessingInterestBasedDriverMemoryTest() {
// assertions for driver space
InstanceSearchSpace driverSpace = interestBasedEnumerator.getDriverSpace();
- assertEquals(2, driverSpace.size());
+ assertEquals(1, driverSpace.size());
assertInstanceInSearchSpace("c5.xlarge", driverSpace, 8, 4, 0);
- assertInstanceInSearchSpace("m5.xlarge", driverSpace, 16, 4, 0);
- assertInstanceInSearchSpace("c5.2xlarge", driverSpace, 16, 8, 0);
Assert.assertNull(driverSpace.get(GBtoBytes(32)));
// assertions for executor space
InstanceSearchSpace executorSpace = interestBasedEnumerator.getExecutorSpace();
- assertEquals(3, executorSpace.size());
+ assertEquals(4, executorSpace.size());
assertInstanceInSearchSpace("c5.xlarge", executorSpace, 8, 4, 0);
+ assertInstanceInSearchSpace("c5d.xlarge", executorSpace, 8, 4, 1);
+ assertInstanceInSearchSpace("c5n.xlarge", executorSpace, 10.5, 4, 0);
assertInstanceInSearchSpace("m5.xlarge", executorSpace, 16, 4, 0);
+ assertInstanceInSearchSpace("m5d.xlarge", executorSpace, 16, 4, 1);
+ assertInstanceInSearchSpace("m5n.xlarge", executorSpace, 16, 4, 2);
assertInstanceInSearchSpace("c5.2xlarge", executorSpace, 16, 8, 0);
assertInstanceInSearchSpace("m5.2xlarge", executorSpace, 32, 8, 0);
}
@@ -136,13 +152,10 @@ public void preprocessingInterestBasedDriverMemoryTest() {
@Test
public void preprocessingInterestBasedBroadcastMemoryTest() {
Enumerator interestBasedEnumerator = getInterestBasedEnumeratorPrebuild()
- .withFitDriverMemory(false)
- .withFitBroadcastMemory(true)
+ .withInterestEstimatesInCP(false)
+ .withInterestBroadcastVars(true)
.build();
- HashMap<String, CloudInstance> instances = TestingUtils.getSimpleCloudInstanceMap();
- interestBasedEnumerator.setInstanceTable(instances);
-
double outputEstimate = 2.5;
double scaledOutputEstimateBroadcast = outputEstimate / InterestBasedEnumerator.BROADCAST_MEMORY_FACTOR; // ~=12
// scaledOutputEstimateCP = 2 * outputEstimate / OptimizerUtils.MEM_UTIL_FACTOR ~= 7
@@ -167,10 +180,138 @@ public void preprocessingInterestBasedBroadcastMemoryTest() {
// assertions for executor space
InstanceSearchSpace executorSpace = interestBasedEnumerator.getExecutorSpace();
assertEquals(2, executorSpace.size());
- assertInstanceInSearchSpace("c5.xlarge", executorSpace, 8, 4, 0);
assertInstanceInSearchSpace("m5.xlarge", executorSpace, 16, 4, 0);
- assertInstanceInSearchSpace("c5.2xlarge", executorSpace, 16, 8, 0);
- Assert.assertNull(executorSpace.get(GBtoBytes(32)));
+ assertInstanceInSearchSpace("m5d.xlarge", executorSpace, 16, 4, 1);
+ assertInstanceInSearchSpace("m5n.xlarge", executorSpace, 16, 4, 2);
+ assertInstanceInSearchSpace("m5.2xlarge", executorSpace, 32, 8, 0);
+ Assert.assertNull(executorSpace.get(GBtoBytes(10.5)));
+ }
+
+ @Test
+ public void updateOptimalSolutionMinCostsTest() {
+ ConfigurationPoint dummyConfig = new ConfigurationPoint(null, null, -1);
+ SolutionPoint currentSolution;
+ Program emptyProgram = new Program();
+ HashMap<String, CloudInstance> instances = ResourceTestUtils.getSimpleCloudInstanceMap();
+ Enumerator enumerator = (new Enumerator.Builder())
+ .withRuntimeProgram(emptyProgram)
+ .withAvailableInstances(instances)
+ .withEnumerationStrategy(Enumerator.EnumerationStrategy.GridBased)
+ .withOptimizationStrategy(Enumerator.OptimizationStrategy.MinCosts)
+ .build();
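+ // with MinCosts the optimal solution appears to be updated based on a combined score of time and
+ // monetary cost (cf. linearScoringFunction in PruneBasedEnumerator), so neither metric alone decides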
+
+ currentSolution = enumerator.getOptimalSolution();
+ Assert.assertEquals(Double.MAX_VALUE, currentSolution.getTimeCost(), 0);
+ Assert.assertEquals(Double.MAX_VALUE, currentSolution.getMonetaryCost(), 0);
+
+ enumerator.updateOptimalSolution(100, 100, dummyConfig);
+ currentSolution = enumerator.getOptimalSolution();
+ Assert.assertEquals(100, currentSolution.getTimeCost(), 0);
+ Assert.assertEquals(100, currentSolution.getMonetaryCost(), 0);
+
+ enumerator.updateOptimalSolution(90, 1000, dummyConfig);
+ currentSolution = enumerator.getOptimalSolution();
+ Assert.assertEquals(100, currentSolution.getTimeCost(), 0);
+ Assert.assertEquals(100, currentSolution.getMonetaryCost(), 0);
+
+ enumerator.updateOptimalSolution(200, 99, dummyConfig);
+ currentSolution = enumerator.getOptimalSolution();
+ Assert.assertEquals(100, currentSolution.getTimeCost(), 0);
+ Assert.assertEquals(100, currentSolution.getMonetaryCost(), 0);
+
+ enumerator.updateOptimalSolution(101, 99, dummyConfig);
+ currentSolution = enumerator.getOptimalSolution();
+ Assert.assertEquals(101, currentSolution.getTimeCost(), 0);
+ Assert.assertEquals(99, currentSolution.getMonetaryCost(), 0);
+
+ enumerator.updateOptimalSolution(99, 100, dummyConfig);
+ currentSolution = enumerator.getOptimalSolution();
+ Assert.assertEquals(101, currentSolution.getTimeCost(), 0);
+ Assert.assertEquals(99, currentSolution.getMonetaryCost(), 0);
+
+ enumerator.updateOptimalSolution(0.5, 100, dummyConfig);
+ currentSolution = enumerator.getOptimalSolution();
+ Assert.assertEquals(0.5, currentSolution.getTimeCost(), 0);
+ Assert.assertEquals(100, currentSolution.getMonetaryCost(), 0);
+ }
+
+ @Test
+ public void updateOptimalSolutionMinTimeTest() {
+ ConfigurationPoint dummyConfig = new ConfigurationPoint(null, null, -1);
+ SolutionPoint currentSolution;
+ Program emptyProgram = new Program();
+ HashMap<String, CloudInstance> instances = ResourceTestUtils.getSimpleCloudInstanceMap();
+ Enumerator.setMinPrice(100);
+ Enumerator enumerator = (new Enumerator.Builder())
+ .withRuntimeProgram(emptyProgram)
+ .withAvailableInstances(instances)
+ .withEnumerationStrategy(Enumerator.EnumerationStrategy.GridBased)
+ .withOptimizationStrategy(Enumerator.OptimizationStrategy.MinTime)
+ .build();
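+ // the price budget set via setMinPrice(100) above acts as an upper bound on the monetary cost here:
+ // the first update (monetary cost 101) is rejected, while later updates with cost <= 100 are accepted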
+
+
+ currentSolution = enumerator.getOptimalSolution();
+ Assert.assertEquals(Double.MAX_VALUE, currentSolution.getTimeCost(), 0);
+ Assert.assertEquals(Double.MAX_VALUE, currentSolution.getMonetaryCost(), 0);
+
+ enumerator.updateOptimalSolution(100, 101, dummyConfig);
+ currentSolution = enumerator.getOptimalSolution();
+ Assert.assertEquals(Double.MAX_VALUE, currentSolution.getTimeCost(), 0);
+ Assert.assertEquals(Double.MAX_VALUE, currentSolution.getMonetaryCost(), 0);
+
+ enumerator.updateOptimalSolution(90, 100, dummyConfig);
+ currentSolution = enumerator.getOptimalSolution();
+ Assert.assertEquals(90, currentSolution.getTimeCost(), 0);
+ Assert.assertEquals(100, currentSolution.getMonetaryCost(), 0);
+
+ enumerator.updateOptimalSolution(80, 10, dummyConfig);
+ currentSolution = enumerator.getOptimalSolution();
+ Assert.assertEquals(80, currentSolution.getTimeCost(), 0);
+ Assert.assertEquals(10, currentSolution.getMonetaryCost(), 0);
+
+ enumerator.updateOptimalSolution(10, 100, dummyConfig);
+ currentSolution = enumerator.getOptimalSolution();
+ Assert.assertEquals(10, currentSolution.getTimeCost(), 0);
+ Assert.assertEquals(100, currentSolution.getMonetaryCost(), 0);
+ }
+
+ @Test
+ public void updateOptimalSolutionMinPriceTest() {
+ ConfigurationPoint dummyConfig = new ConfigurationPoint(null, null, -1);
+ SolutionPoint currentSolution;
+ Program emptyProgram = new Program();
+ HashMap<String, CloudInstance> instances = ResourceTestUtils.getSimpleCloudInstanceMap();
+ Enumerator.setMinTime(600);
+ Enumerator enumerator = (new Enumerator.Builder())
+ .withRuntimeProgram(emptyProgram)
+ .withAvailableInstances(instances)
+ .withEnumerationStrategy(Enumerator.EnumerationStrategy.GridBased)
+ .withOptimizationStrategy(Enumerator.OptimizationStrategy.MinPrice)
+ .build();
+
+ currentSolution = enumerator.getOptimalSolution();
+ Assert.assertEquals(Double.MAX_VALUE, currentSolution.getTimeCost(), 0);
+ Assert.assertEquals(Double.MAX_VALUE, currentSolution.getMonetaryCost(), 0);
+
+ enumerator.updateOptimalSolution(601, 100, dummyConfig);
+ currentSolution = enumerator.getOptimalSolution();
+ Assert.assertEquals(Double.MAX_VALUE, currentSolution.getTimeCost(), 0);
+ Assert.assertEquals(Double.MAX_VALUE, currentSolution.getMonetaryCost(), 0);
+
+ enumerator.updateOptimalSolution(100, 90, dummyConfig);
+ currentSolution = enumerator.getOptimalSolution();
+ Assert.assertEquals(100, currentSolution.getTimeCost(), 0);
+ Assert.assertEquals(90, currentSolution.getMonetaryCost(), 0);
+
+ enumerator.updateOptimalSolution(10, 80, dummyConfig);
+ currentSolution = enumerator.getOptimalSolution();
+ Assert.assertEquals(10, currentSolution.getTimeCost(), 0);
+ Assert.assertEquals(80, currentSolution.getMonetaryCost(), 0);
+
+ enumerator.updateOptimalSolution(100, 10, dummyConfig);
+ currentSolution = enumerator.getOptimalSolution();
+ Assert.assertEquals(100, currentSolution.getTimeCost(), 0);
+ Assert.assertEquals(10, currentSolution.getMonetaryCost(), 0);
}
@Test
@@ -183,7 +324,7 @@ public void evaluateSingleNodeExecutionGridBasedTest() {
.build();
// memory not relevant for grid-based enumerator
- result = gridBasedEnumerator.evaluateSingleNodeExecution(-1);
+ result = gridBasedEnumerator.evaluateSingleNodeExecution(-1, 1);
Assert.assertTrue(result);
gridBasedEnumerator = getGridBasedEnumeratorPrebuild()
@@ -191,7 +332,7 @@ public void evaluateSingleNodeExecutionGridBasedTest() {
.build();
// memory not relevant for grid-based enumerator
- result = gridBasedEnumerator.evaluateSingleNodeExecution(-1);
+ result = gridBasedEnumerator.evaluateSingleNodeExecution(-1, 1);
Assert.assertFalse(result);
}
@@ -206,11 +347,11 @@ public void estimateRangeExecutorsGridBasedStepSizeTest() {
.build();
// test the general case when the max level of parallelism is not reached (0 is never part of the result)
List<Integer> expectedResult = new ArrayList<>(List.of(2, 4, 6, 8, 10));
- List<Integer> actualResult = gridBasedEnumerator.estimateRangeExecutors(-1, 4);
+ List<Integer> actualResult = gridBasedEnumerator.estimateRangeExecutors(1, -1, 4);
Assert.assertEquals(expectedResult, actualResult);
// test the case when the max level of parallelism (1000) is reached (0 is never part of the result)
expectedResult = new ArrayList<>(List.of(2, 4));
- actualResult = gridBasedEnumerator.estimateRangeExecutors(-1, 200);
+ actualResult = gridBasedEnumerator.estimateRangeExecutors(1, -1, 200);
Assert.assertEquals(expectedResult, actualResult);
// num. executors range not starting from zero and without step size given
@@ -219,11 +360,11 @@ public void estimateRangeExecutorsGridBasedStepSizeTest() {
.build();
// test the general case when the max level of parallelism is not reached (0 is never part of the result)
expectedResult = new ArrayList<>(List.of(3, 4, 5, 6, 7, 8));
- actualResult = gridBasedEnumerator.estimateRangeExecutors(-1, 4);
+ actualResult = gridBasedEnumerator.estimateRangeExecutors(1, -1, 4);
Assert.assertEquals(expectedResult, actualResult);
// test the case when the max level of parallelism (1000) is reached (0 is never part of the result)
expectedResult = new ArrayList<>(List.of(3, 4, 5));
- actualResult = gridBasedEnumerator.estimateRangeExecutors(-1, 200);
+ actualResult = gridBasedEnumerator.estimateRangeExecutors(1, -1, 200);
Assert.assertEquals(expectedResult, actualResult);
}
@@ -240,11 +381,11 @@ public void estimateRangeExecutorsGridBasedExpBaseTest() {
.build();
// test the general case when the max level of parallelism is not reached (0 is never part of the result)
expectedResult = new ArrayList<>(List.of(1, 2, 4, 8));
- actualResult = gridBasedEnumerator.estimateRangeExecutors(-1, 4);
+ actualResult = gridBasedEnumerator.estimateRangeExecutors(1, -1, 4);
Assert.assertEquals(expectedResult, actualResult);
// test the case when the max level of parallelism (1000) is reached (0 is never part of the result)
expectedResult = new ArrayList<>(List.of(1, 2, 4));
- actualResult = gridBasedEnumerator.estimateRangeExecutors(-1, 200);
+ actualResult = gridBasedEnumerator.estimateRangeExecutors(1, -1, 200);
Assert.assertEquals(expectedResult, actualResult);
// num. executors range not starting from zero and with exponential base = 3
@@ -254,11 +395,11 @@ public void estimateRangeExecutorsGridBasedExpBaseTest() {
.build();
// test the general case when the max level of parallelism is not reached (0 is never part of the result)
expectedResult = new ArrayList<>(List.of(3,9, 27));
- actualResult = gridBasedEnumerator.estimateRangeExecutors(-1, 4);
+ actualResult = gridBasedEnumerator.estimateRangeExecutors(1, -1, 4);
Assert.assertEquals(expectedResult, actualResult);
// test the case when the max level of parallelism (1000) is reached (0 is never part of the result)
expectedResult = new ArrayList<>(List.of(3,9));
- actualResult = gridBasedEnumerator.estimateRangeExecutors(-1, 100);
+ actualResult = gridBasedEnumerator.estimateRangeExecutors(1, -1, 100);
Assert.assertEquals(expectedResult, actualResult);
}
@@ -266,17 +407,14 @@ public void estimateRangeExecutorsGridBasedExpBaseTest() {
public void evaluateSingleNodeExecutionInterestBasedTest() {
boolean result;
- // no fitting the memory estimates for checkpointing
+ // no fitting the memory estimates for caching
Enumerator interestBasedEnumerator = getInterestBasedEnumeratorPrebuild()
.withNumberExecutorsRange(0, 5)
- .withFitDriverMemory(false)
- .withFitBroadcastMemory(false)
- .withCheckSingleNodeExecution(true)
+ .withInterestEstimatesInCP(false)
+ .withInterestBroadcastVars(false)
+ .withInterestLargestEstimate(true)
.build();
- HashMap<String, CloudInstance> instances = TestingUtils.getSimpleCloudInstanceMap();
- interestBasedEnumerator.setInstanceTable(instances);
-
TreeSet<Long> mockingMemoryEstimates = new TreeSet<>(Set.of(GBtoBytes(6), GBtoBytes(12)));
try (MockedStatic<InterestBasedEnumerator> mockedEnumerator =
Mockito.mockStatic(InterestBasedEnumerator.class, Mockito.CALLS_REAL_METHODS)) {
@@ -284,32 +422,51 @@ public void evaluateSingleNodeExecutionInterestBasedTest() {
.when(() -> InterestBasedEnumerator.getMemoryEstimates(
any(Program.class),
eq(false),
- eq(OptimizerUtils.MEM_UTIL_FACTOR)))
+ eq(InterestBasedEnumerator.MEMORY_FACTOR)))
.thenReturn(mockingMemoryEstimates);
// initiate memoryEstimatesSpark
interestBasedEnumerator.preprocessing();
}
- result = interestBasedEnumerator.evaluateSingleNodeExecution(GBtoBytes(8));
+ result = interestBasedEnumerator.evaluateSingleNodeExecution(GBtoBytes(8), 1);
Assert.assertFalse(result);
}
@Test
- public void estimateRangeExecutorsInterestBasedGeneralTest() {
+ public void estimateRangeExecutorsInterestBasedAllEnabledTest() {
+ ArrayList<Integer> expectedResult;
+ ArrayList<Integer> actualResult;
+
+ // no fitting the memory estimates for checkpointing
+ Enumerator interestBasedEnumerator = getInterestBasedEnumeratorPrebuild()
+ .withNumberExecutorsRange(0, 5)
+ .withInterestOutputCaching(true)
+ .build();
+ interestBasedEnumerator.preprocessing();
+ // test the general case of limiting to only one executor for the empty program (no memory estimates)
+ expectedResult = new ArrayList<>(List.of(1));
+ actualResult = interestBasedEnumerator.estimateRangeExecutors(1, -1, 100);
+ Assert.assertEquals(expectedResult, actualResult);
+ }
+
+ @Test
+ public void estimateRangeExecutorsInterestBasedNoInterestOutputCachingTest() {
ArrayList<Integer> expectedResult;
ArrayList<Integer> actualResult;
// no fitting the memory estimates for checkpointing
Enumerator interestBasedEnumerator = getInterestBasedEnumeratorPrebuild()
.withNumberExecutorsRange(0, 5)
+ .withInterestOutputCaching(false) // explicit but also default
.build();
+ interestBasedEnumerator.preprocessing();
// test the general case when the max level of parallelism is not reached (0 is never part of the result)
expectedResult = new ArrayList<>(List.of(1, 2, 3, 4, 5));
- actualResult = interestBasedEnumerator.estimateRangeExecutors(-1, 4);
+ actualResult = interestBasedEnumerator.estimateRangeExecutors(1, -1, 4);
Assert.assertEquals(expectedResult, actualResult);
- // test the case when the max level of parallelism (1000) is reached (0 is never part of the result)
- expectedResult = new ArrayList<>(List.of(1, 2, 3));
- actualResult = interestBasedEnumerator.estimateRangeExecutors(-1, 256);
+ // test the case when the max level of parallelism (1152) is reached (0 is never part of the result)
+ expectedResult = new ArrayList<>(List.of(1, 2, 3, 4));
+ actualResult = interestBasedEnumerator.estimateRangeExecutors(1, -1, 256);
Assert.assertEquals(expectedResult, actualResult);
}
@@ -318,14 +475,11 @@ public void estimateRangeExecutorsInterestBasedCheckpointMemoryTest() {
// fitting the memory estimates for checkpointing
Enumerator interestBasedEnumerator = getInterestBasedEnumeratorPrebuild()
.withNumberExecutorsRange(0, 5)
- .withFitCheckpointMemory(true)
- .withFitDriverMemory(false)
- .withFitBroadcastMemory(false)
+ .withInterestEstimatesInCP(false)
+ .withInterestBroadcastVars(false)
+ .withInterestOutputCaching(true)
.build();
- HashMap<String, CloudInstance> instances = TestingUtils.getSimpleCloudInstanceMap();
- interestBasedEnumerator.setInstanceTable(instances);
-
TreeSet<Long> mockingMemoryEstimates = new TreeSet<>(Set.of(GBtoBytes(20), GBtoBytes(40)));
try (MockedStatic<InterestBasedEnumerator> mockedEnumerator =
Mockito.mockStatic(InterestBasedEnumerator.class, Mockito.CALLS_REAL_METHODS)) {
@@ -333,7 +487,7 @@ public void estimateRangeExecutorsInterestBasedCheckpointMemoryTest() {
.when(() -> InterestBasedEnumerator.getMemoryEstimates(
any(Program.class),
eq(true),
- eq(InterestBasedEnumerator.BROADCAST_MEMORY_FACTOR)))
+ eq(InterestBasedEnumerator.CACHE_MEMORY_FACTOR)))
.thenReturn(mockingMemoryEstimates);
// initiate memoryEstimatesSpark
interestBasedEnumerator.preprocessing();
@@ -341,11 +495,11 @@ public void estimateRangeExecutorsInterestBasedCheckpointMemoryTest() {
// test the general case when the max level of parallelism is not reached (0 is never part of the result)
List<Integer> expectedResult = new ArrayList<>(List.of(1, 2, 3));
- List<Integer> actualResult = interestBasedEnumerator.estimateRangeExecutors(GBtoBytes(16), 4);
+ List<Integer> actualResult = interestBasedEnumerator.estimateRangeExecutors(1, GBtoBytes(16), 4);
Assert.assertEquals(expectedResult, actualResult);
// test the case when the max level of parallelism (1000) is reached (0 is never part of the result)
expectedResult = new ArrayList<>(List.of(1, 2));
- actualResult = interestBasedEnumerator.estimateRangeExecutors(GBtoBytes(16), 500);
+ actualResult = interestBasedEnumerator.estimateRangeExecutors(1, GBtoBytes(16), 500);
Assert.assertEquals(expectedResult, actualResult);
}
@@ -353,7 +507,6 @@ public void estimateRangeExecutorsInterestBasedCheckpointMemoryTest() {
public void processingTest() {
// all implemented enumerators should enumerate the same solution pool in this basic case - empty program
Enumerator gridBasedEnumerator = getGridBasedEnumeratorPrebuild()
- .withTimeLimit(Double.MAX_VALUE)
.withNumberExecutorsRange(0, 2)
.build();
@@ -361,58 +514,33 @@ public void processingTest() {
.withNumberExecutorsRange(0, 2)
.build();
- HashMap<String, CloudInstance> instances = TestingUtils.getSimpleCloudInstanceMap();
+ HashMap<String, CloudInstance> instances = ResourceTestUtils.getSimpleCloudInstanceMap();
InstanceSearchSpace space = new InstanceSearchSpace();
space.initSpace(instances);
// run processing for the grid based enumerator
gridBasedEnumerator.setDriverSpace(space);
gridBasedEnumerator.setExecutorSpace(space);
+ gridBasedEnumerator.preprocessing();
gridBasedEnumerator.processing();
- ArrayList<SolutionPoint> actualSolutionPoolGB = gridBasedEnumerator.getSolutionPool();
+ SolutionPoint actualSolutionGB = gridBasedEnumerator.postprocessing();
// run processing for the interest based enumerator
interestBasedEnumerator.setDriverSpace(space);
interestBasedEnumerator.setExecutorSpace(space);
+ interestBasedEnumerator.preprocessing();
interestBasedEnumerator.processing();
- ArrayList<SolutionPoint> actualSolutionPoolIB = gridBasedEnumerator.getSolutionPool();
-
-
- List<CloudInstance> expectedInstances = new ArrayList<>(Arrays.asList(
- instances.get("c5.xlarge")
- ));
- // expected solution pool with 0 executors (number executors = 0, executors and executorInstance being null)
- // with a single solution -> the cheapest instance for the driver
- Assert.assertEquals(expectedInstances.size(), actualSolutionPoolGB.size());
- Assert.assertEquals(expectedInstances.size(), actualSolutionPoolIB.size());
- for (int i = 0; i < expectedInstances.size(); i++) {
- SolutionPoint pointGB = actualSolutionPoolGB.get(i);
- Assert.assertEquals(0, pointGB.numberExecutors);
- Assert.assertEquals(expectedInstances.get(i), pointGB.driverInstance);
- Assert.assertNull(pointGB.executorInstance);
- SolutionPoint pointIB = actualSolutionPoolGB.get(i);
- Assert.assertEquals(0, pointIB.numberExecutors);
- Assert.assertEquals(expectedInstances.get(i), pointIB.driverInstance);
- Assert.assertNull(pointIB.executorInstance);
- }
- }
-
- @Test
- public void postprocessingTest() {
- // postprocessing equivalent for all types of enumerators
- Enumerator enumerator = getGridBasedEnumeratorPrebuild().build();
- // construct solution pool
- // first dummy configuration point since not relevant for postprocessing
- ConfigurationPoint dummyPoint = new ConfigurationPoint(null);
- SolutionPoint solution1 = new SolutionPoint(dummyPoint, 1000, 1000);
- SolutionPoint solution2 = new SolutionPoint(dummyPoint, 900, 1000); // optimal point
- SolutionPoint solution3 = new SolutionPoint(dummyPoint, 800, 10000);
- SolutionPoint solution4 = new SolutionPoint(dummyPoint, 1000, 10000);
- SolutionPoint solution5 = new SolutionPoint(dummyPoint, 900, 10000);
- ArrayList<SolutionPoint> mockListSolutions = new ArrayList<>(List.of(solution1, solution2, solution3, solution4, solution5));
- enumerator.setSolutionPool(mockListSolutions);
-
- SolutionPoint optimalSolution = enumerator.postprocessing();
- assertEquals(solution2, optimalSolution);
+ SolutionPoint actualSolutionIB = interestBasedEnumerator.postprocessing();
+
+ // expected solution with 0 executors (number executors = 0, executors and executorInstance being null)
+ // and the cheapest instance for the driver
+ // Grid-Based
+ Assert.assertEquals(0, actualSolutionGB.numberExecutors);
+ assertEqualsCloudInstances(instances.get("c5.xlarge"), actualSolutionGB.driverInstance);
+ Assert.assertNull(actualSolutionGB.executorInstance);
+ // Interest-Based
+ Assert.assertEquals(0, actualSolutionIB.numberExecutors);
+ assertEqualsCloudInstances(instances.get("c5.xlarge"), actualSolutionIB.driverInstance);
+ Assert.assertNull(actualSolutionIB.executorInstance);
}
@Test
@@ -421,8 +549,6 @@ public void GridBasedEnumerationMinPriceTest() {
.withNumberExecutorsRange(0, 2)
.build();
- gridBasedEnumerator.setInstanceTable(TestingUtils.getSimpleCloudInstanceMap());
-
gridBasedEnumerator.preprocessing();
gridBasedEnumerator.processing();
SolutionPoint solution = gridBasedEnumerator.postprocessing();
@@ -439,8 +565,6 @@ public void InterestBasedEnumerationMinPriceTest() {
.withNumberExecutorsRange(0, 2)
.build();
- interestBasedEnumerator.setInstanceTable(TestingUtils.getSimpleCloudInstanceMap());
-
interestBasedEnumerator.preprocessing();
interestBasedEnumerator.processing();
SolutionPoint solution = interestBasedEnumerator.postprocessing();
@@ -454,13 +578,10 @@ public void InterestBasedEnumerationMinPriceTest() {
@Test
public void GridBasedEnumerationMinTimeTest() {
Enumerator gridBasedEnumerator = getGridBasedEnumeratorPrebuild()
- .withOptimizationStrategy(Enumerator.OptimizationStrategy.MinTime)
- .withBudget(Double.MAX_VALUE)
+ .withOptimizationStrategy(Enumerator.OptimizationStrategy.MinPrice)
.withNumberExecutorsRange(0, 2)
.build();
- gridBasedEnumerator.setInstanceTable(TestingUtils.getSimpleCloudInstanceMap());
-
gridBasedEnumerator.preprocessing();
gridBasedEnumerator.processing();
SolutionPoint solution = gridBasedEnumerator.postprocessing();
@@ -475,12 +596,9 @@ public void GridBasedEnumerationMinTimeTest() {
public void InterestBasedEnumerationMinTimeTest() {
Enumerator interestBasedEnumerator = getInterestBasedEnumeratorPrebuild()
.withOptimizationStrategy(Enumerator.OptimizationStrategy.MinTime)
- .withBudget(Double.MAX_VALUE)
.withNumberExecutorsRange(0, 2)
.build();
- interestBasedEnumerator.setInstanceTable(TestingUtils.getSimpleCloudInstanceMap());
-
interestBasedEnumerator.preprocessing();
interestBasedEnumerator.processing();
SolutionPoint solution = interestBasedEnumerator.postprocessing();
@@ -491,29 +609,59 @@ public void InterestBasedEnumerationMinTimeTest() {
Assert.assertEquals(0, solution.numberExecutors);
}
- // Helpers
+ @Test
+ public void PruneBasedEnumerationMinTimeTest() {
+ Enumerator pruneBasedEnumerator = getPruneBasedEnumeratorPrebuild()
+ .withNumberExecutorsRange(0, 2)
+ .build();
+
+ pruneBasedEnumerator.preprocessing();
+ pruneBasedEnumerator.processing();
+ SolutionPoint solution = pruneBasedEnumerator.postprocessing();
+
+ // expected c5.xlarge since it is the cheapest instance
+ Assert.assertEquals("c5.xlarge", solution.driverInstance.getInstanceName());
+ // expected no executor nodes since tested for a 'zero' program
+ Assert.assertEquals(0, solution.numberExecutors);
+ }
+
+
+ // Helpers ---------------------------------------------------------------------------------------------------------
+
private static Enumerator.Builder getGridBasedEnumeratorPrebuild() {
Program emptyProgram = new Program();
+ HashMap<String, CloudInstance> instances = ResourceTestUtils.getSimpleCloudInstanceMap();
return (new Enumerator.Builder())
.withRuntimeProgram(emptyProgram)
+ .withAvailableInstances(instances)
.withEnumerationStrategy(Enumerator.EnumerationStrategy.GridBased)
- .withOptimizationStrategy(Enumerator.OptimizationStrategy.MinPrice)
- .withTimeLimit(Double.MAX_VALUE);
+ .withOptimizationStrategy(Enumerator.OptimizationStrategy.MinPrice);
}
private static Enumerator.Builder getInterestBasedEnumeratorPrebuild() {
Program emptyProgram = new Program();
+ HashMap<String, CloudInstance> instances = ResourceTestUtils.getSimpleCloudInstanceMap();
return (new Enumerator.Builder())
.withRuntimeProgram(emptyProgram)
+ .withAvailableInstances(instances)
.withEnumerationStrategy(Enumerator.EnumerationStrategy.InterestBased)
- .withOptimizationStrategy(Enumerator.OptimizationStrategy.MinPrice)
- .withTimeLimit(Double.MAX_VALUE);
+ .withOptimizationStrategy(Enumerator.OptimizationStrategy.MinPrice);
+ }
+
+ private static Enumerator.Builder getPruneBasedEnumeratorPrebuild() {
+ Program emptyProgram = new Program();
+ HashMap<String, CloudInstance> instances = ResourceTestUtils.getSimpleCloudInstanceMap();
+ return (new Enumerator.Builder())
+ .withRuntimeProgram(emptyProgram)
+ .withAvailableInstances(instances)
+ .withEnumerationStrategy(Enumerator.EnumerationStrategy.PruneBased)
+ .withOptimizationStrategy(Enumerator.OptimizationStrategy.MinPrice);
}
private static void assertInstanceInSearchSpace(
String expectedName,
InstanceSearchSpace searchSpace,
- int memory, /* in GB */
+ double memory, /* in GB */
int cores,
int index
) {
diff --git a/src/test/java/org/apache/sysds/test/component/resource/InstructionsCostEstimatorTest.java b/src/test/java/org/apache/sysds/test/component/resource/InstructionsCostEstimatorTest.java
index 452e82e1442..eee25792709 100644
--- a/src/test/java/org/apache/sysds/test/component/resource/InstructionsCostEstimatorTest.java
+++ b/src/test/java/org/apache/sysds/test/component/resource/InstructionsCostEstimatorTest.java
@@ -40,7 +40,7 @@
import java.util.HashMap;
import static org.apache.sysds.resource.CloudUtils.GBtoBytes;
-import static org.apache.sysds.test.component.resource.TestingUtils.getSimpleCloudInstanceMap;
+import static org.apache.sysds.test.component.resource.ResourceTestUtils.getSimpleCloudInstanceMap;
public class InstructionsCostEstimatorTest {
private static final HashMap<String, CloudInstance> instanceMap = getSimpleCloudInstanceMap();
@@ -49,15 +49,13 @@ public class InstructionsCostEstimatorTest {
@Before
public void setup() {
- ResourceCompiler.setDriverConfigurations(GBtoBytes(8), 4);
- ResourceCompiler.setExecutorConfigurations(4, GBtoBytes(8), 4);
+ ResourceCompiler.setSparkClusterResourceConfigs(GBtoBytes(8), 4, 4, GBtoBytes(8), 4);
estimator = new CostEstimator(new Program(), instanceMap.get("m5.xlarge"), instanceMap.get("m5.xlarge"));
}
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
- // Tests for CP Instructions //
- ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
-
+ // Tests for CP Instructions
+
@Test
public void createvarMatrixVariableCPInstructionTest() throws CostEstimationException {
String instDefinition = "CP°createvar°testVar°testOutputFile°false°MATRIX°binary°100°100°1000°10000°COPY";
@@ -123,9 +121,8 @@ public void randCPInstructionExceedMemoryBudgetTest() {
}
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
- // Tests for Spark Instructions //
- ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
-
+ // Tests for Spark Instructions
+
@Test
public void plusBinaryMatrixMatrixSpInstructionTest() throws CostEstimationException {
HashMap<String, VarStats> inputStats = new HashMap<>();
@@ -138,8 +135,7 @@ public void plusBinaryMatrixMatrixSpInstructionTest() throws CostEstimationExcep
}
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
- // Helper methods for testing Instructions //
- ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
+ // Helper methods for testing Instructions
private VarStats generateStats(String name, long m, long n, long nnz) {
MatrixCharacteristics mc = new MatrixCharacteristics(m, n, nnz);
diff --git a/src/test/java/org/apache/sysds/test/component/resource/RecompilationTest.java b/src/test/java/org/apache/sysds/test/component/resource/RecompilationTest.java
index f80198290f1..178ae2f46ad 100644
--- a/src/test/java/org/apache/sysds/test/component/resource/RecompilationTest.java
+++ b/src/test/java/org/apache/sysds/test/component/resource/RecompilationTest.java
@@ -19,6 +19,9 @@
package org.apache.sysds.test.component.resource;
+import org.apache.sysds.conf.CompilerConfig;
+import org.apache.sysds.conf.ConfigurationManager;
+import org.apache.sysds.resource.CloudUtils;
import org.apache.sysds.resource.ResourceCompiler;
import org.apache.sysds.runtime.controlprogram.BasicProgramBlock;
import org.apache.sysds.runtime.controlprogram.Program;
@@ -32,15 +35,18 @@
import org.junit.Test;
import java.io.IOException;
-import java.util.HashMap;
-import java.util.Map;
-import java.util.Objects;
-import java.util.Optional;
+import java.util.*;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import static org.apache.sysds.resource.CloudUtils.getEffectiveExecutorResources;
import static org.apache.sysds.runtime.controlprogram.context.SparkExecutionContext.SparkClusterConfig.RESERVED_SYSTEM_MEMORY_BYTES;
public class RecompilationTest extends AutomatedTestBase {
- private static final boolean DEBUG_MODE = true;
+ static {
+ ConfigurationManager.getCompilerConfig().set(CompilerConfig.ConfigType.RESOURCE_OPTIMIZATION, true);
+ }
+ private static final boolean DEBUG_MODE = false;
private static final String TEST_DIR = "component/resource/";
private static final String TEST_DATA_DIR = "component/resource/data/";
private static final String HOME = SCRIPT_DIR + TEST_DIR;
@@ -55,29 +61,37 @@ public void setUp() {}
// Tests for setting cluster configurations ------------------------------------------------------------------------
@Test
- public void testSetDriverConfigurations() {
+ public void testSetSingleNodeResourceConfigs() {
long nodeMemory = 1024*1024*1024; // 1GB
long expectedMemory = (long) (0.9 * nodeMemory);
int expectedThreads = 4;
- ResourceCompiler.setDriverConfigurations(nodeMemory, expectedThreads);
+ ResourceCompiler.setSingleNodeResourceConfigs(nodeMemory, expectedThreads);
Assert.assertEquals(expectedMemory, InfrastructureAnalyzer.getLocalMaxMemory());
Assert.assertEquals(expectedThreads, InfrastructureAnalyzer.getLocalParallelism());
}
@Test
- public void testSetExecutorConfigurations() {
+ public void testSetSparkClusterResourceConfigs() {
+ long driverMemory = 4L*1024 * 1024 * 1024; // 4GB
+ int driverThreads = 4;
int numberExecutors = 10;
- long executorMemory = 1024*1024*1024; // 1GB
- long expectedMemoryBudget = (long) (numberExecutors*(executorMemory-RESERVED_SYSTEM_MEMORY_BYTES)*0.6);
+ long executorMemory = 1024 * 1024 * 1024; // 1GB
int executorThreads = 4;
- int expectedParallelism = numberExecutors*executorThreads;
-
- ResourceCompiler.setExecutorConfigurations(numberExecutors, executorMemory, executorThreads);
- Assert.assertEquals(numberExecutors, SparkExecutionContext.getNumExecutors());
- Assert.assertEquals(expectedMemoryBudget, (long) SparkExecutionContext.getDataMemoryBudget(false, false));
+ ResourceCompiler.setSparkClusterResourceConfigs(driverMemory, driverThreads, numberExecutors, executorMemory, executorThreads);
+
+ long expectedDriverMemory = CloudUtils.calculateEffectiveDriverMemoryBudget(driverMemory, executorThreads*numberExecutors);
+ int[] expectedExecutorValues = getEffectiveExecutorResources(executorMemory, executorThreads, numberExecutors);
+ int expectedNumExecutors = expectedExecutorValues[2];
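+ // expected data budget: 0.6 (matching Spark's default memory fraction) of each executor's usable memory, i.e. the per-executor value (presumably in MB, hence the conversion to bytes) minus the reserved system memory, summed over all executors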
+ long expectedExecutorMemoryBudget = (long) (0.6 * (
+ expectedNumExecutors * (expectedExecutorValues[0] * 1024 * 1024L - RESERVED_SYSTEM_MEMORY_BYTES)));
+ int expectedParallelism = expectedExecutorValues[1] * expectedExecutorValues[2];
+ Assert.assertEquals(expectedDriverMemory, InfrastructureAnalyzer.getLocalMaxMemory());
+ Assert.assertEquals(driverThreads, InfrastructureAnalyzer.getLocalParallelism());
+ Assert.assertEquals(expectedExecutorValues[2], SparkExecutionContext.getNumExecutors());
+ Assert.assertEquals(expectedExecutorMemoryBudget, (long) SparkExecutionContext.getDataMemoryBudget(false, false));
Assert.assertEquals(expectedParallelism, SparkExecutionContext.getDefaultParallelism(false));
}
@@ -89,7 +103,7 @@ public void test_CP_MM_Enforced() throws IOException {
// X = A.csv: (10^5)x(10^4) = 10^9 ~ 8GB
// Y = B.csv: (10^4)x(10^3) = 10^7 ~ 80MB
// X %*% Y -> (10^5)x(10^3) = 10^8 ~ 800MB
- runTestMM("A.csv", "B.csv", 8L*1024*1024*1024, 0, -1, "ba+*");
+ runTestMM("A.csv", "B.csv", 8L*1024*1024*1024, 0, 0, "ba+*", false);
}
@Test
@@ -98,7 +112,7 @@ public void test_CP_MM_Preferred() throws IOException {
// X = A.csv: (10^5)x(10^4) = 10^9 ~ 8GB
// Y = B.csv: (10^4)x(10^3) = 10^7 ~ 80MB
// X %*% Y -> (10^5)x(10^3) = 10^8 ~ 800MB
- runTestMM("A.csv", "B.csv", 16L*1024*1024*1024, 2, 1024*1024*1024, "ba+*");
+ runTestMM("A.csv", "B.csv", 16L*1024*1024*1024, 2, 1024*1024*1024, "ba+*", false);
}
@Test
@@ -107,7 +121,7 @@ public void test_SP_MAPMM() throws IOException {
// X = A.csv: (10^5)x(10^4) = 10^9 ~ 8GB
// Y = B.csv: (10^4)x(10^3) = 10^7 ~ 80MB
// X %*% Y -> (10^5)x(10^3) = 10^8 ~ 800MB
- runTestMM("A.csv", "B.csv", 4L*1024*1024*1024, 2, 4L*1024*1024*1024, "mapmm");
+ runTestMM("A.csv", "B.csv", 4L*1024*1024*1024, 2, 4L*1024*1024*1024, "mapmm", true);
}
@Test
@@ -116,7 +130,7 @@ public void test_SP_RMM() throws IOException {
// X = A.csv: (10^5)x(10^4) = 10^9 ~ 8GB
// Y = B.csv: (10^4)x(10^3) = 10^7 ~ 80MB
// X %*% Y -> (10^5)x(10^3) = 10^8 ~ 800MB
- runTestMM("A.csv", "B.csv", 1024*1024*1024, 2, (long) (0.5*1024*1024*1024), "rmm");
+ runTestMM("A.csv", "B.csv", 4L*1024*1024*1024, 4, 1024*1024*1024, "rmm", true);
}
@Test
@@ -125,7 +139,7 @@ public void test_SP_CPMM() throws IOException {
// X = A.csv: (10^5)x(10^4) = 10^9 ~ 8GB
// Y = C.csv: (10^4)x(10^4) = 10^8 ~ 800MB
// X %*% Y -> (10^5)x(10^4) = 10^9 ~ 8GB
- runTestMM("A.csv", "C.csv", 8L*1024*1024*1024, 2, 4L*1024*1024*1024, "cpmm");
+ runTestMM("A.csv", "C.csv", 8L*1024*1024*1024, 2, 4L*1024*1024*1024, "cpmm", true);
}
// Tests for transposed self matrix multiplication (t(X)%*%X) ------------------------------------------------------
@@ -138,14 +152,6 @@ public void test_CP_TSMM() throws IOException {
runTestTSMM("B.csv", 8L*1024*1024*1024, 0, -1, "tsmm", false);
}
- @Test
- public void test_SP_TSMM() throws IOException {
- // Distributed cluster with 1GB driver memory and 8GB executor memory -> tsmm operator in Spark
- // X = D.csv: (10^5)x(10^3) = 10^8 ~ 800MB
- // t(X) %*% X -> (10^3)x(10^3) = 10^6 ~ 8MB (single block)
- runTestTSMM("D.csv", 1024*1024*1024, 2, 8L*1024*1024*1024, "tsmm", true);
- }
-
@Test
public void test_SP_TSMM_as_CPMM() throws IOException {
// Distributed cluster with 8GB driver memory and 8GB executor memory -> cpmm operator in Spark
@@ -156,50 +162,64 @@ public void test_SP_TSMM_as_CPMM() throws IOException {
@Test
public void test_MM_RecompilationSequence() throws IOException {
- Map<String, String> nvargs = new HashMap<>();
- nvargs.put("$X", HOME_DATA+"A.csv");
- nvargs.put("$Y", HOME_DATA+"B.csv");
+ runTestMM("A.csv", "B.csv", 8L*1024*1024*1024, 0, -1, "ba+*", false);
- // pre-compiled program using default values to be used as source for the recompilation
- Program precompiledProgram = generateInitialProgram(HOME+"mm_test.dml", nvargs);
- // original compilation used for comparison
- Program expectedProgram;
-
- ResourceCompiler.setDriverConfigurations(8L*1024*1024*1024, driverThreads);
- ResourceCompiler.setSingleNodeExecution();
- expectedProgram = ResourceCompiler.compile(HOME+"mm_test.dml", nvargs);
- runTest(precompiledProgram, expectedProgram, 8L*1024*1024*1024, 0, -1, "ba+*", false);
-
- ResourceCompiler.setDriverConfigurations(16L*1024*1024*1024, driverThreads);
- ResourceCompiler.setExecutorConfigurations(4, 1024*1024*1024, executorThreads);
- expectedProgram = ResourceCompiler.compile(HOME+"mm_test.dml", nvargs);
- runTest(precompiledProgram, expectedProgram, 16L*1024*1024*1024, 4, 1024*1024*1024, "ba+*", false);
-
- ResourceCompiler.setDriverConfigurations(4L*1024*1024*1024, driverThreads);
- ResourceCompiler.setExecutorConfigurations(2, 4L*1024*1024*1024, executorThreads);
- expectedProgram = ResourceCompiler.compile(HOME+"mm_test.dml", nvargs);
- runTest(precompiledProgram, expectedProgram, 4L*1024*1024*1024, 2, 4L*1024*1024*1024, "mapmm", true);
-
- ResourceCompiler.setDriverConfigurations(1024*1024*1024, driverThreads);
- ResourceCompiler.setExecutorConfigurations(2, (long) (0.5*1024*1024*1024), executorThreads);
- expectedProgram = ResourceCompiler.compile(HOME+"mm_test.dml", nvargs);
- runTest(precompiledProgram, expectedProgram, 1024*1024*1024, 2, (long) (0.5*1024*1024*1024), "rmm", true);
-
- ResourceCompiler.setDriverConfigurations(8L*1024*1024*1024, driverThreads);
- ResourceCompiler.setSingleNodeExecution();
- expectedProgram = ResourceCompiler.compile(HOME+"mm_test.dml", nvargs);
- runTest(precompiledProgram, expectedProgram, 8L*1024*1024*1024, 0, -1, "ba+*", false);
+ runTestMM("A.csv", "B.csv", 16L*1024*1024*1024, 4, 1024*1024*1024, "ba+*", false);
+
+ runTestMM("A.csv", "B.csv", 4L*1024*1024*1024, 2, 4L*1024*1024*1024, "mapmm", true);
+
+ runTestMM("A.csv", "B.csv", 4L*1024*1024*1024, 4, 1024*1024*1024, "rmm", true);
+
+ runTestMM("A.csv", "B.csv", 8L*1024*1024*1024, 0, -1, "ba+*", false);
+ }
+
+ @Test
+ public void test_L2SVM() throws IOException {
+ runTestAlgorithm("Algorithm_L2SVM.dml", 8L*1024*1024*1024, 0, -1);
+ runTestAlgorithm("Algorithm_L2SVM.dml", 8L*1024*1024*1024, 4, 4L*1024*1024*1024);
+ }
+
+ @Test
+ public void test_LinReg() throws IOException {
+ runTestAlgorithm("Algorithm_Linreg.dml", 8L*1024*1024*1024, 0, -1);
+ runTestAlgorithm("Algorithm_Linreg.dml", 8L*1024*1024*1024, 4, 4L*1024*1024*1024);
+ }
+
+ @Test
+ public void test_PCA() throws IOException {
+ runTestAlgorithm("Algorithm_PCA.dml", 8L*1024*1024*1024, 0, -1);
+ runTestAlgorithm("Algorithm_PCA.dml", 8L*1024*1024*1024, 4, 8L*1024*1024*1024);
+ }
+
+ @Test
+ public void test_PCA_Hybrid() throws IOException {
+ HashMap<String, String> nvargs = new HashMap<>();
+ nvargs.put("$m", "10000000");
+ nvargs.put("$n", "100000");
+ runTestAlgorithm("Algorithm_PCA.dml", 16L*1024*1024*1024, 4, 16L*1024*1024*1024, nvargs);
+ }
+
+ @Test
+ public void test_PNMF() throws IOException {
+ runTestAlgorithm("Algorithm_PNMF.dml", 8L*1024*1024*1024, 0, -1);
+ runTestAlgorithm("Algorithm_PNMF.dml", 8L*1024*1024*1024, 4, 4L*1024*1024*1024);
+ }
+
+ @Test
+ public void test_PNMF_Hybrid() throws IOException {
+ HashMap<String, String> nvargs = new HashMap<>();
+ nvargs.put("$m", "10000000");
+ nvargs.put("$n", "10000");
+ runTestAlgorithm("Algorithm_PNMF.dml", 8L*1024*1024*1024, 4, 4L*1024*1024*1024, nvargs);
}
// Helper functions ------------------------------------------------------------------------------------------------
private Program generateInitialProgram(String filePath, Map<String, String> args) throws IOException {
- ResourceCompiler.setDriverConfigurations(ResourceCompiler.DEFAULT_DRIVER_MEMORY, ResourceCompiler.DEFAULT_DRIVER_THREADS);
- ResourceCompiler.setExecutorConfigurations(ResourceCompiler.DEFAULT_NUMBER_EXECUTORS, ResourceCompiler.DEFAULT_EXECUTOR_MEMORY, ResourceCompiler.DEFAULT_EXECUTOR_THREADS);
+ ResourceCompiler.setSparkClusterResourceConfigs(4L*1024*1024*1024, 4, ResourceCompiler.DEFAULT_NUMBER_EXECUTORS, ResourceCompiler.DEFAULT_EXECUTOR_MEMORY, ResourceCompiler.DEFAULT_EXECUTOR_THREADS);
return ResourceCompiler.compile(filePath, args);
}
- private void runTestMM(String fileX, String fileY, long driverMemory, int numberExecutors, long executorMemory, String expectedOpcode) throws IOException {
- boolean expectedSparkExecType = !Objects.equals(expectedOpcode,"ba+*");
+ private void runTestMM(String fileX, String fileY, long driverMemory, int numberExecutors, long executorMemory, String expectedOpcode, boolean expectedSparkExecType) throws IOException {
Map<String, String> nvargs = new HashMap<>();
nvargs.put("$X", HOME_DATA+fileX);
nvargs.put("$Y", HOME_DATA+fileY);
@@ -207,16 +227,20 @@ private void runTestMM(String fileX, String fileY, long driverMemory, int number
// pre-compiled program using default values to be used as source for the recompilation
Program precompiledProgram = generateInitialProgram(HOME+"mm_test.dml", nvargs);
- ResourceCompiler.setDriverConfigurations(driverMemory, driverThreads);
if (numberExecutors > 0) {
- ResourceCompiler.setExecutorConfigurations(numberExecutors, executorMemory, executorThreads);
+ ResourceCompiler.setSparkClusterResourceConfigs(driverMemory, driverThreads, numberExecutors, executorMemory, executorThreads);
} else {
- ResourceCompiler.setSingleNodeExecution();
+ ResourceCompiler.setSingleNodeResourceConfigs(driverMemory, driverThreads);
}
// original compilation used for comparison
Program expectedProgram = ResourceCompiler.compile(HOME+"mm_test.dml", nvargs);
- runTest(precompiledProgram, expectedProgram, driverMemory, numberExecutors, executorMemory, expectedOpcode, expectedSparkExecType);
+ Program recompiledProgram = runTest(precompiledProgram, expectedProgram, driverMemory, numberExecutors, executorMemory);
+ System.out.println(Explain.explain(recompiledProgram));
+ Optional<Instruction> mmInstruction = ((BasicProgramBlock) recompiledProgram.getProgramBlocks().get(0)).getInstructions().stream()
+ .filter(inst -> (Objects.equals(expectedSparkExecType, inst instanceof SPInstruction) && Objects.equals(inst.getOpcode(), expectedOpcode)))
+ .findFirst();
+ Assert.assertTrue(mmInstruction.isPresent());
}
private void runTestTSMM(String fileX, long driverMemory, int numberExecutors, long executorMemory, String expectedOpcode, boolean expectedSparkExecType) throws IOException {
@@ -226,33 +250,111 @@ private void runTestTSMM(String fileX, long driverMemory, int numberExecutors, l
// pre-compiled program using default values to be used as source for the recompilation
Program precompiledProgram = generateInitialProgram(HOME+"mm_transpose_test.dml", nvargs);
- ResourceCompiler.setDriverConfigurations(driverMemory, driverThreads);
if (numberExecutors > 0) {
- ResourceCompiler.setExecutorConfigurations(numberExecutors, executorMemory, executorThreads);
+ ResourceCompiler.setSparkClusterResourceConfigs(driverMemory, driverThreads, numberExecutors, executorMemory, executorThreads);
} else {
- ResourceCompiler.setSingleNodeExecution();
+ ResourceCompiler.setSingleNodeResourceConfigs(driverMemory, driverThreads);
}
// original compilation used for comparison
Program expectedProgram = ResourceCompiler.compile(HOME+"mm_transpose_test.dml", nvargs);
- runTest(precompiledProgram, expectedProgram, driverMemory, numberExecutors, executorMemory, expectedOpcode, expectedSparkExecType);
+ Program recompiledProgram = runTest(precompiledProgram, expectedProgram, driverMemory, numberExecutors, executorMemory);
+ Optional<Instruction> mmInstruction = ((BasicProgramBlock) recompiledProgram.getProgramBlocks().get(0)).getInstructions().stream()
+ .filter(inst -> (Objects.equals(expectedSparkExecType, inst instanceof SPInstruction) && Objects.equals(inst.getOpcode(), expectedOpcode)))
+ .findFirst();
+ Assert.assertTrue(mmInstruction.isPresent());
}
- private void runTest(Program precompiledProgram, Program expectedProgram, long driverMemory, int numberExecutors, long executorMemory, String expectedOpcode, boolean expectedSparkExecType) {
- String expectedProgramExplained = Explain.explain(expectedProgram);
+ private void runTestAlgorithm(String dmlScript, long driverMemory, int numberExecutors, long executorMemory) throws IOException {
+ Map<String, String> nvargs = new HashMap<>();
+ runTestAlgorithm(dmlScript, driverMemory, numberExecutors, executorMemory, nvargs);
+ }
+ private void runTestAlgorithm(String dmlScript, long driverMemory, int numberExecutors, long executorMemory,
+ Map<String, String> nvargs) throws IOException {
+ // pre-compiled program using default values to be used as source for the recompilation
+ Program precompiledProgram = generateInitialProgram(HOME+dmlScript, nvargs);
+ System.out.println("precompiled");
+ System.out.println(Explain.explain(precompiledProgram));
+ if (numberExecutors > 0) {
+ ResourceCompiler.setSparkClusterResourceConfigs(driverMemory, driverThreads, numberExecutors, executorMemory, executorThreads);
+ } else {
+ ResourceCompiler.setSingleNodeResourceConfigs(driverMemory, driverThreads);
+ }
+ // original compilation used for comparison
+ Program expectedProgram = ResourceCompiler.compile(HOME+dmlScript, nvargs);
+ System.out.println("expected");
+ System.out.println(Explain.explain(expectedProgram));
+ runTest(precompiledProgram, expectedProgram, driverMemory, numberExecutors, executorMemory);
+ }
+
+ private Program runTest(Program precompiledProgram, Program expectedProgram, long driverMemory, int numberExecutors, long executorMemory) {
+ if (DEBUG_MODE) System.out.println(Explain.explain(expectedProgram));
Program recompiledProgram;
if (numberExecutors == 0) {
- recompiledProgram = ResourceCompiler.doFullRecompilation(precompiledProgram, driverMemory, driverThreads);
+ ResourceCompiler.setSingleNodeResourceConfigs(driverMemory, driverThreads);
+ recompiledProgram = ResourceCompiler.doFullRecompilation(precompiledProgram);
} else {
- recompiledProgram = ResourceCompiler.doFullRecompilation(precompiledProgram, driverMemory, driverThreads, numberExecutors, executorMemory, executorThreads);
+ ResourceCompiler.setSparkClusterResourceConfigs(
+ driverMemory,
+ driverThreads,
+ numberExecutors,
+ executorMemory,
+ executorThreads
+ );
+ recompiledProgram = ResourceCompiler.doFullRecompilation(precompiledProgram);
}
- String actualProgramExplained = Explain.explain(recompiledProgram);
+ System.out.println("recompiled");
+ System.out.println(Explain.explain(recompiledProgram));
- if (DEBUG_MODE) System.out.println(actualProgramExplained);
+ if (DEBUG_MODE) System.out.println(Explain.explain(recompiledProgram));
+ assertEqualPrograms(expectedProgram, recompiledProgram);
+ return recompiledProgram;
+ }
+
+ private void assertEqualPrograms(Program expected, Program actual) {
+ // strip generic/empty program blocks and normalize generated variable names before comparing
+ String expectedProgramExplained = stripGeneralAndReplaceRandoms(Explain.explain(expected));
+ String actualProgramExplained = stripGeneralAndReplaceRandoms(Explain.explain(actual));
Assert.assertEquals(expectedProgramExplained, actualProgramExplained);
- Optional<Instruction> mmInstruction = ((BasicProgramBlock) recompiledProgram.getProgramBlocks().get(0)).getInstructions().stream()
- .filter(inst -> (Objects.equals(expectedSparkExecType, inst instanceof SPInstruction) && Objects.equals(inst.getOpcode(), expectedOpcode)))
- .findFirst();
- Assert.assertTrue(mmInstruction.isPresent());
+ }
+
+ private String stripGeneralAndReplaceRandoms(String explainedProgram) {
+ String[] lines = explainedProgram.split("\\n");
+ StringBuilder strippedBuilder = new StringBuilder();
+
+ LinkedList<String> replaceList = new LinkedList<>();
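+ // matches compiler-generated variable names (e.g. _Var1, _mVar2, _sbcvar3) so they can be replaced with a stable placeholder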
+ Pattern patternUnique = Pattern.compile("(_Var|_mVar|_sbcvar)(\\d+)");
+
+ for (String line : lines) {
+ String pureLine = line.replaceFirst("^-*", "");
+ if (pureLine.startsWith("PROGRAM") || pureLine.startsWith("GENERIC") || pureLine.startsWith("CP rmvar")) {
+ continue;
+ } else if (pureLine.startsWith("CP mvvar") || pureLine.startsWith("CP cpvar")) {
+ String[] parts = pureLine.split(" ");
+ String lastPart = parts[parts.length - 1];
+ if (!patternUnique.matcher(lastPart).matches()) {
+ replaceList.add(lastPart);
+ }
+ continue;
+ }
+ if (pureLine.startsWith("CP") || pureLine.startsWith("SPARK")) {
+ line = line.replaceFirst("\\b/temp\\d+\\b", "/tempX");
+ Matcher matcherUnique = patternUnique.matcher(line);
+ StringBuilder newLine = new StringBuilder();
+ while (matcherUnique.find()) {
+ matcherUnique.appendReplacement(newLine, "testVar");
+ }
+ matcherUnique.appendTail(newLine);
+ line = newLine.toString();
+ } else if (pureLine.startsWith("FUNCTION")) {
+ line = pureLine.replaceFirst("recompile=true", "recompile=false");
+ }
+ strippedBuilder.append(line).append("\n");
+ }
+ String strippedProgram = "\n" + strippedBuilder.toString().trim() + "\n";
+ for (String literalVar : replaceList) {
+ strippedProgram = strippedProgram.replaceAll("\\b "+literalVar+".\\b", " testVar.");
+ }
+ return strippedProgram;
}
}
diff --git a/src/test/java/org/apache/sysds/test/component/resource/ResourceOptimizerTest.java b/src/test/java/org/apache/sysds/test/component/resource/ResourceOptimizerTest.java
new file mode 100644
index 00000000000..6f9ac86513b
--- /dev/null
+++ b/src/test/java/org/apache/sysds/test/component/resource/ResourceOptimizerTest.java
@@ -0,0 +1,428 @@
+package org.apache.sysds.test.component.resource;
+
+import org.apache.commons.cli.*;
+import org.apache.commons.configuration2.PropertiesConfiguration;
+import org.apache.sysds.resource.CloudInstance;
+import org.apache.sysds.resource.ResourceOptimizer;
+import org.apache.sysds.resource.enumeration.Enumerator;
+import org.apache.sysds.resource.enumeration.GridBasedEnumerator;
+import org.apache.sysds.resource.enumeration.InterestBasedEnumerator;
+import org.apache.sysds.resource.enumeration.PruneBasedEnumerator;
+import org.apache.sysds.test.AutomatedTestBase;
+import org.junit.Assert;
+import org.junit.Ignore;
+import org.junit.Test;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.util.ArrayList;
+import java.util.HashMap;
+
+import static org.apache.sysds.resource.ResourceOptimizer.createOptions;
+import static org.apache.sysds.resource.ResourceOptimizer.initEnumerator;
+import static org.apache.sysds.test.component.resource.ResourceTestUtils.*;
+
+public class ResourceOptimizerTest extends AutomatedTestBase {
+ private static final String TEST_DIR = "component/resource/";
+ private static final String HOME = SCRIPT_DIR + TEST_DIR;
+
+ @Override
+ public void setUp() {}
+
+ @Test
+ public void initEnumeratorFromArgsDefaultsTest() {
+ String[] args = {
+ "-f", HOME+"Algorithm_L2SVM.dml",
+ };
+ PropertiesConfiguration options = generateTestingOptionsRequired("any");
+
+ Enumerator actualEnumerator = assertProperEnumeratorInitialization(args, options);
+ Assert.assertTrue(actualEnumerator instanceof GridBasedEnumerator);
+ // assert all defaults
+ HashMap<String, CloudInstance> expectedInstances = getSimpleCloudInstanceMap();
+ HashMap<String, CloudInstance> actualInstances = actualEnumerator.getInstances();
+ for (String instanceName: expectedInstances.keySet()) {
+ assertEqualsCloudInstances(expectedInstances.get(instanceName), actualInstances.get(instanceName));
+ }
+ Assert.assertEquals(Enumerator.EnumerationStrategy.GridBased, actualEnumerator.getEnumStrategy());
+ Assert.assertEquals(Enumerator.OptimizationStrategy.MinCosts, actualEnumerator.getOptStrategy());
+ // assert enum. specific default
+ GridBasedEnumerator gridBasedEnumerator = (GridBasedEnumerator) actualEnumerator;
+ Assert.assertEquals(1, gridBasedEnumerator.getStepSize());
+ Assert.assertEquals(-1, gridBasedEnumerator.getExpBase());
+ }
+
+ @Test
+ public void initEnumeratorFromArgsWithArgNTest() throws IOException {
+ File dmlScript = generateTmpDMLScript("m = $1;", "n = $2;");
+
+ String[] args = {
+ "-f", dmlScript.getPath(),
+ "-args", "10", "100"
+ };
+ PropertiesConfiguration options = generateTestingOptionsRequired("any");
+
+ assertProperEnumeratorInitialization(args, options);
+
+ Files.deleteIfExists(dmlScript.toPath());
+ }
+
+ @Test
+ public void initEnumeratorFromArgsWithNvargTest() throws IOException {
+ File dmlScript = generateTmpDMLScript("m = $m;", "n = $n;");
+
+ String[] args = {
+ "-f", dmlScript.getPath(),
+ "-nvargs", "m=10", "n=100"
+ };
+ PropertiesConfiguration options = generateTestingOptionsRequired("any");
+
+ assertProperEnumeratorInitialization(args, options);
+
+ Files.deleteIfExists(dmlScript.toPath());
+ }
+
+ @Test
+ public void initEnumeratorCostsWeightOptimizationInvalidTest() {
+ String[] args = {
+ "-f", HOME+"Algorithm_L2SVM.dml",
+ };
+ Options options = createOptions();
+ CommandLineParser clParser = new PosixParser();
+ CommandLine line = null;
+ try {
+ line = clParser.parse(options, args);
+ } catch (ParseException e) {
+ Assert.fail("ParseException should not have been raise here: "+e);
+ }
+ PropertiesConfiguration invalidOptions = generateTestingOptionsRequired("any");
+ invalidOptions.setProperty("OPTIMIZATION_FUNCTION", "costs");
+ invalidOptions.setProperty("COSTS_WEIGHT", "10");
+ try {
+ initEnumerator(line, invalidOptions);
+ Assert.fail("ParseException should have been raise here for not provided MAX_PRICE option");
+ } catch (Exception e) {
+ Assert.assertTrue(e instanceof ParseException);
+ }
+
+
+ String[] validArgs = {
+ "-f", HOME+"Algorithm_L2SVM.dml",
+ };
+ PropertiesConfiguration validOptions = generateTestingOptionsRequired("any");
+ validOptions.setProperty("OPTIMIZATION_FUNCTION", "costs");
+ validOptions.setProperty("COSTS_WEIGHT", "0.1");
+ Enumerator actualEnumerator = assertProperEnumeratorInitialization(validArgs, validOptions);
+ Assert.assertEquals(Enumerator.OptimizationStrategy.MinCosts, actualEnumerator.getOptStrategy());
+ Assert.assertEquals(0.1, actualEnumerator.getCostsWeightFactor(), 0.0);
+ }
+
+ @Test
+ public void initEnumeratorMinTimeOptimizationInvalidTest() {
+ String[] args = {
+ "-f", HOME+"Algorithm_L2SVM.dml",
+ };
+ Options options = createOptions();
+ CommandLineParser clParser = new PosixParser();
+ CommandLine line = null;
+ try {
+ line = clParser.parse(options, args);
+ } catch (ParseException e) {
+ Assert.fail("ParseException should not have been raise here: "+e);
+ }
+ PropertiesConfiguration invalidOptions = generateTestingOptionsRequired("any");
+ invalidOptions.setProperty("OPTIMIZATION_FUNCTION", "time");
+ try {
+ initEnumerator(line, invalidOptions);
+ Assert.fail("ParseException should have been raise here for not provided MAX_PRICE option");
+ } catch (Exception e) {
+ Assert.assertTrue(e instanceof ParseException);
+ }
+
+
+ String[] validArgs = {
+ "-f", HOME+"Algorithm_L2SVM.dml",
+ };
+ PropertiesConfiguration validOptions = generateTestingOptionsRequired("any");
+ validOptions.setProperty("OPTIMIZATION_FUNCTION", "time");
+ validOptions.setProperty("MAX_PRICE", "1000");
+ Enumerator actualEnumerator = assertProperEnumeratorInitialization(validArgs, validOptions);
+ Assert.assertEquals(Enumerator.OptimizationStrategy.MinTime, actualEnumerator.getOptStrategy());
+ Assert.assertEquals(1000, actualEnumerator.getMaxPrice(), 0.0);
+ }
+
+ @Test
+ public void initEnumeratorMinPriceOptimizationInvalidTest() {
+ String[] args = {
+ "-f", HOME+"Algorithm_L2SVM.dml",
+ };
+ Options options = createOptions();
+ CommandLineParser clParser = new PosixParser();
+ CommandLine line = null;
+ try {
+ line = clParser.parse(options, args);
+ } catch (ParseException e) {
+ Assert.fail("ParseException should not have been raise here: "+e);
+ }
+ PropertiesConfiguration invalidOptions = generateTestingOptionsRequired("any");
+ invalidOptions.setProperty("OPTIMIZATION_FUNCTION", "price");
+ try {
+ initEnumerator(line, invalidOptions);
+ Assert.fail("ParseException should have been raise here for not provided MAX_TIME option");
+ } catch (Exception e) {
+ Assert.assertTrue(e instanceof ParseException);
+ }
+
+
+ String[] validArgs = {
+ "-f", HOME+"Algorithm_L2SVM.dml",
+ };
+ PropertiesConfiguration validOptions = generateTestingOptionsRequired("any");
+ validOptions.setProperty("OPTIMIZATION_FUNCTION", "price");
+ validOptions.setProperty("MAX_TIME", "1000");
+ Enumerator actualEnumerator = assertProperEnumeratorInitialization(validArgs, validOptions);
+ Assert.assertEquals(Enumerator.OptimizationStrategy.MinPrice, actualEnumerator.getOptStrategy());
+ Assert.assertEquals(1000, actualEnumerator.getMaxTime(), 0.0);
+ }
+
+ @Test
+ public void initGridEnumeratorWithAllOptionalArgsTest() {
+ String[] args = {
+ "-f", HOME+"Algorithm_L2SVM.dml",
+ };
+ PropertiesConfiguration options = generateTestingOptionsRequired("any");
+ options.setProperty("ENUMERATION", "grid");
+ options.setProperty("STEP_SIZE", "3");
+ options.setProperty("EXPONENTIAL_BASE", "2");
+
+ Enumerator actualEnumerator = assertProperEnumeratorInitialization(args, options);
+ Assert.assertTrue(actualEnumerator instanceof GridBasedEnumerator);
+ // assert enum. specific default
+ Assert.assertEquals(3, ((GridBasedEnumerator) actualEnumerator).getStepSize());
+ Assert.assertEquals(2, ((GridBasedEnumerator) actualEnumerator).getExpBase());
+ }
+
+ @Test
+ public void initInterestEnumeratorWithDefaultsTest() {
+ String[] args = {
+ "-f", HOME+"Algorithm_L2SVM.dml",
+ };
+ PropertiesConfiguration options = generateTestingOptionsRequired("any");
+ options.setProperty("ENUMERATION", "interest");
+
+ Enumerator actualEnumerator = assertProperEnumeratorInitialization(args, options);
+ Assert.assertTrue(actualEnumerator instanceof InterestBasedEnumerator);
+ // assert enum. specific default
+ Assert.assertTrue(((InterestBasedEnumerator) actualEnumerator).interestLargestEstimateEnabled());
+ Assert.assertTrue(((InterestBasedEnumerator) actualEnumerator).interestEstimatesInCPEnabled());
+ Assert.assertTrue(((InterestBasedEnumerator) actualEnumerator).interestBroadcastVars());
+ Assert.assertFalse(((InterestBasedEnumerator) actualEnumerator).interestOutputCachingEnabled());
+
+ }
+
+ @Test
+ public void initPruneEnumeratorWithDefaultsTest() {
+ String[] args = {
+ "-f", HOME+"Algorithm_L2SVM.dml",
+ };
+ PropertiesConfiguration options = generateTestingOptionsRequired("any");
+ options.setProperty("ENUMERATION", "prune");
+
+ Enumerator actualEnumerator = assertProperEnumeratorInitialization(args, options);
+ Assert.assertTrue(actualEnumerator instanceof PruneBasedEnumerator);
+ }
+
+ @Test
+ public void initInterestEnumeratorWithAllOptionsTest() {
+ // set all the flags to opposite values to their defaults
+ String[] args = {
+ "-f", HOME+"Algorithm_L2SVM.dml",
+ };
+ PropertiesConfiguration options = generateTestingOptionsRequired("any");
+ options.setProperty("ENUMERATION", "interest");
+ options.setProperty("USE_LARGEST_ESTIMATE", "false");
+ options.setProperty("USE_CP_ESTIMATES", "false");
+ options.setProperty("USE_BROADCASTS", "false");
+ options.setProperty("USE_OUTPUTS", "true");
+
+ InterestBasedEnumerator actualEnumerator =
+ (InterestBasedEnumerator) assertProperEnumeratorInitialization(args, options);
+ // assert enum. specific default
+ Assert.assertFalse(actualEnumerator.interestLargestEstimateEnabled());
+ Assert.assertFalse(actualEnumerator.interestEstimatesInCPEnabled());
+ Assert.assertFalse(actualEnumerator.interestBroadcastVars());
+ Assert.assertTrue(actualEnumerator.interestOutputCachingEnabled());
+ }
+
+ @Test
+ public void initEnumeratorWithInstanceRangeTest() {
+ String[] args = {
+ "-f", HOME+"Algorithm_L2SVM.dml",
+ };
+ PropertiesConfiguration options = generateTestingOptionsRequired("any");
+ options.setProperty("INSTANCE_FAMILIES", "m5");
+ options.setProperty("INSTANCE_SIZES", "2xlarge");
+
+ Enumerator actualEnumerator = assertProperEnumeratorInitialization(args, options);
+
+ HashMap<String, CloudInstance> inputInstances = getSimpleCloudInstanceMap();
+ HashMap<String, CloudInstance> expectedInstances = new HashMap<>();
+ expectedInstances.put("m5.2xlarge", inputInstances.get("m5.2xlarge"));
+
+ HashMap<String, CloudInstance> actualInstances = actualEnumerator.getInstances();
+
+ for (String instanceName: expectedInstances.keySet()) {
+ assertEqualsCloudInstances(expectedInstances.get(instanceName), actualInstances.get(instanceName));
+ }
+ }
+
+ @Test
+ public void initEnumeratorWithCustomCPUQuotaTest() {
+ String[] args = {
+ "-f", HOME+"Algorithm_L2SVM.dml",
+ };
+ PropertiesConfiguration options = generateTestingOptionsRequired("any");
+ options.setProperty("CPU_QUOTA", "256");
+
+ Enumerator actualEnumerator = assertProperEnumeratorInitialization(args, options);
+
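+ // with CPU_QUOTA=256, 128 vCPUs counted against the quota (first argument) and 16 cores per executor, at most (256-128)/16 = 8 executors fit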
+ ArrayList<Integer> actualRange = actualEnumerator.estimateRangeExecutors(128, -1, 16);
+ Assert.assertEquals(8, actualRange.size());
+ Assert.assertEquals(8, (int) actualRange.get(7));
+ }
+
+ @Test
+ public void mainWithHelpArgTest() {
+ // test with valid argument combination
+ String[] validArgs = {
+ "-help"
+ };
+ try {
+ ResourceOptimizer.main(validArgs);
+ } catch (Exception e) {
+ Assert.fail("Passing only '-help' should never raise an exception, but the following one was raised: "+e);
+ }
+
+ // test with invalid argument combination
+ String[] invalidArgs = {
+ "-help",
+ "-f", HOME+"Algorithm_L2SVM.dml",
+ };
+ try {
+ ResourceOptimizer.main(invalidArgs);
+ Assert.fail("Passing '-help' and '-f' is not a valid combination but no exception was raised");
+ } catch (Exception e) {
+ Assert.assertTrue(e instanceof ParseException);
+ }
+ }
+
+ @Test
+ public void executeForL2SVM_MinimalSearchSpace_Test() throws IOException, ParseException {
+ Path tmpOutFolder = Files.createTempDirectory("out");
+
+ String[] args = {
+ "-f", HOME+"Algorithm_L2SVM.dml",
+ "-nvargs", "m=200000", "n=10000"
+ };
+ Options cliOptions = createOptions();
+ CommandLineParser clParser = new PosixParser();
+ CommandLine line = null;
+ try {
+ line = clParser.parse(cliOptions, args);
+ } catch (ParseException e) {
+ Assert.fail("ParseException should not have been raise here: "+e);
+ }
+ PropertiesConfiguration options = generateTestingOptionsRequired(tmpOutFolder.toString());
+ options.setProperty("MAX_EXECUTORS", "10");
+
+ ResourceOptimizer.execute(line, options);
+
+ if (!DEBUG) {
+ deleteDirectoryWithFiles(tmpOutFolder);
+ }
+ }
+
+ @Test
+ public void executeForL2SVM_MinimalSearchSpace_C5_XLARGE_Test() throws IOException, ParseException {
+ Path tmpOutFolder = Files.createTempDirectory("out");
+
+ String[] args = {
+ "-f", HOME+"Algorithm_L2SVM.dml",
+ "-nvargs", "m=200000", "n=10000"
+ };
+ Options cliOptions = createOptions();
+ CommandLineParser clParser = new PosixParser();
+ CommandLine line = null;
+ try {
+ line = clParser.parse(cliOptions, args);
+ } catch (ParseException e) {
+ Assert.fail("ParseException should not have been raise here: "+e);
+ }
+ PropertiesConfiguration options = generateTestingOptionsRequired(tmpOutFolder.toString());
+ options.setProperty("MAX_EXECUTORS", "10");
+ options.setProperty("INSTANCE_FAMILIES", "c5,c5d,c5n");
+ options.setProperty("INSTANCE_SIZES", "xlarge");
+
+ ResourceOptimizer.execute(line, options);
+
+ if (!DEBUG) {
+ deleteDirectoryWithFiles(tmpOutFolder);
+ }
+ }
+
+ @Test
+ @Ignore //disabled dependencies
+ public void executeForReadAndWrite_Test() throws IOException, ParseException {
+ Path tmpOutFolder = Files.createTempDirectory("out");
+
+ String[] args = {
+ "-f", HOME+"ReadAndWrite.dml",
+ "-nvargs",
+ "fileA=s3://data/in/A.csv",
+ "fileA_Csv=s3://data/out/A.csv",
+ "fileA_Text=s3://data/out/A.txt"
+ };
+ Options cliOptions = createOptions();
+ CommandLineParser clParser = new PosixParser();
+ CommandLine line = null;
+ try {
+ line = clParser.parse(cliOptions, args);
+ } catch (ParseException e) {
+ Assert.fail("ParseException should not have been raise here: "+e);
+ }
+ PropertiesConfiguration options = generateTestingOptionsRequired(tmpOutFolder.toString());
+ options.setProperty("MAX_EXECUTORS", "2");
+ String localInputs = "s3://data/in/A.csv=" + HOME + "data/A.csv";
+ options.setProperty("LOCAL_INPUTS", localInputs);
+
+ ResourceOptimizer.execute(line, options);
+
+ if (!DEBUG) {
+ deleteDirectoryWithFiles(tmpOutFolder);
+ }
+ }
+
+ // Helpers ---------------------------------------------------------------------------------------------------------
+
+ private Enumerator assertProperEnumeratorInitialization(String[] args, PropertiesConfiguration options) {
+ Options cliOptions = createOptions();
+ CommandLineParser clParser = new PosixParser();
+ CommandLine line = null;
+ try {
+ line = clParser.parse(cliOptions, args);
+ } catch (ParseException e) {
+ Assert.fail("ParseException should not have been raise here: "+e);
+ }
+ Enumerator actualEnumerator = null;
+ try {
+ actualEnumerator = initEnumerator(line, options);
+ } catch (Exception e) {
+ Assert.fail("Any exception should not have been raise here: "+e);
+ }
+ Assert.assertNotNull(actualEnumerator);
+
+ return actualEnumerator;
+ }
+}
diff --git a/src/test/java/org/apache/sysds/test/component/resource/ResourceTestUtils.java b/src/test/java/org/apache/sysds/test/component/resource/ResourceTestUtils.java
new file mode 100644
index 00000000000..ee5315d41fe
--- /dev/null
+++ b/src/test/java/org/apache/sysds/test/component/resource/ResourceTestUtils.java
@@ -0,0 +1,196 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.sysds.test.component.resource;
+
+import org.apache.commons.configuration2.PropertiesConfiguration;
+import org.apache.sysds.resource.CloudInstance;
+import org.junit.Assert;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.*;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.List;
+import java.util.stream.Collectors;
+
+import static org.apache.sysds.resource.CloudUtils.GBtoBytes;
+
+public class ResourceTestUtils {
+ public static final String DEFAULT_REGIONAL_PRICE_TABLE = "./scripts/resource/aws_regional_prices.csv";
+ public static final String DEFAULT_INSTANCE_INFO_TABLE = "./scripts/resource/ec2_stats.csv";
+ private static final String TEST_ARTIFACTS = "./src/test/scripts/component/resource/artifacts/";
+ private static final String MINIAL_REGION_TABLE = "minimal_aws_regional_prices.csv";
+ private static final String MINIAL_INFO_TABLE = "minimal_ec2_stats.csv";
+ public static final String TEST_REGION;
+ public static final double TEST_FEE_RATIO;
+ public static final double TEST_STORAGE_PRICE;
+
+ static {
+ // ensure valid region table in artifacts and init test values
+ try {
+ List<String> lines = Files.readAllLines(getMinimalFeeTableFile().toPath());
+ if (lines.size() > 1) {
+ String valueLine = lines.get(1);
+ String[] lineParts = valueLine.split(",");
+ if (lineParts.length != 3) throw new IOException();
+ TEST_REGION = lineParts[0];
+ TEST_FEE_RATIO = Double.parseDouble(lineParts[1]);
+ TEST_STORAGE_PRICE = Double.parseDouble(lineParts[2]);
+ } else {
+ throw new IOException();
+ }
+ } catch (IOException e) {
+ throw new RuntimeException("Invalid testing region table file");
+ }
+ }
+ public static void assertEqualsCloudInstances(CloudInstance expected, CloudInstance actual) {
+ Assert.assertEquals(expected.getInstanceName(), actual.getInstanceName());
+ Assert.assertEquals(expected.getMemory(), actual.getMemory());
+ Assert.assertEquals(expected.getVCPUs(), actual.getVCPUs());
+ Assert.assertEquals(expected.getFLOPS(), actual.getFLOPS());
+ Assert.assertEquals(expected.getMemoryBandwidth(), actual.getMemoryBandwidth(), 0.0);
+ Assert.assertEquals(expected.getDiskReadBandwidth(), actual.getDiskReadBandwidth(), 0.0);
+ Assert.assertEquals(expected.getDiskWriteBandwidth(), actual.getDiskWriteBandwidth(), 0.0);
+ Assert.assertEquals(expected.getNetworkBandwidth(), actual.getNetworkBandwidth(), 0.0);
+ Assert.assertEquals(expected.getPrice(), actual.getPrice(), 0.0);
+ }
+
+ public static HashMap<String, CloudInstance> getSimpleCloudInstanceMap() {
+ HashMap<String, CloudInstance> instanceMap = new HashMap<>();
+ // fill the map with enough cloud instances to allow testing all search space dimension iterations
+ instanceMap.put("m5.xlarge", new CloudInstance("m5.xlarge", 0.192, TEST_FEE_RATIO*0.192, TEST_STORAGE_PRICE, GBtoBytes(16), 4, 160, 9934.166667, 143.72, 143.72, 156.25,false, 2, 32));
+ instanceMap.put("m5.2xlarge", new CloudInstance("m5.2xlarge", 0.384, TEST_FEE_RATIO*0.384, TEST_STORAGE_PRICE, GBtoBytes(32), 8, 320, 19868.33333, 287.50, 287.50, 312.5, false, 4, 32));
+ instanceMap.put("m5d.xlarge", new CloudInstance("m5d.xlarge", 0.226, TEST_FEE_RATIO * 0.226, TEST_STORAGE_PRICE, GBtoBytes(16), 4, 160, 9934.166667, 230.46875, 113.28125, 156.25, true, 1, 150));
+ instanceMap.put("m5n.xlarge", new CloudInstance("m5n.xlarge", 0.238, TEST_FEE_RATIO * 0.238, TEST_STORAGE_PRICE, GBtoBytes(16), 4, 160, 9934.166667, 143.72, 143.72, 512.5, false, 2, 32));
+ instanceMap.put("c5.xlarge", new CloudInstance("c5.xlarge", 0.17, TEST_FEE_RATIO*0.17, TEST_STORAGE_PRICE, GBtoBytes(8), 4, 192, 9934.166667, 143.72, 143.72, 156.25,false, 2, 32));
+ instanceMap.put("c5d.xlarge", new CloudInstance("c5d.xlarge", 0.192, TEST_FEE_RATIO * 0.192, TEST_STORAGE_PRICE, GBtoBytes(8), 4, 192, 9934.166667, 163.84, 73.728, 156.25, true, 1, 100));
+ instanceMap.put("c5n.xlarge", new CloudInstance("c5n.xlarge", 0.216, TEST_FEE_RATIO * 0.216, TEST_STORAGE_PRICE, GBtoBytes(10.5), 4, 192, 9934.166667, 143.72, 143.72, 625, false, 2, 32));
+ instanceMap.put("c5.2xlarge", new CloudInstance("c5.2xlarge", 0.34, TEST_FEE_RATIO*0.34, TEST_STORAGE_PRICE, GBtoBytes(16), 8, 384, 19868.33333, 287.50, 287.50, 312.5, false, 4, 32));
+
+ return instanceMap;
+ }
+
+ public static File getMinimalFeeTableFile() {
+ return new File(TEST_ARTIFACTS+MINIAL_REGION_TABLE);
+ }
+
+ public static File getMinimalInstanceInfoTableFile() {
+ return new File(TEST_ARTIFACTS+MINIAL_INFO_TABLE);
+ }
+
+ public static File generateTmpDMLScript(String...scriptLines) throws IOException {
+ File tmpFile = File.createTempFile("tmpScript", ".dml");
+ List<String> lines = Arrays.stream(scriptLines).collect(Collectors.toList());
+ Files.write(tmpFile.toPath(), lines);
+ return tmpFile;
+ }
+
+ public static PropertiesConfiguration generateTestingOptionsRequired(String outputPath) {
+ return generateOptionsRequired(
+ TEST_REGION,
+ TEST_ARTIFACTS+MINIAL_INFO_TABLE,
+ TEST_ARTIFACTS+MINIAL_REGION_TABLE,
+ outputPath);
+ }
+
+ public static PropertiesConfiguration generateOptionsRequired(
+ String region,
+ String infoTable,
+ String regionTable,
+ String outputFolder
+ ) {
+ return generateOptions(region, infoTable, regionTable, outputFolder,
+ null, null, null, null, null, null, null, null,
+ null, null, null, null, null, null, null, null);
+ }
+
+ public static PropertiesConfiguration generateOptions(
+ String region,
+ String infoTable,
+ String regionTable,
+ String outputFolder,
+ String localInputs,
+ String enumeration,
+ String optimizationFunction,
+ String maxTime,
+ String maxPrice,
+ String cpuQuota,
+ String minExecutors,
+ String maxExecutors,
+ String instanceFamilies,
+ String instanceSizes,
+ String stepSize,
+ String exponentialBase,
+ String useLargestEstimate,
+ String useCpEstimates,
+ String useBroadcasts,
+ String useOutputs
+ ) {
+ PropertiesConfiguration options = new PropertiesConfiguration();
+
+ addToMapIfNotNull(options, "REGION", region);
+ addToMapIfNotNull(options, "INFO_TABLE", infoTable);
+ addToMapIfNotNull(options, "REGION_TABLE", regionTable);
+ addToMapIfNotNull(options, "OUTPUT_FOLDER", outputFolder);
+ addToMapIfNotNull(options, "LOCAL_INPUTS", localInputs);
+ addToMapIfNotNull(options, "ENUMERATION", enumeration);
+ addToMapIfNotNull(options, "OPTIMIZATION_FUNCTION", optimizationFunction);
+ addToMapIfNotNull(options, "MAX_TIME", maxTime);
+ addToMapIfNotNull(options, "MAX_PRICE", maxPrice);
+ addToMapIfNotNull(options, "CPU_QUOTA", cpuQuota);
+ addToMapIfNotNull(options, "MIN_EXECUTORS", minExecutors);
+ addToMapIfNotNull(options, "MAX_EXECUTORS", maxExecutors);
+ addToMapIfNotNull(options, "INSTANCE_FAMILIES", instanceFamilies);
+ addToMapIfNotNull(options, "INSTANCE_SIZES", instanceSizes);
+ addToMapIfNotNull(options, "STEP_SIZE", stepSize);
+ addToMapIfNotNull(options, "EXPONENTIAL_BASE", exponentialBase);
+ addToMapIfNotNull(options, "USE_LARGEST_ESTIMATE", useLargestEstimate);
+ addToMapIfNotNull(options, "USE_CP_ESTIMATES", useCpEstimates);
+ addToMapIfNotNull(options, "USE_BROADCASTS", useBroadcasts);
+ addToMapIfNotNull(options, "USE_OUTPUTS", useOutputs);
+
+ return options;
+ }
+
+ private static void addToMapIfNotNull(PropertiesConfiguration options, String key, String value) {
+ if (value != null) {
+ options.setProperty(key, value);
+ }
+ }
+
+ public static void deleteDirectoryWithFiles(Path dir) throws IOException {
+ // delete files in the directory and then the already empty directory itself
+ Files.walkFileTree(dir, new SimpleFileVisitor<>() {
+ @Override
+ public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
+ Files.delete(file);
+ return FileVisitResult.CONTINUE;
+ }
+
+ @Override
+ public FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException {
+ Files.delete(dir);
+ return FileVisitResult.CONTINUE;
+ }
+ });
+ }
+}
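The new `TestingUtils` class above is only visible here as a diff; a minimal usage sketch follows, assuming the calling test sits in the same `org.apache.sysds.test.component.resource` package and that the minimal artifact files referenced by `generateTestingOptionsRequired` are available at `TEST_ARTIFACTS`. The class name `TestingUtilsUsageSketch` and the example region output are illustrative, not part of the extension.

```java
package org.apache.sysds.test.component.resource;

import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.commons.configuration2.PropertiesConfiguration;

public class TestingUtilsUsageSketch {
	public static void main(String[] args) throws Exception {
		// write a throw-away DML script to a temporary file
		File script = TestingUtils.generateTmpDMLScript(
			"X = rand(rows=1000, cols=10);",
			"print(sum(X));");

		// build the required optimizer options, pointing at the minimal test artifacts
		Path out = Files.createTempDirectory("ropt_out");
		PropertiesConfiguration opts = TestingUtils.generateTestingOptionsRequired(out.toString());
		System.out.println(opts.getString("REGION")); // e.g. us-east-1

		// clean up the temporary script and output folder again
		Files.delete(script.toPath());
		TestingUtils.deleteDirectoryWithFiles(out);
	}
}
```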
diff --git a/src/test/java/org/apache/sysds/test/component/resource/TestingUtils.java b/src/test/java/org/apache/sysds/test/component/resource/TestingUtils.java
deleted file mode 100644
index 38dde489dd0..00000000000
--- a/src/test/java/org/apache/sysds/test/component/resource/TestingUtils.java
+++ /dev/null
@@ -1,71 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-
-package org.apache.sysds.test.component.resource;
-
-import org.apache.sysds.resource.CloudInstance;
-import org.junit.Assert;
-
-import java.io.File;
-import java.io.IOException;
-import java.nio.file.Files;
-import java.util.Arrays;
-import java.util.HashMap;
-import java.util.List;
-
-import static org.apache.sysds.resource.CloudUtils.GBtoBytes;
-
-public class TestingUtils {
- public static void assertEqualsCloudInstances(CloudInstance expected, CloudInstance actual) {
- Assert.assertEquals(expected.getInstanceName(), actual.getInstanceName());
- Assert.assertEquals(expected.getMemory(), actual.getMemory());
- Assert.assertEquals(expected.getVCPUs(), actual.getVCPUs());
- Assert.assertEquals(expected.getFLOPS(), actual.getFLOPS());
- Assert.assertEquals(expected.getMemorySpeed(), actual.getMemorySpeed(), 0.0);
- Assert.assertEquals(expected.getDiskSpeed(), actual.getDiskSpeed(), 0.0);
- Assert.assertEquals(expected.getNetworkSpeed(), actual.getNetworkSpeed(), 0.0);
- Assert.assertEquals(expected.getPrice(), actual.getPrice(), 0.0);
-
- }
-
- public static HashMap<String, CloudInstance> getSimpleCloudInstanceMap() {
- HashMap<String, CloudInstance> instanceMap = new HashMap<>();
- // fill the map with enough cloud instances to allow testing all search space dimension iterations
- instanceMap.put("m5.xlarge", new CloudInstance("m5.xlarge", GBtoBytes(16), 4, 0.34375, 21328.0, 143.75, 160.0, 0.23));
- instanceMap.put("m5.2xlarge", new CloudInstance("m5.2xlarge", GBtoBytes(32), 8, 0.6875, 21328.0, 287.50, 320.0, 0.46));
- instanceMap.put("c5.xlarge", new CloudInstance("c5.xlarge", GBtoBytes(8), 4, 0.46875, 21328.0, 143.75, 160.0, 0.194));
- instanceMap.put("c5.2xlarge", new CloudInstance("c5.2xlarge", GBtoBytes(16), 8, 0.9375, 21328.0, 287.50, 320.0, 0.388));
-
- return instanceMap;
- }
-
- public static File generateTmpInstanceInfoTableFile() throws IOException {
- File tmpFile = File.createTempFile("systemds_tmp", ".csv");
-
- List<String> csvLines = Arrays.asList(
- "API_Name,Memory,vCPUs,gFlops,ramSpeed,diskSpeed,networkSpeed,Price",
- "m5.xlarge,16.0,4,0.34375,21328.0,143.75,160.0,0.23",
- "m5.2xlarge,32.0,8,0.6875,21328.0,287.50,320.0,0.46",
- "c5.xlarge,8.0,4,0.46875,21328.0,143.75,160.0,0.194",
- "c5.2xlarge,16.0,8,0.9375,21328.0,287.50,320.0,0.388"
- );
- Files.write(tmpFile.toPath(), csvLines);
- return tmpFile;
- }
-}
diff --git a/src/test/scripts/component/resource/Algorithm_L2SVM.dml b/src/test/scripts/component/resource/Algorithm_L2SVM.dml
index 74b432abcb7..5b7b16aeccf 100644
--- a/src/test/scripts/component/resource/Algorithm_L2SVM.dml
+++ b/src/test/scripts/component/resource/Algorithm_L2SVM.dml
@@ -19,8 +19,10 @@
#
#-------------------------------------------------------------
-X = rand(rows=10000, cols=10);
-Y = X %*% rand(rows=10, cols=1);
+m = ifdef($m, 10000);
+n = ifdef($n, 10);
+X = rand(rows=m, cols=n);
+Y = X %*% rand(rows=n, cols=1);
w = l2svm(X=X, Y=Y, intercept=1, epsilon=1e-6, reg=0.01, maxIterations=20);
print(sum(w));
diff --git a/src/test/scripts/component/resource/Algorithm_Linreg.dml b/src/test/scripts/component/resource/Algorithm_Linreg.dml
index eb6203e004e..ece32b91506 100644
--- a/src/test/scripts/component/resource/Algorithm_Linreg.dml
+++ b/src/test/scripts/component/resource/Algorithm_Linreg.dml
@@ -19,8 +19,10 @@
#
#-------------------------------------------------------------
-X = rand(rows=10000, cols=10);
-Y = X %*% rand(rows=10, cols=1);
+m = ifdef($m, 10000);
+n = ifdef($n, 10);
+X = rand(rows=m, cols=n);
+Y = X %*% rand(rows=n, cols=1);
w = lm(X=X, y=Y, icpt=2, tol=1e-8, reg=0.1, maxi=20);
print(sum(w));
diff --git a/src/test/scripts/component/resource/Algorithm_PCA.dml b/src/test/scripts/component/resource/Algorithm_PCA.dml
index 82948bc6f86..ff35053099d 100644
--- a/src/test/scripts/component/resource/Algorithm_PCA.dml
+++ b/src/test/scripts/component/resource/Algorithm_PCA.dml
@@ -19,7 +19,9 @@
#
#-------------------------------------------------------------
-X = rand(rows=10000, cols=10);
+m = ifdef($m, 10000);
+n = ifdef($n, 10);
+X = rand(rows=m, cols=n);
[X, C, C2, S2] = pca(X=X, center=TRUE, scale=TRUE);
print(sum(X));
diff --git a/src/test/scripts/component/resource/Algorithm_PNMF.dml b/src/test/scripts/component/resource/Algorithm_PNMF.dml
index 57375858093..49d72e21805 100644
--- a/src/test/scripts/component/resource/Algorithm_PNMF.dml
+++ b/src/test/scripts/component/resource/Algorithm_PNMF.dml
@@ -19,8 +19,10 @@
#
#-------------------------------------------------------------
-X = rand(rows=100000, cols=1000);
-rank = 10;
+m = ifdef($m, 100000);
+n = ifdef($n, 1000);
+rank = ifdef($rank, 10);
+X = rand(rows=m, cols=n);
[w, h] = pnmf(X=X, rnk=rank, verbose=FALSE);
print(sum(w));
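The four algorithm scripts above now read their problem sizes from optional named arguments (`$m`, `$n`, and additionally `$rank` for PNMF) and fall back to the previous hard-coded values via `ifdef()`. How the resource tests supply those arguments is not part of these hunks, so the sketch below only illustrates overriding them through the standard SystemDS command-line entry point; the wrapper class name is illustrative.

```java
import org.apache.sysds.api.DMLScript;

public class RunParameterizedScriptSketch {
	public static void main(String[] args) throws Exception {
		// run the L2SVM test script with larger-than-default dimensions;
		// omitting -nvargs keeps the ifdef() defaults (m=10000, n=10)
		DMLScript.main(new String[] {
			"-f", "src/test/scripts/component/resource/Algorithm_L2SVM.dml",
			"-nvargs", "m=200000", "n=100"
		});
	}
}
```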
diff --git a/src/test/scripts/component/resource/artifacts/minimal_aws_regional_prices.csv b/src/test/scripts/component/resource/artifacts/minimal_aws_regional_prices.csv
new file mode 100644
index 00000000000..9847181d2a8
--- /dev/null
+++ b/src/test/scripts/component/resource/artifacts/minimal_aws_regional_prices.csv
@@ -0,0 +1,2 @@
+Region,Fee Ratio,EBS Price
+us-east-1,0.25,0.08
diff --git a/src/test/scripts/component/resource/artifacts/minimal_ec2_stats.csv b/src/test/scripts/component/resource/artifacts/minimal_ec2_stats.csv
new file mode 100644
index 00000000000..346eb4e41b1
--- /dev/null
+++ b/src/test/scripts/component/resource/artifacts/minimal_ec2_stats.csv
@@ -0,0 +1,9 @@
+API_Name,Price,Memory,vCPUs,Cores,GFLOPS,memBandwidth,NVMe,storageVolumes,sizePerVolume,readStorageBandwidth,writeStorageBandwidth,networkBandwidth
+c5.2xlarge,0.3400000000,16.0,8,4,384,19868.33333,false,4,32,287.5,287.5,312.5
+c5.xlarge,0.1700000000,8.0,4,2,192,9934.166667,false,2,32,143.72,143.72,156.25
+c5d.xlarge,0.1920000000,8.0,4,2,192,9934.166667,true,1,100,163.84,73.728,156.25
+c5n.xlarge,0.2160000000,10.5,4,2,192,9934.166667,false,2,32,143.72,143.72,625
+m5.2xlarge,0.3840000000,32.0,8,4,320,19868.33333,false,4,32,287.5,287.5,312.5
+m5.xlarge,0.1920000000,16.0,4,2,160,9934.166667,false,2,32,143.72,143.72,156.25
+m5d.xlarge,0.2260000000,16.0,4,2,160,9934.166667,true,1,150,230.46875,113.28125,156.25
+m5n.xlarge,0.2380000000,16,4,2,160,9934.166667,false,2,32,143.72,143.72,512.5
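Each row of `minimal_ec2_stats.csv` carries the same hardware and price fields that `getSimpleCloudInstanceMap()` hard-codes above, so a loader would essentially map columns onto the `CloudInstance` constructor. The sketch below shows that mapping; the `parseRow` helper and the exact constructor parameter order are inferred from the map entries earlier in this patch, and the fee ratio and EBS price are assumed to come from `minimal_aws_regional_prices.csv` (0.25 and 0.08 for us-east-1).

```java
import org.apache.sysds.resource.CloudInstance;
import static org.apache.sysds.resource.CloudUtils.GBtoBytes;

public class Ec2StatsRowSketch {
	// columns: API_Name,Price,Memory,vCPUs,Cores,GFLOPS,memBandwidth,NVMe,
	//          storageVolumes,sizePerVolume,readStorageBandwidth,writeStorageBandwidth,networkBandwidth
	static CloudInstance parseRow(String csvRow, double feeRatio, double ebsPrice) {
		String[] f = csvRow.split(",");
		return new CloudInstance(
			f[0],                                 // instance name, e.g. "c5.xlarge"
			Double.parseDouble(f[1]),             // on-demand price per hour
			feeRatio * Double.parseDouble(f[1]),  // extra fee, assumed proportional to the price
			ebsPrice,                             // EBS storage price
			GBtoBytes(Double.parseDouble(f[2])),  // memory in bytes
			Integer.parseInt(f[3]),               // vCPUs
			Integer.parseInt(f[5]),               // GFLOPS
			Double.parseDouble(f[6]),             // memory bandwidth
			Double.parseDouble(f[10]),            // read storage bandwidth
			Double.parseDouble(f[11]),            // write storage bandwidth
			Double.parseDouble(f[12]),            // network bandwidth
			Boolean.parseBoolean(f[7]),           // NVMe-backed storage
			Integer.parseInt(f[8]),               // number of storage volumes
			Integer.parseInt(f[9]));              // size per volume (GB)
	}

	public static void main(String[] args) {
		CloudInstance c5 = parseRow(
			"c5.xlarge,0.17,8.0,4,2,192,9934.166667,false,2,32,143.72,143.72,156.25",
			0.25, 0.08);
		System.out.println(c5.getInstanceName());
	}
}
```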
diff --git a/src/test/scripts/component/resource/data/A.csv b/src/test/scripts/component/resource/data/A.csv
deleted file mode 100644
index e69de29bb2d..00000000000