[#1056] Add example on intensify for cross-validation. #1061
Merged
Changes from 2 commits
119 changes: 119 additions & 0 deletions
examples/4_advanced_optimizer/4_intensify_crossvalidation.py
@@ -0,0 +1,119 @@
"""
Speeding up Cross-Validation with Intensification
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

An example of optimizing a simple support vector machine on the digits dataset. In contrast to the
[simple example](examples/1_basics/2_svm_cv.py), in which all cross-validation folds are executed
at once, we use the intensification mechanism described in the original
[SMAC paper](https://link.springer.com/chapter/10.1007/978-3-642-25566-3_40) as also demonstrated
by [Auto-WEKA](https://dl.acm.org/doi/10.1145/2487575.2487629).
"""
__copyright__ = "Copyright 2023, AutoML.org Freiburg-Hannover"
__license__ = "3-clause BSD"

N_FOLDS = 10  # Global variable that determines the number of folds

from ConfigSpace import Configuration, ConfigurationSpace, Float
from sklearn import datasets, svm
from sklearn.model_selection import StratifiedKFold

from smac import HyperparameterOptimizationFacade, Scenario
from smac.intensifier import Intensifier

# We load the digits dataset, a small-scale 10-class digit recognition dataset
X, y = datasets.load_digits(return_X_y=True)


class SVM:
    @property
    def configspace(self) -> ConfigurationSpace:
        # Build Configuration Space which defines all parameters and their ranges
        cs = ConfigurationSpace(seed=0)

        # First we create our hyperparameters
        C = Float("C", (2**-5, 2**15), default=1.0, log=True)
        gamma = Float("gamma", (2**-15, 2**3), default=1.0, log=True)

        # Add hyperparameters to our configspace
        cs.add_hyperparameters([C, gamma])

        return cs

    def train(self, config: Configuration, instance: str, seed: int = 0) -> float:
        """Creates an SVM based on a configuration and evaluates it on the given fold of the digits dataset.

        Parameters
        ----------
        config: Configuration
            The configuration to train the SVM.
        instance: str
            The name of the instance this configuration should be evaluated on. This is always of type
            string by definition. In our case we cast to int, but this could also be the filename of a
            problem instance to be loaded.
        seed: int
            The seed used for this call.
        """
        instance = int(instance)
        classifier = svm.SVC(**config, random_state=seed)
        splitter = StratifiedKFold(n_splits=N_FOLDS, shuffle=True, random_state=seed)
        for k, (train_idx, test_idx) in enumerate(splitter.split(X=X, y=y)):
            if k != instance:
                continue
            else:
                train_X = X[train_idx]
                train_y = y[train_idx]
                test_X = X[test_idx]
                test_y = y[test_idx]
                classifier.fit(train_X, train_y)
                cost = 1 - classifier.score(test_X, test_y)

        return cost


if __name__ == "__main__":
    classifier = SVM()

    # Next, we create an object, holding general information about the run
    scenario = Scenario(
        classifier.configspace,
        n_trials=50,  # We want to run max 50 trials (combination of config, instance, and seed)
        instances=[f"{i}" for i in range(N_FOLDS)],  # Specify all instances by their name (as a string)
        instance_features={f"{i}": [i] for i in range(N_FOLDS)},  # breaks SMAC
        deterministic=True  # To simplify the problem we make SMAC believe that we have a deterministic
        # optimization problem.

    )

    # We want to run the facade's default initial design, but we want to change the number
    # of initial configs to 5.
    initial_design = HyperparameterOptimizationFacade.get_initial_design(scenario, n_configs=5)

    # Now we use SMAC to find the best hyperparameters
    smac = HyperparameterOptimizationFacade(
        scenario,
        classifier.train,
        initial_design=initial_design,
        overwrite=True,  # If the run exists, we overwrite it; alternatively, we can continue from last state
        # The next line defines the intensifier, i.e., the module that governs the selection of
        # instance-seed pairs. Since we set deterministic to True above, it only governs the instance in
        # this example. A user does not normally have to create the intensifier, but we do so here
        # because we change the argument max_config_calls (the number of instance-seed pairs
        # per configuration to try) to the number of cross-validation folds, while the default would be 3.
        intensifier=Intensifier(scenario=scenario, max_config_calls=N_FOLDS, seed=0)
    )

    incumbent = smac.optimize()

    # Get cost of default configuration
    default_cost = smac.validate(classifier.configspace.get_default_configuration())
    print(f"Default cost: {default_cost}")

    # Let's calculate the cost of the incumbent
    incumbent_cost = smac.validate(incumbent)
    print(f"Incumbent cost: {incumbent_cost}")

    # Let's see how many configurations we have evaluated. If this number is higher than 5, we have looked
    # at more configurations than would have been possible with regular cross-validation, where the number
    # of configurations would be determined by the number of trials divided by the number of folds (50 / 10).
    runhistory = smac.runhistory
    print(f"Number of evaluated configurations: {len(runhistory.config_ids)}")
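The closing comment of the example says that intensification can look at more than 50 / 10 = 5 configurations within the same trial budget. The toy sketch below illustrates one way that can happen. It is not SMAC's `Intensifier`; it is a simplified racing loop written only for illustration. The names `fold_cost` and `race`, the scalar "configurations", and the toy cost function are all hypothetical and stand in for training an SVM on one fold.

```python
from statistics import mean

N_FOLDS = 10


def fold_cost(config: float, fold: int) -> float:
    """Hypothetical stand-in for one fold evaluation (deterministic toy cost)."""
    return abs(config - 0.3) + 0.01 * (fold % 3)


def race(configs, n_trials):
    """Toy racing loop: every single fold evaluation consumes one trial.

    A challenger is evaluated fold by fold and dropped as soon as its mean
    cost is worse than the incumbent's mean cost on the same folds.
    """
    costs = {}  # config -> list of per-fold costs seen so far
    trials = 0
    incumbent = None
    for challenger in configs:
        costs[challenger] = []
        for fold in range(N_FOLDS):
            if trials >= n_trials:
                return incumbent, costs, trials
            costs[challenger].append(fold_cost(challenger, fold))
            trials += 1
            if incumbent is not None:
                k = len(costs[challenger])
                if mean(costs[challenger]) > mean(costs[incumbent][:k]):
                    break  # rejected early, remaining folds are never run
        else:
            # The challenger survived all folds and becomes the new incumbent
            incumbent = challenger
    return incumbent, costs, trials


incumbent, costs, trials = race([0.9, 0.35, 0.8, 0.1, 0.7, 0.31, 0.6, 0.5], n_trials=50)
print(f"Incumbent: {incumbent}, configurations seen: {len(costs)}, trials used: {trials}")
```

In this toy run, poor challengers are discarded after a single fold, so the budget of 50 trials covers more than the 5 configurations that full 10-fold cross-validation would allow.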
Here `n_trials` should be the combination of config, instance, and seed (otherwise, users might think that `n_trials` indicates n configurations if `deterministic=True`).
I updated the docstring and also improved the docstring of the example file.
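To make the point about `n_trials` concrete, here is a minimal sketch of the counting, assuming that `deterministic=True` leaves a single seed so that one trial corresponds to one (config, instance, seed) triple. The helper `trials_for` and the placeholder configuration names are hypothetical and not part of SMAC.

```python
from itertools import product

N_FOLDS = 10
N_TRIALS = 50

instances = [f"{i}" for i in range(N_FOLDS)]  # one instance per cross-validation fold
seeds = [0]  # assumption: deterministic=True means only one seed is used


def trials_for(configs):
    """Enumerate the (config, instance, seed) triples needed to fully evaluate the configs."""
    return list(product(configs, instances, seeds))


# Fully cross-validating two configurations already consumes 20 trials
print(len(trials_for(["config_a", "config_b"])))

# With full cross-validation, the budget covers only this many configurations
full_cv_configs = N_TRIALS // (len(instances) * len(seeds))
print(full_cv_configs)
```

Read this way, `n_trials=50` is a budget of fold evaluations, not of configurations.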