From 9176ca60fa9f1fd20dc452bd8bfdabeb72e080b8 Mon Sep 17 00:00:00 2001
From: "C. Benjamins" <75323339+benjamc@users.noreply.github.com>
Date: Mon, 2 Dec 2024 15:09:10 +0100
Subject: [PATCH] 1085 runhistory documentation (#1175)

* Content draft of the documentation.

* Fix typos

* Fix typos

* Add important info to Facade and Scenario Documentation

* adapt syntax

* Update runhistory description

* refactor(8_logging): add .

---------

Co-authored-by: Lukas Fehring <lukasfehring@gmail.com>
---
 docs/3_getting_started.md        |   7 ++
 docs/advanced_usage/8_logging.md | 107 +++++++++++++++++++++++++++++++
 2 files changed, 114 insertions(+)

diff --git a/docs/3_getting_started.md b/docs/3_getting_started.md
index 3237db587..79bbc7c53 100644
--- a/docs/3_getting_started.md
+++ b/docs/3_getting_started.md
@@ -75,6 +75,7 @@ from smac import Scenario
 
 scenario = Scenario(
     configspace=cs,
+    name="experiment_name",
     output_directory=Path("your_output_directory")
     walltime_limit=120,  # Limit to two minutes
     n_trials=500,  # Evaluated max 500 trials
@@ -83,9 +84,15 @@ scenario = Scenario(
 )
 ```
 
+!!! note
+    If no `name` is given, a hash of the experiment is used. Running the same experiment again at a later time results in exactly the same hash. This matters because the optimization will then warm-start from the preexisting evaluations, unless specified otherwise in the [Facade][smac.facade.abstract_facade].
+
 
 ## Facade
 
+!!! warning
+    By default, facades try to warm-start from preexisting logs. This behavior can be controlled with the `overwrite` parameter.
+
 A [facade][smac.facade.abstract_facade] is the entry point to SMAC, which constructs a default optimization 
 pipeline for you. SMAC offers various facades, which satisfy many common use cases and are crucial to
 achieving peak performance. The idea behind the facades is to provide a simple interface to all of SMAC's components,
diff --git a/docs/advanced_usage/8_logging.md b/docs/advanced_usage/8_logging.md
index 7d24e84e2..18e8d7496 100644
--- a/docs/advanced_usage/8_logging.md
+++ b/docs/advanced_usage/8_logging.md
@@ -27,6 +27,113 @@ The table shows you the specific levels:
 | 40        | ERROR    |
 | 50        | CRITICAL |
 
+## Standard Logging Files
+
+By default, SMAC generates several files to document the optimization process. These files are stored in the directory structure `./output_directory/name/seed`, where `name` is replaced by a hash if no name is explicitly provided. This behavior can be customized through the [Scenario][smac.scenario] configuration, as shown in the example below:
+```python
+Scenario(
+    configspace = some_configspace,
+    name = 'experiment_name',
+    output_directory = Path('some_directory'),
+    ...
+)
+```
+Notably, if output already exists at `./some_directory/experiment_name/seed`, the behavior is determined by the `overwrite` parameter in the [facade's][smac.facade.abstract_facade] settings. This parameter specifies whether to continue the previous run (default) or start a new run.
+
+The output consists of four log files and a copy of the [Configuration Space of the ConfigSpace library](https://automl.github.io/ConfigSpace/latest/) that was used.
+
+### intensifier.json
+The [intensification][Intensification] is logged in `intensifier.json` and has the following structure:
+
+```json
+{
+  "incumbent_ids": [
+    65
+  ],
+  "rejected_config_ids": [
+    1
+  ],
+  "incumbents_changed": 2,
+  "trajectory": [
+    {
+      "config_ids": [
+        1
+      ],
+      "costs": [
+        0.45706284046173096
+      ],
+      "trial": 1,
+      "walltime": 0.029736042022705078
+    },
+    #...
+  ],
+  "state": {
+    "tracker": {},
+    "next_bracket": 0
+  }
+}
+```
+
+### optimization.json
+The progress of the optimization is recorded in `optimization.json`, which has the following structure:
+
+```json
+{
+  "used_walltime": 184.87366724014282,
+  "used_target_function_walltime": 20.229533672332764,
+  "last_update": 1732703596.5609574,
+  "finished": false
+}
+```
+### runhistory.json
+The `runhistory.json` file is split into four parts: `stats`, `data`, `configs`, and `config_origins`.
+`stats` contains summary statistics on the evaluated configurations:
+```json
+  "stats": {
+    "submitted": 73,
+    "finished": 73,
+    "running": 0
+  },
+```
+
+`data` contains a list of entries, one for each evaluation of a configuration (identified by its `config_id`, instance, seed, and budget):
+```json
+  "data": [
+    [
+      1,                            # config_id
+      null,                         # instance or None
+      209652396,                    # seed or None
+      null,                         # budget or None
+      5.4345623938566385,           # cost
+      6.699562072753906e-05,        # time
+      6.299999999992423e-05,        # cpu_time
+      1,                            # status
+      1733133181.2144582,           # start_time
+      1733133181.21695,             # end_time
+      {}                            # additional_info
+    ],
+    ...
+  ]
+```
+
+`configs` is a human-readable dictionary of configurations, where the keys are the one-based `config_id`. It is important to note that in `runhistory.json`, the indexing is zero-based.
+```json
+  "configs": {
+    "1": {
+      "x": -2.3312147893012
+    },
+```
+
+Lastly, `config_origins` specifies the source of a configuration, indicating whether it stems from the initial design or results from the maximization of an acquisition function.
+```json
+  "config_origins": {
+    "1": "Initial Design: Sobol",
+    ...
+  }
+```
+
+### scenario.json
+The `scenario.json` file contains the overall state of the [Scenario][smac.scenario], serialized as JSON.
 
 ## Custom File