Hydra Configuration Management

The following tutorial demonstrates how one may calibrate a simulation model using Hydra configuration files, which calisim incorporates as an optional dependency.

Hydra may be used to specify complex calibration procedures within composable YAML files. We will henceforth provide a basic example.

import numpy as np
import pandas as pd

from calisim.config import HydraConfiguration

import warnings
warnings.filterwarnings("ignore")

YAML Configuration

We will first instantiate a Hydra configuration object. This contains various utility functions for working with Hydra configuration files.

hydra_config = HydraConfiguration()

We will next view the contents of the conf directory, which contains the config.yaml file, alongside three subdirectories:

calibration
metric
model

! ls conf

calibration  config.yaml  metric  model

The contents of config.yaml are as follows:

defaults:
  - _self_
  - calibration: lotka_optimisation
  - model: lotka_volterra
  - metric: mse

We can see that the three elements within the defaults list (calibration, model, metric) match the three subdirectory names above. Let’s look at the files within the metric subdirectory.

! ls conf/metric

mse.yaml

We see a YAML file called mse.yaml, which matches the value of the metric element in the defaults list.

defaults:
  ...
  - metric: mse

The contents of the mse.yaml are as follows:

_target_: calisim.statistics.MeanSquaredError

Similarly, the model subdirectory contains a YAML file called lotka_volterra.yaml which is composed of the following:

_target_: calisim.example_models.LotkaVolterraModel

This should give you an idea of how Hydra works. Hydra will instantiate a configuration dictionary containing three keys (calibration, model, metric) The metric key will be mapped to a MeanSquaredError object while the model key will be mapped to a LotkaVolterraModel object.

But how about the the calibration key? Referring to the defaults list again:

defaults:
  - _self_
  - calibration: lotka_optimisation

Within the calibration subdirectory, there exists a YAML file called lotka_optimisation.yaml. This file a bit more complicated:

_target_: calisim.optimisation.OptimisationMethod
_partial_: true
engine: emukit
specification:
    _target_: calisim.optimisation.OptimisationMethodModel
    experiment_name: emukit_optimisation
    parameter_spec: 
        parameters:
            - name: alpha
              distribution_name: uniform
              distribution_args: [0.45, 0.55]
              data_type: continuous
            - name: beta
              distribution_name: uniform
              distribution_args: [0.02, 0.03]
              data_type: continuous
    directions: [ minimize ]
    output_labels: [ Lynx ]
    acquisition_func: ei
    n_iterations: 25
    n_init: 20
    n_samples: 100
    n_out: 1
    verbose: false
    method_kwargs:
        noise_var: 0.001

So, the calibration key is mapped to an OptimisationMethod object that uses the emukit engine. Under specification, we can see the parameter_spec containing parameter distribution information, the acquisition function (ei), the number of iterations (25), and so on. However, the OptimisationMethod object has a mandatory argument within its constructor called calibration_func, which we have not defined in the YAML file (note that it is possible to specify a particular calibration function by providing its full module path and name). In order to instantiate OptimisationMethod, we include _partial_: true so that we can dynamically specify calibration_func within our Python code.

Instantiation

Let’s construct a raw configuration object from the config.yaml file.

cfg = hydra_config.get_raw_config("config", "conf")
print(hydra_config.pretty(cfg))

calibration:
  _target_: calisim.optimisation.OptimisationMethod
  _partial_: true
  engine: emukit
  specification:
    _target_: calisim.optimisation.OptimisationMethodModel
    experiment_name: emukit_optimisation
    parameter_spec:
      parameters:
      - name: alpha
        distribution_name: uniform
        distribution_args:
        - 0.45
        - 0.55
        data_type: continuous
      - name: beta
        distribution_name: uniform
        distribution_args:
        - 0.02
        - 0.03
        data_type: continuous
    directions:
    - minimize
    output_labels:
    - Lynx
    acquisition_func: ei
    n_iterations: 25
    n_init: 20
    n_samples: 100
    n_out: 1
    verbose: false
    method_kwargs:
      noise_var: 0.001
model:
  _target_: calisim.example_models.LotkaVolterraModel
metric:
  _target_: calisim.statistics.MeanSquaredError

This contains the contents of the aforementioned YAML files within a single dictionary. However, we can instead construct a configuration object and instantiate all of the classes mapped to the _target_ keys.

cfg = hydra_config.get_configuration("config", "conf")

metric = cfg["metric"]
type(metric)

calisim.statistics.discrepancy.MeanSquaredError

model = cfg["model"]
observed_data = model.get_observed_data()
type(model)

calisim.example_models.lotka_volterra.LotkaVolterraModel

You can see that we have instantiated MeanSquaredError and LotkaVolterraModel objects. Note that we can also override the contents of the YAML configuration files.

For instance, we can instead instantiate a RootMeanSquaredError metric object like so:

cfg = hydra_config.get_configuration(
    "config", "conf", 
    overrides=["metric._target_=calisim.statistics.RootMeanSquaredError"]
)
metric = cfg["metric"]
type(metric)

calisim.statistics.discrepancy.RootMeanSquaredError

Hydra enables us to compose highly complex calibration workflows by mixing, matching, and overriding various sets of YAML files and keys. It is certainly possible to implement much more complicated configuration specifications, though we will keep things simple for this tutorial.

Calibration

Let’s next define our objective function for calibration.

def objective(
	parameters: dict, simulation_id: str, observed_data: np.ndarray | None, t: pd.Series
) -> float | list[float]:
	simulation_parameters = dict(h0=34.0, l0=5.9, t=t, gamma=0.84, delta=0.026)

	for k in ["alpha", "beta"]:
		simulation_parameters[k] = parameters[k]

	simulated_data = model.simulate(simulation_parameters).lynx.values
	discrepancy = metric.calculate(observed_data, simulated_data)
	return discrepancy

Finally, we can instantiate a calibrator object, then specify, execute, and analyzea calibration workflow. As mentioned prior, we must dynamically specify the calibration_func in the calibrator’s constructor using our objective function.

calibrator = cfg["calibration"](calibration_func=objective)
calibrator.specification.calibration_func_kwargs=dict(t=observed_data.year)
calibrator.specification.observed_data=observed_data.lynx.values
calibrator.specify().execute().analyze()

<calisim.optimisation.implementation.OptimisationMethod at 0x7f6a64149b40>

../../../_images/05c14b9e33f14fab18fc902e60bd17c4ff95a5eec9f042d5bba9969df970055a.png

The optimiser is able to retrieve the ground-truth parameter values from our simulation study.

pd.DataFrame([
    { "parameter": estimate.name, "estimate": estimate.estimate, "ground truth": model.GROUND_TRUTH[estimate.name] }
    for estimate in calibrator.get_parameter_estimates().estimates
])

	parameter	estimate	ground truth
0	alpha	0.518041	0.520
1	beta	0.024410	0.024

Let’s repeat the calibration procedure. But this time, we shall use a Kriging surrogate model for Bayesian optimisation via the openturns engine. Pulling all the above code together:

cfg = hydra_config.get_configuration(
    "config", 
    "conf", 
    overrides=[
        "calibration.engine=openturns",
        "+calibration.specification.method=kriging",
        "calibration.specification.method_kwargs=null"
    ]
)
metric = cfg["metric"]
model = cfg["model"]
observed_data = model.get_observed_data()

def objective(
	parameters: dict, simulation_id: str, observed_data: np.ndarray | None, t: pd.Series
) -> float | list[float]:
	simulation_parameters = dict(h0=34.0, l0=5.9, t=t, gamma=0.84, delta=0.026)

	for k in ["alpha", "beta"]:
		simulation_parameters[k] = parameters[k]

	simulated_data = model.simulate(simulation_parameters).lynx.values
	discrepancy = metric.calculate(observed_data, simulated_data)
	return discrepancy

calibrator = cfg["calibration"](calibration_func=objective)
calibrator.specification.calibration_func_kwargs=dict(t=observed_data.year)
calibrator.specification.observed_data=observed_data.lynx.values
calibrator.specify().execute().analyze() 

<calisim.optimisation.implementation.OptimisationMethod at 0x7f6a6414bdf0>

../../../_images/a49045b68b5fbe5a6ed642d4c1d0cfad0897d5694cace1b8c9493c2aec4822a9.png

../../../_images/7f9eeddf3fdb3ffa55b90cd7ba7d2ae05057c6c5b5a790d34ac9df8e1514790f.png

pd.DataFrame([
    { "parameter": estimate.name, "estimate": estimate.estimate, "ground truth": model.GROUND_TRUTH[estimate.name] }
    for estimate in calibrator.get_parameter_estimates().estimates
])

	parameter	estimate	ground truth
0	alpha	0.511805	0.520
1	beta	0.024113	0.024

We can see that OpenTurns is also able to recover the ground truth parameters.