Optimisation

The following tutorial demonstrates how one may calibrate a simulation model via optimisation using calisim. We will first import our required dependencies.

from calisim.data_model import (
	DistributionModel,
	ParameterDataType,
	ParameterSpecification,
)
from calisim.example_models import SirOdesModel
from calisim.optimisation import OptimisationMethod, OptimisationMethodModel
from calisim.statistics import MeanSquaredError
import numpy as np
import pandas as pd
from scipy.integrate import solve_ivp

import warnings
warnings.filterwarnings("ignore")

SIR Model Parameters and Initial Conditions

We next define our forward model. We will use an SIR (susceptible, infected, and recovered) compartmental model, combined with SciPy’s solver for ordinary differential equations. The SIR model is expressed as a system of ordinary differential equations where:

Parameter

Value

Description

β (beta)

0.4

Infection rate: probability of transmission per contact per time unit

γ (gamma)

0.1

Recovery rate: fraction of infected recovering per time unit

Average infectious period = 1 / γ = 10 time units

With the following compartments:

Compartment

Symbol

Initial Value

Description

Susceptible

S0

999

Individuals who can catch the disease (N - I0 - R0)

Infected

I0

1.0

Individuals currently infected and can spread the disease

Recovered

R0

0

Individuals recovered or removed; no longer infectious

def sir_simulate(parameters: dict) -> np.ndarray | pd.DataFrame:
    def dX_dt(_: np.ndarray, X: np.ndarray) -> np.ndarray:
        S, I, _ = X 
        dotS = -parameters["beta"] * S * I / parameters["N"]
        dotI = (
            parameters["beta"] * S * I / parameters["N"] - parameters["gamma"] * I
        )
        dotR = parameters["gamma"] * I
        return np.array([dotS, dotI, dotR])

    X0 = [parameters["S0"], parameters["I0"], parameters["R0"]]
    t = (parameters["t"].min(), parameters["t"].max())
    x_y = solve_ivp(
        fun=dX_dt, y0=X0, t_span=t, t_eval=parameters["t"].values.flatten()
    ).y

    df = pd.DataFrame(dict(dotS=x_y[0, :], dotI=x_y[1, :], dotR=x_y[2, :]))
    return df

We will perform a simulation study with the following ground-truth parameters:

model = SirOdesModel()
pd.DataFrame(model.GROUND_TRUTH, index=[0]) 
beta gamma N I0 R0 S0
0 0.4 0.1 1000 1.0 0 999.0

When supplied to our forward model, these ground-truth parameters will generate the observed data below:

observed_data = model.get_observed_data()
observed_data.head(6)
dotS dotI dotR day
0 999.000000 1.000000 0.000000 0
1 998.534208 1.349201 0.116592 1
2 997.906105 1.819995 0.273899 2
3 997.059813 2.454180 0.486007 3
4 995.919926 3.308098 0.771976 4
5 994.385263 4.457212 1.157524 5

Let’s view the trajectory of infected individuals over time in days.

observed_data.plot.scatter("day", "dotI")
<Axes: xlabel='day', ylabel='dotI'>
../../_images/96b9fdf1c3627db084a076c6ada91cc20b48256dca7db4e7c14b265b85a0bdc9.png

Black-Box Optimisation via Bayesian Optimisation

Next, let’s use calisim to perform calibration via black-box optimisation using the simulated and observed number of infections. We will apply the mean squared error as our loss function, with the aim of minimising the discrepancy between simulated and observed data.

To start with, we’ll need to define our ParameterSpecification parameter specification:

parameter_spec = ParameterSpecification(
	parameters=[
		DistributionModel(
			name="beta",
			distribution_name="uniform",
			distribution_args=[0.3, 0.5],
			data_type=ParameterDataType.CONTINUOUS,
		),
		DistributionModel(
			name="gamma",
			distribution_name="uniform",
			distribution_args=[0.05, 0.15],
			data_type=ParameterDataType.CONTINUOUS,
		),
	]
)

This contains information concerning the various parameter names, probability distributions, ranges, distribution parameters, and data types.

We next need to define the objective function around our forward model to ensure there’s compatibility with the calisim API.

def objective(
	parameters: dict, simulation_id: str, observed_data: np.ndarray | None, t: pd.Series
) -> float | list[float]:
    simulation_parameters = model.GROUND_TRUTH.copy()
    simulation_parameters["t"] = t
    
    for k in ["beta", "gamma"]:
        simulation_parameters[k] = parameters[k]
    
    simulated_data = sir_simulate(simulation_parameters).dotI.values
    metric = MeanSquaredError()
    discrepancy = metric.calculate(observed_data, simulated_data)
    return discrepancy

The last step is to create an OptimisationMethodModel specification for the calibration procedure itself, which we then supply to an OptimisationMethod calibrator. We’ll use the Bayesian Optimisation method with the expected improvement acquisition function via the Emukit engine.

specification = OptimisationMethodModel(
	experiment_name="emukit_optimisation",
	parameter_spec=parameter_spec,
	observed_data=observed_data.dotI.values,
	acquisition_func="ei",
	n_init=20,
	n_iterations=25,
	n_samples=100,
	directions=["minimize"],
	output_labels=["Number of Infected"],
	method_kwargs=dict(noise_var=1e-4),
    calibration_func_kwargs=dict(t=observed_data.day),
)

calibrator = OptimisationMethod(
	calibration_func=objective, specification=specification, engine="emukit"
)

Finally, we’ll run the calibration procedure. This is composed of 3 steps:

  1. Specify: Define your calibration problem: Parameter distributions, observed data, objective/discrepancy function, and calibration settings (like algorithm, directions, iterations)

  2. Execute: Run the actual calibration process (simulation + optimization/inference)

  3. Analyze: Process, summarize, and optionally save plots/metrics of the calibration results

Or SEA.

calibrator.specify().execute().analyze()
<calisim.optimisation.implementation.OptimisationMethod at 0x7fa73e85e530>
../../_images/4f78e6a4fe5b4107f8bad490e449e1e0293aa4500b0f26b88c909525e80876a8.png
emukit_results = pd.DataFrame([
    { "parameter": estimate.name, "estimate": estimate.estimate, "ground truth": model.GROUND_TRUTH[estimate.name] }
    for estimate in calibrator.get_parameter_estimates().estimates
])
emukit_results
parameter estimate ground truth
0 beta 0.398069 0.4
1 gamma 0.108188 0.1

The optimiser is able to retrieve the ground-truth parameter values from our simulation study.

Black-Box Optimisation via Efficient Global Optimisation

Again, let’s use calisim to perform calibration via black-box optimisation using the simulated and observed number of infections. Rather than performing using Bayesian optimisation via a Gaussian process surrogate model using EmuKit, let’s perform Efficient Global Optimisation using a Kriging surrogate via OpenTurns.

We can reuse the parameter specification and objective function, but we’ll need to create a new OptimisationMethodModel specification and OptimisationMethod calibrator:

specification = OptimisationMethodModel(
	experiment_name="openturns_optimisation",
	parameter_spec=parameter_spec,
	observed_data=observed_data.dotI.values,
	method="kriging",
	n_init=20,
	n_iterations=25,
	n_out=1,
	directions=["minimize"],
	output_labels=["Number of Infected"],
    calibration_func_kwargs=dict(t=observed_data.day),
)

calibrator = OptimisationMethod(
	calibration_func=objective, specification=specification, engine="openturns"
)

calibrator.specify().execute().analyze()
<calisim.optimisation.implementation.OptimisationMethod at 0x7fa73e58c460>
../../_images/5de25f663d82ba6b2ba6303b4ff26bf12c0233daab95eb3dfb2208237c1bf3a1.png ../../_images/7f9eeddf3fdb3ffa55b90cd7ba7d2ae05057c6c5b5a790d34ac9df8e1514790f.png
openturns_results = pd.DataFrame([
    { "parameter": estimate.name, "estimate": estimate.estimate, "ground truth": model.GROUND_TRUTH[estimate.name] }
    for estimate in calibrator.get_parameter_estimates().estimates
])
openturns_results
parameter estimate ground truth
0 beta 0.399979 0.4
1 gamma 0.099978 0.1

We can see that OpenTurns is also able to recover the ground truth parameters.