History Matching
The following tutorial demonstrates how one may calibrate a simulation model via history matching using calisim. We will first import our required dependencies.
from calisim.data_model import (
    DistributionModel,
    ParameterDataType,
    ParameterSpecification,
)
from calisim.example_models import SirOdesModel
from calisim.history_matching import (
    HistoryMatchingMethod,
    HistoryMatchingMethodModel,
)
import numpy as np
import pandas as pd
from scipy.integrate import solve_ivp
import warnings
warnings.filterwarnings("ignore")
SIR Model Parameters and Initial Conditions
We next define our forward model. We will use an SIR (susceptible, infected, and recovered) compartmental model, combined with SciPy’s solver for ordinary differential equations. The SIR model is expressed as a system of ordinary differential equations where:
| Parameter | Value | Description |
|---|---|---|
| β (beta) | 0.4 | Infection rate: probability of transmission per contact per time unit |
| γ (gamma) | 0.1 | Recovery rate: fraction of infected recovering per time unit |

Average infectious period = 1 / γ = 10 time units
With the following compartments:
| Compartment | Symbol | Initial Value | Description |
|---|---|---|---|
| Susceptible | S0 | 999 | Individuals who can catch the disease (N - I0 - R0) |
| Infected | I0 | 1.0 | Individuals currently infected and can spread the disease |
| Recovered | R0 | 0 | Individuals recovered or removed; no longer infectious |
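Written out, the parameters and compartments above combine into the system below. This is a transcription of the right-hand side implemented in sir_simulate, where N is the total population (1000 in the ground-truth settings) and S, I, and R are the compartment sizes at time t.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\frac{\beta S I}{N}, \\
\frac{dI}{dt} &= \frac{\beta S I}{N} - \gamma I, \\
\frac{dR}{dt} &= \gamma I.
\end{aligned}
```

The three right-hand sides sum to zero, so S + I + R stays equal to N throughout the simulation.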
def sir_simulate(parameters: dict) -> np.ndarray | pd.DataFrame:
    def dX_dt(_: np.ndarray, X: np.ndarray) -> np.ndarray:
        # Right-hand side of the SIR equations; the time argument is unused.
        S, I, _ = X
        dotS = -parameters["beta"] * S * I / parameters["N"]
        dotI = (
            parameters["beta"] * S * I / parameters["N"] - parameters["gamma"] * I
        )
        dotR = parameters["gamma"] * I
        return np.array([dotS, dotI, dotR])

    # Initial conditions and the time span covered by the requested time points.
    X0 = [parameters["S0"], parameters["I0"], parameters["R0"]]
    t = (parameters["t"].min(), parameters["t"].max())
    x_y = solve_ivp(
        fun=dX_dt, y0=X0, t_span=t, t_eval=parameters["t"].values.flatten()
    ).y

    # The columns hold the compartment values S, I, and R at each time point;
    # the "dot" prefix matches the column names of the observed data.
    df = pd.DataFrame(dict(dotS=x_y[0, :], dotI=x_y[1, :], dotR=x_y[2, :]))
    return df
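As a dependency-free cross-check of the forward model above, the following sketch integrates the same equations with a hand-written, fixed-step fourth-order Runge-Kutta scheme rather than solve_ivp. The function name sir_rk4 and the step size are illustrative choices, not part of calisim or of the tutorial's code; the parameter values are the ground-truth settings used in this tutorial.

```python
def sir_rk4(beta, gamma, N, S0, I0, R0, days, steps_per_day=100):
    """Integrate the SIR equations with fixed-step RK4; return daily (S, I, R) tuples."""

    def deriv(state):
        S, I, _R = state
        new_infections = beta * S * I / N
        recoveries = gamma * I
        return (-new_infections, new_infections - recoveries, recoveries)

    def shifted(state, slope, h):
        # state + h * slope, component by component
        return tuple(x + h * k for x, k in zip(state, slope))

    h = 1.0 / steps_per_day
    state = (float(S0), float(I0), float(R0))
    daily = [state]
    for _day in range(days):
        for _step in range(steps_per_day):
            k1 = deriv(state)
            k2 = deriv(shifted(state, k1, h / 2))
            k3 = deriv(shifted(state, k2, h / 2))
            k4 = deriv(shifted(state, k3, h))
            state = tuple(
                x + h / 6 * (a + 2 * b + 2 * c + d)
                for x, a, b, c, d in zip(state, k1, k2, k3, k4)
            )
        daily.append(state)
    return daily


# Ground-truth settings from this tutorial (hypothetical call, for illustration).
trajectory = sir_rk4(beta=0.4, gamma=0.1, N=1000, S0=999, I0=1, R0=0, days=5)
for day, (S, I, R) in enumerate(trajectory):
    print(f"day {day}: S={S:.3f} I={I:.3f} R={R:.3f} total={S + I + R:.3f}")
```

The total column should stay at 1000, since the three derivatives cancel, and the infected counts should land close to the first rows of the observed data shown further below.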
We will perform a simulation study with the following ground-truth parameters:
model = SirOdesModel()
pd.DataFrame(model.GROUND_TRUTH, index=[0])
| | beta | gamma | N | I0 | R0 | S0 |
|---|---|---|---|---|---|---|
| 0 | 0.4 | 0.1 | 1000 | 1.0 | 0 | 999.0 |
When supplied to our forward model, these ground-truth parameters will generate the observed data below:
observed_data = model.get_observed_data()
observed_data.head(6)
| | dotS | dotI | dotR | day |
|---|---|---|---|---|
| 0 | 999.000000 | 1.000000 | 0.000000 | 0 |
| 1 | 998.534208 | 1.349201 | 0.116592 | 1 |
| 2 | 997.906105 | 1.819995 | 0.273899 | 2 |
| 3 | 997.059813 | 2.454180 | 0.486007 | 3 |
| 4 | 995.919926 | 3.308098 | 0.771976 | 4 |
| 5 | 994.385263 | 4.457212 | 1.157524 | 5 |
Let’s view the trajectory of infected individuals over time in days.
observed_data.plot.scatter("day", "dotI")
[Figure: scatter plot of dotI (number of infected individuals) against day]
History Matching via the Sequential Iterative Ensemble Smoother
Next, let’s use calisim to perform calibration via history matching using the simulated and observed number of infections.
To start, we’ll define a ParameterSpecification with normally distributed priors for beta and gamma:
parameter_spec = ParameterSpecification(
    parameters=[
        DistributionModel(
            name="beta",
            distribution_name="normal",
            distribution_args=[0.35, 0.05],
            data_type=ParameterDataType.CONTINUOUS,
        ),
        DistributionModel(
            name="gamma",
            distribution_name="normal",
            distribution_args=[0.08, 0.01],
            data_type=ParameterDataType.CONTINUOUS,
        ),
    ]
)
The specification records, for each parameter, its name, probability distribution, range or distribution arguments, and data type.
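To get a feel for how informative these priors are, the sketch below measures how far the ground-truth values sit from each prior mean. It assumes that distribution_args for the normal distribution are interpreted as [mean, standard deviation]; that reading is an assumption on our part, so check the calisim documentation before relying on it.

```python
from statistics import NormalDist

# Assumption: distribution_args = [mean, standard deviation].
priors = {
    "beta": NormalDist(0.35, 0.05),
    "gamma": NormalDist(0.08, 0.01),
}
ground_truth = {"beta": 0.4, "gamma": 0.1}

z_scores = {}
for name, prior in priors.items():
    # Distance of the true value from the prior mean, in prior standard deviations.
    z_scores[name] = (ground_truth[name] - prior.mean) / prior.stdev
    low, high = prior.inv_cdf(0.025), prior.inv_cdf(0.975)
    print(
        f"{name}: truth is {z_scores[name]:.1f} sd from the prior mean; "
        f"central 95% prior interval [{low:.4f}, {high:.4f}]"
    )
```

Under that assumption, neither prior is centred on the true value, so the calibration has to move the ensemble away from the prior means to recover beta and gamma.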
Next, we define a wrapper function around our forward model to make it compatible with the calisim API.
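def history_matching_func(
    parameters: dict, simulation_id: str, observed_data: np.ndarray | None, t: pd.Series
) -> float | list[float]:
    # Start from the ground-truth settings so that N, S0, I0, and R0 stay fixed.
    simulation_parameters = model.GROUND_TRUTH.copy()
    simulation_parameters["t"] = t

    # Overwrite only the calibrated parameters with the sampled values.
    for k in ["beta", "gamma"]:
        simulation_parameters[k] = parameters[k]

    # Return the simulated infected trajectory for comparison with the observations.
    simulated_data = sir_simulate(simulation_parameters).dotI.values
    return simulated_data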
The last step is to create a HistoryMatchingMethodModel specification for the calibration procedure itself, which we then supply to a HistoryMatchingMethod calibrator. We’ll use the sequential iterative ensemble smoother (SIES) method via the iterative_ensemble_smoother engine.
specification = HistoryMatchingMethodModel(
    experiment_name="ies_history_matching",
    parameter_spec=parameter_spec,
    observed_data=observed_data.dotI.values,
    method="sies",
    n_samples=50,
    n_iterations=8,
    output_labels=["Number of Infected"],
    covariance=np.eye(observed_data.dotI.values.shape[0]),
    calibration_func_kwargs=dict(t=observed_data.day),
)
calibrator = HistoryMatchingMethod(
    calibration_func=history_matching_func, specification=specification, engine="ies"
)
Finally, we’ll run the calibration procedure. This is composed of three steps:
Specify: define the calibration problem, including the parameter distributions, observed data, objective/discrepancy function, and calibration settings (such as the algorithm, directions, and iterations).
Execute: run the actual calibration process (simulation plus optimization/inference).
Analyze: process, summarize, and optionally save plots and metrics of the calibration results.
Together, these steps are abbreviated as SEA.
calibrator.specify().execute().analyze()
<calisim.history_matching.implementation.HistoryMatchingMethod at 0x7f8823db3190>
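The chained call works because each step hands back the calibrator object, as the output above suggests. The sketch below shows that general pattern with a hypothetical ToyCalibrator class; it is a minimal illustration of method chaining, not calisim's implementation.

```python
class ToyCalibrator:
    """Hypothetical stand-in for a calibrator; not calisim's implementation."""

    def __init__(self):
        self.log = []

    def specify(self):
        self.log.append("specify")
        return self  # returning self is what makes chaining possible

    def execute(self):
        self.log.append("execute")
        return self

    def analyze(self):
        self.log.append("analyze")
        return self


toy = ToyCalibrator()
result = toy.specify().execute().analyze()
print(result is toy, toy.log)
```

Because every step returns the same object, the three calls can also be written on separate lines, which is handy when you want to inspect intermediate state between steps.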
Let’s view the parameter estimates produced by the calibrator, with some measure of uncertainty.
pd.DataFrame([
    {
        "parameter": estimate.name, "estimate": estimate.estimate,
        "uncertainty": estimate.uncertainty,
        "ground truth": model.GROUND_TRUTH[estimate.name],
    }
    for estimate in calibrator.get_parameter_estimates().estimates
])
| | parameter | estimate | uncertainty | ground truth |
|---|---|---|---|---|
| 0 | beta | 0.400196 | 0.001135 | 0.4 |
| 1 | gamma | 0.100281 | 0.000669 | 0.1 |
The history matching calibrator recovers the ground-truth parameter values from our simulation study: both estimates fall within 0.3% of the true values.
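To quantify how close the estimates are, the sketch below computes the relative error and a standardized error for each parameter, using the numbers copied from the table above. It assumes the reported uncertainty can be read as a standard deviation, which the tutorial does not state explicitly.

```python
results = [
    # (parameter, estimate, uncertainty, ground truth), copied from the table above
    ("beta", 0.400196, 0.001135, 0.4),
    ("gamma", 0.100281, 0.000669, 0.1),
]

summary = {}
for name, estimate, uncertainty, truth in results:
    relative_error = abs(estimate - truth) / truth
    # Assumption: "uncertainty" is a standard deviation of the estimate.
    standardized_error = (estimate - truth) / uncertainty
    summary[name] = (relative_error, standardized_error)
    print(
        f"{name}: relative error {relative_error:.3%}, "
        f"standardized error {standardized_error:.2f}"
    )
```

A standardized error well below 1 in absolute value would indicate that the true value lies comfortably inside the reported uncertainty.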
History Matching via the Ensemble Smoother with Multiple Data Assimilation
We’ll reuse the iterative_ensemble_smoother engine and demonstrate how to switch the calibration algorithm to the Ensemble Smoother with Multiple Data Assimilation (ESMDA) with minimal code changes.
specification.method = "esmda"
calibrator = HistoryMatchingMethod(
    calibration_func=history_matching_func, specification=specification, engine="ies"
)
calibrator.specify().execute().analyze()
<calisim.history_matching.implementation.HistoryMatchingMethod at 0x7f8824174250>
Again, let’s view the parameter estimates produced by the calibrator, with some measure of uncertainty.
pd.DataFrame([
    {
        "parameter": estimate.name, "estimate": estimate.estimate,
        "uncertainty": estimate.uncertainty,
        "ground truth": model.GROUND_TRUTH[estimate.name],
    }
    for estimate in calibrator.get_parameter_estimates().estimates
])
| | parameter | estimate | uncertainty | ground truth |
|---|---|---|---|---|
| 0 | beta | 0.400021 | 0.000296 | 0.4 |
| 1 | gamma | 0.100035 | 0.000213 | 0.1 |
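For intuition about what an ESMDA-style update does, here is a toy sketch with a single parameter, a single observation, and a hypothetical linear forward model. It is not the iterative_ensemble_smoother implementation: all names and settings are illustrative, and the scalar case replaces the usual matrix algebra with plain divisions. Each pass nudges every ensemble member toward a perturbed copy of the observation, with the observation-error variance inflated by a factor alpha; the inverses of the alphas sum to 1.

```python
import random
import statistics

random.seed(0)


def forward(m):
    # Hypothetical linear forward model standing in for a simulator.
    return 2.0 * m


def sample_cov(xs, ys):
    # Unbiased sample covariance of two equally long sequences.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


d_obs = 4.0                      # observation produced by m = 2
obs_sd = 0.1                     # assumed observation-error standard deviation
alphas = [4.0, 4.0, 4.0, 4.0]    # inflation factors; sum of 1/alpha equals 1
ensemble = [random.gauss(1.0, 1.0) for _ in range(2000)]  # prior N(1, 1)

for alpha in alphas:
    predictions = [forward(m) for m in ensemble]
    cov_md = sample_cov(ensemble, predictions)       # parameter-prediction covariance
    cov_dd = sample_cov(predictions, predictions)    # prediction variance
    gain = cov_md / (cov_dd + alpha * obs_sd**2)
    ensemble = [
        # Each member sees its own perturbed observation with inflated noise.
        m + gain * (d_obs + alpha**0.5 * random.gauss(0.0, obs_sd) - d)
        for m, d in zip(ensemble, predictions)
    ]

posterior_mean = statistics.fmean(ensemble)
posterior_sd = statistics.stdev(ensemble)
print(f"posterior mean {posterior_mean:.3f}, posterior sd {posterior_sd:.3f}")
```

In this linear-Gaussian toy the ensemble should contract around m = 2, with a spread far smaller than the prior's. The SIR model is nonlinear, so the real calibration offers no such closed-form reference and relies on the repeated updates instead.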