1. Mathematical Modelling in Biology
Science will progress faster and further by a marriage of mathematical and empirical biology. This marriage will be even more successful if more biologists can use math, when needed, to advance their own research goals.
Intro
This chapter serves as a motivational one, highlighting the importance and usefulness of mathematical modelling in biology. To be fair, this is not a small book, and I’d be surprised if many biologists got as far as picking it up, without already being fairly motivated!. They start by discussing the sheer number of published articles using mathematical models of some kind (Table 1), and the ability of models to say a lot, with only an handful of symbols (you will of course need to read the book if you want to hear what this models are saying). The meat of the chapter takes the form of a case study of the role of mathematical modelling in understanding/predicting trajectories in the HIV/AIDS epidemic. This is put forward as a clear example of a hugely important application, and one where models have been essential in producing testable predictions and generating forecasts for use in prevention and healthcare interventions. I’m going to briefly mention 3 of the examples mentioned in the book. I won’t really go into the how’s or why’s of the maths, as we haven’t covered any of that in the book yet. Instead I’ll share code examples of how we can recreate (or expand upon) figures presented in the book and the associated primary literature, to show what we will be able to do once we’ve progressed and started developing our own models.
This page (unlike most others in the site) requires WebGL in order to view the visualisations. If you are using the Safri browser, you may have to enable WebGL
Journal (in 2001) | Number of articles | Genral use of models | Specific use of models | Equations Presented |
---|---|---|---|---|
American Naturalist | 105 | 96\% | 59\% | 58\% |
Ecology | 274 | 100\% | 35\% | 38\% |
Evolution | 231 | 100\% | 35\% | 33\% |
Dynamics of HIV After Initial Infection
Here the authors discuss a paper by Andrew Phillips (1996) which used a mathematical model to show that it is at least plausible that the drop off in the number of virions in a patient (following an early peak) is a simple consequence of a decline in the number of the particular cells (CD4+ T cells) that HIV infects. Previously it was assumed that this drop off was due to immune response. However, by modelling the dynamics of immune cells and HIV virions without any change in immune response, Phillips showed that it is possible to see the same pattern of early peak followed by decline, based solely on the population dynamics of virions and immune cells.
The model includes uninfected CD4 lymphocytes (\(R\)), latently infected cells (\(L\)), actively infected cells (\(E\)), and free virions (\(V\)). Activated, uninfected CD4 lymphocytes arise at a constant rate \(\Gamma_{\tau}\) and they are removed by HIV-independent death at rate \(\mu\), or by infection at rate \(\beta V\).
\[ \frac{d R}{d t}=\Gamma \tau-\mu R-\beta R V \]
Upon infection, a proportion \(p\) of cells become latently infected, and are removed by HIV-independent cell death or by activation at rate \(\alpha\) \[ \frac{d L}{d t}=p \beta R V-\mu L-\alpha L \]
Actively infected cells are generated immediately after infection or from the activation of latently infected cells before they die at rate \(\delta\) \[ \frac{d E}{d t}=(1-p) \beta R V+\alpha L-\delta E \] Free virions are produced at rate \(\pi\) by actively infected cells and removed at rate \(\sigma\) \[ \frac{d V}{d t}=\pi E-\sigma V \] And the total number of CD4 lymphocytes is \[ 1000(1 - \tau) + R + L + E \]
Given these equations and the parameter values and starting densities in the paper, we can recreate Figure 1.5 form the book. In fact we’ll go one better and reproduce Figure 1 A from the paper, which additionally includes the number of CD4 cells with a different y-axis scale from the number of virions.
First we list the parameter values from the paper, and collect them in a vector.
= 1.36
Γ = 1.36e-3
μ = 0.2
τ = 0.00027
β = 0.1
p = 3.6e-2
α = 2
σ = 0.33
δ π = 100
= [Γ, μ, τ, β, p, α, σ, δ, π] P
Next we define our system of equations within a function.
using DifferentialEquations
# Define a differential equation function
function phillips₉₆(du,u,P,t)
π = P # Unpack the parameters
Γ, μ, τ, β, p, α, σ, δ, = u[1:4] # Unpack the variables
R, L, E, V
1] = Γ*τ - μ*R - β*R*V #dR/dt
du[2] = p*β*R*V -μ*L - α*L #dL/dt
du[3] = (1-p)β*R*V + α*L -δ*E #dE/dt
du[4] = π*E - σ*V #dV/dt
du[end
Now we have our function and our parameter values, all we need are initial values for the variables (I’ve taken these from the paper also) and a timespan we are interested in.
When implementing someone else’ model take care with the timespan. Here the author of the paper state that their rate parameters are rates of change per day. Thus, the first 120 days is t = 0 to t = 120, but that won’t always be the case.
# Define and ordinary differential equation problem
= ODEProblem(phillips₉₆, # Function
prob 200.0,0.0,0.0,4e-7], # Variables inital values
[0.0,120), # Time span
(# Parameter values
P)
# Solve the problem
= solve(prob, # The ODEProblem defined above
sol = 0.5e-9, # Decreased "relative tolerance" as default value
reltol # led to slight innacuracy towards end of simulation
= 1) # Store a point at each interger value of t, for plotting saveat
Now we are ready to plot. Notice in the code above I manually set the reltol
argument in the solve
function. I only realised that I had to do this because I had a reference output, i.e. my plot did not match the plot from the paper until I lowered reltol
. Figure 1 A from the paper has two lines with different y-axis scales, so I thought it would be nice to show one approach to achieving this in Julia using the Makie plotting library.
using WGLMakie
= Array(sol)'
s = Figure()
fig = Axis(fig[1,1],xticks = 0:30:120,
ax1 = false, ygridvisible = false,
xgridvisible = "days", ylabel = "lymphocytes")
xlabel = Axis(fig[1,1],yaxisposition = :right,
ax2 = false, ygridvisible = false,
xgridvisible = "virions")
ylabel lines!(ax1,(1000(1 - τ) .+ s[:,1] .+ s[:,2] .+ s[:,3])[1:end],
= :black)
color lines!(ax2,log10.(s[:,4]), color = :black)
= (log10.([0.1,1,10,100,1000,10000]),
ax2.yticks string.([0.1,1,10,100,1000,10000]))
ylims!(ax2,(-1,log10(10000)))
ylims!(ax1,0.0,1200)
hidexdecorations!(ax2)
text!(ax1, "CD4 lymphocytes", position = (45,825))
text!(ax2, "Cell-free virus", position = (40,1))
fig
Predicting the Number of New Infections
Continuing the theme of models working together with data, here the authors present a 2001 paper by Williams et al. which developed a novel model for predicting age-specific risks of contracting HIV from (in some cases very sparse) data on prevalence of HIV. Aside from being a useful contribution, giving people the tools to make targeted decisions about where to focus resources in the battle against HIV/AIDS, they also take a really interesting approach to the maths. Unlike the other models discussed above, this is not system of differential equations but instead they fit a modified log-normal function with well reasoned (backed by evidence shown in the paper) assumptions, parameter values and choices of functions. The authors fit their model to data using maximum likelihood estimation and subsequently estimate confidence intervals by a monte-carlo procedure, all in bespoke Visual Basic code. Thankfully, with modern probabilistic programming languages PPLs like Turing.jl
now we can fit arbitrarily complex functions to data in easy, human-readable syntax, like so…
@model function hiv(age, tot, pos)
# priors
~ truncated(Cauchy(0,30), lower = 0.0)
N ~ truncated(Cauchy(0,30), lower = 0.0)
m ~ Exponential()
σ # transformed variables
= P.(age,N,m,σ)
x # likelihood
.~ Binomial.(tot,x)
pos end
where age
, tot
, and pos
are vectors or age groups (age in years) and their corresponding counts of total tested and the number found to be HIV positive. N
, m
, σ
are parameters to be inferred and P
is a function to compute the prevalence of HIV (for a given age group at a certain time) given those parameters. Thus, once we have figured out the appropriate functions to model reality (the hard part!) we can do full Bayesian inference and get a decent picture of our uncertainty around not only our parameter values but (more importantly in cases like this) our predictions.
Code
using Turing, StatsFuns
# The data are emebded in a text file that I extracted from the pdf
# of the paper.
= split.(readlines("out.txt")[225:260])
tbl = parse.(Int, [t[1] for t in tbl])
age = parse.(Int, [t[2] for t in tbl])
tot = parse.(Int, [t[3] for t in tbl])
pos = parse.(Float64, [t[4] for t in tbl])
obs # I'm wrapping everything in a `let` block so I can reassign some
# previousy used variable names as function names in the local scope.
let a₀ = 10.0
π = Base.π # re-assign π it's normal value
# Equations from the paper
P(t) = logistic(0.372t − 744.6)
function μ(â,ã)
= â - ã
d > 2 ? mort(d -2) : 1.0
d end
mort(diff) = exp(-diff/(7/log(2)))
f(a,N,m,σ,t) = R(a,N,m,σ)P(t)
R(a,N,m,σ) = a > a₀ ? N/(σ√(2π)*(a-a₀))*exp(-(log(a-a₀) - log(m))^2 /2σ^2) : 0.0
s(a,N,m,σ,t = 1998) = exp(-sum(f.(0:a,N,m,σ,t-a:t)))
c(a,N,m,σ,t = 1998) = sum( [μ(a,ã)*f(ã,N,m,σ,t-(a-ã))*s(ã,N,m,σ,t-(a-ã)) for ã in 0:a])
function P(a,N,M,σ,t = 1998)
= s(a,N,M,σ,t)
sₐₜ = c(a,N,M,σ,t)
cₐₜ /(cₐₜ + sₐₜ)
cₐₜend
# Probablistic model
@model function hiv(age, tot, pos)
~ truncated(Cauchy(0,30), lower = 0.0)
N ~ truncated(Cauchy(0,30), lower = 0.0)
m ~ Exponential()
σ
= P.(age,N,m,σ)
x .~ Binomial.(tot,x)
pos # return some generated quatities, to be extracted for plotting
return [P.(10:50,N,m,σ,1998), f.(10:50,N,m,σ,1998)]
end
# Fit model to data using 200 iterations of No U-Turns Sampler (NUTS)
= hiv(age, tot, pos)
hivmod = sample(hivmod, NUTS(), 200)
chn # Extract generated quanities for plotting
= generated_quantities(hivmod, chn)
gens # Plot
=scatter(age,obs)
fig,ax,plt lines!.(ax,[hcat(10:50,gen[1]) for gen in gens], color = (:grey,0.1))
lines!.(ax,[hcat(10:50,gen[2]) for gen in gens], color = (:red,0.1))
text!(ax, "newly infected \"risk\"", position = (18,0.02))
text!(ax, "prevalence", position = (30,0.45))
= "Age in years"
ax.xlabel = "Proportion of women"
ax.ylabel hidedecorations!(ax, ticklabels = false, label = false)
figend
Concluding Remarks
For me, this opening chapter really hit the mark. The HIV/AIDS case study was well chosen, since the nature of the infection meant that ecological (population dynamics) and evolutionary (mutation limitation) theory were employed both to understand the plausible range of processes going on during infection and to design treatments against a rapidly evolving antagonist. Thus, I was at once hit with the excitement of scientific discovery (guided by insights gleaned form models) and reminded that immediate and substantial human benefits can be derived from these insights. However, In their concluding message for the chapter Otto & Day also stress the limitations of mathematical modelling. For one, modelling results are only as interesting as the question(s) being asked. This is an obvious but important point. It is easy to become disillusioned when frequently confronted models that seem to have been formulated just for the sake of it. They then go on to stress the importance of a “marriage of mathematical and empirical biology”, stating that facilitating this indeed is the purpose of the book. Unfortunately, it is quite common to encounter the disconnect that often (but not always) exists between modelling and data, so I am glad that this is their aim. But as a final remark on the quality of this first chapter, the concluding message was not really needed. The examples presented throughout made it clear how powerful a proper marriage of theory and data can be when done properly and when asking the right questions.