Extended Likelihood

Unbinned Extended Likelihood

Let \(x\) be a random variable distributed according to a p.d.f. \(~f\left(x\,\middle|\,\vec{\theta}\right)\),

\[ \begin{equation*} x \sim f\left(x\,\middle|\,\vec{\theta}\right), \end{equation*} \]

with \(n\) observations, \(\vec{x} = \left(x_1, \cdots, x_n\right)\), and \(m\) unknown parameters, \(\vec{\theta} = \left(\theta_1, \cdots, \theta_m\right)\). The likelihood would normally then be

\[ L\left(\vec{\theta}\right) = \prod_{i=1}^{n} f\left(x_i; \vec{\theta}\right). \]

However, if \(n\) itself is a Poisson random variable with mean \(\nu\),

\[ n \sim \text{Pois}\left(n \,\middle|\, \nu\right), \]

then it follows that

\[\begin{split} \begin{align} L\left(\nu; \vec{\theta}\right) &= \text{Pois}\left(n; \nu\right) \prod_{i=1}^{n} f\left(x_i; \vec{\theta}\right) \notag\\ &= \frac{\nu^{n}\,e^{-\nu}}{n!} \prod_{i=1}^{n} f\left(x_i; \vec{\theta}\right) \notag\\ &= \frac{e^{-\nu}}{n!} \prod_{i=1}^{n} \nu\, f\left(x_i; \vec{\theta}\right). %\label{eq_extended-likelihood} \end{align} \end{split}\]

This equation is known as the “extended likelihood function”, as we have “extended” the information encoded in the likelihood to include the expected number of events, a quantity of great importance to physicists. It can be seen by inspection, though, that the extended likelihood still has the form of a likelihood, so no special treatment is required to find its maximum likelihood estimators.

\(\nu\) is dependent on \(\vec{\theta}\)

In the instance that \(\nu\) is a function of \(\vec{\theta}\), \(\nu = \nu\left(\vec{\theta}\right)\), then

\[ L\left(\vec{\theta}\right) = \frac{e^{-\nu\left(\vec{\theta}\right)}}{n!} \prod_{i=1}^{n} \nu\left(\vec{\theta}\right)\, f\left(x_i; \vec{\theta}\right), \]

such that

\[ \ln L\left(\vec{\theta}\right) = - \nu\left(\vec{\theta}\right) - \ln n! + \sum_{i=1}^{n} \ln\left(\nu\left(\vec{\theta}\right)\, f\left(x_i; \vec{\theta}\right)\right), \]

where \(n\) is a constant of the data, so the \(\ln n!\) term has no effect on the estimators of the parameters and can be safely dropped. Thus,

\[ \begin{equation} \boxed{-\ln L\left(\vec{\theta}\right) = \nu\left(\vec{\theta}\right) -\sum_{i=1}^{n} \ln\left(\nu\left(\vec{\theta}\right)\, f\left(x_i; \vec{\theta}\right)\right)}\,. \end{equation} \]

Note that as the resultant estimators, \(\hat{\vec{\theta}}\), exploit information from both \(n\) and \(\vec{x}\), they should generally have smaller variances than estimators built from the shape information alone.
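
As a rough numerical illustration of the boxed expression, the sketch below evaluates \(-\ln L\) for a hypothetical model (not taken from the text) in which a single parameter controls both the expected yield, \(\nu(\theta) = 50\,\theta\), and an exponential shape, \(f(x;\theta) = e^{-x/\theta}/\theta\), and then minimizes it with scipy.

import numpy as np
from scipy.optimize import minimize

def extended_nll(theta, x, nu, pdf):
    # Boxed expression: -ln L(theta) = nu(theta) - sum_i ln( nu(theta) * f(x_i; theta) )
    rate = nu(theta)
    return rate - np.sum(np.log(rate * pdf(x, theta)))

# Hypothetical model: yield nu(theta) = 50 * theta and an exponential shape with mean theta
nu = lambda theta: 50.0 * theta[0]
pdf = lambda x, theta: np.exp(-x / theta[0]) / theta[0]

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=rng.poisson(100))  # pseudo-data

fit = minimize(extended_nll, [1.0], args=(x, nu, pdf), bounds=[(1e-3, None)])
print(fit.x[0])  # theta_hat, driven by both the event count and the shape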

\(\nu\) is independent of \(\vec{\theta}\)

In the instance that \(\nu\) is independent of \(\vec{\theta}\),

\[ L\left(\nu; \vec{\theta}\right) = \frac{e^{-\nu}}{n!} \prod_{i=1}^{n} \nu\, f\left(x_i; \vec{\theta}\right), \]

then

\[\begin{split} \begin{split} \ln L\left(\nu; \vec{\theta}\right) &= - \nu - \ln n! + \sum_{i=1}^{n} \ln\left(\nu\, f\left(x_i; \vec{\theta}\right)\right)\\ &= - \nu + \sum_{i=1}^{n} \left(\ln\nu + \ln f\left(x_i; \vec{\theta}\right)\right) - \ln n! \\ &= - \nu + n \ln\nu + \sum_{i=1}^{n} \ln f\left(x_i; \vec{\theta}\right) - \ln n!\,, \end{split} \end{split}\]

such that

\[ \begin{equation} \boxed{-\ln L\left(\nu; \vec{\theta}\right) = \nu - n \ln\nu - \sum_{i=1}^{n} \ln f\left(x_i; \vec{\theta}\right)}\,. \end{equation} \]

As \(L\) is maximized with respect to a variable \(\alpha\) when \(-\ln L\) is minimized,

\[ \frac{\partial \left(-\ln L\right)}{\partial \alpha} = 0, \]

then it is seen from

\[ \frac{\partial \left(-\ln L\left(\nu; \vec{\theta}\right)\right)}{\partial \nu} = 1 - \frac{n}{\nu} = 0, \]

that the maximum likelihood estimator for \(\nu\) is

\[ \hat{\nu} = n\,, \]

and that

\[ \frac{\partial \left(-\ln L\left(\nu; \vec{\theta}\right)\right)}{\partial \theta_j} = 0, \]

results in the same estimators \(\hat{\vec{\theta}}\) as in the “usual” maximum likelihood case.
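
As a quick numerical check of these two results, the sketch below (assuming a Gaussian shape with unknown mean and unit width, an arbitrary choice made only for illustration) minimizes the boxed \(-\ln L\) with \(\nu\) free and recovers \(\hat{\nu} = n\) together with the ordinary estimator for the mean.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=1.0, size=rng.poisson(80))  # pseudo-data
n = len(x)

def extended_nll(pars):
    # pars = (nu, mu): -ln L = nu - n ln(nu) - sum_i ln f(x_i; mu)
    nu, mu = pars
    return nu - n * np.log(nu) - np.sum(norm.logpdf(x, loc=mu, scale=1.0))

fit = minimize(extended_nll, x0=[n / 2, 0.0], bounds=[(1e-3, None), (None, None)])
print(fit.x[0], n)         # nu_hat is numerically equal to n
print(fit.x[1], x.mean())  # mu_hat matches the ordinary MLE (the sample mean)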

If the p.d.f. is of the form of a mixture model,

\[ f\left(x; \vec{\theta}\right) = \sum_{i=1}^{m} \theta_i\, f_i\left(x\right), \]

and an estimate of the weights is of interest, then the parameters are not fully independent, as they obey the constraint

\[ \sum_{i=1}^{m} \theta_i = 1, \]

so one of the \(m\) parameters can be replaced by

\[ 1 - \sum_{i=1}^{m-1} \theta_i, \]

so that the p.d.f. contains only \(m-1\) independent parameters. The likelihood can then be constructed and maximized as usual, yielding estimators for these parameters and hence for the eliminated one.
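
A minimal sketch of such a constrained fit, assuming a hypothetical two-component model (a Gaussian “signal” plus a flat “background” on \([0, 10]\)) with a single free weight, might look as follows.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, uniform

rng = np.random.default_rng(2)
# Pseudo-data: 30 "signal" and 70 "background" events
x = np.concatenate([rng.normal(5.0, 0.5, 30), rng.uniform(0.0, 10.0, 70)])

def nll(theta):
    # Ordinary (non-extended) likelihood with the constraint imposed by hand:
    # f(x; theta) = theta * f_1(x) + (1 - theta) * f_2(x)
    w = theta[0]
    f = w * norm.pdf(x, 5.0, 0.5) + (1.0 - w) * uniform.pdf(x, loc=0.0, scale=10.0)
    return -np.sum(np.log(f))

fit = minimize(nll, [0.5], bounds=[(1e-3, 1.0 - 1e-3)])
print(fit.x[0])  # estimated signal weight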

Equivalently, the extended likelihood function can be used, as

\[\begin{split} \begin{split} \ln L\left(\nu; \vec{\theta}\right) &= - \nu + n \ln\nu + \sum_{i=1}^{n} \ln f\left(x_i; \vec{\theta}\right) \\ &= - \nu + \sum_{i=1}^{n} \ln \left(\nu\,f\left(x_i; \vec{\theta}\right)\right)\\ &= - \nu + \sum_{i=1}^{n} \ln \left(\sum_{j=1}^{m} \nu\,\theta_j\, f_j\left(x_i\right)\right). \end{split} \end{split}\]

Letting \(\mu_i\), the expected number of events of type \(i\), be \(\mu_i \equiv \theta_i \nu\), for \(\vec{\mu} = \left(\mu_1, \cdots, \mu_m\right)\), then

\[ \ln L\left(\vec{\mu}\right) = - \sum_{j=1}^{m} \mu_j + \sum_{i=1}^{n} \ln \left(\sum_{j=1}^{m} \mu_j\, f_j\left(x_i\right)\right). \]

Here, the \(\vec{\mu}\) are unconstrained and all parameters are treated symmetrically, such that \(\hat{\mu}_i\) is the maximum likelihood estimator for the expected number of events of type \(i\).
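
The same hypothetical two-component example as above can instead be fit with this extended likelihood, treating the yields \(\mu_1\) and \(\mu_2\) as free, symmetric parameters; the sketch below assumes the same Gaussian-plus-flat shapes as before.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, uniform

rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(5.0, 0.5, 30), rng.uniform(0.0, 10.0, 70)])

def extended_nll(mu):
    # -ln L(mu) = sum_j mu_j - sum_i ln( sum_j mu_j f_j(x_i) )
    dens = mu[0] * norm.pdf(x, 5.0, 0.5) + mu[1] * uniform.pdf(x, loc=0.0, scale=10.0)
    return np.sum(mu) - np.sum(np.log(dens))

fit = minimize(extended_nll, [50.0, 50.0], bounds=[(1e-3, None)] * 2)
print(fit.x)                # (mu_1_hat, mu_2_hat): fitted yields per component
print(fit.x.sum(), len(x))  # the fitted yields sum to n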

Toy Example

import numpy as np
from scipy.optimize import minimize
def NLL(x, n, S, B):
    # Binned extended negative log-likelihood for a counting experiment:
    # in each measurement the expected yield is x[0] * S + B, and the
    # constant ln(n!) term is dropped as it has no effect on the minimum
    nll = sum(
        (x[0] * S[meas] + B[meas]) - (n[meas] * np.log(x[0] * S[meas] + B[meas]))
        for meas in range(len(n))
    )
    return nll


# Observed counts, expected signal, and expected background per measurement,
# with f the initial guess for the signal strength
n_observed = [6, 24]
f = np.array([1.0])
S = [0.9, 4.0]
B = [0.2, 24.0]

model = minimize(NLL, f, args=(n_observed, S, B), method="L-BFGS-B", bounds=[(0, 10)])
print(f"The MLE estimate for f: {model.x[0]}")
The MLE estimate for f: 2.6155471948792126

HistFactory Example

Consider a single channel with one signal and one background contribution (and no systematics). For \(n\) events, signal model \(f_{S}(x_e)\), background model \(f_{B}(x_e)\), \(S\) expected signal events, \(B\) expected background events, and signal strength \(\mu\), a “marked Poisson model” [2] may be constructed, which, treating the data as fixed, results in the likelihood

\[\begin{split} \begin{split} L\left(\mu\right) &= \text{Pois}\left(n \,\middle|\, \mu S + B\right) \prod_{e=1}^{n} \frac{\mu S\, f_{S}\left(x_e\right) + B\, f_{B}\left(x_e\right)}{\mu S + B}\\ &= \frac{\left(\mu S + B\right)^{n} e^{-\left(\mu S + B\right)}}{n!} \prod_{e=1}^{n} \frac{\mu S\, f_{S}\left(x_e\right) + B\, f_{B}\left(x_e\right)}{\mu S + B}\\ &= \frac{e^{-\left(\mu S + B\right)}}{n!} \prod_{e=1}^{n} \left(\,\mu S\, f_{S}\left(x_e\right) + B\, f_{B}\left(x_e\right)\right), \end{split} \end{split}\]

and so

\[ -\ln L\left(\mu\right) = \left(\mu S + B\right) - \sum_{e=1}^{n} \ln \left(\,\mu S\, f_{S}\left(x_e\right) + B\, f_{B}\left(x_e\right)\right) + \underbrace{\ln n!}_{\text{constant}}. \]
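
A minimal numerical sketch of this marked Poisson fit is given below; the shapes (a narrow Gaussian signal over an exponential background) and the nominal yields \(S\) and \(B\) are invented for illustration and are not part of the HistFactory example above.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, expon

# Hypothetical shapes and nominal yields, chosen only for illustration
S, B = 20.0, 100.0                                   # expected signal and background yields
f_S = lambda x: norm.pdf(x, loc=125.0, scale=2.0)    # signal shape
f_B = lambda x: expon.pdf(x, loc=100.0, scale=30.0)  # background shape

rng = np.random.default_rng(4)
x = np.concatenate([
    rng.normal(125.0, 2.0, rng.poisson(S)),
    100.0 + rng.exponential(30.0, rng.poisson(B)),
])  # pseudo-data

def nll(mu):
    # -ln L(mu) = (mu S + B) - sum_e ln( mu S f_S(x_e) + B f_B(x_e) ), dropping the constant ln n!
    return (mu[0] * S + B) - np.sum(np.log(mu[0] * S * f_S(x) + B * f_B(x)))

fit = minimize(nll, [1.0], bounds=[(0.0, None)])
print(fit.x[0])  # fitted signal strength mu_hat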

Binned Extended Likelihood

References and Acknowledgements

  1. Glen Cowan, Statistical Data Analysis, 1998

  2. ROOT collaboration, K. Cranmer, G. Lewis, L. Moneta, A. Shibata and W. Verkerke, HistFactory: A tool for creating statistical models for use with RooFit and RooStats, 2012.

  3. Vince Croft, Discussions with the author at CERN, July 2017