TemplateFitter Reference¶

Reference page for the TemplateFitter package.

Histogram Module¶

AbstractHist¶

class templatefitter.AbstractHist¶

Abstract histogram class. Used as base class by all histograms.

bin_counts¶: numpy.ndarray – Bin counts. Shape is (num_bins,).

bin_edges¶: numpy.ndarray – Bin edges. Shape is (num_bins + 1,).

bin_errors¶: numpy.ndarray – Bin errors, calculated as \(\sqrt{\sum_i w_i^2}\). Shape is (num_bins,).

bin_errors_sq¶: numpy.ndarray – Squared bin errors, calculated as \(\sum_i w_i^2\). Shape is (num_bins,).

bin_mids¶: numpy.ndarray – Bin mids. Shape is (num_bins,).

bin_width¶: float – Bin width.

num_bins¶: int – Number of bins.

Hist1d¶

class templatefitter.Hist1d(bins, limits=None, data=None, weights=None)¶

Implementation of a 1 dimensional histogram.

Parameters:

bins (int, np.ndarray, sequence) – Either the number of bins for the histogram or a sequence of bin edges.
limits (tuple of float, optional) – Lower and upper bound of the histogram. Only required if bins is passed as int. Default is None.
data (numpy.ndarray, pandas.Series, optional) – Sequence which is filled into the histogram. Default is None.
weights (numpy.ndarray, pandas.Series, optional) – Sequence which is used as event weights. If no weights are given, each event is weighted with 1.0. Default is None.

bin_counts¶: numpy.ndarray – Bin counts. Shape is (num_bins,).

bin_edges¶: numpy.ndarray – Bin edges. Shape is (num_bins + 1,).

bin_errors¶: numpy.ndarray – Bin errors, calculated as \(\sqrt{\sum_i w_i^2}\). Shape is (num_bins,).

bin_errors_sq¶: numpy.ndarray – Squared bin errors, calculated as \(\sum_i w_i^2\). Shape is (num_bins,).

bin_mids¶: numpy.ndarray – Bin mids. Shape is (num_bins,).

bin_width¶: float – Bin width.

fill(data, weights=None)¶

Fills the histogram with given data and weights.

Parameters:	data (numpy.ndarray, pandas.Series) – Sequence which is filled into the histogram. weights (numpy.ndarray, pandas.Series, optional) – Sequence which is used as event weights. If no weights are given, each event is weighted with 1.0. Default is None.

classmethod from_binned_data(bin_edges, bin_counts, bin_errors=None)¶

Creates a Hist1d from a binned dataset.

Parameters:	bin_edges (numpy.ndarray) – Array of bin edges. bin_counts (numpy.ndarray) – Array of bin counts. bin_errors (numpy.ndarray, optional) – Array of bin errors. Default is None.
Returns:	histogram
Return type:	Hist1d

num_bins¶: int – Number of bins.
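The attribute definitions above can be reproduced with plain NumPy. The following is a minimal sketch, not the Hist1d implementation itself: the data, weights, and binning are made up for illustration, and it assumes equal-width bins so that a single bin width exists.

```python
import numpy as np

# Hypothetical unbinned data and event weights, for illustration only.
data = np.array([0.1, 0.4, 0.45, 0.8, 0.95])
weights = np.array([1.0, 2.0, 0.5, 1.0, 1.5])

num_bins, limits = 2, (0.0, 1.0)

# Weighted bin counts: sum of w_i in each bin.
bin_counts, bin_edges = np.histogram(
    data, bins=num_bins, range=limits, weights=weights
)

# Squared bin errors: sum of w_i^2 in each bin.
bin_errors_sq, _ = np.histogram(
    data, bins=num_bins, range=limits, weights=weights**2
)
bin_errors = np.sqrt(bin_errors_sq)

# Bin centres and the (common) bin width for equal-width bins.
bin_mids = (bin_edges[:-1] + bin_edges[1:]) / 2
bin_width = bin_edges[1] - bin_edges[0]
```

Note that np.histogram returns num_bins + 1 edges, one more than the number of bins.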

Template Module¶

AbstractTemplate¶

class templatefitter.AbstractTemplate(name)¶

bin_edges¶: numpy.ndarray – Bin edges of the templates in this model.

bin_mids¶: numpy.ndarray – Bin mids of the templates in this model.

bin_width¶: float – Bin width of the template histogram.

limits¶: tuple of float – Limits of the bin edges.

name¶: str – Template identifier.

variable¶: str – Variable identifier.

SimpleTemplate¶

class templatefitter.Template(name, variable, num_bins, limits, df, weight='weight')¶

add_covariance_matrix(covariance_matrix)¶

Add a covariance matrix for a systematic error to this template. This updates the total covariance matrix, the correlation matrix and the relative bin errors.

Parameters:	covariance_matrix (numpy.ndarray) – A covariance matrix. It is not checked if the matrix is valid (symmetric, positive semi-definite). Shape is (num_bins, num_bins).

bin_edges¶: numpy.ndarray – Bin edges of the templates in this model.

bin_mids¶: numpy.ndarray – Bin mids of the templates in this model.

bin_width¶: float – Bin width of the template histogram.

corr_mat¶: numpy.ndarray – The correlation matrix of the template errors. Shape is (num_bins, num_bins).

cov_mat¶: numpy.ndarray – The covariance matrix of the template errors. Shape is (num_bins, num_bins).
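The relation between a covariance matrix and the corresponding correlation matrix can be sketched as follows. This is only an illustration of the standard definition, not the package's internal code; the function name and the example matrix are hypothetical.

```python
import numpy as np

def correlation_from_covariance(cov):
    """Return corr_ij = cov_ij / (sigma_i * sigma_j)."""
    sigma = np.sqrt(np.diag(cov))
    return cov / np.outer(sigma, sigma)

# Hypothetical 2-bin covariance matrix.
cov = np.array([[4.0, 1.0],
                [1.0, 9.0]])

corr = correlation_from_covariance(cov)

# Inverse correlation matrix, the quantity that inv_corr_mat describes.
inv_corr = np.linalg.inv(corr)
```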

errors()¶: numpy.ndarray: Total uncertainty per bin. This value is the product of the relative uncertainty per bin and the current bin values. Shape is (num_bins,).

fractions(nui_params)¶

Calculates the per bin fraction \(f_i\) of the template. This value is used to calculate the expected number of events per bin \(\nu_i\) as \(\nu_i=f_i\cdot\nu\), where \(\nu\) is the expected yield. The fractions are given as

\[f_i=\frac{\nu_i(1+\theta_i\cdot\epsilon_i)}{\sum_{j=1}^{n_\mathrm{bins}} \nu_j (1+\theta_j\cdot\epsilon_j)},\]

where \(\theta_j\) are the nuisance parameters and \(\epsilon_j\) are the relative uncertainties per bin.

Parameters:	nui_params (numpy.ndarray) – An array with values for the nuisance parameters. Shape is (num_bins,).
Returns:	Bin fractions of this template. Shape is (num_bins,).
Return type:	numpy.ndarray
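As a rough illustration, the bin fractions can be computed with a few lines of NumPy. The sketch assumes the normalized form of the formula, in which each shifted bin content is divided by the sum over all bins (the same form used for \(f_{ik}\) in the likelihood section). The function name and the example numbers are hypothetical.

```python
import numpy as np

def bin_fractions(bin_values, rel_errors, nui_params):
    """f_i = nu_i (1 + theta_i eps_i) / sum_j nu_j (1 + theta_j eps_j)."""
    shifted = bin_values * (1.0 + nui_params * rel_errors)
    return shifted / shifted.sum()

nu = np.array([10.0, 30.0, 60.0])    # nominal bin contents
eps = np.array([0.1, 0.05, 0.02])    # relative uncertainties per bin

# All nuisance parameters at zero: fractions equal the nominal shape.
fractions_nominal = bin_fractions(nu, eps, np.zeros(3))

# Pulling the first nuisance parameter by +1 shifts weight into bin 0.
fractions_shifted = bin_fractions(nu, eps, np.array([1.0, 0.0, 0.0]))
```

By construction the fractions always sum to one, so the nuisance parameters change only the shape and leave the yield to the yield parameter.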

generate_asimov_dataset(integer_values=False)¶

Generates an Asimov dataset using the template. This is a binned dataset which corresponds to the current expectation values. Since data takes only integer values, the template expectation in each bin can optionally be rounded to the nearest integer (see integer_values).

Parameters:	integer_values (bool, optional) – Whether to round Asimov data points to integer values or not. Default is False.
Returns:	asimov_dataset
Return type:	Hist1d

generate_toy_dataset()¶

Generates a toy dataset using the template. This is a binned dataset where each bin is treated as a random number following a Poisson distribution with mean equal to the bin content of all templates.

Returns:	toy_dataset
Return type:	Hist1d
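The difference between an Asimov dataset and a toy dataset can be sketched with NumPy. This is an illustration under assumed inputs, not the package's code: the expected bin contents are made up, and the methods documented here return Hist1d objects, not bare arrays.

```python
import numpy as np

# Hypothetical expected bin contents of a template.
expected = np.array([12.3, 45.6, 7.8])

# Asimov dataset: bin contents equal to the expectation values ...
asimov = expected.copy()
# ... optionally rounded to the nearest integer.
asimov_int = np.rint(expected)

# Toy dataset: each bin drawn from a Poisson distribution whose mean
# is the expected bin content.
rng = np.random.default_rng(seed=42)
toy = rng.poisson(lam=expected)
```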

inv_corr_mat¶: numpy.ndarray – The inverse correlation matrix of the template errors. Shape is (num_bins, num_bins).

limits¶: tuple of float – Limits of the bin edges.

name¶: str – Template identifier.

nui_param_values¶: numpy.ndarray – Values of the nuisance parameters. Shape is (num_bins,).

nui_params¶: TemplateParameter – Nuisance parameters.

nui_params_errors¶: numpy.ndarray – Errors of the nuisance parameters. Shape is (num_bins,).

reset_parameters()¶: Sets all parameters to their original values.

values()¶

Calculates the expected number of events per bin using the current yield value and nuisance parameters.

Returns:
Return type:	numpy.ndarray

variable¶: str – Variable identifier.

yield_param¶: TemplateParameter – Yield parameter.

yield_param_errors¶: float – Error of the yield parameter.

yield_param_values¶: float – Value of the yield parameter.

AdvancedTemplate¶

class templatefitter.StackedTemplate(name, variable, num_bins, limits)¶

add_template(tid, template)¶

Adds an instance of a Template to the container.

Parameters:	tid (str) – Id for the template which is used as key in the internal map which stores the templates. template (Template) – Instance of a template model.
Raises:	`ValueError` – If the given template is not compatible with the container.

bin_edges¶: numpy.ndarray – Bin edges of the templates in this model.

bin_mids¶: numpy.ndarray – Bin mids of the templates in this model.

bin_width¶: float – Bin width of the template histogram.

create_nll(dataset)¶

Creates a negative log likelihood object from the given dataset.

Parameters:	dataset (Hist1d) – Binned dataset in the form of a histogram object
Returns:
Return type:	StackedTemplateNegLogLikelihood

create_template(tid, df, weight='weight')¶

Creates an instance of Template and adds it to the container.

Parameters:	tid (str) – Id for the template which is used as key in the internal map which stores the templates. df (pandas.DataFrame) – A pandas.DataFrame instance. The column specified by var_id is used to construct the template histogram. weight (str, optional) – Optional string specifying the column name in df with the event weights. Default is ‘weight’.

errors()¶: numpy.ndarray: Sum over all template errors squared in each bin (bin errors of the stacked template).

fractions(nuiss_params)¶

Evaluates all bin_fractions methods of all templates in this container. Here, the bin fractions depend on so-called nuisance parameters, which incorporate uncertainties on the template shape.

Parameters:	nuiss_params (numpy.ndarray) – Array of nuisance parameter values needed for the evaluation of the AdvancedTemplateModel bin_fraction method.
Returns:	A 2D array of bin fractions. The first axis represents the templates in this container and the second axis represents the bins of each template. Shape is (num_templates, num_bins).
Return type:	numpy.ndarray

generate_asimov_dataset(integer_values=False)¶

Generates an Asimov dataset from the given templates. This is a binned dataset which corresponds to the current expectation values. Since data takes only integer values, the template expectation in each bin can optionally be rounded to the nearest integer (see integer_values).

Parameters:	integer_values (bool, optional) – Whether to round Asimov data points to integer values or not. Default is False.
Returns:	asimov_dataset
Return type:	Hist1d

generate_toy_dataset()¶

Generates a toy dataset from the given templates. This is a binned dataset where each bin is treated as a random number following a Poisson distribution with mean equal to the bin content of all templates.

Returns:	toy_dataset
Return type:	Hist1d

inv_corr_mats¶: list of numpy.ndarray – A list of inverse correlation matrices for all templates.

limits¶: tuple of float – Limits of the bin edges.

name¶: str – Template identifier.

nui_param_errors¶: numpy.ndarray – An array with current nuisance parameter errors of all templates.

nui_param_values¶: numpy.ndarray – An array with current nuisance parameter values of all templates.

nui_params¶: List of TemplateParameter – Nuisance parameters.

num_templates¶: int – Number of templates in this container.

plot_on(ax, **kwargs)¶

Plots the templates as stacked histogram on a given axis. Also the total uncertainty is plotted as hatched bars.

Parameters:	ax (matplotlib.axes.Axes) – An instance of a matplotlib axis. **kwargs – Additional keyword arguments used to change the plot.

reset_parameters()¶: Sets all parameters of all templates to their original values.

template_names¶: list of str – List of all template names.

update_parameters(new_parameters, new_errors)¶

Updates all template yields and nuisance parameters.

Parameters:

new_parameters (np.ndarray) – New yield and nuisance parameter values. Shape is (num_templates + num_templates * num_bins,).
new_errors (np.ndarray) – New yield and nuisance parameter errors. Shape is (num_templates + num_templates * num_bins,).

values()¶: numpy.ndarray: Sum over all template values in each bin (bin counts of the stacked template).

variable¶: str – Variable identifier.

yield_param_errors¶: numpy.ndarray – An array with current yield parameter errors of all templates.

yield_param_values¶: numpy.ndarray – An array with current yield parameter values of all templates.

yield_params¶: List of TemplateParameter – Yield parameters.

NegativeLogLikelihood Module¶

AbstractTemplateCostFunction¶

class templatefitter.AbstractTemplateCostFunction(histdataset, templates)¶

Abstract base class for all cost functions to estimate yields using the template method.

Parameters:	histdataset (AbstractHist) – Bin counts of the data histogram. Shape is (nbins,). templates (AbstractTemplate) – A CompositeTemplate instance. The templates are used to extract the contribution from each process described by the templates to the measured data set.

param_names¶: list of str – Parameter names. Used for convenience.

x0¶: numpy.ndarray – Starting values for the minimization.

StackedTemplateNegLogLikelihood¶

class templatefitter.StackedTemplateNegLogLikelihood(binned_dataset, templates)¶

A negative log likelihood (NLL) function for binned data using template histogram shapes as pdfs. The NLL is calculated as

\[-\log(L) = \sum \limits_{i=1}^{n_\mathrm{bins}} \left[ (\nu_i - n_i ) - n_i \log\left(\frac{\nu_i}{n_i}\right) \right],\]

with:

\(\nu_i\) - total expected number of events in bin \(i\)
\(n_i\) - measured number of events in bin \(i\).

Note that this is not the standard Poisson term of a log likelihood: a constant term has been added to bring the likelihood into this form. The total expected number of events per bin is given by

\[\nu_i = \sum \limits_{k=1}^{n_\mathrm{templates}} f_{ik}\cdot \nu_{ik},\]

with:

\(\nu_{ik}\) - expected number of events in bin \(i\) of template \(k\)
\(f_{ik}\) - fraction of template \(k\) in bin \(i\).

\(f_{ik}\) depends on a nuisance parameter \(\theta_{ik}\):

\[f_{ik} = \frac{\nu_{ik}(1 + \theta_{ik} \epsilon_{ik})}{\sum \limits_{j=1}^{ n_\mathrm{bins}}\nu_{jk}(1 + \theta_{jk}\epsilon_{jk})},\]

where \(\epsilon_{jk}\) is the relative uncertainty of template \(k\) in bin \(j\).

Parameters:	binned_dataset (Hist1d) – Histogram of the dataset. templates (StackedTemplate) – A StackedTemplate instance. The templates are used to extract the contribution from each process described by the templates to the measured data set.
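The NLL expression can be written as a short NumPy function. This is a simplified sketch, not the class implementation: it assumes that the expected count in each bin is the sum over templates of the template yield times its bin fraction (following the description of fractions()), it holds the fractions fixed, and it sets the logarithmic term to zero for empty data bins. The function name and numbers are hypothetical.

```python
import numpy as np

def stacked_nll(data_counts, yields, fractions):
    """Binned NLL: sum_i (nu_i - n_i) - n_i * log(nu_i / n_i).

    data_counts: shape (num_bins,)
    yields:      shape (num_templates,)
    fractions:   shape (num_templates, num_bins), each row sums to 1
    """
    # Expected counts per bin, summed over all templates (assumption).
    expected = yields @ fractions

    # n_i * log(nu_i / n_i), taken as 0 for empty bins (n_i = 0).
    log_term = np.zeros_like(expected)
    filled = data_counts > 0
    log_term[filled] = data_counts[filled] * np.log(
        expected[filled] / data_counts[filled]
    )
    return np.sum(expected - data_counts - log_term)

fractions = np.array([[0.5, 0.5],
                      [0.2, 0.8]])
data_counts = np.array([60.0, 90.0])

nll_at_truth = stacked_nll(data_counts, np.array([100.0, 50.0]), fractions)
nll_elsewhere = stacked_nll(data_counts, np.array([80.0, 70.0]), fractions)
```

Because of the added constant term, the function is zero when the expectation matches the data exactly and positive otherwise.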

param_names¶: list of str – Parameter names. Used for convenience.

x0¶: numpy.ndarray – Starting values for the minimization.

Fitter Module¶

TemplateFitter¶

class templatefitter.TemplateFitter(hdata, templates, minimizer_id)¶

This class performs the parameter estimation and calculation of a profile likelihood based on a constructed negative log likelihood function.

Parameters:	hdata (Hist1d) – Data histogram. templates (StackedTemplate) –

do_fit(update_templates=True, get_hesse=True, verbose=True, fix_nui_params=False)¶

Performs a maximum likelihood fit by minimizing the provided negative log likelihood function.

The negative log likelihood is minimized using scipy’s minimize function with the ‘SLSQP’ method.

Parameters:

update_templates (bool, optional) – Whether to update the parameters of the given templates or not. Default is True.
get_hesse (bool, optional) – Whether to calculate the Hesse matrix in the estimated minimum of the negative log likelihood function or not. Can be computationally expensive if the number of parameters in the likelihood is high. Default is True.
verbose (bool, optional) – Whether to print fit information or not. Default is True.
fix_nui_params (bool, optional) – Whether to fix nuisance parameters in the fit or not. Default is False.

Returns:	A namedtuple with the most important information about the minimization.
Return type:	MinimizeResult
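To show what such a minimization looks like in isolation, here is a minimal sketch that calls scipy.optimize.minimize with the SLSQP method on a two-template toy model. It is not the TemplateFitter code path: the template shapes, yields, starting values, bounds, and the ftol setting are assumptions, and nuisance parameters are left out entirely.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical template shapes (rows sum to 1): signal and background.
fractions = np.array([[0.6, 0.3, 0.1],
                      [0.2, 0.3, 0.5]])
true_yields = np.array([200.0, 800.0])

# Asimov-like data: exactly the expectation for the true yields.
data = true_yields @ fractions

def nll(yields):
    """Binned NLL with fixed template shapes (no nuisance parameters)."""
    expected = yields @ fractions
    return np.sum(expected - data - data * np.log(expected / data))

result = minimize(
    nll,
    x0=np.array([150.0, 900.0]),           # starting values
    method="SLSQP",
    bounds=[(1e-3, None), (1e-3, None)],   # keep yields positive
    options={"ftol": 1e-9},
)
fitted_yields = result.x
```

With Asimov-like data the minimum sits at the true yields, which makes the example easy to check by eye.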

get_significance(tid, verbose=True)¶

Calculate significance for yield parameter of template specified by tid using the profile likelihood ratio.

Parameters:	tid (str) – Id of component in the composite template for which the significance of the yield parameter should be calculated.
Returns:	significance – Fit significance for the yield parameter in Gaussian standard deviations.
Return type:	float
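One common way to turn a profile likelihood ratio into a significance is the asymptotic formula \(Z = \sqrt{2\,(\mathrm{NLL}_0 - \mathrm{NLL}_\mathrm{min})}\), where \(\mathrm{NLL}_0\) is the minimum with the tested yield fixed to zero. Whether get_significance uses exactly this formula is an assumption; the sketch below only illustrates the idea on a made-up two-template model without nuisance parameters.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical template shapes (signal, background) and data.
fractions = np.array([[0.6, 0.3, 0.1],
                      [0.2, 0.3, 0.5]])
data = np.array([280.0, 300.0, 420.0])

def nll(yields):
    expected = yields @ fractions
    return np.sum(expected - data - data * np.log(expected / data))

# Free fit: both yields float.
free_fit = minimize(
    nll, x0=np.array([150.0, 900.0]), method="SLSQP",
    bounds=[(1e-3, None), (1e-3, None)], options={"ftol": 1e-9},
)

# Null hypothesis: signal yield fixed to zero, background floats.
null_fit = minimize(
    lambda b: nll(np.array([0.0, b[0]])), x0=np.array([900.0]),
    method="SLSQP", bounds=[(1e-3, None)], options={"ftol": 1e-9},
)

# Asymptotic significance in Gaussian standard deviations.
delta_nll = max(null_fit.fun - free_fit.fun, 0.0)
significance = float(np.sqrt(2.0 * delta_nll))
```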

profile(param_id, num_points=100, sigma=2.0, subtract_min=True)¶

Performs a profile scan of the negative log likelihood function for the specified parameter.

Parameters:

param_id (int or string) – Parameter index or name.
num_points (int) – Number of points where the negative log likelihood is minimized.
sigma (float) – Defines the width of the scan. The scan range is given by sigma*uncertainty of the given parameter.
subtract_min (bool) – Whether to subtract the estimated minimum of the negative log likelihood function or not. Default is True.

Returns:

np.ndarray – Scan points. Shape is (num_points,).
np.ndarray – Profile values. Shape is (num_points,).
np.ndarray – Hesse approximation. Shape is (num_points,).
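A profile scan of this kind can be sketched as follows: the scanned parameter is fixed at each scan point, the NLL is minimized over the remaining parameters, and the result is compared with a parabola built from the parameter uncertainty. The sketch is not the profile() implementation. The toy model, the analytic Hessian used to obtain the uncertainty, and the parabolic form \(\frac{1}{2}((x - \hat{x})/\sigma)^2\) of the Hesse approximation are all assumptions.

```python
import numpy as np
from scipy.optimize import minimize

fractions = np.array([[0.6, 0.3, 0.1],
                      [0.2, 0.3, 0.5]])
data = np.array([280.0, 300.0, 420.0])

def nll(yields):
    expected = yields @ fractions
    return np.sum(expected - data - data * np.log(expected / data))

opts = {"ftol": 1e-9}
best = minimize(nll, x0=np.array([150.0, 900.0]), method="SLSQP",
                bounds=[(1e-3, None)] * 2, options=opts)

# Analytic Hessian of this NLL with respect to the two yields.
expected = best.x @ fractions
hessian = (fractions * data / expected**2) @ fractions.T
sigma_hat = np.sqrt(np.linalg.inv(hessian)[0, 0])

# Scan the first yield within +-2 sigma of its best-fit value.
num_points, width = 21, 2.0
scan_points = np.linspace(best.x[0] - width * sigma_hat,
                          best.x[0] + width * sigma_hat, num_points)

profile_values = np.empty(num_points)
for i, point in enumerate(scan_points):
    # Fix the scanned yield, minimize over the other one.
    fit = minimize(lambda b, p=point: nll(np.array([p, b[0]])),
                   x0=np.array([best.x[1]]), method="SLSQP",
                   bounds=[(1e-3, None)], options=opts)
    profile_values[i] = fit.fun - best.fun   # subtract the minimum

hesse_approx = 0.5 * ((scan_points - best.x[0]) / sigma_hat) ** 2
```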

set_parameter_bounds(param_id, bounds)¶

Adds parameter and its boundaries to the bound parameter dictionary.

Parameters:	param_id (str or int) – Parameter identifier. bounds (tuple of float) – Lower and upper boundaries for this parameter.

set_parameter_fixed(param_id)¶

Adds parameter to the fixed parameter list.

Parameters:	param_id (str or int) – Parameter identifier.

ToyStudy¶

class templatefitter.ToyStudy(templates, minimizer_id)¶

This class helps you to perform toy Monte Carlo studies using given templates and an implementation of a negative log likelihood function. This is useful to discover possible biases or an over- or underestimation of errors for fit parameters.

Parameters:	templates (TemplateCollection) – An instance of the TemplateCollection class.

do_experiments(n_exp=1000, max_tries=10)¶

Performs fits using the given template and generated toy Monte Carlo datasets (following a Poisson distribution) as data.

Parameters:	n_exp (int) – Number of toy experiments to run. max_tries (int) – Maximum number of tries for an experiment if a RuntimeError occurs.

do_linearity_test(template_id, limits, n_points=10, n_exp=200)¶

Performs a linearity test for the yield parameter of the specified template.

Parameters:

template_id (str) – Name of the template for which the linearity test should be performed.
limits (tuple of float) – Range where the yield parameter will be tested in.
n_points (int, optional) – Number of points to test in the given range. This samples n_points in a linear space in the range specified by limits. Default is 10.
n_exp (int, optional) – Number of toy experiments to perform per point. Default is 200.

get_toy_result_pulls(param_index)¶

Returns pulls of the results from the toy Monte Carlo study. The pull is defined as

\(p=\frac{\nu^{\mathrm{fit}} - \nu^{\mathrm{exp}}}{\sigma_{\nu^{\mathrm{exp}}}}\),

and should follow a standard normal distribution.

Parameters:	param_index (int, list of int) – Index or indices of the parameter of interest.
Returns:	pulls – Pull values for the fitted values of parameters specified by param_index. Shape is (n_exp, len(param_index)).
Return type:	np.ndarray
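The pull calculation itself is a one-liner in NumPy. The sketch below uses made-up fit results for a single parameter and assumes that the denominator is the uncertainty reported by each toy fit. The function name is hypothetical.

```python
import numpy as np

def compute_pulls(fitted, expected, uncertainties):
    """Pull per toy experiment: (fitted - expected) / uncertainty."""
    return (fitted - expected) / uncertainties

# Hypothetical results of three toy fits for one parameter.
fitted = np.array([[98.0], [103.0], [101.0]])         # shape (n_exp, 1)
uncertainties = np.array([[10.0], [10.0], [10.0]])    # shape (n_exp, 1)
expected = np.array([100.0])                          # generated value

pulls = compute_pulls(fitted, expected, uncertainties)
```

For an unbiased fit with correctly estimated errors, the pulls over many toys should have a mean near 0 and a standard deviation near 1.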

get_toy_results(param_index)¶

Returns results from the toy Monte Carlo study.

Parameters:	param_index (int, list of int) – Index or indices of the parameter of interest.
Returns:	parameters (np.ndarray) – Results for the fitted values of parameters specified by param_index. Shape is (n_exp, len(param_index)). uncertainties (np.ndarray) – Results for the uncertainties of fitted values for parameters specified by param_index. Shape is (n_exp, len(param_index)).

result_parameters¶: np.ndarray – A 2D array of fit results for the parameters of the likelihood.

result_uncertainties¶: np.ndarray – A 2D array of uncertainties of the fit results for the parameters of the likelihood.

Stats Module¶

templatefitter.stats.pearson_chi2_test(data, expectation, dof)¶

Performs a Pearson \(\chi^2\)-test. This test reflects the level of agreement between observed and expected histograms. The test statistic is

\[\chi^2=\sum\limits_{i=1}^{n_\mathrm{bins}} \frac{(n_i - \nu_i)^2}{\nu_i},\]

where \(n_i\) is the number of observations in bin \(i\) and \(\nu_i\) is the expected number of events in bin \(i\).

In the large sample limit, this test statistic follows a \(\chi^2\)-distribution with \(n_\mathrm{bins} - m\) degrees of freedom, where \(m\) is the number of unconstrained fit parameters.

Parameters:

data (np.ndarray) – Data bin counts. Shape is (num_bins,)
expectation (np.ndarray) – Expected bin counts. Shape is (num_bins,)
dof (int) – Degrees of freedom. This is the number of bins minus the number of free fit parameters.

Returns:

float – \(\chi^2/\mathrm{dof}\)
float – p-value.
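The Pearson test statistic and its p-value can be reproduced with NumPy and scipy.stats.chi2. This is a sketch of the formula, not the package's function: the name pearson_chi2 and the example counts are hypothetical, and every expected bin count is assumed to be positive.

```python
import numpy as np
from scipy.stats import chi2

def pearson_chi2(data, expectation, dof):
    """Return (chi2 / dof, p-value) of a Pearson chi2 test."""
    chi_sq = np.sum((data - expectation) ** 2 / expectation)
    # Survival function: probability of a chi2 value at least this large.
    p_value = chi2.sf(chi_sq, dof)
    return chi_sq / dof, p_value

data = np.array([12.0, 18.0, 30.0])
expectation = np.array([10.0, 20.0, 30.0])

chi2_per_dof, p_value = pearson_chi2(data, expectation, dof=2)
```

Here \(\chi^2 = 4/10 + 4/20 + 0 = 0.6\), so the returned ratio is 0.3.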

templatefitter.stats.cowan_binned_likelihood_gof(data, expectation, dof)¶

Performs a goodness-of-fit (GOF) test using a test statistic based on a binned likelihood function. The test statistic is the ratio \(\lambda(\nu) = L(\nu=\hat{\nu})/L(\nu=n)\), where \(\nu\) are the expected values in each bin. In the numerator (denominator), the likelihood is evaluated with the estimated values for \(\nu\) (the measured values).

In the large sample limit, the test statistic

\[\chi^2 = -2\log \lambda = 2\sum\limits_{i=1}^{n_\mathrm{bins}} \left[ n_i\log\left(\frac{n_i}{\hat{\nu}_i}\right) + \hat{\nu}_i - n_i \right],\]

follows a \(\chi^2\)-distribution with \(n_\mathrm{bins} - m\) degrees of freedom, where \(m\) is the number of unconstrained fit parameters.

Parameters:

data (np.ndarray) – Data bin counts. Shape is (num_bins,)
expectation (np.ndarray) – Expected bin counts. Shape is (num_bins,)
dof (int) – Degrees of freedom. This is the number of bins minus the number of free fit parameters.

Returns:

float – \(\chi^2/\mathrm{dof}\)
float – p-value.
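For a Poisson likelihood, \(-2\log\lambda\) reduces to a sum over bins of \(n_i\log(n_i/\hat{\nu}_i) + \hat{\nu}_i - n_i\), multiplied by two. The sketch below implements that form with NumPy and scipy.stats.chi2. It is not the package's function: the name binned_likelihood_gof and the example counts are hypothetical, and the logarithmic term is set to zero for empty data bins.

```python
import numpy as np
from scipy.stats import chi2

def binned_likelihood_gof(data, expectation, dof):
    """Return (chi2 / dof, p-value) from the likelihood-ratio statistic."""
    # n_i * log(n_i / nu_i), taken as 0 for empty bins (n_i = 0).
    log_term = np.zeros_like(expectation, dtype=float)
    filled = data > 0
    log_term[filled] = data[filled] * np.log(
        data[filled] / expectation[filled]
    )
    chi_sq = 2.0 * np.sum(log_term + expectation - data)
    p_value = chi2.sf(chi_sq, dof)
    return chi_sq / dof, p_value

data = np.array([12.0, 18.0, 30.0])
expectation = np.array([10.0, 20.0, 30.0])

chi2_per_dof, p_value = binned_likelihood_gof(data, expectation, dof=2)
perfect, _ = binned_likelihood_gof(expectation, expectation, dof=2)
```

The statistic is zero when data and expectation agree exactly, and for these counts it is close to the Pearson value of 0.6.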