Jackknife technique in Machine Learning

Aman Gupta

Introduction

The jackknife, or "leave one out," procedure is a cross-validation technique first developed by Maurice Quenouille (1949) to estimate the bias of an estimator. John W. Tukey then expanded its use to variance estimation (1958) and coined the name "jackknife" because, like a jackknife (a pocketknife akin to a Swiss Army knife, typically carried by Boy Scouts), the technique is a rough-and-ready tool that can serve many purposes. Curiously, despite its remarkable influence on the statistical community, Tukey's seminal work is available only as an abstract (which does not even mention the name jackknife) and as an almost-impossible-to-find unpublished note.

Methodology

The jackknife estimation of a parameter is an iterative process. First, the parameter is estimated from the whole sample. Then each element is, in turn, dropped from the sample and the parameter of interest is estimated from this smaller sample. This estimation is called a partial estimate (or a jackknife replication). A pseudo-value is then computed as a weighted difference between the whole-sample estimate and the partial estimate: with a sample of size n, the pseudo-value equals n times the whole-sample estimate minus (n − 1) times the partial estimate. These pseudo-values reduce the (linear) bias of the estimate, because the bias is eliminated by the subtraction between the two estimates. The pseudo-values are then used in lieu of the original values to estimate the parameter of interest, and their standard deviation is used to estimate the parameter's standard error, which can then be used for null hypothesis testing and for computing confidence intervals. The jackknife is strongly related to the bootstrap (i.e., the jackknife is often a linear approximation of the bootstrap), which is currently the main technique for computational estimation of population parameters.
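To make the procedure concrete, here is a minimal NumPy sketch (the helper name `jackknife` and the example statistic are illustrative choices of this post, not a standard API):

```python
import numpy as np

def jackknife(data, statistic):
    """Jackknife estimate, bias, and standard error of `statistic`."""
    n = len(data)
    theta_full = statistic(data)  # whole-sample estimate
    # Partial estimates (jackknife replications): leave each point out in turn.
    partials = np.array([statistic(np.delete(data, i)) for i in range(n)])
    # Pseudo-values: n * whole-sample estimate - (n - 1) * partial estimate.
    pseudo = n * theta_full - (n - 1) * partials
    theta_jack = pseudo.mean()                       # bias-reduced estimate
    bias = (n - 1) * (partials.mean() - theta_full)  # jackknife bias estimate
    se = pseudo.std(ddof=1) / np.sqrt(n)             # jackknife standard error
    return theta_jack, bias, se

# The biased (ddof=0) plug-in variance is a classic demonstration:
# its bias is exactly of order 1/n, so the jackknife removes it completely.
x = np.random.default_rng(0).exponential(scale=2.0, size=50)
print(jackknife(x, lambda d: d.var(ddof=0)))
```

A handy sanity check: for the plug-in variance, the jackknife estimate should match the usual unbiased estimator `x.var(ddof=1)`.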

Assumptions

Although the jackknife makes no assumptions about the shape of the underlying probability distribution, it requires that the observations are independent of each other and identically distributed. This implies that the jackknife should not be applied to time-series data, which are often identically but not independently distributed. Things that happen in one period tend to influence things that happen in the next; in other words, most time-series processes are correlated over time, which violates the independence assumption. When that assumption is violated, the jackknife underestimates the variance in the data set, making the data look more reliable than they actually are.
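To illustrate this caveat, here is a small simulation sketch (the AR(1) generator, sample sizes, and φ = 0.7 are assumptions made purely for illustration): with positively autocorrelated data, the jackknife standard error of the mean should come out noticeably smaller than the true sampling variability.

```python
import numpy as np

def ar1(n, phi, rng):
    """AR(1) series: (approximately) identically distributed but NOT independent."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

def jackknife_se_of_mean(x):
    """Jackknife SE of the sample mean via pseudo-values."""
    n = len(x)
    partials = np.array([np.delete(x, i).mean() for i in range(n)])
    pseudo = n * x.mean() - (n - 1) * partials
    return pseudo.std(ddof=1) / np.sqrt(n)

rng = np.random.default_rng(1)
means, jk_ses = [], []
for _ in range(500):
    x = ar1(100, phi=0.7, rng=rng)
    means.append(x.mean())
    jk_ses.append(jackknife_se_of_mean(x))
print("true SD of the mean:", np.std(means))    # inflated by autocorrelation
print("mean jackknife SE:  ", np.mean(jk_ses))  # too small under dependence
```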

The jackknife works appropriately for statistics that are linear functions of the parameters or the data, and whose distribution is continuous or at least "smooth enough" to be considered as such. Non-linear or non-continuous statistics, such as the median, can give spurious results regardless of any transformation applied.
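A quick sketch of why the median misbehaves (an illustration of this post, not from the original article): for an even-sized sample with distinct values, every leave-one-out median is one of just two order statistics, so the pseudo-values are far too coarse to describe the median's true sampling variability.

```python
import numpy as np

x = np.sort(np.random.default_rng(2).normal(size=10))  # even-sized sample
# Dropping any point from the lower half leaves the same 9-point median,
# and likewise for the upper half: only two distinct values ever appear.
loo_medians = np.array([np.median(np.delete(x, i)) for i in range(len(x))])
print(np.unique(loo_medians))  # exactly two distinct values
```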

Clearing the Air

A potential source of confusion: a somewhat different (but related) method, Leave-One-Out Cross-Validation (LOOCV), is also called the jackknife. LOOCV is used to evaluate the quality of the prediction of computational models built to predict the value of dependent variable(s) from a set of independent variable(s). These models typically use a very large number of parameters (frequently more parameters than observations) and are therefore highly prone to over-fitting. The jackknife can be used to estimate the actual predictive power of such models by predicting the dependent variable values of each observation as if that observation were a new observation: the predicted value(s) of each observation is (are) obtained from the model built on the sample of observations minus the observation to be predicted. The jackknife, in this context, is a procedure used to obtain an unbiased prediction (i.e., a random effect) and to minimize the risk of over-fitting. LOOCV is an extension of the jackknife technique: in LOOCV we build a model on the remaining samples and predict the left-out sample(s), whereas in the jackknife we only compute a statistic from the remaining samples. The sketch below makes the contrast concrete.
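A minimal side-by-side sketch (the synthetic data and the use of `np.polyfit` are illustrative choices, not from the original article):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=30)
y = 2.0 * x + 1.0 + rng.normal(scale=1.5, size=30)
n = len(x)

# Jackknife: recompute a statistic (here, the slope) from the remaining samples.
slopes = [np.polyfit(np.delete(x, i), np.delete(y, i), 1)[0] for i in range(n)]

# LOOCV: fit on the remaining samples, then PREDICT the left-out sample.
sq_errors = []
for i in range(n):
    slope, intercept = np.polyfit(np.delete(x, i), np.delete(y, i), 1)
    sq_errors.append((y[i] - (slope * x[i] + intercept)) ** 2)

print("mean jackknife slope replication:", np.mean(slopes))
print("LOOCV mean squared prediction error:", np.mean(sq_errors))
```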

Pseudo-code for Implementation

Let us walk through the pseudo-code for using the jackknife to estimate the slope and intercept of a simple regression model:

  1. Estimate the slope and intercept using all available data.
  2. Leave out 1 observation and estimate the slope and intercept (also known as the “partial estimate” of the coefficients).
  3. Compute the "pseudo value" of each coefficient: n times the "all data" estimate minus (n − 1) times the "partial estimate", where n is the number of observations.
  4. Repeat steps 2 & 3, leaving each observation out in turn.
  5. Compute the mean of the pseudo values for each coefficient.

This mean is the jackknife estimate of the slope and intercept of the model.
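Here is one way to turn those steps into runnable code (a NumPy sketch under the pseudo-value weighting above; `np.polyfit` and the synthetic data are illustrative choices):

```python
import numpy as np

def jackknife_regression(x, y):
    """Jackknife estimates (and SEs) of simple-regression slope and intercept."""
    n = len(x)
    full = np.polyfit(x, y, 1)  # step 1: [slope, intercept] from all data
    pseudo = np.empty((n, 2))
    for i in range(n):
        partial = np.polyfit(np.delete(x, i), np.delete(y, i), 1)  # step 2
        pseudo[i] = n * full - (n - 1) * partial                   # step 3
    jack = pseudo.mean(axis=0)                    # step 5: mean pseudo-value
    se = pseudo.std(axis=0, ddof=1) / np.sqrt(n)  # standard errors as a bonus
    return jack, se

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, size=40)
y = 3.0 * x - 2.0 + rng.normal(size=40)
coefs, ses = jackknife_regression(x, y)
print("jackknife [slope, intercept]:", coefs, "+/-", ses)
```

The standard errors recovered from the pseudo-values are what make the hypothesis tests and confidence intervals mentioned in the Methodology section possible.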

I hope this post clears the air around this amazing technique and also brings to light the essence and basis of techniques such as LOOCV, while giving due credit to the developers of this "pocket machete", the jackknife.
I would like to thank Hervé Abdi and Lynne J. Williams for their crisp and concise paper on this technique, which motivated me to write this article.

Thanks for reading! Stay safe!


Aman Gupta is a pantomath and a former entrepreneur. Currently, he is in a harmonious and symbiotic relationship with Data.