Kernel Density Estimation
KernelDensityEstimation.jl is a package for calculating kernel density estimates from vectors of data. Its main features (and limitations) are:
- Uses a Gaussian kernel for smoothing (truncated at $4σ$).
- Supports closed boundaries.
- Supports processing weighted samples.
- Provides higher-order estimators to better capture variation in width and slope of distributions.
- Provides a more sophisticated bandwidth estimator than the typical Silverman rule (1D only).
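For context, the Silverman rule mentioned above derives a single global bandwidth from the sample's spread. The following is a minimal sketch of that standard rule of thumb for a Gaussian kernel — shown here only as the baseline this package improves upon, not as this package's own estimator:

```julia
using Statistics

# Silverman's rule of thumb for a 1D Gaussian kernel:
#   h = 0.9 * min(σ̂, IQR / 1.34) * n^(-1/5)
# where σ̂ is the sample standard deviation and IQR the interquartile range.
function silverman_bandwidth(x::AbstractVector{<:Real})
    n = length(x)
    σ̂ = std(x)
    iqr = quantile(x, 0.75) - quantile(x, 0.25)
    return 0.9 * min(σ̂, iqr / 1.34) * n^(-1 / 5)
end

# The bandwidth shrinks slowly (∝ n^(-1/5)) as the sample grows.
h = silverman_bandwidth(randn(1_000))
```

Because the rule reduces the whole sample to two summary statistics, it tends to oversmooth multimodal or skewed distributions — one motivation for the more sophisticated estimator provided here.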
This package largely implements the algorithms described by Lewis (2019)[1] (and its corresponding Python package, GetDist).
Why another kernel density estimation package?
As of Mar 2026, much of the Julia ecosystem uses the KernelDensity.jl package (possibly implicitly, such as through density plots in Makie.jl, StatsPlots.jl, etc).
Consider the following (toy) examples: one case where we have samples drawn from a Gaussian distribution, and a second where we restrict to positive values. We'll also consider their joint distribution in 2D.
```julia
using Random
using Distributions

# A Gaussian distribution and a sample of random deviates
x, dist = -5.0:0.01:5.0, Normal(0.0, 1.0)
exp_gauss = pdf.(dist, x)
rv_gauss = rand(dist, 500)

# Then the truncated distribution and more samples
tdist = truncated(dist, lower = 0.0)
exp_trunc = pdf.(tdist, x)
rv_trunc = rand(tdist, 500)

# The joint distribution
jdist = product_distribution(dist, tdist)
exp_joint = map(c -> pdf(jdist, [c...]), Iterators.product(x, x))
```

If we then plot the outputs of running the `KernelDensity.kde` method on the individual and joint data sets:
```julia
import KernelDensity as KD

kd_gauss = KD.kde(rv_gauss)
kd_trunc = KD.kde(rv_trunc)
kd_joint = KD.kde((rv_gauss, rv_trunc))
```

Figure 1: Kernel density estimates (solid blue) for the full (top left), truncated (lower right), and joint (lower left) Gaussian samples, as produced using the default settings of KernelDensity.jl. Each can be compared to the equivalent theoretical distribution (dashed red); the 68% and 95% contour levels are indicated in the 2D plane.
Plotting Code

```julia
fig = draw_densities(kd_gauss, kd_trunc, kd_joint)
Label(fig[0, :], "KernelDensity.jl", font = :bold, fontsize = 20)
resize_to_layout!(fig)
```

For the Gaussian distribution (top left), where there are no edges, the density estimate is a reasonable approximation of the known Gaussian distribution. In contrast, the estimate for the truncated Gaussian distribution (lower right) fails to represent the hard cut-off at $y = 0$, instead "leaking" below zero with non-zero density despite the known closed boundary. The joint distribution (lower left) likewise has non-zero density in the excluded region.
Closed boundaries are common among many probability distributions,[bounded] and therefore the need to estimate a density corresponding to a (semi-)bounded distribution arises often. This package provides a density estimator that uses any provided boundary conditions to account for edge effects, producing a more faithful representation of the underlying distribution.
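To build intuition for why a boundary-aware estimator helps, one common correction technique is the reflection method: the data is mirrored across the closed boundary so that kernel mass which would leak past the edge is folded back inside the support. The sketch below is a generic, self-contained illustration of that idea in plain Julia — it is not necessarily the exact algorithm this package implements:

```julia
# Gaussian-kernel density estimate evaluated at each point in `xs`.
function gauss_kde(xs, data, h)
    n = length(data)
    return [sum(exp(-0.5 * ((x - d) / h)^2) for d in data) / (n * h * sqrt(2π)) for x in xs]
end

# Reflection method for a closed left boundary at `lo`: augment the data with
# its mirror image across the boundary, then renormalize on the half-line
# (the factor of 2 compensates for doubling the sample size).
function reflected_kde(xs, data, h; lo = 0.0)
    mirrored = vcat(data, 2lo .- data)
    return 2 .* gauss_kde(xs, mirrored, h)
end

data = abs.(randn(2_000))   # half-normal sample: hard boundary at zero
h = 0.2
naive = gauss_kde([0.0], data, h)[1]         # roughly half the true density at the edge
corrected = reflected_kde([0.0], data, h)[1] # recovers the boundary peak
```

At the boundary, the naive estimate places about half its kernel mass outside the support, so the corrected estimate at $y = 0$ is roughly twice the naive one.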
Repeating the density estimation on the Gaussian and truncated Gaussian samples from above, this time with this package's `kde` method:
```julia
import KernelDensityEstimation as KDE

kde_gauss = KDE.kde(rv_gauss)
kde_trunc = KDE.kde(rv_trunc, lo = 0.0, boundary = :closedleft)
kde_joint = KDE.kde(rv_gauss, rv_trunc, bounds = ((-Inf, Inf), (0.0, Inf, :closedleft)))
```

Figure 2: Kernel density estimates using the same data as Figure 1, but now processed with this package, including additional information about the boundary conditions of the distributions.
Plotting Code

```julia
fig = draw_densities(kde_gauss, kde_trunc, kde_joint)
Label(fig[0, :], "KernelDensityEstimation.jl", font = :bold, fontsize = 20)
resize_to_layout!(fig)
```

Most obviously, the truncated distribution retains its closed boundary condition at $y = 0$ and does not suffer from the leakage and suppression of the peak seen with the KernelDensity.jl estimator. Furthermore, all density curves are smoother thanks to the higher-order estimators, which permit relatively wider kernel bandwidths while retaining the shapes of peaks (and non-flat slopes at closed boundaries).
- [1] A. Lewis, *GetDist: a Python package for analysing Monte Carlo samples* (2019), arXiv:1910.13970.
- [bounded] For example, see the list of distributions with bounded and semi-infinite support on Wikipedia.