API

User Interface

KernelDensityEstimation.kdeFunction
estim = kde(data;
            weights = nothing, method = MultiplicativeBiasKDE(),
            lo = nothing, hi = nothing, boundary = :open, bounds = nothing,
            bandwidth = ISJBandwidth(), bwratio = 8, nbins = nothing)

Calculate a discrete kernel density estimate (KDE) estim from samples data, optionally weighted by a corresponding vector of weights.

The default method of density estimation uses the MultiplicativeBiasKDE pipeline, which includes corrections for boundary effects and peak broadening. This should be an acceptable default in many cases, but a different AbstractKDEMethod can be chosen if necessary.

The interval of the density estimate can be controlled either by the set of lo, hi, and boundary keywords or by the bounds keyword; the former are conveniences for setting bounds = (lo, hi, boundary). If lo or hi is nothing, the minimum or maximum of data is used, respectively. (See also bounds.)

The KDE is constructed by first histogramming the input data into nbins bins, with the outermost bin edges spanning lo to hi. The span of the histogram may be expanded outward depending on the boundary condition, which dictates whether each boundary is open or closed. The bwratio parameter is used to calculate nbins when it is not given and corresponds (approximately) to the ratio of the bandwidth to the width of each histogram bin.
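As a rough illustration of how bwratio ties the bandwidth to the bin width, the following sketch derives a bin count from an interval and a bandwidth. The helper name approx_nbins and its rounding rule are assumptions made for illustration; they are not the package's implementation.

```julia
# Hypothetical helper (not part of KernelDensityEstimation): choose a bin count
# so that one bandwidth spans roughly `bwratio` histogram bins.
function approx_nbins(lo::Real, hi::Real, bandwidth::Real; bwratio::Real = 8)
    binwidth = bandwidth / bwratio             # target width of a single bin
    return max(1, ceil(Int, (hi - lo) / binwidth))
end

lo, hi, h = 0.0, 10.0, 0.5
nbins = approx_nbins(lo, hi, h)                # 10 / (0.5 / 8) = 160 bins
edges = range(lo, hi; length = nbins + 1)      # outermost edges span lo to hi
centers = edges[1:end-1] .+ step(edges) / 2    # bin centers
```

Passing nbins explicitly would bypass this calculation entirely, which is why bwratio only matters when nbins is nothing.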

Acceptable values of boundary are:

The histogram is then convolved with a Gaussian distribution with standard deviation bandwidth. If no explicit bandwidth is given, the Improved Sheather-Jones estimator (ISJBandwidth) is used to determine it.

KernelDensityEstimation.MultivariateKDEType
MultivariateKDE{T, N, R<:Tuple{Vararg{AbstractRange,N}}, V<:AbstractVector{T}} <: AbstractKDE{T, N}

Fields

  • axes::R: An N-tuple of the locations (bin centers) along the axes of the density estimate. Each axis is a range type with uniform step size.
  • density::V: An N-dimensional array with element type T of the density estimate values.

See also UnivariateKDE and BivariateKDE.

KernelDensityEstimation.UnivariateKDEType
UnivariateKDE{T, R, V} = MultivariateKDE{T, 1, Tuple{R}, V}

A simplifying alias of a 1-dimensional MultivariateKDE structure.

Properties

The following properties are defined to supplement the fields of the underlying MultivariateKDE struct.

  • x: An alias for the first (and only) axis; i.e. K.x == K.axes[1]
  • f: An alias for the name density (for backwards compatibility).

KernelDensityEstimation.BivariateKDEType
BivariateKDE{T, R, V} = MultivariateKDE{T, 2, R, V}

A simplifying alias of a 2-dimensional MultivariateKDE structure.

Properties

The following properties are defined to supplement the fields of the underlying MultivariateKDE struct.

  • x: An alias for the first axis; i.e. K.x == K.axes[1]
  • y: An alias for the second (and last) axis; i.e. K.y == K.axes[2]
  • f: An alias for the name density (for consistency with UnivariateKDE).

KernelDensityEstimation.BoundaryModule
@enum T Closed Open ClosedLeft ClosedRight
const OpenLeft = ClosedRight
const OpenRight = ClosedLeft

Enumeration to describe the desired boundary conditions of the domain of the kernel density estimate $K$. For some given data $d ∈ [a, b]$, the boundary conditions have the following impact:

  • Closed: The domain $K ∈ [a, b]$ is used directly as the bounds of the binning.
  • Open: The desired domain $K ∈ (-∞, +∞)$ is effectively achieved by widening the bounds of the data by the size of the finite convolution kernel. Specifically, the binning is defined over the range $[a - 8σ, b + 8σ]$ where $σ$ is the bandwidth of the Gaussian convolution kernel.
  • ClosedLeft: The left half-closed interval $K ∈ [a, +∞)$ is used as the bounds for binning by adjusting the upper limit to the range $[a, b + 8σ]$. The equivalent alias OpenRight may also be used.
  • ClosedRight: The right half-closed interval $K ∈ (-∞, b]$ is used as the bounds for binning by adjusting the lower limit to the range $[a - 8σ, b]$. The equivalent alias OpenLeft may also be used.
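The effect of each boundary condition on the binning range can be sketched in a few lines. This is a self-contained illustration: the BoundarySketch module and the binning_range helper are hypothetical stand-ins that mirror the documented enum and the 8σ padding described above, not the package's own code.

```julia
# Illustrative stand-in for the package's `Boundary` module (hypothetical name).
module BoundarySketch
    @enum T Closed Open ClosedLeft ClosedRight
    const OpenLeft = ClosedRight
    const OpenRight = ClosedLeft
end

# Hypothetical helper: widen the data range [a, b] by 8σ on every open side.
function binning_range(a::Real, b::Real, σ::Real, bc::BoundarySketch.T)
    lo = bc in (BoundarySketch.Open, BoundarySketch.ClosedRight) ? a - 8σ : a
    hi = bc in (BoundarySketch.Open, BoundarySketch.ClosedLeft) ? b + 8σ : b
    return (lo, hi)
end

binning_range(0.0, 1.0, 0.1, BoundarySketch.Closed)     # range used as given
binning_range(0.0, 1.0, 0.1, BoundarySketch.Open)       # both limits padded
binning_range(0.0, 1.0, 0.1, BoundarySketch.OpenRight)  # only the upper limit padded
```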

Advanced User Interface

KernelDensityEstimation.initFunction
data, weights, details = init(
        method::K, data::AbstractVector{T},
        weights::Union{Nothing,<:AbstractVector} = nothing;
        bounds = (nothing, nothing, Open),
        bwratio::Real = 1,
        nbins::Union{Nothing,<:Integer} = nothing,
        bandwidth::Union{<:Number,<:AbstractBandwidthEstimator} = ISJBandwidth(),
        kwargs...
    ) where {K<:AbstractKDEMethod, T}

Binning Methods

Density Estimation Methods

KernelDensityEstimation.BasicKDEType
BasicKDE{M<:AbstractBinningKDE} <: AbstractKDEMethod

A baseline density estimation technique which convolves a binned dataset with a Gaussian kernel truncated at its $±4σ$ bounds.
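The idea of convolving binned data with a truncated Gaussian can be sketched in plain Julia as follows. Everything here is an assumption made for illustration (the function name basic_kde_sketch, the simple histogram, the direct convolution loop, and the treatment of bins beyond the edges as empty); the package's actual implementation may differ.

```julia
# Hypothetical sketch (not the package's implementation): histogram the data,
# then convolve the bin densities with a Gaussian kernel truncated at ±4σ.
function basic_kde_sketch(data::AbstractVector{<:Real}, lo::Real, hi::Real,
                          nbins::Integer, σ::Real)
    Δx = (hi - lo) / nbins
    centers = [lo + (i - 0.5) * Δx for i in 1:nbins]

    # Histogram normalized to unit integral: counts / (n * Δx).
    hist = zeros(nbins)
    for v in data
        lo <= v <= hi || continue
        i = clamp(floor(Int, (v - lo) / Δx) + 1, 1, nbins)
        hist[i] += 1
    end
    hist ./= (sum(hist) * Δx)

    # Gaussian kernel sampled at the bin spacing and truncated at ±4σ.
    m = ceil(Int, 4σ / Δx)
    kernel = [exp(-0.5 * (k * Δx / σ)^2) for k in -m:m]
    kernel ./= sum(kernel)

    # Direct convolution; bins beyond the edges are treated as empty.
    dens = zeros(nbins)
    for i in 1:nbins, k in -m:m
        j = i - k
        1 <= j <= nbins && (dens[i] += kernel[k + m + 1] * hist[j])
    end
    return centers, dens
end

data = [0.1 * i for i in -20:20]     # toy samples on [-2, 2]
x, f = basic_kde_sketch(data, -6.0, 6.0, 240, 0.5)
integral = sum(f) * (x[2] - x[1])    # should be close to 1 for this wide range
```

Because the range here extends well beyond the data plus the kernel width, no probability mass is lost at the edges; with a tight closed range the plain convolution would leak mass, which is the situation the boundary-corrected methods address.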

Fields and Constructor Keywords

KernelDensityEstimation.LinearBoundaryKDEType
LinearBoundaryKDE{M<:AbstractBinningKDE} <: AbstractKDEMethod

A method of KDE which applies the linear boundary correction of Jones and Foster [1] as described in Lewis [2] after BasicKDE density estimation. This correction primarily impacts the KDE near a closed boundary (see Boundary) and has the effect of improving any non-zero gradient at the boundary (when compared to normalization corrections which tend to leave the boundary too flat).

Fields and Constructor Keywords

KernelDensityEstimation.MultiplicativeBiasKDEType
MultiplicativeBiasKDE{B<:AbstractBinningKDE,M<:AbstractKDEMethod} <: AbstractKDEMethod

A method of KDE which applies the multiplicative bias correction described in Lewis [2]. This correction is designed to reduce the broadening of peaks inherent to kernel convolution by using a pilot KDE to flatten the distribution before running a second iteration of density estimation (since a perfectly uniform distribution cannot be broadened further).

Fields and Constructor Keywords

Note that if the given method has a configurable binning type, it is ignored in favor of the explicit binning chosen.

Bandwidth Estimators

KernelDensityEstimation.SilvermanBandwidthType
SilvermanBandwidth <: AbstractBandwidthEstimator

Estimates the necessary bandwidth of a vector of data $v$ using Silverman's Rule for a Gaussian smoothing kernel:

\[ h = \left(\frac{4}{3n_\mathrm{eff}}\right)^{1/5} σ̂\]

where $n_\mathrm{eff}$ is the effective number of degrees of freedom of $v$, and $σ̂^2$ is its sample variance.

See also ISJBandwidth

Extended help

The sample variance and effective number of degrees of freedom are calculated using weighted statistics, where the latter is defined to be Kish's effective sample size $n_\mathrm{eff} = (\sum_i w_i)^2 / \sum_i w_i^2$ for weights $w_i$. For uniform weights, this reduces to the length of the vector $v$.
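The two formulas above can be checked with a short, dependency-free sketch. The function names kish_neff and silverman_sketch are hypothetical, and the weighted variance uses the simple normalization by the sum of weights, which is an assumption; the package may apply a different bias correction.

```julia
# Kish's effective sample size for a vector of weights.
kish_neff(w::AbstractVector{<:Real}) = sum(w)^2 / sum(abs2, w)

# Hypothetical sketch of Silverman's rule with optional weights.
# Assumption: variance = Σ wᵢ (vᵢ - μ)² / Σ wᵢ (no bias correction).
function silverman_sketch(v::AbstractVector{<:Real},
                          w::AbstractVector{<:Real} = ones(length(v)))
    neff = kish_neff(w)
    μ = sum(w .* v) / sum(w)
    variance = sum(w .* (v .- μ) .^ 2) / sum(w)
    return (4 / (3 * neff))^(1 / 5) * sqrt(variance)
end

kish_neff(ones(100))               # uniform weights: equals the vector length
kish_neff([1.0, 1.0, 2.0])         # 4^2 / 6, smaller than the 3 samples given
h = silverman_sketch([1.0, 2.0, 3.0, 4.0])
```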

References

KernelDensityEstimation.ISJBandwidthType
ISJBandwidth <: AbstractBandwidthEstimator

Estimates the necessary bandwidth of a vector of data $v$ using the Improved Sheather-Jones (ISJ) plug-in estimator of Botev et al. [4].

This estimator is more capable of choosing an appropriate bandwidth for bimodal (and other highly non-Gaussian) distributions, but comes at the expense of greater computation time and no guarantee that the estimator converges when given very few data points.

See also SilvermanBandwidth

Fields

  • binning::AbstractBinningKDE: The binning type to apply to a data vector as the first step of bandwidth estimation. Defaults to HistogramBinning().

  • bwratio::Int: The relative resolution of the binned data used by the ISJ plug-in estimator: there are bwratio bins per interval of size $h₀$, where the rough initial bandwidth estimate $h₀$ is given by the SilvermanBandwidth estimator. Defaults to 2.

  • niter::Int: The number of iterations to perform in the plug-in estimator. Defaults to 7, in accordance with Botev et al., who state that higher orders show little benefit.

  • fallback::Bool: Whether to fall back to the SilvermanBandwidth if the ISJ estimator fails to converge. If false, an exception is thrown instead.

References

KernelDensityEstimation.bandwidthFunction
h = bandwidth(estimator::AbstractBandwidthEstimator, data::AbstractVector{T},
              lo::T, hi::T, boundary::Boundary.T;
              weights::Union{Nothing, <:AbstractVector} = nothing
              ) where {T}

Determine the appropriate bandwidth h of the data set data (optionally with corresponding weights) using the chosen estimator algorithm. The estimator is provided the range (lo through hi) and boundary style (boundary) of the requested KDE method for use in filtering and/or correctly interpreting the data, if necessary.


Interfaces

Density Estimation Methods

KernelDensityEstimation.MultivariateKDEInfoType
MultivariateKDEInfo{U,N} <: AbstractKDEInfo{U,N}

Information about the density estimation process, providing insight into both the entrypoint parameters and some internal state variables.

Extended help

Type parameters

  • U: A unitless element type compatible with the density estimate.
  • N: The dimensionality of the density estimate.

Fields

  • method::AbstractKDEMethod: The estimation method used to generate the KDE.

  • bounds::Any: The bounds specification of the estimate as passed to init(), prior to making it concrete via calling bounds().

  • domain::Union{Nothing, Tuple{Vararg{Tuple{Ei,Ei,Boundary.T} where Ei,N}}}: A tuple of the concrete range and boundary conditions of the density estimate axes, obtained by calling bounds() with the value of the .bounds field but before adding any requisite padding for open boundary conditions.

  • bwratio::Union{Nothing, NTuple{N,U}}: The ratio between the bandwidth and the width of a histogram bin for each axis, used only when the number of bins is not explicitly provided.

  • nbins::Union{Nothing,NTuple{N,Int}}: The number of requested bins along each axis. If nothing, then the number of bins is calculated using the padded domain of the density estimate, the bandwidth, and the ratio .bwratio.

  • neffective::U: Kish's effective sample size of the data, which equals the number of samples for uniformly weighted data.

  • bandwidth_alg::Union{Nothing,AbstractBandwidthEstimator}: Algorithm used to estimate an appropriate bandwidth, if a concrete value was not provided to the estimator, otherwise nothing.

  • bandwidth::Union{Nothing,<:AbstractMatrix{U}}: The bandwidth (square root of covariance) of the convolution kernel.

  • kernel::Union{Nothing,MultivariateKDE{U,N}}: The convolution kernel used to smooth the density estimate.

KernelDensityEstimation.boundsFunction
lo, hi, bc = bounds(x, spec)

Determine the appropriate interval from lo to hi with boundary condition bc given the data vector x and bounds specification spec.

Packages may specialize this method on the spec argument to modify the behavior of the interval and boundary refinement for new argument types.

KernelDensityEstimation.estimateFunction
estim, info = estimate(method::AbstractKDEMethod, data::AbstractVector, weights::Union{Nothing, AbstractVector}; kwargs...)
estim, info = estimate(method::AbstractKDEMethod, data::AbstractKDE, info::AbstractKDEInfo; kwargs...)

Apply the kernel density estimation algorithm method to the given data, either in the form of a vector of data (optionally with a corresponding vector of weights) or a prior density estimate and its corresponding pipeline info (to support being part of a processing pipeline).

Returns

  • estim::AbstractKDE: The resultant kernel density estimate.
  • info::AbstractKDEInfo: Auxiliary information describing details of the density estimation either useful or necessary for constructing a pipeline of processing steps.

KernelDensityEstimation.estimator_orderFunction
p = estimator_order(::Type{<:AbstractKDEMethod})

The bias scaling of the density estimator method, where a return value of p corresponds to bandwidth-dependent biases of the order $\mathcal{O}(h^{2p})$.