API

User Interface

KernelDensityEstimation.kde (Function)
estim = kde(data;
            weights = nothing, method = MultiplicativeBiasKDE(),
            lo = nothing, hi = nothing, boundary = :open, bounds = nothing,
            bandwidth = ISJBandwidth(), bwratio = 8, nbins = nothing)

Calculate a discrete kernel density estimate (KDE) estim from samples data, optionally weighted by a corresponding vector of weights.

The default method of density estimation is the MultiplicativeBiasKDE pipeline, which includes corrections for boundary effects and peak broadening. It should be an acceptable default in many cases, but a different AbstractKDEMethod can be chosen if necessary.

The interval of the density estimate can be controlled either by the lo, hi, and boundary keywords or by the bounds keyword; the former are conveniences for setting bounds = (lo, hi, boundary). If lo or hi is nothing, the minimum or maximum of data is used, respectively. (See also bounds.)

The KDE is constructed by first histogramming the input data into nbins bins with outermost bin edges spanning lo to hi. The span of the histogram may be expanded outward depending on the boundary condition, which dictates whether the boundaries are open or closed. The bwratio parameter is used to calculate nbins when it is not given and corresponds (approximately) to the ratio of the bandwidth to the width of each histogram bin.
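The relationship between bwratio and nbins described above can be sketched as follows. This is only an illustration under stated assumptions: the helper names sketch_nbins and sketch_bins are hypothetical and not part of the package, the bandwidth is taken to be a known number, and the range from lo to hi is taken to be final (any widening for open boundaries has already been applied). The package's own rounding may differ.

```julia
# Hypothetical helper (not part of the package API): derive a bin count from
# the bandwidth-to-bin-width ratio, assuming the range [lo, hi] is final.
function sketch_nbins(lo, hi, bandwidth, bwratio)
    binwidth = bandwidth / bwratio          # target width of one bin
    return max(1, ceil(Int, (hi - lo) / binwidth))
end

# Bin edges and bin centers for `nbins` equal-width bins spanning [lo, hi].
function sketch_bins(lo, hi, nbins)
    edges = range(lo, hi; length = nbins + 1)
    centers = edges[1:end-1] .+ step(edges) / 2
    return edges, centers
end

nbins = sketch_nbins(0.0, 10.0, 0.5, 8)     # 10 / (0.5 / 8) = 160 bins
edges, centers = sketch_bins(0.0, 10.0, nbins)
```

A larger bwratio gives more bins per bandwidth, so the histogram resolves the kernel more finely at the cost of more bins to convolve.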

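Acceptable values of boundary are a Symbol (such as the default :open) or a member of the Boundary enumeration: Closed, Open, ClosedLeft (alias OpenRight), or ClosedRight (alias OpenLeft).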

The histogram is then convolved with a Gaussian distribution with standard deviation bandwidth. If no explicit bandwidth is given, the default bandwidth estimator is the Improved Sheather-Jones estimator (ISJBandwidth).

KernelDensityEstimation.UnivariateKDE (Type)
UnivariateKDE{T,U,R<:AbstractRange{T},V<:AbstractVector{U}} <: AbstractKDE{T}

Fields

  • x::R: The locations (bin centers) of the corresponding density estimate values.
  • f::V: The density estimate values.

KernelDensityEstimation.Boundary (Module)
@enum T Closed Open ClosedLeft ClosedRight
const OpenLeft = ClosedRight
const OpenRight = ClosedLeft

Enumeration to describe the desired boundary conditions of the domain of the kernel density estimate $K$. For some given data $d ∈ [a, b]$, the boundary conditions have the following impact:

  • Closed: The domain $K ∈ [a, b]$ is used directly as the bounds of the binning.
  • Open: The desired domain $K ∈ (-∞, +∞)$ is effectively achieved by widening the bounds of the data by the size of the finite convolution kernel. Specifically, the binning is defined over the range $[a - 8σ, b + 8σ]$ where $σ$ is the bandwidth of the Gaussian convolution kernel.
  • ClosedLeft: The left half-closed interval $K ∈ [a, +∞)$ is used as the bounds for binning by adjusting the upper limit to the range $[a, b + 8σ]$. The equivalent alias OpenRight may also be used.
  • ClosedRight: The right half-closed interval $K ∈ (-∞, b]$ is used as the bounds for binning by adjusting the lower limit to the range $[a - 8σ, b]$. The equivalent alias OpenLeft may also be used.
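The four cases above can be summarized in a few lines. The sketch below defines its own stand-in enumeration (SketchBoundary) and a hypothetical helper (sketch_binning_range) rather than using the package's Boundary module, so it illustrates the rule as documented and is not the package's implementation.

```julia
# Stand-in enumeration for this sketch only; the package provides its own
# Boundary module, which is deliberately not used here.
@enum SketchBoundary Closed Open ClosedLeft ClosedRight

# Widen the data range [a, b] by 8σ on every open side, per the list above.
function sketch_binning_range(a, b, σ, boundary::SketchBoundary)
    lo = boundary in (Open, ClosedRight) ? a - 8σ : a
    hi = boundary in (Open, ClosedLeft)  ? b + 8σ : b
    return lo, hi
end

sketch_binning_range(0.0, 1.0, 0.25, Closed)       # (0.0, 1.0)
sketch_binning_range(0.0, 1.0, 0.25, Open)         # (-2.0, 3.0)
sketch_binning_range(0.0, 1.0, 0.25, ClosedLeft)   # (0.0, 3.0)
sketch_binning_range(0.0, 1.0, 0.25, ClosedRight)  # (-2.0, 1.0)
```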

Advanced User Interface

KernelDensityEstimation.init (Function)
data, weights, details = init(
        method::K, data::AbstractVector{T},
        weights::Union{Nothing,<:AbstractVector} = nothing;
        lo::Union{Nothing,<:Number} = nothing,
        hi::Union{Nothing,<:Number} = nothing,
        boundary::Union{Symbol,Boundary.T} = :open,
        bounds = nothing,
        bandwidth::Union{<:Number,<:AbstractBandwidthEstimator} = ISJBandwidth(),
        bwratio::Real = 1,
        nbins::Union{Nothing,<:Integer} = nothing,
        kwargs...
    ) where {K<:AbstractKDEMethod, T}

Binning Methods

Density Estimation Methods

KernelDensityEstimation.BasicKDE (Type)
BasicKDE{M<:AbstractBinningKDE} <: AbstractKDEMethod

A baseline density estimation technique which convolves a binned dataset with a Gaussian kernel truncated at its $±4σ$ bounds.
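As a rough illustration of this baseline technique, the sketch below builds a discrete Gaussian kernel truncated at ±4σ and convolves it with a histogram by direct summation. The helper names are hypothetical, bins outside the histogram are assumed to be empty, and the package may well use a different (for example, FFT-based) convolution internally.

```julia
# Discrete Gaussian kernel sampled at the bin spacing Δx and truncated at ±4σ,
# normalized so that its weights sum to one.
function sketch_gaussian_kernel(σ, Δx)
    m = ceil(Int, 4σ / Δx)                     # kernel half-width in bins
    k = [exp(-0.5 * (j * Δx / σ)^2) for j in -m:m]
    return k ./ sum(k)
end

# Direct same-size convolution, treating bins outside the histogram as empty.
function sketch_convolve(h::AbstractVector, k::AbstractVector)
    m = (length(k) - 1) ÷ 2
    out = zeros(float(eltype(h)), length(h))
    for i in eachindex(h), j in -m:m
        if 1 <= i - j <= length(h)
            out[i] += k[j + m + 1] * h[i - j]
        end
    end
    return out
end

h = zeros(41); h[21] = 1.0                     # all counts in the central bin
f = sketch_convolve(h, sketch_gaussian_kernel(2.0, 1.0))
```

With σ equal to two bin widths, the result is nonzero only within eight bins of the occupied bin, which is the truncation at work.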

Fields and Constructor Keywords

KernelDensityEstimation.LinearBoundaryKDE (Type)
LinearBoundaryKDE{M<:AbstractBinningKDE} <: AbstractKDEMethod

A method of KDE which applies the linear boundary correction of Jones and Foster [1] as described in Lewis [2] after BasicKDE density estimation. This correction primarily impacts the KDE near a closed boundary (see Boundary) and has the effect of improving any non-zero gradient at the boundary (when compared to normalization corrections which tend to leave the boundary too flat).

Fields and Constructor Keywords

KernelDensityEstimation.MultiplicativeBiasKDE (Type)
MultiplicativeBiasKDE{B<:AbstractBinningKDE,M<:AbstractKDEMethod} <: AbstractKDEMethod

A method of KDE which applies the multiplicative bias correction described in Lewis [2]. This correction is designed to reduce the broadening of peaks inherent to kernel convolution by using a pilot KDE to flatten the distribution and then running a second iteration of density estimation (since a perfectly uniform distribution cannot be broadened further).
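The idea can be sketched on binned data as follows. This is one common form of multiplicative bias correction, written with hypothetical helper names and a plain direct convolution; it is an assumption that this matches the spirit of the method, and it is not the package's implementation (which also handles boundaries and normalization).

```julia
# Normalized Gaussian weights truncated at ±4σ (σ and Δx in the same units).
function sketch_kernel(σ, Δx)
    m = ceil(Int, 4σ / Δx)
    k = [exp(-0.5 * (j * Δx / σ)^2) for j in -m:m]
    return k ./ sum(k)
end

# Direct same-size convolution with empty bins assumed outside the histogram.
function sketch_smooth(h, k)
    m = (length(k) - 1) ÷ 2
    return [sum(k[j + m + 1] * h[i - j] for j in -m:m if 1 <= i - j <= length(h))
            for i in eachindex(h)]
end

# Smooth once for a pilot estimate, divide the histogram by the pilot to
# flatten it, smooth the flattened histogram, and multiply the pilot back in.
function sketch_mult_bias(h, k)
    pilot = sketch_smooth(h, k)
    flat = [p > 0 ? x / p : 0.0 for (x, p) in zip(h, pilot)]
    return pilot .* sketch_smooth(flat, k)
end

x = -50:50
h = exp.(-0.5 .* (x ./ 3) .^ 2); h ./= sum(h)  # binned Gaussian peak, width 3
k = sketch_kernel(2.0, 1.0)
pilot = sketch_smooth(h, k)                    # plain smoothing broadens the peak
f = sketch_mult_bias(h, k)                     # corrected estimate
```

At the peak, the plain smoothed estimate sits well below the input histogram, while the corrected estimate recovers part of that loss.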

Fields and Constructor Keywords

Note that if the given method has a configurable binning type, it is ignored in favor of the explicit binning chosen.

Bandwidth Estimators

KernelDensityEstimation.SilvermanBandwidth (Type)
SilvermanBandwidth <: AbstractBandwidthEstimator

Estimates the necessary bandwidth of a vector of data $v$ using Silverman's Rule for a Gaussian smoothing kernel:

\[ h = \left(\frac{4}{3n_\mathrm{eff}}\right)^{1/5} σ̂\]

where $n_\mathrm{eff}$ is the effective number of degrees of freedom of $v$, and $σ̂^2$ is its sample variance.

See also ISJBandwidth

Extended help

The sample variance and effective number of degrees of freedom are calculated using weighted statistics, where the latter is defined to be Kish's effective sample size $n_\mathrm{eff} = (\sum_i w_i)^2 / \sum_i w_i^2$ for weights $w_i$. For uniform weights, this reduces to the length of the vector $v$.
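A minimal sketch of the formula with weighted statistics is given below. The helper names are hypothetical, and the n_eff / (n_eff - 1) correction applied to the weighted variance is an assumption chosen so that uniform weights reproduce the usual sample variance; the package's exact convention may differ.

```julia
using Statistics

# Kish's effective sample size for a weight vector.
kish_neff(w) = sum(w)^2 / sum(abs2, w)

# Hypothetical helper: Silverman's rule with weighted statistics. The
# neff / (neff - 1) variance correction is an assumption of this sketch.
function sketch_silverman(v, w = ones(length(v)))
    neff = kish_neff(w)
    mu = sum(w .* v) / sum(w)                               # weighted mean
    var_w = sum(w .* (v .- mu) .^ 2) / sum(w) * neff / (neff - 1)
    return (4 / (3 * neff))^(1 / 5) * sqrt(var_w)
end

v = [1.0, 2.0, 4.0, 7.0, 11.0]
h = sketch_silverman(v)                                     # uniform weights
h_ref = (4 / (3 * length(v)))^(1 / 5) * std(v)              # unweighted formula
```

With uniform weights, kish_neff returns the length of the vector and h agrees with h_ref.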

References

KernelDensityEstimation.ISJBandwidth (Type)
ISJBandwidth <: AbstractBandwidthEstimator

Estimates the necessary bandwidth of a vector of data $v$ using the Improved Sheather-Jones (ISJ) plug-in estimator of Botev et al. [4].

This estimator is more capable of choosing an appropriate bandwidth for bimodal (and other highly non-Gaussian) distributions, but comes at the expense of greater computation time and no guarantee that the estimator converges when given very few data points.

See also SilvermanBandwidth

Fields

  • binning::AbstractBinningKDE: The binning type to apply to a data vector as the first step of bandwidth estimation. Defaults to HistogramBinning().

  • bwratio::Int: The relative resolution of the binned data used by the ISJ plug-in estimator: there are bwratio bins per interval of size $h₀$, where the rough initial bandwidth estimate $h₀$ is given by the SilvermanBandwidth estimator. Defaults to 2.

  • niter::Int: The number of iterations to perform in the plug-in estimator. Defaults to 7, in accordance with Botev et al., who state that higher orders show little benefit.

  • fallback::Bool: Whether to fall back to the SilvermanBandwidth if the ISJ estimator fails to converge. If false, an exception is thrown instead.

References

KernelDensityEstimation.bandwidth (Function)
h = bandwidth(estimator::AbstractBandwidthEstimator, data::AbstractVector{T},
              lo::T, hi::T, boundary::Boundary.T;
              weights::Union{Nothing, <:AbstractVector} = nothing
              ) where {T}

Determine the appropriate bandwidth h of the data set data (optionally with corresponding weights) using the chosen estimator algorithm. The estimator is provided the range (lo through hi) and boundary style (boundary) of the requested KDE method for use in filtering and/or correctly interpreting the data, if necessary.


Interfaces

Density Estimation Methods

KernelDensityEstimation.UnivariateKDEInfo (Type)
UnivariateKDEInfo{T} <: AbstractKDEInfo{T}

Information about the density estimation process, providing insight into both the entrypoint parameters and some internal state variables.

Extended help

Fields

  • method::AbstractKDEMethod: The estimation method used to generate the KDE.

  • bounds::Any: The bounds specification of the estimate as passed to init(), prior to making it concrete via calling bounds(). Defaults to nothing.

  • interval::Tuple{T,T}: The concrete interval of the density estimate after calling bounds() with the value of the .bounds field but before adding requisite padding for open boundary conditions. Defaults to (zero(T), zero(T)).

  • boundary::Boundary.T: The concrete boundary condition assumed in the density estimate after calling boundary() with the value of the .bounds field. Defaults to Open.

  • neffective::T: Kish's effective sample size of the data, which equals the length of the original data vector for uniformly weighted samples. Defaults to NaN.

  • bandwidth_alg::Union{Nothing,AbstractBandwidthEstimator}: Algorithm used to estimate an appropriate bandwidth, if a concrete value was not provided to the estimator, otherwise nothing. Defaults to nothing.

  • bandwidth::T: The bandwidth of the convolution kernel. Defaults to zero(T).

  • bwratio::T: The ratio between the bandwidth and the width of a histogram bin, used only when the number of bins .nbins is not explicitly provided. Defaults to one(T).

  • lo::T: The lower edge of the first bin in the density estimate, after possibly adjusting for an open boundary condition compared to the .interval field. Defaults to zero(T).

  • hi::T: The upper edge of the last bin in the density estimate, after possibly adjusting for an open boundary condition compared to the .interval field. Defaults to zero(T).

  • nbins::Int: The number of bins used in the histogram at the beginning of the density estimation. Defaults to -1.

  • kernel::Union{Nothing,UnivariateKDE{T}}: The convolution kernel used to process the density estimate. Defaults to nothing.

KernelDensityEstimation.boundary (Function)
B = boundary(spec)

Convert the specification spec to a boundary style B.

Packages may specialize this method on the spec argument to modify the behavior of the boundary inference for new argument types.

KernelDensityEstimation.bounds (Function)
lo, hi, boundary = bounds(data::AbstractVector{T}, spec) where {T}

Determine the appropriate interval, from lo to hi with boundary style boundary, for the density estimate, given the data vector data and the specification spec (the value of the KDE's bounds argument).

Packages may specialize this method on the spec argument to modify the behavior of the interval and boundary refinement for new argument types.

KernelDensityEstimation.estimate (Function)
estim, info = estimate(method::AbstractKDEMethod, data::AbstractVector, weights::Union{Nothing, AbstractVector}; kwargs...)
estim, info = estimate(method::AbstractKDEMethod, data::AbstractKDE, info::AbstractKDEInfo; kwargs...)

Apply the kernel density estimation algorithm method to the given data, either in the form of a vector of data (optionally with a corresponding vector of weights) or a prior density estimate and its corresponding pipeline info (to support being part of a processing pipeline).

Returns

  • estim::AbstractKDE: The resultant kernel density estimate.
  • info::AbstractKDEInfo: Auxiliary information describing details of the density estimation either useful or necessary for constructing a pipeline of processing steps.

KernelDensityEstimation.estimator_order (Function)
p = estimator_order(::Type{<:AbstractKDEMethod})

The bias scaling of the density estimator method, where a return value of p corresponds to bandwidth-dependent biases of the order $\mathcal{O}(h^{2p})$.