API

User Interface
Advanced User Interface
Interfaces

User Interface

KernelDensityEstimation.kde — Function

estim = kde(data;
            weights = nothing, method = MultiplicativeBiasKDE(),
            lo = nothing, hi = nothing, boundary = :open, bounds = nothing,
            bandwidth = ISJBandwidth(), bwratio = 8, nbins = nothing)

Calculate a discrete kernel density estimate (KDE) estim from samples data, optionally weighted by a corresponding vector of weights.

The default method of density estimation uses the MultiplicativeBiasKDE pipeline, which includes corrections for boundary effects and peak broadening. This should be an acceptable default in many cases, but a different AbstractKDEMethod can be chosen if necessary.

The interval of the density estimate can be controlled by either the set of lo, hi, and boundary keywords or the bounds keyword, where the former are conveniences for setting bounds = (lo, hi, boundary). The minimum and maximum of data are used if lo and/or hi are nothing, respectively. (See also bounds.)

The KDE is constructed by first histogramming the input data into nbins bins, with the outermost bin edges spanning lo to hi. The span of the histogram may be expanded outward depending on the boundary condition, which dictates whether the boundaries are open or closed. The bwratio parameter is used to calculate nbins when it is not given and corresponds (approximately) to the ratio of the bandwidth to the width of each histogram bin.
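The relationship between bwratio and nbins can be sketched as follows. This is only an illustration of the approximate ratio described above; the helper name nbins_from_bwratio and the choice to round up are assumptions, not the package's actual implementation.

```julia
# Hypothetical helper: choose a bin count such that bandwidth / binwidth ≈ bwratio.
function nbins_from_bwratio(lo, hi, bandwidth, bwratio)
    binwidth = bandwidth / bwratio            # target width of one histogram bin
    return ceil(Int, (hi - lo) / binwidth)    # rounding up is an assumption
end

# An interval of length 10 with bandwidth 0.5 and bwratio 8 gives bins of width 0.0625.
nbins_from_bwratio(0.0, 10.0, 0.5, 8)   # 160
```

A larger bwratio resolves the kernel with more bins, at the cost of a longer histogram to convolve.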

Acceptable values of boundary are:

:open or Open
:closed or Closed
:closedleft, :openright, ClosedLeft, or OpenRight
:closedright, :openleft, ClosedRight, or OpenLeft

The histogram is then convolved with a Gaussian distribution with standard deviation bandwidth. The default bandwidth estimator is the Improved Sheather-Jones (ISJBandwidth) if no explicit bandwidth is given.
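A minimal usage sketch of the keywords described above. It assumes the package is installed and that the returned estimate exposes the x and f fields documented for UnivariateKDE; the sample data are synthetic and chosen only for illustration.

```julia
using Random
using KernelDensityEstimation: kde

rng = Random.MersenneTwister(1234)

# Synthetic, non-negative samples drawn from an exponential distribution.
data = randexp(rng, 2_000)

# Default pipeline on an open interval inferred from the data.
estim_open = kde(data)

# Same data, but with a hard lower edge at zero and an open upper edge.
estim_half = kde(data; lo = 0.0, boundary = :closedleft)

# x holds the bin centers and f the density values at those centers.
length(estim_half.x) == length(estim_half.f)
```

Because the samples cannot be negative, the half-closed form is the more natural choice here: it keeps the estimate from extending below zero.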

KernelDensityEstimation.UnivariateKDE — Type

UnivariateKDE{T,U,R<:AbstractRange{T},V<:AbstractVector{U}} <: AbstractKDE{T}

Fields

x::R: The locations (bin centers) of the corresponding density estimate values.
f::V: The density estimate values.

KernelDensityEstimation.Boundary — Module

@enum T Closed Open ClosedLeft ClosedRight
const OpenLeft = ClosedRight
const OpenRight = ClosedLeft

Enumeration to describe the desired boundary conditions of the domain of the kernel density estimate $K$. For some given data $d ∈ [a, b]$, the boundary conditions have the following impact:

Closed: The domain $K ∈ [a, b]$ is used directly as the bounds of the binning.
Open: The desired domain $K ∈ (-∞, +∞)$ is effectively achieved by widening the bounds of the data by the size of the finite convolution kernel. Specifically, the binning is defined over the range $[a - 8σ, b + 8σ]$ where $σ$ is the bandwidth of the Gaussian convolution kernel.
ClosedLeft: The left half-closed interval $K ∈ [a, +∞)$ is used as the bounds for binning by adjusting the upper limit to the range $[a, b + 8σ]$. The equivalent alias OpenRight may also be used.
ClosedRight: The right half-closed interval $K ∈ (-∞, b]$ is used as the bounds for binning by adjusting the lower limit to the range $[a - 8σ, b]$. The equivalent alias OpenLeft may also be used.
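The four cases above can be summarized in a few lines. The sketch below uses a stand-in enum (BoundarySketch) and a hypothetical helper (binning_range) rather than the package's own Boundary module, and simply restates the widening rule given in the list.

```julia
# Stand-in for the package's Boundary enum, for illustration only.
@enum BoundarySketch Closed Open ClosedLeft ClosedRight

# Widen the interval [a, b] by 8σ on each side that is open.
function binning_range(a, b, σ, boundary::BoundarySketch)
    lo = boundary in (Open, ClosedRight) ? a - 8σ : a   # left side is open
    hi = boundary in (Open, ClosedLeft)  ? b + 8σ : b   # right side is open
    return (lo, hi)
end

# With σ = 0.125 the padding is exactly 1.0 on each open side.
binning_range(0.0, 1.0, 0.125, Closed)       # (0.0, 1.0)
binning_range(0.0, 1.0, 0.125, Open)         # (-1.0, 2.0)
binning_range(0.0, 1.0, 0.125, ClosedLeft)   # (0.0, 2.0)
binning_range(0.0, 1.0, 0.125, ClosedRight)  # (-1.0, 1.0)
```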

Advanced User Interface

KernelDensityEstimation.init — Function

data, weights, details = init(
        method::K, data::AbstractVector{T},
        weights::Union{Nothing,<:AbstractVector} = nothing;
        lo::Union{Nothing,<:Number} = nothing,
        hi::Union{Nothing,<:Number} = nothing,
        boundary::Union{Symbol,Boundary.T} = :open,
        bounds = nothing,
        bandwidth::Union{<:Number,<:AbstractBandwidthEstimator} = ISJBandwidth(),
        bwratio::Real = 1,
        nbins::Union{Nothing,<:Integer} = nothing,
        kwargs...
    ) where {K<:AbstractKDEMethod, T}

Binning Methods

KernelDensityEstimation.AbstractBinningKDE — Type

AbstractBinningKDE <: AbstractKDEMethod

The abstract supertype of data binning methods which are the first step in the density estimation process. The two supported binning methods are HistogramBinning and LinearBinning.

KernelDensityEstimation.HistogramBinning — Type

struct HistogramBinning <: AbstractBinningKDE end

Base case which generates a density estimate by histogramming the data.

Density Estimation Methods

KernelDensityEstimation.BasicKDE — Type

BasicKDE{M<:AbstractBinningKDE} <: AbstractKDEMethod

A baseline density estimation technique which convolves a binned dataset with a Gaussian kernel truncated at its $±4σ$ bounds.

Fields and Constructor Keywords

binning::AbstractBinningKDE: The binning type to apply to a data vector as the first step of density estimation. Defaults to HistogramBinning().
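The idea behind this baseline method (histogram the data, then convolve with a Gaussian truncated at $±4σ$) can be sketched without the package. The function names histogram_density and gaussian_smooth are hypothetical, the convolution is a direct sum rather than whatever algorithm the package uses, and no boundary correction is applied.

```julia
# Histogram `data` into `nbins` equal-width bins on [lo, hi], normalized to a density.
function histogram_density(data, lo, hi, nbins)
    Δ = (hi - lo) / nbins
    counts = zeros(nbins)
    for v in data
        lo <= v <= hi || continue                          # ignore out-of-range samples
        i = clamp(floor(Int, (v - lo) / Δ) + 1, 1, nbins)  # v == hi goes in the last bin
        counts[i] += 1
    end
    centers = range(lo + Δ / 2; step = Δ, length = nbins)
    return centers, counts ./ (sum(counts) * Δ)
end

# Convolve a binned density with a Gaussian of standard deviation σ, truncated at ±4σ.
function gaussian_smooth(f, Δ, σ)
    m = ceil(Int, 4σ / Δ)                                  # kernel half-width in bins
    k = [exp(-((j * Δ)^2) / (2σ^2)) for j in -m:m]
    k ./= sum(k)                                           # discrete normalization
    n = length(f)
    g = zeros(n)
    for i in 1:n, j in -m:m
        1 <= i + j <= n || continue                        # mass beyond the edges is dropped
        g[i] += k[j + m + 1] * f[i + j]
    end
    return g
end

data = [0.1, 0.4, 0.45, 0.5, 0.9]
x, f = histogram_density(data, 0.0, 1.0, 20)
g = gaussian_smooth(f, step(x), 0.1)
```

Note that this naive version loses probability mass near the edges of the interval, which is the kind of boundary effect that the corrections in LinearBoundaryKDE are meant to address.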

KernelDensityEstimation.LinearBoundaryKDE — Type

LinearBoundaryKDE{M<:AbstractBinningKDE} <: AbstractKDEMethod

A method of KDE which applies the linear boundary correction of Jones and Foster [1] as described in Lewis [2] after BasicKDE density estimation. This correction primarily impacts the KDE near a closed boundary (see Boundary) and has the effect of improving any non-zero gradient at the boundary (when compared to normalization corrections which tend to leave the boundary too flat).

Fields and Constructor Keywords

binning::AbstractBinningKDE: The binning type to apply to a data vector as the first step of density estimation. Defaults to HistogramBinning().

KernelDensityEstimation.MultiplicativeBiasKDE — Type

MultiplicativeBiasKDE{B<:AbstractBinningKDE,M<:AbstractKDEMethod} <: AbstractKDEMethod

A method of KDE which applies the multiplicative bias correction described in Lewis [2]. This correction is designed to reduce the broadening of peaks inherent to kernel convolution by using a pilot KDE to flatten the distribution and then running a second iteration of density estimation (since a perfectly uniform distribution cannot be broadened further).

Fields and Constructor Keywords

binning::AbstractBinningKDE: The binning type to apply to a data vector as the first step of density estimation. Defaults to HistogramBinning().
method::AbstractKDEMethod: The KDE method to use for the pilot and iterative density estimation. Defaults to LinearBoundaryKDE().

Note that if the given method has a configurable binning type, it is ignored in favor of the explicit binning chosen.

Bandwidth Estimators

KernelDensityEstimation.AbstractBandwidthEstimator — Type

AbstractBandwidthEstimator

Abstract supertype of kernel bandwidth estimation techniques.

KernelDensityEstimation.SilvermanBandwidth — Type

SilvermanBandwidth <: AbstractBandwidthEstimator

Estimates the necessary bandwidth of a vector of data $v$ using Silverman's Rule for a Gaussian smoothing kernel:

\[ h = \left(\frac{4}{3n_\mathrm{eff}}\right)^{1/5} σ̂\]

where $n_\mathrm{eff}$ is the effective number of degrees of freedom of $v$, and $σ̂^2$ is its sample variance.
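The formula can be evaluated directly, as in the sketch below. The function name silverman_bandwidth and its neff keyword are hypothetical; defaulting neff to the sample count assumes uniformly weighted data, and this is not the package's own implementation.

```julia
using Statistics

# Silverman's rule for a Gaussian kernel, following the formula above.
function silverman_bandwidth(v; neff = length(v))
    sigma_hat = std(v)                          # square root of the sample variance
    return (4 / (3 * neff))^(1 / 5) * sigma_hat
end

v = [1.0, 2.0, 3.0, 4.0, 5.0]
h = silverman_bandwidth(v)
```

Since the prefactor scales as $n_\mathrm{eff}^{-1/5}$, the bandwidth shrinks only slowly as more data are added.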

Interfaces

Density Estimation Methods

KernelDensityEstimation.AbstractKDE — Type

AbstractKDE{T}

Abstract supertype of kernel density estimates.

See also UnivariateKDEInfo

KernelDensityEstimation.UnivariateKDEInfo — Type

UnivariateKDEInfo{T} <: AbstractKDEInfo{T}

Information about the density estimation process, providing insight into both the entrypoint parameters and some internal state variables.

Extended help

Fields

method::AbstractKDEMethod: The estimation method used to generate the KDE.
bounds::Any: The bounds specification of the estimate as passed to init(), prior to making it concrete via calling bounds(). Defaults to nothing.
interval::Tuple{T,T}: The concrete interval of the density estimate after calling bounds() with the value of the .bounds field but before adding requisite padding for open boundary conditions. Defaults to (zero(T), zero(T)).
boundary::Boundary.T: The concrete boundary condition assumed in the density estimate after calling boundary() with the value of the .bounds field. Defaults to Open.
neffective::T: Kish's effective sample size of the data, which equals the length of the original data vector for uniformly weighted samples. Defaults to NaN.
bandwidth_alg::Union{Nothing,AbstractBandwidthEstimator}: Algorithm used to estimate an appropriate bandwidth, if a concrete value was not provided to the estimator, otherwise nothing. Defaults to nothing.
bandwidth::T: The bandwidth of the convolution kernel. Defaults to zero(T).
bwratio::T: The ratio between the bandwidth and the width of a histogram bin, used only when the number of bins .nbins is not explicitly provided. Defaults to one(T).
lo::T: The lower edge of the first bin in the density estimate, after possibly adjusting for an open boundary condition compared to the .interval field. Defaults to zero(T).
hi::T: The upper edge of the last bin in the density estimate, after possibly adjusting for an open boundary condition compared to the .interval field. Defaults to zero(T).
nbins::Int: The number of bins used in the histogram at the beginning of the density estimation. Defaults to -1.
kernel::Union{Nothing,UnivariateKDE{T}}: The convolution kernel used to process the density estimate. Defaults to nothing.
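For the neffective field, Kish's effective sample size has a simple closed form, sketched below. The helper name kish_neff is hypothetical and the snippet is independent of the package.

```julia
# Kish's effective sample size: (Σw)² / Σw².
kish_neff(w) = sum(w)^2 / sum(abs2, w)

kish_neff(ones(100))          # 100.0, the sample count for uniform weights
kish_neff([1.0, 1.0, 2.0])    # 16/6 ≈ 2.67, less than the 3 samples given
```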

KernelDensityEstimation.AbstractKDEMethod — Type

AbstractKDEMethod

The abstract supertype of all kernel density estimation methods, including the data binning process (see AbstractBinningKDE) and subsequent density estimation techniques (such as BasicKDE).

KernelDensityEstimation.boundary — Function

B = boundary(spec)

Convert the specification spec to a boundary style B.

Packages may specialize this method on the spec argument to modify the behavior of the boundary inference for new argument types.

KernelDensityEstimation.bounds — Function

lo, hi, boundary = bounds(data::AbstractVector{T}, spec) where {T}

Determine the appropriate interval, from lo to hi with boundary style boundary, for the density estimate, given the data vector data and the bounds specification spec (the value of the bounds argument to the KDE).

Packages may specialize this method on the spec argument to modify the behavior of the interval and boundary refinement for new argument types.

KernelDensityEstimation.estimate — Function

estim, info = estimate(method::AbstractKDEMethod, data::AbstractVector, weights::Union{Nothing, AbstractVector}; kwargs...)
estim, info = estimate(method::AbstractKDEMethod, data::AbstractKDE, info::AbstractKDEInfo; kwargs...)

Apply the kernel density estimation algorithm method to the given data, either in the form of a vector of data (optionally with a corresponding vector of weights) or a prior density estimate and its corresponding pipeline info (to support being part of a processing pipeline).

Returns

estim::AbstractKDE: The resultant kernel density estimate.
info::AbstractKDEInfo: Auxiliary information describing details of the density estimation either useful or necessary for constructing a pipeline of processing steps.

KernelDensityEstimation.estimator_order — Function

p = estimator_order(::Type{<:AbstractKDEMethod})

The bias scaling of the density estimator method, where a return value of p corresponds to bandwidth-dependent biases of the order $\mathcal{O}(h^{2p})$.
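As a worked instance of this convention (a standard textbook result, not a statement about any particular method in this package): for a plain Gaussian-kernel estimate $\hat{f}_h$ with bandwidth $h$ of a sufficiently smooth density $f$, the leading bias term is

\[ \mathrm{E}\left[\hat{f}_h(x)\right] - f(x) = \frac{h^2}{2} f''(x) + \mathcal{O}(h^4) \]

which is of order $\mathcal{O}(h^{2})$ and therefore corresponds to $p = 1$.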