API
User Interface
KernelDensityEstimation.kde
— Function
estim = kde(v;
            method = MultiplicativeBiasKDE(),
            lo = nothing, hi = nothing, boundary = :open, bounds = nothing,
            bandwidth = ISJBandwidth(), bwratio = 8, nbins = nothing)
Calculate a discrete kernel density estimate (KDE) f(x) from samples v.
The default method of density estimation is the MultiplicativeBiasKDE pipeline, which includes corrections for boundary effects and peak broadening. It should be an acceptable default in many cases, but a different AbstractKDEMethod can be chosen if necessary.
The interval of the density estimate can be controlled by either the set of lo, hi, and boundary keywords or the bounds keyword, where the former are conveniences for setting bounds = (lo, hi, boundary). The minimum and maximum of v are used if lo and/or hi are nothing, respectively. (See also bounds.)
The KDE is constructed by first histogramming the input v into nbins bins with outermost bin edges spanning lo to hi. The span of the histogram may be expanded outward based on the boundary condition, which dictates whether the boundaries are open or closed. The bwratio parameter is used to calculate nbins when it is not given and corresponds (approximately) to the ratio of the bandwidth to the width of each histogram bin.
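The relationship between bwratio and nbins can be sketched as follows. This is a minimal illustration that assumes each bin is about bandwidth / bwratio wide; the helper name approx_nbins is hypothetical, and the package's exact rounding may differ.

```julia
# Hypothetical helper (not part of the package): pick a bin count such that
# each bin is roughly `bandwidth / bwratio` wide over the span [lo, hi].
function approx_nbins(lo::Real, hi::Real, bandwidth::Real, bwratio::Real)
    binwidth = bandwidth / bwratio
    return max(1, round(Int, (hi - lo) / binwidth))
end

# A span of 10 with bandwidth 0.5 and bwratio 8 gives bins 0.0625 wide.
approx_nbins(0.0, 10.0, 0.5, 8)   # 160
```

A larger bwratio resolves the kernel more finely at the cost of more bins.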
Acceptable values of boundary are:
:open or Open
:closed or Closed
:closedleft, :openright, ClosedLeft, or OpenRight
:closedright, :openleft, ClosedRight, or OpenLeft
The histogram is then convolved with a Gaussian distribution with standard deviation bandwidth. If no explicit bandwidth is given, the default bandwidth estimator is the Improved Sheather-Jones estimator (ISJBandwidth).
KernelDensityEstimation.UnivariateKDE
— Type
UnivariateKDE{T,U,R<:AbstractRange{T},V<:AbstractVector{U}} <: AbstractKDE{T}
Fields
x::R: The locations (bin centers) of the corresponding density estimate values.
f::V: The density estimate values.
KernelDensityEstimation.Boundary
— Module
@enum T Closed Open ClosedLeft ClosedRight
const OpenLeft = ClosedRight
const OpenRight = ClosedLeft
Enumeration to describe the desired boundary conditions of the domain of the kernel density estimate $K$. For some given data $d ∈ [a, b]$, the boundary conditions have the following impact:
Closed: The domain $K ∈ [a, b]$ is used directly as the bounds of the binning.
Open: The desired domain $K ∈ (-∞, +∞)$ is effectively achieved by widening the bounds of the data by the size of the finite convolution kernel. Specifically, the binning is defined over the range $[a - 8σ, b + 8σ]$ where $σ$ is the bandwidth of the Gaussian convolution kernel.
ClosedLeft: The left half-closed interval $K ∈ [a, +∞)$ is used as the bounds for binning by adjusting the upper limit to the range $[a, b + 8σ]$. The equivalent alias OpenRight may also be used.
ClosedRight: The right half-closed interval $K ∈ (-∞, b]$ is used as the bounds for binning by adjusting the lower limit to the range $[a - 8σ, b]$. The equivalent alias OpenLeft may also be used.
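The four cases above can be summarized in a few lines. This is a self-contained sketch that uses plain symbols as a stand-in for the Boundary enum; the function name binning_range is hypothetical and is not part of the package.

```julia
# Hypothetical stand-in for the Boundary enum using symbols. Open sides of
# the data range [a, b] are padded by 8σ; closed sides are left as given.
function binning_range(a::Real, b::Real, σ::Real, boundary::Symbol)
    pad = 8σ
    if boundary === :closed
        return (a, b)
    elseif boundary === :open
        return (a - pad, b + pad)
    elseif boundary in (:closedleft, :openright)
        return (a, b + pad)
    elseif boundary in (:closedright, :openleft)
        return (a - pad, b)
    else
        throw(ArgumentError("unknown boundary: $boundary"))
    end
end

binning_range(0.0, 1.0, 0.25, :open)         # (-2.0, 3.0)
binning_range(0.0, 1.0, 0.25, :closedleft)   # (0.0, 3.0)
```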
Advanced User Interface
KernelDensityEstimation.init
— Function
data, details = init(method::K, data::AbstractVector{T};
                     lo::Union{Nothing,<:Number} = nothing,
                     hi::Union{Nothing,<:Number} = nothing,
                     boundary::Union{Symbol,Boundary.T} = :open,
                     bounds = nothing,
                     bandwidth::Union{<:Number,<:AbstractBandwidthEstimator} = ISJBandwidth(),
                     bwratio::Real = 1,
                     nbins::Union{Nothing,<:Integer} = nothing,
                     kwargs...) where {K<:AbstractKDEMethod, T}
Binning Methods
KernelDensityEstimation.AbstractBinningKDE
— Type
AbstractBinningKDE <: AbstractKDEMethod
The abstract supertype of data binning methods, which are the first step in the density estimation process. The two supported binning methods are HistogramBinning and LinearBinning.
KernelDensityEstimation.HistogramBinning
— Type
struct HistogramBinning <: AbstractBinningKDE end
Base case which generates a density estimate by histogramming the data.
See also LinearBinning
KernelDensityEstimation.LinearBinning
— Type
struct LinearBinning <: AbstractBinningKDE end
Base case which generates a density estimate by linear binning of the data.
See also HistogramBinning
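To make the difference between the two schemes concrete, here is a generic sketch of each, written for illustration only. The function names, argument order, and edge handling are assumptions and do not reflect the package's implementation. Histogram binning assigns each sample wholly to the bin that contains it, while linear binning shares each sample between the two nearest bin centers in proportion to its distance from them.

```julia
# Generic sketches (not the package's code). Both return density values at
# `nbins` equally spaced bin centers covering [lo, hi].

function histogram_binning(v, lo::Real, hi::Real, nbins::Integer)
    width = (hi - lo) / nbins
    counts = zeros(Float64, nbins)
    for x in v
        lo <= x <= hi || continue
        # Bin containing x; a sample exactly at `hi` goes in the last bin.
        i = min(nbins, floor(Int, (x - lo) / width) + 1)
        counts[i] += 1
    end
    return counts ./ (sum(counts) * width)
end

function linear_binning(v, lo::Real, hi::Real, nbins::Integer)
    width = (hi - lo) / nbins
    weights = zeros(Float64, nbins)
    for x in v
        lo <= x <= hi || continue
        # Position in units of bin widths, measured from the first bin center.
        t = (x - lo) / width - 0.5
        i = floor(Int, t)       # zero-based index of the center to the left
        frac = t - i            # fractional distance toward the right center
        # Clamping sends the weight of samples beyond the outermost centers
        # entirely to the edge bin.
        weights[clamp(i + 1, 1, nbins)] += 1 - frac
        weights[clamp(i + 2, 1, nbins)] += frac
    end
    return weights ./ (sum(weights) * width)
end

# One sample on the edge shared by two bins (centers at 0.25 and 0.75):
histogram_binning([0.5], 0.0, 1.0, 2)   # [0.0, 2.0]
linear_binning([0.5], 0.0, 1.0, 2)      # [1.0, 1.0]
```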
Density Estimation Methods
KernelDensityEstimation.BasicKDE
— Type
BasicKDE{M<:AbstractBinningKDE} <: AbstractKDEMethod
A baseline density estimation technique which convolves a binned dataset with a Gaussian kernel truncated at its $±4σ$ bounds.
Fields and Constructor Keywords
binning::AbstractBinningKDE: The binning type to apply to a data vector as the first step of density estimation. Defaults to HistogramBinning().
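The convolution step can be sketched with a direct sum over neighbouring bins. This is an illustrative, self-contained version; the name gaussian_smooth is hypothetical, and a practical implementation would more likely use an FFT-based convolution. The kernel is sampled at the bin spacing, cut off at $±4σ$, and normalized so that it sums to one.

```julia
# Sketch of smoothing binned densities with a truncated Gaussian kernel.
# `density` holds values at equally spaced bin centers `binwidth` apart.
function gaussian_smooth(density::AbstractVector{<:Real}, binwidth::Real, σ::Real)
    # Number of bins on each side that lie within 4σ of the kernel center.
    m = floor(Int, 4σ / binwidth)
    kernel = [exp(-0.5 * (k * binwidth / σ)^2) for k in -m:m]
    kernel ./= sum(kernel)      # discrete normalization

    n = length(density)
    out = zeros(Float64, n)
    for i in 1:n, k in -m:m
        j = i - k
        1 <= j <= n || continue # contributions from outside the grid are dropped
        out[i] += kernel[k + m + 1] * density[j]
    end
    return out
end

# A single occupied bin spreads symmetrically onto its neighbours.
d = zeros(41); d[21] = 1.0
s = gaussian_smooth(d, 1.0, 2.0)
```

Because contributions from outside the grid are dropped, this sketch loses mass near the edges unless the grid is padded, which is the role of the $8σ$ expansion for open boundaries.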
KernelDensityEstimation.LinearBoundaryKDE
— Type
LinearBoundaryKDE{M<:AbstractBinningKDE} <: AbstractKDEMethod
A method of KDE which applies the linear boundary correction of Jones and Foster [1] as described in Lewis [2] after BasicKDE density estimation. This correction primarily impacts the KDE near a closed boundary (see Boundary) and has the effect of improving the estimate of any non-zero gradient at the boundary (when compared to normalization corrections, which tend to leave the boundary too flat).
Fields and Constructor Keywords
binning::AbstractBinningKDE: The binning type to apply to a data vector as the first step of density estimation. Defaults to HistogramBinning().
KernelDensityEstimation.MultiplicativeBiasKDE
— Type
MultiplicativeBiasKDE{B<:AbstractBinningKDE,M<:AbstractKDEMethod} <: AbstractKDEMethod
A method of KDE which applies the multiplicative bias correction described in Lewis [2]. This correction is designed to reduce the broadening of peaks inherent to kernel convolution by using a pilot KDE to flatten the distribution and then running a second iteration of density estimation on the flattened data (since a perfectly uniform distribution cannot be broadened further).
Fields and Constructor Keywords
binning::AbstractBinningKDE: The binning type to apply to a data vector as the first step of density estimation. Defaults to HistogramBinning().
method::AbstractKDEMethod: The KDE method to use for the pilot and iterative density estimation. Defaults to LinearBoundaryKDE().
Note that if the given method has a configurable binning type, it is ignored in favor of the explicit binning chosen.
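As a sketch of the idea, one common way to write a multiplicative bias correction is shown here for unbinned samples $x_1, …, x_n$. The symbols $\hat f_p$ (the pilot estimate) and $K_h$ (the Gaussian kernel of bandwidth $h$) are notation introduced for this illustration, and the package's binned implementation may differ in detail:

\[ \hat f(x) = \hat f_p(x) \, \frac{1}{n} \sum_{i=1}^{n} \frac{K_h(x - x_i)}{\hat f_p(x_i)} \]

Weighting each sample by $1 / \hat f_p(x_i)$ flattens the distribution, the kernel smooths that nearly uniform distribution with little broadening, and multiplying by $\hat f_p(x)$ restores the original shape.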
Bandwidth Estimators
KernelDensityEstimation.AbstractBandwidthEstimator
— Type
AbstractBandwidthEstimator
Abstract supertype of kernel bandwidth estimation techniques.
KernelDensityEstimation.SilvermanBandwidth
— Type
SilvermanBandwidth <: AbstractBandwidthEstimator
Estimates the necessary bandwidth of a vector of data $v$ using Silverman's Rule for a Gaussian smoothing kernel:
\[ h = \left(\frac{4}{3n}\right)^{1/5} σ̂\]
where $n$ is the length of $v$ and $σ̂$ is its sample standard deviation.
See also ISJBandwidth
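The rule is simple enough to evaluate directly. The following sketch uses only the Statistics standard library; the function name silverman_bandwidth is hypothetical, and it assumes that $σ̂$ is the corrected sample standard deviation as computed by std.

```julia
using Statistics

# Silverman's rule for a Gaussian kernel: h = (4 / (3n))^(1/5) * σ̂.
# Assumption: σ̂ is the corrected sample standard deviation (`std`).
silverman_bandwidth(v::AbstractVector{<:Real}) =
    (4 / (3 * length(v)))^(1 / 5) * std(v)

v = [1.0, 2.0, 3.0, 4.0, 5.0]
h = silverman_bandwidth(v)
```

The bandwidth shrinks slowly with sample size, as $n^{-1/5}$, so a 32-fold increase in data only halves $h$.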
KernelDensityEstimation.ISJBandwidth
— Type
ISJBandwidth <: AbstractBandwidthEstimator
Estimates the necessary bandwidth of a vector of data $v$ using the Improved Sheather-Jones (ISJ) plug-in estimator of Botev et al. [4].
This estimator is more capable of choosing an appropriate bandwidth for bimodal (and other highly non-Gaussian) distributions, but comes at the expense of greater computation time and no guarantee that the estimator converges when given very few data points.
See also SilvermanBandwidth
Fields
binning::AbstractBinningKDE: The binning type to apply to a data vector as the first step of bandwidth estimation. Defaults to HistogramBinning().
bwratio::Int: The relative resolution of the binned data used by the ISJ plug-in estimator: there are bwratio bins per interval of size $h₀$, where the rough initial bandwidth estimate $h₀$ is given by the SilvermanBandwidth estimator. Defaults to 2.
niter::Int: The number of iterations to perform in the plug-in estimator. Defaults to 7, in accordance with Botev et al., who state that higher orders show little benefit.
fallback::Bool: Whether to fall back to the SilvermanBandwidth if the ISJ estimator fails to converge. If false, an exception is thrown instead.
KernelDensityEstimation.bandwidth
— Function
h = bandwidth(estimator::AbstractBandwidthEstimator, data::AbstractVector{T},
              lo::T, hi::T, boundary::Boundary.T) where {T}
Determine the appropriate bandwidth h of the data set data using the chosen estimator algorithm. The estimator is provided the range (lo through hi) and boundary style (boundary) of the requested KDE method for use in filtering and/or correctly interpreting the data, if necessary.
Interfaces
Density Estimation Methods
KernelDensityEstimation.AbstractKDE
— Type
KernelDensityEstimation.AbstractKDEInfo
— Type
AbstractKDEInfo{T}
Abstract supertype of auxiliary information used during kernel density estimation.
See also UnivariateKDEInfo
KernelDensityEstimation.UnivariateKDEInfo
— Type
UnivariateKDEInfo{T} <: AbstractKDEInfo{T}
Information about the density estimation process, providing insight into both the entrypoint parameters and some internal state variables.
Extended help
Fields
method::AbstractKDEMethod: The estimation method used to generate the KDE.
bounds::Any: The bounds specification of the estimate as passed to init(), prior to making it concrete via calling bounds(). Defaults to nothing.
interval::Tuple{T,T}: The concrete interval of the density estimate after calling bounds() with the value of the .bounds field but before adding requisite padding for open boundary conditions. Defaults to (zero(T), zero(T)).
boundary::Boundary.T: The concrete boundary condition assumed in the density estimate after calling boundary() with the value of the .bounds field. Defaults to Open.
npoints::Int: The number of values in the original data vector. Defaults to -1.
bandwidth_alg::Union{Nothing,AbstractBandwidthEstimator}: Algorithm used to estimate an appropriate bandwidth, if a concrete value was not provided to the estimator, otherwise nothing. Defaults to nothing.
bandwidth::T: The bandwidth of the convolution kernel. Defaults to zero(T).
bwratio::T: The ratio between the bandwidth and the width of a histogram bin, used only when the number of bins .nbins is not explicitly provided. Defaults to one(T).
lo::T: The lower edge of the first bin in the density estimate, after possibly adjusting for an open boundary condition compared to the .interval field. Defaults to zero(T).
hi::T: The upper edge of the last bin in the density estimate, after possibly adjusting for an open boundary condition compared to the .interval field. Defaults to zero(T).
nbins::Int: The number of bins used in the histogram at the beginning of the density estimation. Defaults to -1.
kernel::Union{Nothing,UnivariateKDE{T}}: The convolution kernel used to process the density estimate. Defaults to nothing.
KernelDensityEstimation.AbstractKDEMethod
— Type
AbstractKDEMethod
The abstract supertype of all kernel density estimation methods, including the data binning process (see AbstractBinningKDE) and subsequent density estimation techniques (such as BasicKDE).
KernelDensityEstimation.boundary
— Function
B = boundary(spec)
Convert the specification spec to a boundary style B.
Packages may specialize this method on the spec argument to modify the behavior of the boundary inference for new argument types.
KernelDensityEstimation.bounds
— Function
lo, hi, boundary = bounds(data::AbstractVector{T}, spec) where {T}
Determine the appropriate interval, from lo to hi with boundary style boundary, for the density estimate, given the data vector data and the KDE bounds argument spec.
Packages may specialize this method on the spec argument to modify the behavior of the interval and boundary refinement for new argument types.
KernelDensityEstimation.estimate
— Function
estim, info = estimate(method::AbstractKDEMethod, data::AbstractVector; kwargs...)
estim, info = estimate(method::AbstractKDEMethod, data::AbstractKDE, info::AbstractKDEInfo; kwargs...)
Apply the kernel density estimation algorithm method to the given data, either in the form of a vector of data or a prior density estimate and its corresponding pipeline info (to support being part of a processing pipeline).
Returns
estim::AbstractKDE: The resultant kernel density estimate.
info::AbstractKDEInfo: Auxiliary information describing details of the density estimation either useful or necessary for constructing a pipeline of processing steps.
KernelDensityEstimation.estimator_order
— Function
p = estimator_order(::Type{<:AbstractKDEMethod})
The bias scaling of the density estimator method, where a return value of p corresponds to bandwidth-dependent biases of the order $\mathcal{O}(h^{2p})$.
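To illustrate what the order implies, the short sketch below computes how much the leading bias term shrinks when the bandwidth is reduced by a given factor. The helper bias_reduction is hypothetical, and the values of p used are examples only; which value each method returns is not stated here.

```julia
# With bias of order h^(2p), dividing the bandwidth by `factor` divides the
# leading bias term by factor^(2p).
bias_reduction(p::Integer, factor::Real) = factor^(2p)

bias_reduction(1, 2)   # 4:  halving h cuts an O(h^2) bias by a factor of 4
bias_reduction(2, 2)   # 16: halving h cuts an O(h^4) bias by a factor of 16
```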