pykoop.RandomBinningKernelApprox

class RandomBinningKernelApprox(kernel_or_ddot='laplacian', n_components=100, shape=1, encoder_kw=None, random_state=None)

Bases: KernelApproximation

Kernel approximation with random binning.

Highly experimental! For more details, see [RR07].
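
For orientation, the identity behind random binning in [RR07] can be sketched for a one-dimensional shift-invariant kernel. The symbols \Delta (distance between two points), \delta (grid pitch), and p (pitch density) are notation chosen here for illustration. This is a summary of the idea, not a description of how pykoop implements it.

```latex
k(\Delta) = \int_{0}^{\infty} p(\delta)\,
    \max\!\left(0,\; 1 - \frac{\Delta}{\delta}\right) \mathrm{d}\delta,
\qquad
p(\delta) = \delta\, \ddot{k}(\delta), \quad \Delta \ge 0.
```

The \max term is the probability that two points a distance \Delta apart fall in the same bin of a grid with pitch \delta and a uniformly random shift. Averaging bin collisions over many random grids therefore estimates k.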

Attributes:
n_features_in_

Number of features input.

Type:

int

n_features_out_

Number of features output. This attribute is not available in estimators from sklearn.kernel_approximation.

Type:

int

ddot_

Probability distribution corresponding to \delta \ddot{k}(\delta).

Type:

scipy.stats.rv_continuous

pitches_

Grid pitches for each component.

Type:

np.ndarray, shape (n_features, n_components)

shifts_

Grid shifts for each component.

Type:

np.ndarray, shape (n_features, n_components)

encoder_

One-hot encoder used for hashing sample coordinates for each component.

Type:

sklearn.preprocessing.OneHotEncoder
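
To make the roles of pitches_, shifts_, and encoder_ more concrete, the following is a minimal standard-library sketch of random binning for a Laplacian kernel exp(-gamma * ||x - y||_1). It is not pykoop's implementation: the function names, the gamma parameter, and the dictionary used in place of a one-hot encoder are all assumptions made for illustration, and pykoop's scaling of shape may differ.

```python
import math
import random


def bin_key(x, pitches, shifts):
    """Return the grid cell (one integer per input feature) containing x."""
    return tuple(
        math.floor((xi - s) / p) for xi, p, s in zip(x, pitches, shifts)
    )


def fit_random_binning(X, n_components=100, gamma=1.0, seed=0):
    """Sample one random grid per component and index its occupied bins."""
    rng = random.Random(seed)
    n_features = len(X[0])
    grids = []
    for _ in range(n_components):
        # One pitch per input feature, drawn from Gamma(a=2, scale=1/gamma).
        pitches = [
            rng.gammavariate(2.0, 1.0 / gamma) for _ in range(n_features)
        ]
        # One shift per input feature, uniform over a single grid cell.
        shifts = [rng.uniform(0.0, p) for p in pitches]
        # Only bins that contain a training sample get an output column.
        occupied = {}
        for x in X:
            key = bin_key(x, pitches, shifts)
            occupied.setdefault(key, len(occupied))
        grids.append((pitches, shifts, occupied))
    return grids


def transform_random_binning(grids, X):
    """One-hot encode the occupied bin of each sample in every grid."""
    n_out = sum(len(occupied) for _, _, occupied in grids)
    scale = 1.0 / math.sqrt(len(grids))
    rows = []
    for x in X:
        row = [0.0] * n_out
        offset = 0
        for pitches, shifts, occupied in grids:
            col = occupied.get(bin_key(x, pitches, shifts))
            if col is not None:  # bins unseen during fitting stay all-zero
                row[offset + col] = scale
            offset += len(occupied)
        rows.append(row)
    return rows


# Usage: the number of output features is only known after fitting.
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(50)]
grids = fit_random_binning(X, n_components=200, gamma=1.0, seed=0)
Z = transform_random_binning(grids, X)
print("output features:", len(Z[0]))

# The inner product of two feature rows estimates the kernel value.
approx = sum(a * b for a, b in zip(Z[0], Z[1]))
exact = math.exp(-sum(abs(a - b) for a, b in zip(X[0], X[1])))
print("approx:", approx, "exact:", exact)
```

In this sketch each sample activates exactly one column per component, so the total number of columns depends on how many distinct bins the training data happens to occupy.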

Examples

Generate randomly binned features from a Laplacian kernel. Here, X_msd is a data matrix whose first column is the episode feature:

>>> ka = pykoop.RandomBinningKernelApprox(
...     kernel_or_ddot='laplacian',
...     n_components=10,
...     shape=1,
...     random_state=1234,
... )
>>> ka.fit(X_msd[:, 1:])  # Remove episode feature
RandomBinningKernelApprox(n_components=10, random_state=1234)
>>> ka.transform(X_msd[:, 1:])
array([...])

__init__(kernel_or_ddot='laplacian', n_components=100, shape=1, encoder_kw=None, random_state=None)

Instantiate RandomBinningKernelApprox.

Parameters:
  • kernel_or_ddot (Union[str, scipy.stats.rv_continuous]) –

    Kernel to approximate. Possible options are

    • 'laplacian' – Laplacian kernel, with \delta \ddot{k}(\delta) being scipy.stats.gamma with shape parameter a=2 (default).

    Alternatively, a separable, positive, shift-invariant kernel can be implicitly specified by providing \delta \ddot{k}(\delta) as a univariate probability distribution subclassing scipy.stats.rv_continuous.

  • n_components (int) – Number of random samples used to generate features. The higher the number of components, the higher the number of features. Since unoccupied bins are eliminated, it’s impossible to know the exact number of features before fitting.

  • shape (float) – Shape parameter. Must be greater than zero. Larger numbers correspond to “sharper” kernels. Scaled to be consistent with gamma from sklearn.kernel_approximation.RBFSampler. This can lead to a mysterious factor of sqrt(2) in other kernels. Default is 1.

  • encoder_kw (Optional[Dict[str, Any]]) – Extra keyword arguments for the internal sklearn.preprocessing.OneHotEncoder. These override the default encoder arguments. For experimental use only, since incorrect arguments can break the estimator.

  • random_state (Union[int, np.random.RandomState, None]) – Random seed.

Return type:

None
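
As a sanity check on the 'laplacian' option, the short derivation below shows why \delta \ddot{k}(\delta) is a gamma distribution with shape parameter a=2. It assumes a one-dimensional Laplacian kernel written as k(\delta) = e^{-\gamma \delta} for \delta \ge 0. The symbols \gamma (kernel scale) and \theta (gamma distribution scale) are notation introduced here, and the exact mapping from the shape argument to \gamma is not stated on this page.

```latex
k(\delta) = e^{-\gamma \delta}
\;\Longrightarrow\;
\ddot{k}(\delta) = \gamma^{2} e^{-\gamma \delta}
\;\Longrightarrow\;
\delta\, \ddot{k}(\delta)
  = \gamma^{2}\, \delta\, e^{-\gamma \delta}
  = \frac{\delta^{a-1} e^{-\delta/\theta}}{\Gamma(a)\, \theta^{a}}
\quad \text{with } a = 2,\ \theta = 1/\gamma.
```

The right-hand side is the gamma probability density, so it is nonnegative and integrates to one, which is what a custom distribution passed as kernel_or_ddot would also need to satisfy.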

Methods

__init__([kernel_or_ddot, n_components, ...])

Instantiate RandomBinningKernelApprox.

fit(X[, y])

Fit kernel approximation.

fit_transform(X[, y])

Fit to data, then transform it.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

set_output(*[, transform])

Set output container.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Transform data.

fit(X, y=None)

Fit kernel approximation.

Parameters:
  • X (np.ndarray) – Data matrix.

  • y (Optional[np.ndarray]) – Ignored.

Returns:

Instance of itself.

Return type:

RandomBinningKernelApprox

Raises:

ValueError – If any of the constructor parameters are incorrect.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns:

X_new – Transformed array.

Return type:

ndarray of shape (n_samples, n_features_new)

get_metadata_routing()

Get metadata routing of this object.

See the scikit-learn User Guide for how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

set_output(*, transform=None)

Set output container.

See "Introducing the set_output API" in the scikit-learn documentation for an example of how to use the API.

Parameters:

transform ({"default", "pandas"}, default=None) –

Configure output of transform and fit_transform.

  • "default": Default output format of a transformer

  • "pandas": DataFrame output

  • None: Transform configuration is unchanged

Returns:

self – Estimator instance.

Return type:

estimator instance

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

transform(X)

Transform data.

Parameters:

X (np.ndarray) – Data matrix.

Returns:

Transformed data matrix.

Return type:

np.ndarray