ethograph.labels.ml.purge_small_blocks

ethograph.labels.ml.purge_small_blocks(labels, min_length, label_thresholds=None)

Remove label blocks shorter than a threshold by setting them to background (0).

Scans the dense label array for contiguous runs of the same non-zero label. If a run is shorter than its threshold, every sample in it is set to 0 (background).

This is the dense-array counterpart of purge_short_intervals() (which works in seconds on interval DataFrames).
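The scan described above can be sketched in plain NumPy. This is a minimal illustration of the documented behavior, not the library's actual implementation; the name purge_small_blocks_sketch is hypothetical, and it assumes a block is zeroed only when its length is strictly less than its threshold.

```python
import numpy as np


def purge_small_blocks_sketch(labels, min_length, label_thresholds=None):
    """Hypothetical re-implementation, for illustration only."""
    labels = np.asarray(labels)
    out = labels.copy()  # the input array is left untouched
    if labels.size == 0:
        return out
    thresholds = label_thresholds or {}

    # Positions where the label value changes mark the start of a new run.
    change = np.flatnonzero(np.diff(labels)) + 1
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [labels.size]))

    for start, end in zip(starts, ends):
        label = int(labels[start])
        if label == 0:
            continue  # background runs are never purged
        # A per-label threshold overrides the default min_length.
        if end - start < thresholds.get(label, min_length):
            out[start:end] = 0
    return out
```

Runs are found from the positions where consecutive samples differ, so the Python loop iterates once per run rather than once per sample.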

Parameters:

labels (np.ndarray) – 1-D dense label array (int), where 0 = background.
min_length (int) – Default minimum block length in samples. Blocks shorter than this are zeroed out. Convert from seconds: int(min_duration_s * sample_rate).
label_thresholds (dict[int, int], optional) – Per-label minimum lengths that override min_length. For example, {1: 10, 3: 30} means label 1 needs ≥10 samples and label 3 needs ≥30 samples; all other labels use min_length.
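Because both min_length and label_thresholds are expressed in samples, durations given in seconds need converting first. The following sketch applies the int(min_duration_s * sample_rate) rule from above; the sample rate, the durations, and the per_label_seconds name are hypothetical values chosen for illustration.

```python
# Hypothetical recording parameters, chosen only for illustration.
sample_rate = 30.0     # samples per second
min_duration_s = 0.5   # default minimum block duration in seconds

# Default threshold in samples; int() truncates toward zero.
min_length = int(min_duration_s * sample_rate)

# Per-label minimum durations in seconds, converted the same way.
per_label_seconds = {1: 0.25, 3: 1.0}
label_thresholds = {
    label: int(seconds * sample_rate)
    for label, seconds in per_label_seconds.items()
}

print(min_length)        # 15
print(label_thresholds)  # {1: 7, 3: 30}
```

Note that int() truncates, so 0.25 s at 30 Hz becomes 7 samples rather than 7.5; round up explicitly if the threshold must never fall below the requested duration.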

Returns:

Copy of labels with short blocks zeroed out.

Return type:

np.ndarray

Examples

Remove any block shorter than 3 samples:

>>> import numpy as np
>>> from ethograph.labels.ml import purge_small_blocks
>>> pred = np.array([0, 1, 0, 2, 2, 2, 2, 0])
>>> purge_small_blocks(pred, min_length=3).tolist()
[0, 0, 0, 2, 2, 2, 2, 0]

With per-label thresholds (label 2 needs ≥5 samples):

>>> purge_small_blocks(pred, min_length=1, label_thresholds={2: 5}).tolist()
[0, 1, 0, 0, 0, 0, 0, 0]

Typical pipeline, purge then stitch (stitch_gaps is assumed to be imported already):

>>> pred = np.array([1, 1, 1, 0, 1, 0, 1, 1, 1])
>>> cleaned = purge_small_blocks(pred, min_length=2)  # remove isolated 1-sample blocks
>>> cleaned.tolist()
[1, 1, 1, 0, 0, 0, 1, 1, 1]
>>> stitch_gaps(cleaned, max_gap_len=4).tolist()
[1, 1, 1, 1, 1, 1, 1, 1, 1]