ethograph.labels.ml.purge_small_blocks
- ethograph.labels.ml.purge_small_blocks(labels, min_length, label_thresholds=None)
Remove label blocks shorter than a threshold (set to background).
Scans the dense label array for contiguous runs of the same non-zero label. If a run is shorter than its threshold, every sample in it is set to 0 (background).
This is the dense-array counterpart of purge_short_intervals() (which works in seconds on interval DataFrames).
- Parameters:
labels (np.ndarray) – 1-D dense label array (int), where 0 = background.
min_length (int) – Default minimum block length in samples. Blocks shorter than this are zeroed out. Convert from seconds: int(min_duration_s * sample_rate).
label_thresholds (dict[int, int], optional) – Per-label minimum lengths that override min_length. For example, {1: 10, 3: 30} means label 1 needs ≥10 samples and label 3 needs ≥30 samples; all other labels use min_length.
- Returns:
Copy of labels with short blocks zeroed out.
- Return type:
np.ndarray
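Because min_length and label_thresholds are both expressed in samples, durations given in seconds have to be converted first with int(min_duration_s * sample_rate). The following is a minimal sketch of that conversion; the helper name seconds_to_samples, the 40 Hz sample rate, and the per-label durations are all assumptions made for illustration and are not part of ethograph.

```python
def seconds_to_samples(duration_s, sample_rate):
    """Convert a duration in seconds to whole samples (int() truncates toward zero)."""
    return int(duration_s * sample_rate)


sample_rate = 40.0  # assumed sample rate in Hz, for illustration only

# Default minimum block length: 0.25 s at 40 Hz
min_length = seconds_to_samples(0.25, sample_rate)

# Hypothetical per-label minimum durations in seconds
min_durations_s = {1: 0.5, 3: 0.75}
label_thresholds = {
    label: seconds_to_samples(duration, sample_rate)
    for label, duration in min_durations_s.items()
}

print(min_length)        # 10
print(label_thresholds)  # {1: 20, 3: 30}
```

The resulting values can then be passed as the min_length and label_thresholds arguments. Note that int() truncates rather than rounds, so a product that falls just below a whole number because of floating-point error yields one sample fewer than expected.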
Examples
Remove any block shorter than 3 samples:
>>> import numpy as np
>>> from ethograph.labels.ml import purge_small_blocks
>>> pred = np.array([0, 1, 0, 2, 2, 2, 2, 0])
>>> purge_small_blocks(pred, min_length=3).tolist()
[0, 0, 0, 2, 2, 2, 2, 0]
With per-label thresholds (label 2 needs ≥5 samples):
>>> purge_small_blocks(pred, min_length=1, label_thresholds={2: 5}).tolist()
[0, 1, 0, 0, 0, 0, 0, 0]
Typical pipeline: purge, then stitch (stitch_gaps is assumed to be imported already):
>>> pred = np.array([1, 1, 1, 0, 1, 0, 1, 1, 1])
>>> cleaned = purge_small_blocks(pred, min_length=2)  # remove the isolated 1-sample block
>>> cleaned.tolist()
[1, 1, 1, 0, 0, 0, 1, 1, 1]
>>> stitch_gaps(cleaned, max_gap_len=4).tolist()
[1, 1, 1, 1, 1, 1, 1, 1, 1]
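The documented behaviour (scan for contiguous runs of the same non-zero label and zero any run shorter than its threshold) can be sketched in plain NumPy. This is an illustrative re-implementation under that description only, not the ethograph source; the name purge_small_blocks_sketch is hypothetical, and details such as how empty input is handled are assumptions.

```python
import numpy as np


def purge_small_blocks_sketch(labels, min_length, label_thresholds=None):
    """Illustrative sketch; not the ethograph implementation."""
    labels = np.asarray(labels)
    out = labels.copy()  # work on a copy so the input is left untouched
    if out.size == 0:
        return out
    thresholds = label_thresholds or {}

    # Positions where the value changes mark the start of a new run.
    change = np.flatnonzero(np.diff(labels)) + 1
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [labels.size]))

    for start, end in zip(starts, ends):
        label = int(labels[start])
        if label == 0:
            continue  # background runs are never purged
        # Per-label threshold if given, otherwise the default min_length.
        if end - start < thresholds.get(label, min_length):
            out[start:end] = 0
    return out


pred = np.array([0, 1, 0, 2, 2, 2, 2, 0])
print(purge_small_blocks_sketch(pred, min_length=3).tolist())
# [0, 0, 0, 2, 2, 2, 2, 0]
print(purge_small_blocks_sketch(pred, min_length=1, label_thresholds={2: 5}).tolist())
# [0, 1, 0, 0, 0, 0, 0, 0]
```

In this sketch, two adjacent runs with different non-zero labels are measured separately, which matches the "runs of the same non-zero label" wording above.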