ethograph.labels.ml

Dense (array-based) label operations for ML pipelines.

This module provides tools for converting between the interval-based label format (used by the GUI and TSV storage) and dense integer arrays (used by ML models). It also contains post-processing operations commonly applied to model predictions before evaluation or storage.

Typical ML workflow

  1. Load labels from TSV → pd.DataFrame with onset_s, offset_s, labels, individual (plus n_samples per-trial metadata).

  2. Convert to dense → intervals_to_dense(df, sample_rate, individuals, n_samples) gives an (n_samples, n_individuals) int8 array ready for training.

  3. Run model → get a dense prediction array of shape (T,) or (T, n_classes).

  4. Post-process → purge_small_blocks, stitch_gaps, fix_endings.

  5. Convert back → dense_to_intervals(pred, individuals, sample_rate=sr) gives an intervals DataFrame for storage or evaluation.

The n_samples value stored in the TSV file (per-trial metadata) gives the exact length of the dense array, so the only additional input the conversion needs is the sample_rate.
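The round trip between intervals and a dense array can be illustrated with plain NumPy. This is a minimal sketch, not ethograph's implementation: the helpers `to_dense` and `to_intervals` are hypothetical, intervals are plain `(onset_s, offset_s, label)` tuples rather than a DataFrame, only a single individual is handled, label 0 is assumed to be background, and onsets/offsets are assumed to round to the nearest sample with an exclusive end index.

```python
import numpy as np


def to_dense(intervals, sample_rate, n_samples):
    """Hypothetical helper: paint (onset_s, offset_s, label) tuples into an int8 array."""
    dense = np.zeros(n_samples, dtype=np.int8)  # 0 = background (assumption)
    for onset_s, offset_s, label in intervals:
        start = int(round(onset_s * sample_rate))
        stop = int(round(offset_s * sample_rate))  # exclusive end (assumption)
        dense[max(start, 0):min(stop, n_samples)] = label
    return dense


def to_intervals(dense, sample_rate, background=0):
    """Hypothetical helper: turn runs of equal non-background labels into tuples."""
    dense = np.asarray(dense)
    if dense.size == 0:
        return []
    # Indices at which the label differs from the previous sample.
    change = np.flatnonzero(dense[1:] != dense[:-1]) + 1
    starts = np.concatenate(([0], change))
    stops = np.concatenate((change, [dense.size]))
    return [
        (float(s / sample_rate), float(e / sample_rate), int(dense[s]))
        for s, e in zip(starts, stops)
        if dense[s] != background
    ]


# n_samples comes from the per-trial metadata; sample_rate is supplied separately.
intervals = [(0.5, 1.0, 1), (1.5, 2.0, 2)]
sample_rate = 10
n_samples = 25

dense = to_dense(intervals, sample_rate, n_samples)
recovered = to_intervals(dense, sample_rate)
```

In this example the recovered intervals equal the input because every onset and offset falls exactly on a sample boundary; times that fall between samples would be snapped to the nearest sample under the rounding assumption above.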

Functions

dense_to_intervals

Convert a dense label array to an intervals DataFrame.

intervals_to_dense

Convert an intervals DataFrame to a dense label array.

get_labels_start_end_indices

Return segment boundaries as sample indices (exclusive end).

find_blocks

Find contiguous True blocks in a boolean array.

stitch_gaps

Fill small background gaps between same-label segments.

purge_small_blocks

Remove label blocks shorter than a threshold (set to background).

fix_endings

Extend label endings by one sample at changepoint boundaries.
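The block-finding, purging, and gap-stitching operations listed above can be sketched in NumPy as follows. This is an illustration of the general technique under stated assumptions, not the module's actual code: the function names `find_true_blocks`, `purge_short`, and `stitch_short_gaps`, and the parameters `min_len`, `max_gap`, and `background`, are hypothetical, the input is a 1-D label array, and label 0 is assumed to be background.

```python
import numpy as np


def find_true_blocks(mask):
    """Return (starts, stops) of contiguous True runs; stops are exclusive."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.concatenate(([False], mask, [False]))
    edges = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(edges == 1)   # False -> True transitions
    stops = np.flatnonzero(edges == -1)   # True -> False transitions
    return starts, stops


def purge_short(labels, min_len, background=0):
    """Set label blocks shorter than min_len samples to background."""
    out = np.array(labels, copy=True)
    for lab in np.unique(out):
        if lab == background:
            continue
        starts, stops = find_true_blocks(out == lab)
        for s, e in zip(starts, stops):
            if e - s < min_len:
                out[s:e] = background
    return out


def stitch_short_gaps(labels, max_gap, background=0):
    """Fill background gaps of at most max_gap samples between same-label segments."""
    out = np.array(labels, copy=True)
    starts, stops = find_true_blocks(out == background)
    for s, e in zip(starts, stops):
        interior = s > 0 and e < out.size  # gap must have a neighbour on both sides
        if interior and e - s <= max_gap and out[s - 1] == out[e]:
            out[s:e] = out[s - 1]
    return out


pred = np.array([1, 1, 1, 0, 1, 1, 0, 0, 0, 2, 0, 0, 1, 1], dtype=np.int8)
purged = purge_short(pred, min_len=2)              # the single-sample block of label 2 is removed
stitched = stitch_short_gaps(purged, max_gap=1)    # the one-sample gap at index 3 is filled
```

Purging before stitching means a spurious short block cannot prevent two segments of the same label from being merged; whether that order suits a given model's predictions is a judgment call.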