ethograph.labels.ml
Dense (array-based) label operations for ML pipelines.
This module provides tools for converting between the interval-based label format (used by the GUI and TSV storage) and dense integer arrays (used by ML models). It also contains post-processing operations commonly applied to model predictions before evaluation or storage.
Typical ML workflow
1. Load labels from TSV → pd.DataFrame with onset_s, offset_s, labels, individual (plus n_samples per-trial metadata).
2. Convert to dense → intervals_to_dense(df, sample_rate, individuals, n_samples) gives an (n_samples, n_individuals) int8 array ready for training.
3. Run model → get a dense prediction array of shape (T,) or (T, n_classes).
4. Post-process → purge_small_blocks → stitch_gaps → fix_endings.
5. Convert back → dense_to_intervals(pred, individuals, sample_rate=sr) gives an intervals DataFrame for storage or evaluation.
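The post-processing step can be illustrated with a small NumPy sketch. This is not ethograph's implementation: the function names ending in `_sketch`, the `runs` helper, the `min_len` and `max_gap` parameters (counted in samples), and the choice of 0 as the background label are all assumptions made for illustration. It expects a non-empty 1-D label array. `fix_endings` is omitted because its exact behavior is not described in this section.

```python
import numpy as np


def runs(labels):
    """Return (starts, stops, values) of constant runs; stops are exclusive."""
    labels = np.asarray(labels)
    change = np.flatnonzero(np.diff(labels)) + 1
    starts = np.concatenate(([0], change))
    stops = np.concatenate((change, [labels.size]))
    return starts, stops, labels[starts]


def purge_small_blocks_sketch(labels, min_len, background=0):
    """Set labelled runs shorter than min_len samples to background."""
    out = np.array(labels, copy=True)
    # Runs are computed once, before any sample is overwritten.
    for start, stop, value in zip(*runs(out)):
        if value != background and stop - start < min_len:
            out[start:stop] = background
    return out


def stitch_gaps_sketch(labels, max_gap, background=0):
    """Fill background runs of at most max_gap samples flanked by one label."""
    out = np.array(labels, copy=True)
    starts, stops, values = runs(out)
    for i in range(1, len(values) - 1):
        is_short_gap = values[i] == background and stops[i] - starts[i] <= max_gap
        if is_short_gap and values[i - 1] == values[i + 1]:
            out[starts[i]:stops[i]] = values[i - 1]
    return out


pred = np.array([0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 2, 0, 0, 3, 3, 3])
purged = purge_small_blocks_sketch(pred, min_len=2)   # drops the lone "2"
cleaned = stitch_gaps_sketch(purged, max_gap=1)       # bridges the 1-sample gap
```

Purging before stitching, as in the workflow order, keeps a spurious one-sample block from blocking or triggering a stitch between its neighbours.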
The n_samples value stored in the TSV file (per-trial metadata) gives the
exact length of the dense array, so the only other input the conversion
needs is the sample_rate.
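To make the roles of n_samples and sample_rate concrete, here is a minimal sketch of an interval-to-dense conversion written with NumPy and pandas. It is not the library's `intervals_to_dense`; the name `intervals_to_dense_sketch` is hypothetical, and it assumes that 0 means background, that labels are small integer class IDs, that times are rounded to the nearest sample, and that offsets are exclusive ends.

```python
import numpy as np
import pandas as pd


def intervals_to_dense_sketch(df, sample_rate, individuals, n_samples):
    """Paint each interval into an (n_samples, n_individuals) int8 array.

    Assumptions (not taken from ethograph): 0 means background, labels are
    small integer class IDs, and offsets are treated as exclusive ends.
    """
    dense = np.zeros((n_samples, len(individuals)), dtype=np.int8)
    column = {name: i for i, name in enumerate(individuals)}
    for row in df.itertuples(index=False):
        start = int(round(row.onset_s * sample_rate))
        stop = int(round(row.offset_s * sample_rate))
        # Clip to the trial length so a late offset cannot overrun the array.
        start, stop = max(start, 0), min(stop, n_samples)
        dense[start:stop, column[row.individual]] = row.labels
    return dense


df = pd.DataFrame(
    {
        "onset_s": [0.0, 0.5],
        "offset_s": [0.2, 0.8],
        "labels": [1, 2],
        "individual": ["a", "b"],
    }
)
dense = intervals_to_dense_sketch(
    df, sample_rate=10.0, individuals=["a", "b"], n_samples=10
)
```

Here n_samples fixes the number of rows up front, and sample_rate is used only to map seconds to row indices.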
Functions
Convert a dense label array to an intervals DataFrame.
Convert an intervals DataFrame to a dense label array.
Return segment boundaries as sample indices (exclusive end).
Find contiguous True blocks in a boolean array.
Fill small background gaps between same-label segments.
Remove label blocks shorter than a threshold (set to background).
Extend label endings by one sample at changepoint boundaries.
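The block-finding and dense-to-interval operations listed above can be sketched as follows. This is an illustration under stated assumptions, not the module's code: `true_blocks_sketch` and `dense_to_intervals_sketch` are hypothetical names, 0 is assumed to be the background label, and segment ends are exclusive sample indices as in the description above.

```python
import numpy as np
import pandas as pd


def true_blocks_sketch(mask):
    """Return (start, stop) pairs for contiguous True runs; stop is exclusive."""
    mask = np.asarray(mask, dtype=bool)
    # Pad with False so runs touching either end still produce an edge.
    padded = np.concatenate(([False], mask, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(a), int(b)) for a, b in zip(edges[::2], edges[1::2])]


def dense_to_intervals_sketch(dense, individuals, sample_rate, background=0):
    """Turn an (n_samples, n_individuals) array into an intervals DataFrame."""
    rows = []
    for col, name in enumerate(individuals):
        track = dense[:, col]
        for value in np.unique(track):
            if value == background:
                continue
            for start, stop in true_blocks_sketch(track == value):
                rows.append(
                    {
                        "onset_s": start / sample_rate,
                        "offset_s": stop / sample_rate,
                        "labels": int(value),
                        "individual": name,
                    }
                )
    columns = ["onset_s", "offset_s", "labels", "individual"]
    frame = pd.DataFrame(rows, columns=columns)
    return frame.sort_values(["individual", "onset_s"], ignore_index=True)


blocks = true_blocks_sketch([True, True, False, True])

dense = np.array(
    [[1, 0], [1, 0], [0, 2], [0, 2], [0, 2], [1, 0]], dtype=np.int8
)
intervals = dense_to_intervals_sketch(dense, ["a", "b"], sample_rate=10.0)
```

Because the end index is exclusive, a segment's duration in samples is simply stop minus start, and dividing both indices by sample_rate gives onset and offset in seconds.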