# Data Format Requirements
EthoGraph supports three data backends. Pick the one that matches your workflow:
| Backend | Best for | Core object |
|---|---|---|
| xarray | Custom datasets, pose estimation, multi-dim arrays | `xarray.Dataset` per trial |
| Pynapple | Neuroscience time-series, NWB interop | dict of `Tsd`/`TsdFrame` objects |
| NWB | Standardised neurodata, DANDI archives | `.nwb` file |
This page covers the core format: a minimal working example, required attributes, how features are discovered, multi-subject datasets, and where media/alignment metadata lives. Optional features (custom dimensions, color variables) are at the bottom.
For multi-trial setups (multiple videos per trial, multi-camera, session-wide audio, ephys alignment) see Multi-trial setup.
## Minimal working example
**xarray**

```python
import numpy as np
import xarray as xr

import ethograph as eto

ds = xr.Dataset(
    data_vars={
        "speed": xr.DataArray(
            np.random.randn(9000),
            dims=["time"],
            coords={"time": np.arange(9000) / 30.0},
        ),
    },
    coords={"individuals": ["mouse1"]},
)
ds.attrs["trial"] = 1
ds.attrs["fps"] = 30.0

dt = eto.from_datasets([ds])
dt.save("session.nc")
```
**Pynapple**

```python
import numpy as np
import pynapple as nap

speed = nap.Tsd(
    t=np.arange(9000) / 30.0,
    d=np.random.randn(9000),
)
nap.save_file({"speed": speed}, "session")
# Load in GUI: select the session.npz file
```
**NWB**

NWB files from DANDI or NeuroConv are loaded directly; no conversion is needed. In the GUI, select the `.nwb` file in the I/O widget and click Load.

To create an NWB file programmatically, see the [pynwb documentation](https://pynwb.readthedocs.io/en/stable/tutorials/general/plot_file.html).
## Required attributes
Every trial’s xarray.Dataset must have:
| Attribute | Type | Description |
|---|---|---|
| `trial` | `int` | Trial identifier (1, 2, 3, …). Must be unique across trials. |
| `fps` | `float` | Frame rate of the primary video. Not required for audio-only datasets. |
```python
ds.attrs["trial"] = 1
ds.attrs["fps"] = 30.0
```
Pynapple objects carry timestamps natively, so no `fps` or `trial` attribute is needed:

- **Timestamps:** every `Tsd`/`TsdFrame`/`TsdTensor` stores its own time axis.
- **Trials:** defined by an `IntervalSet` (either from NWB trials or created manually).
```python
import numpy as np
import pynapple as nap

trials = nap.IntervalSet(
    start=[0.0, 300.0, 600.0],
    end=[299.5, 599.5, 899.5],
)
speed = nap.Tsd(t=np.arange(27000) / 30.0, d=np.random.randn(27000))
```
NWB files follow the NWB standard. EthoGraph reads:

- **Trials:** the `nwb.trials` table (`start_time`, `stop_time`, plus custom columns).
- **Behavioural data:** `TimeSeries` in `nwb.processing` modules.
- **Electrophysiology:** `ElectricalSeries` in `nwb.acquisition`.
- **Pose estimation:** `PoseEstimation` containers (ndx-pose extension).

If `nwb.trials` is absent, the entire recording is treated as a single trial.
## Features (plottable variables)

Any `data_var` with at least one dimension whose name contains `"time"` appears in the GUI's **Feature** dropdown:
```python
ds["speed"] = xr.DataArray(
    speed_values,
    dims=["time", "keypoints", "individuals"],
)
```
Different features can use different time coordinates with different sampling rates (e.g. `time`, `time_accelerometer`, `time_video`).
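A sketch of how such a mixed-rate dataset might look (the feature names and sampling rates are illustrative): each variable simply carries its own time coordinate, and xarray keeps the two axes independent.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    data_vars={
        # Video-derived feature sampled at 30 Hz.
        "speed": xr.DataArray(
            np.random.randn(9000),
            dims=["time"],
            coords={"time": np.arange(9000) / 30.0},
        ),
        # Accelerometer feature sampled at 100 Hz on its own time axis.
        "accel_mag": xr.DataArray(
            np.random.randn(30000),
            dims=["time_accelerometer"],
            coords={"time_accelerometer": np.arange(30000) / 100.0},
        ),
    },
)
```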
With the pynapple backend, all `Tsd`, `TsdFrame`, and `TsdTensor` objects in the loaded data dict are detected automatically. Column names become selectable dimensions.
```python
speed = nap.Tsd(t=time_s, d=speed_values)
position = nap.TsdFrame(
    t=time_s,
    d=pos_array,  # shape: (n_time, 3)
    columns=["x", "y", "z"],
)
```
EthoGraph discovers features automatically from NWB processing modules (excluding `ecephys`, `ophys`, `ogen`). NWB data is loaded via pynapple, so all `TimeSeries` become pynapple objects internally.
## Specifying individuals
Individuals are stored as a coordinate, not an attribute. With multi-animal data, this allows the GUI to store separate labels and feature data for different individuals.
```python
ds = xr.Dataset(
    data_vars={
        "speed": xr.DataArray(
            speed_array,  # shape: (time, individuals)
            dims=["time", "individuals"],
        ),
    },
    coords={
        "time": time_values,
        "individuals": ["mouse1", "mouse2", "mouse3"],
    },
)
```
When labelling, the selected individual filters which labels are shown and created.
Pynapple has no built-in concept of “individuals”. Multi-subject data is typically stored as separate objects:
```python
data = {
    "speed_mouse1": nap.Tsd(t=time_s, d=speed_mouse1),
    "speed_mouse2": nap.Tsd(t=time_s, d=speed_mouse2),
}
```
Each object appears as a separate feature in the GUI. Individual selection is not available for pynapple backends.
NWB files represent a single subject per file: `subject` is a singular `Subject` object (see the PyNWB docs). Multi-subject experiments use separate `.nwb` files per subject. Individual selection is not available when loading a single NWB file.
## Media files and alignment

Media filenames (video, audio, pose), trial timing, and stream offsets are stored in an NWB alignment file (`.ethograph/alignment.nwb`), not inside individual trial datasets. This keeps data and metadata separate, and filenames are stored as filename-only strings so datasets remain portable.
For single-trial recordings, the Getting Started Wizard creates the alignment file for you. For multi-trial, multi-camera, or session-wide media, see Multi-trial setup.
`NWBAlignment` provides programmatic access to session metadata:

```python
from ethograph.io.nwb_alignment import NWBAlignment

alignment = NWBAlignment("my_project/.ethograph/alignment.nwb")
print(alignment.trials_df)
print(alignment.cameras)
print(alignment.start_time(trial=1))
alignment.close()
```
For NWB sources, session metadata lives in the source NWB file directly; no sidecar alignment file is needed for standalone `.nwb` loading. For project directories (e.g. DANDI), the GUI creates `.ethograph/alignment.nwb` automatically.
## Optional: custom dimensions
Any dimension that co-occurs with a time dimension in at least one feature variable is automatically discovered and gets a selection combo box in the GUI.
```python
ds["emg"] = xr.DataArray(
    emg_data,  # shape: (time, channels)
    dims=["time", "channels"],
    coords={"channels": ["biceps", "triceps"]},
)
```
Dimensions do not need to match across features. For example, `position` may have `(time, keypoints, space, individuals)` while `speed` only has `(time, keypoints, individuals)`. The GUI creates combo boxes for the union of all discovered dimensions. When a feature doesn't have a selected dimension, that selection is silently ignored via `sel_valid()`:
```python
import ethograph as eto

# "keypoints" and "individuals" are applied; "space" is silently dropped
data, used_kwargs = eto.sel_valid(
    ds["speed"],
    {"keypoints": "nose", "space": "x", "individuals": "mouse1"},
)
```
Column names in a `TsdFrame` become a selectable dimension. Objects with identical column names share a single combo box in the GUI.
```python
position = nap.TsdFrame(t=time_s, d=pos, columns=["x", "y", "z"])
velocity = nap.TsdFrame(t=time_s, d=vel, columns=["x", "y", "z"])
```
## Optional: color variables

Color variables are identified by name: any feature with `"rgb"` in its name (case-insensitive) is automatically offered in the GUI's **Colors** combo. Values should lie in `[0, 1]` (float) or `[0, 255]` (int).

The variable should have an RGB dimension of size 3:
```python
ds["angle_rgb"] = xr.DataArray(
    rgb_values,  # shape: (time, keypoints, individuals, 3)
    dims=["time", "keypoints", "individuals", "RGB"],
)
```
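If your colours start out as 8-bit integers, a plain-NumPy sketch of normalising them into the float `[0, 1]` range (the array shape here is illustrative):

```python
import numpy as np

# Hypothetical 8-bit colour array: (time, 3).
rgb_255 = np.random.randint(0, 256, size=(9000, 3))

# Scale into the float [0, 1] range.
rgb_01 = rgb_255.astype(np.float64) / 255.0
```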
To compute angle-based RGB automatically from pose data, use `add_angle_rgb_to_ds()`:

```python
import ethograph as eto

ds = eto.add_angle_rgb_to_ds(ds, smoothing_params={"sigma": 3})
```
With the pynapple backend, store RGB as a `TsdFrame` with columns `["R", "G", "B"]` (or any 3-column frame whose name contains `"rgb"`):
```python
import pynapple as nap

angle_rgb = nap.TsdFrame(
    t=time_s,
    d=rgb_values,  # shape: (n_time, 3)
    columns=["R", "G", "B"],
)
data = {"angle_rgb": angle_rgb}
```
To compute angle-based RGB automatically from a position `TsdFrame`, use `add_angle_rgb_to_nap()`:

```python
from ethograph.io.pynapple import add_angle_rgb_to_nap

angle_rgb = add_angle_rgb_to_nap(position, smoothing_params={"sigma": 3})
data["angle_rgb"] = angle_rgb
```