Data Format Requirements#

EthoGraph supports three data backends. Pick the one that matches your workflow:

| Backend  | Best for                                           | Core object                     |
|----------|----------------------------------------------------|---------------------------------|
| xarray   | Custom datasets, pose estimation, multi-dim arrays | xarray.Dataset / TrialTree      |
| Pynapple | Neuroscience time-series, NWB interop              | Tsd / TsdFrame / TsGroup        |
| NWB      | Standardised neurodata, DANDI archives             | .nwb file (loaded via pynapple) |

This page covers the core format: a minimal working example, required attributes, how features are discovered, multi-subject datasets, and where media/alignment metadata lives. Optional features (custom dimensions, color variables) are at the bottom.

For multi-trial setups (multiple videos per trial, multi-camera, session-wide audio, ephys alignment), see Multi-trial setup.


Minimal working example#

xarray:

import numpy as np
import xarray as xr
import ethograph as eto

ds = xr.Dataset(
    data_vars={
        "speed": xr.DataArray(
            np.random.randn(9000),
            dims=["time"],
            coords={"time": np.arange(9000) / 30.0},
        ),
    },
    coords={"individuals": ["mouse1"]},
)
ds.attrs["trial"] = 1
ds.attrs["fps"] = 30.0

dt = eto.from_datasets([ds])
dt.save("session.nc")

Pynapple:

import numpy as np
import pynapple as nap

speed = nap.Tsd(
    t=np.arange(9000) / 30.0,
    d=np.random.randn(9000),
)
nap.save_file({"speed": speed}, "session")
# Load in GUI: select the session.npz file

NWB:

NWB files from DANDI or NeuroConv are loaded directly; no conversion is needed.
In the GUI, select the .nwb file in the I/O widget and click Load.

To create an NWB file programmatically, see the pynwb documentation:
https://pynwb.readthedocs.io/en/stable/tutorials/general/plot_file.html

Required attributes#

Every trial’s xarray.Dataset must have:

| Attribute      | Type     | Description                                                            |
|----------------|----------|------------------------------------------------------------------------|
| attrs["trial"] | int, str | Trial identifier (1, 2, 3, …). Must be unique across trials.           |
| attrs["fps"]   | float    | Frame rate of the primary video. Not required for audio-only datasets. |

ds.attrs["trial"] = 1
ds.attrs["fps"] = 30.0

Pynapple objects carry timestamps natively — no fps or trial attribute is needed.

import pynapple as nap
import numpy as np

trials = nap.IntervalSet(
    start=[0.0, 300.0, 600.0],
    end=[299.5, 599.5, 899.5],
)
speed = nap.Tsd(t=np.arange(27000) / 30.0, d=np.random.randn(27000))
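
Per-trial slices of such objects use pynapple's standard restrict() method (plain pynapple, nothing EthoGraph-specific):

# keep only the samples that fall inside a trial interval
speed_in_trials = speed.restrict(trials)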

NWB files follow the NWB standard. EthoGraph reads:

  • Trials: nwb.trials table (start_time, stop_time, plus custom columns).

  • Behavioural data: TimeSeries in nwb.processing modules.

  • Electrophysiology: ElectricalSeries in nwb.acquisition.

  • Pose estimation: PoseEstimation containers (ndx-pose extension).

If nwb.trials is absent, the entire recording is treated as a single trial.
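
For reference, a minimal trials table can be written with the standard pynwb API (a sketch; the identifier and the custom condition column are illustrative):

from datetime import datetime, timezone
from pynwb import NWBFile, NWBHDF5IO

nwbfile = NWBFile(
    session_description="example session",
    identifier="session-001",
    session_start_time=datetime.now(timezone.utc),
)

# start_time/stop_time are built in; extra columns must be declared first
nwbfile.add_trial_column(name="condition", description="stimulus condition")
nwbfile.add_trial(start_time=0.0, stop_time=299.5, condition="A")
nwbfile.add_trial(start_time=300.0, stop_time=599.5, condition="B")

with NWBHDF5IO("session.nwb", "w") as io:
    io.write(nwbfile)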


Features (plottable variables)#

Any data_var with at least one dimension whose name contains "time" appears in the GUI’s Feature dropdown:

ds["speed"] = xr.DataArray(
    speed_values,
    dims=["time", "keypoints", "individuals"],
)

Different features can use different time coordinates with different sampling rates (e.g. time, time_accelerometer, time_video).
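
For example, an accelerometer trace sampled at 200 Hz can sit on its own time_accelerometer dimension next to a 30 fps speed variable (a sketch extending the ds from above; the variable name and rate are illustrative):

n_accel = 60000                          # 300 s at 200 Hz
ds["accelerometer"] = xr.DataArray(
    np.random.randn(n_accel),
    dims=["time_accelerometer"],
    coords={"time_accelerometer": np.arange(n_accel) / 200.0},
)

Both dimension names contain "time", so both variables appear as features, each with its own sampling rate.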

All Tsd, TsdFrame, and TsdTensor objects in the loaded data dict are detected automatically. Column names become selectable dimensions.

speed = nap.Tsd(t=time_s, d=speed_values)

position = nap.TsdFrame(
    t=time_s,
    d=pos_array,                        # shape: (n_time, 3)
    columns=["x", "y", "z"],
)

EthoGraph discovers features automatically from NWB processing modules (excluding ecephys, ophys, ogen). NWB data is loaded via pynapple, so all TimeSeries become pynapple objects internally.


Specifying individuals#

Individuals are stored as a coordinate, not an attribute. For multi-animal data, this lets the GUI keep separate labels and feature data for each individual.

ds = xr.Dataset(
    data_vars={
        "speed": xr.DataArray(
            speed_array,                # shape: (time, individuals)
            dims=["time", "individuals"],
        ),
    },
    coords={
        "time": time_values,
        "individuals": ["mouse1", "mouse2", "mouse3"],
    },
)

When labelling, the selected individual filters which labels are shown and created.
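
Outside the GUI, selecting one individual's trace is plain xarray:

# speed for a single individual, shape (time,)
speed_mouse2 = ds["speed"].sel(individuals="mouse2")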

Pynapple has no built-in concept of “individuals”. Multi-subject data is typically stored as separate objects:

data = {
    "speed_mouse1": nap.Tsd(t=time_s, d=speed_mouse1),
    "speed_mouse2": nap.Tsd(t=time_s, d=speed_mouse2),
}

Each object appears as a separate feature in the GUI. Individual selection is not available for pynapple backends.

NWB files represent a single subject per file: the subject field holds a single Subject object (see the PyNWB docs). Multi-subject experiments use separate .nwb files per subject, and individual selection is not available when loading a single NWB file.
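
A subject is attached when the file is created, using the standard pynwb API (field values here are illustrative):

from datetime import datetime, timezone
from pynwb import NWBFile
from pynwb.file import Subject

nwbfile = NWBFile(
    session_description="mouse1 session",
    identifier="mouse1-session-001",
    session_start_time=datetime.now(timezone.utc),
    subject=Subject(subject_id="mouse1", species="Mus musculus"),
)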


Media files and alignment#

Media filenames (video, audio, pose), trial timing, and stream offsets are stored in an NWB alignment file (.ethograph/alignment.nwb) — not inside individual trial datasets. This keeps data and metadata separate, and filenames are stored as filename-only strings so datasets remain portable.

For single-trial recordings, the Getting Started Wizard creates the alignment file for you. For multi-trial, multi-camera, or session-wide media, see Multi-trial setup.

NWBAlignment provides programmatic access to session metadata:

from ethograph.io.nwb_alignment import NWBAlignment

alignment = NWBAlignment("my_project/.ethograph/alignment.nwb")
print(alignment.trials_df)
print(alignment.cameras)
print(alignment.start_time(trial=1))
alignment.close()

For NWB sources, session metadata lives in the source NWB file directly — no sidecar alignment is needed for standalone .nwb loading. For project directories (e.g. DANDI), the GUI creates .ethograph/alignment.nwb automatically.


Optional: custom dimensions#

Any dimension that co-occurs with a time dimension in at least one feature variable is automatically discovered and gets a selection combo box in the GUI.

ds["emg"] = xr.DataArray(
    emg_data,                            # shape: (time, channels)
    dims=["time", "channels"],
    coords={"channels": ["biceps", "triceps"]},
)

Dimensions do not need to match across features. For example, position may have (time, keypoints, space, individuals) while speed only has (time, keypoints, individuals). The GUI creates combo boxes for the union of all discovered dimensions. When a feature doesn’t have a selected dimension, that selection is silently ignored via sel_valid():

import ethograph as eto

# "keypoints" and "individuals" are applied; "space" is silently dropped
data, used_kwargs = eto.sel_valid(
    ds["speed"],
    {"keypoints": "nose", "space": "x", "individuals": "mouse1"},
)

Column names in a TsdFrame become a selectable dimension. Objects with identical column names share a single combo in the GUI.

position = nap.TsdFrame(t=time_s, d=pos, columns=["x", "y", "z"])
velocity = nap.TsdFrame(t=time_s, d=vel, columns=["x", "y", "z"])

Optional: color variables#

Color variables are identified by name: any feature with "rgb" in its name (case-insensitive) is automatically offered in the GUI’s Colors combo. Values should lie in [0, 1] (float) or [0, 255] (int).

The variable should have an RGB dimension of size 3:

ds["angle_rgb"] = xr.DataArray(
    rgb_values,                          # shape: (time, keypoints, individuals, 3)
    dims=["time", "keypoints", "individuals", "RGB"],
)
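
Both value ranges are accepted, so no conversion is required; if you prefer floats, scaling 8-bit values is a one-liner (rgb_uint8 stands in for your own array of dtype uint8):

# scale uint8 RGB (0–255) into float [0, 1]
rgb_values = rgb_uint8.astype(float) / 255.0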

To compute angle-based RGB automatically from pose data, use add_angle_rgb_to_ds():

import ethograph as eto

ds = eto.add_angle_rgb_to_ds(ds, smoothing_params={"sigma": 3})

Store RGB as a TsdFrame with columns ["R", "G", "B"] (or any 3-column frame whose name contains "rgb"):

import pynapple as nap

angle_rgb = nap.TsdFrame(
    t=time_s,
    d=rgb_values,                        # shape: (n_time, 3)
    columns=["R", "G", "B"],
)
data = {"angle_rgb": angle_rgb}

To compute angle-based RGB automatically from a position TsdFrame, use add_angle_rgb_to_nap():

from ethograph.io.pynapple import add_angle_rgb_to_nap

angle_rgb = add_angle_rgb_to_nap(position, smoothing_params={"sigma": 3})
data["angle_rgb"] = angle_rgb