Data Format Requirements#

EthoGraph supports three data backends. Pick the one that matches your workflow:

| Backend  | Best for                                           | Core object                     |
|----------|----------------------------------------------------|---------------------------------|
| xarray   | Custom datasets, pose estimation, multi-dim arrays | xarray.Dataset / TrialTree      |
| Pynapple | Neuroscience time-series, NWB interop              | Tsd / TsdFrame / TsGroup        |
| NWB      | Standardised neurodata, DANDI archives             | .nwb file (loaded via pynapple) |

This page covers the core format: a minimal working example, required attributes, how features are discovered, multi-subject datasets, and where media/alignment metadata lives. Optional features (custom dimensions, color variables) are at the bottom.

For multi-trial setups (multiple videos per trial, multi-camera, session-wide audio, ephys alignment), see Multi-trial setup.


Minimal working example#

xarray:

import numpy as np
import xarray as xr
import ethograph as eto

ds = xr.Dataset(
    data_vars={
        "speed": xr.DataArray(
            np.random.randn(9000),
            dims=["time"],
            coords={"time": np.arange(9000) / 30.0},
        ),
    },
    coords={"individuals": ["mouse1"]},
)
ds.attrs["trial"] = 1
ds.attrs["fps"] = 30.0

dt = eto.from_datasets([ds])
dt.save("session.nc")

Pynapple:

import numpy as np
import pynapple as nap

speed = nap.Tsd(
    t=np.arange(9000) / 30.0,
    d=np.random.randn(9000),
)
nap.save_file({"speed": speed}, "session")
# Load in GUI: select the session.npz file

NWB:

NWB files from DANDI or NeuroConv are loaded directly; no conversion is needed.
In the GUI, select the .nwb file in the I/O widget and click Load.

To create an NWB file programmatically, see the pynwb documentation:
https://pynwb.readthedocs.io/en/stable/tutorials/general/plot_file.html

Required attributes#

Every trial’s xarray.Dataset must have:

| Attribute      | Type     | Description                                                            |
|----------------|----------|------------------------------------------------------------------------|
| attrs["trial"] | int, str | Trial identifier (1, 2, 3, …). Must be unique across trials.           |
| attrs["fps"]   | float    | Frame rate of the primary video. Not required for audio-only datasets. |

ds.attrs["trial"] = 1
ds.attrs["fps"] = 30.0

Pynapple objects carry timestamps natively — no fps or trial attribute is needed.

import pynapple as nap
import numpy as np

trials = nap.IntervalSet(
    start=[0.0, 300.0, 600.0],
    end=[299.5, 599.5, 899.5],
)
speed = nap.Tsd(t=np.arange(27000) / 30.0, d=np.random.randn(27000))
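
Per-trial slices of such objects use pynapple's standard restrict() method (plain pynapple, nothing EthoGraph-specific):

# keep only the samples that fall inside a trial interval
speed_in_trials = speed.restrict(trials)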

NWB files follow the NWB standard. EthoGraph reads:

  • Trials: nwb.trials table (start_time, stop_time, plus custom columns).

  • Behavioural data: TimeSeries in nwb.processing modules.

  • Electrophysiology: ElectricalSeries in nwb.acquisition.

  • Pose estimation: PoseEstimation containers (ndx-pose extension).

If nwb.trials is absent, the entire recording is treated as a single trial.
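
For reference, a minimal trials table can be written with the standard pynwb API (a sketch; the identifier and the custom condition column are illustrative):

from datetime import datetime, timezone
from pynwb import NWBFile, NWBHDF5IO

nwbfile = NWBFile(
    session_description="example session",
    identifier="session-001",
    session_start_time=datetime.now(timezone.utc),
)

# start_time/stop_time are built in; extra columns must be declared first
nwbfile.add_trial_column(name="condition", description="stimulus condition")
nwbfile.add_trial(start_time=0.0, stop_time=299.5, condition="A")
nwbfile.add_trial(start_time=300.0, stop_time=599.5, condition="B")

with NWBHDF5IO("session.nwb", "w") as io:
    io.write(nwbfile)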


Features (plottable variables)#

Any data_var with at least one dimension whose name contains "time" appears in the GUI’s Feature dropdown:

ds["speed"] = xr.DataArray(
    speed_values,
    dims=["time", "keypoints", "individuals"],
)

Different features can use different time coordinates with different sampling rates (e.g. time, time_accelerometer, time_video).
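
For example, an accelerometer trace sampled at 200 Hz can sit on its own time_accelerometer dimension next to a 30 fps speed variable (a sketch extending the ds from above; the variable name and rate are illustrative):

n_accel = 60000                          # 300 s at 200 Hz
ds["accelerometer"] = xr.DataArray(
    np.random.randn(n_accel),
    dims=["time_accelerometer"],
    coords={"time_accelerometer": np.arange(n_accel) / 200.0},
)

Both dimension names contain "time", so both variables appear as features, each with its own sampling rate.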

All Tsd, TsdFrame, and TsdTensor objects in the loaded data dict are detected automatically. Column names become selectable dimensions.

speed = nap.Tsd(t=time_s, d=speed_values)

position = nap.TsdFrame(
    t=time_s,
    d=pos_array,                        # shape: (n_time, 3)
    columns=["x", "y", "z"],
)

EthoGraph discovers features automatically from NWB processing modules (excluding ecephys, ophys, ogen). NWB data is loaded via pynapple, so all TimeSeries become pynapple objects internally.


Specifying individuals#

Individuals are stored as a coordinate, not an attribute. For multi-animal data, this lets the GUI keep separate labels and feature data for each individual.

ds = xr.Dataset(
    data_vars={
        "speed": xr.DataArray(
            speed_array,                # shape: (time, individuals)
            dims=["time", "individuals"],
        ),
    },
    coords={
        "time": time_values,
        "individuals": ["mouse1", "mouse2", "mouse3"],
    },
)

When labelling, the selected individual filters which labels are shown and created.
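
Outside the GUI, selecting one individual's trace is plain xarray:

# speed for a single individual, shape (time,)
speed_mouse2 = ds["speed"].sel(individuals="mouse2")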

Pynapple has no built-in concept of “individuals”. Multi-subject data is typically stored as separate objects:

data = {
    "speed_mouse1": nap.Tsd(t=time_s, d=speed_mouse1),
    "speed_mouse2": nap.Tsd(t=time_s, d=speed_mouse2),
}

Each object appears as a separate feature in the GUI. Individual selection is not available for pynapple backends.

NWB files represent a single subject per file: the subject field holds a single Subject object (see the PyNWB docs). Multi-subject experiments use separate .nwb files per subject, and individual selection is not available when loading a single NWB file.
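
A subject is attached when the file is created, using the standard pynwb API (field values here are illustrative):

from datetime import datetime, timezone
from pynwb import NWBFile
from pynwb.file import Subject

nwbfile = NWBFile(
    session_description="mouse1 session",
    identifier="mouse1-session-001",
    session_start_time=datetime.now(timezone.utc),
    subject=Subject(subject_id="mouse1", species="Mus musculus"),
)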


Media files and alignment#

Media filenames (video, audio, pose), trial timing, and stream offsets are stored in an NWB alignment file (.ethograph/alignment.nwb) — not inside individual trial datasets. This keeps data and metadata separate, and filenames are stored as filename-only strings so datasets remain portable.

For single-trial recordings, the Getting Started Wizard creates the alignment file for you. For multi-trial, multi-camera, or session-wide media, see Multi-trial setup.

NWBAlignment provides programmatic access to session metadata:

from ethograph.io.nwb_alignment import NWBAlignment

alignment = NWBAlignment("my_project/.ethograph/alignment.nwb")
print(alignment.trials_df)
print(alignment.cameras)
print(alignment.start_time(trial=1))
alignment.close()

For NWB sources, session metadata lives in the source NWB file directly — no sidecar alignment is needed for standalone .nwb loading. For project directories (e.g. DANDI), the GUI creates .ethograph/alignment.nwb automatically.


Optional: custom dimensions#

Any dimension that co-occurs with a time dimension in at least one feature variable is automatically discovered and gets a selection combo box in the GUI.

ds["emg"] = xr.DataArray(
    emg_data,                            # shape: (time, channels)
    dims=["time", "channels"],
    coords={"channels": ["biceps", "triceps"]},
)

Dimensions do not need to match across features. For example, position may have (time, keypoints, space, individuals) while speed only has (time, keypoints, individuals). The GUI creates combo boxes for the union of all discovered dimensions. When a feature doesn’t have a selected dimension, that selection is silently ignored via sel_valid():

import ethograph as eto

# "keypoints" and "individuals" are applied; "space" is silently dropped
data, used_kwargs = eto.sel_valid(
    ds["speed"],
    {"keypoints": "nose", "space": "x", "individuals": "mouse1"},
)

Column names in a TsdFrame become a selectable dimension. Objects with identical column names share a single combo in the GUI.

position = nap.TsdFrame(t=time_s, d=pos, columns=["x", "y", "z"])
velocity = nap.TsdFrame(t=time_s, d=vel, columns=["x", "y", "z"])

Optional: color variables#

Color variables are identified by name: any feature with "rgb" in its name (case-insensitive) is automatically offered in the GUI’s Colors combo. Values should lie in [0, 1] (float) or [0, 255] (int).

The variable should have an RGB dimension of size 3:

ds["angle_rgb"] = xr.DataArray(
    rgb_values,                          # shape: (time, keypoints, individuals, 3)
    dims=["time", "keypoints", "individuals", "RGB"],
)
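
Both value ranges are accepted, so no conversion is required; if you prefer floats, scaling 8-bit values is a one-liner (rgb_uint8 stands in for your own array of dtype uint8):

# scale uint8 RGB (0–255) into float [0, 1]
rgb_values = rgb_uint8.astype(float) / 255.0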

To compute angle-based RGB automatically from pose data, use add_angle_rgb_to_ds():

import ethograph as eto

ds = eto.add_angle_rgb_to_ds(ds, smoothing_params={"sigma": 3})

Store RGB as a TsdFrame with columns ["R", "G", "B"] (or any 3-column frame whose name contains "rgb"):

import pynapple as nap

angle_rgb = nap.TsdFrame(
    t=time_s,
    d=rgb_values,                        # shape: (n_time, 3)
    columns=["R", "G", "B"],
)
data = {"angle_rgb": angle_rgb}

To compute angle-based RGB automatically from a position TsdFrame, use add_angle_rgb_to_nap():

from ethograph.io.pynapple import add_angle_rgb_to_nap

angle_rgb = add_angle_rgb_to_nap(position, smoothing_params={"sigma": 3})
data["angle_rgb"] = angle_rgb