{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "88a6b5e9",
   "metadata": {},
   "source": [
    "# Flying & singing zebra finches - Birdpark\n",
    "\n",
    "Data from Rüttimann et al. (2024)¹ ([paper](https://peerj.com/articles/20203), [zenodo](https://zenodo.org/records/13144875)), a multimodal dataset of zebra finch groups with synchronized video, microphone arrays, and backpack-mounted vibration transducer (accelerometers).\n",
    "\n",
    "The code below shows how one can convert that existing dataset into the `Trials.nc` format. The sampling rate of the vibration transducer is very high (24kHz), therefore if the bottom plot loads very slowly, you may consider using the `Downsample` button in `I/O`.\n",
    "\n",
    "---\n",
    "\n",
    "¹ Rüttimann, L., Wang, Y., Rychen, J., Tomka, T., Hörster, H., & Hahnloser, R. H. R. (2025). Multimodal system for recording individual-level behaviors in songbird groups. PeerJ, 13, e20203. https://doi.org/10.7717/peerj.20203"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a5ab3973",
   "metadata": {},
   "source": [
    "<img src=\"assets/birdpark1.png\" width=\"1200\">\n",
    "\n",
    "Left: GUI screenshot, Right: Adapted from Rüttimann et al. (2024)¹ - Fig. 2C"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "78539d18",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import xarray as xr\n",
    "import h5py\n",
    "import pandas as pd\n",
    "import requests\n",
    "import zipfile\n",
    "from pathlib import Path\n",
    "from audioio import write_audio\n",
    "\n",
    "import ethograph as eto\n",
    "from ethograph.io.nwb_alignment import align_media_per_trial"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "84866546",
   "metadata": {},
   "source": [
    "### Explore snippet\n",
    "\n",
    "Example data from the birdpark dataset (a 7-second snippet from the `copExpBP08` recording) is available in:\n",
    "\n",
    "- `data/examples/copExpBP08_trim.mp4` - Video file\n",
    "- `data/examples/copExpBP08_trim.wav` - Audio file\n",
    "- `data/examples/copExpBP08_trim.nc` - Accelerometer data and metadata\n",
    "\n",
    "You can open these files directly from the GUI to explore the dataset."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f7e415be",
   "metadata": {},
   "source": [
    "### Download dataset\n",
    "\n",
    "You can download the entire dataset from [here](https://zenodo.org/records/13144875) or use the code below. I only tested the `copExpBP08` recording. If problems arise, the ReadMe [here](https://zenodo.org/records/13144875) is very helpful."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "63cae97c",
   "metadata": {},
   "outputs": [],
   "source": [
    "try:\n",
    "    _here = Path(__vsc_ipynb_file__).parent\n",
    "except NameError:\n",
    "    _here = Path().resolve()\n",
    "\n",
    "data_folder = _here.parent / \"data\" / \"birdpark\"\n",
    "data_folder.mkdir(parents=True, exist_ok=True)\n",
    "\n",
    "response = requests.get(\"https://zenodo.org/api/records/13144875\")\n",
    "data = response.json()\n",
    "\n",
    "for file in data[\"files\"]:\n",
    "    if file[\"checksum\"] == \"md5:32d1ae6049556c803f68b6d354c952ca\":\n",
    "        print(f\"Checksum matches: {file['key']}\")\n",
    "        output_path = data_folder / file[\"key\"]\n",
    "        r = requests.get(file[\"links\"][\"self\"], stream=True)\n",
    "        with open(output_path, \"wb\") as f:\n",
    "            for chunk in r.iter_content(chunk_size=8192):\n",
    "                f.write(chunk)\n",
    "        if output_path.suffix == \".zip\":\n",
    "            with zipfile.ZipFile(output_path, \"r\") as zip_ref:\n",
    "                zip_ref.extractall(data_folder)\n",
    "        break"
   ]
  },
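  {
   "cell_type": "markdown",
   "id": "md5_sketch",
   "metadata": {},
   "source": [
    "The loop above selects the archive by the checksum that Zenodo reports, but it does not check the bytes that were actually written to disk. A minimal sketch of such a check with the standard-library `hashlib` is shown below. `md5_of_file` is a hypothetical helper (not part of `ethograph`), and the demo hashes a small temporary file rather than the real archive.\n",
    "\n",
    "```python\n",
    "import hashlib\n",
    "import tempfile\n",
    "from pathlib import Path\n",
    "\n",
    "\n",
    "def md5_of_file(path, chunk_size=8192):\n",
    "    # Hypothetical helper: hex MD5 digest of a file, read in chunks\n",
    "    digest = hashlib.md5()\n",
    "    with open(path, \"rb\") as f:\n",
    "        for chunk in iter(lambda: f.read(chunk_size), b\"\"):\n",
    "            digest.update(chunk)\n",
    "    return digest.hexdigest()\n",
    "\n",
    "\n",
    "# Demo on a small temporary file instead of the real archive\n",
    "with tempfile.TemporaryDirectory() as tmp:\n",
    "    demo_path = Path(tmp) / \"demo.bin\"\n",
    "    demo_path.write_bytes(b\"hello\")\n",
    "    demo_md5 = md5_of_file(demo_path)\n",
    "\n",
    "print(demo_md5)\n",
    "```\n",
    "\n",
    "For the real download, one could compare `\"md5:\" + md5_of_file(output_path)` with `file[\"checksum\"]` before extracting."
   ]
  },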
  {
   "cell_type": "markdown",
   "id": "build_header",
   "metadata": {},
   "source": [
    "### Build NWB alignment and dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8ac4a5cb",
   "metadata": {},
   "outputs": [],
   "source": "# Recording: copExpBP08, session BP_2021-05-25_08-12-51_655154_0380000\nfps = 47.6837158203125\naudio_sr = 24414.0625  # audio and accelerometer sampling rate (from dataset metadata)\n\nh5_path = data_folder / \"copExpBP08\" / \"BP_2021-05-25_08-12-51_655154_0380000.h5\"\nvideo_path = data_folder / \"copExpBP08\" / \"BP_2021-05-25_08-12-51_655154_0380000.mp4\"\naudio_path = video_path.with_suffix(\".wav\")\nnc_path = video_path.with_suffix(\".nc\")\n\n# Read H5 file\nwith h5py.File(h5_path, \"r\") as f1:\n    radioSignals = f1[\"/radioSignals\"][()]  # accelerometer (one row per channel)\n    daqSignals = f1[\"/daqSignals\"][()]      # microphone channels\n\n# Create .wav file from microphone channels\nwrite_audio(audio_path, daqSignals.T, audio_sr)\n\n# ─── Build session table ───\n# trial=1 matches the ID assigned when loading a plain Dataset as a single-trial TrialTree\nsession_table = pd.DataFrame({\n    \"trial\":   [1],\n    \"video_0\": [str(video_path)],\n    \"audio_0\": [str(audio_path)],\n})\n\nnwb_path = data_folder / \"copExpBP08\" / \".ethograph\" / \"alignment.nwb\"\nalign_media_per_trial(\n    trial_table=session_table,\n    stream_rates={\"video\": float(fps), \"audio\": float(audio_sr)},\n    output_path=nwb_path,\n)\n\n# ─── Build xarray dataset ───\ntime_coords = np.arange(radioSignals.shape[1]) / audio_sr\n\nds = xr.Dataset(\n    data_vars={\n        \"vibration\": xr.DataArray(\n            radioSignals.T,\n            dims=[\"time\", \"individuals\"],\n        ),\n    },\n    coords={\n        \"time\": time_coords,\n        \"individuals\": [\"male (red radio)\", \"female (yellow radio)\"],  # specific to copExpBP08\n    },\n    attrs={\n        \"fps\": fps,\n        \"audio_sr\": audio_sr,\n    },\n)\n\nds.to_netcdf(nc_path)\nprint(f\"Saved to {nc_path}\")"
  },
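  {
   "cell_type": "markdown",
   "id": "rate_check",
   "metadata": {},
   "source": [
    "A quick arithmetic check on the two rates used above: dividing `audio_sr` by `fps` gives the number of audio/accelerometer samples per video frame. With the values for this recording the ratio comes out as a whole number, which makes frame-to-sample conversion simple. The helper `frame_to_sample` is hypothetical, and whether the acquisition hardware actually locks frames to sample boundaries is an assumption that is not verified here.\n",
    "\n",
    "```python\n",
    "fps = 47.6837158203125\n",
    "audio_sr = 24414.0625\n",
    "\n",
    "# Audio/accelerometer samples that elapse during one video frame\n",
    "samples_per_frame = audio_sr / fps\n",
    "print(samples_per_frame)  # 512.0\n",
    "\n",
    "\n",
    "def frame_to_sample(frame_idx):\n",
    "    # Hypothetical helper: sample index at the start time of a video frame\n",
    "    return int(round(frame_idx * samples_per_frame))\n",
    "\n",
    "\n",
    "print(frame_to_sample(100))  # 51200\n",
    "```"
   ]
  },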
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8a2a98bf",
   "metadata": {},
   "outputs": [],
   "source": "ds # Inspect"
  },
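  {
   "cell_type": "markdown",
   "id": "downsample_sketch",
   "metadata": {},
   "source": [
    "At roughly 24.4 kHz the vibration traces are large. Outside the GUI, one way to reduce them is to low-pass filter and decimate before saving. The sketch below uses `scipy.signal.decimate` on synthetic data with the same layout as `ds[\"vibration\"]` (time × individuals). SciPy as a dependency, the factor `q = 8`, and the FIR filter choice are all assumptions, and this is not necessarily what the `Downsample` button does internally.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from scipy.signal import decimate\n",
    "\n",
    "audio_sr = 24414.0625  # original sampling rate (Hz)\n",
    "q = 8                  # assumed decimation factor\n",
    "\n",
    "# Synthetic stand-in for ds[\"vibration\"].values: (time, individuals)\n",
    "rng = np.random.default_rng(0)\n",
    "vib = rng.standard_normal((8000, 2))\n",
    "\n",
    "# Anti-alias filter, then keep every q-th sample along the time axis\n",
    "vib_ds = decimate(vib, q, ftype=\"fir\", axis=0)\n",
    "new_sr = audio_sr / q\n",
    "time_ds = np.arange(vib_ds.shape[0]) / new_sr\n",
    "\n",
    "print(vib_ds.shape, new_sr)\n",
    "```\n",
    "\n",
    "If the result is written to a new dataset, the `time` coordinate and the stored sampling-rate attribute should be updated to `time_ds` and `new_sr` so they stay consistent with the data."
   ]
  },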
  {
   "cell_type": "markdown",
   "id": "1812734d",
   "metadata": {},
   "source": [
    "#### Decent for segmentation\n",
    "\n",
    "```python\n",
    "voc.segment.meansquared(\n",
    "    <data>,  # set by GUI\n",
    "    <sr>,    # set by GUI\n",
    "    threshold=15000,\n",
    "    min_dur=0.003,\n",
    "    min_silent_dur=0.0001,\n",
    "    freq_cutoffs=(500, 10000),\n",
    "    smooth_win=0.32,\n",
    "    scale=True,\n",
    "    scale_val=32768,\n",
    ")\n",
    "```"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "ethograph",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}