Creating the manifest file

This guide shows how to create a dataset’s manifest file, which is a list of participants recruited in the study, their visits, and expected data modalities.

Note

If the Nipoppy dataset was initialized from an existing BIDS dataset with nipoppy init --bids-source, then a manifest file containing valid and accurate entries for the imaging data was automatically generated from the BIDS input. In this case, it is not necessary to manually insert or update the imaging data information in the manifest file.

However, if the study has additional visits that were not present in the BIDS data (e.g., non-imaging visits), they will need to be added to the manifest separately.

Every Nipoppy dataset should have a manifest file at <NIPOPPY_PROJECT_ROOT>/manifest.tsv. This file is tab-separated and has four columns: participant_id, visit_id, session_id and datatype. Here is an example of a valid manifest file:

participant_id

visit_id

session_id

datatype

01

BL

BL

[‘anat’]

01

M06

01

M12

M12

[‘anat’]

02

BL

BL

[‘anat’,’dwi’]

02

M06

02

M12

M12

[‘anat’,’dwi’]

Columns in the manifest file

Attention

There must be only one row per unique participant_id/visit_id combination.

participant_id

A unique identifier for a participant in the study. Must be present in every row.

  • Cannot contain non-alphanumeric characters (spaces, dashes, underscores, etc.)

  • Cannot have the sub- prefix

  • Example valid values: 001, ABC01

    • Example valid but not recommended: control1, alzheimers1, sub1

  • Example invalid values: sub-001, ABC.01

visit_id

An identifier for a data collection event (imaging or non-imaging). Must be present in every row.

session_id

An identifier for an imaging data collection event. Should be left empty if no imaging data was collected.

  • Cannot contain non-alphanumeric characters (spaces, dashes, underscores, etc.)

  • Cannot have the ses- prefix

  • Example valid values: 1, baseline, Month12

  • Example invalid values: ses-1, follow-up Month_12

Session IDs vs visit IDs

Nipoppy uses the term “session ID” for imaging data, following the convention established by BIDS. The term “visit ID”, on the other hand, is used to refer to any data collection event (not necessarily imaging-related), and is more common in clinical contexts.

In most cases, session_id and visit_id will be identical (or session_ids will be a subset of visit_ids). However, having two descriptors becomes particularly useful when imaging and non-imaging assessments do not use the same naming conventions.

datatype

A list of datatypes expected to be in the BIDS data. Should be left empty if no imaging data was collected.

  • Example valid values: ['anat'], ['anat', 'dwi']

Common MRI datatypes include:

  • anat: anatomical MRI

  • dwi: diffusion MRI

  • func: functional MRI

  • fmap: field maps

The full list of valid datatypes is listed in the BIDS schema.

Note

If it is too difficult to determine the exact imaging datatypes collected for a given participant and session, you can set the datatype value for this row to be all available datatypes in the study.

Guidelines and examples for creating a study’s manifest file

We highly recommend writing a script that automatically generates the manifest based on existing files. These can be tabular files (CSVs, TSVs, Excel sheets, etc.) in <NIPOPPY_PROJECT_ROOT>/sourcedata/tabular and/or imaging data in <NIPOPPY_PROJECT_ROOT>/sourcedata/imaging. The script can be rerun whenever the source files are modified to automatically update the manifest, reducing future manual work and keeping a record of what was done.

Below are some examples from common cases we have encountered. Note that the example use Python scripts, but other programming languages like R can also be used to generate the manifest.

Creating a manifest from another tabular file for a cross-sectional study with different imaging datatypes

Creating a manifest from wide-form tabular files for a longitudinal study with imaging and non-imaging visits

Creating a manifest from data directories on disk for a study with different imaging datatypes