Creating the manifest file
This guide shows how to create a dataset’s manifest file, which is a list of participants recruited in the study, their visits, and expected data modalities.
Note
If the Nipoppy dataset was initialized from an existing BIDS dataset with nipoppy init --bids-source, then a manifest file containing valid and accurate entries for the imaging data was automatically generated from the BIDS input. In this case, it is not necessary to manually insert or update the imaging data information in the manifest file.
However, if the study has additional visits that were not present in the BIDS data (e.g., non-imaging visits), they will need to be added to the manifest separately.
Every Nipoppy dataset should have a manifest file at <NIPOPPY_PROJECT_ROOT>/manifest.tsv.
This file is tab-separated and has four columns: participant_id, visit_id, session_id and datatype.
Here is an example of a valid manifest file:
participant_id | visit_id | session_id | datatype |
---|---|---|---|
01 | BL | BL | ['anat'] |
01 | M06 | | |
01 | M12 | M12 | ['anat'] |
02 | BL | BL | ['anat','dwi'] |
02 | M06 | | |
02 | M12 | M12 | ['anat','dwi'] |
Raw content of the example manifest file
participant_id visit_id session_id datatype
01 BL BL ['anat']
01 M06
01 M12 M12 ['anat']
02 BL BL ['anat','dwi']
02 M06
02 M12 M12 ['anat','dwi']
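Participant IDs such as 01 have leading zeros, so a script that reads the manifest should treat every column as text. A tool that infers column types, for example pandas.read_csv without dtype=str, may turn 01 into 1. Below is a minimal sketch that uses only Python's standard csv module. The helper names write_manifest and read_manifest are hypothetical and are not part of Nipoppy.

```python
import csv
import tempfile
from pathlib import Path

# Column order used by the manifest described above.
COLUMNS = ["participant_id", "visit_id", "session_id", "datatype"]


def write_manifest(rows: list[dict[str, str]], path: Path) -> None:
    """Write manifest rows to a tab-separated file with a header line."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(
            f, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)


def read_manifest(path: Path) -> list[dict[str, str]]:
    """Read the manifest back; csv keeps every value as a string, so "01" stays "01"."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f, delimiter="\t"))


# The first two rows of the example manifest above.
example_rows = [
    {"participant_id": "01", "visit_id": "BL", "session_id": "BL", "datatype": "['anat']"},
    {"participant_id": "01", "visit_id": "M06", "session_id": "", "datatype": ""},
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "manifest.tsv"
    write_manifest(example_rows, path)
    loaded = read_manifest(path)
    header = path.read_text().splitlines()[0]

print(loaded[0]["participant_id"])  # 01
```

Empty cells, such as the session_id of a non-imaging visit, come back as empty strings rather than missing values.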
Columns in the manifest file
Attention
There must be only one row per unique participant_id/visit_id combination.
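Before saving a generated manifest, a script can check the one-row-per-combination rule. This is a minimal sketch. It assumes rows are dictionaries keyed by column name, as csv.DictReader produces. The helper find_duplicate_visits is hypothetical.

```python
from collections import Counter


def find_duplicate_visits(rows: list[dict[str, str]]) -> list[tuple[str, str]]:
    """Return (participant_id, visit_id) pairs that occur in more than one row."""
    counts = Counter((row["participant_id"], row["visit_id"]) for row in rows)
    return sorted(pair for pair, n in counts.items() if n > 1)


rows = [
    {"participant_id": "01", "visit_id": "BL"},
    {"participant_id": "01", "visit_id": "M12"},
    {"participant_id": "01", "visit_id": "BL"},  # duplicates the first row
]
print(find_duplicate_visits(rows))  # [('01', 'BL')]
```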
participant_id
A unique identifier for a participant in the study. Must be present in every row.
Cannot contain non-alphanumeric characters (spaces, dashes, underscores, etc.).
Cannot have the sub- prefix.
Example valid values: 001, ABC01
Example valid but not recommended: control1, alzheimers1, sub1
Example invalid values: sub-001, ABC.01
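The rules above can be checked with a single regular expression. This sketch assumes that "alphanumeric" means ASCII letters and digits only. Python's str.isalnum would also accept accented letters. A dash is never alphanumeric, so the same check also rejects the sub- prefix. The helper is hypothetical, and it does not flag the "valid but not recommended" values.

```python
import re

# Assumption: only ASCII letters and digits are allowed.
_ALNUM = re.compile(r"[A-Za-z0-9]+")


def is_valid_participant_id(value: str) -> bool:
    """Return True if value is a non-empty run of ASCII letters and digits."""
    # fullmatch rejects empty strings and any space, dash, underscore or dot.
    return _ALNUM.fullmatch(value) is not None


for candidate in ["001", "ABC01", "sub1", "sub-001", "ABC.01", ""]:
    print(repr(candidate), is_valid_participant_id(candidate))
```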
What if the participant IDs in my existing study files are not Nipoppy-compatible?
In those situations, you should still make sure that participant_id values in the Nipoppy manifest do not contain non-alphanumeric characters.
To keep track of the mapping between the Nipoppy participant_id values and the original study's IDs (which we will refer to as recruitment_id), you should create a recruitment.tsv file, like so:
participant_id | recruitment_id |
---|---|
ABC001 | ABC-001 |
ABC002 | ABC-002 |
DEF001 | DEF-001 |
This file should be placed in <NIPOPPY_PROJECT_ROOT>/tabular.
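The mapping in the example table can be produced by a script. This sketch assumes the same scheme as that table: drop every non-alphanumeric character from the recruitment ID. That scheme is only one option. The script refuses to write the file if two recruitment IDs would collapse to the same participant_id. Both helper names are hypothetical.

```python
import csv
import re
import tempfile
from pathlib import Path


def to_participant_id(recruitment_id: str) -> str:
    """Derive a candidate participant_id by dropping non-alphanumeric characters."""
    return re.sub(r"[^A-Za-z0-9]", "", recruitment_id)


def write_recruitment_table(recruitment_ids: list[str], path: Path) -> dict[str, str]:
    """Write a participant_id -> recruitment_id table as TSV and return the mapping.

    Raises ValueError if an ID has no alphanumeric characters or if two
    recruitment IDs collapse to the same participant_id.
    """
    mapping: dict[str, str] = {}
    for rid in recruitment_ids:
        pid = to_participant_id(rid)
        if not pid:
            raise ValueError(f"{rid!r} has no alphanumeric characters")
        if pid in mapping:
            raise ValueError(f"{rid!r} and {mapping[pid]!r} both map to {pid!r}")
        mapping[pid] = rid
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(["participant_id", "recruitment_id"])
        writer.writerows(mapping.items())
    return mapping


with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "recruitment.tsv"
    table = write_recruitment_table(["ABC-001", "ABC-002", "DEF-001"], out_path)
    lines = out_path.read_text().splitlines()

print(table)  # {'ABC001': 'ABC-001', 'ABC002': 'ABC-002', 'DEF001': 'DEF-001'}
```

In a real dataset, path would point to recruitment.tsv inside <NIPOPPY_PROJECT_ROOT>/tabular.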
Note that existing/original study files do not have to be manually updated to use the Nipoppy participant_id values. The same goes for file/directory names in the imaging source data (e.g., DICOM directories): it is possible to configure the behaviour of some Nipoppy operations to account for the presence of recruitment_id values instead of participant_id values in file/directory names.
visit_id
An identifier for a data collection event (imaging or non-imaging). Must be present in every row.
session_id
An identifier for an imaging data collection event. Should be left empty if no imaging data was collected.
Cannot contain non-alphanumeric characters (spaces, dashes, underscores, etc.).
Cannot have the ses- prefix.
Example valid values: 1, baseline, Month12
Example invalid values: ses-1, follow-up, Month_12
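Existing session labels such as those in the invalid examples can be turned into candidate session_id values with a small cleanup function. This is a sketch under one assumption: the ses- prefix is removed first, so that ses-1 becomes 1 rather than ses1, and then every non-alphanumeric character is dropped. The helper to_session_id is hypothetical. Whatever renaming you choose must be applied consistently everywhere the session is referenced.

```python
import re


def to_session_id(label: str) -> str:
    """Turn a raw session label into a candidate session_id.

    Removes a leading "ses-" and then every character that is not an ASCII
    letter or digit.
    """
    cleaned = re.sub(r"[^A-Za-z0-9]", "", label.removeprefix("ses-"))
    if not cleaned:
        raise ValueError(f"no valid characters left in {label!r}")
    return cleaned


print([to_session_id(s) for s in ["ses-1", "follow-up", "Month_12"]])
# ['1', 'followup', 'Month12']
```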
Session IDs vs visit IDs
Nipoppy uses the term “session ID” for imaging data, following the convention established by BIDS. The term “visit ID”, on the other hand, is used to refer to any data collection event (not necessarily imaging-related), and is more common in clinical contexts.
In most cases, session_id and visit_id will be identical (or the session_id values will be a subset of the visit_id values).
However, having two descriptors becomes particularly useful when imaging and non-imaging assessments do not use the same naming conventions.
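When the two naming conventions differ, a lookup table from visit_id to session_id keeps the link explicit. All the labels below are hypothetical. In this sketch, visits without imaging get an empty session_id and an empty datatype. The datatype value assumes that every imaging session contains anatomical MRI, which is for illustration only.

```python
# Hypothetical labels: clinical visits are "V1", "V2", "V3", while the MRI
# sessions were named "baseline" and "followup". V2 had no imaging.
VISIT_TO_SESSION = {"V1": "baseline", "V3": "followup"}


def manifest_rows(participant_id: str, visits: list[str]) -> list[dict[str, str]]:
    """Build one manifest row per visit; session_id and datatype stay empty without imaging."""
    rows = []
    for visit in visits:
        session = VISIT_TO_SESSION.get(visit, "")
        rows.append(
            {
                "participant_id": participant_id,
                "visit_id": visit,
                "session_id": session,
                # Assumption for illustration: every imaging session has anatomical MRI.
                "datatype": "['anat']" if session else "",
            }
        )
    return rows


rows = manifest_rows("01", ["V1", "V2", "V3"])
for row in rows:
    print(row)
```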
datatype
A list of datatypes expected to be in the BIDS data. Should be left empty if no imaging data was collected.
Example valid values: ['anat'], ['anat', 'dwi']
Common MRI datatypes include:
anat: anatomical MRI
dwi: diffusion MRI
func: functional MRI
fmap: field maps
The full set of valid datatypes is defined in the BIDS schema.
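The example values are written as Python-style list literals, so a script can serialize them with str() and parse them back safely with ast.literal_eval. This is a sketch that assumes that format. KNOWN_DATATYPES holds only the four common datatypes listed above and should be extended from the BIDS schema as needed. Sorting the list is an arbitrary choice that makes reruns produce identical output.

```python
import ast

# Only the common MRI datatypes named above; extend this set with other
# datatypes from the BIDS schema as needed.
KNOWN_DATATYPES = {"anat", "dwi", "func", "fmap"}


def format_datatypes(datatypes: list[str]) -> str:
    """Serialize datatypes as a list literal, sorted so reruns give identical output."""
    return str(sorted(datatypes))


def parse_datatypes(cell: str) -> list[str]:
    """Parse a datatype cell; an empty cell means no imaging data."""
    if not cell.strip():
        return []
    value = ast.literal_eval(cell)
    if not isinstance(value, list) or not all(isinstance(d, str) for d in value):
        raise ValueError(f"expected a list of strings, got {cell!r}")
    unknown = set(value) - KNOWN_DATATYPES
    if unknown:
        raise ValueError(f"unrecognized datatypes: {sorted(unknown)}")
    return value


print(format_datatypes(["dwi", "anat"]))  # ['anat', 'dwi']
print(parse_datatypes("['anat','dwi']"))  # ['anat', 'dwi']
```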
Note
If it is too difficult to determine the exact imaging datatypes collected for a given participant and session, you can set the datatype value for that row to all datatypes available in the study.
Guidelines and examples for creating a study’s manifest file
We highly recommend writing a script that automatically generates the manifest from existing files.
These can be tabular files (CSVs, TSVs, Excel sheets, etc.) in <NIPOPPY_PROJECT_ROOT>/sourcedata/tabular and/or imaging data in <NIPOPPY_PROJECT_ROOT>/sourcedata/imaging.
The script can be rerun whenever the source files are modified to automatically update the manifest, reducing future manual work and keeping a record of what was done.
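A generator script of this kind can be quite short. This is a minimal sketch built on assumptions. The source table, its column names (subject, visit, mri_datatypes) and the semicolon-separated datatype format are all hypothetical. The script derives participant_id by stripping punctuation from the study ID, and it reuses the visit label as the session label when imaging was collected. A real script would read from sourcedata/tabular, write <NIPOPPY_PROJECT_ROOT>/manifest.tsv, and keep the original IDs in recruitment.tsv.

```python
import csv
import io
import re
from typing import TextIO

MANIFEST_COLUMNS = ["participant_id", "visit_id", "session_id", "datatype"]

# Hypothetical source table; in practice this would be a file under
# <NIPOPPY_PROJECT_ROOT>/sourcedata/tabular. The column names are made up.
SOURCE_CSV = """subject,visit,mri_datatypes
ABC-001,BL,anat;dwi
ABC-001,M06,
ABC-002,BL,anat
"""


def build_manifest(source: TextIO) -> list[dict[str, str]]:
    """Convert each source record into one manifest row."""
    rows = []
    for record in csv.DictReader(source):
        datatypes = [d for d in record["mri_datatypes"].split(";") if d]
        rows.append(
            {
                # Assumption: participant_id = study ID without punctuation.
                "participant_id": re.sub(r"[^A-Za-z0-9]", "", record["subject"]),
                "visit_id": record["visit"],
                # Assumption: imaging sessions reuse the visit label.
                "session_id": record["visit"] if datatypes else "",
                "datatype": str(sorted(datatypes)) if datatypes else "",
            }
        )
    return rows


manifest = build_manifest(io.StringIO(SOURCE_CSV))

# Written to a string here; a real script would write to manifest.tsv.
out = io.StringIO()
writer = csv.DictWriter(
    out, fieldnames=MANIFEST_COLUMNS, delimiter="\t", lineterminator="\n"
)
writer.writeheader()
writer.writerows(manifest)
print(out.getvalue())
```

Because the output is fully determined by the source table, rerunning the script after the source changes regenerates the manifest without manual edits.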
Below are some examples from common cases we have encountered. Note that the examples use Python scripts, but other programming languages such as R can also be used to generate the manifest.
Creating a manifest from another tabular file for a cross-sectional study with different imaging datatypes
Creating a manifest from wide-form tabular files for a longitudinal study with imaging and non-imaging visits
Creating a manifest from data directories on disk for a study with different imaging datatypes