Creating a manifest from data directories on disk for a study with different imaging datatypes

In this example, we have a longitudinal study with imaging visits, but not all participants have the same imaging datatypes for all visits.

We do not have a tabular file indicating which datatypes are available for which participants and visits. However, this information can be obtained by looking at the data directories on disk:

data/
├── ABC001/
│   ├── BL/
│   │   ├── T1w/
│   │   │   └── ...
│   │   └── diffusion/
│   │       └── ...
│   └── M12/
│       └── T1w/
│           └── ...
└── ABC002/
    ├── BL/
    │   ├── T1w/
    │   │   └── ...
    │   └── diffusion/
    │       └── ...
    └── M12/
        ├── T1w/
        │   └── ...
        └── diffusion/
            └── ...

Here is a script that creates a Nipoppy manifest for the directory structure above:

Attention

The script below was written for Python 3.11 with pandas 2.2.3. It may not work with older/different versions.

#!/usr/bin/env python
"""Manifest-generation script for Example 3."""

from pathlib import Path

import pandas as pd

if __name__ == "__main__":

    # get the path to the data directory
    # we assume that it is in the same directory as this script
    path_data = Path(__file__).parent / "data"

    data_for_manifest = []
    for path_participant in sorted(path_data.iterdir()):
        for path_participant_visit in sorted(path_participant.iterdir()):

            # participant_id and visit_id are the names of the directories
            participant_id = path_participant.name
            visit_id = path_participant_visit.name

            # use the visit_id as session_id
            session_id = visit_id

            # check which datatypes are present
            datatype = []
            if (path_participant_visit / "T1w").exists():
                datatype.append("anat")
            if (path_participant_visit / "diffusion").exists():
                datatype.append("dwi")

            # create the manifest entry
            data_for_manifest.append(
                {
                    "participant_id": participant_id,
                    "visit_id": visit_id,
                    "session_id": session_id,
                    "datatype": datatype,
                }
            )

    df_manifest = pd.DataFrame(data_for_manifest)

    # write the manifest in the same directory as this script
    df_manifest.to_csv(
        Path(__file__).parent / "example3-manifest.tsv", sep="\t", index=False
    )
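
The script assumes that every entry in data/ is a participant directory and that every entry inside it is a visit directory. A stray file (for example a notes file or .DS_Store) would make iterdir() fail on a non-directory. The following is a minimal sketch of a more defensive variant, not part of Nipoppy: the names DATATYPE_DIRS, subdirectories and build_manifest_rows are hypothetical, and the demonstration runs on a throwaway copy of part of the layout rather than on real data.

```python
import tempfile
from pathlib import Path

# hypothetical mapping: directory name on disk -> datatype label in the manifest
DATATYPE_DIRS = {"T1w": "anat", "diffusion": "dwi"}


def subdirectories(path: Path) -> list[Path]:
    """Return the subdirectories of path in sorted order, skipping plain files."""
    return sorted(p for p in path.iterdir() if p.is_dir())


def build_manifest_rows(path_data: Path) -> list[dict]:
    """Build one manifest row per participant/visit directory pair."""
    rows = []
    for path_participant in subdirectories(path_data):
        for path_visit in subdirectories(path_participant):
            # keep the labels whose directory exists for this visit
            datatype = [
                label
                for dirname, label in DATATYPE_DIRS.items()
                if (path_visit / dirname).is_dir()
            ]
            rows.append(
                {
                    "participant_id": path_participant.name,
                    "visit_id": path_visit.name,
                    "session_id": path_visit.name,  # visit_id reused as session_id
                    "datatype": datatype,
                }
            )
    return rows


# demonstration on a temporary directory that mimics part of the layout above
with tempfile.TemporaryDirectory() as tmp:
    path_data = Path(tmp) / "data"
    (path_data / "ABC001" / "BL" / "T1w").mkdir(parents=True)
    (path_data / "ABC001" / "BL" / "diffusion").mkdir()
    (path_data / "ABC001" / "M12" / "T1w").mkdir(parents=True)
    (path_data / "notes.txt").write_text("not a participant directory")
    rows = build_manifest_rows(path_data)

for row in rows:
    print(row)
```

Keeping the directory-to-datatype mapping in one dictionary means that supporting another datatype only requires a new entry, assuming the study uses one directory per datatype as in this example.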

Running the manifest-generation script creates a manifest file (example3-manifest.tsv) that looks like this:

participant_id  visit_id  session_id  datatype
ABC001          BL        BL          ['anat', 'dwi']
ABC001          M12       M12         ['anat']
ABC002          BL        BL          ['anat', 'dwi']
ABC002          M12       M12         ['anat', 'dwi']
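
Because the datatype column holds Python lists, pandas writes each cell as the string representation of a list. To check the file by hand, the column can be parsed back into lists when reading the TSV. The sketch below shows one way to do this with pandas and ast.literal_eval; read_manifest is a hypothetical helper (not a Nipoppy function), and the two rows are copied from the manifest above so that the example runs without a file on disk.

```python
import ast
import io

import pandas as pd


def read_manifest(path_or_buffer) -> pd.DataFrame:
    """Read a manifest TSV, parsing the datatype column back into lists."""
    return pd.read_csv(
        path_or_buffer,
        sep="\t",
        # keep IDs as strings so that values like "001" are not read as integers
        dtype={"participant_id": str, "visit_id": str, "session_id": str},
        # "['anat', 'dwi']" (a string) -> ["anat", "dwi"] (a list)
        converters={"datatype": ast.literal_eval},
    )


# two rows copied from the manifest above, used here instead of a file on disk
example_tsv = (
    "participant_id\tvisit_id\tsession_id\tdatatype\n"
    "ABC001\tBL\tBL\t['anat', 'dwi']\n"
    "ABC001\tM12\tM12\t['anat']\n"
)
df_manifest = read_manifest(io.StringIO(example_tsv))
print(df_manifest["datatype"].tolist())
# prints [['anat', 'dwi'], ['anat']]
```

Passing the path to example3-manifest.tsv instead of the StringIO buffer should work the same way, since read_csv accepts both paths and file-like objects.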