Creating a manifest from a wide-form tabular file for a longitudinal study with imaging and non-imaging visits

In this example, we have a longitudinal study with both non-imaging and imaging visits. Specifically, non-imaging (neuropsychological) data was collected every year, and imaging data (anatomical only) was collected every two years.

We start with two CSV files:

  • example2-demographics_neuropsych.csv contains demographics information and dates for the neuropsych visits

    PARTICIPANT  SEX  DATE_OF_BIRTH  DATE_NEUROPSYCH1  DATE_NEUROPSYCH2  DATE_NEUROPSYCH3
    ABC_001      F    1970/12/31     2015/01/30        2016/02/01        2017/02/10
    ABC_002      M    1967/02/20     2015/02/19        2016/02/22        2017/02/25
    ABC_003      F    1955/05/21     2016/03/03        2017/03/10

  • example2-mri.csv contains dates for the MRI visits

    PARTICIPANT  DATE_MRI1   DATE_MRI2
    ABC_001      2015/02/07  2017/02/15
    ABC_002      2015/02/26  2017/03/01
    ABC_003      2016/03/09

These files give us the following information:

  • The study has 3 participants

  • Each participant has up to 3 non-imaging visits and up to 2 imaging visits (ABC_003 has only completed 2 non-imaging visits and 1 imaging visit so far)
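The counts above can be tallied directly from the two files. The following is a minimal sketch, assuming the column names shown in the tables; the file contents are inlined with io.StringIO so that the snippet is self-contained, and the variable names (n_participants, n_neuropsych, n_mri) are illustrative only. An empty date cell is treated as a visit that has not happened.

```python
import io

import pandas as pd

# inlined copies of the two example files (assumed to match the tables above)
csv_neuropsych = """PARTICIPANT,SEX,DATE_OF_BIRTH,DATE_NEUROPSYCH1,DATE_NEUROPSYCH2,DATE_NEUROPSYCH3
ABC_001,F,1970/12/31,2015/01/30,2016/02/01,2017/02/10
ABC_002,M,1967/02/20,2015/02/19,2016/02/22,2017/02/25
ABC_003,F,1955/05/21,2016/03/03,2017/03/10,
"""
csv_mri = """PARTICIPANT,DATE_MRI1,DATE_MRI2
ABC_001,2015/02/07,2017/02/15
ABC_002,2015/02/26,2017/03/01
ABC_003,2016/03/09,
"""

df_neuropsych = pd.read_csv(io.StringIO(csv_neuropsych), dtype=str)
df_mri = pd.read_csv(io.StringIO(csv_mri), dtype=str)

# number of distinct participants
n_participants = df_neuropsych["PARTICIPANT"].nunique()

# number of non-empty date columns per participant
n_neuropsych = (
    df_neuropsych.set_index("PARTICIPANT")
    .filter(like="DATE_NEUROPSYCH")
    .notna()
    .sum(axis=1)
)
n_mri = df_mri.set_index("PARTICIPANT").filter(like="DATE_MRI").notna().sum(axis=1)

print(n_participants)
print(n_neuropsych.to_dict())
print(n_mri.to_dict())
```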

Since all imaging sessions collected anatomical data only, we have all the information required for the manifest file. Here is a manifest-generation script that does the job:

Attention

The script below was written for Python 3.11 with pandas 2.2.3. It may not work with older/different versions.

#!/usr/bin/env python
"""Manifest-generation script for Example 2."""

from pathlib import Path

import pandas as pd

if __name__ == "__main__":

    # get the paths to the demographics/neuropsych file and the MRI file
    # we assume that they are in the same directory as this script
    path_neuropsych = Path(__file__).parent / "example2-demographics_neuropsych.csv"
    path_mri = Path(__file__).parent / "example2-mri.csv"

    # load the files and merge them on the participant column
    df_neuropsych = pd.read_csv(path_neuropsych, dtype=str)
    df_mri = pd.read_csv(path_mri, dtype=str)
    df_merged = pd.merge(df_neuropsych, df_mri, how="left", on="PARTICIPANT")

    data_for_manifest = []
    for _, row in df_merged.iterrows():

        # remove underscores
        participant_id = row["PARTICIPANT"].replace("_", "")

        # each row in the merged table becomes multiple rows in the manifest file
        for visit_id in [
            "NEUROPSYCH1",
            "NEUROPSYCH2",
            "NEUROPSYCH3",
            "MRI1",
            "MRI2",
        ]:

            # if the DATE column is empty, the visit did not happen yet
            if pd.isna(row[f"DATE_{visit_id}"]):
                continue

            # session_id is only defined for MRI visits
            if visit_id.startswith("MRI"):
                session_id = visit_id.removeprefix("MRI")

                # all participants only have anat datatype
                datatype = ["anat"]
            else:
                session_id = pd.NA
                datatype = []

            # create the manifest entry
            data_for_manifest.append(
                {
                    "participant_id": participant_id,
                    "visit_id": visit_id,
                    "session_id": session_id,
                    "datatype": datatype,
                }
            )

    df_manifest = pd.DataFrame(data_for_manifest)

    # write the manifest in the same directory as this script
    df_manifest.to_csv(
        Path(__file__).parent / "example2-manifest.tsv", sep="\t", index=False
    )
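
For larger tables, the row-by-row loop could be replaced by a wide-to-long reshape. The block below is only a sketch of that alternative, not the approach used by the script above: it assumes the same DATE_* column naming, uses pandas' melt to produce one row per participant and date column, and inlines a two-participant excerpt of the merged table so that it runs on its own. The name df_long is illustrative.

```python
import io

import pandas as pd

# two-participant excerpt of the merged wide-form table (inlined for illustration)
csv_wide = """PARTICIPANT,DATE_NEUROPSYCH1,DATE_NEUROPSYCH2,DATE_NEUROPSYCH3,DATE_MRI1,DATE_MRI2
ABC_001,2015/01/30,2016/02/01,2017/02/10,2015/02/07,2017/02/15
ABC_003,2016/03/03,2017/03/10,,2016/03/09,
"""
df_wide = pd.read_csv(io.StringIO(csv_wide), dtype=str)

# one row per (participant, date column); drop visits that have no date yet
df_long = df_wide.melt(
    id_vars="PARTICIPANT", var_name="visit_id", value_name="date"
).dropna(subset=["date"])

# same ID conventions as the loop-based script
df_long["participant_id"] = df_long["PARTICIPANT"].str.replace("_", "", regex=False)
df_long["visit_id"] = df_long["visit_id"].str.removeprefix("DATE_")

# session_id and datatype are only filled in for MRI visits
is_mri = df_long["visit_id"].str.startswith("MRI")
df_long["session_id"] = df_long["visit_id"].str.removeprefix("MRI").where(is_mri)
df_long["datatype"] = [["anat"] if mri else [] for mri in is_mri]

# stable sort keeps the visit order within each participant
df_manifest = (
    df_long[["participant_id", "visit_id", "session_id", "datatype"]]
    .sort_values("participant_id", kind="stable")
    .reset_index(drop=True)
)
print(df_manifest)
```

The stable sort preserves the column order of the wide table within each participant, so the rows come out in the same order as in the loop-based version.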

Running this script creates a manifest that looks like this:

participant_id  visit_id     session_id  datatype
ABC001          NEUROPSYCH1              []
ABC001          NEUROPSYCH2              []
ABC001          NEUROPSYCH3              []
ABC001          MRI1         1           ['anat']
ABC001          MRI2         2           ['anat']
ABC002          NEUROPSYCH1              []
ABC002          NEUROPSYCH2              []
ABC002          NEUROPSYCH3              []
ABC002          MRI1         1           ['anat']
ABC002          MRI2         2           ['anat']
ABC003          NEUROPSYCH1              []
ABC003          NEUROPSYCH2              []
ABC003          MRI1         1           ['anat']
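
Note that the datatype column is stored as the string form of a Python list, and that session_id is empty for non-imaging visits. If the manifest needs to be inspected again in pandas, one possible way to read it back is sketched below. This is an assumption about how one might post-process the file, not part of the script above; the TSV content is inlined (ABC003 rows only) so that the snippet is self-contained.

```python
import ast
import io

import pandas as pd

# the ABC003 rows of the manifest, as tab-separated text (inlined for illustration)
tsv_manifest = (
    "participant_id\tvisit_id\tsession_id\tdatatype\n"
    "ABC003\tNEUROPSYCH1\t\t[]\n"
    "ABC003\tNEUROPSYCH2\t\t[]\n"
    "ABC003\tMRI1\t1\t['anat']\n"
)

# dtype=str keeps session_id as the string "1" instead of a number
df_manifest = pd.read_csv(io.StringIO(tsv_manifest), sep="\t", dtype=str)

# turn "[]" and "['anat']" back into Python lists
df_manifest["datatype"] = df_manifest["datatype"].apply(ast.literal_eval)

# empty session_id cells are read as missing values
print(df_manifest["session_id"].isna().tolist())
print(df_manifest["datatype"].tolist())
```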