[RESOLVED] Wrong ordering of files (especially SBRef) when importing bids dataset

Description:

QuNex does not import the BIDS dataset in the right order, which makes it difficult to match each single-band reference (sbref) image to the right BOLD image.

For example, QuNex does not import the following BIDS dataset in the right order:

sub-10
├── anat
│   ├── sub-10_T1w.json
│   ├── sub-10_T1w.nii.gz
│   ├── sub-10_T2w.json
│   └── sub-10_T2w.nii.gz
├── fmap
│   ├── sub-10_acq-pre_dir-AP_run-01_epi.json
│   ├── sub-10_acq-pre_dir-AP_run-01_epi.nii.gz
│   ├── sub-10_acq-pre_dir-PA_run-02_epi.json
│   └── sub-10_acq-pre_dir-PA_run-02_epi.nii.gz
├── func
│   ├── sub-10_task-new_dir-AP_run-01_bold.json
│   ├── sub-10_task-new_dir-AP_run-01_bold.nii.gz
│   ├── sub-10_task-new_dir-AP_run-01_sbref.json
│   ├── sub-10_task-new_dir-AP_run-01_sbref.nii.gz
│   ├── sub-10_task-new_dir-AP_run-01_events.tsv
│   ├── sub-10_task-new_dir-AP_run-02_bold.json
│   ├── sub-10_task-new_dir-AP_run-02_bold.nii.gz
│   ├── sub-10_task-new_dir-AP_run-02_sbref.json
│   ├── sub-10_task-new_dir-AP_run-02_sbref.nii.gz
│   ├── sub-10_task-new_dir-AP_run-02_events.tsv
│   ├── sub-10_task-rest_dir-AP_run-01_bold.json
│   ├── sub-10_task-rest_dir-AP_run-01_bold.nii.gz
│   ├── sub-10_task-rest_dir-AP_run-01_sbref.json
│   ├── sub-10_task-rest_dir-AP_run-01_sbref.nii.gz
│   ├── sub-10_task-rest_dir-AP_run-01_events.tsv
│   ├── sub-10_task-rest_dir-PA_run-02_bold.json
│   ├── sub-10_task-rest_dir-PA_run-02_bold.nii.gz
│   ├── sub-10_task-rest_dir-PA_run-02_sbref.json
│   ├── sub-10_task-rest_dir-PA_run-02_sbref.nii.gz
│   └── sub-10_task-rest_dir-PA_run-02_events.tsv
└── sub-10_scans.tsv

I have imported the session using the following command:

qunex_container import_bids \
  --sessionsfolder="${WORK_DIR}/${STUDY_NAME}/sessions" \
  --inbox="${RAW_DATA}" \
  --action="link" \
  --archive="leave" \
  --overwrite=yes \
  --container="${QUNEX_CONTAINER}" 

This is the resulting mapping in session.txt:

# Generated by QuNex 0.94.11 on 2022-10-17_09.17.48.737047
#
session: 10
subject: 10
bids: /data/ihsl_hcp/sessions/10/bids
raw_data: /data/ihsl_hcp/sessions/10/nii
hcp: /data/ihsl_hcp/sessions/10/hcp

1: T1w
2: T2w
3: epi dir-AP acq-pre run-01
4: epi dir-PA acq-pre run-02
5: sbref new run-01
6: bold new run-01
7: bold new run-02
8: sbref new run-02
9: sbref rest run-01
10: bold rest run-01
11: sbref rest run-02
12: bold rest run-02

I tried to modify the run order with the hcp_mapping.txt file shown below:

T1w                        => T1w
T2w                        => T2w
epi dir-AP acq-pre run-01  => SE-FM-AP
epi dir-PA acq-pre run-02  => SE-FM-PA
sbref IL run-01            => boldref:task: bold_num(1)
bold IL run-01             => bold:task: bold_num(1)
sbref IL run-02            => boldref:task: bold_num(2)
bold IL run-02             => bold:task: bold_num(2)
sbref rest run-01          => boldref:rest:phenc(AP): bold_num(3)
bold rest run-01           => bold:rest:phenc(AP): bold_num(3)
sbref rest run-02          => boldref:rest:phenc(PA): bold_num(4)
bold rest run-02           => bold:rest:phenc(PA): bold_num(4)

But the resulting session_hcp.txt file is not what I expected:

# Generated by QuNex 0.94.11 on 2022-10-17_09.19.08.208254
#
session: 10
subject: 10
bids: /data/ihsl_hcp/sessions/10/bids
raw_data: /data/ihsl_hcp/sessions/10/nii
hcp: /data/ihsl_hcp/sessions/10/hcp
hcpready: true
1   :T1w             :T1w: se(1)
2   :T2w             :T2w: se(1)
3   :SE-FM-AP        :epi dir-AP acq-pre run-01: se(1)
4   :SE-FM-PA        :epi dir-PA acq-pre run-02: se(1)
5   :boldref1:task   :sbref IL run-01: se(1): bold_num(1)
6   :bold1:task      :bold IL run-01: se(1): bold_num(1)
7   :bold2:task      :bold IL run-02: se(1): bold_num(2)
8   :boldref5:task   :sbref IL run-02: se(1): bold_num(2)
9   :boldref3:rest   :sbref rest run-01: se(1): bold_num(3): phenc(AP)
10  :bold3:rest      :bold rest run-01: se(1): bold_num(3): phenc(AP)
11  :boldref4:rest   :sbref rest run-02: se(1): bold_num(4): phenc(PA)
12  :bold4:rest      :bold rest run-02: se(1): bold_num(4): phenc(PA)

You can see that image 8 is tagged as boldref5 rather than the expected boldref2.
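
A mismatch like this can be spotted automatically. The following is a minimal sketch and not part of QuNex: it assumes that session_hcp.txt lines have the layout shown above, and the helper name find_bold_mismatches is made up for illustration.

```python
import re

# Matches lines such as "8   :boldref5:task   :sbref IL run-02: se(1): bold_num(2)".
LINE_RE = re.compile(r"^\s*(\d+)\s*:(boldref|bold)(\d+)\b.*?bold_num\((\d+)\)")


def find_bold_mismatches(lines):
    """Return (image, tag, requested bold_num) for lines whose tag number differs."""
    mismatches = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # header lines, T1w/T2w, field maps, ...
        image, kind, assigned, requested = match.groups()
        if int(assigned) != int(requested):
            mismatches.append((int(image), kind + assigned, int(requested)))
    return mismatches


# Lines copied from the session_hcp.txt listing above.
SAMPLE = [
    "5   :boldref1:task   :sbref IL run-01: se(1): bold_num(1)",
    "6   :bold1:task      :bold IL run-01: se(1): bold_num(1)",
    "7   :bold2:task      :bold IL run-02: se(1): bold_num(2)",
    "8   :boldref5:task   :sbref IL run-02: se(1): bold_num(2)",
]

if __name__ == "__main__":
    print(find_bold_mismatches(SAMPLE))  # [(8, 'boldref5', 2)]
```

Running the same check on every session_hcp.txt after import would flag sessions that need manual correction.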

I guess that this may be because of how import_bids imports files.

In lines 508-511 of bids.py, I found that the file list is generated using os.walk:

for path, dirs, files in os.walk(candidate):
    for file in files:
        sourceFiles.append(os.path.join(path, file))

os.walk is known to return files in an arbitrary, filesystem-dependent order rather than a sorted one, so this may be the root of the problem.
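
For comparison, a walk can be made reproducible by sorting the directory and file names it returns. This is only a sketch of that general technique, not the code used in bids.py; the function name list_files_sorted is hypothetical.

```python
import os


def list_files_sorted(root):
    """Collect every file below root in a reproducible, name-sorted order."""
    source_files = []
    for path, dirs, files in os.walk(root):
        # Sorting dirs in place fixes the order in which subfolders are visited
        # (this works because os.walk is top-down by default).
        dirs.sort()
        for name in sorted(files):
            source_files.append(os.path.join(path, name))
    return source_files
```

Sorting by name alone does not guarantee that sbref comes before bold, but it does make the import order the same on every run and every filesystem.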

Conclusion:

In short:

  • import_bids imports files in an arbitrary order (likely due to how os.walk works).
  • It is not possible to match sbref files manually using the bold_num parameter.
  • I also found that manual ordering does not work when there are multiple pairs of field maps (using the se(*) parameters).

Hi,

Nice catch! Thanks for reporting this, I think we should be able to fix it relatively quickly. The bug will most likely be fixed in the next release, which should be out by the end of October.

Cheers, Jure

Hi,

Just to add some information: while it is true that the files are read using os.walk, they are then sorted by several criteria, which are specified in the python/qx_utilities/templates/import_bids.txt file. The file label is currently not included in that list. The fix is to change lines 10-15 of that file to:

    "func": {
        "label": ["sbref", "bold"],
        "info":  ["task", "acq", "rec", "run", "echo", "ses"],
        "sort":  ["rec", "echo", "acq", "label", "run", "task"],
        "tag":   ["label", "task", "acq", "echo", "rec", "run"]
    }
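
The effect of including the label among the sort keys can be illustrated with a stand-alone sketch. It does not reproduce how QuNex applies the "sort" list; the helper names, the (task, run, label) key order, and the rule that labels are ranked by their position in the "label" list are all assumptions made for illustration.

```python
import os

# Assumption: labels listed earlier should be imported first.
LABEL_ORDER = ["sbref", "bold"]


def parse_bids_name(filename):
    """Split a BIDS file name into its key-value entities and its trailing label."""
    stem = os.path.basename(filename).split(".")[0]
    parts = stem.split("_")
    entities = dict(part.split("-", 1) for part in parts[:-1] if "-" in part)
    entities["label"] = parts[-1]
    return entities


def sort_func_files(filenames):
    """Order files by task, then run, then label rank (unknown labels go last)."""
    def key(name):
        info = parse_bids_name(name)
        label = info["label"]
        rank = LABEL_ORDER.index(label) if label in LABEL_ORDER else len(LABEL_ORDER)
        return (info.get("task", ""), info.get("run", ""), rank)

    return sorted(filenames, key=key)
```

With the label in the key, each sbref image lands directly before the bold image of the same task and run, which is the pairing the mapping step relies on.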

This fix will be in the next container version. If you would like to use the fix yourself before that, you can generate a replacement file outside of the container. You can put the replacement template files in ${WORK_DIR}/${STUDY_NAME}/sessions/specs folder and then change your QuNex command call to:

qunex_container import_bids \
  --sessionsfolder="${WORK_DIR}/${STUDY_NAME}/sessions" \
  --inbox="${RAW_DATA}" \
  --action="link" \
  --archive="leave" \
  --overwrite=yes \
  --bash_post="export NIUTemplateFolder=${WORK_DIR}/${STUDY_NAME}/sessions/specs" \
  --container="${QUNEX_CONTAINER}" 

This will set the NIUTemplateFolder environment variable to the location of the specs folder, so that import_bids.txt is read from there instead of from the container.

With kind regards,

Grega

Dear Jure and Grega,

Thank you for the helpful responses.

Based on Grega’s suggestion, I tried the approach below:

# -- Set the name of the study
export STUDY_NAME="test_study"

# -- Set your working directory
export WORK_DIR="/data"

# -- Specify the container
# -- For Docker use the container name and tag:
export QUNEX_CONTAINER="gitlab.qunex.yale.edu:5002/qunex/qunexcontainer:0.94.11"

# -- Location of previously prepared data
export RAW_DATA="${WORK_DIR}/bids"

# -- Batch parameters file
export INPUT_BATCH_FILE="${RAW_DATA}/batch_hcp_params.txt"

# -- Mapping file
export INPUT_MAPPING_FILE="${RAW_DATA}/hcp_mapping.txt"

# -- Sessions to run
export SESSIONS="01|02|03"

qunex_container import_bids \
  --sessionsfolder="${WORK_DIR}/${STUDY_NAME}/sessions" \
  --inbox="${RAW_DATA}" \
  --action="link" \
  --archive="leave" \
  --overwrite=yes \
  --bash_post="export NIUTemplateFolder=${WORK_DIR}/${STUDY_NAME}/sessions/specs" \
  --container="${QUNEX_CONTAINER}" 

However, the following error occurs:

Here's the error as caught by python:

Traceback (most recent call last):
  File "/opt/qunex/python/qx_utilities/gmri", line 522, in <module>
    main()
  File "/opt/qunex/python/qx_utilities/gmri", line 491, in main
    runCommand(comm, opts)
  File "/opt/qunex/python/qx_utilities/gmri", line 169, in runCommand
    gu.check_study(folders['basefolder'])
  File "/opt/qunex/python/qx_utilities/general/utilities.py", line 403, in check_study
    manage_study(studyfolder=studyfolder, action="check", folders=folders)
  File "/opt/qunex/python/qx_utilities/general/utilities.py", line 113, in manage_study
    folders = create_study_folders(folders)
  File "/opt/qunex/python/qx_utilities/general/utilities.py", line 259, in create_study_folders
    with open(folders_spec) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/data/test_study/sessions/specs/study_folders_default.txt'

I think the check_study function is not able to locate the study folder (test_study) properly.

For reference, my folder is organized as follows:

RESEARCH
├── bids
├── test_study
└── RAW

I was running the above script in the RESEARCH folder and, if I understood correctly, running qunex_container automatically binds the current working directory (the RESEARCH folder here) to /data in the Docker container.

I guess the error occurs because the /data folder is not the study folder (it is actually the parent directory of the target study folder).

Anyway, I managed to run the pipeline by manually modifying the session_hcp.txt file inside every subject’s folder.

I look forward to the updated version!

Thank you all for maintaining this wonderful project,
Minho

Hi,

The folder stored under the NIUTemplateFolder contains a number of templates. If you are in a hurry, you can copy all of the .txt files in there into ${WORK_DIR}/${STUDY_NAME}/sessions/specs, edit the import_bids.txt and then rerun the above command. That should do the trick.
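
The copy step could be scripted along these lines. This is a sketch under assumptions: the function name copy_templates is hypothetical, the caller has to supply the actual location of the original template folder, and files that already exist in the target are skipped so that an already edited import_bids.txt is not overwritten.

```python
import shutil
from pathlib import Path


def copy_templates(template_dir, specs_dir, overwrite=False):
    """Copy every .txt template into specs_dir and return the names that were copied."""
    source = Path(template_dir)
    target = Path(specs_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for template in sorted(source.glob("*.txt")):
        destination = target / template.name
        if destination.exists() and not overwrite:
            continue  # keep a replacement file that is already in place
        shutil.copy2(template, destination)
        copied.append(template.name)
    return copied
```

After the copy, files such as study_folders_default.txt are present in the specs folder next to the edited import_bids.txt, which is what the FileNotFoundError above was complaining about.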

Or you can wait about a week or two for the next release, if this is not something pressing.

Jure

Dear Jure,

Yes, I will wait for the update then!

Many thanks,
Minho

Hi Jure,

Sorry for re-opening a resolved issue.

However, it seems like the issue still remains.

I have re-run the import_bids command using QuNex version 0.94.11, after copying the templates folder and replacing import_bids.txt as suggested above.

One good thing is that import_bids no longer sorts files randomly, as it did in my first post.

However, even though sbref is listed before bold in import_bids.txt, import_bids sorts the bold files first, as shown below:

# Generated by QuNex 0.94.11 on 2022-11-02_02.39.15.051034
#
session: 10
subject: 10
bids: /data/ihsl_hcp/sessions/10/bids
raw_data: /data/ihsl_hcp/sessions/10/nii
hcp: /data/ihsl_hcp/sessions/10/hcp
hcpready: true
1   :T1w             :T1w: se(1)
2   :T2w             :T2w: se(1)
3   :SE-FM-AP        :epi dir-AP acq-pre run-01: se(1)
4   :SE-FM-PA        :epi dir-PA acq-pre run-02: se(1)
5   :bold1:task      :bold IL run-01: se(1): bold_num(1)
6   :boldref2:task   :sbref IL run-01: se(1): bold_num(1)
7   :bold2:task      :bold IL run-02: se(1): bold_num(2)
8   :boldref3:task   :sbref IL run-02: se(1): bold_num(2)
9   :bold3:rest      :bold rest run-01: se(1): bold_num(3): phenc(AP)
10  :boldref4:rest   :sbref rest run-01: se(1): bold_num(3): phenc(AP)
11  :bold4:rest      :bold rest run-02: se(1): bold_num(4): phenc(PA)
12  :boldref5:rest   :sbref rest run-02: se(1): bold_num(4): phenc(PA)

Can you look into this issue?

Thanks!
Minho

Hi Minho,

We are looking into this, and a patch should be available in the next release.

Jure

Minho, the new container is now online. Please test whether this has been resolved. Thanks!

Dear Jure,

Sorry for the late response.
I just tested it and it works wonderfully!

Thanks again!
Minho

Thanks for letting us know!

Hi,
There might be a similar issue with the order of “AP” and “PA” in DWI data. We have two scans with similar data that were imported differently. In one of them, direction “AP” preceded “PA”, in accordance with alphabetical order:

Running map_bids2nii for subject 1014, session 1
================================================
--> linked 1.nii.gz <-- sub-1014_ses-1_ce-corrected_T1w.nii.gz
--> linked 2.nii.gz <-- sub-1014_ses-1_ce-uncorrected_T1w.nii.gz
--> linked 3.nii.gz <-- sub-1014_ses-1_ce-corrected_T2w.nii.gz
--> linked 4.nii.gz <-- sub-1014_ses-1_FLAIR.nii.gz
--> linked 5.nii.gz <-- sub-1014_ses-1_acq-dwi_dir-AP_epi.nii.gz
--> linked 6.nii.gz <-- sub-1014_ses-1_acq-dwi_dir-PA_epi.nii.gz
--> linked 7.nii.gz <-- sub-1014_ses-1_acq-func_dir-AP_epi.nii.gz
--> linked 8.nii.gz <-- sub-1014_ses-1_acq-func_dir-PA_epi.nii.gz
--> linked 9.nii.gz <-- sub-1014_ses-1_acq-task_dir-AP_epi.nii.gz
--> linked 10.nii.gz <-- sub-1014_ses-1_acq-task_dir-PA_epi.nii.gz
--> linked 11.nii.gz <-- sub-1014_ses-1_task-gaga_dir-AP_sbref.nii.gz
--> linked 12.nii.gz <-- sub-1014_ses-1_task-gaga_dir-AP_bold.nii.gz
--> linked 13.nii.gz <-- sub-1014_ses-1_task-rest_dir-AP_sbref.nii.gz
--> linked 14.nii.gz <-- sub-1014_ses-1_task-rest_dir-AP_bold.nii.gz
--> linked 15.nii.gz <-- sub-1014_ses-1_dir-AP_dwi.nii.gz
--> linked 16.nii.gz <-- sub-1014_ses-1_dir-PA_dwi.nii.gz 

But in another session, direction “PA” was imported first:

Running map_bids2nii for subject 1013, session 2
================================================
--> linked 1.nii.gz <-- sub-1013_ses-2_ce-corrected_T1w.nii.gz
--> linked 2.nii.gz <-- sub-1013_ses-2_ce-uncorrected_T1w.nii.gz
--> linked 3.nii.gz <-- sub-1013_ses-2_ce-corrected_T2w.nii.gz
--> linked 4.nii.gz <-- sub-1013_ses-2_FLAIR.nii.gz
--> linked 5.nii.gz <-- sub-1013_ses-2_acq-dwi_dir-AP_epi.nii.gz
--> linked 6.nii.gz <-- sub-1013_ses-2_acq-dwi_dir-PA_epi.nii.gz
--> linked 7.nii.gz <-- sub-1013_ses-2_acq-func_dir-AP_epi.nii.gz
--> linked 8.nii.gz <-- sub-1013_ses-2_acq-func_dir-PA_epi.nii.gz
--> linked 9.nii.gz <-- sub-1013_ses-2_acq-task_dir-AP_epi.nii.gz
--> linked 10.nii.gz <-- sub-1013_ses-2_acq-task_dir-PA_epi.nii.gz
--> linked 11.nii.gz <-- sub-1013_ses-2_task-gaga_dir-AP_sbref.nii.gz
--> linked 12.nii.gz <-- sub-1013_ses-2_task-gaga_dir-AP_bold.nii.gz
--> linked 13.nii.gz <-- sub-1013_ses-2_task-rest_dir-AP_sbref.nii.gz
--> linked 14.nii.gz <-- sub-1013_ses-2_task-rest_dir-AP_bold.nii.gz
--> linked 15.nii.gz <-- sub-1013_ses-2_dir-PA_dwi.nii.gz
--> linked 16.nii.gz <-- sub-1013_ses-2_dir-AP_dwi.nii.gz 

We might be missing something, but we could not find the reason for these differences, which appeared in multiple scans.
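
To see which sessions are affected, the import order can be read back from the log. The following is a minimal sketch that assumes the "--> linked N.nii.gz <-- file" line format shown above; the function name dwi_direction_order is made up for illustration.

```python
import re

# Matches lines such as "--> linked 15.nii.gz <-- sub-1013_ses-2_dir-PA_dwi.nii.gz".
LINK_RE = re.compile(r"-->\s*linked\s+(\d+)\.nii\.gz\s*<--\s*(\S+)")
DWI_DIR_RE = re.compile(r"_dir-([A-Za-z0-9]+)_dwi\.nii")


def dwi_direction_order(log_lines):
    """Return the dir- values of the dwi images in the order they were numbered."""
    found = []
    for line in log_lines:
        match = LINK_RE.search(line)
        if match is None:
            continue
        number, source = match.groups()
        direction = DWI_DIR_RE.search(source)
        if direction:
            found.append((int(number), direction.group(1)))
    return [direction for _, direction in sorted(found)]
```

Applied to the two logs above, it gives AP then PA for subject 1014 and PA then AP for subject 1013, so a mapping file that relies on fixed image numbers can be checked before running hcp_diffusion.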

Since we created the mapping based on the expected NIfTI file numbers, the “.bval” and “.bvec” files were associated with the wrong image in scans in which “PA” preceded “AP”. Despite that, hcp_diffusion terminated successfully with no errors.

The logs of map_raw_data as well as our mapping files are attached,
Thank you,
Theo

done_map_raw_data_1013_2_2024-01-23_10.54.21.932236.log (3.4 KB)
hcp_1013_2_mapping.txt (228 Bytes)
done_map_raw_data_1014_1_2024-01-23_10.54.20.163677.log (3.4 KB)

Hm, it seems like our sorting code for import_bids is not the most robust. I will check what is going on. It is possible that it is sorting by the file timestamp, which on paper should reflect the acquisition order. So it could be that in some cases your PA was created before AP and vice versa.

hcp_diffusion will finish, but the question is whether the results are optimal/valid if you swap bvals and bvecs. Often bvals and bvecs in a PA/AP pair are the same, so you can just ignore this. Just compare the contents of the sub-1013_ses-2_dir-AP_dwi.bval with sub-1013_ses-2_dir-PA_dwi.bval and sub-1013_ses-2_dir-AP_dwi.bvec with sub-1013_ses-2_dir-PA_dwi.bvec. If they are the same there is no reason for concern here.
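
The comparison could be done with a few lines of Python. This is a sketch that assumes the .bval and .bvec files are plain text with whitespace-separated numbers; the function names and the tolerance are illustrative choices.

```python
def read_values(path):
    """Read a whitespace-separated numeric file (e.g. .bval or .bvec) as rows of floats."""
    with open(path) as handle:
        return [[float(value) for value in line.split()] for line in handle if line.strip()]


def same_gradient_table(path_a, path_b, tol=1e-6):
    """Return True if both files have the same shape and numerically equal values."""
    rows_a, rows_b = read_values(path_a), read_values(path_b)
    if [len(row) for row in rows_a] != [len(row) for row in rows_b]:
        return False  # different number of rows or of values per row
    return all(
        abs(x - y) <= tol
        for row_a, row_b in zip(rows_a, rows_b)
        for x, y in zip(row_a, row_b)
    )
```

For example, same_gradient_table("sub-1013_ses-2_dir-AP_dwi.bval", "sub-1013_ses-2_dir-PA_dwi.bval") returns False as soon as the two files differ in their number of values, which is the case that deserves a closer look.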

Best, Jure

Thank you Jure,

It seems that the timestamps do not explain the order of import, because we have some contradicting cases. Out of 57 scans, “PA” was imported first in 30 and “AP” in the other 27. Therefore, the arbitrary file order returned by os.walk seems like a reasonable explanation.

In our case there is a big difference between the bvals and bvecs of each direction: 186 values in “AP” but only 7 in “PA”. We wonder whether the fact that “AP” is defined as “positive” matters: that way the “positive” direction is never associated with a file that has more values than its number of images.

Anyway these are all speculations. For now we will use import_dicom, which seems to work well for us. Please let us know if you come up with any solution.
Thanks again,
Theo

I think we figured out the cause of the sorting problem: it seems that the direction was not used to establish consistent sorting. We are running some final tests; if all is well, the fix will be released in the next version.

Best, Jure

Thank you very much,
Theo