@ryan.aker hi!
The issue you are running into is due to the organization of the input dataset. We have not encountered this kind of dataset organization before, and `import_dicom` is not (yet) equipped to handle it directly. Let me describe the organization of the dataset as it is and what `import_dicom` expects/requires.
The data are located in the study inbox folder at `/gpfs/loomis/pi/n3/Studies/ABCD/site21/sessions/archive/BIDS`. The folder contains `.tgz` files, one file per acquired sequence, with no additional organization of the files within the folder.
The issue with processing the ABCD study master inbox using `import_dicom`

One way of using `import_dicom` is to have a "master inbox" where all the data for a study are located. In this case `import_dicom` is written to work with "packets", where each packet contains all the DICOM files from a single data acquisition session. A packet can be a compressed archive or a folder with DICOM files. When processing the data from the master inbox, `import_dicom` identifies each packet and creates a `<study folder>/sessions/<session id>` folder for that packet. The `<session id>` is extracted from the packet name using the regular expression specified by the `--nameformat` parameter.
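To illustrate how the extraction works, here is a quick sketch applying the `--nameformat` pattern used later in this post to an example packet name (the pattern is a standard Python regular expression with named groups):

```python
import re

# The --nameformat value is a Python regular expression with named groups.
# Applying it to an example packet (folder) name shows what gets extracted.
nameformat = r"(?P<subject_id>.*?)_(?P<session_name>.*)"
packet_name = "NDARINV0APJMRD1_baselineYear1Arm1"

match = re.match(nameformat, packet_name)
print(match.group("subject_id"))    # NDARINV0APJMRD1
print(match.group("session_name"))  # baselineYear1Arm1
```

Note that `.*?` is non-greedy, so `subject_id` stops at the first underscore and the rest of the name becomes `session_name`.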
In your case this does not work, as there are multiple `.tgz` files for each session. Each file is identified and processed as a separate packet; however, because they all map to the same session id, an error is reported when the second packet from the same session is processed. To correctly map the files to individual session folders, the files from the same session would need to be moved into a subfolder named after the session id. E.g., all the `NDARINV0APJMRD1_baselineYear1Arm1.*` files should be moved to a `NDARINV0APJMRD1_baselineYear1Arm1` subfolder in the dataset master inbox. Once that is done, the `import_dicom` command could be run on the master inbox folder with `--nameformat="(?P<subject_id>.*?)_(?P<session_name>.*)"` to extract the subject id and the session name from the folder name.
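This reorganization step can be scripted. A minimal sketch, assuming (as in the ABCD naming scheme shown below) that the session id is the first two underscore-separated fields of each file name — adjust the split if your names differ:

```python
import shutil
from pathlib import Path

# Sketch: group .tgz files in the master inbox into per-session subfolders.
# Assumption: the session id is the first two underscore-separated fields of
# the file name (e.g. NDARINV0APJMRD1_baselineYear1Arm1).
def group_tgz_by_session(masterinbox: str) -> None:
    for tgz in Path(masterinbox).glob("*.tgz"):
        session_id = "_".join(tgz.name.split("_")[:2])
        session_dir = Path(masterinbox) / session_id
        session_dir.mkdir(exist_ok=True)          # create <session id> subfolder
        shutil.move(str(tgz), str(session_dir / tgz.name))
```

`group_tgz_by_session` is a hypothetical helper name, not a QuNex command.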
At this point you would encounter a second problem, namely, `import_dicom` expects the packets (in this case the folders) to contain individual DICOM files, whereas in your case the DICOMs from each sequence are combined in compressed `.tgz` files, which `import_dicom` currently does not handle. Once the `.tgz` files are copied to the `<study>/sessions/<session id>/inbox` folder, they are not unpacked, so no DICOM files are found to process in the next step. To resolve this, we will change the `import_dicom` command to identify and unpack any compressed files in the `inbox` folder before attempting to identify and process the DICOM files. This change will be included in a future QuNex release.
Until we update the code, these are the possible solutions:
Solution 1: manually unpack the files in the master inbox
For this solution, you would move the `.tgz` files into the relevant subfolders as described above and unpack all of them so that only the extracted DICOM files remain in each subfolder. Then you could run `import_dicom`:
qunex import_dicom \
--sessionsfolder=<path to study sessions folder> \
--masterinbox=<path to master inbox folder> \
--archive=leave \
--nameformat="(?P<subject_id>.*?)_(?P<session_name>.*)"
Prepared this way, the command should complete in one go.
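The manual unpacking step described above can also be scripted. A sketch using Python's `tarfile` (the helper name is hypothetical): extract every `.tgz` inside each session subfolder of the master inbox and remove the archive, so only the DICOM files remain:

```python
import tarfile
from pathlib import Path

# Sketch for Solution 1: extract every .tgz inside each session subfolder
# of the master inbox, then remove the archive so only the extracted
# DICOM files remain.
def unpack_session_archives(masterinbox: str) -> None:
    for tgz in Path(masterinbox).glob("*/*.tgz"):
        with tarfile.open(tgz, "r:gz") as archive:
            archive.extractall(path=tgz.parent)
        tgz.unlink()
```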
Solution 2: manually unpack the files after they are moved to the sessions folder
If you do not unpack the `.tgz` files and run the above call, the call will copy the `.tgz` files but report that no DICOM files were found. Once the execution stops, you can manually untar all the files in the sessions' inbox folders and run `import_dicom` again, this time setting `--masterinbox=none` and specifying `--sessions="*baseline*"` to find all the baseline sessions in the `sessions` folder. You will also need to specify `--overwrite=yes`, as the first run of `import_dicom` created (empty) `dicom` and `nii` subfolders.
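The manual untarring across the session inbox folders could be scripted along these lines (a sketch; the helper name and the `*baseline*` default are assumptions based on the layout described above):

```python
import tarfile
from pathlib import Path

# Sketch for Solution 2: after the first import_dicom run has copied the
# archives, extract every .tgz found in each matching session's inbox folder.
def untar_session_inboxes(sessionsfolder: str, pattern: str = "*baseline*") -> None:
    for tgz in Path(sessionsfolder).glob(f"{pattern}/inbox/*.tgz"):
        with tarfile.open(tgz, "r:gz") as archive:
            archive.extractall(path=tgz.parent)
```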
Solution 3: run `import_dicom` twice
In this scenario, you run `import_dicom` once to sort the `.tgz` files into the correct `<session id>/inbox` folders and then run it again with the same settings as in Solution 2; `import_dicom` will then unpack and process the DICOM files. I have prepared an example session:
/gpfs/loomis/pi/n3/Studies/ABCD/site21/sessions/masterinbox
└── NDARINV0APJMRD1_baselineYear1Arm1
├── NDARINV0APJMRD1_baselineYear1Arm1_ABCD-Diffusion-FM-AP_20170625115751.tgz
├── NDARINV0APJMRD1_baselineYear1Arm1_ABCD-Diffusion-FM-PA_20170625115713.tgz
├── NDARINV0APJMRD1_baselineYear1Arm1_ABCD-DTI_20170625115843.tgz
├── NDARINV0APJMRD1_baselineYear1Arm1_ABCD-fMRI-FM-AP_20170625114257.tgz
├── NDARINV0APJMRD1_baselineYear1Arm1_ABCD-fMRI-FM-AP_20170625121303.tgz
├── NDARINV0APJMRD1_baselineYear1Arm1_ABCD-fMRI-FM-AP_20170625122513.tgz
├── NDARINV0APJMRD1_baselineYear1Arm1_ABCD-fMRI-FM-AP_20170625123909.tgz
├── NDARINV0APJMRD1_baselineYear1Arm1_ABCD-fMRI-FM-AP_20170625125119.tgz
├── NDARINV0APJMRD1_baselineYear1Arm1_ABCD-fMRI-FM-AP_20170625130406.tgz
├── NDARINV0APJMRD1_baselineYear1Arm1_ABCD-fMRI-FM-PA_20170625114238.tgz
├── NDARINV0APJMRD1_baselineYear1Arm1_ABCD-fMRI-FM-PA_20170625121244.tgz
├── NDARINV0APJMRD1_baselineYear1Arm1_ABCD-fMRI-FM-PA_20170625122454.tgz
├── NDARINV0APJMRD1_baselineYear1Arm1_ABCD-fMRI-FM-PA_20170625123850.tgz
├── NDARINV0APJMRD1_baselineYear1Arm1_ABCD-fMRI-FM-PA_20170625125100.tgz
├── NDARINV0APJMRD1_baselineYear1Arm1_ABCD-fMRI-FM-PA_20170625130347.tgz
├── NDARINV0APJMRD1_baselineYear1Arm1_ABCD-MID-fMRI_20170625125209.tgz
├── NDARINV0APJMRD1_baselineYear1Arm1_ABCD-MID-fMRI_20170625125754.tgz
├── NDARINV0APJMRD1_baselineYear1Arm1_ABCD-nBack-fMRI_20170625124027.tgz
├── NDARINV0APJMRD1_baselineYear1Arm1_ABCD-nBack-fMRI_20170625124542.tgz
├── NDARINV0APJMRD1_baselineYear1Arm1_ABCD-rsfMRI_20170625114409.tgz
├── NDARINV0APJMRD1_baselineYear1Arm1_ABCD-rsfMRI_20170625114956.tgz
├── NDARINV0APJMRD1_baselineYear1Arm1_ABCD-rsfMRI_20170625121348.tgz
├── NDARINV0APJMRD1_baselineYear1Arm1_ABCD-rsfMRI_20170625121924.tgz
├── NDARINV0APJMRD1_baselineYear1Arm1_ABCD-rsfMRI_20170625130448.tgz
├── NDARINV0APJMRD1_baselineYear1Arm1_ABCD-SST-fMRI_20170625122617.tgz
├── NDARINV0APJMRD1_baselineYear1Arm1_ABCD-SST-fMRI_20170625123229.tgz
├── NDARINV0APJMRD1_baselineYear1Arm1_ABCD-T1_20170625114157.tgz
├── NDARINV0APJMRD1_baselineYear1Arm1_ABCD-T1-NORM_20170625114157.tgz
├── NDARINV0APJMRD1_baselineYear1Arm1_ABCD-T2_20170625121227.tgz
└── NDARINV0APJMRD1_baselineYear1Arm1_ABCD-T2-NORM_20170625121228.tgz
and created a study folder at:
/gpfs/loomis/pi/n3/Studies/MBLab/abcdtest
Running these two commands processes the data successfully:
# -- run to move the .tgz files to the correct <session id>/inbox folders
qunex import_dicom \
--sessionsfolder=/gpfs/loomis/pi/n3/Studies/MBLab/abcdtest/sessions \
--masterinbox=/gpfs/loomis/pi/n3/Studies/ABCD/site21/sessions/masterinbox \
--archive=leave \
--nameformat="(?P<subject_id>.*?)_(?P<session_name>.*)"
# -- run to process the data in the <session id>/inbox folders
qunex import_dicom \
--sessionsfolder=/gpfs/loomis/pi/n3/Studies/MBLab/abcdtest/sessions \
--masterinbox=none \
--sessions="*baseline*" \
--overwrite=yes
This solution is basically the same as Solution 2, but without the manual extraction step. One caveat: it required a slight change in the code that is now in the latest `develop` branch, but is not yet available in `master` or in the container.