[RESOLVED] Can you map input data to QuNex without copying DICOMs?

Hi all,

I’m wondering if there’s any way to use import_dicom or one of its sub-steps to map data to the QuNex sessions format via link instead of copy. I see there’s a way to do this on the other end with export_hcp using the --mapaction flag. Does anything similar exist for import_dicom? We have too many scans for it to make sense to have two copies of everything. I’d also like to leave our current directory structure alone, as we already have scripts that assume it. Thanks for any help!

Best,
John

John, hi!

<TL;DR> import_dicom currently does not support linking instead of copying; however, there are workarounds that still save space. </TL;DR>

Throughout QuNex, we have made an effort to use hard links instead of copying whenever feasible. As you have noticed, that is also supported in export_hcp. import_dicom supports many use cases and forms of input data (individual dicom files, zip or different forms of tar packages). Creating hard links instead of copying would be possible when the input data is in “spread” form, i.e. not in a compressed package. As hard links at the file system level are only supported for files and not for folders (the only exception being the way Apple Time Machine works), the code would first have to reconstruct the folder structure and then populate it with hard links to the dicom files. This is possible, and we might add it as an option in a future release, as it could indeed allow significant space savings. We will need to evaluate it in light of other development priorities, so at this point I cannot provide a timeline for when such functionality would be added.

There are other ways to avoid duplication and save space, though. First, if your session packages only include dicom files and you are ok with the QuNex folder structure serving as a long-term archive for the study imaging data, one possibility is to use the --archive=delete option. In this case, once the package is successfully copied and processed, the original data is deleted. I myself am not a fan of deleting any source data without another backup, but this definitely is an option.

The second option is to create the individual session folders in <study folder>/sessions yourself, create an inbox folder in each session’s folder (<study folder>/sessions/<session id>/inbox), and then hard link all the dicom files for each session into the corresponding inbox folder. Once that is done, you can run import_dicom with --masterinbox=none and list the sessions to be processed in the --sessions parameter. In this case, import_dicom would continue with sorting the dicoms, which moves the files from <session id>/inbox to individual sequence folders in the <session id>/dicom folder. I believe hard links should be preserved by that move, so no additional space would be used.

You can find more information and examples for the second option in the import_dicom inline help, under the “Processing data from a session folder” section and in “Examples”.
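To make the second option more concrete, here is a minimal sketch of how the inbox folders could be populated with hard links. The function name link_dicoms_to_inbox and its arguments are my own illustration and are not part of QuNex. It assumes the source data and the study folder are on the same file system, which hard links require.

```python
import os
from pathlib import Path


def link_dicoms_to_inbox(source_dir, study_folder, session_id):
    """Hard link every file under source_dir into the session's inbox folder.

    Hypothetical helper, not a QuNex function. Returns the number of links
    created. Source and destination must be on the same file system.
    """
    source_dir = Path(source_dir)
    inbox = Path(study_folder) / "sessions" / session_id / "inbox"
    linked = 0
    for root, _dirs, files in os.walk(source_dir):
        # Folders cannot be hard linked, so recreate the folder structure first.
        target_dir = inbox / Path(root).relative_to(source_dir)
        target_dir.mkdir(parents=True, exist_ok=True)
        for name in files:
            target = target_dir / name
            if not target.exists():
                # Create a second directory entry for the same data on disk.
                os.link(Path(root) / name, target)
                linked += 1
    return linked
```

You would call this once per session and then run import_dicom with --masterinbox=none and the same session ids in --sessions. Moving a hard-linked file within the same file system only renames the directory entry, so the link to the original data should stay intact.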

Thank you for the suggestion, and I hope this helps.

Thank you for the help! I think I’ll go with archive=delete. Looking forward to future updates!

Actually, one more note: It looks like I can’t use --archive=delete when my inbox is actually symbolic links to the real DICOM locations:

Processing packages: delete

===========================
… deleting packet [TABP88074]

ERROR
Traceback (most recent call last):
  File "/opt/qunex/python/qx_utilities/general/core.py", line 526, in runWithLog
    result = function(**args)
  File "/opt/qunex/python/qx_utilities/general/dicom.py", line 2586, in import_dicom
    shutil.rmtree(p)
  File "/opt/env/qunex/lib/python3.7/shutil.py", line 504, in rmtree
    onerror(os.path.islink, path, sys.exc_info())
  File "/opt/env/qunex/lib/python3.7/shutil.py", line 502, in rmtree
    raise OSError("Cannot call rmtree on a symbolic link")
OSError: Cannot call rmtree on a symbolic link


Finished at 2021-10-27 12:45:04

I’ve used symbolic links because my data folders are actually nested like so:
{DATA_ROOT}/{SCAN}/DICOMs
{DATA_ROOT}/{SCAN}/NIfTIs

Could I just point import_dicom to the DATA_ROOT folder? Will it find the DICOMs? Will the presence of the NIfTI folder confuse it?

John, hi!

We had not considered the possibility that the masterinbox contains soft links to the actual session folders. I have put that on our issues list, so a fix should be in a future version.
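For background on the error itself: Python's shutil.rmtree refuses to operate on a symbolic link, which is exactly what the traceback reports. Below is a minimal sketch of a symlink-aware removal. The name remove_package is made up for illustration and this is not the actual QuNex fix. Note that it only removes the link and leaves the linked data in place, which may or may not be what you expect from --archive=delete.

```python
import os
import shutil


def remove_package(path):
    """Remove a package entry that may be a folder, a file, or a symlink.

    Hypothetical helper, not QuNex code. A symbolic link is removed with
    os.unlink, which deletes only the link and not the data it points to.
    """
    if os.path.islink(path):
        # shutil.rmtree would raise OSError here, so unlink instead.
        os.unlink(path)
    elif os.path.isdir(path):
        shutil.rmtree(path)
    else:
        os.remove(path)
```

The islink check has to come first, because os.path.isdir follows symbolic links and would report a link to a folder as a folder.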

As for a current solution, to give you the most accurate advice, can you describe the actual folder structure from which you are importing the data? A snippet of tree command output would be most helpful.

Right now, I’ll assume that you have the following folder structure:

<raw data folder>
├─ session1
│   ├─ DICOMs
│   └─ NIfTIs
├─ session2
│   ├─ DICOMs
│   └─ NIfTIs
├─ session3
│   ├─ DICOMs
│   └─ NIfTIs
...

If you set --masterinbox=<raw data folder>, then the content of each session above would first be copied to the corresponding inbox folder in the QuNex study data structure:

<QuNex study folder>
└─ sessions
   ├─ session1
   │   └─ inbox
   │      ├─ DICOMs
   │      └─ NIfTIs
   ├─ session2
   │   └─ inbox
   │      ├─ DICOMs
   │      └─ NIfTIs
...

In the next step, a dicom folder would be created in each session folder, and any dicom file found in each inbox folder would be inspected and moved to a dicom subfolder for that sequence, so you would have:

<QuNex study folder>
└─ sessions
   ├─ session1
   │   ├─ dicom
   │   │  ├─ <sequence 1>
   │   │  ├─ <sequence 2>
   │   │  ├─ <...>
   │   │  └─ <sequence N>
   │   └─ inbox
   │      ├─ DICOMs
   │      └─ NIfTIs
...
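
If you want to estimate in advance which files in an inbox folder are likely to be treated as dicom files and which would be left behind, here is a rough sketch. It relies on the standard DICOM file format, in which a 128-byte preamble is followed by the four bytes DICM. The function names are made up for illustration, and the check QuNex itself performs may differ (for example, some older dicom files lack the preamble and would be missed here).

```python
from pathlib import Path


def has_dicom_magic(path):
    """Return True if the file carries the 'DICM' marker at byte offset 128."""
    with open(path, "rb") as f:
        f.seek(128)
        return f.read(4) == b"DICM"


def split_inbox(inbox):
    """Split all files under inbox into (likely dicom, other) lists of paths.

    Hypothetical helper, not a QuNex function.
    """
    dicoms, others = [], []
    for p in sorted(Path(inbox).rglob("*")):
        if p.is_file():
            (dicoms if has_dicom_magic(p) else others).append(p)
    return dicoms, others
```

With the folder structure above, I would expect the content of the NIfTIs folder to end up in the second list.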

dcm2niix would then be run on each of the sequence folders, and the resulting NIfTI files would be moved to the nii folder. DICOM-Report.txt and session.txt files would also be generated in this step:

<QuNex study folder>
└─ sessions
   ├─ session1
   │   ├─ dicom
   │   │  ├─ <sequence 1>
   │   │  ├─ <sequence 2>
   │   │  ├─ <...>
   │   │  ├─ <sequence N>
   │   │  └─ DICOM-Report.txt
   │   ├─ inbox
   │   │  ├─ DICOMs
   │   │  └─ NIfTIs
   │   ├─ nii
   │   │  ├─ <sequence 1>.nii.gz
   │   │  ├─ <sequence 2>.nii.gz
   │   │  ├─ <...>.nii.gz
   │   │  └─ <sequence N>.nii.gz
   │   └─ session.txt
...

Now to what is most relevant to your question. After successful completion of the conversion to NIfTI step, with --archive=delete, the whole session[N] folder in your <raw data folder> would be deleted. At the same time, any non-dicom file that was copied to <QuNex study folder>/sessions/<session N>/inbox would remain there, so your NIfTIs folder with all its content would remain in the inbox folder. If you no longer need the data that was copied to the inbox folder, you could then delete the content of the inbox folder manually or with a script.
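
Such a cleanup script could look roughly like the sketch below. The name clean_inbox and the dry_run argument are my own and not part of QuNex. It defaults to a dry run that only lists what would be removed, and I would only run it for real after checking that the import finished successfully for that session.

```python
import shutil
from pathlib import Path


def clean_inbox(study_folder, session_id, dry_run=True):
    """Delete everything inside a session's inbox folder, keeping the folder.

    Hypothetical helper, not a QuNex function. Returns the list of paths
    removed, or the paths that would be removed when dry_run is True.
    """
    inbox = Path(study_folder) / "sessions" / session_id / "inbox"
    removed = []
    if not inbox.is_dir():
        return removed
    for entry in sorted(inbox.iterdir()):
        removed.append(entry)
        if dry_run:
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            # Plain files and symbolic links are removed without following links.
            entry.unlink()
    return removed
```

Calling clean_inbox("<QuNex study folder>", "session1") reports the leftovers, and adding dry_run=False actually deletes them.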

Writing this up, it seems that it would make sense to add an optional parameter to import_dicom that would clean up inbox folders for you on successful completion of import_dicom.

I hope this helps.

Thanks for the help! I went the route of copying everything over and writing a short script to delete all of the leftovers in the session inbox folders.

Great! I’m glad it worked!