Setting up Diffusion MRI analysis

Hi all,

When I tried running the “hcp_diffusion” command (without GPU), an error occurred. Could you please help me figure out why this error happened?

Here is my command:

qunex_container hcp_diffusion \
    --sessionsfolder="${STUDY_FOLDER}/sessions" \
    --batchfile="${STUDY_FOLDER}/processing/batch.txt" \
    --overwrite="no" \
    --hcp_nogpu \
    --container="${QUNEX_CONTAINER}"

and my error log is attached below:
error_hcp_diffusion_HCPA001_2025-03-07_09.39.59.692004.log (78.0 KB)

Thanks for the help!

Best,
Acacius

Hi Acacius,

Based on your log, I assume that your processing ran out of resources (most likely memory). As you can see in the log, the word Killed appears in there. This usually happens when the system does not have enough resources to execute something, so the operating system kills the process. Diffusion preprocessing is computationally extremely heavy; in my experience you need about 32 GB of memory for hcp_diffusion, and even more for some of the steps that can follow. For example, dense tractography often needs 64 GB.
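As a quick host-side sanity check, you can compare the machine's RAM against that figure. This is just a minimal sketch; the 32 GB threshold is the rule of thumb above, not a hard requirement:

```shell
#!/bin/sh
# Minimal sketch: warn if the host has less than the ~32 GB of RAM
# that hcp_diffusion typically needs (rule of thumb, not a hard limit).
req_kb=$((32 * 1024 * 1024))
total_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
if [ "$total_kb" -lt "$req_kb" ]; then
    echo "Warning: only $((total_kb / 1024 / 1024)) GB RAM; hcp_diffusion may get OOM-killed."
else
    echo "Memory looks sufficient for hcp_diffusion."
fi
```

You can also confirm an OOM kill after the fact with `dmesg | grep -i "killed process"` (may require root).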

Best, Jure

Got it. Many thanks for your kind help!

Best,
Acacius

Hi Jure,

I ran into an error while running ‘hcp_diffusion’: ‘CUDA driver version is insufficient for CUDA runtime version’. My CUDA version is 11.5; is that sufficient to run the analysis?

To help pinpoint the problem, here is my command:

qunex_container hcp_diffusion \
    --sessionsfolder="${STUDY_FOLDER}/sessions" \
    --batchfile="${STUDY_FOLDER}/processing/batch.txt" \
    --overwrite="no" \
    --container="${QUNEX_CONTAINER}" \
    --bash="module load CUDA/11.5" \
    --parelements=2 \
    --cuda-version=11.5 \
    --nv

And here is my log file:

(base) yumingz@localadmin:~$ qunex_container hcp_diffusion \
    --sessionsfolder="${STUDY_FOLDER}/sessions" \
    --batchfile="${STUDY_FOLDER}/processing/batch.txt" \
    --overwrite="no" \
    --container="${QUNEX_CONTAINER}" \
    --bash="module load CUDA/11.5" \
    --parelements=2 \
    --cuda-version=11.5 \
    --nv

---> QuNex will run the command over 1 sessions. It will utilize:

    Maximum sessions run in parallel for a job: 1.
    Maximum elements run in parallel for a session: 2.
    Up to 2 processes will be utilized for a job.

    Job #1 will run sessions: HCPA001
(base) yumingz@localadmin:~$ ---> unsetting the following environment variables: PATH MATLABPATH PYTHONPATH QUNEXVer TOOLS QUNEXREPO QUNEXPATH QUNEXEXTENSIONS QUNEXLIBRARY QUNEXLIBRARYETC TemplateFolder FSL_FIXDIR FREESURFERDIR FREESURFER_HOME FREESURFER_SCHEDULER FreeSurferSchedulerDIR WORKBENCHDIR DCMNIIDIR DICMNIIDIR MATLABDIR MATLABBINDIR OCTAVEDIR OCTAVEPKGDIR OCTAVEBINDIR RDIR HCPWBDIR AFNIDIR PYLIBDIR FSLDIR FSLBINDIR PALMDIR QUNEXMCOMMAND HCPPIPEDIR CARET7DIR GRADUNWARPDIR HCPPIPEDIR_Templates HCPPIPEDIR_Bin HCPPIPEDIR_Config HCPPIPEDIR_PreFS HCPPIPEDIR_FS HCPPIPEDIR_FS_CUSTOM HCPPIPEDIR_PostFS HCPPIPEDIR_fMRISurf HCPPIPEDIR_fMRIVol HCPPIPEDIR_tfMRI HCPPIPEDIR_dMRI HCPPIPEDIR_dMRITract HCPPIPEDIR_Global HCPPIPEDIR_tfMRIAnalysis HCPCIFTIRWDIR MSMBin HCPPIPEDIR_dMRITractFull HCPPIPEDIR_dMRILegacy AutoPtxFolder EDDYCUDA USEOCTAVE QUNEXENV CONDADIR MSMBINDIR MSMCONFIGDIR R_LIBS FSL_FIX_CIFTIRW FSFAST_HOME SUBJECTS_DIR MINC_BIN_DIR MNI_DIR MINC_LIB_DIR MNI_DATAPATH FSF_OUTPUT_FORMAT ANTSDIR CUDIMOT

========================================================================
Generated by QuNex
------------------------------------------------------------------------
Version: 1.0.4 [QIO]
User: root
System: 321e4d905de6
OS: Debian Linux #48~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Oct  7 11:24:13 UTC 2
------------------------------------------------------------------------

        ██████\                  ║      ██\   ██\
       ██  __██\                 ║      ███\  ██ |
       ██ /  ██ |██\   ██\       ║      ████\ ██ | ██████\ ██\   ██\
       ██ |  ██ |██ |  ██ |      ║      ██ ██\██ |██  __██\\██\ ██  |
       ██ |  ██ |██ |  ██ |      ║      ██ \████ |████████ |\████  /
       ██ ██\██ |██ |  ██ |      ║      ██ |\███ |██   ____|██  ██\
       \██████ / \██████  |      ║      ██ | \██ |\███████\██  /\██\
        \___███\  \______/       ║      \__|  \__| \_______\__/  \__|
            \___|                ║


                       DEVELOPED & MAINTAINED BY:

               Mind & Brain Lab, University of Ljubljana
                       Cho Lab, Yale University

                      COPYRIGHT & LICENSE NOTICE:

Use of this software is subject to the terms and conditions defined in
QuNex LICENSES which can be found in the LICENSES folder of the QuNex
repository or at https://qunex.yale.edu/qunex-registration
========================================================================

---> Setting up Octave


.......................... Running QuNex v1.0.4 [QIO] ..........................


--- Full QuNex call for command: hcp_diffusion

qunex hcp_diffusion --sessionsfolder="/home/yumingz/qunex/diffusion/sessions" --overwrite="no" --bash="module load CUDA/11.5" --parelements="2" --cuda-version="11.5" --batchfile="/home/yumingz/qunex/diffusion/processing/batch.txt" --sessions="HCPA001"

---------------------------------------------------------


# Generated by QuNex 1.0.4 [QIO] on 2025-03-14_01.33.17.107961#
=================================================================
qunex hcp_diffusion \
  --sessionsfolder="/home/yumingz/qunex/diffusion/sessions" \
  --overwrite="no" \
  --bash="module load CUDA/11.5" \
  --parelements="2" \
  --cuda-version="11.5" \
  --sessions="/home/yumingz/qunex/diffusion/processing/batch.txt" \
  --sessionids="HCPA001" \
=================================================================

Starting multiprocessing sessions in /home/yumingz/qunex/diffusion/processing/batch.txt with a pool of 1 concurrent processes


Starting processing of sessions HCPA001 at Friday, 14. March 2025 01:33:17
Running external command: /opt/HCP/HCPpipelines/DiffusionPreprocessing/DiffPreprocPipeline.sh                 --path="/home/yumingz/qunex/diffusion/sessions/HCPA001/hcp"                 --subject="HCPA001"                 --PEdir=2                 --posData="/home/yumingz/qunex/diffusion/sessions/HCPA001/hcp/HCPA001/unprocessed/Diffusion/HCPA001_DWI_dir98_PA.nii.gz@/home/yumingz/qunex/diffusion/sessions/HCPA001/hcp/HCPA001/unprocessed/Diffusion/HCPA001_DWI_dir99_PA.nii.gz"                 --negData="/home/yumingz/qunex/diffusion/sessions/HCPA001/hcp/HCPA001/unprocessed/Diffusion/HCPA001_DWI_dir98_AP.nii.gz@/home/yumingz/qunex/diffusion/sessions/HCPA001/hcp/HCPA001/unprocessed/Diffusion/HCPA001_DWI_dir99_AP.nii.gz"                 --echospacing-seconds="0.000689998"                 --gdcoeffs="NONE"                 --combine-data-flag="1"                 --printcom=""                --gpu=True                --cuda-version=10.2

You can follow command's progress in:
/home/yumingz/qunex/diffusion/processing/logs/comlogs/tmp_hcp_diffusion_HCPA001_2025-03-14_01.33.17.108756.log
------------------------------------------------------------

------------------------------------------------------------
Session id: HCPA001
[started on Friday, 14. March 2025 01:33:17]
Running HCP DiffusionPreprocessing Pipeline [HCPStyleData] ...
---> The following pos direction files were found:
     HCPA001_DWI_dir98_PA.nii.gz
     HCPA001_DWI_dir99_PA.nii.gz
---> The following neg direction files were found:
     HCPA001_DWI_dir98_AP.nii.gz
     HCPA001_DWI_dir99_AP.nii.gz
---> Using image specific EchoSpacing: 0.000689998 s

------------------------------------------------------------
Running HCP Pipelines command via QuNex:

/opt/HCP/HCPpipelines/DiffusionPreprocessing/DiffPreprocPipeline.sh
    --path="/home/yumingz/qunex/diffusion/sessions/HCPA001/hcp"
    --subject="HCPA001"
    --PEdir=2
    --posData="/home/yumingz/qunex/diffusion/sessions/HCPA001/hcp/HCPA001/unprocessed/Diffusion/HCPA001_DWI_dir98_PA.nii.gz@/home/yumingz/qunex/diffusion/sessions/HCPA001/hcp/HCPA001/unprocessed/Diffusion/HCPA001_DWI_dir99_PA.nii.gz"
    --negData="/home/yumingz/qunex/diffusion/sessions/HCPA001/hcp/HCPA001/unprocessed/Diffusion/HCPA001_DWI_dir98_AP.nii.gz@/home/yumingz/qunex/diffusion/sessions/HCPA001/hcp/HCPA001/unprocessed/Diffusion/HCPA001_DWI_dir99_AP.nii.gz"
    --echospacing-seconds="0.000689998"
    --gdcoeffs="NONE"
    --combine-data-flag="1"
    --printcom=""
    --gpu=True
    --cuda-version=10.2
------------------------------------------------------------


Running HCP Diffusion Preprocessing

ERROR: Running HCP Diffusion Preprocessing failed with error 1
...
command executed:
/opt/HCP/HCPpipelines/DiffusionPreprocessing/DiffPreprocPipeline.sh                 --path="/home/yumingz/qunex/diffusion/sessions/HCPA001/hcp"                 --subject="HCPA001"                 --PEdir=2                 --posData="/home/yumingz/qunex/diffusion/sessions/HCPA001/hcp/HCPA001/unprocessed/Diffusion/HCPA001_DWI_dir98_PA.nii.gz@/home/yumingz/qunex/diffusion/sessions/HCPA001/hcp/HCPA001/unprocessed/Diffusion/HCPA001_DWI_dir99_PA.nii.gz"                 --negData="/home/yumingz/qunex/diffusion/sessions/HCPA001/hcp/HCPA001/unprocessed/Diffusion/HCPA001_DWI_dir98_AP.nii.gz@/home/yumingz/qunex/diffusion/sessions/HCPA001/hcp/HCPA001/unprocessed/Diffusion/HCPA001_DWI_dir99_AP.nii.gz"                 --echospacing-seconds="0.000689998"                 --gdcoeffs="NONE"                 --combine-data-flag="1"                 --printcom=""                --gpu=True                --cuda-version=10.2

---> logfile: /home/yumingz/qunex/diffusion/processing/logs/comlogs/error_hcp_diffusion_HCPA001_2025-03-14_01.33.17.108756.log

HCP Diffusion Preprocessing completed on Friday, 14. March 2025 02:12:30
------------------------------------------------------------


---> Final report for command hcp_diffusion
... HCPA001 ---> Error
---> Not all tasks completed fully!

error_hcp_diffusion_HCPA001_2025-03-14_01.33.17.108756.log (79.6 KB)

Thanks for your help!

Best,
Acacius

Hi Acacius,

Can you also please provide the content of ${QUNEX_CONTAINER}, as this depends on whether you are using Docker or Singularity; the two handle GPUs differently.

One thing I noticed is that you should be using bash_pre instead of bash. The bash parameter is not a qunex_container parameter. With qunex_container you have bash_pre, which is executed after you are on the compute node but before entering the container, and bash_post, which is executed on the compute node after entering the container, i.e. inside the container (see General overview — QuNex documentation for additional details).

These things are sometimes tricky to resolve as they are system/host dependent and often outside of our (QuNex) control. In our experience, the --cuda flag works better than --nv, but it requires the NVIDIA Container Toolkit to be installed on your host system.

For additional details, please check the section Using CUDA in the container at General overview — QuNex documentation.

Note that --cuda-version is irrelevant here. As long as you have a CUDA version above 10.2 you should be good to go once we sort out the other things.

To sum up, try:

qunex_container hcp_diffusion \
    --sessionsfolder="${STUDY_FOLDER}/sessions" \
    --batchfile="${STUDY_FOLDER}/processing/batch.txt" \
    --overwrite="yes" \
    --container="${QUNEX_CONTAINER}" \
    --bash_pre="module load CUDA/11.5" \
    --cuda

Replace --cuda with --nv if you do not have the toolkit. Let me know how it goes.
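To decide between the two flags, something like the following can help. This is a minimal sketch; it only checks that the toolkit binaries are on PATH, not that GPU passthrough actually works:

```shell
#!/bin/sh
# Minimal sketch: check whether the NVIDIA Container Toolkit binaries
# are installed; --cuda needs the toolkit, --nv does not.
if command -v nvidia-ctk >/dev/null 2>&1 || command -v nvidia-container-cli >/dev/null 2>&1; then
    echo "NVIDIA Container Toolkit found: try --cuda"
else
    echo "Toolkit not found: fall back to --nv"
fi
```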

Best, Jure

Hi Jure,

I have also tried:

qunex_container hcp_diffusion \
    --sessionsfolder="${STUDY_FOLDER}/sessions" \
    --batchfile="${STUDY_FOLDER}/processing/batch.txt" \
    --overwrite="yes" \
    --container="${QUNEX_CONTAINER}" \
    --bash_pre="module load CUDA/11.5" \
    --nv

But the error report is the same. My QUNEX_CONTAINER is “gitlab.qunex.yale.edu:5002/qunex/qunexcontainer:1.0.4”. I will also try installing the NVIDIA Container Toolkit, following your suggestion.

I will let you know if there are any updates.

Thanks for your kind help!

Best,
Acacius

Another thing explained on the page I linked above is the --cuda_path parameter, which binds the local CUDA installation into the container, ensuring that the CUDA runtime inside the container matches the host CUDA drivers. The assumption here is that they do actually match on your system; a common issue among our users is that CUDA drivers and CUDA runtime do not match on the host. You can try running:

nvcc --version
nvidia-smi

on the host system for some quick CUDA debugging.
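To compare the two outputs programmatically, something like this works. It is a minimal sketch; the sed patterns assume the usual nvcc and nvidia-smi output formats and may need adjusting on your system:

```shell
#!/bin/sh
# Minimal sketch: extract the CUDA runtime version from nvcc's output and
# the maximum driver-supported CUDA version from nvidia-smi's header line.
parse_nvcc_release() { sed -n 's/.*release \([0-9.]*\),.*/\1/p'; }
parse_smi_cuda()     { sed -n 's/.*CUDA Version: \([0-9.]*\).*/\1/p'; }

# On a real host:
#   runtime=$(nvcc --version | parse_nvcc_release)
#   driver=$(nvidia-smi | parse_smi_cuda)
# Demo with a canned nvcc line so the sketch runs anywhere:
printf 'Cuda compilation tools, release 11.5, V11.5.119\n' | parse_nvcc_release  # → 11.5
```

If the driver-supported version printed by nvidia-smi is lower than the runtime version from nvcc, you get exactly the "driver version is insufficient for CUDA runtime version" class of error.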

If you have admin privileges on the system, I would advise you to update the CUDA runtime and CUDA drivers. On our system we use 12.4, but if you are updating anyway, I would recommend going to the latest version (12.8), as everything should be backwards compatible.

Best, Jure