[RESOLVED] dwi_legacy_gpu command fails with eddy_cuda10.2 in QuNex v0.97.3

Hi QuNex Team,

I am running the dwi_legacy_gpu command with QuNex v0.97.3 and encountered an error from eddy_cuda10.2. The error output is below; I am unable to decipher it.

/opt/fsl/fsl-6.0.6.2/bin/eddy_cuda10.2 --imain=/data/Diffusion/10189_DWI_dir64_PA --mask=/data/Diffusion/rawdata/10189_DWI_dir64_PA_nodif_brain_mask --acqp=/data/Diffusion/acqparams/10189_DWI_dir64_PA/acqparams.txt --index=/data/Diffusion/acqparams/10189_DWI_dir64_PA/index.txt --bvecs=/data/Diffusion/10189_DWI_dir64_PA.bvec --bvals=/data/Diffusion/10189_DWI_dir64_PA.bval --fwhm=10,0,0,0,0 --ff=10 --nvoxhp=2000 --flm=quadratic --out=/data/Diffusion/eddy/10189_DWI_dir64_PA_eddy_corrected --data_is_shelled --repol -v

Reading images
Performing volume-to-volume registration
Running Register
��xݟ
��xݟ
��xݟ
��xݟ
EDDY::: Eddy failed with message q���U

QuNex command:

qunex dwi_legacy_gpu \
 --sessionsfolder='/dataset1/sessions' \
 --sessions='10189' \
 --diffdatasuffix='DWI_dir64_PA' \
 --usefieldmap='no' \
 --pedir=2 \
 --echospacing='0.69' \
 --unwarpdir='-y' \
 --scanner='siemens'

nvidia-smi output on the GPU instance:

I changed the default CUDA version to 10.2 because with 10.1 I encountered this error:

/opt/qunex/bash/qx_utilities/dwi_legacy_gpu.sh: line 565: /opt/fsl/fsl/bin/eddy_cuda10.1: No such file or directory
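Before changing the default CUDA version, it can help to check which eddy_cuda binaries actually exist inside the container. This is a minimal sketch, assuming the FSL bin directory is /opt/fsl/fsl/bin as reported in the error message; the function name list_eddy_cuda is hypothetical and not part of QuNex or FSL:

```shell
# Hypothetical helper: list the eddy_cuda* binaries in an FSL bin directory.
# Defaults to the directory from the error message; pass another path if
# FSL lives elsewhere in your container.
list_eddy_cuda() {
  local fsl_bin="${1:-/opt/fsl/fsl/bin}"
  local f
  for f in "$fsl_bin"/eddy_cuda*; do
    # When nothing matches, the glob stays literal; skip that case.
    [ -e "$f" ] || continue
    basename "$f"
  done
}
```

Running list_eddy_cuda inside the container prints one binary name per line, so a missing eddy_cuda10.1 shows up as an absent entry instead of a runtime error halfway through the pipeline.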

Could you please assist me in resolving this issue? I have attached the complete log output of the dwi_legacy_gpu command below. Let me know if you need any more information. Thanks in advance.

Thanks
Suhas Reddy

qunex dwi_legacy_gpu_error.log (4.3 KB)

Hi there,

Support for CUDA 10.2 and above currently works only in the hcp_diffusion command. The next version (to be released in a week or so) will extend this support to all DWI pipelines. Testing of the next release is in its final phase, so a release by the end of May looks likely.

Jure

Hi Jure,

Thank you for the update. I am looking forward to the release.

Thanks,
Suhas

Hi, we will be releasing a new version that should fix this soon. Please test it and let us know if there are any issues.

Jure

Dear Jure,

I tested the dwi_legacy_gpu command with the latest QuNex version (v0.98.0), but eddy_cuda10.2 still fails with the same unreadable output:

Reading images
Performing volume-to-volume registration
Running Register
�����
�����
�����
0}��U
EDDY::: Eddy failed with message `q��U

Complete log output:
[v0.98.0]_qunex dwi_pipeline_error.log (4.8 KB)

Instance details:

Kindly review and let me know if you need any more info from my end.

Thanks in advance
Suhas

Hi Suhas,

the problem with this error:

�����
�����
�����
0}��U
EDDY::: Eddy failed with message `q��U

is that it is not necessarily a CUDA or QuNex container error. While developing the DWI rework I ran into it a number of times. Sometimes it was caused by bugs in my code, but often by issues with the input data. The trouble is that the message is unreadable to humans, which makes it hard to resolve.

What I found easiest is to add the --nogpu='yes' parameter to the dwi_legacy_gpu call. The command will then run without CUDA. If it works, the error is CUDA-related and we can debug it with that in mind. If it does not, it will throw a human-readable error.
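The diagnostic procedure above can be scripted. The sketch below is hypothetical (run_with_nogpu_fallback is not part of QuNex): it runs the given command unchanged and, only if that fails, reruns it once with --nogpu='yes' appended. It assumes that qunex returns a non-zero exit status when eddy fails; if it does not, check the command log instead.

```shell
# Hypothetical helper, not part of QuNex.
# Runs "$@" as given (GPU path). If it exits non-zero, reruns it once
# with --nogpu='yes' appended (CPU path) and reports which path worked.
run_with_nogpu_fallback() {
  if "$@"; then
    echo "GPU run succeeded" >&2
    return 0
  fi
  echo "GPU run failed; retrying with --nogpu='yes'" >&2
  if "$@" --nogpu='yes'; then
    echo "CPU run succeeded: the failure is likely CUDA-related" >&2
    return 0
  fi
  echo "CPU run failed too: check its log for a readable error" >&2
  return 1
}

# Example call (include --overwrite='yes' so the rerun replaces partial output):
# run_with_nogpu_fallback qunex dwi_legacy_gpu \
#   --sessionsfolder='dataset_test/sessions' \
#   --sessions='70060' \
#   --diffdatasuffix='DWI_dir64_PA' \
#   --usefieldmap='no' \
#   --pedir='2' \
#   --echospacing='0.69' \
#   --unwarpdir='-y' \
#   --overwrite='yes'
```

Appending the flag only on the second attempt keeps the first run identical to the failing one, so a CPU success points at the GPU path rather than at a changed call.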

Also, could you provide the full call you used, so we are on the same page?

Jure

Dear @demsarjure,

As requested, I tested the command on the same input data both without and with the --nogpu='yes' parameter. The results are as follows.

Run details without --nogpu='yes':

QuNex command:

qunex dwi_legacy_gpu \
 --sessionsfolder='dataset_test/sessions' \
 --sessions='70060' \
 --diffdatasuffix='DWI_dir64_PA' \
 --usefieldmap='no' \
 --pedir='2' \
 --echospacing='0.69' \
 --unwarpdir='-y' \
 --overwrite='yes'

Eddy failed with a similar error message as above. The complete log is as follows:


........................ Running QuNex v0.98.0 ........................


NOTE: Processing without FieldMap (TE option not needed)

Running dwi_legacy_gpu with the following parameters:
--------------------------------------------------------------
   Study Folder: dataset_test/sessions
   Sessions Folder: dataset_test/sessions
   Sessions: 70060
   Study Log Folder:
   Using FieldMap: no
   Echo Spacing: 0.69
   Phase Encoding Direction: 2
   TE value for Fieldmap:
   EPI Unwarp Direction: -y
   Diffusion Data Suffix Name: DWI_dir64_PA
   Overwrite prior run: append
   No GPU: no


--- Full QuNex call for command: dwi_legacy_gpu

/opt/qunex/bash/qx_utilities/dwi_legacy_gpu.sh     --sessionsfolder=dataset_test/sessions     --session=70060     --usefieldmap=no     --pedir=2     --echospacing=0.69     --te=     --unwarpdir=-y     --diffdatasuffix=DWI_dir64_PA     --overwrite=append     --nogpu=no

--------------------------------------------------------------




--------------------------------------------------------------

   Running dwi_legacy_gpu locally on ip-10-7-68-7.ec2.internal
   Command log:     dataset_test/sessions/processing/logs/runlogs/Log-dwi_legacy_gpu_2023-06-22_14.35.05.838396.log
   Command output: dataset_test/sessions/processing/logs/comlogs/tmp_dwi_legacy_gpu_70060_2023-06-22_14.35.05.838396.log

--------------------------------------------------------------



-- dwi_legacy_gpu.sh: Specified Command-Line Options - Start --
   Sessionsfolder: dataset_test/sessions
   Session: 70060
   Using fieldmap: no
   Diffusion data sufix: DWI_dir64_PA
   Overwrite: append
   No GPU: no
-- dwi_legacy_gpu.sh: Specified Command-Line Options - End --

 ------------------------- Start of work --------------------------------

 --- Establishing paths for all input and output folders:

T1w folder:           dataset_test/sessions/70060/hcp/70060/T1w
Diffusion folder:     dataset_test/sessions/70060/hcp/70060/Diffusion
T1w diffusion folder: dataset_test/sessions/70060/hcp/70060/T1w/Diffusion

 --- Backing up previous dataset_test/sessions/70060/hcp/70060/Diffusion as 2023-06-22_14.35.05.923703 ...

cp: omitting directory ‘dataset_test/sessions/70060/hcp/70060/Diffusion’
 --- Backing up previous dataset_test/sessions/70060/hcp/70060/T1w/Diffusion as 2023-06-22_14.35.05.923703 ...

cp: omitting directory ‘dataset_test/sessions/70060/hcp/70060/T1w/Diffusion’
 --- Copying unprocesed data into the Diffusion folder

Copying dataset_test/sessions/70060/hcp/70060/unprocessed/Diffusion/70060_DWI_dir64_PA.bval
Copying dataset_test/sessions/70060/hcp/70060/unprocessed/Diffusion/70060_DWI_dir64_PA.bvec
Copying dataset_test/sessions/70060/hcp/70060/unprocessed/Diffusion/70060_DWI_dir64_PA.nii.gz
 --- Setting up acquisition parameters:

Check acquisition parameter files:

acqparams.txt
index.txt


 --- Omitting FieldMap step...

 Getting the first volume of each DWI image...

 Run BET on the B0 EPI image to create masks...

IN=dataset_test/sessions/70060/hcp/70060/Diffusion/rawdata/70060_DWI_dir64_PA_nodif
OUT=dataset_test/sessions/70060/hcp/70060/Diffusion/rawdata/70060_DWI_dir64_PA_nodif_brain
bet2opts= -m -f 0.35 -v
verbose=1
debug=0
variation=0
min 0 thresh2 0 thresh 79.443 thresh98 794.43 max 4095
c-of-g 97.93 87.9159 55.0544 mm
radius 76.5741 mm
median within-brain intensity 317
self-intersection total 110.045 (threshold=4000.0)

 --- Checking if PreFreeSurfer was completed to obtain inputs for epi_reg...

 PreFreeSurfer data found:

dataset_test/sessions/70060/hcp/70060/T1w/T1w_acpc_dc_restore_brain.nii.gz

 FAST already completed.

 Setting inputs for epi_reg:

 --> T1w Data:             dataset_test/sessions/70060/hcp/70060/T1w/T1w_acpc_dc_restore
 --> T1w BET+FAST Data:    dataset_test/sessions/70060/hcp/70060/T1w/T1w_acpc_dc_restore_brain
 --> WM Segment FAST Data: dataset_test/sessions/70060/hcp/70060/T1w/T1w_acpc_dc_restore_brain_pve_2
 --> T1w Brain Mask Data:  dataset_test/sessions/70060/hcp/70060/T1w/T1w_acpc_brain_mask

 --- Running eddy...

 Using the following eddy binary: /opt/fsl/fsl/bin/eddy_cuda10.2

Running command:

 /opt/fsl/fsl/bin/eddy_cuda10.2 --imain=dataset_test/sessions/70060/hcp/70060/Diffusion/70060_DWI_dir64_PA --mask=dataset_test/sessions/70060/hcp/70060/Diffusion/rawdata/70060_DWI_dir64_PA_nodif_brain_mask --acqp=dataset_test/sessions/70060/hcp/70060/Diffusion/acqparams/70060_DWI_dir64_PA/acqparams.txt --index=dataset_test/sessions/70060/hcp/70060/Diffusion/acqparams/70060_DWI_dir64_PA/index.txt --bvecs=dataset_test/sessions/70060/hcp/70060/Diffusion/70060_DWI_dir64_PA.bvec --bvals=dataset_test/sessions/70060/hcp/70060/Diffusion/70060_DWI_dir64_PA.bval --fwhm=10,0,0,0,0 --ff=10 --nvoxhp=2000 --flm=quadratic --out=dataset_test/sessions/70060/hcp/70060/Diffusion/eddy/70060_DWI_dir64_PA_eddy_corrected --data_is_shelled --repol -v

Reading images
Performing volume-to-volume registration
Running Register
c�')V
@]�')V
]4
]4
EDDY::: Eddy failed with message pQ�')V 

Run details with --nogpu='yes':

QuNex command:

qunex dwi_legacy_gpu \
 --sessionsfolder='dataset_test/sessions' \
 --sessions='70060' \
 --diffdatasuffix='DWI_dir64_PA' \
 --usefieldmap='no' \
 --pedir='2' \
 --echospacing='0.69' \
 --unwarpdir='-y' \
 --overwrite='yes' \
 --nogpu='yes'

The dwi_legacy_gpu command ran successfully. The complete log is attached below:

qunex_dwi_legacy_success_output.log (10.4 KB)

OK, this is good. At least we have a plan B in case we cannot figure out what goes wrong when GPU mode is enabled. The problem is that the command works on all of our CUDA systems, which have completely different GPU cards and CUDA versions, so I am unable to reproduce your error. Furthermore, the reported error is utter gibberish and gives no hint about the cause.

One more thing you could try is to add

--cuda_path=<PATH TO CUDA ON YOUR SYSTEM>

to the qunex_container call. The value is the path to your local CUDA installation, usually something like /usr/local/cuda-12.
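To find a value for --cuda_path, you can list the CUDA directories installed on the host. This is a minimal sketch, assuming the toolkit lives under /usr/local as in the example path above; find_cuda_dirs is a hypothetical helper. Note that nvidia-smi reports the highest CUDA version the driver supports, which is not necessarily the toolkit version that is installed.

```shell
# Hypothetical helper: print every directory matching <prefix>/cuda*
# (default prefix /usr/local), e.g. /usr/local/cuda-12. Each result is a
# candidate value for --cuda_path.
find_cuda_dirs() {
  local prefix="${1:-/usr/local}"
  local d
  for d in "$prefix"/cuda*; do
    # Skip plain files and the literal pattern left when nothing matches.
    [ -d "$d" ] || continue
    printf '%s\n' "$d"
  done
}
```

On many hosts /usr/local/cuda is a symlink to a versioned directory; [ -d ] follows symlinks, so it is listed alongside the versioned path, and either can be passed as --cuda_path.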

Jure