[RESOLVED] CUDA error with dwi_bedpostx_gpu

Description:

I am trying to run the parcellation pipeline on the example data provided in the quickstart guide, on Ubuntu 20.04 on a local machine (not a compute server). The turnkey procedure in the guide executes with no errors, and I am able to view the output scenes without issue in Connectome Workbench. Afterwards, I ran hcp_diffusion with this command:

qunex_container hcp_diffusion --sessionsfolder="/home/samnt/qunex/quickstart/sessions" --batchfile="/home/samnt/qunex/quickstart/processing/HCPA001_parameters.txt" --bash="module load CUDA/9.1.85" --nv --container="/home/samnt/qunex/qunexcontainer/qunex_suite-0.94.9.sif" --overwrite="yes"

which executes without error. Then, to begin the parcellation pipeline (which I think should be dwi_bedpostx_gpu → dwi_pre_tractography → dwi_probtrackx_dense_gpu → dwi_parcellate), I attempted to run dwi_bedpostx_gpu with this command:

qunex_container dwi_bedpostx_gpu --container="/home/samnt/qunex/qunexcontainer/qunex_suite-0.94.9.sif" --sessionsfolder='/home/samnt/qunex/quickstart/sessions' --sessions="HCPA001" --bash="module load CUDA/9.1.85" --nv --overwrite="yes"

but it crashes with the following error (from the log):

...................Allocated GPU 0...................
Log directory is: /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/diff_parts/data_part_0000
Number of Voxels to compute in this part: 374383
Number of Directions: 199
Rician noise model requested. Non-linear parameter initialization will be performed, overriding other initialization options!

SubPart 1 of 29: processing 12909 voxels
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  parallel_for failed: no kernel image is available for execution on the device
/home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/commands.txt: line 1:  3687 Aborted                 (core dumped) /opt/qunex/qx_library/etc/fsl_gpu_binaries/bedpostx_gpu_cuda_9.1/bedpostx_gpu/bin/xfibres_gpu --data=/home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/data_0 --mask=/home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/nodif_brain_mask -b /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/bvals -r /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/bvecs --forcedir --logdir=/home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/diff_parts/data_part_0000 --nf=3 --fudge=1 --bi=1000 --nj=1250 --se=25 --model=2 --cnonlinear --rician /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion 0 1 374383

I have searched the Internet for suggestions to fix the "no kernel image is available for execution on the device" error, but unfortunately none have worked. The outcome is the same with both the Singularity container and the Docker container. The error (plus some Googling) suggests an incompatibility between the CUDA version (9.1.85) and the device (NVIDIA GeForce RTX 3090), so I also tested with an older NVIDIA GeForce GTX 1060, and so far the dwi_bedpostx_gpu command is processing, albeit much more slowly than a 3090 presumably would. Do you have any tips to resolve this issue for the 3090? Thanks for any input you can provide!
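For reference, here is a sketch of how this kind of mismatch can be checked (assuming cuobjdump, which ships with the CUDA toolkit, is available inside the container; the binary path is taken from the log above):

# List the GPU architectures (sm_XX) embedded in the shipped binary; the
# "no kernel image" abort means none of them match the card in this machine.
singularity exec /home/samnt/qunex/qunexcontainer/qunex_suite-0.94.9.sif \
  cuobjdump --list-elf \
  /opt/qunex/qx_library/etc/fsl_gpu_binaries/bedpostx_gpu_cuda_9.1/bedpostx_gpu/bin/xfibres_gpu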

Systems:

  • Custom build:
    • CPU: AMD Ryzen 9 5950X
    • RAM: 128 GB
    • GPU: EVGA GeForce RTX 3090 FTW3 Ultra
    • OS (lsb_release -a): Ubuntu 20.04.5 LTS
    • Kernel (uname -msr): Linux 5.15.0-46-generic x86_64
    • CUDA version (nvcc --version): V11.7.99 (also tested with same CUDA & driver versions as build #2 below, same error)
    • NVIDIA Driver version (nvidia-smi): NVIDIA-SMI 515.65.01 Driver Version: 515.65.01 CUDA Version: 11.7
  • Custom build #2:
    • CPU: Intel Core i7-6950X
    • RAM: 128 GB
    • GPU: NVIDIA GeForce GTX 1060
    • OS (lsb_release -a): Ubuntu 20.04.3 LTS
    • Kernel (uname -msr): Linux 5.15.0-48-generic x86_64
    • CUDA version (nvcc --version): V9.1.85
    • NVIDIA Driver version (nvidia-smi): NVIDIA-SMI 470.141.03 Driver Version: 470.141.03 CUDA Version: 11.4

Quick disclaimer: I come from an electrophysiology background and have little experience with MRI, but I'm definitely learning more trying to get this to work :). I'm also relatively new to CUDA.

Logs:

# Generated by QuNex 0.94.9 on 2022-09-19_09.37.53.570409
#
 ------------------------- Start of work --------------------------------
 Note: The fibers parameter is not set, using default [3]
 Note: The weight parameter is not set, using default [1]
 Note: The burnin parameter is not set, using default [1000]
 Note: The jumps parameter is not set, using default [1250]
 Note: The sample parameter is not set, using default [25]
 Note: The model parameter is not set, using default [2]
 Note: The rician parameter is not set, using default [yes]

 --> Executing qunex.sh dwi_bedpostx_gpu:
     Study folder: /home/samnt/qunex/quickstart
     Sessions Folder: /home/samnt/qunex/quickstart/sessions
     Session: HCPA001
     Number of fibers: 3
     ARD weights: 1
     Burnin period: 1000
     Number of jumps: 1250
     Sample every: 25
     Model type: 2
     Rician flag: yes
     Diffusion data suffix: 
     Overwrite prior run: yes

 --> Removing existing bedpostx run for HCPA001...

 --> Checking if bedpostx was completed on HCPA001...

 --> Prior bedpostx run not found or incomplete for HCPA001. Setting up new run...

 --> Generating log folder

 --> Not using gradient nonlinearities flag -g

 --> Running FSL command:
    /opt/qunex/bash/qx_utilities/diffusion_tractography_dense/fsl_gpu/bedpostx_gpu /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion/. /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/. -n 3 -w 1 -b 1000 -j 1250 -s 25 -model 2 --rician
---------------------------------------------
------------ BedpostX GPU Version -----------
---------------------------------------------

subjectdir is /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion

bedpostxdir is /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX


-- Making bedpostx directory structure


-- Copying files to bedpostx directory


-- Pre-processing stage


-- Queuing parallel processing stage


----- Bedpostx Monitor -----

...................Allocated GPU 0...................
Log directory is: /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/diff_parts/data_part_0000
Number of Voxels to compute in this part: 374383
Number of Directions: 199
Rician noise model requested. Non-linear parameter initialization will be performed, overriding other initialization options!

SubPart 1 of 29: processing 12909 voxels
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  parallel_for failed: no kernel image is available for execution on the device
/home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/commands.txt: line 1:  3687 Aborted                 (core dumped) /opt/qunex/qx_library/etc/fsl_gpu_binaries/bedpostx_gpu_cuda_9.1/bedpostx_gpu/bin/xfibres_gpu --data=/home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/data_0 --mask=/home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/nodif_brain_mask -b /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/bvals -r /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/bvecs --forcedir --logdir=/home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/diff_parts/data_part_0000 --nf=3 --fudge=1 --bi=1000 --nj=1250 --se=25 --model=2 --cnonlinear --rician /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion 0 1 374383

-- Queuing post processing stage

-- Merging parts

Log directory is: /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/diff_parts
/home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/nodif_brain_mask

-- Removing intermediate files


-- Creating identity xfm


-- Finished bedpostx_gpu


 --> Checking outputs...

 --> 9 merged samples for HCPA001 found.

 --> bedpostx outputs missing or incomplete for HCPA001

 ----------------------------------------------------

 --> bedpostx run not found or incomplete for HCPA001. Something went wrong.
     Check output: /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX

 ERROR: bedpostx run did not complete successfully

All parts processed

Hi, welcome to the QuNex forums!

What is interesting is that hcp_diffusion works fine with the default version of CUDA while bedpostx has issues. In any case, you should be able to run bedpostx using a newer version of CUDA; try the command below:

qunex_container dwi_bedpostx_gpu \
  --container="/home/samnt/qunex/qunexcontainer/qunex_suite-0.94.9.sif" \
  --sessionsfolder="/home/samnt/qunex/quickstart/sessions" \
  --sessions="HCPA001" \
  --bash_post="export DEFAULT_CUDA_VERSION=11" \
  --nv \
  --overwrite="yes"

I added bash_post, which is bash code that gets executed post (after) entering the container but before executing the QuNex command. This should force bedpostx to use a newer version of CUDA. Also, I am not sure that you need --bash="module load CUDA/9.1.85"; module loading is used on HPC (high-performance computing) systems for loading drivers, and based on your description you are running things on your local system. If you are using an HPC system and need to load modules, then you need to change the CUDA version here as well.

Thanks for the quick reply @demsarjure! It seems the --bash_post got past the previous error, but now bedpostx is running into permission issues with /opt/qunex/qx_library/etc/fsl_gpu_binaries/bedpostx_gpu_cuda_11/bedpostx_gpu/bin/split_parts_gpu and /opt/qunex/qx_library/etc/fsl_gpu_binaries/bedpostx_gpu_cuda_11/bedpostx_gpu/bin/xfibres_gpu (see the log below). I looked at the permissions within the Singularity container, and it seems that the items in bedpostx_gpu_cuda_11/bedpostx_gpu/bin are not executable, while the items in bedpostx_gpu_cuda_9.1/bedpostx_gpu/bin are. I'm new to Singularity as well, but from what I can tell, I cannot manually change those permissions with

chmod -R +x /opt/qunex/qx_library/etc/fsl_gpu_binaries/bedpostx_gpu_cuda_11/bedpostx_gpu/bin/

And I don't have the root password to elevate permissions within the Singularity container. Any thoughts on how I can resolve this issue? Hoping it's something I did wrong…
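One workaround I can think of (just a sketch, untested; /tmp/bpx_cuda_11 is an arbitrary scratch path) is to copy the directory out of the read-only image, fix the permissions on the host, and bind the fixed copy back over the original path:

# copy the binaries out of the image (/tmp is bound into the container by default)
singularity exec /home/samnt/qunex/qunexcontainer/qunex_suite-0.94.9.sif \
  cp -r /opt/qunex/qx_library/etc/fsl_gpu_binaries/bedpostx_gpu_cuda_11 /tmp/bpx_cuda_11
# make them executable on the host
chmod -R +x /tmp/bpx_cuda_11/bedpostx_gpu/bin
# then add to the qunex_container call:
#   --bind="/tmp/bpx_cuda_11:/opt/qunex/qx_library/etc/fsl_gpu_binaries/bedpostx_gpu_cuda_11"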

Log:

# Generated by QuNex 0.94.9 on 2022-09-22_08.52.11.910936
#
 ------------------------- Start of work --------------------------------
 Note: The fibers parameter is not set, using default [3]
 Note: The weight parameter is not set, using default [1]
 Note: The burnin parameter is not set, using default [1000]
 Note: The jumps parameter is not set, using default [1250]
 Note: The sample parameter is not set, using default [25]
 Note: The model parameter is not set, using default [2]
 Note: The rician parameter is not set, using default [yes]

 --> Executing qunex.sh dwi_bedpostx_gpu:
     Study folder: /home/samnt/qunex/quickstart
     Sessions Folder: /home/samnt/qunex/quickstart/sessions
     Session: HCPA001
     Number of fibers: 3
     ARD weights: 1
     Burnin period: 1000
     Number of jumps: 1250
     Sample every: 25
     Model type: 2
     Rician flag: yes
     Diffusion data suffix: 
     Overwrite prior run: yes

 --> Removing existing bedpostx run for HCPA001...

 --> Checking if bedpostx was completed on HCPA001...

 --> Prior bedpostx run not found or incomplete for HCPA001. Setting up new run...

 --> Generating log folder

 --> Not using gradient nonlinearities flag -g

 --> Running FSL command:
    /opt/qunex/bash/qx_utilities/diffusion_tractography_dense/fsl_gpu/bedpostx_gpu /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion/. /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/. -n 3 -w 1 -b 1000 -j 1250 -s 25 -model 2 --rician
---------------------------------------------
------------ BedpostX GPU Version -----------
---------------------------------------------

subjectdir is /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion

bedpostxdir is /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX


-- Making bedpostx directory structure


-- Copying files to bedpostx directory


-- Pre-processing stage

/opt/qunex/bash/qx_utilities/diffusion_tractography_dense/fsl_gpu/bedpostx_gpu: line 285: /opt/qunex/qx_library/etc/fsl_gpu_binaries/bedpostx_gpu_cuda_11/bedpostx_gpu/bin/split_parts_gpu: Permission denied

-- Queuing parallel processing stage


----- Bedpostx Monitor -----
/home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/commands.txt: line 1: /opt/qunex/qx_library/etc/fsl_gpu_binaries/bedpostx_gpu_cuda_11/bedpostx_gpu/bin/xfibres_gpu: Permission denied

-- Queuing post processing stage

-- Merging parts

Log directory is: /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/diff_parts
/home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/nodif_brain_mask

-- Removing intermediate files


-- Creating identity xfm


-- Finished bedpostx_gpu


 --> Checking outputs...

 --> 9 merged samples for HCPA001 found.

 --> bedpostx outputs missing or incomplete for HCPA001

 ----------------------------------------------------

 --> bedpostx run not found or incomplete for HCPA001. Something went wrong.
     Check output: /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX

 ERROR: bedpostx run did not complete successfully

All parts processed

Hi, thanks for reporting this. This seems like a permissions issue on our end; I will fix it for the next container release.

Are permissions messed up only for CUDA 11, or also for CUDA 10.1? If they are OK for 10.1, you can try running bedpostx with that version.

Hi @demsarjure, thanks again for the quick reply! Yeah I just checked 10.1, and it also does not have execute permissions. 8.0 does, however.

I also noticed that in /opt/qunex/qx_library/etc/fsl_gpu_binaries there is no probtrackx_gpu_cuda_11 directory, even though there are probtrackx_gpu_cuda_10.1, probtrackx_gpu_cuda_9.1, and probtrackx_gpu_cuda_8.0 directories. Should there be one for CUDA 11, or should I use 10.1 regardless? Furthermore, I noticed that the item in /opt/qunex/qx_library/etc/fsl_gpu_binaries/probtrackx_gpu_cuda_10.1 is not executable, while the equivalents in 9.1 and 8.0 are.
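(For reference, I checked the permissions with something along these lines; a sketch, using the same image as above:)

singularity exec /home/samnt/qunex/qunexcontainer/qunex_suite-0.94.9.sif \
  ls -l /opt/qunex/qx_library/etc/fsl_gpu_binaries/probtrackx_gpu_cuda_10.1 \
        /opt/qunex/qx_library/etc/fsl_gpu_binaries/bedpostx_gpu_cuda_10.1/bedpostx_gpu/bin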

Let me know if I can provide any more information about those!

Thanks,

Sam

Hi, there is no probtrackx_gpu_cuda_11 because it did not exist at the time we were updating these compiled CUDA binaries. I will check if it exists now and add it for the next release.

I just checked, and 10.2 is the latest one (Probtrackx GPU).

The 0.94.14 version that will be released tomorrow or on Monday will have fixed privileges for these libraries.

Hey @demsarjure, thanks for updating the container! I just downloaded and tested it, and I am hitting a different issue: bedpostx_gpu is unable to access xfibres_gpu with both CUDA 11 and CUDA 10.1. I checked inside the Singularity container, and the file is present and executable, so I'm confused about what the issue may be. Thanks for any help you can provide!

# Generated by QuNex 0.94.14 on 2022-10-06_10.00.42.881992
#
 ------------------------- Start of work --------------------------------
 Note: The fibers parameter is not set, using default [3]
 Note: The weight parameter is not set, using default [1]
 Note: The burnin parameter is not set, using default [1000]
 Note: The jumps parameter is not set, using default [1250]
 Note: The sample parameter is not set, using default [25]
 Note: The model parameter is not set, using default [2]
 Note: The rician parameter is not set, using default [yes]

 --> Executing qunex.sh dwi_bedpostx_gpu:
     Study folder: /home/samnt/qunex/quickstart
     Sessions Folder: /home/samnt/qunex/quickstart/sessions
     Session: HCPA001
     Number of fibers: 3
     ARD weights: 1
     Burnin period: 1000
     Number of jumps: 1250
     Sample every: 25
     Model type: 2
     Rician flag: yes
     Overwrite prior run: yes

 --> Removing existing bedpostx run for HCPA001...

 --> Checking if bedpostx was completed on HCPA001...

 --> Prior bedpostx run not found or incomplete for HCPA001. Setting up new run...

 --> Generating log folder

 --> Not using gradient nonlinearities flag -g

 --> Running FSL command:
    /opt/qunex/bash/qx_utilities/diffusion_tractography_dense/fsl_gpu/bedpostx_gpu /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion/. /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/. -n 3 -w 1 -b 1000 -j 1250 -s 25 -model 2 --rician
---------------------------------------------
------------ BedpostX GPU Version -----------
---------------------------------------------

subjectdir is /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion

bedpostxdir is /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX


-- Making bedpostx directory structure


-- Copying files to bedpostx directory


-- Pre-processing stage


-- Queuing parallel processing stage


----- Bedpostx Monitor -----
/opt/qunex/qx_library/etc/fsl_gpu_binaries/bedpostx_gpu_cuda_10.1/bedpostx_gpu/bin/xfibres_gpu: error while loading shared libraries: libcudart.so.10.1: cannot open shared object file: No such file or directory

-- Queuing post processing stage

-- Merging parts

Log directory is: /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/diff_parts
/home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/nodif_brain_mask

-- Removing intermediate files


-- Creating identity xfm


-- Finished bedpostx_gpu


 --> Checking outputs...

 --> 9 merged samples for HCPA001 found.

 --> bedpostx outputs missing or incomplete for HCPA001

 ----------------------------------------------------

 --> bedpostx run not found or incomplete for HCPA001. Something went wrong.
     Check output: /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX

 ERROR: bedpostx run did not complete successfully

All parts processed

The container is unable to access the CUDA libraries; the actual error is:

libcudart.so.10.1: cannot open shared object file: No such file or directory

So libcudart.so.X.Y is missing. This is the CUDA Runtime API library and should be installed on the local system. It seems like you have CUDA 9.1 installed, since that library is found? Can you provide the exact qunex call you are making? Thanks!
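To see which runtime a given binary expects versus what the loader inside the container can actually resolve, a check along these lines should work (a sketch; adjust the image path to your local one):

# print the libcudart dependency of the CUDA 10.1 build and whether it resolves
singularity exec --nv qunex_suite-0.94.14.sif \
  ldd /opt/qunex/qx_library/etc/fsl_gpu_binaries/bedpostx_gpu_cuda_10.1/bedpostx_gpu/bin/xfibres_gpu \
  | grep -i cudart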

Hi @demsarjure, I had CUDA toolkit 11.7 installed when testing with 10.1 and 11, so it makes sense that bedpostx_gpu couldn't find libcudart.so.10.1. I'm also guessing it's looking for an earlier version than 11.7 when given --bash_post="export DEFAULT_CUDA_VERSION=11". The command I ran was

qunex_container dwi_bedpostx_gpu \
    --container="/home/samnt/qunex/qunexcontainer/qunex_suite-0.94.14.sif" \
    --sessionsfolder="/home/samnt/qunex/quickstart/sessions" \
    --sessions="HCPA001" \
    --bash_post="export DEFAULT_CUDA_VERSION=10.1" \
    --nv \
    --overwrite="yes"

(I also tested --bash_post="export DEFAULT_CUDA_VERSION=11" here.)

I just tried installing CUDA toolkit 10.1 alongside the 11.7 installation, and unfortunately it returned the same error. /usr/local/cuda points to version 10.1, and I confirmed cudart-10-1 was installed with the rest of cuda-toolkit-10-1. Is there any more information I could provide?
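For completeness, checks along these lines confirm the host-side setup (a sketch; ldconfig -p lists the libraries the host's dynamic loader can resolve):

ls -l /usr/local/cuda            # symlink; should point at /usr/local/cuda-10.1
ldconfig -p | grep libcudart     # CUDA runtime libraries visible to the host loader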

Let me retest the latest container with CUDA 10/11 on our end (the default that we regularly test against is CUDA 9.1); I will let you know how it goes early next week.

That way we will know whether the issues are due to something misconfigured in the container or something on your end.

Thank you, I’ll look out for an update from you next week!

I think I found the culprit. The QuNex container already includes CUDA 9.1. This is not for Singularity's sake, since Singularity cannot use the CUDA inside the container and needs the external CUDA libraries; the internal CUDA is there for Docker containers, which do need CUDA installed within the container. The problem is that with Singularity, the environment inside the container finds libcudart.so.9.1 instead of libcudart.so.10.1. Luckily, there is a workaround; below is a call that worked for me:

qunex_container dwi_bedpostx_gpu \
  --sessionsfolder="/gpfs/gibbs/pi/n3/Studies/MBLab/jd_tests/d1_study/sessions" \
  --sessions="S4453_P49_JV" \
  --overwrite="yes" \
  --bash_pre="module load CUDA/10.1.105" \
  --bash_post="export DEFAULT_CUDA_VERSION=10.1" \
  --bind="/gpfs/loomis/apps/avx/software/CUDA/10.1.105:/usr/local/cuda/" \
  --nv \
  --container="/gpfs/gibbs/pi/n3/software/Singularity/qunex_suite-0.94.14.sif" \
  --scheduler="SLURM,time=4:00:00,ntasks=1,cpus-per-task=2,mem=64GB,partition=gpu,gpus=1,jobname=gpu_test"

The crucial line is:

  --bind="/gpfs/loomis/apps/avx/software/CUDA/10.1.105:/usr/local/cuda/" \

With this I am binding the CUDA folder on my system (/gpfs/loomis/apps/avx/software/CUDA/10.1.105) over the CUDA installation inside the container (/usr/local/cuda/). This way the CUDA 9.1 inside the container is overlaid with the proper version, CUDA 10.1.
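You can verify that the bind took effect before launching the full pipeline, e.g. (a sketch, reusing the paths from the call above):

# the host CUDA runtime should now be visible at /usr/local/cuda in the container
singularity exec --nv \
  --bind /gpfs/loomis/apps/avx/software/CUDA/10.1.105:/usr/local/cuda \
  /gpfs/gibbs/pi/n3/software/Singularity/qunex_suite-0.94.14.sif \
  bash -c 'ls /usr/local/cuda/lib64/libcudart.so*'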

Please test this solution and let me know how it works, thanks. You will have to change the first path in the --bind parameter so it matches your path to CUDA 10.1.

We will try to think of a better way to handle this, but for the moment this is the workaround we are stuck with :).

Cheers, Jure

Hey Jure,

I hate to continue to be the bearer of bad news, but unfortunately I don't think that worked either :( It returned the same error as in my original post. The command I ran:

qunex_container dwi_bedpostx_gpu \
  --container="/home/samnt/qunex/qunexcontainer/qunex_suite-0.94.14.sif" \
  --sessionsfolder="/home/samnt/qunex/quickstart/sessions" \
  --sessions="HCPA001" \
  --bash_pre="module load CUDA/10.1" \
  --bash_post="export DEFAULT_CUDA_VERSION=10.1" \
  --bind="/usr/local/cuda-10.1/:/usr/local/cuda/" \
  --nv \
  --overwrite="yes"

The log:

# Generated by QuNex 0.94.14 on 2022-10-10_09.22.51.844713
#
 ------------------------- Start of work --------------------------------
 Note: The fibers parameter is not set, using default [3]
 Note: The weight parameter is not set, using default [1]
 Note: The burnin parameter is not set, using default [1000]
 Note: The jumps parameter is not set, using default [1250]
 Note: The sample parameter is not set, using default [25]
 Note: The model parameter is not set, using default [2]
 Note: The rician parameter is not set, using default [yes]

 --> Executing qunex.sh dwi_bedpostx_gpu:
     Study folder: /home/samnt/qunex/quickstart
     Sessions Folder: /home/samnt/qunex/quickstart/sessions
     Session: HCPA001
     Number of fibers: 3
     ARD weights: 1
     Burnin period: 1000
     Number of jumps: 1250
     Sample every: 25
     Model type: 2
     Rician flag: yes
     Overwrite prior run: yes

 --> Removing existing bedpostx run for HCPA001...

 --> Checking if bedpostx was completed on HCPA001...

 --> Prior bedpostx run not found or incomplete for HCPA001. Setting up new run...

 --> Generating log folder

 --> Not using gradient nonlinearities flag -g

 --> Running FSL command:
    /opt/qunex/bash/qx_utilities/diffusion_tractography_dense/fsl_gpu/bedpostx_gpu /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion/. /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/. -n 3 -w 1 -b 1000 -j 1250 -s 25 -model 2 --rician
---------------------------------------------
------------ BedpostX GPU Version -----------
---------------------------------------------

subjectdir is /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion

bedpostxdir is /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX


-- Making bedpostx directory structure


-- Copying files to bedpostx directory


-- Pre-processing stage


-- Queuing parallel processing stage


----- Bedpostx Monitor -----

...................Allocated GPU 0...................
Log directory is: /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/diff_parts/data_part_0000
Number of Voxels to compute in this part: 374383
Number of Directions: 199
Rician noise model requested. Non-linear parameter initialization will be performed, overriding other initialization options!

SubPart 1 of 29: processing 12909 voxels
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  parallel_for failed: cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device
/home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/commands.txt: line 1: 99206 Aborted                 (core dumped) /opt/qunex/qx_library/etc/fsl_gpu_binaries/bedpostx_gpu_cuda_10.1/bedpostx_gpu/bin/xfibres_gpu --data=/home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/data_0 --mask=/home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/nodif_brain_mask -b /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/bvals -r /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/bvecs --forcedir --logdir=/home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/diff_parts/data_part_0000 --nf=3 --fudge=1 --bi=1000 --nj=1250 --se=25 --model=2 --cnonlinear --rician /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion 0 1 374383

-- Queuing post processing stage

-- Merging parts

Log directory is: /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/diff_parts
/home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/nodif_brain_mask

-- Removing intermediate files


-- Creating identity xfm


-- Finished bedpostx_gpu


 --> Checking outputs...

 --> 9 merged samples for HCPA001 found.

 --> bedpostx outputs missing or incomplete for HCPA001

 ----------------------------------------------------

 --> bedpostx run not found or incomplete for HCPA001. Something went wrong.
     Check output: /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX

 ERROR: bedpostx run did not complete successfully

All parts processed

I should've realized this earlier, but the RTX 3090 requires CUDA 11.1 or greater, so 10.1 was never going to work. I was trying to use 10.1 because you mentioned probtrackx_gpu doesn't support CUDA 11 yet, and I didn't think to check whether the RTX 3090 was supported by 10.1 in the first place.
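For anyone who lands here later, the quick sanity check is the card's compute capability: the RTX 3090 is sm_86 (compute capability 8.6), and sm_86 support first appeared in CUDA 11.1, which is why every pre-11.1 binary aborts with "no kernel image". A sketch, assuming a driver new enough (roughly 510+) to expose the compute_cap query:

nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader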

Anyway, running the following command (using CUDA 11 instead of 10.1) does work, so thank you for all of your help!

qunex_container dwi_bedpostx_gpu \
  --container="/home/samnt/qunex/qunexcontainer/qunex_suite-0.94.14.sif" \
  --sessionsfolder="/home/samnt/qunex/quickstart/sessions" \
  --sessions="HCPA001" \
  --bash_pre="module load CUDA/11" \
  --bash_post="export DEFAULT_CUDA_VERSION=11" \
  --bind="/usr/local/cuda-11/:/usr/local/cuda/" \
  --nv \
  --overwrite="yes"

I’ll keep an eye out for updates to probtrackx_gpu to support CUDA 11, but for now, I’ll stick with the 1060 for the full pipeline.

Glad it worked! Yes, the error above occurs because your GPU cannot use the installed CUDA versions. This is the page for the probtrackx GPU binaries: Probtrackx GPU. If you notice the release of a CUDA 11 version, let me know and I will add it to the container. I also check it myself now and then.