Description:
I am trying to run the parcellation pipeline on the example data provided in the quickstart guide, on Ubuntu 20.04 on a local machine (not a compute server). The turnkey procedure in the guide executes with no errors, and I can view the output scenes without issue in Connectome Workbench. Afterwards, I run hcp_diffusion with this command:
qunex_container hcp_diffusion --sessionsfolder="/home/samnt/qunex/quickstart/sessions" --batchfile="/home/samnt/qunex/quickstart/processing/HCPA001_parameters.txt" --bash="module load CUDA/9.1.85" --nv --container="/home/samnt/qunex/qunexcontainer/qunex_suite-0.94.9.sif" --overwrite="yes"
which executes without error. Then, to begin the parcellation pipeline (which I think should be dwi_bedpostx_gpu → dwi_pre_tractography → dwi_probtrackx_dense_gpu → dwi_parcellate), I attempted to run dwi_bedpostx_gpu with this command:
qunex_container dwi_bedpostx_gpu --container="/home/samnt/qunex/qunexcontainer/qunex_suite-0.94.9.sif" --sessionsfolder='/home/samnt/qunex/quickstart/sessions' --sessions="HCPA001" --bash="module load CUDA/9.1.85" --nv --overwrite="yes"
but it crashes with the following error (from the log):
...................Allocated GPU 0...................
Log directory is: /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/diff_parts/data_part_0000
Number of Voxels to compute in this part: 374383
Number of Directions: 199
Rician noise model requested. Non-linear parameter initialization will be performed, overriding other initialization options!
SubPart 1 of 29: processing 12909 voxels
terminate called after throwing an instance of 'thrust::system::system_error'
what(): parallel_for failed: no kernel image is available for execution on the device
/home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/commands.txt: line 1: 3687 Aborted (core dumped) /opt/qunex/qx_library/etc/fsl_gpu_binaries/bedpostx_gpu_cuda_9.1/bedpostx_gpu/bin/xfibres_gpu --data=/home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/data_0 --mask=/home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/nodif_brain_mask -b /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/bvals -r /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/bvecs --forcedir --logdir=/home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/diff_parts/data_part_0000 --nf=3 --fudge=1 --bi=1000 --nj=1250 --se=25 --model=2 --cnonlinear --rician /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion 0 1 374383
I have searched online for fixes for the "no kernel image is available for execution on the device" error, but none of them have worked. The result is the same in both the Singularity and the Docker container. The error (and my searching) suggests an incompatibility between the CUDA version (9.1.85) and the device (NVIDIA GeForce RTX 3090), so I also tested on an older NVIDIA GeForce GTX 1060. On that machine dwi_bedpostx_gpu is running so far, although presumably much more slowly than it would on a 3090. Do you have any tips for getting this to work on the 3090? Thanks for any input you can provide!
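One way to check the CUDA-version explanation is to look at which GPU architectures are compiled into the xfibres_gpu binary named in the crash. The RTX 3090 has compute capability 8.6 (sm_86), which CUDA toolkits can target only from version 11.1 onward, whereas the GTX 1060 is compute capability 6.1 (sm_61). The POSIX shell sketch below reads the listing printed by cuobjdump (a CUDA toolkit utility) and applies the usual compatibility rules: a compiled cubin runs only on GPUs of the same major architecture with an equal or newer minor version, while embedded PTX can be JIT-compiled by the driver for any GPU of equal or newer compute capability. The function name can_run_on is hypothetical, and the entry naming it matches on (e.g. xfibres_gpu.1.sm_70.cubin and xfibres_gpu.1.sm_70.ptx) is an assumption about cuobjdump's output format; adjust the patterns if yours differs.

```shell
#!/bin/sh
# can_run_on CC: hypothetical helper (not part of QuNex, FSL, or CUDA).
#   CC    = GPU compute capability without the dot, e.g. 86 (RTX 3090) or 61 (GTX 1060)
#   stdin = listing from `cuobjdump --list-elf` and/or `cuobjdump --list-ptx`
# ASSUMPTION: entries are named like "xfibres_gpu.1.sm_70.cubin" (compiled SASS)
# and "xfibres_gpu.1.sm_70.ptx" (PTX). Returns 0 if at least one entry is usable.
can_run_on() {
  local dev line kind arch status
  dev=$1
  status=1
  # The "|| [ -n ... ]" also handles a final line without a trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      *sm_*.cubin) kind=sass ;;
      *sm_*.ptx) kind=ptx ;;
      *) continue ;;
    esac
    # Extract the architecture number, e.g. 70 from "...sm_70.cubin"
    arch=$(printf '%s\n' "$line" | sed -n 's/.*sm_\([0-9][0-9]*\)\..*/\1/p')
    [ -n "$arch" ] || continue
    if [ "$kind" = sass ]; then
      # SASS: same major architecture, minor version not newer than the GPU's
      if [ $((arch / 10)) -eq $((dev / 10)) ] && [ "$arch" -le "$dev" ]; then
        status=0
      fi
    elif [ "$arch" -le "$dev" ]; then
      # PTX: the driver can JIT-compile it for any GPU of equal or newer capability
      status=0
    fi
  done
  return "$status"
}

# Example usage (assumes a host CUDA toolkit providing cuobjdump and the image path from the post):
#   singularity exec /home/samnt/qunex/qunexcontainer/qunex_suite-0.94.9.sif \
#     cat /opt/qunex/qx_library/etc/fsl_gpu_binaries/bedpostx_gpu_cuda_9.1/bedpostx_gpu/bin/xfibres_gpu > /tmp/xfibres_gpu
#   { cuobjdump --list-elf /tmp/xfibres_gpu; cuobjdump --list-ptx /tmp/xfibres_gpu; } 2>/dev/null \
#     | can_run_on 86 && echo "usable on sm_86" || echo "no usable kernel image for sm_86"
```

If the kernels turn out to live in a shared library rather than in xfibres_gpu itself, the same check would need to run on that library instead.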
Systems:
- Custom build #1:
  - CPU: AMD Ryzen 9 5950X
  - RAM: 128 GB
  - GPU: EVGA GeForce RTX 3090 FTW3 Ultra
  - OS (lsb_release -a): Ubuntu 20.04.5 LTS
  - Kernel (uname -msr): Linux 5.15.0-46-generic x86_64
  - CUDA version (nvcc --version): V11.7.99 (also tested with the same CUDA and driver versions as build #2 below, same error)
  - NVIDIA driver version (nvidia-smi): NVIDIA-SMI 515.65.01 Driver Version: 515.65.01 CUDA Version: 11.7
- Custom build #2:
  - CPU: Intel Core i7-6950X
  - RAM: 128 GB
  - GPU: NVIDIA GeForce GTX 1060
  - OS (lsb_release -a): Ubuntu 20.04.3 LTS
  - Kernel (uname -msr): Linux 5.15.0-48-generic x86_64
  - CUDA version (nvcc --version): V9.1.85
  - NVIDIA driver version (nvidia-smi): NVIDIA-SMI 470.141.03 Driver Version: 470.141.03 CUDA Version: 11.4
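The details listed for each build can be gathered in one step with a small script, which makes it easier to compare the two machines. The sketch below is hypothetical (report_cmd and system_report are illustrative names, not QuNex tools); it runs the same commands quoted in the list and prints a placeholder when a tool such as nvcc is not installed. Keep in mind that the host's nvcc version does not change what a prebuilt binary contains: the crash log points to a bedpostx_gpu_cuda_9.1 binary inside the container.

```shell
#!/bin/sh
# report_cmd LABEL CMD [ARGS...]: hypothetical helper that prints "LABEL (CMD ARGS):"
# followed by the command's output (indented), or a note if the tool is not installed.
report_cmd() {
  local label
  label=$1
  shift
  printf '%s (%s):\n' "$label" "$*"
  if command -v "$1" >/dev/null 2>&1; then
    # Capture stderr too, since e.g. lsb_release prints notices there
    "$@" 2>&1 | sed 's/^/    /'
  else
    printf '    (%s not found)\n' "$1"
  fi
}

# system_report: runs the same commands quoted for each build in the list above.
system_report() {
  report_cmd "OS" lsb_release -a
  report_cmd "Kernel" uname -msr
  report_cmd "CUDA version" nvcc --version
  report_cmd "NVIDIA driver version" nvidia-smi
}

# Usage: system_report > "system_$(hostname).txt"
```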
Quick disclaimer: I come from an electrophysiology background and have little experience with MRI, but I am learning a lot while trying to get this to work. I am also relatively new to CUDA.
Logs:
# Generated by QuNex 0.94.9 on 2022-09-19_09.37.53.570409
#
------------------------- Start of work --------------------------------
Note: The fibers parameter is not set, using default [3]
Note: The weight parameter is not set, using default [1]
Note: The burnin parameter is not set, using default [1000]
Note: The jumps parameter is not set, using default [1250]
Note: The sample parameter is not set, using default [25]
Note: The model parameter is not set, using default [2]
Note: The rician parameter is not set, using default [yes]
--> Executing qunex.sh dwi_bedpostx_gpu:
Study folder: /home/samnt/qunex/quickstart
Sessions Folder: /home/samnt/qunex/quickstart/sessions
Session: HCPA001
Number of fibers: 3
ARD weights: 1
Burnin period: 1000
Number of jumps: 1250
Sample every: 25
Model type: 2
Rician flag: yes
Diffusion data suffix:
Overwrite prior run: yes
--> Removing existing bedpostx run for HCPA001...
--> Checking if bedpostx was completed on HCPA001...
--> Prior bedpostx run not found or incomplete for HCPA001. Setting up new run...
--> Generating log folder
--> Not using gradient nonlinearities flag -g
--> Running FSL command:
/opt/qunex/bash/qx_utilities/diffusion_tractography_dense/fsl_gpu/bedpostx_gpu /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion/. /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/. -n 3 -w 1 -b 1000 -j 1250 -s 25 -model 2 --rician
---------------------------------------------
------------ BedpostX GPU Version -----------
---------------------------------------------
subjectdir is /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion
bedpostxdir is /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX
-- Making bedpostx directory structure
-- Copying files to bedpostx directory
-- Pre-processing stage
-- Queuing parallel processing stage
----- Bedpostx Monitor -----
...................Allocated GPU 0...................
Log directory is: /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/diff_parts/data_part_0000
Number of Voxels to compute in this part: 374383
Number of Directions: 199
Rician noise model requested. Non-linear parameter initialization will be performed, overriding other initialization options!
SubPart 1 of 29: processing 12909 voxels
terminate called after throwing an instance of 'thrust::system::system_error'
what(): parallel_for failed: no kernel image is available for execution on the device
/home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/commands.txt: line 1: 3687 Aborted (core dumped) /opt/qunex/qx_library/etc/fsl_gpu_binaries/bedpostx_gpu_cuda_9.1/bedpostx_gpu/bin/xfibres_gpu --data=/home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/data_0 --mask=/home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/nodif_brain_mask -b /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/bvals -r /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/bvecs --forcedir --logdir=/home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/diff_parts/data_part_0000 --nf=3 --fudge=1 --bi=1000 --nj=1250 --se=25 --model=2 --cnonlinear --rician /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion 0 1 374383
-- Queuing post processing stage
-- Merging parts
Log directory is: /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/diff_parts
/home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX/nodif_brain_mask
-- Removing intermediate files
-- Creating identity xfm
-- Finished bedpostx_gpu
--> Checking outputs...
--> 9 merged samples for HCPA001 found.
--> bedpostx outputs missing or incomplete for HCPA001
----------------------------------------------------
--> bedpostx run not found or incomplete for HCPA001. Something went wrong.
Check output: /home/samnt/qunex/quickstart/sessions/HCPA001/hcp/HCPA001/T1w/Diffusion.bedpostX
ERROR: bedpostx run did not complete successfully
All parts processed