[RESOLVED] Issue with running NODDI

Estephan · May 7, 2024, 8:25pm

Description:

Hello, I tried to run the NODDI_Watson pipeline on my institution’s computer cluster and ran into errors. This is the command I used:

Call:

msi_resources_time=04:00:00; msi_resources_nodes=1; msi_resources_ntaskspernode=12; msi_resources_mem=128000; msi_queue=v100; msi_resources_gpu=gpu:v100:1; msi_resources_jobname=XTRACK; \
study_sharedfolder=/home/moanae/shared/project_K99_ChrTMDHCP_qunex02; \
qunex_container dwi_xtract \
--batchfile=${study_sharedfolder}/processing/batch_K99Aim2.txt --sessionsfolder=${study_sharedfolder}/sessions \
--species="human" \
--parsessions=1 --overwrite="yes" \
--nv \
--bash_pre="module load fsl/6.0.7.9 cuda/11.2" \
--envars="FSLDIR=>${FSLDIR}" \
--scheduler=SLURM,time=${msi_resources_time},nodes=${msi_resources_nodes},cpus-per-task=${msi_resources_ntaskspernode},mem=${msi_resources_mem},partition=${msi_queue},gres=${msi_resources_gpu},jobname=${msi_resources_jobname} \
--bind=${study_sharedfolder}:${study_sharedfolder},${FSLDIR}:${FSLDIR} --container=${HOME}/qunex/qunex_suite-0.99.2d.sif

Errors are reported right at the beginning of the log file:

Logs:

# Generated by QuNex 0.99.2 on 2024-04-20_14.10.23.843019#
------------------------------------------------------------
Running external command via QuNex:

/opt/qunex/qx_library/etc/cudimot/cuda_10.2/bin/Pipeline_NODDI_Watson.sh                 /home/moanae/shared/project_K99_ChrTMDHCP_qunex02/sessions/10005/hcp/10005/T1w/Diffusion
------------------------------------------------------------

---------------------------------------------------------------------------------
------------------------------------ CUDIMOT ------------------------------------
----------------------------- MODEL: NODDI_Watson -----------------------------
---------------------------------------------------------------------------------
subjectdir is /home/moanae/shared/project_K99_ChrTMDHCP_qunex02/sessions/10005/hcp/10005/T1w/Diffusion
Making output directory structure
Copying files to output directory
cp: relocation error: /lib64/libacl.so.1: symbol getxattr, version ATTR_1.0 not defined in file libattr.so.1 with link time reference
cp: relocation error: /lib64/libacl.so.1: symbol getxattr, version ATTR_1.0 not defined in file libattr.so.1 with link time reference
Queue Dtifit
Queue GridSearch process

I then opened a shell inside the container on a GPU node to run that command there, but even before issuing the command there are errors related to some libraries:

cn2110:~ moana004$ module load singularity/current python3
cn2110:~ moana004$ study_sharedfolder=/home/moanae/shared/project_K99_ChrTMDHCP_qunex02
cn2110:~ moana004$ singularity shell -B ${study_sharedfolder}:${study_sharedfolder} ${HOME}/qunex/qunex_suite-0.99.2d.sif
ERROR: ld.so: object 'libdlfaker.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libvglfaker.so' from LD_PRELOAD cannot be preloaded: ignored.
Apptainer> source /opt/qunex/env/qunex_environment.sh
--> unsetting the following environment variables: PATH MATLABPATH PYTHONPATH QUNEXVer TOOLS QUNEXREPO QUNEXPATH QUNEXEXTENSIONS QUNEXLIBRARY QUNEXLIBRARYETC TemplateFolder FSL_FIXDIR FREESURFERDIR FREESURFER_HOME FREESURFER_SCHEDULER FreeSurferSchedulerDIR WORKBENCHDIR DCMNIIDIR DICMNIIDIR MATLABDIR MATLABBINDIR OCTAVEDIR OCTAVEPKGDIR OCTAVEBINDIR RDIR HCPWBDIR AFNIDIR PYLIBDIR FSLDIR FSLBINDIR PALMDIR QUNEXMCOMMAND HCPPIPEDIR CARET7DIR GRADUNWARPDIR HCPPIPEDIR_Templates HCPPIPEDIR_Bin HCPPIPEDIR_Config HCPPIPEDIR_PreFS HCPPIPEDIR_FS HCPPIPEDIR_PostFS HCPPIPEDIR_fMRISurf HCPPIPEDIR_fMRIVol HCPPIPEDIR_tfMRI HCPPIPEDIR_dMRI HCPPIPEDIR_dMRITract HCPPIPEDIR_Global HCPPIPEDIR_tfMRIAnalysis HCPCIFTIRWDIR MSMBin HCPPIPEDIR_dMRITractFull HCPPIPEDIR_dMRILegacy AutoPtxFolder EDDYCUDA USEOCTAVE QUNEXENV CONDADIR MSMBINDIR MSMCONFIGDIR R_LIBS FSL_FIX_CIFTIRW FSFAST_HOME SUBJECTS_DIR MINC_BIN_DIR MNI_DIR MINC_LIB_DIR MNI_DATAPATH FSF_OUTPUT_FORMAT ANTSDIR CUDIMOT
ERROR: ld.so: object 'libdlfaker.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libvglfaker.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libdlfaker.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libdlfaker.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libvglfaker.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libvglfaker.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libdlfaker.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libdlfaker.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libvglfaker.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libvglfaker.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libdlfaker.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libdlfaker.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libvglfaker.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libvglfaker.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libdlfaker.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libvglfaker.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libdlfaker.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libvglfaker.so' from LD_PRELOAD cannot be preloaded: ignored.
 
Generated by QuNex 
------------------------------------------------------------------------ 
Version: 0.99.2 
ERROR: ld.so: object 'libdlfaker.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libvglfaker.so' from LD_PRELOAD cannot be preloaded: ignored.
User: moana004 
ERROR: ld.so: object 'libdlfaker.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libvglfaker.so' from LD_PRELOAD cannot be preloaded: ignored.
System: cn2110 
OS: RedHat Linux #1 SMP Wed Mar 20 15:54:52 UTC 2024 
------------------------------------------------------------------------ 
 
        ██████\                  ║      ██\   ██\                        
       ██  __██\                 ║      ███\  ██ |                       
       ██ /  ██ |██\   ██\       ║      ████\ ██ | ██████\ ██\   ██\     
       ██ |  ██ |██ |  ██ |      ║      ██ ██\██ |██  __██\\██\ ██  | 
       ██ |  ██ |██ |  ██ |      ║      ██ \████ |████████ |\████  /     
       ██ ██\██ |██ |  ██ |      ║      ██ |\███ |██   ____|██  ██\      
       \██████ / \██████  |      ║      ██ | \██ |\███████\██  /\██\     
        \___███\  \______/       ║      \__|  \__| \_______\__/  \__|    
            \___|                ║                                       
 
 
                       DEVELOPED & MAINTAINED BY: 
 
                    Anticevic Lab, Yale University 
               Mind & Brain Lab, University of Ljubljana 
                     Murray Lab, Yale University 
 
                      COPYRIGHT & LICENSE NOTICE: 
 
Use of this software is subject to the terms and conditions defined in 
'LICENSES' which is a part of the QuNex Suite source code package: 
https://gitlab.qunex.yale.edu/qunex/qunex/-/tree/master/LICENSES 
 
ERROR: ld.so: object 'libdlfaker.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libvglfaker.so' from LD_PRELOAD cannot be preloaded: ignored.
 ---> Setting up Octave  

ERROR: ld.so: object 'libdlfaker.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libvglfaker.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libdlfaker.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libvglfaker.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libdlfaker.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libvglfaker.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libdlfaker.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libvglfaker.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libdlfaker.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object 'libvglfaker.so' from LD_PRELOAD cannot be preloaded: ignored.

I have used this qunex container version for other analysis pipelines, including those running on a GPU node (such as FSL xtract), without an issue, so this seems to be related specifically to NODDI. Any thoughts on what might be going on here? Thank you.

Estephan

demsarjure · May 8, 2024, 6:09am

Hi Estephan,

No idea what is happening. It does not look like the usual missing libraries error. It could be some kind of mismatch between library versions.

The call you are providing is for dwi_xtract, which works, right? The error log is for dwi_noddi_gpu, though. Can you also provide your call for dwi_noddi_gpu? At the top of the log you can see that the cuda_10.2 version is used; maybe you can try switching to 11.3 or 12 (we support those three versions, see the dwi_noddi_gpu page in the QuNex documentation). This can be done through:

--cuda_version (str, default '10.2'):
    Which CUDA version to use. Supports 10.2, 11.3 and 12.

Let me know how it goes.

Best, Jure

Estephan · May 8, 2024, 3:12pm

Thanks Jure. I re-ran the command twice, using CUDA 11.2 (that’s what is available on our computer cluster) and 12. Both logs showed errors. Below are the two commands I used, and the respective log files are attached. Please let me know your thoughts. Thank you.

Estephan

Command #1

msi_resources_time=12:00:00; msi_resources_nodes=1; msi_resources_ntaskspernode=12; msi_resources_mem=64000; msi_queue=v100; msi_resources_gpu=gpu:v100:1; msi_resources_jobname=NODDIWatson; \
study_sharedfolder=/home/moanae/shared/project_K99_ChrTMDHCP_qunex02; \
qunex_container dwi_noddi_gpu \
--batchfile=${study_sharedfolder}/processing/batch_K99Aim2.txt --sessionsfolder=${study_sharedfolder}/sessions \
--sessions="10001" \
--noddi_model="Watson" \
--nv \
--bash_pre="module load cuda/11.2" \
--cuda_version="11.3" \
--parsessions=12 --overwrite="yes" \
--scheduler=SLURM,time=${msi_resources_time},nodes=${msi_resources_nodes},cpus-per-task=${msi_resources_ntaskspernode},mem=${msi_resources_mem},partition=${msi_queue},jobname=${msi_resources_jobname} \
--bind=${study_sharedfolder}:${study_sharedfolder},${FSLDIR}:${FSLDIR} --container=${HOME}/qunex/qunex_suite-0.99.2d.sif

Command #2

msi_resources_time=12:00:00; msi_resources_nodes=1; msi_resources_ntaskspernode=12; msi_resources_mem=64000; msi_queue=v100; msi_resources_gpu=gpu:v100:1; msi_resources_jobname=NODDIWatson; \
study_sharedfolder=/home/moanae/shared/project_K99_ChrTMDHCP_qunex02; \
qunex_container dwi_noddi_gpu \
--batchfile=${study_sharedfolder}/processing/batch_K99Aim2.txt --sessionsfolder=${study_sharedfolder}/sessions \
--sessions="10001" \
--noddi_model="Watson" \
--nv \
--bash_pre="module load cuda/12" \
--cuda_version="12" \
--parsessions=12 --overwrite="yes" \
--scheduler=SLURM,time=${msi_resources_time},nodes=${msi_resources_nodes},cpus-per-task=${msi_resources_ntaskspernode},mem=${msi_resources_mem},partition=${msi_queue},jobname=${msi_resources_jobname} \
--bind=${study_sharedfolder}:${study_sharedfolder},${FSLDIR}:${FSLDIR} --container=${HOME}/qunex/qunex_suite-0.99.2d.sif

error_dwi_noddi_gpu_10001_2024-05-08_09.58.54.118812.log (10.2 KB)
error_dwi_noddi_gpu_10001_2024-05-08_10.05.38.669503.log (10.2 KB)

demsarjure · May 8, 2024, 3:24pm

qunex_container also has a parameter called --cuda_path that can be used to specify where the local CUDA is installed so it gets properly mounted over the CUDA inside the container. For this to work, you need to figure out where CUDA is installed on your system. This usually works:

module load cuda/12
which nvcc

This will give you an output like /usr/local/cuda-12/bin/nvcc; you then set --cuda_path="/usr/local/cuda-12".
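
The step from the nvcc location to the --cuda_path value can be scripted. Below is a minimal sketch, assuming nvcc sits in a bin/ folder directly under the CUDA install root (as in the example path above). cuda_root_from_nvcc is a hypothetical helper name, not part of QuNex:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of QuNex): strip the trailing /bin/nvcc
# from an nvcc path to get the CUDA install root.
# Assumes the <root>/bin/nvcc layout.
cuda_root_from_nvcc() {
    local nvcc_path="$1"
    dirname "$(dirname "${nvcc_path}")"
}

# Example with the path mentioned in the post above.
cuda_root_from_nvcc "/usr/local/cuda-12/bin/nvcc"
```

On a node where the CUDA module is loaded, cuda_root_from_nvcc "$(which nvcc)" should give the value to pass as --cuda_path. If nvcc turns out to be a symlink, the path may need to be resolved first (for example with readlink -f).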

We are in the process of building the container for the next version (0.100.0); it should be out this month. We are currently working hard to make the use of CUDA for DWI commands more robust. Unfortunately, this is often painful and not the most user-friendly.

Best, Jure

Estephan · May 20, 2024, 9:08pm

Hi Jure, I found a solution for this problem, at least on my cluster. I installed CUDIMOT locally in my $HOME folder to try it out.

I kept seeing errors from basic commands like cp, mv and ls while trying to run NODDI. Through trial and error, I found that loading the msi modules for fsl and cuda adds library folder paths to the variable LD_LIBRARY_PATH (which is initially empty). This changes which shared libraries basic commands pick up at run time, leading to errors like this: “ls: relocation error: /lib64/libacl.so.1: symbol getxattr, version ATTR_1.0 not defined in file libattr.so.1 with link time reference”.
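
One way to see this effect is to list the search path entries before and after loading the modules. This is a minimal sketch. list_lib_path_entries is a hypothetical helper name, and the example value is made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each LD_LIBRARY_PATH entry on its own line,
# in search order, so that entries added by modules are easy to spot.
# With no argument it inspects the current LD_LIBRARY_PATH.
list_lib_path_entries() {
    local path_value="${1-${LD_LIBRARY_PATH-}}"
    # Nothing to print for an empty or unset search path.
    if [ -z "${path_value}" ]; then
        return 0
    fi
    printf '%s\n' "${path_value}" | tr ':' '\n'
}

# Example with a made-up value (these paths are illustrative only).
list_lib_path_entries "/opt/example/fsl/lib:/opt/example/cuda/lib64"
```

Running ldd /bin/cp before and after the module load commands would additionally show which libacl.so.1 and libattr.so.1 files the dynamic linker resolves in each case.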

SOLUTION: after loading the required msi modules, place the folder “/lib64/” at the beginning of LD_LIBRARY_PATH: export LD_LIBRARY_PATH=/lib64/:${LD_LIBRARY_PATH}
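
A slightly more defensive variant of this workaround, as a sketch only: it prepends /lib64 when it is not already the first entry, and it avoids leaving a trailing colon when LD_LIBRARY_PATH is empty (the dynamic linker treats an empty entry as the current directory). prepend_lib64 is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: put /lib64 at the front of LD_LIBRARY_PATH so that
# system libraries are found before the ones added by loaded modules.
prepend_lib64() {
    case "${LD_LIBRARY_PATH-}" in
        /lib64|/lib64:*|/lib64/|/lib64/:*)
            # /lib64 is already the first entry; nothing to do.
            ;;
        "")
            # Empty or unset: set it without a trailing colon.
            export LD_LIBRARY_PATH="/lib64"
            ;;
        *)
            export LD_LIBRARY_PATH="/lib64:${LD_LIBRARY_PATH}"
            ;;
    esac
}

# Run this after the module load commands.
prepend_lib64
echo "${LD_LIBRARY_PATH}"
```

Note that this makes libraries in /lib64 take precedence over same-named libraries shipped by the modules. That resolved the problem on this cluster, but it may not suit every setup.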

Maybe this will help others with similar problems. Cheers.

Estephan

demsarjure · May 21, 2024, 6:01am

Great, glad you solved it!

Thank you for the detailed description of the solution. It will be useful if others run into this issue as well.

Best, Jure