I am running the dwi_legacy_gpu command using QuNex version v0.97.3. I encountered an error with eddy_cuda10.2; the error stack trace is provided below, but I am unable to decipher the error log.
I had updated the default CUDA version to 10.2 because I encountered the following issue with 10.1:
/opt/qunex/bash/qx_utilities/dwi_legacy_gpu.sh: line 565: /opt/fsl/fsl/bin/eddy_cuda10.1: No such file or directory
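As a quick sanity check before switching CUDA versions, one way to see which eddy binaries the FSL installation inside the container actually ships, and which CUDA toolkits are present on the host, is something like the following sketch (the /opt/fsl path is taken from the error message above; adjust it if your container layout differs):

```shell
# List the eddy executables bundled with FSL inside the container.
# The path below comes from the error message; adjust as needed.
FSL_BIN=/opt/fsl/fsl/bin
ls "$FSL_BIN"/eddy* 2>/dev/null || echo "No eddy binaries found under $FSL_BIN"

# Check which CUDA toolkit versions are installed on the host
# (CUDA is conventionally installed under /usr/local/cuda-<version>).
ls -d /usr/local/cuda* 2>/dev/null || echo "No CUDA installations found under /usr/local"
```

This makes it easy to confirm whether an eddy_cuda10.1 or eddy_cuda10.2 binary actually exists before pointing the pipeline at it.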
Could you please assist me in resolving this issue? I have attached the complete log output of the dwi_legacy_gpu command below. Let me know if you need any more information. Thanks in advance.
CUDA 10.2 and above support is currently functional only for the hcp_diffusion command. The next version (to be released in a week or so) will extend this support to all DWI pipelines. Testing of the next release is in its final phase, so things are looking good for a release by the end of May.
I tested the dwi_legacy_gpu command using the latest QuNex version (v0.98.0), but I am still observing invalid eddy_cuda10.2 output, as seen below:
�����
�����
�����
0}��U
EDDY::: Eddy failed with message `q��U
Note that this is not necessarily a CUDA or QuNex container error. When developing the DWI rework I had to bust my head a number of times when I was receiving this. Sometimes it was due to bugs in my code, but often it was because of issues with the input data. The problem here is that the message is unreadable to humans and hard to resolve.
What I found easiest is to add the --nogpu='yes' parameter to the dwi_legacy_gpu command call. In this case the command will be executed without CUDA. If it works, then the error is CUDA related and we can debug it with that in mind. If it does not work, it will throw a human-readable error.
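For reference, a minimal sketch of such a CPU-only test call might look like the following. All paths, session names, and the container image name are placeholders (they are not from the original report), and your actual call likely includes additional parameters; the snippet is guarded so it is copy-paste safe on systems where QuNex is not on the PATH:

```shell
# Sketch of a CPU-only test run; sessionsfolder, sessions, and container
# values below are placeholders to be replaced with your own.
if command -v qunex_container >/dev/null 2>&1; then
    qunex_container dwi_legacy_gpu \
        --sessionsfolder="/data/study/sessions" \
        --sessions="OP101" \
        --nogpu="yes" \
        --container="qunex_suite-0.98.0.sif"
else
    echo "qunex_container not found on PATH"
fi
```

If this run completes cleanly while the GPU run fails, the problem is on the CUDA side rather than in the input data.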
Also, could you provide the full call you used, so that we are on the same page?
As requested, I have tested the code on the same input data both with and without the --nogpu='yes' parameter. The results are as follows.
OK, this is good. At least we have a plan B in case we cannot figure out what is happening when GPU mode is enabled. The problem is that the command works on all of our CUDA systems, and since we have systems with completely different GPU cards and CUDA versions, I am unable to recreate your error. Furthermore, the reported error is utter gibberish and not helpful in any way.
One more thing you could try is to add
--cuda_path=<PATH TO CUDA ON YOUR SYSTEM>
to the qunex_container call. The path points to your local CUDA installation, usually something like /usr/local/cuda-12 or similar.
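A small sketch for locating and verifying that path before passing it along (the cuda-10.2 directory below is just a placeholder for whatever version your system has):

```shell
# Placeholder path; set this to your local CUDA installation root.
CUDA_PATH=/usr/local/cuda-10.2

if [ -d "$CUDA_PATH" ]; then
    # Directory exists; this is the value to hand to --cuda_path.
    echo "CUDA found at $CUDA_PATH; pass --cuda_path=$CUDA_PATH to qunex_container"
else
    # Nothing there; list candidate installations instead.
    echo "No CUDA installation at $CUDA_PATH"
    ls -d /usr/local/cuda* 2>/dev/null || echo "No CUDA directories under /usr/local"
fi
```

Verifying the directory first avoids pointing the container at a CUDA path that does not exist, which would reproduce the original "No such file or directory" class of failure.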