fMRI processing issues

Regarding your diffusion issues, Olivier: did you maybe run run_qc for diffusion and inspect its outputs? Did you also look at the outputs and logs from any of the individual steps, e.g. dwi_probtrackx_dense_gpu? The structural connectome indeed looks weird, so there are probably issues with some of the steps.

Hi, Jure

I did not actually run run_qc on the diffusion data, but given that the dtifit results look normal, I do not think the raw data are the problem.

The comlogs for the intermediate processing steps look fine: no exceptions are reported along the way, and each step produces its expected outputs.

For example, the dwi_probtrackx_dense_gpu log is as follows.

# Generated by QuNex 1.1.0 [QIO] on 2025-05-06_19.56.45.771482
#


-- qunex.sh: Specified Command-Line Options - Start --
   Study Folder: xxxx
   Sessions Folder: xxxx
   Session: xxxx
   probtrackX GPU scripts Folder: /opt/qunex/bash/qx_utilities/diffusion_tractography_dense/tractography_gpu_scripts
   Compute Matrix1: yes
   Compute Matrix3: no
   Number of samples for Matrix1: 10000
   Number of samples for Matrix3: 3000
   Distance correction: no
   Store streamlines length: no
   Force Matrix1: no
   Overwrite prior run: yes
   No GPU: yes
-- qunex.sh: Specified Command-Line Options - End --

------------------------- Start of work --------------------------------


   --- probtrackX GPU for session xxxx...


 --- Removing existing Probtrackxgpu Matrix1 dense run for xxxx...


Checking if ProbtrackX Matrix 1 and dense connectome was completed on xxxx...


ProbtrackX Matrix 1 solution and dense connectome incomplete for xxxx. Starting run with 10000 samples...

Running the following probtrackX GPU command: 

---------------------------

   /opt/qunex/bash/qx_utilities/diffusion_tractography_dense/tractography_gpu_scripts/run_matrix1.sh xxxx 10000 no no yes

---------------------------

-- Queueing Probtrackx

Log directory is: xxxx/MNINonLinear/Results/Tractography
Running in seedmask mode
load seeds
done.
Load bedpostx samples
1_1
1_2
1_3
2_1
2_2
2_3
3_1
3_2
3_3

nfibres  : 3
nsamples : 50

Done loading samples.
Volume seeds
volume 1
volume 2
volume 3
volume 4
volume 5
volume 6
volume 7
volume 8
volume 9
volume 10
volume 11
volume 12
volume 13
volume 14
volume 15
volume 16
volume 17
volume 18
volume 19
Surface seeds
surface 0
surface 1

time spent tracking: 272439 seconds

save results
finished

-- Queueing Post-Matrix 1 Calls

parsed 'a/908573336' as 'a / 908573336'
parsed 'log(1+a)' as 'log(1 + a)'

-- Matrix 1 Probtrackx Completed successfully.

dwi_probtracx_dense_gpu for xxxx completed successfully!

------------------------- Successful completion of work --------------------------------

The output folder for dwi_probtrackx_dense_gpu is as follows (xxxx/MNINonLinear/Results/Tractography/).

At a glance all looks good. Maybe try the Matrix3 model of probtrackx?

Any updates regarding the issues outlined here?

Hi, Jure

Instead of using dwi_parcellate, I used wb_command -cifti-parcellate to extract the structural connectivity matrix, and judging from the results it seems to work fine.

Of course, I am still using the Matrix1 model, and I am not quite sure what the exact difference is between the Matrix1 and Matrix3 models. It looks like the Matrix1 model stores the number of connections between all seed points, i.e., the rows and columns of the matrix correspond to different seed points (ROIs), which directly reflects the structural connectivity between ROIs. That seems closer to what I need, namely obtaining an n×n structural connectivity matrix for a specified atlas.

Also, I tried to accelerate dwi_probtrackx_dense_gpu with a GPU in my local QuNex installation, but it reported the error "CUDA Runtime Error: CUDA driver version is insufficient for CUDA runtime version", and unfortunately I did not find a solution in the forum.
Currently both A100 acceleration on the cluster and local GPU acceleration (CUDA 12.2) fail, so I have to use the cluster CPUs for processing (which is very slow indeed).

Interesting, maybe you just did not configure dwi_parcellate correctly, as it does literally what you describe above (with some additional functionality). See qunex/bash/qx_utilities/dwi_parcellate.sh at master · ULJ-Yale/qunex · GitHub for its source code.
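
For reference, a call along these lines might be what you are after. I am writing the parameter names (--matrixversion, --parcellationfile, --outname, --waytotal) from memory, so please verify them against the script linked above before running:

# parameter names below are illustrative; check dwi_parcellate.sh for the exact flags
qunex_container dwi_parcellate \
--sessionsfolder='XXX/sessions' \
--sessions='XXX' \
--matrixversion='1' \
--parcellationfile='XXX/your_parcellation.dlabel.nii' \
--outname='MMP' \
--waytotal='standard' \
--overwrite='yes' \
--container="qunex/qunex_suite:1.1.0"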

For the CUDA issue, can you please paste the exact commands that you used? On some systems getting GPUs to work with containers can be tricky.

Best, Jure

Hi, Jure

My command is as follows.

qunex_container dwi_probtrackx_dense_gpu \
--sessionsfolder='XXX/sessions' \
--sessions="XXX" \
--omatrix1='yes' \
--nsamplesmatrix1='10000' \
--overwrite='yes' \
--bind="/usr/local/cuda-12.2/:/usr/local/cuda/" \
--nv \
--bash_pre="module load CUDA/12.2"  \
--bash_post="export DEFAULT_CUDA_VERSION=12.2" \
--container="qunex/qunex_suite:1.1.0"

Also, I am not quite sure whether I should use the Matrix1 or the Matrix3 model if I want to compute the structural connectivity matrix for a given atlas. I noticed that the documentation example seems to suggest Matrix1, while in the forum Q&A people seem to use Matrix3 more often!

Hi Olivier,

1/ MatrixN
Matrix1 assumes only one fiber direction per voxel. It is simpler and faster to fit as it uses a single tensor fit; this makes it less accurate, mainly in regions with crossing or branching fibers. Matrix3 supports up to three fiber directions per voxel. It fits a Bayesian model to capture multiple fiber populations and is more accurate in regions with fiber crossings.

2/ CUDA
I would advise the following. Start by removing --bash_post="export DEFAULT_CUDA_VERSION=12.2"; FSL does not have binaries compiled for this specific version, so by setting this you are demanding binaries that do not exist. Next, you can try downloading the latest container (1.2.2, which has an updated CUDA version in it). Finally, instead of --nv you can try using --cuda, but that option requires the CUDA container toolkit to be installed; we have found that it works the most robustly across different systems and containers. Also, --bind="/usr/local/cuda-12.2/:/usr/local/cuda/" might be redundant, so try without it as well. Unfortunately, CUDA setup and drivers are system-specific and there is no simple recipe that works on all systems.

Let me know how it goes.

Best, Jure


Hi, Jure

You are right: after updating the container to version 1.2.2 I can execute the dwi_probtrackx_dense_gpu command correctly (I did not need to install the CUDA container toolkit separately).

The command I use is as follows.

qunex_container dwi_probtrackx_dense_gpu \
--sessionsfolder='XXX/sessions' \
--sessions="XXX" \
--omatrix1='yes' \
--nsamplesmatrix1='10000' \
--overwrite='yes' \
--bind="/usr/local/cuda-12.2/:/usr/local/cuda/" \
--cuda  \
--container="qunex/qunex_suite:1.2.2"

While executing the above command I noticed some numbers I have questions about, and I would appreciate your answers.

I am testing with the Matrix1 model here, which appears to track fibers separately from all 91282 grayordinates of the whole brain as seed points. Since I set nsamplesmatrix1 to 10000, a total of 91282 × 10000 = 912820000 streamlines are tracked. probtrackx can track 216166 streamlines in parallel, using two parallel streams to share the load, so roughly 912820000 / 216166 ≈ 4222.8 rounds of tracking are needed; with two streams per round, that makes the 8446 iterations reported in the log.

During the test I noticed that the memory footprint maxed out at around 32 GB, which seems to imply that if I want to run fiber tracking for several subjects in parallel, a conservative cap with 128 GB of memory is about 3 subjects.

Running in seedmask mode
Loading tractography data
Number of Seeds: 91282
Dimensions Matrix1: 91282 x 91282

Time Loading Data: 28 seconds


...................Allocated GPU 0...................
Device memory available (MB): 22988 ---- Total device memory(MB): 24563
Memory required for allocating data (MB): 464
Device memory available after copying data (MB): 22486
Running 216166 streamlines in parallel using 2 STREAMS
Total number of streamlines: 912820000
Iteration 1 out of 8446
Iteration 2 out of 8446
Iteration 3 out of 8446
Iteration 4 out of 8446
Iteration 5 out of 8446
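
As a quick sanity check of the numbers above, here is some illustrative shell arithmetic (not part of the pipeline, just the calculations spelled out):

echo $(( 91282 * 10000 ))            # 912820000 streamlines in total
echo $(( 912820000 / (216166 / 2) )) # ~8445 iterations when the 216166 parallel streamlines are split over 2 streams (the log rounds up to 8446)
echo $(( 128 / 32 ))                 # with a ~32 GB peak per session, 128 GB of RAM caps parallel sessions at about 4 (3 to be conservative)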

Below are the commands I used to compute the structural connectivity matrix with Workbench. I will follow up and double-check how I am using dwi_parcellate against the code you provided; I think I have misconfigured something somewhere.

wb_command -cifti-parcellate \
/XXX/MNINonLinear/Results/Tractography/Conn1.dconn.nii.gz \
/XXX/Q1-Q6_RelatedValidation210.CorticalAreas_dil_Final_Final_Areas_Group_Colors.32k_fs_LR.dlabel.nii \
ROW \
/XXX/temp.dpconn.nii \
-method MEAN

wb_command -cifti-parcellate \
/XXX/temp.dpconn.nii \
/XXX/Q1-Q6_RelatedValidation210.CorticalAreas_dil_Final_Final_Areas_Group_Colors.32k_fs_LR.dlabel.nii \
COLUMN \
/XXX/MMP.pconn.nii \
-method MEAN
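
In case it is useful, the parcellated matrix can also be dumped to a plain-text n×n array for analysis outside Workbench; this is just a sketch with placeholder paths:

wb_command -cifti-convert -to-text \
/XXX/MMP.pconn.nii \
/XXX/MMP_pconn.txt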

Best, Olivier

Glad we got GPU processing up and running!

Unfortunately, my DWI expertise is not sufficient to confidently answer your questions. Let me consult some of my colleagues who know much more about this and get back to you. I guess you would like to know whether the following

...
--omatrix1='yes' \
--nsamplesmatrix1='10000' \
...

followed by your parcellation makes sense?

I can chime in about the memory use though. Yes, this command is memory heavy. For matrix3 I needed 64 GB per session not to get out-of-memory errors. Maybe you can get away with 32 GB for matrix1. So, if your system has 128 GB of memory, your parallelism cap seems to be 3-4 sessions.

Also note that you would need 1 GPU for each session if you want to process them in parallel.

Best, Jure


Thank you very much, Mr. Jure!

These are just my parameter settings for testing dwi_probtrackx_dense_gpu; once I am sure it works correctly I will follow your advice, switch to the Matrix3 model, and use the default number of samples. The whole fiber-tracking process took about 50 minutes, and since I only have one GPU locally I probably cannot process sessions in parallel, but it is already much faster than tracking with a CPU, and I am quite content with that.

Also, I tried the GPU version of dwi_probtrackx_dense_gpu on the cluster. Although I can successfully submit the job to the a10080g partition, I seem to hit a similar error to Estephan's: the tractography data does not load correctly, so the subsequent steps fail to read the data they need. (Strangely, even though Estephan seems to load the data correctly, he still gets an "error opening / failed to open" error!) (qunex_suite-1.2.2.sif)

The command I use on the cluster is as follows:

qunex_container dwi_probtrackx_dense_gpu \
--sessionsfolder="XXX/sessions"  \
--sessions="XXX" \
--nsamplesmatrix3='3000' \
--omatrix3='yes' \
--overwrite='yes' \
--bash_pre="module load modules/cuda/12.2" \
--nv \
--container="${QUNEX_CONTAINER}" \
--scheduler=SLURM,jobname=probtrackx,time=05:00:00,partition=a10080g,gres=gpu:1

This could be an out-of-memory issue when loading the data. Can you add mem=64G or even mem=128G to the scheduler parameter? Estephan had to use 128G in his case, I believe. You are not allocating any memory explicitly, so the default could be too low.
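
For example, with your command above only the scheduler string needs to change, e.g.:

--scheduler=SLURM,jobname=probtrackx,time=05:00:00,partition=a10080g,gres=gpu:1,mem=128G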

Also, whenever you run a command, QuNex creates more detailed logs in <study>/processing/logs; there, runlogs are high-level overview logs and comlogs are detailed logs of the downstream commands. You can also tweak the location or organization of these logs (e.g., put the logs for each command in its own folder) via the --logfolder parameter. Once the errors get specific, uploading the logs helps a lot.
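
For example, to keep the probtrackx logs in a folder of their own you could add something like this to the call (the path is just a placeholder):

--logfolder='XXX/processing/logs/probtrackx'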

Best, Jure

Hi Olivier,

The difference between Matrix1 (Conn1) and Matrix3 (Conn3) is the seeding method. For Conn1 you seed a single streamline from each grayordinate 10,000 times (default) and see where it terminates in the grey matter. For Conn3, you seed from each white-ordinate 3,000 times (default) by sending out two streamlines; when they hit grey matter, that counts as a connection. Conn1 can overestimate the density of very short-range tracts relative to long-range tracts, but may also be more similar to what you would get with tracers (although the evidence here is limited). Generally I run both to make sure an effect replicates, but they should be quite similar, and I don't think there's strong evidence for decidedly choosing one over the other. The number of fiber orientations is determined by a different command (bedpostx).

Your command to generate probabilistic tractography looks correct (using the default matrix1 = ‘yes’ and ‘10000’ samples).

The very small values in the connectivity matrix make sense. I don't see the parcellation command you used, but you likely parcellated the waytotal-normalized matrix with dwi_parcellate (i.e. I think you set the "waytotal" flag to "standard" or "log"). The waytotal matrix is the connectivity matrix divided by the waytotal value, which is the total number of valid streamlines generated for that participant; it's a single number stored in the waytotal file in your directory. Since the waytotal value is very, very large, dividing the streamline counts by it of course makes them very, very small, so values on the order of 1e-07 are expected. Using the waytotal-normalized matrix makes sense if you want to compare streamline counts between subjects or average data across subjects.
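
You can check the scale of this normalization yourself, since waytotal is just a small text file containing that single number (the path is a placeholder):

cat XXX/MNINonLinear/Results/Tractography/waytotal
# e.g. with a waytotal on the order of 9e8, a raw count of 100 streamlines becomes roughly 100 / 9e8 ≈ 1e-07 after normalization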

The reason your wb_command parcellation matrix looks different is that you probably parcellated the non-waytotal-normalized matrix (i.e. Conn1.dconn.nii.gz). You also have Conn1_waytotnorm.dconn.nii.gz and Conn1_waytotnorm_log.dconn.nii.gz in that folder (log means log-normalized, which accounts for distance bias). I would think about which dconn file best suits your hypotheses/analysis pipeline and go from there, but the command seems to be operating as expected.

Lastly, the QuNex parcellate command does a bit more than just parcellate; I believe for Conn1 it also symmetrizes the matrix. Just an FYI in case you want to compare outputs from wb_command -cifti-parcellate and dwi_parcellate further.
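
If you want to mimic that symmetrization outside of dwi_parcellate, one way (just a sketch, file names are placeholders) is to average the parcellated matrix with its transpose in Workbench:

wb_command -cifti-transpose /XXX/MMP.pconn.nii /XXX/MMP_transposed.pconn.nii
wb_command -cifti-math '(a + b) / 2' /XXX/MMP_sym.pconn.nii \
-var a /XXX/MMP.pconn.nii \
-var b /XXX/MMP_transposed.pconn.nii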

I got a bit lost since these posts cover a lot of topics; hopefully this addressed your question, and I'm happy to try to answer any other diffusion-related questions you may have.


Whoops, one more thing: definitely QC the tractography data. 64 GB of memory works for me; however, if you don't have enough memory, the streamlines will sometimes "cut off" after a certain point, often near the bottom of the brain. I normally use the sum file for QC purposes: Conn1_sum.dscalar.nii.
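
For a quick look you can load Conn1_sum.dscalar.nii in wb_view on top of the subject's surfaces, or pull summary numbers from the command line (illustrative, the path is a placeholder):

wb_command -cifti-stats XXX/MNINonLinear/Results/Tractography/Conn1_sum.dscalar.nii -reduce MIN
wb_command -cifti-stats XXX/MNINonLinear/Results/Tractography/Conn1_sum.dscalar.nii -reduce MEAN
# suspiciously low values concentrated in inferior regions can point to the cut-off described above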


Hi Olivier,

Did we manage to resolve your issues or do you have any follow up questions?

Best, Jure

Hi, Jure

There are still some problems with how I am applying the dwi_parcellate command, but I think I should be able to work them out. Many thanks to you and Amber for your help!

If you post the exact command you are using along with logs and issues in outputs, we can try to help you out on that front as well.

Best, Jure