[RESOLVED] How to troubleshoot hcp_pre_freesurfer not running, no errors

So sorry for all my questions, hoping I’ll be up and running soon… I’m trying to run the HCP pipeline with QuNex both on an HPC (with the Singularity/Apptainer image) and on a WSL installation on my desktop PC (with Docker). I’m using the exact same commands, data, and setup parameters for each. hcp_pre_freesurfer runs as expected on the HPC, but on my desktop it finishes instantly without doing anything or throwing an error (note that I have the “run” parameter set to “run”).

This is the log file it produces:

============================= LOG ================================
# Generated by QuNex 0.98.0 on 2023-06-15_10.29.02.790643
================================================================
gmri hcp_pre_freesurfer \
--sessionsfolder="/home/haririlab/qunex//sessions" \
--sessions="/home/haririlab/qunex//processing/batch.txt" \
--sessionids="1" \ 
======================================================
Starting multiprocessing sessions in /home/haririlab/qunex//processing/batch.txt with a pool of 1 concurrent processes
===> Final report for command hcp_pre_freesurfer 
===> Successful completion of all tasks 

and this is the call:

qunex_container hcp_pre_freesurfer \
    --container="${container}" \
    --bind=$qunexdir \
    --sessionsfolder="${qunexdir }/sessions" \
    --batchfile="${qdir}/processing/batch.txt"

How can i troubleshoot this?
Thanks!

Hi there!

Just for transparency, could you also provide the commands that set the variables used in the call (${container}, ${qunexdir}, ${qdir})? The command call also has a stray blank space in ${qunexdir }.
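
One cheap sanity check before the call is to confirm that every variable it uses is actually set. The call above uses ${qunexdir} on one line and ${qdir} on another, so an unset one would silently expand to an empty string. Below is a minimal sketch of such a check. require_vars is a made-up helper, not part of QuNex, and the two values are placeholders to replace with your own:

```shell
# Hypothetical helper (not a QuNex command): report every variable name
# passed in that is unset or empty, and return non-zero if any is missing.
require_vars() {
  missing=0
  for name in "$@"; do
    # Look up the variable whose name is stored in $name (POSIX-compatible).
    eval "value=\${$name-}"
    if [ -z "$value" ]; then
      echo "variable '$name' is unset or empty" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Placeholder values, assumed for illustration only.
container="gitlab.qunex.yale.edu:5002/qunex/qunexcontainer:0.98.0"
qunexdir="/home/haririlab/qunex"

if require_vars container qunexdir; then
  echo "sessionsfolder: ${qunexdir}/sessions"
  echo "batchfile:      ${qunexdir}/processing/batch.txt"
fi
```

If a name is missing, the helper prints it to stderr and the paths are not echoed, which makes a typo in a variable name visible before anything is passed to qunex_container.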

If I understand correctly, you are using Singularity on the HPC and Docker on your local system? The bind parameter of qunex_container is for Singularity only, so you can skip it for Docker. Docker should automatically mount your home folder (/home/haririlab), so you probably do not need an explicit mount at all. If you need to mount external folders with Docker, you have to do it a bit differently; see the "General overview" page of the QuNex documentation.

Below is an example hcp_pre_freesurfer call that I just tested on my local system with Docker:

qunex_container hcp_pre_freesurfer \
  --sessionsfolder="/jd_data/test_studies/quickstart/sessions" \
  --batchfile="/jd_data/test_studies/quickstart/processing/batch.txt" \
  --dockeropt="-v /jd_data/test_studies:/jd_data/test_studies" \
  --container="gitlab.qunex.yale.edu:5002/qunex/qunexcontainer:0.98.0"

Jure


thanks so much for your response and apologies for my sloppiness! I have updated my command to use dockeropt instead of bind and am including variables here:

con="gitlab.qunex.yale.edu:5002/qunex/qunexcontainer:0.98.0"
qdir=/home/haririlab/qunex/

# hcp_pre_freesurfer
qunex_container hcp_pre_freesurfer \
    --container="${con}" \
    --dockeropt="-v /${qdir}:/${qdir}" \
    --sessionsfolder="${qdir}/sessions" \
    --batchfile="${qdir}/processing/batch.txt"

And yes, I am running qunex both on my institution’s HPC with apptainer and on my local system (desktop windows machine, where I’m using WSL and docker) for a different purpose. I have now successfully run through hcp_post_freesurfer on the HPC (yay!), but am still stuck here for WSL. I just now (after editing to the above code) also tried starting over from scratch for this subject and am getting the same result (here’s the log file):

# Generated by QuNex 0.98.0 on 2023-06-18_11.19.01.567872
#


============================= LOG ================================


# Generated by QuNex 0.98.0 on 2023-06-18_11.19.01.567628
#
=================================================================
gmri hcp_pre_freesurfer \
  --sessionsfolder="/home/haririlab/qunex//sessions" \
  --sessions="/home/haririlab/qunex//processing/batch.txt" \
  --sessionids="1" \
=================================================================

Starting multiprocessing sessions in /home/haririlab/qunex//processing/batch.txt with a pool of 1 concurrent processes



===> Final report for command hcp_pre_freesurfer
===> Successful completion of all tasks           

Whereas I know it should look more like this (from my HPC) if it actually starts running commands:

# Generated by QuNex 0.97.1 on 2023-06-14_11.29.24.936013
#


============================= LOG ================================


# Generated by QuNex 0.97.1 on 2023-06-14_11.29.09.791660
#
=================================================================
gmri hcp_pre_freesurfer \
  --sessionsfolder="/cifs/hariri-long/Studies/DBIS_P52/Imaging/qunex//sessions" \
  --sessions="/cifs/hariri-long/Studies/DBIS_P52/Imaging/qunex//processing/batch.txt" \
  --sessionids="1133" \
=================================================================

Starting multiprocessing sessions in /cifs/hariri-long/Studies/DBIS_P52/Imaging/qunex//processing/batch.txt with a pool of 1 concurrent processes


------------------------------------------------------------
Session id: 1133 
[started on Wednesday, 14. June 2023 11:29:09]
Running HCP PreFreeSurfer Pipeline [HCPStyleData] ...

---> T1w image file present.
---> T2w image file present.
---> Magnitude Field Map 1 file present.
---> Phase Field Map 1 file present.

------------------------------------------------------------
Running HCP Pipelines command via QuNex:
...

I have cross-checked that I’m running the same code and that my input files (images, parameters and mapping) are essentially the same between the two. (OK, I did discover that I had the sequences listed in a slightly different order in the mapping file, but I can’t imagine that matters.)

I’m not sure how else to troubleshoot. Thanks!

QuNex creates two logs. The more detailed one is found by default in processing/logs/comlogs and includes the output of all the external commands that QuNex calls. The second one is in processing/logs/runlogs and includes only some top-level information. Could you check both log folders for any information?
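
To see quickly which logs were written last in each folder, something along these lines can help. It is only a rough sketch: newest_logs is a made-up helper rather than a QuNex command, and the study path is a placeholder to replace with your own.

```shell
# Hypothetical helper (not a QuNex command): print the most recently
# modified entries in a folder, newest first.
newest_logs() {
  dir=$1
  count=${2:-5}
  if [ ! -d "$dir" ]; then
    echo "no such folder: $dir" >&2
    return 1
  fi
  # ls -t sorts by modification time, newest first.
  ls -t "$dir" | head -n "$count"
}

# Placeholder study folder, assumed for illustration only.
study="/home/haririlab/qunex"
for sub in comlogs runlogs; do
  echo "== $sub =="
  newest_logs "$study/processing/logs/$sub" 3 || true
done
```

If the newest comlog belongs to an earlier step rather than to hcp_pre_freesurfer, the command most likely stopped before calling any external tool.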

I see that in one case you are using the latest QuNex (0.98.0) and in the other an older version (0.97.1). It will probably be easier to get to the bottom of this if you use the same version in both places.

Also, could it be that hcp_pre_freesurfer has already been completed here? Try adding the --overwrite="yes" parameter to the call.

thank you!

I tried pulling and running v0.97.1 here (on my WSL / docker setup, to match the version I used successfully on the HPC) but am getting the same result (even if I start from the beginning with this version).

There are no logs in the comlogs dir corresponding to this step (the most recent one is done_create_batch_1_2023-06-18_13.38.09.396532.log). All i get in the runlogs dir is still the short log with the success message i pasted above.

Is it possible that there’s some reason things aren’t working as expected because I’m using WSL?

I forgot to add that hcp_pre_freesurfer has not been completed - there is nothing in the sessions/<id>/hcp/<id> dir except for the unprocessed dir (and nothing changes if i add overwrite="yes").

Hi, could you please also upload the batch.txt file that gets used?

Another thing you could try is to enter the container manually and work in it interactively:

# enter the container interactively
docker run -it gitlab.qunex.yale.edu:5002/qunex/qunexcontainer:0.98.0 -v /home/haririlab/qunex:/home/haririlab/qunex /bin/bash

# test if the container is working by printing help for hcp_pre_freesurfer
qunex hcp_pre_freesurfer -h

# run the command
qunex hcp_pre_freesurfer \
  --sessionsfolder="/home/haririlab/qunex/sessions" \
  --batchfile="/home/haririlab/qunex/processing/batch.txt" \
  --overwrite="yes"

Ah, thanks for the ideas! When i entered the container, hcp_pre_freesurfer -h worked fine, and hcp_pre_freesurfer with run options did the same thing as before.

NB, i had to switch around the order of arguments in the docker command to:

docker run -it -v /home/haririlab/qunex:/home/haririlab/qunex gitlab.qunex.yale.edu:5002/qunex/qunexcontainer:0.98.0 /bin/bash

since i was getting this error:

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "-v": executable file not found in $PATH: unknown.
ERRO[0001] error waiting for container:
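
For reference, docker run expects its arguments as docker run [OPTIONS] IMAGE [COMMAND] [ARG...]. Everything after the image name is treated as the command to run inside the container, which is why "-v" was reported as an executable that could not be found. The small sketch below just prints a command in the working order. docker_shell_cmd is a made-up helper for illustration, and it assumes the host folder is mounted at the same path inside the container:

```shell
# Hypothetical helper: print a docker run command with the options placed
# before the image name and the command to execute placed after it.
docker_shell_cmd() {
  image=$1
  hostdir=$2
  printf 'docker run -it -v %s:%s %s /bin/bash\n' "$hostdir" "$hostdir" "$image"
}

# Values from this thread; the helper only prints, it does not start docker.
docker_shell_cmd \
  "gitlab.qunex.yale.edu:5002/qunex/qunexcontainer:0.98.0" \
  "/home/haririlab/qunex"
```
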

Here is the batch file:

# Generated by QuNex 0.97.1 on 2023-06-18_13.38.09.396908
#
# Sessions folder: /home/haririlab/qunex/sessions
# Source files: ['session_hcp.txt']
# Parameter file: /home/haririlab/qunex//sessions/specs/hcp_parameters.txt
#
# Generated by QuNex 0.98.0 on 2023-06-13_17.24.38.830995
#
#  Parameters file
#  =====================
#
#  ...

### ARK: these are pulled from https://gitlab.qunex.yale.edu/qunex/qunex/-/blob/master/python/qx_utilities/templates/parameters_multiband_example.txt

# ---> PreFreeSurfer
 
_hcp_suffix             :
_hcp_brainsize          : 150
_hcp_t2                 : t2						## does this just pair with name in session file or something?
_hcp_t1samplespacing    : 0.0000079
_hcp_t2samplespacing    : 0.0000026
_hcp_unwarpdir          : z
_hcp_gdcoeffs           : NONE
_hcp_avgrdcmethod       : SiemensFieldMap
_hcp_topupconfig        : b02b0.cnf					## i think this might refer to a file in the tools native config; prev we had set to /opt/HCP-Pipelines/global/config/b02b0.cnf
_hcp_printcom           :
_hcp_sephaseneg         : NONE
_hcp_sephasepos         : NONE
_hcp_seechospacing      : NONE						## this was just 'echospacing' before	
_hcp_seunwarpdir        : NONE
 
# --->  FS
_hcp_freesurfer_home    : ${FREESURFER_HOME}

# ---> PostFreeSurfer

_hcp_regname            : MSMSulc
_hcp_grayordinatesres   : 2
_hcp_hiresmesh          : 164
_hcp_lowresmesh         : 32
 

# --- hcp_pre_freesurfer options

_hcp_echodiff             : 2.46                # ... the delta in TE times for the hi-res fieldmap image [''].

# --- Processing options

_run                      : run # ... Run type: run - do the task, test - perform checks.
_log                      : keep            # ... Whether to remove ('remove') the temporary logs once jobs are completed, keep them in the study level processing/logs/comlogs folder ('keep' or 'study') in the hcp folder ('hcp') or in a <session id>/logs/comlogs folder ('sessions'). Multiple options can be specified separated by '|'.

---
# Generated by QuNex 0.97.1 on 2023-06-18_13.32.33.195258
#
session: 1
subject: 1
bids: /home/haririlab/qunex/sessions/1/bids
raw_data: /home/haririlab/qunex/sessions/1/nii
hcp: /home/haririlab/qunex/sessions/1/hcp
hcpready: true
1   :T1w             :T1w: fm(1)
2   :T2w             :T2w acq-FLAIR: fm(1)
3   :FM-Magnitude    :magnitude: fm(1)
4   :FM-Phase        :phasediff: fm(1)

I see that the session name differs between the two setups: in one case it is 1, in the other 1133. On paper this should not change anything, but we have never used such a simple session name (1) in our tests and there might be a weird interaction there.

Could you try running this in the container:

gmri hcp_pre_freesurfer \
  --sessionsfolder="/home/haririlab/qunex/sessions" \
  --sessions="/home/haririlab/qunex/processing/batch.txt"

gmri bypasses our entry point and runs the Python code directly, so this should work even if there are issues with our entry point.

I would also try removing _run and _log from the batch file; these are the default values, so setting them should not change anything. What happens if you set _run to test?
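
A quick way to see which session ids and which _run / _log settings the batch file actually contains is sketched below. batch_summary is a made-up helper, not a QuNex command, and it only assumes the layout visible in the batch file posted above ("session: <id>" lines and "_name : value" parameter lines):

```shell
# Hypothetical helper (not a QuNex command): print the session ids and
# the _run / _log parameter lines found in a batch file.
batch_summary() {
  batch=$1
  if [ ! -f "$batch" ]; then
    echo "no such file: $batch" >&2
    return 1
  fi
  # Each session block has a line of the form "session: <id>".
  grep -E '^session:' "$batch" || true
  # Parameter lines start with an underscore, e.g. "_run : run".
  grep -E '^_(run|log)[[:space:]]*:' "$batch" || true
  return 0
}

# Placeholder path, assumed for illustration only.
batch_summary "/home/haririlab/qunex/processing/batch.txt" || true
```

Running it before and after editing the file shows whether _run and _log are really gone and which session names will be processed.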

Could you maybe upload session 1 somewhere so I can try replicating everything on our end? That would make this much easier to debug. I will be away for a couple of days and will continue helping you out next week.

Thank you so much for your help! Just now I moved over the data for 1133 that I was successfully working with on my HPC and also removed the _run and _log settings as you suggested, and now it’s running! Given that I made all the changes at once, I’m unfortunately not sure exactly which one did the trick. Though the data are from different subjects, they are the same in every other way (scanner, acquisitions, file format / naming), so I’d suspect the short session name (1 vs 1133) before suspecting the data.

I’ll note also that I had tried running the above gmri command in the container before making those changes, with the same result as earlier (not running, no error).

should be good on this for now unless i run into another question - thanks again!

No problem, glad it is working now. If you figure out what exactly was the issue, please let me know so we can fix it. If you encounter any other issues feel free to open new posts on the forum.