[RESOLVED] Two errors running HCP longitudinal pipeline

Description:

Not sure if it’s better to get help on this here or on the HCP pipeline page, but thought I’d start here. I have 40 sets of scans that I’m running through the longitudinal HCP pipeline in QuNex, and 33 have completed successfully. They all have the same data structure and I’m running them with the same script. Of the remaining 7, I’m getting the same error in 6 of them and a segmentation fault in the 7th (with the same result after attempting to re-run).

Call:
Here is my qunex call:

qdir=/work/long/qunex/
projectname=DBIS_longComb_21133
con=$qdir/qunex_suite-1.1.0.sif

qunex_container run_recipe \
    --container="${con}" \
    --bind="${qdir}" \
    --recipe_file="$qdir/$projectname/sessions/specs/recipe.yaml" \
    --recipe="hcp_longitudinal" \
    --scheduler="SLURM,jobname=hcp_long,cpus-per-task=6,time=96:00:00,mem-per-cpu=16000,partition=scavenger"

and my recipe:

global_parameters:
  sessionsfolder    : /work/long/qunex/DBIS_longComb_21133/sessions
  sessions          : p45,p52mprage,p52CSmprage1,p52CSmprage2,p52CSmprage3,p52CSmprage4
  overwrite         : "yes"
  batchfile         : /work/long/qunex/DBIS_longComb_21133/processing/batch.txt
  parsessions       : 6

recipes:

  hcp_longitudinal:
      parsessions: 6
      
      commands:
          - create_session_info:
              mapping: /work/long/qunex/DBIS_longComb_21133/sessions/specs/hcp_mapping.txt
          - setup_hcp
          - create_batch:
              targetfile: /work/long/qunex/DBIS_longComb_21133/processing/batch.txt
              paramfile : /work/long/qunex/DBIS_longComb_21133/sessions/specs/parameters.txt              
          - hcp_pre_freesurfer
          - hcp_freesurfer
          - hcp_post_freesurfer
          - hcp_long_freesurfer
          - hcp_long_post_freesurfer

Logs:

For the 6x repeated error, here’s the relevant output:

$ tail -20 /work/long/qunex/DBIS_longComb_21222/processing/logs/comlogs/error_hcp_long_post_freesurfer_21222_2025-03-21_03.08.42.941591.log 
Info: Time to read /work/long/qunex/DBIS_longComb_21222/subjects/21222/p45.long.base/T1w/wmparc.nii.gz was 0.922471 seconds.

some jobs had errors, please check /work/long/qunex/DBIS_longComb_21222/processing/logs/comlogs/extra_logs_hcp_long_post_freesurfer_21222/PostFreeSurferPipelineLongLauncher.sh.errjobs308835.15.log
Fri Mar 21 06:02:46 EDT 2025:PostFreeSurferPipelineLongLauncher.sh: While running '/opt/HCP/HCPpipelines/PostFreeSurfer/PostFreeSurferPipelineLongLauncher.sh --study-folder=/work/long/qunex/DBIS_longComb_21222/subjects/21222 --subject=21222 --sessions=p45@p52CSmprage1@p52CSmprage2@p52CSmprage3@p52CSmprage4@p52mprage --longitudinal-template=base --t1template=/opt/HCP/HCPpipelines/global/templates/MNI152_T1_1mm.nii.gz --t1templatebrain=/opt/HCP/HCPpipelines/global/templates/MNI152_T1_1mm_brain.nii.gz --t1template2mm=/opt/HCP/HCPpipelines/global/templates/MNI152_T1_2mm.nii.gz --t2template=/opt/HCP/HCPpipelines/global/templates/MNI152_T2_1mm.nii.gz --t2templatebrain=/opt/HCP/HCPpipelines/global/templates/MNI152_T2_1mm_brain.nii.gz --t2template2mm=/opt/HCP/HCPpipelines/global/templates/MNI152_T2_2mm.nii.gz --templatemask=/opt/HCP/HCPpipelines/global/templates/MNI152_T1_1mm_brain_mask.nii.gz --template2mmmask=/opt/HCP/HCPpipelines/global/templates/MNI152_T1_2mm_brain_mask_dil.nii.gz --fnirtconfig=/opt/HCP/HCPpipelines/global/config/T1_2_MNI152_2mm.cnf --freesurferlabels=/opt/HCP/HCPpipelines/global/config/FreeSurferAllLut.txt --surfatlasdir=/opt/HCP/HCPpipelines/global/templates/standard_mesh_atlases --grayordinatesres=2 --grayordinatesdir=/opt/HCP/HCPpipelines/global/templates/91282_Greyordinates --hiresmesh=164 --lowresmesh=32 --subcortgraylabels=/opt/HCP/HCPpipelines/global/config/FreeSurferSubcorticalLabelTableLut.txt --refmyelinmaps=/opt/HCP/HCPpipelines/global/templates/standard_mesh_atlases/Conte69.MyelinMap_BC.164k_fs_LR.dscalar.nii --regname=MSMSulc --parallel-mode=BUILTIN --logdir=/work/long/qunex/DBIS_longComb_21222/processing/logs/comlogs/extra_logs_hcp_long_post_freesurfer_21222':
Fri Mar 21 06:02:46 EDT 2025:PostFreeSurferPipelineLongLauncher.sh: While running '/opt/HCP/HCPpipelines/PostFreeSurfer/PostFreeSurferPipelineLongLauncher.sh --study-folder=/work/long/qunex/DBIS_longComb_21222/subjects/21222 --subject=21222 --sessions=p45@p52CSmprage1@p52CSmprage2@p52CSmprage3@p52CSmprage4@p52mprage --longitudinal-template=base --t1template=/opt/HCP/HCPpipelines/global/templates/MNI152_T1_1mm.nii.gz --t1templatebrain=/opt/HCP/HCPpipelines/global/templates/MNI152_T1_1mm_brain.nii.gz --t1template2mm=/opt/HCP/HCPpipelines/global/templates/MNI152_T1_2mm.nii.gz --t2template=/opt/HCP/HCPpipelines/global/templates/MNI152_T2_1mm.nii.gz --t2templatebrain=/opt/HCP/HCPpipelines/global/templates/MNI152_T2_1mm_brain.nii.gz --t2template2mm=/opt/HCP/HCPpipelines/global/templates/MNI152_T2_2mm.nii.gz --templatemask=/opt/HCP/HCPpipelines/global/templates/MNI152_T1_1mm_brain_mask.nii.gz --template2mmmask=/opt/HCP/HCPpipelines/global/templates/MNI152_T1_2mm_brain_mask_dil.nii.gz --fnirtconfig=/opt/HCP/HCPpipelines/global/config/T1_2_MNI152_2mm.cnf --freesurferlabels=/opt/HCP/HCPpipelines/global/config/FreeSurferAllLut.txt --surfatlasdir=/opt/HCP/HCPpipelines/global/templates/standard_mesh_atlases --grayordinatesres=2 --grayordinatesdir=/opt/HCP/HCPpipelines/global/templates/91282_Greyordinates --hiresmesh=164 --lowresmesh=32 --subcortgraylabels=/opt/HCP/HCPpipelines/global/config/FreeSurferSubcorticalLabelTableLut.txt --refmyelinmaps=/opt/HCP/HCPpipelines/global/templates/standard_mesh_atlases/Conte69.MyelinMap_BC.164k_fs_LR.dscalar.nii --regname=MSMSulc --parallel-mode=BUILTIN --logdir=/work/long/qunex/DBIS_longComb_21222/processing/logs/comlogs/extra_logs_hcp_long_post_freesurfer_21222':
Fri Mar 21 06:02:46 EDT 2025:PostFreeSurferPipelineLongLauncher.sh: ERROR: 'return' command failed with return code: 1
Fri Mar 21 06:02:46 EDT 2025:PostFreeSurferPipelineLongLauncher.sh: ERROR: 'return' command failed with return code: 1

===> ERROR: Command returned with nonzero exit code
---------------------------------------------------
         script: PostFreeSurferPipelineLongLauncher.sh
stopped at line: 263
           call: return 1
  expanded call: return 1
       hostname: dcc-adrc-01
      exit code: 1
---------------------------------------------------

===> Aborting execution!

and

$ grep -A8 ERROR /work/long/qunex/DBIS_longComb_21222/processing/logs/comlogs/extra_logs_hcp_long_post_freesurfer_*/PostFreeSurferPipelineLongLauncher.sh.*.15.e.log 
ERRORS loading scene, output image may be incorrect.
NAME OF FILE: p45.long.base.StrainJ_MSMAll.164k_fs_LR.dscalar.nii
PATH TO FILE: /work/long/qunex/DBIS_longComb_21222/subjects/21222/p45.long.base/MNINonLinear
   File was not found
NAME OF FILE: p45.long.base.StrainR_MSMAll.164k_fs_LR.dscalar.nii
PATH TO FILE: /work/long/qunex/DBIS_longComb_21222/subjects/21222/p45.long.base/MNINonLinear
   File was not found

Info: Time to read /work/long/qunex/DBIS_longComb_21222/subjects/21222/p45.long.base/MNINonLinear/p45.long.base.L.midthickness.164k_fs_LR.surf.gii was 0.183195 seconds.

FWIW, I noticed that the p45.long.base.StrainJ_MSMAll.164k_fs_LR.dscalar.nii file does not exist for ANY of the jobs I’m running (including successful completions), but that p45.long.base.StrainJ_MSMSulc.164k_fs_LR.dscalar.nii exists for ALL of the jobs I’m running (including failures).

For the seg fault, here’s what I’m getting:

$ tail -30 /work/long/qunex/DBIS_longComb_21021/processing/logs/comlogs/error_hcp_freesurfer_p52CSmprage4_2025-03-21_11.56.50.328700.log 
029: dt: 0.0500, sse=76080.5, rms=0.428 (0.051%)
030: dt: 0.0500, sse=76114.1, rms=0.428 (0.025%)
positioning took 1.6 minutes
tol=1.0e-04, sigma=2.0, host=dcc-c, nav=16, nbrs=2, l_surf_repulse=5.000, l_tspring=0.100, l_nspring=0.050, l_location=0.250, l_curv=0.100
mom=0.00, dt=0.50
Segmentation fault 
Linux dcc-courses-15 5.14.0-503.14.1.el9_5.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Nov 19 21:25:22 EST 2024 x86_64 GNU/Linux

recon-all -s p52CSmprage4 exited with ERRORS at Sat Mar 22 03:21:03 EDT 2025

For more details, see the log file /work/long/qunex/DBIS_longComb_21021/sessions/p52CSmprage4/hcp/p52CSmprage4/T1w/p52CSmprage4/scripts/recon-all.log
To report a problem, see http://surfer.nmr.mgh.harvard.edu/fswiki/BugReporting

Sat Mar 22 03:21:03 EDT 2025:FreeSurferPipeline.sh: While running '/opt/HCP/HCPpipelines/FreeSurfer/FreeSurferPipeline.sh --session-dir=/work/long/qunex/DBIS_longComb_21021/sessions/p52CSmprage4/hcp/p52CSmprage4/T1w --session=p52CSmprage4 --processing-mode=HCPStyleData --t1=/work/long/qunex/DBIS_longComb_21021/sessions/p52CSmprage4/hcp/p52CSmprage4/T1w/T1w_acpc_dc_restore.nii.gz --t1brain=/work/long/qunex/DBIS_longComb_21021/sessions/p52CSmprage4/hcp/p52CSmprage4/T1w/T1w_acpc_dc_restore_brain.nii.gz --t2=/work/long/qunex/DBIS_longComb_21021/sessions/p52CSmprage4/hcp/p52CSmprage4/T1w/T2w_acpc_dc_restore.nii.gz':
Sat Mar 22 03:21:03 EDT 2025:FreeSurferPipeline.sh: While running '/opt/HCP/HCPpipelines/FreeSurfer/FreeSurferPipeline.sh --session-dir=/work/long/qunex/DBIS_longComb_21021/sessions/p52CSmprage4/hcp/p52CSmprage4/T1w --session=p52CSmprage4 --processing-mode=HCPStyleData --t1=/work/long/qunex/DBIS_longComb_21021/sessions/p52CSmprage4/hcp/p52CSmprage4/T1w/T1w_acpc_dc_restore.nii.gz --t1brain=/work/long/qunex/DBIS_longComb_21021/sessions/p52CSmprage4/hcp/p52CSmprage4/T1w/T1w_acpc_dc_restore_brain.nii.gz --t2=/work/long/qunex/DBIS_longComb_21021/sessions/p52CSmprage4/hcp/p52CSmprage4/T1w/T2w_acpc_dc_restore.nii.gz':
Sat Mar 22 03:21:03 EDT 2025:FreeSurferPipeline.sh: ERROR: '"${recon_all_cmd[@]}"' command failed with return code: 1
Sat Mar 22 03:21:03 EDT 2025:FreeSurferPipeline.sh: ERROR: '"${recon_all_cmd[@]}"' command failed with return code: 1

===> ERROR: Command returned with nonzero exit code
---------------------------------------------------
         script: FreeSurferPipeline.sh
stopped at line: 573
           call: "${recon_all_cmd[@]}"
  expanded call: "${recon_all_cmd[@]}"
       hostname: dcc-courses-15
      exit code: 1
---------------------------------------------------

===> Aborting execution!

(it popped up in exactly the same place when I re-ran it)

Any suggestions on how to troubleshoot these would be appreciated! Happy to provide additional logs/info as needed. Thanks!

Hi!

  1. Are the 7 scans different from the other 33 in some way? Would they need a different mapping file perhaps?

  2. Can you please upload the full log file. I would need the whole output to see what is happening. This one:
/work/long/qunex/DBIS_longComb_21222/processing/logs/comlogs/error_hcp_long_post_freesurfer_21222_2025-03-21_03.08.42.941591.log

  3. This can be safely ignored:
$ grep -A8 ERROR /work/long/qunex/DBIS_longComb_21222/processing/logs/comlogs/extra_logs_hcp_long_post_freesurfer_*/PostFreeSurferPipelineLongLauncher.sh.*.15.e.log 
ERRORS loading scene, output image may be incorrect.
NAME OF FILE: p45.long.base.StrainJ_MSMAll.164k_fs_LR.dscalar.nii

These are just some QC steps. You can see that the pipeline is trying to create some MSMAll-related outputs, which cannot be generated since MSMAll has not been executed yet.
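
If it helps when scanning the extra logs, here is a minimal sketch of one way to skip that scene-loading message and show only the remaining ERROR lines. The function name `real_errors` is made up for illustration, and it only assumes that the ignorable message always starts with `ERRORS loading scene`, as in the excerpt above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of QuNex or the HCP pipelines).
# Prints every line containing ERROR, with its line number, except the
# "ERRORS loading scene" message that refers to missing MSMAll outputs.
real_errors() {
    # $1: path to a log file
    grep -n 'ERROR' "$1" | grep -v 'ERRORS loading scene'
}

# Possible usage, with the path pattern taken from the post above:
#   for f in /work/long/qunex/DBIS_longComb_21222/processing/logs/comlogs/extra_logs_hcp_long_post_freesurfer_*/*.e.log; do
#       real_errors "$f"
#   done
```

The function returns a nonzero status when nothing is left after filtering, so an empty result means the log contained no other ERROR lines.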

Best, Jure

thanks for your quick response! Point #3 is good to know - seems like it might have been a bit of a red herring as I was troubleshooting.

And it also turns out that whatever the real issue is seems to be sorting itself out by the THIRD re-run for most of these (some are still going). As such, I now suspect that it’s related to something in the computing environment, and if any problems persist I’ll need to reach out to my HPC admins.
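
For keeping track of which jobs still need a re-run, a rough sketch along these lines could list the error logs across all project folders. The function name `list_error_logs` is hypothetical, and the only assumption is the `processing/logs/comlogs/error_*.log` layout seen in the paths above. Error logs from earlier attempts may still be present after a successful re-run, so the timestamps in the file names need to be checked by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every error_*.log found in a comlogs folder
# anywhere below the given root directory, sorted by path.
list_error_logs() {
    # $1: root directory that holds the project folders, e.g. /work/long/qunex
    find "$1" -type f -path '*/processing/logs/comlogs/error_*.log' | sort
}

# Possible usage:
#   list_error_logs /work/long/qunex
```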

thanks again!!

Glad that things seem to work now. The seg fault issue in recon-all is unfortunately also out of QuNex's hands; it could be a weird quirk with the data and some kind of memory leak in there.
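
To see whether the crash really happens at the same stage on every run, something like the sketch below might help. It assumes that recon-all marks each processing stage in recon-all.log with a line starting with `#@# `, which is worth verifying against your own log, and the function name `last_stage_before_segfault` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the most recent "#@# " stage marker that
# appears before the first "Segmentation fault" line in a recon-all log.
# Exits with status 1 if the log contains no segmentation fault.
last_stage_before_segfault() {
    # $1: path to recon-all.log
    awk '
        /^#@# /              { stage = $0 }              # remember latest stage marker
        /Segmentation fault/ { print stage; found = 1; exit }
        END                  { exit !found }
    ' "$1"
}

# Possible usage, with the log path reported in the error output above:
#   last_stage_before_segfault /work/long/qunex/DBIS_longComb_21021/sessions/p52CSmprage4/hcp/p52CSmprage4/T1w/p52CSmprage4/scripts/recon-all.log
```

Comparing the printed stage between runs gives a quick hint whether the failure is tied to the data (same stage every time) or to the compute node (different stages).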

Let me know if it worked this time around.

Best, Jure

Hi aknodt!

Any updates here?

Best, Jure

Yes, all of my jobs have now completed successfully - re-running a third time ended up working for the last few that were throwing errors (-: Sorry to be premature in raising issues here, but thanks for your help nonetheless!