Hi, I’m trying to run a batch of subjects through the HCP Pipelines in parallel on my institution’s HPC cluster using QuNex. I have read through as much of the relevant documentation as I can find, but I’m having trouble piecing everything together to figure out exactly how to do it.
I would like to use the “run_turnkey” command so that I only have to submit one command for each new batch of subjects, but my attempts to set it up have not worked as expected. This is my latest attempt (largely following the example here):
# Path to the QuNex container image
con=$H/Scripts/Tools/qunex/qunex_suite-0.97.1.sif
# Study working directory on the cluster
qdir=/work/long/qunex/
# Session IDs for this batch (space-separated)
IDs="0226 0227"
qunex_container run_turnkey \
--container="${con}" \
--bind=${qdir} \
--dataformat="BIDS" \
--paramfile="${qdir}/DBIS_P52/sessions/specs/parameters.txt" \
--mappingfile="${qdir}/DBIS_P52/sessions/specs/hcp_mapping.txt" \
--workingdir="${qdir}" \
--projectname="DBIS_P52" \
--path="${qdir}/DBIS_P52" \
--sessions="$(echo $IDs | sed 's/ /,/g')" \
--sessionsfoldername="sessions" \
--turnkeytype="local" \
--overwrite="append" \
--turnkeysteps="create_session_info,setup_hcp,create_batch,hcp_pre_freesurfer,hcp_freesurfer,hcp_post_freesurfer" \
--scheduler="SLURM,jobname=qunex_turnkey,time=48:00:00,cpus-per-task=2,mem-per-cpu=16000,partition=scavenger"
One thing I’m particularly unsure about is how the “batchfile” should work here. I have included overwrite="append" and see that this results in the create_batch step appending each session’s info to the batch file (which by default seems to be ${qdir}/DBIS_P52/processing/parameters.txt). The scheduler is submitting a job for each subject, but it looks like each job is trying to run the pipeline for every subject, so I suspect I’m doing something wrong with the batch specification. If I instead set overwrite to yes, that one batch file ends up containing only the session info of the last job that wrote to it, and all submitted jobs run that one session. (If I don’t specify overwrite at all, it defaults to append.)
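In case it clarifies what I’m aiming for, one workaround I’ve considered (untested) is to submit a separate run_turnkey call per session, like the loop below, which just reuses the flags from my command above with a single-session --sessions value. But as far as I can tell, every call would still write to the same shared batch file, so I’m not sure this actually avoids the collision:

# Possible workaround (untested): one run_turnkey submission per session
for id in $IDs; do
    qunex_container run_turnkey \
    --container="${con}" \
    --bind=${qdir} \
    --dataformat="BIDS" \
    --paramfile="${qdir}/DBIS_P52/sessions/specs/parameters.txt" \
    --mappingfile="${qdir}/DBIS_P52/sessions/specs/hcp_mapping.txt" \
    --workingdir="${qdir}" \
    --projectname="DBIS_P52" \
    --path="${qdir}/DBIS_P52" \
    --sessions="${id}" \
    --sessionsfoldername="sessions" \
    --turnkeytype="local" \
    --overwrite="yes" \
    --turnkeysteps="create_session_info,setup_hcp,create_batch,hcp_pre_freesurfer,hcp_freesurfer,hcp_post_freesurfer" \
    --scheduler="SLURM,jobname=qunex_${id},time=48:00:00,cpus-per-task=2,mem-per-cpu=16000,partition=scavenger"
done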
It may be irrelevant once I solve my batching issue, but I’ll note also that at one point I attempted to run 80 subjects this way, and they all failed in hcp_pre_freesurfer with the error:
Image Exception : #22 :: Failed to read volume /work/long/qunex/DBIS_P52/sessions/0270/hcp/0270/T1w/T1w_acpc
Error : Error: short read, file may be truncated
I can run a single subject/session successfully this way, so I’m wondering whether this error has something to do with many jobs trying to do the same thing with the container at once.
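After a failed batch, I’ve been planning to check which outputs are actually truncated with something like the loop below (using FSL’s fslinfo, which should exit nonzero on an unreadable volume; I’m assuming the outputs have the usual .nii.gz extension, since the error message drops it):

# Check each session's T1w_acpc output for truncation
for id in $IDs; do
    f="${qdir}/DBIS_P52/sessions/${id}/hcp/${id}/T1w/T1w_acpc.nii.gz"
    fslinfo "$f" > /dev/null 2>&1 || echo "unreadable or truncated: $f"
done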
(I’ll note also that I couldn’t figure out how to get the BIDS import to work with this command, so I have been manually setting up the nii directory and session.txt file for QuNex to take it from there; this seems to work fine for a single subject.)
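For reference, my manual setup amounts to roughly the following per session; session_template.txt is a hypothetical name for a copy of my working single-subject session.txt with the session ID replaced by the placeholder token SESSID:

# Hypothetical per-session setup, cloning a known-good session.txt
for id in $IDs; do
    sdir="${qdir}/DBIS_P52/sessions/${id}"
    mkdir -p "${sdir}/nii"
    sed "s/SESSID/${id}/g" "${qdir}/DBIS_P52/sessions/specs/session_template.txt" \
        > "${sdir}/session.txt"
    # ...then copy or link this session's NIfTI files into ${sdir}/nii...
done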
Grateful for any clarification you can provide!