[RESOLVED] What is the proper way to run multiple sessions in parallel with run_turnkey?

Hi, I’m trying to run a batch of subjects through the HCP pipeline in parallel on my institution’s HPC using QuNex. I have read through as much of the relevant documentation as I can find, but am having trouble piecing everything together to figure out exactly how to do it.

I would like to use the “run_turnkey” command so that I only have to submit one command each time I have a new batch of subjects to run, but my attempts to set it up have not worked as expected. This is my latest attempt (where I’ve largely followed the example here):

con=$H/Scripts/Tools/qunex/qunex_suite-0.97.1.sif
qdir=/work/long/qunex/
IDs="0226 0227"

qunex_container run_turnkey \
    --container="${con}" \
    --bind="${qdir}" \
    --dataformat="BIDS" \
    --paramfile="${qdir}/DBIS_P52/sessions/specs/parameters.txt" \
    --mappingfile="${qdir}/DBIS_P52/sessions/specs/hcp_mapping.txt" \
    --workingdir="${qdir}" \
    --projectname="DBIS_P52" \
    --path="${qdir}/DBIS_P52" \
    --sessions="$(echo $IDs | sed 's/ /,/g')" \
    --sessionsfoldername="sessions" \
    --turnkeytype="local" \
    --overwrite="append" \
    --turnkeysteps="create_session_info,setup_hcp,create_batch,hcp_pre_freesurfer,hcp_freesurfer,hcp_post_freesurfer" \
    --scheduler="SLURM,jobname=qunex_turnkey,time=48:00:00,cpus-per-task=2,mem-per-cpu=16000,partition=scavenger"

One thing I’m particularly unsure about is how the “batchfile” should work here. I have included --overwrite="append" and see that this results in the create_batch step appending the info from each session to the batch file (which by default seems to be ${qdir}/DBIS_P52/processing/parameters.txt). The scheduler is submitting a job for each subject, but it looks like each job is trying to run the pipeline for every subject, so I suspect I’m doing something wrong with the batch specification. If I instead set overwrite to "yes", that one batch file will only contain the session info for the last session to write to it, and both/all submitted jobs will run that one. (If I don’t specify, it defaults to append.)
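One idea I’ve been considering (a hedged sketch only, not something I’ve verified against the docs - the `--targetfile` and `--sessionsfolder` flag usage here mirrors a plain `create_batch` call rather than `run_turnkey`) is to write a separate batch file per session, so that each submitted job would only ever see its own session. Echoed here as a dry run:

```shell
# Hypothetical sketch: one batch file per session, so concurrent jobs do not
# share (and race on) a single appended batch file. Commands are echoed as a
# dry run; remove the "echo" to actually execute them.
qdir=/work/long/qunex
IDs="0226 0227"

for id in $IDs; do
    target="${qdir}/DBIS_P52/processing/batch_${id}.txt"
    echo qunex_container create_batch \
        --sessionsfolder="${qdir}/DBIS_P52/sessions" \
        --sessions="$id" \
        --paramfile="${qdir}/DBIS_P52/sessions/specs/parameters.txt" \
        --targetfile="$target" \
        --overwrite="yes"
done
```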

It may be irrelevant once I solve my batching issue, but I’ll note also that at one point I attempted to run 80 subjects this way, and they all failed in hcp_pre_freesurfer with the error:

Image Exception : #22 :: Failed to read volume /work/long/qunex/DBIS_P52/sessions/0270/hcp/0270/T1w/T1w_acpc
Error : Error: short read, file may be truncated

I can run a single subject(/session) successfully this way, so I’m wondering if this error has something to do with many jobs trying to do the same thing with the container at once.

(I’ll note also that I couldn’t figure out how to get import from BIDS to work with this command, so I have been manually setting up the nii directory and session.txt file for QuNex to take it from there - seems to work great for a single subject.)

Grateful for any clarification you can provide!

Hi,

Based on what you are providing, it looks like the study is already created and the data is onboarded. In other words, the data for both sessions 0226 and 0227 was imported into QuNex (e.g., through import_dicom or import_hcp). It would be useful if you could let me know which import command was used, as the steps that follow differ slightly between the two. Once I have that info, I can prepare an example call that should work.

Jure

Yes, the study has already been created and the data imported. I couldn’t figure out how to get import from BIDS to work with run_turnkey, so I have been manually setting up the nii directory and session.txt file for QuNex to take it from there - this seems to work great for a single subject.

This is what the session folders look like when I’ve done that:

ark19@dcc-core-28  /work/long/qunex $ ls -l /work/long/qunex/DBIS_P52/sessions/0223/*
-rw-rw-r--. 1 ark19 root 293 Jun 23 08:26 /work/long/qunex/DBIS_P52/sessions/0223/session.txt
/work/long/qunex/DBIS_P52/sessions/0223/nii:
total 24494
-rwxr-xr-x. 3 ark19 root 12445465 Jun 23 08:13 1.nii.gz
-rwxr-xr-x. 3 ark19 root  5767140 Jun 23 08:13 2.nii.gz
-rw-r--r--. 3 ark19 root  1935478 Jun 23 08:13 3.nii.gz
-rwxr-xr-x. 3 ark19 root  1140036 Jun 23 08:13 4.nii.gz

ark19@dcc-login-01  /work/long/qunex $ cat /work/long/qunex/DBIS_P52/sessions/0223/session.txt
# Generated by ARK to match qunex format
#
session: 0223
subject: 0223
bids: /work/long/qunex/DBIS_P52/sessions/0223/bids
raw_data: /work/long/qunex/DBIS_P52/sessions/0223/nii
hcp: /work/long/qunex/DBIS_P52/sessions/0223/hcp

1: T1w
2: T2w acq-FLAIR
3: magnitude
4: phasediff

(I did it this way, rather than first running qunex_container import_bids, because I would like to be able to run just a single qunex command to process each batch of subjects, since I will need to be doing this regularly. Since each qunex_container command spawns a process and returns to the calling environment, it seems I can’t just have a script where I first call qunex_container import_bids and then qunex_container run_turnkey, because run_turnkey may start before import_bids is finished. I very much apologize if this is explained in the documentation somewhere and I have just missed it :frowning: I also realize that it would be more straightforward to import from DICOM data with the run_turnkey command, but I would rather work from my BIDS data, so I chose to implement the workaround instead.)
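One pattern I’ve been considering for the sequencing concern specifically (a sketch only - the flags are copied from the calls above, and it assumes qunex_container blocks until completion when no --scheduler is given, which I haven’t confirmed) is to put both qunex_container calls into a single SLURM batch script, so that the import necessarily finishes before the turnkey processing starts within the same job:

```shell
# Write a single SLURM wrapper script that runs import and turnkey
# processing sequentially in one job, guaranteeing ordering.
# Paths and flags are assumptions based on the calls in this thread;
# submit with: sbatch qunex_wrapper.sh
cat > qunex_wrapper.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=qunex_seq
#SBATCH --time=48:00:00
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=16000
#SBATCH --partition=scavenger

con=$H/Scripts/Tools/qunex/qunex_suite-0.97.1.sif
qdir=/work/long/qunex/
projectname=test_project

# Step 1: import; the script blocks here until the command completes
# (assuming qunex_container runs in the foreground without --scheduler).
qunex_container import_bids \
    --container="$con" \
    --bind="$qdir" \
    --sessionsfolder="$qdir/$projectname/sessions" \
    --inbox="$qdir/data" \
    --archive='leave' \
    --overwrite='yes'

# Step 2: only reached after the import has finished.
qunex_container run_turnkey \
    --container="$con" \
    --bind="$qdir" \
    --workingdir="$qdir" \
    --projectname="$projectname" \
    --path="$qdir/$projectname" \
    --sessions="0223,0224" \
    --sessionsfoldername="sessions" \
    --turnkeytype="local" \
    --turnkeysteps="create_session_info,setup_hcp,create_batch,hcp_pre_freesurfer,hcp_freesurfer,hcp_post_freesurfer"
EOF
```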

But in case my data import workaround was the cause of my issue, I just now started fresh with creating a new study and importing from BIDS:

con=$H/Scripts/Tools/qunex/qunex_suite-0.97.1.sif
qdir=/work/long/qunex/
projectname=test_project
bidsdir=$qdir/data

qunex_container create_study \
    --container="$con" \
    --bind="$qdir" \
    --studyfolder="$qdir/$projectname"

qunex_container import_bids \
    --container="$con" \
    --bind="$qdir,$bidsdir" \
    --sessionsfolder="$qdir/$projectname/sessions" \
    --inbox="$bidsdir" \
    --archive='leave' \
    --overwrite='yes'

qunex_container run_turnkey \
    --container="${con}" \
    --bind="${qdir}" \
    --dataformat="BIDS" \
    --paramfile="${qdir}/${projectname}/sessions/specs/parameters.txt" \
    --mappingfile="${qdir}/${projectname}/sessions/specs/hcp_mapping.txt" \
    --workingdir="${qdir}" \
    --projectname="${projectname}" \
    --path="${qdir}/${projectname}" \
    --sessions="0223,0224" \
    --sessionsfoldername="sessions" \
    --turnkeytype="local" \
    --turnkeysteps="create_session_info,setup_hcp,create_batch,hcp_pre_freesurfer,hcp_freesurfer,hcp_post_freesurfer" \
    --scheduler="SLURM,jobname=qunex_turnkey,time=48:00:00,cpus-per-task=2,mem-per-cpu=16000,partition=scavenger"

So from here I’m getting the same behavior: the batch file has been appended to, and both of the spawned jobs are now working on processing session 0223.

I realized that I could do a workaround where I just make a new “project” for every subject, and that seems to be working great now, but I would like to better understand how it’s supposed to work - and again, I apologize that I seem to be missing it somehow! Thank you!
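For reference, the “new project per subject” workaround could be scripted roughly as follows (a hypothetical sketch - the per-project naming scheme is made up, and the calls are echoed as a dry run; each real call would need the full set of flags from the examples above):

```shell
# Dry-run sketch of the one-project-per-session workaround: each session
# gets its own study folder and its own run_turnkey submission, so the
# per-project batch file only ever contains that one session.
qdir=/work/long/qunex
IDs="0223 0224"

for id in $IDs; do
    proj="project_${id}"    # hypothetical naming scheme
    echo qunex_container run_turnkey \
        --projectname="$proj" \
        --path="${qdir}/${proj}" \
        --sessions="$id" \
        --sessionsfoldername="sessions" \
        --turnkeytype="local"
done
```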

Hi, could you please upload the contents of the batchfile, found in the processing subfolder? Based on your description it could be that session 0223 is listed in the batchfile twice while the other session is not in there for some reason. If that is the case, you could try running all the steps before hcp_pre_freesurfer manually and then running only that part via run_turnkey. So:

con=$H/Scripts/Tools/qunex/qunex_suite-0.97.1.sif
qdir=/work/long/qunex/
projectname=test_project
bidsdir=$qdir/data

qunex_container create_study \
    --container="$con" \
    --bind="$qdir" \
    --studyfolder="$qdir/$projectname"

qunex_container import_bids \
    --container="$con" \
    --bind="$qdir,$bidsdir" \
    --sessionsfolder="$qdir/$projectname/sessions" \
    --inbox="$bidsdir" \
    --archive='leave' \
    --overwrite='yes'

qunex_container create_session_info \
    --container="$con" \
    --bind="$qdir,$bidsdir" \
    --sessionsfolder="$qdir/$projectname/sessions" \
    --sessions="0223,0224" \
    --mapping="${qdir}/${projectname}/sessions/specs/hcp_mapping.txt" \
    --overwrite='yes'

qunex_container create_batch \
    --container="$con" \
    --bind="$qdir,$bidsdir" \
    --sessionsfolder="$qdir/$projectname/sessions" \
    --sessions="0223,0224" \
    --paramfile="${qdir}/${projectname}/sessions/specs/parameters.txt" \
    --targetfile="$qdir/$projectname/processing/batch.txt" \
    --overwrite='yes'

qunex_container setup_hcp \
    --container="$con" \
    --bind="$qdir,$bidsdir" \
    --sessionsfolder="$qdir/$projectname/sessions" \
    --sessions="0223,0224" \
    --overwrite='yes'

qunex_container run_turnkey \
    --container="${con}" \
    --bind="${qdir}" \
    --workingdir="${qdir}" \
    --projectname="${projectname}" \
    --path="${qdir}/${projectname}" \
    --batchfile="${qdir}/${projectname}/processing/batch.txt" \
    --sessions="0223,0224" \
    --overwrite="yes" \
    --sessionsfoldername="sessions" \
    --turnkeytype="local" \
    --turnkeysteps="hcp_pre_freesurfer,hcp_freesurfer,hcp_post_freesurfer" \
    --scheduler="SLURM,jobname=qunex_turnkey,time=48:00:00,cpus-per-task=2,mem-per-cpu=16000,partition=scavenger"

A workaround while we are figuring this out is to not use run_turnkey for now and to just run everything command by command. That should definitely work.

I also managed to get BIDS running through run_turnkey from scratch on our system. Below is the call adapted to your parameters. Hope it works.

con=$H/Scripts/Tools/qunex/qunex_suite-0.97.1.sif
qdir=/work/long/qunex/ 

qunex_container run_turnkey \
    --container="${con}" \
    --bind="${qdir}" \
    --dataformat="BIDS" \
    --paramfile="${qdir}/DBIS_P52/sessions/specs/parameters.txt" \
    --mappingfile="${qdir}/DBIS_P52/sessions/specs/hcp_mapping.txt" \
    --rawdatainput="${qdir}/data" \
    --workingdir="${qdir}" \
    --projectname="DBIS_P52" \
    --path="${qdir}/DBIS_P52" \
    --sessionsfoldername="sessions" \
    --sessions="0223,0224" \
    --sessionids="0223,0224" \
    --turnkeytype="local" \
    --turnkeysteps="create_study,map_raw_data,create_session_info,setup_hcp,create_batch,hcp_pre_freesurfer,hcp_freesurfer,hcp_post_freesurfer" \
    --scheduler="SLURM,jobname=qunex_turnkey,time=48:00:00,cpus-per-task=2,mem-per-cpu=16000,partition=scavenger"

parameters.txt (4.2 KB)

Thanks again for your help! Uploading the batch file, which does have both sessions.

Since I need everything as automated and parallelizable as possible, I came up with a workaround where I just make a new “project” for every subject; that seems to be working great and will do for now.

Thanks also for the additional input on using run_turnkey with BIDS data. I think maybe I had previously focused more on the “import_” commands than on “map_raw_data”. Now, when I follow your above example (which also added the sessionids line), the import step does get successfully executed. However, it does run into an error pretty quickly:

===> Successful completion of task

ls: cannot access /work/long/qunex//test_project2/sessions/0223,0224/nii/*: No such file or directory

----------------------------------------------------------------------------
  --> Batch file transfer check: pass
  --> Mapping file transfer check: pass
  --> BIDS mapping check: fail

ERROR. Something went wrong.

map_raw_data

ERROR. Something went wrong.
 ===> run_turnkey acceptance testing map_raw_data logs for completion.

 ===> run_turnkey acceptance testing found comlog file for map_raw_data step:
      /work/long/qunex//test_project2/processing/logs/comlogs/error_map_raw_data_0223,0224_2023-07-07_12.16.14.057096.log

 ===> ERROR: run_turnkey acceptance test for map_raw_data step failed.

 ===> RUNNING run_turnkey step ~~~ create_session_info


 -- Executed call:
    /opt/qunex/bin/qunex.sh create_session_info --sessionsfolder=/work/long/qunex//test_project2/sessions --sessions=0223,0224 --mapping=/work/long/qunex//test_project2/sessions/specs/hcp_mapping.txt

I’m guessing this might be related to the added “sessionids” specification - not sure. It seems to think there’s another session ID, “0223,0224”:

ark19@dcc-login-01  /work/long/qunex $ ls test_project2/sessions
0223  0223,0224  0224  archive  inbox  QC  specs

If it’s somehow helpful to the project, I’m happy to keep troubleshooting; otherwise I think it’s best I call my workaround good enough for now and move on to actually analyzing these data :slight_smile: Thanks again!

That is really weird - the command above is copy-pasted from one of my tests that I run before releasing a new version, and there it works without any issues. The test also runs across two sessions which I provide as a comma-separated list, just like in your case. I have no idea why it “finds” an extra session in your call. Just as a sanity check, please provide the full call you used for the above error - unless, that is, the call is an exact copy-paste of my command. I will try to get to the bottom of this next week.

Seems weird indeed. Here is what I am running - I would think it is identical for all practical purposes, but I could be missing something:

qdir=/work/long/qunex/
export PATH=/work/long/qunex/:$PATH
con=$H/Scripts/Tools/qunex/qunex_suite-0.97.1.sif
projectname=test_project2

# copy specs files into $qdir/$projectname/sessions/specs since qunex always seems to look there
mkdir -p $qdir/$projectname/sessions/specs
cp ${qdir}/DBIS_P52/sessions/specs/hcp_parameters.txt $qdir/$projectname/sessions/specs/parameters.txt
cp ${qdir}/DBIS_P52/sessions/specs/hcp_mapping.txt $qdir/$projectname/sessions/specs/hcp_mapping.txt

qunex_container run_turnkey \
    --container="${con}" \
    --bind="${qdir}" \
    --dataformat="BIDS" \
    --paramfile="$qdir/$projectname/sessions/specs/parameters.txt" \
    --mappingfile="$qdir/$projectname/sessions/specs/hcp_mapping.txt" \
    --rawdatainput="${qdir}/DBIS_P52/data" \
    --workingdir="${qdir}" \
    --projectname="$projectname" \
    --path="${qdir}/$projectname" \
    --sessionsfoldername="sessions" \
    --sessions="0223,0224" \
    --sessionids="0223,0224" \
    --turnkeytype="local" \
    --turnkeysteps="create_study,map_raw_data,create_session_info,setup_hcp,create_batch,hcp_pre_freesurfer,hcp_freesurfer,hcp_post_freesurfer" \
    --scheduler="SLURM,jobname=qunex_turnkey,time=48:00:00,cpus-per-task=2,mem-per-cpu=16000,partition=scavenger"

I tried a bit more and was unfortunately unable to reproduce this issue on our end. I have made the run_turnkey code that parses sessions a bit more robust. Another option you could try is to provide a space-separated list of sessions instead of a comma-separated list (QuNex supports both), so:

...
    --sessions="0223 0224" \
    --sessionids="0223 0224" \
...
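As an aside, converting between the two accepted list styles can be done in pure bash with parameter expansion, avoiding the external sed call used in the original script (variable names here are just for illustration):

```shell
# Pure-bash conversion between comma-separated and space-separated
# session lists (no sed needed):
IDs_comma="0223,0224"
IDs_space="${IDs_comma//,/ }"      # "0223 0224"
back_to_comma="${IDs_space// /,}"  # "0223,0224"
echo "$IDs_space"
echo "$back_to_comma"
```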

Thank you! I will let you know if I try it again.