[RESOLVED] SLURM arrays fail in qunex_suite-1.0.3

Description:
Using the built-in scheduler functionality to schedule SLURM job arrays does not seem to work. I expect a job to be created for each session, with each job running with 4 cores, a 12G memory limit, and a 24h time limit. These jobs never make it to the queue.

import_bids, create_session_info, create_batch, and setup_hcp all ran successfully for this dataset.

I had manually created a SLURM script to submit hcp_pre_freesurfer jobs as arrays after hitting the same --scheduler errors described here, thinking the problem was specific to hcp_pre_freesurfer.

The functionality last worked as expected in v99.1, for both hcp_pre_freesurfer and hcp_freesurfer.

Call:

${PROJSPACE}/qunex_container hcp_freesurfer \
	--sessionsfolder="${DATADIR}/sessions" \
	--batchfile=${DATADIR}/processing/batch.txt \
	--bind="${DATADIR}:${DATADIR}" \
	--container="${PROJSPACE}/qunex_suite-1.0.3.sif" \
	--hcp_fs_extra_reconall='-parallel|-openmp|4|' \
	--overwrite="yes" \
	--scheduler="SLURM,array,cpus-per-task=4,time=24:00:00,mem-per-cpu=3000,jobname=qnx_fs"

Logs:
Relevant output after running this command:

singularity exec --bind /nas/longleaf/home/dcmonroe:/nas/longleaf/home/dcmonroe --cleanenv --env SLURM_ARRAY_TASK_ID=${p},SLURM_ARRAY_TASK_MAX=${p} --bind /work/users/d/c/dcmonroe:/work/users/d/c/dcmonroe /proj/dcmonroelab/SYNK-share/qunex_suite-1.0.3.sif bash /nas/longleaf/home/dcmonroe/qunex_container_command_2024-12-29_09.15.56.139445.sh

It is not clear to me what ${p} refers to. It seems to be the cause of the error below.

The relevant portion of the container_job*.txt log:

Traceback (most recent call last):
  File "/opt/qunex/python/qx_utilities/gmri", line 542, in <module>
    main()
  File "/opt/qunex/python/qx_utilities/gmri", line 488, in main
    runCommand(comm, opts)
  File "/opt/qunex/python/qx_utilities/gmri", line 70, in runCommand
    gp.run(command, args)
  File "/opt/qunex/python/qx_utilities/general/process.py", line 2222, in run
    sessions, gpref = gc.get_sessions_list(
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/qunex/python/qx_utilities/general/core.py", line 369, in get_sessions_list
    slurm_array_ix = int(os.environ["SLURM_ARRAY_TASK_ID"])
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: ''
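
For what it's worth, the error reproduces in plain Python once SLURM_ARRAY_TASK_ID ends up empty, which is what an unexpanded ${p} would amount to (a minimal standalone snippet, not QuNex code):

import os

# Standalone reproduction: if ${p} is never expanded, then
# --env SLURM_ARRAY_TASK_ID=${p} sets the variable to an empty string,
# and int() fails exactly as in the traceback above.
os.environ["SLURM_ARRAY_TASK_ID"] = ""

try:
    slurm_array_ix = int(os.environ["SLURM_ARRAY_TASK_ID"])
except ValueError as err:
    print(err)  # invalid literal for int() with base 10: ''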

Finally, the output from head -20 ${PROJSPACE}/qunex_container:

#!/usr/bin/env python
# encoding: utf-8
#
# SPDX-FileCopyrightText: 2021 QuNex development team <https://qunex.yale.edu/>
#
# SPDX-License-Identifier: GPL-3.0-or-later
#
# Version 0.100.0 [QX IO]

from __future__ import print_function, division
import subprocess
import os
import sys
import re
import math

from datetime import datetime


class CommandError(Exception):

-DCM

Hi,

Thanks for bringing this up. The good thing is that this seems like a qunex_container bug and we do not need a new container release to fix it. I just need to fix this in the script, which I should be able to do tomorrow. Just a couple of quick questions before I dig into this.

  1. What Python version are you using to run qunex_container? These ${p} placeholders should be replaced with actual values by the Python code, which uses a somewhat recent feature called f-strings. So it might be that you are using an older Python and the substitution does not happen properly (see the small illustration after these questions).

  2. Does it work without the array? If you remove the array keyword, it should create a job for each session, but without the job array. So, if you have a large number of sessions (several hundred), you might reach the system job limit. This is why we added the array functionality.
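
To illustrate point 1 with a toy example (hypothetical names, not the actual qunex_container code): an f-string substitutes the value, while a plain string keeps the placeholder as literal text, which is what your log shows.

p = 3  # e.g. a session/array index

with_fstring = f"--env SLURM_ARRAY_TASK_ID={p}"      # Python substitutes the value
without_fstring = "--env SLURM_ARRAY_TASK_ID=${p}"   # no leading f: stays literal

print(with_fstring)     # --env SLURM_ARRAY_TASK_ID=3
print(without_fstring)  # --env SLURM_ARRAY_TASK_ID=${p}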

Best, Jure

Jure,

Thanks for the quick response. I see–this makes sense.

  1. Python 3.6.8 is the default on our system. I think that version should handle the f-strings correctly.
  2. Right, of course. Yes, it works fine without ‘array’.

-Derek

Hi Derek,

I think I fixed it. Please try with this qunex_container script: qunex_container (41.7 KB).

You can also get it the official way:

wget http://jd.mblab.si/qunex/qunex_container

Best, Jure

Jure,

Fantastic. Yes, array works with the modified qunex_container.

This exercise suggests I may have misunderstood the documentation as it relates to the specific jobs I am running, though. Would you mind confirming or correcting the points below?

  • The addition of array only modifies job naming/numbering behavior if the number of ‘elements per session’ == 1. This is the case for a FreeSurfer job being run on a single session with (cross-sectional) T1w and T2w images. The example in the documentation involves multiple tasks/elements, given that there are multiple BOLD files per session.
  • Setting parelements with array won’t do anything meaningful if there is only a single ‘task’ or ‘element’ per session.
  • To reduce SLURM overhead it is advantageous to combine parallel and serial functionality (e.g. 200 jobs with 20 cores, each running 5 sessions in parallel, vs. 1000 jobs with 4 cores each). In this case I should resort to using parjobs and parsessions and disregard array, e.g.:
${PROJSPACE}/qunex_container hcp_freesurfer \
	--sessionsfolder="${DATADIR}/sessions" \
	--batchfile=${DATADIR}/processing/batch.txt \
	--bind="${DATADIR}:${DATADIR}" \
	--container="${PROJSPACE}/qunex_suite-1.0.3.sif" \
	--hcp_fs_extra_reconall='-parallel|-openmp|4|' \
	--parjobs=200 \
	--parsessions=5 \
	--overwrite="yes" \
	--scheduler="SLURM,cpus-per-task=20,time=24:00:00,mem-per-cpu=3000,jobname=qnx_fs"

Thanks again for your help!
-Derek

Hi,

  1. When running things on HPC systems through the scheduler there are often limitations on how many jobs a user can spawn or how many active jobs a user can have at the same time. For example, on the system that I use the most, that limit is 200. That means that if I have a QuNex study with more than 200 sessions, I need to be careful. Let’s assume that I have a study with 1000 sessions and I want to run hcp_freesurfer. If I just fire it up with the defaults and without the array keyword, QuNex will spawn 1000 jobs, one for the processing of each session. As a result, the first 200 jobs will start executing, while the other 800 will fail as they will violate my 200-job limit. If I add the array keyword, SLURM will create 1 “root/top-level” job that orchestrates everything and spawns “sub-jobs”. So, I will have the root job + 199 processing jobs. Once one of the processing jobs finishes processing a session, the root job will spawn a new “sub-job”, and this repeats until all 1000 sessions are done. So, we added the array keyword to support processing of huge studies which would otherwise violate HPC/scheduler constraints (a rough sketch of the idea follows after this list).

  2. parelements is useful only for commands that process multiple elements of a session. The two most used examples are hcp_fmri_volume and hcp_fmri_surface, which process BOLD images within a session. So, if I have sessions with 4 BOLDs, I might set --parelements=4 and allocate 4 CPUs to a job in order to also parallelize at the “within-command” level.

  3. Optimal allocation of resources is very system specific, and the QuNex defaults usually give decent results. Also note that SLURM overhead is more or less negligible here, as most processing commands are quite heavy and will take several hours or sometimes even a day to finish, so the scheduling overhead represents a minute part of the whole workload. Our go-to strategy is one session per job, as this granularity also gives us better control over what is happening. For example, if I set the wall time too low and one job fails, I only have to restart the processing of that one session/job. What is also handy is the SLURM setting that sends you email notifications when a job completes/fails.
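
To make the array mechanics from point 1 concrete, here is a rough sketch of the kind of submission the array keyword corresponds to (hypothetical values, not the script qunex_container actually generates). SLURM’s % throttle caps how many array tasks run at once, so all 1000 sessions can be queued while staying under the per-user job limit:

n_sessions = 1000
job_limit = 200           # per-user active-job limit on the cluster
throttle = job_limit - 1  # leave room for the root/top-level job

# Build an sbatch header for a single array job covering all sessions.
sbatch_header = "\n".join([
    "#!/bin/bash",
    "#SBATCH --job-name=qnx_fs",
    "#SBATCH --cpus-per-task=4",
    "#SBATCH --mem-per-cpu=3000",
    "#SBATCH --time=24:00:00",
    f"#SBATCH --array=1-{n_sessions}%{throttle}",
])
print(sbatch_header)

# Within each array task SLURM sets SLURM_ARRAY_TASK_ID, which
# get_sessions_list() uses to pick the session that task should process.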

If you have any additional questions, let me know and I will try to help.

Best, Jure

Jure,

Thanks for confirming the scheduler conventions. Good points about the system-specific constraints; I had not considered that these would vary so much across users.

Much appreciated!

-Derek