Description:
Using the built-in scheduler function to schedule SLURM arrays does not seem to work. I expect there to be a job created for each session. Each job would run with 4 cores, a 12G mem limit, and a 24h time limit. These jobs never make it to the queue.
import_bids, create_session_info, create_batch, and setup_hcp ran successfully for this dataset.
I manually created a SLURM script to submit hcp_pre_freesurfer jobs as arrays because of the same errors with --scheduler described here, thinking the problem was specific to hcp_pre_freesurfer.
The functionality last worked as expected in v99.1, for both hcp_pre_freesurfer and hcp_freesurfer.
Call:
${PROJSPACE}/qunex_container hcp_freesurfer \
--sessionsfolder="${DATADIR}/sessions" \
--batchfile=${DATADIR}/processing/batch.txt \
--bind="${DATADIR}:${DATADIR}" \
--container="${PROJSPACE}/qunex_suite-1.0.3.sif" \
--hcp_fs_extra_reconall='-parallel|-openmp|4|' \
--overwrite="yes" \
--scheduler="SLURM,array,cpus-per-task=4,time=24:00:00,mem-per-cpu=3000,jobname=qnx_fs"
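To make the expected resources explicit, here is a minimal sketch that splits the --scheduler string from the call above and works out the memory each array task should get. The parse_scheduler helper is hypothetical and is not QuNex's own parser; it also assumes mem-per-cpu is in megabytes, which is the SLURM default unit.

```python
def parse_scheduler(spec):
    """Split 'SLURM,array,key=value,...' into (scheduler, flags, options).

    Hypothetical helper for illustration only; QuNex's real parsing may differ.
    """
    parts = [p.strip() for p in spec.split(",") if p.strip()]
    scheduler, rest = parts[0], parts[1:]
    # Bare words such as "array" are treated as flags.
    flags = [p for p in rest if "=" not in p]
    # key=value pairs; split once so values like 24:00:00 stay intact.
    options = dict(p.split("=", 1) for p in rest if "=" in p)
    return scheduler, flags, options


spec = "SLURM,array,cpus-per-task=4,time=24:00:00,mem-per-cpu=3000,jobname=qnx_fs"
scheduler, flags, options = parse_scheduler(spec)

# Total memory per task = CPUs per task * memory per CPU (assumed MB).
total_mem_mb = int(options["cpus-per-task"]) * int(options["mem-per-cpu"])
print(scheduler, flags, options["time"], total_mem_mb)
```

With these settings each task should request 4 CPUs and 12000 MB (about 12G) for 24 hours, which matches what I expected to see in the queue.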
Logs:
Relevant output after running this command:
singularity exec --bind /nas/longleaf/home/dcmonroe:/nas/longleaf/home/dcmonroe --cleanenv --env SLURM_ARRAY_TASK_ID=${p},SLURM_ARRAY_TASK_MAX=${p} --bind /work/users/d/c/dcmonroe:/work/users/d/c/dcmonroe /proj/dcmonroelab/SYNK-share/qunex_suite-1.0.3.sif bash /nas/longleaf/home/dcmonroe/qunex_container_command_2024-12-29_09.15.56.139445.sh
It is not clear to me what ${p} refers to. I suspect it is the cause of the error below, since it is passed as the value of SLURM_ARRAY_TASK_ID.
The relevant traceback from container_job*.txt:
Traceback (most recent call last):
  File "/opt/qunex/python/qx_utilities/gmri", line 542, in <module>
    main()
  File "/opt/qunex/python/qx_utilities/gmri", line 488, in main
    runCommand(comm, opts)
  File "/opt/qunex/python/qx_utilities/gmri", line 70, in runCommand
    gp.run(command, args)
  File "/opt/qunex/python/qx_utilities/general/process.py", line 2222, in run
    sessions, gpref = gc.get_sessions_list(
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/qunex/python/qx_utilities/general/core.py", line 369, in get_sessions_list
    slurm_array_ix = int(os.environ["SLURM_ARRAY_TASK_ID"])
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: ''
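The ValueError can be reproduced outside QuNex. The sketch below assumes that ${p} expands to an empty string, so SLURM_ARRAY_TASK_ID is set but empty inside the container. The read_array_index helper is hypothetical and only illustrates a more defensive way to read the variable; it is not the code in core.py.

```python
import os


def read_array_index(env=None):
    """Return SLURM_ARRAY_TASK_ID as an int, or None when unset or empty.

    Hypothetical helper for illustration; not part of QuNex.
    """
    env = os.environ if env is None else env
    raw = env.get("SLURM_ARRAY_TASK_ID", "").strip()
    if not raw:
        return None
    return int(raw)


# Calling int() on an empty string raises the ValueError seen in the traceback.
try:
    int("")
except ValueError as err:
    message = str(err)

print(message)
print(read_array_index({"SLURM_ARRAY_TASK_ID": ""}))
print(read_array_index({"SLURM_ARRAY_TASK_ID": "3"}))
```

If that assumption holds, the array index never reaches the container, which would explain why the failure happens in get_sessions_list rather than in the hcp_freesurfer step itself.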
Finally, the output from head -20 ${PROJSPACE}/qunex_container:
#!/usr/bin/env python
# encoding: utf-8
#
# SPDX-FileCopyrightText: 2021 QuNex development team <https://qunex.yale.edu/>
#
# SPDX-License-Identifier: GPL-3.0-or-later
#
# Version 0.100.0 [QX IO]
from __future__ import print_function, division
import subprocess
import os
import sys
import re
import math
from datetime import datetime
class CommandError(Exception):
-DCM