In computational environments, a desktop workstation is typically used to prepare a project, create simulations, and view results, while the simulations themselves may run on a high-performance computing (HPC) system. External Queue Integration (EQI) allows users to queue simulations directly with such systems rather than submitting them manually for execution.

XF supports job submissions to queues and cloud-based systems:

EQI must be configured in order to utilize external queues and Rescale cloud computing. This is done by customizing either xfqs.template or rescaleqs.py.template.

Requirements

Two basic requirements must be met in order to use EQI:

For external queues, the control folder must be named eqi-control.

For example, an external queue HPC system on Linux may see the filesystem /data/simulation containing both the eqi-control and projects folders. The Windows workstation may see the S:\ filesystem, which maps to /data/simulation on the HPC system and both the S:\eqi-control and S:\projects folders. In this case, XF projects that utilize EQI for simulation are saved in the S:\projects folder.
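The mapping in this example can be sketched in Python. This is only an illustration of how the two views of the shared filesystem relate; the function name `to_hpc_path` and the project name `antenna.xf` are hypothetical and are not part of XF.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Mount points taken from the example above; adjust for a real site.
WINDOWS_ROOT = PureWindowsPath("S:\\")
HPC_ROOT = PurePosixPath("/data/simulation")


def to_hpc_path(windows_path: str) -> PurePosixPath:
    """Translate a workstation path on the S: drive to its HPC equivalent."""
    # relative_to raises ValueError if the path is not under the S: drive.
    relative = PureWindowsPath(windows_path).relative_to(WINDOWS_ROOT)
    return HPC_ROOT.joinpath(*relative.parts)


if __name__ == "__main__":
    # "antenna.xf" is a made-up project name used only for illustration.
    print(to_hpc_path("S:\\projects\\antenna.xf"))
    print(to_hpc_path("S:\\eqi-control"))
```

Both sides must resolve to the same physical storage; the sketch only converts path notation and does not verify that the share is actually mounted.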

Additional requirements must be met when using rescaleqs.py:

Individual simulations that utilize multiple compute resources necessitate the installation of the message passing interface (MPI) on an external queue HPC system.

Process Description

When a simulation is queued using EQI from within XF, XF begins looking for eqi-control at the project directory and moves up the directory tree. The eqi-control folder must be a sibling of one of the project's ancestors. If it does not find the folder, an error is issued to the user. If the folder is found, XF writes a control file with a unique name ending with .xfsubmit to that folder. The control file contains the run location within a simulation being submitted for execution, as well as specifications about how to execute the simulation. When creating simulations that contain multiple runs, XF writes a control file for each run. Once XF writes the complete simulation, it returns control of the UI to the user.
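The upward search described above can be approximated with the following sketch. It is not XF's implementation; the function name `find_control_folder` is hypothetical, and the sketch assumes the search checks each directory from the project folder up to the filesystem root for a child named eqi-control.

```python
from pathlib import Path
from typing import Optional

CONTROL_NAME = "eqi-control"


def find_control_folder(project_dir: Path) -> Optional[Path]:
    """Return the nearest eqi-control folder at or above project_dir, if any."""
    current = project_dir.resolve()
    # Check the project directory first, then each ancestor in turn.
    for folder in [current, *current.parents]:
        candidate = folder / CONTROL_NAME
        if candidate.is_dir():
            return candidate
    # Nothing found: the caller would report an error to the user.
    return None
```

With the layout from the Requirements example, a project saved under the projects folder resolves to the eqi-control folder that sits beside projects.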

When either xfqs or rescaleqs is running on the HPC system, it checks the control folder(s) periodically for both *.xfsubmit and *.xfcancel files. When either one finds a submission file, it performs the following tasks:

When either xfqs or rescaleqs finds a cancellation file, it performs the following tasks:
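The periodic check for control files could look roughly like the sketch below. It covers only the discovery step; the handler callbacks, the polling interval, and the `cycles` limit are assumptions made for illustration and do not describe what xfqs or rescaleqs actually does with each file.

```python
import time
from pathlib import Path
from typing import Callable, List, Optional, Tuple


def scan_control_folder(control: Path) -> Tuple[List[Path], List[Path]]:
    """Return the submission and cancellation files currently in control."""
    submits = sorted(control.glob("*.xfsubmit"))
    cancels = sorted(control.glob("*.xfcancel"))
    return submits, cancels


def poll(control: Path,
         on_submit: Callable[[Path], None],
         on_cancel: Callable[[Path], None],
         interval_s: float = 5.0,      # assumed interval, not an XF default
         cycles: Optional[int] = None) -> None:
    """Scan the control folder repeatedly, passing each file to a handler."""
    done = 0
    while cycles is None or done < cycles:
        submits, cancels = scan_control_folder(control)
        for path in submits:
            on_submit(path)   # hypothetical handler supplied by the caller
        for path in cancels:
            on_cancel(path)   # hypothetical handler supplied by the caller
        done += 1
        if cycles is None or done < cycles:
            time.sleep(interval_s)
```

A real daemon would also need to remove or rename each control file once it is handled so that the same file is not processed on the next pass.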

Control Files

XF's UI generates a *.xfsubmit file when creating a simulation and a *.xfcancel file when terminating a simulation. The format of the submission file is a set of lines, each of which contains one keyword and one value that are separated by a space.

| Keyword | Value | Notes | Optional |
| --- | --- | --- | --- |
| simDir | string | The path to the simulation folder relative to the control folder. | No |
| userName | string | The username of the user who wrote the submission file. | Yes |
| priority | string | Priority guidance that will be either Low, Normal, or High. | Yes |
| useXStream | number | XStream guidance. If it does not exist, XStream use is not indicated. If it does exist, use either 0 to indicate that the daemon determines the number of GPUs, or enter the specified number. | Yes |
| useMPI | number | MPI guidance. If it does not exist, MPI use is not indicated. If it does exist, use either 0 to indicate that the daemon determines the number of GPUs, or enter the specified number. | Yes |
| batchOptions | string | Text that passes directly to the job submission command on the command line, most likely at the end of all other options. | Yes |
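As a minimal sketch of reading this format, the parser below splits each line at the first space so that values containing spaces, such as batchOptions, are kept intact. The function name and the sample values are hypothetical; only the keywords come from the table above.

```python
from typing import Dict


def parse_control_file(text: str) -> Dict[str, str]:
    """Parse 'keyword value' lines into a dictionary of strings."""
    fields: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        # Split at the first space only; the rest of the line is the value.
        keyword, _, value = line.partition(" ")
        fields[keyword] = value.strip()
    if "simDir" not in fields:
        # simDir is the only keyword the table marks as required.
        raise ValueError("control file is missing the required simDir keyword")
    return fields


if __name__ == "__main__":
    # Sample content with made-up values, for illustration only.
    sample = (
        "simDir ../projects/antenna.xf/Simulations/000001\n"
        "userName alice\n"
        "priority Normal\n"
        "useXStream 0\n"
        "batchOptions --partition gpu\n"
    )
    print(parse_control_file(sample))
```

Values are returned as strings; a caller would convert useXStream and useMPI to integers where needed.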

The format of the cancellation file is a set of lines, each of which contains one keyword and one value that are separated by a space.

| Keyword | Value | Notes | Optional |
| --- | --- | --- | --- |
| simDir | string | The path to the simulation folder relative to the control folder. | No |
| userName | string | The username of the user who wrote the submission file. | Yes |

Directives Documentation

In the standard configuration, XF writes control files to the eqi-control folder, which is monitored by a single daemon and appears as the Local queue selection when creating a simulation. Multiple daemons may be active simultaneously by creating an eqi-control subfolder for each additional daemon instance. The daemons are each configured to watch the main eqi-control folder and one of its subfolders, which are available selections when creating a simulation and allow users to submit to one of several queues. The displayed eqi-control directory name can be customized by creating an eqi-control/eqi.txt file containing a single line of the text to be displayed.

For example, suppose xfqs.template and rescaleqs.py.template are configured to watch the eqi-control/ folder and the eqi-control/Rescale folder, respectively. If an eqi-control/eqi.txt file is then created that contains the text In-house Cluster, then Local, In-house Cluster, and Rescale will be the available queue options when submitting a simulation.
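The two pieces of information that the control folder contributes to the queue list, the optional display text in eqi.txt and the names of any subfolders, can be read with a sketch like the one below. The function names are hypothetical, and the sketch does not attempt to reproduce how XF assembles or orders the final list of queue options.

```python
from pathlib import Path
from typing import List, Optional


def custom_label(control: Path) -> Optional[str]:
    """Return the display text from eqi.txt, or None if it is absent or empty."""
    label_file = control / "eqi.txt"
    if not label_file.is_file():
        return None
    lines = label_file.read_text(encoding="utf-8").splitlines()
    # The file is expected to hold a single line of display text.
    return lines[0].strip() if lines else None


def subqueue_names(control: Path) -> List[str]:
    """Return the names of the subfolders of the control folder, sorted."""
    return sorted(p.name for p in control.iterdir() if p.is_dir())
```

For the layout in the example, `custom_label` would return In-house Cluster and `subqueue_names` would return a list containing Rescale.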

xfqs Documentation

For external queues, system administrators can set up the xfqs.template bash script, which is configured to monitor the eqi-control folder and submit commands to Slurm following the process described above. Each HPC system is unique, so this script serves as a starting point for system administrators to implement EQI on their systems.

The xfqs.template script is provided in {xf-install-dir}/remcom/bin. The script itself is heavily commented and therefore will not be discussed in detail here.