psij.executors package
Subpackages
- psij.executors.batch package
- Submodules
- psij.executors.batch.batch_scheduler_executor module
- psij.executors.batch.cobalt module
- psij.executors.batch.escape_functions module
- psij.executors.batch.lsf module
- psij.executors.batch.pbspro module
- psij.executors.batch.script_generator module
- psij.executors.batch.slurm module
- psij.executors.batch.template_function_library module
- Module contents
Submodules
psij.executors.flux module
This module contains the Flux JobExecutor.
Implementation references:
- github.com/flux-framework/flux-core/blob/master/src/bindings/python/flux/job/executor.py
- flux-framework.readthedocs.io/projects/flux-core/en/latest/python/job_submission.html#the-fluxexecutor-interface
Events and state transitions: github.com/flux-framework/rfc/blob/master/spec_21.rst
- class FluxJobExecutor(url=None, config=None)
Bases: JobExecutor
A JobExecutor for the Flux scheduler.
The Flux resource manager framework is deployed and used on a per-user basis at many sites, and is slated to become the system-level resource manager at LLNL.
Uses Flux’s Python bindings to submit, monitor, and manipulate jobs.
Initializes a FluxJobExecutor.
- Parameters
url (Optional[str]) – Not used, but required by the spec for automatic initialization.
config (Optional[JobExecutorConfig]) – The FluxJobExecutor does not have any configuration options.
- Return type
None
psij.executors.local module
This module contains the local JobExecutor.
- class LocalJobExecutor(url=None, config=None)
Bases: JobExecutor
A job executor that runs jobs locally using subprocess.Popen.
This job executor is intended to be used when there is no resource manager, only the operating system, or when there is a resource manager but it should be ignored.
Limitations: on Linux, attached jobs always appear to complete with a zero exit code regardless of the actual exit code.
Initializes a LocalJobExecutor.
- Parameters
url (Optional[str]) – Not used, but required by the spec for automatic initialization.
config (JobExecutorConfig) – The LocalJobExecutor does not have any configuration options.
- Return type
None
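As a rough illustration of the mechanism named above, and not the LocalJobExecutor implementation itself, the following standard-library sketch starts a process with subprocess.Popen, waits for it, and reads its exit code. The helper name run_and_wait is hypothetical.

```python
import subprocess
import sys


def run_and_wait(argv):
    """Start argv as a child process and return (exit_code, stdout_text)."""
    # Popen returns immediately; the child runs concurrently with this process.
    process = subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)
    # communicate() waits for the child to exit and collects its output.
    stdout_text, _ = process.communicate()
    return process.returncode, stdout_text


if __name__ == "__main__":
    # sys.executable keeps the example independent of any particular shell.
    code, out = run_and_wait(
        [sys.executable, "-c", "print('hello'); raise SystemExit(3)"]
    )
    print(code, out.strip())
```

The exit code is available here because the process is a direct child of the caller, which is the situation for submitted jobs rather than attached ones.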
- attach(job, native_id)
Attaches a job to a process.
The job must be in the NEW state. The exit code of the attached job will not be available upon completion, and a zero exit code will always be returned for jobs attached by the LocalJobExecutor.
- list()
Return a list of IDs representing jobs that are running on the underlying implementation.
Specifically for the LocalJobExecutor, this returns a list of psij.NativeId objects corresponding to the processes running under the current user on the local machine. These processes need not correspond to jobs started by calling the submit() method of an instance of a LocalJobExecutor.
- submit(job)
Submits the specified Job to be run locally.
Successful return of this method indicates that the job has been started locally and all changes in the job status, including failures, are reported using notifications. If the job specification is invalid, an InvalidJobException is raised. If the actual submission fails for reasons outside the validity of the job, a SubmitException is raised.
- Parameters
job (Job) – The job to be submitted.
- Return type
None
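The submit contract described above can be sketched as follows. This assumes psij is installed and that the executor is registered under the name "local". The helper submit_locally and its timeout are illustrative choices, and the import guard only exists so the sketch degrades gracefully where psij is missing.

```python
import sys
from datetime import timedelta

try:
    from psij import InvalidJobException, Job, JobExecutor, JobSpec, SubmitException
except ImportError:
    # Assumption: psij may not be installed where this sketch is read or run.
    JobExecutor = None


def submit_locally(executable, arguments, timeout_seconds=30):
    """Submit one job to the 'local' executor and wait for a final status.

    Returns None if psij is unavailable, if submission fails, or if the
    job does not reach a final state before the timeout.
    """
    if JobExecutor is None:
        return None
    executor = JobExecutor.get_instance("local")
    job = Job(JobSpec(executable=executable, arguments=arguments))
    try:
        executor.submit(job)
    except InvalidJobException as exc:
        # The job specification itself was rejected.
        print(f"invalid job specification: {exc}")
        return None
    except SubmitException as exc:
        # The specification was valid, but starting the job failed.
        print(f"submission failed: {exc}")
        return None
    # Later status changes arrive as notifications; wait() blocks on them.
    return job.wait(timeout=timedelta(seconds=timeout_seconds))


if __name__ == "__main__":
    print(submit_locally(sys.executable, ["-c", "pass"]))
```

Only errors detected during submission surface as exceptions. A job that starts and then fails is reported through its status instead.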
psij.executors.rp module
This module contains the RP JobExecutor.
- class RPJobExecutor(url=None, config=None)
Bases: JobExecutor
A job executor that runs jobs via radical.pilot (the RADICAL Pilot system).
Initializes an RPJobExecutor.
- Parameters
url (Optional[str]) – Not used, but required by the spec for automatic initialization.
config (JobExecutorConfig) – The RPJobExecutor does not have any configuration options.
- Return type
None
- list()
See JobExecutor.list().
Return a list of IDs representing jobs that are running on the underlying implementation, in this case RP task IDs.
- submit(job)
Submits the specified Job to the pilot.
Successful return of this method indicates that the job has been submitted to RP and all changes in the job status, including failures, are reported using notifications. If the job specification is invalid, an InvalidJobException is raised. If the actual submission fails for reasons outside the validity of the job, a SubmitException is raised.
- Parameters
job (Job) – The job to be submitted.
- Return type
None
Module contents
A package containing psij.JobExecutor implementations.
- class CobaltJobExecutor(url=None, config=None)
Bases: BatchSchedulerExecutor
A JobExecutor for the Cobalt Workload Manager.
The Cobalt HPC job scheduler is used by Argonne’s ALCF systems.
Uses the qsub, qstat, and qdel commands, respectively, to submit, monitor, and cancel jobs.
Creates a batch script with #COBALT directives when submitting a job.
Initializes a CobaltJobExecutor.
- Parameters
config (Optional[CobaltExecutorConfig]) –
- get_cancel_command(native_id)
See BatchSchedulerExecutor.get_cancel_command().
- get_status_command(native_ids)
See BatchSchedulerExecutor.get_status_command().
- Parameters
native_ids (Collection[str]) –
- Return type
- get_submit_command(job, submit_file_path)
See BatchSchedulerExecutor.get_submit_command().
- process_cancel_command_output(exit_code, out)
See BatchSchedulerExecutor.process_cancel_command_output().
This should be unnecessary because qdel only seems to fail on non-integer job IDs.
- class LocalJobExecutor(url=None, config=None)
Bases: JobExecutor
A job executor that runs jobs locally using subprocess.Popen.
This job executor is intended to be used when there is no resource manager, only the operating system, or when there is a resource manager but it should be ignored.
Limitations: on Linux, attached jobs always appear to complete with a zero exit code regardless of the actual exit code.
Initializes a LocalJobExecutor.
- Parameters
url (Optional[str]) – Not used, but required by the spec for automatic initialization.
config (JobExecutorConfig) – The LocalJobExecutor does not have any configuration options.
- Return type
None
- attach(job, native_id)
Attaches a job to a process.
The job must be in the NEW state. The exit code of the attached job will not be available upon completion, and a zero exit code will always be returned for jobs attached by the LocalJobExecutor.
- list()
Return a list of IDs representing jobs that are running on the underlying implementation.
Specifically for the LocalJobExecutor, this returns a list of psij.NativeId objects corresponding to the processes running under the current user on the local machine. These processes need not correspond to jobs started by calling the submit() method of an instance of a LocalJobExecutor.
- submit(job)
Submits the specified Job to be run locally.
Successful return of this method indicates that the job has been started locally and all changes in the job status, including failures, are reported using notifications. If the job specification is invalid, an InvalidJobException is raised. If the actual submission fails for reasons outside the validity of the job, a SubmitException is raised.
- Parameters
job (Job) – The job to be submitted.
- Return type
None
- class LsfJobExecutor(url, config=None)
Bases: BatchSchedulerExecutor
A JobExecutor for the LSF Workload Manager.
The IBM Spectrum LSF workload manager is the system resource manager on LLNL’s Sierra and Lassen, and ORNL’s Summit.
Uses the ‘bsub’, ‘bjobs’, and ‘bkill’ commands, respectively, to submit, monitor, and cancel jobs.
Creates a batch script with #BSUB directives when submitting a job.
Initializes an LsfJobExecutor.
- Parameters
config (Optional[LsfExecutorConfig]) –
- get_cancel_command(native_id)
See BatchSchedulerExecutor.get_cancel_command().
bkill exits with an error if the job does not exist or has already finished.
- get_status_command(native_ids)
See BatchSchedulerExecutor.get_status_command().
- Parameters
native_ids (Collection[str]) –
- Return type
- get_submit_command(job, submit_file_path)
See BatchSchedulerExecutor.get_submit_command().
- parse_status_output(exit_code, out)
Iterate through the RECORDS entry, grabbing JOBID and STAT entries, as well as any state-change reasons if present.
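The iteration described above might look like the following sketch, which assumes the status output is JSON text with a top-level RECORDS list. RECORDS, JOBID, and STAT come from the description. The PEND_REASON key, the helper name parse_bjobs_records, and the sample data are made up for illustration and are not taken from psij or from real scheduler output.

```python
import json


def parse_bjobs_records(out):
    """Map each JOBID to a (STAT, reason) pair from bjobs-style JSON text."""
    statuses = {}
    for record in json.loads(out)["RECORDS"]:
        # Skip malformed records rather than guessing at missing fields.
        if "JOBID" not in record or "STAT" not in record:
            continue
        # PEND_REASON is a hypothetical key standing in for a state-change reason.
        reason = record.get("PEND_REASON") or None
        statuses[record["JOBID"]] = (record["STAT"], reason)
    return statuses


# Made-up sample input, shaped like the structure described above.
SAMPLE = json.dumps({
    "RECORDS": [
        {"JOBID": "101", "STAT": "RUN", "PEND_REASON": ""},
        {"JOBID": "102", "STAT": "PEND",
         "PEND_REASON": "New job is waiting for scheduling"},
    ]
})

if __name__ == "__main__":
    print(parse_bjobs_records(SAMPLE))
```

An empty reason string is normalized to None so callers can distinguish "no reason given" from a real message.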
- process_cancel_command_output(exit_code, out)
See BatchSchedulerExecutor.process_cancel_command_output().
Check if the error was raised only because a job already exited.
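A check of this kind could be sketched as below. The function name and the matched phrase are assumptions for illustration only. The source does not state which text the executor actually looks for, so the phrase would need to be adapted to the scheduler's real output.

```python
def cancel_failed_only_because_finished(exit_code, out):
    """Return True if a failed cancel merely reports an already-finished job."""
    if exit_code == 0:
        # The cancel command succeeded, so there is no error to classify.
        return False
    # Assumption: the error message contains this phrase; adjust as needed.
    return "already finished" in out.lower()


if __name__ == "__main__":
    print(cancel_failed_only_because_finished(255, "Job <42>: Job has already finished"))
```

Treating this case separately lets a caller ignore a harmless race between cancellation and normal job completion while still surfacing other failures.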
- class SlurmJobExecutor(url=None, config=None)
Bases: BatchSchedulerExecutor
A JobExecutor for the Slurm Workload Manager.
The Slurm Workload Manager is a widely used resource manager running on machines such as NERSC’s Perlmutter, as well as a variety of LLNL machines.
Uses the ‘sbatch’, ‘squeue’, and ‘scancel’ commands, respectively, to submit, monitor, and cancel jobs.
Creates a batch script with #SBATCH directives when submitting a job.
Initializes a SlurmJobExecutor.
- Parameters
config (Optional[SlurmExecutorConfig]) –
- get_cancel_command(native_id)
See BatchSchedulerExecutor.get_cancel_command().
- get_status_command(native_ids)
See BatchSchedulerExecutor.get_status_command().
- Parameters
native_ids (Collection[str]) –
- Return type
- get_submit_command(job, submit_file_path)
See BatchSchedulerExecutor.get_submit_command().
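To make the division of labor concrete, here is a minimal sketch of a batch script with #SBATCH directives and of the three command lines built around sbatch, squeue, and scancel. All function names are hypothetical, and the particular directives and squeue flags are illustrative choices rather than what SlurmJobExecutor actually emits.

```python
def render_sbatch_script(job_name, nodes, walltime, command):
    """Return the text of a minimal batch script with #SBATCH directives."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --time={walltime}",
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


def submit_command(script_path):
    """Command line that submits the generated script."""
    return ["sbatch", script_path]


def cancel_command(native_id):
    """Command line that cancels one job by its scheduler-assigned ID."""
    return ["scancel", native_id]


def status_command(native_ids):
    """Command line that queries several jobs in one call."""
    # squeue accepts a comma-separated job list; %i is the job ID and
    # %t the compact state code.
    return [
        "squeue",
        "--noheader",
        "--format=%i %t",
        "--jobs=" + ",".join(native_ids),
    ]


if __name__ == "__main__":
    print(render_sbatch_script("demo", 1, "00:10:00", "srun hostname"))
    print(status_command(["1001", "1002"]))
```

Returning argument lists instead of shell strings avoids quoting problems when the commands are later passed to a process launcher.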