psij.executors package

Submodules

psij.executors.flux module

This module contains the Flux JobExecutor.

Implementation references:
  • github.com/flux-framework/flux-core/blob/master/src/bindings/python/flux/job/executor.py

  • flux-framework.readthedocs.io/projects/flux-core/en/latest/python/job_submission.html#the-fluxexecutor-interface

Events and state transitions: github.com/flux-framework/rfc/blob/master/spec_21.rst

class FluxJobExecutor(url=None, config=None)[source]

Bases: JobExecutor

A JobExecutor for the Flux scheduler.

The Flux resource manager framework is deployed and used on a per-user basis at many sites, and is slated to become the system-level resource manager at LLNL.

Uses Flux’s Python bindings to submit, monitor, and manipulate jobs.

Initializes a FluxJobExecutor.

Parameters
  • url (Optional[str]) – Not used, but required by the spec for automatic initialization.

  • config (Optional[JobExecutorConfig]) – The FluxJobExecutor does not have any configuration options.

Return type

None

attach(job, native_id)[source]

Attaches a job to a process.

The job must be in the NEW state.

Parameters
  • job (Job) – The job to attach.

  • native_id (str) – The native ID of the process to attach to, as obtained through the list() method.

Return type

None

cancel(job)[source]

See cancel().

Parameters

job (Job) –

Return type

None

list()[source]

See list().

Returns a list of IDs representing jobs that are running on the underlying implementation; in this case, Flux job IDs.

Returns

The list of known tasks.

Return type

List[str]

submit(job)[source]

See submit().

Parameters

job (Job) –

Return type

None

psij.executors.local module

This module contains the local JobExecutor.

class LocalJobExecutor(url=None, config=None)[source]

Bases: JobExecutor

A job executor that runs jobs locally using subprocess.Popen.

This job executor is intended for use when there is no resource manager, only the operating system, or when a resource manager is present but should be ignored.

Limitations: on Linux, attached jobs always appear to complete with a zero exit code regardless of the actual exit code.
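To illustrate what running a job through subprocess.Popen involves, here is a minimal sketch. It is not the LocalJobExecutor implementation; the helper name run_locally is hypothetical and exists only for this example.

```python
import subprocess
import sys


def run_locally(argv):
    """Start argv as a child process; return (exit code, stdout, stderr)."""
    # Popen returns immediately; the child runs concurrently with the caller.
    process = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # communicate() drains both pipes and waits for the child to exit.
    stdout, stderr = process.communicate()
    return process.returncode, stdout.decode(), stderr.decode()


# Run a short Python child that prints a line and exits with status 3.
code, out, err = run_locally(
    [sys.executable, "-c", "print('hello'); raise SystemExit(3)"]
)
```

Because the child was started by this process, its real exit code is available through returncode once it has finished.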

Initializes a LocalJobExecutor.

Parameters
  • url (Optional[str]) – Not used, but required by the spec for automatic initialization.

  • config (JobExecutorConfig) – The LocalJobExecutor does not have any configuration options.

Return type

None

attach(job, native_id)[source]

Attaches a job to a process.

The job must be in the NEW state. The exit code of the attached job will not be available upon completion and a zero exit code will always be returned for jobs attached by the LocalJobExecutor.

Parameters
  • job (Job) – The job to attach.

  • native_id (str) – The native ID of the process to attach to, as obtained through the list() method.

Return type

None
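The exit-code limitation reflects a general POSIX constraint: a process can collect the exit status only of its own children, so a monitor watching an unrelated PID can typically observe only that the process has disappeared. The sketch below shows such an existence check. It is an illustration under that assumption, not the mechanism LocalJobExecutor uses, and pid_alive is a hypothetical helper.

```python
import os
import subprocess
import sys


def pid_alive(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        # Signal 0 performs the existence and permission checks only;
        # no signal is actually delivered.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


# Start a child, wait for it to exit, and keep its PID for a later check.
child = subprocess.Popen([sys.executable, "-c", "pass"])
child.wait()
finished_pid = child.pid
```

After child.wait() returns, pid_alive(finished_pid) reports that the process is gone, but it carries no information about how the process ended.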

cancel(job)[source]

Cancels a job.

Parameters

job (Job) – The job to cancel.

Return type

None

list()[source]

Returns a list of IDs representing jobs that are running on the underlying implementation.

Specifically for the LocalJobExecutor, this returns a list of psij.NativeId objects corresponding to the processes running under the current user on the local machine. These processes need not correspond to jobs started by calling the submit() method of a LocalJobExecutor instance.

Returns

The list of psij.NativeId objects corresponding to the current user’s processes running locally.

Return type

List[str]
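One way to enumerate the current user's processes on Linux is to scan the procfs directory and compare each entry's owner with the caller's UID. The sketch below assumes a Linux-style /proc and is not taken from the LocalJobExecutor source; list_my_pids and its proc_root parameter are hypothetical.

```python
import os


def list_my_pids(proc_root="/proc"):
    """Return PIDs (as strings) of processes owned by the current user."""
    if not os.path.isdir(proc_root):
        return []  # no procfs on this platform
    my_uid = os.getuid()
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric entries are process directories
        try:
            if os.stat(os.path.join(proc_root, entry)).st_uid == my_uid:
                pids.append(entry)
        except OSError:
            continue  # the process exited between listdir() and stat()
    return pids
```

The returned strings are the kind of value that attach() accepts as native_id.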

submit(job)[source]

Submits the specified Job to be run locally.

Successful return of this method indicates that the job has been started locally. All subsequent changes in the job status, including failures, are reported using notifications. If the job specification is invalid, an InvalidJobException is raised. If the actual submission fails for reasons outside the validity of the job, a SubmitException is raised.

Parameters

job (Job) – The job to be submitted.

Return type

None
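The contract described for submit() has three parts: specification errors raise one exception, launch errors raise another, and everything after a successful return arrives through notifications. The toy function below sketches that shape only. The two exception classes are local stand-ins, not the psij classes, and the status strings and callback signature are assumptions made for the example.

```python
import subprocess
import sys


class InvalidJobException(Exception):
    """Local stand-in for the psij exception of the same name."""


class SubmitException(Exception):
    """Local stand-in for the psij exception of the same name."""


def submit(argv, on_status):
    """Toy submit(): validate, launch, then report status via a callback."""
    if not argv or not all(isinstance(arg, str) for arg in argv):
        raise InvalidJobException("argv must be a non-empty list of strings")
    try:
        process = subprocess.Popen(argv)
    except OSError as exc:
        raise SubmitException(f"could not start {argv[0]}") from exc
    on_status("ACTIVE")
    # A real executor would monitor in the background; this sketch blocks.
    on_status("COMPLETED" if process.wait() == 0 else "FAILED")


events = []
submit([sys.executable, "-c", "raise SystemExit(1)"], events.append)
```

A job that starts but exits with a non-zero status does not raise; the failure shows up only in events.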

psij.executors.rp module

This module contains the RP JobExecutor.

class RPJobExecutor(url=None, config=None)[source]

Bases: JobExecutor

A job executor that runs jobs via radical.pilot.

The RADICAL Pilot system.

Initializes an RPJobExecutor.

Parameters
  • url (Optional[str]) – Not used, but required by the spec for automatic initialization.

  • config (JobExecutorConfig) – The RPJobExecutor does not have any configuration options.

Return type

None

attach(job, native_id)[source]

Attaches a job to a process.

The job must be in the NEW state.

Parameters
  • job (Job) – The job to attach.

  • native_id (str) – The native ID of the process to attach to, as obtained through the list() method.

Return type

None

cancel(job)[source]

Cancels a job.

Parameters

job (Job) – The job to cancel.

Return type

None

list()[source]

See list().

Returns a list of IDs representing jobs that are running on the underlying implementation; in this case, RP task IDs.

Returns

The list of known tasks.

Return type

List[str]

submit(job)[source]

Submits the specified Job to the pilot.

Successful return of this method indicates that the job has been submitted to RP. All subsequent changes in the job status, including failures, are reported using notifications. If the job specification is invalid, an InvalidJobException is raised. If the actual submission fails for reasons outside the validity of the job, a SubmitException is raised.

Parameters

job (Job) – The job to be submitted.

Return type

None

Module contents

A package containing psij.JobExecutor implementations.

class CobaltJobExecutor(url=None, config=None)[source]

Bases: BatchSchedulerExecutor

A JobExecutor for the Cobalt Workload Manager.

The Cobalt HPC Job Scheduler is used by Argonne’s ALCF systems.

Uses the qsub, qstat, and qdel commands, respectively, to submit, monitor, and cancel jobs.

Creates a batch script with #COBALT directives when submitting a job.

Initializes a CobaltJobExecutor.

Parameters
generate_submit_script(job, context, submit_file)[source]

See generate_submit_script().

Parameters
Return type

None

get_cancel_command(native_id)[source]

See get_cancel_command().

Parameters

native_id (str) –

Return type

List[str]

get_status_command(native_ids)[source]

See get_status_command().

Parameters

native_ids (Collection[str]) –

Return type

List[str]

get_submit_command(job, submit_file_path)[source]

See get_submit_command().

Parameters
  • job (Job) –

  • submit_file_path (Path) –

Return type

List[str]

job_id_from_submit_output(out)[source]

See job_id_from_submit_output().

Parameters

out (str) –

Return type

str

parse_status_output(exit_code, out)[source]

See parse_status_output().

Parameters
  • exit_code (int) –

  • out (str) –

Return type

Dict[str, JobStatus]

process_cancel_command_output(exit_code, out)[source]

See process_cancel_command_output().

This should be unnecessary because qdel only seems to fail on non-integer job IDs.

Parameters
  • exit_code (int) –

  • out (str) –

Return type

None

class LocalJobExecutor(url=None, config=None)[source]

Bases: JobExecutor

A job executor that runs jobs locally using subprocess.Popen.

This job executor is intended for use when there is no resource manager, only the operating system, or when a resource manager is present but should be ignored.

Limitations: on Linux, attached jobs always appear to complete with a zero exit code regardless of the actual exit code.

Initializes a LocalJobExecutor.

Parameters
  • url (Optional[str]) – Not used, but required by the spec for automatic initialization.

  • config (JobExecutorConfig) – The LocalJobExecutor does not have any configuration options.

Return type

None

attach(job, native_id)[source]

Attaches a job to a process.

The job must be in the NEW state. The exit code of the attached job will not be available upon completion and a zero exit code will always be returned for jobs attached by the LocalJobExecutor.

Parameters
  • job (Job) – The job to attach.

  • native_id (str) – The native ID of the process to attach to, as obtained through the list() method.

Return type

None

cancel(job)[source]

Cancels a job.

Parameters

job (Job) – The job to cancel.

Return type

None

list()[source]

Returns a list of IDs representing jobs that are running on the underlying implementation.

Specifically for the LocalJobExecutor, this returns a list of psij.NativeId objects corresponding to the processes running under the current user on the local machine. These processes need not correspond to jobs started by calling the submit() method of a LocalJobExecutor instance.

Returns

The list of psij.NativeId objects corresponding to the current user’s processes running locally.

Return type

List[str]

submit(job)[source]

Submits the specified Job to be run locally.

Successful return of this method indicates that the job has been started locally. All subsequent changes in the job status, including failures, are reported using notifications. If the job specification is invalid, an InvalidJobException is raised. If the actual submission fails for reasons outside the validity of the job, a SubmitException is raised.

Parameters

job (Job) – The job to be submitted.

Return type

None

class LsfJobExecutor(url, config=None)[source]

Bases: BatchSchedulerExecutor

A JobExecutor for the LSF Workload Manager.

The IBM Spectrum LSF workload manager is the system resource manager on LLNL’s Sierra and Lassen, and ORNL’s Summit.

Uses the ‘bsub’, ‘bjobs’, and ‘bkill’ commands, respectively, to submit, monitor, and cancel jobs.

Creates a batch script with #BSUB directives when submitting a job.

Initializes an LsfJobExecutor.

Parameters
generate_submit_script(job, context, submit_file)[source]

See generate_submit_script().

Parameters
Return type

None

get_cancel_command(native_id)[source]

See get_cancel_command().

bkill exits with an error if the job does not exist or has already finished.

Parameters

native_id (str) –

Return type

List[str]

get_status_command(native_ids)[source]

See get_status_command().

Parameters

native_ids (Collection[str]) –

Return type

List[str]

get_submit_command(job, submit_file_path)[source]

See get_submit_command().

Parameters
  • job (Job) –

  • submit_file_path (Path) –

Return type

List[str]

job_id_from_submit_output(out)[source]

See job_id_from_submit_output().

Parameters

out (str) –

Return type

str
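On a successful submission, bsub typically prints a line of the form "Job <12345> is submitted to queue <batch>." A minimal sketch of extracting the ID from such a line follows. The function name and the choice of ValueError are hypothetical, and the regular expression is an assumption about the output format rather than the pattern used by LsfJobExecutor.

```python
import re


def job_id_from_bsub_output(out):
    """Extract the numeric job ID from bsub's confirmation message."""
    match = re.search(r"Job <(\d+)>", out)
    if match is None:
        raise ValueError(f"no job ID found in bsub output: {out!r}")
    return match.group(1)


job_id = job_id_from_bsub_output("Job <12345> is submitted to queue <batch>.\n")
```

The ID is kept as a string because the rest of the interface treats native IDs as str.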

parse_status_output(exit_code, out)[source]

See parse_status_output().

Iterate through the RECORDS entry, grabbing JOBID and STAT entries, as well as any state-change reasons if present.

Parameters
  • exit_code (int) –

  • out (str) –

Return type

Dict[str, JobStatus]
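The description above matches the shape of JSON output from bjobs, where jobs appear in a RECORDS list with JOBID and STAT fields. The sketch below parses a hand-written sample of that shape. The state mapping, the PEND_REASON field used for the reason, and the tuple return value are assumptions for illustration; the real method returns JobStatus objects.

```python
import json

# Hypothetical mapping from LSF STAT values to coarse state names.
LSF_STATES = {
    "PEND": "QUEUED",
    "RUN": "ACTIVE",
    "DONE": "COMPLETED",
    "EXIT": "FAILED",
}


def parse_bjobs_json(out):
    """Map each JOBID in the RECORDS list to a (state, reason) pair."""
    statuses = {}
    for record in json.loads(out).get("RECORDS", []):
        state = LSF_STATES.get(record["STAT"], "UNKNOWN")
        reason = record.get("PEND_REASON") or None  # empty string -> None
        statuses[record["JOBID"]] = (state, reason)
    return statuses


# Hand-written sample; real bjobs output may carry additional fields.
sample = json.dumps({
    "COMMAND": "bjobs",
    "JOBS": 2,
    "RECORDS": [
        {"JOBID": "101", "STAT": "RUN", "PEND_REASON": ""},
        {"JOBID": "102", "STAT": "PEND", "PEND_REASON": "Not enough hosts"},
    ],
})
statuses = parse_bjobs_json(sample)
```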

process_cancel_command_output(exit_code, out)[source]

See process_cancel_command_output().

Check if the error was raised only because a job already exited.

Parameters
  • exit_code (int) –

  • out (str) –

Return type

None
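A sketch of the check described above: a non-zero exit from bkill is tolerated when the output shows that the job had already finished, and is treated as an error otherwise. The matched message text and the use of RuntimeError are assumptions for this example, not details confirmed by the LsfJobExecutor source.

```python
def check_bkill_result(exit_code, out):
    """Raise unless bkill succeeded or the job had already finished."""
    if exit_code == 0:
        return
    # Assumed wording of the LSF message for a job that is already done.
    if "already finished" in out:
        return  # nothing left to cancel; treat as success
    raise RuntimeError(f"bkill failed with exit code {exit_code}: {out.strip()}")


check_bkill_result(0, "Job <7> is being terminated\n")
check_bkill_result(255, "Job <7>: Job has already finished\n")
```

Both calls return without raising; any other failure output would raise.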

class SlurmJobExecutor(url=None, config=None)[source]

Bases: BatchSchedulerExecutor

A JobExecutor for the Slurm Workload Manager.

The Slurm Workload Manager is a widely used resource manager running on machines such as NERSC’s Perlmutter, as well as a variety of LLNL machines.

Uses the ‘sbatch’, ‘squeue’, and ‘scancel’ commands, respectively, to submit, monitor, and cancel jobs.

Creates a batch script with #SBATCH directives when submitting a job.

Initializes a SlurmJobExecutor.

Parameters
generate_submit_script(job, context, submit_file)[source]

See generate_submit_script().

Parameters
Return type

None
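To show what a batch script with #SBATCH directives looks like, here is a small generator sketch. It does not reproduce how SlurmJobExecutor builds its script; the function name and its parameters are hypothetical, and only a few common sbatch options (--job-name, --nodes, --time) are emitted.

```python
import io
import shlex


def write_sbatch_script(out, name, nodes, minutes, argv):
    """Write a minimal Slurm batch script to the file-like object out."""
    out.write("#!/bin/bash\n")
    # Directives must appear before the first executable command.
    out.write(f"#SBATCH --job-name={name}\n")
    out.write(f"#SBATCH --nodes={nodes}\n")
    out.write(f"#SBATCH --time={minutes}\n")  # a bare number means minutes
    out.write("\n")
    # shlex.join quotes each argument so the shell sees the original argv.
    out.write(shlex.join(argv) + "\n")


buffer = io.StringIO()
write_sbatch_script(buffer, "demo", 2, 10, ["echo", "hello world"])
script = buffer.getvalue()
```

Writing to a file-like object mirrors the submit_file parameter in the signature above.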

get_cancel_command(native_id)[source]

See get_cancel_command().

Parameters

native_id (str) –

Return type

List[str]

get_status_command(native_ids)[source]

See get_status_command().

Parameters

native_ids (Collection[str]) –

Return type

List[str]

get_submit_command(job, submit_file_path)[source]

See get_submit_command().

Parameters
  • job (Job) –

  • submit_file_path (Path) –

Return type

List[str]
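The three get_*_command methods each return an argument list (List[str]) rather than a shell string. The sketch below shows plausible argument lists for sbatch, squeue, and scancel. The function names are hypothetical and the chosen squeue options are an assumption; the executor may use different flags.

```python
from pathlib import Path


def submit_command(submit_file_path):
    """Argument list that submits a batch script with sbatch."""
    return ["sbatch", str(submit_file_path)]


def status_command(native_ids):
    """Argument list that asks squeue for 'ID STATE' lines, no header."""
    return [
        "squeue",
        "--noheader",
        "--format=%i %t",  # %i = job ID, %t = compact state code
        "--jobs=" + ",".join(native_ids),
    ]


def cancel_command(native_id):
    """Argument list that cancels one job with scancel."""
    return ["scancel", native_id]


commands = [
    submit_command(Path("/tmp/job.sh")),
    status_command(["101", "102"]),
    cancel_command("101"),
]
```

Returning a list avoids shell quoting issues when the command is later passed to a function such as subprocess.run.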

job_id_from_submit_output(out)[source]

See job_id_from_submit_output().

Parameters

out (str) –

Return type

str
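By default, sbatch confirms a submission with a line such as "Submitted batch job 12345". A minimal sketch of recovering the ID from that line follows; the function name is hypothetical and the parsing approach is an assumption, not necessarily what SlurmJobExecutor does.

```python
def job_id_from_sbatch_output(out):
    """Return the last whitespace-separated token of sbatch's output."""
    tokens = out.split()
    if not tokens:
        raise ValueError("sbatch produced no output")
    return tokens[-1]


job_id = job_id_from_sbatch_output("Submitted batch job 12345\n")
```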

parse_status_output(exit_code, out)[source]

See parse_status_output().

Parameters
  • exit_code (int) –

  • out (str) –

Return type

Dict[str, JobStatus]
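As an illustration of turning scheduler output into a per-job mapping, the sketch below parses lines of the form "ID STATE", such as squeue prints with the format string "%i %t". The state mapping and the use of plain strings are assumptions for this example; the real method returns JobStatus objects and may parse a different format.

```python
# Hypothetical mapping from Slurm compact state codes to coarse state names.
SLURM_STATES = {
    "PD": "QUEUED",
    "R": "ACTIVE",
    "CD": "COMPLETED",
    "F": "FAILED",
    "TO": "FAILED",
    "CA": "CANCELED",
}


def parse_squeue_output(out):
    """Map each job ID to a state name, one 'ID STATE' pair per line."""
    statuses = {}
    for line in out.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        native_id, code = fields
        statuses[native_id] = SLURM_STATES.get(code, "UNKNOWN")
    return statuses


statuses = parse_squeue_output("101 R\n102 PD\n\n103 ZZ\n")
```

Jobs that no longer appear in the output need separate handling, since a finished job may simply be missing from the queue listing.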

process_cancel_command_output(exit_code, out)[source]

See process_cancel_command_output().

Parameters
  • exit_code (int) –

  • out (str) –

Return type

None