The PSI/J API¶
The most important classes in this library are Job and JobExecutor,
followed by Launcher.
The Job class and its modifiers¶
The Job-related classes listed in this section (Job, JobSpec,
ResourceSpec, and JobAttributes) are independent of
executor implementations. The authors strongly recommend that users
program against these classes, rather than adding executor-specific
configuration options, to the extent possible.
- class Job(spec=None)[source]
This class represents a PSI/J job.
It encapsulates all of the information needed to run a job as well as the job’s state.
Constructs a Job object.
The object can optionally be initialized with the given
JobSpec. After construction, the job will be in the NEW state.
- cancel()[source]
Cancels this job.
The job is canceled by calling
cancel() on the job executor that was used to submit this job.
- Raises
SubmitException – if the job has not yet been submitted.
- Return type
None
- property id: str
This job’s ID, read-only.
The ID is assigned automatically by the implementation when this Job object is constructed. The ID is guaranteed to be unique on the machine on which the Job object was instantiated. The ID does not have to match the ID of the underlying LRM job, but is used to identify Job instances as seen by a client application.
- property native_id: Optional[str]
The ID of this job according to the underlying LRM, read-only.
The native ID may not be available until after the job is submitted to a
JobExecutor; until then, the attribute is None.
- set_job_status_callback(cb)[source]
Registers a status callback with this job.
The callback can either be an instance of a
JobStatusCallback subclass or a function accepting two arguments, a Job and a JobStatus, and returning nothing.
The callback will be invoked whenever a status change occurs for this job, independent of any callback registered on the job’s
JobExecutor. To remove the callback, set it to None.
- Parameters
cb (Union[JobStatusCallback, Callable[[Job, JobStatus], None]]) – An instance of
JobStatusCallback or a callable with two parameters, job of type Job and job_status of type JobStatus, returning nothing.
- Return type
None
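The two accepted callback shapes can be illustrated with a small stand-alone sketch. It does not import PSI/J: the `StatusCallback` class, its `job_status_changed` method name, and the `notify` helper are hypothetical stand-ins chosen for illustration, and the job and status values are plain strings.

```python
from typing import Any, Callable, Union


class StatusCallback:
    """Hypothetical callback base class; the method name is an assumption."""

    def job_status_changed(self, job: Any, status: Any) -> None:
        raise NotImplementedError


def notify(cb: Union[StatusCallback, Callable[[Any, Any], None], None],
           job: Any, status: Any) -> None:
    """Invoke cb whether it is a callback object or a plain two-argument callable."""
    if cb is None:
        return  # no callback registered, nothing to do
    if isinstance(cb, StatusCallback):
        cb.job_status_changed(job, status)
    else:
        cb(job, status)
```

A plain function or lambda taking `(job, status)` and an instance of a `StatusCallback` subclass are handled the same way by the caller, which is the convenience the two-form signature above is meant to provide.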
- spec
The job specification for this job. A valid job requires a valid specification.
- property status: JobStatus
Returns the current status of the job.
It is guaranteed that the status returned by this method is monotonic in time with respect to the partial ordering of
JobStatus types. That is, if job_status_1.state and job_status_2.state are comparable and job_status_1.state < job_status_2.state, then it is impossible for job_status_2 to be returned by a call placed prior to a call that returns job_status_1 if both calls are placed from the same thread or if a proper memory barrier is placed between the calls. Furthermore, the job is guaranteed to go through all intermediate states in the state model before reaching a particular state.
- Returns
the current status of this job
- wait(timeout=None, target_states=None)[source]
Waits for the job to reach certain states.
This method returns either when the job reaches one of the target_states or when the amount of time indicated by the timeout parameter, if specified, passes. Returns the
JobStatus object that has one of the desired target_states or None if the timeout is reached. If none of the states in target_states can be reached (for example, because the job has entered the FAILED state while target_states consists of COMPLETED), this method throws an UnreachableStateException.
- Parameters
- Returns
the
JobStatus object that caused this call to complete or None if the timeout is specified and reached.
- Return type
Optional[JobStatus]
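The three outcomes of wait() described above (target reached, timeout, unreachable state) can be modeled with a condition variable. The following is a stand-alone sketch, not PSI/J code: `MiniJob`, `UnreachableState`, and `FINAL_STATES` are hypothetical names, states are plain strings, the timeout is assumed to be a number of seconds, and waiting for any final state when no targets are given is also an assumption.

```python
import threading
from typing import Iterable, Optional

# Assumed set of final states for this toy model.
FINAL_STATES = {"COMPLETED", "FAILED", "CANCELED"}


class UnreachableState(Exception):
    """Hypothetical analogue of UnreachableStateException."""


class MiniJob:
    """Toy job holding only a state name, guarded by a condition variable."""

    def __init__(self) -> None:
        self._state = "NEW"
        self._cond = threading.Condition()

    def set_state(self, state: str) -> None:
        with self._cond:
            self._state = state
            self._cond.notify_all()  # wake up any thread blocked in wait()

    def wait(self, timeout: Optional[float] = None,
             target_states: Optional[Iterable[str]] = None) -> Optional[str]:
        targets = set(target_states) if target_states is not None else set(FINAL_STATES)

        def settled() -> bool:
            # Done when a target is reached or no further transition is possible.
            return self._state in targets or self._state in FINAL_STATES

        with self._cond:
            if not self._cond.wait_for(settled, timeout=timeout):
                return None  # timed out before anything conclusive happened
            if self._state in targets:
                return self._state
            # A final state outside the targets means the targets are unreachable.
            raise UnreachableState(self._state)
```

In this model a job that has moved to "FAILED" makes `wait(target_states=["COMPLETED"])` raise immediately rather than block, which mirrors the behavior described for UnreachableStateException.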
- class JobStatus(state, time=None, message=None, exit_code=None, metadata=None)[source]
A class containing details about job transitions to new states.
Constructs a JobStatus object.
- Parameters
time (Optional[float]) – The time, as would be returned by
time.time(), that the transition to the new state occurred. If None, the current time will be used.
message (Optional[str]) – An optional message associated with the transition.
exit_code (Optional[int]) – An optional exit code for the job, if the job has completed.
metadata (Optional[Dict[str, object]]) – Optional metadata provided by the
JobExecutor.
- Return type
None
- property final: bool
Returns the final property of the underlying state.
- Returns
True if the state is final and False otherwise.
- class JobState(value)[source]
An enumeration holding the possible job states.
The possible states are: NEW, QUEUED, ACTIVE, COMPLETED, FAILED, and CANCELED.
- ACTIVE = 2
This state represents an actively running job.
- CANCELED = 5
Represents a job that was canceled by a call to
cancel().
- COMPLETED = 3
This state represents a job that has completed successfully (i.e., with a zero exit code). In other words, a job with the executable set to /bin/false cannot enter this state.
- FAILED = 4
Represents a job that has either completed unsuccessfully (with a non-zero exit code) or a job whose handling and/or execution by the backend has failed in some way.
- NEW = 0
This is the state of a job immediately after the
Job object is created and before it is submitted to a JobExecutor.
- QUEUED = 1
This is the state of the job after being accepted by a backend for execution, but before the execution of the job begins.
- property final: bool
Returns True if this state is final.
A state is final when no other state transition can occur after that state has been reached.
- Returns
True if this is a final state and False otherwise
- is_greater_than(other)[source]
Defines a (strict) partial ordering on the states.
Not all states are comparable. State transitions cannot violate this ordering.
- Parameters
other (JobState) – the other JobState to compare to
- Returns
if this state is comparable with other, this method returns True or False depending on the relative order between this state and other. That is, True is returned if and only if this state can come after other. If this state is not comparable with other, this method returns None.
- Return type
Optional[bool]
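To make the idea of a strict partial order over states concrete, here is a minimal stand-alone sketch. It does not use PSI/J: the `State` enum, the `_PREDECESSORS` table, and the free function `is_greater_than` are hypothetical, and the particular ordering (a chain NEW, QUEUED, ACTIVE, COMPLETED, with FAILED and CANCELED allowed after any non-final state) is an assumption made for illustration rather than a statement of the library's exact rules.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    """Hypothetical stand-in mirroring the state names and values listed above."""
    NEW = 0
    QUEUED = 1
    ACTIVE = 2
    COMPLETED = 3
    FAILED = 4
    CANCELED = 5


# Assumed ordering: for each state, the set of states that may come before it.
_PREDECESSORS = {
    State.NEW: set(),
    State.QUEUED: {State.NEW},
    State.ACTIVE: {State.NEW, State.QUEUED},
    State.COMPLETED: {State.NEW, State.QUEUED, State.ACTIVE},
    State.FAILED: {State.NEW, State.QUEUED, State.ACTIVE},
    State.CANCELED: {State.NEW, State.QUEUED, State.ACTIVE},
}


def is_greater_than(a: State, b: State) -> Optional[bool]:
    """Return True if a can come after b, False if it cannot, None if incomparable."""
    if a is b:
        # A strict order is irreflexive; returning False is a choice of this sketch.
        return False
    if b in _PREDECESSORS[a]:
        return True
    if a in _PREDECESSORS[b]:
        return False
    return None  # neither state can follow the other
```

Under these assumptions, `is_greater_than(State.COMPLETED, State.FAILED)` yields `None`, because neither final state can follow the other.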
Job modifiers¶
A lot of configuration information can go into each resource manager job: its walltime, partition/queue, the number of nodes it needs, what kind of nodes, what quality of service the job requires, and so on.
PSI/J splits those attributes into three groups: one for generic POSIX information, one for resource information, and one for resource manager scheduling policies.
- class JobSpec(name=None, executable=None, arguments=None, directory=None, inherit_environment=True, environment=None, stdin_path=None, stdout_path=None, stderr_path=None, resources=None, attributes=None, pre_launch=None, post_launch=None, launcher=None)[source]
A class to hold information about the characteristics of a Job.
Constructs a JobSpec object while allowing its properties to be initialized.
- Parameters
name (Optional[str]) – A name for the job. The name plays no functional role except that
JobExecutor implementations may attempt to use the name to label the job as presented by the underlying implementation.
executable (Optional[str]) – An executable, such as “/bin/date”.
arguments (Optional[List[str]]) – The argument list to be passed to the executable. Unlike with execve(), the first element of the list will correspond to argv[1] when accessed by the invoked executable.
directory (Optional[Path]) – The directory, on the compute side, in which the executable is to be run
inherit_environment (bool) – If this flag is set to False, the job starts with an empty environment. The only environment variables that will be accessible to the job are the ones specified by the environment parameter. If this flag is set to True, which is the default, the job will also have access to variables inherited from the environment in which the job is run.
environment (Optional[Dict[str, str]]) – A mapping of environment variable names to their respective values.
stdin_path (Optional[Path]) – Path to a file whose contents will be sent to the job’s standard input.
stdout_path (Optional[Path]) – A path to a file in which to place the standard output stream of the job.
stderr_path (Optional[Path]) – A path to a file in which to place the standard error stream of the job.
resources (Optional[ResourceSpec]) – The resource requirements specify the details of how the job is to be run on a cluster, such as the number and type of compute nodes used, etc.
attributes (Optional[JobAttributes]) – Job attributes are details about the job, such as the walltime, that are descriptive of how the job behaves. Attributes are, in principle, non-essential in that the job could run even though no attributes are specified. In practice, specifying a walltime is often necessary to prevent LRMs from prematurely terminating a job.
pre_launch (Optional[Path]) – An optional path to a pre-launch script. The pre-launch script is sourced before the launcher is invoked. It, therefore, runs on the service node of the job rather than on all of the compute nodes allocated to the job.
post_launch (Optional[Path]) – An optional path to a post-launch script. The post-launch script is sourced after all the ranks of the job executable complete and is sourced on the same node as the pre-launch script.
launcher (Optional[str]) – The name of a launcher to use, such as “mpirun”, “srun”, “single”, etc. For a list of available launchers, see the Launchers section.
- property name: Optional[str]
Returns the name of the job.
- property to_dict: Dict[str, Any]
Returns a dictionary representation of this object.
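Two points from the JobSpec parameters above, the argv[1] convention for arguments and the effect of inherit_environment, can be shown with a short stand-alone sketch. The helpers `build_argv` and `build_env` are hypothetical and do not come from PSI/J; the rule that explicitly specified variables override inherited ones is an assumption of this sketch.

```python
import os
from typing import Dict, List, Mapping, Optional


def build_argv(executable: str, arguments: Optional[List[str]]) -> List[str]:
    """The executable becomes argv[0]; the arguments list starts at argv[1]."""
    return [executable] + list(arguments or [])


def build_env(inherit_environment: bool,
              environment: Optional[Mapping[str, str]],
              base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Start from the inherited environment (or nothing) and apply overrides."""
    if base is None:
        base = os.environ  # environment of the process doing the launching
    env = dict(base) if inherit_environment else {}
    # Assumption: explicitly specified variables win over inherited ones.
    env.update(environment or {})
    return env
```

For instance, `build_env(False, {"A": "1"}, base={"B": "2"})` contains only `A`, while the same call with `True` also carries `B` over from the base environment.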
- class ResourceSpec[source]
A base class for resource specifications.
The ResourceSpec class is an abstract base class for all possible resource specification classes in PSI/J.
- abstract property version: int
Returns the version of this resource specification class.
- class JobAttributes(duration=datetime.timedelta(seconds=600), queue_name=None, project_name=None, reservation_id=None, custom_attributes=None)[source]
A class containing ancillary job information that describes how a job is to be run.
Constructs a JobAttributes instance while allowing its various fields to be initialized.
- Parameters
duration (timedelta) – Specifies the duration (walltime) of the job. A job whose execution exceeds its walltime can be terminated forcefully.
queue_name (Optional[str]) – If a backend supports multiple queues, this parameter can be used to instruct the backend to send this job to a particular queue.
project_name (Optional[str]) – If a backend supports multiple projects for billing purposes, setting this attribute instructs the backend to bill the indicated project for the resources consumed by this job.
reservation_id (Optional[str]) – Allows specifying an advanced reservation ID. Advanced reservations enable the pre-allocation of a set of resources/compute nodes for a certain duration such that jobs can be run immediately, without waiting in the queue for resources to become available.
custom_attributes (Optional[Dict[str, object]]) – Specifies a dictionary of custom attributes. Implementations of
JobExecutor define and are responsible for interpreting custom attributes.
- Return type
None
- get_custom_attribute(name)[source]
Retrieves the value of a custom attribute.
Executors¶
Executors are concrete implementations of mechanisms that execute jobs.
To get an instance of a specific executor, call
JobExecutor.get_instance(name),
with name being one of the installed executor names. Alternatively, directly
instantiate the executor, e.g.:
from psij.executors.flux import FluxJobExecutor
ex = FluxJobExecutor()
rather than:
import psij
ex = psij.JobExecutor.get_instance('flux')
Executors can be installed from multiple sources, so the precise list of executors available to a specific installation of the PSI/J Python library can vary. In order to get a list of available executors, you can run, in a terminal:
$ python -m psij plugins
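Putting the Job, JobSpec, JobAttributes, and JobExecutor classes together, a minimal end-to-end sketch might look as follows. It assumes that an executor is registered under the name "local" and that wait() with no arguments blocks until the job reaches a final state; the function name `run_date_job` is purely illustrative. The import is placed inside the function so that the snippet can be loaded even where PSI/J is not installed.

```python
from datetime import timedelta


def run_date_job():
    """Submit /bin/date through an executor and return the last job status."""
    # Imported lazily; requires the PSI/J Python library to be installed.
    import psij

    spec = psij.JobSpec(
        executable="/bin/date",
        attributes=psij.JobAttributes(duration=timedelta(minutes=1)),
    )
    job = psij.Job(spec)

    # "local" is an assumed executor name; list the installed ones with
    # `python -m psij plugins`.
    executor = psij.JobExecutor.get_instance("local")
    executor.submit(job)

    # Assumed to block until the job reaches a final state.
    return job.wait()
```

Only the executor name would need to change to target a different backend, which is the reason for programming against the executor-independent classes where possible.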
JobExecutor Base Class¶
The psij.JobExecutor class is abstract, but offers concrete static methods
for registering, fetching, and listing subclasses of itself.
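The pattern of an abstract base class that also exposes static registry methods can be sketched in a few lines. This is a stand-alone illustration of the pattern only: `Executor`, `EchoExecutor`, and the method names `register` and `names` are hypothetical and are not claimed to match the signatures in psij.JobExecutor.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class Executor(ABC):
    """Hypothetical abstract base class that also acts as a plugin registry."""

    _registry: Dict[str, Type["Executor"]] = {}

    @abstractmethod
    def submit(self, job: object) -> None:
        ...

    @staticmethod
    def register(name: str, cls: Type["Executor"]) -> None:
        Executor._registry[name] = cls

    @staticmethod
    def get_instance(name: str) -> "Executor":
        if name not in Executor._registry:
            raise ValueError(f"Unknown executor: {name!r}")
        return Executor._registry[name]()  # instantiate the concrete subclass

    @staticmethod
    def names() -> List[str]:
        return sorted(Executor._registry)


class EchoExecutor(Executor):
    """Toy concrete subclass that just records submitted jobs."""

    def __init__(self) -> None:
        self.submitted: List[object] = []

    def submit(self, job: object) -> None:
        self.submitted.append(job)


Executor.register("echo", EchoExecutor)
```

Client code then only needs a name, as in `Executor.get_instance("echo")`, and never has to import the concrete class.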
- class JobExecutor(url=None, config=None)[source]
An abstract base class for all JobExecutor implementations.
Initializes this executor using an optional url and an optional configuration.
- Parameters
url (Optional[str]) – The URL is a string that a JobExecutor implementation can interpret as the location of a backend.
config (Optional[JobExecutorConfig]) – A configuration specific to each JobExecutor implementation. This parameter is marked as optional such that concrete JobExecutor classes can be instantiated with no config parameter. However, concrete JobExecutor classes must pass a default configuration up the inheritance tree and ensure that the config parameter of the ABC constructor is non-null.
The concrete executor implementations provided by this version of PSI/J Python are:
Cobalt¶
- class CobaltJobExecutor(url=None, config=None)[source]
A JobExecutor for the Cobalt Workload Manager.
The Cobalt HPC Job Scheduler is used by Argonne’s ALCF systems.
Uses the qsub, qstat, and qdel commands, respectively, to submit, monitor, and cancel jobs.
Creates a batch script with #COBALT directives when submitting a job.
Initializes a CobaltJobExecutor.
- Parameters
config (Optional[CobaltExecutorConfig]) –
Flux¶
- class FluxJobExecutor(url=None, config=None)[source]
A JobExecutor for the Flux scheduler.
The Flux resource manager framework is deployed and used on a per-user basis at many sites, and is slated to become the system-level resource manager at LLNL.
Uses Flux’s Python library/bindings to submit, monitor, and manipulate jobs.
Initializes a FluxJobExecutor.
- Parameters
url (Optional[str]) – Not used, but required by the spec for automatic initialization.
config (Optional[JobExecutorConfig]) – The FluxJobExecutor does not have any configuration options.
- Return type
None
LSF¶
- class LsfJobExecutor(url, config=None)[source]
A JobExecutor for the LSF Workload Manager.
The IBM Spectrum LSF workload manager is the system resource manager on LLNL’s Sierra and Lassen, and ORNL’s Summit.
Uses the ‘bsub’, ‘bjobs’, and ‘bkill’ commands, respectively, to submit, monitor, and cancel jobs.
Creates a batch script with #BSUB directives when submitting a job.
Initializes a LsfJobExecutor.
- Parameters
config (Optional[LsfExecutorConfig]) –
PBS¶
- class PBSProJobExecutor(url=None, config=None)[source]
A JobExecutor for PBS Pro.
PBS Pro is a resource manager on certain machines at Argonne National Lab, among others.
Uses the qsub, qstat, and qdel commands, respectively, to submit, monitor, and cancel jobs.
Creates a batch script with #PBS directives when submitting a job.
Initializes a PBSProJobExecutor.
- Parameters
config (Optional[PBSProExecutorConfig]) –
Slurm¶
- class SlurmJobExecutor(url=None, config=None)[source]
A JobExecutor for the Slurm Workload Manager.
The Slurm Workload Manager is a widely used resource manager running on machines such as NERSC’s Perlmutter, as well as a variety of LLNL machines.
Uses the ‘sbatch’, ‘squeue’, and ‘scancel’ commands, respectively, to submit, monitor, and cancel jobs.
Creates a batch script with #SBATCH directives when submitting a job.
Initializes a SlurmJobExecutor.
- Parameters
config (Optional[SlurmExecutorConfig]) –
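As a rough illustration of what a batch script header with #SBATCH directives can look like, the sketch below turns a few job attributes into directive lines. It is not the template PSI/J uses: `format_walltime` and `sbatch_header` are hypothetical helpers, and mapping queue_name to `--partition` and project_name to `--account` is an assumption of this sketch.

```python
from datetime import timedelta
from typing import List, Optional


def format_walltime(duration: timedelta) -> str:
    """Format a duration as HH:MM:SS, one of the time formats sbatch accepts."""
    total = int(duration.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def sbatch_header(name: Optional[str], duration: timedelta,
                  queue_name: Optional[str] = None,
                  project_name: Optional[str] = None) -> List[str]:
    """Build the opening lines of a batch script from a few job attributes."""
    lines = ["#!/bin/bash"]
    if name:
        lines.append(f"#SBATCH --job-name={name}")
    lines.append(f"#SBATCH --time={format_walltime(duration)}")
    if queue_name:
        # Assumed mapping: a PSI/J queue corresponds to a Slurm partition.
        lines.append(f"#SBATCH --partition={queue_name}")
    if project_name:
        # Assumed mapping: a PSI/J project corresponds to a Slurm account.
        lines.append(f"#SBATCH --account={project_name}")
    return lines
```

The other batch executors listed here follow the same general idea with their own directive prefixes (#COBALT, #BSUB, #PBS), though the option names differ per scheduler.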
Local¶
- class LocalJobExecutor(url=None, config=None)[source]
A job executor that runs jobs locally using subprocess.Popen.
This job executor is intended to be used when there is no resource manager, only the operating system, or when there is a resource manager but it should be ignored.
Limitations: on Linux, attached jobs always appear to complete with a zero exit code regardless of the actual exit code.
Initializes a LocalJobExecutor.
- Parameters
url (Optional[str]) – Not used, but required by the spec for automatic initialization.
config (JobExecutorConfig) – The LocalJobExecutor does not have any configuration options.
- Return type
None
Radical Pilot¶
- class RPJobExecutor(url=None, config=None)[source]
A job executor that runs jobs via radical.pilot, the RADICAL Pilot system.
Initializes a RPJobExecutor.
- Parameters
url (Optional[str]) – Not used, but required by the spec for automatic initialization.
config (JobExecutorConfig) – The RPJobExecutor does not have any configuration options.
- Return type
None
Launchers¶
Launchers are mechanisms to start the actual jobs on batch schedulers
once a set of nodes has been allocated for the job. In essence, launchers
are wrappers around the job executable which can provide additional
features, such as setting up an MPI environment, starting a copy of the
job executable on each allocated node, etc. To get a launcher instance,
call Launcher.get_instance(name)
with name being the name of a launcher. Like job executors, above,
launchers are plugins and can come from various places. To obtain a list
of launchers, you can run:
$ python -m psij plugins
Launcher base class¶
Like the executor, the Launcher base class is abstract, but offers
concrete static methods for registering and fetching subclasses of itself.
- class Launcher(config=None)[source]
An abstract base class for all launchers.
Base constructors for launchers.
- Parameters
config (Optional[JobExecutorConfig]) – An optional configuration. If not specified,
DEFAULT is used.
- Return type
None
The PSI/J Python library comes with a core set of launchers, which are:
aprun¶
- class AprunLauncher(config=None)[source]
Launches a job using Cobalt’s aprun.
Initializes this launcher using an optional configuration.
- Parameters
config (Optional[JobExecutorConfig]) – An optional configuration.
jsrun¶
- class JsrunLauncher(config=None)[source]
Launches a job using LSF’s jsrun.
Initializes this launcher using an optional configuration.
- Parameters
config (Optional[JobExecutorConfig]) – An optional configuration.
srun¶
- class SrunLauncher(config=None)[source]
Launches a job using Slurm’s srun.
See the Slurm Workload Manager.
Initializes this launcher using an optional configuration.
- Parameters
config (Optional[JobExecutorConfig]) – An optional configuration.
mpirun¶
- class MPILauncher(config=None)[source]
Launches jobs using mpirun.
mpirun is a tool provided by MPI implementations, such as Open MPI.
Initializes this launcher using an optional configuration.
- Parameters
config (Optional[JobExecutorConfig]) – An optional configuration.
single¶
- class SingleLauncher(config=None)[source]
A launcher that launches a single copy of the executable. This is the default launcher.
Initializes this launcher using an optional configuration.
- Parameters
config (Optional[JobExecutorConfig]) – An optional configuration.
multiple¶
- class MultipleLauncher(script_path=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/psij-python/checkouts/0.1.0.post2/src/psij/launchers/scripts/multi_launch.sh'), config=None)[source]
A launcher that launches multiple identical copies of the executable.
The exit code of the job corresponds to the first non-zero exit code encountered in one of the executable copies or zero if all invocations of the executable succeed.
Initializes this launcher using an optional configuration.
- Parameters
config (Optional[JobExecutorConfig]) – An optional configuration.
script_path (Path) –
- get_additional_args(job)[source]
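The exit-code rule stated for MultipleLauncher above (first non-zero exit code, or zero if every copy succeeds) is easy to express directly. The function below is a hypothetical stand-alone helper, not part of the launcher; "first" is taken here to mean first in iteration order.

```python
from typing import Iterable


def combined_exit_code(exit_codes: Iterable[int]) -> int:
    """Return the first non-zero exit code, or 0 if every copy succeeded."""
    for code in exit_codes:
        if code != 0:
            return code
    return 0
```

With exit codes `[0, 3, 1]` the combined result is 3, so a later failure does not mask an earlier one.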
Other Package Contents¶
- exception InvalidJobException(message, exception=None)[source]
An exception describing a problem with a job specification.
Constructs an InvalidJobException while allowing properties to be initialized.
- Parameters
- Return type
None
- exception
Returns an optional underlying exception that can potentially be used for debugging purposes, but which should not, in general, be presented to an end-user.
- message
Retrieves the message associated with this exception. This is a descriptive message that is sufficiently clear to be presented to an end-user.
- exception SubmitException(message, exception=None, transient=False)[source]
An exception representing job submission issues.
This exception is thrown when the
submit() call fails for a reason that is independent of the job that is being submitted.
Constructs a SubmitException and allows properties to be initialized.
- Parameters
- Return type
None
- exception
Returns an optional underlying exception that can potentially be used for debugging purposes, but which should not, in general, be presented to an end-user.
- message
Retrieves the message associated with this exception. This is a descriptive message that is sufficiently clear to be presented to an end-user.
- transient
Returns True if the underlying condition that triggered this exception is transient. Jobs that cannot be submitted due to a transient exceptional condition have a chance of being successfully resubmitted at a later time, which is a suggestion to client code that it could re-attempt the operation that triggered this exception. However, the exact chances of success depend on many factors and are not guaranteed in any particular case. For example, a DNS resolution failure while attempting to connect to a remote service is a transient error since it can be reasonably assumed that DNS resolution is a persistent feature of an Internet-connected network. By contrast, an authentication failure due to an invalid username/password combination would not be a transient failure. While it may be possible for a temporary defect in a service to cause such a failure, under normal operating conditions such an error would persist across subsequent retries until correct credentials are used.
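One way client code might act on a transient flag is to retry with a growing delay and give up immediately on permanent errors. The sketch below is self-contained and does not use PSI/J: `SubmitError` is a hypothetical stand-in for an exception with a `transient` attribute, and `submit_with_retry`, its parameters, and the exponential backoff policy are illustrative choices.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class SubmitError(Exception):
    """Hypothetical stand-in for an exception carrying a `transient` flag."""

    def __init__(self, message: str, transient: bool = False) -> None:
        super().__init__(message)
        self.transient = transient


def submit_with_retry(submit: Callable[[], T], attempts: int = 3,
                      delay: float = 1.0) -> T:
    """Call submit(), retrying only when the failure is marked transient."""
    for attempt in range(1, attempts + 1):
        try:
            return submit()
        except SubmitError as err:
            # Give up on permanent errors and on the last attempt.
            if not err.transient or attempt == attempts:
                raise
            time.sleep(delay * 2 ** (attempt - 1))  # exponential backoff
    raise ValueError("attempts must be at least 1")
```

As the description above notes, a transient flag is only a hint, so bounding the number of attempts keeps a persistent outage from turning into an endless loop.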
- exception UnreachableStateException(status)[source]
Indicates that a job state being waited for cannot be reached.
This exception is thrown when the
wait() method is called with a set of states that cannot be reached by the job when the call is made.
Constructs an UnreachableStateException.
- Parameters
status (JobStatus) – The
JobStatus that the job was in when wait() was called and which prevents the desired states from being reached.
- Return type
None
- status
Returns the job status that has caused an implementation to determine that the desired states passed to the
wait()method cannot be reached.
API Reference¶
- src
  - psij package
    - Subpackages
      - psij.executors package
        - Subpackages
          - psij.executors.batch package
            - Submodules
              - psij.executors.batch.batch_scheduler_executor module
              - psij.executors.batch.cobalt module
              - psij.executors.batch.escape_functions module
              - psij.executors.batch.lsf module
              - psij.executors.batch.pbspro module
              - psij.executors.batch.script_generator module
              - psij.executors.batch.slurm module
              - psij.executors.batch.template_function_library module
            - Module contents
        - Submodules
          - psij.executors.flux module
          - psij.executors.local module
          - psij.executors.rp module
        - Module contents
      - psij.launchers package
    - Submodules
      - psij.descriptor module
      - psij.exceptions module
      - psij.job module
      - psij.job_attributes module
      - psij.job_executor module
      - psij.job_executor_config module
      - psij.job_launcher module
      - psij.job_spec module
      - psij.job_state module
      - psij.job_status module
      - psij.launcher module
      - psij.resource_spec module
      - psij.serialize module
      - psij.utils module
      - psij.version module
    - Module contents