API Reference

Job Script API

The following functions may be used in your job scripts that are executed via cluster_utils.

cluster_utils.initialize_job(cmd_line: list[str] | None = None, verbose: bool = True, dynamic: bool = True) → AttributeDict

Read parameters from command line and register at cluster_utils server.

This function is intended to be called at the beginning of your job scripts. It does two things at once:

  1. parse the command line arguments to get the parameters for the job, and

  2. if server information is provided via command line arguments, register at the cluster_utils server (i.e. the main process that orchestrates the job execution).

Parameters:
cmd_line: list[str] | None = None

Command line arguments (defaults to sys.argv).

verbose: bool = True

If true, print the loaded parameters.

dynamic: bool = True

See smart_settings.loads().

Returns:

Parameters as loaded from the command line arguments with smart_settings.

cluster_utils.finalize_job(metrics: MutableMapping[str, float], params) → None

Save metrics and parameters and send metrics to the cluster_utils server.

Save the used parameters and resulting metrics to CSV files (filenames defined by CLUSTER_PARAM_FILE and CLUSTER_METRIC_FILE) in the job’s working directory and report the metrics to the cluster_utils main process.

Make sure to call this function at the end of your job script. Otherwise, cluster_utils will not receive the resulting metrics and will consider the job as failed.

Parameters:
metrics: MutableMapping[str, float]

Dictionary with metrics that should be sent to the server.

params

Parameters that were used to run the job (given by initialize_job()).
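
As a minimal sketch of how initialize_job() and finalize_job() frame a job script: the parameter names learning_rate and epochs and the helper compute_metrics() are hypothetical, and dictionary-style access to the returned parameters is an assumption. cluster_utils is imported inside main() only so that the helper can be run without the package installed.

```python
def compute_metrics(learning_rate: float, epochs: int) -> dict[str, float]:
    """Hypothetical stand-in for the actual work of the job."""
    loss = 1.0
    for _ in range(epochs):
        loss *= 1.0 - learning_rate
    return {"final_loss": loss}


def main() -> None:
    import cluster_utils

    # Parse the command line and, if server information is given, register
    # with the cluster_utils server.
    params = cluster_utils.initialize_job()

    # Assumption: the job settings define "learning_rate" and "epochs".
    metrics = compute_metrics(params["learning_rate"], params["epochs"])

    # Save parameters and metrics and report the metrics to the main process.
    cluster_utils.finalize_job(metrics, params)
```

In an actual job script, main() would be called under the usual `if __name__ == "__main__":` guard.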

cluster_utils.exit_for_resume() → None

Send a “resume”-request to the cluster_utils server and exit with return code 3.

Use this to split a single long-running job into multiple shorter jobs by frequently saving the state of the job (e.g. checkpoints) and restarting by calling this function.

See Restart jobs using exit_for_resume() for more information.
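
The checkpoint-and-resume pattern described above could look roughly like the following sketch. The JSON checkpoint format, the helpers load_state(), save_state() and run_chunk(), and the step counts are all hypothetical. It is also an assumption that the parameters returned by initialize_job() contain the job's working directory under the key "working_dir".

```python
import json
from pathlib import Path


def load_state(checkpoint: Path) -> dict[str, int]:
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    if checkpoint.exists():
        return json.loads(checkpoint.read_text())
    return {"step": 0}


def save_state(checkpoint: Path, state: dict[str, int]) -> None:
    """Write the current state to the checkpoint file."""
    checkpoint.write_text(json.dumps(state))


def run_chunk(state: dict[str, int], chunk_size: int, total_steps: int) -> bool:
    """Advance by at most chunk_size steps; return True once all steps are done."""
    last = min(state["step"] + chunk_size, total_steps)
    while state["step"] < last:
        state["step"] += 1  # placeholder for one unit of real work
    return state["step"] >= total_steps


def main() -> None:
    import cluster_utils

    params = cluster_utils.initialize_job()
    # Assumption: the parameters contain the job's "working_dir".
    checkpoint = Path(params["working_dir"]) / "checkpoint.json"

    state = load_state(checkpoint)
    finished = run_chunk(state, chunk_size=100, total_steps=1000)
    save_state(checkpoint, state)

    if not finished:
        # Ask the server to restart this job; exits with return code 3.
        cluster_utils.exit_for_resume()

    cluster_utils.finalize_job({"steps_done": float(state["step"])}, params)
```

Saving the state before calling exit_for_resume() matters here, because the call ends the process and the restarted job only sees what was written to disk.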

cluster_utils.announce_early_results(metrics)

Report intermediate results to cluster_utils.

Results reported with this function are used by hyperparameter optimization to stop bad jobs early (see kill_bad_jobs_early option).

Parameters:
metrics

Dictionary with metrics that should be sent to the server.
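
A minimal sketch of reporting intermediate metrics once per epoch. The training loop, the metric name validation_loss and the helper names are hypothetical. The reporting function is passed in as an argument so that the loop can be exercised without a running server; in a job script one would presumably pass cluster_utils.announce_early_results.

```python
from collections.abc import Callable, Iterator


def validation_losses(epochs: int) -> Iterator[dict[str, float]]:
    """Yield one metrics dictionary per epoch (stand-in for real training)."""
    for epoch in range(1, epochs + 1):
        yield {"validation_loss": 1.0 / epoch}


def train(
    epochs: int, report: Callable[[dict[str, float]], None]
) -> dict[str, float]:
    """Report the metrics after every epoch and return the last ones."""
    metrics: dict[str, float] = {}
    for metrics in validation_losses(epochs):
        # In a job script: report = cluster_utils.announce_early_results
        report(metrics)
    return metrics
```

The dictionary returned by train() has the same shape as the intermediate reports, so it could be handed to finalize_job() at the end of the job.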

cluster_utils.announce_fraction_finished(fraction_finished: float) → None

Report job progress to cluster_utils.

You may use this function to report the progress of the job. cluster_utils uses this information to estimate the remaining duration of the job.

Parameters:
fraction_finished: float

Value between 0 and 1, indicating the progress of the job.
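
One way to derive the fraction is from a step counter, as in this sketch. The function run_steps() and its loop body are hypothetical; the progress callback stands in for cluster_utils.announce_fraction_finished so that the snippet runs without the package.

```python
from collections.abc import Callable


def run_steps(total_steps: int, report_progress: Callable[[float], None]) -> None:
    """Run total_steps units of work and report the progress after each one."""
    for step in range(total_steps):
        # ... one unit of real work would go here ...
        # In a job script: report_progress = cluster_utils.announce_fraction_finished
        report_progress((step + 1) / total_steps)
```

Reporting after each completed step keeps the value within the documented range and reaches 1 exactly when the last step is done.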

cluster_utils.cluster_main(main_func=None, **read_params_args)

Decorator for your main function to automatically register with cluster_utils.

Use this as a decorator to automatically wrap a function (usually main) with calls to initialize_job() and finalize_job().

The parameters read by initialize_job() will be passed as kwargs to the function. The function must return the metrics dictionary that finalize_job() expects.

See Using the cluster_main Decorator for a usage example.
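
The following sketch shows the shape of a function that fits this contract. The parameter names learning_rate and epochs and the metric name score are hypothetical, and the catch-all **other_params is there on the assumption that cluster_utils passes additional parameters the function does not name. The decorator is applied inside a helper only so that main() itself can be run without the package; that the wrapped function takes no arguments is also an assumption.

```python
def main(learning_rate: float, epochs: int, **other_params) -> dict[str, float]:
    """Receive the job parameters as keyword arguments and return the metrics."""
    return {"score": learning_rate * epochs}


def run_as_cluster_job() -> None:
    import cluster_utils

    # Same effect as decorating main() with @cluster_utils.cluster_main.
    wrapped = cluster_utils.cluster_main(main)
    # Assumption: the wrapped function takes no arguments of its own.
    wrapped()
```

In a job script, the more usual form would be to put @cluster_utils.cluster_main directly above the definition of main().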

Output Filenames

The constants listed below define the names of output files that are written by cluster_utils. They are listed here so that other parts of the documentation can reference them.

cluster_utils.base.constants.CLUSTER_METRIC_FILE = 'metrics.csv'

Name of the CSV file to which resulting metrics of a job are saved.

cluster_utils.base.constants.CLUSTER_PARAM_FILE = 'param_choice.csv'

Name of the CSV file to which used parameters of a job are saved.

cluster_utils.base.constants.JSON_SETTINGS_FILE = 'settings.json'

Name of the JSON file to which used parameters of a job are saved.
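
For post-processing, the metrics file of a finished job could be read back with the standard csv module, as in this sketch. The function read_job_metrics() is hypothetical, and the layout of the file (one header row followed by one row of values) is an assumption that should be checked against an actual output file. The file name is repeated as a local constant so that the snippet runs without cluster_utils installed.

```python
import csv
from pathlib import Path

# Same value as cluster_utils.base.constants.CLUSTER_METRIC_FILE.
CLUSTER_METRIC_FILE = "metrics.csv"


def read_job_metrics(job_dir: Path) -> dict[str, float]:
    """Read the metrics of one job, assuming a header row plus one value row."""
    with open(job_dir / CLUSTER_METRIC_FILE, newline="") as f:
        row = next(csv.DictReader(f))
    return {name: float(value) for name, value in row.items()}
```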

Deprecated API

The following functions can still be used but are deprecated and may be removed in a future release. Do not use them in new code. The description of each function explains how to update existing code.

cluster_utils.read_params_from_cmdline(cmd_line: list[str] | None = None, make_immutable: bool = True, verbose: bool = True, dynamic: bool = True, save_params: bool = True) → AttributeDict

Alias for initialize_job().

Deprecated:

This function is deprecated and will be removed in a future release. Use initialize_job() instead.

cluster_utils.save_metrics_params(metrics: MutableMapping[str, float], params) → None

Alias for finalize_job().

Deprecated:

This function is deprecated and will be removed in a future release. Use finalize_job() instead.