API Reference
Job Script API
The following functions may be used in your job scripts that are executed via cluster_utils.
- cluster_utils.initialize_job(cmd_line: list[str] | None = None, verbose: bool = True, dynamic: bool = True) -> AttributeDict
Read parameters from command line and register at cluster_utils server.
This function is intended to be called at the beginning of your job scripts. It does two things at once:
parse the command line arguments to get the parameters for the job, and
if server information is provided via command line arguments, register at the cluster_utils server (i.e. the main process that orchestrates the job execution).
- cluster_utils.finalize_job(metrics: MutableMapping[str, float], params) -> None
Save metrics and parameters and send metrics to the cluster_utils server.
Save the used parameters and resulting metrics to CSV files (filenames defined by CLUSTER_PARAM_FILE and CLUSTER_METRIC_FILE) in the job's working directory and report the metrics to the cluster_utils main process.
Make sure to call this function at the end of your job script, otherwise cluster_utils will not receive the resulting metrics and will consider the job as failed.
- Parameters:
- metrics: MutableMapping[str, float]
Dictionary with metrics that should be sent to the server.
- params
Parameters that were used to run the job (given by initialize_job()).
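As an illustration of how initialize_job() and finalize_job() fit together, here is a minimal sketch of a job script. The parameter names "x" and "y" and the helper compute_metrics are hypothetical and stand in for whatever your run configuration and training code define; item access on the returned AttributeDict is also assumed.

```python
def compute_metrics(x: float, y: float) -> dict[str, float]:
    """Toy objective standing in for real training code (hypothetical)."""
    return {"loss": (x - 1.0) ** 2 + (y + 2.0) ** 2}


def main() -> None:
    # Imported here so compute_metrics can be tested without cluster_utils.
    import cluster_utils

    # Parse the command line and, if server information was passed,
    # register this job at the cluster_utils server.
    params = cluster_utils.initialize_job()

    # ASSUMPTION: the run configuration defines parameters "x" and "y"
    # and the returned AttributeDict supports item access.
    metrics = compute_metrics(params["x"], params["y"])

    # Save parameters and metrics and report the metrics to the server.
    # Without this call the job is considered failed.
    cluster_utils.finalize_job(metrics, params)


# In an actual job script, call main() at the bottom of the file.
```

Keeping the computation in a separate function makes it possible to test it without a running cluster_utils server.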
- cluster_utils.exit_for_resume() -> None
Send a “resume”-request to the cluster_utils server and exit with return code 3.
Use this to split a single long-running job into multiple shorter jobs by frequently saving the state of the job (e.g. checkpoints) and restarting by calling this function.
See Restart jobs using exit_for_resume() for more information.
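The checkpoint-and-restart pattern described above could look roughly like the following sketch. The checkpoint file name, its JSON layout, the step counts and the helpers load_step and run_chunk are all assumptions made for illustration; only initialize_job(), exit_for_resume() and finalize_job() come from cluster_utils.

```python
import json
from pathlib import Path


def load_step(checkpoint: Path) -> int:
    """Return the step stored in the checkpoint, or 0 if none exists yet."""
    if checkpoint.exists():
        return json.loads(checkpoint.read_text())["step"]
    return 0


def run_chunk(checkpoint: Path, total_steps: int, steps_per_run: int) -> bool:
    """Advance by at most steps_per_run steps; return True once all are done."""
    start = load_step(checkpoint)
    end = min(start + steps_per_run, total_steps)
    for _step in range(start, end):
        pass  # placeholder for one unit of real work
    # Persist the progress so that the next run can continue from here.
    checkpoint.write_text(json.dumps({"step": end}))
    return end >= total_steps


def main() -> None:
    # Imported here so the helpers above can be tested without cluster_utils.
    import cluster_utils

    params = cluster_utils.initialize_job()
    # ASSUMPTION: the checkpoint lives in the directory the job is started in.
    checkpoint = Path("checkpoint.json")
    if not run_chunk(checkpoint, total_steps=1000, steps_per_run=100):
        # Not finished yet: ask the server to restart this job and exit
        # with return code 3.
        cluster_utils.exit_for_resume()
    cluster_utils.finalize_job({"steps_done": 1000.0}, params)
```

Each run does a bounded amount of work, so a single long job turns into a chain of short ones that pick up from the last saved checkpoint.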
- cluster_utils.announce_early_results(metrics)
Report intermediate results to cluster_utils.
Results reported with this function are used by hyperparameter optimization to stop bad jobs early (see the kill_bad_jobs_early option).
- cluster_utils.announce_fraction_finished(fraction_finished: float) -> None
Report job progress to cluster_utils.
You may use this function to report the progress of the job. If done, the information is used by cluster_utils to estimate the remaining duration of the job.
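A training loop might report intermediate results and progress once per epoch, as in the sketch below. The train function and its toy loss are hypothetical. It is also assumed here that announce_early_results() accepts the same kind of metrics dictionary as finalize_job() and that the fraction is a value between 0 and 1; neither is stated in this reference.

```python
from typing import Callable


def train(
    num_epochs: int,
    report_metrics: Callable[[dict[str, float]], None],
    report_progress: Callable[[float], None],
) -> dict[str, float]:
    """Toy training loop that reports after every epoch."""
    metrics = {"loss": 1.0}
    for epoch in range(num_epochs):
        metrics = {"loss": 1.0 / (epoch + 1)}  # placeholder for real training
        report_metrics(metrics)
        report_progress((epoch + 1) / num_epochs)
    return metrics


def main() -> None:
    # Imported here so train() can be tested without cluster_utils.
    import cluster_utils

    params = cluster_utils.initialize_job()
    metrics = train(
        num_epochs=10,
        # ASSUMPTION: takes a metrics dict like the one given to finalize_job().
        report_metrics=cluster_utils.announce_early_results,
        # ASSUMPTION: the fraction is expected to lie in [0, 1].
        report_progress=cluster_utils.announce_fraction_finished,
    )
    cluster_utils.finalize_job(metrics, params)
```

Passing the reporting functions as callbacks keeps the loop itself independent of cluster_utils, so it can also be run locally with any other callables.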
- cluster_utils.cluster_main(main_func=None, **read_params_args)
Decorator for your main function to automatically register with cluster_utils.
Use this as a decorator to automatically wrap a function (usually main) with calls to initialize_job() and finalize_job().
The parameters read by initialize_job() will be passed as kwargs to the function. Further, the function is expected to return the metrics dictionary as expected by finalize_job().
See Using the cluster_main Decorator for a usage example.
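As a rough sketch of the contract described above, the wrapped function receives the parameters as keyword arguments and returns the metrics dictionary. The parameter names "x" and "y" are hypothetical, and calling the wrapped function without arguments is an assumption; the linked usage example is the authoritative one.

```python
def run(**params) -> dict[str, float]:
    """Job body: receives the parameters as kwargs, returns the metrics dict."""
    # ASSUMPTION: the run configuration defines parameters "x" and "y".
    x, y = params["x"], params["y"]
    return {"loss": (x - 1.0) ** 2 + (y + 2.0) ** 2}


def entry_point() -> None:
    # Imported here so run() can be tested without cluster_utils.
    import cluster_utils

    # Same effect as writing `@cluster_utils.cluster_main` above `def run`.
    main = cluster_utils.cluster_main(run)
    # ASSUMPTION: the wrapped function is called without arguments and reads
    # its parameters from the command line via initialize_job().
    main()
```

Compared with calling initialize_job() and finalize_job() by hand, the decorator leaves only the computation in the job function.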
Output Filenames
The constants listed below define names of output files that are written by cluster_utils. They are listed here so that other parts of the documentation can reference them.
- cluster_utils.base.constants.CLUSTER_METRIC_FILE = 'metrics.csv'
Name of the CSV file to which resulting metrics of a job are saved.
- cluster_utils.base.constants.CLUSTER_PARAM_FILE = 'param_choice.csv'
Name of the CSV file to which used parameters of a job are saved.
- cluster_utils.base.constants.JSON_SETTINGS_FILE = 'settings.json'
Name of the JSON file to which used parameters of a job are saved.
Deprecated API
The following functions can still be used but are deprecated and may be removed in a future release. Do not use them in new code. See the description of each function for how to update existing code.
- cluster_utils.read_params_from_cmdline(cmd_line: list[str] | None = None, make_immutable: bool = True, verbose: bool = True, dynamic: bool = True, save_params: bool = True) -> AttributeDict
Alias for initialize_job().
- Deprecated:
This function is deprecated and will be removed in a future release. Use initialize_job() instead.
- cluster_utils.save_metrics_params(metrics: MutableMapping[str, float], params) -> None
Alias for finalize_job().
- Deprecated:
This function is deprecated and will be removed in a future release. Use finalize_job() instead.