Usage

Run Batch of Jobs

cluster_utils provides two main commands to run batches of jobs on the cluster:

  • grid_search: Simple grid search over specified parameter ranges.

  • hp_optimization: Uses sampling-based optimization to search for the best combination of hyperparameters within the specified ranges.

Both are run as modules via python -m and expect a configuration file as an argument:

python3 -m cluster_utils.grid_search config_for_grid_search.json

python3 -m cluster_utils.hp_optimization config_for_hp_optim.json

See Configuration for information on the expected structure of the config file (note that there are differences between the two methods!).
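If you want to launch one of the two commands from a Python script rather than a shell, the invocation above can be assembled as an argument list. The following is only a minimal sketch: build_command is a hypothetical helper, not part of cluster_utils, and it merely reproduces the python -m call shown above using the current interpreter.

```python
import sys


def build_command(method, config_path):
    """Return the argv list for one of the two batch commands.

    Hypothetical helper for illustration; it only mirrors the
    `python3 -m cluster_utils.<method> <config>` call from the text.
    """
    if method not in ("grid_search", "hp_optimization"):
        raise ValueError(f"unknown method: {method}")
    # sys.executable is the interpreter running this script, so the
    # command uses the same environment cluster_utils is installed in.
    return [sys.executable, "-m", f"cluster_utils.{method}", str(config_path)]


cmd = build_command("grid_search", "config_for_grid_search.json")
print(" ".join(cmd[1:]))
```

The resulting list can be passed to subprocess.run(cmd, check=True) to start the batch.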

Optionally, you can pass a list of key-value arguments after the config file to override individual settings from it. Use dot notation to specify nested keys. Example:

python3 -m cluster_utils.hp_optimization config.json \
  'results_dir="/tmp"' 'optimization_setting.run_local=True'
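To illustrate what such overrides mean for the nested configuration, here is a minimal sketch of how a dotted key could be applied to a nested dictionary. This is not the cluster_utils implementation: apply_override is a hypothetical helper, the starting config values are made up, and parsing the value as a Python literal is an assumption suggested by the quoting in the example above.

```python
import ast


def apply_override(config, assignment):
    """Apply one 'dotted.key=value' override to a nested dict, in place."""
    key, _, raw_value = assignment.partition("=")
    try:
        # Assumption: values are Python literals ("/tmp" in quotes, True, 3, ...).
        value = ast.literal_eval(raw_value)
    except (ValueError, SyntaxError):
        value = raw_value  # fall back to the raw string
    *parents, leaf = key.split(".")
    node = config
    for part in parents:
        # Walk down the nested dicts, creating missing levels on the way.
        node = node.setdefault(part, {})
    node[leaf] = value
    return config


# Hypothetical config contents, standing in for a loaded config.json.
config = {
    "results_dir": "/home/user/results",
    "optimization_setting": {"run_local": False},
}
for arg in ['results_dir="/tmp"', "optimization_setting.run_local=True"]:
    apply_override(config, arg)
print(config)
```

After both overrides, results_dir is the string /tmp and the nested run_local entry is the boolean True; all other settings stay as they were in the file.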

Both commands can also be run with --help to get a complete list of arguments.

You can abort cluster_utils with Ctrl + C at any time. All running jobs are stopped, and submitted jobs are removed from the cluster queue.

Interactive Mode

While cluster_utils is running, you can open a command prompt that lets you inspect finished and running jobs and stop running jobs.

To enter the command prompt, press ESC. You should now see the following prompt:

Enter command, e.g.  list_jobs, list_running_jobs, list_successful_jobs,
list_idle_jobs, show_job, stop_remaining_jobs
>>>

You may now enter one of the listed commands, or simply press Enter to leave the prompt without executing a command.

Important

While the prompt is open, the main loop is blocked, i.e. no new jobs will be submitted during that time.

Commands

list_jobs

List IDs of all jobs that have been submitted so far (including finished ones).

list_running_jobs

List IDs of all jobs that are currently running.

list_successful_jobs

List IDs of all jobs that finished successfully.

list_idle_jobs

List IDs of all jobs that have been submitted but not yet started.

show_job

Prompts for a job ID and shows information about that job.

stop_remaining_jobs

Abort all currently running jobs as well as jobs that have already been submitted but have not started yet.

This will not stop submission of new jobs. If you want to stop cluster_utils completely, press Ctrl + C instead.
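The relationship between these commands can be pictured with a small toy model. The sketch below is purely illustrative and does not reflect how cluster_utils tracks jobs internally: the Status enum, the jobs table and the "stopped" state are all assumptions made for this example.

```python
from enum import Enum


class Status(Enum):
    IDLE = "submitted, not yet started"
    RUNNING = "running"
    SUCCESSFUL = "finished successfully"
    STOPPED = "stopped"  # assumed name for aborted jobs


# Hypothetical job table: job ID -> status.
jobs = {0: Status.SUCCESSFUL, 1: Status.RUNNING, 2: Status.RUNNING, 3: Status.IDLE}


def ids_with(*statuses):
    """Return the sorted IDs of all jobs in one of the given states."""
    return sorted(job_id for job_id, status in jobs.items() if status in statuses)


def list_jobs():
    return sorted(jobs)  # everything submitted so far, finished or not


def list_running_jobs():
    return ids_with(Status.RUNNING)


def list_successful_jobs():
    return ids_with(Status.SUCCESSFUL)


def list_idle_jobs():
    return ids_with(Status.IDLE)


def stop_remaining_jobs():
    # Affects running and idle jobs only; finished jobs are untouched,
    # and nothing prevents new jobs from being submitted afterwards.
    for job_id in ids_with(Status.RUNNING, Status.IDLE):
        jobs[job_id] = Status.STOPPED


print(list_running_jobs(), list_idle_jobs())
stop_remaining_jobs()
print(list_running_jobs(), list_idle_jobs(), list_jobs())
```

In this model, stop_remaining_jobs empties the running and idle lists while list_jobs still reports every submitted ID. A job added to the table afterwards would show up as idle again, which mirrors the note that submission of new jobs continues.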