API¶
Gantry’s public API.
- class Recipe(args: Sequence[str], name: str | None = None, description: str | None = None, workspace: str | None = None, budget: str | None = None, group_names: Sequence[str] | None = None, allow_dirty: bool = False, yes: bool | None = None, save_spec: PathLike | str | None = None, callbacks: Sequence[Callback] | None = None, clusters: Sequence[str] | None = None, gpu_types: Sequence[str] | None = None, interconnect: Literal['ib', 'tcpxo'] | None = None, tags: Sequence[str] | None = None, hostnames: Sequence[str] | None = None, cpus: float | None = None, gpus: int | None = None, memory: str | None = None, shared_memory: str | None = None, beaker_image: str | None = None, docker_image: str | None = None, datasets: Sequence[str] | None = None, env_vars: Sequence[str | tuple[str, str]] | None = None, env_secrets: Sequence[str | tuple[str, str]] | None = None, dataset_secrets: Sequence[str | tuple[str, str]] | None = None, mounts: Sequence[str | tuple[str, str]] | None = None, weka: Sequence[str | tuple[str, str]] | None = None, uploads: Sequence[str | tuple[str, str]] | None = None, ref: str | None = None, branch: str | None = None, git_repo: GitRepoState | None = None, gh_token_secret: str = 'GITHUB_TOKEN', aws_config_secret: str | None = None, aws_credentials_secret: str | None = None, google_credentials_secret: str | None = None, results: str = '/results', task_name: str = 'main', priority: str | None = None, task_timeout: str | None = None, preemptible: bool | None = None, retries: int | None = None, replicas: int | None = None, leader_selection: bool | None = None, host_networking: bool | None = None, propagate_failure: bool | None = None, propagate_preemption: bool | None = None, synchronized_start_timeout: str | None = None, skip_tcpxo_setup: bool = False, skip_nccl_setup: bool = False, runtime_dir: str = '/gantry-runtime', exec_method: Literal['exec', 'bash'] = 'exec', torchrun: bool = False, pre_setup: str | None = None, post_setup: str | None = None, 
python_manager: Literal['uv', 'conda'] | None = None, default_python_version: str = '3.10', system_python: bool = False, install: str | None = None, no_python: bool = False, uv_venv: str | None = None, uv_extras: Sequence[str] | None = None, uv_all_extras: bool | None = None, uv_torch_backend: str | None = None, conda_file: PathLike | str | None = None, conda_env: str | None = None)[source]¶
A recipe defines how Gantry creates a Beaker workload and can be used to launch Gantry runs programmatically from Python rather than from the command line.
- git_repo: GitRepoState | None = None¶
- classmethod multi_node_torchrun(cmd: Sequence[str], gpus_per_node: int, num_nodes: int, shared_memory: str | None = '10GiB', **kwargs) Recipe[source]¶
Create a multi-node recipe using torchrun.
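A minimal sketch of building a multi-node torchrun recipe, assuming Recipe is importable from gantry.api; workspace, budget, and command values are placeholders:

```python
# Sketch: distributed training across 2 nodes with 8 GPUs each.
from gantry.api import Recipe  # import path is an assumption

recipe = Recipe.multi_node_torchrun(
    ["train.py", "--config", "conf.yaml"],  # command passed to torchrun
    gpus_per_node=8,
    num_nodes=2,
    # Remaining keyword args flow through to the Recipe constructor:
    workspace="ai2/my-workspace",  # placeholder workspace
    budget="ai2/my-budget",        # placeholder budget
)
```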
- launch(show_logs: bool | None = None, timeout: int | None = None, start_timeout: int | None = None, inactive_timeout: int | None = None, inactive_soft_timeout: int | None = None, client: Beaker | None = None) Workload[source]¶
Launch an experiment on Beaker. Same as the gantry run command.
- Returns:
The Beaker workload.
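The two steps above can be sketched together: construct a Recipe, then call launch() to create the workload. The import path and all argument values (workspace, budget, cluster pattern) are assumptions for illustration:

```python
# Sketch: launching a single-GPU run programmatically.
from gantry.api import Recipe  # import path is an assumption

recipe = Recipe(
    ["python", "train.py"],
    name="my-experiment",
    workspace="ai2/my-workspace",  # placeholder workspace
    budget="ai2/my-budget",        # placeholder budget
    gpus=1,
)

# Creates the Beaker workload; with show_logs=True, streams its logs.
workload = recipe.launch(show_logs=True)
```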
- class GitRepoState(repo: str, repo_url: str, ref: str, branch: str | None = None)[source]¶
Represents the state of a local git repository.
Tip
Use from_env() to instantiate this class.
- short_commit_message(max_length: int = 50) str | None[source]¶
The commit message, truncated to max_length characters.
- classmethod from_env(ref: str | None = None, branch: str | None = None) GitRepoState[source]¶
Instantiate this class from the root of a git repository.
- Raises:
GitError – If this method isn’t called from the root of a valid git repository.
UnpushedChangesError – If there are unpushed commits.
RemoteBranchNotFoundError – If the local branch is not tracking a remote branch.
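A short sketch of capturing the local repo state and handling the documented failure mode; the import paths (including the exceptions module) are assumptions:

```python
# Sketch: read the current git state from the repo root.
from gantry.api import GitRepoState      # import path is an assumption
from gantry.exceptions import GitError   # exceptions module is an assumption

try:
    repo = GitRepoState.from_env()
    print(repo.ref, repo.branch)
    print(repo.short_commit_message(max_length=40))
except GitError as err:
    # Raised when not called from the root of a valid git repository.
    print(f"cannot determine repo state: {err}")
```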
- class Callback(*args, type: str | None = None, **kwargs)[source]¶
Base class for gantry callbacks. Callbacks provide a way to hook into gantry’s launch loop to customize behavior on certain events.
- property git_repo: GitRepoState¶
The git repo state that can be accessed after attach() is called.
- property spec: BeakerExperimentSpec¶
The experiment spec that can be accessed after attach() is called.
- attach(*, beaker: Beaker, git_repo: GitRepoState, spec: BeakerExperimentSpec, workload: Workload)[source]¶
Runs when a callback is attached to the workload.
- on_log(job: Job, log_line: str, log_time: float)[source]¶
Runs when a new log event is received from the workload.
- on_no_new_logs(job: Job)[source]¶
Periodically runs when no new logs have been received from the workload recently.
- on_start_timeout(job: Job)[source]¶
Runs when the active job for the workload hits the configured start timeout before starting.
- on_timeout(job: Job)[source]¶
Runs when the active job for the workload hits the configured timeout before completing.
- on_inactive_timeout(job: Job)[source]¶
Runs when the active job for the workload hits the configured inactive timeout.
- on_inactive_soft_timeout(job: Job)[source]¶
Runs when the active job for the workload hits the configured inactive soft timeout.
- on_cancellation(job: Job | None)[source]¶
Runs when the active job for the workload is canceled, either directly by the user or because a timeout was reached.
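A minimal custom callback sketch, assuming Callback is importable from gantry.api; only the hooks you need are overridden, and instances are passed via the callbacks argument of Recipe or launch_experiment:

```python
from gantry.api import Callback  # import path is an assumption

class PrintProgress(Callback):
    """Hypothetical callback that echoes log activity to stdout."""

    def on_log(self, job, log_line, log_time):
        # Runs for every log line streamed from the workload.
        print(f"[{job.id}] {log_line}", end="")

    def on_no_new_logs(self, job):
        # Runs periodically when the workload has gone quiet.
        print(f"[{job.id}] still waiting for output...")
```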
- class SlackCallback(*, type: dataclasses.InitVar[str | None] = 'slack', webhook_url: str)¶
- type: dataclasses.InitVar[str | None] = 'slack'¶
- launch_experiment(args: Sequence[str], name: str | None = None, description: str | None = None, task_name: str = 'main', workspace: str | None = None, group_names: Sequence[str] | None = None, clusters: Sequence[str] | None = None, gpu_types: Sequence[str] | None = None, interconnect: Literal['ib', 'tcpxo'] | None = None, tags: Sequence[str] | None = None, hostnames: Sequence[str] | None = None, beaker_image: str | None = None, docker_image: str | None = None, cpus: float | None = None, gpus: int | None = None, memory: str | None = None, shared_memory: str | None = None, datasets: Sequence[str] | None = None, gh_token_secret: str = 'GITHUB_TOKEN', ref: str | None = None, branch: str | None = None, conda_file: PathLike | str | None = None, conda_env: str | None = None, python_manager: Literal['uv', 'conda'] | None = None, system_python: bool = False, uv_venv: str | None = None, uv_extras: Sequence[str] | None = None, uv_all_extras: bool | None = None, uv_torch_backend: str | None = None, env_vars: Sequence[str | tuple[str, str]] | None = None, env_secrets: Sequence[str | tuple[str, str]] | None = None, dataset_secrets: Sequence[str | tuple[str, str]] | None = None, mounts: Sequence[str | tuple[str, str]] | None = None, weka: Sequence[str | tuple[str, str]] | None = None, uploads: Sequence[str | tuple[str, str]] | None = None, timeout: int | None = None, task_timeout: str | None = None, start_timeout: int | None = None, inactive_timeout: int | None = None, inactive_soft_timeout: int | None = None, show_logs: bool | None = None, allow_dirty: bool = False, dry_run: bool = False, yes: bool | None = None, save_spec: PathLike | str | None = None, priority: str | None = None, install: str | None = None, no_python: bool = False, replicas: int | None = None, leader_selection: bool | None = None, host_networking: bool | None = None, propagate_failure: bool | None = None, propagate_preemption: bool | None = None, synchronized_start_timeout: str | None = None, budget: str | 
None = None, preemptible: bool | None = None, retries: int | None = None, results: str = '/results', runtime_dir: str = '/gantry-runtime', exec_method: Literal['exec', 'bash'] = 'exec', torchrun: bool = False, skip_tcpxo_setup: bool = False, skip_nccl_setup: bool = False, default_python_version: str = '3.10', pre_setup: str | None = None, post_setup: str | None = None, aws_config_secret: str | None = None, aws_credentials_secret: str | None = None, google_credentials_secret: str | None = None, callbacks: Sequence[Callback] | None = None, git_repo: GitRepoState | None = None, client: Beaker | None = None) Workload | None[source]¶
Launch an experiment on Beaker. Same as the gantry run command.
- Parameters:
cli_mode – Set to True if this function is being called from a CLI command. This mostly affects how certain prompts and messages are displayed.
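A sketch of the functional entry point, equivalent to running gantry run from the shell; the import path and the workspace/budget values are assumptions:

```python
# Sketch: launch an experiment without going through the CLI.
from gantry.api import launch_experiment  # import path is an assumption

workload = launch_experiment(
    ["python", "eval.py"],
    name="eval-run",
    workspace="ai2/my-workspace",  # placeholder workspace
    budget="ai2/my-budget",        # placeholder budget
    gpus=1,
    yes=True,  # skip interactive confirmation prompts
)
```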
- follow_workload(beaker: Beaker, workload: Workload, *, job: Job | None = None, task: Task | None = None, timeout: int | None = None, start_timeout: int | None = None, inactive_timeout: int | None = None, inactive_soft_timeout: int | None = None, tail: bool = False, show_logs: bool = True, auto_cancel: bool = False, callbacks: Sequence[Callback] | None = None) Job[source]¶
Follow a workload until completion while streaming logs to stdout.
- Parameters:
task – A specific task in the workload to follow. Defaults to the first task.
timeout – The number of seconds to wait for the workload to complete. Raises a timeout error if it doesn’t complete in time.
start_timeout – The number of seconds to wait for the workload to start running. Raises a timeout error if it doesn’t start in time.
inactive_timeout – The number of seconds to wait for new logs before timing out. Raises a timeout error if no new logs are produced in time.
inactive_soft_timeout – The number of seconds to wait for new logs before timing out. Issues a warning notification if no new logs are produced in time.
tail – Start tailing the logs if a job is already running. Otherwise shows all logs.
show_logs – Set to False to avoid streaming the logs.
auto_cancel – Set to True to automatically cancel the workload on timeout or SIGTERM.
- Returns:
The finalized BeakerJob from the task being followed.
- Raises:
BeakerJobTimeoutError – If timeout is set to a positive number and the workload doesn’t complete in time.
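A sketch of re-attaching to an existing workload and following it to completion; the import paths, the beaker-py lookup call, and the workload ID are all assumptions/placeholders:

```python
# Sketch: follow a previously launched workload's logs.
from beaker import Beaker                 # beaker-py client
from gantry.api import follow_workload    # import path is an assumption

beaker = Beaker.from_env()
workload = beaker.workload.get("01ABC")   # placeholder workload ID; lookup call assumed

job = follow_workload(
    beaker,
    workload,
    tail=True,         # only tail new log lines if a job is already running
    auto_cancel=True,  # cancel the workload on timeout or SIGTERM
)
```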
- update_workload_description(description: str, strategy: Literal['append', 'prepend', 'replace'] = 'replace', beaker_token: str | None = None, client: Beaker | None = None) str[source]¶
Update the description of the Gantry workload that this process is running in.
- Parameters:
description – The description to set or add, depending on the strategy.
strategy – One of “append”, “prepend”, or “replace” to indicate how the new description should be combined with the original description. Defaults to “replace”.
beaker_token – An optional Beaker API token to use. If not provided, the BEAKER_TOKEN environment variable will be used if set, or a Beaker config file. Alternatively you can provide an existing Beaker client via the client parameter.
client – An optional existing Beaker client to use. If not provided, a new client will be created using the provided beaker_token or environment/config.
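A sketch of calling this from inside a running Gantry job to annotate its own workload; the import path is an assumption:

```python
# Sketch: append a progress note to the current workload's description.
from gantry.api import update_workload_description  # import path is an assumption

new_description = update_workload_description(
    "epoch 3/10 complete",
    strategy="append",  # keep the original description, add to the end
)
```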