Datasets

class DatasetClient(beaker: Beaker)[source]

Methods for interacting with Beaker Datasets. Accessed via the Beaker.dataset property.

Warning

Do not instantiate this class directly! The Beaker client will create one automatically which you can access through the corresponding property.

get(dataset: str) Dataset[source]

Get a dataset by name or ID.

Examples:

>>> with Beaker.from_env() as beaker:
...     dataset = beaker.dataset.get(dataset_name)
Returns:

A BeakerDataset.

Raises:

BeakerDatasetNotFound – If the dataset doesn’t exist.

create(name: str, *sources: PathLike | str, target: PathLike | str | None = None, workspace: Workspace | None = None, description: str | None = None, force: bool = False, max_workers: int | None = None, commit: bool = True, strip_paths: bool = False) Dataset[source]

Create a dataset from local source files.

Parameters:
  • name – The name to assign to the new dataset.

  • sources – Local source files or directories to upload to the dataset.

  • target – If specified, all source files/directories will be uploaded under a directory of this name.

  • workspace – The workspace to upload the dataset to. If not specified your default workspace is used.

  • description – Text description for the dataset.

  • force – If True and a dataset by the given name already exists, it will be overwritten.

  • max_workers – The maximum number of thread pool workers to use to upload files concurrently.

  • commit – Whether to commit the dataset after successfully uploading source files.

  • strip_paths

    If True, all source files and directories will be uploaded under their name, not their path. E.g. the file “docs/source/index.rst” would be uploaded as just “index.rst”, instead of “docs/source/index.rst”.

    Note

    This only applies to source paths that are children of the current working directory. If a source path is outside of the current working directory, it will always be uploaded under its name only.

Returns:

A new beaker.types.BeakerDataset object.

Raises:

BeakerDatasetConflict – If a dataset with the given name already exists.
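Example (a sketch; the dataset name, source path, and description are illustrative, and a configured Beaker environment is assumed):

>>> with Beaker.from_env() as beaker:
...     dataset = beaker.dataset.create(
...         "my-dataset",
...         "results/metrics.json",
...         description="Evaluation metrics",
...     )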

commit(dataset: Dataset) Dataset[source]

Commit a dataset.

Returns:

The updated BeakerDataset object.

upload(dataset: Dataset, source: PathLike | str | bytes, target: PathLike | str) int[source]

Upload a file to a dataset.

Parameters:
  • dataset – The dataset to upload to (must be uncommitted).

  • source – Path to the local source file or the contents as bytes.

  • target – The path within the dataset to upload the file to.

Returns:

The number of bytes uploaded.

Raises:

BeakerDatasetWriteError – If the dataset is already committed.
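For example, a dataset created with commit=False stays uncommitted, so it can receive uploads before being committed (a sketch; the dataset name, file contents, and target path are illustrative):

>>> with Beaker.from_env() as beaker:
...     dataset = beaker.dataset.create("my-dataset", commit=False)
...     beaker.dataset.upload(dataset, b"Hello, World!", "greeting.txt")
...     dataset = beaker.dataset.commit(dataset)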

stream_file(dataset: Dataset, file_path: str, *, offset: int = 0, length: int = -1, chunk_size: int | None = None, validate_checksum: bool = True) Generator[bytes, None, None][source]

Stream download the bytes content of a file from a dataset.
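Example, writing the streamed chunks to a local file (a sketch; dataset_name, the in-dataset file path, and the local output path are illustrative):

>>> with Beaker.from_env() as beaker:
...     dataset = beaker.dataset.get(dataset_name)
...     with open("data.bin", "wb") as f:
...         for chunk in beaker.dataset.stream_file(dataset, "data.bin"):
...             f.write(chunk)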

list_files(dataset: Dataset, *, prefix: str | None = None) Iterable[DatasetFile][source]

List files in a dataset.

Returns:

An iterator over BeakerDatasetFile protobuf objects.
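Example (a sketch; dataset_name and the prefix are illustrative, and the path field on the returned BeakerDatasetFile objects is an assumption):

>>> with Beaker.from_env() as beaker:
...     dataset = beaker.dataset.get(dataset_name)
...     for file in beaker.dataset.list_files(dataset, prefix="logs/"):
...         print(file.path)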

get_file_info(dataset: Dataset, file_path: str) DatasetFile[source]

Get metadata about a file in a dataset.

Returns:

A BeakerDatasetFile protobuf object.

update(dataset: Dataset, *, description: str | None = None) Dataset[source]

Update fields of a dataset.

Returns:

The updated BeakerDataset object.

delete(*datasets: Dataset)[source]

Delete datasets.

list(*, org: Organization | None = None, author: User | None = None, workspace: Workspace | None = None, created_before: datetime | None = None, created_after: datetime | None = None, results: bool | None = None, committed: bool | None = None, name_or_description: str | None = None, sort_order: BeakerSortOrder | None = None, sort_field: Literal['created', 'name'] = 'name', limit: int | None = None) Iterable[Dataset][source]

List datasets.

Returns:

An iterator over BeakerDataset protobuf objects.
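Example (a sketch; the filter values are illustrative, and the name field on the returned BeakerDataset objects is an assumption):

>>> with Beaker.from_env() as beaker:
...     for dataset in beaker.dataset.list(
...         committed=True,
...         sort_field="created",
...         limit=10,
...     ):
...         print(dataset.name)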

url(dataset: Dataset) str[source]

Get the URL to the dataset on the Beaker dashboard.