Datasets

class DatasetClient(beaker: Beaker)[source]

Methods for interacting with Beaker Datasets. Accessed via the Beaker.dataset property.

Warning

Do not instantiate this class directly! The Beaker client will create one automatically which you can access through the corresponding property.

get(dataset: str) Dataset[source]

Get a dataset by name or ID.

Examples:

>>> with Beaker.from_env() as beaker:
...     dataset = beaker.dataset.get(dataset_name)
Returns:

A BeakerDataset.

Raises:

BeakerDatasetNotFound – If the dataset doesn’t exist.

create(name: str, *sources: PathLike | str, target: PathLike | str | None = None, workspace: Workspace | None = None, description: str | None = None, force: bool = False, max_workers: int | None = None, commit: bool = True, strip_paths: bool = False) Dataset[source]

Create a dataset from local source files.

Parameters:
  • name – The name to assign to the new dataset.

  • sources – Local source files or directories to upload to the dataset.

  • target – If specified, all source files/directories will be uploaded under a directory of this name.

  • workspace – The workspace to upload the dataset to. If not specified your default workspace is used.

  • description – Text description for the dataset.

  • force – If True and a dataset by the given name already exists, it will be overwritten.

  • max_workers – The maximum number of thread pool workers to use to upload files concurrently.

  • commit – Whether to commit the dataset after successfully uploading source files.

  • strip_paths

    If True, all source files and directories will be uploaded under their name, not their path. E.g. the file “docs/source/index.rst” would be uploaded as just “index.rst”, instead of “docs/source/index.rst”.

    Note

    This only applies to source paths that are children of the current working directory. If a source path is outside of the current working directory, it will always be uploaded under its name only.

Returns:

A new beaker.types.BeakerDataset object.

Raises:

BeakerDatasetConflict – If a dataset with the given name already exists.
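Example (a sketch; the dataset name, source path, and description are illustrative, and a configured Beaker environment is assumed):

>>> with Beaker.from_env() as beaker:
...     dataset = beaker.dataset.create(
...         "my-dataset",
...         "results/metrics.json",
...         description="Evaluation metrics",
...     )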

commit(dataset: Dataset) Dataset[source]

Commit a dataset.

Returns:

The updated BeakerDataset object.

upload(dataset: Dataset, source: PathLike | str | bytes, target: PathLike | str) int[source]

Upload a file to a dataset.

Parameters:
  • dataset – The dataset to upload to (must be uncommitted).

  • source – Path to the local source file or the contents as bytes.

  • target – The path within the dataset to upload the file to.

Returns:

The number of bytes uploaded.

Raises:

BeakerDatasetWriteError – If the dataset is already committed.
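For example, a dataset created with commit=False stays uncommitted, so it can receive uploads before being committed (a sketch; the dataset name, file contents, and target path are illustrative):

>>> with Beaker.from_env() as beaker:
...     dataset = beaker.dataset.create("my-dataset", commit=False)
...     beaker.dataset.upload(dataset, b"Hello, World!", "greeting.txt")
...     dataset = beaker.dataset.commit(dataset)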

stream_file(dataset: Dataset, file_path: str, *, offset: int = 0, length: int = -1, chunk_size: int | None = None, validate_checksum: bool = True) Generator[bytes, None, None][source]

Stream download the bytes content of a file from a dataset.
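Example, writing the streamed chunks to a local file (a sketch; dataset_name, the in-dataset file path, and the local output path are illustrative):

>>> with Beaker.from_env() as beaker:
...     dataset = beaker.dataset.get(dataset_name)
...     with open("data.bin", "wb") as f:
...         for chunk in beaker.dataset.stream_file(dataset, "data.bin"):
...             f.write(chunk)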

list_files(dataset: Dataset, *, prefix: str | None = None) Iterable[DatasetFile][source]

List files in a dataset.

Returns:

An iterator over BeakerDatasetFile protobuf objects.
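Example (a sketch; dataset_name and the prefix are illustrative, and the path field on the returned BeakerDatasetFile objects is an assumption):

>>> with Beaker.from_env() as beaker:
...     dataset = beaker.dataset.get(dataset_name)
...     for file in beaker.dataset.list_files(dataset, prefix="logs/"):
...         print(file.path)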

get_file_info(dataset: Dataset, file_path: str) DatasetFile[source]

Get metadata about a file in a dataset.

Returns:

A BeakerDatasetFile protobuf object.

update(dataset: Dataset, *, description: str | None = None) Dataset[source]

Update fields of a dataset.

Returns:

The updated BeakerDataset object.

delete(*datasets: Dataset)[source]

Delete datasets.

list(*, org: Organization | None = None, author: User | None = None, workspace: Workspace | None = None, created_before: datetime | None = None, created_after: datetime | None = None, results: bool | None = None, committed: bool | None = None, name_or_description: str | None = None, sort_order: BeakerSortOrder | None = None, sort_field: Literal['created', 'name'] = 'name', limit: int | None = None) Iterable[Dataset][source]

List datasets.

Returns:

An iterator over BeakerDataset protobuf objects.
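Example (a sketch; the filter values are illustrative, and the name field on the returned BeakerDataset objects is an assumption):

>>> with Beaker.from_env() as beaker:
...     for dataset in beaker.dataset.list(
...         committed=True,
...         sort_field="created",
...         limit=10,
...     ):
...         print(dataset.name)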

url(dataset: Dataset) str[source]

Get the URL to the dataset on the Beaker dashboard.