Datasets¶
- class DatasetClient(beaker: Beaker)[source]¶
Methods for interacting with Beaker Datasets. Accessed via the Beaker.dataset property.

Warning
Do not instantiate this class directly! The Beaker client will create one automatically, which you can access through the corresponding property.

- get(dataset: str) Dataset [source]¶
Get a dataset by name or ID.
- Examples:
>>> with Beaker.from_env() as beaker:
...     dataset = beaker.dataset.get(dataset_name)
- Returns:
A BeakerDataset protobuf object.
- Raises:
BeakerDatasetNotFound – If the dataset doesn’t exist.
- create(name: str, *sources: PathLike | str, target: PathLike | str | None = None, workspace: Workspace | None = None, description: str | None = None, force: bool = False, max_workers: int | None = None, commit: bool = True, strip_paths: bool = False) Dataset [source]¶
Create a dataset from local source files.
- Parameters:
name – The name to assign to the new dataset.
sources – Local source files or directories to upload to the dataset.
target – If specified, all source files/directories will be uploaded under a directory of this name.
workspace – The workspace to upload the dataset to. If not specified your default workspace is used.
description – Text description for the dataset.
force – If True and a dataset by the given name already exists, it will be overwritten.
max_workers – The maximum number of thread pool workers to use to upload files concurrently.
commit – Whether to commit the dataset after successfully uploading source files.
strip_paths – If True, all source files and directories will be uploaded under their name, not their path. E.g. the file “docs/source/index.rst” would be uploaded as just “index.rst”, instead of “docs/source/index.rst”.
Note
This only applies to source paths that are children of the current working directory. If a source path is outside of the current working directory, it will always be uploaded under its name only.
- Returns:
A new beaker.types.BeakerDataset object.
- Raises:
BeakerDatasetConflict – If a dataset with the given name already exists.
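- Example (a minimal sketch; the dataset name and source path are placeholders):
>>> with Beaker.from_env() as beaker:
...     dataset = beaker.dataset.create(
...         "my-dataset",            # placeholder name
...         "results/metrics.json",  # placeholder local source file
...         description="Evaluation metrics",
...         force=True,              # overwrite if "my-dataset" already exists
...     )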
- commit(dataset: Dataset) Dataset [source]¶
Commit a dataset.
- Returns:
The updated BeakerDataset object.
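- Example (a sketch of deferring the commit at creation time; the dataset name is a placeholder):
>>> with Beaker.from_env() as beaker:
...     dataset = beaker.dataset.create("my-dataset", commit=False)
...     dataset = beaker.dataset.commit(dataset)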
- upload(dataset: Dataset, source: PathLike | str | bytes, target: PathLike | str) int [source]¶
Upload a file to a dataset.
- Parameters:
dataset – The dataset to upload to (must be uncommitted).
source – Path to the local source file or the contents as bytes.
target – The path within the dataset to upload the file to.
- Returns:
The number of bytes uploaded.
- Raises:
BeakerDatasetWriteError – If the dataset is already committed.
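- Example (a sketch of uploading raw bytes to an uncommitted dataset; the dataset and file names are placeholders):
>>> with Beaker.from_env() as beaker:
...     dataset = beaker.dataset.create("my-dataset", commit=False)
...     num_bytes = beaker.dataset.upload(dataset, b"hello!", "greeting.txt")
...     beaker.dataset.commit(dataset)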
- stream_file(dataset: Dataset, file_path: str, *, offset: int = 0, length: int = -1, chunk_size: int | None = None, validate_checksum: bool = True) Generator[bytes, None, None] [source]¶
Stream download the bytes content of a file from a dataset.
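- Example (a sketch of streaming a file to local disk; the dataset and file path are placeholders):
>>> with Beaker.from_env() as beaker:
...     dataset = beaker.dataset.get(dataset_name)
...     with open("metrics.json", "wb") as f:
...         for chunk in beaker.dataset.stream_file(dataset, "metrics.json"):
...             f.write(chunk)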
- list_files(dataset: Dataset, *, prefix: str | None = None) Iterable[DatasetFile] [source]¶
List files in a dataset.
- Returns:
An iterator over BeakerDatasetFile protobuf objects.
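- Example (a sketch of iterating over files under a prefix; the prefix is a placeholder and the path attribute on the returned protobuf object is assumed here):
>>> with Beaker.from_env() as beaker:
...     dataset = beaker.dataset.get(dataset_name)
...     for file in beaker.dataset.list_files(dataset, prefix="logs/"):
...         print(file.path)  # 'path' field assumed on BeakerDatasetFile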
- get_file_info(dataset: Dataset, file_path: str) DatasetFile [source]¶
Get metadata about a file in a dataset.
- Returns:
A BeakerDatasetFile protobuf object.
- update(dataset: Dataset, *, description: str | None = None) Dataset [source]¶
Update fields of a dataset.
- Returns:
The updated BeakerDataset object.
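- Example (a sketch of changing a dataset’s description; the new text is a placeholder):
>>> with Beaker.from_env() as beaker:
...     dataset = beaker.dataset.get(dataset_name)
...     dataset = beaker.dataset.update(dataset, description="New description")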
- list(*, org: Organization | None = None, author: User | None = None, workspace: Workspace | None = None, created_before: datetime | None = None, created_after: datetime | None = None, results: bool | None = None, committed: bool | None = None, name_or_description: str | None = None, sort_order: BeakerSortOrder | None = None, sort_field: Literal['created', 'name'] = 'name', limit: int | None = None) Iterable[Dataset] [source]¶
List datasets.
- Returns:
An iterator over BeakerDataset protobuf objects.
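- Example (a sketch of a filtered listing; the filter values are placeholders, and the name attribute on the returned protobuf object is assumed here):
>>> with Beaker.from_env() as beaker:
...     for dataset in beaker.dataset.list(
...         committed=True,
...         sort_field="created",
...         limit=10,
...     ):
...         print(dataset.name)  # 'name' field assumed on BeakerDataset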