Datasets
- class DatasetClient(beaker: Beaker)
Methods for interacting with Beaker Datasets. Accessed via the Beaker.dataset property.

Warning
Do not instantiate this class directly! The Beaker client will create one automatically, which you can access through the corresponding property.

- get(dataset: str) → Dataset
- Examples:
>>> with Beaker.from_env() as beaker:
...     dataset = beaker.dataset.get(dataset_name)
- Returns:
A BeakerDataset protobuf object.
- Raises:
BeakerDatasetNotFound – If the dataset doesn’t exist.
- create(name: str, *sources: PathLike | str, target: PathLike | str | None = None, workspace: Workspace | None = None, description: str | None = None, force: bool = False, max_workers: int | None = None, commit: bool = True, strip_paths: bool = False) → Dataset
Create a dataset from local source files.
- Parameters:
name – The name to assign to the new dataset.
sources – Local source files or directories to upload to the dataset.
target – If specified, all source files/directories will be uploaded under a directory of this name.
workspace – The workspace to upload the dataset to. If not specified your default workspace is used.
description – Text description for the dataset.
force – If True and a dataset by the given name already exists, it will be overwritten.
max_workers – The maximum number of thread pool workers to use to upload files concurrently.
commit – Whether to commit the dataset after successfully uploading source files.
strip_paths – If True, all source files and directories will be uploaded under their name, not their path. E.g. the file “docs/source/index.rst” would be uploaded as just “index.rst”, instead of “docs/source/index.rst”.
Note
This only applies to source paths that are children of the current working directory. If a source path is outside of the current working directory, it will always be uploaded under its name only.
- Returns:
A new beaker.types.BeakerDataset object.
- Raises:
BeakerDatasetConflict – If a dataset with the given name already exists.
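- Examples:
(Illustrative sketch; the dataset name and source path are placeholders.)
>>> with Beaker.from_env() as beaker:
...     dataset = beaker.dataset.create("my-dataset", "data/train.jsonl", description="Training split")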
- commit(dataset: Dataset) → Dataset
Commit a dataset.
- Returns:
The updated BeakerDataset object.
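- Examples:
(Illustrative sketch; assumes the dataset was created with commit=False and is still uncommitted.)
>>> with Beaker.from_env() as beaker:
...     dataset = beaker.dataset.create("my-dataset", "data/train.jsonl", commit=False)
...     dataset = beaker.dataset.commit(dataset)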
- upload(dataset: Dataset, source: PathLike | str | bytes, target: PathLike | str) → int
Upload a file to a dataset.
- Parameters:
dataset – The dataset to upload to (must be uncommitted).
source – Path to the local source file or the contents as bytes.
target – The path within the dataset to upload the file to.
- Returns:
The number of bytes uploaded.
- Raises:
BeakerDatasetWriteError – If the dataset is already committed.
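- Examples:
(Illustrative sketch; the dataset must still be uncommitted, and the file paths are placeholders.)
>>> with Beaker.from_env() as beaker:
...     dataset = beaker.dataset.create("my-dataset", "data/train.jsonl", commit=False)
...     num_bytes = beaker.dataset.upload(dataset, "data/val.jsonl", "val.jsonl")
...     dataset = beaker.dataset.commit(dataset)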
- stream_file(dataset: Dataset, file_path: str, *, offset: int = 0, length: int = -1, chunk_size: int | None = None, validate_checksum: bool = True) → Generator[bytes, None, None]
Stream download the bytes content of a file from a dataset.
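- Examples:
(Illustrative sketch; dataset_name and the file path “metrics.json” are placeholders.)
>>> with Beaker.from_env() as beaker:
...     dataset = beaker.dataset.get(dataset_name)
...     with open("metrics.json", "wb") as f:
...         for chunk in beaker.dataset.stream_file(dataset, "metrics.json"):
...             f.write(chunk)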
- list_files(dataset: Dataset, *, prefix: str | None = None) → Iterable[DatasetFile]
List files in a dataset.
- Returns:
An iterator over BeakerDatasetFile protobuf objects.
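- Examples:
(Illustrative sketch; dataset_name and the “logs/” prefix are placeholders.)
>>> with Beaker.from_env() as beaker:
...     dataset = beaker.dataset.get(dataset_name)
...     files = list(beaker.dataset.list_files(dataset, prefix="logs/"))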
- get_file_info(dataset: Dataset, file_path: str) → DatasetFile
- Returns:
A BeakerDatasetFile protobuf object.
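- Examples:
(Illustrative sketch; dataset_name and “metrics.json” are placeholders.)
>>> with Beaker.from_env() as beaker:
...     dataset = beaker.dataset.get(dataset_name)
...     file_info = beaker.dataset.get_file_info(dataset, "metrics.json")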
- update(dataset: Dataset, *, description: str | None = None) → Dataset
Update fields of a dataset.
- Returns:
The updated BeakerDataset object.
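- Examples:
(Illustrative sketch; the description text is a placeholder.)
>>> with Beaker.from_env() as beaker:
...     dataset = beaker.dataset.get(dataset_name)
...     dataset = beaker.dataset.update(dataset, description="Cleaned-up training data")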
- list(*, org: Organization | None = None, author: User | None = None, workspace: Workspace | None = None, created_before: datetime | None = None, created_after: datetime | None = None, results: bool | None = None, committed: bool | None = None, name_or_description: str | None = None, sort_order: BeakerSortOrder | None = None, sort_field: Literal['created', 'name'] = 'name', limit: int | None = None) → Iterable[Dataset]
List datasets.
- Returns:
An iterator over BeakerDataset protobuf objects.
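- Examples:
(Illustrative sketch; the filter values are placeholders.)
>>> with Beaker.from_env() as beaker:
...     datasets = list(beaker.dataset.list(committed=True, limit=10))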