merlin.datasets.metadata module
- class merlin.datasets.metadata.Normalization(method, range, per_feature=True)
Bases: object
Dataset-level normalization metadata.
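- Parameters:
method – Normalization method.
range – Normalization range.
per_feature (bool) – Whether normalization is applied per feature. Defaults to True.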
- class merlin.datasets.metadata.FeatureNormalization(original_unit=None, scale_factor=None, offset=None)
Bases: object
Feature-level normalization metadata.
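- Parameters:
original_unit – Original unit of the feature. Defaults to None.
scale_factor – Scale factor. Defaults to None.
offset – Offset. Defaults to None.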
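Both normalization classes are documented only through their constructor signatures. The sketch below uses hypothetical stand-in dataclasses that copy those field names and defaults (the field types, and the example values "minmax", "K" and 0.01, are assumptions) to show what the defaults look like in practice. It does not import merlin and reproduces nothing beyond the signatures.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


# Hypothetical stand-ins: field names and defaults follow the documented
# signatures; the type annotations are assumptions.
@dataclass
class Normalization:
    method: str                    # assumed to be a string label
    range: Tuple[float, float]     # assumed (low, high) pair
    per_feature: bool = True       # documented default


@dataclass
class FeatureNormalization:
    original_unit: Optional[str] = None
    scale_factor: Optional[float] = None
    offset: Optional[float] = None


dataset_norm = Normalization(method="minmax", range=(0.0, 1.0))
feature_norm = FeatureNormalization(original_unit="K", scale_factor=0.01)

print(dataset_norm)   # per_feature falls back to True
print(feature_norm)   # offset falls back to None
```

Every FeatureNormalization field is optional, so an instance can record just the pieces of information that are actually known for a feature.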
- class merlin.datasets.metadata.Feature(name, description, type, value_range=None, unit=None, stats=None, normalization=None)
Bases: object
Description of a dataset feature.
- Parameters:
name (str) – Feature name.
description (str) – Human-readable feature description.
type (str) – Feature dtype or semantic type.
value_range (tuple | None) – Expected value range.
unit (str | None) – Measurement unit.
stats (dict[str, float] | None) – Optional feature statistics.
normalization (FeatureNormalization | None) – Optional feature-level normalization metadata.
- normalization: Optional[FeatureNormalization] = None
- to_text()
Render feature metadata as human-readable text.
- Returns:
Human-readable feature description.
- Return type:
str
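The documentation does not specify what to_text() output looks like. The sketch below is a hypothetical stand-in for Feature with an invented text layout (a "name (type): description" header followed by whichever optional details are set), meant only to illustrate the kind of rendering the method describes. The feature values are made up, and the normalization field is typed loosely in place of FeatureNormalization.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Feature:
    # Hypothetical stand-in mirroring the documented constructor signature.
    name: str
    description: str
    type: str
    value_range: Optional[Tuple[float, float]] = None
    unit: Optional[str] = None
    stats: Optional[Dict[str, float]] = None
    normalization: Optional[object] = None  # FeatureNormalization in the documented class

    def to_text(self) -> str:
        # Invented format: a header line, then one indented line per
        # optional attribute that is actually set.
        lines = [f"{self.name} ({self.type}): {self.description}"]
        if self.unit is not None:
            lines.append(f"  unit: {self.unit}")
        if self.value_range is not None:
            low, high = self.value_range
            lines.append(f"  range: [{low}, {high}]")
        if self.stats:
            stats = ", ".join(f"{k}={v:g}" for k, v in sorted(self.stats.items()))
            lines.append(f"  stats: {stats}")
        return "\n".join(lines)


temp = Feature(
    name="temperature",
    description="Air temperature at 2 m",
    type="float",
    value_range=(-50.0, 60.0),
    unit="degC",
    stats={"mean": 12.5, "std": 8.0},
)
print(temp.to_text())
```

Skipping unset fields keeps the rendering short for sparsely documented features; a feature with only the three required fields renders as a single line.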
- class merlin.datasets.metadata.DatasetMetadata(name, description, features, num_instances, subset=None, num_features=None, normalization=None, task_type=<factory>, num_classes=None, characteristics=<factory>, homepage=None, license=None, citation=None, creators=<factory>, year=None, feature_relationships=None)
Bases: object
Structured metadata describing a dataset.
- Parameters:
name (str) – Dataset name.
description (str) – Dataset description.
features (list[Feature]) – Descriptions of dataset features.
num_instances (int) – Number of instances in the dataset or subset.
subset (str | None) – Dataset split name.
num_features (int | None) – Number of input features.
normalization (Normalization | None) – Dataset-level normalization metadata.
num_classes (int | None) – Number of target classes, when relevant.
characteristics (list[str]) – High-level dataset characteristics.
homepage (str | None) – Dataset homepage.
license (str | None) – Dataset license.
citation (str | None) – Citation text.
year (int | None) – Creation or publication year.
feature_relationships (str | None) – Optional description of relationships between features.
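The `task_type=<factory>`, `characteristics=<factory>` and `creators=<factory>` defaults in the signature are how Python renders a dataclass field declared with `field(default_factory=...)`: every instance gets a freshly built default instead of sharing one mutable object. The stand-in below assumes DatasetMetadata is such a dataclass and that all three factories build empty lists. Only `characteristics` is documented as `list[str]`, so the types of `task_type` and `creators` are guesses, and the feature entries are placeholder strings rather than Feature objects.

```python
import inspect
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class DatasetMetadata:
    # Hypothetical stand-in following the documented parameter order.
    name: str
    description: str
    features: List[Any]                                   # list[Feature] in the documented class
    num_instances: int
    subset: Optional[str] = None
    num_features: Optional[int] = None
    normalization: Optional[Any] = None
    task_type: List[str] = field(default_factory=list)    # factory type assumed
    num_classes: Optional[int] = None
    characteristics: List[str] = field(default_factory=list)
    homepage: Optional[str] = None
    license: Optional[str] = None
    citation: Optional[str] = None
    creators: List[str] = field(default_factory=list)     # factory type assumed
    year: Optional[int] = None
    feature_relationships: Optional[str] = None


train = DatasetMetadata("toy", "A toy dataset", ["x1", "x2"], num_instances=100, subset="train")
val = DatasetMetadata("toy", "A toy dataset", ["x1", "x2"], num_instances=20, subset="val")

# Mutating one instance's default list leaves the other untouched.
train.characteristics.append("tabular")
print(train.characteristics)
print(val.characteristics)

# The generated __init__ shows "<factory>" for default_factory fields.
print("<factory>" in str(inspect.signature(DatasetMetadata)))
```

A plain `characteristics: List[str] = []` default would be rejected by `dataclasses` precisely because it would be shared across instances; `default_factory` is the standard way around that.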
- classmethod from_dict(data)
Build dataset metadata from a dictionary payload.
- Parameters:
data (dict) – Dictionary payload describing the dataset.
- Returns:
Structured dataset metadata instance.
- Return type:
DatasetMetadata
- normalization: Optional[Normalization] = None
- to_dict()
Convert the metadata to a dictionary.
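The dictionary layout accepted by from_dict() and produced by to_dict() is not described here. The sketch below assumes the simplest convention: keys match the constructor parameter names, nested features and normalization arrive as plain dictionaries, and to_dict() behaves like `dataclasses.asdict()`. Under those assumptions a round trip reproduces an equal object. The stand-in classes are hypothetical and trimmed to a few of the documented fields.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Normalization:
    method: str
    range: Any
    per_feature: bool = True


@dataclass
class Feature:
    name: str
    description: str
    type: str
    unit: Optional[str] = None


@dataclass
class DatasetMetadata:
    # Trimmed hypothetical stand-in: only some documented fields are kept.
    name: str
    description: str
    features: List[Feature]
    num_instances: int
    normalization: Optional[Normalization] = None
    characteristics: List[str] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "DatasetMetadata":
        # Assumed layout: nested objects are plain dicts keyed by field name.
        data = dict(data)  # shallow copy so the caller's payload is not mutated
        data["features"] = [Feature(**f) for f in data["features"]]
        if data.get("normalization") is not None:
            data["normalization"] = Normalization(**data["normalization"])
        return cls(**data)

    def to_dict(self) -> Dict[str, Any]:
        # asdict() recurses into nested dataclasses and copies containers.
        return asdict(self)


payload = {
    "name": "toy",
    "description": "A toy dataset",
    "features": [{"name": "x1", "description": "First input", "type": "float"}],
    "num_instances": 100,
    "normalization": {"method": "minmax", "range": (0.0, 1.0)},
}

meta = DatasetMetadata.from_dict(payload)
print(type(meta.features[0]).__name__)
print(meta.to_dict()["normalization"])
print(DatasetMetadata.from_dict(meta.to_dict()) == meta)
```

Note that to_dict() writes out every field, including defaults such as `per_feature` and `unit`, so the serialized form is more explicit than the payload it was built from.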