merlin.datasets.metadata module

class merlin.datasets.metadata.Normalization(method, range, per_feature=True)

Bases: object

Dataset-level normalization metadata.

Parameters:
  • method (str) – Name of the normalization method.

  • range (tuple) – Target value range after normalization.

  • per_feature (bool) – Whether normalization is applied independently per feature.

method: str
per_feature: bool = True
range: tuple
to_text()

Render normalization metadata as human-readable text.

Returns:

Human-readable summary of the normalization settings.

Return type:

str
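Example (sketch). The Normalization fields are documented above but the implementation is not shown, so this block uses a hypothetical stand-in dataclass with the same fields and defaults. The "min-max" method name and the exact text returned by to_text() are assumptions, not the library's actual output.

```python
from dataclasses import dataclass


@dataclass
class Normalization:
    """Hypothetical stand-in mirroring the documented fields; not merlin's class."""

    method: str
    range: tuple
    per_feature: bool = True

    def to_text(self) -> str:
        # Hypothetical rendering; the real to_text() format is not documented here.
        low, high = self.range
        scope = "per feature" if self.per_feature else "across all features"
        return f"Normalization: {self.method} to [{low}, {high}], applied {scope}."


norm = Normalization(method="min-max", range=(0.0, 1.0))
print(norm.to_text())  # Normalization: min-max to [0.0, 1.0], applied per feature.
```

Because per_feature defaults to True, it only needs to be passed when a single normalization is shared across all features.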

class merlin.datasets.metadata.FeatureNormalization(original_unit=None, scale_factor=None, offset=None)

Bases: object

Feature-level normalization metadata.

Parameters:
  • original_unit (str | None) – Original measurement unit before normalization.

  • scale_factor (float | None) – Scaling factor applied during normalization.

  • offset (float | None) – Offset applied during normalization.

offset: Optional[float] = None
original_unit: Optional[str] = None
scale_factor: Optional[float] = None
to_text()

Render feature normalization metadata as text.

Returns:

Human-readable normalization summary.

Return type:

str
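Example (sketch). All three FeatureNormalization fields default to None, so an instance can record only what is known about a feature. The stand-in below mirrors the documented fields; its to_text() layout (listing only the fields that are set) is a hypothetical choice, and the "kPa" unit is illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeatureNormalization:
    """Hypothetical stand-in mirroring the documented fields; all default to None."""

    original_unit: Optional[str] = None
    scale_factor: Optional[float] = None
    offset: Optional[float] = None

    def to_text(self) -> str:
        # Hypothetical format: include only the fields that were provided.
        parts = []
        if self.original_unit is not None:
            parts.append(f"original unit: {self.original_unit}")
        if self.scale_factor is not None:
            parts.append(f"scale factor: {self.scale_factor}")
        if self.offset is not None:
            parts.append(f"offset: {self.offset}")
        return "; ".join(parts) if parts else "no feature-level normalization"


fn = FeatureNormalization(original_unit="kPa", scale_factor=0.001)
print(fn.to_text())  # original unit: kPa; scale factor: 0.001
```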

class merlin.datasets.metadata.Feature(name, description, type, value_range=None, unit=None, stats=None, normalization=None)

Bases: object

Description of a dataset feature.

Parameters:
  • name (str) – Feature name.

  • description (str) – Human-readable feature description.

  • type (str) – Feature dtype or semantic type.

  • value_range (tuple | None) – Expected value range.

  • unit (str | None) – Measurement unit.

  • stats (dict[str, float] | None) – Optional feature statistics.

  • normalization (FeatureNormalization | None) – Optional feature-level normalization metadata.

description: str
name: str
normalization: Optional[FeatureNormalization] = None
stats: Optional[dict[str, float]] = None
to_text()

Render feature metadata as human-readable text.

Returns:

Human-readable feature description.

Return type:

str

type: str
unit: Optional[str] = None
value_range: Optional[tuple] = None
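Example (sketch). A Feature can carry statistics and a nested FeatureNormalization. The block below re-declares both classes as hypothetical stand-ins with the documented fields so that it runs on its own; the feature name, stats keys, and to_text() layout are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeatureNormalization:
    original_unit: Optional[str] = None
    scale_factor: Optional[float] = None
    offset: Optional[float] = None


@dataclass
class Feature:
    """Hypothetical stand-in mirroring the documented Feature fields."""

    name: str
    description: str
    type: str
    value_range: Optional[tuple] = None
    unit: Optional[str] = None
    stats: Optional[dict[str, float]] = None
    normalization: Optional[FeatureNormalization] = None

    def to_text(self) -> str:
        # Hypothetical layout; optional fields are omitted when unset.
        lines = [f"{self.name} ({self.type}): {self.description}"]
        if self.unit is not None:
            lines.append(f"  unit: {self.unit}")
        if self.value_range is not None:
            lines.append(f"  range: {self.value_range}")
        if self.stats:
            stats = ", ".join(f"{k}={v}" for k, v in self.stats.items())
            lines.append(f"  stats: {stats}")
        return "\n".join(lines)


temp = Feature(
    name="temperature",
    description="Ambient air temperature",
    type="float32",
    value_range=(-40.0, 60.0),
    unit="degC",
    stats={"mean": 12.5, "std": 8.1},
    normalization=FeatureNormalization(original_unit="degC", scale_factor=0.01),
)
print(temp.to_text())
```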
class merlin.datasets.metadata.DatasetMetadata(name, description, features, num_instances, subset=None, num_features=None, normalization=None, task_type=<factory>, num_classes=None, characteristics=<factory>, homepage=None, license=None, citation=None, creators=<factory>, year=None, feature_relationships=None)

Bases: object

Structured metadata describing a dataset.

Parameters:
  • name (str) – Dataset name.

  • description (str) – Dataset description.

  • features (list[Feature]) – Descriptions of dataset features.

  • num_instances (int) – Number of instances in the dataset or subset.

  • subset (str | None) – Dataset split name.

  • num_features (int | None) – Number of input features.

  • normalization (Normalization | None) – Dataset-level normalization metadata.

  • task_type (list[str] | None) – Supported task types.

  • num_classes (int | None) – Number of target classes, when relevant.

  • characteristics (list[str]) – High-level dataset characteristics.

  • homepage (str | None) – Dataset homepage.

  • license (str | None) – Dataset license.

  • citation (str | None) – Citation text.

  • creators (list[str]) – Dataset creators.

  • year (int | None) – Creation or publication year.

  • feature_relationships (str | None) – Optional description of relationships between features.
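Example (sketch). The <factory> defaults in the class signature are how Python dataclasses render fields declared with dataclasses.field(default_factory=...): each instance gets a fresh list instead of sharing one mutable default. The trimmed stand-in below keeps only a few documented fields and assumes default_factory=list, which the signature does not state explicitly.

```python
import inspect
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DatasetMetadata:
    """Trimmed hypothetical stand-in: only a few of the documented fields."""

    name: str
    description: str
    num_instances: int
    subset: Optional[str] = None
    # Assumed default_factory=list; the documented signature only shows <factory>.
    task_type: Optional[list[str]] = field(default_factory=list)
    characteristics: list[str] = field(default_factory=list)
    creators: list[str] = field(default_factory=list)


print(inspect.signature(DatasetMetadata))  # factory-backed defaults show as <factory>

a = DatasetMetadata("a", "first", 10)
b = DatasetMetadata("b", "second", 20)
a.characteristics.append("tabular")
print(a.characteristics, b.characteristics)  # ['tabular'] []
```

Mutating one instance's list leaves the other untouched, which is why list-valued fields use a factory rather than a literal [] default (dataclasses reject the latter).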

characteristics: list[str]
citation: Optional[str] = None
creators: list[str]
description: str
feature_relationships: Optional[str] = None
features: list[Feature]
classmethod from_dict(data)

Build dataset metadata from a dictionary payload.

Parameters:

data (dict[str, Any]) – Raw metadata dictionary.

Returns:

Structured dataset metadata instance.

Return type:

DatasetMetadata
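Example (sketch). One plausible shape for from_dict is to convert nested feature dictionaries into Feature objects and pass the remaining keys to the constructor. The block below is a hypothetical stand-in, not merlin's implementation: it assumes dictionary keys match field names, that features arrive as plain dicts, and that unknown keys are silently ignored (the real method may raise instead).

```python
from dataclasses import dataclass, field, fields
from typing import Any, Optional


@dataclass
class Feature:
    name: str
    description: str
    type: str
    unit: Optional[str] = None


@dataclass
class DatasetMetadata:
    """Trimmed hypothetical stand-in with a sketch of from_dict."""

    name: str
    description: str
    features: list[Feature]
    num_instances: int
    num_features: Optional[int] = None
    creators: list[str] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "DatasetMetadata":
        # Assumption: unknown keys are dropped rather than rejected.
        known = {f.name for f in fields(cls)}
        kwargs = {k: v for k, v in data.items() if k in known}
        # Assumption: features are given as dicts (or already as Feature objects).
        kwargs["features"] = [
            f if isinstance(f, Feature) else Feature(**f)
            for f in data.get("features", [])
        ]
        return cls(**kwargs)


payload = {
    "name": "toy",
    "description": "A two-feature toy dataset",
    "num_instances": 100,
    "features": [
        {"name": "x", "description": "Input x", "type": "float32"},
        {"name": "y", "description": "Input y", "type": "float32", "unit": "m"},
    ],
}
meta = DatasetMetadata.from_dict(payload)
print(meta.features[1].unit, meta.creators)  # m []
```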

homepage: Optional[str] = None
license: Optional[str] = None
name: str
normalization: Optional[Normalization] = None
num_classes: Optional[int] = None
num_features: Optional[int] = None
num_instances: int
subset: Optional[str] = None
task_type: Optional[list[str]]
to_dict()

Convert the metadata to a dictionary.

Returns:

Dictionary representation of the dataset metadata.

Return type:

dict[str, Any]
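Example (sketch). For a dataclass, dataclasses.asdict() is a natural way to produce this kind of dictionary, because it recurses into nested dataclasses, lists, and dicts. Whether merlin's to_dict() uses asdict() is an assumption; the trimmed stand-in below only illustrates the idea, and the "z-score" method name is illustrative.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class Normalization:
    method: str
    range: tuple
    per_feature: bool = True


@dataclass
class DatasetMetadata:
    """Trimmed hypothetical stand-in with a sketch of to_dict."""

    name: str
    description: str
    num_instances: int
    normalization: Optional[Normalization] = None
    creators: list[str] = field(default_factory=list)

    def to_dict(self) -> dict[str, Any]:
        # asdict() recurses into nested dataclasses, so Normalization
        # becomes a plain dict as well; tuples stay tuples.
        return asdict(self)


meta = DatasetMetadata(
    name="toy",
    description="A toy dataset",
    num_instances=100,
    normalization=Normalization(method="z-score", range=(-3.0, 3.0)),
)
d = meta.to_dict()
print(d["normalization"])  # {'method': 'z-score', 'range': (-3.0, 3.0), 'per_feature': True}
```

If the dictionary is written out with json.dumps, tuple values such as range come back as lists after json.loads, so a from_dict that expects tuples may need to convert them.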

year: Optional[int] = None