merlin.datasets.utils module

merlin.datasets.utils.df_to_xy(df, feature_cols=None, label_cols=None)

Convert a pandas DataFrame to NumPy arrays for features (X) and labels (y).

Return type: tuple[ndarray, ndarray]

Args:

df: Input DataFrame
feature_cols: List of column names to use as features. If None, uses all columns except label_cols
label_cols: List of column names to use as labels. If None, assumes last column is label

Returns:

X: numpy array of features
y: numpy array of labels
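
The column defaults described above can be sketched as follows. This is an illustrative stand-in, not the library's implementation: `df_to_xy_sketch` and the sample DataFrame are hypothetical, and only the documented defaults (last column as label, all remaining columns as features) are reproduced.

```python
import numpy as np
import pandas as pd


def df_to_xy_sketch(df, feature_cols=None, label_cols=None):
    """Illustrative stand-in for the documented defaults; not the library's code."""
    if label_cols is None:
        # Documented default: the last column is the label.
        label_cols = [df.columns[-1]]
    if feature_cols is None:
        # Documented default: every column that is not a label column.
        feature_cols = [c for c in df.columns if c not in label_cols]
    X = df[feature_cols].to_numpy()
    y = df[label_cols].to_numpy()
    return X, y


# Hypothetical sample data: two feature columns and one label column.
df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0], "target": [0, 1]})
X, y = df_to_xy_sketch(df)
print(X.shape, y.shape)
```

In this sketch y keeps one column per label, so it is two-dimensional. The documentation does not say whether the real function flattens a single label column, so check the returned shape before relying on it.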

merlin.datasets.utils.fetch(url, data_dir=None, force=False)

Fetch a file from a URL and store it in the virtual environment’s data directory. If the file already exists, return its path unless force=True. If the file is gzipped, extract it.

Return type: Path

Args:

url: URL to fetch the file from
data_dir: Optional override for the data directory
force: If True, re-download even if file exists

Returns:

Path: Path to the downloaded (and potentially extracted) file
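
The cache-then-extract behavior described above might look roughly like the sketch below. It uses only the standard library and is not the library's implementation. `fetch_sketch` is a hypothetical name, `data_dir` is required here because the default location is not specified in this page, and treating a `.gz` suffix as the sign of a gzipped file is an assumption.

```python
import gzip
import shutil
import tempfile
import urllib.request
from pathlib import Path


def fetch_sketch(url, data_dir, force=False):
    """Illustrative cache-then-extract logic; not the library's code."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    download_path = data_dir / url.rsplit("/", 1)[-1]
    # Assumption: a ".gz" suffix is what marks a file as gzipped.
    is_gz = download_path.suffix == ".gz"
    final_path = download_path.with_suffix("") if is_gz else download_path

    if final_path.exists() and not force:
        return final_path  # cached copy, nothing is downloaded

    urllib.request.urlretrieve(url, download_path)
    if is_gz:
        # Decompress next to the download and return the extracted file.
        with gzip.open(download_path, "rb") as src, open(final_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return final_path


# Demo with a local file:// URL so that no network access is needed.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "sample.txt.gz"
    with gzip.open(source, "wb") as f:
        f.write(b"hello")
    path = fetch_sketch(source.as_uri(), Path(tmp) / "cache")
    demo_name, demo_text = path.name, path.read_text()
```

Returning the extracted path rather than the `.gz` path matches the documented return value ("downloaded (and potentially extracted) file"). Whether the real function keeps or deletes the compressed download is not stated, so the sketch simply leaves it in place.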

merlin.datasets.utils.read_idx(filepath)

Read a file in the IDX format, as used by the MNIST dataset.

Return type: tuple[ndarray, dict]

Args:

filepath: Path to the IDX file

Returns:

Tuple[np.ndarray, dict]: Tuple containing:

numpy array with the data
metadata dictionary with magic number, data type, and dimensions
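
For orientation, an IDX file starts with a 4-byte magic number (two zero bytes, a data type code, and the number of dimensions), followed by one 4-byte big-endian size per dimension and then the raw data. A minimal parser under those rules is sketched below. `read_idx_sketch` and the metadata key names (`magic`, `dtype`, `dims`) are hypothetical and may not match what the real function returns.

```python
import struct
import tempfile
from pathlib import Path

import numpy as np

# IDX type codes (third byte of the magic number) mapped to big-endian dtypes.
IDX_DTYPES = {0x08: ">u1", 0x09: ">i1", 0x0B: ">i2", 0x0C: ">i4", 0x0D: ">f4", 0x0E: ">f8"}


def read_idx_sketch(filepath):
    """Illustrative IDX parser; not the library's code."""
    raw = Path(filepath).read_bytes()
    # Header: two zero bytes, a type code, then the number of dimensions.
    zero, type_code, ndim = struct.unpack(">HBB", raw[:4])
    if zero != 0:
        raise ValueError("not an IDX file: first two bytes must be zero")
    # Each dimension size is a 4-byte big-endian unsigned integer.
    header_size = 4 + 4 * ndim
    dims = struct.unpack(f">{ndim}I", raw[4:header_size])
    data = np.frombuffer(raw, dtype=IDX_DTYPES[type_code], offset=header_size)
    meta = {  # key names are hypothetical
        "magic": struct.unpack(">I", raw[:4])[0],
        "dtype": IDX_DTYPES[type_code],
        "dims": dims,
    }
    return data.reshape(dims), meta


# Demo: write a tiny 2 x 3 unsigned-byte IDX file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample-idx2-ubyte"
    sample.write_bytes(struct.pack(">HBBII", 0, 0x08, 2, 2, 3) + bytes(range(6)))
    array, meta = read_idx_sketch(sample)
```

The sketch expects an uncompressed file. MNIST archives are commonly distributed gzipped, so they would need to be extracted first.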