merlin.datasets.utils module
- merlin.datasets.utils.df_to_xy(df, feature_cols=None, label_cols=None)
Convert pandas DataFrame to numpy arrays for features (X) and labels (y)
- Return type:
tuple[ndarray, ndarray]
- Args:
df: Input DataFrame
feature_cols: List of column names to use as features. If None, uses all columns except label_cols
label_cols: List of column names to use as labels. If None, assumes the last column is the label
- Returns:
X: numpy array of features
y: numpy array of labels
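The default column selection described above can be illustrated with a small stand-in. This is a sketch only: `df_to_xy_sketch` is a hypothetical re-implementation written from the documented behavior, not merlin's own code, and the 2-D shape of `y` is an assumption (the real function may return a flattened array).

```python
import numpy as np
import pandas as pd


def df_to_xy_sketch(df, feature_cols=None, label_cols=None):
    """Hypothetical stand-in that mirrors the documented defaults."""
    # Documented default: with no label_cols, the last column is the label.
    if label_cols is None:
        label_cols = [df.columns[-1]]
    # Documented default: with no feature_cols, use every non-label column.
    if feature_cols is None:
        feature_cols = [c for c in df.columns if c not in label_cols]
    X = df[feature_cols].to_numpy()
    # Assumption: labels keep a (n_samples, n_labels) shape.
    y = df[label_cols].to_numpy()
    return X, y


df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0], "label": [0, 1]})
X, y = df_to_xy_sketch(df)
print(X.shape, y.shape)
```

Passing `feature_cols` or `label_cols` explicitly bypasses the corresponding default, which is useful when the label is not the last column.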
- merlin.datasets.utils.fetch(url, data_dir=None, force=False)
Fetch a file from URL, storing it in the virtual environment’s data directory. If the file already exists, return its path unless force=True. If the file is gzipped, extract it.
- Return type:
Path
- Args:
url: URL to fetch the file from
data_dir: Optional override for the data directory
force: If True, re-download even if the file exists
- Returns:
Path: Path to the downloaded (and potentially extracted) file
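The caching and extraction behavior described for `fetch` can be sketched with the standard library alone. `fetch_sketch` below is a hypothetical stand-in, not merlin's implementation: it requires an explicit `data_dir` (the location of the default data directory is not specified here), and it assumes that a gzipped file is detected by its `.gz` suffix and extracted next to the download. The demo uses a local `file://` URL so it runs offline.

```python
import gzip
import shutil
import tempfile
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def fetch_sketch(url, data_dir, force=False):
    """Hypothetical stand-in: download once, reuse the cached file afterwards."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)

    download_path = data_dir / Path(urlparse(url).path).name
    # Assumption: gzipped files are recognized by their ".gz" suffix.
    is_gz = download_path.suffix == ".gz"
    final_path = download_path.with_suffix("") if is_gz else download_path

    # Return the cached file unless a re-download is forced.
    if final_path.exists() and not force:
        return final_path

    urllib.request.urlretrieve(url, download_path)
    if is_gz:
        with gzip.open(download_path, "rb") as src, open(final_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return final_path


# Offline demo: serve a gzipped file through a file:// URL.
tmp = Path(tempfile.mkdtemp())
source = tmp / "sample.txt.gz"
with gzip.open(source, "wb") as f:
    f.write(b"hello")

cache_dir = tmp / "cache"
first = fetch_sketch(source.as_uri(), cache_dir)
first_content = first.read_bytes()

# Tamper with the cached copy to show that a second call does not re-download.
first.write_bytes(b"cached")
second_content = fetch_sketch(source.as_uri(), cache_dir).read_bytes()
forced_content = fetch_sketch(source.as_uri(), cache_dir, force=True).read_bytes()
```

In this sketch the existence check is made against the extracted file, so a repeated call skips both the download and the extraction.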
- merlin.datasets.utils.read_idx(filepath)
Read a file in the IDX format used by the MNIST dataset.
- Return type:
tuple[ndarray, dict]
- Args:
filepath: Path to the IDX file
- Returns:
Tuple[np.ndarray, dict]: Tuple containing:
- numpy array with the data
- metadata dictionary with magic number, data type, and dimensions
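For orientation, the IDX layout is a 4-byte magic number (two zero bytes, a data-type code, and the number of dimensions), followed by one big-endian 32-bit size per dimension, followed by the raw data. The sketch below parses that layout from bytes. `read_idx_sketch` is a hypothetical stand-in, not merlin's implementation, and the metadata key names (`magic`, `dtype_code`, `dims`) are assumptions chosen for illustration.

```python
import struct

import numpy as np

# IDX data-type codes mapped to big-endian NumPy dtypes.
IDX_DTYPES = {
    0x08: ">u1",  # unsigned byte
    0x09: ">i1",  # signed byte
    0x0B: ">i2",  # short
    0x0C: ">i4",  # int
    0x0D: ">f4",  # float
    0x0E: ">f8",  # double
}


def read_idx_sketch(raw):
    """Hypothetical stand-in: parse IDX-formatted bytes into (array, metadata)."""
    zero, dtype_code, ndim = struct.unpack(">HBB", raw[:4])
    if zero != 0:
        raise ValueError("not an IDX file: first two bytes must be zero")
    if dtype_code not in IDX_DTYPES:
        raise ValueError(f"unknown IDX data-type code: {dtype_code:#x}")

    header_size = 4 + 4 * ndim
    dims = struct.unpack(f">{ndim}I", raw[4:header_size])
    data = np.frombuffer(raw, dtype=IDX_DTYPES[dtype_code], offset=header_size)

    # Assumed key names; the real metadata dictionary may use different ones.
    metadata = {
        "magic": struct.unpack(">I", raw[:4])[0],
        "dtype_code": dtype_code,
        "dims": dims,
    }
    return data.reshape(dims), metadata


# Build a tiny 2 x 3 unsigned-byte IDX payload in memory and parse it.
raw = struct.pack(">HBB", 0, 0x08, 2) + struct.pack(">II", 2, 3) + bytes(range(6))
data, metadata = read_idx_sketch(raw)
```

To apply the same parsing to a file on disk, the bytes can be obtained with `Path(filepath).read_bytes()`.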