========================== MerlinProcessor User Guide ========================== Overview -------- ``MerlinProcessor`` is a lightweight RPC-style bridge between your PyTorch models and remote cloud QPU/simulator backends. It supports two backend paths: * **Perceval ``RemoteProcessor``** — the original Quandela Cloud path. * **Perceval ``ISession``** — the preferred path for Scaleway-hosted platforms (and any future session-based providers). With either backend you can: * Offload **quantum leaves** (e.g. ``QuantumLayer``) to the cloud while keeping **classical layers** local. * Submit batched inputs; when batches are large, Merlin will **chunk** them and (optionally) **run chunks in parallel**. * Drive execution **synchronously** (``forward``) or **asynchronously** (``forward_async`` returning a ``torch.futures.Future``). * Monitor status, collect **job IDs**, **cancel** jobs, and enforce **timeouts**. * Estimate **required shot counts per input** ahead of time. Merlin deliberately avoids hidden "auto-shots": **you control sampling**. The optional estimator is provided to help you choose appropriate values. Prerequisites ------------- You need **one** of the following backends configured: **Option A — Quandela Cloud (RemoteProcessor)** * ``perceval-quandela`` configured with a valid cloud token (via ``pcvl.RemoteConfig`` cache or environment). * A Perceval ``RemoteProcessor`` instance (e.g. a simulator like ``"sim:slos"`` or a QPU-backed platform). .. code-block:: python import perceval as pcvl from merlin.measurement.strategies import MeasurementStrategy # Configure your Quandela Cloud token (one of the following): RemoteConfig.set_token("YOUR_TOKEN") # option 1: global config rp = pcvl.RemoteProcessor("sim:slos") rp = pcvl.RemoteProcessor("sim:slos", "YOUR_TOKEN") # option 2: inline token **Option B — Scaleway (ISession)** * The ``perceval.providers.scaleway`` module installed. * A Scaleway project ID and API secret key (typically set via ``SCW_PROJECT_ID`` and ``SCW_SECRET_KEY`` environment variables). **Both paths require:** * A Merlin **quantum layer** that provides ``export_config()`` (e.g. ``merlin.algorithms.QuantumLayer``). Quick Start — Quandela Cloud ----------------------------- .. code-block:: python import perceval as pcvl import torch import torch.nn as nn from merlin.algorithms import QuantumLayer from merlin.builder.circuit_builder import CircuitBuilder from merlin.core.computation_space import ComputationSpace from merlin.core.merlin_processor import MerlinProcessor from merlin.measurement.strategies import MeasurementStrategy # 1) Create the Perceval RemoteProcessor (token must already be configured) rp = pcvl.RemoteProcessor("sim:slos") # 2) Wrap it with MerlinProcessor proc = MerlinProcessor( rp, microbatch_size=32, # batch chunk size per cloud call timeout=3600.0, # default wall-time per forward (seconds) max_shots_per_call=None, # optional cap per cloud call (see below) chunk_concurrency=1, # parallel chunk jobs within a quantum leaf ) # 3) Build a QuantumLayer and a small model b = CircuitBuilder(n_modes=6) b.add_rotations(trainable=True, name="theta") b.add_angle_encoding(modes=[0, 1], name="px") b.add_entangling_layer() q = QuantumLayer( input_size=2, builder=b, n_photons=2, measurement_strategy=MeasurementStrategy.probs( computation_space=ComputationSpace.UNBUNCHED, ), ).eval() model = nn.Sequential( nn.Linear(3, 2, bias=False), q, nn.Linear(15, 4, bias=False), # 15 = C(6,2) unbunched outputs nn.Softmax(dim=-1), ).eval() # 4) Run remotely with sampling X = torch.rand(8, 3) y = proc.forward(model, X, nsample=5000) print(y.shape) # (8, 4) Quick Start — Scaleway Session ------------------------------- .. code-block:: python import perceval.providers.scaleway as scw import torch from merlin.algorithms import QuantumLayer from merlin.builder.circuit_builder import CircuitBuilder from merlin.core.computation_space import ComputationSpace from merlin.core.merlin_processor import MerlinProcessor from merlin.measurement.strategies import MeasurementStrategy # 1) Open a Scaleway session (context manager handles cleanup) with scw.Session( "EMU-ASCELLA-6PQ", # platform name project_id="YOUR_SCW_PROJECT_ID", # or read from env token="YOUR_SCW_SECRET_KEY", # or read from env deduplication_id="merlin-guide", # reuse session if still alive max_idle_duration_s=300, max_duration_s=600, ) as session: # 2) Wrap the session with MerlinProcessor proc = MerlinProcessor( session=session, microbatch_size=32, timeout=300.0, max_shots_per_call=5000, ) # 3) Build a quantum layer b = CircuitBuilder(n_modes=6) b.add_rotations(trainable=True, name="theta") b.add_angle_encoding(modes=[0, 1], name="px") b.add_entangling_layer() q = QuantumLayer( input_size=2, builder=b, n_photons=2, measurement_strategy=MeasurementStrategy.probs( computation_space=ComputationSpace.UNBUNCHED, ), ).eval() # 4) Run remotely X = torch.rand(8, 2) y = proc.forward(q, X, nsample=1000) print(y.shape) # (8, 15) Instantiation & Options ----------------------- .. code-block:: text MerlinProcessor( remote_processor=None, # RemoteProcessor — legacy path session=None, # ISession — preferred path microbatch_size=32, timeout=3600.0, max_shots_per_call=None, chunk_concurrency=1, ) Exactly **one** of ``remote_processor`` or ``session`` must be provided. * **remote_processor (RemoteProcessor | None)**: Quandela Cloud backend. Merlin clones it internally per chunk so multiple jobs can run safely in parallel without altering your original instance. * **session (ISession | None)**: A Perceval session object — e.g. from ``perceval.providers.scaleway.Session``. Merlin builds a fresh ``RemoteProcessor`` from the session for each chunk, so chunking and concurrency work identically to the ``RemoteProcessor`` path. * **microbatch_size (int)**: maximum number of input rows per **cloud job**. If your input batch ``B`` is larger, the batch is split into chunks of size ``<= microbatch_size``. Applies to both the ``RemoteProcessor`` and ``ISession`` paths. * **timeout (float)**: default wall-clock limit (in seconds) for each ``forward`` / ``forward_async`` call. Use per-call override (see below). * **max_shots_per_call (int | None)**: cap for **each** cloud call's ``max_shots_per_call`` parameter on the Perceval ``Sampler``. If ``None``, Merlin uses an internal default (10 000). If the requested ``nsample`` for a call exceeds this cap, Merlin automatically raises it to match so that Perceval does not silently clamp the sample count. * **chunk_concurrency (int)**: maximum number of **chunks** submitted in parallel **per quantum leaf**. Default ``1`` (serial). Increase for higher throughput when the backend allows it. Computation Spaces ------------------ The computation space controls which output Fock states are included in the probability vector. It is specified via ``MeasurementStrategy``: .. code-block:: python from merlin.measurement.strategies import MeasurementStrategy # UNBUNCHED — at most one photon per mode. Output dim = C(m, n). MeasurementStrategy.probs(computation_space=ComputationSpace.UNBUNCHED) # FOCK — arbitrary photon occupation (bunching allowed). Output dim = C(m + n − 1, n). MeasurementStrategy.probs(computation_space=ComputationSpace.FOCK) ``MerlinProcessor`` automatically detects the computation space of each quantum leaf and arranges the returned probability tensor to match the state ordering used by the local SLOS backend. This ensures that index *i* of the cloud result maps to the same Fock state as index *i* of a local ``layer(X)`` call. Execution API ------------- Synchronous ^^^^^^^^^^^ .. code-block:: python y = proc.forward(layer_or_model, X, nsample=20000, timeout=15.0) * **nsample (int | None)**: If the backend exposes ``"probs"`` in ``remote_processor.available_commands``, Merlin uses exact probabilities and ignores ``nsample``. Otherwise, Merlin uses sampling; ``nsample`` controls the shots per input. * **timeout (float | None)**: overrides the constructor default for this call. ``None`` or ``0`` means no time limit. Asynchronous ^^^^^^^^^^^^ .. code-block:: python fut = proc.forward_async(layer_or_model, X, nsample=3000, timeout=None) # Helpers injected on the Future: fut.job_ids # list[str]: job ids across all chunks/leaves fut.status() # dict: {state, progress, message, chunks_*} fut.cancel_remote() # request cancellation; .wait() -> CancelledError y = fut.wait() * **Cancellation**: ``fut.cancel_remote()`` signals the worker to cancel and issues remote job cancellation (best effort). ``fut.wait()`` then raises ``concurrent.futures.CancelledError``. ``proc.cancel_all()`` cancels **all** active jobs across all futures. * **Context manager**: Exiting a ``with MerlinProcessor(...) as proc:`` block triggers ``cancel_all()``, ensuring stray jobs are stopped. Batching & Chunking ------------------- * If ``len(X) > microbatch_size``, Merlin splits into chunks of size ``<= microbatch_size`` and submits up to ``chunk_concurrency`` chunk-jobs in parallel **for that quantum leaf**. This applies to both the ``RemoteProcessor`` and ``ISession`` paths. * The Future aggregates **all job IDs** across leaves in ``future.job_ids``. It also exposes chunk counters via ``future.status()``: .. code-block:: text {"state": "...", "progress": ..., "message": "...", "chunks_total": N, "chunks_done": k, "active_chunks": c} * If a chunk fails, Merlin retries up to 3 times with exponential backoff. Cancellation and timeout errors are propagated immediately without retry. Device & dtype round-trip ------------------------- Inputs are moved to CPU for remote execution when needed, and the final tensor is returned on the **original device and dtype** of your input (e.g., preserve CUDA when possible for downstream ops). Offload Policy & Local Overrides -------------------------------- * By default, modules that provide ``export_config()`` are treated as **quantum leaves** and offloaded. * Set ``layer.force_local = True`` to force **local** execution (useful for debugging and A/B comparisons). Estimating Required Shots ------------------------- Merlin includes a helper that proxies Perceval's built-in estimator and **does not** submit jobs: .. code-block:: python estimates = proc.estimate_required_shots_per_input( layer=q, input=X, # shape [B, D] or [D] desired_samples_per_input=2_000, ) # -> list[int] of length B (or 1 for a single vector). # 0 means "not viable" under current platform/filters. This is a **planner** only; it doesn't modify processor state or job history. Timeouts & Errors ----------------- * **Timeout**: if a per-call or default timeout elapses, Merlin issues remote cancellation and raises ``TimeoutError``. * **Cancellation**: ``fut.cancel_remote()`` or ``proc.cancel_all()`` --> pending chunk workers raise ``CancelledError``; completed chunks are discarded for the call. * **Remote failures**: if the backend marks a job as failed, Merlin raises a ``RuntimeError`` with the platform message. If the message indicates an explicit remote cancel, Merlin maps it to ``CancelledError``. * **Retries**: transient failures (non-cancel, non-timeout) trigger up to 3 automatic retries per chunk with exponential backoff. Multiple Quantum Layers ----------------------- Sequential models with multiple quantum leaves are supported: * Each quantum leaf is processed in order; each may chunk and run those chunks with its own intra-leaf concurrency (``chunk_concurrency``). * ``future.job_ids`` will include all job IDs across all leaves. Workflow Recipes ---------------- Mixed classical → quantum → classical ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Works with both computation spaces — just adjust the output dimension: .. code-block:: python from math import comb from merlin.measurement.strategies import MeasurementStrategy # UNBUNCHED: output dim = C(m, n) q = QuantumLayer( input_size=2, builder=b, # your CircuitBuilder with n_modes=6 n_photons=2, measurement_strategy=MeasurementStrategy.probs( computation_space=ComputationSpace.UNBUNCHED, ), ).eval() dist = comb(6, 2) # 15 # Or FOCK (bunched): output dim = C(m + n - 1, n) # q = QuantumLayer( # input_size=2, builder=b, n_photons=2, # measurement_strategy=MeasurementStrategy.probs( # computation_space=ComputationSpace.FOCK, # ), # ).eval() # dist = comb(6 + 2 - 1, 2) # 21 model = nn.Sequential( nn.Linear(3, 2, bias=False), q, nn.Linear(dist, 4, bias=False), nn.Softmax(dim=-1), ).eval() proc = MerlinProcessor(pcvl.RemoteProcessor("sim:slos")) X = torch.rand(6, 3) y = proc.forward(model, X, nsample=5000) Gradient-free fine-tuning with COBYLA ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ No autograd through the quantum layer — optimise circuit parameters directly using SciPy: .. code-block:: python from scipy.optimize import minimize from merlin.measurement.strategies import MeasurementStrategy q = QuantumLayer( input_size=2, builder=b, n_photons=2, measurement_strategy=MeasurementStrategy.probs( computation_space=ComputationSpace.UNBUNCHED, ), ).eval() dist = q(torch.rand(2, 2)).shape[1] readout = nn.Linear(dist, 1, bias=False).eval() pre = nn.Linear(3, 2, bias=False).eval() model = nn.Sequential(pre, q, readout).eval() # Flatten quantum params we will tune (keep classical layers fixed) q_params = [(n, p) for n, p in q.named_parameters() if p.requires_grad] shapes = [p.shape for _, p in q_params] sizes = [p.numel() for _, p in q_params] def get_flat(): return torch.cat([p.detach().flatten().cpu() for _, p in q_params], dim=0) def set_from_flat(vec): off = 0 with torch.no_grad(): for (_, p), sz, shp in zip(q_params, sizes, shapes, strict=False): chunk = vec[off : off + sz].view(shp).to(p.dtype) p.data.copy_(chunk.to(p.device)) off += sz x0 = get_flat().double().numpy() proc = MerlinProcessor(pcvl.RemoteProcessor("sim:slos")) X = torch.rand(8, 3) # Objective: maximise mean scalar output → minimise negative def objective(v_np): v = torch.from_numpy(v_np).to(torch.float64) set_from_flat(v.to(torch.float32)) with torch.no_grad(): y = proc.forward(model, X, nsample=5000) return -float(y.mean().item()) res = minimize(objective, x0, method="COBYLA", options={"maxiter": len(x0) + 6, "rhobeg": 0.5}) print("final objective:", res.fun) Local vs remote A/B (force simulation) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. code-block:: python q = QuantumLayer(...).eval() X = torch.rand(4, q.input_size) proc = MerlinProcessor(pcvl.RemoteProcessor("sim:slos")) # Remote path (offloaded) y_remote = proc.forward(q, X, nsample=5000) # Local path (force simulation) q.force_local = True y_local = proc.forward(q, X, nsample=5000) # Compare distributions (allowing some sampling noise) print((y_local - y_remote).abs().mean()) Monitoring status & safe cancellation ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. code-block:: python fut = proc.forward_async(q, torch.rand(16, 2), nsample=40000, timeout=None) # Poll status (state/progress/message + chunk counters) print(fut.status()) # If needed, cancel cooperatively fut.cancel_remote() try: _ = fut.wait() except Exception as e: print("Cancelled:", type(e).__name__) High-throughput batching with chunking ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. code-block:: python proc = MerlinProcessor( pcvl.RemoteProcessor("sim:slos"), microbatch_size=8, chunk_concurrency=2, ) X = torch.rand(64, 2) fut = proc.forward_async(q, X, nsample=3000) Y = fut.wait() print("chunks:", fut.status()) Scaleway session with context manager ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. code-block:: python import os import perceval.providers.scaleway as scw with scw.Session( "sim:ascella", project_id=os.environ["SCW_PROJECT_ID"], token=os.environ["SCW_SECRET_KEY"], deduplication_id="my-training-run", max_idle_duration_s=300, max_duration_s=1800, ) as session: with MerlinProcessor(session=session, timeout=300.0) as proc: q = QuantumLayer(...).eval() y = proc.forward(q, X, nsample=1000) # ... # MerlinProcessor context manager cancels any stray jobs on exit. # Scaleway session is closed on exit. Troubleshooting --------------- * **No job IDs appear**: Your backend may be very fast, or your layer ran locally (e.g., ``force_local=True``). * **"Lowered max_samples" warning from Perceval**: This means ``nsample`` exceeded ``max_shots_per_call``. Merlin now auto-raises the cap, but if you see this with an older version, set ``max_shots_per_call`` >= your ``nsample``. * **Timeouts in CI**: Backends vary. Make tests resilient to fast or slow responses by polling ``future.done()`` before asserting on timeout exceptions. API Reference (Summary) ----------------------- **Constructor** * ``MerlinProcessor(remote_processor=None, session=None, microbatch_size=32, timeout=3600.0, max_shots_per_call=None, chunk_concurrency=1)`` **Execution** * ``forward(module, input, *, nsample=None, timeout=None) -> torch.Tensor`` * ``forward_async(module, input, *, nsample=None, timeout=None) -> Future`` * ``future.job_ids: list[str]`` * ``future.status() -> dict`` * ``future.cancel_remote() -> None`` **Lifecycle** * ``cancel_all() -> None`` * Context manager (``with MerlinProcessor(...) as proc:``) **Estimation** * ``estimate_required_shots_per_input(layer, input, desired_samples_per_input) -> list[int]`` **History** * ``get_job_history() -> list[RemoteJob]`` * ``clear_job_history() -> None`` Version Notes ------------- * ``session`` parameter added for ``ISession``-based backends (Scaleway). Exactly one of ``remote_processor`` or ``session`` must be provided. Both paths now support chunking and ``chunk_concurrency`` — each chunk gets an independent ``RemoteProcessor`` via ``session.build_remote_processor()``. * ``MeasurementStrategy.probs(computation_space=...)`` replaces the older ``no_bunching`` flag and bare ``computation_space`` parameter on ``QuantumLayer``. Both ``ComputationSpace.FOCK`` (bunched) and ``ComputationSpace.UNBUNCHED`` are fully supported for cloud execution. * Default ``chunk_concurrency`` is **1** (serial intra-leaf). * Failed chunks are retried up to 3 times with exponential backoff. Cancellation and timeout errors propagate immediately. * ``max_shots_per_call`` is automatically raised to match ``nsample`` when needed, preventing Perceval from silently clamping the sample count.