MerlinProcessor User Guide

Overview

MerlinProcessor is a lightweight RPC-style bridge between your PyTorch models and Quandela Cloud via Perceval’s RemoteProcessor. It lets you:

  • Offload quantum leaves (e.g. QuantumLayer) to the cloud while keeping classical layers local.

  • Submit batched inputs; when batches are large, Merlin will chunk them and (optionally) run chunks in parallel.

  • Drive execution synchronously (forward) or asynchronously (forward_async returning a torch.futures.Future).

  • Monitor status, collect job IDs, cancel jobs, and enforce timeouts.

  • Estimate required shot counts per input ahead of time.

Merlin deliberately avoids hidden “auto-shots”: you control sampling. The optional estimator is provided to help you choose appropriate values.

Prerequisites

  • perceval-quandela configured with a valid cloud token (via pcvl.RemoteConfig cache or environment).

  • A Perceval RemoteProcessor instance (e.g. a simulator like "sim:slos" or a QPU-backed platform).

  • A Merlin quantum layer that provides export_config() (e.g. merlin.algorithms.QuantumLayer).
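
If you have not configured a token yet, a one-time setup along these lines usually suffices (a minimal sketch: the exact RemoteConfig methods can vary across Perceval versions, so treat set_token/save as assumptions and check your version's docs):

import perceval as pcvl

# Cache the token on disk so later RemoteProcessor(...) calls can authenticate
# without passing the token explicitly. (set_token/save assumed; API may vary.)
remote_config = pcvl.RemoteConfig()
remote_config.set_token("YOUR_QUANDELA_CLOUD_TOKEN")
remote_config.save()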

Quick Start

import perceval as pcvl
import torch
import torch.nn as nn

from merlin.algorithms import QuantumLayer
from merlin.builder.circuit_builder import CircuitBuilder
from merlin.core.merlin_processor import MerlinProcessor
from merlin.measurement.strategies import MeasurementStrategy

# 1) Create the Perceval RemoteProcessor (token must already be configured)
rp = pcvl.RemoteProcessor("sim:slos")

# 2) Wrap it with MerlinProcessor
proc = MerlinProcessor(
    rp,
    max_batch_size=32,        # batch chunk size per cloud call (<= 32)
    timeout=3600.0,           # default wall-time per forward (seconds)
    max_shots_per_call=None,  # optional cap per cloud call (see below)
    chunk_concurrency=1,      # parallel chunk jobs within a quantum leaf
)

# 3) Build a QuantumLayer and a small model
b = CircuitBuilder(n_modes=6)
b.add_rotations(trainable=True, name="theta")
b.add_angle_encoding(modes=[0, 1], name="px")
b.add_entangling_layer()

q = QuantumLayer(
    input_size=2,
    builder=b,
    n_photons=2,
    no_bunching=True,
    measurement_strategy=MeasurementStrategy.PROBABILITIES,  # raw probability vector
).eval()

model = nn.Sequential(
    nn.Linear(3, 2, bias=False),
    q,
    nn.Linear(15, 4, bias=False),   # 15 = C(6,2) from the chosen circuit
    nn.Softmax(dim=-1)
).eval()

# 4) Run remotely with sampling (nsample) or exact probs if available
X = torch.rand(8, 3)
y = proc.forward(model, X, nsample=5000)   # synchronous
print(y.shape)

Instantiation & Options

MerlinProcessor(remote_processor, *, max_batch_size=32, timeout=3600.0, max_shots_per_call=None, chunk_concurrency=1)

  • remote_processor (pcvl.RemoteProcessor): your authenticated platform. Merlin clones it internally per quantum leaf so multiple jobs can run safely in parallel without altering your original instance.

  • max_batch_size (int): maximum number of input rows per cloud job. If your input batch B is larger, it is split into chunks of size <= max_batch_size. Hard-capped by Merlin at 32.

  • timeout (float): default wall-clock limit (in seconds) for each forward/forward_async call. It can be overridden per call (see below) and must be a real number (not None).

  • max_shots_per_call (int | None): cap for each cloud call. If None, Merlin passes a safe default internally for Perceval. If you want a stricter cap, set this explicitly. (This is not an auto-shot chooser.)

  • chunk_concurrency (int): maximum number of chunks submitted in parallel per quantum leaf. Default 1 (serial). Increase for higher throughput when the backend allows it.
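
For example, a throughput-oriented configuration with an explicit shot cap might look like this (values are illustrative, not recommendations):

proc = MerlinProcessor(
    pcvl.RemoteProcessor("sim:slos"),
    max_batch_size=16,           # at most 16 rows per cloud job
    timeout=600.0,               # 10-minute default wall clock per forward
    max_shots_per_call=100_000,  # hard cap per cloud call
    chunk_concurrency=4,         # up to 4 chunk-jobs in flight per leaf
)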

Execution API

Synchronous

y = proc.forward(layer_or_model, X, nsample=20000, timeout=15.0)
  • nsample (int | None):

    • If the backend exposes "probs" in remote_processor.available_commands, Merlin uses exact probabilities and ignores nsample.

    • Otherwise, Merlin samples; nsample controls the shots per input (subject to your platform limits and max_shots_per_call).

  • timeout (float | None): overrides the constructor default for this call.

    • None, 0, or any falsy value –> no time limit for this call.

    • Otherwise –> seconds until a global timeout cancels all in-flight jobs launched for this call and raises TimeoutError. See the sketch below.
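
The three timeout modes side by side (a sketch reusing the Quick Start model and X; the 15-second budget is arbitrary):

y1 = proc.forward(model, X, nsample=20_000)                # constructor default applies
y2 = proc.forward(model, X, nsample=20_000, timeout=None)  # unlimited for this call
y3 = proc.forward(model, X, nsample=20_000, timeout=15.0)  # 15 s, then TimeoutError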

Asynchronous

fut = proc.forward_async(layer_or_model, X, nsample=3000, timeout=None)
# Helpers injected on the Future:
fut.job_ids         # list[str]: job ids across all chunks/leaves
fut.status()        # dict: {state, progress, message, chunks_*}
fut.cancel_remote() # request cancellation; .wait() -> CancelledError
y = fut.wait()
  • Cancellation:

    • fut.cancel_remote() signals the worker to cancel and issues remote job cancellation (best effort). fut.wait() then raises concurrent.futures.CancelledError.

    • proc.cancel_all() cancels all active jobs across all futures.

  • Context manager: exiting a with MerlinProcessor(...) as proc: block triggers cancel_all(), ensuring stray jobs are stopped; see the sketch below.
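
A minimal context-manager sketch, reusing the Quick Start model and X:

with MerlinProcessor(pcvl.RemoteProcessor("sim:slos")) as proc:
    fut = proc.forward_async(model, X, nsample=3000)
    print(fut.status())   # {state, progress, message, chunks_*}
    y = fut.wait()
# Leaving the block triggers cancel_all(), so no stray jobs survive an exception.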

Batching & Chunking

  • If len(X) > max_batch_size, Merlin splits into chunks of size <= max_batch_size and submits up to chunk_concurrency chunk-jobs in parallel for that quantum leaf.

  • The Future aggregates all job IDs across leaves in future.job_ids. It also exposes chunk counters via future.status():

    {"state": "...", "progress": ..., "message": "...",
     "chunks_total": N, "chunks_done": k, "active_chunks": c}

Device & dtype round-trip

Inputs are moved to CPU for remote execution when needed, and the result is returned on the original device and dtype of your input (a CUDA input yields a CUDA output, so downstream ops stay on GPU).
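
For instance, a CUDA input comes back as a CUDA tensor (sketch; requires a CUDA-enabled PyTorch build and the Quick Start model):

if torch.cuda.is_available():
    X_gpu = torch.rand(8, 3, device="cuda")
    y_gpu = proc.forward(model, X_gpu, nsample=5000)
    # Merlin staged X_gpu to CPU for the cloud call, then returned the
    # result on the input's original device and dtype.
    assert y_gpu.device == X_gpu.device and y_gpu.dtype == X_gpu.dtype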

Offload Policy & Local Overrides

  • By default, modules that provide export_config() are treated as quantum leaves and offloaded.

  • Set layer.force_local = True to force local execution (useful for debugging and A/B comparisons).

  • Many Merlin tests use a context helper, with layer.as_simulation():, to temporarily force local mode (if your layer provides it); see the sketch below.
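
Both overrides in one sketch (as_simulation() is optional and layer-dependent; q is the Quick Start layer):

X = torch.rand(4, q.input_size)

# Attribute-based override: run the quantum layer locally.
q.force_local = True
y_local = proc.forward(q, X, nsample=5000)
q.force_local = False

# Context-based override, if your layer provides it:
with q.as_simulation():
    y_local = proc.forward(q, X, nsample=5000)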

Estimating Required Shots (No Auto-Execute)

Merlin includes a helper that proxies Perceval’s built-in estimator and does not submit jobs:

estimates = proc.estimate_required_shots_per_input(
    layer=q,
    input=X,                          # shape [B, D] or [D]
    desired_samples_per_input=2_000
)
# -> list[int] of length B (or 1 for a single vector).
#    0 means "not viable" under the current platform/performance/filter settings.

Behavior:

  • For each input row, Merlin maps your feature vector to the circuit parameter values (same mapping used during remote execution), then calls remote_processor.estimate_required_shots(...).

  • It mirrors the layer’s exported circuit and input state (including detected-photon filters) so the estimate aligns with actual execution.

  • This is a planner only; it doesn’t modify processor/job history.
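
One way to turn the estimates into a per-call nsample (a sketch; taking the max over viable rows is just one reasonable policy):

estimates = proc.estimate_required_shots_per_input(
    layer=q, input=X, desired_samples_per_input=2_000
)
viable = [e for e in estimates if e > 0]   # 0 marks "not viable" rows
if viable:
    y = proc.forward(q, X, nsample=max(viable))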

Timeouts & Errors

  • Timeout: if a per-call or default timeout elapses, Merlin issues remote cancellation and raises TimeoutError.

  • Cancellation: fut.cancel_remote() or proc.cancel_all() –> pending chunk workers raise CancelledError; completed chunks are discarded for the call.

  • Remote failures: if the backend marks a job as failed, Merlin raises a RuntimeError with the platform message. If the message indicates an explicit remote cancel, Merlin maps it to CancelledError. All three cases are sketched below.
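
The full exception surface in one place (a sketch of the mappings described above, reusing the Quick Start model and X):

import concurrent.futures

try:
    y = proc.forward(model, X, nsample=5000, timeout=30.0)
except TimeoutError:
    ...   # wall clock elapsed; Merlin cancelled the in-flight jobs
except concurrent.futures.CancelledError:
    ...   # explicit remote cancel, cancel_remote(), or cancel_all()
except RuntimeError as e:
    ...   # backend reported a failure; e carries the platform message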

Multiple Quantum Layers

Sequential models with multiple quantum leaves are supported:

  • Each quantum leaf is processed in order; each may chunk and run those chunks with its own intra-leaf concurrency (chunk_concurrency).

  • future.job_ids will include all job IDs across all leaves, as in the sketch below.
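
A two-leaf sketch, assuming q1 and q2 are QuantumLayers built as in the Quick Start and d1/d2 are their probed output sizes:

model2 = nn.Sequential(
    nn.Linear(3, q1.input_size, bias=False),
    q1,                                    # first quantum leaf (offloaded)
    nn.Linear(d1, q2.input_size, bias=False),
    q2,                                    # second quantum leaf (offloaded)
    nn.Linear(d2, 4, bias=False),
).eval()

fut = proc.forward_async(model2, torch.rand(8, 3), nsample=2_000)
_ = fut.wait()
print(len(fut.job_ids))   # aggregates job IDs from both leaves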

Controlling Shots Explicitly

  • Sampling backends respect the per-call nsample you pass to forward/forward_async; Merlin does not auto-derive or override it.

  • Use estimate_required_shots_per_input ahead of time to pick good values.

  • max_shots_per_call lets you enforce a hard cap for each cloud job.

Workflow Recipes (End-to-End Examples)

The following examples mirror tested workflows (see tests/core/cloud/test_userguide_examples.py).

Mixed classical –> quantum –> classical

# Build the quantum layer and probe its output size
q = QuantumLayer(...).eval()  # see Quick Start builder pattern
dist = q(torch.rand(2, q.input_size)).shape[1]

model = nn.Sequential(
    nn.Linear(3, q.input_size, bias=False),
    q,                          # offloaded by Merlin
    nn.Linear(dist, 4, bias=False),
    nn.Softmax(dim=-1),
).eval()

rp = pcvl.RemoteProcessor("sim:slos")
proc = MerlinProcessor(rp)
# Prefer exact probabilities if supported; else sample.
use_probs = "probs" in rp.available_commands
nsamp = None if use_probs else 20_000

X = torch.rand(6, 3)
fut = proc.forward_async(model, X, nsample=nsamp)
Y = fut.wait()
print("shape:", Y.shape, "job_ids:", len(fut.job_ids))  # expect >= 1

Gradient-free fine-tuning with COBYLA (no autograd on quantum layer)

# Optional dependency: SciPy
from scipy.optimize import minimize

# Small model: Linear -> Quantum -> Linear(scalar)
q = QuantumLayer(...).eval()
dist = q(torch.rand(2, q.input_size)).shape[1]
readout = nn.Linear(dist, 1, bias=False).eval()
pre = nn.Linear(3, q.input_size, bias=False).eval()
model = nn.Sequential(pre, q, readout).eval()

# Flatten quantum params we will tune (keep classical layers fixed)
q_params = [(n, p) for n, p in q.named_parameters() if p.requires_grad]
shapes = [p.shape for _, p in q_params]
sizes = [p.numel() for _, p in q_params]

def get_flat():
    return torch.cat([p.detach().flatten().cpu() for _, p in q_params], dim=0)

def set_from_flat(vec):
    off = 0
    with torch.no_grad():
        for (_, p), sz, shp in zip(q_params, sizes, shapes, strict=False):
            chunk = vec[off:off + sz].view(shp).to(p.dtype)
            p.data.copy_(chunk.to(p.device))
            off += sz

x0 = get_flat().double().numpy()
rp = pcvl.RemoteProcessor("sim:slos")
proc = MerlinProcessor(rp)
nsamp = None if "probs" in rp.available_commands else 20_000
X = torch.rand(8, 3)

# Objective: maximize mean scalar output -> minimize negative
def objective(v_np):
    v = torch.from_numpy(v_np).to(torch.float64)
    set_from_flat(v.to(torch.float32))
    with torch.no_grad():
        y = proc.forward(model, X, nsample=nsamp)
        return -float(y.mean().item())

res = minimize(objective, x0, method="COBYLA",
               options={"maxiter": 12, "rhobeg": 0.5})
print("final objective:", res.fun)

Local vs remote A/B (force simulation)

q = QuantumLayer(...).eval()
X = torch.rand(4, q.input_size)
proc = MerlinProcessor(pcvl.RemoteProcessor("sim:slos"))

# Remote path (offloaded)
y_remote = proc.forward(q, X, nsample=5000)

# Local path (force simulation)
q.force_local = True
y_local = proc.forward(q, X, nsample=5000)

# Compare distributions (allowing some sampling noise)
print((y_local - y_remote).abs().mean())

Monitoring status & safe cancellation

fut = proc.forward_async(q, torch.rand(16, q.input_size), nsample=40000, timeout=None)
# Poll status (state/progress/message + chunk counters)
print(fut.status())
# If needed, cancel cooperatively
fut.cancel_remote()
try:
    _ = fut.wait()
except Exception as e:
    print("Cancelled:", type(e).__name__)

High-throughput batching with chunking

proc = MerlinProcessor(
    pcvl.RemoteProcessor("sim:slos"),
    max_batch_size=8,        # split big batches into <= 8 rows per job
    chunk_concurrency=2,     # up to 2 chunk-jobs in flight per quantum leaf
)
X = torch.rand(64, q.input_size)  # big batch
fut = proc.forward_async(q, X, nsample=3000)
Y = fut.wait()
print("chunks_total/done/active:", fut.status())

Troubleshooting

  • No job IDs appear: your backend may be very fast, or your layer ran locally (e.g., force_local=True).

  • Perceval requires max_shots_per_call: Merlin passes a safe default when you leave it None. If your org policy requires explicit bounds, set it at construction.

  • Timeouts in CI: backends vary. Make tests resilient to fast or slow responses by polling future.done() before asserting on timeout exceptions; see the sketch below.
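
A CI-friendly polling sketch (reuses q and X from the recipes above; the poll cadence is arbitrary):

import time

fut = proc.forward_async(q, X, nsample=2_000, timeout=30.0)
while not fut.done():
    time.sleep(0.5)   # poll instead of assuming a fixed completion time
y = fut.wait()        # surfaces the result, or TimeoutError if the 30 s budget elapsed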

API Reference (Summary)

  • forward(module, input, *, nsample=None, timeout=None) -> torch.Tensor

  • forward_async(module, input, *, nsample=None, timeout=None) -> Future, with helpers:

    • future.job_ids: list[str]

    • future.status() -> dict

    • future.cancel_remote() -> None

  • cancel_all() -> None

  • estimate_required_shots_per_input(layer, input, desired_samples_per_input) -> list[int]

  • get_job_history() -> list[RemoteJob]

  • clear_job_history() -> None

Version Notes

  • Default chunk_concurrency is 1 (serial intra-leaf). Opt in to parallelism by setting it > 1.

  • Constructor timeout must be a float. Use per-call timeout=None for an unlimited call.

  • Estimation helper added to keep shot selection user-driven without auto-submitting jobs.