MerlinProcessor User Guide
Overview
MerlinProcessor is a lightweight RPC-style bridge between your PyTorch
models and remote cloud QPU/simulator backends. It supports two backend paths:
Perceval ``RemoteProcessor`` — the original Quandela Cloud path.
Perceval ``ISession`` — the preferred path for Scaleway-hosted platforms (and any future session-based providers).
With either backend you can:
Offload quantum leaves (e.g.
QuantumLayer) to the cloud while keeping classical layers local.Submit batched inputs; when remote batches are large, Merlin will chunk them and (optionally) run chunks in parallel.
Drive execution synchronously (
forward) or asynchronously (forward_asyncreturning atorch.futures.Future).Monitor status, collect job IDs, cancel jobs, and enforce timeouts.
Estimate required shot counts per input ahead of time.
Merlin deliberately avoids hidden “auto-shots”: you control sampling. The optional estimator is provided to help you choose appropriate values.
Prerequisites
You need one of the following backends configured:
Option A — Quandela Cloud (RemoteProcessor)
perceval-quandelaconfigured with a valid cloud token (viapcvl.RemoteConfigcache or environment).A Perceval
RemoteProcessorinstance (e.g. a simulator like"sim:slos"or a QPU-backed platform).
import perceval as pcvl
from merlin.measurement.strategies import MeasurementStrategy
# Configure your Quandela Cloud token (one of the following):
RemoteConfig.set_token("YOUR_TOKEN") # option 1: global config
rp = pcvl.RemoteProcessor("sim:slos")
rp = pcvl.RemoteProcessor("sim:slos", "YOUR_TOKEN") # option 2: inline token
Option B — Scaleway (ISession)
The
perceval.providers.scalewaymodule installed.A Scaleway project ID and API secret key (typically set via
SCW_PROJECT_IDandSCW_SECRET_KEYenvironment variables).
Both paths require:
A Merlin quantum layer that provides
export_config()(e.g.merlin.algorithms.QuantumLayer).
Quick Start — Quandela Cloud
import perceval as pcvl
import torch
import torch.nn as nn
from merlin.algorithms import QuantumLayer
from merlin.builder.circuit_builder import CircuitBuilder
from merlin.core.computation_space import ComputationSpace
from merlin.core.merlin_processor import MerlinProcessor
from merlin.measurement.strategies import MeasurementStrategy
# 1) Create the Perceval RemoteProcessor (token must already be configured)
rp = pcvl.RemoteProcessor("sim:slos")
# 2) Wrap it with MerlinProcessor
proc = MerlinProcessor(
rp,
microbatch_size=32, # batch chunk size per cloud call
timeout=3600.0, # default wall-time per forward (seconds)
max_shots_per_call=None, # optional cap per cloud call (see below)
chunk_concurrency=1, # parallel chunk jobs within a quantum leaf
)
# 3) Build a QuantumLayer and a small model
b = CircuitBuilder(n_modes=6)
b.add_rotations(trainable=True, name="theta")
b.add_angle_encoding(modes=[0, 1], name="px")
b.add_entangling_layer()
q = QuantumLayer(
input_size=2,
builder=b,
n_photons=2,
measurement_strategy=MeasurementStrategy.probs(
computation_space=ComputationSpace.UNBUNCHED,
),
).eval()
model = nn.Sequential(
nn.Linear(3, 2, bias=False),
q,
nn.Linear(15, 4, bias=False), # 15 = C(6,2) unbunched outputs
nn.Softmax(dim=-1),
).eval()
# 4) Run remotely with sampling
X = torch.rand(8, 3)
y = proc.forward(model, X, nsample=5000)
print(y.shape) # (8, 4)
Quick Start — Scaleway Session
import perceval.providers.scaleway as scw
import torch
from merlin.algorithms import QuantumLayer
from merlin.builder.circuit_builder import CircuitBuilder
from merlin.core.computation_space import ComputationSpace
from merlin.core.merlin_processor import MerlinProcessor
from merlin.measurement.strategies import MeasurementStrategy
# 1) Open a Scaleway session (context manager handles cleanup)
with scw.Session(
"EMU-ASCELLA-6PQ", # platform name
project_id="YOUR_SCW_PROJECT_ID", # or read from env
token="YOUR_SCW_SECRET_KEY", # or read from env
deduplication_id="merlin-guide", # reuse session if still alive
max_idle_duration_s=300,
max_duration_s=600,
) as session:
# 2) Wrap the session with MerlinProcessor
proc = MerlinProcessor(
session=session,
microbatch_size=32,
timeout=300.0,
max_shots_per_call=5000,
)
# 3) Build a quantum layer
b = CircuitBuilder(n_modes=6)
b.add_rotations(trainable=True, name="theta")
b.add_angle_encoding(modes=[0, 1], name="px")
b.add_entangling_layer()
q = QuantumLayer(
input_size=2,
builder=b,
n_photons=2,
measurement_strategy=MeasurementStrategy.probs(
computation_space=ComputationSpace.UNBUNCHED,
),
).eval()
# 4) Run remotely
X = torch.rand(8, 2)
y = proc.forward(q, X, nsample=1000)
print(y.shape) # (8, 15)
Instantiation & Options
MerlinProcessor(
remote_processor=None, # RemoteProcessor — deprecated legacy path
session=None, # ISession — preferred path
microbatch_size=32,
timeout=3600.0,
max_shots_per_call=None,
chunk_concurrency=1,
token=None,
*,
processor=None, # AProcessor — keyword-only local/normalized path
)
Exactly one of remote_processor, session, or processor must be
provided.
Warning
Deprecated since version 0.4:
The remote_processor constructor argument is deprecated and will be
removed in a future release. Pass the same RemoteProcessor through
processor= instead.
remote_processor (RemoteProcessor | None): Deprecated Quandela Cloud backend entry point. Pass the same object through
processor=instead. Merlin clones it internally per chunk so multiple jobs can run safely in parallel without altering your original instance.session (ISession | None): A Perceval session object — e.g. from
perceval.providers.scaleway.Session. Merlin builds a freshRemoteProcessorfrom the session for each chunk, so chunking and concurrency work identically to theRemoteProcessorpath.microbatch_size (int): maximum number of input rows per cloud job. If your input batch
Bis larger, the batch is split into chunks of size<= microbatch_size. Applies to both theRemoteProcessorandISessionpaths.timeout (float): default wall-clock limit (in seconds) for each
forward/forward_asynccall. Use per-call override (see below).max_shots_per_call (int | None): cap for each cloud call’s
max_shots_per_callparameter on the PercevalSampler. IfNone, Merlin uses an internal cap default (100 000). If the requestednsamplefor a call exceeds this cap, Merlin clamps the submitted sample count to the cap.chunk_concurrency (int): maximum number of chunks submitted in parallel per quantum leaf. Default
1(serial). Increase for higher throughput when the backend allows it.token (str | None): authentication token forwarded to cloned remote processors. If omitted, Merlin extracts it from the
RemoteProcessorRPC handler. Ignored for local and session paths.processor (AProcessor | None): keyword-only Perceval processor entry point. Local processors use the local backend path.
RemoteProcessorinstances passed here are normalized to the remote processor path.
Computation Spaces
The computation space controls which output Fock states are included in the
probability vector. It is specified via MeasurementStrategy:
from merlin.measurement.strategies import MeasurementStrategy
# UNBUNCHED — at most one photon per mode. Output dim = C(m, n).
MeasurementStrategy.probs(computation_space=ComputationSpace.UNBUNCHED)
# FOCK — arbitrary photon occupation (bunching allowed). Output dim = C(m + n − 1, n).
MeasurementStrategy.probs(computation_space=ComputationSpace.FOCK)
MerlinProcessor automatically detects the computation space of each quantum
leaf and arranges the returned probability tensor to match the state ordering
used by the local SLOS backend. This ensures that index i of the cloud result
maps to the same Fock state as index i of a local layer(X) call.
Execution API
Synchronous
y = proc.forward(layer_or_model, X, nsample=20000, timeout=15.0)
nsample (int | None): If the backend exposes
"probs"inremote_processor.available_commands, Merlin uses exact probabilities and ignoresnsample. Otherwise, Merlin uses sampling;nsamplecontrols the shots per input.timeout (float | None): overrides the constructor default for this call.
Noneor0means no time limit.
Asynchronous
fut = proc.forward_async(layer_or_model, X, nsample=3000, timeout=None)
# Helpers injected on the Future:
fut.job_ids # list[str]: job ids across all chunks/leaves
fut.status() # dict: {state, progress, message, chunks_*}
fut.cancel_remote() # request cancellation; .wait() -> CancelledError
y = fut.wait()
Cancellation:
fut.cancel_remote()signals the worker to cancel and issues remote job cancellation (best effort).fut.wait()then raisesconcurrent.futures.CancelledError.proc.cancel_all()cancels all active jobs across all futures.Context manager: Exiting a
with MerlinProcessor(...) as proc:block triggerscancel_all(), ensuring stray jobs are stopped.
Batching & Chunking
If
len(X) > microbatch_size, Merlin splits remote execution into chunks of size<= microbatch_sizeand submits up tochunk_concurrencychunk-jobs in parallel for that quantum leaf. This applies to both theRemoteProcessorandISessionpaths.Local
processor=execution does not use remote chunking. Merlin keeps the full PyTorch input together, maps rows to Perceval sampler iterations, and returns a batched tensor. Perceval does not execute a native vectorized batch, so local users control batch size from their PyTorch input batch rather thanmicrobatch_size.The Future aggregates all job IDs across leaves in
future.job_ids. It also exposes chunk counters viafuture.status():{"state": "...", "progress": ..., "message": "...", "chunks_total": N, "chunks_done": k, "active_chunks": c}If a chunk fails, Merlin retries up to 3 times with exponential backoff. Cancellation and timeout errors are propagated immediately without retry.
Device & dtype round-trip
Inputs are moved to CPU for remote execution when needed, and the final tensor is returned on the original device and dtype of your input (e.g., preserve CUDA when possible for downstream ops).
Offload Policy & Local Overrides
By default, modules that provide
export_config()are treated as quantum leaves and offloaded.Set
layer.force_local = Trueto force local execution (useful for debugging and A/B comparisons).
Estimating Required Shots
Merlin includes a helper that proxies Perceval’s built-in estimator and does
not submit jobs. The estimator is available only for remote processor and
session backends; local processor= backends raise RuntimeError:
estimates = proc.estimate_required_shots_per_input(
layer=q,
input=X, # shape [B, D] or [D]
desired_samples_per_input=2_000,
)
# -> list[int] of length B (or 1 for a single vector).
# 0 means "not viable" under current platform/filters.
This is a planner only; it doesn’t modify processor state or job history.
Timeouts & Errors
Timeout: if a per-call or default timeout elapses, Merlin issues remote cancellation and raises
TimeoutError.Cancellation:
fut.cancel_remote()orproc.cancel_all()–> pending chunk workers raiseCancelledError; completed chunks are discarded for the call.Remote failures: if the backend marks a job as failed, Merlin raises a
RuntimeErrorwith the platform message. If the message indicates an explicit remote cancel, Merlin maps it toCancelledError.Retries: transient failures (non-cancel, non-timeout) trigger up to 3 automatic retries per chunk with exponential backoff.
Multiple Quantum Layers
Sequential models with multiple quantum leaves are supported:
Each quantum leaf is processed in order; remote leaves may chunk and run those chunks with their own intra-leaf concurrency (
chunk_concurrency).future.job_idswill include all job IDs across all leaves.
Workflow Recipes
Mixed classical → quantum → classical
Works with both computation spaces — just adjust the output dimension:
from math import comb
from merlin.measurement.strategies import MeasurementStrategy
# UNBUNCHED: output dim = C(m, n)
q = QuantumLayer(
input_size=2,
builder=b, # your CircuitBuilder with n_modes=6
n_photons=2,
measurement_strategy=MeasurementStrategy.probs(
computation_space=ComputationSpace.UNBUNCHED,
),
).eval()
dist = comb(6, 2) # 15
# Or FOCK (bunched): output dim = C(m + n - 1, n)
# q = QuantumLayer(
# input_size=2, builder=b, n_photons=2,
# measurement_strategy=MeasurementStrategy.probs(
# computation_space=ComputationSpace.FOCK,
# ),
# ).eval()
# dist = comb(6 + 2 - 1, 2) # 21
model = nn.Sequential(
nn.Linear(3, 2, bias=False),
q,
nn.Linear(dist, 4, bias=False),
nn.Softmax(dim=-1),
).eval()
proc = MerlinProcessor(processor=pcvl.RemoteProcessor("sim:slos"))
X = torch.rand(6, 3)
y = proc.forward(model, X, nsample=5000)
Gradient-free fine-tuning with COBYLA
No autograd through the quantum layer — optimise circuit parameters directly using SciPy:
from scipy.optimize import minimize
from merlin.measurement.strategies import MeasurementStrategy
q = QuantumLayer(
input_size=2,
builder=b,
n_photons=2,
measurement_strategy=MeasurementStrategy.probs(
computation_space=ComputationSpace.UNBUNCHED,
),
).eval()
dist = q(torch.rand(2, 2)).shape[1]
readout = nn.Linear(dist, 1, bias=False).eval()
pre = nn.Linear(3, 2, bias=False).eval()
model = nn.Sequential(pre, q, readout).eval()
# Flatten quantum params we will tune (keep classical layers fixed)
q_params = [(n, p) for n, p in q.named_parameters() if p.requires_grad]
shapes = [p.shape for _, p in q_params]
sizes = [p.numel() for _, p in q_params]
def get_flat():
return torch.cat([p.detach().flatten().cpu() for _, p in q_params], dim=0)
def set_from_flat(vec):
off = 0
with torch.no_grad():
for (_, p), sz, shp in zip(q_params, sizes, shapes, strict=False):
chunk = vec[off : off + sz].view(shp).to(p.dtype)
p.data.copy_(chunk.to(p.device))
off += sz
x0 = get_flat().double().numpy()
proc = MerlinProcessor(processor=pcvl.RemoteProcessor("sim:slos"))
X = torch.rand(8, 3)
# Objective: maximise mean scalar output → minimise negative
def objective(v_np):
v = torch.from_numpy(v_np).to(torch.float64)
set_from_flat(v.to(torch.float32))
with torch.no_grad():
y = proc.forward(model, X, nsample=5000)
return -float(y.mean().item())
res = minimize(objective, x0, method="COBYLA",
options={"maxiter": len(x0) + 6, "rhobeg": 0.5})
print("final objective:", res.fun)
Local vs remote A/B (force simulation)
q = QuantumLayer(...).eval()
X = torch.rand(4, q.input_size)
proc = MerlinProcessor(processor=pcvl.RemoteProcessor("sim:slos"))
# Remote path (offloaded)
y_remote = proc.forward(q, X, nsample=5000)
# Local path (force simulation)
q.force_local = True
y_local = proc.forward(q, X, nsample=5000)
# Compare distributions (allowing some sampling noise)
print((y_local - y_remote).abs().mean())
Monitoring status & safe cancellation
fut = proc.forward_async(q, torch.rand(16, 2), nsample=40000, timeout=None)
# Poll status (state/progress/message + chunk counters)
print(fut.status())
# If needed, cancel cooperatively
fut.cancel_remote()
try:
_ = fut.wait()
except Exception as e:
print("Cancelled:", type(e).__name__)
High-throughput batching with chunking
proc = MerlinProcessor(
pcvl.RemoteProcessor("sim:slos"),
microbatch_size=8,
chunk_concurrency=2,
)
X = torch.rand(64, 2)
fut = proc.forward_async(q, X, nsample=3000)
Y = fut.wait()
print("chunks:", fut.status())
Scaleway session with context manager
import os
import perceval.providers.scaleway as scw
with scw.Session(
"sim:ascella",
project_id=os.environ["SCW_PROJECT_ID"],
token=os.environ["SCW_SECRET_KEY"],
deduplication_id="my-training-run",
max_idle_duration_s=300,
max_duration_s=1800,
) as session:
with MerlinProcessor(session=session, timeout=300.0) as proc:
q = QuantumLayer(...).eval()
y = proc.forward(q, X, nsample=1000)
# ...
# MerlinProcessor context manager cancels any stray jobs on exit.
# Scaleway session is closed on exit.
Gradient Propagation
No gradient can flow through the MerLin processor. When calling backwards directly on a MerlinProcessor’s forward pass, this error will be shown:
element 0 of tensors does not require grad and does not have a grad_fn
If a MerLin processor is part of a bigger
Pytorchmodule, the quantum layer’s parameter gradient will be None. This also means that every upstream models’ (models called before theQuantumLayerin the model’s forward) gradient will be None.The gradient can not pass through right now as pytorch can not access directly the computation on the QPU.
Differentiation through the MerLin processor will be implemented in v0.5.
Troubleshooting
No job IDs appear: Your backend may be very fast, or your layer ran locally (e.g.,
force_local=True).“Lowered max_samples” warning from Perceval: This means
nsampleexceededmax_shots_per_call. Merlin clamps the submitted sample count, but if you see this with an older version, setmax_shots_per_call>= yournsample.Timeouts in CI: Backends vary. Make tests resilient to fast or slow responses by polling
future.done()before asserting on timeout exceptions.
API Reference (Summary)
Constructor
MerlinProcessor(remote_processor=None, session=None, microbatch_size=32, timeout=3600.0, max_shots_per_call=None, chunk_concurrency=1, token=None, *, processor=None)remote_processoris deprecated; pass remote processors throughprocessor=.
Execution
forward(module, input, *, nsample=None, timeout=None) -> torch.Tensorforward_async(module, input, *, nsample=None, timeout=None) -> Futurefuture.job_ids: list[str]future.status() -> dictfuture.cancel_remote() -> None
Lifecycle
cancel_all() -> NoneContext manager (
with MerlinProcessor(...) as proc:)
Estimation
estimate_required_shots_per_input(layer, input, desired_samples_per_input) -> list[int]
History
get_job_history() -> list[RemoteJob]clear_job_history() -> None
Version Notes
sessionparameter added forISession-based backends (Scaleway). Exactly one ofprocessor,remote_processor, orsessionmust be provided.remote_processoris deprecated and remains supported for compatibility. A given session then uses its remote processor from thesession.build_remote_processor()method to define the available commands. Both remote paths support chunking andchunk_concurrency— each chunk gets an independentRemoteProcessorviasession.build_remote_processor().To see the available commands and backend name, call the
backend_capabilitiesattribute of theMerlinProcessor.If the probs command is available and nsample is None or 0, the processor will run a probabilities simulations. Otherwise, sampling simulations or quantum executions will be done.
MeasurementStrategy.probs(computation_space=...)replaces the olderno_bunchingflag and barecomputation_spaceparameter onQuantumLayer. BothComputationSpace.FOCK(bunched) andComputationSpace.UNBUNCHEDare fully supported for cloud execution.Default
chunk_concurrencyis 1 (serial intra-leaf).Failed chunks are retried up to 3 times with exponential backoff. Cancellation and timeout errors propagate immediately.
max_shots_per_callsets the maximum number of samples sent in asingle call. When
nsample > max_shots_per_call, the call is limited tomax_shots_per_callsamples, which prevents Perceval from silently clamping the requested sample count. The defaultmax_shots_per_callis 100 000.