MerlinProcessor User Guide
Overview
MerlinProcessor is a lightweight RPC-style bridge between your PyTorch
models and remote cloud QPU/simulator backends. It supports two backend paths:
- Perceval ``RemoteProcessor``: the original Quandela Cloud path.
- Perceval ``ISession``: the preferred path for Scaleway-hosted platforms (and any future session-based providers).
With either backend you can:
- Offload quantum leaves (e.g. ``QuantumLayer``) to the cloud while keeping classical layers local.
- Submit batched inputs; when batches are large, Merlin chunks them and (optionally) runs chunks in parallel.
- Drive execution synchronously (``forward``) or asynchronously (``forward_async``, which returns a ``torch.futures.Future``).
- Monitor status, collect job IDs, cancel jobs, and enforce timeouts.
- Estimate required shot counts per input ahead of time.
Merlin deliberately avoids hidden “auto-shots”: you control sampling. The optional estimator is provided to help you choose appropriate values.
Prerequisites
You need one of the following backends configured:
Option A — Quandela Cloud (RemoteProcessor)
- ``perceval-quandela`` configured with a valid cloud token (via the ``pcvl.RemoteConfig`` cache or environment variables).
- A Perceval ``RemoteProcessor`` instance (e.g. a simulator such as ``"sim:slos"`` or a QPU-backed platform).
import perceval as pcvl
# Configure your Quandela Cloud token (one of the following):
pcvl.RemoteConfig.set_token("YOUR_TOKEN") # option 1: global config
rp = pcvl.RemoteProcessor("sim:slos")
rp = pcvl.RemoteProcessor("sim:slos", "YOUR_TOKEN") # option 2: inline token
Option B — Scaleway (ISession)
- The ``perceval.providers.scaleway`` module installed.
- A Scaleway project ID and API secret key (typically set via the ``SCW_PROJECT_ID`` and ``SCW_SECRET_KEY`` environment variables).
Both paths require:
- A Merlin quantum layer that provides ``export_config()`` (e.g. ``merlin.algorithms.QuantumLayer``).
Quick Start — Quandela Cloud
import perceval as pcvl
import torch
import torch.nn as nn
from merlin.algorithms import QuantumLayer
from merlin.builder.circuit_builder import CircuitBuilder
from merlin.core.computation_space import ComputationSpace
from merlin.core.merlin_processor import MerlinProcessor
from merlin.measurement.strategies import MeasurementStrategy
# 1) Create the Perceval RemoteProcessor (token must already be configured)
rp = pcvl.RemoteProcessor("sim:slos")
# 2) Wrap it with MerlinProcessor
proc = MerlinProcessor(
rp,
microbatch_size=32, # batch chunk size per cloud call
timeout=3600.0, # default wall-time per forward (seconds)
max_shots_per_call=None, # optional cap per cloud call (see below)
chunk_concurrency=1, # parallel chunk jobs within a quantum leaf
)
# 3) Build a QuantumLayer and a small model
b = CircuitBuilder(n_modes=6)
b.add_rotations(trainable=True, name="theta")
b.add_angle_encoding(modes=[0, 1], name="px")
b.add_entangling_layer()
q = QuantumLayer(
input_size=2,
builder=b,
n_photons=2,
measurement_strategy=MeasurementStrategy.probs(
computation_space=ComputationSpace.UNBUNCHED,
),
).eval()
model = nn.Sequential(
nn.Linear(3, 2, bias=False),
q,
nn.Linear(15, 4, bias=False), # 15 = C(6,2) unbunched outputs
nn.Softmax(dim=-1),
).eval()
# 4) Run remotely with sampling
X = torch.rand(8, 3)
y = proc.forward(model, X, nsample=5000)
print(y.shape) # (8, 4)
Quick Start — Scaleway Session
import perceval.providers.scaleway as scw
import torch
from merlin.algorithms import QuantumLayer
from merlin.builder.circuit_builder import CircuitBuilder
from merlin.core.computation_space import ComputationSpace
from merlin.core.merlin_processor import MerlinProcessor
from merlin.measurement.strategies import MeasurementStrategy
# 1) Open a Scaleway session (context manager handles cleanup)
with scw.Session(
"EMU-ASCELLA-6PQ", # platform name
project_id="YOUR_SCW_PROJECT_ID", # or read from env
token="YOUR_SCW_SECRET_KEY", # or read from env
deduplication_id="merlin-guide", # reuse session if still alive
max_idle_duration_s=300,
max_duration_s=600,
) as session:
# 2) Wrap the session with MerlinProcessor
proc = MerlinProcessor(
session=session,
microbatch_size=32,
timeout=300.0,
max_shots_per_call=5000,
)
# 3) Build a quantum layer
b = CircuitBuilder(n_modes=6)
b.add_rotations(trainable=True, name="theta")
b.add_angle_encoding(modes=[0, 1], name="px")
b.add_entangling_layer()
q = QuantumLayer(
input_size=2,
builder=b,
n_photons=2,
measurement_strategy=MeasurementStrategy.probs(
computation_space=ComputationSpace.UNBUNCHED,
),
).eval()
# 4) Run remotely
X = torch.rand(8, 2)
y = proc.forward(q, X, nsample=1000)
print(y.shape) # (8, 15)
Instantiation & Options
MerlinProcessor(
remote_processor=None, # RemoteProcessor — legacy path
session=None, # ISession — preferred path
microbatch_size=32,
timeout=3600.0,
max_shots_per_call=None,
chunk_concurrency=1,
)
Exactly one of remote_processor or session must be provided.
- ``remote_processor`` (``RemoteProcessor | None``): Quandela Cloud backend. Merlin clones it internally per chunk so multiple jobs can run safely in parallel without altering your original instance.
- ``session`` (``ISession | None``): A Perceval session object, e.g. from ``perceval.providers.scaleway.Session``. Merlin builds a fresh ``RemoteProcessor`` from the session for each chunk, so chunking and concurrency work identically to the ``RemoteProcessor`` path.
- ``microbatch_size`` (int): maximum number of input rows per cloud job. If your input batch size ``B`` is larger, the batch is split into chunks of size ``<= microbatch_size``. Applies to both the ``RemoteProcessor`` and ``ISession`` paths.
- ``timeout`` (float): default wall-clock limit (in seconds) for each ``forward``/``forward_async`` call. Can be overridden per call (see below).
- ``max_shots_per_call`` (int | None): cap for each cloud call's ``max_shots_per_call`` parameter on the Perceval ``Sampler``. If ``None``, Merlin uses an internal default (10 000). If the requested ``nsample`` for a call exceeds this cap, Merlin automatically raises it to match, so that Perceval does not silently clamp the sample count.
- ``chunk_concurrency`` (int): maximum number of chunks submitted in parallel per quantum leaf. Default ``1`` (serial). Increase for higher throughput when the backend allows it.
Computation Spaces
The computation space controls which output Fock states are included in the
probability vector. It is specified via MeasurementStrategy:
from merlin.core.computation_space import ComputationSpace
from merlin.measurement.strategies import MeasurementStrategy
# UNBUNCHED — at most one photon per mode. Output dim = C(m, n).
MeasurementStrategy.probs(computation_space=ComputationSpace.UNBUNCHED)
# FOCK — arbitrary photon occupation (bunching allowed). Output dim = C(m + n − 1, n).
MeasurementStrategy.probs(computation_space=ComputationSpace.FOCK)
MerlinProcessor automatically detects the computation space of each quantum
leaf and arranges the returned probability tensor to match the state ordering
used by the local SLOS backend. This ensures that index i of the cloud result
maps to the same Fock state as index i of a local layer(X) call.
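The output dimensions quoted above follow directly from the two formulas; for the 6-mode, 2-photon layer used in the quick starts:

```python
from math import comb

m, n = 6, 2  # modes, photons

unbunched_dim = comb(m, n)      # UNBUNCHED: at most one photon per mode
fock_dim = comb(m + n - 1, n)   # FOCK: bunching allowed

print(unbunched_dim)  # 15, matching nn.Linear(15, 4) in the quick start
print(fock_dim)       # 21
```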
Execution API
Synchronous
y = proc.forward(layer_or_model, X, nsample=20000, timeout=15.0)
- ``nsample`` (int | None): If the backend exposes ``"probs"`` in ``remote_processor.available_commands``, Merlin uses exact probabilities and ignores ``nsample``. Otherwise, Merlin uses sampling; ``nsample`` controls the shots per input.
- ``timeout`` (float | None): overrides the constructor default for this call. ``None`` or ``0`` means no time limit.
Asynchronous
fut = proc.forward_async(layer_or_model, X, nsample=3000, timeout=None)
# Helpers injected on the Future:
fut.job_ids # list[str]: job ids across all chunks/leaves
fut.status() # dict: {state, progress, message, chunks_*}
fut.cancel_remote() # request cancellation; .wait() -> CancelledError
y = fut.wait()
Cancellation:
- ``fut.cancel_remote()`` signals the worker to cancel and issues remote job cancellation (best effort). ``fut.wait()`` then raises ``concurrent.futures.CancelledError``.
- ``proc.cancel_all()`` cancels all active jobs across all futures.
- Context manager: exiting a ``with MerlinProcessor(...) as proc:`` block triggers ``cancel_all()``, ensuring stray jobs are stopped.
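The cancellation contract mirrors standard ``concurrent.futures`` semantics. As a self-contained illustration using only the stdlib (no Merlin objects involved):

```python
from concurrent.futures import Future, CancelledError

f = Future()        # a plain stdlib Future, standing in for a pending chunk
assert f.cancel()   # cancellation succeeds while no worker has picked it up

try:
    f.result()      # waiting on a cancelled future...
except CancelledError:
    print("cancelled")  # ...raises CancelledError, as fut.wait() does in Merlin
```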
Batching & Chunking
- If ``len(X) > microbatch_size``, Merlin splits the input into chunks of size ``<= microbatch_size`` and submits up to ``chunk_concurrency`` chunk-jobs in parallel for that quantum leaf. This applies to both the ``RemoteProcessor`` and ``ISession`` paths.
- The Future aggregates all job IDs across leaves in ``future.job_ids``. It also exposes chunk counters via ``future.status()``: ``{"state": "...", "progress": ..., "message": "...", "chunks_total": N, "chunks_done": k, "active_chunks": c}``.
- If a chunk fails, Merlin retries up to 3 times with exponential backoff. Cancellation and timeout errors are propagated immediately without retry.
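The splitting arithmetic is straightforward; a sketch with a hypothetical ``split_batch`` helper (not Merlin's actual internals):

```python
from math import ceil

def split_batch(n_rows: int, microbatch_size: int) -> list[tuple[int, int]]:
    """Illustrative: (start, end) row ranges, one per chunk-job."""
    return [(i, min(i + microbatch_size, n_rows))
            for i in range(0, n_rows, microbatch_size)]

chunks = split_batch(64, microbatch_size=8)
print(len(chunks))            # 64 rows / 8 per chunk = 8 chunk-jobs
print(ceil(70 / 8))           # a 70-row batch needs 9 chunks
print(split_batch(70, 8)[-1]) # the last chunk is short: rows 64..70
```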
Device & dtype round-trip
Inputs are moved to CPU for remote execution when needed, and the final tensor is returned on the original device and with the original dtype of your input (e.g. a CUDA input yields a CUDA output, ready for downstream ops).
Offload Policy & Local Overrides
- By default, modules that provide ``export_config()`` are treated as quantum leaves and offloaded.
- Set ``layer.force_local = True`` to force local execution (useful for debugging and A/B comparisons).
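The offload rule amounts to: a module is offloaded iff it exposes ``export_config()`` and is not flagged ``force_local``. A minimal sketch with dummy modules; ``is_quantum_leaf`` is a hypothetical name and Merlin's real traversal may differ:

```python
class Classical:
    """Stand-in for an ordinary nn.Module with no quantum export."""

class Quantum:
    """Stand-in for a Merlin quantum layer."""
    def export_config(self):
        return {}

def is_quantum_leaf(module) -> bool:
    """Illustrative offload test: has export_config() and not forced local."""
    return (hasattr(module, "export_config")
            and not getattr(module, "force_local", False))

q = Quantum()
print(is_quantum_leaf(Classical()))  # False: no export_config()
print(is_quantum_leaf(q))            # True: offloaded by default
q.force_local = True
print(is_quantum_leaf(q))            # False: the local override wins
```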
Estimating Required Shots
Merlin includes a helper that proxies Perceval’s built-in estimator and does not submit jobs:
estimates = proc.estimate_required_shots_per_input(
layer=q,
input=X, # shape [B, D] or [D]
desired_samples_per_input=2_000,
)
# -> list[int] of length B (or 1 for a single vector).
# 0 means "not viable" under current platform/filters.
This is a planner only; it doesn’t modify processor state or job history.
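One way to turn the per-input estimates into a single ``nsample`` is to drop non-viable inputs (estimate 0) and take the maximum of the rest. This is a planning heuristic of our own, not something Merlin does for you; ``pick_nsample`` is a hypothetical helper:

```python
def pick_nsample(estimates: list[int], floor: int = 1_000) -> int:
    """Illustrative: choose one shot count that covers every viable input."""
    viable = [e for e in estimates if e > 0]  # 0 means "not viable"
    if not viable:
        raise ValueError("no viable inputs under current platform/filters")
    return max(max(viable), floor)

print(pick_nsample([1200, 0, 2500, 900]))  # 2500 covers the most demanding input
```

Pass the result as ``nsample`` to ``proc.forward(...)``.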
Timeouts & Errors
- Timeout: if a per-call or default timeout elapses, Merlin issues remote cancellation and raises ``TimeoutError``.
- Cancellation: ``fut.cancel_remote()`` or ``proc.cancel_all()`` causes pending chunk workers to raise ``CancelledError``; completed chunks are discarded for that call.
- Remote failures: if the backend marks a job as failed, Merlin raises a ``RuntimeError`` with the platform message. If the message indicates an explicit remote cancel, Merlin maps it to ``CancelledError``.
- Retries: transient failures (non-cancel, non-timeout) trigger up to 3 automatic retries per chunk with exponential backoff.
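The retry policy above (3 attempts, exponential backoff, no retry on cancellation or timeout) can be sketched as follows. Illustrative only, with hypothetical names and a zero base delay so the example runs instantly:

```python
import time
from concurrent.futures import CancelledError

def run_with_retries(call, max_retries: int = 3, base_delay: float = 0.0):
    """Illustrative: retry transient failures; let cancel/timeout escape."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except (CancelledError, TimeoutError):
            raise                                  # propagate immediately
        except Exception:
            if attempt == max_retries:
                raise                              # out of retries
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff

attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("transient backend error")
    return "ok"

print(run_with_retries(flaky))  # succeeds on the third attempt
```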
Multiple Quantum Layers
Sequential models with multiple quantum leaves are supported:
- Each quantum leaf is processed in order; each may chunk its inputs and run those chunks with its own intra-leaf concurrency (``chunk_concurrency``).
- ``future.job_ids`` includes all job IDs across all leaves.
Workflow Recipes
Mixed classical → quantum → classical
Works with both computation spaces — just adjust the output dimension:
from math import comb
from merlin.core.computation_space import ComputationSpace
from merlin.measurement.strategies import MeasurementStrategy
# UNBUNCHED: output dim = C(m, n)
q = QuantumLayer(
input_size=2,
builder=b, # your CircuitBuilder with n_modes=6
n_photons=2,
measurement_strategy=MeasurementStrategy.probs(
computation_space=ComputationSpace.UNBUNCHED,
),
).eval()
dist = comb(6, 2) # 15
# Or FOCK (bunched): output dim = C(m + n - 1, n)
# q = QuantumLayer(
# input_size=2, builder=b, n_photons=2,
# measurement_strategy=MeasurementStrategy.probs(
# computation_space=ComputationSpace.FOCK,
# ),
# ).eval()
# dist = comb(6 + 2 - 1, 2) # 21
model = nn.Sequential(
nn.Linear(3, 2, bias=False),
q,
nn.Linear(dist, 4, bias=False),
nn.Softmax(dim=-1),
).eval()
proc = MerlinProcessor(pcvl.RemoteProcessor("sim:slos"))
X = torch.rand(6, 3)
y = proc.forward(model, X, nsample=5000)
Gradient-free fine-tuning with COBYLA
No autograd through the quantum layer — optimise circuit parameters directly using SciPy:
from scipy.optimize import minimize
from merlin.core.computation_space import ComputationSpace
from merlin.measurement.strategies import MeasurementStrategy
q = QuantumLayer(
input_size=2,
builder=b,
n_photons=2,
measurement_strategy=MeasurementStrategy.probs(
computation_space=ComputationSpace.UNBUNCHED,
),
).eval()
dist = q(torch.rand(2, 2)).shape[1]
readout = nn.Linear(dist, 1, bias=False).eval()
pre = nn.Linear(3, 2, bias=False).eval()
model = nn.Sequential(pre, q, readout).eval()
# Flatten quantum params we will tune (keep classical layers fixed)
q_params = [(n, p) for n, p in q.named_parameters() if p.requires_grad]
shapes = [p.shape for _, p in q_params]
sizes = [p.numel() for _, p in q_params]
def get_flat():
return torch.cat([p.detach().flatten().cpu() for _, p in q_params], dim=0)
def set_from_flat(vec):
off = 0
with torch.no_grad():
for (_, p), sz, shp in zip(q_params, sizes, shapes, strict=False):
chunk = vec[off : off + sz].view(shp).to(p.dtype)
p.data.copy_(chunk.to(p.device))
off += sz
x0 = get_flat().double().numpy()
proc = MerlinProcessor(pcvl.RemoteProcessor("sim:slos"))
X = torch.rand(8, 3)
# Objective: maximise mean scalar output → minimise negative
def objective(v_np):
v = torch.from_numpy(v_np).to(torch.float64)
set_from_flat(v.to(torch.float32))
with torch.no_grad():
y = proc.forward(model, X, nsample=5000)
return -float(y.mean().item())
res = minimize(objective, x0, method="COBYLA",
options={"maxiter": len(x0) + 6, "rhobeg": 0.5})
print("final objective:", res.fun)
Local vs remote A/B (force simulation)
q = QuantumLayer(...).eval()
X = torch.rand(4, q.input_size)
proc = MerlinProcessor(pcvl.RemoteProcessor("sim:slos"))
# Remote path (offloaded)
y_remote = proc.forward(q, X, nsample=5000)
# Local path (force simulation)
q.force_local = True
y_local = proc.forward(q, X, nsample=5000)
# Compare distributions (allowing some sampling noise)
print((y_local - y_remote).abs().mean())
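With finite shots the two outputs differ by sampling noise; a common way to quantify the gap between two per-row distributions is total variation distance. A self-contained sketch on plain lists (substitute rows of the tensors from the snippet above):

```python
def total_variation(p: list[float], q: list[float]) -> float:
    """Half the L1 distance between two probability vectors; 0 means identical."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

p = [0.50, 0.30, 0.20]
q = [0.48, 0.33, 0.19]
print(round(total_variation(p, q), 3))  # small value: distributions agree
```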
Monitoring status & safe cancellation
fut = proc.forward_async(q, torch.rand(16, 2), nsample=40000, timeout=None)
# Poll status (state/progress/message + chunk counters)
print(fut.status())
# If needed, cancel cooperatively
fut.cancel_remote()
try:
_ = fut.wait()
except Exception as e:
print("Cancelled:", type(e).__name__)
High-throughput batching with chunking
proc = MerlinProcessor(
pcvl.RemoteProcessor("sim:slos"),
microbatch_size=8,
chunk_concurrency=2,
)
X = torch.rand(64, 2)
fut = proc.forward_async(q, X, nsample=3000)
Y = fut.wait()
print("chunks:", fut.status())
Scaleway session with context manager
import os
import perceval.providers.scaleway as scw
with scw.Session(
"sim:ascella",
project_id=os.environ["SCW_PROJECT_ID"],
token=os.environ["SCW_SECRET_KEY"],
deduplication_id="my-training-run",
max_idle_duration_s=300,
max_duration_s=1800,
) as session:
with MerlinProcessor(session=session, timeout=300.0) as proc:
q = QuantumLayer(...).eval()
y = proc.forward(q, X, nsample=1000)
# ...
# MerlinProcessor context manager cancels any stray jobs on exit.
# Scaleway session is closed on exit.
Troubleshooting
- No job IDs appear: your backend may be very fast, or your layer ran locally (e.g. ``force_local=True``).
- "Lowered max_samples" warning from Perceval: this means ``nsample`` exceeded ``max_shots_per_call``. Merlin now auto-raises the cap, but if you see this with an older version, set ``max_shots_per_call`` >= your ``nsample``.
- Timeouts in CI: backends vary. Make tests resilient to fast or slow responses by polling ``future.done()`` before asserting on timeout exceptions.
API Reference (Summary)
Constructor
MerlinProcessor(remote_processor=None, session=None, microbatch_size=32, timeout=3600.0, max_shots_per_call=None, chunk_concurrency=1)
Execution
- ``forward(module, input, *, nsample=None, timeout=None) -> torch.Tensor``
- ``forward_async(module, input, *, nsample=None, timeout=None) -> Future``
- ``future.job_ids: list[str]``
- ``future.status() -> dict``
- ``future.cancel_remote() -> None``
Lifecycle
- ``cancel_all() -> None``
- Context manager (``with MerlinProcessor(...) as proc:``)
Estimation
estimate_required_shots_per_input(layer, input, desired_samples_per_input) -> list[int]
History
- ``get_job_history() -> list[RemoteJob]``
- ``clear_job_history() -> None``
Version Notes
- ``session`` parameter added for ``ISession``-based backends (Scaleway). Exactly one of ``remote_processor`` or ``session`` must be provided. Both paths now support chunking and ``chunk_concurrency``; each chunk gets an independent ``RemoteProcessor`` via ``session.build_remote_processor()``.
- ``MeasurementStrategy.probs(computation_space=...)`` replaces the older ``no_bunching`` flag and the bare ``computation_space`` parameter on ``QuantumLayer``. Both ``ComputationSpace.FOCK`` (bunched) and ``ComputationSpace.UNBUNCHED`` are fully supported for cloud execution.
- Default ``chunk_concurrency`` is 1 (serial intra-leaf).
- Failed chunks are retried up to 3 times with exponential backoff; cancellation and timeout errors propagate immediately.
- ``max_shots_per_call`` is automatically raised to match ``nsample`` when needed, preventing Perceval from silently clamping the sample count.