Quantum Self-Supervised Learning (QSSL)

Paper Information

Title: Quantum Self-Supervised Learning

Authors: B. Jaderberg, L. W. Anderson, W. Xie, S. Albanie, M. Kiffner, D. Jaksch

Published: Quantum Science and Technology, Volume 7, Number 3 (2022)

DOI: 10.1088/2058-9565/ac6825

Paper URL: arXiv:2103.14653

Reproduction Status: ✅ Complete

Reproducer: Cassandre Notton (cassandre.notton@quandela.com)

Project Repository

Abstract

This reproduction studies the qSSL framework proposed by Jaderberg et al., where a quantum representation layer is trained inside a self-supervised SimCLR-style pipeline. The model uses two augmented views, InfoNCE contrastive loss, and linear evaluation on frozen representations.

The MerLin reproduction keeps the same high-level design and compares three representation backends under a shared training loop: Qiskit gate-model QNN, MerLin/Perceval photonic QNN, and a classical MLP baseline.

Significance

The original paper highlights self-supervised learning as a promising regime for practical quantum advantage because of the representational capacity required by SSL objectives. This reproduction validates that claim in the MerLin ecosystem and extends it with a photonic implementation that is competitive with, or better than, the compared baselines in short-training settings.

MerLin Implementation

The implementation follows the standard qSSL recipe:

  • ResNet18 encoder (non-pretrained)

  • Linear compression from the 512 ResNet18 features down to the representation width

  • Representation block selected among merlin, qiskit, and classical

  • 2-layer projection head with BatchNorm

  • InfoNCE training on two augmented views, followed by frozen-encoder linear probing
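The InfoNCE step of the recipe can be sketched as a standard NT-Xent loss over the two augmented views. This is a minimal illustrative implementation (not the repository's exact code), assuming L2-normalized projections and the temperature tau=0.07 reported below:

```python
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """NT-Xent / InfoNCE loss over two batches of projections, each (N, D)."""
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, D), unit norm
    sim = z @ z.t() / tau                               # cosine similarities / temperature
    sim.fill_diagonal_(float("-inf"))                   # exclude self-similarity
    # positives: the i-th view-1 sample pairs with the i-th view-2 sample
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)

# usage: loss = info_nce(proj(view1_features), proj(view2_features))
```

Matching views produce a large positive-pair logit (1/tau) relative to random negatives, which is what drives the representations of the two augmentations together.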

How QuantumLayer is used (MerLin backend)

In papers/qSSL/lib/model.py, QuantumLayer is used as the MerLin representation_network:

# __init__: build the MerLin representation block
self.circuit = create_quantum_circuit(modes=self.modes, feature_size=self.width)
input_state = [(i + 1) % 2 for i in range(args.modes)]

self.representation_network = QuantumLayer(
    input_size=self.width,
    circuit=self.circuit,
    trainable_parameters=[
        p.name for p in self.circuit.get_parameters()
        if not p.name.startswith("feature")
    ],
    input_parameters=["feature"],
    input_state=input_state,
    computation_space=ComputationSpace.UNBUNCHED,
    measurement_strategy=MeasurementStrategy.PROBABILITIES,
)

# forward: encoder -> quantum layer -> projection head
x1 = self.comp(self.backbone(y1))
x1 = torch.sigmoid(x1) * (1 / torch.pi)  # MerLin scaling
z1 = self.representation_network(x1)
z1 = self.proj(z1)
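The input-state construction and the "MerLin scaling" step above can be checked in isolation. A standalone sketch, with modes = 8 as an illustrative value for args.modes:

```python
import torch

modes = 8  # illustrative value for args.modes
input_state = [(i + 1) % 2 for i in range(modes)]
# alternating single photons: one photon in every even-indexed mode
print(input_state)  # [1, 0, 1, 0, 1, 0, 1, 0]

# the scaling squashes the compressed image features into the open interval (0, 1/pi)
x = torch.randn(4, 16)
scaled = torch.sigmoid(x) * (1 / torch.pi)
```

So half the modes carry a photon, and every feature fed to the circuit's phase parameters is bounded before encoding.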

How to read this flow:

  • feature-* circuit parameters receive the compressed image features.

  • Non-feature parameters are trainable variational parameters.

  • QuantumLayer output (probability features) is the representation used by InfoNCE after the projection head.
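The length of that probability-feature vector is set by the computation space. Assuming the unbunched space enumerates Fock states with at most one photon per mode, while the full (bunched) space allows any occupation (the exact MerLin convention may differ), the dimensions for the alternating input state can be computed as:

```python
from math import comb

def output_dim(modes: int, photons: int, unbunched: bool) -> int:
    """Number of output probabilities for n photons in m modes."""
    if unbunched:
        return comb(modes, photons)            # at most one photon per mode
    return comb(modes + photons - 1, photons)  # any occupation (bunched)

modes = 8
photons = sum((i + 1) % 2 for i in range(modes))    # 4 photons in the alternating state
print(output_dim(modes, photons, unbunched=True))   # 70
print(output_dim(modes, photons, unbunched=False))  # 330
```

This is why the no_bunching setting changes both the representation size and, indirectly, the projection-head parameter count compared below.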

Default training settings used in reproduction runs:

  • Batch size: 256

  • Optimizer: Adam (betas=(0.9, 0.999), lr=1e-3, weight_decay=1e-5)

  • Temperature: tau=0.07

qSSL model architecture

qSSL architecture used in the reproduction (shared encoder + selectable representation network + projection head).

Key Contributions Reproduced

Unified backend comparison
  • Reproduced qSSL with Qiskit, MerLin photonic, and classical representations under the same training/evaluation stack.

  • Preserved the CIFAR-10 restricted-label setup used in the original work.

Photonic qSSL implementation
  • Replaced the gate-model quantum layer with a photonic interferometer implementation in MerLin.

  • Evaluated multiple mode counts and both no_bunching settings.

End-to-end reproducibility
  • Produced SSL losses, linear-probe accuracies, checkpoints, and summary metrics for each run.

  • Added pretrained checkpoint support and linear-probing utilities.

Experimental Results

Note

The original paper reports additional batch-level diagnostics (recorded every 256-image batch) beyond the main loss curve: average Hilbert-Schmidt distance between positive and negative pairs, mean positive-pair clustering, mean negative-pair clustering, and ensemble inter-cluster overlap. These diagnostics are not yet included in this reproduction page and will be added in a future update.
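For reference, the Hilbert-Schmidt distance named above is D_HS(rho, sigma) = Tr[(rho - sigma)^2] for density matrices rho and sigma. A minimal NumPy sketch of the metric itself (not the paper's batch-level averaging):

```python
import numpy as np

def hilbert_schmidt_distance(rho: np.ndarray, sigma: np.ndarray) -> float:
    """D_HS(rho, sigma) = Tr[(rho - sigma)^2] for density matrices."""
    delta = rho - sigma
    return float(np.trace(delta @ delta).real)

# illustrative pure-state density matrices from orthogonal unit vectors
psi = np.array([1.0, 0.0])
phi = np.array([0.0, 1.0])
rho = np.outer(psi, psi.conj())
sigma = np.outer(phi, phi.conj())
print(hilbert_schmidt_distance(rho, sigma))  # 2.0 for orthogonal pure states
```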

Original paper headline results (5 CIFAR-10 classes)

| Setting | Classical SSL | Quantum SSL (statevector) | Quantum SSL (sampling) |
| --- | --- | --- | --- |
| Simulation | 43.49 ± 1.31 | 46.51 ± 1.37 | 46.34 ± 2.07 (100 shots) |
| IBM QPU (27 qubits) | n/a | n/a | 47.00 |

Reproduced results

CIFAR10 (5 classes), linear probing accuracy

| Epochs | Classes | Qiskit based | Classical SSL | Quantum SSL (no_bunching=False) | Quantum SSL (no_bunching=True) |
| --- | --- | --- | --- | --- | --- |
| 2 | 5 | 48.37, #32, x0.08/x0.008 | 48.08, #144, x1/x1 | 8 modes: 49.22 (#184, x0.97/x0.95); 10 modes: 47.28 (#320, x0.89/x0.88); 12 modes: 46.46 (#488, x0.83/x0.65) | 8 modes: 45.58 (#184, x0.97/x0.97); 10 modes: 45.58 (#320, x0.97/x0.93); 12 modes: 45.76 (#488, x0.94/x0.82) |
| 5 | 5 | 47.88 | 49.04 | 8 modes: 49.9; 10 modes: 51.12; 12 modes: 50.64 | 8 modes: 49.3; 10 modes: 48.86; 12 modes: 51.74 |

Legend:

  • #...: number of parameters in the representation network

  • x.../x...: forward/backward speed-up relative to the classical baseline

Additional 10-epoch benchmark results

| CIFAR10 classes | Classical SSL | Quantum SSL (no_bunching=False) | Quantum SSL (no_bunching=True) |
| --- | --- | --- | --- |
| 2 | 81.4 | 88.65 (= param); with batch norm: 89.7 (= param); 14 modes: 90.35; 12 modes: 89.95; 10 modes: 87.8 | with batch norm: 14 modes: 50.6; 12 modes: 53.3; 10 modes: 51.7; 14 modes: 80.35; 12 modes: 86.7; 10 modes: 84.15 |
| 5 | 58.3 and 48.64 | 61.22 (= param); 14 modes: 56.78; 12 modes: 58.94; 10 modes: 61.56 | 14 modes: 64.14; 12 modes: 59.7; 10 modes: 62.44 |

Parameter counts (representation network)

| Modes | Classical baseline | no_bunching=False | no_bunching=True |
| --- | --- | --- | --- |
| 10 | 11,182,034 | 11,202,150 (diff 0.18%) | 11,184,650 (diff 0.02%) |
| 12 | 11,182,034 | 11,306,058 (diff 1.11%) | 11,191,538 (diff 0.08%) |
| 14 | 11,182,034 | 11,957,698 (diff 6.94%) | 11,216,818 (diff 0.30%) |
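The "diff" percentages above are the relative parameter overhead of each quantum variant with respect to the classical baseline, which can be recomputed directly:

```python
def param_diff_pct(quantum: int, classical: int) -> float:
    """Relative parameter overhead of the quantum model vs the classical baseline, in %."""
    return 100.0 * (quantum - classical) / classical

classical = 11_182_034
print(round(param_diff_pct(11_202_150, classical), 2))  # 0.18  (10 modes, no_bunching=False)
print(round(param_diff_pct(11_184_650, classical), 2))  # 0.02  (10 modes, no_bunching=True)
```

The unbunched variants stay within a fraction of a percent of the baseline, so accuracy differences are not explained by extra capacity.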

Training curves

SSL training losses for qSSL backends

SSL training losses over epochs.

Fine-tuning losses and accuracies for qSSL backends

Fine-tuning losses and linear-probe accuracies.

Citation

@article{jaderberg2022quantum,
  title={Quantum self-supervised learning},
  author={Jaderberg, Ben and Anderson, Lewis W and Xie, Weidi and Albanie, Samuel and Kiffner, Martin and Jaksch, Dieter},
  journal={Quantum Science and Technology},
  volume={7},
  number={3},
  pages={035005},
  year={2022},
  publisher={IOP Publishing}
}