{ "cells": [ { "cell_type": "markdown", "source": [ "## Quantum-Enhanced Language Model Fine-Tuning with Merlin\n", "\n", "This notebook demonstrates how to fine-tune language models using quantum photonic circuits as classification heads. We compare classical approaches (Logistic Regression, SVM, MLP) with quantum photonic classifiers implemented using the Merlin framework.\n", "\n", "\n", " ## 1. Setup and Imports\n", "\n", " First, we'll import all necessary libraries and set up our environment. This includes:\n", " - PyTorch for neural network operations\n", " - SetFit for few-shot learning\n", " - Merlin for quantum photonic circuit simulation\n", " - Standard ML libraries for evaluation and data handling\n", "\n" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "source": [ "import numpy as np\n", "from torch.utils.data import DataLoader, TensorDataset\n", "import torch\n", "import torch.nn as nn\n", "from tqdm import tqdm\n", "from sklearn.metrics import accuracy_score\n", "from sklearn.base import BaseEstimator, ClassifierMixin\n", "import merlin as ML # Using our Merlin framework\n", "import math\n", "import json\n", "import os\n", "from datasets import load_dataset\n", "from setfit import SetFitModel, sample_dataset\n", "from sklearn.svm import SVC\n", "\n", "# Set random seeds for reproducibility\n", "torch.manual_seed(42)\n", "np.random.seed(42)\n", "\n", "# Check GPU availability\n", "device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n", "print(f\"Using device: {device}\")\n" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2025-06-09T15:50:27.885549Z", "start_time": "2025-06-09T15:50:26.092264Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using device: cpu\n" ] } ], "execution_count": 2 }, { "cell_type": "markdown", "source": [ "\n", " ## 2.
Model Wrapper for Sentence Transformers\n", "\n", " The `ModelWrapper` class provides a unified interface for handling tokenization and forward passes with sentence transformer models. This abstraction allows us to work with different model architectures seamlessly.\n" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "source": [ "class ModelWrapper(nn.Module):\n", " def __init__(self, model):\n", " super(ModelWrapper, self).__init__()\n", " self.model = model\n", "\n", " def tokenize(self, texts):\n", " \"\"\"\n", " Delegates tokenization to the underlying model.\n", "\n", " Args:\n", " texts (List[str]): List of text strings to tokenize\n", "\n", " Returns:\n", " Dict or Tensor: Tokenized inputs in the format expected by the model\n", " \"\"\"\n", " try:\n", " # Try to use the tokenize method of the underlying model\n", " return self.model.tokenize(texts)\n", " except AttributeError:\n", " # If the model doesn't have a tokenize method, try alternative approaches\n", " if hasattr(self.model, 'tokenizer'):\n", " return self.model.tokenizer(texts, return_tensors='pt', padding=True, truncation=True)\n", " elif hasattr(self.model, '_first_module') and hasattr(self.model._first_module, 'tokenizer'):\n", " return self.model._first_module.tokenizer(texts, return_tensors='pt', padding=True, truncation=True)\n", " else:\n", " raise ValueError(\n", " \"Unable to tokenize texts with this model. 
Please provide a model that has a tokenize or tokenizer method.\")\n", "\n", " def forward(self, inputs):\n", " \"\"\"\n", " Process inputs through the model to get embeddings.\n", "\n", " Args:\n", " inputs: Can be raw text strings or pre-tokenized inputs\n", "\n", " Returns:\n", " torch.Tensor: The sentence embeddings\n", " \"\"\"\n", " try:\n", " # Handle different input formats\n", " if isinstance(inputs, dict) and all(isinstance(v, torch.Tensor) for v in inputs.values()):\n", " outputs = self.model(inputs)\n", " elif isinstance(inputs, list) and all(isinstance(t, str) for t in inputs):\n", " tokenized = self.tokenize(inputs)\n", " device = next(self.model.parameters()).device\n", " tokenized = {k: v.to(device) for k, v in tokenized.items()}\n", " outputs = self.model(tokenized)\n", " else:\n", " outputs = self.model(inputs)\n", "\n", " # Extract embeddings from various output formats\n", " if isinstance(outputs, dict) and \"sentence_embedding\" in outputs:\n", " return outputs[\"sentence_embedding\"]\n", " elif isinstance(outputs, dict) and \"pooler_output\" in outputs:\n", " return outputs[\"pooler_output\"]\n", " elif isinstance(outputs, tuple) and len(outputs) > 0:\n", " return outputs[0]\n", " else:\n", " return outputs\n", " except Exception as e:\n", " raise ValueError(f\"Error during forward pass: {str(e)}\")\n" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2025-06-09T15:50:30.621509Z", "start_time": "2025-06-09T15:50:30.616178Z" } }, "outputs": [], "execution_count": 3 }, { "cell_type": "markdown", "source": [ "\n", " ## 3. 
Evaluation Function\n", "\n", " This function evaluates a SetFit model on given texts and labels, processing data in batches for efficiency.\n" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "source": [ "def evaluate(model, texts, labels):\n", " \"\"\"\n", " Evaluate SetFit model on given texts and labels.\n", "\n", " Args:\n", " model: SetFit model with a trained classification head\n", " texts: List of text strings to classify\n", " labels: True labels for evaluation\n", "\n", " Returns:\n", " tuple: (accuracy, predictions)\n", " \"\"\"\n", " batch_size = 16\n", " num_samples = len(texts)\n", " num_batches = (num_samples + batch_size - 1) // batch_size\n", "\n", " all_embeddings = []\n", "\n", " with torch.no_grad():\n", " for batch_idx in range(num_batches):\n", " start_idx = batch_idx * batch_size\n", " end_idx = min(start_idx + batch_size, num_samples)\n", "\n", " batch_texts = texts[start_idx:end_idx]\n", "\n", " # Get embeddings\n", " batch_embeddings = model.model_body.encode(batch_texts, convert_to_tensor=True)\n", " batch_embeddings_cpu = batch_embeddings.detach().cpu().numpy()\n", "\n", " all_embeddings.extend(batch_embeddings_cpu)\n", "\n", " # Use the classification head to predict\n", " predictions = model.model_head.predict(np.array(all_embeddings))\n", " accuracy = accuracy_score(labels, predictions)\n", " return accuracy, predictions" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2025-06-09T15:50:32.270898Z", "start_time": "2025-06-09T15:50:32.267479Z" } }, "outputs": [], "execution_count": 4 }, { "cell_type": "markdown", "source": [ "\n", " ## 4. Classical Classification Heads\n", "\n", " ### 4.1 MLP Classifier\n", "\n", " We implement a 3-layer Multi-Layer Perceptron (MLP) as one of our classical baselines. 
The architecture includes:\n", " - Input layer matching the embedding dimension (768 for the `paraphrase-mpnet-base-v2` model used here)\n", " - Two hidden layers with ReLU activation and dropout\n", " - Output layer for classification\n", "\n" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "source": [ "class MLPClassifier(nn.Module):\n", " \"\"\"3-layer MLP classifier with dropout regularization\"\"\"\n", "\n", " def __init__(self, input_dim, hidden_dim=100, num_classes=2):\n", " super(MLPClassifier, self).__init__()\n", " self.layers = nn.Sequential(\n", " nn.Linear(input_dim, hidden_dim),\n", " nn.ReLU(),\n", " nn.Dropout(0.2),\n", " nn.Linear(hidden_dim, hidden_dim // 2),\n", " nn.ReLU(),\n", " nn.Dropout(0.2),\n", " nn.Linear(hidden_dim // 2, num_classes)\n", " )\n", "\n", " def forward(self, x):\n", " return self.layers(x)\n", "\n", "\n", "class MLPClassifierWrapper(BaseEstimator, ClassifierMixin):\n", " \"\"\"Scikit-learn compatible wrapper for the MLP classifier\"\"\"\n", "\n", " def __init__(self, input_dim=768, hidden_dim=100, num_classes=2,\n", " lr=0.001, epochs=100, batch_size=32, device=None):\n", " self.input_dim = input_dim\n", " self.hidden_dim = hidden_dim\n", " self.num_classes = num_classes\n", " self.lr = lr\n", " self.epochs = epochs\n", " self.batch_size = batch_size\n", " self.device = device if device else ('cuda' if torch.cuda.is_available() else 'cpu')\n", " self.model = None\n", " self.classes_ = None\n", "\n", " def fit(self, X, y):\n", " \"\"\"Train the MLP classifier\"\"\"\n", " # Convert numpy arrays to PyTorch tensors\n", " X = torch.tensor(X, dtype=torch.float32).to(self.device)\n", "\n", " # Store unique classes\n", " self.classes_ = np.unique(y)\n", " y_tensor = torch.tensor(y, dtype=torch.long).to(self.device)\n", "\n", " # Initialize the model\n", " self.model = MLPClassifier(\n", " input_dim=self.input_dim,\n", " hidden_dim=self.hidden_dim,\n", " num_classes=len(self.classes_)\n", " ).to(self.device)\n", "\n", " print(f\"Number of
parameters in MLP head: {sum([p.numel() for p in self.model.parameters()])}\")\n", "\n", " # Define loss function and optimizer\n", " criterion = nn.CrossEntropyLoss()\n", " optimizer = torch.optim.Adam(self.model.parameters(), lr=self.lr)\n", "\n", " # Training loop\n", " self.model.train()\n", " for epoch in range(self.epochs):\n", " # Mini-batch training\n", " indices = torch.randperm(len(X))\n", " total_loss = 0\n", "\n", " for i in range(0, len(X), self.batch_size):\n", " batch_indices = indices[i:i + self.batch_size]\n", " batch_X = X[batch_indices]\n", " batch_y = y_tensor[batch_indices]\n", "\n", " # Forward pass\n", " outputs = self.model(batch_X)\n", " loss = criterion(outputs, batch_y)\n", "\n", " # Backward pass and optimize\n", " optimizer.zero_grad()\n", " loss.backward()\n", " optimizer.step()\n", "\n", " total_loss += loss.item()\n", "\n", " # Print progress\n", " if (epoch + 1) % 10 == 0:\n", " # Average over the number of mini-batches actually processed\n", " avg_loss = total_loss / math.ceil(len(X) / self.batch_size)\n", " print(f'Epoch [{epoch + 1}/{self.epochs}], Loss: {avg_loss:.4f}')\n", "\n", " return self\n", "\n", " def predict(self, X):\n", " \"\"\"Predict classes for samples\"\"\"\n", " X_tensor = torch.tensor(X, dtype=torch.float32).to(self.device)\n", "\n", " self.model.eval()\n", " with torch.no_grad():\n", " outputs = self.model(X_tensor)\n", " _, predicted = torch.max(outputs, 1)\n", " return self.classes_[predicted.cpu().numpy()]\n", "\n", " def predict_proba(self, X):\n", " \"\"\"Predict class probabilities\"\"\"\n", " X_tensor = torch.tensor(X, dtype=torch.float32).to(self.device)\n", "\n", " self.model.eval()\n", " with torch.no_grad():\n", " outputs = self.model(X_tensor)\n", " probabilities = torch.softmax(outputs, dim=1).cpu().numpy()\n", " return probabilities\n" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2025-06-09T15:50:38.502479Z", "start_time": "2025-06-09T15:50:38.495880Z" } }, "outputs": [], "execution_count": 5 }, { "cell_type": "markdown", "source": [ "\n", " ###
4.2 Helper Function to Replace SetFit Head\n", "\n", " This function allows us to easily swap the default classification head with our custom MLP.\n", "\n" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "source": [ "def replace_setfit_head_with_mlp(model, input_dim=768, hidden_dim=100, num_classes=2, epochs=100):\n", " \"\"\"Replace the classification head of a SetFitModel with an MLP.\"\"\"\n", " # Get the device the model is on\n", " device = next(model.model_body.parameters()).device\n", "\n", " # Create new MLP head\n", " mlp_head = MLPClassifierWrapper(\n", " input_dim=input_dim,\n", " hidden_dim=hidden_dim,\n", " num_classes=num_classes,\n", " epochs=epochs,\n", " lr=0.001,\n", " device=device\n", " )\n", "\n", " # Replace the model head\n", " model.model_head = mlp_head\n", "\n", " return model" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2025-06-09T15:50:40.469419Z", "start_time": "2025-06-09T15:50:40.467067Z" } }, "outputs": [], "execution_count": 6 }, { "cell_type": "markdown", "source": [ "\n", " ## 5. Quantum Classification Head\n", "\n", " ### 5.1 Quantum Photonic Classifier\n", "\n", " The quantum classifier uses a photonic interferometer implemented with the Merlin framework. The architecture consists of:\n", "\n", " 1. **Downscaling layer**: Reduces the embedding dimension to match quantum circuit requirements\n", " 2. **Quantum photonic circuit**: Processes the downscaled features through quantum interference\n", " 3. 
**Output layer**: Maps quantum measurements to class predictions\n", "\n", " The quantum circuit parameters:\n", " - **Modes**: Number of optical modes in the interferometer\n", " - **Photons**: Number of photons in the input state\n", " - **Input state**: Distribution of photons across modes\n", "\n" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "source": [ "class QuantumClassifier(nn.Module):\n", " def __init__(self, input_dim, hidden_dim=100, modes=10, num_classes=2, input_state=None):\n", " super(QuantumClassifier, self).__init__()\n", "\n", " # This layer downscales the inputs to fit in the QLayer\n", " self.downscaling_layer = nn.Linear(input_dim, hidden_dim)\n", "\n", " # Building the QLayer with Merlin\n", " experiment = ML.PhotonicBackend(\n", " circuit_type=ML.CircuitType.SERIES,\n", " n_modes=modes,\n", " n_photons=sum(input_state) if input_state else modes // 2,\n", " state_pattern=ML.StatePattern.PERIODIC\n", " )\n", "\n", " # Default input state\n", " if input_state is None:\n", " input_state = [(i + 1) % 2 for i in range(modes)]\n", "\n", " photons_count = sum(input_state)\n", " # PNR (Photon Number Resolving) output size\n", " #output_size_slos = math.comb(modes + photons_count - 1, photons_count)\n", "\n", " # Create ansatz for the quantum layer\n", " ansatz = ML.AnsatzFactory.create(\n", " PhotonicBackend=experiment,\n", " input_size=hidden_dim,\n", " # output_size=output_size_slos,\n", " output_mapping_strategy=ML.OutputMappingStrategy.NONE\n", " )\n", "\n", " # Build the QLayer using Merlin\n", " self.q_circuit = ML.QuantumLayer(input_size=hidden_dim, ansatz=ansatz)\n", "\n", " # Linear output layer as in the original paper\n", " self.output_layer = nn.Linear(self.q_circuit.output_size, num_classes)\n", "\n", " def forward(self, x):\n", " # Forward pass through the quantum-classical hybrid\n", " x = self.downscaling_layer(x)\n", " x = torch.sigmoid(x) # Normalize for quantum layer\n", " x = self.q_circuit(x)\n", " return 
self.output_layer(x)" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2025-06-09T15:50:41.880819Z", "start_time": "2025-06-09T15:50:41.877252Z" } }, "outputs": [], "execution_count": 7 }, { "cell_type": "markdown", "source": [ "\n", "\n", " ### 5.2 Quantum Layer Training Wrapper\n", "\n", " This wrapper provides scikit-learn compatible training for the quantum classifier, including proper initialization, training loops, and prediction methods.\n" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "source": [ "class QLayerTraining(BaseEstimator, ClassifierMixin):\n", " def __init__(self, input_dim=768, hidden_dim=100, modes=10, num_classes=2,\n", " dropout_rate=0.2, lr=0.001, weight_decay=1e-5,\n", " epochs=100, batch_size=32, device=None, input_state=None):\n", " self.input_dim = input_dim\n", " self.hidden_dim = hidden_dim\n", " self.modes = modes\n", " self.input_state = input_state\n", " self.num_classes = num_classes\n", " self.dropout_rate = dropout_rate\n", " self.lr = lr\n", " self.weight_decay = weight_decay\n", " self.epochs = epochs\n", " self.batch_size = batch_size\n", " self.device = device if device else ('cuda' if torch.cuda.is_available() else 'cpu')\n", "\n", " # Initialize model\n", " self.model = None\n", " self.classes_ = None\n", " self.is_fitted_ = False\n", " # Training history\n", " self.history = {\n", " 'train_loss': [],\n", " 'val_loss': [],\n", " 'val_accuracy': []\n", " }\n", "\n", " def _initialize_model(self):\n", " \"\"\"Initialize or re-initialize the model.\"\"\"\n", " self.model = QuantumClassifier(\n", " input_dim=self.input_dim,\n", " hidden_dim=self.hidden_dim,\n", " modes=self.modes,\n", " num_classes=len(self.classes_),\n", " input_state=self.input_state,\n", " ).to(self.device)\n", "\n", " print(f\"Number of parameters in Quantum head: {sum([p.numel() for p in self.model.parameters()])}\")\n", "\n", " def _train_epoch(self, train_loader, criterion, optimizer):\n", " \"\"\"Train for one 
epoch.\"\"\"\n", " self.model.train()\n", " epoch_loss = 0\n", " for X_batch, y_batch in train_loader:\n", " X_batch, y_batch = X_batch.to(self.device), y_batch.to(self.device)\n", "\n", " # Forward pass\n", " outputs = self.model(X_batch)\n", " loss = criterion(outputs, y_batch)\n", "\n", " # Backward pass and optimizer step\n", " optimizer.zero_grad()\n", " loss.backward()\n", " optimizer.step()\n", "\n", " epoch_loss += loss.item()\n", "\n", " return epoch_loss / len(train_loader)\n", "\n", " def fit(self, X, y):\n", " \"\"\"Train the QLayer with a manual training loop.\"\"\"\n", " # Store classes\n", " self.classes_ = np.unique(y)\n", "\n", " # Initialize model\n", " self._initialize_model()\n", "\n", " # Convert to PyTorch tensors\n", " X_tensor = torch.tensor(X, dtype=torch.float32)\n", " y_tensor = torch.tensor(y, dtype=torch.long)\n", " train_dataset = TensorDataset(X_tensor, y_tensor)\n", " train_loader = DataLoader(train_dataset, batch_size=self.batch_size, shuffle=True)\n", "\n", " # Loss function and optimizer\n", " criterion = nn.CrossEntropyLoss()\n", " optimizer = torch.optim.Adam(\n", " self.model.parameters(),\n", " lr=self.lr,\n", " weight_decay=self.weight_decay\n", " )\n", "\n", " # Training loop\n", " for epoch in range(self.epochs):\n", " # Train for one epoch\n", " train_loss = self._train_epoch(train_loader, criterion, optimizer)\n", " self.history['train_loss'].append(train_loss)\n", "\n", " if (epoch + 1) % 50 == 0:\n", " print(f'Epoch {epoch + 1}/{self.epochs}, Train Loss: {train_loss:.4f}')\n", "\n", " self.is_fitted_ = True\n", " return self\n", "\n", " def predict(self, X):\n", " \"\"\"Predict class labels for samples in X.\"\"\"\n", " self._check_is_fitted()\n", " X_tensor = torch.tensor(X, dtype=torch.float32).to(self.device)\n", "\n", " self.model.eval()\n", " with torch.no_grad():\n", " outputs = self.model(X_tensor)\n", " _, predicted = torch.max(outputs, 1)\n", "\n", " return self.classes_[predicted.cpu().numpy()]\n", "\n", " def 
predict_proba(self, X):\n", " \"\"\"Predict class probabilities for samples in X.\"\"\"\n", " self._check_is_fitted()\n", " X_tensor = torch.tensor(X, dtype=torch.float32).to(self.device)\n", "\n", " self.model.eval()\n", " with torch.no_grad():\n", " outputs = self.model(X_tensor)\n", " probabilities = torch.softmax(outputs, dim=1).cpu().numpy()\n", "\n", " return probabilities\n", "\n", " def _check_is_fitted(self):\n", " \"\"\"Check if model is fitted.\"\"\"\n", " if not self.is_fitted_ or self.model is None:\n", " raise ValueError(\"This model has not been fitted yet. Call 'fit' before using this method.\")" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2025-06-09T15:50:43.118650Z", "start_time": "2025-06-09T15:50:43.113324Z" } }, "outputs": [], "execution_count": 8 }, { "cell_type": "markdown", "source": [ "\n", " ### 5.3 Helper Function for Quantum SetFit Models\n" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "source": [ "def create_setfit_with_q_layer(model, input_dim=768, hidden_dim=100, modes=10,\n", " num_classes=2, epochs=100, input_state=None):\n", " \"\"\"\n", " Replace the classification head of a SetFit model with a quantum layer.\n", "\n", " Args:\n", " model: SetFit model to modify\n", " input_dim: Dimension of input embeddings\n", " hidden_dim: Dimension after downscaling\n", " modes: Number of modes in the quantum circuit\n", " num_classes: Number of output classes\n", " epochs: Training epochs for the quantum head\n", " input_state: Photon distribution across modes\n", "\n", " Returns:\n", " Modified SetFit model with quantum classification head\n", " \"\"\"\n", " # Replace model head with QLayer\n", " model.model_head = QLayerTraining(\n", " input_dim=input_dim,\n", " hidden_dim=hidden_dim,\n", " modes=modes,\n", " num_classes=num_classes,\n", " epochs=epochs,\n", " input_state=input_state\n", " )\n", "\n", " return model" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": 
"2025-06-09T15:51:01.989410Z", "start_time": "2025-06-09T15:51:01.985275Z" } }, "outputs": [], "execution_count": 10 }, { "cell_type": "markdown", "source": [ "\n", " ## 6. Utility Functions\n", "\n", " ### 6.1 Results Storage\n", "\n", " Function to save experimental results in JSON format for later analysis.\n", "\n" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "source": [ "def save_experiment_results(results, filename='ft-qllm_exp.json'):\n", " \"\"\"\n", " Append experiment results to a JSON file.\n", "\n", " Args:\n", " results (dict): Dictionary containing experiment results\n", " filename (str): Path to the JSON file to store results\n", " \"\"\"\n", " filename = os.path.join(\"./results\", filename)\n", "\n", " # Create results directory if it doesn't exist\n", " os.makedirs(\"./results\", exist_ok=True)\n", "\n", " # Check if file exists and load existing data\n", " if os.path.exists(filename):\n", " try:\n", " with open(filename, 'r') as file:\n", " all_results = json.load(file)\n", " except json.JSONDecodeError:\n", " all_results = []\n", " else:\n", " all_results = []\n", "\n", " # Append new results\n", " all_results.append(results)\n", "\n", " # Write updated data back to file\n", " with open(filename, 'w') as file:\n", " json.dump(all_results, file, indent=4)\n", "\n", " print(f\"Results saved. Total experiments: {len(all_results)}\")\n", " return len(all_results)" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2025-06-09T15:51:03.391059Z", "start_time": "2025-06-09T15:51:03.388003Z" } }, "outputs": [], "execution_count": 11 }, { "cell_type": "markdown", "source": [ "\n", " ### 6.2 Contrastive Loss Implementation\n", "\n", " Simplified supervised contrastive loss for fine-tuning the sentence transformer body. 
It follows the loss from the Supervised Contrastive Learning paper, restricted to a single view per sentence.\n" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "source": [ "class SupConLoss(nn.Module):\n", " \"\"\"Supervised Contrastive Learning: https://arxiv.org/pdf/2004.11362.pdf.\"\"\"\n", " def __init__(self, model, temperature=0.07, contrast_mode=\"all\", base_temperature=0.07):\n", " super(SupConLoss, self).__init__()\n", " self.model = model\n", " self.temperature = temperature\n", " self.contrast_mode = contrast_mode\n", " self.base_temperature = base_temperature\n", "\n", " def forward(self, sentence_features, labels=None, mask=None):\n", " \"\"\"Computes loss for model.\"\"\"\n", " # Instead of using encode(), which can detach the computation graph,\n", " # call the model directly to generate the embeddings\n", " # Tokenize the inputs\n", " tokenized_inputs = self.model.tokenize(sentence_features[0])\n", "\n", " # Move the tokenized inputs to the device the model is on\n", " device = next(self.model.parameters()).device\n", " tokenized_inputs = {k: v.to(device) for k, v in tokenized_inputs.items()}\n", "\n", " # Forward pass through the model\n", " outputs = self.model(tokenized_inputs)\n", "\n", " # Retrieve the embeddings\n", " if isinstance(outputs, dict) and \"sentence_embedding\" in outputs:\n", " features = outputs[\"sentence_embedding\"]\n", " else:\n", " # If the model returns a different format, adapt here\n", " features = outputs # Or another method to extract the embeddings\n", "\n", " # Normalize embeddings\n", " features = torch.nn.functional.normalize(features, p=2, dim=1)\n", " # Add n_views dimension\n", " features = torch.unsqueeze(features, 1)\n", " device = features.device\n", "\n", " # The rest of the code is unchanged\n", " if len(features.shape) < 3:\n", " raise ValueError(\"`features` needs to be [bsz, n_views, ...], \" \"at least 3 dimensions are required\")\n", " if len(features.shape) > 3:\n", " features =
features.view(features.shape[0], features.shape[1], -1)\n", "\n", " batch_size = features.shape[0]\n", " if labels is not None and mask is not None:\n", " raise ValueError(\"Cannot define both `labels` and `mask`\")\n", " elif labels is None and mask is None:\n", " mask = torch.eye(batch_size, dtype=torch.float32).to(device)\n", " elif labels is not None:\n", " labels = labels.contiguous().view(-1, 1)\n", " if labels.shape[0] != batch_size:\n", " raise ValueError(\"Num of labels does not match num of features\")\n", " mask = torch.eq(labels, labels.T).float().to(device)\n", " else:\n", " mask = mask.float().to(device)\n", "\n", " contrast_count = features.shape[1]\n", " contrast_feature = torch.cat(torch.unbind(features, dim=1), dim=0)\n", " if self.contrast_mode == \"one\":\n", " anchor_feature = features[:, 0]\n", " anchor_count = 1\n", " elif self.contrast_mode == \"all\":\n", " anchor_feature = contrast_feature\n", " anchor_count = contrast_count\n", " else:\n", " raise ValueError(\"Unknown mode: {}\".format(self.contrast_mode))\n", "\n", " # Compute logits\n", " anchor_dot_contrast = torch.div(torch.matmul(anchor_feature, contrast_feature.T), self.temperature)\n", " # For numerical stability\n", " logits_max, _ = torch.max(anchor_dot_contrast, dim=1, keepdim=True)\n", " logits = anchor_dot_contrast - logits_max.detach()\n", "\n", " # Tile mask\n", " mask = mask.repeat(anchor_count, contrast_count)\n", " # Mask-out self-contrast cases\n", " logits_mask = torch.scatter(\n", " torch.ones_like(mask),\n", " 1,\n", " torch.arange(batch_size * anchor_count).view(-1, 1).to(device),\n", " 0,\n", " )\n", " mask = mask * logits_mask\n", "\n", " # Compute log_prob\n", " exp_logits = torch.exp(logits) * logits_mask\n", " log_prob = logits - torch.log(exp_logits.sum(1, keepdim=True))\n", "\n", " # Compute mean of log-likelihood over positive\n", " mean_log_prob_pos = (mask * log_prob).sum(1) / mask.sum(1)\n", "\n", " # Loss\n", " loss = -(self.temperature / 
self.base_temperature) * mean_log_prob_pos\n", " loss = loss.view(anchor_count, batch_size).mean()\n", "\n", " return loss\n" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2025-06-09T16:08:22.838292Z", "start_time": "2025-06-09T16:08:22.830766Z" } }, "outputs": [], "execution_count": 23 }, { "cell_type": "markdown", "source": [ "\n", " ## 7. Main Training Pipeline\n", "\n", " Now we'll set up the complete training pipeline that:\n", " 1. Loads the SST-2 sentiment analysis dataset\n", " 2. Fine-tunes the sentence transformer with contrastive learning\n", " 3. Trains multiple classification heads (classical and quantum)\n", " 4. Evaluates and compares all approaches\n" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "source": [ "SAMPLES_PER_CLASS = 8 # Few-shot setting\n", "BODY_EPOCHS = 20 # Epochs for sentence transformer fine-tuning\n", "HEAD_EPOCHS = 200 # Epochs for classification head training\n", "LEARNING_RATE = 1e-5 # Learning rate for body fine-tuning\n", "BATCH_SIZE = 16 # Batch size for evaluation\n", "\n", "print(f\"Configuration:\")\n", "print(f\"- Samples per class: {SAMPLES_PER_CLASS}\")\n", "print(f\"- Body training epochs: {BODY_EPOCHS}\")\n", "print(f\"- Head training epochs: {HEAD_EPOCHS}\")\n", "print(f\"- Learning rate: {LEARNING_RATE}\")\n" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2025-06-09T16:08:25.141288Z", "start_time": "2025-06-09T16:08:25.138320Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Configuration:\n", "- Samples per class: 8\n", "- Body training epochs: 20\n", "- Head training epochs: 200\n", "- Learning rate: 1e-05\n" ] } ], "execution_count": 24 }, { "cell_type": "markdown", "source": [ "\n", " ### 7.1 Load Dataset\n", "\n", " We use the Stanford Sentiment Treebank (SST-2) dataset for binary sentiment classification. 
In the few-shot setting, we sample only a small number of examples per class for training.\n", "\n" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "source": [ "print(f\"\\nLoading dataset with {SAMPLES_PER_CLASS} samples per class...\")\n", "dataset = load_dataset(\"sst2\")\n", "\n", "# Simulate few-shot regime by sampling examples per class\n", "train_dataset = sample_dataset(dataset[\"train\"], label_column=\"label\", num_samples=SAMPLES_PER_CLASS)\n", "eval_dataset = dataset[\"validation\"].select(range(250))\n", "test_dataset = dataset[\"validation\"].select(range(250, len(dataset[\"validation\"])))\n", "\n", "# Extract texts and labels\n", "texts = [example[\"sentence\"] for example in train_dataset]\n", "features = [texts]\n", "labels = torch.tensor([example[\"label\"] for example in train_dataset])\n", "\n", "print(f\"Dataset sizes:\")\n", "print(f\"- Training: {len(train_dataset)} samples\")\n", "print(f\"- Validation: {len(eval_dataset)} samples\")\n", "print(f\"- Test: {len(test_dataset)} samples\")" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2025-06-09T16:08:28.146285Z", "start_time": "2025-06-09T16:08:26.192309Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Loading dataset with 8 samples per class...\n", "Dataset sizes:\n", "- Training: 16 samples\n", "- Validation: 250 samples\n", "- Test: 622 samples\n" ] } ], "execution_count": 25 }, { "cell_type": "markdown", "source": [ "\n", " ### 7.2 Initialize Base Model\n", "\n", " We use a pre-trained sentence transformer as our base model. 
The SetFit framework provides an efficient way to perform few-shot learning.\n", "\n" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "source": [ "print(\"\\nLoading pre-trained model...\")\n", "model = SetFitModel.from_pretrained(\"sentence-transformers/paraphrase-mpnet-base-v2\")\n", "sentence_transformer = model.model_body\n", "classification_head = model.model_head\n", "\n", "print(f\"Model loaded: {type(sentence_transformer).__name__}\")\n", "print(f\"Embedding dimension: 768\")" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2025-06-09T16:08:31.416926Z", "start_time": "2025-06-09T16:08:29.635244Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Loading pre-trained model...\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "model_head.pkl not found on HuggingFace Hub, initialising classification head with random weights. You should TRAIN this model on a downstream task to use it for predictions and inference.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Model loaded: SentenceTransformer\n", "Embedding dimension: 768\n" ] } ], "execution_count": 26 }, { "cell_type": "markdown", "source": [ "\n", " ### 7.3 Fine-tune Sentence Transformer Body\n", "\n", " We fine-tune the sentence transformer using contrastive learning to better adapt it to our specific task.\n", "\n" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "source": [ "print(\"\\nTraining model body with contrastive learning...\")\n", "model_wrapped = ModelWrapper(sentence_transformer)\n", "criterion = SupConLoss(model=model_wrapped)\n", "\n", "# Enable gradients for fine-tuning\n", "for param in sentence_transformer.parameters():\n", " param.requires_grad = True\n", "\n", "optimizer = torch.optim.Adam(model_wrapped.parameters(), lr=LEARNING_RATE)\n", "model_wrapped.train()\n", "\n", "# Training loop\n", "for iteration in tqdm(range(BODY_EPOCHS), desc=\"Contrastive Learning\"):\n", " 
optimizer.zero_grad()\n", " loss = criterion(features, labels)\n", " loss.backward()\n", " optimizer.step()\n", "\n", " if (iteration + 1) % 5 == 0:\n", " print(f\"Iteration {iteration + 1}/{BODY_EPOCHS}, Loss: {loss.item():.6f}\")\n", "\n", "print(\"Model body fine-tuning completed!\")" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2025-06-09T16:08:38.008810Z", "start_time": "2025-06-09T16:08:31.900320Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Training model body with contrastive learning...\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Contrastive Learning: 30%|███ | 6/20 [00:03<00:04, 3.15it/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Iteration 5/20, Loss: 2.355021\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Contrastive Learning: 55%|█████▌ | 11/20 [00:04<00:01, 4.93it/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Iteration 10/20, Loss: 2.112763\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Contrastive Learning: 80%|████████ | 16/20 [00:05<00:00, 5.45it/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Iteration 15/20, Loss: 2.109605\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Contrastive Learning: 100%|██████████| 20/20 [00:06<00:00, 3.28it/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Iteration 20/20, Loss: 2.044540\n", "Model body fine-tuning completed!\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "execution_count": 27 }, { "cell_type": "markdown", "source": [ "\n", "\n", " ### 7.4 Generate Embeddings\n", "\n", " After fine-tuning, we generate embeddings for all training samples. 
These embeddings will be used to train the various classification heads.\n", "\n" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "source": [ "print(\"\\nGenerating embeddings for training data...\")\n", "sentence_transformer.eval()\n", "train_embeddings = []\n", "train_labels = []\n", "\n", "with torch.no_grad():\n", "    num_batches = (len(train_dataset[\"sentence\"]) + BATCH_SIZE - 1) // BATCH_SIZE\n", "\n", "    for batch_idx in tqdm(range(num_batches), desc=\"Encoding\"):\n", "        start_idx = batch_idx * BATCH_SIZE\n", "        end_idx = min(start_idx + BATCH_SIZE, len(train_dataset[\"sentence\"]))\n", "\n", "        batch_texts = train_dataset[\"sentence\"][start_idx:end_idx]\n", "        batch_labels = train_dataset[\"label\"][start_idx:end_idx]\n", "\n", "        batch_embeddings = sentence_transformer.encode(batch_texts, convert_to_tensor=True)\n", "        batch_embeddings_cpu = batch_embeddings.detach().cpu().numpy()\n", "\n", "        for emb, lbl in zip(batch_embeddings_cpu, batch_labels):\n", "            train_embeddings.append(emb)\n", "            train_labels.append(lbl)\n", "\n", "train_embeddings = np.array(train_embeddings)\n", "train_labels = np.array(train_labels)\n", "\n", "print(f\"Embeddings shape: {train_embeddings.shape}\")\n", "print(f\"Labels shape: {train_labels.shape}\")" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2025-06-09T16:09:02.081794Z", "start_time": "2025-06-09T16:09:01.700677Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Generating embeddings for training data...\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Encoding: 100%|██████████| 1/1 [00:00<00:00, 2.81it/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Embeddings shape: (16, 768)\n", "Labels shape: (16,)\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "execution_count": 28 }, { "cell_type": "markdown", "source": [ "\n", " ## 8. 
Train and Evaluate Classification Heads\n", "\n", " Now we'll train different classification heads and compare their performance:\n", " 1. **Logistic Regression**: Simple linear classifier (baseline)\n", " 2. **SVM**: Support Vector Machine with linear kernel\n", " 3. **MLP**: Multi-layer perceptron\n", " 4. **Quantum Layers**: Multiple configurations with different numbers of modes and photons" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "source": [ "num_classes = len(set(train_dataset[\"label\"]))\n", "results = {\n", " \"training_samples\": SAMPLES_PER_CLASS,\n", " \"epochs\": BODY_EPOCHS,\n", " \"lr\": LEARNING_RATE\n", "}\n", "\n", "print(f\"\\nTraining classification heads for {num_classes}-class classification...\")\n" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2025-06-09T16:09:08.255523Z", "start_time": "2025-06-09T16:09:08.252854Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Training classification heads for 2-class classification...\n" ] } ], "execution_count": 29 }, { "cell_type": "markdown", "source": [ "\n", " ### 8.1 Logistic Regression Head\n", "\n" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "source": [ "print(\"\\n1. 
Training Logistic Regression head...\")\n", "# Reset to default logistic regression head\n", "model = SetFitModel.from_pretrained(\"sentence-transformers/paraphrase-mpnet-base-v2\")\n", "model.model_body = sentence_transformer # Use our fine-tuned body\n", "\n", "# Train\n", "model.model_head.fit(train_embeddings, train_labels)\n", "\n", "# Evaluate\n", "lg_val_accuracy, _ = evaluate(model, eval_dataset[\"sentence\"], eval_dataset[\"label\"])\n", "lg_test_accuracy, _ = evaluate(model, test_dataset[\"sentence\"], test_dataset[\"label\"])\n", "\n", "print(f\"Logistic Regression - Val: {lg_val_accuracy:.4f}, Test: {lg_test_accuracy:.4f}\")\n", "results[\"LogisticRegression\"] = [lg_val_accuracy, lg_test_accuracy]" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2025-06-09T16:09:19.043745Z", "start_time": "2025-06-09T16:09:10.638256Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "1. Training Logistic Regression head...\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "model_head.pkl not found on HuggingFace Hub, initialising classification head with random weights. 
You should TRAIN this model on a downstream task to use it for predictions and inference.\n", "/Users/cassandrenotton/Documents/projects/CoreDev/venv-merlin-quandela/lib/python3.12/site-packages/sklearn/linear_model/_linear_loss.py:200: RuntimeWarning: divide by zero encountered in matmul\n", " raw_prediction = X @ weights + intercept\n", "/Users/cassandrenotton/Documents/projects/CoreDev/venv-merlin-quandela/lib/python3.12/site-packages/sklearn/linear_model/_linear_loss.py:200: RuntimeWarning: overflow encountered in matmul\n", " raw_prediction = X @ weights + intercept\n", "/Users/cassandrenotton/Documents/projects/CoreDev/venv-merlin-quandela/lib/python3.12/site-packages/sklearn/linear_model/_linear_loss.py:200: RuntimeWarning: invalid value encountered in matmul\n", " raw_prediction = X @ weights + intercept\n", "/Users/cassandrenotton/Documents/projects/CoreDev/venv-merlin-quandela/lib/python3.12/site-packages/sklearn/linear_model/_linear_loss.py:330: RuntimeWarning: divide by zero encountered in matmul\n", " grad[:n_features] = X.T @ grad_pointwise + l2_reg_strength * weights\n", "/Users/cassandrenotton/Documents/projects/CoreDev/venv-merlin-quandela/lib/python3.12/site-packages/sklearn/linear_model/_linear_loss.py:330: RuntimeWarning: overflow encountered in matmul\n", " grad[:n_features] = X.T @ grad_pointwise + l2_reg_strength * weights\n", "/Users/cassandrenotton/Documents/projects/CoreDev/venv-merlin-quandela/lib/python3.12/site-packages/sklearn/linear_model/_linear_loss.py:330: RuntimeWarning: invalid value encountered in matmul\n", " grad[:n_features] = X.T @ grad_pointwise + l2_reg_strength * weights\n", "/Users/cassandrenotton/Documents/projects/CoreDev/venv-merlin-quandela/lib/python3.12/site-packages/sklearn/utils/extmath.py:203: RuntimeWarning: divide by zero encountered in matmul\n", " ret = a @ b\n", "/Users/cassandrenotton/Documents/projects/CoreDev/venv-merlin-quandela/lib/python3.12/site-packages/sklearn/utils/extmath.py:203: RuntimeWarning: 
overflow encountered in matmul\n", " ret = a @ b\n", "/Users/cassandrenotton/Documents/projects/CoreDev/venv-merlin-quandela/lib/python3.12/site-packages/sklearn/utils/extmath.py:203: RuntimeWarning: invalid value encountered in matmul\n", " ret = a @ b\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Logistic Regression - Val: 0.8000, Test: 0.7669\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/Users/cassandrenotton/Documents/projects/CoreDev/venv-merlin-quandela/lib/python3.12/site-packages/sklearn/utils/extmath.py:203: RuntimeWarning: divide by zero encountered in matmul\n", " ret = a @ b\n", "/Users/cassandrenotton/Documents/projects/CoreDev/venv-merlin-quandela/lib/python3.12/site-packages/sklearn/utils/extmath.py:203: RuntimeWarning: overflow encountered in matmul\n", " ret = a @ b\n", "/Users/cassandrenotton/Documents/projects/CoreDev/venv-merlin-quandela/lib/python3.12/site-packages/sklearn/utils/extmath.py:203: RuntimeWarning: invalid value encountered in matmul\n", " ret = a @ b\n" ] } ], "execution_count": 30 }, { "cell_type": "markdown", "source": [ "\n", " ### 8.2 SVM Head\n", "\n" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "source": [ "print(\"\\n2. 
Training SVM head...\")\n", "# Replace head with SVM\n", "model.model_head = SVC(C=1.0, kernel='linear', gamma='scale', probability=True)\n", "model.model_head.fit(train_embeddings, train_labels)\n", "\n", "# Evaluate\n", "svc_val_accuracy, _ = evaluate(model, eval_dataset[\"sentence\"], eval_dataset[\"label\"])\n", "svc_test_accuracy, _ = evaluate(model, test_dataset[\"sentence\"], test_dataset[\"label\"])\n", "\n", "print(f\"SVM - Val: {svc_val_accuracy:.4f}, Test: {svc_test_accuracy:.4f}\")\n", "results[\"SVC\"] = [svc_val_accuracy, svc_test_accuracy]" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2025-06-09T16:12:15.636156Z", "start_time": "2025-06-09T16:12:11.346850Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "2. Training SVM head...\n", "SVM - Val: 0.8080, Test: 0.7846\n" ] } ], "execution_count": 31 }, { "cell_type": "markdown", "source": [ " ### 8.3 MLP Head" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "source": [ "print(\"\\n3. Training MLP head...\")\n", "# Replace head with MLP\n", "model = replace_setfit_head_with_mlp(\n", " model,\n", " input_dim=768,\n", " hidden_dim=100,\n", " num_classes=num_classes,\n", " epochs=HEAD_EPOCHS\n", ")\n", "model.model_head.fit(train_embeddings, train_labels)\n", "\n", "# Evaluate\n", "mlp_val_accuracy, _ = evaluate(model, eval_dataset[\"sentence\"], eval_dataset[\"label\"])\n", "mlp_test_accuracy, _ = evaluate(model, test_dataset[\"sentence\"], test_dataset[\"label\"])\n", "\n", "print(f\"MLP - Val: {mlp_val_accuracy:.4f}, Test: {mlp_test_accuracy:.4f}\")\n", "results[\"MLP\"] = [mlp_val_accuracy, mlp_test_accuracy]\n" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2025-06-09T16:12:32.002902Z", "start_time": "2025-06-09T16:12:26.110869Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "3. 
Training MLP head...\n", "Number of parameters in MLP head: 82052\n", "Epoch [10/200], Loss: 0.4457\n", "Epoch [20/200], Loss: 0.1034\n", "Epoch [30/200], Loss: 0.0177\n", "Epoch [40/200], Loss: 0.0014\n", "Epoch [50/200], Loss: 0.0009\n", "Epoch [60/200], Loss: 0.0002\n", "Epoch [70/200], Loss: 0.0006\n", "Epoch [80/200], Loss: 0.0002\n", "Epoch [90/200], Loss: 0.0001\n", "Epoch [100/200], Loss: 0.0001\n", "Epoch [110/200], Loss: 0.0004\n", "Epoch [120/200], Loss: 0.0002\n", "Epoch [130/200], Loss: 0.0001\n", "Epoch [140/200], Loss: 0.0001\n", "Epoch [150/200], Loss: 0.0002\n", "Epoch [160/200], Loss: 0.0001\n", "Epoch [170/200], Loss: 0.0001\n", "Epoch [180/200], Loss: 0.0001\n", "Epoch [190/200], Loss: 0.0008\n", "Epoch [200/200], Loss: 0.0001\n", "MLP - Val: 0.8040, Test: 0.7701\n" ] } ], "execution_count": 32 }, { "cell_type": "markdown", "source": [ "\n", "\n", " ### 8.4 Quantum Layer Heads\n", "\n", " We test multiple quantum configurations with varying numbers of modes and photons. Each configuration represents a different quantum circuit complexity and expressivity.\n", "\n" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "source": [ "print(\"\\n4. 
Training Quantum Layer heads...\")\n", "modes_to_test = [2, 4, 6, 8]\n", "quantum_results = {}\n", "\n", "for mode in modes_to_test:\n", "    photon_max = mode // 2\n", "\n", "    for k in range(1, photon_max + 1):\n", "        # Create input state with k photons, one in every other mode\n", "        input_state = [0] * mode\n", "        for p in range(k):\n", "            input_state[2 * p] = 1\n", "\n", "        print(f\"\\n Training Quantum Head: {mode} modes, {k} photons\")\n", "        print(f\" Input state: {input_state}\")\n", "\n", "        # Create quantum model\n", "        model = create_setfit_with_q_layer(\n", "            model,\n", "            input_dim=768,\n", "            hidden_dim=100,\n", "            modes=mode,\n", "            num_classes=num_classes,\n", "            epochs=HEAD_EPOCHS,\n", "            input_state=input_state\n", "        )\n", "\n", "        # Train the quantum head\n", "        model.model_head.fit(train_embeddings, train_labels)\n", "\n", "        # Evaluate\n", "        q_val_predictions = model.model_head.predict(\n", "            sentence_transformer.encode(eval_dataset[\"sentence\"], convert_to_tensor=True).cpu().numpy()\n", "        )\n", "        q_val_accuracy = accuracy_score(eval_dataset[\"label\"], q_val_predictions)\n", "\n", "        q_test_predictions = model.model_head.predict(\n", "            sentence_transformer.encode(test_dataset[\"sentence\"], convert_to_tensor=True).cpu().numpy()\n", "        )\n", "        q_test_accuracy = accuracy_score(test_dataset[\"label\"], q_test_predictions)\n", "\n", "        print(f\" Quantum {mode}-{k} - Val: {q_val_accuracy:.4f}, Test: {q_test_accuracy:.4f}\")\n", "        quantum_results[f\"{mode}-qlayer-{k}\"] = [q_val_accuracy, q_test_accuracy]\n", "\n", "results[\"Qlayer\"] = quantum_results" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2025-06-09T16:15:03.048557Z", "start_time": "2025-06-09T16:14:24.657523Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "4. 
Training Quantum Layer heads...\n", "\n", " Training Quantum Head: 2 modes, 1 photons\n", " Input state: [1, 0]\n", "Number of parameters in Quantum head: 76914\n", "Epoch 50/200, Train Loss: 0.6942\n", "Epoch 100/200, Train Loss: 0.6181\n", "Epoch 150/200, Train Loss: 0.5496\n", "Epoch 200/200, Train Loss: 0.4893\n", " Quantum 2-1 - Val: 0.5800, Test: 0.5691\n", "\n", " Training Quantum Head: 4 modes, 1 photons\n", " Input state: [1, 0, 0, 0]\n", "Number of parameters in Quantum head: 76942\n", "Epoch 50/200, Train Loss: 0.6326\n", "Epoch 100/200, Train Loss: 0.5825\n", "Epoch 150/200, Train Loss: 0.5257\n", "Epoch 200/200, Train Loss: 0.4681\n", " Quantum 4-1 - Val: 0.8120, Test: 0.7990\n", "\n", " Training Quantum Head: 4 modes, 2 photons\n", " Input state: [1, 0, 1, 0]\n", "Number of parameters in Quantum head: 76946\n", "Epoch 50/200, Train Loss: 0.6133\n", "Epoch 100/200, Train Loss: 0.5335\n", "Epoch 150/200, Train Loss: 0.4743\n", "Epoch 200/200, Train Loss: 0.4264\n", " Quantum 4-2 - Val: 0.7320, Test: 0.6961\n", "\n", " Training Quantum Head: 6 modes, 1 photons\n", " Input state: [1, 0, 0, 0, 0, 0]\n", "Number of parameters in Quantum head: 76986\n", "Epoch 50/200, Train Loss: 0.5370\n", "Epoch 100/200, Train Loss: 0.4549\n", "Epoch 150/200, Train Loss: 0.3864\n", "Epoch 200/200, Train Loss: 0.3370\n", " Quantum 6-1 - Val: 0.8360, Test: 0.8280\n", "\n", " Training Quantum Head: 6 modes, 2 photons\n", " Input state: [1, 0, 1, 0, 0, 0]\n", "Number of parameters in Quantum head: 77004\n", "Epoch 50/200, Train Loss: 0.6424\n", "Epoch 100/200, Train Loss: 0.5845\n", "Epoch 150/200, Train Loss: 0.5213\n", "Epoch 200/200, Train Loss: 0.4624\n", " Quantum 6-2 - Val: 0.8160, Test: 0.7781\n", "\n", " Training Quantum Head: 6 modes, 3 photons\n", " Input state: [1, 0, 1, 0, 1, 0]\n", "Number of parameters in Quantum head: 77014\n", "Epoch 50/200, Train Loss: 0.6078\n", "Epoch 100/200, Train Loss: 0.5386\n", "Epoch 150/200, Train Loss: 0.4767\n", "Epoch 200/200, 
Train Loss: 0.4192\n", " Quantum 6-3 - Val: 0.6720, Test: 0.6399\n", "\n", " Training Quantum Head: 8 modes, 1 photons\n", " Input state: [1, 0, 0, 0, 0, 0, 0, 0]\n", "Number of parameters in Quantum head: 77046\n", "Epoch 50/200, Train Loss: 0.5943\n", "Epoch 100/200, Train Loss: 0.5106\n", "Epoch 150/200, Train Loss: 0.4470\n", "Epoch 200/200, Train Loss: 0.3959\n", " Quantum 8-1 - Val: 0.7960, Test: 0.7572\n", "\n", " Training Quantum Head: 8 modes, 2 photons\n", " Input state: [1, 0, 1, 0, 0, 0, 0, 0]\n", "Number of parameters in Quantum head: 77086\n", "Epoch 50/200, Train Loss: 0.6094\n", "Epoch 100/200, Train Loss: 0.5392\n", "Epoch 150/200, Train Loss: 0.4776\n", "Epoch 200/200, Train Loss: 0.4242\n", " Quantum 8-2 - Val: 0.8480, Test: 0.8376\n", "\n", " Training Quantum Head: 8 modes, 3 photons\n", " Input state: [1, 0, 1, 0, 1, 0, 0, 0]\n", "Number of parameters in Quantum head: 77142\n", "Epoch 50/200, Train Loss: 0.6362\n", "Epoch 100/200, Train Loss: 0.5685\n", "Epoch 150/200, Train Loss: 0.5048\n", "Epoch 200/200, Train Loss: 0.4432\n", " Quantum 8-3 - Val: 0.7960, Test: 0.7781\n", "\n", " Training Quantum Head: 8 modes, 4 photons\n", " Input state: [1, 0, 1, 0, 1, 0, 1, 0]\n", "Number of parameters in Quantum head: 77170\n", "Epoch 50/200, Train Loss: 0.6389\n", "Epoch 100/200, Train Loss: 0.5645\n", "Epoch 150/200, Train Loss: 0.4995\n", "Epoch 200/200, Train Loss: 0.4413\n", " Quantum 8-4 - Val: 0.6600, Test: 0.6270\n" ] } ], "execution_count": 35 }, { "cell_type": "markdown", "source": [ "\n", " ## 9. 
Results Summary and Visualization\n", "\n", " Let's visualize and analyze the results from all classification heads.\n" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "source": [ "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "# Extract results for visualization\n", "classical_methods = ['LogisticRegression', 'SVC', 'MLP']\n", "classical_val_accs = [results[method][0] for method in classical_methods]\n", "classical_test_accs = [results[method][1] for method in classical_methods]\n", "\n", "# Process quantum results\n", "quantum_configs = list(results['Qlayer'].keys())\n", "quantum_val_accs = [results['Qlayer'][config][0] for config in quantum_configs]\n", "quantum_test_accs = [results['Qlayer'][config][1] for config in quantum_configs]\n", "\n", "# Create comparison plot\n", "fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))\n", "\n", "# Validation accuracies\n", "x_classical = range(len(classical_methods))\n", "x_quantum = range(len(classical_methods), len(classical_methods) + len(quantum_configs))\n", "x_labels_quantum = [f\"{c.split('-qlayer-')[0]}-{c.split('-qlayer-')[1]}\" for c in quantum_configs]\n", "\n", "ax1.bar(x_classical, classical_val_accs, color='skyblue', label='Classical')\n", "ax1.bar(x_quantum, quantum_val_accs, color='lightcoral', label='Quantum')\n", "ax1.set_xticks(list(x_classical) + list(x_quantum))\n", "ax1.set_xticklabels(classical_methods + x_labels_quantum, rotation=45, ha='right')\n", "ax1.set_ylabel('Validation Accuracy')\n", "ax1.set_title('Validation Performance Comparison')\n", "ax1.legend()\n", "ax1.grid(axis='y', alpha=0.3)\n", "\n", "# Test accuracies\n", "ax2.bar(x_classical, classical_test_accs, color='skyblue', label='Classical')\n", "ax2.bar(x_quantum, quantum_test_accs, color='lightcoral', label='Quantum')\n", "ax2.set_xticks(list(x_classical) + list(x_quantum))\n", "ax2.set_xticklabels(classical_methods + x_labels_quantum, rotation=45, ha='right')\n", "ax2.set_ylabel('Test Accuracy')\n", 
"ax2.set_title('Test Performance Comparison')\n", "ax2.legend()\n", "ax2.grid(axis='y', alpha=0.3)\n", "\n", "plt.tight_layout()\n", "plt.show()\n", "\n", "# Print summary statistics\n", "print(\"\\n=== RESULTS SUMMARY ===\")\n", "print(\"\\nClassical Methods:\")\n", "for i, method in enumerate(classical_methods):\n", "    print(f\"{method:20s} - Val: {classical_val_accs[i]:.4f}, Test: {classical_test_accs[i]:.4f}\")\n", "\n", "print(\"\\nQuantum Methods (best per mode count):\")\n", "modes_processed = set()\n", "for config in quantum_configs:\n", "    mode_count = config.split('-')[0]\n", "    if mode_count not in modes_processed:\n", "        # Select the best configuration for this mode count by validation accuracy,\n", "        # then report that same configuration's test accuracy\n", "        mode_configs = [c for c in quantum_configs if c.startswith(mode_count + '-')]\n", "        best_config = max(mode_configs, key=lambda c: results['Qlayer'][c][0])\n", "        best_val, best_test = results['Qlayer'][best_config]\n", "        print(f\"{mode_count + ' modes':20s} - Val: {best_val:.4f}, Test: {best_test:.4f}\")\n", "        modes_processed.add(mode_count)\n" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2025-06-09T16:23:56.268795Z", "start_time": "2025-06-09T16:23:56.167534Z" } }, "outputs": [ { "data": { "text/plain": [ "
" ], "image/png": "" }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "=== RESULTS SUMMARY ===\n", "\n", "Classical Methods:\n", "LogisticRegression - Val: 0.8000, Test: 0.7669\n", "SVC - Val: 0.8080, Test: 0.7846\n", "MLP - Val: 0.8040, Test: 0.7701\n", "\n", "Quantum Methods (best per mode count):\n", "2 modes - Val: 0.5800, Test: 0.5691\n", "4 modes - Val: 0.8120, Test: 0.7990\n", "6 modes - Val: 0.8360, Test: 0.8280\n", "8 modes - Val: 0.8480, Test: 0.8376\n" ] } ], "execution_count": 40 }, { "cell_type": "markdown", "source": [ "\n", " ### 9.1 Save Results\n" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "source": [ "save_experiment_results(results, \"quantum_llm_finetuning_results.json\")\n", "\n", "print(\"\\nExperiment completed successfully!\")\n", "print(f\"Results saved to ./results/quantum_llm_finetuning_results.json\")" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2025-06-09T16:24:04.900244Z", "start_time": "2025-06-09T16:24:04.895690Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Results saved. Total experiments: 2\n", "\n", "Experiment completed successfully!\n", "Results saved to ./results/quantum_llm_finetuning_results.json\n" ] } ], "execution_count": 41 }, { "cell_type": "markdown", "source": [ "\n", " ## 10. Analysis and Conclusions\n", "\n", " Based on the experimental results, we can draw several insights:\n", "\n", " ### Performance Comparison\n", "\n", " 1. **Classical Baselines**:\n", " - Logistic Regression provides a strong baseline despite its simplicity\n", " - SVM often performs competitively in few-shot scenarios\n", " - MLP can capture non-linear patterns but may overfit with limited data\n", "\n", " 2. 
**Quantum Classifiers**:\n", " - Performance varies strongly with the number of modes and photons: test accuracy ranges from 0.5691 (2 modes, 1 photon) to 0.8376 (8 modes, 2 photons) in this run\n", " - The 2-mode circuit performs close to chance, while the best configuration per mode count improves steadily from 4 to 8 modes (0.7990, 0.8280 and 0.8376 test accuracy)\n", " - For 4, 6 and 8 modes, the configuration with the maximum photon count (modes / 2) gives the lowest accuracy, which may point to optimization difficulties as the circuit grows\n", "\n", " ### Key Observations\n", "\n", " - **Few-shot learning**: With only 8 samples per class, the three classical heads land within about 2 points of each other (0.7669 to 0.7846 test accuracy)\n", " - **Quantum heads**: The best quantum configurations (4-1, 6-1, 8-2) exceed the classical baselines here, but the spread across configurations is large, so the choice of modes and photons needs careful tuning\n", " - **Computational trade-offs**: Quantum simulations are computationally intensive compared to classical methods\n", "\n", " ### Future Directions\n", "\n", " 1. **Scaling studies**: Test with more training samples to see if quantum models benefit from larger datasets\n", " 2. **Architecture exploration**: Try different quantum circuit designs and encoding strategies\n", " 3. **Hardware implementation**: Evaluate on real quantum photonic hardware when available\n", " 4. **Hybrid approaches**: Combine classical and quantum layers for potentially better performance\n", "\n", " In this single few-shot run, the best quantum photonic heads match or exceed the classical baselines, which suggests they can be competitive for text classification. Confirming this requires repeated runs over several random seeds and datasets." ], "metadata": { "collapsed": false } } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }