Output Mapping Strategies

Overview

Output mapping strategies transform quantum probability distributions into classical neural network outputs. The choice of mapping strategy significantly impacts model performance, interpretability, and computational efficiency.

Mapping Strategies

LINEAR

Description: Standard neural network linear transformation with learnable weights and biases.

Mathematical Form:

y = Wx + b

where W is a matrix of size (output_size x input_size), b is a vector of size output_size
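In PyTorch terms this is an ordinary fully connected layer applied to the probability distribution. A minimal sketch of the operation (the sizes here are illustrative, not prescribed by the library):

import torch
import torch.nn as nn

dist_size, output_size = 16, 10       # illustrative sizes
mapper = nn.Linear(dist_size, output_size)

p = torch.rand(8, dist_size)          # mock batch of distributions
p = p / p.sum(dim=1, keepdim=True)    # normalize each row to sum to 1
y = mapper(p)                         # shape: (8, 10), unbounded real values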

Characteristics:

  • Fully learnable transformation

  • Can produce any real-valued output

  • Standard backpropagation applies

  • Most flexible but computationally intensive

Use Cases:

  • Classification tasks requiring logits

  • Regression with arbitrary output ranges

  • When maximum learning flexibility is needed

  • Standard deep learning integration

ansatz = ML.AnsatzFactory.create(
    experiment=experiment,
    input_size=4,
    output_size=10,
    output_mapping_strategy=ML.OutputMappingStrategy.LINEAR
)
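
The resulting ansatz plugs into a quantum layer in the usual way (a sketch; it assumes torch and the ML alias are imported as in the surrounding snippets):

layer = ML.QuantumLayer(input_size=4, ansatz=ansatz)
logits = layer(torch.rand(32, 4))   # shape: (32, 10), real-valued logits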

Advantages:

  • Can realize any affine transformation of the distribution

  • Familiar optimization landscape

  • Easy integration with existing architectures

  • Good gradient flow properties

Disadvantages:

  • Adds learnable parameters

  • May lose quantum structure information

  • Computationally more expensive

LEXGROUPING

Description: Lexicographical grouping of the probability distribution into equal-sized, consecutive buckets.

Mathematical Form:

For probability distribution p of size n mapping to output y of size m:

y_i = sum(p_j) for j from i*k to (i+1)*k-1

where k = ceiling(n/m) (bucket size)

Padding Behavior:

If n is not divisible by m, zeros are padded to make equal-sized groups.
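
For illustration, the grouping can be re-implemented directly from the formula above. This is a standalone sketch, not the library's internal code:

import math
import torch
import torch.nn.functional as F

def lex_grouping(p, m):
    """Sum consecutive buckets of size k = ceil(n / m), zero-padding first."""
    n = p.shape[-1]
    k = math.ceil(n / m)
    p = F.pad(p, (0, k * m - n))       # right-pad with zeros to k * m entries
    return p.reshape(*p.shape[:-1], m, k).sum(dim=-1)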

Characteristics:

  • No learnable parameters

  • Preserves probability mass

  • Deterministic grouping scheme

  • Output values in [0, 1] range

Use Cases:

  • Probability-based outputs

  • When preserving quantum measurement statistics

  • Resource-constrained environments

  • Interpretable quantum outputs

ansatz = ML.AnsatzFactory.create(
    experiment=experiment,
    input_size=4,
    output_size=6,
    output_mapping_strategy=ML.OutputMappingStrategy.LEXGROUPING
)

Example:

# Input distribution: [0.1, 0.2, 0.3, 0.1, 0.2, 0.1] (6 elements)
# Output size: 3
# Grouping: [0.1+0.2, 0.3+0.1, 0.2+0.1] = [0.3, 0.4, 0.3]
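
Running the lex_grouping sketch from above reproduces this result:

p = torch.tensor([0.1, 0.2, 0.3, 0.1, 0.2, 0.1])
print(lex_grouping(p, 3))   # tensor([0.3000, 0.4000, 0.3000])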

Advantages:

  • No additional parameters

  • Preserves quantum measurement structure

  • Fast computation

  • Interpretable outputs

Disadvantages:

  • Limited flexibility

  • May not capture optimal feature combinations

  • Fixed grouping scheme

MODGROUPING

Description: Groups probability distribution entries based on modulo arithmetic over their indices.

Mathematical Form:

y_i = sum(p_j) for all j where j mod m = i
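
As with lexicographical grouping, the operation is easy to sketch directly from the formula (again an illustrative re-implementation, not the library's code):

import torch

def mod_grouping(p, m):
    """Sum all entries of p whose index is congruent modulo m."""
    n = p.shape[-1]
    idx = torch.arange(n) % m            # group index for each entry
    out = torch.zeros(*p.shape[:-1], m)
    out.index_add_(p.dim() - 1, idx, p)  # out[..., i] += p[..., j] for j % m == i
    return out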

Characteristics:

  • No learnable parameters

  • Distributes indices cyclically

  • Good for capturing periodic patterns

  • Output values in [0, 1] range

Use Cases:

  • Periodic or cyclic data patterns

  • When spatial/temporal locality matters

  • Specific problem structures with modular symmetry

ansatz = ML.AnsatzFactory.create(
    experiment=experiment,
    input_size=4,
    output_size=3,
    output_mapping_strategy=ML.OutputMappingStrategy.MODGROUPING
)

Example:

# Input distribution: [0.1, 0.2, 0.3, 0.15, 0.1, 0.15] (indices 0-5)
# Output size: 3
# Group 0 (indices 0,3): 0.1 + 0.15 = 0.25
# Group 1 (indices 1,4): 0.2 + 0.1 = 0.3
# Group 2 (indices 2,5): 0.3 + 0.15 = 0.45
# Result: [0.25, 0.3, 0.45]
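
The mod_grouping sketch from above reproduces this result:

p = torch.tensor([0.1, 0.2, 0.3, 0.15, 0.1, 0.15])
print(mod_grouping(p, 3))   # tensor([0.2500, 0.3000, 0.4500])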

Advantages:

  • Captures cyclic patterns

  • No additional parameters

  • Even distribution of information

  • Suitable for certain symmetries

Disadvantages:

  • Limited to specific problem types

  • May not suit arbitrary distributions

  • Less intuitive than lexicographical grouping

NONE (Identity)

Description: Direct use of quantum probability distribution as output.

Mathematical Form:

y = p (identity mapping)

Requirements:

  • Distribution size must equal desired output size

  • No size transformation possible

Characteristics:

  • No parameters or computation overhead

  • Direct quantum measurement interpretation

  • Outputs are valid probability distributions

  • Sum to 1 (normalized)

Use Cases:

  • Probability estimation tasks

  • When quantum distribution is the desired output

  • Maximum efficiency requirements

  • Quantum-native applications

# Must ensure distribution size matches output size
temp_ansatz = ML.AnsatzFactory.create(experiment, input_size=4, output_size=10)
temp_layer = ML.QuantumLayer(input_size=4, ansatz=temp_ansatz)
# Determine actual distribution size
dist_size = temp_layer(torch.rand(1, 4)).shape[1]

# Create NONE mapping with matching size
ansatz = ML.AnsatzFactory.create(
    experiment=experiment,
    input_size=4,
    output_size=dist_size,  # Must match
    output_mapping_strategy=ML.OutputMappingStrategy.NONE
)
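
Because NONE returns the raw distribution, each output row should sum to 1. A quick sanity check (a sketch, reusing the layer construction shown above):

layer = ML.QuantumLayer(input_size=4, ansatz=ansatz)
out = layer(torch.rand(16, 4))
assert torch.allclose(out.sum(dim=1), torch.ones(16), atol=1e-4)  # each row is a distribution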

Advantages:

  • Zero computational overhead

  • Pure quantum information preservation

  • No additional parameters

  • Maximum computational efficiency

Disadvantages:

  • Rigid size constraints

  • Limited output range [0,1]

  • Sum constraint (sum of all y_i = 1)

  • Not suitable for arbitrary outputs

Selection Guidelines

Task-Based Recommendations

Task Type               Primary Choice   Alternative    Reasoning
----------------------  ---------------  -------------  -------------------------------
Classification          LINEAR           LEXGROUPING    Need logits/flexible outputs
Regression              LINEAR           n/a            Require arbitrary output ranges
Probability Estimation  NONE             LEXGROUPING    Want direct probabilities
Structured Outputs      MODGROUPING      LEXGROUPING    Exploit pattern structure

Performance Considerations

Strategy      Parameter Cost               Computation Cost             Memory Usage
------------  ---------------------------  ---------------------------  ------------
LINEAR        O(input_size × output_size)  O(input_size × output_size)  High
LEXGROUPING   0                            O(input_size)                Low
MODGROUPING   0                            O(input_size)                Low
NONE          0                            0                            Minimal

Size Compatibility

def check_mapping_compatibility(quantum_output_size, desired_output_size):
    """Check which mapping strategies are compatible."""

    compatible = []

    # LINEAR: Always compatible
    compatible.append('LINEAR')

    # LEXGROUPING: Always compatible (uses padding)
    compatible.append('LEXGROUPING')

    # MODGROUPING: Always compatible
    compatible.append('MODGROUPING')

    # NONE: Only if sizes match exactly
    if quantum_output_size == desired_output_size:
        compatible.append('NONE')

    return compatible
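
For example:

print(check_mapping_compatibility(16, 10))
# ['LINEAR', 'LEXGROUPING', 'MODGROUPING']
print(check_mapping_compatibility(10, 10))
# ['LINEAR', 'LEXGROUPING', 'MODGROUPING', 'NONE']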

Advanced Usage Patterns

Dynamic Strategy Selection

class AdaptiveOutputLayer(nn.Module):
    def __init__(self, quantum_layer, dist_size, output_size,
                 strategies=('LINEAR', 'LEXGROUPING')):
        super().__init__()
        self.quantum_layer = quantum_layer
        self.strategies = list(strategies)
        self.current_strategy = 0

        # Create one output mapping per strategy
        self.mappers = nn.ModuleDict()
        for strategy in self.strategies:
            if strategy == 'LINEAR':
                self.mappers[strategy] = nn.Linear(dist_size, output_size)
            elif strategy == 'LEXGROUPING':
                self.mappers[strategy] = ML.LexGroupingMapper(dist_size, output_size)
            # ... other strategies

    def forward(self, x):
        quantum_out = self.quantum_layer(x)
        strategy = self.strategies[self.current_strategy]
        return self.mappers[strategy](quantum_out)

    def switch_strategy(self, new_strategy_idx):
        self.current_strategy = new_strategy_idx

Ensemble Output Mapping

class EnsembleOutputMapping(nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()

        # Multiple mapping strategies
        self.linear = nn.Linear(input_size, output_size)
        self.lexgroup = ML.LexGroupingMapper(input_size, output_size)
        self.modgroup = ML.ModGroupingMapper(input_size, output_size)

        # Learnable combination weights
        self.combination_weights = nn.Parameter(torch.ones(3) / 3)

    def forward(self, quantum_distribution):
        # Apply all strategies
        linear_out = self.linear(quantum_distribution)
        lex_out = self.lexgroup(quantum_distribution)
        mod_out = self.modgroup(quantum_distribution)

        # Weighted combination
        weights = torch.softmax(self.combination_weights, dim=0)
        combined = (weights[0] * linear_out +
                   weights[1] * lex_out +
                   weights[2] * mod_out)

        return combined
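
The softmax keeps the combination weights normalized while letting training decide how much to trust each strategy. Illustrative usage (sizes assumed):

ensemble = EnsembleOutputMapping(input_size=16, output_size=8)
out = ensemble(torch.rand(4, 16))   # shape: (4, 8)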

Hierarchical Mapping

class HierarchicalMapping(nn.Module):
    def __init__(self, input_size, intermediate_size, output_size):
        super().__init__()

        # First stage: Reduce dimensionality with grouping
        self.stage1 = ML.LexGroupingMapper(input_size, intermediate_size)

        # Second stage: Learn final transformation
        self.stage2 = nn.Linear(intermediate_size, output_size)

    def forward(self, quantum_distribution):
        intermediate = self.stage1(quantum_distribution)
        return self.stage2(intermediate)

Optimization Strategies

Gradient Flow Analysis

Different mapping strategies affect gradient flow differently:

LINEAR:

  • Full gradient backpropagation through learned weights

  • May benefit from learning rate scheduling

  • Standard optimization techniques apply

LEXGROUPING/MODGROUPING:

  • Direct gradient flow through grouping operation

  • Generally stable gradients

  • May require careful quantum layer optimization

NONE:

  • Direct gradients to quantum layer

  • No intermediate transformation noise

  • Depends entirely on quantum circuit optimization
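
The grouping case is easy to verify with the lex_grouping sketch from the LEXGROUPING section: each output is a plain sum, so every probability receives a unit gradient:

p = torch.rand(6, requires_grad=True)
lex_grouping(p, 3).sum().backward()
print(p.grad)   # tensor([1., 1., 1., 1., 1., 1.])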

Performance Tuning

def optimize_mapping_choice(val_loader, strategies):
    """Empirically determine the best mapping strategy.

    Assumes user-supplied helpers: create_model_with_strategy,
    evaluate_model, measure_training_speed, and measure_memory_usage.
    """

    results = {}

    for strategy in strategies:
        # Create model variant with this strategy
        model_variant = create_model_with_strategy(strategy)

        # Evaluate performance
        val_loss = evaluate_model(model_variant, val_loader)
        train_time = measure_training_speed(model_variant)
        memory_usage = measure_memory_usage(model_variant)

        results[strategy] = {
            'val_loss': val_loss,
            'train_time': train_time,
            'memory_usage': memory_usage,
            'score': val_loss + 0.1 * train_time + 0.01 * memory_usage  # heuristic weighting
        }

    # Return best strategy
    best_strategy = min(results.keys(), key=lambda k: results[k]['score'])
    return best_strategy, results

Integration with Classical Networks

Pre-quantum Processing

class PreQuantumProcessor(nn.Module):
    def __init__(self, classical_size, quantum_size):
        super().__init__()
        self.processor = nn.Sequential(
            nn.Linear(classical_size, quantum_size * 2),
            nn.ReLU(),
            nn.Linear(quantum_size * 2, quantum_size),
            nn.Sigmoid()  # Normalize for quantum layer
        )

    def forward(self, x):
        return self.processor(x)

Post-quantum Processing

class PostQuantumProcessor(nn.Module):
    def __init__(self, quantum_size, final_size, mapping_strategy):
        super().__init__()

        # Quantum output mapping
        if mapping_strategy == 'LINEAR':
            self.mapper = nn.Linear(quantum_size, quantum_size // 2)
        elif mapping_strategy == 'LEXGROUPING':
            self.mapper = ML.LexGroupingMapper(quantum_size, quantum_size // 2)
        else:
            raise ValueError(f"Unsupported mapping strategy: {mapping_strategy}")

        # Classical post-processing
        self.post_processor = nn.Sequential(
            nn.Linear(quantum_size // 2, final_size * 2),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(final_size * 2, final_size)
        )

    def forward(self, quantum_output):
        mapped = self.mapper(quantum_output)
        return self.post_processor(mapped)

Debugging Output Mappings

Diagnostic Tools

def diagnose_output_mapping(layer, test_input):
    """Diagnose output mapping behavior."""

    # Get quantum distribution
    with torch.no_grad():
        quantum_dist = layer.computation_process.compute(
            layer.prepare_parameters([test_input])
        )

    print(f"Quantum distribution shape: {quantum_dist.shape}")
    print(f"Distribution sum: {quantum_dist.sum():.6f}")
    print(f"Distribution range: [{quantum_dist.min():.6f}, {quantum_dist.max():.6f}]")

    # Test mapping; re-enable grad so backward also works for
    # parameter-free mappings such as the grouping strategies
    quantum_dist = quantum_dist.detach().requires_grad_(True)
    final_output = layer.output_mapping(quantum_dist)
    print(f"Final output shape: {final_output.shape}")
    print(f"Output range: [{final_output.min():.6f}, {final_output.max():.6f}]")

    # Check gradients
    loss = final_output.sum()
    loss.backward()

    if hasattr(layer.output_mapping, 'weight'):
        if layer.output_mapping.weight.grad is not None:
            grad_norm = layer.output_mapping.weight.grad.norm()
            print(f"Output mapping gradient norm: {grad_norm:.6f}")