Model Security
šŸ”„ Advanced
T1588.002

Adversarial Machine Learning

Adversarial ML techniques reveal blind spots in vision and biometric models, helping defenders understand model fragility in real environments. This knowledge is essential for assessing whether surveillance systems can be pushed below reliable detection thresholds through lawful physical-world modifications.

Why This Matters for Counter-Surveillance

Understanding adversarial ML is not about building attacks — it's about knowing what's possible. If you're evaluating whether a surveillance system is robust, you need to understand the attack surface. Defenders who understand adversarial examples can better evaluate vendor claims, set realistic confidence thresholds, and design multi-sensor fallback architectures.

Adversarial Attack Taxonomy

Choosing the right adversarial technique depends on two key decisions: how much access you have to the target model, and whether the attack needs to survive physical-world conditions (printing, lighting, camera angles). The decision tree below maps these choices to specific attack families.

Adversarial Attack Decision Tree

flowchart TD
    A["Threat Model"] --> B{"Access Level?"}
    B -->|White-box| C["Gradient-Based"]
    B -->|Black-box| D["Transfer or Query"]
    C --> E["FGSM Single-step"]
    C --> F["PGD Iterative"]
    C --> G["C&W Optimization"]
    D --> H["Ensemble Transfer"]
    D --> I["Query-Based ZOO, HSJA"]
    E --> J{"Physical?"}
    F --> J
    G --> J
    H --> J
    I --> J
    J -->|Yes| K["EOT + Patch Printable"]
    J -->|No| L["Digital-Only Lp bounded"]
    style A fill:#facc15,stroke:#facc15,color:#000
    style C fill:#4ade80,stroke:#4ade80,color:#000
    style D fill:#22d3ee,stroke:#22d3ee,color:#000
    style K fill:#ec4899,stroke:#ec4899,color:#000
    style L fill:#a855f7,stroke:#a855f7,color:#000

White-box → Gradient-Based

Full access to model weights and architecture. Compute exact gradients to craft minimal perturbations. Most effective but requires model access — common in research and internal audits.

Black-box → Transfer / Query

No model access. Either craft adversarial examples on a surrogate model and hope they transfer, or probe the target API with repeated queries to estimate gradients. Realistic for attacking deployed systems.
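The query-based branch can be sketched with zeroth-order (finite-difference) gradient estimation, the idea behind ZOO-style attacks. This is an illustrative toy, not a production attack: the `score` function here is a fixed linear scorer standing in for an opaque API, and all names are hypothetical.

```python
import numpy as np

def estimate_gradient(query_fn, x, delta=1e-3, n_coords=None, rng=None):
    """Zeroth-order gradient estimate via symmetric finite differences.
    query_fn: black-box scoring function (e.g. target-class probability).
    Costs two queries per coordinate, so real attacks sample a subset of
    coordinates (n_coords) to keep query counts practical."""
    rng = np.random.default_rng() if rng is None else rng
    flat = x.ravel()
    grad = np.zeros_like(flat)
    coords = (np.arange(flat.size) if n_coords is None
              else rng.choice(flat.size, n_coords, replace=False))
    for i in coords:
        e = np.zeros_like(flat)
        e[i] = delta
        grad[i] = (query_fn((flat + e).reshape(x.shape)) -
                   query_fn((flat - e).reshape(x.shape))) / (2 * delta)
    return grad.reshape(x.shape)

# Toy stand-in for an opaque model API: a fixed linear scorer
w = np.array([0.5, -1.0, 2.0])
score = lambda x: float(w @ x.ravel())
g = estimate_gradient(score, np.zeros(3))
print(g)  # recovers w (exact for a linear scorer)
```

With the gradient estimate in hand, the attacker proceeds exactly as in the white-box case, trading query budget for access.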

Physical → EOT + Patch

Attack must survive printing, variable lighting, camera angles, and distance. Uses Expectation over Transformation (EOT) to optimize across conditions. Produces wearable patches, adversarial clothing, or printed patterns.
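The EOT objective can be sketched on a toy model: descend the *expected* detector score over sampled transformations rather than the score on one fixed view. The numpy sketch below uses a linear stand-in detector with an analytic gradient; real EOT samples rotations, perspective warps, and print-color models and backpropagates through them. All names and values are illustrative.

```python
import numpy as np

def eot_patch_step(patch, w, rng, n_samples=32, lr=0.1):
    """One EOT descent step against a toy linear detector score w @ t(patch).
    The sampled transformation is random brightness scaling t(x) = a*x,
    so the score gradient w.r.t. the patch is simply a*w."""
    grad = np.zeros_like(patch)
    for _ in range(n_samples):
        a = rng.uniform(0.5, 1.5)   # random brightness transform
        grad += a * w               # gradient of w @ (a*patch) w.r.t. patch
    grad /= n_samples               # Monte Carlo estimate of the expected gradient
    # Descend the expected detector score; clip keeps the patch printable
    return np.clip(patch - lr * grad, 0.0, 1.0)

rng = np.random.default_rng(0)
w = np.array([1.0, 0.5, -0.2])      # toy detector weights
patch = np.full(3, 0.5)
initial_score = float(w @ patch)
for _ in range(20):
    patch = eot_patch_step(patch, w, rng)
final_score = float(w @ patch)      # lower = less detectable under the toy model
```

The averaging over sampled transforms is what makes the optimized patch robust to conditions no single rendering would cover.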

Digital-Only → Lp Bounded

Perturbation applied directly to pixel values with a mathematical bound (Lāˆž or L2) ensuring changes are imperceptible. Useful for testing model robustness but won't work against real-world cameras.
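The norm bound is enforced by projection; a minimal sketch of the Lāˆž case (the same clamp used inside iterative attacks such as PGD):

```python
import numpy as np

def project_linf(x_adv, x, eps):
    """Project an adversarial candidate back into the L-inf ball of radius
    eps around the clean input x, then into the valid pixel range [0, 1]."""
    return np.clip(np.clip(x_adv, x - eps, x + eps), 0.0, 1.0)

x = np.array([0.2, 0.5, 0.9])
x_adv = np.array([0.4, 0.5, 1.2])        # some pixels overshot the budget
print(project_linf(x_adv, x, eps=0.05))  # projected to [0.25, 0.5, 0.95]
```

Every pixel ends within eps of the original and within the displayable range, which is exactly what "imperceptible" means under an Lāˆž budget.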

| Attack Class | Method | Norm Bound | Strength | Physical Viability |
|---|---|---|---|---|
| FGSM | Single-step gradient sign | Lāˆž | Moderate | Low (digital only) |
| PGD | Iterative gradient descent | Lāˆž | Strong | Low-Medium |
| C&W | Optimization-based (Carlini-Wagner) | L2 | Very Strong | Low (digital only) |
| DeepFool | Minimal perturbation search | L2 | Strong | Low (digital only) |
| Physical patches | Printable adversarial patterns | Unconstrained | Variable | High (physical world) |
| EOT | Expectation over Transformation | Variable | Strong | High (robust to conditions) |

Defensive Use Cases

šŸ” Model Robustness Auditing

Test surveillance model brittleness before operational deployment. Identify failure modes, compute adversarial accuracy, and establish minimum confidence thresholds.
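"Adversarial accuracy" here is simply clean-label accuracy measured on perturbed inputs; a one-function sketch (names are illustrative):

```python
import numpy as np

def adversarial_accuracy(y_true, preds_adv):
    """Fraction of adversarially perturbed inputs the model still classifies
    correctly. Clean accuracy minus this value quantifies brittleness and
    is the headline number of a robustness audit."""
    return float(np.mean(np.asarray(y_true) == np.asarray(preds_adv)))

print(adversarial_accuracy([1, 2, 3, 4], [1, 9, 3, 9]))  # 0.5
```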

šŸ”„ Multi-Sensor Fallback Design

Design fallback paths when single-model confidence drops. If face recognition is unreliable, the system should gracefully degrade to gait or device-based identification rather than failing silently.
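A minimal sketch of that degradation chain, with placeholder threshold values and hypothetical sensor names:

```python
def identify(face_conf, gait_conf, device_match,
             face_thresh=0.85, gait_thresh=0.70):
    """Graceful degradation: prefer face recognition, fall back to gait,
    then device-based ID, and finally route to human review rather than
    failing silently. Thresholds are illustrative placeholders that a real
    deployment would calibrate per sensor."""
    if face_conf >= face_thresh:
        return "face"
    if gait_conf >= gait_thresh:
        return "gait"
    if device_match:
        return "device"
    return "human_review"

print(identify(0.40, 0.75, device_match=True))  # gait
```

The important property is the final branch: low confidence across all sensors produces an explicit review signal, never a silent miss.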

āš™ļø Threshold Calibration

Tune confidence thresholds and human-review gates for high-impact alerts. Adversarial testing reveals where automatic thresholds may produce dangerous false positives or false negatives.

šŸ“‹ Governance Documentation

Document model failure modes for governance, external audit, and regulatory compliance. Adversarial evaluation reports demonstrate due diligence in model deployment decisions.

Physical-World Constraints

Digital adversarial examples do not automatically transfer to the physical world. Understanding these constraints is critical for realistic threat assessment.

  • ⚠ Printing artifacts: Printer resolution, color gamut limitations, and paper reflectivity degrade subtle perturbations to noise.
  • ⚠ Environmental variation: Lighting changes, weather, distance, and sensor compression all affect perturbation effectiveness. EOT-trained patches are more robust but still degraded.
  • ⚠ Model transferability: Adversarial examples crafted for one model often fail against different architectures. Black-box transferability is limited without ensemble techniques.
  • ⚠ Multi-camera coverage: Single-angle patches may be neutralized by overlapping cameras at different angles. Multi-view systems are significantly harder to defeat.
  • ⚠ Adversarial defenses: Some deployed systems include adversarial detection preprocessing (spatial smoothing, JPEG compression, feature squeezing) that specifically targets adversarial perturbations.
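JPEG re-compression is among the simplest of these preprocessing defenses and is easy to sketch (assumes Pillow, as in the code prerequisites elsewhere on this page; array shapes and quality setting are illustrative):

```python
import io
import numpy as np
from PIL import Image

def jpeg_defense(x, quality=75):
    """JPEG re-compression as a cheap adversarial-input filter: the lossy
    transform discards much of the high-frequency content that Lp-bounded
    perturbations live in. x is a float array in [0, 1], shape (H, W, 3)."""
    img = Image.fromarray((np.asarray(x) * 255).astype(np.uint8))
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)  # lossy round-trip
    buf.seek(0)
    return np.asarray(Image.open(buf), dtype=np.float32) / 255.0

x = np.random.default_rng(0).random((32, 32, 3))
x_def = jpeg_defense(x)
```

Like the other listed defenses, this degrades weak perturbations but is itself attackable once the adversary optimizes through the compression step.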

FGSM Attack Implementation

The simplest adversarial attack — demonstrates the core concept of gradient-based perturbation.

fgsm_attack.py
python
#!/usr/bin/env python3
# Prerequisites: pip install torch torchvision Pillow numpy
"""FGSM (Fast Gradient Sign Method) adversarial example generation.
Demonstrates how small perturbations can fool classifiers."""
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image
import numpy as np

# Load pre-trained model (torchvision ≥0.13 weights API; pretrained=True is deprecated)
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.eval()

# Standard ImageNet preprocessing. Mean/std normalization is deliberately
# omitted so pixel values stay in [0, 1] and the clamp in fgsm_attack() is
# valid; fold a Normalize step into the model if full accuracy is needed.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

def fgsm_attack(image_tensor, epsilon, data_grad):
    """Generate adversarial example using FGSM."""
    sign_grad = data_grad.sign()
    perturbed = image_tensor + epsilon * sign_grad
    return torch.clamp(perturbed, 0, 1)

# Load and preprocess image (convert to RGB to handle grayscale or alpha channels)
img = Image.open("test_face.jpg").convert("RGB")
input_tensor = preprocess(img).unsqueeze(0)
input_tensor.requires_grad = True

# Forward pass
output = model(input_tensor)
original_class = output.argmax(dim=1).item()
original_conf = F.softmax(output, dim=1).max().item()

# Generate adversarial gradient
loss = F.cross_entropy(output, torch.tensor([original_class]))
model.zero_grad()
loss.backward()

# Test different epsilon values
# Epsilon controls perturbation magnitude: 0.001–0.01 = imperceptible; 0.02–0.05 = faint artifacts visible on zoom
for eps in [0.001, 0.005, 0.01, 0.02, 0.05]:
    perturbed = fgsm_attack(input_tensor, eps, input_tensor.grad.data)
    adv_output = model(perturbed)
    adv_class = adv_output.argmax(dim=1).item()
    adv_conf = F.softmax(adv_output, dim=1).max().item()
    
    # Measure perturbation visibility
    l2_norm = torch.norm(perturbed - input_tensor).item()
    linf_norm = torch.max(torch.abs(perturbed - input_tensor)).item()
    
    print(f"eps={eps:.3f} | class: {original_class}→{adv_class} | "
          f"conf: {original_conf:.3f}→{adv_conf:.3f} | "
          f"L2: {l2_norm:.3f} | Lāˆž: {linf_norm:.3f}")

# Expected output (illustrative; the loop above prints pipe-separated lines
# with ImageNet class indices, class names shown here for readability):
# === FGSM Attack Results ===
# Epsilon | Original Class    | Adversarial Class | Confidence | Success
# ──────────────────────────────────────────────────────────────────────
# 0.001   | golden_retriever  | golden_retriever  |     89.2%  | NO
# 0.005   | golden_retriever  | golden_retriever  |     71.4%  | NO
# 0.01    | golden_retriever  | Labrador          |     54.8%  | YES ← misclassified
# 0.02    | golden_retriever  | tennis_ball       |     67.3%  | YES
# 0.05    | golden_retriever  | shower_curtain    |     82.1%  | YES

PGD Attack (Iterative)

Stronger iterative attack used as the gold standard for adversarial robustness evaluation.

pgd_attack.py
python
#!/usr/bin/env python3
# Prerequisites: pip install torch torchvision Pillow numpy
"""PGD (Projected Gradient Descent) — stronger iterative attack.
More robust than FGSM, commonly used for adversarial training evaluation."""
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

# torchvision ≥0.13 weights API (pretrained=True is deprecated)
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.eval()

def pgd_attack(model, image, label,
               eps=0.03,    # Maximum total perturbation budget (Lāˆž norm) — ~8/255 pixel intensity
               alpha=0.005, # Step size per iteration — smaller = finer-grained but slower convergence
               iters=40):   # Number of projected gradient steps — more iterations = stronger attack but slower
    """Projected Gradient Descent adversarial attack."""
    perturbed = image.clone().detach().requires_grad_(True)
    
    for i in range(iters):
        output = model(perturbed)
        loss = F.cross_entropy(output, label)
        loss.backward()
        
        # Gradient step
        adv = perturbed + alpha * perturbed.grad.sign()
        
        # Project back to epsilon ball
        perturbation = torch.clamp(adv - image, min=-eps, max=eps)
        perturbed = torch.clamp(image + perturbation, min=0, max=1).detach().requires_grad_(True)
    
    return perturbed

# Usage
preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
])
input_tensor = preprocess(Image.open("test.jpg").convert("RGB")).unsqueeze(0)
original_output = model(input_tensor)
original_class = original_output.argmax(dim=1).item()
label = torch.tensor([original_class])
adversarial = pgd_attack(model, input_tensor, label, eps=0.03, iters=40)

# Check result
adv_output = model(adversarial)
adv_class = adv_output.argmax(dim=1).item()
adv_conf = F.softmax(adv_output, dim=1).max().item()
print(f"Original: {original_class} | Adversarial: {adv_class} ({adv_conf:.1%})")
print(f"Lāˆž perturbation: {torch.max(torch.abs(adversarial - input_tensor)).item():.4f}")
print(f"L2 perturbation: {torch.norm(adversarial - input_tensor).item():.3f}")

# Expected output (illustrative; the code prints class indices rather than the names below):
# PGD Attack (eps=0.030, alpha=0.005, iters=40)
# Original prediction: golden_retriever (91.3%)
# Adversarial prediction: paper_towel (73.8%) ← misclassified
# Lāˆž perturbation: 0.0300 | L2 perturbation: 2.847

IBM ART Comprehensive Framework

The Adversarial Robustness Toolbox provides a unified interface for multiple attack methods and defenses.

art_evaluation.py
python
#!/usr/bin/env python3
# Prerequisites: pip install adversarial-robustness-toolbox torch torchvision numpy
"""IBM Adversarial Robustness Toolbox (ART) — comprehensive attack/defense framework.
Supports multiple attack methods, defenses, and detection techniques."""
from art.attacks.evasion import ProjectedGradientDescent, FastGradientMethod
from art.attacks.evasion import CarliniL2Method, DeepFool
from art.estimators.classification import PyTorchClassifier
from art.defences.preprocessor import SpatialSmoothing, FeatureSqueezing
import torch
import torch.nn as nn
from torchvision import models, transforms, datasets
import numpy as np

# Load pre-trained model and wrap for ART
model = models.resnet50(weights="IMAGENET1K_V2")
model.eval()

criterion = nn.CrossEntropyLoss()
classifier = PyTorchClassifier(
    model=model,
    loss=criterion,
    optimizer=None,
    input_shape=(3, 224, 224),
    nb_classes=1000,
    clip_values=(0.0, 1.0),  # valid pixel range from ToTensor (no mean/std normalization in this pipeline)
)

# Load a small test batch (ImageNet-style or custom face images)
# ⚠ test_images/ must use ImageFolder format: test_images/class_name/image.jpg (e.g., test_images/dog/photo1.jpg)
# Replace 'test_images/' with your dataset directory
preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
])
test_dataset = datasets.ImageFolder("test_images/", transform=preprocess)
loader = torch.utils.data.DataLoader(test_dataset, batch_size=10, shuffle=False)
x_batch, y_batch = next(iter(loader))
x_test = x_batch.numpy()   # shape: (10, 3, 224, 224)
y_test = y_batch.numpy()

# Attack comparison
attacks = {
    "FGSM": FastGradientMethod(estimator=classifier, eps=0.03),
    "PGD-40": ProjectedGradientDescent(
        estimator=classifier, eps=0.03, eps_step=0.005, max_iter=40
    ),
    "C&W-L2": CarliniL2Method(
        classifier=classifier, confidence=0.5, max_iter=100  # C&W confidence parameter — higher = more confident adversarial, but larger perturbation needed
    ),
    "DeepFool": DeepFool(classifier=classifier, max_iter=50),
}

print(f"{'':<12} | {'Success':>8} | {'Avg L2':>10}")
print("-" * 40)
for name, attack in attacks.items():
    x_adv = attack.generate(x=x_test)
    
    # Evaluate
    preds_clean = np.argmax(classifier.predict(x_test), axis=1)
    preds_adv = np.argmax(classifier.predict(x_adv), axis=1)
    
    success_rate = np.mean(preds_clean != preds_adv)
    l2_dist = np.mean(np.sqrt(np.sum((x_adv - x_test)**2, axis=(1,2,3))))
    
    print(f"{name:<12} | {success_rate:>7.1%} | {l2_dist:>10.4f}")

# Defense evaluation
print("\n--- Defenses ---")
defenses = {
    "Spatial Smoothing 3x3": SpatialSmoothing(window_size=3),
    "Feature Squeezing 8-bit": FeatureSqueezing(bit_depth=8, clip_values=(0,1)),
}

# Use the last attack's adversarial examples for defense testing
for def_name, defense in defenses.items():
    x_def, _ = defense(x_adv)
    preds_def = np.argmax(classifier.predict(x_def), axis=1)
    recovery_rate = np.mean(preds_def == preds_clean)
    print(f"{def_name:<30} | Recovery: {recovery_rate:.1%}")

# Expected output (illustrative values):
# === Multi-Attack Comparison ===
# Attack    | Success Rate | Avg L2 Dist
# ───────────────────────────────────────
# FGSM      |       72.0%  |       3.214
# PGD       |       88.0%  |       2.891
# C&W       |       96.0%  |       1.247
# DeepFool  |       84.0%  |       1.892

Physical Patch Evaluation

Framework for testing adversarial patches under realistic physical-world conditions.

physical_patch_eval.py
python
#!/usr/bin/env python3
"""Physical-world adversarial patch evaluation framework.
Tests patch effectiveness across angles, distances, and lighting conditions.
Requires: pip install ultralytics opencv-python numpy"""
import cv2
import numpy as np
from ultralytics import YOLO

# Load YOLOv8 model (downloads weights on first run)
detector = YOLO("yolov8n.pt")

def evaluate_detector(frame, model, conf_threshold=0.25):  # YOLO detection threshold (lower = more sensitive detector)
    """Run YOLOv8 detector and return max person-class confidence."""
    results = model(frame, verbose=False, conf=conf_threshold)
    
    max_conf = 0.0
    for r in results:
        for box in r.boxes:
            cls_id = int(box.cls[0])
            conf   = float(box.conf[0])
            # Class 0 = person in COCO
            if cls_id == 0 and conf > max_conf:
                max_conf = conf
    
    return max_conf

def physical_patch_evaluation(video_dir, model):
    """Evaluate adversarial patch across physical conditions."""
    conditions = {
        "angles": [0, 15, 30, 45, 60, 75, 90],
        "distances_m": [2, 5, 10, 15, 20],
        # Lighting is baked into the captured footage rather than iterated
        # below; run the evaluation once per lighting-condition directory.
        "lighting": ["bright", "overcast", "indoor", "low_light", "backlit"],
    }
    
    results = []
    for angle in conditions["angles"]:
        for dist in conditions["distances_m"]:
            # Load frame for this condition
            # Expected file naming: angle{A}_dist{D}.jpg (e.g., angle0_dist5.jpg, angle30_dist10.jpg)
            frame_path = f"{video_dir}/angle{angle}_dist{dist}.jpg"
            frame = cv2.imread(frame_path)
            if frame is None:
                continue
            
            conf = evaluate_detector(frame, model)
            results.append({
                "angle": angle, "distance": dist,
                "confidence": conf,
                "evaded": conf < 0.3  # if re-detected confidence stays below 0.3, the patch successfully degraded detection
            })
    
    # Summary report
    print(f"{'Angle':>6} {'Distance':>10} {'Confidence':>12} {'Evaded':>8}")
    print("-" * 40)
    for r in results:
        status = "āœ“ YES" if r["evaded"] else "āœ— NO"
        print(f"{r['angle']:>6}° {r['distance']:>9}m {r['confidence']:>12.3f} {status:>8}")
    
    evasion_rate = sum(1 for r in results if r["evaded"]) / len(results) if results else 0
    print(f"\nOverall evasion rate: {evasion_rate:.1%}")

# Usage (directory name is a placeholder for your captured frames):
# physical_patch_evaluation("patch_frames/bright", detector)

Responsible Research

Run adversarial experiments only in controlled environments and within written authorization. Never test against systems you do not own or operate with explicit consent. Document all experiments and findings for audit trails.

Related: Face Recognition & Physical Defense

Apply adversarial ML concepts to specific biometric systems: see Facial Recognition for FR-specific attacks and Physical Countermeasures for real-world adversarial patch deployment and testing.

Key Takeaways for Defenders

  • No model is robust to all attacks: adversarial training improves but doesn't eliminate vulnerability
  • Physical attacks degrade: real-world conditions significantly reduce digital attack effectiveness
  • Multi-model ensembles resist transfer: use diverse architectures to reduce single-point failure
  • Human review remains essential: automate detection but keep humans in the loop for high-stakes decisions
  • Test continuously: new attack methods emerge regularly — robustness evaluation must be ongoing
šŸŽÆ Adversarial ML Labs

Hands-on exercises to understand adversarial techniques and defenses.

šŸ”§ FGSM/PGD Attack Lab (Custom Lab, hard)

  • Set up PyTorch with pre-trained ResNet-50
  • Implement FGSM attack with varying epsilon values
  • Implement PGD attack and compare success rates
  • Measure perturbation visibility (L2, Lāˆž norms)
  • Test transferability between different model architectures
šŸ”§ ART Defense Evaluation (Custom Lab, hard)

  • Install IBM Adversarial Robustness Toolbox
  • Generate adversarial examples with FGSM, PGD, C&W, DeepFool
  • Apply preprocessing defenses (spatial smoothing, feature squeezing)
  • Measure defense recovery rates against each attack
  • Document which defenses are most effective for each attack type