Model Security
šŸ”„ Advanced
T1588.002

Adversarial Machine Learning

Adversarial ML techniques reveal blind spots in vision and biometric models, helping defenders understand model fragility in real environments. This knowledge is essential for assessing whether surveillance systems can be pushed below reliable detection thresholds through lawful physical-world modifications.

Why This Matters for Counter-Surveillance

Understanding adversarial ML is not about building attacks — it's about knowing what's possible. If you're evaluating whether a surveillance system is robust, you need to understand the attack surface. Defenders who understand adversarial examples can better evaluate vendor claims, set realistic confidence thresholds, and design multi-sensor fallback architectures.

Adversarial Attack Taxonomy

Choosing the right adversarial technique depends on two key decisions: how much access you have to the target model, and whether the attack needs to survive physical-world conditions (printing, lighting, camera angles). The decision tree below maps these choices to specific attack families.

Adversarial Attack Decision Tree

flowchart TD
    A["Threat Model"] --> B{"Access Level?"}
    B -->|White-box| C["Gradient-Based"]
    B -->|Black-box| D["Transfer or Query"]
    C --> E["FGSM Single-step"]
    C --> F["PGD Iterative"]
    C --> G["C&W Optimization"]
    D --> H["Ensemble Transfer"]
    D --> I["Query-Based ZOO, HSJA"]
    E --> J{"Physical?"}
    F --> J
    G --> J
    H --> J
    I --> J
    J -->|Yes| K["EOT + Patch Printable"]
    J -->|No| L["Digital-Only Lp bounded"]
    style A fill:#facc15,stroke:#facc15,color:#000
    style C fill:#4ade80,stroke:#4ade80,color:#000
    style D fill:#22d3ee,stroke:#22d3ee,color:#000
    style K fill:#ec4899,stroke:#ec4899,color:#000
    style L fill:#a855f7,stroke:#a855f7,color:#000

White-box → Gradient-Based

Full access to model weights and architecture. Compute exact gradients to craft minimal perturbations. Most effective but requires model access — common in research and internal audits.

Black-box → Transfer / Query

No model access. Either craft adversarial examples on a surrogate model and hope they transfer, or probe the target API with repeated queries to estimate gradients. Realistic for attacking deployed systems.
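The query-based branch can be sketched with zeroth-order (finite-difference) gradient estimation, the idea behind ZOO-style attacks. This is an illustrative toy, not a production attack: the `score` function here is a fixed linear scorer standing in for an opaque API, and all names are hypothetical.

```python
import numpy as np

def estimate_gradient(query_fn, x, delta=1e-3, n_coords=None, rng=None):
    """Zeroth-order gradient estimate via symmetric finite differences.
    query_fn: black-box scoring function (e.g. target-class probability).
    Costs two queries per coordinate, so real attacks sample a subset of
    coordinates (n_coords) to keep query counts practical."""
    rng = np.random.default_rng() if rng is None else rng
    flat = x.ravel()
    grad = np.zeros_like(flat)
    coords = (np.arange(flat.size) if n_coords is None
              else rng.choice(flat.size, n_coords, replace=False))
    for i in coords:
        e = np.zeros_like(flat)
        e[i] = delta
        grad[i] = (query_fn((flat + e).reshape(x.shape)) -
                   query_fn((flat - e).reshape(x.shape))) / (2 * delta)
    return grad.reshape(x.shape)

# Toy stand-in for an opaque model API: a fixed linear scorer
w = np.array([0.5, -1.0, 2.0])
score = lambda x: float(w @ x.ravel())
g = estimate_gradient(score, np.zeros(3))
print(g)  # recovers w (exact for a linear scorer)
```

With the gradient estimate in hand, the attacker proceeds exactly as in the white-box case, trading query budget for access.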

Physical → EOT + Patch

Attack must survive printing, variable lighting, camera angles, and distance. Uses Expectation over Transformation (EOT) to optimize across conditions. Produces wearable patches, adversarial clothing, or printed patterns.
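The EOT objective can be sketched on a toy model: descend the *expected* detector score over sampled transformations rather than the score on one fixed view. The numpy sketch below uses a linear stand-in detector with an analytic gradient; real EOT samples rotations, perspective warps, and print-color models and backpropagates through them. All names and values are illustrative.

```python
import numpy as np

def eot_patch_step(patch, w, rng, n_samples=32, lr=0.1):
    """One EOT descent step against a toy linear detector score w @ t(patch).
    The sampled transformation is random brightness scaling t(x) = a*x,
    so the score gradient w.r.t. the patch is simply a*w."""
    grad = np.zeros_like(patch)
    for _ in range(n_samples):
        a = rng.uniform(0.5, 1.5)   # random brightness transform
        grad += a * w               # gradient of w @ (a*patch) w.r.t. patch
    grad /= n_samples               # Monte Carlo estimate of the expected gradient
    # Descend the expected detector score; clip keeps the patch printable
    return np.clip(patch - lr * grad, 0.0, 1.0)

rng = np.random.default_rng(0)
w = np.array([1.0, 0.5, -0.2])      # toy detector weights
patch = np.full(3, 0.5)
initial_score = float(w @ patch)
for _ in range(20):
    patch = eot_patch_step(patch, w, rng)
final_score = float(w @ patch)      # lower = less detectable under the toy model
```

The averaging over sampled transforms is what makes the optimized patch robust to conditions no single rendering would cover.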

Digital-Only → Lp Bounded

Perturbation applied directly to pixel values with a mathematical bound (Lāˆž or L2) ensuring changes are imperceptible. Useful for testing model robustness but won't work against real-world cameras.
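The norm bound is enforced by projection; a minimal sketch of the Lāˆž case (the same clamp used inside iterative attacks such as PGD):

```python
import numpy as np

def project_linf(x_adv, x, eps):
    """Project an adversarial candidate back into the L-inf ball of radius
    eps around the clean input x, then into the valid pixel range [0, 1]."""
    return np.clip(np.clip(x_adv, x - eps, x + eps), 0.0, 1.0)

x = np.array([0.2, 0.5, 0.9])
x_adv = np.array([0.4, 0.5, 1.2])        # some pixels overshot the budget
print(project_linf(x_adv, x, eps=0.05))  # projected to [0.25, 0.5, 0.95]
```

Every pixel ends within eps of the original and within the displayable range, which is exactly what "imperceptible" means under an Lāˆž budget.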

| Attack Class | Method | Norm Bound | Strength | Physical Viability |
|---|---|---|---|---|
| FGSM | Single-step gradient sign | Lāˆž | Moderate | Low (digital only) |
| PGD | Iterative gradient descent | Lāˆž | Strong | Low-Medium |
| C&W | Optimization-based (Carlini-Wagner) | L2 | Very Strong | Low (digital only) |
| DeepFool | Minimal perturbation search | L2 | Strong | Low (digital only) |
| Physical patches | Printable adversarial patterns | Unconstrained | Variable | High (physical world) |
| EOT | Expectation over Transformation | Variable | Strong | High (robust to conditions) |

Defensive Use Cases

šŸ” Model Robustness Auditing

Test surveillance model brittleness before operational deployment. Identify failure modes, compute adversarial accuracy, and establish minimum confidence thresholds.
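"Adversarial accuracy" here is simply clean-label accuracy measured on perturbed inputs; a one-function sketch (names are illustrative):

```python
import numpy as np

def adversarial_accuracy(y_true, preds_adv):
    """Fraction of adversarially perturbed inputs the model still classifies
    correctly. Clean accuracy minus this value quantifies brittleness and
    is the headline number of a robustness audit."""
    return float(np.mean(np.asarray(y_true) == np.asarray(preds_adv)))

print(adversarial_accuracy([1, 2, 3, 4], [1, 9, 3, 9]))  # 0.5
```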

šŸ”„ Multi-Sensor Fallback Design

Design fallback paths when single-model confidence drops. If face recognition is unreliable, the system should gracefully degrade to gait or device-based identification rather than failing silently.
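A minimal sketch of that degradation chain, with placeholder threshold values and hypothetical sensor names:

```python
def identify(face_conf, gait_conf, device_match,
             face_thresh=0.85, gait_thresh=0.70):
    """Graceful degradation: prefer face recognition, fall back to gait,
    then device-based ID, and finally route to human review rather than
    failing silently. Thresholds are illustrative placeholders that a real
    deployment would calibrate per sensor."""
    if face_conf >= face_thresh:
        return "face"
    if gait_conf >= gait_thresh:
        return "gait"
    if device_match:
        return "device"
    return "human_review"

print(identify(0.40, 0.75, device_match=True))  # gait
```

The important property is the final branch: low confidence across all sensors produces an explicit review signal, never a silent miss.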

āš™ļø Threshold Calibration

Tune confidence thresholds and human-review gates for high-impact alerts. Adversarial testing reveals where automatic thresholds may produce dangerous false positives or false negatives.

šŸ“‹ Governance Documentation

Document model failure modes for governance, external audit, and regulatory compliance. Adversarial evaluation reports demonstrate due diligence in model deployment decisions.

Physical-World Constraints

Digital adversarial examples do not automatically transfer to the physical world. Understanding these constraints is critical for realistic threat assessment.

  • ⚠ Printing artifacts: Printer resolution, color gamut limitations, and paper reflectivity degrade subtle perturbations to noise.
  • ⚠ Environmental variation: Lighting changes, weather, distance, and sensor compression all affect perturbation effectiveness. EOT-trained patches are more robust but still degraded.
  • ⚠ Model transferability: Adversarial examples crafted for one model often fail against different architectures. Black-box transferability is limited without ensemble techniques.
  • ⚠ Multi-camera coverage: Single-angle patches may be neutralized by overlapping cameras at different angles. Multi-view systems are significantly harder to defeat.
  • ⚠ Adversarial defenses: Some deployed systems include adversarial detection preprocessing (spatial smoothing, JPEG compression, feature squeezing) that specifically targets adversarial perturbations.
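JPEG re-compression is among the simplest of these preprocessing defenses and is easy to sketch (assumes Pillow, as in the code prerequisites elsewhere on this page; array shapes and quality setting are illustrative):

```python
import io
import numpy as np
from PIL import Image

def jpeg_defense(x, quality=75):
    """JPEG re-compression as a cheap adversarial-input filter: the lossy
    transform discards much of the high-frequency content that Lp-bounded
    perturbations live in. x is a float array in [0, 1], shape (H, W, 3)."""
    img = Image.fromarray((np.asarray(x) * 255).astype(np.uint8))
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)  # lossy round-trip
    buf.seek(0)
    return np.asarray(Image.open(buf), dtype=np.float32) / 255.0

x = np.random.default_rng(0).random((32, 32, 3))
x_def = jpeg_defense(x)
```

Like the other listed defenses, this degrades weak perturbations but is itself attackable once the adversary optimizes through the compression step.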

FGSM Attack Implementation

The simplest adversarial attack — demonstrates the core concept of gradient-based perturbation.

fgsm_attack.py
python
#!/usr/bin/env python3
# Prerequisites: pip install torch torchvision Pillow numpy
"""FGSM (Fast Gradient Sign Method) adversarial example generation.
Demonstrates how small perturbations can fool classifiers."""
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image
import numpy as np

# Load pre-trained model (torchvision ≥0.13 weights API; pretrained=True is deprecated)
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.eval()

# Standard ImageNet preprocessing. Mean/std normalization is deliberately
# omitted so pixel values stay in [0, 1] and the clamp in fgsm_attack() is
# valid; fold a Normalize step into the model if full accuracy is needed.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

def fgsm_attack(image_tensor, epsilon, data_grad):
    """Generate adversarial example using FGSM."""
    sign_grad = data_grad.sign()
    perturbed = image_tensor + epsilon * sign_grad
    return torch.clamp(perturbed, 0, 1)

# Load and preprocess image (convert to RGB to handle grayscale or alpha channels)
img = Image.open("test_face.jpg").convert("RGB")
input_tensor = preprocess(img).unsqueeze(0)
input_tensor.requires_grad = True

# Forward pass
output = model(input_tensor)
original_class = output.argmax(dim=1).item()
original_conf = F.softmax(output, dim=1).max().item()

# Generate adversarial gradient
loss = F.cross_entropy(output, torch.tensor([original_class]))
model.zero_grad()
loss.backward()

# Test different epsilon values
# Epsilon controls perturbation magnitude: 0.001–0.01 = imperceptible; 0.02–0.05 = faint artifacts visible on zoom
for eps in [0.001, 0.005, 0.01, 0.02, 0.05]:
    perturbed = fgsm_attack(input_tensor, eps, input_tensor.grad.data)
    adv_output = model(perturbed)
    adv_class = adv_output.argmax(dim=1).item()
    adv_conf = F.softmax(adv_output, dim=1).max().item()
    
    # Measure perturbation visibility
    l2_norm = torch.norm(perturbed - input_tensor).item()
    linf_norm = torch.max(torch.abs(perturbed - input_tensor)).item()
    
    print(f"eps={eps:.3f} | class: {original_class}→{adv_class} | "
          f"conf: {original_conf:.3f}→{adv_conf:.3f} | "
          f"L2: {l2_norm:.3f} | Lāˆž: {linf_norm:.3f}")

# Expected output (illustrative; the loop above prints pipe-separated lines
# with ImageNet class indices, class names shown here for readability):
# === FGSM Attack Results ===
# Epsilon | Original Class    | Adversarial Class | Confidence | Success
# ──────────────────────────────────────────────────────────────────────
# 0.001   | golden_retriever  | golden_retriever  |     89.2%  | NO
# 0.005   | golden_retriever  | golden_retriever  |     71.4%  | NO
# 0.01    | golden_retriever  | Labrador          |     54.8%  | YES ← misclassified
# 0.02    | golden_retriever  | tennis_ball       |     67.3%  | YES
# 0.05    | golden_retriever  | shower_curtain    |     82.1%  | YES

PGD Attack (Iterative)

Stronger iterative attack used as the gold standard for adversarial robustness evaluation.

pgd_attack.py
python
#!/usr/bin/env python3
# Prerequisites: pip install torch torchvision Pillow numpy
"""PGD (Projected Gradient Descent) — stronger iterative attack.
More robust than FGSM, commonly used for adversarial training evaluation."""
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

# torchvision ≥0.13 weights API (pretrained=True is deprecated)
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.eval()

def pgd_attack(model, image, label,
               eps=0.03,    # Maximum total perturbation budget (Lāˆž norm) — ~8/255 pixel intensity
               alpha=0.005, # Step size per iteration — smaller = finer-grained but slower convergence
               iters=40):   # Number of projected gradient steps — more iterations = stronger attack but slower
    """Projected Gradient Descent adversarial attack."""
    perturbed = image.clone().detach().requires_grad_(True)
    
    for i in range(iters):
        output = model(perturbed)
        loss = F.cross_entropy(output, label)
        loss.backward()
        
        # Gradient step
        adv = perturbed + alpha * perturbed.grad.sign()
        
        # Project back to epsilon ball
        perturbation = torch.clamp(adv - image, min=-eps, max=eps)
        perturbed = torch.clamp(image + perturbation, min=0, max=1).detach().requires_grad_(True)
    
    return perturbed

# Usage
preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
])
input_tensor = preprocess(Image.open("test.jpg").convert("RGB")).unsqueeze(0)
original_output = model(input_tensor)
original_class = original_output.argmax(dim=1).item()
label = torch.tensor([original_class])
adversarial = pgd_attack(model, input_tensor, label, eps=0.03, iters=40)

# Check result
adv_output = model(adversarial)
adv_class = adv_output.argmax(dim=1).item()
adv_conf = F.softmax(adv_output, dim=1).max().item()
print(f"Original: {original_class} | Adversarial: {adv_class} ({adv_conf:.1%})")
print(f"Lāˆž perturbation: {torch.max(torch.abs(adversarial - input_tensor)).item():.4f}")
print(f"L2 perturbation: {torch.norm(adversarial - input_tensor).item():.3f}")

# Expected output (illustrative; the code prints class indices rather than the names below):
# PGD Attack (eps=0.030, alpha=0.005, iters=40)
# Original prediction: golden_retriever (91.3%)
# Adversarial prediction: paper_towel (73.8%) ← misclassified
# Lāˆž perturbation: 0.0300 | L2 perturbation: 2.847

IBM ART Comprehensive Framework

The Adversarial Robustness Toolbox provides a unified interface for multiple attack methods and defenses.

art_evaluation.py
python
#!/usr/bin/env python3
# Prerequisites: pip install adversarial-robustness-toolbox torch torchvision numpy
"""IBM Adversarial Robustness Toolbox (ART) — comprehensive attack/defense framework.
Supports multiple attack methods, defenses, and detection techniques."""
from art.attacks.evasion import ProjectedGradientDescent, FastGradientMethod
from art.attacks.evasion import CarliniL2Method, DeepFool
from art.estimators.classification import PyTorchClassifier
from art.defences.preprocessor import SpatialSmoothing, FeatureSqueezing
import torch
import torch.nn as nn
from torchvision import models, transforms, datasets
import numpy as np

# Load pre-trained model and wrap for ART
model = models.resnet50(weights="IMAGENET1K_V2")
model.eval()

criterion = nn.CrossEntropyLoss()
classifier = PyTorchClassifier(
    model=model,
    loss=criterion,
    optimizer=None,
    input_shape=(3, 224, 224),
    nb_classes=1000,
    clip_values=(0.0, 1.0),  # valid pixel range from ToTensor (no mean/std normalization in this pipeline)
)

# Load a small test batch (ImageNet-style or custom face images)
# ⚠ test_images/ must use ImageFolder format: test_images/class_name/image.jpg (e.g., test_images/dog/photo1.jpg)
# Replace 'test_images/' with your dataset directory
preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
])
test_dataset = datasets.ImageFolder("test_images/", transform=preprocess)
loader = torch.utils.data.DataLoader(test_dataset, batch_size=10, shuffle=False)
x_batch, y_batch = next(iter(loader))
x_test = x_batch.numpy()   # shape: (10, 3, 224, 224)
y_test = y_batch.numpy()

# Attack comparison
attacks = {
    "FGSM": FastGradientMethod(estimator=classifier, eps=0.03),
    "PGD-40": ProjectedGradientDescent(
        estimator=classifier, eps=0.03, eps_step=0.005, max_iter=40
    ),
    "C&W-L2": CarliniL2Method(
        classifier=classifier, confidence=0.5, max_iter=100  # C&W confidence parameter — higher = more confident adversarial, but larger perturbation needed
    ),
    "DeepFool": DeepFool(classifier=classifier, max_iter=50),
}

print(f"{'':<12} | {'Success':>8} | {'Avg L2':>10}")
print("-" * 40)
for name, attack in attacks.items():
    x_adv = attack.generate(x=x_test)
    
    # Evaluate
    preds_clean = np.argmax(classifier.predict(x_test), axis=1)
    preds_adv = np.argmax(classifier.predict(x_adv), axis=1)
    
    success_rate = np.mean(preds_clean != preds_adv)
    l2_dist = np.mean(np.sqrt(np.sum((x_adv - x_test)**2, axis=(1,2,3))))
    
    print(f"{name:<12} | {success_rate:>7.1%} | {l2_dist:>10.4f}")

# Defense evaluation
print("\n--- Defenses ---")
defenses = {
    "Spatial Smoothing 3x3": SpatialSmoothing(window_size=3),
    "Feature Squeezing 8-bit": FeatureSqueezing(bit_depth=8, clip_values=(0,1)),
}

# Use the last attack's adversarial examples for defense testing
for def_name, defense in defenses.items():
    x_def, _ = defense(x_adv)
    preds_def = np.argmax(classifier.predict(x_def), axis=1)
    recovery_rate = np.mean(preds_def == preds_clean)
    print(f"{def_name:<30} | Recovery: {recovery_rate:.1%}")

# Expected output (illustrative values):
# === Multi-Attack Comparison ===
# Attack    | Success Rate | Avg L2 Dist
# ───────────────────────────────────────
# FGSM      |       72.0%  |       3.214
# PGD       |       88.0%  |       2.891
# C&W       |       96.0%  |       1.247
# DeepFool  |       84.0%  |       1.892

Physical Patch Evaluation

Framework for testing adversarial patches under realistic physical-world conditions.

physical_patch_eval.py
python
#!/usr/bin/env python3
"""Physical-world adversarial patch evaluation framework.
Tests patch effectiveness across angles, distances, and lighting conditions.
Requires: pip install ultralytics opencv-python numpy"""
import cv2
import numpy as np
from ultralytics import YOLO

# Load YOLOv8 model (downloads weights on first run)
detector = YOLO("yolov8n.pt")

def evaluate_detector(frame, model, conf_threshold=0.25):  # YOLO detection threshold (lower = more sensitive detector)
    """Run YOLOv8 detector and return max person-class confidence."""
    results = model(frame, verbose=False, conf=conf_threshold)
    
    max_conf = 0.0
    for r in results:
        for box in r.boxes:
            cls_id = int(box.cls[0])
            conf   = float(box.conf[0])
            # Class 0 = person in COCO
            if cls_id == 0 and conf > max_conf:
                max_conf = conf
    
    return max_conf

def physical_patch_evaluation(video_dir, model):
    """Evaluate adversarial patch across physical conditions."""
    conditions = {
        "angles": [0, 15, 30, 45, 60, 75, 90],
        "distances_m": [2, 5, 10, 15, 20],
        # Lighting is baked into the captured footage rather than iterated
        # below; run the evaluation once per lighting-condition directory.
        "lighting": ["bright", "overcast", "indoor", "low_light", "backlit"],
    }
    
    results = []
    for angle in conditions["angles"]:
        for dist in conditions["distances_m"]:
            # Load frame for this condition
            # Expected file naming: angle{A}_dist{D}.jpg (e.g., angle0_dist5.jpg, angle30_dist10.jpg)
            frame_path = f"{video_dir}/angle{angle}_dist{dist}.jpg"
            frame = cv2.imread(frame_path)
            if frame is None:
                continue
            
            conf = evaluate_detector(frame, model)
            results.append({
                "angle": angle, "distance": dist,
                "confidence": conf,
                "evaded": conf < 0.3  # if re-detected confidence stays below 0.3, the patch successfully degraded detection
            })
    
    # Summary report
    print(f"{'Angle':>6} {'Distance':>10} {'Confidence':>12} {'Evaded':>8}")
    print("-" * 40)
    for r in results:
        status = "āœ“ YES" if r["evaded"] else "āœ— NO"
        print(f"{r['angle']:>6}° {r['distance']:>9}m {r['confidence']:>12.3f} {status:>8}")
    
    evasion_rate = sum(1 for r in results if r["evaded"]) / len(results) if results else 0
    print(f"\nOverall evasion rate: {evasion_rate:.1%}")

# Usage (directory name is a placeholder for your captured frames):
# physical_patch_evaluation("patch_frames/bright", detector)

Responsible Research

Run adversarial experiments only in controlled environments and within written authorization. Never test against systems you do not own or operate with explicit consent. Document all experiments and findings for audit trails.

Related: Face Recognition & Physical Defense

Apply adversarial ML concepts to specific biometric systems: see Facial Recognition for FR-specific attacks and Physical Countermeasures for real-world adversarial patch deployment and testing.

Key Takeaways for Defenders

  • No model is robust to all attacks: adversarial training improves but doesn't eliminate vulnerability
  • Physical attacks degrade: real-world conditions significantly reduce digital attack effectiveness
  • Multi-model ensembles resist transfer: use diverse architectures to reduce single-point failure
  • Human review remains essential: automate detection but keep humans in the loop for high-stakes decisions
  • Test continuously: new attack methods emerge regularly — robustness evaluation must be ongoing
šŸŽÆ Adversarial ML Labs

Hands-on exercises to understand adversarial techniques and defenses.

šŸ”§ FGSM/PGD Attack Lab (Custom Lab, hard)

  • Set up PyTorch with pre-trained ResNet-50
  • Implement FGSM attack with varying epsilon values
  • Implement PGD attack and compare success rates
  • Measure perturbation visibility (L2, Lāˆž norms)
  • Test transferability between different model architectures
šŸ”§ ART Defense Evaluation (Custom Lab, hard)

  • Install IBM Adversarial Robustness Toolbox
  • Generate adversarial examples with FGSM, PGD, C&W, DeepFool
  • Apply preprocessing defenses (spatial smoothing, feature squeezing)
  • Measure defense recovery rates against each attack
  • Document which defenses are most effective for each attack type