Adversarial Machine Learning
Why This Matters for Counter-Surveillance
Adversarial ML techniques reveal blind spots in vision and biometric models, helping defenders understand model fragility in real environments. This knowledge is essential for assessing whether surveillance systems can be pushed below reliable detection thresholds through lawful physical-world modifications.
Adversarial Attack Taxonomy
Choosing the right adversarial technique depends on two key decisions: how much access you have to the target model, and whether the attack needs to survive physical-world conditions (printing, lighting, camera angles). The decision tree below maps these choices to specific attack families.
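The two decisions can be captured in a few lines; a schematic sketch (the function name and labels are illustrative, not from any library):

```python
def choose_attack_family(model_access, physical):
    """Map the two decisions (access level, physical survival) to an attack family."""
    if physical:
        return "EOT + adversarial patch"        # must survive printing, lighting, angles
    if model_access == "white-box":
        return "gradient-based (FGSM/PGD/C&W)"  # exact gradients available
    return "transfer or query-based (black-box)"

print(choose_attack_family("white-box", physical=False))  # -> gradient-based (FGSM/PGD/C&W)
```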
Adversarial Attack Decision Tree
White-box → Gradient-Based
Full access to model weights and architecture. Compute exact gradients to craft minimal perturbations. Most effective, but requires model access; common in research and internal audits.
Black-box → Transfer / Query
No model access. Either craft adversarial examples on a surrogate model and hope they transfer, or probe the target API with repeated queries to estimate gradients. Realistic for attacking deployed systems.
Physical → EOT + Patch
Attack must survive printing, variable lighting, camera angles, and distance. Uses Expectation over Transformation (EOT) to optimize across conditions. Produces wearable patches, adversarial clothing, or printed patterns.
Digital-Only → Lp Bounded
Perturbation applied directly to pixel values with a mathematical bound (L∞ or L2) ensuring changes are imperceptible. Useful for testing model robustness but won't work against real-world cameras.
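The Lp bounds used above are straightforward to compute; a minimal NumPy sketch of the two norms (function name is illustrative):

```python
import numpy as np

def perturbation_norms(original, perturbed):
    """Return (L2, L-infinity) norms of the difference between two images."""
    delta = np.asarray(perturbed, dtype=np.float64) - np.asarray(original, dtype=np.float64)
    l2 = float(np.sqrt(np.sum(delta ** 2)))      # Euclidean size of the whole perturbation
    linf = float(np.max(np.abs(delta)))          # largest single-pixel change
    return l2, linf

orig = np.zeros((4, 4))
adv = orig.copy()
adv[0, 0] = 0.03                       # a single-pixel change of magnitude 0.03
l2, linf = perturbation_norms(orig, adv)
print(round(l2, 6), round(linf, 6))    # -> 0.03 0.03
```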
| Attack Class | Method | Norm Bound | Strength | Physical Viability |
|---|---|---|---|---|
| FGSM | Single-step gradient sign | L∞ | Moderate | Low (digital only) |
| PGD | Iterative gradient descent | L∞ | Strong | Low-Medium |
| C&W | Optimization-based (Carlini-Wagner) | L2 | Very Strong | Low (digital only) |
| DeepFool | Minimal perturbation search | L2 | Strong | Low (digital only) |
| Physical patches | Printable adversarial patterns | Unconstrained | Variable | High (physical world) |
| EOT | Expectation over Transformation | Variable | Strong | High (robust to conditions) |
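For the black-box query family in the table, gradients can be estimated from score queries alone. A toy NES-style sketch against a stand-in `predict` function (the function names and the linear "model" are assumptions for illustration):

```python
import numpy as np

def estimate_gradient(predict, x, target_class, sigma=1e-3, n_samples=2000, rng=None):
    """NES-style finite-difference estimate of d(score)/dx using random probes."""
    rng = rng or np.random.default_rng(0)
    grad = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        u = rng.standard_normal(x.shape)
        # Antithetic sampling: query the model in the +u and -u directions
        diff = predict(x + sigma * u)[target_class] - predict(x - sigma * u)[target_class]
        grad += diff * u
    return grad / (2 * sigma * n_samples)

# Toy "model": the class-0 score is linear in x, so the true gradient is w
w = np.array([0.5, -1.0, 2.0])
predict = lambda x: np.array([float(w @ x), 0.0])

g = estimate_gradient(predict, np.zeros(3), target_class=0)
print(np.round(g, 1))  # close to [0.5, -1.0, 2.0]
```

Real query attacks (e.g., NES or SPSA variants) budget thousands of API queries per image; the estimator above is the core idea.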
Defensive Use Cases
Model Robustness Auditing
Test surveillance model brittleness before operational deployment. Identify failure modes, compute adversarial accuracy, and establish minimum confidence thresholds.
Multi-Sensor Fallback Design
Design fallback paths when single-model confidence drops. If face recognition is unreliable, the system should gracefully degrade to gait or device-based identification rather than failing silently.
Threshold Calibration
Tune confidence thresholds and human-review gates for high-impact alerts. Adversarial testing reveals where automatic thresholds may produce dangerous false positives or false negatives.
Governance Documentation
Document model failure modes for governance, external audit, and regulatory compliance. Adversarial evaluation reports demonstrate due diligence in model deployment decisions.
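The fallback design described above can be sketched as a priority chain over modalities; everything here (names, thresholds, readings) is hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SensorReading:
    subject_id: Optional[str]
    confidence: float

def identify_with_fallback(sensors, threshold=0.8):
    """Try each identification modality in priority order; fall back when confidence is low."""
    best_name, best = "none", SensorReading(None, 0.0)
    for name, read in sensors:
        reading = read()
        if reading.confidence >= threshold:
            return name, reading              # confident match: stop here
        if reading.confidence > best.confidence:
            best_name, best = name, reading   # remember the best low-confidence reading
    # No modality was confident: surface the best reading, flagged for human review
    return f"{best_name} (needs review)", best

sensors = [
    ("face", lambda: SensorReading("A123", 0.42)),  # degraded (e.g., by an adversarial patch)
    ("gait", lambda: SensorReading("A123", 0.87)),  # unaffected fallback modality
]
name, reading = identify_with_fallback(sensors)
print(name, reading.confidence)  # -> gait 0.87
```

The key property is the explicit "needs review" path: the system degrades to a human decision instead of failing silently.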
Physical-World Constraints
Digital adversarial examples do not automatically transfer to the physical world. Understanding these constraints is critical for realistic threat assessment.
- Printing artifacts: Printer resolution, color gamut limitations, and paper reflectivity degrade subtle perturbations to noise.
- Environmental variation: Lighting changes, weather, distance, and sensor compression all affect perturbation effectiveness. EOT-trained patches are more robust but still degraded.
- Model transferability: Adversarial examples crafted for one model often fail against different architectures. Black-box transferability is limited without ensemble techniques.
- Multi-camera coverage: Single-angle patches may be neutralized by overlapping cameras at different angles. Multi-view systems are significantly harder to defeat.
- Adversarial defenses: Some deployed systems include adversarial detection preprocessing (spatial smoothing, JPEG compression, feature squeezing) that specifically targets adversarial perturbations.
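EOT, mentioned above, optimizes the expected objective over a distribution of transformations rather than a single view. A toy PyTorch sketch, with a stand-in differentiable "detector" and a made-up transformation model (both are illustrative assumptions, not a real attack pipeline):

```python
import torch

def detector_score(patch):
    """Stand-in for a differentiable detector: mean pixel value of the patch."""
    return patch.mean()

def random_transform(patch, gen):
    """Random brightness scaling plus noise, simulating print and lighting variation."""
    scale = 0.7 + 0.6 * torch.rand(1, generator=gen)
    noise = 0.02 * torch.randn(patch.shape, generator=gen)
    return (patch * scale + noise).clamp(0, 1)

gen = torch.Generator().manual_seed(0)
patch = torch.full((8, 8), 0.5, requires_grad=True)
opt = torch.optim.Adam([patch], lr=0.05)
for _ in range(100):
    opt.zero_grad()
    # EOT objective: average the detector score over sampled transformations
    loss = torch.stack([detector_score(random_transform(patch, gen))
                        for _ in range(8)]).mean()
    loss.backward()
    opt.step()
    with torch.no_grad():
        patch.clamp_(0, 1)  # keep the patch in a printable [0, 1] range
print(round(detector_score(patch).item(), 2))
```

Because the gradient is averaged across sampled conditions, the resulting patch works in expectation rather than only for one exact camera view; this is why EOT patches survive physical variation better than digital-only perturbations.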
FGSM Attack Implementation
The simplest adversarial attack; it demonstrates the core concept of gradient-based perturbation.
#!/usr/bin/env python3
# Prerequisites: pip install torch torchvision Pillow numpy
"""FGSM (Fast Gradient Sign Method) adversarial example generation.
Demonstrates how small perturbations can fool classifiers."""
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image
import numpy as np
# Load pre-trained model
# weights= replaces the deprecated pretrained=True (torchvision >=0.13)
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.eval()
# Preprocessing without ImageNet normalization: pixels stay in [0, 1], so the
# FGSM clamp below keeps adversarial images in a valid range. (For a production
# audit, fold mean/std normalization into the model's forward pass instead.)
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
def fgsm_attack(image_tensor, epsilon, data_grad):
    """Generate an adversarial example using FGSM."""
    sign_grad = data_grad.sign()
    perturbed = image_tensor + epsilon * sign_grad
    # Clamp to the valid [0, 1] image range
    return torch.clamp(perturbed, 0, 1)
# Load and preprocess image
img = Image.open("test_face.jpg")
input_tensor = preprocess(img).unsqueeze(0)
input_tensor.requires_grad = True
# Forward pass
output = model(input_tensor)
original_class = output.argmax(dim=1).item()
original_conf = F.softmax(output, dim=1).max().item()
# Generate adversarial gradient
loss = F.cross_entropy(output, torch.tensor([original_class]))
model.zero_grad()
loss.backward()
# Test different epsilon values.
# Epsilon controls perturbation magnitude: 0.001-0.01 is imperceptible; 0.02-0.05 shows faint artifacts on zoom.
for eps in [0.001, 0.005, 0.01, 0.02, 0.05]:
    perturbed = fgsm_attack(input_tensor, eps, input_tensor.grad.data)
    adv_output = model(perturbed)
    adv_class = adv_output.argmax(dim=1).item()
    adv_conf = F.softmax(adv_output, dim=1).max().item()
    # Measure perturbation visibility
    l2_norm = torch.norm(perturbed - input_tensor).item()
    linf_norm = torch.max(torch.abs(perturbed - input_tensor)).item()
    print(f"eps={eps:.3f} | class: {original_class}→{adv_class} | "
          f"conf: {original_conf:.3f}→{adv_conf:.3f} | "
          f"L2: {l2_norm:.3f} | L∞: {linf_norm:.3f}")
# Expected output (illustrative; the script prints numeric ImageNet class indices rather than names):
# Epsilon | Original Class   | Adversarial Class | Confidence | Success
# ----------------------------------------------------------------------
# 0.001   | golden_retriever | golden_retriever  | 89.2%      | NO
# 0.005   | golden_retriever | golden_retriever  | 71.4%      | NO
# 0.01    | golden_retriever | Labrador          | 54.8%      | YES (misclassified)
# 0.02    | golden_retriever | tennis_ball       | 67.3%      | YES
# 0.05    | golden_retriever | shower_curtain    | 82.1%      | YES
PGD Attack (Iterative)
Stronger iterative attack used as the gold standard for adversarial robustness evaluation.
#!/usr/bin/env python3
# Prerequisites: pip install torch torchvision Pillow numpy
"""PGD (Projected Gradient Descent) ā stronger iterative attack.
More robust than FGSM, commonly used for adversarial training evaluation."""
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image
# weights= replaces the deprecated pretrained=True (torchvision >=0.13)
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.eval()
def pgd_attack(model, image, label,
               eps=0.03,     # maximum total perturbation budget (L∞ norm), roughly 8/255 pixel intensity
               alpha=0.005,  # step size per iteration; smaller = finer-grained but slower convergence
               iters=40):    # number of projected gradient steps; more iterations = stronger but slower
    """Projected Gradient Descent adversarial attack."""
    perturbed = image.clone().detach().requires_grad_(True)
    for _ in range(iters):
        output = model(perturbed)
        loss = F.cross_entropy(output, label)
        loss.backward()
        # Gradient ascent step along the sign of the loss gradient
        adv = perturbed + alpha * perturbed.grad.sign()
        # Project back into the epsilon ball around the original image
        perturbation = torch.clamp(adv - image, min=-eps, max=eps)
        perturbed = torch.clamp(image + perturbation, min=0, max=1).detach().requires_grad_(True)
    return perturbed
# Usage
preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
])
input_tensor = preprocess(Image.open("test.jpg")).unsqueeze(0)
original_output = model(input_tensor)
original_class = original_output.argmax(dim=1).item()
label = torch.tensor([original_class])
adversarial = pgd_attack(model, input_tensor, label, eps=0.03, iters=40)
# Check result
adv_output = model(adversarial)
adv_class = adv_output.argmax(dim=1).item()
adv_conf = F.softmax(adv_output, dim=1).max().item()
print(f"Original: {original_class} | Adversarial: {adv_class} ({adv_conf:.1%})")
print(f"L∞ perturbation: {torch.max(torch.abs(adversarial - input_tensor)).item():.4f}")
print(f"L2 perturbation: {torch.norm(adversarial - input_tensor).item():.3f}")
# Expected output (illustrative; the script prints numeric class indices rather than names):
# PGD Attack (eps=0.030, alpha=0.005, iters=40)
# Original prediction: golden_retriever (91.3%)
# Adversarial prediction: paper_towel (73.8%), misclassified
# L∞ perturbation: 0.0300 | L2 perturbation: 2.847
IBM ART Comprehensive Framework
The Adversarial Robustness Toolbox provides a unified interface for multiple attack methods and defenses.
#!/usr/bin/env python3
# Prerequisites: pip install adversarial-robustness-toolbox torch torchvision numpy
"""IBM Adversarial Robustness Toolbox (ART) ā comprehensive attack/defense framework.
Supports multiple attack methods, defenses, and detection techniques."""
from art.attacks.evasion import ProjectedGradientDescent, FastGradientMethod
from art.attacks.evasion import CarliniL2Method, DeepFool
from art.estimators.classification import PyTorchClassifier
from art.defences.preprocessor import SpatialSmoothing, FeatureSqueezing
import torch
import torch.nn as nn
from torchvision import models, transforms, datasets
import numpy as np
# Load pre-trained model and wrap for ART
model = models.resnet50(weights="IMAGENET1K_V2")
model.eval()
criterion = nn.CrossEntropyLoss()
classifier = PyTorchClassifier(
    model=model,
    loss=criterion,
    input_shape=(3, 224, 224),
    nb_classes=1000,
    clip_values=(0.0, 1.0),  # valid pixel range of the float input tensors (unnormalized here)
)
# Load a small test batch (ImageNet-style or custom face images)
# Note: test_images/ must use ImageFolder layout: test_images/class_name/image.jpg
# (e.g., test_images/dog/photo1.jpg). Replace 'test_images/' with your dataset directory.
preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
])
test_dataset = datasets.ImageFolder("test_images/", transform=preprocess)
loader = torch.utils.data.DataLoader(test_dataset, batch_size=10, shuffle=False)
x_batch, y_batch = next(iter(loader))
x_test = x_batch.numpy() # shape: (10, 3, 224, 224)
y_test = y_batch.numpy()
# Attack comparison
attacks = {
    "FGSM": FastGradientMethod(estimator=classifier, eps=0.03),
    "PGD-40": ProjectedGradientDescent(
        estimator=classifier, eps=0.03, eps_step=0.005, max_iter=40
    ),
    # C&W confidence parameter: higher = more confidently misclassified, but larger perturbation
    "C&W-L2": CarliniL2Method(classifier=classifier, confidence=0.5, max_iter=100),
    "DeepFool": DeepFool(classifier=classifier, max_iter=50),
}
print(f"{'Attack':<12} | {'Success':>8} | {'Avg L2':>10}")
print("-" * 40)
for name, attack in attacks.items():
    x_adv = attack.generate(x=x_test)
    # Evaluate: the attack succeeds when the prediction changes
    preds_clean = np.argmax(classifier.predict(x_test), axis=1)
    preds_adv = np.argmax(classifier.predict(x_adv), axis=1)
    success_rate = np.mean(preds_clean != preds_adv)
    l2_dist = np.mean(np.sqrt(np.sum((x_adv - x_test) ** 2, axis=(1, 2, 3))))
    print(f"{name:<12} | {success_rate:>7.1%} | {l2_dist:>10.4f}")
# Defense evaluation
print("\n--- Defenses ---")
defenses = {
    "Spatial Smoothing 3x3": SpatialSmoothing(window_size=3),
    "Feature Squeezing 8-bit": FeatureSqueezing(bit_depth=8, clip_values=(0, 1)),
}
# Test defenses against the last attack's adversarial examples (x_adv from the loop above)
for def_name, defense in defenses.items():
    x_def, _ = defense(x_adv)
    preds_def = np.argmax(classifier.predict(x_def), axis=1)
    recovery_rate = np.mean(preds_def == preds_clean)
    print(f"{def_name:<30} | Recovery: {recovery_rate:.1%}")
# Expected output (illustrative values; columns match the print format above):
# Attack       |  Success |     Avg L2
# ----------------------------------------
# FGSM         |    72.0% |     3.2140
# PGD-40       |    88.0% |     2.8910
# C&W-L2       |    96.0% |     1.2470
# DeepFool     |    84.0% |     1.8920
Physical Patch Evaluation
Framework for testing adversarial patches under realistic physical-world conditions.
#!/usr/bin/env python3
"""Physical-world adversarial patch evaluation framework.
Tests patch effectiveness across angles, distances, and lighting conditions.
Requires: pip install ultralytics opencv-python numpy"""
import cv2
import numpy as np
from ultralytics import YOLO
# Load YOLOv8 model (downloads weights on first run)
detector = YOLO("yolov8n.pt")
def evaluate_detector(frame, model, conf_threshold=0.25):  # YOLO detection threshold (lower = more sensitive)
    """Run the YOLOv8 detector and return the maximum person-class confidence."""
    results = model(frame, verbose=False, conf=conf_threshold)
    max_conf = 0.0
    for r in results:
        for box in r.boxes:
            cls_id = int(box.cls[0])
            conf = float(box.conf[0])
            # Class 0 = person in COCO
            if cls_id == 0 and conf > max_conf:
                max_conf = conf
    return max_conf
def physical_patch_evaluation(video_dir, model):
    """Evaluate an adversarial patch across physical conditions."""
    conditions = {
        "angles": [0, 15, 30, 45, 60, 75, 90],
        "distances_m": [2, 5, 10, 15, 20],
        # Lighting variants are documented here; extend the filename grid below to iterate them.
        "lighting": ["bright", "overcast", "indoor", "low_light", "backlit"],
    }
    results = []
    for angle in conditions["angles"]:
        for dist in conditions["distances_m"]:
            # Load the frame for this condition.
            # Expected file naming: angle{A}_dist{D}.jpg (e.g., angle0_dist5.jpg, angle30_dist10.jpg)
            frame_path = f"{video_dir}/angle{angle}_dist{dist}.jpg"
            frame = cv2.imread(frame_path)
            if frame is None:
                continue
            conf = evaluate_detector(frame, model)
            results.append({
                "angle": angle, "distance": dist,
                "confidence": conf,
                # If person confidence stays below 0.3, the patch successfully degraded detection
                "evaded": conf < 0.3,
            })
    # Summary report
    print(f"{'Angle':>6} {'Distance':>10} {'Confidence':>12} {'Evaded':>8}")
    print("-" * 40)
    for r in results:
        status = "YES" if r["evaded"] else "NO"
        print(f"{r['angle']:>6}° {r['distance']:>9}m {r['confidence']:>12.3f} {status:>8}")
    evasion_rate = sum(1 for r in results if r["evaded"]) / len(results) if results else 0
    print(f"\nOverall evasion rate: {evasion_rate:.1%}")

# Example invocation (directory name is illustrative)
physical_patch_evaluation("patch_frames", detector)
Responsible Research
Related: Face Recognition & Physical Defense
Key Takeaways for Defenders
- No model is robust to all attacks: adversarial training improves but doesn't eliminate vulnerability
- Physical attacks degrade: real-world conditions significantly reduce digital attack effectiveness
- Multi-model ensembles resist transfer: use diverse architectures to reduce single-point failure
- Human review remains essential: automate detection but keep humans in the loop for high-stakes decisions
- Test continuously: new attack methods emerge regularly; robustness evaluation must be ongoing
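The ensemble-transfer takeaway can be sanity-checked even on toy linear models: craft FGSM examples against model A and count how many also fool model B (all numbers and models here are synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two toy linear 2-class classifiers over 10-d inputs, standing in for diverse architectures
W_a = rng.standard_normal((2, 10))
W_b = rng.standard_normal((2, 10))
predict = lambda W, x: int(np.argmax(W @ x))

def fgsm_linear(W, x, y, eps):
    """FGSM on a linear model: the loss gradient is W[other] - W[predicted]."""
    grad = W[1 - y] - W[y]
    return x + eps * np.sign(grad)

transferred, total = 0, 0
for _ in range(200):
    x = rng.standard_normal(10)
    y = predict(W_a, x)
    if predict(W_b, x) != y:
        continue                    # only count points both models agree on
    x_adv = fgsm_linear(W_a, x, y, eps=0.5)
    total += 1
    if predict(W_a, x_adv) != y and predict(W_b, x_adv) != y:
        transferred += 1            # the example crafted on A also fools B
print(f"transfer rate: {transferred / total:.0%}")
```

The transfer rate is typically well below the white-box success rate, which is exactly why diverse-architecture ensembles reduce single-point failure.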
Adversarial ML Labs
Hands-on exercises to understand adversarial techniques and defenses.