Social Engineering
Advanced
MITRE ATT&CK: T1566 (Phishing), T1598 (Phishing for Information)

AI Social Engineering

AI has fundamentally transformed social engineering from a craft into a scalable weapon. Deepfake video, real-time voice cloning, and LLM-generated phishing can now bypass human intuition at scale. Understanding these techniques is essential for red teamers and defenders alike.

Legal & Ethical Boundaries

AI social engineering tools can cause serious real-world harm. Only use these techniques in authorised red team engagements with explicit written scope that includes social engineering. Deepfakes and voice cloning without consent may violate laws in your jurisdiction.

The AI Social Engineering Threat Landscape

AI Social Engineering Attack Surface (2026)

  • Generation: text / email (LLM-crafted phishing), voice clone (3s sample → full voice), deepfake video (real-time face swap + lip sync)
  • Delivery: email, phone (vishing), video call, SMS / chat
  • Objectives: credential harvest, wire transfer, MFA bypass, access

Real-World Incidents

  • 2024 ($25M deepfake heist): A Hong Kong finance worker was tricked into transferring funds after a video call with AI-generated deepfakes of senior executives.
  • 2019 (CEO voice clone): Criminals used AI voice cloning to impersonate a parent-company chief executive on a phone call, persuading the head of a UK energy firm to authorise a fraudulent €220K wire transfer.
  • 2025 (election deepfakes): AI-generated robocalls mimicking political candidates were used in attempts to suppress voter turnout in multiple countries.

Why Attackers Love AI

  • Scale: Generate thousands of unique, personalised phishing emails in minutes
  • Quality: Perfect grammar, cultural context, and writing style mimicry
  • Speed: Real-time voice cloning needs only a 3-second sample
  • Cost: Open-source models make deepfakes free to produce
  • Evasion: Each output is unique — defeats signature-based email filters

1. LLM-Powered Phishing

Traditional phishing relies on templates that security-aware users learn to spot. AI-generated phishing is contextually unique, grammatically perfect, and can be personalised using OSINT data scraped from LinkedIn, social media, and company websites.

Red Team Simulation Framework

python
# Phishing simulation framework for authorised red team engagements
# REQUIRES: Written authorisation with social engineering in scope

import openai
import json

def generate_phishing_pretext(target_info: dict, scenario: str) -> dict:
    """Generate a context-appropriate phishing pretext.
    
    Args:
        target_info: OSINT data about the target (name, role, company, interests)
        scenario: Attack scenario (credential_harvest, malware_delivery, wire_fraud)
    """
    prompt = f"""You are simulating a phishing email for an authorised red team engagement.
    
Target profile:
- Name: {target_info['name']}
- Role: {target_info['role']}
- Company: {target_info['company']}
- Recent activity: {target_info.get('recent_activity', 'N/A')}

Scenario: {scenario}

Generate a realistic phishing email that would be contextually appropriate for this 
target. Include subject line, sender name, and email body. The email should leverage 
the target's role and recent activity for credibility.

Format as JSON: {{"subject": "", "from_name": "", "from_address": "", "body": ""}}"""

    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7
    )
    return json.loads(response.choices[0].message.content)

# Example usage in authorised engagement
target = {
    "name": "Jane Smith",
    "role": "VP of Engineering", 
    "company": "Acme Corp",
    "recent_activity": "Spoke at CloudConf 2026 about Kubernetes migration"
}

email = generate_phishing_pretext(target, "credential_harvest")
print(f"Subject: {email['subject']}")
print(f"From: {email['from_name']} <{email['from_address']}>")
print(f"\n{email['body']}")

GoPhish + LLM Integration

For full red team campaigns, integrate LLM-generated content with GoPhish to track open rates, click rates, and credential submissions. Generate unique email content per target to defeat email clustering defences.

2. Voice Cloning & Vishing

Modern voice cloning models need as little as 3 seconds of audio to produce a convincing clone. Combined with real-time speech-to-speech models, attackers can conduct live phone calls in someone else's voice.

Voice Cloning Tools (Research/Red Team)

| Tool | Type | Sample Needed | Real-time? |
| --- | --- | --- | --- |
| ElevenLabs | Cloud API | ~1 min audio | Yes (streaming) |
| OpenVoice | Open-source | ~10 seconds | Near real-time |
| RVC (Retrieval-based Voice Conversion) | Open-source | ~10 min (training) | Yes |
| Fish Speech | Open-source | ~3 seconds | Near real-time |
| F5-TTS | Open-source | ~5 seconds | Near real-time |
| StyleTTS2 | Open-source | ~10 seconds | No (batch) |
| Parler TTS | Open-source (HF) | ~10 seconds | No (batch) |

Red Team Vishing Workflow

bash
# Vishing attack simulation workflow (authorised engagement only)

# Step 1: Collect voice sample from public sources
# LinkedIn videos, YouTube talks, podcast appearances, earnings calls
yt-dlp -x --audio-format wav "https://youtube.com/watch?v=TARGET_TALK"

# Step 2: Clone voice with OpenVoice (local, no data leakage)
git clone https://github.com/myshell-ai/OpenVoice.git
cd OpenVoice
pip install -e .

python openvoice_cli.py \
  --reference_audio target_sample.wav \
  --text "Hi, this is [Name] from IT. We detected unusual activity on your account. 
          I need you to verify your identity by logging into our security portal." \
  --output vishing_sample.wav

# Step 3: Real-time voice conversion for live calls
# Use RVC or SoVITS for live voice conversion during a phone call
# Pipe microphone → voice model → VOIP output

# Step 4: Combine with AI-generated pretext
# Feed OSINT about the target into an LLM to generate a contextual script
# The pretext should reference real projects, people, or events

3. Deepfake Video

Real-time deepfake technology allows attackers to impersonate anyone on a video call. This has already been used in the wild for fraud, and the barrier to entry is dropping rapidly.

Deepfake Tools

  • DeepFaceLive: Real-time face swap for video calls (open-source)
  • SimSwap: High-fidelity face swapping with single image
  • Wav2Lip: Accurate lip sync for any face with any audio
  • FaceFusion: Next-gen face swapping and enhancement
  • Deep-Live-Cam: Real-time face swap for video calls (successor to roop)
  • LivePortrait: Animate portraits from a single source image

Attack Scenarios

  • Executive impersonation: Deepfake CEO on Zoom authorising wire transfers
  • IT help desk: Fake IT admin on Teams requesting credentials
  • Vendor impersonation: Fake supplier representative changing payment details
  • KYC bypass: Deepfake video verification to open fraudulent accounts

3.5 Video Synthesis & the Detection Arms Race

The 2024–2026 wave of frontier video generators (OpenAI Sora 2, Google Veo 3, Runway Gen-3 Alpha, HeyGen 4, Hedra Character-3, Synthesia 2.0) collapsed the cost of producing a 30-second photorealistic avatar from "film studio" to "a $20 subscription." Real-time avatar pipelines (D-ID Live, HeyGen Streaming Avatar, Hedra) now drive Zoom / Teams / WhatsApp video at <500 ms end-to-end latency. Combined with voice cloning above, an attacker can run a fully interactive deepfake call against a finance team or a help desk.

Real-world incidents (2024–2026)

  • Arup, Hong Kong (Jan 2024) — finance employee wired US$25.6 M after a deepfake video call with a fake "CFO" plus several fake colleagues.
  • Ferrari (July 2024) — attempted CEO voice/video impersonation against an executive; foiled by a challenge question only the real CEO could answer.
  • WPP (May 2024) — deepfake of CEO Mark Read on Teams used to launch a wire-fraud attempt against an exec.
  • 2025 — multiple US-state political deepfake robocalls and an OpenAI-flagged use of voice cloning in influence ops (publicly disclosed).

2025–2026 Generation Stack

  • OpenAI Sora 2: Long-form text-to-video, audio-aligned; watermarked with C2PA + SynthID-like metadata.
  • Google Veo 3: Native synced audio + video; 1080p / 8s clips; SynthID watermark embedded.
  • Runway Gen-3 Alpha / Gen-4: Image-to-video, motion brush, subject consistency.
  • HeyGen 4 / Streaming Avatar: Realtime avatars on live video calls.
  • Hedra Character-3: Audio-driven talking-head with strong lip-sync.
  • D-ID Live / Synthesia 2.0: Enterprise streaming avatars; used in real-world help-desk impersonation tests.
  • Open-source: Wan 2.1, Hunyuan Video, Mochi 1, CogVideoX, LTX-Video — self-hostable, no provenance metadata by default.

Detection Stack

  • Reality Defender: commercial deepfake detection across image / video / audio; DOD & bank deployments.
  • Sensity AI: deepfake + AI-generated content detection; KYC focus.
  • Intel FakeCatcher: remote photoplethysmography (rPPG) — looks for blood-flow micro-changes that GANs don't reproduce.
  • Microsoft Video Authenticator + Content Credentials (C2PA).
  • Google SynthID: imperceptible watermark detector for Veo / Imagen / Lyria outputs.
  • OpenAI DALL·E / Sora detector + C2PA Content Credentials.
  • TrueMedia.org / Hive Moderation: ensemble classifiers used by newsrooms.
  • Open-source: DeepfakeBench, AltFreezing, DFDC baseline models.

Detection is necessary but not sufficient

Independent benchmarks in 2025 (e.g., Mozilla / OWASP DeepfakeBench-2025) show even leading detectors drop to 55–70% accuracy on the newest generators within months of release. Treat detection as a signal, not a verdict. Combine with provenance metadata (C2PA), out-of-band verification, and process controls (dual-control wire transfers, callback rules).
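
One way to read "a signal, not a verdict" is as a decision rule in which the detector score can add friction but can never replace a process control. The sketch below is only an illustration: the `RequestContext` fields, the 0.5 score threshold, and the 10,000 value limit are hypothetical and are not taken from any product listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestContext:
    """Hypothetical inputs for one payment request made over a call."""
    detector_score: float             # 0.0 (looks real) to 1.0 (looks synthetic)
    provenance_valid: Optional[bool]  # None = no C2PA data, False = present but broken
    callback_verified: bool           # requester confirmed via a known-good number
    second_approver: bool             # dual control satisfied
    amount: float                     # value of the requested transfer


def decide(ctx: RequestContext, high_value: float = 10_000.0) -> str:
    """Return 'approve', 'escalate', or 'block' for a payment request."""
    # Process controls are mandatory for high-value requests; a clean
    # detector score never substitutes for them.
    if ctx.amount >= high_value and not (ctx.callback_verified and ctx.second_approver):
        return "block"
    # A suspicious score or broken provenance adds friction but is not proof.
    if ctx.detector_score >= 0.5 or ctx.provenance_valid is False:
        return "escalate"
    return "approve"


if __name__ == "__main__":
    convincing_fake = RequestContext(0.05, None, False, False, 250_000.0)
    print(decide(convincing_fake))  # blocked despite the low detector score
```

The ordering is the design choice: a deepfake that fools the detector still fails because the callback and second approver are checked first.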

4. Detection & Defence

Defending against AI social engineering requires both technical controls and human awareness training. Traditional email filters are insufficient against LLM-generated, contextually unique content.

Technical Defences

  • Voice verification protocols: Callback procedures with pre-shared code words
  • Deepfake detection models: Microsoft Video Authenticator, Intel FakeCatcher
  • AI email analysis: Analyse writing style deviation from known sender patterns
  • DMARC / SPF / DKIM: Still essential — blocks impersonation at the email protocol level
  • Out-of-band verification: Verify high-value requests via a separate communication channel
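
As a rough illustration of the DMARC / SPF / DKIM point, the sketch below reads the `Authentication-Results` header that a receiving mail server adds and flags messages whose DMARC result is not `pass`. It uses only the Python standard library. The sample message and the `is_suspicious` rule are assumptions for the example, and the header should only be trusted when it was written by your own mail gateway, since a sender can forge one.

```python
import re
from email import message_from_string

# Hypothetical message: SPF and DKIM pass for the attacker's own lookalike
# domain setup, but DMARC alignment with the visible From domain fails.
SAMPLE = (
    "Authentication-Results: mx.example.org;\n"
    " spf=pass smtp.mailfrom=example.com;\n"
    " dkim=pass header.d=example.com;\n"
    " dmarc=fail header.from=examp1e.com\n"
    "From: ceo@examp1e.com\n"
    "Subject: Urgent wire\n"
    "\n"
    "Please process today.\n"
)


def auth_results(raw_message: str) -> dict:
    """Extract spf/dkim/dmarc verdicts from Authentication-Results headers."""
    msg = message_from_string(raw_message)
    results = {"spf": "none", "dkim": "none", "dmarc": "none"}
    for header in msg.get_all("Authentication-Results", []):
        for method in results:
            match = re.search(rf"\b{method}=(\w+)", header, re.IGNORECASE)
            if match:
                results[method] = match.group(1).lower()
    return results


def is_suspicious(raw_message: str) -> bool:
    """Treat anything other than an explicit DMARC pass as suspicious."""
    return auth_results(raw_message)["dmarc"] != "pass"


if __name__ == "__main__":
    print(auth_results(SAMPLE), is_suspicious(SAMPLE))
```

This only covers protocol-level impersonation. A well-written message sent from a legitimately authenticated lookalike domain will still pass, which is why the list pairs it with out-of-band verification.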

Human Defences

  • AI-aware training: Teach staff that voice and video can be faked
  • Challenge phrases: Pre-agreed words for verifying identity in calls
  • Dual authorisation: Wire transfers require 2 people to approve
  • Red team exercises: Regular simulated attacks with AI-generated content
  • Slow down urgency: Train staff to pause when pressured for immediate action
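
The challenge-phrase and dual-authorisation controls above can be sketched in a few lines. This is a minimal illustration, not a vetted implementation. The function names, the PBKDF2 iteration count, and the rule that the requester never counts as an approver are assumptions made for the example.

```python
import hashlib
import hmac
import os


def hash_phrase(phrase: str, salt: bytes) -> bytes:
    """Derive a salted hash so the phrase itself is never stored in clear."""
    normalised = " ".join(phrase.lower().split())
    return hashlib.pbkdf2_hmac("sha256", normalised.encode(), salt, 100_000)


def phrase_matches(spoken: str, salt: bytes, stored: bytes) -> bool:
    """Constant-time comparison of the spoken phrase against the stored hash."""
    return hmac.compare_digest(hash_phrase(spoken, salt), stored)


def transfer_allowed(requester: str, approvers: set, required: int = 2) -> bool:
    """Dual authorisation: approvals must come from people other than the requester."""
    independent = set(approvers) - {requester}
    return len(independent) >= required


if __name__ == "__main__":
    salt = os.urandom(16)
    stored = hash_phrase("blue heron tuesday", salt)
    print(phrase_matches("Blue Heron Tuesday", salt, stored))  # True
    print(transfer_allowed("alice", {"alice", "bob"}))          # False
```

Neither check depends on how convincing the caller sounds or looks, which is why these controls hold up against cloned voices and faces.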

Red Team Reporting

When reporting AI social engineering findings, document: the AI model used, the OSINT data that enabled personalisation, the attack success rate, and specific recommendations for that organisation. Include audio/video samples (with consent) to demonstrate the realism of the attack to executives.
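
For the success-rate part of a report, a small helper can turn raw campaign events into per-stage rates. The event format below (`target` and `stage` keys, with the stage names shown) is a hypothetical schema chosen for illustration; it is not the export format of GoPhish or any other tool.

```python
STAGES = ("sent", "opened", "clicked", "submitted", "reported")


def summarise(events: list) -> dict:
    """Compute per-stage rates, counting each target at most once per stage."""
    reached = {stage: set() for stage in STAGES}
    for event in events:
        if event["stage"] in reached:
            reached[event["stage"]].add(event["target"])
    sent = len(reached["sent"])
    summary = {"sent": sent}
    for stage in STAGES[1:]:
        rate = len(reached[stage]) / sent if sent else 0.0
        summary[f"{stage}_rate"] = round(rate, 3)
    return summary


if __name__ == "__main__":
    events = [{"target": t, "stage": "sent"} for t in ("t1", "t2", "t3", "t4")]
    events += [{"target": t, "stage": "opened"} for t in ("t1", "t2", "t3")]
    events += [{"target": t, "stage": "clicked"} for t in ("t1", "t2")]
    events += [{"target": "t1", "stage": "submitted"}]
    events += [{"target": "t4", "stage": "reported"}]
    print(summarise(events))
```

Including a `reported` stage alongside the click and submission rates lets the report show how often staff escalated the message, not only how often the attack succeeded.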

4.5 Biometric & MFA Bypass, Smishing, RCS Deepfakes

AI-generated audio and video have moved past chat fraud into account-recovery, KYC onboarding, and voice-MFA bypass. Banks, exchanges, and gig platforms that rely on "selfie video + ID" or "voice passphrase" are the soft underbelly.

Voice-MFA & call-centre bypass

  • Pindrop, Nuance Gatekeeper, ID R&D, Daon: commercial voice-biometrics now ship a separate "synthetic-speech detector" alongside speaker verification.
  • Real-world: Vice (2023) showed an ElevenLabs clone defeating Lloyds Bank voice ID; multiple US banks have since added liveness challenges.
  • Defence: require a knowledge factor plus a possession factor on top of voice; treat voice as a low-assurance signal.

Selfie-liveness / KYC bypass

  • Onfido, Veriff, iProov, Sumsub, Persona: 2024–2026 disclosures show deepfake injection via virtual cameras (OBS + Deep-Live-Cam) bypassing passive liveness.
  • Active liveness (head turns, randomised challenges) blocks most pre-rendered deepfakes; iProov Flashmark uses a randomised face illumination as a one-time challenge.
  • Defence: insist on active liveness with randomised challenges, hardware attestation of the camera (e.g., iOS DeviceCheck / Android Play Integrity), document-NFC chip read.
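
To make "randomised challenges" concrete, the sketch below issues an unpredictable, short-lived sequence of actions and checks the response against it. It is a simplified illustration under stated assumptions: the action names, the 30-second lifetime, and the `observed` list (which in practice would come from a pose or expression estimator) are all hypothetical, and it does not reflect how any named vendor implements liveness.

```python
import secrets
import time
from typing import Optional

ACTIONS = ("turn_left", "turn_right", "look_up", "look_down", "blink", "smile")


def issue_challenge(steps: int = 3, ttl_seconds: int = 30) -> dict:
    """Create a one-time challenge with a random action order and an expiry."""
    rng = secrets.SystemRandom()  # OS-backed randomness, not predictable
    return {
        "nonce": secrets.token_urlsafe(16),
        "actions": rng.sample(ACTIONS, steps),
        "expires_at": time.time() + ttl_seconds,
    }


def verify_response(challenge: dict, nonce: str, observed: list,
                    now: Optional[float] = None) -> bool:
    """Accept only the right actions, in order, for this nonce, before expiry."""
    now = time.time() if now is None else now
    if now > challenge["expires_at"]:
        return False
    if not secrets.compare_digest(nonce.encode(), challenge["nonce"].encode()):
        return False
    return list(observed) == challenge["actions"]


if __name__ == "__main__":
    challenge = issue_challenge()
    print(challenge["actions"])
    print(verify_response(challenge, challenge["nonce"], challenge["actions"]))
```

A pre-rendered clip cannot anticipate the sequence, which is the property the bullet above relies on. A server would also need to discard each challenge after one use, and real-time injection through a virtual camera still calls for the camera attestation mentioned above.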

Smishing & RCS deepfakes

  • RCS (Rich Communication Services) is now default on Android and iOS 18+; verified-sender branding is widely abused via lookalike business profiles.
  • AI-generated short voice notes / video clips embedded in RCS / WhatsApp / iMessage are the new pretext vehicle — "hi, it's your daughter, my phone broke…".
  • Telecoms response: STIR/SHAKEN A-attestation for voice; for messaging, Google Verified Calls + Apple Business Connect; spam-AI filters in carriers.
  • Defence: family/team challenge phrases; out-of-band callback to known number; user training that attaches voice/video samples to phishing exercises.

Phishing-resistant MFA is the answer

  • Replace voice / SMS / TOTP with FIDO2 / WebAuthn passkeys for both consumer and workforce.
  • For workforce: Windows Hello for Business / YubiKey 5 / iCloud Keychain passkeys + Conditional Access requiring "phishing-resistant MFA."
  • For payments: out-of-band approval in the bank app rather than voice authorisation.
  • Deepfakes do not bypass FIDO origin binding — the strongest single mitigation in this entire domain.
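
Origin binding is easier to see in code. During a WebAuthn login the browser, not the user, writes the page origin into `clientDataJSON`, and the server rejects any assertion whose origin or challenge does not match. The sketch below shows only that check. The `simulate_browser` helper and the example domains are hypothetical, and a real deployment should use a maintained WebAuthn library, which also verifies the signature and the RP ID hash.

```python
import base64
import hmac
import json


def b64url_decode(data: str) -> bytes:
    """Decode unpadded base64url as used by WebAuthn."""
    return base64.urlsafe_b64decode(data + "=" * (-len(data) % 4))


def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def check_client_data(client_data_b64: str, expected_challenge: bytes,
                      expected_origin: str) -> bool:
    """Check the type, origin, and challenge fields of clientDataJSON."""
    client_data = json.loads(b64url_decode(client_data_b64))
    if client_data.get("type") != "webauthn.get":
        return False
    if client_data.get("origin") != expected_origin:
        return False
    received = b64url_decode(client_data.get("challenge", ""))
    return hmac.compare_digest(received, expected_challenge)


def simulate_browser(origin: str, challenge: bytes) -> str:
    """Hypothetical stand-in for the browser building clientDataJSON."""
    payload = {"type": "webauthn.get",
               "challenge": b64url_encode(challenge),
               "origin": origin}
    return b64url_encode(json.dumps(payload).encode())


if __name__ == "__main__":
    challenge = b"\x01" * 32
    real = simulate_browser("https://login.example.com", challenge)
    phish = simulate_browser("https://login.examp1e.com", challenge)
    print(check_client_data(real, challenge, "https://login.example.com"))   # True
    print(check_client_data(phish, challenge, "https://login.example.com"))  # False
```

However persuasive the call or video that sent the user to the lookalike page, the origin recorded by the browser is the attacker's domain, so the assertion fails.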

5. Building Your AI SE Toolkit

bash
# Recommended AI social engineering toolkit for red teamers
# All tools should be run in an isolated VM

# Text generation (phishing pretexts)
pip install openai           # GPT-4o API for phishing generation
# OR use local models:
ollama pull dolphin-mixtral   # Uncensored local model

# Voice cloning
git clone https://github.com/myshell-ai/OpenVoice.git
pip install -e OpenVoice/
# Alternative: Fish Speech (zero-shot, 3s sample)
git clone https://github.com/fishaudio/fish-speech.git
pip install -e fish-speech/
# Alternative: F5-TTS (zero-shot voice cloning)
pip install f5-tts

# Deepfake video 
git clone https://github.com/iperov/DeepFaceLive.git
# OR real-time face swap (successor to roop):
git clone https://github.com/hacksider/Deep-Live-Cam.git
pip install -r Deep-Live-Cam/requirements.txt

# OSINT for target profiling
pip install theHarvester
pip install social-analyzer

# Campaign management
# GoPhish for email campaigns: https://getgophish.com
# Track: open rate, click rate, credential harvest rate

# Audio sample collection
pip install yt-dlp            # Download public talks/interviews

Getting Started

Start by understanding AI pentesting fundamentals, then practise prompt engineering before attempting social engineering simulations.

Social Engineering Practice Labs

  • GoPhish AI Phishing Campaign (Custom Lab, medium): LLM phishing generation, campaign tracking, credential harvesting
  • OpenVoice Voice Cloning Lab (Custom Lab, hard): voice cloning, vishing simulation, audio deepfake detection
  • Deep-Live-Cam Deepfake Exercise (Custom Lab, hard): real-time face swap, video call impersonation, deepfake detection