Reconnaissance

Active Reconnaissance

Active reconnaissance involves direct interaction with target systems to gather detailed information about web technologies, application structure, and potential vulnerabilities. This phase generates traffic detectable by security monitoring.

Detection Risk

Active reconnaissance leaves traces in target logs, and WAFs, IDS/IPS, and SIEM systems may detect and alert on scanning activity. Always obtain written authorization before proceeding, and coordinate with the client's security team if stealth is required.
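
If stealth is required, throttle your tooling rather than running it at default speed. A minimal sketch, assuming reasonably current httpx and katana builds (both expose a -rate-limit flag):

throttled-recon.sh
bash
# Cap httpx at 10 requests per second
cat subdomains.txt | httpx -rate-limit 10 -tech-detect

# Crawl slowly: low rate limit, low concurrency
katana -u https://example.com -rate-limit 5 -c 2

# Conservative nmap timing plus an explicit per-probe delay
nmap -T2 --scan-delay 1s -p 80,443 example.com

# Manual probing with a pause between requests
while read -r url; do
  curl -sI "$url"
  sleep 2
done < urls.txt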

Tools & Resources

  • WhatWeb — web technology and CMS fingerprinting (brew install whatweb)
  • httpx — fast HTTP probing & technology detection (go install ...httpx@latest)
  • katana — next-generation web crawling framework (go install ...katana@latest)
  • waybackurls — fetch historical URLs from the Wayback Machine (go install github.com/tomnomnom/waybackurls@latest)

Technology Fingerprinting

Identify web servers, frameworks, CMS platforms, and client-side technologies to understand the technology stack and potential vulnerabilities.

WhatWeb

whatweb.sh
bash
# Basic fingerprinting
whatweb https://example.com

# Verbose output with full details
whatweb -v https://example.com

# Aggressive mode (more plugins, more requests)
whatweb -a 3 https://example.com

# Output to JSON
whatweb --log-json=output.json https://example.com

# Scan multiple targets
whatweb -i targets.txt -v

# Quiet mode (just technologies)
whatweb -q https://example.com
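
The JSON log is easier to triage at scale. A minimal jq sketch, assuming whatweb's JSON log format (an array of result objects with target and plugins keys):

whatweb-parse.sh
bash
# Fingerprint many targets, then summarize detected plugins per target
# Assumes the JSON log is an array of {"target": ..., "plugins": {...}} objects
whatweb -i targets.txt --log-json=output.json
jq -r '.[] | select(.plugins) | "\(.target): \(.plugins | keys | join(", "))"' output.json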

httpx Technology Detection

httpx-tech.sh
bash
# Basic HTTP probing
echo "example.com" | httpx

# Technology detection
echo "example.com" | httpx -tech-detect

# Full details
cat subdomains.txt | httpx -title -status-code -tech-detect -content-length

# Include response headers (JSON output only)
cat subdomains.txt | httpx -json -include-response-header

# JSON output
cat subdomains.txt | httpx -tech-detect -json -o results.json

# Filter by status
cat subdomains.txt | httpx -mc 200,301,302 -tech-detect
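
With JSON output in hand, hosts can be grouped by detected technology to prioritize testing. A sketch assuming httpx's JSON lines carry url and tech fields:

httpx-triage.sh
bash
# Group live hosts by detected technology (requires jq)
cat results.json | jq -r 'select(.tech) | .tech[] as $t | "\($t)\t\(.url)"' | sort

# Count hosts per technology
cat results.json | jq -r 'select(.tech) | .tech[]' | sort | uniq -c | sort -rn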

Manual Header Analysis

header-analysis.sh
bash
# Curl headers
curl -I https://example.com

# Full response with headers
curl -i https://example.com

# Follow redirects
curl -IL https://example.com

# Look for revealing headers:
# Server: Apache/2.4.41
# X-Powered-By: PHP/7.4.3
# X-AspNet-Version: 4.0.30319
# X-Generator: Drupal 8
# Via: 1.1 vegur (Heroku)

# Check security headers
curl -sI https://example.com | grep -iE "(strict-transport|x-frame|x-content-type|x-xss|content-security)"

# Nmap HTTP enumeration
nmap -sV -p 80,443 --script http-headers,http-server-header example.com
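
The same header checks scale to a whole host list with a small loop. A minimal sketch using only curl and grep:

header-check.sh
bash
# Report hosts that are missing common security headers
for host in $(cat live_hosts.txt); do
  headers=$(curl -sI -m 10 "$host")
  for h in strict-transport-security x-frame-options content-security-policy; do
    echo "$headers" | grep -qi "^$h" || echo "$host missing $h"
  done
done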

Web Crawling & Spidering

Crawl the target to discover endpoints, parameters, forms, and hidden functionality.

Katana

katana.sh
bash
# Basic crawling
katana -u https://example.com

# Depth control
katana -u https://example.com -d 5

# Output to file
katana -u https://example.com -o crawl_results.txt

# Include JavaScript parsing
katana -u https://example.com -jc

# Headless browser mode (renders JavaScript)
katana -u https://example.com -headless

# With custom headers
katana -u https://example.com -H "Cookie: session=abc123"

# Filter by extension
katana -u https://example.com -ef png,jpg,gif,svg,css,woff

# JSON output
katana -u https://example.com -json -o crawl.json

# Multiple targets
katana -list urls.txt -d 3 -o all_crawl.txt
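
Raw crawl output is most useful once reduced to unique paths and flagged URLs. A sketch using standard text tools over the results file from above:

crawl-triage.sh
bash
# Unique paths (strip scheme, host, and query string)
sed -E 's|^https?://[^/]*||; s|\?.*||' crawl_results.txt | sort -u > unique_paths.txt

# URLs worth manual review
grep -iE "(admin|api|debug|upload|config|backup)" crawl_results.txt | sort -u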

gospider

gospider.sh
bash
# Basic spidering
gospider -s https://example.com

# With depth and concurrency
gospider -s https://example.com -d 3 -c 10

# Output to directory
gospider -s https://example.com -o output_dir

# Include subdomains
gospider -s https://example.com --subs

# Custom User-Agent
gospider -s https://example.com -u "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

# With cookies
gospider -s https://example.com --cookie "session=abc123"

# Filter JS files
gospider -s https://example.com | grep "\.js" | sort -u

hakrawler

hakrawler.sh
bash
# Basic crawl
echo "https://example.com" | hakrawler

# With depth
echo "https://example.com" | hakrawler -d 3

# Include subdomains
echo "https://example.com" | hakrawler -subs

# JavaScript parsing
echo "https://example.com" | hakrawler -js

# Custom headers
echo "https://example.com" | hakrawler -h "Authorization: Bearer token"

# Filter unique URLs
echo "https://example.com" | hakrawler | sort -u

Archive & Historical Data

Historical URL data reveals removed endpoints, old parameters, and hidden functionality that may still be accessible.

historical-urls.sh
bash
# Wayback Machine URLs
waybackurls example.com | sort -u > wayback_urls.txt

# GAU (GetAllUrls) - Multiple sources
gau example.com | sort -u > all_urls.txt

# GAU with subdomains
gau --subs example.com

# GAU providers: wayback, commoncrawl, otx, urlscan
gau --providers wayback,otx example.com

# Filter interesting endpoints
waybackurls example.com | grep -iE "\.(php|asp|aspx|jsp|json|xml|config|sql|log|bak|old)$"

# Find parameters
waybackurls example.com | grep "?" | cut -d "?" -f 2 | tr "&" "\n" | cut -d "=" -f 1 | sort -u

# Extract JS files for analysis
waybackurls example.com | grep "\.js$" | sort -u > js_files.txt

# Check if old URLs still work
cat wayback_urls.txt | httpx -silent -mc 200 -o still_alive.txt
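
Comparing archived URLs against the live crawl highlights endpoints the crawler never found. A sketch with comm, assuming both lists exist from the steps above:

wayback-diff.sh
bash
# URLs the archive knows about but the live crawl missed
sort -u wayback_urls.txt > wayback_sorted.txt
sort -u crawl_results.txt > crawl_sorted.txt
comm -23 wayback_sorted.txt crawl_sorted.txt > archive_only.txt

# Probe archive-only URLs; auth errors (401/403) are interesting too
cat archive_only.txt | httpx -silent -mc 200,301,302,401,403 -o archive_alive.txt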

JavaScript Analysis

JavaScript files often contain API endpoints, secrets, and hardcoded credentials.

js-analysis.sh
bash
# Extract JS file URLs from crawl
katana -u https://example.com -jc | grep -E "\.js(\?|$)" > js_files.txt

# Download all JS files (create the output dir first; quote URLs)
mkdir -p js
while read -r url; do
  curl -s "$url" -o "js/$(echo "$url" | md5sum | cut -d' ' -f1).js"
done < js_files.txt

# LinkFinder - Extract endpoints from JS
linkfinder -i https://example.com/app.js -o cli

# Run on all JS files
cat js_files.txt | while read url; do
  linkfinder -i "$url" -o cli
done | sort -u > endpoints.txt

# SecretFinder - Find secrets in JS
python3 SecretFinder.py -i https://example.com/app.js -o cli

# JS Miner
python3 jsminer.py -u https://example.com/app.js

# Grep for interesting patterns (note the escaped quotes)
curl -s https://example.com/app.js | grep -oE "(api|endpoint|url|path|secret|key|token|auth|password)['\"][^'\"]*['\"]"

# JSBeautifier for minified code
js-beautify app.min.js > app.js

JS Secrets to Look For

API Keys & Tokens

  • AWS access keys
  • Google API keys
  • Firebase credentials
  • Stripe/payment keys
  • OAuth tokens

Hidden Endpoints

  • Admin API routes
  • Debug endpoints
  • Internal services
  • Undocumented features
  • GraphQL schemas
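
A grep pass over the downloaded JS files can surface several of the key formats listed above. A sketch using well-known public key-format patterns, run against the js/ directory populated by the download loop earlier (the generic patterns are heuristics and will produce false positives):

js-secrets-grep.sh
bash
# AWS access key IDs (documented public format)
grep -rhoE "AKIA[0-9A-Z]{16}" js/ | sort -u

# Google API keys
grep -rhoE "AIza[0-9A-Za-z_-]{35}" js/ | sort -u

# Generic key/secret/token assignments (heuristic)
grep -rhoiE "(api[_-]?key|secret|token)['\"]?[[:space:]]*[:=][[:space:]]*['\"][^'\"]{8,}" js/ | sort -u

# Relative API paths hinting at hidden endpoints
grep -rhoE "\"/(api|internal|admin)/[a-zA-Z0-9_/.-]+\"" js/ | sort -u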

Parameter Discovery

param-discovery.sh
bash
# Arjun - Parameter discovery
arjun -u https://example.com/endpoint

# With wordlist
arjun -u https://example.com/endpoint -w params.txt

# Multiple endpoints
arjun -i urls.txt -oT params_found.txt

# ParamSpider
paramspider -d example.com -o params.txt

# x8 - Hidden parameter discovery
x8 -u "https://example.com/endpoint" -w params.txt

# Extract params from crawl results
cat crawl_results.txt | grep "?" | cut -d "?" -f 2 | tr "&" "\n" | cut -d "=" -f 1 | sort -u > found_params.txt

# Common parameters to test
# id, user, username, email, file, path, url, redirect, token, 
# debug, test, admin, page, search, query, action, type
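
Parameter names can also be brute-forced directly with ffuf by placing FUZZ in the parameter position and filtering out the baseline response. A sketch assuming a params wordlist; the -fs value is a placeholder for your measured baseline size:

param-fuzz.sh
bash
# Fuzz GET parameter names (replace 4242 with the baseline response size)
ffuf -u "https://example.com/endpoint?FUZZ=1" -w params.txt -fs 4242

# Fuzz POST parameter names
ffuf -u "https://example.com/endpoint" -X POST -d "FUZZ=1" \
  -H "Content-Type: application/x-www-form-urlencoded" -w params.txt -fs 4242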

Virtual Host Discovery

vhost-discovery.sh
bash
# ffuf for vhost discovery
ffuf -w vhosts.txt -u http://example.com -H "Host: FUZZ.example.com" -fc 404

# Filter by response size (exclude default)
ffuf -w vhosts.txt -u http://example.com -H "Host: FUZZ.example.com" -fs 1234

# Gobuster vhost mode
gobuster vhost -u http://example.com -w vhosts.txt -t 50

# wfuzz
wfuzz -c -w vhosts.txt -u "http://example.com" -H "Host: FUZZ.example.com" --hc 404

# Common vhost wordlists
# SecLists/Discovery/DNS/subdomains-top1million-5000.txt
# /usr/share/amass/wordlists/all.txt

# Manual testing
curl -s http://10.10.10.10 -H "Host: dev.example.com"
curl -s http://10.10.10.10 -H "Host: staging.example.com"
curl -s http://10.10.10.10 -H "Host: admin.example.com"

Active Recon Checklist

🔍 Technology Stack

  • ☐ Web server identified
  • ☐ Framework/CMS detected
  • ☐ Programming language determined
  • ☐ Version numbers noted
  • ☐ CDN/WAF identified

🕷️ Crawling

  • ☐ Application crawled
  • ☐ Forms identified
  • ☐ File uploads found
  • ☐ API endpoints discovered
  • ☐ Admin panels located

📜 JavaScript Analysis

  • ☐ JS files extracted
  • ☐ Endpoints extracted
  • ☐ Secrets searched
  • ☐ API schemas found
  • ☐ Source maps checked

📋 Documentation

  • ☐ Wayback URLs collected
  • ☐ Parameters documented
  • ☐ Virtual hosts tested
  • ☐ Attack surface mapped
  • ☐ Priority targets identified

Information

With comprehensive reconnaissance complete, proceed to Scanning to identify specific vulnerabilities in the discovered assets.