Race Conditions Remediation
Risk Severity: High
Fix Effort: Medium
Est. Time: 3-6 hours
Reference: A01:2021, CWE-362
Race conditions occur when multiple threads or processes access shared resources concurrently without proper synchronization, leading to inconsistent state, privilege escalation, or data corruption.
Critical Impact
Race conditions can bypass security checks, corrupt data, cause double-spending in payment systems,
and lead to privilege escalation. They're often difficult to detect and reproduce.
Understanding Race Conditions
Common Race Condition Types
- TOCTOU: Time-of-Check to Time-of-Use
- Database races: Concurrent transactions
- File system races: Symlink attacks
- State transitions: Order-dependent operations
- Double-spend: Balance check race
Attack Scenario
- Thread 1 checks if file exists
- Thread 2 creates malicious symlink
- Thread 1 opens file (now follows symlink)
- Attacker reads/writes sensitive file
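One way to blunt the symlink step in this scenario is to ask the operating system not to follow a symlink at open time. The sketch below is an illustration rather than part of the original guidance. It assumes a POSIX system where `os.O_NOFOLLOW` is available, and `read_no_follow` is a hypothetical helper name. The flag only protects the final path component, so symlinks in parent directories are not covered.

```python
import os
import tempfile


def read_no_follow(path):
    """Read a file, refusing to follow a symlink at the final path component."""
    # O_NOFOLLOW makes the open fail if `path` itself is a symlink (POSIX only).
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    with os.fdopen(fd, "rb") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "secret.txt")
        link = os.path.join(tmp, "upload.txt")
        with open(target, "wb") as f:
            f.write(b"sensitive")
        os.symlink(target, link)

        print(read_no_follow(target))  # regular file: read normally
        try:
            read_no_follow(link)       # symlink: the open is refused
        except OSError as err:
            print("refused symlink, errno:", err.errno)
```

Because the check is part of the open call itself, there is no separate "exists" step for an attacker to race against.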
TOCTOU Prevention
Best Practice
Eliminate the window between check and use by performing atomic operations.
Use "try and fail" instead of "check then act".
Vulnerable File Access (TOCTOU)
javascript
const fs = require('fs');

// VULNERABLE: Check-then-act pattern
if (fs.existsSync(filename)) {
  // Race window here! File could be deleted or replaced
  const content = fs.readFileSync(filename);
  processFile(content);
}

// SECURE: Try-and-catch pattern
try {
  const content = fs.readFileSync(filename);
  processFile(content);
} catch (err) {
  // Handle missing file; rethrow anything unexpected
  if (err.code !== 'ENOENT') throw err;
}

// SECURE: Open once, then work on the file descriptor (Node.js)
// The descriptor keeps referring to the opened file even if the path is swapped.
const fd = fs.openSync(filename, 'r');
try {
  const content = fs.readFileSync(fd);
  processFile(content);
} finally {
  fs.closeSync(fd);
}
Python Safe File Operations
python
import os

# VULNERABLE: Separate check and open
if os.path.exists(filename):
    with open(filename) as f:  # Race window!
        data = f.read()

# SECURE: Try to open directly
try:
    with open(filename, 'r') as f:
        data = f.read()
except FileNotFoundError:
    # Handle missing file
    pass

# SECURE: Use exclusive creation flag
try:
    # O_CREAT | O_EXCL ensures atomic creation
    fd = os.open(filename, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    with os.fdopen(fd, 'w') as f:
        f.write(sensitive_data)
except FileExistsError:
    # File already exists - safe failure
    pass
Database Race Conditions
Vulnerable Balance Check
sql
-- VULNERABLE: Separate read and update
-- Thread 1: SELECT balance = 100
-- Thread 2: SELECT balance = 100
-- Thread 1: UPDATE balance = 100 - 60 = 40
-- Thread 2: UPDATE balance = 100 - 50 = 50  <- Wrong! 110 was withdrawn from 100;
--           the true result would be -10, so one withdrawal should have been refused

-- Query 1: Check balance
SELECT balance FROM accounts WHERE user_id = 123;
-- Race window here!
-- Query 2: Deduct money
UPDATE accounts
SET balance = balance - 50
WHERE user_id = 123;
Secure: Atomic Operations
sql
-- SECURE: Single atomic statement with the check in the WHERE clause
UPDATE accounts
SET balance = balance - 50
WHERE user_id = 123
  AND balance >= 50; -- Check happens atomically
-- Check affected rows (pseudocode; the exact syntax depends on the database)
IF @@ROWCOUNT = 0 THEN
  RAISE 'Insufficient funds';
END IF;

-- SECURE: Use database transactions with serializable isolation
-- (control flow below is pseudocode; the exact syntax depends on the database)
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT balance FROM accounts WHERE user_id = 123 FOR UPDATE;
-- Row is now locked, no other transaction can modify it
IF balance >= 50 THEN
  UPDATE accounts SET balance = balance - 50 WHERE user_id = 123;
ELSE
  ROLLBACK;
  RAISE 'Insufficient funds';
END IF;
COMMIT;
Application Synchronization
Python Threading Locks
python
import threading

# VULNERABLE: No synchronization
balance = 1000

def withdraw(amount):
    global balance
    if balance >= amount:
        # Race window!
        balance -= amount
        return True
    return False

# SECURE: Use lock for critical section
balance = 1000
balance_lock = threading.Lock()

def withdraw_safe(amount):
    global balance
    with balance_lock:
        if balance >= amount:
            balance -= amount
            return True
        return False

# SECURE: Limit concurrency with a semaphore
from threading import Semaphore

# Semaphore for resource limiting
max_connections = Semaphore(10)

def process_request():
    with max_connections:
        # Only 10 threads can be here at once
        handle_connection()
Java Synchronized Methods
java
// VULNERABLE: Non-atomic check-then-act
private int balance = 1000;

public boolean withdraw(int amount) {
    if (balance >= amount) { // Race!
        balance -= amount;
        return true;
    }
    return false;
}

// SECURE: Synchronized method
private int balance = 1000;

public synchronized boolean withdraw(int amount) {
    if (balance >= amount) {
        balance -= amount;
        return true;
    }
    return false;
}

// SECURE: Use an atomic variable with a compare-and-set loop
import java.util.concurrent.atomic.AtomicInteger;

private final AtomicInteger balance = new AtomicInteger(1000);

public boolean withdraw(int amount) {
    while (true) {
        int current = balance.get();
        if (current < amount) {
            return false;
        }
        int next = current - amount;
        if (balance.compareAndSet(current, next)) {
            return true; // Success!
        }
        // CAS failed because another thread changed the balance; retry
    }
}

// SECURE: ReentrantLock released in a finally block
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

private final Lock lock = new ReentrantLock();

public boolean withdraw(int amount) {
    lock.lock();
    try {
        if (balance >= amount) {
            balance -= amount;
            return true;
        }
        return false;
    } finally {
        lock.unlock();
    }
}
Distributed System Races
Redis Distributed Locks
javascript
// SECURE: Redis lock on a single instance (SET NX PX plus an atomic release)
// Note: this is the single-node pattern; Redlock extends it across several independent Redis nodes.
// Assumes node-redis v4+, where commands return promises and options are passed as objects.
const redis = require('redis');
const crypto = require('crypto');

async function acquireLock(client, lockKey, ttl = 5000) {
  const lockValue = crypto.randomUUID();
  // SET with NX (only if not exists) and PX (TTL in ms)
  const result = await client.set(lockKey, lockValue, { NX: true, PX: ttl });
  if (result === 'OK') {
    return lockValue; // Lock acquired
  }
  return null; // Lock held by another process
}

async function releaseLock(client, lockKey, lockValue) {
  // Use Lua script for atomic check-and-delete
  const script = `
    if redis.call("get", KEYS[1]) == ARGV[1] then
      return redis.call("del", KEYS[1])
    else
      return 0
    end
  `;
  await client.eval(script, { keys: [lockKey], arguments: [lockValue] });
}

// Usage (redisClient is a connected client created elsewhere, e.g. with redis.createClient())
async function processWithLock() {
  const lockValue = await acquireLock(redisClient, 'resource:123');
  if (!lockValue) {
    throw new Error('Could not acquire lock');
  }
  try {
    // Critical section
    await performTransaction();
  } finally {
    await releaseLock(redisClient, 'resource:123', lockValue);
  }
}
Database-Level Pessimistic Locking
sql
-- SECURE: Row-level locking with SELECT FOR UPDATE
BEGIN;
-- Lock the row exclusively
SELECT * FROM inventory
WHERE product_id = 123
FOR UPDATE;
-- No other transaction can modify this row until we commit
UPDATE inventory
SET quantity = quantity - 1
WHERE product_id = 123;
COMMIT;

-- SECURE: Named advisory locks (PostgreSQL)
SELECT pg_advisory_lock(123456);
-- Critical section - only one session can hold this lock
SELECT pg_advisory_unlock(123456);

-- SECURE: Optimistic locking with version column
UPDATE products
SET
  quantity = quantity - 1,
  version = version + 1
WHERE
  product_id = 123
  AND version = @expected_version;
-- Check affected rows (pseudocode; the exact syntax depends on the database)
IF @@ROWCOUNT = 0 THEN
  RAISE 'Concurrent modification detected';
END IF;
State Transition Safety
javascript
// VULNERABLE: Unprotected state transitions
class Order {
  constructor() {
    this.status = 'pending';
  }
  approve() {
    this.status = 'approved'; // Race!
  }
  cancel() {
    this.status = 'cancelled'; // Could overwrite approval!
  }
}

// SECURE: Atomic state transitions with validation
class OrderSafe {
  constructor(id) {
    this.id = id; // Primary key of the row in the orders table
    this.status = 'pending';
    this.version = 0;
  }

  async transitionTo(newStatus, expectedVersion) {
    // Database enforces the status and version check atomically
    const result = await db.query(
      `UPDATE orders
       SET status = $1, version = version + 1
       WHERE id = $2
         AND status = $3
         AND version = $4`,
      [newStatus, this.id, this.status, expectedVersion]
    );
    if (result.rowCount === 0) {
      throw new Error('Invalid state transition or concurrent modification');
    }
    this.status = newStatus;
    this.version++;
  }

  async approve() {
    if (this.status !== 'pending') {
      throw new Error('Can only approve pending orders');
    }
    await this.transitionTo('approved', this.version);
  }
}

// SECURE: State machine with explicit transitions
const validTransitions = {
  'pending': ['approved', 'cancelled'],
  'approved': ['shipped'],
  'shipped': ['delivered'],
  'cancelled': [], // Terminal state
  'delivered': []  // Terminal state
};

function canTransition(currentState, newState) {
  return validTransitions[currentState]?.includes(newState) || false;
}
Testing and Verification
Testing Challenge
Race conditions are notoriously difficult to test because they're timing-dependent.
Use stress testing, thread analysis tools, and code review to identify vulnerabilities.
Concurrent Testing (Python)
python
import threading
import time

# Account is the class under test; it must expose withdraw(amount) and get_balance().

# Test for race conditions with concurrent threads
def test_concurrent_withdrawal():
    balance = Account(1000)
    threads = []

    # Try to withdraw $600 twice simultaneously (should fail once)
    def withdraw_thread():
        result = balance.withdraw(600)
        print(f"Withdrawal {'succeeded' if result else 'failed'}")

    # Launch multiple threads
    for _ in range(2):
        t = threading.Thread(target=withdraw_thread)
        threads.append(t)
        t.start()

    # Wait for completion
    for t in threads:
        t.join()

    final_balance = balance.get_balance()
    print(f"Final balance: {final_balance}")
    # Should be either 400 (one succeeded) or 1000 (both failed)
    # Should NEVER be -200 (both succeeded - race condition!)
    assert final_balance in [400, 1000], "Race condition detected!"

# Stress test with many threads
def stress_test_race_condition():
    account = Account(10000)
    threads = []

    def worker():
        for _ in range(100):
            account.withdraw(1)
            time.sleep(0.0001)  # Yield to other threads to encourage interleaving

    # 10 threads, 100 withdrawals each = 1000 total
    for _ in range(10):
        t = threading.Thread(target=worker)
        threads.append(t)
        t.start()

    for t in threads:
        t.join()

    expected_balance = 10000 - 1000
    actual_balance = account.get_balance()
    print(f"Expected: {expected_balance}, Actual: {actual_balance}")
    assert actual_balance == expected_balance, "Race condition detected!"
Testing Tools
Race Condition Detection Tools
- ThreadSanitizer (TSan): Detects data races in C/C++; Go's built-in race detector (go test -race) is based on it
- Java Concurrency Stress Tests: jcstress framework
- Python threading analysis: pytest-timeout, pytest-xdist
- Static analysis: SpotBugs (Java, successor to FindBugs), Pylint (Python)
- Stress testing: Apache JMeter, Gatling, Locust
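Alongside these tools, a small harness that releases all threads at the same instant makes a stress test more likely to exercise the contended path. The sketch below is illustrative only: `LockedAccount` and `run_concurrently` are hypothetical names, and the account is a stand-in for whatever class is under test.

```python
import threading


class LockedAccount:
    """Hypothetical account whose withdraw() is protected by a lock."""

    def __init__(self, balance):
        self._balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount):
        with self._lock:
            if self._balance >= amount:
                self._balance -= amount
                return True
            return False

    def get_balance(self):
        with self._lock:
            return self._balance


def run_concurrently(fn, n_threads):
    """Call fn() from n_threads threads that are released at the same moment."""
    barrier = threading.Barrier(n_threads)
    results = [None] * n_threads

    def worker(i):
        barrier.wait()  # block until every thread has reached this point
        results[i] = fn()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    account = LockedAccount(1000)
    results = run_concurrently(lambda: account.withdraw(100), 20)
    # Only 10 withdrawals of 100 fit into a balance of 1000.
    assert results.count(True) == 10
    assert account.get_balance() == 0
```

With a correctly synchronized account the two assertions hold on every run. Pointing the same harness at an unsynchronized implementation may expose a violation, but a passing run never proves the absence of a race.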
Common Mistakes
Using sleep() as synchronization
Adding arbitrary delays doesn't fix races - it just makes them harder to reproduce. Use proper locks.
python
import os
import time

# WRONG: Sleep is not synchronization!
if not os.path.exists(file):
    time.sleep(0.1)  # Attacker still wins this race
    create_file(file)
Double-checked locking without volatile
Classic broken singleton pattern - requires proper memory barriers.
java
// BROKEN: Double-checked locking without volatile
private static Singleton instance;

public static Singleton getInstance() {
    if (instance == null) { // Check 1 (unsynchronized)
        synchronized (Singleton.class) {
            if (instance == null) { // Check 2 (synchronized)
                instance = new Singleton(); // Can be partially constructed!
            }
        }
    }
    return instance;
}

// CORRECT: Add volatile keyword
private static volatile Singleton instance;
Locking at wrong granularity
Lock too much = performance issues. Lock too little = race conditions.
java
// TOO COARSE: Entire method locked
public synchronized void processLargeFile() {
    readFile();     // Lock held during I/O - bad!
    processData();  // Lock held during computation - bad!
    writeResults(); // Lock held during more I/O - bad!
}

// CORRECT: Lock only critical section
public void processLargeFile() {
    byte[] data = readFile();           // No lock
    byte[] results = processData(data); // No lock
    synchronized (this) {
        writeResults(results); // Lock only during shared state modification
    }
}
Framework Solutions
Django (Python)
python
from django.db import transaction
from django.db.models import F

# Atomic update: the stock check and the decrement happen in one UPDATE
updated = Product.objects.filter(
    id=product_id,
    quantity__gt=0,
).update(
    quantity=F('quantity') - 1
)
# updated == 0 means the product was out of stock (or does not exist)

# Select for update
with transaction.atomic():
    product = Product.objects.select_for_update().get(id=product_id)
    if product.quantity > 0:
        product.quantity -= 1
        product.save()
Rails (Ruby)
ruby
# Optimistic locking (requires an integer lock_version column)
class Product < ApplicationRecord
  self.locking_column = :lock_version # This is the default; shown for clarity
end

product.update!(quantity: product.quantity - 1)
# Raises ActiveRecord::StaleObjectError on concurrent modification

# Pessimistic locking
Product.transaction do
  product = Product.lock.find(product_id)
  product.quantity -= 1
  product.save!
end
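The optimistic-locking idea behind these framework features can also be sketched without an ORM. The example below is an assumption-laden illustration, not a framework API: it uses Python's built-in `sqlite3` module with an in-memory database, and the `products` table layout and the `decrement_quantity` helper are hypothetical.

```python
import sqlite3


def decrement_quantity(conn, product_id, expected_version):
    """Decrement stock only if the row still has the version the caller read."""
    cur = conn.execute(
        "UPDATE products "
        "SET quantity = quantity - 1, version = version + 1 "
        "WHERE product_id = ? AND version = ? AND quantity > 0",
        (product_id, expected_version),
    )
    conn.commit()
    # rowcount is 0 when another writer changed the row first (or stock ran out).
    return cur.rowcount == 1


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products ("
        "product_id INTEGER PRIMARY KEY, quantity INTEGER, version INTEGER)"
    )
    conn.execute("INSERT INTO products VALUES (123, 5, 0)")
    conn.commit()

    print(decrement_quantity(conn, 123, expected_version=0))  # first writer wins
    print(decrement_quantity(conn, 123, expected_version=0))  # stale version is rejected
```

A caller whose update is rejected would typically re-read the row to obtain the current version and then decide whether to retry.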