Race Conditions Remediation

Risk Severity
🟠 High
Fix Effort
🔧 Medium
Est. Time
⏱️ 3-6 hours
Reference
A01:2021 CWE-362

Race conditions occur when multiple threads or processes access shared resources concurrently without proper synchronization, leading to inconsistent state, privilege escalation, or data corruption.

Critical Impact

Race conditions can bypass security checks, corrupt data, cause double-spending in payment systems, and lead to privilege escalation. They're often difficult to detect and reproduce.

Understanding Race Conditions

Common Race Condition Types

  • TOCTOU: Time-of-Check-Time-of-Use
  • Database races: Concurrent transactions
  • File system races: Symlink attacks
  • State transitions: Order-dependent operations
  • Double-spend: Balance check race

Attack Scenario

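  1. Thread 1 (the victim, often privileged) checks that the file exists
  2. Thread 2 (the attacker) replaces the file with a malicious symlink
  3. Thread 1 opens the file and now follows the symlink
  4. The attacker reads or writes the sensitive file through the victim's privileges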

TOCTOU Prevention

Best Practice

Eliminate the window between check and use by performing atomic operations. Use "try and fail" instead of "check then act".
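
"Try and fail" removes the check-then-open window, but the open itself can still follow a symlink planted by an attacker. The sketch below is one way to refuse that on POSIX systems, assuming `os.O_NOFOLLOW` is available (it is not on Windows). The function name `read_regular_file` is hypothetical, and this is an illustration rather than a complete hardening recipe.

```python
import os
import stat

def read_regular_file(path: str) -> bytes:
    """Open path without following a final symlink, then validate the open descriptor."""
    # O_NOFOLLOW makes the open fail if the last path component is a symlink.
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    try:
        # fstat inspects the file that was actually opened,
        # not whatever the path happens to name a moment later.
        info = os.fstat(fd)
        if not stat.S_ISREG(info.st_mode):
            raise PermissionError(f"{path} is not a regular file")
        with os.fdopen(fd, "rb") as f:
            fd = None  # the file object now owns (and will close) the descriptor
            return f.read()
    finally:
        if fd is not None:
            os.close(fd)
```

Both the type check and the read go through the same descriptor, so replacing the path after the open has no effect on what is read.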

Vulnerable File Access (TOCTOU)

javascript
const fs = require('fs');

// ❌ VULNERABLE: Check-then-act pattern
if (fs.existsSync(filename)) {
    // Race window here! File could be deleted or replaced
    const content = fs.readFileSync(filename);
    processFile(content);
}

// ✅ SECURE: Try-and-catch pattern
try {
    const content = fs.readFileSync(filename);
    processFile(content);
} catch (err) {
    if (err.code !== 'ENOENT') throw err;  // Only swallow "file not found"
    // Handle missing file
}

// ✅ SECURE: Open once, then work on the descriptor (Node.js)
// Later operations use the already-open file, so swapping the path has no effect
const fd = fs.openSync(filename, 'r');
try {
    const content = fs.readFileSync(fd);
    processFile(content);
} finally {
    fs.closeSync(fd);
}

Python Safe File Operations

python
import os

# ❌ VULNERABLE: Separate check and open
if os.path.exists(filename):
    with open(filename) as f:  # Race window!
        data = f.read()

# ✅ SECURE: Try to open directly
try:
    with open(filename, 'r') as f:
        data = f.read()
except FileNotFoundError:
    # Handle missing file
    pass

# ✅ SECURE: Use exclusive creation flag
try:
    # O_CREAT | O_EXCL ensures atomic creation
    fd = os.open(filename, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    with os.fdopen(fd, 'w') as f:
        f.write(sensitive_data)
except FileExistsError:
    # File already exists - safe failure
    pass

Database Race Conditions

Vulnerable Balance Check

sql
-- ❌ VULNERABLE: Separate read and update
-- Thread 1: SELECT balance = 100
-- Thread 2: SELECT balance = 100
-- Thread 1: UPDATE balance = 100 - 60 = 40
-- Thread 2: UPDATE balance = 100 - 50 = 50  ← Wrong! Thread 1's withdrawal is lost:
--           110 was paid out of 100, and the second withdrawal should have been rejected

-- Query 1: Check balance
SELECT balance FROM accounts WHERE user_id = 123;

-- Race window here!

-- Query 2: Deduct money
UPDATE accounts 
SET balance = balance - 50 
WHERE user_id = 123;

Secure: Atomic Operations

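sql
-- ✅ SECURE: Single atomic statement with a guard condition
UPDATE accounts 
SET balance = balance - 50 
WHERE user_id = 123 
  AND balance >= 50;  -- Check happens atomically

-- Check affected rows (pseudocode: @@ROWCOUNT is SQL Server syntax;
-- most drivers also report the affected row count to the application)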
IF @@ROWCOUNT = 0 THEN
    RAISE 'Insufficient funds';
END IF;

-- ✅ SECURE: Serializable transaction with a row lock
-- (control flow is pseudocode; run it in a stored procedure or in application code)
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    
    SELECT balance FROM accounts WHERE user_id = 123 FOR UPDATE;
    
    -- Row is now locked; other transactions block before they can modify it
    
    IF balance >= 50 THEN
        UPDATE accounts SET balance = balance - 50 WHERE user_id = 123;
    ELSE
        ROLLBACK;
        RAISE 'Insufficient funds';
    END IF;
    
COMMIT;
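
From application code, the guarded UPDATE is usually paired with a check of the driver's affected-row count. The following is a minimal sketch using Python's built-in `sqlite3` module as a stand-in database. The `accounts` table mirrors the SQL above, and the `withdraw` helper is a hypothetical name.

```python
import sqlite3

def withdraw(conn: sqlite3.Connection, user_id: int, amount: int) -> bool:
    """Deduct amount only if the balance covers it; check and write are one statement."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE accounts SET balance = balance - ? "
            "WHERE user_id = ? AND balance >= ?",
            (amount, user_id, amount),
        )
    # rowcount is 1 when the guard matched, 0 when funds were insufficient
    return cur.rowcount == 1

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (user_id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)"
)
conn.execute("INSERT INTO accounts VALUES (123, 100)")
conn.commit()

first = withdraw(conn, 123, 60)   # guard matches, balance drops to 40
second = withdraw(conn, 123, 50)  # guard fails, balance is left untouched
```

Because the balance test lives in the WHERE clause, there is no moment at which another writer can slip in between the check and the deduction.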

Application Synchronization

Python Threading Locks

python
import threading

# ❌ VULNERABLE: No synchronization
balance = 1000

def withdraw(amount):
    global balance
    if balance >= amount:
        # Race window!
        balance -= amount
        return True
    return False

# ✅ SECURE: Use lock for critical section
balance = 1000
balance_lock = threading.Lock()

def withdraw_safe(amount):
    global balance
    with balance_lock:
        if balance >= amount:
            balance -= amount
            return True
        return False

# ✅ SECURE: Limit concurrency with a semaphore
from threading import Semaphore

# Semaphore for resource limiting
max_connections = Semaphore(10)

def process_request():
    with max_connections:
        # Only 10 threads can be here at once
        handle_connection()

Java Synchronized Methods

java
// ❌ VULNERABLE: Non-atomic check-then-act
private int balance = 1000;

public boolean withdraw(int amount) {
    if (balance >= amount) {  // Race!
        balance -= amount;
        return true;
    }
    return false;
}

// ✅ SECURE: Synchronized method
private int balance = 1000;

public synchronized boolean withdraw(int amount) {
    if (balance >= amount) {
        balance -= amount;
        return true;
    }
    return false;
}

// ✅ SECURE: Use atomic variables (lock-free compare-and-set)
import java.util.concurrent.atomic.AtomicInteger;

private AtomicInteger balance = new AtomicInteger(1000);

public boolean withdraw(int amount) {
    while (true) {
        int current = balance.get();
        if (current < amount) {
            return false;
        }
        int next = current - amount;
        if (balance.compareAndSet(current, next)) {
            return true;  // Success!
        }
        // CAS failed, retry
    }
}

// ✅ SECURE: Explicit lock with ReentrantLock (balance is the plain int field from above)
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

private final Lock lock = new ReentrantLock();

public boolean withdraw(int amount) {
    lock.lock();
    try {
        if (balance >= amount) {
            balance -= amount;
            return true;
        }
        return false;
    } finally {
        lock.unlock();
    }
}

Distributed System Races

Redis Distributed Locks

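javascript
// ✅ SECURE: Redis distributed lock on a single Redis instance
// (Redlock extends the same idea across several independent instances)
// Argument style follows ioredis; node-redis v4 takes an options object: { NX: true, PX: ttl }
// redisClient below is assumed to be an already-connected client
const crypto = require('crypto');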
async function acquireLock(client, lockKey, ttl = 5000) {
    const lockValue = crypto.randomUUID();
    
    // SET with NX (only if not exists) and PX (TTL in ms)
    const result = await client.set(
        lockKey,
        lockValue,
        'NX',
        'PX',
        ttl
    );
    
    if (result === 'OK') {
        return lockValue;  // Lock acquired
    }
    return null;  // Lock held by another process
}

async function releaseLock(client, lockKey, lockValue) {
    // Use Lua script for atomic check-and-delete
    const script = `
        if redis.call("get", KEYS[1]) == ARGV[1] then
            return redis.call("del", KEYS[1])
        else
            return 0
        end
    `;
    
    await client.eval(script, 1, lockKey, lockValue);
}

// Usage
async function processWithLock() {
    const lockValue = await acquireLock(redisClient, 'resource:123');
    if (!lockValue) {
        throw new Error('Could not acquire lock');
    }
    
    try {
        // Critical section
        await performTransaction();
    } finally {
        await releaseLock(redisClient, 'resource:123', lockValue);
    }
}
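
The random lock value matters because a lock can expire while its holder is still working. A release that deleted the key unconditionally would then remove a lock that now belongs to someone else. The sketch below simulates that situation in memory. `FakeLockStore` is a hypothetical stand-in for the two Redis operations above, not a Redis client, and time is passed in explicitly so the example is deterministic.

```python
import uuid

class FakeLockStore:
    """In-memory imitation of SET NX PX and compare-and-delete (illustration only)."""

    def __init__(self):
        self._locks = {}  # key -> (token, expiry time in seconds)

    def acquire(self, key, ttl_s, now):
        held = self._locks.get(key)
        if held is not None and held[1] > now:
            return None  # still held and not expired, like a failed SET NX
        token = uuid.uuid4().hex
        self._locks[key] = (token, now + ttl_s)
        return token

    def release(self, key, token):
        held = self._locks.get(key)
        # Delete only if the stored token is ours (the Lua script's check).
        if held is not None and held[0] == token:
            del self._locks[key]
            return True
        return False

store = FakeLockStore()
token_a = store.acquire("resource:123", ttl_s=5, now=0)  # A takes the lock
token_b = store.acquire("resource:123", ttl_s=5, now=6)  # A's lock expired, B takes it
released_by_a = store.release("resource:123", token_a)   # A finishes late: stale token
still_locked = store.acquire("resource:123", ttl_s=5, now=7) is None
```

Here A's late release is rejected and B keeps its lock. In a real deployment the TTL should comfortably exceed the critical section, since an expired lock no longer protects the work in progress.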

Database-Level Pessimistic Locking

sql
-- ✅ SECURE: Row-level locking with SELECT FOR UPDATE
BEGIN;

-- Lock the row exclusively
SELECT * FROM inventory 
WHERE product_id = 123 
FOR UPDATE;

-- No other transaction can modify this row until we commit

UPDATE inventory 
SET quantity = quantity - 1 
WHERE product_id = 123;

COMMIT;

-- ✅ SECURE: Named advisory locks (PostgreSQL)
SELECT pg_advisory_lock(123456);

-- Critical section - only one session can hold this lock

SELECT pg_advisory_unlock(123456);

-- ✅ SECURE: Optimistic locking with version column
UPDATE products 
SET 
    quantity = quantity - 1,
    version = version + 1
WHERE 
    product_id = 123 
    AND version = @expected_version;

-- Check affected rows (pseudocode: zero rows means another writer changed the row first)
IF @@ROWCOUNT = 0 THEN
    RAISE 'Concurrent modification detected';
END IF;
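
Optimistic locking normally comes with a read-modify-retry loop in the application. The following is a minimal sketch, again using `sqlite3` as a stand-in database. The `products` table mirrors the SQL above, while `decrement_quantity`, `ConcurrentModification`, and the retry limit are hypothetical choices.

```python
import sqlite3

class ConcurrentModification(Exception):
    """Raised when every retry lost the race to another writer."""

def decrement_quantity(conn, product_id, max_retries=3):
    for _ in range(max_retries):
        row = conn.execute(
            "SELECT quantity, version FROM products WHERE product_id = ?",
            (product_id,),
        ).fetchone()
        if row is None or row[0] <= 0:
            return False  # unknown product or out of stock
        quantity, version = row
        with conn:
            cur = conn.execute(
                "UPDATE products SET quantity = ?, version = version + 1 "
                "WHERE product_id = ? AND version = ?",
                (quantity - 1, product_id, version),
            )
        if cur.rowcount == 1:
            return True  # nobody changed the row since we read it
        # Version moved on: loop to re-read the row and try again.
    raise ConcurrentModification(f"product {product_id} kept changing")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (product_id INTEGER PRIMARY KEY, quantity INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO products VALUES (123, 5, 0)")
conn.commit()

ok = decrement_quantity(conn, 123)  # moves the row to quantity 4, version 1

# A writer still holding the old version 0 now matches no rows.
stale_rows = conn.execute(
    "UPDATE products SET quantity = 99, version = version + 1 "
    "WHERE product_id = 123 AND version = 0"
).rowcount
conn.commit()
```

No locks are held between the read and the write, which suits workloads where conflicts are rare. Under heavy contention the retries add up and pessimistic locking may be the better fit.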

State Transition Safety

javascript
// ❌ VULNERABLE: Unprotected state transitions
class Order {
    constructor() {
        this.status = 'pending';
    }
    
    approve() {
        this.status = 'approved';  // Race!
    }
    
    cancel() {
        this.status = 'cancelled';  // Could overwrite approval!
    }
}

// ✅ SECURE: Atomic state transitions with validation
class OrderSafe {
    constructor(id) {
        this.id = id;  // Primary key of the row in the orders table
        this.status = 'pending';
        this.version = 0;
    }
    
    async transitionTo(newStatus, expectedVersion) {
        // Database enforces version check atomically
        const result = await db.query(
            `UPDATE orders 
             SET status = $1, version = version + 1
             WHERE id = $2 
               AND status = $3
               AND version = $4`,
            [newStatus, this.id, this.status, expectedVersion]
        );
        
        if (result.rowCount === 0) {
            throw new Error('Invalid state transition or concurrent modification');
        }
        
        this.status = newStatus;
        this.version++;
    }
    
    async approve() {
        if (this.status !== 'pending') {
            throw new Error('Can only approve pending orders');
        }
        await this.transitionTo('approved', this.version);
    }
}

// ✅ SECURE: State machine with explicit transitions
const validTransitions = {
    'pending': ['approved', 'cancelled'],
    'approved': ['shipped'],
    'shipped': ['delivered'],
    'cancelled': [],  // Terminal state
    'delivered': []   // Terminal state
};

function canTransition(currentState, newState) {
    return validTransitions[currentState]?.includes(newState) || false;
}
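
A transition table only helps if the validity check and the state change happen as one step. For state held in process memory, a lock is one way to get that. The Python sketch below reuses the transition table above. The `Order` class and its method names are hypothetical.

```python
import threading

VALID_TRANSITIONS = {
    "pending": {"approved", "cancelled"},
    "approved": {"shipped"},
    "shipped": {"delivered"},
    "cancelled": set(),   # terminal state
    "delivered": set(),   # terminal state
}

class Order:
    def __init__(self):
        self._status = "pending"
        self._lock = threading.Lock()

    @property
    def status(self):
        with self._lock:
            return self._status

    def transition_to(self, new_status):
        # Check and assignment share one critical section, so two
        # competing transitions cannot both see the old state.
        with self._lock:
            if new_status not in VALID_TRANSITIONS.get(self._status, set()):
                return False
            self._status = new_status
            return True

order = Order()
results = []

def attempt(status):
    results.append(order.transition_to(status))

threads = [threading.Thread(target=attempt, args=(s,)) for s in ("approved", "cancelled")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever thread gets the lock first wins, and the other is rejected because neither `approved` nor `cancelled` permits the competing transition. When the state lives in a database, the versioned UPDATE shown earlier plays the role of the lock.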

🧪 Testing Verification

Testing Challenge

Race conditions are notoriously difficult to test because they're timing-dependent. Use stress testing, thread analysis tools, and code review to identify vulnerabilities.

Concurrent Testing (Python)

python
import threading
import time

# Test for race conditions with concurrent threads
def test_concurrent_withdrawal():
    balance = Account(1000)
    threads = []
    
    # Try to withdraw $600 twice simultaneously (should fail once)
    def withdraw_thread():
        result = balance.withdraw(600)
        print(f"Withdrawal {'succeeded' if result else 'failed'}")
    
    # Launch multiple threads
    for _ in range(2):
        t = threading.Thread(target=withdraw_thread)
        threads.append(t)
        t.start()
    
    # Wait for completion
    for t in threads:
        t.join()
    
    final_balance = balance.get_balance()
    print(f"Final balance: {final_balance}")
    
    # Exactly one withdrawal can succeed, so the balance must end at 400
    # -200 means both succeeded (race condition!); 1000 means both were wrongly rejected
    assert final_balance == 400, "Race condition detected!"

# Stress test with many threads
def stress_test_race_condition():
    account = Account(10000)
    threads = []
    
    def worker():
        for _ in range(100):
            account.withdraw(1)
            time.sleep(0.0001)  # Yield so the scheduler interleaves the threads
    
    # 10 threads, 100 withdrawals each = 1000 total
    for _ in range(10):
        t = threading.Thread(target=worker)
        threads.append(t)
        t.start()
    
    for t in threads:
        t.join()
    
    expected_balance = 10000 - 1000
    actual_balance = account.get_balance()
    
    print(f"Expected: {expected_balance}, Actual: {actual_balance}")
    assert actual_balance == expected_balance, "Race condition detected!"
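
The tests above rely on an `Account` class that the text does not define. The following is a minimal lock-protected version that would satisfy them, offered as an assumption about its shape rather than the original implementation. The `run_concurrent_withdrawals` helper is also hypothetical. It uses `threading.Barrier` so that all threads start their withdrawal at the same moment, which tends to make races easier to trigger.

```python
import threading

class Account:
    def __init__(self, balance):
        self._balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount):
        # The balance check and the deduction form one critical section.
        with self._lock:
            if self._balance >= amount:
                self._balance -= amount
                return True
            return False

    def get_balance(self):
        with self._lock:
            return self._balance

def run_concurrent_withdrawals(account, amount, n_threads):
    barrier = threading.Barrier(n_threads)
    results = []

    def worker():
        barrier.wait()  # hold every thread here until all have arrived
        results.append(account.withdraw(amount))

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

account = Account(1000)
outcomes = run_concurrent_withdrawals(account, 600, n_threads=8)
```

With the lock in place, exactly one of the eight withdrawals can succeed. A passing run does not prove the absence of races, so repeated runs and code review remain necessary.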

Testing Tools

Race Condition Detection Tools

  • ThreadSanitizer (TSan): Detects data races in C/C++/Go
  • Java Concurrency Stress Tests: jcstress framework
  • Python test helpers: pytest-timeout (fails hung or deadlocked tests), pytest-xdist (runs tests in parallel)
  • Static analysis: SpotBugs (successor to FindBugs, Java), Pylint (Python)
  • Stress testing: Apache JMeter, Gatling, Locust

⚠️ Common Mistakes

❌ Using sleep() as synchronization

Adding arbitrary delays doesn't fix races - it just makes them harder to reproduce. Use proper locks.

python
import os
import time

# ❌ WRONG: Sleep is not synchronization!
if not os.path.exists(path):
    time.sleep(0.1)  # Attacker still wins this race
    create_file(path)

❌ Double-checked locking without volatile

Classic broken singleton pattern. Without volatile, the Java memory model lets another thread see a non-null reference to a partially constructed object.

java
// ❌ BROKEN: Double-checked locking without volatile
private static Singleton instance;

public static Singleton getInstance() {
    if (instance == null) {  // Check 1 (unsynchronized)
        synchronized (Singleton.class) {
            if (instance == null) {  // Check 2 (synchronized)
                instance = new Singleton();  // Can be partially constructed!
            }
        }
    }
    return instance;
}

// ✅ CORRECT: Add volatile keyword
private static volatile Singleton instance;

❌ Locking at wrong granularity

Lock too much = performance issues. Lock too little = race conditions.

java
// ❌ TOO COARSE: Entire method locked
public synchronized void processLargeFile() {
    readFile();      // Lock held during I/O - bad!
    processData();   // Lock held during computation - bad!
    writeResults();  // Lock held during more I/O - bad!
}

// ✅ CORRECT: Lock only critical section
public void processLargeFile() {
    byte[] data = readFile();      // No lock
    byte[] results = processData(data);  // No lock
    
    synchronized (this) {
        writeResults(results);     // Lock only during shared state modification
    }
}
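
The same idea in Python: do the slow, thread-local work without the lock and take it only for the brief update to shared state. This is a small sketch with hypothetical names (`process_chunk`, `grand_total`) and made-up input data.

```python
import threading

grand_total = 0                  # shared state
total_lock = threading.Lock()

def process_chunk(chunk):
    global grand_total
    # The computation touches only local data, so it runs without the lock.
    subtotal = sum(x * x for x in chunk)
    # The lock covers only the read-modify-write of the shared counter.
    with total_lock:
        grand_total += subtotal

chunks = [[1, 2], [3, 4], [5, 6]]
threads = [threading.Thread(target=process_chunk, args=(c,)) for c in chunks]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Holding the lock for one addition instead of the whole computation lets the threads overlap their work while still keeping the counter consistent.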

Framework Solutions

Django (Python)

python
from django.db import transaction
from django.db.models import F

# ✅ Atomic update, guarded so the quantity cannot go negative
updated = Product.objects.filter(
    id=product_id,
    quantity__gt=0
).update(
    quantity=F('quantity') - 1
)
# updated == 0 means the product was out of stock (or does not exist)

# ✅ Select for update
with transaction.atomic():
    product = Product.objects.select_for_update().get(id=product_id)
    if product.quantity > 0:
        product.quantity -= 1
        product.save()

Rails (Ruby)

ruby
# ✅ Optimistic locking
class Product < ApplicationRecord
  self.locking_column = :lock_version  # Default column name; shown for clarity
end

product.update!(quantity: product.quantity - 1)
# Raises ActiveRecord::StaleObjectError on concurrent modification

# ✅ Pessimistic locking
Product.transaction do
  product = Product.lock.find(product_id)
  product.quantity -= 1
  product.save!
end