Race condition: 'Failed to write executable' (EPERM) when multiple dx serve instances compete for shared build artifacts #5275

@ThomasSteinbach

Bug Description

When multiple dx serve instances start at nearly the same time on macOS, one or more of them can fail with:

ERROR dx::serve: Build failed: Failed to write executable
1: Operation not permitted (os error 1)

This occurs when multiple processes compete for shared build artifacts in target/dx/ during initial build.

Environment

  • OS: macOS 14.2+ (likely affects all macOS versions)
  • Dioxus Version: 0.7.3
  • Rust: 1.93.0
  • Platform: Fullstack (web + server)

Steps to Reproduce

# Start multiple dx serve instances with staggered delays
# They all try to build to the same target/dx/ directory

for i in {1..10}; do
    PORT=$((8080 + i))
    dx serve --package myapp --port $PORT &
    sleep 3  # 2-6 second delays trigger the race condition
done

# Check logs - one or more instances will fail with EPERM

When This Occurs

  1. AI-assisted development: AI agents may start multiple server instances without checking if one is already running
  2. Rapid restarts: Developers quickly restarting servers during debugging (2-6s intervals)
  3. Automation/CI: Scripts that don't check for existing instances
  4. Accidental multiple terminals: Opening multiple terminals and running dx serve in each

Expected Behavior

Only one dx serve instance should run per project, or instances should coordinate to avoid race conditions.

Actual Behavior

Multiple instances compete for the same target/dx/<package>/debug/web/ directory. When they try to write executables simultaneously, file lock conflicts cause EPERM errors.

Root Cause Analysis

Location: packages/cli/src/build/request.rs:1417

BundleFormat::Server => {
    std::fs::create_dir_all(self.exe_dir())?;
    std::fs::copy(exe, self.main_exe())?;  // ← FAILS HERE
}

Race condition sequence:

  1. Multiple dx serve processes start within 2-6 seconds
  2. All attempt to build to the same target/dx/<package>/debug/web/ directory
  3. They race to write executables with hash suffixes (e.g., panelist-abc123)
  4. File lock conflicts → EPERM

Why it happens on macOS: When multiple processes write to the same file path at the same time, macOS file locking appears to reject one of the writes with "Operation not permitted" (EPERM).
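
To probe the contention without dx, here is a minimal sketch that copies one file onto a single destination from several threads at once and counts the failures. The helper names (`hammer_copy`, `count_permission_errors`) are hypothetical, and threads are only a stand-in for separate dx serve processes, so this may or may not reproduce the EPERM on a given machine.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::thread;

/// Spawns `writers` threads that all copy `src` onto the same `dst` path
/// at once and returns each thread's result, so failures can be counted.
/// (Hypothetical helper; threads approximate competing processes.)
fn hammer_copy(src: &Path, dst: &Path, writers: usize) -> Vec<io::Result<u64>> {
    let handles: Vec<_> = (0..writers)
        .map(|_| {
            let (src, dst): (PathBuf, PathBuf) = (src.to_path_buf(), dst.to_path_buf());
            thread::spawn(move || fs::copy(&src, &dst))
        })
        .collect();

    handles
        .into_iter()
        .map(|h| h.join().expect("copy thread panicked"))
        .collect()
}

/// Counts how many results failed with `PermissionDenied`, the `ErrorKind`
/// that Rust's standard library maps EPERM and EACCES to.
fn count_permission_errors(results: &[io::Result<u64>]) -> usize {
    results
        .iter()
        .filter(|r| matches!(r, Err(e) if e.kind() == io::ErrorKind::PermissionDenied))
        .count()
}
```

Calling `hammer_copy(&src, &dst, 10)` and passing the result to `count_permission_errors` gives a rough failure count per run. A count of zero does not rule out the bug, since the real scenario involves separate processes and a freshly built executable.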

Test Results

| Scenario | Result |
|---|---|
| Single server, file changes (hot-reload enabled) | ✅ 15/15 rebuilds successful |
| Single server, file changes (hot-reload disabled) | ✅ 10/10 rebuilds successful |
| 10 parallel server starts (hot-reload enabled) | ❌ 1/10 had EPERM |
| 10 parallel server starts (hot-reload disabled) | ❌ 1/10 had EPERM |

Note: Hot-reload is unrelated to this issue. The race condition occurs during server startup, not during file-change rebuilds.

Proposed Fixes

Option 1: Add retry logic with exponential backoff (simplest)

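// Assumes anyhow's `bail!` and `Result` (or the CLI's own equivalents) are in scope.
use std::{io::ErrorKind, path::Path, thread, time::Duration};

fn copy_with_retry(src: &Path, dst: &Path) -> Result<()> {
    const MAX_ATTEMPTS: u32 = 5;
    for attempt in 0..MAX_ATTEMPTS {
        match std::fs::copy(src, dst) {
            Ok(_) => return Ok(()),
            // EPERM surfaces as PermissionDenied; back off 50, 100, 200, 400 ms.
            Err(e) if e.kind() == ErrorKind::PermissionDenied && attempt + 1 < MAX_ATTEMPTS => {
                thread::sleep(Duration::from_millis(50 * 2_u64.pow(attempt)));
            }
            // Any other error, or the final PermissionDenied, is returned as-is.
            Err(e) => return Err(e.into()),
        }
    }
    bail!("Failed to copy executable after retries")
}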
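
The retry policy can also be separated from the copy so that it is testable on its own. The following is a sketch using only the standard library; `backoff_delay` and `retry_on_permission_denied` are hypothetical names, not existing Dioxus CLI functions.

```rust
use std::io::{self, ErrorKind};
use std::thread;
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base_ms * 2^attempt.
fn backoff_delay(base_ms: u64, attempt: u32) -> Duration {
    Duration::from_millis(base_ms * 2_u64.pow(attempt))
}

/// Runs `op` at least once and at most `max_attempts` times, sleeping
/// between attempts only when the failure is `PermissionDenied`.
/// Any other error, or the final failure, is returned unchanged.
fn retry_on_permission_denied<T>(
    max_attempts: u32,
    base_ms: u64,
    mut op: impl FnMut() -> io::Result<T>,
) -> io::Result<T> {
    let mut attempt = 0;
    loop {
        match op() {
            Err(e) if e.kind() == ErrorKind::PermissionDenied && attempt + 1 < max_attempts => {
                thread::sleep(backoff_delay(base_ms, attempt));
                attempt += 1;
            }
            other => return other,
        }
    }
}
```

With `max_attempts = 5` and `base_ms = 50`, the waits are 50, 100, 200 and 400 ms, so a persistently failing copy gives up after roughly 750 ms of sleeping. A call site could look like `retry_on_permission_denied(5, 50, || std::fs::copy(src, dst))`. Retrying only hides the race; it does not stop two processes from writing the same path.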

Option 2: Use atomic rename (best practice)

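BundleFormat::Server => {
    // Include the PID so concurrent dx processes never share a temp file.
    let temp_exe = self.exe_dir().join(format!(
        ".{}.{}.tmp",
        self.platform_exe_name(),
        std::process::id()
    ));
    std::fs::create_dir_all(self.exe_dir())?;
    std::fs::copy(exe, &temp_exe)?;
    // Atomic on Unix when source and destination are on the same filesystem.
    std::fs::rename(&temp_exe, self.main_exe())?;
}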
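
As a standalone illustration of the copy-then-rename pattern, here is a sketch that depends only on the standard library. `install_atomically` is a hypothetical helper, and the per-process temp file name is an assumption made so that concurrent writers do not collide on the temp file itself.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Copies `src` to `dst` by first writing a temp file next to `dst`,
/// then renaming it into place. The temp name includes the process ID
/// so that concurrent processes do not share a temp file.
fn install_atomically(src: &Path, dst: &Path) -> io::Result<()> {
    let dir = dst.parent().unwrap_or_else(|| Path::new("."));
    let file_name = dst
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "dst has no file name"))?;
    fs::create_dir_all(dir)?;

    let tmp = dir.join(format!(
        ".{}.{}.tmp",
        file_name.to_string_lossy(),
        std::process::id()
    ));

    // Copy to the private temp path; on failure, clean up and propagate.
    if let Err(e) = fs::copy(src, &tmp) {
        let _ = fs::remove_file(&tmp);
        return Err(e);
    }

    // The temp file lives in the destination directory, so the rename stays
    // on one filesystem and replaces `dst` in a single step on Unix.
    fs::rename(&tmp, dst).map_err(|e| {
        let _ = fs::remove_file(&tmp);
        e
    })
}
```

Readers of `dst` then see either the old file or the new one, never a partially written one. If two processes install at once, the last rename wins, which is acceptable when both produce the same build output.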

Option 3: Per-project lock file (prevents multiple instances)

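// In dx serve startup (Unix only; assumes the `libc` crate is available):
use std::os::unix::io::AsRawFd;

let lock_path = project_dir.join("target/.dx_serve.lock");
let lock_file = std::fs::OpenOptions::new()
    .create(true)
    .write(true)
    .open(&lock_path)?;

// Try to acquire an exclusive lock without blocking.
// SAFETY: the descriptor stays valid for as long as `lock_file` is open.
if unsafe { libc::flock(lock_file.as_raw_fd(), libc::LOCK_EX | libc::LOCK_NB) } != 0 {
    bail!("Another dx serve instance is already running for this project");
}
// Keep `lock_file` alive for the whole session; closing it releases the lock.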
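
For comparison, here is a sketch of a dependency-free variant that swaps `flock` for exclusive file creation (`OpenOptions::create_new`). This is a different technique from the one above: it needs no `libc`, but a crashed process leaves a stale file behind, which `flock` avoids because the kernel releases the lock when the process exits. `ServeLock` is a hypothetical type.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Owns the lock file for as long as the guard is alive.
struct ServeLock {
    path: PathBuf,
}

impl ServeLock {
    /// Creates the lock file with `create_new`, which fails with
    /// `AlreadyExists` if the file is already present.
    fn acquire(path: &Path) -> io::Result<ServeLock> {
        let mut file = OpenOptions::new()
            .write(true)
            .create_new(true)
            .open(path)?;
        // Record the owner's PID so a stale lock can be inspected by hand.
        writeln!(file, "{}", std::process::id())?;
        Ok(ServeLock {
            path: path.to_path_buf(),
        })
    }
}

impl Drop for ServeLock {
    fn drop(&mut self) {
        // Best-effort cleanup; a crash skips this and leaves a stale file.
        let _ = fs::remove_file(&self.path);
    }
}
```

A startup path would hold the guard for the whole session, for example `let _lock = ServeLock::acquire(&lock_path)?;`, and report an `AlreadyExists` error as "another instance is running".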

Recommendation: Option 3 prevents the issue entirely by ensuring only one dx serve runs per project.

Workaround

Use a PID lock file wrapper script to prevent multiple instances:

#!/bin/bash
# safe_dx_serve.sh - Prevents multiple server instances
# Note: the check-then-write below is not atomic; it narrows the race window
# but does not fully close it.

PROJECT_NAME="${PROJECT_NAME:-myapp}"  # Defaults to the package served below
LOCKFILE="/tmp/dx_serve_${PROJECT_NAME}.lock"

# Check if another instance is running
if [ -f "$LOCKFILE" ]; then
    EXISTING_PID=$(cat "$LOCKFILE")
    if ps -p "$EXISTING_PID" > /dev/null 2>&1; then
        echo "❌ Server already running (PID: $EXISTING_PID)"
        exit 1
    else
        rm -f "$LOCKFILE"  # Stale lock file
    fi
fi

# Start server and write PID to lock file
dx serve --interactive false --package "$PROJECT_NAME" --verbose &
SERVER_PID=$!
echo "$SERVER_PID" > "$LOCKFILE"

# Single quotes defer expansion until the trap actually runs
trap 'rm -f "$LOCKFILE"' EXIT
wait "$SERVER_PID"

Additional Context

  • This can affect any macOS user who starts multiple instances against the same project
  • First build always succeeds (no competition)
  • Linux/Windows may have similar issues but with different timing
  • Hot-reload during development works fine - this is a startup race condition

Related Code

  • packages/cli/src/build/request.rs:1416-1417 - Copy operation
  • packages/cli/src/build/builder.rs - Build process
  • packages/cli/src/serve/runner.rs - Server startup logic
