fork-map

A crate for running operations in a child process spawned by fork()

4 releases

0.1.3 Mar 5, 2024
0.1.2 Feb 14, 2024
0.1.1 Feb 13, 2024
0.1.0 Feb 13, 2024

MIT license

A Rust library for running operations in a child process spawned by fork(). Embrace fearful concurrency: fearful that your worker task will mess up your memory space.

Example

use fork_map::fork_map;

pub fn do_with_fork(value: u64) -> u64 {
    // Spawn a child process with a copy-on-write copy of memory
    unsafe {
        fork_map(|| {
            // Do some obnoxious operation with `value`
            // * Maybe it leaks memory
            // * Maybe it uses static resources unsafely and
            //   prevents multi-threaded operation
            // * Maybe you couldn't figure out how to
            //   send your data to a thread
            Ok(value * 10)
        }).unwrap()
    }
    // Execution continues after the child process has exited
}

Motivation

Some operations work best if run in their own process. Maybe they impose single-threaded restrictions, maybe they consume untold resources when left running, or maybe you just want to abuse copy-on-write memory to eliminate startup time; sometimes you really just want to fork and map. My main uses for this crate have been trying to embed libclang, which unsafely uses static memory because it assumes it is running single-threaded, and running operations that leak memory.

Implementation and Support

fork_map is written using libc::fork, so it will only work properly on *nix-based systems that support fork (sorry, Windows users!). Since the child process inherits the parent's memory space (as copy-on-write), there are no constraints on the input value or the operation. The result is a different story: it has to cross the process boundary back to the parent, so it is serialized using serde_json and sent over a libc file handle via some incredibly C-inspired unsafe io code, which means the result type has to be serializable with serde. The parent process reads the data from the file handle and waits for the child to exit before returning.
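To make the flow above concrete, here is a minimal sketch of the same fork, pipe, and wait sequence. It is not the crate's actual implementation: the name fork_u64 is hypothetical, the result is fixed to a u64 and sent as raw bytes instead of going through serde_json, and the C functions are declared by hand instead of coming from the libc crate so that the sketch builds with the standard library alone. It assumes Linux or macOS, where pid_t is a 32-bit int.

```rust
use std::os::raw::c_int;

// Hand-declared bindings so the sketch needs no external crates
// (the crate itself uses `libc`). `unsafe extern` needs Rust 1.82 or
// newer; on older toolchains write plain `extern "C"` instead.
unsafe extern "C" {
    fn fork() -> c_int;
    fn pipe(fds: *mut c_int) -> c_int;
    fn read(fd: c_int, buf: *mut u8, count: usize) -> isize;
    fn write(fd: c_int, buf: *const u8, count: usize) -> isize;
    fn close(fd: c_int) -> c_int;
    fn waitpid(pid: c_int, status: *mut c_int, options: c_int) -> c_int;
    fn _exit(code: c_int) -> !;
}

/// Hypothetical helper: runs `op` in a forked child and hands its
/// u64 result back to the parent through a pipe.
unsafe fn fork_u64<F: FnOnce() -> u64>(op: F) -> Option<u64> {
    let mut fds = [0 as c_int; 2];
    if pipe(fds.as_mut_ptr()) != 0 {
        return None;
    }
    let (rd, wr) = (fds[0], fds[1]);

    let pid = fork();
    if pid < 0 {
        // fork failed: clean up both pipe ends
        close(rd);
        close(wr);
        return None;
    }
    if pid == 0 {
        // Child: run the operation against its copy-on-write memory,
        // write the result into the pipe, and exit without unwinding.
        close(rd);
        let bytes = op().to_le_bytes();
        let n = write(wr, bytes.as_ptr(), bytes.len());
        close(wr);
        _exit(if n == 8 { 0 } else { 1 });
    }

    // Parent: read exactly 8 bytes, then reap the child.
    close(wr);
    let mut buf = [0u8; 8];
    let mut got = 0usize;
    while got < buf.len() {
        let n = read(rd, buf.as_mut_ptr().add(got), buf.len() - got);
        if n <= 0 {
            break; // EOF or error before the full result arrived
        }
        got += n as usize;
    }
    close(rd);
    let mut status: c_int = 0;
    waitpid(pid, &mut status, 0);

    if got == buf.len() {
        Some(u64::from_le_bytes(buf))
    } else {
        None
    }
}
```

Calling `unsafe { fork_u64(|| 7 * 10) }` returns Some(70) in the parent. If the closure mutates a captured variable, the change happens only in the child's copy and the parent's value is left alone, which is the isolation this crate is after. The sketch does not handle a panic inside the closure, which would unwind in the child instead of exiting.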

Use with rayon

You will generally want to use this crate in conjunction with something like rayon, since a call to fork_map blocks the calling thread until the child process returns. In combination, rayon coordinates a pool of worker threads that each spawn and control child processes, with minimal boilerplate. A lot of my use cases end up looking something like this:

use fork_map::fork_map;
use rayon::prelude::*;

pub fn main() {
    // Element type annotated so the placeholder list type-checks
    let my_big_list: Vec<u64> = vec![ /* ... */ ];

    // Create a worker pool with rayon's into_par_iter
    let results = my_big_list.into_par_iter().map(|item| {
        // Have each worker spawn a child process for the
        // operations we don't want polluting the parent's memory
        unsafe {
            fork_map(|| {
                // Do your ugly operations here
                Ok(item * 1234)
            }).expect("fork_map succeeded")
        }
    }).collect::<Vec<_>>();

    // Use results here
}

If you have a lot of small tasks to run in child processes, you can use rayon's chunks() function to batch them and eliminate much of the overhead of calling fork() repeatedly (which can be significant):

use fork_map::fork_map;
use rayon::prelude::*;

pub fn main() {
    // Element type annotated so the placeholder list type-checks
    let my_big_list: Vec<u64> = vec![ /* ... */ ];

    // Use rayon's chunks() to give each forked process more
    // work to handle, if you have a lot of small tasks
    let results = my_big_list
        .into_par_iter()
        .chunks(512)
        .map(|items| {
            // Now each child process does 512 items at once
            unsafe {
                fork_map(|| {
                    let mut results = vec![];
                    // Maybe this operation is only mildly heinous
                    // and we can do 512 of them before the child
                    // process needs to be restarted.
                    // Borrow `items` so the closure does not consume its capture
                    for item in &items {
                        results.push(item * 1234);
                    }
                    Ok(results)
                }).expect("fork_map succeeded")
            }
        })
        .collect::<Vec<_>>();
}

Safety

Due to the nature of fork(), this function is very unsound and likely violates most of Rust's guarantees about lifetimes: all of your memory gets duplicated into a second process, and although that process calls exit(0) once your closure has run, the closure itself executes against the duplicated state. Any threads other than the one calling fork_map will not be present in the new process, so threaded lifetime guarantees are also violated; for example, a lock held by another thread at the moment of the fork stays locked forever in the child. Don't even think about using async executors with this.
