#logging #cortex-m #arm #log

no-std cortex-m-funnel

A lock-free, wait-free, block-free logger for the ARM Cortex-M architecture

1 unstable release

0.1.0-alpha.1 Nov 27, 2019

#527 in #log

MIT/Apache

30KB
440 lines

cortex-m-funnel

A lock-free, wait-free, block-free logger for the ARM Cortex-M architecture

(lock-free as in logging doesn't block interrupt handlers; wait-free as in there's no spinning (e.g. CAS loop) to get a handle; and block-free as in the logger never waits for an I/O transfer (e.g. ITM, UART, etc.) to complete)

Status: ☢️ Experimental ☢️ (ALPHA PRE-RELEASE)

Documentation

License

Licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.


lib.rs:

A lock-free, wait-free, block-free logger for the ARM Cortex-M architecture

(lock-free as in logging doesn't block interrupt handlers; wait-free as in there's no spinning (e.g. CAS loop) to get a handle; and block-free as in the logger never waits for an I/O transfer (e.g. ITM, UART, etc.) to complete)

Status: ☢️ Experimental ☢️ (ALPHA PRE-RELEASE)

SUPER IMPORTANT Using this crate in a threaded environment will result in an unsound program! You have been warned! Also, multi-core support has not been thought out at all so this is likely wrong when used in multi-core context.

Working principle

There's one ring buffer per priority level. Logging from an interrupt / exception handler will simply write the message into one of these ring buffers. Thus logging is effectively 'I/O less' and as fast as a memcpy. Only the 'thread handler' (AKA main or idle in RTFM apps) can drain these ring buffers into an appropriate I/O sink (e.g. the ITM).

Nothing is without trade-offs in this life; this logger uses plenty of static memory (i.e. RAM) in exchange for fast and predictable logging performance. Also, compared to loggers that directly do I/O this logger will, in overall, spend more CPU cycles to log the same amount of data but most of the work will be done at the lowest priority making logging in interrupt handlers much faster.

Examples

Usual setup

Application crate:

// aligned = "0.3.2"
use aligned::Aligned;
use cortex_m::itm;

use funnel::{Drain, funnel, info, trace};

// `NVIC_PRIO_BITS` is the number of priority bits supported by the device
//
// The `NVIC_PRIO_BITS` value can be a literal integer (e.g. `3`) or a path to a constant
// (`stm32f103xx::NVIC_PRIO_BITS`)
//
// This macro call can only appear *once* in the dependency graph and *must* appear if any
// of the `funnel` macros or the `Logger::get()` API is used anywhere in the dependency graph
funnel!(NVIC_PRIO_BITS = 3, {
     // syntax: $logical_priority : $ring_buffer_size_in_bytes
     // to get better performance use sizes that are a power of 2
     1: 32,
     2: 64,

     // not listing a priority here disables logging at that priority level
     // entering the wrong NVIC_PRIO_BITS value will disable most loggers
});

#[entry]
fn main() -> ! {
    // ..
    let mut itm: ITM = /* .. */;

    let drains = Drain::get_all();

    let mut buf = Aligned([0; 32]); // 4-byte aligned buffer
    loop {
        for (i, drain) in drains.iter().enumerate() {
            'l: loop {
                let n = drain.read(&mut buf).len();

                // this drain is empty
                if n == 0 {
                    break 'l;
                }

                // we need this coercion or the slicing below won't do the right thing
                let buf: &Aligned<_, [_]> = &buf;

                // will send data in 32-bit chunks
                itm::write_aligned(&mut itm.stim[i], &buf[..n]);
            }
        }
    }
}

// logical_priority = 1 (nvic_priority = 224)
#[interrupt]
fn GPIOA() {
    info!("GPIOA");
    foo(0);
    // ..
}

// logical_priority = 2 (nvic_priority = 192)
#[interrupt]
fn GPIOB() {
    info!("GPIOB");
    foo(1);
    // ..
}

fn foo(x: i32) {
    // this macro can appear in libraries
    trace!("foo({})", x);
    // ..
}

Logger

The overhead of each macro call can be reduced using one of the uwrite! macros on a Logger. A Logger can only be obtained using the Logger::get() constructor.

use funnel::{Logger, log_enabled};

#[interrupt]
fn GPIOC() {
    if let Some(mut logger) = Logger::get() {
         if log_enabled!(Info) {
             uwriteln!(logger, "{}", 100).ok();
             uwriteln!(logger, "{:?}", some_value).ok();
         }
    }
}

Logging levels

funnel supports 5 logging level: Trace, Debug, Info, Warn and Error, sorted in increasing level of severity. Each of these logging level has an associated logging macro: trace!, debug!, info!, warn! and error!.

Logs of lesser severity can be statically disabled using of these Cargo features.

  • max_level_trace
  • max_level_debug
  • max_level_info
  • max_level_warn
  • max_level_error
  • max_level_off
  • release_max_level_trace
  • release_max_level_debug
  • release_max_level_info
  • release_max_level_warn
  • release_max_level_error
  • release_max_level_off

Enabling the max_level_info feature will disable the Debug and Trace logging levels; max_level_off will disable all logging levels. The release_* features apply when the application is compiled using the 'release' profile; the other features apply when the 'dev' profile is used. To check if a logging level is enabled or disabled in code use the log_enabled! macro.

Benchmarks

Ran on Cortex-M3 core clocked at 8 MHz and configured with 0 Flash wait cycles.

Code Cycles
info!("") 36
uwriteln!(logger, "") 15
drain("") 27
info!("{}", S) 331-369
uwriteln!(logger, "{}", S) 308-346
drain(S) 863-916
iprintln!(_, "{}", S) 1652
info!("{}", N) 348-383
uwriteln!(logger, "{}", N) 329-364
drain(N) 217-230

Where S is a 45-byte long string, N = usize::max_value(), the drain function is ptr::read_volatile-ing each byte and the ITM was clocked at 2 MHz.

Potential improvements / alternatives

Instead of draining the ring buffers at the lowest priority one could drain the buffers using the debugger using something like SEGGER's Real Time Transfer mechanism. The implementation would need to change to properly support this form of parallel draining.

Dependencies

~1.5MB
~38K SLoC