
nightly thin_cstr

An experimental crate which provides a truly thin std::ffi::CStr

2 releases

Uses old Rust 2015

0.1.1 Dec 22, 2019
0.1.0 Nov 16, 2017


MIT/Apache


Important: Currently the extern type-based CStr will not work in containers involving size_of_val, such as Box<CStr>. Read https://github.com/kennytm/thin_cstr/issues/1 and https://github.com/rust-lang/rust/pull/64021 before trying to use a thin CStr.

Pre-RFC: Make *CStr a Thin Pointer

Summary

Make *CStr a thin pointer via extern type (RFC 1861). CStr::from_ptr() will become zero-cost, while CStr::to_bytes() will incur a length calculation.

Motivation

The CStr type was introduced in RFC 592 during Rust 1.0-alpha as a replacement for the slice type [c_char], where one of the motivations was:

… in order to construct a slice (or a dynamically sized newtype wrapping a slice), its length has to be determined, which is unnecessary for the consuming FFI function that will only receive a thin pointer. …

However, Rust at that time supported only three kinds of dynamically sized types (DSTs): str, [T] and trait objects, all of which become fat pointers when referenced. An attempt to introduce DSTs with thin pointers was made in RFC 709, but due to time constraints close to the release of 1.0, it was postponed and kept as a low-priority issue.
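The fat-pointer behaviour described above can be checked on stable Rust. The following is a minimal sketch; the helper name `words` is made up for illustration. The size of `&CStr` is only printed, not asserted, because the standard library documentation reserves the right to change it.

```rust
use std::ffi::CStr;
use std::fmt::Debug;
use std::mem::size_of;

/// Size of `T` measured in machine words (hypothetical helper).
fn words<T>() -> usize {
    size_of::<T>() / size_of::<usize>()
}

fn main() {
    // A reference to a sized type is a single (thin) pointer.
    assert_eq!(words::<&u8>(), 1);

    // References to the three classic DSTs carry extra metadata:
    // a length for slices and str, a vtable pointer for trait objects.
    assert_eq!(words::<&[u8]>(), 2);
    assert_eq!(words::<&str>(), 2);
    assert_eq!(words::<&dyn Debug>(), 2);

    // `&CStr` wraps a slice in today's std, so it is expected to be fat too.
    println!("&CStr occupies {} word(s)", words::<&CStr>());
}
```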

Thus the implementation of CStr chose to wrap a [c_char], and carries the following FIXME:

pub struct CStr {
    // FIXME: this should not be represented with a DST slice but rather with
    //        just a raw `c_char` along with some form of marker to make
    //        this an unsized type. Essentially `sizeof(&CStr)` should be the
    //        same as `sizeof(&c_char)` but `CStr` should be an unsized type.
    inner: [c_char]
}

Fast forward to 2017: extern types (RFC 1861) were introduced to represent opaque FFI types, which are fairly popular in C as a way to hide implementation details. These types have an unspecified size in the public interface, and pointers to them are thin. The extern type RFC was accepted and implemented as an unstable feature in Rust 1.23.

With the introduction of extern types, we suddenly have a way to fix the FIXME by changing the inner slice into such an extern type:

// Requires nightly Rust with #![feature(extern_types)] (RFC 1861).
extern {
    type CStrInner;
}
#[repr(C)]
pub struct CStr {
    inner: CStrInner,
}

This RFC is therefore proposed to gauge whether we really want to fix this issue, and to sort out potential unsafety before the change is merged into the standard library.

Guide-level explanation

The main implication of making *CStr thin is that the length is no longer stored alongside the pointer. Some significant changes are:

  • CStr becomes #[repr(C)] and its pointer type should be compatible with char* in C.
  • CStr::from_ptr becomes free.
  • CStr::to_bytes and other getter methods now require length calculation.
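The trade-off in the list above can be approximated on stable Rust without extern types. The sketch below is not the thin_cstr implementation: `ThinCStrRef` is a hypothetical by-value handle that stands in for a thin `&CStr`, holding only a pointer and borrowing the underlying bytes through a lifetime marker.

```rust
use std::ffi::CStr;
use std::marker::PhantomData;
use std::mem::size_of;
use std::os::raw::c_char;

/// Hypothetical stand-in for a thin `&'a CStr`: one pointer, no stored length.
#[derive(Clone, Copy)]
pub struct ThinCStrRef<'a> {
    ptr: *const c_char,
    _borrow: PhantomData<&'a [c_char]>,
}

impl<'a> ThinCStrRef<'a> {
    /// O(1): wraps the pointer without scanning the string.
    ///
    /// # Safety
    /// `ptr` must point to a NUL-terminated string that stays valid for `'a`.
    pub unsafe fn from_ptr(ptr: *const c_char) -> Self {
        ThinCStrRef { ptr, _borrow: PhantomData }
    }

    /// O(1): hands the pointer back, ready to pass to a C function.
    pub fn as_ptr(self) -> *const c_char {
        self.ptr
    }

    /// O(n): the length is not stored, so every call scans for the NUL byte.
    pub fn to_bytes(self) -> &'a [u8] {
        let mut len = 0usize;
        unsafe {
            while *self.ptr.add(len) != 0 {
                len += 1;
            }
            std::slice::from_raw_parts(self.ptr as *const u8, len)
        }
    }
}

fn main() {
    let data = b"hello\0";
    let thin = unsafe { ThinCStrRef::from_ptr(data.as_ptr() as *const c_char) };
    let fat = CStr::from_bytes_with_nul(data).unwrap();

    // Both views expose the same bytes; only the cost profile differs.
    assert_eq!(thin.to_bytes(), fat.to_bytes());
    // The handle is exactly one pointer wide, like `char*` in C.
    assert_eq!(size_of::<ThinCStrRef<'static>>(), size_of::<*const c_char>());
}
```

A real thin `CStr` would be an unsized type used behind `&`, which this by-value handle cannot express; the sketch only illustrates where the length calculation moves.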

Fortunately, the documentation of std::ffi::CStr already includes many warnings about future changes, so we can assume users do not rely on these performance characteristics in their code.

Reference-level explanation

An implementation of this change is available as the thin_cstr crate, with source code at https://github.com/kennytm/thin_cstr.

The change only affects the unsized CStr type. The owned CString type will not be modified.

Drawbacks

Assuming the C string has length n,

Function                         Before  After
from_ptr                         O(n)    O(1)
from_bytes_with_nul              O(n)    O(n)
from_bytes_with_nul_unchecked    O(1)    O(1)
as_ptr                           O(1)    O(1)
to_bytes                         O(1)    O(n)
to_bytes_with_nul                O(1)    O(n)
to_str                           O(n)    O(n)
to_string_lossy                  O(n)    O(n)
into_c_string                    O(1)    O(n)

Here, only CStr::from_ptr becomes a zero-cost function; all other methods either keep the same cost or become slower. One particular issue is CStr::into_c_string, which was stabilized in 1.20 without the performance warning.

In rustc alone, most uses of CStr immediately convert it to a byte slice or string, as in the line below, so the change gives no performance advantage or disadvantage there. Even worse, if we create the &CStr via CStr::from_bytes_with_nul, the length calculation cost is doubled: once to validate the input and once more when the bytes are read back.

let s = CStr::from_ptr(last_error).to_bytes();
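The doubled cost can be made concrete by counting how many bytes each step examines. This is a sketch under stated assumptions: `Thin`, `thin_from_bytes_with_nul` and `thin_to_bytes_len` are hypothetical names that mimic the validation and scanning steps, and are not taken from std or from the thin_cstr crate.

```rust
/// Hypothetical thin handle: remembers only the start pointer.
struct Thin(*const u8);

/// Mimics the validation in `from_bytes_with_nul`: the first NUL must be the
/// last byte. Returns the handle and the number of bytes examined.
fn thin_from_bytes_with_nul(bytes: &[u8]) -> Option<(Thin, usize)> {
    let nul = bytes.iter().position(|&b| b == 0)?;
    if nul + 1 != bytes.len() {
        return None; // interior NUL
    }
    // The length found during validation is thrown away here.
    Some((Thin(bytes.as_ptr()), nul + 1))
}

/// Mimics a thin `to_bytes`: rescans for the NUL terminator.
/// Returns (string length, number of bytes examined).
///
/// # Safety
/// `s` must point to a live, NUL-terminated buffer.
unsafe fn thin_to_bytes_len(s: &Thin) -> (usize, usize) {
    let mut len = 0usize;
    while unsafe { *s.0.add(len) } != 0 {
        len += 1;
    }
    (len, len + 1)
}

fn main() {
    let bytes = b"last error\0";
    let (thin, validated) = thin_from_bytes_with_nul(bytes).unwrap();
    let (len, scanned) = unsafe { thin_to_bytes_len(&thin) };

    assert_eq!(len, bytes.len() - 1);
    // Every byte, including the terminator, is examined twice.
    assert_eq!(validated + scanned, 2 * bytes.len());
}
```

With the current fat representation, the length found during validation is kept in the pointer metadata, so the second scan does not happen.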

Rationale and alternatives

The main rationale of this RFC is that *CStr being fat was considered a bug. An obvious alternative is to not do this, and to accept a fat *CStr as a feature. In that case, we would modify the documentation and remove all mentions of potential performance changes.

We currently use an extern type because it is the only way to get a thin DST. Extern types do not automatically implement auto traits (Send, Sync, UnwindSafe, RefUnwindSafe, etc.), while a [c_char] slice does. Currently Freeze cannot be implemented at all since it is private in libcore (although this is expected, and losing it will not affect language semantics). Furthermore, whenever a new auto trait is introduced (possibly by a third party), it will need to be manually implemented for CStr. If these semantics of extern types cannot be tolerated, we may need to consider reviving the custom DST RFC (RFC 1524) for more control.
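The manual auto-trait implementations mentioned above can be illustrated on stable Rust with an approximation. In this sketch, `OpaqueInner` and `ThinCStrLike` are hypothetical: a zero-sized struct with a raw-pointer marker plays the role of the extern type, since such a marker also opts out of Send and Sync. Unlike a real extern type, it is sized.

```rust
use std::marker::PhantomData;

/// Stable approximation of an opaque FFI type. The raw-pointer marker
/// removes the automatic `Send` and `Sync` implementations.
#[repr(C)]
pub struct OpaqueInner {
    _data: [u8; 0],
    _marker: PhantomData<*mut u8>,
}

/// Hypothetical C-string-like wrapper around the opaque type.
#[repr(C)]
pub struct ThinCStrLike {
    inner: OpaqueInner,
}

// The wrapper no longer inherits these from its field, so they must be
// written by hand. An immutable NUL-terminated byte string has no thread
// affinity, which is the justification assumed here.
unsafe impl Send for ThinCStrLike {}
unsafe impl Sync for ThinCStrLike {}

/// Compiles only if `T` is both `Send` and `Sync`.
fn assert_send_sync<T: Send + Sync>() {}

fn main() {
    // Removing either `unsafe impl` above turns this into a compile error.
    assert_send_sync::<ThinCStrLike>();
}
```

Any auto trait defined outside the standard library would need the same treatment, which is the maintenance burden the paragraph above describes.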

Unresolved questions

How to make the thin CStr implement Freeze. (irrelevant)
