3 releases
| Version | Released |
|---|---|
| 0.1.2 | Jun 1, 2019 |
| 0.1.1 | May 25, 2019 |
| 0.1.0 | May 25, 2019 |
#334 in FFI
32 downloads per month
Used in 5 crates (via ndless)
47KB, 1K SLoC
Utilities related to FFI bindings, for embedded platforms that use Unix-like conventions. This is mostly copy & pasted from the Rust standard library.
Note that OsString and CString require the alloc feature to be enabled in your Cargo.toml.
This module provides utilities to handle data across non-Rust interfaces, like other programming languages and the underlying operating system. It is mainly of use for FFI (Foreign Function Interface) bindings and code that needs to exchange C-like strings with other languages.
Overview
Rust represents owned strings with the String type, and borrowed slices of strings with the str primitive. Both are always in UTF-8 encoding, and may contain nul bytes in the middle, i.e., if you look at the bytes that make up the string, there may be a \0 among them. Both String and str store their length explicitly; there are no nul terminators at the end of strings like in C.
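A minimal sketch of the two properties described above, written against plain Rust string slices. The helper name describe is hypothetical and only exists for this illustration.

```rust
/// Returns the stored byte length of a string slice and whether it
/// contains a nul byte anywhere in its contents.
/// `describe` is a hypothetical helper, not part of this crate.
fn describe(s: &str) -> (usize, bool) {
    // len() reads the stored length; it does not scan for a terminator.
    let len = s.len();
    // A nul byte is ordinary content in a Rust string.
    let has_nul = s.bytes().any(|b| b == 0);
    (len, has_nul)
}
```

Calling describe("ab\0cd") reports a length of 5 with a nul byte present: the \0 counts as one of the five bytes and does not end the string.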
C strings are different from Rust strings:
- Encodings - Rust strings are UTF-8, but C strings may use other encodings. If you are using a string from C, you should check its encoding explicitly, rather than assuming that it is UTF-8 as you can in Rust.
- Character size - C strings may use char or wchar_t-sized characters; please note that C's char is different from Rust's. The C standard leaves the actual sizes of those types open to interpretation, but defines different APIs for strings made up of each character type. Rust strings are always UTF-8, so different Unicode characters will be encoded in a variable number of bytes each. The Rust type char represents a 'Unicode scalar value', which is similar to, but not the same as, a 'Unicode code point'.
- Nul terminators and implicit string lengths - Often, C strings are nul-terminated, i.e., they have a \0 character at the end. The length of a string buffer is not stored, but has to be calculated; to compute the length of a string, C code must manually call a function like strlen() for char-based strings, or wcslen() for wchar_t-based ones. Those functions return the number of characters in the string excluding the nul terminator, so the buffer length is really len+1 characters. Rust strings don't have a nul terminator; their length is always stored and does not need to be calculated. In Rust, accessing a string's length is an O(1) operation because the length is stored, while in C it is an O(length) operation because the length has to be computed by scanning the string for the nul terminator.
- Internal nul characters - When C strings have a nul terminator character, this usually means that they cannot have nul characters in the middle, because a nul character would essentially truncate the string. Rust strings can have nul characters in the middle, because nul does not have to mark the end of the string in Rust.
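The length and truncation points above can be sketched in safe Rust. The function c_style_len is a hypothetical stand-in that mimics the scan strlen() performs; it is not the C function itself and not part of this crate.

```rust
/// Mimics what C's strlen() does: walk the buffer until the first nul
/// byte and report how many bytes came before it. This is O(length).
/// `c_style_len` is a hypothetical helper for illustration only.
fn c_style_len(buf: &[u8]) -> usize {
    buf.iter()
        .position(|&b| b == 0)
        // No terminator found: fall back to the slice length so the
        // sketch stays safe. Real C code would read out of bounds here.
        .unwrap_or(buf.len())
}
```

For the buffer b"he\0llo\0" this scan stops at 2, which is the truncation effect of an internal nul, while the Rust string "he\0llo" still reports its stored length of 6.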
Representations of non-Rust strings
CString and CStr are useful when you need to transfer UTF-8 strings to and from languages with a C ABI, like Python.
- From Rust to C: CString represents an owned, C-friendly string: it is nul-terminated, and has no internal nul characters. Rust code can create a CString out of a normal string (provided that the string doesn't have nul characters in the middle), and then use a variety of methods to obtain a raw *mut u8 that can then be passed as an argument to functions which use the C conventions for strings.
- From C to Rust: CStr represents a borrowed C string; it is what you would use to wrap a raw *const u8 that you got from a C function. A CStr is guaranteed to be a nul-terminated array of bytes. Once you have a CStr, you can convert it to a Rust &str if it's valid UTF-8, or lossily convert it by adding replacement characters.
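Both directions can be sketched as follows. This example uses std::ffi, from which this crate says it is mostly copied; it assumes the crate's CString and CStr behave the same way, which is not verified here. In std the raw pointer type is *const c_char rather than u8. fake_c_strlen is a hypothetical stand-in for a real C function, and rust_to_c and c_to_rust are illustrative names.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical stand-in for a C function that takes a nul-terminated
/// string and counts the bytes before the terminator.
///
/// Safety: `ptr` must point to a valid nul-terminated buffer.
unsafe fn fake_c_strlen(ptr: *const c_char) -> usize {
    let mut n = 0;
    unsafe {
        while *ptr.add(n) != 0 {
            n += 1;
        }
    }
    n
}

/// Rust to C: build an owned CString and hand its pointer to "C".
/// Returns None if the input has a nul byte in the middle.
fn rust_to_c(s: &str) -> Option<usize> {
    let owned = CString::new(s).ok()?;
    // `owned` stays alive for the whole call, so the pointer is valid.
    Some(unsafe { fake_c_strlen(owned.as_ptr()) })
}

/// C to Rust: wrap nul-terminated bytes in a CStr, then convert lossily,
/// replacing invalid UTF-8 with U+FFFD.
/// Returns None if the bytes are not properly nul-terminated.
fn c_to_rust(bytes_with_nul: &[u8]) -> Option<String> {
    let c = CStr::from_bytes_with_nul(bytes_with_nul).ok()?;
    Some(c.to_string_lossy().into_owned())
}
```

When the bytes are expected to be valid UTF-8 and an error is preferable to replacement characters, CStr::to_str can be used in place of to_string_lossy.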
OsString and OsStr are useful when you need to transfer strings to and from the operating system itself, or when capturing the output of external commands. Conversions between OsString, OsStr and Rust strings work similarly to those for CString and CStr.
- OsString represents an owned string in whatever representation the operating system prefers. In the Rust standard library, various APIs that transfer strings to/from the operating system use OsString instead of plain strings.
- OsStr represents a borrowed reference to a string in a format that can be passed to the operating system. It can be converted into a UTF-8 Rust string slice in a similar way to OsString.
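The conversions between OsString, OsStr and Rust strings might look like the sketch below. It is written against std::ffi on the assumption that this crate's types expose the same methods; the function names roundtrip and owned_to_string are hypothetical.

```rust
use std::ffi::{OsStr, OsString};

/// Converts a Rust string to an OsString, borrows it as an OsStr, and
/// converts back both strictly and lossily.
/// `roundtrip` is a hypothetical helper for illustration only.
fn roundtrip(text: &str) -> (Option<String>, String) {
    let owned: OsString = OsString::from(text);
    let borrowed: &OsStr = owned.as_os_str();
    // Strict: succeeds only if the contents are valid UTF-8.
    let strict = borrowed.to_str().map(str::to_owned);
    // Lossy: invalid sequences would become U+FFFD.
    let lossy = borrowed.to_string_lossy().into_owned();
    (strict, lossy)
}

/// Consumes an OsString and returns a String if it is valid UTF-8.
fn owned_to_string(s: OsString) -> Option<String> {
    s.into_string().ok()
}
```

A string that started out as a Rust String always converts back successfully; the strict conversions only fail for data that came from the operating system and is not valid UTF-8.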
Conversions
On Unix
On Unix, OsStr implements the OsStrExt trait, which augments it with two methods, from_bytes and as_bytes. These do inexpensive conversions from and to byte slices; the bytes are often UTF-8, but that is not required.
Additionally, on Unix OsString implements the OsStringExt trait, which provides from_vec and into_vec methods that consume their arguments, and take or produce vectors of u8.
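A short sketch of the four methods named above, using the std::os::unix::ffi traits from the standard library. It assumes a Unix target and assumes this crate's OsStrExt and OsStringExt match std; the wrapper function names are hypothetical.

```rust
use std::ffi::{OsStr, OsString};
use std::os::unix::ffi::{OsStrExt, OsStringExt};

/// Borrows a byte slice as an OsStr without copying or validating it.
fn bytes_to_os(bytes: &[u8]) -> &OsStr {
    OsStr::from_bytes(bytes)
}

/// Views an OsStr as its underlying bytes, again without copying.
fn os_to_bytes(s: &OsStr) -> &[u8] {
    s.as_bytes()
}

/// Moves a Vec<u8> into an OsString and back out; both calls consume
/// their argument, so the buffer is handed over rather than copied.
fn vec_roundtrip(v: Vec<u8>) -> Vec<u8> {
    OsString::from_vec(v).into_vec()
}
```

Because no validation happens, bytes such as 0xff that are not valid UTF-8 survive the round trip unchanged, and to_str on the resulting OsStr returns None.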
Dependencies
~170–315KB