6 releases (3 breaking)
Version | Released |
---|---|
0.4.1 | Dec 19, 2024 |
0.4.0 | Dec 16, 2024 |
0.3.0 | Jun 26, 2024 |
0.2.1 | Jun 13, 2024 |
0.1.0 | Feb 20, 2024 |
#272 in Parser implementations
119,966 downloads per month
Used in 18 crates (3 directly)
120KB, 2.5K SLoC
Differences from toml
First off, I want to be up front and clear about the differences and limitations of this crate versus toml:
- No serde support for deserialization. There is a serde feature, but it only enables serialization of the Value and Spanned types.
- No toml serialization. This crate is only intended to be a span-preserving deserializer; there is no intention to provide serialization to toml, especially the advanced format-preserving kind provided by toml-edit.
- No datetime deserialization. It would be trivial to add support for this (behind an optional feature), I just have no use for it at the moment. PRs welcome.
Why does this crate exist?
The problem
This crate was specifically made to suit the needs of cargo-deny, namely, that it can always retrieve the span of any toml item that it wants to. While the toml crate can also produce span information via toml::Spanned, there is one rather significant limitation: it must pass through serde. In simple cases the Spanned type works quite well, e.g.
#[derive(serde::Deserialize)]
struct Simple {
    /// This works just fine
    simple_string: toml::Spanned<String>,
}
As soon as you have a more complicated scenario, the mechanism that toml uses to get the span information breaks down.
#[derive(serde::Deserialize)]
#[serde(untagged)]
enum Ohno {
    Integer(u32),
    SpannedString(toml::Spanned<String>),
}

#[derive(serde::Deserialize)]
struct Root {
    integer: Ohno,
    string: Ohno
}

fn main() {
    let toml = r#"
integer = 42
string = "we want this to be spanned"
"#;

    let parsed: Root = toml::from_str(toml).expect("failed to deserialize toml");
}
thread 'main' panicked at src/main.rs:20:45:
failed to deserialize toml: Error { inner: Error { inner: TomlError { message: "data did not match any variant of untagged enum Ohno", original: Some("\ninteger = 42\nstring = \"we want this to be spanned\"\n"), keys: ["string"], span: Some(23..51) } } }
To understand why this fails we can look at what #[derive(serde::Deserialize)] expands to for Ohno in HIR.
#[allow(unused_extern_crates, clippy :: useless_attribute)]
extern crate serde as _serde;
#[automatically_derived]
impl <'de> _serde::Deserialize<'de> for Ohno {
fn deserialize<__D>(__deserializer: __D)
-> _serde::__private::Result<Self, __D::Error> where
__D: _serde::Deserializer<'de> {
let __content =
match #[lang = "branch"](<_serde::__private::de::Content as
_serde::Deserialize>::deserialize(__deserializer)) {
#[lang = "Break"] { 0: residual } =>
#[allow(unreachable_code)]
return #[lang = "from_residual"](residual),
#[lang = "Continue"] { 0: val } =>
#[allow(unreachable_code)]
val,
};
let __deserializer =
_serde::__private::de::ContentRefDeserializer<, ,
__D::Error>::new(&__content);
if let _serde::__private::Ok(__ok) =
_serde::__private::Result::map(<u32 as
_serde::Deserialize>::deserialize(__deserializer),
Ohno::Integer) { return _serde::__private::Ok(__ok); }
if let _serde::__private::Ok(__ok) =
_serde::__private::Result::map(<toml::Spanned<String> as
_serde::Deserialize>::deserialize(__deserializer),
Ohno::SpannedString) { return _serde::__private::Ok(__ok); }
_serde::__private::Err(_serde::de::Error::custom("data did not match any variant of untagged enum Ohno"))
}
}
What serde does in the untagged case is first deserialize into _serde::__private::de::Content, an internal API container that is easiest to think of as something like serde_json::Value. It does this because serde speculatively tries each enum variant until one succeeds, and since the serde deserialize API consumes the Deserializer, each attempt is handed a ContentRefDeserializer that just borrows the Content deserialized earlier. The problem comes from how toml::Spanned works: it uses a hack to work around the limitations of the serde API in order to "deserialize" the item as well as its span information. The Spanned object requests a specific set of keys from the toml::Deserializer impl so that it can encode the span information as if it were a struct, to satisfy serde. But serde doesn't know that when it deserializes the Content object; it just knows that the Deserializer reports it has a string, int, or what have you, and deserializes that, "losing" the span information. This problem also affects things like #[serde(flatten)] for slightly different reasons, but they all basically come down to the serde API not truly supporting span information, with no plans to do so.
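To make the "losing" part concrete, here is a minimal std-only sketch of the same shape of problem. None of these types are serde's or toml's; SpannedValue, Scalar, Content and the helper functions are hypothetical stand-ins, and the span 23..51 is just an illustrative byte range.

```rust
use std::ops::Range;

/// What a span-aware parser could hand out: a value plus the byte range
/// it was parsed from. (Hypothetical type, not toml::Spanned.)
#[derive(Debug, Clone, PartialEq)]
struct SpannedValue {
    value: Scalar,
    span: Range<usize>,
}

#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Integer(i64),
    String(String),
}

/// Stand-in for a span-less intermediate buffer: it stores only the data
/// model (integers, strings, ...) and has no slot for a span.
#[derive(Debug, Clone, PartialEq)]
enum Content {
    Integer(i64),
    String(String),
}

/// The buffering step only asks "what value is this?", so the span is
/// dropped right here.
fn buffer(input: &SpannedValue) -> Content {
    match &input.value {
        Scalar::Integer(i) => Content::Integer(*i),
        Scalar::String(s) => Content::String(s.clone()),
    }
}

/// A variant that needs a span cannot be rebuilt from the buffer: even when
/// the type matches, there is no span left to return.
fn spanned_string_from_content(content: &Content) -> Option<(String, Range<usize>)> {
    match content {
        Content::String(_) | Content::Integer(_) => None,
    }
}

/// Reading the original value directly still has access to the span.
fn spanned_string_direct(input: &SpannedValue) -> Option<(String, Range<usize>)> {
    match &input.value {
        Scalar::String(s) => Some((s.clone(), input.span.clone())),
        Scalar::Integer(_) => None,
    }
}
```

Once a value has gone through buffer, no later step can recover the byte range, which is roughly the position the untagged enum's second variant is in.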
How toml-span is different
This crate works by just...not using serde. The core of the crate is based off of basic-toml, which is itself a fork of toml v0.5 from before it added a ton of features and complexity that...well, are not needed by cargo-deny or many other crates that only need deserialization.
Removing serde support means that deserialization must be written manually, which can be tedious in some cases. While porting cargo-deny, though, I actually came to appreciate it more and more due to a couple of things.
- Maximal control. toml-span does an initial deserialization pass into toml_span::value::Value, which keeps span information for both keys and values, and provides helpers (namely TableHelper). Other than satisfying the toml_span::Deserialize trait, it doesn't restrict how you deserialize your values, and you don't even have to use that trait if you don't want to.
- While it's slower to manually write deserialization code than to just put on a few serde attributes, the truth is that the initial convenience carries a compile time cost in terms of serde_derive and all of its dependencies, as well as all of the code that is generated, for...ever. This is fine when you are prototyping, but becomes quite wasteful once you have (mostly/somewhat) stabilized your data format.
- (optional) Span-based errors. toml-span provides the reporting feature, which can be enabled to allow toml_span::Error to be converted into a Diagnostic that can provide nice error output if you use the codespan-reporting crate.
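As a rough illustration of the first point, the sketch below shows the general idea of a value tree that keeps spans for keys and values, plus a small helper that pulls out known keys and reports whatever is left over. It uses only std, and every name in it (Span, Value, Error, Helper, optional_bool) is made up for the example; it is not how toml_span::value::Value or TableHelper are actually defined.

```rust
use std::collections::BTreeMap;

/// Byte range into the source document (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

/// A value that always remembers where it came from.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    String(String, Span),
    Boolean(bool, Span),
}

#[derive(Debug, PartialEq)]
struct Error {
    message: String,
    span: Span,
}

/// Minimal table helper: keys map to (span of the key, value).
struct Helper {
    table: BTreeMap<String, (Span, Value)>,
    errors: Vec<Error>,
}

impl Helper {
    fn new(table: BTreeMap<String, (Span, Value)>) -> Self {
        Self { table, errors: Vec::new() }
    }

    /// Removes `key` if present and returns it as a bool with its span.
    /// A type mismatch is recorded as an error pointing at the value.
    fn optional_bool(&mut self, key: &str) -> Option<(bool, Span)> {
        match self.table.remove(key)? {
            (_, Value::Boolean(b, span)) => Some((b, span)),
            (_, Value::String(_, span)) => {
                self.errors.push(Error {
                    message: format!("expected a boolean for '{key}'"),
                    span,
                });
                None
            }
        }
    }

    /// Any key that was never taken is unknown; report it at the key's span.
    fn finalize(mut self) -> Result<(), Vec<Error>> {
        for (key, (key_span, _)) in std::mem::take(&mut self.table) {
            self.errors.push(Error {
                message: format!("unexpected key '{key}'"),
                span: key_span,
            });
        }
        if self.errors.is_empty() {
            Ok(())
        } else {
            Err(self.errors)
        }
    }
}
```

Because both keys and values carry spans, a misspelled key can be reported at the exact bytes where the user wrote it, which is what makes span-based diagnostics possible later on.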
Usage
Simple
The simplest use case for toml-span is as a slimmer version of toml that also has a pointer API similar to serde_json, allowing easy piecemeal deserialization of a toml document.
toml version
fn is_crates_io_sparse(config: &toml::Value) -> Option<bool> {
    config
        .get("registries")
        .and_then(|v| v.get("crates-io"))
        .and_then(|v| v.get("protocol"))
        .and_then(|v| v.as_str())
        .and_then(|v| match v {
            "sparse" => Some(true),
            "git" => Some(false),
            _ => None,
        })
}
toml-span version
fn is_crates_io_sparse(config: &toml_span::Value) -> Option<bool> {
    match config.pointer("/registries/crates-io/protocol").and_then(|p| p.as_str())? {
        "sparse" => Some(true),
        "git" => Some(false),
        _ => None
    }
}
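For readers who haven't used a pointer-style API before, the following is a minimal std-only sketch of the idea: split the path on '/' and walk nested tables one segment at a time. The Node type and its methods are hypothetical and exist only for this example. The sketch ignores arrays and escape sequences, and it makes no claim about how toml-span implements its own pointer method.

```rust
use std::collections::BTreeMap;

/// A tiny document tree (hypothetical, not toml_span::Value).
#[derive(Debug, PartialEq)]
enum Node {
    String(String),
    Table(BTreeMap<String, Node>),
}

impl Node {
    /// Walks a '/'-separated path through nested tables.
    /// An empty path refers to the node itself; any other path must
    /// start with '/'.
    fn pointer(&self, path: &str) -> Option<&Node> {
        if path.is_empty() {
            return Some(self);
        }
        let rest = path.strip_prefix('/')?;
        let mut current = self;
        for segment in rest.split('/') {
            match current {
                // Descend one level, or bail out if the key is missing
                Node::Table(table) => current = table.get(segment)?,
                // Cannot index into a scalar
                Node::String(_) => return None,
            }
        }
        Some(current)
    }

    fn as_str(&self) -> Option<&str> {
        match self {
            Node::String(s) => Some(s),
            Node::Table(_) => None,
        }
    }
}
```

The appeal over chained get calls is that a missing intermediate table and a missing leaf are handled by the same single Option.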
Common
Of course the most common case is deserializing toml into Rust containers.
toml version
#[derive(Deserialize, Clone)]
#[cfg_attr(test, derive(Debug, PartialEq, Eq))]
#[serde(rename_all = "kebab-case", deny_unknown_fields)]
pub struct CrateBan {
    pub name: Spanned<String>,
    pub version: Option<VersionReq>,
    /// One or more crates that will allow this crate to be used if it is a
    /// direct dependency
    pub wrappers: Option<Spanned<Vec<Spanned<String>>>>,
    /// Setting this to true will only emit an error if multiple
    /// versions of the crate are found
    pub deny_multiple_versions: Option<Spanned<bool>>,
}
toml-span version
The following code is much more verbose (before proc macros run, at least), but showcases something that moving cargo-deny to toml-span allowed, namely PackageSpec.
Before toml-span, every case where a user specified a crate spec (i.e. name + optional version requirement) was handled via two separate fields, name and version. This was quite verbose, since in many cases version is not specified at all, and the whole spec could be just a string if the user doesn't need or want to provide other fields. Normally one would use the string or struct idiom, but this was impossible due to how I wanted to reorganize the data: the package spec as either a string or struct, plus optional data flattened to the same level as the package spec. Since toml-span changes how deserialization is done, this change was quite trivial after the initial work of getting the crate stood up was done.
pub type CrateBan = PackageSpecOrExtended<CrateBanExtended>;

#[cfg_attr(test, derive(Debug, PartialEq, Eq))]
pub struct CrateBanExtended {
    /// One or more crates that will allow this crate to be used if it is a
    /// direct dependency
    pub wrappers: Option<Spanned<Vec<Spanned<String>>>>,
    /// Setting this to true will only emit an error if multiple versions of the
    /// crate are found
    pub deny_multiple_versions: Option<Spanned<bool>>,
    /// The reason for banning the crate
    pub reason: Option<Reason>,
    /// The crate to use instead of the banned crate, could be just the crate name
    /// or a URL
    pub use_instead: Option<Spanned<String>>,
}

impl<'de> Deserialize<'de> for CrateBanExtended {
    fn deserialize(value: &mut Value<'de>) -> Result<Self, DeserError> {
        // The table helper provides convenience wrappers around a Value::Table, which
        // is just a BTreeMap<Key, Value>
        let mut th = TableHelper::new(value)?;

        // Since we specify the keys manually there is no need for serde(rename/rename_all)
        let wrappers = th.optional("wrappers");
        let deny_multiple_versions = th.optional("deny-multiple-versions");
        let reason = th.optional_s("reason");
        let use_instead = th.optional("use-instead");

        // Specifying None means that any keys that still exist in the table are
        // unknown, producing an error the same as with serde(deny_unknown_fields)
        th.finalize(None)?;

        Ok(Self {
            wrappers,
            deny_multiple_versions,
            reason: reason.map(Reason::from),
            use_instead,
        })
    }
}

#[derive(Clone, PartialEq, Eq)]
pub struct PackageSpec {
    pub name: Spanned<String>,
    pub version_req: Option<VersionReq>,
}

impl<'de> Deserialize<'de> for PackageSpec {
    fn deserialize(value: &mut Value<'de>) -> Result<Self, DeserError> {
        use std::borrow::Cow;

        struct Ctx<'de> {
            inner: Cow<'de, str>,
            split: Option<(usize, bool)>,
            span: Span,
        }

        impl<'de> Ctx<'de> {
            fn from_str(bs: Cow<'de, str>, span: Span) -> Self {
                let split = bs
                    .find('@')
                    .map(|i| (i, true))
                    .or_else(|| bs.find(':').map(|i| (i, false)));
                Self {
                    inner: bs,
                    split,
                    span,
                }
            }
        }

        let ctx = match value.take() {
            ValueInner::String(s) => Ctx::from_str(s, value.span),
            ValueInner::Table(tab) => {
                let mut th = TableHelper::from((tab, value.span));

                if let Some(mut val) = th.table.remove(&"crate".into()) {
                    let s = val.take_string(Some("a crate spec"))?;
                    th.finalize(Some(value))?;
                    Ctx::from_str(s, val.span)
                } else {
                    // Encourage user to use the 'crate' spec instead
                    let name = th.required("name").map_err(|e| {
                        if matches!(e.kind, toml_span::ErrorKind::MissingField(_)) {
                            (toml_span::ErrorKind::MissingField("crate"), e.span).into()
                        } else {
                            e
                        }
                    })?;
                    let version = th.optional::<Spanned<Cow<'_, str>>>("version");

                    // We return all the keys we haven't deserialized back to the value,
                    // so that further deserializers can use them as this spec is
                    // always embedded in a larger structure
                    th.finalize(Some(value))?;

                    let version_req = if let Some(vr) = version {
                        Some(vr.value.parse().map_err(|e: semver::Error| {
                            toml_span::Error::from((
                                toml_span::ErrorKind::Custom(e.to_string()),
                                vr.span,
                            ))
                        })?)
                    } else {
                        None
                    };

                    return Ok(Self { name, version_req });
                }
            }
            other => return Err(expected("a string or table", other, value.span).into()),
        };

        let (name, version_req) = if let Some((i, make_exact)) = ctx.split {
            let mut v: VersionReq = ctx.inner[i + 1..].parse().map_err(|e: semver::Error| {
                toml_span::Error::from((
                    toml_span::ErrorKind::Custom(e.to_string()),
                    Span::new(ctx.span.start + i + 1, ctx.span.end),
                ))
            })?;
            if make_exact {
                if let Some(comp) = v.comparators.get_mut(0) {
                    comp.op = semver::Op::Exact;
                }
            }

            (
                Spanned::with_span(
                    ctx.inner[..i].into(),
                    Span::new(ctx.span.start, ctx.span.start + i),
                ),
                Some(v),
            )
        } else {
            (Spanned::with_span(ctx.inner.into(), ctx.span), None)
        };

        Ok(Self { name, version_req })
    }
}

pub struct PackageSpecOrExtended<T> {
    pub spec: PackageSpec,
    pub inner: Option<T>,
}

impl<T> PackageSpecOrExtended<T> {
    pub fn try_convert<V, E>(self) -> Result<PackageSpecOrExtended<V>, E>
    where
        V: TryFrom<T, Error = E>,
    {
        let inner = if let Some(i) = self.inner {
            Some(V::try_from(i)?)
        } else {
            None
        };

        Ok(PackageSpecOrExtended {
            spec: self.spec,
            inner,
        })
    }

    pub fn convert<V>(self) -> PackageSpecOrExtended<V>
    where
        V: From<T>,
    {
        PackageSpecOrExtended {
            spec: self.spec,
            inner: self.inner.map(V::from),
        }
    }
}

impl<'de, T> toml_span::Deserialize<'de> for PackageSpecOrExtended<T>
where
    T: toml_span::Deserialize<'de>,
{
    fn deserialize(value: &mut Value<'de>) -> Result<Self, DeserError> {
        let spec = PackageSpec::deserialize(value)?;

        // If more keys exist in the table (or string) then try to deserialize
        // the rest as the "extended" portion
        let inner = if value.has_keys() {
            Some(T::deserialize(value)?)
        } else {
            None
        };

        Ok(Self { spec, inner })
    }
}
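The span arithmetic in the string form of the spec is the part that is easiest to get wrong, so here it is pulled out into a small std-only sketch. Span, SplitSpec and split_spec are hypothetical names for this example only, the version text is left unparsed rather than handed to semver, and the sketch assumes the incoming span covers exactly the bytes of the spec string.

```rust
/// Byte range into the source document (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

#[derive(Debug, PartialEq)]
struct SplitSpec<'a> {
    name: &'a str,
    name_span: Span,
    /// Raw version text, its span, and whether '@' (exact) was used
    version: Option<(&'a str, Span, bool)>,
}

/// Splits "name@version" or "name:version" and computes a sub-span for
/// each half, so errors can point at just the name or just the version.
fn split_spec(spec: &str, span: Span) -> SplitSpec<'_> {
    // Prefer '@' and fall back to ':', the same order as Ctx::from_str above
    let split = spec
        .find('@')
        .map(|i| (i, true))
        .or_else(|| spec.find(':').map(|i| (i, false)));

    match split {
        Some((i, exact)) => SplitSpec {
            name: &spec[..i],
            // The name covers everything before the separator
            name_span: Span { start: span.start, end: span.start + i },
            // The version starts one byte past the separator
            version: Some((
                &spec[i + 1..],
                Span { start: span.start + i + 1, end: span.end },
                exact,
            )),
        },
        None => SplitSpec { name: spec, name_span: span, version: None },
    }
}
```

With this split, a bad version requirement can be underlined on its own instead of flagging the whole string.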
Contributing
We welcome community contributions to this project.
Please read our Contributor Guide for more information on how to get started. Please also read our Contributor Terms before you make any contributions.
Any contribution intentionally submitted for inclusion in an Embark Studios project, shall comply with the Rust standard licensing model (MIT OR Apache 2.0) and therefore be dual licensed as described below, without any additional terms or conditions:
License
This contribution is dual licensed under EITHER OF
- Apache License, Version 2.0, (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
For clarity, "your" refers to Embark or any other licensee/user of the contribution.
Dependencies
~0–6.5MB
~40K SLoC