#byte #byte-string #json #utf-8 #console #bencode #structure

bin+lib bencode2json

A Bencoded to JSON converter library and console app with no intermediary in-memory structure

1 unstable release

0.1.0 Nov 1, 2024

#615 in Parser implementations

LGPL-3.0

160KB
3.5K SLoC

Torrust Bencode2Json

Testing

A lib and console command to convert from bencoded data to JSON format.

Output is similar to: https://github.com/Chocobo1/bencode_online. When a bencoded string (byte string) contains only valid UTF-8 chars, those chars will print to the output. If the string contains non valid UTF-8 chars, them the string will be printed in hexadecimal. For example:

Bencoded string (with 2 bytes):

2:\xFF\xFE

JSON string:

<hex>fffe</hex>

More info: https://github.com/torrust/teps/discussions/15

Console

Run the binary with input and output file:

cargo run -- -i ./tests/fixtures/sample.bencode -o output.json

Run the binary with stdin and stdout (UTF-8):

echo "4:spam" | cargo run
"spam"

Run the binary with stdin and stdout (non UTF-8):

printf "d3:bar2:\xFF\xFEe" | cargo run
{"bar":"<hex>fffe</hex>"}
printf "d2:\xFF\xFE3:bare" | cargo run
{"<hex>fffe</hex>":"bar"}

NOTICE: We need two escape the two bytes FF and FE with \x inside the string.

More examples:

cat ./tests/fixtures/sample.bencode | cargo run
["spam"]

More examples with invalid Bencode:

printf "i42" | cargo run
Error: Unexpected end of input parsing integer; read context: input pos 3, latest input bytes dump: [105, 52, 50] (UTF-8 string: `i42`); write context: output pos 2, latest output bytes dump: [52, 50] (UTF-8 string: `42`)
printf "3:ab" | cargo run
Error: Unexpected end of input parsing string value; read context: input pos 4, latest input bytes dump: [51, 58, 97, 98] (UTF-8 string: `3:ab`); write context: output pos 0, latest output bytes dump: [] (UTF-8 string: ``)
echo "i00e" | cargo run
Error: Leading zeros in integers are not allowed, for example b'i00e'; read context: byte `48` (char: `0`), input pos 3, latest input bytes dump: [105, 48, 48] (UTF-8 string: `i00`); write context: byte `48` (char: `0`), output pos 2, latest output bytes dump: [48, 48] (UTF-8 string: `00`)

Generating pretty JSON with jq:

echo "d3:foold3:bari42eeee" | cargo run | jq
{
  "foo": [
    {
      "bar": 42
    }
  ]
}

You can install the binary with:

cargo install bencode2json

Or by using cargo-binstall:

cargo binstall bencode2json

Library

You can install the library with:

cargo add bencode2json

There two ways of using the library:

  • With high-level parser wrappers.
  • With the low-level parsers.

Example using the high-level parser wrappers:

use bencode2json::{try_bencode_to_json};

let result = try_bencode_to_json(b"d4:spam4:eggse").unwrap();

assert_eq!(result, r#"{"spam":"eggs"}"#);

Example using the low-level parser:

use bencode2json::parsers::{BencodeParser};

let mut output = String::new();

let mut parser = BencodeParser::new(&b"4:spam"[..]);

parser
  .write_str(&mut output)
  .expect("Bencode to JSON conversion failed");

println!("{output}"); // It prints the JSON string: "spam"

More examples.

Test

Run unit and integration tests:

cargo test

We have included a copy of another C implementation "be2json.c". You can execute it with the following:

gcc ./contrib/be2json.c -o be2json
chmod +x ./be2json
echo "4:spam" | ./be2json

You can generate the coverage report with:

cargo cov

Performance

In terms of memory usage this implementation consumes at least the size of the biggest bencoded string. The string parser keeps all the string bytes in memory until it parses the whole string, in order to convert it to UTF-8, when it's possible.

The library also wraps the input and output streams in a BufReader and BufWriter because it can be excessively inefficient to work directly with something that implements Read or Write.

TODO

  • More examples of using the library.
  • Counter for number of items in a list for debugging and errors.
  • Fuzz testing: Generate random valid bencoded values.
  • Install tracing crate. Add verbose mode that enables debugging.
  • Option to check if the final JSON it's valid at the end of the process.
  • Benchmarking for this implementation and the original C implementation.
  • Optimize string parser. We can stop trying to convert the string to UTF-8 when we find a non valid UTF-8 char.

Alternatives

Bencode online:

Bencode:

Credits

This implementation is basically a port to Rust from https://gist.github.com/camilleoudot/840929699392b3d25afbec25d850c94a with some changes like:

  • It does not use magic numbers (explicit enum for states).
  • It prints non UTF-8 string in hexadecimal.

The idea of using hexadecimal format <hex>ff</hex> for non UTF-8 string came from the bencode online repo by @Chocobo1.

We also want to thank @da2ce7 for his feedback and review that has improved this project significantly.

License

Copyright (c) 2024 The Torrust Developers.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, version 3.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with this program. If not, see https://www.gnu.org/licenses/.

Some files include explicit copyright notices and/or license notices.

Legacy Exception

For prosperity, versions of Torrust Bencode2Json that are older than five years are automatically granted the MIT-0 license in addition to the existing LGPL-3.0-only license.

Dependencies

~1.8–2.9MB
~55K SLoC