1 unstable release

0.1.0 Feb 10, 2025

MIT/Apache

jsonish

jsonish is a resilient JSON parsing library written in Rust that goes beyond the strict JSON specification. It’s designed to parse not only valid JSON but also "JSON‐like" input that may include common syntax errors, multiple JSON objects, or JSON embedded in markdown code blocks.


Features

  • Standard JSON Parsing:
    Uses serde_json under the hood to parse valid JSON strings quickly and reliably.

  • Error-Tolerant Parsing:
    When given imperfect JSON (e.g. missing commas, unquoted keys, unclosed arrays or objects), jsonish will attempt to fix and recover the input rather than immediately failing.

  • Markdown Code Block Extraction:
    Supports extracting and parsing JSON from markdown code blocks (e.g. fenced with triple backticks). This is especially useful when working with documents or logs that embed JSON in markdown.

  • Multi-Object Handling:
    Can detect and extract multiple JSON objects from a single input, returning them as a combined result.

  • Custom Value Representation:
    The parsed output is provided as a custom Value enum that includes variants for:

    • Primitives: Strings, Numbers, Booleans, and Null.
    • Complex Structures: Arrays and Objects.
    • Special Cases:
      • Markdown: Represents a code block with a tag and its parsed inner value.
      • FixedJson: Wraps JSON that was fixed during parsing, along with a list of applied fixes.
      • AnyOf: Holds multiple possible parsed values (useful when multiple parsing strategies succeed).
  • Serde Conversion:
    Easily convert jsonish’s custom Value to a standard serde_json::Value using the provided jsonish_to_serde function.

  • Configurable Parsing Options:
    Fine-tune the parsing behavior via the ParseOptions struct, allowing you to enable or disable specific parsing strategies (e.g. markdown parsing, fixing errors, or treating input as a plain string).
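
To make the variant list above more concrete, here is a minimal sketch of what such an enum could look like and how a caller might peel off the wrapper variants. It is a hypothetical mirror written for illustration, not the crate's actual definition: the payload shapes, the field order, and the helper name unwrap_wrappers are all assumptions.

```rust
/// Hypothetical mirror of the variants described above. The real
/// `jsonish::Value` may use different payload types and field names.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    String(String),
    Number(f64),
    Boolean(bool),
    Null,
    Array(Vec<Value>),
    Object(Vec<(String, Value)>),
    /// A fenced code block: its tag (e.g. "json") and the parsed contents.
    Markdown(String, Box<Value>),
    /// A repaired value plus a description of each fix that was applied.
    FixedJson(Box<Value>, Vec<String>),
    /// Several candidate parses of the same input.
    AnyOf(Vec<Value>),
}

/// Strip wrapper variants and return the plain value underneath.
/// For `AnyOf`, this sketch simply follows the first candidate;
/// an empty `AnyOf` is returned unchanged.
fn unwrap_wrappers(value: &Value) -> &Value {
    match value {
        Value::Markdown(_, inner) | Value::FixedJson(inner, _) => unwrap_wrappers(inner),
        Value::AnyOf(candidates) => match candidates.first() {
            Some(first) => unwrap_wrappers(first),
            None => value,
        },
        other => other,
    }
}
```

With this shape, a Markdown value wrapping a FixedJson value wrapping a number unwraps to the number itself. Picking the first AnyOf candidate is only one possible policy; a real caller might prefer to score the candidates instead.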


Installation

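Add the json_partial crate, which exposes the jsonish module, as a dependency in your Cargo.toml. The Usage example also needs serde (with the derive feature) and serde_json:

[dependencies]
json_partial = { git = "https://github.com/TwistingTwists/json_partial" }
serde = { version = "1", features = ["derive"] }
serde_json = "1"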

Usage

Here’s a simple example that uses jsonish to parse a JSON object embedded in free-form text, such as an LLM response, and then deserializes it into a struct:

use json_partial::jsonish::{self, parse, ParseOptions};
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Person {
    name: String,
    age: u8,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Text as an LLM might return it: prose followed by a JSON object.
    let input = r#"
    Here is your text 
    
    {
        "name": "Alice",
        "age": 30
    }
    "#;

    // Parse the JSON using default options.
    let value = parse(input, ParseOptions::default())?;
    println!("Parsed value: {:#?}", value);

    // Convert to serde_json::Value if needed.
    let serde_value = jsonish::jsonish_to_serde(&value);
    println!("Serde JSON value: {}", serde_value);

    // Deserialize into a typed struct. from_value consumes serde_value,
    // so it has to come after the println! above.
    let person: Person = serde_json::from_value(serde_value)?;
    println!("Person: {:?}", person);

    Ok(())
}

Parsing Imperfect JSON

jsonish is built to recover from common mistakes. For example:

  • Missing Commas:

    let input = r#"{ "name": "Bob" "age": 25 }"#;
    let value = parse(input, ParseOptions::default())?;
    // The parser applies fixes and still returns a valid result.
    
  • Unclosed Arrays or Objects:

    let input = r#"[1, 2, 3"#;
    let value = parse(input, ParseOptions::default())?;
    // Returns an array containing the numbers 1, 2, and 3.
    
  • JSON in Markdown Code Blocks:
    When given markdown text with fenced code blocks containing JSON, jsonish will extract and parse the JSON:

    let input = r#"
    ```json
    { "key": "value" }
    ```

    Some additional text.
    "#;

    let value = parse(input, ParseOptions::default())?;
    // The returned value will include a Markdown variant wrapping the parsed JSON.
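
For intuition about the unclosed-structure case, the sketch below shows one simple way such a repair can work: track the expected closing delimiters on a stack while skipping over string contents, then append whatever is still missing. It uses only the standard library and is not jsonish's actual fixing parser; the function name close_unclosed and the wording of the fix descriptions are hypothetical.

```rust
/// Append whatever closing delimiters are missing from `input`.
/// Returns the repaired text plus a description of each fix applied.
/// Simplification: mismatched closers (e.g. `[}`) are left untouched.
fn close_unclosed(input: &str) -> (String, Vec<String>) {
    let mut stack: Vec<char> = Vec::new(); // closers we still expect
    let mut in_string = false;
    let mut escaped = false;

    for ch in input.chars() {
        if in_string {
            // Inside a string, only an unescaped quote ends it.
            if escaped {
                escaped = false;
            } else if ch == '\\' {
                escaped = true;
            } else if ch == '"' {
                in_string = false;
            }
            continue;
        }
        match ch {
            '"' => in_string = true,
            '{' => stack.push('}'),
            '[' => stack.push(']'),
            '}' | ']' => {
                // Pop only when the closer matches the innermost opener.
                if stack.last() == Some(&ch) {
                    stack.pop();
                }
            }
            _ => {}
        }
    }

    let mut repaired = input.to_string();
    let mut fixes = Vec::new();
    if in_string {
        repaired.push('"');
        fixes.push("closed unterminated string".to_string());
    }
    // Close the innermost open structure first.
    while let Some(closer) = stack.pop() {
        repaired.push(closer);
        fixes.push(format!("appended missing '{}'", closer));
    }
    (repaired, fixes)
}
```

Returning the list of fixes alongside the repaired text mirrors the idea behind the FixedJson variant: the caller can see that the input was not valid as given and decide how much to trust the result.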
    

API Overview

  • jsonish::parse
    Main entry point for parsing a JSON (or JSON‐like) string. It applies a series of strategies:

    1. Attempt standard JSON parsing.
    2. If that fails and markdown JSON is allowed, try to extract and parse markdown code blocks.
    3. If enabled, attempt to locate multiple JSON objects.
    4. Apply automatic fixes for common syntax errors.
    5. Fall back to treating the input as a raw string if all else fails.
  • jsonish::Value
    A custom enum that represents the parsed JSON data with variants for primitive types, objects, arrays, markdown code blocks, fixed JSON (with applied fixes), and a collection of multiple possible parsed values.

  • jsonish::ParseOptions
    A configurable struct that controls which parsing strategies are enabled. It allows you to adjust settings like whether to allow markdown JSON, auto-fixing, multi-object parsing, and more.

  • jsonish::to_serde::jsonish_to_serde
    Converts a jsonish::Value into a serde_json::Value, making it easy to work with other libraries that use serde.
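
The ordered fallback described for jsonish::parse can be sketched as a chain of guarded attempts. The code below is an illustration under stated assumptions, not the crate's implementation: Options, Outcome, first_fenced_block, and parse_with_fallback are hypothetical names, the flag names are invented and may not match the real ParseOptions fields, the strict parser and the fixer are passed in as closures so the sketch needs only the standard library, and the multi-object step is omitted for brevity.

```rust
/// Hypothetical option flags; the real `ParseOptions` fields may differ.
#[derive(Debug, Clone, Copy)]
struct Options {
    allow_markdown: bool,
    allow_fixes: bool,
    allow_as_string: bool,
}

/// Which strategy produced the result.
#[derive(Debug, PartialEq)]
enum Outcome {
    Strict(String),
    Markdown { tag: String, body: String },
    Fixed(String),
    RawString(String),
}

/// Return the tag and body of the first fenced code block, if any.
fn first_fenced_block(input: &str) -> Option<(String, String)> {
    let fence = "```";
    let start = input.find(fence)? + fence.len();
    let rest = &input[start..];
    // The tag is whatever follows the opening fence on the same line.
    let (tag, after_tag) = rest.split_once('\n')?;
    let end = after_tag.find(fence)?;
    Some((tag.trim().to_string(), after_tag[..end].trim().to_string()))
}

/// Try each enabled strategy in order and return the first that succeeds.
/// `strict` stands in for a real JSON validator or parser;
/// `fix` stands in for an error-repair step.
fn parse_with_fallback(
    input: &str,
    opts: Options,
    strict: &dyn Fn(&str) -> bool,
    fix: &dyn Fn(&str) -> Option<String>,
) -> Option<Outcome> {
    // 1. Standard parsing.
    if strict(input) {
        return Some(Outcome::Strict(input.to_string()));
    }
    // 2. Fenced markdown code block.
    if opts.allow_markdown {
        if let Some((tag, body)) = first_fenced_block(input) {
            if strict(&body) {
                return Some(Outcome::Markdown { tag, body });
            }
        }
    }
    // 3. Automatic fixes, accepted only if the result then parses.
    if opts.allow_fixes {
        if let Some(fixed) = fix(input) {
            if strict(&fixed) {
                return Some(Outcome::Fixed(fixed));
            }
        }
    }
    // 4. Last resort: keep the input as a plain string.
    if opts.allow_as_string {
        return Some(Outcome::RawString(input.to_string()));
    }
    None
}
```

Putting the cheapest and strictest strategy first means well-formed input never pays for the recovery paths, and the Outcome tag records which strategy produced the result, much as the Markdown and FixedJson variants do in the description above.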


Testing

jsonish comes with a comprehensive suite of tests that verify its ability to handle:

  • Valid JSON objects
  • JSON with missing commas or unclosed structures
  • Nested JSON structures
  • JSON embedded within markdown
  • Multiple JSON objects within a single input

You can run the tests with:

cargo test

Contributing

Contributions, bug reports, and feature requests are welcome! Feel free to open issues or submit pull requests on GitHub.


License

This project is licensed under the MIT License. See the LICENSE file for details.


Acknowledgements

Much of the code is adapted from the BAML repository.

Thanks to the awesome folks at BAML!

Happy parsing!