tform

A crate to format plain text into well-structured Markdown or HTML

2 releases

0.1.1 Jan 21, 2025
0.1.0 Jan 21, 2025

#552 in Text processing

190 downloads per month

MIT license

21KB
266 lines

TFORM.IO

A Rust crate that cleans and converts large, poorly formatted text into well-structured Markdown or HTML.
Designed for streaming (line-by-line) processing of up to hundreds of megabytes of text, TFORM.IO removes extra spaces, merges paragraphs, detects headings/lists/code blocks, and lets you override defaults with a simple configuration file.



Features

Streaming Support

Process very large text files or streams without loading them entirely into memory.
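
As a rough illustration of what line-by-line processing means in practice, the sketch below reads from any `BufRead` source and writes cleaned lines to any `Write` sink, holding only the current line in memory. It uses only the standard library; `collapse_spaces` and `clean_stream` are hypothetical names chosen for this example and are not part of the TFORM.IO API.

```rust
use std::io::{self, BufRead, Write};

/// Collapse runs of whitespace in one line into single spaces and trim both ends.
fn collapse_spaces(line: &str) -> String {
    line.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Read `input` line by line and write each cleaned line to `output`.
/// Only the current line is kept in memory, so input size is not a concern.
fn clean_stream<R: BufRead, W: Write>(input: R, mut output: W) -> io::Result<()> {
    for line in input.lines() {
        let line = line?;
        writeln!(output, "{}", collapse_spaces(&line))?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // A byte slice implements `BufRead`; a `BufReader` around a file or a
    // locked stdin handle could be passed in exactly the same way.
    let input = "This   is some    example text.\n- item   one\n";
    let mut out = Vec::new();
    clean_stream(input.as_bytes(), &mut out)?;
    print!("{}", String::from_utf8_lossy(&out));
    Ok(())
}
```

Because the function is generic over the reader and writer, the same code path serves small in-memory strings in tests and large files in production.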

Automatic Markdown/HTML Conversion

  • Headings (lines starting with #, ##, etc.)
  • Bullet lists (lines starting with -, +, or *)
  • Code blocks (triple backticks)
  • Paragraph separation on blank lines
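
The detection rules listed above can be pictured as a per-line classifier. The following is a minimal sketch under stated assumptions, not the crate's actual implementation: the `LineKind` enum and `classify` function are hypothetical, and the requirement of a space after the marker and the limit of six heading levels are assumptions borrowed from common Markdown practice.

```rust
/// Three backticks, written with escapes so the literal does not clash with the fence.
const FENCE: &str = "\u{60}\u{60}\u{60}";

/// Hypothetical classification of a single input line.
#[derive(Debug, PartialEq)]
enum LineKind<'a> {
    Heading { level: usize, text: &'a str },
    Bullet(&'a str),
    Fence,
    Blank,
    Text(&'a str),
}

fn classify(line: &str) -> LineKind<'_> {
    let trimmed = line.trim();
    if trimmed.is_empty() {
        return LineKind::Blank; // blank lines separate paragraphs
    }
    if trimmed.starts_with(FENCE) {
        return LineKind::Fence; // opens or closes a code block
    }
    // Heading: one to six '#' characters followed by a space (assumed rule).
    let level = trimmed.chars().take_while(|&c| c == '#').count();
    if (1..=6).contains(&level) {
        if let Some(text) = trimmed[level..].strip_prefix(' ') {
            return LineKind::Heading { level, text: text.trim() };
        }
    }
    // Bullet: '-', '+', or '*' followed by a space (assumed rule).
    for marker in ["- ", "+ ", "* "] {
        if let Some(text) = trimmed.strip_prefix(marker) {
            return LineKind::Bullet(text.trim());
        }
    }
    LineKind::Text(trimmed)
}

fn main() {
    for line in ["# Heading 1", "- item one", "", "This is some"] {
        println!("{:?}", classify(line));
    }
}
```

A formatter built on such a classifier would also need to track whether it is currently inside a code block, so that lines between two fences are passed through untouched.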

Configurable

  • Enable/disable headings, list detection, or space-trimming.
  • Define custom regex patterns (future expansion).
  • Load config from TOML or JSON files.

High Performance

Written in Rust to handle up to 512 MB of input efficiently.


Installation

Add TFORM.IO to your Cargo.toml:

[dependencies]
tform = "0.1.0"

Then run:

cargo build

Usage

Here’s a minimal example of using TFORM.IO:

use tform::{Config, Formatter};
use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    // Load the config (TOML or JSON); fall back to the defaults if loading fails
    let config = Config::from_file("tform_config.toml").unwrap_or_default();
    let formatter = Formatter::new(config);

    // Input text (could come from a file, user input, etc.)
    let input_text = r#"


# Heading 1

This   is some    example text.

- item one
- item two

fn main() { println!("Hello, world!"); }

This is some
more example text.
"#;

    // Convert to Markdown
    let markdown_output = formatter.format_to_markdown(input_text.as_bytes())?;
    println!("--- Markdown ---\n{}", markdown_output);

    // Convert to HTML
    let html_output = formatter.format_to_html(input_text.as_bytes())?;
    println!("--- HTML ---\n{}", html_output);

    Ok(())
}

Compile and run:

cargo run

Configuration

By default, TFORM.IO uses:

remove_extra_spaces = true
detect_headings = true
detect_lists = true
custom_patterns = []

You can override these by creating a TOML file such as tform_config.toml, or an equivalent JSON file. For example:

# tform_config.toml
remove_extra_spaces = true
detect_headings = false
detect_lists = true
custom_patterns = ["(?i)todo"]

Then load it:

let config = Config::from_file("tform_config.toml").unwrap_or_default();
let formatter = Formatter::new(config);
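
Note that `unwrap_or_default()` discards any load error, so a missing or malformed file silently yields the defaults. The self-contained sketch below shows that behavior and one way to surface the error instead. The `Config` struct here is a hypothetical stand-in that mirrors the documented fields and defaults, and `load` and `load_or_warn` are illustrative helpers, not the crate's API.

```rust
/// Hypothetical stand-in for the crate's `Config`; fields follow the README.
#[derive(Debug, Clone, PartialEq)]
struct Config {
    remove_extra_spaces: bool,
    detect_headings: bool,
    detect_lists: bool,
    custom_patterns: Vec<String>,
}

impl Default for Config {
    fn default() -> Self {
        Config {
            remove_extra_spaces: true,
            detect_headings: true,
            detect_lists: true,
            custom_patterns: Vec::new(),
        }
    }
}

/// Stand-in loader: this sketch always fails, as a missing file would.
fn load(path: &str) -> Result<Config, String> {
    Err(format!("cannot read {}", path))
}

/// Fall back to the defaults, but report why the fallback happened.
fn load_or_warn(path: &str) -> Config {
    match load(path) {
        Ok(config) => config,
        Err(err) => {
            eprintln!("warning: {}; using default configuration", err);
            Config::default()
        }
    }
}

fn main() {
    // Silent fallback: the error is dropped.
    let silent = load("missing.toml").unwrap_or_default();
    // Reported fallback: same result, but the cause is printed to stderr.
    let reported = load_or_warn("missing.toml");
    assert_eq!(silent, reported);
    println!("{:?}", silent);
}
```

Reporting the error is worth the extra lines when a typo in the file name would otherwise go unnoticed.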

With detect_headings = false, a line such as "# Some Text" is treated as normal paragraph text instead of a heading.
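
To make the effect of this switch concrete, here is a small sketch of how a single line might be rendered to HTML with heading detection on and off. `render_line` is a hypothetical function written for this illustration; the markup the crate really emits may differ, and HTML escaping is left out for brevity.

```rust
/// Render one line as HTML, honoring a `detect_headings` switch.
/// Assumes a heading is one to six '#' characters followed by a space.
fn render_line(line: &str, detect_headings: bool) -> String {
    let trimmed = line.trim();
    if detect_headings {
        let level = trimmed.chars().take_while(|&c| c == '#').count();
        if (1..=6).contains(&level) {
            if let Some(text) = trimmed[level..].strip_prefix(' ') {
                return format!("<h{0}>{1}</h{0}>", level, text.trim());
            }
        }
    }
    // Detection is off, or the line is not a heading: emit a paragraph.
    format!("<p>{}</p>", trimmed)
}

fn main() {
    println!("{}", render_line("# Some Text", true));  // <h1>Some Text</h1>
    println!("{}", render_line("# Some Text", false)); // <p># Some Text</p>
}
```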


Examples

TFORM.IO includes example programs under the examples/ folder. You can run them with:

Basic Formatting

cargo run --example basic_formatting

Custom Rules

cargo run --example custom_rules

Streaming

cargo run --example streaming

Each example demonstrates different aspects of TFORM.IO, like loading a config, processing large files line-by-line, or basic text transformations.


Testing

We have both unit tests (within modules) and integration tests (in tests/integration_tests.rs). Run them all:

cargo test

This validates:

  • Heading detection (HTML & Markdown)
  • List detection
  • Code block handling via triple backticks
  • Custom config usage (e.g., disabling headings)

Contributing

  1. Fork the repository and clone it locally.
  2. Create a branch for your feature or bug fix.
  3. Write tests that cover your changes.
  4. Submit a Pull Request on GitHub with a clear description of your work. We welcome all suggestions and improvements!

License

This project is licensed under the MIT License. You’re free to use, modify, and distribute this software under its terms.

TFORM.IO converts jumbled text into clean Markdown or HTML, which makes it a good fit for documentation, PDF generation, or any structured text workflow. If you have questions or feedback, feel free to open an issue or submit a pull request.

Dependencies

~3–5MB
~91K SLoC