decancer

A Rust module that removes common confusables from strings without using regular expressions.

3 stable releases

Uses new Rust 2021

1.4.1 Jul 16, 2022
1.3.3 Jul 11, 2022

MIT license

30KB
666 lines

A portable module that removes common confusables from strings without using regular expressions. Available for Rust, Node.js, Deno, and the browser.

Pros:

  • Extremely fast, no use of regex whatsoever!
  • No dependencies.
  • Simple to use: just a single function.
  • Handles code points beyond the Basic Multilingual Plane (those that need the full 32-bit range), such as emoji, as well as zalgo text.
  • While this project may not be perfect, it should cover the vast majority of confusables.

Con:

  • This project is not perfect; false positives may happen.
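
To make the regex-free idea from the list above concrete, here is a minimal sketch in Rust of a table-driven approach. It is not decancer's actual implementation: the function names replace_confusable and cure_sketch are hypothetical, and the table covers only a handful of characters chosen for illustration. It relies on str::chars yielding whole Unicode scalar values, so characters outside the Basic Multilingual Plane arrive as a single char.

```rust
/// Maps one character to its plain replacement, or drops it.
/// The table is a tiny illustrative assumption, not decancer's real data.
fn replace_confusable(c: char) -> Option<char> {
    match c {
        // Combining diacritical marks (the building blocks of zalgo text): drop them.
        '\u{0300}'..='\u{036F}' => None,
        'ⓡ' => Some('r'),
        '𝔂' => Some('y'),
        '𝔽' => Some('f'),
        '𝕌' => Some('u'),
        'Ň' | 'ℕ' => Some('n'),
        'ţ' | '𝓣' => Some('t'),
        '乇' => Some('e'),
        '𝕏' => Some('x'),
        // Everything else passes through, with ASCII letters lowercased.
        other => Some(other.to_ascii_lowercase()),
    }
}

/// Cures a string with a single pass over its characters, no regex involved.
fn cure_sketch(input: &str) -> String {
    input.chars().filter_map(replace_confusable).collect()
}
```

With this table, cure_sketch("vEⓡ𝔂 𝔽𝕌Ňℕy ţ乇𝕏𝓣") yields "very funny text". Any confusable missing from the table passes through unchanged, which is why a real table has to be far larger than this one.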

installation

Rust

In your Cargo.toml:

[dependencies]
decancer = "1.4.1"

Node.js

In your shell:

$ npm install decancer

In your code:

const decancer = require('decancer');

Deno

In your code:

import init from "https://deno.land/x/decancer@v1.4.1/mod.ts";

const decancer = await init();

Browser

In your code:

import init from "https://cdn.jsdelivr.net/gh/null8626/decancer@v1.4.1/decancer.min.js";

const decancer = await init();

examples

NOTE: cured output will ALWAYS be in lowercase.

JavaScript

const noCancer = decancer('vEⓡ𝔂 𝔽𝕌Ňℕy ţ乇𝕏𝓣');

console.log(noCancer); // 'very funny text'

Rust

use decancer::Decancer;

fn main() {
  let instance = Decancer::new();
  let output = instance.cure("vEⓡ𝔂 𝔽𝕌Ňℕy ţ乇𝕏𝓣");

  assert_eq!(output, String::from("very funny text"));
}

If you want to check whether the decancered string contains a certain keyword, I recommend using the contains function instead of a plain substring check, since mistranslations can happen (e.g. mistaking the number 0 for the letter O).

JavaScript

const noCancer = decancer(someString);

if (decancer.contains(noCancer, 'no-no-word')) console.log('LANGUAGE!!!');

Rust

use decancer::Decancer;

fn main() {
  let instance = Decancer::new();
  let output = instance.cure("vEⓡ𝔂 𝔽𝕌Ňℕy ţ乇𝕏𝓣");
  
  if instance.contains(&output, "funny") {
    println!("i found the funny");
  }
}
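
To show why a mistranslation-tolerant check helps, here is a minimal sketch in Rust of one way such a check could work. This is an assumption for illustration only, not how decancer's contains is implemented: the names fold_ambiguous, fold, and contains_tolerant are hypothetical, and the groups of look-alike characters are picked arbitrarily.

```rust
/// Folds characters that are easy to mistranslate into one representative.
/// These groups are illustrative assumptions, not decancer's actual rules.
fn fold_ambiguous(c: char) -> char {
    match c {
        '0' => 'o',
        '1' | 'i' => 'l',
        '5' => 's',
        other => other,
    }
}

/// Lowercases the text, then folds every ambiguous character.
fn fold(s: &str) -> String {
    s.to_lowercase().chars().map(fold_ambiguous).collect()
}

/// Substring check performed on the folded forms of both strings.
fn contains_tolerant(haystack: &str, needle: &str) -> bool {
    fold(haystack).contains(fold(needle).as_str())
}
```

Here contains_tolerant("n0 way", "no") is true even though the haystack holds a zero rather than the letter O. The trade-off is that folding makes more strings look alike, so a check like this accepts more false positives than an exact comparison.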

Web app example

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Decancerer!!! (tm)</title>
    <style>
      textarea {
        font-size: 30px;
      }
      
      #cure {
        font-size: 20px;
        padding: 5px 30px;
      }
    </style>
  </head>
  <body>
    <h3>Input cancerous text here:</h3>
    <textarea rows="10" cols="30"></textarea>
    <br />
    <button id="cure" onclick="cure()">cure!</button>
    <script type="module">
      import init from "https://cdn.jsdelivr.net/gh/null8626/decancer@v1.4.1/decancer.min.js";
      
      const decancer = await init();
      
      window.cure = function () {
        const textarea = document.querySelector("textarea");
        
        if (!textarea.value.length) {
          return alert("There's no text!!!");
        }
        
        textarea.value = decancer(textarea.value);
      }
    </script>
  </body>
</html>

contributions

All contributions are welcome. Feel free to fork the project on GitHub! <3

If you want to add, remove, modify, or view the list of supported confusables, you can clone the GitHub repository and edit the list directly with Node.js, either through a script or from the REPL:

const reader = await import('./contrib/index.mjs');
const data = reader.default('./core/bin/confusables.bin');

// do something with data...

data.save('./core/bin/confusables.bin');

special thanks

These are the primary resources that made this project possible.
