egg-mode-text

Text parsing for Twitter: character counting, hashtag/mention extraction

4 releases (stable)

Uses old Rust 2015

1.15.1 Jan 31, 2023
1.15.0 Jul 2, 2022
1.14.7 Nov 28, 2017
0.1.0 Nov 27, 2017

#764 in Text processing


Used in sharexin

MPL-2.0 license

86KB
1K SLoC

egg-mode-text

like twitter-text, but in Rust

This library is an attempt to port twitter-text to Rust. It was originally part of egg-mode, but it's totally distinct from egg-mode type-wise, so I pulled it out into its own library. (It's also chock full of macros, so this was a bid to bring egg-mode's compile times down. >_>)

This library can count the characters a tweet takes up, and extract "entities" such as URLs, @-mentions, and hashtags from arbitrary text.

For example, to see how many characters a given tweet takes:

use egg_mode_text::character_count;

let count = character_count("This is a test.", 23, 23);
assert_eq!(count, 15);

// URLs get replaced by a t.co URL of the given length
//
// This length is available from the Twitter API in `help/configuration`
let count = character_count("test.com", 23, 23);
assert_eq!(count, 23);

// Multiple URLs get shortened individually
let count =
    character_count("Test https://test.com test https://test.com test.com test", 23, 23);
assert_eq!(count, 86);
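The 86 in the last assertion can be checked by hand: each of the three URLs counts as 23 no matter how long it really is, and everything else counts one per character. The sketch below reproduces that arithmetic without calling the crate. `Piece` and `weighted_len` are hypothetical names made up for this illustration, the URL boundaries are labelled by hand rather than detected, and plain `chars().count()` stands in for whatever counting the library does internally.

```rust
/// A hand-labelled piece of a tweet.
/// (Hypothetical type for illustration; not part of egg-mode-text.)
enum Piece<'a> {
    Text(&'a str),
    Url(&'a str),
}

/// Sums the length of each piece, counting every URL as `url_len`
/// regardless of its actual length.
fn weighted_len(pieces: &[Piece<'_>], url_len: usize) -> usize {
    pieces
        .iter()
        .map(|piece| match piece {
            Piece::Text(t) => t.chars().count(),
            Piece::Url(_) => url_len,
        })
        .sum()
}

fn main() {
    // "Test https://test.com test https://test.com test.com test"
    let pieces = [
        Piece::Text("Test "),
        Piece::Url("https://test.com"),
        Piece::Text(" test "),
        Piece::Url("https://test.com"),
        Piece::Text(" "),
        Piece::Url("test.com"),
        Piece::Text(" test"),
    ];

    // 5 + 23 + 6 + 23 + 1 + 23 + 5 = 86
    assert_eq!(weighted_len(&pieces, 23), 86);
}
```

The same function also matches the first two assertions above: a single 15-character `Text` piece gives 15, and a single `Url` piece gives 23.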

To extract substrings of various "entities" used by Twitter:

use egg_mode_text::{EntityKind, entities};

let text = "sample #text with a link to twitter.com";
let mut results = entities(text).into_iter();

let entity = results.next().unwrap();
assert_eq!(entity.kind, EntityKind::Url);
assert_eq!(entity.substr(text), "twitter.com");

let entity = results.next().unwrap();
assert_eq!(entity.kind, EntityKind::Hashtag);
assert_eq!(entity.substr(text), "#text");

assert_eq!(results.next(), None);
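`substr` takes the original text as an argument because an entity records where it sits in the string rather than owning a copy of it. Here is a minimal sketch of that idea, assuming byte offsets into the source. `Span` and `first_hashtag` are hypothetical stand-ins written for this example, not the crate's definitions, and the matching rule is far simpler than real hashtag parsing.

```rust
/// Hypothetical stand-in for an extracted entity: a half-open byte range.
#[derive(Debug, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

impl Span {
    /// Borrows the slice of `text` that this span covers.
    fn substr<'a>(&self, text: &'a str) -> &'a str {
        &text[self.start..self.end]
    }
}

/// Looks at the first `#` in `text` and returns the span of `#word`,
/// where a word is a run of ASCII alphanumerics. Later `#` characters
/// are not examined.
fn first_hashtag(text: &str) -> Option<Span> {
    let start = text.find('#')?;
    let body = &text[start + 1..];
    let len = body
        .find(|c: char| !c.is_ascii_alphanumeric())
        .unwrap_or(body.len());
    if len == 0 {
        return None;
    }
    Some(Span { start, end: start + 1 + len })
}

fn main() {
    let text = "sample #text with a link to twitter.com";
    let span = first_hashtag(text).unwrap();

    // '#' is at byte 7; "text" occupies bytes 8..12.
    assert_eq!(span, Span { start: 7, end: 12 });
    assert_eq!(span.substr(text), "#text");
}
```

Storing offsets instead of strings keeps the result cheap to copy and lets the caller slice, highlight, or replace the matched region in the original text.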

For more information, check out the documentation.

To use this crate in your own project, add the following to your Cargo.toml:

[dependencies]
egg-mode-text = "1.15.1"

...and, if your crate uses the 2015 edition, add the following to your crate root (later editions don't need it):

extern crate egg_mode_text;

License

egg-mode-text is licensed under the Mozilla Public License, version 2.0. See the LICENSE file for details.

Dependencies

~2.7–4MB
~93K SLoC