Text parsing for Twitter: character counting, hashtag/mention extraction

2 releases (1 stable)

1.14.7 Nov 28, 2017
0.1.0 Nov 27, 2017


MPL-2.0 license



like twitter-text, but in rust

This library is an attempt to port twitter-text to Rust. It was originally part of egg-mode, but its types are completely separate from egg-mode's, so I pulled it out into its own library. (It's also chock full of macros, so this was a bid to bring egg-mode's compile times down. >_>)

This library can count the characters in a tweet and extract "entities" such as URLs, @-mentions, and hashtags from arbitrary text.

For example, to see how many characters a given tweet takes (the two numeric arguments are the lengths of shortened http and https URLs, respectively):

use egg_mode_text::character_count;

let count = character_count("This is a test.", 23, 23);
assert_eq!(count, 15);

// URLs get replaced by a t.co URL of the given length
// This length is available from the Twitter API in `help/configuration`
let count = character_count("test.com", 23, 23);
assert_eq!(count, 23);

// Multiple URLs get shortened individually
let count =
    character_count("Test https://test.com test https://test.com test.com test", 23, 23);
assert_eq!(count, 86);
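
The arithmetic behind the last count can be sketched without the crate. The helper below is hypothetical and not part of egg-mode-text: it assumes the URL byte ranges are already known and charges each one a flat length, whereas the real function also finds the URLs itself and takes separate http and https lengths.

```rust
/// Hypothetical helper for illustration only (not the egg-mode-text API).
/// Counts the characters in `text`, charging every URL a flat `url_len`
/// instead of its real length.
///
/// `url_ranges` must hold sorted, non-overlapping byte ranges `(start, end)`
/// that fall on character boundaries; slicing panics otherwise.
fn count_with_urls(text: &str, url_ranges: &[(usize, usize)], url_len: usize) -> usize {
    let mut count = 0;
    let mut cursor = 0;

    for &(start, end) in url_ranges {
        // Plain text between the previous URL and this one counts one per character.
        count += text[cursor..start].chars().count();
        // The URL itself is charged the flat shortened length.
        count += url_len;
        cursor = end;
    }

    // Plain text after the last URL.
    count + text[cursor..].chars().count()
}
```

For the three-URL string above, the URLs sit at byte ranges (5, 21), (27, 43) and (44, 52), so `count_with_urls(text, &[(5, 21), (27, 43), (44, 52)], 23)` adds 17 plain characters to 3 × 23 and arrives at the same 86.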

To extract substrings of various "entities" used by Twitter:

use egg_mode_text::{EntityKind, entities};

let text = "sample #text with a link to twitter.com";
let mut results = entities(text).into_iter();

let entity = results.next().unwrap();
assert_eq!(entity.kind, EntityKind::Url);
assert_eq!(entity.substr(text), "twitter.com");

let entity = results.next().unwrap();
assert_eq!(entity.kind, EntityKind::Hashtag);
assert_eq!(entity.substr(text), "#text");

assert_eq!(results.next(), None);
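
To show why an entity can hand back a substring of the original text, here is a much-simplified, self-contained sketch. Everything in it (`Kind`, `Span`, `simple_entities`) is a stand-in invented for illustration, not the crate's types, and the matching rule (a `#` or `@` followed by alphanumerics or underscores) is an assumption that ignores most of what twitter-text actually checks.

```rust
/// Stand-in for an entity kind in this sketch (not the crate's EntityKind).
#[derive(Debug, PartialEq)]
enum Kind {
    Hashtag,
    Mention,
}

/// Stand-in entity: a kind plus the byte range it covers in the source text.
#[derive(Debug, PartialEq)]
struct Span {
    kind: Kind,
    start: usize,
    end: usize,
}

impl Span {
    /// Borrows the matched text from the original string; nothing is copied.
    fn substr<'a>(&self, text: &'a str) -> &'a str {
        &text[self.start..self.end]
    }
}

/// Simplified scanner: finds `#word` and `@word` that do not directly follow
/// an alphanumeric character (so "foo@example.com" yields no mention).
fn simple_entities(text: &str) -> Vec<Span> {
    let mut spans = Vec::new();
    let mut chars = text.char_indices().peekable();
    let mut prev: Option<char> = None;

    while let Some((start, c)) = chars.next() {
        let kind = match c {
            '#' => Some(Kind::Hashtag),
            '@' => Some(Kind::Mention),
            _ => None,
        };
        // A sigil only opens an entity at the start of the text or after
        // a non-alphanumeric character.
        let at_boundary = prev.map_or(true, |p| !p.is_alphanumeric());
        prev = Some(c);

        if let (Some(kind), true) = (kind, at_boundary) {
            let body_start = start + c.len_utf8();
            let mut end = body_start;
            // Consume the run of word characters that follows the sigil.
            while let Some(&(i, next)) = chars.peek() {
                if next.is_alphanumeric() || next == '_' {
                    end = i + next.len_utf8();
                    prev = Some(next);
                    chars.next();
                } else {
                    break;
                }
            }
            // A bare sigil with nothing after it is not an entity.
            if end > body_start {
                spans.push(Span { kind, start, end });
            }
        }
    }

    spans
}
```

Storing byte ranges rather than owned strings keeps each result small and lets the caller slice the original text on demand, which is the same shape as the `substr(text)` call in the example above.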

For more information, check out the documentation.

To use this crate in your own project, add the following to the [dependencies] section of your Cargo.toml:

egg-mode-text = "1.14.7"

...and add the following to your crate root (needed on the 2015 edition; the 2018 edition and later pick the crate up automatically):

extern crate egg_mode_text;


egg-mode-text is licensed under the Mozilla Public License, version 2.0. See the LICENSE file for details.

