#twitter #serde #json

twitter-archive

Serde structs, deserialize, and serialize definitions for Twitter archived data

1 unstable release

0.0.1 (Apr 17, 2024)

AGPL-3.0

475KB
1.5K SLoC

Twitter Archive





Requirements

This repository requires the Rust language/compiler to build from source.

As of the last update to this ReadMe file, the recommended method of installing Rust is via the installer script...

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Quick Start

This repository is a Rust library; define it as a dependency within a project's Cargo.toml file...

cargo add twitter-archive

Cargo.toml (snip)

[dependencies]
twitter-archive = "0.0.1"

Check the Cargo Book's Specifying Dependencies chapter for details about defining dependencies.

Then include it within a source file via a use statement...

use twitter_archive;

Usage

Twitter archive parsing example that prints every tweet's creation date and full text:

// Assumes the zip and serde_json crates are also listed as dependencies
use std::fs;
use std::io::Read; // provides read_to_string for the ZipFile reader
use twitter_archive::structs::tweets;
use zip::read::ZipArchive;

fn main() {
    let input_file = "path/to/twitter.zip";

    let file_descriptor = fs::File::open(input_file).expect("Unable to open input file");
    let mut zip_archive = ZipArchive::new(file_descriptor).expect("Unable to read ZIP archive");
    let mut zip_file = zip_archive.by_name("data/tweets.js").expect("Missing data/tweets.js");

    // tweets.js is JavaScript: strip the leading assignment to get plain JSON
    let mut buff = String::new();
    zip_file.read_to_string(&mut buff).expect("Unable to read data/tweets.js");
    let json = buff.replacen("window.YTD.tweets.part0 = ", "", 1);

    let data: Vec<tweets::TweetObject> = serde_json::from_str(&json).expect("Unable to parse");

    for (index, object) in data.iter().enumerate() {
        /* Do stuff with each Tweet */
        println!("Index: {index}");
        println!("Created at: {}", object.tweet.created_at);
        println!("vvv Content\n{}\n^^^ Content", object.tweet.full_text);
    }
}

Check the examples/ directory for more examples!
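The usage example strips the window.YTD.tweets.part0 = prefix with replacen, which quietly returns the input unchanged when the prefix differs, so the failure only surfaces later as a JSON parse error. Below is a minimal std-only sketch of a stricter alternative, assuming every archive data file begins with a window.YTD.<name> = assignment. The strip_js_assignment helper is hypothetical and not part of this crate's API.

```rust
/// Strips the JavaScript assignment that wraps the JSON payload in archive
/// .js files, e.g. `window.YTD.tweets.part0 = [ ... ]`.
///
/// Hypothetical helper: returns None instead of silently passing the input
/// through when no `window.YTD.` style assignment is present.
fn strip_js_assignment(source: &str) -> Option<&str> {
    // Split at the first '='; everything before it is the variable name.
    let (lhs, rhs) = source.split_once('=')?;
    // Only accept a `window.YTD.` left-hand side (assumed archive format).
    if !lhs.trim_start().starts_with("window.YTD.") {
        return None;
    }
    Some(rhs.trim_start())
}

fn main() {
    let raw = "window.YTD.tweets.part0 = [{\"tweet\": {}}]";
    let json = strip_js_assignment(raw).expect("missing window.YTD assignment");
    println!("{json}");
}
```

The returned slice could then be handed to serde_json::from_str in place of the replacen result; splitting at the first = keeps any = characters inside the JSON payload intact.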


Notes

This repository may not be feature complete or fully functional; Pull Requests that add features or fix bugs are certainly welcome.


Tips for application authors

The data/manifest.js file, parseable via src/structs/manifest.rs, defines pointers to files and strings that may be helpful for pre-parsing/stripping other files within the archive's directory/file structure.
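As an illustration of using those manifest strings, here is a small std-only sketch. It assumes each manifest entry pairs a data file name with a globalName value such as YTD.tweets.part0, and that the data file then starts with window.<globalName> = . Both the field semantics and the helper names (prefix_for, strip_with_manifest) are assumptions for illustration, not this crate's API.

```rust
/// Builds the assignment prefix for a data file from its manifest globalName.
/// Assumption: prefixes have the shape `window.<globalName> = `.
fn prefix_for(global_name: &str) -> String {
    format!("window.{global_name} = ")
}

/// Removes the prefix derived from `global_name`, or returns None if the
/// file does not start with it.
fn strip_with_manifest<'a>(source: &'a str, global_name: &str) -> Option<&'a str> {
    source.strip_prefix(prefix_for(global_name).as_str())
}

fn main() {
    // Hypothetical (file name, global name) pair as it might be read from the manifest.
    let (file_name, global_name) = ("data/tweets.js", "YTD.tweets.part0");
    println!("{file_name}: strip {:?}", prefix_for(global_name));

    let source = "window.YTD.tweets.part0 = []";
    println!("{:?}", strip_with_manifest(source, global_name));
}
```

Deriving the prefix from the manifest, rather than hard-coding it, means files whose global names differ can be handled by the same code path.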

All accessors/key names defined by the JSON/JavaScript Twitter archive data are available in snake_case on the Rust data structures, regardless of the source's choice(s) to mix camelCase and snake_case formatting.
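The crate most likely implements this mapping with serde's rename attributes; the std-only helper below does not reproduce that, it only illustrates the naming convention (camelCase source key to snake_case Rust field). The function name to_snake_case and the sample keys are hypothetical.

```rust
/// Converts a camelCase key to snake_case, e.g. "fileName" -> "file_name".
/// Keys that are already snake_case pass through unchanged.
/// Note: runs of capitals (e.g. "URL") become one word per letter.
fn to_snake_case(key: &str) -> String {
    let mut out = String::with_capacity(key.len() + 4);
    for (i, ch) in key.chars().enumerate() {
        if ch.is_ascii_uppercase() {
            // Insert a separator before each uppercase letter except a leading one.
            if i > 0 {
                out.push('_');
            }
            out.push(ch.to_ascii_lowercase());
        } else {
            out.push(ch);
        }
    }
    out
}

fn main() {
    // Hypothetical keys mixing both conventions.
    for key in ["accountId", "created_at", "fileName"] {
        println!("{key} -> {}", to_snake_case(key));
    }
}
```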


Running tests

An individual data structure's documentation tests may be run via:

RUST_BACKTRACE=1 cargo test --doc 'structs::personalization::InferredAgeInfo'

Running examples

Examples may be run via Cargo incantations similar to:

cargo run --example search-tweets -- --help

Note: the -- separator passes the arguments that follow it to the example instead of to the Cargo sub-command.


Contributing

Options for contributing to twitter-archive and rust-utilities


Forking

⚠️ Creating fork(s), submitting contribution(s), publishing derivative work(s), etc. based on this repository will form an agreement to be bound by the use-case based licensing sub-sections.

I.e., if you choose to contribute to or use this project, you acknowledge and accept that these usage-based licensing terms will apply to any such works too.

Start by making a fork of this repository to an account that you have write permissions for.

  • Add a remote for the fork URL. The URL syntax is git@github.com:<NAME>/<REPO>.git...
cd ~/git/hub/rust-utilities/twitter-archive

git remote add fork git@github.com:<NAME>/twitter-archive.git
  • Commit your changes and push to your fork, e.g. to fix an issue...
cd ~/git/hub/rust-utilities/twitter-archive

git commit -F- <<'EOF'
:bug: Fixes #42 Issue

**Edits**

- `<SCRIPT-NAME>` script, fixes some bug reported in issue
EOF

git push fork main

Note: the -u option may be used to set fork as the default remote, e.g. git push -u fork main; however, this will also make fork the default remote for pulling! Meaning that pulling updates from origin must then be done explicitly, e.g. git pull origin main

  • Then, on GitHub, submit a Pull Request through the web UI; the URL syntax is https://github.com/<NAME>/<REPO>/pull/new/<BRANCH>

Note: to decrease the chances of your Pull Request needing modifications before being accepted, please check the dot-github repository for detailed contributing guidelines.


Sponsor

Thanks for even considering it!

Via Liberapay you may sponsor on a repeating basis.

Regardless of whether you're able to financially support projects such as twitter-archive that rust-utilities maintains, please consider sharing useful projects with others, because one of the goals of maintaining Open Source repositories is to provide value to the community.


Attribution


License

This project is licensed based on use-case.


Commercial and/or proprietary use

If a project is either commercial or (||) proprietary, then please contact the author for pricing and licensing options to make use of code and/or features from this repository.


Non-commercial and FOSS use

If a project is both non-commercial and (&&) published with a license compatible with AGPL-3.0, then it may utilize code from this repository under the following terms.

Serde structs, deserialize, and serialize definitions for Twitter archived data
Copyright (C) 2024 S0AndS0

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published
by the Free Software Foundation, version 3 of the License.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License
along with this program.  If not, see <https://www.gnu.org/licenses/>.

... For further details, review the full-length version of the AGPL-3.0 License.

Dependencies

~1.7–2.6MB
~51K SLoC