5 releases
Version | Released |
---|---|
0.1.5 (new) | May 9, 2025 |
0.1.4 | May 9, 2025 |
0.1.3 | May 8, 2025 |
0.1.2 | May 8, 2025 |
0.1.1 | May 8, 2025 |
yt-transcript-rs
yt-transcript-rs is a Rust library for fetching and working with YouTube video transcripts. It allows you to retrieve transcripts in various languages, list available transcripts for a video, and fetch video details.
This project is heavily inspired by the Python module youtube-transcript-api originally developed by Jonas Depoix.
Table of Contents
- Features
- Installation
- Usage
- Requirements
- Advanced Usage
- Error Handling
- License
- Contributing
- Acknowledgments
Features
- Fetch transcripts from YouTube videos in various languages
- List all available transcripts for a video
- Retrieve translations of transcripts
- Get detailed information about YouTube videos
- Access video microformat data including available countries and embed information
- Retrieve streaming formats and quality options for videos
- Fetch all video information in a single request for optimal performance
- Support for proxy configuration and cookie authentication
Installation
Add yt-transcript-rs to your Cargo.toml:
cargo add yt-transcript-rs
Or manually add it to your Cargo.toml:
[dependencies]
yt-transcript-rs = "0.1.0" # Replace with the latest version
Usage
Fetch a transcript
use anyhow::Result;
use yt_transcript_rs::api::YouTubeTranscriptApi;
/// This example demonstrates how to fetch a transcript from a YouTube video.
///
/// It shows:
/// 1. Creating a YouTubeTranscriptApi instance
/// 2. Fetching a transcript for a video in a specific language
/// 3. Displaying the transcript content
#[tokio::main]
async fn main() -> Result<()> {
// Initialize the YouTubeTranscriptApi
// This creates a new instance without proxy or cookie authentication
let api = YouTubeTranscriptApi::new(None, None, None)?;
// Ted Talk video ID
let video_id = "5MuIMqhT8DM";
// Language preference (English)
let languages = &["en"];
// Don't preserve formatting (remove line breaks, etc.)
let preserve_formatting = false;
// Fetch the transcript
println!("Fetching transcript for video ID: {}", video_id);
match api.fetch_transcript(video_id, languages, preserve_formatting).await {
Ok(transcript) => {
println!("Successfully fetched transcript!");
println!("Video ID: {}", transcript.video_id);
println!(
"Language: {} ({})",
transcript.language, transcript.language_code
);
println!("Is auto-generated: {}", transcript.is_generated);
println!("Number of snippets: {}", transcript.snippets.len());
println!("\nTranscript content:");
// Display the first 5 snippets
for snippet in transcript.snippets.iter().take(5) {
println!(
"[{:.1}-{:.1}s] {}",
snippet.start,
snippet.start + snippet.duration,
snippet.text
);
}
println!("... (truncated)");
}
Err(e) => {
println!("Failed to fetch transcript: {:?}", e);
}
}
Ok(())
}
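The loop above prints raw second offsets. As a minimal sketch, assuming snippets expose `text`, `start`, and `duration` as in that example, the helper below renders `MM:SS` timestamps instead. The `Snippet` struct, `format_timestamp`, and `render_lines` are illustrative names defined here, not part of yt-transcript-rs, and the `f64` field types are an assumption.

```rust
/// Hypothetical stand-in for the library's snippet type; only the three
/// fields used in the example above are mirrored, with assumed types.
pub struct Snippet {
    pub text: String,
    pub start: f64,
    pub duration: f64,
}

/// Formats an offset in seconds as `MM:SS` (minutes are not capped at 59).
pub fn format_timestamp(seconds: f64) -> String {
    // Clamp negatives to zero and drop the fractional part.
    let total = seconds.max(0.0) as u64;
    format!("{:02}:{:02}", total / 60, total % 60)
}

/// Renders one line per snippet, e.g. `[01:15] some text`.
pub fn render_lines(snippets: &[Snippet]) -> Vec<String> {
    snippets
        .iter()
        .map(|s| format!("[{}] {}", format_timestamp(s.start), s.text.trim()))
        .collect()
}
```

With the real library, the same formatting function could be applied to `snippet.start` inside the loop shown earlier.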
List available transcripts
use anyhow::Result;
use yt_transcript_rs::api::YouTubeTranscriptApi;
/// This example demonstrates how to list all available transcripts for a YouTube video.
///
/// It shows:
/// 1. Creating a YouTubeTranscriptApi instance
/// 2. Listing all available transcripts
/// 3. Displaying information about each transcript, including whether it's translatable
#[tokio::main]
async fn main() -> Result<()> {
// Initialize the YouTubeTranscriptApi
let api = YouTubeTranscriptApi::new(None, None, None)?;
// Ted Talk video ID (known to have multiple language transcripts)
let video_id = "arj7oStGLkU";
// List available transcripts
println!("Listing available transcripts for video ID: {}", video_id);
match api.list_transcripts(video_id).await {
Ok(transcript_list) => {
println!("Successfully retrieved transcript list!");
println!("Video ID: {}", transcript_list.video_id);
// Count available transcripts
let mut count = 0;
let mut translatable_count = 0;
println!("\nAvailable transcripts:");
for transcript in &transcript_list {
count += 1;
let translatable = if transcript.is_translatable() {
translatable_count += 1;
"[translatable]"
} else {
""
};
println!(
"{}: {} ({}) {}",
count, transcript.language, transcript.language_code, translatable
);
// If this transcript is translatable, show available translation languages
if transcript.is_translatable() && count == 1 {
// Just show for the first one
println!(" Available translations:");
for (i, lang) in transcript.translation_languages.iter().take(5).enumerate() {
println!(" {}: {} ({})", i + 1, lang.language, lang.language_code);
}
if transcript.translation_languages.len() > 5 {
println!(
" ... and {} more",
transcript.translation_languages.len() - 5
);
}
}
}
println!("\nTotal transcripts: {}", count);
println!("Translatable transcripts: {}", translatable_count);
}
Err(e) => {
println!("Failed to list transcripts: {:?}", e);
}
}
Ok(())
}
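Once the available language codes are known, choosing one from a preference list is a small lookup. The sketch below assumes the preference list is treated as a priority order (as in the Python youtube-transcript-api); `pick_language` is a hypothetical helper written for illustration and is not a function of this crate.

```rust
/// Returns the first code in `preferred` that also appears in `available`.
/// Matching is exact, so "en" does not match "en-GB".
pub fn pick_language<'a>(available: &[&'a str], preferred: &[&str]) -> Option<&'a str> {
    preferred
        .iter()
        .find_map(|want| available.iter().copied().find(|have| have == want))
}
```

The order of `preferred` decides the winner, not the order in which the video lists its transcripts.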
Fetch video details
use anyhow::Result;
use yt_transcript_rs::api::YouTubeTranscriptApi;
/// This example demonstrates how to fetch video details using the YouTube Transcript API.
///
/// It shows:
/// 1. Creating a YouTubeTranscriptApi instance
/// 2. Fetching video details for a given video ID
/// 3. Displaying the video information including title, author, view count, etc.
#[tokio::main]
async fn main() -> Result<()> {
println!("YouTube Video Details Example");
println!("------------------------------");
// Initialize the YouTubeTranscriptApi
let api = YouTubeTranscriptApi::new(None, None, None)?;
// Ted Talk video ID
let video_id = "arj7oStGLkU";
println!("Fetching video details for: {}", video_id);
match api.fetch_video_details(video_id).await {
Ok(details) => {
println!("\nVideo Details:");
println!("-------------");
println!("Video ID: {}", details.video_id);
println!("Title: {}", details.title);
println!("Author: {}", details.author);
println!("Channel ID: {}", details.channel_id);
println!("View Count: {}", details.view_count);
println!("Length: {} seconds", details.length_seconds);
println!("Is Live Content: {}", details.is_live_content);
// Print keywords if available
if let Some(keywords) = &details.keywords {
println!("\nKeywords:");
for (i, keyword) in keywords.iter().enumerate().take(10) {
println!(" {}: {}", i + 1, keyword);
}
if keywords.len() > 10 {
println!(" ... and {} more", keywords.len() - 10);
}
}
// Print thumbnail information
println!("\nThumbnails: {} available", details.thumbnails.len());
for (i, thumb) in details.thumbnails.iter().enumerate() {
println!(
" {}: {}x{} - {}",
i + 1,
thumb.width,
thumb.height,
thumb.url
);
}
// Print a truncated description
println!("\nDescription:");
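// Truncate by characters; slicing by byte index can panic inside a multi-byte UTF-8 sequence
let description = if details.short_description.chars().count() > 300 {
let truncated: String = details.short_description.chars().take(300).collect();
format!("{}...", truncated)
} else {
details.short_description.clone()
};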
println!("{}", description);
}
Err(e) => {
println!("Failed to fetch video details: {:?}", e);
}
}
Ok(())
}
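Truncating a description for display is safest when done on character boundaries, because video descriptions often contain non-ASCII text. The following is a small standalone sketch of such a helper; `truncate_chars` is an illustrative name and not part of the library.

```rust
/// Truncates `text` to at most `max_chars` characters, appending "..." only
/// when something was cut. Cuts on a `char` boundary, so it cannot panic
/// on multi-byte UTF-8 input.
pub fn truncate_chars(text: &str, max_chars: usize) -> String {
    match text.char_indices().nth(max_chars) {
        // `idx` is the byte offset of the first character to drop.
        Some((idx, _)) => format!("{}...", &text[..idx]),
        None => text.to_string(),
    }
}
```

This counts Unicode scalar values rather than grapheme clusters, which is usually adequate for a preview line.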
Fetch microformat data
use anyhow::Result;
use yt_transcript_rs::api::YouTubeTranscriptApi;
#[tokio::main]
async fn main() -> Result<()> {
// Initialize the YouTubeTranscriptApi
let api = YouTubeTranscriptApi::new(None, None, None)?;
// Ted Talk video ID
let video_id = "arj7oStGLkU";
println!("Fetching microformat data for: {}", video_id);
match api.fetch_microformat(video_id).await {
Ok(microformat) => {
println!("\nMicroformat Data:");
println!("-----------------");
// Print video title and channel info
if let Some(title) = &microformat.title {
println!("Title: {}", title);
}
if let Some(channel) = &microformat.owner_channel_name {
println!("Channel: {}", channel);
}
// Print video stats
if let Some(views) = &microformat.view_count {
println!("View Count: {}", views);
}
if let Some(likes) = &microformat.like_count {
println!("Like Count: {}", likes);
}
// Print video status and category
if let Some(category) = &microformat.category {
println!("Category: {}", category);
}
if let Some(is_unlisted) = microformat.is_unlisted {
println!("Is Unlisted: {}", is_unlisted);
}
if let Some(is_family_safe) = microformat.is_family_safe {
println!("Is Family Safe: {}", is_family_safe);
}
// Print countries where video is available
if let Some(countries) = &microformat.available_countries {
println!("Available in {} countries", countries.len());
}
// Print embed information
if let Some(embed) = &microformat.embed {
if let Some(iframe_url) = &embed.iframe_url {
println!("Embed URL: {}", iframe_url);
}
}
}
Err(e) => {
println!("Failed to fetch microformat data: {:?}", e);
}
}
Ok(())
}
Fetch streaming data
use anyhow::Result;
use yt_transcript_rs::api::YouTubeTranscriptApi;
#[tokio::main]
async fn main() -> Result<()> {
println!("YouTube Streaming Data Example");
println!("------------------------------");
// Initialize the YouTubeTranscriptApi
let api = YouTubeTranscriptApi::new(None, None, None)?;
// Ted Talk video ID
let video_id = "arj7oStGLkU";
println!("Fetching streaming data for: {}", video_id);
match api.fetch_streaming_data(video_id).await {
Ok(streaming_data) => {
println!("\nStreaming Data:");
println!("--------------");
println!("Expires in: {} seconds", streaming_data.expires_in_seconds);
// Display basic format counts
println!("\nCombined Formats (video+audio): {}", streaming_data.formats.len());
println!("Adaptive Formats: {}", streaming_data.adaptive_formats.len());
// Example of accessing video format information
if let Some(format) = streaming_data.formats.first() {
println!("\nSample format information:");
println!(" ITAG: {}", format.itag);
if let (Some(w), Some(h)) = (format.width, format.height) {
println!(" Resolution: {}x{}", w, h);
}
println!(" Bitrate: {} bps", format.bitrate);
println!(" MIME type: {}", format.mime_type);
}
// Count video and audio format types
let video_count = streaming_data.adaptive_formats
.iter()
.filter(|f| f.mime_type.starts_with("video/"))
.count();
let audio_count = streaming_data.adaptive_formats
.iter()
.filter(|f| f.mime_type.starts_with("audio/"))
.count();
println!("\nAdaptive format breakdown:");
println!(" Video formats: {}", video_count);
println!(" Audio formats: {}", audio_count);
}
Err(e) => {
println!("Failed to fetch streaming data: {:?}", e);
}
}
Ok(())
}
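The MIME-prefix filter used above extends naturally to picking a single format. As a hedged sketch, the code below selects the highest-bitrate entry for a given prefix. `Format` is a hypothetical stand-in whose field names follow the example, but whose concrete types are assumptions, and `best_by_bitrate` is not a library function.

```rust
/// Hypothetical stand-in for a streaming format entry; field names follow
/// the example above, the concrete types are assumed.
pub struct Format {
    pub itag: u32,
    pub mime_type: String,
    pub bitrate: u64,
    pub height: Option<u32>,
}

/// Returns the highest-bitrate format whose MIME type starts with `prefix`
/// (for example "audio/" or "video/"), or `None` if nothing matches.
pub fn best_by_bitrate<'a>(formats: &'a [Format], prefix: &str) -> Option<&'a Format> {
    formats
        .iter()
        .filter(|f| f.mime_type.starts_with(prefix))
        .max_by_key(|f| f.bitrate)
}
```

Bitrate is only one possible ranking; sorting video formats by `height` first would be an equally reasonable choice.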
Fetch all video information at once
use anyhow::Result;
use yt_transcript_rs::api::YouTubeTranscriptApi;
#[tokio::main]
async fn main() -> Result<()> {
println!("YouTube Video Infos (All-in-One) Example");
println!("----------------------------------------");
// Initialize the YouTubeTranscriptApi
let api = YouTubeTranscriptApi::new(None, None, None)?;
// Ted Talk video ID
let video_id = "arj7oStGLkU";
println!("Fetching all video information in a single request...");
match api.fetch_video_infos(video_id).await {
Ok(infos) => {
// Access video details
println!("\nVideo Details:");
println!("Title: {}", infos.video_details.title);
println!("Author: {}", infos.video_details.author);
println!("Length: {} seconds", infos.video_details.length_seconds);
// Access microformat data
if let Some(category) = &infos.microformat.category {
println!("Category: {}", category);
}
if let Some(countries) = &infos.microformat.available_countries {
println!("Available in {} countries", countries.len());
}
// Access streaming data
println!("\nStreaming Options:");
println!("Video formats: {}", infos.streaming_data.formats.len());
println!("Adaptive formats: {}", infos.streaming_data.adaptive_formats.len());
// Find highest resolution
let highest_res = infos.streaming_data.adaptive_formats
.iter()
.filter_map(|f| f.height)
.max()
.unwrap_or(0);
println!("Highest resolution: {}p", highest_res);
// Access transcript information
let transcript_count = infos.transcript_list.transcripts().count();
println!("\nAvailable transcripts: {}", transcript_count);
println!("\nAll data retrieved in a single network request!");
}
Err(e) => {
println!("Failed to fetch video information: {:?}", e);
}
}
Ok(())
}
Requirements
- Rust 1.56 or higher
- tokio for async execution
Advanced Usage
Using Proxies
You can configure the API to use a proxy server:
use anyhow::Result;
use yt_transcript_rs::api::YouTubeTranscriptApi;
use yt_transcript_rs::proxies::ProxyConfig;
#[tokio::main]
async fn main() -> Result<()> {
// Create a proxy configuration
let proxy = Box::new(ProxyConfig {
url: "http://your-proxy-server:8080".to_string(),
username: Some("username".to_string()),
password: Some("password".to_string()),
});
// Initialize the API with the proxy (second argument; the first is the cookie path)
let api = YouTubeTranscriptApi::new(None, Some(proxy), None)?;
// Use the API as normal
let video_id = "5MuIMqhT8DM";
let languages = &["en"];
let transcript = api.fetch_transcript(video_id, languages, false).await?;
println!("Fetched transcript via proxy!");
Ok(())
}
Using Cookie Authentication
For videos that require authentication:
use anyhow::Result;
use yt_transcript_rs::api::YouTubeTranscriptApi;
use std::path::Path;
#[tokio::main]
async fn main() -> Result<()> {
// Provide path to cookies file exported from browser
let cookie_path = Path::new("path/to/cookies.txt");
// Initialize the API with cookies
let api = YouTubeTranscriptApi::new(Some(cookie_path.as_ref()), None, None)?;
// Fetch transcript for a video that requires authentication
let video_id = "private_video_id";
let languages = &["en"];
let transcript = api.fetch_transcript(video_id, languages, false).await?;
println!("Successfully authenticated and fetched transcript!");
Ok(())
}
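The README does not state which cookie file format is expected. Assuming it is the Netscape `cookies.txt` format that browser export extensions commonly produce (seven tab-separated fields per line), the sketch below shows how one line can be checked before handing the file to the API. `Cookie` and `parse_cookie_line` are illustrative names, not part of yt-transcript-rs.

```rust
/// One entry of a Netscape-format cookies.txt file (assumed format).
#[derive(Debug, PartialEq)]
pub struct Cookie {
    pub domain: String,
    pub path: String,
    pub secure: bool,
    pub name: String,
    pub value: String,
}

/// Parses a single line; returns `None` for comments, blank lines, and
/// lines that do not have exactly seven tab-separated fields.
pub fn parse_cookie_line(line: &str) -> Option<Cookie> {
    let line = line.trim_end_matches(|c: char| c == '\r' || c == '\n');
    if line.trim().is_empty() || line.starts_with('#') {
        return None;
    }
    let fields: Vec<&str> = line.split('\t').collect();
    if fields.len() != 7 {
        return None;
    }
    // Field order: domain, subdomain flag, path, secure, expiry, name, value.
    Some(Cookie {
        domain: fields[0].to_string(),
        path: fields[2].to_string(),
        secure: fields[3].eq_ignore_ascii_case("TRUE"),
        name: fields[5].to_string(),
        value: fields[6].to_string(),
    })
}
```

This sketch treats every line starting with `#` as a comment, including the `#HttpOnly_` prefix some exporters emit, so a production parser would need to handle that case.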
Serializing and Deserializing Video Information
You can serialize video information for storage or transmission between systems. The library provides full support for serde serialization and deserialization of the VideoInfos struct and related types.
use anyhow::Result;
use reqwest::Client;
use yt_transcript_rs::api::YouTubeTranscriptApi;
use yt_transcript_rs::models::VideoInfos;
#[tokio::main]
async fn main() -> Result<()> {
// Initialize the YouTubeTranscriptApi
let api = YouTubeTranscriptApi::new(None, None, None)?;
// Fetch video information
let video_id = "dQw4w9WgXcQ";
let infos = api.fetch_video_infos(video_id).await?;
// Serialize to JSON
let json = serde_json::to_string(&infos)?;
println!("Serialized data size: {} bytes", json.len());
// Save to file, send over network, etc.
std::fs::write("video_info.json", &json)?;
// Later, deserialize back from JSON
let json = std::fs::read_to_string("video_info.json")?;
let deserialized = serde_json::from_str::<VideoInfos>(&json)?;
// The deserialized object has all the same data
println!("Title: {}", deserialized.video_details.title);
// To fetch a transcript from the deserialized data, you need to provide a client
let client = Client::new();
if let Ok(transcript) = deserialized.transcript_list.find_transcript(&["en"]) {
let fetched = transcript.fetch(&client, false).await?;
println!("Transcript text: {}", fetched.text());
}
Ok(())
}
The serialization support makes it easy to:
- Cache video information to reduce YouTube API requests
- Send video data between microservices
- Store video information in databases
- Create backup systems for important videos
Note that when deserializing a Transcript, you'll need to provide a Client when calling fetch(), as HTTP clients cannot be serialized.
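The caching use case can be sketched with the standard library alone. The code below stores an already-serialized JSON string on disk, keyed by video ID, and only calls the supplied closure on a cache miss. `cache_path` and `load_or_fetch` are hypothetical helpers; in a real program the closure would wrap `fetch_video_infos` plus `serde_json::to_string`, which are left out here so the sketch has no external dependencies.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Builds the cache file path for a video ID, keeping only characters that
/// are safe in a file name.
pub fn cache_path(dir: &Path, video_id: &str) -> PathBuf {
    let safe: String = video_id
        .chars()
        .filter(|c| c.is_ascii_alphanumeric() || *c == '-' || *c == '_')
        .collect();
    dir.join(format!("{}.json", safe))
}

/// Returns the cached JSON for `video_id` if present; otherwise calls
/// `fetch`, stores its result, and returns it.
pub fn load_or_fetch<F>(dir: &Path, video_id: &str, fetch: F) -> io::Result<String>
where
    F: FnOnce() -> io::Result<String>,
{
    let path = cache_path(dir, video_id);
    if path.exists() {
        return fs::read_to_string(&path);
    }
    let json = fetch()?;
    fs::create_dir_all(dir)?;
    fs::write(&path, &json)?;
    Ok(json)
}
```

There is no expiry here; since streaming data carries an `expires_in_seconds` value, a longer-lived cache would likely need to check file age as well.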
Error Handling
The library provides specific error types for handling different failure scenarios:
use anyhow::Result;
use yt_transcript_rs::api::YouTubeTranscriptApi;
use yt_transcript_rs::errors::CouldNotRetrieveTranscriptReason;
#[tokio::main]
async fn main() -> Result<()> {
let api = YouTubeTranscriptApi::new(None, None, None)?;
let video_id = "5MuIMqhT8DM";
match api.fetch_transcript(video_id, &["en"], false).await {
Ok(transcript) => {
println!("Successfully fetched transcript with {} snippets", transcript.snippets.len());
Ok(())
},
Err(e) => {
match e.reason {
Some(CouldNotRetrieveTranscriptReason::NoTranscriptFound) => {
println!("No transcript found for this video");
},
Some(CouldNotRetrieveTranscriptReason::TranslationLanguageNotAvailable) => {
println!("The requested translation language is not available");
},
Some(CouldNotRetrieveTranscriptReason::VideoUnavailable) => {
println!("The video is unavailable or does not exist");
},
// Handle other specific errors
_ => println!("Other error: {:?}", e),
}
Err(e.into())
}
}
}
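When the same reasons are handled in several places, it can help to move the mapping into small functions. The sketch below uses a hypothetical `Reason` enum that mirrors only the three variants matched above; the library's real enum has other variants and may carry data, so this is an illustration of the pattern rather than of the crate's API.

```rust
/// Hypothetical stand-in mirroring the reasons matched in the example above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Reason {
    NoTranscriptFound,
    TranslationLanguageNotAvailable,
    VideoUnavailable,
    Other,
}

/// Maps a failure reason to a user-facing message.
pub fn user_message(reason: Option<Reason>) -> &'static str {
    match reason {
        Some(Reason::NoTranscriptFound) => "No transcript found for this video",
        Some(Reason::TranslationLanguageNotAvailable) => {
            "The requested translation language is not available"
        }
        Some(Reason::VideoUnavailable) => "The video is unavailable or does not exist",
        Some(Reason::Other) | None => "Could not retrieve the transcript",
    }
}

/// Whether retrying with a different language list could plausibly succeed.
pub fn worth_retrying_with_other_language(reason: Option<Reason>) -> bool {
    matches!(
        reason,
        Some(Reason::NoTranscriptFound) | Some(Reason::TranslationLanguageNotAvailable)
    )
}
```

Keeping the retry decision separate from the message makes it easier to fall back to another language without duplicating the match.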
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are welcome! Here's how you can contribute:
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Development Setup
# Clone the repository
git clone https://github.com/akinsella/yt-transcript-rs.git
cd yt-transcript-rs
# Build the project
cargo build
# Run tests
cargo test
# Run Clippy with strict settings for code quality
cargo clippy --all-targets --features ci --all-features -- -D warnings
# Run Clippy in fix mode to automatically apply suggested fixes
cargo clippy --all-targets --features ci --all-features --fix -- -D warnings
# Format code according to Rust style guidelines
cargo fmt
# Format and overwrite files with the formatting changes
cargo fmt --all
Setting up cargo-husky for Git Hooks
Follow these steps to ensure cargo-husky is properly installed and configured:
1. Install cargo-husky as a dev dependency:
cargo add --dev cargo-husky
2. Configure cargo-husky in Cargo.toml. Add the following to your Cargo.toml file:
[dev-dependencies]
cargo-husky = { version = "1", features = ["precommit-hook", "run-cargo-fmt", "run-cargo-clippy", "run-cargo-check"] }
3. Verify the installation. After adding cargo-husky, run cargo build once to ensure the git hooks are installed:
cargo build
4. Verify the hooks were created. Check if the pre-commit hook file was created:
ls -la .git/hooks/pre-commit
You should see a pre-commit file, and it should be executable.
5. Test the pre-commit hook. Make a small change to any file, then try to commit it:
# Make a change
echo "// Test comment" >> src/lib.rs
# Add the change
git add src/lib.rs
# Try to commit
git commit -m "Test commit"
If the hook is working correctly, it should run:
- cargo fmt to format the code
- cargo check to verify compilation
- cargo clippy to check for lints
6. Troubleshooting. If the hooks aren't running:
- Make sure the hook file is executable: chmod +x .git/hooks/pre-commit
- Try rebuilding the project: cargo clean && cargo build
- Check the content of the pre-commit file to ensure it's correct
7. Customizing hook behavior. You can customize the hook behavior by adding a .cargo-husky/hooks/pre-commit file with your custom script. cargo-husky will use this file instead of generating its own.
8. Skipping hooks when needed. In rare cases when you need to bypass the hooks, you can use:
git commit -m "Your message" --no-verify
However, this should be used sparingly and only in exceptional circumstances.
By following these steps, cargo-husky will enforce code quality standards on every commit, helping maintain a clean and consistent codebase.
Acknowledgments
- Jonas Depoix for the original youtube-transcript-api Python library
- All contributors who have helped improve this library