3 stable releases

| Version | Released |
|---|---|
| 1.0.2 | Oct 14, 2025 |
| 1.0.1 | Oct 10, 2025 |
| 1.0.0 | Oct 8, 2025 |
Webpage Quality Analyzer
High-performance webpage quality analyzer with 115 comprehensive metrics. Analyze web pages for SEO, content quality, technical standards, accessibility, and more - all in milliseconds.
Features
- 115 Comprehensive Metrics (92 HTML-based + 23 network-based) across 7 major categories (Content, SEO, Technical, Accessibility, and more)
- 8 Built-in Profiles optimized for different page types (news, blog, product, portfolio, etc.)
- Multi-Platform Support: Native Rust, WebAssembly (browser/Node.js), C++ FFI
- High Performance: 100+ pages/second batch processing with parallel analysis
- Advanced Customization: Metric weights, thresholds, penalties, bonuses, and field selectors
- Profile-Aware Scoring: Phase 3-6 implementation with category-based weighted scoring
- Output Optimization: Field selection with up to 98.8% size reduction
- Production Ready: Battle-tested, 40+ test files, extensive documentation
Installation
Add this to your Cargo.toml:
[dependencies]
webpage_quality_analyzer = "1.0"
Or use cargo:
cargo add webpage_quality_analyzer
Quick Start
Level 1: Simple Usage
use webpage_quality_analyzer::{analyze, analyze_with_profile};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Analyze with default settings
    let report = analyze("https://example.com", None).await?;
    println!("Score: {}/100", report.score);
    println!("Quality: {}", report.verdict);
    println!("Word Count: {}", report.metrics.content_metrics.word_count);

    // Analyze with a specific profile
    let news_report = analyze_with_profile(
        "https://example.com",
        None,
        "news",
    ).await?;

    Ok(())
}
Level 2: Builder Pattern
use webpage_quality_analyzer::Analyzer;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build a custom analyzer
    let analyzer = Analyzer::builder()
        .with_profile_name("blog")?
        .with_metric_weight("word_count", 1.5)?
        .disable_metric("grammar_score")?
        .with_timeout_secs(30)?
        .build()?;

    let report = analyzer.run("https://example.com", None).await?;
    println!("Custom analysis score: {}", report.score);
    Ok(())
}
Level 3: Advanced Configuration
use webpage_quality_analyzer::{from_config_file, analyze_batch_high_performance};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load from a YAML/JSON/TOML config file
    let analyzer = from_config_file("config.yaml")?;
    let report = analyzer.run("https://example.com", None).await?;

    // High-performance batch processing
    let urls = vec![
        "https://site1.com",
        "https://site2.com",
        "https://site3.com",
    ];
    let reports = analyze_batch_high_performance(
        &urls,
        None,         // HTML (None = fetch from URLs)
        50,           // Max concurrent requests
        Some("news"), // Profile name
    ).await?;
    for report in reports {
        println!("{}: {}/100", report.url, report.score);
    }
    Ok(())
}
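The Level 3 example loads settings with from_config_file, but this README does not spell out the file schema. The sketch below is only a guess at what such a config could look like, assembled from the customization options described here (profile name, metric weights, timeout, disabled metrics); every key name is hypothetical, so check the crate documentation for the real field names:

```yaml
# Hypothetical config.yaml -- all key names are illustrative guesses,
# not the crate's documented schema.
profile: news
timeout_secs: 30
metric_weights:
  word_count: 1.5
  readability_score: 2.0
disabled_metrics:
  - grammar_score
```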
Analyzing HTML Directly
use webpage_quality_analyzer::analyze;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let html = r#"
    <!DOCTYPE html>
    <html lang="en">
    <head>
        <title>Sample Page</title>
        <meta name="description" content="A sample page">
    </head>
    <body>
        <h1>Welcome</h1>
        <p>This is a test page with some content.</p>
    </body>
    </html>
    "#;

    let report = analyze("https://example.com", Some(html.to_string())).await?;
    println!("HTML analysis score: {}", report.score);
    Ok(())
}
Metrics Categories
All 115 metrics (92 HTML-based + 23 network-based):
Major Categories (7 total)
- Content (11 metrics) - Word count, readability (Flesch-Kincaid), text quality, content density
- SEO (9 metrics) - Meta tags, Open Graph, structured data, canonical URLs
- Technical (6 metrics) - HTML size, scripts, styles, validation
- Semantic (4 metrics) - Heading hierarchy, heading length, heading distribution
- Accessibility (7 metrics) - WCAG compliance, ARIA labels, contrast, alt text
- Network (23 metrics) - Performance (LCP, FCP), Security (HTTPS, CSP), Analytics
- Miscellaneous (55 metrics) - Links (8), Media (8), Forms (6), Structure (5), UX (5), Mobile (4), Branding (4), Structured Data (4), Business (3), Authority (3), Error (3), Internationalization (2)
Metric Distribution
- 92 metrics (80%) - HTML-only, no network required (WASM-compatible)
- 23 metrics (20%) - Network-required (when fetching URLs, server-side only)
See: Complete metrics breakdown
Available Profiles
Choose the right profile for your page type (8 built-in profiles):
| Profile | Best For | Content Weight | Key Focus |
|---|---|---|---|
| content_article | Long-form articles | 80% | Word count, structure, comprehensiveness |
| blog | Blog posts | 75% | Content quality, engagement, readability |
| news | News articles | 40% | Content freshness, readability, SEO (30%) |
| general | Any webpage | 35% | Balanced scoring across all categories |
| homepage | Landing pages | 25% | Navigation, structure, balanced (25% each) |
| product | Product pages | 20% | Media (35%), SEO (25%), product details |
| portfolio | Creative showcases | 15% | Media (50%), visual content |
| login_page | Authentication | 10% | Technical (50%), accessibility (20%), security |
Profile Customization: Each profile includes:
- Category weights (Content, SEO, Technical, Semantic, Accessibility)
- Content expectations (word count, headings, images)
- Metric overrides (custom weights and thresholds)
- Penalties (severe, moderate, light)
- Bonuses (excellence, achievement, synergy)
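To make the category-weight idea concrete, here is a simplified, self-contained sketch of weighted category scoring in plain Rust. It is not the crate's implementation (the real scoring also applies thresholds, penalties, and bonuses), and weighted_score is a made-up name:

```rust
/// Combine per-category scores (0-100) into an overall score using a
/// profile's category weights. Illustrative only -- the crate's real
/// scoring also applies thresholds, penalties, and bonuses.
fn weighted_score(scores: &[(&str, f64)], weights: &[(&str, f64)]) -> f64 {
    let total: f64 = weights.iter().map(|(_, w)| w).sum();
    if total == 0.0 {
        return 0.0;
    }
    let sum: f64 = scores
        .iter()
        .filter_map(|(cat, s)| {
            // Only categories with a configured weight contribute.
            weights
                .iter()
                .find(|(c, _)| c == cat)
                .map(|(_, w)| s * w)
        })
        .sum();
    sum / total
}
```

For example, with Content weighted 0.75 and SEO 0.25 (roughly a blog-style profile), category scores of 80 and 60 combine to an overall 75.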
Feature Flags
Control optional features via Cargo features:
[dependencies]
webpage_quality_analyzer = { version = "1.0", features = ["async", "linkcheck", "nlp"] }
Available features:
- async (default) - Async runtime with tokio + reqwest
- readability (default) - Mozilla Readability content extraction
- linkcheck - External link validation
- nlp - Language detection and Unicode segmentation
- grammar - Grammar checking (via nlprule)
- wasm - WebAssembly bindings (mutually exclusive with async)
- ffi - C FFI for C++ integration
- cli - Command-line tool binary
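Since wasm is mutually exclusive with the default async feature, a WebAssembly build needs default features disabled. A Cargo.toml sketch (the feature name comes from the list above; default-features = false is standard Cargo syntax):

```toml
[dependencies]
webpage_quality_analyzer = { version = "1.0", default-features = false, features = ["wasm"] }
```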
Multi-Platform Support
WebAssembly (Browser/Node.js)
# Build for npm
wasm-pack build --target bundler --no-default-features --features wasm
# Use in JavaScript/TypeScript
npm install @webpage-quality-analyzer/core
import { WasmAnalyzer } from '@webpage-quality-analyzer/core';
const analyzer = new WasmAnalyzer();
const report = await analyzer.analyze('<html>...</html>');
console.log(`Score: ${report.score}/100`);
C++ Integration
#include "webpage_quality_analyzer.hpp"
CAnalyzer* analyzer = wqa_analyzer_new();
CReport* report = wqa_analyze(analyzer, "https://example.com", nullptr);
double score = wqa_report_get_score(report);
Command-Line Tool
# Download binary from releases
wqa analyze https://example.com
wqa batch urls.txt --parallel 10
wqa profiles # List available profiles
Customization
Custom Metric Weights
let analyzer = Analyzer::builder()
    .with_profile_name("blog")?
    .with_metric_weight("word_count", 1.5)?        // Increase importance
    .with_metric_weight("readability_score", 2.0)? // Double weight
    .build()?;
Custom Thresholds
let analyzer = Analyzer::builder()
    .with_profile_name("blog")?
    .set_metric_threshold(
        "word_count",
        100.0,  // min
        800.0,  // optimal_min
        2000.0, // optimal_max
        5000.0, // max
    )?
    .build()?;
Custom Penalties & Bonuses
use webpage_quality_analyzer::{GlobalPenalty, PenaltyTrigger, PenaltyType};

let analyzer = Analyzer::builder()
    .with_profile_name("news")?
    .add_penalty(GlobalPenalty {
        trigger: PenaltyTrigger::MetricBelow {
            metric: "word_count".to_string(),
            threshold: 500.0,
        },
        penalty: PenaltyType::FixedPoints { points: 10.0 },
        description: "Content too short".to_string(),
    })?
    .add_bonus_above("readability_fk", 80.0, 5.0, "Highly readable")?
    .build()?;
Disable Metrics
let analyzer = Analyzer::builder()
    .with_profile_name("general")?
    .disable_metric("grammar_score")?
    .disable_metric("language_detection")?
    .build()?;
Output Customization (Phase 6)
// Full report (default)
let report = analyzer.run(url, html).await?;

// Compact JSON (20-30% size reduction)
let compact_json = analyzer.run_compact(url, html).await?;

// Minimal output (98.8% size reduction)
let minimal = analyzer.run_with_fields(
    url,
    html,
    vec!["score", "verdict", "url"],
).await?;

// Advanced field selection
use webpage_quality_analyzer::FieldSelector;

let selector = FieldSelector::builder()
    .include_sections(vec!["metrics"])
    .exclude_section("processed_document")
    .build();
let custom = analyzer.run_with_selector(url, html, &selector).await?;
Performance
Analysis Speed:
- Single page (HTML-only): <100ms (typical), ~200ms (large docs)
- Single page (with network): ~300-500ms
- Batch processing: 180+ pages/second (HTML-only), 50+ pages/second (with network)
- Memory: Linear scaling, stable across repeated analyses
- Thread-safe: Fully concurrent with Arc<Semaphore> control
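The bounded concurrency that caps batch analysis at N in-flight requests can be sketched with std threads alone. The crate itself uses tokio with Arc<Semaphore>; this chunk-based version is only a simplified illustration of the idea, and bounded_map is a made-up name:

```rust
use std::thread;

/// Run `f` over `items` with at most `max_concurrent` worker threads
/// alive at once, preserving input order. Chunk-based sketch of the
/// bounded-concurrency idea; real async pipelines overlap work more
/// tightly than chunking does.
fn bounded_map<T, R, F>(items: &[T], max_concurrent: usize, f: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    let f_ref = &f;
    let mut results = Vec::with_capacity(items.len());
    for chunk in items.chunks(max_concurrent.max(1)) {
        // Scoped threads let workers borrow `chunk` and `f` safely.
        let batch: Vec<R> = thread::scope(|s| {
            let handles: Vec<_> = chunk
                .iter()
                .map(|item| s.spawn(move || f_ref(item)))
                .collect();
            handles.into_iter().map(|h| h.join().unwrap()).collect()
        });
        results.extend(batch);
    }
    results
}
```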
Output Optimization:
- Full report: 30-50 KB (pretty), 12-18 KB (compact)
- Minimal output: 500 bytes (98.8% reduction)
- Custom fields: 300 bytes (3 fields)
// High-performance batch processing
use webpage_quality_analyzer::analyze_batch_high_performance;

let urls = vec![/* ... 100 URLs ... */];
let json_results = analyze_batch_high_performance(
    &urls,
    None,         // Fetch HTML from URLs
    50,           // Max 50 concurrent requests
    Some("news"), // Profile
).await?;
Documentation
Testing
cargo test # Run all tests
cargo test --features linkcheck # With network features
cargo bench # Run benchmarks
License
Dual licensed under MIT OR Apache-2.0. You can choose either license.
Contributing
Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests.
Related Packages
- NPM: @webpage-quality-analyzer/core - JavaScript/TypeScript (WASM)
- CLI: Download binaries for Linux/Windows/macOS
- C++: Pre-compiled libraries with headers
- Python: Coming soon (PyO3 bindings)
Why Choose This Analyzer?
- Comprehensive: 115 metrics across 7 major categories covering all aspects of webpage quality
- Fast: Rust-powered performance, 180+ pages/sec batch processing
- Flexible: 8 profiles + full customization of weights, thresholds, penalties, bonuses
- Multi-Platform: Works everywhere - Native Rust, WASM (browser/Node.js), C++ FFI
- Production-Ready: 40+ test files and extensive documentation
- Modern: profile-aware scoring, output optimization, field selectors
- Optimized: DOM caching, streaming serialization, 98.8% output size reduction
Example Report
{
  "score": 7.5,
  "verdict": "Very Poor",
  "url": "https://example.com",
  "metrics": {
    "content_metrics": {
      "word_count": 10,
      "paragraph_count": 1,
      "avg_sentence_length": 7.5,
      "readability_flesch_kincaid": 68.2
    },
    "technical_metrics": {
      "title_length": 14,
      "has_meta_description": true,
      "html_size_bytes": 12320
    },
    "seo_metrics": {
      "has_og_tags": true,
      "has_schema_org": true,
      "canonical_url_present": true
    }
  },
  "phase3_scoring": {
    "category_scores": {
      "Content": 2.3,
      "SEO": 68.5,
      "Technical": 45.0
    }
  }
}
Made with ❤️ in Rust | Version 1.0.0 | October 2025