#domain #page #information #front #host #com #language

bin+lib domain_info

Fetch information about a domain, including front page stats and technologies used

1 unstable release

0.1.1 Jan 11, 2020
0.1.0 Jan 9, 2020

#324 in Text editors

MIT/Apache

97KB
782 lines

domain_info

An early-stage crate and tool to fetch information about a domain, primarily by looking at the information on the front page of the domain.

From the front page, it gets the load time, language, word count, image/form/script counts and it uses the Wappalizer project rules to identify technilogies used on the front page.

It also does a reverse dns lookup on the host and attempts to determine the host company/platform (E.g GoDaddy, Bluehost, AWS)

Additionally it gets the mail server hosts and whois information about the domain.

domain_info google.com
{
  "domain": "google.com",
  "dns_info": {
    "ip": "172.217.6.238",
    "other_ips": []
  },
  "host_info": {
    "host": "lga25s55-in-f238.1e100.net",
    "host_tld": "1e100.net",
    "platform": "Google"
  },
  "front_page_info": {
    "status_code": "200 OK",
    "load_time": {
      "secs": 0,
      "nanos": 217272393
    },
    "word_count": 100,
    "content_length": 47090,
    "techs": [
      {
        "category": "Web Servers",
        "name": "Google Web Server"
      },
      {
        "category": "JavaScript Frameworks",
        "name": "ExtJS"
      },
      {
        "category": "JavaScript Libraries",
        "name": "List.js"
      }
    ],
    "page_content": "",
    "page_text": "",
    "language": "",
    "iframe_count": 0,
    "image_count": 1,
    "form_count": 1,
    "script_count": 8
  },
  "mx_info": {
    "servers": [
      "alt2.aspmx.l.google.com",
      "alt3.aspmx.l.google.com",
      "aspmx.l.google.com",
      "alt1.aspmx.l.google.com",
      "alt4.aspmx.l.google.com"
    ]
  }
}```

```sh
cat domains.txt | domain_info | jq

Status

An initial implementation is there and it performs adequatily for small lists of domains. More work and testing needs to be done when running against large lists of domains.

TODO

[] Better domain validation [] Executable will silently hang if there is no argument and no piped input. I need to find a mechanism in Rust to check if there is any input for stdin. [] Extract Wappalyzer into its own crate [] Extract the reverse dns host company/brand into its own crate

Notes

The Wappalyzer-based functionalty is somewhat limited as we are not running the rules from within a headless browser, but only against the initially-returned html from the main page.

Dependencies

~28MB
~521K SLoC