2 unstable releases

0.2.0 Aug 19, 2023
0.1.0 Aug 6, 2023

MIT license

normalized-hasher

Create cross-platform hashes of text files.

This is the binary crate. If you're looking for the library crate instead, go to normalized-hash.

Motivation

Hashes or checksums are a great means of validating the contents of files. You record the hash of a file, distribute the file together with the hash, and anyone can run the hasher again to verify that the file has not changed since you first created the hash. Every change, however small, also changes the hash, even a change you cannot see.

In my job, we unfortunately ran into this situation a couple of times. The workflow is as follows: we write code and generate a hash from it. Both are inserted into a specification document. Then we copy and paste the code to a customer's system and run the hasher again to verify that the code is still the same as in the specification. From time to time, however, we got different hashes. After some searching, we found the cause: one coworker did not save their files with UNIX line endings (a single LF) like the rest of us, but with Windows line endings (CR followed by LF). At first glance the files seemed identical, but after enabling the display of control characters, we could clearly see the differences at the end of every line. When the code is copied to the customer system, the line endings are automatically converted to UNIX style, so the hash differs from the one we generated on our systems. This is an embarrassing situation, because it involves a huge amount of paperwork to request a change to the already finalized specification document.

To overcome this problem, I created this program: a file hasher that converts line endings to UNIX style on the fly when generating the hash. So, no matter how the file was created, the hash is the same.
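
The idea can be illustrated with a minimal Rust sketch. It is not the code of this crate: the function names `normalized_hash` and `raw_hash` are made up for illustration, and `DefaultHasher` from the standard library is only a stand-in, since the hash algorithm the tool actually uses is not described here.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hash `text` after rewriting every line ending to a single LF.
/// Hypothetical helper; `DefaultHasher` is a stand-in, not the tool's algorithm.
fn normalized_hash(text: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    // `str::lines` splits on LF and on CRLF, so both styles yield the same lines.
    for line in text.lines() {
        hasher.write(line.as_bytes());
        hasher.write(b"\n");
    }
    hasher.finish()
}

/// Hash the bytes exactly as they are, without any normalization.
fn raw_hash(text: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(text.as_bytes());
    hasher.finish()
}

fn main() {
    let unix = "fn main() {}\n// done\n";
    let windows = "fn main() {}\r\n// done\r\n";

    // The raw bytes differ, so the plain hashes differ as well.
    assert_ne!(raw_hash(unix), raw_hash(windows));
    // After normalization, both files produce the same hash.
    assert_eq!(normalized_hash(unix), normalized_hash(windows));
    println!("normalized hashes match: {:x}", normalized_hash(unix));
}
```

Note that this sketch only handles LF and CRLF endings; a lone CR is left untouched by `str::lines`.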

Installation

normalized-hasher can be installed easily through Cargo via crates.io:

cargo install --locked normalized-hasher

Please note that the --locked flag is necessary here to get exactly the same dependency versions as when the application was tagged and tested. Without it, you might get more recent versions of the dependencies, at the risk of unexpected behavior if a dependency changed its functionality. The application might even fail to build if the public API of a dependency changed too much.

Alternatively, pre-built binaries can be downloaded from the GitHub releases page.

Usage

Usage: normalized-hasher [OPTIONS] <FILE_IN> [FILE_OUT]

Arguments:
  <FILE_IN>
          File to be hashed

  [FILE_OUT]
          Optional file path to write normalized input into

Options:
      --eol <EOL>
          End-of-line sequence, will be appended to each normalized line for hashing
          
          [default: "\n"]

      --ignore-whitespaces
          Ignore all whitespaces
          
          This will remove all whitespaces from the input file when generating the hash.

      --no-eof
          Skip last end-of-line on end-of-file
          
          With this flag, no trailing EOL will be appended at the end of the file.

  -h, --help
          Print help (see a summary with '-h')

  -V, --version
          Print version

Flags

  • --eol

    With the --eol flag you can change the end-of-line sequence that will be appended to each normalized line to generate the hash. This can be useful if you explicitly want CRLF endings, for example.

    Please note that you need to escape control characters properly in your shell. For Bash, you can type:

    normalized-hasher --eol $'\r\n' input.txt output.txt
    
  • --ignore-whitespaces

    In some extreme cases, you might want to ignore all whitespace in a file. With the --ignore-whitespaces flag, all whitespace is removed before the hash is generated.

  • --no-eof

    With the --no-eof flag you can avoid appending the EOL sequence at the end of the file. This is for use cases where such a trailing EOL is not desirable, as in Windows files. In contrast to UNIX files, which usually end with a final LF, Windows files do not usually end with an additional CRLF.
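
To make the interplay of the three flags more concrete, here is a rough Rust sketch of what the normalization step might look like. The function `normalize` and its parameters are hypothetical and not the API of this crate or of normalized-hash. It also assumes that the EOL sequence is still appended when whitespace is ignored, which the help text does not spell out.

```rust
/// Hypothetical helper mirroring the three flags; not the crate's actual API.
/// `eol` corresponds to --eol, `ignore_whitespace` to --ignore-whitespaces,
/// and `no_eof` to --no-eof.
fn normalize(input: &str, eol: &str, ignore_whitespace: bool, no_eof: bool) -> String {
    // `str::lines` strips both LF and CRLF endings from each line.
    let lines: Vec<String> = input
        .lines()
        .map(|line| {
            if ignore_whitespace {
                // Assumption: whitespace inside the line is dropped, the EOL is kept.
                line.chars().filter(|c| !c.is_whitespace()).collect::<String>()
            } else {
                line.to_string()
            }
        })
        .collect();

    // Joining places `eol` between lines only, not after the last one.
    let mut out = lines.join(eol);
    // Without --no-eof, the last line also receives a trailing EOL.
    if !no_eof && !lines.is_empty() {
        out.push_str(eol);
    }
    out
}

fn main() {
    let input = "let x = 1;\r\nlet y = 2;\r\n";

    // Default options: LF after every line.
    assert_eq!(normalize(input, "\n", false, false), "let x = 1;\nlet y = 2;\n");
    // --eol $'\r\n' --no-eof: CRLF between lines, nothing after the last one.
    assert_eq!(normalize(input, "\r\n", false, true), "let x = 1;\r\nlet y = 2;");
    // --ignore-whitespaces: spaces inside the lines are removed.
    assert_eq!(normalize(input, "\n", true, false), "letx=1;\nlety=2;\n");
}
```

The normalized string would then be fed to the hasher and, if an output path is given, written to that file.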

Examples

Simple example with default options, without writing an output file:

normalized-hasher input.txt

More complex example, with writing output:

normalized-hasher --eol $'\r\n' --no-eof input.txt output.txt
