1 unstable release
Uses old Rust 2015
0.3.1 | Feb 5, 2020 |
---|
#2209 in Cryptography
120KB
2.5K
SLoC
[[enprot]]
Engyon: enprot
Enprot is a confidentiality processor for text and source code files.
Introduction and Tutorial
Engyon Protected Text (EPT) is a human-editable annotation method that allows a text format document to contain finely grained cryptographic confidentiality and integrity features.
Enprot requires a recent version of botan.
Use cargo build --release
to build the release binary target/release/enprot
or cargo run --
to build and run the debug version. Some dependencies may
require the latest compiler.
There is a simple help built-in with -h
flag:
[source,sh]
enprot$ cargo run -- -h
Finished dev [unoptimized + debuginfo] target(s) in 0.04s
Running `target/debug/enprot -h`
enprot
USAGE:
enprot [FLAGS] [OPTIONS] <FILE>...
FLAGS:
-v, --verbose Produce more verbose output
-q, --quiet Suppress unnecessary output
-h, --help Prints help information
-V, --version Prints version information
OPTIONS:
-l, --left-separator <SEP> Specify left separator in parsing
-r, --right-separator <SEP> Specify right separator in parsing
-s, --store <WORD>... Store (unencrypted) WORD segments to CAS
-f, --fetch <WORD>... Fetch (unencrypted) WORD segments to CAS
-k, --key <WORD=PASSWORD>... Specify a secret PASSWORD for WORD
-e, --encrypt <WORD>... Encrypt WORD segments
-E, --encrypt-store <WORD>... Encrypt and store WORD segments
-d, --decrypt <WORD>... Decrypt WORD segments
-c, --casdir <DIRECTORY> Directory for CAS files (default "cas" if exists, else ".")
-p, --prefix <PREFIX> Use PREFIX for output filenames
-o, --output <FILE>... Specify output file for previous input
ARGS:
<FILE>... The input file(s)
enprot$
All of the commands also have have long variants; see src/main.rs
.
A simple example is contained in sample/test.ept
:
hello, this is a test file
// <( BEGIN GEHEIM )>
Secret line 1
Secret line 2
// <( BEGIN Agent_007 )>
James Bond
// <( END Agent_007 )>
// <( END GEHEIM )>
// <( BEGIN Agent_007 )>
Super secret line 3
// <( END Agent_007 )>
As can be seen, the most elementary EPT markup is in comments of the
"host
" language (which is C or AsciiDoc in this case) and consists
of BEGIN..END segments. Each such segment has a keyword (or WORD)
associated with it. Keywords can be used inside other keywords to form
a tree-like structure.
When invoke enprot
on a syntactically correct file such as this one,
nothing happens:
[source,sh]
enprot$ ./target/debug/enprot sample/test.ept
enprot$
In fact the markup has been chosen specifically in a way that most input
files would be syntactically correct without modification. The markup
is contained between left and right separators that can be specified
with -l
and -r
flags. Here are some comment-hiding suggestions
for different languages:
|===
| Format | LEFT_SEP
| RIGHT_SEP
| Raw text format | "<("
, | ")>"
| AsciiDoc and C++ code | "// <("
| ")>"
| MarkDown, XML, HTML | "<-- <("
| ")> -->"
| (La)TeX and similar | "\n% <("
| ")>"
|===
Note that the left separator must start the line (after whitespace). The
right separator must currently also be on the same line. Adding verbosity
with -v
reveals what the system is doing:
[source,sh]
enprot$ ./target/debug/enprot sample/test.ept -v
LEFT_SEP='// <(' RIGHT_SEP=')>' casdir = 'cas'
Reading sample/test.ept
Transforming sample/test.ept
Writing sample/test.ept
enprot$
We see the default setting for left and right separators. Furthermore we see that the file is read in (parsed), transformed (which is a no-op in this case), and then written back (synthesized), overwriting the original file. This transformation behavior is controlled by a multitude of flags (as seen before).
Content Addressed Storage
In above example we see casdir = 'cas'
. This means that the default
directory for CAS (Content Addressed Storage) is cas
under current
directory. If you do not have a cas
directory set up, these work files are
written to the current directory. This is not a problem, but creates
clutter. We suggest that you do a mkdir cas
.
The operation of CAS can be demonstrated simply by using the -s
(store) flag,
which hides away (sanitizes) a part of the text file.
enprot$ ./target/debug/enprot sample/test.ept -s GEHEIM
enprot$ cat sample/test.ept
hello, this is a test file
// <( STORED GEHEIM cea67c3ef34ff899793b557e9178c1b97bbcfe9722df2f6d35d2d0c91d2c1fe4 )>
// <( BEGIN Agent_007 )>
Super secret line 3
// <( END Agent_007 )>
enprot$ ls cas
cea67c3ef34ff899793b557e9178c1b97bbcfe9722df2f6d35d2d0c91d2c1fe4
enprot$
We can see that the entire section between BEGIN GEHEIM and END GEHEIM has disappeared and has been replaced with a single STORED GEHEIM. The contents are actually stored in a file with a long hexadecimal filename. The properties of cryptographic functions guarantee that no two different contents can have the same hash. This removes much of the problem of version control as the file can be referred directly by it's content.
One could now send this file for editing, and new text could be added around
the GEHEIM sections. If the original CAS files are around, the same hidden
part can be recovered simply with a -f
(fetch) instruction:
[source,sh]
enprot$ ./target/debug/enprot sample/test.ept -f GEHEIM
Now the contents of test.ept
are exactly as they were before, and the GEHEIM
section is again contained in a BEGIN.. END enclosure.
With all parameters, multiple keywords can be joined with a comma:
[source,sh]
enprot sample/test.ept -s Agent_007,GEHEIM
enprot$ cat sample/test.ept
hello, this is a test file
// <( STORED GEHEIM cea67c3ef34ff899793b557e9178c1b97bbcfe9722df2f6d35d2d0c91d2c1fe4 )>
// <( STORED Agent_007 575d69f5b0034279bc3ef164e94287e6366e9df76729895a302a66a8817cf306 )>
enprot$
We see the the first GEHEIM is again stored under the same filename. In fact it was not even overwritten because the system checked that a file with that name already existed in the CAS directory, so there is no need.
Such determinism is a important property of the CAS. Even if you lose the CAS files related to some sanitized version of the document, you may regenerate the exactly same ones if you have the original unsanitized document.
Now the original document can be restored with
[source,sh]
enprot$ ./target/debug/enprot sample/test.ept -f Agent_007 -f notexistent,GEHEIM
You see that -f
parameter can be given multiple times. In fact it is possible
to even mix -s
and -f
statements on the same command if you want to
sanitize some keywords while unsanitizing others. However specifying both
-s
and -f
for the same keyword isn't very helpful; the keyword will
be unsanitized and resanitized on alternative runs.
Encryption and Decryption
We may encrypt sections in a way that keeps the ciphertext entirely in the
document itself. Assuming that sample/test.ept
is at it's original state:
[source,sh]
enprot$ ./target/debug/enprot sample/test.ept -e Agent_007
Password for Agent_007:
Repeat password for Agent_007:
enprot$ cat sample/test.ept
hello, this is a test file
// <( BEGIN GEHEIM )>
Secret line 1
Secret line 2
// <( ENCRYPTED Agent_007 )>
// <( DATA lEsVpN3ES6rj0sbxrDm30EgMpYCc+yKM2i2Z )>
// <( END Agent_007 )>
// <( END GEHEIM )>
// <( ENCRYPTED Agent_007 )>
// <( DATA C0nBhV6V5yVExLOgvpK8xzUluc08lsr7wwBhx4ENMDrJU3pA )>
// <( END Agent_007 )>
enprot$
In the above example I entered "bond" in both password prompts. Keys can
also be passed from command line with the -k
flag:
[source,sh]
enprot$ ./target/debug/enprot sample/test.ept -e GEHEIM -k GEHEIM=james
enprot$ cat sample/test.ept
hello, this is a test file
// <( ENCRYPTED GEHEIM )>
// <( DATA 4reYea85vTqNzzf7eon3x/LHs6iLy3GPgSZvsX7l0MhqdVnuIe5y3poxqvQxFqYT )>
// <( DATA B1np55+m8WlPDtzMt+SMPEyfPIKAeqo+tAWS7ftfJmAqSswibIqRJh0jXO6nBDvK )>
// <( DATA 4EclPifsb89G2i5vu8dfFkmQT8uj2o71UAohLPeY8vX2qksDJGm99pzZwm5hoXUm )>
// <( DATA VVYf )>
// <( END GEHEIM )>
// <( ENCRYPTED Agent_007 )>
// <( DATA C0nBhV6V5yVExLOgvpK8xzUluc08lsr7wwBhx4ENMDrJU3pA )>
// <( END Agent_007 )>
enprot$
Decryption can be performed exactly the same way using the -d
command:
[source,sh]
enprot$ ./target/debug/enprot sample/test.ept -d Agent_007,GEHEIM -k GEHEIM=james -k Agent_007=bond
enprot$ cat sample/test.ept
hello, this is a test file
// <( BEGIN GEHEIM )>
Secret line 1
Secret line 2
// <( ENCRYPTED Agent_007 )>
// <( DATA lEsVpN3ES6rj0sbxrDm30EgMpYCc+yKM2i2Z )>
// <( END Agent_007 )>
// <( END GEHEIM )>
// <( BEGIN Agent_007 )>
Super secret line 3
// <( END Agent_007 )>
enprot%
We see that only one layer of encryption was removed from GEHEIM. You may use the exactly same command for second iteration to reveal the original file.
Working on Source Code
The system allows one work on text-format documents, but also on program source code. For example the source code of Enprot has an encrypted portion in its help message:
[source,sh]
enprot$ ./target/debug/enprot -d AUTHOR -k AUTHOR=markku src/lib.rs
enprot$ cargo run -- -h
Compiling enprot v0.1.0 (file:///home/mjos/Desktop/lab/enprot)
Finished dev [unoptimized + debuginfo] target(s) in 2.17s
Running `target/debug/enprot -h`
Written 2018 by Markku-Juhani O. Saarinen <mjos@iki.fi>
[...]
enprot$
Notice how that authorship information appeared at the end of help text when cargo recompiled the source code (since it was "touched"). This is because some source lines originally read:
// <( ENCRYPTED AUTHOR )>
// <( DATA X417HVMRRAs6Z1xGo5yY4TxUQ2tpAHEKQ1sg9+kfku5uUikK3y2tODtsUiGqfRGW )>
// <( DATA xUCGYFu02BCdqPM7uuX5UNvbfrLvKkj6gLYwg/cr42PJmr4o5xnw1qo= )>
// <( END AUTHOR )>
Which was decrypted to
// <( BEGIN AUTHOR )>
println!("Written 2018 by Markku-Juhani O. Saarinen <mjos@iki.fi>");
// <( END AUTHOR )>
Without modifying anything else in the source code.
Encrypted Stashing
If we combine encryption -e WORD
and CAS storage -s WORD
, the ciphertext
is stored into CAS in encryption form. One may use -E
flag to specify
both predicates at once.
[source,sh]
enprot$ ./target/debug/enprot sample/test.ept -E GEHEIM
Password for GEHEIM:
Repeat password for GEHEIM:
enprot$ cat sample/test.ept
hello, this is a test file
// <( ENCRYPTED GEHEIM 12d24bf3dbebfe5feb7684efdb1d98391c4b0afd809a8bc87f3f8e6f75e59651 )>
// <( BEGIN Agent_007 )>
Super secret line 3
// <( END Agent_007 )>
enprot$
Here I left out the -k
definition so Enprot asked me to enter a password.
The -d
flag will work the same way when the ciphertext is in CAS or in
local DATA clauses.
[source,sh]
enprot$ ./target/debug/enprot sample/test.ept -d GEHEIM
Password for GEHEIM:
enprot$
Multi-File Processing
Since files are transformed in place, you can use wildcards to process a large number of files at once. You will be asked for passwords only once.
To process a file and output to a different filename, use -o
:
[source,sh]
enprot$ ./target/debug/enprot input.ept -o output.ept
To direct output to an another directory, or to add a prefix flag -p PREFIX
.
The PREFIX is literally added before each output file. Note that if input
filenames have a relative path, that remains unchanged.
enprot$ ./target/debug/enprot -p output/ file.*
Will read files file.1
, file.2
, etc and write them into directory output
(if it exists). However
[source,sh]
enprot$ ./target/debug/enprot -p output file.*
Will produce files outputfile.1
, outputfile.2
, etc.
Cryptography: Symmetric Authenticated Encryption
Due to its minimal message expansion and non-sequential nature of data
being encrypted, a nonce-reuse/misuse resistant Authenticated Encryption
with Associated Data (AEAD) mechanism is used. We have chosen to use
AES-256 in SIV (Synthetic Initialization Vector) mode [RFC5297]. A SIV
ciphertext is always 16 bytes larger than plaintext and the 16-byte
authentication tag also serves as the "synthetic IV
".
All hash function computations for CAS utilize SHA-3 [FIPS202] variants. It is also used to derive keying material from passwords.
Dependencies
~6.5MB
~121K SLoC