5 releases (3 breaking)

0.4.0	Jun 8, 2020
0.3.0	Jun 5, 2020
0.2.1	Mar 21, 2020
0.2.0	Feb 24, 2020
0.1.0	Feb 22, 2020

Apache-2.0

86KB
1K SLoC

Crypto Construct Language

Summary

CCLang is a new language, inspired by Bitcoin script, for serializing all cryptographic constructs as a series of data and command tokens that describe the use/application of the construct. For instance, a secret key would be serialized as an encrypted key with commands for unwrapping the key. Anybody wanting to access the secret key can execute the script and if they provide the proper inputs, the result will be the unwrapped key. This applies for all cryptographic constructs from simple constructs such as key storage up to the most complex such as multi-factor key rotation schemes.

Motivation

By storing cryptographic constructs in "functional" form, we abstract away the underlying cryptographic primitive/operation providers and provide a standardized language for describing arbitrarily complex cryptographic constructs in serialized form (e.g. DID documents, blockchain transaction records, encrypted packets, etc.).

This came from the DID document standardization effort when we ran into the need for storing complex cryptographic constructs for different controller binding and key rotation operations. Those operations typically have one or more ways that a controller can prove control over an identity and/or force a rotation away from a compromised key. This RFC seeks to standardize the language we use to describe those constructs so that regardless of the underlying crypto library, the construct can be recreated and the operation executed.

This also came from the did:git DID method spec effort where we ran into the need for describing cryptographic constructs for enforcing governance rules for a repo. An example rule would be: changes to the MAINTAINERS file require a commit that is signed by at least 2 of the maintainers of the project. The rules need to be serialized in a form that is machine parsable and machine executable, with context-specific references to files and commits to read data used in the construct creation. This RFC again seeks to standardize the language we use to describe those rules.

To further expand on the topic of enforcing M-of-N signatures of existing maintainers for any commits that change the MAINTAINERS file in a repo, the following example shows how it could be accomplished. The MAINTAINERS file can contain a number of different CCLang scripts for the governance of that file itself. Or the rules can be stored in another GOVERNANCE file. It doesn't matter.

Let's assume the CCLang M-of-N check script is stored in the MAINTAINERS file. It assumes that the list of maintainer public keys will first be pushed onto the stack before it gets executed and it just adds up the number of maintainer public keys and tests to see if that sum is greater-than-or-equal to the M threshold.

Now, if the CCLang parallel multi-sig on the commit is written so that it leaves a copy of the validating public key on the stack for every valid digital signature, then to enforce M-of-N, the commit hook script just needs to append the M-of-N CCLang script from the MAINTAINERS file to the end of the CCLang multi-sig in the commit and then execute.

The first half will be the CCLang multi-sig from the commit and it will leave the stack with public keys for all of the valid signatures. Then the second half will be the M-of-N check script that checks each public key against the list of public keys from the MAINTAINERS file and adds up the number of matches and compares that for >= against the M threshold. If, at the end, the stack has TRUE left then the commit passes the M-of-N multi-sig of maintainers check. If the stack is left with FALSE it does not pass the check and the commit should be rejected.
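The counting logic of the second half can be sketched in Python. This is only an illustration of the check, not CCLang itself: the function name m_of_n_check, the byte-string keys, and the choice to count each maintainer at most once are assumptions made for the example.

```python
def m_of_n_check(valid_signer_keys, maintainer_keys, threshold):
    """Return True when at least `threshold` distinct maintainers signed.

    valid_signer_keys: public keys left on the stack by the multi-sig half.
    maintainer_keys:   public keys listed in the MAINTAINERS file.
    threshold:         the M in M-of-N.
    """
    maintainers = set(maintainer_keys)
    # Assumption: a maintainer counts once, even if their key appears twice.
    matches = {key for key in valid_signer_keys if key in maintainers}
    return len(matches) >= threshold


maintainers = [b"alice-key", b"bob-key", b"carol-key"]
print(m_of_n_check([b"alice-key", b"carol-key"], maintainers, 2))    # True
print(m_of_n_check([b"alice-key", b"mallory-key"], maintainers, 2))  # False
```

A commit hook would reject the commit whenever the check leaves FALSE.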

If a code repo requires that all commits be signed by an identity already stored in the repo itself and all of the commit hooks enforce the cryptographic checks using CCLang, and if all of the cryptographic check CCLang scripts are also stored in files in the repo, then the entire process is self-certifying and guaranteed to be correct.

Guide-level Explanation

The crypto constructs language (CCLang) is used to serialize all cryptographic constructs and uses operations that map directly to the cryptographic algorithms and operations that Ursa offers. Ursa abstracts away multiple implementations of different algorithms (e.g. SHA256, RSA encryption, etc.). At compile time the developer chooses which implementation they want included in the final Ursa binary. CCLang extends the Ursa API by encoding Ursa API calls as a series of data and tokens that direct the use of the Ursa API.

CCLang is inspired by Bitcoin script and is a reverse-polish notation (RPN) scripting language designed to execute on a virtual stack machine. CCLang scripts contain data tokens, argument tokens and operation tokens to serialize the steps needed to execute a specific cryptographic operation (e.g. verify a digital signature, unwrap a secret key, validate a key rotation operation, etc.). Anywhere a developer would normally encode data used for a cryptographic operation they would instead store a CCLang script.

It is important to state that CCLang's scope is limited to cryptographic operations and is not intended to support other operations such as contacting oracles, network lookups, or anything else that wouldn't be handled by an existing cryptographic library. It is intended to normalize how we all share complex cryptographic constructs through serialized form (e.g. network packets, DID documents, etc).

Some Examples

I think the best way to really understand how CCLang is different is to look at some comparisons between existing serialization schemes and how the same construct would be serialized in CCLang. Don't worry about understanding all of the tokens in these examples because they will be described in full detail in the reference section below.

Key Material

The most basic of cryptographic constructs is storing a public key. In the proposed DID document specification a public key is serialized in JSON format with the encoding type in the key:

{ "type": "Ed25519", "hexkey": "0a7d1d784358af1f8073ba07eb5ae2fc7272a860ec4547de8bc13d04259cd59a" }

Secure Scuttlebutt does something different. They use a sigil character to identify a public key---@---and then add a suffix to define the algorithm the key is intended to be used with:

@Cn0deENYrx+Ac7oH61ri/HJyqGDsRUfei8E9BCWc1Zo=.ed25519

In CCLang, a public key alone doesn't need any decoration. It is just stored as the key data. The serialization changes, however, if the key must be encoded for the given format. When storing in a text format, public keys are typically encoded using hexadecimal characters or a binary-to-text translation scheme like Base64 or Base58. So in a text format, a Base64 encoded public key in CCLang form would look like this:

Cn0deENYrx+Ac7oH61ri/HJyqGDsRUfei8E9BCWc1Zo= Base64 DECODE

The first token is the Base64 encoded public key data followed by the identifier for the encoding scheme (e.g. "Base64") and lastly an opcode executing the decode operation. Since CCLang is processed using a stack machine this is an RPN representation of the operation to decode the key into bytes.

To execute this, the tokens are processed left-to-right. First the encoded key data is pushed onto the stack. Then the identifier specifying the text-to-binary scheme is pushed onto the stack. The DECODE opcode pops the encoding scheme identifier and the encoded bytes off of the stack, executes the correct decoding function and then pushes the resulting bytes onto the stack. Any software that understands CCLang would execute this script to get the public key bytes in memory ready for use in other operations.
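The left-to-right evaluation described above can be sketched as a tiny stack machine in Python. It handles only data tokens and the DECODE opcode; the function name run and the decoder table are assumptions for illustration and are not part of CCLang or Ursa.

```python
import base64


def run(tokens):
    """Execute a tiny subset of CCLang: data tokens plus the DECODE opcode."""
    decoders = {
        "Base64": base64.b64decode,
        "Hex": bytes.fromhex,
    }
    stack = []
    for token in tokens:
        if token == "DECODE":
            scheme = stack.pop()   # encoding identifier is on top
            encoded = stack.pop()  # encoded text sits below it
            stack.append(decoders[scheme](encoded))
        else:
            stack.append(token)    # anything else is pushed as data
    return stack


script = "Cn0deENYrx+Ac7oH61ri/HJyqGDsRUfei8E9BCWc1Zo= Base64 DECODE".split()
key_bytes = run(script)[-1]
print(len(key_bytes))  # 32
```

After the script runs, the only item left on the stack is the 32-byte public key.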

Digital Signatures

Digital signatures are where we start getting into slightly complex cryptographic constructs. The most common way to create a digital signature is to first hash the data being signed and then use a nonce with public key cryptography to encrypt the hash of the data. The resulting digital signature is usually the combination of the original data, the encrypted hash, the public key of the signer and the nonce used when generating the signature. To verify a digital signature, the verifier uses the nonce and public key to decrypt the hash and then compares that with the hash they calculate for the signed data.

In Verifiable Credentials, they store something called a "Linked Data Signature" that contains the digital signature of the credential. They are encoded in JSON-LD and look like this:

{
  "@context": "https://www.w3.org/2018/credentials/examples/v1",
  "title": "Hello World!",
  "proof": {
    "type": "Ed25519Signature2018",
    "proofPurpose": "assertionMethod",
    "created": "2019-08-23T20:21:34Z",
    "verificationMethod": "did:example:123456#key1",
    "domain": "example.org",
    "jws": "eyJ0eXAiOiJK...gFWFOEjXk"
  }
}

Notice how the type is a complex identifier that implies a hash function and encryption function and relies on external documentation to specify exactly how an Ed25519Signature2018 is constructed. Implementors will have to read this external documentation before they will know how to process each field to verify this signature. Linked data signatures are only lightly self-describing. The steps required to verify the signature are left up to the implementor and assumed to be widely known. The linked data signature specification fails to describe exactly the steps to take to verify the signature and presents a problem for implementers new to cryptography.

In Secure Scuttlebutt, for some reason they forego the sigil character that they use for everything else and instead signal that something is a signature by adding a ".sig" string as part of the suffix. The hash algorithm used is specified in the Secure Scuttlebutt protocol guide and is not encoded in the signature itself. This limits the ability for Secure Scuttlebutt to adopt new signature schemes without massive disruption caused by breaking changes and incompatible implementations. The binary-to-text encoding scheme is also specified in the protocol specification as non-URL-safe Base64. Again, this limits the opportunity for future revisions to the protocol. The only part that is self-describing is the encryption algorithm identifier in the suffix. A Secure Scuttlebutt signature looks like the following:

QYOR/zU9dxE1aKBaxc3C0DJ4gRyZtlMfPLt+CGJcY73sv5abKKKxr1SqhOvnm8TY784VHE8kZHCD8RdzFl1tBA==.sig.ed25519

With CCLang, digital signatures are fully self-describing and do not rely on an external specification to store the procedure for verifying the digital signature. CCLang digital signatures are scripts that, when executed, verify the digital signature. The primary challenge with CCLang digital signatures is the reference to the external data that was hashed when calculating the signature. CCLang uses an "open-read-close" sequence along with application-specific identifiers to specify what data storage unit to open (e.g. file, stream, object), and which bytes to read from the data storage unit for the hash calculation.

If we assume that the following is a digital signature for a text file named foo.txt, that the entire file was signed, that the binary-to-text encoding scheme used is hexadecimal, that the hash algorithm used was SHA256, and that the public key encryption algorithm used was Ed25519, then the CCLang digital signature would look like the following:

418391ff353d77113568a05ac5cdc2d03278811c99b6531f3cbb7e08625c63bdecbf969b28a2b1af54aa84ebe79bc4d8efce151c4f24647083f11773165d6d04
Hex DECODE 1425ffb6c0cba6e6c23ca29f22bc3881cf924241dc683d7bb3b188ea2ff38966 Hex
DECODE foo.txt OPEN 0 $ READ CLOSE Ed25519 VERIFY

To understand and verify this digital signature it needs to be executed like so:

The digital signature hexadecimal is pushed followed by the Hex encoding identifier and the opcode to decode the text to binary. The result is the digital signature binary data on the top of the stack.
Next the public key of the signer is decoded from hex and also left on the stack.
Then the name of the file---foo.txt---is pushed and the OPEN opcode pops the name, opens the file, and pushes the stream handle onto the stack.
Next the starting index of the read---0---followed by the number of bytes to read---$---are pushed onto the stack. The $ symbol means "end of stream" which will cause the READ opcode to read all of the bytes from the file and push them onto the stack followed by the stream handle.
Then the file stream is closed which closes the file and pops the stream handle from the stack leaving the bytes from the file on top.
The last step is to push the signature algorithm identifier 'Ed25519' onto the stack and then execute the 'VERIFY' opcode to verify the signature from the signature data, the public key, and the data that was signed. The 'VERIFY' opcode results in 'TRUE' or 'FALSE' being pushed onto the stack whether the signature is valid or not.
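The steps above can be traced with a small Python sketch. It is not the CCLang interpreter: the files dictionary stands in for application-specific storage, the verifier is passed in as a callback, and toy_verify is a deliberately fake check (a SHA-256 comparison, not Ed25519) so that the example runs with only the standard library. All names are hypothetical.

```python
import hashlib
import io


def run_signature_script(tokens, files, verify):
    """Run the guide-level opcodes DECODE, OPEN, READ, CLOSE and VERIFY.

    files:  dict mapping a name to bytes (stand-in for application storage).
    verify: callable(algorithm, signature, public_key, data) -> bool.
    """
    stack = []
    for token in tokens:
        if token == "DECODE":
            scheme, encoded = stack.pop(), stack.pop()
            if scheme != "Hex":
                raise ValueError("only Hex is handled in this sketch")
            stack.append(bytes.fromhex(encoded))
        elif token == "OPEN":
            stack.append(io.BytesIO(files[stack.pop()]))
        elif token == "READ":
            count, start, handle = stack.pop(), stack.pop(), stack.pop()
            handle.seek(int(start))
            data = handle.read() if count == "$" else handle.read(int(count))
            stack.extend([data, handle])  # bytes first, handle on top
        elif token == "CLOSE":
            stack.pop().close()
        elif token == "VERIFY":
            algorithm, data, key, sig = (stack.pop() for _ in range(4))
            stack.append(verify(algorithm, sig, key, data))
        else:
            stack.append(token)  # data and identifiers are simply pushed
    return stack


def toy_verify(algorithm, sig, key, data):
    # NOT Ed25519: a stand-in check so the sketch needs no crypto library.
    return sig == hashlib.sha256(key + data).digest()


files = {"foo.txt": b"hello world\n"}
key = bytes(32)
sig = hashlib.sha256(key + files["foo.txt"]).digest()
script = [sig.hex(), "Hex", "DECODE", key.hex(), "Hex", "DECODE",
          "foo.txt", "OPEN", "0", "$", "READ", "CLOSE", "Ed25519", "VERIFY"]
print(run_signature_script(script, files, toy_verify))  # [True]
```

A real implementation would replace toy_verify with the signature verification routine of whichever cryptographic library it wraps.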

By executing the CCLang digital signature script, we validate the digital signature by putting the signature, public key and signed data onto the stack and then running the signature verification function. It is important to point out that the parameter for the OPEN opcode---foo.txt in this case---is entirely application specific. The above CCLang script is a digital signature over a file in a filesystem and therefore uses a relative path to reference the file. It is also assumed that the signature is stored in a separate file in the same directory as what is called a "detached signature". If this signature was stored as part of the foo.txt file, then the READ opcode would not use $ but would instead specify the number of bytes to read up to, but not including the digital signature itself.

If this was a digital signature over a transaction in a blockchain data store, the parameter for the OPEN operation would be the identifier for the transaction---either the hash address of the transaction or whatever addressing scheme the blockchain uses.

One other note is required. In systems where the signed data is encoded in JSON and the signature itself is also part of the JSON, the CCLang script can use multiple reads to cause the bytes before and after the digital signature to be read for hashing. This is far superior to the way Secure Scuttlebutt and the Verifiable Credential specifications expect digital signatures to be verified and does not require the full parsing and manipulation of the JSON document.

For instance, in Secure Scuttlebutt, digital signatures are stored as: "signature": "<base64>.sig.ed25519". The specification says to parse the JSON, remove the signature field, then canonicalize the JSON before hashing. With CCLang, if the signature data starts at offset 152 and is 100 bytes long, then the CCLang version of the signature would have two READ operations followed by a CONCAT like so:

0 152 READ 252 $ READ CONCAT

This would leave the JSON bytes before and after the signature data on the stack, ready to be parsed, canonicalized, and then hashed. If the JSON is assumed to already be canonicalized, then the hash can be directly calculated.
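In ordinary byte-slicing terms, the two reads and the CONCAT amount to cutting the signature span out of the document. The Python sketch below shows this under the assumption that offsets are zero-indexed, so the second read starts at the signature offset plus the signature length; the function name and the sample document are hypothetical.

```python
def bytes_around_signature(document, sig_offset, sig_length):
    """Return the document bytes with the signature span cut out.

    Mirrors `0 <sig_offset> READ <sig_offset + sig_length> $ READ CONCAT`.
    """
    before = document[:sig_offset]              # first READ
    after = document[sig_offset + sig_length:]  # second READ, to end of stream
    return before + after                       # CONCAT


doc = b'{"text":"hi","signature":"AAAA.sig.ed25519"}'
start = doc.index(b'"AAAA')
length = len(b'"AAAA.sig.ed25519"')
print(bytes_around_signature(doc, start, length))
# b'{"text":"hi","signature":}'
```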

Multi-Sig Signatures

Where it becomes obvious that CCLang is superior to both Verifiable Credentials and Secure Scuttlebutt is in the encoding of multi-sig signatures. Multi-sig signatures are digital signatures constructed as a conglomeration of signatures from two or more identities. There are two kinds of multi-sig signatures: parallel and serial.

Parallel signatures are when multiple identities sign the same data. This is a simple agreement protocol. Think of it like multiple parties signing the same contract. They are all agreeing to the data they are signing. Serial signatures are when one identity signs some data and then a second identity signs the data concatenated with the first signature. The next signature would sign the concatenation of the data and all previous signatures. This is an endorsement protocol whereby each subsequent signature endorses the data and the previous signatures. This is useful in supervisory roles such as project maintainers in open source projects. Developers submit signed commits and then the maintainers sign the commit and the developer's signature when merging it into the main branch of the code, thus endorsing the contribution.

Currently, the Verifiable Credentials spec doesn't directly specify how either type of multi-sig is to be serialized. It is left up to the implementor: as long as each individual signature follows the spec, the result is a compliant signature.

In the Secure Scuttlebutt protocol specification, multi-sig signatures are not addressed at all. There is some discussion on supporting this but no consensus has emerged, mostly because all obvious solutions are ugly hacks and so far the need is low.

With CCLang, both kinds of multi-sig signatures are trivial. By using an IF-ELSE-FI opcode trio it is easy to append digital signatures to form a multi-sig signature. For instance, let's start with a basic digital signature that looks like the following:

<sig hex> Hex DECODE <pub key hex> Hex DECODE foo.txt OPEN 0 $ READ CLOSE Ed25519 VERIFY

Let's say another signer wants to add a parallel signature to the one above. They would calculate their own signature and append it using IF-ELSE-FI to create a CCLang script that validates both signatures over the data:

<first sig hex> Hex DECODE <first pub key hex> Hex DECODE foo.txt OPEN 0 $ READ CLOSE Ed25519 VERIFY IF <second sig hex> Hex DECODE <second pub key hex> Hex DECODE foo.txt OPEN 0 $ READ CLOSE Ed25519 VERIFY ELSE FALSE FI

This is a parallel multi-sig. The first part is a normal CCLang digital signature validation. After the VERIFY opcode, the stack will either have a TRUE on top if the signature is valid or a FALSE on top if it is not. The IF opcode pops the top and if it is TRUE, the script between IF and ELSE is executed, otherwise the script between ELSE and FI is executed. Between the IF and ELSE is a second digital signature validation script that will leave TRUE on top if it is valid. So if both signatures are valid, the script will end with TRUE on top of the stack. If the first signature is invalid, then the code between ELSE and FI will get executed. That script just pushes FALSE onto the stack to indicate that the signature checks failed. Any number of parallel signatures can be appended to the digital signature independently. This allows for asynchronous multi-sig operations where one person signs some data and then forwards the data and their signature to the next person who then appends their signature and so on until all signatures are created and appended.
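The appending step can be sketched as a token-list operation in Python. The helper name append_parallel and the placeholder tokens are hypothetical; the point is only that each new signer wraps their own verification script in IF ... ELSE FALSE FI and attaches it to the end of whatever script they received.

```python
def append_parallel(script, next_signature_script):
    """Wrap `next_signature_script` so it only runs if `script` left TRUE."""
    return script + ["IF"] + next_signature_script + ["ELSE", "FALSE", "FI"]


# Placeholders standing in for complete signature verification scripts.
first = ["<first sig check>"]
second = ["<second sig check>"]
third = ["<third sig check>"]

two = append_parallel(first, second)
three = append_parallel(two, third)
print(" ".join(three))
# <first sig check> IF <second sig check> ELSE FALSE FI IF <third sig check> ELSE FALSE FI
```

Because every IF ... FI block leaves exactly one boolean on the stack, the next IF can consume it, which is what lets signers append asynchronously.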

For serial multi-sig, the subsequent signature must include not only the data but the previous signatures. To accomplish this, the CCLang script of the endorsing signature must be set up with reads of the data and reads of the previous signature data before the validation opcode executes.
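What each serial signer covers can be sketched in Python. The sketch uses HMAC-SHA256 from the standard library as a stand-in for a real signature scheme such as Ed25519, which means "verification" here needs the signer's secret; that is an artifact of the stand-in, not of serial signatures. All function names are hypothetical.

```python
import hashlib
import hmac


def toy_sign(secret, message):
    # Stand-in for a real signature scheme such as Ed25519 (assumption).
    return hmac.new(secret, message, hashlib.sha256).digest()


def serial_sign(data, secrets):
    """Each signer signs the data plus every earlier signature."""
    signatures = []
    for secret in secrets:
        covered = data + b"".join(signatures)
        signatures.append(toy_sign(secret, covered))
    return signatures


def serial_verify(data, secrets, signatures):
    """Recompute each signature over the same growing byte string."""
    for index, (secret, sig) in enumerate(zip(secrets, signatures)):
        covered = data + b"".join(signatures[:index])
        if not hmac.compare_digest(sig, toy_sign(secret, covered)):
            return False
    return True


sigs = serial_sign(b"commit bytes", [b"developer", b"maintainer"])
print(serial_verify(b"commit bytes", [b"developer", b"maintainer"], sigs))  # True
```

A maintainer signature computed over the data alone, without the developer's signature, fails this check, which is the difference between an endorsement and a parallel signature.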

Any number of digital signatures can be generated and appended to the existing signature using the IF-ELSE-FI pattern. It is also possible to mix both parallel and serial digital signatures in any arbitrary arrangement. Let's say a commit comes in that is signed off by two authors that collaborated. That commit has two parallel signatures, one from each author. Then the maintainer of the project merges it with their serial signature that covers both the commit and the two signatures from the authors.

Reference-level Explanation

The Stack

The heart of the CCLang system is the stack machine used to execute CCLang scripts. In general, the CCLang stack machine is an abstract stack machine that accepts data of any type. Long strings get stored and references to the strings are pushed onto the stack. Opcodes may or may not pop arguments from the stack and may or may not push results onto the stack. If for some reason an opcode cannot be executed due to incorrect types of input parameters or the values of the input parameters are invalid, the execution of the CCLang script will halt immediately and an appropriate error code will be returned to the caller of the CCLang interpreter.

Documenting Opcodes

The rest of this reference uses the standard notation for documenting commands in the Forth programming language. Forth is also an RPN stack-based programming language. It uses a simple form of documenting commands that looks like the following:

/ a1 -- argument one is data of any type.
/ a2 -- second argument of any type.
= ( a1 a2 -- TRUE|FALSE )

Comment lines begin with a single / character and come before the command and stack documentation to explain the parameters in the stack diagram. The stack diagram follows the command---in this case =---and consists of a left parenthesis ( followed by the parameters popped off of the stack, a -- separator, the return parameters pushed onto the stack, and a closing right parenthesis ). In the case of the = opcode, it pops two parameters of any type---a1 and a2---compares them for equality, and pushes TRUE if they are equal or FALSE if they are not.

Another example would be the DECODE opcode:

/ e -- encoded data.
/ t -- encoding type identifier.
/ b -- decoded binary data.
DECODE ( e t -- b )

The DECODE opcode pops the encoded data and the encoding type identifier from the stack, executes the correct decode operation to turn the text to binary and then pushes the binary onto the stack. This is only used in text serialization formats such as JSON, YAML, or XML.

Opcodes

Logical Comparison

/ a1 -- data of any type.
/ a2 -- data of any type.
= ( a1 a2 -- TRUE|FALSE )

The = operator pops two arguments of any type and compares them for equality. It pushes TRUE if they are equal, FALSE if not.

/ a1 -- numerical data.
/ a2 -- numerical data.
< ( a1 a2 -- TRUE|FALSE )

The < operator pops two numerical arguments off of the stack and compares them to see if the first argument popped is less than the second argument popped. It pushes TRUE if that is true, FALSE if not.

/ a1 -- numerical data.
/ a2 -- numerical data.
> ( a1 a2 -- TRUE|FALSE )

The > operator pops two numerical arguments off of the stack and compares them to see if the first argument popped is greater than the second argument popped. It pushes TRUE if that is true, FALSE if not.

There are also the <=, >=, and != opcodes that are combinations of the above logical comparisons. They test for less-than-or-equal, greater-than-or-equal, and not-equal respectively.

Binary-to-text Operations

/ e -- text-encoded data.
/ t -- encoding type identifier.
/ b -- decoded binary data.
DECODE ( e t -- b )

The DECODE opcode is used to decode binary data that has been encoded in some binary-to-text encoding system such as Base64, Base58, or hexadecimal. The supported encoding types are listed below.

/ b -- binary data.
/ t -- encoding type identifier.
/ e -- binary data encoded as text.
ENCODE ( b t -- e )

The ENCODE opcode is used to encode binary data into a text format using the specified binary-to-text encoding scheme. The result is the text encoded binary data.

Encryption

/ e -- binary encrypted data.
/ k -- binary key data.
/ o.. -- optional binary parameters required by the given algorithm (e.g. nonce).
/ i -- encryption algorithm identifier.
/ b -- decrypted binary data.
DECRYPT ( e k o.. i -- b )

The DECRYPT opcode is used to decrypt the encrypted data using the key and any other required data for decrypting using the algorithm specified by the algorithm identifier. The result is the decrypted binary data.

/ b -- binary data.
/ k -- binary key data.
/ o.. -- optional binary parameters.
/ i -- encryption algorithm identifier.
/ e -- encrypted binary data.
ENCRYPT ( b k o.. i -- e )

The ENCRYPT opcode is used to take binary and encrypt it using the specified encryption algorithm and key and algorithm-specific parameters. The result is the encrypted data.

Signing

/ s -- binary signature data.
/ k -- binary public key.
/ d -- binary data that was signed.
/ o.. -- optional binary signature parameters.
/ i -- signature algorithm identifier.
VERIFY ( s k d o.. i -- TRUE|FALSE )

The 'VERIFY' opcode executes the digital signature verification function associated with the specified signature algorithm. It pops the identifier, any optional parameters, the data that was signed, the public key, and the signature off of the stack and pushes 'TRUE' if the signature is valid and 'FALSE' if it is not.

/ d -- binary data to be signed.
/ k -- binary secret key to sign with.
/ o.. -- binary optional parameters for the signature scheme.
/ i -- signature algorithm identifier.
SIGN ( d k o.. i -- s )

The 'SIGN' opcode creates a detached digital signature over the provided data using the secret key. All parameters are popped and the resulting signature is pushed onto the stack.

Hashing

/ b -- binary data.
/ i -- hashing algorithm identifier.
/ h -- hash of the data.
HASH ( b i -- h )

The HASH opcode takes the binary data and hashes it using the specified hashing algorithm. The result is the hash of the data.
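A minimal Python sketch of the HASH stack effect, using hashlib for the two algorithms listed in this document. The function name op_hash and the lookup table are assumptions for illustration.

```python
import hashlib

HASHERS = {"SHA256": hashlib.sha256, "SHA512": hashlib.sha512}


def op_hash(stack):
    """HASH ( b i -- h ): pop the identifier, then the data; push the digest."""
    identifier = stack.pop()
    data = stack.pop()
    stack.append(HASHERS[identifier](data).digest())


stack = [b"abc", "SHA256"]
op_hash(stack)
print(stack[0].hex())
```

The stack ends with a single item, the 32-byte SHA256 digest of the input.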

Data I/O

/ i -- data storage object identifier (e.g. file name, transaction number, etc)
/ m -- mode to open the file under.
/ h -- handle to the opened data storage object.
OPEN ( i m -- h )

The OPEN opcode is application-specific in that the data storage object identifier is specific to the application. In some cases it may be the file name of a file to open, or it may be the identifier of a transaction in a blockchain application, or it could be any reference to a data object that makes sense in the application. The mode is an identifier representing read, write, or append. The result is a handle to the opened object that can be used by other data I/O commands.

/ h -- handle to the opened data storage object.
/ s -- zero-indexed starting offset to begin the read.
/ n -- number of bytes to read from the object.
/ b -- binary read from the data object.
READ ( h s n -- b h )

The READ opcode takes the handle to the opened data storage object and reads the number of bytes specified starting at the offset given. The result is the binary data read from the data object and then the handle to the open data object on top.

/ h -- handle to the opened data storage object.
/ b -- binary data to write.
WRITE ( h b -- h )

The WRITE opcode takes the handle and the data to write and writes it to the open data object. The result is the handle to the data object.

/ h -- handle to the opened data storage object.
/ n -- number of bytes and direction to seek in.
SEEK ( h n -- h )

The SEEK opcode seeks the specified number of bytes. If the number of bytes is negative, it seeks backwards towards the 0th index byte. If the number is positive, it seeks forward towards the last byte in the object. The result is the handle to the opened data object.

/ h -- handle to the opened data storage object.
CLOSE ( h -- )

The CLOSE opcode closes the opened data storage object. It pops the handle from the stack and does not push anything to the stack.

Data Manipulation

/ b -- binary data.
CONCAT ( b b -- b )

The CONCAT opcode pops two binary data arguments from the stack and concatenates the top argument to the end of the argument below it on the stack and pushes the resulting binary data back onto the stack.

/ b -- binary data.
/ o -- integer offset.
/ c -- integer count.
SLICE ( b o c -- b )

The SLICE opcode pops the count, offset, and binary data from the stack and creates a new binary data argument starting at the offset and taking the count number of bytes from the original binary data argument. The result is pushed onto the stack. The offset is zero-indexed and both the offset and count are in bytes.

data: abcdef0123456789
         ^ 3 offset ^ 12 count

result: def012345678
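The same example expressed as a Python sketch, treating each character as one byte. The function name op_slice is hypothetical.

```python
def op_slice(stack):
    """SLICE ( b o c -- b ): pop count, offset and data; push the sub-range."""
    count = stack.pop()
    offset = stack.pop()
    data = stack.pop()
    stack.append(data[offset:offset + count])


stack = [b"abcdef0123456789", 3, 12]
op_slice(stack)
print(stack)  # [b'def012345678']
```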

/ b -- binary data.
| ( b b -- b )

The | opcode is the bitwise or between the top two binary data arguments. The result is pushed onto the stack.

/ b -- binary data.
& ( b b -- b )

The & opcode is the bitwise and between the top two binary data arguments. The result is pushed onto the stack.

/ b -- binary data.
^ ( b b -- b )

The ^ opcode is the bitwise xor between the top two binary data arguments. The result is pushed onto the stack.

/ b -- binary data.
~ ( b -- b )

The ~ opcode is the bitwise inverse of the top binary data argument. The result is pushed onto the stack.
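The four bitwise opcodes can be sketched byte by byte in Python. The text above does not say what happens when the two operands differ in length, so this sketch assumes equal lengths and raises otherwise; the function names are hypothetical.

```python
def bitwise(op, left, right):
    """Apply |, & or ^ byte by byte to two equal-length operands."""
    if len(left) != len(right):
        # Assumption: unequal lengths are treated as an error in this sketch.
        raise ValueError("operands must have the same length")
    ops = {
        "|": lambda a, b: a | b,
        "&": lambda a, b: a & b,
        "^": lambda a, b: a ^ b,
    }
    return bytes(ops[op](a, b) for a, b in zip(left, right))


def invert(data):
    """~ ( b -- b ): flip every bit of every byte."""
    return bytes(b ^ 0xFF for b in data)


print(bitwise("^", b"\x0f\xf0", b"\xff\xff").hex())  # f00f
print(invert(b"\x00\xff").hex())                      # ff00
```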

Stack Control

/ a -- argument of any type.
DUP ( a -- a a )

The DUP opcode pops the top item, duplicates it and pushes the original and its duplicate on the top of the stack.

/ a -- argument of any type.
POP ( a -- )

The POP opcode pops the top item from the stack and forgets about it. This is used for throwing away the top of the stack.

Flow Control

/ b -- boolean argument.
IF ( b -- ) ELSE FI

The IF-ELSE-FI opcode trio and the related IF-FI opcode duo are used to do conditional branching in CCLang scripts. The IF opcode pops the top of the stack and evaluates it as a boolean argument.

In the case of IF-ELSE-FI, if the argument is true then the script between IF and ELSE is executed, followed by jumping to the script after the FI. If it is false then the script between the ELSE and the FI is executed.

In the case of IF-FI, if the argument is true then the script between the IF and FI is executed. If it is false then flow jumps to the script after the FI opcode.
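The branching rules for both forms can be sketched in Python as follows. This is an illustrative interpreter for TRUE/FALSE literals and IF, ELSE and FI only; support for nested conditionals is an assumption of the sketch, and the names run and find_branches are hypothetical.

```python
def find_branches(tokens, if_pos):
    """Return (index of matching ELSE or None, index of matching FI)."""
    depth, else_pos = 0, None
    for pos in range(if_pos + 1, len(tokens)):
        token = tokens[pos]
        if token == "IF":
            depth += 1                 # entering a nested conditional
        elif token == "ELSE" and depth == 0:
            else_pos = pos
        elif token == "FI":
            if depth == 0:
                return else_pos, pos
            depth -= 1                 # leaving a nested conditional
    raise ValueError("IF without matching FI")


def run(tokens, stack=None):
    """Execute TRUE/FALSE literals and IF / ELSE / FI, including nesting."""
    stack = [] if stack is None else stack
    pos = 0
    while pos < len(tokens):
        token = tokens[pos]
        if token == "IF":
            else_pos, fi_pos = find_branches(tokens, pos)
            true_end = fi_pos if else_pos is None else else_pos
            if stack.pop():
                run(tokens[pos + 1:true_end], stack)
            elif else_pos is not None:
                run(tokens[else_pos + 1:fi_pos], stack)
            pos = fi_pos               # resume after the matching FI
        elif token in ("TRUE", "FALSE"):
            stack.append(token == "TRUE")
        else:
            stack.append(token)        # everything else is data
        pos += 1
    return stack


print(run("TRUE IF TRUE ELSE FALSE FI".split()))   # [True]
print(run("FALSE IF TRUE ELSE FALSE FI".split()))  # [False]
print(run("FALSE IF skipped FI".split()))          # []
```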

Encoding Formats

The first version of CCLang supports the following encoding types:

Hexidecimal -- binary represented as lower case hexadecimal characters.
Base64 -- 62nd and 63rd characters are + and / respectively. No line length maximum; all data in one line with mandatory padding using =.
Base64Url -- 62nd and 63rd characters are - and _ respectively. No line length maximum; all data in one line with no padding.
Base58 -- See definition here.
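The rules for the first three encodings can be sketched with the Python standard library. Base58 is left out because its definition is not reproduced in this document. The identifiers "Hex", "Base64" and "Base64Url" follow the JSON string constants; the function names encode and decode are hypothetical.

```python
import base64


def encode(data, scheme):
    """ENCODE for the three schemes whose rules are spelled out above."""
    if scheme == "Hex":
        return data.hex()                              # lower-case hex
    if scheme == "Base64":
        return base64.b64encode(data).decode("ascii")  # '+', '/', '=' padding
    if scheme == "Base64Url":
        text = base64.urlsafe_b64encode(data).decode("ascii")
        return text.rstrip("=")                        # '-', '_', no padding
    raise ValueError(f"unsupported encoding: {scheme}")


def decode(text, scheme):
    """DECODE, the inverse of encode()."""
    if scheme == "Hex":
        return bytes.fromhex(text)
    if scheme == "Base64":
        return base64.b64decode(text)
    if scheme == "Base64Url":
        padding = "=" * (-len(text) % 4)               # restore stripped padding
        return base64.urlsafe_b64decode(text + padding)
    raise ValueError(f"unsupported encoding: {scheme}")


sample = b"\xfb\xff"
print(encode(sample, "Hex"))        # fbff
print(encode(sample, "Base64"))     # +/8=
print(encode(sample, "Base64Url"))  # -_8
```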

Encryption Algorithms

The first version of CCLang supports the following encryption algorithms:

XSalsa20Poly1305

Signing Algorithms

The first version of CCLang supports the following signing algorithms:

Ed25519

Hashing Algorithms

The first version of CCLang supports the following hashing algorithms:

SHA256
SHA512

Serialization Formats

CCLang is an abstract language definition and does not prescribe how the data and/or opcodes are serialized in any given format. Each serialization format is left to specify how that is done. Below is a sample encoding specification for JSON.

JSON-CCLang

The first job of mapping abstract CCLang to JSON is to decide the string constants to use for the supported encoding types, encryption algorithms, hashing algorithms, and opcodes. Below are lists of constants in CCLang and their string constant equivalents in JSON.

Encoding Constants

Hexidecimal - Hex
Base64 - Base64
Base64Url - Base64Url
Base58 - Base58

Encryption Algorithms

XSalsa20Poly1305 - XSalsa20Poly1305

Signing Algorithms

Ed25519 - Ed25519

Hashing Algorithms

SHA256 - SHA256
SHA512 - SHA512

Opcodes

Equal - =
NotEqual - !=
LessThan - <
LessThanEqual - <=
GreaterThan - >
GreaterThanEqual - >=
DECODE - DECODE
ENCODE - ENCODE
DECRYPT - DECRYPT
ENCRYPT - ENCRYPT
SIGN - SIGN
VERIFY - VERIFY
HASH - HASH
OPEN - OPEN
READ - READ
WRITE - WRITE
SEEK - SEEK
CLOSE - CLOSE
CONCAT - CONCAT
SLICE - SLICE
DUP - DUP
POP - POP
IF - IF
ELSE - ELSE
FI - FI

Encoding

CCLang scripts are stored as items in a JSON list. So a simple hex-encoded public key would be encoded as:

{
  "key": [ "0a7d1d784358af1f8073ba07eb5ae2fc7272a860ec4547de8bc13d04259cd59a", "Hex", "DECODE" ]
}
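Reading such a document and executing the script might look like the following Python sketch. It assumes the script lives under the "key" member as in the example above and handles only the Hex identifier and the DECODE opcode; run_json_script is a hypothetical name.

```python
import json

document = """
{
  "key": [ "0a7d1d784358af1f8073ba07eb5ae2fc7272a860ec4547de8bc13d04259cd59a", "Hex", "DECODE" ]
}
"""


def run_json_script(tokens):
    """Execute a JSON-CCLang token list that only uses Hex and DECODE."""
    stack = []
    for token in tokens:
        if token == "DECODE":
            scheme = stack.pop()
            encoded = stack.pop()
            if scheme != "Hex":
                raise ValueError("only Hex is handled in this sketch")
            stack.append(bytes.fromhex(encoded))
        else:
            stack.append(token)  # JSON strings are pushed as data
    return stack


script = json.loads(document)["key"]
public_key = run_json_script(script)[-1]
print(len(public_key))  # 32
```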

Drawbacks

The only drawback is that the resulting cryptographic constructs are not compact. They can be very long strings, and it may be difficult for a person not experienced in RPN to follow what is happening. The compactness of other cryptographic construct representations is a result of fixing details of the constructs in specifications that are not easy to update and are not transported with the data itself. This makes all other formats non-self-describing. It makes it hard for new systems to know exactly what they must do to make use of any specific cryptographic construct.

Rationale and alternatives

The rationale for CCLang is to standardize a self-describing format for serialized cryptographic constructs. As we develop new applications that use ever more complicated constructs, it is becoming more and more important to adopt a self-describing serialized form. Gone are the days where all we needed to store was the public key and then write the spec to say, "always use algorithm X to decrypt the data". With the increasing use of multi-sig signatures of all types and zero-knowledge proofs (ZKPs), our systems can be greatly enhanced by using a self-describing format. It also makes it easy for implementors to map CCLang to whatever underlying cryptographic library they are using.

Prior art

Systems like Git and Secure Scuttlebutt as well as DID powered systems are struggling to adapt to the new multi-sig and ZKP world in which they operate. So far no good proposals have been made to adapt Git to multi-sig, and Secure Scuttlebutt has rejected multi-sig altogether for now. There have been some attempts at adapting Linked Data Signatures used in DIDs to support multi-sigs. What was proposed is somewhat similar to CCLang but is clunky and doesn't leverage the elegant solution of a stack machine and language.

As stated in the introduction, CCLang is inspired by Bitcoin script and takes ideas from the Forth programming language. Bitcoin script would have been a good alternative but it has purpose-built opcodes that only make sense in the context of Bitcoin transactions. CCLang aims to be a more general design for achieving the same thing as Bitcoin script. In fact, CCLang could be used to implement all of the features in Bitcoin script, only in a slightly more verbose way. Bitcoin script opcodes like OP_CHECKLOCKTIMEVERIFY could be implemented using a sequence of CCLang opcodes that read the nLockTime from the Bitcoin transaction and use IF-ELSE-FI to do the same things that OP_CHECKLOCKTIMEVERIFY does.
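
The lock-time idea can be sketched as a comparison followed by an IF-ELSE-FI branch. The code below is an illustration under several assumptions, not the crate's interpreter: `IF` is assumed to pop a boolean, `a b >=` is assumed to mean `a >= b`, the transaction's nLockTime is passed in as a plain number instead of being fetched with READ, and the `Tok`, `Val`, `eval`, and `locktime_script` names are hypothetical.

```rust
/// Tokens for a tiny subset of the language (illustrative only).
#[derive(Debug, Clone, PartialEq)]
pub enum Tok {
    Num(i64),
    Str(&'static str),
    GreaterThanEqual,
    If,
    Else,
    Fi,
}

/// Values that can sit on the stack in this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Val {
    Num(i64),
    Bool(bool),
    Str(&'static str),
}

/// Evaluates a script and returns the final stack.
pub fn eval(script: &[Tok]) -> Result<Vec<Val>, String> {
    let mut stack: Vec<Val> = Vec::new();
    // One flag per open IF; a token executes only while every flag is true.
    let mut exec: Vec<bool> = Vec::new();
    for tok in script {
        let active = exec.iter().all(|&b| b);
        match tok {
            Tok::If => {
                if active {
                    match stack.pop() {
                        Some(Val::Bool(b)) => exec.push(b),
                        _ => return Err("IF expects a boolean".into()),
                    }
                } else {
                    // Inside a skipped branch: track nesting without executing.
                    exec.push(false);
                }
            }
            Tok::Else => {
                let top = exec.last_mut().ok_or("ELSE without IF")?;
                *top = !*top;
            }
            Tok::Fi => {
                exec.pop().ok_or("FI without IF")?;
            }
            Tok::Num(n) if active => stack.push(Val::Num(*n)),
            Tok::Str(s) if active => stack.push(Val::Str(*s)),
            Tok::GreaterThanEqual if active => match (stack.pop(), stack.pop()) {
                // The first pop is the right-hand operand, the second the left.
                (Some(Val::Num(b)), Some(Val::Num(a))) => stack.push(Val::Bool(a >= b)),
                _ => return Err(">= expects two numbers".into()),
            },
            _ => {} // token inside a branch that is not being executed
        }
    }
    if !exec.is_empty() {
        return Err("missing FI".into());
    }
    Ok(stack)
}

/// Builds: <tx_lock_time> <required> >= IF "spendable" ELSE "locked" FI
pub fn locktime_script(tx_lock_time: i64, required: i64) -> Vec<Tok> {
    vec![
        Tok::Num(tx_lock_time),
        Tok::Num(required),
        Tok::GreaterThanEqual,
        Tok::If,
        Tok::Str("spendable"),
        Tok::Else,
        Tok::Str("locked"),
        Tok::Fi,
    ]
}
```

A script built with `locktime_script(700_000, 650_000)` leaves `"spendable"` on the stack, while a transaction lock time below the required value leaves `"locked"`.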

There is some other prior art documented in Christopher Allen's post on smarter signatures [0]. In that blog post he discusses functional programming inspired solutions like Forth-based scripts such as Bitcoin script. He also references Peter Todd's Dex script that uses Lisp-like s-expression syntax [1].

In the end, the benefits of a self-describing system like Bitcoin script make it better than the "specification assisted" systems used by Git, Secure Scuttlebutt, and DID docs. Bitcoin script is too application-specific to be a good general solution, so CCLang has been created to fill that gap.

Unresolved questions

Should we focus on making sure CCLang is non-Turing-complete? These are essentially small "smart contracts" and therefore they are safety critical. Non-Turing-completeness would allow for the creation of static analysis tools and deterministic evaluation looking for undesirable corner case possibilities.
Should CCLang include a macro-definition system where common sequences of CCLang opcodes can be aliased with a macro name that is used in the serialized versions? This would require a macro setup script that would be implied to run before any CCLang scripts are executed to ensure that all macro definitions are processed and initialized before the scripts that use those macros are executed. A good example of where this would be useful is decoding Base58Check Bitcoin addresses. The encoding and decoding of a Base58Check address requires concatenation, byte masking, and multiple SHA256 hashes for the checksum piece. Having a macro called Base58CheckDecode would make CCLang scripts more readable.
Some crypto libraries like NaCl hide a lot of the "sub-operations" used in their more complicated constructs like the sealed box. This is done on purpose to make the library misuse-resistant, and it hides a lot of inner details. CCLang would be a challenge to map to NaCl without defining new opcodes that map directly to the secret box open/close calls in NaCl.
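
The macro question above can be explored with a small expansion pass that runs before a script is executed. This is a rough sketch, not a proposal for the final design: the `expand` function, the depth limit, and the `DoubleSHA256` macro are hypothetical, and the macro body assumes that HASH takes the algorithm name from the top of the stack. The depth limit is there so that a self-referencing macro fails instead of expanding forever.

```rust
use std::collections::HashMap;

/// Replaces macro names with their bodies, recursively, up to `depth` levels.
/// Tokens that are not macro names pass through unchanged.
pub fn expand<'a>(
    tokens: &[&'a str],
    macros: &HashMap<&'a str, Vec<&'a str>>,
    depth: usize,
) -> Result<Vec<&'a str>, String> {
    if depth == 0 {
        return Err("macro expansion too deep (possible cycle)".to_string());
    }
    let mut out = Vec::new();
    for &tok in tokens {
        match macros.get(tok) {
            // A macro body may itself contain macros, so expand it one level deeper.
            Some(body) => out.extend(expand(body, macros, depth - 1)?),
            None => out.push(tok),
        }
    }
    Ok(out)
}
```

For example, with a table that maps `DoubleSHA256` to `SHA256 HASH SHA256 HASH`, the script `data DoubleSHA256` expands to `data SHA256 HASH SHA256 HASH`. A real Base58CheckDecode macro would follow the same pattern with a longer body.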

References
