3 unstable releases
0.2.1 | May 15, 2022 |
---|---|
0.2.0 | May 14, 2022 |
0.1.2 | |
0.0.0 | May 12, 2022 |
const_cge: Neural Network Compiler
What do?
`const_cge` performs a symbolic evaluation of your neural network at compile time, producing efficient Rust code with identical behavior.
With the information about data dependencies inside the network available, LLVM is able to perform more advanced optimizations, like instruction elision, pipeline-aware reordering, SIMD vectorization, register + stack size minimization, and more.
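To make the idea concrete, here is a minimal hand-written sketch. It is not output produced by `const_cge`; the weights, the layer shape, and the function names are made up for illustration. It contrasts a runtime evaluator that loops over parameter data with the kind of straight-line code a compile-time evaluation could emit for the same tiny network.

```rust
// Hand-written illustration only; this is not output produced by const_cge.

/// Weights and biases of a tiny 2-input, 2-output layer (made-up values).
pub const WEIGHTS: [[f32; 2]; 2] = [[0.5, -1.0], [1.5, 2.0]];
pub const BIAS: [f32; 2] = [0.25, -0.5];

/// Runtime approach: parameters are data, and a generic loop walks over them.
pub fn evaluate_interpreted(input: &[f32; 2], output: &mut [f32; 2]) {
    for (j, out) in output.iter_mut().enumerate() {
        let mut sum = BIAS[j];
        for (i, x) in input.iter().enumerate() {
            sum += WEIGHTS[j][i] * x;
        }
        *out = sum.max(0.0); // ReLU activation
    }
}

/// Compile-time approach: the same arithmetic written as straight-line code,
/// with every parameter baked in as a literal and no loops or indirection.
pub fn evaluate_unrolled(input: &[f32; 2], output: &mut [f32; 2]) {
    let n0: f32 = 0.25 + 0.5 * input[0] + (-1.0) * input[1];
    let n1: f32 = -0.5 + 1.5 * input[0] + 2.0 * input[1];
    output[0] = n0.max(0.0);
    output[1] = n1.max(0.0);
}
```

Both functions perform the same operations in the same order, so they produce the same outputs. In the second form every data dependency is visible to the optimizer, and the input and output sizes are part of the function signature.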
The generated Rust code:
- will never allocate, panic, or rely on `std` (unless using the `std` feature!)
- has perfect determinism
- has input and output dimensions which are statically declared
- has internal data dependencies that are statically analyzable
- utilizes an exactly minimal recurrent state array, or none at all (only pay for what you use)
- statically captures properties of your neural network in the type system
- incurs zero overhead cost at runtime
Check out `eant2` to see how to train a neural network compatible with `const_cge`.
const_cge = "0.2"
Floating Point in `#![no_std]`-land
You can pick a floating point implementation through features: `libm` (default), `std`, or `micromath`, like:
const_cge = "0.2" # use libm
const_cge = { version = "0.2", default-features = false, features = ["std"] } # `no_std` incompatible
const_cge = { version = "0.2", default-features = false, features = ["micromath"] } # use micromath
Simple Example
Network
The `network` macro generates all of the fields and functions required to evaluate our network.
/// Use sensor data to control the limbs of a robot (using f32 only).
#[network("nets/walk.cge", numeric_type = f32)]
struct Walk;
let mut walk = Walk::default();
// I/O is statically sized to match our network
walk.evaluate(&input, &mut output);
Compile Time Guarantees
Nonrecurrent
It is sometimes a problem if a network can squirrel away information about its past states (recurrency).
You can use `nonrecurrent`, which will halt compilation if the imported network contains any recurrency:
/// Predict which lighting color would best
/// complement the current sunlight color
#[nonrecurrent("nets/color.cge")]
struct Color;
// evaluate is now a static function.
// it has no state, and this is captured in the type system.
Color::evaluate(&input, &mut output);
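As a rough sketch of what "no state, captured in the type system" can mean in Rust, the following hand-written stand-in mimics the shape of a stateless network type. The body of `evaluate` and the sizes are invented for illustration and are not what `const_cge` generates.

```rust
/// Hypothetical stand-in for a stateless (non-recurrent) network type.
/// A unit struct has no fields, so there is nowhere to hide past inputs.
pub struct Color;

impl Color {
    /// No `self` parameter: the result depends only on `input`.
    pub fn evaluate(input: &[f32; 3], output: &mut [f32; 3]) {
        for (out, x) in output.iter_mut().zip(input.iter()) {
            // Made-up computation: a clamped complement of each channel.
            let complement: f32 = 1.0 - *x;
            *out = complement.max(0.0);
        }
    }
}
```

Because `evaluate` is an associated function on a zero-sized type, calling it twice with the same input must give the same output.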
Recurrent
Some tasks are best solved using recurrent architectures, and the inclusion of a non-recurrent network would be a mistake.
You can use `recurrent`, which will halt compilation if the imported network contains no recurrency:
/// Detect if our device has just been dropped
/// and is currently falling through the air
#[recurrent("nets/drop.cge")]
struct Dropped;
let mut d = Dropped::default();
d.evaluate(&input, &mut output);
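For contrast with the stateless case, here is a minimal hand-written stand-in for a recurrent network with a single neuron that feeds back into itself. The field name, the weight, and the sizes are assumptions for illustration, not generated code.

```rust
/// Hypothetical stand-in for a recurrent network type.
/// `Default` gives the initially-zero recurrent state.
#[derive(Default)]
pub struct Dropped {
    // One slot per neuron whose previous value is read on the next evaluation.
    recurrent_state: [f32; 1],
}

impl Dropped {
    /// Takes `&mut self` because each evaluation updates the stored state.
    pub fn evaluate(&mut self, input: &[f32; 1], output: &mut [f32; 1]) {
        let previous = self.recurrent_state[0];
        // Made-up neuron: half of its previous value plus the new input.
        let neuron = 0.5 * previous + input[0];
        self.recurrent_state[0] = neuron;
        output[0] = neuron;
    }
}
```

Feeding the same input repeatedly now yields different outputs, since each call also sees the value stored by the previous one.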
Recurrent State?
Recurrent state stores the previous value of a neuron for use in the next evaluation (sent backwards in the network).
The state inside a recurrent network is represented as either `[f64; N]` or `[f32; N]`, and is updated on every evaluation. As mentioned before, it is made only as large as it needs to be.
If you like, you can read this state, modify it, restore it later, etc.
/// Attempt to clarify audio stream
#[recurrent("nets/denoise.cge")]
struct Denoise;
// I want a specific recurrent state,
// not the `::default()` initially-zero recurrent state.
let mut d = Denoise::with_recurrent_state(&saved_state);
// Some evaluations later, read internal state
let state = d.recurrent_state();
// Or modify internal state
do_something_to_state(d.recurrent_state_mut());
// Or set custom state after construction
d.set_recurrent_state(&saved_state);
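One use for these accessors is checkpointing: save the state, then resume from it later and get the same continuation. The sketch below uses a hand-written stand-in struct. The method names come from the example above, but their exact signatures, the state size, and the arithmetic are assumptions.

```rust
/// Hypothetical stand-in for a recurrent network with two state slots.
#[derive(Default)]
pub struct Denoise {
    state: [f32; 2],
}

impl Denoise {
    // Signatures below are assumed; only the method names appear in the README.
    pub fn with_recurrent_state(state: &[f32; 2]) -> Self {
        Self { state: *state }
    }
    pub fn recurrent_state(&self) -> &[f32; 2] {
        &self.state
    }
    pub fn recurrent_state_mut(&mut self) -> &mut [f32; 2] {
        &mut self.state
    }
    pub fn set_recurrent_state(&mut self, state: &[f32; 2]) {
        self.state = *state;
    }

    /// Made-up dynamics: both neurons read values stored by the previous call.
    pub fn evaluate(&mut self, input: &[f32; 1], output: &mut [f32; 1]) {
        let [s0, s1] = self.state;
        let n0 = 0.5 * s0 + input[0];
        let n1 = 0.25 * s1 + s0;
        self.state = [n0, n1];
        output[0] = n0 + n1;
    }
}
```

Copying the array returned by `recurrent_state()` gives a snapshot. Passing that snapshot to `with_recurrent_state` or `set_recurrent_state` and evaluating the same input reproduces the output that followed the snapshot.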
numeric_type
- You often don't need the precision of `f64`, and `f64` is in general larger and slower than `f32`. Using `f64` will behave identically to your CGE file, and so it is the default behavior.
- You can perform (lossy) parameter 'downcasting' on your network, causing all parameters and operations to use your requested type:
#[network("net.cge", numeric_type = f32)]
struct SmallerFaster;
- Only `f64` and `f32` are supported for now. Maybe I will add support for `f16` / integer / fixed-precision in the future.
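To illustrate why the downcast is lossy, here is a small standalone sketch using plain Rust casts. The helper `downcast_params` is hypothetical and is not part of `const_cge`; it only shows what converting `f64` parameters to `f32` does to their values and storage size.

```rust
/// Hypothetical helper: convert a fixed-size parameter array from f64 to f32.
pub fn downcast_params<const N: usize>(params: &[f64; N]) -> [f32; N] {
    let mut out = [0.0_f32; N];
    for (dst, src) in out.iter_mut().zip(params.iter()) {
        // `as f32` rounds to the nearest representable f32 value.
        *dst = *src as f32;
    }
    out
}
```

A value such as `0.5` survives the round trip exactly, while `0.1` comes back slightly different because it has no exact `f32` representation. The `f32` array takes half the storage of the `f64` one.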
Netcrates!
What is a netcrate?
- `const_cge` netcrates are pre-trained neural networks as crates!
- `const_cge` functions as a common format, allowing the community to share neural networks for common tasks.
Let's see how you'd use one!
use netcrate_ocr::ocr;
#[network(ocr)]
struct HandwritingOCR;
Publishing a netcrate
In your `Cargo.toml` file,
- make sure to disable `default-features` for `const_cge`,
- and add an `std` feature:
[dependencies]
const_cge = { version = "0.2", default-features = false } # <== important!
[features]
std = [] # <== important!
In your `src/lib.rs` file,
- make sure to conditionally enable `no_std`:
#![cfg_attr(not(feature = "std"), no_std)] // <== important!
const_cge::netcrate!(ocr_english = "nets/ocr/en.cge");
const_cge::netcrate!(ocr_japanese = "nets/ocr/jp.cge");
Done!
Extensions
If you'd like to provide a nicer interface that wraps your network, please write a macro which provides the implementation, like so:
#[macro_export]
macro_rules! ocr_ext {
($name: ident, $numeric_type: ty) => {
impl $name {
/// Returns the unicode char
pub fn predict_char(&mut self, image: &ImageBuffer) -> char {
// access everything a `const_cge` struct normally has:
let output_dim = $name::OUTPUT_SIZE;
self.recurrent_state_mut()[0] *= -1.0;
// even access the particular activation function implementation the end
// user has chosen:
const_cge::activations::$numeric_type::relu(x);
}
}
// or produce a new struct, whatever you think is best.
struct SmolOCR {
network: $name,
extra_memory_bank: [$numeric_type; 6 * $name::OUTPUT_SIZE]
}
impl SmolOCR {
//...
}
}
}
And an end user can simply:
use netcrate_ocr::*;
#[network(ocr_japanese, numeric_type = f32)]
struct JapaneseOCR;
ocr_ext!(JapaneseOCR, f32);
This approach is a necessary evil because we must allow users to choose their own numerical backend for `no_std` environments, and the options may evolve over time. Writing an extension macro is the least-terrible approach I could think of to fit this particular use-case.
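The following is a minimal, self-contained sketch of the extension-macro pattern that compiles on its own. `TinyOcr` is a hand-written stand-in for a generated network struct, and `argmax_ext` and `predict_index` are hypothetical names. Only `OUTPUT_SIZE` and `evaluate` are taken from the text above, and their exact shapes here are assumptions. It does not reproduce the `const_cge::activations` access from the example above.

```rust
/// Hypothetical stand-in for a struct that a `const_cge` macro would generate.
#[derive(Default)]
pub struct TinyOcr;

impl TinyOcr {
    pub const OUTPUT_SIZE: usize = 3;

    /// Made-up network body with two inputs and three outputs.
    pub fn evaluate(&mut self, input: &[f32; 2], output: &mut [f32; 3]) {
        output[0] = input[0];
        output[1] = input[1];
        output[2] = 0.5 * (input[0] + input[1]);
    }
}

/// Extension macro: adds a convenience method to whichever network struct
/// the end user names, using the numeric type the end user chose.
macro_rules! argmax_ext {
    ($name:ident, $numeric_type:ty) => {
        impl $name {
            /// Returns the index of the largest output (first one wins ties).
            pub fn predict_index(&mut self, input: &[$numeric_type; 2]) -> usize {
                let mut output = [0.0 as $numeric_type; $name::OUTPUT_SIZE];
                self.evaluate(input, &mut output);
                let mut best = 0;
                for i in 1..$name::OUTPUT_SIZE {
                    if output[i] > output[best] {
                        best = i;
                    }
                }
                best
            }
        }
    };
}

// The end user applies the extension to their own network struct.
argmax_ext!(TinyOcr, f32);
```

The macro expands to an ordinary `impl` block at the call site, so the added method is compiled against the struct and numeric type that the end user picked.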
Design Goals & Drawbacks
- You can accomplish quite a lot with "small" networks, especially for control tasks. `const_cge` is not intended for use in "deep learning" tasks (language modeling, etc).
- Tradeoffs that enable embedded use cases (robotics, 5¢ microcontrollers)
- Lots of individual `const_cge` networks in the same binary may end up being larger or slower than a runtime evaluation approach. This will depend on the target machine and the networks you're evaluating. If you really care, measure. This library should cover the common use case perfectly.
MIT License
Copyright © 2022 Will Brickner
Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the "Software"),
to deal in the Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.