#git #features #gio #optimize

git-features

A crate to integrate various capabilities using compile-time feature flags

6 releases (breaking)

new 0.6.0 Sep 14, 2020
0.5.0 Sep 12, 2020
0.4.0 Aug 18, 2020
0.3.0 Aug 12, 2020
0.1.0 Jul 12, 2020

#161 in Development tools

Download history 26/week @ 2020-07-11 27/week @ 2020-07-18 12/week @ 2020-07-25 2/week @ 2020-08-01 178/week @ 2020-08-08 69/week @ 2020-08-15 19/week @ 2020-08-22 20/week @ 2020-08-29 25/week @ 2020-09-05 232/week @ 2020-09-12

158 downloads per month
Used in 6 crates (5 directly)

MIT/Apache

19KB
534 lines

Rust Crates.io

gio is a command-line interface (CLI) to access git repositories. It's written to optimize the user-experience, and perform as good or better than the native implementation, and make git tooling more hackable.

The CLI uses various crates, please see 'Development Status' for details.

asciicast

Development Status

  • gitoxide (CLI)
    • please note that all functionality comes from the gitoxide-core library, which mirrors these capabilities and itself relies on all git-* crates.
    • limit amount of threads used in operations that support it.
    • repository
      • init
    • plumbing
  • git-object
    • represent borrowed commits, trees and tags
    • decode (zero-copy)
      • commit
      • tree
      • tag
    • encode
      • commit
      • tree
      • tag
    • API documentation with examples
  • git-odb
    • loose objects
      • traverse
      • read
        • into memory
        • streaming
      • streaming write
    • packs
      • traverse pack index
      • decode
        • full objects
        • deltified objects
        • Multi-Pack index file (MIDX)
        • 'bitmap' file
      • encode
        • create new packs
      • verify pack with statistics
      • pack streaming (i.e. indexing + resolution)
        • use pack streaming for verification for performance and correctness
    • API documentation with examples
    • promisor
      • It's vague, but these seems to be like index files allowing to fetch objects from a server on demand.
  • git-repository
    • initialize
      • Proper configuration depending on platform (e.g. ignorecase, filemode, …)
    • read and write all data types
    • rev-parsing and ref history
    • remotes with push and pull
    • configuration
    • merging
    • API documentation with examples
    • Commit Graph - split and unsplit
  • git-config
    • read and write git configuration files
    • API documentation with examples
  • git-ref
    • Handle symbolic references and packed references
    • discover them in typical folder structures
    • name validation
    • API documentation with examples
  • git-index
    • read and write a git-index file
    • add and remove entries
    • API documentation with examples
  • git-diff
    • diffing of git-object::Tree structures
    • diffing, merging, working with hunks of data
    • find differences between various states, i.e. index, working tree, commit-tree
    • API documentation with examples
  • git-transport
    • via ssh
      • push
      • pull
    • via https
      • push
      • pull
    • API documentation with examples
  • git-features
    • parallel feature toggle
      • When on…
        • in_parallel
        • join
      • When off all functions execute serially
  • Stress Testing
    • Verify huge packs
    • Explode huge packs to disk and validate loose objects
    • Generate huge back from a lot of loose objects
  • Ideas for Demos
    • A simple git-hours clone
    • Open up SQL for git using sqlite virtual tables. Check out gitqlite as well. What would an MVP look like? Maybe even something that could ship with gitoxide.

Installation

Binary Release

curl -LSfs https://raw.githubusercontent.com/byron/git-oxide/master/ci/install.sh | \
    sh -s -- --git byron/git-oxide --crate gio-max-termion

See the releases section for manual installation and various alternative builds that are slimmer or smaller, depending on your needs, for Linux, MacOS and Windows.

Cargo

cargo is the Rust package manager which can easily be obtained through rustup. With it, you can build your own binary effortlessly and for your particular CPU for additional performance gains.

# The default installation, 'max'
cargo install gitoxide

# On linux, it's a little faster to compile the termion version, which also results in slightly smaller binaries
cargo install gitoxide --no-default-features --features max-termion

# For smaller binaries and even faster build times that are traded for a less fancy CLI implementation, use `lean`
# or `lean-termion` respectively.
cargo install gitoxide --no-default-features --features lean

Usage

Once installed, there are two binaries:

  • gio
    • high level commands, porcelain, for every-day use, optimized for a pleasant user experience
  • giop
    • low level commands, plumbing, for use in more specialized cases

Project Goals

  • a pure-rust implementation of git
    • including transport, object database, references and cli
    • a simple command-line interface is provided for the most common git operations, optimized for user experience. A simple-git if you so will.
    • be the go-to implementation for anyone who wants to solve problems around git, and become the alternative to GitPython in the process.
    • become the foundation for a free distributed alternative to github.
  • learn from the best to write the best possible idiomatic Rust
    • libgit2 is a fantastic resource to see what abstractions work, we will use them
    • use Rust's type system to make misuse impossible
  • be the best performing implementation
    • use Rust's type system to optimize for work not done without being hard to use
    • make use of parallelism from the get go
  • assure on-disk consistency
    • assure reads never interfere with concurrent writes
    • assure multiple concurrent writes don't cause trouble
  • take shortcuts, but not in quality
    • binaries may use anyhow::Error exhaustively, knowing these errors are solely user-facing.
    • libraries use light-weight custom errors implemented using quick-error.
    • internationalization is nothing we are concerned with right now.
    • IO errors due to insufficient amount of open file handles don't always lead to operation failure
  • async as opt-in
    • Making certain capabilities available through async APIs allows for abortable operations, which may be interesting for interactive user interfaces. Thus it is something worth considering, but only behind a feature flag and once the need transpires.
    • Ideally many operations powered by implementors of std::io::{Read, Write} and std::iter::Iterator, which makes unblocking them trivial using the fantastic blocking crate. Only when these are used internally, providing a separate async version of these operations can be beneficial to make them abortable.

Non-Goals

  • replicate git command functionality perfectly
    • git is git, and there is no reason to not use it. Our path is the one of simplicity to make getting started with git easy.
  • be incompatible to git
    • the on-disk format must remain compatible, and we will never contend with it.
  • use async IO everywhere
    • for the most part, git operations are heavily relying on memory mapped IO as well as CPU to decompress data, which doesn't lend itself well to async IO out of the box.

Roadmap to Future

Roadmap to 1.0

Provide a CLI to for the most basic user journey:

  • initialize a repository
  • create a commit
  • add a remote
  • push
    • create (thin) pack

Roadmap to 1.1

  • clone a repository via
    • http(s)
    • ssh

Cargo features guide

Cargo uses feature toggles to control which dependencies are pulled in, allowing users to specialize crates to fit their usage. Ideally, these should be additive. This guide documents which features are available for each of the crates provided here and how they function.

gitoxide

The top-level command-line interface.

  • fast
    • Makes the crate execute as fast as possible by supporting parallel computation of otherwise long-running functions as well as fast, hardware accelerated hashing.
    • If disabled, the binary will be visibly smaller.
  • (mutually exclusive)
    • pretty-cli
      • Use clap + structopt to build the prettiest, best documented and most user-friendly CLI at the expense of file size.
      • provides a terminal user interface for detailed and exhaustive progress.
      • provides a line renderer for log-like progress
    • lean-cli
      • Use argh to produce a usable binary with decent documentation that is smallest in size, usually 300kb less than pretty-cli.
      • If pretty-cli is enabled as well, lean-cli will take precedence, and you pay for building unnecessary dependencies.
      • provides a line renderer for log-like progress
  • prodash-line-renderer-crossterm or prodash-line-renderer-termion (mutually exclusive)
    • The --verbose flag will be powered by an interactive progress mechanism that doubles as log as well as interactive progress that appears after a short duration.

There are convenience features, which combine common choices of the above into one name

  • max = pretty-cli + fast + prodash-tui-renderer-crossterm
    • default, for unix and windows
  • max-termion = pretty-cli + fast + prodash-tui-renderer-termion
    • for unix only, faster compile times, a little smaller
  • lean = lean-cli + fast + prodash-line-renderer-crossterm
    • for unix and windows, significantly smaller than max, but without --progress terminal user interface.
  • lean-termion = lean-cli + fast + prodash-line-renderer-termion
    • for unix only, faster compile times, a little smaller
  • light = lean-cli + fast
    • crossplatform by nature as this comes with simplified log based progress
  • small = lean-cli
    • As small as it can possibly be, no threading, no fast sha1, log based progress only

git-features

A crate to help controlling which capabilities are available from the top-level crate that uses gitoxide-core or any other gitoxide crate that uses git-features. All feature toggles are additive.

  • parallel
    • Use scoped threads and channels to parallelize common workloads on multiple objects. If enabled, it is used everywhere where it makes sense.
    • As caches are likely to be used and instantiated per thread, more memory will be used on top of the costs for threads.
  • fast-sha1
    • a multi-crate implementation that can use hardware acceleration, thus bearing the potential for up to 2Gb/s throughput on CPUs that support it, like AMD Ryzen or Intel Core i3.
  • progress-log
    • Implement the Progress trait using the log crate. Throttle progress output to one every 0.5 seconds unless messages are sent manually.
  • progress-prodash
    • Implement the Progress trait for the tree data structures provided by prodash, which enables using a terminal user interface for progress.

Serialization Support

What follows is feature toggles to control serialization of all public facing simple data types.

  • serde1
    • Data structures implement serde::Serialize and serde::Deserialize

The feature above is provided by the crates:

  • git-object
  • git-odb
  • gitoxide-core

Development Guide

Practices

  • test-first development
    • protect against regression and make implementing features easy
    • user docker to test more elaborate user interactions
    • keep it practical, knowing the Rust compiler already has your back for the mundane things, like unhappy code paths.
    • use git itself as reference implementation, and use their test-cases and fixtures where appropriate
    • use libgit2 test fixtures and cases where appropriate
  • safety first
    • handle all errors, never unwrap.
    • provide an error chain and make it easy to understand what went wrong.
  • strive for an MVP and version 1.0 fast...
    • ...even if that includes only the most common usecases.
  • Prefer to increment major version rapidly...
    • ...instead of keeping major version zero for longer than needed.

Guidelines

  • prepare for SHA256 support by using owned::Id and borrowed::Id
    • eventually there will be the need to support both Sha1 and Sha256. We anticipate it by using the Id type instead of slices or arrays of 20 bytes. This way, eventually we can support multiple hash digest sizes.
    • Right now it's unclear how Sha256 is going to work in git, so we only support Sha1 for now. It might be an avenue to proactively implement it ahead of time once there is a specification to follow.
    • It looks like Git prepares to support it by using compile time, we can support it at runtime though with minimal cost. If needed, we can later remove support using a cargo feature toggle.

Sha256

A bunch of notes collected to keep track of what's needed to eventually support it

  • read hash-function-transition.txt
  • support gpgsig-sha256 field - we won't break, but also don't do anything with it (e.g. extra_headers)
  • support index V3
  • Pack file PSRC field

Plumbing vs Porcelain

Both terms are coming from the git implementation itself, even though it won't necessarily point out which commands are plumbing and which are porcelain. The term plumbing refers to lower-level, more rarely used commands that complement porcelain by being invoked by it or for certain use cases. The term porcelain refers to those with a decent user experience, they are primarily intended for use by humans.

In any case, both types of programs must self-document their capabilities using through the --help flag.

From there, we can derive a few rules to try adhere to:

Plumbing

  • does not show any progress or logging output by default
  • if supported and logging is enabled, it will show timestamps in UTC

Porcelain

  • Provides output to stderr by default to provide progress information. There is no need to allow disabling it, but it shouldn't show up unless the operation takes some time.
  • If timestamps are shown, they are in localtime.
  • Non-progress information goes to stdout.

Maintenance Guide

Utilities to aid in keeping the project fresh and in sync can be found in the Maintenance section of the makefile. Run make to get an overview.

Creating a release

Run etc/release.sh to release all crates in leaf-first order using cargo release.

Which git-version to chase?

Generally, we take the git version installed on ubuntu-latest as the one we stay compatible with (while maintaining backwards compatibility). Certain tests only run on CI, designed to validate certain assumptions still hold against possibly changed git program versions.

This also means that CI may fail despite everything being alright locally, and the fix depends on the problem at hand.

How to update fixtures

Fixtures are created by using a line like this which produces a line we ignore via tail +1 followed by the un-prettified object payload trailed by a newline.

echo c56a8e7aa92c86c41a923bc760d2dc39e8a31cf7  | git cat-file --batch | tail +2 > fixture

Thus one has to post-process the file by reducing its size by one using truncate -s -1 fixture, removing the newline byte.

Shortcomings

  • lean and light and small builds don't support non-UTF-8 paths
  • Packfiles use memory maps
    • Even though they are comfortable to use and fast, they squelch IO errors.
    • potential remedy: We could generalize the Pack to make it possible to work on in-memory buffers directly. That way, one would initialize a Pack by reading the whole file into memory, thus not squelching IO errors at the expense of latency as well as memory efficiency.
  • Packfiles cannot load files bigger than 2^31 or 2^32 on 32 bit systems
    • As these systems cannot address more memory than that.
    • potential remedy: implement a sliding window to map and unmap portions of the file as needed.
  • CRC32 implementation doesn't use SIMD
    • Probably at no cost one could upgrade to the crc32fast crate, but it looks unmaintained and has more code.

Credits

  • itertools (MIT Licensed)
    • We use the izip! macro in code
  • deflate2 (MIT Licensed)
    • We use various abstractions to implement decompression and compression directly on top of the rather low-level miniz_oxide crate

Fun facts

  • Originally I was really fascinated by this problem and believe that with gitoxide it will be possible to provide the fastest solution for it.
  • I have been absolutely blown away by git from the first time I experienced git more than 13 years ago, and tried to implement it in various shapes and forms multiple times. Now with Rust I finally feel to have found the right tool for the job!

Dependencies

~255–570KB
~11K SLoC