

Scie is a research project on how to build a simple code identification engine for different languages.

1 unstable release

0.1.0 Oct 15, 2020


MIT license

1.5MB
23K SLoC

Lines of code by language:

  • TypeScript: 22K SLoC (0.1% comments)
  • JavaScript: 438 SLoC (0.2% comments)
  • Groovy: 106 SLoC (0.3% comments)
  • C++: 75 SLoC (0.1% comments)
  • Python: 69 SLoC (0.2% comments)
  • Perl: 55 SLoC (0.1% comments)
  • PHP: 43 SLoC (0.1% comments)
  • Clojure: 38 SLoC (0.2% comments)
  • PowerShell: 33 SLoC (0.1% comments)
  • Objective-C: 33 SLoC (0.2% comments)
  • Objective-C++: 33 SLoC (0.2% comments)
  • CoffeeScript: 32 SLoC
  • Rust: 32 SLoC
  • C: 27 SLoC (0.1% comments)
  • Ruby: 26 SLoC (0.3% comments)
  • JSX: 24 SLoC (0.2% comments)
  • Java: 23 SLoC (0.3% comments)
  • Go: 21 SLoC
  • Shell: 19 SLoC (0.2% comments)
  • Visual Basic: 16 SLoC (0.3% comments)
  • C#: 16 SLoC
  • Swift: 13 SLoC
  • F#: 11 SLoC (0.2% comments)
  • R: 11 SLoC (0.5% comments)
  • Batch: 11 SLoC (0.3% comments)
  • Lua: 10 SLoC
  • SQL: 6 SLoC
  • TSX: 5 SLoC

Scie


Goal: build a better code identification engine for code refactoring.

  • scie-bingen: generates language bindata.
  • scie-detector: detects frameworks and languages.
  • scie-grammar: a library that helps tokenize text using TextMate grammars.
  • scie-infra: common infrastructure support, such as fs.
  • scie-onig: Rust FFI bindings for Oniguruma.
  • scie-model: common models, based on the VS Code models and the Miao Model.
  • scie-scanner: a wrapper around the Rust Oniguruma FFI API.
  • scie-cli: the command-line interface for Scie.
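scie-grammar tokenizes with TextMate grammars and relies on a scanner (scie-scanner over Oniguruma) to find the next match among all patterns of the current rule. As far as I know, the Oniguruma scanner contract used by vscode-textmate returns the match that starts earliest and, on a tie, the pattern listed first. The sketch below illustrates only that contract in plain Rust. `Scanner`, `ScanMatch`, and `find_next_match` are hypothetical names, not scie-scanner's API, and literal substrings stand in for regular expressions so that only the standard library is needed.

```rust
/// A match reported by the scanner: which pattern matched and where (byte offsets).
#[derive(Debug, PartialEq)]
struct ScanMatch {
    pattern_index: usize,
    start: usize,
    end: usize,
}

/// Hypothetical stand-in for an Oniguruma-backed scanner. Real TextMate
/// patterns are regular expressions; plain substrings keep this std-only.
struct Scanner<'a> {
    patterns: Vec<&'a str>,
}

impl<'a> Scanner<'a> {
    fn new(patterns: Vec<&'a str>) -> Self {
        Scanner { patterns }
    }

    /// Searches `line` from byte offset `from` and returns the pattern whose
    /// match starts earliest; on a tie, the pattern listed first wins.
    fn find_next_match(&self, line: &str, from: usize) -> Option<ScanMatch> {
        // `get` returns None if `from` is out of range or not a char boundary.
        let haystack = line.get(from..)?;
        let mut best: Option<ScanMatch> = None;
        for (i, &pat) in self.patterns.iter().enumerate() {
            // Skip empty patterns: they would match everywhere with zero width.
            if pat.is_empty() {
                continue;
            }
            if let Some(pos) = haystack.find(pat) {
                let start = from + pos;
                // Strict `<` keeps the earlier-listed pattern on ties.
                if best.as_ref().map_or(true, |b| start < b.start) {
                    best = Some(ScanMatch { pattern_index: i, start, end: start + pat.len() });
                }
            }
        }
        best
    }
}

fn main() {
    let scanner = Scanner::new(vec!["\"", "//", "let"]);
    let line = "let s = \"hi\"; // greet";
    let mut pos = 0;
    // Walk the line the way a tokenizer would: resume after each match.
    while let Some(m) = scanner.find_next_match(line, pos) {
        println!("pattern {} matched {:?} at {}..{}", m.pattern_index, &line[m.start..m.end], m.start, m.end);
        pos = m.end;
    }
}
```

A real tokenizer also has to guard against zero-width regex matches (for example by advancing one character), which this literal-only sketch sidesteps by skipping empty patterns.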

Guideline

Major issues:

  • performance
    • rules in Grammar
    • UTF-8 to UTF-16 conversion in UtfString
    • general issues
  • instability
    • Random test failures in OnigScanner.
    • GC issues in OnigScanner.
      • The GC issue seems to be resolved by using jemalloc.
      • Signal 6 (SIGABRT): libc and other libraries commonly raise SIGABRT to abort the program on critical errors. For example, glibc raises SIGABRT when it detects a double free or other heap corruption.
      • possibly a UTF-8 encoding issue
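Oniguruma reports match positions as UTF-8 byte offsets, while vscode-textmate, being TypeScript, indexes strings in UTF-16 code units, so a port has to translate between the two. Assuming that is the conversion UtfString performs, one way to keep it cheap is to build both offset tables once per line, so each later lookup is a single index instead of a rescan. `OffsetMap`, `to_utf16`, and `to_utf8` are hypothetical names used only for this sketch.

```rust
/// Offset tables between UTF-8 byte offsets and UTF-16 code-unit offsets
/// for one string, built once so each later lookup is O(1).
struct OffsetMap {
    /// Entry `b` is the UTF-16 offset of the character containing byte `b`;
    /// the extra last entry maps `s.len()` to the UTF-16 length.
    utf8_to_utf16: Vec<usize>,
    /// Entry `u` is the byte offset of the character containing UTF-16 unit `u`;
    /// the extra last entry maps the UTF-16 length to `s.len()`.
    utf16_to_utf8: Vec<usize>,
}

impl OffsetMap {
    fn new(s: &str) -> Self {
        let mut utf8_to_utf16 = vec![0; s.len() + 1];
        let mut utf16_to_utf8 = Vec::with_capacity(s.len() + 1);
        let mut utf16_pos = 0;
        for (byte_pos, ch) in s.char_indices() {
            // Every byte of a multi-byte character maps to the character's start.
            for b in byte_pos..byte_pos + ch.len_utf8() {
                utf8_to_utf16[b] = utf16_pos;
            }
            // Characters outside the BMP take two UTF-16 units (a surrogate pair).
            for _ in 0..ch.len_utf16() {
                utf16_to_utf8.push(byte_pos);
            }
            utf16_pos += ch.len_utf16();
        }
        utf8_to_utf16[s.len()] = utf16_pos;
        utf16_to_utf8.push(s.len());
        OffsetMap { utf8_to_utf16, utf16_to_utf8 }
    }

    fn to_utf16(&self, byte_offset: usize) -> usize {
        self.utf8_to_utf16[byte_offset]
    }

    fn to_utf8(&self, utf16_offset: usize) -> usize {
        self.utf16_to_utf8[utf16_offset]
    }
}

fn main() {
    // 'a' (1 byte), U+20AC (3 bytes, 1 unit), U+1F600 (4 bytes, 2 units), 'b'
    let s = "a\u{20AC}\u{1F600}b";
    let map = OffsetMap::new(s);
    println!("byte 8 -> utf16 {}", map.to_utf16(8)); // prints 4: 'b'
    println!("utf16 4 -> byte {}", map.to_utf8(4)); // prints 8: 'b'
}
```

The tables cost O(n) memory per line; that trade is usually worthwhile when a tokenizer converts many match offsets on the same line.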

Todo:

  • replace the Oniguruma FFI with fancy-regex for a pure-Rust implementation
  • process todo
    • back references
    • multiple languages
  • rewrite vscode-textmate in Rust
    • languages for testing
    • support other languages
  • benchmark
    • faster than the VS Code version
  • support for multiple languages in one project
  • analyser
    • line counts
    • keywords map
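A minimal sketch of the two analyser items, assuming "line counts" means total, blank, comment, and code lines and "keywords map" means occurrence counts per keyword. `LineCounts`, `count_lines`, and `keyword_map` are hypothetical; only single-line comments with a fixed prefix are recognized, so block comments and keywords inside string literals are not handled.

```rust
use std::collections::BTreeMap;

/// Per-file line statistics.
#[derive(Debug, Default, PartialEq)]
struct LineCounts {
    total: usize,
    blank: usize,
    comment: usize,
    code: usize,
}

/// Classifies each line as blank, single-line comment, or code.
fn count_lines(source: &str, comment_prefix: &str) -> LineCounts {
    let mut counts = LineCounts::default();
    for line in source.lines() {
        counts.total += 1;
        let trimmed = line.trim();
        if trimmed.is_empty() {
            counts.blank += 1;
        } else if trimmed.starts_with(comment_prefix) {
            counts.comment += 1;
        } else {
            counts.code += 1;
        }
    }
    counts
}

/// Counts how often each keyword appears as a whole identifier-like word.
fn keyword_map(source: &str, keywords: &[&str]) -> BTreeMap<String, usize> {
    let mut map = BTreeMap::new();
    // Split on anything that cannot be part of an identifier.
    for word in source.split(|c: char| !(c.is_alphanumeric() || c == '_')) {
        if keywords.contains(&word) {
            *map.entry(word.to_string()).or_insert(0) += 1;
        }
    }
    map
}

fn main() {
    let src = "fn main() {\n    // entry point\n\n    let x = 1;\n    let y = x;\n}\n";
    println!("{:?}", count_lines(src, "//"));
    println!("{:?}", keyword_map(src, &["fn", "let", "struct"]));
}
```

A `BTreeMap` is used so the keyword map prints in a stable, sorted order, which keeps analyser output easy to diff between runs.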

DevSetup

  1. Clone the repository:

git clone https://github.com/phodal/scie/

  2. Run Scie:

cargo run scie

Install just:

curl --proto '=https' --tlsv1.2 -sSf https://just.systems/install.sh | bash -s -- --to DEST

Run the tests:

just tests

Documents

refs

License

scie-grammar is based on vscode-textmate (MIT license); see scie-grammar/src/scanner/LICENSE.

onigvs is based on rust-onig.

Phodal's Idea

© 2020 A Phodal Huang's Idea. This code is distributed under the MPL license. See LICENSE in this directory.

No runtime deps