#regex #string-matching #resources #interpretation

nightly no-std repr

The regular-expression-as-linear-logic interpretation and its implementation

20 releases (6 breaking)

0.7.0 Sep 19, 2024
0.5.3 Sep 14, 2024
0.0.10 Aug 27, 2023
0.0.9 Mar 18, 2023

#379 in Text processing

Download history 1/week @ 2024-07-23 827/week @ 2024-09-10 300/week @ 2024-09-17 90/week @ 2024-09-24 77/week @ 2024-10-01 6/week @ 2024-10-08

904 downloads per month

MIT/Apache

285KB
4.5K SLoC

About

By regarding matching as an assignment of occurrences of strings to each part of an expression, regular expressions are resource (limited occurrences of strings) consumption (exclusive assignment/matching of them).

Kind

  • Decidable (or only complete?)
  • Orderd/non-commutative
  • Linear/non-deterministic (βŠ‡ regular/deterministic)
  • Unitless/promonoidal

TODO

  • Is there such as selective/projective mul to concatenate only one side of And?
  • Representation of the four units, or do we need them? (Decidability is affecting)
    • 1 as Seq::empty(), 𝟏 ≑ Ξ½X.X
    • 𝟎 ≑ ΞΌX.X
    • ⊀ as Inf(Interval::full())
    • βŠ₯ as a & b when a β‰  b (?A)
  • One as Seq(['a']) vs Interval('a', 'a')
  • Relatipnship between nominality, input-ness/string, and function arguments/abstraction, the Integral trait and the De Bruijn index
  • TextStart and TextEnd as left / right units
  • Interpretation of match, is it a judgement a : A or a test that one is divisible by another (quotient is equal to zero) a / b, one (string) is contained by another (regex) a β†’ b? Proof search
  • Two types of negations for each positive connectives: rev and div for Seq, not and sub for Interval
  • a.le(b) β†’ a.and(b) = a, a.add(b) = b
  • Equational reasoning, bisimulation
  • Induction
  • True(Seq<I>, Box<Repr<I>>), assertions, dependent types, backreference
  • action a.P, a βŠ™ P, guard, scalar product
  • Normalisation vs equality rules
  • characteristic function/morphism
  • weighted limit, weighted colimit, end, coend
  • parameterise some laws as features
  • Spherical conic (tennis ball)

Xor

  • lookahead/behind, multiple futures, communication and discard, ignore combinator
  • Split, subspace

Correspondence

formal language theory /
automata theory /
set theory
linear logic repr type theory /
category theory /
coalgebra
len process calculus probability theory /
learning theory
quantum theory
a ∈ L (match) a : A (judgement)
$βˆ…$ $0$ (additive falsity) Zero 𝟎 (empty type) nil, STOP
$⊀$ (additive truth) True
a a One(Seq(a)) len(a) (sequential composition, prefix)
$Ρ$ (empty) ∈ {Ρ} $1$ (multiplicative truth) Seq([]) * : 𝟏 (unit type) 0 SKIP
. Interval(MIN, MAX) 1
ab / $a Β· b$ (concatenation) $a βŠ— b$ (multiplicative conjunction/times) Mul(a, b) $a βŠ— b$ (tensor product) len(a) + len(b) P ||| Q (interleaving)
a|b (alternation),
$a βˆͺ b$ (union)
$a βŠ• b$ (additive disjuction/plus) Or(a, b) $a + b$ (coproduct) max(len(a), len(b)) (deterministic choice)
a* (kleen star),
..|aa|a|Ξ΅
$ΞΌX.𝟏 + (L(a) βŠ— X)$
TODO $!a$ (exponential conjunction/of course) Inf(a) $Ξ½X.𝟏 & a & (X βŠ— X)$ (replication)
a*? (non greedy)
TODO $?a$ (exponential disjunction/why not) Sup(a) $Β΅X.βŠ₯ βŠ• a βŠ• (X β…‹ X)$
a? a + 1 Or(Zero, a)
a{n,m} (repetition) a Or(Mul(a, Mul(a, ..)), Or(..))
[a-z] (class) Interval(a, z)
[^a-z] (negation) TODO this is complementary op Neg(a)/-a
a† (reverse) right law vs left law a.rev() len(a)
$a / b$ (right quotient) $a ⊸ b$ Div(a, b) len(a) - len(b) (hiding)
a \ b (left quotient) Div(a, b) (hiding)
a β…‹ b (multiplicative disjunction/par) Add(a, b) $a βŠ• b$ (direct sum) (nondeterministic choice)
$a ∩ b$ (intersection) a & b (additive conjunction/with) And(a, b) $a Γ— b$ (product) (interface parallel)
a(?=b) (positive lookahead)
a(?!b) (negative lookahead)
(?<=a)b (positive lookbehind)
(?<!a)b (negative lookbehind)
$a βŠ† b, a ≀ b$ (containmemt) $a ≀ b (≃ a = b β…‹ a < b)$ a.le(b)
aβŠ₯ (dual) a.dual()
a = b (equality) a = b (identity type)

About symbols

Symbols are grouped and assigned primarily by additive/multiplicative distinciton. They are corresponding to whether computation exits or rather continues; though concatenation βŠ—/Mul/* has conjunctive meaning, computation doesn't complete/exit at not satisfying one criterion for there are still different ways of partition to try (backtracking). Though RegexSet β…‹/Add/+ has disjunctive meaning, computation doesn't complete/exit at satisfying one criterion to return which regex has match. On the other hand, alternation βŠ•/Or/| and intersection &/And/& early-break, hence order-dependent. When I add Map variant to Repr to execute arbitrary functions, this order-dependency suddenly becomes to matter for those arbitrary functions can have effects, for example, modification/replacement of input string. (Effects are additive.)

additive multiplicative exponential
positive/extensional $βŠ•$ $0$ Or $βŠ—$ $1$ Mul !
negative/intensional & ⊀ And β…‹ βŠ₯ Add ?

Properties/Coherence

  • All connectives are associative
  • Additive connectives are commutative and idempotent

TODO:

  • Seq::empty(), Ξ΅ - can be empty because negative
  • Interval::full() - can't be empty because positive
name regular expressions linear logic type theory repr
or-associativity $a | (b | c) = (a | b) | c$ $a βŠ• (b βŠ• c) = (a βŠ• b) βŠ• c$ Or(a, Or(b, c)) = Or(Or(a, b), c)
or-idempotence $a | a = a$ $a βŠ• a = a$ Or(a, a) = a
or-unit $a βŠ• 0 = 0 βŠ• a = a$ Or(a, Zero) = Or(Zero, a) = a
mul-unit $a Β· Ξ΅ = Ξ΅ Β· a = a$ $a βŠ— 1 = 1 βŠ— a = a$ Mul(a, One('')) = Mul(One(''), a) = a
right-distributivity $a Β· (b | c) = (a Β· b) | (a Β· c)$ $a βŠ— (b βŠ• c) = (a βŠ— b) βŠ• (a βŠ— c)$ Mul(a, Or(b, c)) = Or(Mul(a, b), Mul(a, c))
left-distributivity $(a | b) Β· c = (a Β· c) | (b Β· c)$ $(a βŠ• b) βŠ— c = (a βŠ— c) βŠ• (b βŠ— c)$ Mul(Or(a, b), c) = Or(Mul(a, c), Mul(b, c))
$Ξ΅^† = Ξ΅$
(a & b)† = (b†) & (a†) Rev(Mul(a, b)) = Mul(Rev(b), Rev(a))
Mul(One(a), One(b)) = One(ab)
right-distributivity a β…‹ (b & c) = (a β…‹ b) & (a β…‹ c) Add(a, And(b, c)) = And(Add(a, b), Add(a, c))
left-distributivity $(a & b) β…‹ c = (a β…‹ c) & (b β…‹ c)$ Add(And(a, b), c) = And(Add(a, c), Add(b, c))
and-idempotence $a & a = a$ And(a, a) = a
exponential $(a & b)* = a* Β· b*$ $!(A & B) ≣ !A βŠ— !B$
exponential $(a | b)? = a? β…‹ b?$ $?(A βŠ• B) ≣ ?A β…‹ ?B$

Relationship among additive, multiplicative and exponential

  • exp(a + b) = exp(a) * exp(b)

Linearity (which)

  • f(a + b) = f(a) + f(b)
  • (a β†’ b) + (b β†’ a) (non-constructive)
  • functions take only one argument
  • presheaves of modules

πœ•, derivation

math repr
$πœ•(a βŠ— b) ≃ πœ•(a) βŠ— b + a βŠ— πœ•(b)$ der(Mul(a, b)) = Or(Mul(der(a), b), Mul(a, d(b)))
  • d(Zero) = Zero
  • d(Or(a, b)) = Or(d(a), d(b))
  • d(Mul(a, b)) = Or(Mul(d(a), b), Mul(a, d(b)) *
  • d(Inf(a)) = Mul(d(a), Inf(a))
  • a : D(a)

True

  • And(True, a) ->

Stream processor

  • Ξ½X.ΞΌY. (A β†’ Y) + B Γ— X + 1

Flags (TODO)

  • i, CaseInsensitive

  • m, MultiLine

  • s, DotMatchesNewLine

  • U, SwapGreed

  • u, Unicode

  • x, IgnoreWhitespace

  • Recouse non-greedy pattern to _

Algorithms (TODO)

  • Bit-pararell
  • aho-corasick
  • Boyer–Moore
  • memchr

Semantics (TODO)

  • Coherent space

References

  • Completeness for Categories of Generalized Automata, Guido Boccali, Andrea Laretto, Fosco Loregian, Stefano Luneia, 2023 [arXiv]
  • Beyond Initial Algebras and Final Coalgebras, Ezra Schoen, Jade Master, Clemens Kupke, 2023 [arXiv]
  • The Functional Machine Calculus, Willem Heijltjes, 2023 [arXiv]
  • The Functional Machine Calculus II: Semantics, Chris Barrett, Willem Heijltjes, Guy McCusker, 2023 [arXiv]
  • Coend Calculus, Fosco Loregian, 2020 [arXiv]
  • Partial Derivatives for Context-Free Languages: From ΞΌ-Regular Expressions to Pushdown Automata, Peter Thiemann, 2017 [arXiv]
  • Proof Equivalence in MLL Is PSPACE-Complete, Willem Heijltjes, Robin Houston, 2016 [arXiv]
  • Linear Logic Without Units, Robin Houston, 2013 [arXiv]
  • Imperative Programs as Proofs via Game Semantics, Martin Churchill, Jim Laird, Guy McCusker, 2013 [arXiv]
  • Regular Expression Containment as a Proof Search Problem, Vladimir Komendantsky, 2011 [inria]

Dependencies

~2.4–4MB
~73K SLoC