#regex #expression #interpretation #non-deterministic #pattern #resources #math

nightly no-std repr

The regular-expression-as-linear-logic interpretation and its implementation

11 releases

0.0.10 Aug 27, 2023
0.0.9 Mar 18, 2023
0.0.3 Feb 28, 2023

#761 in Algorithms

MIT/Apache

285KB
4.5K SLoC

Kind

  • Decidable (or only complete?)
  • Orderd/non-commutative
  • Linear/non-deterministic (βŠ‡ regular/deterministic)
  • Unitless/promonoidal

TODO

  • Is there such as selective/projective mul to concatenate only one side of And?
  • Representation of the four units, or do we need them? (Decidability is affecting)
    • 1 as Seq::empty(), 𝟏 ≑ Ξ½X.X
    • 𝟎 ≑ ΞΌX.X
    • ⊀ as Inf(Interval::full())
    • βŠ₯ as a & b when a β‰  b (?A)
  • One as Seq(['a']) vs Interval('a', 'a')
  • Relatipnship between nominality, input-ness/string, and function arguments/abstraction, the Integral trait and the De Bruijn index
  • TextStart and TextEnd as left / right units
  • Interpretation of match, is it a judgement a : A or a test that one is divisible by another (quotient is equal to zero) a / b, one (string) is contained by another (regex) a β†’ b? Proof search
  • Two types of negations for each positive connectives: rev and div for Seq, not and sub for Interval
  • a.le(b) β†’ a.and(b) = a, a.add(b) = b
  • Equational reasoning, bisimulation
  • Induction
  • True(Seq<I>, Box<Repr<I>>), assertions, dependent types, backreference
  • action a.P, a βŠ™ P, guard, scalar product
  • Normalisation vs equality rules
  • characteristic function/morphism
  • weighted limit, weighted colimit, end, coend
  • parameterise some laws as features
  • Spherical conic (tennis ball)

Map

  • The functional equivalent to grouping and capturing.

Xor

  • lookahead/behind, multiple futures, communication and discard, ignore combinator
  • Split, subspace

Comparisons

regular expressions /
set theory
linear logic repr type theory /
category theory
len process calculus probability theory /
learning theory
quantum theory
a ∈ L (match) a : A (judgement)
$βˆ…$ $0$ nil, STOP
$⊀$ True
a a One(Seq(a)) len(a) (sequential composition, prefix)
$Ρ$ (empty) ∈ {Ρ} $1$ Seq([]) * : 1 0 SKIP
. Interval(MIN, MAX) 1
ab / $a Β· b$ (concatenation) $a βŠ— b$ (multiplicative conjunction/times) Mul(a, b) $a βŠ— b$ (tensor product) len(a) + len(b) P ||| Q (interleaving)
a|b (alternation),
$a βˆͺ b$ (union)
$a βŠ• b$ (additive disjuction/plus) Or(a, b) $a + b$ (coproduct) max(len(a), len(b)) (deterministic choice)
a* (kleen star),
..|aa|a|Ξ΅
$!a$ (exponential conjunction/of course),
Ξ½X.1 & a & (X βŠ— X)
Inf(a) Ξ½, fixed point/trace, comonad, final coalgebra (replication)
a*? (non greedy),
Ξ΅|a|aa|..
$?a$ (exponential disjunction/why not),
$Β΅X.βŠ₯ βŠ• a βŠ• (X β…‹ X)$
Sup(a) ΞΌ, monad, initial algebra
a? a + 1 Or(Zero, a)
a{n,m} (repetition) a Or(Mul(a, Mul(a, ..)), Or(..))
[a-z] (class) Interval(a, z)
[^a-z] (negation) TODO this is complementary op Neg(a)/-a
a† (reverse) right law vs left law a.rev() len(a)
$a / b$ (right quotient) $a ⊸ b$ Div(a, b) len(a) - len(b) (hiding)
a \ b (left quotient) Div(a, b) (hiding)
RegexSet a β…‹ b (multiplicative disjunction/par) Add(a, b) $a βŠ• b$ (direct sum) (nondeterministic choice)
$a ∩ b$ (intersection) a & b (additive conjunction/with) And(a, b) $a Γ— b$ (product) (interface parallel)
a(?=b) (positive lookahead) And(a, b)
a(?!b) (negative lookahead) And(a, Not(b))
(?<=a)b (positive lookbehind) And(a, b)
(?<!a)b (negative lookbehind) And(a, b)
$a βŠ† b, a ≀ b$ (containmemt) $a ≀ b (≃ a = b β…‹ a < b)$ a.le(b)
aβŠ₯ (dual) a.dual()
a = b (equality) a = b (identity type)

About symbols

Symbols are grouped and assigned primarily by additive/multiplicative distinciton. They are corresponding to whether computation exits or rather continues; though concatenation βŠ—/Mul/* has conjunctive meaning, computation doesn't complete/exit at not satisfying one criterion for there are still different ways of partition to try (backtracking). Though RegexSet β…‹/Add/+ has disjunctive meaning, computation doesn't complete/exit at satisfying one criterion to return which regex has match. On the other hand, alternation βŠ•/Or/| and intersection &/And/& early-break, hence order-dependent. When I add Map variant to Repr to execute arbitrary functions, this order-dependency suddenly becomes to matter for those arbitrary functions can have effects, for example, modification/replacement of input string. (Effects are additive.)

additive multiplicative exponential
positive/extensional $βŠ•$ $0$ Or $βŠ—$ $1$ Mul !
negative/intensional & ⊀ And β…‹ βŠ₯ Add ?

Laws/Coherence

  • All connectives are associative
  • Additive connectives are commutative and idempotent

TODO:

  • Seq::empty(), Ξ΅ - can be empty because negative
  • Interval::full() - can't be empty because positive
regular expressions linear logic/quantale repr title
$a | (b | c) = (a | b) | c$ $a βŠ• (b βŠ• c) = (a βŠ• b) βŠ• c$ Or(a, Or(b, c)) = Or(Or(a, b), c) Or-associativity
a | a = a $a βŠ• a = a$ Or(a, a) = a Or-idempotence
$a βŠ• 0 = 0 βŠ• a = a$ Or(a, Zero) = Or(Zero, a) = a Zero, Or-unit
$a Β· Ξ΅ = Ξ΅ Β· a = a$ $a βŠ— 1 = 1 βŠ— a = a$ Mul(a, One('')) = Mul(One(''), a) = a One(''), Mul-unit
$a Β· (b | c)$ $a βŠ— (b βŠ• c) = (a βŠ— b) βŠ• (a βŠ— c)$ Mul(a, Or(b, c)) = Or(Mul(a, b), Mul(a, c)) right-distributivity
$(a βŠ• b) βŠ— c = (a βŠ— c) βŠ• (b βŠ— c)$ Mul(Or(a, b), c) = Or(Mul(a, c), Mul(b, c)) left-distributivity
$Ξ΅^† = Ξ΅$
(a & b)† = (b†) & (a†) Rev(Mul(a, b)) = Mul(Rev(b), Rev(a))
Mul(One(a), One(b)) = One(ab)
a β…‹ (b & c) = (a β…‹ b) & (a β…‹ c) Add(a, And(b, c)) = And(Add(a, b), Add(a, c)) right-distributivity
(a & b) β…‹ c = (a β…‹ c) & (b β…‹ c) Add(And(a, b), c) = And(Add(a, c), Add(b, c)) left-distributivity
a & a = a And(a, a) = a And-idempotence

Relationship among additive, multiplicative and exponential

  • exp(a + b) = exp(a) * exp(b)

Linearity (which)

  • f(a + b) = f(a) + f(b)
  • (a β†’ b) + (b β†’ a) (non-constructive)
  • functions take only one argument
  • presheaves of modules

πœ•, derivation

math repr
$πœ•(a βŠ— b) ≃ πœ•(a) βŠ— b + a βŠ— πœ•(b)$ der(Mul(a, b)) = Or(Mul(der(a), b), Mul(a, d(b)))
  • d(Zero) = Zero
  • d(Or(a, b)) = Or(d(a), d(b))
  • d(Mul(a, b)) = Or(Mul(d(a), b), Mul(a, d(b)) *
  • d(Inf(a)) = Mul(d(a), Inf(a))
  • a : D(a)

True

  • And(True, a) ->

Stream processor

  • Ξ½X.ΞΌY. (A β†’ Y) + B Γ— X + 1

Flags (TODO)

  • i, CaseInsensitive

  • m, MultiLine

  • s, DotMatchesNewLine

  • U, SwapGreed

  • u, Unicode

  • x, IgnoreWhitespace

  • Recouse non-greedy pattern to _

Interpretations

  • By regarding matching as an assignment of occurrences of strings to each part of an expression, regular expressions are resource (limited occurrences of strings) consumption (exclusive assignment/matching of them).

Let's not say 'communication'

  • local β†”οΈŽ global
  • context solving/fitting/providing with each other

Drawing Hands

Algorithms (TODO)

  • Bit-pararell
  • aho-corasick
  • Boyer–Moore
  • memchr

Semantics (TODO)

  • Coherent space

References

  • Completeness for Categories of Generalized Automata, Guido Boccali, Andrea Laretto, Fosco Loregian, Stefano Luneia, 2023 [arxiv]
  • Beyond Initial Algebras and Final Coalgebras, Ezra Schoen, Jade Master, Clemens Kupke, 2023 [arxiv]
  • The Functional Machine Calculus, Willem Heijltjes, 2023 [arxiv]
  • The Functional Machine Calculus II: Semantics, Chris Barrett, Willem Heijltjes, Guy McCusker, 2023 [arxiv]
  • Differential 2-Rigs, Fosco Loregian, Todd Trimble, 2022 [arxiv]
  • On the Pre- and Promonoidal Structure of Spacetime, James Hefford, Aleks Kissinger, 2022 [arxiv]
  • Diagrammatic Differentiation for Quantum Machine Learning, Alexis Toumi, Richie Yeung, Giovanni de Felice, 2021 [arxiv]
  • Coend Calculus, Fosco Loregian, 2020 [arxiv]
  • Proof Equivalence in MLL Is PSPACE-Complete, Willem Heijltjes, Robin Houston, 2016 [arxiv]
  • Linear Logic Without Units, Robin Houston, 2013 [arxiv]
  • Imperative Programs as Proofs via Game Semantics, Martin Churchill, Jim Laird, Guy McCusker, 2013 [arxiv]
  • Regular Expression Containment as a Proof Search Problem, Vladimir Komendantsky, 2011 [inria]

Dependencies

~2–2.6MB
~71K SLoC