Version 0.0.0 (unstable release), Jun 2, 2021
Monty
A Strongly Typed Python Dialect
Index
- Index
- Brief
- Building the compiler
- Using the compiler
- Just some thoughts on how we can keep the dynamic feeling.
- Related projects
Brief
Monty (/ˈmɒntɪ/) is an attempt to provide a completely organic alternative dialect of Python equipped with a stronger, safer, and smarter type system.
At a high level, Monty can be closely compared with what TypeScript does for JavaScript. The core contrast between Monty and TypeScript, however, is that TS is a strict syntactical superset of JS, whereas Monty is a strict syntactical subset of Python. In other words, TS adds backwards-incompatible syntax to JS, while Monty disallows existing Python syntax and semantics in a backwards-compatible manner.
Monty is intended to be compiled to native executable binaries or WASM using Cranelift (and maybe LLVM, if support for that ever lands).
Building the compiler
You will need a recent nightly version of Rust (1.53.0-nightly or so) in order to build.
After that, it's as simple as running cargo build.
Using the compiler
It is strongly advised to use clang and enable the mold linker.
I suggest you download and install both, since they (especially mold) will improve the performance of the final steps of compilation.
Make sure to check the compiler's help output with --help, as it will be more up to date than this example.
The compiler aims to be easy to use. Invoking the following command will produce a mostly statically linked binary named ./file (or ./file.exe on Windows):
./montyc ./path/to/file.py
You may also specify the path to the local C compiler via --cc="path/to/cc" and a linker via --ld="path/to/ld".
Just some thoughts on how we can keep the dynamic feeling.
Apart from following the regular semantics of interpreted Python, Monty will selectively disallow parts of the language (depending on how hard the feature is to translate to compiled code).
"automatic unions"
It's useful to be able to represent multiple types of values within one object. cough cough polymorphism cough cough
Variables in Monty may only have one type per scope: you may not re-assign a value of a different type to a variable.
def badly_typed():
    this = 1
    this = "foo"
You may, however, have a union of types, which is represented like a tagged union in C or an enum in Rust. typing.Union[T, ...] is the Pythonic way to annotate a union explicitly, but in Monty you may use the newer literal syntax T | U from PEP 604:
def correctly_typed():
    this: int | str = 1
    this = "foo"
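To make the one-type-per-scope rule concrete, here is a minimal sketch of such a check written against Python's standard ast module. It is not how montyc implements the rule: the function name find_retyped_variables is hypothetical, only constant assignments directly inside a function body are considered, and explicitly annotated assignments are assumed to be the user's declared type and are skipped.

```python
import ast


def find_retyped_variables(source):
    """Return names that are assigned constants of more than one type
    within a single function body (hypothetical helper, not montyc's API)."""
    offenders = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        seen = {}  # variable name -> type of the first constant assigned to it
        for stmt in func.body:
            # Annotated assignments (ast.AnnAssign) are skipped on purpose:
            # the annotation is taken as the declared type.
            if not (isinstance(stmt, ast.Assign) and isinstance(stmt.value, ast.Constant)):
                continue
            value_type = type(stmt.value.value)
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    first_type = seen.setdefault(target.id, value_type)
                    if first_type is not value_type and target.id not in offenders:
                        offenders.append(target.id)
    return offenders


BAD = 'def badly_typed():\n    this = 1\n    this = "foo"\n'
GOOD = 'def correctly_typed():\n    this: int | str = 1\n    this = "foo"\n'

print(find_retyped_variables(BAD))   # the re-typed variable is reported
print(find_retyped_variables(GOOD))  # nothing is reported
```

A real checker would also have to follow branches, loops, and non-constant expressions; the sketch only shows the shape of the rule.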
But wait! Say you have the following code:
def foo() -> int:
    return 1
def bar() -> str:
    return "foo"
def baz(control: bool):
    x = foo() if control else bar()
What's the type of x in the function baz now?
Some might expect this to be a type error: after all, foo and bar return incompatible types and both get assigned to x. But this isn't the case unless you explicitly annotate x to be one of int or str.
What happens instead is that the compiler will "synthesize" (create) a union type for you, so the type of x will be int | str, or Union[int, str].
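The idea of synthesizing a union from the branches of an expression can be imitated at runtime with the standard typing module. This is only an illustration of the resulting type, not montyc's inference algorithm; synthesize_union is a hypothetical helper name.

```python
from typing import Union, get_args, get_type_hints


def foo() -> int:
    return 1


def bar() -> str:
    return "foo"


def synthesize_union(*funcs):
    """Build the union of the return annotations of the given functions
    (hypothetical helper, for illustration only)."""
    return_types = tuple(get_type_hints(f)["return"] for f in funcs)
    # typing.Union flattens and de-duplicates its members, so identical
    # branch types collapse back to a single plain type.
    return Union[return_types]


x_type = synthesize_union(foo, bar)
print(x_type)
print(get_args(x_type))
print(synthesize_union(foo, foo))
```

Note that when both branches return the same type, no union is needed: typing.Union collapses Union[int, int] to plain int.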
"Type narrowing"
Type narrowing is not a new concept, and it's been around for a while in type checkers.
The idea is, roughly, that you can take a union type and disassemble it into one of its variants through a type guard like:
x: int | str | list[str]
if isinstance(x, int):
    # x is now considered an integer in this branch of the if statement
    ...
elif isinstance(x, str):
    # x is now considered a string here.
    ...
else:
    # exhaustive-ness checks will allow `x` to be treated as a list of strings here.
    ...
"Deviated instance types"
Inspired by this section of the RPython documentation, Deviated Instance Types work very similarly. For example, take the following class:
class Thing:
    attr1: int
    attr2: list[str]
The memory layout of class Thing, which we'll call "Layout 1", will contain an integer and a list. By default that's all that was said about the class, so that's all monty can do with it for now: any other attribute access, either getting or setting, will cause a type error to be reported to the user.
Here's where the idea of "deviating" an instance's type comes in:
THING = Thing()
THING.attr3 = "blah blah"
This value would be constant at runtime but lazily initialized at compile time, but that's beside the point.
THING is an instance of Thing, meaning the layout of the constant is specified in the class definition. We then try to set an attribute on the instance; at first glance it looks like an error, but it actually lets us do very clever things with the way we model values and types in monty.
The memory layout of THING is now "Layout 1, 0" (read as the first diverged layout from 1) and, in rough C pseudo code, will be structured something like:
struct Thing_Layout_1 {
    integer_type attr1;
    list_str_type attr2;
};
struct Thing_Layout_1_0 {
    struct Thing_Layout_1 head;
    string_type attr3;
};
If THING were not a constant and instead a static module-level variable, then you are also free to set and modify the value of THING.attr3, à la:
thing = Thing()
def whatever(blah: str, n: int):
    thing.attr3 = blah * n
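One way to picture the bookkeeping behind deviated layouts is a small record that embeds its base layout as a head, mirroring the C pseudo code above. The sketch below is an assumption about how this could be modeled, not montyc's actual data structures; Layout, deviate, and all_fields are hypothetical names, and types are represented as plain strings.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Layout:
    name: str
    fields: Tuple[Tuple[str, str], ...]  # (attribute name, type name) pairs
    head: Optional["Layout"] = None      # the layout this one deviates from


def deviate(base, index, attr, type_name):
    """Create the index-th deviation of `base`, adding a single attribute."""
    return Layout(f"{base.name}_{index}", ((attr, type_name),), head=base)


def all_fields(layout):
    """List every field, base layout first, like the nested C structs."""
    inherited = all_fields(layout.head) if layout.head is not None else ()
    return inherited + layout.fields


layout_1 = Layout("Thing_Layout_1", (("attr1", "int"), ("attr2", "list[str]")))
layout_1_0 = deviate(layout_1, 0, "attr3", "str")

print(layout_1_0.name)
print(all_fields(layout_1_0))
```

Because the base layout sits at the head of the deviated one, code that only knows about Thing_Layout_1 can still read attr1 and attr2 from a deviated instance.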
Compile time (or "comptime") execution
The biggest difference between regular Python and Monty is how the module-level is evaluated.
Python is lazy and everything gets run when it's accessed; a module's scope is still a big block of executable code, after all, and can be treated as a function that operates on an implicit module object.
Monty treats a module's global scope as a big pool of constant declarations, but this doesn't translate well, for obvious reasons, to already existing code and semantics. To bridge this gap, montyc contains a small AST-based interpreter that is used to execute the code within a module's global scope.
Assuming most global-scope level logic is there to act as a sort of "initializing glue routine" then the user can do whatever they like as long as:
- The execution finishes within a known amount of "ticks" (so that we don't accidentally run off into an infinite loop that never finishes.)
- The state of the module's global scope is semantically correct (the typechecker will verify the module after comptime execution has finished for a module.)
Of course, in a completely dynamic environment we don't have to restrict the user like we would when compiling the code regularly, so in that case most things that would normally be rejected are perfectly fine, such as: exec, eval, globals, locals, dynamic class creation, and functions with untyped arguments.
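The tick budget mentioned above can be sketched with a toy AST-walking interpreter built on Python's standard ast module. This is not montyc's interpreter (which is written in Rust): the class names, the supported node set (assignments, while loops, a few operators), and the rule of one tick per visited node are all assumptions made for illustration.

```python
import ast
import operator

BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}
CMP_OPS = {ast.Lt: operator.lt, ast.Gt: operator.gt, ast.Eq: operator.eq}


class TickBudgetExceeded(Exception):
    """Raised when module-level execution uses more ticks than allowed."""


class ComptimeInterpreter:
    def __init__(self, max_ticks):
        self.max_ticks = max_ticks
        self.ticks = 0
        self.globals = {}

    def tick(self):
        # Every visited statement or expression node costs one tick.
        self.ticks += 1
        if self.ticks > self.max_ticks:
            raise TickBudgetExceeded(f"exceeded {self.max_ticks} ticks")

    def run(self, source):
        for stmt in ast.parse(source).body:
            self.exec_stmt(stmt)
        return self.globals

    def exec_stmt(self, node):
        self.tick()
        if isinstance(node, ast.Assign):
            value = self.eval_expr(node.value)
            for target in node.targets:
                if not isinstance(target, ast.Name):
                    raise NotImplementedError("only simple names can be assigned")
                self.globals[target.id] = value
        elif isinstance(node, ast.While):
            while self.eval_expr(node.test):
                for stmt in node.body:
                    self.exec_stmt(stmt)
        else:
            raise NotImplementedError(type(node).__name__)

    def eval_expr(self, node):
        self.tick()
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            return self.globals[node.id]
        if isinstance(node, ast.BinOp):
            op = BIN_OPS[type(node.op)]
            return op(self.eval_expr(node.left), self.eval_expr(node.right))
        if isinstance(node, ast.Compare) and len(node.ops) == 1:
            op = CMP_OPS[type(node.ops[0])]
            return op(self.eval_expr(node.left), self.eval_expr(node.comparators[0]))
        raise NotImplementedError(type(node).__name__)


MODULE = "x = 1\nwhile x < 10:\n    x = x * 2\n"
RUNAWAY = "y = 0\nwhile 1 < 2:\n    y = y + 1\n"

final_globals = ComptimeInterpreter(max_ticks=1000).run(MODULE)
print(final_globals)

try:
    ComptimeInterpreter(max_ticks=1000).run(RUNAWAY)
except TickBudgetExceeded as error:
    print("rejected:", error)
```

The first module terminates well within the budget and leaves a final global state that a type checker could then verify; the second never terminates on its own and is cut off once the budget is spent.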