[Discuss] Resyntaxing argument conventions and References #3623
Comments
I'm OK with any of the proposed alternatives as long as they are self-consistent, so I don't have much to say about that. |
Hi, thanks for sharing the process :) . Here are my thoughts as a layman that loves this intuitive and pretty language: I'll start with the bike-shedding part as you called it 😆
I'd also like to take this opportunity to ask for the Python Now as for another of my crazy ideas as someone who has no formal training in CS and reading the dragonbook is all the knowledge I have about compilers:
var s : String
# Not alive here.
s = "hello"
# alive here
use(s)
# destroyed here.
unrelated_stuff()
# ...
s = "goodbye"
# alive here
use(s)
# destroyed here
A word that is often used in monitoring is What I'd like is for the
# nothing to do with the former `Reference` now named `Pointer`, but rather `ref`
struct Ref[
    mutable: Bool,
    stable: Bool,
    scope: Scope,
    liveness: AnyLiveness = Liveness[mutable, stable, scope],
]:
    """A `ref`.

    Parameters:
        mutable: Well, mutability.
        stable: Whether the variable is stable in its assignment.
        scope: Where and when the ref gets destroyed. This would signal that this does not get
            consumed inside the current scope. Additionally, being able to set parameters like this
            would allow for manual liveness guarantees for unsafe types (will it?).
        liveness: What the liveness for the given scope is.
    """
    ...
As such we wouldn't need to write a whole book to explain:
PS: This might not make much sense on the details, but my main point is that we should try to simplify the amount of concepts and keywords as much as possible. |
Here's my take on things.
We need to consider the syntax for closure-capturing
The keywords that we're discussing will be used for multiple purposes. Chris has mentioned two purposes:
IMO there's another major language feature we've forgotten:
Mojo closures are still very much WIP, but at some point we'll need a syntax for explicitly declaring the set of nonlocal variables that a closure captures. For example:
A syntax along these lines is necessary, because we need to enforce aliasing restrictions between arguments and captures. The set of captures affects the interface of a function (the arguments that you can pass to it), and we need a syntax that documents this.
In short: whatever set of keywords we end up choosing should be suitable for declaring captures as well.
On the term "reference"
Mojo's first-class type for memory-safe referencing has been recently renamed to Unfortunately, Python already has "references", and they have absolutely nothing to do with the C++ concept, nor the Mojo concept. In Python, a "reference" is a first-class value that points to an object, i.e. an instance of a class. This definition has been deeply ingrained in the Python community (in every tutorial, every book, and every Python programmer's lexicon) for 30+ years now. Accordingly, attempting to reclaim or extend the meaning of the term "reference" shouldn't be taken lightly. I anticipate that it will lead to a lot of confusion, especially for less experienced programmers. Also, it's not possible to explain Mojo's proposed
Alternatives to the term "reference"
Given that "reference" isn't an ideal term, it's worth considering whether there are any viable alternatives. As mentioned earlier, a "reference" is basically a new identifier for an existing variable. A reference is certainly not an address or a pointer, because in Mojo, a register-passable variable will usually be passed to a function by value, in a register. In short: a Mojo "reference" doesn't correspond to any particular memory representation, so we should avoid choosing a term that implies that. So, we need a term that means "a new identifier for an existing variable". It turns out there is already a well-established term for that: a new name for an old thing is an alias. This term is especially common in languages that enforce aliasing restrictions, such as Mojo and Rust. In fact, this is probably the biggest reason to describe arguments as "aliases" of variables: it fits so naturally into the topic of aliasing, which is going to be a big deal in the Mojo community. Before we can contemplate referring to arguments as "aliases", there's a small issue that needs addressing: Mojo already has a keyword named Note: I'm not necessarily saying that we should use the keyword
My keyword suggestions
As discussed earlier, we need keywords suitable for all of the following:
For the first two use-cases, the main thing we want to communicate is whether the function intends to mutate an argument or capture, or merely read it. Keywords appearing in function signatures would ideally be very short, because horizontal space is already at a high premium in function signatures. For mutable access to a variable, Here's what these keywords would look like:
And obviously, Here's how we would describe this signature:
This naming scheme fits well with
Ideally
A keyword for "returning aliases", and for patterns
We haven't yet discussed the equivalent of Mojo's For For patterns, we also want a mutability-independent keyword, because the mutability of the variable that we're binding can always be determined from context, so there's no reason to declare the mutability explicitly. Here are my top 3 suggestions for mutability-independent keywords:
The terms The term Here are the keywords in action.
|
I appreciate you sharing this with us. I personally like Furthermore I realized my own mis-association with the "transfer operator", where due to the name and being a C++ dev by day, I imagined I was essentially using I highly look forward to the improvements to One point from me, however, and maybe I've missed something, but given the focus here on distinguishing between immutable and mutable references I'm confused why we previously did away with the
|
What it all comes down to is that it's crucial to know whether a function can mutate an argument—or memory accessible via an argument—because this affects the ability for the caller to pass in arguments that alias each other. Aliasing guarantees are essential for memory safety, data race freedom, optimization, and generally just reasoning about the correctness of a function. In contrast, the I'm sure if somebody can provide strong evidence that But let's not have that discussion here, because that would be derailing this thread. 🙂 |
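To make the aliasing concern above concrete, here is a minimal sketch in plain Python (not Mojo). The helper `replace_contents` is hypothetical and exists only for illustration: it mutates its first argument while reading its second, so it quietly misbehaves when both arguments are the same list. This is the kind of call an aliasing check at the call site is meant to reject.

```python
def replace_contents(dst: list, src: list) -> None:
    """Overwrite dst with the items of src (hypothetical helper).

    Mutates dst and only reads src, so it implicitly assumes the two
    arguments do not alias each other.
    """
    dst.clear()
    dst.extend(src)


a = [1, 2, 3]
b = [4, 5]
replace_contents(a, b)  # distinct lists: a now holds a copy of b's items

c = [1, 2, 3]
replace_contents(c, c)  # aliased arguments: clear() also empties src
```

Python accepts both calls; in the second one `c` ends up empty because `dst` and `src` name the same object. Knowing from the signature that the first argument is mutated is what would let a compiler flag the aliased call.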
fn main():
    m = 0
    f(m, m) # exclusivity violation
    g(m, m) # pass
fn f[lt: ImmutableLifetime](inout a: Int, ref[lt] b: Int):
    pass
fn g(inout a: Int, b: Int):
    pass |
As an average programmer, I believe we should make pointer related operations close to C/C++ and keep anything in common with Python, pythonic. So, |
Seems that many Mojo adopters will be Python users with C++ background (like myself). Python's use of "reference" is somewhat different, but the keyword isn't in the language, words like "value" and "reference" are often casual concepts in Python. IMO, it's okay to reuse concepts from widely used C++, when it isn't part of what people think about when they think "Pythonic". |
@mind6 Just to clarify: Mojo's concept of "reference" is not the same as C++. It subsumes C++'s pass-by-value and pass-by-reference. If a C++ programmer attempts to use their mental model of how C++ references work when learning Mojo (e.g. to reason about efficiency), they are going to end up confused. |
Do you mean passing ownership is like passing by value? |
No, I mean passing as Basically: in Mojo, "passing by reference" or "by |
In In This way we can get 4 keywords of the same length:
I don't see the point in using verbs.
Whether |
I roughly agree with this, but I think Basically, a function only needs to declare what it does to the variable. It doesn't need to declare anything about the variable's inherent properties.
This concern was discussed in this thread. The callee always consumes (deinitializes) some variable. All the |
Temporary means that a new variable is created rather than something being consumed at the function boundary. Something like an implicit |
Have we considered replacing some of these keywords with operators (special characters)? |
Thanks @lattner for the detailed proposal and asking community input! I'll try to avoid as much bikeshedding as possible. I would like for us to improve the section called Patterns + Local reference bindings. Coming from Python, I would like to surface a few surprising points. Following this, we can either change the proposal or just improve the examples and the description. 1 - Using
|
I like nmsmith's point on naming. Basically we name the keywords after what we can do with the variable, rather than declaring what it is. It seems that we are converging towards a realization that it is about access control of the variables instead of how it is passed around (the actual technique). In that sense, it makes sense to use access control terms such as read and write/modify. Even Anything that has I think that, for most programmers, pivoting towards access control terms would be much easier to understand than C++/Rust-derived terminology. |
Now, we are basically LLVM! I like this idea. |
Here's a simple example using the proposed syntax:
from memory import UnsafePointer
struct View[
    T: CollectionElement,
    origin: ImmutableOrigin,
]:
    var data: UnsafePointer[T]
    var len: Int
    # The origin is the `list` passed in on init
    fn __init__(out self, ref [origin] list: List[T, *_]):
        self.data = list.data
        self.len = len(list)
    # Return a single item that originates from the `list` passed in on init
    fn __getitem__(self, idx: Int) -> ref [origin] T:
        return self.data[idx]
fn main():
    strings = List[String]("a", "b")
    view = View(strings)
    # `a` and `b` originate from `strings`
    a = view[0]
    b = view[1]
    print(a, b)
    # No more variables that originate from `strings`, so it will now be destroyed
I'm a fan of I agree with others |
I have a slightly orthogonal point, which is that I think that
fn do_op_with_list(immref foo: Mutex[List[Int]]):
    var guard: MutexGuard[Int, MutableOrigin] = foo.lock()
    guard[][0] += 1
fn do_op_with_list(mutref foo: Mutex[List[Int]]):
    var inner: List[Int] = foo.get_inner_mut()
    inner[0] += 1
Currently, these are treated as the same type and cause type errors (if you treat
The actual final names of the various reference types don't matter, just that you can disambiguate in this way.
While this is workable, having a top-level branch like this seems like a scenario ripe to be solved via function overloads. Keep in mind that there may also be multiple arguments where each will need to branch on mutability, which quickly will make this type of function a giant mess if you need to consider many objects, such as in an ECS, where this feature could be used to enable a single-threaded mode to avoid locking. |
I did not give it much thought, but: can we just take a plain Origin at struct level and decide whether we need mutable or not at method level?
from memory import UnsafePointer
struct View[
    T: CollectionElement,
    origin: Origin, # <- Not an immutable/mutable
]:
    var data: UnsafePointer[T]
    var len: Int
    # The origin is the `list` passed in on init
    fn __init__(out self, immref [origin] list: List[T, *_]): # Immutability defined at this level
        self.data = list.data
        self.len = len(list)
    # Return a single item that originates from the `list` passed in on init
    fn __getitem__(self, idx: Int) -> immref [origin] T:
        return self.data[idx]
fn main():
    strings = List[String]("a", "b")
    view = View(strings)
    # `a` and `b` originate from `strings`
    a = view[0]
    b = view[1]
    print(a, b)
Or do we really need to track immutability/mutability at struct level? With an "access control" concept it would look like:
fn __init__(write self, read [origin] list: List[T, *_]): |
I would love it if Mojo's default syntax kept Python's reference semantics. In the loop example, could this be implemented by returning an iterator of mutable references by default? Perhaps, to be in line with Mojo fn's having immref as default, loops can return an immutable iterator. This way at least mutability errors will show up during compilation. |
I kinda like the idea of using |
+1. Reference by default makes much more sense to me, than copy by default.
+1. This space between looks strange
+1. This pattern to discard moved value is strange Regarding to
var baz String = "baz"
var qux: String = "qux"
fn foo(own String bar): ...
foo(mov baz) # moves baz into foo, without copying
var quux = mov qux # moves qux, without copying |
The "transfer operator" doesn't move any data, nor is it an operator. It just signifies that the variable will be passed as |
What you're suggesting is basically that we model mutability as an effect that the function performs (or a capability that it requests), rather than something baked into an I've been thinking about this kind of thing for months and I believe it can be done, but it has major consequences for how memory-safe pointers are modelled in Mojo. For one thing, it means that a struct can no longer tell whether it's storing an "immutable pointer" or a "mutable pointer", so the compiler can't enforce that mutable pointers are globally unaliased. People coming from Rust might think that this is a bad thing, but it's not really a big deal. To reason about memory safety, all you need is local aliasing guarantees: you only need to know that while a particular function is executing, certain arguments/pointers don't alias other arguments/pointers. This can be achieved by slightly tweaking how "borrow checking" works. Rather than borrow-checking the construction of pointers, you can borrow-check function calls. I'm still researching/experimenting with designs for memory-safe pointers, so I can't tell you if this is the right way to go. I think it might be, but I need to demonstrate that. Anyway, to avoid derailing this thread, that's all I'll say for now. In summary: maybe we don't need both |
@gabrieldemarmiesse Your suggestion to reconsider whether
Here you've equated Mojo references with Python references, which is one of the major pitfalls that I brought up earlier. These two features behave very differently, and I don't think they can be unified. If patterns bind "Mojo references" by default, I don't think we end up with a behaviour equivalent to Python. In particular, assignment will behave differently. (Unless... you're proposing changing the meaning of assignment in Mojo?) That said, I agree with others that if we have patterns bind immutable "Mojo references" by default (at least within a I'll reiterate that I strongly believe we should ditch the term "reference" for Mojo references, because irrespective of whether you're equating Mojo references with Python references or C++ references, you're going to end up with the wrong mental model. |
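For readers less familiar with the Python side of this comparison, here is a small sketch of how Python references behave under assignment and in a `for` loop. It shows Python semantics only and makes no claim about what Mojo does or will do; the variable names are illustrative.

```python
a = [1, 2]
b = a                 # b is a second reference to the same list object
b.append(3)           # a mutation through b is visible through a
same_object = a is b  # both names referred to one object at this point

b = [9]               # assignment rebinds the name b; the list a refers to is untouched

items = [1, 2, 3]
for x in items:
    x = x * 10        # rebinds the loop variable only; items is not modified
```

After this runs, `a` is `[1, 2, 3]`, `b` is `[9]`, and `items` is unchanged. The point of the discussion above is that assignment to a bound name in Python never writes through to the original variable, which is the behaviour a "bind by reference" pattern could differ on.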
@nmsmith I'm by no means a programming languages expert, and I cannot claim I deeply understand how Mojo's references differ from Python's references. So it's no surprise if I made a few mistakes in my analysis. I believe the conclusion still holds true. The proposed behaviour of Mojo for iterating and unpacking, as well as the current assignment operation, is surprising for Python users, since the code written can be exactly the same and have very different outcomes. As such, we need a way to avoid those behavioural footguns for Python users. We can either
I do not have enough expertise to say which path is the best or even which one can be achieved (except modifying the docs, this seems easy enough). |
I agree: the inconsistency with Python (wherein every type is reference-semantic, and everything is pass-by-value) is definitely a concern, and we should strive to minimize it. We just have to make sure we preserve Mojo's ownership and referencing model, because that's what gives Mojo its performance and safety guarantees. |
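The phrase "every type is reference-semantic, and everything is pass-by-value" can be illustrated with a short Python sketch. The functions `rebind` and `mutate` are hypothetical names chosen for this example: the reference is what gets copied into the callee, so rebinding the parameter has no effect on the caller, while mutating the referenced object does.

```python
def rebind(values: list) -> None:
    # The parameter is a local name holding a copy of the caller's reference.
    # Rebinding it does not affect the caller's variable.
    values = [0]


def mutate(values: list) -> None:
    # The copied reference still points at the caller's object,
    # so in-place mutation is visible to the caller.
    values.append(0)


data = [1]
rebind(data)
after_rebind = list(data)  # snapshot: still [1]
mutate(data)               # data is now [1, 0]
```

This is the baseline that Python users bring with them, and it is the behaviour any Mojo default would be compared against.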
This is how I'd describe it: You can use an
struct Foo:
    var value: String
    fn __init__(out self, in value: String):
        self.value = value^
var string = String("Mojo")
var foo = Foo(string)
The value will be copied
var foo = Foo(string)
string += "🔥" # `string` reused here, so `Foo(string)` now passes `in` a copy
print(foo.value) # Mojo
print(string) # Mojo🔥
The default argument convention |
If we're going with this description, I think we should just say "takes ownership", because the two phrases are synonymous. And if we're talking about "taking ownership", then the keyword This is why I don't understand people's opinions that |
I'm trying to convert myself to an The following signature:
Verbalizes as:
The following signature:
Verbalizes as:
or maybe:
These verbalizations aren't working very well for me. I've spent a lot of time teaching Python in classrooms. A good verbalization goes a lot of the way towards building up a good mental model. |
The keyword isn't "transfer in" though and doesn't imply that, it's a value being passed "in" by transfer or copy. Like you say
Yes this is what it implies and is an incomplete description, it can also copy the value in:
I like |
At the end of the day, you're reading the word "take" as "take away" (lose), whereas I'm reading the word take as in "take possession" (receive). The second reading is accurate: a function with an If I'm the only one that appreciates this second reading, then I'd encourage consideration of synonymous keywords, such as
Hell, we could even use the
This function sends We could even use the
And so on. Personally, I think
The nice thing about all of these signatures is that they're verbalizable. I still haven't seen anyone suggest how to verbalize a signature that contains Perhaps I'm the only one that thinks verbalizability is important. In that case, carry on. That's the only thing I'm fighting for here. |
I argue for simplicity and intuitiveness so much because I care about the language being easy to teach and having a low barrier to entry. These are complicated topics for newbies and the simpler the better IMO.
This one makes me think of network I/O, receive is a word that makes me slow down and stumble every time I have to spell it out as a non-native speaker. If we go in this direction I'd argue in favor of using
This one (IMO) correctly implies, like @nmsmith pointed out, the true connotation to be read from Another option that has some nice contrast:
fn __init__(give self, take value: String): ...
this function gives back a And to concatenate to my previous point about renaming
struct MyType:
    fn __init__(give self, take value: String): ...
fn main():
    a = "123"
    t = MyType(transfer a)
IMO this has an easily readable and explainable meaning |
I also like the
|
Too much bikeshedding here, I think Chris needs to pick one or the conversation will not end. There can never be a perfect keyword, and since Mojo is still in very active development everyone understands that things can still change. Whatever is picked can still change if it doesn't work. I had some reservations with "in" - it is already used for other things in Python. But Python and Python programmers are not strangers to overloaded semantics. "in" makes sense in the context of "out", so both could be explained as a shortened version of "take in" and "give out"; Or we'll find an explanation that works. And if we can't find an explanation that works we can always repaint. However, decisions need to be based on real-world experience because there is no end to speculative problems. It's probably helpful if we can make a mechanical tool for updating argument conventions, as far as tooling goes this should be relatively straightforward for anyone with access to the parsed source code. |
👍 |
This adds support for spelling "named functions results" using the same `out` syntax used by initializers (in addition to `-> T as name`). Functions may have at most one named result or return type specified with the usual `->` syntax. `out` arguments may occur anywhere in the argument list, but are typically last (except for `__init__` methods, where they are typically first).
```mojo
# This function has type "fn() -> String"
fn example(out result: String):
    result = "foo"
```
The parser still accepts the old syntax as a synonym for this, but that will eventually be deprecated and removed. This was discussed extensively here: #3623 MODULAR_ORIG_COMMIT_REV_ID: 23b3a120227a42d9550ba76d8cafb63c3a03edcf
To clarify,
While this is true to some extent, we are also trying to make the "best possible thing" here, not just "something fast". I'd much rather take time to soak a bit, see how things shake out with the adoption of the existing changes, and then decide what to do. We don't need to decide what to do about owned in the next few days or few weeks, let's give it the time it needs to gel a bit. I think we can make a final decision on this in Jan after the holidays. I really appreciate all the discussion and tradeoff analysis, and new ideas suggested! FWIW, something that I find helpful is to go and do the global search and replace in the stdlib and "see what it looks like". -Chris |
To be frank, while reading the LinearTypes proposal, the following example sounded better with Compared to:
@explicit_destroy("Use consume()")
struct MyLinear:
    fn __init__(out self): pass
    fn consume(in self):
        destroy self
The following nicely rolls.
@explicit_destroy("Use consume()")
struct MyLinear:
    fn __init__(give self): pass
    fn consume(take self):
        destroy self
But it is probably just one data point that sounds good... Give and take also have no baggage from other languages and can be defined the way we want for Mojo. |
I'm assuming some form of first-class references exist, since my understanding was that there are plans for them. Right now, the type would look something like |
There still seems to be some inconsistency in the implementation. After updating
fn __copyinit__(out self, other: Matrix) -> None:
    self.rows = other.rows
    self.cols = other.cols
    self.total_items = other.total_items
    self.debugging = other.debugging
    self.data = DataType.alloc(self.total_items)
    memcpy(self.data.address, other.data.address, self.total_items)
I was able to get the code to build by changing this to...
fn __copyinit__(out self, other: Matrix):
    self.rows = other.rows
    self.cols = other.cols
    self.total_items = other.total_items
    self.debugging = other.debugging
    self.data = DataType.alloc(self.total_items)
    memcpy(self.data.address, other.data.address, self.total_items)
However, the following code was able to be built without the compiler complaining...
fn __isub__ (out self: Matrix, other: Matrix) -> None:
    self = self - other
For the sake of consistency, I changed this to...
fn __isub__ (out self: Matrix, other: Matrix):
    self = self - other
and the code still built, but it does seem that the compiler is behaving differently in this situation without any good reason that I can see, at least... |
There are some non idealities with |
|
In my opinion,
I prefer I think that verbalizable keywords make it easier to talk about function signatures. My favorites by example:
Verbalized: „The function |
I also have settled my opinion on to prefer I think it would also result in slightly fewer bugs, as it is easy to verbalize in one's head while reading code. For example, when I And borrow (immutable and mutable), take and give have a cohesive meaning together also in the real world. |
Another thing worth thinking about is what you'd call There's a standard name for the first type: Below are some of my early thoughts on how these types would work. I haven't presented a complete design. My goal is just to explain how these types might be useful.
Why promises matter
The main use case for a
fn __setitem__(mut self, owned key: K, owned value: V)
It could have the following signature:
fn __setitem__(mut self, owned key: K) -> Promise[V, __origin_of(self)]:
This provides the caller with an obligation to initialize a value ( The point of having
Note 1: The syntax for adding an element to a dictionary would be unchanged. You'd still write
Note 2: Promises would integrate with Mojo's borrow checker, to ensure that you can't read from the dictionary while the promise is unfulfilled. This is necessary for memory safety.
Why
|
If So the linear types could be named in a similar way:
or simpler
|
I've thought of a simpler design for Recall my earlier definition of
struct List[T: CollectionElement]:
    fn pop(mut self) raises -> MustTake[T, __origin_of(self)]:
Maybe we can instead allow Mojo's
struct List[T: CollectionElement]:
    fn pop(mut self) raises -> owned[self] T:
The type Basically, we're allowing Exercise for the reader: Test how well your favourite names for the A similar trick works for promises/ In my last post, I presented the following signature for
struct Dict[K: KeyElement, V: CollectionElement]:
    fn __setitem__(mut self, owned key: K) -> Promise[V, __origin_of(self)]:
Here's the same signature using the second-class
struct Dict[K: KeyElement, V: CollectionElement]:
    fn __setitem__(mut self, owned key: K) -> init[self] V:
Again, How's that for a table flip? (╯°□°)╯︵ ┻━┻ These new conventions seem easy to implement, easy to teach, and they are very useful when working with large structs. If Mojo adopts these conventions, many of the keywords suggested earlier in this thread (including my own suggestions) would no longer be a good idea. That's not to say we will adopt these conventions. But at the moment I can't see any reason why we wouldn't. If anyone has thoughts on these "reverse argument" conventions, I'd love to hear them. |
I think this would work well with my favorite keywords if we stick to the notion that they are always interpreted from the viewpoint of the callee (like For
|
This adopts the recent changes that allow the use of the `out` argument convention. This argument convention more correctly models the nature of `__init__` which initializes self but never reads from it. This was discussed on this public thread: modularml/mojo#3623 MODULAR_ORIG_COMMIT_REV_ID: d05be1ccb28b254aeccd5807858546aeb1777991
As discussed in [this public thread](#3623), the use of `inout self` in initializers is incorrect. Initializers don't actually read the argument, they only write it. Per discussion on that thread, we're moving to 'out' as the keyword used in function declarations, which allows us to spell the function type correctly as well. This keeps support for the old syntax for migration support, but we should eventually remove it. MODULAR_ORIG_COMMIT_REV_ID: 115768e0c6df069f3cf2e4b3d806db1cc2e6d8a2
Per extensive discussion over on this public thread: #3623 We're moving to rename the `inout` argument convention to be called simply `mut`, and renames `borrowed` to `read` which can still be generally elided. This reduces the need to understand references for the basic conventions that many people work with, while providing a more strictly-correct and consistent model. These words are now "soft" keywords instead of "hard" keywords as well. This still maintains support for the `inout` and `borrowed` keywords, though they will eventually be removed. MODULAR_ORIG_COMMIT_REV_ID: e2b41cfb4cb8bb0b2e67ade93d32d7ef8989428e
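A mechanical migration tool was suggested earlier in the thread. As a rough sketch only, the rename described in this commit message (`inout` to `mut`, `borrowed` to `read`) could be approximated with a regular expression in Python. Everything here is an assumption for illustration: the function name `rename_conventions`, the pattern, and the idea of working on raw text. It is not an official Modular tool, it does not parse Mojo, and it will also rewrite matching text inside comments or strings.

```python
import re

# Mapping taken from the commit message above; the tool itself is hypothetical.
RENAMES = {"inout": "mut", "borrowed": "read"}

# Match the old keyword only when it is followed by an identifier and then
# ':', ',' or ')', which is roughly how it appears in an argument list.
_PATTERN = re.compile(r"\b(inout|borrowed)(?=\s+[A-Za-z_]\w*\s*[:,)])")


def rename_conventions(source: str) -> str:
    """Return source with old argument-convention keywords renamed (naive)."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(1)], source)


before = "fn swap_first(inout a: List[Int], borrowed b: List[Int]):"
after = rename_conventions(before)
print(after)
```

A text-based rewrite like this cannot decide when `inout self` in an initializer should become `out self` rather than `mut self`; that distinction needs the parsed source, as the earlier comment pointed out.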
The design of the Mojo references subsystem is starting to come together. To finalize the major points, it helps to come back and re-evaluate several early decisions in Mojo to make the design more self consistent. This is a proposal to gain consensus and alignment on naming topics, etc without diving deep into the overall design (which is settling out).
Instead of embedding this inline, I added this to a github gist here:
https://gist.github.com/lattner/da647146ea573902782525f3446829ff
I'd love feedback and discussion!