Refactor compiler skeleton #22

simon-staal · 2024-03-04T22:53:23Z

Changes:

Replaced old-style #ifndef header guards with #pragma once
Wrapped nodes in AST namespace - it's in general good practice to do this in projects, and I think it'll help encourage students to do this without being too confusing for them.
Replaced Node* stored by nodes to NodePtr = std::unique_ptr<const Node>. This change incorporates a couple elements:
1. Immutability of AST: The AST is constructed by the parser, and all the information a node needs is available once it's constructed. making everything const reduces the chances of programmer error
2. Switch from manual memory management to RAII: Using new and delete manually is very C-like (/ pre C++11). Although it's all that students are taught in Y1, I think understanding what a std::unique_ptr is is relatively straightforward, and encourages students to use a more modern interface.
3. Aliasing node storage type: by defining NodePtr as an alias for the stored type, it not only saves on characters, but makes it much easier for students to switch this type out with something else (like raw pointers) if they find the use of std::unique_ptr too confusing.
TypeSpecifier is no longer a Node - this makes sense imo because just a type specifier is never enough information for you to produce assembly, meaning that EmitRISC(...) will always have an empty / throwing implementation. Also switched to using an enum class instead of a std::string for hopefully obvious reasons.
Removed branches_ from Node - see here for reasoning
Fixed memory leaks - see Memory leaks in compiler skeleton #21. I'll probably make a separate PR into main to fix this since it's minimally invasive and it's kinda unfair to assess students on not leaking memory if we give them a leaky base.
- Added valgrind to the docker env to make it easier for students to use it
- Add docs explaining how to check for memory leaks
- Explicitly mention memory management as part of the 10% of marks given for "nice code"
[nit] (only style change) Changed types from T & to T& - I think it's much nicer to have the type information all together rather than separated by whitespace (sorry, I couldn't resist)

Other considered changes:

Adding more intermediate abstract classes - in the future, students may need to get information from special types of nodes, for example, getting the name of an identifier. Adding a virtual method to their base node class would be bad practice, as would dynamic casting the member, it would be better for the relevant class to store the derived node type directly / a more specialised abstract class (i.e. Expression). Unfortunately, with the basic skeleton we provide, there's no really meaningful intermediate abstract classes we can really provide, for example, we could make Return store a pointer to an Expression instead of a general node, but currently an Expression and a Node are exactly the same (it would also make it more difficult for them to switch out the pointer implementation they're using). In general, I'm hoping that TypeSpecifier is enough to show them that it's fine for different Nodes to store things which aren't Nodes / it's fine for them to add more types to their parser.
Reducing copying using move semantics - when constructing things like Identifier, it would be best practice to use move semantics to avoid copying the string unnecessarily. However, move semantics / r-values is definitely something I'd classify as a more advanced topic, and since we're not really concerned about performance, I don't think there's enough upside to include it to justify the extra burden on the students.

…created AST namespace

Fiwo735

Thanks for the awesome changes! I have some comments below:

I also like T&/T* more than T &/T *.
#pragma once seems like a less-error prone concept, but StackOverflow is still divided - do you have some insights from industry? The companies I worked for still used classic header guards.
Namespace names should be lowercase according to Google Style Guide that we are following.
NodePtr usage is inconsistent - there are still many places that use Node*.
I like the use of unique_ptr - however, it might be a little tricky, so it should definitely be explained well in the docs.
New implementation of type_specifier is a bit advanced, but I do like the changes. Hopefully we can assume that people implementing more types than int would be proficient enough to understand the used constructs.
I think type_specifier should still be split into .hpp and .cpp as currently all the implementation is in .hpp.
emplace_back needs a mention in the docs along with move semantics in general as it's certainly an advanced topic.

It'd be good to split the less intrusive changes that you suggest into a separate PR, so that it can get merged into main ASAP, while the rest is going to wait on a separate branch.

simon-staal · 2024-03-05T20:55:18Z

To address your points:

#pragma once seems like a less-error prone concept, but StackOverflow is still divided - do you have some insights from industry? The companies I worked for still used classic header guards.

From my experience in industry, #pragma once is basically universally seen as a much better solution than the old school header guards. Although some of the comments in the thread you link claim that #pragma once has "unfixable bugs", I think that in practice you're much more likely to have issues with the old header guards, by accidentally re-using names, either by error or due to libraries you include. It's well-supported by basically all the big compilers you'd use.

Namespace names should be lowercase according to Google Style Guide that we are following.

sadge. will update

NodePtr usage is inconsistent - there are still many places that use Node*.

Not 100% sure I follow here. I'm pretty sure I use NodePtr everywhere in the AST files. If you're talking about the AST constructors / parser, unfortunately, flex / bison are pretty tied to a pre C++11-style API, and don't work well with unique_ptr as it tries to copy it around (which is illegal) so using raw pointers there, and then allowing your ast to own this memory by storing them internally as unique_ptrs is the best we can do imo.

New implementation of type_specifier is a bit advanced, but I do like the changes. Hopefully we can assume that people implementing more types than int would be proficient enough to understand the used constructs.

Given that they have to compile enums and switch statements, I'm sure we can trust them to figure out what an enum class is 😉. It's just a (safer) enum wrapped in a namespace after all.

I think type_specifier should still be split into .hpp and .cpp as currently all the implementation is in .hpp.

Given that type_specifier itself is just an enum class, and that the helper functions provided are relatively simple, I think it's fine (maybe even better) to leave it all in .hpp (the functions are constexpr, which solves any possible ODR issues). That said, if you have a particularly strong opinion, I can split it into a .cpp.

emplace_back needs a mention in the docs along with move semantics in general as it's certainly an advanced topic.

Can also touch on it, currently using it to ensure that NodeList builds without issues for both smart and raw pointers.

In terms of less intrusive changes to include, what are you thinking? I could add (in order of least to most intrusive):

Fix memory leaks
Formatting changes
NodePtr (but set to const Node*or evenNode*` if you think that's too much
New type_specifier (a bit more invasive but quite valuable imo)
namespace ast? (not sure about this one)

Fiwo735 · 2024-03-06T02:37:19Z

Not 100% sure I follow here. I'm pretty sure I use NodePtr everywhere in the AST files. If you're talking about the AST constructors / parser, unfortunately, flex / bison are pretty tied to a pre C++11-style API, and don't work well with unique_ptr as it tries to copy it around (which is illegal) so using raw pointers there, and then allowing your ast to own this memory by storing them internally as unique_ptrs is the best we can do imo.

Oh, so that means that all of these have to stay as Node*?

If so, then I think this is going to be too confusing for students:

Is there any way around it? Like some conditional NodePtr hackery for flex/bison or extracting raw pointers from unique_ptr etc.?

Btw, I've noticed some places still have the previous type style (same goes for files in compiler_tests/ and debugging/, but I don't think it's that important to be consistent there):

That said, if you have a particularly strong opinion, I can split it into a .cpp.

I think it'd be more consistent with existing files to split constexpr std::string_view ToString(TypeSpecifier type)

In terms of less intrusive changes to include, what are you thinking?

I'd say fixing memory leaks is definitely the most important to be merged into main. The other changes are less critical and they affect .cpp and .hpp, which I believe we should avoid modifying at this point to avoid confusion (while obviously pushing them to main_2024 and merging after the submissions as I do agree they're very valuable additions!).

simon-staal · 2024-03-07T23:11:14Z

Update:

NodePtr-ed everything (at the cost of adding some std::moves, definitely going to have to explain move semantics well now 😅. Although the new interface is definitely nicer / more correct, since it's very clear where ownership of resources is being transferred - I avoided this in my initial implementation to avoid the std::moves, but they may as well learn things properly.
Made ast lowercase
Not moving type_specifier functions into a .cpp file, as this is not possible for constexpr functions. I could make it non-constexpr but that's definitely worse, although I could be very easily convinced that the addition of move semantics is enough of a difficulty spike for students, and constexpr would confuse them too much.

simon-staal added 9 commits February 26, 2024 20:14

Switched to pragma

93dd1d5

Refactored use of raw pointers to std::unique_ptr, added const-ness, …

6463126

…created AST namespace

added comment and valgrind dependency

b779b0b

Removing asan from makefile as it doesn't work with valgrind

d8d7d80

Merge branch 'main' into refactor-skeleton

6bc7d1c

Solving memory leaks

f5a5f2f

Fixed memory leaks *properly*

53f775b

Added asan flags back to makefile

af4ba88

Renamed parser types

c2b5042

Fiwo735 self-requested a review March 5, 2024 15:31

Fiwo735 reviewed Mar 5, 2024

View reviewed changes

simon-staal added 2 commits March 7, 2024 22:45

Addressed some PR comments

6f51c7e

NodePtr *all* the things

44cea4b

simon-staal changed the base branch from main to main_2024 March 7, 2024 23:04

Updated NodePtr comment

94b30f4

Updated full parser example

29fa089

simon-staal requested a review from Fiwo735 March 7, 2024 23:38

simon-staal mentioned this pull request Mar 10, 2024

Fixed memory leaks in parser #23

Merged

simon-staal requested a review from Jpnock March 10, 2024 17:38

Fiwo735 approved these changes Mar 11, 2024

View reviewed changes

simon-staal merged commit 6c2b153 into LangProc:main_2024 Mar 11, 2024
1 check failed

simon-staal deleted the refactor-skeleton branch March 11, 2024 22:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor compiler skeleton #22

Refactor compiler skeleton #22

simon-staal commented Mar 4, 2024 •

edited

Loading

Fiwo735 left a comment •

edited

Loading

simon-staal commented Mar 5, 2024 •

edited

Loading

Fiwo735 commented Mar 6, 2024 •

edited

Loading

simon-staal commented Mar 7, 2024

Refactor compiler skeleton #22

Refactor compiler skeleton #22

Conversation

simon-staal commented Mar 4, 2024 • edited Loading

Fiwo735 left a comment • edited Loading

Choose a reason for hiding this comment

simon-staal commented Mar 5, 2024 • edited Loading

Fiwo735 commented Mar 6, 2024 • edited Loading

simon-staal commented Mar 7, 2024

simon-staal commented Mar 4, 2024 •

edited

Loading

Fiwo735 left a comment •

edited

Loading

simon-staal commented Mar 5, 2024 •

edited

Loading

Fiwo735 commented Mar 6, 2024 •

edited

Loading