Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor compiler skeleton #22

Merged
merged 13 commits into from
Mar 11, 2024
Merged

Refactor compiler skeleton #22

merged 13 commits into from
Mar 11, 2024

Conversation

simon-staal
Copy link
Collaborator

@simon-staal simon-staal commented Mar 4, 2024

Changes:

  • Replaced old-style #ifndef header guards with #pragma once
  • Wrapped nodes in AST namespace - it's in general good practice to do this in projects, and I think it'll help encourage students to do this without being too confusing for them.
  • Replaced Node* stored by nodes to NodePtr = std::unique_ptr<const Node>. This change incorporates a couple elements:
    1. Immutability of AST: The AST is constructed by the parser, and all the information a node needs is available once it's constructed. making everything const reduces the chances of programmer error
    2. Switch from manual memory management to RAII: Using new and delete manually is very C-like (/ pre C++11). Although it's all that students are taught in Y1, I think understanding what a std::unique_ptr is is relatively straightforward, and encourages students to use a more modern interface.
    3. Aliasing node storage type: by defining NodePtr as an alias for the stored type, it not only saves on characters, but makes it much easier for students to switch this type out with something else (like raw pointers) if they find the use of std::unique_ptr too confusing.
  • TypeSpecifier is no longer a Node - this makes sense imo because just a type specifier is never enough information for you to produce assembly, meaning that EmitRISC(...) will always have an empty / throwing implementation. Also switched to using an enum class instead of a std::string for hopefully obvious reasons.
  • Removed branches_ from Node - see here for reasoning
  • Fixed memory leaks - see Memory leaks in compiler skeleton #21. I'll probably make a separate PR into main to fix this since it's minimally invasive and it's kinda unfair to assess students on not leaking memory if we give them a leaky base.
    • Added valgrind to the docker env to make it easier for students to use it
    • Add docs explaining how to check for memory leaks
    • Explicitly mention memory management as part of the 10% of marks given for "nice code"
  • [nit] (only style change) Changed types from T & to T& - I think it's much nicer to have the type information all together rather than separated by whitespace (sorry, I couldn't resist)

Other considered changes:

  • Adding more intermediate abstract classes - in the future, students may need to get information from special types of nodes, for example, getting the name of an identifier. Adding a virtual method to their base node class would be bad practice, as would dynamic casting the member, it would be better for the relevant class to store the derived node type directly / a more specialised abstract class (i.e. Expression). Unfortunately, with the basic skeleton we provide, there's no really meaningful intermediate abstract classes we can really provide, for example, we could make Return store a pointer to an Expression instead of a general node, but currently an Expression and a Node are exactly the same (it would also make it more difficult for them to switch out the pointer implementation they're using). In general, I'm hoping that TypeSpecifier is enough to show them that it's fine for different Nodes to store things which aren't Nodes / it's fine for them to add more types to their parser.
  • Reducing copying using move semantics - when constructing things like Identifier, it would be best practice to use move semantics to avoid copying the string unnecessarily. However, move semantics / r-values is definitely something I'd classify as a more advanced topic, and since we're not really concerned about performance, I don't think there's enough upside to include it to justify the extra burden on the students.

@Fiwo735 Fiwo735 self-requested a review March 5, 2024 15:31
Copy link
Collaborator

@Fiwo735 Fiwo735 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the awesome changes! I have some comments below:

  • I also like T&/T* more than T &/T *.
  • #pragma once seems like a less-error prone concept, but StackOverflow is still divided - do you have some insights from industry? The companies I worked for still used classic header guards.
  • Namespace names should be lowercase according to Google Style Guide that we are following.
  • NodePtr usage is inconsistent - there are still many places that use Node*.
  • I like the use of unique_ptr - however, it might be a little tricky, so it should definitely be explained well in the docs.
  • New implementation of type_specifier is a bit advanced, but I do like the changes. Hopefully we can assume that people implementing more types than int would be proficient enough to understand the used constructs.
  • I think type_specifier should still be split into .hpp and .cpp as currently all the implementation is in .hpp.
  • emplace_back needs a mention in the docs along with move semantics in general as it's certainly an advanced topic.

It'd be good to split the less intrusive changes that you suggest into a separate PR, so that it can get merged into main ASAP, while the rest is going to wait on a separate branch.

@simon-staal
Copy link
Collaborator Author

simon-staal commented Mar 5, 2024

To address your points:

#pragma once seems like a less-error prone concept, but StackOverflow is still divided - do you have some insights from industry? The companies I worked for still used classic header guards.

From my experience in industry, #pragma once is basically universally seen as a much better solution than the old school header guards. Although some of the comments in the thread you link claim that #pragma once has "unfixable bugs", I think that in practice you're much more likely to have issues with the old header guards, by accidentally re-using names, either by error or due to libraries you include. It's well-supported by basically all the big compilers you'd use.

Namespace names should be lowercase according to Google Style Guide that we are following.

sadge. will update

NodePtr usage is inconsistent - there are still many places that use Node*.

Not 100% sure I follow here. I'm pretty sure I use NodePtr everywhere in the AST files. If you're talking about the AST constructors / parser, unfortunately, flex / bison are pretty tied to a pre C++11-style API, and don't work well with unique_ptr as it tries to copy it around (which is illegal) so using raw pointers there, and then allowing your ast to own this memory by storing them internally as unique_ptrs is the best we can do imo.

New implementation of type_specifier is a bit advanced, but I do like the changes. Hopefully we can assume that people implementing more types than int would be proficient enough to understand the used constructs.

Given that they have to compile enums and switch statements, I'm sure we can trust them to figure out what an enum class is 😉. It's just a (safer) enum wrapped in a namespace after all.

I think type_specifier should still be split into .hpp and .cpp as currently all the implementation is in .hpp.

Given that type_specifier itself is just an enum class, and that the helper functions provided are relatively simple, I think it's fine (maybe even better) to leave it all in .hpp (the functions are constexpr, which solves any possible ODR issues). That said, if you have a particularly strong opinion, I can split it into a .cpp.

emplace_back needs a mention in the docs along with move semantics in general as it's certainly an advanced topic.

Can also touch on it, currently using it to ensure that NodeList builds without issues for both smart and raw pointers.

In terms of less intrusive changes to include, what are you thinking? I could add (in order of least to most intrusive):

  • Fix memory leaks
  • Formatting changes
  • NodePtr (but set to const Node*or evenNode*` if you think that's too much
  • New type_specifier (a bit more invasive but quite valuable imo)
  • namespace ast? (not sure about this one)

@Fiwo735
Copy link
Collaborator

Fiwo735 commented Mar 6, 2024

Not 100% sure I follow here. I'm pretty sure I use NodePtr everywhere in the AST files. If you're talking about the AST constructors / parser, unfortunately, flex / bison are pretty tied to a pre C++11-style API, and don't work well with unique_ptr as it tries to copy it around (which is illegal) so using raw pointers there, and then allowing your ast to own this memory by storing them internally as unique_ptrs is the best we can do imo.

Oh, so that means that all of these have to stay as Node*?
image

If so, then I think this is going to be too confusing for students:
image

Is there any way around it? Like some conditional NodePtr hackery for flex/bison or extracting raw pointers from unique_ptr etc.?

Btw, I've noticed some places still have the previous type style (same goes for files in compiler_tests/ and debugging/, but I don't think it's that important to be consistent there):
image

That said, if you have a particularly strong opinion, I can split it into a .cpp.

I think it'd be more consistent with existing files to split constexpr std::string_view ToString(TypeSpecifier type)

In terms of less intrusive changes to include, what are you thinking?

I'd say fixing memory leaks is definitely the most important to be merged into main. The other changes are less critical and they affect .cpp and .hpp, which I believe we should avoid modifying at this point to avoid confusion (while obviously pushing them to main_2024 and merging after the submissions as I do agree they're very valuable additions!).

@simon-staal simon-staal changed the base branch from main to main_2024 March 7, 2024 23:04
@simon-staal
Copy link
Collaborator Author

Update:

  • NodePtr-ed everything (at the cost of adding some std::moves, definitely going to have to explain move semantics well now 😅. Although the new interface is definitely nicer / more correct, since it's very clear where ownership of resources is being transferred - I avoided this in my initial implementation to avoid the std::moves, but they may as well learn things properly.
  • Made ast lowercase
  • Not moving type_specifier functions into a .cpp file, as this is not possible for constexpr functions. I could make it non-constexpr but that's definitely worse, although I could be very easily convinced that the addition of move semantics is enough of a difficulty spike for students, and constexpr would confuse them too much.

@simon-staal simon-staal requested a review from Fiwo735 March 7, 2024 23:38
@simon-staal simon-staal requested a review from Jpnock March 10, 2024 17:38
@simon-staal simon-staal merged commit 6c2b153 into LangProc:main_2024 Mar 11, 2024
1 check failed
@simon-staal simon-staal deleted the refactor-skeleton branch March 11, 2024 22:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants