-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IO Abstraction: Introducing PathId & FileSystem (Phase 1) #3219
Conversation
NOTE: This is only the first phase of the large goal of IO abstraction. Big picture goal is to abstract IO operations out of Surelog and into a class implementation that can be overriden by client application. This facilitates implementation of other more extensive features (like support for UNC paths, portale cache, multi-source root directories, cache archives, etc). Adding those directly to Surelog is not ideal since in many cases this features pull in other thrid-party dependecnies that might not have the same licensing terms as Surelog (and so leave it to the client/application to decide). Introducing two new classes - PathId & FileSystem PathId - New class parallel to SymbolId that represent path-like objects for IO operations. Each instance can represent a directory or a file but the internal representation doesn't necessarily have to be a _path_. For instance, in case of compressed archive of sources, it can be a string representation of the archive + the offset. The only enforcement is that the representation needs to be *printable* i.e. should be convertible to a string (since thee representation is stored in SymbolTable). For the advanced users. this limitation can very easily be overcome and is left as an exercise. FileSystem - This reroutes access to native file system or in special advanced cases can restrict access to the file system. All operations of PathId are meant to be routed thru' this class. This commit contains only the bare-bones of the implementation (to keep the rest of the change review-able). Follow up commit will add the real meat to this implementation. Most of this change is about switching from SymbolId to use of PathId where applicable. Some specific ones that may be of interest to the reviewer. * Restrict use of std::filesystem to minimal scope (functions where possible). Do NOT pass std::filesystem::path across function boundaries. Use PathId instead. * Do NOT cache unpinned strings, especially the ones that represent paths. Add them to the SymbolTable and resolve it on restore. * PreprocessFile::m_fileId used to represent both a file path (when processing a file) or a macro name (when processing a macro). This was a problem since now the path needs to be represented using PathId but the macro name is represented using SymbolId. So, added two independent variables PreprocessFile::m_macroId & PreprocessFile::m_fileId.
@alaindargelas This PR pairs with chipsalliance/UHDM#793 |
@alaindargelas @hzeller Since this is only the first phase of the bigger change, the attached files should help provide larger picture of the end goal. FileSystem.h.txt A follow up phase 2 will introduce these files and all relevant changes to go with it across the project. |
There are absolutely no unit-tests that verify the new path-ID behavior, nor are there any extension to existing unit tests to verify corner cases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A first round of review comments.
Unfortunately, this one is already merged, but before any other pull request can be added in this series, we first need to work out all the details here.
In general, this is too much and not reviewable, you really need to break independent changes to what they are
- DesignComponent: change return value of name() from
const std::string&
tostd::string
can be a separate PR. In particular it would be then easy to discuss why this is notstd::string_view
and work towards what it is - PathId should start out as a no-op: have a typedef SymboldID to PathId, then change everything that needs to be a path from SymbolId to Pathid. Huge text change, but easy to review, no actual change.
- All new classes need a description of what they are. Also naming:
FileSystem
sounds like a poor name for what it does, or it is not explained enough. - All new classes need unit tests.
- Why does something that is called SometingId have a reference to the thing that supplies these IDs ? This sounds like a convenience hack.
Here are a first round of comments that we should address before the next phase of the filesystem series.
Whenever you work on changes, please make sure to break them up in simple steps
- big changes only in a way that they are No-ops first, e.g. with the
typedef
of SymboldId -> NodeId. - introducing new classes one at a time with proper unit tests
- Wire new classes up as needed in separate changes.
return *this; | ||
} | ||
|
||
const SymbolTable *getSymbolTable() const { return m_symbolTable; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is unclear why a path-ID needs to know about the SymbolTable. It sounds like a leaky abstraction or 'convenience' member.
Ideally, this should be removed. Anything called 'ID' should not not carry more context around. Or it needs to be clearly explained in the PathId class description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's intentional. It makes the implementation of FileSystem API lot easier and elegant. Please check the full implementation of FileSystem in files attached here
I did bring up this design decision in #3054 and we had some back and forth before it went silent. For SymbolId this might not be as necessary since the use of it is restricted to Surelog code. But for PathId, it is meant to be accessed and used in application code. Putting the overhead of managing the owner SymbolTable on client application seemed unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leaking abstraction often makes implementation easier, but it doesn't mean it is the right thing.
I'll have a look at the CC file, but it currently smells like a convenience hack.
virtual const std::filesystem::path &getCwd(); | ||
virtual PathId getCwd(SymbolTable *const symbolTable); | ||
|
||
virtual PathId copy(PathId id, SymbolTable *const toSymbolTable); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this is a class called FileSystem
, a function called copy()
copies files around, right ?
If not, this needs a different name, and also, it needs a description.
Also, don't make the parameter const
, it makes absolutely no difference (as you const
the value, not the point location), and thus is just confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this is a class called FileSystem, a function called copy() copies files around, right ?
Good point. I didn't think along those lines. Do you have a suggestion of what this function should be renamed to. The purpose is to copy a PathId from one SymbolTable into another (giving the FileSystem a chance for any bookkeeping it might need to do).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In which contexts is this used ? Maybe duplicatePath()
could be a good name ?
What are the scenarios where this will be used ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Location
takes in a PathId object but the owner of this object could be whoever. When an error is recorded with the ErrorContainer, it copies the contained PathId into its own SymbolTable.
In some ways, the name isn't so bad. FileSystem is responsible for everything PathId and so copy
with PathId as parameter can be a potential overloaded function.
PathId copy(PathId srcId, SymbolTable *destinationSourceTable); // Copies srcId into destinationSymbolTable
bool copy(PathId srcId, PathId dstId); // Copies files from location srcId to dstId
PathId is a data container class not intended to have any logic. There are few operator functions that might be worth some test cases that I can add. As for the FileSystem itself, yes, I do have uni tests covering most file operations and example to illustrate In-memory file system, portable cache and multiple source directories. These will be part of the follow up Phase-3 PR. |
To be honest, this isn't one night effort. This was a group effort from over three months. We did a number of design iterations for deciding how we wanted to implement some of the requested features in our tool. The change was organic but the effort paid off. I am cherry-picking from a much larger change in an effort to get this back into public repository. Frankly, I thought that's what this change was - a big text change and doesn't change anything in terms of logic. No test case broke and no test log really change in any significant meaningful manner. All this change was doing is introducing PathId class in the codebase and fixing all the fallouts from it.
That approach wouldn't have caught all the places that needed attention. I did do a few PR before this one for the changes that I could easily pull out from the larger effort. But breaking this into any smaller would have meant redoing (and verifying) a lot of work that was already completed.
And I wouldn't have known that such a change would be needed unless I had the PathId change in place.
Sure. Will do so.
PathId meant to be a data class and so didn't see much value in adding tests. There are few operator functions (comparison operators) that might demand such and so will add those in a follow up PR.
I have already replied back to your comment about this on the review. We can discuss this further there. IMO, PathId containing the owner SymbolTable makes sense to keep the API simpler. May be PathId shouldn't be called PathId.
This is good rule book to follow when you have a concrete design in front of you to implement. Much of our effort is organic and so I try my best to contribute changes back. It's not always that easy. All that said, I am happy to iterate on specific PR if you feel that needs more work and needs to be broken up. Unfortunately. given time constraints, it might not always be possible but will make best effort. |
…rnings * Fixed build warnings and also updated the workflow to match local settings so the warnings should now be reported on CI as well. * Add unit tests for PathId * Some documentation & decorative changes
Follow up PR #3223 |
…rnings * Fixed build warnings and also updated the workflow to match local settings so the warnings should now be reported on CI as well. * Add unit tests for PathId * Some documentation & decorative changes
…rnings * Fixed build warnings and also updated the workflow to match local settings so the warnings should now be reported on CI as well. * Add unit tests for PathId * Some documentation & decorative changes
But that is how code reviews and pull requests work. Sure, you might not know beforehand what changes, that is why you work on a full change first until you're happy with it. The work then to make it reviewable is to break it up into the parts. Find all the changes independent from the rest, but that you also need for everything coming together, that can have their own review. Such as the Yes, it requires a little bit more work to prepare a pull request, but that way it is not just a code-dump, but a well thought-out sequence of design decisions that the reviewers can follow. This code review contains 116 changed files! With a mix of functionality addition, text changes, return value changes... Any functionality change should be as minimal as possible (a few files and all the corresponding unit tests). Followed by a separate PR or at least commit that wires it up.
That is the point: you first develop things and find out what all is needed. That is for sure.
Please do. I am not saying that it is simple. And often it requires to re-do the change manually into multiple commits. |
Taking the PathId conversation to discussion board #3228 |
IO Abstraction: Introducing PathId & FileSystem (Phase 1)
NOTE: This is only the first phase of the large goal of IO abstraction.
Big picture goal is to abstract IO operations out of Surelog and into a class implementation that can be overriden by client application. This facilitates implementation of other more extensive features (like support for UNC paths, portale cache, multi-source root directories, cache archives, etc). Adding those directly to Surelog is not ideal since in many cases this features pull in other thrid-party dependecnies that might not have the same licensing terms as Surelog (and so leave it to the client/application to decide).
Introducing two new classes - PathId & FileSystem
PathId - New class parallel to SymbolId that represent path-like objects for IO operations. Each instance can represent a directory or a file but the internal representation doesn't necessarily have to be a path. For instance, in case of compressed archive of sources, it can be a string representation of the archive + the offset. The only enforcement is that the representation needs to be printable i.e. should be convertible to a string (since thee representation is stored in SymbolTable). For the advanced users. this limitation can very easily be overcome and is left as an exercise.
FileSystem - This reroutes access to native file system or in special advanced cases can restrict access to the file system. All operations of PathId are meant to be routed thru' this class. This commit contains only the bare-bones of the implementation (to keep the rest of the change review-able). Follow up commit will add the real meat to this implementation.
Most of this change is about switching from SymbolId to use of PathId where applicable. Some specific ones that may be of interest to the reviewer.