Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add functionality to convert windows paths (all shapes and sizes) to loc values #1942

Merged
merged 47 commits into from
May 22, 2024

Conversation

jurgenvinju
Copy link
Member

@jurgenvinju jurgenvinju commented May 10, 2024

Rethinking this based on feedback from @DavyLandman ; the current PR splits the responsibility of interpreting Windows file paths into a syntactical and a semantic part. Also it supports all shapes and sizes, including UNC paths, and does not normalize automatically.

  • syntax definition in lang::paths::Windows reverse engineered from docs and open-source path parsers in Java and C#; this is the syntactic part.
  • simple syntactic convertor function in lang::paths::Windows deals with the 5 kinds of paths, mapping them to file:///, cwd:///, cwdrive:/// and unc:/// paths
    • warns about illegal paths via a parse error (e.g. trailing "." or whitespace)
    • interprets different prefix notations C:, C:\, \, \\ mapping them to different schemes
    • skips all additional slashes or backslashes, as it should
    • does no decoding of the path literals, but everything is encoded for URI syntax automatically. E.g. file:///C:/Program%20Files is the result of converting C:\Program Files\
  • For the semantic interpretation (actual file IO), we've had to add two new resolvers to the team:
    • |cwdrive:///| refers to the root of the file system of the current working directory. On Windows that would be D:/ or C:/ for example, and on Unix it is always /. Not useful for unix, but essential to be able to encode the \filename notations of Windows paths.
    • |unc://<machine>/<share>/<path>| is mapped from \\<machine>\<share>\<path> and back, such that UNC paths actually work on Windows systems. This only works on Windows systems. We could have used the empty authority position of |file:///|, however that provides a false sense of portability that we do not have. |unc:///| literals work as identifiers on any system, but only for file IO on Windows systems.
      • Note that |unc:///| paths have the annoying property that machine names are valid only for the local network, so they are just as machine-specific as the file:/// scheme is.
      • The actual implementation derives directly from the file:/// scheme because new File and friends from the Java SDK implement the UNC scheme natively.

Added some tests to this effect, but have to run them on Windows.

…ng on each different OS filesystem type, including documentation
@jurgenvinju jurgenvinju requested a review from DavyLandman May 10, 2024 11:53
Copy link

codecov bot commented May 10, 2024

Codecov Report

Attention: Patch coverage is 38.88889% with 11 lines in your changes are missing coverage. Please review.

Project coverage is 49%. Comparing base (178557c) to head (55fc9de).
Report is 20 commits behind head on main.

Files Patch % Lines
src/org/rascalmpl/uri/file/UNCResolver.java 28% 10 Missing ⚠️
src/org/rascalmpl/uri/file/FileURIResolver.java 0% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##              main   #1942    +/-   ##
========================================
  Coverage       49%     49%            
- Complexity    6206    6248    +42     
========================================
  Files          662     664     +2     
  Lines        59446   59464    +18     
  Branches      8618    8621     +3     
========================================
+ Hits         29189   29333   +144     
+ Misses       28060   27937   -123     
+ Partials      2197    2194     -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@DavyLandman DavyLandman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this function is not ready. Especially for Windows the cases are not thought out.

We need at least some test that show how the library is supposed to work.

src/org/rascalmpl/library/Location.rsc Outdated Show resolved Hide resolved
src/org/rascalmpl/library/Location.rsc Outdated Show resolved Hide resolved
src/org/rascalmpl/library/Prelude.java Outdated Show resolved Hide resolved
src/org/rascalmpl/library/Prelude.java Outdated Show resolved Hide resolved
src/org/rascalmpl/library/Prelude.java Outdated Show resolved Hide resolved
src/org/rascalmpl/library/Prelude.java Outdated Show resolved Hide resolved
@jurgenvinju

This comment was marked as outdated.

@jurgenvinju

This comment was marked as outdated.

@DavyLandman

This comment was marked as outdated.

@jurgenvinju

This comment was marked as outdated.

@jurgenvinju
Copy link
Member Author

Found the exact code that we are looking for here: https://github.com/frohoff/jdk8u-jdk/blob/master/src/windows/classes/sun/nio/fs/WindowsPathParser.java

That's GPL licensed. So these three algorithms (parser and normalize and finally toURI) should be mapped exactly to Rascal to get a very reasonable mapping. That should also guarantee that access of these files via the file:/// scheme works as expected. If that can not be achieved then different protocols should be introduced, like unc:/// that can interpret the encoding back to where it came from. I'm hoping for now the file scheme will do, however.

@jurgenvinju jurgenvinju marked this pull request as draft May 12, 2024 12:18
@jurgenvinju jurgenvinju changed the title added new function for converting string paths to loc values, depending on each different OS filesystem type, including documentation add functionality to convert windows paths (all shapes and sizes) to loc values May 13, 2024
@jurgenvinju
Copy link
Member Author

Rewrote from scratch; worked all the feedback from @DavyLandman into it, I believe. Also I'm focusing only on Windows paths from now on. With the current design we can add support for other Path syntaxes incrementally and in different modules.

@jurgenvinju jurgenvinju requested a review from DavyLandman May 13, 2024 15:01
@jurgenvinju
Copy link
Member Author

I need a little help on the UNC DOS test however. That I can not debug without a running system.

@DavyLandman
Copy link
Member

DavyLandman commented May 15, 2024

At the very least the grammar has to be updated, as : is a legal char in a DOSDevicePath. (the other UNC test I fixed, and works now)

But I'm open to having a pair programming session.

@jurgenvinju
Copy link
Member Author

Strange; in my documention people always write $ insteaf of : because : is not allowed. I'll dig into this again.

@DavyLandman
Copy link
Member

DavyLandman commented May 15, 2024

Strange; in my documention people always write $ insteaf of : because : is not allowed. I'll dig into this again.

The DOS Device path section on msdn is quite "good" in the coverage I think.

To recap:

\\localhost\c$

is a UNC Network path, pointing towards to a network share named C$. This is a default networkshare that windows makes (and maps onto the root c drive. but it could also have been \\localhost\share-for-jurgen. The $ ones are non-user defined shares.

\\(.|?)\...

Are UNC DOS Device paths, offer access to all the local devices. so \\.\c:\ is the c drive, but it can also be accessed as \\.\BootPartition. Now since the network UNC is also a network device, you can acces that as well via \\.\UNC\<original network path>.

…sallowing . and .. as the first path of a DOS Device UNC path. Also removed normalization test since the parser is not going to normalize anything. Too many ifs and buts that need to be dealt with carefully for all kinds of filesystems.
@jurgenvinju
Copy link
Member Author

Right. Thanks. I think I've fixed this for now. Would be great if Microsoft posts RFC's for this kind of stuff. Now I combine reading source code and reading documentation :-) With your help it is becoming a lot better.

I removed the normalization tests. All filesystem notation normalization is going to be deferred to another library function, because this is a complex matter. So for now c:\ and C:\ will produce different loc values pointing to the same drive.

@jurgenvinju
Copy link
Member Author

Unfortunately \\.\BootPartition is not available on the test machine of github actions. otherwise that would be a nice test as well. The Admin share is there however, so we can make tests for that.

@jurgenvinju jurgenvinju marked this pull request as ready for review May 22, 2024 15:26
@DavyLandman
Copy link
Member

LGTM, thanks for all the effort of getting it right on Windows.

@jurgenvinju jurgenvinju merged commit 0b0b547 into main May 22, 2024
6 of 7 checks passed
@jurgenvinju jurgenvinju deleted the loc-from-filesystem-converter branch May 22, 2024 15:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants