Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Unicode support by utilizing PosixString and friends #88

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

hasufell
Copy link
Member

@hasufell hasufell commented Dec 11, 2023

Fixes #78

TODO:

@hasufell hasufell requested a review from Bodigrim December 11, 2023 15:42
@hasufell hasufell marked this pull request as draft December 11, 2023 15:46
Comment on lines 409 to 465
fromTarPathToWindowsPath :: MonadThrow m => TarPath -> m WindowsPath
fromTarPathToWindowsPath tarPath = do
let posix = fromTarPathToPosixPath tarPath
toWindowsPath posix

-- | We assume UTF-8 on posix and UTF-16 on windows.
toWindowsPath :: MonadThrow m => PosixPath -> m WindowsPath
toWindowsPath posix = do
str <- PFP.decodeUtf posix
win <- WFP.encodeUtf str
pure $ WS.map (\c -> if WFP.isPathSeparator c then WFP.pathSeparator else c) win

-- | We assume UTF-8 on posix and UTF-16 on windows.
toPosixPath :: MonadThrow m => WindowsPath -> m PosixPath
toPosixPath win = do
str <- WFP.decodeUtf win
posix <- PFP.encodeUtf str
pure $ PS.map (\c -> if PFP.isPathSeparator c then PFP.pathSeparator else c) posix

-- | We assume UTF-8 on posix and UTF-16 on windows.
toPosixPath' :: MonadThrow m => OsPath -> m PosixPath
#if defined(mingw32_HOST_OS)
toPosixPath' (OsString ws) = toPosixPath ws
#else
toPosixPath' (OsString ps) = pure ps
#endif

-- | We assume UTF-8 on posix and UTF-16 on windows.
fromPosixPath :: MonadThrow m => PosixPath -> m OsPath
#if defined(mingw32_HOST_OS)
fromPosixPath ps = OsPath <$> toWindowsPath ps
#else
fromPosixPath ps = pure $ OsString ps
#endif
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the main conversion functions. As we can see... we leave posix filepaths untouched, but assume UTF-8 encoding when converting from posix filepaths (e.g. those coming from the actual tar archive) to windows, where we assume UTF-16.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tar spec obviously demands PosixPath where we don't assume an encoding. So all filepaths within the tar archive are posix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, makes sense.

@hasufell
Copy link
Member Author

@Bodigrim
Copy link
Contributor

Thanks, that's great!

My current intention is to release the current master branch as tar-0.6 and postpone OsPath transition to tar-0.7. That's because:

  • I want to keep FilePath-based high-level API for a while, so that clients of tar-0.5 can upgrade quickly and painlessly. Otherwise many of them will be stuck with an unreliable version.
  • I do not want to push build plans to the very bleeding edge of the ecosystem (Win32 / unix / directory). Otherwise tar-0.6 would not be part of Stackage LTS for a very long time.

There is also an option for a middle ground: break this PR into two. One to change low-level interfaces to use PosixString (targeting tar-0.6), and another one to upgrade high-level API (postponed for tar-0.7). The first one would:

  • Change TarPath to TarPath PosixPath PosixPath.
  • Change things like encodeLongNames to work on GenEntry PosixPath PosixPath.
  • But things like pack / unpack still take FilePath.
  • Support for Unicode filenames should be feasible.
  • Depend filepath-1.4.100 is fine, but no further.

@hasufell what do you think? Are you interested in splitting the PR into two phases? That's obviously a massive amount of additional work, which we can avoid by delaying entire OsPath transition until tar-0.7.

@hasufell
Copy link
Member Author

I'm fine with delaying

@hasufell hasufell force-pushed the unicode-posixstring branch from 35ca6b0 to 62794b9 Compare December 12, 2023 08:01
@hasufell hasufell mentioned this pull request Dec 14, 2023
@Bodigrim Bodigrim force-pushed the master branch 2 times, most recently from b66f5cc to d94a988 Compare December 19, 2023 01:00
@hasufell hasufell force-pushed the unicode-posixstring branch from 62794b9 to f366d67 Compare January 6, 2024 13:05
@hasufell
Copy link
Member Author

hasufell commented Jan 6, 2024

@Bodigrim I rebased. It's possible I made a bit of a mess or there are redundant functions.

@hasufell hasufell force-pushed the unicode-posixstring branch 4 times, most recently from f3675c2 to 28aa81c Compare January 7, 2024 11:23
@hasufell
Copy link
Member Author

hasufell commented Jan 7, 2024

I tried hard to avoid MonadThrow now, so there's some unsafePerformIO and HasCallStack.

@hasufell hasufell marked this pull request as ready for review January 7, 2024 11:26
@hasufell
Copy link
Member Author

hasufell commented Jan 7, 2024

This is still blocked from doing a proper hackage release due to Win32: haskell/win32#226 (comment)

@hasufell hasufell force-pushed the unicode-posixstring branch from 28aa81c to 1f4feb6 Compare January 7, 2024 11:33
@Bodigrim
Copy link
Contributor

@hasufell Windows failures seem genuine.

@hasufell
Copy link
Member Author

Yeah, on my todo list.

@Bodigrim Bodigrim force-pushed the master branch 26 times, most recently from e1e02d8 to bd8cae7 Compare September 22, 2024 16:06
@Bodigrim Bodigrim force-pushed the master branch 2 times, most recently from 1299d00 to f6ae02c Compare November 27, 2024 21:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

convert to PosixPath
2 participants