Implement Unicode support by utilizing PosixString and friends #88
Codec/Archive/Tar/Types.hs
fromTarPathToWindowsPath :: MonadThrow m => TarPath -> m WindowsPath
fromTarPathToWindowsPath tarPath = do
  let posix = fromTarPathToPosixPath tarPath
  toWindowsPath posix

-- | We assume UTF-8 on posix and UTF-16 on windows.
toWindowsPath :: MonadThrow m => PosixPath -> m WindowsPath
toWindowsPath posix = do
  str <- PFP.decodeUtf posix
  win <- WFP.encodeUtf str
  pure $ WS.map (\c -> if WFP.isPathSeparator c then WFP.pathSeparator else c) win

-- | We assume UTF-8 on posix and UTF-16 on windows.
toPosixPath :: MonadThrow m => WindowsPath -> m PosixPath
toPosixPath win = do
  str <- WFP.decodeUtf win
  posix <- PFP.encodeUtf str
  pure $ PS.map (\c -> if PFP.isPathSeparator c then PFP.pathSeparator else c) posix

-- | We assume UTF-8 on posix and UTF-16 on windows.
toPosixPath' :: MonadThrow m => OsPath -> m PosixPath
#if defined(mingw32_HOST_OS)
toPosixPath' (OsString ws) = toPosixPath ws
#else
toPosixPath' (OsString ps) = pure ps
#endif

-- | We assume UTF-8 on posix and UTF-16 on windows.
fromPosixPath :: MonadThrow m => PosixPath -> m OsPath
#if defined(mingw32_HOST_OS)
fromPosixPath ps = OsString <$> toWindowsPath ps
#else
fromPosixPath ps = pure $ OsString ps
#endif
These are the main conversion functions. As we can see, we leave posix filepaths untouched, but assume UTF-8 encoding when converting from posix filepaths (e.g. those coming from the actual tar archive) to windows, where we assume UTF-16.
The tar spec obviously demands PosixPath, where we don't assume an encoding. So all filepaths within the tar archive are posix.
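For readers following along, here is a minimal standalone sketch of the posix-to-windows direction discussed in this thread. It is not the PR's code: `posixToWindows` is a hypothetical name, it fixes the monad to `Maybe` for simplicity, and it assumes a `filepath` version that ships `System.OsPath.Posix` and `System.OsPath.Windows`. It uses `pack`/`unpack` rather than a string-level `map` so that it does not depend on a particular `os-string` release.

```haskell
import qualified System.OsPath.Posix as PFP
import qualified System.OsPath.Windows as WFP

-- Hypothetical helper: decode the posix path as UTF-8, re-encode it as
-- UTF-16, then normalise every separator to the Windows one.
-- Returns Nothing if the posix bytes are not valid UTF-8.
posixToWindows :: PFP.PosixPath -> Maybe WFP.WindowsPath
posixToWindows posix = do
  str <- PFP.decodeUtf posix   -- assumes UTF-8 on posix
  win <- WFP.encodeUtf str     -- assumes UTF-16 on windows
  pure (WFP.pack (map normalise (WFP.unpack win)))
  where
    -- On Windows both '/' and '\\' count as separators.
    normalise c
      | WFP.isPathSeparator c = WFP.pathSeparator
      | otherwise             = c

main :: IO ()
main = do
  posix <- PFP.encodeUtf "dir/sub/file.txt"
  case posixToWindows posix of
    Nothing  -> putStrLn "path is not valid UTF-8"
    Just win -> WFP.decodeUtf win >>= putStrLn  -- prints dir\sub\file.txt
```

The posix path itself is never re-encoded unless a Windows path is actually needed, which matches the point above that paths inside the archive stay posix with no encoding assumed.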
Yep, makes sense.
It seems this is safe, because hackage-server uses
Thanks, that's great! My current intention is to release the current
There is also an option for a middle ground: break this PR into two. One to change low-level interfaces to use
@hasufell what do you think? Are you interested in splitting the PR into two phases? That's obviously a massive amount of additional work, which we can avoid by delaying entire
I'm fine with delaying
@Bodigrim I rebased. It's possible I made a bit of a mess or there are redundant functions.
I tried hard to avoid
This is still blocked from doing a proper hackage release due to Win32: haskell/win32#226 (comment)
@hasufell Windows failures seem genuine.
Yeah, on my todo list.
Fixes #78
TODO: