Clarify “letters” in spec #682
Comments
cc: @f-f on this, as I don't personally have an opinion. As far as I know the intention is to accept lowercase Unicode characters in UTF-8, minus some special characters, but perhaps it was meant to be ASCII all along.
There are some explicit test cases here: registry-dev/lib/test/Registry/PackageName.purs Lines 66 to 84 in 4cd57a1
I hope there isn’t like a filesystem limitation. It would be awesome if the support was there so you could have
Regardless, I think there is room to spell out the English in the spec a bit more clearly. …And add some clarifying test cases like
Actually, I think there is something important to bring up: Unicode confusion. This is Latin
If I were to suggest a solution, I would look at the (now defunct, sadly) ocaml-m17n.
In this case, that would mean each kebab-separated ‘word’ can only be of one script type, which adds quite a bit of complexity, with the trade-off being safety.
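To make the "one script per kebab-separated word" idea concrete, here is a minimal Python sketch. It is not the registry's code and not how ocaml-m17n works; the helper names `script_of` and `mixed_script_words` are hypothetical. Python's standard library does not expose the Unicode Script property, so the sketch approximates a character's script by the first word of its Unicode name (e.g. LATIN, CYRILLIC), which is an assumption that holds for common alphabets but is not a real script lookup.

```python
import unicodedata


def script_of(ch):
    # Approximation (assumption): use the first word of the Unicode
    # character name as a stand-in for its script, e.g. "LATIN" or
    # "CYRILLIC". Characters without a name map to "UNKNOWN".
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def mixed_script_words(package_name):
    # Return the hyphen-separated words that mix more than one script.
    # Digits and other non-letters are ignored for this check.
    bad = []
    for word in package_name.split("-"):
        scripts = {script_of(ch) for ch in word if ch.isalpha()}
        if len(scripts) > 1:
            bad.append(word)
    return bad


# "p\u0430ypal" uses CYRILLIC SMALL LETTER A in place of Latin "a".
print(mixed_script_words("p\u0430ypal"))
# Each word stays within one script, so nothing is flagged.
print(mixed_script_words("http-\u043a\u043b\u0438\u0435\u043d\u0442"))
```

A check like this would reject a lookalike name that swaps a single letter, while still allowing a name whose words are each written in one script. It does nothing about whole-word lookalikes, which is part of the added complexity mentioned above.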
So, the proposed goal here would be to
I'd also guess that another goal is to make it possible for all users to type such a package name. For example, a user with a Latin keyboard may not know how to type a CJK character, whereas the reverse is often true.
There also seems to be a spectrum for how to deal with this issue:
There are several barrels of worms that would be cracked open by allowing the whole of Unicode in these names: no one can type them directly, there is a series of attack vectors, etc. The original intent was to only allow ASCII, and I strongly feel we should stick to that since I don't believe the upside of more varied package names is worth the hassle here.
@f-f @JordanMartinez I’m happy y’all got back, since it does seem this is a can of worms indeed & should be clearer in the spec docs. Personally, I like the idea of losing American-centrism (the ‘A’ in ASCII), even if it is still limited to bicameral scripts. Deburring
That said, I can understand favoring the simplicity (at least for now). In the case of OPAM, apparently you can only upload packages named in ASCII to the registry; however, this shouldn’t be a limitation on OCaml package names that don’t plan to be added to the main public registry (tho in practice
Which goes out even further on a tangent: is the PureScript Registry project even capable of allowing private registries? Are there mirrors for the public registry for resilience, like in the Perl community? (I think about this a lot after seeing various GitHub & SourceHut outages).
I'm not a native English speaker - Italian living in Finland, so I get to use all the various
Spago uses the same package-name parsing code as the Registry, so you wouldn't be able to build your project with it. Other things might work I guess?
Private registries and mirrors are not the same thing. Spago allows for private package sets because everyone needs custom packages at some point, but we purposefully did not choose to allow for private registries, to avoid incentivising fragmentation. E.g. I'm not a fan of the OCaml situation, where every big company has their own compiler and package ecosystem. Mirrors of the official registry, on the other hand, have been accounted for and are on the roadmap. I do not recall if the spec contains any mention of this yet, but the original RFC calls this "storage backends".
Sounds like we at least have some new test cases + descriptions to add regardless 😅
Are uppercase letters allowed? Is lower é a letter? Is lower α? Is ก? Is ㅎ? Is 🍉?
https://pursuit.purescript.org/packages/purescript-parsing/10.2.0/docs/Parsing.String.Basic#v:lower
I’m assuming using lower allows anything from a bicameral script (just like how PureScript identifiers work). Asking because I got really frustrated by OCaml+OPAM’s limitations to ASCII only for packages/identifiers/modules, & this spec isn’t clear on what ‘letter’ means.
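For reference, the characters asked about can be classified by their Unicode general category. The following is a minimal Python sketch, not the registry's PureScript parser; the helper name `classify` is hypothetical, and it assumes that a "lowercase letter" parser accepts exactly the general category Ll, which should be checked against the actual implementation of lower.

```python
import unicodedata

# Sample characters from the question, written as escapes so that the
# precomposed code points are unambiguous.
SAMPLES = [
    "a",           # LATIN SMALL LETTER A
    "A",           # LATIN CAPITAL LETTER A
    "\u00e9",      # LATIN SMALL LETTER E WITH ACUTE
    "\u03b1",      # GREEK SMALL LETTER ALPHA
    "\u0e01",      # THAI CHARACTER KO KAI
    "\u314e",      # HANGUL LETTER HIEUH
    "\U0001f349",  # WATERMELON
]


def classify(ch):
    # Report the general category plus two candidate readings of
    # "lowercase letter": Unicode category Ll versus ASCII a-z only.
    category = unicodedata.category(ch)
    return {
        "category": category,
        "unicode_lower": category == "Ll",
        "ascii_lower": "a" <= ch <= "z",
    }


for ch in SAMPLES:
    print(ch, classify(ch))
```

Under the Ll reading, é and α count as lowercase letters, while ก and ㅎ are letters without case (category Lo) and 🍉 is a symbol (category So). Under the ASCII-only reading, only a through z qualify.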