feature-request: wide-character support (UTF-8) #13

Open
Tux opened this issue Mar 17, 2015 · 7 comments


@Tux
Contributor

Tux commented Mar 17, 2015

Currently, elvis shows multi-byte characters as multiple bytes:

» shows as Â»

It feels like a betrayal to use gvim instead of elvis just to be able to edit these files.

@mbert
Owner

mbert commented Mar 17, 2015

Yes, that's probably elvis' biggest shortcoming now.
But it's nothing I can fix. I tried, but I was not successful.
That's probably something Steve could do better than anyone else, but it looks like he currently does not have the time to do so.

@simplehex

Elvis would get a second life if somebody could devote some time to it. The lack of UTF support is slowly killing our beloved editor, as people consequently turn to vim...

@mbert
Owner

mbert commented Feb 24, 2016

Agreed. Somebody will just have to do it.

@ib
Contributor

ib commented Jul 23, 2020

I have recently spent some time looking at the source code from this perspective and think that the necessary changes will affect large parts of Elvis.
Although characters are already stored in an int here and there (which would be sufficient to hold a UTF-8 encoded character), major parts (especially the basic ones) are quite char-based.

So what are the options for editing UTF-8 encoded files with Elvis?

You are using a UTF-8 terminal:

You will certainly be using a UTF-8 font as well. Start Elvis with set nonascii=all, and there should be no problem reading, entering and displaying any non-ASCII character.

However, because UTF-8 encoded non-ASCII characters are two, three or four bytes long, lines containing them show a kind of trailing whitespace: Elvis displays fewer characters than the number of bytes it received. A screen refresh (^R in input mode, ^L otherwise) corrects the display, but the line will remain longer than the number of displayed characters. This is annoying but should not have many negative effects.

All reading, input and writing will be in UTF-8 mode.

You are using an ISO terminal:

So far you were screwed, but there is an experimental patch that allows UTF-8 to ISO conversion, back and forth! (For the moment ISO 8859-1 only, but if there is interest, I am willing to extend this to support other or even all ISO 8859 encodings.)

The nonascii option can be set to convert, which changes UTF-8 encoded input to ISO encoding. This affects all data that is read, but only the way the characters are displayed: they actually remain stored with their true UTF-8 value. (Again, there is a kind of trailing whitespace, because Elvis displays fewer characters than the number of bytes received; see above.)

Any ISO character you type is converted into UTF-8 (but still displayed in ISO encoding), so UTF-8 encoded files can be edited without corrupting their encoding, while users still see the characters they type.

Writing is done unmodified, in UTF-8.

What else could be done?

Recode UTF-8 files to your terminal's ISO encoding, pass the recoded file to Elvis, and recode it back to UTF-8 after saving. This could be automated, but ugh!

Summary

Unless someone with a lot of time rewrites Elvis, there will be no native, full and true UTF-8 support. But as long as you work with only one ISO character set, it is not completely impossible to edit UTF-8 encoded files with Elvis, in both UTF-8 and ISO terminals.

@ib
Contributor

ib commented Jul 24, 2020

Is anyone still using Elvis in an ISO 8859 terminal and is interested in the experimental patch mentioned above? (If so, which ISO 8859 encoding?)

@mbert
Owner

mbert commented Jul 24, 2020

> Is anyone still using Elvis in an ISO 8859 terminal and is interested in the experimental patch mentioned above? (If so, which ISO 8859 encoding?)

I am certainly not. Thanks also for your investigation into UTF-8. I tried to implement UTF-8 support several years ago, but due to my lack of knowledge of termcap programming I saw no chance that I would ever complete it.

It seems that people who really need to edit Unicode files will have to use vim instead.

@Tux
Contributor Author

Tux commented Jul 24, 2020

I (almost) never use elvis in a plain terminal environment, but (almost) always as elvis -Gx11 -fork (which is my alias for vi).
I just accept that my multibyte characters take up multiple positions and are not recognizable. It is what it is.
Elvis still accepts my Compose UTF-8 characters, which then show as junk, but I know it is valid junk.
