Handling of local encoding in gesftpserver #5

Open
floppym opened this issue Jul 8, 2017 · 4 comments

Comments

@floppym

floppym commented Jul 8, 2017

I tried out the latest gesftpserver code on my distro of choice (Gentoo Linux).

As a test, I downloaded some files using WinSCP with SFTP v6 enabled. Transferring files with ASCII filenames works fine, but transferring files whose names contain characters outside the ASCII range fails and the connection gets dropped.

I debugged the gesftpserver process, and I was hitting a fatal error in sftp_send_path. This ends up calling sftp_iconv to translate the path from the "local encoding" to UTF-8. I store all my filenames in UTF-8 on disk, so this doesn't make much sense.

Looking into it, I see that sshd (OpenSSH) has LANG=en_US.UTF-8 set in the environment, which it inherits from systemd. However, when sshd forks to start a new login session, it wipes the environment, including LANG. In other words, sftp_iconv fails because LANG and LC_CTYPE are unset in the environment.

I was able to get the transfer to succeed by setting LANG=en_US.UTF-8 via the pam_env module, which gets invoked after the new session is created by sshd.

It seems like there must be a better way to make this work. I know that Linux doesn't really keep track of the encoding used for filenames. However, maybe the gesftpserver program could check to see if the string is already a valid UTF-8 sequence before throwing an encoding error?
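
To make the "check whether it is already valid UTF-8" idea concrete, here is a minimal sketch of such a check. The function name is_valid_utf8 is made up for illustration and is not part of gesftpserver; it follows the well-formedness rules from RFC 3629 (no overlong forms, no surrogates, nothing above U+10FFFF).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of gesftpserver): returns true if the
 * len bytes at s are well-formed UTF-8 per RFC 3629. */
bool is_valid_utf8(const unsigned char *s, size_t len) {
  assert(s != NULL || len == 0);
  size_t i = 0;
  while (i < len) {
    unsigned char c = s[i];
    size_t need;           /* continuation bytes expected after c */
    unsigned long cp, min; /* decoded code point, smallest legal value */
    if (c < 0x80) {        /* plain ASCII byte */
      i++;
      continue;
    } else if ((c & 0xE0) == 0xC0) {
      need = 1; cp = c & 0x1F; min = 0x80;
    } else if ((c & 0xF0) == 0xE0) {
      need = 2; cp = c & 0x0F; min = 0x800;
    } else if ((c & 0xF8) == 0xF0) {
      need = 3; cp = c & 0x07; min = 0x10000;
    } else {
      return false;        /* stray continuation or invalid lead byte */
    }
    if (len - i - 1 < need)
      return false;        /* sequence is truncated */
    for (size_t k = 1; k <= need; k++) {
      if ((s[i + k] & 0xC0) != 0x80)
        return false;      /* not a continuation byte */
      cp = (cp << 6) | (s[i + k] & 0x3F);
    }
    /* Reject overlong encodings, surrogates and out-of-range values. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
      return false;
    i += need + 1;
  }
  return true;
}
```

Note that this is only a heuristic for guessing the encoding: pure ASCII names always pass, and a name in some other encoding could in principle happen to be valid UTF-8, although that is unlikely for typical Latin-1 text.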

@ewxrjk
Owner

ewxrjk commented Jul 14, 2017

This sounds like a bug in sshd or its configuration to me. LC_CTYPE is how Unix programs expect to determine the encoding of all text including filenames, so in general we expect callers to set it appropriately.

@floppym
Author

floppym commented Jul 15, 2017

Unix programs are free to use whatever encoding they want in filenames, regardless of LC_CTYPE. As well, different users may set LC_CTYPE or LANG in their shell startup scripts, and sshd has no way of knowing that when it starts gesftpserver.

I guess it would be nice to have some better error handling here if the filenames cannot be decoded. Right now, the gesftpserver daemon just aborts.
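
As a rough illustration of what non-fatal handling could look like, the sketch below wraps iconv so that an undecodable name yields an error return instead of an abort. The name to_utf8 and its signature are hypothetical; this is not how gesftpserver's sftp_iconv is actually written, only an assumption about one possible shape.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not gesftpserver's sftp_iconv: convert the
 * NUL-terminated string `in` from `from_codeset` to UTF-8, writing a
 * NUL-terminated result into `out` (capacity `outsz` bytes).
 * Returns 0 on success, or -1 with errno set by iconv_open()/iconv(). */
int to_utf8(const char *from_codeset, const char *in, char *out, size_t outsz) {
  assert(from_codeset != NULL && in != NULL && out != NULL);
  if (outsz == 0) {
    errno = E2BIG;
    return -1;
  }
  iconv_t cd = iconv_open("UTF-8", from_codeset);
  if (cd == (iconv_t)-1)
    return -1;                 /* conversion not supported */
  char *inp = (char *)in;      /* iconv() does not modify the input bytes */
  size_t inleft = strlen(in);
  char *outp = out;
  size_t outleft = outsz - 1;  /* keep room for the terminator */
  size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
  int saved = errno;           /* iconv_close() may clobber errno */
  iconv_close(cd);
  if (rc == (size_t)-1) {
    errno = saved;             /* e.g. EILSEQ for an undecodable byte */
    return -1;
  }
  *outp = '\0';
  return 0;
}
```

A caller could then report the failure for that one path to the client and keep the session alive, rather than terminating the whole server.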

@ewxrjk
Owner

ewxrjk commented Jul 15, 2017

What can I say? Applications have to make some decision about filename encoding. "Assume UTF-8" is indeed one possible policy, but I'm not ready to desupport users of other encodings, and the most widespread other approach I've ever found anything using is to honor LC_CTYPE. So that's the policy adopted here, and I don't see any need to change it.

I'll look into improving the error behavior, however; you're right that terminating the server is unfriendly.

@floppym
Author

floppym commented Jul 16, 2017

Sure, it makes sense to use LC_CTYPE if it is set. It just sucks to use ASCII if it is unset. Maybe we could default to UTF-8 in that case? For example:

setlocale(LC_CTYPE, "");
local_encoding = nl_langinfo(CODESET);
if (!strcmp(local_encoding, "ANSI_X3.4-1968"))
  /* Use UTF-8 instead of ASCII */
  local_encoding = "UTF-8";
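
A slightly fuller, self-contained version of the same idea might look like the sketch below. The function names are made up for illustration, and the list of ASCII codeset names is an assumption: glibc reports "ANSI_X3.4-1968" for the C/POSIX locale, while other C libraries may report names such as "US-ASCII" or "646".

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: map a codeset name, as reported by
 * nl_langinfo(CODESET), to the encoding to assume for filenames.
 * ASCII-only codesets are replaced by UTF-8; anything else is kept.
 * The ASCII aliases below are assumptions and may be incomplete. */
const char *choose_filename_encoding(const char *codeset) {
  assert(codeset != NULL);
  if (!strcmp(codeset, "ANSI_X3.4-1968") || !strcmp(codeset, "US-ASCII") ||
      !strcmp(codeset, "ASCII") || !strcmp(codeset, "646"))
    return "UTF-8"; /* use UTF-8 instead of ASCII */
  return codeset;
}

/* Query the locale from the environment and apply the fallback.
 * If setlocale() fails, the "C" locale stays in effect, which reports
 * an ASCII codeset and therefore also ends up as UTF-8. */
const char *local_filename_encoding(void) {
  setlocale(LC_CTYPE, "");
  return choose_filename_encoding(nl_langinfo(CODESET));
}
```

With LANG and LC_CTYPE unset this would pick UTF-8, and with something like LC_CTYPE=en_US.ISO-8859-1 it would still honor the configured encoding. The string returned by nl_langinfo may be overwritten by later calls, so a real implementation would probably want to copy it.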

Also, I started a bug report at Gentoo to see if we can get the system default locale setting to be set by pam_env, which should mostly resolve this problem.

https://bugs.gentoo.org/show_bug.cgi?id=625234
