Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dbWriteTable into utf-8 MySQL DB from windows #183

Open
jfdesomzee opened this issue Mar 1, 2017 · 9 comments
Open

dbWriteTable into utf-8 MySQL DB from windows #183

jfdesomzee opened this issue Mar 1, 2017 · 9 comments

Comments

@jfdesomzee
Copy link

jfdesomzee commented Mar 1, 2017

Hello,

I'm trying to export a table from R into a MySQL database with encoding utf-8 using dbWriteTable (I'm working with R-Studio and RmySQL 0.10.9).
But I get uncorrect results. Word having accent or other special character are truncated as the first accent is encounter. E.G.: Céline --> C
It does the same thing as I would get importing in workbench when the file is in latin1 and I specify it as UTF-8.

Here is my code:

require(RMySQL)
con = dbConnect(MySQL(), user=USER, password=PASSWORD_DEV, dbname=DBNAME, host=HOST)

Some text file with default encoding

accent_ANSI<-read.table("D:/5-Testing/Accent/Accent_ANSI.txt",header=TRUE,fileEncoding="latin1")

Test
Céline
Julie
Jérôme

Some text file encoded in UTF-8

accent_UTF8<-read.table("D:/5-Testing/Accent/Accent_UTF-8.txt",header=TRUE,fileEncoding="UTF-8")

DBI::dbGetQuery(con, "show variables like 'character_set_%'")

         Variable_name                                     Value

1 character_set_client latin1
2 character_set_connection latin1
3 character_set_database utf8
4 character_set_filesystem binary
5 character_set_results latin1
6 character_set_server latin1
7 character_set_system utf8
8 character_sets_dir /rdsdbbin/mysql-5.6.27.R1/share/charsets/

dbWriteTable(con, value = accent_ANSI,name = "default_ANSI",row.names=FALSE,overwrite=TRUE )
dbWriteTable(con, value = accent_UTF8,name = "default_UTF8",row.names=FALSE,overwrite=TRUE )

In both case I end up with

Test
C
Julie
J

So I guess R is working in latin1 under windows and when I'm sending the table using dbWriteTable I should specify somewhere that I'm sending latin1.

What I've tried so far:

  • Setting R so that it is working in UTF-8, but the system won't let me. From what I've read, changing this might not be so easy and so recommended.

Sys.getlocale()
[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
Sys.setlocale("LC_ALL", 'en_US.UTF-8')
Warning message:
In Sys.setlocale("LC_ALL", "en_US.UTF-8") :
OS reports request to set locale to "en_US.UTF-8" cannot be honored

  • Change the connection properties

rs <- dbSendQuery(con, 'set character set utf8')
DBI::dbGetQuery(con, "show variables like 'character_set_%'")
Variable_name Value
1 character_set_client utf8
2 character_set_connection utf8
3 character_set_database utf8
4 character_set_filesystem binary
5 character_set_results utf8
6 character_set_server latin1
7 character_set_system utf8
8 character_sets_dir /rdsdbbin/mysql-5.6.27.R1/share/charsets/

It does not change the result but tend to crash
SQL Error [1412] [HY000]: Table definition has changed, please retry transaction
java.sql.SQLException: Table definition has changed, please retry transaction

It sounds similar to #57, #93

Thanks in advance,

Jef

@jfdesomzee
Copy link
Author

Converting encoring before the export does not seem to work either

accent_UTF8_enc<-mutate(accent_UTF8,newchar=enc2utf8(Test))
dbWriteTable(con, value = accent_UTF8_enc,name = "enc_UTF8")

image

@al13nus
Copy link

al13nus commented Mar 23, 2017

Got the same issue. Trying to write UTF-8 data to MySQL with "utf8_unicode_ci" collation and after "ö, ä,ü" its cuts off the words.

results <- dbWriteTable(con, name = "Mysqltable", value = finalTable, overwrite = FALSE, row.names = FALSE, append = T, fileEncoding="UTF-8")

some month ago I had the same problem, but its magically disappeared. Maybe through an fixing update which was forgotten to keep in new versions?

@al13nus
Copy link

al13nus commented Mar 23, 2017

LOL, found how to fix. I just updated the "DBI" package and restartet R. Then it works. Maybe RMySQL is using an older version of DBI or I dont know, but it can work.

Try this, load DBI before RMySQL:

list.of.packages <- c("DBI","RMySQL")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)

library(DBI, quietly = T)
library(RMySQL, quietly = T)

@jfdesomzee
Copy link
Author

Does not change anything for me...
It worked briefly installing the dev version of RMySQL but now it is not working anymore.

@al13nus
Copy link

al13nus commented Mar 25, 2017

It worked on Windows 10 for me and still works. After trying on Ubuntu, hat still the same issue.

@JacobZHANG2015
Copy link

I got the same issue. Have you found the solutions to this problem? I would very much appreciate it if you could share it with me

@al13nus
Copy link

al13nus commented Aug 31, 2017

Not sure anymore, try creating data tables with "utf8_general_ci" encoding. Since then I never had problems with writing UTF-8 to MySQL.

@JacobZHANG2015
Copy link

I have searched on the internet and found one way to bypass the encoding issue. We can use write.csv() to convert the desired data.frame to UTF-8 encoding and then use dbWritetable() to import the csv file into MySQL directly. For example,

dbWriteTable(conn, name='test', value = "tmp.csv", fileEncoding = "UTF-8", overwrite = T, sep = ",")

However, I get another issue and have no idea how to solve it.

Error in read.table(value, sep = sep, header = header, skip = skip, nrows = nrows, :
more columns than column names

Could you please give me some advice to solve this issue?

@al13nus
Copy link

al13nus commented Aug 31, 2017

first of all: start using fread(). much faster and there you fread("link.csv", encode = "UTF-8").

for writing to database I'm doing it like this:

dbWriteTable(con, name = "Database", value = data, overwrite = FALSE, row.names = FALSE, append = TRUE).

like I sad months ago, update all packages and maybe delete the old ones. Had problems with Rcpp because some packages used the old version instead of the new one. hmm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants