You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In one our project multiple applications are sharing a single large Oracle 19c DB - this includes both Ruby and non-Ruby apps.
Recently, we were faced with a curious bug where emoji character persisted in DB from non-Ruby application would be unprocessable (with "source sequence is illegal/malformed utf-8" error) when read back in Ruby (on Rails) application.
After some debugging it was found that an emoji (🙈) that was saved from non-Ruby application is stored in DB as raw bytes EDA0BDEDB988. The same emoji when saved from Ruby application was written to DB instead as F09F9988.
Per helpful table it was then understood that Ruby application writes data in "UTF-8" encoding, but non-Ruby apps are using a "CESU-8" encoding.
We took a look (SELECT * FROM nls_database_parameters) afterwards at a DB configuration of "CHARACTERSET" values:
Which lead to a curious discovery that our Ruby application is likely using an incorrect connection encoding at OCI8 level. Per our understanding and tests, the OCI8 gem assigns connection's encoding in "encoding-init.rb" bootup code. If we modified/used an initializer of our own that forced OCI8.encoding = Encoding::CESU_8, we could adequately read the original "CESU-8" encoded form of characters and new persistance from Ruby side would also follow the "CESU-8" conventions.
Thanks. I didn't notice that the CESU-8 encoding was supported in ruby.
As for your issue, I suggest that you should use AL32UTF8 in ruby.
# Set NLS_LANG before any "require 'oci8'".# When once oci8 is required, changing NLS_LANG has no effect.ENV['NLS_LANG']='american_america.AL32UTF8'require'oci8'
There is no difference as a result when 🙈 is inserted in the following two cases.
OCI8.encoding = Encoding::CESU_8 and UTF8 NLS language
OCI8.encoding = Encoding::UTF_8 and AL32UTF8 NLS language
In the case 1, 🙈 is converted to CESU-8 in ruby-oci8 and sent to the DB.
In the case 2, 🙈 is sent to the DB and converted to CESU-8 in the DB.
However when 🙈 is fetched, there is difference.
In the case 1, 🙈 is fetched as CESU-8. You need to convert it to UTF-8 or so.
On the other hand, in the case 2, 🙈 is fetched as UTF-8. It is more usable.
Well, I tested above in Oracle Database Free 23c.
I think that it is same in Oracle 19c but I haven't confirmed it.
In one our project multiple applications are sharing a single large Oracle 19c DB - this includes both Ruby and non-Ruby apps.
Recently, we were faced with a curious bug where emoji character persisted in DB from non-Ruby application would be unprocessable (with
"source sequence is illegal/malformed utf-8"
error) when read back in Ruby (on Rails) application.After some debugging it was found that an emoji (🙈) that was saved from non-Ruby application is stored in DB as raw bytes
EDA0BDEDB988
. The same emoji when saved from Ruby application was written to DB instead asF09F9988
.Per helpful table it was then understood that Ruby application writes data in "UTF-8" encoding, but non-Ruby apps are using a "CESU-8" encoding.
We took a look (
SELECT * FROM nls_database_parameters
) afterwards at a DB configuration of "CHARACTERSET" values:Obviously, in Oracle's infinite knowledge the "UTF8" here doesn't mean "UTF-8", but rather a weird "CESU-8" encoding form.
Which lead to a curious discovery that our Ruby application is likely using an incorrect connection encoding at OCI8 level. Per our understanding and tests, the
OCI8
gem assigns connection's encoding in "encoding-init.rb" bootup code. If we modified/used an initializer of our own that forcedOCI8.encoding = Encoding::CESU_8
, we could adequately read the original "CESU-8" encoded form of characters and new persistance from Ruby side would also follow the "CESU-8" conventions.In summary, I would like to know, if this line
ruby-oci8/lib/oci8/encoding.yml
Line 388 in 9594823
shouldn't rather read as
Ruby 2.7 added a "CESU-8" encoding, albeit with a slightest caveat concerning handling of
NULL
characters.Information on Ruby, "CESU-8" and Oracle DB is very sparse, so any discussion or suggestion would be helpful!
The text was updated successfully, but these errors were encountered: