under Windows OS, datamachine mysqlimport - warnings and errors #88

Open
daxenberger opened this issue Jul 31, 2015 · 2 comments

@daxenberger

Originally reported on Google Code with ID 94

Hi,
has anyone tried to create a database on Windows?

While everything seems to work on Linux (64-bit), I ran into a problem at the
final mysqlimport step when creating an identical copy of Wikipedia on Windows 7 64-bit.


The problem is that running "mysqlimport" produces lots of warnings (which didn't
occur on Linux). To log these warnings, I used the following equivalent command to
load one ".txt" file at a time:

mysql -uroot -p [dbname] --default-character-set=utf8 --execute="LOAD DATA INFILE '[path]/page.txt' REPLACE INTO TABLE page FIELDS TERMINATED BY '\t'; SHOW WARNINGS" > $output.log

I managed to capture an error when importing "page.txt":
-----------------------------------
"ERROR 1406 (22001) at line 1: Data too long for column 'isDisambiguation' at row 1"
-----------------------------------

And some warnings when importing "page_redirects.txt":
-----------------------------------
| Warning | 1366 | Incorrect string value: '\xF0\x92\x86\xB3\x0D' for column 'redirects' at row 1550585 |
| Warning | 1366 | Incorrect string value: '\xF0\x92\x82\xBC\xF0\x92...' for column 'redirects' at row 1784951 |
| Warning | 1366 | Incorrect string value: '\xF0\x9D\x84\xAA\x0D' for column 'redirects' at row 2088024 |
| Warning | 1366 | Incorrect string value: '\xF0\x9D\x84\xAB\x0D' for column 'redirects' at row 2088025 |
-------------------------------------------
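
To make the warnings above easier to read, here is a minimal Python sketch (not part of JWPL or of the original report) that decodes the complete byte sequences quoted in them. It only shows what the bytes are; the helper name `describe` is made up for illustration. The interpretation is a guess, not something confirmed in this thread: each value appears to be one 4-byte UTF-8 character followed by a carriage return (`\x0D`). MySQL's `utf8` character set stores at most 3 bytes per character, and a trailing carriage return would be consistent with Windows-style line endings in the data file.

```python
# Byte sequences copied from the complete warnings (the truncated one is left out).
samples = [
    b"\xF0\x92\x86\xB3\x0D",
    b"\xF0\x9D\x84\xAA\x0D",
    b"\xF0\x9D\x84\xAB\x0D",
]


def describe(raw: bytes):
    """Return (code point, UTF-8 byte length) for every character in raw."""
    text = raw.decode("utf-8")
    return [(f"U+{ord(ch):04X}", len(ch.encode("utf-8"))) for ch in text]


for raw in samples:
    # Each sample decodes to one 4-byte character followed by U+000D (carriage return).
    print(raw, "->", describe(raw))
```

If that reading is right, two things would be worth checking: whether the tables need the `utf8mb4` character set for such characters, and whether the import needs an explicit line terminator for files written with `\r\n` line endings.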


This seems to be an OS-specific issue. It would be nice if some experts could identify the
cause. Otherwise I will have to try exporting the working copy from Linux and importing it
on Windows...
Thanks!

Reported by ziqizhang.email on 2012-05-15 15:47:03

@daxenberger

Hmm, might be an encoding problem.
Although the encoding has been correctly defined in the mysqlimport command, there could
still be a problem somewhere else.
If utf8 is not the default encoding on your system, you might have to run the DataMachine
with the -Dfile.encoding=utf8 parameter (also see: http://code.google.com/p/jwpl/wiki/DataMachine).

Also, you should check whether you created the database using the command
CREATE DATABASE [DB_NAME] DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;

If this does not help, please extract the lines from the data file which cause these
warnings and post them here.

Reported by oliver.ferschke on 2012-05-16 10:16:40

  • Status changed: Accepted
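
Since the data files are very large, pulling out the offending rows by hand is awkward. Here is a minimal Python sketch for extracting specific rows as raw bytes, so that invisible characters such as a trailing `\r` stay visible. The function name `extract_rows` is hypothetical, and the sketch assumes that the row numbers MySQL reports correspond to `\n`-terminated lines in the file. That may not hold if a field contains an embedded newline.

```python
def extract_rows(path, row_numbers):
    """Return {row number: raw line bytes} for the requested 1-based rows."""
    wanted = set(row_numbers)
    found = {}
    if not wanted:
        return found
    last = max(wanted)
    # Binary mode splits on b"\n" only, so a Windows line keeps its b"\r\n" ending.
    with open(path, "rb") as handle:
        for number, line in enumerate(handle, start=1):
            if number in wanted:
                found[number] = line
            if number >= last:
                break  # stop reading once the last requested row has been seen
    return found
```

For the warnings reported here, a call could look like `extract_rows("page_redirects.txt", [1550585, 2088024])`, with the path adjusted to wherever the DataMachine output lives. Printing each value with `repr()` shows the exact bytes, including any `\r` before the final `\n`.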

@daxenberger

Thanks, but I have already set the character encoding explicitly, and I can confirm that
I used the "CREATE" statement as you said.

I extracted the line that caused the error (Error 1406 above) and attached it as a
screenshot. The line is extremely long, since each line in "page.txt" stores a single
Wikipedia article. The screenshot shows the tail of the first line, and I have highlighted
the boundary with the second line in red.

I'm not sure how useful this is, since the last field, "isDisambiguation", has a "bit"
datatype and does not seem to display properly, as you can see.

Reported by ziqizhang.email on 2012-05-16 10:55:48


- _Attachment: Capture.PNG
![Capture.PNG](https://storage.googleapis.com/google-code-attachments/jwpl/issue-94/comment-2/Capture.PNG)_

@reckart reckart added this to the Bug backlog milestone Jan 4, 2021