Simple concurrent importer for MySQL using load_data_infile2.
Add to your application's Gemfile:
gem 'mysql_import'
Then run bundle.
For example, to import /path/to/users.csv into the users table:
db_config = {
  host: 'localhost',
  database: 'mysql_import_test',
  username: 'root'
}
importer = MysqlImport.new(db_config)
importer.add('/path/to/users.csv')
importer.import
# => Import to `users` table
Multiple import:
importer = MysqlImport.new(db_config)
importer.add('/path/to/users.csv')
importer.add('/path/to/groups.csv')
importer.add('/path/to/departments.csv')
importer.import
# => Import to three tables from three csv files
MysqlImport imports the files concurrently because it uses the parallel gem.
With import options:
importer = MysqlImport.new(db_config)
importer.add('/path/to/users1.csv', table: 'users')
importer.add('/path/to/users2.csv', table: 'users')
importer.add('/path/to/users3.csv', table: 'users')
importer.import
# => Import to `users` table from three csv files
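The examples above suggest that, when no table option is given, the target table name is taken from the CSV file name. A minimal sketch of that rule in plain Ruby follows; `table_for` is a hypothetical helper used for illustration, not part of the gem's API.

```ruby
# Hypothetical helper (not part of mysql_import): derive the target table
# from a CSV path unless an explicit :table option overrides it.
def table_for(path, opts = {})
  opts.fetch(:table) { File.basename(path, '.*') }
end

table_for('/path/to/users.csv')                   # => "users"
table_for('/path/to/users1.csv')                  # => "users1"
table_for('/path/to/users1.csv', table: 'users')  # => "users"
```

This is why users1.csv, users2.csv and users3.csv each need table: 'users' to land in the same table.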
The second argument of MysqlImport#initialize accepts four option keys.
concurrency: The number of threads to use. Ruby's GIL is released while a thread waits on IO from MySQL, so concurrent processing can speed up the import. (default: 2)
importer = MysqlImport.new(db_config, concurrency: 4)
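To illustrate why threads help with IO-bound work, here is a rough sketch in plain Ruby. It does not use the parallel gem or a database: `fake_import` and `import_all` are hypothetical names, and `sleep` stands in for the time spent waiting on MySQL.

```ruby
# Simulates an IO-bound import: `sleep` stands in for waiting on MySQL and,
# like a real IO wait, lets other Ruby threads run in the meantime.
def fake_import(path)
  sleep 0.1
  File.basename(path, '.*')
end

# Hypothetical worker pool: `concurrency` threads pull paths from a queue.
def import_all(paths, concurrency: 2)
  queue = Queue.new
  paths.each { |path| queue << path }
  queue.close # pop returns nil once the closed queue is drained

  results = Queue.new
  workers = Array.new(concurrency) do
    Thread.new do
      while (path = queue.pop)
        results << fake_import(path)
      end
    end
  end
  workers.each(&:join)
  Array.new(results.size) { results.pop }
end

paths = %w[/path/to/users.csv /path/to/groups.csv
           /path/to/departments.csv /path/to/roles.csv]
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
tables = import_all(paths, concurrency: 4)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
puts "imported #{tables.sort.join(', ')} in about #{elapsed.round(2)}s"
```

With four threads the four simulated imports overlap, so the total time stays close to that of a single import instead of four. How much a real import gains depends on the MySQL server and the disk.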
log: The log destination: a file path, an IO object, a custom logger, or nil. (default: nil)
# File path
importer = MysqlImport.new(db_config, log: '/path/to/import.log')
# nil (same as logging to `/dev/null`)
importer = MysqlImport.new(db_config, log: nil)
# STDOUT / STDERR
importer = MysqlImport.new(db_config, log: $stdout)
# Custom logger
importer = MysqlImport.new(db_config, log: CustomLogger.new)
debug: A flag that sets the logger level to debug. (default: false)
importer = MysqlImport.new(db_config, log: $stdout, debug: true)
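Ruby's standard Logger accepts a file path, an IO object, or nil, which lines up with the values shown above. The sketch below shows one way the log and debug options could map onto it. `build_logger` is a hypothetical helper, and both the INFO default level and the assumption that a custom logger responds to `level=` are guesses; the gem's actual handling may differ.

```ruby
require 'logger'
require 'stringio'

# Hypothetical helper (not the gem's code): turn :log and :debug into a logger.
def build_logger(log: nil, debug: false)
  logger =
    case log
    when nil, String, IO, StringIO
      Logger.new(log) # nil discards output; a String is treated as a file path
    else
      log             # assumed to be a custom logger with a Logger-like interface
    end
  # Assumption: INFO is the level used when debug is false.
  logger.level = debug ? Logger::DEBUG : Logger::INFO
  logger
end

out = StringIO.new
logger = build_logger(log: out, debug: true)
logger.debug('starting import')
logger.info('import finished')
puts out.string
```

With debug: false the `starting import` line would be dropped, since DEBUG messages sit below the INFO level.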
sql_opts: Import options that are passed directly to the second argument of LoadDataInfile2#initialize.
For details on the import options, see:
https://github.com/nalabjp/load_data_infile2#sql-options
The second argument of MysqlImport#add is passed directly to the second argument of LoadDataInfile2#import.
For details on the import options, see:
https://github.com/nalabjp/load_data_infile2#sql-options
To import only specific files, pass a filter to MysqlImport#import.
Each filter is treated as a regular expression.
importer = MysqlImport.new(db_config)
importer.add('/path/to/users.csv')
importer.add('/path/to/groups.csv')
importer.add('/path/to/departments.csv')
importer.import('users')
# => Only import to `users` table
importer.import(['users', 'groups'])
# => Import to `users` and `groups` tables
If empty:
importer.import([])
# => Do not import anything
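The filtering behavior described above can be sketched in plain Ruby. `filter_paths` is a hypothetical helper, and matching each filter against the full file path is an assumption made for illustration; the gem may match against something else, such as the table name.

```ruby
# Hypothetical helper (not the gem's code): keep the paths matched by at
# least one filter. Each filter string is compiled to a Regexp.
def filter_paths(paths, filters = nil)
  return paths if filters.nil? # no filter given: import everything
  regexps = Array(filters).map { |f| Regexp.new(f) }
  paths.select { |path| regexps.any? { |re| re.match?(path) } }
end

paths = %w[/path/to/users.csv /path/to/groups.csv /path/to/departments.csv]

filter_paths(paths, 'users')           # => ["/path/to/users.csv"]
filter_paths(paths, %w[users groups])  # => ["/path/to/users.csv", "/path/to/groups.csv"]
filter_paths(paths, 'ps\.csv\z')       # => ["/path/to/groups.csv"]
filter_paths(paths, [])                # => []
```

An empty array yields no regular expressions, so nothing matches and nothing is imported, which agrees with the `importer.import([])` example.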
You can set hooks that run immediately before and after the import.
A hook accepts a String, a Proc, or an Array.
A String is executed directly as SQL.
importer = MysqlImport.new(db_config)
importer.add(
'/path/to/users.csv',
{
before: 'TRUNCATE TABLE users;'
}
)
importer.import
# => Truncate query is always executed before import.
If the subsequent processing depends on the result of an SQL query, use a Proc.
The argument passed to the Proc is an instance of LoadDataInfile2::Client, which is a subclass of Mysql2::Client.
importer = MysqlImport.new(db_config)
importer.add(
'/path/to/users.csv',
{
before: ->(cli) {
res = cli.query('SELECT COUNT(*) AS c FROM users;')
cli.query('TRUNCATE TABLE users;') if res.first['c'] > 0
}
}
)
importer.import
# => If there is one or more records in `users` table, truncate query is executed.
Each element of an Array must be a String or a Proc.
importer = MysqlImport.new(db_config)
importer.add(
'/path/to/users.csv',
{
before: [
"SET sql_mode = 'STRICT_TRANS_TABLES';",
->(cli) {
res = cli.query('SELECT COUNT(*) AS c FROM users;')
cli.query('TRUNCATE TABLE users;') if res.first['c'] > 0
}
],
after: [
'SET @i = 0;',
'UPDATE users SET `order` = (@i := @i + 1) ORDER BY name, email ASC;',
]
}
)
importer.import
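The three hook forms can be pictured as a small dispatch on the hook's type. The sketch below is not the gem's implementation: `run_hook` is a hypothetical helper, and `RecordingClient` is a stand-in that only records SQL instead of talking to MySQL.

```ruby
# Stand-in for the database client; it only records the SQL it receives.
class RecordingClient
  attr_reader :queries

  def initialize
    @queries = []
  end

  def query(sql)
    @queries << sql
    []
  end
end

# Hypothetical dispatcher: a String runs as SQL, a Proc is called with the
# client, and an Array runs each of its elements in order.
def run_hook(client, hook)
  case hook
  when String then client.query(hook)
  when Proc   then hook.call(client)
  when Array  then hook.each { |h| run_hook(client, h) }
  when nil    then nil
  else raise ArgumentError, "unsupported hook: #{hook.inspect}"
  end
end

client = RecordingClient.new
run_hook(client, [
  "SET sql_mode = 'STRICT_TRANS_TABLES';",
  ->(cli) { cli.query('TRUNCATE TABLE users;') }
])
client.queries
# => ["SET sql_mode = 'STRICT_TRANS_TABLES';", "TRUNCATE TABLE users;"]
```

Running Array elements in order matters in cases like the after hook above, where `SET @i = 0;` has to run before the UPDATE that increments `@i`.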
To skip all subsequent processing, raise MysqlImport::Break in a Proc.
importer = MysqlImport.new(db_config)
importer.add(
'/path/to/users.csv',
{
before: ->(cli) {
res = cli.query('SELECT COUNT(*) AS c FROM users;')
raise MysqlImport::Break if res.first['c'] > 0
},
after: [
'SET @i = 0;',
'UPDATE users SET `order` = (@i := @i + 1) ORDER BY name, email ASC;',
]
}
)
importer.import
# => If there is one or more records in `users` table, import and after hook will be skipped.
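The skip behavior amounts to a rescue around the before hook, the import, and the after hook. A minimal sketch of that control flow in plain Ruby follows; the `Break` class and `run_import` are hypothetical stand-ins, not the gem's code.

```ruby
# Hypothetical stand-in for MysqlImport::Break.
class Break < StandardError; end

# Hypothetical flow: before hook, import, after hook. Raising Break in the
# before hook abandons the remaining steps without surfacing an error.
def run_import(steps, before:)
  before.call
  steps << :import
  steps << :after
rescue Break
  steps << :skipped
end

skipped = []
run_import(skipped, before: -> { raise Break })
skipped # => [:skipped]

completed = []
run_import(completed, before: -> {})
completed # => [:import, :after]
```

Using a dedicated exception class lets a hook stop the work for one file while other errors still propagate as usual.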
Bug reports and pull requests are welcome on GitHub at https://github.com/nalabjp/mysql_import.
The gem is available as open source under the terms of the MIT License.