-
Notifications
You must be signed in to change notification settings - Fork 20
RandomQueryGenerator
The Table of Contents is your starting point to the complete documentation on the RQG
Installation, prerequisites and quick start
Running the RQG from MTR and PB2
To start the server and run the Random Query Generator against it:
$ perl runall.pl \ --basedir=/path/to/mysql-basedir \ --grammar=conf/example.yy
The runall.pl script accepts the following options:
- --basedir=,--basedir1=,--basedir2= - specify the MySQL base dir to to start the server from. If two base directories are specified, two servers will be started and all queries will be issued against both. The servers are started using perl mysql-test-run.pl --start-and-exit
- --vardir=,--vardir1=,--vardir2= - specify a var dir to use. If two are specified, two servers will be started and queries will be issued against both.
- --mysqld=--option,--mysqld1=--option,--mysqld2=--option - specify options to be passed to the server or servers. If two servers were started from the same basedir, each one should have its own list of options in order to have different behavoir. For example, to specify different storage engines for the two servers, use --mysqld1=--default-storage-engine=Falcon --mysqld2=--default-storage-engine=Maria
- --grammar= - the grammar file to use when generating queries. This is described in a greater detail below.
- --redefine= - grammar file which redefines the grammar specified in --grammar
- --gendata= - the configuration file for the Data Generator. This is described in RandomDataGenerator. If this option is not specified, a default legacy tables and data (described below) will be created.
- --validators=Validator1,Validator2,ValidatorN - specifies which Validators to be used to verify the query results, see below.
- --reporters=Reporter1,Reporter2,ReporterN - specifies which Reporters to be used to monitor the test, see below.
- --rpl_mode=[row|statement|mixed] - if specified, the script will start a master and a slave. After the test, it will dump and compare them both and report on the outcome.
- --engine=[MyISAM|Maria|Falcon|InnoDB|etc] - the MySQL engine to use. To run the same test against two different engines, please use the mysqld option as described above;
- --threads=N - number of threads to run. Using more than one thread makes the test non-deterministic. If 100% determinism is important (such as when running the same test against two servers) use either only SELECT queries or set --threads to 1. The default is 10 threads;
- --queries=N - the number of queries to use per thread. The default is 1000, resulting in a test that completes quickly;
- --duration=N - the maximum duration of the test in seconds. The default is 3600 seconds, however if the maximum number of --queries is reached first, the test will terminate;
- --mask=N used to mask out parts of the grammar. See RandomQueryGeneratorCombinations;
- --mask-level=N The mask is applied down to a given level from the top rule in the grammar. See RandomQueryGeneratorCombinations;
- --rows=N specifies the number of rows to generate when populating the table;
- --varchar-length=N specifies the length of the varchar values that will be generated;
- --debug - enable extra debugging information - aggregate stats on the number of rows returned, the errors encountered, and the optimizations from EXPLAIN;
- --start-dirty - start the servers in pre-populated vardirs without creating new tables or data;
- --views - creates a view for each base table created, for use with the _table grammar element
- --valgrind - runs the test under Valgrind, see RandomQueryGeneratorValgrind
- --notnull - creates all tables with NOT NULL unless there is a ZZ file that specifies otherwise, which allows for extra optimizer paths to be exercised. No NULL values are inserted into the tables if this option is specified.
To run the random query generator against an existing server:
$ perl gentest.pl \ --gendata \ --grammar=conf/example.yy \ --dsn=dbi:mysql:host=127.0.0.1:port=9306:user=root:database=test
The gentest.pl script accepts the following options:
- --dsn= describing the location of the server to connect to. It is in the standard perl format for DBI DSNs. To run the same queries against two severs, use --dsn1= and --dsn2=.
- --engine=, --gendata=, --grammar=, --redefine, --validators=, --reporters=, --threads=, --queries=, --duration, --mask, --mask-level, --rows, --varchar-length, --debug=, --views, described above, are also supported.
This script generates SQL statements based on a grammar and dumps them to stdout where they can be saved to a file or forwarded to a mysql client for execution. All diagnostic messages by the RQG are prefixed with a # sign and therefore appear as comments in the query stream and will be ignored.
perl gensql.pl \ --grammar=path/to/grammar \ --queries=NNNN \ --seed=ZZZ
Note however that since no MySQL server is present, the SQL grammar may not contain elements that require an actual database to resolve, such as _table or _field.
Main article: RandomDataGenerator
The Random Data Generator script is used to create a set of tables based on the requirements specified in a configuration file. Queries can then be issued against those tables. The operation and configuration of the Data Generator are described in RandomDataGenerator.
In the absence of a specific configuration, the script creates a default, legacy layout of tables with different sizes that have the following properties that are useful when creating a random query workload:
- Nine tables are created, named A, B, C, D, E, AA, BB, CC and DD, containing 0, 1, 20, 100, 1000, 0, 1, 20 and 100 rows respectively. In addition a DUMMY table is created as a portable version of "DUAL".
- The tables are defined as:
pk INTEGER AUTO_INCREMENT, col_int_nokey INTEGER [NULL|NOT NULL], col_int_key INTEGER [NULL|NOT NULL], col_date_key DATE [NULL|NOT NULL], col_date_nokey DATE [NULL|NOT NULL], col_time_key TIME [NULL|NOT NULL], col_time_nokey TIME [NULL|NOT NULL], col_datetime_key DATETIME [NULL|NOT NULL], col_datetime_nokey DATETIME [NULL|NOT NULL], col_varchar_key VARCHAR($varchar_length) [NULL|NOT NULL], col_varchar_nokey VARCHAR($varchar_length) [NULL|NOT NULL], PRIMARY KEY (pk), KEY (col_int_key), KEY (col_date_key), KEY (col_time_key), KEY (col_datetime_key), KEY (col_varchar_key, col_int_key)
Where NOT NULL is specified only if the option --notnull is given on the command line. Otherwise the column is defined with the special MySQL-style comment /*! NULL */.
Tables with names consisting of more than one letter have AUTO_INCREMENT set to start with some offset. This is to avoid always having 1-to-1 matches when JOINing tables on the pk column, thus improving test coverage.
If the option --engine=<engine> is specified on the command line, the given engine will be used to create all tables (ENGINE=<engine>).
If the option --views is specified, one view per table will be created (with no specified ALGORITHM) with names corresponding to each table name prepended with "view_". For example, for the tables A and B, views "view_A" and "view_B" will be created.
If you want to define your own tables and their contents, please create a data generator configuration file as described in RandomDataGenerator and use the --gendata= option to pass it to runall.pl or gentest.pl.
The RQG has a facility for running the same test multiple times, with each run using slightly different command-line options and SQL grammar. For more information, see RandomQueryGeneratorCombinations.
Main article RandomQueryGeneratorValidators
The Validators are individual modules that perform extra checking on the queries being executed. They are enabled via the --validators=Validator1,Validator2,ValidatorN option. By default, ResultsetComparator is enabled when running the same queries against two servers and ResultsetComparator is enabled when replication is being tested.
Main Article: RandomQueryGeneratorReporters
The Reporters are extra modules that perform checks on the server while the test is running and after the test has terminated. Unlike Validators, they are not related to the result of a particular query and are run in a separate monitoring process. By default, the Deadlock, ErrorLog and Backtrace are enabled.
Main article: RandomQueryGeneratorGrammar
The Random Query Generator takes a description of the queries to run from a grammar file. It looks very similar to a standard YACC grammar, except that there are no actions to execute for every rule. Sample scripts are available in the mysql-test/gentest/conf directory. They are described in RandomQueryGeneratorTests.
The script starts from a top-level element named query and will then walk the grammar randomly in order to produce random queries.
For example, you can have two types of queries, select and update. To run them in a proportion of 3 to 1, you can begin your grammar as follows:
query: select | select | select | update;
As in YACC, rule names start on a new line and end with a colon (:). The possibilities for each rule are separated by a pipe (|) and the list ends with a semi-colon (;).
Now, we need to define what “select” and “update” mean:
update: UPDATE _table SET _field = _integer WHERE condition;
Please leave whitespace before and after each individual element, including SQL punctuation. Uppercase and quoted strings are copied verbatim into the generated query. Thus, we only need to define what the lowercase rules mean.
For convenience, identifiers naming MySQL data types such as “_integer” and "_integer_unsigned" will be substituted with a random value that fits into the data type with the same name. "_date", "_datetime" and "_time" will also be replaced with a random date or time.
_table and _field pick a random table and a random field, respectively, from the current database. Note that for this to work all tables in the database must have identical structure.
The final rule we need to define in order to make this sample grammar complete is condition. Non-alphanumeric items are copied verbatim into the output, so we do not need to define them. We can simply state:
condition: _field > integer | _field < integer | _field = integer ;
The entire grammar looks like this:
query: select | update; select: SELECT _field FROM _table WHERE condition ; update: UPDATE _table SET _field = integer WHERE condition ; condition: _field > integer | _field < integer | _field = integer ;
Such a grammar will generate queries like:
UPDATE C SET date_nokey = '-1544526345' WHERE varchar_nokey < '-1296644256'; SELECT time_nokey FROM AA WHERE datetime_key > '963484549';
To execute it, generating 10 queries per thread in 5 parallel threads, run:
./gentest.pl --gendata -–grammar=sample.yy –-threads=5 –-queries=10 --engine=Falcon
This will call the Data Generator to create and populate the tables and will then generate and run the queries against those tables.
For more information, see RandomQueryGeneratorGrammar
Main Article: RandomQueryGeneratorComparison
$ perl runall.pl \ --basedir1=/path/to/mysql-ver-1 \ --basedir2=/path/to/mysql-ver-2 \ --grammar=conf/example.yy
This will start two servers, one from each basedir, and will run all queries against both. Any mismatches in the length or the contents of the result set will be reported. If you are using DML statements, setting --threads to more than 1 will probably produce many false positives, since the test becomes non-deterministic.
$ perl runall.pl \ --basedir=/path/two/mysql \ --vardir1=/path/to/vardir1 \ --vardir2=/path/to/vardir2 \ --mysqld1=--default_storage_engine=falcon \ --mysqld2=--default_storage_engine=innodb \ --grammar=conf/example.yy
This will start two copies of the server in basedir with different startup parameters, and then proceed to run the queries against both invocations.
For the rest, see the Table of Contents.