First compile the dbgen binary:
make
Next, use the helper script to generate the data. The script passes its arguments to each per-table invocation of dbgen (the -s argument sets the scale factor to generate):
./generate-data.sh -s 1
After this, several .tbl files will exist.
To create the ssb database and the ColumnStore tables:
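As a quick sanity check after generation, the sketch below reports any expected .tbl file that is missing or empty. The function name check_tbl_files is hypothetical and not part of this repository. The file names are taken from the dbgen table list later in this README, and the sketch assumes the files are written to the directory you pass in (the current directory by default).

```shell
#!/bin/sh
# Hypothetical helper: verify that the five SSB .tbl files exist and are
# non-empty in the given directory (defaults to the current directory).
check_tbl_files() {
    dir="${1:-.}"
    missing=0
    for t in customer part supplier date lineorder; do
        if [ ! -s "$dir/$t.tbl" ]; then
            echo "missing or empty: $dir/$t.tbl"
            missing=1
        fi
    done
    # Return 0 when every file is present, 1 otherwise.
    return "$missing"
}
```

Calling check_tbl_files with no argument checks the current directory. A non-zero exit status means at least one table file still needs to be generated.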
mcsmysql -vvv < queries/ddl-columnstore.sql
To load the tables using cpimport (note that sudo is needed if ColumnStore is installed as root and you are running as a non-root user):
sudo ./cpimport_data.sh
The queries directory contains the reference queries, and a helper script runs them all in sequence:
./run-all-queries.sh
Note: In our research paper we use the SSB instead of SSBM.
Version of 2/28/10:
- Cardinality of supplier fixed to follow benchmark spec: now 2000SF (previously was 10000SF, in error): line 226, driver.c
- Type of time value changed from long to time_t (now 64 bits on Windows): line 688, build.c
Building in Visual Studio 2008: Use Win32 console project, not using precompiled headers. In Properties>C/C++>CommandLine, additional options: /D "SSBM" /D "DBNAME" /D "DB2" (for DB2)
Building using makefile_win: set for DB2 build: nmake -f makefile_win (Change DATABASE symbol for other database)
SSBM dbgen readme:
SSBM is based on the TPC-H dbgen source. The coding style and architecture follow the TPC-H dbgen. The original TPC-H dbgen code stays untouched, and all new code related to SSBM dbgen is enclosed in "#ifdef SSBM" statements.
For the original detailed TPC-H documentation, please refer to the TPCH_README document in the same directory. Here we list only a few things that are specific to SSBM.
- How is SSBM DBGEN built?
Same idea as the TPC-H dbgen setup, which requires the user to create an appropriate makefile, using makefile.suite as a basis. Make sure to use "SSBM" for the workload variable.
Type "make" to compile and to generate the SSBM dbgen executable. Please refer to Porting.Notes for more details and for suggested compile time options.
Note: If you want to generate the data files in a different directory, you should copy the dbgen executable as well as the dists.dss file to that directory.
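The note above can be sketched as a small staging step. The function name stage_dbgen and its two arguments (build directory, target directory) are hypothetical. The sketch only copies the two files the note names.

```shell
#!/bin/sh
# Hypothetical helper: copy the dbgen executable and dists.dss from the
# build directory ($1) into the directory where data should be generated ($2).
stage_dbgen() {
    src="$1"
    dest="$2"
    mkdir -p "$dest" || return 1
    # Both files are needed in the target directory, per the note above.
    cp "$src/dbgen" "$src/dists.dss" "$dest/" || return 1
}
```

After staging, change into the target directory and run dbgen from there so that the .tbl files are written alongside dists.dss.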
- How to generate SSBM data files?
To generate the dimension and fact tables:
(customer.tbl) dbgen -s 1 -T c
(part.tbl) dbgen -s 1 -T p
(supplier.tbl) dbgen -s 1 -T s
(date.tbl) dbgen -s 1 -T d
(fact table lineorder.tbl) dbgen -s 1 -T l
(for all SSBM tables) dbgen -s 1 -T a
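The per-table invocations above can be driven from a loop that forwards the same arguments (such as -s) to each call. This is only a sketch of that idea, not the contents of generate-data.sh. The function name generate_tables and the DBGEN environment variable (path to the executable, defaulting to ./dbgen) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run dbgen once per SSBM table, forwarding all
# arguments (for example "-s 1") to every invocation.
generate_tables() {
    # Table codes from the list above: customer, part, supplier, date, lineorder.
    for code in c p s d l; do
        "${DBGEN:-./dbgen}" "$@" -T "$code" || return 1
    done
}
```

For example, generate_tables -s 1 runs the five commands listed above in order and stops at the first failure.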
To generate the refresh (insert/delete) data set (creates delete.[1-4] and lineorder.tbl.u[1-4] with a refresh fraction of 0.05%):
dbgen -s 1 -r 5 -U 4
where "-r 5" specifies a refresh fraction of n/10000 (here 5/10000 = 0.05%) and "-U 4" specifies 4 segments for deletes and inserts.
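To make the -r arithmetic concrete, the sketch below converts an -r value n into the corresponding percentage, using the n/10000 rule stated above. The function name refresh_percent is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert a dbgen "-r n" value into a percentage,
# using the n/10000 fraction described above.
refresh_percent() {
    awk -v n="$1" 'BEGIN { printf "%.2f\n", n * 100 / 10000 }'
}
```

For example, refresh_percent 5 prints 0.05, matching the 0.05% quoted for "-r 5".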
At this moment there is no QGEN for SSBM. So the command line options related to those features won't apply.
- What are the changes from TPC-H dbgen?
Changes made to the original TPC-H dbgen:
- removed snowflake tables such as nation and region (done)
- removed the partsupply table (done)
- removed the order table (done)
- renamed the fact table as Lineorder and added/removed many fields (done)
- added the date dimension table (done)
- added and removed fields in dimension tables (done)
- have data cross reference for supplycost, revenue in lineorder (done)
- apply the refreshing only to lineorder table (done)
The command-line options remain the same as in TPC-H dbgen (the -T options are changed to reflect the different set of tables).
===================== End of README ========================================