v0.4.0
This is the first Datagen release built on Spark.
Execution environments
- Both Spark 2 and 3 are supported.
- The generator can be run in a Docker container (for tests and small data sets), on a Spark cluster, and in cloud-based Spark implementations.
- We provide scripts for AWS EMR. We used these to generate data sets up to scale factor 30,000.
Data and parameter generation
- The generator produces a temporal graph where entities can be both inserted (`creationDate`) and deleted (`deletionDate`). It supports three serialization modes:
  - Raw mode: generates the entire temporal graph with the `creationDate` and `deletionDate` properties included for each dynamic entity. (Not intended for benchmarking; meant for experiments that require custom data sets.)
  - BI mode: generates an initial data set and daily batches of deletions and insertions. To be used with the LDBC SNB Business Intelligence workload.
  - Interactive mode (incomplete): does not take deletions into account. Generates an initial data set. Does not yet generate update streams. See ldbc/ldbc_snb_interactive_v1_impls#173 for the plans to use the new Datagen for SNB Interactive.
- Supports producing factor tables.
- This release does not yet have a parameter generator. One will be added in a later release.
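
To illustrate how the raw temporal graph relates to the BI mode output described above, here is a minimal Python sketch. It is not Datagen's implementation (which runs on Spark); the `Entity` class, the `split_into_batches` function, and the `bulk_load_cutoff` parameter are hypothetical names introduced only for this example. It assumes each dynamic entity carries a creation date and an optional deletion date, and splits the entities into an initial data set plus per-day insert and delete batches.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Entity:
    """A dynamic entity with temporal properties (hypothetical shape)."""
    id: int
    creation_date: date
    deletion_date: Optional[date]  # None means the entity is never deleted


def split_into_batches(entities, bulk_load_cutoff):
    """Split raw temporal records into an initial snapshot plus daily batches.

    Returns (initial, inserts, deletes), where inserts and deletes map a
    day to the list of entity ids inserted or deleted on that day.
    The cutoff date is an assumption of this sketch.
    """
    initial = []
    inserts = defaultdict(list)
    deletes = defaultdict(list)
    for e in entities:
        # Entities deleted before the cutoff never appear in the output.
        if e.deletion_date is not None and e.deletion_date < bulk_load_cutoff:
            continue
        if e.creation_date < bulk_load_cutoff:
            initial.append(e.id)  # part of the initial data set
        else:
            inserts[e.creation_date].append(e.id)  # daily insert batch
        if e.deletion_date is not None:
            deletes[e.deletion_date].append(e.id)  # daily delete batch
    return initial, dict(inserts), dict(deletes)


entities = [
    Entity(1, date(2012, 1, 5), None),
    Entity(2, date(2012, 1, 6), date(2012, 3, 2)),
    Entity(3, date(2012, 3, 1), date(2012, 3, 2)),
    Entity(4, date(2012, 1, 7), date(2012, 2, 1)),
]
initial, inserts, deletes = split_into_batches(entities, date(2012, 3, 1))
```

In this toy input, entities 1 and 2 land in the initial data set, entity 3 arrives in the insert batch for 2012-03-01, entities 2 and 3 are removed in the delete batch for 2012-03-02, and entity 4 is dropped because it is deleted before the cutoff. An output that ignores deletions, as Interactive mode does, would correspond to skipping the `deletes` map.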