v0.4.0

@szarnyasg szarnyasg released this 04 Dec 22:56
· 311 commits to main since this release

This is the first Datagen release with Spark.

Execution environments

  • Both Spark 2 and 3 are supported.
  • The generator can be run in a Docker container (for tests and small data sets), on a Spark cluster, and in cloud-based Spark implementations.
  • We provide scripts for AWS EMR. We used these to generate data sets up to scale factor 30,000.

Data and parameter generation

  • The generator produces a temporal graph where entities can be both inserted (creationDate) and deleted (deletionDate). It supports three serialization modes:
    • Raw mode: generates the entire temporal graph with the creationDate and deletionDate properties included for each dynamic entity. (Not intended for benchmarking; use it for experiments that require custom data sets.)
    • BI mode: generates an initial data set and daily batches of deletions and insertions. To be used with the LDBC SNB Business Intelligence workload.
    • Interactive mode (incomplete): does not take deletions into account. Generates an initial data set. Does not yet generate update streams. See ldbc/ldbc_snb_interactive_v1_impls#173 for the plans to use the new Datagen for SNB Interactive.
  • Supports producing factor tables.
  • This release does not yet have a parameter generator. It will be added in later releases.
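
To illustrate how the raw and BI modes relate, here is a minimal Python sketch that splits raw-style entities (each carrying creationDate and deletionDate) into an initial data set plus daily batches of insertions and deletions. It is not Datagen's implementation: the `Entity` class, the `split_bi` function, the `cutoff` parameter, and the use of `None` for "never deleted" are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass(frozen=True)
class Entity:
    """Hypothetical raw-mode record for one dynamic entity."""
    id: int
    creationDate: datetime
    deletionDate: Optional[datetime]  # None means "never deleted" (assumption)


def split_bi(entities, cutoff):
    """Split raw entities into (snapshot, daily inserts, daily deletes).

    `cutoff` is an assumed boundary between the initial data set and the
    batches; it is not a documented Datagen parameter.
    """
    snapshot = []
    inserts = defaultdict(list)  # day -> ids inserted on that day
    deletes = defaultdict(list)  # day -> ids deleted on that day
    for e in entities:
        deleted = e.deletionDate
        if e.creationDate < cutoff:
            if deleted is not None and deleted < cutoff:
                continue  # created and deleted before the snapshot: never visible
            snapshot.append(e.id)
        else:
            inserts[e.creationDate.date()].append(e.id)
        if deleted is not None and deleted >= cutoff:
            deletes[deleted.date()].append(e.id)
    return snapshot, dict(inserts), dict(deletes)


entities = [
    Entity(1, datetime(2012, 1, 1), None),
    Entity(2, datetime(2012, 1, 2), datetime(2012, 6, 1)),
    Entity(3, datetime(2012, 12, 1, 10), datetime(2012, 12, 3)),
    Entity(4, datetime(2012, 12, 1, 12), None),
    Entity(5, datetime(2012, 1, 5), datetime(2012, 12, 2)),
]
snapshot, inserts, deletes = split_bi(entities, cutoff=datetime(2012, 11, 29))
```

In this sketch, an entity that is created and deleted before the cutoff never appears in the output, while an entity in the initial data set can still show up in a later deletion batch. An Interactive-style output, as described above, would ignore the deletionDate values entirely.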