Convert Fake data to CDM

Anton Ivanov edited this page Apr 27, 2021 · 3 revisions

Fake data

This feature creates a fake dataset based on a scan report. The generated fake data can be used as a source dataset for CDM conversion.

Fake data generation modes:

  • If no values have been scanned (i.e. the column in the scan report doesn’t contain values), Perseus will generate random strings or numbers for that column.
  • If there are scanned values, Perseus will generate the data by choosing from them. Values are sampled either according to their frequencies in the scan report, or uniformly (if the Uniform Sampling option is selected).
  • If the column only contains unique values (each value has a frequency of 1, e.g. for primary keys), the generated column will be kept unique.
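
The three modes above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not Perseus's actual implementation: the function name `fake_column`, the dict-based scan format (value mapped to frequency), the 8-character random strings, and capping a unique column at the number of scanned values are all assumptions made for the example.

```python
import random
import string


def fake_column(scan_values, n_rows, uniform=False, seed=None):
    """Generate one fake column from scanned values (hypothetical helper).

    scan_values: dict mapping each scanned value to its frequency
                 (an assumed format, not the real scan report layout).
    """
    rng = random.Random(seed)

    # Mode 1: nothing was scanned, so fall back to random strings.
    if not scan_values:
        return [
            "".join(rng.choices(string.ascii_lowercase, k=8))
            for _ in range(n_rows)
        ]

    values = list(scan_values)
    freqs = [scan_values[v] for v in values]

    # Mode 3: every value has frequency 1 (e.g. a primary key),
    # so draw without replacement to keep the column unique.
    # Assumption: output is capped at the number of scanned values.
    if all(f == 1 for f in freqs):
        return rng.sample(values, min(n_rows, len(values)))

    # Mode 2: sample with replacement, weighted by frequency,
    # or with equal weights when uniform sampling is requested.
    weights = [1] * len(values) if uniform else freqs
    return rng.choices(values, weights=weights, k=n_rows)
```

For example, `fake_column({"M": 60, "F": 40}, 10)` returns ten values drawn from `M` and `F`, weighted roughly 60/40, while `fake_column({"M": 60, "F": 40}, 10, uniform=True)` draws them with equal weight.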

Fake data generation options:

Max rows per table sets the number of rows in each output table. By default, it is set to 10,000.

Checking the Uniform Sampling box makes Perseus sample the scan values uniformly: the frequency of each value is treated as 1, but the sampling is still random. This increases the chance that each value in the scan report is represented at least once in the output data.
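
To see why uniform sampling helps rare values show up, the short calculation below compares the chance that a single value appears at least once under both settings. It assumes rows are drawn independently with replacement, which is a simplification and may not match Perseus exactly; the function name and the example frequencies are made up for illustration.

```python
def p_at_least_once(weight, total_weight, n_rows):
    """Probability that a value is drawn at least once in n_rows
    independent draws, given its share of the total weight."""
    p = weight / total_weight
    return 1 - (1 - p) ** n_rows


# Hypothetical scan result: one dominant value and two rare ones.
freqs = {"common": 9998, "rare_a": 1, "rare_b": 1}
n_rows = 10_000  # the default Max rows per table

# Frequency-based sampling: a rare value has weight 1 out of 10,000.
by_frequency = p_at_least_once(freqs["rare_a"], sum(freqs.values()), n_rows)

# Uniform sampling: every value has weight 1 out of 3 distinct values.
uniform = p_at_least_once(1, len(freqs), n_rows)

print(f"frequency-based: {by_frequency:.3f}")  # about 0.63
print(f"uniform:         {uniform:.3f}")
```

Under these assumptions a value seen once in the scan has only about a 63% chance of appearing in a 10,000-row table with frequency-based sampling, whereas with uniform sampling it is all but certain to appear.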