
Commit

Typos and spell check.
marcosqlbi committed Jul 22, 2024
1 parent 4ea07b2 commit 5c63f64
Showing 8 changed files with 52 additions and 57 deletions.
4 changes: 2 additions & 2 deletions _mydocs/contoso-data-generator/config-data.md
@@ -11,7 +11,7 @@ The Excel configuration file contains both fixed data and parameters to control
The file contains several sheets, further described here. Each sheet contains multiple columns. The software reads some of the columns, recognizing them by name. Columns with names that do not follow the standard requirements of the software are ignored. Columns have been conveniently colored in yellow if they are used by the software. Any non-yellow color is considered a comment and is useful only to human readers.

### Categories
-From here you can configure sales of categories using two curves: W and PPC. "W" define the relative weight of each category in the set of all categories for different periods in the entire timeframe. "PPC" define the variation in the price of items of each category during the whole period (Price percent). Normally the last column is 100%.
+From here you can configure sales of categories using two curves: W and PPC. "W" defines the relative weight of each category in the set of all categories for different periods in the entire timeframe. "PPC" defines the variation in the price of items of each category during the whole period (Price percent). Normally the last column is 100%.

### Subcategories
From here you can configure sales of subcategories using a weight curve with columns marked with W. The values are used to define the weight of a subcategory inside its category. Therefore, the numbers are summed by category and then used to weight subcategories inside the category.
@@ -32,6 +32,6 @@ This page is intended to define geographical areas, each with a set of weights t
For each geographic area, you define the W columns to provide the activity spline.

### Stores
-On this page, you enumerate the stores. For each store, you provide its geographical area and the open and close date. A store is active only between the two dates.
+On this page, you enumerate the stores. For each store, you provide its geographical area and the opening and closing date. A store is active only between the two dates.
You do not provide weight activity for the stores, as the behavior is dictated by the customer clusters. A special store with -1 as StoreID defines the online store.
Each order is assigned to either the online store or to a local store depending on the country of the customer.
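
To illustrate how the W weight columns on these sheets work (the sheet and column roles are as described above; the subcategories and numbers below are hypothetical), suppose the Subcategories sheet contained:

| Subcategory | Category  | W (period 1) | W (period 2) |
|-------------|-----------|--------------|--------------|
| Laptops     | Computers | 30           | 60           |
| Desktops    | Computers | 70           | 40           |

Since the numbers are summed by category, Laptops would take 30 / (30 + 70) = 30% of Computers sales in the first period and 60 / (60 + 40) = 60% in the second.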
51 changes: 27 additions & 24 deletions _mydocs/contoso-data-generator/config-json.md
@@ -7,52 +7,55 @@ order: /06
---

This file contains the main configuration of the data generator.
-- **OrdersCount**: (int) total number of orders to be generated.
+- **OrdersCount**: (int) Total number of orders to be generated.

-- **StartDT**: (datetime) date of the first order.
+- **StartDT**: (datetime) Date of the first order.

-- **YearsCount**: (int) total number of years generated. Orders are distributed over the years.
+- **YearsCount**: (int) Total number of years generated. Orders are distributed over the years.

-- **CutDateBefore**, **CutDateAfter**: (datetime optional parameters) the 2 parameters allow to create data starting from a day different from January 1st and ending on a date different from December 31st. Data before CutDateBefore and after CutDateAfter is removed
+- **CutDateBefore**, **CutDateAfter**: (datetime optional parameters) The 2 parameters allow creating data starting from a day different from January 1st and ending on a date different from December 31st. Data before CutDateBefore and after CutDateAfter is removed.

-- **CustomerPercentage** : percentage of customers to be used. Range: 0.001 - 1.000
+- **CustomerPercentage** : Percentage of customers to be used. Range: 0.001 - 1.000

-- **OutputFormat** : format of the data to be generated. Values: CSV, PARQUET, DELTATABLE
+- **OutputFormat** : Format of the data to be generated. Values: CSV, PARQUET, DELTATABLE

-- **SalesOrders** : type if data to be generated. Values: SALES ORDERS BOTH. SALES = creates the "sales" table. ORDERS = creates the "orders" and the "orders details" table. BOTH = creates all the previous tables.
+- **SalesOrders** : Type of data to be generated. Values: SALES / ORDERS / BOTH.
+  - SALES = creates the "sales" table.
+  - ORDERS = creates the "orders" and the "orders details" table.
+  - BOTH = creates all the previous tables ("sales", "orders", and "orders details").

-- **CustomerFakeGenerator**: (int) number of full random customers. Only used during tests for speeding up the process.
+- **CustomerFakeGenerator**: (int) Number of full random customers. Only used during tests to speed up the process.

- **DaysWeight** (section)

-- **DaysWeightConstant**: (bool) if set to true, the configuration about days is ignored.
+- **DaysWeightConstant**: (bool) If set to true, the configuration about days is ignored.

-- **DaysWeightPoints**, **DaysWeighValues**: (double[]) points for interpolating the curve of distribution of orders over time. It covers the entire YearsCount period.
+- **DaysWeightPoints**, **DaysWeighValues**: (double[]) Points for interpolating the curve of distribution of orders over time. It covers the entire YearsCount period.

-- **DaysWeightAddSpikes**: (bool) if set to false, annual spikes are ignored.
+- **DaysWeightAddSpikes**: (bool) If set to false, annual spikes are ignored.

-- **WeekDaysFactor**: (double[] - length 7) weight multiplication factor for each day of the week. The first day is Sunday.
+- **WeekDaysFactor**: (double[] - length 7) Weight multiplication factor for each day of the week. The first day is Sunday.

-- **DayRandomness**: (double) percentage of randomness add to days, to avoid having a too-perfect curve over time.
+- **DayRandomness**: (double) Percentage of randomness added to days, to avoid having a too-perfect curve over time.

-- **OrderRowsWeights**: (double[]) distribution of the number of rows per order. Each element is a weight. The first element is the weight of orders with one row, the second is the weight of orders with two rows. and so on
+- **OrderRowsWeights**: (double[]) Distribution of the number of rows per order. Each element is a weight. The first element is the weight of orders with one row, the second is the weight of orders with two rows, and so on.

-- **OrderQuantityWeights**: (double[]) distribution of the quantity applied to each order row. Each element is a weight. The first element is the weight of rows with quantity=1, the second element is the weight of rows with quantity=2, and so on.
+- **OrderQuantityWeights**: (double[]) Distribution of the quantity applied to each order row. Each element is a weight. The first element is the weight of rows with quantity=1, the second element is the weight of rows with quantity=2, and so on.

-- **DiscountWeights**: (double[]) distribution of the discounts applied to order rows. Each element is a weight. The first element is the weight of rows with a discount of 0%, the second element is the weight of rows with a discount of 1%, and so on.
+- **DiscountWeights**: (double[]) Distribution of the discounts applied to order rows. Each element is a weight. The first element is the weight of rows with a discount of 0%, the second element is the weight of rows with a discount of 1%, and so on.

-- **OnlinePerCent**: (double[]) distribution of the percentage of orders sold online, over the orders total.
+- **OnlinePerCent**: (double[]) Distribution of the percentage of orders sold online, over the total orders.

-- **DeliveryDateLambdaWeights**: (double[]) distribution of the days for delivery. The delivery date is computed by adding one day plus a random number generated using the distribution built from this parameter.
+- **DeliveryDateLambdaWeights**: (double[]) Distribution of the days for delivery. The delivery date is computed by adding one day plus a random number generated using the distribution built from this parameter.

-- **CountryCurrency**: table mapping Country to Currency
+- **CountryCurrency**: Table mapping Country to Currency.

-- **AnnualSpikes** : set of periods where orders show a spike. For each spike, you define the start day, the end day, and the multiplication factor.
+- **AnnualSpikes** : Set of periods where orders show a spike. For each spike, you define the start day, the end day, and the multiplication factor.

-- **OneTimeSpikes**: set of spikes with a fixed start and end date. For each spike, you define the start end, the end date, and the multiplication factor.
+- **OneTimeSpikes**: Set of spikes with a fixed start and end date. For each spike, you define the start date, the end date, and the multiplication factor.

-- **CustomerActivity** : contains the configuration for customer start/end date
+- **CustomerActivity** : Contains the configuration for customer start/end dates.

-- **StartDateWeightPoints**, **StartDateWeightValues**: configuration for the spline of customer start date
+- **StartDateWeightPoints**, **StartDateWeightValues**: Configuration for the spline of customer start dates.

-- **EndDateWeightPoints**, **EndDateWeightValues**: configuration for the spline of customer end dates
+- **EndDateWeightPoints**, **EndDateWeightValues**: Configuration for the spline of customer end dates.
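
To make the parameter list concrete, here is a minimal illustrative sketch of a config.json. The property nesting (in particular, placing the day-related keys under DaysWeight), the date format, and all values are assumptions for illustration, not recommended defaults:

```
{
  "OrdersCount": 100000,
  "StartDT": "2019-01-01",
  "YearsCount": 4,
  "CustomerPercentage": 0.5,
  "OutputFormat": "CSV",
  "SalesOrders": "BOTH",
  "DaysWeight": {
    "DaysWeightConstant": false,
    "DaysWeightAddSpikes": true,
    "WeekDaysFactor": [ 0.8, 1.0, 1.0, 1.0, 1.0, 1.2, 1.5 ],
    "DayRandomness": 0.1
  },
  "OrderRowsWeights": [ 60, 25, 10, 5 ],
  "OrderQuantityWeights": [ 70, 20, 10 ]
}
```

Read the arrays as relative weights rather than percentages: with the OrderRowsWeights above, an order would get a single row with probability 60 / (60 + 25 + 10 + 5) = 60%.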
11 changes: 5 additions & 6 deletions _mydocs/contoso-data-generator/details.md
@@ -25,17 +25,16 @@ Data schema (Orders & OrderRows version):

![Schema Sales](images/schema-orders.svg)

-<br/>
-
-Customers set of data is filled with fake, but realistic, customers data.
+"Customers" is filled with fake, but realistic, customer data.


## Pre-data-preparation: static data from SQLBI repository

-The tool needs some files containing static data: fake customers, exchange rates, postal codes, etc. The files are cached under cache folder specified as a parameter on the command line. The files are downloaded from a specific SQLBI repository if not found in the cache folder. In normal usage, if you reuse the same cache folder, the files are downloaded only on the first run.
-After downloading, some files are processed to create a consistent set of fake customers. The output file, customersall.csv, is placed under cache folder. If you delete it, it will be recreated on the following run.
+The tool needs some files containing static data: fake customers, exchange rates, postal codes, etc. The files are cached under the "cache" folder specified as a parameter on the command line. The files are downloaded from a specific [static files repository](https://github.com/sql-bi/Contoso-Data-Generator-V2-Data/releases/tag/static-files) if not found in the cache folder. In normal usage, if you reuse the same cache folder, the files are downloaded only on the first run.
+
+After downloading, some files are processed to create a consistent set of fake customers. The output file, *customersall.csv*, is placed under the "cache" folder. If you delete it, it will be recreated on the following run.


-https://github.com/sql-bi/Contoso-Data-Generator-V2-Data/releases/tag/static-files



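As a small usage note tied to the behavior above (the cache path here is taken from the command-line example in index.md — adjust it to your own): deleting the cached file forces the fake-customer set to be rebuilt on the next run.

```
del c:\temp\CACHE\customersall.csv
```
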
2 changes: 1 addition & 1 deletion _mydocs/contoso-data-generator/formats.md
@@ -7,7 +7,7 @@ order: /02
next_reading: true
---

-Every output format has specific parameter to be set inside config.json
+Every output format has specific parameters to be set inside config.json

## CSV

19 changes: 6 additions & 13 deletions _mydocs/contoso-data-generator/index.md
@@ -9,7 +9,7 @@ next_reading: true

The Contoso Data Generator is a tool for generating sample data with randomly generated orders for the Contoso data model, in order to provide demo data. Generated data is ready to be imported into Power BI, Fabric OneLake, and other platforms.

-It consist of a c# program, to generate the data plus additional scripts for simplifying the activity, importing data to sql-server, etc. The tool is available on GitHub: [https://github.com/sql-bi/Contoso-Data-Generator-V2/](https://github.com/sql-bi/Contoso-Data-Generator-V2/)
+It consists of a C# program to generate the data, plus additional scripts for simplifying the activity, importing data to SQL Server, etc. The tool is available on GitHub: [https://github.com/sql-bi/Contoso-Data-Generator-V2/](https://github.com/sql-bi/Contoso-Data-Generator-V2/)

If you are just interested in using **ready to use sets of data** generated by the tool, [download them here.](https://github.com/sql-bi/Contoso-Data-Generator-V2-Data)

@@ -18,7 +18,7 @@ Supported [output formats](formats.md):
- Delta Table (files)
- CSV
- CSV multi file
-- CSV multi file - gz compressed
+- CSV multi file, gz compressed
- Sql Server, via bulk-insert script of the generated CSV files

<br/>
@@ -45,19 +45,12 @@ Example:
```
databasegenerator.exe c:\temp\config.json c:\temp\data.xlsx c:\temp\OUT\ c:\temp\CACHE\
```

-<br/>
-
-**Note**: the tool needs some files containing static data: fake customers, exchange rates, postal codes, etc. The files are cached after been downloaded over the Internet from a specific SQLBI repository. [More details](details.md).
-
-<br/>
-
To simplify running the tool, [a set of scripts](scripts.md) is available.

-<br/>
-<br/>
-<br/>
+**Note**: The tool needs some files containing static data: fake customers, exchange rates, postal codes, etc. The files are cached after being downloaded over the Internet from a specific SQLBI repository. [More details](details.md).

-The current version is the evolution of the older one, still available on GitHub:
-[Previous version](https://github.com/sql-bi/Contoso-Data-Generator)
+## Previous versions
+The current version of this tool is the evolution of the first Contoso Data Generator, still available on GitHub:
+- [Previous version (v1)](https://github.com/sql-bi/Contoso-Data-Generator)


2 changes: 0 additions & 2 deletions _mydocs/contoso-data-generator/scripts.md
@@ -6,8 +6,6 @@ published: true
order: /03
---

-
-
Under `script/dataset`, there are 3 scripts:
- `make_tool.cmd` : compiles the tool in release mode, using dotnet from the command line.
- `build_all.cmd` : creates the sets of data published on the ready-to-use repository.
11 changes: 6 additions & 5 deletions _mydocs/contoso-data-generator/sqlscripts.md
@@ -18,12 +18,12 @@ The set of scripts under `scripts/sql` allow you to import CSV output files to a
Steps:
- Create the set of data, as CSV files, as usual.
- Copy the output files under `scripts/sql/inputcsv`
-- Adapt the script to your SQL server instance. SQLCMD requires sql server name and other parameters to connect to you sql server. The script defaults are:
+- Adapt the script to your SQL Server instance. SQLCMD requires the SQL Server name and other parameters to connect to an SQL Server instance. The script defaults are:
`sqlcmd -S (LocalDb)\MSSQLLocalDB -d ContosoDGV2Test`
- Run the import script. When asked, choose what to import:
-- sales : mamages base tables + sales table
-- orders : mamages base tables + orders/order-rows tables
-- both: mamages base tables + sales/orders/order-rows tables
+- sales : Manages base tables + sales table
+- orders : Manages base tables + orders/order-rows tables
+- both: Manages base tables + sales/orders/order-rows tables


Resulting database:
@@ -34,7 +34,8 @@

## SQLBI_ALL_DB.cmd

-Specific scripts used by SQLBI for creating Sql Server database backups, available in the ready-to-use repository. The database are: Contoso 100k, Contoso 1M, Contoso 10M and Contoso 100M.
+Specific scripts used by SQLBI for creating SQL Server database backups, available in the ready-to-use repository.
+The databases are: Contoso V2 10k, Contoso V2 100k, Contoso V2 1M, Contoso V2 10M, and Contoso V2 100M.

- `SQLBI_ALL_DB.cmd` : imports data into the databases, backs them up, and compresses the resulting files.
- `SQLBI_CreateSqlDatabases.ps1` : creates the databases on the specified SQL Server.
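
The import scripts themselves are not shown on this page. As a rough sketch of the bulk-insert idea they are built on (the table name, file path, and CSV options below are illustrative assumptions, not the actual script contents):

```
-- Minimal T-SQL sketch: bulk-load one generated CSV into an existing table.
-- dbo.Sales and the file path are hypothetical; point them at your own table and folder.
BULK INSERT dbo.Sales
FROM 'C:\ContosoDG\scripts\sql\inputcsv\sales.csv'
WITH (
    FORMAT = 'CSV',          -- parse the input as CSV (SQL Server 2017+)
    FIRSTROW = 2,            -- skip the header row
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
);
```
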
9 changes: 5 additions & 4 deletions _mydocs/contoso-data-generator/uploadtofabric.md
@@ -6,7 +6,7 @@ published: true
order: /10
---

-Delta Table output files generated by the tool can be easily and directly uploaded to Fabric LakeHouse without any conversions. Just copy all the files, respecting the folders structures, directly under "Tables" LakeHouse folder, and data will be ready to use.
+Delta Table output files generated by the tool can be easily and directly uploaded to Fabric LakeHouse without any conversions. Just copy all the files, respecting the folder structures, directly under the "Tables" LakeHouse folder, and the data will be ready to use.

<img src="images/fabric_01.png" width="800px"/><br/><br/>

@@ -18,7 +18,7 @@ There are 3 ways to upload files to Fabric LakeHouse directly:

## OneLake File Explorer

-Currently in preview. It integrates with Windows File Explorer and allows you to manage files on Fabric with a OneDrive-like user-experience. Details here: https://learn.microsoft.com/en-us/fabric/onelake/onelake-file-explorer
+Currently in preview. It integrates with Windows File Explorer and allows you to manage files on Fabric with a OneDrive-like user experience. Details here: https://learn.microsoft.com/en-us/fabric/onelake/onelake-file-explorer

Data generated by the tool, uploaded to LakeHouse:

@@ -51,14 +51,15 @@ To test connectivity, try to list the files on the storage Tables partition:
```
azcopy list https://onelake.blob.fabric.microsoft.com/<WORKSPACE-NAME>/<LAKEHOUSE-NAME>.Lakehouse/Tables --trusted-microsoft-suffixes onelake.blob.fabric.microsoft.com
```

-Upload all file:
+Upload all files:

```
azcopy copy "../out/*" https://onelake.blob.fabric.microsoft.com/<WORKSPACE-NAME>/<LAKEHOUSE-NAME>.Lakehouse/Tables --recursive --check-length --put-md5 --trusted-microsoft-suffixes onelake.blob.fabric.microsoft.com
```

It is also possible to delete all existing files. **Be very careful!**
-Import: for some unknown reasons, the command often fails. Retry many times till all files have been deleted successfully.

+**Important**: for some unknown reasons, the *remove* command often fails. Retry many times till all files have been deleted successfully.

```
azcopy remove https://onelake.blob.fabric.microsoft.com/<WORKSPACE-NAME>/<LAKEHOUSE-NAME>.Lakehouse/Tables/* --recursive=true --trusted-microsoft-suffixes onelake.blob.fabric.microsoft.com
```
