
Add docs for spreadsheet data source
Xantier committed Sep 11, 2023
1 parent 31766b5 commit dee3a6e
Showing 1 changed file with 77 additions and 23 deletions.
100 changes: 77 additions & 23 deletions content/docs/tech-insights/define-custom-data-sources/index.md
@@ -16,23 +16,77 @@ To set up a Data Source, you will, firstly, need to enter general information su

## Setting up data provider

Tech Insights supports several data provider types for retrieving fact data. Select the one that matches the kind of data you want to create checks against. Below are descriptions of the different provider types and their configuration options.

![Add Data Provider](./data-provider-step.png)

You must specify a type for the new Data Source. Roadie provides several types of data provider:
1. The _HTTP_ type lets you connect to an external API via the Backstage proxy to pull in data.
2. The _HTTP via Integration_ type lets you connect directly to external APIs for which an integration is configured in Roadie. For example, you can use the GitHub app to authenticate requests to the GitHub API.
3. The _Component repository file_ type lets you extract data from a file path in the repository of a given Component in your Catalog.
4. The _Component repository directory_ type lets you extract a list of files from the repository.
5. The _Entity definition_ type returns the entity's definition as it is stored in the catalog.
6. The _Spreadsheet_ type lets you create facts from columnar data sources such as Google Sheets.

### HTTP via Proxy

For the _HTTP_ type, select a proxy from the provided dropdown and append a path extension to configure the URL to which the HTTP call should be made. Enter the path extension without a leading slash.

This data provider connects via [Roadie proxies](/docs/custom-plugins/connectivity/proxy/), which allow you to configure URLs, headers and credentials securely for reaching external endpoints.

The supported response types for HTTP data sources are JSON structures.

### HTTP via Integration

The _HTTP via Integration_ data source exposes the same functionality as the plain HTTP data source. The connection and authentication parameters for the integration come from previously configured authentication mechanisms. For example, in the case of GitHub, the installed GitHub app's credentials are used.

### Component repository file

The Component repository file provider reaches out to the source location of an entity for data. This allows you, for example, to retrieve individual files from the GitHub repository where your entity is located. The supported data types are JSON and YAML, as well as other file types. JSON and YAML files allow well-structured extraction patterns, whereas other file types can be parsed using regex.
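As a minimal sketch of the Regex approach for non-JSON files, the example below pulls the Node image tag out of a hypothetical Dockerfile using a single capture group. It uses Python's `re` module as a stand-in; the pattern itself is also valid ECMAScript syntax, but the two engines differ in some features, so test your expression in Tech Insights before relying on it.

```python
import re

# Hypothetical Dockerfile contents, as a repository file provider might return them.
dockerfile = """\
FROM node:18.17-alpine
WORKDIR /app
COPY . .
"""

# One capture group holds the value to extract; MULTILINE lets ^ match at each line start.
pattern = re.compile(r"^FROM node:(\S+)", re.MULTILINE)

match = pattern.search(dockerfile)
node_tag = match.group(1) if match else None
print(node_tag)  # 18.17-alpine
```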

### Entity Definition

Entity definition returns the information about the entity as it is stored in the catalog. This data source provider can be used to retrieve information stored directly in the entity manifest itself, such as annotations or links.
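To illustrate what this provider works with, here is a hypothetical entity definition represented as a Python dictionary, with a check for an annotation and a list of link titles. The entity, its annotation value and the link are invented for illustration; in Tech Insights itself you would express the same lookups with the JSON parser.

```python
# A hypothetical entity definition, shaped like a Backstage catalog entity.
entity = {
    "apiVersion": "backstage.io/v1alpha1",
    "kind": "Component",
    "metadata": {
        "name": "sample-service",
        "annotations": {"github.com/project-slug": "example-org/sample-service"},
        "links": [{"url": "https://example.com/runbook", "title": "Runbook"}],
    },
    "spec": {"type": "service", "owner": "team-a", "lifecycle": "production"},
}

# Does the manifest carry the annotation, and which links does it declare?
annotations = entity["metadata"].get("annotations", {})
has_project_slug = "github.com/project-slug" in annotations
link_titles = [link["title"] for link in entity["metadata"].get("links", [])]

print(has_project_slug, link_titles)  # True ['Runbook']
```

Annotation keys such as `github.com/project-slug` contain dots and slashes, so plain dot notation cannot reference them in a JSONata path; JSONata lets you quote such field names with backticks, e.g. `` metadata.annotations.`github.com/project-slug` ``.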


### Component repository directory

Component repository directory allows you to retrieve a list of files that are located in the directory of the entity location. The returned data type from this provider is always a set of filenames.
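The sketch below shows the shape of this provider's output, assuming a non-recursive listing of regular files only (an assumption; how the provider treats subfolders is not described here). It builds a temporary repository, collects file names into a Python `set`, and checks set membership, which is one natural way to use a Set-valued fact in a check.

```python
import tempfile
from pathlib import Path


def list_directory(repo_root: Path, folder: str = ".") -> set[str]:
    """Return the names of regular files directly inside `folder` (non-recursive)."""
    target = repo_root / folder
    return {entry.name for entry in target.iterdir() if entry.is_file()}


# Build a throwaway repository layout to list.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "README.md").write_text("# sample-service\n")
    (repo / "package.json").write_text("{}\n")
    (repo / "src").mkdir()  # directories are skipped by is_file()
    files = list_directory(repo, ".")

print(sorted(files))  # ['README.md', 'package.json']
print("README.md" in files)  # True
```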

### Spreadsheet

The spreadsheet data provider allows you to create facts from columnar data sources available on the internet, like Google Sheets. The provider requires an integration configured in the Roadie application to be able to access the source data. See configuration options for specific integration targets below.


<details>

<summary>Configuration options for spreadsheet data sources</summary>

#### Google Sheets

The Google Sheets data source uses a secret named `GOOGLE_API_SA_KEY` to connect to Google APIs. You can configure this by generating a JSON key file for a Google Service Account and setting the contents of that file as a secret in your Roadie application.

To use the Google Sheets API, you need a Google Cloud Platform Project with the API enabled, as well as authorization credentials. To get those, follow the steps below:
1. Open the Google Cloud Console at https://console.cloud.google.com and create a new project.
2. Enable the Google Sheets API for the project
   * At the top left, click Menu ☰ > APIs and Services > Enabled APIs and Services. Then click the + Enable APIs and Services button, search for the Google Sheets API and enable it.
3. Create a Service Account
   * On the Credentials tab, click the Create Credentials button at the top and select Service Account in the drop-down menu.
   * Take note of the email address that is assigned to this service account.
4. Create a key for the service account
   * Open the service account, navigate to the Keys tab and click the Add Key button. Select the Create New Key option, then choose the JSON key type.
5. Navigate to Administration > Settings > Secrets in your Roadie instance and paste the contents of the JSON file as the value of the `GOOGLE_API_SA_KEY` secret.

To give the service account access to a specific Google Sheet, click the Share button on that sheet and add the service account email address you noted in step 3.


</details>
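Before pasting the key into the `GOOGLE_API_SA_KEY` secret, you may want to sanity-check it locally. The helper below is a hypothetical sketch, not part of Roadie: it parses the JSON key, checks for standard fields of a Google service account key (`type`, `project_id`, `private_key`, `client_email`) and returns the `client_email`, which is the address you share the Google Sheet with.

```python
import json

# Fields present in every Google service account JSON key (the file contains more).
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def service_account_email(secret_value: str) -> str:
    """Validate a service account key and return the email address to share Sheets with."""
    key = json.loads(secret_value)
    missing = [field for field in REQUIRED_FIELDS if field not in key]
    if missing:
        raise ValueError(f"key is missing fields: {missing}")
    if key["type"] != "service_account":
        raise ValueError("this is not a service account key")
    return key["client_email"]


# Placeholder key with made-up values, for illustration only.
fake_key = json.dumps({
    "type": "service_account",
    "project_id": "my-project",
    "private_key": "-----BEGIN PRIVATE KEY-----\nREDACTED\n-----END PRIVATE KEY-----\n",
    "client_email": "tech-insights@my-project.iam.gserviceaccount.com",
})

print(service_account_email(fake_key))  # tech-insights@my-project.iam.gserviceaccount.com
```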

### Example configuration steps

1 - You must specify a type for the new Data Source.

2 - Set additional configuration options depending on the type of the data provider:
1. For the _HTTP_ type:
   1. _proxy:_ Select a proxy from the provided dropdown.
   2. _path extension:_ The path to append to the proxy URL to which the HTTP call should be made. There is basic support for templating entity values into the path, e.g. `etc/{{ metadata.name }}` inserts the name of the entity. Enter the path extension without a leading slash.
   3. _HTTP Method:_ The HTTP method to use for the request. Usually this is `GET`, but `POST` is also supported for GraphQL and other endpoints that take query parameters in the request body.
   4. _Body:_ For POST requests you can also send a body. Templating is supported in the request body in the same way as in the path.
2. For the _HTTP via Integration_ type, you only need to set the _path extension_: the path to append to the base URL of the integration to which the HTTP call should be made. Templating entity values into the path works as described above, and the path extension should be entered without a leading slash.
3. For _Component repository file_, configure the path to the file you want to extract data from, relative to the repository root. This can be anything from JSON files to YAML files.
4. For _Component repository directory_, configure the folder you want to list files from. To list the repository root, use `.`.
5. For _Spreadsheet_, configure the provider, the sheet ID and the sheet range or tab name of the values you want to retrieve.
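The following is a minimal sketch of how a path extension with `{{ metadata.name }}`-style placeholders might be rendered and joined to a proxy base URL. The regex-based templating, the helper names and the base URL `https://backstage.example.com/api/proxy/my-proxy` are all assumptions for illustration; Roadie's own templating engine may support more than simple dotted lookups.

```python
import re

# Matches {{ dotted.path }} placeholders, allowing optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render_path(template: str, entity: dict) -> str:
    """Replace each placeholder with the value found at its dotted path in the entity."""

    def lookup(match: re.Match) -> str:
        value = entity
        for key in match.group(1).split("."):
            value = value[key]  # raises KeyError if the entity lacks the field
        return str(value)

    return PLACEHOLDER.sub(lookup, template)


def build_url(proxy_base: str, path_extension: str) -> str:
    """Join the proxy base URL and a path extension entered without a leading slash."""
    return proxy_base.rstrip("/") + "/" + path_extension


entity = {"metadata": {"name": "sample-service"}}
path = render_path("etc/{{ metadata.name }}", entity)
url = build_url("https://backstage.example.com/api/proxy/my-proxy", path)
print(url)  # https://backstage.example.com/api/proxy/my-proxy/etc/sample-service
```

Because `build_url` inserts exactly one separator, a path extension that started with its own slash would produce a double slash, which is one reason to enter it without one.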

3 - Test the configuration to see what response a specific entity would return from the location you have provided. For example, if you were to fetch the `package.json` of a `sample-service` component, the Data Source would return something like this:

@@ -46,13 +100,13 @@ Now that you have data, let’s define what Facts interest you. You’ll do this

4 - Data retention sets the maximum number of items to keep, or the maximum duration to keep them, before they are automatically removed from the database.
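A minimal sketch of the two retention styles, assuming each stored fact carries a timestamp: keep at most `max_items` of the newest entries, or drop entries older than `max_age`. The function and field names are hypothetical and do not reflect how Roadie stores facts.

```python
from datetime import datetime, timedelta, timezone


def apply_retention(facts, max_items=None, max_age=None, now=None):
    """Return the facts to keep, newest first, under a count limit and/or an age limit."""
    now = now or datetime.now(timezone.utc)
    kept = sorted(facts, key=lambda fact: fact["timestamp"], reverse=True)
    if max_age is not None:
        kept = [fact for fact in kept if now - fact["timestamp"] <= max_age]
    if max_items is not None:
        kept = kept[:max_items]
    return kept


now = datetime(2023, 9, 11, tzinfo=timezone.utc)
# Five daily snapshots: value 0 is today's, value 4 is four days old.
facts = [{"timestamp": now - timedelta(days=d), "value": d} for d in range(5)]

print([f["value"] for f in apply_retention(facts, max_items=3, now=now)])  # [0, 1, 2]
print([f["value"] for f in apply_retention(facts, max_age=timedelta(days=1), now=now)])  # [0, 1]
```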

5 - Choose a parser to extract a Fact from the retrieved data. For the “Component repository file” type, this can be either the JSON or the Regex parser type, while for the “HTTP” data provider type only JSON is supported. Retrieved YAML files are handled as JSON. The repository directory configuration returns a single value of type Set, and the only configurable options are the name and description of the field. For the spreadsheet data provider type, all columnar configuration options are used.
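For the spreadsheet provider, the Google Sheets API v4 `spreadsheets.values.get` endpoint returns a range as a `values` array of rows. The sketch below assumes the first row holds column headers and turns each remaining row into a record; how Roadie actually maps columns to facts is not described here, so treat this as an illustration only. The sheet contents are invented.

```python
def rows_to_records(values_response: dict) -> list[dict]:
    """Map a values.get-style payload to one dict per data row, keyed by the header row."""
    rows = values_response.get("values", [])
    if not rows:
        return []
    header, *data = rows
    records = []
    for row in data:
        # The Sheets API omits trailing empty cells, so pad short rows with "".
        padded = row + [""] * (len(header) - len(row))
        records.append(dict(zip(header, padded)))
    return records


# Invented example payload for a hypothetical "Services" tab.
response = {
    "range": "Services!A1:C3",
    "majorDimension": "ROWS",
    "values": [
        ["service", "owner", "tier"],
        ["sample-service", "team-a", "1"],
        ["billing-service", "team-b"],
    ],
}

records = rows_to_records(response)
print(records[1])  # {'service': 'billing-service', 'owner': 'team-b', 'tier': ''}
```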

The JSON parser type uses [JSONata query syntax](https://jsonata.org/) to extract data from JSON. The Regex parser type uses [ECMAScript syntax](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions) to extract data from text.

6 - If you’re using the JSON parser, specify a path from the root of the object, for example `version` or `scripts.test`. If you’re using the Regex parser, specify a valid expression, with a capture group if you are extracting values. Note that the Regex does not need slashes at the start or the end.
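Paths like these can be pictured with the small lookup below, run against a hypothetical `package.json`. It implements only plain field navigation, a tiny subset of JSONata (no arrays, predicates or functions), so use it to reason about paths rather than as a replacement for the JSONata parser.

```python
import json

# Hypothetical package.json retrieved by a Component repository file data source.
package_json = json.loads(
    '{"name": "sample-service", "version": "1.4.2",'
    ' "scripts": {"test": "jest", "build": "tsc"}}'
)


def extract(data, path: str):
    """Follow a dotted path such as 'scripts.test' from the root of the object."""
    current = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None  # missing path: no value to extract
        current = current[key]
    return current


print(extract(package_json, "version"))  # 1.4.2
print(extract(package_json, "scripts.test"))  # jest
print(extract(package_json, "scripts.lint"))  # None
```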

Let's look at how this works with an example.

Using the **Regex parser type** on the following result:

@@ -63,10 +117,10 @@ Retrieving Node version we could write the following Regex:
![Field extraction Regex](./field-extraction-regex.png)

On the other hand, if you were to have the following result:
![JSON result](./json-result-type.png)
and wanted to obtain the total number of pages, we could use the following JSONata expression:
![JSONata parser](./fact-parser-jsonata.png)


7 - Select the type of the parsed value.
@@ -82,14 +136,14 @@ After successfully adding a fact you will be able to select kind and type of ser

![Data Source Entity Filter](./data-source-entity-filter.png)

You should be able to see the created Data Source in the overview screen. If you create a draft Data Source, you need to publish it for others to see it. You can do this from the actions menu.


## Running the data source

![Data Source Publish](./publish-data-source.png)

Newly created Data Sources have a refresh cycle of 24 hours. You can change this value in the 'Edit' screen, or trigger an update manually from the kebab menu.

![Trigger update](./trigger-update.png)

@@ -187,7 +241,7 @@ API response:

### Proxy usage and the Broker

All Roadie HTTP Tech Insights data sources use the Roadie Proxy to connect to third-party services. You can configure different proxies [using these instructions](/docs/custom-plugins/connectivity/proxy/). Additionally, if you want to connect to services or endpoints within your own infrastructure, you can use the Broker connectivity to reach your secure services. To do this, you first need to [set up the broker connection](/docs/integrations/broker/).

Because Roadie cannot know in advance which endpoints Tech Insights will connect to via the broker, you need to write your own `accept.json` Broker configuration file to reach internal endpoints. An example configuration file connecting to a self-hosted metrics server is mocked out below.

@@ -223,7 +277,7 @@ Where the secret `MY_METRICS_SERVICE_AUTH_TOKEN` is defined by an environment va

</details>

You can set up the broker connection by selecting the `/broker` proxy configuration and defining an endpoint path like `my-broker-token/api/get-my-metrics`.

With this setup, the Tech Insights data source engine uses the broker connection identified by `my-broker-token` and calls the endpoint `/api/get-my-metrics` through it. This configuration matches the mock `accept.json` file above: the Tech Insights data source calls an internal service on your internal network, hosted at `http://metrics-server.internal.our-company.com/api/get-my-metrics`, and returns its response. This response can then be mapped to more streamlined and easily usable fact data using the JSONata extractor functionality.
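The path mapping described above can be sketched as follows. The `BROKER_ORIGINS` table is a hypothetical stand-in for what the `accept.json` rules express; the real broker forwards requests according to its own configuration, so this only illustrates how the path splits into a broker token and an endpoint.

```python
# Hypothetical mapping from a broker token to the internal origin its accept.json allows.
BROKER_ORIGINS = {
    "my-broker-token": "http://metrics-server.internal.our-company.com",
}


def resolve_broker_path(path_extension: str) -> str:
    """Split '<broker token>/<endpoint>' and return the internal URL that would be called."""
    token, _, endpoint = path_extension.partition("/")
    if token not in BROKER_ORIGINS:
        raise KeyError(f"no broker connection named {token!r}")
    return f"{BROKER_ORIGINS[token]}/{endpoint}"


url = resolve_broker_path("my-broker-token/api/get-my-metrics")
print(url)  # http://metrics-server.internal.our-company.com/api/get-my-metrics
```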
