Commit
Excel extractor header detection update (#962)
* feat:header detection update
* fix for failing check

Co-authored-by: Ashley Mulligan <[email protected]>
1 parent 6006567, commit a9f4dde
Showing 1 changed file with 67 additions and 3 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -42,16 +42,27 @@ icon: "download" | |
take the raw numbers and disregard how it's displayed in Excel. | ||
</ParamField> | ||
|
||
<ParamField path="options.chunkSize" default="10_000" type="number" optional> | ||
<ParamField path="chunkSize" default="10_000" type="number" optional> | ||
The `chunkSize` parameter allows you to specify the number of records to include in | ||
each chunk. | ||
</ParamField> | ||
|
||
<ParamField path="options.parallel" default="1" type="number" optional> | ||
<ParamField path="parallel" default="1" type="number" optional> | ||
The `parallel` parameter allows you to specify the number of chunks to process | ||
in parallel. | ||
</ParamField> | ||
|
||
<ParamField path="headerDetectionOptions" type="Object" optional> | ||
The `headerDetectionOptions` parameter allows you to specify the options for | ||
detecting headers in the file. By default, the first 10 rows are scanned for | ||
the row with the most non-empty cells. | ||
</ParamField> | ||
|
||
<ParamField path="debug" default="false" type="boolean" optional> | ||
The `debug` parameter lets you toggle on/off helpful debugging messages for | ||
development purposes. | ||
</ParamField> | ||
|
||
## API Calls | ||
|
||
- `api.files.download` | ||
|
@@ -70,7 +81,7 @@ icon: "download" | |
- [`@flatfile/[email protected]+`](https://npmjs.com/package/@flatfile/api) | ||
- [`@flatfile/[email protected]+`](https://npmjs.com/package/@flatfile/hooks) | ||
- [`@flatfile/[email protected]`](https://npmjs.com/package/@flatfile/listener) | ||
- [`@flatfile/[email protected]`](../utils/extractor) provides utility functions for extracting and parsing data from various file formats and sources, streamlining data import processes. | ||
- [`@flatfile/[email protected]`](https://npmjs.com/package/@flatfile/util-extractor) provides utility functions for extracting and parsing data from various file formats and sources, streamlining data import processes. | ||
- [`remeda`](https://remedajs.com/) offers a set of utility functions for functional programming and data manipulation in JavaScript, providing a convenient way to work with arrays and objects. | ||
- [`xlsx`](https://sheetjs.com/) allows for reading, writing, and manipulating Microsoft Excel files in JavaScript applications. | ||
|
||
|
@@ -98,6 +109,59 @@ listener.use(ExcelExtractor({ raw: true, rawNumbers: true })); | |
|
||
</CodeGroup> | ||
|
||
### Header Detection | ||
|
||
Three header detection algorithms are available: `default`, `explicitHeaders`, and `specificRows`. By default, the first 10 rows are scanned for the row with the most non-empty cells, and that row is used as the header row. | ||
|
||
#### Default | ||
|
||
The `default` algorithm looks at the first `rowsToSearch` rows and takes the row
with the most non-empty cells as the header, preferring the earliest
such row in the case of a tie. | ||
|
||
```js | ||
listener.use(ExcelExtractor()); | ||
// or... | ||
listener.use( | ||
ExcelExtractor({ | ||
headerDetectionOptions: { | ||
algorithm: "default", | ||
rowsToSearch: 30, // Default is 10 | ||
}, | ||
}) | ||
); | ||
``` | ||
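
The heuristic described above can be sketched in plain JavaScript. This is only an illustration of the stated rule (most non-empty cells, earliest row wins a tie), not the plugin's actual source. The function name `pickHeaderRow`, the sample data, and the definition of "non-empty" (not `null`, not `undefined`, not whitespace-only) are assumptions made for this example.

```js
// Hypothetical helper for illustration; not part of the plugin's API.
// Returns the index of the row with the most non-empty cells among the
// first `rowsToSearch` rows. Ties go to the earliest row.
function pickHeaderRow(rows, rowsToSearch = 10) {
  let bestIndex = 0;
  let bestCount = -1;
  const limit = Math.min(rowsToSearch, rows.length);

  for (let i = 0; i < limit; i++) {
    // Assumption: a cell is "non-empty" if it is not null/undefined
    // and is not a blank or whitespace-only string.
    const count = rows[i].filter(
      (cell) =>
        cell !== null && cell !== undefined && String(cell).trim() !== ""
    ).length;

    // Strict ">" keeps the earliest row when two rows have equal counts.
    if (count > bestCount) {
      bestCount = count;
      bestIndex = i;
    }
  }
  return bestIndex;
}

const sampleRows = [
  ["Quarterly report", "", ""],
  ["first", "last", "email"],
  ["Ada", "Lovelace", "ada@example.com"],
];
console.log(pickHeaderRow(sampleRows)); // 1
```

Rows 1 and 2 both have three non-empty cells, so the earlier one is chosen. Raising `rowsToSearch` only matters when the real header may sit below the tenth row, for example under a long title block.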
|
||
#### Explicit Headers | ||
|
||
This implementation skips detection and returns the explicit list of headers it was given. | ||
|
||
```js | ||
listener.use( | ||
ExcelExtractor({ | ||
headerDetectionOptions: { | ||
algorithm: "explicitHeaders", | ||
headers: ["fiRsT NamE", "LaSt nAme", "emAil"], | ||
}, | ||
}) | ||
); | ||
``` | ||
|
||
#### Specific Rows | ||
|
||
This implementation looks at specific rows and combines them into a single header. For example, if you knew that the header was in the third row, you could pass it `{ rowNumbers: [2] }`. | ||
|
||
```js | ||
listener.use( | ||
ExcelExtractor({ | ||
headerDetectionOptions: { | ||
algorithm: "specificRows", | ||
rowNumbers: [2], // 0 based | ||
}, | ||
}) | ||
); | ||
``` | ||
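
To make "combines them into a single header" concrete, here is a minimal sketch of one way several rows could be merged column by column. The source does not say how the plugin merges cells, so the space-joined result, the function name `combineHeaderRows`, and the sample data are all assumptions for illustration. The plugin's real merge rule may differ.

```js
// Hypothetical helper for illustration; the plugin's real merge rule may differ.
// Merges the rows at the given 0-based indexes into one header row by
// joining the non-empty cells of each column with a space (an assumption).
function combineHeaderRows(rows, rowNumbers) {
  const selected = rowNumbers.map((n) => rows[n] ?? []);
  const width = Math.max(0, ...selected.map((row) => row.length));
  const header = [];

  for (let col = 0; col < width; col++) {
    const parts = selected
      .map((row) => row[col])
      .filter(
        (cell) =>
          cell !== null && cell !== undefined && String(cell).trim() !== ""
      )
      .map((cell) => String(cell).trim());
    header.push(parts.join(" "));
  }
  return header;
}

const twoLineHeader = [
  ["Name", "Name", "Contact"],
  ["First", "Last", "Email"],
  ["Ada", "Lovelace", "ada@example.com"],
];
console.log(combineHeaderRows(twoLineHeader, [0, 1]));
// [ 'Name First', 'Name Last', 'Contact Email' ]
```

With a single index such as `[1]`, the sketch returns that row unchanged, which matches the single-row case shown in the configuration above.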
|
||
### Full Example | ||
|
||
In this example, the `ExcelExtractor` is initialized with its optional settings and then registered as middleware with the Flatfile listener. When an Excel file is uploaded, the plugin extracts the structured data and processes it using the extractor's parser. | ||
|