Init JSON multi-sheet extractor plugin #693
function isEmpty(obj) {
  for (const prop in obj) {
    if (Object.prototype.hasOwnProperty.call(obj, prop)) {
      return false;
    }
  }

  return true;
}
Would something like `return Object.keys(obj).length === 0;` be equivalent?
Turns out that this is much faster than using `Object.keys`: https://stackoverflow.com/questions/679915/how-do-i-test-for-an-empty-javascript-object
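For reference, a minimal sketch of the two variants discussed in this thread. The names `isEmptyLoop` and `isEmptyKeys` are made up for illustration and are not in the PR. For the sample inputs below, both variants return the same result; the loop can return on the first own key it finds, whereas `Object.keys` builds the full key array first. No performance measurement is attempted here.

```typescript
// Loop-based check from the PR: returns as soon as one own enumerable key is found.
function isEmptyLoop(obj: object): boolean {
  for (const prop in obj) {
    if (Object.prototype.hasOwnProperty.call(obj, prop)) {
      return false
    }
  }
  return true
}

// Suggested one-liner: collects all own enumerable keys, then checks the count.
function isEmptyKeys(obj: object): boolean {
  return Object.keys(obj).length === 0
}

// An object whose only property is inherited; both variants ignore inherited keys.
const inheritedOnly: object = Object.create({ inherited: true })

const samples: object[] = [{}, { a: 1 }, inheritedOnly, []]

// True when both variants give the same answer for every sample.
const agree = samples.every((s) => isEmptyLoop(s) === isEmptyKeys(s))
```

Whether the speed difference matters depends on how large the checked objects are; for small sheet captures either form is likely fine.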
@@ -0,0 +1,78 @@
<!-- START_INFOCARD -->

The `@flatfile/json-multisheet-extractor` plugin parses a JSON file and extracts second-level nested objects into first level defined Sheets in Flatfile.
Would be good to put a concrete JSON example here with some example keys that can be referred to so it's clear what this means to someone who is less of a power user.
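One possible shape for such an example, as a rough sketch only. It assumes that each top-level key names a sheet and holds an array of row objects, and that nested fields become dot-separated headers, which is what the `parseBuffer(buffer).contacts.headers` test in this PR suggests. The key names, `flatten`, and `sketchHeaders` are hypothetical and are not the plugin's implementation.

```typescript
type Row = Record<string, unknown>

// Hypothetical input; the keys and values are illustrative only.
const input = {
  contacts: [{ 'First Name': 'Ada', Address: { City: 'London' } }],
  orders: [{ ID: '1234', Amount: 5678 }],
}

// Flatten nested plain objects into dot-separated keys.
function flatten(obj: Row, parent = '', out: Row = {}): Row {
  for (const key of Object.keys(obj)) {
    const name = parent ? `${parent}.${key}` : key
    const value = obj[key]
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      flatten(value as Row, name, out)
    } else {
      out[name] = value
    }
  }
  return out
}

// Map each top-level key (assumed sheet name) to its flattened column headers.
function sketchHeaders(data: Record<string, Row[]>): Record<string, string[]> {
  const result: Record<string, string[]> = {}
  for (const sheetName of Object.keys(data)) {
    const headers = new Set<string>()
    for (const row of data[sheetName]) {
      Object.keys(flatten(row)).forEach((h) => headers.add(h))
    }
    result[sheetName] = Array.from(headers)
  }
  return result
}

const headersBySheet = sketchHeaders(input)
```

Under these assumptions, `contacts` and `orders` would each become a sheet, with `Address.City` appearing as a single column in the `contacts` sheet.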
// When a JSON file is uploaded, the data will be extracted and processed using the extractor's parser.
```

See a working example in our [flatfile-docs-kitchen-sink](https://github.com/FlatFilers/flatfile-docs-kitchen-sink/blob/main/typescript/extractors/index.ts) Github repo.
Is this link used in all plugin docs? Should we include a plugin-specific full example in this repo?
I think so. I note as much in our convo here: https://aprimetechnology.slack.com/archives/C07T66YBDHB/p1730404293554049, but I'd like maintainer feedback as those examples don't really do as much as the README here (especially after I add the JSON example you suggested)
  "name": "@flatfile/plugin-json-multisheet-extractor",
  "version": "0.1.0",
  "url": "https://github.com/FlatFilers/flatfile-plugins/tree/main/plugins/json-multisheet-extractor",
  "description": "A plugin for parsing json files into multiple sheets in Flatfile.",
Minor nit: JSON
    "build:watch": "tsup --watch",
    "build:prod": "NODE_ENV=production tsup",
    "checks": "tsc --noEmit && attw --pack . && publint .",
    "lint": "tsc --noEmit",
If we want to be good citizens (doesn't have to be in this PR) we could introduce a pattern for `eslint` with a fix script as well.
Walkthrough
The changes in this pull request involve updates to two Flatfile plugins.
Actionable comments posted: 13
🧹 Outside diff range and nitpick comments (11)
plugins/json-multisheet-extractor/src/index.ts (2)
4-10: Add JSDoc documentation and improve type safety.
Consider adding documentation and type improvements:
- Add JSDoc to describe the function's purpose and options
- Add validation or constraints for numeric options
- Explicitly type the return value
Here's a suggested improvement:
+/**
+ * Creates a JSON multi-sheet extractor with the specified configuration.
+ * @param options Configuration options for the extractor
+ * @param options.chunkSize Size of data chunks to process (must be positive)
+ * @param options.parallel Number of parallel operations (must be positive)
+ * @param options.debug Enable debug logging
+ * @returns Configured file extractor for JSON files
+ */
 export const JSONMultiSheetExtractor = (options?: {
-  chunkSize?: number
-  parallel?: number
+  chunkSize?: number & { __brand: 'PositiveNumber' }
+  parallel?: number & { __brand: 'PositiveNumber' }
   debug?: boolean
-}) => {
+}): ReturnType<typeof Extractor> => {
+  if (options?.chunkSize && options.chunkSize <= 0) {
+    throw new Error('chunkSize must be positive')
+  }
+  if (options?.parallel && options.parallel <= 0) {
+    throw new Error('parallel must be positive')
+  }
   return Extractor('.json', 'json', parseBuffer, options)
 }
12-12: Add documentation for the exported parser.
Consider adding JSDoc to explain the purpose of this export and its relationship to parseBuffer.
+/** Exported parser function for processing JSON buffers into multi-sheet format */
 export const jsonParser = parseBuffer
plugins/json-multisheet-extractor/jest.config.cjs (1)
1-16: LGTM! Consider adding documentation for configuration choices.
The Jest configuration is well-structured and follows common practices. However, it would be helpful to document the rationale for some of the configuration choices, especially the longer timeout and force exit settings.
Add a comment block at the top of the file explaining the configuration choices:
+/**
+ * Jest configuration for JSON multi-sheet extractor plugin
+ *
+ * - 60s timeout: Allows for potential network delays in integration tests
+ * - Force exit: Ensures clean CI pipeline execution
+ * - Pass with no tests: Enables gradual test implementation
+ */
 module.exports = {
   testEnvironment: 'node',
   // ... rest of the config
plugins/json-multisheet-extractor/src/parser.ts (2)
1-4: Remove extra blank line
There's an unnecessary extra blank line between imports and the function declaration.
 import { WorkbookCapture, parseSheet } from '@flatfile/util-extractor'
-
 export function parseBuffer(buffer: Buffer): WorkbookCapture {
1-42: Consider architectural improvements for scalability
The current implementation might face challenges with large datasets:
- Loading entire file into memory
- No progress reporting for large files
- All-or-nothing processing approach
Consider these architectural improvements:
- Stream processing for large files
- Progress callback for processing status
- Option to return partial results on sheet failures
Would you like assistance in implementing any of these improvements?
plugins/json-multisheet-extractor/README.md (4)
1-11: Enhance the info card with more specific details.
Consider adding:
- An example JSON structure showing what "second-level nested objects" means
- Any limitations or requirements for the JSON structure
Example addition:
 The `@flatfile/json-multisheet-extractor` plugin parses a JSON file and extracts second-level nested objects into first level defined Sheets in Flatfile.
+
+For example, given this JSON structure:
+```json
+{
+  "orders": {
+    "items": [
+      { "id": 1, "name": "Product A" }
+    ]
+  }
+}
+```
+The plugin will create a Sheet for "items" with the nested array data.
28-40: Enhance API calls documentation with context and purpose.
The API calls section would be more helpful if it explained when and why each API is used during the extraction process.
Consider restructuring like this:
 ## API Calls
-List of APIs used:
-
-- `api.files.download`
-- `api.files.get`
+The plugin uses these APIs during different stages of processing:
+
+### File Processing
+- `api.files.download` - Downloads the JSON file for processing
+- `api.files.get` - Retrieves file metadata
+- `api.files.update` - Updates file status during processing
+
+### Job Management
+- `api.jobs.create` - Initiates the extraction job
+- `api.jobs.update` - Updates job progress
+- `api.jobs.complete` - Marks successful completion
+- `api.jobs.fail` - Handles extraction failures
+- `api.jobs.ack` - Acknowledges job receipt
+
+### Data Processing
+- `api.records.insert` - Inserts extracted records into sheets
+- `api.workbooks.create` - Creates workbooks for extracted data
43-77: Expand usage examples with configuration and error handling.
The usage section would benefit from additional examples showing:
- TypeScript usage with type definitions
- How to configure optional parameters
- Error handling scenarios
Consider adding:
 ### Full Example
+### TypeScript Usage
+```typescript
+import { JSONMultiSheetExtractor, ExtractorOptions } from "@flatfile/plugin-json-multisheet-extractor";
+
+const options: ExtractorOptions = {
+  chunkSize: 5000,
+  parallel: 2
+};
+
+const jsonMultiSheetExtractor = JSONMultiSheetExtractor(options);
+
+// Error handling
+listener.on("error", async (error) => {
+  console.error("Extraction failed:", error);
+  // Handle the error appropriately
+});
+```
77-78: Consider adding essential documentation sections.
The README would benefit from additional sections:
- Troubleshooting Guide - Common issues and solutions
- JSON Schema Requirements - Expected structure and validation
- Rate Limiting - Performance considerations and limits
Would you like me to help draft these additional sections?
plugins/json-multisheet-extractor/src/parser.spec.ts (2)
6-8: Add error handling for test file reading.
Consider adding error handling for the file read operation to provide better debugging information if the test file is missing.
-  const buffer: Buffer = fs.readFileSync(
-    path.join(__dirname, '../ref/test-basic.json')
-  )
+  const testFilePath = path.join(__dirname, '../ref/test-basic.json')
+  let buffer: Buffer
+  try {
+    buffer = fs.readFileSync(testFilePath)
+  } catch (error) {
+    throw new Error(`Failed to read test file at ${testFilePath}: ${error.message}`)
+  }
10-138: Consider enhancing test coverage with edge cases.
While the test is comprehensive for the happy path, consider adding test cases for:
- Empty arrays
- Missing optional fields
- Malformed JSON
- Different data types (null, undefined, numbers, booleans)
- Special characters in field names
Also, consider extracting repeated test data into shared constants:
const FATHER_HIERARCHY = {
  'Father.First Name': { value: 'Father_First_1' },
  'Father.Last Name': { value: 'Father_Last_1' },
  // ... rest of the hierarchy
}
const COORDINATES = {
  'Address.Coordinates.Latitude': { value: '40.7128° N' },
  'Address.Coordinates.Longitude': { value: '74.0060° W' }
}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (4)
- plugins/json-extractor/package.json is excluded by !**/*.json
- plugins/json-multisheet-extractor/package.json is excluded by !**/*.json
- plugins/json-multisheet-extractor/ref/test-basic.json is excluded by !**/*.json
- utils/extractor/package.json is excluded by !**/*.json
📒 Files selected for processing (10)
- .changeset/slow-bottles-roll.md (1 hunks)
- plugins/json-extractor/src/parser.ts (2 hunks)
- plugins/json-multisheet-extractor/CHANGELOG.md (1 hunks)
- plugins/json-multisheet-extractor/README.md (1 hunks)
- plugins/json-multisheet-extractor/jest.config.cjs (1 hunks)
- plugins/json-multisheet-extractor/src/index.ts (1 hunks)
- plugins/json-multisheet-extractor/src/parser.spec.ts (1 hunks)
- plugins/json-multisheet-extractor/src/parser.ts (1 hunks)
- plugins/json-multisheet-extractor/tsup.config.mjs (1 hunks)
- utils/extractor/src/index.ts (1 hunks)
✅ Files skipped from review due to trivial changes (3)
- .changeset/slow-bottles-roll.md
- plugins/json-multisheet-extractor/CHANGELOG.md
- plugins/json-multisheet-extractor/tsup.config.mjs
🧰 Additional context used
🪛 Markdownlint
plugins/json-multisheet-extractor/README.md
19-19: Expected: h3; Actual: h4
Heading levels should only increment by one level at a time
(MD001, heading-increment)
🔇 Additional comments (8)
plugins/json-multisheet-extractor/src/index.ts (2)
1-2: LGTM! Clean and focused imports.
The imports are well-structured, importing only the necessary components.
2-2: Verify parser implementation and test coverage.
Let's verify the existence and implementation of the parser module.
✅ Verification successful
Parser implementation and test coverage verified successfully
The parser implementation is well-structured and properly tested:
- Parser correctly implements JSON parsing with error handling
- Comprehensive test coverage includes:
- Full workbook capture verification
- Header validation for multiple sheets (contacts, orders)
- Complex nested data structure handling
- Edge case handling for invalid inputs
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify parser implementation and test coverage
# Check if parser file exists and show its content
echo "Checking parser implementation..."
cat plugins/json-multisheet-extractor/src/parser.ts || echo "parser.ts not found!"
# Check if tests exist
echo -e "\nChecking test coverage..."
cat plugins/json-multisheet-extractor/src/parser.spec.ts || echo "parser.spec.ts not found!"
Length of output: 7926
plugins/json-extractor/src/parser.ts (2)
1-3: LGTM! Good use of shared utilities.
The import of `parseSheet` from `@flatfile/util-extractor` promotes code reuse and maintainability.
18-24: 🛠️ Refactor suggestion
Consider improving type safety.
The type assertion `as WorkbookCapture` is used without validation. Consider adding runtime type checking or using type guards.
Let's verify the shape of WorkbookCapture type:
Consider adding type validation:
interface SheetData {
  [key: string]: unknown;
}
function isWorkbookCapture(data: unknown): data is WorkbookCapture {
  return typeof data === 'object' && data !== null;
}
✅ Verification successful
Let's analyze the type definition and its usage to provide a more accurate verification of the type safety concern.
From the search results, I can see that `WorkbookCapture` is defined in utils/extractor/src/index.ts as:
export type WorkbookCapture = Record<string, SheetCapture>
Let's verify the SheetCapture type to complete our analysis:
Type assertions are appropriate in this context
The type assertions `as WorkbookCapture` are safe in this case because:
- `WorkbookCapture` is defined as `Record<string, SheetCapture>`, where `SheetCapture` has a well-defined structure
- The empty object `{}` is a valid `WorkbookCapture` as it satisfies the type definition
- The object literal `{ [sheetName]: sheetCapture }` is also valid, as `sheetCapture` is created by the `parseSheet` function, which ensures the correct structure
The type assertions are used in three scenarios:
- Returning empty object for invalid file contents
- Returning empty object when sheet capture is empty
- Returning a properly structured object with sheet data
All these cases align with the type definition and runtime structure.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for WorkbookCapture type definition
ast-grep --pattern 'type WorkbookCapture = {
$$$
}'
Length of output: 55
Script:
#!/bin/bash
# Let's try different patterns to find WorkbookCapture type definition
ast-grep --pattern 'type WorkbookCapture'
# Also search for interface definition
ast-grep --pattern 'interface WorkbookCapture'
# Let's also check the imports to see if it's imported from another file
rg "WorkbookCapture" -A 2 -B 2
# Search for any type aliases or interfaces containing WorkbookCapture
ast-grep --pattern '$_ WorkbookCapture'
Length of output: 11384
Script:
#!/bin/bash
# Search for SheetCapture type definition
ast-grep --pattern 'type SheetCapture'
# Also check for interface definition
ast-grep --pattern 'interface SheetCapture'
# Look for any usage or references
rg "SheetCapture" -A 3 -B 3
Length of output: 5302
plugins/json-multisheet-extractor/src/parser.spec.ts (2)
5-170: Verify test coverage for parser functionality.
The test suite focuses on successful parsing scenarios. Let's verify the coverage:
#!/bin/bash
# Look for error handling in parser implementation
rg -A 5 "throw|catch|error" src/parser.ts
# Look for additional test files
fd "parser.*\.spec\.(ts|js)" plugins/json-multisheet-extractor/src/
# Look for edge cases in existing tests
rg -A 5 "describe|it|test" plugins/json-multisheet-extractor/src/ --type ts
163-169: Ensure consistent type handling for Amount field.
The Amount field uses number values (5678, 5432) but is handled as a string value in the workbook capture. Consider:
- Documenting the expected type behavior
- Adding type validation in the parser
- Adding test cases for different numeric formats (decimals, scientific notation)
utils/extractor/src/index.ts (2)
296-296: LGTM! Well-structured type definitions.
The `WorkbookCapture` and `SheetCapture` types are clearly defined and properly document the expected data structures.
297-345: Verify integration with existing workbook creation flow.
The new parseSheet function is used by the JSON multi-sheet extractor plugin. Let's verify its integration with the existing workbook creation flow.
✅ Verification successful
Integration with workbook creation flow is properly implemented
The verification shows that `parseSheet` is correctly integrated in both JSON extractors:
- In json-multisheet-extractor: Processes multiple sheets, handles errors per sheet, and returns a WorkbookCapture
- In json-extractor: Processes a single sheet with proper error handling and empty state checks
- Both implementations properly type the return value as WorkbookCapture
- Error handling is implemented at both the sheet parsing and workbook creation levels
The function is being used as intended within the existing workbook creation flow with appropriate error boundaries and type safety.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the usage of parseSheet function and its integration
# Expected: Find calls to parseSheet in the plugin implementation
# Search for parseSheet usage
echo "Searching for parseSheet usage..."
rg "parseSheet" --type ts
# Search for potential error handling patterns
echo "Checking error handling patterns..."
rg "catch.*error.*parseSheet" --type ts
Length of output: 719
Script:
#!/bin/bash
# Let's examine the implementation details of how parseSheet is integrated
# Check the parser implementation in json-multisheet-extractor
echo "=== JSON Multi-sheet Extractor Implementation ==="
rg -A 10 "const result = parseSheet" plugins/json-multisheet-extractor/src/parser.ts
# Check the parser implementation in json-extractor
echo "=== JSON Extractor Implementation ==="
rg -A 10 "var sheetCapture = parseSheet" plugins/json-extractor/src/parser.ts
# Look for WorkbookCapture type usage to understand the data flow
echo "=== WorkbookCapture Usage ==="
ast-grep --pattern 'interface WorkbookCapture {
$$$
}'
Length of output: 1134
module.exports = {
  testEnvironment: 'node',

  transform: {
    '^.+\\.tsx?$': 'ts-jest',
  },
  setupFiles: ['../../test/dotenv-config.js'],
  setupFilesAfterEnv: [
    '../../test/betterConsoleLog.js',
    '../../test/unit.cleanup.js',
  ],
  testTimeout: 60_000,
  globalSetup: '../../test/setup-global.js',
  forceExit: true,
  passWithNoTests: true,
}
🛠️ Refactor suggestion
Consider adding test coverage configuration.
To ensure proper test coverage of the plugin's functionality, consider adding Jest coverage configuration.
Add coverage configuration to enforce minimum coverage thresholds:
module.exports = {
testEnvironment: 'node',
+ collectCoverage: true,
+ coverageDirectory: 'coverage',
+ coverageThreshold: {
+ global: {
+ branches: 80,
+ functions: 80,
+ lines: 80,
+ statements: 80
+ }
+ },
// ... rest of the config
Committable suggestion skipped: line range outside the PR's diff.
plugins/json-extractor/src/parser.ts
Outdated
    const filteredResults = results.filter(
      (item) => typeof item === 'object' && item !== null
    )
    var sheetCapture = parseSheet(results);
Use `const` instead of `var`.
Replace `var` with `const` to prevent accidental reassignment and follow modern JavaScript/TypeScript best practices.
- var sheetCapture = parseSheet(results);
+ const sheetCapture = parseSheet(results);
    } as WorkbookCapture
  } catch (error) {
    console.error('An error occurred:', error)
    throw error
  }
🛠️ Refactor suggestion
Enhance error handling specificity.
The current error handling catches all errors and logs them before rethrowing. Consider handling specific error types differently (e.g., JSON parsing errors vs sheet processing errors).
- } catch (error) {
- console.error('An error occurred:', error)
- throw error
+ } catch (error) {
+ if (error instanceof SyntaxError) {
+ console.error('JSON parsing error:', error)
+ throw new Error('Invalid JSON format')
+ }
+ console.error('Sheet processing error:', error)
+ throw error
}
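The distinction suggested above could also be made before any sheet processing starts. This is a minimal sketch, assuming a helper named `safeParseObject` and a `ParseResult` shape that are both hypothetical and not part of the PR. It relies only on the fact that `JSON.parse` throws a `SyntaxError` on malformed input.

```typescript
// Hypothetical result type; the names are illustrative, not from the PR.
type ParseResult =
  | { ok: true; data: Record<string, unknown> }
  | { ok: false; reason: 'empty' | 'invalid-json' | 'not-an-object' }

function safeParseObject(fileContents: string): ParseResult {
  if (!fileContents) {
    return { ok: false, reason: 'empty' }
  }

  let parsed: unknown
  try {
    parsed = JSON.parse(fileContents)
  } catch (error) {
    // Malformed JSON surfaces as a SyntaxError; anything else is rethrown.
    if (error instanceof SyntaxError) {
      return { ok: false, reason: 'invalid-json' }
    }
    throw error
  }

  // Primitives and null are valid JSON but cannot be split into sheets.
  if (typeof parsed !== 'object' || parsed === null) {
    return { ok: false, reason: 'not-an-object' }
  }
  return { ok: true, data: parsed as Record<string, unknown> }
}

const malformed = safeParseObject('{bad')
const valid = safeParseObject('{"contacts": []}')
```

Note that a top-level JSON array also passes the object check in this sketch; whether that should be rejected is left to the parser's requirements.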
function isEmpty(obj) {
  for (const prop in obj) {
    if (Object.prototype.hasOwnProperty.call(obj, prop)) {
      return false;
    }
  }

  return true;
}
🛠️ Refactor suggestion
Consider enhancing the isEmpty utility function.
While the implementation is correct, there are several improvements that could be made:
- Add TypeScript type annotations
- Use a more concise implementation
- Add JSDoc documentation
-function isEmpty(obj) {
- for (const prop in obj) {
- if (Object.prototype.hasOwnProperty.call(obj, prop)) {
- return false;
- }
- }
-
- return true;
-}
+/**
+ * Checks if an object has any own enumerable properties
+ * @param obj - The object to check
+ * @returns true if the object is empty, false otherwise
+ */
+function isEmpty(obj: Record<string, unknown>): boolean {
+ return Object.keys(obj).length === 0;
+}
Addressed above: #693 (comment)
@amarbakir, glad to hear that you've addressed the issue! Thank you for your prompt action.
    if (!fileContents) {
      console.log('Invalid file contents')
      return {} as WorkbookCapture
    }

    const parsedData = JSON.parse(fileContents)
    if (typeof parsedData !== 'object' || parsedData === null) {
      console.error("Invalid input: data must be an object.");
      return {};
    }
Improve error handling consistency and type safety
Several issues in the validation logic:
- Inconsistent return types between empty file and invalid JSON cases
- Inconsistent logging levels (log vs error)
- Missing type guard for parsedData
Apply these improvements:
if (!fileContents) {
- console.log('Invalid file contents')
- return {} as WorkbookCapture
+ console.error('Invalid file contents')
+ return {} as WorkbookCapture
}
- const parsedData = JSON.parse(fileContents)
+ const parsedData: unknown = JSON.parse(fileContents)
if (typeof parsedData !== 'object' || parsedData === null) {
console.error("Invalid input: data must be an object.");
- return {};
+ return {} as WorkbookCapture;
}
+1
@PFacheris, glad to hear you agree!
  it('has contact headers', () => {
    const headers = parseBuffer(buffer).contacts.headers
    expect(headers).toEqual([
      'First Name',
      'Last Name',
      'Email',
      'Address.Street',
      'Address.City',
      'Address.State',
      'Address.Zip',
      'Address.Coordinates.Latitude',
      'Address.Coordinates.Longitude',
      'Father.First Name',
      'Father.Last Name',
      'Father.Father.First Name',
      'Father.Father.Last Name',
      'Father.Father.Father.First Name',
      'Father.Father.Father.Last Name',
      'Father.Father.Father.Father.First Name',
      'Father.Father.Father.Father.Last Name',
      'Father.Father.Father.Father.Father.First Name',
      'Father.Father.Father.Father.Father.Last Name',
    ])
  })
🛠️ Refactor suggestion
Reduce test duplication and consider header depth limits.
- Extract headers into a shared constant to avoid duplication with the main test.
- Consider documenting or validating the maximum supported nesting depth for headers.
const CONTACT_HEADERS = [
'First Name',
'Last Name',
'Email',
// ... rest of the headers
]
it('has contact headers', () => {
const headers = parseBuffer(buffer).contacts.headers
expect(headers).toEqual(CONTACT_HEADERS)
})
  } catch (error) {
    console.error('An error occurred:', error)
    throw error
  }
Improve error handling to avoid exposing sensitive data.
The current error handling logs the entire error object, which might contain sensitive information.
Consider this improved error handling:
} catch (error) {
- console.error('An error occurred:', error)
+ console.error('Error parsing JSON sheet:', error instanceof Error ? error.message : 'Unknown error')
throw error
}
export function parseSheet(jsonArray: any[]): SheetCapture {
  try {
    // Ensure all items are objects
    const filteredResults = jsonArray.filter(
      (item) => typeof item === 'object' && item !== null
    )

    if (filteredResults.length === 0) {
      return {} as SheetCapture
    }
Add depth limit to prevent stack overflow attacks.
The function accepts arbitrary JSON objects without depth limits, which could lead to stack overflow attacks with deeply nested objects.
Add a depth limit parameter and check:
-export function parseSheet(jsonArray: any[]): SheetCapture {
+export function parseSheet(jsonArray: any[], maxDepth: number = 10): SheetCapture {
try {
// Ensure all items are objects
const filteredResults = jsonArray.filter(
(item) => typeof item === 'object' && item !== null
)
if (filteredResults.length === 0) {
- return {} as SheetCapture
+ return {
+ headers: [],
+ data: []
+ } as SheetCapture
Committable suggestion skipped: line range outside the PR's diff.
utils/extractor/src/index.ts
Outdated
// Custom flatten function
const flattenObject = (obj: any, parent: string = '', res: any = {}) => {
  for (let key in obj) {
    const propName = parent ? parent + '.' + key : key
    if (typeof obj[key] === 'object') {
      flattenObject(obj[key], propName, res)
    } else {
      res[propName] = obj[key]
    }
  }
  return res
}
🛠️ Refactor suggestion
Improve flattenObject implementation for better type safety and performance.
The current implementation could be improved to handle edge cases and provide better type safety.
Consider this improved implementation:
- const flattenObject = (obj: any, parent: string = '', res: any = {}) => {
+ const flattenObject = (
+ obj: Record<string, unknown>,
+ parent: string = '',
+ res: Record<string, unknown> = {},
+ depth: number = 0
+ ): Record<string, unknown> => {
+ if (depth >= maxDepth) {
+ res[parent] = JSON.stringify(obj);
+ return res;
+ }
for (let key in obj) {
- const propName = parent ? parent + '.' + key : key
- if (typeof obj[key] === 'object') {
- flattenObject(obj[key], propName, res)
- } else {
- res[propName] = obj[key]
+ if (Object.prototype.hasOwnProperty.call(obj, key)) {
+ const propName = parent ? `${parent}.${key}` : key;
+ const value = obj[key];
+
+ if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
+ flattenObject(value as Record<string, unknown>, propName, res, depth + 1);
+ } else {
+ res[propName] = value;
+ }
}
}
return res
}
Committable suggestion skipped: line range outside the PR's diff.
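For reference, a self-contained sketch of the depth-limited flatten described above. The name `flattenWithLimit`, its parameter order, and the choice to serialize the remaining subtree with `JSON.stringify` once the limit is reached are assumptions for illustration, not code from this PR.

```typescript
type Flat = Record<string, unknown>

// Hypothetical helper (not part of the PR): flattens nested plain objects into
// dot-separated keys and stops recursing once `maxDepth` levels are expanded.
function flattenWithLimit(
  obj: Record<string, unknown>,
  maxDepth: number = 10,
  parent: string = '',
  depth: number = 0,
  res: Flat = {}
): Flat {
  // Object.keys only returns own enumerable keys, so no hasOwnProperty check is needed
  for (const key of Object.keys(obj)) {
    const propName = parent ? `${parent}.${key}` : key
    const value = obj[key]
    const isPlainObject =
      value !== null && typeof value === 'object' && !Array.isArray(value)

    if (!isPlainObject) {
      // Primitives, null, and arrays are stored as-is
      res[propName] = value
    } else if (depth + 1 >= maxDepth) {
      // Depth budget exhausted: keep the subtree as a JSON string
      res[propName] = JSON.stringify(value)
    } else {
      flattenWithLimit(value as Record<string, unknown>, maxDepth, propName, depth + 1, res)
    }
  }
  return res
}
```

Note that `JSON.stringify` still walks the serialized subtree, so this bounds only the recursion of the flatten step itself.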
utils/extractor/src/index.ts
Outdated
// Flatten and filter all rows
const filteredData = filteredResults.map((row) => {
  const flattedRow = flattenObject(row)
  return headers.reduce((filteredRow, header) => {
    const cell = flattedRow[header]
    filteredRow[header] = {
      value: Array.isArray(cell) ? JSON.stringify(cell) : cell,
    }
    return filteredRow
  }, {})
})
Improve value handling for arrays and null values.
The current implementation could handle array values and null checks more robustly.
Consider this improved implementation:
const filteredData = filteredResults.map((row) => {
const flattedRow = flattenObject(row)
return headers.reduce((filteredRow, header) => {
const cell = flattedRow[header]
- filteredRow[header] = {
- value: Array.isArray(cell) ? JSON.stringify(cell) : cell,
- }
+ filteredRow[header] = {
+ value: cell === null ? null :
+ Array.isArray(cell) ? JSON.stringify(cell) :
+ typeof cell === 'object' ? JSON.stringify(cell) :
+ cell
+ }
return filteredRow
}, {})
})
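The cell handling above could also be pulled into a small helper, sketched below. `toCellValue` is a hypothetical name, not something defined in this PR. Because `typeof [] === 'object'` in JavaScript, a single object check covers arrays as well, so the separate `Array.isArray` branch may not be needed.

```typescript
// Hypothetical helper (not part of the PR): converts one flattened cell
// into a value that can be stored in a sheet cell.
function toCellValue(cell: unknown): unknown {
  // Pass null and undefined through unchanged
  if (cell === null || cell === undefined) return cell
  // Arrays are objects too, so this branch serializes both arrays and plain objects
  if (typeof cell === 'object') return JSON.stringify(cell)
  // Strings, numbers, and booleans are kept as-is
  return cell
}
```

With this in place, the reducer body would reduce to `filteredRow[header] = { value: toCellValue(cell) }`.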
Best practice is not having logic directly in the root level `index.ts` but just a series of `export * from "./otherFile"` or `export { OneThing } from "./otherFile"` statements. Let's move the `JSONMultiSheetExtractor` to another file.
This is the pattern used in this repo though. Shouldn't we follow the repo standard set by the authors? If we were to change this I'd suggest changing the standard across the repo. Example: https://github.com/FlatFilers/flatfile-plugins/blob/main/plugins/json-extractor/src/index.ts
If this is standard probably best to leave for now then.
console.log('Invalid file contents')
return {} as WorkbookCapture
Not sure if this is standard but `console.error` seems to make more sense here.
utils/extractor/src/index.ts
Outdated
  return {} as SheetCapture
}

// Custom flatten function
Slightly more descriptive comment would be good
}

// Custom flatten function
const flattenObject = (obj: any, parent: string = '', res: any = {}) => {
Personal preference is not mutating argument inputs but creating a copy of the input and returning the mutated version to avoid confusion.
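As a rough illustration of that preference, here is a sketch of a flatten that returns a new object on every call instead of writing into a shared `res` argument. `flattenPure` is a hypothetical name, and the handling of arrays (kept as-is) is an assumption carried over from the other suggestions in this thread.

```typescript
// Hypothetical non-mutating variant (not part of the PR): each call builds and
// returns a new object instead of writing into an accumulator argument.
function flattenPure(
  obj: Record<string, unknown>,
  parent: string = ''
): Record<string, unknown> {
  return Object.entries(obj).reduce<Record<string, unknown>>((acc, [key, value]) => {
    const propName = parent ? `${parent}.${key}` : key
    const isPlainObject =
      value !== null && typeof value === 'object' && !Array.isArray(value)
    // Merge into a fresh object on each step so neither `obj` nor `acc` is modified
    return isPlainObject
      ? { ...acc, ...flattenPure(value as Record<string, unknown>, propName) }
      : { ...acc, [propName]: value }
  }, {})
}
```

The trade-off is extra copying: spreading the accumulator on every key is slower than mutating one shared object, which may matter for very wide rows.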
utils/extractor/src/index.ts
Outdated
// Flatten and filter all rows
const filteredData = filteredResults.map((row) => {
  const flattedRow = flattenObject(row)
Nit: flattenedRow*
Please explain how to summarize this PR for the Changelog:
Tell code reviewer how and what to test: