
NEXT-37310 - Added single row import strategy on import error #35

Conversation

@CR0YD (Contributor) commented Aug 21, 2024

When encountering an error that we cannot handle (for now) during the import of a chunk, we currently return the whole error but skip the entire chunk.
I've added a new strategy where we try to import the chunk piece by piece to specifically filter out the rows that are invalid (the error is still returned to the user, cf. import.rs ll. 172 - 173).
There was also a small bug in remove_invalid_entries_from_chunk: if a row contained multiple invalid fields, not only the row itself but also some of the following rows were filtered (2 errors -> 1 extra row after the invalid one, and so on). To prevent that, I added a filter for duplicates in import.rs l. 255.
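The index-shift bug described above can be illustrated with a minimal sketch. The function name and types here are hypothetical; the real remove_invalid_entries_from_chunk in import.rs operates on the crate's own row type:

```rust
// Hypothetical sketch of the index-shift bug; not the crate's real code.
fn remove_invalid_entries(chunk: &mut Vec<String>, mut error_rows: Vec<usize>) {
    // A row with multiple invalid fields produces multiple errors that
    // point at the same index. Deduplicate first: otherwise every
    // duplicate index removes one additional row *after* the invalid
    // one, because earlier removals shift the remaining rows down.
    error_rows.sort_unstable();
    error_rows.dedup();
    // Remove from the back so the earlier indices stay valid.
    for &i in error_rows.iter().rev() {
        chunk.remove(i);
    }
}
```

Without the sort/dedup step, a row reported twice (two invalid fields) would also delete the valid row that slides into its position after the first removal.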

@LarsKemper (Member) left a comment

LGTM 👍

@MalteJanz (Contributor) left a comment

Great work 💪, I only added some small notes 🙂

src/data/import.rs (review thread, resolved)
CHANGELOG.md

```diff
@@ -1,4 +1,5 @@
 # NEXT-RELEASE
+- NEXT-37310 - Added single row import strategy when encountering an unhandleable error during a chunk import.
```

Review comment (Contributor) with suggested change:

```diff
-- NEXT-37310 - Added single row import strategy when encountering an unhandleable error during a chunk import.
+- NEXT-37310 - Added single row import strategy when encountering an error without a reference to a specific row during a chunk import.
```

@CR0YD (Author) replied:

I would not say that this formulation is better, as there are errors, e.g. a product with a duplicate product number, that reference a specific row but still exit the sync call with an error.
In my opinion it might be better to write something like "[...] when encountering an error that cannot be handled automatically during a chunk import".

Reply (Contributor):

> I would not say that this formulation is better

Of course I also wouldn't say it is better, just a suggestion my mind came up with. Feel free to discard it 🙂.

> as there are errors, e.g. a product with a duplicate product number, that reference a specific row but also exit the sync call with an error.

Does it? I thought that in case we have an "error pointer" we would just remove that invalid entry from the chunk and retry. Of course the user still sees the error.

And in the case where we don't have specific "error pointers", we fall back to a one-by-one import and report any errors encountered there, which still shouldn't fail the whole import.
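The one-by-one fallback described here can be sketched roughly as follows. The names and signatures are illustrative, under the assumption that a chunk import either succeeds or fails as a whole; this is not the crate's real API:

```rust
// Illustrative sketch of the single-row fallback strategy.
fn import_one_by_one<F>(rows: &[String], import: F) -> Vec<String>
where
    F: Fn(&[String]) -> Result<(), String>,
{
    let mut errors = Vec::new();
    for row in rows {
        // Retry each row alone so that only the truly invalid rows are
        // skipped; their errors are still collected and reported.
        if let Err(e) = import(std::slice::from_ref(row)) {
            errors.push(e);
        }
    }
    errors
}
```

Valid rows still get imported, and the whole import no longer fails just because one row in a chunk is rejected.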

@CR0YD (Author) commented Aug 28, 2024:

We only filter the rows that have a corresponding SwError::WriteError. But the error thrown when we have, for example, a duplicate product number is of type SwError::GenericError, so it won't be filtered by remove_invalid_entries_from_chunk.

And with my current implementation these faulty rows will still be skipped during the "single row import", as every row that triggers an automatically unhandleable error, e.g. the "duplicate product number" error (not deadlocks and the like!), will simply be ignored.
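A hypothetical mirror of that error split (the real SwError type in this repository may look different) shows why only WriteError rows can be filtered from the chunk directly:

```rust
// Hypothetical mirror of the error variants discussed above.
enum SwError {
    // Carries a pointer to the offending row, so a function like
    // remove_invalid_entries_from_chunk can filter that row out.
    WriteError { row: usize, message: String },
    // No row reference, e.g. a duplicate product number; errors like
    // this trigger the single-row fallback instead.
    GenericError(String),
}

// Collect only the row indices that the chunk filter can act on.
fn filterable_rows(errors: &[SwError]) -> Vec<usize> {
    errors
        .iter()
        .filter_map(|e| match e {
            SwError::WriteError { row, .. } => Some(*row),
            SwError::GenericError(_) => None,
        })
        .collect()
}
```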

src/data/import.rs (review thread, resolved)
@LarsKemper added the "bug" (Something isn't working) label on Aug 21, 2024
@CR0YD force-pushed the next-37310/added-single-import-strategy-on-unhandleable-error-on-import branch from 6e4a5ba to d41610a on August 28, 2024 at 12:05

Summary of the total line code coverage for the whole codebase:

| Total lines | Covered | Skipped | % (PR) | % (main) |
|---|---|---|---|---|
| 2154 | 1363 | 791 | 63.28 | 63.93 |

Summary of each file:

| File | Total lines | Covered | Skipped | % |
|---|---|---|---|---|
| src/api/filter.rs | 171 | 168 | 3 | 98.25 |
| src/api/mod.rs | 477 | 293 | 184 | 61.43 |
| src/cli.rs | 45 | 31 | 14 | 68.89 |
| src/config_file.rs | 76 | 56 | 20 | 73.68 |
| src/data/export.rs | 136 | 0 | 136 | 0.00 |
| src/data/import.rs | 184 | 0 | 184 | 0.00 |
| src/data/transform/mod.rs | 336 | 259 | 77 | 77.08 |
| src/data/transform/script.rs | 292 | 258 | 34 | 88.36 |
| src/data/validate.rs | 298 | 298 | 0 | 100.00 |
| src/main.rs | 139 | 0 | 139 | 0.00 |

Download full HTML report

You can download the full HTML report from this comment.
Hint: extract it locally and open index.html; there you can see which lines are not covered in each file.

You can also generate these reports locally

For that, you need to install cargo-llvm-cov, then you can run:

```shell
cargo llvm-cov --all-features --no-fail-fast --open
```

Hint: There are also other ways to see code coverage in Rust. For example with RustRover, you can execute tests with coverage generation directly in the IDE.

Remember

Your tests should be meaningful, not written just to raise the coverage.
Coverage is a tool to detect forgotten code paths you may want to think about, not an instructor telling you to write tests.

@CR0YD CR0YD merged commit 20940fd into main Aug 28, 2024
3 checks passed
@CR0YD CR0YD deleted the next-37310/added-single-import-strategy-on-unhandleable-error-on-import branch August 28, 2024 12:28
3 participants