Belgium addresses fail to parse correctly #63

Open
ericclaeren opened this issue May 18, 2021 · 5 comments
@ericclaeren
Contributor

Hi,

On a large production site, we run into many scenarios where submitting Belgian customers' data to Paazl goes wrong and customers don't receive their packages.

Some cases arise because people enter their full address, including postal code and city, in the street address lines.
We are trying to prevent this through better labeling and an address preview, but you can't win them all.

However, quite a few common Belgian address notations fail as well; the cases below are based on OrderTest::testParseAddress.

All are real examples (anonymized street names and numbers) where packages were returned by the carrier due to an incorrect address.

[
    ['street', '65', 'bus 101'],
    [
        'street' => 'street',
        'houseNumber' => '65',
        'houseNumberExtension' => 'bus 101'
    ]
],
[
    ['street', '65', 'bus 101, app 1'],
    [
        'street' => 'street',
        'houseNumber' => '65',
        'houseNumberExtension' => 'bus 101, app 1'
    ]
],
[
    ['street 5 bus 1'],
    [
        'street' => 'street',
        'houseNumber' => '5',
        'houseNumberExtension' => 'bus 1'
    ]
],
[
    ['street', '10 bus 5'],
    [
        'street' => 'street',
        'houseNumber' => '10',
        'houseNumberExtension' => 'bus 5'
    ]
],
[
    ['saint - street - 10'],
    [
        'street' => 'saint - street',
        'houseNumber' => '10',
        'houseNumberExtension' => ''
    ]
],
[
    ['king sir 1-laan 96B002'],
    [
        'street' => 'king sir 1-laan',
        'houseNumber' => '96',
        'houseNumberExtension' => 'B002'
    ]
],
[
    ['saint sir street', '37', 'Bus 1'],
    [
        'street' => 'saint sir street',
        'houseNumber' => '37',
        'houseNumberExtension' => 'Bus 1'
    ]
],
[
    ['street', '22A', '1A'],
    [
        'street' => 'street',
        'houseNumber' => '22A',
        'houseNumberExtension' => '1A'
    ]
],
[
    ['street 22A/1A'],
    [
        'street' => 'street',
        'houseNumber' => '22A',
        'houseNumberExtension' => '1A'
    ]
],

I have tried to create a one-regex-to-rule-them-all that matches all of these and failed miserably 😬.

Is there a way you could support more notations for Belgian addresses?

Cheers, Eric

@ericclaeren
Contributor Author

ericclaeren commented May 25, 2021

Please tell me you value the Belgians and don't leave them waiting on their missing packages ... 😉

This is my attempt at a better-working regex for a [street] [num] [addition] strategy. I haven't attempted a [num] [street] pattern, but I'm guessing yours will suffice.

~^(?<street>\D+)(?:[ ]+)(?<number>\d{1,}[[[:space:]]*]?\p{Pd}[[[:space:]]*]?\d{1,}(?:(?![[:punct:]]))|\d{1,}\w{1}(?=[ \p{Pd}\/\\\\]\d+.{1,})|\d{1,})(?:[ -\/\\\\]+)?(?<ext>.*?)$~u

This doesn't cover the following scenario:
king sir 1-laan 96B002
Otherwise it seems to work pretty well. You might want to look into it, and if you have any improvements, please let me know.

My full code:

// The u modifier enables Unicode matching, so letters such as the German ß or the Polish ć, ę, ł are handled.
$pattern = '~^(?<street>\D+)(?:[ ]+)(?<number>\d{1,}[[[:space:]]*]?\p{Pd}[[[:space:]]*]?\d{1,}(?:(?![[:punct:]]))|\d{1,}\w{1}(?=[ \p{Pd}\/\\\\]\d+.{1,})|\d{1,})(?:[ -\/\\\\]+)?(?<ext>.*?)$~u';
preg_match($pattern, $street, $matches);

// The spaces pattern turns 'a / b' into 'a/b' but leaves 'a bus a' untouched.
// It selects one or more spaces that are followed or preceded by a punctuation mark.
$spacesPattern = '~(?:[[:space:]]+(?=[[:punct:]])|(?<=[[:punct:]])[[:space:]]+)~';

// Fall back to empty strings when the pattern did not match at all.
return [
    'street' => trim($matches['street'] ?? ''),
    'houseNumber' => preg_replace($spacesPattern, '', trim($matches['number'] ?? '')),
    'houseNumberExtension' => preg_replace($spacesPattern, '', trim($matches['ext'] ?? '')),
];
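
For anyone who wants to try the snippet outside the module, here is a minimal self-contained sketch. The wrapper name `parseStreetLines` and the assumption that the address lines are joined with a single space before matching are illustrative only; neither is taken from the Paazl code.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical wrapper around the snippet above (the name is an assumption).
 * Assumes the address lines are joined with a single space before parsing.
 */
function parseStreetLines(array $lines): array
{
    $street = implode(' ', array_map('trim', $lines));

    // Same pattern as in the comment above, unchanged.
    $pattern = '~^(?<street>\D+)(?:[ ]+)(?<number>\d{1,}[[[:space:]]*]?\p{Pd}[[[:space:]]*]?\d{1,}(?:(?![[:punct:]]))|\d{1,}\w{1}(?=[ \p{Pd}\/\\\\]\d+.{1,})|\d{1,})(?:[ -\/\\\\]+)?(?<ext>.*?)$~u';
    preg_match($pattern, $street, $matches);

    // Removes whitespace that touches a punctuation mark ('a / b' becomes 'a/b').
    $spacesPattern = '~(?:[[:space:]]+(?=[[:punct:]])|(?<=[[:punct:]])[[:space:]]+)~';

    // Empty strings are returned when nothing matched.
    return [
        'street' => trim($matches['street'] ?? ''),
        'houseNumber' => preg_replace($spacesPattern, '', trim($matches['number'] ?? '')),
        'houseNumberExtension' => preg_replace($spacesPattern, '', trim($matches['ext'] ?? '')),
    ];
}

// Try a few of the notations from the first comment.
foreach ([['street 5 bus 1'], ['street', '10 bus 5'], ['street 22A/1A']] as $lines) {
    echo json_encode(parseStreetLines($lines)), PHP_EOL;
}
```

Notations such as 'king sir 1-laan 96B002' are not handled by this pattern, as noted above, so it is worth running your own data sets through it before relying on it.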

@paazl-jaime

Hi @ericclaeren ,

Thanks for checking in and contributing! Getting addresses in the correct format from a single line has proven to be a very big challenge that will always lead to edge cases that can't be parsed into the correct fields.

What I am particularly interested in is why the first 7 examples in your list went wrong. They seem to have been placed into the correct fields. Could you clarify that?

If it's easier to discuss this by email via our Support desk, feel free to send us an email as well.

@ericclaeren
Contributor Author

Hi @paazl-jaime

Yeah, there's no one regex to rule them all. For now I have overridden the Paazl parser and use the Paazl regex as a fallback when the regex I shared fails. This covers quite a few scenarios, but far from all of them, and it is not ideal.
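
A rough sketch of that fallback idea, for illustration only. The function name `parseWithFallback` and both stand-in parsers are hypothetical; they are deliberately simplified and are neither the regex shared above nor the one in the Paazl module.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: try each parser in order and return the first result
 * that contains both a street and a house number.
 *
 * @param callable[] $parsers Each takes a string and returns an array or null.
 */
function parseWithFallback(string $line, array $parsers): array
{
    foreach ($parsers as $parser) {
        $result = $parser($line);
        if (is_array($result)
            && ($result['street'] ?? '') !== ''
            && ($result['houseNumber'] ?? '') !== ''
        ) {
            return $result;
        }
    }

    // Nothing matched: return empty fields so the caller can flag the address.
    return ['street' => '', 'houseNumber' => '', 'houseNumberExtension' => ''];
}

// Stand-in for a custom parser: only understands "<street> <number> bus <n>".
$custom = static function (string $line): ?array {
    if (preg_match('~^(?<street>\D+) (?<number>\d+) (?<ext>bus \d+)$~u', $line, $m) !== 1) {
        return null;
    }
    return ['street' => $m['street'], 'houseNumber' => $m['number'], 'houseNumberExtension' => $m['ext']];
};

// Stand-in for a fallback parser: only understands "<street> <number>".
$fallback = static function (string $line): ?array {
    if (preg_match('~^(?<street>.+) (?<number>\d+)$~u', $line, $m) !== 1) {
        return null;
    }
    return ['street' => $m['street'], 'houseNumber' => $m['number'], 'houseNumberExtension' => ''];
};

echo json_encode(parseWithFallback('street 10 bus 5', [$custom, $fallback])), PHP_EOL;
```

Returning empty fields when every parser fails makes it possible to hold such orders for manual review instead of submitting a wrong address.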

Well, the why is pretty obvious if you add the provided addresses to your own unit test suite 😄

13) WeProvide\Paazl\Test\Unit\Model\Api\Builder\PaazlOrderTest::testParseAddress with data set #39 (array('street', '10 bus 5'), array('street', '10', 'bus 5'))
Failed asserting that two arrays are equal.
--- Expected
+++ Actual
@@ @@
 Array (
-    'street' => 'street'
-    'houseNumber' => '10'
-    'houseNumberExtension' => 'bus 5'
+    'street' => 'street 10 bus'
+    'houseNumber' => '5'
+    'houseNumberExtension' => ''
 )

I'd rather use GitHub, as other customers may also benefit or provide additional information when they run into similar issues.

Cheers, Eric

@paazl-jaime

paazl-jaime commented May 31, 2021

Hi @ericclaeren,

Thanks! That's a clear one.

Two points we'd like to make on this matter:

  1. The extension can also send street, house number, and extension to Paazl separately. A setting in the Paazl extension lets you use address line 1 for 'street', line 2 for 'houseNumber' and line 3 for 'extension'. Adjusting your labels to this setup might already solve a great deal of the issues you mentioned. I read in your first message that you were already planning to use this setup. If you need any help with this configuration, please let us know!
  2. I don't expect we will update the regex. We are even discussing removing this part, because it doesn't solve the actual issue well enough. On top of that, as mentioned in point 1, the extension already offers a setting that solves the issue.
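
The line-per-field setup from point 1 can be pictured roughly as follows. This is only an illustration: `mapAddressLines` is a hypothetical helper, not a function from the Paazl extension, and it assumes the shop provides the address as an array of up to three lines.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: map address lines 1 to 3 directly to the three fields,
 * without any regex parsing. Missing lines become empty strings.
 */
function mapAddressLines(array $lines): array
{
    // Re-index and trim so that line 1 is always at offset 0.
    $lines = array_map(
        static fn ($line): string => trim((string) $line),
        array_values($lines)
    );

    return [
        'street' => $lines[0] ?? '',
        'houseNumber' => $lines[1] ?? '',
        'houseNumberExtension' => $lines[2] ?? '',
    ];
}

echo json_encode(mapAddressLines(['street', '65', 'bus 101'])), PHP_EOL;
```

Note that this only helps when customers actually enter each part in its own field; a full address typed into line 1 ends up entirely in 'street'.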

Let us know if you have any questions or remarks!

@ericclaeren
Contributor Author

Hi @paazl-jaime

The first point is not an option for us at this time: labels don't enforce correct input, and HTML autocomplete, which lots of people use, ignores them, so we would still end up with incorrect addresses.

How do you see this working in the future if the parsing is removed from your code?

Cheers
