Possible regex capture issue with special characters #162

Open
wexxlee opened this issue Jun 7, 2024 · 2 comments
Labels: bug (Something isn't working), needs-review (Awaiting review), raidboss (/ui/raidboss module)

Comments

wexxlee (Collaborator) commented Jun 7, 2024

There is additional discussion and context in #138. The following line caused unit test failures when used with a capturing regex, due to the presence of the (escaped) \r:

'262|2023-04-21T23:24:06.0630000-04:00|en|00000051|_rsv_35827_-1_1_0_0_S13095D61_E13095D61|Further testing is required.�����,\\r���)������ ��, assist me with this evaluation.|38151741aad7fe51',

This was worked around for unit testing in #139, but we should confirm that it is not a persistent regex capture issue that will cause errors or missed captures during actual execution. Assigning this to myself since I hope to have time to look at it next week, but it's completely fine if someone else wants to pick it up in the meantime.
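
One way a carriage return can interfere with capturing in JavaScript regexes is that `.` does not match line terminators (including a real `\r`) unless the `s` flag is set. The sketch below illustrates that behavior only; it is not confirmed as the cause of this issue, the two patterns are hypothetical and are not the project's actual regexes, and the sample value is shortened from the line above.

```typescript
// Hypothetical patterns: skip four pipe-delimited fields after "262",
// then capture the next field either with `.` or with a negated class.
const dotPattern = /^262\|[^|]*\|[^|]*\|[^|]*\|[^|]*\|(.*?)\|/;
const classPattern = /^262\|[^|]*\|[^|]*\|[^|]*\|[^|]*\|([^|]*)\|/;

const prefix =
  "262|2023-04-21T23:24:06.0630000-04:00|en|00000051|_rsv_35827_-1_1_0_0_S13095D61_E13095D61|";

// Escaped form: the string contains a backslash followed by the letter r.
const escapedLine = prefix + "Further testing,\\r assist me.|38151741aad7fe51";
// Raw form: the string contains an actual carriage return character.
const rawLine = prefix + "Further testing,\r assist me.|38151741aad7fe51";

function captureWithDot(line: string): string | null {
  return dotPattern.exec(line)?.[1] ?? null;
}

function captureWithClass(line: string): string | null {
  return classPattern.exec(line)?.[1] ?? null;
}

console.log(captureWithDot(escapedLine) !== null); // true: backslash + r are ordinary characters
console.log(captureWithDot(rawLine)); // null: `.` stops at the carriage return
console.log(captureWithClass(rawLine) !== null); // true: [^|] matches the carriage return
```

If the real capture patterns rely on `.`, comparing the escaped and raw forms like this may help narrow down whether the failure is in the data or in the regex.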

wexxlee added the bug, raidboss, and needs-review labels Jun 7, 2024
wexxlee self-assigned this Jun 7, 2024
valarnin (Collaborator) commented Jun 7, 2024

This issue is caused by bad base data. The data in the actual raw log line is Unicode characters (specific to the FFXIV Unicode space) and should have been added with '\uXXXX' escapes instead of being copied and pasted raw. I unfortunately don't have the original raw log lines to properly add them in here.

The parse logic should already be handling Unicode properly, since it handles JP/CN/KR characters fine.
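
For reference, a minimal sketch of turning raw non-ASCII characters into '\uXXXX' escapes before pasting a log line into a source file. `escapeNonAscii` is a hypothetical helper, not something from this repository, and U+E0BB is just an arbitrary private-use code point chosen for illustration.

```typescript
// Replace every UTF-16 code unit outside the ASCII range with a \uXXXX escape.
// ASCII characters (including existing backslashes) are passed through as-is.
function escapeNonAscii(text: string): string {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    if (unit < 0x80) {
      out += text.charAt(i);
    } else {
      // Left-pad the hex value to four digits.
      out += "\\u" + ("0000" + unit.toString(16)).slice(-4);
    }
  }
  return out;
}

console.log(escapeNonAscii("A\ue0bbB")); // prints A\ue0bbB
```

This only works if the raw characters are still intact in the source string at the time it is run.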

valarnin (Collaborator) commented Jun 7, 2024

Hmm, looks like the raw unicode data is actually there.

$ xxd resources/example_log_lines.ts| less
...
000057b0: 317c 4675 7274 6865 7220 7465 7374 696e  1|Further testin
000057c0: 6720 6973 2072 6571 7569 7265 642e efbf  g is required...
000057d0: bdef bfbd efbf bdef bfbd efbf bd2c 5c5c  .............,\\
000057e0: 72ef bfbd efbf bdef bfbd 29ef bfbd efbf  r.........).....
000057f0: bdef bfbd efbf bdef bfbd efbf bd20 efbf  ............. ..
00005800: bdef bfbd 2c20 6173 7369 7374 206d 6520  ...., assist me 

Just need to properly encode the data I guess? If there's still an issue with parsing after that, then it's an actual bug.


Edit: Should be something like this, please double check.

'262|2023-04-21T23:24:06.0630000-04:00|en|00000051|_rsv_35827_-1_1_0_0_S13095D61_E13095D61|Further testing is required.\uefbf\ubdef\ubfbd\uefbf\ubdef\ubfbd\uefbf\ubd2c\u5c5c\u72ef\ubfbd\uefbf\ubdef\ubfbd\u29ef\ubfbd\uefbf\ubdef\ubfbd\uefbf\ubdef\ubfbd\uefbf\ubd20\uefbf\ubdef\ubfbd, assist me with this evaluation.|38151741aad7fe51',
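
One thing to double check in the escaped string above: '\uXXXX' names a UTF-16 code unit, while xxd shows the file's UTF-8 bytes, so pairing dump bytes into escapes produces unrelated characters (and `\u5c5c\u72ef` also swallows the `\\r`). The repeated sequence `ef bf bd` is the UTF-8 encoding of a single code point. A minimal sketch of the arithmetic, assuming the bytes were read correctly from the dump; `decodeThreeByteUtf8` is a hypothetical helper:

```typescript
// Decode one three-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx)
// into its Unicode code point.
function decodeThreeByteUtf8(b0: number, b1: number, b2: number): number {
  return ((b0 & 0x0f) << 12) | ((b1 & 0x3f) << 6) | (b2 & 0x3f);
}

// Bytes taken from the xxd output: ef bf bd
const codePoint = decodeThreeByteUtf8(0xef, 0xbf, 0xbd);
console.log(codePoint.toString(16)); // prints fffd
```

U+FFFD is the Unicode replacement character. If that reading is right, each `ef bf bd` would be written as `\ufffd`, and the original FFXIV characters are no longer recoverable from this file, which is consistent with the earlier note about needing the original raw log lines.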
