Performance improvements (lexer/parser) #46

Closed
fcaps opened this issue Apr 25, 2021 · 6 comments
Labels
enhancement New feature or request

Comments

@fcaps

fcaps commented Apr 25, 2021

Hi guys,

The memory usage is awesome, but the CPU time is ~100x that of json_decode (100 MB JSON with 10,000 entries).
Did you consider using a C extension for the tokenizing/parsing?
I have never written an extension, but it looks like we could extend ext-json
or even just use ext-parle for the heavy lifting.

I could try to implement a lexer with ext-parle and see how the performance changes, and then implement a parser if you guys think this is a good idea.

Greetings

@halaxa
Owner

halaxa commented Apr 25, 2021

Hi,

I was considering that and looked into it a little bit, but I do not have the knowledge. I was fiddling with it in the zephir branch. Learning to code extensions in pure C would be fun and is appealing to me, but I do not have the time to do it right now. I am open to that if anyone else does.

@halaxa
Owner

halaxa commented Apr 25, 2021

I was looking into Parle earlier. I am worried about the lack of documentation. Is it even able to consume the source language (JSON) iteratively, in chunks? If it is, feel free to write a prototype. A lexer producing JSON tokens that is interchangeable with JsonMachine\Lexer might be a good start.

@fcaps
Author

fcaps commented Apr 27, 2021

Sorry, was busy^^
I had a deep look into parle. Yeah... no documentation for PHP, but there is some for the original C++ implementation, lexertl.
The first working lexer built with parle was terrible, 2x slower than the pure PHP implementation.
The second lexer was "state-aware" and much faster, but at this stage it's almost a parser.

Current State:

  • Prototype is working (lexer only), but it is slow and not easy to debug
  • Implementation works with chunks/streams: it keeps a local buffer and, if something looks wrong (e.g. an incomplete token), just waits for the next chunk
  • Lexer is "state-aware"
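
To illustrate the chunk/buffer point above, here is a minimal sketch in plain PHP. It is not the parle prototype: `chunkedTokens()` and `findStringEnd()` are hypothetical names, the tokenizer does no validation, and it only shows the idea of keeping an unfinished token in a local buffer until the next chunk arrives.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns the index of the quote closing the JSON string
 * that starts at $start, or null if the string is not complete in $buffer yet.
 */
function findStringEnd(string $buffer, int $start): ?int
{
    $length = strlen($buffer);
    $i = $start + 1;
    while ($i < $length) {
        $i += strcspn($buffer, "\"\\", $i); // jump to the next quote or backslash
        if ($i >= $length) {
            return null;
        }
        if ($buffer[$i] === '"') {
            return $i;
        }
        $i += 2; // a backslash escapes the following byte
    }
    return null;
}

/**
 * Hypothetical sketch: yields JSON tokens from an iterable of string chunks.
 * A token that may continue in the next chunk stays in the local buffer.
 */
function chunkedTokens(iterable $chunks): Generator
{
    $buffer = '';
    foreach ($chunks as $chunk) {
        $buffer .= $chunk;
        $length = strlen($buffer);
        $offset = 0;
        while (true) {
            $offset += strspn($buffer, " \t\r\n", $offset); // skip whitespace
            if ($offset >= $length) {
                break;
            }
            $char = $buffer[$offset];
            if (strpos('{}[]:,', $char) !== false) {
                yield $char; // structural tokens are always complete
                $offset++;
                continue;
            }
            if ($char === '"') {
                $end = findStringEnd($buffer, $offset);
                if ($end === null) {
                    break; // unterminated string: wait for the next chunk
                }
                yield substr($buffer, $offset, $end - $offset + 1);
                $offset = $end + 1;
                continue;
            }
            // Number, true, false or null: runs until a delimiter shows up.
            $run = strcspn($buffer, " \t\r\n{}[]:,\"", $offset);
            if ($offset + $run >= $length) {
                break; // the literal may continue in the next chunk
            }
            yield substr($buffer, $offset, $run);
            $offset += $run;
        }
        $buffer = (string) substr($buffer, $offset); // keep only the unfinished tail
    }
    // End of input: what remains is the last scalar or a truncated token.
    $rest = trim($buffer);
    if ($rest !== '') {
        yield $rest;
    }
}

foreach (chunkedTokens(['[1', '2, "a"]']) as $token) {
    echo $token, PHP_EOL; // prints [ then 12 then , then "a" then ]
}
```

The number `12` is split across the two chunks, so it is only emitted once the second chunk shows where it ends.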

Open Questions:

  • How do we "select" a lexer/parser for JsonMachine in the future, since the Parser/Lexer is currently hardcoded?
  • Is there a simpler option to improve performance? Maybe an "ArrayItemMatcher" that only handles chunks and returns raw JSON values: objects/arrays go to json_decode, and scalars (string/number/null/true/false) are handled directly or with hacks.
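
A rough sketch of what such an "ArrayItemMatcher" could look like, assuming the document is one top-level JSON array delivered as string chunks. `matchArrayItems()` is a hypothetical name and nothing here exists in JsonMachine; the sketch does not validate its input and simply hands every raw item to json_decode.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical "ArrayItemMatcher" sketch: splits a top-level JSON array,
 * delivered as string chunks, into the raw JSON text of each item.
 * No validation is done; malformed input gives undefined results.
 */
function matchArrayItems(iterable $chunks): Generator
{
    $depth = 0;        // nesting level; 1 means directly inside the top-level array
    $inString = false; // true while inside a JSON string
    $escaped = false;  // true when the previous byte was a backslash in a string
    $item = '';        // raw text of the item collected so far

    foreach ($chunks as $chunk) {
        $length = strlen($chunk);
        for ($i = 0; $i < $length; $i++) {
            $char = $chunk[$i];
            if ($inString) {
                $item .= $char;
                if ($escaped) {
                    $escaped = false;
                } elseif ($char === '\\') {
                    $escaped = true;
                } elseif ($char === '"') {
                    $inString = false;
                }
                continue;
            }
            if ($char === '"') {
                $inString = true;
                $item .= $char;
            } elseif ($char === '[' || $char === '{') {
                $depth++;
                if ($depth > 1) {
                    $item .= $char; // nested structure belongs to the current item
                }
            } elseif ($char === ']' || $char === '}') {
                $depth--;
                if ($depth >= 1) {
                    $item .= $char;
                } else {
                    // The top-level array closed: emit the last item, if any.
                    if (trim($item) !== '') {
                        yield trim($item);
                    }
                    $item = '';
                }
            } elseif ($char === ',' && $depth === 1) {
                yield trim($item); // a comma at depth 1 separates two items
                $item = '';
            } elseif ($depth >= 1) {
                $item .= $char;
            }
        }
    }
}

$chunks = ['[{"a":[1,2', ']}, "x,]y", 4', '2, null, [tr', 'ue]]'];
foreach (matchArrayItems($chunks) as $raw) {
    var_dump(json_decode($raw, true)); // json_decode does the real parsing per item
}
```

The byte-by-byte loop is still PHP, so whether this beats the existing lexer is an open question that only a measurement can answer; skipping ahead with `strcspn()` would be one thing to try.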

Next Steps:

  • Create a test for performance comparison
  • Investigate "ArrayItemMatcher"
  • Remove/ignore unused whitespace/junk (every iteration hurts in PHP); stream pre-processor to the rescue?
  • Work on a "clean" way to define tokens, like the ext-php scanner, so maintenance/debugging is easy
  • Have a look at the RParser to see if it fits our needs
  • Publish an alpha version for you/others
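
For the performance comparison, a small harness along these lines might be enough to start with. It is only a sketch: `sampleJson()` and `measure()` are hypothetical helpers, the sample data is made up, and only the json_decode baseline is wired in. Any lexer/parser under test would be passed as another callable that returns an iterable of items.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: builds a JSON array with $entries made-up objects. */
function sampleJson(int $entries): string
{
    $rows = [];
    for ($i = 0; $i < $entries; $i++) {
        $rows[] = ['id' => $i, 'name' => "entry $i", 'tags' => ['a', 'b', 'c']];
    }
    return json_encode($rows);
}

/**
 * Hypothetical helper: iterates everything $parse produces for $json and
 * reports the item count and the elapsed wall time in seconds.
 * $parse is any callable that takes the JSON string and returns an iterable.
 */
function measure(callable $parse, string $json): array
{
    $start = hrtime(true); // monotonic clock, in nanoseconds
    $items = 0;
    foreach ($parse($json) as $item) {
        $items++;
    }
    return ['items' => $items, 'seconds' => (hrtime(true) - $start) / 1e9];
}

$json = sampleJson(10000);

// Baseline: decode the whole document at once.
$baseline = measure(fn (string $json): array => json_decode($json, true), $json);
printf("json_decode: %d items in %.4f s\n", $baseline['items'], $baseline['seconds']);
```

Running each candidate several times and comparing the best run would make the numbers less noisy than a single pass.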

@halaxa halaxa added the enhancement New feature or request label Apr 28, 2021
@halaxa
Owner

halaxa commented Apr 28, 2021

You seem dedicated :) Can you show some code?

Are you using tests to verify correctness?

I will elaborate on the other topics once we see a significant impact on performance.

@Remo

Remo commented Sep 17, 2021

While looking for a faster alternative, I found this https://github.com/shevron/ext-jsonreader
It's written in C and offers streaming as well. I haven't done any testing, but before you guys start working on something new you might want to check it out. Unfortunately it's quite old, but it might still give you a head start.

@halaxa
Owner

halaxa commented May 6, 2023

Let's continue in #97

@halaxa halaxa closed this as completed May 6, 2023