Tape reading without alloc #226
Hey @poncito! Thanks for opening a PR. Some really interesting ideas here. I've also been thinking here and there about how we can support more lazy/non-allocating queries/workflows. I'm playing with some ideas here. In truth, I'd like to move away from the tape altogether, since that itself requires a big upfront allocation and has to be held onto via lazy objects. Ideally, I'm hopeful we can figure out a purely functional solution where we/users could make custom Anyway, I'm not sure if I'll be able to push forward on it much in the near term (depends on a few other projects going on), but I'd be happy to discuss some ideas more.
Hey @quinnj, thanks for your answer. I can dedicate a few hours a week of my free time to develop this kind of stuff. So if you have time to coach me on that, I could propose some code. My understanding of JSONBase.jl and your explanation is the following:
Am I understanding correctly? I'm really not sure because of your comment on lazy objects. You could also mean that you would like to make the parsing lazy? I'm not sure what that would mean or where the performance would come from. My guess on how to efficiently parse and read a JSON would be the following dichotomy (and their corresponding 'AbstractContext'?),
From this dichotomy, it appears that not only the "storage" should be abstract, but also the "parsing" (which does not appear in your JSONBase.jl). That is required to exploit the schema, but it can also be used to relax some checks. For instance, if I want to take the risk of assuming that the serialized JSON is well formed, I could decide to replace

    pos + 3 <= len &&
    b == UInt8('t') &&
    buf[pos + 1] == UInt8('r') &&
    buf[pos + 2] == UInt8('u') &&
    buf[pos + 3] == UInt8('e')

by

    b == UInt8('t')

I could also not read the keys, and directly jump ahead by the size of the expected key (and the next delimiter), ...

I can find some free time if you want to chat: [email protected]
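To make the trade-off concrete, here is a minimal sketch of the two checks side by side. The function names `is_true_strict` and `is_true_trusting` are hypothetical (they are not part of JSON3.jl or JSONBase.jl), and the buffer is just the code units of a small string.

```julia
# Hypothetical helper names; this is an illustration, not library code.

# Strict check: verifies the bounds and all four bytes of `true`.
function is_true_strict(buf::AbstractVector{UInt8}, pos::Int)
    len = length(buf)
    return pos + 3 <= len &&
        buf[pos] == UInt8('t') &&
        buf[pos + 1] == UInt8('r') &&
        buf[pos + 2] == UInt8('u') &&
        buf[pos + 3] == UInt8('e')
end

# Trusting check: assumes the input is well-formed JSON, so the first
# byte is enough to decide that the value is `true`.
is_true_trusting(buf::AbstractVector{UInt8}, pos::Int) = buf[pos] == UInt8('t')

# The second element is deliberately malformed ("tru" instead of "true").
buf = codeunits("[true,tru]")

strict_ok = is_true_strict(buf, 2)        # well-formed `true` at byte 2
strict_bad = is_true_strict(buf, 7)       # malformed value at byte 7
trusting_bad = is_true_trusting(buf, 7)   # same position, relaxed check
```

On well-formed input both functions agree. On the malformed element the strict version rejects it while the trusting version accepts it, which is exactly the risk taken in exchange for fewer comparisons.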
Any updates on this? I would really like to help since I really need it. The problem is that I'm inexperienced, so I might cause some trouble :D
I've been slowly chipping away at some ideas at https://github.com/quinnj/JSONBase.jl. I've gotten pretty far, but haven't been able to do really thorough benchmarking yet to make sure everything is squared away. There's also some polish, package admin, and testing to do, but I think the fundamentals are in pretty good shape. Happy to chat in that repo with anyone who's interested to collaborate and try things out with me.
Hi,
This PR is not finished, just a POC to get the discussion started.
So, the idea is to be able to read the tape without allocation. I created a benchmark with those results, where `f0` just calls `read`, and `f1` builds the tape, but then uses my non-allocating code.

The implementation is completely immutable, and has two main concepts:
- `Cursor`: they just contain an `Int` that is an index into the tape,
- `JSONItem`: they contain the cursor, the tape and the original string.

There are some variations of `JSONItem` that implement a handy interface given the data they represent:
- `JSONField`, which represents a field of an object; key and value can be accessed with `key` and `value`,
- `JSONObject`, which represents an object and is used as a dictionary (and could be an `AbstractDict`),
- `JSONArray`, which represents an array and can be iterated over efficiently.

A few details about the implementation:
- `JSONItem` are returned.
- `JSONItem` contain the tape, and hence should be allocating. Yet, it's generally possible for the compiler to remove those allocations. In any case, users can work at a low level with the isbits `Cursor` to not rely on those compiler optimizations.
- `Vector`.

@quinnj Are you interested?
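For readers who want a picture of the `Cursor` / item split described above, here is a rough sketch. It is not the code of this PR: `ToyArray` and `getvalue` are hypothetical names, and the tape layout (an element count followed by the elements) is a toy assumption that does not match the real JSON3 tape.

```julia
# Hypothetical names and a toy tape layout; not the PR's implementation.

# An isbits cursor: just an index into the tape.
struct Cursor
    index::Int
end

# An immutable item holding the cursor and the tape.
# Toy layout: tape[cursor.index] is the element count, then the elements follow.
struct ToyArray
    cursor::Cursor
    tape::Vector{UInt64}
end

Base.length(a::ToyArray) = Int(a.tape[a.cursor.index])
Base.eltype(::Type{ToyArray}) = UInt64

# Low-level access through the cursor alone; no wrapper object is needed.
getvalue(tape::Vector{UInt64}, c::Cursor) = tape[c.index]

# Iteration only moves an integer state along the tape.
function Base.iterate(a::ToyArray, i::Int = 1)
    i > length(a) && return nothing
    return getvalue(a.tape, Cursor(a.cursor.index + i)), i + 1
end

tape = UInt64[3, 10, 20, 30]   # one array with three elements
arr = ToyArray(Cursor(1), tape)
total = sum(arr)
```

In this sketch `isbitstype(Cursor)` is `true` while `isbitstype(ToyArray)` is `false`, because the item holds a reference to the tape. That is why the wrapper may allocate unless the compiler elides it, and why going through the cursor directly avoids depending on that optimization.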
To pursue this PR, I would like to:
- `Object` and an `Array`...
- `@inbounds` in the code this weekend, they are intended for the benchmark, but they are not acceptable (especially for the `getindex` methods that take a `Cursor` as an argument).
- `"[1.0,1.5]"` will be `Union{Int64,Float64}`, where the user would most certainly expect a `JSONArray{<:AbstractFloat}`. So there's a bit of logic to code here to handle those kinds of cases.

To go even further: