The changes in #5384 bumped the lake version from v3 to v4. While getting Shasta and Autoperf caught up with SuperDB-era changes, I needed to migrate their lake data through this transition and hit a couple of snags. That made me realize that users will need to make the same transition, so we'll likely want to provide tools and/or functionality to make it seamless. Below I document the bumps I hit along the way so we can figure out how best to address them.
Details
At the time this issue is being opened, super is at commit 966d3a9.
Let's start from the assumption that there's a v3 lake with some pool data, all created using the last GA Zed release v1.18.0, similar to what would be found on the desktop of someone who'd been running the last GA Zui release v1.18.0.
Assuming a running lake service:
$ zed -lake lake serve
{"level":"info","ts":1734808141.856885,"logger":"core","msg":"Started","auth_enabled":false,"root":"file:///Users/phil/lake","version":"v1.18.0"}
{"level":"info","ts":1734808141.8587148,"logger":"httpd","msg":"Listening","addr":"[::]:9867"}
...
And a client:
$ zed -version
Version: v1.18.0
$ zed create -use foo
pool created: foo 2qXTwq29mYhnYApJF0f9TTd8MqW
Switched to branch "main" on pool "foo"
$ echo '{"foo": "bar"}' | zed load -
(2/1) 15B 15B/s
2qXTxPxgibsGXS4tDfulEHTrGP6 committed
$ zed query 'from foo'
{foo:"bar"}
Now let's say I want to start using the super command. If I happen to try accessing the lake directly, rather than immediately providing access to the data via a service, I'm helpfully stopped in my tracks, albeit with an error message that's not entirely helpful. Perhaps we could benefit from something that notices the older lake version and mentions this explicitly?
$ super -version
Version: v1.18.0-202-g966d3a90
$ super db -lake lake ls
file:///Users/phil/lake: lake does not exist
(hint: run 'super db init' to initialize lake at this location)
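Until super itself reports this, a small pre-flight check could catch a v3 lake before super db serve touches it. This is a minimal sketch that relies only on the file names observed in this issue (a v3 lake has lake.zng at its root, a v4 lake has lake.bsup); the function name check_lake_version and its return codes are hypothetical, not part of super.

```shell
# Hypothetical pre-flight check (not part of super). Classifies a lake
# directory by the magic files seen in this issue:
#   lake.zng only        -> v3 lake written by Zed; return 1
#   lake.zng + lake.bsup -> likely a v3 lake re-initialized by super; return 2
#   lake.bsup only       -> v4 lake; return 0
#   neither              -> no lake here; return 3
check_lake_version() {
    dir=$1
    if [ -f "$dir/lake.zng" ] && [ ! -f "$dir/lake.bsup" ]; then
        echo "$dir: v3 lake from Zed; migrate it before using super db" >&2
        return 1
    elif [ -f "$dir/lake.zng" ]; then
        # Reached only when both lake.zng and lake.bsup exist.
        echo "$dir: both lake.zng and lake.bsup present; likely a v3 lake re-initialized by super" >&2
        return 2
    elif [ -f "$dir/lake.bsup" ]; then
        echo "$dir: v4 lake"
        return 0
    fi
    echo "$dir: no lake found" >&2
    return 3
}
```

For example, check_lake_version lake && super db -lake lake serve would refuse to start the service on an unmigrated lake instead of letting it write a fresh lake.bsup.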
If I scratch my head at this point and go ahead and try running the service anyway, the plot thickens, as it looks like all my data has disappeared.
$ super db -lake lake serve
{"level":"info","ts":1734808395.79775,"logger":"core","msg":"Started","auth_enabled":false,"root":"file:///Users/phil/lake","version":"v1.18.0-202-g966d3a90"}
{"level":"info","ts":1734808395.7980201,"logger":"httpd","msg":"Listening","addr":"[::]:9867"}
$ super db ls
[no output]
What's happened is that the lake has effectively been re-initialized. There's now a new lake.bsup file mentioning v4 alongside the prior lake.zng mentioning v3. More destructively though, the lake/pools/HEAD file now contains a 0, which effectively makes it look like there's no pool data to serve.
$ find lake
lake
lake/pools
lake/pools/TAIL
lake/pools/HEAD
lake/pools/1.zng
lake/lake.zng
lake/lake.bsup
lake/2qXTwq29mYhnYApJF0f9TTd8MqW
lake/2qXTwq29mYhnYApJF0f9TTd8MqW/commits
lake/2qXTwq29mYhnYApJF0f9TTd8MqW/commits/2qXTxPxgibsGXS4tDfulEHTrGP6.zng
lake/2qXTwq29mYhnYApJF0f9TTd8MqW/commits/2qXTxPxgibsGXS4tDfulEHTrGP6.snap.zng
lake/2qXTwq29mYhnYApJF0f9TTd8MqW/branches
lake/2qXTwq29mYhnYApJF0f9TTd8MqW/branches/TAIL
lake/2qXTwq29mYhnYApJF0f9TTd8MqW/branches/HEAD
lake/2qXTwq29mYhnYApJF0f9TTd8MqW/branches/2.zng
lake/2qXTwq29mYhnYApJF0f9TTd8MqW/branches/1.zng
lake/2qXTwq29mYhnYApJF0f9TTd8MqW/data
lake/2qXTwq29mYhnYApJF0f9TTd8MqW/data/2qXTxJrSiF42EHnit25K8aBFha9.zng
lake/2qXTwq29mYhnYApJF0f9TTd8MqW/data/2qXTxJrSiF42EHnit25K8aBFha9-seek.zng
$ super lake/lake.zng
{magic:"ZED LAKE",version:3}(=lake.LakeMagic)
$ super lake/lake.bsup
{magic:"ZED LAKE",version:4}(=lake.LakeMagic)
$ cat lake/pools/HEAD
0
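To confirm whether a lake is in this damaged state, one can compare pools/HEAD with the highest-numbered file in pools/. The sketch below assumes, as in the listing above, that those files have purely numeric names with a .zng or .bsup extension; pools_head_lags is a hypothetical helper that succeeds (returns 0) when HEAD lags behind.

```shell
# Hypothetical diagnostic (not part of super): print pools/HEAD and the
# highest-numbered file in pools/, and succeed if HEAD is lower, which is
# the symptom seen after super re-initialized the v3 lake.
pools_head_lags() {
    pools=$1/pools
    head=$(cat "$pools/HEAD")
    max=0
    for f in "$pools"/[0-9]*.zng "$pools"/[0-9]*.bsup; do
        [ -e "$f" ] || continue                 # glob matched nothing
        n=$(basename "$f")
        n=${n%.*}                               # "1.zng" -> "1"
        case $n in *[!0-9]*) continue ;; esac   # only purely numeric names
        if [ "$n" -gt "$max" ]; then max=$n; fi
    done
    echo "HEAD=$head highest=$max"
    [ "$head" -lt "$max" ]                      # 0 (true) means HEAD lags
}
```

On the lake listed above, pools_head_lags lake would print HEAD=0 highest=1 and return 0.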
It seems I was able to heal the lake from this state by:
Removing the lake/lake.zng file
Renaming all the remaining .zng files to use the extension .bsup instead
Changing the contents of lake/pools/HEAD to the highest number among the files in lake/pools/ (1 in this case)
$ rm lake/lake.zng
$ for zngfile in $(find lake -name \*.zng); do mv $zngfile $(echo $zngfile | sed 's/\.zng$/.bsup/'); done
$ echo -n "1" > lake/pools/HEAD
Now after I restart the lake service, I'm back in business.
$ super db ls
foo 2qXTwq29mYhnYApJF0f9TTd8MqW key ts order desc
$ super db query 'from foo'
{foo:"bar"}
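The manual repair above could be wrapped into a script that also computes the new pools/HEAD value instead of hard-coding 1. This is a sketch under the assumptions stated in its comments, not a vetted migration tool, and it only handles the layout shown in this issue; migrate_lake is a hypothetical name.

```shell
# Hypothetical migration sketch (not an official tool): repeats the manual
# v3 -> v4 repair steps above on the lake directory passed as $1.
# Assumes the state observed in this issue: super has already written a v4
# lake.bsup next to the old v3 lake.zng. Stop the lake service and back up
# the directory before running it.
migrate_lake() {
    lake=${1:?usage: migrate_lake LAKE_DIR}
    if [ ! -f "$lake/lake.zng" ] || [ ! -f "$lake/lake.bsup" ]; then
        echo "$lake: expected both lake.zng (v3) and lake.bsup (v4); not touching it" >&2
        return 1
    fi

    # Step 1: drop the v3 magic file; super's lake.bsup (v4) replaces it.
    rm "$lake/lake.zng" || return 1

    # Step 2: rename every remaining .zng file to .bsup. find -exec keeps
    # paths containing spaces intact, unlike a for loop over $(find ...).
    find "$lake" -type f -name '*.zng' -exec sh -c '
        for f in "$@"; do mv "$f" "${f%.zng}.bsup" || exit 1; done
    ' sh {} + || return 1

    # Step 3: point pools/HEAD at the highest-numbered file in pools/.
    max=0
    for f in "$lake"/pools/[0-9]*.bsup; do
        [ -e "$f" ] || continue                 # glob matched nothing
        n=$(basename "$f" .bsup)
        case $n in *[!0-9]*) continue ;; esac   # skip non-numeric names
        if [ "$n" -gt "$max" ]; then max=$n; fi
    done
    printf '%s' "$max" > "$lake/pools/HEAD" || return 1
    echo "$lake: migrated; pools/HEAD set to $max"
}
```

For example, with the service stopped, cp -R lake lake.bak && migrate_lake lake, then start super db -lake lake serve again and check the result with super db ls.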
Ideas
In conclusion:
I think these were all the necessary steps, since the result has the feel of a functioning lake, but if anyone spots additional surgery I missed, please speak up.
We could surely debate different ways to deliver the surgery to users, e.g.:
We could provide a separate standalone migration script, along with documentation guiding users through running it
super itself could recognize the characteristics of the v3 lake and either:
Make the modifications all at once as a bulk operation
Make the modifications in a lazy fashion (e.g., allow for reading both .zng and .bsup extensions and write new .bsup files as necessary and let .zng ones fade away on their own over time via compaction/vacuum)
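The core of the lazy option is a read path that accepts both extensions. The shell function below only illustrates that lookup order; the real change would live in super's storage code, and resolve_object is a hypothetical name.

```shell
# Hypothetical illustration of the lazy option: when reading an object,
# prefer the new .bsup name and fall back to the legacy .zng name. Writes
# (not shown) would always use .bsup, so .zng files would fade away as
# compaction/vacuum replaces them.
resolve_object() {
    base=$1                                 # object path without extension
    for ext in bsup zng; do
        if [ -f "$base.$ext" ]; then
            printf '%s\n' "$base.$ext"
            return 0
        fi
    done
    return 1                                # neither form exists
}
```

Checking .bsup first means a partly migrated lake resolves to the new files wherever they exist, with no separate bookkeeping of which objects have been converted.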