feat: Add Bunny.net object storage support #4325

l0wl3vel · 2024-01-09T20:02:07Z

This PR implements support for Bunny.net object storage.

CLAassistant · 2024-01-10T11:27:07Z

All committers have signed the CLA.

zhijian-pro · 2024-01-12T06:50:28Z

In fact, the S3 API is better suited for juicefs, and if the S3 api is released in the future, we may need to reimplement it with the S3 API. @l0wl3vel @solracsf

l0wl3vel · 2024-01-12T08:05:51Z

I know about the pending implementation of S3 for bunny storage. I would say this is a similar situation to B2 with their native API and their S3 compatible API.

I think it might be useful to have their native API supported for two reasons:

It is really simple compared to S3. Four API calls that can be done via curl, if you desire
Performance. Bunny Storage seems to perform much better than their competitors in a Multi-Cloud scenario. (I tested B2 and Cloudflare R2)

They advertise extremely low TTFB latency. I think it might be a good idea to have their native API implemented to compare to S3 when they offer it. Bunny's performance advantage could be caused partially by their own API which would justify having a bespoke implementation.

solracsf · 2024-01-12T08:43:01Z

S3 is "coming soon" since 2022... 💭
SFTP is broken (not using standards).

So API is the better option (also because there are no egress fees using the API).

solracsf · 2024-01-12T19:35:55Z

It is really simple compared to S3. Four API calls that can be done via curl, if you desire.

@l0wl3vel regarding this, if this can simplify things, is the external library git.sr.ht/~jamesponddotco/bunnystorage-go really necessary? Could the necessary code to make the API calls be integrated directly into JuiceFS codebase, instead of relying on an external lib?

l0wl3vel · 2024-01-12T19:50:38Z

Maybe, but I do not think that is a good idea to basically put an API SDK inside of a specialized codebase the size of JuiceFS.

I thought about forking/redoing the git.sr.ht/~jamesponddotco/bunnystorage-go library, since I think that it features some weird abstractions and it is using some homebrew libraries written by the author, some of which (e.g. git.sr.ht/~jamesponddotco/httpx-go) are deprecated.

@solracsf Would you be interested in building new bunnystorage-go library using more standard dependencies with me?

If the JuiceFS maintainers are fine with pulling in the jamesponddotco/bunnystorage-go dependency for now I would prefer it to just get it merged and working on replacing the API SDK with another implementation later on.

solracsf · 2024-01-13T11:42:44Z

@l0wl3vel I'm not a Go developer, I'm afraid i can't help so much here 😿

zhijian-pro · 2024-01-13T13:11:08Z

We want to merge a stable code. And one that is useful to users.

l0wl3vel · 2024-01-13T13:31:40Z

Would you consider merging the code in its current form? And if not, what changes would be required to consider it for merging?

What kind of stability are we talking about? Upgrade Stability (Users can upgrade JuiceFS without need for migration), functional stability, dependency stability? Are there guidelines on what you would consider stable and what assurances are necessary to proof a certain kind of stability?

jamesponddotco · 2024-01-14T18:43:50Z

@l0wl3vel Let me know what you need changed in the base module and I can see if I can find the time to work on it. At the very least, I'll remove the need for httpx-go, since I deprecated that, and replace it with the standard net/http module.

l0wl3vel · 2024-01-14T20:10:30Z

@jamesponddotco Hey, really appreciating you chiming in here. Thanks for writing bunnystorage-go. Here are a few things I think would improve it.

Combine the functionality of Config and Client. I do not see the reason to keep them separate.
Initialize the new Client with a single New function or with a builder-style semantic
Replace the xstrings logic for building URLs with net/url. I think it is much more idiomatic and easier to read.
Change the Enum-Style Abstraction for the endpoints. It requires updating the library every time a new endpoint is added by Bunny.net. Replace the field in the Client struct with a net/url and providing some constants to quickly fill in the endpoint with a known endpoint.
Remove the dependency on xerrors
Replace UserAgent with a string without any custom datatype.
Move the code to Github for visibility and better collaboration.

I think this is getting a bit off-topic here, but I already started converting bunnystorage-go to resty on my fork. Uploading does not work properly, unfortunately. Do you think you could move the main bunnystorage-go repo to github?

No offense, I would really like to remove the dependency on sr.ht and your xstd-go entirely to make the module it more idiomatic and accessible.

jamesponddotco · 2024-01-14T21:36:27Z

@l0wl3vel, thank you for the feedback, it's invaluable!

Combine the functionality of Config and Client. I do not see the reason to keep them separate.
Initialize the new Client with a single New function or with a builder-style semantic

I mostly like to keep configuration in its own struct, hence why they're separate. This is personal preference, really, so I don't see it changing. If you look at other clients I wrote, like the cloudcraft-go one for Datadog, you'll see the same style.

Replace the xstrings logic for building URLs with net/url. I think it is much more idiomatic and easier to read.

xstrings is used to avoid allocations, but since I didn't really took precautions anywhere else, I could definitely remove it.

Change the Enum-Style Abstraction for the endpoints. It requires updating the library every time a new endpoint is added by Bunny.net. Replace the field in the Client struct with a net/url and providing some constants to quickly fill in the endpoint with a known endpoint.

That was a mistake indeed, but fixing it would require a breaking change that I'm not sure I'm willing to make right now. I'll consider it, though.

Same goes for that logging interface. I should probably have gone with something that would allow me to use log/slog instead.

Remove the dependency on xerrors.

Using errors as constants is also a personal preference, but there is some reason behind it:
https://dave.cheney.net/2016/04/07/constant-errors

Replace UserAgent with a string without any custom datatype.

Another mistake that would require a breaking change to undo, one that I make on every client, really. Not sure if this will be undone, though.

Move the code to Github for visibility and better collaboration.

Not happening, sorry. I much prefer the email workflow for working with code and git in general, and prefer SourceHut to GitHub as a platform for several reasons. I still use GitHub as a read-only mirror and would probably accept PRs over here since I can't disable them, but that is as far as I'll go with the platform.

As for the usage of xstd, you'll see that module in pretty much all code I write, because I wrote it to decrease duplication. If I need to write the same function for two different projects, it goes into xstd instead.

I'll look into those changes this week, but I assume you'll use your fork instead for this PR. Still, glad my code could be of some help, haha.

Since this is indeed becoming off-topic, reach out to me at [james at cipher dot host] if you want to continue the conversation.

l0wl3vel · 2024-01-17T12:32:55Z

Ok, as I see it, the plan is the following:

@jamesponddotco moves bunnystorage-go away from using httpx
@zhijian-pro We move away from using the deprecated httpx dependency. Would the JuiceData team be willing to merge this PR? If not, what are your concerns and preferred methods for dealing with them?
The only thing that concerns me about this PR is the format of the endpoint passed via CLI: I think something like https://<endpoint>/<bucketname> makes more sense than bunny://<bucketname>.<endpoint>. We stay closer to how the bunny API works internally and opens the path to easily moving away from bunnystorage-go, if the JuiceFS maintainers see a problem with using it, since it is just the base URL for all API calls. I will implement the endpoint change this week.

Does anyone have anything to add?

l0wl3vel · 2024-02-07T21:42:13Z

object_storage_test.go is passing now 🎉

zhijian-pro · 2024-02-11T13:23:03Z

object_storage_test.go is passing now 🎉

I will review this pr again after the Spring Festival.

previously the path parsed out of list responses did not contain the whole path. This commit allows juicefs delete, sync etc. to work properly

Required some fixes around error masking due to changed errors

l0wl3vel · 2024-02-25T15:28:36Z

Updated the bunny-storage-go-sdk to 0.0.10, which uses fasthttp instead of net/http and broke 1GiB/s in write speed and got some nice boost in read speed as well:

BlockSize: 1 MiB, BigFileSize: 1024 MiB, SmallFileSize: 128 KiB, SmallFileCount: 100, NumThreads: 1
Time used: 0.0 s, CPU: NaN%, Memory: 0.0 MiB
+------------------+----------------+---------------+
|       ITEM       |      VALUE     |      COST     |
+------------------+----------------+---------------+
|   Write big file |  1001.05 MiB/s |   1.02 s/file |
|    Read big file |   706.92 MiB/s |   1.45 s/file |
| Write small file |   24.6 files/s | 40.63 ms/file |
|  Read small file | 1540.2 files/s |  0.65 ms/file |
|        Stat file | 6314.6 files/s |  0.16 ms/file |
|   FUSE operation |   0 operations |    0.00 ms/op |
|      Update meta |   0 operations |    0.00 ms/op |
|       Put object |   0 operations |    0.00 ms/op |
|       Get object |   0 operations |    0.00 ms/op |
|    Delete object |   0 operations |    0.00 ms/op |
| Write into cache |   0 operations |    0.00 ms/op |
|  Read from cache |   0 operations |    0.00 ms/op |
+------------------+----------------+---------------+

pkg/object/bunny.go

davies · 2024-02-27T02:42:47Z

pkg/object/bunny.go

+// Deleting keys that do not exist are expected to not throw an error
+func (b *bunnyClient) Delete(key string) error {
+	if err := b.client.Delete(key, false); err != nil {
+		if err.Error() == "Not Found" {


This error message is confusing, we should only retry it when the key is ending with /.

Or call b.client.Delete(key, strings.HasSuffix("/"))

This workaround is necessary to pass the tests. The boolean flag on b.client.Delete() only appends a / character to the path, which means that setting it at all should not be necessary.

Tests fail on repeated test runs with the following implementation:

func (b *bunnyClient) Delete(key string) error { if err := b.client.Delete(key, false); err != nil { if err.Error() == "Not Found" { return nil } else { return err } } return nil }

Well, apparently some tests do not consistently pass paths with a trailing /. The first run works fine, but on any subsequent run a list test fails because there are empty directories left over from the first test run.

First run

=== RUN TestBunny object_storage_test.go:131: PUT testEncodeFile failed: Unauthorized object_storage_test.go:180: out-of-range get: '', but got 0, error: Requested Range Not Satisfiable object_storage_test.go:284: list with delimiter is not supported object_storage_test.go:326: list with delimiter is not supported object_storage_test.go:527: https://storage.bunnycdn.com/juicefs-testunit-test/ does not support multipart upload: not supported --- PASS: TestBunny (10.97s) PASS

Object storage directory after the first run

Second Run

go1.20.13 test -v -tags gluster -run TestBunny github.com/juicedata/juicefs/pkg/object === RUN TestBunny object_storage_test.go:131: PUT testEncodeFile failed: Unauthorized object_storage_test.go:180: out-of-range get: '', but got 0, error: Requested Range Not Satisfiable object_storage_test.go:222: First key should be test, but got a/ --- FAIL: TestBunny (0.65s) FAIL FAIL github.com/juicedata/juicefs/pkg/object 1.020s FAIL

This workaround seems to be necessary, since it is not guaranteed that JuiceFS passes in the trailing /.

I retested it after replacing ListAll with List and the tests fail again. With or without my deletion workaround there are objects left in the unit-test directory. Subsequent runs will always fail again. Is it possible that testStorage does not handle the cleanup of the unit-test directory on object stores, that allow prefixes containing only empty directories? I would propose cleaning up the root directory of the storage prefix like this:

diff --git a/pkg/object/object_storage_test.go b/pkg/object/object_storage_test.go index cfab5b97..a806c4ef 100644 --- a/pkg/object/object_storage_test.go +++ b/pkg/object/object_storage_test.go @@ -120,6 +120,7 @@ func testStorage(t *testing.T, s ObjectStorage) { } prefix := "unit-test/" s = WithPrefix(s, prefix) + defer s.Delete("/") defer func() { if err := s.Delete("test"); err != nil { t.Fatalf("delete failed: %s", err)

It seems like this problem only appears on Object Storage providers that do support the concept of an empty directory, which S3 style APIs do not support. We would need to adjust the tests to clean up the unit-test directory properly.

pkg/object/bunny.go

davies · 2024-03-01T02:59:21Z

pkg/object/bunny.go

+
+	//listInParentDirectory := !strings.HasSuffix(prefix, "/")
+
+	listedObjects, err := b.client.List(path.Dir(prefix))


Please take the file.go as an example

SandyXSD · 2024-03-25T03:49:04Z

@l0wl3vel Hi, it seems there's still an issue relating to the 'list' method that is not solved yet. Are you still working on this?

l0wl3vel · 2024-03-29T20:41:10Z

@SandyXSD I still want to finish it. I already have the changes implemented, but I can´t make sure it works, since Bunny Object Storage has the concept of empty directories, which trips up the test suite when running it multiple times, because there are empty directories left from the previous run. I just pushed my changes.

davies · 2024-03-30T13:52:32Z

pkg/object/bunny.go

+		}
+	} else if marker == "" { // If Directory && no marker: Return prefix directory as well
+		parentPath := path.Dir(path.Dir(prefix))
+		objects, err := b.client.List(parentPath+dirSuffix)


We just need to add the directory itself here, will list it later. Please follow the example in file.go

I am not sure what changes you would like to see. The code here is based on sftp.go, like you suggested before. Could you elaborate please on what about it is incorrect?

Here is how sftp implements List:

var objs []Object dir := f.path(prefix) if !strings.HasSuffix(dir, "/") { dir = filepath.Dir(dir) if !strings.HasSuffix(dir, dirSuffix) { dir += dirSuffix } } else if marker == "" { obj, err := f.Head(prefix) if err != nil { if os.IsNotExist(err) { return nil, nil } return nil, err } objs = append(objs, obj) } infos, err := c.sftpClient.ReadDir(dir) if err != nil { if os.IsPermission(err) { logger.Warnf("skip %s: %s", dir, err) return nil, nil } if os.IsNotExist(err) { return nil, nil } return nil, err } entries := f.sortByName(c.sftpClient, dir, infos, followLink) for _, o := range entries { key := o.Key() if !strings.HasPrefix(key, prefix) || (marker != "" && key <= marker) { continue } objs = append(objs, o) if len(objs) == int(limit) { break } }

When the marker is '', it try to return the prefix self (could be a directory). So instead of list the parent of prefix, can we just call Head() instead?

I am not sure, but listing a prefix in S3 does not only return the virtual prefix "directory", rather all keys that start with that prefix. This would mean that I implemented the correct behavior. Working on this is also really difficult, since the tests for object storage are very incomplete and miss a lot of corner cases around listing objects.

davies · 2024-04-02T02:50:19Z

@l0wl3vel please rebase to main branch and fix the conflict, I think we are very close to merge this one.

davies · 2024-04-17T08:26:06Z

This one is took over by #4719, thanks very much!

Add Bunny.net object storage implementation

9d262da

l0wl3vel force-pushed the main branch from dc5f9be to 3d4f392 Compare January 9, 2024 21:33

l0wl3vel mentioned this pull request Jan 9, 2024

Bump Go Dependency to 1.20+ #4326

Closed

l0wl3vel marked this pull request as draft January 9, 2024 22:04

l0wl3vel and others added 7 commits January 10, 2024 16:24

Add Documentation

e0f4cc0

Fix incorrect mtime parsing

9f265fa

fix: Emulate bunny.net ranged reads properly

447495f

fix: split keys into file and dir correctly

cbac60a

fix: Improve error handling

2350270

Merge branch 'juicedata:main' into main

0b08425

fix: handle errors on delete correctly

35e9b13

l0wl3vel force-pushed the main branch from 6014c45 to 35e9b13 Compare January 10, 2024 15:25

l0wl3vel marked this pull request as ready for review January 10, 2024 17:02

This comment was marked as resolved.

Sign in to view

fix: go fmt

a8070ad

zhijian-pro self-assigned this Jan 11, 2024

zhijian-pro removed their assignment Jan 16, 2024

fix: Rework ListAll functionality to pass JuiceFS testsuite

f5657a6

l0wl3vel force-pushed the main branch from d409cf6 to c03266f Compare February 23, 2024 17:34

l0wl3vel added 2 commits February 25, 2024 15:55

fix: Use correct path normalization within Storage Zone

680ce8c

previously the path parsed out of list responses did not contain the whole path. This commit allows juicefs delete, sync etc. to work properly

Update bunny-storage-go-sdk to 0.0.10

f8d30f3

Required some fixes around error masking due to changed errors

l0wl3vel force-pushed the main branch from 7f7eca6 to f8d30f3 Compare February 25, 2024 14:56

davies reviewed Feb 27, 2024

View reviewed changes

pkg/object/bunny.go Outdated Show resolved Hide resolved

davies reviewed Feb 27, 2024

View reviewed changes

pkg/object/bunny.go Outdated Show resolved Hide resolved

davies reviewed Feb 27, 2024

View reviewed changes

pkg/object/bunny.go Outdated Show resolved Hide resolved

davies reviewed Feb 27, 2024

View reviewed changes

pkg/object/bunny.go Outdated Show resolved Hide resolved

fix: return correct endpoint

434fcd5

l0wl3vel force-pushed the main branch from 43c03fb to 434fcd5 Compare February 28, 2024 22:27

l0wl3vel and others added 3 commits February 28, 2024 23:37

fix: Remove verbose logging in the Head function

a8ca86e

Merge branch 'juicedata:main' into main

9d83467

feat: Implement List method instead of ListAll

b3058c0

davies reviewed Mar 1, 2024

View reviewed changes

l0wl3vel added 3 commits March 29, 2024 21:31

fix: rework list function

c4bb030

fix: use correct path normalization for parsing metadata

445e2ef

fix: remove incorrect deletion workaround

dc24edd

davies reviewed Mar 30, 2024

View reviewed changes

Merge branch 'main' into main

8ffab2e

davies mentioned this pull request Apr 17, 2024

Support Bunny Object Storage #4719

Merged

davies closed this Apr 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Add Bunny.net object storage support #4325

feat: Add Bunny.net object storage support #4325

l0wl3vel commented Jan 9, 2024

CLAassistant commented Jan 10, 2024 •

edited

Loading

This comment was marked as resolved.

zhijian-pro commented Jan 12, 2024

l0wl3vel commented Jan 12, 2024

solracsf commented Jan 12, 2024 •

edited

Loading

solracsf commented Jan 12, 2024

l0wl3vel commented Jan 12, 2024

solracsf commented Jan 13, 2024

zhijian-pro commented Jan 13, 2024

l0wl3vel commented Jan 13, 2024

jamesponddotco commented Jan 14, 2024 •

edited

Loading

l0wl3vel commented Jan 14, 2024 •

edited

Loading

jamesponddotco commented Jan 14, 2024 •

edited

Loading

l0wl3vel commented Jan 17, 2024 •

edited

Loading

l0wl3vel commented Feb 7, 2024

zhijian-pro commented Feb 11, 2024

l0wl3vel commented Feb 25, 2024

davies Feb 27, 2024

davies Feb 27, 2024

l0wl3vel Feb 28, 2024

davies Feb 29, 2024

l0wl3vel Feb 29, 2024

l0wl3vel Mar 1, 2024

davies Mar 1, 2024

SandyXSD commented Mar 25, 2024

l0wl3vel commented Mar 29, 2024

davies Mar 30, 2024

l0wl3vel Mar 30, 2024

davies Apr 2, 2024

l0wl3vel Apr 15, 2024

davies commented Apr 2, 2024

davies commented Apr 17, 2024


		//listInParentDirectory := !strings.HasSuffix(prefix, "/")

		listedObjects, err := b.client.List(path.Dir(prefix))

feat: Add Bunny.net object storage support #4325

feat: Add Bunny.net object storage support #4325

Conversation

l0wl3vel commented Jan 9, 2024

CLAassistant commented Jan 10, 2024 • edited Loading

This comment was marked as resolved.

zhijian-pro commented Jan 12, 2024

l0wl3vel commented Jan 12, 2024

solracsf commented Jan 12, 2024 • edited Loading

solracsf commented Jan 12, 2024

l0wl3vel commented Jan 12, 2024

solracsf commented Jan 13, 2024

zhijian-pro commented Jan 13, 2024

l0wl3vel commented Jan 13, 2024

jamesponddotco commented Jan 14, 2024 • edited Loading

l0wl3vel commented Jan 14, 2024 • edited Loading

jamesponddotco commented Jan 14, 2024 • edited Loading

l0wl3vel commented Jan 17, 2024 • edited Loading

l0wl3vel commented Feb 7, 2024

zhijian-pro commented Feb 11, 2024

l0wl3vel commented Feb 25, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

First run

Object storage directory after the first run

Second Run

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

SandyXSD commented Mar 25, 2024

l0wl3vel commented Mar 29, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

davies commented Apr 2, 2024

davies commented Apr 17, 2024

CLAassistant commented Jan 10, 2024 •

edited

Loading

solracsf commented Jan 12, 2024 •

edited

Loading

jamesponddotco commented Jan 14, 2024 •

edited

Loading

l0wl3vel commented Jan 14, 2024 •

edited

Loading

jamesponddotco commented Jan 14, 2024 •

edited

Loading

l0wl3vel commented Jan 17, 2024 •

edited

Loading