Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add datastoreusage Command #6442

Merged

Conversation

nepet
Copy link
Collaborator

@nepet nepet commented Jul 25, 2023

This PR aims to add a RPC command to the datastore-api called datastoreusage that is somewhat inspired by the unix command du.

datastoreusage returns an object that contains a usage report and total that is the size in byte occupied by the data stored in the datastore.

datastoreusage accepts the following (optional) parameter:

  • key is used as a starting point to traverse the datastore to collect the size of the data that is stored under this key and all children.
  • depth gives the max depth of children that are returned in the usage report from the (optional) key. depth does NOT affect the total size, only the depth of the details in usage (much like du).

Examples (all from the same datastore):

datastoreusage

{
  "usage": [
     {
       "key": [
          "level0",
          "level0.1"
        ],
        "size": 32
      },
      {
        "key": [
          "level0",
          "level0.2"
        ],
        "size": 64
      },
      {
        "key": [
          "level1"
       ],
       "size": 128
     },
  ],
  "total": 224
}

datastoreusage "" 1

{
  "usage": [
      {
        "key": [
          "level1"
       ],
       "size": 128
     },
  ],
  "total": 224
}

datastoreusage "level0"

{
  "usage": [
     {
       "key": [
          "level0",
          "level0.1"
        ],
        "size": 32
      },
      {
        "key": [
          "level0",
          "level0.2"
        ],
        "size": 64
      },
  ],
  "total": 96
}

datastoreusage "level0" 0

{
  "usage": [],
  "total": 96
}

Thoughts and comments are appreciated!

@rustyrussell
Copy link
Contributor

OK, I would omit the level option for now: simply give a root value (or []) and return the bytes used.

Note that bytes here is not an accurate reflection of actual db usage, since there is overhead. However, make sure you correctly account for keys, as well as values! (People can store information in keys!)

Also, result for command X should usually be an object called similar to X. Perhaps something like:

"datastoreusage": {
    "key": [],
    "self_bytes": 0, /* len(key + data) */
    "total_bytes": 2222 /* self_bytes + all descendents' self_bytes */ 
}

const jsmntok_t *params)
{
struct json_stream *response;
const char **key, **k = NULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This initialization is wrong BTW, it's **key which needs initializing, not **k!

@@ -343,6 +349,11 @@ static struct command_result *json_datastoreusage(struct command *cmd,
stmt;
stmt = wallet_datastore_next(cmd, cmd->ld->wallet, stmt,
&k, &data, &gen)) {

// Break if we got a key that we are not interested in.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just merged API change means you don't need to do this any more: wallet_datastore_next will do it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebased to include the API change.

@rustyrussell rustyrussell added this to the v23.11 milestone Jul 27, 2023
@nepet nepet force-pushed the 202306-add-datastoreusage-command branch from ea1eb80 to 42cf691 Compare July 27, 2023 10:18
@nepet
Copy link
Collaborator Author

nepet commented Jul 27, 2023

Rebased onto master to include new wallet API and implemented the proposed changes.

Since the datastore-API currently only allows data to be stored for leafs, the database does not store any bytes for ancestors (no key, no data). Therefore the proposed self_bytes would be either self_bytes=0 if a key has descendants, or self_bytes=total_bytes if the key is a leaf and stores data.
That's why I left the field out for now, but we can easily add it if needed later on.

"datastoreusage": {
    "key": [],
    "total_bytes": 2222 /* len(key_blob + data) + all descendents' len(key_blob + data) */ 
}

@nepet
Copy link
Collaborator Author

nepet commented Jul 27, 2023

Oh, and I saved the key as a string in the response, but I'm happy to change it back to an array type if that's more convenient.

Copy link
Contributor

@vincenzopalazzo vincenzopalazzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look like the CI is complaining about some whitespace

ightningd/datastore.c:319:	     stmt = wallet_datastore_next(cmd, key, stmt, 
lightningd/datastore.c:323:		
lightningd/datastore.c:334:	
Extraneous whitespace found
make: *** [Makefile:521: check-whitespace/lightningd/datastore.c] Error 1
make: *** Waiting for unfinished jobs....
Cloning into '/home/runner/work/lightning/lightning/external/libwally-core'...
Cloning into '/home/runner/work/lightning/lightning/external/lnprototest'...
Cloning into '/home/runner/work/lightning/lightning/external/lowdown'...

@nepet nepet force-pushed the 202306-add-datastoreusage-command branch from 8c79c96 to 5984290 Compare August 2, 2023 08:30
Copy link
Contributor

@rustyrussell rustyrussell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, but it needs:

  1. A Changelog-Added entry saying there's a new RPC command.
  2. A test :)

@nepet nepet force-pushed the 202306-add-datastoreusage-command branch from 5984290 to ac4c919 Compare October 23, 2023 16:50
@nepet nepet requested a review from cdecker as a code owner October 23, 2023 16:50
@nepet
Copy link
Collaborator Author

nepet commented Oct 23, 2023

Changed the version, updated pyln-client, added some tests and a Changelog entry.
I kept all changes to different subsystems separated but I am also happy to squash these if needed.

@nepet nepet force-pushed the 202306-add-datastoreusage-command branch 3 times, most recently from 8e4d618 to d1c6c59 Compare October 23, 2023 18:52
CHANGELOG.md Outdated
## [Unreleased]

### Added

- JSON-RPC: `datastoreusage`: returns the total bytes that are stored under a given key.


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, don't edit CHANGELOG.md!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a Changelog-Added line in your commit message instead.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh, sorry about that, I added a Changelog-Added line to the initial commit message.

Comment on lines 6 to 12
"added": "v23.11",
"properties": {
"key": {
"oneOf": [
{
"type": "array",
"description": "key is an array of values (though a single value is treated as a one-element array). Used as the staring point to traverse the datastore.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo nit: "starting point"

Comment on lines 298 to 302
const char **k, **key = NULL;
struct db_stmt *stmt;
const u8 *data;
u64 gen;
u64 total = 0;
u64 gen, total_bytes = 0;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combine first two commits, since the first doesn't compile anywayk, and you radically change behavior.

Also **key doesn't need initialization.


// We use the formatted datastore key to calculate the size of
// the key to account for the delimiter byte in the database.
self_bytes += strlen(datastore_key_fmt(tmpctx, k)) - 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing the fmt to get the right answer seems strange here?

How about:

u64 self_bytes = tal_bytelen(data);
/* One byte per separator */
self_bytes += tal_count(key) - 1;
for (size_t i = 0; i < tal_count(key); i++) {
        self_bytes += strlen(key[i]);

Copy link
Collaborator Author

@nepet nepet Oct 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, It now is calculated form the key chars itself not the formatted string. This is straight forward and makes it even easier as it saves us a weird -2 or -1 to remove separators from the calculation.

@nepet nepet force-pushed the 202306-add-datastoreusage-command branch 3 times, most recently from ce3631d to d0a310b Compare October 24, 2023 10:53

u64 self_bytes = tal_bytelen(data);
for (size_t i = 0; i < tal_count(k); i++) {
self_bytes += strlen(k[i]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't quite right, since the actual space in the db includes the tal_count(k)-1 separators. Note that tal_count(k) is always >=1 though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your review!
Uh yeah I guess I was still asleep when I did this yesterday. I forgot to double check the db schema. The PK of the table is the key as a BLOB with key entries separated by a string terminator \0. I forgot to count these and added a fix self_bytes += tal_count(k) - 1;

nepet added 6 commits October 25, 2023 14:18
datastoreusage returns the total_bytes that are stored under a given
{Key} or from root. {Key} is the entry point from which we begin to
traverse the datastore.

Changelog-Added: JSON-RPC: `datastoreusage`: returns the total bytes that are stored under a given key.

Signed-off-by: Peter Neuroth <[email protected]>
Signed-off-by: Peter Neuroth <[email protected]>
@nepet nepet force-pushed the 202306-add-datastoreusage-command branch from d0a310b to 2acb8b3 Compare October 25, 2023 12:18
@rustyrussell rustyrussell merged commit 747725d into ElementsProject:master Oct 26, 2023
38 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants