What should the torrents API endpoint return? #775

josecelano · 2024-04-03T17:06:54Z

josecelano
Apr 3, 2024
Maintainer

Relates to: #646

The current torrents API endpoint returns the current list of torrents in the tracker:

curl "http://127.0.0.1:1212/api/v1/torrents?token=MyAccessToken&offset=1&limit=1"

[
    {
        "info_hash": "5452869be36f9f3350ccee6b4544e7e76caaadab",
        "seeders": 1,
        "completed": 0,
        "leechers": 0
    }
]

We are using pagination, so you can get one page at a time.
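To make the `offset` and `limit` parameters concrete, here is a minimal sketch of paging over a `BTreeMap` keyed by infohash. It is not the tracker's actual code: `TorrentStats` and `page` are hypothetical names, and the infohash is kept as a plain `String` for brevity.

```rust
use std::collections::BTreeMap;

/// Swarm statistics for one torrent, mirroring the fields of the JSON response.
/// Hypothetical type for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct TorrentStats {
    pub seeders: u32,
    pub completed: u32,
    pub leechers: u32,
}

/// Hypothetical helper: returns one page of `(info_hash, stats)` pairs.
/// A `BTreeMap` iterates in key order, so pages come out ordered by infohash.
pub fn page(
    torrents: &BTreeMap<String, TorrentStats>,
    offset: usize,
    limit: usize,
) -> Vec<(String, TorrentStats)> {
    torrents
        .iter()
        .skip(offset) // skip the entries that belong to earlier pages
        .take(limit) // keep at most `limit` entries
        .map(|(info_hash, stats)| (info_hash.clone(), stats.clone()))
        .collect()
}
```

Because the order depends on the keys, a torrent inserted between two requests can land before the current offset and shift every later page.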

Recently, we have been working on a refactor: replacing the BTreeMap in the torrent repository with a DashMap. DashMap allows concurrent inserts, which makes the tracker faster.

The problem with DashMap is that it is a hash map, so it does not allow iterating over all torrents in a stable, sorted order. The API endpoint requires that feature for pagination.

@da2ce7 proposed using a second data structure (BoxCar), which is a "concurrent, append-only vector". That vector would include only the infohashes of all the torrents that have been announced since the tracker started running.

That's a breaking change for the API because the current implementation only returns active torrents, while the new implementation would also return "removed" or "inactive" torrents, that is, peer-less torrents. By the way, removing peer-less torrents is a configuration option you can disable; it's enabled by default. However, disabling it consumes a lot of memory because the tracker keeps a whole torrent entry structure with statistics even if the torrent does not have any peers.

It would also change the order of the results. Torrents would be ordered by insertion date instead of by infohash.
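The two behaviour changes described above (removed torrents still being listed, and results coming back in insertion order) can be sketched with standard-library stand-ins. This is only an illustration under stated assumptions: `HashMap` plays the role of `DashMap`, `Vec` plays the role of the append-only `BoxCar` vector, and `Repository` with its methods is a hypothetical name, not the tracker's real type.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical repository: a map of active torrents plus an append-only index.
#[derive(Default)]
pub struct Repository {
    /// Active torrents: infohash -> number of announces seen (simplified stats).
    entries: HashMap<String, u32>,
    /// Infohashes already recorded in `announced`, to avoid duplicates.
    seen: HashSet<String>,
    /// Every infohash announced since startup, in insertion order. Never shrinks.
    announced: Vec<String>,
}

impl Repository {
    /// Records an announce; only the first announce of a torrent extends the index.
    pub fn announce(&mut self, info_hash: &str) {
        *self.entries.entry(info_hash.to_string()).or_insert(0) += 1;
        if self.seen.insert(info_hash.to_string()) {
            self.announced.push(info_hash.to_string());
        }
    }

    /// Simplified cleanup: drops the torrent entry, but the index keeps its infohash.
    pub fn remove_peerless(&mut self, info_hash: &str) {
        self.entries.remove(info_hash);
    }

    /// One page of infohashes in insertion order, including removed torrents.
    pub fn page(&self, offset: usize, limit: usize) -> Vec<&str> {
        self.announced
            .iter()
            .skip(offset)
            .take(limit)
            .map(String::as_str)
            .collect()
    }

    /// Whether the torrent still has an entry in the map.
    pub fn is_active(&self, info_hash: &str) -> bool {
        self.entries.contains_key(info_hash)
    }
}
```

In this sketch, a torrent announced first and removed later is still the first item on the first page, which is exactly the change in results that API clients would notice.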

I think that's a change we should implement even if, in the end, we don't switch to DashMap because, with the current implementation, there is no way for the API client to be sure it has fetched all the torrents in the tracker. If you start getting the first page, new torrents can be added afterwards and the client would not know it unless it starts again from the beginning. Ideally, the client should be able to fetch only new torrents. And that's not possible right now.
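As a rough illustration of why an append-only, insertion-ordered list helps here: a position in that list never moves, so a client could remember how far it got and later ask only for what was appended since. The function below is a hypothetical sketch of that idea, not a proposed API; `fetch_since` and the cursor semantics are assumptions.

```rust
/// Hypothetical helper: returns the infohashes appended at or after `cursor`,
/// together with the cursor value the client should send next time.
pub fn fetch_since(announced: &[String], cursor: usize) -> (&[String], usize) {
    // Clamp so that a stale or oversized cursor yields an empty slice, not a panic.
    let start = cursor.min(announced.len());
    (&announced[start..], announced.len())
}
```

A client would start with a cursor of `0`, store the returned cursor, and pass it back on the next call to receive only the torrents added in between.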

Recently, I added another feature to the endpoint:

API: add scrape endpoint. #725

That allows clients to get torrents by providing a list of infohashes.
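A lookup by infohash only needs key access, not iteration, so it does not depend on the map being ordered. A minimal sketch of that kind of lookup follows. The names `TorrentStats` and `scrape` are hypothetical, and skipping unknown infohashes is an assumption of this sketch, not documented behaviour of the endpoint added in #725.

```rust
use std::collections::HashMap;

/// Swarm statistics for one torrent. Hypothetical type for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct TorrentStats {
    pub seeders: u32,
    pub completed: u32,
    pub leechers: u32,
}

/// Hypothetical helper: returns the stats of the requested torrents,
/// in request order. Unknown infohashes are skipped (an assumption).
pub fn scrape(
    torrents: &HashMap<String, TorrentStats>,
    info_hashes: &[&str],
) -> Vec<(String, TorrentStats)> {
    info_hashes
        .iter()
        .filter_map(|h| {
            // Key lookup only: no iteration over the whole map is needed.
            torrents.get(*h).map(|stats| ((*h).to_string(), stats.clone()))
        })
        .collect()
}
```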

My question is:

1. Should we keep the endpoint behaviour to get all torrents and use the insertion date order to get the torrents?
2. Or should we remove the option to get all torrents? In this case, it would be impossible to get all the torrents from clients (and also from testing code that is using this feature).

I would go with option 2 because the "get all torrents" feature is not used in production yet. In the future, we can implement it if we see that it is needed for a real use case.

NOTES:

- The current behavior is only used in testing code.
- Alternatively, we could have different endpoints. The current one would be the scrape one, and we could have an extra endpoint that returns all torrents and is only enabled when torrents are persisted. That endpoint would use the database.

cc @torrust/maintainers

josecelano · 2024-04-09T20:33:01Z

josecelano
Apr 9, 2024
Maintainer Author

There are some proposals described here: #646

