🐛 Only allow to search *owned* files in storage search endpoint #5772

@bisgaard-itis bisgaard-itis commented May 2, 2024

What do these changes do?

  • The result in #5729 ("File uploads using the new-style api are getting errors related to search for filename") shows that searching for a file with read access by its sha256 checksum is almost 200x slower than searching for it as a file with write access (the exact factor depends on how much data the user has in the db, etc.). As @sanderegg pointed out to me, the write-access search is in fact buggy: what is really searched are owned files. To make this explicit, I remove the access_rights query parameter from the endpoint
POST /v0/simcore-s3/files/metadata:search

and instead introduce the compulsory query parameter kind, which must equal owned, to clearly indicate to users of storage that only owned files are searched by this endpoint.

  • Update api-server to use this corrected endpoint
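
As a rough illustration of the contract change described in the list above, the sketch below builds the search URL with the compulsory kind=owned parameter and rejects the removed access_rights parameter. Only the endpoint path and the names kind and access_rights come from this PR. The host, the user_id and sha256_checksum parameter names, and both helper functions are assumptions made for the example, not the actual storage or api-server code.

```python
from urllib.parse import urlencode

SEARCH_PATH = "/v0/simcore-s3/files/metadata:search"  # path taken from this PR


def validate_search_query(params: dict[str, str]) -> None:
    """Hypothetical check mirroring the new contract of the endpoint."""
    if "access_rights" in params:
        raise ValueError("'access_rights' is no longer accepted")
    if params.get("kind") != "owned":
        raise ValueError("'kind' is compulsory and must equal 'owned'")


def build_search_url(base_url: str, user_id: int, sha256_checksum: str) -> str:
    """Hypothetical client helper; parameter names other than 'kind' are assumed."""
    params = {
        "user_id": str(user_id),
        "kind": "owned",  # the only accepted value
        "sha256_checksum": sha256_checksum,
    }
    validate_search_query(params)
    return f"{base_url.rstrip('/')}{SEARCH_PATH}?{urlencode(params)}"
```

A caller that still sends access_rights, or omits kind, fails fast in this sketch instead of silently getting an owned-files search under a misleading name.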

Related issue/s

How to test

Dev-ops checklist

@bisgaard-itis bisgaard-itis requested a review from pcrespov as a code owner May 2, 2024 20:27
@bisgaard-itis bisgaard-itis requested a review from sanderegg May 2, 2024 20:27
@bisgaard-itis bisgaard-itis self-assigned this May 2, 2024
@bisgaard-itis bisgaard-itis added the a:apiserver api-server service label May 2, 2024
@bisgaard-itis bisgaard-itis added this to the The Next One milestone May 2, 2024
@bisgaard-itis bisgaard-itis changed the title from "🐛 only search write access files in api-server" to "🐛 Only search write access files in api-server" May 2, 2024

codecov bot commented May 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 67.6%. Comparing base (cafbf96) to head (e00d85c).
Report is 175 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff            @@
##           master   #5772      +/-   ##
=========================================
- Coverage    84.5%   67.6%   -17.0%     
=========================================
  Files          10     666     +656     
  Lines         214   32725   +32511     
  Branches       25     205     +180     
=========================================
+ Hits          181   22133   +21952     
- Misses         23   10540   +10517     
- Partials       10      52      +42     
Flag Coverage Δ
integrationtests 63.9% <ø> (?)
unittests 89.9% <100.0%> (+5.3%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...src/simcore_service_api_server/api/routes/files.py 78.1% <100.0%> (ø)
...vice_api_server/api/routes/solvers_jobs_getters.py 92.9% <100.0%> (ø)
...src/simcore_service_api_server/services/storage.py 68.7% <100.0%> (ø)
...api_server/services/study_job_models_converters.py 95.6% <100.0%> (ø)
...src/simcore_service_storage/handlers_simcore_s3.py 100.0% <100.0%> (ø)
...ices/storage/src/simcore_service_storage/models.py 95.7% <100.0%> (ø)
...rage/src/simcore_service_storage/simcore_s3_dsm.py 94.7% <ø> (ø)

... and 666 files with indirect coverage changes

Member

@sanderegg sanderegg left a comment

Thanks a lot for checking. But tbh I don't really understand the solution. Here you are changing the logic in order to improve performance, but write access is a completely different thing from read access.
As far as I remember, @pcrespov made the "file access rights" actually read the projects' access rights, which are stored in a JSON column.
That means that if a project is shared with READ access rights, its files are also readable. If the project is shared with WRITE access rights, then you also have WRITE access to the files. Therefore I am not sure your change makes sense.
Now, the projects' access rights are a JSON column, which is for sure not efficient. Maybe we should also discuss that together with @matusdrobuliak66, as we talked about moving the JSON column to its own table, which could also vastly improve the search speed.
But a change like that might have large implications here.
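
To make the distinction in this comment concrete, here is a minimal sketch that models a project's access rights as a JSON-like mapping from group id to read/write flags and contrasts it with a plain ownership check. The data layout, the field names and both helper functions are assumptions for illustration only, not the actual storage schema or queries.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    owner_id: int
    # Assumed JSON layout: {group_id: {"read": bool, "write": bool}}
    access_rights: dict[str, dict[str, bool]] = field(default_factory=dict)


def is_owned(project: Project, user_id: int) -> bool:
    # Ownership is a direct comparison and never looks at the JSON column.
    return project.owner_id == user_id


def has_access(project: Project, user_groups: set[str], right: str) -> bool:
    # Access through sharing requires inspecting every entry of the JSON mapping.
    return any(
        flags.get(right, False)
        for group_id, flags in project.access_rights.items()
        if group_id in user_groups
    )
```

In this model a project shared read-only with a group grants "read" but not "write" through has_access, while is_owned ignores sharing entirely, so a search over owned files covers fewer files than a search over files with write access.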

@bisgaard-itis
Contributor Author

I see your point. The reason I think it probably makes sense to change to write access here is that files accessed through the api-server are (or at least that's what I believe) uploaded via the api-server, meaning the user has write access to them anyway. But I am not 💯 sure this is correct, which is why I asked the question above.

@pcrespov
Member

pcrespov commented May 3, 2024

It also makes sense to me.

Member

@pcrespov pcrespov left a comment

thx

Contributor

@matusdrobuliak66 matusdrobuliak66 left a comment

Reviewed in person

@bisgaard-itis
Contributor Author

@pcrespov @sanderegg I am re-requesting reviews because I have now changed this PR to also include changes to storage.

Member

@sanderegg sanderegg left a comment

Very nice. Please consider my comments about adding the assertion and/or the test.

@bisgaard-itis bisgaard-itis changed the title from "🐛 Only search write access files in api-server" to "🐛 Only allow to search *owned* files in storage search endpoint" May 3, 2024

sonarqubecloud bot commented May 3, 2024

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@bisgaard-itis bisgaard-itis enabled auto-merge (squash) May 3, 2024 21:04
@bisgaard-itis bisgaard-itis merged commit 5fe0a1c into ITISFoundation:master May 3, 2024
56 checks passed
@bisgaard-itis bisgaard-itis deleted the 5771-only-search-write-files-in-api-server branch May 3, 2024 21:40
@matusdrobuliak66 matusdrobuliak66 mentioned this pull request Jun 12, 2024
30 tasks
Labels
a:apiserver api-server service
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Only search write access files in api-server
5 participants