Image uploads #576
652cbf1 to ab74ed6
I've decided to just drop the unique index, so no other changes are required. This might look weird in the schema now that an item can have a dedicated upload id while multiple uploads can point to it. The row in … But this was already the case. So I think this is OK in order to guarantee 100% backwards compatibility, and if we don't like it, we can change it in another PR. I hope dropping this index doesn't slow down any queries; I'll add checking that to the TODO list.
Regarding:
We pay $0.023 per GB per month (for the first 50 TB), which is still quite cheap. I also realized that it doesn't make sense to limit the number of images; only the size should matter. For better numbers and to limit storage per stacker, I'm therefore thinking about using the following algorithm to calculate the cost per image:

```javascript
// factor for bytes to megabyte
const MB = 1024 * 1024
// factor for sats to msats
const SATS = 1000

// datePivot is a date helper defined elsewhere in the codebase (import not shown)
async function uploadCosts (models, userId, photoId, size) {
  const { _sum: { size: sumSize } } = await models.upload.aggregate({
    _sum: { size: true },
    where: { userId, createdAt: { gt: datePivot(new Date(), { days: -1 }) } }
  })
  // _sum.size is null if there were no uploads in the last 24 hours;
  // assume the image was already uploaded in the calculation
  const totalSize = (sumSize ?? 0) + size
  if (totalSize <= 5 * MB) {
    return 0 * SATS
  }
  if (totalSize <= 10 * MB) {
    return 10 * SATS
  }
  if (totalSize <= 25 * MB) {
    return 100 * SATS
  }
  if (totalSize <= 100 * MB) {
    return 1000 * SATS
  }
  // check return value and throw error on -1?
  // or maybe not use -1 since in case of a bug, we might actually pay stackers to upload images, lol
  return -1
}
```
Edit: Actually, forget that, proceed as you want. I'm probably lacking context and I should just let you do your thing.
I'm not sure if you get re-notified on edits, but I edited my comment above to say "ignore me, never mind."
No, I don't get new notifications per edit, and that's good because otherwise you would have gotten a ton of notifications from me editing all my posts on GitHub, haha. Even though you said I should ignore what you said, I want to respond anyway since I think these are good points and it's good to have them written down even if we already agree. Also, I can provide the context you might be missing:
The question I'm trying to answer is when it starts to resemble abuse. Is it 25MB per day? 100MB per day? 1GB per day? Or something completely different? And that's even ignoring that there is no protection against just spinning up new accounts. I was also wondering about anon image uploads. That's going to be the trickiest UX-wise, I think, since anons shouldn't see each other's unsubmitted images, but what if they pay and then forget to submit the image? The former is more for UX reasons than privacy reasons though, since the images are public anyway.
Yes, that's why I decided to do fee escalation in levels, where the total size of the uploaded images determines the level. So not a constant rate in sats/MB (or even sats/B), but an escalating rate with a free daily quota. I think if we do it like this instead of per image count, most stackers will probably never reach the first level with fees.
To zoom out a bit, how useful is it to showcase unsubmitted images? If we suppose it's not very useful, then we can just get rid of it for everyone, solving this problem. Re: UX, I think the way you've handled the UX is really nice in many ways, but could a GitHub-like UX be better? With your current UX, it's a two-step process of (1) uploading and then (2) linking. GitHub's is a one-step process. Are there good reasons not to do it that way? I think this ties in somewhat with the unsubmitted issue above.
The only reason I'm doing it this way is that if they pay, they pay immediately, not only when they submit an image. If we do it like GitHub, stackers might accidentally delete the link. Then they've wasted sats because they no longer know which link they paid for. So I show them all images they haven't submitted yet, including the ability to get a refund if they delete the image. And I went with paying immediately for the presigned URL since otherwise it seemed unnecessarily complex to detect whether URLs just weren't submitted yet or whether someone was abusing our storage without paying for it. So I could implement the payment logic within … And I wanted to keep client uploads with presigned URLs so as not to diverge too far from what we're already doing with avatars.
I see. I'm not sure if what I was thinking makes sense in light of some of these problems, but here's how I always thought it would work:
We could even be more or less aggressive with the deletion timing depending on other factors like how often the stacker does this, or how much trust they have.
Can you elaborate on the difficulties of securing images this way? To be clear, the way you've done it is really nice. It's just trading some UX for other things and I want to make sure we really need the other things.
That's also how I imagined it to work initially, and I would actually love to do it like this. But then I thought about how I would abuse this: I would upload as many images as I can with zero variable costs to me. This ignores bandwidth and other factors since, most likely, an attacker already pays for those with a flat fee, so their cost is independent of how much of these resources they actually use. (You could even argue that not using these resources to attack is the actual cost because of opportunity costs, lol.)

Yes, we will delete them at some point if they weren't used to post on SN, but the damage might already have been done in the form of storage costs or known and unknown unknowns. And yes, this attack might be more theoretical, since what exactly does the adversary have to gain? Being able to host a lot of images for a limited amount of time? I'm not sure that's a valuable goal, but I wanted to err on the side of caution here, basically for "fear of the unknown" reasons. So if it's possible to completely shut down such attacks by introducing variable costs, why not do it?

But thanks to the discussion with you, it's now clear how much UX I'm trading for security. UX problems if we introduce immediate variable costs (pay per presigned URL):
Also, since we want to allow refunds (imo), this actually introduces a new, potentially even more serious risk: we add code which increases the balance of users. So instead of potentially being able to store a lot of images, in case of a bug, it would now be possible to steal funds, lol. So after exploring this "pay per presigned URL" approach enough, I would also prefer the "GitHub way". But I just want to mention that:
Is not inherent to the current approach. I could immediately link them on upload and show them as unsubmitted. It's just work I haven't put in yet, and I wasn't 100% sure it's worth the extra cursor-tracking code (I think this means we need to refactor some stuff so we can reuse existing cursor-tracking code to easily paste content at the cursor). Funnily, my showcase actually demonstrates the UX problem quite well, since I clicked on "reply" but hadn't linked the image yet, haha. So I'll use this approach now: stackers don't have to pay for uploads, and we can just paste the image link without the UX risk of stackers wasting their sats. If they accidentally delete the link, they can just do a new upload for free. And then during …
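Pasting at the cursor mostly comes down to splicing text at the textarea's current selection. A minimal sketch, assuming a plain `<textarea>`; `insertAtCursor` and `insertIntoTextarea` are hypothetical names, not existing SN helpers:

```javascript
// Hypothetical helper: insert `snippet` into `value`, replacing the current
// selection [selectionStart, selectionEnd), and report where the caret should go.
function insertAtCursor (value, selectionStart, selectionEnd, snippet) {
  const before = value.slice(0, selectionStart)
  const after = value.slice(selectionEnd)
  return {
    value: before + snippet + after,
    // place the caret right after the inserted snippet
    cursor: selectionStart + snippet.length
  }
}

// Hypothetical wiring to a <textarea> (browser only): read the selection,
// update the value, then restore the caret after the inserted text.
function insertIntoTextarea (textarea, snippet) {
  const { value, cursor } = insertAtCursor(
    textarea.value, textarea.selectionStart, textarea.selectionEnd, snippet)
  textarea.value = value
  textarea.setSelectionRange(cursor, cursor)
}
```

With a controlled input (e.g. a form library managing the field value), you would set the new value through the library's setter instead of assigning `textarea.value` directly, then restore the caret afterwards.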
Only two reasons:
Your initial approach is really nice and demonstrates your ability to think through all the adversary's moves. Regardless of the exact approach we take (here or elsewhere) we need to forever be capable of doing this exercise. In our efforts to starve our enemies though, we have to be careful to not starve our allies.
Cool, I do think this is the right approach. I think we can make the incentives for abuse low enough that we don't have to be too concerned about taking upfront variable fees. If we're wrong, we can reconsider. Whenever introducing fees comes up, I think of this coin-operated park bench. Adding fees should strictly improve the UX of SN. If it's not clear we're making SN better by introducing a fee, I'd rather we get some real-life experience with the fee-free version (which should make it clear whether the fee is needed). In the general case, fees make SN's UX worse, so it's worth being very selective with them. I should probably start writing some of these principles down.
b0f5914 to d679485
This is done now. Only testing and code cleanup is left (see PR description). (I'll keep this in draft until I've tested and refactored the code. Will do this tomorrow.)
2023-10-26.02-23-39.mp4
2023-10-26.02-35-33.mp4
2023-10-26.03-03-22.mp4
2023-10-26.03-13-26.mp4
I think the main area that can be improved is the UX around image fees. I currently don't show in the receipts when fees per image will start or rise. I made image fees depend on size (so a specific total size must be reached before the next level of fees per image applies) since I think this way, most stackers won't exhaust their free daily quota, compared to a free quota based on the number of images. Also, if we limited by the number of images, using fees as a defense against abuse wouldn't really work. I update image fees in the frontend on upload (images were added) and on text input blur (images might have been removed). Checking on blur whether images were removed is not ideal. Maybe I should update fees on every text change, but debounced?

Edit: I noticed some image fees are not properly shown in the frontend. For example, in the second video, the frontend shows 40 sats but then 100 sats are deducted. I need to look into this and created a TODO in the PR description.
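Debouncing the fee refresh on text change could look like the sketch below: a generic trailing-edge debounce. `updateImageFees` in the usage comment is a hypothetical stand-in for whatever request refreshes the fee info.

```javascript
// Generic trailing-edge debounce: `fn` runs once, `waitMs` after the last call.
function debounce (fn, waitMs) {
  let timer
  const debounced = (...args) => {
    clearTimeout(timer)
    timer = setTimeout(() => fn(...args), waitMs)
  }
  // lets callers (e.g. an effect cleanup on unmount) drop a pending call
  debounced.cancel = () => clearTimeout(timer)
  return debounced
}

// Hypothetical usage: refresh the fee info at most once per second of typing pause.
// const onTextChange = debounce(text => updateImageFees(text), 1000)
```

In a React component, the debounced function has to be created once (e.g. with `useMemo` or `useRef`) and cancelled on unmount; otherwise every render creates a fresh timer and nothing is actually debounced.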
```jsx
<tr>
  <td>{numWithUnits(0, { abbreviate: false })}</td>
  <td align='right' className='font-weight-light'>edit fee</td>
</tr>
```
Added `0 sats edit fee` so I can use `+ X sats imageFees`. ... Also, maybe I shouldn't have removed the usage of `paidSats` in the edit receipt.
Example receipts:
reply:
edit:
I realized `paidSats` wasn't used before since `addImgLink` was not used. So I think it's not a breaking UX change. I think it's fine to mention that editing itself does not cost anything but that fees were added because of images.
What might be confusing, though, is why images suddenly cost something, since they don't cost anything until 10 MB have been uploaded within the last 24 hours. There is no mention of this in the receipt.
I played around a bit with a ProgressBar to fix this, but that also didn't feel right. Maybe mentioning it in the FAQ is enough? Or an info box in the receipt? But then we'd have an info box inside an info box, lol.
Using … We could disable the reply button until the image fee info has been updated, similar to how the dupe check works for links (…). But I think disabling the reply button after one second for every text change is going to be annoying. I'm thinking about a pure client-side implementation of the fee calculation so we don't need to make a request at all. We can probably assume that the user is not typing multiple items at once, so during one "item session", the state of unpaid images does not change. Paid images always stay paid.
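A client-side calculation along those lines could take a snapshot once per item session and price only the unpaid images referenced in the text. This is a sketch under assumptions: the snapshot shape (`paidIds`, `sizes`, `paidBytes24h`) is hypothetical, only the 10 MB free quota comes from the discussion above, and the higher tiers are placeholders.

```javascript
const MB = 1024 * 1024

// Illustrative levels only: 10 MB free per 24h (as mentioned above);
// the higher tiers are placeholders, not SN's actual values.
const LEVELS = [
  { upToBytes: 10 * MB, sats: 0 },
  { upToBytes: 25 * MB, sats: 10 },
  { upToBytes: 50 * MB, sats: 100 },
  { upToBytes: 100 * MB, sats: 1000 }
]

// Price one image given the 24h total *including* that image.
function satsForTotal (totalBytes, levels = LEVELS) {
  const level = levels.find(l => totalBytes <= l.upToBytes)
  if (!level) throw new Error('daily upload limit exceeded')
  return level.sats
}

// snapshot (hypothetical, fetched once per item session):
//   { paidIds: Set<id>, sizes: Map<id, bytes>, paidBytes24h: number }
// idsInText: upload ids currently referenced in the text, in order of appearance
function clientImageFees (snapshot, idsInText, levels = LEVELS) {
  let total = snapshot.paidBytes24h
  let sats = 0
  // dedupe and skip images that were already paid for
  const unpaid = [...new Set(idsInText)].filter(id => !snapshot.paidIds.has(id))
  for (const id of unpaid) {
    const size = snapshot.sizes.get(id)
    if (size === undefined) throw new Error(`unknown upload ${id}`)
    total += size
    sats += satsForTotal(total, levels)
  }
  return { unpaid, sats }
}
```

The total can depend on the order in which images are counted, so the client and server would have to agree on one order (here: first appearance in the text), and the server-side calculation stays authoritative.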
Putting this out of draft now since only this and the missing info in the receipts are left, and I believe these aren't blocking UX issues. They can be handled in a follow-up PR imo.
* also update schedule from every minute to every hour
Why'd you remove …? It seems like useful metadata to keep. Imagine we have a profile tab for …
Because I realized that for the fee calculation, I only needed to know whether it was already paid, not in which items it was linked. Previously, the column wasn't used and it was only a one-to-one relationship. To keep such a column, it has to become a many-to-many relationship: one item can have multiple uploads, and the same upload can be included in multiple items. So I decided to replace the column with a simple `paid` column. In your commit, this line replaces any previously existing upload id with the new one, so it doesn't track all links, only the latest link from an upload to an item.
I've updated your code to keep `itemId`s on `Upload`, then realized your code has bugs. 😄
Please fix the bugs on top of my changes.
To speak broadly on changes: when we change something generic, we need to consider everything else that uses that code. `Form` is such a generic thing. Ideally, updates to `Form` should be made in a way that they have no effect on existing things that use it.
When I can't avoid making fundamental changes to a piece of code meant to be generic, I search for everything that uses it, walk through each use to confirm it still works logically after the changes, and manually test it in case I did something unintentional.
Spaghetti code happens when we fail to make our changes isolated and work with generic parts.
This means we lose the original …
It also breaks inserting two images into the same post because of the unique index. So I think we should replace this column, since it's not complete anyway, and create a table to maintain this many-to-many relationship.
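With a join table, editing an item becomes a set difference between the upload ids linked before and after the edit. A minimal sketch, assuming a hypothetical join table keyed by `(itemId, uploadId)`; `diffItemUploads` only computes the rows to insert and delete and leaves the actual queries to the caller:

```javascript
// Given the upload ids linked to an item before and after an edit, compute
// which (itemId, uploadId) rows a hypothetical join table needs to gain or lose.
function diffItemUploads (itemId, beforeIds, afterIds) {
  const before = new Set(beforeIds)
  const after = new Set(afterIds)
  return {
    // links that are new in this version of the item
    toInsert: [...after].filter(id => !before.has(id)).map(uploadId => ({ itemId, uploadId })),
    // links whose image was removed from the item text
    toDelete: [...before].filter(id => !after.has(id)).map(uploadId => ({ itemId, uploadId }))
  }
}
```

A composite primary key on `(itemId, uploadId)` would still allow many uploads per item and many items per upload, while preventing the same link from being stored twice.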
Oh, I see! Edit: I'll fix my own changes.
Sorry if I seem impatient. Totally not personal. I've just been in a bit of a mood all day and I left myself a bunch of code I promised to review 😄 I'll give your code another shot and add the many-to-many relationship. Thanks!
my todos:
This is excellent. Can you walk me through why we have static avatar upload ids again? It's causing issues for me possibly because my bucket is configured slightly differently, also see below, and I'm wondering if we can figure out a way to not need to be conscious of caching (I know you know how hard it is to debug that stuff). Under the following circumstance, won't I see stale profile images?
Reasons I can imagine we have static avatar ids (with possible alternatives):
But I might be missing something.
Other than that, this is good to go! I haven't tested the fee part yet, but we have some other pending PRs that involve |
Mhh, I didn't test it with two users, only with one. I expected it not to work even with one user since, on page reload, the browser would fetch the URL without the random parameter and thus might use the cached one, as you mentioned with …
But it didn't; it used the up-to-date one even though there were …
And maybe testing this stuff out locally is irrelevant since we use a CDN in production and that's where the caching issues will appear. Not on my local build which talks straight to S3.
Yes, the reason for static avatar ids was to keep them free, which means the storage they can occupy has to be limited. Making sure they aren't deleted is handled by only deleting the rows with … To prevent any cache issues, we could give them a new id on every upload but limit the amount of free storage in other, free ways, like applying time limits?
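One way to keep static avatar ids but still avoid stale images is to version the URL rather than the object. A sketch, assuming the CDN includes the query string in its cache key (this has to be configured on the CDN) and using a hypothetical `version` value such as the upload row's update timestamp; `avatarUrl` is a hypothetical helper:

```javascript
// Hypothetical helper: static avatar key plus a version query parameter, so each
// re-upload yields a URL that browser and CDN caches have never seen before.
function avatarUrl (mediaBaseUrl, photoId, version) {
  // mediaBaseUrl must end with '/', otherwise its last path segment is replaced
  const url = new URL(String(photoId), mediaBaseUrl)
  url.searchParams.set('v', String(version))
  return url.toString()
}

// e.g. avatarUrl('https://media.example.com/', photoId, upload.updatedAt.getTime())
```

Old versions stay cached under their old URLs, but since nothing links to them anymore, stackers only ever see the latest avatar.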
To not lose my code again, here is the WIP code.
This PR adds the following button:
button to upload images
The button opens a file explorer. When you've selected an image, it shows up beneath the text input for you to insert into the text:
unsubmitted images show up beneath text input
This works by requesting a presigned AWS S3 URL on image select, immediately uploading the image on the client to S3 (similar to how avatar upload works) and then storing the URL to the image in a state array to keep track of submitted and unsubmitted image URLs.
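The client-side part of that flow can be sketched as follows, assuming a presigned POST (a URL plus form fields, which is what S3's `createPresignedPost` returns). `requestPresignedPost` is a hypothetical stand-in for the mutation that returns them, and `fields.key` is assumed to be the S3 key, i.e. the upload id:

```javascript
// `requestPresignedPost` is hypothetical: it asks the backend for { url, fields }.
// `fetchImpl` is injectable for testing; the browser's global fetch is the default.
async function uploadImage (file, requestPresignedPost, fetchImpl = fetch) {
  const { url, fields } = await requestPresignedPost({ type: file.type, size: file.size })
  const form = new FormData()
  // S3 expects all policy fields before the file itself
  for (const [name, value] of Object.entries(fields)) form.append(name, value)
  form.append('file', file)
  const res = await fetchImpl(url, { method: 'POST', body: form })
  if (!res.ok) throw new Error(`upload failed with status ${res.status}`)
  // the S3 key doubles as the upload id referenced in image URLs
  return fields.key
}
```

With a presigned PUT URL instead, the file would be sent directly as the request body rather than as a multipart form field.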
An image URL is submitted if it's included in an item. We can check whether an image URL is included in an item by looking at the `Upload.itemId` column of the row with `Upload.id = <s3_key>`. This column is populated by the worker during the imgproxy job: if it encounters an S3 URL in an item, it parses the S3 key from the URL and updates the corresponding upload row with the id of the item that the imgproxy job is currently processing. `Upload.itemId` therefore also allows us to show the stacker which images they haven't used yet. This is implemented with a new resolver `User.images`.
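The parsing step the worker performs can be sketched as extracting upload ids from the item text. This assumes image URLs of the hypothetical form `${mediaBaseUrl}/${uploadId}` with numeric ids; the real URL format may differ, and `extractUploadIds` is a hypothetical name:

```javascript
// Escape a string for literal use inside a RegExp.
const escapeRegExp = s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')

// Collect the distinct numeric upload ids referenced in `text`, assuming image URLs
// have the hypothetical form `${mediaBaseUrl}/${uploadId}`.
function extractUploadIds (text, mediaBaseUrl) {
  const base = escapeRegExp(mediaBaseUrl.replace(/\/+$/, ''))
  // (?!\w) rejects ids glued to further word characters, e.g. ".../7abc"
  const re = new RegExp(`${base}/(\\d+)(?!\\w)`, 'g')
  const ids = new Set()
  for (const match of text.matchAll(re)) ids.add(Number(match[1]))
  return [...ids]
}
```

Each id found this way would identify the `Upload` row to update with the id of the item being processed.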
To avoid running unnecessary database queries by simply adding this to the `ME` query, which polls the database every second, I added a new provider `ImagesProvider` which fetches this once on session init.

When posting something (comment, discussion, link, ...), I used the `onCompleted` callback to update the state on the client.

That's the current state. Simple showcase:
2023-10-19.20-16-06.mp4
TODO:
- … `Upload`
  I searched for `Upload` (case-sensitive) and `"Upload"` (since the table might be mentioned using `"`) and `models.update`, but I haven't found anything of interest. So I think this table hasn't been used much so far.
- There can be multiple uploads (= multiple images) per item now. However, this requires a data migration since jobs currently rely on there being exactly one upload per item via `Item.uploadId`.
  I have decided to only drop the unique index.
  desired schema diff
  By the way, afaict, `Upload.itemid` was not used before. Possibly just a relic of a previous approach which wasn't pursued further but was replaced with `Item.uploadId`?
- refund fees if image is deleted (which is only possible when it wasn't submitted / used in an item yet)

As the code currently stands, stackers could upload an unlimited number of images to S3 for free. I thought about giving stackers some free images per day so that they don't have to think about whether uploading an image is worth the sats every time. However, while writing this and trying to explain my reasoning, I realized it might actually be easier to code if every upload costs sats instead of the other way around, because there are fewer edge cases, lol
So image uploading could work similarly to fee escalation for posting, just with a different slope. It could even be a staircase pattern: the first 5 images per day cost 1 sat each, the next 5 cost 10 sats, the next 5 cost 100 sats, etc.
The fee should also depend on the size, however. So your first 5 images OR your first 5 MB cost 1 sat per image, whichever limit is hit first. Once you reach 5 MB, you get escalated to the next level, which is 10 sats (the image you currently want to upload should already be included in this calculation, so you can't upload 100 MB and still pay 1 sat if you haven't uploaded 5 MB yet; a common programming mistake).
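The staircase described above can be sketched as below. The step sizes of 5 images and 5 MB come from the proposal; that the size limit keeps stepping by 5 MB per level is an assumption, and `nextUploadSats` is a hypothetical name:

```javascript
const MB = 1024 * 1024
const STEP_COUNT = 5 // images per level (from the proposal above)
const STEP_BYTES = 5 * MB // bytes per level (assumed to keep stepping by 5 MB)

// Sats for the next upload, given today's prior uploads and the new image's size.
function nextUploadSats (priorCount, priorBytes, size) {
  // include the image being uploaded, so a single huge upload
  // cannot slip through at the cheapest level
  const count = priorCount + 1
  const bytes = priorBytes + size
  const byCount = Math.floor((count - 1) / STEP_COUNT)
  const bySize = Math.floor((bytes - 1) / STEP_BYTES)
  // "whichever is hit first" means the higher of the two levels wins
  const level = Math.max(byCount, bySize, 0)
  return 10 ** level // 1, 10, 100, ... sats per image
}
```

Since the price grows as 10^level without bound, a real implementation would also need a hard cap that rejects the upload rather than a special return value.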
However, avatar uploading should still be free and thus an exception imo. This brings me to the next point:
- use trash instead of cross icon for delete
  - but make sure delete only works if the URLs weren't submitted already! Otherwise there will be 404s for images in items
- use FileReader API to show progress bar while uploading (note: FileReader progress events track reading the local file; network upload progress comes from `XMLHttpRequest.upload` progress events)
- make sure anons have to pay per image (1000 sats per image?)
  - but submitted items should only show within a session, not across all anon sessions. But this opens up bad UX if an anon pays for an image but then doesn't include it in an item (= unsubmitted) and then forgets the image URL. mhhh 🤔
- … `<Upload>` in components/upload.js and `<ImageUpload>` in components/image.js

New TODOs because of this conclusion:
- refactor image queries into own wrapper like `serializeInvoiceable`: `serializeImages`?
  image queries are now run inside `create_item` and `update_item`
- run image queries in same transaction as `create_item` ... need to access item id somehow though
  image queries are now run inside `create_item` and `update_item`
Description of how the new approach works:
- `[Uploading <file>...]()` is pasted into the text input like in GitHub
- … `image_fees_info`.
- `image_fees_info` is called inside `create_item` to deduct image fees and update rows in the `Upload` table to mark images as paid (images can be reused in multiple items but only need to be paid once)
- `image_fees_info` is called inside `update_item` to deduct image fees
- `image_fees_info` knows which images were paid and which weren't using the `Upload.paid` column
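Swapping the placeholder for the final link once the upload finishes could look like this. It is a sketch using the placeholder format above; `resolvePlaceholder` is hypothetical, and the `![name](url)` image syntax for the final link is an assumption:

```javascript
// Placeholder text inserted while the upload is in flight (format from above).
const placeholder = name => `[Uploading ${name}...]()`

// Swap the first matching placeholder for the final image link. If the stacker
// already deleted the placeholder, the text is returned unchanged.
function resolvePlaceholder (text, name, url) {
  // a replacement function keeps `$` in names/URLs from being read as patterns
  return text.replace(placeholder(name), () => `![${name}](${url})`)
}
```

With a string pattern, only the first matching placeholder is replaced, so two in-flight uploads with the same file name resolve in the order they finish.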