Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Images v2 #513

Merged
merged 19 commits into from
Oct 1, 2023
Merged

Images v2 #513

merged 19 commits into from
Oct 1, 2023

Conversation

ekzyis
Copy link
Member

@ekzyis ekzyis commented Sep 21, 2023

Depends on #500

Fixes #451

TODO:

  • add placeholder image while images are not processed by worker yet (condition imgproxyUrls IS NULL)
  • fix fullscreen stays open on navigation
  • fix alignment of open original / open in new tab in fullscreen mode
  • image upload (maybe in next PR?) in next PR
  • add script to process existing items
  • evaluate lazy loading of images and add if considered useful -> evaluation: don't use it

  • add video to demonstrate all changes related to images
  • improve accuracy of script by adding known positives (image hosting services like https://i.postimg.cc/) and known negatives (links to tweets etc.). currently, there might be false negatives for image detection because of timeouts (rate limiting?)
  • make sure GIFs are not cut off if possible - use this GIF for testing
  • fixme: privacy setting is not respected since we load the original image anyway to check if it's an image. only afterwards, we check if the original image should be displayed

videos:

if "click to load external images" is enabled, there will be a placeholder for unprocessed images. on click, it loads the original image: https://files.ekzyis.com/public/sn/images_v2_processing_click_to_load.mp4

if "click to load external images" is not enabled, it will automatically load the original url if images are not processed yet (imgproxyUrls is falsy): https://files.ekzyis.com/public/sn/images_v2_processing_automatically_load_original.mp4

image processing is done in worker + responsive images: https://files.ekzyis.com/public/sn/images_v2_processing_in_worker.mp4
(note: it will always load the best resolution if it was already cached)

on imgproxy errors, it will show "click to load image" (if privacy setting enabled) or automatically load the original image (if privacy setting not enabled): https://files.ekzyis.com/public/sn/images_v2_imgproxy_error.mp4


TODO:

  • replace regexp usage with hardcoded list of known image hosting services and known not-image hosts, based on our existing data
  • sign URLs in the database path for these known image hosting services to improve UX (no processing delay) decided not to since that would only be partial processing which breaks frontend assumptions about when items have been processed by the worker

@ekzyis ekzyis added the feature new product features that weren't there before label Sep 21, 2023
@ekzyis ekzyis marked this pull request as draft September 21, 2023 23:54
@ekzyis ekzyis force-pushed the images-v2 branch 2 times, most recently from 0837930 to de79ccd Compare September 23, 2023 19:59
@ekzyis
Copy link
Member Author

ekzyis commented Sep 23, 2023

I looked at https://web.dev/browser-level-image-lazy-loading and I think that's not something we need at the moment.

  • For example, the thresholds can be quite large:

On fast connections (4G), we reduced Chrome's distance-from-viewport thresholds from 3000px to 1250px and on slower connections (3G or lower), changed the threshold from 4000px to 2500px. This change achieves two things:

But I tested on https://stacker.news/items/178769 (with "click to load external images" turned off) which has a height of 10,000 pixels and all images were loaded nonetheless in Brave.

@huumn
Copy link
Member

huumn commented Sep 23, 2023

I think that's not something we need at the moment.

Agreed. Premature

@ekzyis
Copy link
Member Author

ekzyis commented Sep 25, 2023

TODO:

  • add video to demonstrate all changes related to images
  • improve accuracy of script by adding known positives (image hosting services like https://i.postimg.cc/) and known negatives (links to tweets etc.). currently, there might be false negatives for image detection because of timeouts (rate limiting?)
  • make sure GIFs are not cut off if possible - use this GIF for testing

FIXME:

  • privacy setting is not respected since we load the original image anyway to check if it's an image. only afterwards, we check if the original image should be displayed

@ekzyis
Copy link
Member Author

ekzyis commented Sep 25, 2023

Only thing left to do is to fix the fullscreen CSS (if we don't want to ship as is):

Screenshots from desktop and mobile which show the misalignment of the link to the original (bottom left corner):

2023-09-25-225514_1920x1080_scrot

sn ekzyis com_items_187221(iPhone SE)

I wanted to have the link aligned with the image. So on desktop it's too far to the left and on mobile, it's too far below the image.

@ekzyis ekzyis marked this pull request as ready for review September 25, 2023 23:58
@ekzyis ekzyis marked this pull request as draft September 26, 2023 11:49
@ekzyis
Copy link
Member Author

ekzyis commented Sep 26, 2023

Collected metrics using this patch:

commit 2357be4754ace7611cba91804a19e95f0e9ea97e
Author: ekzyis <[email protected]>
Date:   Tue Sep 26 12:12:31 2023 +0200

    Collect URL metrics during imgproxy processing

diff --git a/lib/url.js b/lib/url.js
index 14f7a6f..c32f3be 100644
--- a/lib/url.js
+++ b/lib/url.js
@@ -1,7 +1,7 @@
 export function ensureProtocol (value) {
   if (!value) return value
   value = value.trim()
-  if (!/^([a-z0-9]+:\/\/|mailto:)/.test(value)) {
+  if (!/^([a-z0-9]+:\/\/|(mailto|data):)/.test(value)) {
     value = 'http://' + value
   }
   return value
diff --git a/worker/imgproxy.js b/worker/imgproxy.js
index 4ca45c6..5e83555 100644
--- a/worker/imgproxy.js
+++ b/worker/imgproxy.js
@@ -1,5 +1,6 @@
 import { createHmac } from 'node:crypto'
 import { extractUrls } from '../lib/md.js'
+import { ensureProtocol } from '../lib/url.js'
 
 const imgProxyEnabled = process.env.NODE_ENV === 'production' ||
   (process.env.NEXT_PUBLIC_IMGPROXY_URL && process.env.IMGPROXY_SALT && process.env.IMGPROXY_KEY)
@@ -13,30 +14,7 @@ const IMGPROXY_SALT = process.env.IMGPROXY_SALT
 const IMGPROXY_KEY = process.env.IMGPROXY_KEY
 
 const cache = new Map()
-
-const knownPositives = [
-  /\.(jpe?g|png|gif|webp|avif)$/,
-  /^https:\/\/i\.postimg\.cc\//,
-  /^https:\/\/i\.imgflip\.com\//,
-  /^https:\/\/i\.imgur\.com\//,
-  /^https:\/\/pbs\.twimg\.com\//,
-  /^https:\/\/www\.zapread\.com\/i\//,
-  /^https:\/\/substackcdn\.com\/image/,
-]
-const knownNegatives = [
-  /^https:\/\/(twitter\.com|x\.com|nitter\.(net|it|at))\/\w+\/status/,
-  /^https:\/\/postimg\.cc/,
-  /^https:\/\/imgur\.com/,
-  /^https:\/\/youtu\.be/,
-  /^https:\/\/(www\.)?youtube\.com/,
-  /^mailto:/,
-  /^https:\/\/stacker\.news\/items/,
-  /^https:\/\/news\.ycombinator\.com\/(item|user)\?id=/,
-  /^https:\/\/\w+\.substack.com/,
-  /^http:\/\/\w+\.onion/,
-  /^http:\/\/nitter\.priv\.loki/,
-  /^http:\/\/\w+\.b32\.i2p/,
-]
+export const urlMetrics = {}
 
 function decodeOriginalUrl (imgproxyUrl) {
   const parts = imgproxyUrl.split('/')
@@ -131,11 +109,54 @@ async function fetchWithTimeout (resource, { timeout = 1000, ...options } = {})
   return response
 }
 
+function countMetric(url, metric) {
+  let k1, k2, m1, m2
+  try {
+    const u = new URL(ensureProtocol(url))
+    const path = u.pathname.split('/').slice(0,-1).join('/')
+    k1 = u.host
+    k2 = u.host + path
+    m1 = urlMetrics[k1] || {}
+    m2 = urlMetrics[k2] || {}
+  } catch(err) {
+    const m = urlMetrics[url] || {}
+    urlMetrics[url] = Object.assign(m, { err: (m.err || 0) + 1})
+    return
+  }
+  if (metric === 'TOTAL') {
+    urlMetrics[k1] = Object.assign(m1, { total: (m1.total || 0) + 1})
+    urlMetrics[k2] = Object.assign(m2, { total: (m2.total || 0) + 1})
+  }
+  else if (metric === 'HEAD_ERR') {
+    urlMetrics[k1] = Object.assign(m1, { headErr: (m1.headErr || 0) + 1})
+    urlMetrics[k2] = Object.assign(m2, { err: (m2.err || 0) + 1})
+  }
+  else if (metric === 'HEAD_TRUE') {
+    urlMetrics[k1] = Object.assign(m1, { headTrue: (m1.headTrue || 0) + 1})
+    urlMetrics[k2] = Object.assign(m2, { err: (m2.err || 0) + 1})
+  }
+  else if (metric === 'HEAD_FALSE') {
+    urlMetrics[k1] = Object.assign(m1, { headFalse: (m1.headFalse || 0) + 1})
+    urlMetrics[k2] = Object.assign(m2, { err: (m2.err || 0) + 1})
+  }
+  else if (metric === 'GET_ERR') {
+    urlMetrics[k1] = Object.assign(m1, { getErr: (m1.getErr || 0) + 1})
+    urlMetrics[k2] = Object.assign(m2, { err: (m2.err || 0) + 1})
+  }
+  else if (metric === 'GET_TRUE') {
+    urlMetrics[k1] = Object.assign(m1, { getTrue: (m1.getTrue || 0) + 1})
+    urlMetrics[k2] = Object.assign(m2, { err: (m2.err || 0) + 1})
+  }
+  else if (metric === 'GET_FALSE') {
+    urlMetrics[k1] = Object.assign(m1, { getFalse: (m1.getFalse || 0) + 1})
+    urlMetrics[k2] = Object.assign(m2, { err: (m2.err || 0) + 1})
+  }
+}
+
 const isImageURL = async url => {
   if (cache.has(url)) return cache.get(url)
 
-  if (knownPositives.some(regexp => regexp.test(url))) return true
-  if (knownNegatives.some(regexp => regexp.test(url))) return false
+  countMetric(url, 'TOTAL')
 
   let isImage
 
@@ -146,6 +167,7 @@ const isImageURL = async url => {
     const buf = await res.blob()
     isImage = buf.type.startsWith('image/')
   } catch (err) {
+    countMetric(url, 'HEAD_ERR')
     console.log(url, err)
   }
 
@@ -153,8 +175,10 @@ const isImageURL = async url => {
   // However, negatives may be false negatives
   if (isImage) {
     cache.set(url, true)
+    countMetric(url, 'HEAD_TRUE')
     return true
   }
+  countMetric(url, 'HEAD_FALSE')
 
   // if not known yet, run GET request with longer timeout
   try {
@@ -162,8 +186,10 @@ const isImageURL = async url => {
     const buf = await res.blob()
     isImage = buf.type.startsWith('image/')
   } catch (err) {
+    countMetric(url, 'GET_ERR')
     console.log(url, err)
   }
+  countMetric(url, isImage ? 'GET_TRUE' : 'GET_FALSE')
 
   cache.set(url, isImage)
   return isImage

Log: https://files.ekzyis.com/public/sn/imgproxy_processing.txt
Raw data: https://files.ekzyis.com/public/sn/sn_urlmetrics.json
Parsed as table: https://stacker.news/items/267200

Table with hit count:

hit means that link was found to be an image

Rank Domain Image Hits Total Hit Percentage Total Percentage
1 i.postimg.cc 2,266 2,279 99.43 30.49
2 pbs.twimg.com 560 617 90.76 7.54
3 i.ibb.co 331 335 98.81 4.45
4 nostr.build 292 340 85.88 3.93
5 www.zapread.com 263 285 92.28 3.54
6 cdn.nostr.build 248 255 97.25 3.34
7 i.imgflip.com 169 347 48.70 2.27
8 nitter.at 135 721 18.72 1.82
9 imagedelivery.net 103 104 99.04 1.39
10 media.tenor.com 91 93 97.85 1.22
11 bitcoinscoresby.com 78 86 90.70 1.05
12 upload.wikimedia.org 76 78 97.44 1.02
13 image.nostr.build 69 137 50.36 0.93
14 i.redd.it 58 117 49.57 0.78
15 files.peakd.com 54 55 98.18 0.73
16 preview.redd.it 53 107 49.53 0.71
17 void.cat 45 52 86.54 0.61
18 gcdnb.pbrd.co 43 71 60.56 0.58
19 www.ft.com 41 125 32.80 0.55
20 substackcdn.com 35 36 97.22 0.47
21 i.kym-cdn.com 34 35 97.14 0.46

@ekzyis ekzyis marked this pull request as ready for review September 26, 2023 17:50
ekzyis added 9 commits October 1, 2023 12:31
* moved imgproxy code into worker:
  - item table now has JSONB column `imgproxyUrls`
  - worker parses `text` and `url` and check whichs URLs are images
  - for every image URL found, we sign multiple proxy URLs for use with `srcset` (responsive images)

* we longer no replace image URLs
  - code is backwards compatible by decoding original URL for imgproxy URLs

* improved image detection:
  - fallback to GET if HEAD failed or returned negative
  - this shouldn't be a problem since this is done inside the worker with timeouts

* improved UI/UX: there are placeholder images now if original URLs should be loaded automatically:
  - placeholder if image is still processing
  - placeholder if there was an error loading the image from our image proxy

* moved image code in frontend to components/image.js
@huumn huumn merged commit b2b38d8 into stackernews:master Oct 1, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature new product features that weren't there before
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"Potato" image quality via proxy
2 participants