Harmonize Readers' User-Agent #110

Open
rgaudin opened this issue Oct 28, 2024 · 2 comments

@rgaudin
Member

rgaudin commented Oct 28, 2024

Looking at per-reader download stats, we realized (back in 2017! kiwix/container-images#82) that our analytics tool (Matomo) doesn't identify most of our readers.
While this is a Matomo/operations issue/change, it highlighted the fact that our downloaders' User-Agents are missing or poorly chosen.

Here's the situation. The "Identified as" column was produced with Matomo's device-detection library.

| Reader | User-Agent | Sample | Identified as |
|---|---|---|---|
| kiwix-desktop | aria2/{aria-version} | aria2/1.36.0 | Client: Type=library, Name=Aria2, Version=1.36.0 |
| kiwix-android | kiwix-android-version:{VersionCode} | kiwix-android-version:231101, kiwix-android-version:-1 | OS: Name=Android, ShortName=AND, Platform=, Family=Android, Version= |
| Kiwix macOS | Kiwix/{ProjectVersion} CFNetwork/{CFNetworkVersion} Darwin/{DarwinVersion} | Kiwix/173 CFNetwork/1568.100.1.1.1 Darwin/24.0.0 | OS: Name=iOS, ShortName=IOS, Platform=, Family=iOS, Version=18.0 |
| Kiwix iOS | Kiwix/{ProjectVersion} CFNetwork/{CFNetworkVersion} Darwin/{DarwinVersion} | Kiwix/173 CFNetwork/1568.100.1.2.1 Darwin/24.0.0 | OS: Name=iOS, ShortName=IOS, Platform=, Family=iOS, Version=18.0 |
| Kiwix JS Electron | xxx KiwixJSElectron/{nwVersion}-E xxx (UA built by Electron) | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) KiwixJSElectron/3.4.1-E Chrome/122.0.6261.156 Electron/29.3.1 Safari/537.36 | Client: Type=mobile app, Name=KiwixJSElectron, Version=3.4.1; OS: Name=Windows, ShortName=WIN, Platform=x64, Family=Windows, Version=10 |

  • Kiwix JS extensions and the PWA use the browser's own networking, so there's no specific U-A.
  • It looks like Kiwix JS Electron is the only one that's correct, and I believe it's not specifically set but built by Electron.
  • The Apple User-Agents for macOS and iOS are almost identical; Matomo even detects the macOS version as iOS.
  • The User-Agent convention is pretty clear: one or more product/version tokens, optionally followed by comments in parentheses (RFC 9110).
  • kiwix-desktop should set --user-agent when configuring aria2.
  • The Apple apps should manually set the User-Agent header of their URLRequest.
  • Build numbers, as used by Android and Apple, are of little help for our use cases; we want human-readable version numbers. The U-A format does allow both in the same string, though.

I suggest we use the following: Kiwix-{flavor}/{humanVersion} ({platform}/{platformVersion}). I believe the build number is not useful, but this is debatable; if wanted, it can be appended as an extra comment after the platform parenthesis.

Which would translate as follows:

Kiwix-desktop/2.3.1-4 (Windows/11)
Kiwix-android/3.11.1 (droid/12)
Kiwix-ios/3.6.0 (iOS/18.0)
Kiwix-macos/3.6.0 (macOS/15.0)
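
A minimal Python sketch of the proposed format, assuming each reader already knows its flavor, human version and platform. The function names (build_user_agent, aria2_args, make_request) are hypothetical; the aria2 part only relies on aria2c's --user-agent option, and urllib's Request stands in for what the Apple apps would do on their URLRequest.

```python
import urllib.request
from typing import List, Optional


def build_user_agent(flavor: str, version: str, platform: str,
                     platform_version: str, build: Optional[str] = None) -> str:
    """Kiwix-{flavor}/{humanVersion} ({platform}/{platformVersion})"""
    ua = f"Kiwix-{flavor}/{version} ({platform}/{platform_version})"
    if build:
        # Hypothetical: optional build number appended as an extra comment.
        ua += f" (build {build})"
    return ua


def aria2_args(user_agent: str, url: str) -> List[str]:
    """Command line a desktop reader could use to start aria2c with the U-A."""
    return ["aria2c", f"--user-agent={user_agent}", url]


def make_request(user_agent: str, url: str) -> urllib.request.Request:
    """Request carrying an explicit User-Agent header (built, not sent)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


ua = build_user_agent("desktop", "2.3.1-4", "Windows", "11")
print(ua)  # Kiwix-desktop/2.3.1-4 (Windows/11)
print(aria2_args(ua, "https://download.kiwix.org/"))
```

Keeping the build number out of the product token means analytics can group on the human version alone, while the optional comment still preserves it for debugging.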

The most important question is: do we want to consider all readers as a single Kiwix product, or should each be its own product? If they are a single product, Matomo, for instance, would group them all under Kiwix, and we'd only be able to distinguish reader-originated downloads from others, not compare readers with each other. That variant would look like this:

Kiwix/2.3.1-4 (Windows/11)
Kiwix/3.11.1 (droid/12)
Kiwix/3.6.0 (iOS/18.0)
Kiwix/3.6.0 (macOS/15.0)
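
To illustrate what the choice means for the stats, here is a hypothetical Python parser that accepts both variants and counts downloads per product. The regex and helper names are assumptions written for this sketch, not Matomo's rules. With per-flavor tokens each reader gets its own bucket; with the single-product variant everything collapses into "Kiwix", and the reader could only be guessed from the platform comment.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple, Optional

# Pattern written for this sketch; it accepts both proposed variants:
#   Kiwix-{flavor}/{humanVersion} ({platform}/{platformVersion})
#   Kiwix/{humanVersion} ({platform}/{platformVersion})
UA_PATTERN = re.compile(
    r"^Kiwix(?:-(?P<flavor>[A-Za-z]+))?/(?P<version>[^\s(]+)"
    r"\s+\((?P<platform>[^/)]+)/(?P<platform_version>[^)]+)\)"
)


class ReaderUA(NamedTuple):
    product: str  # "Kiwix-desktop", "Kiwix-ios", ... or plain "Kiwix"
    version: str
    platform: str
    platform_version: str


def parse_reader_ua(ua: str) -> Optional[ReaderUA]:
    m = UA_PATTERN.match(ua)
    if m is None:
        return None  # not a reader U-A in the proposed format
    flavor = m.group("flavor")
    product = f"Kiwix-{flavor}" if flavor else "Kiwix"
    return ReaderUA(product, m.group("version"),
                    m.group("platform"), m.group("platform_version"))


def count_by_product(uas: Iterable[str]) -> Counter:
    """Download count per product, the way an analytics tool would group them."""
    counts: Counter = Counter()
    for ua in uas:
        parsed = parse_reader_ua(ua)
        counts[parsed.product if parsed else "other"] += 1
    return counts


per_flavor = ["Kiwix-desktop/2.3.1-4 (Windows/11)", "Kiwix-ios/3.6.0 (iOS/18.0)",
              "Kiwix-macos/3.6.0 (macOS/15.0)"]
single = ["Kiwix/2.3.1-4 (Windows/11)", "Kiwix/3.6.0 (iOS/18.0)",
          "Kiwix/3.6.0 (macOS/15.0)"]
print(count_by_product(per_flavor))  # one bucket per reader
print(count_by_product(single))      # everything under "Kiwix"
```

Note that the current Apple U-A (Kiwix/173 CFNetwork/...) does not fit this pattern and lands in "other".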

Once this is settled, we can both patch our Matomo image and make a PR to Matomo's repo, along the lines of the following diff:

628a629,653
>
> - regex: 'Kiwix-desktop/(\d+[\.\d]+)'
>   name: 'Kiwix Desktop'
>   version: '$1'
>   url: 'https://github.com/kiwix/Kiwix-desktop'
>
> - regex: 'Kiwix-android/(\d+[\.\d]+)'
>   name: 'Kiwix Android'
>   version: '$1'
>   url: 'https://github.com/kiwix/kiwix-android'
>
> - regex: 'Kiwix-ios/(\d+[\.\d]+)'
>   name: 'Kiwix iOS'
>   version: '$1'
>   url: 'https://github.com/kiwix/kiwix-apple'
>
> - regex: 'Kiwix-ipados/(\d+[\.\d]+)'
>   name: 'Kiwix iPadOS'
>   version: '$1'
>   url: 'https://github.com/kiwix/kiwix-apple'
>
> - regex: 'Kiwix-macos/(\d+[\.\d]+)'
>   name: 'Kiwix macOS'
>   version: '$1'
>   url: 'https://github.com/kiwix/kiwix-apple'
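
The proposed rules can be sanity-checked outside Matomo. This sketch copies the regexes from the diff into Python's re module and applies them in order with a plain, case-sensitive re.search; Matomo's device-detector wraps and evaluates rules its own way, so this only approximates its behavior. The identify helper is hypothetical.

```python
import re
from typing import List, Optional, Tuple

# (regex, name) pairs copied from the proposed diff; '$1' corresponds to group 1.
PROPOSED_RULES: List[Tuple[str, str]] = [
    (r"Kiwix-desktop/(\d+[\.\d]+)", "Kiwix Desktop"),
    (r"Kiwix-android/(\d+[\.\d]+)", "Kiwix Android"),
    (r"Kiwix-ios/(\d+[\.\d]+)", "Kiwix iOS"),
    (r"Kiwix-ipados/(\d+[\.\d]+)", "Kiwix iPadOS"),
    (r"Kiwix-macos/(\d+[\.\d]+)", "Kiwix macOS"),
]


def identify(ua: str) -> Optional[Tuple[str, str]]:
    """Return (name, version) for the first matching rule, else None."""
    for pattern, name in PROPOSED_RULES:
        m = re.search(pattern, ua)
        if m:
            return name, m.group(1)
    return None


for sample in ["Kiwix-desktop/2.3.1-4 (Windows/11)", "Kiwix-android/3.11.1 (droid/12)",
               "Kiwix/3.6.0 (iOS/18.0)", "aria2/1.36.0"]:
    print(sample, "->", identify(sample))
```

Since (\d+[\.\d]+) stops at the first character that is neither a digit nor a dot, the -4 suffix of 2.3.1-4 is not captured, and the single-product Kiwix/... variant would need a rule of its own.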

There's also the question of CustomApps. In the logs, I've seen AndroidDownloadManager (when the download is deferred to the system?), QtWebEngine (kiwix-desktop?) and WikivoyagebyKiwix (another branding hell).

@Jaifroid
Member

Jaifroid commented Nov 11, 2024

That's interesting. Can I assume you are referring to running analytics on the download library's server?

> It looks like Kiwix JS Electron is the only one that's correct and I believe it's not specifically set but built by Electron.

This is derived from package.json, and Electron populates its headers from the data set there. As package.json is only valid with a version number, I don't think I could disable that easily (you can do most things in Electron, so there would be a way, but it would need specific code, which would be a shame if the version number is the only thing you don't want).

> Kiwix JS extensions and PWA use the browser so there's no specific U-A

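This is more complex. I cannot directly modify the User-Agent header through JavaScript for security reasons: it is a protected header that only browsers control. For cross-origin requests, I can, however, set other request headers, such as:

Accept
Accept-Language
Content-Language
X-Requested-With
X-App-Info

Only the first three are CORS-safelisted; X-Requested-With and a custom header such as X-App-Info trigger a CORS preflight request and must be explicitly allowed by the server.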

For this to work, say if I were to use X-App-Info, the server must be configured to accept a custom header through CORS by including it in the Access-Control-Allow-Headers response header.
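
A simplified Python model of that handshake, for illustration only. X-App-Info is the hypothetical header named above; the wildcard origin and the helper names are assumptions. The server answers the preflight with Access-Control-Allow-Headers, and the browser lets the real request through only if every non-safelisted header it wants to send is listed there.

```python
from typing import Dict, Iterable, Set

# Simplification: Content-Type and Range are also safelisted, but only for
# certain values, so they are left out of this sketch.
SAFELISTED = {"accept", "accept-language", "content-language"}


def preflight_response_headers(allowed_custom: Iterable[str] = ("X-App-Info",)) -> Dict[str, str]:
    """Server side: CORS headers sent back for an OPTIONS preflight."""
    return {
        "Access-Control-Allow-Origin": "*",  # assumption: any origin
        "Access-Control-Allow-Headers": ", ".join(allowed_custom),
    }


def _header_set(value: str) -> Set[str]:
    # Header names are case-insensitive, so compare them lowercased.
    return {h.strip().lower() for h in value.split(",") if h.strip()}


def browser_allows(requested: str, response: Dict[str, str]) -> bool:
    """Client side (simplified): every non-safelisted header from
    Access-Control-Request-Headers must appear in Access-Control-Allow-Headers."""
    needed = _header_set(requested) - SAFELISTED
    allowed = _header_set(response.get("Access-Control-Allow-Headers", ""))
    return needed <= allowed


resp = preflight_response_headers()
print(browser_allows("x-app-info", resp))                    # True
print(browser_allows("x-app-info, x-requested-with", resp))  # False
```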

Please note that the PWA still accesses download.kiwix.org (which was CORS-enabled at my request several years ago), while the Browser Extension's in-app library uses an iframe to display library.kiwix.org to the user in a basic way, given that we haven't implemented (lack of time!) API access via application/json. Since library.kiwix.org does not (or didn't at the time of development) enable cross-origin requests in its Response header, we cannot modify anything about those requests currently. I'm not sure if you'd be willing to enable CORS for library.kiwix.org given that the proper way to access the library is via the API... (and I do intend to explore that sometime).

One further issue is the necessary inconsistency in how ZIM downloads are handled (if this is being monitored):

  • Browser Extension: the download is handed over to the browser; in-app download is not currently enabled.
  • Electron: all downloads are managed in-app using Electron APIs.
  • PWA: if the browser has the File System Access API and/or Origin Private File System, the app manages direct downloads from Kiwix into the OPFS or access-enabled directory, but it hands off downloads from mirrors to the browser (due to CORS). If those APIs are not available, download is handed over to the browser.

@rgaudin
Member Author

rgaudin commented Nov 12, 2024

Thanks for those details, @Jaifroid; the data was indeed extracted from the download.kiwix.org logs.
