Skip to content

Latest commit

 

History

History
444 lines (308 loc) · 18.1 KB

decisions.adoc

File metadata and controls

444 lines (308 loc) · 18.1 KB

Talkyard Decisions

In this file: Decisions made when building Talkyard, about how to do things, and how Not do things.

Usage:

1) Before adding any 3rd party lib, e.g. react-transition-group, search this file for the name of that lib, and related words, e.g. "animations". Because maybe a decision was made to Not use that lib and do things in other ways instead. And it’d be annoying if these decisions got forgotten so that years later, things got done in the bad ways.

2) Before making any non-trivial decision, search this file for related words, and see if there’re related thoughts and previous decisions here, already.

Long ago

Use Webdriver.io for end-to-end tests because:

  • It supports Multiremote, i.e. running many browsers at the same time in the same test, useful for testing e.g. the chat system.

  • It has synchronous commands: console.log(browser.getTitle()) instead of browser.getTitle().then(title ⇒ console.log(title)) (or async await everywhere?).

  • It is under active development (even more nowadays 2020-11).

Year 2020

Use Asciidoc for documentation

2020-05-16: Use Asciidoc for docs, not Markdown or CommonMark.

Why? See e.g.: https://news.ycombinator.com/item?id=18848278: "I’ve entirely replaced markdown and been using asciidoctor for both documentation and general writing. Asciidoctor is far superior to Markdown [ …​]". You can also websearch: "site:news.ycombinator.com markdown vs asciidoc".

As of 2020-05 most Ty docs are in Markdown. But don’t add any more Markdown docs. And some day, can run sth like: find . -type f -name '*.md' -exec pandoc -f markdown -t asciidoc {} \; to convert from .md to .adoc. https://news.ycombinator.com/item?id=22875242

Use CSS animations

2020-05-16: Don’t add react-transition-group or any other animations lib. [REACTANIMS] Only use CSS anims in Ty for now.

CSS animations are simple and fast and keep the bundles small. Others agree: "use it instead of importing javascript libraries, your bundle remains small. And browser spends fewer resources" https://medium.com/@dmitrynozhenko/5-ways-to-animate-a-reactjs-app-in-2019-56eb9af6e3bf

"ReactTransitionGroup has small size" they write, but: react-transition-group.min.js is 5,6K min.js.gz (v4.4.1, May 2020) — that’s too much for Talkyard! (at least for now, for slim-bundle).

Don’t use Nodejs server side

Don’t use Express (expressjs.com) or sth like that. Hard to reason about security, e.g.: https://hackerone.com/reports/712065, who could have guessed:

const _ = require('lodash');
.zipObjectDeep(['proto_.z'],[123])
_.zipObjectDeep(['a.b.__proto__.c'],[456])
console.log(z) // 123
console.log(c) // 456

Plus other vulns all the time in js libs it seems to me. And 9999 microdependencies —> supply chain attacks made simple. "npm has more than 1.2M of public packages […​] perfect target for cybercriminals" writes https://snyk.io/blog/what-is-a-backdoor. Look: is-odd and is-even: https://news.ycombinator.com/item?id=16901188.

Using Javascript in the browser, though, is different — as long as the browser talks only with the Ty server.

Also better not use Python or Ruby for server code.

SECURITY [to_ty_risky] to-talkyard is a bit risky: nodejs code that parses user defined data. Use a different lang? Run in sandbox?

Edit, half a year (?) later: Turns out there’s a "Nodejs done right", namely Deno, by he who created Nodejs actually. Ty might use Deno for scripting (instead of js or bash).

Vendor everything

Vendor all dependencies, i.e. bundle their source code (or at least JARs) together with Talkyard.

Four reasons:

  • Security. Can diff changes in vendored deps.

  • Developer friendliness. Seems some people are behind company firewalls and proxy servers that break downloading some dependencies. But accessing Git seems to have worked for everyone.

  • Continuous Integration. Then good if can build offline?

  • Reproducible builds (also requires using a specific OS? Which should be mostly fine, since things get built in Docker containers with a specific OS and version — but will need to specify the exact image hash?) [repr_builds]

  • Working offline. If the wifi stops working and your phone too (no tethering).

Place the vendored stuff in Git submodules. Then, if the vendored stuff’s Git repos eventually grows large because of old historical commit objects, those Git submodules can be squashed or sth like that, making them small again — without affecting the history in the main repo (it’d just be updated to point to the squashed stuff).

"Join" or "Sign Up" button title

What should the join community button title be? [join_or_signup]

"Join", "Sign Up", "Create Account", "Register"? _ - Not "Register" — sounds like the tax agency. - Not "Create Account" — that’s too long, causes UX trickiness on 400 px wide phones. - Not "Sign up" because non-native English speakers sometimes don’t know the difference between "Sign Up" and "Sign In", or "Sign Up" and "Log In". (See e.g.: https://ell.stackexchange.com/questions/24384/what-is-the-difference-among-sign-up-sign-in-and-log-in )

But "Join" is simple to understand? And short and friendly? So "Join" it’ll be.

Don’t use "Sign in" for the login button title either, because can get confused with "Sign up", and also, is 1 letter longer than "Log in". Facebook uses "Log in" so "everyone" should be used to this.

So: "Join" and "Log in" and "Log out", in Talkyard.

Sort posts by time, by default [why_sort_by_time]

Talkyard used to (before 2020-11-25) sort replies by most-popular-first. However I think this was confusing, in that most of the time, people don’t upvote others' replies — so most of the time, replies are in effect sorted by time only, …​

  1. Except for sometimes, someone clicks Like, causing the thereafter upvoted post, to berak the seemingly by-time sort order. Making things look a bit random?

Instead, there can be a one-click button "Show Popular Replies" or "Show best answers", to sort best-first.

There’s a per site config value to change the sort order — maybe per category or topic one day? Topic type? Q&A = best-first, others by time?

Don’t use shared CDN

It’s pointless: Chrome >= v86 partitions the cache by URL, and Safari too, so there’re no performance benefits — instead, slightly slower, because of a TLS handshake. The previous browser behaviour enabled cross-site tracking, checking if the user had visited certain sites etc.

Git submodules, not subtrees

Talkyard already uses somewhat many submodules — and then, if starting to use subtrees too, "everyone" would have to learn two different things: both submodules and subtrees.

Year 2021

Use lua-resty-acme for automatic HTTPS

There’s also lua-resty-auto-ssl but it has a dependency on Dehydrated, a Bash script — whilst lua-resty-acme is Lua only: simpler to install and to understand the code. It’s actively developed by Kong which Ty a bit depends on indirectly anyway because they sometimes contribute to OpenResty. See: fffonion/lua-resty-acme#5

Use Snowpack; skip Webpack and Parcel

Procrastinating isn’t too bad — now, thanks to having done nothing for years (just Typescript and concatenated files with Gulpjs), suddenly Snowpack has appeared, and looks better than Webpack, Parcel and Browserify. (Rollup? Snowpack uses Rollup.)

It’s ok to continue using ElasticSearch

Elastic recently (2021-01) announced they’ll change from Apache2 to Server Side Public License (SSPL):

However they dual-license under both SSPL and the Elastic License — the latter allows using ElasticSearch the way Talkyard does it: all Basic features are available at no cost, when using ES via object code (not source code) and when the primary purpose is not ElasticSearch itself. In Ty’s case, the primary purpose with Ty is not ES, but discussions between humans. From the license:

You agree not to: …​ (iv) use Elastic Software Object Code for providing …​ software-as-a-service, …​ where obtaining access to the Elastic Software or the features and functions of the Elastic Software is a primary reason or substantial motivation for users of the SaaS Offering to access and/or use the SaaS Offering …​

Also, it seems only Elastic may distribute the ES object code:

You agree not to: (iii) …​ distribute, sublicense, …​ Elastic Software Object Code,

That’s fine — with Talkyard, people download the ElasticSearch Docker image themselves (via Docker-Compose) from Elastic’s own servers — see images/search/Dockerfile.

(FYI: Talkyards' source code integrating with ElasticSearch (via its REST API) is about 750 lines? See app/talkyard/server/search/. Whilst Ty’s whole code base is about 160 000 lines? (as of 2021-01), I don’t remember exactly — anyway, the directly related to ElasticSearch code in Talkyard should be < 0.5%? of everything. This can make it easy to understand, also for outsiders, that the primary benefit with Talkyard really is not ElasticSearch. Also, that should be about how much code would need to change, if replacing ElasticSearch with, say PostgreSQL’s search or some of the upcoming alternatives in Rust or Golang.)

Avoid Bash scripts. Use Deno

Bash scripts should be forbidden. I’m getting more and more upset at various scripts, e.g. this PostgreSQL related snippet from here:

if [[ ${1:0:1} = '-' ]]; then
  ...
  set --

What does ${1:0:1} do, what can I search for, to find out? I search for "bash bracket colon one zero"? …​ 10 frustrating minutes later, I find something at AskUbuntu:

${var:pos} means that the variable var is expanded, starting from offset pos. ${var:pos:len} means that the variable var is expanded, starting from offset pos with length len.

And what does set -- do? Fortunately help set shows this — but still, the above Bash code is utterly unreadable without jumping to a web browser and start searching & browsing the Internet, even if you’ve been coding for 10+ years and written somewhat much Bash code yourself already.

Use Deno instead of Bash. Why Deno:

  • Can code in Typescript, which Ty uses already (no new language to learn).

  • More secure: Doesn’t allow file system or network access, by default.

  • Easy to install (just a single binary, can incl in the vendors Git submodule).

  • Popular, large community, many libs if needed.

  • Couldn’t find anything else anyway. Ruby? No. Python or Nim? That’d be new languages to learn. (And lacks Deno’s security features?)

  • Maybe Lua? Ty uses Lua (Nginx plugins). But tiny community and few libs, lacks the Deno security features — could maybe implement oneself, but seems a bit complicated?: how-to-wrap-the-io-functions-in-lua-to-prevent-the-user-from-leaving-x-directory. So, Deno then.

Pub API: Avoid GraqhQL. Use custom REST + JSON

GraphQL looks nice, I start thinking, now when working with Ty’s Search, List and Get APIs. However GraphQL doesn’t seem to bring any particular benefits to Ty, as of now, 2021-02.

GraphQL pros:

  • Well thought through query language, see e.g.: https://dgraph.io/docs/graphql/schema/search/ — but Ty already has its own ok REST API with get/list/search and filters: [api_json_query]. Supporting via GraphQL too would be dupl work, right.

  • Maybe simpler to use for people who know GraphQL already. But not many are familiar with GraphQL? — No one has ever asked about GraphQL.

  • Subscriptions seems nice?

(A supposed GraphQL benefit / REST + JSON drawback from howtographql.com: "With REST, you have to make three requests to different endpoints to fetch the required data. You’re also overfetching since the endpoints return additional information that’s not needed" — But one can create a REST api that looks at some request params and then includes all that’s needed, and only what’s needed / not-much-more, in a single response.)

GraphQL cons:

  • DoS attack risk? Clients could construct GraphQL queries that ask for lots of things and lists of nested things, missing the caches, causing lots of database queries. Slightly complicated code would be needed to ensure the clients are well behaved — and lots of test code, for that code. — In the same way, custom ElasticSearch queries or custom regular expressions, would also be risky.

  • Runtime bug risk: Wiring the schema definition to Java/Scala code and SQL queries is not type safe — look at e.g.: https://www.graphql-java.com/tutorials/getting-started-with-spring-boot/#datafetchers dataFetchingEnvironment.getArgument("id") — the compiler won’t check "id". Or is this Scala lib type safe, compile time code gen? "automatically derives GraphQL schemas from your data types", and: https://ghostdogpr.github.io/caliban/docs/#a-simple-example.

  • More things to learn — there’s a lot to read over at https://graphql.org, plus learning the particilar Java/Scala lib to implement GraphQL server side.

  • What Java/Scala lib to use? There’re 5 to choose among: https://graphql.org/code/ Takes a while to evaluate all of them, pick the one that’s best for Ty.

  • Duplicated work: Would need to create a GraphQL schema, in addition to having Ty’s domain model in Typescript already.

  • Even more libs and complicated tooling: Would probably eventually want to auto generate Typescript interfaces, from the GraphQL schema?

a myriad of [GraphQL] tools have been released that are trying to improve the workflows around SDL-first development

[challenging ot keep] schema definition is in sync with the resolvers at all times

split into files (instead of one big schema file)

reuse files

Also, now there’s another similar graph query language, DQL: https://dgraph.io/docs//query-language/graphql-fundamentals/ > Dgraph Query Language, DQL, (previously named GraphQL+-) is based on GraphQL

So, for now, use Talkyard’s Get, List and Search APIs, but not GraphQL. At least wait until Scala 3 and the Scala 3 macro system — probably the existing / any-future Scala GraphQL libs will make use of them in some nice more-type-safe ways. Or maybe shouldn’t even be in the Scala app, maybe instead a separate server in Rust, optionally reading from a read-only replica rdb.

Use WebAssembly for plugins

WebAssembly is "future compatible": If (parts of) the Talkyard app server ever gets rewritten in another language, say, Rust, plugins written in WebAssembly can be called from that new language too. But if the plugins were instead in Java or Scala, they couldn’t easily be used from the new code.

Browsers can run WebAssembly. Deno can run WebAssembly natively: https://deno.land/manual/getting_started/webassembly. GraalVM (JVM) can also run WebAssembly. And Rust and Bash: https://docs.wasmtime.dev/lang.html. There’s a Scala interpreter: https://github.com/satabin/swam.

Also: No GC, small code size.

Rust and Typescript, well, AssemblyScript, can be compiled to WebAssembly: https://rustwasm.github.io and https://github.com/AssemblyScript/assemblyscript. (But not Scala: scala-js/scala-js#1747)

Use the tag system for user titles

In Talkyard, one can tag not only pages, but also users. A user tag functions as a user badge: the badge (tag) title can be configured to appear next to the names of people with that badge, for example in the about-user dialog, or in posts by them, e.g.: "By Some Name @some_name support-team" so everyone notices that Some Name is an official member of the support team.

Why use the tag system, for user titles, instead of adding titles directly to users and groups?

  • Because if there’s both user badges, and also user titles / group-member titles, then there’d be two systems for making a text appear next to someone’s name. And two systems, is more complicated, than just one system? E.g. more code for deciding which titles to show/not-show, if someone has, say, two titles from two groups hen is in, and also has two badge titles. And in which order to show the titles. Sometimes the group/user title(s) would be more interesting; in other cases the badge title(s)? And db columns determining what to show/hide & show in which order, would need to be duplicated between tables (both pats_t and tags_t)? And there’d be source code that understands and considers both group titles and badge titles, and maybe keeps badge + group titles unique accorss two tables? More complicated.

  • Maybe maybe a security issue, if using groups for titles? A moderator might want to give someone a title, and so adds hen to a group with that user title. But groups can have security settings, and maybe the mod didn’t remember all security settings associated with that group — and by adding the user, accidentally granted unintended permissions. However, giving someone a badge, won’t have any such security implications.

    ("Badge permissions" gives 0.8k Google search hits, whilst "group permissions"
    gives 300k hits — so, probably people expect groups, but not badges, to be
    associated with permissions? Groups = security, badges = for display.)

Summary: Group member titles, and user titles, would make sense, if there wasn’t also group member badges and user badges. But one solution is enough.