New concept for eliminating compression (280 byte limit) #529
Hi Frank! I too am interested to see "double dweets" discussed again, and am glad you're the person to resurface it. I have quite a few thoughts on this, so bear with me 😬
I think this may be the strongest argument... 140 Unicode / 194 ASCII has been amazing, but it does feel harder and harder to find new and interesting things to squeeze into it. I'm not completely sure if this is a personal limitation, a temporary plateau, or a fundamental combinatorial limit of 140 characters, but I'm leaning toward the latter (yes, there are …)

The other consideration is the arbitrariness of these limits. We have to pick something, but we should pick one thing... multiple or variable limits feel too arbitrary, because it's no longer a shared limitation. I do believe that the singular limit of Dwitter is part of what makes it so compelling: "do anything on a canvas with JS, in 140 characters", that's it... you know what to do, and you instantly appreciate and understand what has been done. My point is, while I like the idea of moving on to double dweets, I also think maintaining multiple limits is a bad idea, so this proposal should be a change, not an addition. That said, something along the lines of a badge of appreciation for the original 140 limit is a good idea; there is already a kind of understanding and appreciation for uncompressed dweets (<140 dweets).
This is tricky. As soon as we start talking about bytes outside of ASCII and enter the land of Unicode, we have to start talking about encodings, and different encodings consume bytes in different ways. As far as JS is concerned, I think we only need to consider two:

Modern text files are stored in UTF-8, and most databases use utf8mb3 or utf8mb4 for Unicode, which are roughly equivalent. Both use one byte for the ASCII part of Unicode, then multiple bytes (up to 4, or 3 for utf8mb3) for code points beyond ASCII. To distinguish them from single-byte ASCII, these come with special prefixes that indicate the byte length and mark the continuation bytes. I won't explain this in detail because Wikipedia does a pretty good job. The point is that these prefixes take up code point space, so the first …

Internally, JS uses UCS-2, which is essentially UTF-16 (don't ask me the difference; I'm not sure anyone can give you a straight answer on this either). UTF-16 stores everything in 16-bit words, as the name implies, with code points above U+FFFF taking two words (a surrogate pair): ASCII? Still two bytes. Unicode …

When you compare these side by side, UTF-16 looks a bit silly for most of the characters we care about (ASCII), because it doubles the byte count, but we could probably discount this as it's only an internal representation (in fact, I think this is how most JS engines count string length, ignoring empty upper bytes). However, there are more subtle differences for the code points above ASCII, where some fit into 2 bytes in UTF-16 but need 3 bytes in UTF-8. I'm not 100% sure how the differences turn out empirically when looking at UTF-8 and JS string.length side by side, but I'm pretty sure there will be differences (there are even differences between JS engines for string.length)...
Anyway, I find it hard to understand and explain all these subtleties even though I know they exist 😕 so maybe you can see why I'm not convinced we are going to be able to explain them to people who don't even know they exist! (Unless we stick with one-byte ASCII and say goodbye to emojis and Unicode byte packing.)

[edit] I just realised JS string.length is supposed to count UTF-16 words, which is why every character that fits into one UTF-16 word counts as 1 char and characters needing a surrogate pair count as 2 chars. So string.length !== bytes in general, but it does equal half the bytes of the UTF-16 encoding.

[edit] Alternatively, we just say bytes are counted in UTF-8, which has basically become the universal storage encoding for text, even though it's not used internally by programs and languages for performance reasons. But "storage" is probably more tangible: a tweet on Twitter is no doubt stored as utf8mb4 in a database, a tweet copied into a text file is no doubt saved as UTF-8 on the block device, etc.
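To make these distinctions concrete, here is a small sketch (not code from the thread) that measures the same strings four ways: string.length (UTF-16 code units), code points, UTF-8 bytes via the built-in TextEncoder, and UTF-16 bytes. The helper name measure is just for illustration.

```javascript
// Compare the different "lengths" a dweet could be measured by.
// TextEncoder always encodes to UTF-8 and is available in browsers and Node.
function measure(s) {
  return {
    utf16Units: s.length,                          // what string.length reports
    codePoints: [...s].length,                     // the string iterator walks whole code points
    utf8Bytes: new TextEncoder().encode(s).length, // bytes when stored as UTF-8
    utf16Bytes: s.length * 2,                      // bytes when stored as UTF-16 (2 per code unit)
  };
}

const samples = {
  ascii: "x",          // 1 unit, 1 code point, 1 UTF-8 byte
  latin1: "\u00e9",    // é: 1 UTF-16 unit, 2 UTF-8 bytes
  euro: "\u20ac",      // €: 1 UTF-16 unit (2 bytes), but 3 UTF-8 bytes
  emoji: "\u{1F600}",  // 😀: surrogate pair, 2 UTF-16 units, 4 UTF-8 bytes
};

for (const [name, s] of Object.entries(samples)) {
  console.log(name, measure(s));
}
```

The euro sign is an example of a character that fits in one UTF-16 word but needs 3 bytes in UTF-8; the emoji is one where string.length reports 2 for a single code point.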
Thanks for resurfacing this discussion! I know it's been simmering in the background and it's clear you've spent a lot of time reflecting on this. It's also great to get this input from one of the most prolific dweeters! Here's my personal stance at the moment:
All points are open for discussion, but that's where my head is at the moment. Thoughts? As for implementation, I'd like to get the new frontend across the finish line first, and I'm starting a new job on Monday, so the near future is hard to predict. Let's keep the discussion going here and on Discord 👍
Agreed. This fact alone means that if bytes are to be the new metric, the encoding must be UTF-8, because counting bytes in any other Unicode encoding will cost more than one byte per ASCII character. Which means we are back to two options:
Sticking to code point counting would continue to allow ASCII-in-Unicode packing via UTF-16 surrogates, so …

The more I think about UTF-8, the more it makes sense. Demos are usually limited in bytes, and by stored size, not size in memory. Text in a modern DB is UTF-8, text in an HTTP request is usually UTF-8 encoded, text in a file is usually UTF-8 encoded; pretty much everything that is not an internal representation for performance is UTF-8. Not to mention, pointing at and explaining UTF-8 encoding is way simpler than the headache of UTF-16/UCS-2: https://en.wikipedia.org/wiki/UTF-8#Encoding

Sorry for the detour.
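The UTF-8 encoding table on that Wikipedia page boils down to four code point ranges, so a UTF-8 byte count can also be computed by hand. A minimal sketch (the name utf8Length is hypothetical), cross-checked against the built-in TextEncoder:

```javascript
// Count UTF-8 bytes from code point ranges in the UTF-8 encoding table:
//   U+0000..U+007F    -> 1 byte (ASCII)
//   U+0080..U+07FF    -> 2 bytes
//   U+0800..U+FFFF    -> 3 bytes
//   U+10000..U+10FFFF -> 4 bytes
// A lone surrogate lands in the 3-byte range here, which matches TextEncoder
// replacing it with U+FFFD (also 3 bytes).
function utf8Length(s) {
  let bytes = 0;
  for (const ch of s) { // for...of iterates code points, not UTF-16 units
    const cp = ch.codePointAt(0);
    if (cp < 0x80) bytes += 1;
    else if (cp < 0x800) bytes += 2;
    else if (cp < 0x10000) bytes += 3;
    else bytes += 4;
  }
  return bytes;
}

// Cross-check against the built-in UTF-8 encoder.
const code = "x.fillRect(400+S(t)*300,400,9,9) // \u00e9\u20ac\u{1F600}";
console.log(utf8Length(code), new TextEncoder().encode(code).length);
```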
Understood, no rush whatsoever, and ultimately this is your decision 👍 I'm just glad to see it being discussed. Hope everything goes smoothly for you on Monday.

[edit] Quick compatibility check: existing "compressed" dweets should be 43 bytes for the ASCII unpacking code plus a max of 4 bytes for each of the 97 packed characters when using UTF-8 encoding, = 237 bytes max. That almost achieves what @KilledByAPixel suggests (eliminating compression) by bringing it close to 280 bytes. But we should eventually check this empirically for cases outside of the usual compression methods.
Hi, a few things...
It's a great idea, but after thinking about it there is another problem: remixing legacy dweets. If someone tries to remix a dweet that uses those shortcuts, how does it work? Even just looking at a legacy dweet is going to be confusing for new users unaware of these shortcuts. Personally, I don't mind the shortcuts as much as the compression. I think a few shortcuts are friendly to newcomers, while compression is scary. It would be cool to have vanilla JS, but it's really just 4 quick defines. I also appreciate the time savings; I've typed S and C many thousands of times over. I'd actually suggest one additional shortcut, nothing crazy mind you... H: hsla color, works the same as R but for HSL. This isn't really gonna change what we can do math-wise, but it should result in better color usage. Generative art is generally easier to produce and looks better in HSL. HSL would be a nice addition whether or not we decide to go to 280 bytes.
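A proposed H shortcut could be sketched by analogy with R, which builds an rgba() string. This is a hypothetical definition for illustration, not an existing Dwitter global:

```javascript
// Hypothetical H() shortcut: like R(), but returns an hsla() color string.
// h is in degrees, s and l are percentages, a defaults to 1 (opaque).
const H = (h, s, l, a = 1) => `hsla(${h},${s}%,${l}%,${a})`;

// In a dweet this could be used as, e.g.:
//   x.fillStyle=H(t*60+i,80,50)
console.log(H(120, 50, 50));       // hsla(120,50%,50%,1)
console.log(H(200, 100, 25, 0.5)); // hsla(200,100%,25%,0.5)
```

Sweeping the hue while keeping saturation and lightness fixed is what makes HSL convenient for generative palettes.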
The way I envision it is like right now: we already added code to auto-uncompress dweets. So compressed dweets would be swapped out with uncompressed code and show the uncompressed byte count, which for most is capped at 194. There can still be a button or something to view the original compressed code, but we don't even need to allow editing that; it should be read-only. It would be great if we stored the uncompressed version of legacy dweets in the database alongside them. I can write a script to spin through the database and pump out JSON of all the uncompressed code. I've been meaning to do that for dweetabase so compressed dweets are searchable. There are some older dweets that are compressed in different ways that may be above 280, but that's pretty rare. In this case it would show the byte count in red like it currently does.
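Auto-uncompressing a legacy dweet can reuse the wrapper's own decoding step, just without the eval. A minimal sketch, assuming the common wrapper that ends in .replace(/u../g,''); the WRAPPER pattern and the name uncompress are hypothetical, and dweets packed in other ways are passed through unchanged:

```javascript
// Matches only the common wrapper: eval(unescape(escape`<packed>`.replace(/u../g,'')))
// (hypothetical pattern; older dweets compressed in other ways will not match).
const WRAPPER = /^eval\(unescape\(escape`([^`]*)`\.replace\(\/u\.\.\/g,''\)\)\)$/;

function uncompress(src) {
  const m = WRAPPER.exec(src.trim());
  if (!m) return src; // not recognised: leave the dweet as-is
  // Same steps the wrapper performs at runtime, minus the eval:
  // escape() turns each packed UTF-16 unit into %uD8XX / %uDCXX,
  // dropping "uD8"/"uDC" leaves %XX, and unescape() turns %XX back into ASCII.
  return unescape(escape(m[1]).replace(/u../g, ""));
}

// Usage: pack "hi" by hand (0x68 = "h", 0x69 = "i") and uncompress it.
const packed = String.fromCharCode(0xd800 + 0x68, 0xdc00 + 0x69);
console.log(uncompress("eval(unescape(escape`" + packed + "`.replace(/u../g,'')))")); // hi
```

Running something like this once over the database would give the uncompressed copy to store alongside each legacy dweet.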
Ah, I see! So the change could achieve multiple things: we get more characters to play with; the existing Unicode-packed dweets can be made more accessible by simply replacing them with their 194-byte ASCII, obviating the compression toggle idea; and a new ASCII-in-Unicode packer will not emerge, because now we are counting UTF-8. However, other more interesting compression techniques might continue and emerge that don't obfuscate the source so much. This feels like a really cohesive change.
You got it exactly. The problem I have with the current compression is that it is dumb, because it doesn't actually compress anything. I mean, it's a brilliant invention but completely useless outside of Dwitter. It's silly that compressed dweets take up more space. I would love to see people experiment with real compression schemes that are case-dependent and potentially useful for other size-coding projects. For example, my way of drawing bitmaps on Dwitter will no longer work, as it uses a similar Unicode trick, so we'd need to find a new way to pack images.
Sounds sensible to me!
Another alternative is to add a preamble when remixing since there's now more space:
Keeping c and x makes sense, though.
Yes, we should revisit the shortcuts; if we're not removing them all, we could certainly consider whether we should add some. Note that this could also break old dweets, if they rely on H being undefined, for instance.
I don't think that would work by just adding it at the top, because so many of the best dweets rely on those shortcuts being set up before the dweet runs. Not as trig functions, but because they are valid variables. It's not a small number, either: hundreds of dweets use techniques relying on that. Can you explain more of your reasoning about wanting to go vanilla?
Maybe, but it is unlikely; I would be surprised if it came up. There are some dweets that use the variable H, but relying on something being undefined is very unusual for a dweet.
Since the rulesets are already arbitrary, my bias is to stay as close to vanilla JS as possible. It makes it somewhat clearer what's going on for people who know JS. But I'll admit there's an allure to the purity itself. It's also a much more consistent approach than adding high-value shortcuts for a semi-arbitrary subset of JS. And I think 280 would be enough to make up for the loss of the shortcuts.
The scenario that came to mind was variables that are initialized once, by relying on them being undefined for the first iteration. But you're right, it's probably not many dweets that would break if we add a few more shortcuts; I'm just mentioning that theoretically there could be some.
I agree with counting by bytes. When it comes to utility functions: either extend or eliminate. I like either idea. Extending exposes us to feature creep, but that may not necessarily be a bad thing. If utility functions were added at a slow, steady pace, say one per month by vote from a list of candidates, it would allow for the growth of a well-considered micro library that might be quite nice to use in general. It would mean an ever-growing capability of dweets, but it would also mean committing to non-vanilla JS. An extending library might require versioning, with dweets tied to the version that existed at post time to avoid compatibility problems. It may not be an issue if no dweets rely on things being undefined at call time. Taking the path of vanilla JS, I also had the idea of a default preamble of
Note, this version also defines M. Include the preamble for old dweets; for new dweets, just prepend the preamble by default and let people reuse the characters.
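To get a feel for what a remix preamble costs out of a 280-byte budget, here is a hypothetical vanilla-JS preamble that roughly recreates S, C, T and R. It is not the preamble proposed in this thread (that one also defined M), and it assumes c and x are still provided by the page:

```javascript
// Hypothetical preamble recreating the current shortcuts in plain JS.
// R mirrors the rgba() helper: channels truncated with |0, alpha defaults to 1.
const PREAMBLE =
  'S=Math.sin,C=Math.cos,T=Math.tan,R=(r,g,b,a=1)=>`rgba(${r|0},${g|0},${b|0},${a})`;';

const LIMIT = 280; // proposed limit, counted in UTF-8 bytes
const utf8Bytes = (s) => new TextEncoder().encode(s).length;

// Prepend the preamble when remixing a legacy dweet so it runs without built-in shortcuts.
const remixLegacy = (code) => PREAMBLE + code;

new Function(PREAMBLE); // compiles only; throws a SyntaxError if the preamble does not parse
console.log(utf8Bytes(PREAMBLE), "bytes of preamble,", LIMIT - utf8Bytes(PREAMBLE), "left");
```

Because the preamble is plain ASCII, its UTF-8 byte count equals its character count.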
If you do get rid of the shortcuts, I think it would make sense to get rid of some other variables too, like frame, FPS, or anything else that pollutes the space, though I still lean towards keeping everything as it is now.
I'm pretty sure that's not the case. You can still stuff bits into a unicode point however you want... how many bytes it ends up as is determined by UTF-8 encoding, but how it's used in JS is completely independent. It will certainly count for more when packing beyond 8bits with byte counting, but it's still an efficient way of encoding bitmaps etc in JS source and there's no reason it should stop working.
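A sketch of that trade-off for bitmaps: pack k bits per character as code point base + value, then compare character and UTF-8 byte counts. The helpers packBits/unpackBits and the choice of ranges are assumptions for illustration; a real packer would also need to avoid or escape characters such as backslash, backtick and DEL that fall inside the 6-bit ASCII range used here.

```javascript
// Pack an array of 0/1 bits into characters, k bits per character (MSB first),
// as code points base + value. Missing trailing bits are padded with 0.
function packBits(bits, k, base) {
  let out = "";
  for (let i = 0; i < bits.length; i += k) {
    let v = 0;
    for (let j = 0; j < k; j++) v = v * 2 + (bits[i + j] | 0);
    out += String.fromCodePoint(base + v);
  }
  return out;
}

// Inverse of packBits: read each code point back into k bits.
function unpackBits(str, k, base) {
  const bits = [];
  for (const ch of str) {
    const v = ch.codePointAt(0) - base;
    for (let j = k - 1; j >= 0; j--) bits.push((v >> j) & 1);
  }
  return bits;
}

const utf8Bytes = (s) => new TextEncoder().encode(s).length;
const bitmap = Array.from({ length: 120 }, (_, i) => ((i * 7) % 3 === 0 ? 1 : 0));

// 6 bits/char from U+0040 stays in ASCII (1 byte each);
// 12 bits/char from U+4E00 (CJK ideographs) costs 3 bytes each.
for (const [k, base] of [[6, 0x40], [12, 0x4e00]]) {
  const packed = packBits(bitmap, k, base);
  console.log(`${k} bits/char:`, [...packed].length, "chars,", utf8Bytes(packed), "UTF-8 bytes");
}
```

Under code point counting the 12-bit version is half the length; under UTF-8 byte counting it costs more (30 bytes vs 20), yet it still decodes the same way in JS.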
I'm in two minds on this. On the one hand, I feel like it will be a can of worms for compatibility, after having attempted to make changes to the dweet.html file myself. On the other hand, I feel like that file is way more complex than necessary and agree the scope could be a lot cleaner. I did an experiment a while ago removing all of the postMessage stuff and external libraries, and found that the only significantly complex thing that actually needed to be kept was the instrumentation that prevents infinite loops when you are mid-typing something. I dislike the existing solution of injecting code because it requires a whole parser and is quite messy, so I found two other methods:
You could potentially put the essentials into parameters and end up with a global free
This would break a few dweets, but those ones were a bit sus anyway :-) Amusingly, if you passed x, c, t as parameters, that would allow you to have all dweets as functions of the form dweet24232(c,x,t) and call dweets from your dweets. You could do a prescan to only bring in the dweets actually called. I would also like to make clear that I'm not suggesting we actually do this thing, that way madness lies, but imagine the possibilities.
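The "parameters instead of globals" idea can be sketched with the Function constructor: the dweet source becomes the body of a function whose inputs (t, c, x and the shortcuts) are parameters rather than window properties. The parameter order and the names SHORTCUTS and compileDweet are assumptions for illustration; code compiled this way can still read and create real globals.

```javascript
// Shortcut values passed in as parameters S, C, T, R (R mirrors the rgba() helper).
const SHORTCUTS = [
  Math.sin,
  Math.cos,
  Math.tan,
  (r, g, b, a = 1) => `rgba(${r | 0},${g | 0},${b | 0},${a})`,
];

// Compile dweet source into a function of (t, c, x); shortcuts are bound as extra parameters.
function compileDweet(code) {
  const f = new Function("t", "c", "x", "S", "C", "T", "R", code);
  return (t, c, x) => f(t, c, x, ...SHORTCUTS);
}

// Usage with a stand-in canvas and context that just record what the dweet does.
const calls = [];
const c = { width: 1920, height: 1080 };
const x = { fillStyle: "", fillRect: (...args) => calls.push(args) };
const u = compileDweet("x.fillStyle=R(255,99.9,0);x.fillRect(0,0,c.width,c.height)");
u(0, c, x);
console.log(x.fillStyle, calls);
```

Since each compiled dweet is just a function of (t, c, x), one dweet could in principle call another, which is the "dweets calling dweets" possibility mentioned above.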
While a fun idea, and something I think the regular community would enjoy, I do not want to go in the direction of an extending library, for two reasons. As you said, the increased cost of backward compatibility seems like a potential pain, as dweets behave in strange ways. But more importantly, this creates an increasingly complex "Dwitter language" that adds to the confusion of new users. I like the premise to be as simple as possible, so either a small, well-selected set of shorthands or none :) If we want to play with evolving rules, a potential idea would be "seasons", where we do a full change of the ruleset, breaking backward compatibility, every year or two.
I like this idea, only a handful of parameters to the function. Also, if we think 280 without shorthands doesn't allow enough complexity, there's nothing stopping us from doing 310 or something to account for it. In my mind adding shorthands is similar to increasing the limit, while also biasing the cost of functions in a different direction than vanilla js.
Yes absolutely. Having a clear separation of legacy dweets will allow us to serve the old dweet.html file for those dweets, ensuring they will still run, while starting with a clean slate!
Very cool! Web workers would really be the solution. Last time I looked at it, support was lacking and potential workarounds would have had horrendous performance. Worth revisiting at some point!
If we are going to 280 chars and no compression, it would be nice to have some functions be 1 character: Math.random(), Math.abs, new Path2D(). Also a checkbox that applies c.style.filter = 'invert(1)'. The last one is the most important, because it's more of an aesthetic thing than an algorithmic one.
I was hoping we wouldn't go too far down this road, it's inevitable as soon as we start to discuss it everyone will have their favourite builtin they want a shorthand for. Mine would be Math.hypot ... but I don't want to ask for it, because a huge library of shorthands is a bad idea as Lionleaf has already mentioned.
Agreed, and I don't think there's any harm in sticking to the existing set of shorthands; they are a well-balanced selection. For other math functions, not having access to them has forced us to be pretty innovative... Math.random() is a good example, probably the most requested shorthand, but many of us have ended up pulling pseudo-randomness out of the same math we are doing instead. The trig functions are probably among the most widely useful and the most difficult to find alternatives for (just try implementing them yourself). If anything, R() feels like the extra to me, but it's only 1 extra. I like the idea of making a cleaner dweet.html off the back of the change to 280, but we don't have to rip everything up. In my attempt at creating a cleaner, more minimal one, I essentially preserved these custom globals: … I'm ignoring the bunch of dweets that hacked on the FPS object etc. (I've done some), but they are pretty niche and completely dependent on that library, so I'm considering them sacrificial. Use of … [edit] TL;DR If we have to break compatibility anyway, I propose we only subtract and get a really clean scope for … A huge number of the old dweets would actually still run in this new version, which means remixing old dweets will usually work. For all of those exploiting implementation details on the previously polluted scope of …
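One common way to pull pseudo-randomness out of trig instead of Math.random() is to take the fractional part of a large multiple of sin(i). This is a general trick, not necessarily what the dweeters in this thread use, and the constant 1e4 is an arbitrary choice:

```javascript
// Deterministic pseudo-random value from sin: the same i always gives the same value,
// so in a dweet the "random" positions stay stable from frame to frame.
const S = Math.sin;
const pseudoRandom = (i) => {
  const v = S(i) * 1e4;
  return v - Math.floor(v); // fractional part of v
};

// e.g. scatter 9 squares at stable pseudo-random positions:
//   for(i=9;i--;)x.fillRect(pseudoRandom(i)*1920,pseudoRandom(i+.5)*1080,9,9)
console.log([0, 1, 2, 3].map(pseudoRandom));
```

In an actual dweet this usually collapses to an inline expression built from S, which is why the lack of a Math.random() shorthand has been less painful than it sounds.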
Thank you, I appreciate the feedback everyone!
About 280 bytes
About changing the shortcuts/variables
About the infinite loop buster
Crap, you're right! Also, my math was wrong in my earlier estimate, it's … For those wondering, you can test this with the browser-native UTF-8 TextEncoder: … So we would actually have no choice but to decompress them into ASCII if we want all of the "compressed" dweets to be counted under the 280 limit. And for special cases like your Mona dweet, there isn't really any way to force it under the limit.
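A sketch of that TextEncoder check, using the figures from the earlier compatibility check (a 43-character unpacking wrapper and 97 packed characters). The compress helper is a reconstruction from what the wrapper decodes, two ASCII characters per surrogate pair, and is an assumption rather than the actual Dwitter compressor source:

```javascript
// Pack ASCII two characters per surrogate pair, then wrap it in the 43-char unpacker.
function compress(ascii) {
  if (ascii.length % 2) ascii += " "; // pad to an even length (assumption)
  let packed = "";
  for (let i = 0; i < ascii.length; i += 2) {
    packed += String.fromCharCode(
      0xd800 + ascii.charCodeAt(i),    // high surrogate carries the first char
      0xdc00 + ascii.charCodeAt(i + 1) // low surrogate carries the second
    );
  }
  return "eval(unescape(escape`" + packed + "`.replace(/u../g,'')))";
}

const source = "x.fillRect(i,0,1,1);".repeat(10).slice(0, 194); // 194 ASCII chars
const dweet = compress(source);

console.log("code points:", [...dweet].length);                      // 140
console.log("UTF-16 units:", dweet.length);                          // 237
console.log("UTF-8 bytes:", new TextEncoder().encode(dweet).length); // 431
```

Each packed character is an astral code point, so it costs 4 UTF-8 bytes: 43 + 97 × 4 = 431, well over 280, while the unpacked ASCII is only 194 bytes.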
The offscreen canvas API was actually brought about specifically for this purpose.
The idea is that you can do all the calculations and canvas API calls on the worker thread, but ultimately you are operating on the same pixel buffer that's attached to a canvas element in the DOM, so there is no copying pixels back and forth; it's just a pointer. Unfortunately this is still Chrome-only, so it's something we can only switch to in the future once Firefox and Safari catch up. But it would be worth it, because we would get better performance, better usability, and could dispose of unnecessary instrumentation code, because the timing can be measured from the main thread and the worker killed if it takes too long. For the purposes of our discussion, I was just mentioning how the loop-busting stuff was the only complex thing that is necessary once you strip down the dweet.html. I did consider regex, but it's not as thorough as a full parser and will miss cases unless you are also building some kind of AST with the regex... but perhaps we don't need to be that thorough; I'd be interested to see your regex implementation. [edit] I think my point here is that if we move on to 280 bytes, this is a good opportunity to clean up the dweet.html and get a really clean implementation (even if we stick to the official unchanged globals). Then we can load all of the old dweets with the old dweet.html in case they exploit implementation details (or in case, like your Mona dweet, they are actually too big), and then probably 95% of all dweets will remix into the clean implementation without changes anyway. But I realise this is a bit of an aside to the main purpose of this discussion... I just don't see another opportunity to make breaking(-ish) changes in the future.
I'm sure it can be done with web workers, but my concern is running at a smooth 60 fps if the dweet runs fast enough to allow it. The only way I've found to run smooth animation in JavaScript is with requestAnimationFrame, though it is only smooth for the refresh rate of the monitor, which for most is a multiple of 60. It would be a great proof of concept to have a standalone HTML file with the Dwitter shim that allows you to type code into a text area and runs the code via a web worker with the loop protection. That's where I would start. Let me know if you need help digging into it; it sounds like an interesting problem. It may also be possible to produce better error results for the code using the web worker. Imagine if we could highlight in the code what caused the error. Here is my regex; it has served me well and is a good solution if you don't have a full parser available...
Don't worry, the spec is further along than I previously realised: there's now a working draft [0], Chrome has finished it, Firefox [1] and WebKit [2] have in-progress implementations, Firefox has it behind a flag, WebKit has Linux support, and Safari will likely pick it up after WebKit has stabilised. Anyway, this made me feel it was worth exploring more, since we may not have to wait terribly long. My findings so far: it makes for a super clean and minimal dweet implementation, because most of the gnarly details required to make it practical for dwitter.net just evaporate when running the dweet code in a worker, e.g. keeping the scope clean is trivial. Errors don't need to be explicitly handled, just let them throw and use
Yup, I just played with this a bit, it works! This should make errors a lot easier to follow. I'm going to keep iterating, and I'll post my results when it feels complete. I just wish OffscreenCanvas adoption was further along.

[0] https://html.spec.whatwg.org/multipage/canvas.html#the-offscreencanvas-interface
I wrote a web worker that can run dweets. It's very smooth and works well. It's just that the worker thread doesn't have access to the canvas DOM element, so some dweets won't work as expected, i.e. those with commands like … Personally, I'd like to see more shortcuts added to the Dwitter language for the most commonly used items, e.g. drawing rectangles and arcs, and mouse input. Newcomers can still use Canvas commands, but I didn't find it hard using the existing library of commands when coming on board. I like the idea of the code being compact and fitting in a tweet.
I've put some thought into this over the 1000 or so dweets I've published and here's my suggestion, it's very simple.
Change the limit to 280 bytes. There are many benefits to this...
That's my idea! It's pretty simple: just change code length to be counted in bytes and up the limit to 280 (this needs to be changed on both the front end and the back end). Old dweets will still work.
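A minimal sketch of the proposed rule: count a dweet's length in UTF-8 bytes and compare it with a 280-byte limit. The names BYTE_LIMIT and checkDweetLength are hypothetical; whatever language the back end uses would need to count bytes the same way (UTF-8) as the editor's live counter.

```javascript
const BYTE_LIMIT = 280;

// Returns the UTF-8 byte count, how many bytes are left, and whether the dweet fits.
function checkDweetLength(code, limit = BYTE_LIMIT) {
  const bytes = new TextEncoder().encode(code).length;
  return { bytes, remaining: limit - bytes, ok: bytes <= limit };
}

console.log(checkDweetLength("x.fillRect(400+S(t)*300,400,99,99)")); // ASCII: 1 byte per char
console.log(checkDweetLength("\u{1F600}".repeat(71)));              // 71 emoji x 4 bytes = 284, over the limit
```

A negative remaining value is what the editor would show in red.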