New concept for eliminating compression (280 byte limit) #529

Open
KilledByAPixel opened this issue Nov 13, 2021 · 22 comments
Comments

@KilledByAPixel
Collaborator

KilledByAPixel commented Nov 13, 2021

I've put some thought into this over the 1000 or so dweets I've published and here's my suggestion, it's very simple.

Change the limit to 280 bytes. There are many benefits to this...

  • Since bytes are counted rather than characters, unicode character compression would no longer be necessary.
  • Makes it much easier for newcomers to understand what is going on.
  • Legacy dweets would still be allowed, most will probably be below 280 bytes.
  • Special category/icon for dweets that are 140 bytes or less called "half dweets".
  • All current uncompressed dweets would fall into the category of half-dweet.
  • You can sort by full or half dweets.
  • Breathe some new life into dwitter, let's see what else is possible with more code.
  • New challenges to fit things without using compression, maybe there are new tricks we can discover.
  • Get more exposure for dwitter because we would be able to make cooler stuff.
  • We can still keep the uncompress button to show up for legacy dweets. Or maybe it would make more sense to just automatically uncompress them.

That's my idea! It's pretty simple: just change code length to be counted in bytes and up the limit to 280 (needs to be changed on both front and back end). Old dweets will still work.

@KilledByAPixel KilledByAPixel changed the title from "New concept for eliminating compression" to "New concept for eliminating compression (280 byte limit)" Nov 13, 2021
@ThomasBrierley
Collaborator

ThomasBrierley commented Nov 13, 2021

Hi Frank!

I too am interested to see "double dweets" discussed again, and am glad you're the person to resurface it.

I have quite a few thoughts on this, so bear with me 😬

Breathe some new life into dwitter, let's see what else is possible with more code.

I think this may be the strongest argument... 140 Unicode / 194 ASCII has been amazing, but it does feel harder and harder to find new and interesting things to squeeze into it. I'm not completely sure if this is a personal limitation, a temporary plateau, or a fundamental combinatorial limit of 140 characters, but I'm leaning toward the last of these (yes there are 128**140 ≈ 10^295 ASCII combinations, but there is a much smaller subset of JS that actually runs, a smaller subset that results in pixels, and an even smaller subset that is interesting and unique). 280 would not merely double the combinations, it would multiply them by another 128**140, so we are unlikely to stagnate in our lifetime... but it's also not so huge as to feel meaningless.
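
For anyone who wants to check the size of that search space, here is a quick sketch using BigInt. It only counts raw 7-bit ASCII strings of a fixed length, not valid or interesting programs, and the helper name is made up for illustration.

```javascript
// Number of distinct 7-bit ASCII strings of exactly n characters.
const asciiStrings = (n) => 128n ** BigInt(n);

const at140 = asciiStrings(140);
const at280 = asciiStrings(280);

// The decimal digit count gives the order of magnitude:
// 296 digits means the value is on the order of 10^295.
const digits140 = at140.toString().length;
const digits280 = at280.toString().length;

// Doubling the length squares the count rather than doubling it.
const ratioIsSame = at280 / at140 === at140;

console.log(digits140, digits280, ratioIsSame);
```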

The other consideration is the arbitrariness of these limits. We have to pick something, but we should pick one thing... multiple or variable limits feel too arbitrary, because it's no longer a shared limitation. I do believe that the singular limit of Dwitter is part of what makes it so compelling: "do anything on a canvas with JS, in 140 characters", that's it... you know what to do... and you instantly appreciate and understand what has been done.

My point is, while I like the idea of moving onto double dweets, I also think maintaining multiple limits is a bad idea... so this proposal should be a change, not an addition. Though something along the lines of a badge of appreciation for the original 140 limit is a good idea... There is already a kind of understanding and appreciation for uncompressed dweets, <140 dweets.

Since bytes are counted rather than characters, unicode character compression would no longer be necessary.

This is tricky. As soon as we start talking about bytes outside of ASCII and enter the land of Unicode, we have to start talking about encodings, and different encodings consume bytes in different ways. As far as JS is concerned I think we only need to consider two:

Modern text files are stored in UTF-8, and most databases use utf8mb3 or utf8mb4 for Unicode, which is roughly equivalent. Both use one byte for the ASCII part of Unicode, then a variable number of bytes (up to 4) for the code points beyond it. To differentiate them from single-byte ASCII, multi-byte sequences carry special prefixes that indicate the byte length and mark continuation bytes. I won't bother explaining in detail because Wikipedia does a pretty good job. The point is that these prefixes take up codepoint space, so the first 2**16 Unicode points do not fit inside 2 bytes with UTF-8, unlike other encodings.

Internally JS uses UCS-2, which is essentially UTF-16 (don't ask me the difference, I'm not sure anyone can give you a straight answer on this either). UTF-16 is not a byte-oriented encoding, it stores everything in 16-bit words as the name implies: ASCII? still two bytes, unicode <2**16? two bytes, unicode >2**16? now you've got "surrogate pairs" of two 16-bit words, with a bit of codespace given up to mark the surrogates (I lied, the boundary is a bit less than 2**16).

When you compare these side by side, UTF-16 looks a bit silly for most of the chars we care about (ASCII), because it doubles the byte count, but we could probably discount this as it's only an internal representation (in fact I think this is how most JS engines count string length, ignoring empty upper bytes). However there are more subtle differences for the codepoints above ASCII, where some will fit into 2 bytes in UTF-16 but will need 3 bytes in UTF-8.

I'm not 100% sure how the differences turn out empirically when looking at UTF-8 and JS string.length side by side, but I'm pretty sure there will be differences (there are even differences between JS engines for string.length).

... anyway, I find it hard to understand and explain all these subtleties even though I know they exist 😕 so maybe you can see why i'm not convinced we are going to be able to explain them to people who don't even know they exist! (unless we stick with one byte ASCII and say goodbye to emojis and unicode byte packing).

[edit]

I just realised JS string.length is supposed to count UTF-16 words, which is why every character that fits into one UTF-16 word counts as 1 char and characters needing a surrogate pair count as 2 chars... anyway, string.length !== bytes in general, but it is always half the byte count of the UTF-16 encoding.

[edit]

alternatively... we just say bytes are counted in UTF-8, which has basically become the universal storage encoding for text, even though it's not used internally by programs and languages for performance reasons. But "storage" is probably more tangible: a tweet on twitter is no doubt stored as utf8mb4 in a database, a tweet copied into a text file is no doubt saved as UTF-8 on the block device, etc etc.
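
To make the different counts concrete, here is a small sketch that measures the same string four ways. The `measure` helper is only an illustration; `TextEncoder` always produces UTF-8, and spreading a string iterates by code point.

```javascript
const encoder = new TextEncoder(); // always encodes to UTF-8

function measure(str) {
  return {
    utf16Units: str.length,                  // what string.length reports
    codePoints: [...str].length,             // string iteration is by code point
    utf8Bytes: encoder.encode(str).length,   // the proposed byte count
    utf16Bytes: str.length * 2,              // two bytes per UTF-16 code unit
  };
}

// ASCII letter, Latin-1 letter, BMP symbol, astral emoji.
const samples = ["a", "\u00e9", "\u20ac", "\u{1F600}"];
for (const s of samples) console.log(JSON.stringify(s), measure(s));
```

The euro sign is the case mentioned above: two bytes in UTF-16 but three in UTF-8.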

@lionleaf
Owner

Thanks for resurfacing this discussion! I know it's been simmering in the background and it's clear you've spent a lot of time reflecting on this. It's also great to get this input from one of the most prolific dweeters!

Here's my personal stance at the moment:

  • I agree 280 limit would make sense!
  • The exact details on how to count is a separate, but important, discussion. I lean in favor of something that is relatively easy to understand in most circumstances when sticking to ASCII (i.e. 1 ascii == 1 count)
  • Great points on breathing new life into dwitter; a lot of new unexplored techniques!
  • Yes to only having a single active rule set; keeping things focused is valuable.
  • As for legacy dweets, I think something like the badge you suggest makes sense.
  • I also think we could move to a purer JS, by removing the S / C etc shorthands. This breaks legacy dweets, so we would have to render them in a different way, which is fine. Breaking backward compatibility allows us to reconsider the full "language"
  • Even if you count bytes you won't eliminate compression; yes the byte packing tricks of dwitter stop working, but with 280 limit there's more room for compression bootstrapping code, so I bet someone could still find some clever schemes; it's easy enough to make dweets with a subset of the characters from ASCII. But this sounds like a fun challenge for the community; and if there's a new default compression technique that becomes popular we can add tools/detection for it.

All points are open for discussion, but that's where my head is at the moment. Thoughts?

As for implementation; I'd like to get the new frontend across the finish line first, and I'm starting a new job on Monday so the near future is hard to predict. Let's keep the discussion going here and on discord 👍

@ThomasBrierley
Collaborator

ThomasBrierley commented Nov 14, 2021

The exact details on how to count is a separate, but important, discussion. I lean in favor of something that is relatively easy to understand in most circumstances when sticking to ASCII (i.e. 1 ascii == 1 count)

Agreed. This fact alone means that if bytes are to be the new metric, the encoding must be UTF-8, because counting bytes in any other unicode encoding will give more than 1 for ASCII. Which means we are back to two options:

  • UTF-8 Byte Counting (Proposed): Count UTF-8 encoded bytes, ASCII = 1, other = 2/3/4
  • Codepoint Counting (Existing): Count unicode, each code-point (character) = 1

Sticking to codepoint counting would continue to allow the ASCII unicode packing via UTF-16 surrogates, so a 280 limit would hold 2 * (280 - 43) = 474 ASCII characters, which is about 169% of the nominal limit, compared to 194 under the 140 limit, which is only about 139%. The difference comes from the constant overhead of the unpacking code... So pretty sure we don't want this?
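
For readers who have not seen the packing trick, here is a minimal sketch of how it works and where the 194 and 474 figures come from. It assumes the commonly used wrapper shown in `HEAD` and `TAIL` (which adds up to the 43 characters mentioned above); the `pack`, `unpack` and `asciiCapacity` names are made up for illustration.

```javascript
// Assumed wrapper around the packed payload: 21 + 22 = 43 characters.
const HEAD = "eval(unescape(escape`";
const TAIL = "`.replace(/u../g,'')))";

// Pack two ASCII characters into one astral code point (a surrogate pair).
function pack(ascii) {
  if (ascii.length % 2) ascii += " "; // pad to an even length
  let out = "";
  for (let i = 0; i < ascii.length; i += 2) {
    out += String.fromCharCode(
      0xd800 + ascii.charCodeAt(i),     // high surrogate carries the first char
      0xdc00 + ascii.charCodeAt(i + 1)  // low surrogate carries the second char
    );
  }
  return out;
}

// What the wrapper does at run time, minus the eval:
// "%uD8XX%uDCYY" becomes "%XX%YY", which unescape turns back into ASCII.
const unpack = (packed) => unescape(escape(packed).replace(/u../g, ""));

// ASCII characters that fit when every code point counts as 1.
const asciiCapacity = (limit) => 2 * (limit - (HEAD.length + TAIL.length));

console.log(unpack(pack("c.width|=0")));
console.log(asciiCapacity(140), asciiCapacity(280));
```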

The more I think about UTF-8, the more it makes sense. Demos are usually limited in bytes, and by stored size, not size in memory. Text in a modern DB is UTF-8, text in a HTTP request is usually UTF-8 encoded, text in a file is usually UTF-8 encoded, pretty much everything that is not an internal representation for performance is UTF-8... Not to mention pointing at and explaining UTF-8 encoding is way simpler than the headache of UTF-16/UCS-2...

https://en.wikipedia.org/wiki/UTF-8#Encoding

Sorry for the detour.

As for implementation; I'd like to get the new frontend across the finish line first, and I'm starting a new job on Monday so the near future is hard to predict. Let's keep the discussion going here and on discord +1

Understood, no rush whatsoever, and ultimately this is your decision 👍 I'm just glad to see it being discussed. Hope everything goes smoothly for you on Monday.

[edit]

Quick compatibility check... existing "compressed" dweets should be 43 bytes for the ASCII unpacking code + a max of 4 bytes for each of the 97 packed characters when using UTF-8 encoding *4 = 237 bytes max. Which almost achieves what @KilledByAPixel suggests (eliminating compression) by bringing it close to 280 bytes.

But we should eventually check this empirically for cases outside of the usual compression methods.

@KilledByAPixel
Collaborator Author

Hi, a few things...

I also think we could move to a purer JS, by removing the S / C etc shorthands. This breaks legacy dweets, so we would have to render them in a different way, which is fine. Breaking backward compatibility allows us to reconsider the full "language"

It's a great idea but after thinking about it there is another problem, remixing legacy dweets. If someone tries to remix a dweet that uses those shortcuts, how does it work? Even just looking at a legacy dweet is going to be confusing for new users unaware of these shortcuts.

Personally I don't mind the shortcuts as much as the compression. I think a few shortcuts are friendly to newcomers while compression is scary. It would be cool to have vanilla js, but it's really just 4 quick defines. I also appreciate the time savings, i've typed S and C many thousands of times over.

I'd actually suggest one additional shortcut to be introduced, nothing crazy mind you...

H - hsla color, works the same as R but for hsl. This isn't really gonna change what we can do math wise but should result in better color usage. Generative art generally is easier to produce and looks better in hsl.

HSL would be a nice addition whether or not we decide to go to 280 bytes.
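
A rough sketch of what such a shorthand could look like. The signature and the default values for saturation and lightness are assumptions for illustration, not something agreed in this thread.

```javascript
// Hypothetical H shorthand: hue in degrees, saturation and lightness in percent.
// The defaults (100% saturation, 50% lightness, alpha 1) are assumptions.
const H = (h, s = 100, l = 50, a = 1) => `hsla(${h},${s}%,${l}%,${a})`;

// Sweeping the hue gives distinct colors with a single changing argument.
const palette = [0, 120, 240].map((hue) => H(hue));
console.log(palette);
```

In a dweet this would be used like R, for example `x.fillStyle=H(t*60)`.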

Quick compatibility check... existing "compressed" dweets should be 43 bytes for the ASCII unpacking code + a max of 4 bytes for each of the 97 packed characters when using UTF-8 encoding *4 = 237 bytes max. Which almost achieves what @KilledByAPixel suggests (eliminating compression) by bringing it close to 280 bytes.

The way I envision it is like right now, we already added code to auto uncompress dweets. So, compressed dweets would be swapped out with uncompressed code and show the uncompressed byte count, which for most is capped at 194. There can still be a button or something to view the original compressed code, but we don't even need to allow editing that, it should be read only.

It would be great if we stored the uncompressed version of legacy dweets in the database alongside them. I can write a script to spin through the database and pump out JSON of all the uncompressed code. I've been meaning to do that for dweetabase so compressed dweets are searchable.

There are some older dweets that are compressed in different ways that may be above 280, but that's pretty rare. In this case it would show the byte count in red like it currently does.
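
A minimal sketch of what such an export script could look like. It assumes the standard wrapper form matched by `PACKED` and a hypothetical `{ id, code }` record shape; dweets compressed in other ways simply do not match and are skipped.

```javascript
// Assumed shape of the standard packed wrapper; other schemes will not match.
const PACKED = /^eval\(unescape\(escape`([^`]*)`\.replace\(\/u\.\.\/g,''\)\)\)$/u;

function uncompress(code) {
  const match = PACKED.exec(code.trim());
  if (!match) return null; // not compressed, or compressed some other way
  // Same steps the wrapper performs at run time, without the eval.
  return unescape(escape(match[1]).replace(/u../g, ""));
}

// Hypothetical record shape: { id, code }. Returns JSON keyed by dweet id.
function exportUncompressed(dweets) {
  const out = {};
  for (const { id, code } of dweets) {
    const plain = uncompress(code);
    if (plain !== null) out[id] = plain;
  }
  return JSON.stringify(out);
}

// Usage: one packed dweet ("x=1;" as two surrogate pairs) and one plain dweet.
const packed = String.fromCharCode(0xd800 + 120, 0xdc00 + 61, 0xd800 + 49, 0xdc00 + 59);
const sample = "eval(unescape(escape`" + packed + "`.replace(/u../g,'')))";
console.log(exportUncompressed([{ id: 1, code: sample }, { id: 2, code: "c.width|=0" }]));
```

Odd-length sources are usually padded with one extra character when packed, so the recovered code may carry a trailing space.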

@ThomasBrierley
Collaborator

Ah i see! So the change could achieve multiple things... we get more characters to play with, the existing unicode-packed dweets can be made more accessible by simply replacing them with their 194 byte ASCII, obviating the compression toggle idea... and a new ASCII unicode packer will not emerge because now we are counting UTF-8.

However other more interesting compression techniques might continue and emerge that don't obfuscate the source so much.

This feels like a really cohesive change.

@KilledByAPixel
Collaborator Author

KilledByAPixel commented Nov 14, 2021

You got it exactly.

The problem I have with the current compression is it is dumb, because it doesn't actually compress anything. I mean it's a brilliant invention but completely useless outside of dwitter. It's silly that compressed dweets take up more space.

I would love to see people experiment with real compression schemes that are case dependent and potentially useful for other size coding projects. Like for example my way of drawing bitmaps on dwitter will no longer work as it uses a similar unicode trick. So, we'd need to find a new way to pack images.

@lionleaf
Owner

lionleaf commented Nov 14, 2021

The more I think about UTF-8, the more it makes sense.

Sounds sensible to me!

It's a great idea, but after thinking about it there is another problem, remixing legacy dweets. If someone tries to remix a dweet that uses those shortcuts, how does it work?

You raise a valid point, but I'm still not sure we need backward compatibility. As for remixes, we could either disable it, or auto-expand it (which could now be >280 in some cases). It's also possible to have a legacy-remix-mode, but that gets us back to supporting multiple modes.

Another alternative is to add a preamble when remixing since there's now more space:

S=Math.sin;
C=Math.cos;
T=Math.tan;
function R(r,g,b,a){a=a===undefined?1:a;return "rgba("+(r|0)+","+(g|0)+","+(b|0)+","+a+")";}
$REMIX_CODE

Keeping c and x makes sense, though.

I'd actually suggest one additional shortcut to be introduced, nothing crazy mind you...

H - hsla color, works the same as R but for hsl.

Yes, we should revisit the shortcuts; if we're not removing them all, we could certainly consider whether we should add some. Note that this could also break old dweets; if they rely on H being undefined, for instance.

@KilledByAPixel
Collaborator Author

Another alternative is to add a preamble when remixing since there's now more space:

I don't think that would work by just adding it at the top, because so many of the best dweets rely on those shortcuts being set up before the dweet runs. Not as trig functions, but because they are valid variables. It is not a small number: hundreds of dweets use techniques relying on that.

Can you explain more your reasoning about wanting to go vanilla?

Note that this could also break old dweets; if they rely on H being undefined, for instance.

Maybe, but it is unlikely, I would be surprised if it came up. There are some dweets that use the variable H but relying on something being undefined is very unusual for a dweet.

@lionleaf
Owner

Can you explain more your reasoning about wanting to go vanilla?

Since the rulesets are already arbitrary, my bias is to stay as close to vanilla js as possible. Makes it somewhat clearer what's going on for people who know js. But I'll admit there's an allure to the purity itself. It's also a much more consistent approach than adding high-value shortcuts for a semi-arbitrary subset of js. And I think 280 would be enough to make up for the loss of the shortcuts.
Again, I'm not set on this and am open to discussion, so happy to keep exploring the implications.
Having S/C/T shortcuts certainly makes math relatively cheaper, so maybe removing them will be a net negative as it reduces the maximum mathematical complexity possible?

Note that this could also break old dweets; if they rely on H being undefined, for instance.

There are some dweets that use the variable H but relying on something being undefined is very unusual for a dweet.

The scenario that came to mind was variables that are initialized once, by relying on them being undefined for the first iteration. But you're right it's probably not many dweets that would break if we add a few more shortcuts, just mentioning that theoretically there could be.

@Lerc

Lerc commented Nov 14, 2021

I agree with counting by bytes.

When it came to utility functions. Either extend or eliminate. I like either idea. Extending exposes us to feature creep, but that may not necessarily be a bad thing.

If utility functions were added at a slow steady pace, say one per month by vote from a list of candidates, it would allow for the growth of a well considered micro library that might be quite nice to use in general. It would mean an ever growing capability of Dweets, but it would also mean committing to non-vanilla JS. An extending library might require versioning, with Dweets tied to the version that existed at post time, to avoid compatibility problems. It may not be an issue if no Dweets rely on things being undefined at call time.

Taking the path of Vanilla JS I also had the idea of a default preamble of

M=Math;S=M.sin;C=M.cos;T=Math.tan;R=(r,g,b,a=1)=>"rgba("+[r|0,g|0,b|0,a]+")"

note, this version also defines M.

Include the preamble for old Dweets. For new Dweets, just prepend the preamble to the default code and let people reuse the characters.
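
As a sanity check that the short arrow form of R matches the longer definition quoted earlier in the thread, here is a small comparison sketch. The names `R1` and `R2` are only used here to tell the two apart.

```javascript
// Long form, as in the remix preamble suggested earlier.
function R1(r, g, b, a) {
  a = a === undefined ? 1 : a;
  return "rgba(" + (r | 0) + "," + (g | 0) + "," + (b | 0) + "," + a + ")";
}

// Short form: converting the array to a string inserts the commas.
const R2 = (r, g, b, a = 1) => "rgba(" + [r | 0, g | 0, b | 0, a] + ")";

// Compare the two on a few typical argument lists.
const cases = [[255, 0, 0], [12.7, 300, -1, 0.5], [0, 0, 0, 0], [9]];
const same = cases.every((args) => R1(...args) === R2(...args));
console.log(same, R2(255, 0, 0));
```

The two agree for ordinary inputs. They would differ for an explicit `null` alpha, because array-to-string conversion renders `null` as an empty string, but that seems unlikely to matter in practice.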

@KilledByAPixel
Collaborator Author

If you do get rid of the shortcuts, I think it would make sense to get rid of some other variables too like frame, FPS or anything else that pollutes the space. Though I still lean towards keeping everything as it is now.

@ThomasBrierley
Collaborator

Like for example my way of drawing bitmaps on dwitter will no longer work as it uses a similar unicode trick. So, we'd need to find a new way to pack images.

I'm pretty sure that's not the case. You can still stuff bits into a unicode point however you want... how many bytes it ends up as is determined by UTF-8 encoding, but how it's used in JS is completely independent. It will certainly count for more when packing beyond 8bits with byte counting, but it's still an efficient way of encoding bitmaps etc in JS source and there's no reason it should stop working.

If you do get rid of the shortcuts, I think it would make sense to get rid of some other variables too like frame, FPS or anything else that pollutes the space. Though I still lean towards keeping everything as it is now.

I'm in two minds on this. On the one hand I feel like it will be a can of worms for compatibility, after having attempted to make changes to the dweet.html file myself. On the other hand I feel like that file is way more complex than necessary and agree the scope could be a lot cleaner... I did an experiment a while ago removing all of the postmessage stuff and external libraries and found that the only significantly complex thing that was actually necessary to keep was the instrumentation, to prevent infinite loops when you are mid typing something... I dislike the existing solution of injecting code because it requires a whole parser and is quite messy, so I found two other methods:

  1. Offscreen canvas context with webworkers, this is unfortunately chrome only at the moment, but is the ideal solution because you essentially get a separate thread but with a canvas API... this is great because it's also non-blocking for the UI, but more to the point you can monitor the worker's progress on a frame from the main thread, so you can intervene and just kill it without having to inject code.

  2. Intercept all method calls and property setters on the canvas context instance. This is not so ideal, but it catches 90% of cases without injecting code. The other cases are where a loop is being constructed that doesn't yet have any calls to the canvas context, or separate loops, e.g. a raymarcher, that don't call canvas methods at all. This one has quite a few holes obviously, so I'm hoping the offscreen canvas is implemented in Firefox and Safari sometime in the future.
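
The second method could be sketched roughly as below. This is only an illustration of the idea using a `Proxy`, not the code from the experiment; `guardContext`, the time budget and the fake context used in the usage part are all assumptions.

```javascript
// Wrap a canvas context so every method call and property write checks
// how long the current frame has been running.
function guardContext(ctx, budgetMs = 50, now = Date.now) {
  let frameStart = now();
  const check = () => {
    if (now() - frameStart > budgetMs) throw new Error("Frame exceeded time budget");
  };
  const proxy = new Proxy(ctx, {
    get(target, prop) {
      const value = target[prop];
      if (typeof value !== "function") return value;
      // Call with the real context as `this`; native methods require it.
      return (...args) => {
        check();
        return value.apply(target, args);
      };
    },
    set(target, prop, value) {
      check();
      target[prop] = value;
      return true;
    },
  });
  // beginFrame() must be called before each u(t) to reset the timer.
  return { proxy, beginFrame: () => { frameStart = now(); } };
}

// Usage with a stand-in object instead of a real 2D context.
const fakeCtx = { calls: 0, fillStyle: "", fillRect() { this.calls++; } };
const { proxy: x, beginFrame } = guardContext(fakeCtx, 5);

beginFrame();
let stopped = false;
try {
  for (;;) x.fillRect(0, 0, 1, 1); // runaway loop that does touch the context
} catch (e) {
  stopped = true;
}
console.log(stopped);
```

As noted in the list item, a loop that never touches the context is not caught by this approach.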

@Lerc

Lerc commented Nov 15, 2021

If you do get rid of the shortcuts, I think it would make sense to get rid of some other variables too like frame, FPS or anything else that pollutes the space. Though I still lean towards keeping everything as it is now.

You could potentially put the essentials into parameters and end up with a global free

function(c,x,t) {
  ...dweet body...
}

This would break a few dweets, but those ones were a bit sus' anyway :-)

Amusingly, if you passed x,c,t as parameters, that would allow you to have all dweets as functions of the form dweet24232(c,x,t) and call dweets from your dweets. You could do a prescan to only bring in dweets actually called. I would also like to make clear that I'm not suggesting we actually do this thing, that way madness lies, but imagine the possibilities.

@lionleaf
Owner

When it came to utility functions. Either extend or eliminate. I like either idea. Extending exposes us to feature creep, but that may not necessarily be a bad thing.

If utility functions were added at a slow steady pace [...] An extending library might require versioning and Dweets being tied to a version that existed at post time to avoid compatibility.

While a fun idea, and something that I think would be fun for the regular community, I do not want to go in the direction of an extending library for two reasons: As you said, the increased cost of backward compatibility seems like a potential pain, as dweets behave in strange ways. But more importantly, this creates an increasingly complex "dwitter language" that adds to the confusion of new users. I like the premise to be as simple as possible.

So either a small well-selected set of shorthands or none :) If we want to play with evolving rules I think a potential idea would be "seasons" where we do a full change of the ruleset, breaking backward compatibility, every year or two.

If you do get rid of the shortcuts, I think it would make sense to get rid of some other variables too like frame, FPS or anything else that pollutes the space. Though I still lean towards keeping everything as it is now.

You could potentially put the essentials into parameters and end up with a global free

function(c,x,t) {
  ...dweet body...
}

I like this idea, only a handful of parameters to the function. Also, if we think 280 without shorthands doesn't allow enough complexity, there's nothing stopping us from doing 310 or something to account for it. In my mind adding shorthands is similar to increasing the limit, while also biasing the cost of functions in a different direction than vanilla js.

[...] can of worms [...] dweet.html file [...]

Yes absolutely. Having a clear separation of legacy dweets will allow us to serve the old dweet.html file for those dweets, ensuring they will still run, while starting with a clean slate!

injecting code [...] is quite messy, so found two other methods:

  1. Offscreen canvas context with webworkers [...]
  2. Intercept all method calls and property setters on the canvas context instance. [...]

Very cool! Webworkers would really be the solution, last time I looked at it there was lacking support and potential workarounds would have horrendous performance. Worth revisiting at some point!

@rep-movsd
Collaborator

If we are going 280 chars and no compression, it would be nice to have some functions be 1 character - Math.random(), Math.abs, new Path2D()

Also a checkbox that applies c.style.filter = 'invert(1)'

The last one is the most important, because it's more of an aesthetic thing than algorithmic

@ThomasBrierley
Collaborator

ThomasBrierley commented Nov 15, 2021

I was hoping we wouldn't go too far down this road, it's inevitable as soon as we start to discuss it everyone will have their favourite builtin they want a shorthand for. Mine would be Math.hypot ... but I don't want to ask for it, because a huge library of shorthands is a bad idea as Lionleaf has already mentioned.

So either a small well-selected set of shorthands or none :)

Agreed, and I don't think there's any harm in sticking to the existing set of shorthands, they are a well balanced selection. For other math functions, not having access to them has forced us to be pretty innovative... Math.random() is a good example, probably the most requested shorthand, but many of us have ended up pulling pseudo randomness out of the same math we are doing instead. The trig functions are probably one of the most widely useful and the most difficult to find alternatives for (just try implementing it yourself). If anything R() feels like the extra to me, but it's only 1 extra.

I like the idea of making a cleaner dweet.html off the back of the change to 280, but we don't have to rip everything up. In my attempt at creating a cleaner more minimal one I essentially preserved these custom globals: u, c, x, t, S, C, T, R, src, code, frame. I think the only truly questionable ones are src, code, frame and R, but is it worth breaking compatibility to remove them?

I'm ignoring the bunch of dweets that hacked on the FPS object etc (i've done some), but they are pretty niche and completely dependent on that library, so i'm considering them sacrificial. Use of frame on the other hand is far more abstract and widespread, because it's 3 shorter than (t*60|0). Many quines have used code and src, so removing them would definitely destroy compatibility, although once the "instrumentation" is removed, u() source (toString) will be clean, so we could argue those other two variables aren't necessary for quines any more. All the other globals are huge long variable names i'm pretty sure no one has bothered with before.

[edit]

TL;DR

If we have to break compatibility anyway, I propose we only subtract and get a really clean scope for u() that is essentially c, x, S, C, T, R, u(t)... in other words, exactly what it says in the instructions below a new dweet, so we are honouring the original idea, and removing all implementation side effects. Note that those are mutable globals, they cannot be parameters because many dweets reassign them across calls.

A huge number of the old dweets would actually still run in this new version, which means remixing old dweets will usually work. For all of those exploiting implementation details on the previously polluted scope of u(), we could run old dweets with the original code as Lionleaf suggested.
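
To illustrate the point about mutable globals versus parameters, here is a toy sketch. The dweet body is invented for the example: it reuses T as a frame counter after the first call, which only works if the reassignment survives between calls.

```javascript
// A toy dweet body that turns T into a counter on the first call.
// (A function compared with > 0 is false, so the first call sets T to 1.)
const body = "T = T > 0 ? T + 1 : 1; return T;";

// Variant 1: T is a mutable global, as in the current runtime.
globalThis.T = Math.tan;
const uGlobal = new Function("t", body);
const globalRuns = [uGlobal(0), uGlobal(1), uGlobal(2)];

// Variant 2: T is passed in fresh as a parameter on every call,
// so the reassignment is lost when the call returns.
const uParam = new Function("t", "T", body);
const paramRuns = [uParam(0, Math.tan), uParam(1, Math.tan), uParam(2, Math.tan)];

console.log(globalRuns, paramRuns);
```

With the global the counter advances on each call, while with the parameter it restarts every time.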

@KilledByAPixel
Collaborator Author

Thank you, I appreciate the feedback everyone!

About 280 bytes

  • it must be utf-8, i think we have all decided
  • the unicode compression techniques are no longer viable. this includes the image packing system that I have used. that is a good thing! for example the Mona Dweetsa is actually 332 bytes in utf-8. There is actually a lot of wasted space with this type of encoding and there are smarter ways if the goal is fewer bytes versus fewer characters.
  • we can run a script to generate uncompressed code for all the legacy dweets to make them searchable. then just replace the initial code with the uncompressed version if it exists. There could be a button to view the uncompressed code, but i'm not even sure that is necessary

About changing the shortcuts/variables

  • I dislike the idea of removing any variables or shortcuts and am not very keen on adding them either.
  • The fact that there are a few shortcuts seems straightforward to me, shadertoy has a similar thing. Like iDate is not normally available in your shader but for shadertoy it is.
  • The dwitter shim as it stands is not overly complicated, here's one I made that works on 99% of dweets.
  • I've mentioned a few times that adding a preamble to fix legacy dweets is a bit more complex than some have suggested. Many dweets depend on these variables being defined in a certain way. take this dweet by @ThomasBrierley for example, Pavel and I have done similar things. Also quines depend on the original code not being changed.
  • Breaking old dweets and the ability to remix old dweets is going against my goal. I want it to be easier to remix old dweets by having them auto uncompress with more space available.

About the infinite loop buster

  • It's a simple system to inject the loop buster and it has worked well for me. With CapJS I use an even simpler system that just uses a regex and even that works quite well, though I need to update it to respect comments. I doubt a novice user would even notice the loop buster exists.
  • I don't think web workers will be fruitful because they are not intended for real time graphics applications. So the video will not be smooth as if using requestAnimationFrame. One thing you could try to combat that is to build a buffer of a few frames with the web worker that then get updated as needed with a requestAnimationFrame call. I'd be happy to be proven wrong if someone gets it working.
  • It is an interesting idea to wrap the context functions in loop protection but I think probably overkill for this.
  • Let's take a step back though and ask what is really the goal and is loop protection a problem?
  • How about a simple checkbox? "Enable infinite loop protection" which always defaults to true and is not saved with a dweet, but can be disabled while testing. If you hover over that check box it can give you a better explanation of the loop buster.

@ThomasBrierley
Collaborator

ThomasBrierley commented Nov 16, 2021

the unicode compression techniques are no longer viable. this includes the image packing system that I have used. that is a good thing! for example the Mona Dweetsa is actually 332 bytes in utf-8.

Crap you're right! Also my math is wrong in my earlier estimate, it's (140-43)*4+43=431 bytes for a standard "compressed" dweet maxing out the char limit.

For those wondering you can test this with the browser native UTF-8 TextEncoder:
(new TextEncoder()).encode("dweet").length

So we would actually have no choice but to decompress them into ASCII if we want all of the "compressed" dweets to be counted under the 280 limit. And for special cases like your mona dweet, there isn't really any way to force it under the limit.
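As a rough check of the arithmetic above, the sketch below counts UTF-8 bytes with TextEncoder and compares the result against the proposed 280 byte limit. The helper names (`utf8ByteLength`, `fitsByteLimit`) and the sample strings are made up for illustration, and the 4 bytes per packed character figure is simply the assumption used in the estimate above.

```javascript
// Hypothetical helpers; only TextEncoder is a real built-in here.
const BYTE_LIMIT = 280; // limit proposed in this issue

// Number of bytes the string occupies when encoded as UTF-8.
function utf8ByteLength(code) {
  return new TextEncoder().encode(code).length;
}

// True when the code fits within the byte limit.
function fitsByteLimit(code, limit = BYTE_LIMIT) {
  return utf8ByteLength(code) <= limit;
}

// Worst case from the estimate above: 43 ASCII characters of wrapper plus
// 97 packed characters at an assumed 4 bytes each.
const worstCaseBytes = (140 - 43) * 4 + 43;

console.log(utf8ByteLength("dweet"));     // ASCII: one byte per character
console.log(utf8ByteLength("\u{1F600}")); // one astral code point: 4 bytes
console.log(worstCaseBytes, worstCaseBytes <= BYTE_LIMIT);
```

Counting with TextEncoder rather than `String.prototype.length` matters here because `length` counts UTF-16 code units, not bytes.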

@ThomasBrierley
Collaborator

ThomasBrierley commented Nov 16, 2021

I don't think web workers will be fruitful because they are not intended for real time graphics applications. So the video will not be smooth as if using requestAnimationFrame.

The offscreen canvas API was actually brought about specifically for this purpose.

Another way to use the OffscreenCanvas API, is to call transferControlToOffscreen() on a <canvas> element, either on a worker or the main thread, which will return an OffscreenCanvas object from an HTMLCanvasElement object from the main thread. Calling getContext() will then obtain a rendering context from that OffscreenCanvas.

The idea is that you can do all the calculations and canvas API calls on the worker thread, but ultimately you are operating on the same pixel buffer that's attached to a canvas element in the DOM, so there is no copying pixels back and forth, it's just a pointer.

https://developer.mozilla.org/en-US/docs/Web/API/OffscreenCanvas#asynchronous_display_of_frames_produced_by_an_offscreencanvas

But unfortunately this is still Chrome only, so it's something we can switch to only once Firefox and Safari catch up. It would be worth it though, because we would get better performance, better usability, and could dispose of unnecessary instrumentation code, since the timing can be measured from the main thread and the worker killed if it takes too long.
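A minimal sketch of the hand-off described above, assuming a browser with OffscreenCanvas support. The function names, the `"init"` message shape and the idea of exposing only `c` and `x` are illustrative assumptions, not the actual dwitter implementation. The canvas and worker are passed in as arguments so the logic does not depend on browser globals.

```javascript
// Main thread: transfer control of a <canvas> to a worker.
// `canvas` is expected to be an HTMLCanvasElement, `worker` a Worker.
function attachCanvasToWorker(canvas, worker) {
  const offscreen = canvas.transferControlToOffscreen();
  // Listing `offscreen` in the transfer list moves ownership to the worker
  // instead of copying it.
  worker.postMessage({ type: "init", canvas: offscreen }, [offscreen]);
  return offscreen;
}

// Worker side: expose the canvas and its 2D context under the short names
// dweets use. `scope` stands in for the worker's global object (`self`).
function initDweetScope(scope, offscreen) {
  scope.c = offscreen;
  scope.x = offscreen.getContext("2d");
  return scope;
}

// Hypothetical wiring inside the worker file (browser only, not run here):
// self.onmessage = (e) => {
//   if (e.data.type === "init") initDweetScope(self, e.data.canvas);
// };
```

Note that `c` in the worker is an OffscreenCanvas, not a DOM element, so anything that relies on DOM properties of the canvas would need separate handling.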

For the purposes of our discussion, I was just mentioning how the loop busting stuff was the only complex thing that is necessary once you strip down the dweet.html. I did consider regex, but it's not as thorough as a full parser and will miss cases unless you are also building some kind of AST with the regex... but perhaps we don't need to be that thorough. I'd be interested to see your regex implementation.

[edit]

I think my point here is that if we move to 280 bytes, this is a good opportunity to clean up the dweet.html and get a really clean implementation (even if we stick to the official unchanged globals). Then we can load all of the old dweets with the old dweet.html in case they exploit implementation details (or in case, like your Mona dweet, they are actually too big), and probably 95% of all dweets will remix into the clean implementation without changes anyway.

But I realise this is a bit of an aside to the main purpose of this discussion... I just don't see another opportunity to make breaking (ish) changes in the future.

@KilledByAPixel
Collaborator Author

I'm sure it can be done with web workers, but my concern is running at a smooth 60 fps if the dweet runs fast enough to allow it. The only way I've found to run smooth animation in JavaScript is with requestAnimationFrame, though it is only smooth at the refresh rate of the monitor, which for most is a multiple of 60.

It would be a great proof of concept to have a standalone html file with the dwitter shim that allows you to type code into a text area and runs the code via web worker with the loop protection. That's where I would start. Let me know if you need help digging into it, sounds like an interesting problem. It may also be possible to produce better error results for the code using the web worker. Imagine if we could highlight in the code what caused the error.

Here is my regex, it has served me well and is a good solution if you don't have a full parser available...

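// Inject an iteration guard into every for(;;) and while() loop in `code`.
code.replace
(
    // group1: loop header up to the insertion point, group2: next non-space character
    /(for\s*\([^;]*;[^;]*;|while\s*\()\s*(\S)/g, 
    (loopBody, group1, group2)=>
        // leave for...of and for...in loops untouched
        group1 && group2 && !group1.match(/\sof\s|\sin\s/g) ? 
            group1 + `--maxLoop||(e=>{throw"Timed out!"})()` +
            (group2 == ')' ? '' : ',') + group2 : loopBody
);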
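To see what the regex above produces, here is a small sketch that wraps it in a function and runs the instrumented code with an iteration budget. The regex and replacement are copied from the snippet above; the wrapper names (`injectLoopBuster`, `runWithLoopLimit`) and the use of `new Function` with a `maxLoop` parameter are assumptions made for the demo.

```javascript
// Regex and replacement as posted above; the function wrapper is hypothetical.
function injectLoopBuster(code) {
  return code.replace(
    /(for\s*\([^;]*;[^;]*;|while\s*\()\s*(\S)/g,
    (loopBody, group1, group2) =>
      group1 && group2 && !group1.match(/\sof\s|\sin\s/g)
        ? group1 + `--maxLoop||(e=>{throw"Timed out!"})()` +
          (group2 == ")" ? "" : ",") + group2
        : loopBody
  );
}

// Run the instrumented code with `maxLoop` as a parameter. The counter is
// shared by every instrumented loop in the snippet.
function runWithLoopLimit(code, maxLoop = 1e6) {
  return new Function("maxLoop", injectLoopBuster(code))(maxLoop);
}

// Show the rewritten source for a trivially infinite loop.
console.log(injectLoopBuster("while(1);"));

try {
  runWithLoopLimit("while(1);", 1000);
} catch (err) {
  console.log(err); // the string thrown by the injected guard
}
```

As noted earlier in the thread, this approach does not respect comments or strings, so loop-like text inside them would be rewritten too.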

@ThomasBrierley
Collaborator

my concern is running at a smooth 60 fps if the dweet runs fast enough to allow it. The only way I've found to run smooth animation in javascript is with requestAnimationFrame, though it is only smooth for the refresh rate of the monitor which for most is a multiple of 60.

Don't worry, requestAnimationFrame is part of the spec, i.e. you can call it from within a worker after a canvas control has been transferred to it. The spec matches our use case exactly. In fact, looking through the history, one of the originators pushing for it was from Shadertoy.

The spec is further along than I previously realised: there's now a working draft [0], Chrome has finished it, and Firefox [1] and WebKit [2] have in-progress implementations. Firefox has it behind a flag, WebKit has Linux support, and Safari will likely pick it up after WebKit has stabilised. Anyway, this made me feel like it was worth exploring more, since we may not have to wait terribly long.

My findings so far: it makes for a super clean and minimal dweet implementation, because most of the gnarly details required to make it practical for dwitter.net just evaporate when running the dweet code in a worker. E.g. keeping the scope clean is trivial. Errors don't need to be explicitly handled: just let them throw and use the worker.onerror handler, which comes with the correct column number for the dweet source code! Also, the instrumentation is not needed, as I predicted: by sending a message to the main thread upon completion of each frame, the main thread can check the last timestamp in an independent interval and terminate the worker instantly if necessary. The whole thing is so small and fast that I disposed of the idea of updatable iframes via postMessage, i.e. you can just destroy and recreate the whole thing in <1ms while editing.
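The timestamp check described above could look roughly like the sketch below. Everything here is an assumption for illustration: the `createWatchdog` name, the `"frame"` message type, the timeout value and the polling interval. The clock is injectable so the logic can be exercised without a browser.

```javascript
// Hypothetical watchdog: terminates the worker if no frame has been reported
// for longer than `timeoutMs`.
function createWatchdog(worker, options = {}) {
  const { timeoutMs = 1000, now = Date.now, onTimeout = () => {} } = options;
  let lastFrame = now();
  let dead = false;

  // Call whenever the worker reports that it finished a frame.
  function frameDone() {
    lastFrame = now();
  }

  // Call periodically from the main thread; returns false once terminated.
  function check() {
    if (!dead && now() - lastFrame > timeoutMs) {
      dead = true;
      worker.terminate();
      onTimeout();
    }
    return !dead;
  }

  return { frameDone, check };
}

// Hypothetical browser wiring (not run here):
// const worker = new Worker("dweet-worker.js");
// const dog = createWatchdog(worker, { timeoutMs: 1000 });
// worker.onmessage = (e) => { if (e.data.type === "frame") dog.frameDone(); };
// setInterval(dog.check, 100);
```

Because the check runs on the main thread, it still fires when the worker is stuck in an infinite loop and never gets to post another message.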

It may also be possible to produce better error results for the code using the web worker. Imagine if we could highlight in the code what caused the error.

Yup, I just played with this a bit, it works! This should make errors a lot easier to follow.
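One way the highlighting might work, as a sketch only: take the column reported by the worker's error event and draw a caret under the dweet source. The helper name `markErrorColumn` is hypothetical, and the code assumes a single-line dweet, a 1-based `colno`, and that the column refers directly to the dweet source with no wrapper offset.

```javascript
// Hypothetical helper: returns the source followed by a caret line that
// points at the reported column (clamped to the source length).
function markErrorColumn(source, colno) {
  const index = Math.min(Math.max(colno - 1, 0), source.length);
  return source + "\n" + " ".repeat(index) + "^";
}

// Hypothetical browser wiring (not run here); ErrorEvent exposes `message`,
// `lineno` and `colno`:
// worker.onerror = (e) => {
//   console.log(e.message);
//   console.log(markErrorColumn(dweetSource, e.colno));
// };

console.log(markErrorColumn("x.fillRect(", 3));
```

In an editor the same index could drive a text selection or a highlighted span instead of a caret line.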

I'm going to keep iterating, and I'll post my results when it feels complete. I just wish OffscreenCanvas adoption was further along.

[0] https://html.spec.whatwg.org/multipage/canvas.html#the-offscreencanvas-interface
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1390089
[2] https://bugs.webkit.org/show_bug.cgi?id=183720

@jdspugh

jdspugh commented Feb 17, 2022

I wrote a web worker that can run dweets. It's very smooth and works well. It's just that the worker thread doesn't have access to the canvas DOM element, so some dweets won't work as expected, i.e. those with commands like c.style.filter=`invert(`)

Personally I'd like to see more shortcuts added to the dwitter language for the most commonly used items, e.g. drawing rectangles and arcs, and mouse input. Newcomers can still use Canvas commands, but I didn't find it hard using the existing library of commands when coming onboard. I like the idea of the code being compact and fitting in a tweet.
