2022 notes document

Manuel Rego Casasnovas edited this page Feb 20, 2023 · 1 revision

Web Engines Hackfest 2022 notes

Monday, June 13 2022

Talks

Salesforce and the Web Platform (Leo Balter @leobalter)

LWC (Lightweight Web Components framework) - introduced ~3 years ago. Was introduced to make sure that every JavaScript developer can code on Salesforce. Using web standards helps increase productivity for developers; makes it easier for developers to learn how to develop code for Salesforce.

"Work with the web platform, not against it"; don't try to reinvent any wheels. Embrace the web platform, try to contribute. Push the web forward for the benefit of Salesforce customers and the customers' users; also push the Open Web forward, together with TC39, W3C, and WHATWG. Also collaborate with browser implementors.

Work that the Standards and Web Platform team does:

  • address browser breaking changes
  • track browser bugs
  • test early (when standards become recommendations)
  • innovation

Browser breaking changes

A new version of Chrome was released following a WHATWG HTML change, where alert()/confirm()/prompt() stopped working in cross-origin iframes. This caught people by surprise. 1000+ Salesforce customers reported this issue. How to prevent it and to address it when it happens?

Migration plan from LWC: replace window.alert() call with alternative code. Wrote blog post to inform developers of how they can migrate their code. Reached out to Chrome and other browser implementors asking them not to break this API right away. Verbal contract so that Salesforce can give developers a deadline to make changes, without holding the Web back from modernizing.

Another example: performance impact when a particular accessibility feature was enabled in Chromium 96. Tab switching time went from < 1s to > 6s. Had a back and forth with Chrome engineers; had fixes and wrote tests in-house. Working in partnership to move things forward without disabling anything.

Tracking browser bugs

Keep a tracking list; Salesforce reported CSS :host::part(foo) bug; reached out to WebKit, Chromium was the last browser to address it. Partnered with Igalia to get the bug fixed in Chromium. Seems like a small detail, but opens up a lot of doors to not just try workarounds. We don't want workarounds; they stick around too long.

Not just reporting bugs: creating an internal culture at Salesforce that any enterprise should have, where people are encouraged to report, track, and describe the impact of bugs. Sometimes bugs are reported and people forget to create a test case to show the browser vendors what the impact is. This is where our work comes in to try to help educate people in the org to not just do a workaround, but report it to the appropriate browser vendors. Do the work of tracking them and make sure they don't just stall, especially on the Salesforce end. Do the work of describing impact and letting people know what is happening. Has been good before, progress still being made.

Testing early for responsible migration

Understanding that not everything comes fast with standards. Example: mixed shadow mode. Shadow DOM support in 2017 was not fast. Needed more. Couldn't just release LWC using native Shadow DOM at the time if we wanted to keep compatibility. At the time, used a synthetic shadow DOM that we still have today. This is not exactly native, not exactly a polyfill. We have to make a plan to get rid of it. A lot of things have changed since 2017. Creating a way to move smoothly to native Shadow DOM. We want to mitigate the impact on Salesforce users. As of 2022, there is much more browser support for Shadow DOM (except for IE). Hoping to get more usage of native Shadow DOM; coming in as opt-in for users.

Challenges: Not everything goes smoothly with standards work. Created accessibility capabilities with the synthetic shadow DOM that are not available in native shadow DOM. Migration can, therefore, compromise accessibility. We don't want that. How do we ship when some things are missing from native shadow DOM? We want web standard solutions; don't want to reinvent the wheel. AOM (Accessibility Object Model) is not yet available. Complex work, but interesting. Working with Igalia. We don't have our own browser implementors. Igalia comes in with the best solutions we have. We have ideas and use cases, Igalia has expertise. Working on implementing the parts of AOM we want to unlock native shadow DOM. Important for anyone working on Web components. Going well so far.

Driving Salesforce innovation into the web platform

ShadowRealms: a new way to evaluate JS in a synchronous context, creating a new global scope. Alternative to iframes; lightweight. iframes have functionality we don't need and some of it is a dealbreaker.
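A minimal sketch of the synchronous evaluation described above, assuming a runtime that implements the TC39 ShadowRealm proposal (the `ShadowRealm` constructor and its `evaluate()` method come from the proposal). The helper name `evaluateInFreshRealm` and the local `ShadowRealmLike` typing are illustrative assumptions; where ShadowRealm is unavailable the helper returns `undefined`.

```typescript
// Minimal local typing: ShadowRealm is not part of TypeScript's standard lib.
interface ShadowRealmLike {
  evaluate(sourceText: string): unknown;
}

// Hypothetical helper: evaluates source text in a fresh realm, or returns
// undefined when the runtime does not implement the ShadowRealm proposal.
function evaluateInFreshRealm(sourceText: string): unknown {
  const Ctor = (globalThis as any).ShadowRealm as
    | (new () => ShadowRealmLike)
    | undefined;
  if (typeof Ctor !== "function") return undefined;
  const realm = new Ctor();
  // Runs synchronously against the new realm's own global object.
  return realm.evaluate(sourceText);
}

// The assignment lands on the realm's global, not on the caller's.
const realmResult = evaluateInFreshRealm("globalThis.answer = 41; answer + 1");
console.log(realmResult); // 42 where ShadowRealm is supported, otherwise undefined
console.log("answer" in globalThis); // false: the outer global is untouched
```

Only primitives and callables can come back out of `evaluate()`, which is why the sketch returns a number rather than an object.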

Example to illustrate why we need this: Salesforce is a platform that developers use to create things; extensible web application. Imagine a browser IDE with a code editor and extensions and plugins created by different companies. In your platform you can have Salesforce components but also add custom components where the developer/user creates their own components and also imports components from the marketplace ("App Exchange"). When you put together all the components, have to ensure compatibility and integrity. Creating something secure requires having integrity: e.g. components don't mess up each other's global state. Building blocks from different origins - even if the origins are all trusted, still have to make sure they work well together. Lightweight web security framework (membrane system) - today being used on iframes, but now we're ready to use ShadowRealms. Salesforce is sponsoring Igalia to ship ShadowRealms in web browsers. Now implemented in WebKit (Safari technology preview). Fixing some HTML integration issues, but close to being done. We have some tracking on Chrome; since the slides were created, Firefox implementation work has been done and Deno has done some implementation work on node.js.

Ran some quick performance tests for object creation, comparing simple iframe initialization with ShadowRealm minimal initialization for DOM virtualization. Witnessed 13x speedup over iframes. Up to 8x faster using LWS's membranes framework due to memory footprint.

Can get rid of iframes entirely because we will end support for IE11 as of January 2023.

Conclusion

We are not browser implementors, but we love the open web and it's also our responsibility to move the web forward; we are part of it too.

Q&A

Q: "No workarounds" policy - standards take time and companies have their own timelines. How do you deal with that?

A: A goal, not a policy. When we have workarounds, we try our best to make sure they have a deadline for removal. After the deadline, code must be changed to use the standard API. Goal is to observe what is coming in for web standards and make sure we're using it. There is always a risk: e.g. we are early adopters of decorators. It's our responsibility to deal with it today, but we're betting that it will come to the web, so that we don't have to reinvent the wheel. Identifying what the missing features are so we can enable native Shadow DOM.

Q: ShadowRealms avoid any unexpected side effects from global namespace pollution. That said, the current design still allows a [[callable]] boundary. Do you think this is still not ideal and that people can somehow "escape the shadowrealm"? What does this mean for the security guarantees we promise to developers?

A: I don't exactly see how to escape the ShadowRealm. We initially referred to the proposal as "realms" where you can transfer objects. One of the biggest concerns was about transferring objects across realms, so we have this [[callable]] boundary. Objects carry identity for the realm they were created in. When you compare objects, it can leak the other realm and you have identity discontinuity. This can cause issues that we have today with iframes. The [[callable]] boundary today is good enough for us to have a membrane system. We see a big difference in the memory footprint compared to iframes. I believe we can have extensions to it that would make it even better. There are discussions on serialized structuring, which are interesting. We would love to experiment with transferring shared ArrayBuffers, because it makes no sense if you have a ShadowRealm but you can't transfer a shared ArrayBuffer. Ongoing discussion at TC39. Very low-level, but you can use it. Still the same heap, same process. Some of the risks are mitigated. A lot to improve, but it's sufficient to have the whole membranes framework. We already have it tested and running.

There is always a question of why we need ShadowRealms and how we use it, but there are also questions on security. This proposal was beta-tested and was one of the longest proposals to be on stage 2 at TC39. ShadowRealms advanced to stage 3 only after 7 years of discussion, mostly very active. A lot of things were addressed.

Q: Does Salesforce give any recommendations to customers on what browser versions are to be used? Does the component framework have support limits in terms of Chrome versions?

A: We have a compatibility table, but it's supportive of slightly older versions. That's why we still support IE11. Providing that support is good for many of our customers, but we can't keep supporting web browsers that are so old; eventually we stop support.

Q: How do you educate customers? Thousands upon thousands of companies use your product; how do you deal with expectations?

A: Support isn't something only we can drive; it's tailored by Salesforce developers. Salesforce developers can create components, use their own code on the web site. We are a platform; most of it is created by customers. Things rely on our frameworks. Giving support isn't just giving support to end-users, not fully under our control. When we have a browser breaking change, we need to make sure we not only fix it for ourselves, but create a plan for our customers to get ready for the change that is coming. It's a contract. Our support is not only a contract with the end user on the web, but with Salesforce customers in general and their end users. It's a chain of support.

Q: Is there a technical reason why native Shadow DOM wasn't designed with accessibility built in?

A: I was not there; perhaps some people in this room know more. As with all the proposals, you need to identify, step by step, all the building blocks for the new feature you will be releasing. This is happening with ShadowRealms. Lots of things we want to expand, but we need to make sure it works. It's a new concept with a lot of stuff related to it. Web components are building blocks that can be put together and they just work. But they have encapsulated DOM trees, and how they communicate with each other for accessibility is an important feature. It's a feature that doesn't exist today; have to go step by step.

"COLRv1 in Chromium & OSS" (Dominik Röttsches drott@ )

I work at Google on font and text rendering in Chrome; part of the layout team for Chrome.

COLRv1 is a vector format developed by the Blink team and Google fonts team with collaboration from Microsoft. Focus on the foundations of COLRv1 that we use in Chrome being open source. Open source components that enable rendering of COLRv1 in Chrome.

  • Intro to color fonts
  • How does the COLRv1 format work?
  • Use cases
  • Open source
  • Preview: WIP OpenType variations

Color fonts

It's been possible for a long time to colorize fonts. Color fonts means using multiple colors inside a glyph. Can give text more life by using this. Examples: using Material icon set with extra color to enhance meaning of glyphs; Arabic font with colors to simulate ink effects of Arabic calligraphy.

COLRv1 format

Bitmap fonts (Apple, Google formats) have been used to make colorful glyphs. OpenType SVG exists as a format that encodes color fonts. Issues with space usage and implementation complexity. We designed COLRv1 format to address these issues. Efficient at encoding color vector graphics.

Glyphs are a DAG of paint operations. Rendering a glyph means traversing the graph and executing 2D graphics operations on each node.

Example: we have a graph of paint operations for each glyph. PaintRotate, PaintGlyph, then PaintRadialGradient or PaintColrGlyph. There must be only one path through the graph. First rotate the drawing canvas, then apply the PaintGlyph operation and then fill the glyph. End up with a star shape filled with a gradient.

If we change the graph a little bit, we can replace the gradient with the contents of a different glyph in the font (re-use something inside the font).
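The traversal described above can be sketched as follows. This is an illustration, not the Skia or FreeType implementation: the node shapes are simplified and hypothetical (the real COLRv1 tables carry more fields), and the function records the 2D operations a renderer would issue instead of calling a graphics library.

```typescript
// Simplified, hypothetical node shapes named after the paint operations above.
type Paint =
  | { kind: "PaintRotate"; angleDegrees: number; child: Paint }
  | { kind: "PaintGlyph"; glyphId: number; child: Paint }
  | { kind: "PaintRadialGradient"; stops: string[] }
  | { kind: "PaintColrGlyph"; glyphId: number };

// Walks the graph depth-first and appends one entry per 2D operation.
// `colorGlyphs` maps a color glyph ID to the root of its paint graph.
function renderPaint(
  paint: Paint,
  colorGlyphs: Record<number, Paint>,
  ops: string[] = []
): string[] {
  switch (paint.kind) {
    case "PaintRotate":
      ops.push(`save; rotate(${paint.angleDegrees})`);
      renderPaint(paint.child, colorGlyphs, ops);
      ops.push("restore");
      break;
    case "PaintGlyph":
      // The glyph outline clips whatever its child paints.
      ops.push(`save; clip(outline of glyph ${paint.glyphId})`);
      renderPaint(paint.child, colorGlyphs, ops);
      ops.push("restore");
      break;
    case "PaintRadialGradient":
      ops.push(`fill(radial gradient ${paint.stops.join(" -> ")})`);
      break;
    case "PaintColrGlyph": {
      // Re-use the contents of another color glyph in the same font.
      const root = colorGlyphs[paint.glyphId];
      if (root === undefined) throw new Error(`no color glyph ${paint.glyphId}`);
      renderPaint(root, colorGlyphs, ops);
      break;
    }
  }
  return ops;
}

// Star shape filled with a gradient: rotate, then glyph outline, then fill.
const star: Paint = {
  kind: "PaintRotate",
  angleDegrees: 15,
  child: {
    kind: "PaintGlyph",
    glyphId: 7,
    child: { kind: "PaintRadialGradient", stops: ["yellow", "red"] },
  },
};
const starOps = renderPaint(star, {});

// Same graph, but the fill is replaced by the contents of color glyph 42.
const sharedGlyphs: Record<number, Paint> = {
  42: {
    kind: "PaintGlyph",
    glyphId: 8,
    child: { kind: "PaintRadialGradient", stops: ["blue", "white"] },
  },
};
const reuse: Paint = {
  kind: "PaintRotate",
  angleDegrees: 15,
  child: { kind: "PaintGlyph", glyphId: 7, child: { kind: "PaintColrGlyph", glyphId: 42 } },
};
const reuseOps = renderPaint(reuse, sharedGlyphs);
console.log(starOps.length, reuseOps.length); // 5 7
```

The sketch assumes the input really is acyclic; it does not guard against a glyph that refers back to itself.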

Long list of possible operations; different ways to bring color and filling operations to glyphs. Other operations are geometric transformations (transform, scale, rotate). Each operation has a variable counterpart to integrate with variable fonts (ones that allow users to apply parameters to them to change the representation of the glyphs).

Shape reuse: important for emoji fonts where many variations exist on the same glyph. This helps with space efficiency. Repeated shapes can be re-used and don't need to be re-encoded.

New Web use cases

Most important problem for fonts on the Web is size, esp. with large bitmapped fonts (graphics/visual fonts). Compared to the full Noto emoji set, we brought down the size from 9 MB to 2 MB. At the same time, rendering fidelity is higher because we aren't scaling bitmapped fonts up and down, but rather doing a vector rasterization.

If you can bring down the font size, it becomes possible to use a font natively for emojis on the web. With WhatsApp on the web, emoji are replaced with inline images, which complicates text handling (e.g. copy and paste). Want to use a font instead of images so that copy and paste is easier. Can do more interesting things with headlines and graphical elements.

Open source aspects

COLRv1 has been available in Chrome since milestone 98. We released a blog post. In Chrome version 100, we additionally released font palettes. CSS styling together with COLRv1 allows you to change colors used in emojis; powerful tool for enhancing text on page.

COLRv1 is a new format and we can't expect system rasterizers to support it yet. How do we ship it on all platforms nevertheless? We use a concept we call "hybrid font stack" in Chrome. Download a web font, decide which features it uses. Either use the system rasterizer, or the open source rasterizer stack (Skia/FreeType).

Open source foundations: Chrome's font stack (loads/displays web fonts) depends on Skia graphics library (traverses the graph and executes graphics ops), which depends on FreeType library (understands binary format and can provide compact representations). Released in FreeType version 2.11.0; matches Chrome 98.

When you introduce a new font format, what do you test with? Needed tools to create these fonts at the same time. Similar open source stacks for making fonts: nanoemoji on top of fonttools or picosvg.

nanoemoji takes SVG source images and metadata and outputs a font, usually a COLRv1 font, but it can also output OpenType SVG. Intention to use it to compare other formats to COLRv1 in terms of size and performance. fonttools is used to convert the font to the binary representation, including the font info in the actual font binary. picosvg is for flattening the original SVG source images and turning them into COLRv1 glyphs.

Going from SVG to COLRv1: design tools tend to provide SVGs and create complex SVGs. Verbose encoding, unwanted metadata, complex references. These don't lend themselves well to the COLRv1 pipeline. We developed picosvg to canonicalize and simplify SVGs before turning them into COLRv1 fonts.

In the second stage, nanoemoji with these prepared SVGs puts everything together and generates paint operations, outputting an OpenType font.

(Demo of nanoemoji in action - command-line tools)

Similarly, there is an open-source tool called FontGoggles that uses Python, nanoemoji, fonttools, COLRv1 to display fonts.

(Demo of FontGoggles - GUI)

(Demo of Bungee font using COLRv1 gradients)

Conclusion

We've seen which open-source libraries provide the foundation for COLRv1 in Chrome and which open-source tools help create fonts. We've looked at the use cases for these fonts and seen the COLRv1 internals, open-source libraries for rasterization.

Q&A

Q: How can this work with CSS?

A: To use a COLRv1 font, you don't need to do anything except specify the font face URL. If the browser supports it, you get this enhancement. How do you feature-detect/make your CSS so that it looks different depending on support or lack thereof for COLRv1? Can list this font first in the source line; browser uses it if available, falls back otherwise. Also working on feature detection so you can use styles conditionally. Font palette - actual colors we use in the font are in a separate table called the CPAL table. If color is used in a glyph, it's usually just an index into this table. CSS allows you to change the palette and colorize font.
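The palette indirection mentioned in this answer can be sketched as a plain lookup. The data below is a hypothetical, simplified stand-in for a CPAL table (not its binary layout), and `resolveColor` is an illustrative name; the `overrides` parameter loosely mirrors the idea of changing palette entries from a stylesheet.

```typescript
// Hypothetical palettes: each is a list of colors, and glyph paints refer to
// entries by index rather than by value.
const cpalPalettes: string[][] = [
  ["#ffcc00", "#5b3a00"], // palette 0
  ["#8d5524", "#2b1700"], // palette 1
];

// Looks up the color for a palette entry, letting callers override
// individual entries without touching the glyph data.
function resolveColor(
  paletteIndex: number,
  entryIndex: number,
  overrides: Record<number, string> = {}
): string {
  const overridden = overrides[entryIndex];
  if (overridden !== undefined) return overridden;
  const color = (cpalPalettes[paletteIndex] ?? [])[entryIndex];
  if (color === undefined) throw new RangeError("no such palette entry");
  return color;
}

const glyphFillEntry = 0; // what a glyph's paint would store
console.log(resolveColor(0, glyphFillEntry)); // "#ffcc00"
console.log(resolveColor(1, glyphFillEntry)); // "#8d5524"
console.log(resolveColor(1, glyphFillEntry, { 0: "#ff00ff" })); // "#ff00ff"
```

Because glyphs store only the index, switching palettes or overriding an entry recolors every glyph that uses it without re-encoding any shapes.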

Q: How does browser compatibility work?

A: Currently only supported in Chrome. Can use Skia/WebAssembly to display color fonts in other browsers. OpenType SVG is supported in Safari, so it can be used as an alternative. Graphical capabilities are similar. Hoping for more support, receiving positive signals. Microsoft released it in Edge as well, so, not only Chrome. Mozilla may adopt.

Q: What happens with text-to-speech?

A: It works with text-to-speech because it's text at the bottom. I'm not sure what screen readers do with emoji. In Unicode, every emoji has a name. Copy/pasting it into a text-only editor will copy as text. So screen readers can read it.

Q: Can you combine variable fonts and Web Animations API?

A: Only indirectly. I think you can put the font variation settings in CSS into styles that are used as CSS animations. There are performance issues when you animate text. Not the most efficient way to do animations.

Q: Did you find rasterization performance problems for complex fonts?

A: An emoji taken from a photo and quickly vectorized becomes a complex shape. Not efficient to resize. With standard emojis, we don't see performance issues. If scaling is added, then it's faster with vector graphics.

Q: You mentioned that you're using Skia and FreeType that provides the base information for the rasterization part. Is Skia strictly needed? Can it be replaced by something else that does rasterization?

A: Yes; the idea when we designed the format was "what can we use as a common denominator that's in most of the graphics libraries?" Also looked at Cairo, Direct2D... Paint operations can be implemented with any 2D graphics library that could also be used for implementing CSS. Operations are semantically similar to CSS concepts. The FontGoggles font viewing tool, experimentally, has four different back-ends (Cairo, Skia, CoreGraphics...), so we've shown that this works.

Modernizing Internationalization in Gecko and SpiderMonkey (Daniel Minor)

What is internationalization?

Part of a group of related ideas: i18n, translation, l10n. Talk about translation first.

Translation: not easy, but the idea is easy to understand.

Localization: Translation and cultural adaptation. Not purely linguistic factors but also appropriate use of colors, symbols, images. Example: Mozilla support site was using a cartoon cat nurse. In Japanese culture, this had some unintended erotic connotations and the image had to be hidden for Japanese locales.

Internationalization: no accepted definition, but consists of all the things that can be done by computer. For example: dates, times, currency formatting. Enables localization; reduces work for translators. Driven by data represented in standard formats.

  • Examples: numbers, number groupings, date/time, calendars, lists, pluralization rules, directionality...

ECMA-402: set of APIs that implement i18n in JS. Create a formatter by specifying a locale and options; the API provides a format() method that does the work.
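The "locale plus options, then format()" pattern looks like this with two of the ECMA-402 formatters. The sample values are arbitrary, and the exact output assumes a runtime that ships locale data for the requested locales.

```typescript
// Create a formatter by specifying a locale, then let format() do the work.
const sampleAmount = 1234567.891;
const usNumber = new Intl.NumberFormat("en-US").format(sampleAmount);
const deNumber = new Intl.NumberFormat("de-DE").format(sampleAmount);
console.log(usNumber); // "1,234,567.891"
console.log(deNumber); // "1.234.567,891"

// Options select what is shown; the locale data decides how it is written.
const dateOptions: Intl.DateTimeFormatOptions = {
  year: "numeric",
  month: "long",
  day: "numeric",
  timeZone: "UTC",
};
const hackfestDay = new Date(Date.UTC(2022, 5, 13)); // June 13, 2022
const usDate = new Intl.DateTimeFormat("en-US", dateOptions).format(hackfestDay);
const esDate = new Intl.DateTimeFormat("es", dateOptions).format(hackfestDay);
console.log(usDate); // "June 13, 2022"
console.log(esDate);
```

The calling code never contains grouping separators, month names, or field order; those come from the locale data, which is what reduces work for translators.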

Why do we care? Part of Mozilla's manifesto: Internet is global, must be accessible to all. An English-only Web is not accessible to all. Ensure people can access the web in their own languages by localizing the browser; ECMA-402 provides tools for that goal.

Deep dive: text segmentation

Text segmentation: the process of chunking text into meaningful units (on character, word, or line boundaries).

Grapheme breaking: graphemes are units rendered to screen, code points encode characters. Not always a 1:1 mapping. A thumbs up sign with a skin tone is a single grapheme made of multiple code points (the base emoji plus a color modifier).

Segment by grapheme => thumbs up stays together; segment by codepoint => you get a list with separate characters for the thumbs up and color modifiers.

Word breaking: depends on language, different languages have different rules. Spanish: can just use spaces, can be done by anyone familiar with a European language. Japanese: word boundaries aren't what you would expect if you don't know Japanese.

Intl.Segmenter: stage 4 proposal for ECMA-402, already implemented in Chromium and Safari. Use case: implementing text editors in JS. Includes grapheme and word segmenting but not line breaking.
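A small sketch of the grapheme and word segmentation described above, assuming a runtime that provides `Intl.Segmenter`. The local `SegmenterCtor` typing is only there so the sketch compiles where the TypeScript lib predates the Segmenter typings; it is not part of the API.

```typescript
// Local, minimal typing (an assumption for compile-time convenience only).
type Segment = { segment: string; isWordLike?: boolean };
type SegmenterCtor = new (
  locale: string,
  options: { granularity: "grapheme" | "word" }
) => { segment(input: string): Iterable<Segment> };
const Segmenter = (Intl as any).Segmenter as SegmenterCtor;

// Thumbs up followed by a medium skin tone modifier: two code points.
const thumbsUp = "\u{1F44D}\u{1F3FD}";

// Splitting by code point separates the base emoji from its modifier.
const codePoints = Array.from(thumbsUp);
// Splitting by grapheme keeps them together as one rendered unit.
const graphemes = Array.from(
  new Segmenter("en", { granularity: "grapheme" }).segment(thumbsUp),
  (s) => s.segment
);
console.log(codePoints.length, graphemes.length); // 2 1

// Word granularity also reports punctuation and spaces as segments;
// isWordLike filters those out.
const spanishWords = Array.from(
  new Segmenter("es", { granularity: "word" }).segment("Hola, mundo")
)
  .filter((s) => s.isWordLike)
  .map((s) => s.segment);
console.log(spanishWords); // ["Hola", "mundo"]
```

Passing a locale such as "ja" uses the same calls; the resulting boundaries then depend on the segmentation data the runtime ships, which is why no output is shown for that case.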

Localization in the browser

ECMA-402 allows developers to use JS to localize the web, but the browser itself also needs to be localized. Firefox ships 133 locales. Project Fluent: developers write in English, translators produce appropriate translations for more complex languages (even if language has grammatical features not present in English).

For translators: Pontoon; Mozilla l10n is mostly done by volunteers, not necessarily technically minded. UI presents English source text and allows translator to edit target language text. Example: Lithuanian has more plural categories than English; Fluent enables this.
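The difference in plural categories can be observed directly with `Intl.PluralRules` from ECMA-402. This sketch only prints the category each number falls into; it does not show how Fluent itself selects message variants.

```typescript
const enRules = new Intl.PluralRules("en");
const ltRules = new Intl.PluralRules("lt");

const sampleCounts = [1, 2, 10, 21];
// English only distinguishes "one" from "other".
const enCategories = sampleCounts.map((n) => enRules.select(n));
// Lithuanian also uses "few", and 21 counts as "one" again.
const ltCategories = sampleCounts.map((n) => ltRules.select(n));

console.log(enCategories); // ["one", "other", "other", "other"]
console.log(ltCategories); // ["one", "few", "other", "one"]
```

A localization system needs one message variant per category the target language uses, which is why an English source string alone is not enough.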

Fluent also needs to localize numbers and dates; built-in functions based on ECMA-402 APIs.

Implementation

ICU4C - International Components for Unicode for C. Large C/C++ lib first released in 1999. Used broadly. Provides a lot of functionality, but monolithic. Hard to remove code or data you don't want to use; large dependency for browser. API mismatch with ECMA-402.

At the start of 2021 Firefox had 3 different i18n impls: SpiderMonkey, Fluent, and Gecko. Lots of code duplication; inconsistent results in SpiderMonkey vs. Gecko (different number and date formatting, for example).

To solve this, moved all ICU calls to a single library shared by SpiderMonkey and Gecko. Developed interfaces based on ECMA-402 to hide complexity from developers. Largely used existing SpiderMonkey code because it's well-tested thanks to test262.

Results: got rid of duplicated code, almost no regressions, made full ECMA-402 APIs available, can experiment easily with other i18n libs.

Experimenting with ICU4X

Reimplementation of ICU4C -- smaller, faster; makes data easily separable. APIs more similar to ECMA-402. Initial impl in Rust, FFI for other languages. Under dev by Google and Mozilla since 2020.

June 2021: ran experiments with ICU4X in SpiderMonkey with 4 commonly used APIs. Studied performance, memory usage, correctness. ICU4X was significantly faster and used significantly less memory.

Ran correctness tests using test262. Some failures, mostly around locale canonicalization. This is basically edge cases on top of edge cases to handle things like countries that emerged from the dissolution of the Soviet Union.

Unexpected results: Intl.NumberFormat memory use was worse with ICU4X -- objects are smaller and thus being GCed less frequently. Side-effect of integration with SpiderMonkey; not an intrinsic problem with ICU4X.

Overall time/space usage is much better, but need to be cautious; some impls are incomplete and integration was a little hacky. Would be interesting to benchmark again. We might see better performance after 1.0 release. Results were promising enough to continue involvement. Mozilla focused on Collator, DateTimeFormat, Segmenter.

ICU4X Text Segmentation

ICU4C not suitable for Firefox segmentation, b/c Firefox layout engine doesn't use it. Need to adjust line breaking results according to CSS properties; combinatorial explosion with ICU4C data set. Want consistent results. Segmentation is a good use case for experimenting with ICU4X -- not just a faster version of ICU4C, but does stuff we can't currently do. Can give us consistent segmentation across platforms (layout impl is platform-specific).

Implemented 3 segmentation models: rule-based (based on Unicode-defined algorithms), dictionary-based (look up codepoint in tree), neural-network-based models (developed by Google, LSTM models; space/time trade-off).

Integrating ICU4X segmenter: will do after ICU4X 1.0 release, already refactored layout code to have an ECMA-402 like interface. Need to figure out how to handle data packaging in Firefox.

Will write microbenchmarks for word and line breaking. Need to analyze impact on code and data size. Layout has some platform-dependent bits; Linux is the least functional, so the lowest risk but greatest potential reward. Once we're happy with that, will implement the segmenter; have a WIP impl already using the old version of the spec.

Conclusions

  • Intl is important for browser and JS
  • Simplified code base significantly
  • Set ourselves up to try new implementations

Q&A

Q: You mentioned message format --

A: I think I left that out. It's something we're participating in standardizing; moving from Fluent (developed/maintained by Mozilla) to an industry-wide standard.

Q: You plan to port stuff currently in Fluent into message format. I know a bunch of other orgs also do something similar to Fluent. Is there an initiative to try to get more orgs to contribute their existing databases into the Unicode Consortium?

A: You mean for tooling --

Q: For the data. To add to the CLDR data --

A: That's a good question. I haven't really been involved. The process can be a little bit slow. I think CLDR is quite good for large/mainstream languages, but for smaller languages, people pushing for it don't necessarily have the same connections or ability to get things into CLDR. Minority languages would be helped by better support for this.

Q: Does using ICU4X mean that all the engines would have a common denominator when it comes to the implementation of i18n?

A: Excellent question. We're working with Google and they've been focusing more on other products but they're interested in the results we see in SpiderMonkey. I see it becoming a common denominator, but not necessarily the easiest thing to build upon.

Q: Will it be possible to back an ECMA-402 implementation using ICU4X 1.0 only?

A: More of a minimum viable product release, not a feature-complete release. There's a long tail of things that need to get fixed. Trying to get something functional early on. For SpiderMonkey, we wanted to focus on Segmenter because it's a relatively small API. For things like NumberFormat and DateTimeFormat, it'll be 6-12 months of testing/etc.

Q: You hinted at the Segmenter Version 2 proposal, which would add line breaking. I hear there is a lot of divisiveness in that proposal. Can you explain?

A: That's also what I've heard. Competing proposals as to how to best implement that. From the Mozilla POV, we need to implement line breaking anyway, so that was important to us.

Q: Did you ever talk to Apple about WebKit integration?

A: Not yet. So far, just Google and Mozilla, but we're hoping other groups will be interested and contribute to this.

Q: Are you exposing ICU4X as a plain C API and then building more idiomatic C++ bindings on top of it? Or will you expose other languages directly on top of Rust?

A: We have a tool called Diplomat, written at Google, that generates C/C++ interfaces from a Rust defn. Plans to add Java and Swift in the near future. Two-pronged approach: generate idiomatic interfaces with Diplomat, or can generate C FFI and use that to build your own.

WebXR and Augmented Reality (Ada Rose Cannon @adarosecannon ) (virtual talk)

I'm here to talk to you about VR and AR on the Web. I'm a developer advocate for the Samsung Internet Browser; can be found on the Play Store. I think it's a pretty good browser, works on all Android devices. Privacy-focused: tracker blocking, redirect to HTTPS, ability to add ad blockers. Also supports WebXR, the API I'll talk to you about today.

I'm also co-chair of the W3C Immersive Web groups -- responsible for designing APIs for immersive hardware on the web. All work is done in the open -- you can read the issues and pull requests to see what's going on, as well as minutes from previous meetings. You can also check out explainers for different modules -- go really in-depth into what the modules do and how they work.

Core work is in the WebXR API. Core module -- others plug into it. This is the basis of the work. Render a scene from the user's point of view using [...] Do this at the frame rate of the immersive device itself; create the illusion of VR or AR. Designed with progressive enhancement in mind. When you're building a scene, tell the web browser which WebXR features are required; will fail if these aren't available. Others are optional: use if available, but if not, the scene can work without it. By working in this mindset, you can support the widest possible amount of hardware.
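A sketch of the required/optional split described above, using the `requestSession()` call and the `requiredFeatures` / `optionalFeatures` options from the WebXR Device API. The interfaces are minimal local stand-ins for the real typings, `requestArSession` is a hypothetical helper, and the particular choice of features is only an example.

```typescript
// Minimal local typings for the parts of the API used here; a real project
// would normally use published WebXR type definitions instead.
interface XRSessionInitLike {
  requiredFeatures?: string[];
  optionalFeatures?: string[];
}
interface XRSystemLike {
  requestSession(
    mode: "immersive-vr" | "immersive-ar",
    init?: XRSessionInitLike
  ): Promise<unknown>;
}

// Hypothetical helper: this scene cannot work without hit testing, but it
// merely improves when anchors or lighting estimation are available.
function requestArSession(xr: XRSystemLike): Promise<unknown> {
  return xr.requestSession("immersive-ar", {
    requiredFeatures: ["hit-test"],
    optionalFeatures: ["anchors", "light-estimation"],
  });
}
```

In a browser the argument would be `navigator.xr`, called from a user gesture. The returned promise rejects when a required feature is unavailable, while missing optional features just leave the session without them.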

Immersive VR render mode is used to render on devices that support a paint display. This was the mode that was defined first in the API. Later we supported augmented reality devices, e.g. AR headsets. This also includes mobile phones. We don't actually provide the real-world sensing or any features you'd expect from AR, only the ability to provide some 3-D content on top of the world. World sensing is taken care of by other modules. Most platforms that support AR [...]

The WebXR stack kind of looks like this from the bottom up: on the bottom, raw hardware on the device itself. Some kind of screen, semi-transparent or opaque. A camera; depth sensors sometimes as well. Generally, the raw feeds from the camera aren't directly exposed to WebXR or the native API; they are combined with machine learning to provide a higher-level API. Next level is the native API (ARCore or OpenXR). It takes raw sensor info from the hardware and combines it with ML to provide as much info about the environment as possible in a format that's easier for devs to consume.

WebXR is built on top of this API. Features supported by multiple platforms -- don't tend to support features supported by only a single platform (don't want to encourage that). Info might be a bit tweaked to make it easier for developers; might be fuzzed or slightly simplified to protect user privacy. Have to walk a careful line between the power devs want, and user privacy/security. Lots of pressure from devs to give them as much power as possible so they can build a good experience, but also have to maintain the level of security/privacy users expect.

As APIs go, WebXR is pretty friendly; good explainers, not too bad to use in vanilla JS. Unfortunately, pretty much entirely tied to WebGL/WebGPU, because that's how you render; and those aren't as friendly. You'll need some kind of framework, like three.js (or a few others). Because WebXR is comparatively not that complex (vs. WebGL), it often gets bundled into the WebGL libraries themselves. There are abstraction libraries on top of the JS libraries, like A-Frame, a web component wrapper around three.js. Write using custom HTML elements; this is what I use the most. A nice way to work with WebXR.

The point of this talk is to introduce different WebXR modules and the features they bring:

  • gamepad, hand input
  • AR, hit test, anchors, lighting estimation, DOM overlay
  • layers

These are the ones you can use today.

The first module helps devs build experiences that closely match the user's real hardware: the XR gamepad API. Usually for headsets, not phones (no external controller for phones). Through WebXR, there's an input source object that tells you the position and rotation of different parts of the controller. It also has a gamepad object that exposes the various buttons and joysticks through the API. It looks exactly the same as the objects you get out of the Gamepad API, but it's separate: XR controllers aren't exposed through the regular Gamepad API, to avoid confusion. Don't want devs to mix up different types of controllers. If you've used the Gamepad API, you know it's kind of tricky to use; different platforms expose different buttons. Different hardware often has similar-looking buttons that function differently. We encourage browsers and hardware manufacturers to have standard mappings so developers can easily implement consistent behavior. New hardware should hopefully "just work" in an expected way. That's not good enough for people who want to add great support for different hardware; we have a new tool for them.
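As a rough illustration of reading button state from an XR input source, here is a minimal TypeScript sketch. The interfaces are simplified stand-ins for the parts of `XRInputSource` and `Gamepad` used here, and `pressedButtonIndices` is a hypothetical helper name, not part of WebXR.

```typescript
// Minimal structural types mirroring the parts of XRInputSource and Gamepad
// used in this sketch, so it also type-checks outside a browser.
interface ButtonLike { pressed: boolean; value: number; }
interface GamepadLike { buttons: readonly ButtonLike[]; axes: readonly number[]; }
interface InputSourceLike { handedness: string; gamepad?: GamepadLike | null; }

// Hypothetical helper: indices of the buttons currently pressed on one controller.
function pressedButtonIndices(source: InputSourceLike): number[] {
  const gamepad = source.gamepad;
  if (!gamepad) return []; // some input sources (e.g. hands) have no gamepad
  const pressed: number[] = [];
  gamepad.buttons.forEach((button, index) => {
    if (button.pressed) pressed.push(index);
  });
  return pressed;
}
```

In a real session you would typically call something like this once per frame for each entry in `session.inputSources`.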

WebXR input profiles

3D models of gamepads, descriptions in JSON of how the physical buttons on the hardware map onto the buttons and axes of the gamepad API. You can use the profile info from the controller to find which 3D model and which JSON file to use. You can then combine these with the gamepad object from the XR input source, which lets you render the correct 3D model for what the user is currently holding, with the user's interactions represented on the controller. You can use this info to work out the name of the specific button the user is pressing. It works really well, is built into three.js.
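The lookup described above can be sketched as follows. The `ProfileLike` shape is a deliberately simplified, hypothetical stand-in for the real profile JSON (which carries more detail), and `vendor-x-controller` in the usage note is a made-up profile id; only the idea of trying the profile ids in order is taken from the talk and the WebXR input source's `profiles` list.

```typescript
// Simplified, hypothetical profile shape: component name -> gamepad button index.
interface ProfileLike {
  profileId: string;
  buttons: Record<string, number>;
}

// An input source lists its profile ids from most to least specific,
// so pick the first one we have a description for.
function selectProfile(
  ids: readonly string[],
  known: ReadonlyMap<string, ProfileLike>,
): ProfileLike | undefined {
  for (const id of ids) {
    const profile = known.get(id);
    if (profile) return profile;
  }
  return undefined;
}

// Name of the component mapped to a given gamepad button index, if any.
function buttonName(profile: ProfileLike, index: number): string | undefined {
  return Object.keys(profile.buttons).find((name) => profile.buttons[name] === index);
}
```

For example, with a known `generic-trigger` profile, `selectProfile(["vendor-x-controller", "generic-trigger"], known)` falls back to the generic description when the specific one is missing.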

Hand input

You can get info on the position of each joint in the user's hand. Can access each joint by name or get an array of matrices for each individual joint. Does not do pose detection; don't know from OS if user is "pinching" or "grabbing" or "pointing"; can write your own code to infer this from joint positions. There are libraries that do this for you. Can use this info to render a hand, so when the user waves their hand around and does things, they can see that hand represented in front of them.
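Since gesture detection is left to the page, a pinch check can be inferred from joint positions. This is a minimal sketch, assuming joint positions have already been read into a map keyed by joint name; `thumb-tip` and `index-finger-tip` are WebXR hand joint names, while the 2 cm threshold and the helper names are assumptions for illustration.

```typescript
interface Vec3 { x: number; y: number; z: number; }

// Euclidean distance between two points, in the same unit as the inputs (meters in WebXR).
function jointDistance(a: Vec3, b: Vec3): number {
  return Math.hypot(a.x - b.x, a.y - b.y, a.z - b.z);
}

// Assumed threshold: treat fingertips closer than 2 cm as a pinch.
const PINCH_THRESHOLD_M = 0.02;

function isPinching(
  joints: ReadonlyMap<string, Vec3>,
  threshold: number = PINCH_THRESHOLD_M,
): boolean {
  const thumb = joints.get("thumb-tip");
  const index = joints.get("index-finger-tip");
  if (!thumb || !index) return false; // joint not tracked this frame
  return jointDistance(thumb, index) < threshold;
}
```

A real application would likely add some hysteresis (a larger release threshold) so the gesture does not flicker at the boundary.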

(Demo - 3D model rendered by the developer; can replace hands with any other kind of hands or give them gloves)

Augmented reality

Hit test

Combine 3D and the real world; no other AR features. 3D objects are just placed on top of the real world. Find a point in the real world (hit test) and track it over time (anchors). You can request a hit test source; if you want to keep track of a point over time, call createAnchor(). You can use the anchor to get the point's position in later frames. If you place a bunch of objects and then refresh the page, you'll have to place new anchors; keeping anchors across reloads is a feature we're looking into. The other highly requested feature is being able to share anchors. Possible in native but handled very differently on different platforms, making it difficult to standardize. We'll have to wait for this space to settle down.
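The hit test and anchor flow can be sketched like this. The interfaces are simplified stand-ins for `XRFrame`, `XRHitTestResult`, and `XRAnchor`, and the two helper names are hypothetical; the sketch assumes a hit test source was already requested from the session.

```typescript
interface PoseLike { transform: { position: { x: number; y: number; z: number } }; }
interface AnchorLike { anchorSpace: unknown; }
interface HitTestResultLike { createAnchor?: () => Promise<AnchorLike>; }
interface FrameLike {
  getHitTestResults(source: unknown): readonly HitTestResultLike[];
  getPose(space: unknown, baseSpace: unknown): PoseLike | null | undefined;
}

// Create an anchor at the first hit for this frame, if there is one
// and the anchors feature is available.
async function anchorFirstHit(frame: FrameLike, hitTestSource: unknown): Promise<AnchorLike | null> {
  const [hit] = frame.getHitTestResults(hitTestSource);
  if (!hit || !hit.createAnchor) return null;
  return hit.createAnchor();
}

// In a later frame, read where the anchored point is now.
function anchorPosition(frame: FrameLike, anchor: AnchorLike, referenceSpace: unknown) {
  const pose = frame.getPose(anchor.anchorSpace, referenceSpace);
  return pose ? pose.transform.position : null;
}
```

The position is re-read every frame because the system may refine its estimate of where the anchor sits in the real world.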

Lighting estimation

Makes 3D models look integrated with reality: it provides the light direction and color and a reflection map, approximating the environmental lighting in a numerical format. Works really well.

(Demo: objects are brightly lit at the front and cast shadow. Really improves the believability of the 3D object.)

DOM overlay

WebXR rendering is based on WebGL and WebGPU, not HTML/CSS; DOM overlay is the exception. It lets you pick an HTML element and stretch it full-screen on top of the WebXR content. It only makes sense in handheld AR; if you try using it in a headset, it won't make much sense (place it full-screen over the user's face, or further away?). It's hard to define, especially since users may expect different things. More predictable on a handheld device. We're thinking about other ways to display HTML/CSS content, but those are far down the line.
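Requesting a DOM overlay happens when the session is created. This is a minimal sketch of building the session options; `arSessionInit` is a hypothetical helper, and the overlay is requested as an optional feature so the session can still start on devices without it.

```typescript
interface SessionInitLike<E> {
  optionalFeatures?: string[];
  domOverlay?: { root: E };
}

// Build session options, asking for a DOM overlay only when a root element exists.
function arSessionInit<E>(overlayRoot: E | null): SessionInitLike<E> {
  const init: SessionInitLike<E> = {};
  if (overlayRoot !== null) {
    init.optionalFeatures = ["dom-overlay"];
    init.domOverlay = { root: overlayRoot };
  }
  return init;
}
```

In a browser this would be passed along the lines of `navigator.xr.requestSession("immersive-ar", arSessionInit(document.getElementById("overlay")))`, where the element id is an assumption.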

(Demo: real HTML elements placed over the room.)

Interactive content in AR -- do as much of the interface as possible using a DOM overlay, accessible by default provided you write accessible HTML.

Layers

Efficiently display images/videos/shapes on top of or under the 3D content. Displayed using the Layers API. What's the point when you could do this via WebGL? The advantage is that they are composited in a way that makes them look very crisp and readable to the user. Text on a WebGL texture might not look that good; text drawn on a WebXR layer might be crisp enough to read. Video would be rendered by the system underneath and skip some copy operations, leaving graphics cycles to do special effects in your scene. WebXR layers are fantastic for displaying media. Don't build your whole scene out of just layers -- use them for specific purposes where it makes a difference to the user. Restricted to being only on top of or under the scene.

Useful tools

WebXR emulator -- for Firefox/Chrome. Install it in your browser and it lets you emulate 3D headsets in dev tools. Lets you try out your XR content without actually needing to put on a headset. Test simple interactions or make sure rendering is correct. It makes rapid iteration possible, which makes it essential -- putting on/taking off a headset slows you down.

immersiveweb.dev -- useful things to help you get started. Support table for features/hardware/browsers. Guides for getting started quickly; cool examples of stuff people have built. Corrections and more examples welcome (it's on GitHub).

Q&A

Q: (about accessibility)

A: Accessibility story isn't great; trying to improve it. You can build it yourself, there are tools on the web, but we should be hooking into existing a11y tech and not encouraging devs to build it themselves. Sorry, not a very positive answer.

Q: Can you explain some examples of where people are using PE on the web to make great experiences that work on lots of hardware?

A: This question is largely academic at the moment. There is one XR device that has such incredible market saturation that there isn't a lot of variability in what people are using. One thing you can do, and I've done some work to make this easier within the A-Frame library, is make a single experience that works great on handheld devices, AR headsets, and VR headsets. I haven't seen a lot of people actually building that. I want to encourage AR developers to build in that fashion, because the market can shift. If developers want to build something long-lasting, they should build it to work across devices. I wish I could point you to an example that works well in AR and VR; a couple of the demos I've built do actually work pretty well, but I don't think there are any big or famous ones.

Q: I've been to a lot of conferences where there's a demo talk about XR, and I'm sure you've seen many such demos. Do you have a favorite?

A: I don't want to pick favorites -- I'm friends with too many XR developers. There's a really cute physics-based XR game... it's pretty simple, but it looks really good... one that blew me away was that someone did an oil-paint renderer for WebGL and so you could do oil painting with physics-based oil paints. It's pretty amazing. I don't have the URL but you might search for it.

Expanding what's possible on the Web: Project Fugu (Thomas Steiner @tomayac)

I will show you something that probably some of you have experienced. You see an amazing logo and it's in PNG. You think "this should actually be an SVG". What do you do? Search for a PNG-to-SVG converter, pick one of the top results, and you see a bunch of ads and something that promises to do the thing you want for free. You wonder if you should really trust the service and if you should really upload this. Typically you're just doing it because you really want the SVG, so you press the "convert" button and pray for the best that they don't do anything stupid with the file and actually delete it after doing the conversion. Sometimes you end up with something that's the wrong color, because they lost a lot of information. There's got to be a better way!

The tool most of these services use underneath is open source. You can download Potrace and run it yourself. Lots of options you can tweak. It should be possible to build something on top of this. I'm introducing a tool called SVGcode. It exposes all the features of Potrace and lets you convert any image to an SVG. Sometimes you need to tweak the thing a little, but usually after some fiddling, you get something useful.

What does SVGcode do differently? It's a PWA, offline-enabled, runs entirely in the client; no server to trust. It posterizes images, so you can convert photos and get something useful. Sends all the colors per channel to the potrace tool, that does the conversion into SVG, the tracing, and reassembles the whole thing into the resulting image. Unlike most of these online tools, it exposes all of potrace's options, including expert options. Not meant to be beautiful at this stage. I do accept design PRs if anyone here is willing to make this more beautiful. Gives you the full power of the command-line tool but in a useful GUI. Integrates with yet another tool, svgo -- allows you to optimize the paths to minimize the file sizes.

This is an installable Progressive Web App, thus offline-enabled. Internally uses Vite, Vanilla JS template, internally works with the Workbox library to make all of this offline-enabled. You can install it (e.g. to your dock on Mac); it feels like a real application. This is standard, you've seen PWAs before.

Window controls overlay

Some of the things that make SVGcode a little different: window controls overlay. If you remember those tiny ASUS laptops, screen real estate really matters. If you look at the info in the title bar, you already know that you're running the SVGcode application. There's no need for a title bar. Window Controls Overlay allows you to move your UI components up into the title bar area so you gain some screen real estate.

How do we actually use this? In your web app manifest, you need to tell the browser that you want to use it. There's a new display_override field where you can tell the browser you want to use window-controls-overlay. Then in your JS code, you can listen to special events: geometrychange fires when the geometry changes. This lets you change how your application looks. For example: toggle the window controls overlay, moving the menu controls up into the title bar. Just listen for this event and then adapt your existing UI by switching some CSS classes; encode the possible positions in your CSS. This is something the user can control; it won't automatically be changed. Microsoft has been playing with this: when you have a PWA on a store, it automatically, programmatically clicks the chevron and makes it go away. So some PWAs won't have the title bar to begin with and start with the window-controls-overlay variant. A lot of users don't expect that this is something they can change -- maybe it should be available from the start without a manual user activation. There's always a risk of someone doing something malicious, so security teams/browser vendors are thinking about it.
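A minimal sketch of both halves: the manifest entry (written here as an object for illustration) and a handler that switches a CSS class when the overlay state changes. The class name `wco-active` and the helper name are assumptions, and the interfaces are simplified stand-ins for `navigator.windowControlsOverlay` and `element.classList`.

```typescript
// Manifest fragment, shown as an object; in practice this lives in the JSON manifest.
const manifestFragment = {
  display: "standalone",
  display_override: ["window-controls-overlay"],
};

interface OverlayLike { visible: boolean; }
interface ClassListLike { toggle(name: string, force?: boolean): boolean; }

// Hypothetical helper: reflect the overlay state in a CSS class so the
// stylesheet can move the menu controls into the title bar area.
function applyOverlayState(overlay: OverlayLike, classList: ClassListLike): void {
  classList.toggle("wco-active", overlay.visible);
}
```

In a browser that supports the feature, this could be wired up with `navigator.windowControlsOverlay.addEventListener("geometrychange", ...)`, calling `applyOverlayState` with `document.body.classList`.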

File System Access API

File System Access API -- no server, therefore no upload, but we still want to open files with this application. Click the "Open File" button; the File System Access API is not supported in all browsers, so we use the browser-fs-access ponyfill. On supported browsers, the file handle you get back from calling the file open function can be serialized into IndexedDB, so that when the user reloads the application, you can bring them back to where they left off. Also works with drag and drop -- can drag and drop an image into SVGcode and get a serializable file handle. Allows you to restore state when the user reloads the application. For a lot of use cases, this makes sense.
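A sketch of the "remember the handle" step, under stated assumptions: the opened file carries an optional `handle` (present only where the File System Access API is available), and `KeyValueStoreLike` is a hypothetical wrapper around IndexedDB. The key name `last-opened` is also made up for illustration.

```typescript
interface FileWithHandleLike<H> { name: string; handle?: H; }

// Hypothetical async key-value wrapper over IndexedDB.
interface KeyValueStoreLike<H> {
  set(key: string, value: H): Promise<void>;
  get(key: string): Promise<H | undefined>;
}

// Persist the handle when there is one; return whether anything was stored.
async function rememberFile<H>(
  file: FileWithHandleLike<H>,
  store: KeyValueStoreLike<H>,
): Promise<boolean> {
  if (!file.handle) return false; // fallback browsers give a plain file without a handle
  await store.set("last-opened", file.handle);
  return true;
}
```

On the next load the application would read `last-opened` back and, where a handle exists, reopen the file; browsers may ask the user to grant permission again at that point.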

Async Clipboard API

Async Clipboard API -- SVGcode is also fully integrated with the OS clipboard. You can right-click any file in macOS, copy it, and then paste it into the app. You can also copy the generated SVG image and paste it into another application for further processing.

How does this work? Content negotiation -- can put two representations of the same resource, e.g. one as a string representing the SVG and one as the actual image. Take SVG code from the SVG output, giving a string; run the SVG optimization step to generate an image. Creates two blobs, one with type text/plain and one with type image/svg+xml. The clipboard item has two properties, an SVG blob and a text blob. Can paste as either text or image. You can do this on the Web now with the async clipboard API.
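The two-representation idea can be sketched as below. To keep the example runnable outside a browser, blob creation is passed in as a function; `clipboardRepresentations` is a hypothetical helper name.

```typescript
// Build both representations of the same SVG: one as plain text, one as an image.
function clipboardRepresentations<B>(
  svg: string,
  makeBlob: (parts: string[], type: string) => B,
): { "text/plain": B; "image/svg+xml": B } {
  return {
    "text/plain": makeBlob([svg], "text/plain"),
    "image/svg+xml": makeBlob([svg], "image/svg+xml"),
  };
}
```

In a browser, `makeBlob` would be `(parts, type) => new Blob(parts, { type })`, and the result would be wrapped as `new ClipboardItem(...)` and passed to `navigator.clipboard.write([...])`. Which types a given browser accepts varies, so a fallback to text-only copying is worth keeping.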

File handling API

Become a handler, or even the default handler, for image files. File handler: if you have the Finder here, you can right-click and say "Open With", then choose SVGcode from the existing installed applications on your system. If you are working on an application that stores its state in .something files, you can say "I want to have a PWA that is the default handler for .something files", with an extension you would choose to be unique. Then people can double-click the file and the PWA will open it. A pretty powerful integration for a PWA; you need to deal with the "launch queue". In your window object there's a new launch queue entry that lets you set a consumer, which takes launch params. From the launch params, you get file handles back that you can iterate over, and use the files to create blobs. You can deal with the launch queue at whatever time is suitable for the application; you can wait until it's loaded and the launch queue will have the files that were passed to the application ready. You can also serialize this file handle to allow the same integration as before, so you can reload the application and restore it.
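A sketch of consuming the launch queue. The interfaces are simplified stand-ins for `window.launchQueue`, its launch params, and file handles; both helper names are hypothetical.

```typescript
interface LaunchFileHandleLike<F> { getFile(): Promise<F>; }
interface LaunchParamsLike<F> { files: readonly LaunchFileHandleLike<F>[]; }
interface LaunchQueueLike<F> {
  setConsumer(consumer: (params: LaunchParamsLike<F>) => void): void;
}

// Open every file the app was launched with; resolves to the number of files opened.
async function openLaunchedFiles<F>(
  params: LaunchParamsLike<F>,
  open: (file: F) => void,
): Promise<number> {
  let count = 0;
  for (const handle of params.files) {
    open(await handle.getFile());
    count += 1;
  }
  return count;
}

// Register the consumer; the queue holds the launch files until a consumer is set.
function registerLaunchConsumer<F>(queue: LaunchQueueLike<F>, open: (file: F) => void): void {
  queue.setConsumer((params) => {
    void openLaunchedFiles(params, open);
  });
}
```

In a browser, the call would be guarded with a check such as `"launchQueue" in window`, since the API is not available everywhere.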

Web Share API

This works on Edge, doesn't work on Chrome yet. On desktop you can convert an SVG and then click the "Share" button; you get a dialog that lets you send the image by email, etc. This spares the user the work of saving the file, going through the file explorer, and opening it with a new application. They just choose the application they want.
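A defensive sketch of sharing a file, written against a simplified stand-in for the `navigator` methods involved. `shareFile` is a hypothetical helper; it returns `false` when sharing is unavailable so the caller can fall back to a normal download.

```typescript
interface ShareDataLike<F> { title?: string; files?: F[]; }
interface SharerLike<F> {
  canShare?(data: ShareDataLike<F>): boolean;
  share?(data: ShareDataLike<F>): Promise<void>;
}

// Share a file if the browser says it can; otherwise report that nothing was shared.
async function shareFile<F>(nav: SharerLike<F>, file: F, title: string): Promise<boolean> {
  const data: ShareDataLike<F> = { title, files: [file] };
  if (!nav.share || !nav.canShare || !nav.canShare(data)) return false;
  await nav.share(data);
  return true;
}
```

Browsers generally require the share call to happen in response to a user gesture such as the click on the "Share" button, and the returned promise rejects if the user cancels the dialog.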

Web Share Target API

Your application can become the share target: take a screenshot and then say you want to share it to SVGcode. The file will open in SVGcode. This works with an entry in the web app manifest where you say share_target and then mark it up like a form. Add an event listener that dispatches on the request URL.
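A sketch of the two pieces: the manifest entry (shown as an object) and the URL check a listener could use. The action path `/share-target/` and the field name `image` are assumptions for illustration, as is the helper name.

```typescript
// Manifest fragment, shown as an object; it describes the share like an HTML form.
const shareTargetManifest = {
  share_target: {
    action: "/share-target/", // hypothetical path
    method: "POST",
    enctype: "multipart/form-data",
    params: { files: [{ name: "image", accept: ["image/*"] }] },
  },
};

// Decide whether an incoming request is a share aimed at this app.
function isShareTargetRequest(method: string, pathname: string): boolean {
  const target = shareTargetManifest.share_target;
  return method === target.method && pathname === target.action;
}
```

A fetch listener would typically call this with `event.request.method` and the path of `event.request.url`, then read the shared file out of the request's form data.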

Conclusion

That's a few of the APIs used in SVGcode. There are a lot of other APIs developed in the context of Project Fugu. You can check out the Project Fugu API Showcase. The Fugu API Tracker shows the progress the Fugu team has made on shipping/piloting various APIs. Typically there's a spec draft and explainer for each API (sometimes user studies).

Slides: https://goo.gle/web-engines-hackfest-fugu

Q: Why did you name it Fugu? Is it also lethally poisonous?

A: It's an inside joke. Simpsons episode where Homer orders fugu. Delicious if you cut it right, will kill you if you cut it wrong. This is a hat tip to what the APIs enable. If you hold them right they can do amazing things, but if you hold them wrong, you can do harm (e.g. file upload, if you let someone open their /etc/passwd file -- this isn't actually possible in practice). Some of the APIs can be dangerous; treat them with respect.

Q: A lot of these APIs can have unintended consequences. On the web platform it's easy for people to get tricked into opening a web site without their consent. What's the process like when you consider new APIs -- how do you want to make sure they aren't exposing people to harm?

A: There are different categories of risk. Some of the risks are more severe than others. For example, if you allow someone to have a web application that has write access to the Windows folder, this could ruin your OS install. For this particular case, there is a block list of file paths that you just can't use; there's no way around it. It's a very strict security measure for the File System Access API. There's also a lot of prompting that tells you the application wants to write to a file -- is this fine or not? Some APIs, e.g. the WebUSB API, let you talk to connected hardware devices. There was one bug where people could write to Yubikeys that are USB-enabled. This was a bug, it was disclosed, and in the end we special-cased this device category to not be exposed.

Q: Re: window-controls-overlay, would container queries solve the same problem for you?

A: Container queries might help you make responsive use of the space that you get, but without window-controls-overlay, you can't make use of the title bar area in the first place.

Q: Project Fugu looks like an incubator for many new APIs. Will we have feature parity between different platforms/environments?

A: We're seeing adoption for some of the APIs. Web Share is also supported by Safari. Use the APIs as progressive enhancements. I'm not forcing the user to download any code in SVGcode that their browser doesn't support. Only load a supported set of features. Will we ever come to a point where all browsers agree? I don't think so; differentiation between browsers is too big. Firefox stopped this whole notion of installable web apps, so if you don't have an installable web app, there's no window controls overlay because you're running in a tab. Others, like the Contact Picker, are implemented fine and work in Safari, but behind a flag (you have to enable it in experimental settings). We encourage defensive programming as a habit.
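The defensive, progressive-enhancement habit described in this answer could look roughly like the following feature check. `detectFeatures` is a hypothetical helper; it takes the window-like and navigator-like objects as parameters so it can run anywhere.

```typescript
// Report which optional capabilities are present, using simple "in" checks.
function detectFeatures(win: object, nav: object) {
  return {
    fileSystemAccess: "showOpenFilePicker" in win,
    fileHandling: "launchQueue" in win,
    webShare: "share" in nav,
    windowControlsOverlay: "windowControlsOverlay" in nav,
  };
}
```

In a browser this would be called as `detectFeatures(window, navigator)`, and the code for each feature would only be loaded (for example with a dynamic `import()`) when its flag is true.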

Q: One thing I found interesting in the code examples was the content negotiation in the clipboard API. Does that work across platforms?

A: If you look at the example code, it's horrible right now, because different browsers allow for different things. There is very active development right now in the Editing Working Group at W3C to unite on a shared behavior. Enabling raw clipboard access: when we copy a PNG image, we re-encode it to avoid compression bombs (an exploit that makes it a massive file). With this new proposal, there will be a way to copy from trusted applications. The idea is called "pickling" -- mark these data objects as coming from the web, and let people decide if they want to accept them or not. The core idea is that if you have something that runs in the browser, like spreadsheets, it needs rich copy and paste. They want to enable you to copy images, text, custom HTML; there's a lot of motivation from the Edge folks to get this right. Obviously also for Google: Google has the Workspace set of tools for editing office-style content. There's a lot of motivation for raw clipboard access in a safe way; it's rapidly developing as we are talking.

Q: And on the OS side, I know Chrome supports very diverse OSes. Some Linux distros might not have a similar implementation of a clipboard. How does it work with that?

A: It always hooks into whatever OS APIs are available. I don't know how they do it on a lower level; I can talk to you about the JS side, but not the OS side. When there are talks about how to safely standardize things like raw clipboard access, they make sure it's practically possible. Specifying things that can't be done on a major platform is helpful to no one. There are experts from all OSes involved in the standardization discussion, but to know how it works you need to talk to someone who's actually on the OS side.

Q: One mental model I use when thinking about Project Fugu APIs is that every one of them is part of a big puzzle and every piece is adding one more capability to the platform, bringing things closer to the native. But there's only so much you can do at a time. What do you think is the one capability that enables the most at the moment?

A: Right now, probably filesystem access -- makes a lot of things people do in traditional applications possible in the web. If you want to open an image from your disk or a raw image from your camera, it seems trivial but it's not, because there's files and drag and drop involved. Ideally, once you install a PWA and you have this magic moment of double-clicking a file that's dot something and opening it in the PWA, you forget that it's running on the web in the end. You mentioned native applications -- Electron is a big framework for building native apps in a cross-platform way. Electron is also a very, very active Fugu member in that they use Fugu APIs. This may sound counterintuitive, but a lot of people use Electron APIs to get access to Bluetooth devices. Before Web Bluetooth was a thing, they had to pull in Node libraries that would work with Web Serial on the Node level. By having Web access to Web Serial, they get rid of another dependency and each Electron application gets smaller because they can just use the Web-exposed API. This was surprising to me because people might say Fugu will kill Electron in the long term. But all the dependencies they don't need to pull in will make their app smaller, and that's a win/win.

Q: You showed Fugu interfaces for one thing or another. Does Fugu exist anywhere in the code? Is there something named Fugu when I program, or is it just the collective name for some interfaces?

A: It started as an internal code name for this entire idea of bridging the gap between native apps and things you can do on the web. Each capability helps close the gap between native and web applications. It has become an umbrella term for the whole idea, but you won't find anything named Fugu in Chromium or the source code. Officially the project is called the Capabilities Project, but people seem to like the "Fugu" idea. There's also the fugu fish as the icon. It has become a thing people use as an umbrella term like "HTML5" or "AJAX".
