Allow LiveViews to be adopted #3551

Open
josevalim opened this issue Dec 4, 2024 · 6 comments

@josevalim
Member

One of the issues with LiveView is the double render when going from dead render to live render, and the fact that we lose all state on disconnection.

This issue proposes that we render a LiveView (really a Phoenix.Channel) upfront, which then gets "adopted" when necessary. In a nutshell:

  • On disconnect, we keep the LiveView alive for X seconds. Then on reconnect, we reestablish the connection to the same LiveView and just send the latest diff

  • On dead render, we spawn the LiveView upfront and keep it alive until the WebSocket connection arrives and "adopts" it, so we just need to send the latest diff

However, this has some issues:

  • If the new connection happens on the same node, it is perfect. However, if it happens on a separate node, then we can either do cluster round-trips on every payload, copy only some assigns (from assigns_new) or build a new LiveView altogether (and discard the old one).

  • This solution means we will keep state around on the server for X seconds. This could perhaps be abused for DDoS attacks or similar. It may be safer to enable this only on certain pages (for example, where authentication is required) or keep the timeout short on public pages (e.g. 5 seconds instead of 30).

On the other hand, this solution should be strictly better than a cache layer for a single tab: there is zero copying and smaller payloads are sent on both connected render and reconnects. However, keep in mind this is not a cache, so it doesn't share across tabs (and luckily it does not introduce any of the caching issues, such as unbounded memory usage, cache key management, etc.).

There are a few challenges to implementing this:

  • We need to add functionality for adoption first in Phoenix.Channel

  • We need to make sure that an orphan LiveView will submit the correct patch once it connects back. It may be that we cannot squash patches on the server; we would need to queue them, which can introduce other issues

  • We may need an opt-in API (a hypothetical sketch of what this could look like follows below)
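
For the opt-in API, a purely hypothetical sketch of how it could surface at the router level; the :adoption option and its keys are invented for illustration and are not an existing Phoenix.LiveView API:

live_session :authenticated,
  on_mount: {MyAppWeb.UserAuth, :ensure_authenticated},
  # hypothetical option: how long a dead-rendered or orphaned LiveView
  # is kept around waiting to be adopted
  adoption: [timeout: :timer.seconds(5)] do
  live "/dashboard", MyAppWeb.DashboardLive
end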

While this was extracted from #3482, this solution is completely orthogonal to the one outlined there, as live_navigation is about two different LiveViews.

@elliottneilclark
Contributor

elliottneilclark commented Dec 4, 2024

After the dead view renders the HTTP response, we have all of the assigns. If we spend the cycles to start a LiveView process off the critical path of sending the response, then the cost of the extra work is not user-facing. So when the WebSocket connection comes in, the LiveView is already started and the assigns are cached. This would trade a bit of memory and non-critical CPU usage for a faster WebSocket connect.

However, we don't want to leak memory forever, and it's possible that the WebSocket never comes. So there would need to be some eviction system in place (time-based, memory-based, etc.).
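
A minimal sketch of that eviction idea, assuming a plain GenServer holding the pre-computed assigns that evicts itself if no WebSocket adopts it in time (illustration only, not how LiveView would implement it internally):

defmodule MyApp.PendingLiveView do
  use GenServer

  @adoption_timeout :timer.seconds(5)

  def start_link(assigns), do: GenServer.start_link(__MODULE__, assigns)

  def adopt(pid), do: GenServer.call(pid, :adopt)

  @impl true
  def init(assigns) do
    # The third element arms a timeout: if no message arrives within
    # @adoption_timeout, handle_info(:timeout, state) is invoked.
    {:ok, %{assigns: assigns, adopted?: false}, @adoption_timeout}
  end

  @impl true
  def handle_call(:adopt, _from, state) do
    {:reply, {:ok, state.assigns}, %{state | adopted?: true}}
  end

  @impl true
  def handle_info(:timeout, %{adopted?: false} = state) do
    # Never adopted within the window: stop and release the memory.
    {:stop, :normal, state}
  end

  def handle_info(:timeout, state), do: {:noreply, state}
end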

I don't know the Erlang VM well enough, but is it possible to convince the VM to move ownership of structs (specifically assigns) rather than copying them, if there are no live references? That would be another way to make spinning up LiveView processes even cheaper. Rather than copying the assigns to a new LiveView process, we could drop all references, since they are only needed again after the bytes of the response are ready.

@josevalim
Member Author

@elliottneilclark there are some tricks we could do:

  • On HTTP/1, we send connection: close but keep the process around to be adopted as a LiveView later. No copying necessary. This may require changes to the underlying web servers.

  • On HTTP/2, each request is a separate process, so we can just ask it to stick around.

Outside of that, we do need to copy it; the VM cannot transfer it (ref-counting is only for large binaries). But we can spawn the process relatively early on. For example, if we spawn the process immediately after the router, none of the data mounted in the LiveView needs to be copied; only the assigns set in the plug pipeline that are accessed by the LiveView are copied (using an optimization similar to live_navigation).
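
For reference, assign_new/3 (an existing API) is the optimization being referenced: on the dead render it reuses an assign already set in the plug pipeline instead of recomputing it. A small sketch, where MyApp.Accounts.get_user!/1 is a placeholder for whatever the plug pipeline already loaded:

def mount(_params, %{"user_id" => user_id}, socket) do
  # On the dead render, the function below is only invoked if :current_user
  # was not already assigned in the plug pipeline, so the existing value is
  # reused rather than refetched. (get_user!/1 is a stand-in.)
  {:ok, assign_new(socket, :current_user, fn -> MyApp.Accounts.get_user!(user_id) end)}
end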

@elliottneilclark
Contributor

Outside of that, we do need to copy it; the VM cannot transfer it (ref-counting is only for large binaries).

That makes sense; large binaries are the equivalent of huge objects in the JVM, with different accounting.

@simoncocking

it's possible that the websocket never comes

Our experience is that this happens only on public pages which are exposed to bots / search engines / other automatons. So in our situation:

It may be safer to enable this only on certain pages (for example, where authentication is required)

this is exactly what we'd do.

If the new connection happens on the same node, it is perfect. However, if it happens on a separate node, then we can either do cluster round-trips on every payload, copy only some assigns (from assigns_new) or build a new LiveView altogether (and discard the old one).

We have some LiveViews that do some pretty heavy lifting on connected mount, so we'd need some way to guarantee that this work wouldn't be repeated if the LiveView was spawned on a different node from the one that receives the WebSocket connection.
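
(For context, the pattern being described typically looks like the sketch below, with MyApp.Reports.load_report/0 standing in for the expensive work; it runs only on the connected mount, which is exactly the work one would not want repeated on another node.)

def mount(_params, _session, socket) do
  socket =
    if connected?(socket) do
      # expensive work that should happen once per live session
      assign(socket, :report, MyApp.Reports.load_report())
    else
      assign(socket, :report, nil)
    end

  {:ok, socket}
end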

@josevalim
Member Author

josevalim commented Dec 5, 2024

I just realized that the reconnection approach has some complications. If the client crashes, LiveView doesn't know whether the client received the last message or not. So in order for reconnections to work, we would need to change the LiveView server to keep a copy of all responses and only delete them when the client acknowledges them. This will definitely make the protocol chattier and perhaps affect the memory profile on the server. So for reconnection, we may want to spawn a new LiveView anyway and then transfer the assigns, similar to push_navigate.

This goes back to the previous argument that it may be necessary to provide different solutions for each problem, if we want to maximize their efficiency.
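
To make the acknowledgement bookkeeping above concrete, here is a hypothetical sketch (not part of the LiveView protocol): every outgoing diff gets a ref and stays in memory until the client acks it, which is exactly where the extra chattiness and memory cost come from:

defmodule DiffLog do
  # Hypothetical server-side log of unacknowledged diffs, keyed by a
  # monotonically increasing ref.
  defstruct next_ref: 1, pending: %{}

  # Record an outgoing diff and return the ref the client must ack.
  def push(%__MODULE__{} = log, diff) do
    ref = log.next_ref
    {ref, %{log | next_ref: ref + 1, pending: Map.put(log.pending, ref, diff)}}
  end

  # Drop every diff the client has confirmed receiving.
  def ack(%__MODULE__{} = log, ref) do
    %{log | pending: Map.reject(log.pending, fn {r, _diff} -> r <= ref end)}
  end

  # On reconnect, replay everything after the last acked ref, in order.
  def replay(%__MODULE__{} = log) do
    log.pending
    |> Enum.sort_by(fn {ref, _diff} -> ref end)
    |> Enum.map(fn {_ref, diff} -> diff end)
  end
end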

We have some LiveViews that do some pretty heavy lifting on connected mount, so we'd need some way to guarantee that this work wouldn't be repeated if the LV was spawned on a different node to that which receives the WebSocket connection.

This is trivial to do if they are on the same node; it is a little bit trickier for distinct nodes. For distinct nodes, you would probably need to opt in and declare that a LiveView's state is transferable, which basically says that you don't rely on local ETS tables or resources (such as dataframes) in your LiveView state.
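
As an illustration of what would disqualify state from being transferable in the sense described above: references to node-local resources stop meaning anything once the state lands on another node, while plain data copies fine.

def mount(_params, _session, socket) do
  {:ok,
   socket
   # plain data: copies cleanly to another node
   |> assign(:filters, %{status: :active, page: 1})
   # an ETS table identifier is only valid on the node that created it,
   # so this assign would make the state non-transferable
   |> assign(:cache, :ets.new(:my_cache, [:set, :public]))}
end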

@bcardarella
Contributor

bcardarella commented Jan 12, 2025

There may be an option here, pulling from prior work, although I don't know if the solution is entirely what @josevalim has in mind.

Back in Ember, the SSR framework is called Ember Fastboot. Ember's own data library, Ember-Data, introduced a double-render issue: on the server it would request the data and render the template, then that template was sent to the client, where the app would boot and make the same request again. The solution was called the Shoebox.

The Shoebox worked by encoding the Ember-Data content during SSR and injecting a <script> tag containing that data into the response sent back to the client. Ember would then load data from the Shoebox first rather than make the API request to the server again (our version of the double render).

So what if this is what LiveView did? On the dead render, it would also inject certain data from the assigns into the template. When the LiveView connection is made, that data is injected as a query param. If the query param is present, skip the mount function and render a diff back to the client to hydrate the app.

The upside here is that we could skip the double render if the mount is skipped. The trade-off is that there may be data we don't want to expose to the client.

I could see this working with something like this:

def mount(_params, _session, socket) do
  # shoebox_assign/2 is a hypothetical helper that marks these assigns
  # for embedding in the dead render
  data = MyApp.some_data_operations()
  {:ok, shoebox_assign(socket, data: data)}
end

This way, the developer explicitly accepts that they have to opt into this rendering benefit. Not assigning to the special keys or via the specific assign function would keep the existing double-render behavior.

Furthermore, it would be on the developer to ensure that whatever data they embed is encodable and doesn't expose sensitive information.
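
For illustration, the dead-render side of this could look something like the snippet below. Everything here is hypothetical: the @shoebox assign, the client-side "Shoebox" hook, and the choice to embed the data as an escaped JSON attribute (rather than a query param) that the hook would hand back when the socket connects.

def shoebox(assigns) do
  ~H"""
  <div id="shoebox" phx-hook="Shoebox" data-shoebox={Jason.encode!(@shoebox)}></div>
  """
end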

This doesn't entirely solve the problem for every use case, but considering how difficult this problem has been to solve on the server, I do wonder if we'll ever find a "one solution for all" here.
