The Wayback Machine - https://web.archive.org/web/20201221051312/https://github.com/pubkey/broadcast-channel/issues/414
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Leader Election] IndexedDB method doesn't work in Safari #414

Open
akovalev opened this issue Dec 8, 2020 · 13 comments
Open

[Leader Election] IndexedDB method doesn't work in Safari #414

akovalev opened this issue Dec 8, 2020 · 13 comments

Comments

@akovalev
Copy link

@akovalev akovalev commented Dec 8, 2020

OS: macOS 10.14.6
Browser: Safari 14.0.1 (14610.2.11.51.10)

While playing around with the library, and with its implementation of a leader election algorithm in particular, we ended up in a situation when in Safari multiple browsing contexts (tabs) are elected as a leader unless a localstorage method is forced.

In order to illustrate the drastic difference in how a leader election works in Safari with different method and the same responseTime, please take a look at the following videos, where

On each of these videos you can see how we open a simple HTML page containing the following code snippet (in some cases it was slightly adjusted in order to specify a custom responseTime or force the usage of a specific method of cross-tab communication:

const channel = new BroadcastChannel('playground');

const elector = createLeaderElection(channel);

elector.awaitLeadership().then(onLeader);

function onLeader() {
  document.title = "I'm a leader \uD83C\uDFC6";
  document.getElementById('header').innerText = "I'm a leader \uD83C\uDFC6";
  console.log("I'm a leader!");
}

CPU usage was pretty low and no "heavy processes" were running in the background while we were capturing the videos, so it's not very likely that the behaviour we observed was caused by the throttling of DOM Timers.

If we understand correctly, the broadcast-channel tries to fallback to alternative methods of cross-tab communication in case BroadcastChannel is not natively supported, e.g. in Safari. The fallback is done in the following order: native -> idb -> localstorage. Having tried to force the usage of a localstorage method by passing {type: "localstorage"} into a BroadcastChannel constructor, we've got an impression that it fixes the problem of having multiple leader per browsing origin (see the last video in the list above), but taking into account that 1) Safari has a rather controversial reputation when it comes to implementing IndexedDB spec and 2) it lacks a built-in way to monitor changes in a given Object Store
perhaps it makes sense to adjust the order of fallback methods, so that a localStorage is preferred over IndexedDB, at least in WebKit/Safari.

@pubkey
Copy link
Owner

@pubkey pubkey commented Dec 8, 2020

Can you check if you have any errors on the console.
Localstorage does not work in webworkers, this is why the indexeddb method is prefered. That is also true on safari.
Also I tested multiple safari version via browserstack and there the leader election seems to work.

This is very likely and indexeddb+safari bug, can you try different safari versions and maybe pin down which one is affected.

@akovalev
Copy link
Author

@akovalev akovalev commented Dec 8, 2020

Unfortunately I haven't seen any errors in the console, and the version of Safari I'm currently using is Safari 14.0.1 (14610.2.11.51.10). I have also added it to the issue description

@pubkey
Copy link
Owner

@pubkey pubkey commented Dec 8, 2020

Can you reproduce also on the "official" example page https://pubkey.github.io/broadcast-channel/ ?

@akovalev
Copy link
Author

@akovalev akovalev commented Dec 8, 2020

yes, I can reproduce it on the "official" playground page as well
See at the screenshot below (there're 2 leader tabs)

Screenshot 2020-12-08 at 15 20 26

@pubkey
Copy link
Owner

@pubkey pubkey commented Dec 8, 2020

Hmm.
The newest Safari I can get my hands on, is 13.1.2 and there everything works.

@akovalev
Copy link
Author

@akovalev akovalev commented Dec 8, 2020

In my opinion it has something to do with the throttling of DOM Timers in pages that transitioned into a so-called background mode.

It's a well-known fact that DOM timers are throttled in background tabs, but it turns out that in Safari(WebKit) the timers are throttled in a very aggressive way, once the timer nesting level reaches the maximum value which is set to 5. Even though the maximum nesting level is the same both in WebKit in Blink, the way Blink treats the spec diverged a bit since Chromium forked WebKit: Chromium ensures that timers on hidden pages are aligned so that they fire once per second at most, while in WebKit the first 5 invocations of a repeating timer get no clamping at all, but after the max nesting level is reached, the timers are augmented in very aggressive and even randomised fashion. Moreover, based on my prev experience WebKit doesn't seem to respect the fact that there's an active WebSocket connection in the background tab which, at least some time ago, made browsers disable the throttling of timers.

I have prepared a simple/test page that demonstrates this effect and while playing around with it in Safari, I saw a delay of ~60 seconds instead of planned 100 ms.

Honestly, I was about to fill in a separate ticket regarding the potential problems that might occur because of throttled timers as just last night I intentionally left several tabs in the background in Chrome and even though there was only a single elected leader when I was closing my laptop, by the moment I re-opened my laptop today in the morning all tabs became a leader. It wasn't the
case either in Safari using a localStorage method or in Firefox using a native method although.

I believe it's a pretty important issue given that:

Please let me know if you think it's better to create a separate issue where we can discuss the potential issue of throttled DOM Timers.

@akovalev
Copy link
Author

@akovalev akovalev commented Dec 8, 2020

The newest Safari I can get my hands on, is 13.1.2 and there everything works.

Having left these tabs running for about 20 minutes or so I ended in the situation when all tabs are leaders:

image

@akovalev
Copy link
Author

@akovalev akovalev commented Dec 8, 2020

This is a relevant discussion on what a nesting level of DOM Timers is and how to interpret it especially in the context of setInterval. Sharing it because the latter is used in order to schedule future periodic attempts to become a leader each fallbackInterval milliseconds if the initial attempt failed.

@pubkey
Copy link
Owner

@pubkey pubkey commented Dec 8, 2020

Thank you for the good investigation.
So one thing we should try is reducing the nesting timer level. Is there any way to track that value so we can create a unit test around it?
Another thing I was thinking about a while ago, is to change the indexeddb method to use transactions as markers for the leader. Instead of sending messages around, we could try to open transactions and then the tab with the opened one is the leader. But I have not done any tests around that concept.

As a last approach we can of course set the localstorage method as default and make the webworker support optional. But this would cause a breaking change and still make the leader election buggy when people want webworker support.

@akovalev
Copy link
Author

@akovalev akovalev commented Dec 8, 2020

So one thing we should try is reducing the nesting timer level. Is there any way to track that value so we can create a unit test around it?

Not sure I understand the question, but I guess the nesting level is simply a "counter" of a recursive function invocation scheduled via a setTimeout/setInterval. Something like a count variable in the snippet below:

      function setTitle(title) {
        window.document.title = title;
      }
      
      function now() {
        return Date.now();
      }
      
      function updateTitle(interval) {
        let count = 0;
        let lastUpdatedAt = now();
        
        setTimeout(
          function task() {
            setTitle(`${++count} - ${now() - lastUpdatedAt} ms`);
            lastUpdatedAt = now();
            setTimeout(task, interval);
          }, 
          interval
        );
      }

      updateTitle(100);

Another thing I was thinking about a while ago, is to change the indexeddb method to use transactions as markers for the leader.

It might be an option but at the same time might be problematic as well due to an autocommit feature of IndexedDB transaction. There was a proposal to add an explicit tx.commit method and its support may be pretty good these days but I've never tried it before.

As a last approach we can of course set the localstorage method as default and make the webworker support optional. But this would cause a breaking change and still make the leader election buggy when people want webworker support.

Could it be solved by clamping a given value of responseTime setting in order to make sure that for each method a responseTime value is always greater than a certain threshold value? Otherwise it's very likely that one specifies a rather small responseTime (say, about 50 ms) in order to speed up the initial election step, implicitly fallbacks to IndexedDB method in Safari and eventually ends up in a situation when all tabs are leaders (see the second video referenced in the issue description)

@akovalev
Copy link
Author

@akovalev akovalev commented Dec 8, 2020

I intentionally left several tabs in the background in Chrome and even though there was only a single elected leader when I was closing my laptop, by the moment I re-opened my laptop today in the morning all tabs became a leader.

The only case when multiple tabs became leader in Chromium was when tabs were left in the background over the night.
I will try to check whether it's reproducable with a localstorage rather than a native method later tonight and will report back to you probably tomorrow.

@akovalev
Copy link
Author

@akovalev akovalev commented Dec 9, 2020

It looks like the "native" method is also suffering from the same problem, i.e. BroadcastChannel#postMessage invocations seem to be aggressively delayed/throttled when a page's visibility state is set to hidden. For instance, last night I left 7 opened tabs running a demo page with a "native" method and 7 tabs running a "localstorage" method.

This morning I found out that all "native" tabs became a leader over the night, while the tabs running a "localStorage" method were still in the correct/expected state. (see the screenshots below)

"Native" method
Screenshot 2020-12-09 at 09 32 40

LocalStorage method
Screenshot 2020-12-09 at 09 32 59

Having looked briefly through the source code of Chromium, I've got an impression that postMessage tasks are added to a a PausableTaskQueue. Of course I lack a proper knowledge to fully grasp what it does under the hood, but it seems that it affects the current approach to implement a renegotiation logic which if I understand correctly looks like this:

  1. send apply message (it's used to communicate an intention to become a leader)
  2. wait/sleep for a responseTime ms
  3. if some other process/tab applied for a leader or is already a leader (a tell action was received), then give up (stopCriteria = true)otherwise repeat 2 previous steps and if there's still no reply from the current leader, become a leader.

In other words, it means that if a postMessage tasks are indeed suspended/delayed once a tab is transitioned to a "hidden" state, at some point this hidden tab would eventually become a leader too, since the current leader wasn't able to respond back to apply message and tell that a leader role was already taken.

The reason why a "localstorage" method is still working correctly is perhaps because a StorageEvent that is used as a cross-tab message transport, is not throttled at all.

I think the next logical step would be to reach out to Chromium engineers to clarify the details of heuristics they apply when a browser page transition into a "hidden" state as this part seems like a black box to me at least.

@pubkey
Copy link
Owner

@pubkey pubkey commented Dec 12, 2020

Ok that was a lot of new information for me.
So I think the first we should do, is to detect duplicate leaders and emit an error from the LeaderElector. Then the developers can at least detect that event and handle properly, for example by reloading the page.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
2 participants
You can’t perform that action at this time.