-
Notifications
You must be signed in to change notification settings - Fork 682
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[css-nesting-1] Can we relax the syntax further? #7961
Comments
The ambiguous case is a small subset of valid rules, but unfortunately it's all valid declarations, so any cost added to the parsing of them is going to balloon the cost of parsing virtually all stylesheets. |
(We have the ambiguity of invalid declarations with valid rules to worry about too, but that is a tiny percentage of declarations, so it's not nearly as bad. That is, for existing invalid decls like |
That is exactly why I'm proposing to parse all ambiguous structures as declarations, so there's no cost added to them.
But since these are invalid we throw them away anyway, no? |
I'm confused. If when the structure is ambiguous you definitely parse as a declaration, that means If you mean "parse as a declaration but also do X to allow for reinterpreting as a rule later", then it is indeed a per-declaration cost to do X.
Yeah, but today we throw them away as an immediate error-handling consequence - we see the |
I mean parse as a declaration, but keep its tokens around until the declaration finishes parsing (not forever). I suppose that's an extra cost per declaration, but it seems a very small one since, you have these tokens anyway. |
You have those tokens transiently today, yes. Keeping them around for the duration of the decl parse is precisely the extra cost that I've been told in the past is too much. It would be good to verify that this is indeed the case, but we can't just sweep it under the rug, as it is the precise reason we're not just doing Sass-style nesting today. |
It would be good to get confirmation on that, but also, if keeping tokens around for the duration of parsing a declaration is such a big cost, another option could be to …not. Just keep the raw string around, or even just the index in the source. Of course in the rare cases where you have an element selector followed by a pseudo-class you have to re-tokenize, but it may be deemed a worthy tradeoff? I'm not trying to push a specific proposal, but we should all try to brainstorm about this before we rule the syntax we want out as impossible. |
I think some of the things that at least some of today's parsers do today for error handling (e.g., have general rules for how to parse the high level style sheet constructs, and then specific parsers for values of specific properties, designed so that error handling bugs in the specific value parsers can't cause violations of the general rules) is sort of like keeping the tokens around. But I'm not sure if it's exactly like keeping the tokens around, and I think this may vary between implementations. This is an area that I'm not super-familiar with because the CSS parser that I knew best (the old Gecko one) didn't do this sort of multi-layer parsing, but instead relied on having the error handling correct in all the specific value parsers. I think you probably need to ask the folks who are closest to today's parsers, but perhaps with a slightly firmer write-up of what it is you'd propose to do. |
Gecko's CSS parser (https://github.com/servo/rust-cssparser) doesn't keep tokens around other than the very last token we tokenized (as a performance optimization): It has the ability to restart at arbitrary points. In fact we use them quite aggressively: So my understanding is that implementing something like this should have no extra overhead for valid declarations, since we already have the parser state from before the start of the declaration even: It might have extra overhead for invalid declarations however, which are unsurprisingly rather common (due to vendor prefixes and other stuff...), so it'd need some measurement with different inputs to see how bad it is... It should be feasible to measure this either inside or outside of Firefox btw, happy to help if someone wants to give that a shot. |
Blink's tokenizer would not be happy about that; we don't always keep the raw string around after tokenization, so it would not be easy for us to re-tokenize (and tokenization is a nontrivial part of the page load cost, so tokenizing things twice would also be bad). So that's both a confirmation that keeping the tokens around are bad, and a NAK on the workaround of re-tokenizing. |
Even with lookahead, this is still ambiguous per core syntax since blocks are valid component values in a declaration: div:hover span {} For instance inside a style rule, does the semi-colon make a difference between: span {
div:hover span {}
color: blue;
} and: span {
div:hover span {};
color: blue;
} ? |
Right, a consequence of resolving this is that curly-brace blocks are no longer a valid (top-level) component of properties. (And conversely, top-level semicolons are not valid in selectors, but that's true of the existing Option 3 already, since we have to avoid confusing invalid declarations for blocks.) We haven't yet tried to use {} for anything in properties, and I think the loss to future extensibility is minimal. |
Never mind, Lea reminded me that custom properties can have {} in them , so nope, this won't work, the ambiguity is indeed unresolvable (in some circumstances). |
How difficult/expensive would it be to change it to do so? Remember we only need to keep stuff around for the current declaration being parsed, not beyond that.
As I described in the first post, retokenization would only be needed on a tiny fraction of nested rules (namely element selectors followed by a pseudo-class).
I opened this issue so we could all think hard about this problem, instead of treating it as insurmountable from the get go. So far we have been designing syntax around it, axiomatically accepting it cannot possibly be solved in a performant way. I'd like us to challenge that assumption, and for that we need more details from implementers than "NAK". If all implementations have similar issues, that's a stronger argument, but I’m wary about designing syntax that will be suboptimal for authors forever, so that Nesting can be implemented without many changes to Blink's CSS parser. Could we please examine the impact more closely, and maybe do actual measurements, as @emilio is offering to do for Gecko? @tabatkins Yeah, that's unfortunate. I could ping the HttpArchive folks to measure how many custom property declarations actually contain curly braces. I suspect it's going to be a negligible amount. It is already illegal to include an opening curly brace without a closing one, so it's not like custom properties are completely unrestricted rn and would have to become restricted — they'd just become a tiny bit more restricted. |
The best numbers I have is that it would cost us about 2% of total page load time (not just parse time; all of load time), since we'd need to remove some optimizations that have been said to yield approximately that.
I don't share this goal; I would like us to just get the syntax (any syntax) specified, out the door and be done with it. So no, I won't be spending resources trying to bend our minds around how we tokenize CSS to this avail; I can NAK what is a non-starter for us, and that's what you'll get. |
Regardless of whether implementations are willing accept the performance impact, this would also make it more difficult for authors to know whether something is a style declaration or a nested rule in edge cases, thus making the language more messy. I don't like this at all. |
More details would be helpful here. What kind of optimizations? Why would you need to remove them? For which approach of the ones discussed would you need to remove them?
That’s …not a great attitude. 😕 We definitely don’t want to get "any syntax" out the door and be done with it, this is something that millions of developers will be using all over their stylesheets. It matters to get it right.
This is about dropping the requirement of an explicit |
The difference is that when restricting to "all other cases", you only need to look at the beginning in order to know if its a selector or a declaration. But with this it won't be easy at all. And yes, I prefer option 4 over 3, but then it's not such a problem because declarations and rules are separate so no confusion at all. And each person can like option 4 for different reasons, being able to omit |
I'm concerned this may not be true, e.g. I remember some proposal to use #element {
margin: {
top: 10px;
left: 20px;
}
} Is |
This is not a thing, pseudo-classes cannot be empty. |
Can we start a list of all the stuff we are sacrificing specifically to make this style of nesting possible? I don't think we can really judge the impact of all this without mapping it out exactly. |
I know pseudo-classes can't be empty, that's why I said "hypothetical". Still, the concern is that, if I'm reading the previous comments right, to avoid ambiguities when parsing, declarations couldn't have |
I understand that you and a couple other people from #7834 feel very strongly about this and disagree with the recent WG resolution. However, can we please keep that discussion in #7834? I opened a separate issue specifically so we could discuss implementation challenges. Whether this syntax is possible is orthogonal to whether it's desirable (though polling authors has shown time and time again that it is). Declaration values starting with |
What we are giving up is related to what we are realising. These can not be separated. And I don't like waving it away indefinitely. These additions to nesting do have a cost and it should be recorded. It helps everyone to make a better and more informed choice. It is weird that I have to advocate for fully exploring and describing the downsides. I am asking to do this in a separate issue. I don't see why we should not do this. |
Yes, this distinction would be possible. |
@LeaVerou reached out to me about this analysis. I found braces in 25,804 custom property values on 2,128 sites (0.01%) in the October HTTP Archive dataset of 15.6 million sites. I'll leave it up to all of you to decide how valid/correct this usage is in a few examples: https://gocmod.com/xingtu-viet-hoa-vip/
https://www.wheeloffortune.com/watch
Let me know if this was helpful or if there's any other follow-up analysis you need. |
@tabatkins an interesting thing that I came across while working on nesting. Presumably this changes error recovery in some cases right? E.g., a stray semicolon right now makes the following I think that both boxes here should be the same color (and probably green): <!doctype html>
<style>
div {
width: 50px; height: 50px;
background-color: red;
}
@media screen {
#a { background-color: yellow }
;
#a { background-color: green }
}
:root {
@media screen {
#b { background-color: yellow }
;
#b { background-color: green }
}
}
</style>
<div id="a"></div>
<div id="b"></div> Does that match your intuition? There are some tests that effectively expect the yellow behavior in the top level context, but I think it'd be unfortunate if stuff was inconsistent here... |
Yes, this was noted as the (afaict) only non-editorial change not directly related to Nesting itself in #8834. I agree that this is almost certainly a good change; it's definitely preferable that At the moment I still have top-level parsing preserve the old behavior, because I suspect the compat impact is larger, but I'd be willing to change (letting the top-level context parse declarations and just throw them away) if someone's willing to put in the legwork to determine compat. (If we could continue discussion on this topic in #8834 that would be great.) |
…abled. See w3c/csswg-drafts#7961 (comment) In general I agree with Tab that media blocks should parse the same regardless of whether they are nested. We should add a test that covers this to WPT, but for now tweak the local reftest. MANUAL PUSH: Trivial orange fix CLOSED TREE
…abled. See w3c/csswg-drafts#7961 (comment) In general I agree with Tab that media blocks should parse the same regardless of whether they are nested. We should add a test that covers this to WPT, but for now tweak the local reftest. MANUAL PUSH: Trivial orange fix CLOSED TREE
As per w3c/csswg-drafts#7961 bare tag selectors and so are fine now. Differential Revision: https://phabricator.services.mozilla.com/D181799 bugzilla-url: https://bugzilla.mozilla.org/show_bug.cgi?id=1839946 gecko-commit: b271d6ad5492b424ca52dd82097b08f90cd07be5 gecko-reviewers: boris
As per w3c/csswg-drafts#7961 bare tag selectors and so are fine now. Differential Revision: https://phabricator.services.mozilla.com/D181799
As per w3c/csswg-drafts#7961 bare tag selectors and so are fine now. Differential Revision: https://phabricator.services.mozilla.com/D181799 bugzilla-url: https://bugzilla.mozilla.org/show_bug.cgi?id=1839946 gecko-commit: b271d6ad5492b424ca52dd82097b08f90cd07be5 gecko-reviewers: boris
As per w3c/csswg-drafts#7961 bare tag selectors and so are fine now. Differential Revision: https://phabricator.services.mozilla.com/D181799 UltraBlame original commit: b271d6ad5492b424ca52dd82097b08f90cd07be5
As per w3c/csswg-drafts#7961 bare tag selectors and so are fine now. Differential Revision: https://phabricator.services.mozilla.com/D181799 UltraBlame original commit: b271d6ad5492b424ca52dd82097b08f90cd07be5
As per w3c/csswg-drafts#7961 bare tag selectors and so are fine now. Differential Revision: https://phabricator.services.mozilla.com/D181799 UltraBlame original commit: b271d6ad5492b424ca52dd82097b08f90cd07be5
As per w3c/csswg-drafts#7961 bare tag selectors and so are fine now. Differential Revision: https://phabricator.services.mozilla.com/D181799
Specs should be updated to the resolutions, so closing now. |
WebKit implemented here: WebKit/WebKit#17189 |
By the way, if anyone implementing this was confused about how to know whether we're parsing a declaration or a rule this was my algorithm:
|
You might want to check the "Implementation Note" in "consume a block's contents" for even more efficiency wins in this regard. |
In #7834 we resolved to relax the Nesting syntax to basically "
&
is only required to indicate a descendant selector only when the selector starts with an ident" (e.g. in& h1
the&
is necessary, but& .foo
can become just.foo
and& > h1
can be just> h1
).I opened this issue to brainstorm about ways to relax the syntax even further and do away with the
&
for all descendants, without introducing infinite lookahead, so we can have our 🍰 and eat it too.The problem
If we do not require a
&
before descendant element selectors, then when followed by a pseudo-class they can look like declarations (which are<ident>
:
and then anything goes, including another<ident>
) to the parser. The parser cannot know if it's dealing with a declaration or a nested rule, until it sees a;
or{
, which could come after an arbitrary number of tokens, hence unbounded lookahead.Non-starters
:
for declarations, minifiers currently remove that so there is a lot of code out there with declarations that do not include whitespace after:
.font
is both a valid property name and an HTML element — also with custom elements any vaid property name can also be an element selector).Proposed algorithm
One way this problem is bounded is that there are only two distinct possibilities: either you have a declaration, or a selector.
In CSS, tokenization is context-less, i.e. parsing a declaration or rule creates the same tokens, it's only the higher-level structures that are different.
Assuming parsing a declaration takes O(M) time and parsing a rule takes O(N) time, it would theoretically solve the problem to naively parse every rule-or-declaration twice (one as declaration, one as rule), and then throw away the structure we don't need. Clearly, that's a silly idea, because that would take O(M+N) time for every rule-or-declaration.
One optimization would be to parse as a declaration (there are far more declarations than nested rules), and keep the list of raw tokens around until the
;
or{
. Then declarations continue to be parsed in O(M) time, and rules are parsed in O(M+N) time. The extra space needed is minimal, since we don't need to keep these tokens around after the current structure is parsed.But also, as discussed in #7834, we can rule out the possibility of being a declaration very early for nearly every selector. The only exception is element selectors followed by a pseudo-class (e.g.
strong:hover
) which are fairly rare in nested stylesheets (you usually want to style the base selector as well, so it's usually& { /* ... */ &:hover {...} }
)So in the end, declarations still take O(M) time, nearly all rules still take O(N) time, and some, but very few rules take O(M + N) time.
And there's probably more room for optimizations.
I'd love to hear from implementers whether this is feasible, and whether I'm missing something.
Edit: Another restriction of going that way is that we'd need to forbid
{}
from property values, including custom properties (having it in strings is fine). But usage of that in the wild seems very low (and a lot of that includes the braces in strings, which will always be fine). Note we can reserveproperty: {
for future syntax, since pseudo-classes cannot be empty so there's no ambiguity there.The text was updated successfully, but these errors were encountered: