Skip to content
This repository has been archived by the owner on Jan 23, 2020. It is now read-only.

Canonicalization should take care of EOF as it would lead to XSS in IE8 or below. #42

Open
adon-at-work opened this issue Oct 8, 2015 · 6 comments
Labels

Comments

@adon-at-work
Copy link
Contributor

Here's a sample of an EOF in Attribute value (double-quoted) state:

hello <a href="<script>{{untrusted}}</script>

According to the spec, when the EOF is encountered. It says it's a parse error, and that compliant browser will switch to DATA state. If rendered in latest browsers like Chrome and Firefox, only hello will get rendered, and ended in DATA state. the incomplete tag is actually NOT emitted to the DOM/output.

But unfortunately older browsers like IE7-8 behaved differently, the incomplete tag gets rendered, and that the string <a href="<script>{{untrusted}}</script> is considered as begun in DATA state, and somehow transitioned into SCRIPT state.

Context parser now considers the placeholder {{untrusted}} as placed in attribute value (double-quoted) state. But it ignored the consequence of EOF. That leads the downstream project secure-handlebars to simply insert a filter equiv. to uriInDoubleQuotedAttr() for that placeholder. An attacker using alert(1) will be able to launch XSS.

The EOF problem was marked as TODO inside the source code.

@neraliu @yukinying @maditya

@adon-at-work adon-at-work changed the title Solving parse errors due to EOF Handling parse errors due to EOF Oct 8, 2015
@yukinying
Copy link
Contributor

How did your canalization logic handle the case without EOF? i.e. Could you show me the code that you handle <a href="<script>{{untrusted}}</script>">bar</a>? (note the closing double quote)

@adon-at-work
Copy link
Contributor Author

Short answer: No canonicalization needed.

tl;dr

Our current canonicalization categorizations
(a) parse error correction
(b) fixing comment precedence in RAWTEXT/RCDATA
(c) voiding IE conditional comments

The step-by-step process I check whether and how something is canonicalized:

  1. Given the sample, it's obviously unrelated to the two categories (b) and (c). Most chars being used there looks innocuous to me except those in the attr value.
  2. Check the spec, there're two parse errors in attr val double-quoted state
    https://html.spec.whatwg.org/multipage/syntax.html#attribute-value-(double-quoted)-state
  3. None of the chars in the sample actually could trigger parse errors. So, no canonicalizations.

In case really a char is found resulting in parse error (note, none in this sample), a further check is to see the corresponding 1:1 mapping of state-specific cases at https://github.com/yahoo/context-parser/blob/master/tests/unit/run-strict-context-parser.js#L338 and we can quickly tell how it's corrected from the example case. Further check the source code if details are needed.

also note that the sample ends in DATA state, where EOF at there is valid.

@yukinying
Copy link
Contributor

Let me repeat my question in another form:

Have you check if IE7 and 8 exploitable with the following?

<a href="<script>{{untrusted}}</script>">bar</a>

@yukinying yukinying changed the title Handling parse errors due to EOF Canonicalization should take care of EOF as it would lead to XSS in IE8 or below. Oct 9, 2015
@yukinying
Copy link
Contributor

Our current canonicalization categorizations
(a) parse error correction
(b) fixing comment precedence in RAWTEXT/RCDATA
(c) voiding IE conditional comments

I think (a) should be "Null byte parse error correction" instead of all types of parse error, right?

@adon-at-work
Copy link
Contributor Author

Have you check if IE7 and 8 exploitable with the following?
<a href="<script>alert(1)</script>">bar</a> ({{untrusted}} is swapped with alert(1))

Checked. Surely non-exploitable.

Our current canonicalization categorizations
(a) parse error correction
(b) fixing comment precedence in RAWTEXT/RCDATA
(c) voiding IE conditional comments

I think (a) should be "Null byte parse error correction" instead of all types of parse error, right?

No. Here're some non-NULL kind of parse error corrections:
https://html.spec.whatwg.org/multipage/syntax.html#attribute-value-(unquoted)-state
https://github.com/yahoo/context-parser/blob/master/tests/unit/run-strict-context-parser.js#L350

yukinying changed the title from Handling parse errors due to EOF to Canonicalization should take care of EOF as it would lead to XSS in IE8 or below

It may not be appropriate to immediately jump to a title like using canonicalization to handle EOF before any discussions on the mitigation side.

My two cents:

  • My first reaction is that in fact EOF is unlike others, it's NOT a character. This naturally cannot fit into the existing Canonicalize() model.
  • And a quick fix to this problem could be like just warn the user when the HTML does not end in DATA state.
  • ... may be there's an intermediate fix...
  • A complete fix may call for the approach to emit valid tags only until > is encountered to transition back to DATA state. This is what I've perceived as fusion with the html-purify() approach. So basically incomplete tags <... at the end won't get into the output, just like how the latest Chrome/Firefox behave. This takes time though. c.c. @maditya

@yukinying
Copy link
Contributor

It may not be appropriate to immediately jump to a title like using canonicalization to handle EOF before any discussions on the mitigation side.

The current implementation of canonicalization is unfit to handle EOF, and it does not necessarily mean it should not be in scope of canonicalization. Indeed this should give you some hints that the way Canonicalize() is designed is not sufficient and some rewrite of it is unavoidable.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

2 participants