Jonathan edited this page Jun 7, 2022 · 7 revisions

Type: HtmlSanitizer

Namespace: Ganss.XSS

Interfaces: IHtmlSanitizer

Removes constructs that can lead to XSS attacks from HTML documents and fragments.

XSS attacks can occur at several levels within an HTML document or fragment:

  • HTML Tags (e.g. the <script> tag)
  • HTML attributes (e.g. the "onload" attribute)
  • CSS styles (url property values)
  • malformed HTML or HTML that exploits parser bugs in specific browsers

The HtmlSanitizer class addresses all of these possible attack vectors by using a sophisticated HTML parser (AngleSharp).

In order to facilitate different use cases, HtmlSanitizer can be customized at the levels mentioned above:

  • You can specify the allowed HTML tags through the property AllowedTags. All other tags will be stripped.
  • You can specify the allowed HTML attributes through the property AllowedAttributes. All other attributes will be stripped.
  • You can specify the allowed CSS property names through the property AllowedCssProperties. All other styles will be stripped.
  • You can specify the allowed URI schemes through the property AllowedSchemes. All other URIs will be stripped.
  • You can specify the HTML attributes that contain URIs (such as "src", "href" etc.) through the property UriAttributes.

Example:

var sanitizer = new HtmlSanitizer();
var html = @"<script>alert('xss')</script><div onload=""alert('xss')"" style=""background-color: test"">Test<img src=""test.gif"" style=""background-image: url(javascript:alert('xss')); margin: 10px""></div>";
var sanitized = sanitizer.Sanitize(html, "http://www.example.com");
// -> "<div style="background-color: test">Test<img style="margin: 10px" src="http://www.example.com/test.gif"></div>"
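
The customization properties listed above can be exercised as in the following minimal sketch. It assumes the HtmlSanitizer NuGet package is referenced and exposes the `Ganss.XSS` namespace named on this page (later releases may use a different namespace casing), and that the allow-list properties behave as mutable string sets. The input markup is made up for illustration.

```csharp
using System;
using Ganss.XSS; // namespace as given on this page; adjust if your package version differs

var sanitizer = new HtmlSanitizer();

// Allow the "class" attribute in addition to the default attributes.
sanitizer.AllowedAttributes.Add("class");

// Disallow images by removing the tag from the allow-list.
sanitizer.AllowedTags.Remove("img");

// Permit mailto: URIs in URI attributes such as href.
sanitizer.AllowedSchemes.Add("mailto");

// Illustrative input only.
var html = @"<p class=""note"">Mail <a href=""mailto:me@example.com"">me</a><img src=""x.gif""></p>";
var sanitized = sanitizer.Sanitize(html);

Console.WriteLine(sanitized);
```

With these settings the `class` attribute and the `mailto:` link should survive, while the `img` element should be stripped.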

Key
  • i = instance
  • s = static

Events

| Name | Description |
| --- | --- |
| PostProcessNode(PostProcessNodeEventArgs) | Occurs for every node after sanitizing. |
| RemovingTag(RemovingTagEventArgs) | Occurs before a tag is removed. |
| RemovingAttribute(RemovingAttributeEventArgs) | Occurs before an attribute is removed. |
| RemovingStyle(RemovingStyleEventArgs) | Occurs before a style is removed. |
| RemovingAtRule(RemovingAtRuleEventArgs) | Occurs before an at-rule is removed. |
| RemovingComment(RemovingCommentEventArgs) | Occurs before a comment is removed. |
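
The Removing* events can be used to observe or veto individual removals. The sketch below assumes that the event arguments expose the affected element as `Tag`, the affected attribute as `Attribute`, and a settable `Cancel` flag; these member names are not listed on this page and should be checked against the installed package version. The `data-id` attribute and the input markup are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using Ganss.XSS; // namespace as given on this page; adjust if your package version differs

var sanitizer = new HtmlSanitizer();
var removedTags = new List<string>();

// Record the lowercase name of every tag that is about to be removed.
sanitizer.RemovingTag += (sender, e) => removedTags.Add(e.Tag.LocalName);

// Veto the removal of one specific attribute; every other
// disallowed attribute is still removed as usual.
sanitizer.RemovingAttribute += (sender, e) =>
{
    if (e.Attribute.Name == "data-id")
    {
        e.Cancel = true;
    }
};

// Illustrative input only.
var html = @"<script>alert(1)</script><p data-id=""42"" onclick=""evil()"">Hi</p>";
var sanitized = sanitizer.Sanitize(html);

Console.WriteLine(sanitized);
Console.WriteLine(string.Join(", ", removedTags));
```

Cancelling a removal bypasses the allow-lists for that item, so a handler like this should match as narrowly as possible.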

Methods

| Name | Description |
| --- | --- |
| i Sanitize(String) | Sanitizes the specified HTML body fragment. If a document is given, only the body part will be returned. |
| i Sanitize(String, String) | Sanitizes the specified HTML body fragment. If a document is given, only the body part will be returned. Relative URLs will be resolved against the baseUrl parameter. |
| i Sanitize(String, String, IMarkupFormatter) | Sanitizes the specified HTML body fragment. If a document is given, only the body part will be returned. Relative URLs will be resolved against the baseUrl parameter. Sanitized output will be formatted using the outputFormatter parameter. |
| i SanitizeDocument(String) | Sanitizes the specified HTML document. Even if only a fragment is given, a whole document will be returned. |
| i SanitizeDocument(String, String) | Sanitizes the specified HTML document. Even if only a fragment is given, a whole document will be returned. Relative URLs will be resolved against the baseUrl parameter. |
| i SanitizeDocument(String, String, IMarkupFormatter) | Sanitizes the specified HTML document. Even if only a fragment is given, a whole document will be returned. Relative URLs will be resolved against the baseUrl parameter. Sanitized output will be formatted using the outputFormatter parameter. |
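
The difference between Sanitize and SanitizeDocument is easiest to see by running both on the same input. This is a minimal sketch under the same assumption as the page's own example (the `Ganss.XSS` namespace from the HtmlSanitizer NuGet package); the input markup and base URL are illustrative.

```csharp
using System;
using Ganss.XSS; // namespace as given on this page; adjust if your package version differs

var sanitizer = new HtmlSanitizer();

// Illustrative fragment with a relative link and a script element.
var html = @"<p>Hello <a href=""page.html"">link</a></p><script>alert('xss')</script>";
var baseUrl = "http://www.example.com";

// Returns only the sanitized body content.
var fragment = sanitizer.Sanitize(html, baseUrl);

// Returns a whole document, even though only a fragment was supplied.
var document = sanitizer.SanitizeDocument(html, baseUrl);

Console.WriteLine(fragment);
Console.WriteLine(document);
```

In both results the relative `href` should be resolved against `baseUrl` and the script element should be gone; only the second result should be wrapped in `html` and `body` elements.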

Properties

| Name | Description |
| --- | --- |
| AllowDataAttributes | Gets or sets a boolean value indicating whether all HTML5 data attributes (attributes prefixed with data-) are allowed. |
| AllowedAtRules | Gets or sets the allowed CSS at-rules such as "@media" and "@font-face". |
| AllowedAttributes | Gets or sets the allowed HTML attributes such as "href" and "alt". |
| AllowedCssProperties | Gets or sets the allowed CSS properties such as "font" and "margin". |
| AllowedSchemes | Gets or sets the allowed URI schemes such as "http" and "https". |
| AllowedTags | Gets or sets the allowed HTML tag names such as "a" and "div". |
| DefaultHtmlParserFactory | Gets or sets the default Func object that creates the parser used for parsing the input. |
| DefaultKeepChildNodes | Gets or sets the default value indicating whether to keep child nodes of elements that are removed. Default is false. |
| DefaultOutputFormatter | Gets or sets the default IMarkupFormatter object used for generating output. Default is Instance. |
| DisallowCssPropertyValue | Gets or sets a regex that must not match for legal CSS property values. |
| HtmlParserFactory | Gets or sets the Func object that creates the parser used for parsing the input. |
| KeepChildNodes | Gets or sets a value indicating whether to keep child nodes of elements that are removed. Default is DefaultKeepChildNodes. |
| OutputFormatter | Gets or sets the IMarkupFormatter object used for generating output. Default is DefaultOutputFormatter. |
| UriAttributes | Gets or sets the HTML attributes that can contain a URI such as "href". |
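
As an illustration of the KeepChildNodes property, the sketch below sanitizes the same input twice: once with a default instance and once with KeepChildNodes set to true. It assumes the `Ganss.XSS` namespace from the HtmlSanitizer NuGet package and that DefaultKeepChildNodes still has its documented default of false; `my-widget` is a made-up element name chosen because it is not an allowed tag.

```csharp
using System;
using Ganss.XSS; // namespace as given on this page; adjust if your package version differs

// "my-widget" is a hypothetical element that is not on the allow-list.
var html = "<p>Some <my-widget>inner</my-widget> text</p>";

// Default behavior: a removed element takes its child nodes with it.
var strict = new HtmlSanitizer();
var strictResult = strict.Sanitize(html);

// With KeepChildNodes enabled, only the element itself is removed
// and its children are kept in place.
var lenient = new HtmlSanitizer { KeepChildNodes = true };
var lenientResult = lenient.Sanitize(html);

Console.WriteLine(strictResult);
Console.WriteLine(lenientResult);
```

Child nodes that are kept this way are still sanitized themselves, so enabling the option trades some strictness for preserving visible text.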

Fields

| Name | Description |
| --- | --- |
| DefaultAllowedAtRules | The default allowed CSS at-rules. |
| DefaultAllowedSchemes | The default allowed URI schemes. |
| DefaultAllowedTags | The default allowed HTML tag names. |
| DefaultAllowedAttributes | The default allowed HTML attributes. |
| DefaultUriAttributes | The default URI attributes. |
| DefaultAllowedCssProperties | The default allowed CSS properties. |
| DefaultDisallowedCssPropertyValue | The default regex for disallowed CSS property values. |