How to Prevent Cross Site Scripting (XSS)

Cross Site Scripting (XSS): Brief History
Cross Site Scripting (XSS) is a common issue that plagues many web applications; check out xssed for a frame of reference. The most widely seen form is called reflective cross site scripting (XSS). This is when user supplied data is submitted to an application and the data is reflected back to the user in the rendered page. Another, much more dangerous form is called persistent cross site scripting (XSS). This type is characterized by the dangerous data being stored in the application's database. It can affect every visitor who views the stored data, as opposed to reflective cross site scripting, where a user must click on a specially crafted link to be affected. The most widely known persistent cross site scripting attack was the Samy Worm launched against MySpace, which affected over a million users in 24 hours.
Cross site scripting dates back all the way to 1996, shortly after JavaScript was released by Netscape. If cross site scripting (XSS) has such a long history, why is it still such a prevalent issue? One of the core functions of a web application is to display dynamic data sourced from outside the application, such as web requests and database queries. This data is untrusted and can be placed into many different contexts in an HTML page, each of which has its own caveats. HTMLEncode is NOT the end-all, be-all solution to cross site scripting, and this article will demonstrate why.
How to Prevent Cross Site Scripting (XSS)
The prevention of Cross Site Scripting involves a two-pronged approach. The first prong, which applies to all web application vulnerabilities, is to validate our input. Our users are not to be trusted! Validating the input into our web application restricts the character set and lengths available to the user, which reduces the attack surface for launching a successful cross site scripting (XSS) attack. The second prong is encoding our output before rendering any untrusted data on the page. This involves passing our untrusted data through an encoding method that will appropriately escape dangerous characters.
Validate User Input
Validating a web application's input requires being able to identify the valid characters and valid data structure for each and every input to the application. This can seem like a daunting task, but there are a number of frameworks available for validating common data types, such as email addresses, phone numbers, times and dates. These frameworks are sometimes built into the platform we are programming on, such as the validation annotations available in Java and the Spring Framework or the ASP.NET Validation controls available in the .NET framework.
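As a rough illustration of allow-list validation, here is a minimal Python sketch. The patterns (USERNAME_RE, US_PHONE_RE) and the is_valid helper are hypothetical and are not taken from any of the frameworks mentioned above; a real application would define a rule for each of its own inputs.

```python
import re

# Hypothetical allow-list rules: each one names the ONLY characters and
# lengths a field may contain, rather than listing known-bad input.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,20}")
US_PHONE_RE = re.compile(r"[0-9]{3}-[0-9]{3}-[0-9]{4}")


def is_valid(pattern: re.Pattern, value: str) -> bool:
    """Return True only if the entire value matches the allow-list pattern."""
    return pattern.fullmatch(value) is not None


print(is_valid(USERNAME_RE, "brian_01"))                   # True
print(is_valid(USERNAME_RE, "<script>alert(1)</script>"))  # False
print(is_valid(US_PHONE_RE, "555-867-5309"))               # True
```

Using fullmatch rather than search matters here: a partial match would accept a valid prefix followed by an attack string.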
Encode Our Output
Encoding our output involves passing our data to an appropriate encoding method before writing it to the HTML page. Appropriate means “context” appropriate. Context means the section of the HTML page being written to, which can be between HTML tags, inside HTML attributes, inside JavaScript aware sections, inside CSS, or inside URLs. The “go to standard” was to pass everything to HTMLEncode(), but this has proven to be insufficient.
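To make the baseline concrete, here is a small sketch that uses Python's standard html.escape as a stand-in for HTMLEncode(). That substitution is an assumption for illustration only; each platform has its own function, and the exact entity used for each character can differ.

```python
import html

untrusted = '<script>alert("xss")</script>'

# quote=True (the default) also encodes double and single quotes.
encoded = html.escape(untrusted, quote=True)

print(encoded)
# &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;
```

Written between HTML tags, the encoded string is displayed as text instead of being parsed as a script element.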
Why is HTMLEncode() Not Enough?
HTMLEncode() was designed to keep output well-formed HTML/XML by encoding the characters that have special meaning in markup. The list, like this sentence without the aside, is short. As per the HTML specification the following characters need to be encoded: < > & " '
This short list does put a damper on a number of cross site scripting (XSS) attacks, but there are many contexts in which data can be dynamically output onto an HTML page that require special attention to detail to prevent the Document Object Model (DOM) from being manipulated by user supplied characters. There are also areas of an HTML page where dynamic data should never be introduced, as there is no way to fully protect against cross site scripting (XSS). These areas are directly inside of script or CSS elements without being inside of quotes, inside of HTML comments, or as HTML tag or attribute names.
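A short sketch shows the gap. It again uses Python's html.escape as a stand-in for HTMLEncode(), and the two page templates are hypothetical. Neither payload contains any of the five encoded characters, so both pass through unchanged.

```python
import html

# Payload 1: aimed at an attribute value written WITHOUT quotes.
payload = "x onmouseover=alert(1)"
encoded = html.escape(payload)
print(encoded == payload)  # True: nothing in the payload needed encoding

# The space ends the unquoted value, so the rest becomes a new attribute.
unquoted = "<input value=" + encoded + ">"
print(unquoted)
# <input value=x onmouseover=alert(1)>

# Payload 2: aimed at a URL attribute. Quoting does not help here, because
# the danger is the scheme itself rather than a breakout character.
link = '<a href="' + html.escape("javascript:alert(1)") + '">profile</a>'
print(link)
# <a href="javascript:alert(1)">profile</a>
```

In both cases the encoder did exactly what it was designed to do; the output was simply placed in a context that it was not designed for.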
The solution to properly encoding these various contexts is first to understand these contexts and then to use a purpose built encoding library. In Java, the OWASP ESAPI Library has specific encoding methods for preventing cross site scripting (XSS) and Microsoft provides the AntiXSS library for .NET applications. There are implementations of the ESAPI library in other languages, but they have varying levels of maturity. The ESAPI library also has the added benefit, among other features, of providing diverse validation functionality.
Why Context is Important – The Caveats
The following sections describe the various caveats associated with each of these different contexts that you should be aware of. The Open Web Application Security Project (OWASP) maintains this information in their Cross Site Scripting (XSS) Prevention Cheat Sheet. One of the many terrific resources available from OWASP!
HTML Attributes
HTML attributes are used to provide additional information about HTML elements, e.g. style, class, id, type. In this context it is important to make sure attribute values are surrounded by quotes. Values that are not surrounded by quotes can be broken out of with a number of different characters. The simplest is a space, but characters such as % * + , - / ; < = > ^ and | can also allow an attacker to break out of the attribute’s context. The attacker can then start adding their own JavaScript aware attributes and events such as “onmouseover” or “onerror” to get script execution to occur.
It is important to be careful when placing data into an HTML element’s event handler such as “onclick”, “onmouseover”, and “onerror”. These attributes are capable of executing JavaScript without the need for “script” tags and should be protected using the JavaScript protection mechanism.
The only characters that may be left unescaped in this context are alphanumerics. All other characters should be escaped using HTML escape characters in the format of &#[ASCII DECIMAL]; e.g. &#33; for “!”. These characters will be rendered properly on the page, but will appear as the escaped values in the page source.
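The rule above can be sketched in a few lines of Python. The function name encode_for_html_attribute is hypothetical, and this is a simplified illustration rather than a replacement for a purpose-built library, which also deals with edge cases such as control characters that are not valid as numeric references.

```python
def encode_for_html_attribute(value: str) -> str:
    """Leave ASCII alphanumerics alone; write everything else as &#DECIMAL;."""
    out = []
    for ch in value:
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            out.append(f"&#{ord(ch)};")
    return "".join(out)


print(encode_for_html_attribute("a!"))
# a&#33;
print(encode_for_html_attribute("x onmouseover=alert(1)"))
# x&#32;onmouseover&#61;alert&#40;1&#41;
```

Even with this encoding in place, the attribute value should still be written inside quotes.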
JavaScript
Dynamic data should only be placed inside of quotation marks in JavaScript. The quotation marks are needed to ensure dynamic data is not able to break into the script context. It is also extremely important not to output dynamic data into JavaScript functions that will evaluate and execute the provided data as code. Examples of this include “setInterval()”, “setTimeout()”, and “eval()”. There is no amount of escaping that can prevent code execution in these functions.
The solution is similar to the solution for HTML Attributes, where the only characters left unescaped are alphanumerics. However, in this case, all other characters should be escaped using the format of \x[ASCII HEX], e.g. \x21 for “!”.
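Here is a minimal Python sketch of that rule for data written inside a quoted JavaScript string. The function name encode_for_javascript is hypothetical. The \uHHHH handling for characters above 0xFF is an addition of this sketch, since \xHH can only express two hex digits.

```python
def encode_for_javascript(value: str) -> str:
    """Leave ASCII alphanumerics alone; hex-escape everything else."""
    out = []
    for ch in value:
        cp = ord(ch)
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        elif cp < 0x100:
            out.append(f"\\x{cp:02X}")        # "!" is 0x21, so it becomes \x21
        elif cp < 0x10000:
            out.append(f"\\u{cp:04X}")
        else:
            # Above the BMP: emit a UTF-16 surrogate pair.
            cp -= 0x10000
            out.append(f"\\u{0xD800 + (cp >> 10):04X}")
            out.append(f"\\u{0xDC00 + (cp & 0x3FF):04X}")
    return "".join(out)


print(encode_for_javascript("';alert(1)//"))
# \x27\x3Balert\x281\x29\x2F\x2F
```

The quote that would have closed the surrounding string literal is now just an escape sequence inside it. This does not make it safe to pass the value to eval(), setTimeout() or setInterval().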
Cascading Style Sheets (CSS)
Cascading Style Sheets (CSS) is surprisingly powerful and riddled with dangerous execution contexts, especially in browser specific features. Dynamic data can only be used as a property value. Dynamic data should never specify a property name (e.g. width, background). If outputting into a url() value, it is important to ensure URLs start with “http” and not “javascript”. Older versions of Internet Explorer (IE) support an “expression()” value that allows JavaScript execution, so it is important not to output dynamic data into it, or to allow “expression()” to be added to the CSS.
The solution is similar to the solution for JavaScript, where the only characters left unescaped are alphanumerics. In this case, all other characters should be escaped using the CSS format of a backslash followed by the ASCII hex value, e.g. \21 for “!”. If the next character could be read as another hex digit, follow the escape with a space or zero-pad it to six digits.
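A CSS escape is a backslash followed by hex digits, and a single space after the escape is consumed as its terminator. The Python sketch below follows that form; the function name encode_for_css is hypothetical, and it is only meant for data placed in a property value.

```python
def encode_for_css(value: str) -> str:
    """Leave ASCII alphanumerics alone; write everything else as \\HEX + space."""
    out = []
    for ch in value:
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            # The trailing space ends the escape, so a following character
            # such as "b" is not read as another hex digit.
            out.append(f"\\{ord(ch):X} ")
    return "".join(out)


print(encode_for_css("a!b"))
# a\21 b
```

Encoding alone does not decide whether a value is acceptable, so a url() value still needs its scheme checked separately.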
URLs
URLs are another area with a number of complex rules. URL GET parameters should be URL encoded when being dynamically constructed. The entire URL should be surrounded by quotes if it is being placed into an HTML attribute. The other portions of the URL should be encoded according to the context they are being output into. Special care needs to be taken when outputting into “href”, “src” or other URL-based attributes to ensure that other protocols are not specified, such as JavaScript, e.g. href=”javascript:alert(12);”.
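Both URL rules can be sketched with Python's standard urllib.parse module. The helper names, the “q” parameter and the allowed scheme list are assumptions made for this example.

```python
from urllib.parse import quote, urlsplit

# Assumption for this sketch: only absolute http(s) links are acceptable.
ALLOWED_SCHEMES = {"http", "https"}


def build_search_url(base: str, term: str) -> str:
    """URL encode an untrusted value before placing it in a GET parameter."""
    return f"{base}?q={quote(term, safe='')}"


def has_allowed_scheme(url: str) -> bool:
    """Allow-list check for untrusted URLs destined for href or src."""
    return urlsplit(url.strip()).scheme.lower() in ALLOWED_SCHEMES


print(build_search_url("https://example.com/search", "a b&c"))
# https://example.com/search?q=a%20b%26c
print(has_allowed_scheme("https://example.com/"))    # True
print(has_allowed_scheme("javascript:alert(12);"))   # False
```

Checking against an allow-list of schemes, rather than blocking “javascript” specifically, also rejects other schemes such as “data”. Note that this check rejects relative URLs as well, since they have no scheme. A URL that passes still needs attribute encoding and quotes when it is written into the page.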
Brian Cardinale