Content Security Policy (CSP)

CSP is a proposal under discussion among Mozilla developers. The proposal can be viewed here: https://wiki.mozilla.org/Security/CSP/Spec

In a nutshell, my interpretation is that the proposal gives web browsers the ability to enforce a policy defined by the web master of the site serving the page the client is viewing.

If it is a policy defined by the web master, why can’t the web master just be cautious? In a perfect world he can, but very often in today’s world the web master is not really a code professional, but an everyday person running software (such as a blog or CMS or forum) that he did not write and does not fully understand. This often results in XSS (and other) vulnerabilities that go unnoticed and unpatched, allowing a malicious individual to inject scripts into the web site for the purpose of attacking other users who visit that web site.

Sometimes these attacks can remain on the server for quite some time before they are found and dealt with. This is why I am so excited about the Mozilla proposal. With CSP, the web master can define a few simple parameters that alert the web browser that something fishy is going on. Furthermore, the web master can define a URL that the web browser can then use to alert the web master something fishy was attempted.

See the web page (linked above) for proper details and definition.

Disclaimer: I am not a Mozilla Developer and I am not involved in any of the decision making with respect to the CSP proposal.

Server Side CSP Filtering

Since it will be some time before CSP capable browsers have significant market share, I decided to implement CSP server side in a PHP class.

What the class (assuming it properly functions) does is remove any content from the web page that violates the CSP before the web page is even sent to the requesting browser. The beauty of doing it this way is that the user will not have to use a specific browser or install a plugin/add-on in order to benefit from a web master establishing a reasonable CSP policy (is CSP policy redundantly redundant?).

A web master who implements my class should still send the CSP header to the requesting client, because server side CSP can not protect the user from policy violations that result from DHTML modification of the web page. DHTML should only happen via a script that is from an approved domain so the risk is lower, but the risk is still there, especially with the recent scary popularity of web masters using third party hosted JavaScript libraries.

Class Source

Target Audience

Anyone with dynamic content, especially dynamic content that displays data not provided by the page programmer.

I would love to see this class implemented by template developers and web applications commonly used by the general public (e.g. blogging software, content management software, boards, etc.)

License

The class is distributed under the terms of the Common Public License, version 1.0

CSP Implementation

Here is how the script implements CSP:

Image Nodes

If an img node does not have a src attribute that matches a CSP allowed source for images, one of two things will happen:

  1. If an alt attribute exists, the img node is replaced with a text node containing the contents of the alt attribute. Earlier pre-release versions of the class just removed the src attribute, but that broke the W3C specification causing valid input to return invalid output after filtering.
  2. If an alt attribute does not exist, the img node is simply removed.
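The two cases above can be sketched roughly as follows. This is a minimal illustration, not the class's actual code: the function name filterImages() and the plain list of allowed hosts are assumptions made for the example, and a relative src (no host) is simply treated as not allowed.

```php
<?php
// Hypothetical helper, not taken from the cspfilter class: replace every img
// whose src host is not in $allowedHosts with its alt text, or remove the
// node when it has no alt attribute.
function filterImages(DOMDocument $dom, array $allowedHosts)
{
    // Copy the live node list first so removals do not disturb iteration.
    $images = iterator_to_array($dom->getElementsByTagName('img'));
    foreach ($images as $img) {
        $host = parse_url($img->getAttribute('src'), PHP_URL_HOST);
        if (is_string($host) && in_array(strtolower($host), $allowedHosts, true)) {
            continue; // allowed source, leave the node alone
        }
        if ($img->hasAttribute('alt')) {
            // Case 1: keep the alt text as a plain text node.
            $text = $dom->createTextNode($img->getAttribute('alt'));
            $img->parentNode->replaceChild($text, $img);
        } else {
            // Case 2: nothing worth keeping, drop the node.
            $img->parentNode->removeChild($img);
        }
    }
}

$dom = new DOMDocument();
$dom->loadHTML('<p><img src="http://evil.example/a.png" alt="A cat">'
    . '<img src="http://evil.example/b.png">'
    . '<img src="http://www.w3.org/Icons/valid-xhtml10" alt="Valid"></p>');
filterImages($dom, array('www.w3.org'));
$html = $dom->saveHTML();
```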

Media Nodes

Media nodes consist of the audio and video elements that are part of the (X)HTML 5 specification.

If a media node has neither a src attribute nor a source child node that matches a CSP allowed source for media, the media node is removed.

Script Nodes

Since CSP does not allow inline script under any circumstances, a script node may never have children, and any children of a script node are removed.

If a script node does not have a src attribute that matches a CSP allowed source for scripts, the script node is removed.
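As a rough sketch of both script rules, assuming a hypothetical filterScripts() helper and a plain list of allowed hosts (neither is the class's real interface):

```php
<?php
// Hypothetical sketch: strip the children of every script node, then drop
// the node itself unless its src host appears in $allowedHosts.
function filterScripts(DOMDocument $dom, array $allowedHosts)
{
    foreach (iterator_to_array($dom->getElementsByTagName('script')) as $script) {
        // Inline code is never allowed, so empty the node first.
        while ($script->firstChild !== null) {
            $script->removeChild($script->firstChild);
        }
        $host = parse_url($script->getAttribute('src'), PHP_URL_HOST);
        if (!is_string($host) || !in_array(strtolower($host), $allowedHosts, true)) {
            $script->parentNode->removeChild($script);
        }
    }
}

$dom = new DOMDocument();
$dom->loadHTML('<html><head>'
    . '<script src="http://js.example.com/lib.js">alert(1)</script>'
    . '<script>alert(2)</script>'
    . '</head><body><p>hi</p></body></html>');
filterScripts($dom, array('js.example.com'));
$html = $dom->saveHTML();
```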

Event Attributes

Event attributes are not allowed. All event attributes are therefore removed.

If you need event attributes, define the functions in an external script file and use the JavaScript facilities for attaching them to an element.
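A minimal sketch of stripping event attributes on the server side. It assumes that any attribute whose name starts with "on" is an event attribute; that is a simplification for the example, and the class may well use an explicit list instead. The function name is hypothetical.

```php
<?php
// Hypothetical sketch: remove every attribute whose name begins with "on".
// Returns the number of attributes removed.
function stripEventAttributes(DOMDocument $dom)
{
    $xpath = new DOMXPath($dom);
    $attrs = iterator_to_array($xpath->query("//@*[starts-with(name(), 'on')]"));
    foreach ($attrs as $attr) {
        $attr->ownerElement->removeAttributeNode($attr);
    }
    return count($attrs);
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body onload="init()">'
    . '<a href="/x" onclick="steal()">link</a></body></html>');
$removed = stripEventAttributes($dom);
$html = $dom->saveHTML();
```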

Object / Embed / Applet Nodes

If an object node does not have a src attribute that matches a CSP allowed source for objects, one of two things will happen:

  1. If the object node has children, it is turned into a div node (without any attributes), the children are preserved.
  2. If the object node does not have children, it is removed.
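The "turn it into a div or remove it" rule, which is also used for applet and iframe nodes, could look something like this. The helper name demoteOrRemove() is made up for the example, and the policy check that decides which nodes get this treatment is left out.

```php
<?php
// Hypothetical helper: replace $node with an attribute-free div that keeps
// its children, or remove it outright when it has no children.
function demoteOrRemove(DOMElement $node)
{
    if (!$node->hasChildNodes()) {
        $node->parentNode->removeChild($node);
        return null;
    }
    $div = $node->ownerDocument->createElement('div');
    while ($node->firstChild !== null) {
        $div->appendChild($node->firstChild); // appendChild moves the child
    }
    $node->parentNode->replaceChild($div, $node);
    return $div;
}

$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadXML('<body>'
    . '<object type="application/x-shockwave-flash"><p>Fallback text</p></object>'
    . '<object type="application/x-shockwave-flash"/>'
    . '</body>');
// For the demonstration, treat every object node as a policy violation.
foreach (iterator_to_array($dom->getElementsByTagName('object')) as $object) {
    demoteOrRemove($object);
}
$xml = $dom->saveXML($dom->documentElement);
```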

Embed

The embed element started as a proprietary element that has been deprecated by the object element, yet I have seen some references to it being officially included in (X)HTML 5 which makes zero sense to me since it does not do anything that can not be done with the object element. However, deprecated or not, it is a fairly popular element, especially for flash.

The embed element is not allowed to have any children, so any children of an embed node are removed.

If an embed node does not have a src attribute that matches a CSP allowed source for objects, it is removed.

Applet

The applet element has been deprecated; you should use object instead. However, it is still a fairly popular element, especially for Java.

If an applet node does not have a code attribute that matches a CSP allowed source for objects, one of two things will happen:

  1. If the applet node has children, it is turned into a div node (without any attributes), the children are preserved.
  2. If the applet node does not have children, it is removed.

Frame Nodes

A frame node is not allowed to have children by W3C specification, so any children of a frame node are removed.

If a frame node does not have a src attribute that matches a CSP allowed source for frames, it is removed.

If an iframe node does not have a src attribute that matches a CSP allowed source for frames, one of two things will happen:

  1. If the iframe node has children, it is turned into a div node (without any attributes), the children are preserved.
  2. If the iframe node does not have children, it is removed.

Style Sheets

If the href attribute of a link node does not match a CSP allowed host for style sheets, the link node is removed.

If the host serving the web page does not match a CSP allowed host for style sheets, any style nodes and attributes are removed.

Trigger Notification

If the CSP has a report URI specified, and the $policyLogFile variable does not point to a real file, code that violated the policy will be reported to that URI using an XML syntax similar to the one specified in the Mozilla CSP specification. It is up to the web developer to code something that does something useful with the report.

<cspfilter-report>
  <request>$_SERVER['REQUEST_METHOD'] $_SERVER['REQUEST_URI'] $_SERVER['SERVER_PROTOCOL']</request>
  <request-headers>Some headers related to the request</request-headers>
  <blocked-uri directive="policy">blocked resource URI</blocked-uri>
  <misc>violations that are not a URI resource</misc>
</cspfilter-report>

It is very similar to what a CSP aware browser would send, but currently has some differences:

  1. The main node uses the tag name cspfilter-report as opposed to csp-report, to differentiate server side CSP enforcement from violations caught by the requesting client when the report is sent via POST to the report-uri.
  2. The blocked-uri element will have an attribute whose name is the policy directive that caused the block and whose value is the policy host expression.
  3. I have added a misc node for alterations of the source by the filter that do not involve a blocked URI. The element name of that node may change.
  4. All violation reports for a single page request will be children of a single cspfilter-report node.

If you have specified a trigger log file, the report will be appended to the log file rather than sent to a report URI. The web server of course must have permission to write to the log file.
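To illustrate the report format and the log file behaviour, here is a small sketch. The functions buildReport() and logReport() and the array keys are invented for the example, and the request-headers node is omitted for brevity; the element and attribute layout follows the description above.

```php
<?php
// Hypothetical sketch: build a cspfilter-report fragment for one page request.
function buildReport($request, array $blocked, array $misc)
{
    $doc = new DOMDocument('1.0', 'utf-8');
    $root = $doc->appendChild($doc->createElement('cspfilter-report'));
    $root->appendChild($doc->createElement('request'))
         ->appendChild($doc->createTextNode($request));
    foreach ($blocked as $item) {
        $node = $root->appendChild($doc->createElement('blocked-uri'));
        // Attribute name is the directive, value is the policy host expression.
        $node->setAttribute($item['directive'], $item['policy']);
        $node->appendChild($doc->createTextNode($item['uri']));
    }
    foreach ($misc as $note) {
        $root->appendChild($doc->createElement('misc'))
             ->appendChild($doc->createTextNode($note));
    }
    return $doc->saveXML($root);
}

// Append the report to the log, but only if the file already exists.
function logReport($report, $logFile)
{
    if (!is_file($logFile) || !is_writable($logFile)) {
        return false;
    }
    return file_put_contents($logFile, $report . "\n", FILE_APPEND | LOCK_EX) !== false;
}

$report = buildReport(
    'GET /page.php HTTP/1.1',
    array(array('directive' => 'img-src', 'policy' => '*.example.com',
                'uri' => 'http://evil.example/a.png')),
    array('removed onclick attribute')
);
$logFile = tempnam(sys_get_temp_dir(), 'csp'); // stand-in for $policyLogFile
$logged = logReport($report, $logFile);
```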

Beyond CSP

My class allows for some parameters that are not part of the CSP proposal.

Script Only in Head

This allows you to forbid any script nodes regardless of the src attribute that do not appear in the document head. It is useful if you declare third party JavaScript sources for your page to use but want to make sure your users can not inject a script from the same third party script host.
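A sketch of what this check amounts to, using an XPath query to find script nodes that are not inside the head. The function name is hypothetical and this is not the class's own code.

```php
<?php
// Hypothetical sketch: remove every script node that has no head ancestor,
// regardless of its src attribute. Returns the number of nodes removed.
function removeScriptsOutsideHead(DOMDocument $dom)
{
    $xpath = new DOMXPath($dom);
    $scripts = iterator_to_array($xpath->query('//script[not(ancestor::head)]'));
    foreach ($scripts as $script) {
        $script->parentNode->removeChild($script);
    }
    return count($scripts);
}

$dom = new DOMDocument();
$dom->loadHTML('<html><head>'
    . '<script src="http://cdn.example.com/lib.js"></script>'
    . '</head><body><p>user post</p>'
    . '<script src="http://cdn.example.com/evil.js"></script>'
    . '</body></html>');
$removed = removeScriptsOutsideHead($dom);
$html = $dom->saveHTML();
```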

Other Checks

Attribute Content

The following are not allowed in any attribute and are removed if they are found:

Redundant Nodes

By W3C specification some nodes, such as title, are only allowed to occur once. The class assumes something fishy is being attempted and removes additional occurrences of those nodes.
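In sketch form, assuming a hypothetical helper that keeps the first occurrence in document order and drops the rest:

```php
<?php
// Hypothetical sketch: keep the first $tagName node, remove all later ones.
// Returns the number of nodes removed.
function removeExtraOccurrences(DOMDocument $dom, $tagName)
{
    $nodes = iterator_to_array($dom->getElementsByTagName($tagName));
    array_shift($nodes); // the first occurrence stays in the document
    foreach ($nodes as $node) {
        $node->parentNode->removeChild($node);
    }
    return count($nodes);
}

$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadXML('<html><head><title>Real</title></head>'
    . '<body><title>Fake</title><p>x</p></body></html>');
$removed = removeExtraOccurrences($dom, 'title');
```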

Head Section

By W3C specification, some elements may only occur as children of the head node, such as the meta element. When those elements are not direct children of the head node, the output filter considers it to be suspicious and they are removed.
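A minimal sketch of this check. The function name and the default tag list (meta and base) are assumptions for the example; the class may check a different set of elements.

```php
<?php
// Hypothetical sketch: remove head-only elements that are not direct
// children of the head node. Returns the number of nodes removed.
function removeMisplacedHeadElements(DOMDocument $dom, array $headOnly = array('meta', 'base'))
{
    $xpath = new DOMXPath($dom);
    $removed = 0;
    foreach ($headOnly as $tag) {
        $misplaced = iterator_to_array($xpath->query('//' . $tag . '[not(parent::head)]'));
        foreach ($misplaced as $node) {
            $node->parentNode->removeChild($node);
            $removed++;
        }
    }
    return $removed;
}

$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadXML('<html><head><meta name="author" content="me"/></head>'
    . '<body><meta http-equiv="refresh" content="0;url=http://evil.example/"/>'
    . '<p>comment text</p></body></html>');
$removed = removeMisplacedHeadElements($dom);
$xml = $dom->saveXML();
```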

Nodes That Should Not Have Children

By W3C specification some elements, such as the hr element, are not allowed to have children. When those nodes are found to have children, the output filter considers it to be suspicious and the child nodes are removed.
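Roughly, and with an assumed (incomplete) list of childless elements and a hypothetical function name:

```php
<?php
// Hypothetical sketch: remove all children of elements that are not allowed
// to have any. The default tag list is an assumption for the example.
function emptyChildlessElements(DOMDocument $dom, array $tags = array('hr', 'br', 'img'))
{
    $removed = 0;
    foreach ($tags as $tag) {
        foreach (iterator_to_array($dom->getElementsByTagName($tag)) as $node) {
            while ($node->firstChild !== null) {
                $node->removeChild($node->firstChild);
                $removed++;
            }
        }
    }
    return $removed;
}

$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadXML('<body><hr><script src="http://evil.example/x.js"/></hr>'
    . '<p>ok</p></body>');
$removed = emptyChildlessElements($dom);
$xml = $dom->saveXML($dom->documentElement);
```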

Live Testing of Filter Class

You can test the current implementation of the CSP filter class here: dom_script_test.php

No intentional import filtering of code entered into the textarea is performed. On submit, the textarea content is loaded by DOMDocument loadHTML(), which does some minimal filtering of its own, but if the code you enter is clean HTML it should be imported without modification, so you can test how the class filters the input. Filtered input is displayed as XML at the bottom of the page after you hit submit, so that you do not have to keep viewing the page HTML source (which can fail in some browsers, because they fetch a new copy of the page via GET to show the source) to see how your input was filtered.

Known Bugs

IDNA needs testing.
Little testing has been done with host names and server paths that use IDNA.
Not Fully Compliant
Need to bring into compliance with https://wiki.mozilla.org/Security/CSP/Spec before 1.0
Implement CSP data keyword
I need to understand its function a bit more.
Host Expression protocol and port
Host names in the host expression need to take protocol and port into consideration if specified.

Needs Checking

The obfus() function needs some heavy duty checking. I initially borrowed some of the regex in it from another source, and found the regex to be improper. The regex I replaced needs to be brutally tested to make sure it does what it is supposed to do, and the regex I did not replace needs thorough testing to make sure it does not need replacing.

I need to make sure I have the scope of the base element correct, which elements it impacts and which elements it does not impact.

For script attributes, I need to make sure I catch all methods for invoking client side scripting. Right now it checks for javascript: vbscript: mocha: but there may be others it needs to check for?

I need to check whether or not namespaced elements are legal in XHTML, right now I only check for namespaced attributes.

I really need to make sure there are no cases where W3C valid input results in invalid output after filtering.

Class Usage

Not a Substitute for Input Filter

You still need to filter your input, for a variety of reasons (e.g. you do not want an XSS vector stored in your database). If your input filtering is good and is configured to properly enforce your policy, the class should never be triggered to modify output generated by user input. The class is a second line of defense for situations where input validation failed to catch something.

If you are developing code and need an input filter, you are probably better off using something like HTML Purifier than trying to write something yourself.

Class Operates on Complete Document

To use the class, you need to fully construct your HTML/XHTML document BEFORE using the class to filter it. Most (all?) popular template systems allow for this with relative ease.

If you mix HTML with PHP and/or use the PHP print() and echo() functions to send output to the browser before your PHP has finished running, you can not use this class. Since HTTP is a stateless protocol, sending data before you have finished building the document IMHO is poor design anyway, but doing so is very common. It will not work with this class though.

With respect to template systems, I am not too familiar with them but it looks like some of them allow the page to be built in chunks and then sent as chunks instead of sent all at once. It would be better to create a buffer and append all your chunks to the buffer before sending them, and then pass the entire buffer through the output class.
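One way to collect chunks into a single buffer is PHP output buffering. The following is only a sketch of the idea; the page content is invented, and where the filter would run is marked with a comment.

```php
<?php
ob_start();                       // capture output instead of sending it
echo '<html><head><title>Demo</title></head><body>';
echo '<p>First chunk</p>';
echo '<p>Second chunk</p>';
echo '</body></html>';
$buffer = ob_get_clean();         // the complete document, nothing sent yet

$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTML($buffer);
// The cspfilter class would be applied to $dom here, before any output.
$output = $dom->saveHTML();
```

Only after filtering would $output be sent to the browser.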

Class requires PHP 5 with XML support

The class requires your document be in the form of a PHP 5 DOMDocument object. If your PHP installation does not support the PHP XML functions, you either need to recompile PHP or install the XML loadable module.

To import an existing static document into a DOMDocument object:

<?php
$dom = new DOMDocument("1.0","utf-8");
$dom->preserveWhiteSpace = false; // optional
$dom->formatOutput = true; // optional but makes for prettier output
$dom->loadHTML($yourhtmlasbuffer); // use loadXML() for well formed XHTML
?>

Now your HTML/XHTML is ready for filtering.

In reality, if you are using PHP and need to filter your content, you are almost certainly working with dynamic content. For more information on using DOMDocument for dynamic content generation, see my docType and other DOMDocument related classes.

Warning: If your document uses multibyte UTF-8 characters, the loadHTML() function may mutilate them. In those cases, you need to make sure your input is valid XHTML and use loadXML() to load existing data. The problem lies in libxml2, which is what the PHP DOMDocument class uses. I do not know of any options to make loadHTML() do the right thing.

Be Careful with Tidy

It may be tempting to pass your completed document through tidy before feeding it to a DOMDocument class for filtering. Be aware that tidy will move some tags into the head node that you do not want there. For example, if a malicious user manages to sneak a meta tag past your input filtering, tidy will then help the malicious user by moving that meta node into the head node where the output filter will merrily allow it.

There may be a tidy option to prevent this, I do not know.

It should be relatively safe to use tidy as part of an import filter, as long as the tidy option show-body-only is specified as true.

Internationalized Domain Names for Applications

This functionality has only been partially tested.

Support for IDN host names is provided by the IDNA Convert class.

If you use IDN host names, you must make sure that your document is UTF-8 before you load it into a DOMDocument object. DOMDocument expects to work with UTF-8 and the idna_convert class expects UTF-8 input.

To use IDNA you need to include the file idna_convert.class.php either in your php code before you call the cspfilter class, or you can uncomment the include near the top of the cspfilter_class.php file.

The cspfilter class will then convert hosts in a policy host expression to punycode (make sure they are UTF-8) and will also convert resource sources to punycode when checking whether or not they violate policy.

IDNA Convert is written by Matthias Sommerfeld and Leonid Kogan. It is licensed under the LGPL 2.1 license.

Using the Class

Initiate the class and specify your options:

<?php
$filter = new cspfilter($dom);
$filter->csp['allow'] = 'none';
$filter->csp['img-src'] = '*.yourdomain.com *.photobucket.com www.w3.org';
$filter->csp['style-src'] = 'self';
?>

When your document is ready to be served, run the filter:

<?php
$filter->processData();
?>

Optionally apply the CSP meta tag or send the CSP header:

<?php
$filter->cspHeader = true; // defaults to false which creates meta tag
                           // instead of header
$filter->makeCSP(); // creates and applies the meta tag or sends the header
?>

Now you are ready to serve the document:

<?php
print $dom->saveHTML(); //for XHTML - use saveXML();
?>

Public Variables

CSP Specific Variables

The class has a public array called csp that has nine indexes. These indexes are identical to the names and settings used by the CSP recommendation and take the same syntax for their setting.

$csp['allow']
Default policy. Set to self to default allow sources that originate from the same domain the page is being served from, set to none to default deny (recommended, and the default if not set), or set to a space delimited list of hosts that resources may be served from (* as leading wild card allowed, e.g. *.example.com).
$csp['img-src']
Override default policy for images. Same syntax as for allow.
$csp['media-src']
Override default policy for (X)HTML 5 media. Same syntax as for allow.
$csp['script-src']
Override default policy for scripts. Same syntax as for allow.
$csp['object-src']
Override default policy for objects (including embed and applet). Same syntax as for allow.
$csp['frame-src']
Override default policy for frames and iframes. Same syntax as for allow.
$csp['frame-ancestors']
Override default policy for frame ancestors. Note that there is no way to server side filter frame-ancestors, frame-ancestor filtering can only be done by client side CSP. Same syntax as for allow.
$csp['style-src']
Override default policy for style sheets. Same syntax as for allow.
$csp['report-uri']
URI that the browser should report policy violations to.
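To make the host expression syntax concrete, here is a rough matcher for the self, none, wildcard and plain host forms. It is a sketch under stated assumptions, not the class's own matching code: the function name is invented, protocol and port are ignored, and *.example.com is assumed to match subdomains only, not example.com itself.

```php
<?php
// Hypothetical matcher: does $host satisfy the space delimited $expression?
// $selfHost is the host the page is served from (used for "self").
function hostMatches($host, $expression, $selfHost)
{
    $host = strtolower($host);
    $parts = preg_split('/\s+/', trim($expression), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($parts as $allowed) {
        $allowed = strtolower($allowed);
        if ($allowed === 'none') {
            return false;                       // default deny
        }
        if ($allowed === 'self') {
            if ($host === strtolower($selfHost)) {
                return true;
            }
            continue;
        }
        if (strncmp($allowed, '*.', 2) === 0) {
            $suffix = substr($allowed, 1);      // "*.example.com" -> ".example.com"
            if (substr($host, -strlen($suffix)) === $suffix) {
                return true;
            }
            continue;
        }
        if ($host === $allowed) {
            return true;
        }
    }
    return false;
}

$policy = '*.yourdomain.com *.photobucket.com www.w3.org';
$self   = 'www.yourdomain.com';
$allowedImage = hostMatches('img.photobucket.com', $policy, $self);
$blockedImage = hostMatches('evil.example', $policy, $self);
```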

NOTE: cspfilter does not make use of the policy-uri directive. Specify your desired policy using the above array variables in your script. If you want to set site wide policy, you can create a class that extends the cspfilter class and define your defaults there. Why? See the FAQ.

Non CSP Variables

$version
The version of the class. Not used by the class. This variable should be considered read-only, but unfortunately there is not (yet, appears to be in CVS) a simple way to declare public class variables as read-only. Yes, I have seen the hacks that do it, but since changing that variable will not break anything they do not interest me. Just understand there is not a need to ever alter it.
$cspHeader
Boolean. Only used by the makeCSP() function. If set to false (default) then makeCSP() creates a CSP meta tag and puts it in the head node of your document. If set to true, makeCSP() sends an HTTP header to notify the client browser of the CSP for the page.
$scriptOnlyInHead
Boolean. If set to true, the script node is only allowed in the document head.
$httphost
String. The fully qualified host name the page is being sent from. The class does detect this from the $_SERVER["HTTP_HOST"] global, but that global can be impacted by headers the client sends, so it should not be trusted. It is better to specify the variable value.
$policyLogFile
String. If specified and the file exists, policy violations will be logged to this file. Even if $csp['report-uri'] is set, the $policyLogFile variable takes precedence so that the class does not need to make an HTTP connection.

Public Functions

cspfilter()
Constructor function that is run whenever a new instance of the class is created. There is no need to ever call this function directly. Requires a DOMDocument object as argument.
processData()
No Arguments. Applies the filtering to the DOMDocument object specified when the class is initiated according to the rules specified by the public class variables.
makeCSP()
No arguments. Either creates a meta node specifying the CSP or sends an HTTP header specifying the CSP.

Extending The Class

One thing you can do is to create an extension to the class that specifies the base policies you want to enforce. You can still override them on a per page basis. Here is an example:
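The following is a sketch only. The subclass name sitefilter and the policy values are made up, and it assumes the parent constructor can be reached via parent::__construct(). The stand-in cspfilter class at the top exists solely so the sketch runs on its own; in real use you would include the actual class file instead.

```php
<?php
// Stand-in so the sketch is self-contained. Its shape is an assumption;
// replace it with an include of the real cspfilter class file.
if (!class_exists('cspfilter')) {
    class cspfilter
    {
        public $csp = array();
        public $scriptOnlyInHead = false;

        public function __construct(DOMDocument $dom)
        {
        }
    }
}

// Hypothetical site wide subclass: every page starts from these defaults.
class sitefilter extends cspfilter
{
    public function __construct(DOMDocument $dom)
    {
        parent::__construct($dom);
        $this->csp['allow']      = 'none';
        $this->csp['img-src']    = 'self';
        $this->csp['style-src']  = 'self';
        $this->scriptOnlyInHead  = true;
    }
}

$dom = new DOMDocument('1.0', 'utf-8');
$filter = new sitefilter($dom);
// Per page override: this page may also show images from photobucket.
$filter->csp['img-src'] .= ' *.photobucket.com';
```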

FAQ

Why do you not support the policy-uri directive?
There are several possible sources for policy. One is the meta tag, one is the policy-uri, and one of course is the $csp array that the class uses.
 
I actually started to write code that merged the different possible sources using set intersection, but to do it right it started to get more and more complex. The more complex something is, the more likely it has bugs and the more difficult it is to maintain, so I opted for a KISS philosophy. Ignore any directives set in a policy file or in a meta tag of the DOM.
 
Since the default policy for the class is none, ignoring those other methods is safe to do. Also, the headers / meta tag are not supposed to use the policy-uri unless that is the only directive sent, but if a user wants to use a site wide policy, the user can just extend the class and define the site wide policy there via the $csp variables.
When will the class support International Host Names?
I think it does now, but it needs more testing.
Do you have a bugzilla?
No. Do not really want one. Just e-mail me bug reports.
Why do I have to have the page fully constructed before sending to use your class?
Because the class is an output filter that operates on a DOMDocument. You can cheat and alter the DOMDocument after running the filter, but then any content after running the filter is not filtered.
Are you going to write a CSP input filter?
From scratch? No. At some point I may try to extend the HTML Purifier class to add some CSP checks on input, but if (and I stress the if) I ever do, it probably will not be for a while.