firefox

How the browser submits a file, differences between x-www-form-urlencoded and form-data

Even though the task of uploading a file to a server might look simple at first, it actually involves several complex steps.
In the text below I’ll try to give a brief overview of the steps involved and some questions that I gathered during the examination process.

Enctypes

The HTML form element has several attributes; one of them is enctype.

Enctypes as defined in the HTML 4.01 Specification

“The enctype attribute of the FORM element specifies the content type used to encode the form data set for submission to the server”

The possible enctypes are:

  • application/x-www-form-urlencoded: The default value if the attribute is not specified.
  • multipart/form-data: Use this value if you are using an <input> element with the type attribute set to “file”.
  • text/plain (HTML5)

1. application/x-www-form-urlencoded

When the form’s method is set to POST, there are two main options available for data encoding.
The default one is x-www-form-urlencoded, which is built on percent-encoding.
The encoding mechanism is pretty straightforward.
A set of characters is deemed reserved, and when they need to be used as data they are encoded as a percent sign followed by their hexadecimal value.

The list of reserved chars is:

! * ' ( ) ; : @ & = + $ , / ? # [ ]

The unreserved chars are:

A    B    C    D    E    F    G    H    I    J    K    L    M    N    O    P    Q    R    S    T    U    V    W    X    Y    Z
a    b    c    d    e    f    g    h    i    j    k    l    m    n    o    p    q    r    s    t    u    v    w    x    y    z
0    1    2    3    4    5    6    7    8    9    -    _    .    ~

So what would + look like after encoding?

We know that the decimal value for +, according to the ASCII table, is 43.
The binary representation of 43 is: 0010 1011
If we take the decimal value of each nibble we end up with:
0010 => 2
1011 => 11, which is B in hexadecimal

So we can represent the plus sign in four different ways:

  • Char => +
  • Base10 => 43
  • Base16 => 2B
  • Url Encoded => %2B
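The nibble-by-nibble conversion above can be checked with a few lines of JavaScript. This is only a minimal sketch: percentEncodeChar is a hypothetical helper name, and it handles single-byte ASCII characters only. The built-in encodeURIComponent is called next to it for comparison.

```javascript
// Hypothetical helper: percent-encode one ASCII character by hand,
// following the same steps as the text (char -> decimal -> nibbles -> hex).
function percentEncodeChar(ch) {
  const HEX = "0123456789ABCDEF";
  const code = ch.charCodeAt(0); // "+" -> 43
  const high = code >> 4;        // upper nibble: 0010 -> 2
  const low = code & 0x0f;       // lower nibble: 1011 -> 11
  return "%" + HEX[high] + HEX[low];
}

const manual = percentEncodeChar("+");
const builtIn = encodeURIComponent("+"); // the browser's own encoder

console.log(manual, builtIn);
```

Both calls should agree on the encoded form of the plus sign.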

[Image: dec-hex-bin]

How would the browser encode a simple form like this using URL encoding?

 
 <form id="simpleForm" action="/" method="POST" name="simpleForm" enctype="application/x-www-form-urlencoded">
   <input id="simpleInput" type="text" name="simpleInput" />
   <input id="simpleSubmit" type="submit" name="simpleSubmit" value="Submit Simple Form" />
 </form>

As you can see in the image below, it appends all the elements of the form, using the name attribute as the key.
In the end, the request body looks something like this (assuming “simple form” was typed into the text input):

simpleInput=simple+form&simpleSubmit=Submit+Simple+Form

[Image: form-url-encode]
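To see the same serialization without a browser form, the standard URLSearchParams class can be used. This is a small sketch rather than what Firefox runs internally; the field names mirror the form above and the typed text is an assumption.

```javascript
// Field names mirror the form above; the typed value is assumed.
const params = new URLSearchParams();
params.append("simpleInput", "simple form");
params.append("simpleSubmit", "Submit Simple Form");

// toString() serializes as application/x-www-form-urlencoded:
// pairs are joined by "&", name and value by "=", and spaces become "+".
const body = params.toString();

// A literal plus sign in a value has to be escaped so that it is not
// mistaken for a space on the receiving side.
const withPlus = new URLSearchParams({ sum: "1+1" }).toString();

console.log(body);
console.log(withPlus);
```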

If you are curious and want to see how Firefox does that, you can check the source code implementation.

Most likely, this is the part where the query string gets constructed:

   
 nsresult
 nsFSURLEncoded::AddNameValuePair(const nsAString& aName,
                                  const nsAString& aValue)
 {
   // Encode value
   nsCString convValue;
   nsresult rv = URLEncode(aValue, convValue);
   NS_ENSURE_SUCCESS(rv, rv);

   // Encode name
   nsAutoCString convName;
   rv = URLEncode(aName, convName);
   NS_ENSURE_SUCCESS(rv, rv);

   // Append data to string
   if (mQueryString.IsEmpty()) {
     mQueryString += convName + NS_LITERAL_CSTRING("=") + convValue;
   } else {
     mQueryString += NS_LITERAL_CSTRING("&") + convName
                   + NS_LITERAL_CSTRING("=") + convValue;
   }

   return NS_OK;
 }

Now assuming that the browser:

  • Went through all the elements of the form
  • Parsed them
  • Created the HTTP request
  • Sent the request

The server somehow needs to be able to interpret the request and parse the data.
If the form enctype is set to x-www-form-urlencoded, the server knows which format to expect, so it is able to parse the data and do something useful with it.

So whether you’re running Node.js, PHP, .NET, Ruby, or any other server-side technology, it implements a parser that goes through the HTTP request body and creates a key/value data structure holding all the data contained in the form.
That’s why form elements are required to have a name: the names become the keys of the data structure created on the server, each mapped to the value of its element.
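A server-side parser for this format can be sketched in a few lines. The helper name parseUrlEncodedBody is hypothetical, and real frameworks do more (character sets, size limits, repeated names); the point is only to show the body turning into a key/value structure.

```javascript
// Hypothetical helper: turn a urlencoded request body into a plain object.
function parseUrlEncodedBody(body) {
  const fields = {};
  // URLSearchParams splits on "&" and "=", turns "+" back into a space
  // and decodes %XX sequences.
  for (const [name, value] of new URLSearchParams(body)) {
    fields[name] = value; // a repeated name would overwrite the earlier value
  }
  return fields;
}

const fields = parseUrlEncodedBody(
  "simpleInput=simple+form&simpleSubmit=Submit+Simple+Form"
);

console.log(fields);
```

The keys of the resulting object are exactly the name attributes of the form elements.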

2. multipart/form-data

Now, with that being said, let’s think about the multipart/form-data encoding type.
If the browser is trying to send a file, whatever type it may be, does it make sense to encode the whole file using percent-encoding and then append it to a string containing all the other form elements, like x-www-form-urlencoded does? I would say no.

So how does multipart/form-data encode the form elements?
The definition in RFC 2388 summarizes it pretty well:

Definition of multipart/form-data

The media-type multipart/form-data follows the rules of all multipart
MIME data streams as outlined in [RFC 2046].  In forms, there are a
series of fields to be supplied by the user who fills out the form.
Each field has a name. Within a given form, the names are unique.

“multipart/form-data” contains a series of parts. Each part is
expected to contain a content-disposition header [RFC 2183] where the
disposition type is “form-data”, and where the disposition contains
an (additional) parameter of “name”, where the value of that
parameter is the original field name in the form. For example, a part
might contain a header:

Content-Disposition: form-data; name="user"

with the value corresponding to the entry of the “user” field.

Field names originally in non-ASCII character sets may be encoded
within the value of the “name” parameter using the standard method
described in RFC 2047.

As with all multipart MIME types, each part has an optional
“Content-Type”, which defaults to text/plain.  If the contents of a
file are returned via filling out a form, then the file input is
identified as the appropriate media type, if known, or
“application/octet-stream”.  If multiple files are to be returned as
the result of a single form entry, they should be represented as a
“multipart/mixed” part embedded within the “multipart/form-data”.
Each part may be encoded and the “content-transfer-encoding” header
supplied if the value of that part does not conform to the default
encoding.

Basically, each part can carry its own content type, which allows an image to be sent in the same request as a bunch of text and still provide enough information for the server to parse all the data.

So if we have a form like this:

 
 <form id="simpleForm" action="/" method="POST" name="uploadForm" enctype="multipart/form-data">  
  <input id="simpleInput" type="text" name="simpleInput" />  
  <input id="fileUpload" type="file" name="fileUpload" />  
  <input id="simpleSubmit" type="submit" name="simpleSubmit" value="Submit Simple Form" />
  </form>

The request body would look something like this:

------WebKitFormBoundary0K1fvU2Vy3qpT4ua
Content-Disposition: form-data; name="fileUpload"; filename=""
Content-Type: application/octet-stream

------WebKitFormBoundary0K1fvU2Vy3qpT4ua
Content-Disposition: form-data; name="textInput"

------WebKitFormBoundary0K1fvU2Vy3qpT4ua
Content-Disposition: form-data; name="submitBtn"

------WebKitFormBoundary0K1fvU2Vy3qpT4ua--

[Image: form-multi-data]
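To make the layout of such a body more concrete, here is a minimal sketch that assembles one by hand for text-only fields. buildMultipartBody and the boundary string are made up for illustration; a real browser picks its own boundary, and a file part would also carry a filename parameter and a Content-Type header.

```javascript
// Hypothetical helper: build a multipart/form-data body for text-only parts.
function buildMultipartBody(boundary, fields) {
  const CRLF = "\r\n";
  let body = "";
  for (const [name, value] of Object.entries(fields)) {
    body += "--" + boundary + CRLF; // each part starts with "--" + boundary
    body += 'Content-Disposition: form-data; name="' + name + '"' + CRLF;
    body += CRLF;                   // blank line separates headers from value
    body += value + CRLF;
  }
  body += "--" + boundary + "--" + CRLF; // closing delimiter ends with "--"
  return body;
}

const boundary = "----ExampleBoundary123"; // made-up boundary string
const multipartBody = buildMultipartBody(boundary, {
  simpleInput: "simple form",
  simpleSubmit: "Submit Simple Form",
});

console.log(multipartBody);
```

The same boundary string is also sent in the request's Content-Type header (multipart/form-data; boundary=...), which is how the server knows where one part ends and the next begins.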

Conclusion

From time to time I find myself needing to create file upload plugins, and when the time comes I always find myself googling how that can be accomplished.
I never stopped and asked myself: wait a minute, how does this actually work? Why are there two different encoding types? Why do I need to change the encoding type when sending files to the server?

I simply took that as a given and moved on.
However, even though we need to leverage what has already been created and try not to reinvent the wheel, having a basic knowledge of how the tools we use on a daily basis function might not be a bad idea.
We don’t have to learn them to the point where we can write one from scratch, but we should learn them to the point where we can make smart decisions about when and how to use them.

Firefox Bug 784402, Pointer Lock must respect iframe sandbox flag

Recently I’ve worked on the Firefox Bug 784402 – Pointer Lock must respect iframe sandbox flag.

This is a quick overview of what had to be done on the bug.

Sandbox flags

First let’s check what the sandbox attribute does.
A quote from the W3C spec:

The sandbox attribute, when specified, enables a set of extra restrictions on any content hosted by the iframe. Its value must be an unordered set of unique space-separated tokens that are ASCII case-insensitive. The allowed values are allow-forms, allow-popups, allow-same-origin, allow-scripts, and allow-top-navigation. When the attribute is set, the content is treated as being from a unique origin, forms and scripts are disabled, links are prevented from targeting other browsing contexts, and plugins are secured. The allow-same-origin keyword allows the content to be treated as being from the same origin instead of forcing it into a unique origin, the allow-top-navigation keyword allows the content to navigate its top-level browsing context, and the allow-forms, allow-popups and allow-scripts keywords re-enable forms, popups, and scripts respectively.

With pointerlock landing in Firefox 15, it was decided that a new sandbox flag should be created to restrict pointerlock usage by scripts embedded in a page. For example, if you add an advertisement script to your page, you don’t want to give the advertisement permission to lock the pointer to itself.
To manage that, the allow-pointer-lock sandbox flag was created.

An overview of how the sandbox flags work:
List of flags:

 
 /**  
 * This flag prevents content from navigating browsing contexts other than  
 * the sandboxed browsing context itself (or browsing contexts further  
 * nested inside it), and the top-level browsing context.  
 */  
 const unsigned long SANDBOXED_NAVIGATION = 0x1;

/**  
 * This flag prevents content from navigating their top-level browsing  
 * context.  
 */  
 const unsigned long SANDBOXED_TOPLEVEL_NAVIGATION = 0x2;

/**  
 * This flag prevents content from instantiating plugins, whether using the  
 * embed element, the object element, the applet element, or through  
 * navigation of a nested browsing context, unless those plugins can be  
 * secured.  
 */  
 const unsigned long SANDBOXED_PLUGINS = 0x4;

/**  
 * This flag forces content into a unique origin, thus preventing it from  
 * accessing other content from the same origin.  
 * This flag also prevents script from reading from or writing to the  
 * document.cookie IDL attribute, and blocks access to localStorage.  
 */  
 const unsigned long SANDBOXED_ORIGIN = 0x8;

/**  
 * This flag blocks form submission.  
 */  
 const unsigned long SANDBOXED_FORMS = 0x10;

/**  
 * This flag blocks script execution.  
 */  
 const unsigned long SANDBOXED_SCRIPTS = 0x20;

/**  
 * This flag blocks features that trigger automatically, such as  
 * automatically playing a video or automatically focusing a form control.  
 */  
 const unsigned long SANDBOXED_AUTOMATIC_FEATURES = 0x40;

/**  
 * This flag blocks the document from acquiring pointerlock.  
 */  
 const unsigned long SANDBOXED_POINTER_LOCK = 0x80;  

Parsing the flags

So we have a 32-bit integer to store the sandbox flags.

Breaking down the integer, we have 4 bytes.
We can represent each byte with two hexadecimal digits, so the whole integer takes 8 hexadecimal digits.
The number 0xFFFFFFFF therefore has all the bits turned ON.

Knowing that, we can use each bit of the integer to represent a flag.
We don’t care about the decimal value of that integer, since we are using it to store flags and not values.
So by saying 0x1, we are turning on the first bit of the first hexadecimal digit, and 0x2 turns on the second bit of the first hexadecimal digit.
0x10, on the other hand, turns on the first bit of the second hexadecimal digit.
Remember that we are using hexadecimal notation, where each digit covers 4 bits.

So in the end, each flag turns on a different bit of the integer.

Later we’ll be able to check whether that specific bit is ON or OFF and determine the status of the flag.
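The idea of one bit per flag can be tried out directly in JavaScript. The two constants below reuse values from the list above; everything else is just a sketch of setting and testing bits.

```javascript
const SANDBOXED_FORMS = 0x10;        // binary 0001 0000
const SANDBOXED_POINTER_LOCK = 0x80; // binary 1000 0000

let flags = 0;                   // all bits OFF
flags |= SANDBOXED_FORMS;        // bitwise OR turns one bit ON
flags |= SANDBOXED_POINTER_LOCK; // and another one

// Bitwise AND isolates a single bit so it can be tested.
const formsBlocked = (flags & SANDBOXED_FORMS) !== 0;
const scriptsBlocked = (flags & 0x20) !== 0; // SANDBOXED_SCRIPTS was never set

const binaryView = flags.toString(2); // the stored bits as a string

console.log(binaryView, formsBlocked, scriptsBlocked);
```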

One thing to keep in mind is that if the iframe doesn’t have the sandbox attribute, then all the flags are turned OFF by default.

 
 <iframe></iframe>  

If the iframe has an empty sandbox attribute, then all the flags are ON by default

 
 <iframe sandbox=""></iframe>  

To turn the flags off, you can specify the feature you want to enable in the sandbox attribute:

   
 <iframe sandbox="allow-pointer-lock allow-same-origin"></iframe>  

In the snippet above, the flags matching allow-pointer-lock and allow-same-origin (SANDBOXED_POINTER_LOCK and SANDBOXED_ORIGIN) would be turned OFF, and all the other flags would be ON.

This is the code that parses the sandbox flags:

   
 /**  
 * A helper function that parses a sandbox attribute (of an <iframe> or
 * a CSP directive) and converts it to the set of flags used internally.
 *
 * @param aAttribute the value of the sandbox attribute
 * @return the set of flags
 */
uint32_t
nsContentUtils::ParseSandboxAttributeToFlags(const nsAString & aSandboxAttrValue) {
  // If there’s a sandbox attribute at all (and there is if this is being  
  // called), start off by setting all the restriction flags.  
  uint32_t out = SANDBOXED_NAVIGATION |
    SANDBOXED_TOPLEVEL_NAVIGATION |
    SANDBOXED_PLUGINS |
    SANDBOXED_ORIGIN |
    SANDBOXED_FORMS |
    SANDBOXED_SCRIPTS |
    SANDBOXED_AUTOMATIC_FEATURES |
    SANDBOXED_POINTER_LOCK;

  if (!aSandboxAttrValue.IsEmpty()) {
    // The separator optional flag is used because the HTML5 spec says any  
    // whitespace is ok as a separator, which is what this does.  
    HTMLSplitOnSpacesTokenizer tokenizer(aSandboxAttrValue, ' ',
      nsCharSeparatedTokenizerTemplate<nsContentUtils::IsHTMLWhitespace>::SEPARATOR_OPTIONAL);

    while (tokenizer.hasMoreTokens()) {
      nsDependentSubstring token = tokenizer.nextToken();
      if (token.LowerCaseEqualsLiteral("allow-same-origin")) {
        out &= ~SANDBOXED_ORIGIN;
      } else if (token.LowerCaseEqualsLiteral("allow-forms")) {
        out &= ~SANDBOXED_FORMS;
      } else if (token.LowerCaseEqualsLiteral("allow-scripts")) {
        // allow-scripts removes both SANDBOXED_SCRIPTS and  
        // SANDBOXED_AUTOMATIC_FEATURES.  
        out &= ~SANDBOXED_SCRIPTS;
        out &= ~SANDBOXED_AUTOMATIC_FEATURES;
      } else if (token.LowerCaseEqualsLiteral("allow-top-navigation")) {
        out &= ~SANDBOXED_TOPLEVEL_NAVIGATION;
      } else if (token.LowerCaseEqualsLiteral("allow-pointer-lock")) {
        out &= ~SANDBOXED_POINTER_LOCK;
      }
    }
  }

  return out;
}

First, all the flags are turned ON.
Then it checks whether the sandbox attribute has any values. If it does, it splits them and compares each token against the known keywords.
Once it finds a match, it applies a bitwise NOT to the flag and a bitwise AND of the result with the integer that holds all the flags.
The effect is that the flag being parsed is turned OFF while every other bit is left unchanged.

In the end, the integer with the status of all the flags is returned.
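The same logic can be sketched in JavaScript to watch the bits change. This is a loose port for illustration only, not Gecko code: parseSandboxAttribute and the Map are hypothetical names, and the flag values are copied from the list earlier in the post.

```javascript
const SANDBOXED_TOPLEVEL_NAVIGATION = 0x2;
const SANDBOXED_ORIGIN = 0x8;
const SANDBOXED_FORMS = 0x10;
const SANDBOXED_SCRIPTS = 0x20;
const SANDBOXED_AUTOMATIC_FEATURES = 0x40;
const SANDBOXED_POINTER_LOCK = 0x80;

// Which restriction bits each keyword clears.
const FLAGS_FOR_TOKEN = new Map([
  ["allow-same-origin", SANDBOXED_ORIGIN],
  ["allow-forms", SANDBOXED_FORMS],
  ["allow-scripts", SANDBOXED_SCRIPTS | SANDBOXED_AUTOMATIC_FEATURES],
  ["allow-top-navigation", SANDBOXED_TOPLEVEL_NAVIGATION],
  ["allow-pointer-lock", SANDBOXED_POINTER_LOCK],
]);

// Hypothetical port of the parsing routine.
function parseSandboxAttribute(value) {
  let out = 0xff; // start with all eight restriction flags ON
  // Split on HTML whitespace: space, tab, line feed, form feed, carriage return.
  for (const token of value.toLowerCase().split(/[ \t\n\f\r]+/)) {
    const flag = FLAGS_FOR_TOKEN.get(token);
    if (flag !== undefined) {
      out &= ~flag; // NOT inverts the flag, AND clears just those bits
    }
  }
  return out;
}

const emptySandbox = parseSandboxAttribute("");
const relaxed = parseSandboxAttribute("allow-pointer-lock allow-same-origin");

console.log(emptySandbox.toString(16), relaxed.toString(16));
```

Unknown tokens are simply ignored, which matches the if/else chain above falling through without a match.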

Locking the pointer

Now let’s take a look at the code that checks for the allow-pointer-lock flag when an element requests pointerlock:

 
 bool
 nsDocument::ShouldLockPointer(Element* aElement)
 {
   // Check if pointer lock pref is enabled
   if (!Preferences::GetBool("full-screen-api.pointer-lock.enabled")) {
     NS_WARNING("ShouldLockPointer(): Pointer Lock pref not enabled");
     return false;
   }

   if (aElement != GetFullScreenElement()) {
     NS_WARNING("ShouldLockPointer(): Element not in fullscreen");
     return false;
   }

   if (!aElement->IsInDoc()) {
     NS_WARNING("ShouldLockPointer(): Element without Document");
     return false;
   }

   if (mSandboxFlags & SANDBOXED_POINTER_LOCK) {
     NS_WARNING("ShouldLockPointer(): Document is sandboxed and doesn't allow pointer-lock");
     return false;
   }

   // Check if the element is in a document with a docshell.
   nsCOMPtr<nsIDocument> ownerDoc = aElement->OwnerDoc();
   if (!ownerDoc) {
     return false;
   }
   if (!nsCOMPtr<nsISupports>(ownerDoc->GetContainer())) {
     return false;
   }
   nsCOMPtr<nsPIDOMWindow> ownerWindow = ownerDoc->GetWindow();
   if (!ownerWindow) {
     return false;
   }
   nsCOMPtr<nsPIDOMWindow> ownerInnerWindow = ownerDoc->GetInnerWindow();
   if (!ownerInnerWindow) {
     return false;
   }
   if (ownerWindow->GetCurrentInnerWindow() != ownerInnerWindow) {
     return false;
   }

   return true;
 }

The ShouldLockPointer method is called every time an element requests pointerlock. It does some sanity checks and makes sure everything is correct.
To check for the allow-pointer-lock sandbox flag, a bitwise AND of mSandboxFlags and the SANDBOXED_POINTER_LOCK constant is performed. We’ve looked at the SANDBOXED_POINTER_LOCK flag before; it has the value 0x80.
So if pointerlock is allowed, mSandboxFlags has the SANDBOXED_POINTER_LOCK flag OFF, the bitwise AND evaluates to 0 (false), and the check passes.
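The sandbox part of that check, taken in isolation, comes down to a single bitwise AND. The function name below is a hypothetical stand-in, and the three sample values are assumptions about what the flags would hold for each kind of iframe.

```javascript
const SANDBOXED_POINTER_LOCK = 0x80;

// Hypothetical stand-in for the sandbox test inside ShouldLockPointer.
function sandboxAllowsPointerLock(sandboxFlags) {
  // If bit 0x80 is ON the AND is non-zero and the lock must be refused.
  return (sandboxFlags & SANDBOXED_POINTER_LOCK) === 0;
}

const fullySandboxed = sandboxAllowsPointerLock(0xff);   // sandbox=""
const lockAllowed = sandboxAllowsPointerLock(0x7f);      // sandbox="allow-pointer-lock"
const notSandboxed = sandboxAllowsPointerLock(0x00);     // no sandbox attribute

console.log(fullySandboxed, lockAllowed, notSandboxed);
```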

A big thanks to Ian Melven.
Ian is the one who implemented the sandbox attribute on Firefox and gave me some guidance on the PointerLock sandbox attribute bug.

Bug 735031 - Fullscreen API implementation assumes an HTML Element

Bug 735031 was to update the Firefox fullscreen implementation to allow SVG elements to receive fullscreen mode.

An overview of the relationship between DOM Elements

This is not a complete diagram; there are a bunch more elements inheriting from nsIDOMHTML/SVG/XULElement. However, it gives a nice visual representation showing that not all DOMElements are HTMLElements.

Problem

Only HTML Elements were allowed to receive fullscreen mode.
SVG Elements didn’t know about mozRequestFullScreen, since the implementation was done only for HTML Elements.

Calling mozRequestFullScreen on an SVG element would give this error:

TypeError: svgElement.mozRequestFullScreen is not a function

The IDL declaration for mozRequestFullScreen was in:

dom/interfaces/html/nsIDOMHTMLElement.idl

And MozRequestFullScreen was implemented in:

content/html/content/src/nsGenericHTMLElement.cpp

Solution

The solution was to move the declaration of mozRequestFullScreen to:

dom/interfaces/core/nsIDOMElement.idl

And the definition to:

content/base/src/nsGenericElement.cpp

Now both HTML and SVG elements can request fullscreen mode.
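As a rough analogy in plain JavaScript, the fix amounts to moving a method from one subclass up to the shared base class. The class names below are hypothetical and are not the real DOM interfaces; they only mimic the shape of the hierarchy.

```javascript
// Before: only the HTML branch of the hierarchy knows the method.
class OldElement {}
class OldHtmlElement extends OldElement {
  requestFullScreen() { return true; }
}
class OldSvgElement extends OldElement {}

// After: the method lives on the base class, so every branch inherits it.
class NewElement {
  requestFullScreen() { return true; }
}
class NewHtmlElement extends NewElement {}
class NewSvgElement extends NewElement {}

const before = typeof new OldSvgElement().requestFullScreen;
const after = typeof new NewSvgElement().requestFullScreen;

console.log(before, after);
```

Calling the missing method on the old SVG class would raise the same kind of "is not a function" TypeError shown earlier.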


Notes

Since this fix had to change some IDLs, their UUIDs had to be updated. However, in this case, because the base IDL for all DOM elements was changed, the UUIDs for all the IDLs inheriting from nsIDOMElement had to be updated as well. The problem is that there are around 150 IDLs inheriting from nsIDOMElement, and updating each one by hand would have been crazy!
Luckily, somebody must have faced this problem before and created a script to update the UUID of an IDL and all its children.

The script is called update-uuids. To run it:

update-uuids . nsIDOMElement nsIDOMDocument

The output:

nsIDOMElement because it was given on command line
f561753a-1d4f-40c1-b147-ea955fc6fd94 -> a652db92-f8d4-47e0-bf8f-1ad72e6c083f
nsIDOMDocument because it was given on command line
d7cdd08e-1bfd-4bc3-9742-d66586781ee2 -> ff3125e0-b1b5-467f-84ad-1d1eeafed595
nsIDOMHTMLElement because it inherits from nsIDOMElement
3de9f8c1-5d76-4d2e-b6b9-334c6eb0c113 -> 5b703ce7-e551-41fa-b465-ff94aa3bdc66
nsIDOMXULElement because it inherits from nsIDOMElement
5e0a7c2c-fdb6-459d-a67b-549181218c31 -> 42e74ec0-75c7-422c-b564-f853e3cbbb8b
nsIDOMSVGElement because it inherits from nsIDOMElement
dbb1b49c-dce5-43fe-97ea-e249b5620aa2 -> d2900917-e0ce-4eb8-aaf9-7e021d45472a
nsIDOMXMLDocument because it inherits from nsIDOMDocument
b53a4bab-0065-468b-810a-4c4659a04f00 -> b76ca016-46e8-4ee2-be3d-5b08b29afb72
...

Updated ./dom/interfaces/svg/nsIDOMSVGLineElement.idl with 1 changes
Updated ./dom/interfaces/svg/nsIDOMSVGStopElement.idl with 1 changes
Updated ./dom/interfaces/svg/nsIDOMSVGGElement.idl with 1 changes
Updated ./dom/interfaces/svg/nsIDOMSVGPatternElement.idl with 1 changes
Updated ./dom/interfaces/svg/nsIDOMSVGForeignObjectElem.idl with 1 changes
...

Originals are in *.idlbak

PointerLock API Updates

A quick update on the Firefox PointerLock API implementation

Let’s start with mochitests. While writing mochitests for pointerlock we stumbled on two problems:

  1. Not being able to specify how many tests should run (different platforms were running different number of tests)
  2. Mochitest iframe not allowed to go fullscreen, making us run all the tests on a different window

David Humphrey came up with a solution for our first problem and added an “expect” functionality to the mochitest framework.
So now we can specify how many tests should occur when making asynchronous tests, for example:
SimpleTest.waitForExplicitFinish(3)
Bug 724578

For our second problem, I added the attribute mozallowfullscreen=true to the mochitest iframe that runs all the tests.
I’m not sure whether there was a specific reason for not allowing fullscreen on the mochitest iframe, but if there wasn’t, this will greatly simplify writing tests for pointerlock.
Bug 728893

Spec Updates

The spec had two major changes

  1. Switching from callbacks to events
  2. Moving functionality to the Document and Element

For example:
Every time the pointer is locked/unlocked, a mozpointerlockchange event is dispatched to the document.
A mozpointerlockerror event is dispatched if there are any errors while locking the pointer.
Now it’s possible to access the element with the pointer locked via the document.

  
 var div = document.createElement("div");

 document.addEventListener("mozpointerlockchange", function (e) {
   if (document.mozPointerLockElement === div) {
     // Pointer is locked
   }
 }, false);

 document.addEventListener("mozpointerlockerror", function (e) {
   // Dispatched when locking the pointer fails
 }, false);

 document.addEventListener("mozfullscreenchange", function (e) {
   if (document.mozFullScreen &&
       document.mozFullScreenElement === div) {
     div.mozRequestPointerLock();
   }
 }, false);

 div.mozRequestFullScreen();

Instead of something like this:

 
 var div = document.createElement("div");

 div.addEventListener("mozpointerlocklost", function (e) {
   // Dispatched when pointer is unlocked
 }, false);

 document.addEventListener("mozfullscreenchange", function (e) {
   if (document.mozFullScreen &&
       document.mozFullScreenElement === div) {
     navigator.mozPointer.lock(
       div, // Element
       function () {
         // Success callback
       },
       function () {
         // Failure callback
       }
     );
   }
 }, false);

 div.mozRequestFullScreen();

Updating PointerLock API - Callbacks, Events and Threads

The PointerLock implementation in Firefox is going great. We are close to having the patch ready to land, maybe in Firefox 13.

The work being done now is mainly some final touches, especially on the mochitests and on the API.

Recently the W3C PointerLock spec has been updated, the changes are the following:

  • When locking the mouse, dispatch pointerlockchange/pointerlockerror events instead of firing callbacks
  • Locking the pointer by requesting pointer lock on the target element
  • Adding a reference to the locked element in the Document
  • Exiting pointerlock by calling exitPointerLock on the Document

Those were significant changes, since they affected a big chunk of the code we had already implemented. However, I believe these updates to the API are beneficial, since with them developers will have an API similar to the fullscreen one to work with.

The first bit I started working on was dispatching the pointerlockchange/pointerlockerror events instead of firing callbacks.

To dispatch the events, the nsAsyncDOMEvent object was used:

   
 static void
 DispatchPointerLockChange(nsINode* aTarget)
 {
   nsRefPtr<nsAsyncDOMEvent> e =
     new nsAsyncDOMEvent(aTarget,
                         NS_LITERAL_STRING("mozpointerlockchange"),
                         true,
                         false);
   e->PostDOMEvent();
 }

The same logic is used to dispatch the pointerlockerror and pointerlocklost events.
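The effect of posting an event instead of firing it inline can be imitated in JavaScript. This is only an analogy and not how Gecko is written: requestLock is a hypothetical function, and setTimeout stands in for queuing the event so the caller returns before any listener runs.

```javascript
const target = new EventTarget();
const log = [];

target.addEventListener("pointerlockchange", () => {
  log.push("event handled");
});

// Hypothetical request function: queue the event, then return right away.
function requestLock(eventTarget) {
  setTimeout(() => {
    eventTarget.dispatchEvent(new Event("pointerlockchange"));
  }, 0);
  log.push("request returned");
}

requestLock(target);

// Snapshot taken before the queued event has had a chance to run.
const logRightAfterCall = log.slice();

// Print the full log once the queued event has been delivered.
setTimeout(() => console.log(log), 10);
```

The snapshot only contains the "request returned" entry, which is the point of dispatching asynchronously: the caller is never blocked by the listeners.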

One of the good things about having to go back and rewrite some code is that it opens the possibility to analyse some of the decisions made before.
Specifically in this case, the use of different threads when locking the pointer.
At first, the callbacks were fired on a different thread so the execution wouldn’t hang, and the Lock method could return as soon as possible without making the user wait for a result.

Before, the logic for callbacks was mainly based on the nsGeoLocation implementation. However, now that the pointerlock API looks more like the fullscreen API, I went and looked at how it handles setting the element into fullscreen.
I had written a blog post a while back inspecting the fullscreen API, so even with the API receiving some changes it was easy to locate the code path for requesting fullscreen on an element.

Here is a simple diagram I drew

The diagram shows that once mozRequestFullScreen is called on an element, the method returns really fast and all the heavy processing happens on a separate thread.

On the other hand, this is how PointerLock does it:

On PointerLock, unlike FullScreen, the heavy processing happens on the main thread, and the new thread only handles the callback firing. Now that we are switching to events, even less processing happens on the new thread, which made me rethink the logic for locking the pointer.

I remember hearing that all the code that involves changing the presentation needs to happen on the main thread, so maybe that’s why we’re not spinning the pointerlock check/validation off to another thread, since it involves changing the UI presentation by hiding the pointer if the lock is successful.

Another thing that caught my attention was the fact that in the fullscreen code the nsCallRequestFullScreen object was dispatched to a new thread using NS_DispatchToCurrentThread, while on PointerLock we are using NS_DispatchToMainThread.


NS_DispatchToCurrentThread
NS_GetCurrentThread
nsThreadManager::GetCurrentThread

NS_DispatchToMainThread
mMainThread