Even though the task of uploading a file to a server might look simple at first, it actually involves several complex steps.
In the text below I’ll try to give a brief overview of the steps involved and some questions that I gathered during the examination process.

Enctypes

The HTML form element has several attributes; one of them is enctype.

Enctypes as defined in the HTML 4.01 Specification

“The enctype attribute of the FORM element specifies the content type used to encode the form data set for submission to the server”

The possible enctypes are:

  • application/x-www-form-urlencoded: The default value if the attribute is not specified.
  • multipart/form-data: Use this value if you are using an <input> element with the type attribute set to “file”.
  • text/plain (HTML5)

1. application/x-www-form-urlencoded

When the form’s method is set to POST, HTML 4.01 offers two options for encoding the data.
The default one is application/x-www-form-urlencoded, which is based on percent-encoding.
The encoding mechanism is pretty straightforward.
A set of characters is deemed reserved, and if they need to be used as data they are encoded as their hexadecimal byte value prefixed by a percent sign.
In form submissions, a space is additionally replaced by a plus sign.

The list of reserved chars is:

! * ' ( ) ; : @ & = + $ , / ? # [ ]

The unreserved chars are:

A    B    C    D    E    F    G    H    I    J    K    L    M    N    O    P    Q    R    S    T    U    V    W    X    Y    Z
a    b    c    d    e    f    g    h    i    j    k    l    m    n    o    p    q    r    s    t    u    v    w    x    y    z
0    1    2    3    4    5    6    7    8    9    -    _    .    ~
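To make the rule concrete, here is a minimal sketch of a percent-encoder in Python. The function name percent_encode is my own, and it implements plain percent-encoding only: it does not apply the extra space-to-plus rule that form submissions use.

```python
# Characters listed above as unreserved; they are never escaped.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_.~"
)


def percent_encode(text: str) -> str:
    """Escape every byte that is not an unreserved ASCII character."""
    out = []
    for byte in text.encode("utf-8"):
        char = chr(byte)
        if char in UNRESERVED:
            out.append(char)
        else:
            # Two uppercase hex digits prefixed by a percent sign.
            out.append(f"%{byte:02X}")
    return "".join(out)


print(percent_encode("a+b&c=d"))  # a%2Bb%26c%3Dd
```

In real code you would most likely reach for urllib.parse.quote from the standard library instead of rolling your own.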

So what would + look like after encoding?

We know from the ASCII table that the decimal value of + is 43.
The binary representation of 43 is: 0010 1011
If we take the decimal value of each nibble we end up with:
0010 => 2
1011 => 11 (B in hexadecimal)

So we can represent the plus sign in four different ways:

  • Char => +
  • Base10 => 43
  • Base16 => 2B
  • Url Encoded => %2B
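The same arithmetic can be checked in a few lines of Python. This is only a sketch of the steps above; the variable names are mine.

```python
char = "+"
code = ord(char)               # 43 in base 10
high, low = divmod(code, 16)   # split the byte into two 4-bit nibbles
hex_digits = "0123456789ABCDEF"
encoded = "%" + hex_digits[high] + hex_digits[low]

print(format(code, "08b"))  # 00101011
print(high, low)            # 2 11
print(encoded)              # %2B
```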


How would the browser encode a simple form like this using URL encoding?

 
 <form id="simpleForm" action="/" method="POST" name="simpleForm" enctype="application/x-www-form-urlencoded">
  <input id="simpleInput" type="text" name="simpleInput" />
  <input id="simpleSubmit" type="submit" name="simpleSubmit" value="Submit Simple Form" />
 </form>

The browser appends every element of the form to the request body, using the name attribute as the key.
Assuming the text "simple form" was typed into the input, the request body looks something like:

simpleInput=simple+form&simpleSubmit=Submit+Simple+Form
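You can reproduce that body outside the browser. The sketch below uses urlencode from Python's standard library; the typed value "simple form" is an assumed example input, not something the form itself provides.

```python
from urllib.parse import urlencode

# Field names and values as the form above would submit them.
# The typed value "simple form" is an assumed example input.
fields = [
    ("simpleInput", "simple form"),
    ("simpleSubmit", "Submit Simple Form"),
]

body = urlencode(fields)
print(body)  # simpleInput=simple+form&simpleSubmit=Submit+Simple+Form
```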

If you are curious and want to see how Firefox does this, you can check the implementation in its source code.

Most likely, this is the part where the query string gets constructed:

   
 nsresult
 nsFSURLEncoded::AddNameValuePair(const nsAString& aName,
                                  const nsAString& aValue)
 {
   // Encode value
   nsCString convValue;
   nsresult rv = URLEncode(aValue, convValue);
   NS_ENSURE_SUCCESS(rv, rv);

   // Encode name
   nsAutoCString convName;
   rv = URLEncode(aName, convName);
   NS_ENSURE_SUCCESS(rv, rv);

   // Append data to string
   if (mQueryString.IsEmpty()) {
     mQueryString += convName + NS_LITERAL_CSTRING("=") + convValue;
   } else {
     mQueryString += NS_LITERAL_CSTRING("&") + convName
                   + NS_LITERAL_CSTRING("=") + convValue;
   }

   return NS_OK;
 }

Now assuming that the browser:

  • Went through all the elements of the form
  • Parsed them
  • Created the HTTP request
  • Sent the request

The server now needs to interpret the request and parse the data.
If the form enctype is application/x-www-form-urlencoded, the request's Content-Type header says so, and the server knows which format to expect; it can then parse the body and do something useful with it.

So when you’re running Node.js, PHP, .NET, Ruby, or any other server-side technology, the platform or a framework on top of it implements a parser that goes through the HTTP request body and creates a key-value data structure holding all the data contained in the form.
That’s why form elements need a name: the names become the keys of the data structure created on the server, each mapped to the value of its element.
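As a rough illustration of what such a parser produces, here is a sketch using parse_qs from Python's standard library. The raw body is an assumed example, and real frameworks wrap this step in their own request objects.

```python
from urllib.parse import parse_qs

raw_body = "simpleInput=simple+form&simpleSubmit=Submit+Simple+Form"

# parse_qs maps each name to a list, because a name may repeat in a form.
form = parse_qs(raw_body)
print(form["simpleInput"])   # ['simple form']
print(form["simpleSubmit"])  # ['Submit Simple Form']
```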

2. multipart/form-data

With that being said, let’s think about the multipart/form-data encoding type.
If the browser is trying to send a file, whatever type it may be, does it make sense to encode the whole file using percent encoding and then append it to a string containing all the other form elements, like application/x-www-form-urlencoded does? I would say no.
Every byte outside the unreserved set would turn into three characters, so a binary file could grow to up to three times its size.

So how does multipart/form-data encode the form elements?
The definition in RFC 2388 summarizes it pretty well:

Definition of multipart/form-data

The media-type multipart/form-data follows the rules of all multipart
MIME data streams as outlined in [RFC 2046].  In forms, there are a
series of fields to be supplied by the user who fills out the form.
Each field has a name. Within a given form, the names are unique.

“multipart/form-data” contains a series of parts. Each part is
expected to contain a content-disposition header [RFC 2183] where the
disposition type is “form-data”, and where the disposition contains
an (additional) parameter of “name”, where the value of that
parameter is the original field name in the form. For example, a part
might contain a header:

Content-Disposition: form-data; name="user"

with the value corresponding to the entry of the “user” field.

Field names originally in non-ASCII character sets may be encoded
within the value of the “name” parameter using the standard method
described in RFC 2047.

As with all multipart MIME types, each part has an optional
“Content-Type”, which defaults to text/plain.  If the contents of a
file are returned via filling out a form, then the file input is
identified as the appropriate media type, if known, or
“application/octet-stream”.  If multiple files are to be returned as
the result of a single form entry, they should be represented as a
“multipart/mixed” part embedded within the “multipart/form-data”.
Each part may be encoded and the “content-transfer-encoding” header
supplied if the value of that part does not conform to the default
encoding.

Basically, each part carries its own headers, including an optional Content-Type, which allows an image to be sent in the same request as a bunch of text and still gives the server enough information to parse all the data.
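To see how the parts fit together, here is a minimal sketch that assembles a multipart/form-data body by hand. The function build_multipart, the boundary string, and the sample field values are all made up for illustration, and the sketch does not escape quotes or line breaks inside names, which a real encoder has to do.

```python
def build_multipart(fields, files, boundary):
    """Return (content_type, body) for a multipart/form-data request.

    fields: list of (name, text_value)
    files:  list of (name, filename, content_type, data_bytes)
    """
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, value in fields:
        lines += [
            delimiter,
            f'Content-Disposition: form-data; name="{name}"'.encode("utf-8"),
            b"",  # blank line separates part headers from part content
            value.encode("utf-8"),
        ]
    for name, filename, content_type, data in files:
        disposition = (
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"'
        )
        lines += [
            delimiter,
            disposition.encode("utf-8"),
            f"Content-Type: {content_type}".encode("ascii"),
            b"",
            data,  # raw bytes, no percent-encoding needed
        ]
    lines.append(delimiter + b"--")  # closing delimiter
    body = b"\r\n".join(lines) + b"\r\n"
    return f"multipart/form-data; boundary={boundary}", body


content_type, body = build_multipart(
    fields=[("simpleInput", "hello")],
    files=[("fileUpload", "pixel.bin", "application/octet-stream", b"\x00\x01")],
    boundary="example-boundary",
)
print(content_type)
print(body.decode("latin-1"))
```

The file bytes go into the body untouched, which is the whole point of this encoding type.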

So if we have a form like this:

 
 <form id="simpleForm" action="/" method="POST" name="uploadForm" enctype="multipart/form-data">  
  <input id="simpleInput" type="text" name="simpleInput" />  
  <input id="fileUpload" type="file" name="fileUpload" />  
  <input id="simpleSubmit" type="submit" name="simpleSubmit" value="Submit Simple Form" />
  </form>

With the text input left empty and no file selected, the request body would look something like this:

------WebKitFormBoundary0K1fvU2Vy3qpT4ua
Content-Disposition: form-data; name="simpleInput"


------WebKitFormBoundary0K1fvU2Vy3qpT4ua
Content-Disposition: form-data; name="fileUpload"; filename=""
Content-Type: application/octet-stream


------WebKitFormBoundary0K1fvU2Vy3qpT4ua
Content-Disposition: form-data; name="simpleSubmit"

Submit Simple Form
------WebKitFormBoundary0K1fvU2Vy3qpT4ua--

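On the server side, the body has to be split on the boundary and each part's headers read. The sketch below leans on Python's email package, since multipart/form-data follows the MIME multipart rules; the boundary, the field values, and the file name a.txt are assumptions for the example. Production servers usually use a dedicated streaming parser rather than this approach.

```python
from email import message_from_bytes

boundary = "example-boundary"  # hypothetical; browsers generate a random one
body = (
    b"--example-boundary\r\n"
    b'Content-Disposition: form-data; name="simpleInput"\r\n'
    b"\r\n"
    b"hello\r\n"
    b"--example-boundary\r\n"
    b'Content-Disposition: form-data; name="fileUpload"; filename="a.txt"\r\n'
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"file contents\r\n"
    b"--example-boundary--\r\n"
)

# The email parser needs the Content-Type header that the HTTP request carried.
header = f"Content-Type: multipart/form-data; boundary={boundary}\r\n\r\n"
message = message_from_bytes(header.encode("ascii") + body)

parsed = {}
for part in message.get_payload():
    # The field name lives in the part's Content-Disposition header.
    name = part.get_param("name", header="content-disposition")
    parsed[name] = (part.get_filename(), part.get_payload(decode=True))

print(parsed["simpleInput"])  # (None, b'hello')
print(parsed["fileUpload"])   # ('a.txt', b'file contents')
```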

Conclusion

From time to time I find myself needing to create file upload plugins, and when the time comes I always find myself googling how that can be accomplished.
I never stopped and asked myself: wait a minute, how does this actually work? Why are there two different encoding types? Why do I need to change the encoding type when sending files to the server?

I simply took that as a given and moved on.
However, even though we need to leverage what has already been created and try not to reinvent the wheel, at the same time, having a basic knowledge of how the tools we use on a daily basis function might not be a bad idea.
We don’t have to learn them to the point where we can write one from scratch, but we should learn them to the point where we can make smart decisions about when and how to use them.