
Dependency injection with Node.js

In the last project I worked on, I had the chance to apply some dependency injection patterns in a Node.js application.
Before I get into the details of the implementation, it is important to understand how dependency injection can benefit your project.

Wikipedia’s definition

Dependency injection is a software design pattern that allows removing hard-coded dependencies and making it possible to change them, whether at run-time or compile-time.[1]

This can be used, for example, as a simple way to load plugins dynamically or to choose stubs or mock objects in test environments vs. real objects in production environments. This software design pattern injects the depended-on element (object or value etc) to the destination automatically by knowing the requirement of the destination. Another pattern, called dependency lookup, is a regular process and reverse process to dependency injection.

Basically, dependency injection gives you the flexibility to separate a module's functionality from its dependencies.
This decoupling can come in handy during testing, or whenever you later find yourself needing to modify some of a module's dependencies.

Creating the module

Let's look at how you can implement some dependency injection patterns with Node.

I’m going to use the WebVirt project to show some examples in action.

The code below represents a single controller that manages some Express routes:

 
 // Module-scoped references, populated by inject()
 var _virtController, virt, Step, _, logger;

 var VirtController = function (di) {

 };

 VirtController.prototype.actions = function (req, res) {

 };

 VirtController.prototype.hostStats = function (req, res) {

 };

 VirtController.prototype.list = function (req, res) {

 };

 module.exports.inject = function (di) {
   // Create the instance only once (Singleton pattern)
   if (!_virtController) {
     virt = di.virtModel;
     Step = di.Step;
     _ = di._;
     logger = di.logger;
     _virtController = new VirtController(di);
   }

   return _virtController;
 };

The controller has three basic methods:

  • actions
  • hostStats
  • list

However, only the inject method is exported.
That's the module's only entry point: there you can perform validation, initialization procedures, or anything else that needs to happen before the module is instantiated.

In the example above we only check whether an instance has already been created, so that we never create two equal objects, applying the Singleton pattern.
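
For instance, a quick sketch of that behavior (assuming a di container like the one assembled in the next section):

 var first = require("./controllers/virt-controller").inject(di);
 var second = require("./controllers/virt-controller").inject(di);
 // Both calls return the same instance thanks to the singleton check
 console.log(first === second); // true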

Injecting dependencies

To use the module all we need to do is to “inject” the dependencies and receive back the initialized instance:

  
 // Load dependencies
 var di = {};  // assumed: a plain object collecting the dependencies
 var _ = di._ = require("underscore");
 di.Step = require('../../external/step/lib/step.js');
 di.exec = require('child_process').exec;
 di.config = config = require('../../config/config.js');
 di.logger = logger = require('../../utils/logger.js');

 exports.virtModel = di.virtModel = require("./models/virt-model.js").inject(di);

 exports.virtController = virtController = require("./controllers/virt-controller").inject(di);

One of the major benefits we gained by applying dependency injection in our project was the flexibility to quickly identify what a module needs in order to operate, and to patch those dependencies whenever changes were needed.
For example:
The WebVirt project is composed of two different pieces, the WebVirt-Manager and the WebVirt-Node.
They are separate modules that share the same code base but are designed to run on different hosts, and each one has specific dependencies.
The WebVirt-Manager requires Redis to store the users of the system as well as other bits of data.
The WebVirt-Node, however, does not need Redis.
That posed a huge problem: both apps shared the same code base, we were using a Logger module that saved its logs to a Redis db, and only the WebVirt-Manager host had a Redis db running.

To fix this problem we passed a “Custom Logger” to the WebVirt-Node.
Instead of requiring the Logger that was talking with the Redis db, we passed a Logger that only logged stuff to the console.

  
 // Load dependencies
 var di = {};  // assumed: the same dependency container as above
 var _ = di._ = require("underscore");
 di.Step = require('../../external/step/lib/step.js');
 di.exec = require('child_process').exec;
 di.config = config = require('../../config/config.js');

 // Custom logger that only writes to the console,
 // replacing the Redis-backed logger used by the WebVirt-Manager
 var logger = {
   error: function (err, metadata) {
     console.log("err: ", err);
     console.log("metadata: ", metadata);
   }
 };
 di.logger = logger;

exports.virtModel = di.virtModel = require("./models/virt-model.js").inject(di);

exports.virtController = virtController = require("./controllers/virt-controller").inject(di);

And by changing just a few lines of code we were able to modify the module's dependencies without altering its source or functionality.

Enabling CORS on a node.js server, Same Origin Policy issue

Recently we faced the famous “XMLHttpRequest doesn't allow Cross-Origin Resource Sharing” error.

To overcome the problem a very simple solution was needed.

Below I'll give a quick overview of what CORS is and how we managed to work around the issue.

Cross-Origin Resource Sharing – CORS

In a nutshell, CORS is the mechanism that allows a domain to request resources from another domain. For example, if a page on http://websiteAAA tries to request resources from http://websiteBBB, the browser will normally block it due to Same Origin Policy restrictions.

The reason for enforcing Same Origin Policy rules in the browser is to prevent unauthorized websites from accessing content they don't have permission for.

I found a great example that emphasizes the need to have Same Origin Policies enforced by the browser: Say you log in to a service, like Google for example, then while logged in you go and visit a shady website that’s running some malware on it. Without Same Origin Policy rules, the shady website would be able to query Google with the authentication cookies saved in the browser from your session, which of course is a huge vulnerability.

Since HTTP is a stateless protocol, the Same Origin Policy rules allow the browser to establish a connection using session cookies and still keep each cookie private to the domain that made the request, encapsulating the privileges of each “service” running in the browser.

With that being said, imagine a situation where you, as a developer, need to communicate with an API sitting on a different domain. In this scenario you don’t want to hit the Same Origin Policy restrictions.

Workaround 1 – Request resources from a server

The most common way to get around this problem is to make the API request from your own server, where Same Origin Policy rules are not applied, and then hand the data back to the browser.
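
A minimal sketch of that server-side proxy in Node (the remote URL and port here are placeholders):

[sourcecode language="javascript"]
// The browser calls our server; the server, where Same Origin Policy
// does not apply, fetches the remote resource and relays it back
var http = require("http");

http.createServer(function (req, res) {
  http.get("http://websiteBBB/resource.json", function (remote) {
    res.writeHead(remote.statusCode, remote.headers);
    remote.pipe(res);
  });
}).listen(8080);
[/sourcecode]

However, this approach can be exploited: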

Last semester I created an example of how an attacker could spoof entire websites and mount a phishing attack while circumventing Same Origin Policy restrictions.
The attack structure was very similar to how ARP poisoning works.

A very brief overview of the attack:

  1. The user would land on an infected page.
  2. The page would load a legitimate website by making the request from the attacker's server, where Same Origin Policy rules are not applied.
  3. The attacker would inject some code into the response to monitor the victim's activity.
  4. After the victim's credentials were stolen, the attacker would stop the attack and redirect the user to the originally requested page.

Spoofing the victim's DNS would make the attack even harder to detect, but even without DNS spoofing this approach would still catch some careless users.

All the code for the example is available on GitHub.
The attack was built on top of a Node.js server and Socket.IO.
The presentation slides for the attack can also be found here.

Workaround 2 – JSONP

Another way to circumvent the problem is by using JSONP (JSON with Padding). The Wikipedia article summarizes in a clear and simple way how JSONP works.

The magic of JSONP is to use a script tag to load a JSON file and provide a callback to run when the file finishes loading.
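
In raw form, that looks something like this (a simplified sketch; the URL and callback name are placeholders, and the server must wrap its JSON in a call to the named function):

[sourcecode language="javascript"]
// The server responds with: handleData({"some": "data"});
function handleData(data) {
  // Manipulate the response here
}

var script = document.createElement("script");
script.src = "http://website.com/file.json?callback=handleData";
document.head.appendChild(script);
[/sourcecode]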

An example of using JSONP with jQuery:

[sourcecode language="javascript"]
$.ajax({
  url: "http://website.com/file.json",
  dataType: 'jsonp',
  success: function (data) {
    // Manipulate the response here
  }
});
[/sourcecode]

Even though making requests from your own server or using JSONP can get around Same Origin Policy restrictions, neither is an ideal solution, which is why browser vendors started implementing CORS.

With CORS a server can set the HTTP headers of the response with the information indicating if the resources can or can’t be loaded from a different origin.

If you are curious and want to snoop around the HTTP response headers of a page, one way to do that is with the developer tools that ship with WebKit browsers.
Below is a screenshot of all the resources loaded by the stack overflow home page.
[Screenshot: resources loaded by the Stack Overflow home page]

As you can see in the screenshot, the script loaded from careers.stackoverflow.com/gethired/js had the following HTTP header options appended to its response:

  • Access-Control-Allow-Headers
  • Access-Control-Allow-Methods
  • Access-Control-Allow-Origin

That means that if you make an Ajax call to careers.stackoverflow.com/gethired/js from your own page, the browser will not apply Same Origin Policy restrictions, since the careers.stackoverflow.com server has indicated that the script is allowed to be loaded from different domains.
*An important note: only http://careers.stackoverflow.com/gethired/js has the Same Origin rules turned off; the careers.stackoverflow.com domain still has them enabled on other pages.

This means you can enable the header options at the response level, making sure a few API calls are open to the public without putting your whole server in danger of being exploited.

That leads us to our problem.

The Problem

In our current setup, one computer plays the role of the API server, and we were trying to query that API asynchronously from the browser, with the page being served from a different domain.

The result, as expected, was that the call was blocked by the browser.

Solution

Instead of hacking around and trying to make the requests from a different server or using JSONP techniques, we simply added the proper header options to the responses of the API server.

We are building our API on a Node.js server, where adding extra header options to the response could not be easier.

First we added the response headers to one of the API calls and it worked perfectly. However, we wanted the option applied to all of our API calls. The solution: use a middleware.
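
For reference, that first per-route version looked something like this (a sketch; the route and handler names are taken from our API routes further down):

[sourcecode language="javascript"]
// CORS headers enabled for a single API call
app.get('/list/vms', function (req, res) {
  res.header("Access-Control-Allow-Origin", "*");
  res.header("Access-Control-Allow-Headers", "X-Requested-With");
  routes.listGroup(req, res);
});
[/sourcecode]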

We created a middleware that sets the response header options and passes execution on to the next registered function; the code looks like this:

[sourcecode language="javascript"]
// CORS middleware
var allowCrossDomain = function (req, res, next) {
  res.header("Access-Control-Allow-Origin", "*");
  res.header("Access-Control-Allow-Headers", "X-Requested-With");
  next();
}

app.configure(function () {
  app.set('port', config.interfaceServerPort);
  app.set('views', __dirname + '/views');
  app.set('view engine', 'jade');
  app.use(express.favicon());
  app.use(express.logger('dev'));
  app.use(express.bodyParser());
  app.use(express.methodOverride());
  app.use(allowCrossDomain);
  app.use(app.router);
  app.use(express.static(path.join(__dirname, 'public')));
});

app.configure('development', function () {
  app.use(express.errorHandler());
});

// API routes
app.get('/list/vms', routes.listGroup);
app.get('/list/vms/:ip', routes.listSingle);
app.get('/list/daemons', routes.listDaemons);
[/sourcecode]

That's it for CORS. Later we'll cover another cool header option: X-Frame-Options.

If you are interested in finding out more about Same Origin Policy or CORS, check out these links:
http://en.wikipedia.org/wiki/JSONP
http://geekswithblogs.net/codesailor/archive/2012/11/02/151160.aspx
https://blog.mozilla.org/services/2013/02/04/implementing-cross-origin-resource-sharing-cors-for-cornice/
https://developers.google.com/storage/docs/cross-origin
http://www.tsheffler.com/blog/?p=428
http://techblog.hybris.com/2012/05/22/cors-cross-origin-resource-sharing/
http://security.stackexchange.com/questions/8264/why-is-the-same-origin-policy-so-important
http://www.w3.org/TR/cors/
https://developer.mozilla.org/en-US/docs/HTTP/Access_control_CORS
https://developer.mozilla.org/en-US/docs/Server-Side_Access_Control
http://www.bennadel.com/blog/2327-Cross-Origin-Resource-Sharing-CORS-AJAX-Requests-Between-jQuery-And-Node-js.htm

Playing around with Redis, pros and cons

For the use case we are dealing with, we had a few options when selecting a database solution to power the application.
The options under consideration were MySQL, SQLite, MongoDB, and Redis.

Looking at the SQL databases:
Since we would only need a database to save a few settings options and host information, SQLite would have been the better option: it is a serverless database, unlike MySQL, which follows a client-server model. In a sense, SQLite can be thought of as part of the whole application, which makes deployments much easier.

Looking at the NoSQL databases:
One of the main differences between Mongo and Redis is that Mongo is a document-oriented database while Redis is a simple key-value store.
That means that with Mongo you can have complex data structures, while Redis only offers a few different datatypes.
Another big difference is that each document in Mongo has a UUID attached to it, which makes it much easier to query and search information in the collections. Redis, on the other hand, only provides key-based lookups, which can be a big problem if you are dealing with inherently relational data.

With that being said we decided to go ahead with Redis.

Looking closely at our problem, we found that we would mostly be using the database in a cache-like scenario. We would be caching information about the hosts in an automated process, and users wouldn't have the option to manually change that information.

The application has its own internal mechanisms to query information about the hosts in the network. Basically, the idea is to save all the hosts of a specific network and identify which ones belong to our cluster.

Another important point is that the interval of the scans the application performs on the network can be defined by the user. If the probe interval gets really small, it necessarily means accessing the host information more often.

Redis keeps the data loaded in RAM, similar to memcached, which makes reads much faster. It also has the benefit of persistent storage, writing the data to disk at regular intervals.

Kieran wrote a blog post listing the details of the architecture of the app.

With that being said, the schema we came up with is the following:

Use hashes to store the information about the hosts.
To keep all the hashes grouped together we prefix them, for example:

hosts:10.0.0.1
hosts:10.0.0.2
etc.

For the keys above we have the following attributes associated:

  • ip: the address of the host
  • status: indicates whether the host is currently active
  • type: differentiates between regular hosts and hypervisors
  • lastOn: the last time the computer was seen active on the network
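
Reading a host back is then a single hash lookup; a quick sketch with the same node_redis client used below (the IP is just an example):

[sourcecode language="javascript"]
client.hgetall("hosts:10.0.0.1", function (err, host) {
  // host => { ip: '10.0.0.1', status: 'on', type: 'default', lastOn: '...' }
});
[/sourcecode]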

[Screenshot: the host hashes stored in Redis]
One of the benefits of this approach is that once a host is added we don't need to worry about duplicate entries: setting a hash with an existing key updates its value instead of creating a copy, so we can group the create/update functionality together.
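
A quick sketch of that upsert behavior (the values are illustrative):

[sourcecode language="javascript"]
// HSET creates the field when it's missing and overwrites it otherwise,
// so saving the same host twice never produces a duplicate entry
client.hset("hosts:10.0.0.1", "status", "on");
client.hset("hosts:10.0.0.1", "status", "off"); // same hash, updated value
[/sourcecode]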

An example of the process of finding new hosts on the network and saving them to the db:

[sourcecode language="javascript"]
NetworkScanner.prototype.searchHosts = function (cb) {
  exec("sudo nmap -sP --version-light --open --privileged 10.0.0.0/24", cb);
};

// Save active hosts
NetworkScanner.prototype.saveHosts = function (hosts, cb) {
  var hosts = hosts.match(this.networkRegex);
  var numberHosts = hosts.length;
  var host;
  while (host = hosts.pop()) {
    var key = "hosts:" + host;
    client.multi()
      .hset(key, "ip", host)
      .hset(key, "status", "on")
      .hset(key, "type", "default")
      .hset(key, "lastOn", "timestamp")
      .exec(function (err, replies) {
        // Fire the callback once every host has been saved
        !--numberHosts && cb();
      });
  }
};
[/sourcecode]

Then to search existing hosts for active hypervisors:

[sourcecode language="javascript"]
// Scan port of active hosts
NetworkScanner.prototype.searchComputeNodes = function (cb) {
  var hosts = new Array();
  client.keys("hosts:*", function (err, keys) {
    var keysLength = keys.length - 1; // 0 index
    keys.forEach(function (val, index) {
      hosts.push(val.split(":")[1]);
      if (index === keysLength) {
        exec("sudo nmap --version-light --open --privileged -p 80 " + hosts.join(" "), cb);
      }
    });
  });
};

// Save compute nodes
NetworkScanner.prototype.saveComputeNodes = function (computeNodes, cb) {
  computeNodes = computeNodes.match(this.networkRegex);
  var computeNodesLength = computeNodes.length - 1; // 0 index
  computeNodes.forEach(function (val, index) {
    client.hset("hosts:" + val, "type", "compute");
    // Signal completion once the last node has been marked
    if (index === computeNodesLength) {
      cb();
    }
  });
};
[/sourcecode]

For now we don't have any benchmarks showing the performance difference between Redis and the other database solutions, so depending on how the Redis implementation goes we might try some comparative benchmarking in future posts.

Running node.js on port 80 with apache

This weekend I was faced with the task of putting a Node.js application into production.
Most of the development happened offline, with each developer running a local instance of node and using git to synchronize the code, so we didn't have the problem of configuring node on a centralized server. Now that the development stage is over, we needed to move the project into production.

We already use Linode to host some of our projects, so we decided to host the nodejs project there as well.

All of our current projects are being served via apache.

The problem is that we can't have apache and node listen on the same port (80), and we didn't have the option of deactivating apache to run just node.

We decided on a quick solution to get both apache and node working together: proxy mode.

So apache can still listen on port 80, and whenever somebody requests the Node.js application the request is forwarded to the port node is listening on, in our case 11342.

Below are the steps needed to get apache and node running on the same server:

Assuming you already have apache2 installed and the Node.js application set up.
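
For context, the node side can be as small as this (a minimal sketch; 11342 is the port from our setup):

var http = require("http");

// The app apache will proxy to; it only needs to listen
// on the port referenced in the vhost configuration below
http.createServer(function (req, res) {
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.end("Hello from node behind apache\n");
}).listen(11342, "localhost");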

Load proxy modules

Load the proxy modules that will forward the requests to node.
Open the file apache2.conf. Usually the file is located in the /etc/apache2/ directory.
If you're not sure where the file is:

cd /
sudo find -name "apache2.conf"

After opening, append the following lines at the bottom of the file:

LoadModule proxy_module /usr/lib/apache2/modules/mod_proxy.so
LoadModule proxy_http_module /usr/lib/apache2/modules/mod_proxy_http.so

Without adding those modules, if you try to start apache you will get this message:


Syntax error on line 6 of /etc/apache2/sites-enabled/mysite.com:
Invalid command 'ProxyRequests', perhaps misspelled or defined by a module not included in the server configuration
Action 'configtest' failed.
The Apache error log may have more information.
…fail!


Configure the vhost

Now that you have the required modules loaded, it is time to configure the vhost.

To add a vhost to apache you need to create a file in /etc/apache2/sites-available:

    
   
<VirtualHost *:80>
    ServerAdmin your@email.com
    ServerName mysite.com
    ServerAlias www.mysite.com

    ProxyRequests off

    <Proxy *>
        Order deny,allow
        Allow from all
    </Proxy>

    <Location />
        ProxyPass http://localhost:11342/
        ProxyPassReverse http://localhost:11342/
    </Location>

    DocumentRoot /srv/www/mysite/public_html/
    ErrorLog /srv/www/mysite/logs/error.log
    CustomLog /srv/www/mysite/logs/access.log combined
</VirtualHost>

This specifies that all requests on port 80 for the domain mysite.com should be forwarded to localhost on port 11342.

Enable the vhost

Now you need to enable the new vhost:

a2ensite siteName

A link to the vhost file will be created in the sites-enabled dir.

To disable the site:

a2dissite siteName

Restart apache

The last thing you need to do is restart apache:

service apache2 reload

* You should get the message:
Reloading web server config apache2 [ OK ]


References:

http://www.ehow.com/how_5458585_configure-mod_proxy.html
http://karrigell.sourceforge.net/en/proxy.html
http://davybrion.com/blog/2012/01/hosting-a-node-js-site-through-apache/

MongooseJS Validators - Contributing to an open source project

Today I was able to put into practice the tools I learned last semester in DPS 909 – Topics in Open Source Development.

The Problem

In a project for a class I'm taking this semester, I was working on the validation portion of the Mongoose schemas for one of the collections being used:

   
[sourcecode language="javascript"]
User = new Schema({
  'username': {
    type: String,
    validate: [validateUsername, 'username not valid']
  }
});
[/sourcecode]

It defines a validator for username, so when saving a User object:

[sourcecode language="javascript"]
var user = new User();
user.save(function (err) {
  // err carries the validation errors, if any
});
[/sourcecode]

validateUsername will be called on the username field; if validation fails, the object won't be saved and err will carry information about the error:

 
 { message: 'Validation failed',
   name: 'ValidationError',
   errors:
    { username:
       { message: 'Validator "username not valid" failed for path username',
         name: 'ValidatorError',
         path: 'username',
         type: 'username not valid' } } }

So my problem was that I wanted to add more than one validator to a single field:

 
 User = new Schema({
   'username': {
     type: String,
     validate: [validateUsername, 'username not valid'], [validator2, 'second validator'],
   },
 });

However that didn’t work.
I went back to the mongoose documentation but couldn’t find a way to attach two validators to a single field.

So I was faced with two options: accept the facts and move on, or try to modify the library.
Mongoose is an open source library, so I started reading the source code, trying to find a way to accomplish my goal.

Because the source code is very organized and easy to read, it didn't take me long to find where a schema field is created: it is defined in schematype.js.

 
 function SchemaType (path, options, instance) {
   this.path = path;
   this.instance = instance;
   this.validators = [];
   this.setters = [];
   this.getters = [];
   this.options = options;
   this._index = null;

   for (var i in options) if (this[i] && 'function' == typeof this[i]) {
     var opts = Array.isArray(options[i])
       ? options[i]
       : [options[i]];

     this[i].apply(this, opts);
   }
 };

path is the name of the field, options is an object containing all the options for the field, and instance is the type of the field. For example:

 
 path: username
 instance: String
 options: {
   type: [Function: String],
   validate: [ [Function: validateUsername], 'username not valid' ]
 }

The for loop iterates through all the options and calls the appropriate function for each property. Using the example above, i would be equal to 'validate', so calling this[i].apply(this, opts) is the same as calling this.validate([Function: validateUsername], 'username not valid').

Here is the part where the validator gets added to the field:

 
 SchemaType.prototype.validate = function (obj, error) {
   this.validators.push([obj, error]);
   return this;
 };

Pretty straightforward: it pushes the function and the error onto the validators array.
But I wanted to pass more than just one function and error.

The Solution

 
 SchemaType.prototype.validate = function (obj, error) {
   if ('function' == typeof obj && 'string' == typeof error) {
     this.validators.push([obj, error]);
   }
   else {
     for (var i in arguments) {
       this.validators.push([arguments[i].func, arguments[i].error]);
     }
   }
   return this;
 };

So if I defined more than one validator for a single field in the schema, the arguments variable in the validate method would look something like this:

 
 arguments: { '0': { func: [Function: trim], error: 'trim error' },
              '1': { func: [Function: validateEmail], error: 'email error' } }
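
For illustration, a schema that would produce that arguments object could look like this (trim and validateEmail are hypothetical validator functions):

[sourcecode language="javascript"]
User = new Schema({
  'username': {
    type: String,
    validate: [
      { func: trim, error: 'trim error' },
      { func: validateEmail, error: 'email error' }
    ]
  }
});
[/sourcecode]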

However, I couldn't just break the existing code, so I kept a check for the original single-validator signature before pushing to the validators array.

In the end, the solution worked.
So I created a patch and opened a ticket on the Mongoose repo to discuss the issue.
I'm not sure whether the change will be accepted into the project, but it was nice to see that I can actually modify the library if needed.

**After going through the documentation one more time, I found a different way to add multiple validators:

 
 User.path('username').validate(function (v) {
   return false;
 }, 'my error type');

 User.path('username').validate(function (v) {
   return true;
 }, 'another error');

It calls the validate function explicitly on the field, allowing multiple validators to be added.