For the use case we are dealing with we had a few options when selecting a database solution to power the application.
The options in consideration were Mysql, Sqlite, MongoDB and Redis

Looking at SQL databases:
Due the fact that we would only need a database to save a few settings options and hosts information SQlite would have been a better option since it is a serverless database, different from mysql that has a client server model. In a sense SQLite can be think of as part of the whole application, which makes deployments much easier.

Looking at the NoSql databases:
One of the main differences between Mongo and Redis is that Mongo is a document oriented database while Redis is a simple Key-Value database.
That means that with Mongo you can have complex data structures while with Redis only a few different datatypes
Another big difference is that each document in Mongo has an uuid attached to it, so it makes it much easier to query and search information in the collections. Redis, on the other hand, only provides key-based searche queries, which can be a big problem if you are dealing with inherently relational data.

With that being said we decided to go ahead with Redis.

Looking closely into our problem, we found that we would mostly be using the database in a cache-like scenario. We would be caching information about the hosts in an automated process and the users wouldn’t have an option to manually change the hosts information.

The application has its own internal mechanisms to query information about the hosts in the network. Basically, the idea is to save all the hosts of a specific network and identify which ones belong to our cluster.

Another important point is that the interval of the scans the application will be performing on the network can be defined by the user. In the case that the probes interval gets really small it would neccesarily mean accessing the host information more often.

Redis itself keeps the data loaded in RAM, similar to memcache, which makes reads much faster. It also includes the benefit of having persistent storage by writing the data to the disk at regular intervals

Kieran wrote a blog post listing the details of the architecture of the app.

With that being said the schema we came up with is the following:

Use hashes to store the information about the hosts
To keep all the hashes grouped together we prefix them:
For example:
hosts:10.0.0.1
hosts:10.0.0.2
etc..

For the keys above we have the following attributes associated:

  • ip
  • status
  • type
  • lastOn

ip being the address of the host
status indicating if it is active
type differentiating between regular hosts and hypervisors
lastOn the last time the computer was seen active on the network

Screen Shot 2013-02-01 at 4.51.44 PM
One of the benefits of that approach is that once a host is added to the hash we don’t need to worry about having duplicate entries since when trying to add a hash with the same key it will update its value instead of creating a copy. So we can group both the create/update functionality together.

An example of how the process of finding new hosts on the network and saving them on the db:

[sourcecode language=”javascript”]
NetworkScanner.prototype.searchHosts = function (cb) {
exec("sudo nmap -sP –version-light –open –privileged 10.0.0.0/24", cb);
}

// Save active hosts
NetworkScanner.prototype.saveHosts = function (hosts, cb) {
var hosts = hosts.match(this.networkRegex);
var numberHosts = hosts.length;
while (host = hosts.pop()) {
var key = "hosts:" + host;
client.multi()
.hset(key, "ip", host)
.hset(key, "status", "on")
.hset(key, "type", "default")
.hset(key, "lastOn", "timestamp")
.exec(function (err, replies) {
!–numberHosts && cb();
}
}
};
[/sourcecode]

Then to search existing hosts for active hypervisors:

[sourcecode language=”javascript”]
// Scan port of active hosts
NetworkScanner.prototype.searchComputeNodes = function (cb) {
var hosts = new Array();
client.keys("hosts:*", function (err, keys) {
var keysLength = keys.length – 1; // 0 index
keys.forEach(function (val, index) {
hosts.push(val.split(":")[1]);
if (index === keysLength) {
exec("sudo nmap –version-light –open –privileged -p 80 " + hosts.join(" ") + "", cb);
}
});
});
};

// Save compute nodes
NetworkScanner.prototype.saveComputeNodes = function (computeNodes, cb) {
computeNodes = computeNodes.match(this.networkRegex);
var computeNodesLength = computeNodes.length – 1; // 0 index
computeNodes.forEach(function (val, index) {
client.hset("hosts:" + val, "type", "compute");
})
};
[/sourcecode]

For now we don’t have any benchmarks to show the performance difference between using Redis and other database solution so depending on how well the implementation of Redis goes, we might try some comparative benchmarking in future posts.