problem

Installing OpenStack, Quantum problems

Over the following weeks we plan to write more about setting up an OpenStack cloud using Quantum.
For now we have been experimenting with different Quantum features and settings.
At first Quantum might look like a black box, not because it is complex in itself, but because it deals with several different plugins and protocols. If you are not familiar with them, it is hard to understand why Quantum is there in the first place.

In a nutshell, Quantum’s role is to provide an interface for configuring the network of multiple VMs in a cluster.

In the last few years the lines between system, network and virtualization admins have become really blurry.
The classical Unix admin is pretty much nonexistent nowadays, since most services are offered in the cloud in virtualized environments.
And since everything seems to be migrating to the cloud, some principles that were applied to physical networks in the past sometimes don’t translate very well to virtualized networks.

Later we’ll have some posts explaining what technologies and techniques underlie the network configuration of a cloud, in our case focusing specifically on OpenStack and Quantum.

With that being said, below are a few errors that came up during the configuration of Quantum:

1. ERROR [quantum.agent.dhcp_agent] Unable to sync network state.

This error is most likely caused by a misconfiguration of the rabbitmq server.
A few ways to debug the issue:
Check that the file /etc/quantum/quantum.conf on the controller node (where the quantum server is installed) has the proper rabbit credentials.

By default rabbitmq runs on port 5672, so run:

[sourcecode]
netstat -an | grep 5672
[/sourcecode]

and check that the rabbitmq server is up and running.

On the network node (where the quantum agents are installed), also check that /etc/quantum/quantum.conf has the proper rabbit credentials.

If you are running a multihost setup, make sure the rabbit_host variable points to the IP where the rabbit server is located.
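To compare the rabbit settings on the controller and network nodes quickly, a small helper like the one below might be handy. It is only a sketch: conf_get is a hypothetical name, it ignores INI sections and simply returns the last uncommented "key = value" line for the given key, and rabbit_password is assumed to be the option name used in your quantum.conf.

```shell
# conf_get FILE KEY (hypothetical helper, not part of Quantum)
# Prints the value of the last uncommented "KEY = value" line in FILE.
# Returns non-zero if the key is not set. INI sections are ignored.
conf_get() {
    awk -v key="$2" '
        /^[ \t]*#/ { next }                      # skip commented-out lines
        index($0, "=") > 0 {
            k = substr($0, 1, index($0, "=") - 1)
            v = substr($0, index($0, "=") + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)       # trim whitespace around key
            gsub(/^[ \t]+|[ \t]+$/, "", v)       # trim whitespace around value
            if (k == key) { val = v; found = 1 }
        }
        END { if (found) print val; else exit 1 }
    ' "$1"
}

# Example usage (run on each node and compare the output):
#   conf_get /etc/quantum/quantum.conf rabbit_host
#   conf_get /etc/quantum/quantum.conf rabbit_password
```

If the values differ between the nodes, or rabbit_host is missing on the network node, that is a likely cause of the sync error.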

Just to be safe, check that you have connectivity on the management network by pinging all the hosts in the cluster, then restart the quantum server, the rabbitmq server and the quantum agents.

2. ERROR [quantum.agent.l3agent] Error running l3nat daemon_loop

This error requires a very simple fix; however, it was very difficult to find information about the problem online.
Luckily, I found one thread on the Fedora project mailing list explaining the problem in more detail.

This error occurs because keystone authentication is not working.
A quick explanation: the l3 agent uses the quantum HTTP client to interface with the quantum service.
This requires keystone authentication. If it fails, the l3 agent will not be able to communicate with the service.

To debug this problem, check that the quantum server is up and running.
By default the server runs on port 9696:

[sourcecode]
root@folsom-controller:/home/senecacd# netstat -an | grep 9696
tcp 0 0 0.0.0.0:9696 0.0.0.0:* LISTEN
tcp 0 0 192.168.0.11:9696 192.168.0.12:40887 ESTABLISHED
[/sourcecode]

If nothing shows up, the quantum server is down. Try restarting the service to see if the problem goes away:

[sourcecode]
service quantum-server restart
[/sourcecode]
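The port check above can be wrapped in a small helper so it is easy to reuse in a script. This is a minimal sketch: port_is_listening is a hypothetical name, and it assumes the column layout of the netstat -an output shown earlier (local address in the fourth column, state in the last).

```shell
# port_is_listening PORT (hypothetical helper)
# Reads "netstat -an" style output on stdin and succeeds only if some
# socket bound to PORT is in the LISTEN state.
port_is_listening() {
    awk -v p=":$1" '
        # $4 is the local address; it must end in ":PORT"
        $NF == "LISTEN" && substr($4, length($4) - length(p) + 1) == p { found = 1 }
        END { exit !found }
    '
}

# Example usage:
#   netstat -an | port_is_listening 9696 && echo "quantum server is listening"
#   netstat -an | port_is_listening 5672 && echo "rabbitmq is listening"
```

An ESTABLISHED connection on the port is deliberately not counted, since it does not prove that the server is still accepting new connections.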

You can also check that the quantum server port is reachable from the network node (in a multihost scenario):

[sourcecode]
root@folsom-network:/home/senecacd# nmap -p 9696 192.168.0.11

Starting Nmap 5.21 ( http://nmap.org ) at 2013-01-28 08:07 PST
Nmap scan report for folsom-controller (192.168.0.11)
Host is up (0.00038s latency).
PORT STATE SERVICE
9696/tcp open unknown
MAC Address: 00:0C:29:0C:F0:8C (VMware)

Nmap done: 1 IP address (1 host up) scanned in 0.04 seconds
[/sourcecode]

3. ERROR [quantum.agent.l3agent] Error running l3nat daemon_loop – rootwrap error

I didn’t come across this bug myself, but I found a few people running into it.
Kieran already wrote a good blog post explaining the problem and how to fix it.

You can check the bug discussion here

4. Bad floating ip request: Cannot create floating IP and bind it to Port , since that port is owned by a different tenant.

This is just a problem of mixed credentials.
Kieran documented the solution for the issue here

There is also a post on the OpenStack wiki talking about the problem.

Conclusion

This should help fix the problems that might arise with a Quantum installation.
If anybody knows of any other issues with Quantum or has any suggestions about the problems listed above, please let us know!

Also check the official guide for other common errors and fixes

Assembling a couple of PCs

Last week we received the hardware required to set up a small cloud.
Having the hardware at hand will allow us to start testing new open source solutions for cloud High Availability.

The hardware specs:

  • Intel Core i7 CPU
  • MSI B75MA-E33 Micro-ATX motherboard
  • Seagate Barracuda hard drive

So Kieran and I had the task of assembling all the parts and making sure all the hardware was working properly.

I have always been a laptop guy, mostly due to physical constraints that don’t allow me to have a dedicated case, monitor and all the gear that comes with it. I’m always on the move, so having a laptop made the most sense and I never had to assemble a computer myself.

Fortunately, Kieran had just finished assembling his personal computer at home, so he took the lead and started assembling the first computer.

First Day

For the first computer we started by plugging all the hardware into the MOBO before placing it inside the case, to test it out first.
All the hardware was detected, so we placed the MOBO inside the case with all the components plugged in and only the hard drives missing.


Second Day

We picked up where we had left off the day before, plugged the hard drives into the MOBO and started testing.
To our surprise the computer would not power on.
The day before everything was working fine, so what could have gone wrong?
After a few minutes staring at the MOBO, Kieran noticed that the case power connectors were plugged into the wrong pins.
After fixing that little issue we powered on the computer and all the hardware was detected.

For the second computer, after having watched Kieran assemble the first one, I took the lead and he stayed around to give me some help.
To be honest, assembling a computer is not as complicated as I thought; the process is very straightforward.

I had to:

  1. Insert the CPU on the MOBO
    (I heard from some friends that for some CPUs you need to apply thermal paste between the processor and the fan, but in our case the thermal paste was already applied on the fan)
  2. Connect the CPU fan
  3. Plug in the RAM
  4. Plug in the Power supply
  5. Plug in the hard drives
  6. Connect the case cables

Everything is set!

It took around 20 minutes to connect everything.

However, to our surprise, when we tried to turn the computer on it simply didn’t.
Déjà vu all over again.
This time we made sure the connectors were plugged into the right pins, so we knew that wasn’t the problem.

We then started unplugging parts to try to isolate the problem. First we took out the RAM, then the hard drives, and in both cases no luck: the computer would not power on.

After 15 minutes of failed attempts at figuring out the problem, we decided to search for possible causes online.
Nothing useful came up.

Without many choices left, Kieran suggested we power the MOBO directly, without the case, just to make sure it was working.

We found this video showing a guy turning his PC on with a screwdriver.

We did the same thing and, to our relief, the computer turned on.


Now we knew the problem was in the case.
We then started thinking of possible solutions.

  1. Since the computer will be used as a server and won’t need to be powered on and off very often, we could use a screwdriver whenever needed
  2. Buy a new case
  3. Use the reset button to power ON/OFF

We decided to go with the third option, and after wiring the reset cables to the power switch pins we tested it out and it worked.

So in the end, assembling two computers, which I’m sure would take less than 15 minutes if we had more experience on the subject, ended up taking a little bit more than that. On the positive side, we faced some problems that forced us out of our comfort zone, and I can definitely say for myself that I learned something new during the process.

[Photo: our final handiwork]


Taking screenshots on CentOS, gnome-screenshot util

By default, when CentOS is installed not all the GNOME utilities are included in the system.
The screenshot utility is one of those left out.
So trying to take a screenshot fails:

ERROR:

There was an error running gnome-screenshot: Failed to execute child process “gnome-screenshot”

Only the utilities below were available:

  • gnome-about
  • gnome-about-me
  • gnome-appearance-properties
  • gnome-at-properties
  • gnome-at-visual
  • gnome-audio-profiles-properties
  • gnome-character-map
  • gnome-control-center
  • gnome-default-applications-properties
  • gnome-desktop-item-edit
  • gnome-display-properties
  • gnome-font-viewer
  • gnome-help
  • gnome-keybinding-properties
  • gnome-keyboard-properties
  • gnome-keyring
  • gnome-keyring-daemon
  • gnome-mouse-properties
  • gnome-network-properties
  • gnome-open
  • gnome-panel
  • gnome-power-bugreport.sh
  • gnome-power-manager
  • gnome-power-preferences
  • gnome-screensaver
  • gnome-screensaver-command
  • gnome-screensaver-preferences
  • gnome-session
  • gnome-session-properties
  • gnome-session-save
  • gnome-terminal
  • gnome-text-editor
  • gnome-thumbnail-font
  • gnome-typing-monitor
  • gnomevfs-cat
  • gnomevfs-copy
  • gnomevfs-df
  • gnomevfs-info
  • gnomevfs-ls
  • gnomevfs-mkdir
  • gnomevfs-monitor
  • gnomevfs-mv
  • gnomevfs-rm
  • gnome-volume-control
  • gnome-volume-control-applet
  • gnome-wacom-properties
  • gnome-window-properties
  • gnome-wm

As you can see, gnome-screenshot wasn’t there.

To install the gnome-screenshot util:

[sourcecode language="bash"]
sudo yum install gnome-utils
[/sourcecode]

That should fix the problem, and you should be able to take screenshots as you would normally expect.
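To confirm whether the utility is present before or after installing, a quick check along these lines can help. It is a minimal sketch; have_cmd is a hypothetical helper name, and it only tests whether a command can be found on the PATH.

```shell
# have_cmd NAME (hypothetical helper): succeeds if NAME is found on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd gnome-screenshot; then
    echo "gnome-screenshot is installed"
else
    echo "gnome-screenshot is missing; install it with: sudo yum install gnome-utils"
fi
```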

The official webpage for the gnome-utils project

Getting started with CUDA on OSX 10.8 - Driver Problems

Installing all the dev dependencies for CUDA-enabled GPUs is not that bad. I faced a few issues, but overall the documentation is pretty good.

You can find more information about how to get started here. The page has all the download links for the driver, toolkit and SDK for Windows, Linux and Mac.

They also posted a PDF giving detailed instructions on how to install everything.

Road Blocks

I’m running a MacBook Pro 2012 that comes with a GeForce GT 650M.
On NVIDIA’s website, driver version 4.2 is available for download. However, the CUDA driver can be updated to version 5.0.24 through the CUDA Preferences window under System Preferences.

After following the instructions posted in the Get Started PDF, I got the message “Driver not supported” when running the deviceQuery test.
I looked it up online and found that this problem usually happens when the driver has a lower version than the SDK. I thought that was weird, since I had downloaded all the files the website instructed me to.

I started browsing through System Preferences and noticed the CUDA preferences tab.
The tab had an option to update the driver.
After the update, my driver was at version 5.0.24 and the deviceQuery test worked.
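The mismatch I ran into (a driver older than the SDK expects) is easy to check once you know both version numbers. The helper below is only a sketch: version_lt is a hypothetical name, it compares dotted numeric versions field by field, and how you obtain the two versions on your machine is left out.

```shell
# version_lt A B (hypothetical helper)
# Succeeds if dotted version A is strictly lower than B.
# Missing fields count as 0, so 5.0 and 5.0.0 compare as equal.
version_lt() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = x[i] + 0; yi = y[i] + 0
            if (xi < yi) exit 0
            if (xi > yi) exit 1
        }
        exit 1   # versions are equal
    }'
}

# Example with the two driver versions mentioned in this post:
#   version_lt 4.2 5.0.24 && echo "driver 4.2 is older than 5.0.24"
```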

After the deviceQuery test, the guide suggests running bandwidthTest to make sure communication with the GPU is working properly.
To my surprise, when I ran bandwidthTest the computer crashed, some weird noises came from the case and a kernel panic message appeared.

Interval Since Last Panic Report: 75 sec
Panics Since Last Report: 2
Anonymous UUID: CD3F065C-4392-433E-8B7B-9D466743EE14
Tue Sep 11 23:16:23 2012
panic(cpu 4 caller 0xffffff802e8b7b95): Kernel trap at 0xffffff7faef9d18e, type 14=page fault, registers:
CR0: 0x0000000080010033, CR2: 0xffffff8191902000, CR3: 0x000000006b34b06c, CR4: 0x00000000001606e0
RAX: 0xffffff815123d000, RBX: 0x00000000406c5000, RCX: 0x00000000101b1400, RDX: 0xffffff8043302374
RSP: 0xffffff815117b650, RBP: 0xffffff815117b650, RSI: 0xffffff8043302004, RDI: 0xffffff80432ff804
R8: 0x00000000003f6a01, R9: 0xffffff815117b664, R10: 0x0000000000ffffff, R11: 0xffffff8100d10004
R12: 0xffffff80432ff804, R13: 0xffffff8043302374, R14: 0x0000000000000000, R15: 0xffffff8043302004
RFL: 0x0000000000010206, RIP: 0xffffff7faef9d18e, CS: 0x0000000000000008, SS: 0x0000000000000010
Fault CR2: 0xffffff8191902000, Error code: 0x0000000000000002, Fault CPU: 0x4

I wasn’t sure if the kernel panic was connected with the driver update, so I went back and ran some of the other samples that come with the CUDA SDK. I ran particles, simpleGL, volumeRender and a few others. Then, to my surprise again, when I ran mergeSort another kernel panic was generated.
By now I was starting to get worried. I went back to the samples directory and ran a few more to make sure my GPU was still functioning properly: particles, simpleGL, volumeRender and clock. Again, right after starting clock, another kernel panic.

Now I knew for sure something was wrong; that shouldn’t be happening.

It was almost midnight and I was getting tired and frustrated.
I did the only logical thing left to do… googled it.

I entered the search: “mac 2012 crash with cuda driver 5”

Solution

To my relief, it turned out that the kernel panics were in fact a known problem with CUDA driver version 5 on the MacBook Pro 2012.
I found this post on Adobe’s blog explaining the issue.
Apparently, having the “Automatic Graphics Switching” option enabled causes some CUDA applications to crash.
Turning the option off solved the problem.

With automatic graphics switching turned off, I ran the bandwidthTest, mergeSort and clock apps and they worked just fine.

Adobe’s blog post was published on August 29, so I believe a fix for this problem should be coming out very soon.
Only Mountain Lion (Mac OSX v10.8) and Lion (Mac OSX v10.7) are affected by this bug.