Friday, September 21, 2012

Release only specific sanlock resource

On August 29, 2012, libvirt 0.10.0 was released, which includes the sanlock bugfix for hot-detaching virtual disks:

http://libvirt.org/news.html

Wednesday, September 19, 2012

Test puppet code from shell

Sometimes it comes in handy to quickly test some Puppet code or conditions from the command line.  The following command greps for ID in a file and, unless it is found, echoes a string (or does something more useful):

# ralsh exec "/bin/echo ID not found" unless="/bin/grep -q ID /etc/default/mpt-statusd"

If the "unless" command returns "1", the "exec" command is executed.
The inverted situation is covered by "onlyif".  If the "onlyif" command returns "0", the "exec" command will be executed.
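
For example, the same check expressed with "onlyif" (a hypothetical variation on the grep above):

# ralsh exec "/bin/echo ID found" onlyif="/bin/grep -q ID /etc/default/mpt-statusd"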

Wednesday, August 22, 2012

limits.conf or limits.d

Sometimes you're hitting the limits of the operating system, but you're not sure whether your changes in limits.conf are picked up, or which parameter may be hitting a limit...

Then you find plenty of information in /proc/$pid/limits!

# cat /proc/$(pidof java)/limits
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            10485760             unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             1024                 unlimited            processes 
Max open files            32768                32768                files     
Max locked memory         65536                65536                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       15904                15904                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us        

/etc/security/limits.d/90-nproc.conf limits nproc to 1024 for all users to prevent fork bombs.  On high-capacity servers you may need to override it for specific users or groups.
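
A minimal override sketch (the user name "jboss" and the file name are just examples here):

# cat /etc/security/limits.d/91-nproc-jboss.conf
jboss    soft    nproc    8192
jboss    hard    nproc    8192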

Tuesday, August 21, 2012

System V init template

Sometimes you just need a quick and simple custom System V init script template to be able to start/stop daemons during boot/reboot.

This init template uses the RHEL/CentOS semantics/functions and works on RHEL/CentOS 6.x.

#!/bin/sh
#
# myApp Starts/stop the myApp daemon
#
# chkconfig:   345 55 25
# description: myApp description

### BEGIN INIT INFO
# Provides: myApp
# Required-Start: $local_fs
# Required-Stop: $local_fs
# Default-Start: 345
# Default-Stop: 016
# Short-Description: Starts/stop the myApp daemon
# Description:      myApp description
### END INIT INFO

# Source function library.
. /etc/rc.d/init.d/functions

exec=/path/to/myapp
prog="myapp"
OPTS=""
config=/etc/sysconfig/$prog

[ -e $config ] && . $config

lockfile=/var/lock/subsys/$prog

start() {
    [ -x $exec ] || exit 5
    echo -n $"Starting $prog: "
    daemon $exec $OPTS
    retval=$?
    echo
    [ $retval -eq 0 ] && touch $lockfile
    return $retval
}

stop() {
    echo -n $"Stopping $prog: "
    if [ -n "`pidofproc $exec`" ] ; then
        killproc $exec
    else
        failure $"Stopping $prog"
    fi
    retval=$?
    echo
    [ $retval -eq 0 ] && rm -f $lockfile
    return $retval
}

restart() {
    stop
    start
}

reload() {
    restart
}

force_reload() {
    restart
}

rh_status() {
    # run checks to determine if the service is running or use generic status
    status $prog
}

rh_status_q() {
    rh_status >/dev/null 2>&1
}


case "$1" in
    start)
        rh_status_q && exit 0
        $1
        ;;
    stop)
        rh_status_q || exit 0
        $1
        ;;
    restart)
        $1
        ;;
    reload)
        rh_status_q || exit 7
        $1
        ;;
    force-reload)
        force_reload
        ;;
    status)
        rh_status
        ;;
    condrestart|try-restart)
        rh_status_q || exit 0
        restart
        ;;
    *)
        echo $"Usage: $0 {start|stop|status|restart|condrestart|try-restart|reload|force-reload}"
        exit 2
esac
exit $?
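
Once the script is saved as /etc/init.d/myapp (the name is just an example), it can be enabled and tested with the standard RHEL/CentOS tools:

# chmod +x /etc/init.d/myapp
# chkconfig --add myapp
# chkconfig myapp on
# service myapp start
# service myapp status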

Tuesday, August 14, 2012

Sanlock patch accepted

My sanlock patch has been accepted by the libvirt maintainers today.  It will be included in libvirt 0.10.0.  It fixes the problem where all sanlock resources are released when hot-detaching a disk from a qemu/kvm domain, leaving the other (disk) resources unlocked/unprotected.

Meanwhile, this situation can be recovered from through the sanlock client by re-registering the assigned disks, or avoided altogether by applying the patch.
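
To see which resources sanlock currently holds on the host (a useful starting point before re-registering anything), the client status can be inspected:

# sanlock client status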

Monday, August 06, 2012

Open vSwitch bandwidth throttling

Bandwidth throttling is pretty easy with Open vSwitch for a guest's outgoing traffic (ingress from the switch's point of view).  Configure the ingress policy on the port interface of the specific virtual machine.  To limit outgoing bandwidth to 100 Mbit/s:

# ovs-vsctl set Interface vnet0 ingress_policing_rate=100000
# ovs-vsctl set Interface vnet0 ingress_policing_burst=10000
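
To verify what is currently configured on the interface (vnet0 is just the example port used above):

# ovs-vsctl list interface vnet0 | grep ingress_policing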

The config can be tested with iperf by running the client on the VM, like:
# iperf -d -i 10 -c <destserver> -t 60
[ 3]  0.0-10.0 sec   124 MBytes   104 Mbits/sec
[ 3] 10.0-20.0 sec   118 MBytes  99.1 Mbits/sec
[ 3] 20.0-30.0 sec   116 MBytes  97.1 Mbits/sec
[ 3] 30.0-40.0 sec   117 MBytes  98.1 Mbits/sec
[ 3] 40.0-50.0 sec   116 MBytes  97.7 Mbits/sec
[ 3] 50.0-60.0 sec   118 MBytes  99.2 Mbits/sec
[ 3]  0.0-60.2 sec   710 MBytes  98.9 Mbits/sec

To reset the bandwidth to full speed:

# ovs-vsctl set Interface vnet0 ingress_policing_rate=0


Thursday, July 19, 2012

Hot-detach disks with sanlock

Qemu/kvm allows virtual disks (logical volumes, files, ...) to be attached to and detached from a running domain, and it works great (with virtio).  However, when a lock manager is in the game to protect your virtual disks from being assigned to different domains, you might be surprised when you end up losing all your disk locks for that virtual machine.
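
For reference, a hot-detach of a disk from a running domain typically looks like this (the domain name and target device are just examples):

# virsh detach-disk <domain name> vdb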

What's going on?

Libvirt has a plugin for the sanlock lock manager, which protects your virtual disks from being corrupted when they are accessed from multiple guests.  It works nicely, but hot-detaching has a flaw: the current libvirt code will release all sanlock resources (read: when removing 1 disk, protection for all disks gets lost)!

I wrote a patch to release only the specific resource that you want to hot-detach.  It can be found in the bug report.  The patch has not yet been reviewed or approved by the libvirt devs, but for me it works as expected, and it may help others who depend on it...

Wednesday, July 18, 2012

Open vSwitch active-passive failover - unreachable guests

The current release of Open vSwitch (1.6.1) does not send learning packets when doing an active-passive bond failover. Switches connected to your network interfaces will not know about the network change when LACP is not used. Result: all your virtual machines become unreachable until your guests send out packets that update the MAC learning table of the uplink switches, or until the entry expires from that table.

The next release (1.7?) will include a patch to send learning packets when a failover happens. I tested the patch by doing a manual failover on a host with the bond interfaces connected to two different switches:

# ovs-appctl bond/show bond0
# ovs-appctl bond/set-active-slave bond0 eth1

Hooray! Not a single interruption in guest connectivity... like it should be :-)

SUSE KVM guest 100% cpu usage - lost network - wrong date

There is something wrong with the "kvm-clock" paravirtual clocksource for KVM guests when running the 2.6.32.12-0.7-default kernel of SLES 11 SP1.

Several times now I have encountered unreachable virtual machines (lost network) and 100% cpu usage of these guests as seen from the host. When logging in to the guest console for further debugging, I found:

  • wrong date, like Sun Feb   5 08:08:16 CET 2597
  • in dmesg: CE: lapic increasing min_delta_ns to 18150080681095805944 nsec

The fix is simple: update to a more recent kernel in SLES 11 SP1, like 2.6.32.54-0.3.1, which apparently provides a stable kvm-clock module.
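
To check which clocksource the guest is actually using (and which ones are available):

# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
# cat /sys/devices/system/clocksource/clocksource0/available_clocksource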

As a side note: I'm using ntpd on the guests.  Some resources say you should, others say the opposite.  My experience is that with live migrations, clock drift may appear that is not corrected at all, or only very slowly.  Ntpd handles this correctly.

Tuesday, July 17, 2012

KVM Live migration of memory intensive guests

Recently I've had some trouble live-migrating a memory-intensive (jboss) application on a KVM virtual machine. The online migration of the VM to another KVM host took ages (read: it failed after 1.5h) while a jvm with a 4GB heap (1GB young generation) was constantly refilling the memory.

I got around this by increasing the migrate-setmaxdowntime value on the host while the domain was migrating away:

# virsh migrate-setmaxdowntime <domain name> 750

This allows the domain to be paused for 750ms while the remaining memory is synced to the new host. Smaller values can be tried first and changed on the fly...
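
While the migration is running, its progress (memory remaining, and so on) can be followed on the source host with:

# virsh domjobinfo <domain name>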

This behavior can also be simulated by using the "stress" utility on the guest:

# stress --cpu 1 --vm 8 --vm-bytes 128M --vm-hang 1 --timeout 900s

If it takes too long, increase the maxdowntime parameter (or increase your network bandwidth). It can also be worthwhile to check with a utility like "iftop" whether the migration process really takes advantage of all the available bandwidth. If needed, raise the migration speed limit to the available bandwidth (the value is in MiB/s):

# virsh migrate-setspeed <domain name> 1000
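
Watching the migration traffic on the host with iftop shows whether the link is actually saturated (the interface name is just an example):

# iftop -i eth0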

After all, sometimes it's more acceptable to have a small hiccup than to fail all the way and have to do an offline migration. As long as your application can live with it...