networked: day-to-day technical issues


A Tool to Backup Files to Amazon S3

For the past year I've been working on and off on a little project to create a tool which:

  • runs on at least Linux, MacOS X and FreeBSD
  • lets you back up your files to Amazon S3, with optional server-side encryption (AES-256)
  • is cost effective for large numbers of files (the problem with tools like s3cmd or aws s3 sync is that they compare local files against metadata retrieved on the fly from AWS, and this can get expensive)
  • is easy to install
  • provides meaningful error messages and the possibility to debug

I ended up creating a tool called S3backuptool (yeah, not that original) which does all of the above; to run it requires Python 2.7, PyCrypto and the Boto library.

Details are available on the project's page and it can be installed from prebuilt packages (deb or rpm) for several Linux distributions, or from Python's PyPI for far more Linux distributions and OSes.

So far it's been quite the educational enterprise while also catering to my needs.

Metadata about all backed up files is stored locally in SQLite database(s) and in S3 as metadata on each uploaded file. When a backup job runs, it compares the current state of the files with the state stored in the local SQLite database(s); only when action is needed on S3 are actual S3 API calls performed (those cost money). If the local SQLite databases are lost, they can be reconstructed from the metadata stored in S3.
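The decision logic can be sketched in a few lines of shell. This is an illustration of the approach only, not S3backuptool's actual code: a sorted plain-text state file stands in for the SQLite database, and `list_changed` merely prints the files that would need an S3 upload.

```shell
#!/bin/sh
# Sketch: decide locally which files need uploading, without any S3 calls.
# A sorted "path|mtime|size" state file stands in for the SQLite database.
list_changed() {
    dir=$1; state=$2
    # Snapshot the current state of the tree (GNU find).
    find "$dir" -type f -printf '%p|%T@|%s\n' | sort > "$state.new"
    [ -f "$state" ] || : > "$state"
    # Lines present only in the new snapshot are new or modified files;
    # only these would trigger S3 API calls (PUTs) in a real backup run.
    comm -13 "$state" "$state.new"
    # Commit the new state after a (hypothetically) successful upload pass.
    mv "$state.new" "$state"
}
```

Deleted files (lines present only in the old state) could be detected the same way with comm -23.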


What to monitor on a (Linux) server

It is surprising how many articles are out there about server monitoring, each describing how to use a specific tool, and how few sources document what you actually need to monitor from a best-practices point of view.
A well-monitored server lets you fix potential issues proactively, or resolve service interruptions much faster because the problem can be located sooner.

So here goes my list of things I always monitor, regardless of the server's specific purpose.

  • hardware status - whether fans are spinning, CPU temperature, mainboard temperature, environment temperature, physical memory status, power supply status, CPUs online. Most of the well-known vendors (Dell, HP, IBM) provide tools to check the hardware for the above list of items
  • disk drive S.M.A.R.T. status - you can find out things like whether the HDD is starting to count bad blocks, or whether the bad blocks are increasing fast, which gives you a heads-up that you need to prepare to replace the disk. Most of the time you can also monitor the HDD's temperature
  • hardware RAID array status / software RAID status - you really want to know when an array is degraded. Unfortunately most organizations don't actually monitor this
  • file system space available - I start with a warning when usage is at 80% and a critical alarm when usage is above 90%. For big filesystems (>= 100 GB) this of course needs customizing, as 20% means at least 20 GB
  • inodes available on the file system - again I use the 80% warning, 90% critical thresholds. Running out of inodes isn't always obvious and can create a whole host of other problems. Of course this applies only to file systems with a finite number of inodes, like ext2/3/4
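As a sketch of the last two checks, here is how the 80%/90% thresholds could be wired to df output (the thresholds, message format and the check_pct helper are my own, not from any particular monitoring tool):

```shell
#!/bin/sh
# Evaluate one usage percentage against the warn/crit thresholds above.
check_pct() {
    name=$1; pct=$2
    if   [ "$pct" -ge 90 ]; then echo "CRITICAL: $name at ${pct}%"
    elif [ "$pct" -ge 80 ]; then echo "WARNING: $name at ${pct}%"
    else                         echo "OK: $name at ${pct}%"
    fi
}

# Space usage per filesystem (POSIX df output: column 5 = use%, 6 = mountpoint).
df -P | awk 'NR>1 {gsub("%","",$5); print $6, $5}' | while read -r fs pct; do
    check_pct "space on $fs" "$pct"
done
# Inode usage; skip filesystems that report "-" (no fixed inode count).
df -Pi | awk 'NR>1 && $5 != "-" {gsub("%","",$5); print $6, $5}' | while read -r fs pct; do
    check_pct "inodes on $fs" "$pct"
done
```

In a real setup the WARNING/CRITICAL lines would feed a notification system rather than stdout.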

KSM (Kernel Samepage Merging) status

KSM allows physical memory de-duplication in Linux, so basically you can get a lot more out of your memory at the expense of some CPU usage (there is a kernel thread which scans memory for duplicate pages). Typical usage is for servers running virtual machines on top of KVM, but applications aware of this capability can also use it, even on OS instances which aren't VMs running on KVM.
The requirements are a kernel version of at least 2.6.32 and CONFIG_KSM=y. For more details you can check the official documentation and a tutorial on how to enable it.

Below is a small script (called ksm_stat) which I wrote in order to see how much memory is "shared" and how much memory is actually being saved by using this feature.

#!/bin/bash
# ksm_stat - show how much memory KSM is sharing and saving
if [ "$(cat /sys/kernel/mm/ksm/run)" -ne 1 ] ; then
        echo 'KSM is not enabled. Run "echo 1 > /sys/kernel/mm/ksm/run" to enable it.'
        exit 1
fi
echo "Shared memory is $(( $(cat /sys/kernel/mm/ksm/pages_shared) * $(getconf PAGE_SIZE) / 1024 / 1024 )) MB"
echo "Saved memory is $(( $(cat /sys/kernel/mm/ksm/pages_sharing) * $(getconf PAGE_SIZE) / 1024 / 1024 )) MB"
if ! type bc &>/dev/null ; then
        echo "bc is missing or not in path, skipping ratio calculation"
        exit 1
fi
if [ "$(cat /sys/kernel/mm/ksm/pages_sharing)" -ne 0 ] ; then
        echo -n "Shared pages usage ratio is "; echo "scale=2;$(cat /sys/kernel/mm/ksm/pages_sharing)/$(cat /sys/kernel/mm/ksm/pages_shared)" | bc -q
        echo -n "Unshared pages usage ratio is "; echo "scale=2;$(cat /sys/kernel/mm/ksm/pages_unshared)/$(cat /sys/kernel/mm/ksm/pages_sharing)" | bc -q
fi

Example output from a machine where KSM has just been enabled (it takes a while until all pages are scanned):

# ksm_stat
Shared memory is 67 MB
Saved memory is 328 MB
Shared pages usage ratio is 4.87
Unshared pages usage ratio is 17.04


upstart (System-V init replacement on Ubuntu) tips

Since Ubuntu Server 10.04 LTS (Lucid), Canonical's System-V init replacement Upstart has most of the init scripts converted to Upstart jobs. Upstart is event-based and quite different from SysV init, so one needs to adjust to its config file structure and terminology. It has actually been present in the server release since 8.04 LTS, but back then the init scripts were not converted to its format, so on the server release it didn't really matter that it had taken over SysV init.

Reading the documentation is mandatory, but here are some quick tips for things I at least found difficult to discover on the project's website or in the man pages:

The default runlevel is defined in /etc/init/rc-sysinit.conf and of course it can be overridden on the kernel command line. /etc/inittab is gone and everything has moved to /etc/init/, while legacy init scripts (= not yet converted to Upstart format) can still be found in /etc/init.d/ together with symlinks to converted init jobs.

Managing jobs:  initctl start <job> / initctl stop <job> / initctl restart <job> / initctl reload <job>  ; Listing all jobs and their status: initctl list

Now here comes the horror story: there seems to be no CLI tool which lists what Upstart jobs will start in a particular runlevel, or better, what Upstart and /etc/rc*.d jobs will start in a runlevel. There are two GUI tools (jobs-admin and Boot-Up Manager) but no CLI equivalents, so you are left using things like sysv-rc-conf / chkconfig / update-rc.d for the legacy System-V style /etc/rc*.d folders, while for Upstart jobs you have to manually inspect the files in /etc/init/. That is cumbersome, as besides the runlevel entry you also need to take into account events/dependencies like net-device-up.
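Lacking such a tool, the closest CLI approximation I can suggest is grepping the start on stanzas out of the job files. This shows each job's start condition but does not resolve the events it depends on; the function and its default directory are my own sketch:

```shell
#!/bin/sh
# Print each Upstart job together with the first line of its "start on"
# stanza. Takes the jobs directory as an optional argument (default /etc/init).
list_start_on() {
    dir=${1:-/etc/init}
    for f in "$dir"/*.conf; do
        [ -e "$f" ] || continue
        printf '%s: ' "$(basename "$f" .conf)"
        grep -m1 '^start on' "$f" || echo '(no start on stanza)'
    done
}
list_start_on "$@"
```

Jobs whose condition is an event rather than a runlevel (e.g. start on net-device-up) still need to be traced by hand.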

It seems Canonical thinks that nowadays a server sysadmin must also install the GUI tools in order to manage basic things like which services start with the server.


Linux: realtime traffic monitoring and path determination

There are situations when one needs to answer questions like:

- a) - what application/process is listening for inbound connections
- b) - what application/process is causing network traffic
- c) - what hosts are currently exchanging traffic with our server
- d) - the current rate of traffic going through the network interfaces
- e) - how much traffic each workstation/server directly connected to the Linux server is generating
- f) - which path an outgoing packet will take when you have multiple network cards and several routes (and more than one routing table)
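A quick command-per-question cheat sheet (the tool choices are my own suggestions; nethogs and iftop are interactive, usually need installing separately, and the interface/subnet names below are placeholders):

```shell
#!/bin/sh
# a) processes listening for inbound connections
ss -tlnp                  # alternatives: netstat -tlnp, lsof -i -sTCP:LISTEN
# b) which process is generating traffic (interactive, needs root):
#      nethogs eth0
# c) which hosts are exchanging traffic with us right now (interactive):
#      iftop -i eth0
# d) current per-interface traffic rate: watch the byte/packet counter deltas
ip -s link
# e) traffic per directly connected host (interactive):
#      iftop -i eth0 -F 192.168.1.0/24
# f) which path an outgoing packet will take (replace 127.0.0.1 with the
#    real destination; this honors policy routing / multiple routing tables)
ip route get 127.0.0.1
```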


Visualize sar reports with awk and gnuplot

On several systems where only sar (part of sysstat) was collecting and storing performance data, I needed to troubleshoot performance issues which had occurred several hours earlier. Sar is a great tool, but it is annoying that it has no option to output, at the same time and on the same page, data from different reports (like CPU usage, memory usage and disk usage). If you request those three at the same time, it outputs each report on its own page, and from there it's hard to visualize how each performance indicator evolved at a specific point in time. A solution would have been to load the data into a spreadsheet application and use the VLOOKUP function to group it, but this is time consuming and, with my spreadsheet skills, I don't think it can be automated.

I used awk in order to create a report from sar output, choosing the fields I considered useful 95% of the time. Because my display is 900 pixels wide I managed to squeeze in a lot of fields. In order to get a report for the 18th from 10 AM to 6 PM I use:
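The original awk script isn't reproduced here; as a hedged sketch of the same idea, this helper pulls time/busy pairs out of sar -u output and turns them into something gnuplot can draw (field positions assume %idle is the last column, which holds for sar -u; the sa18 path is just an example):

```shell
#!/bin/sh
# cpu_busy: read "sar -u"-style lines on stdin, emit "time busy%" pairs.
# Data lines start with a timestamp; the last field of sar -u is %idle.
cpu_busy() {
    awk '/^[0-9]/ && $NF ~ /^[0-9.]+$/ { printf "%s %.2f\n", $1, 100 - $NF }'
}

# Typical use against the binary data file for the 18th:
#   sar -u -f /var/log/sa/sa18 -s 10:00:00 -e 18:00:00 | cpu_busy > cpu_busy.dat
#   gnuplot -e 'set xdata time; set timefmt "%H:%M:%S"; set terminal dumb;
#               plot "cpu_busy.dat" using 1:2 with lines title "cpu busy %"'
```

The same pattern (one awk filter per sar report, one data file per metric) lets gnuplot overlay CPU, memory and disk curves on a single page.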


The Confusion Between Gigabyte And Gibibyte

Recently I was confronted with several cases where a LUN of a certain size was configured on the SAN side and detected with a larger size on the Linux side. The SAN vendor's support team never replied as to why this happens, and after some digging I managed to find the explanation. I presumed it was some kind of issue with how each party (SAN storage array and Linux) measured the LUN, and it turned out to be so. The Linux in question is RHEL 5.5 64-bit, kernel 2.6.18.

It turns out that on the SAN side the size was presented in gibibytes, while Linux presented the size in gigabytes.

1 Gigabyte (GB) = 1 x 1000 x 1000 x 1000 = 1000000000 bytes

1 Gibibyte (GiB) = 2^30 = 1 x 1024 x 1024 x 1024 = 1073741824 bytes

So a 500 GiB LUN was reported on Linux as ~536.8 GB
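The arithmetic can be verified directly in the shell (integer math, truncated to one decimal place):

```shell
#!/bin/sh
# 500 GiB expressed in bytes, then in decimal gigabytes.
bytes=$((500 * 1024 * 1024 * 1024))
echo "$bytes bytes"    # -> 536870912000 bytes
echo "$((bytes / 1000000000)).$(( (bytes % 1000000000) / 100000000 )) GB"    # -> 536.8 GB
```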

Unfortunately this confusion will haunt us for a long time.

More details on Wikipedia and in man 7 units.


Disabling swap partitions which have swapped-out pages

Due to some circumstances I needed to expand a swap device sitting on top of a logical volume (LVM). Nothing spectacular, and in this case I didn't want to add another swap device, as there was enough free physical memory to hold all the pages which were going to be moved out of swap.

This got me thinking about what would happen if, when running swapoff, there wasn't enough free physical memory available to hold the pages from swap. As I saw it, there were two options:

  • swapoff will refuse to disable swapping from the device and return an error
  • swapoff will start moving pages into free physical memory and, once that is full, the OOM killer will be invoked to wreak havoc in the system

As I didn't want to find out the hard way, I did a test on RHEL 5.5 (CentOS 5.5) running kernel 2.6.18, and it turns out that swapoff refuses to disable swapping on that device.

If anyone is interested in disabling swap: to be sure there is enough space available, the disk cache, dentries and inodes can be flushed (though the kernel will do this for you when running swapoff, flushing part or all of those) by doing echo 3 > /proc/sys/vm/drop_caches. Take into account that flushing the whole disk cache will affect your machine's performance to some extent.
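A rough pre-flight check before running swapoff could compare the amount currently swapped out against free plus cached memory. This is a sketch with my own (crude) margin, and the device name in the comment is hypothetical:

```shell
#!/bin/sh
# Can free + cached memory absorb what is currently swapped out?
swap_used_kb=$(awk '/^SwapTotal/ {t=$2} /^SwapFree/ {f=$2} END {print t-f}' /proc/meminfo)
avail_kb=$(awk '/^MemFree/ {m=$2} /^Cached:/ {c=$2} END {print m+c}' /proc/meminfo)
if [ "$swap_used_kb" -lt "$avail_kb" ]; then
    echo "Looks safe: ${swap_used_kb} kB swapped out, ${avail_kb} kB free+cached"
    # swapoff /dev/VolGroup00/LogVol01    # hypothetical swap LV
else
    echo "Not enough headroom, swapoff would likely be refused"
fi
```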

Later edit: on one occasion, on a RHEL 5.5 server, the above echo cleared the cache as expected but the echo (bash process) never returned and started consuming 99% CPU (on one core) in system calls; it was not killable (kill -9) and stayed in the running state (not waiting for I/O). No errors were reported by dmesg and eventually the server was rebooted, so it seems that in some circumstances this might yield unexpected results, probably a bug.


Linux – disk caching and swap usage

The Linux kernel allocates unused physical memory (RAM) to disk caching in order to improve performance; how much has been allocated can be seen with the command free (or top, among others).

The memory used for caching is available on demand, so if the kernel needs it, it will flush some (or all) of the cache depending on need and allocate that memory to processes.

The kernel may decide to swap out unused pages even while memory is allocated to caching (instead of clearing some of the cache): it makes no sense to keep in memory data which hasn't been accessed in a long while. That data can be swapped out without affecting performance, while the freed memory is far more useful (performance-wise) for disk caching.

We can force a cache flush (echo 1 > /proc/sys/vm/drop_caches) and even adjust the swappiness kernel parameter.

vm.swappiness is a tunable kernel parameter that controls how much the kernel favors swap over RAM. A high swappiness value means the kernel will be more apt to unmap mapped pages; a low value means the opposite. In other words, the higher the vm.swappiness value, the more the system will swap. The default value is 60.
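Reading and adjusting it looks like this (the value 10 is just an example for a workload that should prefer keeping pages in RAM; setting it needs root):

```shell
#!/bin/sh
# Current value (0-100):
cat /proc/sys/vm/swappiness
# Change it until the next reboot (needs root):
#   sysctl -w vm.swappiness=10
# Make it persistent across reboots:
#   echo 'vm.swappiness = 10' >> /etc/sysctl.conf
```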

One of the kernel developers, Andrew Morton, advises:

My point is that decreasing the tendency of the kernel to swap stuff out is wrong.  You really don't want hundreds of megabytes of BloatyApp's untouched memory floating about in the machine.  Get it out on the disk, use the memory for something useful.

If you have a portion of swap in use and also a lot of memory allocated to disk caching, you can double-check that the kernel is working as explained above by running vmstat and inspecting the si and so columns. Vmstat's man page states:

"si: Amount of memory swapped in from disk (/s).
so: Amount of memory swapped to disk (/s)."

Now if those two show little to no activity, then it is clear that the swapped-out pages are not being used by the processes owning them, and there was no point in keeping them in physical memory.


Linux LVM snapshot merge

Finally, one of the features I have been waiting on for a very long time is available.
Today I was going over the release notes for Red Hat Enterprise Linux 6 and noticed that LVM snapshot merging is available.
This opens up the possibility of reverting changes like package upgrades (failed or successful), which until now I had to do using virtual machine snapshots or other disk/file system copy mechanisms.

Snapshot merging is accomplished with lvconvert and its --merge flag. The man page states:

--merge Merges a snapshot into its origin volume. If both the origin and snapshot volume are not open the merge will start immediately. Otherwise, the merge will start the first time either the origin or snapshot are activated and both are closed. Merging a snapshot into an origin that cannot be closed, for example a root filesystem, is deferred until the next time the origin volume is activated. When merging starts, the resulting logical volume will have the origin’s name, minor number and UUID. While the merge is in progress, reads or writes to the origin appear as they were directed to the snapshot being merged. When the merge finishes, the merged snapshot is removed. Multiple snapshots may be specified on the commandline or a @tag may be used to specify multiple snapshots be merged to their respective origin.
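To make the workflow concrete, here is a sketch of the upgrade-and-revert sequence this enables. The volume group, LV name and snapshot size are hypothetical, and the function only prints the commands rather than running them (lvcreate/lvconvert need root and a real volume group):

```shell
#!/bin/sh
# Print the snapshot-based revert plan for an LV; nothing is executed.
plan_snapshot_revert() {
    vg=$1; lv=$2; size=$3
    cat <<EOF
lvcreate -s -n ${lv}_presnap -L $size /dev/$vg/$lv   # snapshot before the upgrade
# ... perform the package upgrade, test the result ...
lvconvert --merge /dev/$vg/${lv}_presnap             # revert: merge snapshot back
EOF
}
plan_snapshot_revert rootvg rootlv 5G
```

Per the man page quoted above, if the origin is a root filesystem that cannot be closed, the merge is deferred until the origin volume is next activated (typically a reboot).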