Archive for the 'Linux' Category

Profiling Memory using GNU glibc tools

One of the tools I have been using for quite a while to profile memory usage is the built-in profiling support in glibc. For some reason it isn't widely known, so I thought it makes sense to document it here; that way I can simply point people to this post instead of explaining everything again 😉

What does glibc support?

  • Detecting memory leaks
  • Printing a memory histogram
  • Plotting a graph of memory usage over time
  • Measuring not only heap usage, but also stack usage
  • Works not only on the PC, but also on embedded systems that use glibc

How does it work?

The functionality is implemented in a library called libmemusage.so, which
gets preloaded by the dynamic linker simply by setting the variable LD_PRELOAD=/lib64/libmemusage.so.
The path may of course vary depending on the system you use.

Example:

LD_PRELOAD=/lib64/libmemusage.so ./helloworld

You can configure where the profile output is stored by exporting the variable MEMUSAGE_OUTPUT=profile.dat.
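
For example, to write the profiling data for the hello world binary from above to a file named profile.dat (the file name is just an example):

MEMUSAGE_OUTPUT=profile.dat LD_PRELOAD=/lib64/libmemusage.so ./helloworld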

There is also a convenience wrapper script named memusage which does all this for you. A second program called memusagestat can generate nice graphics from the profiling data. Normally these scripts don't get installed with glibc and must be installed separately.

Gentoo: compile glibc with the 'gd' USE flag.
Debian: libc6-dbg contains /usr/lib/debug/lib/x86_64-linux-gnu/libmemusage.so, but the scripts are missing.
On other systems you may find a package called glibc-utils which contains the scripts. As a last resort you can download them from https://www.gnu.org/software/libc/

Now let's see this in action. For that I created a simple example application which allocates memory and creates one memory leak.

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int i;
    void *data[10];

    printf("Hello World\n");

    /* allocate 10 blocks of increasing size */
    for (i = 0; i < 10; ++i) {
        data[i] = malloc(i+10);
    }

    /* free only 9 of them: the last block is leaked on purpose */
    for (i = 0; i < 9; ++i) {
        free(data[i]);
    }

    return 0;
}

Compile it like this: gcc -o hello hello.c
And run it using memusage

$> memusage ./hello
Hello World

Memory usage summary: heap total: 1169, heap peak: 1169, stack peak: 656
         total calls   total memory   failed calls
 malloc|         11           1169              0
realloc|          0              0              0  (nomove:0, dec:0, free:0)
 calloc|          0              0              0
   free|          9            126
Histogram for block sizes:
    0-15              6  54% ==================================================
   16-31              4  36% =================================
 1024-1039            1   9% ========

As you can see there are 11 malloc calls, 10 from our code, 1 from printf,
and only 9 frees, so we have found a memory leak.
We can also see the memory distribution in the histogram and in the summary
we can see the heap and stack peak values.

Now let's do the same thing again, this time plotting a memory usage chart.

$> memusage -d profile.dat ./hello
$> memusagestat -o profile.png profile.dat

This creates the following graphics:

Profile

Of course this is a little boring for such a simple application, but it is very useful in bigger applications. For detecting memory leaks there are better solutions available, like Valgrind, but this chart is very useful for seeing how much memory an application needs at what time, even when no memory leaks exist. Consistently growing memory usage, for example, would be a problem.

How to use this on embedded devices?

The library libmemusage.so also exists in most cross-compiler toolchains that use glibc. You can SSH into the embedded system and use the LD_PRELOAD approach described above, even though the convenience scripts are usually not available there. Then you copy the results back to your PC using SSH and generate the plot on the PC using memusagestat.
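
A rough sketch of that workflow could look like this (host name, paths, and binary name are just placeholders; the library path depends on your toolchain and root filesystem):

# on the target (e.g. logged in via ssh root@target)
MEMUSAGE_OUTPUT=/tmp/profile.dat LD_PRELOAD=/lib/libmemusage.so ./myapp
# back on the PC: fetch the data and generate the chart
scp root@target:/tmp/profile.dat .
memusagestat -o profile.png profile.dat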

Latency Heatmaps

Latency heatmaps are a great way to visualize latencies which are hard to grasp from raw test data. Brendan Gregg (http://brendangregg.com) has written a great Perl script for generating such heatmaps as interactive SVG graphics. His flame graphs are just awesome too, but that is another story.

(Unfortunately SVGs are not allowed on WordPress, so I converted them to PNG for this blog.)

Latency Heatmap

Just recently I used such heatmaps to visualize the accuracy of our OPC UA Server SDKs, so I'll use this opportunity to blog about it.

I used a Python test tool to measure the sampling rate using the OPC UA timestamps.
It outputs a simple list of integer values [µs since the UNIX epoch].

1477661743711872
1477661743761997
1477661743811750
1477661743861417
1477661743912030
...

But for generating a heatmap you need input like this:

# time              latency
1477661743761997    50125
1477661743811750    49753
1477661743861417    49667
1477661743912030    50613

Normally, when measuring services like UA Read and UA Write, I have both values: the time of the measurement (when the request is sent) and the latency (the time until I get the response from the server). This time, when measuring the sampling rate of UA monitored items, it is a little different: I only get the timestamps at which the data was sampled, and I don't care when I received the data. So I compute the latency information as the difference between two consecutive sample points.

This can simply be computed using a few lines of awk script:

BEGIN { last = 0; }
{
    # only process lines containing a timestamp
    if (/[0-9]+/) {
        if (last == 0) {
            # first sample: just remember the timestamp
            last = $1;
        } else {
            # latency = difference between two consecutive samples
            latency = $1 - last;
            last = $1;
            printf "%u\t%u\n", $1, latency;
        }
    }
}

The result can then be fed into Brendan's trace2heatmap.pl Perl script.

The whole process of measuring and generating the SVG is put into a simple Bash script which does the following:
1.) Call the Python UA test client
2.) Call the awk script to prepare the data for trace2heatmap.pl
3.) Call trace2heatmap.pl to generate the SVG

This also shows the power of Linux command-line tools like Bash, awk, and Perl. I love how seamlessly these tools work together.

Excerpt of this Bash script:

...
# do measurement
echo "Starting measurement for 10s..."
if [ $PRINT_ONLY -eq 0 ]; then
    ./test.py $URL subscription >log.txt || { cat log.txt; exit 1; }
fi
echo "Done."
# compute latency based on source timestamps
echo "Computing latency data using awk..."
awk -f latency.awk log.txt >latency.txt || exit 1
# generate heatmap
echo "Generating heatmap..."
./trace2heatmap.pl --stepsec=0.1 --unitstime=us --unitslatency=us --grid --minlat=$MINLAT --maxlat=$MAXLAT --reflat=$REFLAT --title="$TITLE" latency.txt > $SVGFILE || exit 1
echo "Done. Open $SVGFILE in your browser."

I used this to measure at a 50 ms sampling rate, once on Windows 10 and once on Linux.
The results are quite different.

Windows 10 measurement:
Latency Heatmap

It is interesting to see that we are quite far away from the configured 50 ms sampling interval. The reason for this is that our software uses software timers for sampling that are derived from the Windows GetTickCount() API function. Its resolution is quite bad, about 15-16 ms. Maybe this could be improved using QueryPerformanceCounter.
See also https://randomascii.wordpress.com/2013/05/09/timegettime-versus-gettickcount/

Linux measurement: (Linux ws-gergap 4.4.6-gentoo)
Latency Heatmap

On Linux we use clock_gettime() to replicate the Windows GetTickCount() functionality, and this works much better. We also don't get such runaway measurement results due to scheduling delays, even though it's a standard Linux kernel without real-time extensions. Linux does a pretty good job.

Note that both graphics above use the same scale. When zooming in further into the Linux measurement we notice another phenomenon:
Latency Heatmap
You can see two lines in the measurement, exactly 1 ms apart. The reason for this is that our platform abstraction layer has a tickcount() function which is modelled after the Win32 API, which means it uses ms units. This in turn means our software cannot create more accurate timer events, even though Linux itself would be able to handle them.

We should think about changing this to µs to get better accuracy, and maybe QueryPerformanceCounter can also fix the problem on Windows. But for the moment we are happy with the results, as they are already much better than on Windows.
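
For reference, a microsecond tick function on Linux could look roughly like the sketch below. This is just an illustration, not our actual SDK code; the function name is made up.

#include <stdint.h>
#include <time.h>

/* hypothetical µs-resolution tick counter based on CLOCK_MONOTONIC */
uint64_t tickcount_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)(ts.tv_nsec / 1000);
}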

Second note: I modified trace2heatmap.pl a little to also show the configured sampling rate (red line). This way it is easier to see how far the measured timestamps are from the configured sampling rate. The Perl script is really easy to understand, so custom modifications are a no-brainer.

If somebody is interested in these scripts, just leave me a comment and I will put them on GitHub.

Thanks to Brendan for this script and his great book and website.

Using Let’s encrypt for non-web servers

I think I don't need to explain Let's Encrypt anymore. But what many people struggle with is using Let's Encrypt certificates for other services like SMTP, IMAP, IRC, etc.

Using certbot this is quite easy (see https://certbot.eff.org for installation instructions).

Once certbot is installed you can use it in standalone mode. This means it starts a built-in web server which is used for the authentication process and is stopped again a few seconds later.

To keep it short, here is an example command to create a new certificate for your mail server:

./certbot-auto certonly --standalone -d smtp.yourdomain.com -d imap.yourdomain.com

Of course the standalone web server must be reachable from the internet, so ensure that no firewall is blocking port 443 (HTTPS). In my case I have a firewall running, so I need to temporarily enable HTTPS. Certbot supports this with the --pre-hook and --post-hook options.

./certbot-auto certonly --standalone --pre-hook /root/enable_https --post-hook /root/disable_https -d smtp.domain.com -d imap.domain.com

The example hook scripts below insert a firewall rule for HTTPS and remove it again. These again are just examples that you need to adapt to your needs.

enable_https:

#!/bin/bash
IPTABLES=/sbin/iptables
$IPTABLES -I INPUT 8 -i eth0 -p tcp -m conntrack --ctstate NEW -m tcp --dport 443 -j ACCEPT

disable_https:

#!/bin/bash
IPTABLES=/sbin/iptables
$IPTABLES -D INPUT -i eth0 -p tcp -m conntrack --ctstate NEW -m tcp --dport 443 -j ACCEPT

See iptables(8) for more information on this.

Renewing is also easy. By default the “certbot-auto renew” command will renew all certificates with the same options. Only the hooks must be given again on the command line.

./certbot-auto renew --pre-hook /root/enable_https --post-hook /root/disable_https

It is recommended to run this twice a day. Certbot will only really renew a certificate when it is about to expire. To automate this you can create a cron job.

I used the template which gets created when installing certbot on Debian Jessie.

/etc/cron.d/certbot:

# /etc/cron.d/certbot: crontab entries for the certbot package
#
# Upstream recommends attempting renewal twice a day
#
# Eventually, this will be an opportunity to validate certificates
# haven't been revoked, etc.  Renewal will only occur if expiration
# is within 30 days.
SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

# twice a day
0 */12 * * * root test -x /root/certbot-auto && perl -e 'sleep int(rand(3600))' && /root/certbot-auto -q renew --pre-hook /root/enable_https --post-hook /root/disable_https

Finally you need to update your mail server configuration to use the new certificates. Let's Encrypt stores the currently active certificate in /etc/letsencrypt/live/<your domain>/. This folder contains only symlinks to /etc/letsencrypt/archive/<your domain>/, which holds the real certificate files, keys, chains, etc.

In my case I simply edited /etc/postfix/main.cf and /etc/imapd.conf to use these new files (see the sketch at the end of this post).
Cyrus was not able to access the files by default, because the default file permissions prevented it from accessing the Let's Encrypt folders.
I fixed this by giving Cyrus read-only access using the ssl-cert group.

chgrp -R ssl-cert /etc/letsencrypt/live /etc/letsencrypt/archive
chmod 750 /etc/letsencrypt/live /etc/letsencrypt/archive
usermod -a -G ssl-cert cyrus
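
For completeness, the relevant Postfix settings in /etc/postfix/main.cf could look roughly like this (a sketch only; the domain is a placeholder and your setup may differ):

smtpd_tls_cert_file = /etc/letsencrypt/live/smtp.yourdomain.com/fullchain.pem
smtpd_tls_key_file  = /etc/letsencrypt/live/smtp.yourdomain.com/privkey.pem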

Building MinGW Cross-Compilation Toolchain using CrossDev

This is mainly a note to myself. Maybe it’s useful for you too.

Actually building cross-compiler toolchains using crossdev is easy, but there are some pitfalls.

  • Remove any compiler environment variables before building, or the build will fail
  • For building MinGW toolchains the openmp USE flag must not be set
  • If you have already partially built a toolchain with wrong settings, remove it completely before trying again

Simple Build Instructions for building mingw32 and gdb:

# become root
su
# clear env (clearing CXX should not be necessary, but it doesn't hurt)
export CC=
export CXX=
export CFLAGS=
export CXXFLAGS=
export LDFLAGS=
# build using crossdev
crossdev --ex-gdb -t i686-pc-mingw32

For a 64-bit toolchain use x86_64-w64-mingw32 instead of i686-pc-mingw32.
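
For example (same procedure as above, just a different target tuple):

crossdev --ex-gdb -t x86_64-w64-mingw32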

Uninstall: crossdev -C i686-pc-mingw32

See https://wiki.gentoo.org/wiki/Mingw for more information.

Updated Vim Configuration on Github

Last weekend I played around with tmux and powerline (which is awesome, by the way), and while experimenting with these tools I also found some new features for my Vim configuration.

Now I have the following improvements:

  • Replaced omnicomplete with clang_complete. This way you don't need to maintain a ctags database. clang_complete analyzes the code in the background using clang, so the autocomplete information is always up to date.
  • Using a better header/source switcher: a.vim. Now it's easy to switch between .c/.h or .cpp/.h files without needing to change any config.
  • Added vim-airline, a lightweight powerline alternative for Vim
  • Added some abbreviations to autocomplete “#i” -> “#include <>”, for-loops, etc.
  • Added a macro for surrounding shell or CMake variables. Just place the cursor on a word and press @s (in normal mode). A variable “FOO” becomes “${FOO}” this way.
  • Added a macro for implementing forward declarations. See the example below.
Example: You have a header file containing this:
void foo();
int bla(int x);
double fft(char *data, int len);
You copy/paste this into your .cpp file, place the cursor on the first line, and type “3@i” (implement 3 functions) to get this:
void foo()
{
}
int bla(int x)
{
}
double fft(char *data, int len)
{
}
To make things easy for you, I put my complete Vim configuration on GitHub.

Optimizing speed in KVM image synchronization using rsync

As an addition to my last post, I think it might be useful to explain how to efficiently copy KVM images over the network.

In https://gergap.wordpress.com/2013/08/10/rsync-and-sparse-files I explained how to efficiently handle sparse files. But what I didn’t mention is how to get the best transfer rates.

By default rsync uses SSH, which encrypts all traffic and gives you a good amount of security. On a local Gigabit LAN, however, the CPU becomes the bottleneck due to the encryption.

Rsync also has a solution for that problem. You can start the rsync daemon on one side and use the rsync tool on the other side with the “rsync://user@host/share” syntax to transfer the data. This way rsync uses its own efficient protocol (port 873) without encryption.
I am able to achieve transfer rates of over 100 MB/s in my local LAN this way.
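
For example, pulling an image from a host running the daemon could look like this (host, share, and paths are just placeholders; the share name matches the configuration below, and you can combine this with --sparse or --inplace as described in my previous post):

rsync -av rsync://vmhost/backups/myserver.img /backups/vmhost/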

Rsync daemon configuration:
/etc/rsyncd.conf:
max connections = 1
log file = /var/log/rsync.log
timeout = 300

[backups]
path = /root/backups
comment = Backup images
read only = yes
list = yes
uid = root
gid = root
hosts allow = backupsserver
hosts deny = all

On my backup server I extended my backup script to start the rsync daemon on the other side, run the rsync operation, and then kill the rsync process again.
This way the rsync port (873) is only open for a short time.
The “hosts allow” directive prevents connections from untrusted computers.
In addition it is possible to use “auth users = ” to set up password authentication. See rsyncd.conf(5) for details.
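
For password authentication the share section could be extended roughly like this (user name and secrets file path are just examples):

auth users = backup
secrets file = /etc/rsyncd.secrets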

A small excerpt of my backup script:

...
echo "starting remote rsync daemon for faster backups" >> /tmp/rsync_script_tmpfile
ssh $HOSTNAME rsync --daemon >> /tmp/rsync_script_tmpfile
echo "backing up $BACKUPDIR" >> /tmp/rsync_script_tmpfile
rsync $OPTIONS rsync://$HOSTNAME/$BACKUPDIR $ARCHIVEROOT/$HOSTNAME >> /tmp/rsync_script_tmpfile
echo "stopping remote rsync daemon" >> /tmp/rsync_script_tmpfile
ssh $HOSTNAME killall rsync >> /tmp/rsync_script_tmpfile
...

Rsync and sparse files

Sparse files are a great feature of Linux filesystems. They come in very handy when working with virtualization technologies like KVM. You don't need to think long about how big to make a VM disk; just create a disk which is definitely big enough (I normally use 20 GB for my Linux-based servers). If only 1 GB is used, the file occupies only that amount of physical disk space and not the whole 20 GB.

QEMU already creates sparse files by default when using raw images.
Example: qemu-img create myserver.img 20G
When adding the “s” option to the ls command you see the actually allocated size in the first column.

ls -lhs
realsize                           virtualsize
0 -rw-r--r-- 1 gergap gergap 20.0G Aug 10 11:27 myserver.img

However, these sparse files are a problem when copying them, especially when you need to move a disk image to another machine over the network.

Local copies: When copying files locally with tools that are not aware of sparse files, the whole 20 GB will be copied. It may sound strange, but that's the desired behavior. A 20 GB sparse file should look like a normal file to applications, so they see the complete 20 GB, even though most of the data is just zeros.

Luckily the “cp” command is aware of sparse files and will auto-detect if a source is a sparse file. The copy will then also become a sparse file and only the real data gets copied, which is much faster. If the source is not sparse you can use “cp --sparse=always source dest”; then the destination will become a sparse file.

Now let's come to network transfer. Most admins use rsync, which can copy a lot of files very quickly over SSH. rsync is very efficient at detecting which files have changed and only transmits those. So it's easy to keep e.g. an FTP mirror in sync with its source or to implement backup strategies.

KVM images are different. You don't have many files, but the files you have are huge sparse files. You don't want to transmit 20 GB over the network if only a few MB have changed in the disk image. Even transmitting 1 GB of actually used data takes quite a long time.

The solution is to use the “--inplace” option of rsync. This option only transmits the changed blocks of a file, not the whole file. The problem with “--inplace” is that it does not create sparse files.

But rsync can handle sparse files when passing the “--sparse” option. Unfortunately “--sparse” and “--inplace” cannot be used together.

Solution: When copying the file for the first time, i.e. when it does not yet exist on the target server, use “rsync --sparse”. This will create a sparse file on the target server and copy only the used data of the sparse file.

When the file already exists on the target server and you only want to update it, use “rsync --inplace”. This will only transmit the changed blocks and can also append to the existing sparse file.
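
Put together, the two cases could look like this (host and path names are just placeholders):

# first transfer: the image does not yet exist on the target
rsync -av --sparse myserver.img backupserver:/backups/
# later updates: only transmit the changed blocks
rsync -av --inplace myserver.img backupserver:/backups/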

I hope rsync will become smarter in the future and allow the combination of “--inplace --sparse”, or even auto-detect the best strategy. But for now we at least have a working solution.

I hope this post was helpful for understanding sparse files and rsync.