ANSI C String handling functions

String handling in portable ANSI C code still seems to be a challenging problem for many programmers.
ANSI C does not come with a huge function library like other modern programming languages. The functions that exist are pretty old and sometimes confusing. But ANSI C is still one of the most important languages because of its portability, performance and many other reasons.
(BTW: read what Linus Torvalds says about C and C++. I don’t 100% agree with Linus, but I understand his point and it’s always funny to read his posts… http://thread.gmane.org/gmane.comp.version-control.git/57643/focus=57918)
However, the main reason errors occur is that programmers often don’t read the manual, but learn from simple examples that don’t handle error scenarios. These simplified examples often give a wrong impression of how a function works (e.g. strncat).
Of course careful reading of the man pages always makes sense, but a better API could solve the problem in the first place.

I hope the little code story below can answer questions for beginners and confirm the knowledge of the experts, who of course knew all of this already … 😉

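/* The little string handling story ... */
#define _GNU_SOURCE   /* for mempcpy; must be defined before including <string.h> */
#include <string.h>
#include <alloca.h>   /* alloca is not ANSI C; this header exists on glibc and some other systems */

/* Prototypes of the fallback implementations defined further below.
 * Only needed where <string.h> does not already provide these functions. */
size_t strlcpy(char *dst, const char *src, size_t len);
size_t strlcat(char *dst, const char *src, size_t len);

int main(void)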
{
    char szBuf[10];
    size_t len; /* strlen returns size_t */

    /* mega mega wrong: strcpy/strcat are evil! Never use them. */
    strcpy(szBuf, "Hello World"); /* needs 12 bytes, szBuf has only 10: buffer overflow */
    strcat(szBuf, " Foo Bar");    /* writes even further past the end */

    /* n-functions (note the argument order: destination, source, n) */
    strncpy(szBuf, "Hello", 10);      /* OK, but do you really understand it? */
    strncpy(szBuf, "0123456789", 10); /* szBuf is not null-terminated anymore! */
    strncpy(szBuf, "Hello", 10);      /* OK, one more time */
    strncat(szBuf, " World", 10);     /* Buffer overflow!!! */
    /* The n specifies the maximum number of characters to copy from the source,
     * not the length of the destination buffer.
     * If strncpy reaches the limit, the destination string will NOT be zero-terminated.
     * strncat always terminates, but it appends up to n characters plus the terminator
     * behind the existing string, so n = 10 overflows the 10-byte buffer here.
     * Unterminated strings are a timebomb in C!!!
     */
    /* Correct usage of n-functions */
    strncpy(szBuf, "Hello", sizeof(szBuf)); /* OK, safe copy, but ... */
    szBuf[sizeof(szBuf)-1] = 0;             /* we must ensure that it is zero terminated */
    len = strlen(szBuf);                    /* now we can check how long the current string is */
    strncat(szBuf, " World", sizeof(szBuf)-len-1); /* and call strncat correctly. Safely truncated. */
    szBuf[sizeof(szBuf)-1] = 0;             /* purely defensive: strncat always terminates the result */
    /* This looks complicated and error prone - yes it is! */
    /* Also read the man page of strncat: it behaves differently from strncpy in that it writes at most n+1 bytes,
     * because it always appends the nul-terminator. That's why we need sizeof(szBuf)-len-1, otherwise we would
     * create another off-by-one error. */

    /* But there is a better way: */
    /* l-functions */
    strlcpy(szBuf, "Hello", sizeof(szBuf));     /* OK */
    strlcat(szBuf, " World", sizeof(szBuf));    /* OK, safely truncated */
    /* This was simple, wasn't it?
     * The l-functions take the size of the destination buffer as the last argument
     * and guarantee that the resulting string is always zero terminated.
     * (Except when the existing string in szBuf is already unterminated before strlcat is called,
     *  but in this case something else went wrong already before calling strlcat)
     */

    /* Of course there is a catch. The l-functions are BSD functions and not ANSI C.
     * They don't exist in the Microsoft C runtime, and glibc only provides them since version 2.38.
     * But it is easy to write them on your own for systems where the functions do not exist.
     * Just see my example below.
     * Or on Linux you can also install libbsd, which provides these functions for Linux.
     */

    /* One more thing. If truncation is OK for you and performance doesn't matter,
     * the l-functions are a good thing.
     * If you need to handle or at least detect truncation, this is possible with the l-functions,
     * but in this case it's better to calculate the lengths of the strings and use memcpy.
     * It's really simple, memcpy does what you expect, and it's much faster than any string function
     * that must check every byte for a null-terminator. BTW, truncation can easily be avoided by
     * allocating the buffer with alloca (not ANSI C either, and only suitable for small strings
     * because the memory comes from the stack). Here is an example: */

    char a[] = "Hello ";
    char b[] = "World";
    char *tmp;

    tmp = alloca(strlen(a)+strlen(b)+1);  /* allocate the string on the stack */
    memcpy(tmp, a, strlen(a));            /* copy the 1st part */
    memcpy(tmp+strlen(a), b, strlen(b)+1);/* copy the 2nd part including the NUL-terminator */
    /* Note that compilers can often compute strlen() of constant strings at compile time.
     * For other strings you should store the length in a helper variable to avoid multiple
     * strlen() calls for the same string. */

    /* When defining _GNU_SOURCE before including <string.h> you get another useful function:
     * mempcpy. This function behaves like memcpy, but returns a pointer to dst+n instead of dst.
     * This way the output of mempcpy can be used as input for the next mem(p)cpy call. */
    tmp = alloca(strlen(a)+strlen(b)+1);
    memcpy(mempcpy(tmp, a, strlen(a)), b, strlen(b)+1);
    /* Is this cool or what? Much shorter than any str-function, and much faster!!! */
}
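
The comments above recommend calculating the lengths once and using memcpy when truncation must be detected rather than silently accepted. Here is a minimal sketch of that idea for a fixed-size destination buffer. The function name concat_checked and its return convention (0 on success, -1 if the result does not fit) are my own assumptions for illustration; they are not part of ANSI C or any library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from any standard library): concatenates a and b
 * into dst, which holds dstsize bytes. Returns 0 on success, or -1 if the
 * result including the terminator would not fit. In that case dst is set to
 * the empty string instead of being silently truncated. */
int concat_checked(char *dst, size_t dstsize, const char *a, const char *b)
{
    size_t alen, blen;

    assert(dst != NULL && a != NULL && b != NULL);
    alen = strlen(a);   /* each length is computed exactly once */
    blen = strlen(b);

    /* Overflow-safe form of the check: alen + blen + 1 > dstsize */
    if (dstsize == 0 || alen >= dstsize || blen >= dstsize - alen) {
        if (dstsize > 0) dst[0] = '\0';
        return -1;
    }
    memcpy(dst, a, alen);            /* copy the 1st part */
    memcpy(dst + alen, b, blen + 1); /* copy the 2nd part including the terminator */
    return 0;
}
```

With the 10-byte szBuf from the story, concat_checked(szBuf, sizeof(szBuf), "Hello ", "World") would return -1, because the result needs 12 bytes, and the caller can decide what to do about it.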

/*
 * Copyright (c) 2013 Gerhard Gappmeier
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated
 * documentation files (the "Software"), to deal in the Software without restriction, including without limitation the
 * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit
 * persons to whom the Software is furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in all copies or substantial portions of the
 * Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE
 * WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
 * COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
 * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 */

/** Size limited copy string function.
 * This function guarantees that \c dst will be zero terminated,
 * as long as \c dst is at least one byte long.
 * Note that strlcpy works on true 'C' strings, which means
 * \c src must be zero terminated.
 * Unlike the BSD original, which returns strlen(src), this version
 * returns the number of characters actually copied.
 * @param dst destination buffer
 * @param src source buffer
 * @param len size of the destination buffer in bytes
 * @return total length of the created string in \c dst
 */
size_t strlcpy(char *dst, const char *src, size_t len)
{
    size_t pos = 0;
    if (len < 1) return 0; /* sanity check */
    len--; /* reserve space for null terminator */

    while (pos < len && *src)
    {
        dst[pos++] = *src;
        src++;
    }
    dst[pos] = 0;
    return pos;
}

/** Size limited concatenate string function.
 * This function guarantees that \c dst will be zero terminated,
 * as long as \c dst is at least one byte long.
 * Note that strlcat works on true 'C' strings, which means
 * both \c src and \c dst must be zero terminated.
 * If \c len <= strlen(dst), strlcat cannot write any characters and
 * strlen(dst) will be returned.
 * @param dst destination buffer
 * @param src source buffer
 * @param len size of the destination buffer in bytes
 * @return total length of the created string in \c dst
 */
size_t strlcat(char *dst, const char *src, size_t len)
{
    size_t pos = strlen(dst);
    if (len <= pos) return pos; /* sanity check */
    len--; /* reserve space for null terminator */

    while (pos < len && *src)
    {
        dst[pos++] = *src;
        src++;
    }
    dst[pos] = 0;
    return pos;
}
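
One caveat about detecting truncation: the two functions above return the length of the string that was actually created, so the return value alone does not tell you whether something was cut off. The BSD strlcpy returns strlen(src) instead, which makes the check a simple comparison against the buffer size. The following is a rough sketch of a variant with that return convention. The names my_strlcpy and copy_was_truncated are hypothetical and only chosen to avoid clashing with a strlcpy the system may already provide.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variant that follows the BSD return convention:
 * it returns strlen(src), i.e. the length the result would have
 * had with an unlimited buffer. */
size_t my_strlcpy(char *dst, const char *src, size_t size)
{
    size_t srclen;

    assert(dst != NULL && src != NULL);
    srclen = strlen(src);
    if (size > 0) {
        /* copy at most size-1 characters and always terminate */
        size_t n = (srclen < size - 1) ? srclen : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}

/* Hypothetical helper: returns 1 if src had to be truncated to fit
 * into dst, 0 otherwise. */
int copy_was_truncated(char *dst, const char *src, size_t size)
{
    return my_strlcpy(dst, src, size) >= size;
}
```

The design choice here is that a return value greater than or equal to the buffer size means truncation happened, so no extra strlen call is needed at the call site.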