32. Algorithms on Null-Terminated Strings


Example function definitions

For illustration, here are implementations of strlen, strcmp, strcat and strncat. The implementation of strncat is from the manual page for strncat. (To see it, run the command man strncat in Linux.)

  //----------------------------------------------------------
  size_t strlen(const char* s)
  {
    size_t k;
    for(k = 0; s[k] != '\0'; k++) {}
    return k;
  }
  //----------------------------------------------------------
  int strcmp(const char* s, const char* t)
  {
    int k = 0;
    while(s[k] != '\0' && s[k] == t[k])
    {
      k++;
    }

    // We want to return s[k] - t[k], but be
    // sure to treat the characters as numbers
    // from 0 to 255, not as numbers from
    // -128 to 127.

    int sk = (unsigned char) s[k];
    int tk = (unsigned char) t[k];
    return sk - tk;
  }
  //----------------------------------------------------------
  char* strcat(char* dest, const char* src)
  {
    size_t dest_len = strlen(dest);
    size_t i;

    for (i = 0; src[i] != '\0'; i++)
    {
      dest[dest_len + i] = src[i];
    }
    dest[dest_len + i] = '\0';
    return dest;    
  }
  //----------------------------------------------------------
  char* strncat(char* dest, const char* src, size_t n)
  {
    size_t dest_len = strlen(dest);
    size_t i;

    for (i = 0 ; i < n && src[i] != '\0' ; i++)
    {
      dest[dest_len + i] = src[i];
    }
    dest[dest_len + i] = '\0';
    return dest;
  }
  //----------------------------------------------------------
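
As a usage note for strncat, here is a minimal sketch using the library version. The function name greet and the greeting text are hypothetical, chosen only for illustration. The point is that strncat always stores a null character, so dest needs room for strlen(dest) + n + 1 characters.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical demo: stores "Hello, " followed by at most n characters
// of name into dest. The caller must make sure that dest has room for
// at least 7 + n + 1 characters.
void greet(char* dest, const char* name, std::size_t n)
{
  std::strcpy(dest, "Hello, ");   // 7 characters plus a null character
  std::strncat(dest, name, n);    // appends at most n characters, then '\0'
}
```

For example, with a 32-character array buf, greet(buf, "world", 3) leaves "Hello, wor" in buf.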

Looping over a null-terminated string

The above definition of strcat shows a standard way to look at each character in a null-terminated string s:

  for(i = 0; s[i] != '\0'; i++)
  {
    …
  }

Sometimes students do that as follows instead.

  for(i = 0; i < strlen(s); i++)
  {
    …
  }

That works, but it is very slow for long strings. Notice that it recomputes strlen(s) before every iteration, and each call to strlen looks at every character in s. That requires time proportional to n² when s has length n. If n is 1000, n² is 1,000,000.
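
To make the cost concrete, here is a small sketch that counts how many characters each style of loop examines. The names countingStrlen, stepsWithStrlenTest and stepsWithNullTest are hypothetical, and countingStrlen is an instrumented stand-in for the library strlen. An optimizing compiler can sometimes avoid the repeated calls to the real strlen, but that is not something to rely on.

```cpp
#include <cstddef>

// Hypothetical instrumented strlen: adds the number of characters that
// it examines (including the null character) to *steps.
std::size_t countingStrlen(const char* s, long* steps)
{
  std::size_t k = 0;
  while (s[k] != '\0')
  {
    k++;
    (*steps)++;
  }
  (*steps)++;   // the null character was examined too
  return k;
}

// Characters examined by the style: for(i = 0; i < strlen(s); i++)
long stepsWithStrlenTest(const char* s)
{
  long steps = 0;
  for (std::size_t i = 0; i < countingStrlen(s, &steps); i++) {}
  return steps;
}

// Characters examined by the style: for(i = 0; s[i] != '\0'; i++)
long stepsWithNullTest(const char* s)
{
  long steps = 0;
  for (std::size_t i = 0; ; i++)
  {
    steps++;                 // one look at s[i]
    if (s[i] == '\0') break;
  }
  return steps;
}
```

For a string of length n, stepsWithNullTest returns n + 1 and stepsWithStrlenTest returns (n + 1)², so for "horse" the counts are 6 and 36.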


Storing the null character

A null character is stored for a string constant automatically, and library functions such as strcpy and strcat know to store null characters. But that is as far as it goes. If you build a string yourself, one character at a time, you need to store the null character explicitly, as the definition of strcat above does.


Copying null-terminated strings

Copying a null-terminated string s involves two steps.

  1. Allocate an array (let's call it cpy) large enough to store the copy. Note that strlen(s) does not count the null character, but you need to store a null character into cpy. So the size of cpy needs to be strlen(s) + 1.

  2. Copy s into cpy. The library function strcpy will do that job.

But you don't want to do two steps every time you need to copy a string. Here is a function that returns a copy of a null-terminated string.

  char* copystr(const char* s)
  {
    char* cpy = new char[strlen(s) + 1];
    strcpy(cpy, s);
    return cpy;
  }
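
Here is a usage sketch for copystr. It repeats the definition so that it stands alone, and the name copyIsIndependent is hypothetical. It shows two things: changing the copy does not change the original, and since the copy was allocated with new[], the caller is responsible for releasing it with delete[].

```cpp
#include <cstring>

// Same definition as copystr in the text.
char* copystr(const char* s)
{
  char* cpy = new char[std::strlen(s) + 1];
  std::strcpy(cpy, s);
  return cpy;
}

// Hypothetical demo: returns true if modifying the copy leaves the
// original string unchanged.
bool copyIsIndependent()
{
  char original[] = "frog";
  char* cpy = copystr(original);

  cpy[0] = 'g';   // changes only the copy

  bool ok = std::strcmp(original, "frog") == 0
         && std::strcmp(cpy, "grog") == 0;

  delete[] cpy;   // the caller owns the copy
  return ok;
}
```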

A slightly more involved example

Let's define a function copyLetters(dest, src) that copies all of the letters in string src into array dest, and null-terminates dest. For example, if src is "I'm happy, not sad" then string "Imhappynotsad" is stored into dest.

An important thing to notice is that a character in dest is not necessarily at the same index as the corresponding character in src. For example, the 'm' in src is at index 2, but it is at index 1 in dest.

  void copyLetters(char* dest, const char* src)
  {
    int di = 0;  // dest index

    for(int si = 0; src[si] != '\0'; si++)
    {
      // Cast to unsigned char: isalpha requires a nonnegative value.
      if(isalpha((unsigned char)(src[si])))
      {
        dest[di] = src[si];
        di++;
      }
    }
    dest[di] = '\0';
  }
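
Here is a usage sketch for the example from the text. It repeats the logic of copyLetters so that it stands alone, and the name copyLettersDemo is hypothetical. Since dest never receives more characters than src has, an array of size strlen(src) + 1 is always large enough.

```cpp
#include <cctype>
#include <cstring>

// Same logic as copyLetters in the text.
void copyLetters(char* dest, const char* src)
{
  int di = 0;  // dest index

  for (int si = 0; src[si] != '\0'; si++)
  {
    if (std::isalpha((unsigned char) src[si]))
    {
      dest[di] = src[si];
      di++;
    }
  }
  dest[di] = '\0';
}

// Hypothetical demo: returns true if the example from the text
// produces "Imhappynotsad".
bool copyLettersDemo()
{
  const char* src = "I'm happy, not sad";   // length 18
  char dest[19];                            // strlen(src) + 1
  copyLetters(dest, src);
  return std::strcmp(dest, "Imhappynotsad") == 0;
}
```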

Treating strings like linked lists

An interesting feature of null-terminated strings is that you can compute not only the head of s (*s) but also the tail of s, as s + 1.

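The analogy is a little off because an empty string is not represented by a null pointer. Instead, s is empty if *s is '\0'. Here is a plan for a loop algorithm for strlen, where r points to the part of the string that remains to be counted and k is the number of characters counted so far.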

  r          k
  "horse"    0
  "orse"     1
  "rse"      2
  "se"       3
  "e"        4
  ""         5

And here is the corresponding definition of strlen.

  size_t strlen(const char* s)
  {
    const char* r = s;
    size_t      k = 0;

    while(*r != '\0')
    {
      k++;
      r++;
    }
    return k;
  }

Expressed as a for-loop, it looks like this.

  size_t strlen(const char* s)
  {
    size_t k = 0;
    for(const char* r = s; *r != '\0'; r++)
    {
      k++;
    }
    return k;
  }
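
The head and tail view also suggests a recursive formulation, in the same way that the length of a linked list can be computed recursively. The following is only a sketch to illustrate the analogy. The name lengthRec is hypothetical, chosen to avoid clashing with the library strlen. Because it uses one level of recursion per character, the loops above are the better choice for long strings.

```cpp
#include <cstddef>

// Hypothetical recursive length: an empty string has length 0, and a
// nonempty string has length 1 (for its head) plus the length of its tail.
std::size_t lengthRec(const char* s)
{
  if (*s == '\0')
  {
    return 0;                    // s is empty
  }
  return 1 + lengthRec(s + 1);   // s + 1 is the tail of s
}
```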

Exercises

  1. What is strlen("frog" + 1)?

  2. The following function is intended to return a copy of a null-terminated string.

      char* copyString(const char* s)
      {
        char* t = new char[strlen(s)];
        strcpy(t, s);
        return t;
      }
    
    There is a serious problem with it. Why doesn't it work?

  3. Write a function that returns the number of occurrences of character 'a' in a given null-terminated string.

  4. Suppose that numNonblanks(s) is supposed to return the number of nonblank characters in null-terminated string s. Look at the following definition of numNonblanks.

    int numNonblanks(const char* s)
    {
      int count = 0;
    
      for(int i = 0; i < strlen(s); i++)
      {
        if(s[i] != ' ')
        {
          count++;
        }
      }
      return count;
    }
    
    That definition is not a very good one. Why not?

  5. Write a function removeBlanks(s) that allocates space for a null-terminated string in the heap, copies all nonblank characters in null-terminated string s into that space, null-terminates the new array and returns a pointer to that array. Do not allocate more room than is needed.

  6. Does the following definition of strlen work?

      size_t strlen(const char* s)
      {
        for(const char* r = s; *r!= '\0'; r++) {}
        return r - s;
      }
    