31. Algorithms on Null-Terminated Strings


Example function definitions

For illustration, here are implementations of strlen, strcmp, strcat and strncat. The implementation of strncat is from the manual page for strncat. (To see it, run the command man strncat in Linux.)

  //----------------------------------------------------------
  size_t strlen(const char* s)
  {
    size_t k;
    for(k = 0; s[k] != '\0'; k++) {}
    return k;
  }
  //----------------------------------------------------------
  int strcmp(const char* s, const char* t)
  {
    size_t k = 0;
    while(s[k] != '\0' && s[k] == t[k])
    {
      k++;
    }

    // We want to return s[k] - t[k], but be
    // sure to treat the characters as numbers
    // from 0 to 255, not as numbers from
    // -128 to 127.

    int sk = (unsigned char) s[k];
    int tk = (unsigned char) t[k];
    return sk - tk;
  }
  //----------------------------------------------------------
  char* strcat(char* dest, const char* src)
  {
    size_t dest_len = strlen(dest);
    size_t i;

    for (i = 0; src[i] != '\0'; i++)
    {
      dest[dest_len + i] = src[i];
    }
    dest[dest_len + i] = '\0';
    return dest;    
  }
  //----------------------------------------------------------
  char* strncat(char* dest, const char* src, size_t n)
  {
    size_t dest_len = strlen(dest);
    size_t i;

    for (i = 0 ; i < n && src[i] != '\0' ; i++)
    {
      dest[dest_len + i] = src[i];
    }
    dest[dest_len + i] = '\0';
    return dest;
  }
  //----------------------------------------------------------
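
To see why strcmp casts to unsigned char, here is a small sketch that compares the definition above with a version that subtracts plain char values. The names myStrcmp and naiveCompare are hypothetical, chosen only to avoid clashing with the library's strcmp.

```cpp
#include <assert.h>
#include <stddef.h>

// Hypothetical name. Same logic as the strcmp definition above.
int myStrcmp(const char* s, const char* t)
{
  assert(s != nullptr && t != nullptr);
  size_t k = 0;
  while(s[k] != '\0' && s[k] == t[k])
  {
    k++;
  }
  int sk = (unsigned char) s[k];
  int tk = (unsigned char) t[k];
  return sk - tk;
}

// For contrast only: subtracts plain char values.
// Where char is a signed type, byte 0xE9 is the number -23,
// so it appears to come before every ASCII letter.
int naiveCompare(const char* s, const char* t)
{
  assert(s != nullptr && t != nullptr);
  size_t k = 0;
  while(s[k] != '\0' && s[k] == t[k])
  {
    k++;
  }
  return s[k] - t[k];
}
```

With these definitions, myStrcmp("\xe9", "a") is positive because 233 is larger than 97. On a machine where char is signed, which is common but not guaranteed, naiveCompare("\xe9", "a") is negative instead.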

Looping over a null-terminated string

The above definition of strcat shows a standard way to look at each character in a null-terminated string s:

  for(i = 0; s[i] != '\0'; i++)
  {
    …
  }

Sometimes students do that as follows instead.

  for(i = 0; i < strlen(s); i++)
  {
    …
  }

That works, but it is very slow for long strings. Notice that it recomputes strlen(s) for every character in s, and each call to strlen scans the whole string. That requires time proportional to n² when s has length n. If n is 1000, n² is 1,000,000. (An optimizing compiler can sometimes avoid the repeated calls, but you should not count on that.)
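
If you prefer to loop up to the length, one way to avoid the repeated scans is to compute the length once, before the loop. Here is a minimal sketch. The function upcase is a hypothetical example that converts a string to upper case in place.

```cpp
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

// Hypothetical example: convert s to upper case, in place.
void upcase(char* s)
{
  assert(s != nullptr);
  size_t n = strlen(s);   // computed once, not once per iteration
  for(size_t i = 0; i < n; i++)
  {
    s[i] = (char) toupper((unsigned char) s[i]);
  }
}
```

The loop body runs n times and strlen runs once, so the total time is proportional to n.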


Storing the null character

A null character is stored for a string constant automatically, and library functions such as strcpy and strcat know to store null characters. But that is as far as it goes. Looking at the definition of strcat, you can see that the null character needs to be stored explicitly.
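
As an illustration, here is a sketch of a function that builds a string one character at a time. The name fillWith and its parameters are hypothetical. The last statement is the one that is easy to forget.

```cpp
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: store n copies of character c into buf
// as a null-terminated string.
// buf must have room for at least n + 1 characters.
void fillWith(char* buf, char c, size_t n)
{
  assert(buf != nullptr);
  for(size_t i = 0; i < n; i++)
  {
    buf[i] = c;
  }
  buf[n] = '\0';   // Nothing stores this for you.
}
```

Without the final assignment, buf would hold n characters followed by whatever happened to be in memory, and functions such as strlen would run past the end of the intended string.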


Copying null-terminated strings

Copying a null-terminated string s involves two steps.

  1. Allocate an array (let's call it cpy) large enough to store the copy. The value of strlen(s) does not count the null character, but you need to store a null character into cpy. So the size of cpy needs to be strlen(s) + 1.

  2. Copy s into cpy. The library function strcpy will do that job.

But you don't want to do two steps every time you need to copy a string. Here is a function that returns a copy of a null-terminated string.

  char* copystr(const char* s)
  {
    char* cpy = new char[strlen(s) + 1];
    strcpy(cpy, s);
    return cpy;
  }
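
Here is a sketch of how copystr might be used. The function copyIsIndependent is a hypothetical caller. It shows that changing the copy does not change the original, and that the caller is responsible for giving the memory back with delete[].

```cpp
#include <assert.h>
#include <string.h>

// copystr, as defined above.
char* copystr(const char* s)
{
  assert(s != nullptr);
  char* cpy = new char[strlen(s) + 1];
  strcpy(cpy, s);
  return cpy;
}

// Hypothetical caller: returns true if the copy is separate
// from the original.
bool copyIsIndependent()
{
  char  original[] = "frog";
  char* cpy        = copystr(original);

  cpy[0] = 'g';    // changes only the copy

  bool ok = strcmp(original, "frog") == 0
         && strcmp(cpy, "grog") == 0;

  delete[] cpy;    // matches new char[...] in copystr
  return ok;
}
```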

A slightly more involved example

Let's define a function copyLetters(dest, src) that copies all of the letters in string src into array dest, and null-terminates dest. For example, if src is "I'm happy, not sad" then string "Imhappynotsad" is stored into dest.

An important thing to notice is that a character in dest is not necessarily at the same index as the corresponding character in src. For example, the 'm' in src is at index 2, but it is at index 1 in dest.

  void copyLetters(char* dest, const char* src)
  {
    int di = 0;  // dest index

    for(int si = 0; src[si] != '\0'; si++)
    {
      if(isalpha((unsigned char)(src[si])))
      {
        dest[di] = src[si];
        di++;
      }
    }
    dest[di] = '\0';
  }
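
The caller of copyLetters must provide an array that is large enough. Since dest never holds more characters than src, strlen(src) + 1 characters always suffice. Here is a sketch under that assumption. The wrapper lettersOf is a hypothetical name, and it allocates for the worst case, where every character of src is a letter.

```cpp
#include <assert.h>
#include <ctype.h>
#include <string.h>

// copyLetters, as described above.
void copyLetters(char* dest, const char* src)
{
  assert(dest != nullptr && src != nullptr);
  int di = 0;  // dest index

  for(int si = 0; src[si] != '\0'; si++)
  {
    if(isalpha((unsigned char) src[si]))
    {
      dest[di] = src[si];
      di++;
    }
  }
  dest[di] = '\0';
}

// Hypothetical wrapper: returns the letters of src in a newly
// allocated array. The caller must delete[] the result.
char* lettersOf(const char* src)
{
  // Worst case: every character in src is a letter.
  char* dest = new char[strlen(src) + 1];
  copyLetters(dest, src);
  return dest;
}
```

For the example in the text, lettersOf("I'm happy, not sad") yields the string "Imhappynotsad".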

Treating strings like linked lists

An interesting feature of null-terminated strings is that you can compute not only the head of s (*s) but also the tail of s, as s + 1.

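The analogy is a little off because an empty linked list is usually represented by a null pointer, while a string s is empty when *s is '\0'. Here is a plan for a loop algorithm for strlen, where r points to the part of the string that has not been counted yet and k is the number of characters counted so far.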

    r          k
    "horse"    0
    "orse"     1
    "rse"      2
    "se"       3
    "e"        4
    ""         5

And here is the corresponding definition of strlen.

  size_t strlen(const char* s)
  {
    const char* r = s;
    size_t      k = 0;

    while(*r != '\0')
    {
      k++;
      r++;
    }
    return k;
  }

Expressed as a for-loop, it looks like this.

  size_t strlen(const char* s)
  {
    size_t k = 0;
    for(const char* r = s; *r != '\0'; r++)
    {
      k++;
    }
    return k;
  }
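
The head and tail view also suggests a recursive definition, in the same style that you would use for a linked list. Here is a sketch. The name lengthRec is hypothetical, chosen to avoid clashing with strlen.

```cpp
#include <assert.h>
#include <stddef.h>

// Hypothetical name. The length of an empty string is 0.
// The length of a nonempty string is 1 + the length of its tail.
size_t lengthRec(const char* s)
{
  assert(s != nullptr);
  if(*s == '\0')
  {
    return 0;
  }
  return 1 + lengthRec(s + 1);
}
```

This version makes one recursive call per character, so the loop is the better choice for long strings.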

Exercises

  1. What is strlen("frog" + 1)?

  2. The following function is intended to return a copy of a null-terminated string.

      char* copyString(const char* s)
      {
        char* t = new char[strlen(s)];
        strcpy(t, s);
        return t;
      }
    
    There is a serious problem with it. Why doesn't it work?

  3. Write a function that returns the number of occurrences of character 'a' in a given null-terminated string.

  4. Suppose that numNonblanks(s) is supposed to return the number of nonblank characters in null-terminated string s. Look at the following definition of numNonblanks.

    int numNonblanks(const char* s)
    {
      int count = 0;
    
      for(int i = 0; i < strlen(s); i++)
      {
        if(s[i] != ' ')
        {
          count++;
        }
      }
      return count;
    }
    
    That definition is not a very good one. Why not?

  5. Write a function removeBlanks(s) that allocates space for a null-terminated string in the heap, copies all nonblank characters in null-terminated string s into that space, null-terminates the new array and returns a pointer to that array. Do not allocate more room than is needed.

  6. Does the following definition of strlen work?

      size_t strlen(const char* s)
      {
        for(const char* r = s; *r!= '\0'; r++) {}
        return r - s;
      }
    

  7. We use positional notation to write numbers. In base 10, there is a position for 1's, a position for 10's, a position for 100's, etc., with a position for each power of 10.

    Computers also use positional notation, but they use base 2 instead of 10 (binary notation). There is a position for 1's, a position for 2's, a position for 4's, etc., with a position for each power of 2.

    Write a C++ program that reads a binary number from the standard input and writes the equivalent decimal (base 10) number on the standard output. Assume that the binary number has no more than 50 digits.

    Hints.

    1. Statement

        scanf("%s", binary);
      
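      reads a string of non-whitespace characters and stores it into character array binary as a null-terminated string. (You don't add & to binary because it is already a pointer to the array where you want scanf to store the string.) Since the binary number has up to 50 digits, binary needs room for at least 51 characters.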

    2. To convert the binary string to an integer, loop over the null-terminated string, from beginning to end. For each bit, multiply your current number by 2 and add the next digit. That is based on the following ideas. Suppose that num(str) is the function that converts a binary string to an integer.

        2*num("1")   + 1 = num("11").
        2*num("11")  + 0 = num("110").
        2*num("110") + 0 = num("1100").
      

    3. Use type long for numbers so that you can handle 50-bit integers. (Type long is 64 bits on 64-bit Linux. Where long is only 32 bits, as on Windows, use long long instead.)
