30B. Null-Terminated Strings


Strings

The C++ standard library provides a type called string that handles strings in a way that is similar in spirit to the String type in Java. But this is a course in how things work, not in how to use a library that takes care of it all for you.

Therefore, the standards for this course require you not to use the C++ type string. Instead, we will use null-terminated strings.


Null-terminated strings

You can store a string in an array of characters. But remember that, in general, you cannot find out how large an array is by looking at the array, since an array is just a pointer to the first thing in a chunk of memory.

To avoid that unpleasant issue, a null-terminated string is an array of characters that includes a null character ('\0') as an end marker. For example, an array s containing five characters

  s[0] = 'g'
  s[1] = 'o'
  s[2] = 'a'
  s[3] = 't'
  s[4] = '\0'
represents the string "goat". The null character is not part of the string, but is only a marker letting you know where the string ends.

You can pass a null-terminated string to a function without passing a separate size, because the function can find out how long the string is by looking for the null character.
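
To make that concrete, here is a minimal sketch of how a function might find the end of a string by itself. The name countChars is made up for illustration; it is not a library function, and in practice you would use strlen, described below.

```cpp
#include <cstddef>

// Hypothetical helper (not a library function): counts the characters
// that come before the null character in null-terminated string s.
std::size_t countChars(const char* s)
{
  std::size_t n = 0;
  while(s[n] != '\0')     // stop at the end marker
  {
    n++;
  }
  return n;
}
```

With this definition, countChars("goat") yields 4. No size parameter is needed, but the function only works if the array really does contain a null character.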


String constants

A string constant such as "some text" is a null-terminated string. So it is an array of characters, with a null character at the end.

A string constant has type const char*. For example,

  const char* flower = "lotus";
makes flower point into the static area of memory, where null-terminated string "lotus" is stored.

If you want to include an end-of-line character in a string constant, use \n. A string constant is not allowed to have an actual line break in it. But if you write two or more string constants in a row, they are automatically combined into a single string constant. For example,

  const char* message = "This is a multiline\n"
                        "message for you\n";
creates a string constant with two lines.

Note. Combination of consecutive strings only works for string constants, not for other expressions that represent strings.


Operations on null-terminated strings

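The following are available if you #include <cstring>. Type size_t is an unsigned integer type that is large enough to hold the size of any array. On many systems it is unsigned long rather than unsigned int, so do not assume that the two are interchangeable.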

size_t strlen(const char* s)

strlen(s) returns the length of null-terminated string s. The length does not count the null character. For example, strlen("rabbit") = 6.

Note. strlen finds the length by scanning through the array looking for the null character. So it takes time that is proportional to the length of the string. Avoid computing strlen(s) over and over for the same string s in the same function.
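
As a sketch of that advice, the following function computes strlen(s) once, before the loop, rather than in the loop condition. The function name countOccurrences is hypothetical and chosen only for this example.

```cpp
#include <cstring>

// Hypothetical helper: returns the number of times character c
// occurs in null-terminated string s.
int countOccurrences(const char* s, char c)
{
  size_t len = strlen(s);      // computed once, not on every iteration
  int count = 0;
  for(size_t i = 0; i < len; i++)
  {
    if(s[i] == c)
    {
      count++;
    }
  }
  return count;
}
```

Writing the loop condition as i < strlen(s) would give the same answer, but each test of the condition could rescan the whole string.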


int strcmp(const char* s, const char* t)

Do not compare strings using ==. If s and t have type char* then expression s == t is true if s and t are the same pointer. It does not look at the characters in the strings.

Function strcmp compares strings s and t for alphabetical ordering or for equality. strcmp(s,t) returns an integer r with the following properties.

r < 0 if s comes before t
r = 0 if s and t are equal
r > 0 if s comes after t

For example, strcmp("cat", "cab") > 0 since "cat" comes after "cab" in alphabetical order.

Alphabetical ordering is determined by character codes. Since 'Z' is 90 and 'a' is 97, Z comes before a in the alphabetical ordering used by strcmp.

Here is how a former student asked whether string command was equal to "-t".

  if(command[0] == '-' && command[1] == 't' && command[2] == '\0')
  {
    …
That is clumsy and difficult to read. How about this instead?
  if(strcmp(command, "-t") == 0)
  {
    …
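
If it helps, you can wrap the common tests in small functions. The names sameString and comesBefore below are assumptions made for this sketch, not part of <cstring>.

```cpp
#include <cstring>

// Hypothetical helper: true if s and t contain the same characters.
bool sameString(const char* s, const char* t)
{
  return strcmp(s, t) == 0;
}

// Hypothetical helper: true if s comes strictly before t in the
// ordering used by strcmp (character codes, so 'Z' is before 'a').
bool comesBefore(const char* s, const char* t)
{
  return strcmp(s, t) < 0;
}
```

Two different arrays that both hold "-t" are equal according to sameString, even though comparing their pointers with == yields false.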


int strcasecmp(const char* s, const char* t)

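Like strcmp, but ignore the case of letters. So 'r' and 'R' are treated like the same character. Note that strcasecmp is a POSIX function declared in <strings.h>. It is not part of standard <cstring>, so it might not be available on every system.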
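
Here is a small sketch of a case-insensitive equality test. It assumes a POSIX system (such as Linux or macOS) where strcasecmp is declared in <strings.h>. The name sameIgnoringCase is hypothetical.

```cpp
#include <strings.h>   // POSIX header that declares strcasecmp

// Hypothetical helper: true if s and t are the same string when
// upper and lower case letters are treated as equal.
bool sameIgnoringCase(const char* s, const char* t)
{
  return strcasecmp(s, t) == 0;
}
```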

char* strcpy(char* dest, const char* src);

strcpy(dest, src) copies null-terminated string src into array dest and null-terminates dest. The caller must ensure that there is enough room in array dest for the entire string plus the null character at the end.

The return value of strcpy(dest, src) is dest.
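
One way to meet the "enough room" requirement is to check the length before copying. The following is only a sketch; copyIfFits and its parameter destSize are names invented for this example.

```cpp
#include <cstring>

// Hypothetical helper: copies src into dest only if it fits.
// destSize is the number of chars in array dest.
// Returns true if the copy was done, false if there was no room.
bool copyIfFits(char* dest, size_t destSize, const char* src)
{
  if(strlen(src) + 1 > destSize)     // +1 for the null character
  {
    return false;
  }
  strcpy(dest, src);
  return true;
}
```

For example, an array of 6 characters has room for "horse" (5 characters plus the null character) but not for "donkey".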


char* strncpy(char* dest, const char* src, size_t n);

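Like strcpy(dest, src), but at most n characters are stored into array dest, so n should be no larger than the size of dest. If src has fewer than n characters, the remainder of the n characters are filled with null characters. If src has n or more characters, no null character is stored in dest, so dest does not hold a null-terminated string until you store a null character yourself.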
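
Because strncpy can leave dest without a null character, a common pattern is to store one yourself. Here is a sketch of that pattern, with the made-up name copyTruncated.

```cpp
#include <cstring>

// Hypothetical helper: copies as much of src as fits into dest,
// an array of n chars, and always null-terminates dest.
// If src is too long, the copy is cut short.
void copyTruncated(char* dest, const char* src, size_t n)
{
  if(n == 0)
  {
    return;                     // no room even for the null character
  }
  strncpy(dest, src, n - 1);    // leave the last slot for '\0'
  dest[n - 1] = '\0';
}
```

Copying "goat" into an array of 4 characters this way leaves "goa" in the array, which is shortened but is still a valid null-terminated string.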

char* strcat(char* dest, const char* src);

Copy string src to the end of the string in array dest, adding a null character to the end. Then return dest.

Be careful. Function strcat does not build a new string that is the concatenation of two strings. It does not allocate any memory. For example,

  char* c = strcat(a,b);
does not just set c to the concatenation of string a followed by b. It adds b to the end of the string in array a. So what is in array a is changed. Also, there must be enough room in a to add b to a.
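
If you want the concatenation of two strings without changing either one, you can build it in a third array that you have provided. The following sketch assumes a hypothetical helper called joinInto; the caller supplies the result array and its size.

```cpp
#include <cstring>

// Hypothetical helper: stores first followed by second into array
// result, which has resultSize chars. Neither first nor second is
// changed. Returns false (and stores nothing) if there is no room.
bool joinInto(char* result, size_t resultSize,
              const char* first, const char* second)
{
  if(strlen(first) + strlen(second) + 1 > resultSize)
  {
    return false;
  }
  strcpy(result, first);     // result now holds a copy of first
  strcat(result, second);    // add second to the end of that copy
  return true;
}
```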


char* strncat(char* dest, const char* src, size_t n);

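Like strcat(dest, src), but at most n characters are copied from src. A null character is always added after the copied characters, so array dest needs room for up to n more characters plus the null character.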
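
Note that n limits how much is taken from src, not the total size of dest. Here is a sketch of how you might compute a suitable n from the size of the array. The name appendTruncated is hypothetical, and the function assumes that dest already holds a null-terminated string.

```cpp
#include <cstring>

// Hypothetical helper: adds as much of src as fits to the end of the
// string in dest, an array of destSize chars that already holds a
// null-terminated string. The result is always null-terminated.
void appendTruncated(char* dest, size_t destSize, const char* src)
{
  size_t used = strlen(dest);
  if(used + 1 < destSize)     // is there room for at least one more char?
  {
    strncat(dest, src, destSize - used - 1);
  }
}
```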

char* strchr(const char* s, int c)

Return a pointer to the first occurrence of character c in null-terminated string s. If there is no such character, strchr(s, c) returns NULL.

The type of strchr is a poor one. Since parameter s has type const char*, strchr should not be able to return a non-const pointer into array s, since that gives you a back-door way to modify a constant string. However, the library designers really wanted to provide two different functions:

  const char* strchr(const char* s, int c);
  char* strchr(char* s, int c);
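That is possible in C++, but not in C. The C type of strchr is a compromise that relies on the programmer not to abuse it. (The C++ header <cstring> does declare both of those functions, so in C++ the result is const when the argument is const.)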
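
Since strchr returns a pointer into the array, subtracting the start of the array from that pointer gives an index. The following sketch uses a made-up function name, indexOf, and stores the result in a const pointer so that it does not matter which declaration of strchr is in use.

```cpp
#include <cstring>

// Hypothetical helper: returns the index of the first occurrence of
// character c in null-terminated string s, or -1 if there is none.
int indexOf(const char* s, char c)
{
  const char* p = strchr(s, c);
  if(p == NULL)
  {
    return -1;
  }
  return (int)(p - s);     // distance from the start of the array
}
```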


char* strstr(const char* haystack, const char* needle)

Return a pointer to the first occurrence of substring needle in string haystack, or return NULL if there is none. Both haystack and needle are null-terminated strings.
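
A typical use of strstr is just to ask whether one string occurs inside another. Here is a minimal sketch; the name contains is an assumption for illustration.

```cpp
#include <cstring>

// Hypothetical helper: true if needle occurs somewhere in haystack.
bool contains(const char* haystack, const char* needle)
{
  return strstr(haystack, needle) != NULL;
}
```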


Null-terminated strings are arrays!

A null-terminated string is an array, and you cannot afford to forget that.

Strings are not automatically copied

Arrays are not automatically copied, so null-terminated strings are not automatically copied. Suppose str1 is a null-terminated string. The statement

  char* str2 = str1;
does not make str2 a copy of str1. Only the pointer is copied, as shown in the following diagram.

Some students try to get around that by writing

  char* str2 = *str1;
imagining that *str1 is the chunk of memory itself. But you already know about arrays. Expression *A is the same as A[0]. Would
  char* str2 = str1[0];
work? Of course not, since str1[0] has type char and str2 has type char*. You can do
  char str2 = str1[0];
but calling the variable str2 does not make it a string. Now, variable str2 is a variable of type char that holds the first character of string str1.
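
To convince yourself that char* str2 = str1 copies only the pointer, you can change the string through one variable and look at it through the other. This is a small sketch; the function name changeIsShared is made up.

```cpp
#include <cstring>

// Hypothetical demonstration: returns true if a change made through
// str2 is visible through str1, which shows that they share one array.
bool changeIsShared()
{
  char str1[] = "goat";     // an array of 5 chars
  char* str2 = str1;        // copies the pointer, not the array
  str2[0] = 'b';            // changes the one and only array
  return strcmp(str1, "boat") == 0;
}
```

Under these assumptions the function returns true: there is only one array, and both variables refer to it.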

We will see how to copy a null-terminated string on the next page.


Strings are not automatically allocated

Arrays are not automatically allocated for you. Don't expect space for null-terminated strings to show up by magic. Consider the following.

  char* str;
  strcpy(str, "horse");
What does that do? Reading the description of strcpy above, you see that strcpy(A, B) copies null-terminated string B into array A. It does not create A. That is the caller's responsibility.

But variable str is uninitialized. It is a wild pointer that holds an arbitrary address, not the address of an array that you have created. Storing "horse" at whatever address happens to be in variable str is an error that can have terrible consequences for your program.
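
A corrected version makes str point to real memory before calling strcpy. The following is one possible sketch that allocates the array in the heap. The function name makeHorse is hypothetical, and the caller is assumed to take responsibility for deleting the array.

```cpp
#include <cstring>

// Hypothetical helper: returns a newly allocated array that holds
// "horse". The caller must do delete[] on the result when finished.
char* makeHorse()
{
  const char* text = "horse";
  char* str = new char[strlen(text) + 1];   // +1 for the null character
  strcpy(str, text);                        // str now points to real memory
  return str;
}
```

A local array such as char str[6] would also work if the string is only needed inside one function.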