String literals with embedded nulls in Boost

Boost’s string library makes global string replace in C++ easy:

#include <boost/algorithm/string/replace.hpp>
#include <iostream>
#include <string>
std::string str = "$greeting, world!";
boost::replace_all(str, "$greeting", "Hello");
std::cout << str << "\n"; // prints "Hello, world!"

Suppose, however, that you want the search string to contain a null character. Why would you want such a thing? One case is escaping a string before handing it to code that cannot cope with embedded nulls:

std::string str2 = get_string_from_remote_source();
boost::replace_all(str2, "\0", "(nul)");
// do something with str2 that depends on it not to contain null chars

(Remember that, unlike C strings, it’s perfectly valid for C++ standard library strings to contain NUL characters.)

Alas, the code above doesn’t work; the call to replace_all() doesn’t do anything. It turns out that when you give Boost a string literal, it uses strlen() to get the string’s length. Since strlen() works on C-style null-terminated strings, it stops at the first null character it sees, so the search string "\0" is treated as an empty string.
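To make the length mismatch concrete, here is a small sketch that uses only the standard library. The function name show_literal_lengths is made up for illustration; the assertions record what each expression evaluates to.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: shows how different APIs measure the literal "\0".
void show_literal_lengths() {
    // The literal is a const char[2]: the NUL we wrote plus the terminator.
    static_assert(sizeof("\0") == 2, "array holds two chars");

    // strlen() stops at the first NUL, so the literal looks empty.
    assert(std::strlen("\0") == 0);

    // std::string's const char* constructor has the same limitation...
    assert(std::string("\0").empty());

    // ...while an explicit character count keeps the embedded NUL.
    assert(std::string("\0", 1).size() == 1);
}
```

Anything that measures the literal through a plain const char* sees zero characters, which is why the replace_all() call above has nothing to search for.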

Why did I expect Boost to behave differently? In C++, string literals are arrays of const char, and the array's size is part of its type. With the help of some template magic, a library can learn a literal's length at compile time. I assumed Boost did exactly that, so that it wouldn't need functions like strlen() to compute the string’s length and could handle arbitrary string literals, including ones with embedded nulls.
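The "template magic" I had in mind looks roughly like the sketch below. The names literal_length and literal_to_string are my own and not part of Boost; the point is only that the array extent N is deduced from the argument's type.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Boost: N is deduced from the argument's
// type, so no run-time scan for a terminator is needed.
template <std::size_t N>
constexpr std::size_t literal_length(const char (&)[N]) {
    return N - 1;  // drop the implicit terminating NUL
}

// Hypothetical helper: copy a literal into a std::string, embedded NULs included.
template <std::size_t N>
std::string literal_to_string(const char (&lit)[N]) {
    return std::string(lit, N - 1);
}

static_assert(literal_length("foo") == 3, "ordinary literal");
static_assert(literal_length("a\0b") == 3, "embedded NUL is counted");

void check_literal_helpers() {
    const std::string s = literal_to_string("a\0b");
    assert(s.size() == 3 && s[1] == '\0');
}
```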

After some thinking and googling, it became clear why Boost doesn’t work this way, or at least why this isn’t the only reasonable way. The reason is that Boost cannot tell the difference between string literals and other character arrays. Consider this case:

char search[80];
strcpy(search, "foo");
boost::replace_all(str, search, "bar");

We probably wouldn’t want replace_all() to search for all 80 characters that the array happens to contain, but only for the part that was initialized with a null-terminated string. Actually, this is something of a gray area. When a zero character appears inside a string literal, it certainly means that the programmer intended the character to be part of the string. But when it appears inside another character array, it may or may not mark the end of the string.
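The ambiguity is easy to see if you compare what the type system reports with what strlen() reports. This is a sketch with a made-up helper, array_extent, that returns the deduced array size without subtracting the terminator:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: report the full extent of a char array.
template <std::size_t N>
constexpr std::size_t array_extent(const char (&)[N]) {
    return N;
}

void compare_buffer_and_literal() {
    char search[80];
    std::strcpy(search, "foo");

    // To the type system the buffer is simply an 80-char array...
    assert(array_extent(search) == 80);
    // ...although only the first three characters are meaningful.
    assert(std::strlen(search) == 3);

    // A literal is also just a char array (here: 3 characters + terminator),
    // so a template taking an array reference sees both the same way.
    assert(array_extent("foo") == 4);
}
```

A function template that accepts a reference to a char array receives both arguments in the same form, so it has to pick one interpretation for all of them.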

We need a way to tell Boost to treat a char array as an array, with its full extent, instead of as a null-terminated string. To do this, wrap the array in a call to boost::as_array. For example:

char nullchar[] = {'\0'};
boost::replace_all(str2, boost::as_array(nullchar), "(nul)");

In fact, you can also pass a string literal to as_array, but remember that the resulting array includes the terminating null character in addition to whatever you wrote. So, returning to the original problem, for a search string consisting of exactly one null character, use boost::as_array(""). Don’t use boost::as_array("\0"), as that array contains two null characters.