String literals with embedded nulls in Boost

Boost’s string library makes global string replace in C++ easy:

std::string str = "$greeting, world!";
boost::replace_all(str, "$greeting", "Hello");
std::cout << str << "\n"; // print "Hello, world!"

Suppose however that you want the search string to contain a null character. Why would you want such a thing? Consider escaping strings for safe inclusion in some contexts:

std::string str2 = get_string_from_remote_source();
boost::replace_all(str2, "\0", "(nul)");
// do something with str2 that depends on it not to contain null chars

(Remember that, unlike C strings, it’s perfectly valid for C++ standard library strings to contain NUL characters.)

Alas, the code above doesn’t work; the call to replace_all() doesn’t do anything. It turns out that when you given Boost a string literal, it uses strlen() to get the string’s length. Since strlen() works on C style null-terminated strings, it stops on the first null character it sees.

Why did I expect Boost to behave differently? In C++, string literals are array of chars. With the help of some template magic, the Boost library can know the string’s length at compile-time. It doesn’t need to rely on functions like strlen() to compute the string’s length, so it can handle arbitrary string literals, including ones with embedded nulls.

After some thinking and googling about it, it becomes clear why Boost doesn’t work this way, or at least why it isn’t the only reasonable way. The reason is that Boost cannot tell the difference between string literals and other character arrays. Consider this case:

char search[80];
strcpy(search, "foo");
boost::replace_all(str, search, "bar");

We probably wouldn’t want replace_all() to look for the whole 80 character long string which the input array happens to contain, but only for the part initialized with a null-terminated string. Actually, this seems to be some sort of gray area. When a zero character appears inside a string literal, it certainly means that the programmer intended the character to be a part of the string. But when it appears inside another character array, it may or may not mark the end of the string.

We need a way to tell Boost you want to treat a char array as an array instead of a null-terminated string literal. To do this, wrap the array in a call to boost::as_array. For example:

char nullchar[] = {'\0'};
boost::replace_all(str2, boost::as_array(nullchar), "(nul)");

In fact, you can also pass a string literal to as_array, but remember that the corresponding array contains an (additional) terminating null character. So, returning to the original problem, for a string containing solely one null character, use boost::as_array(“”). Don’t use boost::as_array(“\0”), as the latter will contain two characters.

Advertisements
Post a comment or leave a trackback: Trackback URL.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: