Myths and Misconceptions about C++

This article gives an overview of some of the antipatterns and misconceptions that I come across quite regularly when answering questions on reddit.[1]

Blank Arrays

This is probably one of the most common problems. In many languages the built-in arrays are there to be used by application developers. This is not the case for C++!

C++ inherited its arrays more or less from C, where they can be somewhat justified by the fact that almost everything in C maps more or less directly to the generated assembly with just some syntactic sugar. The result is that arrays don’t know their own size and convert to naked pointers (see below) when looked at from a distance of ten kilometers.

Arrays in C++ are called std::vector

C++ has its own array types, but they go by different names: first and foremost std::vector and, on somewhat rarer occasions, std::array. Both can do everything that you would want to do with naked arrays, but with a much nicer interface that avoids many of the problems:

#include <cstddef>
#include <iostream>
#include <vector>

int main() {
	std::vector<std::size_t> numbers = {2, 3, 5, 7, 11, 13};
	std::cout << "A vector knows its size: " << numbers.size() << '\n'
	          << "You can access arbitrary elements of a vector: " << numbers[3] << '\n'
	          << "Unlike arrays there is also a checked interface: " << numbers.at(4) << '\n';
	// We can also add further elements later:
	numbers.push_back(17);
	numbers.push_back(19);
}

Also, to remove that argument before it comes up: Accessing elements in a std::vector is exactly as fast as accessing elements in a built-in array. In the few cases where vector might appear to be slower, it basically always boils down either to the additional cost being absolutely negligible even in the most time-critical applications, or to costs that you would have to pay anyway by implementing the mechanism at the place of usage.
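To make the difference between the unchecked operator[] and the checked at() a bit more tangible, here is a minimal sketch. The helper names at_throws and checked_access_example are made up for this illustration; the only library behavior it relies on is that std::vector::at throws std::out_of_range for an invalid index.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: returns true if numbers.at(index) rejects the index.
bool at_throws(const std::vector<int>& numbers, std::size_t index) {
	try {
		(void)numbers.at(index); // bounds-checked access
	} catch (const std::out_of_range&) {
		return true;
	}
	return false;
}

void checked_access_example() {
	const std::vector<int> numbers = {2, 3, 5};
	assert(!at_throws(numbers, 2)); // last valid index
	assert(at_throws(numbers, 3));  // one past the end is reported, not ignored
}
```

With a built-in array (or with operator[]) the second access would simply be undefined behavior.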

Resource Management

As usual with bad C++ there is also the topic of leaks: To be of dynamic size, built-in arrays must be created with new[]; however, objects that were created like this must be freed explicitly via delete[]. As if this weren’t bad enough, the following code will most likely compile without warning but exhibits undefined behavior that may even be exploitable for code execution:

#include <string>

int main() {
	std::string* arr = new std::string[10];
	delete arr;
}

The problem is that the code tries to free with delete instead of the correct delete[]; both expressions accept pointers as their arguments, so there is in general no way for the compiler to know which one you want.

But even if you are perfectly sure that you won’t ever fall into this trap (unrealistic), the problems don’t stop here. Consider the following code:

void foreign_function(const std::string* array);

void my_function() {
	auto* arr = new std::string[10];
	foreign_function(arr);
	delete[] arr;
}

If foreign_function throws an exception, arr will never be freed (“leaked”). If that happens often enough, your program may consume so much memory that the OS cannot deliver more at some point, which will in one way or another result in a very ungraceful crash.

std::vector on the other hand has none of those problems:

void foreign_function(const std::vector<std::string>& array);

void my_function() {
	std::vector<std::string> arr(10);
	foreign_function(arr);
	// lifetime of arr ends HERE. the destructor will
	// clean up all the resources (memory) owned by it,
	// even in the presence of exceptions
}
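If you don’t want to take my word for it, the claim can be checked with a small experiment. The following is only a sketch: tracer, failing_function and alive_after_exception are hypothetical names introduced for this illustration, and the “foreign function” is simply assumed to always throw.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical helper type that counts how many of its instances are alive.
struct tracer {
	static int alive;
	tracer() { ++alive; }
	tracer(const tracer&) { ++alive; }
	~tracer() { --alive; }
};
int tracer::alive = 0;

// Stand-in for a foreign function that fails.
void failing_function(const std::vector<tracer>&) {
	throw std::runtime_error{"something went wrong"};
}

// Returns the number of tracer objects still alive after the exception was handled.
int alive_after_exception() {
	try {
		std::vector<tracer> arr(10);
		failing_function(arr); // throws; arr is destroyed during stack unwinding
	} catch (const std::runtime_error&) {
		// nothing to clean up here
	}
	return tracer::alive;
}

void exception_safety_example() {
	assert(alive_after_exception() == 0); // all ten elements were destroyed
}
```

The same experiment with new tracer[10] and a delete[] after the call would leave ten objects alive.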

But I only use them for sizes that I know at compile-time!

This is a relatively common reply to some of the criticism mentioned above, and in C++98 it was not entirely wrong: Arrays of statically known size do indeed avoid some of the problems:

#include <string>

int main() {
	std::string strings[3] = {"foo", "bar", "baz"};
	// no leak: since the array is stack-allocated, all its elements are destructed
}

However: In C++11 we got std::array, which has many of the great methods that std::vector has and has literally zero overhead over built-in arrays of known size. Unlike built-in arrays, it has a sane syntax (the name of the variable is not between parts of its type) and avoids some weird behavior in cases like being passed as an argument to a function.
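As a rough illustration of what “behaves sanely when passed to a function” means, consider the following sketch. The function names sum_of and array_example are invented for this example; everything else is plain std::array.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// The element count is part of the type and survives being passed to a function.
template <std::size_t N>
int sum_of(const std::array<int, N>& values) {
	int total = 0;
	for (int value : values) {
		total += value;
	}
	return total;
}

void array_example() {
	std::array<int, 3> primes = {2, 3, 5};
	assert(primes.size() == 3);
	assert(sum_of(primes) == 10);

	std::array<int, 3> copy = primes; // copies all elements, unlike a built-in array
	copy.at(0) = 7;                   // checked access, just like std::vector
	assert(copy != primes);           // comparison works element-wise
}
```

A built-in array parameter would have been adjusted to a plain pointer, losing the size on the way.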

Summary

The array of runtime size in C++ is called std::vector, which is usually the best choice for containers. If you know what you are doing, std::array may also be a valid choice.

Built-in arrays on the other hand are technology for implementing container libraries; they are not there to be used by application programmers! Even experienced programmers should rarely ever have to touch them.

C++ is not Java

The fact that Java stole much of C++’s syntax but changed most of the semantics is the cause of one of the biggest problems that C++ faces: Many things in C++ look almost exactly like what is great style in later OOP languages, but they are truly awful:

Java:

ArrayList<Integer> list = new ArrayList<Integer>();

C++:

std::vector<int>* vec = new std::vector<int>();

If you know that Java’s ArrayList is more or less the equivalent of C++’s std::vector, you might easily come to the conclusion that the two snippets above are more or less equivalent. If you are aware that the Java code is perfectly OK, you might also come to the conclusion that the C++ can’t be that bad.

You would be wrong! The above C++ is extremely ugly and shouldn’t last more than a few seconds in an even mediocre code review. It is in fact so much against everything that C++ is about that every experienced C++ programmer will recognize the author as someone who isn’t used to writing the language.

C++ loves Value-Types

Why is this so? In Java (and similar languages) it is common to create every instance of a class on the heap (also known as the “free store” in C++). This has the perceived advantage that it works better with inheritance in some cases and fits better together with garbage collection.

C++ on the other hand doesn’t create these indirections unless explicitly asked. Instead it places the guts of any class wherever the variable was created. This is called “value semantics” (basically it is what Java does for integers and floats) and it works great with C++’s resource management (RAII): Whenever a variable goes out of scope in C++ a certain method, the so-called “destructor”, is called on it (conceptually; in reality empty destructors will of course be omitted and so on). This method only exists to free resources that are held by the class; after it has run, the destructors of all the class members are called. Since all of this happens implicitly, you rarely ever have to do any kind of resource management yourself. Since most later languages have garbage collection and threw out destructors, they wouldn’t profit that much from value types, but C++ certainly does.

Another great advantage of value types is that they are much faster: Modern computers have very slow memory accesses for arbitrary locations, but they are quite fast for memory that is located next to memory that was just accessed.[2] Value semantics are basically the perfect answer to this situation because they reduce the number of indirections to the bare essentials.

With that knowledge, let’s revisit the above code: In both languages new creates a new instance of the respective container class on the heap. In Java there is basically no way around that and no one would criticize it.

In C++ however the creation of a std::vector on the heap, especially with pointers and new, just doesn’t make any sense: It takes the very small management unit (in most implementations: three pointers) that was designed to be placed on the stack, puts it on the heap, and forces the user to free it manually because C++ has no garbage collection. Basically it creates several problems without providing even one advantage.

Good C++

So what would be the correct C++-code? This:

std::vector<int> vec;

The confusion continues, though, because the following is valid Java:

ArrayList<Integer> list;

But it does something completely different: it declares a reference that does not refer to any list at all, not even an empty one.
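To show what the C++ declaration gives you instead, here is a small sketch of value semantics in action. The names with_appended and value_semantics_example are invented for this illustration.

```cpp
#include <cassert>
#include <vector>

// Takes its argument by value, so the caller's vector is copied, not shared.
std::vector<int> with_appended(std::vector<int> values, int extra) {
	values.push_back(extra);
	return values;
}

void value_semantics_example() {
	std::vector<int> vec;        // a real, usable, empty vector
	assert(vec.empty());
	vec.push_back(1);

	std::vector<int> copy = vec; // an independent copy, not a second reference
	copy.push_back(2);
	assert(vec.size() == 1);     // the original is unaffected
	assert(copy.size() == 2);
}
```

In Java the equivalent assignment would have produced two references to the same list, and the first declaration would not have produced a list at all.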

Summary

C++ may look similar to Java, but even in very simple examples the semantics differ to a huge degree. The above example is only one of the many situations where C++ looks somewhat like Java but the meaning is vastly different. If you are going to learn C++ from a Java background, the best approach is probably to forget most of what you know about Java first.

“C++ is all about manual memory management”

Often people assume that since C++ doesn’t force everyone to use Garbage-Collection it must be a language where you have to manage all memory yourself and are surprised when told that this is about as wrong as it gets.

Resources

It is indeed true that C++ usually does not use GC to clean-up dynamically allocated memory:

int * ptr = new int{23};

If you do this, you are indeed forced to release the memory manually:

delete ptr;

This will get very nasty very fast, especially if you start using exceptions. Now: GC would certainly spare us from cleaning up ourselves?

// This is Java:
Writer writer = new FileWriter("filename");
writer.write("foobar\n");
writer.close(); // didn't Java promise we wouldn't have to clean-up ourselves?

So, GC isn’t really a solution either, since it still forces us to clean up all kinds of resources ourselves, even without exceptions (the absence of which the above Java code does assume!).

RAII

C++ therefore has a much better solution: “Resource Acquisition Is Initialization” (RAII). While everyone agrees that the name is terrible, the semantics are actually very clear and simple:

For every class that we create we can define a method (called the “destructor”) that is called when an instance’s lifetime ends. In this method we release all resources that we are currently holding. This still sounds like a lot of work, since we would seemingly have to release every member by hand. Again: this is not the case. After our destructor has run, the destructors of all members are run in reverse order of initialization. Since almost every container we need is already implemented in the stdlib, we will rarely ever have to define a destructor ourselves, since the default is almost always sufficient.

Let’s see an example:


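#include <algorithm>
#include <cstddef>

class my_string {
public:
	// Takes a char array (such as a string literal) by reference so that N can be deduced.
	template<std::size_t N>
	my_string(const char (&s)[N]) {
		data = new char[N];
		end = data + N;
		std::copy(s, s+N, data);
	}
	
	~my_string() {
		delete[] data;
	}
	
	// Copying is disabled; otherwise two objects would delete the same buffer.
	my_string(const my_string&) = delete;
	my_string& operator=(const my_string&) = delete;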
private:
	char* data;
	char* end;
};

int main() {
	my_string str{"foobar"};
	// no leak
}

While the above is certainly not a great example of how to implement a string-class, it already shows the centralized definition of the cleanup in my_string::~my_string that allows all users of the class to use it without having to worry about leaks. This is true even in the presence of exceptions, since destructors are called during stack-unwinding.

Let’s see another example:

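#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

class graph {
public:
	graph(std::vector<std::vector<std::size_t>> nodes): nodes{std::move(nodes)} {}
	
	const std::vector<std::size_t>& edges_at_node(std::size_t node_id) const {
		return nodes[node_id];
	}
private:
	std::vector<std::vector<std::size_t>> nodes;
};

int main() {
	// the outermost braces belong to graph, the next pair to the vector of nodes
	graph g{{{1, 2}, {0, 2}, {0, 1}}};
	// extra parentheses keep the comma from being read as a macro-argument separator
	assert((g.edges_at_node(1) == std::vector<std::size_t>{0, 2}));
}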

This time I didn’t even bother to define a destructor but there is still no leak since the compiler will still generate a destructor for graph that will call the destructor of nodes which prevents all leaks.
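The order in which a compiler-generated destructor cleans up the members can be observed with a little logging. The following is only a sketch; destruction_log, noisy and owner are hypothetical names that exist solely for this illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical log of destructor calls, used only for this illustration.
std::vector<std::string>& destruction_log() {
	static std::vector<std::string> log;
	return log;
}

// Records its name when it is destroyed.
struct noisy {
	std::string name;
	explicit noisy(std::string n) : name{std::move(n)} {}
	~noisy() { destruction_log().push_back(name); }
};

// No user-defined destructor: the compiler-generated one destroys
// the members in reverse order of their declaration.
struct owner {
	noisy first{"first"};
	noisy second{"second"};
};

void destruction_order_example() {
	destruction_log().clear();
	{
		owner o;
	} // o goes out of scope here, both members are destroyed
	assert((destruction_log() == std::vector<std::string>{"second", "first"}));
}
```

Nothing in owner mentions cleanup at all, yet both members are released, last one first.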

Revisiting Files

Let’s take another look at the above Java-example, where we print to a file and compare it to the following C++-version:

std::ofstream file{"filename"};
file << "foobar\n";
// no need to call file.close() in C++

Unlike the so-called “managed” languages like Java, C++ handles this extremely gracefully without any need to explicitly clean up anything. Let’s take a look at a more sophisticated example:

#include <fstream>
#include <mutex>
#include <string>

void use(const std::string& line); // assumed to be defined elsewhere

std::mutex global_mutex;

void my_fun(const std::string& dir) {
	std::string line;
	std::lock_guard<std::mutex> guard{global_mutex};
	// reading requires an input stream, hence std::ifstream
	std::ifstream file {dir + "filename"}; // no need to release the temporary std::string
	while(std::getline(file, line)) {
		use(line);
	}
	// no need to close file, the destructor does this
	// no need to unlock the mutex, the destructor of guard does this
	// no need to free line, the destructor does this
}

The above code would be much longer if we wanted to release everything manually in an exception-safe way without RAII. C++ is one of the few languages where it is as simple as shown above.
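To give an impression of what “much longer” means, here is a sketch that compares manual and RAII-based locking for just a single resource. To keep the effect observable without threads, it uses a made-up lockable type recording_lock instead of std::mutex (std::lock_guard works with any type that offers lock() and unlock()); may_throw, manual_locking and raii_locking are likewise hypothetical names.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

// Hypothetical lockable type that merely records whether it is held.
struct recording_lock {
	bool held = false;
	void lock() { held = true; }
	void unlock() { held = false; }
};

// Stand-in for any operation that may fail.
void may_throw(bool fail) {
	if (fail) {
		throw std::runtime_error{"failure"};
	}
}

// Manual version: every exit path has to unlock explicitly.
void manual_locking(recording_lock& lock, bool fail) {
	lock.lock();
	try {
		may_throw(fail);
	} catch (...) {
		lock.unlock();
		throw;
	}
	lock.unlock();
}

// RAII version: the destructor of guard unlocks on every exit path.
void raii_locking(recording_lock& lock, bool fail) {
	std::lock_guard<recording_lock> guard{lock};
	may_throw(fail);
}

void locking_example() {
	recording_lock lock;
	try {
		raii_locking(lock, true);
	} catch (const std::runtime_error&) {
	}
	assert(!lock.held); // released although the function threw
}
```

Both versions behave the same, but the manual one needs this boilerplate again for every further resource.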

Summary

C++ may not have Garbage-Collection, but it offers something much better: RAII and a great standard-library that provides the tools to even avoid having to write your own destructors.

If you write C++ in a way that forces you to do manual (explicit) resource management, you are simply doing it wrong. In C++ we use what I call implicit resource management (in contrast to “automatic memory management”).

std::endl is rarely what you want

In countless beginner tutorials, novices are taught that they can write a newline using std::endl; sometimes it is also mentioned that they can create a newline using '\n' in a string or character, but this is rarely what you see in real code.

Many programmers and most beginners are unaware of the real differences: A common misconception is that std::endl creates a platform-independent newline while \n does not. This is wrong! Both produce the correct character sequence on the system in question when written to a stream in text mode (whether this is a good idea is another topic), meaning that printing "\n" on Windows will indeed produce a carriage return and a linefeed (to disable this behavior, open the stream in binary mode).

The real difference is that std::endl prints a newline and flushes the stream-buffer after that, while '\n' only prints a newline. This may be no difference for a std::stringstream, but it definitely is for everything where flushing is expensive like filestreams.

If you are still unsure whether to use it, I should also point out that Bjarne accepted my proposal for the core guidelines to discourage its use.

If you really mean to flush, using std::cout << "foo\n" << std::flush; also has the advantage to clearly state your intent to do so and thereby helps to avoid confusion whether the flush is intentional.
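The difference can be made visible by counting how often the underlying buffer is asked to synchronize. This is only a sketch: counting_buf and the two helper functions are hypothetical names, and the buffer is an in-memory one, so the flushes counted here are cheap, unlike those of a file stream.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>

// Hypothetical stream buffer that counts how often it is asked to flush.
class counting_buf : public std::stringbuf {
public:
	int flushes = 0;
protected:
	int sync() override {
		++flushes;
		return 0; // report success
	}
};

// Writes `lines` lines terminated by std::endl and returns the number of flushes.
int flushes_with_endl(int lines) {
	counting_buf buf;
	std::ostream out{&buf};
	for (int i = 0; i < lines; ++i) {
		out << "line" << std::endl;
	}
	return buf.flushes;
}

// Same as above, but terminates the lines with '\n'.
int flushes_with_newline(int lines) {
	counting_buf buf;
	std::ostream out{&buf};
	for (int i = 0; i < lines; ++i) {
		out << "line" << '\n';
	}
	return buf.flushes;
}

void flush_count_example() {
	assert(flushes_with_endl(3) == 3);    // one flush per line
	assert(flushes_with_newline(3) == 0); // no flush requested
}
```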

tl;dr: Unless you really mean to flush the buffer, don’t use std::endl;

Undefined Behavior

Undefined behavior is the C and C++ term for the kind of behavior that certain types of illegal operations result in. There are a lot of myths regarding that topic, but it is actually not that hard. This section tries to list the most important ones and answer how true they are.

“I can rely on undefined behavior”/ “I know what my CPU does”

This is a quite common myth and it is wrong and dangerous. Yes, you may know that on x86 and all other two’s-complement machines INT_MAX + 1 will wrap around and result in INT_MIN at the hardware level. What you will however never know is what all of your future compilers will make out of that. Since signed integer overflow is undefined, a compiler may well start at some point to optimize based on the assumption that this overflow will never happen. The results of that won’t be to your liking.

In fact, this very specific example already happened: GCC started to optimize like that and lots of projects started to get weird problems. This is basically guaranteed to happen again, so never, ever rely on undefined behavior.

“Undefined behavior is just a burden when writing code.”

This is also a common opinion. While people may accept that UB allows the creation of faster code, they dislike the idea of having to write it in a different way for that.

This is not the case. The great thing about undefined behavior is that it basically always results from bad code to begin with. Let’s look at this overflow check:

int my_fun(int i1, int i2) {
	assert(i1 > 0);
	assert(i2 > 0);
	auto tmp = i1 + i2;
	if (tmp < 0) {
		throw std::overflow_error{"integer-overflow"};
	}
	return tmp;
}

Assuming for a moment that this would be defined: Is it really what you want?

Let’s think about the semantic meaning of that code: Do you really want to know whether the sum of two positive integers is negative?

It’s much more likely that you want to know whether the result of i1 + i2 would overflow INT_MAX, isn’t it? So just write that:

int my_fun(int i1, int i2) {
	assert(i1 > 0);
	assert(i2 > 0);
	if (INT_MAX - i1 < i2) {
		throw std::overflow_error{"integer-overflow"};
	}
	return i1 + i2;
}

Since an overflow is undefined we cannot check for $i_1 + i_2 \leq \mathrm{INT\_MAX}$ directly, but the mathematically sound conversion to subtract $i_1$ from both sides, which yields $i_2 \leq \mathrm{INT\_MAX} - i_1$, is trivial and obviously makes a lot of sense.

So, we had to rewrite our question a little bit to conform with the implementation, but the result is still much better code (and in fact saved us a line).

To prevent the usual argument that one might not know the exact types of the variables (for instance because this is a template): std::numeric_limits exists for a reason:

template<typename Num1, typename Num2>
auto my_template(Num1 i1, Num2 i2) {
	assert(i1 > 0);
	assert(i2 > 0);
	if(std::numeric_limits<decltype(i1 + i2)>::max() - i1 < i2) {
		throw std::overflow_error{"integer-overflow"}; // std::overflow_error has no default constructor
	}
	return i1 + i2;
}

Signed integer-overflow is of course only one example, but most others are either very similar or dangerous even without an optimizing compiler (like out-of-bounds array-access).
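Returning to the addition check for a moment: the versions above assume that both operands are positive. If that assumption is dropped, the same idea of rearranging the comparison still works. The following is a sketch of one common way to do it; checked_add is a name I made up for this illustration, not a standard function.

```cpp
#include <cassert>
#include <limits>
#include <stdexcept>

// Adds two ints and throws instead of overflowing; works for any sign.
int checked_add(int a, int b) {
	constexpr int int_max = std::numeric_limits<int>::max();
	constexpr int int_min = std::numeric_limits<int>::min();
	// For b > 0, int_max - b cannot overflow; for b < 0, int_min - b cannot either.
	if (b > 0 && a > int_max - b) {
		throw std::overflow_error{"integer-overflow"};
	}
	if (b < 0 && a < int_min - b) {
		throw std::overflow_error{"integer-underflow"};
	}
	return a + b; // guaranteed to be representable at this point
}

void checked_add_example() {
	assert(checked_add(20, 22) == 42);
	assert(checked_add(-20, -22) == -42);
}
```

Every comparison in this function is well-defined for all inputs, so there is nothing left that an optimizer could legally remove.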

The take-home message is that you shouldn’t view undefined behavior as a burden, but as something that forces you to write clean and semantic code, which will make you happier at a later point. It also enables you to shut down discussions about whether some code is good enough by saying “This is forbidden by the language, change it!”.

So: Don’t complain about undefined behavior and instead be happy that you don’t have to face some of the bad code that is legal in other languages.


  1. Mostly on /r/cpp_questions and somewhat less often on /r/cpp

  2. This is due to Cache-effects.