Mike’s Place

Minimal C++

I keep talking about (and getting asked about) something that I refer to as “Minimal C++”. To be clear, there are a few points I should start off by stating explicitly:

If you are a C programmer, this model may help you greatly with the expressiveness of your code: you truly give up nothing when it comes to the compact code that you’re used to, and you gain stronger type safety and clearer object-oriented programming. The result is a program which is easier to read and reason about, which translates to fewer bugs and increased productivity.

There is something else I should point out: some programs depend heavily on the abstractions found in the C++ standard libraries. Such programs are poor candidates for Minimal C++ as I describe it here. Truly, this is meant for C programs written in the C++ language. Programs which are designed to make heavy use of “idiomatic” C++ (or are dependent on modules or libraries which themselves do) are definitely not candidates for this model.

Assumptions

I make a few assumptions of the reader here:

If these assumptions do not fit you, this programming model may also not fit you. As always, your experience may vary.

NOTE: I am considering doing something of a mini-podcast series, assuming there is enough interest, aimed at programmers who don’t have two decades of low-level C programming under their belt. This post is something of a 30,000-foot overview. If you are interested in such a podcast series, drop me a line; contact information is in the footer.

Basic Premise

The general idea is to use the C++ compiler to compile, and the C compiler to link, avoiding the use of C++ libraries which depend on anything outside of the standard C library itself.

Part 1: C++ Memory Management

In C, we call malloc(3) or one of its friends to allocate memory. When we are finished with that memory, the free(3) function is called on the pointer to that memory.

In C++, things are a bit different. C++ uses the new and delete operators, which allow objects to be created (and initialized!) as well as destroyed and the memory returned to the heap. This confers two significant benefits: first, we do not need to know how much memory to allocate (the C++ compiler will take care of that for us based on the size of the object, which is determined at compile time), and second, the memory will be initialized by a constructor function.

Here is a completely self-contained example which can be compiled to an object file using a C++ compiler, and then linked into a fully-functional program using only the C compiler frontend:

#include <cstdio>
#include <cstdint>
#include <cstdlib>

class Thing {
private:
	uint32_t mCounter;

public:
	Thing();
	~Thing();
};

Thing::Thing() : mCounter(0) {
	fprintf(stderr, "Object created.\n");
}

Thing::~Thing() {
	fprintf(stderr, "Object destroyed.\n");
}

void *
operator new(size_t size) {
	fprintf(stderr, "Allocating %zu byte(s) of memory\n", size);
	return calloc(1, size);
}

void
operator delete(void *ptr) {
	fprintf(stderr, "Freeing memory\n");
	free(ptr);
}

int
main(int argc, char *argv[]) {
	Thing *t = new Thing;
	delete t;

	return 0;
}

To compile this program:

g++ -std=c++11 -fno-exceptions -c -o min0.o min0.cc

And then to link this program:

gcc -o min0 min0.o

Finally, run it:

$ ./min0
Allocating 4 byte(s) of memory
Object created.
Object destroyed.
Freeing memory

You should recognize that the four bytes of allocated memory belong to the 32-bit integer; it should be pretty apparent at this point that a class is similar to a struct, but with extra “glue”.
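A minimal sketch can make the "struct with extra glue" point concrete. The names PlainThing and counter() are hypothetical and exist only for this illustration; the size equality holds on the common ABIs I am aware of, but treat it as an observation about typical compilers rather than a guarantee for every platform.

```cpp
#include <cstdint>
#include <cstdio>

// A plain C-style struct holding one 32-bit counter.
struct PlainThing {
	uint32_t counter;
};

// The same data wrapped in a class with a constructor, a destructor,
// and an accessor (all hypothetical, for illustration only).
class Thing {
private:
	uint32_t mCounter;

public:
	Thing() : mCounter(0) {}
	~Thing() {}
	uint32_t counter() const { return mCounter; }
};

// Member functions are ordinary functions that receive a hidden pointer
// to the object; they are not stored inside it. With no virtual functions,
// the object is just its data members.
static_assert(sizeof(Thing) == sizeof(PlainThing),
              "the class adds no per-object overhead here");
```

Because the checks are static_asserts, compiling the file with the same g++ -c command used above is enough to verify them; nothing has to run.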

Example Program, Explained

This is a truly minimal C++ program, in the sense that it uses the C++ language but depends only on the standard C library. The only two runtime components from C++ that are required are the operators new and delete, and they are supplied in the source code for the example.

This assertion can be verified:

$ ll 
total 52K
-rwxr-xr-x 1 user users  17K Sep 17 23:16 min0
-rw-r--r-- 1 user users  579 Sep 18 07:21 min0.cc
-rw-r--r-- 1 user users 2.8K Sep 17 23:16 min0.o

$ ldd min0
        linux-vdso.so.1 (0x00007ffd37919000)
        libc.so.6 => /usr/lib/libc.so.6 (0x00007f79cb123000)
        /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f79cb389000)

We know that the C++ library is nowhere to be found in this program for three reasons:

So, then, let’s talk about the few things that are likely unfamiliar to the C programmer:

A Detailed Analysis for C Programmers

If you’re familiar with what you’ve seen so far, and understand all of the parts of the program shown above, you can skip this section.

Let’s start by showing the same program, but in pure C:

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct thing_t thing_t;
struct thing_t {
	uint32_t counter;
};

thing_t *thing_new(void) {
	fprintf(stderr, "Allocating %zu byte(s) of memory\n", sizeof(thing_t));
	thing_t *self = calloc(1, sizeof(*self));
	if (self == NULL)
		return NULL;	/* allocation failed; nothing to initialize */
	self->counter = 0;

	fprintf(stderr, "Object created.\n");
	return self;
}

void thing_delete(thing_t *self) {
	fprintf(stderr, "Object destroyed.\n");
	fprintf(stderr, "Freeing memory\n");
	free(self);
}

int
main(int argc, char *argv[]) {
	thing_t *t = thing_new();
	thing_delete(t);

	return 0;
}

The first thing you’ll notice is that, in this case, the C program is actually smaller. This is because we have only one type of object (Thing in the C++ program, and struct thing_t in the C program). But the C functions thing_new() and thing_delete(thing_t *) each perform two jobs (memory allocation or freeing, plus object initialization or finalization) instead of one, and the equivalent functions for every other type in a C program have to repeat both jobs as well.

This brings us to the very first reason why a C programmer might want to use minimal C++: it allows us to separate the concerns of memory management from those of object management, resulting in code which is clearer than straight C (in C++, you do not typically see operator new or operator delete defined in the same source code module as any program logic; it is usually in its own module).
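As a rough sketch of that separation, the allocator below gains leak bookkeeping without the class changing at all. Everything here is an assumption made for illustration: the counter, the liveAllocations() helper, the decision to abort() on allocation failure, and the memory.cc/thing.cc split named in the comments (shown as one file so that it compiles on its own).

```cpp
#include <cstdio>
#include <cstdlib>

// ---- memory.cc (hypothetical): the only code that knows how memory is obtained ----

static long gLiveAllocations = 0;	// hypothetical bookkeeping for this sketch

void *
operator new(size_t size) {
	void *ptr = calloc(1, size ? size : 1);
	if (ptr == nullptr)
		abort();	// exceptions are disabled, so fail hard
	gLiveAllocations++;
	return ptr;
}

void
operator delete(void *ptr) noexcept {
	if (ptr == nullptr)
		return;
	gLiveAllocations--;
	free(ptr);
}

// C++14 and later may call the sized form instead; forward it.
void
operator delete(void *ptr, size_t) noexcept {
	operator delete(ptr);
}

long
liveAllocations() {
	return gLiveAllocations;
}

// ---- thing.cc (hypothetical): object logic only, no allocation code ----

class Thing {
private:
	unsigned mCounter;

public:
	Thing() : mCounter(0) {}
	void bump() { mCounter++; }
	unsigned counter() const { return mCounter; }
};
```

A caller writes new Thing and delete as usual; swapping calloc for a pool or an arena later would touch only the first half of the file.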

The next thing to notice is that in the thing_delete function, we explicitly pass a pointer to the object that must be freed. There are two problems with this:

C++ makes this easier for us: deleting a dynamically allocated object first calls the destructor (passing the pointer to the to-be-deleted object) so that finalization can occur, and then frees the memory. This means that if the object itself manages any resources outside of the memory that it requires, it can release those resources at that time. No extra involvement is required from the programmer.
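Here is a small sketch of an object that owns something other than memory. The ScratchFile class, its gOpenFiles counter, and the openFiles() helper are hypothetical names invented for this example; tmpfile(3) comes from the standard C library. If you link with gcc rather than g++, the operator new and operator delete definitions from min0.cc need to be linked in as well.

```cpp
#include <cstdio>

static int gOpenFiles = 0;	// hypothetical counter, only here to make cleanup observable

class ScratchFile {
private:
	FILE *mFile;

public:
	ScratchFile();
	~ScratchFile();

	bool isOpen() const { return mFile != nullptr; }
	void write(const char *cstr);
};

ScratchFile::ScratchFile() : mFile(tmpfile()) {
	if (mFile != nullptr)
		gOpenFiles++;
}

ScratchFile::~ScratchFile() {
	// Runs automatically as part of `delete`, before the memory is freed.
	if (mFile != nullptr) {
		fclose(mFile);
		gOpenFiles--;
	}
}

void
ScratchFile::write(const char *cstr) {
	if (mFile != nullptr)
		fputs(cstr, mFile);
}

int
openFiles() {
	return gOpenFiles;
}
```

A caller only writes ScratchFile *f = new ScratchFile; and later delete f;. There is no separate close call to forget, which is the point of the paragraph above.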

Put a little differently:

Part 2: Object Oriented Programs

Experienced C programmers very often embrace the notion of “object oriented C,” which usually winds up looking similar to the struct thing_t example above: a data structure is defined, functions which alloc/init/finalize/free are written, an API header is created, and the C source and API headers are used in multiple programs to use the “object”.

A popular library in widespread use is GLib; it used to be part of GTK+ itself back in the v1.0 days, but then it was separated out since it was useful in non-graphical applications which are written in C. Perhaps the best way to describe GLib is that it is a general-purpose, low-level, mostly object-oriented set of libraries, complete with support for inheritance, virtual functions, interfaces, and more. However, all of the machinery that is required to support those functions is complicated and hard to understand; a “simple” example is provided in How to define and implement a new GObject, which shows that a significant amount of boilerplate code is required in order to use it in a crash-free manner.

In fact, proper usage of GLib from C is so difficult for so many people that a new programming language similar to C# but based around GLib and GObject (called Vala) was created to make it a simpler and easier thing to do, eliminating the need for (programmer-provided) boilerplate code. The code that Vala generates is, however, significantly worse than what can be generated by your local C++ compiler. What’s more: the same idea can be expressed more concisely (and more legibly!) in C++ than in C+GObject, and likely with less memory and processor usage. Someday, perhaps, I’ll do some actual code size/performance tests between the two and write about it. Vala is legible enough, but it comes with its own issues and problems; when I initially evaluated it, it would happily generate code with implementation-defined or even undefined behavior.

In C++, a class declaration is fairly small and requires no preprocessor assistance. If you wanted to write a ring buffer in minimal C++, the header for it might look something like this:

#pragma once

#include <cstdint>
#include <cstdlib>

class RingBuffer {
private:
        struct Private;
        Private *mPriv;

public:
        RingBuffer(size_t length);
        ~RingBuffer();

        size_t alloc();                 // Buffer's total size?
        size_t size();                  // How much data contained?

        void append(uint8_t byte);
        void append(const char *cstr);
        void append(size_t len, const uint8_t *buffer);
        void read(size_t buflen, uint8_t *buf);
};

C programmers might find this example a little confusing. Here’s what’s new about it:

Some things that we do not notice about it right away, but would become clear when compiled and the symbols looked at:

Note that the implementation is left as an exercise for the reader, if desired.
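For readers who would rather start from something, one possible implementation is sketched below. It is only a sketch under stated assumptions: the layout of Private, the choice to silently drop bytes once the buffer is full, and the use of calloc/free instead of new/delete are all decisions made here for illustration, not something the header dictates. The class declaration is repeated so that the file compiles on its own.

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>

class RingBuffer {
private:
	struct Private;
	Private *mPriv;

public:
	RingBuffer(size_t length);
	~RingBuffer();

	size_t alloc();
	size_t size();

	void append(uint8_t byte);
	void append(const char *cstr);
	void append(size_t len, const uint8_t *buffer);
	void read(size_t buflen, uint8_t *buf);
};

// Hidden state: only this source file knows the layout (an assumption).
struct RingBuffer::Private {
	uint8_t *data;
	size_t capacity;
	size_t head;	// index of the oldest stored byte
	size_t count;	// number of bytes currently stored
};

RingBuffer::RingBuffer(size_t length) {
	// calloc/free keep this sketch independent of operator new/delete.
	mPriv = static_cast<Private *>(calloc(1, sizeof(Private)));
	if (mPriv == nullptr)
		abort();
	mPriv->data = static_cast<uint8_t *>(calloc(length ? length : 1, 1));
	if (mPriv->data == nullptr)
		abort();
	mPriv->capacity = length;
}

RingBuffer::~RingBuffer() {
	free(mPriv->data);
	free(mPriv);
}

size_t RingBuffer::alloc() { return mPriv->capacity; }
size_t RingBuffer::size() { return mPriv->count; }

void RingBuffer::append(uint8_t byte) {
	if (mPriv->count == mPriv->capacity)
		return;	// assumption: new data is dropped when the buffer is full
	size_t tail = (mPriv->head + mPriv->count) % mPriv->capacity;
	mPriv->data[tail] = byte;
	mPriv->count++;
}

void RingBuffer::append(size_t len, const uint8_t *buffer) {
	for (size_t i = 0; i < len; i++)
		append(buffer[i]);
}

void RingBuffer::append(const char *cstr) {
	append(strlen(cstr), reinterpret_cast<const uint8_t *>(cstr));
}

void RingBuffer::read(size_t buflen, uint8_t *buf) {
	// Copies at most size() bytes, oldest first; callers check size() beforehand.
	while (buflen > 0 && mPriv->count > 0) {
		*buf++ = mPriv->data[mPriv->head];
		mPriv->head = (mPriv->head + 1) % mPriv->capacity;
		mPriv->count--;
		buflen--;
	}
}
```

Since read() returns nothing, a caller would typically ask size() first and request no more than that many bytes.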

What About Limitations?

There are a few limitations:

None of these limitations should be “painful” for a C programmer just trying to make the logic of a program more legible.

Conclusion

Honestly, the point is that C++ doesn’t have to be hungry for memory or processor cycles. A C++ program can be efficient, in terms of number of (compiled) instructions required in order to accomplish a task or function. It is possible for a C++ program to operate with deterministic memory usage, deterministic runtime, and without any “bloat” (that is, functionality that is not required for the program’s operation).

The bloat that we see in everyday software written in C++ is more a function of programmers choosing to use abstractions which themselves are built on a series of layers of abstractions, and not “first-degree” abstractions.

Some C++ functionality, such as RTTI and exceptions, is unavailable unless you go to extra effort to provide support for it. C programs do not use these sorts of features anyway, and honestly, most programs get by wonderfully well without them.
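As a sketch of what getting by without exceptions can look like, the class below keeps its constructor trivial and reports failure through a C-style return code. The Reader class and its open() and isOpen() members are hypothetical, invented for this illustration; the pattern is simply the errno convention that C programmers already use.

```cpp
#include <cerrno>
#include <cstdio>

class Reader {
private:
	FILE *mFile;

public:
	Reader() : mFile(nullptr) {}	// cannot fail, so there is nothing to report
	~Reader();

	int open(const char *path);	// returns 0 on success, an errno value on failure
	bool isOpen() const { return mFile != nullptr; }
};

Reader::~Reader() {
	if (mFile != nullptr)
		fclose(mFile);
}

int
Reader::open(const char *path) {
	if (mFile != nullptr)
		return EBUSY;	// already open
	errno = 0;
	mFile = fopen(path, "rb");
	if (mFile != nullptr)
		return 0;
	// fopen() usually sets errno; fall back to EIO if it did not.
	return errno != 0 ? errno : EIO;
}
```

Work that can fail lives in open() rather than in the constructor, so the caller checks a return value exactly as it would after a C function call.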

Minimal C++ programs can be very tiny when using a lightweight standard C library, such as musl libc or newlib. They will still be somewhat large if linked with glibc, particularly if the standard I/O portion of the library is used.

Thanks for reading.

If you appreciated this article (or anything else I’ve written), please consider donating to help me out with my expenses—and thanks!