I keep talking about, and getting asked about, something that I refer to as “Minimal C++”. To be clear, there are a few points I should start off by stating explicitly:
If you are a C programmer, this model may help you greatly with the expressiveness of your code: you truly give up nothing when it comes to the compact code that you’re used to, and you gain stronger type safety and clearer object-oriented programming. The result is a program which is easier to read and reason about, which translates to fewer bugs and increased productivity.
There is something else I should point out: some programs depend heavily on the abstractions found in the C++ standard libraries. Such programs are poor candidates for Minimal C++ as I describe it here. Truly, this is meant for C programs written in the C++ language. Programs which are designed to make heavy use of “idiomatic” C++ (or are dependent on modules or libraries which themselves do) are definitely not candidates for this model.
I make a few assumptions of the reader here:
If these assumptions do not fit you, this programming model may also not fit you. As always, your experience may vary.
NOTE: I am considering doing something of a mini-podcast series, assuming there is enough interest, aimed at programmers who don’t have two decades of low-level C programming under their belt. This post is something of a 30,000 foot overview. If you are interested in such a podcast series, drop me a line; contact information is in the footer.
The general idea is to use the C++ compiler to compile, and the C compiler to link, avoiding the use of C++ libraries which depend on anything outside of the standard C library itself.
In C, we call malloc(3) or one of its friends to allocate memory. When we are finished with that memory, the free(3) function is called on the pointer to that memory.
In C++, things are a bit different. C++ uses the new and delete operators, which allow objects to be created (and initialized!) as well as destroyed and the memory returned to the heap. This confers two significant benefits: first, we do not need to know how much memory to allocate (the C++ compiler will take care of that for us based on the size of the object, which is determined at compile time), and second, the memory will be initialized by a constructor function.
Here is a completely self-contained example which can be compiled to an object file using a C++ compiler, and then linked into a fully-functional program using only the C compiler frontend:
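A minimal sketch of such a source file, assuming a class named Thing that holds a single int. It is an illustration written to be consistent with the output shown in this post, not necessarily the exact listing, so any line numbers mentioned later do not apply to it. The run_demo() helper is hypothetical and stands in for the body of main(); the sized form of operator delete is an extra that only matters when building as C++14 or later.

```cpp
// C++ language features, C library only.
#include <stdio.h>   // fprintf
#include <stdlib.h>  // malloc, free, abort, size_t

// Global allocation operators built on the C allocator. Supplying these
// ourselves is what removes the need for the C++ runtime library.
void *operator new(size_t size)
{
    fprintf(stderr, "Allocating %zu byte(s) of memory\n", size);
    void *ptr = malloc(size);
    if (ptr == nullptr)
        abort();  // no exceptions available, so fail hard
    return ptr;
}

void operator delete(void *ptr) noexcept
{
    fprintf(stderr, "Freeing memory\n");
    free(ptr);
}

// Only called when building as C++14 or later, where the compiler may use
// the sized form of operator delete; it forwards to the one above.
void operator delete(void *ptr, size_t) noexcept
{
    operator delete(ptr);
}

class Thing {
public:
    Thing() : value(0) { fprintf(stderr, "Object created.\n"); }
    ~Thing() { fprintf(stderr, "Object destroyed.\n"); }
    int value;  // the only data member
};

// Hypothetical helper standing in for the body of main().
int run_demo()
{
    Thing *thing = new Thing;  // operator new, then Thing::Thing()
    int result = thing->value;
    delete thing;              // Thing::~Thing(), then operator delete
    return result;
}

// For a standalone program, add: int main() { return run_demo(); }
```

On a platform where int is four bytes, calling run_demo() should write the same four lines to standard error as the run shown in this post.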
To compile this program:
g++ -std=c++11 -fno-exceptions -c -o min0.o min0.cc
And then to link this program:
gcc -o min0 min0.o
Finally, run it:
$ ./min0
Allocating 4 byte(s) of memory
Object created.
Object destroyed.
Freeing memory
You should recognize that the four bytes of allocated memory belong to the 32-bit integer; it should be pretty apparent at this point that a class is similar to a struct, but with extra “glue”.
This is a truly minimal C++ program, in the sense that it uses the C++ language but depends only on the standard C library. The only two runtime components from C++ that are required are the operators new and delete, and they are supplied in the source code for the example.
This assertion can be verified:
$ ll
total 52K
-rwxr-xr-x 1 user users 17K Sep 17 23:16 min0
-rw-r--r-- 1 user users 579 Sep 18 07:21 min0.cc
-rw-r--r-- 1 user users 2.8K Sep 17 23:16 min0.o
$ ldd min0
linux-vdso.so.1 (0x00007ffd37919000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007f79cb123000)
/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f79cb389000)
We know that the C++ library is nowhere to be found in this program for three reasons:
- We used the gcc compiler driver, and not g++, to do the linking (we could have done the linking by hand, but this is 2018, not 1996; and we are not creating an operating system kernel or a statically linked binary with a specified memory layout).
- The only dependencies that ldd reports are the C library (libc.so.6) and the dynamic linker (“ELF interpreter”, ld-linux-x86-64.so.2). The linux-vdso.so.1 entry is not requested by the executable itself, but is provided by the Linux kernel that I am running on. (You can verify this by following along and using readelf -Wa min0 | less on the executable.)
So, then, let’s talk about the few things that are likely unfamiliar to the C programmer:
- […] this in the code itself.
- […] new and delete, which are required by the compiler when creating instances of the class on the heap. (These operators are not used when creating instances of the class on the stack.)
- […] Thing, and on line 37 we see how to free it as well. This is where the operators new and delete are actually used. As we saw when we ran the compiled executable, when we call upon the new operator, the memory for the object is allocated, and then the method Thing::Thing() (known as the constructor) is run, and we see “Object created” on the program’s standard error stream. When we call upon the delete operator, the method Thing::~Thing() is run, and we see “Object destroyed” on the program’s standard error stream, after which the memory for the object is freed.
If you’re familiar with what you’ve seen so far, and understand all of the parts of the program shown above, you can skip this section.
Let’s start by showing the same program, but in pure C:
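A sketch of what the C version could look like, assuming the struct thing_t, thing_new() and thing_delete() names used in this post. It is an illustration rather than the exact listing, and it is written in the subset shared by C and C++ (note the cast on malloc) so that it builds with either compiler. The run_demo() helper is hypothetical and stands in for the body of main().

```cpp
/* Builds as C or as C++: only the common subset is used. */
#include <stdio.h>   /* fprintf */
#include <stdlib.h>  /* malloc, free */

typedef struct thing_t {
    int value;
} thing_t;

/* Allocates AND initializes: two jobs in one function. */
thing_t *thing_new(void)
{
    thing_t *thing;

    fprintf(stderr, "Allocating %zu byte(s) of memory\n", sizeof(thing_t));
    thing = (thing_t *)malloc(sizeof(thing_t));
    if (thing == NULL)
        return NULL;

    thing->value = 0;
    fprintf(stderr, "Object created.\n");
    return thing;
}

/* Finalizes AND frees: again two jobs in one function. */
void thing_delete(thing_t *thing)
{
    fprintf(stderr, "Object destroyed.\n");
    fprintf(stderr, "Freeing memory\n");
    free(thing);
}

/* Hypothetical helper standing in for the body of main(). */
int run_demo(void)
{
    thing_t *thing = thing_new();
    if (thing == NULL)
        return 1;
    thing_delete(thing);
    return 0;
}
```

Nothing in the type system ties thing_delete() to memory that actually came from thing_new(); that pairing is a convention the programmer has to uphold.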
The first thing you’ll notice is that, in this case, the C program is actually smaller. This is because we have only one type of object (Thing in the C++ program, and struct thing_t in the C program). But the C functions thing_new() and thing_delete(thing_t *) each perform two jobs (memory allocation and freeing, plus object initialization/finalization) instead of one, and both jobs have to be repeated by all such functions for every other type in a C program.
This brings us to the very first reason why a C programmer might want to use minimal C++: it allows us to separate the concerns of memory management from those of object management, resulting in code which is clearer than straight C (in C++, you do not typically see operator new or operator delete defined in the same source code module as any program logic; they are usually in their own module).
The next thing to notice is that in the thing_delete function, we explicitly pass a pointer to the object that must be freed. There are two problems with this:
- […] operator delete to do it for us.
- […] thing_delete() function, which might then crash. Of course, since it is just a couple of fprintf calls and a call passing the pointer to free, we know that in this particular case, we can get away with passing a pointer to any memory which we got from any of the allocator functions (malloc and friends). But that won’t be the case for anything but the most trivial of examples.
C++ makes this easier for us: it allows us to delete a dynamically allocated object, which not only frees the memory, but first calls the destructor (passing the pointer to the to-be-deleted object) so that finalization can occur. This means that if the object itself manages any resources outside of the memory that it requires, it can release those resources at that time. No extra involvement is required from the programmer.
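To make that concrete, here is a small sketch under my own assumptions: Buffer is a hypothetical class that owns a second block of memory besides its own storage, and the released counter exists only so the effect can be observed. The class-specific operator new and operator delete are there to keep the sketch self-contained; a program with global replacements would not need them.

```cpp
#include <stdio.h>   // snprintf
#include <stdlib.h>  // malloc, free, size_t

// Hypothetical class that manages a resource beyond its own memory.
class Buffer {
public:
    explicit Buffer(size_t size)
        : data(static_cast<char *>(malloc(size))), length(data ? size : 0) {}

    // Finalization: release the resource this object manages.
    ~Buffer() { free(data); ++released; }

    // Class-specific allocation on top of the C allocator. Declaring
    // operator new noexcept makes a failed malloc yield a null pointer.
    static void *operator new(size_t size) noexcept { return malloc(size); }
    static void operator delete(void *ptr) noexcept { free(ptr); }

    Buffer(const Buffer &) = delete;             // one owner per buffer
    Buffer &operator=(const Buffer &) = delete;

    char *data;
    size_t length;
    static int released;  // counts destructor runs, for demonstration only
};

int Buffer::released = 0;

// Returns the number of characters written, or -1 on allocation failure.
int use_buffer()
{
    Buffer *scratch = new Buffer(64);
    if (scratch == nullptr)
        return -1;

    int written = -1;
    if (scratch->data != nullptr)
        written = snprintf(scratch->data, scratch->length, "%s", "hello");

    delete scratch;  // runs ~Buffer() (frees data), then frees the object
    return written;
}
```

The caller writes a single delete; releasing the inner buffer is the destructor's business and cannot be forgotten at the call site.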
Put a little differently:
Experienced C programmers very often embrace the notion of “object oriented C,” which usually winds up looking similar to the struct thing_t example above: a data structure is defined, functions which alloc/init/finalize/free are written, an API header is created, and the C source and API headers are used in multiple programs to use the “object”.
A popular library in widespread use is GLib; it used to be part of GTK+ itself back in the v1.0 days, but then it was separated out since it was useful in non-graphical applications which are written in C. Perhaps the best way to describe GLib is that it is a general-purpose, low-level, mostly object-oriented set of libraries, complete with support for inheritance, virtual functions, interfaces, and more. However, all of the machinery that is required to support those functions is complicated and hard to understand; a “simple” example is provided in How to define and implement a new GObject, which shows that a significant amount of boilerplate code is required in order to use it in a crash-free manner.
In fact, proper usage of GLib from C is so difficult for so many people that a new programming language similar to C# but based around GLib and GObject (called Vala) was created to make it a simpler and easier thing to do, eliminating the need for (programmer-provided) boilerplate code. The code that Vala generates is, however, significantly worse than what can be generated by your local C++ compiler. What’s more: the same idea can be expressed more concisely (and more legibly!) in C++ than in C+GObject, and likely with less memory and processor usage. Someday, perhaps, I’ll do some actual code size/performance tests between the two and write about it. Vala is legible enough, but it comes with its own issues and problems; when I initially evaluated it, it would happily generate code with implementation-defined or even undefined behavior.
In C++, a class declaration is fairly small and requires no preprocessor assistance. If you wanted to write a ring buffer in minimal C++, the header for it might look something like this:
C programmers might find this example a little confusing. Here’s what’s new about it:
- […] struct ringbuffer_private_t. The full name of the struct’s type in this example is struct RingBuffer::Private.
- […] operator new and operator delete.
- […] c to their name and dropping the .h.
Some things that we do not notice about it right away, but would become clear when compiled and the symbols looked at:
- […] alloc function will be RingBuffer::alloc in the C++ source code, but in the binary it will be something like _ZN10RingBuffer5allocEv (when compiled with any compiler conforming to the Itanium ABI).
- […] virtual, but this is a more advanced topic. (Some research should clear the issue for programmers not familiar with the whole virtual/non-virtual thing.)
Note that the implementation is left as an exercise for the reader, if desired.
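For readers who would like a starting point for that exercise, here is one possible sketch. Only the names RingBuffer, RingBuffer::Private and alloc come from this post; the put(), get() and constructor signatures, the byte-oriented storage, and the choice to define Private inside the class are my assumptions for illustration, not a definitive design.

```cpp
#include <cstddef>  // std::size_t
#include <cstdint>  // std::uint8_t
#include <cstdlib>  // std::malloc, std::free

class RingBuffer {
public:
    explicit RingBuffer(std::size_t capacity);
    ~RingBuffer();

    bool alloc();                  // acquire backing storage; false on failure
    bool put(std::uint8_t byte);   // false when full or not yet allocated
    bool get(std::uint8_t *byte);  // false when empty

    // Class-specific allocation on top of the C allocator; noexcept means
    // a failed malloc makes the new-expression yield a null pointer.
    static void *operator new(std::size_t size) noexcept { return std::malloc(size); }
    static void operator delete(void *ptr) noexcept { std::free(ptr); }

    RingBuffer(const RingBuffer &) = delete;
    RingBuffer &operator=(const RingBuffer &) = delete;

private:
    struct Private {               // full name: RingBuffer::Private
        std::uint8_t *data;
        std::size_t capacity;
        std::size_t head;          // index of the oldest byte
        std::size_t count;         // number of bytes stored
    };
    Private priv;
};

RingBuffer::RingBuffer(std::size_t capacity)
{
    priv.data = nullptr;
    priv.capacity = capacity;
    priv.head = 0;
    priv.count = 0;
}

RingBuffer::~RingBuffer() { std::free(priv.data); }

bool RingBuffer::alloc()
{
    if (priv.data != nullptr || priv.capacity == 0)
        return false;
    priv.data = static_cast<std::uint8_t *>(std::malloc(priv.capacity));
    return priv.data != nullptr;
}

bool RingBuffer::put(std::uint8_t byte)
{
    if (priv.data == nullptr || priv.count == priv.capacity)
        return false;
    priv.data[(priv.head + priv.count) % priv.capacity] = byte;
    ++priv.count;
    return true;
}

bool RingBuffer::get(std::uint8_t *byte)
{
    if (priv.count == 0)
        return false;
    *byte = priv.data[priv.head];
    priv.head = (priv.head + 1) % priv.capacity;
    --priv.count;
    return true;
}
```

Typical use would be to create the object with new RingBuffer(n), check the pointer, call alloc() once, and then put() and get() bytes until the buffer is deleted.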
There are a few limitations:
- […] dynamic_cast operator provided by C++ is unavailable to such programs.
None of these limitations should be “painful” for a C programmer just trying to make the logic of a program more legible.
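As an illustration of living without dynamic_cast, one familiar C-style substitute is an explicit type tag checked before a static_cast. The Shape, Circle and Square types and the as_circle() helper below are hypothetical names made up for this sketch, and this is only one of several ways to approach it.

```cpp
// The tag plays the role that run-time type information would otherwise play.
struct Shape {
    enum Kind { CIRCLE, SQUARE };
    explicit Shape(Kind k) : kind(k) {}
    Kind kind;
};

struct Circle : Shape {
    explicit Circle(int r) : Shape(CIRCLE), radius(r) {}
    int radius;
};

struct Square : Shape {
    explicit Square(int s) : Shape(SQUARE), side(s) {}
    int side;
};

// Checked downcast: yields a null pointer when the tag does not match,
// much as dynamic_cast would for a pointer of the wrong dynamic type.
const Circle *as_circle(const Shape *shape)
{
    return shape->kind == Shape::CIRCLE
               ? static_cast<const Circle *>(shape)
               : nullptr;
}
```

This is the same discipline a C programmer already applies to tagged structs, with the compiler checking the inheritance relationship in the static_cast.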
Honestly, the point is that C++ doesn’t have to be hungry for memory or processor cycles. A C++ program can be efficient, in terms of number of (compiled) instructions required in order to accomplish a task or function. It is possible for a C++ program to operate with deterministic memory usage, deterministic runtime, and without any “bloat” (that is, functionality that is not required for the program’s operation).
The bloat that we see in everyday software written in C++ is more a function of programmers choosing to use abstractions which themselves are built on a series of layers of abstractions, and not “first-degree” abstractions.
Some C++ functionality, such as RTTI and exceptions, is unavailable unless you go to extra effort to provide support for it. C programs do not use these sorts of features anyway, and honestly, most programs get by wonderfully well without them.
Minimal C++ programs can be very tiny when using a lightweight standard C library, such as the MUSL libc or newlib. They will still be somewhat large if linked with glibc, particularly if the standard I/O portion of the library is used.
Thanks for reading.