Sometimes, learning happens strangely
So, I was dinking around in Google Reader yesterday and reading my Daily WTF, when I came across this wonderful post. Of course, upon seeing the code listed in that post, I nearly excremented bricks, the code listed there being obviously stupid (I would so fire anyone that used 14 levels of nested if statements when they are obviously not necessary, assuming that I were a manager of programmers). Anyway, I followed the link to the problem the code solved, and started poking around that Web site, and found myself wildly fascinated with it.
So, I signed up for it and decided that I would start with the first problem and work my way through them, using the problems to also try to learn or confirm various things that I know/think are true. I’ve been using the C# programming language for some time now, so I thought, “What the hell, I’ll go ahead and do these and see what I learn.” Having worked the first problem, I found some rather interesting things out about the Mono and .NET runtimes.
Firstly, I learned that the .NET runtime’s implementation of DateTime and/or TimeSpan sucks (my guess is DateTime is what sucks). It is not capable of measuring very small amounts of time, say smaller than one hundredth or one thousandth of a second, and treats those small amounts of time as if they were zero time at all. Which means that under .NET 2.0, I cannot measure things like I can under Mono. Oh, well, I suppose.
I wrote a small program to compute the answer to the first problem, and wrote a small library routine that computes the time spent computing the solution. The program runs each method twice (because the “JITter” compiles methods to native code upon the first invocation of a method), and measures the execution time each time.
So, I implemented 5 different solutions to the problem.
public static void BruteForceMethod() {
int total = 0;
for(int i = 1; i < 1000; i++) {
if((i % 3 == 0) || (i % 5 == 0)) {
total += i;
}
}
Console.WriteLine("The answer is {0} (brute force method).", total);
}
This is what I call the “brute force” method. It is simple: It checks every single number to determine if it is evenly divisible by either 3 or 5. If so, it adds it to the total. Pretty clear.
public static void SetMethod() {
int total = 0;
for(int i = 3; i < 1000; i += 3) {
total += i;
}
for (int i = 5; i < 1000; i += 5) {
if(i % 3 == 0) continue;
total += i;
}
Console.WriteLine("The answer is {0} (set method).", total);
}
This method, I call the "set" method, because it breaks down the space a bit; instead of stepping through 1 to 999 using an increment of 1, it steps through 3 to 999 with a step of 3 and then 5 to 999 with a step of 5. In the second pass, it checks to see if the number is _also_ evenly divided by 3, and if so, doesn't add it a second time.
public static int HelperMod(int target, int modBy) {
int tmp = target / modBy;
return(modBy * tmp * (tmp + 1) / 2);
}
public static void HelperMethod() {
int target = 999;
int total = HelperMod(target, 3) + HelperMod(target, 5) -
HelperMod(target, 15);
Console.WriteLine("The answer is {0} (helper method).", total);
}
These two methods together are the “helper method”. I didn’t come up with this one, one of the other problem solvers did, in Ruby (though their implementation was a slight bit different, the method of operation is the same). This one uses more in-depth math to perform the computation, instead of a larger number of simpler math operations. Specifically, the function:
… where x is the interval and y is the inclusive upper limit on the natural number, provides the result. The middle value is the calculation I extracted from the Ruby code, and the simplified form on the right is how it’d be done if it were being done by hand (and is far less confusing for humans).
public static void TableMethod() {
int[] numbers = { 10, 100, 102, 105, 108, 110, 111, 114, 115, 117,
12, 120, 123, 125, 126, 129, 130, 132, 135, 138,
140, 141, 144, 145, 147, 15, 150, 153, 155, 156,
159, 160, 162, 165, 168, 170, 171, 174, 175, 177,
18, 180, 183, 185, 186, 189, 190, 192, 195, 198,
20, 200, 201, 204, 205, 207, 21, 210, 213, 215,
216, 219, 220, 222, 225, 228, 230, 231, 234, 235,
237, 24, 240, 243, 245, 246, 249, 25, 250, 252,
255, 258, 260, 261, 264, 265, 267, 27, 270, 273,
275, 276, 279, 280, 282, 285, 288, 290, 291, 294,
295, 297, 3, 30, 300, 303, 305, 306, 309, 310,
312, 315, 318, 320, 321, 324, 325, 327, 33, 330,
333, 335, 336, 339, 340, 342, 345, 348, 35, 350,
351, 354, 355, 357, 36, 360, 363, 365, 366, 369,
370, 372, 375, 378, 380, 381, 384, 385, 387, 39,
390, 393, 395, 396, 399, 40, 400, 402, 405, 408,
410, 411, 414, 415, 417, 42, 420, 423, 425, 426,
429, 430, 432, 435, 438, 440, 441, 444, 445, 447,
45, 450, 453, 455, 456, 459, 460, 462, 465, 468,
470, 471, 474, 475, 477, 48, 480, 483, 485, 486,
489, 490, 492, 495, 498, 5, 50, 500, 501, 504,
505, 507, 51, 510, 513, 515, 516, 519, 520, 522,
525, 528, 530, 531, 534, 535, 537, 54, 540, 543,
545, 546, 549, 55, 550, 552, 555, 558, 560, 561,
564, 565, 567, 57, 570, 573, 575, 576, 579, 580,
582, 585, 588, 590, 591, 594, 595, 597, 6, 60,
600, 603, 605, 606, 609, 610, 612, 615, 618, 620,
621, 624, 625, 627, 63, 630, 633, 635, 636, 639,
640, 642, 645, 648, 65, 650, 651, 654, 655, 657,
66, 660, 663, 665, 666, 669, 670, 672, 675, 678,
680, 681, 684, 685, 687, 69, 690, 693, 695, 696,
699, 70, 700, 702, 705, 708, 710, 711, 714, 715,
717, 72, 720, 723, 725, 726, 729, 730, 732, 735,
738, 740, 741, 744, 745, 747, 75, 750, 753, 755,
756, 759, 760, 762, 765, 768, 770, 771, 774, 775,
777, 78, 780, 783, 785, 786, 789, 790, 792, 795,
798, 80, 800, 801, 804, 805, 807, 81, 810, 813,
815, 816, 819, 820, 822, 825, 828, 830, 831, 834,
835, 837, 84, 840, 843, 845, 846, 849, 85, 850,
852, 855, 858, 860, 861, 864, 865, 867, 87, 870,
873, 875, 876, 879, 880, 882, 885, 888, 890, 891,
894, 895, 897, 9, 90, 900, 903, 905, 906, 909,
910, 912, 915, 918, 920, 921, 924, 925, 927, 93,
930, 933, 935, 936, 939, 940, 942, 945, 948, 95,
950, 951, 954, 955, 957, 96, 960, 963, 965, 966,
969, 970, 972, 975, 978, 980, 981, 984, 985, 987,
99, 990, 993, 995, 996, 999 };
int total = 0;
foreach(int num in numbers) total += num;
Console.WriteLine("The answer is {0} (table method).", total);
}
In this method, I precomputed the numbers that needed to be added together (using (for i in 3 5; do for j in $(seq 0 $i 999); do echo $j; done; done) | sort | uniq | tr '\n' ',' | sed 's/,/, /g;s/, $/\n/g;', goodness Unix is great) and put them into an array, adding each of the values of the array together to come up with the total.
Finally, there’s this method:
public static void PrecomputedMethod() {
int total = 233168;
Console.WriteLine("The answer is {0} (precomputed).", total);
}
This method simply contains the answer and prints it.
So, what can be learned from this? Let’s take a look at a single run:
Project Euler: Problem 1 Find the sum of all the multiples 3 OR 5 that are below 1,000. The answer is 233168 (brute force method). Execution time: 00:00:00.0184820 The answer is 233168 (brute force method). Execution time: 00:00:00.0000600 The answer is 233168 (set method). Execution time: 00:00:00.0001280 The answer is 233168 (set method). Execution time: 00:00:00.0000100 The answer is 233168 (helper method). Execution time: 00:00:00.0001300 The answer is 233168 (helper method). Execution time: 00:00:00.0000070 The answer is 233168 (table method). Execution time: 00:00:00.0001870 The answer is 233168 (table method). Execution time: 00:00:00.0000110 The answer is 233168 (precomputed). Execution time: 00:00:00.0000650 The answer is 233168 (precomputed). Execution time: 00:00:00.0000070
So, we run through each of these twice, and display the time computed to be the runtime for each method. (I’ll show just how I did that here in a little bit, it was pretty simple.) The obvious thing is that for each method, the first execution was a good deal slower than the second method, usually by more than an order of magnitude. That shows the delay created by JIT compilation of the method. It also shows that the very first brute-force method JIT takes longer than any of the others. My hypothesis there was that the methods called after the first one benefited from some sort of caching (probably at the hardware level) in working with the opcodes for the for loop in the Mono runtime. It wouldn’t make sense for it to be a delay in JIT itself, since by the time we start measuring things, the JIT has already compiled at least 5 different methods, and probably more depending on the internal implementation of things such as System.Console.Write and System.Console.WriteLine. By changing the order of execution and moving the brute force method to be the second one tested, that would seem to be the case:
Project Euler: Problem 1 Find the sum of all the multiples 3 OR 5 that are below 1,000. The answer is 233168 (set method). Execution time: 00:00:00.0179120 The answer is 233168 (set method). Execution time: 00:00:00.0000600 The answer is 233168 (brute force method). Execution time: 00:00:00.0001400 The answer is 233168 (brute force method). Execution time: 00:00:00.0000410 The answer is 233168 (helper method). Execution time: 00:00:00.0001310 The answer is 233168 (helper method). Execution time: 00:00:00.0000080 The answer is 233168 (table method). Execution time: 00:00:00.0001760 The answer is 233168 (table method). Execution time: 00:00:00.0000100 The answer is 233168 (precomputed). Execution time: 00:00:00.0000670 The answer is 233168 (precomputed). Execution time: 00:00:00.0000080
Now, the set method contains the longest delay. That is an interesting find; it would seem to confirm that the JIT benefits somewhat from some form of caching. Just what caching, I don’t exactly know, and I don’t know how to find that out just yet.
In any event, we know that the first time we call each of these methods, the JIT converts the method from IL bytecode into native object code, and then executes the native object code. Pretty nice, that. On subsequent runs of the method, no more JIT needs to be done, and so the method’s object code executes directly, since that is retained in memory by the Mono runtime.
Now, let’s take a look at the runtimes for each method type, specifically, the second run of each, from the first run above:
| Method | Execution Time |
|---|---|
| Brute Force | 0.0000600 sec |
| Table | 0.0000110 sec |
| Set | 0.0000100 sec |
| Helper | 0.0000070 sec |
| Precomputed | 0.0000070 sec |
These are interesting in that they show what’s more efficient, relatively speaking. The “brute force” method is clearly the loser. Though if you look at the second output above, you’ll notice that the “set” method lost when it was first in line. Still, though, we can see from the two sets of numbers that the brute force method takes longer than the set method even when run in the same place. That’s not surprising, because it is a stupid for loop compared to two more intelligently designed for loops. The fact that the table method performs fairly similarly to the set method was expected, and shows that they are roughly the same in terms of efficiency.
Perhaps what I found the most interesting was the execution time of the helper method. It’s nearly the same (well, it is the same, in this run) as the static answer, even though there are 3 additional function calls. The JIT compiler can inline methods when it sees fit, and I am going to wager a guess that is exactly what it did, so the helper method winds up being nothing more than a calculation which then gets printed. The interesting part is that both methods compile to what one would expect in IL bytecode (that is, the method calls and such are preserved), so any inlining that is happening would be happening at runtime, when the JIT compiler evaluates HelperMethod() and HelperMod().
Alright, so this made me more curious, and I wanted to see what would happen if there were an AOT compiled version of the code on my system. So I compiled both the executable and the small library that I made to time execution using Mono’s --aot flag. I saw no difference; I am going to guess for the time being that the version of Mono in Intrepid doesn’t do AOT quite right. It’s lacking some of the functionality of Mono 2.0 that enables you to force the AOT binaries to be the only thing executed, and so I am not quite sure what is happening there. It’s not majorly important for my purposes, though, so I’m just not going to pursue that any further now.
The only lingering question left is “how efficient is the JITted code compared to native binary code?” To answer that, of course, the program has to be done in C. Not a big issue, here’s the output from a compiled C version, compiled using gcc -o p1 p1.c, along side with the first run from earlier, with the program info header removed for space reasons.:
| C Version | C# Version |
|---|---|
The answer is 233168 (brute force method) Execution time: 00:00:00.0000460 sec The answer is 233168 (brute force method) Execution time: 00:00:00.0000300 sec The answer is 233168 (set method) Execution time: 00:00:00.0000110 sec The answer is 233168 (set method) Execution time: 00:00:00.0000110 sec The answer is 233168 (helper method) Execution time: 00:00:00.0000050 sec The answer is 233168 (helper method) Execution time: 00:00:00.0000070 sec The answer is 233168 (table method). Execution time: 00:00:00.0000160 sec The answer is 233168 (table method). Execution time: 00:00:00.0000090 sec The answer is 233168 (precomputed). Execution time: 00:00:00.0000030 sec The answer is 233168 (precomputed). Execution time: 00:00:00.0000030 sec |
The answer is 233168 (brute force method). Execution time: 00:00:00.0184820 The answer is 233168 (brute force method). Execution time: 00:00:00.0000600 The answer is 233168 (set method). Execution time: 00:00:00.0001280 The answer is 233168 (set method). Execution time: 00:00:00.0000100 The answer is 233168 (helper method). Execution time: 00:00:00.0001300 The answer is 233168 (helper method). Execution time: 00:00:00.0000070 The answer is 233168 (table method). Execution time: 00:00:00.0001870 The answer is 233168 (table method). Execution time: 00:00:00.0000110 The answer is 233168 (precomputed). Execution time: 00:00:00.0000650 The answer is 233168 (precomputed). Execution time: 00:00:00.0000070 |
Clearly, that is faster. Of course, we already knew that would happen because 100% of the compilation for a C program is done by the C compiler, the assembler, and then the binary is put together by the linker for a specific operating system and CPU architecture combination. But… look at those numbers again, and there is one thing that is unexpected—at least it was to me. The helper method has the same execution time when JITted under Mono as when compiled ahead of time by GNU GCC for one of the invocations. The set method was pretty close, too. And comparing the JITted code execution time with the code output by GCC shows that Mono’s JIT compiler isn’t doing a horrible job with compiling things. It could do better, but it’s not doing bad.
If you want to check it all out yourself, the both the C code and the C# code are available here: Source code for Euler Problem 1 (C and C#). But, having learned all of this, and compared the two programs together, I have to say that I am pretty pleased to see how well Mono performs something like this. It’s surely a very, very small sample, and it’s not an authoritative, tell-all benchmark—just some informal little tests, running things a few more times, and seeing what happens. But it’s telling enough to help figure things out when trying to determine, say, how much the delay on the first invocation to a method will be, or something like that. It’s fascinating to me, anyway. And I think that it shows the potential that the Mono runtime has, too. There’s no (technical) reason to avoid using Mono, clearly.