On FatELF, or “Because 140 characters isn’t enough for a discussion”
So, I have someone on Identi.ca (@flameeyees@identi.ca) discussing my views on FatELF with me. No biggie, but continuing the argument there (pointless as it may be) is just too much work: the character limit does not permit real discussion of such a complex issue. So, permit me to address and rebut each of the issues raised, as I understand them. Then the conversation can continue, if at all desired (though seriously, I don’t know that *I* desire to do so).
First point: FatELF would be useless because “you can do that already, write a cc frontend that compiles the same file multiple times, it’s _not_ hard, I’ve done it before”. Okay, so the proposed solution here is to write a compiler driver that interprets arguments and, from a single Makefile, builds for multiple platforms. There might even be something out there for that, but simply put, if GCC supported this feature intrinsically, then everyone would have it and it would be done in a standard way. Free software works better when everyone can agree on a single standard way of doing things, rather than merely on a standard template for how it might be done. Using add-ons to perform this function still yields multiple binaries that have to be shipped anyway, which is decidedly not the aim.
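For illustration, here is a minimal Python sketch of the kind of “cc frontend” the objection describes. The target triples and the `<triple>-gcc` naming of the cross compilers are assumptions, and `frontend_commands` is a hypothetical helper that only prints the command lines rather than running them. Even in this toy form, the result is one object file per target, which is exactly the multiple-binaries problem described above.

```python
import shlex

# Hypothetical target list; which cross toolchains exist varies per installation.
TARGETS = ["i686-linux-gnu", "x86_64-linux-gnu", "powerpc-linux-gnu"]


def frontend_commands(source, cflags=(), targets=TARGETS):
    """Return one compiler invocation per target for a single source file.

    Assumes each cross compiler is installed as '<triple>-gcc', a common but
    not universal convention. Every target gets its own output object, so the
    result is still N separate files rather than one fat binary.
    """
    stem = source.rsplit(".", 1)[0]
    commands = []
    for triple in targets:
        arch = triple.split("-", 1)[0]
        output = f"{stem}.{arch}.o"
        commands.append([f"{triple}-gcc", *cflags, "-c", source, "-o", output])
    return commands


if __name__ == "__main__":
    # Dry run: show the commands instead of executing them.
    for cmd in frontend_commands("hello.c", cflags=["-O2"]):
        print(shlex.join(cmd))
```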
Second point: “how is shipping one (fat) binary ‘better’ than shipping one auto-extracting auto-deciding archive?” Assuming that the toolchain and kernel both support the feature as a standard thing, the difference is simple: the kernel ELF loader would be able to decide which sections of the ELF file should actually be loaded into memory, read only those sections, and go about its business normally; the rest of the process would not need to change in any way. No temporary copies need to be made, no images need to be extracted, nothing like that has to be done. The archive approach, however, is quite a different story. Assume you’re using a POSIX shell script with an archive of all of the possible binaries appended to it. First, the script has to be prepended to EVERY such archive, which means that different versions of the script could exist in the wild (and as any programmer knows: DRY, Don’t Repeat Yourself); and the script is not going to be trivial, because it would have to contain code to detect and support every single individual platform. Furthermore, it would require that the user have permission to extract the payload somewhere, make it executable, and run it. This is the same deficiency that makes gzexe impractical for everyday use; on all of the servers that I manage, for example, /tmp is mounted read-write but with execution of scripts and binaries disabled. Finally, it would not work properly if the program needed to be setuid: that information would have to be carried in the payload itself, which is absolutely not portable from one system to another. It just cannot be made to work in a generic enough fashion to be reliable across different types of platforms with different administrative decisions made in their management, and in many cases it would require an increased attack surface just to be made workable.
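To make the loader side of this concrete, the following Python sketch picks one slice out of a fat container by ELF machine type and word size, examining only the header, the record table and the chosen slice. The container layout (the `FATX` magic, `build_container`, `select_slice`) is hypothetical and is not the real FatELF on-disk format; only the `e_machine` and ELF class values come from the ELF specification.

```python
import struct

# Hypothetical container layout for illustration only; this is NOT the
# actual FatELF on-disk format.
MAGIC = b"FATX"
HEADER = struct.Struct("<4sH")      # magic, number of records
RECORD = struct.Struct("<HBxII")    # e_machine, ELF class, pad byte, offset, size

# Real ELF e_machine and EI_CLASS values (ELF specification / <elf.h>).
EM_386, EM_PPC, EM_X86_64 = 3, 20, 62
ELFCLASS32, ELFCLASS64 = 1, 2


def build_container(slices):
    """Pack {(machine, elf_class): payload} into one fat blob."""
    items = list(slices.items())
    offset = HEADER.size + RECORD.size * len(items)
    table, payloads = b"", b""
    for (machine, elf_class), payload in items:
        table += RECORD.pack(machine, elf_class, offset, len(payload))
        payloads += payload
        offset += len(payload)
    return HEADER.pack(MAGIC, len(items)) + table + payloads


def select_slice(blob, machine, elf_class):
    """Return only the payload that matches the host.

    Only the header, the record table and the matching slice are examined;
    a real loader would read just those byte ranges from the file instead of
    holding the whole blob in memory, and would extract nothing to disk.
    """
    magic, count = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("not a fat container")
    for i in range(count):
        rec_machine, rec_class, offset, size = RECORD.unpack_from(
            blob, HEADER.size + i * RECORD.size)
        if (rec_machine, rec_class) == (machine, elf_class):
            return blob[offset:offset + size]
    raise LookupError("no slice for this machine")
```

For example, a container built from an i386 slice and an x86-64 slice hands back only the x86-64 bytes when asked for `(EM_X86_64, ELFCLASS64)`; everything else in the file is simply never looked at.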
However, if FatELF (or, honestly, anything truly equivalent) were used, an administrator could copy the binary from one system (say, an x86) to another system (say, a PowerPC) that has all of its other dependencies satisfied, drop it on the filesystem, chown/chmod it once, and it would Just Work. setuid, if needed, would be honored by the kernel, and no extraction would have to take place. No additional temporary disk space would be required, nor would the ad-hoc “loader” (if it could even be called that) need any logic to find a filesystem that is read-write with execution permitted for the current user; therefore no special privileges would be required of the user.
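By contrast, the ad-hoc “loader” logic that FatELF would make unnecessary looks roughly like the following Python sketch: it searches for a directory that is writable and not mounted `noexec` (and not `nosuid`, if the payload needs setuid). The candidate list and the helper names are assumptions for illustration. Note that passing the `nosuid` check does not let an unprivileged user create a setuid-root file; it only rules out mounts where the bit would be ignored.

```python
import os

# Candidate locations an ad-hoc self-extracting stub might try (assumption).
CANDIDATES = [os.environ.get("TMPDIR", "/tmp"), "/var/tmp", os.path.expanduser("~")]

# Mount flags reported by statvfs(); where the platform does not expose a
# constant, fall back to 0, which effectively skips that check.
ST_NOEXEC = getattr(os, "ST_NOEXEC", 0)
ST_NOSUID = getattr(os, "ST_NOSUID", 0)


def usable_extract_dir(path, need_setuid=False):
    """True if 'path' could hold an extracted payload that is then executed."""
    # The directory must exist and be writable and searchable by this user.
    if not os.path.isdir(path) or not os.access(path, os.W_OK | os.X_OK):
        return False
    flags = os.statvfs(path).f_flag
    if flags & ST_NOEXEC:
        return False    # e.g. /tmp mounted noexec, as on the author's servers
    if need_setuid and flags & ST_NOSUID:
        return False    # setuid bits would be ignored on this mount
    return True


def find_extract_dir(need_setuid=False, candidates=CANDIDATES):
    """Return the first usable directory, or None if every candidate fails."""
    for path in candidates:
        if usable_extract_dir(path, need_setuid):
            return path
    return None
```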
In fact, the only way to solve the problem reliably at present would be to have something like /var/cache/adhoc-fat-binaries and make every ad-hoc “fat binary” setuid 0 (or setuid to some user with the privileges needed to make something setuid 0 when necessary; on most systems, probably only UID 0 has that privilege), so that it could (a) write to /var/cache/adhoc-fat-binaries and (b) set the setuid or setgid bits when the program needs them to fulfill its function. And it deserves to be restated: we all know that having a single specific standard and adhering to it, even when the standard is less than ideal (and in some cases, like X11, falls quite short of ideal), is far better than having 100 different and incompatible ways to do the same thing. It’s one of the things that we people in free software know pretty damn well.
Mind you, I don’t see something like FatELF being used for distribution binaries, or for anything that would be shipped in an operating system distribution package, except perhaps in special situations where something like biarch is natively supported by the hardware and it would be feasible to permit that sort of flexibility. Instead, I see something like my current situation: I administer several machines for small businesses, and not all of them are the same hardware platform. They all run the same operating system, and many of them have the same libraries installed. Some of them are 64-bit and some are 32-bit. Some are x86, some x86-64, and some are neither. I would very much like to write a single program, type “make”, and copy the resulting file to every machine so that it just works. For the moment, if I want something like that, I have to use something like Java, C#, or a script. Or, if I need something setuid, I write it in C and compile it on every system, shipping the source code file to the systems instead. It would be more efficient not to have to do that. That is why I see FatELF as a “good thing”.
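Finding out which build each machine in such a mixed fleet needs comes down to reading a few bytes of any native binary's ELF header. The Python sketch below decodes the ELF class (32- or 64-bit), the byte order, and `e_machine` (at offset 18, stored in the file's own byte order). The field offsets and machine numbers come from the ELF specification; the helper names and the small `MACHINES` table are illustrative.

```python
import struct

# A few real e_machine values from the ELF specification / <elf.h>.
MACHINES = {3: "x86 (i386)", 20: "PowerPC", 21: "PowerPC64",
            40: "ARM", 62: "x86-64", 183: "AArch64"}


def describe_elf(header):
    """Describe the first 20 bytes of an ELF file: word size, byte order, CPU."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}[header[4]]        # e_ident[EI_CLASS]
    order = {1: "<", 2: ">"}[header[5]]     # e_ident[EI_DATA]: little/big endian
    # e_ident is 16 bytes and e_type 2 more, so e_machine starts at offset 18.
    (machine,) = struct.unpack_from(order + "H", header, 18)
    endian = "little" if order == "<" else "big"
    return bits, endian, MACHINES.get(machine, f"machine {machine}")


def describe_file(path):
    """Read just enough of 'path' to describe its ELF header."""
    with open(path, "rb") as f:
        return describe_elf(f.read(20))
```

Running `describe_file("/bin/sh")` on each machine shows which slice of a fat binary that machine would need.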
I know that I am in the minority.
That brings me to point three: “because in 99% of all usage, the kernel won’t _need_ it. And its cost in effort and overhead would be higher.” For this part of the post, I am looking at the Linux kernel, version 2.6.34, which I have just downloaded from kernel.org; it is 64 MB compressed (using bzip2!) and takes up 442 MB uncompressed, before touching any file in the tree. I am looking at the x86-64 configuration, because that is the system I am running on and where I typed “make menuconfig”.
Who ever needs to change any of the following options? I am willing to bet that in 99% of all (desktop, server, and embedded, combined) usage, nobody touches them:
- Processor type and features/Support for extended (non-PC) x86 platforms
- Processor type and features/Maximum number of CPUs
- Processor type and features/Memory model
- Processor type and features/Build a relocatable kernel
- Executable file formats / Emulations/Kernel support for ELF binaries
- Executable file formats / Emulations/Kernel support for MISC binaries
- Executable file formats / Emulations/IA32 Emulation
- Executable file formats / Emulations/IA32 Emulation/IA32 a.out support
- Networking support/Plan 9 Resource Sharing Support (9P2000) (Experimental)
- File systems/Second extended fs support
- File systems/Reiserfs support
- File systems/JFS filesystem support
- File systems/XFS filesystem support
- File systems/GFS2 file system support
- File systems/OCFS2 file system support
- File systems/Dnotify support
- File systems/Kernel automounter support
- File systems/Kernel automounter version 4 support (also supports v3)
- File systems/FUSE (Filesystem in Userspace) support
- File systems/FUSE (Filesystem in Userspace) support/Character device in Userspace support
I can’t even go on. Twenty is enough; I think I have made my point. In 99%+ of all situations, these options are either always on or always off. They are rarely modified. And the kernel still supports a.out from IA32’s really old days‽ Seriously?
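One way to check the “always on or always off” claim on real systems is to compare a handful of these switches across the `.config` files of several machines. This Python sketch parses the standard `.config` syntax (`CONFIG_FOO=y`, `CONFIG_FOO=m`, or `# CONFIG_FOO is not set`). The mapping from the menu entries above to the Kconfig symbol names in `WATCHED` is my assumption and should be checked against the 2.6.34 tree.

```python
import re

# Kconfig symbols believed to correspond to some of the menu entries above;
# verify them against your own kernel tree before relying on this mapping.
WATCHED = ["CONFIG_BINFMT_ELF", "CONFIG_BINFMT_MISC", "CONFIG_IA32_EMULATION",
           "CONFIG_IA32_AOUT", "CONFIG_EXT2_FS", "CONFIG_XFS_FS",
           "CONFIG_FUSE_FS", "CONFIG_CUSE", "CONFIG_RELOCATABLE"]

SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map each CONFIG_ symbol in a kernel .config to its value ('n' if unset)."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if m := SET_RE.match(line):
            values[m.group(1)] = m.group(2)
        elif m := UNSET_RE.match(line):
            values[m.group(1)] = "n"
    return values


def report(text, watched=WATCHED):
    """Value of each watched symbol; 'absent' if the .config never mentions it."""
    values = parse_config(text)
    return {sym: values.get(sym, "absent") for sym in watched}
```

Calling `report(open(".config").read())` on each machine and lining the results up side by side would show how often, if ever, these options actually differ.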
What does this tell me? It tells me that FatELF (or anything else that came along and did what FatELF would do) has room in the kernel. And if it were for whatever reason incompatible with current ELF (as it very likely would be), the kernel could still support “old” ELF, without any of the extra fields or sections.
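The kernel already handles this kind of coexistence: `exec` walks a list of registered binary-format handlers (ELF, `#!` scripts, binfmt_misc, and so on) until one recognizes the file. The Python sketch below models that dispatch in simplified form; it is not kernel code, and the `FATX` magic for a fat format is hypothetical. Supporting a new fat format would mean adding one entry to the list while leaving the existing ELF handler untouched.

```python
from typing import Callable, List, Optional, Tuple

# (handler name, predicate on the first bytes of the file)
Handler = Tuple[str, Callable[[bytes], bool]]

HANDLERS: List[Handler] = [
    ("script", lambda head: head.startswith(b"#!")),
    ("elf", lambda head: head.startswith(b"\x7fELF")),
    # Hypothetical fat-binary magic, for illustration only.
    ("fat", lambda head: head.startswith(b"FATX")),
]


def pick_handler(head: bytes, handlers: List[Handler] = HANDLERS) -> Optional[str]:
    """Return the name of the first handler willing to load this file."""
    for name, accepts in handlers:
        if accepts(head):
            return name
    # No handler claimed the file; exec() would fail with ENOEXEC.
    return None
```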
And actually, there is a great deal of potential in something else altogether. FatELF isn’t the most technically elegant solution I can think of to the problems that it solves, but I have yet to see anything else seriously proposed. I can think of something even better, actually. We are all taught that operating systems exist to abstract us from the hardware, so that we can write applications without worrying about communicating with the hardware directly, because the OS handles those details for us. Well, if that is the case, then why don’t operating systems also abstract the system’s processor? Why don’t we have operating system kernels that provide a virtual instruction set? Yes, I am essentially talking about moving the application VM into the operating system kernel, ideally with some supporting utilities in userspace to do things like hold persistent JIT caches. However, that’s for another post, another time.
lu_zero 23rd June 2010
I think the whole idea (stitching binaries together for “easy” use) is completely wrong on many levels:
Right now, _the_ way to install something on a distribution is to call
$pkgmanager $installcommand name
and be done with it (or to use a graphical front-end with more or less the same immediate result). That’s the simplest way for a user to get his software.
Setting up a repo for custom packages is quite easy, at least on the distributions I know, and so is building a package for a target system using cross-development tools.
I’m not even going to get into why the FatELF idea is technically wrong as an implementation, or how the whole proposal was mismanaged; in my opinion it is completely pointless as a use case.
Michael Trausch 23rd June 2010
The biggest use case that I know of for something like this (as for many of the features in the Linux kernel) is in a business environment. Mostly a very small to small business (with one external IT consultant or one part-time IT employee), but a business environment nonetheless. It’s not for end-users, it’s not for “the desktop”, it’s not for “the embedded device”, nor is it something for “the user experience” or what-have-you. It is for the overworked person who wants to set up his development environment just once and say “for i in $(cat ~/machines); do rsync ~/path/to/fat-binary $i:/usr/local/bin/fat-binary; done” to deploy the thing, without worrying about which platforms he is managing or running on (so long as it is properly-built software, it will run, because either all of the libraries will exist or it will be a statically linked binary).
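Written out as a dry run in Python, the deployment loop above might look like the following sketch. `deploy_commands` is a hypothetical helper; it takes the contents of the machine list rather than its path, and it skips blank lines and `#` comments, which the original shell loop does not do. The point it illustrates is that the very same file goes to every host, regardless of architecture.

```python
def deploy_commands(machine_list, binary, dest="/usr/local/bin/fat-binary"):
    """Return one rsync invocation per host named in 'machine_list'.

    'machine_list' is the text of a file like ~/machines, one host per line.
    Blank lines and '#' comments are skipped (an addition to the shell loop).
    """
    hosts = []
    for line in machine_list.splitlines():
        host = line.strip()
        if host and not host.startswith("#"):
            hosts.append(host)
    # The same file goes to every host, whatever its CPU architecture.
    return [["rsync", binary, f"{host}:{dest}"] for host in hosts]
```

In practice the list would be read with something like `Path("~/machines").expanduser().read_text()` (from `pathlib`), and each returned command passed to `subprocess.run`.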
This is likely for small businesses where the people writing the software are also the people buying (er, using; I really don’t know how I made that typo) the software, where every machine is purchased individually because the business does not need to buy in bulk (and doing so would be a waste of money), and where the next system purchased for something is whatever is least expensive, because the operating system itself is supported on just about any easily obtainable platform. It would seem to me extremely unlikely to have a use outside of that environment, much as signed binaries would not (though they would require modifications in all the same places in the toolchain as FatELF to be truly useful).
lu_zero 24th June 2010
I hope you are aware that you are again throwing a large quantity of words at a use case that is, again, quite broken. If you maintain your systems properly, you usually end up with a uniform distribution and a local repository for your local programs.
If you have that, adding something would just mean doing a
dsh -g office $deploy stuff
At least, that is the best and proper solution, since rsync will not track files and won’t let you update or clean up properly.
All distributions let you maintain systems in a fairly clean way once you spend the time, _once_, to properly set up a local repository.
Even with Java and Python, the model you are proposing doesn’t work that well.
Michael Trausch 16th July 2010
@lu_zero: I am not going to spend the time to reiterate my point, since your own opinion weighs so heavily that it clouds your ability to see other options. You can certainly take a Java or Python program and share it on a heterogeneous network using a shared network filesystem, and you should be able to do the same with natively compiled software, period. But as long as people block such a reasonable enhancement, it will not happen; so what’s the point in wasting the effort, right?