
Real Modules for C++

May 27th, 2008 by olsner

C++ sucks. C++ needs a proper module system, where you can actually separate modules from each other rather than tangle them together in circular header include trees. This is it! Or, rather, once the thinking about it has culminated in the start of an implementation, this post will have been where it all started. Think of it as a collection of a few fluffy ideas that will transform C++ hell into the cozy wonderland it should be ;-)

I’m not entirely sure about this, but maybe I’ll call it M++ as in “C++ with Modules”. In any case, this will become a C++ dialect as there is no hope of not breaking any code. There will also probably never be a very easy transition path for existing large-ish C++ codebases. (That’s not even a goal until this thing is sufficiently awesome to motivate someone to convert a large body of C++ code to it…)

What is M++?

Basically, it is C++ with “modules”, where modules are somewhat like ordinary C++ translation units, but with special magic for how names are imported between modules.

Importing a module imports a well-defined set of names into the local scope. Contrast with C/C++, where module importing is implemented by inclusion of header files. (You know this already, but I’ll repeat it for the sake of rhetoric.) Header files may #define just about anything and wreak whatever havoc on the environment of following files. This means that any kind of higher-level reasoning about the effects of changes in header files on the actual code is pretty much moot. Any compiler must recompile every included header file from every including source file every time any part of this pool of mud changes. A wonky define in module A may produce error messages in system headers included from module B, while compiling module C. Modules A and B may not even be your code, or may be code you can’t change. World of woe!

With a proper module system, declarations from different modules will never clash, except maybe in the module that is importing conflicting names from more than one module. But that would be caused by code in your module, and you would have the tools to solve the problem!

Guiding principles

  • Keep as much as possible of the syntax and semantics of C++
  • Remove the need for preprocessor inclusion
  • Keep the preprocessor around for e.g. importing external interfaces
  • Replace the current text-level importing with a symbol-table-level import
  • Provide good means of separating unrelated components by:
    • Limiting the set of exported symbols from components
    • Providing easy means to cherry-pick subcomponents (using namespace::name;)
    • Providing robust non-conflicting importing of whole components (using namespace;)

Basic proof-of-something example

For context, this module would reside in a file Main.mpp (maybe even just call these files cpp?), and exports the class Main::Foo, the method Main::Foo::Bar and the function Main::main(…).

// This is where the *really* cool part is. These modules (namespaces) are
// automatically imported by searching the system module path. Either they
// are defined by local source tree cpp (mpp?) files named e.g. Win32.cpp
// and Net/HTTP.cpp (Net_HTTP.cpp would also work), or they could be
// binary self-describing modules. Nothing said here on how that binary
// self-description would look. That's the magic left as an exercise for the reader.
using namespace Win32;
using namespace Net::HTTP;
 
// These are some provisional ideas on how to wrap the C/C++ standard library in
// M++ form. More on importing legacy C/C++ functions and classes later on.
using stdlib::atoi;
using stdio::printf;
 
 
// This is a private function. It would not be linkable from other translation
// units (default linkage in the top-level is static), but is usable from all
// definitions in the Main namespace below.
void Foo_Bar()
{}
 
 
namespace Main
{
    // Normal classes here
    class Foo
    {
    public:
        void Bar();
    };
 
    // Maybe the main function could move into the Main namespace from the
    // global namespace like this:
    int main(int argc, char *argv[])
    {
        Foo foo;
        foo.Bar();
        Foo_Bar();
        printf("Baz\n");
        return 0;
    }
}

Unresolved/other issues

  • How do you expose things in the global namespace from M++ modules? (These exposed names should follow the relevant C++ ABI and mix-and-match with C++ code.)

This one is slightly harder than the next question. Many solutions here are bad. So I think I’ll just let this one noodle for a while ;-)

  • How do you import external C or C++ classes/functions into M++?

I’m thinking you’d #include the external headers in one module, then use using ::asdf; to import the top-level declarations into the namespace exported from that module. This means that an M++ compiler must be able to understand the full wonderful ambiguity of C++. But hopefully, the modularization through use of namespaces would mean that only one of a large number of modules needs to go through the hassle of actually parsing all that crud in order to build a small symbol table.

One remaining issue is how to distribute the kind of macros that are required/useful (i.e. file/line-tracking allocation functions, asserts, debug/release-dependent code). Would you be including some small number of headers into each module to do that, or would the module system somehow get involved in preprocessing and let macros be imported as part of a namespace?

Getting rid of macros in the first place is a pretty good idea anyway, but exactly how far can you practically take it? Some headers like stdint.h define a large quantity of useful macros. In an ideal world, these would be constant variables (const int32_t INT32_MAX; etc.) defined in a suitable namespace (perhaps something like stdint, since they come from stdint.h).

Ant sucks.

March 29th, 2008 by olsner

I’ve recently had the opportunity to use Ant quite heavily in a project at work, and have thus come to realize it sucks. Quite hard.

Contents

  • The structure of an Ant “project”
  • What’s a build script anyway?
  • How Ant does it
  • Modelling a build process
  • Refactoring: Worsening the Design of Existing Code
  • Bonus chapter
  • Conclusion

The structure of an Ant “project”

The top-level element of an Ant project is the Project, represented by an XML element called project. This entity does not actually do anything useful.

A project consists of a list of definitions of properties, macros and targets, mixed with a number of miscellaneous directives (like import). That’s the actual stuff – the need for the top-level node seems to be purely a side-effect of XML requiring, for no good reason, a single specific top-level node.

Contrast the Makefile – realizing that there is no real need for a “project” thingy, the Makefile puts properties (variables) and targets directly on the top-level.

I shouldn’t bring up syntax this early in the flame, but I really must.

Compare this, the simplest Ant file I can come up with:

<project name="useless" default="all">
    <property name="foo" value="bar" />
 
    <target name="all">
        ... <!-- see below -->
    </target>
</project>

And the corresponding Makefile:

foo=bar
 
all:

See the difference? How much of the Ant file conveys actual contents? How much is just crud to explicitly write out the entire Ant AST in the rawest, most explicit form possible? At this point, I’m thinking that the writers of Ant have entirely missed … everything that the field of CS has come up with when it comes to programming (and other) languages? But they certainly noticed the invention of XML.

What’s a build script anyway?

The thing in common between most build systems is what I’ll call the dependencies->target structure. That is, when the build tool is run, you give it a set of targets (or use the default as defined by the ant/make file), and the tool recursively builds all targets’ dependencies before building the target, building each target according to some description of what it’s supposed to do to build it and when it’s supposed to be built.

Ant is no different there; and make is the canonical example of such a system.

The difference lies in how the systems determine what to build. Make has an intrinsic notion of targets and dependencies being files (rather, it has a notion about file targets and “phony” targets, defined by the Makefile) – if any of the dependencies are newer than the target, the target is rebuilt. If any of the dependencies were out-of-date and rebuilt, their dependents are rebuilt automatically by make.

This is probably old news to you, but I want to emphasize the effect of this property of make: provided with a correct set of dependencies, make does the right thing. And with make’s ability to automatically check files, it is very easy to provide that correct complete set of dependencies.

(Granted, it does get harder in make with things like automatically discovering C/C++ header file dependencies, so many makefile writers google for recipes for automatically maintaining and updating dependency files and copy-paste this into their Makefile. But once this code is in the Makefile, make can take care of keeping the dependency files updated just as any other targets are kept updated.)

How Ant does it

As opposed to make, Ant doesn’t encourage this kind of specification of recursive dependencies. The easiest way to write an Ant file is to take the shell script you wrote (or memorized!) for building everything and translate this line-by-line into the corresponding Ant syntax for running things in a series. Let’s say that the series of shell commands includes the generation of a whizbang file, like this:

javac Generate.java
java Generate Whizbang.whizbang Whizbang.java
javac Whizbang.java Foo.java Bar.java

It is relatively straightforward to translate this into Ant syntax. But do take note of how irksome it is to do even simple stuff in Ant!

<project name="useless" default="all">
    <property name="foo" value="bar" />
 
    <target name="all">
        <javac>
            <!-- god forbid you write a path without putting it in an attribute of an XML node -->
            <src path="Generate.java" />
        </javac>
        <java>
            <arg text="Generate" />
            <!-- yes. please. put. every. string. in. a. separate. XML. node. thank. you. -->
        </java>
        <javac>
            <!-- repeat spanking -->
        </javac>
    </target>
</project>

See, Ant is bash-in-XML, with all the pain and none of the benefit. Every word becomes an attribute of an XML node inside the XML node specifying the “verb” of the action. Being able to put the word javac or java directly in the node name is just an effect of Ant being so geared towards Java. For any other program, you’ll end up with something like <exec executable="gcc"><arg value="-o" /><arg value="target" /></exec>. Ugly!

Actually, previous versions of Ant supported giving the whole command line in one go, like <exec command="gcc -o target" />. This has since been deprecated. The producers of angle bracket keys probably bribed the Ant maintainer at the time.

Modelling a build process

This brings me to possibly the most important part of this whole comparison slash rant: the model of a build process enforced by Ant.

Ant: A project consists of a list of “targets” to run, each containing a list of commands and a list of other targets to run before those commands

Make: A Makefile consists of a list of targets (files or phony), each listing its dependencies and a list of commands to run to update the target from those dependencies

Notice the difference between updating a target from its updated constituent parts and building a target by executing dependency targets and commands. In Ant, all you’re really doing is calling functions or post-order traversing a tree of targets.

Refactoring: Worsening the Design of Existing Code

The way an Ant file is refactored into independent parts is to extract a bunch of commands doing one thing (for example mkdir cpp-objects followed by one build command for each object) into its own target (for example <target name="cpp-objects">) and then either call that target inline with <antcall ...> (this is just like a function call) or add that target as a dependency. The end result is a large set of phony targets that aren’t linked to their products or sources. What has been gained? The resulting Ant file does exactly the same thing, just as badly as the original file did it.

Contrast this with how you’d do it with make. You take the set of commands in the rule you want to split and identify the products that set produces and the sources it consumes. Let’s say it’s every .o file generated from cpp-sources/*.c.

The canonical way to do that would be something like this:

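# Placeholder: list the .o files you actually care about here.
CPP_OBJECTS=interesting-.o-files

# Pattern rule: build any cpp-objects/X.o from cpp-sources/X.c.
# The recipe line below must start with a tab character.
cpp-objects/%.o: cpp-sources/%.c
	$(CC) -c -o $@ $<
 
other_target: $(CPP_OBJECTS)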

Notice how features of make conspire to reduce duplication in typing and modelling, and how easy it is to produce something that properly models the building of the required set of .o files from the appropriate .c files and how the actual commands are derived from a generic pattern rule. This is what a DSL (Domain Specific Language) for building looks like. Not like this:

<exec executable="build-many-cfiles.sh"><arg line="${cpp_objects}" /></exec>

I suppose there could well be some kind of “for-each” XML thing in Ant, but it’s still just so far from what we’re modelling. Building is not running a series of commands, it is updating an end product from a set of sources. There is just no way for ant to actually do this with a model like that!

(Bonus: Conditional execution in Ant)

Then, we have the Ant way to build things conditionally (hang on, this’ll get hairy! To spare the sensitive reader, this is only pseudo-Ant code).

Let’s say I want to do the Thing only when a certain file is newer than the target or the target file doesn’t exist.

<doThing>
    <condition>
        <or>
            <and>
                <exists path="target" />
                <exists path="source" />
                <newer left="source" right="target" />
            </and>
            <not><exists path="target" /></not>
        </or>
    </condition>
</doThing>

Wow! That’s lean.

Well, the end result is that some Ant hackers get so fed up with making ant do the right thing that they produce atrocities like

<target name="build" depends="clean"> ... </target>

or

<target name="build"><delete dir="output" /> ... </target>

Just to get ant to rebuild stuff that’s out of date, accidentally rebuilding everything else in the process. But workstations are fast, right?

Yes. As I might write about some other day, waiting for the compile is not an unimportant factor of the subjective irksomeness of any build system. Going from “Yes, there’s the typo!” to waiting for a complete rebuild of your project is enough to make any sane coder go bat-shit insane. If the delay is due to stupid build systems, even more so!

Conclusion

Ant is a poor way to model builds. Ant has syntax so brain-dead, it goes way beyond any bastard child of Java and Cobol in needless verbosity. In some places every, fucking, word, needs, its, own, XML element. And see above for Ant’s take on the ubiquitous if statement. In short: Ant sucks.

If Ant had been developed internally by any code shop, it would surely already have appeared on The Daily WTF. And we would all have laughed at the round-about ways to write shell scripts, and how many tokens you need in order even to do nothing (not to mention what you need to actually do anything at all).

The Real WTF, someone would say, is that these people produced a make replacement that does half as much as make and does it worse!

Then someone would counter with a joke post about “hey, if this language had conditionals, it’d probably look like this” (see above), at which point in time the original reporter realizes with a shiver that that was the exact way they did implement it.

Oh, and just for a final rubbing-in, did I mention that Ant sucks?