
M++ Design, Part 2 of N

Jul 02, 2008 by olsner

A short post about syntax and compiler usage. I have dabbled with SPECS (a new syntax for C++, see below) before and have always found it quite interesting. As for compiler usage, my inspiration is the way Haskell compilation works with ghc (at least in the ghc --make variant). It should never be any harder than that.

Compiler usage

I'm thinking it would be left up to the compiler to keep track of dependencies and determine which modules need recompilation. You would not hand the compiler a list of source files, run it once per source to make a .o file, and then link your hundreds of .o files into a shared library or executable. Instead, you'd tell the compiler either to build module Main (which can have another name) into an executable, or to build a library exposing one or more given namespaces. The names declared in these namespaces would be exported as if in the global namespace (following the target C++ ABI, most likely - I would like to keep binary compatibility as far as possible).

Example 1: Compile the program contained in module Main and all dependent modules into a.out.

m++ --main Main -o a.out

Example 2: Compile modules A, B, C into a shared library.

m++ --export A --export B --export C -o libstuff.so

As mentioned in the previous posts, modules referenced in the source (e.g. Net::HTTP) would be automatically looked up as ./Net/HTTP.mpp in the current include path. It is up to the compiler to apply the necessary magic to that file in order to extract the information it needs to compile the current module. I'm thinking there'd probably be a local file containing a parsed representation of the module source, which is automatically regenerated whenever the source file is newer than it.
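To make the lookup a bit more concrete, here is a rough C++17 sketch of the two pieces described above: mapping a module name to a relative file name, and deciding whether the cached parsed representation needs regenerating. The function names modulePath and cacheIsStale are made up for illustration, and comparing modification times is just one assumed way of detecting an outdated cache; nothing here is settled M++ design.

```cpp
#include <cstddef>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: turn a module name such as "Net::HTTP" into the
// relative path "Net/HTTP.mpp", to be resolved against the include path.
fs::path modulePath(const std::string& module) {
    fs::path result;
    std::size_t start = 0;
    while (true) {
        std::size_t sep = module.find("::", start);
        if (sep == std::string::npos) {
            result /= module.substr(start);  // last component
            break;
        }
        result /= module.substr(start, sep - start);
        start = sep + 2;  // skip past "::"
    }
    result += ".mpp";
    return result;
}

// Hypothetical staleness check (assumption: timestamps are good enough).
// The cache must be rebuilt if it is missing or older than the source.
bool cacheIsStale(const fs::path& source, const fs::path& cache) {
    if (!fs::exists(cache)) return true;
    return fs::last_write_time(cache) < fs::last_write_time(source);
}
```

A compiler would presumably try modulePath("Net::HTTP") against each include directory in turn and only re-parse the source it finds when cacheIsStale says so.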

Due to the way this works (compiling a module needs the parsed form of every module it references), I think mutually recursive modules are basically impossible to write in this new language. C++, with headers holding declarations separately from the implementation files, lets you get around this problem in some small way, e.g. by using forward declarations in the header. It may be possible for the compiler to apply a similar workaround automatically by parsing declarations and implementations separately, but I actually think it is a good thing not to be able to build mutually recursive modules.
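For reference, this is roughly what the C++ workaround looks like. It is squeezed into a single file here, with the Parent and Child names invented for the example; in real code each struct would live in its own header, and the forward declaration is what lets those two headers avoid including each other.

```cpp
// Forward declaration: tells the compiler that Child is a type without
// defining it. That is enough to declare a pointer to it.
struct Child;

// "parent.h": refers to Child only through a pointer.
struct Parent {
    Child* firstChild;
    int id;
};

// "child.h": by now Parent is fully defined.
struct Child {
    Parent* owner;
    int id;
};

// Both directions of the cycle can be followed once both types are complete.
int ownerId(const Child& c) { return c.owner->id; }
int firstChildId(const Parent& p) { return p.firstChild->id; }
```

The cycle is still there, it has just been hidden behind a pointer, which is part of why I don't mind M++ making this awkward.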

Code generation would be driven entirely by the need to output the exported names (or just main() in the case of --main), so only names used, directly or indirectly, by those functions would be code-generated at all. In ordinary C++, it is very hard to control the set of functions you're exposing to the world. In M++, it should be very easy and very explicit.
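The "only generate what the exports need" idea boils down to a reachability walk over a uses-graph. Below is a minimal sketch in C++, assuming the compiler already has a map from each name to the names it uses; the CallGraph alias and the namesToGenerate function are hypothetical and only meant to illustrate the principle.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Assumed representation: each name maps to the names it uses directly.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Collect every name reachable from the exported names. Anything not in
// the returned set would never be code-generated.
std::set<std::string> namesToGenerate(const CallGraph& uses,
                                      const std::vector<std::string>& exported) {
    std::set<std::string> reachable;
    std::vector<std::string> worklist(exported.begin(), exported.end());
    while (!worklist.empty()) {
        std::string name = worklist.back();
        worklist.pop_back();
        // insert() reports whether the name was new; skip names already seen.
        if (!reachable.insert(name).second) continue;
        auto it = uses.find(name);
        if (it == uses.end()) continue;  // leaf: uses nothing else
        for (const auto& used : it->second) worklist.push_back(used);
    }
    return reachable;
}
```

With --main the exported list would contain just main; with --export it would contain every name declared in the exported namespaces.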

Syntax

I originally thought using the syntax of C++ would be a good thing. After all, what I secretly want this language to do is replace C++ entirely, which I thought more likely to happen with a syntax familiar from the old-fashioned C/C++ tradition. Worked wonders for e.g. Java, C# and JavaScript, didn't it? Too bad all of them passed up the opportunity to actually replace C++ and instead settled into niches where you never really needed C/C++.

However, since this is a C++ dialect anyway, why not go the extra mile and just throw out all the syntax, and all the ugly legacy that would come along with it? For example, there's this Modest Proposal for a new syntax for C++. It is quite an interesting read, and using a syntax like this would probably eliminate a lot of the quirky issues you run into when trying to parse C++. The proposal also suggests fixing a few quirks of the C++ language, such as making this a reference rather than a pointer. One of the most wonderful parts of SPECS (Significantly Prettier and Easier C++ Syntax) is the complete re-working of type syntax - even a complicated type signature like that of a pointer to an array of pointers to functions returning a pointer to a function is easily readable in SPECS:

type ComplicatedType : ^ [7] ^(int -> ^(int -> int));

To read the SPECS typedef, just start at the left and read:

pointer to array of 7 pointers to a function taking int and returning a pointer to a function from int to int

Contrast the equivalent C++:

typedef int (*(*(*ComplicatedType)[7])(int))(int);

There is a somewhat reliable technique for reading this kind of nested type definition (start in the middle and go outwards? something like that...), but in my opinion: don't bother. Give the original author a proper spanking and make them rewrite it as a set of typedefs instead :D
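In case the spanking works, here is one way the rewrite could look in plain C++. The intermediate typedef names are my own invention, the one-liner is repeated under the name OneLiner for comparison, and the static_assert checks that both spellings really name the same type.

```cpp
#include <type_traits>

// Built up in small steps, innermost type first:
typedef int (*IntFn)(int);            // pointer to function from int to int
typedef IntFn (*FnReturningFn)(int);  // pointer to function taking int, returning IntFn
typedef FnReturningFn FnTable[7];     // array of 7 such pointers
typedef FnTable* ComplicatedType;     // pointer to that array

// The original one-liner, under a different name for comparison.
typedef int (*(*(*OneLiner)[7])(int))(int);

static_assert(std::is_same<ComplicatedType, OneLiner>::value,
              "the stepwise typedefs and the one-liner must be the same type");
```

Each line now reads about as easily as the SPECS version, at the cost of three extra names.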

SPECS would be so much nicer than C++ syntax, but at that point I'm really talking about an entirely new language rather than a C++ dialect (even though some semantics would be similar).

So: keep C++ and the world of pain it represents, or write something using a new (and therefore scary and controversial) syntax?