CppParse

A C++ parser and lexical analyzer written in Haskell. Get the source from the darcs repository.

The lexer is in a workable state, though it doesn’t yet handle all floating-point formats, wide strings, or integer literals with type suffixes (L, UL, LL, ULL).
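The missing integer suffixes are mostly a matter of reading an optional `u`/`U` marker and an optional `l`/`L`/`ll`/`LL` marker, in either order. Below is a minimal sketch of that. The `Token`, `Suffix` and `lexInt` names are hypothetical and not CppParse's actual types. It only handles decimal literals, and it rejects octal, hex and anything followed by `.`, a letter or `_`, on the assumption that other lexer rules (for example the floating-point one) take over there.

```haskell
import Data.Char (isAlphaNum, isDigit)

-- Semantic suffix classes; "LU" and "UL" both map to UL, "LLU" and "ULL" to ULL.
data Suffix = NoSuffix | U | L | UL | LL | ULL
  deriving (Eq, Show)

-- Hypothetical token constructor for an integer literal: value plus suffix.
data Token = TIntLit Integer Suffix
  deriving (Eq, Show)

-- Optional unsigned marker: u or U.
unsignedPart :: String -> (Bool, String)
unsignedPart (c : rest) | c == 'u' || c == 'U' = (True, rest)
unsignedPart rest = (False, rest)

-- Optional long marker: l, L, ll or LL. Mixed case ("lL") is not valid C++,
-- so it is read as a single "l" and the stray "L" is rejected by lexInt.
longPart :: String -> (Int, String)
longPart ('l' : 'l' : rest) = (2, rest)
longPart ('L' : 'L' : rest) = (2, rest)
longPart ('l' : rest) = (1, rest)
longPart ('L' : rest) = (1, rest)
longPart rest = (0, rest)

classify :: Bool -> Int -> Suffix
classify False 0 = NoSuffix
classify True 0 = U
classify False 1 = L
classify True 1 = UL
classify False _ = LL
classify True _ = ULL

-- The unsigned marker may come before or after the long marker.
suffix :: String -> (Suffix, String)
suffix s = (classify (u1 || u2) n, s3)
  where
    (u1, s1) = unsignedPart s
    (n, s2) = longPart s1
    (u2, s3) = if u1 then (False, s2) else unsignedPart s2

-- Lex a decimal integer literal with an optional suffix from the front of the input.
-- Octal ("017") and hex ("0x1F") literals, and anything followed by '.', a letter
-- or '_' (floats, malformed suffixes), are rejected and left to other rules.
lexInt :: String -> Maybe (Token, String)
lexInt s = case span isDigit s of
  ([], _) -> Nothing
  ('0' : _ : _, _) -> Nothing
  (digits, rest) ->
    let (suf, rest') = suffix rest
     in if endsBadly rest' then Nothing else Just (TIntLit (read digits) suf, rest')
  where
    endsBadly (c : _) = isAlphaNum c || c == '_' || c == '.'
    endsBadly [] = False
```

For example, `lexInt "42ULL;"` gives `Just (TIntLit 42 ULL, ";")`, and `lexInt "7lu"` gives `Just (TIntLit 7 UL, "")`. Collapsing the spellings into a small `Suffix` type up front means later phases only need to deal with the six meaningful combinations.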

The parser can so far handle very simple declarations (arrays, pointers, references, const and volatile qualifiers, basic types, functions and function bodies), almost no statements (essentially only return, just to get something going), and no expressions at all except literals. This should get a whole lot easier now that I have the lexer: no more hacking together lists of Tokens by hand for testing.
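The declarations listed above nest inside each other (an array of const pointers to int, a function returning a reference), so a recursive type is a natural fit. Here is a sketch of one possible representation, with hypothetical constructor names that are not CppParse's actual AST. It includes a cdecl-style `describe` function that reads a type back out in English, outermost constructor first.

```haskell
import Data.List (intercalate)

-- const/volatile qualifiers attached to a type.
data CV = CV {isConst :: Bool, isVolatile :: Bool}
  deriving (Eq, Show)

noCV :: CV
noCV = CV False False

data BaseType = TVoid | TBool | TChar | TInt | TDouble
  deriving (Eq, Show)

-- Declarator structure: each constructor wraps the type it modifies.
data Type
  = Base CV BaseType
  | Pointer CV Type -- this CV qualifies the pointer itself, as in int *const p
  | Reference Type -- references cannot themselves be cv-qualified
  | Array (Maybe Integer) Type -- Nothing for an unknown bound, as in int a[]
  | Function Type [Type] -- return type and parameter types
  deriving (Eq, Show)

cvWords :: CV -> String
cvWords (CV c v) = (if c then "const " else "") ++ (if v then "volatile " else "")

baseName :: BaseType -> String
baseName TVoid = "void"
baseName TBool = "bool"
baseName TChar = "char"
baseName TInt = "int"
baseName TDouble = "double"

-- Read a type out in English, outermost constructor first.
describe :: Type -> String
describe (Base cv b) = cvWords cv ++ baseName b
describe (Pointer cv t) = cvWords cv ++ "pointer to " ++ describe t
describe (Reference t) = "reference to " ++ describe t
describe (Array n t) = "array[" ++ maybe "" show n ++ "] of " ++ describe t
describe (Function r ps) =
  "function(" ++ intercalate ", " (map describe ps) ++ ") returning " ++ describe r
```

With this representation, `int *const a[10]` would be `Array (Just 10) (Pointer (CV True False) (Base noCV TInt))`, which `describe` reads as "array[10] of const pointer to int". A C-style `void f(void)` would be `Function (Base noCV TVoid) []`.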

My plan is to eventually get some easy code generation going by outputting LLVM bitcode or assembly (or possibly by using an LLVM Haskell binding; I’ve been told there are some). But this depends on at least getting expression statements and function call statements working.
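The textual route (LLVM assembly, i.e. `.ll` files) needs nothing more than string building. Below is a minimal sketch, assuming a tiny hypothetical statement AST with only two statements: a call to an external `void f(void)` and a return. It emits one `declare` line per distinct callee, followed by the function definition. The sketch uses unmangled names, which is what an `extern "C"` function gets. A plain C++ `void f(void)` would instead be mangled under the Itanium C++ ABI (for example `_Z1fv`).

```haskell
import Data.List (nub)

-- Only the two LLVM types this sketch needs.
data Ty = I32 | Void

-- Hypothetical statements: a call to an external void f(void), or a return.
data Stmt
  = CallExtern String
  | Return (Maybe Integer)

data Fun = Fun {funName :: String, funRet :: Ty, funBody :: [Stmt]}

llvmTy :: Ty -> String
llvmTy I32 = "i32"
llvmTy Void = "void"

-- No type checking: Return (Just n) is assumed to appear only in i32 functions.
emitStmt :: Stmt -> String
emitStmt (CallExtern f) = "  call void @" ++ f ++ "()"
emitStmt (Return Nothing) = "  ret void"
emitStmt (Return (Just n)) = "  ret i32 " ++ show n

-- Emit one module: a declare line per distinct callee, a blank line, then the definition.
emitModule :: Fun -> String
emitModule fun = unlines (decls ++ defn)
  where
    callees = nub [f | CallExtern f <- funBody fun]
    decls = ["declare void @" ++ f ++ "()" | f <- callees] ++ ["" | not (null callees)]
    defn =
      ["define " ++ llvmTy (funRet fun) ++ " @" ++ funName fun ++ "() {", "entry:"]
        ++ map emitStmt (funBody fun)
        ++ ["}"]
```

For `Fun "g" I32 [CallExtern "f", Return (Just 0)]` this produces a module that declares `@f`, then defines `@g` with a `call void @f()` followed by `ret i32 0`. Written to a file, that output should be accepted by `llc`. Getting arguments and return values through calls would extend `CallExtern` with typed operands.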

TODO/Goals (easier things first, harder things later)

  • Lexer and parser test harness(es) for continuous/regression testing (this would be a good first action item, although less rewarding than work that produces actual output, such as executable LLVM bitcode)
  • Able to parse a function definition that calls an external function, and output LLVM code that, when compiled, produces the correct function. At first I’d probably make the callee an extern void f(void); but getting it to take and return values is obviously a must-have. TODO: Find out how LLVM assembly calling conventions work.
  • Able to parse expressions on integers.
  • Able to take a set of declarations and generate a dependency map of references, so that one declaration can be extracted along with its dependencies, written out to a new file, and compiled separately with the rest of the source into a well-formed program. (This could probably be done before the tricky parts such as templates are finished, since the information basically comes from the symbol table or from accesses to it.)
  • Function overloading: allow one name in one scope to have several overloads, plus code for extracting the overload set at a given source location (traversing all visible scopes and collecting every visible overload).
  • Template inference
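For the test-harness goal at the top of the list, even a tiny table-driven runner makes regressions visible. Here is a minimal sketch that is generic in the function under test, so it could wrap a lexer or a parser alike. The `Case` and `runCases` names are hypothetical. An established framework such as HUnit or hspec would be the more likely long-term choice.

```haskell
-- One regression case: an input and the output expected for it.
data Case a b = Case {input :: a, expected :: b}

-- Run the function under test on every case; each mismatch yields one message,
-- so an empty result means every case passed.
runCases :: (Show a, Show b, Eq b) => (a -> b) -> [Case a b] -> [String]
runCases f cases =
  [ "input " ++ show (input c) ++ ": expected " ++ show (expected c) ++ ", got " ++ show actual
  | c <- cases,
    let actual = f (input c),
    actual /= expected c
  ]
```

For instance, `runCases length [Case "abc" 3, Case "" 1]` reports only the second case, because `length ""` is 0, not 1. In the project, the function under test would be the lexer or parser entry point, with the expected token lists or ASTs written out once and kept as regression data.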
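The dependency-map goal in the list above boils down to a transitive closure over "declaration X references Y" edges. Below is a sketch, assuming the symbol table can already produce a map from each declaration name to the names it references directly (the `DepMap` type and `closure` function are hypothetical). Writing the extracted declarations out in a compilable order would additionally need a topological sort, for example with `Data.Graph.stronglyConnComp` from the containers package, with cycles broken by forward declarations.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- Hypothetical dependency map: each declaration maps to the names it references directly.
type DepMap = Map.Map String [String]

-- Everything the root declaration needs, including itself. The explicit `seen`
-- set makes this terminate on mutually recursive declarations.
closure :: DepMap -> String -> Set.Set String
closure deps root = go Set.empty [root]
  where
    go seen [] = seen
    go seen (n : ns)
      | n `Set.member` seen = go seen ns
      | otherwise = go (Set.insert n seen) (Map.findWithDefault [] n deps ++ ns)
```

With `g` referencing `f`, and `f` referencing a type `Point`, `closure deps "g"` yields `g`, `f` and `Point`, and leaves out unrelated declarations. Names with no entry in the map (for example, declarations coming from a system header) are simply treated as leaves.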
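For the overloading goal, one detail matters when "traversing all visible scopes". In standard C++, ordinary unqualified lookup stops at the innermost enclosing scope that declares the name, so an inner declaration hides outer overloads rather than joining them. The sketch below follows that rule, assuming a hypothetical `Overload` record and scopes represented as maps ordered innermost first. Argument-dependent lookup and using-declarations, which can bring in candidates from other scopes, are not modeled.

```haskell
import qualified Data.Map as Map
import Data.Maybe (mapMaybe)

-- Hypothetical overload record: a name plus parameter types rendered as strings.
data Overload = Overload {ovName :: String, ovParams :: [String]}
  deriving (Eq, Show)

-- One scope maps each name to every overload declared for it there.
type Scope = Map.Map String [Overload]

-- Add an overload, keeping earlier overloads of the same name in declaration order.
declareIn :: Overload -> Scope -> Scope
declareIn o = Map.insertWith (flip (++)) (ovName o) [o]

-- Ordinary unqualified lookup, scopes ordered innermost first: the first scope that
-- declares the name supplies the whole overload set, hiding outer declarations.
overloadSet :: [Scope] -> String -> [Overload]
overloadSet scopes name = case mapMaybe (Map.lookup name) scopes of
  (os : _) -> os
  [] -> []
```

If the global scope declares `f(double)` and `f(int)` and a local scope declares `f(char)`, then looking up `f` from inside the local scope yields only `f(char)`. Overload resolution would then pick among that set.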

Oh, and dear Gawd, how big this project is. How many years is this going to take?