I'm kind of working on something similar.

One thing that struck me about C++ is that although template parsing, etc., is pretty hard, you can get a long way by segmenting the source into delimited regions. This might also speed up your semantic analysis, since each later pass can be written to operate on a known kind of region.

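For instance, after preprocessing, it should be easy to find nested "{", "[", "(", "\"", and "'" regions. "<" is the harder one, since it is also a comparison operator, so template angle brackets need more context. Segmenting the source this way first might make further processing easier.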
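To make that concrete, here is a rough sketch of such a segmenting pass. The names `Region` and `find_regions` are made up for illustration, and it assumes the input is already preprocessed. I left "<" out on purpose, and it does not handle raw string literals or digit separators like 1'000, so treat it as a starting point rather than a finished lexer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One delimited region (hypothetical type for this sketch).
struct Region {
    char open;          // '{', '[', '(', '"' or '\''
    std::size_t begin;  // index of the opening delimiter
    std::size_t end;    // index of the matching closing delimiter
    int depth;          // number of brackets enclosing this region
};

// Scan the source once and return every matched region.
// Mismatched closers are ignored and unclosed openers are dropped.
std::vector<Region> find_regions(const std::string& src) {
    std::vector<Region> done;
    std::vector<Region> open;  // stack of brackets still waiting for a closer
    for (std::size_t i = 0; i < src.size(); ++i) {
        const char c = src[i];
        if (c == '"' || c == '\'') {
            // Literals do not nest: skip to the matching quote,
            // stepping over backslash escapes.
            std::size_t j = i + 1;
            while (j < src.size() && src[j] != c) {
                if (src[j] == '\\') ++j;
                ++j;
            }
            if (j >= src.size()) break;  // unterminated literal, stop scanning
            done.push_back({c, i, j, static_cast<int>(open.size())});
            i = j;  // continue after the closing quote
        } else if (c == '{' || c == '[' || c == '(') {
            open.push_back({c, i, 0, static_cast<int>(open.size())});
        } else if (c == '}' || c == ']' || c == ')') {
            const char want = c == '}' ? '{' : c == ']' ? '[' : '(';
            if (!open.empty() && open.back().open == want) {
                Region r = open.back();
                open.pop_back();
                assert(r.begin < i);  // an opener always precedes its closer
                r.end = i;
                done.push_back(r);
            }
        }
    }
    return done;
}
```

Regions come out in the order they close, so inner regions appear before the ones that enclose them. A later pass can then pick, say, only the "(" regions at depth 0 and ignore everything else.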

It seems that the standard parsing methods we learn in CS are optimized for single-pass speed, but not for maintainability or development efficiency. Computers have gotten fast. Using multiple, simpler passes seems like a promising approach.

I see the process of parsing as some kind of folding, where you start with a linear sequence of chars and gradually fold it over and over into a "lumpy" tree.
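A minimal sketch of the first "fold", just to show the shape I have in mind. `Node` and `fold` are hypothetical names, and it only knows about the three bracket kinds: plain characters are grouped into leaf runs and each bracketed span becomes a subtree. Later passes would fold the leaves further.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Either a leaf holding a run of plain characters (open == 0)
// or a bracketed group holding child nodes.
struct Node {
    char open = 0;
    std::string text;            // used by leaves only
    std::vector<Node> children;  // used by groups only
};

// Fold src starting at pos into a list of nodes, stopping after the
// closer `close` (0 means top level: run to the end of the input).
// Stray closers at the wrong level are kept as plain text.
std::vector<Node> fold(const std::string& src, std::size_t& pos, char close = 0) {
    std::vector<Node> out;
    while (pos < src.size()) {
        const char c = src[pos++];
        if (close != 0 && c == close) break;  // this group is finished
        if (c == '(' || c == '[' || c == '{') {
            Node group;
            group.open = c;
            const char want = c == '(' ? ')' : c == '[' ? ']' : '}';
            group.children = fold(src, pos, want);  // fold the inside first
            out.push_back(std::move(group));
        } else {
            // Extend the current leaf, or start a new one after a group.
            if (out.empty() || out.back().open != 0) out.push_back(Node{});
            out.back().text += c;
        }
    }
    assert(pos <= src.size());
    return out;
}
```

For example, folding "a(b[c])d" from position 0 gives three top-level nodes: the leaf "a", a "(" group containing the leaf "b" and a "[" group, and the leaf "d".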