I'd be interested in learning more how you implemented this.
I reccomended something similar to this for use as the parser for the VB.NET IDE, back around the start of the VS 2008 development cycle. It ended up being kind of expensive , because it required pushing changes all the way down to the text representation. All the IDE schedule time for Orcas went into supporting the LINQ features, so nobody was really interested in hearing about expensive IDE changes (by the way, I'm not disputing the choice, from a product perspective, it was the right one).
I was trying to make the parser faster, and to minimize rebinding of declarations, rather than on improving intermediate states, but I can see how it might be possible to make your stuff work.
My idea was based on several things:
1) Incremental tokenization, using a specialized editor text representation
2) Knowledge of the amount of lexical and syntantic lookahead needed by the VB grammar
3) An index from tokens to parse tree nodes
When an edit happend, the incremental scanner would then move backwards a constant number of tokens (based on some grammar properties) and then scan forward patching up the token stream until the scanner produced a duplicate token in the appropriate offset position.
The parse tree index would then be used to figure out where to restart parsing, again based on some properties of the grammar.
My focus was on producing the same trees as the existing parser, and on minimizing changes to declarations (because of all the rebinding that needs to be done), and so my incremental termination condition was based on entering a method (in the top-down hand written parser) that would generate a node of the same type as the "current position"(offset based on the edit), with the same parent, in the existing parse tree with the same starting token (based on object identity). This all worked because the scanner was incremental, which required the editor to structure it's data in a certain way
I would imagine that you probably used something similar to this, but with different termination conditions.
I would really want to see how similar (or not) what you did was to the scheme I came up with.
One thing that struck me about C++ is that although template parsing, etc, is pretty hard, you can get a long way by segmenting the source into delimited regions. This might also speed up your semantic analysis, by allowing you to write code that's operating on a known kind of region.
For instance after pre-processing, it should be easy to find nested "{", "[", "(", "\"", "'", and "<". Segmenting the source in this way, first, might make further processing easier.
It seems that the standard parsing methods we learn in CS are optimized for single-pass speed, but not for maintainability or development efficiency. Computers have gotten fast. Using multiple passes of simpler operators seems like a good promising approach.
I see the process of parsing as some kind of folding, where you start with a linear sequence of chars and gradually fold it over and over into a "lumpy" tree.
My actual scheme is similar in some ways to what you describe. More complex, because it involves many different types of structures.
Also, my approach is more explicit, I don't "restart a parse" in a general way, as I contemplate each possible change explicitly. This is prohibitive to do by hand for a full grammar, but I'm not doing full C/C++/C# analysis, so it's doable. It also makes the dynamic parser fully incremental.
I reccomended something similar to this for use as the parser for the VB.NET IDE, back around the start of the VS 2008 development cycle. It ended up being kind of expensive , because it required pushing changes all the way down to the text representation. All the IDE schedule time for Orcas went into supporting the LINQ features, so nobody was really interested in hearing about expensive IDE changes (by the way, I'm not disputing the choice, from a product perspective, it was the right one).
I was trying to make the parser faster, and to minimize rebinding of declarations, rather than on improving intermediate states, but I can see how it might be possible to make your stuff work.
My idea was based on several things:
1) Incremental tokenization, using a specialized editor text representation
2) Knowledge of the amount of lexical and syntantic lookahead needed by the VB grammar
3) An index from tokens to parse tree nodes
When an edit happend, the incremental scanner would then move backwards a constant number of tokens (based on some grammar properties) and then scan forward patching up the token stream until the scanner produced a duplicate token in the appropriate offset position.
The parse tree index would then be used to figure out where to restart parsing, again based on some properties of the grammar.
My focus was on producing the same trees as the existing parser, and on minimizing changes to declarations (because of all the rebinding that needs to be done), and so my incremental termination condition was based on entering a method (in the top-down hand written parser) that would generate a node of the same type as the "current position"(offset based on the edit), with the same parent, in the existing parse tree with the same starting token (based on object identity). This all worked because the scanner was incremental, which required the editor to structure it's data in a certain way
I would imagine that you probably used something similar to this, but with different termination conditions.
I would really want to see how similar (or not) what you did was to the scheme I came up with.