Here's a rendered version of that document at some point in time: https://1drv.ms/b/s!AvaVLzoc2_8MiNdqkVZ1ScoRjYeOZQ?e=ApQSFw

I notice that my discussion about the churn in the GitHub repository has come up a number of times. I can provide some historical clarification on why there was so much churn there.

The compiler proper, as it stands right now, is found here: https://github.com/Co-dfns/Co-dfns/blob/master/cmp/e.cd

The documentation for those 17 lines of code is in a thesis of about 70k words, hopefully soon to be published, that includes performance information and the like.

However, I didn't start out writing this project in APL on Dyalog. Over the course of its development I explored a significant number of other architectural designs and programming methodologies. Some of the more popular ones I tried included Extreme Programming, Agile methods, Java, Scheme, Nanopass, SML, Isabelle, and even a custom extension of Hoare Logic on top of dfns. I believe I also explored implementing the compiler in C++/Boost and in C, and prototyped some things along those lines (I don't know whether they ended up in this Git repo).

In other words, the compiler has not been a single code base. It has been a series of complete rewrites using different methods, approaches, languages, techniques, and architectures. I have used heavyweight tooling support (mostly around C++ with Visual Studio's infrastructure) as well as some very hardcore UNIX-style, low-level handiwork. I explored multiple IDEs, text editors, and operating systems, as well as multiple backends, targets, and the like. At one time I targeted LLVM, at another C, at another OpenACC, and at another ArrayFire.

The whole project has been a wide-ranging exploration of the design space, to say the least.

The XML you are seeing came from a particular design effort: an attempt to apply strict Cleanroom Software Engineering as a methodology to the compiler design, to see what would happen. In the end I abandoned the attempt, for what I hope will be obvious reasons. During that time, however, I predominantly worked on RHEL, using the ed(1) text editor to edit XML files for the DocBook publishing suite. Part of the churn comes from incorporating and removing various dependencies, which had to be brought into and out of the repository depending on what infrastructure I was relying on. In the case of DocBook, some of those files are large.

However, a significant part of the work in Cleanroom Software Engineering is "coding via documented process." This includes the certification steps as well as the function specification, incremental development, sequent analysis, and so forth.

Thus, for a substantial portion of the Co-dfns work, I was literally programming in XML using ed(1). I modeled relatively complex state machines and function specifications that defined very fine-grained behaviors of the compiler. For example, a significant amount of work went into the following file:

https://github.com/Co-dfns/Co-dfns/blob/92e07bd84b5c8be08e2f...

This file is about 45k lines of XML, and I wrote and edited it entirely by hand using ed(1). A while back I made a video showing how I did this, and particularly how I did much of it with ed(1), but I have since lost the script file recording.

Over time, as I continued to explore patterns and development approaches, I kept discovering that the code became faster, better, and easier to work with as I removed more and more "stuff" from the various processes and possibilities.

It wasn't until relatively late in the game that I realized two things. First, the compiler could be written well in dfns. Second, it could be written in dfns in a way that was fully data parallel, which is the core insight of my thesis. This had significant ramifications for the source code, because it meant that the compiler could be tackled not only as a self-hosting project (at least in theory) but also in a fundamentally idiomatic way.

The result is that the compiler has generally become more featureful, less buggy, and denser at each major stage, and the latest stage is 17 lines of code. This accomplishes essentially the same result as the 750 lines of code in a previous HN discussion. It does so partly by recognizing that some passes are irrelevant and unnecessary for current needs.

I do expect the compiler to grow a little after the thesis is published, to accommodate some new things that need to go in. However, at this point I have a fairly efficient methodology.

So the GitHub repository is not just a record of the code. It is also a record of many different approaches to what I was trying to do. Much of the XML you see was very much "coding," in the sense that it defined the core behavior of the system and served as the primary formal, rigorous specification of its behaviors.