
It's because the author doesn't want a whole new language, but rather a better way to build abstractions over C.

What he wants is C's conceptual model with sexpr syntax and (therefore) the ability to build custom abstractions on top of it. That's different from working in an entirely other programming language with its own conceptual model that happens to compile to C.

You would see this difference vividly in the C that the two systems generate: the first would read like a more verbose version of the same application in which the skeleton of the program is recognizable; the second would look like machine gobbledygook. The former would be debuggable in a way that let you step through the logic of the surface program, while the other probably wouldn't. And so on.
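To make the "skeleton stays recognizable" point concrete, here is a minimal sketch of a sexpr-to-C emitter. The surface forms (`defun`, a typed parameter list, binary `+` and `*`) are invented for illustration and are not the OP's actual syntax; the point is only that the generated C maps back to the source form almost line for line.

```python
# Hypothetical mini-translator: the s-expression forms used here (defun,
# typed parameter pairs, binary operators) are assumptions for illustration.

def tokenize(src):
    # Pad parentheses with spaces so split() separates them from atoms.
    return src.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    # Recursive-descent reader: returns nested Python lists of strings.
    tok = tokens.pop(0)
    if tok == "(":
        node = []
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return node
    return tok

def emit_expr(node):
    # Atoms pass through; (op lhs rhs) becomes a fully parenthesized C expression.
    if isinstance(node, str):
        return node
    op, lhs, rhs = node
    return f"({emit_expr(lhs)} {op} {emit_expr(rhs)})"

def emit_function(node):
    # Expected shape: (defun ret-type name ((type arg) ...) body-expr)
    _, ret_type, name, params, body = node
    args = ", ".join(f"{t} {n}" for t, n in params)
    return f"{ret_type} {name}({args}) {{\n    return {emit_expr(body)};\n}}\n"

source = "(defun int scale ((int x) (int k)) (+ (* x k) 1))"
c_code = emit_function(parse(tokenize(source)))
print(c_code)
```

The emitted function is ordinary C that a debugger can step through, and each piece of it corresponds to one form in the sexpr source.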

Instead of thinking of this as a different language, think of it as 'magic C with DSLs', or 'C, with a macro system that lets you do what you want'. That distinction isn't absolute, because if you push your abstractions far enough you will end up with a different language, but the style of programming the author is talking about gives you incentives not to do that.



Even if all you want is C semantics with a different syntax, though, I think it's better to emit LLVM IR. LLVM IR is close to C semantics (though aliasing information has to be supplied explicitly), but you have direct control over debug info, which is important to avoid a huge regression in the debugging experience. As an added bonus, you eliminate the necessity of serializing and deserializing your IR during your compilation pipeline for no reason (which is effectively what compiling and reparsing C is doing).


You may be right, but learning curve is an issue. If you're already familiar with C, writing a sexpr C generator is super easy. I can see why someone would balk at learning a new conceptual model and toolset, and just take the path of least resistance, especially if they already know the kind of C program they'd like to write. This would have been even more true in 2010 of course.

So what's the best way to tackle the learning curve of what you're suggesting? If I know zero about LLVM and I want to make something like the OP, what should I do?


Another good option is to use Joe Armstrong's approach: look at the LLVM IR that clang generates, for example, and then emit the same kind of IR yourself.

My first attempt at using LLVM was using the C++ API. It was...a struggle. Using this approach (IR snippets), I made more progress in a day than I had in months using the API.
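A rough sketch of what the snippet approach can look like. The template below is a hand-simplified version of the kind of IR clang produces for a two-argument integer function (real clang output carries attributes and metadata that are omitted here), and the helper name `emit_binary_func` is made up for this example.

```python
# Textual LLVM IR template, simplified by hand from clang-style output.
# Doubled braces are literal braces in str.format().
BINARY_FUNC_TEMPLATE = """\
define i32 @{name}(i32 %a, i32 %b) {{
entry:
  %result = {opcode} i32 %a, %b
  ret i32 %result
}}
"""

def emit_binary_func(name, opcode):
    # Only a few integer opcodes are allowed in this sketch.
    if opcode not in {"add", "sub", "mul"}:
        raise ValueError(f"unsupported opcode: {opcode}")
    return BINARY_FUNC_TEMPLATE.format(name=name, opcode=opcode)

# Build a small module by concatenating filled-in snippets.
module = emit_binary_func("add2", "add") + "\n" + emit_binary_func("mul2", "mul")
print(module)
```

The resulting text can be saved to a `.ll` file and handed to the LLVM tools, without touching the C++ API at all.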


Also worth mentioning is an invaluable learning tool: the 'cpp' backend in LLVM. It emits idiomatic C++ code that builds any given IR module using the LLVM API.


Yup, that's what I used in my first attempt. Didn't work for me. (Actually: a later part of my first attempt, I think I initially tried with the straight API. Good luck with that).


What exactly did not work? Have you filed a bug?


That's clever.


LLVM has a great tutorial (Kaleidoscope): http://llvm.org/docs/tutorial/

It walks you through basic expression generation, control flow, memory, etc. for a simple language. The learning curve isn't zero, to be sure, but I think the time saved by being able to work with IR as a tree instead of as a flat series of bytes makes it easily worth it.


"you eliminate the necessity of serializing and deserializing your IR during your compilation pipeline for no reason (which is effectively what compiling and reparsing C is doing)."

I did debugging before it got to C and just ensured C generation would do exactly what I wanted. I could read and debug the C itself as a check against problems in that. Yet, serializing and deserializing my IR just didn't happen: it was just LISP or BASIC expressions depending on which version we're talking about. Just trees.

Your debugging regression claim is correct, as I addressed in my main comment. Fortunately, my development style and choice of libraries compensated for that nicely. It would've been quite painful if I had to deal with arbitrary FOSS or proprietary stuff out of necessity. I'd be working at both abstraction levels for sure or coding miracles into my tooling haha.


I don't doubt your position has merit, and there are many options for generating code other than LLVM, but is anything really quicker to implement than fprintf? I have implemented compile-to-X myself (note I prefer to phrase it as 'source to source translation' rather than 'compile'), and for a certain class of project (personal or rarely used by others, where compilation speed isn't an issue) fprintf (or whatever) gets you a lot of bang for the development-hour buck.


In order to actually generate valid C, you have to do a lot of work to figure out what you're supposed to fprintf; you have to get the operator precedence right, you have to do scoping right, debugging the serialization code is annoying, etc. A high-level API like LLVM's, by contrast, lets you interact with the IR as a tree instead of an output stream, which is usually easier because your AST is already a tree.
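The operator precedence problem is easy to show with a small sketch. The tuple-based AST and the precedence table here are assumptions for illustration, covering only four left-associative binary operators.

```python
# Binding strength of each operator; higher binds tighter (as in C).
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def emit_naive(node):
    # Prints operands and operator in order, with no parentheses at all.
    if isinstance(node, str):
        return node
    op, lhs, rhs = node
    return f"{emit_naive(lhs)} {op} {emit_naive(rhs)}"

def emit(node, parent_prec=0, is_right=False):
    # Parenthesize a subexpression when it binds looser than its parent,
    # or equally tight but sits on the right of a left-associative operator.
    if isinstance(node, str):
        return node
    op, lhs, rhs = node
    prec = PRECEDENCE[op]
    text = f"{emit(lhs, prec)} {op} {emit(rhs, prec, True)}"
    if prec < parent_prec or (is_right and prec == parent_prec):
        return f"({text})"
    return text

tree = ("*", ("+", "a", "b"), "c")      # means (a + b) * c
naive = emit_naive(tree)                # loses the grouping
careful = emit(tree)                    # keeps it
nested = emit(("-", "a", ("-", "b", "c")))
print(naive, "|", careful, "|", nested)
```

The naive emitter silently changes the meaning of the program, which is the kind of bug that only shows up later in the generated C. Wrapping every subexpression in parentheses also works and is simpler, at the cost of noisier output.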


There is a plain-text form of LLVM IR. It is a bit easier to printf than full-featured C.
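One reason the textual IR is easier to print: each instruction takes named operands, so nested expressions flatten into a list of temporaries and precedence never comes up. A minimal sketch, assuming a tuple-based AST, 32-bit integers everywhere, and made-up helper names (`lower`, `emit_function`):

```python
import itertools

# Map surface operators to LLVM integer opcodes (i32 only in this sketch).
OPCODES = {"+": "add", "-": "sub", "*": "mul"}

def lower(node, lines, counter):
    """Append instructions for node to lines; return the operand holding its value."""
    if isinstance(node, str):
        # Integer literals are used directly; anything else is a named value.
        return node if node.lstrip("-").isdigit() else f"%{node}"
    op, lhs, rhs = node
    left = lower(lhs, lines, counter)
    right = lower(rhs, lines, counter)
    tmp = f"%t{next(counter)}"  # fresh named temporary
    lines.append(f"  {tmp} = {OPCODES[op]} i32 {left}, {right}")
    return tmp

def emit_function(name, params, body):
    lines = []
    result = lower(body, lines, itertools.count())
    args = ", ".join(f"i32 %{p}" for p in params)
    header = f"define i32 @{name}({args}) {{"
    return "\n".join([header, "entry:", *lines, f"  ret i32 {result}", "}"]) + "\n"

ir = emit_function("f", ["a", "b", "c"], ("*", ("+", "a", "b"), "c"))
print(ir)
```

Evaluation order is spelled out by the instruction order, so there is nothing equivalent to a missing pair of parentheses.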



