Blog : May 2008
posted on May 22, 2008 - 12:24 AM PDT by Erick Tryzelaar
Now that I'm getting a little more familiar with the guts of the compiler, I can see some changes that would make my work much simpler and may be useful to other people. I'd like to refactor the compiler even further than I already have.
First, one problem I'm having is that I don't have a firm grip on the AST of the backend. There are many, many options, and I'm not sure how they all mix together. While there are scripts like flxd and flxb to print out the current state of the AST, they convert it into almost-compilable felix, which doesn't really help. What I really want is to see the AST as s-expressions. So why don't we support two ways of displaying the AST, felix and s-expressions? And if the s-expressions can completely represent the AST, we could read it back in.
I'm thinking about using sexplib to get AST -> s-expressions for free. It's a small library, and since I'm already basing my work on the huge llvm, it doesn't seem like much of an added dependency. The only real problem I can think of is the near name collision with the sex module. Maybe we could rename sex to flx_sex. We could also have a binary format done via marshal.
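To make the round-trip idea concrete, here is a rough sketch in Python (not OCaml, and not sexplib's API). The nested-list "AST" and the names `to_sexp` and `parse_sexp` are hypothetical stand-ins, chosen only to show that a tree which is fully described by its s-expression can be read back unchanged.

```python
def to_sexp(node):
    """Serialize a nested list of atoms (str or int) to an s-expression string."""
    if isinstance(node, list):
        return "(" + " ".join(to_sexp(child) for child in node) + ")"
    return str(node)


def parse_sexp(text):
    """Parse an s-expression string back into nested lists of atoms."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # skip the closing ")"
            return items
        try:
            return int(tok)  # integer atoms come back as ints
        except ValueError:
            return tok       # everything else stays a symbol string

    result = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after s-expression")
    return result


# Hypothetical toy AST for: fun foo (a, b) => a*a + b*b
ast = ["fun", "foo", ["params", "a", "b"],
       ["add", ["mul", "a", "a"], ["mul", "b", "b"]]]
text = to_sexp(ast)
roundtrip = parse_sexp(text)
```

The real AST would need every constructor and field to have a printed form for the read-back to be lossless, which is the part a generator like sexplib would be expected to take care of.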
Second, the compiler is very monolithic, but it doesn't have to be. There are several high-level concepts that we could break out into their own separate libraries:
- reading and typechecking the flx files
- reading (and not typechecking) the AST s-expressions
- writing out the AST in felix
- writing out the AST in s-expressions
- converting the AST to c++
- converting the AST to llvm
- shared data structures
Then, using these libraries, we can implement some new drivers:
- flx-as (as in felix assembler) that does .flx -> AST
- flx-dis (as in felix disassembler) that does AST -> .flx
- rename flxg to flxc-cpp (as in felix compiler) that does {.flx,AST} -> c++
- rename flxg to flxc-llvm (as in felix compiler) that does {.flx,AST} -> llvm
- create flxc that can do {.flx,AST} -> {c++,llvm}
The nice thing about library-itizing felix is that it becomes much easier to write tools to process the AST. For instance, making a doxygen-like document viewer would be much simpler if we could just work directly with the AST, rather than re-parse the source like the old viewer used to do. Another long-term possibility is GUI integration. I think there's a lot we could get out of this.
Third, I'd like to at least push the conversion of matches into gotos down into the backend. LLVM needs a terminator instruction, such as a branch, at the end of every basic block, but the felix frontend assumes that gotos can fall through. I'm ignoring support for arbitrary gotos at the moment. Will bad things happen if I do this? Are there other things that should be pushed to the backend?
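The fall-through mismatch can be sketched like this. It's a minimal Python illustration, not the compiler's code: the `(label, instructions)` block representation and the name `add_explicit_branches` are made up for the example, and the terminator list is only a subset of what LLVM accepts.

```python
# A subset of LLVM terminator opcodes, enough for this sketch.
TERMINATORS = ("br", "ret", "switch", "unreachable")


def add_explicit_branches(blocks):
    """Given blocks as (label, [instruction strings]) in layout order,
    return a copy where every block that would fall through instead ends
    with an explicit unconditional branch to the next block."""
    result = []
    for index, (label, instrs) in enumerate(blocks):
        instrs = list(instrs)
        last_op = instrs[-1].split()[0] if instrs else ""
        if last_op not in TERMINATORS:
            if index + 1 >= len(blocks):
                # Nothing to fall through to; treated as an error here.
                raise ValueError(f"block {label!r} falls off the end")
            next_label = blocks[index + 1][0]
            instrs.append(f"br label %{next_label}")
        result.append((label, instrs))
    return result


blocks = [
    ("entry", ["%c = icmp eq i32 %x, 0",
               "br i1 %c, label %zero, label %nonzero"]),
    ("zero", ["%r0 = add i32 %x, 1"]),   # falls through to "nonzero"
    ("nonzero", ["ret i32 %x"]),
]
fixed = add_explicit_branches(blocks)
```

Here only the `zero` block changes: it gains `br label %nonzero`, while the blocks that already end in `br` or `ret` are left alone.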
Fourth, and this one's tentative, I might try to use my fbuild in my llvm branch. When I stopped working on it in January, it was configuring and compiling ocaml. Since that's all I'm going to be doing with llvm for a while, it might be nice to actually use it.
Fifth, where'd you all go? I hope we can get some more involvement to help me with all this :)
posted on May 20, 2008 - 12:50 AM PDT by Erick Tryzelaar
In case anyone uses RSS feeds, you can now receive updates to this website by signing up here.
posted on May 17, 2008 - 04:05 PM PDT by Erick Tryzelaar
I've started a fork of felix to modify our compiler's backend to target llvm instead of c++. You can track it in the git repository here.
For those unfamiliar with git, you can clone the repository by doing:
> git clone http://git.felix-lang.org/r/felix/llvm-felix.git
Anyway, the support is very basic at the moment. But, it can turn this:
#syntax <nugram.flxh>
open syntax felix;
type int = "i32";
fun add: int * int -> int = "add i32 $1, $2";
fun mul: int * int -> int = "mul i32 $1, $2";
body declare_exit = 'declare i32 \@exit(i32)';
proc exit: int = "call i32 \@exit(i32 $1)" requires declare_exit;
noinline fun foo (a:int, b:int) => a*a + 2*a*b + b*b;
var x = 1;
val y = 2;
val z = (foo (x, y)) + 3;
exit z;
Into this when run with flxg test:
declare i32 @exit(i32)
; ------------------------------
; C FUNC <8>: foo
define i32 @foo( i32 %a, i32 %b) {
entry:
%_tmp30 = mul i32 %a, %a
%_tmp31 = mul i32 2, %a
%_tmp32 = mul i32 %_tmp31, %b
%_tmp33 = add i32 %_tmp30, %_tmp32
%_tmp34 = mul i32 %b, %b
%_tmp35 = add i32 %_tmp33, %_tmp34
ret i32 %_tmp35
}
; ------------------------------
; C PROC <15>: main
define i32 @main(i32 %argc, i8** %argv) {
entry:
%x = bitcast i32 1 to i32
%y = bitcast i32 2 to i32
%_tmp36 = call i32 @foo(i32 %x, i32 %y)
%z = add i32 %_tmp36, 3
call i32 @exit(i32 %z)
ret i32 0
}
When you run that through cat test.ll | llvm-as - | llc - | gcc -x assembler - and then run the resulting a.out, it exits with the status code 12.
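As a quick sanity check on that exit status, the same arithmetic can be redone by hand. This Python snippet just mirrors the felix source above; it is not generated by the compiler.

```python
def foo(a, b):
    # Same expression as the felix function: a*a + 2*a*b + b*b
    return a * a + 2 * a * b + b * b


x = 1
y = 2
z = foo(x, y) + 3  # 1 + 4 + 4 = 9, plus 3
```

So `z` is 12, matching the status code.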
posted on May 12, 2008 - 09:15 AM PDT by Erick Tryzelaar
Now that I've organized the compiler a bit, it's starting to get a little easier to see how we could do an LLVM backend for felix. One of the many challenges we'll have to deal with is what to do for a foreign function interface. For those who don't know, one of the neat things felix does is compile into C++ code. This lets us easily define functions like this:
fun add: int * int -> int = "$1 + $2";
fun sin: float -> float = "std::sin($1)";
Where we will just pass the string through the program, translating the $-prefixed values into the proper variable names in the output source. We can even do complex functions like this:
fun foo: int * int * int -> int = "$1 + $2 - $3";
And still have it work.
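The substitution step can be pictured roughly as below. This is a simplified Python sketch, not felix's actual implementation (which may also deal with precedence, parenthesization, and other details); the name `expand_template` is hypothetical.

```python
import re


def expand_template(template, args):
    """Replace $1, $2, ... in template with the matching entry of args
    (1-based), leaving the rest of the string untouched."""
    def replace(match):
        index = int(match.group(1))
        return args[index - 1]
    return re.sub(r"\$(\d+)", replace, template)


add_code = expand_template("$1 + $2", ["x", "y"])
sin_code = expand_template("std::sin($1)", ["angle"])
foo_code = expand_template("$1 + $2 - $3", ["x", "y", "z"])
```

With these inputs the results are `x + y`, `std::sin(angle)`, and `x + y - z`, which is the kind of text that would be spliced into the generated C++.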
For LLVM, though, we'd be directly compiling against their API, since that would be more efficient. So, what can we do? One possibility would be to use Ocs Scheme and our S-Expression Library, sex:
fun add: int * int -> int = "`(add ,_1 ,_2)";
fun sin: float -> float = "`(ccall std::sin ,_1)";
fun foo: int * int * int -> int = "`(add ,_1 (sub ,_2 ,_3))";
Then each backend just implements how to translate these simple calls into its target format. The nice thing about this is that it stays reasonably generic. I was worried that we'd have to sprinkle #IFDEFs throughout our source, but this seems to avoid that in most situations.
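Here is a rough sketch of what "implement in the backends" could look like, written in Python with the s-expression already parsed into nested lists. Everything here is an assumption for illustration: the names `emit_cpp` and `emit_llvm`, the small operator tables, the `%_tmpN` naming, and the hard-coded `i32` type.

```python
import itertools

CPP_BINOPS = {"add": "+", "sub": "-", "mul": "*"}
LLVM_BINOPS = ("add", "sub", "mul")


def emit_cpp(expr):
    """Translate a nested-list expression into a C++ expression string."""
    if isinstance(expr, str):
        return expr
    op, *args = expr
    if op in CPP_BINOPS and len(args) == 2:
        lhs, rhs = (emit_cpp(arg) for arg in args)
        return f"({lhs} {CPP_BINOPS[op]} {rhs})"
    if op == "ccall":
        name, *call_args = args
        return f"{name}({', '.join(emit_cpp(a) for a in call_args)})"
    raise ValueError(f"unsupported in this sketch: {expr!r}")


def emit_llvm(expr, lines, counter):
    """Append LLVM-style instructions to lines and return the operand
    that names the result. Only i32 binary operators are handled."""
    if isinstance(expr, str):
        return "%" + expr  # plain names become local registers
    op, *args = expr
    if op not in LLVM_BINOPS or len(args) != 2:
        raise ValueError(f"unsupported in this sketch: {expr!r}")
    left = emit_llvm(args[0], lines, counter)
    right = emit_llvm(args[1], lines, counter)
    name = f"%_tmp{next(counter)}"
    lines.append(f"{name} = {op} i32 {left}, {right}")
    return name


# One generic definition, two targets.
expr = ["add", "a", ["sub", "b", "c"]]
cpp_text = emit_cpp(expr)
llvm_lines = []
llvm_result = emit_llvm(expr, llvm_lines, itertools.count())
```

The same `expr` yields `(a + (b - c))` for the C++ side and two instructions ending in `%_tmp1` for the LLVM side, so the library source itself carries no target-specific branches.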