Using Harlan in C++ Programs
So far, Harlan programs have primarily existed in a vacuum. You'd compile them, run them, and they might produce some output. Certainly none of them received any input from the outside world. Most of the test cases use small sets of data, and the larger benchmarks generate highly synthetic data, like a vector of 16 million ones. My focus has been on building the compiler itself, so this has been a tolerable situation up to this point. However, Harlan now needs more realistic applications, and it's clear the existing foreign function interface (FFI) story just won't cut it anymore.
I'm happy to report that it's now possible to pass non-trivial amounts
of data from C++ to Harlan. Two new features made this possible.
First, there are library functions like unsafe-deref-float and
unsafe-set!-float, which allow reading from and writing to raw
pointers. Second, there's a new primitive form called unsafe-vec-ptr,
which gives a raw pointer to the contents of a vector. These are very
low level, but they give us the tools we need to build a reasonably
usable FFI. Let's see how to use these to implement a dot product in
Harlan and use it from a C++ program.
First, we need to write the dot product function. This is pretty short in Harlan.
(module
  (import ffi)
  (define (harlan_dot N pa pb)
    (let ((a (import-float-vec pa N))
          (b (import-float-vec pb N)))
      (reduce + (kernel ((a a) (b b)) (* a b))))))
For the most part, this is a straightforward dot product written in
Harlan. The main new thing is the call to import-float-vec, which
copies a C-style array into a Harlan vector. If you're curious, its
implementation is in ffi.kfc.
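To make the copy semantics concrete, here is a rough C++ analogue of what import-float-vec does from the caller's point of view. This is only a sketch: the name import_float_vec_sketch is hypothetical, and it is not a translation of the code in ffi.kfc. It simply reads N floats through a raw pointer and stores them in a container that owns its own memory, so later changes to the source array don't affect the copy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical C++ analogue of import-float-vec (not the ffi.kfc code):
// copy n floats from a raw C-style array into an owned container.
std::vector<float> import_float_vec_sketch(const float *p, int n) {
    // A null pointer is only acceptable when there is nothing to copy.
    assert(n >= 0 && (p != nullptr || n == 0));

    std::vector<float> v(static_cast<std::size_t>(n));
    for (std::size_t i = 0; i < v.size(); ++i) {
        v[i] = p[i];  // element-wise read through the raw pointer
    }
    return v;
}
```

Because the data is copied, the caller keeps ownership of the original array and can free or reuse it once the import is done.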
Unlike most Harlan programs, this does not define a main function.
Instead, we compile it to a shared library by running the following
from your Harlan directory:
./harlanc --shared dotprod.kfc
When this is done, you'll have a dotprod.so which you can link to
your C++ programs. The harlan_dot function is exposed under the
signature float harlan_dot(int N, float *pa, float *pb).
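When wiring a function like this into a C++ program, it helps to have a plain CPU version with the same signature to check the results against. The sketch below is one way to do that. The names reference_dot and close_enough are hypothetical helpers, not part of Harlan, and the commented-out prototype leaves open whether an extern "C" wrapper is needed, since that depends on how the compiler emits the symbol.

```cpp
#include <cassert>
#include <cmath>

// Prototype matching the signature stated above. Whether it must be
// wrapped in extern "C" is an assumption to check against the library
// that harlanc generates.
// float harlan_dot(int N, float *pa, float *pb);

// Hypothetical serial reference with the same signature as harlan_dot.
float reference_dot(int N, float *pa, float *pb) {
    assert(N >= 0);
    float sum = 0.0f;
    for (int i = 0; i < N; ++i) {
        sum += pa[i] * pb[i];
    }
    return sum;
}

// Relative comparison: a parallel reduction may add the terms in a
// different order than the serial loop, so exact equality is too strict.
bool close_enough(float x, float y, float rel_tol = 1e-4f) {
    float scale = std::fmax(1.0f, std::fmax(std::fabs(x), std::fabs(y)));
    return std::fabs(x - y) <= rel_tol * scale;
}
```

With the library linked in, a check along the lines of close_enough(harlan_dot(N, a, b), reference_dot(N, a, b)) would catch gross errors in the data transfer.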
Now, let's plug this function into the dot product benchmark I wrote
about previously. Basically, we add a prototype for harlan_dot, then
add a call to TIME(harlan_dot) along with the rest of the benchmarks.
You can see the full set of changes here. I commented out the CUBLAS
version because I ran into runtime errors that I didn't feel like
debugging. Below is a graph of how Harlan compares with the other
implementations.
Yikes!
Clearly I've got some performance issues to deal with. On the bright side, Harlan runs faster on the GPU than it does on the CPU. I'll be investigating these performance problems soon.
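For readers who don't have the benchmark source at hand, the kind of wrapper that TIME(harlan_dot) stands for might look roughly like the following. This is a sketch under assumptions: the real TIME macro is not shown in this post, and time_dot_seconds and serial_dot are hypothetical names. serial_dot is only a placeholder so the example runs without linking dotprod.so.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for the benchmark's TIME(...) macro: run one
// dot product, store its result, and return the elapsed wall time.
template <typename DotFn>
double time_dot_seconds(DotFn dot, int N, float *pa, float *pb,
                        float *result) {
    assert(result != nullptr);
    auto start = std::chrono::steady_clock::now();
    *result = dot(N, pa, pb);
    auto stop = std::chrono::steady_clock::now();
    // duration<double> defaults to seconds.
    return std::chrono::duration<double>(stop - start).count();
}

// Placeholder kernel with the same signature as harlan_dot.
float serial_dot(int N, float *pa, float *pb) {
    float sum = 0.0f;
    for (int i = 0; i < N; ++i) {
        sum += pa[i] * pb[i];
    }
    return sum;
}
```

Passing harlan_dot in place of serial_dot would time the Harlan version through the same path. A single run like this includes any one-time setup cost, so averaging over several calls would give a fairer comparison.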
As far as the FFI goes, some usability issues remain as well. For example, the Harlan compiler and the code it produces have some relative paths hard-coded, which means they must run from the Harlan source directory. These shouldn't be hard to fix. In the meantime, it's now possible to integrate Harlan code into projects written in other languages.