2 posts tagged “fp”
Greg confessed on #london.pm to having written a average function in Perl in FP style... recursively.
I asked him about this, as a perfectly good average function (using List::Util would be:
sub avg { sum(@_) / @_ }
which is perfectly fine FP style too. As far as I could understand it, he was reducing a tuple of sum and count. Though he said he wrote it just-for-fun, it's not entirely unreasonable if you're using a list-based language, where you'd otherwise have to traverse the list twice (in Perl, arrays know how long they are).
Out of curiosity, I tried to check how Haskell defined avg, but it seems that it doesn't.
LoganCapaldo suggested the following definition:
> import Data.List
> avg xs = sum xs / genericLength xs
We use genericLength because the normal length function insists on returning an integer, while genericLength returns a generic Number, which can be divided appropriately. (We'd otherwise have to write sum xs / (fromIntegral $ length xs))
The recursive function would work by summing a tuple, so that
avgs [1,3,5] => (9,3)
Once we have that we could divide the tuple using uncurry.
> avg' = uncurry (/) . avgs
As we're going to fold the number values into a tuple, we need a function like:
> sum_count s (t,c) = (t+s, c+1)
after which our definition is easily written as a fold
> avgs = foldr sum_count (0,0)
Which works just fine. But seeing the tuples, I thought of the comments from Joao and doserj, and thought of (&&&).
> import Control.Arrow
> sum_count s = (+s).fst &&& succ.snd
And byorgey commented that the .fst and .snd pair can be abstracted away using (***). So I tried
> sum_count s = (+s) *** succ
and was really rather surprised that it worked! Admittedly, the definition is now something that scares me and whose intent is far less clear than the first naive definition above:
> avg2 = uncurry (/) . foldr (\s -> (+s) *** succ) (0,0)
For extra bonus obfuscation points, making the lambda expression points-free is left as an exercise to the reader ;-)
Update: mauke pointed out that my Perl avg sub didn't work... added parens to sum (@_). Looks like naughty me didn't bother testing that code...
For some time I've been wanting to implement the Enfilade, a really cool data structure discovered by the Xanadu people among others. It's like a balanced tree, except that it's indexed relatively rather than absolutely - making it ideal for things like text editor buffers, as you can insert and delete with impunity, knowing that the changes to the address of pieces to your right will percolate up the tree.
As there isn't already an implementation on Perl's CPAN I've occasionally planned to do it for fun.
Problems: 1) I'm lazy, and 2) I'm weak in Computer Science, so I don't even know how to do the balanced binary tree that I need as a basis. I looked on CPAN some time back, which has some implementations of trees, but I couldn't figure out how to get them relatively indexed, and besides which, it's always instructive to implement something for yourself.
I bought Okasaki's “Purely Functional Data Structures” some time back, in a fit of enthusiasm, read the first chapter (a very readable introduction) and been utterly terrified by the rest of it. But recently I picked it up off the shelf, read and mostly understood the chapter on Red Black trees, and had a go implementing it with Perl.
Why a purely functional version? As Okasaki says, the algorithm is simpler than the destructive one (and potentially very fast). Also I think that having a persistent, functional tree will make it trivial to implement things like “undo”.
A quick look at the algorithm suggested that I might want lazy building of the tree structure, and that reminded me of the overview in Moose's cookbook about a lazy binary tree.
Oddly, the Moose example has “parent” fields, which rather defeat the point of sharing, but then again Moose is an OO base class, not an FP language so fair enough. But I've been meaning to learn Moose for a while so this seemed like a perfect opportunity.
Five hours later (I'm a bit slow, but I suppose that's not bad as I'm grokking a new library and an algorithm at the same time) I have a working implementation.
Some notes:
- Moose's “lazy” fields are cute! Actually, in the end, I didn't use them as the algorithm itself was strict (I just created a base case using an Empty node class instead.
- “coerce” is lovely. I let the left and right branches be coercable from
- a node (oddly, you have to specify this case explicitly!)
- undef (for an empty node)
- hashref (to create a node initialised with that data)
- other (set the data field to that value)
- Perl doesn't have algebraic datatypes, so I simulated them with object classes:
Node
|-> Node::E - empty node
`-> Node::T - full node - Pattern matching allows the MLian languages to do the
Red-Black balancing in 4 lines. But they are 4 dense, ugly lines, that
check 2x2 cases manually, and from which it's very hard to understand
the intent. Okasaki's diagram was much more helpful. I contemplated
- implementing pattern matching for Nodes (or a generic pattern matching for Perl :-)
- doing a hand coded mess of if {} else {} blocks just to get it working
- Debugging an algorithm that you don't understand is hard. I scratched my head for an hour with an unpleasant bug because I'd forgotten to clone a node before balancing it. (This is part of the mismatch between FP and Perl - you can do Purely Functional in Perl, but the language doesn't enforce it in any way, so you end up maintaining a situation whereby the public API is functional, but within a subroutine you can mutate the thing you're working on. End result of this is that you have to keep track of whether you've mutated something and therefore need to clone it first. And that means you'll sometimes get it wrong...)
- I don't really understand how the Red-Black invariants are enforced by the rotations, but working through this on paper (and later with a pretty printer) you can see that it really works... wheee!
- That is to say, I can see the constraint about 2 reds in a row, as it means that you can't ever have a mismatch of depths without triggering a rebalance. But the “constant number of blacks to the leaf” is mysterious.
- I don't understand the optimizations Okasaki suggests in Exercises 2.2 and 3.10 re not doing too many comparisons, and not rebalancing uselessly. Will come back to this.
-
Okasaki suggests that the algorithm is very fast, even without those
optimizations... but my Mooseperl version was not... Of course I'm
doing “Object Functional” but I had the feeling that Moose was giving a
fair amount of overhead.
This isn't a criticism of Moose of course - it's designed for OO systems, and there, the amount of work that Moose does makes your code powerful and readable, and some Very Clever People in the Perl world are developing it, using it, and making it fast.
But an FP algorithm uses many short lived structures, which may make this overhead untenable. Certainly my test script was noticeably slow, and profiling may have pointed to Moose as the culprit.
- I say may but Devel::DProf is a buggy piece of shit and suggests that my script completed in negative time, which is not confidence-inspiring.
Actually the other profiling modules (like DProfLB - where LB stands for “Less Bad” seem to do the same thing) Pah.
-
Moose is beautiful, powerful, and well documented. It seems to be
robust, and gives a lot of the goodness of Perl6 and Ruby, while being
respectively not vapourware, and still Perl.
It makes certain aspects of implementing algorithms in an FP style very neat to prototype, but seems to be impractical due to the overhead on short-lived objects. I wouldn't have any reservations to using it for an OO class though!
- I rewrote the class in non-moosey Object/Functional Perl, and as expected, this runs in a more reasonable time. Still need to optimize etc., but it's faster, while some of the code is clunkier than the Moose equivalent.
You can see the Moosey and non-Moosey code in my svn repo.
