2 posts tagged “countdown”
The Countdown code I showed you isn't really taking advantage of Haskell's laziness. We should only have to check entries up until the point that we have enough matches (in the current code 'take 4 $ getAnagrams') and that's good. However, we have to generate the whole powerset first, so that we can sort it in reverse order of length. Ideally, we'd generate the powerset breadth-first, in order of length.
OK, so generating the powerset doesn't take all that much time and this isn't a good optimization as such, but I did think it might be fun trying to do the breadth first powerset, as all the examples I'd seen had been depth first.
But first: an interlude to marvel at one of the scariest things I've seen this year. A monadic definition of powerset.
import Control.Monad (filterM)
powerset = filterM (const [True, False])
I'm not even going to attempt to twist my head around that now, but it's very beautiful, though it's impossible to tell just by reading it what the intent of the code is.
I asked on #london.pm if anyone knew good breadth-first algorithms for powerset. Joel helpfully pasted the following
I think it surprised Joel (and actually, it surprised me a little) that I was more or less unable to read this at all. Yes, I know that the syntax of Lisp is incredibly simple, and you can learn all the syntax of Scheme in 20 minutes or whatever. The noise of the parenthesis, the cdrs etc. is just noise, you can filter it out if you look at it calmly. But I still don't understand where the pairs are in pair-fold-right, and cut apparently does something similar to currying, but what does that mean in context?(define (combinations set n)
(if (zero? n)
(list '())
(let ((n2 (- n 1)))
(pair-fold-right
(lambda (pr acc)
(let ((first (car pr)))
(append (map (cut cons first <>)
(combinations (cdr pr) n2))
acc)))
'()
set))))
(define (power-set set)
(let ((size (length set)))
(let loop ((i 0))
(if (> i size)
'()
(append (combinations set i)
(loop (+ i 1)))))))
To cut a long story short, I was reading this as a foreign language rather than a piece of generic pseudocode. When I was very little, I read with my mother a picture book about frogs, in German. With the help of the pictures and a little imagination, it was easy to tell which word meant "frog", which meant "tree", and what the sense of the story was. After we finished, very puffed up with just how damn clever I was, I started trying to read it again, and got utterly confused about all these strange words like 'in' and 'auf' and 'dem' that I just hadn't worried about the first time around.
So... trying to see the frog for the trees, we can see that combinations gives every set of combinations of a particular length, and that power-set merely loops through 0..length, appending the combinations found for that length.
We can write combinations as a variant on the original powerset function, but which refuses to carry on once it's got enough letters:
combinations xs n = comb' xs n (length xs)
comb' xss@(x:xs) n l | n == 0 = [[]]
| l == n = [xss]
| otherwise = (map (x:) $ comb' xs (n-1) (l-1)) ++ (comb' xs n (l-1))
comb' [] _ _ = [[]]
And powerset is easy as:
We can now remove out the lines with sortBy and filter longEnough, as the new definition already presents the items in the right order.powerset xs = powerset' xs (length xs)
powerset' xs l = if l < minLength
then []
else (combinations xs l) ++ (powerset' xs (l-1))minLength = 3
Does this make it any faster? Apparently not: as I guessed, powerset is not the hotspot. I guess that the problem is the repeated lookups in the Data.Map — any suggestions on how to profile the Haskell code, and better algorithms to deal with it?
Will on #geekup has been working on a Countdown letters and numbers game solver written in Python. I thought it'd be fun to try to do it in Haskell, and started with the letters game (anagram) solver.
Starting with a string of jumbled letters, the goal is to make the longest possible anagram. I remember the first time I tried to solve anagrams I jumped into the problem without thinking and got mixed up in all kinds of complicated combinatorial mess. The actual answer is very simple: let's take two words which are anagrams of each other:
- monad
- nomad
Both of them contain the same letters, so they are identical in some form of "canonical representation", for example
- {a:1, d:1, m:1, n:1, o:1} -- dictionary mapping letter to number of times used
- "admno" -- a string with the letters sorted
So for example:
This function is called a powerset. I'm lazy so I googled a definition. We want the longest words first. The definition of powerset I found does a depth first search so it's not in order of length. What we want to do is to work on a list like thispmqnrdzoa
...
m n d oa
...
where canonicalize is just sort . map toLower . filter isLetter.list s =
sortBy (flip $ comparing length) -- longest first
. nub -- unique entries only
. powerset -- all combinations of
. canonicalize -- canonical (sorted) string
$ s
comparing is a nice litle utility sub that makes the above effectively the same as (\a b -> length a `compare` length b). We then flip it to reverse the ordering (and this is actually a good use for flip ;-).
Ordering by length is potentially inefficient — it checks the length of each element twice, and unlike Perl (where a string knows its own length), a string is just a list, so it has to descend the list to find it out. This is easy to optimize by precalculating the lengths, using a technique that in Perl we call the "Schwartzian transform", and I'll probably come back to this.
OK, so we have a list of subsets to compare, now we need to find a dictionary of canonical representations of words. Luckily most unixy distributions ship with one, often /usr/dict/words, but Ubuntu sticks is elsewhere.
I asked on #haskell, and was told I should use a Data.Map, Haskell's basic equivalent of a hash or associative array, but implemented using a Functional Programming friendly tree representation. In actual fact, quicksilver, mrs, and mmorrow told me the answer straight away, but let's pretend for the purpose of this post that we're working it out now :-)
Assuming I load that module, as is common, as M, I'd essentially want to call M.insertWith (++) on each element. The (++) is the concatenation operator, and it's the right thing to use because the dictionary is mapping String -> [String], for example
insertWith returns a new copy of the Map each time. It's like an accumulator which gradually takes on the entries from the list of words. And whenever we think about accumulators, we can think about folds.fromList [("admno", ["nomad","monad","Damon"]),...]
mempty is shorthand here for "an empty Data.Map". But we can go one better as apparently fold/insertWith is so common that there is a shorthand, fromListWith!foldl' (\m x -> M.insertWith (++) (canonicalize x) [x] m) mempty listOfWords
Woah! That's quite compact, and I just introduced some new syntax too: The &&& is basically saying "let's make a tuple with the result of calling these 2 functions on my input!" so it's the same asfromListWith (++) . map (canonicalize &&& return)
And return just means "wrap this value in the appropriate Monad". So it's a scary way of saying [a], because we're "in" the List monad. (In the same way that mempty above was an empty Data.Map, because it was "in" the Map Monad.)fromListWith (++) . map (\a -> (canonicalize a, return a))
Whenever I play with Map, I get angry errors about the monomorphism restriction. The way around that is to add an explicit type signature. If, like me, you're not quite sure what to put there, you can add a compiler directive to quell the error, then work out what the signature would be by calling :t my_function from the GHCI command line. (You'll often find afterwards that you can remove the signatures if you wanted to, because later on the compiler has more information to work out the types of things. It's only really during incremental development that you get the problem.
{-# LANGUAGE NoMonomorphismRestriction #-}
-- (that's the compiler directive, you can comment this out later)
makeAnag s = do
d <- dict
return $ take 4 $ getAnagrams s d
dict = do file <- readFile "/etc/dictionaries-common/words"
return $ mkdict $ lines file
mkdict :: [String] -> M.Map String [String]
mkdict = M.fromListWith (++) . map (canonicalize &&& return) . filter longEnough
longEnough = (>=3) . length
As you can see, for all the perceived difficulty of doing IO in a pure language
like Haskell, it doesn't seem all that hard in this simple case. readFile
reads the file, and lines splits it into an array of lines.
The final thing is to check each powerset against the dictionary. To extract the value, we use M.lookup. This function fails if it can't find a value. So we could do
- For each powerset in the list
- Check if it's present
- And add it to the list if so
With an empty list for each failure. We can use concatMap to join these together. So it's:[ ["anagram"], [], [], ["anagram 1", "anagram 2"], [] ]
Though that actually returns:concatMap (\v -> M.lookup v dict) listOfPowersets
which I hadn't expected. (M.lookup returned a list like ["anagram 1", "anagram 2"]. Quite literally it returned it, which in List context means it actually passed [["anagram 1", "anagram 2"]], which is why the list isn't completely flattened by concatMap. I get around this by using join. This is another of those monadic functions: in List context it does exactly what we want here, flattening this list.[ ["anagram"], ["anagram 1", "anagram 2"], ]
You can look at the final Haskell Countdown code. I'll look at optimizing the sort and the powersets soon, any comments on other improvements (including better algorithms) very welcome. (Sorry, comments require Vox signup...)getAnagrams s d = join
. concatMap (flip M.lookup $ d)
$ filter longEnough -- 3 or more letters
. sortBy (flip $ comparing length) -- longest first
. nub -- unique entries only
. powerset -- all combinations of
. canonicalize -- canonical (sorted) string
$ s
