Haskell 'words' and Perl 'split'

Comments

I suspect that part of the reason that there aren't more complicated splitting functions in prelude is that there are some excellent parsers built out of parsec that can handle such work. That bears at least some mention in a discussion like this.

parsec is cool. on the other hand, the ghc library source is very readable. if you want something that's not available in quite the form you want, you can usually start there, i.e. take the words function,

import Data.Char (isSpace)

myWords :: String -> [String]
myWords s = case dropWhile isSpace s of
  "" -> []
  s' -> w : myWords s''
    where (w, s'') = break isSpace s'

and go for a custom predicate, to get your split function:

-- p marks the delimiter elements; runs of delimiters are collapsed, as in words
split :: (a -> Bool) -> [a] -> [[a]]
split p l = case dropWhile p l of
  [] -> []
  s' -> w : split p s''
    where (w, s'') = break p s'

(note how it's now polymorphic)
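
One thing to watch: because it is modelled on words, the split above collapses runs of delimiters, so "a,,b" yields two fields. Perl's split keeps the empty field in the middle. Here is a rough sketch of a variant that keeps empty fields. The name splitKeepEmpty is made up for illustration, and unlike Perl's default it also keeps trailing empty fields:

```haskell
-- Hypothetical variant (name invented here): keeps empty fields
-- between consecutive delimiters instead of collapsing them.
splitKeepEmpty :: (a -> Bool) -> [a] -> [[a]]
splitKeepEmpty p xs = case break p xs of
  (w, [])       -> [w]                        -- no delimiter left: last field
  (w, _ : rest) -> w : splitKeepEmpty p rest  -- drop exactly one delimiter
```

With this, splitKeepEmpty (== ',') "a,,b" gives ["a","","b"], and an empty input gives a single empty field.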
I took a minute and hammered out a version using unfoldr which might be helpful to you:

import qualified Data.List as List

splitBy :: Char -> String -> [String]
splitBy token string = List.unfoldr gen string
  where
    gen "" = Nothing
    gen s  = Just (takeWhile (/= token) s, dropWhile (== token) $ dropWhile (/= token) s)

You could of course change it to take a user-provided test, negating it for the dropWhile (== token) portion, and then define splitBy in terms of that version. Either way, you're limited to single-character splitting with this algorithm.
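
A minimal sketch of that predicate version, assuming the test marks delimiter elements. The names splitWith and isDelim are invented for illustration:

```haskell
import Data.List (unfoldr)

-- Hypothetical predicate-based variant of the unfoldr approach.
-- isDelim returns True for elements that separate fields.
splitWith :: (a -> Bool) -> [a] -> [[a]]
splitWith isDelim = unfoldr gen
  where
    gen [] = Nothing
    gen s  = Just (field, dropWhile isDelim rest)  -- skip the whole delimiter run
      where (field, rest) = break isDelim s        -- field ends at first delimiter
```

The token version then falls out as splitWith (== token). Using break instead of a takeWhile/dropWhile pair walks each field once rather than twice.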

If you wanted to be able to break by strings or regular expressions, your gen function would wind up getting more complicated, but it could still be done with unfoldr.
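
For the string case, something along these lines might work. It is only a sketch: the names splitOnStr and breakOn are invented, and an empty delimiter is treated as "don't split" so that the unfold always makes progress:

```haskell
import Data.List (isPrefixOf, unfoldr)

-- Hypothetical multi-element delimiter split built on unfoldr.
splitOnStr :: Eq a => [a] -> [a] -> [[a]]
splitOnStr delim input
  | null delim = [input]            -- assumption: empty delimiter means no split
  | otherwise  = unfoldr gen input
  where
    gen [] = Nothing
    gen s  = Just (breakOn s)

    -- Scan forward until the remaining input starts with the delimiter,
    -- returning the field and the input after one delimiter occurrence.
    breakOn [] = ([], [])
    breakOn rest@(c : cs)
      | delim `isPrefixOf` rest = ([], drop (length delim) rest)
      | otherwise               = (c : w, r)
      where (w, r) = breakOn cs
```

Note that this one removes a single delimiter occurrence per step, so back-to-back delimiters produce an empty field rather than being collapsed as in the single-character version.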
Text.Regex.splitRegex
