Sunday, 29 September 2013

Is there an optimization similar to loop unrolling for functional programming?

Disclaimer: I know little about the GHC compilation pipeline, but I hope to
learn more about it with this post; for example, whether comparing
imperative vs functional code is even relevant at the compilation level.
As you know, loop unrolling reduces the number of iterations of a loop
by duplicating the code inside it. This improves performance since it
reduces the number of jumps (and the penalties associated with them) and,
as far as I recall, creates bigger blocks of code, leaving room for better
register-renaming optimization.
I was wondering: could there be an equivalent of loop unrolling for
functional programming? Could we 'unroll' a function, i.e. open up and
expand its definition, to reduce the number of calls to that function
and/or create bigger chunks of code, leaving room for further code-rewrite
optimizations (like register renaming, or some FP equivalent)?
Something that would 'unroll' or 'expand' a function definition, using for
example function evaluation (maybe mixed with some tactic), as a
trade-off between space and time.
An example of what I had in mind:
map1 _ [] = []
map1 f (x:xs) = (f x): map1 f xs
Would unroll to
map2 _ [] = []
map2 f (x:x1:xs) = (f x):(f x1):map2 f xs
map2 f (x:xs) = (f x): map2 f xs
Once more:
map4 _ [] = []
map4 f (x:x1:x2:x3:xs) = (f x):(f x1):(f x2):(f x3):map4 f xs
map4 f (x:x1:x2:xs) = (f x):(f x1):(f x2):map4 f xs
map4 f (x:x1:xs) = (f x):(f x1):map4 f xs
map4 f (x:xs) = (f x): map4 f xs
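Before benchmarking, it's worth a sanity check that the unrolled versions agree with the original on every input length, including the leftover cases at the end. A quick standalone check (repeating the definitions from above so the snippet compiles on its own):

```haskell
-- Sanity check: map2 and map4 must agree with Prelude's map for every
-- list length, including the "leftover" lengths that hit the fallback
-- equations.
map2, map4 :: (a -> b) -> [a] -> [b]
map2 _ [] = []
map2 f (x:x1:xs) = f x : f x1 : map2 f xs
map2 f (x:xs)    = f x : map2 f xs

map4 _ [] = []
map4 f (x:x1:x2:x3:xs) = f x : f x1 : f x2 : f x3 : map4 f xs
map4 f (x:x1:x2:xs)    = f x : f x1 : f x2 : map4 f xs
map4 f (x:x1:xs)       = f x : f x1 : map4 f xs
map4 f (x:xs)          = f x : map4 f xs

main :: IO ()
main = print $ and
  [ map2 (+1) xs == map (+1) xs && map4 (+1) xs == map (+1) xs
  | n <- [0 .. 10], let xs = [1 .. n] :: [Int]
  ]
-- prints True
```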
Two things are at play here: the multiple cases of map4 (and the
consequent pattern tests on the list) could degrade performance, while the
reduced number of calls to map4 could improve it. Maybe this could also
reduce some of the constant overhead introduced by lazy evaluation?
Well, that doesn't seem too hard to write a test for, so after setting up
criterion to try this out, this is what I got:
Problem size   map        map2       map4
5*10^6         105.4 ms   93.34 ms   89.79 ms
1*10^7         216.3 ms   186.8 ms   180.1 ms
5*10^7         1050 ms    913.7 ms   899.8 ms
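The actual criterion setup isn't shown above; as a rough, base-only stand-in, a harness along these lines reproduces the comparison (the `timeIt` helper is mine, and absolute timings will of course vary by machine and GHC flags):

```haskell
import System.CPUTime (getCPUTime)
import Text.Printf (printf)

map1, map4 :: (a -> b) -> [a] -> [b]
map1 _ [] = []
map1 f (x:xs) = f x : map1 f xs

map4 _ [] = []
map4 f (x:x1:x2:x3:xs) = f x : f x1 : f x2 : f x3 : map4 f xs
map4 f (x:x1:x2:xs)    = f x : f x1 : f x2 : map4 f xs
map4 f (x:x1:xs)       = f x : f x1 : map4 f xs
map4 f (x:xs)          = f x : map4 f xs

-- Time the full evaluation of an Int. Summing the mapped list forces
-- every element, so laziness doesn't hide the work being measured.
-- getCPUTime reports picoseconds, hence the 1e9 divisor for ms.
timeIt :: String -> Int -> IO ()
timeIt label value = do
  start <- getCPUTime
  value `seq` return ()
  end <- getCPUTime
  printf "%-5s %8.1f ms\n" label
    (fromIntegral (end - start) / 1e9 :: Double)

main :: IO ()
main = do
  let n = 5 * 10 ^ (6 :: Int) :: Int
  timeIt "map1" (sum (map1 (+1) [1 .. n]))
  timeIt "map4" (sum (map4 (+1) [1 .. n]))
```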
Well, it seems that unrolling had some effect^1! map4 appears to be up to
about 16% faster.
Time for the questions, then:
Has this been discussed before? Is something like this already implemented?
Is it really the reduced number of calls to map4 that improves speed?
Can this be automated?
Could evaluation proceed in chunks? I.e., if (f x) is fully evaluated,
fully evaluate everything up to (f x3).
Are there other forms in which this sort of unrolling comes into play?
How much could this inflate function size?
Are there any shortcomings that make this a bad idea?
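On the "is something like this already implemented / can it be automated" front, one half-step GHC does support today is rewrite rules: a RULES pragma can tell the optimizer to replace calls to the plain version with the unrolled one (GHC's own Prelude map uses the same machinery for foldr/build fusion). A sketch, assuming the rule is meant to fire with -O; the NOINLINE keeps map1 visible as a call so the rule can match:

```haskell
map1, map2 :: (a -> b) -> [a] -> [b]
map1 _ [] = []
map1 f (x:xs) = f x : map1 f xs
{-# NOINLINE map1 #-}  -- keep map1 as a named call so the rule can match

map2 _ [] = []
map2 f (x:x1:xs) = f x : f x1 : map2 f xs
map2 f (x:xs)    = f x : map2 f xs

-- With -O, GHC rewrites saturated calls of map1 into map2. The rule is
-- only sound because the two functions are extensionally equal.
{-# RULES "map1->map2/unroll" forall f xs. map1 f xs = map2 f xs #-}

main :: IO ()
main = print (map1 (+1) [1, 2, 3 :: Int] == map2 (+1) [1, 2, 3])
```

This is still manual, though: someone has to write the unrolled body and the rule, so it answers "can GHC apply it?" rather than "can GHC derive it?".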
1: I've also unrolled fib, since this sort of optimization would also
apply there, but the performance gain there is just cheating a (very) bad
algorithm.
2: I'm a Haskell intermediate.
