coder, gamer, parent
Learning haskell has its challenges. Most notably the terseness of the function composition, giving IMO a longer learning curve to master the syntax. This terseness is due to its roots in mathematics and academia, and so the functions read unsurprisingly like algebraic formulas.
I’ve been working my way through Project Euler to learn haskell in a more in-depth way than I would following a tutorial.
One of the biggest challenges I faced was to generate prime numbers in a reasonable response time, as can be found in Problem 3. Researching I found that prime numbers can be generated using the sieve of eratosthenes.
My first naieve approach to generating prime numbers.
1 2 3 | primes :: [Integer] primes = sieve [2..] where sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p > 0] |
The above code worked fine with the example provided by problem 3, however it took too long to evaluate. My second naive approach was to reduce the amount of checking by eliminating even numbers (no even number can be a prime).
What else can be done to speed up this function? In mathematics there’s a rule which states if a prime factor exists that is GREATER than the square root of the number, then there must be one LESS than the square root. So how does that work?
By testing all primes less than the square root of N, and found none is a factor of N. We can stop, because we know there can be no prime factor of N that is greater than the square root of N. Because, if there were, then there would also be a prime factor less than the square root of N, and we have already found out that there is none.
1 2 | primes = 2: sieve [3,5..] where sieve (p:xs) = p : sieve [x | x <- xs, (x < p*p) || (x `mod` p > 0)] |
Once again this took too long to evaluate, even with the square root short circuiting this only gave up to a 20% increase in speed.
So what can we do further?
I had to research this one, and found an answer on Haskell Wiki. We can add in a deferred filter. Note that I can’t take credit for the below code.
1 2 3 4 5 | primes :: [Integer] primes = 2: 3: sieve (tail primes) [5,7..] sieve (p:ps) xs = h ++ sieve ps [x | x <- t, rem x p /=0] where (h,~(_:t)) = span(< p*p) xs |
span – given a predicate and a list, splits the list into two lists (returned as a tuple) such that elements in the first list are taken from the head of the list while the predicate is satisfied, and elements in the second list are the remaining elements from the list once the predicate is not satisfied.
tilde ~ – the tilde forces the compiler to not evaluate the value of the 2nd argument until it is actually used.
What is occurring above is using the span function to evaluate the square roots of h, however the tilde ~ forces the compiler to perform lazy evaluation on t until its value is actually required.
Netsuperbrain.com has a very good article on lazy pattern matching.
Effectively by combining a short circuit, and lazy pattern matching. We can now generate large quantities of prime numbers very very quickly.
I thought this would be worth writing up for a couple of reasons. Trying to research what the tilde operator actually does was quite frustrating, I knew that I had to stop haskell from evaluating the 2nd argument in the case of a match to the first. In addition the sieve of eratosthenes and researching the nature of prime number factors, also combined into learning more about the problem domain and so a more efficient program by an order of magnitude of about 20x.
Let me know if I’ve gotten anything wrong?
Justin is a Senior Software Engineer living in Brisbane. A Polyglot Developer proficient in multiple programming languages including [C#, C/C++, Java, Android, Ruby..]. He's currently taking an active interest in Teaching Kids to Code, Functional Programming, Robotics, 3D Printers, RC Quad-Copters and Augmented Reality.
Software Engineering is an art form, a tricky art form that takes as much raw talent as it does technical know how. I'll be posting articles on professional tips and tricks, dos and donts, and tutorials.
3 Responses to Haskell – A better prime number list using turners sieve and lazy pattern matching
Tonja
August 30th, 2012 at 7:50 pm
Nice work dude.
GordonBGood
December 15th, 2016 at 5:39 pm
This is the “Turner Sieve” which **IS NOT** the Sieve of Eratosthenes (SoE) neither by algorithm nor by performance as explained so very well in the following article: http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf. Essentially, when an algorithm needs to use the `mod` function which implies division it is not the SoE, which only uses a data structure and repeated additions to mark composite number representations in that data structure. The Turner sieve is only a very poor version of a Trial Division sieve, although very concise.
Justin
August 22nd, 2017 at 8:56 pm
Hi GordonBGood, Thanks for taking the time in replying. You’re absolutely correct, it is turners sieve. For anyone who would like to know more, The Haskell wiki has a write up on the different implementations and variations of the Sieve of Eratosthenes including the one I’ve used here (Turners Sieve).
https://wiki.haskell.org/Prime_numbers#Turner.27s_sieve_-_Trial_division