coder, gamer, parent
MapReduce is one of “those” buzz words that is going around at the moment. Mostly in part due to Google using it so successfully for their distributed indexing algorithms.
So what is MapReduce?
According to Wikipedia [http://en.wikipedia.org/wiki/MapReduce]
So MapReduce is an amalgamation of two higher order functions taken from functional programming.
Map and Reduce :. MapReduce.
Google took the concepts of Map and Reduce and designed a distributed computing framework around those two concepts. As it’s almost infinitely horizontally scalable, it lends itself to distributed computing quite easily.
So lets break up MapReduce into its 2 main components.
Map
Map is a higher order function which takes a function and a list, and applies that function to each element of that list and returns the resultant list.
Given the function x = x + 2, and a list of integers 1 -> 3. Map will return the following list
: : / \ / \ 1 : 3 : / \ / \ 2 : 4 : / \ / \ 3 [] 5 []
Returns: 3, 4, 5
Map in C#
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | static void Main(string[] args) { var testList = new List<int> { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 }; var mapList = Map<int, int>(x => x + 2, testList); mapList.ToList<int>().ForEach(i => Console.Write(i + " ")); Console.WriteLine(); Console.ReadKey(); } static IEnumerable<TResult> Map<T, TResult>(Func<T,TResult> func, IEnumerable<T> list) { foreach (var i in list) yield return func(i); } |
Returns: 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
Reduce
Reduce takes a list of elements and returns a function that is applied to all of the elements of that list.
Given the function (x, y) => x + y, and a list of elements 1 to 3, Reduce will return an aggregate of the following
: / \ 1 : f(0,1) = 1 / \ \ 2 : f(1,2) = 3 / \ \ 3 [] f(3,3) = 6
Returns: 6
Reduce in C#
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | static void Main(string[] args) { var testList = new List<int> { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 }; Console.WriteLine(Reduce<int, int>((x, y) => x + y, testList, 0)); Console.ReadKey(); } static T Reduce<T, U>(Func<U, T, T> func, IEnumerable<U> list, T acc) { foreach (var i in list) acc = func(i, acc); return acc; } |
Returns: 55
So how does google apply Map and Reduce to a distributed framework?
Google uses Map and Reduce as higher order abstractions which allow them to hide the parallelization to lower lying functions.
For instance by breaking up a larger list into smaller lists, google can distribute those smaller lists to individual servers to run the map function. Each server can then forward their resultant list to a 2nd set servers running the Reduce function (which may sum up the results of the intermediate list). A final server could then create a list from each of the results from the intermediate servers and then reduce that again giving a final result).
Tier 1: Map(L1), Map(L2), Map(L3), Map(L4), Map(L5) Tier 2: Reduce(R1), Reduce(R2), Reduce(R3), Reduce(R4), Reduce(R5) Tier 3: Reduce(new list {R1, R2, R3, R4, R5}) = N L1 -> L5 are lists of words R1 -> R5 are the intermediate count of instances of a word Final Reduce takes a list of all of the intermediate results and then returns the sum
Linq equivalents of Map and Reduce
If you’re lucky enough to have linq then you don’t need to write your own map and reduce functions. C# 3.5 and Linq already has it albeit under different names.
Map = Select | Enumerable.Range(1, 10).Select(x => x + 2);
Reduce = Aggregate | Enumerable.Range(1, 10).Aggregate(0, (acc, x) => acc + x);
Filter = Where | Enumerable.Range(1, 10).Where(x => x % 2 == 0);
Further Reading
http://ayende.com/blog/4435/map-reduce-a-visual-explanation
http://labs.google.com/papers/mapreduce-osdi04.pdf
Justin is a Senior Software Engineer living in Brisbane. A Polyglot Developer proficient in multiple programming languages including [C#, C/C++, Java, Android, Ruby..]. He's currently taking an active interest in Teaching Kids to Code, Functional Programming, Robotics, 3D Printers, RC Quad-Copters and Augmented Reality.
Software Engineering is an art form, a tricky art form that takes as much raw talent as it does technical know how. I'll be posting articles on professional tips and tricks, dos and donts, and tutorials.
1 Response to MapReduce in C#
amok
May 29th, 2012 at 1:19 am
Wonderfully explained!