MapReduce in C#

In: C#|Functional Programming

25 Jun 2011

MapReduce is one of “those” buzz words that is going around at the moment. Mostly in part due to Google using it so successfully for their distributed indexing algorithms.

So what is MapReduce?

According to Wikipedia [http://en.wikipedia.org/wiki/MapReduce]

So MapReduce is an amalgamation of two higher order functions taken from functional programming.

Map and Reduce :. MapReduce.

Google took the concepts of Map and Reduce and designed a distributed computing framework around those two concepts. As it’s almost infinitely horizontally scalable, it lends itself to distributed computing quite easily.

So lets break up MapReduce into its 2 main components.

Map

Map is a higher order function which takes a function and a list, and applies that function to each element of that list and returns the resultant list.

Given the function x = x + 2, and a list of integers 1 -> 3. Map will return the following list

 
      :              :
    /   \          /  \
  1      :       3     :
        /  \          /  \
      2     :       4     :
           /  \          /  \
         3     []      5    []

Returns: 3, 4, 5

Map in C#

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
static void Main(string[] args)
{
    var testList = new List<int> { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
    var mapList = Map<int, int>(x => x + 2, testList);
 
    mapList.ToList<int>().ForEach(i => Console.Write(i + " "));
    Console.WriteLine();
    Console.ReadKey();
}
 
static IEnumerable<TResult> Map<T, TResult>(Func<T,TResult> func, IEnumerable<T> list)
{     
    foreach (var i in list)
        yield return func(i);
}

Returns: 3, 4, 5, 6, 7, 8, 9, 10, 11, 12

Reduce

Reduce takes a list of elements and returns a function that is applied to all of the elements of that list.

Given the function (x, y) => x + y, and a list of elements 1 to 3, Reduce will return an aggregate of the following

 
      :           
    /   \         
  1      :      f(0,1) = 1   
        /  \          \      
      2     :      f(1,2) = 3
           /  \          \
         3     []      f(3,3) = 6

Returns: 6

Reduce in C#

1
2
3
4
5
6
7
8
9
10
11
12
13
14
static void Main(string[] args)
{
    var testList = new List<int> { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 }; 
    Console.WriteLine(Reduce<int, int>((x, y) => x + y, testList, 0));
    Console.ReadKey();
}
 
static T Reduce<T, U>(Func<U, T, T> func, IEnumerable<U> list, T acc)
{
    foreach (var i in list)
        acc = func(i, acc);
 
    return acc;
}

Returns: 55

So how does google apply Map and Reduce to a distributed framework?

Google uses Map and Reduce as higher order abstractions which allow them to hide the parallelization to lower lying functions.

For instance by breaking up a larger list into smaller lists, google can distribute those smaller lists to individual servers to run the map function. Each server can then forward their resultant list to a 2nd set servers running the Reduce function (which may sum up the results of the intermediate list). A final server could then create a list from each of the results from the intermediate servers and then reduce that again giving a final result).

    
Tier 1:  Map(L1), Map(L2), Map(L3), Map(L4), Map(L5)
Tier 2:  Reduce(R1), Reduce(R2), Reduce(R3), Reduce(R4), Reduce(R5)
Tier 3:  Reduce(new list {R1, R2, R3, R4, R5})  = N

L1 -> L5 are lists of words
R1 -> R5 are the intermediate count of instances of a word
Final Reduce takes a list of all of the intermediate results and then returns the sum

Linq equivalents of Map and Reduce

If you’re lucky enough to have linq then you don’t need to write your own map and reduce functions. C# 3.5 and Linq already has it albeit under different names.

Map = Select | Enumerable.Range(1, 10).Select(x => x + 2);
Reduce = Aggregate | Enumerable.Range(1, 10).Aggregate(0, (acc, x) => acc + x);
Filter = Where | Enumerable.Range(1, 10).Where(x => x % 2 == 0);

Further Reading

http://ayende.com/blog/4435/map-reduce-a-visual-explanation
http://labs.google.com/papers/mapreduce-osdi04.pdf

1 Response to MapReduce in C#

Avatar

amok

May 29th, 2012 at 1:19 am

Wonderfully explained!

Comment Form

About Justin

justin

Justin is a Senior Software Engineer for 3csSoftware Brisbane. A Polyglot Developer proficient in multiple programming languages including [C#, C/C++, Java, Android, Ruby..]. He's currently taking an active interest in Teaching Kids to Code, Functional Programming, Robotics, 3D Printers, RC Quad-Copters and Augmented Reality.

About This Blog

Software Engineering is an art form, a tricky art form that takes as much raw talent as it does technical know how. I'll be posting articles on professional tips and tricks, dos and donts, and tutorials.

profile for Justin Shield on Stack Exchange, a network of free, community-driven Q&A sites

Photostream

  • Justin: Hi Kyle, You absolutely can, you configure the permissions on the root node for the domain, and onl [...]
  • Kyle k: Justin - Can 2 sites be run off of 1 installation and be kept completely separate? i.e. so the peopl [...]
  • Justin: Hi Ramkumar, You might want to check out this website, it'll go through the process of editing yo [...]
  • ramkumar: Hi Justin, i need explanation to create domain in my local machine. I am new to this Umbraco and [...]
  • Muzikayise (zulucoda): Great post, thank you for sharing. I think another way you could look at it is: 1) You can use P [...]