As my first .NET/C# program, I am going to follow the tutorial that comes with the HDInsight Server installation. The diagram below, titled “Visual Objective”, depicts our approach; I really liked how clearly it shows the MapReduce model.
Step 1:
Save the log file to be analyzed locally, then push (load) it into HDFS. Here are the commands.
Step 2:
Create a new Visual Studio 2012 project (Class Library) and add the “Microsoft.net Map Reduce API for Hadoop” NuGet package. Don’t forget to add “using Microsoft.Hadoop.MapReduce;” at the top of your code file.
Step 3:
Create Map and Reduce classes that extend the MapperBase and ReducerCombinerBase classes, respectively. Here is what the skeleton looks like.
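To show what the Map and Reduce methods actually compute for log analysis, here is a library-free sketch in plain C#, so the logic can be tested without a cluster. It assumes a hypothetical log format in which each line carries one level token such as INFO, WARN or ERROR, and it counts lines per level. The class name LogLevelMapReduce and the log format are my own assumptions, not part of the tutorial. In the real project these bodies would go inside the MapperBase and ReducerCombinerBase overrides and emit through the context object the SDK supplies; as far as I know, keys and values are passed as strings there, which is why the count travels as "1" (check the package's own signatures).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, library-free version of the mapper and reducer logic.
// Assumed log format: a line contains one level token, e.g.
// "2012-10-12 10:01:02 ERROR disk full".
public static class LogLevelMapReduce
{
    // Level tokens the assumed log format uses.
    private static readonly HashSet<string> Levels = new HashSet<string>(
        new[] { "TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL" });

    // Map: one input line in, zero or one (level, "1") pair out.
    public static IEnumerable<KeyValuePair<string, string>> Map(string inputLine)
    {
        string[] tokens = inputLine.Split(
            new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string token in tokens)
        {
            if (Levels.Contains(token))
            {
                yield return new KeyValuePair<string, string>(token, "1");
                yield break; // count at most one level per line
            }
        }
    }

    // Reduce: one key plus all of its string values in, one summed pair out.
    public static KeyValuePair<string, string> Reduce(string key, IEnumerable<string> values)
    {
        int total = values.Sum(v => int.Parse(v));
        return new KeyValuePair<string, string>(key, total.ToString());
    }
}
```

For example, Map("2012-10-12 10:01:02 ERROR disk full") yields the single pair ("ERROR", "1"), and Reduce("ERROR", new[] { "1", "1", "1" }) returns ("ERROR", "3").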
We also need a job definition that can be submitted as a Hadoop job.
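A job definition essentially tells Hadoop which mapper and reducer to run over which input; Hadoop itself then sorts and groups the mapper output by key (the shuffle) before calling the reducer. To make that flow visible on a single machine, here is a hypothetical local runner. LocalJob is my own name, not an SDK type, and this is a sketch of the data flow, not how the SDK submits jobs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical single-process stand-in for what a submitted job wires together:
// input lines -> map -> group values by key (shuffle) -> sort keys -> reduce.
public sealed class LocalJob
{
    private readonly Func<string, IEnumerable<KeyValuePair<string, string>>> map;
    private readonly Func<string, IEnumerable<string>, KeyValuePair<string, string>> reduce;

    public LocalJob(
        Func<string, IEnumerable<KeyValuePair<string, string>>> map,
        Func<string, IEnumerable<string>, KeyValuePair<string, string>> reduce)
    {
        this.map = map;
        this.reduce = reduce;
    }

    // Runs the job over in-memory input and returns "key<TAB>value" lines,
    // ordered by key (ordinal comparison).
    public IList<string> Run(IEnumerable<string> inputLines)
    {
        return inputLines
            .SelectMany(map)                                      // map phase
            .GroupBy(pair => pair.Key, pair => pair.Value)        // shuffle: collect values per key
            .OrderBy(group => group.Key, StringComparer.Ordinal)  // sort by key
            .Select(group => reduce(group.Key, group))            // reduce phase
            .Select(result => result.Key + "\t" + result.Value)
            .ToList();
    }
}
```

For example, with a mapper that emits (word, "1") for each space-separated word and a reducer that counts its values, Run(new[] { "b a", "a" }) returns "a\t2" followed by "b\t1". Any map and reduce functions with these shapes can be plugged in.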
Step 4:
Build the project to produce Lesson1.dll (named after my project). Copy this DLL into the MRLib subfolder under the bin\Debug directory (I am not sure whether this copy will be needed in the future, or whether I can simply pass the path to the DLL). In a command prompt, go to the MRLib directory and run the command shown in the image below.
Hurray!!! It made me happy to see the output :-) My first Hadoop MapReduce program ran successfully. Over the next few days, I will also convert some existing Java examples to .NET before taking on a real-life problem. Stay tuned!!