Wiki2
MapReduce

Map Reduce Study in Haskell

I was preparing for a course in Big Data where the first session was related to Map Reduce methods.

After reading the famous Google paper (below). I had to make an example in Haskell.

It is not parallel in any way, but helps me understand the principles. The mapReduce function could be modified to allow parallel execution.


module Main (main) where

import Data.List

type Name = String      -- named types
type Document = String
type AWord = String
type Count = Int

-- Return word frequency in a single document
myMap :: (Name, Document) -> [(AWord, Count)]
myMap (n, d) = map (\g -> (head g, length g)) $ group $ sort $ words d

-- Summarize word count of a word from a set of documents
myReduce :: (AWord, [Count]) -> (AWord, Count)
myReduce (w, cs) = (w, sum cs)

-- Map reduce function with implicit reference to map/reduce functions
mapReduce :: [(Name, Document)] -> [(AWord, Count)]
mapReduce ds = map myReduce itermediate where
  itermediate = map (\g -> (fst $ head g, map snd g))
              $ groupBy equalKey $ sortBy compareKey $ concat $ map myMap ds

-- Helper: Determine if a key/value tuple has equal keys
equalKey :: (Eq k) => (k, v) -> (k, v) -> Bool
equalKey a b = (fst a) == (fst b)

-- Helper: Determine if a key/value tuple has equal keys
compareKey :: (Ord k) => (k, v) -> (k, v) -> Ordering
compareKey a b = (fst a) `compare` (fst b)

docs = "Provide the best development platform possible. Provide full source access to developers and users, including the ability to look at CVS tree changes directly. Users can even look at our source tree and changes directly on the web!"
     : "Integrate good code from any source with acceptable licenses. ISC or Berkeley style licences are preferred, the GPL is not acceptable when adding new code, NDAs are never acceptable. We want to make available source code that anyone can use for ANY PURPOSE, with no restrictions. We strive to make our software robust and secure, and encourage companies to use whichever pieces they want to. There are commercial spin-offs of OpenBSD."
     : "Pay attention to security problems and fix them before anyone else does. (Try to be the #1 most secure operating system.)"
     : "Greater integration of cryptographic software. OpenBSD is developed and released from Canada and due to Canadian law it is legal to export crypto to the world (as researched by a Canadian individual and as documented in the Export Control list of Canada)."
     : "Track and implement standards (ANSI, POSIX, parts of X/Open, etc.)"
     : "Work towards a very machine independent source tree. Support as many different systems and hardware as feasible."
     : []

files = map (("doc" ++) . show) [1..]

main :: IO ()
main = putStrLn $ concat $ map (\(w,c) -> w ++ " : " ++ show c ++ "\n")
     $ mapReduce (zip files docs)