12-03-2012, 02:44 PM
Distributed Computing Seminar
[attachment=18215]
Motivating Concepts
Performing computation on a graph data structure requires processing at each node
Each node contains node-specific data as well as links (edges) to other nodes
Computation must traverse the graph and perform the computation step
How do we traverse a graph in MapReduce? How do we represent the graph for this?
Breadth-First Search
Breadth-First Search is an iterated algorithm over graphs
Frontier advances from origin by one level with each pass
Breadth-First Search & MapReduce
Problem: This doesn't “fit” into MapReduce
Solution: Iterated passes through MapReduce – map some nodes, result includes additional nodes which are fed into successive MapReduce passes
Graph Representations
The most straightforward representation of graphs uses references from each node to its neighbors
Finding the Shortest Path: Intuition
We can define the solution to this problem inductively:
DistanceTo(startNode) = 0
For all nodes n directly reachable from startNode, DistanceTo(n) = 1
For all nodes n reachable from some other set of nodes S,
DistanceTo(n) = 1 + min(DistanceTo(m), m S)