Distributed-System-Mini-Google

Project two of the graduate Distributed Systems course at Pitt.

Amanda Crawford: acrawfor

Raphael

Project Overview
1. Map Reduce Design Considerations
Mini Google Overview
Hadoop
1. Hadoop Phases
2. Hadoop Cluster
Hadoop Communication Paradigms
1. Job Tracker - Task Tracker
2. Map to Combine
3. Combine to Partition
4. Partition to Reducer
HDFS Design Considerations
1. HDFS Operations
2. Components

Project Overview

Map Reduce Design Considerations

How do we assign work units to workers?
What if we have more work units than workers?
What if workers need to share partial results?
How do we aggregate partial results?
How do we know all the workers have finished?
What if workers fail?
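One possible way to answer the questions above is a fixed pool of workers fed from a queue of work units. The following is a minimal sketch under stated assumptions, not the project's implementation: `count_words`, `run_job`, and the retry limit are hypothetical names chosen for illustration, and threads stand in for remote workers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor, as_completed


def count_words(chunk):
    """Hypothetical work unit: count the words in one chunk of text."""
    return Counter(chunk.split())


def run_job(chunks, num_workers=2, max_retries=2, task=count_words):
    """Run `task` over every chunk with a fixed pool and merge the partial results."""
    total = Counter()
    attempts = {i: 0 for i in range(len(chunks))}
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # More work units than workers: the pool queues the extras
        # until a worker becomes free.
        pending = {pool.submit(task, chunk): i for i, chunk in enumerate(chunks)}
        while pending:
            for future in as_completed(list(pending)):
                unit = pending.pop(future)
                try:
                    total.update(future.result())  # aggregate one partial result
                except Exception:
                    # Worker failure: resubmit the unit a bounded number of times.
                    attempts[unit] += 1
                    if attempts[unit] > max_retries:
                        raise
                    pending[pool.submit(task, chunks[unit])] = unit
    # The loop exits only when no unit is pending, so every worker has finished.
    return total
```

Calling `run_job(["a b a", "b c", "a"])` merges the per-chunk counts into one `Counter`. Sharing partial results between workers is not covered here; in MapReduce that is typically handled by the shuffle step between map and reduce.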

Mini Google Overview

1. Hadoop Implementation

HDFS
MapReduce Engine
HBase / Lucene (not used in this project, but they influence the design)

2. Alternate Implementation

Distributed File System - Data Storage and Batch Processing
MapReduce - Computing
Inverted Index - Searching and Query Handling
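To illustrate the inverted index component, here is a small single-machine sketch. It assumes whitespace tokenization, lowercase terms, and AND semantics for multi-term queries; `build_inverted_index` and `search` are hypothetical helper names, not code from this repository.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return []
    postings = [set(index.get(term, ())) for term in terms]
    return sorted(set.intersection(*postings))
```

In a distributed setting the index build would likely be a MapReduce job (map emits term and document id, reduce collects the posting list), while `search` only reads the finished index.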

Hadoop

Hadoop Phases

Map
Sort / Shuffle / Aggregate
Reduce
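The three phases can be sketched in memory with the classic word-count example. This is an illustrative sketch that runs in a single process; the function names are assumptions made for the example, and a real Hadoop job would spread each phase across worker nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in the document."""
    return [(word, 1) for word in document.lower().split()]


def shuffle_phase(pairs):
    """Group values by key and return the groups in sorted key order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, then shuffle, then reduce over a list of documents."""
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle_phase(mapped))
```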

Hadoop Cluster

Master Nodes
- YARN ResourceManager
- HDFS NameNode
- Job Tracker
- Work Queue
Worker Nodes
- YARN NodeManager
- HDFS DataNode
- Task Tracker and tasks

Hadoop Communication Paradigms

Job Tracker - Task Tracker

Task Trackers
- Create and remove tasks received from the Job Tracker
- Communicate task status to the Job Tracker by sending heartbeats
Job Tracker
- Manages Task Trackers
- Schedules jobs and tracks their progress
- Receives jobs from clients
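The heartbeat exchange described above could look roughly like the sketch below. The class, its method names, and the timeout value are assumptions made for illustration (this is not Hadoop's API); the clock is injectable so the behaviour can be checked without waiting.

```python
import time


class JobTracker:
    """Toy job tracker that records heartbeats sent by task trackers."""

    def __init__(self, timeout_seconds=10.0, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock
        self.last_seen = {}    # tracker id -> time of the last heartbeat
        self.task_status = {}  # task id -> latest reported status

    def heartbeat(self, tracker_id, statuses):
        """Record that a task tracker is alive, plus the task statuses it reports."""
        self.last_seen[tracker_id] = self.clock()
        self.task_status.update(statuses)

    def dead_trackers(self):
        """Return the trackers whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(
            tracker
            for tracker, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )
```

A job tracker built this way could reschedule the tasks of any tracker returned by `dead_trackers()`, which is one way to handle worker failure.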

Map to Combine

Combine to Partition

Partition to Reducer
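As a rough illustration of the map-to-combine, combine-to-partition, and partition-to-reducer hand-offs, the sketch below pre-aggregates map output and then routes each key to a reducer. It assumes word-count style `(key, count)` pairs and a CRC32-based hash partitioner; `combine`, `partition`, and `route` are hypothetical names, not Hadoop classes.

```python
import zlib
from collections import Counter, defaultdict


def combine(pairs):
    """Pre-aggregate (key, count) pairs on the map side to shrink network traffic."""
    combined = Counter()
    for key, count in pairs:
        combined[key] += count
    return list(combined.items())


def partition(key, num_reducers):
    """Pick a reducer with a stable hash so every mapper routes a key the same way."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def route(pairs, num_reducers):
    """Bucket the combined pairs by destination reducer."""
    buckets = defaultdict(list)
    for key, count in combine(pairs):
        buckets[partition(key, num_reducers)].append((key, count))
    return dict(buckets)
```

`zlib.crc32` is used instead of the built-in `hash` because Python randomizes string hashes per process, which would send the same key to different reducers from different mappers.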

HDFS Design Considerations

Source: https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html

HDFS Operations

Read - Load data

Components

Name Node
Data Node
Task Tracker
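To make the read operation and the Name Node and Data Node roles concrete, here is a toy in-memory sketch. It assumes a tiny block size, round-robin replica placement, and dictionary-backed storage; all class and function names are hypothetical and only mimic the idea that the Name Node holds block locations while Data Nodes hold the bytes.

```python
class DataNode:
    """Stores block contents keyed by block id."""

    def __init__(self):
        self.blocks = {}

    def store(self, block_id, data):
        self.blocks[block_id] = data

    def read(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Stores metadata only: for each path, an ordered list of (block id, replica names)."""

    def __init__(self):
        self.files = {}

    def locate(self, path):
        return self.files[path]


def write_file(path, data, name_node, data_nodes, block_size=4, replication=2):
    """Split data into blocks, place replicas round-robin, and record the locations."""
    names = sorted(data_nodes)
    locations = []
    for n, start in enumerate(range(0, len(data), block_size)):
        block_id = f"{path}#{n}"
        replicas = [names[(n + r) % len(names)]
                    for r in range(min(replication, len(names)))]
        for name in replicas:
            data_nodes[name].store(block_id, data[start:start + block_size])
        locations.append((block_id, replicas))
    name_node.files[path] = locations


def read_file(path, name_node, data_nodes):
    """Ask the name node where each block lives, then read from the first live replica."""
    chunks = []
    for block_id, replicas in name_node.locate(path):
        for name in replicas:
            node = data_nodes.get(name)
            if node is not None and block_id in node.blocks:
                chunks.append(node.read(block_id))
                break
        else:
            raise IOError(f"no live replica for block {block_id}")
    return b"".join(chunks)
```

With a replication factor of 2, removing one Data Node from `data_nodes` still leaves a readable copy of every block.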

Sources: Map Reduce & Hadoop

Hadoop YARN

Hadoop HDFS

[Hadoop HDFS](https://www.tutorialspoint.com/hadoop/hadoop_hdfs_overview.htm)

Searching HDFS

HBase

[MapReduce](https://www.guru99.com/introduction-to-mapreduce.html)

[MapReduce](https://hci.stanford.edu/courses/cs448g/a2/files/map_reduce_tutorial.pdf)

[MapReduce](https://www.tutorialspoint.com/hadoop/hadoop_mapreduce.htm) (thorough MapReduce walkthrough)

Apache Map Reduce Partition Tools

RPYC

Multiprocessing

Inverted Index

raphaelcfernandes/Distributed-System-Mini-Google
