forked from KnowledgeCaptureAndDiscovery/INF549
Commit
Showing 13 changed files with 11,428 additions and 0 deletions.
181 changes: 181 additions & 0 deletions
181
...essing/.ipynb_checkpoints/10_Parallel Processing of Data Using MapReduce-checkpoint.ipynb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,181 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Parallel Processing of Data Using MapReduce\n", | ||
"This notebook shows how to analyze data in parallel using the map and reduce functions of MapReduce.\n", | ||
"\n", | ||
"Please note that the map function used in this notebook is Python's built-in `map`, which applies the function to each chunk one after another rather than distributing the work. A real MapReduce framework such as Hadoop or Spark requires additional configuration and is normally not applied to data this small. Therefore, you might find that the runtime does not vary much between the different parallel processing notebooks." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import time\n", | ||
"from functools import reduce\n", | ||
"import math\n", | ||
"import string\n", | ||
"\n", | ||
"def breakDoc(text,nToBreakInto):\n", | ||
"    # Split text into nToBreakInto consecutive chunks; nLinesInEach is the chunk size in characters\n", | ||
"    textList=[]\n", | ||
"    fLength = len(text)\n", | ||
"    nLinesInEach = int(math.ceil(float(fLength)/nToBreakInto))\n", | ||
"    for i in range(nToBreakInto):\n", | ||
"        startIndex=i*nLinesInEach\n", | ||
"        endIndex=(i+1)*nLinesInEach\n", | ||
"        if endIndex<=fLength-1:\n", | ||
"            textList.append(text[startIndex:endIndex])\n", | ||
"        else:\n", | ||
"            textList.append(text[startIndex:])\n", | ||
"    return textList\n", | ||
"\n", | ||
"def loadText():\n", | ||
"    # Read lines typed by the user until 'stop' is entered\n", | ||
"    textList=[]\n", | ||
"    condition=True\n", | ||
"    while condition:\n", | ||
"        text=input('Please Enter the Text You Want to Encipher: ')\n", | ||
"        if text=='stop':\n", | ||
"            condition=False\n", | ||
"        else:\n", | ||
"            textList.append(text)\n", | ||
"    return textList\n", | ||
"\n", | ||
"def cipher(text,key):\n", | ||
"    # Caesar cipher: shift each ASCII letter by key positions; other characters pass through\n", | ||
"    stri=\"\"\n", | ||
"    for ch in text:\n", | ||
"        if ch not in string.ascii_letters:\n", | ||
"            stri+=ch\n", | ||
"        else:\n", | ||
"            # Wrap within the letter's own case so that, e.g., 'X' shifted by 10 gives 'H', not 'b'\n", | ||
"            base=ord('A') if ch.isupper() else ord('a')\n", | ||
"            stri+=chr(base+(ord(ch)-base+key)%26)\n", | ||
"    return stri\n", | ||
"\n", | ||
"def CCMapReduce(text,key,nToBreakInto):\n", | ||
"    # time.process_time() measures CPU time used by this process, not wall-clock time\n", | ||
"    start = time.process_time()\n", | ||
"    textList=breakDoc(text,nToBreakInto)\n", | ||
"    # Map: encipher every chunk; Reduce: concatenate the enciphered chunks in order\n", | ||
"    encodedList=list(map(cipher,textList,[key]*len(textList)))\n", | ||
"    encodedText=reduce(lambda x,y:x+y,encodedList)\n", | ||
"    stop=time.process_time()\n", | ||
"    print(\"Runtime: \",(stop-start),\"seconds\")\n", | ||
"    return encodedText\n", | ||
"\n", | ||
"def loadDocument():\n", | ||
"    filename=input('Please Enter the Name of the File You Want to Encipher: ')\n", | ||
"    with open(filename) as f:\n", | ||
"        text=f.read()\n", | ||
"    return text" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Encrypt one document with MapReduce\n", | ||
"The cell below breaks a document into several chunks, encrypts each chunk separately, and joins the results into one document. It uses the divide-and-conquer strategy: split the data, process each piece, and join the results. When run, the cell prints the runtime of the function.\n", | ||
"\n", | ||
"Please use the text file called \"merge.txt\". It includes three novels, _Pride and Prejudice_, _Jane Eyre_ and _Crime and Punishment_." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"text=loadDocument()\n", | ||
"nToBreakInto=int(input(\"Please Enter the Number of Chunks: \"))\n", | ||
"key=int(input(\"Please Enter Shift Key: \"))\n", | ||
"encodedText=CCMapReduce(text,key,nToBreakInto)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"**Print the encrypted document**" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"print(encodedText)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Copy and paste the two cells above and vary the value for the shift key and the number of pieces in which to divide the dataset.\n", | ||
"\n", | ||
"**Question**: How does the run time vary with different values of the shift key? You need to keep the number of pieces constant to answer this question.\n", | ||
"\n", | ||
"**Question**: How does the run time vary with different values for the number of pieces? You need to keep the shift key constant to answer this question.\n", | ||
"\n", | ||
"**Question**: What is the speedup for a shift key of 5 and the use of 3 pieces? Show the equation you are using to calculate the speedup.\n", | ||
"\n", | ||
"**Question**: For similar values for the number of chunks and shift keys, how does the run time using MapReduce compare to the run time from the Parallel Processing Notebook?\n", | ||
"\n", | ||
"**Note**: You may reuse the copied-and-pasted cells to rerun the experiment (only copy and paste once).\n", | ||
"\n", | ||
"**Question**: Discuss whether encrypting files is an embarrassingly parallel problem, and explain why or why not." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Parallelism and Critical Paths\n", | ||
"\n", | ||
"a. Describe a problem where a MapReduce approach would make processing more efficient.\n", | ||
"\n", | ||
"b. Describe a problem where parallel processing would only help in some steps." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [] | ||
} | ||
], | ||
"metadata": { | ||
"anaconda-cloud": {}, | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.7.3" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
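Note on the notebook above: the split, map, and reduce steps it describes can be checked on a small string. The following is a minimal sketch, not the notebook's own code: the names `shift_letters`, `split_into_chunks`, `map_reduce_cipher`, and `speedup` are hypothetical, and the timings passed to `speedup` are made-up numbers used only to illustrate the usual definition of speedup as serial run time divided by parallel run time.

```python
from functools import reduce
import math
import string


def shift_letters(text, key):
    """Caesar-shift ASCII letters by key, wrapping within each case."""
    out = []
    for ch in text:
        if ch in string.ascii_letters:
            base = ord('A') if ch.isupper() else ord('a')
            out.append(chr(base + (ord(ch) - base + key) % 26))
        else:
            out.append(ch)  # leave spaces, digits, punctuation unchanged
    return ''.join(out)


def split_into_chunks(text, n_chunks):
    """Split text into at most n_chunks consecutive pieces of nearly equal size."""
    size = max(1, math.ceil(len(text) / n_chunks))
    return [text[i:i + size] for i in range(0, len(text), size)]


def map_reduce_cipher(text, key, n_chunks):
    """Split, encipher each chunk (map), then concatenate in order (reduce)."""
    chunks = split_into_chunks(text, n_chunks)
    encoded = map(lambda chunk: shift_letters(chunk, key), chunks)
    return reduce(lambda x, y: x + y, encoded, "")


def speedup(serial_seconds, parallel_seconds):
    """Speedup = serial run time / parallel run time."""
    return serial_seconds / parallel_seconds


sample = "Pride and Prejudice, XYZ xyz!"
# Chunking must not change the result, because each character is shifted independently.
print(map_reduce_cipher(sample, 5, 3) == shift_letters(sample, 5))  # True
# Hypothetical timings: 6.0 s serial, 2.0 s with 3 pieces.
print(speedup(6.0, 2.0))  # 3.0
```

Because each character is enciphered without reference to its neighbors, the chunked result matches the unchunked one regardless of where the text is cut, which is the property the notebook's final question asks about.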
148 changes: 148 additions & 0 deletions
148
...rallelProcessing/.ipynb_checkpoints/10_Processing Datasets Independently-checkpoint.ipynb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,148 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Processing Datasets Independently\n", | ||
"This notebook will enable you to understand how to analyze data in separate files independently." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import time\n", | ||
"import string\n", | ||
"\n", | ||
"def loadText():\n", | ||
"    # Read lines typed by the user until 'stop' is entered\n", | ||
"    textList=[]\n", | ||
"    condition=True\n", | ||
"    while condition:\n", | ||
"        text=input('Please Enter the Text You Want to Encipher: ')\n", | ||
"        if text=='stop':\n", | ||
"            condition=False\n", | ||
"        else:\n", | ||
"            textList.append(text)\n", | ||
"    return textList\n", | ||
"\n", | ||
"def cipher(text,key):\n", | ||
"    # Caesar cipher: shift each ASCII letter by key positions; other characters pass through\n", | ||
"    stri=\"\"\n", | ||
"    for ch in text:\n", | ||
"        if ch not in string.ascii_letters:\n", | ||
"            stri+=ch\n", | ||
"        else:\n", | ||
"            # Wrap within the letter's own case so that, e.g., 'X' shifted by 10 gives 'H', not 'b'\n", | ||
"            base=ord('A') if ch.isupper() else ord('a')\n", | ||
"            stri+=chr(base+(ord(ch)-base+key)%26)\n", | ||
"    return stri\n", | ||
"\n", | ||
"def CCIndependent(files,key):\n", | ||
"    # Encipher each document in turn; time.process_time() measures CPU time, not wall-clock time\n", | ||
"    start = time.process_time()\n", | ||
"    encodedList=[]\n", | ||
"    for text in files:\n", | ||
"        encodedList.append(cipher(text,key))\n", | ||
"    stop=time.process_time()\n", | ||
"    print(\"Runtime: \",(stop-start),\"seconds\")\n", | ||
"    return encodedList\n", | ||
"\n", | ||
"def loadDocuments():\n", | ||
"    textList=[]\n", | ||
"    condition= True\n", | ||
"    while condition:\n", | ||
"        filename=input('Please Enter the Name of a File You Want to Encipher (or stop to finish): ')\n", | ||
"        if filename=='stop':\n", | ||
"            condition=False\n", | ||
"        else:\n", | ||
"            with open(filename) as f:\n", | ||
"                text=f.read()\n", | ||
"            textList.append(text)\n", | ||
"    return textList" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Encrypt multiple documents\n", | ||
"The cell below encrypts more than one document, one at a time. Enter \"stop\" when you have chosen all the documents that you want to encrypt. When run, the cell prints the runtime of the function.\n", | ||
"\n", | ||
"Please use all of the three text files provided. They are _Pride and Prejudice_, _Jane Eyre_ and _Crime and Punishment_.\n", | ||
"\n", | ||
"Repeat three times (copy and paste the two cells below three times) with a shift key of 1, 4, and 10." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"textList=loadDocuments()\n", | ||
"key=int(input(\"Please Enter Shift Key: \"))\n", | ||
"encodedList=CCIndependent(textList,key)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"**Print the encrypted document**" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"for i in encodedList:\n", | ||
" print(i)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"collapsed": true | ||
}, | ||
"source": [ | ||
"**Question**: How does the run time change with different values of the shift key?" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [] | ||
} | ||
], | ||
"metadata": { | ||
"anaconda-cloud": {}, | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.7.3" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
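Note on the notebook above: it enciphers the documents one after another in a loop. Since no document depends on another, the same work can be handed to a pool of workers. The sketch below is an illustration under stated assumptions, not the notebook's code: `shift_letters` and `encipher_all` are hypothetical names, and the three strings stand in for the contents of the three text files. It uses `concurrent.futures.ThreadPoolExecutor` only to show the shape of the independent tasks; in CPython, threads generally do not speed up CPU-bound work like this, and `ProcessPoolExecutor` would be the usual substitute when real parallel speedup is the goal.

```python
from concurrent.futures import ThreadPoolExecutor
import string


def shift_letters(text, key):
    """Caesar-shift ASCII letters by key, wrapping within each case."""
    out = []
    for ch in text:
        if ch in string.ascii_letters:
            base = ord('A') if ch.isupper() else ord('a')
            out.append(chr(base + (ord(ch) - base + key) % 26))
        else:
            out.append(ch)  # non-letters pass through unchanged
    return ''.join(out)


def encipher_all(texts, key, max_workers=3):
    """Encipher each text as an independent task; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs.
        return list(pool.map(lambda t: shift_letters(t, key), texts))


# Stand-ins for the contents of the three documents (assumption for illustration).
docs = ["Pride and Prejudice", "Jane Eyre", "Crime and Punishment"]
encoded = encipher_all(docs, 1)
print(encoded[1])                        # Kbof Fzsf
print(encipher_all(encoded, -1) == docs) # True: shifting back recovers the originals
```

Each task touches only its own document, so no coordination between workers is needed beyond collecting the results.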