Assignment 6

khider · Apr 1, 2020 · 0b11fd6 · 0b11fd6
1 parent 76089d6
commit 0b11fd6
Showing 13 changed files with 11,428 additions and 0 deletions.
diff --git a/...essing/.ipynb_checkpoints/10_Parallel Processing of Data Using MapReduce-checkpoint.ipynb b/...essing/.ipynb_checkpoints/10_Parallel Processing of Data Using MapReduce-checkpoint.ipynb
@@ -0,0 +1,181 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Parallel Processing of Data Using MapReduce\n",
+    "This notebook will enable you to understand how to analyze data in parallel using the map and reduce functions of MapReduce.\n",
+    "\n",
+    "Please note that the map function used in this notebook is Python's built-in `map`, which applies the function to each chunk one after another; it is not a distributed map. A real MapReduce framework like Hadoop or Spark requires additional configuration and is normally not applied to data this small. You may therefore find that the runtime does not vary much between the different parallel processing notebooks."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import time\n",
+    "from functools import reduce\n",
+    "import sys\n",
+    "import math\n",
+    "\n",
+    "def breakDoc(text,nToBreakInto):\n",
+    "    textList=[]\n",
+    "    fLength = len(text)\n",
+    "    nLinesInEach = int(math.ceil(float(fLength)/nToBreakInto))\n",
+    "    for i in range(nToBreakInto):\n",
+    "        startIndex=i*nLinesInEach\n",
+    "        endIndex=(i+1)*nLinesInEach\n",
+    "        if endIndex<=fLength-1:\n",
+    "            textList.append(text[startIndex:endIndex])\n",
+    "        else:\n",
+    "            textList.append(text[startIndex:])\n",
+    "    return textList\n",
+    "\n",
+    "def loadText():\n",
+    "    textList=[]\n",
+    "    condition=True\n",
+    "    while condition:\n",
+    "        text=input('Please Enter the Text You Want to Encipher: ')\n",
+    "        if text=='stop':\n",
+    "            condition=False\n",
+    "        else:\n",
+    "            textList.append(text)\n",
+    "    return textList\n",
+    "\n",
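+    "def cipher(text,key):\n",
+    "    # Caesar cipher: shift each ASCII letter by key; leave everything else alone.\n",
+    "    import string\n",
+    "    stri=\"\"\n",
+    "    for ch in text:\n",
+    "        if ch not in string.ascii_letters:\n",
+    "            # Digits, punctuation and whitespace pass through unchanged.\n",
+    "            stri+=ch\n",
+    "        else:\n",
+    "            # Shift relative to 'A' or 'a' and wrap modulo 26, so each letter stays\n",
+    "            # in its own case: 'Z' shifted by 7 becomes 'G'.\n",
+    "            base = ord('A') if ch.isupper() else ord('a')\n",
+    "            offset = (ord(ch) - base + key) % 26\n",
+    "            stri+=chr(base + offset)\n",
+    "    return stri\n",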
+    "\n",
+    "def CCMapReduce(text,key,nToBreakInto):\n",
+    "    #starttime = datetime.datetime.now()\n",
+    "    start = time.process_time()\n",
+    "    textList=breakDoc(text,nToBreakInto)\n",
+    "    encodedList=list(map(cipher,textList,[key]*len(textList)))\n",
+    "    encodedText=reduce(lambda x,y:x+y,encodedList)\n",
+    "    #endtime = datetime.datetime.now()\n",
+    "    #print \"Runtime: \",(endtime - starttime).seconds,\"seconds\"\n",
+    "    stop=time.process_time()\n",
+    "    print(\"Runtime: \",(stop-start),\"seconds\")\n",
+    "    return encodedText\n",
+    "\n",
+    "def loadDocument():\n",
+    "    filename=input('Please Enter the Name of the File You Want to Encipher: ')\n",
+    "    with open(filename) as f:\n",
+    "        text=f.read()\n",
+    "    return text"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Encrypt one document with MapReduce\n",
+    "The cell below breaks a document into several chunks, encrypts each chunk separately, and joins the results into one document. It uses the divide-and-conquer strategy: splitting the data, processing the pieces, and joining the results. When you run the cell, it prints the runtime of the function.\n",
+    "\n",
+    "Please use the text file called \"merge.txt\". It includes three novels, _Pride and Prejudice_, _Jane Eyre_ and _Crime and Punishment_."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "text=loadDocument()\n",
+    "nToBreakInto=int(input(\"Please Enter the Number of Chunks: \"))\n",
+    "key=int(input(\"Please Enter Shift Key: \"))\n",
+    "encodedText=CCMapReduce(text,key,nToBreakInto)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**Print the encrypted document**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(encodedText)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Copy and paste the two cells above and vary the value for the shift key and the number of pieces in which to divide the dataset.\n",
+    "\n",
+    "**Question**: How does the run time vary with different values of the shift key? You need to keep the number of pieces constant to answer this question.  \n",
+    "\n",
+    "**Question**: How does the run time vary with different values for the number of pieces? You need to keep the shift key constant to answer this question.\n",
+    "\n",
+    "**Question**: What is the speedup for a shift key of 5 and 3 pieces? Show the equation you use to calculate the speedup.\n",
+    "\n",
+    "**Question**: For similar values for the number of chunks and shift keys, how does the run time using MapReduce compare to the run time from the Parallel Processing Notebook?\n",
+    "\n",
+    "**Note**: You may reuse the copied and pasted cells to rerun the experiment (you only need to copy and paste once).\n",
+    "\n",
+    "**Question**: Discuss whether encrypting files is an embarrassingly parallel problem, and explain why or why not."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Parallelism and Critical Paths\n",
+    "\n",
+    "a. Describe a problem where a MapReduce approach would make processing more efficient.\n",
+    "\n",
+    "b. Describe a problem where parallel processing would only help in some steps."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "anaconda-cloud": {},
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
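
The split, map, and reduce steps in the notebook above can be sketched with the map step made swappable, so the same pipeline runs either with the built-in `map` or with a process pool. This is a minimal sketch under stated assumptions, not the notebook's own code: `caesar_shift`, `split_text`, `map_reduce_cipher`, `parallel_cipher`, and `speedup` are hypothetical names introduced for illustration, and `multiprocessing.Pool` stands in for a real MapReduce framework. The `speedup` helper uses the conventional definition, serial time divided by parallel time.

```python
import math
from functools import partial, reduce
from multiprocessing import Pool


def caesar_shift(chunk, key):
    """Shift ASCII letters by key, wrapping within each case."""
    out = []
    for ch in chunk:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + key) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + key) % 26 + ord("A")))
        else:
            out.append(ch)  # non-letters pass through unchanged
    return "".join(out)


def split_text(text, n_chunks):
    """Split text into at most n_chunks consecutive pieces of near-equal length.

    Assumes n_chunks is a positive integer.
    """
    size = max(1, math.ceil(len(text) / n_chunks))
    return [text[i:i + size] for i in range(0, len(text), size)]


def map_reduce_cipher(text, key, n_chunks, map_fn=map):
    """Map caesar_shift over the chunks with map_fn, then reduce by concatenation."""
    chunks = split_text(text, n_chunks)
    encoded = map_fn(partial(caesar_shift, key=key), chunks)
    return reduce(lambda left, right: left + right, encoded, "")


def parallel_cipher(text, key, n_chunks, processes=None):
    """Same pipeline, but the map step runs in worker processes."""
    with Pool(processes=processes) as pool:
        return map_reduce_cipher(text, key, n_chunks, map_fn=pool.map)


def speedup(serial_seconds, parallel_seconds):
    """Speedup S = T_serial / T_parallel; S > 1 means the parallel run was faster."""
    return serial_seconds / parallel_seconds
```

When run as a script, `parallel_cipher(text, 5, 3)` should be called under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. For small inputs the cost of starting processes and copying chunks can outweigh the gain, so a speedup below 1 would not be surprising.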
diff --git a/...rallelProcessing/.ipynb_checkpoints/10_Processing Datasets Independently-checkpoint.ipynb b/...rallelProcessing/.ipynb_checkpoints/10_Processing Datasets Independently-checkpoint.ipynb
@@ -0,0 +1,148 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Processing Datasets Independently\n",
+    "This notebook will enable you to understand how to analyze data in separate files independently."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import time\n",
+    "\n",
+    "def loadText():\n",
+    "    textList=[]\n",
+    "    condition=True\n",
+    "    while condition:\n",
+    "        text=input('Please Enter the Text You Want to Encipher: ')\n",
+    "        if text=='stop':\n",
+    "            condition=False\n",
+    "        else:\n",
+    "            textList.append(text)\n",
+    "    return textList\n",
+    "\n",
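+    "def cipher(text,key):\n",
+    "    # Caesar cipher: shift each ASCII letter by key; leave everything else alone.\n",
+    "    import string\n",
+    "    stri=\"\"\n",
+    "    for ch in text:\n",
+    "        if ch not in string.ascii_letters:\n",
+    "            # Digits, punctuation and whitespace pass through unchanged.\n",
+    "            stri+=ch\n",
+    "        else:\n",
+    "            # Shift relative to 'A' or 'a' and wrap modulo 26, so each letter stays\n",
+    "            # in its own case: 'Z' shifted by 7 becomes 'G'.\n",
+    "            base = ord('A') if ch.isupper() else ord('a')\n",
+    "            offset = (ord(ch) - base + key) % 26\n",
+    "            stri+=chr(base + offset)\n",
+    "    return stri\n",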
+    "\n",
+    "def CCIndependent(files,key):\n",
+    "    start = time.process_time()\n",
+    "    encodedList=[]\n",
+    "    for text in files:\n",
+    "        encodedList.append(cipher(text,key))\n",
+    "    stop=time.process_time()\n",
+    "    print(\"Runtime: \",(stop-start),\"seconds\")\n",
+    "    return encodedList\n",
+    "    \n",
+    "def loadDocuments():\n",
+    "    textList=[]\n",
+    "    condition= True\n",
+    "    while condition:\n",
+    "        filename=input('Please Enter the Name of a File to Encipher (or \"stop\" to finish): ')\n",
+    "        if filename=='stop':\n",
+    "            condition=False\n",
+    "        else:\n",
+    "            with open(filename) as f:\n",
+    "                text=f.read()\n",
+    "                textList.append(text)\n",
+    "    return textList"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Encrypt multiple documents\n",
+    "The cell below encrypts one or more documents, one at a time. Enter \"stop\" when you have chosen all the documents that you want to encrypt. When you run the cell, it prints the runtime of the function.\n",
+    "\n",
+    "Please use all three of the text files provided. They are _Pride and Prejudice_, _Jane Eyre_ and _Crime and Punishment_.\n",
+    "\n",
+    "Repeat three times (copy and paste the two cells below three times) with shift keys of 1, 4, and 10."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "textList=loadDocuments()\n",
+    "key=int(input(\"Please Enter Shift Key: \"))\n",
+    "encodedList=CCIndependent(textList,key)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**Print the encrypted document**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "for i in encodedList:\n",
+    "    print(i)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": true
+   },
+   "source": [
+    "**Question**: How does the run time change with different values of the shift key?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "anaconda-cloud": {},
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
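
The shift-key comparison asked for in the notebook above can be sketched as a small timing loop. This is an illustrative sketch, not the notebook's code: `caesar_shift` and `time_keys` are hypothetical names, and the short repeated strings are stand-ins for the three novels. It uses `time.perf_counter()` (elapsed time, including time spent waiting) rather than the notebook's `time.process_time()` (CPU time of the current process only), and keeps the best of several repeats to damp noise; both choices are assumptions about what makes a fair comparison here.

```python
import time


def caesar_shift(text, key):
    """Shift ASCII letters by key, wrapping within each case."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + key) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + key) % 26 + ord("A")))
        else:
            out.append(ch)  # non-letters pass through unchanged
    return "".join(out)


def time_keys(texts, keys, repeats=3):
    """Return {key: best elapsed seconds} for enciphering every text in turn."""
    timings = {}
    for key in keys:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            for text in texts:
                caesar_shift(text, key)
            best = min(best, time.perf_counter() - start)
        timings[key] = best
    return timings


# Stand-in documents; in the notebook these would be the three novels.
documents = [
    "Pride and Prejudice " * 200,
    "Jane Eyre " * 200,
    "Crime and Punishment " * 200,
]
for key, seconds in time_keys(documents, keys=(1, 4, 10)).items():
    print(f"key={key:>2}: {seconds:.6f} s")
```

Because every letter costs the same amount of work whatever the key, the three timings would be expected to differ only by measurement noise.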