Built a simple TF-IDF (Term Frequency-Inverse Document Frequency) based search engine for searching a small subset of Wikipedia data, running on a 3-node Spark cluster on top of HDFS, hosted on AWS, with a web UI built in Django.
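
This README does not include the project's Spark code. The block below is therefore only a minimal, single-machine sketch of TF-IDF ranking in plain Python, not the project's implementation. It assumes lowercase alphanumeric tokenization, term frequency normalized by document length, and tf-idf(t, d) = (count(t, d) / |d|) * log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing t. The function names `tokenize`, `build_tfidf` and `search` and the three toy documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase the text and keep runs of letters/digits as terms.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_tfidf(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Return {doc_id: {term: tf-idf weight}} for a small in-memory corpus."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    weights: dict[str, dict[str, float]] = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        weights[doc_id] = {
            # Length-normalized term frequency times inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return weights


def search(query: str, weights: dict[str, dict[str, float]],
           top_k: int = 3) -> list[tuple[str, float]]:
    """Rank documents by the summed tf-idf weights of the query terms."""
    terms = tokenize(query)
    scores = {
        doc_id: sum(doc_weights.get(term, 0.0) for term in terms)
        for doc_id, doc_weights in weights.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # Drop documents that share no informative term with the query.
    return [(doc_id, score) for doc_id, score in ranked[:top_k] if score > 0]


# Hypothetical toy corpus standing in for Wikipedia articles.
docs = {
    "Apache_Spark": "Apache Spark is a cluster computing engine",
    "HDFS": "HDFS is the Hadoop distributed file system",
    "Django": "Django is a Python web framework",
}
weights = build_tfidf(docs)
print(search("spark cluster", weights))
```

Note that a term appearing in every document (here "is") gets weight log(N / N) = 0, so it does not affect ranking. On a Spark cluster, the document-frequency count is the step that would typically be distributed, for example as a map over documents emitting each distinct term followed by a reduce by term.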

therajmaurya/Wikipedia-Search-Engine

Wikipedia-Search-Engine

Project Links:

Pre-Requisites:

STEPS:

(Activate your virtual environment and clone this repository into the present working directory.)

STEP-1: Start the Django built-in development server:

python manage.py runserver 127.0.0.1:8000

STEP-2: Open a browser and go to http://127.0.0.1:8000/bigdatajob to interact with the search engine. (Log in with the credentials testuser / test1234.)

Thank you!!
