-
Notifications
You must be signed in to change notification settings - Fork 1
/
RUNNING.txt
16 lines (10 loc) · 926 Bytes
/
RUNNING.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
RUNNING.txt
-----------
1] First run the data_acquisition.py script to fetch latest news headlines and metadata and store it to Cassandra table.
Arguments to the program are keyspace and table. Currently database stored on vshukla keyspace.
Use command: python3 data_acquisition.py vshukla newsdata
2] Run newsTopicModels.py which contains the main code for text pre-processing and topic modelling with LDA. It exports two tables in CSV format.
Use command: spark-submit --packages anguenot:pyspark-cassandra:0.6.0 newsTopicModels.py
3] Run the post-processing script tablesToFeedToTableau.py to make the LDA exported csv files into Tableau data source friendly tables.
4] For Tableau visualisations refer the Tableau Public link https://public.tableau.com/views/TextAnalysisBidDataProject/START?:embed=y&:display_count=yes&publish=yes
GitLab code repo: https://csil-git1.cs.surrey.sfu.ca/ksanduja/BD-NewsAnalysts.git