The plant health automated e-commerce data extractor (PHAEDE) automatically collects information on products traded online and performs risk identification on the collected data.
To collect information on products traded online, first navigate to the crawler project and then execute the crawl command:
cd PHWebcrawler
scrapy crawl AlibabaCrawler
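The two commands above can also be launched from a script, which avoids changing the working directory by hand. The following is a minimal sketch, not part of the project: the helper names `build_crawl_command` and `run_crawl` and the output file `items.jl` are hypothetical, and it assumes `scrapy` is installed and on the `PATH`. Whether the spider already stores its results elsewhere is not stated here, so the `-o` feed export is optional.

```python
import subprocess
from pathlib import Path


def build_crawl_command(spider="AlibabaCrawler", output=None):
    """Build the argument list for `scrapy crawl <spider>`.

    `output` is a hypothetical feed file; when given, Scrapy's `-o` option
    appends the scraped items to it.
    """
    cmd = ["scrapy", "crawl", spider]
    if output is not None:
        cmd += ["-o", output]
    return cmd


def run_crawl(project_dir="PHWebcrawler", spider="AlibabaCrawler", output=None):
    """Run the crawl from inside the crawler project directory."""
    project = Path(project_dir)
    if not project.is_dir():
        raise FileNotFoundError(f"crawler project not found: {project}")
    # cwd= plays the role of `cd PHWebcrawler`; check=True raises
    # CalledProcessError if scrapy exits with a non-zero status.
    return subprocess.run(build_crawl_command(spider, output), cwd=project, check=True)
```

Calling `run_crawl()` from the repository root is equivalent to the two commands above; `run_crawl(output="items.jl")` would additionally write the items to a JSON Lines file.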
main.R: The entry point of this project, where the necessary packages are loaded and the relevant scripts are sourced.
utils.R: A few general utility functions.
import_data.R: The code for importing the collected data to the environment.
remove_duplicates.R: The code for removing duplicate instances from the originally collected data.
preprocessing.R: A few data pre-processing steps.
data_profiling.R and visualization.R: Some simple data profiling and visualization based on the categorical and non-textual data.
text_mining.R: Some basic analysis based on the textual data.
fasttext.R and fasttext2.R: The fastText model that learns the word and document vectors from the textual data (using two different API functions).
cart_category.R: A CART decision tree applied on the product categories.
c50.R, cart.R, kmeans.R, knn.R, naive_bayes.R, random_forest.R, svm.R, and xgboost.R: The supervised and unsupervised models applied on the learned document vectors.
roc.R: The ROC curves for the trained models.
phaede/server.R and phaede/ui.R: The server side functions and the user interface definition of the shiny app.