Plant Health Automated E-commerce Data Extractor

The Plant Health Automated E-commerce Data Extractor (PHAEDE) automatically collects information on products traded online and performs risk identification on the collected data.

Data Collection

To collect information on products traded online, navigate to the crawler project directory and run the crawl command:

cd PHWebcrawler
scrapy crawl AlibabaCrawler
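The two commands above can also be wrapped in a small script, which may be convenient for scheduled runs. The following is a minimal sketch, not part of the repository: the helper names `build_crawl_command` and `run_crawl` and the optional `output_file` argument are hypothetical, and it assumes that `PHWebcrawler` follows the standard Scrapy project layout with a `scrapy.cfg` file at its root. Whether the spider already writes its own output is not stated here, so the `-o` feed export is optional.

```python
import subprocess
from pathlib import Path


def build_crawl_command(spider="AlibabaCrawler", output_file=None):
    """Return the argument list for `scrapy crawl <spider>`."""
    command = ["scrapy", "crawl", spider]
    if output_file is not None:
        # -o is Scrapy's feed-export option; the file name is an assumption.
        command += ["-o", output_file]
    return command


def run_crawl(project_dir="PHWebcrawler", spider="AlibabaCrawler", output_file=None):
    """Run the spider from inside the crawler project directory."""
    project_path = Path(project_dir)
    # Assumption: a standard Scrapy project has scrapy.cfg at its root,
    # so fail early with a clear message if it is missing.
    if not (project_path / "scrapy.cfg").is_file():
        raise FileNotFoundError(f"No scrapy.cfg found in {project_path}")
    # cwd= has the same effect as `cd PHWebcrawler` before the crawl command.
    return subprocess.run(
        build_crawl_command(spider, output_file),
        cwd=project_path,
        check=True,  # raise CalledProcessError if the crawl exits with an error
    )
```

Calling `run_crawl()` from the repository root mirrors the shell commands; `run_crawl(output_file="products.json")` would additionally export the scraped items to a file of that (hypothetical) name inside the project directory.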

List of R Code

main.R: The entry point of this project, where the necessary packages are loaded and the relevant scripts are sourced.
utils.R: A few general utility functions.
import_data.R: The code for importing the collected data into the R environment.
remove_duplicates.R: The code for removing duplicate instances from the originally collected data.
preprocessing.R: A few data pre-processing steps.
data_profiling.R and visualization.R: Simple data profiling and visualization of the categorical and other non-textual data.
text_mining.R: Some basic analysis based on the textual data.
fasttext.R and fasttext2.R: The fastText model that learns word and document vectors from the textual data (each script uses a different API function).
cart_category.R: A CART decision tree applied to the product categories.
c50.R, cart.R, kmeans.R, knn.R, naive_bayes.R, random_forest.R, svm.R, and xgboost.R: The supervised and unsupervised models applied to the learned document vectors.
roc.R: The ROC curves for the trained models.
phaede/server.R and phaede/ui.R: The server-side functions and the user interface definition of the Shiny app.

Repository: ai-cfia/PHAEDE