This repository has been archived by the owner on Sep 11, 2023. It is now read-only.

Merge pull request #41 from qiyangqd/qiyangnew
Addressing feedback; edited code; redid plots and final reports
xiaoyuanf authored Mar 24, 2020
2 parents 6416b82 + 4147df3 commit 401215d
Showing 23 changed files with 293 additions and 206 deletions.
8 changes: 4 additions & 4 deletions Makefile
@@ -16,14 +16,14 @@ data/cleaned_data.csv : data/raw_data.csv scripts/data_wrangle.R

# EDA
images/corr.png images/facted_hist.png images/heatmap.png images/season_PM2.5.png images/year_PM2.5.png : data/cleaned_data.csv data/raw_data.csv scripts/eda.R
-Rscript scripts/eda.R --raw_path="data/raw_data.csv" --clean_path="data/cleaned_data.csv" --image_path="images"
+Rscript scripts/eda.R --raw_path="data/raw_data.csv" --clean_path="data/cleaned_data.csv" --image_folder_path="images"

# Generate model
-docs/model.rds : data/cleaned_data.csv scripts/model.R
-Rscript scripts/model.R --clean_path="data/cleaned_data.csv" --model_path="docs/model.rds"
+data/model.rds : data/cleaned_data.csv scripts/model.R
+Rscript scripts/model.R --clean_path="data/cleaned_data.csv" --model_path="data/model.rds"

# Knit report
-docs/finalreport.html docs/finalreport.pdf docs/finalreport.md : images/corr.png images/facted_hist.png images/heatmap.png images/season_PM2.5.png images/year_PM2.5.png docs/finalreport.Rmd data/cleaned_data.csv data/raw_data.csv scripts/knit.R
+docs/finalreport.html docs/finalreport.pdf docs/finalreport.md : images/corr.png images/facted_hist.png images/heatmap.png images/season_PM2.5.png images/year_PM2.5.png docs/finalreport.Rmd data/cleaned_data.csv data/raw_data.csv scripts/knit.R data/model.rds
Rscript scripts/knit.R --rmd_path="docs/finalreport.Rmd"

# Clean targets
10 changes: 5 additions & 5 deletions README.md
@@ -45,19 +45,19 @@ __tests__: Tests for functions.
3. Run the following scripts (in order) with specified arguments.

 - [load_data.R](https://stat547-ubc-2019-20.github.io/group_12_qiyangqd_xiaoyuanf/scripts/load_data.R): Save the raw data as a .csv in the `data` folder from an external URL.
-`Rscript scripts/load_data.R --data_url=https://archive.ics.uci.edu/ml/machine-learning-databases/00381/PRSA_data_2010.1.1-2014.12.31.csv`
+`Rscript scripts/load_data.R --data_url="https://archive.ics.uci.edu/ml/machine-learning-databases/00381/PRSA_data_2010.1.1-2014.12.31.csv"`

 - [data_wrangle.R](https://stat547-ubc-2019-20.github.io/group_12_qiyangqd_xiaoyuanf/scripts/data_wrangle.R): Save a new cleaned dataset as a .csv in the `data` folder.
-`Rscript scripts/data_wrangle.R --raw_path=<raw_data_path> --clean_path=<clean_data_path>`
+`Rscript scripts/data_wrangle.R --raw_path="data/raw_data.csv" --clean_path="data/cleaned_data.csv"`

 - [eda.R](https://stat547-ubc-2019-20.github.io/group_12_qiyangqd_xiaoyuanf/scripts/eda.R): Run exploratory data analysis and save the plots in a user defined location.
-`Rscript scripts/eda.R --raw_path=<raw_data_path> --clean_path=<clean_data_path> --image_path=<image_path>`
+`Rscript scripts/eda.R --raw_path="data/raw_data.csv" --clean_path="data/cleaned_data.csv" --image_folder_path="images"`

 - [model.R](https://stat547-ubc-2019-20.github.io/group_12_qiyangqd_xiaoyuanf/scripts/model.R): Run a linear regression and save the model in a user defined location.
-`Rscript scripts/model.R --clean_path=<clean_data_path> --model_path=<model_path>`
+`Rscript scripts/model.R --clean_path="data/cleaned_data.csv" --model_path="data/model.rds"`

 - [knit.R](https://stat547-ubc-2019-20.github.io/group_12_qiyangqd_xiaoyuanf/scripts/knit.R): Knit the final report.
-`Rscript scripts/knit.R --rmd_path=<rmd_path>`
+`Rscript scripts/knit.R --rmd_path="docs/finalreport.Rmd"`
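
Because the five scripts must run in order, it can help to confirm afterwards that each step produced its file. The sketch below is only an illustration: `missing_outputs` is a hypothetical helper that is not part of this repository, and it assumes the default paths used in the commands above and in the Makefile targets. It prints every expected file that is absent under a given project root.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): list expected pipeline
# outputs that are missing under a project root (defaults to the current dir).
# The paths are assumptions taken from the README commands and Makefile targets.
missing_outputs() {
    root="${1:-.}"
    for f in \
        data/raw_data.csv \
        data/cleaned_data.csv \
        images/corr.png \
        images/facted_hist.png \
        images/heatmap.png \
        images/season_PM2.5.png \
        images/year_PM2.5.png \
        data/model.rds \
        docs/finalreport.html
    do
        # Print the relative path only when the file does not exist.
        [ -e "$root/$f" ] || printf '%s\n' "$f"
    done
}
```

Calling `missing_outputs .` from the repository root should print nothing once every step has completed; any path it does print points to the step that still needs to run.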

### If you are using GNU MAKE:

Empty file added corr.png
Binary file added data/model.rds
Binary file not shown.
6 changes: 3 additions & 3 deletions docs/finalreport.Rmd
@@ -49,8 +49,8 @@ Below are the variables in the dataset:
| day | Quantitative |Day of data in this row|
| hour | Quantitative |Hour of data in this row|
| `PM2.5` | Quantitative |`PM2.5` concentration (ug/m^3)|
-| DEWP | Quantitative |Dew Point (℃)|
-| TEMP | Quantitative |Temperature (℃)|
+| DEWP | Quantitative |Dew Point (°C)|
+| TEMP | Quantitative |Temperature (°C)|
| PRES | Quantitative |Pressure (hPa)|
| cbwd | Categorical |Combined wind direction|
| lws | Quantitative |Cumulated wind speed (m/s)|
@@ -111,7 +111,7 @@ The purpose of the line chart was to show the change of `PM2.5` concentration ac
## Analysis methods

```{r}
-lm <- readRDS(file=here::here("docs", "model.rds"))
+lm <- readRDS(file=here::here("data", "model.rds"))
tidy(lm)
```
```{r}
40 changes: 20 additions & 20 deletions docs/finalreport.html

Large diffs are not rendered by default.

24 changes: 12 additions & 12 deletions docs/finalreport.md
@@ -39,8 +39,8 @@ Below are the variables in the dataset:
| day | Quantitative |Day of data in this row|
| hour | Quantitative |Hour of data in this row|
| `PM2.5` | Quantitative |`PM2.5` concentration (ug/m^3)|
-| DEWP | Quantitative |Dew Point (℃)|
-| TEMP | Quantitative |Temperature (℃)|
+| DEWP | Quantitative |Dew Point (°C)|
+| TEMP | Quantitative |Temperature (°C)|
| PRES | Quantitative |Pressure (hPa)|
| cbwd | Categorical |Combined wind direction|
| lws | Quantitative |Cumulated wind speed (m/s)|
@@ -92,21 +92,21 @@ The purpose of the line chart was to show the change of `PM2.5` concentration ac


```r
-lm <- readRDS(file=here::here("docs", "model.rds"))
+lm <- readRDS(file=here::here("data", "model.rds"))
tidy(lm)
```

```
## # A tibble: 7 x 5
-## term estimate std.error statistic p.value
-## <chr> <dbl> <dbl> <dbl> <dbl>
-## 1 (Intercept) 1836. 72.9 25.2 5.25e-139
-## 2 DEWP 4.10 0.0518 79.2 0.
-## 3 TEMP -6.29 0.0675 -93.1 0.
-## 4 PRES -1.62 0.0712 -22.8 2.52e-114
-## 5 cbwdNE -28.9 1.45 -20.0 2.78e- 88
-## 6 cbwdNW -39.9 1.13 -35.1 9.70e-267
-## 7 cbwdSE 0.435 1.10 0.397 6.92e- 1
+## term estimate std.error statistic p.value
+## <chr> <dbl> <dbl> <dbl> <dbl>
+## 1 (Intercept) 1836. 72.9 25.2 5.25e-139
+## 2 DEWP 4.10 0.0518 79.2 0.
+## 3 TEMP -6.29 0.0675 -93.1 0.
+## 4 PRES -1.62 0.0712 -22.8 2.52e-114
+## 5 cbwdNortheast -28.9 1.45 -20.0 2.78e- 88
+## 6 cbwdNorthwest -39.9 1.13 -35.1 9.70e-267
+## 7 cbwdSoutheast 0.435 1.10 0.397 6.92e- 1
```

```r
Binary file modified docs/finalreport.pdf
Binary file not shown.
163 changes: 96 additions & 67 deletions docs/finalreport.tex
@@ -1,47 +1,34 @@
-% Options for packages loaded elsewhere
-\PassOptionsToPackage{unicode}{hyperref}
-\PassOptionsToPackage{hyphens}{url}
-%
-\documentclass[
-]{article}
+\documentclass[]{article}
\usepackage{lmodern}
\usepackage{amssymb,amsmath}
\usepackage{ifxetex,ifluatex}
+\usepackage{fixltx2e} % provides \textsubscript
\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
-\usepackage{textcomp} % provide euro and other symbols
-\else % if luatex or xetex
-\usepackage{unicode-math}
-\defaultfontfeatures{Scale=MatchLowercase}
-\defaultfontfeatures[\rmfamily]{Ligatures=TeX,Scale=1}
+\else % if luatex or xelatex
+\ifxetex
+\usepackage{mathspec}
+\else
+\usepackage{fontspec}
+\fi
+\defaultfontfeatures{Ligatures=TeX,Scale=MatchLowercase}
\fi
-% Use upquote if available, for straight quotes in verbatim environments
+% use upquote if available, for straight quotes in verbatim environments
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
-\IfFileExists{microtype.sty}{% use microtype if available
-\usepackage[]{microtype}
-\UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
+% use microtype if available
+\IfFileExists{microtype.sty}{%
+\usepackage{microtype}
+\UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
}{}
-\makeatletter
-\@ifundefined{KOMAClassName}{% if non-KOMA class
-\IfFileExists{parskip.sty}{%
-\usepackage{parskip}
-}{% else
-\setlength{\parindent}{0pt}
-\setlength{\parskip}{6pt plus 2pt minus 1pt}}
-}{% if KOMA class
-\KOMAoptions{parskip=half}}
-\makeatother
-\usepackage{xcolor}
-\IfFileExists{xurl.sty}{\usepackage{xurl}}{} % add URL line breaks if available
-\IfFileExists{bookmark.sty}{\usepackage{bookmark}}{\usepackage{hyperref}}
-\hypersetup{
-pdftitle={Final report},
-pdfauthor={Margot Chen, Qi Yang},
-hidelinks,
-pdfcreator={LaTeX via pandoc}}
-\urlstyle{same} % disable monospaced font for URLs
\usepackage[margin=1in]{geometry}
+\usepackage{hyperref}
+\hypersetup{unicode=true,
+pdftitle={Final report},
+pdfauthor={Margot Chen, Qi Yang},
+pdfborder={0 0 0},
+breaklinks=true}
+\urlstyle{same} % don't use monospace font for urls
\usepackage{color}
\usepackage{fancyvrb}
\newcommand{\VerbBar}{|}
@@ -83,14 +70,6 @@
\newcommand{\VerbatimStringTok}[1]{\textcolor[rgb]{0.31,0.60,0.02}{#1}}
\newcommand{\WarningTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}}
\usepackage{longtable,booktabs}
-% Correct order of tables after \paragraph or \subparagraph
-\usepackage{etoolbox}
-\makeatletter
-\patchcmd\longtable{\par}{\if@noskipsec\mbox{}\fi\par}{}{}
-\makeatother
-% Allow footnotes in longtable head/foot
-\IfFileExists{footnotehyper.sty}{\usepackage{footnotehyper}}{\usepackage{footnote}}
-\makesavenoteenv{longtable}
\usepackage{graphicx,grffile}
\makeatletter
\def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth\else\Gin@nat@width\fi}
@@ -100,18 +79,52 @@
% margins by default, and it is still possible to overwrite the defaults
% using explicit options in \includegraphics[width, height, ...]{}
\setkeys{Gin}{width=\maxwidth,height=\maxheight,keepaspectratio}
-% Set default figure placement to htbp
-\makeatletter
-\def\fps@figure{htbp}
-\makeatother
-\setlength{\emergencystretch}{3em} % prevent overfull lines
+\IfFileExists{parskip.sty}{%
+\usepackage{parskip}
+}{% else
+\setlength{\parindent}{0pt}
+\setlength{\parskip}{6pt plus 2pt minus 1pt}
+}
+\setlength{\emergencystretch}{3em} % prevent overfull lines
\providecommand{\tightlist}{%
\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
-\setcounter{secnumdepth}{-\maxdimen} % remove section numbering
+\setcounter{secnumdepth}{0}
+% Redefines (sub)paragraphs to behave more like sections
+\ifx\paragraph\undefined\else
+\let\oldparagraph\paragraph
+\renewcommand{\paragraph}[1]{\oldparagraph{#1}\mbox{}}
+\fi
+\ifx\subparagraph\undefined\else
+\let\oldsubparagraph\subparagraph
+\renewcommand{\subparagraph}[1]{\oldsubparagraph{#1}\mbox{}}
+\fi
+
+%%% Use protect on footnotes to avoid problems with footnotes in titles
+\let\rmarkdownfootnote\footnote%
+\def\footnote{\protect\rmarkdownfootnote}
+
+%%% Change title format to be more compact
+\usepackage{titling}
+
+% Create subtitle command for use in maketitle
+\providecommand{\subtitle}[1]{
+\posttitle{
+\begin{center}\large#1\end{center}
+}
+}
+
+\setlength{\droptitle}{-2em}
+
+\title{Final report}
+\pretitle{\vspace{\droptitle}\centering\huge}
+\posttitle{\par}
+\author{Margot Chen, Qi Yang}
+\preauthor{\centering\large\emph}
+\postauthor{\par}
+\predate{\centering\large\emph}
+\postdate{\par}
+\date{2020/3/14}

-\title{Final report}
-\author{Margot Chen, Qi Yang}
-\date{2020/3/14}

\begin{document}
\maketitle
@@ -178,8 +191,8 @@ \subsection{Data Description}\label{data-description}}
hour & Quantitative & Hour of data in this row\tabularnewline
\texttt{PM2.5} & Quantitative & \texttt{PM2.5} concentration
(ug/m\^{}3)\tabularnewline
-DEWP & Quantitative & Dew Point (℃)\tabularnewline
-TEMP & Quantitative & Temperature (℃)\tabularnewline
+DEWP & Quantitative & Dew Point (°C)\tabularnewline
+TEMP & Quantitative & Temperature (°C)\tabularnewline
PRES & Quantitative & Pressure (hPa)\tabularnewline
cbwd & Categorical & Combined wind direction\tabularnewline
lws & Quantitative & Cumulated wind speed (m/s)\tabularnewline
@@ -284,30 +297,45 @@ \subsection{Analysis methods}\label{analysis-methods}}

\begin{Shaded}
\begin{Highlighting}[]
-\NormalTok{lm <-}\StringTok{ }\KeywordTok{readRDS}\NormalTok{(}\DataTypeTok{file=}\NormalTok{here}\OperatorTok{::}\KeywordTok{here}\NormalTok{(}\StringTok{"docs"}\NormalTok{, }\StringTok{"model.rds"}\NormalTok{))}
+\NormalTok{lm <-}\StringTok{ }\KeywordTok{readRDS}\NormalTok{(}\DataTypeTok{file=}\NormalTok{here}\OperatorTok{::}\KeywordTok{here}\NormalTok{(}\StringTok{"data"}\NormalTok{, }\StringTok{"model.rds"}\NormalTok{))}
\KeywordTok{tidy}\NormalTok{(lm)}
\end{Highlighting}
\end{Shaded}

\begin{verbatim}
## # A tibble: 7 x 5
-## term estimate std.error statistic p.value
-## <chr> <dbl> <dbl> <dbl> <dbl>
-## 1 (Intercept) 1836. 72.9 25.2 5.25e-139
-## 2 DEWP 4.10 0.0518 79.2 0.
-## 3 TEMP -6.29 0.0675 -93.1 0.
-## 4 PRES -1.62 0.0712 -22.8 2.52e-114
-## 5 cbwdNE -28.9 1.45 -20.0 2.78e- 88
-## 6 cbwdNW -39.9 1.13 -35.1 9.70e-267
-## 7 cbwdSE 0.435 1.10 0.397 6.92e- 1
+## term estimate std.error statistic p.value
+## <chr> <dbl> <dbl> <dbl> <dbl>
+## 1 (Intercept) 1836. 72.9 25.2 5.25e-139
+## 2 DEWP 4.10 0.0518 79.2 0.
+## 3 TEMP -6.29 0.0675 -93.1 0.
+## 4 PRES -1.62 0.0712 -22.8 2.52e-114
+## 5 cbwdNortheast -28.9 1.45 -20.0 2.78e- 88
+## 6 cbwdNorthwest -39.9 1.13 -35.1 9.70e-267
+## 7 cbwdSoutheast 0.435 1.10 0.397 6.92e- 1
\end{verbatim}

-\hypertarget{results}{%
-\subsection{Results}\label{results}}
+\begin{Shaded}
+\begin{Highlighting}[]
+\KeywordTok{plot}\NormalTok{(lm)}
+\end{Highlighting}
+\end{Shaded}
+
+\includegraphics{finalreport_files/figure-latex/unnamed-chunk-4-1.pdf}
+\includegraphics{finalreport_files/figure-latex/unnamed-chunk-4-2.pdf}
+\includegraphics{finalreport_files/figure-latex/unnamed-chunk-4-3.pdf}
+\includegraphics{finalreport_files/figure-latex/unnamed-chunk-4-4.pdf}
+
+Judging by the plots, we can tell that this model has huge residuals and
+cannot predict pm2.5 well.
+
+\hypertarget{results-placeholder-will-be-adjusted-later}{%
+\subsection{Results (placeholder, will be adjusted
+later)}\label{results-placeholder-will-be-adjusted-later}}

In a nutshell, we find that \texttt{PM2.5} concentration is more likely
-to change with time instead of meteorological conditons, which is rather
-surprising as previous studies have shown correlation between
+to change with time instead of meteorological conditions, which is
+rather surprising as previous studies have shown correlation between
\texttt{PM2.5} and weather variables. Our first finding is that the
correlations between dew point (DEWP), temperature (TEMP) or pressure
(PRES) and \texttt{PM2.5}concentration are rather weak (Figure 1), so
@@ -392,4 +420,5 @@ \subsubsection{References}\label{references}}
weather impact, APEC and winter heating. Proceedings of the Royal
Society A, 471, 20150257.

+
\end{document}
Binary file modified docs/finalreport_files/figure-html/unnamed-chunk-4-1.png
Binary file modified docs/finalreport_files/figure-html/unnamed-chunk-4-2.png
Binary file modified docs/finalreport_files/figure-html/unnamed-chunk-4-3.png
Binary file modified docs/finalreport_files/figure-html/unnamed-chunk-4-4.png
Binary file removed docs/model.rds
Binary file not shown.
Binary file modified images/corr.png
Binary file modified images/facted_hist.png
Binary file modified images/heatmap.png
Binary file modified images/season_PM2.5.png
Binary file modified images/year_PM2.5.png
24 changes: 14 additions & 10 deletions scripts/data_wrangle.R
@@ -1,19 +1,22 @@
# author: Margot Chen
# date: 2020-03-08

-"This script cleans the raw dataset by deleting the first column and NAs of the `raw_data.csv`.
-Please save the cleaned dataset in the 'data' folder when using command line.
+"This script reads in the raw data file (in the data folder), cleans the raw dataset by deleting the first column and NAs of the `raw_data.csv`.
+It is recommended to save the cleaned data file in the data folder as well.
-Usage: data_wrangle.R --raw_path=<raw_data_path> --clean_path=<clean_data_path>
+Usage: data_wrangle.R --raw_path=<path_to_raw_data_file> --clean_path=<path_to_clean_data_file>
" -> doc

-library(tidyverse)
-library(docopt)
-library(here)
-library(glue)
+# Load packages
+# Create package list
+c <- c("tidyverse", "here", "docopt", "glue")
+# Suppress output package list
+invisible(lapply(c, require, character.only = TRUE))

# Read in command line arguments
opt <- docopt(doc)

+# Main function
main <- function(raw_path, clean_path) {

# Read in the raw data
@@ -29,7 +32,8 @@ main <- function(raw_path, clean_path) {
# Save the cleaned data
save_data(df, clean_path)

-print(glue("The cleaned dataset is saved as ", clean_path, "."))
+# Message to the user
+print(glue("The cleaned dataset is saved to ", clean_path, "."))

}

@@ -38,7 +42,7 @@ main <- function(raw_path, clean_path) {
#' @example read_data('data/raw_data.csv')
#'
#' @param clean_path indicates where the wrangled data should be saved.
-#' @example save_data("data/data_cleaned.csv")
+#' @example save_data("data/cleaned_data.csv")

# Read in the raw dataset
read_data <- function(raw_path) {
@@ -50,6 +54,6 @@ save_data <- function(df, clean_path) {
write.csv(df, here(clean_path), row.names=FALSE)
}

-# Tests
+# Call the main function
main(opt$raw_path, opt$clean_path)
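
For a quick sanity check of the cleaning step without R, the two operations described in the script's docstring (delete the first column, delete NAs) can be approximated in the shell. This is a rough sketch and not the script's implementation: `drop_first_col_and_na` is a hypothetical helper, it uses `awk` instead of tidyverse, and it assumes a plain CSV with a header row, no quoted fields or embedded commas, and that deleting NAs means discarding any row with an `NA` in a remaining column.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): drop the first column of a
# simple comma-separated file and skip every data row containing an NA field.
# Assumes a header row and no quoted fields or embedded commas.
drop_first_col_and_na() {
    awk -F',' '
        {
            # Skip data rows (never the header) with NA in any kept column.
            if (NR > 1) {
                for (i = 2; i <= NF; i++) if ($i == "NA") next
            }
            # Re-join columns 2..NF with commas and print the row.
            line = $2
            for (i = 3; i <= NF; i++) line = line "," $i
            print line
        }
    ' "$1"
}
```

Comparing the row count of `drop_first_col_and_na data/raw_data.csv` with that of `data/cleaned_data.csv` gives a rough cross-check, provided the assumptions above hold for the raw file.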
