Merge pull request #8 from qiyangqd/three_folders_required

Edit the .rmd file and knitted
STAT547-UBC-2019-20 · Mar 1, 2020 · 178d6b0
2 parents 1daab49 + e198de4
commit 178d6b0
Showing 4 changed files with 9 additions and 9 deletions.
diff --git a/README.md b/README.md
@@ -5,7 +5,7 @@ This is the project repository for Group 12 in STAT547M in University of British
 Margot Chen, Qi Yang
 
 ## Links to milestones
-The links below will update through the course. A release will be created when a milestone is completed. 
+The links below will update through the course. A release will be tagged when a milestone is completed. 
 
 __Milestone 1:__ The HTML version of the project proposal can be found [here](https://stat547-ubc-2019-20.github.io/group_12_qiyangqd_xiaoyuanf/docs/milestone1.html)    
 __Milestone 2:__   

diff --git a/docs/milestone1.Rmd b/docs/milestone1.Rmd
@@ -24,7 +24,7 @@ Beijing, the capital city of China, is fighting against `PM2.5` pollution in rec
 Previous studies showed that __meteorological conditions__, such as wind and humidity, could contribute to the formation of `PM2.5`. Therefore, we speculate that there could be correlations between Beijing’s `PM2.5` concentration and the meteorological conditions in a sufficient period of time. If so, knowing the meteorological conditions can support the assessment and even prediction of air quality in Beijing. 
 
 ### Data Description  
-The [dataset](https://archive.ics.uci.edu/ml/datasets/Beijing+PM2.5+Data#) used in our project was obtained from University of California Irvine Machine learning Repository. It was originally uploaded by Songxi Chen in Peking University, China. It is an hourly dataset containing 1) the `PM2.5` of US Embassy in Beijing and 2) __meteorological statistics__ from Beijing Capital International Airport. The data was collected from Jan 1st, 2010 to Dec 31st, 2014. The original purpose of the dataset was to assess the effect of Chinese government’s pollution reduction plan which started from 2012. The dataset can be downloaded [here](https://archive.ics.uci.edu/ml/machine-learning-databases/00381/PRSA_data_2010.1.1-2014.12.31.csv).     
+The [dataset](https://archive.ics.uci.edu/ml/datasets/Beijing+PM2.5+Data#) used in our project was obtained from University of California Irvine Machine learning Repository. It was originally uploaded by Songxi Chen in Peking University, China. This is an hourly dataset containing 1) the `PM2.5` of US Embassy in Beijing and 2) __meteorological statistics__ from Beijing Capital International Airport. The data was collected from Jan 1st, 2010 to Dec 31st, 2014. The original purpose of the dataset was to assess the effect of Chinese government’s pollution reduction plan which started from 2012. The dataset can be downloaded [here](https://archive.ics.uci.edu/ml/machine-learning-databases/00381/PRSA_data_2010.1.1-2014.12.31.csv).     
 
 Below are the variables in the dataset:    
 
@@ -50,7 +50,7 @@ df<-read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/00381/PR
 ```
 
 ```{r}
-sum(is.na(df$PM2.5))/length(df$PM2.5)
+sum(is.na(df$pm2.5))/length(df$pm2.5)
 ```
 
 So there are 4.73% missing values in the `PM2.5` variable, which shows the data quality is reasonably good. We generated a new dataset for some plots by omitting the missing values.

diff --git a/docs/milestone1.html b/docs/milestone1.html
@@ -391,7 +391,7 @@ <h3>Introduction</h3>
 </div>
 <div id="data-description" class="section level3">
 <h3>Data Description</h3>
-<p>The <a href="https://archive.ics.uci.edu/ml/datasets/Beijing+PM2.5+Data#">dataset</a> used in our project was obtained from University of California Irvine Machine learning Repository. It was originally uploaded by Songxi Chen in Peking University, China. It is an hourly dataset containing 1) the <code>PM2.5</code> of US Embassy in Beijing and 2) <strong>meteorological statistics</strong> from Beijing Capital International Airport. The data was collected from Jan 1st, 2010 to Dec 31st, 2014. The original purpose of the dataset was to assess the effect of Chinese government’s pollution reduction plan which started from 2012. The dataset can be downloaded <a href="https://archive.ics.uci.edu/ml/machine-learning-databases/00381/PRSA_data_2010.1.1-2014.12.31.csv">here</a>.</p>
+<p>The <a href="https://archive.ics.uci.edu/ml/datasets/Beijing+PM2.5+Data#">dataset</a> used in our project was obtained from University of California Irvine Machine learning Repository. It was originally uploaded by Songxi Chen in Peking University, China. This is an hourly dataset containing 1) the <code>PM2.5</code> of US Embassy in Beijing and 2) <strong>meteorological statistics</strong> from Beijing Capital International Airport. The data was collected from Jan 1st, 2010 to Dec 31st, 2014. The original purpose of the dataset was to assess the effect of Chinese government’s pollution reduction plan which started from 2012. The dataset can be downloaded <a href="https://archive.ics.uci.edu/ml/machine-learning-databases/00381/PRSA_data_2010.1.1-2014.12.31.csv">here</a>.</p>
 <p>Below are the variables in the dataset:</p>
 <table>
 <thead>
@@ -468,8 +468,8 @@ <h3>Data Description</h3>
 <div id="dataset-loading" class="section level3">
 <h3>Dataset loading</h3>
 <pre class="r"><code>df&lt;-read.csv(&quot;https://archive.ics.uci.edu/ml/machine-learning-databases/00381/PRSA_data_2010.1.1-2014.12.31.csv&quot;)</code></pre>
-<pre class="r"><code>sum(is.na(df$PM2.5))/length(df$PM2.5)</code></pre>
-<pre><code>## [1] NaN</code></pre>
+<pre class="r"><code>sum(is.na(df$pm2.5))/length(df$pm2.5)</code></pre>
+<pre><code>## [1] 0.04716594</code></pre>
 <p>So there are 4.73% missing values in the <code>PM2.5</code> variable, which shows the data quality is reasonably good. We generated a new dataset for some plots by omitting the missing values.</p>
 <pre class="r"><code>df_clean&lt;- na.omit(df)</code></pre>
 </div>

diff --git a/docs/milestone1.md b/docs/milestone1.md
@@ -17,7 +17,7 @@ Beijing, the capital city of China, is fighting against `PM2.5` pollution in rec
 Previous studies showed that __meteorological conditions__, such as wind and humidity, could contribute to the formation of `PM2.5`. Therefore, we speculate that there could be correlations between Beijing’s `PM2.5` concentration and the meteorological conditions in a sufficient period of time. If so, knowing the meteorological conditions can support the assessment and even prediction of air quality in Beijing. 
 
 ### Data Description  
-The [dataset](https://archive.ics.uci.edu/ml/datasets/Beijing+PM2.5+Data#) used in our project was obtained from University of California Irvine Machine learning Repository. It was originally uploaded by Songxi Chen in Peking University, China. It is an hourly dataset containing 1) the `PM2.5` of US Embassy in Beijing and 2) __meteorological statistics__ from Beijing Capital International Airport. The data was collected from Jan 1st, 2010 to Dec 31st, 2014. The original purpose of the dataset was to assess the effect of Chinese government’s pollution reduction plan which started from 2012. The dataset can be downloaded [here](https://archive.ics.uci.edu/ml/machine-learning-databases/00381/PRSA_data_2010.1.1-2014.12.31.csv).     
+The [dataset](https://archive.ics.uci.edu/ml/datasets/Beijing+PM2.5+Data#) used in our project was obtained from University of California Irvine Machine learning Repository. It was originally uploaded by Songxi Chen in Peking University, China. This is an hourly dataset containing 1) the `PM2.5` of US Embassy in Beijing and 2) __meteorological statistics__ from Beijing Capital International Airport. The data was collected from Jan 1st, 2010 to Dec 31st, 2014. The original purpose of the dataset was to assess the effect of Chinese government’s pollution reduction plan which started from 2012. The dataset can be downloaded [here](https://archive.ics.uci.edu/ml/machine-learning-databases/00381/PRSA_data_2010.1.1-2014.12.31.csv).     
 
 Below are the variables in the dataset:    
 
@@ -45,11 +45,11 @@ df<-read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/00381/PR
 
 
 ```r
-sum(is.na(df$PM2.5))/length(df$PM2.5)
+sum(is.na(df$pm2.5))/length(df$pm2.5)
 ```
 
 ```
-## [1] NaN
+## [1] 0.04716594
 ```
 
 So there are 4.73% missing values in the `PM2.5` variable, which shows the data quality is reasonably good. We generated a new dataset for some plots by omitting the missing values.
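
Note on the `NaN` to `0.04716594` change in the knitted output: a minimal sketch of why the old expression likely returned `NaN`, assuming the column in the downloaded CSV is named `pm2.5` in lowercase (as the corrected code implies). The data frame `df_toy` and its values are hypothetical and stand in for the real dataset.

```r
# Toy data frame (hypothetical values) whose column is named `pm2.5`.
df_toy <- data.frame(pm2.5 = c(10, NA, 30, 40))

# `$` is case-sensitive: asking for a column that does not exist returns NULL.
missing_col <- df_toy$PM2.5

# sum(is.na(NULL)) is 0 and length(NULL) is 0, so the ratio is 0 / 0, i.e. NaN.
wrong_fraction <- sum(is.na(missing_col)) / length(missing_col)

# With the correct column name, the fraction of missing values is computed as intended.
right_fraction <- sum(is.na(df_toy$pm2.5)) / length(df_toy$pm2.5)

# mean() of a logical vector gives the same fraction more compactly.
compact_fraction <- mean(is.na(df_toy$pm2.5))

print(wrong_fraction)
print(right_fraction)
```

Because base R returns `NULL` silently for an unknown column, the typo produced no error, only the `NaN`. For reference, the corrected output `0.04716594` corresponds to roughly 4.72% missing values.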