This repository has been archived by the owner on Sep 11, 2023. It is now read-only.

Commit

Permalink
Merge pull request #8 from qiyangqd/three_folders_required
Edit the .rmd file and knitted
xiaoyuanf authored Mar 1, 2020
2 parents 1daab49 + e198de4 commit 178d6b0
Showing 4 changed files with 9 additions and 9 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -5,7 +5,7 @@ This is the project repository for Group 12 in STAT547M in University of British
Margot Chen, Qi Yang

## Links to milestones
-The links below will update through the course. A release will be created when a milestone is completed.
+The links below will update through the course. A release will be tagged when a milestone is completed.

__Milestone 1:__ The HTML version of the project proposal can be found [here](https://stat547-ubc-2019-20.github.io/group_12_qiyangqd_xiaoyuanf/docs/milestone1.html)
__Milestone 2:__
4 changes: 2 additions & 2 deletions docs/milestone1.Rmd
@@ -24,7 +24,7 @@ Beijing, the capital city of China, is fighting against `PM2.5` pollution in rec
Previous studies showed that __meteorological conditions__, such as wind and humidity, could contribute to the formation of `PM2.5`. Therefore, we speculate that there could be correlations between Beijing’s `PM2.5` concentration and the meteorological conditions in a sufficient period of time. If so, knowing the meteorological conditions can support the assessment and even prediction of air quality in Beijing.

### Data Description
-The [dataset](https://archive.ics.uci.edu/ml/datasets/Beijing+PM2.5+Data#) used in our project was obtained from University of California Irvine Machine learning Repository. It was originally uploaded by Songxi Chen in Peking University, China. It is an hourly dataset containing 1) the `PM2.5` of US Embassy in Beijing and 2) __meteorological statistics__ from Beijing Capital International Airport. The data was collected from Jan 1st, 2010 to Dec 31st, 2014. The original purpose of the dataset was to assess the effect of Chinese government’s pollution reduction plan which started from 2012. The dataset can be downloaded [here](https://archive.ics.uci.edu/ml/machine-learning-databases/00381/PRSA_data_2010.1.1-2014.12.31.csv).
+The [dataset](https://archive.ics.uci.edu/ml/datasets/Beijing+PM2.5+Data#) used in our project was obtained from University of California Irvine Machine learning Repository. It was originally uploaded by Songxi Chen in Peking University, China. This is an hourly dataset containing 1) the `PM2.5` of US Embassy in Beijing and 2) __meteorological statistics__ from Beijing Capital International Airport. The data was collected from Jan 1st, 2010 to Dec 31st, 2014. The original purpose of the dataset was to assess the effect of Chinese government’s pollution reduction plan which started from 2012. The dataset can be downloaded [here](https://archive.ics.uci.edu/ml/machine-learning-databases/00381/PRSA_data_2010.1.1-2014.12.31.csv).

Below are the variables in the dataset:

@@ -50,7 +50,7 @@ df<-read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/00381/PR
```

```{r}
-sum(is.na(df$PM2.5))/length(df$PM2.5)
+sum(is.na(df$pm2.5))/length(df$pm2.5)
```

So there are 4.73% missing values in the `PM2.5` variable, which shows the data quality is reasonably good. We generated a new dataset for some plots by omitting the missing values.
6 changes: 3 additions & 3 deletions docs/milestone1.html
@@ -391,7 +391,7 @@ <h3>Introduction</h3>
</div>
<div id="data-description" class="section level3">
<h3>Data Description</h3>
-<p>The <a href="https://archive.ics.uci.edu/ml/datasets/Beijing+PM2.5+Data#">dataset</a> used in our project was obtained from University of California Irvine Machine learning Repository. It was originally uploaded by Songxi Chen in Peking University, China. It is an hourly dataset containing 1) the <code>PM2.5</code> of US Embassy in Beijing and 2) <strong>meteorological statistics</strong> from Beijing Capital International Airport. The data was collected from Jan 1st, 2010 to Dec 31st, 2014. The original purpose of the dataset was to assess the effect of Chinese government’s pollution reduction plan which started from 2012. The dataset can be downloaded <a href="https://archive.ics.uci.edu/ml/machine-learning-databases/00381/PRSA_data_2010.1.1-2014.12.31.csv">here</a>.</p>
+<p>The <a href="https://archive.ics.uci.edu/ml/datasets/Beijing+PM2.5+Data#">dataset</a> used in our project was obtained from University of California Irvine Machine learning Repository. It was originally uploaded by Songxi Chen in Peking University, China. This is an hourly dataset containing 1) the <code>PM2.5</code> of US Embassy in Beijing and 2) <strong>meteorological statistics</strong> from Beijing Capital International Airport. The data was collected from Jan 1st, 2010 to Dec 31st, 2014. The original purpose of the dataset was to assess the effect of Chinese government’s pollution reduction plan which started from 2012. The dataset can be downloaded <a href="https://archive.ics.uci.edu/ml/machine-learning-databases/00381/PRSA_data_2010.1.1-2014.12.31.csv">here</a>.</p>
<p>Below are the variables in the dataset:</p>
<table>
<thead>
@@ -468,8 +468,8 @@ <h3>Data Description</h3>
<div id="dataset-loading" class="section level3">
<h3>Dataset loading</h3>
<pre class="r"><code>df&lt;-read.csv(&quot;https://archive.ics.uci.edu/ml/machine-learning-databases/00381/PRSA_data_2010.1.1-2014.12.31.csv&quot;)</code></pre>
-<pre class="r"><code>sum(is.na(df$PM2.5))/length(df$PM2.5)</code></pre>
-<pre><code>## [1] NaN</code></pre>
+<pre class="r"><code>sum(is.na(df$pm2.5))/length(df$pm2.5)</code></pre>
+<pre><code>## [1] 0.04716594</code></pre>
<p>So there are 4.73% missing values in the <code>PM2.5</code> variable, which shows the data quality is reasonably good. We generated a new dataset for some plots by omitting the missing values.</p>
<pre class="r"><code>df_clean&lt;- na.omit(df)</code></pre>
</div>
6 changes: 3 additions & 3 deletions docs/milestone1.md
@@ -17,7 +17,7 @@ Beijing, the capital city of China, is fighting against `PM2.5` pollution in rec
Previous studies showed that __meteorological conditions__, such as wind and humidity, could contribute to the formation of `PM2.5`. Therefore, we speculate that there could be correlations between Beijing’s `PM2.5` concentration and the meteorological conditions in a sufficient period of time. If so, knowing the meteorological conditions can support the assessment and even prediction of air quality in Beijing.

### Data Description
-The [dataset](https://archive.ics.uci.edu/ml/datasets/Beijing+PM2.5+Data#) used in our project was obtained from University of California Irvine Machine learning Repository. It was originally uploaded by Songxi Chen in Peking University, China. It is an hourly dataset containing 1) the `PM2.5` of US Embassy in Beijing and 2) __meteorological statistics__ from Beijing Capital International Airport. The data was collected from Jan 1st, 2010 to Dec 31st, 2014. The original purpose of the dataset was to assess the effect of Chinese government’s pollution reduction plan which started from 2012. The dataset can be downloaded [here](https://archive.ics.uci.edu/ml/machine-learning-databases/00381/PRSA_data_2010.1.1-2014.12.31.csv).
+The [dataset](https://archive.ics.uci.edu/ml/datasets/Beijing+PM2.5+Data#) used in our project was obtained from University of California Irvine Machine learning Repository. It was originally uploaded by Songxi Chen in Peking University, China. This is an hourly dataset containing 1) the `PM2.5` of US Embassy in Beijing and 2) __meteorological statistics__ from Beijing Capital International Airport. The data was collected from Jan 1st, 2010 to Dec 31st, 2014. The original purpose of the dataset was to assess the effect of Chinese government’s pollution reduction plan which started from 2012. The dataset can be downloaded [here](https://archive.ics.uci.edu/ml/machine-learning-databases/00381/PRSA_data_2010.1.1-2014.12.31.csv).

Below are the variables in the dataset:

@@ -45,11 +45,11 @@ df<-read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/00381/PR


```r
-sum(is.na(df$PM2.5))/length(df$PM2.5)
+sum(is.na(df$pm2.5))/length(df$pm2.5)
```

```
-## [1] NaN
+## [1] 0.04716594
```

So there are 4.73% missing values in the `PM2.5` variable, which shows the data quality is reasonably good. We generated a new dataset for some plots by omitting the missing values.

0 comments on commit 178d6b0
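
Note on the `PM2.5` to `pm2.5` change: the knitted output went from `NaN` to `0.04716594` because R column names are case-sensitive. The sketch below is a minimal illustration of that behaviour, not the project's code. The toy data frame and the names `wrong`, `nan_ratio`, and `na_ratio` are made up for this example; only the `pm2.5` column name comes from the diff.

```r
# Toy data frame standing in for the real dataset (values are invented).
df <- data.frame(pm2.5 = c(129, NA, 148, 159), TEMP = c(-4, -5, -5, -6))

# `$` is case-sensitive: there is no column called PM2.5, so this is NULL.
wrong <- df$PM2.5
print(is.null(wrong))

# is.na(NULL) is logical(0) and length(NULL) is 0, so the ratio is 0 / 0.
nan_ratio <- sum(is.na(wrong)) / length(wrong)
print(nan_ratio)  # NaN

# With the correct column name the share of missing values is well defined.
na_ratio <- sum(is.na(df$pm2.5)) / length(df$pm2.5)
print(na_ratio)   # 0.25 for this toy data (1 missing value out of 4)
```

`mean(is.na(df$pm2.5))` should give the same share in one call. Separately, the corrected output `0.04716594` rounds to about 4.72%, so the 4.73% quoted in the unchanged prose line may be worth rechecking.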
