Merge pull request #16 from fhdsl/cansavvy/add-info
GitHub Example plots and some standardization
cansavvy authored Mar 29, 2024
2 parents a14b521 + 8bb2cce commit 8436007
Showing 10 changed files with 291 additions and 71 deletions.
7 changes: 6 additions & 1 deletion _config_automation.yml
Original file line number Diff line number Diff line change
@@ -31,7 +31,12 @@ cran_googlesheet:

###### GitHub ######
refresh-github: yes
github_repos: [ fhdsl/metricminer, fhdsl/metricminer.org ]
github_repos: [
fhdsl/metricminer,
fhdsl/metricminer-dashboard,
jhudsl/OTTR_Template,
jhudsl/OTTR_Template_Website
]
github_googlesheet:

###### Google Analytics ######
16 changes: 15 additions & 1 deletion calendly.Rmd
@@ -1,6 +1,10 @@
---
title: "Calendly"
output: html_document
output:
html_document:
toc: true
toc_float: true
toc_collapsed: true
date: "`r format(Sys.time(), '%d %B, %Y')`"
---

@@ -40,3 +44,13 @@ After you've set up authorization you'll need to check the following items in th
refresh-calendly: yes
calendly_googlesheet:
```

## Customizing the data

To customize the data you are downloading from Calendly, you can modify the
`refresh-scripts/refresh-calendly.R` script in your repository.

You can take a look at the [`metricminer` R package documentation](https://hutchdatascience.org/metricminer/articles/getting-started.html) for more details about the functions and what is possible.

If you have a metric need that is not currently fulfilled by `metricminer` or `metricminer-dashboard` we encourage you to [file a GitHub issue with us and let us know about your new feature idea (or bug report)](https://github.com/fhdsl/metricminer/issues/new/choose).

44 changes: 30 additions & 14 deletions citations.Rmd
@@ -1,6 +1,10 @@
---
title: "Citations"
output: html_document
output:
html_document:
toc: true
toc_float: true
toc_collapsed: true
date: "`r format(Sys.time(), '%d %B, %Y')`"
---

@@ -19,33 +23,35 @@ citations <- readr::read_tsv(file.path("metricminer_data", "citations", "citatio
# citations <- googlesheets4::read_sheet(yaml$citations_googlesheet)
```

Here we show how to get the total counts per original paper:

```{r,message=FALSE}
library(dplyr)
# here we remove duplicates if there are any of the same titles of citing papers for each original paper and then get a count of the number of rows for each original paper (aka how many times it is cited)
citations %>%
distinct(original_paper, cite_titles, .keep_all = TRUE) %>%
count(original_paper)
```

## Data information

Column information:

- `original_paper` shows papers that we have captured citation information about
- `cite_titles` shows papers that cite the original paper
- `links` shows the link for the paper that cites the original paper (the `cite_titles` papers).
- `links` shows the link for the paper that cites the original paper (the `cite_titles` papers).

```{r, message = FALSE}
knitr::kable(citations)
```

Here we show how to get the total counts per original paper:

```{r,message=FALSE}
library(dplyr)
# here we remove duplicates if there are any of the same titles of citing papers for each original paper and then get a count of the number of rows for each original paper (aka how many times it is cited)
citations %>%
distinct(original_paper, cite_titles, .keep_all = TRUE) %>%
count(original_paper)
```
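
To illustrate what the de-duplication step above does, here is a minimal sketch on a made-up table. The paper titles and links are hypothetical, not real citation data; only the column names follow the `citations` table.

```r
library(dplyr)

# Hypothetical toy data: "Paper A" is listed twice with the same citing title
toy_citations <- tibble::tibble(
  original_paper = c("Paper A", "Paper A", "Paper A", "Paper B"),
  cite_titles    = c("Citing 1", "Citing 1", "Citing 2", "Citing 3"),
  links          = c("link1", "link1-dup", "link2", "link3")
)

# Keep one row per (original_paper, cite_titles) pair, then count rows per paper
citation_counts <- toy_citations %>%
  distinct(original_paper, cite_titles, .keep_all = TRUE) %>%
  count(original_paper)

citation_counts
```

In this toy table the repeated "Citing 1" row is counted once, so "Paper A" ends up with 2 citations and "Paper B" with 1.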

## Setting up Citations

1. Go to: https://scholar.google.com/scholar
2. Search for the paper whose citation count you want.
3. Then click the `Cited by ___` button below the title of the paper.
4. Copy and paste this in the `_config_automation.yml` file in the `citation_papers` section.
4. Copy and paste this in the `_config_automation.yml` file in the `citation_papers` section.

```
###### Citations ######
refresh-citations: yes
@@ -56,3 +62,13 @@ citation_googlesheet:
```
- [ ] In the `_config_automation.yml` file, make sure that `refresh-citations` is set to "yes".
- [ ] Optionally, if you are saving data to Google, specify a googlesheet ID in `citation_googlesheet` that you'd like the citation data to be saved to. This will only be relevant if you've set `data_dest` to `google`.


## Customizing Citation Data

In order to customize the data you are downloading from Google Scholar you can modify the
`refresh-scripts/refresh-citations.R` script in your repository.

You can take a look at the [`metricminer` R package documentation](https://hutchdatascience.org/metricminer/articles/getting-started.html) for more details about the functions and what is possible.

If you have a metric need that is not currently fulfilled by `metricminer` or `metricminer-dashboard` we encourage you to [file a GitHub issue with us and let us know about your new feature idea (or bug report)](https://github.com/fhdsl/metricminer/issues/new/choose).
43 changes: 27 additions & 16 deletions cran.Rmd
@@ -1,43 +1,45 @@
---
title: "CRAN"
output: html_document
output:
html_document:
toc: true
toc_float: true
toc_collapsed: true
date: "`r format(Sys.time(), '%d %B, %Y')`"
---

## Preview

```{r}
```{r, echo = FALSE, hide = TRUE, message = FALSE}
library(tidyverse)
```
```{r, echo = FALSE, hide = TRUE}
root_dir <- rprojroot::find_root(rprojroot::has_dir(".git"))
yaml <- yaml::read_yaml(file.path(root_dir, "_config_automation.yml"))
## For github
cran <- readr::read_tsv(file.path("metricminer_data", "cran", "cran.tsv"))
## For google
## For google
# cran <- googlesheets4::read_sheet(yaml$cran_googlesheet)
```
```{r}
cran %>% dplyr::summarize(download_total = sum(count))
```

Total CRAN downloads for all packages:

```{r}
cran %>% dplyr::group_by(package) %>%
dplyr::summarize(download_total = sum(count))
cran %>% dplyr::summarize(download_total = sum(count))
```
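
As a rough illustration of the CRAN summaries in this section, the sketch below runs on a small made-up table. The package names and counts are hypothetical; only the column names (`date`, `count`, `package`) follow the code in this file. It builds the month label with `format()` rather than `separate()`/`unite()`, which is a swapped-in alternative, not the approach used in the chunk here.

```r
library(dplyr)

# Hypothetical toy data in the same shape as the cran table
toy_cran <- tibble::tibble(
  date    = as.Date(c("2024-01-30", "2024-01-31", "2024-02-01", "2024-02-01")),
  count   = c(3, 2, 5, 1),
  package = c("pkgA", "pkgA", "pkgA", "pkgB")
)

# Total downloads across all packages
total_downloads <- toy_cran %>%
  summarize(download_total = sum(count))

# Monthly downloads per package, using a "YYYY-MM" label
monthly_downloads <- toy_cran %>%
  mutate(Month = format(date, "%Y-%m")) %>%
  group_by(Month, package) %>%
  summarize(monthly_downloads = sum(count), .groups = "drop") %>%
  filter(monthly_downloads > 0) # drop the 0's
```
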
CRAN package downloads over time, summarized by month.

cran_stats <- cran %>%
separate(date, into=c("year", "month name", "day"), sep = "-") %>%
```{r, message = FALSE}
cran_stats <- cran %>%
separate(date, into=c("year", "month name", "day"), sep = "-") %>%
unite("Month", c("year", "month name"), sep='-', remove=TRUE) %>%
group_by(Month, package) %>%
group_by(Month, package) %>%
summarise(monthly_downloads = sum(count)) %>% #summarize monthly downloads by package
filter(monthly_downloads > 0) #drop the 0's
filter(monthly_downloads > 0) #drop the 0's
ggplot(cran_stats, aes(Month, monthly_downloads, group=package, color = package)) +
geom_line() +
ggplot(cran_stats, aes(Month, monthly_downloads, group=package, color = package)) +
geom_line() +
geom_point() +
theme(panel.background = element_blank(), panel.grid = element_blank()) +
theme(axis.text.x = element_text(angle = 90)) +
@@ -60,3 +62,12 @@ cran_googlesheet:
- [ ] In the `_config_automation.yml` file, make sure that `refresh-cran` is set to "yes".
- [ ] In the `cran_packages` section of your `_config_automation.yml`, type the names of the packages that you'd like to collect data from on CRAN. Type them exactly as they are spelled (case sensitive), separated by commas. Delete the example package names we've put there.
- [ ] Optionally, if you are saving data to Google, specify a googlesheet ID in `cran_googlesheet` that you'd like the CRAN data to be saved to. This will only be relevant if you've set `data_dest` to `google`.

## Customizing CRAN Data

In order to customize the data you are downloading from CRAN you can modify the
`refresh-scripts/refresh-cran.R` script in your repository.

You can take a look at the [`metricminer` R package documentation](https://hutchdatascience.org/metricminer/articles/getting-started.html) for more details about the functions and what is possible.

If you have a metric need that is not currently fulfilled by `metricminer` or `metricminer-dashboard` we encourage you to [file a GitHub issue with us and let us know about your new feature idea (or bug report)](https://github.com/fhdsl/metricminer/issues/new/choose).
94 changes: 74 additions & 20 deletions docs/github.html
@@ -13,7 +13,7 @@

<title>GitHub</title>

<script src="site_libs/header-attrs-2.25/header-attrs.js"></script>
<script src="site_libs/header-attrs-2.26/header-attrs.js"></script>
<script src="site_libs/jquery-3.6.0/jquery-3.6.0.min.js"></script>
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link href="site_libs/bootstrap-3.3.5/css/cosmo.min.css" rel="stylesheet" />
@@ -315,25 +315,79 @@ <h4 class="date">15 March, 2024</h4>

<div id="preview" class="section level2">
<h2>Preview</h2>
<pre><code>## Rows: 2 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: &quot;\t&quot;
## chr (1): repo_name
## dbl (4): num_contributors, total_contributions, num_stars, health_percentage
## lgl (1): num_forks
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 12 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: &quot;\t&quot;
## chr (1): repo
## dbl (4): count_clones, uniques_clones, count_views, uniques_views
## date (1): timestamp
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.</code></pre>
<pre class="r"><code>knitr::kable(github)</code></pre>
<p>Contributions to a repository example:</p>
<p><img src="github_files/figure-html/unnamed-chunk-2-1.png" width="672" /></p>
<p>Views of a GitHub Repository over time</p>
<pre><code>## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).</code></pre>
<p><img src="github_files/figure-html/unnamed-chunk-3-1.png" width="672" /></p>
</div>
<div id="data-information" class="section level2">
<h2>Data information</h2>
<p>Data information for <code>github overall data</code>:</p>
<ul>
<li><code>num_forks</code> shows the number of times this repo has been
forked. NA means it has never been forked.</li>
<li><code>num_contributors</code> how many people have contributed to
this repo</li>
<li><code>num_stars</code> how many people have starred this repo?</li>
<li><code>health_percentage</code> what percentage of <a
href="https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/about-community-profiles-for-public-repositories">“good
software health” items as described by GitHub</a> does this repo
have?</li>
</ul>
<pre class="r"><code>knitr::kable(github_overall)</code></pre>
<table style="width:100%;">
<colgroup>
<col width="22%" />
<col width="10%" />
<col width="17%" />
<col width="20%" />
<col width="10%" />
<col width="18%" />
</colgroup>
<thead>
<tr class="header">
<th align="left">repo_name</th>
<th align="left">num_forks</th>
<th align="right">num_contributors</th>
<th align="right">total_contributions</th>
<th align="right">num_stars</th>
<th align="right">health_percentage</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left">fhdsl/metricminer</td>
<td align="left">NA</td>
<td align="right">4</td>
<td align="right">432</td>
<td align="right">1</td>
<td align="right">37</td>
</tr>
<tr class="even">
<td align="left">fhdsl/metricminer.org</td>
<td align="left">NA</td>
<td align="right">2</td>
<td align="right">28</td>
<td align="right">0</td>
<td align="right">37</td>
</tr>
</tbody>
</table>
<p>Data information for <code>github timecourse</code>:</p>
<ul>
<li><code>timestamp</code> shows the date the counts correspond to</li>
<li><code>count_clones</code> tells the number of clones made on this
day</li>
<li><code>uniques_clones</code> tells how many people these clones
came from</li>
<li><code>count_views</code> how many views did the repo get on this
day?</li>
<li><code>uniques_views</code> how many people were those views
from?</li>
</ul>
<pre class="r"><code>knitr::kable(github_timecourse)</code></pre>
<table>
<colgroup>
<col width="25%" />
77 changes: 68 additions & 9 deletions github.Rmd
@@ -1,27 +1,77 @@
---
title: "GitHub"
output: html_document
output:
html_document:
toc: true
toc_float: true
toc_collapsed: true
date: "`r format(Sys.time(), '%d %B, %Y')`"

---

## Preview

```{r, echo = FALSE, hide = TRUE}
```{r, echo = FALSE, hide = TRUE, message=FALSE, warning = FALSE}
library(ggplot2)
library(magrittr)
root_dir <- rprojroot::find_root(rprojroot::has_dir(".git"))
yaml <- yaml::read_yaml(file.path(root_dir, "_config_automation.yml"))
## For github
github <- readr::read_tsv(file.path("metricminer_data", "github", "github.tsv"))
github <- readr::read_tsv(file.path("metricminer_data", "github", "github_timecourse.tsv"))
github_overall <- readr::read_tsv(file.path("metricminer_data", "github", "github.tsv"))
github_timecourse <- readr::read_tsv(file.path("metricminer_data", "github", "github_timecourse.tsv"))
## For google
# github_overall <- googlesheets4::read_sheet(yaml$github_googlesheet, sheet = "overall_stats")
# github_timecourse <- googlesheets4::read_sheet(yaml$github_googlesheet, sheet = "timecourse")
```

Contributions to a repository example:

```{r echo = FALSE, hide = TRUE, message=FALSE, warning = FALSE}
github_overall %>%
ggplot(aes(x = repo_name, y = total_contributions)) +
geom_bar(stat = "identity", fill = "lavender") +
theme_classic()
```

Views of a GitHub Repository over time

```{r echo = FALSE, hide = TRUE, message=FALSE}
github_timecourse %>%
ggplot(aes(x = timestamp, y = count_views, fill = repo, color = repo)) +
geom_line(stat = "identity") +
theme_classic() +
xlab("date") # timestamp is on the x-axis, so the "date" label belongs there
```

## For google
# github <- googlesheets4::read_sheet(yaml$github_googlesheet)
## Data information

Data information for `github overall data`:

- `num_forks` shows the number of times this repo has been forked. NA means it has never been forked.
- `num_contributors` how many people have contributed to this repo
- `num_stars` how many people have starred this repo?
- `health_percentage` what percentage of ["good software health" items as described by GitHub](https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/about-community-profiles-for-public-repositories) does this repo have?

```{r, message=FALSE}
knitr::kable(github_overall)
```
```{r}
knitr::kable(github)

Data information for `github timecourse`:

- `timestamp` shows the date the counts correspond to.
- `count_clones` shows the number of clones made on this day. NAs generally indicate no one cloned the repository that day.
- `uniques_clones` shows how many people those clones came from. NAs generally indicate no one cloned the repository that day.
- `count_views` shows how many views the repo got on this day. NAs generally indicate no one viewed the repository that day.
- `uniques_views` shows how many people those views came from. NAs generally indicate no one viewed the repository that day.

```{r, message=FALSE}
knitr::kable(github_timecourse)
```
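
Because NAs in the timecourse data generally indicate no activity on that day, one possible way to total views per repository is to treat them as zero first. This is a minimal sketch on made-up data: the repository names and counts are hypothetical, and whether zero-filling is appropriate for your data is an assumption you should check.

```r
library(dplyr)

# Hypothetical toy data with the same column names as the timecourse table
toy_timecourse <- tibble::tibble(
  repo        = c("owner/repoA", "owner/repoA", "owner/repoB"),
  timestamp   = as.Date(c("2024-03-01", "2024-03-02", "2024-03-01")),
  count_views = c(4, NA, 7)
)

# Replace missing view counts with 0, then total views per repository
total_views <- toy_timecourse %>%
  mutate(count_views = coalesce(count_views, 0)) %>%
  group_by(repo) %>%
  summarize(total_views = sum(count_views))
```

Filling with zero before summing avoids `sum()` returning `NA` for any repository that has a day with no recorded views.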

### Setting up GitHub
## Setting up GitHub

At this point you should already have your GitHub authorization set up for your metricminer dashboard by having [followed the instructions above](#setting-up-your-dashboard-repository).

@@ -36,3 +86,12 @@ github_googlesheet:
- [ ] In the `_config_automation.yml` file, make sure that `refresh-github` is set to "yes".
- [ ] In `github_repos` of your `_config_automation.yml`, specify the names of the repositories you'd like to collect data from. Make sure each entry includes the `owner/repository`, e.g. `fhdsl/metricminer`, not just `metricminer`. Separate the repositories with commas. Delete the example repositories we put there.
- [ ] Optionally, if you are saving data to Google, specify a googlesheet ID in `github_googlesheet` you'd like the GitHub data to be saved to. This will only be relevant if you've set `data_dest` to `google`.
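
If you want to double check the `owner/repository` format before committing your config, a small base R check like the one below may help. The helper `is_owner_repo()` is hypothetical and is not part of `metricminer`; it only tests that each entry has exactly one `/` with non-empty text on both sides.

```r
# Hypothetical helper (not part of metricminer): TRUE when an entry looks like "owner/repository"
is_owner_repo <- function(x) {
  grepl("^[^/[:space:]]+/[^/[:space:]]+$", x)
}

repos <- c("fhdsl/metricminer", "metricminer")
repo_check <- is_owner_repo(repos)

# Show any entries that are missing the owner
repos[!repo_check]
```
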

## Customizing GitHub Data

In order to customize the data you are downloading from GitHub you can modify the
`refresh-scripts/refresh-github.R` script in your repository.

You can take a look at the [`metricminer` R package documentation](https://hutchdatascience.org/metricminer/articles/getting-started.html) for more details about the functions and what is possible.

If you have a metric need that is not currently fulfilled by `metricminer` or `metricminer-dashboard` we encourage you to [file a GitHub issue with us and let us know about your new feature idea (or bug report)](https://github.com/fhdsl/metricminer/issues/new/choose).