Merge pull request #16 from fhdsl/cansavvy/add-info

GitHub Example plots and some standardization
fhdsl · Mar 29, 2024 · 8436007
2 parents a14b521 + 8bb2cce
commit 8436007
Showing 10 changed files with 291 additions and 71 deletions.
diff --git a/_config_automation.yml b/_config_automation.yml
@@ -31,7 +31,12 @@ cran_googlesheet:
 
 ###### GitHub ######
 refresh-github: yes
-github_repos: [ fhdsl/metricminer, fhdsl/metricminer.org ]
+github_repos: [
+  fhdsl/metricminer,
+  fhdsl/metricminer-dashboard,
+  jhudsl/OTTR_Template,
+  jhudsl/OTTR_Template_Website
+  ]
 github_googlesheet:
 
 ###### Google Analytics ######
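
A minimal sketch, not part of this changeset, of how the `github_repos` entries above could be sanity-checked in R before a refresh runs. The helper name `is_owner_repo` and the regular expression are assumptions made for illustration; they are not taken from `metricminer`.

```r
# Repositories as listed in the updated `github_repos` setting.
github_repos <- c(
  "fhdsl/metricminer",
  "fhdsl/metricminer-dashboard",
  "jhudsl/OTTR_Template",
  "jhudsl/OTTR_Template_Website"
)

# Hypothetical helper: TRUE when a string has exactly one "/" with
# non-empty, whitespace-free text on both sides (the owner/repository form).
is_owner_repo <- function(x) {
  grepl("^[^/[:space:]]+/[^/[:space:]]+$", x)
}

# Collect any entries that do not match and stop with a readable message.
bad_entries <- github_repos[!is_owner_repo(github_repos)]
if (length(bad_entries) > 0) {
  stop("Not in owner/repository form: ", paste(bad_entries, collapse = ", "))
}
```

An entry such as `metricminer` on its own, without the owner, would be reported by this check.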

diff --git a/calendly.Rmd b/calendly.Rmd
@@ -1,6 +1,10 @@
 ---
 title: "Calendly"
-output: html_document
+output: 
+  html_document:
+    toc: true
+    toc_float: true
+    toc_collapsed: true
 date: "`r format(Sys.time(), '%d %B, %Y')`"
 ---
 
@@ -40,3 +44,13 @@ After you've set up authorization you'll need to check the following items in th
 refresh-calendly: yes
 calendly_googlesheet:
 ```
+
+## Customizing the data 
+
+To customize the data you download from Calendly, modify the
+`refresh-scripts/refresh-calendly.R` script in your repository.
+
+See the [`metricminer` R package documentation](https://hutchdatascience.org/metricminer/articles/getting-started.html) for details about the available functions and what is possible.
+
+If `metricminer` or `metricminer-dashboard` does not currently meet one of your metric needs, we encourage you to [file a GitHub issue to tell us about your feature idea or bug report](https://github.com/fhdsl/metricminer/issues/new/choose).
+
diff --git a/citations.Rmd b/citations.Rmd
@@ -1,6 +1,10 @@
 ---
 title: "Citations"
-output: html_document
+output:
+  html_document:
+    toc: true
+    toc_float: true
+    toc_collapsed: true
 date: "`r format(Sys.time(), '%d %B, %Y')`"
 ---
 
@@ -19,33 +23,35 @@ citations <- readr::read_tsv(file.path("metricminer_data", "citations", "citatio
 # citations <- googlesheets4::read_sheet(yaml$citations_googlesheet)
 ```
 
+Here we show how to get the total counts per original paper:
+
+```{r,message=FALSE}
+library(dplyr)
+# Remove duplicate citing-paper titles within each original paper, then count
+# the remaining rows per original paper (that is, how many times it is cited).
+citations %>%
+  distinct(original_paper, cite_titles, .keep_all = TRUE) %>%
+  count(original_paper)
+```
+
+## Data information
+
 Column information:
 
 - `original_paper` shows papers that we have captured citation information about
 - `cite_titles` shows papers that cite the original paper
-- `links`column shows the link for the paper that cites the original paper (the `cite_titles` papers). 
+- `links` shows the link to each paper that cites the original paper (the `cite_titles` papers).
 
 ```{r, message = FALSE}
 knitr::kable(citations)
 ```
 
-Here we show how to get the total counts per original paper:
-
-```{r,message=FALSE}
-library(dplyr)
-# here we remove duplicates if there are any of the same titles of citing papers for each original paper and then get a count of the number of rows for each original paper (aka how many times it is cited)
-citations %>% 
-  distinct(original_paper, cite_titles, .keep_all = TRUE) %>% 
-  count(original_paper)
-```
-
 ## Setting up Citations
 
 1. Go to: https://scholar.google.com/scholar
 2. Search for the paper whose citation count you want.
 3. Then click the `Cited by ___` button below the title of the paper
-4. Copy and paste this in the `_config_automation.yml` file in the `citation_papers` section. 
-         
+4. Copy and paste this in the `_config_automation.yml` file in the `citation_papers` section.
+
 ```
 ###### Citations ######
 refresh-citations: yes
@@ -56,3 +62,13 @@ citation_googlesheet:
 ```
 - [ ] In the `_config_automation.yml` file, make sure that `refresh-citations` is set to "yes".
 - [ ] Optionally, if you are saving data to Google, specify in `citation_googlesheet` the ID of the googlesheet you'd like the citation data to be saved to. This is only relevant if you've set `data_dest` to `google`.
+
+
+## Customizing Citation Data 
+
+To customize the data you download from Google Scholar, modify the
+`refresh-scripts/refresh-citations.R` script in your repository.
+
+See the [`metricminer` R package documentation](https://hutchdatascience.org/metricminer/articles/getting-started.html) for details about the available functions and what is possible.
+
+If `metricminer` or `metricminer-dashboard` does not currently meet one of your metric needs, we encourage you to [file a GitHub issue to tell us about your feature idea or bug report](https://github.com/fhdsl/metricminer/issues/new/choose).
diff --git a/cran.Rmd b/cran.Rmd
@@ -1,43 +1,45 @@
 ---
 title: "CRAN"
-output: html_document
+output: 
+  html_document:
+    toc: true
+    toc_float: true
+    toc_collapsed: true
 date: "`r format(Sys.time(), '%d %B, %Y')`"
 ---
 
 ## Preview
 
-```{r}
+```{r, echo = FALSE, hide = TRUE, message = FALSE}
 library(tidyverse)
-```
 
-```{r, echo = FALSE, hide = TRUE}
 root_dir <- rprojroot::find_root(rprojroot::has_dir(".git"))
 yaml <- yaml::read_yaml(file.path(root_dir, "_config_automation.yml"))
 
 ## For github
 cran <- readr::read_tsv(file.path("metricminer_data", "cran", "cran.tsv"))
 
-## For google 
+## For google
 # cran <- googlesheets4::read_sheet(yaml$cran_googlesheet)
 ```
-```{r}
-cran %>% dplyr::summarize(download_total = sum(count))
-```
 
+Total CRAN downloads for all packages: 
 
 ```{r}
-cran %>% dplyr::group_by(package) %>%
-   dplyr::summarize(download_total = sum(count))
+cran %>% dplyr::summarize(download_total = sum(count))
+```
+CRAN package downloads over time, summarized by month. 
 
-cran_stats <- cran %>% 
-  separate(date, into=c("year", "month name", "day"), sep = "-") %>% 
+```{r, message = FALSE}
+cran_stats <- cran %>%
+  separate(date, into=c("year", "month name", "day"), sep = "-") %>%
   unite("Month", c("year", "month name"), sep='-', remove=TRUE) %>%  
-  group_by(Month, package) %>% 
+  group_by(Month, package) %>%
   summarise(monthly_downloads = sum(count)) %>% #summarize monthly downloads by package
-  filter(monthly_downloads > 0) #drop the 0's 
+  filter(monthly_downloads > 0) #drop the 0's
 
-ggplot(cran_stats, aes(Month, monthly_downloads, group=package, color = package)) + 
-  geom_line() + 
+ggplot(cran_stats, aes(Month, monthly_downloads, group=package, color = package)) +
+  geom_line() +
   geom_point() +
   theme(panel.background = element_blank(), panel.grid = element_blank()) +
   theme(axis.text.x = element_text(angle = 90)) +
@@ -60,3 +62,12 @@ cran_googlesheet:
 - [ ] In the `_config_automation.yml` file, make sure that `refresh-cran` is set to "yes".
 - [ ] In the `cran_packages` section of your `_config_automation.yml`, type the names of the packages that you'd like to collect data from on CRAN. Type them exactly as they are spelled (names are case sensitive), separated by commas. Delete the example package names we've put there.
 - [ ] Optionally, if you are saving data to Google, specify in `cran_googlesheet` the ID of the googlesheet you'd like the CRAN data to be saved to. This is only relevant if you've set `data_dest` to `google`.
+
+## Customizing CRAN Data 
+
+To customize the data you download from CRAN, modify the
+`refresh-scripts/refresh-cran.R` script in your repository.
+
+See the [`metricminer` R package documentation](https://hutchdatascience.org/metricminer/articles/getting-started.html) for details about the available functions and what is possible.
+
+If `metricminer` or `metricminer-dashboard` does not currently meet one of your metric needs, we encourage you to [file a GitHub issue to tell us about your feature idea or bug report](https://github.com/fhdsl/metricminer/issues/new/choose).
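
For readers who want to experiment with the by-month summary in `cran.Rmd` without the tidyverse, here is a rough base R sketch on made-up data. The toy values and the package names `pkgA` and `pkgB` are hypothetical; only the `package`, `date`, and `count` column names come from the chunk in the diff above.

```r
# Toy stand-in for cran.tsv (values are invented for illustration).
cran <- data.frame(
  package = c("pkgA", "pkgA", "pkgA", "pkgB"),
  date = as.Date(c("2024-01-05", "2024-01-20", "2024-02-03", "2024-01-11")),
  count = c(10, 5, 7, 0)
)

# Build a "YYYY-MM" label, the same shape the separate()/unite() steps produce.
cran$Month <- format(cran$date, "%Y-%m")

# Sum downloads per month and package.
cran_stats <- aggregate(count ~ Month + package, data = cran, FUN = sum)
names(cran_stats)[names(cran_stats) == "count"] <- "monthly_downloads"

# Drop month/package combinations with zero downloads.
cran_stats <- cran_stats[cran_stats$monthly_downloads > 0, ]
print(cran_stats)
```

The resulting `cran_stats` data frame has the `Month`, `package`, and `monthly_downloads` columns that the `ggplot()` call in the diff expects.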
diff --git a/docs/github.html b/docs/github.html
@@ -13,7 +13,7 @@
 
 <title>GitHub</title>
 
-<script src="site_libs/header-attrs-2.25/header-attrs.js"></script>
+<script src="site_libs/header-attrs-2.26/header-attrs.js"></script>
 <script src="site_libs/jquery-3.6.0/jquery-3.6.0.min.js"></script>
 <meta name="viewport" content="width=device-width, initial-scale=1" />
 <link href="site_libs/bootstrap-3.3.5/css/cosmo.min.css" rel="stylesheet" />
@@ -315,25 +315,79 @@ <h4 class="date">15 March, 2024</h4>
 
 <div id="preview" class="section level2">
 <h2>Preview</h2>
-<pre><code>## Rows: 2 Columns: 6
-## ── Column specification ────────────────────────────────────────────────────────
-## Delimiter: &quot;\t&quot;
-## chr (1): repo_name
-## dbl (4): num_contributors, total_contributions, num_stars, health_percentage
-## lgl (1): num_forks
-## 
-## ℹ Use `spec()` to retrieve the full column specification for this data.
-## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
-## Rows: 12 Columns: 6
-## ── Column specification ────────────────────────────────────────────────────────
-## Delimiter: &quot;\t&quot;
-## chr  (1): repo
-## dbl  (4): count_clones, uniques_clones, count_views, uniques_views
-## date (1): timestamp
-## 
-## ℹ Use `spec()` to retrieve the full column specification for this data.
-## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.</code></pre>
-<pre class="r"><code>knitr::kable(github)</code></pre>
+<p>Contributions to a repository example:</p>
+<p><img src="github_files/figure-html/unnamed-chunk-2-1.png" width="672" /></p>
+<p>Views of a GitHub Repository over time</p>
+<pre><code>## Warning: Removed 2 rows containing missing values or values outside the scale range
+## (`geom_line()`).</code></pre>
+<p><img src="github_files/figure-html/unnamed-chunk-3-1.png" width="672" /></p>
+</div>
+<div id="data-information" class="section level2">
+<h2>Data information</h2>
+<p>Data information for <code>github overall data</code>:</p>
+<ul>
+<li><code>num_forks</code> shows the number of times this repo has been
+forked. NA means it has never been forked.</li>
+<li><code>num_contributors</code> how many people have contributed to
+this repo</li>
+<li><code>num_stars</code> how many people have starred this repo?</li>
+<li><code>health_percentage</code> what percentage of <a
+href="https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/about-community-profiles-for-public-repositories">“good
+software health” items as described by GitHub</a> does this repo
+have?</li>
+</ul>
+<pre class="r"><code>knitr::kable(github_overall)</code></pre>
+<table style="width:100%;">
+<colgroup>
+<col width="22%" />
+<col width="10%" />
+<col width="17%" />
+<col width="20%" />
+<col width="10%" />
+<col width="18%" />
+</colgroup>
+<thead>
+<tr class="header">
+<th align="left">repo_name</th>
+<th align="left">num_forks</th>
+<th align="right">num_contributors</th>
+<th align="right">total_contributions</th>
+<th align="right">num_stars</th>
+<th align="right">health_percentage</th>
+</tr>
+</thead>
+<tbody>
+<tr class="odd">
+<td align="left">fhdsl/metricminer</td>
+<td align="left">NA</td>
+<td align="right">4</td>
+<td align="right">432</td>
+<td align="right">1</td>
+<td align="right">37</td>
+</tr>
+<tr class="even">
+<td align="left">fhdsl/metricminer.org</td>
+<td align="left">NA</td>
+<td align="right">2</td>
+<td align="right">28</td>
+<td align="right">0</td>
+<td align="right">37</td>
+</tr>
+</tbody>
+</table>
+<p>Data information for <code>github timecourse</code>:</p>
+<ul>
+<li><code>timestamp</code> shows the date the counts correspond to</li>
+<li><code>count_clones</code> tells the number of clones made on this
+day</li>
+<li><code>uniques_clones</code> tells how many people did these clones
+come from?</li>
+<li><code>count_views</code> how many views did the repo get on this
+day?</li>
+<li><code>uniques_views</code> how many people were those views
+from?</li>
+</ul>
+<pre class="r"><code>knitr::kable(github_timecourse)</code></pre>
 <table>
 <colgroup>
 <col width="25%" />

diff --git a/github.Rmd b/github.Rmd
@@ -1,27 +1,77 @@
 ---
 title: "GitHub"
-output: html_document
+output: 
+  html_document:
+    toc: true
+    toc_float: true
+    toc_collapsed: true
 date: "`r format(Sys.time(), '%d %B, %Y')`"
+
 ---
 
 ## Preview
 
-```{r, echo = FALSE, hide = TRUE}
+```{r, echo = FALSE, hide = TRUE, message=FALSE, warning = FALSE}
+library(ggplot2)
+library(magrittr)
+
 root_dir <- rprojroot::find_root(rprojroot::has_dir(".git"))
 yaml <- yaml::read_yaml(file.path(root_dir, "_config_automation.yml"))
 
 ## For github
-github <- readr::read_tsv(file.path("metricminer_data", "github", "github.tsv"))
-github <- readr::read_tsv(file.path("metricminer_data", "github", "github_timecourse.tsv"))
+github_overall <- readr::read_tsv(file.path("metricminer_data", "github", "github.tsv"))
+github_timecourse <- readr::read_tsv(file.path("metricminer_data", "github", "github_timecourse.tsv"))
+
+## For google
+# github_overall <- googlesheets4::read_sheet(yaml$github_googlesheet, sheet = "overall_stats")
+# github_timecourse <- googlesheets4::read_sheet(yaml$github_googlesheet, sheet = "timecourse")
+```
+
+Contributions to a repository example:
+
+```{r echo = FALSE, hide = TRUE, message=FALSE, warning = FALSE}
+github_overall %>% 
+  ggplot(aes(x = repo_name, y = total_contributions)) + 
+  geom_bar(stat = "identity", fill = "lavender") + 
+  theme_classic()
+```
+
+Views of a GitHub Repository over time 
+
+```{r echo = FALSE, hide = TRUE, message=FALSE}
+github_timecourse %>% 
+  ggplot(aes(x = timestamp, y = count_views, fill = repo, color = repo)) + 
+  geom_line(stat = "identity") + 
+  theme_classic() + 
+  xlab("date")
+```
 
-## For google 
-# github <- googlesheets4::read_sheet(yaml$github_googlesheet)
+## Data information 
+
+Data information for `github overall data`:
+
+- `num_forks` shows the number of times this repo has been forked. NA means it has never been forked.
+- `num_contributors` shows how many people have contributed to this repo.
+- `num_stars` shows how many people have starred this repo.
+- `health_percentage` shows what percentage of ["good software health" items as described by GitHub](https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/about-community-profiles-for-public-repositories) this repo has.
+
+```{r, message=FALSE}
+knitr::kable(github_overall)
 ```
-```{r}
-knitr::kable(github)
+
+Data information for `github timecourse`:
+
+- `timestamp` shows the date the counts correspond to.
+- `count_clones` shows the number of clones made on this day. NAs generally indicate no one cloned the repository that day.
+- `uniques_clones` shows how many people those clones came from. NAs generally indicate no one cloned the repository that day.
+- `count_views` shows how many views the repo got on this day. NAs generally indicate no one viewed the repository that day.
+- `uniques_views` shows how many people those views came from. NAs generally indicate no one viewed the repository that day.
+
+```{r, message=FALSE}
+knitr::kable(github_timecourse)
 ```
 
-### Setting up GitHub
+## Setting up GitHub
 
 At this point you should already have your GitHub authorization set up for your metricminer dashboard by having [followed the instructions above](#setting-up-your-dashboard-repository).
 
@@ -36,3 +86,12 @@ github_googlesheet:
 - [ ] In the `_config_automation.yml` file, make sure that `refresh-github` is set to "yes".
 - [ ] In `github_repos` of your `_config_automation.yml`, specify the names of the repositories you'd like to collect data from. Use the full `owner/repository` form, e.g. `fhdsl/metricminer`, not just `metricminer`. Separate the repositories with commas. Delete the example repositories we put there.
 - [ ] Optionally, if you are saving data to Google, specify in `github_googlesheet` the ID of the googlesheet you'd like the GitHub data to be saved to. This is only relevant if you've set `data_dest` to `google`.
+
+## Customizing GitHub Data 
+
+To customize the data you download from GitHub, modify the
+`refresh-scripts/refresh-github.R` script in your repository.
+
+See the [`metricminer` R package documentation](https://hutchdatascience.org/metricminer/articles/getting-started.html) for details about the available functions and what is possible.
+
+If `metricminer` or `metricminer-dashboard` does not currently meet one of your metric needs, we encourage you to [file a GitHub issue to tell us about your feature idea or bug report](https://github.com/fhdsl/metricminer/issues/new/choose).
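
The timecourse notes in `github.Rmd` say that NAs generally mean no clones or views on that day, and the rendered page shows `geom_line()` dropping rows with missing values. If you accept that reading of NA, one possible approach is to replace NAs with 0 before plotting. This is a sketch on made-up data, not something the PR itself does; the column names follow the `github_timecourse.tsv` column specification shown in the removed HTML output, and the values are invented.

```r
# Toy stand-in for github_timecourse.tsv (values are invented for illustration).
github_timecourse <- data.frame(
  repo = "fhdsl/metricminer",
  timestamp = as.Date(c("2024-03-01", "2024-03-02", "2024-03-03")),
  count_views = c(3, NA, 5),
  uniques_views = c(2, NA, 4)
)

# Treat NA as "no activity that day" for the selected count columns.
count_cols <- c("count_views", "uniques_views")
github_timecourse[count_cols] <- lapply(
  github_timecourse[count_cols],
  function(x) replace(x, is.na(x), 0)
)
print(github_timecourse)
```

Whether 0 is the right substitute depends on why a value is missing, so it may be worth keeping the original NAs in the saved data and applying this only at plot time.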