Merge pull request #6 from Machine-Learning-Pipelines/llm

Llm
Machine-Learning-Pipelines · Aug 2, 2023 · d6057ca
2 parents fca0121 + 610f3bc

Showing 47 changed files with 6,155 additions and 3,199 deletions.
diff --git a/.gitignore b/.gitignore
@@ -6,9 +6,11 @@ site/
 sphinxdocs/
 
 case-studies/individual/
-
+case-studies/arxiv-corpus/gold_standard_old/
+case-studies/arxiv-corpus/gold_standard_test/
 client_secrets.json
 
+outputs/
 doc/
 
 # Byte-compiled / optimized / DLL files

diff --git a/README.md b/README.md
@@ -22,5 +22,10 @@ Distributed under the MIT License. See `LICENSE.txt` for more information.
 ### Funding
 
 We thank the The Center for Research and Education in AI and Learning (REAL@USC) for their funding and support towards this project.
-
 Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the The Center for Research and Education in AI and Learning (REAL@USC).
+
+---
+
+This material is based upon work supported by the National Science Foundation under Grant No. OAC 2138773.
+
+---
diff --git a/case-studies/arxiv-corpus/gold_standard.dvc b/case-studies/arxiv-corpus/gold_standard.dvc
@@ -1,5 +1,5 @@
 outs:
-- md5: 95f383f75d5092838472870faaf4528d.dir
-  size: 1356118693
-  nfiles: 4276
+- md5: fa4149515a6f4ab47eb3b373cfc8f815.dir
+  size: 1353977566
+  nfiles: 4273
   path: gold_standard
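Reading the two `outs` entries side by side, the DVC-tracked `gold_standard` directory shrinks by 3 files and 2,141,127 bytes (about 2.1 MB), and its directory hash changes. A minimal sketch of that comparison, with the values copied by hand from the diff rather than parsed from the file; `dvc_out_delta` is a hypothetical helper, not part of DVC or reproscreener:

```python
# Values copied from the old (-) and new (+) `outs` entries of gold_standard.dvc.
old = {"md5": "95f383f75d5092838472870faaf4528d.dir", "size": 1356118693, "nfiles": 4276}
new = {"md5": "fa4149515a6f4ab47eb3b373cfc8f815.dir", "size": 1353977566, "nfiles": 4273}


def dvc_out_delta(before: dict, after: dict) -> dict:
    """Hypothetical helper: change in total bytes and file count between two `outs` entries."""
    return {
        "size_bytes": after["size"] - before["size"],
        "nfiles": after["nfiles"] - before["nfiles"],
        "content_changed": after["md5"] != before["md5"],
    }


delta = dvc_out_delta(old, new)
print(delta)  # {'size_bytes': -2141127, 'nfiles': -3, 'content_changed': True}
```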
diff --git a/case-studies/arxiv-corpus/manual_eval.csv b/case-studies/arxiv-corpus/manual_eval.csv
diff --git a/case-studies/plots/heatmap.png b/case-studies/plots/heatmap.png
diff --git a/case-studies/plots/heatmap_manual_eval.png b/case-studies/plots/heatmap_manual_eval.png
diff --git a/case-studies/plots/heatmap_repo.png b/case-studies/plots/heatmap_repo.png
diff --git a/case-studies/plots/heatmap_repo_palettes.png b/case-studies/plots/heatmap_repo_palettes.png
diff --git a/case-studies/plots/workflow.png b/case-studies/plots/workflow.png
diff --git a/docs/architecture.md b/docs/architecture.md
diff --git a/docs/docstrings.md b/docs/docstrings.md
@@ -1,10 +1,35 @@
-# Docstrings
+# Architecture
 
-## Documentation of `tex_eval.py`
+## `tex_eval` module
 
-::: src.reproscreener.tex_eval
+The `tex_eval` module is used to evaluate `.tex` files extracted from the arXiv source tarball of the paper.
 
-## Documentation of `repo_eval.py`
+::: reproscreener.tex_eval
+    options:
+      show_source: false
+      heading_level: 3
 
-::: src.reproscreener.repo_eval
+## `repo_eval` module
 
+::: reproscreener.repo_eval
+    options:
+      show_source: false
+      heading_level: 3
+
+## `scrape_arxiv` module
+
+The `scrape_arxiv` module is used to obtain the gold standard dataset from arXiv. The dataset includes the PDF, source tarball, and abstract for each paper.
+
+::: reproscreener.scrape_arxiv
+    options:
+      show_source: false
+      heading_level: 3
+
+## `gold_standard` module
+
+The `gold_standard` module is used to evaluate and compare the performance of `reproscreener` on the gold standard dataset. It uses the data from the `scrape_arxiv` module.
+
+::: reproscreener.gold_standard
+    options:
+      show_source: false
+      heading_level: 3
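The module descriptions above imply a two-step flow: `scrape_arxiv` fetches each paper's source tarball, and `tex_eval` works on the `.tex` files inside it. A minimal standard-library sketch of both steps, assuming the public arXiv `abs`/`pdf`/`e-print` URL patterns and a tar-format source download (some submissions are served as a single gzipped file instead, which this sketch does not handle). `arxiv_urls` and `extract_tex_files` are hypothetical helpers, not reproscreener's API:

```python
import tarfile
from pathlib import Path


def arxiv_urls(arxiv_id: str) -> dict[str, str]:
    """Hypothetical helper: public arXiv URLs for the abstract page, PDF, and source (e-print)."""
    return {
        "abs": f"https://arxiv.org/abs/{arxiv_id}",
        "pdf": f"https://arxiv.org/pdf/{arxiv_id}",
        "source": f"https://arxiv.org/e-print/{arxiv_id}",
    }


def extract_tex_files(tarball: Path, dest: Path) -> list[Path]:
    """Hypothetical helper: copy only the regular `.tex` members of a source tarball into `dest`.

    Mode "r:*" lets tarfile detect gzip/bz2/xz compression automatically.
    Returns the written paths, sorted.
    """
    dest = dest.resolve()
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    with tarfile.open(tarball, "r:*") as tar:
        for member in tar.getmembers():
            if not (member.isfile() and member.name.endswith(".tex")):
                continue  # skip figures, .bbl/.sty files, directories, links
            target = (dest / member.name).resolve()
            # Skip members that would escape `dest` (e.g. "../x.tex") in untrusted archives.
            if not target.is_relative_to(dest):
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            with tar.extractfile(member) as src:
                target.write_bytes(src.read())
            written.append(target)
    return sorted(written)
```

Reading member bytes through `extractfile` rather than calling `extract` keeps the sketch independent of the extraction-filter behavior that differs across Python versions.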
diff --git a/docs/evaluation_results.ipynb b/docs/evaluation_results.ipynb
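The `gold_standard` module is described as comparing reproscreener's performance on the gold standard dataset, and this commit also touches `manual_eval.csv`. One plausible form of such a comparison is per-metric agreement between automatic and manual labels. A minimal sketch, assuming boolean labels keyed by `(paper_id, metric)`; the function name, paper IDs, and metric names are hypothetical placeholders, not reproscreener's actual schema:

```python
from collections import defaultdict

# Hypothetical label format: (paper_id, metric) -> was the criterion found?
Label = dict[tuple[str, str], bool]


def per_metric_agreement(auto: Label, manual: Label) -> dict[str, float]:
    """Fraction of papers, per metric, where the automatic label matches the manual one.

    Only (paper_id, metric) pairs present in both inputs are scored.
    """
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for key, manual_label in manual.items():
        if key not in auto:
            continue
        _, metric = key
        totals[metric] += 1
        hits[metric] += int(auto[key] == manual_label)
    return {metric: hits[metric] / totals[metric] for metric in totals}


auto = {("p1", "dataset"): True, ("p2", "dataset"): False, ("p1", "code"): True, ("p2", "code"): True}
manual = {("p1", "dataset"): True, ("p2", "dataset"): True, ("p1", "code"): True, ("p2", "code"): True}
print(per_metric_agreement(auto, manual))  # {'dataset': 0.5, 'code': 1.0}
```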
diff --git a/docs/funding.md b/docs/funding.md
@@ -1,5 +1,11 @@
 # Funding
 
+
+
 We thank the The Center for Research and Education in AI and Learning (REAL@USC) for their funding and support towards this project.
 
-Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the The Center for Research and Education in AI and Learning (REAL@USC).
+Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Center for Research and Education in AI and Learning (REAL@USC).
+
+---
+
+This material is based upon work supported by the National Science Foundation under Grant No. OAC 2138773.
diff --git a/docs/manual_eval.csv b/docs/manual_eval.csv