differences for PR #95

epiverse-trace · Aug 1, 2024 · fef4f2f
1 parent 041a3cb

Showing 2 changed files with 116 additions and 6 deletions.
diff --git a/clean-data.md b/clean-data.md
@@ -316,7 +316,7 @@ This approach simplifies the data cleaning process, ensuring that categorical da
 
 In epidemiological data analysis it is also useful to track and analyze time-dependent events, such as the progression of a disease outbreak or the duration between sample collection and analysis.
 The `{cleanepi}` package  offers a convenient function for calculating the time elapsed between two dated events at different time scales. For example, the below code snippet utilizes the `span()` function to compute the time elapsed since the date of sample for the case identified
- until the date this document was generated (2024-07-09).
+ until the date this document was generated (2024-08-01).
 
 
 ``` r
@@ -343,9 +343,9 @@ utils::head(sim_ebola_data)
 1                        9                3
 2                       10                6
 3                        9                4
-4                        9                6
-5                        7                8
-6                        8                5
+4                        9                7
+5                        7                9
+6                        8                6
 ```
 
 After executing the `span()` function, two new columns named `time_since_sampling_date` and `remainder_months` are added to the **sim_ebola_data** dataset, containing the calculated time elapsed since the date of sampling for each case, measured in years, and the remaining time measured in months.
@@ -399,7 +399,7 @@ individual cleansing steps within the broader data cleansing process.
 
 You can view the report using `cleanepi::print_report()` function. 
 
-![Example of data cleaning report generated by `{cleanepi}`](fig/report_demo.png)
+![Example of data cleaning report generated by `{cleanepi}`.](fig/report_demo.png)
 
 ## Validating and tagging case data
 In outbreak analysis, once you have completed the initial steps of reading and cleaning the case data,
@@ -439,6 +439,116 @@ utils::head(data, 7)
 
 // tags: id:case_id, date_onset:date_onset, date_reporting:date_sample, gender:gender, age:age 
 ```
+The resulting `linelist` object is a data frame with additional features that
+linelist-aware packages can use. For example, you can extract a data frame
+containing only the tagged columns, renamed to their tag names, using the
+`linelist::tags_df()` function, as shown below:
+
+``` r
+head(linelist::tags_df(data), 5)
+```
+
+``` output
+     id date_onset date_reporting gender age
+1 14905 2015-03-15     2015-04-06   male  90
+2 13043       <NA>     2014-01-03 female  25
+3 14364 2014-02-09     2015-03-03 female  54
+4 14675 2014-10-19     2014-12-31   <NA>  90
+5 12648 2014-06-08     2016-10-10 female  74
+```
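+
+To see which column carries each tag without extracting the data, you can use
+`linelist::tags()`. The sketch below assumes a small hypothetical data frame
+(its columns `case_id`, `onset` and `age` are illustrative, not the lesson's
+dataset) and inspects the resulting linelist both ways:
+
+``` r
+library(linelist)
+
+# Hypothetical toy data, not the lesson's dataset
+df <- data.frame(
+  case_id = c("A1", "A2", "A3"),
+  onset = as.Date(c("2014-05-01", "2014-05-03", NA)),
+  age = c(25, 40, 61)
+)
+
+# Tag the columns: tag name = column name
+x <- make_linelist(df, id = "case_id", date_onset = "onset", age = "age")
+
+# Named list mapping each used tag to its column
+tags(x)
+
+# Data frame of the tagged columns, renamed to their tag names
+tags_df(x)
+```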
+
+Linelist objects have built-in safeguards: if you drop any of the tagged
+columns, you receive a warning or an error, as shown in the example below.
+
+
+``` r
+new_df <- data |> 
+  dplyr::select(linelist::has_tag(c("id", "age")))
+```
+
+``` warning
+Warning: The following tags have lost their variable:
+ date_onset:date_onset, date_reporting:date_sample, gender:gender
+```
+
+By default, losing tags in a linelist object triggers a warning. You can change this to an error using `linelist::lost_tags_action()`.
+
+::::::::::::::::::::::::::::::::::::: challenge 
+
+- Set the action for lost tags in a linelist to error as follows:
+
+   ``` r
+   linelist::lost_tags_action(action = "error")
+   ```
+
+   Then re-run the code segment above.
+- What do you learn from the resulting error message?
+
+::::::::::::::::::::::::::::::::::::::::::::::::
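+
+The lost-tags setting applies for the rest of your R session, so you may want to
+restore the default after the challenge. A minimal sketch, again assuming a
+hypothetical toy linelist (not the lesson's dataset), that catches the
+lost-tags error and then switches back to a warning:
+
+``` r
+library(linelist)
+
+# Hypothetical toy data, not the lesson's dataset
+df <- data.frame(case_id = c("A1", "A2"), age = c(25, 40))
+x <- make_linelist(df, id = "case_id", age = "age")
+
+# Make lost tags raise an error instead of a warning
+lost_tags_action("error")
+
+# Keeping only 'case_id' drops the column tagged as 'age'
+res <- tryCatch(
+  x[, "case_id", drop = FALSE],
+  error = function(e) conditionMessage(e)
+)
+print(res)
+
+# Restore the default behaviour
+lost_tags_action("warning")
+```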
+
+The `{linelist}` package supplies tags for common epidemiological variables
+and specifies the data types accepted for each of them. You can view them by
+running the following command:
+
+``` r
+linelist::tags_types()
+```
+
+``` output
+$id
+[1] "numeric"   "integer"   "character"
+
+$date_onset
+[1] "integer" "numeric" "Date"    "POSIXct" "POSIXlt"
+
+$date_reporting
+[1] "integer" "numeric" "Date"    "POSIXct" "POSIXlt"
+
+$date_admission
+[1] "integer" "numeric" "Date"    "POSIXct" "POSIXlt"
+
+$date_discharge
+[1] "integer" "numeric" "Date"    "POSIXct" "POSIXlt"
+
+$date_outcome
+[1] "integer" "numeric" "Date"    "POSIXct" "POSIXlt"
+
+$date_death
+[1] "integer" "numeric" "Date"    "POSIXct" "POSIXlt"
+
+$gender
+[1] "character" "factor"   
+
+$age
+[1] "numeric" "integer"
+
+$location
+[1] "character" "factor"   
+
+$occupation
+[1] "character" "factor"   
+
+$hcw
+[1] "logical"   "integer"   "character" "factor"   
+
+$outcome
+[1] "character" "factor"   
+```
+
+To check that the tags are valid and that the tagged variables have the expected
+data types, use the `linelist::validate_tags()` and `linelist::validate_types()`
+functions, respectively, as shown in the example below:
+
+``` r
+linelist::validate_tags(data,
+  allow_extra = FALSE
+)
+linelist::validate_types(data,
+  ref_types = linelist::tags_types()
+)
+```
+
+If your dataset contains a non-default tag, set `allow_extra = TRUE` when
+creating the linelist object (and when calling `linelist::validate_tags()`).
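+
+A minimal sketch of this case, assuming a hypothetical non-default tag named
+`lab_code` on toy data (neither is part of the lesson's dataset). Here
+`allow_extra = TRUE` is also passed to `validate_tags()` and `tags_types()`, so
+that the extra tag is accepted and has a reference type to check against:
+
+``` r
+library(linelist)
+
+# Hypothetical toy data with one non-default variable
+df <- data.frame(
+  case_id = c("A1", "A2"),
+  lab_code = c("L-01", "L-02"),
+  age = c(25, 40)
+)
+
+# 'lab_code' is not a default tag, so allow_extra = TRUE is required
+x <- make_linelist(df,
+  id = "case_id", age = "age", lab_code = "lab_code",
+  allow_extra = TRUE
+)
+
+# Validate the tags and types, allowing the extra tag
+validate_tags(x, allow_extra = TRUE)
+validate_types(x,
+  ref_types = tags_types(lab_code = "character", allow_extra = TRUE)
+)
+```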
+
 
 ::::::::::::::::::::::::::::::::::::: keypoints 
 

diff --git a/md5sum.txt b/md5sum.txt
@@ -5,7 +5,7 @@
 "index.md" "32bc80d6f4816435cc0e01540cb2a513" "site/built/index.md" "2024-07-02"
 "links.md" "fe82d0a436c46f4b07b82684ed2cceaf" "site/built/links.md" "2024-07-02"
 "episodes/read-cases.Rmd" "b7aef81b60501065599814c0db15f512" "site/built/read-cases.md" "2024-07-02"
-"episodes/clean-data.Rmd" "f945fe9d7dd34d01c0d02805a358a872" "site/built/clean-data.md" "2024-07-09"
+"episodes/clean-data.Rmd" "2ef69b0a12062590eff29949b7102041" "site/built/clean-data.md" "2024-08-01"
 "episodes/describe-cases.Rmd" "cd9cb1c9d43eb3618e7a8a51b3748e55" "site/built/describe-cases.md" "2024-07-02"
 "instructors/instructor-notes.md" "ca3834a1b0f9e70c4702aa7a367a6bb5" "site/built/instructor-notes.md" "2024-07-02"
 "learners/reference.md" "106717912e909a7c8d9e3e8fea48e17d" "site/built/reference.md" "2024-07-02"