---
title: "Using Harmony to Check for Matches Between Items from Different Scales"
author: "Deanna Varley"
date: "2024-11-20"
output:
  html_document:
    df_print: paged
---
This walkthrough/tutorial shows you how to use the harmonydata package in R to:
(1) Identify all of the different versions of a scale item used across multiple
surveys/datasets. This is useful in instances where multiple surveys/datasets
to be harmonised contain the same scale, but the wording of each item in each
scale may vary slightly across surveys - a common occurrence when working with
data from more than one study/survey.
(2) Identify the most commonly used wording of each item of each scale used in
the data we are planning to harmonise.
(3) Save the most commonly used wording of each item of each scale in a list
(one list per scale).
(4) Using the lists created, run functions from the harmonydata package to check
for matches between items from different scales (where 'matches' are items with
a high level of semantic similarity/correspondence).
(5) Save the indices of similarity between item pairs and the identified matches
between items in lists, matrices or data frames, and then export the results to
csv files.
In this example, we'll be working with items from the following scales: the
GAD-7, the CES-D and the K10 (as well as variants of the K10 - the K6 and K5).
First, let's install and load the R packages we'll need, set the working
directory, and then load the data file we'll be using:
```{r, warning=FALSE, message=FALSE}
#----------------------Preparation and Loading Packages-------------------------
#If you don't already have these packages installed in your R library,
#un-comment and run the below code:
#install.packages("devtools") #This package is required to install the harmonydata package
#devtools::install_github("harmonydata/harmony_r")
#install.packages("dplyr") #Will be used for data manipulation.
#install.packages("flextable") #Will be used for displaying results in formatted tables.
#install.packages("readxl") #Note: A different R package would be required to
#import datafiles not in xlsx format
#Load the required R libraries/packages:
library(devtools)
library(harmonydata)
library(dplyr)
library(flextable)
library(readxl)
harmonydata::set_url()
#--------------------Set Working Directory, Load Data---------------------------
#Set the working directory/file location R will use to find data files. E.g.:
setwd("~/Data Harmonisation Project/Content Analysis")
#Load the data and save to a dataframe object in the R environment:
dat1 <- read_xlsx("Survey Items.xlsx") #The xlsx file is a list of survey items.
#Add an ID column to the dataframe:
dat1$id <- as.character(seq(1, nrow(dat1)))
```
Note: The data loaded contains the following columns/variables:
Name (the study/survey name),
Year (the study/survey year),
Scale (the name of the scale),
Item Number (a numeric variable indicating what item number a given item is within a given scale),
Items (a text/string variable containing the text of each item from each scale),
id (a unique identifier for each row).
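If you'd like a quick look at the loaded data before proceeding, an optional
check like the sketch below can be run (it only uses base R and the dplyr
package we've already loaded):
```{r}
#------------------------Optional: Quick Look at the Data-----------------------
#A quick structural check of the loaded data (optional):
glimpse(dat1) #Shows each column, its type and the first few values
table(dat1$Scale) #Shows how many items were recorded for each scale
```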
Now that we've loaded the R packages we'll need and our data, the first
thing we'll do is inspect our data to make sure it contains no errors.
Specifically, we want to check that the 'Item Number' variable is coded
correctly throughout the dataset. Doing this will help us detect any rows
where an item was recorded with the wrong 'Item Number', which could
interfere with the accuracy of the analyses we'll do later.
You can skip this step if you're confident that your data is recorded correctly.
In this example, we're working with a large datafile, so we'll take this
extra precaution.
To check the data for coding errors, we'll run the below code and then
inspect the lists of items created to make sure everything looks right. In this
example, we'll be checking that all of the different versions of items from
the GAD-7 scale, the CES-D scale and the K10 scale in our data have their
'Item Number' variable coded correctly.
Let's do this for the GAD-7 items first:
```{r}
#-----------------Inspect Data to Check for Coding Errors-----------------------
freqgad71i <- dat1 %>% #Start with dataframe 'dat1'
filter(Scale=="GAD-7" & `Item Number`==1) %>% #Filter rows where Scale is "GAD-7" and Item Number is 1
group_by(Items) %>% #Group by the 'Items' column
summarise(total=n()) #Count the number of rows in each group and store the results in a column called 'total' in an object called freqgad71i
freqgad72i<- dat1 %>%
filter(Scale=="GAD-7" & `Item Number`==2) %>%
group_by(Items) %>%
summarise(total=n())
freqgad73i<- dat1 %>%
filter(Scale=="GAD-7" & `Item Number`==3) %>%
group_by(Items) %>%
summarise(total=n())
freqgad74i<- dat1 %>%
filter(Scale=="GAD-7" & `Item Number`==4) %>%
group_by(Items) %>%
summarise(total=n())
freqgad75i<- dat1 %>%
filter(Scale=="GAD-7" & `Item Number`==5) %>%
group_by(Items) %>%
summarise(total=n())
freqgad76i<- dat1 %>%
filter(Scale=="GAD-7" & `Item Number`==6) %>%
group_by(Items) %>%
summarise(total=n())
freqgad77i<- dat1 %>%
filter(Scale=="GAD-7" & `Item Number`==7) %>%
group_by(Items) %>%
summarise(total=n())
```
Once you've created these lists, you can inspect the lists to make sure that
all of the items in the list are differently worded versions of the same item,
and that there aren't any items that are incorrectly coded.
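For example, printing one of these objects shows each distinct wording recorded
for that item alongside how often it appears; a wording that clearly belongs to
a different item is a sign of a coding error. A minimal sketch of this
inspection step:
```{r}
#Inspect the frequency table for GAD-7 item 1 by printing it:
freqgad71i
#Or display it as a formatted table using flextable:
flextable(freqgad71i)
```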
Now let's do the same thing for all the versions of each item from the CES-D
scale.
```{r}
#-----------------Inspect Data to Check for Coding Errors-----------------------
#The below code will tabulate all of the differently worded versions of each
#CES-D item (and how often each appears) and store them in individual objects
#for inspection, to make sure the survey items are all coded correctly:
freqcesd1i<- dat1 %>%
filter(Scale=="CES-D" & `Item Number`==1) %>%
group_by(Items) %>%
summarise(total=n())
freqcesd2i<- dat1 %>%
filter(Scale=="CES-D" & `Item Number`==2) %>%
group_by(Items) %>%
summarise(total=n())
freqcesd3i<- dat1 %>%
filter(Scale=="CES-D" & `Item Number`==3) %>%
group_by(Items) %>%
summarise(total=n())
freqcesd4i<- dat1 %>%
filter(Scale=="CES-D" & `Item Number`==4) %>%
group_by(Items) %>%
summarise(total=n())
freqcesd5i<- dat1 %>%
filter(Scale=="CES-D" & `Item Number`==5) %>%
group_by(Items) %>%
summarise(total=n())
freqcesd6i<- dat1 %>%
filter(Scale=="CES-D" & `Item Number`==6) %>%
group_by(Items) %>%
summarise(total=n())
freqcesd7i<- dat1 %>%
filter(Scale=="CES-D" & `Item Number`==7) %>%
group_by(Items) %>%
summarise(total=n())
freqcesd8i<- dat1 %>%
filter(Scale=="CES-D" & `Item Number`==8) %>%
group_by(Items) %>%
summarise(total=n())
freqcesd9i<- dat1 %>%
filter(Scale=="CES-D" & `Item Number`==9) %>%
group_by(Items) %>%
summarise(total=n())
freqcesd10i<- dat1 %>%
filter(Scale=="CES-D" & `Item Number`==10) %>%
group_by(Items) %>%
summarise(total=n())
freqcesd11i<- dat1 %>%
filter(Scale=="CES-D" & `Item Number`==11) %>%
group_by(Items) %>%
summarise(total=n())
freqcesd12i<- dat1 %>%
filter(Scale=="CES-D" & `Item Number`==12) %>%
group_by(Items) %>%
summarise(total=n())
freqcesd13i<- dat1 %>%
filter(Scale=="CES-D" & `Item Number`==13) %>%
group_by(Items) %>%
summarise(total=n())
freqcesd14i<- dat1 %>%
filter(Scale=="CES-D" & `Item Number`==14) %>%
group_by(Items) %>%
summarise(total=n())
freqcesd15i<- dat1 %>%
filter(Scale=="CES-D" & `Item Number`==15) %>%
group_by(Items) %>%
summarise(total=n())
freqcesd16i<- dat1 %>%
filter(Scale=="CES-D" & `Item Number`==16) %>%
group_by(Items) %>%
summarise(total=n())
freqcesd17i<- dat1 %>%
filter(Scale=="CES-D" & `Item Number`==17) %>%
group_by(Items) %>%
summarise(total=n())
freqcesd18i<- dat1 %>%
filter(Scale=="CES-D" & `Item Number`==18) %>%
group_by(Items) %>%
summarise(total=n())
freqcesd19i<- dat1 %>%
filter(Scale=="CES-D" & `Item Number`==19) %>%
group_by(Items) %>%
summarise(total=n())
freqcesd20i<- dat1 %>%
filter(Scale=="CES-D" & `Item Number`==20) %>%
group_by(Items) %>%
summarise(total=n())
```
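Writing out one block per item (as above) keeps each object easy to inspect. If
you prefer something more compact, the optional sketch below builds the same
frequency tables for all 20 CES-D items in a single named list (the object name
`freqcesd_all` is just an illustrative choice, not part of the walkthrough's
workflow):
```{r}
#Optional, more compact alternative: build the frequency tables for all 20
#CES-D items in one named list using lapply (freqcesd_all is an illustrative name):
freqcesd_all <- lapply(1:20, function(i) {
  dat1 %>%
    filter(Scale == "CES-D" & `Item Number` == i) %>%
    group_by(Items) %>%
    summarise(total = n(), .groups = "drop")
})
names(freqcesd_all) <- paste0("freqcesd", 1:20, "i")
#You can then inspect any item's table, e.g.:
freqcesd_all$freqcesd3i
```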
Now let's do the same thing for all the versions of each item from the K10
scale.
Note: Because items from the K6 and K5 versions of the K10 are identical or
nearly identical to the corresponding items in the K10, the K6 and K5 versions
of each K10 item will be included when creating lists of all the differently
worded versions of each item.
```{r}
#-----------------Inspect Data to Check for Coding Errors-----------------------
freqK101i <- dat1 %>%
filter(Scale == "K10" & `Item Number` == 1) %>%
  group_by(Items) %>%
summarise(total = n())
#The below code is annotated to show you how we're incorporating the items from
#K6 and K5 into this code:
freqK102i<- dat1 %>% #Start with dataframe 'dat1'
filter((Scale=="K10" & `Item Number`==2) | (Scale=="K6" & `Item Number`==1) | (Scale=="K5" & `Item Number`==1)) %>% #Filter rows where Scale is "K10" and Item Number is 2 OR where Scale is "K6" and Item Number is 1 OR where SCale is K5 and Item Number is 1.
group_by(Items) %>% #Group by the 'Items' column
summarise(total=n()) #Count the number of rows in each group and store the results in a column called 'total' in an object called freqK102i
freqK103i<- dat1 %>%
filter((Scale=="K10" & `Item Number`==3)) %>%
group_by(Items) %>%
summarise(total=n())
freqK104i<- dat1 %>%
filter((Scale=="K10" & `Item Number`==4) | (Scale=="K6" & `Item Number`==2) | (Scale=="K5" & `Item Number`==2)) %>%
group_by(Items) %>%
summarise(total=n())
freqK105i<- dat1 %>%
filter((Scale=="K10" & `Item Number`==5) | (Scale=="K6" & `Item Number`==3) | (Scale=="K5" & `Item Number`==3)) %>%
group_by(Items) %>%
summarise(total=n())
freqK106i<- dat1 %>%
filter((Scale=="K10" & `Item Number`==6)) %>%
group_by(Items) %>%
summarise(total=n())
freqK107i<- dat1 %>%
filter((Scale=="K10" & `Item Number`==7)) %>%
group_by(Items) %>%
summarise(total=n())
freqK108i<- dat1 %>%
filter((Scale=="K10" & `Item Number`==8) | (Scale=="K6" & `Item Number`==4) | (Scale=="K5" & `Item Number`==4)) %>%
group_by(Items) %>%
summarise(total=n())
freqK109i<- dat1 %>%
filter((Scale=="K10" & `Item Number`==9) | (Scale=="K6" & `Item Number`==5) | (Scale=="K5" & `Item Number`==5)) %>%
group_by(Items) %>%
summarise(total=n())
freqK1010i<- dat1 %>%
filter((Scale=="K10" & `Item Number`==10) | (Scale=="K6" & `Item Number`==6)) %>%
group_by(Items) %>%
summarise(total=n())
```
If you identify any coding errors, correct them in your data before moving on.
You could make any needed corrections using functions from the dplyr R package.
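For instance, if one wording of an item had been recorded under the wrong
'Item Number', a correction along the lines of the sketch below (using dplyr's
mutate() and if_else(); the item text and item numbers shown are purely
hypothetical) could be applied before re-running the checks:
```{r}
#Hypothetical example of correcting a coding error: suppose a GAD-7 wording that
#should be Item Number 3 had been recorded as Item Number 2 in the data.
#The item text used here is only an illustration - substitute your own:
dat1 <- dat1 %>%
  mutate(`Item Number` = if_else(
    Scale == "GAD-7" & Items == "Worrying too much about different things",
    3, #The corrected item number
    `Item Number` #Leave all other rows unchanged
  ))
```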
Now, the next thing we'll do is identify the most commonly used wording of
each item in instances where multiple surveys/datasets to be harmonised contain
the same scale, but the wording of each item varies slightly across surveys.
(Note: For information about how to use Harmony to confirm that differently
worded versions of the same scale item can be considered to be 'matches' to
each other, see the other R tutorial available titled: "Harmony Walkthrough -
Using Harmony to Check for Correspondence Between Items Within the Same Scale").
To do this, let's create a function that will help us identify the most
commonly used wordings of each item in a scale. For this example, we'll be
identifying the most commonly used versions of items from the GAD-7, CES-D and
the K10 scales, including matching items from variants of the K10: the K6 and K5.
The function will identify the most commonly used wording for each item and then
we'll store the most common versions of each item in a list.
First, let's do this for the GAD-7:
```{r}
#--------------Identify Most Commonly Used Wording, Save in List----------------
#The below code will create a function to identify the most
#commonly used wordings of items in the GAD-7:
#First we define a function called `get_most_common_item_text` that takes two
#arguments: `scale` and `item_number`:
get_most_common_item_text <- function(scale, item_number) {
# Start with the dataframe `dat1`
dat1 %>%
# Filter the rows where the `Scale` matches the scale specified using the
#`scale` argument and the `Item Number` matches the provided `item_number`
filter(Scale == scale & `Item Number` == item_number) %>%
# Group the remaining rows by the `Items` column
group_by(Items) %>%
# Summarize the data by counting the number of occurrences for each unique value in the `Items` column
summarise(total = n(), .groups = 'drop') %>%
# Arrange the results in descending order of the `total` column (most frequent items first)
arrange(desc(total)) %>%
# Take the first row of the sorted data (i.e., the most common item)
slice(1) %>%
# Extract (pull) the value from the `Items` column of the selected row
pull(Items)
}
#Second let's create and populate a list that will store the most commonly used
#version of each item from the GAD-7:
GAD7 <- list(
instrument_name = "GAD-7", #We save the name of the instrument being processed in the list
questions = lapply(1:7, function(i) { #We use lapply to iterate over the 7 items of the GAD-7 scale
list( #For each item, we create a list with its question number and text
question_no = paste0("GAD7_", i), #We save a label for each question number in the list, e.g., "GAD7_1", "GAD7_2", etc.
question_text = get_most_common_item_text( #We call the function to get the most common wording of item 'i'
scale = "GAD-7", #Use the scale argument in the get_most_common_item_text function to specify the scale being processed ("GAD-7") - note that this must match the name of the instrument as it's recorded in your datafile.
      item_number = i #Use the item number argument to specify the current item number (iterating through items 1 to 7, in this case). Here, we use 'i' as we previously specified 'i' as the placeholder indicating what item number is currently being iterated over using the lapply function.
)
)
})
)
```
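Before moving on, it's worth a quick check that the list was populated as
expected; for example, printing the first entry should show the label "GAD7_1"
together with the most common wording of item 1:
```{r}
#Quick check that the GAD7 list was populated as expected:
GAD7$instrument_name #Should print "GAD-7"
GAD7$questions[[1]] #Should show question_no "GAD7_1" and the most common wording of item 1
```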
Next, we'll do the same thing for the CES-D using the function we just created.
```{r}
#--------------Identify Most Commonly Used Wording, Save in List----------------
#The code below uses the get_most_common_item_text function and creates and
#populates a list that will store the most commonly used version of each item
#from the CES-D:
CESD <- list(
instrument_name = "CES-D", #We save the name of the instrument being processed in the list
questions = lapply(1:20, function(i) { #We use lapply to iterate over the 20 items of the CES-D scale
list( #For each item, we create a list with its question number and text
question_no = paste0("CESD_", i), #We save a label for each question number in the list, e.g., "CESD_1", "CESD_2", etc.
question_text = get_most_common_item_text( #We call the function to get the most common wording of item 'i'
scale = "CES-D", #Use the scale argument in the get_most_common_item_text function to specify the scale being processed ("CES-D") - note that this must match the name of the instrument as it's recorded in your datafile.
item_number = i #Use the item number argument to specify the current item number (iterating through items 1 to 20, in this case)
)
)
})
)
```
Next, we'll do the same thing for the K10 (and its variants: the K6 and K5).
However, we'll now create a new and slightly different function to identify the
most commonly used wording of each item from the K10, K6 and K5 as this function
needs to work a bit differently from the one we used for the GAD-7 and the CES-D.
This is because we now want the function to be able to identify the most
commonly used wording of each item from the K10, as well as the corresponding
items from the K6 and the K5.
Here's how we can update the function to account for this:
```{r}
#--------------Identify Most Commonly Used Wording, Save in List----------------
#First let's create the function. This time the code defines a custom function,
#called get_most_common_item_text_K10, which is designed to retrieve the most
#common text for a specific item number within the K10, K6, or K5 scales.
#It finds the most common text for the specified item across the relevant scales,
#accounting for the mappings between K10, K6, and K5 items.
get_most_common_item_text_K10 <- function(item_number) {
dat1 %>%
filter(
(Scale == "K10" & `Item Number` == item_number) |
(Scale == "K6" & `Item Number` == ifelse(item_number == 2, 1, # K10_2 -> K6_1
ifelse(item_number == 4, 2, # K10_4 -> K6_2
ifelse(item_number == 5, 3, # K10_5 -> K6_3
ifelse(item_number == 8, 4, # K10_8 -> K6_4
ifelse(item_number == 9, 5, # K10_9 -> K6_5
ifelse(item_number == 10, 6, #K10_10 -> K6_6
NA))))))) |
(Scale == "K5" & `Item Number` == ifelse(item_number == 2, 1, # K10_2 -> K5_1
ifelse(item_number == 4, 2, # K10_4 -> K5_2
ifelse(item_number == 5, 3, # K10_5 -> K5_3
ifelse(item_number == 8, 4, # K10_8 -> K5_4
ifelse(item_number == 9, 5, # K10_9 -> K5_5
NA))))))) %>%
group_by(Items) %>%
summarise(total = n(), .groups = 'drop') %>%
arrange(desc(total)) %>%
slice(1) %>%
pull(Items)
}
#Now let's create and populate a list that will store the most commonly used
#version of each item from the K10 (and the K6 and K5).
K10 <- list(
instrument_name = "K10",
questions = lapply(1:10, function(i) {
list(
question_no = paste0("K10_", i),
question_text = get_most_common_item_text_K10(i)
)
})
)
```
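As a quick sanity check, the new function can also be called directly for a
single item; for K10 item 2, for example, it should return the most common
wording found across K10 item 2, K6 item 1 and K5 item 1 combined:
```{r}
#Sanity check: the most common wording of K10 item 2, pooled across the
#corresponding K6 item 1 and K5 item 1:
get_most_common_item_text_K10(2)
```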
Now that we've identified the most commonly used wording of each item for each
scale of interest to us and stored those items in lists, we can move on to using
functions from the harmonydata package to check for matches between items in
each scale.
To do this, we can use the code below:
```{r}
#-----------------Harmony Tests for Matches Between Scales----------------------
#In this example, we're only checking for matches between the GAD-7, CES-D and
#the K10 (and its variants), but you could adapt the code shown below to
#quickly check for matches between more than three scales as well.
#First, we define a list of the scales we want to use to check for matches
#with another scale we'll specify later.
#Here, make sure you type the name of each scale exactly as you named the list
#objects created above (e.g., CESD, K10), since we'll use get() to retrieve
#those objects inside the loop.
scales_list <- c("CESD", "K10")
#Note in the above code that we left out one of the three scales - this is the
#scale that will be compared to the other two scales.
#Next, we tell R what our output directory is (where we want it to save exported
#files to). Remember to update this to wherever you would like to save your files.
output_directory <- "~/Data Harmonisation Project/Content Analysis"
#Now, we'll create a loop that will check for matches between the scale we specify
#and all of the other scales we specified in our list of scales (`scales_list`).
#Because we want the results produced by the code in the loop to be saved for
#each scale we process through the loop, we first initialize an empty named
#list that will be used to store the variable pairs for each scale:
all_variable_pairs <- list()
#Now we use the below code to create and run the loop:
for (scale in scales_list) {
instruments_list <- list(GAD7, get(scale)) #Here, we use get() to retrieve the actual scale object
match = match_instruments(instruments_list) #Here, we use the match_instruments function from harmonydata to check for matches
  df <- data.frame(match$matches[[1]]) #Here, we initialise a data frame using the first row of the similarity matrix
  for (x in 1:length(match$matches)) { #Then we fill each row of the data frame with the corresponding row of similarity scores
    df[x, ] = match$matches[[x]]
  }
#Then we set the column and row names in the data frame:
colnames(df) <- lapply(match$questions, function(x) paste(x$question_no, x$question_text , sep=" "))
rownames(df) <- lapply(match$questions, function(x) paste(x$question_no, x$question_text , sep=" "))
#Then convert the dataframe to a matrix
matrix_df <- as.matrix(df)
#Then we create a logical matrix to identify matches over the threshold we've decided to use to identify what is/isn't a 'match'.
logical_matrix <- (matrix_df > 0.70) | (matrix_df < -0.70)
#Then we create a subset matrix with NA for non-matches
subset_matrix <- matrix_df
subset_matrix[!logical_matrix] <- NA
  subset_matrix <- subset_matrix[rowSums(!is.na(subset_matrix)) > 0, colSums(!is.na(subset_matrix)) > 0, drop = FALSE] #Drop rows/columns with no matches; drop = FALSE keeps the matrix structure even if only one row or column remains
#Now we get the scale name for file naming
scale_name <- as.character(scale) #We convert the scale name to a character variable for file naming
#Now we write the matches to a CSV file. Remember to update the file name to something appropriate:
write.csv(subset_matrix, file.path(output_directory, paste0("matches_gad7_", scale_name, ".csv")), row.names = TRUE)
#We find variable pairs that have matching indices that indicate a match:
matching_indices <- which(logical_matrix, arr.ind = TRUE)
#We create a data frame for the variable pairs:
variable_pairs <- data.frame(
Var1 = rownames(matrix_df)[matching_indices[, 1]],
Var2 = colnames(matrix_df)[matching_indices[, 2]],
Value = matrix_df[matching_indices]
)
#And then we remove self-pairs (where a match has been identified between a given item and itself)
variable_pairs <- variable_pairs[variable_pairs$Var1 != variable_pairs$Var2, ]
#Then we save the variable pairs to the all_variable_pairs list using the scale name as the key
all_variable_pairs[[paste0("variable_pairs_", scale_name)]] <- variable_pairs
#Then we write the variable pairs to a CSV file. Remember to update the file name to something appropriate:
write.csv(variable_pairs, file.path(output_directory, paste0("variable_pairs_gad7_", scale_name, ".csv")), row.names = FALSE)
}
#After running the loop, `all_variable_pairs` will contain a set of data frames,
#with one data frame for each scale that was used to check for matches with the
#GAD-7.
```
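At this point you can also inspect the stored results directly in R; for
example, listing the names of `all_variable_pairs` and printing the first few
rows of one of its data frames gives a quick sense of which item pairs were
flagged as matches:
```{r}
#Inspect the stored results:
names(all_variable_pairs) #The names of the stored data frames (one per scale)
head(all_variable_pairs$variable_pairs_K10) #The first few GAD-7/K10 item pairs flagged as matches
```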
You can also use R to display your results in a formatted table.
```{r, echo=TRUE}
#-------------------Display Results in Formatted Tables-------------------------
#The below code will display the results in formatted tables.
#First we set the default settings for tables created with the flextable package:
set_flextable_defaults(
font.family = "Times New Roman", font.size = 12,
border.color = "black", big.mark = "", border.width = .5)
#To create tables for each set of results we stored in the list called
# all_variable_pairs, we'll use another loop that will iterate over each set of
#results and create a formatted table for each set of results.
#To do this, we first initialize an empty named list that will be used to store
#the tables:
tables_list <- list()
for (name in names(all_variable_pairs)) {
#Get the current variable pairs dataframe
variable_pairs <- all_variable_pairs[[name]]
#Rename columns in the dataframes as desired to what you want the column headers to say in the formatted table:
variable_pairs <- rename(variable_pairs,
`Item 1` = Var1,
`Item 2` = Var2,
`Similarity Score` = Value)
#Create the table:
table <- flextable(variable_pairs) %>%
bold(part = "header") %>%
autofit() %>%
set_table_properties(width = 1.0, layout = "autofit") %>%
set_caption(caption = "Variable Pairs and Their Similarity Scores")
#Store the table in the list
tables_list[[name]] <- table
}
#You can then use the below code to export the tables to Word document (.docx) format:
for (name in names(tables_list)) {
save_as_docx(tables_list[[name]], path = file.path(output_directory, paste0(name, ".docx")))
}
#You can also display each table in R (in the viewer or within an RMarkdown/RNotebook file)
tables_list$variable_pairs_K10 #For variable pairs between the GAD-7 and K10
tables_list$variable_pairs_CESD #For variable pairs between the GAD-7 and CES-D
```
```{r}
```