forked from kayleealexander/RMA-Tool
project notes
Here are our project notes from Friday, April 19, 2024
Project notes from working meeting:
We continued to look at issues with harvesting OAI and converting XML to CSV
We decided to focus only on downloaded CSV and TSV files
Kaylee explored writing something that would help Rachel convert TSV files. Or is this done in OpenRefine? Does this need to be revisited?
We had discussions about supplementing the lexicon.
Kaylee wrote code and Rachel tested on her PC
One issue raised was the idea of searching for phrases, for example "biological male"
We also thought of supplementing the lexicon with lists of LCSH terms that were outdated, in addition to terms that were problematic.
We tried to find ways of getting automated lists of changed LCSH subject headings and did not come up with a good solution.
Rachel and Anna are copying and pasting lists of changes from Classweb into Excel and adjusting them in order to come up with these terms:
https://classweb.org/approved-subjects/ (Rachel to do from 2011-2016, Anna to work on 2017-2024)
At the close of day, there is a working script, with some issues: matching on parts of words (a tokenizing problem?) causes the "riot" in "Marriott" to appear as a match.
To resolve this, we need to change the section of find matches that defines what the script is looking for
Kaylee cleared out the riots as a closing activity today! WOO!
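The partial-word problem above ("riot" inside "Marriott") and the phrase-search idea ("biological male") can both be illustrated with one small sketch. This is not the project's script: the function name find_whole_word_matches and the sample sentence are hypothetical, and it assumes the lexicon is a plain list of terms or phrases. It uses only Python's standard re module.

```python
import re

def find_whole_word_matches(text, lexicon):
    """Return (term, matched_text, start) for whole-word, case-insensitive hits.

    Hypothetical helper for illustration; not taken from the project script.
    """
    matches = []
    for term in lexicon:
        # Allow any run of whitespace between the words of a phrase.
        body = r"\s+".join(re.escape(word) for word in term.split())
        # (?<!\w) and (?!\w) reject hits that are glued to other word
        # characters, so "riot" cannot match inside "Marriott".
        pattern = re.compile(rf"(?<!\w){body}(?!\w)", re.IGNORECASE)
        for m in pattern.finditer(text):
            matches.append((term, m.group(0), m.start()))
    return matches

# Made-up sample text, for illustration only.
sample = "Riot at the Marriott hotel; the form lists biological male."
hits = find_whole_word_matches(sample, ["riot", "biological male"])
for term, found, start in hits:
    print(term, "->", repr(found), "at", start)
```

Because a phrase is compiled into a single pattern, multi-word lexicon entries are matched as a unit rather than token by token.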
May 1, 2024
We reviewed current progress. Rachel tested the GUI; it did not work
Anna prepared Aileen H Clyde sample metadata; it is in the project Box folder
Kaylee notes that we can add more information to the description of the project in advance of our next meeting
June 6
We are going to do CSV only
We are still considering the progress bar
Anna will review documentation, see if there is anything additional to add about the requirement for CSV files
Rachel will finalize lexicons
Kaylee will add additional column output for original context
We discussed the issues with matching against the problem LCSH lexicon
Kaylee thinks that the processing step that strips punctuation from the metadata is also removing the subdivision separators between LCSH headings and subheadings
This may be why matching against the LCSH lexicon does not function well.
Anna asked whether it would be useful to add intermediate steps that print out some of the behind-the-scenes processing
Maybe we need a separate tool that could process digital library LCSH against an LCSH lexicon
Also, maybe extract only the main headings from the LCSH changes, not the subheadings, in order to pick up broad matches
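One way to keep subdivisions intact during cleanup, and to pull out only the main heading for broad matching, is sketched below. It is an assumption-laden illustration rather than the tool's actual processing step: it assumes subdivisions are written with a double hyphen ("--"), and the function names and the sample heading are hypothetical.

```python
import re

SUBDIVISION = "--"  # assumption: subdivisions are separated by a double hyphen

def split_heading(heading):
    """Split an LCSH string into cleaned parts: [main heading, subdivision, ...]."""
    parts = re.split(r"\s*--\s*", heading)
    cleaned = []
    for part in parts:
        # Strip punctuation inside each part only after splitting,
        # so the "--" boundaries are never lost.
        part = re.sub(r"[^\w\s]", " ", part.lower())
        part = " ".join(part.split())
        if part:
            cleaned.append(part)
    return cleaned

def normalize_heading(heading):
    """Cleaned heading with its subdivision separators preserved."""
    return SUBDIVISION.join(split_heading(heading))

def main_heading(heading):
    """Only the main heading, for broad matches that ignore subdivisions."""
    parts = split_heading(heading)
    return parts[0] if parts else ""

# Hypothetical sample heading, for illustration only.
example = "Women--Utah--Biography."
print(split_heading(example))      # intermediate step, useful for inspection
print(normalize_heading(example))
print(main_heading(example))
```

Printing the result of split_heading is also a simple way to expose an intermediate processing step when checking why a heading did or did not match.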