Update DF Functions to Reflect AUT PR #379 #23

Closed
11 tasks
ianmilligan1 opened this issue Nov 17, 2019 · 4 comments

ianmilligan1 commented Nov 17, 2019

In AUT #349, @ruebot updated the DF functions to be consistent with Python DF functions. The docs should be updated to reflect this.

I think this is the list:

  • extractValidPages -> pages
  • extractHyperlinks -> webgraph
  • extractImages -> images
  • extractImageLinks -> imageLinks
  • extractPDFs -> pdfs
  • extractAudio -> audio
  • extractVideo -> videos
  • extractSpreadsheets -> spreadsheets
  • extractPresentationProgram -> presentationProgramFiles
  • extractWordProcessor -> wordProcessorFiles
  • extractTextFiles -> textFiles
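
The renames above could be applied mechanically with `sed`. This is a minimal sketch, not the method used in the PR. It assumes the old calls appear in the docs with a `DF` or `DetailsDF` suffix (for example `extractValidPagesDF`), which should be confirmed with a search before running it. `rename_df_calls` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: rewrite old DF method names to the new names from the list above.
# Assumption: old names carry a "DF" or "DetailsDF" suffix in the docs.
# rename_df_calls is a hypothetical helper; it reads stdin and writes stdout.
rename_df_calls() {
  sed \
    -e 's/extractValidPagesDF/pages/g' \
    -e 's/extractHyperlinksDF/webgraph/g' \
    -e 's/extractImageLinksDF/imageLinks/g' \
    -e 's/extractImageDetailsDF/images/g' \
    -e 's/extractPDFDetailsDF/pdfs/g' \
    -e 's/extractAudioDetailsDF/audio/g' \
    -e 's/extractVideoDetailsDF/videos/g' \
    -e 's/extractSpreadsheetDetailsDF/spreadsheets/g' \
    -e 's/extractPresentationProgramDetailsDF/presentationProgramFiles/g' \
    -e 's/extractWordProcessorDetailsDF/wordProcessorFiles/g' \
    -e 's/extractTextFilesDetailsDF/textFiles/g'
}

# Example: rewrite a single sample line.
printf '%s\n' '  .extractValidPagesDF()' | rename_df_calls
```

To rewrite a file, redirecting the output to a temporary file and moving it over the original avoids relying on `sed -i`, whose syntax differs between GNU and BSD `sed`.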

ruebot commented Nov 17, 2019

last one should be textFiles, and video should be videos

@ianmilligan1

Fixed, thx @ruebot

ruebot commented Nov 18, 2019

Here is what we need to update in the current directory, based on #21. (@lintool, we need your review on that so we can move forward here, please.)

extractValidPages

$ ag -R extractValidPages current 
current/collection-analysis.md
35:RecordLoader.loadArchives("src/test/resources/warc/example.warc.gz", sc).extractValidPagesDF()
78:RecordLoader.loadArchives("src/test/resources/warc/example.warc.gz", sc).extractValidPagesDF()

current/index.md
22:  .extractValidPagesDF()
34:  .extractValidPagesDF()

current/text-analysis.md
69:  .extractValidPagesDF()

current/link-analysis.md
333:          .extractValidPagesDF()

current/df-results.md

extractHyperlinks

$ ag -R extractHyperlinks current
current/index.md
61:  .extractHyperlinksDF()
73:  .extractHyperlinksDF()

extractImage

$ ag -R extractImage current
current/index.md
118:val df = RecordLoader.loadArchives("example.arc.gz", sc).extractImageDetailsDF();
143:val df = RecordLoader.loadArchives("example.arc.gz", sc).extractImageDetailsDF();
187:- Images: `extractImageDetailsDF()`

current/image-analysis.md
25:  .extractImageDetailsDF();
85:  .extractImageDetailsDF();
186:  .extractImageLinksDF();

extractPDF

$ ag -R extractPDF current  
current/index.md
188:- PDFs: `extractPDFDetailsDF()`
224:val df = RecordLoader.loadArchives("example.arc.gz", sc).extractPDFDetailsDF();

current/binary-analysis.md
157:  .extractPDFDetailsDF();
214:  .extractPDFDetailsDF();

extractAudio

$ ag -R extractAudio current
current/binary-analysis.md
30:  .extractAudioDetailsDF();
82:  .extractAudioDetailsDF();

current/index.md
186:- Audio: `extractAudioDetailsDF()`

extractVideo

$ ag -R extractVideo current
current/index.md
192:- Videos: `extractVideoDetailsDF()`
201:val df = RecordLoader.loadArchives("example.media.warc.gz", sc).extractVideoDetailsDF();

current/binary-analysis.md
685:  .extractVideoDetailsDF();
742:  .extractVideoDetailsDF();

extractSpreadsheet

$ ag -R extractSpreadsheet current 
current/binary-analysis.md
421:  .extractSpreadsheetDetailsDF();
478:  .extractSpreadsheetDetailsDF();

current/index.md
190:- Spreadsheets: `extractSpreadsheetDetailsDF()`

extractPresentationProgram

$ ag -R extractPresentationProgram current
current/index.md
189:- Presentation program files: `extractPresentationProgramDetailsDF()`

current/binary-analysis.md
289:  .extractPresentationProgramDetailsDF();
346:  .extractPresentationProgramDetailsDF();

extractWordProcessor

$ ag -R extractWordProcessor current      
current/index.md
193:- Word processor files: `extractWordProcessorDetailsDF()`

current/binary-analysis.md
817:  .extractWordProcessorDetailsDF();
874:  .extractWordProcessorDetailsDF();

extractTextFile

$ ag -R extractTextFile current     
current/index.md
191:- Text files: `extractTextFilesDetailsDF()`

current/binary-analysis.md
553:  .extractTextFilesDetailsDF();
610:  .extractTextFilesDetailsDF();
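
The per-name searches above can be collapsed into one pass, which is also useful for confirming that nothing was missed once the docs are updated. This is a sketch under two assumptions: every old name matches the pattern `extract<Name>DF(`, as in the results above, and plain `grep` is used in place of `ag`. `count_old_df_calls` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: count lines that still use an old-style DF call under a directory.
# Assumption: every old name looks like extract<Name>DF( ; grep stands in for ag.
# count_old_df_calls is a hypothetical helper; $1 is the directory to search.
count_old_df_calls() {
  grep -rEn 'extract[A-Za-z]+DF\(' "$1" | wc -l | tr -d ' '
}

# Run against the docs directory only if it exists here.
if [ -d current ]; then
  count_old_df_calls current
fi
```

A result of `0` would suggest that all old calls matching the pattern have been replaced. Dropping the `wc -l` stage lists the matching files and line numbers instead.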

ianmilligan1 commented Feb 3, 2020

Closed with #40, commit 536f277
