Commit

Add support for handling speaker markers in Apple generated VTT files
stevencrader committed Jan 27, 2024
1 parent b92ae1e commit 87cebc9
Showing 8 changed files with 521 additions and 20 deletions.
3 changes: 1 addition & 2 deletions .idea/runConfigurations/All_Tests.xml

2 changes: 2 additions & 0 deletions README.md
@@ -315,6 +315,8 @@ Transcripts used for testing are excerpts from the following shows.
- [How to Start a Podcast](https://feeds.buzzsprout.com/1/2562823/)
- how_to_start_a_podcast.json
- how_to_start_a_podcast.html
- [Podnews Daily (2024-01-25)](https://podnews.net/update/nz-podcast-summit-2024)
- podnews_daily_2024-01-25.vtt
- [Podnews Weekly Review (2023-03-17)](https://feeds.buzzsprout.com/1538779/12458004/)
- podnews_weekly_review_2023-03-17.html
- [Podnews Weekly Review (2023-05-05)](https://feeds.buzzsprout.com/1538779/12782529/)
7 changes: 6 additions & 1 deletion src/speaker.ts
@@ -2,6 +2,7 @@
* Regular expression for detecting speaker in string
*/
const PATTERN_SPEAKER = /^(?<speaker>[a-z].+?): (?<body>.*)/i
+ const PATTERN_SPEAKER_2 = /^<v (?<speaker>\w+?)>(?<body>.*)/i

/**
* Attempt to extract the speaker's name from the data.
@@ -14,10 +15,14 @@ const PATTERN_SPEAKER = /^(?<speaker>[a-z].+?): (?<body>.*)/i
export const parseSpeaker = (data: string): { speaker: string; message: string } => {
let speaker = ""
let message = data.trimStart()
- const speakerMatch = PATTERN_SPEAKER.exec(data)
+ let speakerMatch = PATTERN_SPEAKER.exec(data)
+ if (speakerMatch === null) {
+ speakerMatch = PATTERN_SPEAKER_2.exec(data)
+ }
if (speakerMatch !== null) {
speaker = speakerMatch.groups.speaker
message = speakerMatch.groups.body
}

return { speaker, message }
}
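
The fallback added in this hunk can be tried in isolation. The following is a minimal sketch, not the committed code: it copies the two regular expressions from the diff, while the name `parseSpeakerSketch`, the `ParsedSpeaker` type, and the use of `??` to chain the two attempts are assumptions made for illustration.

```typescript
// Regular expressions copied from the diff in src/speaker.ts.
const PATTERN_SPEAKER = /^(?<speaker>[a-z].+?): (?<body>.*)/i
const PATTERN_SPEAKER_2 = /^<v (?<speaker>\w+?)>(?<body>.*)/i

// Hypothetical type name, used only in this sketch.
type ParsedSpeaker = { speaker: string; message: string }

// Hypothetical function name; the committed function is parseSpeaker.
const parseSpeakerSketch = (data: string): ParsedSpeaker => {
    // Try the "Name: text" form first, then fall back to the "<v Name>text" voice tag.
    const match = PATTERN_SPEAKER.exec(data) ?? PATTERN_SPEAKER_2.exec(data)
    if (match === null || match.groups === undefined) {
        // No speaker found: return the text with leading whitespace removed.
        return { speaker: "", message: data.trimStart() }
    }
    return { speaker: match.groups.speaker ?? "", message: match.groups.body ?? "" }
}

// A cue payload taken from the new test fixture.
const parsed = parseSpeakerSketch("<v SPEAKER_1>It's a great event.")
console.log(parsed) // { speaker: "SPEAKER_1", message: "It's a great event." }
```

Because `PATTERN_SPEAKER_2` uses `\w+?` for the name, a voice tag whose name contains a space (for example `<v John Smith>`) would not match either pattern in this sketch and would be returned with an empty speaker.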
5 changes: 2 additions & 3 deletions test/json.test.ts
@@ -3,7 +3,7 @@ import { describe, expect, test } from "@jest/globals"
import { IOptions, Options, Segment } from "../src"
import { parseJSON } from "../src/formats/json"

- import { readFile, saveSegmentsToFile, TestFiles } from "./test_utils"
+ import { readFile, TestFiles } from "./test_utils"

describe("JSON formats test", () => {
test.each<{
@@ -198,12 +198,11 @@ describe("Parse JSON file data", () => {
},
id: "Podnews Weekly Review 2023-06-02, extra space",
},
- ])("Parse JSON File ($id)", ({ filePath, expectedFilePath, options, id }) => {
+ ])("Parse JSON File ($id)", ({ filePath, expectedFilePath, options }) => {
const data = readFile(filePath)
const expectedJSONData = JSON.parse(readFile(expectedFilePath))
Options.setOptions(options)
const segments = parseJSON(data)
- saveSegmentsToFile(segments, `out_json_${id}.json`)
expect(segments).toEqual(expectedJSONData.segments)
})
})
133 changes: 133 additions & 0 deletions test/test_files/podnews_daily_2024-01-25.vtt
@@ -0,0 +1,133 @@
WEBVTT
0.860 --> 7.380
<v SPEAKER_1>Podcasting is coming back to New Zealand again, the latest from podnews.net with Hindenburg Pro's auto levels.

10.100 --> 13.480
<v SPEAKER_1>The New Zealand Podcasting Summit 2024 has been announced.

13.500 --> 15.620
<v SPEAKER_1>It's on May the 11th in Auckland.

15.840 --> 25.560
<v SPEAKER_1>The speaker list includes a number of podcasters from across Outer Roa, including MediaWorks, NZMe, RNZ and Stuff Audio, and a number of independent producers.

26.080 --> 28.520
<v SPEAKER_1>I spoke there last year, and I'd recommend it.

28.580 --> 33.680
<v SPEAKER_1>Tickets are available in a link you'll find in our show notes and our newsletter at podnews.net.

34.480 --> 38.520
<v SPEAKER_1>Good luck to those going to Podfest, which kicks off today in Orlando and Florida.

38.540 --> 39.400
<v SPEAKER_1>It's a great event.

39.720 --> 41.840
<v SPEAKER_1>Want to check out a new podcast app?

41.860 --> 44.720
<v SPEAKER_1>Fountain has today launched on Product Hunt.

44.740 --> 46.080
<v SPEAKER_1>You should go and take a peek.

46.280 --> 49.520
<v SPEAKER_1>There's a link to that in our show notes and our newsletter at podnews.net.

50.240 --> 55.980
<v SPEAKER_1>There's a new book out called Podcasting in a Platform Age, written by John L Sullivan, published today.

56.440 --> 1:01.080
<v SPEAKER_1>What happens when money pours into a medium which has historically been more grassroots?

1:01.460 --> 1:11.360
<v SPEAKER_1>The book details the efforts of industry players to transform podcasting into a profitable medium, which are beginning to challenge the very definition of podcasting itself.

1:12.260 --> 1:20.120
<v SPEAKER_1>Acast podcast ads in Ireland have delivered 7 million euro to podcasters since the end of 2019, according to the company today.

1:20.540 --> 1:25.460
<v SPEAKER_1>Mia Lobel, formerly at Pushkin Industries, is now running a podcast consultancy.

1:25.480 --> 1:32.400
<v SPEAKER_1>She writes, Willie D won't sell his podcast despite multi-million dollar offers almost weekly.

1:32.680 --> 1:36.060
<v SPEAKER_1>According to Hip Hop DX, he's the host of Willie D Live.

1:36.360 --> 1:42.700
<v SPEAKER_1>The show is predominantly on YouTube and most of his shows have got fewer than 10,000 views.

1:43.220 --> 1:45.580
<v SPEAKER_1>And in Stockholm, it's all going on.

1:45.660 --> 1:51.360
<v SPEAKER_1>We've linked to a story from Expressen, a news website in the newsletter.

1:51.640 --> 1:56.040
<v SPEAKER_1>We've just left the Swedish words, but here we'll translate them.

1:56.440 --> 2:00.660
<v SPEAKER_1>Man in suit throws poo at Spotify's front door.

2:01.460 --> 2:03.500
<v SPEAKER_1>Perhaps he couldn't find the unsubscribe button.

2:04.320 --> 2:09.680
<v SPEAKER_1>And in podcast news, anyone who tells you women don't need financial advice specifically for them is wrong.

2:09.760 --> 2:22.520
<v SPEAKER_1>According to Her Money with Jean Chatsky, she takes every audience of women through the steps they need to take today to live comfortably and worry free tomorrow with the latest research, expert tips and personal advice.

2:23.200 --> 2:25.700
<v SPEAKER_1>Song Exploder is celebrating its 10th anniversary.

2:25.720 --> 2:34.860
<v SPEAKER_1>PRX says the show will have a number of live appearances, including an Evolutions by Podcast movement in March and with the Toronto Symphony Orchestra in April.

2:34.980 --> 2:38.020
<v SPEAKER_1>We can also expect a new version of the first ever show.

2:38.940 --> 2:48.640
<v SPEAKER_1>For the love of rugby is new from Crowdsports today, fronted by England's most capped men's player Ben Youngs and current England international Dan Cole.

2:49.040 --> 2:59.940
<v SPEAKER_1>The Olympics this year will mean a lot of new shows like Tom Dean Medal Machine, new from Global today, following the career of swimmer Tom Dean, who is attempting a record breaking five medal haul.

3:00.300 --> 3:08.040
<v SPEAKER_1>And Mind the Business, small business success stories is back for a new season from Intuit QuickBooks and iHeart Media.

3:08.780 --> 3:11.780
<v SPEAKER_1>Traders for Selected Shows are in our new podcast, Traders Podcast.

3:11.800 --> 3:13.520
<v SPEAKER_1>You'll find that wherever you get your podcasts.

3:13.880 --> 3:16.740
<v SPEAKER_1>And this podcast is sponsored by Hindenburg Pro.

3:17.000 --> 3:19.320
<v SPEAKER_1>Still setting your audio levels clip by clip.

3:19.900 --> 3:21.220
<v SPEAKER_1>Oh, the humanity.

3:21.240 --> 3:23.520
<v SPEAKER_1>Hindenburg Pro, see what we did there.

3:23.600 --> 3:26.500
<v SPEAKER_1>Hindenburg Pro takes care of all of that for you.

3:26.520 --> 3:33.020
<v SPEAKER_1>So you can get on with telling your story, setting your audio levels is an excellent thing that Hindenburg Pro does.

3:33.380 --> 3:40.300
<v SPEAKER_1>And today you can get a special 90 day trial and 30% off by following the links at podnews.net/latest.

3:40.700 --> 3:44.080
<v SPEAKER_1>And that's the latest from our newsletter to read all the stories and subscribe.

3:44.100 --> 3:45.620
<v SPEAKER_1>We're at podnews.net.
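
The cue timings in this fixture omit leading zero fields (`0.860`, `1:01.080`), which is shorter than the `mm:ss.ttt` form the WebVTT specification describes. As a rough sketch of how such values could be normalized to seconds, the helper below folds the colon-separated fields from left to right. `timestampToSeconds` is a hypothetical name and is not part of this commit or of the library's API.

```typescript
// Hypothetical helper, not part of the library: converts "[[hh:]mm:]ss.mmm" to seconds.
const timestampToSeconds = (timestamp: string): number => {
    const fields = timestamp.trim().split(":")
    // Reject empty fields and anything with more than hours, minutes, and seconds.
    if (fields.length > 3 || fields.some((field) => field.trim() === "")) {
        throw new Error(`Unrecognized timestamp: ${timestamp}`)
    }
    const parts = fields.map(Number)
    if (parts.some((part) => Number.isNaN(part))) {
        throw new Error(`Unrecognized timestamp: ${timestamp}`)
    }
    // Fold left so each earlier field is worth 60 of the next: (hh * 60 + mm) * 60 + ss.
    return parts.reduce((total, part) => total * 60 + part, 0)
}

// Timings taken from the fixture above.
console.log(timestampToSeconds("0.860"))
console.log(timestampToSeconds("1:01.080"))
```

The left fold keeps the helper indifferent to how many leading fields are present, so the same code handles seconds-only, minutes and seconds, and full hour timestamps.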