Tutorials

Quick and dirty tutorials and templates

Here are some examples (go into directories for more):

Text and NLP processing in `text/`

Example (there are more): sentiment_analysis_emotion.ipynb

Machine learning in `machine_learning/`

Example (there are more): classifier.ipynb

Template files

template.ipynb
template.py

# Plotting and input/output in `plotting_and_io/`

Save static Plotly images in Colab

save_plotly_colab.ipynb

Dump / load JSON files

load_write_json.ipynb
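
For quick reference, here is a minimal sketch of a JSON round trip using only the standard library. The helper names `write_json` and `load_json` are illustrative and are not necessarily what the notebook uses.

```python
import json
import tempfile
from pathlib import Path


def write_json(obj, path):
    """Serialize obj to path as UTF-8 JSON (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(obj, f, indent=2, ensure_ascii=False)


def load_json(path):
    """Read a JSON file back into Python objects (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Round trip through a temporary file so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.json"
    record = {"name": "sample", "values": [1, 2, 3]}
    write_json(record, path)
    loaded = load_json(path)
```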

Rename files

brew install rename
rename 's/cleaned/16khz/' *.wav
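
If `rename` is not available (for example, outside macOS/Homebrew), a rough Python equivalent could look like the sketch below. `rename_files` is a hypothetical helper, not part of this repository; like `s/cleaned/16khz/`, it replaces only the first match in each file name.

```python
import re
import tempfile
from pathlib import Path


def rename_files(directory, pattern, replacement, glob="*.wav"):
    """Rename files matching glob, replacing the first regex match in each name."""
    renamed = []
    # sorted() materializes the listing before any file is renamed.
    for path in sorted(Path(directory).glob(glob)):
        new_name = re.sub(pattern, replacement, path.name, count=1)
        if new_name != path.name:
            path.rename(path.with_name(new_name))
            renamed.append(new_name)
    return renamed


# Demo on throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a_cleaned.wav", "b_cleaned.wav", "notes.txt"):
        (Path(tmp) / name).touch()
    renamed = rename_files(tmp, r"cleaned", "16khz")
    remaining = sorted(p.name for p in Path(tmp).iterdir())
```

Try it on a copy of your data first, since a rename can overwrite an existing file with the same target name on some platforms.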


Send emails in `emails/`

First, create a script called email_config.py with these variables, filled in with your own details:

port = 465  # SSL port, e.g., 465
smtp_server = "outgoing.mit.edu"  # outgoing SMTP server, e.g., "outgoing.mit.edu", including the "outgoing" part
from_email_actual = "username"  # account that actually sends the email: just the username of that address, e.g., "[email protected]"
from_email_appears = "[email protected]"  # address shown as the sender; it can be the same as from_email_actual. Sending as a different address requires authorization, which you can configure in your email settings.
also_send_to = ['[email protected]', '[email protected]']  # addresses that receive a copy, or an empty list. The recipient will not see these. Note that some .edu accounts cannot send to themselves.
cc = None  # visible to the recipient. Note that some .edu accounts cannot send to themselves.
testing = True  # if True, send the emails to testing_to_email to check that everything runs and the HTML formatting looks right
testing_to_email = '[email protected]'
testing_append_subject_line = '[Test] '
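
To show how these settings might fit together, here is a minimal sketch using the standard library's `smtplib` and `email.message`. It is not the actual email_send.py: `build_message` and `send` are hypothetical helpers, the example.com addresses are placeholders, treating also_send_to as Bcc is inferred from the comment that recipients do not see those addresses, and dropping Cc/Bcc in testing mode is an assumption.

```python
import smtplib
import ssl
from email.message import EmailMessage
from getpass import getpass


def build_message(subject, html_body, to_email, from_email, bcc=(), cc=None,
                  testing=False, testing_to_email=None, testing_prefix="[Test] "):
    """Build an HTML email; in testing mode, redirect it to testing_to_email."""
    msg = EmailMessage()
    if testing:
        # Assumption: test emails go only to the test address, with no Cc/Bcc.
        subject = testing_prefix + subject
        to_email = testing_to_email
        bcc, cc = (), None
    msg["Subject"] = subject
    msg["From"] = from_email
    msg["To"] = to_email
    if cc:
        msg["Cc"] = ", ".join(cc)
    if bcc:
        # send_message() delivers to Bcc addresses but does not transmit the header.
        msg["Bcc"] = ", ".join(bcc)
    msg.set_content(html_body, subtype="html")
    return msg


def send(msg, smtp_server, port, username):
    """Prompt for the password and send msg over SSL (not called in this sketch)."""
    password = getpass(f"Password for {username}: ")
    context = ssl.create_default_context()
    with smtplib.SMTP_SSL(smtp_server, port, context=context) as server:
        server.login(username, password)
        server.send_message(msg)


msg = build_message(
    subject="Hello",
    html_body="Hi Ada,<br>Thanks for participating.",
    to_email="recipient@example.com",   # placeholder address
    from_email="sender@example.com",    # placeholder address
    bcc=["copy@example.com"],
    testing=True,
    testing_to_email="tester@example.com",
)
```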

In email_content.py, define the subject line and body for each type of email. Use HTML (e.g., <br> for line breaks).
Define a CSV file with the recipients' emails and an email_type column. This lets email_send.py pull the matching subject line and body from email_content.py. You pass the path to this CSV file as an argument (see the command below). The example uses the columns to_email, name, email_type, and prizes, but you can include whatever you need and adapt email_content.py accordingly.
Run the script. It will prompt you for the password of from_email_actual:

python3 email_send.py --path_to_dataframe=path/to/dataframe.csv
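
The sketch below illustrates how an email_type column can select a subject line and body per row. The `TEMPLATES` dictionary and `render_emails` function are hypothetical stand-ins for whatever email_content.py and email_send.py actually define; only the column names come from the example above.

```python
import csv
import io

# Hypothetical templates; the real ones live in email_content.py.
TEMPLATES = {
    "winner": {
        "subject": "You won a prize",
        "body": "Hi {name},<br>You won: {prizes}.",
    },
    "thanks": {
        "subject": "Thank you",
        "body": "Hi {name},<br>Thanks for taking part.",
    },
}


def render_emails(rows):
    """Pick the template named by each row's email_type and fill in its columns."""
    emails = []
    for row in rows:
        template = TEMPLATES[row["email_type"]]
        emails.append({
            "to_email": row["to_email"],
            "subject": template["subject"],
            # Unused columns are ignored by str.format.
            "body": template["body"].format(**row),
        })
    return emails


# In-memory CSV standing in for path/to/dataframe.csv.
csv_text = (
    "to_email,name,email_type,prizes\n"
    "ada@example.com,Ada,winner,a book\n"
    "bob@example.com,Bob,thanks,\n"
)
emails = render_emails(csv.DictReader(io.StringIO(csv_text)))
```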

Annotations

Annotations - Colab option

audio_annotation.ipynb is a Colab approach.

Annotations - CLI option

audio_annotation.py reads audio from the ./data/input/vfp_audios_16khz/ directory (just the first third of the speech task) and writes a DataFrame to ./data/outputs/annotations/.

The vfp_audios_16khz/ data cannot be shared publicly; ask for access.

After verifying the configuration (paths, instructions, how many seconds to play) in the first few lines of the script, run, for example:

pip3 install playsound PyObjC pandas pyaudio
python3 annotation.py --input_dir=data/input/vfp_audios_16khz/ --output_dir=data/outputs/annotations/

Speech processing in `speech/`

Convert mp3 to other format

pip3 install pydub
python3 convert_mp3.py --input_dir=data/input/ --output_dir= --output_format=wav --output_bitrate=32k
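
For orientation, a conversion loop with pydub might look roughly like the sketch below. It is not the contents of convert_mp3.py: `output_path_for` and `convert_dir` are hypothetical helpers, and pydub needs ffmpeg installed to decode mp3 files. Whether the bitrate setting has any effect depends on the output format.

```python
from pathlib import Path


def output_path_for(mp3_path, output_dir, output_format="wav"):
    """Map e.g. data/input/talk.mp3 to <output_dir>/talk.wav."""
    return Path(output_dir) / (Path(mp3_path).stem + "." + output_format)


def convert_dir(input_dir, output_dir, output_format="wav", bitrate="32k"):
    """Convert every .mp3 in input_dir; returns the list of files written."""
    # Imported here so the path helper above works without pydub installed.
    from pydub import AudioSegment

    Path(output_dir).mkdir(parents=True, exist_ok=True)
    converted = []
    for mp3_path in sorted(Path(input_dir).glob("*.mp3")):
        out_path = output_path_for(mp3_path, output_dir, output_format)
        audio = AudioSegment.from_mp3(str(mp3_path))
        audio.export(str(out_path), format=output_format, bitrate=bitrate)
        converted.append(out_path)
    return converted
```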

Downsample sampling rate

Most speech energy lies below 8 kHz. A signal sampled at 16 kHz can represent frequencies up to 8 kHz (half the sampling rate; see Nyquist rate), so downsampling to 16 kHz keeps most speech-related information. Many algorithms require samples at 16 kHz (for faster processing or for normalization across samples), while many recordings are made at 22.05 or 44.1 kHz.

sh downsample_16khz.sh
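
The relationship between sampling rate and the highest representable frequency can be checked with a couple of lines. The second helper builds an ffmpeg command for resampling; using ffmpeg here is an assumption for illustration (downsample_16khz.sh may use a different tool), and both function names are hypothetical.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency a signal sampled at sample_rate_hz can represent."""
    return sample_rate_hz / 2


def ffmpeg_resample_command(input_path, output_path, target_rate_hz=16000):
    """Build (but do not run) an ffmpeg command; -ar sets the output sampling rate."""
    return ["ffmpeg", "-i", str(input_path), "-ar", str(target_rate_hz), str(output_path)]


highest_16k = nyquist_hz(16000)   # 8000.0 Hz
highest_44k = nyquist_hz(44100)   # 22050.0 Hz
command = ffmpeg_resample_command("in.wav", "out_16khz.wav")
```

The command list can be passed to `subprocess.run(command, check=True)` if ffmpeg is installed.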

Speech Activity Detection

speech_activity_detection_pyannote.ipynb detects and plots speech and silences using the pyannote package. You can use the recordings in data/input/audio_samples to test it.

Extract OpenSmile eGeMAPS features

extract_opensmile_features.ipynb


danielmlow/tutorials

Folders and files

Latest commit

History

Repository files navigation

Tutorials

Text and NLP processing in text/

Machine learing in machine_learning/

Template files

Save static Plotly images in Colab

dump / load json files

Rename files

Send emails in emails/

Annotations

Annotations - Colab option

Annotations - CLI option

Speech processing in speech/

Convert mp3 to other format

Downsample sampling rate

Speech Activity Detection

Extract OpenSmile eGeMAPS features

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Text and NLP processing in `text/`

Machine learing in `machine_learning/`

Send emails in `emails/`

Speech processing in `speech/`

Packages