Initial commit
okanbulut committed Feb 4, 2024
1 parent 62a3a3f commit f58e85e
Showing 8 changed files with 81 additions and 15 deletions.
2 changes: 1 addition & 1 deletion .Rproj.user/49462073/pcs/source-pane.pper
@@ -1,3 +1,3 @@
{
"activeTab": 0
"activeTab": 1
}
9 changes: 5 additions & 4 deletions .Rproj.user/49462073/rmd-outputs
@@ -1,7 +1,8 @@
~/GitHub/blog/_posts/2022-05-21-effective-feature-selection-using-mlr3/effective-feature-selection-using-mlr3.html
~/GitHub/blog/_posts/2024-01-02-psychometric-network-analysis-using-r/psychometric-network-analysis-using-r.html
~/GitHub/blog/_posts/2024-01-04-introduction-to-psychometric-network-analysis/introduction-to-psychometric-network-analysis.html
C:/Users/Okan/Documents/GitHub/blog/docs/index.html
~/GitHub/blog/_posts/2024-01-21-lexicon-based-sentiment-analysis-using-r/lexicon-based-sentiment-analysis-using-r.html
~/GitHub/blog/_posts/2024-01-21-lexicon-based-sentiment-analysis-using-r/lexicon-based-sentiment-analysis-using-r.html
~/GitHub/blog/_posts/2024-01-21-lexicon-based-sentiment-analysis-using-r/lexicon-based-sentiment-analysis-using-r.html
~/GitHub/blog/_posts/2024-01-21-lexicon-based-sentiment-analysis-using-r/lexicon-based-sentiment-analysis-using-r.html
~/GitHub/blog/_posts/2024-01-21-lexicon-based-sentiment-analysis-using-r/lexicon-based-sentiment-analysis-using-r.html



1 change: 1 addition & 0 deletions .Rproj.user/49462073/sources/prop/INDEX
@@ -16,6 +16,7 @@ C%3A%2FUsers%2FOkan%2FDesktop%2Ftext-vectorization-using-python-tf-idf.Rmd="9AB1
C%3A%2FUsers%2FOkan%2FDesktop%2Fvisualizing-machine-learning-models2.Rmd="E6BD37EB"
D%3A%2FDropbox%2FConferences%2FCSSE%202019%2FWorkshop%2FVisualization%20example%2Fmissing%20with%20naniar.Rmd="9F1A80D6"
D%3A%2FDropbox%2FConferences%2FCSSE%202019%2FWorkshop%2FVisualization%20example%2Fmosaic_plot.R="E607CAE7"
D%3A%2FDropbox%2FResearch%20Partnerships%2FResearch%20with%20Cheryl%2FData%20Analysis%2FSentiment%20analysis_wave1.R="CE4FA1ED"
D%3A%2FDropbox%2FTeaching%2FEDPY%20607%20Measurement%20Theory%20II%2FWinter%202020%2FLecture%20Slides%2FWeek%2011%2FR%20Script%20-%20Week%2011.R="7F0256B2"
D%3A%2FDropbox%2FTeaching%2FEDPY%20607%20Measurement%20Theory%20II%2FWinter%202020%2FLecture%20Slides%2FWeek%2012%2FR%20Script%20-%20Week%2012.R="C6F15635"
D%3A%2FDropbox%2FTeaching%2FEDPY%20607%20Measurement%20Theory%20II%2FWinter%202020%2FLecture%20Slides%2FWeek%205%2FR%20Script%20-%20Week%205.R="26A72944"
Binary file not shown.
Binary file not shown.
@@ -46,9 +46,20 @@ suppressWarnings({

## Introduction

XXXX
During the COVID-19 pandemic, I was looking for a new statistical technique to learn so that I could keep myself a bit distracted. Among the several techniques I reviewed, those related to natural language processing (NLP) were the most interesting. So, I decided to choose one technique from this field and learn more about it: sentiment analysis (also referred to as opinion mining in the literature). Sentiment analysis allows researchers to extract and interpret the emotions expressed towards a particular subject in a written text. Using sentiment analysis, one can determine the direction (i.e., positive or negative), type, and strength of sentiments expressed in any form of text (e.g., documents, customer reviews, and social media posts).

During the COVID-19 pandemic, my colleagues and I published several papers utilizing lexicon-based sentiment analysis [@sentimentbulut; @sentimentpoth].
During the COVID-19 pandemic, several researchers used sentiment analysis to analyze public reactions to news and updates on COVID-19 (e.g., user posts on social media platforms such as Twitter, YouTube, and Instagram). My colleagues and I decided to extend this work to the daily announcements made by public health officials. In Alberta, Dr. Deena Hinshaw, the province's chief medical officer of health, provided [daily updates on the province's response to the ongoing COVID-19 pandemic](https://www.youtube.com/watch?v=fvw_USRfXgY). By analyzing these public health announcements, we wanted to examine whether Alberta followed effective communication strategies during a complex public health emergency [@sentimentbulut; @sentimentpoth].

In this post, my goal is to demonstrate how to conduct sentiment analysis using R. Here I will focus on a specific type of sentiment analysis known as "lexicon-based sentiment analysis" (see the next section for more details). I will demonstrate some examples of lexicon-based sentiment analysis that we used in our publications. In future posts, I plan to dive into more advanced forms of sentiment analysis using [state-of-the-art pretrained models on Hugging Face](https://huggingface.co/docs/transformers/en/index).


## Lexicon-Based Sentiment Analysis

When I first started digging into the world of sentiment analysis, I noticed that the most common way of extracting sentiments was lexicon-based sentiment analysis. This technique relies on a particular lexicon (i.e., the vocabulary of a language or subject) to identify the direction and magnitude of sentiments expressed in a text. Some lexicons, such as the Bing lexicon [@hu2004mining], categorize words as either positive or negative. Other lexicons provide more descriptive labels. For example, the NRC Emotion Lexicon [@mohammad2013crowdsourcing] categorizes each word by sentiment (positive and negative) and by the basic emotions in Plutchik’s [@plutchik1980general] psychoevolutionary theory (i.e., anger, fear, anticipation, trust, surprise, sadness, joy, and disgust).

In essence, lexicon-based sentiment analysis is all about matching the words in a piece of text with the words included in general-purpose lexicons such as Bing and NRC. Once each word is matched to a particular sentiment (e.g., positive vs. negative), we can calculate the total sentiment score by summing the individual scores of the matched words (e.g., $+1$ for each positive word and $-1$ for each negative word); words that do not appear in the lexicon contribute nothing. For example, assume that a text has 50 positive and 30 negative words as identified by the Bing lexicon. The total sentiment score of this text would be $50 - 30 = 20$. Since this is a positive value, we can conclude that the text expresses more positive than negative sentiment (a negative value would indicate the opposite).
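To make the counting concrete, here is a minimal base R sketch of the matching-and-summing idea. It assumes a tiny, made-up lexicon (`toy_lexicon`) that stands in for Bing, and the helper names `score_labels()` and `net_sentiment()` are hypothetical. In a real analysis, you would more likely tokenize with tidytext and join against `get_sentiments("bing")` rather than use this hand-rolled lookup.

```r
# Hypothetical mini-lexicon standing in for Bing: each word is labelled
# "positive" or "negative" (the real Bing lexicon has thousands of entries).
toy_lexicon <- data.frame(
  word = c("good", "great", "safe", "bad", "risk", "sad"),
  sentiment = c("positive", "positive", "positive",
                "negative", "negative", "negative"),
  stringsAsFactors = FALSE
)

# Net score: each positive label counts +1, each negative label counts -1
score_labels <- function(labels) {
  sum(labels == "positive") - sum(labels == "negative")
}

# Reproduces the worked example: 50 positive and 30 negative words
score_labels(c(rep("positive", 50), rep("negative", 30)))
#> [1] 20

net_sentiment <- function(text, lexicon) {
  # Lowercase and split on non-letters to get unigram tokens
  tokens <- unlist(strsplit(tolower(text), "[^a-z']+"))
  tokens <- tokens[tokens != ""]
  # Look up each token; words missing from the lexicon become NA and are dropped
  labels <- lexicon$sentiment[match(tokens, lexicon$word)]
  score_labels(labels[!is.na(labels)])
}

text <- "Good news: the risk is low and the great majority are safe. A sad day for some."
net_sentiment(text, toy_lexicon)
#> [1] 1
```

Here "good", "great", and "safe" match as positive and "risk" and "sad" as negative, so the net score is $3 - 2 = 1$; every other word is simply ignored.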

Performing sentiment analysis using R is quite fun (but also very tricky sometimes...). When I was analyzing the public health announcements in terms of sentiments, I benefited a lot from Julia Silge and David Robinson's great book, [Text Mining with R](https://www.tidytextmining.com/). The book has [a chapter on sentiment analysis](https://www.tidytextmining.com/sentiment) where the authors demonstrate how to conduct sentiment analysis using general-purpose lexicons such as Bing and NRC. As Julia and David also emphasize in their book, a major limitation of lexicon-based sentiment analysis is that the unit of analysis is the unigram (i.e., a single word). That is, these methods do not consider qualifiers that come before a word. For example, in the phrase "not true", the word "not" negates "true"; however, a unigram-based analysis would process this phrase as two separate words, "not" and "true", and thus ignore the negation.
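The negation problem is easy to reproduce. The base R sketch below again uses a hypothetical toy lexicon and hypothetical helper names. `unigram_score()` looks up each word on its own, as lexicon-based methods do. `negation_aware_score()` adds a crude heuristic of my own for illustration (not a standard feature of the Bing or NRC lexicons): it flips the score of a lexicon word when the preceding token is a negation word.

```r
# Hypothetical mini-lexicon (illustration only)
toy_lexicon <- data.frame(
  word = c("true", "good", "false", "bad"),
  sentiment = c("positive", "positive", "negative", "negative"),
  stringsAsFactors = FALSE
)

# Lowercase and split on non-letters to get unigram tokens
tokenize <- function(text) {
  tokens <- unlist(strsplit(tolower(text), "[^a-z']+"))
  tokens[tokens != ""]
}

# Unigram scoring: every token is matched independently, so "not" is ignored
unigram_score <- function(text, lexicon) {
  labels <- lexicon$sentiment[match(tokenize(text), lexicon$word)]
  sum(labels == "positive", na.rm = TRUE) - sum(labels == "negative", na.rm = TRUE)
}

# Crude heuristic (an assumption): reverse a word's score if it follows a negator
negation_aware_score <- function(text, lexicon, negators = c("not", "no", "never")) {
  tokens <- tokenize(text)
  labels <- lexicon$sentiment[match(tokens, lexicon$word)]
  # +1 for positive, -1 for negative, 0 for words not in the lexicon
  scores <- ifelse(is.na(labels), 0, ifelse(labels == "positive", 1, -1))
  # Token immediately before each word ("" for the first word)
  prev <- c("", head(tokens, -1))
  negated <- prev %in% negators
  scores[negated] <- -scores[negated]
  sum(scores)
}

claim <- "This statement is not true."
unigram_score(claim, toy_lexicon)
#> [1] 1
negation_aware_score(claim, toy_lexicon)
#> [1] -1
```

The unigram score treats "not true" as positive because only "true" is in the lexicon. Looking one word back reverses that. A one-word window is still a rough fix and misses cases such as "not very true", so it should be read as an illustration of the limitation rather than a solution to it.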

## Example
