Morphological features

  • The existence of elongated tokens (e.g., “baaad”)
  • The number of elongated tokens
  • The existence of date references
  • The existence of time references
  • The number of tokens that contain only upper case letters
  • The number of tokens that contain both upper and lower case letters
  • The number of tokens that start with an upper case letter
  • The number of exclamation marks
  • The number of question marks
  • The total number of exclamation and question marks
  • The number of tokens containing only exclamation marks
  • The number of tokens containing only question marks
  • The number of tokens containing only exclamation or question marks
  • The number of tokens containing only an ellipsis (...)
  • The existence of a subjective (i.e., positive or negative) emoticon at the message’s end
  • The existence of an ellipsis and a link at the message’s end
  • The existence of an exclamation mark at the message’s end
  • The existence of a question mark at the message’s end
  • The existence of a question or an exclamation mark at the message’s end
  • The existence of slang
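The features above that need no external resources can be computed directly from a tokenized message. The following is a minimal sketch, assuming the message is already split into tokens and using hypothetical feature names. The definition of an "elongated" token (a letter repeated three or more times in a row) and of an ellipsis token are assumptions. Date and time references, emoticons, links, and slang are left out because they depend on pattern lists or lexicons that this list does not specify.

```python
import re
from typing import Dict, List

# Assumption: a token counts as "elongated" when a letter repeats 3+ times in a row.
ELONGATED = re.compile(r"([A-Za-z])\1{2,}")
# Assumption: an ellipsis token is three or more dots, or one or more "…" characters.
ELLIPSIS = re.compile(r"\.{3,}|…+")


def morphological_features(tokens: List[str]) -> Dict[str, float]:
    """Compute the resource-free morphological features for one tokenized message."""
    text = " ".join(tokens)
    last = tokens[-1] if tokens else ""
    n_elongated = sum(1 for t in tokens if ELONGATED.search(t))
    n_excl = text.count("!")
    n_quest = text.count("?")
    return {
        "has_elongated": float(n_elongated > 0),
        "n_elongated": float(n_elongated),
        # Tokens made only of uppercase letters, e.g. "WOW".
        "n_all_caps": float(sum(1 for t in tokens if t.isalpha() and t.isupper())),
        # Tokens containing at least one uppercase and one lowercase letter.
        "n_mixed_case": float(sum(1 for t in tokens
                                  if any(c.isupper() for c in t) and any(c.islower() for c in t))),
        "n_capitalized": float(sum(1 for t in tokens if t[:1].isupper())),
        "n_exclamation": float(n_excl),
        "n_question": float(n_quest),
        "n_excl_or_quest": float(n_excl + n_quest),
        "n_only_excl_tokens": float(sum(1 for t in tokens if re.fullmatch(r"!+", t))),
        "n_only_quest_tokens": float(sum(1 for t in tokens if re.fullmatch(r"\?+", t))),
        "n_only_excl_quest_tokens": float(sum(1 for t in tokens if re.fullmatch(r"[!?]+", t))),
        "n_ellipsis_tokens": float(sum(1 for t in tokens if ELLIPSIS.fullmatch(t))),
        # "At the message's end" is read here as "the last character of the last token".
        "ends_with_exclamation": float(last.endswith("!")),
        "ends_with_question": float(last.endswith("?")),
        "ends_with_excl_or_quest": float(last.endswith(("!", "?"))),
    }
```

For example, the tokens `["This", "is", "baaad", "WOW", "McDonald", "...", "really", "?!"]` yield one elongated token, one all-caps token, two mixed-case tokens, and one ellipsis token.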

POS-based features

  • The number of adjectives
  • The number of adverbs
  • The number of interjections
  • The number of verbs
  • The number of nouns
  • The number of proper nouns
  • The number of URLs
  • The number of subjective emoticons
  • The average, maximum, and minimum F1 scores of the message’s POS unigrams for the positive, negative, and neutral classes
  • The average, maximum, and minimum F1 scores of the message’s POS bigrams for the positive, negative, and neutral classes
  • The average, maximum, and minimum F1 scores of the message’s POS trigrams for the positive, negative, and neutral classes
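The F1-based features aggregate, per class, scores that have already been attached to POS n-grams. The following is a minimal sketch under several assumptions. The tag codes follow the CMU ARK Twitter tagset (`A` adjective, `R` adverb, `!` interjection, `V` verb, `N` common noun, `^` proper noun, `U` URL). The per-class F1 score of each POS n-gram is assumed to be precomputed on training data and passed in as a hypothetical `f1_table`; how those scores are derived is not specified here. N-grams missing from the table are skipped, and the features default to 0.0 when no n-gram is found. The subjective-emoticon count is omitted because it needs an emoticon lexicon.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

CLASSES = ("positive", "negative", "neutral")
# Assumption: ARK Twitter POS tag codes for the count features.
TAG_COUNTS = {"adjectives": "A", "adverbs": "R", "interjections": "!", "verbs": "V",
              "nouns": "N", "proper_nouns": "^", "urls": "U"}

# Hypothetical table: POS n-gram -> {class name -> F1 score}.
F1Table = Dict[Tuple[str, ...], Dict[str, float]]


def ngrams(tags: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous n-grams of the tag sequence (empty if it is shorter than n)."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def pos_features(tags: Sequence[str], f1_table: F1Table) -> Dict[str, float]:
    tag_list = list(tags)
    feats = {f"n_{name}": float(tag_list.count(code)) for name, code in TAG_COUNTS.items()}
    for n, label in ((1, "unigram"), (2, "bigram"), (3, "trigram")):
        # Keep only the message's n-grams that have a precomputed score.
        known = [g for g in ngrams(tag_list, n) if g in f1_table]
        for cls in CLASSES:
            scores = [f1_table[g][cls] for g in known]
            feats[f"{label}_{cls}_avg"] = mean(scores) if scores else 0.0
            feats[f"{label}_{cls}_max"] = max(scores, default=0.0)
            feats[f"{label}_{cls}_min"] = min(scores, default=0.0)
    return feats
```

This yields 7 count features plus 3 n-gram orders × 3 classes × 3 statistics = 27 F1-based features per message.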

Sentiment lexicon-based features

  • The sum of the lexicon scores of the message’s words
  • The maximum of these scores
  • The minimum of these scores
  • The average of these scores
  • The number of words that have a lexicon score
  • The score of the last word of the message that appears in the lexicon
  • The score of the last word of the message
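The lexicon features reduce the scores of the message's words to a handful of statistics. The following is a minimal sketch, assuming the lexicon is a plain word-to-score mapping (a hypothetical `lexicon` dict) looked up with lowercased tokens. Words absent from the lexicon are ignored by the aggregate statistics, and the last word's score is taken to be 0.0 when that word is not in the lexicon; both choices are assumptions.

```python
from typing import Dict, List


def lexicon_features(tokens: List[str], lexicon: Dict[str, float]) -> Dict[str, float]:
    # Assumption: lexicon entries are lowercase, so tokens are lowercased before lookup.
    words = [t.lower() for t in tokens]
    # Scores of the words that appear in the lexicon, in message order.
    scores = [lexicon[w] for w in words if w in lexicon]
    return {
        "sum": float(sum(scores)),
        "max": max(scores, default=0.0),
        "min": min(scores, default=0.0),
        "avg": sum(scores) / len(scores) if scores else 0.0,
        "count": float(len(scores)),
        # Score of the last word that appears in the lexicon.
        "last_in_lexicon": scores[-1] if scores else 0.0,
        # Score of the message's last word, 0.0 if it is not in the lexicon.
        "last_word": lexicon.get(words[-1], 0.0) if words else 0.0,
    }
```

With several lexicons, the same function would typically be called once per lexicon and the resulting feature sets concatenated.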

Miscellaneous features

  • Negation
  • Carnegie Mellon University’s Twitter clusters (938 features)
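The cluster features can be read as one binary indicator per word cluster that occurs in the message. The following is a minimal sketch under stated assumptions. The cluster file is assumed to contain lines of the form `bitstring<TAB>word<TAB>count`, and the features are returned as a sparse dict keyed by cluster ID rather than as a fixed 938-dimensional vector. The source does not say how negation is encoded, so `has_negation` and the `NEGATION_CUES` list are a hypothetical, simple reading (presence of a negation word).

```python
from typing import Dict, Iterable, List

# Hypothetical negation cue list; the feature list does not define negation precisely.
NEGATION_CUES = {"not", "no", "never", "cannot", "don't", "isn't", "won't", "can't"}


def load_clusters(lines: Iterable[str]) -> Dict[str, str]:
    """Parse lines assumed to look like 'bitstring<TAB>word<TAB>count'."""
    word_to_cluster: Dict[str, str] = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2:
            word_to_cluster[parts[1]] = parts[0]
    return word_to_cluster


def misc_features(tokens: List[str], word_to_cluster: Dict[str, str]) -> Dict[str, float]:
    lowered = [t.lower() for t in tokens]
    feats = {"has_negation": float(any(t in NEGATION_CUES or t.endswith("n't") for t in lowered))}
    for t in tokens:
        # Try the token as written, then lowercased.
        cluster = word_to_cluster.get(t, word_to_cluster.get(t.lower()))
        if cluster is not None:
            feats[f"cluster_{cluster}"] = 1.0
    return feats
```

A sparse dict like this can be turned into a fixed-width vector by a vectorizer that assigns each cluster ID seen during training its own column.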