R lunches | June 4, 2019
Intro to sentiment analysis and text classification
Focus on concepts rather than code
Explore 3 common pitfalls, 2 methods for text classification
Based on a large dictionary of words and their sentiment ratings, derived by experts
Word | Mean | Std | Individual ratings |
---|---|---|---|
unfair | -2.1 | 0.83066 | [-1, -3, -3, -2, -3, -1, -2, -3, -1, -2] |
great | 3.1 | 0.7 | [2, 4, 4, 4, 3, 3, 3, 3, 2, 3] |
hate | -2.7 | 1.00499 | [-4, -3, -4, -4, -2, -2, -2, -2, -1, -3] |
:'( | -2.2 | 0.74833 | [-2, -1, -2, -2, -2, -2, -4, -3, -2, -2] |
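To make the dictionary idea concrete, here is a minimal sketch in base R. It recomputes the mean and std columns from the ten ratings per word (the std values match the population standard deviation, dividing by n, rather than what R's `sd()` returns) and then scores a text by averaging the ratings of the words it finds. The helper names `pop_sd` and `score_text` and the whitespace tokenizer are assumptions made for illustration, not part of any package.

```r
# Ratings copied from the dictionary excerpt above (ten ratings per entry).
ratings <- list(
  "unfair" = c(-1, -3, -3, -2, -3, -1, -2, -3, -1, -2),
  "great"  = c( 2,  4,  4,  4,  3,  3,  3,  3,  2,  3),
  "hate"   = c(-4, -3, -4, -4, -2, -2, -2, -2, -1, -3),
  ":'("    = c(-2, -1, -2, -2, -2, -2, -4, -3, -2, -2)
)

# Population standard deviation (divides by n, unlike sd(), which divides by n - 1).
pop_sd <- function(x) sqrt(mean((x - mean(x))^2))

lexicon <- data.frame(
  word = names(ratings),
  mean = sapply(ratings, mean),
  std  = round(sapply(ratings, pop_sd), 5),
  row.names = NULL,
  stringsAsFactors = FALSE
)
print(lexicon)

# Hypothetical scorer: average the mean rating of every token found in the lexicon.
score_text <- function(text, lexicon) {
  tokens <- strsplit(tolower(text), "\\s+")[[1]]
  hits <- lexicon$mean[match(tokens, lexicon$word)]
  hits <- hits[!is.na(hits)]
  if (length(hits) == 0) return(0)  # no known words: treat as neutral
  mean(hits)
}

score_text("I hate unfair rules", lexicon)
score_text("Great job", lexicon)
```

Words missing from the dictionary contribute nothing here, which is one reason plain lookup can mislead on short or domain-specific texts.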
if (!require("pacman")) install.packages("pacman")
pacman::p_load(sentimentr, dplyr, magrittr)
mytext <- c(
  'do you like it? But I hate really bad dogs',
  'I am the best friend.',
  'Do you really like it? I\'m not a fan'
)
mytext <- get_sentences(mytext)
sentiment_by(mytext)
##    element_id word_count       sd ave_sentiment
## 1:          1         10 1.497465    -0.8088680
## 2:          2          5       NA     0.5813777
## 3:          3          9 0.284605     0.2196345
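The third example text ends with "I'm not a fan", where a negator changes the meaning of the words around it. A toy sketch of sign flipping with a factor of \((-1)^{n}\), where \(n\) is the number of negators found shortly before a polarized word, could look like this. The word lists, the polarity values, the window size, and the function `score_with_negation` are all hypothetical; sentimentr's actual algorithm is more elaborate (it also weighs amplifiers, de-amplifiers, and adversative conjunctions).

```r
negators <- c("not", "no", "never")            # hypothetical, tiny negator list
polarity <- c(like = 1, hate = -1, good = 1)   # hypothetical polarity values

# Sum word polarities, flipping the sign once per negator found
# in the `window` tokens preceding each polarized word.
score_with_negation <- function(text, window = 3) {
  tokens <- strsplit(tolower(text), "[^a-z']+")[[1]]
  tokens <- tokens[tokens != ""]
  total <- 0
  for (i in seq_along(tokens)) {
    p <- polarity[tokens[i]]
    if (is.na(p)) next                          # token is not a polarized word
    before <- tail(tokens[seq_len(i - 1)], window)
    n <- sum(before %in% negators)              # number of negators in the window
    total <- total + unname(p) * (-1)^n
  }
  total
}

score_with_negation("I like it")
score_with_negation("I do not like it")
score_with_negation("I do not not like it")
```

An even number of negators cancels out, which is the behaviour the \((-1)^{n}\) factor encodes.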
SentimentR package: negators flip the sign of a word's polarity by a factor of \((-1)^{n}\), with \(n\) as the number of negators
Back, Mitja D., et al., SAGE Psychological Science, 2010
Of the 16,624 instances of anger words, 5,974 (35.9%) were the word "critical"
Reboot NT machine [name] in cabinet [name] at [location]:CRITICAL:[date and time].
Pury, SAGE Psychological Science, 2011
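One practical check that follows from this pitfall is to look at which dictionary words actually drive a category count before interpreting it. The sketch below uses a made-up mini word list and made-up messages purely for illustration; only the last line uses the counts quoted above.

```r
anger_words <- c("critical", "hate", "angry")   # hypothetical mini word list
messages <- c(                                   # made-up toy messages
  "Reboot machine:CRITICAL:now",
  "server status CRITICAL",
  "I hate waiting",
  "so angry right now",
  "disk CRITICAL again"
)

# Tokenize on anything that is not a letter, then keep only dictionary hits.
tokens <- unlist(strsplit(tolower(messages), "[^a-z]+"))
hits <- tokens[tokens %in% anger_words]

# Share of all hits contributed by each dictionary word, largest first.
share <- sort(table(hits) / length(hits), decreasing = TRUE)
print(round(share, 3))

# Same arithmetic for the counts quoted above: 5,974 of 16,624 instances.
round(100 * 5974 / 16624, 1)
```

If a single word accounts for a large share of the hits, it is worth reading a sample of the texts that contain it.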
USA Today's Twitter Election Meter
From the SemEval 2016 stance dataset
Example | Target | Stance | Sentiment |
---|---|---|---|
If abortion is not wrong, then nothing is wrong. Powerful words from Blessed Mother Teresa. | Abortion | Against | 0.49 |
8 years ago today my son was taken from me. If there's a god, fuck you, fuck you very much. | Atheism | Favor | -0.48 |
Hillary Clinton has some strengths and some weaknesses. | Hillary Clinton | Neutral | -0.35 |
Benghazi must be answered for #Jeb16 | Hillary Clinton | Against | 0 |
Given that we have some annotated data, we can design our own algorithm
Problem: We are lazy :)
A. Joulin et al., Bag of Tricks for Efficient Text Classification (2017)
FastText.zip
Each word is split into character n-grams. Example for n=3:
where = ['<wh', 'whe', 'her', 'ere', 're>', '<where>']
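The splitting shown above can be sketched in a few lines of base R. The function `char_ngrams` is a hypothetical helper written for illustration; it pads the word with `<` and `>` as boundary markers, slides a window of size n over the characters, and appends the whole padded word as an extra unit. fastText's own implementation differs in its details (for example, it hashes n-grams and typically uses several values of n).

```r
# Split a word into character n-grams with boundary markers,
# plus the full padded word as one extra unit.
char_ngrams <- function(word, n = 3) {
  padded <- paste0("<", word, ">")
  chars <- strsplit(padded, "")[[1]]
  if (length(chars) < n) return(padded)         # too short for any n-gram
  starts <- seq_len(length(chars) - n + 1)
  grams <- vapply(
    starts,
    function(i) paste(chars[i:(i + n - 1)], collapse = ""),
    character(1)
  )
  unique(c(grams, padded))
}

char_ngrams("where")

# Related words share subword units, so they can share what the model learns.
intersect(char_ngrams("where"), char_ngrams("here"))
```

Sharing units across word forms is a plausible reason why this helps for languages with rich inflection or compounding.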
Large improvements for languages like German, Turkish and Arabic
Oldest tree in the world (9,500 years old, Sweden)
Source: Wikimedia
Source: Google Cloud Vision API
Now | Can you write a sentence in English the way it would look now, one hundred years ago, and five hundred years ago? |
400 years ago | Canst thou write a sentence in English, in the wise that it looketh now, and look’d an hundred years past, and yet five hundred years past? |
500 years ago | Canst thou writ a sentence yn Englissh, yn such wise as yt seemeth nou, and seemed an hundred yeeres gonne, and fiue hundred yeeres gonne? |
600 years ago | Canst thou wryt ane sentence yn Englische yn this wise: that as yt semeth nouwe, and hath semed an hundred yeres ygonne, and fiue hundred yeres ygonne? |
1000 years ago | Meaht þu writan anne cwide in Ænglisc þus, swa he nu biþ, ond swa he hundred geara ær wæs, ond swa fif hundred geara ær? |
... and worst of all:
One book I specifically recommend on the topic:
Bit by Bit: Social Research in the Digital Age
(free to read online)