Mid-Term Report of my research project
Mid-term report of the project
DIPF Frankfurt, July 18
Supervisor: Dr. Ivan Habernal
Student: Anil Narassiguin
Identification of Argumentative Texts in User-Generated Content on Educational Controversies
Persuasion: the act of persuading or seeking to persuade.
Persuade: (1) to prevail on (a person) to do something, as by advising or urging; (2) to induce to believe by appealing to reason or understanding.
17/07/14 | DIPF Frankfurt | UKP Lab - Prof. Dr. Iryna Gurevych | Dr. Ivan Habernal | Anil Narassiguin 2
Persuasive or not?
“Purposely raising your children to be outcasts is child abuse, I don't care what kind of cult you belong to.”
Debate about homeschooling
Persuasive
Persuasive or not?
I have no intention of waiting for scientists to reach agreement on this issue before I decide how I will educate my children. Leave us alone.
Debate about single sex education
Non-persuasive
Persuasive or not?
You are a bad parent if you do not get your child the best education that is feasible for him/her under your circumstances. The notion that your child(ren) should be used to indulge some social agenda is pathetic.
Debate about public and private schools
Not obvious...
Summary
Presentation of the task
Our approach
Problems encountered and future work
PRESENTATION OF THE TASK
Argumentation mining
[Figure: Argumentation Mining and related fields: NLP, Information Retrieval, Machine Learning, Philosophy, Logic, Psychology]
Our project
Classification of user-generated text documents as persuasive (P1) or non-persuasive (P2).
Supervised learning: three annotators manually classified a corpus of 990 texts as P1 or P2.
Cohen's kappa: 0.50 – 0.60
6 domains (topics): redshirting, prayer in schools, homeschooling, single-sex education, mainstreaming, and public and private schools.
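Cohen's kappa measures agreement between two annotators, corrected for chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected from each annotator's own label distribution. Since kappa is defined for pairs, the reported range for three annotators presumably comes from the pairwise comparisons; that reading and the helper names below are assumptions, not the project's evaluation code.

```python
from collections import Counter
from itertools import combinations
from typing import List, Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators who labelled the same items.

    Undefined (division by zero) when chance agreement p_e equals 1,
    e.g. when both annotators use a single label for everything.
    """
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    ca, cb = Counter(a), Counter(b)
    # Chance agreement: probability that both pick the same label independently.
    p_e = sum(ca[label] * cb[label] for label in set(ca) | set(cb)) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def pairwise_kappas(annotations: List[Sequence[str]]) -> List[float]:
    """Kappa for every pair of annotators (3 pairs for 3 annotators)."""
    return [cohens_kappa(a, b) for a, b in combinations(annotations, 2)]
```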
Our project
Cross-validation (CV) setups:
- Full CV: cross-validation performed on all the data
- In-domain CV: cross-validation performed on the texts of a single domain
- Cross-domain validation: the model is trained on 5 domains and tested on the remaining one
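The three setups differ only in which texts go into the training and test sets. A minimal sketch with plain index lists, assuming every text is tagged with its domain; the fold count (`k=10`) and the unshuffled round-robin fold assignment are illustrative choices, not the project's actual DKPro configuration.

```python
from typing import Iterator, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def k_fold(indices: Sequence[int], k: int) -> Iterator[Split]:
    """Round-robin k-fold split (no shuffling) of the given text indices."""
    folds = [list(indices[i::k]) for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def full_cv(domains: Sequence[str], k: int = 10) -> Iterator[Split]:
    """Full CV: folds mix texts from all domains."""
    return k_fold(list(range(len(domains))), k)


def in_domain_cv(domains: Sequence[str], domain: str, k: int = 10) -> Iterator[Split]:
    """In-domain CV: folds use the texts of a single domain only."""
    return k_fold([i for i, d in enumerate(domains) if d == domain], k)


def cross_domain(domains: Sequence[str]) -> Iterator[Tuple[str, Split]]:
    """Cross-domain: train on all other domains, test on the held-out one."""
    for held_out in sorted(set(domains)):
        train = [i for i, d in enumerate(domains) if d != held_out]
        test = [i for i, d in enumerate(domains) if d == held_out]
        yield held_out, (train, test)
```

With six domains, `cross_domain` yields six splits, one per held-out domain, which matches one column per domain in the cross-domain results.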
Our data
Class  redshirting  prayer in schools  homeschooling  single-sex education  mainstreaming  public and private schools
P1     38           77                 86             26                    10             287
P2     30           66                 138            24                    19             189
Total  68           143                224            50                    29             476
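Because the classes are not balanced within each domain, a useful reference point is the accuracy of always predicting the majority class. The counts are copied from the table above; the function and variable names are only illustrative.

```python
# (P1, P2) counts per domain, copied from the "Our data" table.
COUNTS = {
    "redshirting": (38, 30),
    "prayer in schools": (77, 66),
    "homeschooling": (86, 138),
    "single-sex education": (26, 24),
    "mainstreaming": (10, 19),
    "public and private schools": (287, 189),
}


def majority_accuracy(p1: int, p2: int) -> float:
    """Accuracy obtained by always predicting the more frequent class."""
    return max(p1, p2) / (p1 + p2)


total_p1 = sum(p1 for p1, _ in COUNTS.values())
total_p2 = sum(p2 for _, p2 in COUNTS.values())

if __name__ == "__main__":
    for domain, (p1, p2) in COUNTS.items():
        print(f"{domain:26s} n={p1 + p2:4d}  majority acc={majority_accuracy(p1, p2):.1%}")
    print(f"all domains: P1={total_p1}, P2={total_p2}, n={total_p1 + total_p2}")
```

For instance, always answering P2 on mainstreaming already yields 19/29, about 65.5% accuracy, and the six domains add up to 524 P1 and 466 P2 texts, i.e. the 990 annotated texts.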
OUR APPROACH
Feature Engineering
Baseline features: based on the tokens, we extract the 10,000 most frequent 1-, 2- and 3-grams.
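A minimal sketch of the baseline feature extraction, assuming the texts are already tokenized and that "most frequent" means highest total count over the training corpus (DKPro may instead rank by document frequency or apply extra filters). The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: List[str], n: int) -> List[NGram]:
    """All contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(docs: Iterable[List[str]], max_n: int = 3, k: int = 10_000) -> List[NGram]:
    """Vocabulary: the k most frequent 1- to max_n-grams over the corpus."""
    counts: Counter = Counter()
    for tokens in docs:
        for n in range(1, max_n + 1):
            counts.update(ngrams(tokens, n))
    return [gram for gram, _ in counts.most_common(k)]


def vectorize(tokens: List[str], vocab: List[NGram], max_n: int = 3) -> List[int]:
    """Count vector of one document over the fixed n-gram vocabulary."""
    doc = Counter(g for n in range(1, max_n + 1) for g in ngrams(tokens, n))
    return [doc[g] for g in vocab]
```

The vocabulary is built once on the training data and then reused for every document, so all feature vectors share the same 10,000 dimensions.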
Part-of-speech (POS) features: total count and ratio of each POS tag defined in DKPro.
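Given one POS tag per token, the count and ratio features reduce to a frequency table. This sketch assumes tagging has already been done upstream; the coarse tag list `TAGS` is a hypothetical stand-in for the tag set actually defined in DKPro.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical coarse tag set; the real DKPro POS types may differ.
TAGS = ["ADJ", "ADV", "CONJ", "N", "PR", "PUNC", "V", "O"]


def pos_features(tags: Sequence[str], tagset: List[str] = TAGS) -> Dict[str, float]:
    """Total count and ratio (count / number of tokens) for every tag in the tag set."""
    counts = Counter(tags)
    n = len(tags) or 1  # avoid division by zero for empty documents
    features: Dict[str, float] = {}
    for tag in tagset:
        features[f"pos_count_{tag}"] = float(counts[tag])
        features[f"pos_ratio_{tag}"] = counts[tag] / n
    return features
```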
Syntactic features
- Statistics (mean, max) on the number of clauses per sentence
- Average and maximal depth of the dependency trees
- Presence of a dependency rule (still to be done)
(Ref: Identifying Argumentative Discourse Structures in Persuasive Essays, Christian Stab)
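For the dependency-depth features, a parse can be stored as a list of head indices (CoNLL style, 0 marking the root). The sketch measures depth as the number of nodes on the longest root-to-leaf path; that convention, the input format and the feature names are assumptions, and clause counting is left out because it depends on the parser's output.

```python
from statistics import mean
from typing import Dict, List


def tree_depth(heads: List[int]) -> int:
    """Depth of one dependency tree (nodes on the longest root-to-leaf path).

    heads[i] is the 1-based index of the governor of token i+1, or 0 for the root.
    Assumes a well-formed tree: a cycle would make the loop below run forever.
    """
    def depth(i: int) -> int:
        d = 1
        while heads[i - 1] != 0:  # climb towards the root
            i = heads[i - 1]
            d += 1
        return d

    return max((depth(i) for i in range(1, len(heads) + 1)), default=0)


def depth_features(sentences: List[List[int]]) -> Dict[str, float]:
    """Average and maximal dependency-tree depth over the sentences of a post."""
    depths = [tree_depth(h) for h in sentences]
    return {
        "dep_depth_avg": float(mean(depths)) if depths else 0.0,
        "dep_depth_max": float(max(depths, default=0)),
    }
```

For a parse of "Leave us alone." in which "Leave" governs the other three tokens, `tree_depth([0, 1, 1, 1])` returns 2.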
Lexical features
- Statistics about sentences and tokens: length of a post, number of tokens per sentence and number of tokens with more than 6 letters (Ref: "Stance Classification of Ideological Debates: Data, Models, Features, and Constraints", Kazi Saidul Hasan and Vincent Ng)
- Ratio of punctuation marks and presence of multiple punctuation marks
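The lexical features only need the raw text and its sentence and token boundaries. In this sketch, post length is counted in tokens, the punctuation set is hypothetical, and "multiple punctuation" is read as two or more consecutive `!`, `?` or `.` characters; all three are assumptions rather than the project's definitions.

```python
import re
from statistics import mean
from typing import Dict, List

PUNCT = set(".,;:!?\"'()-")  # hypothetical set of punctuation tokens
REPEATED = re.compile(r"[!?.]{2,}")  # assumed reading of "multiple punctuation"


def lexical_features(text: str, sentences: List[List[str]]) -> Dict[str, float]:
    """Lexical statistics for one post, given its tokenized sentences."""
    tokens = [t for s in sentences for t in s]
    n = len(tokens) or 1  # avoid division by zero for empty posts
    return {
        "post_length": float(len(tokens)),  # measured in tokens (assumption)
        "tokens_per_sentence": float(mean([len(s) for s in sentences])) if sentences else 0.0,
        # tokens containing more than 6 letters
        "long_tokens": float(sum(sum(c.isalpha() for c in t) > 6 for t in tokens)),
        "punct_ratio": sum(t in PUNCT for t in tokens) / n,
        "multiple_punct": 1.0 if REPEATED.search(text) else 0.0,
    }
```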
Sentiment features: integration of the (GPL-licensed) Stanford Deep Learning for Sentiment Analysis tool into DKPro
Feature Engineering
Sentiment features (continued): for each sentence, the sentiment analysis tool outputs 5 scores. From these scores we create 20 features.
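The slides do not say how the 5 per-sentence scores become 20 features. One plausible scheme, shown purely as a hypothetical example, treats the scores as a distribution over the five sentiment classes of the Stanford model (very negative to very positive) and takes four statistics (mean, min, max, standard deviation) per class across the sentences of a post: 4 × 5 = 20.

```python
from statistics import mean, pstdev
from typing import Dict, List

# The Stanford sentiment model distinguishes five classes.
CLASSES = ["very_negative", "negative", "neutral", "positive", "very_positive"]
# Hypothetical aggregation: 4 statistics x 5 classes = 20 features.
STATS = {"mean": mean, "min": min, "max": max, "std": pstdev}


def sentiment_features(sentence_scores: List[List[float]]) -> Dict[str, float]:
    """Aggregate per-sentence 5-class scores into 20 document-level features."""
    if not sentence_scores:
        return {f"sent_{s}_{c}": 0.0 for c in CLASSES for s in STATS}
    features: Dict[str, float] = {}
    for j, cls in enumerate(CLASSES):
        column = [scores[j] for scores in sentence_scores]  # one class, all sentences
        for name, fn in STATS.items():
            features[f"sent_{name}_{cls}"] = float(fn(column))
    return features
```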
Classifier used
We currently use a Support Vector Machine (SVM) classifier.
At the moment, we do not plan to compare results across several classifiers.
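The slides do not specify the SVM implementation, kernel or parameters. As a stand-in that makes the training objective concrete, here is a minimal linear SVM trained with the Pegasos algorithm (stochastic sub-gradient descent on the L2-regularized hinge loss); it is not what DKPro uses internally, and `lam`, `epochs` and `seed` are illustrative defaults. Labels are encoded as +1 (P1) and -1 (P2).

```python
import random
from typing import List, Sequence


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 50, seed: int = 0) -> List[float]:
    """Pegasos: SGD on the L2-regularized hinge loss; labels must be +1 or -1.

    A constant 1.0 feature is appended so that the last weight acts as a bias
    (for simplicity the bias is regularized too).
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization (shrink) step
            if margin < 1.0:  # hinge loss active: move towards the example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """+1 (P1) or -1 (P2) depending on the side of the hyperplane."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```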
Results
Values: accuracy / F-measure (%).
Full CV
Domain                      Baseline     All features  Baseline + sentiment analysis
all domains                 69.0 / 68.9  67.9 / 67.8   68.0 / 68.0
In-domain CV
Domain                      Baseline     All features  Baseline + sentiment analysis
redshirting                 51.5 / 51.4  54.4 / 54.1   54.4 / 54.1
prayer in schools           74.1 / 73.9  73.4 / 73.3   73.4 / 73.3
homeschooling               74.1 / 70.5  75.0 / 71.5   75.0 / 71.5
single-sex education        74.1 / 70.5  68.0 / 67.5   68.0 / 67.5
mainstreaming               65.5 / 39.6  65.5 / 39.6   65.5 / 39.6
public and private schools  70.0 / 68.5  72.1 / 70.6   72.1 / 70.6
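The slides do not state which F-measure is reported. Assuming it is the macro-averaged F1 over P1 and P2 (the unweighted mean of the per-class F1 scores), both metrics can be computed as follows; the example scores a classifier that always predicts P2 on the mainstreaming domain (10 P1 and 19 P2 texts).

```python
from typing import Sequence


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold: Sequence[str], pred: Sequence[str],
             labels: Sequence[str] = ("P1", "P2")) -> float:
    """Unweighted mean of per-class F1; a class that is never predicted gets F1 = 0."""
    scores = []
    for c in labels:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Mainstreaming: 10 P1 and 19 P2 texts, classifier answers P2 every time.
gold = ["P1"] * 10 + ["P2"] * 19
pred = ["P2"] * 29
print(f"{accuracy(gold, pred):.1%} / {macro_f1(gold, pred):.1%}")  # 65.5% / 39.6%
```

Under that assumption, 65.5 / 39.6 is exactly the in-domain score reported for mainstreaming, which suggests (without proving it) that the model predicts only P2 in that domain; this makes it a natural first target for error analysis.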
Results
Cross-domain validation
Each row gives the test domain; the model is trained on the other five domains. Values: accuracy / F-measure (%).
Test domain                 Baseline     All features  Baseline + sentiment analysis
redshirting                 54.4 / 54.1  55.9 / 55.5   54.4 / 54.1
prayer in schools           53.9 / 52.6  53.9 / 53.0   73.4 / 73.3
homeschooling               65.6 / 61.0  65.1 / 61.2   74.5 / 70.9
single-sex education        64.0 / 62.5  62.0 / 60.1   66.0 / 65.3
mainstreaming               65.5 / 61.8  65.5 / 61.8   65.5 / 39.6
public and private schools  66.1 / 65.9  65.3 / 64.9   71.0 / 69.6
PROBLEMS ENCOUNTERED AND FUTURE WORK
Problems encountered
Limited prior knowledge of NLP
Getting familiar with DKPro
Long computation times (may be solved soon)
Future work
Error analysis
Feature selection
Hyperparameter optimization for the SVM
Bootstrapping the model with new data from debate portals
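Hyperparameter optimization is commonly done as an exhaustive grid search in which each setting is scored by cross-validation. In this sketch `evaluate` is a hypothetical callback that would train the SVM with the given parameters and return, for example, the mean cross-validated F-measure; the grid values are placeholders.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple


def grid_search(grid: Dict[str, Sequence], evaluate: Callable[..., float]) -> Tuple[Dict, float]:
    """Try every parameter combination in `grid` and keep the best-scoring one.

    `evaluate(**params)` should return a cross-validated score (higher is better).
    """
    best_params: Dict = {}
    best_score = float("-inf")
    names = list(grid)
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score > best_score:  # strict: the first of equally good settings wins
            best_params, best_score = params, score
    return best_params, best_score


# Example with a dummy scorer standing in for cross-validated SVM training:
# grid_search({"C": [0.1, 1, 10, 100]}, lambda C: -abs(C - 10)) returns ({'C': 10}, 0)
```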
Questions?