Grammar Profile for Spoken Learner Data, by Brendan Flanagan (1), Emiko Kaneko (2), Emi Izumi (3), Sachio Hirokawa (4). (1) Kyushu University, JSPS Research Fellow; (2) Aizu University; (3) Doshisha University; (4) Kyushu University

Post on 30-Jan-2021


  • Overview

    • Introduction
    • Equivalent Proficiency Levels
    • Grammar Pattern Item Dataset
    • SVM & Optimal Feature Selection
    • Characteristic Grammar Profiles
    • A1 vs A2
    • A2 vs B1
    • B1 vs B2
    • Conclusion

  • Introduction

  • Equivalent Proficiency Levels: The NICT-JLE Corpus and CEFR-J

    The NICT-JLE Corpus consists of 1280 transcripts of the ACTFL-ALC SST (Standard Speaking Test), an English oral proficiency interview test.

    There are 9 proficiency levels based on the SST scoring method.

  • Equivalent Proficiency Levels: The NICT-JLE Corpus and CEFR-J

    SST Level 4 is categorized as CEFR-J Level A2 (in this presentation).

    Target proficiency levels (CEFR-J): A1, A2, B1, B2

    CEFR-J Level   # Samples (SST 4 as CEFR-J A1)   # Samples (SST 4 as CEFR-J A2)
    A1             236                              257
    A2             738                              717
    B1             263                              263
    B2             40                               40
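The two count columns can be compared with a few lines of arithmetic. The sketch below only restates the numbers from the table; the dictionary keys are illustrative names, not identifiers from the presentation.

```python
# Sample counts per CEFR-J level, copied from the table above.
# The keys "sst4_as_a1" / "sst4_as_a2" are illustrative labels for the two columns.
counts = {
    "sst4_as_a1": {"A1": 236, "A2": 738, "B1": 263, "B2": 40},
    "sst4_as_a2": {"A1": 257, "A2": 717, "B1": 263, "B2": 40},
}

# Total number of samples in each column.
totals = {name: sum(column.values()) for name, column in counts.items()}

# Per-level difference between the two columns.
shift = {
    level: counts["sst4_as_a2"][level] - counts["sst4_as_a1"][level]
    for level in counts["sst4_as_a1"]
}

# Share of each level within a column, to make the class imbalance visible.
shares = {
    name: {level: n / totals[name] for level, n in column.items()}
    for name, column in counts.items()
}

print(totals)
print(shift)
for name, column in shares.items():
    print(name, {level: f"{share:.1%}" for level, share in column.items()})
```

Both columns sum to the same total; only the A1 and A2 rows differ, and B2 is by far the smallest class under either assignment.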

  • Grammar Pattern Item Dataset

    • The NICT-JLE Corpus exam and data structure:

      Stage   Task   Follow-up
      1
      2       ●      ●
      3       ●      ●
      4       ●      ●
      5

      Target sections for analysis: the "Task" sections.
      The "Follow-up" sections were excluded from the analysis because they contain free dialog.

    • Each section was preprocessed to count the occurrences of 493 grammar patterns, e.g.:

      Grammar pattern                                            # 00015   # 00253   # 00287
      1: Personal pronoun, nominative (I) + be: I am             2         2         4
      1-1: Personal pronoun, nominative (I) + be: I am not       0         0         0
      1-2: Personal pronoun, nominative (I) + be: Am I ...?      0         0         0
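The counting step described on this slide might look roughly like the sketch below. The slides do not give the actual pattern definitions or the corpus file format, so the three regular expressions, the `(stage, part, text)` tuples, and the sample sentences are all hypothetical; the sketch also assumes that "I am" and "I am not" are counted separately.

```python
import re
from collections import Counter

# Hypothetical regexes for three of the 493 grammar patterns.
# The real pattern definitions are not given in the slides.
PATTERNS = {
    "1: I am": re.compile(r"\bI am\b(?! not\b)", re.IGNORECASE),
    "1-1: I am not": re.compile(r"\bI am not\b", re.IGNORECASE),
    "1-2: Am I ...?": re.compile(r"\bam I\b[^.?!]*\?", re.IGNORECASE),
}

def count_patterns(sections):
    """Count pattern occurrences in the Task sections of one transcript.

    sections: iterable of (stage, part, text), where part is "task" or
    "follow-up" (an assumed representation). Follow-up sections are skipped
    because they contain free dialog.
    """
    counts = Counter({name: 0 for name in PATTERNS})
    for stage, part, text in sections:
        if part != "task":
            continue
        for name, regex in PATTERNS.items():
            counts[name] += len(regex.findall(text))
    return counts

# Invented example transcript, not taken from the corpus.
sections = [
    (2, "task", "I am a student. I am not tired. Am I late?"),
    (2, "follow-up", "I am happy."),
    (3, "task", "Well, I am from Tokyo."),
]
counts = count_patterns(sections)
for name, n in counts.items():
    print(f"{name}: {n}")
```

Each transcript then becomes one row of counts over all patterns, which is the form the table above shows for three transcripts. Real spoken transcripts may lack sentence punctuation, so a question pattern such as "Am I ...?" would likely need a definition that does not rely on a question mark.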

  • SVM & Grammar Item Dataset

    • The preprocessed dataset was then vectorized to create a special-purpose search engine using GETA [1].
    • The dataset was divided into randomly selected parts to evaluate the classification performance of SVM models by 10-fold cross-validation.
    • SVMlight [2] with a linear kernel was used to train and test the models.
    • To rank the importance of grammar items for feature selection, an SVM model was initially trained using all features.
    • The SVM model score for each individual grammar item w_i was analyzed to determine the weight(w_i) ranking.

    [1] http://geta.cs.nii.ac.jp
    [2] http://svmlight.joachims.org
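The procedure above (random folds, a linear SVM trained on all features, then a ranking of grammar items by weight) can be illustrated with a small self-contained sketch. It does not use SVMlight or GETA: a simple Pegasos-style stochastic subgradient trainer stands in for the SVM solver, and the feature names, toy counts, and hyperparameters are invented for illustration.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def train_linear_svm(X, y, epochs=50, lam=0.01, seed=0):
    """Linear SVM (hinge loss + L2 penalty) trained by Pegasos-style SGD.

    A stand-in for SVMlight, not its algorithm. X is a list of count
    vectors, y holds labels in {-1, +1}. Returns (weights, bias).
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    t = 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (L2 penalty)
            if margin < 1:  # sample violates the margin: move towards it
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def rank_features(weights, names):
    """Sort grammar items by weight, most positive first."""
    return sorted(zip(names, weights), key=lambda pair: pair[1], reverse=True)

# Toy data: rows are transcripts, columns are counts of three hypothetical items.
names = ["item_a", "item_b", "item_c"]
X = [[3, 1, 0], [4, 1, 0], [3, 1, 1], [0, 1, 3], [1, 1, 4], [0, 1, 3]]
y = [1, 1, 1, -1, -1, -1]  # +1 = higher level, -1 = lower level (assumed coding)

folds = k_fold_indices(len(X), k=3)  # the slides use k=10 on the full dataset
w, b = train_linear_svm(X, y)
ranking = rank_features(w, names)
for name, weight in ranking:
    print(f"{name}: {weight:+.3f}")
```

In a linear model, items with large positive weights push a transcript towards the +1 class and items with large negative weights towards the -1 class, which is why a weight ranking from a model trained on all features can serve as a basis for feature selection.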

  • SVM & Optimal Feature Selection

  • Grammar  Profiles  Extrac

  • Analysis  By  SVM

  • Grammar  Profiles  Classifica

  • Visualiza

  • Grammar  Profiles  Visualizing  Characteris

  • Conclusion

    • Classified the English proficiency levels of data in a spoken learner corpus using SVM.
    • Extracted characteristic grammar items for each CEFR-J level.
    • To aid interpretation of the results, visualized grammar item features with a decision tree.
    • In future work, we will extract the error features of spoken learner data.