Automatic Generation of Domain Models for Call Centers from
Noisy Transcriptions
David Przybilla
Knowledge Representation Seminar
WS 2012/2013
Outline
1. The Problem
2. Proposed Solution
• Using Speech Recognition
• Feature Engineering (NLP Component)
• Taxonomy Builder
• Model Builder
3. Application
4. Results
5. Conclusions
1. The Problem
Different Domains
• Mobile Phones • Apparel • Services...
Domain Model
● Sources: emails, speech audio
● Contains: taxonomy
● Useful for: evaluating agents, identifying key problems, aiding agents, efficiency
● Unsupervised
2. Solution: Automatically Building a Domain Model
2.1 Automatic Speech Recognition
● Trained an ASR system using “more than 2000 calls”
– 125 of these have topic annotations
Automatic Speech Recognition
● Issues
– Different Accents
– Error rate for phone calls around 40%
● Deletion of words
● Wrong words are inserted
● Wrong speaker is assigned
– Noise:
● No punctuation marks, silence periods
● No sentence boundaries.
● False starts
● Filler words (“umm”, “uhh”)
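To make the deletion and insertion errors above concrete, here is a minimal sketch of a word-level error rate computed with edit distance. It assumes the quoted error rate is a word error rate; the function name and the two example transcripts are hypothetical and not taken from the paper.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] = edits needed to turn the reference words processed so far
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing in the hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution, one inserted filler, one deleted word.
reference = "please reset my lotus notes password"
hypothesis = "please set my lotus uh notes"
print(word_error_rate(reference, hypothesis))
```

Three edits against six reference words give a rate of 0.5 for this toy pair.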
2.2 Feature Engineering Component (NLP Component)
Conversation Transcriptions -> Stop Words Removal -> Stemmer -> Extract N-grams -> Feature Vectors
Stop Words Removal
• Remove function words, e.g. ‘the’, ‘a’, ‘for’, ‘at’ …
• Remove filler words, e.g. “mm”, “uhh”
• Result: more discriminative dimensions
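A minimal sketch of this step, assuming a transcript is already split into word tokens. The word lists contain only the examples from the slides; a real list would be much longer, and the function name is hypothetical.

```python
STOP_WORDS = {"the", "a", "for", "at"}  # function words listed on the slide
FILLER_WORDS = {"mm", "umm", "uhh"}     # filler words listed on the slides


def remove_stop_words(tokens):
    """Drop function and filler words, keeping the original word order."""
    dropped = STOP_WORDS | FILLER_WORDS
    return [token for token in tokens if token.lower() not in dropped]


# Hypothetical transcript fragment.
tokens = "umm the font color button for the email".split()
print(remove_stop_words(tokens))
```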
Stemmer
• Get the root of each word, e.g. worked -> work, bunnies -> bunny
• “worked”, “works”, “working” map to the same stem, so they share one dimension of the feature vector (D1, D2, .. Dn)
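The effect of stemming can be illustrated with a toy suffix stripper. This is only a sketch: the rules below are hypothetical and far cruder than an established stemmer such as Porter's, and the slides do not say which stemmer the authors used.

```python
# Hypothetical suffix rules, tried in order: (suffix, replacement).
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")]


def toy_stem(word):
    """Strip the first matching suffix; leave very short words untouched."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        # Require at least 3 remaining letters so words like "bus" stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)] + replacement
    return word


for word in ["Worked", "works", "working", "bunnies"]:
    print(word, "->", toy_stem(word))
```

All three forms of "work" collapse to one token, which is what reduces the number of dimensions.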
Extract N-grams
• N-gram: sequence of n items. In this experiment, items are words.
• Discarding N-grams
• N-gram examples: “lotus notes”, “expense reimbursement” …
Feature Vectors -> Clusterer
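A small sketch of word n-gram extraction. The slide does not state the rule for discarding n-grams, so the minimum-count filter here is an assumption, as are the function names and the toy documents.

```python
from collections import Counter


def extract_ngrams(tokens, n):
    """Return every run of n consecutive tokens as a space-joined string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequent_ngrams(documents, n, min_count):
    """Count n-grams over all documents and keep those seen at least min_count times."""
    counts = Counter()
    for tokens in documents:
        counts.update(extract_ngrams(tokens, n))
    # Assumed discarding rule: drop n-grams that occur too rarely.
    return {gram: count for gram, count in counts.items() if count >= min_count}


# Hypothetical tokenized transcripts.
documents = [
    "open lotus notes".split(),
    "lotus notes crash".split(),
    "expense reimbursement form".split(),
]
print(frequent_ngrams(documents, n=2, min_count=2))
```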
Clusterer
● Clustering: Repeated Bisection
– Cosine similarity
– Top Down Approach
Feature Vectors -> Clusterer -> Set of Clusters
Clustering: Repeated Bisection
● Step 0, Step 1, Step 2, …: each step bisects a cluster
● Do this iteratively until K clusters are obtained
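The top-down procedure can be sketched as follows. This is a minimal illustration, not the authors' implementation: which cluster to split next (here, the largest) and how the two halves are seeded are assumptions, and the feature vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def bisect(vectors, members, rounds=10):
    """Split one cluster (a list of vector indices) into two by cosine 2-means."""
    # Assumed seeding: the first member and the member least similar to it.
    first = members[0]
    second = min(members, key=lambda i: cosine(vectors[first], vectors[i]))
    centers = [vectors[first], vectors[second]]
    best = None
    for _ in range(rounds):
        halves = [[], []]
        for i in members:
            sims = [cosine(vectors[i], center) for center in centers]
            halves[0 if sims[0] >= sims[1] else 1].append(i)
        if not halves[0] or not halves[1] or halves == best:
            break  # degenerate split or converged
        best = halves
        centers = [centroid([vectors[i] for i in half]) for half in halves]
    # Fall back to a trivial split if no round produced two non-empty halves.
    return best if best else [members[:1], members[1:]]


def repeated_bisection(vectors, k):
    """Top-down clustering: start with one cluster and bisect until k clusters exist."""
    clusters = [list(range(len(vectors)))]
    while len(clusters) < k:
        # Assumption: always bisect the largest remaining cluster.
        largest = max(clusters, key=len)
        if len(largest) < 2:
            break  # nothing left to split
        clusters.remove(largest)
        clusters.extend(bisect(vectors, largest))
    return clusters


# Hypothetical feature vectors for six calls covering three topics.
vectors = [
    [1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0], [0.0, 0.9, 0.1],
    [0.0, 0.0, 1.0], [0.1, 0.0, 0.9],
]
clusters = repeated_bisection(vectors, k=3)
print(sorted(clusters))
```

Calling `repeated_bisection` with several values of `k` gives clusterings at several granularities.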
Repeated Bisection with Different K Values
● Repeated Bisection: K=5
● Repeated Bisection: K=10
● …
● Repeated Bisection: K=100
More granularity of topics as K grows
Taxonomy Builder
Set of Clusters -> Taxonomy Builder -> Taxonomy
Taxonomy Builder
● Discard clusters with fewer than T elements
Creating the Taxonomy
● Each node in the taxonomy is a cluster.
● An edge A -> B is created when:
– There is at least one common document between A and B.
– B was created during a finer-granularity call to RB (Repeated Bisection).
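The node and edge rules above can be sketched as below. The data layout (one list of document-id sets per RB call, ordered from coarse to fine) and the choice to link only consecutive granularity levels are assumptions made for this illustration.

```python
def build_taxonomy(levels, min_size):
    """Build taxonomy nodes and edges from clusterings ordered coarse to fine.

    levels[i] is the list of clusters (sets of document ids) from the i-th RB call.
    Returns (nodes, edges); a node key is (level index, cluster index).
    """
    nodes = {}
    for level, clusters in enumerate(levels):
        for index, docs in enumerate(clusters):
            if len(docs) >= min_size:  # discard clusters with fewer than T elements
                nodes[(level, index)] = set(docs)
    edges = []
    for (level_a, index_a), docs_a in nodes.items():
        for (level_b, index_b), docs_b in nodes.items():
            # Assumption: B must come from the next finer call and share a document with A.
            if level_b == level_a + 1 and docs_a & docs_b:
                edges.append(((level_a, index_a), (level_b, index_b)))
    return nodes, edges


# Hypothetical clusterings of six documents for K=2 and K=4.
levels = [
    [{1, 2, 3, 4}, {5, 6}],
    [{1, 2}, {3, 4}, {5}, {6}],
]
nodes, edges = build_taxonomy(levels, min_size=2)
print(sorted(nodes))
print(sorted(edges))
```

With T=2 the two single-document clusters are dropped, so the node for documents 5 and 6 ends up without children.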
Model Builder
Taxonomy Builder -> Model Builder
Add/organize information in each node (default node properties)
Model Builder
● Extend each node with additional information:
– Typical actions
– Typical Q&A
– Call statistics
– Style of the agents (for opening and closing)
● Tiled: merge ‘repeated questions’
● Ordered: show them in the order they appear
Typical Actions
● Actions occur around topic features
● Apparently they take the topic features as input
● 10-word window around topic vocabulary
● Discard n-grams below a threshold
e.g. “Click the font color button”
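A sketch of how action phrases could be collected from windows around topic words. Reading the 10-word window as five words on each side of the topic word, the n-gram length, and the count threshold are all assumptions; the function name and example calls are hypothetical.

```python
from collections import Counter


def action_ngrams(calls, topic_vocabulary, n=5, window=10, min_count=2):
    """Count n-grams found inside windows around topic words; drop rare ones."""
    counts = Counter()
    half = window // 2  # assumed: half of the window on each side of the topic word
    for tokens in calls:
        seen = set()  # (start position, n-gram), so overlapping windows count once
        for pos, token in enumerate(tokens):
            if token not in topic_vocabulary:
                continue
            start = max(0, pos - half)
            end = min(len(tokens), pos + half + 1)
            for i in range(start, end - n + 1):
                seen.add((i, tuple(tokens[i:i + n])))
        counts.update(gram for _, gram in seen)
    return {" ".join(gram): c for gram, c in counts.items() if c >= min_count}


# Hypothetical tokenized calls about the same topic.
calls = [
    "please click the font color button now".split(),
    "then click the font color button again".split(),
]
print(action_ngrams(calls, topic_vocabulary={"font"}))
```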
How to Extract Q&A?
● Look for patterns such as:
– How, what, can I, were there … etc.
● Answers are sentences following the question.
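A minimal sketch of the pattern-based extraction. It assumes the transcript is already split into utterances (the ASR output itself has no sentence boundaries) and takes only the single next utterance as the answer; both are assumptions, and only the cue words from the slide are used.

```python
import re

# Question cues listed on the slide, matched at the start of an utterance.
QUESTION_START = re.compile(r"^(how|what|can i|were there)\b", re.IGNORECASE)


def extract_qa(turns):
    """Pair each question-like utterance with the utterance that follows it."""
    pairs = []
    for i, utterance in enumerate(turns[:-1]):
        if QUESTION_START.match(utterance.strip()):
            pairs.append((utterance, turns[i + 1]))
    return pairs


# Hypothetical call split into utterances.
turns = [
    "thank you for calling",
    "how do i change the font color",
    "click the font color button",
    "ok that worked",
]
print(extract_qa(turns))
```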
Call Statistics
● Average Call Duration
● Average Transcription length
● Average number of speaker turns
● Number of calls
● How Agents usually start/end a call
● Allowed them to compare call durations among
different topics.
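The per-node statistics can be computed along these lines. The record layout (keys `duration_s`, `transcript`, `turns`) is hypothetical, and transcription length is measured in words as an assumption.

```python
from statistics import mean


def call_statistics(calls):
    """Summarize the calls of one taxonomy node (requires at least one call)."""
    return {
        "number_of_calls": len(calls),
        "avg_duration_s": mean(c["duration_s"] for c in calls),
        "avg_transcript_words": mean(len(c["transcript"].split()) for c in calls),
        "avg_speaker_turns": mean(c["turns"] for c in calls),
    }


# Hypothetical calls belonging to one topic.
calls = [
    {"duration_s": 300, "transcript": "hello how can i help", "turns": 12},
    {"duration_s": 500, "transcript": "my lotus notes will not open today", "turns": 20},
]
stats = call_statistics(calls)
print(stats)
```

Running the same function on the calls of each node gives the numbers needed to compare call durations across topics.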
Assessing the Results (?)
● ‘Almost all issues from the labeled calls’ have
been captured in the Q&A and taxonomy.
● The phrases captured for the Q&A and
actions are well formed in spite of ASR issues
● Tiling: merged questions and actions. However,
semantically similar phrases were not merged
● “The list of topic specific phrases matched and
at times was more similar than hand generated
sets”
Application
How to access the knowledge in the
taxonomy?
Topic Identification
Identify the topic of a call by listening to the
initial part of the call
Discriminative Features
Topic Identification
Variation: check how good the prediction is with
certain clusters.
Conclusions
• Automating part of building a knowledge representation is possible
• Performance could probably be improved by extracting relations, topic vocabulary from manuals, and external knowledge
• Semantic level processing tools can be used to improve the given method
• The application side apparently showed that the created taxonomy is good enough for actually solving problems in the call center
Critical review
– How to assess the goodness/correctness of a taxonomy?
– How to compare human-generated vs machine-generated
taxonomies?
– Given the pipeline and the good results, do ASR
issues really matter?
– Possibility of adding extra knowledge: from topic
articles, manuals, etc.
– The ‘performance’ depends on text clustering ->
goodness of each node.
Thank you for your time