Automatic Generation of Domain Models for Call Centers from Noisy Transcriptions

David Przybilla

Knowledge Representation Seminar

WS 2012/2013

Outline

1. The Problem

2. Proposed Solution

• Using Speech Recognition
• Feature Engineering (NLP Component)
• Taxonomy Builder
• Model Builder

3. Application

4. Results

5. Conclusions

1. The Problem

Different Domains

• Mobile Phones • Apparel • Services...

Domain Model

Inputs: emails, speech audio

Output: taxonomy

Useful for:

• Evaluating agents
• Identifying key problems
• Aiding agents
• Efficiency

Unsupervised

2. Solution: Automatically Building a Domain Model

2.1 Automatic Speech Recognition

● Trained an ASR system using “more than 2000 calls”

– 125 of these have topic annotations

Automatic Speech Recognition

● Issues

– Different Accents

– Error rate for phone calls around 40%

● Deletion of words

● Wrong words are inserted

● Wrong speaker is assigned

– Noise:

● No punctuation marks, silence periods

● No sentence boundaries.

● False starts

● Filler words (“umm”, “uhh”)
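To make the "around 40%" figure concrete: a minimal sketch of how an error rate over deletions, insertions and wrong words is commonly computed, assuming the slide refers to word error rate (the slide itself only says "error rate"). The function name and the example sentences are illustrative, not taken from the paper.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # reference word deleted
                cur[j - 1] + 1,                        # extra word inserted
                prev[j - 1] + (ref_word != hyp_word),  # wrong word, or a match
            ))
        prev = cur
    return prev[-1] / len(ref)


# One wrong word ("cannot" -> "can") and one deleted word ("notes"): 2 errors / 5 words
print(word_error_rate("i cannot open lotus notes", "i can open lotus"))  # 0.4
```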

2.2 Feature Engineering Component (NLP Component)

Conversation Transcriptions → Stop Words Removal → Stemmer → Extract N-grams → Feature Vectors

Stop Words Removal

• Remove function words, e.g. ‘the’, ‘a’, ‘for’, ‘at’, …
• Remove filler words, e.g. “mm”, “uhh”
• Result: more discriminative dimensions
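A minimal sketch of the stop-word step, assuming whitespace-tokenized transcriptions. The two word lists are hypothetical and only contain the examples named on the slides; a real list would be much longer.

```python
# Hypothetical stop lists built from the examples on the slides.
FUNCTION_WORDS = {"the", "a", "for", "at"}
FILLER_WORDS = {"mm", "umm", "uhh"}


def remove_stop_words(tokens):
    """Drop function words and filler words, keeping the original order."""
    stop = FUNCTION_WORDS | FILLER_WORDS
    return [t for t in tokens if t.lower() not in stop]


tokens = "umm the password for lotus notes uhh expired".split()
print(remove_stop_words(tokens))  # ['password', 'lotus', 'notes', 'expired']
```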

Stemmer

• Get the root of each word, e.g. worked → work, bunnies → bunny, …
• Worked, works, working → work

Feature Vector: D1, D2, …, Dn
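A toy illustration of stemming, assuming simple suffix stripping. This is not the stemmer used in the paper (the slides do not name one) and not the Porter algorithm; the rules below are chosen only to reproduce the examples on the slide.

```python
def toy_stem(word):
    """Very rough suffix stripping, for illustration only."""
    w = word.lower()
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"          # bunnies -> bunny
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters of the word after stripping.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


print([toy_stem(w) for w in ["Worked", "Works", "working", "bunnies"]])
# ['work', 'work', 'work', 'bunny']
```

Mapping all inflected forms to one root means they share a single dimension in the feature vector.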

Extract N-grams

• N-gram: a sequence of n items. In this experiment, the items are words.
• Discarding N-grams
• N-gram examples: “lotus notes”, “expense reimbursement”, …
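A minimal sketch of word n-gram extraction. The slide does not say which n-grams are discarded; the `min_count` frequency cut-off below is an assumption made for the example, as are the function names.

```python
from collections import Counter


def extract_ngrams(tokens, n):
    """All contiguous word n-grams of one tokenized transcription."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequent_ngrams(documents, n=2, min_count=2):
    """Count n-grams over all documents and discard the rare ones (assumed rule)."""
    counts = Counter()
    for tokens in documents:
        counts.update(extract_ngrams(tokens, n))
    return {gram: c for gram, c in counts.items() if c >= min_count}


docs = [["lotus", "notes", "password"], ["open", "lotus", "notes"]]
print(frequent_ngrams(docs))  # {'lotus notes': 2}
```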

Feature Vectors

Clusterer

● Clustering: Repeated Bisection

– Cosine similarity

– Top Down Approach

Feature Vectors → Clusterer → Set of Clusters

Clustering: Repeated Bisection

[Figure: the document collection is bisected step by step (Step 0, Step 1, Step 2, …)]

Do this iteratively until there are K clusters
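A minimal sketch of top-down repeated bisection with cosine similarity, under stated assumptions: each bisection is a small 2-means on dense vectors, the largest cluster is always the one that gets split, and the seeds are picked deterministically. The slides only name the method, the similarity measure and the top-down direction, so these details are illustrative choices.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def bisect(vectors, ids, iterations=10):
    """Split the documents in `ids` into two groups with a cosine-based 2-means."""
    if len(ids) < 2:
        return list(ids), []
    first = vectors[ids[0]]
    # Assumed seeding: the first member and the member least similar to it.
    second = vectors[min(ids, key=lambda i: cosine(first, vectors[i]))]
    centers = [first, second]
    left, right = list(ids), []
    for _ in range(iterations):
        new_left = [i for i in ids
                    if cosine(vectors[i], centers[0]) >= cosine(vectors[i], centers[1])]
        new_right = [i for i in ids if i not in new_left]
        if not new_left or not new_right or new_left == left:
            break  # degenerate split or converged: keep the last valid split
        left, right = new_left, new_right
        centers = [centroid([vectors[i] for i in left]),
                   centroid([vectors[i] for i in right])]
    return left, right


def repeated_bisection(vectors, k):
    """Top-down clustering: keep bisecting until there are k clusters."""
    clusters = [list(range(len(vectors)))]
    while len(clusters) < k:
        clusters.sort(key=len)  # assumed rule: split the largest cluster
        left, right = bisect(vectors, clusters.pop())
        if not right:  # the cluster could not be split any further
            clusters.append(left)
            break
        clusters += [left, right]
    return clusters


vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(repeated_bisection(vectors, 2))  # [[0, 1], [2, 3]]
```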

Repeating Bisection with Different K Values

● Repeated Bisection: K=5
● Repeated Bisection: K=10
● …
● Repeated Bisection: K=100

More granularity of topics as K increases


Taxonomy Builder

Set of Clusters → Taxonomy Builder → Taxonomy

● Discard clusters with fewer than T elements

Creating the Taxonomy

● Each node in the taxonomy is a cluster.
● Two nodes A and B are linked (A → B) when:
– there is at least one common document between A and B;
– B was created during a finer-granularity call to Repeated Bisection (RB).
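A minimal sketch of the taxonomy construction rules, assuming the clusterings for the different K values are already available as sets of document ids. Linking only consecutive granularity levels is an assumption made here; the slides only require that the child comes from a finer-granularity run. `build_taxonomy` and `min_size` (the threshold T) are illustrative names.

```python
def build_taxonomy(levels, min_size=2):
    """levels: list of (k, clusters) sorted by increasing k,
    where each cluster is a set of document ids.
    Returns the kept nodes and the parent -> child edges."""
    # Discard clusters with fewer than T (= min_size) elements.
    kept = [(k, [c for c in clusters if len(c) >= min_size])
            for k, clusters in levels]
    # A node is identified by (k, index of the cluster among the kept ones).
    nodes = {(k, i): c for k, clusters in kept for i, c in enumerate(clusters)}
    edges = []
    # Assumption: only consecutive granularity levels are linked.
    for (k_a, clusters_a), (k_b, clusters_b) in zip(kept, kept[1:]):
        for i, a in enumerate(clusters_a):
            for j, b in enumerate(clusters_b):
                if a & b:  # at least one common document
                    edges.append(((k_a, i), (k_b, j)))
    return nodes, edges


levels = [
    (1, [{0, 1, 2, 3}]),
    (2, [{0, 1}, {2, 3}]),
    (3, [{0, 1}, {2}, {3}]),  # the two singletons fall below T=2
]
nodes, edges = build_taxonomy(levels)
print(edges)
# [((1, 0), (2, 0)), ((1, 0), (2, 1)), ((2, 0), (3, 0))]
```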

Model Builder

Add/Organize Information in the Node

[Figure: Node, Default Properties]

Model Builder

● Extend each node with additional information:
– Typical actions
– Typical Q&A
– Call statistics
– Style of the agents (for opening and closing)
● Tiled: merge ‘repeated questions’
● Ordered: show them in the order they appear

Typical Actions

● Actions occur around topic features
● Apparently they take the topic features as input
● 10-word window around topic vocabulary
● Discard n-grams below a threshold
● e.g. “Click the font color button”
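A rough sketch of how action phrases could be collected from a window around topic vocabulary. How the 10 words are placed around the topic word is not stated on the slide; five words on each side is an assumption here, as are the function name, the n-gram length and the count threshold.

```python
from collections import Counter


def action_candidates(calls, topic_words, n=3, half_window=5, min_count=2):
    """Count n-grams that fall inside a window around any topic word."""
    counts = Counter()
    for tokens in calls:
        counted = set()  # n-gram start positions already counted in this call
        for pos, token in enumerate(tokens):
            if token not in topic_words:
                continue
            lo = max(0, pos - half_window)
            hi = min(len(tokens), pos + half_window + 1)
            for start in range(lo, hi - n + 1):
                if start not in counted:
                    counted.add(start)
                    counts[" ".join(tokens[start:start + n])] += 1
    # Discard n-grams below the threshold.
    return [gram for gram, c in counts.most_common() if c >= min_count]


calls = [
    "please click the font color button now".split(),
    "then click the font color button".split(),
]
print(action_candidates(calls, topic_words={"font"}))
```

With these two toy calls, only the n-grams shared by both ("click the font", "the font color", "font color button") survive the threshold.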

How to Extract Q&A?

● Look for patterns such as:

– “how”, “what”, “can I”, “were there”, etc.

● Answers are sentences following the question.
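A minimal sketch of pattern-based Q&A extraction. Since ASR output has no sentence boundaries, the example assumes the transcription is already split into utterances (for instance, speaker turns) and treats the next utterance as the answer. The pattern list contains only the examples from the slide.

```python
import re

# Question openers named on the slide; a real list would be longer.
QUESTION_START = re.compile(r"^(how|what|can i|were there)\b", re.IGNORECASE)


def extract_qa(utterances):
    """Pair each question-like utterance with the utterance that follows it."""
    pairs = []
    for current, following in zip(utterances, utterances[1:]):
        if QUESTION_START.match(current.strip()):
            pairs.append((current, following))
    return pairs


turns = ["how do i reset my password", "click the reset link", "thank you"]
print(extract_qa(turns))
# [('how do i reset my password', 'click the reset link')]
```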

Call Statistics

● Average Call Duration

● Average Transcription length

● Average number of speaker turns

● Number of calls

● How Agents usually start/end a call

● Allowed them to compare call durations among different topics.
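A small sketch of the per-node call statistics listed above, computed over the calls assigned to one topic. The `Call` record and its field names are hypothetical; the slides do not describe the data format.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Call:
    duration_sec: float   # length of the call in seconds
    tokens: list          # transcription as a list of words
    speaker_turns: int    # number of speaker changes


def call_statistics(calls):
    """Aggregate statistics for the calls of one taxonomy node."""
    return {
        "number_of_calls": len(calls),
        "avg_duration_sec": mean([c.duration_sec for c in calls]),
        "avg_transcription_length": mean([len(c.tokens) for c in calls]),
        "avg_speaker_turns": mean([c.speaker_turns for c in calls]),
    }


calls = [Call(100, ["a", "b"], 4), Call(200, ["a", "b", "c", "d"], 6)]
print(call_statistics(calls))
```

Computing the same dictionary for every node is what makes call durations comparable across topics.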

Assessing the Results (?)

● ‘Almost all issues from the labeled calls’ have been captured in the Q&A and taxonomy.
● The phrases captured for the Q&A and actions are well formed in spite of ASR issues.
● Tiling: merged questions and actions. However, semantically similar phrases were not merged.
● “The list of topic specific phrases matched and at times was more similar than hand generated sets”

Application

How to access the knowledge in the

taxonomy?

Topic Identification

Identify the topic of a call by listening to the initial part of the call

Discriminative Features

Variation: check how good the prediction is with certain clusters
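One possible way to identify a topic from the start of a call, sketched under assumptions: each topic node is represented by a bag of its discriminative words, and the first words of the call are matched to the closest profile by cosine similarity. The slides do not state which classifier was used; the names `identify_topic`, `first_n` and the toy profiles are illustrative.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def identify_topic(call_tokens, topic_profiles, first_n=50):
    """Return the topic whose profile is most similar to the call's first words."""
    prefix = Counter(call_tokens[:first_n])
    return max(topic_profiles, key=lambda topic: cosine(prefix, topic_profiles[topic]))


profiles = {
    "email": Counter({"lotus": 3, "notes": 3, "password": 1}),
    "expenses": Counter({"expense": 3, "reimbursement": 3}),
}
print(identify_topic("my lotus notes password expired".split(), profiles))  # email
```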

Conclusions

• Automating part of building a knowledge representation is possible

• Better performance is probably possible by extracting relations, topic vocabulary from manuals, and external knowledge

• Semantic-level processing tools can be used to improve the given method

• The application side apparently showed that the created taxonomy is good enough for actually solving problems in the call center

Critical review

– How to assess the goodness/correctness of a taxonomy
– How to compare human-generated vs. machine-generated taxonomies
– Given the pipeline and the good results, do “ASR” issues really matter?
– Possibility of adding extra knowledge: from topic articles, manuals, etc.
– The ‘performance’ depends on text clustering -> goodness of each node.

Thank you for your time
