automatic generation of domain models for call centers

Automatic Generation of Domain Models for Call Centers from Noisy Transcriptions David Przybilla [email protected] Knowledge Representation Seminar WS 2012/2013

Upload: david-przybilla

Post on 04-Jul-2015


TRANSCRIPT

Page 1: Automatic generation of domain models for call centers

Automatic Generation of Domain Models for Call Centers from Noisy Transcriptions

David Przybilla

[email protected]

Knowledge Representation Seminar

WS 2012/2013

Page 2: Automatic generation of domain models for call centers

Outline

1. The Problem

2. Proposed Solution

• Using Speech Recognition

• Feature Engineering (NLP Component)

• Taxonomy Builder

• Model Builder

3. Application

4. Results

5. Conclusions

Page 3: Automatic generation of domain models for call centers

1. The Problem

Different Domains

• Mobile Phones

• Apparel

• Services...

Domain Model

emails

Speech Audio

Taxonomy Evaluate Agents

Identify Key Problems

Useful for

Aid Agents

Efficiency

Unsupervised

Page 4: Automatic generation of domain models for call centers

2. Solution: Automatically Building a Domain Model

Page 5: Automatic generation of domain models for call centers

2.1 Automatic Speech Recognition

● Trained an ASR system using “more than 2000 calls”

– 125 of these have topic annotations

Page 6: Automatic generation of domain models for call centers

Automatic Speech Recognition

● Issues

– Different Accents

– Error rate for phone calls around 40%

● Deletion of words

● Wrong words are inserted

● Wrong speaker is assigned

– Noise:

● No punctuation marks, silence periods

● No sentence boundaries.

● False starts

● Filler words (“umm”, “uhh”)

Page 7: Automatic generation of domain models for call centers

2.2 Feature Engineering Component

( NLP Component)

Page 8: Automatic generation of domain models for call centers

2.2 Feature Engineering Component

( NLP Component)

Conversation Transcriptions → Stop Words Removal → Stemmer → Extract N-grams → Feature Vectors

Page 9: Automatic generation of domain models for call centers

Feature Engineering Component (NLP Component)

Stop Words Removal

• Remove function words, e.g. ‘the’, ‘a’, ‘for’, ‘at’, …

• Remove filler words, e.g. “mm”, “uhh”

• More discriminative dimensions

Stemmer

• Get the root of each word, e.g. worked → work, bunnies → bunny, …

[Figure: “worked”, “works”, “working” → Feature Vector (D1, D2, .., Dn)]
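A minimal Python sketch of stop-word removal followed by stemming, under stated assumptions: the word lists are tiny hypothetical samples, and `naive_stem` is a toy suffix stripper standing in for whichever stemmer the original system used (the slides do not name one).

```python
import re

# Hypothetical, deliberately tiny lists; a real system would use larger ones.
FUNCTION_WORDS = {"the", "a", "an", "for", "at", "to", "of", "is", "my", "i"}
FILLER_WORDS = {"mm", "umm", "uhh", "uh", "um"}
STOP_WORDS = FUNCTION_WORDS | FILLER_WORDS


def naive_stem(word):
    """Strip a few common English suffixes (a toy stand-in for a real stemmer)."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"            # bunnies -> bunny
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]   # worked, works, working -> work
    return word


def preprocess(utterance):
    """Lowercase, tokenize, drop stop words, and stem what remains."""
    tokens = re.findall(r"[a-z']+", utterance.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


print(preprocess("umm the password worked for my bunnies"))
```

Because the inflected forms collapse to one stem, they share a single dimension of the feature vector, which is what makes the remaining dimensions more discriminative.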

Page 10: Automatic generation of domain models for call centers

Extract N-grams

• N-gram: a sequence of n items. In this experiment, the items are words.

• Discarding N-grams

N-gram examples: “lotus notes”, “expense reimbursement”, …

Feature Vectors → Clusterer
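A short Python sketch of n-gram extraction and feature-vector construction. The slide does not say how n-grams are discarded, so the frequency cut-off `min_count` (and all function names) are assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous word n-grams of length n as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_vocabulary(docs, max_n=2, min_count=2):
    """Keep n-grams (n = 1..max_n) that occur at least min_count times overall."""
    counts = Counter()
    for tokens in docs:
        for n in range(1, max_n + 1):
            counts.update(ngrams(tokens, n))
    # Assumed discard rule: drop n-grams seen fewer than min_count times.
    return sorted(g for g, c in counts.items() if c >= min_count)


def to_vector(tokens, vocabulary, max_n=2):
    """Count how often each vocabulary n-gram occurs in one transcription."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return [counts[g] for g in vocabulary]


docs = [["lotus", "notes", "crash"], ["lotus", "notes", "password"]]
vocab = build_vocabulary(docs)
print(vocab)
print(to_vector(docs[0], vocab))
```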

Page 11: Automatic generation of domain models for call centers

Clusterer

● Clustering: Repeated Bisection

– Cosine similarity

– Top Down Approach

Feature Vectors → Clusterer → Set of Clusters
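For reference, the standard definition of cosine similarity between two feature vectors; the symbols d_i and d_j (two transcription vectors) are notation introduced here, not taken from the slides.

```latex
\[
\operatorname{sim}(d_i, d_j) \;=\; \frac{d_i \cdot d_j}{\lVert d_i \rVert \, \lVert d_j \rVert}
\]
```

The measure depends only on the angle between the vectors, so a long call and a short call about the same topic can still score as similar.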

Page 12: Automatic generation of domain models for call centers

Clustering: Repeated Bisection

[Figure: Step 0, Step 1, Step 2, … each step splits one cluster in two]

Repeat iteratively until there are K clusters
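A rough Python sketch of repeated bisection with cosine similarity. Two simplifications are assumptions of this sketch, not statements about the original system: the largest cluster is always the one split next, and each split is a plain 2-means loop with deterministic seeding.

```python
import math


def _unit(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _centroid(vectors, members):
    dims = len(vectors[0])
    return _unit([sum(vectors[i][d] for i in members) for d in range(dims)])


def bisect(vectors, members, iterations=10):
    """Split one cluster in two with a 2-means loop under cosine similarity."""
    # Seed with the first member and the member least similar to it.
    first = members[0]
    far = min(members, key=lambda i: _dot(vectors[first], vectors[i]))
    centers = [vectors[first], vectors[far]]
    left, right = list(members), []
    for _ in range(iterations):
        left = [i for i in members
                if _dot(vectors[i], centers[0]) >= _dot(vectors[i], centers[1])]
        right = [i for i in members if i not in left]
        if not left or not right:
            break
        centers = [_centroid(vectors, left), _centroid(vectors, right)]
    return left, right


def repeated_bisection(raw_vectors, k):
    """Top-down clustering: keep splitting the largest cluster until k exist."""
    vectors = [_unit(v) for v in raw_vectors]
    clusters = [list(range(len(vectors)))]
    while len(clusters) < k:
        clusters.sort(key=len, reverse=True)
        left, right = bisect(vectors, clusters[0])
        if not left or not right:
            break  # the largest cluster cannot be split any further
        clusters = [left, right] + clusters[1:]
    return [sorted(c) for c in clusters]


docs = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]
print(repeated_bisection(docs, 2))
```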

Page 13: Automatic generation of domain models for call centers

Repeating Bisection with Different K Values

[Figure: Repeated Bisection run with K=5, K=10, …, K=100; larger K gives more granularity of topics]
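A small Python sketch of running one clusterer at several granularities and keeping every result. The function names are hypothetical, and `round_robin_clusterer` is only a placeholder so the example runs on its own; any function with the same `(vectors, k)` signature could be passed instead.

```python
def cluster_at_granularities(vectors, cluster_fn, k_values):
    """Run the same clusterer once per K and keep every resulting cluster set.

    cluster_fn(vectors, k) must return a list of clusters, each a list of
    document indices. Returns {k: [frozenset_of_doc_ids, ...]}.
    """
    levels = {}
    for k in sorted(k_values):
        levels[k] = [frozenset(c) for c in cluster_fn(vectors, k)]
    return levels


def round_robin_clusterer(vectors, k):
    """Placeholder clusterer for demonstration only: deals documents into k groups."""
    groups = [[] for _ in range(k)]
    for i in range(len(vectors)):
        groups[i % k].append(i)
    return [g for g in groups if g]


levels = cluster_at_granularities([[0.0]] * 6, round_robin_clusterer, [3, 2])
for k, clusters in levels.items():
    print(k, [sorted(c) for c in clusters])
```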

Page 14: Automatic generation of domain models for call centers

Extract N-grams

• N-gram: a sequence of n items. In this experiment, the items are words.

• Discarding N-grams

N-gram examples: “lotus notes”, “expense reimbursement”, …

Feature Vectors → Taxonomy Builder

Page 15: Automatic generation of domain models for call centers

Taxonomy Builder

Set of Clusters → Taxonomy Builder → Taxonomy

Page 16: Automatic generation of domain models for call centers

Taxonomy Builder

● Discard clusters with fewer than T elements

Creating the Taxonomy

● Each node in the taxonomy is a cluster.

● There is an edge A → B when:

– There is at least one common document between A and B.

– B was created during a finer-granularity call to RB (Repeated Bisection).
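The linking rules above can be sketched in Python as follows. One assumption is made explicit in the code: an edge is only drawn between adjacent granularity levels (each K and the next larger K), which the slide does not state. The data layout and names are hypothetical.

```python
def build_taxonomy(levels, min_size):
    """Link clusters across granularity levels.

    levels: {k: [set_of_doc_ids, ...]}, one entry per repeated-bisection run.
    Returns (nodes, edges); a node is (k, index), an edge is (parent, child).
    """
    nodes = {}
    for k, clusters in levels.items():
        for idx, docs in enumerate(clusters):
            if len(docs) >= min_size:          # discard clusters with < T docs
                nodes[(k, idx)] = set(docs)

    ks = sorted(levels)
    edges = []
    for coarse_k, fine_k in zip(ks, ks[1:]):   # assumption: adjacent levels only
        parents = [n for n in nodes if n[0] == coarse_k]
        children = [n for n in nodes if n[0] == fine_k]
        for a in parents:
            for b in children:
                if nodes[a] & nodes[b]:        # at least one shared document
                    edges.append((a, b))
    return nodes, edges


levels = {1: [{0, 1, 2, 3}], 2: [{0, 1}, {2, 3}], 4: [{0}, {1}, {2, 3}]}
nodes, edges = build_taxonomy(levels, min_size=2)
print(edges)
```

With a threshold of 2, the two singleton clusters at K=4 are dropped, so only the cluster {2, 3} survives at the finest level.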

Page 17: Automatic generation of domain models for call centers

Taxonomy Builder

Page 18: Automatic generation of domain models for call centers

Model Builder

Page 19: Automatic generation of domain models for call centers

Add/Organize Information in the Node

[Figure: Node, Default Properties]

Page 20: Automatic generation of domain models for call centers

Model Builder

● Extend each node with additional information:

– Typical actions

– Typical Q&A

– Call statistics

– Style of the agents (for opening and closing)

Tiled: merge ‘repeated questions’

Ordered: show them in the order they appear

Page 21: Automatic generation of domain models for call centers

Typical Actions

● Actions are around topic features

● Apparently they take the topic features as input

● 10-word window around topic vocabulary

● Discard n-grams below a threshold

e.g. “Click the font color button”
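A hedged Python sketch of the windowing idea. It assumes the 10-word window is centered on the topic term (`half_window=5` words on each side) and that the threshold is a raw frequency count; both choices, and the function name, are assumptions of this sketch.

```python
from collections import Counter


def typical_actions(tokens, topic_terms, n=4, half_window=5, min_count=2):
    """Count n-grams that fall inside a window around any topic term."""
    starts = set()  # start positions of n-grams lying fully inside some window
    for pos, tok in enumerate(tokens):
        if tok in topic_terms:
            lo = max(0, pos - half_window)
            hi = min(len(tokens), pos + half_window + 1)
            starts.update(range(lo, hi - n + 1))
    counts = Counter(" ".join(tokens[i:i + n]) for i in starts)
    # Assumed discard rule: keep n-grams seen at least min_count times.
    return {g: c for g, c in counts.items() if c >= min_count}


call = ("please click the font color button now then "
        "click the font color button again").split()
print(typical_actions(call, {"font"}, n=3, half_window=2))
```

Collecting start positions in a set keeps an n-gram from being counted twice when the windows of two nearby topic terms overlap.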

Page 22: Automatic generation of domain models for call centers

How to Extract Q&A?

● Look for patterns such as:

– how, what, can I, were there, etc.

● Answers are sentences following the question.
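A minimal Python sketch of this pattern-based extraction. It assumes the transcript is already split into utterances (for example by speaker turn, since ASR output has no sentence boundaries) and pairs each question with the single utterance that follows it. The cue list contains only the four patterns named on the slide.

```python
import re

# Cue phrases from the slide; the list there ends with "etc", so this is partial.
QUESTION_CUES = re.compile(r"^(how|what|can i|were there)\b", re.IGNORECASE)


def extract_qa(utterances):
    """Pair each utterance that opens with a question cue with the one after it."""
    pairs = []
    for current, following in zip(utterances, utterances[1:]):
        if QUESTION_CUES.match(current.strip()):
            pairs.append((current.strip(), following.strip()))
    return pairs


turns = [
    "hello thank you for calling",
    "how do i reset my password",
    "click the reset link",
    "okay thanks",
]
print(extract_qa(turns))
```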

Page 23: Automatic generation of domain models for call centers

Call Statistics

● Average Call Duration

● Average Transcription length

● Average number of speaker turns

● Number of calls

● How Agents usually start/end a call

● This allowed the authors to compare call durations across different topics.
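A small Python sketch of the per-node statistics listed above. The call record layout (a dict with `duration_seconds` and a list of `(speaker, text)` turns) is a hypothetical format chosen for the example.

```python
from statistics import mean


def call_statistics(calls):
    """Summarize the calls assigned to one taxonomy node.

    Each call is a dict: {"duration_seconds": float,
                          "turns": [(speaker, text), ...]}.
    """
    if not calls:
        return {"number_of_calls": 0}
    return {
        "number_of_calls": len(calls),
        "avg_duration_seconds": mean(c["duration_seconds"] for c in calls),
        "avg_transcription_words": mean(
            sum(len(text.split()) for _, text in c["turns"]) for c in calls
        ),
        "avg_speaker_turns": mean(len(c["turns"]) for c in calls),
    }


calls = [
    {"duration_seconds": 120.0,
     "turns": [("agent", "hello there"), ("customer", "hi")]},
    {"duration_seconds": 60.0,
     "turns": [("agent", "hello")]},
]
print(call_statistics(calls))
```

Computing the same summary for every node is what makes it possible to compare, say, average call duration between two topics.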

Page 24: Automatic generation of domain models for call centers

Assessing the Results (?)

● ‘Almost all issues from the labeled calls’ have been captured in the Q&A and taxonomy.

● The phrases captured for the Q&A and actions are well formed in spite of ASR issues.

● Tiling: merged questions and actions. However, semantically similar phrases were not merged.

● “The list of topic specific phrases matched and at times was more similar than hand generated sets”

Page 25: Automatic generation of domain models for call centers

Application

How to access the knowledge in the taxonomy?

Topic Identification

Page 26: Automatic generation of domain models for call centers

Topic Identification

Identify the topic of a call by listening to the initial part of the call

Discriminative Features
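One way this could look in Python, as a sketch only: each topic is represented by a hypothetical dictionary of weighted discriminative features, and the first `prefix_len` tokens of the call are scored against every topic. The slides do not describe the actual scoring method, so the weighted sum here is an assumption.

```python
def identify_topic(tokens, topic_features, prefix_len=50):
    """Guess the topic from the first prefix_len tokens of a call.

    topic_features maps a topic label to {feature: weight}.
    Returns the best-scoring label, or None if nothing matches.
    """
    if not topic_features:
        return None
    prefix = tokens[:prefix_len]
    scores = {
        topic: sum(weights.get(tok, 0.0) for tok in prefix)
        for topic, weights in topic_features.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


features = {
    "password": {"password": 2.0, "reset": 1.0},
    "email": {"lotus": 2.0, "notes": 1.0},
}
call = "i cannot open lotus notes since the password reset".split()
print(identify_topic(call, features, prefix_len=5))
```

Only the opening of the call is scored, so the later mention of a password reset does not affect the result in this example.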

Page 27: Automatic generation of domain models for call centers

Topic Identification

Variation: check how good the prediction is with certain clusters.

Page 28: Automatic generation of domain models for call centers

Conclusions

• Automating part of building a knowledge representation is possible

• Performance could probably be improved by extracting relations, topic vocabulary from manuals, and external knowledge

• Semantic level processing tools can be used to improve the given method

• The application side apparently showed that the created taxonomy is good enough for actually solving problems in the call center

Page 29: Automatic generation of domain models for call centers

Critical review

– How to assess the goodness/correctness of a taxonomy

– How to compare human-generated vs machine-generated taxonomies

– Given the pipeline and the good results, do “ASR” issues really matter?

– Possibility of adding extra knowledge: from topic articles, manuals, etc.

– The ‘performance’ depends on text clustering -> goodness of each node.

Page 30: Automatic generation of domain models for call centers

Thank you for your time