
Page 1: Petri.Vuorimaa@hut.fi tml.hut.fi/~pv

HELSINKI UNIVERSITY OF TECHNOLOGY
GO project: Service Architecture for the Nomadic Internet Users of the Future

Copyright © 2000 GO Project
Telecommunication and Software Engineering Institute (TSE)

Speech Interface Implementation for XML Browser

Aki Teppo & Petri Vuorimaa
Telecommunications Software and Multimedia Laboratory

Petri.Vuorimaa@hut.fi
http://www.tml.hut.fi/~pv/

Page 2: Petri.Vuorimaa@hut.fi tml.hut.fi/~pv

Agenda

• Introduction

• X-Smiles XML Browser

• VoiceXML Implementation

• Movie Service Example

• Conclusions

Page 3: Petri.Vuorimaa@hut.fi tml.hut.fi/~pv

Introduction

• Web content is increasingly accessed on many kinds of handheld devices

• Since the display size is often limited, multimodal user interfaces are an interesting alternative

• XML and, especially, VoiceXML are the most promising markup languages for this purpose

• In this paper, we present how VoiceXML can be used in the X-Smiles XML browser

Page 4: Petri.Vuorimaa@hut.fi tml.hut.fi/~pv

X-Smiles History

• The XML browser was started as a student software project in 1998
  – X-Smiles SMIL browser

• Support for XSL stylesheets and the XML parser was improved during summer 1999

• XSL Formatting Objects, Scalable Vector Graphics, XForms, and Streaming were added during 2000

• Released as open source (www.x-smiles.org) in 2001

Page 5: Petri.Vuorimaa@hut.fi tml.hut.fi/~pv

Some X-Smiles features

• XSL Formatting Objects (XSL FO)

• Synchronized Multimedia Integration Language (SMIL) and streaming

• Scalable Vector Graphics (SVG)

• XForms

• XML Messaging

• Session Initiation Protocol (SIP) client

• Specific Graphical User Interfaces (GUIs)

Page 6: Petri.Vuorimaa@hut.fi tml.hut.fi/~pv

[Figure: X-Smiles architecture diagram. Layers: XML Processing; Browser core functionality; User interface and interaction; MLFCs. Components shown: XML Parser, XSL Processor, DOM Builder, SAX Interface, DOM Interface, Browser Configuration, Config, ECMAScript Handling, ECMAScript interpreter + extensions, MLFC mgmt. & retrieval, General Functionality, Event Broker, General GUI, MLFC specific GUI, XSL FO MLFC, SMIL MLFC, SVG MLFC, source MLFC, tree MLFC (with rendering and presentation parts).]

Page 7: Petri.Vuorimaa@hut.fi tml.hut.fi/~pv

VoiceXML Implementation

• A special Markup Language Functional Component (MLFC) was made for VoiceXML

• In addition, a separate VoiceXML interpreter was created

• Public domain components were used for text to speech conversion and speech recognition

• The Java Speech API was used to connect the components
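
The layering described in these bullets can be sketched roughly as follows. This is only an illustration in Python, not the X-Smiles code, which is written in Java against the Java Speech API. The names `Synthesizer`, `Recognizer`, `FakeSynthesizer`, `FakeRecognizer`, and `VoiceSession` are hypothetical stand-ins for the engine wrappers and the interpreter-side code; no real Festival or Sphinx call is made.

```python
from abc import ABC, abstractmethod


class Synthesizer(ABC):
    """Hypothetical stand-in for a text-to-speech engine wrapper."""

    @abstractmethod
    def speak(self, text: str) -> None: ...


class Recognizer(ABC):
    """Hypothetical stand-in for a speech recognition engine wrapper."""

    @abstractmethod
    def listen(self) -> str: ...


class FakeSynthesizer(Synthesizer):
    """Records what would be spoken instead of producing audio."""

    def __init__(self) -> None:
        self.spoken: list[str] = []

    def speak(self, text: str) -> None:
        self.spoken.append(text)


class FakeRecognizer(Recognizer):
    """Returns scripted utterances instead of decoding audio."""

    def __init__(self, utterances: list[str]) -> None:
        self._utterances = list(utterances)

    def listen(self) -> str:
        return self._utterances.pop(0)


class VoiceSession:
    """Interpreter-side code that depends only on the two interfaces."""

    def __init__(self, tts: Synthesizer, asr: Recognizer) -> None:
        self.tts = tts
        self.asr = asr

    def ask(self, prompt: str) -> str:
        # Speak the prompt, then wait for one recognized utterance.
        self.tts.speak(prompt)
        return self.asr.listen()


tts = FakeSynthesizer()
session = VoiceSession(tts, FakeRecognizer(["Star Wars"]))
answer = session.ask("Select one of: Pulp Fiction, Star Wars")
```

Because `VoiceSession` only sees the two interfaces, an engine can be replaced without touching the interpreter, which is the role a common speech API plays between the components.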

Page 8: Petri.Vuorimaa@hut.fi tml.hut.fi/~pv

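[Figure: VoiceXML implementation architecture. Components shown: X-Smiles XML Browser, VoiceXML MLFC, VoiceXML Interpreter, Java Speech API, JS API for Festival, JS API for Sphinx, Festival Text-To-Speech, Sphinx Speech Recognition. Packages shown: Browser Package, Interpreter Package, Engine Package.]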

Page 9: Petri.Vuorimaa@hut.fi tml.hut.fi/~pv

VoiceXML Interpreter

• The VoiceXML Interpreter translates the XML content into suitable actions for the underlying speech engines

• We implemented only part of the VoiceXML specification

• Prompt and menu are the most important features
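
As a rough illustration of this translation step, the sketch below reads a small VoiceXML menu and turns it into two things a speech layer could act on: the prompts to speak and a phrase-to-target map for recognition. It is written in Python with the standard library only and is not the X-Smiles interpreter. The function names `menu_actions` and `select` are hypothetical, the menu is a shortened version of the movie service menu, and `<enumerate/>` is handled in a simplified way (it is replaced by a comma-separated list of the choice phrases).

```python
import xml.etree.ElementTree as ET

MENU = """<vxml version="1.0">
  <menu>
    <prompt>Welcome to current movies</prompt>
    <prompt>Select one of: <enumerate/></prompt>
    <choice next="pulp.fo">Pulp Fiction</choice>
    <choice next="star.fo">Star Wars</choice>
  </menu>
</vxml>"""


def menu_actions(vxml_text):
    """Turn the first <menu> into (prompts to speak, phrase -> target map)."""
    menu = ET.fromstring(vxml_text).find("menu")
    choices = {c.text.strip(): c.get("next") for c in menu.findall("choice")}
    prompts = []
    for prompt in menu.findall("prompt"):
        text = prompt.text or ""
        # Simplification: <enumerate/> becomes the list of choice phrases.
        if prompt.find("enumerate") is not None:
            text += ", ".join(choices)
        prompts.append(text.strip())
    return prompts, choices


def select(choices, utterance):
    """Map a recognized utterance to the target of the matching choice."""
    wanted = utterance.strip().lower()
    for phrase, target in choices.items():
        if phrase.lower() == wanted:
            return target
    return None  # no match; a fuller interpreter would reprompt here


prompts, choices = menu_actions(MENU)
target = select(choices, "star wars")
```

The prompts would go to the text to speech engine, and the keys of `choices` are the phrases the recognizer has to be able to distinguish.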

Page 10: Petri.Vuorimaa@hut.fi tml.hut.fi/~pv

Text to Speech Engine

• We used the Festival Text to Speech engine

• Due to a license problem, we had to implement our own Java Speech API implementation for Festival

Page 11: Petri.Vuorimaa@hut.fi tml.hut.fi/~pv

Speech Recognition Unit

• We used the Sphinx Automatic Speech Recognition (ASR) library as the speech recognition unit

• The ASR server runs on a separate Linux server

• Dynamic grammars are not supported
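
One practical consequence of the last point can be sketched as follows. The sketch assumes, which the slide does not state, that the recognizer is loaded with a fixed phrase list ahead of time, so every phrase a page wants to offer must already be in that list. `STATIC_GRAMMAR` and `unsupported_choices` are hypothetical names used only for illustration.

```python
# Hypothetical fixed phrase list, assumed to be prepared for the recognizer in advance.
STATIC_GRAMMAR = {"pulp fiction", "fifth element", "star wars", "sound of music", "back"}


def unsupported_choices(menu_choices, grammar=STATIC_GRAMMAR):
    """Return the menu phrases that the fixed grammar could never recognize."""
    return [c for c in menu_choices if c.strip().lower() not in grammar]


missing = unsupported_choices(["Star Wars", "Back", "The Matrix"])
```

Under this assumption, a new menu entry such as "The Matrix" would need the grammar on the ASR server to be updated before it could be selected by voice.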

Page 12: Petri.Vuorimaa@hut.fi tml.hut.fi/~pv

Movie Service Example

• We used a movie service as a demonstration service

• The user can browse available movies and get information about them

• Part of the information is rendered using the text to speech engine

• Speech can be used for navigation

Page 13: Petri.Vuorimaa@hut.fi tml.hut.fi/~pv

XML Sample Data

<movie name="Star Wars" id="star">
  <information>When the opening scroll of Star Wars mentions "a galaxy far, far away," it might unwittingly refer to the '70s, a time when "the force" went hand in hand with "the Fonz," and hokeyness ran unchecked.</information>
  <picture file="sw.jpg"/>
</movie>

Page 14: Petri.Vuorimaa@hut.fi tml.hut.fi/~pv

XSL Transformation

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <vxml version="1.0">
      <xsl:apply-templates select="movies"/>
    </vxml>
  </xsl:template>
  <xsl:template match="movies">
    <!-- Creates the main menu -->
  </xsl:template>
</xsl:stylesheet>

Page 15: Petri.Vuorimaa@hut.fi tml.hut.fi/~pv

VoiceXML Main Menu

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE vxml SYSTEM "voicexml1-0.dtd">
<vxml version="1.0">
  <menu>
    <prompt>Welcome to current movies</prompt>
    <prompt>Select one of:<enumerate/></prompt>
    <choice next="pulp.fo">Pulp Fiction</choice>
    <choice next="fifth.fo">Fifth Element</choice>
    <choice next="star.fo">Star Wars</choice>
    <choice next="sound.fo">Sound of Music</choice>
  </menu>
</vxml>
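
A menu of this shape could be generated from the movie data. The slides do this with an XSL Transformation whose `movies` template body is not shown; the sketch below does not reproduce that stylesheet. Python's standard library has no XSLT processor, so it builds the same kind of output by hand with `xml.etree.ElementTree`. The `<movies>` wrapper element follows the `select="movies"` expression in the stylesheet, the rule "target is the movie id plus `.fo`" is an assumption read off the menu, and `movies_to_menu` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Shortened sample data; the information text is a placeholder.
MOVIES = """<movies>
  <movie name="Star Wars" id="star"><information>Placeholder</information></movie>
  <movie name="Pulp Fiction" id="pulp"><information>Placeholder</information></movie>
</movies>"""


def movies_to_menu(movies_xml):
    """Build a VoiceXML menu with one <choice> per <movie> element."""
    movies = ET.fromstring(movies_xml)
    vxml = ET.Element("vxml", {"version": "1.0"})
    menu = ET.SubElement(vxml, "menu")
    ET.SubElement(menu, "prompt").text = "Welcome to current movies"
    listing = ET.SubElement(menu, "prompt")
    listing.text = "Select one of:"
    ET.SubElement(listing, "enumerate")
    for movie in movies.findall("movie"):
        # Assumption: the link target is the movie id plus ".fo".
        choice = ET.SubElement(menu, "choice", {"next": movie.get("id") + ".fo"})
        choice.text = movie.get("name")
    return vxml


menu_doc = movies_to_menu(MOVIES)
print(ET.tostring(menu_doc, encoding="unicode"))
```

The loop over `movie` elements plays the role that an `xsl:for-each` or a per-movie template would play in a stylesheet.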

Page 16: Petri.Vuorimaa@hut.fi tml.hut.fi/~pv

Main Menu

Page 17: Petri.Vuorimaa@hut.fi tml.hut.fi/~pv

Movie Information

Page 18: Petri.Vuorimaa@hut.fi tml.hut.fi/~pv

VoiceXML Dialog

Browser: Welcome to current movies! Select one of: Pulp Fiction, Fifth Element, Star Wars, Sound Of Music.

User: Pulp Fiction

Browser: Pulp Fiction – Information – Quentin Tarantino’s award-winning homage to dime-store novels is presented in a collector’s . . . Please select one of: Back

User: Back

Browser: Welcome to current movies! . . .

Page 19: Petri.Vuorimaa@hut.fi tml.hut.fi/~pv

Results

• The demonstration ran well on an Intel Celeron 450 MHz computer with 128 Mbytes of memory

• It did not work well on an Intel Pentium II 300 MHz computer with 64 Mbytes of memory

• The text to speech engine started in a few seconds, while the speech recognition engine took about ten seconds to start after opening a page

Page 20: Petri.Vuorimaa@hut.fi tml.hut.fi/~pv

Conclusions

• VoiceXML is a convenient tool for implementing speech-based web applications

• XSL Transformations can be used to convert XML-based information to VoiceXML

• Integrating VoiceXML into an XML browser is possible, but consumes a lot of resources

• Commercial use requires further optimization