(Presentation) Grounding words in perception and action: computational models. TRENDS in Cognitive...
Posted on 06-Jul-2015
최유진
Grounding words in perception and action: computational models
Deb Roy
TRENDS in Cognitive Sciences
Vol.9 No.8 August 2005
Thursday, October 13, 2011
Language
English
Russian
Korean
French
Chinese
Japanese
Portuguese
Indian
German
Spanish
Arabic
One's language = one's perspective on the world
Aligning the language of machines with that of humans.
Human - communicate with - Machine
Deb Roy
Associate Professor of Media Arts and Sciences
Director, Cognitive Machines
Roy studies how children learn language, and designs machines that learn to communicate in human-like ways. To enable this work, he has pioneered new data-driven methods for analyzing and modeling human linguistic and social behavior.
Research areas: artificial intelligence, cognitive modeling, human-machine interaction, data mining, and information visualization
http://www.ted.com/talks/deb_roy_the_birth_of_a_word.html
We use words to communicate about things and kinds of things, their properties, relations, and actions. Analogy between human and machine:
- Research in robotics and simulated systems: grounding language in machine perception and action, mirroring human abilities.
- The research tradition in computational modeling has moved from a purely symbolic level to connecting symbols to the physical realm of real-world referents: purely symbolic models cannot account for context-dependent word meaning.
Index.
1. Words about the physical world.
2. Association between words and perceptual categories.
3. Modeling context-dependent word use.
4. Models of infant word learning that process 'first-person-perspective' sensory data.
5. Richer representational structures: grounding verbs in physical action.
6. Integration of action and perception in grounding nouns.
7. Conclusions
0. Research Background
1. Words about the physical world.
• Is human language like a dictionary?
Earlier computational models define words in terms of other symbols, as a dictionary does; in the physical world, however, word meaning depends on what the words refer to.
Real-world referents: how do words connect to the things in the world that they refer to?
• Computational models and the embodied nature of language: complex crossmodal phenomena --> particularly useful for studying situated language acquisition.
i.e. words are learned in a physical environment, in connection with its objects and activities.
• Implication of the study: the possibility of machines that autonomously acquire and verify beliefs about the world and communicate in natural language about those beliefs.
ROUND: visual feature / PUSH: motor control feature / HEAVY: haptic feature
2. Words - Perceptual Categories : Salient Linguistic Feature
2.1 Language grounding system & categorization.
Sensory input --> (translation) --> natural language description
Mapping: continuous sensor input (vectors) --> linguistic categories
e.g. Generative and discriminative models of categorization.
(a) Two prototypes can 'compete' (b), leading to a category boundary along points of equal distance from both prototypes (if non-Euclidean distance measures are used, non-linear boundaries may emerge). Categories may also be modeled by explicitly representing categorical boundaries. In (c), a linear model, f(height) = A*width + B, encodes the same categorical distinction as the prototypes in (b).
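The caption contrasts a prototype model with an explicit-boundary model. A minimal Python sketch, assuming a made-up 2-D (width, height) feature space and two invented prototypes: with Euclidean distance, the boundary between two prototypes is their perpendicular bisector, so for prototypes at (1, 3) and (3, 1) the line height = 1*width + 0 encodes the same distinction. The labels "tall" and "short" are hypothetical.

```python
import math

# Hypothetical feature space: each object is a (width, height) pair.
# The two prototypes below are invented for illustration.
PROTOTYPES = {
    "tall": (1.0, 3.0),
    "short": (3.0, 1.0),
}


def nearest_prototype(point, prototypes=PROTOTYPES):
    """Prototype model: return the label of the closest prototype (Euclidean distance)."""
    return min(prototypes, key=lambda label: math.dist(point, prototypes[label]))


def linear_boundary(point, a=1.0, b=0.0):
    """Boundary model: label by the side of the line height = a*width + b."""
    width, height = point
    return "tall" if height > a * width + b else "short"


if __name__ == "__main__":
    for p in [(1.0, 2.0), (2.5, 1.0), (0.5, 0.4)]:
        # Away from the boundary, both models give the same label.
        print(p, nearest_prototype(p), linear_boundary(p))
```

The prototype version stores examples and derives the boundary implicitly; the linear version stores only the boundary's two parameters.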
2.2. Models of color naming: is the perceptual model fixed?
Mojsilovic's early model: color names are assigned from fixed perceptual color categories.
In the physical world, however, the same color word can refer to different colors in different contexts.
“Purple” / “Red” / “Red wine”
3. Words - Perceptual Categories : Context-dependent Word Use
3.1 Gärdenfors' model: color distance
How linguistic convention and visual perception combine to determine word meanings.
: Arbitrary linguistic convention within perceptual color constraints.
e.g. 'red wine' in Spanish: 'vino tinto' (literally, colored wine); in Catalan: 'vi negre' (black wine)
Whether red wine is called red (tinto) or black (negre) is an arbitrary linguistic convention; according to Gärdenfors, however, the contrast between red and white wine is constrained by perception.
: Distance between white and red (dark) wine > distance between white and white (light) wine (in the context-independent prototype)
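To make the context effect concrete, here is a toy sketch loosely in the spirit of Gärdenfors' account, not his actual model. It assumes a single invented color dimension (0.0 pale/yellowish to 1.0 dark purplish red), invented prototype positions, and a hypothetical range of wine colors. In isolation, the darkest wine is nearest to "purple"; within the wine context, the wine range is stretched onto the span between "white" and "red", so the same color is named "red".

```python
# Hypothetical 1-D color scale: 0.0 = pale/yellowish ... 1.0 = dark purplish red.
PROTOTYPES = {"white": 0.0, "yellow": 0.2, "red": 0.6, "purple": 0.85}

# Hypothetical range of colors that wines actually take on this scale.
WINE_RANGE = (0.2, 0.85)


def name_in_isolation(value):
    """Context-independent naming: nearest prototype in the whole vocabulary."""
    return min(PROTOTYPES, key=lambda word: abs(PROTOTYPES[word] - value))


def name_in_context(value, context_range, contrast_words):
    """Context-dependent naming: stretch the context's sub-range onto the span
    of the contrasting words, then choose the nearest of those words."""
    lo, hi = context_range
    targets = sorted(PROTOTYPES[word] for word in contrast_words)
    t = (value - lo) / (hi - lo)  # relative position within the context
    mapped = targets[0] + t * (targets[-1] - targets[0])
    return min(contrast_words, key=lambda word: abs(PROTOTYPES[word] - mapped))


if __name__ == "__main__":
    dark_wine, pale_wine = 0.85, 0.2
    print(name_in_isolation(dark_wine), name_in_isolation(pale_wine))  # purple yellow
    print(name_in_context(dark_wine, WINE_RANGE, ["white", "red"]),
          name_in_context(pale_wine, WINE_RANGE, ["white", "red"]))    # red white
```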
3.2 Regier: spatial distance
: studied graded acceptability judgments of 1) spatial terms.
How English speakers judge the term “above” in relation to the physical context.
'The circle is above the block': which of the configurations a, b, c does the sentence describe best?
“Above”
L1: connects the centers of mass of the regions.
L2: connects the closest points between the regions.
L1 of (b) = L1 of (c); L2 of (a) = L2 of (b)
Therefore, the acceptability judgments can be explained by a combination of L1 and L2.
Spatial terms such as 'above' and 'near' can be modeled in this way.
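The L1/L2 idea can be sketched as a graded score. This is a simplification, not Regier's published model: regions are axis-aligned boxes (x0, y0, x1, y1) with y pointing up (the circle is approximated by its bounding box), each line's deviation from vertical is mapped linearly to a 0..1 score, and the equal weighting `w_com=0.5` is a hypothetical choice.

```python
import math


def center(box):
    """Center of mass of an axis-aligned box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return (x0 + x1) / 2, (y0 + y1) / 2


def closest_1d(t0, t1, l0, l1):
    """Closest coordinates on trajector interval [t0, t1] and landmark interval [l0, l1]."""
    if t0 > l1:  # trajector lies beyond the landmark on this axis
        return t0, l1
    if t1 < l0:  # trajector lies before the landmark on this axis
        return t1, l0
    mid = (max(t0, l0) + min(t1, l1)) / 2  # overlap: use its middle
    return mid, mid


def deviation_from_up(dx, dy):
    """Angle in degrees between the vector (dx, dy) and straight up."""
    return math.degrees(math.atan2(abs(dx), dy))


def angle_score(theta):
    """Hypothetical linear fall-off: 1 when vertical, 0 at 90 degrees or more."""
    return max(0.0, 1.0 - theta / 90.0)


def above_acceptability(trajector, landmark, w_com=0.5):
    """Graded 'above' score mixing the L1 (center-of-mass) and L2 (closest-point) lines."""
    (tx, ty), (lx, ly) = center(trajector), center(landmark)
    theta_l1 = deviation_from_up(tx - lx, ty - ly)
    px_t, px_l = closest_1d(trajector[0], trajector[2], landmark[0], landmark[2])
    py_t, py_l = closest_1d(trajector[1], trajector[3], landmark[1], landmark[3])
    theta_l2 = deviation_from_up(px_t - px_l, py_t - py_l)
    return w_com * angle_score(theta_l1) + (1 - w_com) * angle_score(theta_l2)


BLOCK = (0.0, 0.0, 4.0, 1.0)

if __name__ == "__main__":
    for name, circle in [("directly above", (1.5, 2.0, 2.5, 3.0)),
                         ("above and to the right", (5.0, 2.0, 6.0, 3.0)),
                         ("beside", (5.0, 0.0, 6.0, 1.0))]:
        print(name, round(above_acceptability(circle, BLOCK), 3))
```

Changing `w_com` shifts how much the center-of-mass line (L1) matters relative to the closest-point line (L2).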
3.2 Regier: CONT.
2) Movements: simple movies of objects moving relative to one another, used to visually ground words such as 'through' and 'into'.
e.g. 'putting a key into a lock' vs. 'removing a key from a lock': the events are distinguished by their initial points vs. end points.
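Distinguishing "into" from "out of" by where a path starts and ends can be sketched with a few hand-written rules. This is an illustration under assumptions, not Regier's model (which learned such distinctions from the movies themselves): the container is an axis-aligned box, a trajectory is a list of (x, y) points, and "through" is read as being outside at both ends but inside somewhere in between.

```python
def inside(point, box):
    """True if (x, y) lies within the axis-aligned box (x0, y0, x1, y1)."""
    x, y = point
    x0, y0, x1, y1 = box
    return x0 <= x <= x1 and y0 <= y <= y1


def describe_path(trajectory, container):
    """Label a path by its source (first point) and goal (last point)."""
    starts_in = inside(trajectory[0], container)
    ends_in = inside(trajectory[-1], container)
    if not starts_in and ends_in:
        return "into"
    if starts_in and not ends_in:
        return "out of"
    if not starts_in and any(inside(p, container) for p in trajectory[1:-1]):
        return "through"  # outside at both ends, inside somewhere in between
    return None  # no container-related label applies


LOCK = (0.0, 0.0, 1.0, 1.0)
INSERT_KEY = [(-2.0, 0.5), (-1.0, 0.5), (0.5, 0.5)]

if __name__ == "__main__":
    print(describe_path(INSERT_KEY, LOCK))        # into
    print(describe_path(INSERT_KEY[::-1], LOCK))  # out of
```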
3.3 Limitations in spatial semantics and further studies
- Lack of functional context: e.g. in 'clean behind the couch' vs. 'hide behind the couch', the region that 'behind' picks out differs with the action, which purely spatial models cannot explain.
4. Models of Infant word learning that process ʻfirst-person-perspectiveʼ sensory data
4.1. Cross-channel early lexical learning (CELL)
“Step into the shoes” of humans and learn from natural sensory data: recordings from natural human environments can be processed directly, without manual transcription.
CELL computational model: learns associations between visual categories and spoken words. A model of learning words from sights and sounds.
CELL vs. a blinded system: a 50% gap in accuracy!
Method: lexical learning analysis
1) STM (short-term memory): utterance-context pairs from audio-visual input
 - audio: phonetic representations of spoken sequences (linguistic units)
 - video: context, i.e. visually observable objects and motion (semantic/contextual units)
2) LTM (long-term memory): lexical candidates
 - utterances are decomposed into a set of hypothesized linguistic-unit prototypes
 - contexts are decomposed into a set of hypothesized semantic-category prototypes
e.g. bounce - ball, ruf-ruf - dog, vrrooom - car, ... shoes, truck
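The STM/LTM steps amount to finding speech units and visual categories that reliably co-occur. A minimal sketch, assuming symbols stand in for CELL's acoustic and visual prototypes and using mutual information between "unit heard" and "category seen" as the co-occurrence score; the positive-association filter, the function names, and the toy utterance-context pairs are all illustrative assumptions, not CELL's actual implementation.

```python
import math
from itertools import product


def mutual_information(n11, n10, n01, n00):
    """Mutual information (bits) between two binary events from 2x2 counts.

    n11: unit heard and category seen; n10: heard, not seen;
    n01: not heard, seen; n00: neither.
    """
    n = n11 + n10 + n01 + n00
    cells = (
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    )
    mi = 0.0
    for joint, row, col in cells:
        if joint:
            mi += joint / n * math.log2(joint * n / (row * col))
    return mi


def rank_lexical_candidates(pairs):
    """Score (speech unit, visual category) pairs from STM utterance-context pairs.

    pairs: list of (set of speech units, set of visual categories).
    Only positively associated pairs are kept, since MI alone also rewards
    units that systematically avoid a category.
    """
    units = set().union(*(u for u, _ in pairs))
    cats = set().union(*(c for _, c in pairs))
    scores = []
    for unit, cat in product(units, cats):
        n11 = sum(1 for u, c in pairs if unit in u and cat in c)
        n10 = sum(1 for u, c in pairs if unit in u and cat not in c)
        n01 = sum(1 for u, c in pairs if unit not in u and cat in c)
        n00 = len(pairs) - n11 - n10 - n01
        if n11 * n00 > n10 * n01:
            scores.append(((unit, cat), mutual_information(n11, n10, n01, n00)))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy STM contents (invented): symbols stand in for phonetic and visual prototypes.
PAIRS = [
    ({"bounce", "ball", "yeah"}, {"BALL"}),
    ({"ball", "look"}, {"BALL"}),
    ({"ruf-ruf", "dog", "yeah"}, {"DOG"}),
    ({"dog", "look"}, {"DOG"}),
    ({"vrrooom", "car"}, {"CAR"}),
    ({"car", "yeah"}, {"CAR"}),
]

if __name__ == "__main__":
    for (unit, cat), score in rank_lexical_candidates(PAIRS)[:5]:
        print(f"{unit:8s} {cat:4s} {score:.3f}")
```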
Limitations:
1) Noise from sensory processing
2) Semantically inappropriate candidates, e.g. 'yeah'
1. Words - perception: indirect processing (purely semantic, context-dependent)
2. First-person perspective: direct processing (CELL: a single object at a time; eye gaze: multiple objects at a time)
3. What's next?
VERB = ACTION.
5. Richer representational structures : grounding verbs in physical action.
Verbs that refer to physical actions are naturally grounded in representations that encode the temporal flow of events.
5.1 Siskind: perceptually grounded model of verbs, based on video recordings of human hands moving colored blocks.
- Force-dynamic relations between objects (contact, support, attachment), following Talmy's theory of force dynamics.
- Semantics of basic verbs = temporal schemas, i.e. expected sequences of force-dynamic interactions.
e.g. 'Hands pick up block' (subject - verb - object):
1. table-supports-block
2. hand-contacts-block
3. hand-attached-block
4. hand-supports-block
* Allen relations: the 13 possible qualitative relations between two time intervals A and B
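The 13 Allen relations can be read off directly from the interval endpoints. A minimal sketch, assuming intervals are given as (start, end) pairs with start < end; the function name is illustrative.

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a to interval b.

    Intervals are (start, end) pairs with start < end.
    """
    (a0, a1), (b0, b1) = a, b
    if not (a0 < a1 and b0 < b1):
        raise ValueError("intervals must have start < end")
    if a1 < b0:
        return "before"
    if b1 < a0:
        return "after"
    if a1 == b0:
        return "meets"
    if b1 == a0:
        return "met-by"
    if a0 == b0 and a1 == b1:
        return "equals"
    if a0 == b0:
        return "starts" if a1 < b1 else "started-by"
    if a1 == b1:
        return "finishes" if a0 > b0 else "finished-by"
    if b0 < a0 and a1 < b1:
        return "during"
    if a0 < b0 and b1 < a1:
        return "contains"
    # Remaining case: partial overlap with no shared endpoints.
    return "overlaps" if a0 < b0 else "overlapped-by"


if __name__ == "__main__":
    print(allen_relation((0, 2), (1, 3)))  # overlaps
```

Each relation has an inverse (e.g. "before"/"after"), and "equals" is its own inverse: 6 pairs plus 1 gives 13.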
5.2. Bailey et al. developed a system that learns verb semantics and action control structures ('X-schemas'), e.g. the difference between 'push' and 'shove'.
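The point of the push/shove example is that related verbs can share one action schema and differ in its parameters. A toy sketch, assuming a hypothetical pushing schema with only two parameters (force and duration) and invented values for each verb; Bailey et al.'s system learned such parameter settings from examples rather than hard-coding them, and used richer representations than this nearest-match rule.

```python
from dataclasses import dataclass


@dataclass
class PushParams:
    """Hypothetical parameters of one execution of a pushing X-schema."""
    force: float     # normalized, 0..1
    duration: float  # seconds


# Invented parameter settings that distinguish the two verbs.
VERBS = {
    "push": PushParams(force=0.4, duration=1.0),
    "shove": PushParams(force=0.9, duration=0.2),
}


def label_action(observed: PushParams) -> str:
    """Name an executed action by the verb whose parameters are closest."""
    def sq_dist(p: PushParams) -> float:
        return (p.force - observed.force) ** 2 + (p.duration - observed.duration) ** 2
    return min(VERBS, key=lambda verb: sq_dist(VERBS[verb]))


if __name__ == "__main__":
    print(label_action(PushParams(force=0.35, duration=1.2)))  # push
    print(label_action(PushParams(force=0.85, duration=0.3)))  # shove
```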
6. Integration of action and perception in grounding nouns.
6.1. Roy: structured networks of motor and sensor primitives; a conversational robot named Ripley.
'Hand me the blue one on your right': Ripley maintains a dynamic mental model, a three-dimensional model of its physical environment and the objects in it.
- The contents of the robot's mental model may be updated based on linguistic, visual, or haptic input. (Ripley remembers the position of an object even when it is out of its sensory field.)
- Multimodal sensory expectations (when Ripley does something --> what its perceptual system expects):
 Looks at the location --> finds the visual region
 Reaches to the location --> touches and grasps the object
 Grasps the object --> gains control over the object's location; location info is updated
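The expectation table suggests a simple loop: act, compare what is perceived with what was expected, and update the mental model. A sketch under stated assumptions: the action names, outcome labels, `MentalModel` class, and the policy of forgetting an object when an expectation fails are all hypothetical, not Ripley's actual implementation.

```python
from dataclasses import dataclass, field

# Hypothetical action -> perceptual outcome the robot expects afterwards.
EXPECTATIONS = {
    "look_at": "visual_region_found",
    "reach_to": "object_touched",
    "grasp": "object_grasped",
}


@dataclass
class MentalModel:
    """Hypothetical mental model: object name -> (x, y, z) position."""
    positions: dict = field(default_factory=dict)


def execute(model, action, obj, observed, new_position=None):
    """Check an action's observed outcome against its expectation.

    On success, optionally record the object's new position (e.g. after the
    robot moves a grasped object). On failure, drop the stale belief.
    """
    if observed != EXPECTATIONS[action]:
        model.positions.pop(obj, None)
        return False
    if new_position is not None:
        model.positions[obj] = new_position
    return True


if __name__ == "__main__":
    m = MentalModel({"blue_cup": (0.3, 0.1, 0.0)})
    execute(m, "look_at", "blue_cup", "visual_region_found")
    execute(m, "grasp", "blue_cup", "object_grasped", new_position=(0.0, 0.2, 0.1))
    print(m.positions)
```

Because the model keeps positions after the object leaves view, a later request can still refer to it, matching the note that Ripley remembers objects outside its sensory field.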
6.1. Roy: CONT.
Ripley's representations and algorithms ground the meanings of verbs, adjectives, and nouns in a unified representational system.
VERBS: actions, as motor-control programs like X-schemas
ADJECTIVES: properties of objects; all perceptual properties correspond to actions.
red ≠ a color category alone; = a color category linked to motor programs
heavy = haptic categories linked to specific actions.
NOUNS: objects linked with locations
e.g. ball: round (or color, size, ...) plus all of the actions involved
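The unified scheme can be sketched as a lexicon in which each word pairs a sensorimotor procedure with a test on its result: "red" requires looking, "heavy" requires lifting. Everything here (the `PhysicalObject` fields, the `look`/`lift` stand-ins, the thresholds, and the noun test) is hypothetical and only illustrates the idea that perceptual properties are tied to actions.

```python
from dataclasses import dataclass


@dataclass
class PhysicalObject:
    """Hypothetical stand-in for an object the robot can look at and lift."""
    hue: float        # degrees on a color wheel (0 = red)
    weight: float     # newtons, only observable by lifting
    roundness: float  # 0..1 shape score
    location: tuple   # (x, y, z) in the mental model


def look(obj):
    """Hypothetical visual procedure: returns what vision can measure."""
    return {"hue": obj.hue, "roundness": obj.roundness}


def lift(obj):
    """Hypothetical haptic procedure: returns what lifting can measure."""
    return {"weight": obj.weight}


# Each word pairs the action that gathers evidence with a test on the reading.
LEXICON = {
    "red": (look, lambda r: r["hue"] < 20 or r["hue"] > 340),
    "round": (look, lambda r: r["roundness"] > 0.8),
    "heavy": (lift, lambda r: r["weight"] > 5.0),
}


def applies(word, obj):
    """Run the word's procedure on the object, then apply its category test."""
    action, test = LEXICON[word]
    return test(action(obj))


def is_ball(obj):
    """Noun sketch: grounded properties plus a location the robot can act on."""
    return applies("round", obj) and obj.location is not None


if __name__ == "__main__":
    ball = PhysicalObject(hue=5.0, weight=1.0, roundness=0.95, location=(0.2, 0.1, 0.0))
    print(applies("red", ball), applies("heavy", ball), is_ball(ball))  # True False True
```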
7. Conclusions
- Interaction between word use, perception, and action
- Further research (Box 3): other aspects of language, such as grammatical composition and functional use in social context
- Re-unite sub-fields of AI: computer vision, parsing, information retrieval, machine learning, and planning
- Drops in the cost of sensor and robotic technology, together with ubiquitous situated computing, will create new forms of situated human-machine communication