
Multimedia search: From Lab to Web

prof. dr. L. Schomaker

KI, RuG

Invited lecture, presented at the 4e Colloque International sur le Document Electronique, 24-26 October 2001.

Schomaker, L.R.B. (2001). Image Search and Annotation: From Lab to Web. Proceedings of CIDE 2001, pp. 373-375. ISBN 2-909285-17-0.

©2001 LRB Schomaker - KI/RuG


Overview

Methods in content-based image search

The user’s perspective: ergonomics, cognition and perception

Feeding the data-starved machine


Researchers

L. Schomaker, L. Vuurpijl, E. Deleau, E. Hoenkamp, A. Baris


A definition

In content-based image retrieval systems, the goal is to provide the user with a set of images, based on a query which consists, partly or completely, of pictorial information

Excluded: point & click navigation in pre-organized image bases


Image-based queries on WWW: existing methods and their problems

IBIR - image-based information retrieval

CBIR - content-based image retrieval

QBIC - query by image content

PBIR - pen-based image retrieval


Existing systems & prototypes

QBIC (IBM), VisualSEEk (Columbia), FourEyes (MIT Media Lab)

… and many more: WebSEEk, Excalibur, ImageRover, Chabot, Piction

Research: IMEDIA (INRIA), Viper/GIFT (Marchand-Maillet)


Query Methods

Query                   Matched with               Algorithm
----------------------  -------------------------  ------------------------------------------
Keywords                Manual text annotation     String search, Information Retrieval (IR)
Keywords                Textual context of image   String search, IR
Exemplar image          Complete image             Template matching, feature-vector matching
Rectangular sub-image   Complete image             Feature- and texture-based matching
Layout structure        Complete image             Texture and color matching
Object outline          Partial image              Outlines, edges
Object sketch           Partial image              Features, edges


Example 1. QBIC (IBM)

Features: colors, textures, edges, shape

Matching: layout, full-image templates, shape

The upper-left picture is the query: “boy in yellow raincoat”

…and yields very counter-intuitive results

What was the user’s intention?


Example 2. VisualSEEk

Features: colors, textures, edges, bitmap shape

Matching: layout, full-image templates

Layout- and feature-based query construction

Requires detailed user knowledge of pattern-recognition issues!



Example 3. FourEyes (MIT Media Lab)

Imposed block segmentation

Textual annotation per block

Labels are propagated on the basis of texture matching



FourEyes…

Imposed block segmentation is unrelated to object placement

Object details are lost: the features are global and textural

Interesting: a role for the user


Problems

Full-image template matching yields bad retrieval results

Feature-based matching requires a lot of input and knowledge by the user

Layout-based search only suits a subset of image needs

Grid-based partitioning misses details and breaks up meaningful objects


Problems…

Reasons behind a retrieved image list are unclear (Picard, 1995)

Features and matching scheme are not easily explainable to the user

An intelligent system should learn from previous queries of the user(s)


A statement

In content-based image retrieval systems, just as in text-based Information Retrieval, the performance of current systems is limited due to their incomplete and weak modeling of the user's:

Needs

Goals

Perception

Cognition (semantics)


User-Interfacing aspects

Computer users are continuously evaluating the value of system responses as a function of the effort spent on input actions (cost / benefit evaluation)

Consequence: after formulating a query with a large number of keystrokes, slider adjustments and mouse clicks, the quality of an image hit list is expected to be very high…

Conversely, user expectations are low when the effort only consists of a single mouse click


Pragmatic aspects

A survey on the WWW revealed that users are interested in objects (71%), not in layout, texture or abstract features.

The preferred image type is the photograph (68%)


Cognitive & Perceptual aspects

Objects are best recognized from 'canonical views' (Blanz et al., 1999).

Photographers know and utilize this phenomenon by manipulating camera attitude or object placement


Photographs and paintings imply communication

[Diagram: the World is depicted by a photographer or painter for a human user/viewer, in contrast to the World being captured by a surveillance camera for computer vision.]


Problems of geometrical invariance are less extreme


Canonical Views

[Figures: objects shown in non-canonical vs. canonical orientation.]


More cognition: Basic-level object categories

In a hierarchy of object classes (ontology) a node of the type 'Basic Level' (Rosch et al.,1976) adds many structural features in its description, as compared to the level above, whereas the number of unique additional features is reduced when going down towards a more specific node.


Basic-level categories, example

“furniture” [virtually no geometrical features]

“chair” [many clearly-defined structural features]

“kitchen chair” [only a few additional features].


Basic-level object categories and mental imagery

A basic level is the highest level for which clear mental imagery exists in an object ontology

A basic-level object elicits almost the same feature description when it is named or shown visually

Basic-level object descriptions often contain reference to structural components (parts)

In verbally describing the contents of a picture, people will tend to use 'basic-level' words.

Rosch, E., Mervis, C.B., Gray, W.E., Johnson, E.M. & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8, 382-439.


Implication of the ‘basic level’ category

The basic level forms a natural bridge between textual and pictorial information

It is likely to determine both annotation and search behavior of the users

It is an ideal starting point for (ultimately) developing computer vision systems that generate text on the basis of a photograph


Misconception about Perception and Cognition

“A picture is worth a thousand words”? True or false?

True… but many pictures could use a few words!

[Example image: a part of a rocket engine by NASA.]


Assumptions

In image retrieval, photographs are the preferred media type

There is a predominant interest in objects (in the broad sense: including humans and animals)

The most likely level of description in real-world images is the “basic-level” category (Rosch et al.)


Goal: object-based image search

Object recognition in an open domain?

Not possible yet.

Extensive annotation is needed in any case: for indexed access and for machine learning (MPEG-7 allows for sophisticated annotation)

But who is going to do the annotation, the content provider or the user, and how?


How to realize object-based image search?

Bootstrap process for pattern recognition

cf. project CyC (Lenat) and openMind (Stork)

Collaborative, opportunistic annotation and object labeling (browser side)

Background learning process (server side)


Design considerations

Focus on object-based representations and queries

Material: photographs with identifiable objects for which a verbal description can be given

Exploit human perceptual abilities

Allow for incremental annotation to obtain a growing training set


Outline-based queries

In order to bridge the gap between what is currently possible and the ultimate goal of automatic object detection and classification, a closed curve drawn around a known object is used as a bootstrap representation: an outline.

This closed curve itself contains shape information (XY, dXdY, curvature) and makes it possible to separate the visual object characteristics, represented by the pixels it encloses, from the background
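As an illustration of this bootstrap idea, here is a minimal sketch (my own, not the system's actual code; the (N, 2) XY outline format is an assumption) of how a closed curve yields both shape information and an object/background separation:

```python
import numpy as np
from matplotlib.path import Path  # point-in-polygon test

def outline_shape_info(outline):
    """Shape information carried by the closed curve itself.

    outline: (N, 2) array of XY points, first point == last point.
    Returns the first differences (dXdY) and a discrete curvature
    estimate (change of running angle per step).
    """
    d = np.diff(outline, axis=0)            # dXdY
    angle = np.arctan2(d[:, 1], d[:, 0])    # running angle
    curvature = np.diff(np.unwrap(angle))   # discrete curvature
    return d, curvature

def object_mask(outline, image_shape):
    """Boolean mask of the pixels enclosed by the outline, separating
    the visual object characteristics from the background."""
    h, w = image_shape
    ys, xs = np.mgrid[0:h, 0:w]
    pixels = np.column_stack([xs.ravel(), ys.ravel()])
    return Path(outline).contains_points(pixels).reshape(h, w)
```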

[Figure slides: scribbles vs. outlines; examples of outlines from a “Wild West” base of photographs; outline, basic features and matching.]


More outline-based features

Lengths of radii from the center of gravity; curvature; curvature scale space; bitmap of an outline; absolute Fourier transform |FFT|. Others (not tried yet): wavelets, Freeman coding.

[Figure slides: outline features (coordinates, running angle (cos(f), sin(f)), radii, |FFT|); outline examples from the motor-bicycle set; a motor-bike engine; image (pixel-based) features; matching possibilities.]
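A minimal sketch (my own illustration, not the original implementation) of how several of these outline features could be computed; the resampling length and the scale normalization of the |FFT| are assumptions:

```python
import numpy as np

def outline_features(outline, n_points=64):
    """Radii, running angle and |FFT| features of a closed outline."""
    # Resample to a fixed number of points so feature vectors
    # from different outlines are comparable.
    t = np.linspace(0.0, 1.0, len(outline))
    ti = np.linspace(0.0, 1.0, n_points)
    x = np.interp(ti, t, outline[:, 0])
    y = np.interp(ti, t, outline[:, 1])

    # Lengths of radii from the center of gravity.
    cx, cy = x.mean(), y.mean()
    radii = np.hypot(x - cx, y - cy)

    # Running angle, encoded as (cos(f), sin(f)) of the tangent.
    dx, dy = np.gradient(x), np.gradient(y)
    f = np.arctan2(dy, dx)
    running_angle = np.column_stack([np.cos(f), np.sin(f)])

    # Absolute Fourier transform of the complex contour; coefficient 0
    # carries translation, so it is dropped, and dividing by |F[1]|
    # removes scale.
    z = (x - cx) + 1j * (y - cy)
    spectrum = np.abs(np.fft.fft(z))
    fft_abs = spectrum[1:] / (spectrum[1] + 1e-12)

    return radii, running_angle, fft_abs
```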


Annotation

After the user has produced an outline (by pen or mouse), it is fruitful to ask for a text label (keyboard, speech, handwriting)

Knowledge of semantics can be exploited to guide the user (e.g., with menus)

[Figure slides: the annotation tool; initial results.]


Problems in performance measurement

These systems usually have the goal of returning a list of similar-looking images

What is good? What is bad?

No clear-cut definition of ‘class’, unlike speech and handwriting recognition

Performance measurement is borrowed from Information Retrieval: Precision & Recall
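For concreteness, a sketch of the two borrowed measures for a single hit list (the set-based representation is my own):

```python
def precision_recall(hit_list, wanted):
    """Precision & recall of one hit list, as borrowed from IR.

    hit_list: ids of the retrieved images
    wanted:   set of ids of the relevant images in the ensemble
    """
    hits = sum(1 for image in hit_list if image in wanted)
    precision = hits / len(hit_list) if hit_list else 0.0
    recall = hits / len(wanted) if wanted else 0.0
    return precision, recall

# 10 retrieved, 4 of them wanted, 20 wanted in total:
# precision = 0.4, recall = 0.2
p, r = precision_recall(list(range(10)), set(range(6, 26)))
```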


Ensemble vs Hit list vs Wanted


Precision of a hit list


Precision of a hit list: accidental or real?

Example (the slide's own worked figure is not reproduced in this transcript):
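One standard way to make this judgment (my sketch, not necessarily the slide's own calculation) is the hypergeometric tail probability: the chance of finding at least k relevant images in a hit list of n drawn at random from an ensemble of N images of which R are relevant:

```python
from scipy.stats import hypergeom

def chance_precision(N, R, n, k):
    """P(at least k relevant images among n random draws from an
    ensemble of N images containing R relevant ones)."""
    # sf(k - 1) = P(X >= k) for the frozen hypergeometric distribution.
    return hypergeom(N, R, n).sf(k - 1)

# Ensemble of 1000 images, 50 relevant: 4 relevant images in a
# hit list of 10 is very unlikely (~1e-3) to be accidental.
p = chance_precision(1000, 50, 10, 4)
```

If this probability is small, the observed precision is real rather than accidental.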


Intermediate summary

Outline-based search yields promising results. Many questions remain:

Can users do it? Do they like to perform outlining + annotation?

Is the 'bootstrap' idea valid: can the outlines be used for matching with unseen images?

Can users produce outlines?

Object classes: locomotive, Christmas tree, atomic explosion, jukebox, 4-wheel-drive car, brain, motor bike, pistol, Buddha, stop sign

User (N=33) differences in outline production


Multistable outlining behavior?

Locomotive: with or without smoke?

Accurate or sloppy curvature followers

Observations

Ambiguities in outlining

Cluster analysis resolves the outline variation
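One plausible implementation of this step (a sketch assuming fixed-length outline feature vectors, e.g. the |FFT| features above; the linkage method and distance threshold are assumptions):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

def cluster_outlines(feature_vectors, max_dist=1.0):
    """Group outlines drawn by different users around the same object.

    feature_vectors: (n_outlines, n_features) array; outlines whose
    features lie within max_dist of a cluster are merged, absorbing
    multistable and sloppy-vs-accurate variation.
    """
    Z = linkage(pdist(feature_vectors), method='average')
    return fcluster(Z, t=max_dist, criterion='distance')
```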

Outline-based image retrieval system

Outlines XY (colored) vs Edges ΔI (grey)

Match operator, for each point i on the outline: [formula not reproduced in this transcript]
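A common operator with this shape is chamfer-style matching, in which each outline point i is scored by its distance to the nearest edge pixel; the following sketch uses that stand-in, since the original formula is not available here:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def outline_edge_match(outline, edge_map):
    """Chamfer-style match between an outline and an edge image.

    outline:  (N, 2) array of integer XY points
    edge_map: boolean image, True where a strong delta-I edge was found
    Returns the mean distance from the outline points to the nearest
    edge pixel; lower values mean a better match.
    """
    # Distance from every pixel to its nearest edge pixel.
    dist_to_edge = distance_transform_edt(~edge_map)
    xs, ys = outline[:, 0], outline[:, 1]
    return dist_to_edge[ys, xs].mean()
```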


Outline vs Edge search results

Caveat: no translation, orientation or scale invariance (early results)

More use for outlines: class-specific edge detectors

Generic edge detection

Edge detector (MLP), trained with outline points from the motor-bicycle base as targets for the output neuron
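A minimal sketch of such a class-specific detector (my reconstruction; the patch size, network size and exhaustive pixel scan are assumptions, and in practice the overwhelmingly many background pixels would be subsampled):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_outline_edge_detector(images, outlines, patch=5):
    """Train an MLP whose output neuron fires on class-specific edges.

    images:   grey-value 2-D arrays (e.g. motor-bicycle photographs)
    outlines: per image, an (N, 2) array of integer XY outline points;
              pixels on a user-drawn outline are the positive targets.
    """
    half = patch // 2
    X, y = [], []
    for img, outline in zip(images, outlines):
        positives = set(map(tuple, outline))
        h, w = img.shape
        for yy in range(half, h - half):
            for xx in range(half, w - half):
                X.append(img[yy - half:yy + half + 1,
                             xx - half:xx + half + 1].ravel())
                y.append(1 if (xx, yy) in positives else 0)
    net = MLPClassifier(hidden_layer_sizes=(20,), max_iter=200)
    return net.fit(np.array(X), np.array(y))
```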


User input is highly valuable

Labeled outlines are needed to train classifiers/matchers

Labeled outlines are needed to develop benchmark sets (like “BenchAthlon”)

Examples from other fields:

Unipen: 5 million characters

NIST: millions of characters

LDC: thousands of hours of labeled speech


User input is highly valuable

openMind arguments (Stork, 1999; 2001)

The best teams have the largest labeled training sets

Differences between algorithms vanish when huge training sets are used (Ho & Baird, 1997)

Processor speed can be exploited if sufficient amounts of data are used (free ride on Moore’s Law)


The Vind(X) site (Schomaker & Vuurpijl)

The experience collected thus far has been integrated into a functional Web site for image search and collaborative annotation

In collaboration with the Rijksmuseum Amsterdam: a large image base of paintings, with their descriptions in a text base


The Vind(X) site (Schomaker & Vuurpijl)

Site: http://kepler.cogsci.kun.nl/vindx/

The site will become part of the openMind initiative: http://www.openmind.org

The system consists of Java/JavaScript WWW pages, with server-side pattern recognition in C

Vind(X) has extensive search and rendering functions

[Figure: the Vind(X) system with the paintings database of the Rijksmuseum Amsterdam; the query at the upper left is “sitting man” (Schomaker & Vuurpijl, 1999). http://kepler.cogsci.kun.nl/vindx/]


Outline results for one user


More questions: open user access

How to detect non-cooperative outlining and annotation?

How to merge ‘identical’ outlines?

How to merge 'identical' textual annotations?

How to detect valuable expert input?


More questions: semantics and geometry

How to achieve ‘explainable’ image hit list results?

Make sure the underlying features are based on human perception

Hypothesis: “The construction of ontologies based on both semantics and feature- space characteristics will help in producing ‘explainable’ hit list results”

[Figure: example of an ontology created from all collected object annotations, with nodes such as fruit, plant, inanimate, creature.]

A relation between semantic classes and contours?


Summary

Existing systems have problems in usability

Knowledge about the user (ergonomics, perception, cognition) may help substantially

Objects are a preferred search criterion

Object-based approaches have a strong connection to semantics


Summary (continued)

An outline-based object search system was presented

The prototype was converted to a Web site with real content: Dutch paintings (> 80)

The site is used for collecting human annotations of this image base (> 1000)

The resulting data are very useful for future research in a number of areas: IR, outline matching, pixel matching, dedicated preprocessing