
Master Thesis

Detecting High Interest in the Classroom through Non-verbal Learner Behavior

Supervisor Professor Michihiko Minoh

Department of Intelligence Science and Technology

Graduate School of Informatics

Kyoto University

Gary Jay Coffman

July 27th, 2009



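Summary

In recent years, faculty development (FD) has been spreading in higher
education, and since April 2008 FD has also been mandatory in undergraduate
university education. Various efforts are being made to improve classes. Among
them are attempts in which instructors reflect on their own teaching by
watching recorded video of their classes, or in which several faculty members
discuss class management while watching such video. However, a university
class lasts 90 minutes, and watching all of it takes the same amount of time,
so it is necessary to detect the points that are worth examining. In addition,
because students’ reactions and state are important when considering how to
improve a class, video of the students, and not only of the instructor, is
important.

This study regards scenes in which students are concentrating as important
when discussing class management, and aims to extract such scenes from image
information about students’ postures.

The lean of the body, the tilt of the neck, and the position and state of the
hands are taken as posture features, and the state of each posture feature for
each individual student is determined manually from the images. Two class
states are defined for the students as a group: “high interest” and
“undefined”.

Data on individual students’ posture features and on the class state were
collected for three classes, and a decision tree algorithm was used to attempt
to detect the class state from the posture feature data. As a procedure, rules
were derived by machine learning on the data from the first half of each
class, and those rules were then applied to the data from the second half.

As a result, the “high interest” state was detected at rates of 81.3%, 82.7%,
and 75.6% in the three classes, showing that the state of a group of students
can be grasped with this method and procedure. In this study the posture
information is judged manually; a remaining task is to recognize posture
information with image processing techniques so that the state of a class can
be judged using larger amounts of data.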


Abstract

Relying solely on video data, this thesis proposes a method for identifying

learners who display a high level of interest in the content presented in a lecture

for the purposes of professional development.

The study employs a list of non-verbal heuristics for each learner as they are

captured by classroom video footage. The heuristics are selected on the basis

that they can be collected in a binary format and, when combined, are indicative

of the nature of any verbal behavior as well as the moment-to-moment postures

of each learner involved.

Using these heuristics, a model is composed for interpreting learners’
non-verbal behavior with regard to interest in the lectures they attend. Data
is collected at one-second intervals to represent specific postures maintained
by each learner throughout a lecture. The high-interest moments are defined and
verified through third-party evaluation, and posture features are mined for
correlation with moments of perceived “high interest” in the lecture using a
decision tree algorithm. The data model proposed is based on the rules output
by decision tree analysis, which are used to predict moments of “high interest”
in unmined portions of lectures.

The proposed method detects high interest in three different lectures with
precision rates of 81.3%, 82.7%, and 75.6%, respectively, and is intended for
application to an automated form of learner state detection.
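The pipeline summarized above (binary posture samples, rules learned on one portion of a lecture, precision measured on the unmined remainder) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis’s implementation: the feature names and toy data are made up, and a single entropy-based split (a one-level “stump”, in the style of ID3-like trees) stands in for whichever decision tree algorithm the thesis actually uses.

```python
import math
from collections import Counter

# Hypothetical binary posture features, one row per one-second sample.
FEATURES = ("leaning_forward", "head_tilted", "hand_on_face")

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, labels, index):
    """Entropy reduction obtained by splitting on the binary feature at `index`."""
    gain = entropy(labels)
    for value in (0, 1):
        subset = [lab for row, lab in zip(rows, labels) if row[index] == value]
        gain -= len(subset) / len(labels) * entropy(subset)
    return gain

def train_stump(rows, labels):
    """Pick the most informative feature and predict the majority label per value."""
    best = max(range(len(rows[0])), key=lambda i: information_gain(rows, labels, i))
    rule = {}
    for value in (0, 1):
        subset = [lab for row, lab in zip(rows, labels) if row[best] == value]
        # Fall back to the overall majority if a value never occurs in training.
        pool = subset or labels
        rule[value] = Counter(pool).most_common(1)[0][0]
    return best, rule

def precision(predicted, actual, positive="high"):
    """Fraction of samples predicted as `positive` that really are `positive`."""
    hits = [a for p, a in zip(predicted, actual) if p == positive]
    return sum(a == positive for a in hits) / len(hits) if hits else 0.0

# Made-up samples in chronological order (not data from the thesis).
rows = [
    (1, 0, 0), (1, 1, 0), (0, 0, 0), (0, 1, 1),   # earlier portion: training
    (1, 0, 1), (0, 0, 0), (1, 1, 1), (1, 0, 0),   # later portion: held out
]
labels = ["high", "high", "undefined", "undefined",
          "high", "undefined", "high", "undefined"]

half = len(rows) // 2
best, rule = train_stump(rows[:half], labels[:half])
predicted = [rule[row[best]] for row in rows[half:]]
score = precision(predicted, labels[half:])
print(FEATURES[best], rule, round(score, 3))
```

Splitting chronologically rather than at random keeps the held-out samples strictly later than the training samples, which mirrors applying learned rules to a portion of a lecture that has not yet been mined.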



Contents

Chapter 1 Introduction

Chapter 2 ICT in Education: A Literature Survey
2.1 Information Communication Technology in Higher Education
2.1.1 Improving Higher Education
2.1.2 Current ICT Tools in Higher Education
2.2 Difficulties of Improving Education with ICT
2.2.1 Challenges of Improving Higher Education
2.2.2 Availability of Technology
2.3 Model Development Approach
2.3.1 Importance of High Interest
2.3.2 Previous Works
2.3.3 Requirements Engineering
2.3.4 Study Objective

Chapter 3 Learner Behavior Model for the Classroom
3.1 Elements of Model
3.1.1 Posture Features Collection
3.1.2 High Interest
3.1.3 Undefined
3.2 Proposed Model
3.2.1 Machine Learning Techniques
3.2.2 Model Objective and Makeup
3.2.3 Determining Group High Interest
3.2.4 Verifying Group High Interest
3.2.5 Data Mining Algorithm
3.3 Testing the Methodology

Chapter 4 Model Validation
4.1 Data Gathering
4.1.1 Filming Environment
4.1.2 Lecture Video Data
4.2 Experiments
4.2.1 Objectivity of High Interest
4.2.2 Attribute Values
4.2.3 Analyzing Decision Tree Rules
4.2.4 Testing for an Affective State
4.2.5 Attribute Importance
4.3 Discussion of Experimental Results
4.3.1 Application of Results
4.3.2 Generalizing the Model

Chapter 5 Conclusions
5.1 Conclusions
5.2 Future Works

Acknowledgments

References

Chapter 1 Introduction

In recent years, efforts to apply information communication technology
(henceforth referred to as “ICT”) to learning environments have increased. In
most cases, this technology is used to add another facet to learning by
exploiting the audiovisual stimuli and hands-on nature of the Internet and
educational software[1]. The introduction of this technology has improved not
only the accessibility of educational media but also the learning environments
that incorporate them.

One field to which ICT has contributed many applications is professional
development. In this field, technology can be used not only as a tool for
enhancing the learner’s experience and modifying learning environments, but
also for improving the quality of education. Approaches in professional
development seek to improve educational content and learners’ experiences;
technological applications often take the perspective of educational
administrators, if not the instructors themselves, and make such improvements
easier to achieve.

This thesis proposes a method of using ICT to further professional, or
faculty, development in higher education. The method centers on using existing
video technology, already employed for filming and uploading university
classes, to analyze the behavior of learners in a classroom setting. The
purview of this study is limited to learners’ non-verbal behavior: a cluster of
learners is observed for reactions to the instructor and/or content, and each
learner’s actions are analyzed individually and then interpreted. Non-verbal
behavior is broken down into features, combinations, and patterns of the seated
postures of each learner. The relationship between this learner posture data
and the occurrence of these perceived reactions is examined and used as a basis
for creating the learner behavior model proposed in this study.

The active state of learners that is sought in this study is that of “high
interest”. The learners featured in this thesis are observed as a group during
several classes of one course (all taught by the same instructor). Moments
where the group of learners appeared highly interested in what was being taught
and/or presented are recorded and verified via third-party evaluation based on
criteria for what is deemed “high interest”, and the behavioral metadata is
modeled to find the relevance of posture features to those moments. The
behavioral model is tested by assessing how well it can detect “high interest”
among learners in a class separate from those used for data gathering.

Chapter 2 ICT in Education: A Literature Survey

2.1 Information Communication Technology in Higher Education

2.1.1 Improving Higher Education

ICT has enhanced modern education and allowed it to progress in new
directions. One of the new ways in which education is being promoted is through
globalization, which is facilitated more and more as information and
communication technologies evolve. However, the concept of globalization, as
used in an educational context and among pedagogical scholars, refers more to
the expansion and thinning of educational objectives than to a growing
availability of educational resources[2].

The objectives for “globalized” educational institutions include investment or
financial growth, a broadened curriculum, and increased networking
opportunities for the purpose of maintaining resource and materials costs. For
higher education, most of the objectives associated with globalization, aside
from increasing awareness of international competition, have a limited or
negative effect on educational quality. Higher education administrators are now
faced with the challenge of maintaining a realistic and high standard for
education at their respective institutions to counteract the continued effects
of globalization. Educational institutions continue to have a responsibility to
provide education of the highest possible quality. Higher educational
institutions especially have an obligatory role as cultivators of human
resources for administrative, managerial, and even political positions. As
economic climates change, the demand for raising the standards of higher
education is well documented in government legislation, and the close
association between education and economic and/or governmental policies is
evident. The emphasis on this type of quality maintenance in higher education
has brought educational quality assurance to the forefront of the education
administrative and pedagogical fields. In quality assurance, the accountability
of educational institutions for cultivating adults who are well prepared to
contribute to society is at issue as a motivating factor, as is competition,
whether from institutions that are close in discipline and/or location or from
statistics that reflect an overall standard[3].

2.1.2 Current ICT Tools in Higher Education

Synchronous E-Learning

Ongoing studies of learner behavior include those based on distance learning
lectures. Video data remains from several international distance courses
between Kyoto University, National Taiwan University, and University of
California, Los Angeles. A number of studies have been conducted that use
in-class learner behavior to gauge the difficulty of lecture content or
materials [4].

In the studies of Taiwan and Kyoto Universities’ distance lectures, learner
affective states during the actual lectures are not sought. However, learner
attitudes with regard to the courses are evaluated using ICT tools such as
learner-written blogs[5]. Although learner affective states are sought in the
courses between University of California, Los Angeles and Kyoto University,
they are sought exclusively through learner behavior[6].

With the original objective of finding additional uses for video footage of
such learner galleries, this study began by exploring the correlation between
the number and frequency of verbal interactions and the level of learner
interest in lecture content. It focused specifically on the verbal behavior of
learners attending the lectures of a distance course between University of
California, Los Angeles and Kyoto University. However, the contrast between the
verbal behavior of learners from each of the participating universities proved
too great[7]. More specifically, verbal actions and interactions in the Kyoto
University classroom were much fewer than those originating in University of
California, Los Angeles.

As such, the above studies, although formulated with different objectives,
integrate the resources for finding learner affective states. It is also
noteworthy that footage of the instructors themselves is available for
synchronized viewing with the student gallery video, which makes it easier to
relate learner behavior back to the lecture content and therefore makes the
footage effective for professional development.

Asynchronous E-Learning

With the advent of distance learning and video classroom archiving, learning
institutions have turned to recording videos of classes for purposes ranging
from making them available to the public (e.g. open courseware, iTunes U) to
storing them simply for reference. However, most of these recordings focus
exclusively on the instructor. Using the same technology, some institutions
record videos of learner galleries, which are used for automatic
attendance-taking and for following learner verbal interactions (as in the case
of distance lectures)[8][9].

2.2 Difficulties of Improving Education with ICT

2.2.1 Challenges of Improving Higher Education

Although this study focuses on higher education in a specific context, i.e. an
undergraduate course of Kyoto University lectures, it is important to
conceptualize higher education in a much broader context in order to comprehend
the need for reassessing the merits of improving it on a fundamental level.
Therefore, this thesis defines higher education in terms of the levels of
ability and/or knowledge that an educational body intends its pupils to attain
in order to graduate.

Unused Lecture Footage

Videos of lectures, although used for little other than storing
instructor-focused footage[10] for the purposes of recording lecture content
and facilitating distance learning, have the potential to provide important
information for instructors and educational institutions. Reviewing such
classroom videos gives the viewer potential insight different from that of any
vantage point in the classroom. This is especially true when compared to
observing a class of learners while it is being taught. Watching a class of
learners via video gives the viewer an opportunity to focus on a lecture’s
moment-to-moment events, which can be viewed as many times as necessary.
However, this video data continues to go unused in spite of these potential
benefits.

Assessment of Learners’ Comprehension

One way higher education separates itself from other fields of education is
through its objective to base learner evaluation not only on an array of
educational criteria, which today are usually set and met through the
assignment of tasks or examinations, but also on resulting behaviors and/or
attitudes. Although several approaches are employed throughout academic
institutions to evaluate characteristics of learners that may not be apparent
in assignments or test scores, a unified approach based on a widely recognized
educational theory eludes them[11]. Information and communication technology
has gained acknowledgement in the field of education for its ability to create
means for evaluating behavior and/or attitude prior to graduation, toward the
goal of establishing a theoretical framework.

Now that those in academic communities have the resources to utilize the
latest ICT, they are making efforts to explore ways to use it to improve the
content and styles of their lectures as well as the evaluation of their
effectiveness as instructors[12]. Along with adding to the challenge of keeping
pace with the ever-rising standards of teaching in higher education, the rising
accessibility of ICT has richly contributed to the study of the improvement and
optimization of teaching methods, known as the field of professional
development.

Many different objectives and problems are addressed in the field of
professional development as it pertains to higher education. For example,
long-term objectives, such as preparing a class of learners to pass a
standardized exam with a minimum average score, are conceived to motivate
instructors, as are short-term goals, such as gauging learners’ in-class
attitudes and how they are relevant to what is being taught. Professional
development can be individually initiated as well as facilitated by faculty
seminars and the like. The ultimate objective of faculty development, in a more
general context, is to examine ways to understand how much a learner has
learned, and ICT accommodates instructors’ search for these outcomes[13].

With this newly distributed technology, namely classroom archiving, the

question of what approaches to professional development should be used re-

mains. There has been a recent shift among professional development scholars

from instructor-focused to learner-focused research where learner behavior in

reaction to what is being taught is gaining more attention than the evalua-

tion of instructors themselves. However, information on how learners react to


instructors and content is very limited, and obtaining it through monitoring

remains a difficult task.

Viewing Learner Reactions in the Classroom

While instructors have clear methods and criteria for evaluating learners’
levels of comprehension of course content, effective methods for evaluating
their own lectures, in other words, assessing how well lecture content reaches
learners, remain scarce. As mentioned in Section 2.1.2, ICT has provided new
means for assessing how lecture content and teaching styles affect learners.
However, proposing the positive application of these means rests on the premise
that instructors have access to the resources and have the free time to
visually review and analyze videos of themselves and/or attending learners’
reactions[14].

Conversely, when lectures are in session, instructors are able to perceive

how many learners in the classroom can follow what is being taught and/or

how much of a lecture is generally reaching their learners. Although this is

ideal for instructor professional development, there are limits to how much an

instructor can observe in the act of teaching in terms of span of sight and pre-

occupation with other tasks[15]. In addition, instructors are unable to create or

maintain any record of learners’ reactions as they occur in the lecture because

they are preoccupied with teaching responsibilities. On a more fundamental

level, this is mostly due to logistical problems, such as time constraints, which

can be overcome using automated in-class evaluation of learner reactions just

as Scantron technology has made automatic examination result evaluation and

analysis possible. However, the challenge of not only identifying learners’ affec-

tive signals (with relation to the lecture), but also expressing them in the form

of numerical data in preparation for automation must be met in order to realize

this concept.

Interpreting Learner Reactions

There are constant and cross-disciplinary discrepancies within the defini-

tion of the states of subjects (such as affective states, psychological states).

Although, fundamentally, there is no sound way to positively identify a single

state or expression with confidence, observational methods can lead to more

accurate and verifiable hypotheses, and even diagnoses[16]. These methods are


often employed by instructors inside and outside of the classroom.

Even when instructors have access to video footage of lectures where their

learners can be clearly observed, creating and verifying the criteria for inter-

preting their reactions remain, in theory, difficult to execute, therefore making

analysis and evaluation nearly impossible[17]. This difficulty is compounded

by that of identifying characterizations of learner reactions with their emotions

or attitudes toward the class content or instructor. Originally in professional

development, affective data for both inside- and outside-class states was sought

using questionnaires. The results from this method, however, have always been

limited to the preliminary step of analysis with learner performance data, such

as marks, rather than the affective data. Visually confirming the correspon-

dences of learners’ reactions inside and outside of class with an output, such as

interest level, has been a challenge for pedagogical scholars[18].

Finding Relevant Footage of Learners

Especially with video footage featuring more than one subject, it is difficult
to assess which sequences of video footage are relevant to determining
affective states[19]. In a lecture setting, automatically detecting sequences
where certain affective states are featured can provide support for instructors
who want to improve their lectures but do not have the time to review such
footage. Extracting only such sequences has the potential to save instructors’
time when reviewing their own lectures.

2.2.2 Availability of Technology

Access to ICT in higher education depends greatly on the extent to which
technology is integrated into the surrounding society. This can be reflected in
how technology is used in government and infrastructure, as ICT tends to be
measured, using qualitative and quantitative data, in terms of how much it is
used for general welfare[20]. Common factors used for collecting said data are
how often technology of this sort is accessed, how affordable it is, and the
availability of content that has been generated locally, as well as qualitative
factors such as the extent to which technology is incorporated into civilians’
lives, the perception of how secure and/or trustworthy it is, and government
involvement in its development.

One indication of how prevalent information and communication technology is in
higher education, as a specific realm, is the national and/or regional presence
and commonality of distance education programs. Along with the above factors,
the possibility that the purposes and benefits of distance education, as well
as its technological makeup, can be misunderstood accounts for a clear lack of
demand for the necessary resources.

2.3 Model Development Approach

2.3.1 Importance of High Interest

The concept of “high interest” is pursued by this study as an affective state
that is both achievable by students and desired by instructors in a classroom
setting. Essential learner affective states, like high interest, are sought in
professional development as an indication of the presence of engaging lecture
content and/or teaching styles. Instructors, therefore, have the potential to
benefit from the identification of high interest. These benefits, and the
ability to recognize high interest, broadly referred to as “interest” in
previous works, are widely referenced and discussed in professional development
studies as well as other disciplines.

2.3.2 Previous Works

Studies of non-verbal behavior as it occurs specifically within a classroom
setting, let alone studies that rely solely on video data, have gained little
attention from scholars in both technological and social fields. Therefore,
studies of non-verbal behavior in learning environments in general are drawn on
to support this study’s methods for observation and analysis.

Although one such study does not rely on video data, it is noteworthy for its
methods for labeling postures and for its objective[21]. Mota and Picard’s
study aims to create an electronic learning companion (in the form of a
computer avatar) that can react and interact according to users’ emotional
states, and data collection takes place for one subject at a time. This enables
the authors to utilize a more costly technology (pressure sensors) for
determining the postures of each subject. Although the above objective and
technology are not applicable to the study for this thesis, its methods for
matching postures and posture sequences to learner interest level serve as a
foundation for labeling learner postures and finding correlations between them
and interest level (Table 1).

Table 1: Affective state classification rates for posture features[21]

Static Posture                 Percentage of Classification
Leaning Forward                96.68%
Leaning Forward Left           80.02%
Leaning Forward Right          76.65%
Sitting Upright                93.21%
Leaning Back                   90.91%
Leaning Back Left              79.86%
Leaning Back Right             89.43%
Sitting on the Edge of Seat    91.91%
Slumping Back                  90.12%
Average                        87.64%

Mota uses a set of independent Hidden Markov Models (HMMs), each of which is
linked to a sequence of postures. The probability that the actual posture
sequence was produced by an HMM that represents a certain affective state is
computed. Each posture sequence is then classified according to the HMM
producing the highest probability.
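The classification rule described here, scoring a posture sequence under one HMM per affective state and choosing the state whose model gives the highest probability, can be sketched with the standard forward algorithm. The sketch below is an illustration only: the two model names, the three posture symbols, and every probability are made-up assumptions, not parameters from Mota and Picard’s study.

```python
def forward_likelihood(observations, start, trans, emit):
    """Probability of an observation sequence under one discrete HMM.

    Uses the forward algorithm: alpha[s] is the probability of the
    observations seen so far with the hidden chain currently in state s.
    """
    n_states = len(start)
    alpha = [start[s] * emit[s][observations[0]] for s in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][obs]
            for s in range(n_states)
        ]
    return sum(alpha)

def classify(observations, models):
    """Return the name of the model that assigns the sequence the highest likelihood."""
    return max(models, key=lambda name: forward_likelihood(observations, *models[name]))

# Hypothetical posture symbols: 0 = leaning forward, 1 = sitting upright, 2 = leaning back.
# Each model is (start probabilities, transition matrix, emission matrix); values are invented.
models = {
    "high_interest": (
        [0.8, 0.2],
        [[0.7, 0.3], [0.4, 0.6]],
        [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2]],
    ),
    "low_interest": (
        [0.3, 0.7],
        [[0.6, 0.4], [0.2, 0.8]],
        [[0.2, 0.5, 0.3], [0.1, 0.2, 0.7]],
    ),
}

attentive = [0, 0, 1]   # forward, forward, upright
reclined = [2, 2, 2]    # leaning back throughout
print(classify(attentive, models), classify(reclined, models))
```

For long sequences the raw probabilities underflow, so a practical version would work with log-probabilities or rescale `alpha` at each step; that refinement is omitted here for brevity.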

Another such study is, conversely, founded on video data[22]. The study,
conducted by De Silva and Bianchi-Berthouze, is geared toward affective
computing rather than computer-aided learning. Human postures are duplicated
using a vast set of three-dimensional computer-generated avatars (that are not
seated). The study emphasizes the advantage of collecting posture features
rather than gestures and aims to recognize four emotions at a high rate based
solely on the positioning of each avatar. In the first phase, a group of actors
depicts the emotions, the depictions are collected via motion capture, and they
are then displayed to a set of third parties in order to determine the accuracy
with which each avatar expresses a single emotion. The third-party evaluators
assign not only a label to each position, but also an intensity (on a 5-point
scale) showing how well the pose represents the associated emotion. Although
the posture labels do not apply to the study of this thesis, the method for
collecting postures and verifying them (as well as the method used in Mota’s
study) is employed in the first proposed model to identify learner postures in
the classroom and correlate them to a single category, “highly interested”,
with a five-point intensity scale.
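One simple way to turn several evaluators’ five-point intensity ratings into a single “highly interested” label per moment is to average them and apply a cutoff. The sketch below assumes exactly that; the function name, the cutoff of 4.0, and the sample ratings are hypothetical and are not taken from either cited study or from this thesis.

```python
from statistics import mean

def high_interest_moments(ratings, threshold=4.0):
    """Return the moments whose mean evaluator intensity reaches `threshold`.

    `ratings` maps a moment (seconds into the lecture) to the 1-5 intensity
    scores given by each evaluator. The threshold is an illustrative assumption.
    """
    for scores in ratings.values():
        if any(not 1 <= s <= 5 for s in scores):
            raise ValueError("intensity scores must lie on the 1-5 scale")
    return sorted(t for t, scores in ratings.items() if mean(scores) >= threshold)

# Made-up ratings from three evaluators for four candidate moments.
ratings = {120: [5, 4, 4], 305: [2, 3, 1], 940: [4, 4, 5], 1800: [3, 4, 3]}
selected = high_interest_moments(ratings)
print(selected)
```

A mean with a fixed cutoff is only one option; requiring a majority of evaluators to rate at or above a given level would be an equally plausible aggregation rule.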

2.3.3 Requirements Engineering

This study proposes building a model where the relevant factors are initially

unknown, and therefore, several hypotheses are tested in order to arrive at

a rate of correlation. The process by which these correlations can be gener-

ated is known as requirements engineering[23]. The methods that are used for

testing factors, which are, in this case, learner posture features, are known as

ethnomethodological.

In requirements engineering, methods for how requirements should be elicited,
represented, and analyzed are developed and validated for the purpose of
identifying the requirements of a given system. Requirements engineering then
probes into the further transformation of requirements into the specific
details needed for design and, finally, implementation. For the purposes of
this study, requirements engineering is defined broadly as a set of activities,
structured, or “engineered”, for the creation and maintenance of model
requirements, expressible in document form. As such, the study sets out to
follow the theoretical framework of eliciting, analyzing, validating, and
finally documenting the requirements for the proposed model.

2.3.4 Study Objective

In the first step, elicitation, the video data of learners’ in-class behaviors
is mined for the ethnomethodological correlation of hypothetical requirements
with an affective state. In the second step, analysis, the hypothetical
requirements, in the form of low-level data, are analyzed using means ranging
from simple observation to automated information processing where data amounts
reach levels that are impractical for visual observation. The requirements are
viewed as potential attributes for the model and are examined for positive and
negative correlations as well as irrelevant and contradictory data. The third
step is to validate the model by testing the above attributes on separate sets
of data in an attempt to simulate possible future applications. Finally, in the
fourth step, the relevance of the set of requirements used for validating the
ethnomethodological model is documented in order to display the verification of
requirements hypotheses, discuss unexpected outcomes, and provide a basis for
discussion of future applications.

The requirements readied for elicitation have the potential to describe the
functions and tendencies of the proposed model, whereas the settings used for
experimenting with the model reflect how a user implements the model. Said
requirements and settings can be displayed and discussed in language ranging
from informal to semiformal and natural, as well as structured and logical,
which is consistent with more formally descriptive language. Along with the
capability to semantically express a model, requirements engineering has the
distinct ability to include abstracted processes based on the results of
processing. As the consistency of data is always subject to change, so should
the requirements of the model be. Particularly for ethnomethodological data
sets where consistency is lacking or its degree is unknown, requirements
engineering proves an appropriate method for determining attributes for a given
model by refining the relationships between the ever-changing requirements,
which are then applied to model design.

Although a generalized model, where the same requirements can be expected for
all settings and subjects, would be ideal for this study, it is unrealistic.
Generalization of a model is fundamentally reflected in the level or amount of
abstract processes within its construction. A model with these conditions that
also maintains a number of highly specific processes, known as a specialization
model, is only applicable to data where these specific processes meet a low
level of abstraction. When executing these processes using
requirements-elicitable software, algorithms for implementation that are
appropriate to the processing specificity and abstraction are to be carefully
considered and selected. Therefore, the model proposed for a given set of data
must be regarded as appropriate for it in logic and concept prior to
consideration of validation techniques, which precedes implementation.

A data process’s promptness for immediate use and level of abstraction can
both be obtained through requirements engineering. Due to its adaptable nature,
a generalized model, although unrealistic in this context, is an ideal
objective for requirements engineering, not only in the interests of conserving
time and resources via its reusability, but also because it can be widely
applied regardless of settings and factors not included in the data. A common
problem with quantitatively and statistically analyzing subjects for scientific
research is that any two sets of subjects are not likely to be alike, and
abstraction is an inevitable factor. In addition, even when there are common
characteristics, there is no guarantee that they will be apparent in the
available data (as mentioned in Section 2.1.2). Therefore, a method for
collecting and testing data for factors that are relevant to a certain context
is needed to compensate for outside influences that are excluded from the data
(such as experimental environments, backgrounds of subjects, etc.).

It is imperative that a requirement is established along with the properties

of the data model that are hypothesized to implicate it. The levels of data that

represent such properties can range from high to low and can be considered in

conjunction with other types of data. When building the model, it is, therefore,

possible to consider other aspects of the requirements, such as those represented

by time, reliability, and the like.

Following the identification stage, the model is to be designed considering

the properties’ relation to the requirement. The fact that these relationships

are subject to change throughout the course of implementing the model must

be considered during the design stage as well as the high probability that the

data used to build the model will not be static.

In the case of learning environments, the data must focus on the interactive aspects of the space [24]. Interactions can be defined and classified according to the requirements of the model. As with all ethnomethodologies, the data used for constructing a model are unlikely to remain constant as exterior factors change, nor are the relationships within the data likely to remain static.

As this study is no exception to the above restrictions and does not intend to create a “universal” model for detecting “high interest” among learners, it employs an ethnomethodological approach in which the requirements are sought by testing posture features on moments of “high interest” in order to predict and then detect when these moments are occurring within the timespan of a specific lecture course. The elements and design of the model are described in Chapter 3, and an evaluation of how the model performed when tested on a Kyoto University undergraduate class of learners is detailed in Chapter 4 and discussed in Chapter 5.

Chapter 3 Learner Behavior Model for the Classroom

3.1 Elements of Model

3.1.1 Posture Features Collection

The approach used for collecting learner posture features is designed to enable the expression of any visually confirmable seated posture, since affective states can be expressed and apparent through posture even from a seated position [25]. It is especially helpful that affective information can still be understood from non-verbal behavior in this position because it is one frequently observed for learners regardless of setting. For this reason, the characteristics that make up learner postures in this study are those observable above the waist.

As the model proposed in this study is intended for eventual use with computer vision, postures must consist of characteristics that are detailed, yet detectable using computer vision technology within a 3D space containing numerous human subjects and objects [26], such as a classroom. The level of detail of these characteristics is bounded by the threshold at which each characteristic ceases to be recognizable using computer vision. Meanwhile, each characteristic of a posture should be capable of, when combined with other characteristics, amounting to a thorough description of a single posture.

Just as postures themselves are composed of these smaller posture characteristics, posture features are determined by the single characteristics as well as by the characteristics working simultaneously to make up whole-body postures as they relate to an affective state. To support as wide a range of human subjects’ motions as possible, the body is categorized into three parts: the torso, the hands/arms, and the head. For each of these categories, a number of heuristics is assigned to describe possible positions or actions: five for the direction of the torso, six for hand and arm placement or activity, and four for head positioning.

With the objective of tracking human body postures and poses, the field of computer vision often considers the limitations of movement as a factor for classifying and estimating body positioning. Several factors contribute to the categorization of postures that commonly occur within a certain space or circumstance and those that are exceptional [27]. In the context of a classroom, it is important to consider the confines of the space that learners have and the space that is available for recording video data. This study considers these confines as they relate to the space used in the experimentation phase in order to classify possible postures, and a list of common posture features is made based on video observation within this setting.

Torso

The torso angle for each learner is determined in relation to the seat being occupied. “Leaning left” and “leaning right” are each present when the learner’s body axis deviates from perpendicular to the seat bottom by a substantial angle. “Leaning forward” and “leaning back” are likewise determined by the angle of the learner’s back relative to his/her seatback, when it deviates substantially from parallel. When none of the above criteria is in effect, the learner is deemed “sitting straight up” (Figure 1).
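The torso rules above can be sketched as a small classifier. This is only an illustration under stated assumptions: the thesis does not quantify a “substantial angle”, so the 15-degree threshold, the sign conventions, the function name, and the rule that the larger deviation wins when both axes exceed the threshold are all hypothetical.

```python
def classify_torso(side_angle_deg, forward_angle_deg, threshold_deg=15.0):
    """Return one of the five torso labels from two lean angles.

    side_angle_deg: deviation of the body axis from perpendicular to the
        seat bottom (assumed: positive = learner's right, negative = left).
    forward_angle_deg: deviation of the back from parallel with the seatback
        (assumed: positive = forward, negative = back).
    threshold_deg: hypothetical stand-in for a "substantial angle".
    """
    side = abs(side_angle_deg)
    forward = abs(forward_angle_deg)
    # Neither deviation is substantial: the default label applies.
    if side < threshold_deg and forward < threshold_deg:
        return "sitting straight up"
    # Assumption: when both deviations are substantial, the larger one wins,
    # so that exactly one torso label is produced per observation.
    if side >= forward:
        return "leaning right" if side_angle_deg > 0 else "leaning left"
    return "leaning forward" if forward_angle_deg > 0 else "leaning back"
```

Returning exactly one label keeps the torso category usable as a single nominal attribute in the later analysis.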

Hands/arms

“Hands on face” and “touching head” are distinguished using the hairline on each side of the learner’s head. When the hand comes in contact with any part of the head beyond the hairline, the learner is deemed “touching head”, and everything on the facial side of the hairline, including eyeglasses, is marked as “hands on face” when touched. “Writing” is confirmed when a writing utensil in a learner’s hand is actually seen touching a writing material on the desk. When either hand is seen inactive and on top of the desk, “hand(s) rested (on desk)” is marked regardless of the position and/or activity of the other hand. Finally, a hand not seen doing any of the above is deemed “hand(s) rested (off desk)”, which can occur simultaneously with any of the above labels (Figure 2).

Head

“Head up” and “head down” are determined by the angle of the facial plane in relation to the top of the desk. When the learner’s facial plane comes within a substantial angle of parallelism with the desktop, or less, “head down” is chosen, whereas “head up” pertains to all greater angles. “Head tilted right” and “head tilted left” are determined by the angle between the axis of the learner’s head and his/her shoulders. When it exceeds a substantial angle to the right (from the learner’s perspective), “head tilted right” is recorded, and “head tilted left” is recorded when the same criterion is satisfied on his/her left side (Figure 3).

Figure 1: Diagrams of torso positions used for rendering posture features (as viewed from the front and side). Panels: Sitting Straight Up, Leaning Forward, Leaning Back, Leaning Right, Leaning Left.

Data Coding and Formats for Mining

Posture characteristic tagging is done using a spreadsheet with exclusively binary data (Figure 4). After posture tags that are found insignificant are eliminated, 15 posture tags remain and are noted for each learner over a given period of time of a lecture.

When the interdependent data is eliminated, the data format used for input in the analysis phase changes to a 9-attribute system based on the original 15 posture tags (Table 2).

In the torso category, features are represented nominally as follows: S = Sitting up Straight, F = Leaning Forward, B = Leaning Backward, R = Leaning Right, and L = Leaning Left. For head raising, “U” denotes that the head is up, and “D” that it is down. One input, “N”, is added to the head tilting attribute to denote that no tilting is observed (where neither “R”, right, nor “L”, left, appears). The “N” input does not apply to head raising, as the head is only considered to be up or down.

Figure 2: Diagram of six hand/arm position types used for rendering posture features. Panels: Hand(s) Rested (On Desk), Hand(s) Rested (Off Desk), Writing, Elbow(s) Rested, Hand(s) on Face, Touching Head.

Table 2: Four posture categories detailing nine posture features

Torso Direction    Hand/arms                   Head Raising   Head Tilting
S, F, B, R, or L   Hand(s) on Face             Down or Up     Right or Left
                   Touching Head
                   Hand(s) Rested (off)
                   Hand(s) Rested (on Desk)
                   Elbow(s) Rested (on Desk)
                   Writing
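As an illustration of the change in format, the following minimal sketch collapses the 15 binary posture tags into the nine attributes (three nominal, six binary). The tag names, the dictionary layout, and the fallback to “S” when no torso tag is set are assumptions made for this example and are not taken from the spreadsheet used in the study.

```python
# Hypothetical tag names; the original spreadsheet columns are not reproduced here.
TORSO_CODES = {
    "sitting_straight_up": "S",
    "leaning_forward": "F",
    "leaning_back": "B",
    "leaning_right": "R",
    "leaning_left": "L",
}
HAND_TAGS = [
    "hands_on_face", "touching_head", "hands_rested_off_desk",
    "hands_rested_on_desk", "elbows_rested", "writing",
]


def to_nine_attributes(tags):
    """Collapse 15 binary posture tags (dict of name -> 0/1) into 9 attributes."""
    # Torso: five mutually exclusive tags become one nominal value.
    # Assumption: "S" is used when no torso tag is set.
    torso = next((code for name, code in TORSO_CODES.items() if tags.get(name)), "S")
    # Head raising: only up or down is considered, so there is no "N" value.
    head_raising = "D" if tags.get("head_down") else "U"
    # Head tilting: "N" marks that neither right nor left tilt was tagged.
    if tags.get("head_tilted_right"):
        head_tilting = "R"
    elif tags.get("head_tilted_left"):
        head_tilting = "L"
    else:
        head_tilting = "N"
    attributes = {
        "torso": torso,
        "head_raising": head_raising,
        "head_tilting": head_tilting,
    }
    # Hand/arm tags can co-occur, so each one stays a separate 0/1 attribute.
    for name in HAND_TAGS:
        attributes[name] = int(bool(tags.get(name)))
    return attributes
```

For example, a learner tagged as leaning forward, writing, and head down would be encoded with torso “F”, head raising “D”, head tilting “N”, and a 1 only in the writing attribute.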

Interdependent and Independent Posture Characteristics

Figure 3: Diagrams of head positions used for rendering posture features (as viewed from the front and side). Panels: Head Tilted Right, Head Tilted Left, Head Up, Head Down.

The data used in this study is collected regardless of the dependence of

posture characteristics on other characteristics. However, it is imperative that

at the stage where the correlation between the postures tagged in the collection

phases and high interest is analyzed and mined, this data is input such that

interdependent factors (posture characteristics) do not coexist.

Through observation of learners in a classroom setting, the basic planes and

angles at which the above posture tags are defined are examined (Figure 5).

Once one of these posture characteristics is discovered it is noted using binary

coding with “1” representing the presence of such characteristics, and “0”, an

absence thereof.

These nine heuristics, when collected in combination, offer a precise means of collecting posture features for each learner in every frame of the video and presenting them as attributes for the proposed model. The postures are collected and expressed in linear binary code for each time interval.

Figure 4: Example of input vectors and the corresponding (expected) affective state tags. Column headings: Date, Time, Seat, Posture Tags, High Interest.

Figure 5: Torso and head angles in lecture video footage

As these posture characteristics are based on individual video observation and analysis, it is the burden of this study to show that each of them occurs with significant frequency in the classroom (Figures 6-8). Posture characteristics may or may not appear depending on numerous factors, including learner individuality and classroom environment. However, for the purposes of this study, postures that do not appear with significant frequency are eliminated in the analysis stage.

Figure 6: Distribution of posture tags for October 22nd lecture

Figure 7: Distribution of posture tags for November 12th lecture

3.1.2 High Interest

The category of learner state sought by this study is “high interest”, which is defined, in the context of a lecture taught in a classroom, as manifesting itself in learners as the appearance or impression of listening especially intently to what is being taught or presented by the instructor(s). This entails, but is not limited to, the following reactive behaviors: taking notes, and making facial expressions with sight focused on the instructor(s) or the materials displayed, including nodding, smiling, laughter, and/or confusion. This definition is applied to clusters of learners rather than individuals, and these criteria are defined and clarified to the participants in the third-party evaluation stage of the experiment for this study.

Figure 8: Distribution of posture tags for December 3rd lecture

3.1.3 Undefined

The undefined learner state referred to in this study indicates a learner state that either does not fall under the above criteria for “high interest” or, for circumstantial reasons, cannot be identified or “defined”.

3.2 Proposed Model

3.2.1 Machine Learning Techniques

The correlation between posture feature attributes (and any combination thereof) and high interest is found through machine learning, where instances of data for each learner are used as input for training a system and, based on the trends learned in the training phase, the system predicts outcomes. As the desired outcome for this study, either high interest or undefined, makes this a classification problem, decision tree-based learning is employed to construct the data model necessary for predicting high interest. The decision tree algorithm is well suited to testing data using probabilities, assigned as attribute values, based on how often a single posture attribute or combination of posture attributes is correlated with high interest [28].

3.2.2 Model Objective and Makeup

As a study employing requirements engineering, the objective of the data model proposed in this study is summarized in two parts: a.) to designate an appropriate methodology for establishing and learning the correlation between posture features and high interest, and b.) to predict when moments of high interest are occurring in a lecture. The methodology validated in this study is intended for application to computer vision techniques and, therefore, must rely on forms of data appropriate for eventually detecting affective states automatically.

In order to achieve this, there must be a clear definition of the relationship

between moments of high interest and the nine posture attributes by render-

ing a correlation coefficient. Initially, this is explored through the collection

and mining of both high-interest scenes of lectures as well as the surrounding

undefined scenes.

3.2.3 Determining Group High Interest

A group of learners is observed over a span of time, and high-interest labels are initially assigned second by second to represent where the group appears to display high interest. The video screens used to display the video gallery, divided into two parts, are aligned to replicate the learners’ seat positions as closely as possible to how they appeared in the lecture. This positioning (of the learners’ seats and the video screens used for observation) is key to establishing the overall atmosphere of the group of learners as well as the technique for automatic detection when applying this study to computer vision [29]. Learners are then individually observed, and high-interest labels are inserted into each learner’s set of posture data.

3.2.4 Verifying Group High Interest

Third parties are employed to provide observations and data detached from the objectives of this study. However, third parties are asked to label high interest based on the criteria discussed in Sections 3.1.2-3.1.3. Third-party high-interest evaluators are also asked to assign “high interest” or “undefined” labels while observing the group of students for 10-second intervals. Second-by-second annotation was not possible due to time constraints on the part of the third-party evaluators.

Table 3: Total high-interest instances verified by third parties for three lectures

Date                                10/22/2008   11/12/2008   12/03/2008
Total No. of Learners               6            6            6
Total No. of Matched Instances      80           210          500
Avg. No. of Learners per Sequence   3.542        3.689        3.09

The high-interest labels are then paired with the labels prepared in the initial group high-interest observation stage. Finally, the number of students that constitutes a “group” is determined for each of the lectures observed in this study by averaging the number of learners for each high-interest sequence that matches between the initial stage and the third-party labels. The overall average number of learners in these matched sequences is 3.4 learners (Table 3). As each lecture contains the same number of learners observed for this study, it is deduced that for this set of lectures, three or more learners are needed to establish a “group” displaying “high interest”. In keeping with this, although the individual high-interest labels mentioned in Section 3.2.3 are employed in this study, they are summed for each second of the lecture recorded in the data. Therefore, high interest can be expressed numerically from 0 to 6 for each instance of the data set. The average number of learners per instance comes to 57% of the group (3.44/6 learners) for all three lectures. A threshold is therefore set at three learners, so that every instance with fewer than three learners is recorded as “undefined”, and those with three or more are labeled “high interest” (Figure 9).
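The averaging and thresholding described above can be illustrated with a short sketch. The averages are those listed in Table 3; the function name and the list-of-rows data layout are assumptions made for the example.

```python
# Average number of learners per matched high-interest sequence (Table 3).
AVG_LEARNERS_PER_SEQUENCE = [3.542, 3.689, 3.09]
LEARNERS_OBSERVED = 6

overall_average = sum(AVG_LEARNERS_PER_SEQUENCE) / len(AVG_LEARNERS_PER_SEQUENCE)
share_of_group = overall_average / LEARNERS_OBSERVED  # roughly 0.57


def group_labels(per_second_rows, threshold=3):
    """Label each second from the sum of individual 0/1 high-interest labels.

    per_second_rows: one list per second, holding one 0/1 label per learner
        (hypothetical layout).
    threshold: minimum number of learners that constitutes a "group".
    """
    return [
        "high interest" if sum(row) >= threshold else "undefined"
        for row in per_second_rows
    ]
```

With six learners, a second in which exactly three learners carry a high-interest label is therefore recorded as “high interest”, while a second with two is recorded as “undefined”.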

3.2.5 Data Mining Algorithm

As the crux of the methodology of this study, choosing an appropriate data mining algorithm is a key step in engineering the requirements for detecting high interest. During the course of research done to support this study, the collected data is tested on a number of algorithms (including ones that are not decision tree-based) using the data mining tool Weka©. The algorithm implemented in the model validation phase of this study outperformed all of the other algorithms tested.

Figure 9: Graph of High Interest for Six Learners

The purpose of the data model employed in this study is to automatically predict when moments of high interest are occurring in a lecture based on individual postures. This is achieved through a machine learning algorithm known as J4.8, which implements the latest revision (the 8th) of the C4.5 decision tree available to the public [30]. In outline, the tree is grown as follows:

1) An attribute value is assigned to a root node, and a corresponding branch is created.

2) Based on the data set, each branch is then divided based on probability, and the two products are treated as root nodes.

3) The above two processes are repeated, allowing the tree’s branches to grow.

4) When the leaf of a branch yields instances that completely correspond to one class, the branch stops growing.
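The choice of which attribute to split on in the steps above can be sketched as follows. C4.5-family trees are generally described as ranking candidate attributes by gain ratio (information gain normalized by split information); the sketch below is a simplified version of that criterion for nominal attributes only, without pruning or numeric thresholds, and it is not the Weka implementation. All function names and the row layout are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of class labels or attribute values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, labels, attribute):
    """Information gain of splitting on `attribute`, divided by split information."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    # Weighted entropy of the class labels after the split.
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    gain = entropy(labels) - remainder
    # Split information penalizes attributes that fragment the data.
    split_info = entropy([row[attribute] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, labels, attributes):
    """Pick the attribute with the highest gain ratio as the next node."""
    return max(attributes, key=lambda name: gain_ratio(rows, labels, name))
```

Applied recursively to each resulting partition until a partition contains a single class, this selection produces the branching behavior outlined in steps 1) to 4).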

As the algorithm is implemented in Java, the sets of data can be categorized according to object-oriented programming. The two possible outcomes of this categorization, “high interest” and “undefined”, are notated in binary code. Classification with the J4.8 algorithm is similar to that of the ID3 decision tree except for its pruning capabilities, which minimize classifier errors by eliminating attributes that result in sets of contradictory data [31].

3.3 Testing the Methodology

Once data from numerous scenes of varying lectures has been collected, ma-

chine learning is applied to determine weights of posture features and activity

levels as they correlate to high-interest and undefined scenes respectively. The

posture features are used as attributes and tested on data from separate scenes

of the lectures involving the same instructor, content, and classroom to min-

imize surrounding influences. The ability of the model to accurately predict

high interest is assessed using the percentage of correct classification resulting

from the test.

Chapter 4 Model Validation

4.1 Data Gathering

4.1.1 Filming Environment

In keeping with Kyoto University’s ethical standards, permission is sought to film classes, and the learners participating in the study are made aware that they are being filmed and are issued consent forms roughly detailing the period and purpose of filming. The data from two 90-minute lectures spanning one semester are used for this study, all subjects of which attend lectures within the same lecture hall layout (Figure 10).

Figure 10: Layout of lecture hall used for data collection. Labels: Instructor’s Desk, Electronic Whiteboard (two), Learners’ Seat, Screen, Target Learners.

Depending on the time of day the lecture takes place, learner attendance varies from week to week and often changes during the span of one lecture, creating many distractions toward the beginning of each class and complicating the makeup of the learner groups being observed. Therefore, observations are conducted for each class beginning 15 minutes after the instructors’ first utterances and ending 10 minutes prior to dismissal.

4.1.2 Lecture Video Data

Lectures are viewed at either two or three times the actual speed of the lecture (Figure 11). Data for this study is recorded at one-second intervals for observational purposes, as bodily gestures can be recognized by humans within one second [32]. Instances of “high interest” among the entire body of learners captured by the video are noted by marking the start time of a period where learners are seen showing apparently higher interest than is displayed in the majority of the lecture. These “high interest” scenes are isolated, and learners sitting in the front three rows of the classroom are selected for posture data collection (Figure 12). Video footage of instructors teaching these courses is also available and used (for reference to lecture content).

Figure 11: Four learners captured by one camera at one-second intervals

Figure 12: Subjects for data collection as captured by two classroom cameras

Prior to collecting or comparing data between learners, whether the learners are showing “high interest” or not is generally assessed by viewing the lecture video footage, based on the criteria specified in Section 3.1.2 (Figure 13). Each of these “high interest” moments is labeled as such, and all other sequences are labeled as “undefined”. Posture features for all learners are collected at one-second intervals and aligned with verified “high interest” and “undefined” labels. This is continued for three separate classes in October, November, and December, respectively, for six learners in each class.

Figure 13: Still photos of video data used for observational assessment of “high interest”: (i) sample scene of “high interest”; (ii) sample “undefined” scene

It is from these scenes, where “high interest” is hypothesized to be present,

that posture data is collected manually and the correlation between posture

tags and high interest is examined.

4.2 Experiments

4.2.1 Objectivity of High Interest

In order to verify the presence of high interest in the isolated scenes, third-party evaluation of “high interest” is performed for the entirety of the classes used in the experiment phase. The participants are asked to label 10-second segments of the lecture video as “high interest” or “undefined” based on their observations of the learners occupying the first three rows and the criteria described in Section 3.1.3. The third-party 10-second-interval data is then compared with the one-second “high interest” data created in the first stage of the experiment, which yields a mean match of 87.6% (Table 4).
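One plausible way to compare labels recorded at different intervals is sketched below. The exact matching procedure used in the study is not specified, so expanding each 10-second label over its ten seconds and counting agreeing seconds is an assumption, as are the function name and label encoding. The two per-lecture rates are those reported in Table 4.

```python
def match_rate(second_labels, interval_labels, interval=10):
    """Percentage of seconds on which two label sequences agree.

    second_labels: one 0/1 label per second (initial observation).
    interval_labels: one 0/1 label per `interval` seconds (third party).
    Assumption: an interval label applies to every second it covers.
    """
    expanded = [interval_labels[i // interval] for i in range(len(second_labels))]
    matches = sum(a == b for a, b in zip(second_labels, expanded))
    return 100.0 * matches / len(second_labels)


# Per-lecture matching rates reported in Table 4 and their mean.
reported_rates = [89.02, 86.18]
mean_match = sum(reported_rates) / len(reported_rates)
```

The mean of the two reported rates reproduces the 87.6% quoted in the text.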

Table 4: Rates for matching initial and third-party evaluations of “high interest”

Matching rates for each lecture
Lecture Date      10/22/2008   11/12/2008
No. of Students   6            6
Match             89.02%       86.18%
Total Time        3’10”        1’00”

4.2.2 Attribute Values

In the training stage of the experiment, three nominal attributes and six numerical attributes, all based on low-level posture data, are used to construct the model for three lectures, dated October 22nd, November 12th, and December 3rd, respectively. The lectures are all on the same topic from the same course curriculum and taught by the same instructor.

The data for six learners from each lecture is divided in half, with the first half used as the training set, which is learned using the J4.8 decision tree in Weka©. Prior to testing on the remaining half (henceforth referred to as the test set), decision tree rules (Figures 14-16) are rendered to assign values to each of the attributes, which allows for more precise classification of high-interest moments.
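The division into training and test sets can be sketched as a chronological split. The function name and the representation of an instance as a (second, seat) pair are assumptions for the example; the only property taken from the text is that the first half of each lecture's data is used for training and the remainder for testing.

```python
def split_half(instances):
    """Split time-ordered instances into (training set, test set).

    instances: list of per-learner, per-second records sorted by time
        (hypothetical layout). The first half trains the tree and the
        second half is held out for testing.
    """
    middle = len(instances) // 2
    return instances[:middle], instances[middle:]
```

For a 380-second excerpt with six learners, this gives 2,280 instances in total and 1,140 in each half, which is consistent with the instance count listed for the October 22 lecture in Table 6 if that count refers to the test half.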

4.2.3 Analyzing Decision Tree Rules

One of the features of the J4.8 decision tree, a C4.5 decision tree, is a pruning algorithm that counteracts overfitting by pruning inactive attributes (dealt with in the initial posture feature analysis of the data collection stage) as well as contradictory data within one set. Machine learning using the J4.8 decision tree allows for the creation of a distinctive model for the data used from each lecture. The size and depth of each tree vary with each lecture and its innumerable factors. This is evident in the size, depth, and number of leaves (Table 5) as well as in the posture features found in each lecture’s set of rules (Figures 14-16). The three lectures yield an average of 83.9% correct classification on the training data.

Table 5: Decision tree analysis and rates of classification for the training phase

Machine Learning Results
Lecture Date             October 22   November 12   December 3
Tree Depth               7            7             6
Number of Leaves         31           37            41
Size of Tree             48           59            68
Correct Classification   88.4%        81.8%         81.4%

Table 6: Precision and recall rates for three separate lectures

                                October 22   November 12   December 3
Total Instances                 1,140        1,668         3,540
Total Time of Lecture           6’20”        9’16”         39’20”
Total High Interest             498          1,274         3,006
Total High Interest Precision   81.3%        82.7%         75.6%
Total High Interest Recall      91.0%        96.2%         80.5%

The amount of relevance and contradiction of each of these posture features

is detailed in the set of rules generated in the training phase.

4.2.4 Testing for an Affective State

The test set, made up of the remaining half of the data, is used to evaluate how accurately Weka© can predict high interest for each lecture based on the corresponding set of rules.

\[
\text{Precision} = \frac{\text{No. of Correctly Classified High Interest Instances}}{\text{No. of Correctly Classified High Interest Instances} + \text{No. of Incorrectly Classified High Interest Instances}}
\]

\[
\text{Recall} = \frac{\text{No. of Correctly Classified High Interest Instances}}{\text{No. of Correctly Classified High Interest Instances} + \text{No. of Incorrectly Classified Undefined Instances}}
\]
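The two rates can be computed from paired lists of actual and predicted labels, as in the minimal sketch below. The function name and label strings are chosen for illustration; “incorrectly classified high interest” is read as instances predicted as high interest that are actually undefined, and “incorrectly classified undefined” as high-interest instances predicted as undefined.

```python
def precision_recall(actual, predicted, positive="high interest"):
    """Return (precision, recall) for the positive class."""
    pairs = list(zip(actual, predicted))
    # Correctly classified high-interest instances.
    true_pos = sum(a == positive and p == positive for a, p in pairs)
    # Undefined instances incorrectly classified as high interest.
    false_pos = sum(a != positive and p == positive for a, p in pairs)
    # High-interest instances incorrectly classified as undefined.
    false_neg = sum(a == positive and p != positive for a, p in pairs)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall
```

Precision therefore answers how many of the predicted high-interest instances were correct, while recall answers how many of the actual high-interest instances were found.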

In the case of detecting “high interest” among learners, the precision rate for high-interest classification (Table 6) is used as opposed to the overall prediction rate. As the “undefined” scenes collected in this study are of undefined criteria and relevance, the overall detection rate for the test set is excluded.

Based on the model created using the J4.8 algorithm, a comprehensive set of decision tree rules allows for precise prediction (an average precision rate of 79.8%) of where high interest occurs during a lecture based on learners’ low-level posture data.

Table 7: Precision-rate comparison subtracting each posture feature once

                         October 22   November 12   December 3
9-attribute Precision    81.3%        82.7%         75.6%
- Torso                  77.6%        81.0%         75.6%
- Hands on Face          77.6%        82.6%         74.6%
- Touching Head          80.7%        81.8%         75.5%
- Hands Rested off       85.5%        82.6%         74.1%
- Hands Rested on Desk   82.5%        82.6%         70.7%
- Elbows Rested          82.3%        84.1%         73.8%
- Writing                81.3%        82.1%         75.5%
- Head Raising           75.3%        83.1%         79.4%
- Head Tilting           75.8%        79.5%         74.7%

4.2.5 Attribute Importance

The posture tags in this study are examined, based on the results of high-interest prediction precision, to verify that the attributes are irreplaceable and will not yield higher precision when eliminated. This verification is established by comparing precision rates for the model after removing each attribute individually, one at a time (Table 7).

It is concluded that no single attribute, although its removal may yield higher precision in individual lectures, yields higher precision when eliminated for all three classes.
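The comparison behind this conclusion can be reproduced from the values in Table 7. The sketch below only re-reads the reported precision rates; the variable and function names are illustrative, and no model is retrained.

```python
# Precision (%) for October 22, November 12, December 3 (Table 7).
BASELINE = (81.3, 82.7, 75.6)
WITHOUT_ATTRIBUTE = {
    "Torso": (77.6, 81.0, 75.6),
    "Hands on Face": (77.6, 82.6, 74.6),
    "Touching Head": (80.7, 81.8, 75.5),
    "Hands Rested off": (85.5, 82.6, 74.1),
    "Hands Rested on Desk": (82.5, 82.6, 70.7),
    "Elbows Rested": (82.3, 84.1, 73.8),
    "Writing": (81.3, 82.1, 75.5),
    "Head Raising": (75.3, 83.1, 79.4),
    "Head Tilting": (75.8, 79.5, 74.7),
}


def lectures_improved(rates, baseline=BASELINE):
    """Count the lectures in which removing an attribute raised precision."""
    return sum(r > b for r, b in zip(rates, baseline))


# Attributes whose removal helps in every lecture, and in at least one lecture.
removable = [name for name, rates in WITHOUT_ATTRIBUTE.items()
             if lectures_improved(rates) == len(BASELINE)]
helps_somewhere = [name for name, rates in WITHOUT_ATTRIBUTE.items()
                   if lectures_improved(rates) > 0]
```

Four attributes raise precision in one or two lectures when removed, but none does so in all three, which is the basis for retaining all nine attributes.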

4.3 Discussion of Experimental Results

4.3.1 Application of Results

The tree rules generated for each lecture (Figures 14-16) are concise and can be related back to the learners from whom affective states are being extracted. This application to professional development can be carried out by analyzing the characteristics of each tree as though they represent the features of the group of learner subjects.

The most influential of all posture features appears at the top (the “root”) of each of the three trees representing the lectures (“head tilted right”, “elbows rested”, and “touching head”, respectively). Although the posture features directly below the top of the tree are considered conditions that, in tandem, correlate with the target affective state, the root feature, as the most influential posture feature, can be noted in video footage of the group of learners in question, or during the actual lecture from which posture features have been extracted.

4.3.2 Generalizing the Model

Generalization of the model proposed in this study is key to applying it to

professional development. This allows posture data taken for learners in one

class to be applied to the same learners in a separate class.

In a preliminary experiment, the posture features for six learners (sitting nearest the camera) are tagged at one-second intervals for three lectures recorded on October 22, November 12, and December 3, 2008. Thirteen segments of group high interest are recorded for all three lectures, and data from the first two lectures is tested on the third lecture using an ID3 decision tree. This yielded a low correct classification rate (less than 60%), and the more data that was added to the training set, the more the correct classification rate decreased.

In order to improve correct classification results and expand the versatility

of the proposed model, additional steps are needed to ensure not only high rates

for classification, but precision as well. However, for the purpose of automated

affective-state detection using low-level video data, this study supports the ar-

gument that it is possible to predict where affective states exist based on static

information.

Low classification rates in this scenario are normally a result of a common data mining problem known as “overfitting”, where contradictory data exist and lower the effectiveness of the decision tree in predicting an outcome correctly. However, to show that the J4.8 decision tree is effective in dealing with overfitting, 10-fold cross validation is conducted on all three lectures’ data samples at the same time, which yielded a precision rate of 74.8% for nearly 12,700 instances (Table 8).

Table 8: Cross Validation Results for Three Lectures Combined

Total No. of Instances Tested   12,696
No. of Folds                    10
Precision Rate                  74.8%
Recall Rate                     87.8%
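The partitioning step of 10-fold cross validation can be sketched as follows. This is a plain, unshuffled, unstratified partition written for illustration; data mining tools typically randomize and stratify the folds, so the sketch shows the principle rather than the exact procedure used. The function name is hypothetical.

```python
def k_fold_indices(n_instances, k=10):
    """Yield (train_indices, test_indices) for each of k folds.

    Every instance appears in exactly one test fold; the remaining
    instances form the training set for that fold.
    """
    # Spread the remainder so that fold sizes differ by at most one.
    sizes = [n_instances // k + (1 if i < n_instances % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_instances))
        yield train, test
        start += size
```

For the 12,696 instances in Table 8, this yields six folds of 1,270 instances and four of 1,269, with each fold tested on a tree trained on the other nine.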

This high precision rate is an indication that the cause of low classification

rates when testing one lecture’s data on another (even with overlapping learners)

is a lack of training samples rather than overfitting. It can also be gathered from

these results that, although many samples must be used for machine learning,

the size of each may cover less than 10 minutes of a 90-minute lecture. Therefore,

experimenting with more samples of shorter lecture times is suggested as a

future work of this study.

Figure 14: Decision tree rules for October 22nd lecture

Figure 15: Decision tree rules for November 12th lecture

Figure 16: Decision tree rules for December 3rd lecture

Chapter 5 Conclusions

5.1 Conclusions

This thesis has demonstrated a method for identifying when learners display a

high level of interest in the content presented in a lecture for the purposes of

professional development in higher education. The method is solely based on

video data as a means for applying ICT to professional development.

The study has employed a list of heuristics for each learner as they are

captured by classroom video footage. The heuristics were selected on the basis

that the data thereof can be collected in a binary format and, when combined,

are indicative of the nature of any of the moment-to-moment postures displayed

by each learner involved. The posture features used in this study are intended

to be applied eventually to computer vision techniques.

Using the heuristics mentioned above, a model was composed for interpreting learners’ non-verbal behavior with regard to interest in the lectures they attend. Data was collected at one-second intervals to describe specific postures maintained by each learner throughout a lecture. The high-interest moments are defined by the author and verified through third-party evaluation, and each posture feature and combination thereof is mined in correlation with moments of perceived “high interest” and contrasted with “undefined” moments in the lecture. Static postures are collected from the precise span of time verified in the previous phase, and the postures are assigned a set of rules according to their correlation, tested on separate segments of the lecture, and then reassessed by repeating the above steps.

The affective state of learners sought in this study is that of “high

interest”. The learners featured in this thesis were observed as a group during

several classes of one course (all taught by the same instructor). Moments

where the group of learners appeared highly interested in what was being taught

and/or presented were specifically recorded and verified via third-party

evaluation based on defined high-interest criteria, and the behavioral metadata

was modeled to find the relevance of posture features to those moments. The method

proposed yielded a precision rate (for detecting high interest) of approximately

80% and is intended to be applied to an automated form of learner affective state

detection.
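
The precision figure can be read as the fraction of seconds flagged as “high

interest” by the rules that were also labeled “high interest” in the verified

ground truth. The following is a minimal sketch of that calculation, assuming

per-second labels are stored as strings. The function name and the toy labels

are hypothetical and do not reproduce the data of this study.

```python
def precision(predicted, actual, positive="high interest"):
    """Return true positives divided by all predicted positives."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    # Ground-truth labels of every second the rules flagged as positive.
    flagged = [a for p, a in zip(predicted, actual) if p == positive]
    if not flagged:
        return 0.0  # nothing was flagged, so no positive prediction was correct
    return sum(a == positive for a in flagged) / len(flagged)


# Toy per-second labels: rule output versus verified ground truth.
predicted = ["high interest", "high interest", "undefined",
             "high interest", "high interest"]
actual = ["high interest", "undefined", "undefined",
          "high interest", "high interest"]
print(precision(predicted, actual))  # 0.75
```

For reference, the three per-lecture rates reported in the abstract (81.3%,

82.7%, and 75.6%) average to roughly 79.9%.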

5.2 Future Work

In order to create a model for a course of lectures, this study concludes that

a sufficient number of lectures must be used as training samples, regardless of

the size of each sample, to establish generalization. Therefore, this study suggests that

instructors use posture features from a group of learners in 10-minute intervals

from 4-6 different lectures to test on the remaining time of said lectures, or

lectures from which data has not been collected. The relevant posture features

may vary from lecture to lecture, but are intended to remain consistent for the

group of target learners.
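
The suggested protocol can be sketched as a time-based split. The sketch below

assumes one sample per second (so a 90-minute lecture has 5,400 samples) and

assumes that the 10-minute training window is taken from the start of each

lecture. The window position and the function names are assumptions made for

illustration only.

```python
def split_lecture(samples, train_seconds=600):
    """Split per-second samples into a leading training window and the rest."""
    if train_seconds < 0:
        raise ValueError("train_seconds must be non-negative")
    return samples[:train_seconds], samples[train_seconds:]


def pool_training(lectures, train_seconds=600):
    """Pool the training windows of several lectures; keep the rest per lecture."""
    train, held_out = [], []
    for samples in lectures:
        head, tail = split_lecture(samples, train_seconds)
        train.extend(head)     # training data is shared across lectures
        held_out.append(tail)  # each lecture is tested on its own remainder
    return train, held_out


# Four toy lectures of 90 minutes each, one placeholder sample per second.
lectures = [list(range(90 * 60)) for _ in range(4)]
train, held_out = pool_training(lectures)
print(len(train), [len(rest) for rest in held_out])
# 2400 [4800, 4800, 4800, 4800]
```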

One possibility that arose during the experiments is that the proposed model

could be generalized across the lectures of one course through the use of

assigned seating. Although testing this hypothesis poses several problems with

classroom layout, it can be carried out given the cooperation of the learner

subjects.

Additionally, in cases where arranging classroom seats is not possible, the model

proposed in this study could be applied to finding learner group features based

on where groups are seated within the classroom. Results from experiments

dealing with this issue would be particularly interesting in contrast to those of

this study, which focused on a narrow section of the classroom close to the

instructor.

Eventually, the posture features found relevant by this study are proposed

to be tracked and recorded automatically using computer vision. Analysis of

these postures, as they occur throughout a lecture, may also be automated.

Moreover, this study relies on a model built with data collected in the

controlled environment detailed in Section 4.1. Because posture data may

reflect the environment in which it is gathered, generality is an issue, and

separate models may have to be built for separate environments.

Acknowledgments

I would like to thank, first and foremost, Professor Michihiko Minoh for the

opportunity to be a part of his laboratory and work alongside such an esteemed

group of young and talented researchers, as well as the time and attention he

graciously put into guiding this study and seeing it through to completion. I

also wish to thank Professor Koh Kakusho of Kwansei Gakuin University and

Professor Hajime Kita of Kyoto University for presiding on my midterm defense

committee in the fall of 2008.

I owe a debt of gratitude to Associate Professor Masayuki Mukunoki for

overseeing this study and serving on the committee for the oral defense along

with Professor Minoh, Professor Kakusho, and Department of Intelligence Sci-

ence and Technology Dean, Professor Akihiro Yamamoto. Thanks also to all

Presence Group members, Assistant Professor Takuya Funatomi, Global COE

researcher Mayumi Ueda, and Academic Center for Computing and Media

Studies researcher Nimit Pattanasri, for their unfailing support and constant

feedback.

Thanks to Professor Victor Kryssanov of Ritsumeikan University for believ-

ing in my potential and empowering me with the ambition to take on otherwise

impossible challenges.

Finally, I wish to thank Associate Professor Masayuki Murakami of Kyoto

University of Foreign Studies and Assistant Professor Tetsuo Shoji of Nara

University, without whom the completion of this study would not have been

possible, for their tireless efforts and devotion to the fruition of this study, their

patience and tolerance for working with someone from a research background

completely different from theirs, and their shared passion for the content of this

study.
