Master Thesis
Detecting High Interest in the Classroom through Non-verbal Learner
Behavior
Supervisor Professor Michihiko Minoh
Department of Intelligence Science and Technology
Graduate School of Informatics
Kyoto University
Gary Jay Coffman
July 27th, 2009
Detecting High Interest in the Classroom through
Non-verbal Learner Behavior
Gary Jay Coffman
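Summary
In recent years, faculty development (FD) has been spreading in higher
education, and since April 2008 FD has also been mandatory in undergraduate
education at universities. Various efforts are being made to improve lectures.
Among them, instructors review their own lectures by watching recorded lecture
video, and groups of faculty members discuss lecture management while watching
that video. However, a university lecture lasts 90 minutes, so watching it in
full takes just as long, and it becomes necessary to detect the points worth
examining. Moreover, because students' reactions and appearance are important
when considering lecture improvement, video of the students matters as well as
video of the instructor.
This study takes the view that scenes in which students are concentrating are
important when discussing lecture management, and aims to extract those scenes
from image information on student posture.
The posture features are the lean of the student's body, the tilt of the neck,
and the position and state of the hands. The state of each posture feature for
each individual student is determined manually from the images. Two lecture
states are defined for the students as a group: "high interest" and "undefined".
Data on individual students' posture features and the lecture state were
collected in three lectures, and a decision tree algorithm was used to attempt
to detect the lecture state from the posture feature data. As a procedure, rules
were derived by machine learning on the data from the first half of each lecture
and then applied to the data from the second half.
As a result, the "high interest" state was detected at rates of 81.3%, 82.7%,
and 75.6% in the three lectures respectively, showing that the state of a group
of students can be grasped with this method and procedure. In this study,
students' posture information is judged manually. A remaining task is to
recognize posture information with image processing techniques so that lecture
conditions can be judged using larger amounts of data.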
Detecting High Interest in the Classroom through
Non-verbal Learner Behavior
Gary Jay Coffman
Abstract
Relying solely on video data, this thesis proposes a method for identifying
learners who display a high level of interest in the content presented in a lecture
for the purposes of professional development.
The study employs a list of non-verbal heuristics for each learner as they are
captured by classroom video footage. The heuristics are selected on the basis
that they can be collected in a binary format and, when combined, are indicative
of the nature of any verbal behavior as well as the moment-to-moment postures
of each learner involved.
Using these heuristics, a model is composed for interpreting learners’
non-verbal behavior with regard to interest in the lectures they attend. Data is
collected at one-second intervals to represent specific postures maintained by each
learner throughout a lecture. The high-interest moments are defined, verified
through third-party evaluation, and posture features are mined in correlation
with moments of perceived “high interest” in the lecture using a decision tree
algorithm. The data model proposed is based on the rules output from decision
tree analysis which are used to predict moments of “high interest” in unmined
portions of lectures.
The proposed method detects high interest in three different lectures with
precision rates of 81.3%, 82.7%, and 75.6%, respectively, and is intended
for application to an automated form of learner state detection.
Contents
Chapter 1 Introduction
Chapter 2 ICT in Education: A Literature Survey
2.1 Information Communication Technology in Higher Education
2.1.1 Improving Higher Education
2.1.2 Current ICT Tools in Higher Education
2.2 Difficulties of Improving Education with ICT
2.2.1 Challenges of Improving Higher Education
2.2.2 Availability of Technology
2.3 Model Development Approach
2.3.1 Importance of High Interest
2.3.2 Previous Works
2.3.3 Requirements Engineering
2.3.4 Study Objective
Chapter 3 Learner Behavior Model for the Classroom
3.1 Elements of Model
3.1.1 Posture Features Collection
3.1.2 High Interest
3.1.3 Undefined
3.2 Proposed Model
3.2.1 Machine Learning Techniques
3.2.2 Model Objective and Makeup
3.2.3 Determining Group High Interest
3.2.4 Verifying Group High Interest
3.2.5 Data Mining Algorithm
3.3 Testing the Methodology
Chapter 4 Model Validation
4.1 Data Gathering
4.1.1 Filming Environment
4.1.2 Lecture Video Data
4.2 Experiments
4.2.1 Objectivity of High Interest
4.2.2 Attribute Values
4.2.3 Analyzing Decision Tree Rules
4.2.4 Testing for an Affective State
4.2.5 Attribute Importance
4.3 Discussion of Experimental Results
4.3.1 Application of Results
4.3.2 Generalizing the Model
Chapter 5 Conclusions
5.1 Conclusions
5.2 Future Works
Acknowledgments
References
Chapter 1 Introduction
In recent years, efforts to apply information communication technology
(henceforth referred to as “ICT”) to learning environments have increased.
In most cases, this technology is used to add another facet to learning by
exploiting the audiovisual stimuli and hands-on nature of the Internet and
educational software[1]. The introduction of this technology has improved not
only the accessibility of educational media but also the learning environments
that incorporate them.
One field to which ICT has contributed many applications is professional
development. In this field, technology can not only be used as a tool for en-
hancing the learner’s experience and modifying learning environments, but for
improving the quality of education. Approaches in professional development
seek to improve educational content and learners’ experiences, and technologi-
cal applications often take on the perspective of educational administrators, if
not the instructors themselves, and allow such improvements with greater ease.
This thesis proposes a method of using ICT for the purpose of furthering
professional, or faculty development in higher education. The method is cen-
tered on the utilization of already-existing video technology, used for filming
and uploading university classes, for analyzing the behavior of learners in a
classroom setting. The purview of this study is limited to analyzing learner
non-verbal behavior where a cluster of learners is analyzed for reactions to the
instructor and/or content, each learner’s actions are analyzed individually, and
then interpreted. Non-verbal behavior is broken down into features, combina-
tions, and patterns of the seated postures of each learner. The relationship
between this learner posture data and the occurrence of these perceived reac-
tions is examined and used as a basis for creating the learner behavior model
proposed in this study.
The active state of learners that is sought in this study is that of “high
interest”. The learners featured in this thesis are observed in a group during
several classes of one course (all taught by the same instructor). Moments
where the group of learners appeared highly interested in what was being taught
and/or presented are specifically recorded, verified via third-party evaluation
based on what is deemed “high interest” criteria, and the behavioral metadata
is modeled to find the relevance of posture features to those moments. The
behavioral model is tested by assessing how well it can detect “high interest”
among learners in a class separate from those used for data gathering.
Chapter 2 ICT in Education: A Literature
Survey
2.1 Information Communication Technology in Higher
Education
2.1.1 Improving Higher Education
ICT has enhanced modern education as well as allowed it to progress into
new directions. One of the new ways in which education is being promoted
is through globalization, facilitated more and more as information and commu-
nication technologies evolve. However, the concept of globalization, used in an
educational context and among pedagogical scholars, refers more to the expan-
sion and thinning of educational objectives rather than a growing availability
of educational resources[2].
The objectives for “globalized” educational institutions include ones for in-
vestment or financial growth, broadened curriculum, and increased networking
opportunities for the purpose of maintaining resource and materials costs. For
higher education, most of the objectives associated with globalization, aside
from increasing awareness of international competition, are ones that have a
limited or negative effect on educational quality. Higher education administra-
tors are now faced with the challenge of maintaining a realistic and high stan-
dard for education at their respective institutions to counteract the continued
effects of globalization. Educational institutions continue to have a responsi-
bility to provide education of the highest possible quality. Higher educational
institutions especially have an obligatory role as cultivators of human resources
in administrative, managerial, and even political positions. As economic cli-
mates change, the demand for raising the standards of higher education is well
documented in government legislation and the close association between edu-
cation and economic and/or governmental policies is evident. The emphasis on
this type of quality maintenance in higher education has brought the field of
educational quality assurance to the forefront of educational administration and
pedagogical fields. In quality assurance, the accountability of educational
institutions to cultivate adults who are well prepared to contribute to society is
at issue as a factor of motivation, as well as competition, be it from institutions
with disciplines and/or locations that are close in proximity or statistics that
reflect an overall standard[3].
2.1.2 Current ICT Tools in Higher Education
Synchronous E-Learning
Ongoing studies of learner behavior include those from distance learning
lectures. Video data remains from several international distance courses between
Kyoto University, National Taiwan University, and University of California,
Los Angeles. A number of studies have been conducted that use in-class learner
behavior to gauge the difficulty of lecture content or materials[4].
In the case of studies of Taiwan and Kyoto Universities’ distance lectures,
learner affective states during the actual lectures are not sought. However,
learner attitudes with regard to the courses are evaluated using ICT tools such
as learner-written blogs[5]. Although learner affective states are sought in the
courses between University of California, Los Angeles and Kyoto University,
they are sought exclusively through learner behavior[6].
With the original objective of finding additional uses for video footage of such
learner galleries, this study began as an exploration of the correlation between
the number and frequency of verbal interactions and the level of learner interest
in lecture content. It focused specifically on the verbal behavior of learners
attending the lectures of a distance course between University of California, Los
Angeles and Kyoto University. However, the contrast between the verbal
behavior of learners from each of the participant universities proved too great[7].
More specifically, verbal actions and interactions in the Kyoto University
classroom were much fewer than those originating in University of California, Los
Angeles.
As such, the above studies, although formulated with different objectives,
integrate the resources for finding learner affective states. It is also noteworthy
that footage of the instructors themselves is available for synchronized viewing
with the student gallery video, which makes it easier to relate learner behavior
back to the lecture content and therefore makes the footage effective for
professional development.
Asynchronous E-Learning
With the advent of distance learning and video classroom archiving, learning
institutions have turned to recording videos of classes for purposes ranging
from making them available to the public (e.g. open courseware, iTunes U)
to storing them simply for reference. However, most of these recordings
focus exclusively on the instructor. Using the same technology, some
institutions can record videos of learner galleries which are used for automatic
attendance-taking and following learner verbal interactions (as in the case of
distance lectures)[8][9].
2.2 Difficulties of Improving Education with ICT
2.2.1 Challenges of Improving Higher Education
Although the focus of this study is situated on higher education in a
specific context, i.e. an undergraduate course of Kyoto University lectures, it is
important to conceptualize higher education in a much broader context in order
to understand the need for reassessing the merits of improving it
on a fundamental level. Therefore, this thesis defines higher education in terms
of the levels of ability and/or knowledge that an educational body intends its
pupils to attain in order to graduate.
Unused Lecture Footage
Videos of lectures, although used for very little other than storage of instructor-
focused videos[10] for the purposes of recording lecture content and facilitating
distance learning, have the potential for providing important information for
instructors and educational institutions. The review of such classroom videos
empowers the viewer with a potential insight different from that of any vantage
point in the classroom. This is especially true when compared to observing a
class of learners while it is being taught. Watching a class of learners via video
gives the viewer an opportunity to focus on a lecture’s moment-to-moment
events, which can be viewed as many times as necessary. However, this
video data continues to go unused in spite of these potential benefits.
Assessment of Learners’ Comprehension
One way higher education separates itself from other fields of education is
through its objective to not only base learner evaluation on an array of edu-
cational criteria, which today is usually set and met through the assignment
of tasks or examinations, but also on resulting behaviors and/or attitudes. Al-
though several approaches are employed throughout academic institutions to
evaluate characteristics of learners that may not be apparent in assignments
or test scores, a consistent approach based on a widely recognized educational
theory eludes them[11]. Information and communication technology has
gained acknowledgement in the field of education for its ability to create means
for evaluating behavior and/or attitude prior to graduation toward the goal of
establishing a theoretical framework.
Now that those in academic communities have the resources to utilize the
latest ICT, they are making efforts to explore ways to use it to improve the
content and styles of their lectures as well as the evaluation of their effectiveness
as instructors[12]. Along with adding to the challenge of maintaining pace with
the ever-rising standards of teaching in higher education, the rising accessibility
of ICT has richly contributed to the study of the improvement and optimization
of teaching methods, known as the field of professional development.
Many different objectives and problems are addressed in the field of
professional development, as it pertains to higher education. For example, both
long-term objectives, such as molding a class of learners to pass a standardized
exam with a minimum average score, and short-term goals, such as gauging
learners’ in-class attitudes and how they are relevant to what is being taught,
are conceived to motivate instructors. Professional development can be
individually initiated as well as facilitated by faculty seminars and the like.
The ultimate objective of faculty development, in a more general context, is to
examine ways to understand how much a learner has learned, and ICT accommodates
instructors’ search for these outcomes[13].
With this newly distributed technology, namely classroom archiving, the
question of what approaches to professional development should be used re-
mains. There has been a recent shift among professional development scholars
from instructor-focused to learner-focused research where learner behavior in
reaction to what is being taught is gaining more attention than the evalua-
tion of instructors themselves. However, information on how learners react to
instructors and content is very limited, and obtaining it through monitoring
remains a difficult task.
Viewing Learner Reactions in the Classroom
While instructors have clear methods and criteria for evaluating learners’
levels of comprehension of course content, effective methods for evaluating their
own lecture, in other words, assessing how well lecture content reaches learners,
remain scarce. As mentioned in Section 2.1.2, ICT has provided new means
for assessing how lecture content and teaching styles affect learners. However,
proposing the positive application of these means rests on the premise that
instructors have access to the resources and have the free time to visually review
and analyze videos of themselves and/or attending learners’ reactions[14].
Conversely, when lectures are in session, instructors are able to perceive
how many learners in the classroom can follow what is being taught and/or
how much of a lecture is generally reaching their learners. Although this is
ideal for instructor professional development, there are limits to how much an
instructor can observe in the act of teaching in terms of span of sight and pre-
occupation with other tasks[15]. In addition, instructors are unable to create or
maintain any record of learners’ reactions as they occur in the lecture because
they are preoccupied with teaching responsibilities. On a more fundamental
level, this is mostly due to logistical problems, such as time constraints, which
can be overcome using automated in-class evaluation of learner reactions just
as Scantron technology has made automatic examination result evaluation and
analysis possible. However, the challenge of not only identifying learners’ affec-
tive signals (with relation to the lecture), but also expressing them in the form
of numerical data in preparation for automation must be met in order to realize
this concept.
Interpreting Learner Reactions
There are constant and cross-disciplinary discrepancies within the defini-
tion of the states of subjects (such as affective states, psychological states).
Although, fundamentally, there is no sound way to positively identify a single
state or expression with confidence, observational methods can lead to more
accurate and verifiable hypotheses, and even diagnoses[16]. These methods are
often employed by instructors inside and outside of the classroom.
Even when instructors have access to video footage of lectures where their
learners can be clearly observed, creating and verifying the criteria for inter-
preting their reactions remain, in theory, difficult to execute, therefore making
analysis and evaluation nearly impossible[17]. This difficulty is compounded
by that of identifying characterizations of learner reactions with their emotions
or attitudes toward the class content or instructor. Originally in professional
development, affective data for both inside- and outside-class states was sought
using questionnaires. The results from this method, however, have always been
limited to the preliminary step of analysis with learner performance data, such
as marks, rather than the affective data. Visually confirming the correspon-
dences of learners’ reactions inside and outside of class with an output, such as
interest level, has been a challenge for pedagogical scholars[18].
Finding Relevant Footage of Learners
It is difficult to assess which sequences of video footage are relevant to
determining affective states, especially when the footage features more than one
subject[19]. In a lecture setting, automatically detecting sequences where
certain affective states are featured can provide support for instructors who
want to improve their lectures but do not have the time to review such footage.
Extracting only such sequences has the potential to save instructors’ time
when reviewing their own lectures.
2.2.2 Availability of Technology
Access to ICT in higher education depends greatly on the extent to which
technology is integrated into the surrounding society. This can be reflected
in how technology is used in government and infrastructure, as ICT tends to
be used and measured in terms of how much it is used for general welfare
using qualitative and quantitative data[20]. Common factors that are used
for collecting said data are how often technology of this sort is accessed, how
affordable it is, and the availability of content that has been generated locally
as well as qualitative factors such as the extent that technology is incorporated
into civilians’ lives, the perception of how secure and/or trustworthy it is, and
government involvement in its development.
One indication of how prevalent information and communication technology is in higher
education, as a specific realm, is in the national and/or regional presence and
commonality of distance education programs. Along with the above factors, the
possibility that the purposes and benefits of distance education, as well as the
technological makeup of it, can be misunderstood accounts for a clear lack of
demand for the necessary resources.
2.3 Model Development Approach
2.3.1 Importance of High Interest
The concept of “high interest” is pursued by this study as an affective state that is
both achievable by students and desired by instructors in a classroom setting.
Essential learner affective states, like high interest, are sought in professional
development as an indication of the presence of engaging lecture content and/or
teaching styles. Instructors, therefore, have the potential of benefitting from the
identification of high interest. These benefits, and the ability to recognize high
interest, broadly referred to as “interest” in previous works, are widely referenced
and discussed in professional development studies as well as other disciplines.
2.3.2 Previous Works
Studies of non-verbal behavior as it occurs specifically within a
classroom setting, let alone studies that rely solely on video data, have gained
little attention from scholars in both technological and social fields. Therefore,
studies of non-verbal behavior in learning environments in general
are drawn upon to support this study’s methods for observation and analysis.
Although one such study does not rely on video data, it is noteworthy in its
methods for labeling postures and its objective[21]. Mota and Picard’s study
aims to create an electronic learning companion (featured in the form of a
computer avatar) that can react and interact according to users’ emotional
states, and data collection takes place for one subject at a time. This enables
the authors to utilize a more costly technology for determining postures of each
subject (pressure sensors). Although the above objective and technology are
not applicable to the study for this thesis, its methods for matching postures
and posture sequences to learner interest level serve as a foundation for labeling
learner postures and finding correlations between them and interest level (Table 1).

Table 1: Affective state classification rates for posture features[21]
Static Posture                   Classification Rate
Leaning Forward                  96.68%
Leaning Forward Left             80.02%
Leaning Forward Right            76.65%
Sitting Upright                  93.21%
Leaning Back                     90.91%
Leaning Back Left                79.86%
Leaning Back Right               89.43%
Sitting on the Edge of Seat      91.91%
Slumping Back                    90.12%
Average                          87.64%

Mota and Picard use a set of independent Hidden Markov Models (HMMs), each of
which is linked to a sequence of postures. The probability that the actual posture
sequence was produced by an HMM that represents a certain affective state is
computed. Each posture sequence is then classified according to the HMM producing
the highest probability.
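The classification rule just described, scoring a posture sequence under each affective-state HMM and choosing the most probable one, can be sketched as follows. This is a minimal illustration and not the implementation used in [21]: the two state labels, the three posture symbols, and every probability are hypothetical values chosen for the example.

```python
# Hypothetical posture symbols: 0 = leaning forward, 1 = sitting upright, 2 = slumping back.

def sequence_likelihood(obs, start, trans, emit):
    """Forward algorithm: probability that an HMM produced the observed sequence."""
    states = range(len(start))
    alpha = [start[s] * emit[s][obs[0]] for s in states]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in states) * emit[s][symbol]
            for s in states
        ]
    return sum(alpha)


def classify(obs, models):
    """Return the label of the HMM that gives the sequence the highest probability."""
    return max(models, key=lambda label: sequence_likelihood(obs, *models[label]))


# Each model is (start probabilities, transition matrix, emission matrix).
# All numbers are invented for illustration only.
MODELS = {
    "high_interest": (
        [0.6, 0.4],
        [[0.7, 0.3], [0.4, 0.6]],
        [[0.7, 0.2, 0.1], [0.3, 0.6, 0.1]],
    ),
    "low_interest": (
        [0.5, 0.5],
        [[0.8, 0.2], [0.3, 0.7]],
        [[0.1, 0.3, 0.6], [0.1, 0.6, 0.3]],
    ),
}

print(classify([0, 0, 1, 0], MODELS))
print(classify([2, 2, 1, 2], MODELS))
```

With these toy parameters, a sequence dominated by forward leaning is assigned to the hypothetical high-interest model and a sequence dominated by slumping to the low-interest one. For long sequences a practical version would work with log probabilities to avoid numerical underflow.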
One other such study is, conversely, founded on video data[22]. The purpose
of the study, conducted by De Silva and Bianchi-Berthouze, is geared toward
affective computing rather than computer-aided learning. Human postures are
duplicated using a vast set of three-dimensional computer-generated avatars
(that are not seated). The study emphasizes the advantage of collecting pos-
ture features rather than gestures and aims to recognize four emotions at a
high rate based solely on the positioning of each avatar. The accuracy with
which each avatar expresses a single emotion is determined in the first phase
by a group of actors whose depictions of the emotions are collected via motion
capture, and then displayed to a set of third-parties. The third-party evalua-
tors not only assign a label to each position, but also an intensity (on a 5-point
scale) showing how well the pose represents the associated emotion. Although
the posture labels do not apply to the study of this thesis, the method for
collecting postures and verifying them (as well as the method used in Mota’s
study) is employed in the first proposed model to identify learner postures in
the classroom, and correlate them to a single category, “highly interested”, with
a five-point intensity scale.
2.3.3 Requirements Engineering
This study proposes building a model where the relevant factors are initially
unknown, and therefore, several hypotheses are tested in order to arrive at
a rate of correlation. The process by which these correlations can be gener-
ated is known as requirements engineering[23]. The methods that are used for
testing factors, which are, in this case, learner posture features, are known as
ethnomethodological.
In requirements engineering, the development and validation of methods for
how requirements should be elicited, represented and analyzed are adopted for
the purposes of identifying the requirements of a given system. Requirements
engineering then probes into the further transformation of requirements into the
specific details needed for design, and finally, implementation. For the purposes
of this study, the definition of requirements engineering is considered broadly as
a set of activities, structured, or “engineered”, for the creation and maintenance
of model requirements, expressible in document form. As such, the study sets
out to follow the theoretical framework of eliciting, analyzing, validating, and
therefore, documenting the requirements for the proposed model.
2.3.4 Study Objective
In the first step, elicitation, the video data of learners’ in-class behaviors is
mined for the ethnomethodological correlation of hypothetical requirements
with an affective state. In the analytical stage, step two, the hypothetical
requirements, in the form of low-level data, are analyzed using means ranging from
simple observation to automated information processing, where data amounts
reach levels that are impractical for visual observation. The requirements are
viewed as potential attributes for the model and are examined for positive and
negative correlations as well as irrelevant and contradictory data. The third
step contains the task of validating the model by testing the above attributes
on separate sets of data in an attempt to simulate possible future applications.
Finally, in the fourth step, the relevance of the set of requirements used for val-
idating the ethnomethodological model is documented in order to display the
verification of requirements hypotheses as well as discuss unexpected outcomes
and provide a basis for discussion of future applications.
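The validation step can be sketched as a chronological split: a rule is derived from the first part of a lecture and scored on the remainder, with precision as the measure reported in the abstract. In this minimal sketch a single threshold on the number of learners leaning forward stands in for the decision tree rules used in the thesis; the per-second records, the chosen feature, and the function names are all hypothetical.

```python
def split_chronologically(records):
    """Split per-second records into a first half (mining) and a second half (testing)."""
    mid = len(records) // 2
    return records[:mid], records[mid:]


def learn_threshold(train):
    """Pick the smallest count threshold k with the best training accuracy."""
    max_count = max(count for count, _ in train)

    def accuracy(k):
        return sum((count >= k) == label for count, label in train) / len(train)

    # max() returns the first maximal element, i.e. the smallest such k.
    return max(range(max_count + 1), key=accuracy)


def precision(test, k):
    """Fraction of seconds flagged as high interest that were labeled high interest."""
    flagged = [label for count, label in test if count >= k]
    return sum(flagged) / len(flagged) if flagged else 0.0


# Hypothetical records: (learners leaning forward, labeled "high interest"?)
RECORDS = [
    (0, False), (1, False), (3, True), (4, True), (1, False), (3, True),
    (4, True), (0, False), (3, False), (3, True), (1, False), (4, True),
]

train, test = split_chronologically(RECORDS)
k = learn_threshold(train)
print(k, precision(test, k))
```

Splitting by time rather than at random keeps the test data strictly later than the mined data, which is closer to the intended use of flagging moments in footage that has not yet been reviewed.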
The requirements readied for elicitation have the potential to describe the
functions and tendencies of the proposed model whereas the settings of experi-
menting with the model reflect how a user implements the model. Said require-
ments and settings are able to be displayed and discussed in language ranging
from informal to semiformal and natural, as well as structured and logical, which
is consistent with more formally descriptive language. Along with the capabili-
ties to semantically express a model, requirements engineering has the distinct
ability to include abstracted processes based on the results for processing. As
the consistency of data is always subject to change, so should the requirements
of the model. Particularly for ethnomethodological data sets where consistency
is lacking or its degree is unknown, requirements engineering proves
an appropriate method for determining attributes for a given model by refining the
relationships between the ever-changing requirements, which are then applied to
model design.
Although, in the best-case scenario for this study, a generalized model is
ideal, where the same requirements can be expected for all settings and sub-
jects, it is unrealistic. Generalization of a model is fundamentally reflected
in the level or amount of abstract processes within its construction. A model
with these conditions that also maintains a number of highly specific processes,
known as a specialization model, is only applicable to data where these specific
processes meet a low level of abstraction. When executing these processes using
requirements-elicitable software, algorithms for implementation are to be care-
fully considered and selected that are appropriate to the processing specificity
and abstraction. Therefore, the model proposed for a given set of data must
be regarded as appropriate for it in logic and concept prior to consideration of
validation techniques, which precedes implementation.
A data process’s promptness for immediate use and level of abstraction can
both be obtained through requirements engineering. Due to its adaptable
nature, a generalized model, although unrealistic in this context, is an ideal
objective for requirements engineering not only in the interests of conserving
time and resources via its reusability, but because it can be widely applied re-
gardless of settings and factors not included in the data. A common problem
with quantitatively and statistically analyzing subjects for scientific research is
that any two sets of subjects are not likely to be alike and abstraction is an
inevitable factor. In addition, even in the case that there are common charac-
teristics, there is no guarantee that they will be apparent in the available data
(as mentioned in Section 2.1.2). Therefore, a method for collecting and testing
data for factors that are relevant to a certain context is needed to compensate
for outside influences that are excluded from the data (such as experimental
environments, backgrounds of subjects, etc.).
It is imperative that a requirement is established along with the properties
of the data model that are hypothesized to implicate it. The levels of data that
represent such properties can range from high to low and can be considered in
conjunction with other types of data. When building the model, it is, therefore,
possible to consider other aspects of the requirements, such as those represented
by time, reliability, and the like.
Following the identification stage, the model is to be designed considering
the properties’ relation to the requirement. The fact that these relationships
are subject to change throughout the course of implementing the model must
be considered during the design stage as well as the high probability that the
data used to build the model will not be static.
In the case of learning environments, the data must focus on the interactive
aspects of the space [24]. Interactions can be defined and classified according
to the requirements of the model. As with all ethnomethodologies, the
nature of the data used for constructing a model is unlikely to remain constant
as exterior factors change, nor are its relationships likely to remain static.
As this study poses no exception to the above restrictions and does not in-
tend to create a “universal” model for detecting “high interest” among learners,
it employs an ethnomethodological approach where the requirements are sought
by testing posture features on moments of “high interest” in order to predict
and then detect when these moments are occurring within the timespan of a
specific lecture course. The elements and design of the model are described
in Chapter 3, and an evaluation of how the model tested on a Kyoto Univer-
sity undergraduate class of learners is detailed in Chapter 4 and discussed in
Chapter 5.
Chapter 3 Learner Behavior Model for the Classroom
3.1 Elements of Model
3.1.1 Posture Features Collection
The approach used for collecting learner posture features is designed to enable
the expression of any visually confirmable seated posture, since affective states
can be expressed and made apparent through posture even from a seated position [25].
It is especially helpful that affective information can still be understood from
non-verbal behavior in this position because it is one frequently observed for
learners regardless of setting. For this reason, the characteristics that make up
learner postures in this study are those observable above the waist.
As the model proposed in this study is intended for eventual use with
computer vision, postures must consist of characteristics that are detailed, yet
detectable using computer vision technology within a 3D space containing numerous
human subjects and objects [26], such as a classroom. The granularity of
these characteristics is determined by the threshold at which each characteristic
ceases to be recognizable using computer vision. Meanwhile, each characteristic
of a posture should be capable of, when combined with other characteristics,
amounting to a thorough description of a single posture.
Just as postures, themselves, will be composed of these smaller posture
characteristics, posture features are determined by the single characteristics
as well as the characteristics working simultaneously to make up whole body
postures as they relate to an affective state. To support as wide a range of
human subjects’ motions as possible, the body is categorized into three parts:
the torso, the hands/arms, and the head. For each of these categories, a number
of heuristics is assigned to describe possible positions or actions: five for the
direction of the torso, six for hand and arm placement or activity, and four for
head positioning.
With the objective of tracking human body postures and poses, the field
of computer vision often considers the limitations of movement as a factor for
classifying and estimating body positioning. Several factors contribute to the
categorization of postures that commonly occur within a certain space or cir-
cumstance and those that are exceptional[27]. In the context of a classroom,
it is important to consider the confines of the space that learners have and the
space that is available for recording video data. This study considers these con-
fines as they relate to the space used in the experimentation phase in order to
classify possible postures, and a list of common posture features is made based
on video observation within this setting.
Torso
The torso angle of each learner is determined in relation to the seat
being occupied. “Leaning left” and “leaning right” are each recorded when the
axis of the learner’s body deviates from perpendicular to the seat bottom
by a substantial angle. “Leaning forward” and “leaning back” are likewise
determined by the angle of the learner’s back with his/her seatback, when it
deviates substantially from parallel. When none of the above criteria is in effect,
the learner is deemed “sitting straight up” (Figure 1).
Hands/arms
“Hands on face” and “touching head” are distinguished using the hair lines on
each side of the learner’s head. When the hand comes in contact with any
part of the head beyond the hair line, he/she is deemed “touching head”, and
contact with anything on the facial side of the hair line, including eyeglasses,
is marked as “hands on face”. “Writing” is confirmed when a writing utensil in
a learner’s hand is actually seen touching a writing material on the desk. When
either hand is seen inactive and on the top of the desk, “hand(s) rested (on
desk)” is marked regardless of the position and/or activity of the other hand.
Finally, a hand not seen doing any of the above is deemed “hand(s) rested (off
desk)”, and can occur simultaneously with any of the above labels (Figure 2).
Head
“Head up” and “head down” are determined by the angle of the facial plane in
relation to the top of the desk. When the facial plane comes within a substantial
angle of parallel to the desk top, “head down” is chosen, whereas “head up” pertains
to all greater angles. “Head tilted right” and “head tilted left” are determined
by the angle between the axis of the learner’s head and his/her shoulders. When it
exceeds a substantial angle to the right (according to the learner’s perspective),
“head tilted right” is recorded, and “head tilted left” is recorded when the same
criterion is satisfied on his/her left side (Figure 3).
Figure 1: Diagrams of torso positions used for rendering posture features (as
viewed from the front and side). Panels: Sitting Straight Up, Leaning Forward,
Leaning Back, Leaning Right, Leaning Left.
Data Coding and Formats for Mining
Posture characteristic tagging is done using a spreadsheet with exclusively
binary data (Figure 4). After posture tags that are found insignificant are
eliminated, 15 posture tags remain and are noted for each learner over a given
period of time of a lecture.
When the interdependent data is eliminated, the data format used for input
in the analysis phase changes to a 9-point system based on the original 15
posture tags (Table 2).
In the torso category, features are represented nominally as follows: S =
Sitting up Straight, F = Leaning Forward, B = Leaning Backward, R = Leaning
Right, and L = Leaning Left. For head raising, “U” denotes that the head is up,
and “D” that it is down. One input, “N”, is added to the head tilting attribute
to denote that no tilting is observed (neither “R”, right, nor “L”, left,
appears). The “N” value does not apply to head raising, as the head is only
considered to be up or down.
Figure 2: Diagram of six hand/arm position types used for rendering posture
features. Panels: Hand(s) Rested (On Desk), Hand(s) Rested (Off Desk), Writing,
Elbow(s) Rested, Hand(s) on Face, Touching Head.
Table 2: Four posture categories detailing nine posture features
Torso Direction    Hand/arms                    Head Raising   Head Tilting
S, F, B, R, or L   Hand(s) on Face              Down or Up     Right or Left
                   Touching Head
                   Hand(s) Rested (off)
                   Hand(s) Rested (on Desk)
                   Elbow(s) Rested (on Desk)
                   Writing
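The nominal and binary coding described above can be illustrated with a short sketch. The Python below is only an assumed representation and not the spreadsheet format used in this study: the class name `PostureInstance`, its field names, and the method `to_attributes` are hypothetical, chosen to mirror the three nominal attributes (torso direction, head raising, head tilting) and the six binary hand/arm tags listed in Table 2.

```python
from dataclasses import dataclass
from typing import Tuple

# Nominal codes taken from the text: torso S/F/B/R/L, head raising U/D,
# head tilting R/L plus N for "no tilting observed".
TORSO_VALUES = ("S", "F", "B", "R", "L")
HEAD_RAISING_VALUES = ("U", "D")
HEAD_TILTING_VALUES = ("R", "L", "N")

# Hypothetical field order for the six binary hand/arm tags of Table 2.
HAND_ARM_TAGS = (
    "hands_on_face",
    "touching_head",
    "hands_rested_off_desk",
    "hands_rested_on_desk",
    "elbows_rested",
    "writing",
)


@dataclass(frozen=True)
class PostureInstance:
    """One learner's posture for one time interval (assumed layout)."""
    torso: str
    hand_arm: Tuple[int, ...]  # six 0/1 flags, ordered as HAND_ARM_TAGS
    head_raising: str
    head_tilting: str

    def __post_init__(self):
        # Reject codes that are not part of the coding scheme.
        if self.torso not in TORSO_VALUES:
            raise ValueError(f"unknown torso code: {self.torso!r}")
        if self.head_raising not in HEAD_RAISING_VALUES:
            raise ValueError(f"unknown head raising code: {self.head_raising!r}")
        if self.head_tilting not in HEAD_TILTING_VALUES:
            raise ValueError(f"unknown head tilting code: {self.head_tilting!r}")
        if len(self.hand_arm) != len(HAND_ARM_TAGS) or any(
            flag not in (0, 1) for flag in self.hand_arm
        ):
            raise ValueError("hand_arm must hold six 0/1 flags")

    def to_attributes(self):
        """Return the nine attributes as one flat list."""
        return [self.torso, *self.hand_arm, self.head_raising, self.head_tilting]


# A learner leaning forward, head down, one hand resting on the desk, writing.
example = PostureInstance(
    torso="F", hand_arm=(0, 0, 0, 1, 0, 1), head_raising="D", head_tilting="N"
)
```

Because the hand/arm tags may occur simultaneously, they are kept as separate binary flags in this sketch, whereas each of the other three categories holds exactly one value at a time.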
Figure 3: Diagrams of head positions used for rendering posture features (as
viewed from the front and side). Panels: Head Tilted Right, Head Tilted Left,
Head Up, Head Down.
Interdependent and Independent Posture Characteristics
The data used in this study is collected regardless of the dependence of
posture characteristics on other characteristics. However, it is imperative that
at the stage where the correlation between the postures tagged in the collection
phases and high interest is analyzed and mined, this data is input such that
interdependent factors (posture characteristics) do not coexist.
Through observation of learners in a classroom setting, the basic planes and
angles at which the above posture tags are defined are examined (Figure 5).
Once one of these posture characteristics is observed, it is noted using binary
coding, with “1” representing the presence of the characteristic and “0” its
absence.
These nine heuristics, when collected in combination, offer a precise means of
collecting posture features for each learner in every frame of the video and
presenting them as attributes for the proposed model. The postures are collected
and expressed in linear binary code for each time interval.
Figure 4: Example of input vectors and the corresponding (expected) affective
state tags. Column groups: Date, Time, Seat, Posture Tags, High Interest.
Figure 5: Torso and head angles in lecture video footage
As the nature of these posture characteristics is based on individual video
observation and analysis, it is the burden of this study to prove that each of
these characteristics has a significant frequency of occurrence in the classroom
(Figures 6-8). Posture characteristics may or may not appear based on
numerous factors including learner individuality and classroom environment.
However, for the purposes of this study, postures that do not appear within
this realm of significance are eliminated in the analysis stage.
Figure 6: Distribution of posture tags for October 22nd lecture
Figure 7: Distribution of posture tags for November 12th lecture
Figure 8: Distribution of posture tags for December 3rd lecture
3.1.2 High Interest
The category of learner state sought by this study is “high interest”, which is
defined, in the context of a lecture taught in a classroom, as manifesting itself
in learners as the appearance or the impression of listening especially intently
to what is being taught or presented by the instructor(s). This entails, but is
not limited to, the following reactive behaviors: taking notes and making facial
expressions with sight focused on the instructor(s) or the materials displayed,
including nodding, smiling, laughter, and/or confusion. This definition is applied
to clusters of learners rather than individuals, and these criteria are defined
and clarified for the participants in the third-party evaluation stage of the
experiment for this study.
3.1.3 Undefined
The undefined learner state referred to in this study indicates a learner state
that either does not fall under the criteria for “high interest” given above, or
cannot, for circumstantial reasons, be identified or “defined”.
3.2 Proposed Model
3.2.1 Machine Learning Techniques
The correlation between posture feature attributes (and any combination thereof)
and high interest is found through machine learning, where instances of data for
each learner are used as input for training a system and, based on the trends
learned in the training phase, the system predicts outcomes. As the desired
outcome for this study, either high interest or undefined, makes this a
classification problem, decision tree-based learning is employed to construct the data
model necessary for predicting high interest. The decision tree algorithm is
ideal for testing data by using precise probabilities, assigned as attribute values,
based on how often a single posture attribute or combination of posture
attributes is correlated with high interest [28].
3.2.2 Model Objective and Makeup
As a study employing requirements engineering, the objective of the data model
proposed in this study is summarized in two parts: a) to designate an appropriate
methodology for establishing and learning the correlation between posture
features and high interest, and b) to predict when moments of high interest
are occurring in a lecture. The methodology validated in this study is intended
for application to computer vision techniques and, therefore, must rely on forms
of data appropriate for eventually detecting affective states automatically.
In order to achieve this, there must be a clear definition of the relationship
between moments of high interest and the nine posture attributes by render-
ing a correlation coefficient. Initially, this is explored through the collection
and mining of both high-interest scenes of lectures as well as the surrounding
undefined scenes.
3.2.3 Determining Group High Interest
A group of learners is observed over a span of time, and high-interest labels
are initially assigned second-by-second to represent where the group appears to
display high interest. The video screens used to shoot the video gallery, divided
into two parts, are aligned to replicate the learners’ seat positions as closely to
how they appeared in the lecture as possible. This positioning (of the learners’
seats and the video screens used for observation) is key to establishing the
overall atmosphere as a group of learners as well as the technique for automatic
detection when applying this study to computer vision [29]. Learners are then
individually observed, and high-interest labels are inserted into each learner’s
set of posture data.
3.2.4 Verifying Group High Interest
Third parties are employed to provide observations and data detached from
the objectives of this study. However, third parties are asked to label high
interest based on the criteria discussed in Sections 3.1.2-3.1.3. Third-party
high-interest evaluators are also asked to assign “high interest” or “undefined”
labels while observing the group of students for 10-second intervals.
Second-by-second annotation was not possible due to time constraints on the part of
the third-party evaluators.
Table 3: Total high-interest instances verified by third parties for three lectures
Date                                10/22/2008   11/12/2008   12/03/2008
Total No. of Learners               6            6            6
Total No. of Matched Instances      80           210          500
Avg. No. of Learners per Sequence   3.542        3.689        3.09
The high-interest labels are then paired with the labels prepared in the
initial group high-interest observation stage. Finally, the number of students
needed to constitute a “group” is determined for each of the lectures observed in
this study by averaging the number of learners for each high-interest sequence
that matches between the initial stage and the third-party labels. The overall
average number of learners in these matched sequences is 3.44 learners (Table 3).
As each lecture contains the same number of learners observed for this study,
it is deduced that for this set of lectures, three or more learners are needed to
establish a “group” displaying “high interest”. In keeping with this, although
the individual high-interest labels mentioned in Section 3.2.3 are employed in
this study, they are summed for each second of the lecture recorded in the
data. Therefore, high interest can be expressed numerically from 0 to 6 for each
instance of the data set. The average number of learners displaying high interest
in the matched sequences amounts to 57% of the observed learners (3.44/6 learners)
for all three lectures. A threshold is therefore set at three learners, so that
every instance with fewer than three learners is recorded as “undefined”, and
those with three or more are labeled “high interest” (Figure 9).
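The summation and thresholding described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the procedure actually used in this study: the function name `group_labels` and the list-of-lists input layout are hypothetical, while the per-lecture averages and the threshold of three learners are taken from Table 3 and the text.

```python
# Averages of learners per matched high-interest sequence (Table 3).
AVG_LEARNERS_PER_SEQUENCE = [3.542, 3.689, 3.09]
LEARNERS_OBSERVED = 6
GROUP_THRESHOLD = 3  # learners needed to call an instance "high interest"

# Overall average across the three lectures and its share of six learners.
overall_average = sum(AVG_LEARNERS_PER_SEQUENCE) / len(AVG_LEARNERS_PER_SEQUENCE)
share_of_learners = 100 * overall_average / LEARNERS_OBSERVED


def group_labels(per_learner_labels, threshold=GROUP_THRESHOLD):
    """Sum individual 0/1 labels per second and apply the group threshold.

    per_learner_labels holds one equal-length list of 0/1 labels per learner
    (an assumed layout); the result holds one group label per second.
    """
    counts = [sum(second) for second in zip(*per_learner_labels)]
    return [
        "high interest" if count >= threshold else "undefined"
        for count in counts
    ]


# Six learners observed over three seconds.
sample = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
    [0, 0, 0],
]
sample_group = group_labels(sample)
```

In the sample, three learners are labeled in the first second, one in the second, and two in the third, so only the first second reaches the threshold.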
3.2.5 Data Mining Algorithm
As the crux of the methodology of this study, choosing an appropriate data
mining algorithm is a key step in engineering the requirements for detecting
high interest. During the course of research done to support this study, the
collected data is tested on a number of algorithms (including ones that are
not decision tree-based) using the data mining tool Weka. The algorithm
implemented in the model validation phase of this study outperformed all of the
other algorithms tested.
Figure 9: Graph of High Interest for Six Learners
The purpose of the data model employed in this study is to automatically
predict when moments of high interest are occurring in a lecture based on
individual postures. This is achieved through a machine learning algorithm
known as J4.8, Weka’s Java implementation of the C4.5 decision tree (revision 8),
the last revision made available to the public [30]. The algorithm proceeds as
follows:
1) An attribute value is assigned to a root node, and a corresponding branch is
created.
2) Based on the data set, each branch is then divided based on probability, and
the two products are treated as root nodes.
3) The above two processes are repeated, allowing the tree’s branches to grow.
4) When the leaf of a branch yields an instance that completely corresponds to
one class, the branch stops growing.
As J4.8 is implemented in Java, the sets of data can be categorized
according to object-oriented programming. This categorization of the two
possible outcomes, “high interest” and “undefined”, is notated in binary code.
The classification performed by the J4.8 algorithm is similar to that of the ID3
decision tree, except for its pruning capabilities, which minimize classifier
errors by eliminating attributes that result in sets of contradictory data [31].
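The choice of which attribute to place at a node, summarized in steps 1) and 2) above, can be made concrete with a small sketch. C4.5 is commonly described as ranking candidate attributes by gain ratio (information gain divided by split information), and the Python below computes that quantity for nominal attributes only. It is an illustration under that assumption and not the J4.8 code in Weka, which also handles numeric thresholds, missing values, and pruning; the function names and the dictionary-based row layout are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def gain_ratio(rows, labels, attribute):
    """Information gain of splitting on `attribute`, divided by split info."""
    total = len(labels)
    # Group the class labels by the value each row takes for the attribute.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)

    remainder = sum(
        (len(part) / total) * entropy(part) for part in partitions.values()
    )
    gain = entropy(labels) - remainder
    split_info = -sum(
        (len(part) / total) * math.log2(len(part) / total)
        for part in partitions.values()
    )
    # An attribute with a single value cannot split the data.
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, labels, attributes):
    """Pick the attribute with the highest gain ratio for the next node."""
    return max(attributes, key=lambda name: gain_ratio(rows, labels, name))


# Toy data: head raising separates the classes perfectly, writing does not.
toy_rows = [
    {"head_raising": "U", "writing": 1},
    {"head_raising": "U", "writing": 0},
    {"head_raising": "D", "writing": 1},
    {"head_raising": "D", "writing": 0},
]
toy_labels = ["high interest", "high interest", "undefined", "undefined"]
toy_root = best_attribute(toy_rows, toy_labels, ["head_raising", "writing"])
```

Repeating this selection on each resulting partition until a partition contains a single class corresponds to steps 3) and 4).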
3.3 Testing the Methodology
Once data from numerous scenes of varying lectures has been collected, ma-
chine learning is applied to determine weights of posture features and activity
levels as they correlate to high-interest and undefined scenes respectively. The
posture features are used as attributes and tested on data from separate scenes
of the lectures involving the same instructor, content, and classroom to min-
imize surrounding influences. The ability of the model to accurately predict
high interest is assessed using the percentage of correct classification resulting
from the test.
Chapter 4 Model Validation
4.1 Data Gathering
4.1.1 Filming Environment
In keeping with Kyoto University’s ethical standards, permission is sought to
film classes where the learners participating in the study are made aware that
they are being filmed and issued consent forms roughly detailing the period
and purpose of filming. The data from three 90-minute lectures spanning one
semester are used for this study, all subjects of which attend lectures within the
same lecture hall layout (Figure 10).
Figure 10: Layout of lecture hall used for data collection. Labels: Instructor’s
Desk, Electronic Whiteboard (two), Learners’ Seat, Screen, Target Learners.
Depending on the time of day the lecture takes place, learner attendance
varies from week to week and often changes during the span of one lecture,
creating many distractions toward the beginning of each class and complicating
the makeup of the learner groups being observed. Therefore, observations
are conducted for each class beginning 15 minutes following the instructors’ first
utterances and ending 10 minutes prior to dismissal.
4.1.2 Lecture Video Data
Lectures are viewed at either two or three times the actual speed of the
lecture (Figure 11). Data for this study is recorded at one-second intervals for
observational purposes, as bodily gestures can be recognized by humans within
one second [32]. Instances of “high interest” among the entire body of learners
captured by the video are noted by marking the start time of a period where
learners are seen showing apparently higher interest than that displayed in the
majority of the lecture. These “high interest” scenes are isolated, and learners
sitting in the front three rows of the classroom are selected for posture data
collection (Figure 12). Video footage of instructors teaching these courses is
also available and used (for reference to lecture content).
Figure 11: Four learners captured by one camera at one-second intervals
Figure 12: Subjects for data collection as captured by two classroom cameras
Prior to collecting or comparing data between learners, whether the learners
are showing “high interest” or not is generally assessed by viewing the lecture
video footage against the criteria specified in Section 3.1.2 (Figure 13). Each
of these “high interest” moments is labeled as such, and all other sequences
are labeled as “undefined”. Posture features for all learners are collected at
one-second intervals and aligned with verified “high interest” and “undefined”
labels. This is continued for three separate classes in October, November, and
December respectively for six learners in each class.
Figure 13: Still photos of video data used for observational assessment of “high
interest”: (i) sample scene of “high interest”; (ii) sample “undefined” scene
It is from these scenes, where “high interest” is hypothesized to be present,
that posture data is collected manually and the correlation between posture
tags and high interest is examined.
4.2 Experiments
4.2.1 Objectivity of High Interest
In order to verify the presence of high interest in the isolated scenes, third-party
evaluation of “high interest” is performed for the entirety of the classes used in
the experiment phase. The participants are asked to label 10-second segments of
the lecture video as “high interest” or “undefined” based on their observations of
the learners occupying the first three rows and the criteria described in Sections
3.1.2-3.1.3. The third-party 10-second-interval data is then compared with the
one-second “high interest” data created in the first stage of the experiment, which
yields a mean match of 87.6% (Table 4).
Table 4: Rates for matching initial and third-party evaluations of “high interest”
Matching rates for each lecture
Lecture Date      10/22/2008   11/12/2008
No. of Students   6            6
Matching Rate     89.02%       86.18%
Total Time        3’10”        1’00”
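How the 10-second third-party labels are compared with the one-second labels is not spelled out in detail, so the following Python is only one plausible reading, offered as a sketch: each 10-second label is repeated for every second it covers and the share of agreeing seconds is reported. The function names `expand_segments` and `match_rate` are hypothetical; the two per-lecture rates come from Table 4.

```python
def expand_segments(segment_labels, segment_length=10):
    """Repeat each segment label once for every second the segment covers."""
    return [label for label in segment_labels for _ in range(segment_length)]


def match_rate(second_labels, segment_labels, segment_length=10):
    """Percentage of seconds on which the two sets of labels agree.

    Assumes both label sequences start at the same moment; any seconds
    beyond the shorter of the two sequences are ignored.
    """
    expanded = expand_segments(segment_labels, segment_length)
    compared = min(len(second_labels), len(expanded))
    if compared == 0:
        raise ValueError("no overlapping seconds to compare")
    agreeing = sum(
        own == third
        for own, third in zip(second_labels[:compared], expanded[:compared])
    )
    return 100.0 * agreeing / compared


# Mean of the two per-lecture matching rates reported in Table 4.
mean_match = (89.02 + 86.18) / 2

# Toy case: the one-second labels agree with the first segment only.
toy_rate = match_rate([1] * 10 + [0] * 10, [1, 1])
```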
4.2.2 Attribute Values
In the training stage of the experiment, three nominal attributes and six nu-
merical attributes, all based on low-level posture data, are used to construct the
model for three lectures, dating October 22nd, November 12th, and December
3rd respectively. The lectures are all on the same topic from the same course
curriculum and taught by the same instructor.
The data for six learners from each lecture is divided in half, with the first
half used as the training set, which is learned using the J4.8 decision tree in
Weka. Prior to testing on the remaining half (henceforth referred to as
the test set), decision tree rules (Figures 14-16) are rendered to assign values
to each of the attributes, which allows for more precise classification of
high-interest moments.
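The division into a training half and a test half can be sketched as follows. The text does not state whether the split is made by time or by row order, so this Python assumes a split by time, which keeps all learners observed in the same second on the same side of the split. The function name `split_by_time` and the dictionary keys `second` and `seat` are hypothetical.

```python
def split_by_time(instances):
    """Split instances into (training, test) halves by observation time.

    Each instance is a dict with a "second" key (an assumed layout); the
    earlier half of the observed seconds forms the training set.
    """
    seconds = sorted({instance["second"] for instance in instances})
    cutoff = seconds[len(seconds) // 2]
    training = [i for i in instances if i["second"] < cutoff]
    test = [i for i in instances if i["second"] >= cutoff]
    return training, test


# Illustration: 380 seconds of footage for six seats gives 2,280 instances,
# so each half holds 1,140 instances.
footage = [
    {"second": second, "seat": seat}
    for second in range(380)
    for seat in range(6)
]
training_set, test_set = split_by_time(footage)
```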
4.2.3 Analyzing Decision Tree Rules
As one of the features of the J4.8 decision tree (a C4.5 decision tree), a pruning
algorithm counteracts overfitting by pruning inactive attributes (dealt with in
the initial posture feature analysis of the data collection stage) as well as
contradictory data within one set. Machine learning using the J4.8 decision tree
allows for the creation of a distinctive model for the data used from each lecture.
The structure of each tree varies with each lecture and its innumerable
factors. This is evident in the size, depth, and number of leaves (Table 5) as
well as the posture features found in each lecture’s set of rules (Figures 14-16).
On average, the three lectures yield 83.9% correct classification on the training
data.
Table 5: Decision tree analysis and rates of classification for the training phase
Machine Learning Results
Lecture Date             October 22   November 12   December 3
Tree Depth               7            7             6
Number of Leaves         31           37            41
Size of Tree             48           59            68
Correct Classification   88.4%        81.8%         81.4%
Table 6: Precision and recall rates for three separate lectures
Lecture Date                    October 22   November 12   December 3
Total Instances                 1,140        1,668         3,540
Total Time of Lecture           6’20”        9’16”         39’20”
Total High Interest             498          1,274         3,006
Total High Interest Precision   81.3%        82.7%         75.6%
Total High Interest Recall      91.0%        96.2%         80.5%
The amount of relevance and contradiction of each of these posture features
is detailed in the set of rules generated in the training phase.
4.2.4 Testing for an Affective State
The test set, made up of the remaining half of data, is used to evaluate how
accurately Weka can predict high interest for each lecture based on the
corresponding set of rules.
\[ \text{Precision} = \frac{\text{No. of Correctly Classified High Interest Instances}}{\text{No. of Correctly Classified High Interest Instances} + \text{No. of Incorrectly Classified High Interest Instances}} \]
\[ \text{Recall} = \frac{\text{No. of Correctly Classified High Interest Instances}}{\text{No. of Correctly Classified High Interest Instances} + \text{No. of Incorrectly Classified Undefined Instances}} \]
In the case of detecting “high interest” among learners, the precision rate for
high interest classification (Table 6) is used as opposed to the overall prediction
rate. As “undefined” scenes collected in this study are of undefined criteria and
relevance, the ability for overall detection of the test set is excluded.
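The two rates defined above can be computed from paired lists of actual and predicted labels, as in the sketch below. It assumes that “incorrectly classified high interest instances” are instances predicted as high interest that are actually undefined, and that “incorrectly classified undefined instances” are high-interest instances predicted as undefined. The function name `precision_recall` and the label strings are hypothetical.

```python
def precision_recall(actual, predicted, positive="high interest"):
    """Return (precision, recall) for the positive class.

    actual and predicted are equal-length lists of class labels.
    """
    pairs = list(zip(actual, predicted))
    # Correctly classified high-interest instances.
    correct = sum(a == positive and p == positive for a, p in pairs)
    # Undefined instances that were classified as high interest.
    wrongly_high = sum(a != positive and p == positive for a, p in pairs)
    # High-interest instances that were classified as undefined.
    wrongly_undefined = sum(a == positive and p != positive for a, p in pairs)

    precision = correct / (correct + wrongly_high) if correct + wrongly_high else 0.0
    recall = (
        correct / (correct + wrongly_undefined)
        if correct + wrongly_undefined
        else 0.0
    )
    return precision, recall


H, U = "high interest", "undefined"
toy_precision, toy_recall = precision_recall(
    actual=[H, H, H, U, U],
    predicted=[H, H, U, H, U],
)
```

In the toy case, two of the three instances predicted as high interest are correct, and two of the three actual high-interest instances are found, so both rates equal two thirds.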
Based on the model created using the J4.8 algorithm, a comprehensive set of
decision tree rules allows for precise prediction (an average correlation coefficient
of 79.8%) of where high interest occurs during a lecture based on learners’
low-level posture data.
Table 7: Precision-rate comparison subtracting each posture feature once
Attribute Set            October 22   November 12   December 3
9-attribute Precision    81.3%        82.7%         75.6%
- Torso                  77.6%        81.0%         75.6%
- Hands on Face          77.6%        82.6%         74.6%
- Touching Head          80.7%        81.8%         75.5%
- Hands Rested off       85.5%        82.6%         74.1%
- Hands Rested on Desk   82.5%        82.6%         70.7%
- Elbows Rested          82.3%        84.1%         73.8%
- Writing                81.3%        82.1%         75.5%
- Head Raising           75.3%        83.1%         79.4%
- Head Tilting           75.8%        79.5%         74.7%
4.2.5 Attribute Importance
The posture tags in this study, based on the results of high-interest prediction
precision, are examined to verify that attributes are irreplaceable and will not
yield higher precision when eliminated. This verification is established by
comparing precision rates for the model after removing each attribute individually,
one at a time (Table 7).
It is concluded that no single attribute, although its removal yields higher
precision for individual lectures, yields higher precision for all three
classes when eliminated.
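This conclusion can be checked directly against the figures in Table 7. The short Python sketch below copies the precision rates from the table and tests, for each removed attribute, whether precision rises in every lecture or only in some; the variable and function names are hypothetical.

```python
# Precision (%) for October 22, November 12, and December 3 (Table 7).
BASELINE = (81.3, 82.7, 75.6)
WITHOUT_ATTRIBUTE = {
    "Torso": (77.6, 81.0, 75.6),
    "Hands on Face": (77.6, 82.6, 74.6),
    "Touching Head": (80.7, 81.8, 75.5),
    "Hands Rested off": (85.5, 82.6, 74.1),
    "Hands Rested on Desk": (82.5, 82.6, 70.7),
    "Elbows Rested": (82.3, 84.1, 73.8),
    "Writing": (81.3, 82.1, 75.5),
    "Head Raising": (75.3, 83.1, 79.4),
    "Head Tilting": (75.8, 79.5, 74.7),
}


def lectures_improved(rates, baseline=BASELINE):
    """Count the lectures in which removing the attribute raised precision."""
    return sum(rate > base for rate, base in zip(rates, baseline))


# Attributes whose removal raises precision in every lecture (none expected).
improves_all = [
    name
    for name, rates in WITHOUT_ATTRIBUTE.items()
    if lectures_improved(rates) == len(BASELINE)
]

# Attributes whose removal raises precision in at least one lecture.
improves_some = [
    name
    for name, rates in WITHOUT_ATTRIBUTE.items()
    if lectures_improved(rates) > 0
]
```

With the values from Table 7, `improves_all` is empty, while `improves_some` contains the two “Hands Rested” tags, “Elbows Rested”, and “Head Raising”.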
4.3 Discussion of Experimental Results
4.3.1 Application of Results
The tree rules generated for each lecture (Figures 14-16) are concise and can
be related back to the learners from whom affective states are being extracted.
This application to professional development can be carried out by analyzing
the characteristics of each tree as though they represent the features of the
group of learner subjects.
The most influential of all posture features appears at the top (the “root”)
of each of the three trees representing the lectures (“head tilted right”, “elbows
rested”, and “touching head”, respectively). Although the posture features directly
below the top of the tree are considered conditions that, in tandem, correlate
with the target affective state, the root feature, as the most influential posture
feature, can be noted in video footage of the group of learners in question, or
during the actual lecture from which posture features have been extracted.
4.3.2 Generalizing the Model
Generalization of the model proposed in this study is key to applying it to
professional development. This allows posture data taken for learners in one
class to be applied to the same learners in a separate class.
In a preliminary experiment, the posture features for six learners (sitting
nearest the camera) are tagged at one-second intervals for three lectures recorded
on October 22, November 12, and December 3, 2008. Thirteen segments of
group high interest are recorded for all three lectures, and data from the first
two lectures is tested on the third lecture using an ID3 decision tree. This
yielded a low correct classification rate (less than 60%), and the more data that
was added to the training set, the more the correct classification rate
decreased.
In order to improve correct classification results and expand the versatility
of the proposed model, additional steps are needed to ensure not only high rates
for classification, but precision as well. However, for the purpose of automated
affective-state detection using low-level video data, this study supports the ar-
gument that it is possible to predict where affective states exist based on static
information.
Low classification rates in this scenario are normally a result of a common
data mining problem known as “overfitting”, where contradictory data exist
and lower the effectiveness of the decision tree in predicting an outcome correctly.
However, to show that the J4.8 decision tree is effective in dealing with overfitting,
10-fold cross validation is conducted on all three lectures’ data samples at the
same time, which yielded a precision rate of 74.8% for nearly 12,700 instances
(Table 8).
Table 8: Cross Validation Results for Three Lectures Combined
Total No. of Instances Tested   12,696
No. of Folds                    10
Precision Rate                  74.8%
Recall Rate                     87.8%
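For readers unfamiliar with the procedure, 10-fold cross validation partitions the instances into ten folds and uses each fold once as the test set while training on the other nine. The Python below is a minimal sketch of that partitioning only; it uses contiguous, unshuffled folds, whereas Weka’s own procedure may shuffle or stratify the data. The function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_instances, k=10):
    """Yield (training_indices, test_indices) for each of k folds.

    Folds are contiguous blocks; the first n_instances % k folds hold one
    extra instance so that every instance is tested exactly once.
    """
    if not 2 <= k <= n_instances:
        raise ValueError("k must be between 2 and the number of instances")
    base_size, extra = divmod(n_instances, k)
    start = 0
    for fold in range(k):
        size = base_size + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        training = list(range(0, start)) + list(range(start + size, n_instances))
        yield training, test
        start += size


# Fold sizes for the 12,696 instances reported in Table 8.
fold_sizes = [len(test) for _, test in k_fold_indices(12696, k=10)]
```

With 12,696 instances, six folds hold 1,270 instances and four hold 1,269, so each model in this sketch is trained on roughly 11,400 instances.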
This high precision rate is an indication that the cause of low classification
rates when testing one lecture’s data on another (even with overlapping learners)
is a lack of training samples rather than overfitting. It can also be gathered from
these results that, although many samples must be used for machine learning,
the size of each may cover less than 10 minutes of a 90-minute lecture. Therefore,
experimenting with more samples of shorter lecture times is suggested as
future work for this study.
Figure 14: Decision tree rules for October 22nd lecture
Figure 15: Decision tree rules for November 12th lecture
Figure 16: Decision tree rules for December 3rd lecture
Chapter 5 Conclusions
5.1 Conclusions
This thesis has demonstrated a method for identifying when learners display a
high level of interest in the content presented in a lecture for the purposes of
professional development in higher education. The method is solely based on
video data as a means for applying ICT to professional development.
The study has employed a list of heuristics for each learner as they are
captured by classroom video footage. The heuristics were selected on the basis
that the data thereof can be collected in a binary format and, when combined,
are indicative of the nature of any of the moment-to-moment postures displayed
by each learner involved. The posture features used in this study are intended
to be applied eventually to computer vision techniques.
Using the heuristics mentioned above, a model was composed for interpreting
learners’ non-verbal behavior with regard to interest in the lectures they
attend. Data was collected at one-second intervals to describe specific postures
maintained by each learner throughout a lecture.
The high-interest moments are defined by the author and verified through
third-party evaluation, and each posture feature and combination thereof is mined
in correlation with moments of perceived “high interest” and contrasted with
“undefined” moments in the lecture. Static postures are collected from the precise
span of time verified in the previous phase, and the postures are assigned
a set of rules according to their correlation sought in separate segments of the
lecture, and then reassessed by repeating the above steps.
The affective state of learners that was sought in this study is that of “high
interest”. The learners featured in this thesis were observed as a group during
several classes of one course (all taught by the same instructor). Moments
where the group of learners appeared highly interested in what was being taught
and/or presented were specifically recorded and verified via third-party
evaluation against what were deemed high-interest criteria, and the behavioral
metadata was modeled to find the relevance of posture features to those
moments. The method proposed yielded a precision rate (for detecting high
interest) of approximately 80% and is intended to be applied to an automated
form of learner affective state detection.
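As an illustration of how the precision rate above is read, the following is a
minimal sketch, assuming one record of binary posture features per second and
a label of either "high_interest" or "undefined" for each second. The feature
names, the rule, and the sample data are hypothetical placeholders, not the
features or rules derived in this study. Precision is taken here as the
fraction of seconds flagged as high interest that were also labeled as high
interest.

```python
from typing import Dict, List

# One record per second of binary posture features.
# Feature names are hypothetical placeholders, not the thesis's feature list.
Record = Dict[str, int]


def predict_high_interest(record: Record) -> bool:
    """Hypothetical rule of the kind a decision tree might produce."""
    return record["leaning_forward"] == 1 and record["head_tilted"] == 0


def precision(records: List[Record], labels: List[str]) -> float:
    """Fraction of seconds flagged as high interest that are labeled so."""
    flagged = [lab for rec, lab in zip(records, labels)
               if predict_high_interest(rec)]
    if not flagged:
        return 0.0  # nothing was flagged, so no correct detections either
    true_positives = sum(1 for lab in flagged if lab == "high_interest")
    return true_positives / len(flagged)


# Synthetic data for illustration only.
records = [
    {"leaning_forward": 1, "head_tilted": 0},
    {"leaning_forward": 1, "head_tilted": 0},
    {"leaning_forward": 0, "head_tilted": 1},
    {"leaning_forward": 1, "head_tilted": 0},
    {"leaning_forward": 1, "head_tilted": 1},
]
labels = ["high_interest", "high_interest", "undefined",
          "undefined", "high_interest"]

print(precision(records, labels))
```

In this synthetic example the rule flags three seconds, two of which are
labeled as high interest, so the precision is two thirds.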
5.2 Future Works
In order to create a model for a course of lectures, this study concludes that
the number of lectures used as training samples, regardless of the size of each
sample, is the deciding factor in establishing generalization. Therefore, this
study suggests that instructors use posture features collected from a group of
learners in 10-minute intervals from 4-6 different lectures, and test the
resulting model on the remaining time of those lectures or on lectures from
which data has not been collected. The relevant posture features may vary from
lecture to lecture, but are intended to remain consistent for the group of
target learners.
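The suggested split can be sketched as follows. This is only an illustration
under stated assumptions: each lecture is represented as a list with one
sample per second, the 10-minute training window is taken from the start of
each lecture (the `start` parameter is hypothetical), and the lectures are
synthetic 90-minute placeholders.

```python
from typing import List, Tuple

WINDOW_SECONDS = 10 * 60  # one 10-minute training window per lecture


def split_lecture(samples: List[dict],
                  start: int = 0) -> Tuple[List[dict], List[dict]]:
    """Return (training window, remaining samples) for one lecture.

    `samples` holds one entry per second; `start` is the window offset.
    """
    end = start + WINDOW_SECONDS
    train = samples[start:end]
    test = samples[:start] + samples[end:]
    return train, test


def build_sets(lectures: List[List[dict]]) -> Tuple[List[dict], List[dict]]:
    """Pool the training windows and held-out remainders of all lectures."""
    train_all: List[dict] = []
    test_all: List[dict] = []
    for samples in lectures:
        train, test = split_lecture(samples)
        train_all.extend(train)
        test_all.extend(test)
    return train_all, test_all


# Four synthetic 90-minute lectures, one placeholder sample per second.
lectures = [[{"lecture": i, "second": s} for s in range(90 * 60)]
            for i in range(4)]
train_set, test_set = build_sets(lectures)
print(len(train_set), len(test_set))
```

With four lectures, this yields 2400 training samples (four 10-minute windows)
and 19200 held-out samples. Lectures from which no training data is drawn can
simply be appended to the test set in full.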
A possibility that has arisen during the experiments for generalizing the
proposed model is that generalization of lectures for one course of lectures is
achievable through the use of assigned seating. Although experimenting on this
hypothesis poses several problems with classroom layout, it can be executed
given the learner subjects’ cooperation.
Additionally, where arranging classroom seats is not possible, the model
proposed in this study could be applied to finding learner group features based
on where groups are seated within the classroom. Results from experiments
dealing with this issue would be particularly interesting in contrast to those of
this study, which focused on a narrow section of the classroom close to the
instructor.
Eventually, the posture features found relevant by this study are proposed
to be automatically tracked and recorded using computer vision. Analysis of
these postures, as they occur throughout a lecture, may also be automated.
Moreover, this study is contingent on a model built with data collected in the
controlled environment detailed in Section 4.1. However, the possibility that
posture data reflects the environment from which it is gathered suggests that
generality is an issue, and separate models may have to be built according to
their environments.
Acknowledgments
I would like to thank, first and foremost, Professor Michihiko Minoh for the
opportunity to be a part of his laboratory and work alongside such an esteemed
group of young and talented researchers, as well as the time and attention he
graciously put into guiding this study and seeing it through to completion. I
also wish to thank Professor Koh Kakusho of Kwansei Gakuin University and
Professor Hajime Kita of Kyoto University for presiding on my midterm defense
committee in the fall of 2008.
I owe a debt of gratitude to Associate Professor Masayuki Mukunoki for
overseeing this study and serving on the committee for the oral defense along
with Professor Minoh, Professor Kakusho, and Department of Intelligence Sci-
ence and Technology Dean, Professor Akihiro Yamamoto. Thanks also to all
Presence Group members, Assistant Professor Takuya Funatomi, Global COE
researcher, Mayumi Ueda, and Academic Center for Computing and Media
Studies researcher, Nimit Pattanasri for their unfailing support and constant
feedback.
Thanks to Professor Victor Kryssanov of Ritsumeikan University for believ-
ing in my potential and empowering me with the ambition to take on otherwise
impossible challenges.
Finally, I wish to thank Associate Professor Masayuki Murakami of Kyoto
University of Foreign Studies and Assistant Professor Tetsuo Shoji of Nara
University, without whom the completion of this study would not have been
possible, for their tireless efforts and devotion to the fruition of this study, their
patience and tolerance for working with someone from a research background
completely different from theirs, and their shared passion for the content of this
study.
References
[1] Abbott, C., ICT: Changing Education, Routledge, 2001
[2] Van Damme, D., “Trends and Models in International Quality Assurance
in Higher Education in Relation to Trade in Education”, Higher Education
Management and Policy, Vol. 14, No. 3, pgs. 93-136, 2002
[3] Boyle, P., Bowden, J. A., “Educational Quality Assurance in Universities:
An Enhanced Model”, Assessment & Evaluation in Higher Education, Vol.
22, No. 2, pgs. 111-122, 1997
[4] Nakamura, K., Kakusho, K., Murakami, M., and Minoh, M., “Estimating
Learners’ Subjective Impressions of the Difficulty of Course Materials in
e-Learning Environments”, APRU 9th Distance Learning and the Internet
Conference, pgs. 199-206, 2008
[5] Lin, W. J., Liu, Y. L., Kakusho, K., Yueh, H. P., Murakami, M., Minoh,
M., “Blog as a Tool to Develop e-Learning Experience in an International
Distance Course”, Sixth International Conference on Advanced Learning
Technologies, pgs. 290-292, 2006
[6] Coffman, G. J., “Interaction in the Classroom and Filming Applications
Based on ‘College Culture’: A Socio-anthropological Approach”, Presented
at: 5th Annual Conference on Applications of Social Network Analysis
2008, 2008
[7] Murakami, M., Yagi, K., Kakusho, K., Minoh, M., “Evaluation of Distance
Learning Course Shared by UCLA and Kyoto University”, Proceedings of
2nd International Conference on Information Technology Based Higher
Education and Training, 2001
[8] Keller, C., Cernerud, L., “Students’ Perceptions of E-learning in University
Education”, Journal of Educational Media, pgs. 55-67, 2002
[9] Minoh, M., “Automatic Lecture Archiving System”, Proceedings of the
International Conference on Informatics Research for Development for
Knowledge Society Infrastructure 2004, pgs. 39-45, 2004
[10] Marutani, T., Nishiguchi, S., Kakusho, K., Minoh, M., “Making a Lecture
Content with Deictic Information about Indicated Objects in Lecture
Materials”, AEARU Workshop on Network Education, pgs. 70-75, 2005
[11] Carter, R., “A Taxonomy of Objectives for Professional Education”, Stud-
ies in Higher Education, Vol. 10, No. 2, pgs. 135-149, 1985
[12] Fulk, J., “Social Construction of Communication Technology”, Academy
of Management Journal, Vol. 36, No. 5, pgs. 921-950, 1993
[13] Mizogami, S., “Five Types of Open Class as Faculty Development Activity
: From Study of 13 Cases”, Japan Journal of Educational Technology, pgs.
25-28, 2003
[14] Mason, R., Bacsich, P. D., ISDN: Applications in Education and Training,
The Institution of Engineering and Technology, 1994
[15] Blatchford, P., Bassett, P., “Teachers’ and Pupils’ Behavior in Large and
Small Classes: A Systematic Observation Study of Pupils Aged 10 and 11
Years”, Journal of Educational Psychology, Vol. 97, No. 3, pgs. 454-467,
2005
[16] Greenson, R. R., “Empathy and Its Vicissitudes”, International Journal
of Psycho-Analysis, Vol. 41, pgs. 418-424, 1960
[17] Fritschner, L. M., “Inside the Undergraduate College Classroom: Faculty
and Students Differ on the Meaning of Student Participation”, Journal of
Higher Education, Vol. 71, No. 3, pgs. 342-367, 2000
[18] Monteil, J. M., Brunot, S., “Cognitive Performance and Attention in the
Classroom: An Interaction between Past and Present Academic Experi-
ences”, Journal of Educational Psychology, Vol. 88, No. 2, pgs. 242-248,
1996
[19] Gatica-Perez, D., McCowan, I., Zhang, D., Bengio, S., “Detecting Group
Interest-Level in Meetings”, Proceedings of IEEE International Conference
on Acoustics, Speech, and Signal Processing, Vol. 1, pgs. 489-492, 2005
[20] Whelan, R., “Use of ICT in Education in the South Pacific: Findings of
the Pacific eLearning Observatory”, Distance Education, Vol. 29, No. 1,
pgs. 53-70, 2008
[21] Mota, S., Picard, R. W., “Automated Posture Analysis for Detecting
Learner’s Interest Level”, Proceedings of the 2003 Conference on Computer
Vision and Pattern Recognition Workshop, 2003
[22] De Silva, P. R., Bianchi-Berthouze, N., “Modeling Human Affective Pos-
tures: An Information Theoretic Characterization of Posture Features”,
The Journal of Visualization and Computer Animation, Vol. 15, No. 3-4,
pgs. 269-276
[23] Sutcliffe, A., “A Conceptual Framework for Requirements Engineering”,
Requirements Engineering, Vol. 1, No. 3, pgs. 170-189, 1996
[24] Economou, D., “Requirements Elicitation for Virtual Actors in Collabora-
tive Learning Environments”, Computers and Education, Vol. 34, No. 3-4,
pgs. 225-239
[25] Mehrabian, A., Friar, J., “Encoding of Attitude by a Seated Communi-
cator via Posture and Position Cues”, Journal of Consulting and Clinical
Psychology, Volume 33, pgs. 330-336, 1969
[26] Mori, G., Malik, J., “Recovering 3D Human Body Configurations Using
Shape Contexts”, IEEE Transactions on Pattern Analysis and Machine
Intelligence, Vol. 28, No. 7, 2006
[27] Mori, G., Ren, X., Efros, A. A., Malik, J., “Recovering Human Body
Configurations: Combining Segmentation and Recognition”, Conference
on Computer Vision and Pattern Recognition, IEEE Computer Society,
Vol. 2, pgs. 326-333, 2004
[28] Motoda, H., Tsumoto, S., Yamaguchi, T., Numao, M., IT Text Fundamental Data Mining,
Ohmsha, Ltd., 2006
[29] Cupillard, F., Bremond, F., Thonnat, M., “Group Behavior Recognition
with Multiple Cameras”, Proceedings of Sixth IEEE Workshop on Appli-
cations of Computer Vision, pgs. 177-183, 2002
[30] Witten, I. H., Data Mining: Practical Machine Learning Tools and Techniques,
Second Edition, Morgan Kaufmann Publishers, 2005
[31] Panda, M., Patra, M. R., “A Comparative Study of Data Mining Algo-
rithms for Network Intrusion Detection”, First International Conference
on Emerging Trends in Engineering and Technology, pgs. 504-505, 2008
[32] Davis, J. W., Vaks, S., “A Perceptual User Interface for Recognizing Head
Gesture Acknowledgements”, Proceedings of the 2001 Workshop on Perceptive
User Interfaces, ACM International Conference Proceeding Series,
Vol. 15, pgs. 1-7, 2001