Strategies for Model-Oriented Information Organization

Robert B. ALLEN  ロバート アレン
Research Center for Knowledge Communities
University of Tsukuba
Tsukuba, Japan

[email protected]

“Big Data” Problem of Organization and Access for Cultural Heritage Materials

• Behavior-based models go beyond ontologies and traditional approaches to knowledge representation.

• Rather than developing indexes, perhaps we can model communities and, eventually, cultures.

• Those models can provide structure to support organization, context, and access. The original motivation is an information science view based on indexing historical newspapers, but the approach is potentially broader than information science.

Causal Relationships in Timelines of Events

Events can be threaded into narratives, and into other types of discourse as well. The slide shows an interactive, curated timeline visualization that describes threaded relationships among events.

Allen, R.B., Visualization, Causation, and History, iConference, 2011.

Explicit Modeling of Texts and Communities

• Many types of cultural texts.
• Many communities are relatively closed systems. This makes them more tractable than indexing cities.
• Detailed knowledge about the community allows synergies:
  – People
  – Locations
  – Processes

• Earlier work developed an “interactive community directory”.

Allen, R.B., Toward an Interactive Directory for Norfolk, Nebraska: 1899-1900, IFLA Newspaper and Genealogy Section Meeting, Singapore, Aug 2013.  arXiv:1308.5395

Behavior-Based Models

• Timing is right for more explicit cultural modeling.
  – Big data needs to be organized (e.g., animations from keynote)
  – Need to represent process and functionality
• Fragments of knowledge are unified by models across collections and databases. Our models have conceptual units.
• Behavior-based models go beyond ontologies and traditional approaches to knowledge representation.
• Based on software engineering. Full-fledged programming languages, specifically object-oriented programming languages, are useful for representing those descriptions. Here we explore how to use Java for modeling communities and events in communities.
  – Entities with state and behavior
  – Abstraction and instantiation
• Executable models which unfold as they are run.
• We need to be cautious about complex, large-scale models.
  – Some efforts at complex modeling, such as Cyc, have not fulfilled their promise.
  – Other efforts at large-scale indexing have done better.
    • UMLS (Unified Medical Language System)
  – We focus on a conservative approach to modeling.
    • We will not do much automatic inference.

Allen, R.B., Model-Oriented Information Organization: Part 1, The Entity-Event Fabric, D-Lib Magazine, July 2013.
Allen, R.B., Model-Oriented Information Organization: Part 2, Discourse Relationships, D-Lib Magazine, July 2013.
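The ideas of "entities with state and behavior", "abstraction and instantiation", and "executable models which unfold as they are run" can be sketched in a few lines of Java. This is only an illustrative sketch: the class names (Resident, CommunityEvent, MoveEvent, CommunityModel) and the example person and places are hypothetical and are not taken from the talk or from any existing library.

```java
import java.util.ArrayList;
import java.util.List;

// Abstraction: the class describes a kind of entity; each object is one instance.
class Resident {
    final String name;
    String location; // state that events may change

    Resident(String name, String location) {
        this.name = name;
        this.location = location;
    }

    // Behavior: the entity updates its own state.
    void moveTo(String newLocation) {
        this.location = newLocation;
    }
}

// An event is a unit of change that can be applied to the model.
interface CommunityEvent {
    void apply();
    String describe();
}

// Hypothetical event type: a resident changes location.
class MoveEvent implements CommunityEvent {
    private final Resident who;
    private final String destination;

    MoveEvent(Resident who, String destination) {
        this.who = who;
        this.destination = destination;
    }

    public void apply() {
        who.moveTo(destination);
    }

    public String describe() {
        return who.name + " moves to " + destination;
    }
}

// The model "unfolds" by applying its scheduled events in order.
class CommunityModel {
    private final List<CommunityEvent> timeline = new ArrayList<>();

    void schedule(CommunityEvent event) {
        timeline.add(event);
    }

    // Applies every event once, in order, and returns a readable log.
    List<String> run() {
        List<String> log = new ArrayList<>();
        for (CommunityEvent event : timeline) {
            event.apply();
            log.add(event.describe());
        }
        return log;
    }
}
```

For example, scheduling new MoveEvent(new Resident("Anna", "farm"), "town") and calling run() changes that resident's location to "town" and returns a one-line log. No inference is performed beyond applying the events, in keeping with the conservative approach described above.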

Modeling Text Descriptions with FrameNet

• We have many rich text descriptions from cultural materials. Could we use them? After all, the text descriptions are representations.
• One approach to modeling.
• FrameNet ( https://framenet.icsi.berkeley.edu/fndrupal/ )
  – Essential concepts in natural language are described with frames, which connect semantic roles.
  – Based on cognitive principles, but we can use it as a language resource for our modeling.
  – We are particularly interested in verb frames because they describe transitions in attributes.
    • About 700 verb frames.
• Frame: “Releasing”
  A Captor ends the captivity or inhibition of the motion of a Theme from the Location_of_confinement. The release is in accord with the plans of the Captor.
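A frame with its connected semantic roles can be held as a small data structure before any behavior is attached. The sketch below is an assumption for illustration only: FrameInstance, bind, filler, and isComplete are hypothetical names, not part of FrameNet or of any FrameNet software. Only the frame name and the role names (Captor, Theme, Location_of_confinement) come from the frame definition above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// One use of a frame: the frame name plus the fillers bound to its roles.
class FrameInstance {
    final String frameName;
    private final Map<String, String> roles = new LinkedHashMap<>();

    FrameInstance(String frameName) {
        this.frameName = frameName;
    }

    // Binds a filler to a semantic role; returns this so calls can be chained.
    FrameInstance bind(String role, String filler) {
        roles.put(role, filler);
        return this;
    }

    // Returns the filler for a role, or null if the role is unbound.
    String filler(String role) {
        return roles.get(role);
    }

    // True only if every listed role has a filler.
    boolean isComplete(String... requiredRoles) {
        for (String role : requiredRoles) {
            if (!roles.containsKey(role)) {
                return false;
            }
        }
        return true;
    }
}
```

A text that mentions a release without saying where the captives were held would yield an instance with Captor and Theme bound but Location_of_confinement unbound, which isComplete makes explicit rather than guessing a value.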

Modeling Text Descriptions (continued)

• Limitations of frames
  – Not always a perfect match
• Other types of supplemental knowledge
  – Conceptual relationships
    • Some of these from FrameNet
    • Classification (inheritance) hierarchies: grouping like objects
    • Partonomies: hierarchical parts of a system
  – World knowledge from many sources
    • Newspapers, Census, Books, Diaries
• Separate the entity-event fabric from discourse.
• Can we simplify the frame-based models with models of the underlying mechanisms?
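The two conceptual relationships named above map onto different Java constructs: a classification hierarchy onto class inheritance, and a partonomy onto objects that contain other objects. The following is a rough sketch under that assumption. Business, Bakery, and Part are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Classification (inheritance): a Bakery is a kind of Business.
class Business {
    final String name;

    Business(String name) {
        this.name = name;
    }
}

class Bakery extends Business {
    Bakery(String name) {
        super(name);
    }
}

// Partonomy: a part may itself be made of smaller parts.
class Part {
    final String name;
    private final List<Part> parts = new ArrayList<>();

    Part(String name) {
        this.name = name;
    }

    // Adds a direct sub-part; returns this so calls can be chained.
    Part add(Part child) {
        parts.add(child);
        return this;
    }

    // True if a part with the given name occurs anywhere below this one.
    boolean contains(String target) {
        for (Part p : parts) {
            if (p.name.equals(target) || p.contains(target)) {
                return true;
            }
        }
        return false;
    }
}
```

Keeping the two relationships separate avoids a common confusion: a bakery building is part of a street, but it is not a kind of street.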

Code Fragment for Verb Frame “Release” as a Java Class

// Minimal Person type so that the fragment compiles on its own.
class Person { boolean isPrisoner; }

class V_Release {
    // A Captor ends the captivity or inhibition of the motion
    // of a Theme from the Location_of_confinement. The
    // release is in accord with the plans of the Captor.

    // Too simplistic: an explicit state of confinement would be better
    // than a boolean flag, and the captive could also be a group.
    public V_Release(Person captor, Person captive) {
        captive.isPrisoner = false;
    }
}

Example Text

• We used textbook or Wikipedia-level texts
  – These are relatively straightforward, with simple past tense
  – By comparison, primary sources have many difficulties. They are full of slang, complex constructions, ungrammatical passages, and often incorrect statements.
  – Some massaging is still required
• Early history (1750-1820) of Minneapolis, Minnesota from Wikipedia

French explorer Daniel Greysolon, Sieur du Lhut explored the Minnesota area in 1680 on a mission to extend French dominance over the area. While exploring the St. Croix River area, he got word that some other explorers had been held captive. He arranged for their release. 
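The last two sentences of the passage can be read against the "Releasing" frame. The sketch below is one possible encoding, not the code used in the talk: it assumes an explicit confinement state and allows the released party to be a group. Confinement, Party, and ReleaseEvent are hypothetical names. The passage does not name the captors or say how many explorers were held, so those details are left generic.

```java
// Explicit state of confinement instead of a boolean flag.
enum Confinement { CONFINED, FREE }

// A party may be a single person or a group of people.
class Party {
    final String name;
    final boolean isGroup;
    Confinement state;

    Party(String name, boolean isGroup, Confinement state) {
        this.name = name;
        this.isGroup = isGroup;
        this.state = state;
    }
}

// The "Releasing" frame as an event that changes the Theme's state.
class ReleaseEvent {
    final Party captor;
    final Party theme;

    ReleaseEvent(Party captor, Party theme) {
        this.captor = captor;
        this.theme = theme;
    }

    // A release only makes sense if the Theme is currently confined.
    void apply() {
        if (theme.state != Confinement.CONFINED) {
            throw new IllegalStateException(theme.name + " is not confined");
        }
        theme.state = Confinement.FREE;
    }
}
```

Note that the passage says du Lhut "arranged for" the release, so he is not the Captor in the frame. Representing that arranging role is one of the places where the frame is not a perfect match for the text.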

Representing Processes (Flows)

• More than individual events. Composite structures.
  – Extending verbs to be processes.
    • Baking as a narrow process of applying heat versus a complex activity of using a recipe.
  – Abstractions
• Typology of processes
  – Deterministic sequences of events, scripts
  – Non-deterministic
    • Error conditions and work-arounds
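A script with error conditions and work-arounds could be sketched as an ordered list of steps, each of which may fail and may carry an alternative. This is a simplified illustration with hypothetical names (ScriptStep, Script, then, run). Whether a step succeeds is passed in as a fixed flag here, which is an assumption made to keep the example deterministic and testable.

```java
import java.util.ArrayList;
import java.util.List;

// One step of a process, with an optional work-around if it fails.
class ScriptStep {
    final String name;
    final boolean succeeds;
    final String workAround; // null when no work-around is known

    ScriptStep(String name, boolean succeeds, String workAround) {
        this.name = name;
        this.succeeds = succeeds;
        this.workAround = workAround;
    }
}

// A script: a sequence of steps that is run in order.
class Script {
    private final List<ScriptStep> steps = new ArrayList<>();

    // Appends a step; returns this so calls can be chained.
    Script then(String name, boolean succeeds, String workAround) {
        steps.add(new ScriptStep(name, succeeds, workAround));
        return this;
    }

    // Runs the steps in order and returns what actually happened.
    List<String> run() {
        List<String> trace = new ArrayList<>();
        for (ScriptStep step : steps) {
            if (step.succeeds) {
                trace.add(step.name);
            } else if (step.workAround != null) {
                trace.add(step.workAround); // error condition handled
            } else {
                trace.add("abandoned at: " + step.name);
                break; // no work-around, so the process stops here
            }
        }
        return trace;
    }
}
```

With every step succeeding, the script is a plain deterministic sequence. A failing step with a work-around yields an alternative path, and a failing step without one ends the process early.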

Representing Information Resources in the Community

• Representing the context in which information resources are generated and accessed.

• We need to couple the model of the information resource with flexible models of the community and of the reader.

• This is somewhat analogous to the insight about the records continuum model from archives.

Challenges for Coding Natural Language

• Processes
• Information resources
• Discourse versus events: Narrative, explanation, argumentation
• Future events, goals
• Inexact descriptions (“some”, “sometimes”)
• Rich representations about people
  – Representations for mental events
• Culture
  – Abstraction

Status

– We have shown first steps toward developing community models.

– More gaps than known events. Need to develop frameworks for adding constraints to high-level descriptions.

– FrameNet frames generally work well, but they need to be extended and there are difficulties with some constructions.

– Many composites in natural language, such as “bake-baker-baking-bakery”. How can these be represented generally, given nuances such as baking at home versus baking in a bakery?

Future Directions

• Extending community models
  – Multi-family genealogies
  – Modeling cities
  – Linking community and city models together (national models)
  – Wire in additional procedures (e.g., laws)
• Support user interaction
  – Better support for discourse such as argumentation by authors
  – Better support for authoring model-oriented descriptions
  – Interactive interfaces for working with community histories
  – Support for a scholarly workbench
  – Tutorial-like descriptions of histories
  – Interactive historical re-enactors, games, and cyber-dramas
• Broader effort to develop model-oriented information organization
  – Application of model-oriented information organization to museum objects and informatics
  – Relationship to cognitive modeling
  – Frames as a protocol for agents in multi-agent systems
  – Standards

Strategies for Model-Oriented Information Organization

Robert B. ALLEN  ロバート アレン
Research Center for Knowledge Communities
University of Tsukuba
Tsukuba, Japan

As of March 1, 2014:
Department of Library and Information Science
Yonsei University, Seoul, Korea

For more information see: http://boballen.info/
Contact: [email protected]