Inconsistencies of Connection for Heterogeneity and a New Relation Discovery Method that Solved Them
Takafumi NAKANISHI, Kiyotaka UCHIMOTO, Yutaka KIDAWARA
National Institute of Information and Communications Technology (NICT), Japan


Post on 16-Jan-2015

Page 1: Inconsistencies of Connection for Heterogeneity and a New Relation Discovery Method

Inconsistencies of Connection for Heterogeneity and a New Relation Discovery Method that Solved Them

Takafumi NAKANISHI, Kiyotaka UCHIMOTO, Yutaka KIDAWARA

National Institute of Information and Communications Technology (NICT), Japan

Page 2

What's Big Data?

•  Speeding up? Processing a lot of data?
   – What are the differences between a VLDB (Very Large Database) and Big Data?

•  Fragmentary data exist
   – Until now, scientists have worked with such data for simulation.

•  Heterogeneous database integration (cross-database search)
   – Still under consideration?

Page 3

Purposes of this presentation

•  We should consider the paradigm shift in computer science.
   – From the closed assumption to the opened assumption
   – What problems are there?
      •  Businesspeople require not only an EDW (Enterprise Data Warehouse) but also other analysis methods.
      •  Discovering relations between heterogeneous concepts, datasets, etc.
      •  Three Opened Assumption's Evils

Page 4

True Problem Definitions of Big Data

[Diagram: elements of big data research, with the following labels]

•  Speeding up, promotion of streamlining, and increasing data volume for processing
•  Distributed parallel processing, high-performance computing (HPC), network delay, etc.
•  Schemaless data and new data processing methods
•  Relation discovery in heterogeneity
•  Construction of the big data environment (hardware and middleware research)
•  Big data analytics (software research)
•  Closed Assumption System → Open Assumption System

Page 5

[Figure: graph of researchers a1 to a9 (AI Community) and b1 to b10 (DB Community); someone adds relationships between a3 and b4]

Relationships among persons in communities AI and DB. ai and bj are researchers. When someone adds symmetric and transitive relationships between a3 and b4, it is true that a1 is related to b5, because a1 is related to a3, a3 is related to b4, and b4 is related to b5.

Page 6

[Figure: graph of co-workers a1 to a9 (Office Community) and musicians b1 to b10 (Music Community); someone adds relationships between a3 and b4]

Relationships among persons in workplace and music communities. ai are co-workers, and bj are musicians. When someone adds symmetric and transitive relationships between a3 and b4, it is actually not true that a1 is related to b5. In the graph structure, it is true that a1 is related to b5. Realistically, however, a1 and b5 share no common ground without other definitions or analysis.
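The point of the two figures can be sketched in a few lines of Python. This is only an illustration: the function names `build_graph` and `related` are hypothetical, and the edge list contains only the three relations named in the caption (a1 and a3, a3 and b4, b4 and b5), not the full figure. The sketch shows that plain graph reachability reports a1 and b5 as related as soon as the a3 and b4 link is added, regardless of whether the two communities share any meaning.

```python
from collections import defaultdict, deque

def build_graph(edges):
    """Build an undirected adjacency map, so every relation is symmetric."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph

def related(graph, start, goal):
    """Return True if goal is reachable from start (transitivity via BFS)."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return False

# Only the relations named in the caption; the rest of the figure is omitted.
within_communities = [("a1", "a3"), ("b4", "b5")]

before = build_graph(within_communities)
after = build_graph(within_communities + [("a3", "b4")])  # someone links a3 and b4

print(related(before, "a1", "b5"))  # False
print(related(after, "a1", "b5"))   # True
```

The graph gives the same answer for the AI/DB case and the office/music case, which is exactly why the structure alone cannot tell whether the derived relation is meaningful.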

Page 7

Difference of two examples

•  “AI Community” ∩ “DB Community” ≠ ∅ → Closed assumption
   – Relations can be represented by previous methods such as OWL, RDF, etc.

•  “Office Community” ∩ “Music Community” = ∅ → Opened assumption
   – Relations cannot be represented by the previous methods

Page 8

Proof of inconsistency of order relation between two certain sets [1/2]

•  A = {a1, a2, …, an}, B = {b1, b2, …, bm}
•  A ∩ B = ∅.
•  Sets A and B may each define their order relations differently.
•  We prove that, when we are given a relationship f between a1 ∈ A and b1 ∈ B (b1 = f(a1)), we cannot discover the relationship between sets A and B or any other relationships.

Page 9

Proof of inconsistency of order relation between two certain sets [2/2]

•  Following the steps of induction, we show that bi = f(ai) cannot be established for all i.
   – When i = 1, b1 = f(a1) is true by the above condition.
   – We assume that bk = f(ak) is true when i = k.
   – When i = k + 1, bk+1 = f(ak+1) is not guaranteed to be true.

•  Set A has an order relation. Set B has another order relation.
   – bk ≤ bk+1 may not be true even if ak ≤ ak+1 is true, and vice versa. Furthermore, it may be that neither ak ≤ ak+1 nor bk ≤ bk+1 is true.

•  Although b1 = f(a1) is true, bi = f(ai) is not true in general.

Page 10

Proof of inconsistency of the transitive relation between two certain sets [1/2]

•  A = {a1, a2, …, an}, B = {b1, b2, …, bm}
•  A ∩ B = ∅.
•  Set B has the order relation b1 ≤ b2 ≤ b3 ≤ b4 …
   – Transitive relation
   – If b1 ≤ b2 and b2 ≤ b3 are true, b1 ≤ b3 is true

•  Set A has its own order relation.

Page 11

Proof of inconsistency of the transitive relation between two certain sets [2/2]

•  Assume a1 = (1, 5), b1 = (2, 1), b2 = (3, 2), b3 = (4, 3).
•  We examine whether a1 ≤ b3 is true when we are given the relation a1 ≤ b1.
•  To state the conclusion first, a1 ≤ b3 may not hold.

•  The relationship between a1 and b1 compares the first elements.
•  Then a1 ≤ b1 is true (1 ≤ 2).
•  The order relation of set B compares the values of the second elements.
•  Then b1 ≤ b2 ≤ b3 (1 ≤ 2 ≤ 3), and since b1 ≤ b2 and b2 ≤ b3 are true, b1 ≤ b3 is true.

•  However, a1 ≤ b3 is not true under the order relation of set B (5 > 3).
•  As with the relation between a1 and b1, an inconsistency occurs in which the order and transitive relations of set B are not guaranteed.
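The counterexample on this slide can be checked mechanically. The following is a minimal sketch using the four tuples from the slide; the helper names `leq_link` and `leq_b` are hypothetical and simply encode "compare first elements" and "compare second elements".

```python
# Values taken from the slide.
a1 = (1, 5)
b1, b2, b3 = (2, 1), (3, 2), (4, 3)

def leq_link(x, y):
    """Order used for the cross-set link a1 <= b1: compares first elements."""
    return x[0] <= y[0]

def leq_b(x, y):
    """Order relation of set B: compares second elements."""
    return x[1] <= y[1]

link_holds = leq_link(a1, b1)                     # 1 <= 2
b_chain_holds = leq_b(b1, b2) and leq_b(b2, b3)   # 1 <= 2 and 2 <= 3
b_transitive = leq_b(b1, b3)                      # 1 <= 3
chained = leq_b(a1, b3)                           # 5 <= 3

print(link_holds, b_chain_holds, b_transitive, chained)  # True True True False
```

The link a1 ≤ b1 and the chain inside B both hold, yet carrying a1 into B's own order fails, because the two comparisons were never defined on the same basis.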

Page 12

Inconsistencies – Three Opened Assumption's Evils

•  Inconsistency in which a relation is not guaranteed to hold in the future

•  Inconsistency in which transitive relations do not hold when someone connects links across heterogeneous fields

•  Inconsistency in which relations across heterogeneous fields cannot be discovered in set theory

Page 13

Misconception of Future Information Systems

•  Users do not want to retrieve data; they need solutions
   – A system finds clues for a user from data by relative comparison
   – It is important to compare data relatively.

•  We can no longer write out relationships
   – They change dynamically depending on the user, situation, etc.
   – When data change, relationships change

•  We cannot create indexes.

•  We cannot discover relationships without writing them
   – However, a system can compare data on a basis.

Page 14

Functional Predicate

Set Theory
•  commutative property
•  associative property
•  distributive property
•  reflexive relation
•  antisymmetric relation
•  transitive relation

Coordinates System
•  axis adaptability evaluation
•  uniqueness evaluation
•  certainty evaluation
•  predicate satisfaction evaluation

Incomplete Mutual Map Transformation Framework between set theory and the Cartesian system of coordinates.

Mutual mapping by mathematical rules, formulas, etc. (because mathematical rules and formulas hold under the closed assumption)

Page 15

Overview of our method

Sampling Data
•  A query is given by a user
•  The data set is sampled depending on the query

Selection of Basis
•  The system selects a basis for solving the query
•  Order relationships? Continuous or equal-interval sampling?

Mapping from set theory to the Cartesian system of coordinates
•  Mathematical rule/formula → closed assumption
•  Transformation operations under the closed assumption are created manually.

Discovery of relationships on the Cartesian system of coordinates
•  Predefinition of functional predicates
•  Satisfying each functional predicate

Re-mapping to set theory
•  Representation of predicates as predicate functions
•  Representation of reasons in terms of the basis

Page 16

Example: Creation of Functional Predicate – dependOn

•  “dependOn” means that set A relies on set X.
   – The value of element ai of set A should change only with the variation of the value of element xj of set X.

•  “dependOn” is represented as {A}(X) when set A depends on set X.

Page 17

Example Dataset

MONTHLY AVERAGE TEMPERATURE IN GUNMA PREFECTURE, JAPAN

Year  Jan.  Feb.  Mar.  Apr.   May  Jun.  Jul.  Aug.  Sep.  Oct.  Nov.  Dec.  Ave.
2007   4.9   6.1   8.2  12.3  18.7  22.7  23.5    28  24.1  17.1    11   6.5  15.3
2008   3.6   2.9     9  13.6    18  21.1  26.3  25.8  22.9  17.6  10.7   6.9  14.9
2009   4.3   5.5   7.6  14.1  19.4  22.2  25.4  25.8    22  16.8  11.4   6.7  15.1
2010   4.3   4.8   7.2  11.2  18.1  23.5    27    29  24.2  17.7  11.2   7.2  15.5
2011   2.4   4.9   6.1  12.6  17.8  22.9  27.1  26.6  23.9  17.1  12.3   4.8  14.9

ANNUAL AVERAGE PRICE (yen) OF CUCUMBERS (5 kg) AND CABBAGE (10 kg)

Year  Cucumber  Cabbage
2007      1168      604
2008      1226      594
2009      1102      662
2010      1231      739
2011      1179      573

Page 18

Result

Cucumber
         Jan      Feb      Mar      Apr      May      Jun      Jul      Aug      Sep      Oct      Nov      Dec
AAE    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000
UE     0.031    0.394    0.028    0.345    0.707    0.002    0.207    0.188    0.355    0.924    0.090    0.043
CE     1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000
BV    -9.590  -27.269    8.075  -27.022  -67.471    2.254   16.039   16.006   32.882  132.937  -25.899   11.466

Cabbage
         Jan      Feb      Mar      Apr      May      Jun      Jul      Aug      Sep      Oct      Nov      Dec
AAE    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000
UE     0.243    0.024    0.007    0.199    0.052    0.255    0.045    0.330    0.003    0.114    0.048    0.436
CE     1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000
BV    34.617    8.705   -5.190  -26.357   23.588   37.635    9.576   27.231    3.696   59.930  -24.346   47.057

•  AAE: axis adaptability evaluation
•  UE: uniqueness evaluation
•  CE: certainty evaluation
•  BV: predicate satisfaction evaluation

Discovered dependOn Relations
•  {Cucumber Price}(May temperature)
•  {Cucumber Price}(Oct temperature)
•  {Cabbage Price}(Dec temperature)
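The slides do not define how UE and BV are computed. As a purely illustrative cross-check, the sketch below fits an ordinary least-squares line of annual price against monthly temperature over the five years 2007 to 2011. This is a swapped-in standard technique, assumed here for illustration and not the authors' stated method; the function name `ols_fit` and the dictionary layout are hypothetical. The data are copied from the example dataset for the three month and crop pairs listed as discovered relations.

```python
# Monthly average temperatures, 2007-2011, for the three months in the discovered relations.
temperature = {
    "May": [18.7, 18.0, 19.4, 18.1, 17.8],
    "Oct": [17.1, 17.6, 16.8, 17.7, 17.1],
    "Dec": [6.5, 6.9, 6.7, 7.2, 4.8],
}
# Annual average prices in yen, 2007-2011.
price = {
    "cucumber": [1168, 1226, 1102, 1231, 1179],
    "cabbage": [604, 594, 662, 739, 573],
}

def ols_fit(x, y):
    """Return (slope, r_squared) of the least-squares line y = a + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx, sxy ** 2 / (sxx * syy)

for crop, month in [("cucumber", "May"), ("cucumber", "Oct"), ("cabbage", "Dec")]:
    slope, r2 = ols_fit(temperature[month], price[crop])
    print(f"{crop:8s} {month}: slope={slope:.3f}  R^2={r2:.3f}")
# cucumber May: slope=-67.471  R^2=0.707
# cucumber Oct: slope=132.937  R^2=0.924
# cabbage  Dec: slope=47.057  R^2=0.436
```

For these three pairs the slope and R² coincide to three decimals with the BV and UE rows of the result table. Whether the authors actually computed the evaluations this way is not stated on the slides, so the agreement should be read as an observation rather than a definition.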

Page 19

Conclusion

•  Three opened assumption evils
   – We presented the inconsistencies in past research that contributed to the interconnection of heterogeneous fields, such as Linked Data, and in our own past research.

•  Map transformation framework from set theory to the Cartesian system of coordinates
   – Defining predicate functions such as disjoint, meet, overlap, coveredBy, covers, equal, contain, inside, correlate, moreThan, lessThan, alongWith, join, etc.

•  A preliminary evaluation of the predicate function “dependOn”

Page 20

Thank you