Inconsistencies of Connection for Heterogeneity and a New Relation Discovery Method
Post on 16-Jan-2015
Inconsistencies of Connection for Heterogeneity and
a New Relation Discovery Method that Solved Them
Takafumi NAKANISHI, Kiyotaka UCHIMOTO, Yutaka KIDAWARA
National Institute of Information and Communications
Technology (NICT), Japan
What’s Big Data?
• Speed up? Processing a lot of data?
  – What differences are there between VLDB (Very Large Database) and Big Data?
• Fragmentary data exist
  – Until now, scientists have worked with such data for simulation.
• Heterogeneous Database Integration (cross-database search)
  – Still considering?
Purposes of this presentation
• We should consider the paradigm shift in computer science.
  – From the closed assumption to the opened assumption
  – What problems are there?
• Businesspeople require not only EDW (Enterprise Data Warehouse) but also other analysis methods.
• Discovering relations between heterogeneous concepts, datasets, etc.
• Three Opened Assumption’s Evils
True Problem Definitions of Big Data
[Figure: two sides of big data research]
• Construction of big data environment (hardware and middleware research)
  – Speeding up, promotion of streamlining, and increasing data volume for processing
  – Distributed parallel processing, High Performance Computing (HPC), network delay, etc.
• Big data analytics (software research)
  – Schemaless data and new data processing methods
  – Relation discovery in heterogeneity
Closed Assumption System → Open Assumption System
[Figure: AI Community (researchers a1 to a9) and DB Community (researchers b1 to b10). Someone adds a relationship between a3 and b4.]
Relationships among persons in communities AI and DB. ai and bj are researchers. When someone adds a symmetric and transitive relationship between a3 and b4, it is true that a1 is related to b5, because a1 is related to a3, a3 is related to b4, and b4 is related to b5.
[Figure: Office Community (co-workers a1 to a9) and Music Community (musicians b1 to b10). Someone adds a relationship between a3 and b4.]
Relationships among persons in workplace and music communities. ai are co-workers, and bj are musicians. When someone adds a symmetric and transitive relationship between a3 and b4, it is not actually true that a1 is related to b5. In the graph structure, a1 is related to b5. Realistically, however, a1 and b5 share no common ground unless other definitions or analysis are supplied.
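The two community examples can be sketched as a plain reachability check. This is a minimal illustration, not the authors' method: the helper `related` and the edge lists are hypothetical, and only the three links named in the captions (a1 to a3, a3 to b4, b4 to b5) are modeled, since the remaining edges of the figures are not recoverable from the transcript.

```python
from collections import deque

def related(edges, start, goal):
    """Return True if goal is reachable from start over undirected edges."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Links inside each community, as named in the captions (assumed subset).
within = [("a1", "a3"), ("b4", "b5")]
# The link that "someone adds" between the two communities.
bridge = ("a3", "b4")

print(related(within, "a1", "b5"))             # False
print(related(within + [bridge], "a1", "b5"))  # True
```

The graph gives the same answer for the AI/DB pair and for the Office/Music pair, because it holds no information about whether the two communities share any ground. That missing information is exactly what separates the two examples.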
Difference between the two examples
• “AI Community” ∩ “DB Community” ≠ ∅ → Closed Assumption
  – Relations can be represented by previous methods such as OWL, RDF, etc.
• “Office Community” ∩ “Music Community” = ∅ → Opened Assumption
  – Relations cannot be represented by the previous methods
Proof of inconsistency of order relation between two certain sets [1/2]
• A = {a1, a2, … , an}, B = {b1, b2, …, bm}
• A ∩ B = ∅.
• Sets A and B may each define their order relations differently.
• We prove that, given only a relationship f between a1 ∈ A and b1 ∈ B (b1 = f(a1)), we cannot discover the relationship between sets A and B, or any other relationships.
Proof of inconsistency of order relation between two certain sets [2/2]
• We try to prove bi = f(ai) by induction and show that the induction step fails.
  – When i = 1, b1 = f(a1) is true by the above condition.
  – Assume that bk = f(ak) is true when i = k.
  – When i = k + 1, bk+1 = f(ak+1) is not necessarily true.
• Set A has an order relation. Set B has another order relation.
  – Even if ak ≤ ak+1 is true, bk ≤ bk+1 may not be true, and vice versa. Furthermore, neither ak ≤ ak+1 nor bk ≤ bk+1 may be true.
• Although b1 = f(a1) is true, bi = f(ai) is not true in general.
Proof of inconsistency of the transitive relation between two certain sets [1/2]
• A = {a1, a2, … , an}, B = {b1, b2, …, bm}
• A ∩ B = ∅.
• Set B has order relation b1 ≤ b2 ≤ b3 ≤ b4…
  – Transitive relation
  – If b1 ≤ b2 and b2 ≤ b3 are true, b1 ≤ b3 is true
• Set A has its own order relation.
Proof of inconsistency of the transitive relation between two certain sets [2/2]
• Assume a1 = (1, 5), b1 = (2, 1), b2 = (3, 2), b3 = (4, 3).
• We examine whether a1 ≤ b3 follows when we are given the relation a1 ≤ b1.
• To state the conclusion first: a1 ≤ b3 may not be satisfied.
• The relationship between a1 and b1 compares the first elements (1 ≤ 2), so a1 ≤ b1 is true.
• The order relation of set B compares the second elements (1 ≤ 2 ≤ 3), so b1 ≤ b2 ≤ b3; and since b1 ≤ b2 and b2 ≤ b3 are true, b1 ≤ b3 is true.
• However, a1 ≤ b3 is not true in the order of set B (5 ≤ 3 is false).
• Thus, with a relation like that between a1 and b1, an inconsistency occurs: the order and transitive relations of set B are no longer guaranteed.
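The counterexample can be checked directly. The sketch below uses the four tuples from the slide; the function names `le_link` and `le_b` are hypothetical labels for the two comparisons (first elements for the cross-set link, second elements for the order of set B).

```python
# Values taken from the slide.
a1, b1, b2, b3 = (1, 5), (2, 1), (3, 2), (4, 3)

def le_link(x, y):
    """Cross-set relation between A and B: compares the first elements."""
    return x[0] <= y[0]

def le_b(x, y):
    """Order relation of set B: compares the second elements."""
    return x[1] <= y[1]

print(le_link(a1, b1))                # True:  1 <= 2
print(le_b(b1, b2) and le_b(b2, b3))  # True:  1 <= 2 and 2 <= 3
print(le_b(b1, b3))                   # True:  transitivity holds inside B
print(le_b(a1, b3))                   # False: 5 <= 3 fails
```

Chaining a1 ≤ b1 with b1 ≤ b3 mixes two different comparisons, so the chain does not carry over to a1 and b3.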
Inconsistencies – Three Opened Assumption’s Evils
• Inconsistency in which a relation is not guaranteed to hold in the future
• Inconsistency in which a transitive relation is not guaranteed to hold when anyone connects links across heterogeneous fields
• Inconsistency in which relations in heterogeneous fields cannot be discovered in set theory
Misconception of Future Information Systems
• A user does not want to retrieve data; a user needs solutions
  – A system derives clues for a user from data by relative comparison
  – It is important to compare data relatively.
• We can no longer write relationships
  – They change dynamically depending on user, situation, etc.
  – When data change, relationships change
• We cannot create indexes.
• We cannot discover without writing relationships
  – However, a system can compare data on a basis.
Incomplete Mutual Map Transformation Framework between set theory and the Cartesian system of coordinates
[Figure labels: Functional Predicate, Set Theory, Coordinates System]
Set Theory
• commutative property
• associative property
• distributive property
• reflexive relation
• antisymmetric relation
• transitive relation
Coordinates System
• axis adaptability evaluation
• uniqueness evaluation
• certainty evaluation
• predicate satisfaction evaluation
Mutual mapping by mathematical rule, formula, etc. (because mathematical rules and formulas hold under the closed assumption)
Overview of our method
1. Sampling Data
• A query is given by a user
• Sampling of the data set depends on the query
2. Selection of Basis
• A system selects a basis for solving the query
• Order relationships? Continuous or equal-interval sampling?
3. Mapping from set theory to the Cartesian system of coordinates
• Mathematical rule/formula → closed assumption
• Transformation operations are created manually under the closed assumption.
4. Discovery of relationships on the Cartesian system of coordinates
• Predefinition of functional predicates
• Satisfying each functional predicate
5. Re-mapping to set theory
• Representation of predicates in predicate functions
• Representation of reasons in basis
Example: Creation of Functional Predicate – dependOn
• “dependOn” means that set A relies on set X.
  – The value of element ai of set A should change only with the variation of the value of element xj of set X.
• “dependOn” is represented as {A}(X) when set A depends on set X.
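One strict reading of this definition is a functional dependency: whenever the same value of X is observed, the same value of A must be observed. The sketch below encodes only that reading. The function name `depends_on` and the sample values are hypothetical, and the slides themselves score the predicate with graded evaluations rather than a yes/no test.

```python
def depends_on(a_values, x_values):
    """Strict check of {A}(X): equal x values must always pair with equal a values."""
    if len(a_values) != len(x_values):
        raise ValueError("a_values and x_values must be paired observations")
    seen = {}
    for a, x in zip(a_values, x_values):
        if x in seen and seen[x] != a:
            return False  # A changed although X did not
        seen[x] = a
    return True

# Made-up observations for illustration only.
print(depends_on([10, 20, 10], [1, 2, 1]))  # True:  A changes only when X changes
print(depends_on([10, 20, 30], [1, 2, 1]))  # False: A differs for the same X
```

With measured data such as prices and temperatures, exact repeats are rare and noise is unavoidable, which is one reason a graded score is more practical than this exact test.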
Example Dataset
MONTHLY AVERAGE TEMPERATURE IN GUMMA PREFECTURE, JAPAN
Year  Jan.  Feb.  Mar.  Apr.  May.  Jun.  Jul.  Aug.  Sep.  Oct.  Nov.  Dec.  Ave.
2007   4.9   6.1   8.2  12.3  18.7  22.7  23.5  28    24.1  17.1  11     6.5  15.3
2008   3.6   2.9   9    13.6  18    21.1  26.3  25.8  22.9  17.6  10.7   6.9  14.9
2009   4.3   5.5   7.6  14.1  19.4  22.2  25.4  25.8  22    16.8  11.4   6.7  15.1
2010   4.3   4.8   7.2  11.2  18.1  23.5  27    29    24.2  17.7  11.2   7.2  15.5
2011   2.4   4.9   6.1  12.6  17.8  22.9  27.1  26.6  23.9  17.1  12.3   4.8  14.9

ANNUAL AVERAGE PRICE (Yen) OF CUCUMBERS (5kg) AND CABBAGE (10kg)
Year  cucumber  cabbage
2007      1168      604
2008      1226      594
2009      1102      662
2010      1231      739
2011      1179      573
Result
         Jan      Feb      Mar      Apr      May      Jun      Jul      Aug      Sep      Oct      Nov      Dec
Cucumber
AAE    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000
UE     0.031    0.394    0.028    0.345    0.707    0.002    0.207    0.188    0.355    0.924    0.090    0.043
CE     1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000
BV    -9.590  -27.269    8.075  -27.022  -67.471    2.254   16.039   16.006   32.882  132.937  -25.899   11.466
Cabbage
AAE    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000
UE     0.243    0.024    0.007    0.199    0.052    0.255    0.045    0.330    0.003    0.114    0.048    0.436
CE     1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000
BV    34.617    8.705   -5.190  -26.357   23.588   37.635    9.576   27.231    3.696   59.930  -24.346   47.057
• AAE: axis adaptability evaluation
• UE: uniqueness evaluation
• CE: certainty evaluation
• BV: predicate satisfaction evaluation
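The slides do not define how UE and BV are computed. As a hedged sketch, the code below fits an ordinary least-squares line of annual price on a single month's temperature, using the example dataset. For the two cells checked here, the slope and the coefficient of determination agree with the BV and UE entries to three decimals. Reading BV as a regression slope and UE as R² is therefore an assumption drawn from that agreement, not something the source states. The function name `fit` and the data layout are hypothetical.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Monthly average temperature per year, copied from the example dataset.
TEMPERATURE = {
    2007: [4.9, 6.1, 8.2, 12.3, 18.7, 22.7, 23.5, 28.0, 24.1, 17.1, 11.0, 6.5],
    2008: [3.6, 2.9, 9.0, 13.6, 18.0, 21.1, 26.3, 25.8, 22.9, 17.6, 10.7, 6.9],
    2009: [4.3, 5.5, 7.6, 14.1, 19.4, 22.2, 25.4, 25.8, 22.0, 16.8, 11.4, 6.7],
    2010: [4.3, 4.8, 7.2, 11.2, 18.1, 23.5, 27.0, 29.0, 24.2, 17.7, 11.2, 7.2],
    2011: [2.4, 4.9, 6.1, 12.6, 17.8, 22.9, 27.1, 26.6, 23.9, 17.1, 12.3, 4.8],
}

# Annual average price in yen, copied from the example dataset.
PRICE = {
    "cucumber": {2007: 1168, 2008: 1226, 2009: 1102, 2010: 1231, 2011: 1179},
    "cabbage": {2007: 604, 2008: 594, 2009: 662, 2010: 739, 2011: 573},
}

def fit(crop, month):
    """Least-squares fit of price on one month's temperature.

    Returns (slope, r_squared) over the five years of data.
    """
    i = MONTHS.index(month)
    years = sorted(TEMPERATURE)
    xs = [TEMPERATURE[y][i] for y in years]
    ys = [PRICE[crop][y] for y in years]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sxx, sxy * sxy / (sxx * syy)

slope, r2 = fit("cucumber", "Oct")
print(f"{slope:.3f} {r2:.3f}")  # 132.937 0.924

slope, r2 = fit("cabbage", "Dec")
print(f"{slope:.3f} {r2:.3f}")  # 47.057 0.436
```

With only five yearly observations per fit, these values are illustrative rather than statistically robust.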
Discovered dependOn Relations
• {Cucumber Price}(May temperature)
• {Cucumber Price}(Oct temperature)
• {Cabbage Price}(Dec temperature)
Conclusion
• Three opened assumption evils
  – We presented the inconsistencies in past research that contributed to the interconnection of heterogeneous fields, such as Linked Data, and in our own past research.
• Map transformation framework from set theory to the Cartesian system of coordinates
  – Defining predicate functions such as disjoint, meet, overlap, coveredBy, covers, equal, contain, inside, correlate, moreThan, lessThan, alongWith, join, etc.
• A preliminary evaluation of the predicate function “dependOn”
Thank you