
Upload: trinhlien

Post on 01-May-2018


Page 1: CSC662 Data Mining, Data Warehouse and Visualization (pioneer.netserv.chula.ac.th/~skrung/csc662/04olap_4up.pdf)

Lecture 4: OLAP and Data Cube

Dr. Krung Sinapiromsaran

Department of Mathematics, Faculty of Science, Chulalongkorn University

CSC662 Data Mining, Data Warehouse and Visualization

© 2550 B.E. (2007) Krung Sinapiromsaran

Contents

• OLAP and data cube
• Types of OLAP
• OLAP cube
• Data mining and OLAP
• Operators used in OLAP
• OLAM architecture

[Figure: architecture diagram]


��������

• OLAP is the process of analysing and exploring data along each dimension in order to find patterns or relationships that appear in the data warehouse.

[Figure: data cube with axes Date, Product and Region; the extracted labels also include "gender"]

Region  Product            Date
East    800 MHz Computers  Mar-98
East    800 MHz Computers  Mar-98
North   CD Players         Mar-98
West    13" Televisions    Mar-98
East    13" Televisions    Mar-98
South   13" Televisions    Mar-98
North   13" Televisions    Mar-98
West    800 MHz Computers  Feb-98
South   800 MHz Computers  Feb-98
West    CD Players         Feb-98
East    CD Players         Feb-98
North   CD Players         Feb-98
West    13" Televisions    Feb-98
North   800 MHz Computers  Jan-98
West    CD Players         Jan-98
South   CD Players         Jan-98
South   13" Televisions    Jan-98
North   13" Televisions    Jan-98
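The fact rows in the table can be viewed along any subset of the three dimensions. The following is a minimal sketch, assuming the rows are held as plain tuples and that "count of rows" is the measure (the slide gives no numeric measure). The names `FACTS` and `count_by` are illustrative only.

```python
from collections import Counter

# The 18 fact rows from the table: (region, product, date).
FACTS = [
    ("East", "800 MHz Computers", "Mar-98"),
    ("East", "800 MHz Computers", "Mar-98"),
    ("North", "CD Players", "Mar-98"),
    ("West", '13" Televisions', "Mar-98"),
    ("East", '13" Televisions', "Mar-98"),
    ("South", '13" Televisions', "Mar-98"),
    ("North", '13" Televisions', "Mar-98"),
    ("West", "800 MHz Computers", "Feb-98"),
    ("South", "800 MHz Computers", "Feb-98"),
    ("West", "CD Players", "Feb-98"),
    ("East", "CD Players", "Feb-98"),
    ("North", "CD Players", "Feb-98"),
    ("West", '13" Televisions', "Feb-98"),
    ("North", "800 MHz Computers", "Jan-98"),
    ("West", "CD Players", "Jan-98"),
    ("South", "CD Players", "Jan-98"),
    ("South", '13" Televisions', "Jan-98"),
    ("North", '13" Televisions', "Jan-98"),
]


def count_by(facts, *positions):
    """Count fact rows grouped by the chosen column positions."""
    return Counter(tuple(row[p] for p in positions) for row in facts)


by_cell = count_by(FACTS, 0, 1, 2)  # one entry per non-empty cube cell
by_date = count_by(FACTS, 2)        # view along the Date dimension only
by_region = count_by(FACTS, 0)      # view along the Region dimension only
```

Each call to `count_by` corresponds to looking at the same facts from a different combination of dimensions, which is the idea behind the cube views discussed in the rest of the lecture.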

Types of server machine used with OLAP software

• Server with a single central processing unit
• Server with several identical central processing units (symmetric multiprocessor, SMP)
• Several servers working together in parallel (massively parallel processor, MPP)

Options for OLAP servers

• ROLAP (Relational On-Line Analytical Processing)
• MOLAP (Multidimensional On-Line Analytical Processing)
• HOLAP (Hybrid On-Line Analytical Processing)

Relational OLAP server (ROLAP)

• ROLAP stands for Relational OLAP.
• Uses a relational database management system to store and manage the data warehouse; back-end processing is done by issuing queries (query processing).
• Data access relies on the efficiency of the database management system, together with SQL (Structured Query Language) for searching and retrieving data.
• Can be used with very large data sets.
• The database management system provides a large number of supporting tools.

Multidimensional OLAP server (MOLAP)

• MOLAP = Multidimensional OLAP.
• Stores data in the form of a multidimensional cube.
• Uses sparse storage techniques, because much of the data is zero.
• Data is stored as a multidimensional array and accessed through numeric indexes.
• Accessing cells by index makes processing fast.
• Requires memory space in proportion to the size of the cube (the data that is not zero).
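To make the array-versus-sparse trade-off concrete, here is a small sketch of two ways a multidimensional cube could be stored. It is not how any particular MOLAP product works. The class names `DenseCube` and `SparseCube` are made up for illustration, and empty cells are assumed to read as zero.

```python
class DenseCube:
    """Every cell is allocated; a cell is located by arithmetic on its indexes."""

    def __init__(self, shape):
        self.shape = shape
        size = 1
        for n in shape:
            size *= n
        self.cells = [0] * size

    def _offset(self, idx):
        # Row-major offset: the last index varies fastest.
        off = 0
        for i, n in zip(idx, self.shape):
            if not 0 <= i < n:
                raise IndexError(idx)
            off = off * n + i
        return off

    def __getitem__(self, idx):
        return self.cells[self._offset(idx)]

    def __setitem__(self, idx, value):
        self.cells[self._offset(idx)] = value


class SparseCube:
    """Only non-zero cells are stored, keyed by their index tuple."""

    def __init__(self):
        self.cells = {}

    def __getitem__(self, idx):
        return self.cells.get(idx, 0)

    def __setitem__(self, idx, value):
        if value == 0:
            self.cells.pop(idx, None)  # never store zeros
        else:
            self.cells[idx] = value


dense = DenseCube((4, 3, 12))   # e.g. 4 regions x 3 products x 12 months
dense[1, 2, 3] = 7
sparse = SparseCube()
sparse[1, 2, 3] = 7
```

The dense version allocates 4 x 3 x 12 = 144 cells regardless of content and finds a cell by index arithmetic alone. The sparse version stores one entry per non-zero cell at the cost of a dictionary lookup.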

Hybrid and other OLAP servers

• HOLAP = Hybrid OLAP
  • A hybrid of ROLAP (used for the lowest, most detailed level) and MOLAP (used for the higher levels)
  • Involves use of memory space for the MDDB
  • Highly flexible in use
• Specialized SQL servers
  • SQL servers extended to support star and snowflake schemas

OLAP – Online Analytical Processing

• Concept:
  • Measure data is held in the cells of the cube
  • Adds analytical capability beyond what SQL offers
  • The principle of OLAP is to examine data along each dimension in order to see it in greater depth

OLAP cube 1

[Figure: cube with axes Age, Product type and Date; one cell is annotated as follows]

• age = Adult
• product type = TV
• date = 1/12/48
• count = 10
• value = $30000
• cost = $5500

OLAP cube 2

[Figure: cube with axes Age, Product type and Date; three cells are annotated]

Age    Date     Product type  Count  Value   Cost
young  1/12/48  TV            6      $30000  $5500
young  1/12/48  R             10     $15000  $400
young  1/12/48  S             145    $50000  $40000
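A cube cell pairs a coordinate on every dimension with a set of measures. The sketch below stores the three annotated cells in a dictionary and sums their measures. The `Measures` class and the `total` helper are illustrative names, not part of the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measures:
    count: int
    value: int  # dollars
    cost: int   # dollars

    def __add__(self, other):
        return Measures(self.count + other.count,
                        self.value + other.value,
                        self.cost + other.cost)


# Cells keyed by (age, date, product_type), values taken from the three cells.
cube = {
    ("young", "1/12/48", "TV"): Measures(6, 30000, 5500),
    ("young", "1/12/48", "R"): Measures(10, 15000, 400),
    ("young", "1/12/48", "S"): Measures(145, 50000, 40000),
}


def total(cube, age=None, date=None, product_type=None):
    """Sum the measures of all cells matching the fixed coordinates (None = any)."""
    result = Measures(0, 0, 0)
    for (a, d, p), m in cube.items():
        if age in (None, a) and date in (None, d) and product_type in (None, p):
            result = result + m
    return result


young_total = total(cube, age="young")
tv_only = total(cube, product_type="TV")
```

Leaving a coordinate as `None` aggregates over that dimension, so `young_total` is the sum over all product types for the age group "young".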

OLAP cube 3

• Star schema

[Figure: star schema with a fact table in the centre linked to four dimension tables]

Facts: Product, Region, Time, Channel, Revenue, Expenses, Units
Product: Model, Type, Color
Region: Nation, District, Dealer
Time: Week, Year
Channel
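The star schema can be sketched in code as one fact table whose rows hold foreign keys into small dimension tables. All rows below are made-up sample data, and the helper `total_by` is a hypothetical name. Only the table and column names come from the figure.

```python
from collections import defaultdict

# Dimension tables keyed by a surrogate id (sample rows are invented).
product_dim = {
    1: {"model": "M100", "type": "Sedan", "color": "Red"},
    2: {"model": "M200", "type": "Truck", "color": "Blue"},
}
region_dim = {
    1: {"nation": "Thailand", "district": "Bangkok", "dealer": "D1"},
    2: {"nation": "Thailand", "district": "Chiang Mai", "dealer": "D2"},
}
time_dim = {
    1: {"week": 1, "year": 1997},
    2: {"week": 2, "year": 1997},
}

# Fact table: foreign keys into the dimensions plus the numeric measures.
facts = [
    {"product": 1, "region": 1, "time": 1, "channel": "Retail",
     "revenue": 500, "expenses": 300, "units": 5},
    {"product": 2, "region": 1, "time": 1, "channel": "Retail",
     "revenue": 800, "expenses": 600, "units": 4},
    {"product": 1, "region": 2, "time": 2, "channel": "Online",
     "revenue": 200, "expenses": 100, "units": 2},
]


def total_by(facts, dimension, fk, attribute, measure):
    """Join facts to one dimension table and sum a measure per attribute value."""
    totals = defaultdict(int)
    for row in facts:
        member = dimension[row[fk]]  # follow the foreign key
        totals[member[attribute]] += row[measure]
    return dict(totals)


revenue_by_type = total_by(facts, product_dim, "product", "type", "revenue")
units_by_district = total_by(facts, region_dim, "region", "district", "units")
```

Keeping descriptive attributes in the dimension tables means the fact table stays narrow, and any attribute of any dimension can serve as a grouping level.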

OLAP cube 4

• A three-dimensional cube can be displayed with a table that uses rows, columns and pages.
• Displaying six-dimensional data: each dimension and an example member of the cube.

[Figure: three-dimensional display]
Page: Region = North
Columns: Sales (Red blob, Blue blob, Total)
Rows: Year (1996, 1997, Total)

Dimension         Example
Brand             Mt. Airy
Store             Atlanta
Customer segment  Business
Product group     Desks
Period            January
Variable          Units sold

OLAP operation: rotation (Pivot)

OLAP operation: drilling down (Drill Down)

Comparison between data mining and OLAP

Tables and cubes

• A data warehouse adopts a multidimensional data model for use with OLAP.
• A data cube consists of
  • dimensions, e.g. item(item_name, brand, type), time(day, week, month, quarter, year)
  • measures, e.g. dollars_sold
• The cuboid that contains all n dimensions is called the base cuboid, and the cuboid at the highest level of summarization, with 0 dimensions, is called the apex cuboid. Together, the cuboids are called the data cube.

Cubes in lattice form (Lattice)

0-D (apex) cuboid: all
1-D cuboids: time; item; location; supplier
2-D cuboids: time,item; time,location; time,supplier; item,location; item,supplier; location,supplier
3-D cuboids: time,item,location; time,item,supplier; time,location,supplier; item,location,supplier
4-D (base) cuboid: time, item, location, supplier

Example of a lattice

0-D (apex) cuboid: all
1-D cuboids: product; date; country
2-D cuboids: product,date; product,country; date,country
3-D (base) cuboid: product, date, country
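The lattice of cuboids is the set of all subsets of the dimensions, so n dimensions give 2^n cuboids when concept hierarchies are ignored. A minimal sketch that enumerates them level by level follows. The function name `cuboids` is illustrative.

```python
from itertools import combinations


def cuboids(dimensions):
    """Return the lattice as {k: [cuboids with k dimensions]} for k = 0..n."""
    return {
        k: [tuple(c) for c in combinations(dimensions, k)]
        for k in range(len(dimensions) + 1)
    }


lattice = cuboids(["time", "item", "location", "supplier"])
apex = lattice[0]   # the single 0-D cuboid, written "all" on the slide
base = lattice[4]   # the single 4-D cuboid holding all dimensions
n_cuboids = sum(len(level) for level in lattice.values())
```

For the four dimensions time, item, location and supplier this yields 1 + 4 + 6 + 4 + 1 = 16 cuboids, matching the levels listed for the 4-D lattice.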

Multidimensional data

• Sales volume as a function of product, month and region

[Figure: cube with axes Product, Region and Month]

Dimensions: Product, Location, Time
Each dimension is divided into hierarchy levels as follows:

Product   Location  Time
Industry  Region    Year
Category  Country   Quarter
Product   City      Month, Week
          Office    Day
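A concept hierarchy lets the same facts be summarized at different levels. The sketch below rolls daily sales up the Day, Month, Quarter, Year path of the Time dimension. The sample amounts are invented, and `time_key` and `roll_up` are hypothetical helper names.

```python
from collections import defaultdict
from datetime import date


def time_key(d, level):
    """Map a date to its member at the requested level of the Time hierarchy."""
    if level == "day":
        return (d.year, d.month, d.day)
    if level == "month":
        return (d.year, d.month)
    if level == "quarter":
        return (d.year, (d.month - 1) // 3 + 1)
    if level == "year":
        return (d.year,)
    raise ValueError(level)


def roll_up(sales, level):
    """Sum sales amounts per member of the chosen hierarchy level."""
    totals = defaultdict(int)
    for d, amount in sales:
        totals[time_key(d, level)] += amount
    return dict(totals)


# Invented daily sales figures.
sales = [
    (date(1998, 1, 15), 100),
    (date(1998, 2, 3), 50),
    (date(1998, 3, 20), 70),
    (date(1998, 4, 1), 30),
]

by_month = roll_up(sales, "month")
by_quarter = roll_up(sales, "quarter")
by_year = roll_up(sales, "year")
```

Moving from `by_month` to `by_quarter` to `by_year` climbs the hierarchy; moving in the other direction requires going back to the more detailed data.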

Example of a data cube

[Figure: sales cube with axes Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, VCR, PC, sum) and Country (U.S.A, Canada, Mexico, sum). One cell is labelled "Total annual sales of TV in U.S.A."; the cell where all three axes are "sum" is the grand total.]
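The "sum" members on each axis of the figure can be computed by letting every coordinate of a fact either keep its value or be replaced by an aggregate marker. The following sketch does this for a handful of invented sales rows. `ALL` and `full_cube` are illustrative names.

```python
from collections import defaultdict
from itertools import product as cartesian

ALL = "sum"  # marker for "aggregated over this dimension", as in the figure


def full_cube(rows):
    """Return totals for every cell, including the cells that use ALL.

    rows: iterable of ((date, product, country), amount).
    """
    cube = defaultdict(int)
    for coords, amount in rows:
        # Each coordinate is either kept or hidden behind ALL,
        # so one fact contributes to 2**len(coords) cells.
        for mask in cartesian((False, True), repeat=len(coords)):
            key = tuple(ALL if hide else c for c, hide in zip(coords, mask))
            cube[key] += amount
    return dict(cube)


# Invented sales amounts.
rows = [
    (("1Qtr", "TV", "U.S.A"), 10),
    (("2Qtr", "TV", "U.S.A"), 20),
    (("1Qtr", "PC", "Canada"), 5),
    (("3Qtr", "VCR", "Mexico"), 7),
]

cube = full_cube(rows)
tv_usa_annual = cube[(ALL, "TV", "U.S.A")]  # total annual sales of TV in U.S.A.
grand_total = cube[(ALL, ALL, ALL)]
```

Materializing every combination like this is the simplest approach; it trades storage for fast lookups of any subtotal.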

Example of a star schema

Visualization of cubes

• Visualization tools
• Can use OLAP functions
• Interact with the user immediately

Operators in OLAP

• Roll up: a step that aggregates data from a lower level up to a higher level; it is similar to reducing the number of dimensions.
• Drill down: a step that drills into the details, that is, changes the display to a more detailed level; it is similar to adding dimensions to the cube.
• Slice and dice: a step that selects part of the cube for display, namely by fixing a value for a dimension (slice) and by selecting only the part of interest (dice).
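The three operators can be sketched on a cube stored as a dictionary from coordinate tuples to a measure. This is an illustration under simple assumptions (one additive measure, roll-up by dropping a dimension), with invented data and hypothetical function names.

```python
from collections import defaultdict

DIMS = ("product", "month", "region")
cube = {
    ("TV", "Jan", "North"): 10,
    ("TV", "Feb", "North"): 20,
    ("PC", "Jan", "South"): 5,
    ("PC", "Feb", "North"): 8,
}


def roll_up(cube, dims, drop):
    """Roll up by dimension reduction: sum out the dimension `drop`."""
    i = dims.index(drop)
    out = defaultdict(int)
    for key, value in cube.items():
        out[key[:i] + key[i + 1:]] += value
    return dict(out), dims[:i] + dims[i + 1:]


def slice_cube(cube, dims, dim, value):
    """Slice: fix one dimension to a single value and remove it from the keys."""
    i = dims.index(dim)
    kept = {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}
    return kept, dims[:i] + dims[i + 1:]


def dice(cube, dims, allowed):
    """Dice: keep the cells whose coordinates fall inside the allowed value sets."""
    return {
        k: v for k, v in cube.items()
        if all(k[dims.index(d)] in values for d, values in allowed.items())
    }


by_product_region, dims2 = roll_up(cube, DIMS, "month")
north, dims3 = slice_cube(cube, DIMS, "region", "North")
tv_box = dice(cube, DIMS, {"product": {"TV"}, "month": {"Jan", "Feb"}})
```

Drill down is the reverse of `roll_up`: it returns from `by_product_region` to the original `cube`, which is only possible because the detailed cells were kept.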

Operators in OLAP

• Pivot (or rotate): a step that changes the viewing angle of the cube. It reorients the display of the cube, typically using two-dimensional tables to present the data of an n-dimensional cube.
• Other kinds of operators
  • Drill across: drilling over to another MDDB.
  • Drill through: drilling to a level below the lowest level of the cube, that is, drilling into the data tables held in the database.
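Pivoting can be pictured as reordering the axes of the cube and then showing it as a series of two-dimensional tables. The sketch below assumes a three-dimensional cube keyed by (region, year, colour) with invented values. `pivot` and `pages` are illustrative names.

```python
def pivot(cube, order):
    """Reorder the axes of every key, e.g. order=(2, 1, 0) reverses them."""
    return {tuple(key[i] for i in order): value for key, value in cube.items()}


def pages(cube):
    """View a 3-D cube as {page: {row: {column: value}}}, a series of 2-D tables."""
    out = {}
    for (page, row, col), value in cube.items():
        out.setdefault(page, {}).setdefault(row, {})[col] = value
    return out


# Keys are (region, year, colour); the values are invented.
cube = {
    ("North", 1996, "Red"): 3,
    ("North", 1996, "Blue"): 4,
    ("North", 1997, "Red"): 5,
}

by_region = pages(cube)                    # one table per region
by_colour = pages(pivot(cube, (2, 1, 0)))  # one table per colour
```

The data is unchanged by `pivot`; only the assignment of dimensions to page, row and column differs.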

Star-net query model

[Figure: star-net with one radial line per dimension and the abstraction levels marked along each line]

Shipping Method: AIR-EXPRESS, TRUCK
Customer Orders: ORDER, CONTRACTS
Customer
Product: PRODUCT GROUP, PRODUCT LINE, PRODUCT ITEM
Organization: SALES PERSON, DISTRICT, DIVISION
Promotion
Location: CITY, COUNTRY, REGION
Time: DAILY, QTRLY, ANNUALLY

Each circle is called a footprint.

Types of MDDB exploration

• Hypothesis-driven: the user explores by forming a hypothesis and then choosing OLAP operators to bring out the relevant data.
  • Advantage: the person forming the hypotheses is usually a manager who knows the business well, so the data can be examined efficiently and quickly.
  • Disadvantage: the user may not have enough hypotheses, or may overlook certain aspects, and so miss anomalies that occur. If the user had to form every possible hypothesis, it would take a very long time, because the number of possible hypotheses is large.

Types of MDDB exploration

• Discovery-driven:
  • The explorer sets parameters that indicate exceptional behaviour at every level of aggregation.
  • The discovery step uses a criterion to decide whether the data in a cell is an exception or not.
  • Cells whose data is exceptional under the chosen parameters are displayed differently from other cells, for example with a different background colour.
  • Exception indicators include, for example, SelfExp, InExp and PathExp.
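As a rough illustration of flagging exceptional cells, the sketch below fits a simple additive model (row effect plus column effect) to a two-dimensional table and marks cells whose residual is large relative to the spread of all residuals. This is a deliberately simplified stand-in and is not the computation behind SelfExp, InExp or PathExp. The function name, the threshold and the sample table are all assumptions.

```python
from statistics import mean, pstdev


def exceptions(table, threshold=2.0):
    """Return the set of (row, column) positions flagged as exceptions.

    Expected value of a cell = row mean + column mean - grand mean.
    A cell is flagged when |actual - expected| exceeds `threshold`
    standard deviations of the residuals.
    """
    n_rows, n_cols = len(table), len(table[0])
    grand = mean([v for row in table for v in row])
    row_mean = [mean(row) for row in table]
    col_mean = [mean([table[r][c] for r in range(n_rows)]) for c in range(n_cols)]
    resid = [
        [table[r][c] - (row_mean[r] + col_mean[c] - grand) for c in range(n_cols)]
        for r in range(n_rows)
    ]
    spread = pstdev([v for row in resid for v in row])
    if spread == 0:
        return set()  # the table is perfectly additive
    return {
        (r, c)
        for r in range(n_rows)
        for c in range(n_cols)
        if abs(resid[r][c]) / spread > threshold
    }


# A perfectly additive 4 x 4 table with one cell pushed far out of line.
table = [[10 * r + c for c in range(4)] for r in range(4)]
table[2][1] += 100
flagged = exceptions(table)
```

A display layer could then colour the flagged cells differently from the rest, in the spirit of the background-colour cue described for exceptional cells.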

Example of exploration in a cube

Putting the data warehouse to use

• Information processing
  • Used for querying, general statistical analysis, and reporting with tables, charts and graphs
• Analytical processing
  • Multidimensional analysis of the data in the data warehouse
  • Uses OLAP operations such as slice-dice, drilling and pivoting
• Data mining
  • Finds knowledge hidden in the data
  • Presents results through visualization

OLAM

• OLAM (On-Line Analytical Mining)
  • Uses the data warehouse as its base, working together with OLAP
  • Uses the available information processing facilities, such as ODBC, OLEDB, Web accessing, service facilities, reporting and OLAP operators
  • Data analysis is based on OLAP
  • Data mining functions can be selected on-line
  • Integrates multiple data mining functions, algorithms and tasks

OLAM architecture

[Figure: four-layer architecture; a mining query enters at the top and the mining result is returned]

Layer 4, User Interface: User GUI API
Layer 3, OLAP/OLAM: OLAM Engine, OLAP Engine; Data Cube API
Layer 2, MDDB: MDDB, Meta Data; Filtering & Integration; Database API; Filtering
Layer 1, Data Repository: Databases, Data Warehouse; Data cleaning, Data integration

Summary

• A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management's decision-making process
• Data warehouse design uses the star schema, snowflake schema and fact constellations
• A cube comprises dimensions and measures
• OLAP operations: drilling, rolling, slicing, dicing and pivoting
• OLAP servers: ROLAP, MOLAP, HOLAP
• Use of data warehouses and MDDBs
  • Through exploration, or OLAM (on-line analytical mining)

References

• S. Agarwal, R. Agrawal, P. M. Deshpande, A. Gupta, J. F. Naughton, R. Ramakrishnan, and S. Sarawagi. On the computation of multidimensional aggregates. In Proc. 1996 Int. Conf. Very Large Data Bases, 506-521, Bombay, India, Sept. 1996.

• D. Agrawal, A. E. Abbadi, A. Singh, and T. Yurek. Efficient view maintenance in data warehouses. In Proc. 1997 ACM-SIGMOD Int. Conf. Management of Data, 417-427, Tucson, Arizona, May 1997.

• R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. In Proc. 1998 ACM-SIGMOD Int. Conf. Management of Data, 94-105, Seattle, Washington, June 1998.

• R. Agrawal, A. Gupta, and S. Sarawagi. Modeling multidimensional databases. In Proc. 1997 Int. Conf. Data Engineering, 232-243, Birmingham, England, April 1997.

• K. Beyer and R. Ramakrishnan. Bottom-Up Computation of Sparse and Iceberg CUBEs. In Proc. 1999 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'99), 359-370, Philadelphia, PA, June 1999.

• S. Chaudhuri and U. Dayal. An overview of data warehousing and OLAP technology. ACM SIGMOD Record, 26:65-74, 1997.

• OLAP council. MDAPI specification version 2.0. In http://www.olapcouncil.org/research/apily.htm, 1998.

References 2

• J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, M. Venkatrao, F. Pellow, and H. Pirahesh. Data cube: A relational aggregation operator generalizing group-by, cross-tab and sub-totals. Data Mining and Knowledge Discovery, 1:29-54, 1997.

• V. Harinarayan, A. Rajaraman, and J. D. Ullman. Implementing data cubes efficiently. In Proc. 1996 ACM-SIGMOD Int. Conf. Management of Data, 205-216, Montreal, Canada, June 1996.

• Microsoft. OLEDB for OLAP programmer's reference version 1.0. In http://www.microsoft.com/data/oledb/olap, 1998.

• K. Ross and D. Srivastava. Fast computation of sparse datacubes. In Proc. 1997 Int. Conf. Very Large Data Bases, 116-125, Athens, Greece, Aug. 1997.

• K. A. Ross, D. Srivastava, and D. Chatziantoniou. Complex aggregation at multiple granularities. In Proc. Int. Conf. of Extending Database Technology (EDBT'98), 263-277, Valencia, Spain, March 1998.

• S. Sarawagi, R. Agrawal, and N. Megiddo. Discovery-driven exploration of OLAP data cubes. In Proc. Int. Conf. of Extending Database Technology (EDBT'98), 168-182, Valencia, Spain, March 1998.

• E. Thomsen. OLAP Solutions: Building Multidimensional Information Systems. John Wiley & Sons, 1997.