
Page 1: Discovering Web Access Patterns and Trends by Applying OLAP and Data Mining Technology on Web logs Data Engineering Lab 성 유 진

Discovering Web Access Patterns and Trends by Applying OLAP and Data Mining Technology on Web Logs

Data Engineering Lab

성 유 진

Page 2

Abstract

Web server log file analysis
• server performance improvement
• system performance improvement
• customer targeting in electronic commerce

Problems and difficulties
• processing large raw log data is not easy
• data reduction
  • size and time

Page 3

Current web log miners
• slow, inflexible, difficult to maintain
• frequency counts alone are not enough

WebLogMiner
• Virtual University / data mining
• OLAP and data mining techniques
• multi-dimensional data cube
• scalability, interactivity, variety, flexibility

Page 4

Design of a Web log Miner

Web server log file information
• domain name of the request / user name / date and time of the request / method of the request (GET, POST) / name of the file requested / result of the request (success, failure, error, etc.) / size of the data sent back / URL of the referring page / identification of the client agent
• Example
  210.114.3.64 - - [01/Jul/1998:17:34:05 0900] "GET /~yjsung/sign.html HTTP/1.1" 200 740
  210.114.3.64 - - [01/Jul/1998:17:38:44 -0900] "POST /cgi-bin/yjsung/sign HTTP/1.1" 200 352

POST: used when the browser sends a filled-in form to the server
GET: used when requesting data from the server
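As a rough illustration of how the fields listed on this slide can be pulled out of one entry, here is a minimal Python sketch. It assumes the entries follow the Common Log Format with single spaces between fields; the pattern, the group names and the function `parse_log_line` are illustrative choices, not part of WebLogMiner.

```python
import re
from datetime import datetime

# Pattern for one Common Log Format entry; the group names are our own labels.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_log_line(line):
    """Return a dict of fields for one log entry, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    # Parse only the date-and-time part (first 20 characters); the zone offset
    # stays as text in entry["time"]. Month names assume an English locale.
    entry["timestamp"] = datetime.strptime(entry["time"][:20], "%d/%b/%Y:%H:%M:%S")
    return entry

sample = ('210.114.3.64 - - [01/Jul/1998:17:38:44 -0900] '
          '"POST /cgi-bin/yjsung/sign HTTP/1.1" 200 352')
entry = parse_log_line(sample)
print(entry["host"], entry["method"], entry["path"], entry["status"], entry["size"])
```

Lines that do not match the pattern return None, so a caller can count or inspect them separately instead of silently dropping them.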

Page 5

• Cache information
  • frequent backtracking and reloading may indicate a deficient design
    – client-side log
• Access count
  • not always a measure of interestingness
    – e.g., a page that must be passed through in order to reach a particular document
• Time and date
  • evaluate user interest by time spent
• Domain name
• Sequence of requests
  • can predict the next request and improve traffic
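One simple way to approximate "time spent" from server logs alone is to take the gap between consecutive requests from the same host. The sketch below makes that assumption explicit; the function name `time_spent_per_page`, the host name and the sample requests are hypothetical, and a host is treated as a single user, which proxies and caches can violate.

```python
from collections import defaultdict
from datetime import datetime

def time_spent_per_page(requests):
    """requests: iterable of (host, timestamp, path) tuples, in any order.

    Returns {path: total seconds}. Each gap between two consecutive requests
    from one host is attributed to the earlier page. The last page requested
    by a host gets no time, because no later request bounds it.
    """
    by_host = defaultdict(list)
    for host, timestamp, path in requests:
        by_host[host].append((timestamp, path))

    totals = defaultdict(float)
    for visits in by_host.values():
        visits.sort()  # order each host's requests by time
        for (start, path), (end, _next_path) in zip(visits, visits[1:]):
            totals[path] += (end - start).total_seconds()
    return dict(totals)

# Hypothetical requests from a single host.
requests = [
    ("a.example", datetime(1998, 7, 1, 17, 34, 5), "/index.html"),
    ("a.example", datetime(1998, 7, 1, 17, 38, 44), "/sign.html"),
    ("a.example", datetime(1998, 7, 1, 17, 39, 44), "/index.html"),
]
spent = time_spent_per_page(requests)
print(spent)
```

In practice a cutoff for very long gaps would probably be needed, since a long silence more likely means the user left than that they kept reading.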

Page 6

WebLogMiner: 4 Stages

1. Filtering the data and creating a relational DB
2. Data cube construction
3. OLAP is used
4. Data mining techniques are used

Page 7

1. DATABASE CONSTRUCTION FROM SERVER LOG FILES

Data Cleansing and Transformation
• requests for page graphics (and sound and video) are typically filtered out, but are kept here
• two types of transformation
  • without knowledge about the site
    – (transforming the time into day, month, year, etc. is possible without server information)
  • with knowledge about the site
    – associating a server request with its intended action requires the site structure
• relational database
  • cleaned data and new implicit data are added
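The site-independent part of this step can be sketched in a few lines of Python. The sketch derives time fields from the timestamp and flags media requests instead of deleting them, so the entries stay available. The extension list, the field names and the function `transform` are assumptions made for illustration.

```python
import os
from datetime import datetime

# Assumed set of extensions treated as page graphics, sound, or video.
MEDIA_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png", ".wav", ".au", ".mpg", ".avi"}

def transform(entry):
    """entry: dict with "timestamp" (datetime) and "path" (str).

    Returns a new dict with implicit fields added. Nothing here needs
    knowledge of the site structure.
    """
    timestamp = entry["timestamp"]
    extension = os.path.splitext(entry["path"])[1].lower()
    row = dict(entry)
    row.update(
        hour=timestamp.hour,
        day=timestamp.day,
        month=timestamp.month,
        year=timestamp.year,
        weekday=timestamp.weekday(),  # Monday is 0
        is_media=extension in MEDIA_EXTENSIONS,  # flagged, not removed
    )
    return row

page = transform({"timestamp": datetime(1998, 7, 1, 17, 34, 5),
                  "path": "/~yjsung/sign.html"})
image = transform({"timestamp": datetime(1998, 7, 1, 17, 34, 6),
                   "path": "/~yjsung/logo.GIF"})
print(page["year"], page["month"], page["day"], page["is_media"], image["is_media"])
```

Mapping a request to a user action (the second type of transformation) is not shown, because it depends on the structure of the particular site.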

Page 8

2. MULTI-DIMENSIONAL WEB LOG DATA CUBE CONSTRUCTION AND MANIPULATION

Data Cube
• the GROUP BY operator in SQL is used to compute aggregates on a set of attributes
  – sum of sales by P, C: for each product, give a breakdown of how much of it was sold to each customer
• CUBE is the n-dimensional generalization of GROUP BY
• gives remarkable flexibility to manipulate and view the data
• allows OLAP operations such as drill-down, roll-up, slice and dice
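To make "CUBE is the n-dimensional generalization of group-by" concrete, the sketch below computes a sum for every subset of the dimensions, using the slide's product/customer sales example. It is a naive in-memory illustration, not how a real OLAP engine materializes a cube; the `ALL` marker, the function `build_cube` and the sample rows are assumptions.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # marker for a dimension that has been aggregated away

def build_cube(rows, dimensions, measure):
    """Sum `measure` for every subset of `dimensions` (2**n group-bys).

    Cell keys are tuples aligned with `dimensions`; a dimension that is not
    part of the current group-by holds ALL.
    """
    cube = defaultdict(int)
    for size in range(len(dimensions) + 1):
        for kept in combinations(dimensions, size):
            for row in rows:
                key = tuple(row[d] if d in kept else ALL for d in dimensions)
                cube[key] += row[measure]
    return dict(cube)

# Hypothetical sales rows for products P1, P2 and customers C1, C2.
rows = [
    {"product": "P1", "customer": "C1", "sales": 10},
    {"product": "P1", "customer": "C2", "sales": 5},
    {"product": "P2", "customer": "C1", "sales": 7},
]
cube = build_cube(rows, ["product", "customer"], "sales")

print(cube[("P1", "C1")])  # finest level: sales of P1 to C1
print(cube[("P1", ALL)])   # roll-up over customer
print(cube[(ALL, ALL)])    # grand total

# Slice: keep only the cells for product P1.
p1_slice = {key: value for key, value in cube.items() if key[0] == "P1"}
```

Roll-up corresponds to moving from a cell such as ("P1", "C1") to ("P1", ALL), and drill-down is the reverse lookup.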

Page 9

• Attributes
  - URL
  - domain name
  - size of resource
  - time
  . . .


Page 10

3. DATA MINING ON WEB LOG DATA CUBE AND WEB LOG DATABASE

Data characterization
• find rules that summarize a user-defined data set
  ☞ the traffic on a web server for a given type of media at a particular time of day

Class comparison
• discover discriminant rules
  ☞ compare requests from two different web browsers

Association
• discover patterns in which accesses to different resources consistently occur together

Prediction
  ☞ access to a new resource on a given day can be predicted based on accesses to similar old resources on similar days
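The association idea, resources that are consistently accessed together, can be illustrated with pairwise support and confidence. This is a deliberately small sketch, not the mining algorithm used by WebLogMiner: it only considers pairs, and the notion of a "session", the function `pair_rules` and the sample data are assumptions.

```python
from collections import Counter
from itertools import combinations

def pair_rules(sessions, min_support=0.5):
    """sessions: list of sets of resources accessed together.

    Returns {(a, b): (support, confidence)} for the rule a -> b, for every
    ordered pair whose joint support reaches min_support.
    """
    total = len(sessions)
    item_counts = Counter()
    pair_counts = Counter()
    for resources in sessions:
        item_counts.update(resources)
        pair_counts.update(combinations(sorted(resources), 2))

    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / total
        if support >= min_support:
            rules[(a, b)] = (support, count / item_counts[a])
            rules[(b, a)] = (support, count / item_counts[b])
    return rules

# Hypothetical sessions.
sessions = [
    {"/index.html", "/sign.html"},
    {"/index.html", "/sign.html", "/help.html"},
    {"/index.html"},
    {"/help.html"},
]
rules = pair_rules(sessions)
for (a, b), (support, confidence) in sorted(rules.items()):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

Here every session containing /sign.html also contains /index.html, so that direction of the rule has higher confidence than the reverse.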

Page 11

Classification
• can be used to develop a better understanding of each class in the web log database, and perhaps to restructure a web site or customize answers to requests based on classes of requests

Time-series analysis
• analyze data along time sequences to discover time-related interesting patterns …
  ☞ disclose the patterns and trends for the improvement of the services of the web server

The focus is on time-series analysis because web log records are highly time-related
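A very basic form of time-related analysis is to bucket requests by hour and smooth the counts to expose a trend. The sketch below assumes timestamps are already parsed; the function names, the trailing-window smoothing and the sample data are illustrative choices, not the method described in the slides.

```python
from collections import Counter
from datetime import datetime

def requests_per_hour(timestamps):
    """Count requests in each hour-of-day bucket (0 to 23)."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]

def moving_average(values, window=3):
    """Trailing moving average; early positions average what is available."""
    averaged = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged

# Hypothetical request times.
timestamps = [
    datetime(1998, 7, 1, 17, 34, 5),
    datetime(1998, 7, 1, 17, 38, 44),
    datetime(1998, 7, 2, 9, 0, 0),
]
hourly = requests_per_hour(timestamps)
smoothed = moving_average(hourly, window=3)
print(hourly[9], hourly[17])
```

Bucketing by hour of day merges different dates on purpose, to show the daily rhythm; bucketing by calendar day or week instead would show longer-term trends.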

Page 12

Experiments with the Web Log Miner

Virtual-U: six different major components
Goal: understand the usage and user behavior patterns

Data cleaning and transformation
• all entries were mapped one-to-one into a relational database
• the fields "site" and "user action" were added
• Problems
  – extraneous information => identify those entries and eliminate them
  – multiple server requests caused by the same user action
  – the same server request caused by multiple user actions
  – local activities are not recorded

Page 13

Page 14

Multi-dimensional data cube construction and manipulation
• summarization (group-bys on different dimensions)
• request / domain / event / session / bandwidth / error / referring organization / browser summary
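A few of the summaries named above can be sketched as single-dimension group-bys over cleaned rows. The field names, the rule that a status of 400 or above counts as an error, and the sample rows are assumptions for illustration only.

```python
from collections import Counter, defaultdict

def summaries(rows):
    """rows: dicts with "domain", "status" and "size" keys (assumed names).

    Returns the domain, error and bandwidth summaries as plain dicts.
    """
    requests_by_domain = Counter(row["domain"] for row in rows)
    # Assumption: HTTP status codes of 400 and above are treated as errors.
    errors_by_status = Counter(row["status"] for row in rows if row["status"] >= 400)
    bytes_by_domain = defaultdict(int)
    for row in rows:
        bytes_by_domain[row["domain"]] += row["size"]
    return {
        "domain": dict(requests_by_domain),
        "error": dict(errors_by_status),
        "bandwidth": dict(bytes_by_domain),
    }

# Hypothetical cleaned rows.
rows = [
    {"domain": "ac.kr", "status": 200, "size": 740},
    {"domain": "ac.kr", "status": 404, "size": 0},
    {"domain": "com", "status": 200, "size": 352},
]
report = summaries(rows)
print(report["domain"], report["error"], report["bandwidth"])
```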

Examples
Figure 2) OLAP analysis of Web log

Page 15

Fig3) Typical event sequence and user behavior pattern analysis

Fig4) Web traffic analysis of Web log

Page 16

Page 17

Fig6) Event trees for months one to four

Page 18

Discussion and Conclusion

WebLogMiner
• OLAP and data mining techniques
• multi-dimensional data cube
• major strengths
  • scalability, interactivity, variety, flexibility

Problems with current log files
• web servers should collect more information
• a new structure is needed ==> it would simplify pre-processing