Kanthaka - High Volume CDR Analyzer

Big Data CDR Analyzer 080201N M.K.P.R. Jayawardhana 080254D P.K.A.M. Kumara 080331L W.D.A.I. Paranawithana 080357V T.D.K. Perera Project Supervisors- Mr. Thilina Anjitha hSenid Dr.Shahani Markus Weerawarana

Upload: pushpalanka-jayawardhana

Post on 15-Jan-2015

DESCRIPTION

'Kanthaka' is an attempt to bring the benefits of Big Data technologies to the telecom industry. The objective of the system is to analyze CDRs (Call Detail Records) and give results in near real time. It is being carried out as a final year project for my B.Sc. of Engineering (Hons) degree at the University of Moratuwa, in a team with 3 colleagues, under the supervision of a senior lecturer and an industry expert. The presentation covers the background, the findings of the literature review, and the architecture of the system as currently proposed. Any feedback on improvements that can be made is warmly welcome!

TRANSCRIPT

Page 1: Kanthaka - High Volume CDR Analyzer

Big Data CDR Analyzer

080201N – M.K.P.R. Jayawardhana

080254D – P.K.A.M. Kumara

080331L – W.D.A.I. Paranawithana

080357V – T.D.K. Perera

Project Supervisors: Mr. Thilina Anjitha (hSenid), Dr. Shahani Markus Weerawarana

Page 2: Kanthaka - High Volume CDR Analyzer

Overview

• Background

• Current Situation

• Scope and Assumptions

• Kanthaka – big data CDR Analyzer System

• Technology Comparison

▫ Map Reduce

▫ NoSQL Databases

• Architecture

• Project Plan

• Risks and Possible Remedies

• References

Page 3: Kanthaka - High Volume CDR Analyzer

Background: Mobile Promotions

Page 4: Kanthaka - High Volume CDR Analyzer

Current Situation

• Promotions are based only on subscribers' network usage

• Only the active call switch is used for triggering promotions

• No way of analyzing and processing high-volume CDRs

• No efficient method for analyzing CDRs

• No access to historical data

• Complex rules are not supported

Page 5: Kanthaka - High Volume CDR Analyzer

Kanthaka to the rescue

• Selecting eligible users for promotions based on both commercial organizations and network usage.

E.g., giving a 20% discount to pizza lovers in the 16-40 age group who have called Pizza Hut more than 5 times in a month.

• High volume CDR analysis.

• Near real time selection of eligible users for promotions.
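The pizza example above can be read as a rule over per-subscriber call counts. The following is a minimal Python sketch of that idea, not the project's implementation. The record fields (`caller`, `callee`), the `PIZZA_HUT` number, and the subscriber age table are hypothetical names introduced only for illustration.

```python
from collections import Counter

PIZZA_HUT = "0110000000"  # hypothetical number, for illustration only


def eligible_subscribers(cdrs, ages, callee, min_calls, min_age, max_age):
    """Return subscribers aged min_age..max_age who called `callee`
    more than `min_calls` times in the given CDRs (one month assumed)."""
    calls = Counter(r["caller"] for r in cdrs if r["callee"] == callee)
    return sorted(
        subscriber
        for subscriber, n in calls.items()
        # Subscribers with unknown age get -1 and are never eligible.
        if n > min_calls and min_age <= ages.get(subscriber, -1) <= max_age
    )


# One month of made-up sample data.
cdrs = (
    [{"caller": "A", "callee": PIZZA_HUT}] * 6    # 6 calls, age 25
    + [{"caller": "B", "callee": PIZZA_HUT}] * 5  # exactly 5 calls: not "more than 5"
    + [{"caller": "C", "callee": PIZZA_HUT}] * 7  # enough calls, but age 55
)
ages = {"A": 25, "B": 30, "C": 55}

print(eligible_subscribers(cdrs, ages, PIZZA_HUT, 5, 16, 40))
```

Only subscriber A satisfies both the call-count and the age condition in this sample.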

Page 6: Kanthaka - High Volume CDR Analyzer

• CDR Analyzer system which

▫ can process 30 million records per day

▫ can produce results within 10-15 seconds

▫ provides a GUI to define dynamic rules

▫ can be used to offer real-time sales promotions for mobile subscribers
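To put the 30 million records per day target in perspective, the short calculation below converts it into an average arrival rate. It assumes records arrive uniformly over the day, which the slides do not state; real traffic is likely to be burstier.

```python
RECORDS_PER_DAY = 30_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Average arrival rate, assuming a uniform spread over the day.
avg_rate = RECORDS_PER_DAY / SECONDS_PER_DAY

# Records that arrive, on average, during the 15-second upper bound on response time.
per_window = avg_rate * 15

print(f"{avg_rate:.0f} records/sec on average")
print(f"{per_window:.0f} records per 15-second window")
```

Under that assumption the system has to absorb roughly 350 records per second on average.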

Page 7: Kanthaka - High Volume CDR Analyzer

Scope and Assumptions

Scope

• Real system operation: 30 M records, multiple rules, offers the promotion

• Operation expected from Kanthaka: 30 M records, single rule, selects eligible users for the promotion only

Page 8: Kanthaka - High Volume CDR Analyzer

Assumptions

• CDRs are available only in CSV format.

• Events can be of different types, such as SMS, voice call, MMS, USSD, top-up, GPRS, and LBS.

• CDRs can be received by the system in batches, asynchronously.

• Only 6 of the many attributes in a CDR will be considered during processing.
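The assumptions above suggest a simple ingestion step: read a CSV batch, keep only the attributes of interest, and skip unknown event types. The sketch below illustrates this with Python's standard `csv` module. The slides do not name the six attributes, so the column names in `KEPT` and the event-type spellings are hypothetical.

```python
import csv
import io

# Hypothetical column names: the slides do not list the six attributes used.
KEPT = ("caller", "callee", "event_type", "timestamp", "duration", "cell_id")
# Hypothetical spellings of the event types listed in the assumptions.
EVENT_TYPES = {"SMS", "VOICE", "MMS", "USSD", "TOPUP", "GPRS", "LBS"}


def parse_batch(csv_text):
    """Yield one dict per CDR row, restricted to the KEPT columns.
    Rows with an unknown event type are skipped."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        if row["event_type"] in EVENT_TYPES:
            yield {name: row[name] for name in KEPT}


# A made-up batch with one extra column and one unsupported event type.
batch = (
    "caller,callee,event_type,timestamp,duration,cell_id,billing_code\n"
    "0771111111,0112222222,VOICE,2012-06-01T10:00:00,120,C17,X1\n"
    "0771111111,0113333333,FAX,2012-06-01T10:05:00,0,C17,X2\n"
)
records = list(parse_batch(batch))
print(records)
```

Dropping unused columns at ingestion keeps the stored rows small, which matters when memory is the main cost of the database.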

Page 9: Kanthaka - High Volume CDR Analyzer

Technology Comparison

Page 10: Kanthaka - High Volume CDR Analyzer

A lot of data + higher speed

--> Scale-out system

Page 11: Kanthaka - High Volume CDR Analyzer

Map Reduce

Hadoop MapReduce

• Can handle a lot of data

• Latency is too high for cases where results are expected in near real time

Counting the words in a 100 KB file: start time = 01.04.44, end time = 01.05.12, total time = 28 sec
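For readers unfamiliar with the model, the word count measured above follows the usual map, shuffle, reduce pattern. The sketch below runs that pattern in a single Python process purely as an illustration. It does not use the Hadoop API, and the function names are made up. On a real cluster, job scheduling and startup overhead are added on top of this logic, which is a plausible reason a 100 KB input still takes tens of seconds.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


print(word_count(["big data big", "data CDR"]))
```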

Page 12: Kanthaka - High Volume CDR Analyzer

DB Technology Comparison

• RDBMS

▫ Provides ACID properties

▫ Uses sharding to scale out

▫ Management overhead is huge when scaling out

▫ Performance degrades under higher data load

▫ Less partition tolerant

Page 13: Kanthaka - High Volume CDR Analyzer

DB Technology Comparison Ctd.

• NoSQL

▫ Many available options (Cassandra, HBase, MongoDB, Hive)

▫ Promises easy scaling (many big users: Facebook, Twitter)

▫ Provides BASE properties under the CAP theorem

▫ Hard to fit the system into the limited data model

▫ Partition tolerant

▫ More memory --> Higher performance

Page 14: Kanthaka - High Volume CDR Analyzer

DB Technology Comparison Ctd.

• NewSQL

▫ Provide ACID properties

▫ Familiar relational data model

▫ Options available (ScaleDB, VoltDB)

▫ Runs entirely in memory, hence needs a lot of memory

▫ Promises speed

▫ Persistence is achieved by replaying logs
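The last point, persistence by replaying logs, can be illustrated generically: every write is appended to a log on disk before being applied in memory, and after a restart the in-memory state is rebuilt by replaying that log. The sketch below shows the general idea only. It is not how VoltDB or ScaleDB implement it, and the class name and log format are hypothetical.

```python
import json
import os
import tempfile


class LoggedStore:
    """In-memory key-value store made durable by an append-only log."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        # Recovery: replay every logged write in order.
        if os.path.exists(log_path):
            with open(log_path) as log:
                for line in log:
                    key, value = json.loads(line)
                    self.data[key] = value

    def put(self, key, value):
        # Log first, then apply, so an acknowledged write survives a crash.
        with open(self.log_path, "a") as log:
            log.write(json.dumps([key, value]) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.data[key] = value


log_path = os.path.join(tempfile.mkdtemp(), "store.log")
store = LoggedStore(log_path)
store.put("A", 3)
store.put("A", 4)

# Simulate a restart: a fresh instance rebuilds its state from the log.
recovered = LoggedStore(log_path)
print(recovered.data)
```

Because the log is replayed in write order, the later value for a key wins, matching the state before the restart.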

Page 15: Kanthaka - High Volume CDR Analyzer

Given its persistence, less restrictive hardware requirements, and proven performance, NoSQL is the best option to try out.

• Cassandra – a key-value pair column family store (used at Facebook, Twitter, eBay)

• HBase – a key-value pair column family store (Facebook)

• MongoDB – a document store (Adobe)

• Hive – HDFS based database

Page 16: Kanthaka - High Volume CDR Analyzer

YCSB Benchmarks

• With more big users, active mailing lists, and the most promising features (secondary indexes, counters), Cassandra is the best option to try out.
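Counters are attractive for this kind of workload because a per-subscriber count can be incremented as each CDR arrives, so checking a rule becomes a lookup instead of a scan. The sketch below imitates that layout with nested Python dictionaries as an in-memory stand-in. It does not use the Cassandra API, and the row-key layout (subscriber and month) is an assumption made for illustration.

```python
from collections import defaultdict

# Stand-in for a counter column family:
# row key (subscriber, month) -> column (callee) -> counter value.
counters = defaultdict(lambda: defaultdict(int))


def record_call(subscriber, month, callee):
    """Increment the counter at write time, once per incoming CDR."""
    counters[(subscriber, month)][callee] += 1


def call_count(subscriber, month, callee):
    """Read a counter; .get is used so reads never create empty rows."""
    return counters.get((subscriber, month), {}).get(callee, 0)


for _ in range(6):
    record_call("A", "2012-06", "PIZZA_HUT")
record_call("B", "2012-06", "PIZZA_HUT")

print(call_count("A", "2012-06", "PIZZA_HUT"))
```

The trade-off is that the counted dimensions must be known before the data arrives, which may limit how dynamic the rules can be.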

Page 17: Kanthaka - High Volume CDR Analyzer

Technology selection

Technologies left behind

• Complex Event Processing (CEP) engines

▫ No persistence

• Rules engine

▫ More layers, more latency

• Hadoop

• NoSQL DBs: HBase, MongoDB, Hive

Technologies selected

• NoSQL DB: Cassandra

Page 18: Kanthaka - High Volume CDR Analyzer

Architecture

Page 19: Kanthaka - High Volume CDR Analyzer

Project Plan

Milestone | Target date | Status
First chapters of final report | - | Done
ERU abstracts | - | Accepted
ERU paper | 31/07/2012 | Due
Architecture | 06/06/2012 | Done
Setting up the Cassandra cluster | 06/06/2012 | Done
GUI for rule definition | 15/06/2012 | Ongoing
Bulk data load to Cassandra | 15/06/2012 | Ongoing
System Requirement Specification | 20/06/2012 | Due
Query data from database periodically | 26/06/2012 | Due
Initial Design Document | 27/06/2012 | Due
Algorithm for pre-processing | 10/07/2012 | Due
Testing | 10/07/2012 | Due
Final report | 10/08/2012 | Due

Page 20: Kanthaka - High Volume CDR Analyzer

Risks and Possible Remedies

• NoSQL databases

▫ Risk: high performance requires more memory

▫ Remedy: use an external cluster with decent memory

• In the long run

▫ Risk: performance degrades as data grows

▫ Remedy: archiving

Page 21: Kanthaka - High Volume CDR Analyzer

• Handling concurrency issues

▫ Risk: locking the database lowers speed

▫ Remedy: use a shadow copy

• NoSQL fails to achieve the requirements

▫ Options:

▫ NewSQL – VoltDB (runs entirely in memory)

▫ CEP (needs extra measures to ensure persistence)

• Handling sudden peaks

▫ Should have an auto-balancing mechanism ready
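One possible reading of the shadow copy remedy is that rule evaluation reads from a periodically refreshed snapshot while incoming CDRs are written to the live copy, so long-running reads never hold the write lock. The sketch below illustrates that reading only. The class and method names are hypothetical, and the slides do not say how the project actually applies the idea.

```python
import copy
import threading


class ShadowedTable:
    """Writers update the live copy; readers see the last published snapshot."""

    def __init__(self):
        self._live = {}
        self._shadow = {}
        self._lock = threading.Lock()

    def write(self, key, value):
        with self._lock:
            self._live[key] = value

    def refresh_shadow(self):
        # Hold the lock only while copying, then publish by swapping the reference.
        with self._lock:
            snapshot = copy.deepcopy(self._live)
        self._shadow = snapshot

    def read(self, key):
        # No lock needed: a published snapshot is never modified afterwards.
        return self._shadow.get(key)


table = ShadowedTable()
table.write("A", 1)
table.refresh_shadow()
table.write("A", 2)       # not visible to readers until the next refresh
print(table.read("A"))
table.refresh_shadow()
print(table.read("A"))
```

The cost of this design is staleness: readers lag behind writers by at most one refresh interval, which has to fit within the near-real-time target.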

Page 22: Kanthaka - High Volume CDR Analyzer

Final Deliverables

• Big Data CDR Analyzer system

• Research Paper

• Final Report

Page 23: Kanthaka - High Volume CDR Analyzer

References

• http://www.slideshare.net/gvdinesh/cap-and-base-8169489

• B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears, “Benchmarking cloud serving systems with YCSB,” in Proceedings of the 1st ACM Symposium on Cloud Computing (SoCC), 2010, pp. 143–154.

Visit us at Kanthaka

Page 24: Kanthaka - High Volume CDR Analyzer

Thank You!