
Scalable load-balancing for large-scale big data applications

+ Brazil, São Paulo, USP, IME

Carlos Eduardo Moreira dos Santos, University of São Paulo

University of Tokyo, 2014-05-29

Brazil


● 5th largest country (8,515,767 km²)
● 26 states plus the Federal District, and over 5.5k cities
● Capital: Brasília
● Language: (Brazilian) Portuguese
● 6th most populous (202,656,788 in 2014)
● 8th largest economy (Gross Domestic Product)
● Currency: Brazilian Real
● Info relative to Japan
  ○ Size: 22.5 × Japan's
  ○ Population: 1.6 × Japan's
  ○ "Distance": 27-hour flight
  ○ Time: Japan's minus 12h


São Paulo


● Largest Japanese community outside Japan (665k in 2010)
● 7th largest metropolitan area (7,943.818 km²)
  ○ 0.59 × Tokyo's
● 8th most populous (19,956,590 in 2012)
  ○ 0.54 × Tokyo's
● "Financial capital of Brazil"
● 10th largest Gross Domestic Product in the world
● BOVESPA stock exchange
  ○ Largest in Latin America
  ○ Second in the world, in market value
● Largest number of helicopters in the world

http://en.wikipedia.org/wiki/Stock_exchange

University of São Paulo

● Latin America's largest university
● >25% of Brazilian scientific production
● QS World University Rankings
  ○ Improving rank position
    ■ 2009: 207th
    ■ 2013: 127th
  ○ Global top 50 in 7 of the 30 disciplines

University of São Paulo

                     USP (2010)    Univ. of Tokyo (2013)
Professors           5,732         2,604
Undergrads           56,998        14,120
Graduate students    25,591        13,878
Main campus size     7.4 km²       1.6 km²

Institute of Mathematics and Statistics (IME) CS Department

● 42 full-time professors (+ 4 retired professors who are still active)
● 250 undergrads
● 223 graduate students (124 master's + 99 PhD)
● Graduating per year
  ○ 40-50 bachelor's
  ○ 44 master's
  ○ 10 PhDs

Institute of Mathematics and Statistics (IME) CS Department

● Research Areas
  ○ Computer Theory
  ○ Artificial Intelligence
  ○ Software Engineering
  ○ Parallel, Distributed, and Grid Computing
  ○ Continuous Optimization
  ○ Combinatorial Optimization
  ○ Databases
  ○ Software Systems
  ○ Bioinformatics

FLOSS Competence Center
● Founded in January 2009 in our department
  ○ USP Free and Open Source Competence Centre
  ○ Funded by the European Commission, the Brazilian government, and USP
● Goal: promote the use of FLOSS and work towards improving its quality
  ○ Teaching
  ○ Research
  ○ Consulting

2014 Brazilian Soccer Team

Parallel/Distributed Systems Group

● Professors
  1. Alfredo Goldman
  2. Daniel Batista
  3. Fabio Kon
  4. Marco Aurélio Gerosa
  5. Marcos Dimas Gubitoso
● Close Collaborators
  1. João Eduardo Ferreira
  2. Marcelo Finger
  3. Siang W. Song
  4. Flavio S. C. Silva
  5. Kunio Okuda
  6. Routo Terada
  7. Kelly Braghetto
  8. Renata Wassermann
● Students
  ○ ~20 doctoral
  ○ ~30 master's
  ○ ~20 undergrads

Parallel/Distributed Systems Group
● Research Areas
  ○ Software Engineering
  ○ Agile Software Development methodologies
  ○ OOP and Patterns
  ○ Parallel Computing / HPC
  ○ Distributed Systems / Middleware
  ○ Grid Computing / Cloud Computing
  ○ Big Data
  ○ Databases (distributed / mobile)
  ○ Object-Orientation in Software Architectures
  ○ Mobile Computing
  ○ Energy Efficiency

Parallel/Distributed Systems Group - Education
● Undergraduate and graduate courses
  ○ Parallel, Distributed, and Cloud Computing
  ○ Advanced Object-Oriented Software Development
  ○ eXtreme Programming Laboratory
  ○ Entrepreneurship in Software Startups
● Continuing Education and Community courses
  ○ Grid/Cloud Computing
  ○ Web development with advanced OO tools
  ○ Design Patterns and Agile Software Development

Parallel/Distributed Systems Group - Education
● Consulting work: OO software development
  ○ São Paulo Legislature (Assembléia Legislativa)
  ○ Ministry of Health
  ○ USP administration, CPqD, LARC, Scopus, ITM, etc.
  ○ Entrepreneurship (for startups)

Main Research Projects

● HP Baile (scalable, cloud-based systems)
● CHOReOS (Web Service Choreographies)
● InteGrade (Opportunistic Grid Computing)
● Microsoft Borboleta (Telehealth with smartphones)
● Agile Methods for Software Development
● Qualipso (Quality in Open Source)
● IBM Eclipse Innovation

CHOReOS

Scalable Web Service Choreographies for the Future Internet
● 2010 - 2013
● European Commission funding
● 16 partners (academia/industry) from Europe (France, Greece, Italy, Lithuania, Latvia, and UK) and Brazil (IME - USP)

CHOReOS

Enactment Engine
● Input
  ○ Web Services (implementation and/or URL)
  ○ Metadata (dependency info, etc.)
● Provision cloud resources
● Deploy Web Services
● Configure dependencies (by roles)
● Technologies: Java, SOAP, REST, Chef
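The Enactment Engine steps can be pictured as a dependency-ordered deployment. The Python sketch below is a hypothetical illustration only, not the CHOReOS engine (which the slide lists as Java, SOAP, REST, and Chef): the metadata format (a role name mapped to the roles it depends on), the function names, and the `deploy` callback that stands in for "provision a cloud node and deploy the service" are all assumptions.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable

# Hypothetical metadata format: role name -> roles it depends on.
Metadata = dict[str, set[str]]


def deployment_order(metadata: Metadata) -> list[str]:
    """Order roles so that every role comes after the roles it depends on.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(metadata).static_order())


def enact(metadata: Metadata, deploy: Callable[[str], str]) -> dict[str, dict[str, str]]:
    """Deploy every role in dependency order, then build each role's dependency config.

    `deploy` is a hypothetical stand-in for provisioning plus deployment;
    it must return the endpoint URL of the role it just deployed.
    """
    endpoints: dict[str, str] = {}
    for role in deployment_order(metadata):
        endpoints[role] = deploy(role)
    # Configure dependencies by role: each service learns its dependencies' URLs.
    return {
        role: {dep: endpoints[dep] for dep in sorted(deps)}
        for role, deps in metadata.items()
    }
```

For example, with `{"travel_agency": {"airline", "hotel"}, "hotel": {"payment"}, "airline": set(), "payment": set()}`, `payment` is deployed before `hotel`, and both `hotel` and `airline` before `travel_agency`. Deploying before configuring means every endpoint already exists when dependencies are wired.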

Embraer

● World's 3rd largest aircraft manufacturer
● 20k employees
● Clients in 55 countries
  ○ Japan Airlines (E170)
  ○ Armed forces in 48 countries
● 2013 net income: US$ 342 million

HP Baile

Development and use of WS choreographies in large-scale environments
● 2010-2012
● Funded by HP Brasil
● Collaboration with HP Labs
● Some outcomes
  ○ Rehearsal: WS choreographies with TDD
  ○ Scalability Explorer
  ○ Tech transfer on change impact analysis for workflow repository management

InteGrade

● 2002 - 2011
● Object-oriented grid middleware
● Opportunistic (uses the idle capacity of shared workstations)
● My undergraduate final project
  ○ Grid Computing Resource Management - Node Control Center, 2009
    ■ Limit CPU usage on the client side
    ■ Web interface (C++)

InteGrade Node Control Center

● Multi-core
● CPU affinity
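The slides do not show how the Node Control Center applied CPU affinity, and the original tool was written in C++. As a hedged illustration of the underlying idea only, the Python sketch below uses Linux's `os.sched_setaffinity` to restrict which cores a process may run on. The helper names and the policy of reserving the lowest-numbered cores for the workstation owner are assumptions made for this example.

```python
import os


def pin_to_cores(pid: int, cores: set[int]) -> set[int]:
    """Restrict process `pid` (0 = the calling process) to `cores`.

    Linux-only: os.sched_setaffinity/getaffinity do not exist on macOS or Windows.
    Returns the affinity mask actually in effect afterwards.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("CPU affinity is not supported on this platform")
    allowed = os.sched_getaffinity(pid)
    target = cores & allowed  # never ask for cores the process may not use
    if not target:
        raise ValueError(f"none of {sorted(cores)} is in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(pid, target)
    return os.sched_getaffinity(pid)


def grid_cores(allowed: set[int], reserved_for_owner: int = 1) -> set[int]:
    """Hypothetical policy: leave the lowest-numbered cores to the workstation owner.

    The result may be empty on machines with too few cores; the caller decides
    whether to run grid tasks at all in that case.
    """
    return set(sorted(allowed)[reserved_for_owner:])
```

A grid daemon following this (hypothetical) policy would call `pin_to_cores(task_pid, grid_cores(os.sched_getaffinity(0)))` for each task it launches, so the owner's interactive work keeps at least one core to itself.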



Motivations
● Increasing supercomputer power
  ○ 2008 Blue Gene/P Intrepid system
    ■ 40,960 nodes
    ■ 163,840 processor cores
  ○ 2011/12 Blue Gene/Q Sequoia system
    ■ 98,304 nodes
    ■ 1.6 million processor cores
● "Unlimited" resources in cloud computing
● Big Data


Questions
● Can centralized systems handle the load?
● What about decentralized systems?
● Are distributed systems required?


Many applications and solutions are available. We will start with MapReduce.

Apache Hadoop implementation (2005)
● Open Source (Apache top-level project)
● Useful in a wide range of applications
● Global community of users and contributors
  ○ Commercial support for companies
  ○ Sponsors: Yahoo!, Google, HP, IBM, Facebook, ...

São Paulo - Liberdade

MapReduce Paradigm

Introduced by Google in 2004

function map(String name, String document):
    // name: document name
    // document: document contents
    for each word w in document:
        emit (w, 1)

function reduce(String word, Iterator partialCounts):
    // word: a word
    // partialCounts: a list of aggregated partial counts
    sum = 0
    for each pc in partialCounts:
        sum += ParseInt(pc)
    emit (word, sum)
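To make the word-count pseudocode concrete, here is a hedged, single-process Python rendering of it. It is not Hadoop: the shuffle step that groups emitted pairs by key (done by the framework across machines) is simulated with a dictionary, and `run_word_count` is a hypothetical driver name. Words are split on whitespace only, and counts stay integers, so no `ParseInt` step is needed.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(name: str, document: str) -> Iterator[tuple[str, int]]:
    """Emit (word, 1) for every word; `name` is unused, as in the pseudocode."""
    for word in document.split():
        yield word, 1


def reduce_fn(word: str, partial_counts: Iterable[int]) -> tuple[str, int]:
    """Sum all partial counts emitted for one word."""
    return word, sum(partial_counts)


def run_word_count(documents: dict[str, str]) -> dict[str, int]:
    """Simulate map, shuffle, and reduce in one process."""
    # Map + shuffle: group every emitted value by its key.
    groups: dict[str, list[int]] = defaultdict(list)
    for name, text in documents.items():
        for word, count in map_fn(name, text):
            groups[word].append(count)
    # Reduce: one call per distinct key, in sorted key order.
    return dict(reduce_fn(word, counts) for word, counts in sorted(groups.items()))
```

For instance, `run_word_count({"d1": "a b a", "d2": "b c"})` returns `{"a": 2, "b": 2, "c": 1}`. In a real cluster, each `map_fn` call runs near its input split and each key's `reduce_fn` call may run on a different node; only the grouping contract is the same.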

Hadoop v1

Hadoop v2 (YARN)

Hadoop v1 vs v2

● Pros
  ○ JobTracker was split into
    ■ Resource Manager (RM)
    ■ Application Master (AM)
  ○ As many AMs as jobs
  ○ 5k nodes in 2009 to 10k in 2012
    ■ Same price = 2x resources
● RM and AM are still centralized components

Research

● Study scalability in Hadoop v1 and v2
  ○ Experiments
  ○ Understand scalability gains in YARN
● Scalability limits
  ○ Model the overhead of centralized components
  ○ Predict scalability limits by simulation
● Conceive and simulate an alternative solution
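As a back-of-the-envelope starting point for modeling centralized-component overhead, one could treat the RM as a single M/M/1 server fed by periodic node heartbeats. This is an assumed model chosen for illustration, not the one used in this research: real heartbeats are periodic rather than Poisson, and the heartbeat interval and per-heartbeat service time below are hypothetical parameters that would have to be fitted from measurements.

```python
import math


def rm_utilization(nodes: int, heartbeat_interval_s: float, service_time_s: float) -> float:
    """rho = lambda * s, where lambda = nodes / heartbeat interval (heartbeats per second)."""
    return nodes * service_time_s / heartbeat_interval_s


def saturation_nodes(heartbeat_interval_s: float, service_time_s: float) -> float:
    """Cluster size at which rho = 1: beyond it a single RM can no longer keep up."""
    return heartbeat_interval_s / service_time_s


def mean_response_time(nodes: int, heartbeat_interval_s: float, service_time_s: float) -> float:
    """M/M/1 mean time a heartbeat spends at the RM (queueing + service): s / (1 - rho)."""
    rho = rm_utilization(nodes, heartbeat_interval_s, service_time_s)
    if rho >= 1:
        return math.inf  # unstable queue: waiting time grows without bound
    return service_time_s / (1 - rho)
```

With a hypothetical 1 s heartbeat and 0.1 ms of RM work per heartbeat, the model saturates at about 10,000 nodes, and at half that size the mean response time is already double that of an idle RM. A simulation can then replace the Poisson assumption with the measured arrival and service distributions.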

Related Works - MTC

● Falkon (2007)
  ○ Fewer features
  ○ 487 vs 11 tasks/sec in Condor (2004)
● MATRIX (2013)
  ○ Fully distributed
  ○ Work stealing leads to better efficiency (18-82% vs 92-97%)
● CloudKon (CCGrid 2014)
  ○ Based on cloud services (IaaS and SaaS)
  ○ The only one to support 256 VMs (up to 1024)
  ○ Blames scalability limits on too many open TCP connections


Proposal

● Fewer features
● Distributed selfish load balancing by Adolphs & Berenbrink
● No global information
  ○ Fewer open connections
● ε-approximate Nash equilibrium (NE) convergence in O(ln(m/n)), for m tasks on n resources
  ○ Mathematically guaranteed to be fast
  ○ Scalable
● Can deal with different speeds and weights
  ○ Data-awareness
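The protocol of Adolphs & Berenbrink handles task weights and resource speeds. The sketch below is a deliberately simplified stand-in, assuming unit-weight tasks and identical resources; its migration rule (sample one resource uniformly, move with probability 1 - l_j/l_i, and only if the move helps) is written in the style of the earlier selfish load-balancing protocols by Berenbrink et al., but the function names and details are illustrative assumptions, not code from those papers. It does show why no global information is needed: each task reads only two loads per round.

```python
import random


def selfish_round(loads: list[int], rng: random.Random) -> list[int]:
    """One synchronous round of a random-sampling selfish migration rule.

    Every unit-weight task uses only two numbers: the load of its own resource
    (l_i) and the load of one resource j sampled uniformly at random (l_j).
    It moves only if that strictly lowers its own load (l_j + 1 < l_i), and then
    only with probability 1 - l_j / l_i, so that fewer tasks move at once.
    All decisions use the loads from the start of the round.
    """
    n = len(loads)
    new_loads = list(loads)
    for i, l_i in enumerate(loads):
        for _ in range(l_i):  # one independent decision per task on resource i
            j = rng.randrange(n)
            l_j = loads[j]
            if l_j + 1 < l_i and rng.random() < 1 - l_j / l_i:
                new_loads[i] -= 1
                new_loads[j] += 1
    return new_loads


def is_nash(loads: list[int]) -> bool:
    """Unit tasks on identical resources: no task gains by moving iff max - min <= 1."""
    return max(loads) - min(loads) <= 1


def balance(loads: list[int], seed: int = 0, max_rounds: int = 10_000) -> tuple[list[int], int]:
    """Run rounds until a Nash equilibrium is reached; return (loads, rounds used)."""
    rng = random.Random(seed)
    for rounds in range(max_rounds):
        if is_nash(loads):
            return loads, rounds
        loads = selfish_round(loads, rng)
    return loads, max_rounds
```

Starting from `[100, 0, 0, 0]`, roughly three quarters of the tasks leave the overloaded resource in the first round, and later rounds only fine-tune. Because a task moves only when it strictly improves, a balanced state is never disturbed.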

Progress

Measuring Hadoop v1 and v2 latency
● HiBench suite
● Nuvem USP Cloud, up to 64 VMs
● Experiment automation with Python
  ○ VM management
  ○ Hadoop management
  ○ Monitoring
  ○ Log parsing
  ○ Graphs
● Deadline: qualifying exam in August
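The log-parsing step of such an experiment pipeline could start as small as the sketch below. The input format (one `job_id submit_ms finish_ms` line per job, `#` for comments) is a hypothetical intermediate format, not Hadoop's or HiBench's actual log layout; extracting those timestamps from real logs is left out.

```python
import statistics
from typing import Iterable


def parse_latencies(lines: Iterable[str]) -> dict[str, float]:
    """Map job id -> latency in seconds from `job_id submit_ms finish_ms` lines."""
    latencies: dict[str, float] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        job_id, submit_ms, finish_ms = line.split()
        submit, finish = int(submit_ms), int(finish_ms)
        if finish < submit:
            raise ValueError(f"{job_id}: finishes before it is submitted")
        latencies[job_id] = (finish - submit) / 1000.0
    return latencies


def summarize(latencies: dict[str, float]) -> dict:
    """Summary statistics for a results table or graph (needs at least one job)."""
    values = sorted(latencies.values())
    return {
        "jobs": len(values),
        "mean_s": statistics.mean(values),
        "median_s": statistics.median(values),
        "max_s": values[-1],
    }
```

Keeping parsing separate from summarizing lets the same summaries be computed for Hadoop v1 and v2 runs and plotted side by side.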

The End

Thank you! Questions?

Links

● http://www.usp.br
● http://www.ime.usp.br
● http://ccsl.ime.usp.br
● http://www.choreos.eu
● http://ccsl.ime.usp.br/baile
● http://www.integrade.org.br
● Contact: cadu at ime.usp.br
