"erlang, webmail and hibari" at rakuten tech talk

Post on 08-May-2015

DESCRIPTION

Presentation materials for a Rakuten Tech Talk on August 24, 2010, covering an Erlang overview, WebMail development in Erlang, and the "Hibari" use case for GB-mailbox webmail.

TRANSCRIPT

People, Software, WebMail, BigData

Powered by Erlang & Functional Programming

Gemini Mobile Technologies, Inc.

August 24, 2010

GMT Erlang August 2010 - Rakuten Tech Talk 1/55

Agenda

• Introduction

• Erlang/OTP

• Why Erlang/OTP?

• WebMail Case Study

• Hibari Case Study

• What’s Next?

Introduction

Who is Gemini Mobile Technologies?

• Founded: July, 2001

• Offices: San Mateo, CA; Shibuya, Tokyo; Star City, Beijing

• Milestones:
  • 2003: Multimedia messaging service (MMS), Vodafone Japan
  • 2005: MMSC, Nextel International
  • 2006: MMSC, eMobile Japan
  • 2006: S!Town, Softbank Mobile
  • 2007: ExCast enterprise mail gateway, NTT docomo
  • 2008: eXplo(tm) service, China Unicom
  • 2008: Fax satellite gateway, NTT docomo
  • 2009: International MMS gateway, NTT docomo
  • 2010: WebMail, Japanese Mobile Carrier & Internet Provider
  • 2010: Hibari/BigData, Open Source Community

• Investors: Goldman Sachs, Ignite, Mizuho Capital, TokyoMUFJ, Nomura, Access, Aplix

• Erlang: deployed at Japanese, Chinese, & European telecoms, in use for 3 years

Erlang/OTP

Erlang is . . .

• General purpose programming language, runtime environment

• Originally written in Prolog, now self-hosting in own environment

• Functional language, with strict evaluation, single assignment, and dynamic typing

• Support for concurrency, multi-core CPU, network distribution, and fault tolerance

• Designed for soft-real-time, non-stop applications

Erlang/OTP

Open Telecom Platform is . . .

• Collection of libraries to support Erlang applications

• Standard support libraries: lists, trees, dictionaries, sets, files, queues, network sockets, time manipulation, generic servers & FSMs, basic mathematical funcs, string handling, timed events, . . .

• Error handling & logging, alarms, hot code upgrade/downgrade, process tree manipulation & supervision, . . .

• Protocol stacks for ASN.1, CORBA, HTTP, SSL, Megaco/H.248, . . .

• Mnesia distributed database

• Foreign language interfaces for C/C++, Java, TCP-based servers

Why Erlang?

• Software for Ericsson’s products (telecom switches, radio gear) was getting too complex: C, C++, Pascal, EriPascal, assembler, PLEX, . . . over 20 different languages used in production and research labs.

• Ad hoc mechanisms for field maintenance, bugfixes, upgrades.

• “There must be a better way” . . .
  • Must be high-level to provide productivity gains.
  • Must support concurrency, error recovery. Soft-realtime requires no back-tracking, very cheap thread model.
  • Hot code upgrades very desirable.

• Best-known product: AXD301 ATM switch with now 2 MLoC Erlang, plus another 1+ MLoC C and C++ (proprietary h/w drivers, third-party firmware/drivers and protocol stacks)

Erlang Timeline

http://www.erlang.org/course/history.html

• 1982-85: Language surveys

• 1985-86: Experiments with LISP, Prolog, Parlog.

• 1988: First Ericsson PBX product to use Erlang (in Prolog)

• 1989: Experimental rewrite of switch code, PLEX -> Erlang, 10x programmer efficiency. First non-Prolog-based interpreter.

• 1990: Conference papers, Erlang spreads to Bellcore & others.

• 1992: Ports to VxWorks, PC, Macintosh. First two “real” Ericsson products start using Erlang.

• 1993: Network distribution added. Spinoff organization to support Erlang development.

Erlang Timeline

continued . . .

• 1995: Ericsson AXE-N product collapses (non-Erlang). The replacement AXD starts with Erlang.

• 1998: Erlang banned for new products: it wasn’t C++ :(

• 1998: Erlang open-sourced, new companies spin off

• Today . . .
  • Erlang still used in Ericsson (despite ban): productivity is too high
  • AXD301 has 11% of world market (market leader), runs British Telecom’s country-wide ATM network, handles 30-40 million calls/week (avg 49-66 calls/sec), has experienced 31 milliseconds of downtime per year (9 “nines” reliability)

• Active and Growing Open Source Community
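The call-rate and downtime figures on this slide can be checked with a little arithmetic. The following is a minimal sketch, assuming a 7-day week and a 365-day year; the module and function names are illustrative only and do not come from the talk.

```erlang
-module(axd_figures).
-export([calls_per_sec/1, downtime_ms_per_year/1]).

%% Average calls per second for a given number of calls per week
%% (assumes a 7-day week).
calls_per_sec(CallsPerWeek) ->
    CallsPerWeek / (7 * 24 * 60 * 60).

%% Downtime in milliseconds per 365-day year for N "nines" of
%% availability, i.e. an unavailability of 10^-N.
downtime_ms_per_year(Nines) ->
    SecondsPerYear = 365 * 24 * 60 * 60,
    SecondsPerYear * 1000 * math:pow(10, -Nines).
```

With these definitions, 30 and 40 million calls per week come out at roughly 49.6 and 66.1 calls per second, and nine nines corresponds to roughly 31.5 milliseconds per year, which agrees with the rounded numbers quoted above.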

Erlang Overview

• Concurrency: User-space thread model (extremely cheap to create, switch contexts, destroy), now support for multiple CPUs and multi-core CPUs. Such threads are really “processes”.

• Distribution: All inter-process communication by message passing. Multiple Erlang VMs (virtual machines) communicate transparently via TCP. Same syntax used for message passing for intra- and inter-node communication.

• Robustness: All processes are isolated, no data sharing. Reliable detection of crashed processes, even on remote nodes.

• Hot code upgrade: old and new code can run simultaneously during code upgrade. Support for data structure changes, module dependencies, etc.

Erlang Overview

continued . . .

• External interfaces: via Erlang message passing over TCP, “standard” TCP & UDP protocols, UNIX pipes, shared library API interface.

• Portable: Same VM runs on Linux & UNIX, Windows, Macintosh, VxWorks. Message passing between heterogeneous systems not a problem.

• Many programming errors avoided by: garbage collected data structures, single-assignment variables, robust exception handling and inter-node communication

Concurrency Oriented Programming

• Utterly independent processes: imagine they’re on different machines!

• Process semantics: No data sharing, copy-everything message passing

→ Sharing means: inefficient (distribution is Hard), complicated (mutexes, condition variables, write barriers, etc.)

• No penalty for massive parallelism (e.g. tens of thousands of processes)

• Each process has an unforgeable name

• To send a message, the recipient’s process name is required

• Message passing semantics are unreliable, “send and pray”

• Reliable monitoring of remote processes: when and why

• No unavoidable penalty for distribution

• Same behavior on any hosted OS
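The "when and why" of process monitoring can be illustrated with a small sketch. It assumes a throwaway worker that exits with the made-up reason out_of_coffee; the module name is hypothetical. spawn_monitor/1 creates the process and the monitor in one step, and the 'DOWN' message carries the exit reason.

```erlang
-module(monitor_demo).
-export([run/0]).

%% Spawn a process that exits with a known reason, monitor it, and
%% return the reason reported in the 'DOWN' message.
run() ->
    {Pid, Ref} = spawn_monitor(fun() -> exit(out_of_coffee) end),
    receive
        {'DOWN', Ref, process, Pid, Reason} ->
            {down, Reason}
    after 1000 ->
        %% Safety net so the sketch never blocks forever.
        timeout
    end.
```

Calling monitor_demo:run() should return {down, out_of_coffee}: the monitoring process learns both that the worker died and why.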

Why use a Concurrency Oriented Programming language?

• The world is parallel. And distributed.

• Things fail.

• The biggest challenge is using the proper degree of parallelism in a COP program . . . but it’s difficult to err when processes are cheap.

• Programs are automatically scalable: if it works on 1 CPU, it works on many.

• Programs are automatically robust when a process fails, no matter where the process is located.

See Appendix for additional information.

Erlang in 11 Examples

“One minute per example” text courtesy of Joe Armstrong

• Sequential Erlang: 5 examples

• Concurrent Erlang: 2 examples

• Distributed Erlang: 1 example

• Fault-tolerant Erlang: 2 examples

• Bit syntax: 1 example

See Appendix for additional information.

Sequential: Factorial

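%% Note: OTP's stdlib already ships a module named math, so pick
%% another module name when compiling this outside of a slide.
-module(math).
-export([fac/1]).

fac(N) when N > 0 -> N * fac(N-1);
fac(0) -> 1.

> math:fac(25).
15511210043330985984000000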

Sequential: Binary Tree Search

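%% A tree node is {Key, Val, Left, Right}; an empty tree is nil.
%% Unused variables are prefixed with _ to avoid compiler warnings.
lookup(Key, {Key, Val, _, _}) ->
    {ok, Val};
lookup(Key, {Key1, _Val, Left, _Right}) when Key < Key1 ->
    lookup(Key, Left);
lookup(Key, {_Key1, _Val, _Left, Right}) ->
    lookup(Key, Right);
lookup(_Key, nil) ->
    not_found.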

Sequential: Append, Sort, Adder

%% append
append([H | T], L) -> [H | append(T, L)];
append([], L) -> L.

%% sort
sort([Pivot | T]) ->
    sort([X || X <- T, X < Pivot]) ++
    [Pivot] ++
    sort([X || X <- T, X >= Pivot]);
sort([]) -> [].

%% adder
> Adder = fun(N) -> fun(X) -> X + N end end.
#Fun<...>
> G = Adder(10).
#Fun<...>
> G(5).
15

Concurrent: Spawn, Send and Receive

%% spawn
Pid = spawn(fun() -> loop(0) end),

%% send
Pid ! Message,
...

%% receive
receive
    Message1 ->
        Actions1;
    Message2 ->
        Actions2;
    ...
after Time ->
    TimeOutActions
end
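The spawn, send, and receive templates above can be put together in one small runnable module. This is only a sketch: the counter module, its message formats, and the 1000 ms timeout are assumptions made for illustration, not code from the talk.

```erlang
-module(counter).
-export([start/0, increment/1, value/1]).

%% spawn: start a process whose state is the loop argument.
start() ->
    spawn(fun() -> loop(0) end).

%% send: fire-and-forget message to the counter process.
increment(Pid) ->
    Pid ! increment,
    ok.

%% send + receive: ask for the current value and wait for the reply.
value(Pid) ->
    Pid ! {value, self()},
    receive
        {Pid, N} -> N
    after 1000 ->
        timeout
    end.

%% The server loop: each clause handles one message shape and then
%% recurses with the (possibly updated) state.
loop(N) ->
    receive
        increment ->
            loop(N + 1);
        {value, From} ->
            From ! {self(), N},
            loop(N)
    end.
```

Because messages between one pair of processes arrive in the order they were sent, calling increment/1 twice and then value/1 returns 2.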

Distributed Erlang

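...
true = net_kernel:connect_node(NodeName),
Pid1 = spawn(NodeName, Fun),
Pid2 = spawn(NodeName, Module, Func, ArgList),
%% is_process_alive/1 only accepts pids of the local node,
%% so ask the remote node to run the check.
true = rpc:call(NodeName, erlang, is_process_alive, [Pid1]),
...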

Fault Tolerance: catch/throw

...
case (catch foo(A, B)) of
    {abnormal_case1, Y} ->
        ...
    {'EXIT', Oops} ->
        ...
    Val ->
        ...
end,
...

foo(A, B) ->
    ...
    throw({abnormal_case1, ...})
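The three branches of the catch template can be exercised with a small runnable sketch. The function foo/1 and its three inputs are hypothetical stand-ins for the elided code; only the shape of the case expression follows the slide.

```erlang
-module(catch_demo).
-export([classify/1]).

%% Hypothetical function with three outcomes: a thrown term,
%% a run-time error, and a normal return value.
foo(thrown)  -> throw({abnormal_case1, oops});
foo(crashed) -> erlang:error(badarg);
foo(Other)   -> {ok, Other}.

%% catch turns a throw into the thrown term, and an error into
%% {'EXIT', {Reason, StackTrace}}; normal values pass through.
classify(Arg) ->
    case (catch foo(Arg)) of
        {abnormal_case1, Y} -> {thrown, Y};
        {'EXIT', Why}       -> {exit, Why};
        Val                 -> {normal, Val}
    end.
```

Here classify(thrown) returns {thrown, oops}, classify(crashed) returns {exit, {badarg, StackTrace}}, and any other argument X returns {normal, {ok, X}}. Newer code often prefers the try ... catch expression, which distinguishes throw, error, and exit explicitly.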

Fault Tolerance: monitor a process

...
process_flag(trap_exit, true),
Pid = spawn_link(fun() -> ... end),
receive
    {'EXIT', Pid, Why} ->
        ...
end
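A filled-in version of the template might look like the sketch below. The worker body, the exit reason worker_failed, and the timeout are assumptions chosen for illustration.

```erlang
-module(link_demo).
-export([run/0]).

%% Trap exits, link to a worker that fails, and report why it died.
run() ->
    %% process_flag/2 returns the previous setting, restored below.
    OldFlag = process_flag(trap_exit, true),
    Pid = spawn_link(fun() -> exit(worker_failed) end),
    Result = receive
                 {'EXIT', Pid, Why} -> {exited, Why}
             after 1000 ->
                 timeout
             end,
    process_flag(trap_exit, OldFlag),
    Result.
```

With trap_exit set, the linked worker's exit arrives as an ordinary message instead of killing the caller, so link_demo:run() returns {exited, worker_failed}.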

Parsing an IP datagram

-define(IP_VERSION, 4).
-define(IP_MIN_HDR_LEN, 5).

DgramSize = size(Dgram),
case Dgram of
    <<?IP_VERSION:4, HLen:4,
      SrvcType:8, TotLen:16, ID:16, Flgs:3,
      FragOff:13, TTL:8, Proto:8, HdrChkSum:16,
      SrcIP:32, DestIP:32, Body/binary>> when
      HLen >= 5, 4*HLen =< DgramSize ->
        OptsLen = 4*(HLen - ?IP_MIN_HDR_LEN),
        <<Opts:OptsLen/binary,Data/binary>> = Body,
        ...
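To see the bit syntax at work end to end, the fragment can be wrapped in a function and fed a hand-built datagram. This is a sketch under stated assumptions: the module name, the proplist return shape, and the sample datagram (a 20-byte header with a zero checksum that is never verified) are all invented for illustration.

```erlang
-module(ip_demo).
-export([parse/1, sample/0]).

-define(IP_VERSION, 4).
-define(IP_MIN_HDR_LEN, 5).

%% Parse an IPv4 datagram into a proplist, or return an error tuple.
parse(Dgram) when is_binary(Dgram) ->
    DgramSize = byte_size(Dgram),
    case Dgram of
        <<?IP_VERSION:4, HLen:4, _SrvcType:8, TotLen:16,
          _ID:16, _Flgs:3, _FragOff:13,
          TTL:8, Proto:8, _HdrChkSum:16,
          SrcIP:32, DestIP:32, Body/binary>>
          when HLen >= ?IP_MIN_HDR_LEN, 4 * HLen =< DgramSize ->
            %% Header length is counted in 32-bit words; anything
            %% beyond the 5 fixed words is options.
            OptsLen = 4 * (HLen - ?IP_MIN_HDR_LEN),
            <<Opts:OptsLen/binary, Data/binary>> = Body,
            {ok, [{total_len, TotLen}, {ttl, TTL}, {proto, Proto},
                  {src, SrcIP}, {dst, DestIP},
                  {opts, Opts}, {data, Data}]};
        _ ->
            {error, bad_datagram}
    end.

%% A made-up datagram: 10.0.0.1 -> 10.0.0.2, TTL 64, protocol 17,
%% no options, payload "hello". The checksum field is left as 0.
sample() ->
    Payload = <<"hello">>,
    TotLen = 20 + byte_size(Payload),
    <<4:4, 5:4, 0:8, TotLen:16, 1:16, 0:3, 0:13, 64:8, 17:8, 0:16,
      10, 0, 0, 1, 10, 0, 0, 2, Payload/binary>>.
```

Calling ip_demo:parse(ip_demo:sample()) yields an {ok, Fields} tuple whose data entry is <<"hello">>, while a binary that is too short to hold a header falls through to {error, bad_datagram}.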

Why Erlang/OTP?

A Killer App

In 2008, Gemini deployed its first commercial Erlang-based product . . . a high-performance “User Profile” storage server as part of a larger system.

What wasn’t selected?

• LDAP - persistent, fast . . . but no transactions

• RDBMS - persistent, transactions . . . but too slow

Why was Erlang selected?

• Mnesia - persistent, fast, and transactions

• plus many other benefits (programmable, high quality, and open source!)

and we haven’t looked back since . . . no regrets!

Why Erlang/OTP?

What have we learned?

Erlang and functional programming have taught us some good practices and lessons:

• lots of processes and message passing can be cheap

• shared and mutable data can be (are) evil

• side-effects can be (are) evil

• let it crash! . . . defensive programming is evil

• don’t (over) optimize too soon . . . the bottlenecks aren’t always where you expect

• keep it simple . . . less is more

• don’t be afraid to re-factor . . . when you have the right tools

• distributed systems can still be (are) difficult and complex

WebMail: Multi-Tier Architecture

20K Meter View

[Architecture diagram; labels: MOBILE, PC, ISP MTA, HTTP, SMTP/POP/IMAP, LDAP, O&M, FRONT API, AUTH API, CLIENT API, BACK API, DIRECTORY STORE, DATA STORE]

WebMail: Multi-Tier Architecture

10K Meter View

[Architecture diagram; labels: MOBILE, PC, ISP MTA, HTTP, SMTP/POP/IMAP, LDAP, O&M, M2FE I/F, AUTH I/F, M2CI I/F, M2FE AUTH I/F, M2FE JOBQ I/F, M2BE I/F, MNESIA, MNESIA, HIBARI]

WebMail: Erlang

What’s It Doing?

• All core processing for the “webmail” application

• JSON-RPC with the Web browser-based UI (based on UBF)

• HTTP and LDAP with authentication and proxy to full-text indexing services

• UBF for most inter-application communication

• Interface with C++ components for speed, legacy protocol support, and code re-use

• Application/Transaction logging and message tracing

• Hibari distributed, scalable key-value store for all persistent data

• Mnesia for job queuing and multi-indexed profile data

WebMail: Hibari

Key-Value Storage for (Almost) Everything

• Profile Store
  • User
  • Mail
  • Mail Incoming & Outgoing Filters
  • User Interface
  • External ISP

• Address Book Store
  • vCards - Singletons & Packs
  • Labels - Folders, Flags, and User-Defined

• Mail Store
  • Messages - Singletons & Packs
  • Message Summaries - Singletons & Packs
  • Meta Data - Next Uid, Quotas, . . .
  • Labels - Folders, Flags, and User-Defined

• Quota Policy Store

WebMail: Mnesia

Storage for Everything Else

• Subset of Profile Store
  • Indexing & retrieval by various attributes
  • The WebMail application keeps Mnesia and Hibari synchronized for provisioning, updates, and deprovisioning
  • The WebMail application uses Hibari as the master copy

• Job Queue
  • Outgoing mail, bounce messages, vacation messages, . . .
  • Notifications to external text indexer
  • Asynchronous mail deletion
  • Asynchronous user deprovisioning
  • . . .

Possible with Hibari-based storage, but Mnesia was easier (at the project start).

WebMail: Post (almost) Mortem

Stuff We’ll Repeat

• Erlang, the secret sauce

→ Ericsson’s support of Erlang/OTP is wonderful

• UBF, QuickCheck, & UBF+QuickCheck

→ Auto-compilation of QuickCheck generators from UBF contracts

• Test in various environments:

→ Exactly the same hardware as customer, on really old & slow hardware, and on a single box/laptop

• Automate everything possible: regression tests, performance tests, cluster setups, post-mortem log file gathering, . . .

• Document everything possible (with good tools): Git, AsciiDoc, Graphviz, “mscgen”

WebMail: Post (almost) Mortem

Stuff We Would Probably Do Differently

• Negotiate “less aggressive” schedule

• More hardware

• Always double check “X & Y” before customer tries doing “X & Y”

• Always revisit and clean up “initial” prototypes

• Better and “practical” code review by peers

• Better traffic models (for finding bottlenecks, garbage collection issues, . . . )

• 100% automated unit test and code coverage analysis

WebMail: Summary

• Technically, Erlang was a great fit for this large system.

→ Used another language (C++) whenever convenient.

• UBF is a very good tool for design, implementation, and testing phases of a large project.

• Combining UBF and QuickCheck was invaluable in finding bugs that otherwise would’ve been discovered later.

• It’s feasible to develop real-time apps on top of a distributed key-value database.

→ Hibari’s “strong consistency” support is a large advantage.

Hibari

What is Hibari?

• Hibari is a production-ready, distributed, key-value, big data store.

→ China Mobile and China Unicom - SNS
→ Japanese internet provider - GB mailbox webmail
→ Japanese mobile carrier - GB mailbox webmail

• Hibari uses chain replication for strong consistency, high-availability, and durability.

• Hibari has excellent performance especially for read and large value operations.

• Hibari is open-source software under the Apache 2.0 license.

Hibari

Environments

• Hibari runs on commodity, heterogeneous servers.

• Hibari supports Red Hat, CentOS, and Fedora Linux distributions.

→ Debian, Ubuntu, Gentoo, Mac OS X, and FreeBSD are coming soon.

• Hibari supports Erlang/OTP R13B04.

→ R14A is coming soon.

• Hibari supports Amazon S3, JSON-RPC-RFC4627, UBF/EBF/JSF and native Erlang client APIs.

→ Thrift is coming soon.

Hibari

Why Another NonSQL?

Durable updates: Every update is written and flushed to stable storage (fsync() system call) before sending acknowledgments to the client.

Consistent updates: After an update is acknowledged, no client can see an older version.

High Availability: Each key can be replicated multiple times. As long as one copy of the key survives, all operations on that key are permitted.
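The "flush before acknowledging" rule for durable updates can be sketched with the standard file module. This is not Hibari's implementation; the module name, the single-file log, and the return convention are assumptions made purely to illustrate the ordering of write, sync, and acknowledgment.

```erlang
-module(durable_demo).
-export([append/2]).

%% Append Bin to the file at Path and force it to stable storage
%% before acknowledging. Returns ok on success or {error, Reason}
%% if the file cannot be opened; a failed write or sync raises a
%% badmatch error instead of acknowledging.
append(Path, Bin) ->
    case file:open(Path, [append, raw, binary]) of
        {ok, Fd} ->
            try
                ok = file:write(Fd, Bin),
                ok = file:sync(Fd),   % flush to stable storage
                ok                    % only now acknowledge
            after
                file:close(Fd)
            end;
        {error, _} = Error ->
            Error
    end.
```

The point of the ordering is that a caller who receives ok can assume the bytes have been handed to stable storage, at the cost of one sync per update.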

Hibari

Why Another NonSQL?

Lockless API: No client operation requires locks. Optionally, Hibari supports “test-and-set” of each key-value pair via an increasing (enforced by the server) timestamp value.

Micro-transactions: Under limited circumstances, operations on multiple keys can be given transactional commit/abort semantics.
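The idea behind timestamp-based test-and-set can be sketched with an in-memory store. This is not the Hibari client API: the module, the function names, the use of a simple counter as the timestamp, and the atom none for "key absent" are all hypothetical.

```erlang
-module(tas_demo).
-export([new/0, set/4, lookup/2]).

%% The store maps Key -> {Timestamp, Value}.
new() -> dict:new().

lookup(Key, Store) ->
    case dict:find(Key, Store) of
        {ok, {TS, Val}} -> {ok, TS, Val};
        error           -> key_not_exist
    end.

%% Update Key only if ExpectedTS matches the stored timestamp
%% (use the atom none for a key that should not exist yet).
%% The "server" side chooses the new timestamp, so it always grows.
set(Key, Val, ExpectedTS, Store) ->
    Current = case dict:find(Key, Store) of
                  {ok, {TS, _}} -> TS;
                  error         -> none
              end,
    case Current =:= ExpectedTS of
        true ->
            NewTS = next_ts(Current),
            {ok, NewTS, dict:store(Key, {NewTS, Val}, Store)};
        false ->
            {error, {ts_mismatch, Current}}
    end.

next_ts(none) -> 1;
next_ts(TS)   -> TS + 1.
```

A writer that read timestamp 1 and tries to update after someone else has already moved the key to timestamp 2 gets {error, {ts_mismatch, 2}} and can re-read and retry, so no lock is ever held.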

Hibari

Overview - Chain Replication [diagram slide]

Hibari

Misc - Chain Balancing [diagram slide]

Hibari

Network Partition - Admin Server [diagram slide]

Hibari

Network Partition - Chains [diagram slide]

Hibari

Network Partition - Clients [diagram slide]

Hibari

Why Erlang/OTP?

• Functional

• Concurrency and Distribution

• Robustness

• Hot code and incremental upgrade

• Tools

→ Development, analysis, production support, . . .

• Efficiency and Productivity

→ Small teams make big impact.

• Ericsson’s support of Erlang/OTP is wonderful

Everything you need to build robust, high performance distributed systems!

What’s Next?

• WebMail
  • improving the end-user’s experience
  • expanding the system’s capacity
  • adding new and valuable features and services

• Hibari
  • Benchmarking - YCSB performance test
  • Thrift and Cassandra API
  • Hadoop map/reduce integration
  • . . .

• Community Building
  • Erlang and Functional Programming
    → UBF hands-on workshop(s)
  • Hibari and BigData
    → Hibari hands-on workshop(s)
    → Application developer workshop(s)

Work Hard, Work Smarter, Have Fun

Thank You

http://www.erlang.org/

http://www.geminimobile.com/

http://www.geminimobile.jp/

http://hibari.sourceforge.net/

http://github.com/norton/ubf

http://github.com/norton/ubf-jsonrpc

http://github.com/norton/ubf-bertrpc

Feedback, Contributors Wanted: hibari@geminimobile.com

Appendix

Additional Slides

Concurrency Oriented Programming

Java And COP

“The only safe way to execute multiple applications, written in the Java programming language, on the same computer is to use a separate JVM for each of them, and to execute each JVM in a separate OS process. This introduces various inefficiencies in resource utilization, which downgrades performance, scalability, and application startup time.”

– Czajkowski & Daynes, Sun Microsystems

JSR-000121, Application Isolation API

JSR-000121, Application Isolation API, appears (?) to implement such process separation and inter-object communication. It defines:

• 11 classes, 78 methods (not including constructors), and 3 exceptions.

• Does not directly address inter-machine communication.

• Does not directly address debugging and profiling issues.

• “Links” are used for communication between “isolates”. However . . .

“To maintain isolation, Links provide only "data" passing facilities; normal Java Objects cannot be shared by passing them. However, a limited number of object types may be passed, including byte arrays, strings, isolates, and links themselves.”

C/C++ and COP

• No, neither is even close to being a COP.

• No processes, no memory isolation, non-portable, no GC, . . .

• Pipes, files, FIFOs, UNIX domain sockets, TCP/UDP sockets, . . .

• Advantage: You have complete freedom to create the ideal solution.

• Disadvantage: You have complete freedom to create the ideal solution.

Is Erlang a COP?

• Mostly.

• It’s possible to “forge” an Erlang process name
  • The “E” language uses crypto for provably-difficult-to-forge process naming. (Ask Google. . . )
  • Very useful for debugging, almost never used by any production system.
  • Would be possible to remove feature from local VM, but would be very difficult to discriminate between “legit” vs. “forged” PIDs received from remote nodes.

• Better security policies are needed for WAN-scale distribution.

• Robust failure handling is almost all there, but programmer input is slightly more than the COP ideal.

Appendix

Additional Slides

Erlang Examples

Behaviors: A generic server

A universal client/server, with hot code swapping.

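%% Client side: send a tagged request and wait for the matching reply.
rpc(A, B) ->
    Tag = make_ref(),    % unique reference (make_ref/0 is the BIF)
    A ! {rpc, self(), Tag, B},
    receive
        {Tag, Val} -> Val
    end.

%% Server side: Fun holds the behavior, Data holds the state.
server(Fun, Data) ->
    receive
        {new_fun, Fun1} ->
            %% Hot swap: keep the state, replace the behavior.
            server(Fun1, Data);
        {rpc, From, ReplyAs, Q} ->
            {Reply, Data1} = Fun(Q, Data),
            From ! {ReplyAs, Reply},
            server(Fun, Data1)
    end.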
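A short usage sketch shows the behavior swap in action. The module name, the Add and Mul funs, and the timeout are assumptions for illustration; the rpc and server functions follow the slide, with make_ref/0 creating the reply tag.

```erlang
-module(gs_demo).
-export([run/0]).

rpc(Pid, Q) ->
    Tag = make_ref(),
    Pid ! {rpc, self(), Tag, Q},
    receive
        {Tag, Val} -> Val
    after 1000 ->
        timeout
    end.

server(Fun, Data) ->
    receive
        {new_fun, Fun1} ->
            server(Fun1, Data);
        {rpc, From, ReplyAs, Q} ->
            {Reply, Data1} = Fun(Q, Data),
            From ! {ReplyAs, Reply},
            server(Fun, Data1)
    end.

%% Start with an adding server, then swap in a multiplying one
%% without losing the accumulated state.
run() ->
    Add = fun(N, Total) -> {Total + N, Total + N} end,
    Mul = fun(N, Total) -> {Total * N, Total * N} end,
    Pid = spawn(fun() -> server(Add, 0) end),
    R1 = rpc(Pid, 5),          % 0 + 5
    R2 = rpc(Pid, 5),          % 5 + 5
    Pid ! {new_fun, Mul},      % swap behavior, keep state
    R3 = rpc(Pid, 3),          % 10 * 3
    exit(Pid, kill),           % clean up the demo process
    {R1, R2, R3}.
```

gs_demo:run() returns {5, 10, 30}: the state survives the swap, and only the function applied to each request changes.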
