the challenges tomcat faces in high throughput … · biography • huxing zhang (张乎兴) •...

48
The Challenges Tomcat Faces in High Throughput Production Systems May 17, 2017 © 2017 Alibaba Middleware Group 1

Upload: duongkiet

Post on 28-Jul-2018

212 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

TheChallengesTomcatFacesinHighThroughputProductionSystems

May17,2017 ©2017AlibabaMiddlewareGroup 1

Page 2: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

Biography

• Huxing Zhang (张乎兴)• Software Engineer @ AlibabaMiddlewareGroup since 2014• Previously worked for DeNA, Tokyo, Japan• Apache Tomcat Committer since 2016• Core tomcat maintainer @ Alibaba Inc.

May17,2017 ©2017AlibabaMiddlewareGroup 2

Page 3: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

Outline• How we use Tomcat at Alibaba• High throughputweb applications• The challenges• Case studies• Summary• Q & A

May17,2017 ©2017AlibabaMiddlewareGroup 3

Page 4: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

Outline

• How we use Tomcat at Alibaba• High throughputweb applications• The challenges• Case studies• Summary• Q & A

May17,2017 ©2017AlibabaMiddlewareGroup 4

Page 5: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

Tomcat in Alibaba

• Small trial since 2009• Migration From JBoss to Tomcat

• Jboss uses Tomcat as its servlet container• Officially started in 2013

• Becomes the dominant servlet containerin 2015• Thousands of web applications• Hundreds of thousands of instances

• StatusQuo• 95% tomcat, 5% other• Mainly 7.0.x (support JDK 6+)• Few 8.0.x• Emerging 8.5.x because of micro-service (e.g. spring-boot)

May17,2017 ©2017AlibabaMiddlewareGroup 5

Page 6: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

DeploymentStandard

May17,2017 ©2017AlibabaMiddlewareGroup 6

• Startup script features• Per startup log rotation for catalina.out• Smart auto-fix for incorrect JVM args• CATALINA_BASE preparation• Tomcat startup status detection

One web applicationper Tomcat instance

Nginxconf

Startup script

Tomcat conf

AppName.war

NginxConf

TomcatConf

JVMArgs

HealthCheck

StartupHooks

Nginxconf

Tomcat conf

AppName.war

Deployment

Platform

Page 7: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

Monitoring, diagnostics and development tools• Tomcat Monitor (RESTful service)• Metrics for OS, JVM, Tomcat, Middleware,and application• Classloading: locatewhich jar file a class is loaded from• Thread: per thread CPU usage, thread state, etc.• Connector stats: tomcat thread pool statistics

• Diagnostics on production environment• https://github.com/oldmanpushcart/greys-anatomy• Much easier to use than BTrace

• Development tools• Package free tomcat Eclipse plugin

May17,2017 ©2017AlibabaMiddlewareGroup 7

Page 8: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

Other Usages that worth mentioning

• Cluster Session Management:TaobaoSession• Cookie + distributed key-value storage(Tair)

• Database Connection Pool: Druid• https://github.com/alibaba/druid• Performance, Monitoring,Extensibility

• Multi-tenancy• High density deployment for legacy web applications• Providing CPU/Memory isolation• Based on multi-tenant JVM

May17,2017 ©2017AlibabaMiddlewareGroup 8

Page 9: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

Outline• How we use Tomcat at Alibaba• High throughput web applications• The challenges• Case studies• Summary• Q & A

May17,2017 ©2017AlibabaMiddlewareGroup 9

Page 10: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

High throughput web application

• Taobao API Gateway for mobile devices• Interacting with hundredsof backend services• Handling millions of request per second

May17,2017 ©2017AlibabaMiddlewareGroup 10

API Gateway

Shopping Cart

MobileDevices Order

Inventory

Shipping

Page 11: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

Overall Architecture

• Nginx + Tomcat on the same machine• Security• Rate LimitingMay17,2017 ©2017AlibabaMiddlewareGroup 11

Connection & Load Balance

PC

Nginx Nginx Tomcat Database

Nginx

Nginx

Database

Database

Mobile

Nginx Tomcat

Nginx Tomcat

Nginx Tomcat

Page 12: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

Outline• How we use Tomcat at Alibaba• High throughputweb applications• The challenges• Case studies• Summary• Q & A

May17,2017 ©2017AlibabaMiddlewareGroup 12

Page 13: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

The challenges

• ConnectorSelection• Configuration Tweaking• NIO Connection & Concurrency Issues• Case study 1

• Security Issues• Case study 2

May17,2017 ©2017AlibabaMiddlewareGroup 13

Configurationtuning

Tomcat

Security

Concurrency

GCAsynchronization

Connection

MemoryLeak

Page 14: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

The challenges

• ConnectorSelection• Configuration Tweaking• NIO Connection & Concurrency Issues• Case study 1

• Security Issues• Case study 2

May17,2017 ©2017AlibabaMiddlewareGroup 14

Configurationtuning

Tomcat

Security

Concurrency

GCAsynchronization

Connection

MemoryLeak

Page 15: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

Architecture: BIO + Sync RPC

• One working thread per request• Can not scale up to thousands of requests per second• Low cpu utilization

May17,2017 ©2017AlibabaMiddlewareGroup 15

BIOConnector Servlet

Remote Service

Remote Service

Remote Service

UserRequest Nginx

Page 16: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

Architecture: APR + Async RPC

• Blocking on reading request headers• Non-blockingon waiting for the next request• Cons:

• Unexpected JVM crash• Maintenance cost: requires separate tcnative library

May17,2017 ©2017AlibabaMiddlewareGroup 16

APRConnector

AsyncServlet

Remote Service

Remote Service

Remote Service

UserRequest Nginx

Page 17: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

Architecture: NIO + Async RPC

• Non-blocking on reading request headers• Non-blocking on waiting for next request• Fully asynchronized

May17,2017 ©2017AlibabaMiddlewareGroup 17

NIOConnector

AsyncServlet

Remote Service

Remote Service

Remote Service

UserRequest Nginx

Page 18: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

The challenges

• ConnectorSelection• Configuration Tweaking• NIO Connection & Concurrency Issues• Case study 1

• Security Issues• Case study 2

May17,2017 ©2017AlibabaMiddlewareGroup 18

Configurationtuning

Tomcat

Security

Concurrency

GCAsynchronization

Connection

MemoryLeak

Page 19: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

Tweaking Tomcat

• Connector configuration• Do not change if you really encounter some issue• Read the documentation carefully

• access log disabled• Availableon Nginx side

May17,2017 ©2017AlibabaMiddlewareGroup 19

Configuration Default Recommended Note

ConnectionTimeout 20s Decrease

maxThreads 200 Increase

acceptCount (backlog) 100 Increase min(acceptCount , /proc/sys/net/core/somaxconn)

processorCache 200 Increase max(maxThreads, concurrent connection)

Page 20: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

The challenges

• ConnectorSelection• Configuration Tweaking• NIO Connection & Concurrency Issues• Case study 1

• Security Issues• Case study 2

May17,2017 ©2017AlibabaMiddlewareGroup 20

Configurationtuning

Tomcat

Security

Concurrency

GCAsynchronization

Connection

MemoryLeak

Page 21: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

Tomcat NIO revisit

1. Acceptor accepts a socket2. Acceptor retrieves a nioChannel object from cache3. Poller thread registers nioChannel to its selector

and selects for IO events4. Poller assigns the nioChannel to one of the worker

threads to process the request5. SocketProcessor finishes processing the request

and returns the nioChanel

May17,2017 ©2017AlibabaMiddlewareGroup 21

Acceptor

User Request

Poller0 Poller1

SocketProcessor

SocketProcessor

SocketProcessor

nioChannel

Page 22: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

Common NIO Issues

• Acceptor stops accepting new request• Connection leakage• Connection count leakage• nioChannel Cache pollutiondue to concurrency issues

• NPE during request processing• Caused by sendfile, default is on

May17,2017 ©2017AlibabaMiddlewareGroup 22

Page 23: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

Case study 1

• Tomcat 7.0.x• NIO connector• Throughput: 1000+ request/s• Tomcat Poller thread throws ConcurrentModificationException

• After a while Tomcat stops accepting new request

May17,2017 ©2017AlibabaMiddlewareGroup 23

Page 24: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

Troubleshooting

• Poller thread has its own selector• Shouldnot have concurrency issues• Who is modifying the SelectionKeys?

protected void timeout(int keyCount, boolean hasEvents) { ... //timeout

Set<SelectionKey> keys = selector.keys(); ...

for (Iterator<SelectionKey> iter = keys.iterator(); iter.hasNext();) {

SelectionKey key = iter.next(); ... } ...}

May17,2017 ©2017AlibabaMiddlewareGroup 24

Page 25: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

Digging into the JVM• The only entrance: sun.nio.ch.EpollSelectorImpl

protected void implRegister(SelectionKeyImpl ski) { ...

keys.add(ski);+ System.out.println("> " + Thread.currentThread().getName() + ", " + this);

}

protected void implDereg(SelectionKeyImpl ski) throws IOException { ...

keys.remove(ski);+ System.out.println("> " + Thread.currentThread().getName() + ", " + this);

...}

Issue NOT reproducible!May17,2017 ©2017AlibabaMiddlewareGroup 25

Page 26: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

Minimize the overhead

• 2 poller thread are modifying the same selector instance!

protected void implRegister(SelectionKeyImpl ski) { ...

keys.add(ski);+ System.out.println(Thread.currentThread().getName() + "> " + this);

}

protected void implDereg(SelectionKeyImpl ski) throws IOException { ...

keys.remove(ski);+ System.out.println(Thread.currentThread().getName() + "> " + this);

...}

http-nio-7001-ClientPoller-1> sun.nio.ch.EPollSelectorImpl@cc98f2c http-nio-7001-ClientPoller-1> sun.nio.ch.EPollSelectorImpl@5787ae61 http-nio-7001-ClientPoller-0> sun.nio.ch.EPollSelectorImpl@cc98f2c

May17,2017 ©2017AlibabaMiddlewareGroup 26

Page 27: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

JVM bug?

• Update to JDK 1.7 and the issue is remaining• Setting poller thread number to 1 prevents the issue from beingreproduced• NOT a JVM bug!• Where is the selection key being modified?java.lang.Exception: Stack trace

at sun.nio.ch.EPollSelectorImpl.implRegister(EPollSelectorImpl.java:171)at sun.nio.ch.SelectorImpl.register(SelectorImpl.java:133)at java.nio.channels.spi.AbstractSelectableChannel.register(AbstractSelectableChannel.java:180)at org.apache.tomcat.util.net.NioEndpoint$PollerEvent.run(NioEndpoint.java:897)at org.apache.tomcat.util.net.NioEndpoint$Poller.events(NioEndpoint.java:1039)at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:1155)at java.lang.Thread.run(Thread.java:662)

@Overridepublic void run() {

if ( interestOps == OP_REGISTER ) { try {

socket.getIOChannel().register(socket.getPoller().getSelector(),

SelectionKey.OP_READ, key); } catch (Exception x) {

log.error(“”, x); }

} else { // …

}//end if}//run

nioChannel registered tothe wrong selector?

May17,2017 ©2017AlibabaMiddlewareGroup 27

Can’t be changed during runtime

Page 28: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

socket.getPoller() != this?

• It is confirmed that socket’s poller is modified after being pushed inevent queue

if (socket.getPoller() != this) {System.out.println(“t: ” + Thread.currentThread().getName() + “, s: ” + socket.getPoller().getName();

}socket.getIOChannel().register(

socket.getPoller().getSelector(), SelectionKey.OP_READ, key);// …

)

t: http-nio-7001-ClientPoller-1, s: http-nio-7001-ClientPoller-0t: http-nio-7001-ClientPoller-0, s: http-nio-7001-ClientPoller-1t: http-nio-7001-ClientPoller-1, s: http-nio-7001-ClientPoller-0

May17,2017 ©2017AlibabaMiddlewareGroup 28

Page 29: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

Acceptor

PollerEvent0

PollerEvent1

Poller1Poller0

Suspicious Point

May17,2017 ©2017AlibabaMiddlewareGroup 29

nioChannel’s poller has been modified by another poller thread

Acceptor

PollerEvent0

Poller1Poller0

Duplicate nioChannelObject?

Page 30: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

Further suspicion• Duplicate nioChannel objects in nioChannels cacheprotected boolean setSocketOptions(SocketChannel socket) {

try { // … NioChannel channel = nioChannels.poll(); if ( channel == null ) {

// … channel = new NioChannel(socket, bufhandler);

} //…getPoller0().register(channel);

} catch (Throwable t) { // … return false;

} return true;

}

>>> poll but still found in queue, org.apache.tomcat.util.net.NioChannel@28baf3bf:java.nio.channels.SocketChannel[closed], ts: 1435720813750>>> offer but already found in queue, org.apache.tomcat.util.net.NioChannel@28baf3bf:java.nio.channels.SocketChannel[closed], ts: 1435720812885May17,2017 ©2017AlibabaMiddlewareGroup 30

Page 31: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

Why duplicate entries?

• Two threads are concurrently offering the same nioChannel to cache• Poller#timeoutdue to Async Timeout• Async Completeby Async RPC callback thread

public boolean processSocket(NioChannel socket, SocketStatus status, boolean dispatch, int where) {

...SocketProcessor sc = processorCache.poll();if ( sc == null ) sc = new SocketProcessor(socket,status, where);else sc.reset(socket,status, where);if ( dispatch && getExecutor()!=null ) getExecutor().execute(sc);

else sc.run(); ...}

thread: Thread[http-nio-7001-ClientPoller-1,5,main], where: Poller#processKeythread: Thread[http-nio-7001-ClientPoller-1,5,main], where: Poller#timeoutthread: Thread[HSF-CallBack-11-thread-2,10,main], where: Http11NioProcessor#actionInternal

May17,2017 ©2017AlibabaMiddlewareGroup 31

Page 32: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

Recap• Async Servlet timeout• 4s~15s

• Async RPC timeout• 3s~15s

• 2 SocketProcessor threads are trying tooffer nioChannel object to cacheconcurrently• Async timeout• Async complete

May17,2017 ©2017AlibabaMiddlewareGroup 32

Acceptor

User Request

Poller0 Poller1

SocketProcessor

SocketProcessor

… …

Page 33: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

Recap(cont.)

• The same nioChannel object hasbeen offered twice

• And it is highly probable that theobject has been offered twiceconsecutively!

May17,2017 ©2017AlibabaMiddlewareGroup 33

Acceptor

User Request

Poller0 Poller1

SocketProcessor

SocketProcessor

nioChannel

nioChannel

… …

Page 34: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

• The same nioChannel has been assignedto two different poller event queue• Poller thread 0 is processing timeout onthat object

• Poller thread 1 is registering keys to thatnioChannel

May17,2017 ©2017AlibabaMiddlewareGroup 34

Recap(cont.)Acceptor

User Request

Poller0 Poller1

SocketProcessor

SocketProcessor

…… …

nioChannel nioChannel

Page 35: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

• Poller thread 0 throwsConcurrentModification Exception and dies

May17,2017 ©2017AlibabaMiddlewareGroup 35

Recap(cont.)Acceptor

User Request

Poller0 Poller1

SocketProcessor

SocketProcessor

…… …

nioChannel nioChannel

…X

Page 36: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

• Acceptor does not know Poller thread 0has already died• Acceptor keeps assigning to Poller thread0 until it reaches maxConnection(10000)

May17,2017 ©2017AlibabaMiddlewareGroup 36

Recap(cont.)Acceptor

User Request

Poller0 Poller1

SocketProcessor

SocketProcessor

…… …

nioChannel

…X………

Page 37: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

• Acceptor stops accepting new request

May17,2017 ©2017AlibabaMiddlewareGroup 37

Recap(cont.)Acceptor

User Request

Poller0 Poller1

SocketProcessor

SocketProcessor

…… …

nioChannel

…X………

X

Page 38: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

Solution

• Tomcat has fixed a similar issue in 7.0.58• But not a complete fix

• It may also happen if handshake == -1• It means the socket has been processed by another thread

• Final fix• Allowingthe nioChannel to be offered exactly once• Using CAS to avoid lock overhead

• Available from7.0.64/8.0.25 onwards

May17,2017 ©2017AlibabaMiddlewareGroup 38

Page 39: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

The challenges

• ConnectorSelection• Configuration Tweaking• NIO Connection & Concurrency Issues• Case study 1

• Security Issues• Case study 2

May17,2017 ©2017AlibabaMiddlewareGroup 39

Configurationtuning

Tomcat

Security

Concurrency

GCAsynchronization

Connection

MemoryLeak

Page 40: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

Security Issues• Disable manager and host-manager• Disable jmx remote access• Reverse Proxy can defend Tomcat against some attack• e.g. SlowHTTP body attack• Nginx will collect entire http body before handing off to tomcat

• Special Case to high throughputweb applications• Cookie-basedmemory exhaustion

May17,2017 ©2017AlibabaMiddlewareGroup 40

Page 41: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

Case Study 2

• Symptom: MessageBytes object array has 13m instances, and takesup ~2G memory which cannot be garbage collected

• Who is holding the reference?

May17,2017 ©2017AlibabaMiddlewareGroup 41

Page 42: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

Who is holding the reference?

• Normally 1 request has only a few MessageBytes• Why 2k requests has 13m MessageBytes?May17,2017 ©2017AlibabaMiddlewareGroup 42

Page 43: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

ServerCookies

• One request has 2k server cookies• What’s inside the cookies?May17,2017 ©2017AlibabaMiddlewareGroup 43

Page 44: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

ServerCookies inside• Dynamically double the cookie array if current size is exceeded

• Never gets shrunk even after recycling• If the cookie string is small enough, it can produce a large number of servercookie objects

private ServerCookie addCookie() { if( cookieCount >= scookies.length ) {

ServerCookie scookiesTmp[]=new ServerCookie[2*cookieCount]; System.arraycopy( scookies, 0, scookiesTmp, 0, cookieCount); scookies=scookiesTmp;

} ServerCookie c = scookies[cookieCount]; if( c==null ) {

c= new ServerCookie(); scookies[cookieCount]=c;

} cookieCount++;

return c; }

May17,2017 ©2017AlibabaMiddlewareGroup 44

Page 45: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

The malicious attack

• Cookies are considered as one of the header• maxHeaderCount cannot defend

• Only maxHttpHeader effectively limits the number of cookies.• RFC6265 does not require cookie name and cookie value to be non-empty

May17,2017 ©2017AlibabaMiddlewareGroup 45

Cookie string maxHttpHeader # of bytes a cookie takes # of server cookie object

“a=b;a=b;a=b;…” 8k 4 2k

“a=;a=;a=;…” 8k 3 4k

“=;=;=;=;…” 8k 2 4k

4k cookies *2k request = 8m cookies

Page 46: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

Solution

• Add a maxCookieCount parameter to limit the max number of cookieallowed• Shrink the server cookie object array if limit is exceed• Fixed in• 9.0.xfor9.0.0.M10onwards• 8.5.xfor8.5.5onwards• 8.0.xfor8.0.37onwards• 7.0.xfor7.0.71onwards• 6.0.xfor6.0.46onwards

May17,2017 ©2017AlibabaMiddlewareGroup 46

Page 47: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

Summary

• Asynchronize yourweb application to improve throughput• NIO is relatively STABLE for high throughput productionuse• Check your version for known issues!

• Troubleshooting concurrency issues can be very tricky• A good tool is really helpful

• Upgrade your tomcat version to get latest updates!

May17,2017 ©2017AlibabaMiddlewareGroup 47

Page 48: The Challenges Tomcat Faces in High Throughput … · Biography • Huxing Zhang (张乎兴) • SoftwareEngineer@Alibaba Middleware Groupsince2014 • Previously workedfor DeNA,

Q & A• Contact:• [email protected][email protected]• Facebook: https://www.facebook.com/huxing.zhang• LinkedIn: https://www.linkedin.com/in/huxing-zhang

• Thanks!

May17,2017 ©2017AlibabaMiddlewareGroup 48