Site report: Tokyo

Tomoaki Nakamura

ICEPP, The University of Tokyo

2014/12/10 Tomoaki Nakamura 1

Update from the last year

No HW upgrade from the last year for Grid resources
- 2560 CPU cores (18.03 HS06/core)
- RAM (2GB/core for 1280 cores, 4GB/core for 1280 cores)
- No memory upgrade until the end of 2015 (as considered last year)
- 2000TB for pledged Disk (2014) and ~600TB for LocalGroupDisk

All service instances have been migrated to EMI3
- CREAM, DPM, BDII (site/top), Argus, gLexec-WN, APEL
- WMS, LB, MyProxy: can be decommissioned for ATLAS

Other service instances
- perfSONAR (latency 1G, bandwidth 1G, bandwidth 10G)
- Squid (condDB x 2 + CVMFS x 2)

Services for ATLAS have been deployed
- DPM-WebDAV: used for Rucio renaming, will be used for central deletion
- DPM-XrootD and FAX setup: connected with the Asia redirector
- Multi-core queue: 512 cores, 20% of resources, 64 static 8-core slots
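
The resource figures on this slide can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative and not part of the original slides.

```python
# Cross-check of the Grid resource figures quoted on this slide.
TOTAL_CORES = 2560
HS06_PER_CORE = 18.03

# Total CPU power of the farm in HS06.
total_hs06 = TOTAL_CORES * HS06_PER_CORE

# Multi-core queue: 64 static slots of 8 cores each.
multicore_cores = 64 * 8
multicore_fraction = multicore_cores / TOTAL_CORES

# RAM: 1280 cores with 2 GB/core and 1280 cores with 4 GB/core.
total_ram_gb = 1280 * 2 + 1280 * 4

print(f"Total CPU power: {total_hs06:.1f} HS06")
print(f"Multi-core queue: {multicore_cores} cores ({multicore_fraction:.0%})")
print(f"Total RAM: {total_ram_gb} GB")
```

The total agrees with the 46156.8 HS06 quoted as deployed CPU for 2014, and 512 cores is indeed 20% of 2560.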

FAX remote access

4TB / day = ~46 MB / sec
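
The conversion can be reproduced as follows. This is a minimal sketch that assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes); the function name is illustrative.

```python
# Convert a daily transfer volume into an average rate.
TB = 1e12                      # bytes per terabyte (decimal units, an assumption)
MB = 1e6                       # bytes per megabyte
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_to_rate_mb_s(tb_per_day: float) -> float:
    """Average rate in MB/s for a given volume in TB per day."""
    return tb_per_day * TB / SECONDS_PER_DAY / MB


rate = daily_volume_to_rate_mb_s(4)
print(f"{rate:.1f} MB/s")
```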

ASAP (ATLAS Site Availability Performance), all data

99.77%

Pledge for the next year and beyond

                2013                             2014                             2015
CPU pledge      16000 [HS06]                     20000 [HS06]                     24000 [HS06]
CPU deployed    43673.6 [HS06-SL5] (2560 cores)  46156.8 [HS06-SL6] (2560 cores)  -
Disk pledge     1600 [TB]                        2000 [TB]                        2400 [TB]
Disk deployed   2000 [TB]                        2000 [TB]                        -

For FY2015
- Increase the disk pledge by 400TB
- 528TB (8 servers) will be added to DPM by the end of Mar. 2015
- Total DPM capacity: 3168TB (~750TB for LocalGroupDisk)

End of 2015
- End of the current system
- Procurement work will start next spring
- If we can get 6TB HDDs, total storage capacity can be doubled in the 4th system
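
The FY2015 storage numbers can be checked against the 2015 disk pledge of 2400TB. The sketch below is only a consistency check; it assumes that LocalGroupDisk space does not count toward the pledge, which the slides do not state explicitly, and it treats the approximate "~750TB" as exact.

```python
# Consistency check of the FY2015 DPM storage figures (all values in TB).
ADDED_TB = 528             # capacity to be added by the end of Mar. 2015
ADDED_SERVERS = 8
TOTAL_AFTER_TB = 3168      # total DPM capacity after the addition
LOCALGROUPDISK_TB = 750    # approximate value from the slide ("~750TB")
PLEDGE_2015_TB = 2400      # disk pledge for 2015

per_server_tb = ADDED_TB / ADDED_SERVERS
capacity_before_tb = TOTAL_AFTER_TB - ADDED_TB

# Assumption: LocalGroupDisk is not counted toward the pledge.
available_for_pledge_tb = TOTAL_AFTER_TB - LOCALGROUPDISK_TB
headroom_tb = available_for_pledge_tb - PLEDGE_2015_TB

print(f"Capacity per new server: {per_server_tb:.0f} TB")
print(f"DPM capacity before the addition: {capacity_before_tb} TB")
print(f"Available for the pledge: {available_for_pledge_tb} TB")
print(f"Headroom over the 2015 pledge: {headroom_tb} TB")
```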

International network for Tokyo

[Figure: map of international network connections for Tokyo]
- Sites: TOKYO, ASGC, BNL, TRIUMF, NDGF, RAL, CC-IN2P3, CERN, CNAF, PIC, SARA, NIKHEF
- Routes and locations: Pacific, Atlantic, LA, WIX, OSAKA, Amsterdam, Geneva, Frankfurt
- Link labels: 10Gbps, 10Gbps, 40Gbps, 10x3 Gbps, 10x3 Gbps, 10 Gbps, dedicated line
- New line (10Gbps) since May 2013

Configuration for the LHCONE evaluation

[Figure: network configuration diagram]
- Switches and routers: MLXe32 (10G), Dell 8024 (10G) x 2, Dell 5448 (1G), Catalyst 6500 (10G), Catalyst 3750 (10G)
- Hosts: UI (GridFTP) x 2, perfSONAR (Latency), perfSONAR (Bandwidth), perfSONAR (Latency/Bandwidth)
- Networks: ICEPP (production) 157.82.112.0/21, ICEPP (LHCONE evaluation) 157.82.118.0/24
- Upstream: UTnet, SINET (IPv4/v6), LHCONE BGP peering
- Other labels: NY, DC, LA; link speeds 10Gbps and 1Gbps
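
The relation between the two address blocks in the diagram can be checked with Python's standard `ipaddress` module. This only verifies the address arithmetic; how the evaluation servers were actually separated from production is not described on the slide.

```python
import ipaddress

# Address blocks as labelled in the diagram.
production = ipaddress.ip_network("157.82.112.0/21")
evaluation = ipaddress.ip_network("157.82.118.0/24")

# Size and range of the production block.
print(production.num_addresses)
print(production[0], production[-1])

# Is the LHCONE evaluation block contained in the production block?
evaluation_inside_production = evaluation.subnet_of(production)
print(evaluation_inside_production)
```

The /24 used for the LHCONE evaluation lies inside the production /21, so it is a more specific prefix of the same block.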

Stability on packet loss (CC-IN2P3)

Packet loss directly affects the transfer rate.

Fraction of packet loss (NY vs. DC)

Comparable to each other.

Minimum latency (CC-IN2P3)

Useful to know the typical latency and stability.

Minimum latency (CC-IN2P3)

Originating from another group in the Univ. of Tokyo.

Distribution of Minimum latency (CC-IN2P3)

Distribution of Minimum latency (CC-IN2P3)

Originating from another group.

Mismeasurement.

Maximum latency (CC-IN2P3)

Useful to find problems.

Maximum latency (CC-IN2P3)

Also has spikes.

Additional periodic noise.

Distribution of Maximum latency (CC-IN2P3)

Distribution of Maximum latency (CC-IN2P3)

Discrepancy due to the periodic noise.

Also for the other sites

(US)

(FR)

• One of the perfSONAR instances in Tokyo seems to fall into a busy state once a day.

• It is independent of the source site.

• However, there are no significant errors in the system and service logs.

Maximum latency (masked by time)

Periodic noise can be cleaned up.
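
Masking by time can be sketched as below. The data and the one-hour busy window (03:00 to 04:00) are hypothetical and chosen only for illustration; the slides do not say at what time of day the noise appears or how the mask was implemented.

```python
from datetime import datetime, time


def mask_by_time(samples, start, end):
    """Drop samples whose time of day falls in [start, end).

    samples: list of (timestamp, latency_ms) tuples.
    The window must not wrap around midnight in this simple sketch.
    """
    return [(ts, value) for ts, value in samples
            if not (start <= ts.time() < end)]


# Hypothetical maximum-latency samples (ms); two fall into the busy window.
samples = [
    (datetime(2014, 1, 1, 2, 0), 130.0),
    (datetime(2014, 1, 1, 3, 30), 900.0),
    (datetime(2014, 1, 1, 5, 0), 131.0),
    (datetime(2014, 1, 2, 3, 45), 850.0),
    (datetime(2014, 1, 2, 12, 0), 129.5),
]

cleaned = mask_by_time(samples, time(3, 0), time(4, 0))
print(len(cleaned), max(value for _, value in cleaned))
```

Because the noise recurs at the same time every day, a fixed time-of-day mask removes it without touching spikes that occur at other times.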

Maximum latency by mask (CC-IN2P3)

Still remaining, but comparable.

Bandwidth measurement (CC-IN2P3 and CNAF)

Asymmetric: ~38 MB/s (incoming), ~28 MB/s (outgoing)

Symmetric, but unstable: ~34 MB/s (incoming), ~35 MB/s (outgoing)
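
One way to quantify "asymmetric" versus "symmetric" is the difference between the two directions relative to their mean. This metric is an assumption made for illustration, not something defined on the slides; the inputs are the approximate rates quoted above.

```python
def asymmetry(incoming_mb_s: float, outgoing_mb_s: float) -> float:
    """Difference between directions, normalised to their mean."""
    mean = (incoming_mb_s + outgoing_mb_s) / 2
    return (incoming_mb_s - outgoing_mb_s) / mean


def to_mbit_s(mb_s: float) -> float:
    """Convert MB/s to Mbit/s (8 bits per byte)."""
    return mb_s * 8


asymmetric_case = asymmetry(38, 28)   # ~38 MB/s in, ~28 MB/s out
symmetric_case = asymmetry(34, 35)    # ~34 MB/s in, ~35 MB/s out

print(f"{asymmetric_case:+.1%}  {symmetric_case:+.1%}")
print(f"{to_mbit_s(38):.0f} Mbit/s incoming in the asymmetric case")
```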

Minimum latency (CC-IN2P3 in 2014)

Minimum latency (CC-IN2P3 in 2014)

Spikes are gone.

Average value is split.

Latency in one day (CC-IN2P3)

Both production line via NY

Incoming

Outgoing

Load balancing somewhere in NY or GEANT?

Maximum latency (CC-IN2P3, 2014)

Some improvement in FR-Geneva?

Bandwidth measurement (latest data)

Still asymmetric: ~35 MB/s (incoming), ~24 MB/s (outgoing)

Symmetric, and very stable: ~32 MB/s (incoming), ~30 MB/s (outgoing)

Configuration for the LHCONE evaluation

[Figure: network configuration diagram]
- Switches and routers: MLXe32 (10G), Dell 8024 (10G) x 2, Dell 5448 (1G), Catalyst 6500 (10G), Catalyst 3750 (10G)
- Hosts: UI (GridFTP) x 2, perfSONAR (Latency), perfSONAR (Bandwidth), perfSONAR (Latency/Bandwidth)
- Networks: ICEPP (production) 157.82.112.0/21, ICEPP (LHCONE evaluation) 157.82.118.0/24
- Upstream: UTnet, SINET (IPv4/v6), LHCONE BGP peering
- Other labels: NY, DC, LA; link speeds 10Gbps and 1Gbps

LHCONE (EU sites) for all production servers

[Figure: network configuration diagram]
- Switches and routers: MLXe32 (10G), Dell 8024 (10G) x 2, Dell 5448 (1G), Catalyst 6500 (10G), Catalyst 3750 (10G)
- Hosts: UI (GridFTP) x 2, perfSONAR (Latency), perfSONAR (Bandwidth), perfSONAR (Latency/Bandwidth)
- Networks: ICEPP (production) 157.82.112.0/21, ICEPP (LHCONE evaluation) 157.82.118.0/24
- Upstream: UTnet, SINET (IPv4/v6), LHCONE BGP peering
- Other labels: NY, DC, LA; link speeds 10Gbps and 1Gbps

Nov. 11, 2014 (latency for CC-IN2P3)

Nov. 11, 2014 (latency for CNAF)

Nov. 11, 2014 (throughput for CC-IN2P3)

Nov. 11, 2014 (throughput for CNAF)

Dec. 7, 2014 (incoming B.W. is saturated)

User subscription of AOD via DaTri: physics_Egamma, 8TeV, all periods: ~150TB

Still ongoing today (running continuously for several days)
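
A rough lower bound on the duration of such a transfer can be estimated as follows. The sketch assumes a single 10 Gbps incoming link and decimal units; the `efficiency` parameter is a hypothetical knob for the usable fraction of the link and is not a measured value.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def transfer_days(volume_tb: float, link_gbps: float,
                  efficiency: float = 1.0) -> float:
    """Days needed to move volume_tb over a link of link_gbps.

    efficiency is the assumed usable fraction of the link (0 < efficiency <= 1).
    """
    bits = volume_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / SECONDS_PER_DAY


best_case = transfer_days(150, 10)        # link fully used by this transfer
half_share = transfer_days(150, 10, 0.5)  # hypothetical: half of the link

print(f"{best_case:.2f} days at full rate, {half_share:.2f} days at half rate")
```

Even with the link fully used, ~150TB needs more than a day, so a transfer lasting several days is plausible when the link is shared with other traffic.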

Breakdown from GridFTP log

Part of LHCONE contribution

Mainly FTS3 and direct transfer from multiple sites

10 min. bin

1 min. bin
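
The effect of the bin width on such a breakdown can be sketched as below. The record structure (completion time in seconds, bytes transferred) and the sample values are hypothetical and do not reproduce the real GridFTP log format; each transfer is attributed entirely to the bin in which it completed.

```python
from collections import defaultdict


def throughput_per_bin(transfers, bin_seconds):
    """Sum transferred bytes per time bin and convert to MB/s.

    transfers: iterable of (completion_time_s, nbytes) tuples.
    Returns {bin_start_s: average rate in MB/s}, sorted by bin start.
    """
    totals = defaultdict(int)
    for t, nbytes in transfers:
        totals[int(t // bin_seconds) * bin_seconds] += nbytes
    return {start: total / bin_seconds / 1e6
            for start, total in sorted(totals.items())}


# Hypothetical transfer records: (completion time in s, bytes).
transfers = [
    (30, 3_000_000_000),
    (90, 6_000_000_000),
    (400, 3_000_000_000),
    (700, 12_000_000_000),
]

per_10min = throughput_per_bin(transfers, 600)
per_1min = throughput_per_bin(transfers, 60)
print(per_10min)
print(per_1min)
```

The 10-minute bins show a flat average, while the 1-minute bins expose short peaks that the coarser binning averages out.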

Near future and Concerns

LHCONE
- Next for US and Canada
- And then for Asia (ASGC, IHEP)

Network bandwidth
- 2015: additional 10G from ICEPP to SINET? UTokyo is offering, but it depends on them.
- JFY2016: SINET will be upgraded (SINET5)
  • 100G for US (LA)
  • 20G for EU (reverse around)

EMI3
- End of full support: April 30, 2014
- End of standard updates: October 31, 2014
- End of security updates: April 30, 2015

Batch job system
- Torque/Maui: no more support, dynamic multi-core allocation is not effective
- HTCondor, SLURM or a commercial product (Univa GE, LSF)
