Site report: Tokyo
Tomoaki Nakamura
ICEPP, The University of Tokyo
2014/12/10 Tomoaki Nakamura 1
Update from the last year
No HW upgrade from the last year for Grid resources
- 2560 CPU cores (18.03 HS06/core)
- RAM: 2 GB/core for 1280 cores, 4 GB/core for 1280 cores
- No memory upgrade until the end of 2015 (was considered last year)
- 2000 TB of pledged disk (2014) and ~600 TB for LocalGroupDisk
All service instances have been migrated to EMI3
- CREAM, DPM, BDII (site/top), Argus, gLexec-WN, APEL
- WMS, LB, MyProxy: can be decommissioned for ATLAS
Other service instances
- perfSONAR (latency 1G, bandwidth 1G, bandwidth 10G)
- Squid (condDB x 2 + CVMFS x 2)
Services for ATLAS have been deployed
- DPM-WebDAV: used for Rucio renaming, will be used for central deletion
- DPM-XrootD and FAX setup: connected to the Asia redirector
- Multi-core queue: 512 cores, 20% of resources, 64 static 8-core slots
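The capacity and multi-core figures quoted above can be cross-checked with a few lines of arithmetic. This is only a sketch using the numbers from this slide; the variable names are illustrative.

```python
# Numbers taken from the slide; variable names are illustrative only.
cores = 2560
hs06_per_core = 18.03

# Total CPU capacity in HS06.
total_hs06 = cores * hs06_per_core
print(f"total capacity: {total_hs06:.1f} HS06")

# Multi-core queue: 64 static slots of 8 cores each.
mc_cores = 64 * 8
mc_share = mc_cores / cores
print(f"multi-core queue: {mc_cores} cores ({mc_share:.0%} of resources)")
```

The total agrees with the 46156.8 HS06-SL6 quoted for the deployed CPU, and 64 slots of 8 cores give the 512 cores (20%) of the multi-core queue.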
FAX remote access
4TB / day = ~46 MB / sec
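The conversion behind this number, as a minimal sketch. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes); the function name is illustrative.

```python
# Assumption: decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes).
TB = 10**12
MB = 10**6
SECONDS_PER_DAY = 24 * 60 * 60

def daily_volume_to_rate_mb_s(tb_per_day: float) -> float:
    """Convert a daily data volume in TB to an average rate in MB/s."""
    return tb_per_day * TB / SECONDS_PER_DAY / MB

rate = daily_volume_to_rate_mb_s(4)
print(f"{rate:.1f} MB/s")
```

With these units, 4 TB per day corresponds to an average of about 46 MB/s, as stated.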
ASAP (ATLAS Site Availability Performance), all data
99.77%
Pledge for the next year and beyond
                2013                              2014                              2015
CPU pledge      16000 [HS06]                      20000 [HS06]                      24000 [HS06]
CPU deployed    43673.6 [HS06-SL5] (2560 cores)   46156.8 [HS06-SL6] (2560 cores)   -
Disk pledge     1600 [TB]                         2000 [TB]                         2400 [TB]
Disk deployed   2000 [TB]                         2000 [TB]                         -
For FY2015
- Increase the pledge by 400 TB
- 528 TB (8 servers) will be added to DPM by the end of Mar. 2015
- Total DPM capacity: 3168 TB (~750 TB for LocalGroupDisk)
End of 2015
- End of this system
- Procurement work will start next spring
- If we can get 6 TB HDDs, the total storage capacity can be doubled in the 4th system
International network for Tokyo
[Figure: network map. Labels: TOKYO, OSAKA, LA, WIX, Amsterdam, Geneva, Frankfurt; Pacific, Atlantic. Sites: ASGC, BNL, TRIUMF, NDGF, RAL, CCIN2P3, CERN, CNAF, PIC, SARA, NIKHEF. Link labels: 10 Gbps, 10x3 Gbps, 40 Gbps, dedicated line, new line (10 Gbps) since May 2013.]
Configuration for the LHCONE evaluation
[Diagram: network configuration. Networks: ICEPP (production) 157.82.112.0/21, ICEPP (LHCONE evaluation) 157.82.118.0/24, UTnet, SINET. Network devices: MLXe32 (10G), Dell8024 (10G) x 2, Dell 5448 (1G), Catalyst 6500 (10G), Catalyst 3750 (10G). Hosts: UI (GridFTP) x 2, perfSONAR (latency), perfSONAR (bandwidth), perfSONAR (latency/bandwidth). Other labels: NY, DC, LA; IPv4/v6; LHCONE BGP peering. Link legend: 10 Gbps, 1 Gbps.]
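The two prefixes in the diagram are related: the LHCONE evaluation range lies inside the production range. A small check with Python's standard `ipaddress` module, using only the two prefixes from the slide (variable names are illustrative):

```python
import ipaddress

# Prefixes as given on the slide.
production = ipaddress.ip_network("157.82.112.0/21")
evaluation = ipaddress.ip_network("157.82.118.0/24")

# True if every address of the evaluation range is also in the production range.
is_inside = evaluation.subnet_of(production)
print(is_inside)

# Number of addresses covered by each prefix.
print(production.num_addresses, evaluation.num_addresses)
```

`subnet_of` requires Python 3.7 or later.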
Stability on packet loss (CC-IN2P3)
Directly affects the transfer rate.
Fraction of packet loss (NY vs. DC)
Comparable to each other.
Minimum latency (CC-IN2P3)
Useful to know the typical latency and stability.
Minimum latency (CC-IN2P3)
Originating from another group in Univ. of Tokyo.
Distribution of Minimum latency (CC-IN2P3)
Distribution of Minimum latency (CC-IN2P3)
Originating from another group.
Mismeasurement.
Maximum latency (CC-IN2P3)
Useful to find problems.
Maximum latency (CC-IN2P3)
Also has spikes.
Additional periodic noise.
Distribution of Maximum latency (CC-IN2P3)
Distribution of Maximum latency (CC-IN2P3)
Discrepancy due to the periodic noise.
Also for the other sites
(US)
(FR)
• One of the perfSONAR instances in Tokyo seems to fall into a busy state once a day.
• It is independent of the source site.
• But there are no significant errors in the system and service logs.
Maximum latency (masked by time)
Periodic noise can be cleaned up.
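One way such a time mask could work, as a sketch under assumptions: the busy period is taken to recur in a fixed daily window, and samples inside that window are dropped before the maximum is taken. The window (03:00 to 04:00), the function names, and the sample values are all hypothetical; the slides do not state the actual mask.

```python
from datetime import datetime, time

# Hypothetical daily window in which the perfSONAR host is assumed to be busy.
MASK_START = time(3, 0)
MASK_END = time(4, 0)

def is_masked(ts: datetime) -> bool:
    """Return True if the timestamp falls inside the daily mask window."""
    return MASK_START <= ts.time() < MASK_END

def max_latency(samples, masked=True):
    """Maximum latency over (timestamp, latency_ms) pairs, optionally masked."""
    values = [ms for ts, ms in samples if not (masked and is_masked(ts))]
    return max(values) if values else None

# Made-up samples for illustration only.
samples = [
    (datetime(2014, 11, 11, 2, 30), 141.0),
    (datetime(2014, 11, 11, 3, 15), 480.0),  # inside the mask window
    (datetime(2014, 11, 11, 5, 0), 143.5),
]
unmasked_max = max_latency(samples, masked=False)
masked_max = max_latency(samples, masked=True)
print(unmasked_max, masked_max)
```

A fixed window only works if the busy state recurs at the same time every day, which is what the once-a-day pattern suggests.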
Maximum latency by mask (CC-IN2P3)
Still remaining, but comparable.
Bandwidth measurement (CC-IN2P3 and CNAF)
Asymmetric: ~38 MB/s (incoming), ~28 MB/s (outgoing)
Symmetric, but unstable: ~34 MB/s (incoming), ~35 MB/s (outgoing)
Minimum latency (CC-IN2P3 in 2014)
Minimum latency (CC-IN2P3 in 2014)
Spikes were gone.
Average value is split.
Latency in one day (CC-IN2P3)
Both on the production line via NY
Incoming
Outgoing
Load balancing somewhere in NY or GEANT?
Maximum latency (CC-IN2P3, 2014)
Some improvement in FR-Geneva?
Bandwidth measurement (latest data)
Still asymmetric: ~35 MB/s (incoming), ~24 MB/s (outgoing)
Symmetric, and very stable: ~32 MB/s (incoming), ~30 MB/s (outgoing)
Configuration for the LHCONE evaluation
[Diagram: network configuration. Networks: ICEPP (production) 157.82.112.0/21, ICEPP (LHCONE evaluation) 157.82.118.0/24, UTnet, SINET. Network devices: MLXe32 (10G), Dell8024 (10G) x 2, Dell 5448 (1G), Catalyst 6500 (10G), Catalyst 3750 (10G). Hosts: UI (GridFTP) x 2, perfSONAR (latency), perfSONAR (bandwidth), perfSONAR (latency/bandwidth). Other labels: NY, DC, LA; IPv4/v6; LHCONE BGP peering. Link legend: 10 Gbps, 1 Gbps.]
LHCONE (EU sites) for all production servers
[Diagram: network configuration. Networks: ICEPP (production) 157.82.112.0/21, ICEPP (LHCONE evaluation) 157.82.118.0/24, UTnet, SINET. Network devices: MLXe32 (10G), Dell8024 (10G) x 2, Dell 5448 (1G), Catalyst 6500 (10G), Catalyst 3750 (10G). Hosts: UI (GridFTP) x 2, perfSONAR (latency), perfSONAR (bandwidth), perfSONAR (latency/bandwidth). Other labels: NY, DC, LA; IPv4/v6; LHCONE BGP peering. Link legend: 10 Gbps, 1 Gbps.]
Nov. 11, 2014 (latency for CCIN2P3)
Nov. 11, 2014 (latency for CNAF)
Nov. 11 (throughput for CCIN2P3)
Nov. 11 (throughput for CNAF)
Dec. 7, 2014 (incoming B.W. is saturated)
User subscription of AOD via DaTri (physics.Egamma, 8 TeV, all periods): ~150 TB
Still ongoing today (continuously for several days)
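A rough lower bound on the duration of such a transfer can be estimated as below. This is a sketch that assumes the saturated incoming link is a single 10 Gbps line (the speed shown in the network diagrams) and ignores protocol overhead and competing traffic; the function name is illustrative.

```python
def min_transfer_hours(volume_tb: float, link_gbps: float) -> float:
    """Lower bound on transfer time in hours at full line rate (decimal units)."""
    bits = volume_tb * 1e12 * 8          # TB -> bits
    seconds = bits / (link_gbps * 1e9)   # Gbps -> bits per second
    return seconds / 3600

# ~150 TB over an assumed 10 Gbps link.
hours = min_transfer_hours(150, 10)
print(f"{hours:.1f} h (~{hours / 24:.1f} days)")
```

Even at full line rate the subscription needs well over a day, so a transfer lasting several days with a saturated link is plausible.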
Breakdown from GridFTP log
Part of LHCONE contribution
Mainly FTS3 and direct transfer from multiple sites
10 min. bin
1 min. bin
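How a transfer log can be turned into rates with different bin widths, as a minimal sketch. The record format (timestamp in seconds, bytes transferred) and the sample values are hypothetical and are not the actual GridFTP log format; all bytes of a transfer are attributed to the bin containing its timestamp, which is a simplification.

```python
from collections import defaultdict

def rate_per_bin(records, bin_seconds):
    """Sum bytes per time bin and return {bin_start: mean rate in MB/s}."""
    totals = defaultdict(int)
    for ts, nbytes in records:
        bin_start = ts - ts % bin_seconds  # start of the bin containing ts
        totals[bin_start] += nbytes
    return {start: total / bin_seconds / 1e6
            for start, total in sorted(totals.items())}

# Hypothetical records: (timestamp in seconds, bytes transferred).
records = [(30, 6_000_000_000), (90, 3_000_000_000), (610, 12_000_000_000)]

one_min = rate_per_bin(records, 60)    # 1 min. bins
ten_min = rate_per_bin(records, 600)   # 10 min. bins
print(one_min)
print(ten_min)
```

Narrow bins show short bursts, while wide bins average them out, which is why the same data can look quite different in the two views.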
Near future and Concerns
LHCONE
- Next for US and Canada
- And then, for Asia (ASGC, IHEP)
Network bandwidth
- 2015: more 10G from ICEPP to SINET? UTokyo is offering, but it depends on them.
- JFY2016: SINET will be upgraded (SINET5)
  • 100G for US (LA)
  • 20G for EU (reverse around)
EMI3
- End of full support: April 30, 2014
- End of standard updates: October 31, 2014
- End of security updates: April 30, 2015
Batch job system
- Torque/Maui: no more support, no effective dynamic multi-core allocation
- HTCondor, SLURM, or a commercial product (UNIVA GE, LSF)