High-Availability Platform Design for Mobile Communication Network Equipment
李程远
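Outline
• Mobile network architecture and network element equipment
• What high availability means for mobile network equipment
• Design of the high-availability platform
  • System redundancy model
  • Redundancy mechanisms for internal messages
  • Networking and transport
  • Storage system
  • Overload protection and cleanup of residual resources
  • System upgrade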
Mobile Network Architecture and Network Element Equipment
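Mobile network architecture
[Figure: mobile network architecture. Elements shown: 2G BSS (BTS, BSC, TC), 3G RAN / RNS (BTS, RNC), an ATM/IP backbone, the CS-CN (MSC Server, MSS/GCS, MGW) connected to the PSTN, and the PS-CN (SGSN, GGSN, CPS). Interfaces labelled: Abis, Ater, A, Iub, Iur, Iu-CS, Iu-PS, Mc, Nb, Nc, Mb, Mg, Gn, Gm.]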
3GPP user-plane and control-plane protocol stacks
On every interface, the lower layers (Data Link, IP, and SCTP or UDP) form the Transport Network Layer.
• Iub: user plane Iub FP over UDP/IP; control plane NBAP over SCTP/IP
• Iur: user plane Iur FP over UDP/IP; control plane RNSAP over SCCP/M3UA/SCTP/IP
• Iu: user plane Iu UP with GTP (packet-switched) or RTP/RTCP (circuit-switched) over UDP/IP; control plane RANAP over SCCP/M3UA/SCTP/IP
Terminology
• NE (Network Element): According to the Telecommunications Act of 1996, the term "network element" means a facility or equipment used in the provision of a telecommunications service. Such term also includes features, functions, and capabilities that are provided by means of such facility or equipment, including subscriber numbers, databases, signaling systems, and information sufficient for billing and collection or used in the transmission, routing, or other provision of a telecommunications service.
• UE (User Equipment): any device used directly by an end user to communicate, such as a hand-held telephone, a laptop computer equipped with a mobile broadband adapter, or any other device.
• RAN (Radio Access Network): conceptually, the RAN sits between a device (a mobile phone, a computer, or any remotely controlled machine) and its core network (CN), and provides the connection to it. RAN types include GRAN (GSM RAN) and UTRAN (UMTS RAN).
• CN (Core Network): the central part of a telecommunication network, which provides various services to customers connected through the access network. One of its main functions is to route telephone calls across the PSTN.
Terminology
• Interfaces
  • Uu: the radio interface between the UE and the BTS.
  • Iub: the interface between the BTS and the RNC.
  • Iu: the interface connecting the RAN and the CN.
    • Iu-CS: the Iu interface to the circuit-switched domain
    • Iu-PS: the Iu interface to the packet-switched domain
  • Iur: the interface between RNCs, used for soft handover.
• User Plane: all information sent and received by the user, such as the coded voice in a voice call or the packets of an Internet connection, is transported via the user plane.
• Control Plane: carries all control signalling, including the application protocols (e.g., RANAP on Iu, RNSAP on Iur and NBAP on Iub) and the signalling bearers that transport the application protocol messages.
• Management Plane: carries the operations and administration traffic required for network management.
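Network element
[Figure: network element hardware form factors. A cabinet / rack (1800 (or 2200) × 600 × 600 mm) holds subracks of plug-in units; the alternative is a rack-mount box.]
Advanced Telecom Computing Architecture (ATCA): http://www.picmg.org/v2internal/newinitiative.htm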
Functional modules within a network element
[Figure: a network element with Iub and Iu interfaces (as in an RNC), built from Transport Units (Tran Unit), Management Units (MU), Signaling Processing Units (SPU), User Plane Units (UPU) and CCUs, interconnected by inter-processor switches. Iub and Iu traffic passes through the Tran Units; user-plane traffic and signaling traffic follow separate paths.
Tran Unit: Linux node with hardware acceleration. SPU: Linux node. UPU: DSP or SE node.
Protocol labels: IP/IPSec, UDP, SCTP, GTP, RTP, M3UA, SCCP, IPC, RANAP, NBAP, FP, MAC, RLC, RRC, Iu UP.]
What High Availability Means for Mobile Network Equipment
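Five-nines requirement
• Availability = MTTF / MTBF = MTTF / (MTTF + MTTR) = 99.999%, where MTTF is the mean time to failure, MTTR the mean time to repair, and MTBF = MTTF + MTTR the mean time between failures.
• This allows at most about 5 minutes 15 seconds of downtime per year.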
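The availability target can be turned into a downtime budget with a few lines of arithmetic. The sketch below assumes a 365-day year; the function names are illustrative, not from the slides. It shows that 99.999% leaves about 315 s per year, and it derives the largest MTTR that still meets five nines for a given MTTF by solving A = MTTF / (MTTF + MTTR) for MTTR.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a non-leap year


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """A = MTTF / (MTTF + MTTR) = MTTF / MTBF."""
    return mttf_hours / (mttf_hours + mttr_hours)


def max_downtime_seconds(avail: float) -> float:
    """Yearly downtime budget for a given availability."""
    return (1.0 - avail) * SECONDS_PER_YEAR


def max_mttr_hours(mttf_hours: float, avail: float = 0.99999) -> float:
    """Largest MTTR that still meets `avail`: MTTR = MTTF * (1 - A) / A."""
    return mttf_hours * (1.0 - avail) / avail


minutes, seconds = divmod(max_downtime_seconds(0.99999), 60)
print(f"{int(minutes)} min {seconds:.1f} s")  # 5 min 15.4 s
```

For example, with an MTTF of 10,000 hours, `max_mttr_hours(10000)` is about 0.1 h (roughly 6 minutes). Repair times that short suggest automatic switchover to redundant units rather than manual repair.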
Availability of different services
• User plane
• Control plane
• Management plane
High-Availability Platform Design
System redundancy model
[Figure: class diagram relating Cluster, Node, Service Group, Service Unit, Process and Proxied Component through one-to-many associations (multiplicities such as 1 : 1..* and 1 : 0..*).]
System redundancy model
• Service Group (SG) redundancy models
  • 2N: active / hot standby
  • N+M: active / cold standby
  • N-Way Active: all units active, load sharing
  • No redundancy
• State model (ITU-T X.731)
  • Administrative state: Locked / Unlocked / Shutting Down
  • Operational state: Enabled / Disabled
  • Usage state: Idle / Active / Busy
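The three X.731 state attributes can be combined into a simple check of whether a unit may take new work. This is a minimal sketch: the enum values follow X.731 terminology, while `can_accept_new_work` and `next_admin_state` are hypothetical helpers.

```python
from enum import Enum


class AdminState(Enum):
    LOCKED = "locked"
    UNLOCKED = "unlocked"
    SHUTTING_DOWN = "shutting down"


class OperState(Enum):
    ENABLED = "enabled"
    DISABLED = "disabled"


class UsageState(Enum):
    IDLE = "idle"      # not in use
    ACTIVE = "active"  # in use, with spare capacity
    BUSY = "busy"      # in use, no spare capacity


def can_accept_new_work(admin: AdminState, oper: OperState, usage: UsageState) -> bool:
    """New users are admitted only if unlocked, enabled and not busy."""
    return (admin is AdminState.UNLOCKED
            and oper is OperState.ENABLED
            and usage is not UsageState.BUSY)


def next_admin_state(admin: AdminState, usage: UsageState) -> AdminState:
    """Shutting down serves existing users only, and becomes locked once idle."""
    if admin is AdminState.SHUTTING_DOWN and usage is UsageState.IDLE:
        return AdminState.LOCKED
    return admin
```

Keeping "shutting down" separate from "locked" is what allows a graceful drain before a unit is taken out of service.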
System redundancy model
[Figure: 2N Service Group. Service Unit 0 on Node_0 and Service Unit 1 on Node_1 each run Process0 and Process1; the group's resources (Disk Resource, IP) are shown with the nodes. One view shows Service Unit 0 active with Service Unit 1 as hot standby; another shows Service Unit 1 active.]
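A 2N switchover can be modelled as promoting the hot standby and moving the group-level resources (disk, service IP) to its node. The classes below are a hypothetical in-memory sketch. A real availability manager would also isolate the failed unit and actually move the resources on the hosts.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServiceUnit:
    name: str
    node: str
    role: str = "standby"  # "active", "standby" (hot) or "failed"


@dataclass
class TwoNServiceGroup:
    units: List[ServiceUnit]
    resource_node: str = ""  # node currently holding the disk resource and service IP

    def active(self) -> Optional[ServiceUnit]:
        return next((u for u in self.units if u.role == "active"), None)

    def fail(self, name: str) -> None:
        """Mark an SU failed; if it was active, promote the hot standby."""
        failed = next(u for u in self.units if u.name == name)
        was_active = failed.role == "active"
        failed.role = "failed"
        if not was_active:
            return  # losing the standby only removes protection
        standby = next((u for u in self.units if u.role == "standby"), None)
        if standby is None:
            self.resource_node = ""  # no standby left: the service is down
            return
        standby.role = "active"
        self.resource_node = standby.node  # resources follow the active SU


group = TwoNServiceGroup(
    [ServiceUnit("SU0", "Node_0", "active"), ServiceUnit("SU1", "Node_1")],
    resource_node="Node_0",
)
group.fail("SU0")
print(group.active().name, group.resource_node)  # SU1 Node_1
```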
System redundancy model
[Figure: N+M Service Group. Service Units 0, 1 and 2 (active) run on Node_0 to Node_2; Service Units 3 and 4 (cold standby) run on Node_3 and Node_4. Every Service Unit runs Process0 and Process1.]
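In N+M, a cold standby holds no state, so it must load the failed unit's state before it takes over. This sketch assumes a hypothetical `load_state` callback and a dictionary of active assignments. Nothing is changed if loading fails.

```python
from typing import Callable, Dict, List


def n_plus_m_failover(
    active: Dict[str, str],
    standby: List[str],
    failed: str,
    load_state: Callable[[str, str], None],
) -> str:
    """Replace a failed active SU with the first cold standby.

    active     -- SU name -> workload it serves (hypothetical model)
    standby    -- idle cold-standby SU names, in preference order
    load_state -- called before the standby goes active; a cold standby
                  holds no state, so it must load the workload's state first
    Returns the name of the SU that took over.
    """
    if not standby:
        raise RuntimeError("no cold standby left")
    replacement = standby[0]
    workload = active[failed]
    load_state(replacement, workload)  # e.g. fetch from disk or a peer; may raise
    # Commit only after the state was loaded successfully.
    standby.pop(0)
    del active[failed]
    active[replacement] = workload
    return replacement


active = {"SU0": "w0", "SU1": "w1", "SU2": "w2"}
standby = ["SU3", "SU4"]
taker = n_plus_m_failover(active, standby, "SU1", lambda su, w: None)
print(taker, active)  # SU3 {'SU0': 'w0', 'SU2': 'w2', 'SU3': 'w1'}
```

After repair, the failed unit would typically rejoin the pool as a cold standby. The state-loading step is why N+M takeover is slower than a 2N hot-standby switchover.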
System redundancy model
[Figure: N-Way Active Service Group. Service Units 0 to 4 run on Node_0 to Node_4, all active; every Service Unit runs Process0 and Process1.]
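The slides do not say how N-Way Active units share load. One common option, used here as an assumption, is rendezvous (highest-random-weight) hashing on a key such as a UE identifier. With this scheme, when one unit fails only the UEs it was serving are redistributed, and every other UE keeps its unit.

```python
import hashlib
from typing import List


def _weight(unit: str, key: str) -> int:
    """Deterministic pseudo-random weight for a (unit, key) pair."""
    digest = hashlib.sha256(f"{unit}/{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def owner(key: str, units: List[str]) -> str:
    """Rendezvous hashing: the unit with the highest weight serves the key."""
    return max(units, key=lambda u: _weight(u, key))


units = [f"SU{i}" for i in range(5)]
ues = [f"ue-{n}" for n in range(1000)]
before = {ue: owner(ue, units) for ue in ues}

survivors = [u for u in units if u != "SU2"]  # SU2 fails
after = {ue: owner(ue, survivors) for ue in ues}
moved = [ue for ue in ues if before[ue] != after[ue]]
print(all(before[ue] == "SU2" for ue in moved))  # True
```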
System redundancy model
[Figure: No Redundancy Service Groups 0, 1 and 2, each containing a single active Service Unit on its own node (Node_0 to Node_2) running Process0 and Process1.]
System redundancy model
• Different services use different redundancy models:
  • 2N: Management Unit, Transport Unit
  • N+M: Signaling Processing Unit
  • N-Way Active: UE-specific unit
  • No redundancy: User Plane Proxy
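Redundancy mechanisms for internal messages
• Address management for internal messages
  • Each Service Unit has two addresses at the same time:
    • a physical address
    • a logical address
  • The system maintains a mapping table between physical and logical addresses.
  • The mapping table is updated whenever a Service Unit changes state.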
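The logical-to-physical mapping table can be sketched as a dictionary that is updated from Service Unit state-change notifications. Senders address the logical name and never need to know which node is currently active. The class name and the address formats are hypothetical.

```python
from typing import Dict


class AddressMap:
    """Logical address of a service -> physical address of its active SU."""

    def __init__(self) -> None:
        self._active: Dict[str, str] = {}

    def on_su_state_change(self, logical: str, physical: str, state: str) -> None:
        """Called by the availability manager whenever an SU changes state."""
        if state == "active":
            self._active[logical] = physical
        elif self._active.get(logical) == physical:
            # The SU that owned this logical address is no longer active.
            del self._active[logical]

    def resolve(self, logical: str) -> str:
        try:
            return self._active[logical]
        except KeyError:
            raise LookupError(f"no active SU for {logical!r}") from None


table = AddressMap()
table.on_su_state_change("SPU-1", "node0/su0", "active")
table.on_su_state_change("SPU-1", "node0/su0", "failed")  # switchover starts
table.on_su_state_change("SPU-1", "node1/su1", "active")  # standby promoted
print(table.resolve("SPU-1"))  # node1/su1
```

Removal is guarded by the current owner, so a late "standby" notice from the old active unit cannot erase the new unit's entry.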
Redundancy mechanisms for internal messages
• Input synchronization of messages
[Figure: two 2N Service Groups, Group 1 (Service Unit 0 active, Service Unit 1 hot standby) and Group 2 (Service Unit 2 active, Service Unit 3 hot standby), each unit running Process0 and Process1; the pair of groups is drawn twice to show message flow between them.]
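The arrows in the figure did not survive, so the following reading of input synchronization is an assumption: every input message for a 2N group is delivered to both the active and the hot-standby unit. Both units update their state, but only the active unit produces output. This only works if message processing is deterministic on both units. All names are hypothetical.

```python
from typing import Dict, List, Tuple

Message = Tuple[str, str]  # (UE id, new state): a hypothetical message format


class Replica:
    def __init__(self, name: str, active: bool) -> None:
        self.name = name
        self.active = active
        self.state: Dict[str, str] = {}
        self.outbox: List[Tuple[str, str]] = []

    def handle(self, msg: Message) -> None:
        ue, value = msg
        self.state[ue] = value  # both replicas apply the same input
        if self.active:
            self.outbox.append(("ack", ue))  # only the active unit replies


def deliver(group: List[Replica], msg: Message) -> None:
    """Input synchronization: send the same message to every SU of the 2N group."""
    for replica in group:
        replica.handle(msg)


su0, su1 = Replica("SU0", active=True), Replica("SU1", active=False)
deliver([su0, su1], ("ue-7", "connected"))
print(su0.state == su1.state, su0.outbox, su1.outbox)  # True [('ack', 'ue-7')] []
```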
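Redundancy mechanisms for internal messages
• Data synchronization between modules
  • When data is synchronized:
    • 2N Service Group: when the standby Service Unit restarts (usually after a switchover), it fetches data from the active Service Unit.
    • N+M Service Group: during a controlled switchover, the standby Service Unit fetches data from the active Service Unit and then changes from standby to active.
    • Whenever loss of synchronization is detected.
  • Synchronization time and the resulting service interruption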
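Detecting loss of synchronization and falling back to a full copy from the active unit can be sketched with a state digest. The digest scheme, the JSON-serializable state and the helper names are assumptions. The copy step is where the service interruption comes from, because the standby cannot take over until the copy finishes.

```python
import copy
import hashlib
import json
from typing import Any, Dict


def digest(state: Dict[str, Any]) -> str:
    """Canonical JSON digest, so equal states give equal digests
    (assumes JSON-serializable values)."""
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()


def resync_if_needed(active: Dict[str, Any], standby: Dict[str, Any]) -> bool:
    """Compare digests; on mismatch, replace the standby's data with a full
    copy of the active's. Returns True if a resynchronization happened."""
    if digest(active) == digest(standby):
        return False
    standby.clear()
    standby.update(copy.deepcopy(active))  # standby cannot take over until this completes
    return True


active = {"ue-1": {"rab": 3}}
standby = {}
print(resync_if_needed(active, standby), standby)  # True {'ue-1': {'rab': 3}}
```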
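Networking and transport
• Network redundancy planning
  • Redundancy of network interfaces
    • User-plane / management-plane network interfaces
    • Control-plane network interfaces
  • Interface failure detection and switchover
• QoS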
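Networking and transport: user-plane / management-plane network interfaces
[Figure: an NE in which Tran Unit 0 and Tran Unit 1 sit behind a single Service IP.]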
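Networking and transport: control-plane network interfaces
[Figure: an NE whose Tran Unit 0 terminates an SCTP association on two addresses, IP_1 and IP_2, giving a primary path and a secondary path.]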
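The primary/secondary path behaviour of a multihomed SCTP association can be modelled with a per-destination error counter. The model follows the path-management rules of RFC 4960: a path is marked inactive once its error count exceeds Path.Max.Retrans (recommended default 5), and an acknowledgement clears the count. This is a simplified model of what the SCTP stack does, not code that configures one, and the class name is hypothetical.

```python
from typing import Dict, List, Optional

PATH_MAX_RETRANS = 5  # RFC 4960 recommended default for Path.Max.Retrans


class MultihomedAssociation:
    def __init__(self, primary: str, secondary: str) -> None:
        self.paths: List[str] = [primary, secondary]  # preference order
        self.errors: Dict[str, int] = {p: 0 for p in self.paths}
        self.active: Dict[str, bool] = {p: True for p in self.paths}

    def on_timeout(self, path: str) -> None:
        """Retransmission or heartbeat timeout on this destination address."""
        self.errors[path] += 1
        if self.errors[path] > PATH_MAX_RETRANS:
            self.active[path] = False

    def on_ack(self, path: str) -> None:
        """An acknowledgement (including HEARTBEAT-ACK) clears the error count."""
        self.errors[path] = 0
        self.active[path] = True

    def current_path(self) -> Optional[str]:
        """Send on the primary while it is active, otherwise on the secondary."""
        for path in self.paths:
            if self.active[path]:
                return path
        return None  # no usable path


assoc = MultihomedAssociation(primary="IP_1", secondary="IP_2")
for _ in range(PATH_MAX_RETRANS + 1):
    assoc.on_timeout("IP_1")  # primary path stops responding
print(assoc.current_path())  # IP_2
```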
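Networking and transport
[Figure: an NE with Tran Unit 0 and Tran Unit 1, each hosting an SU, with a Service IP shown for each.]
Interface failure detection and switchover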
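Networking and transport
• Interface failure detection and switchover
  • Is the interface up?
  • Is the gateway reachable?
[Figure: on Tran Unit 0, the kernel side (eth0, eth1, route table) and user space exchange Netlink messages; a BFD detector daemon in user space sends a notification to the Availability Manager, which triggers a switchover of the SU.]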
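The gateway-reachability check in the figure uses BFD. The sketch below models only the timing rule: a path is declared down when no control packet arrives within detect multiplier × receive interval, and the callback stands in for the notification to the Availability Manager. Netlink link events and the BFD session handshake (RFC 5880) are left out, and `PathMonitor` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PathMonitor:
    """Declares a path down when no BFD control packet has arrived within
    detect_mult * rx_interval seconds (the BFD detection time)."""

    rx_interval: float              # expected seconds between packets
    detect_mult: int                # intervals without packets before declaring down
    on_down: Callable[[str], None]  # e.g. notify the Availability Manager
    name: str = "eth0"
    last_rx: float = 0.0
    up: bool = True

    def packet_received(self, now: float) -> None:
        self.last_rx = now
        self.up = True

    def poll(self, now: float) -> None:
        if self.up and now - self.last_rx > self.detect_mult * self.rx_interval:
            self.up = False
            self.on_down(self.name)  # fires once per failure


switchovers = []
mon = PathMonitor(rx_interval=0.05, detect_mult=3, on_down=switchovers.append)
mon.packet_received(now=1.0)
mon.poll(now=1.1)  # 100 ms of silence: below the 150 ms detection time
mon.poll(now=1.2)  # 200 ms of silence: path declared down
print(switchovers)  # ['eth0']
```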
Networking and transport
• QoS
  • Ensure high-priority packets are processed first
  • Scheduling of ingress packet queues
  • Hardware co-processor
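Processing high-priority packets first can be sketched as strict-priority scheduling over per-class ingress queues with tail drop. The class numbering and queue depths are assumptions, and in a real NE this work may be offloaded to the hardware co-processor.

```python
from collections import deque
from typing import Any, Deque, List, Optional


class StrictPriorityScheduler:
    """Per-class ingress queues; class 0 is always served first."""

    def __init__(self, levels: int = 3, depth: int = 1024) -> None:
        self.queues: List[Deque[Any]] = [deque() for _ in range(levels)]
        self.depth = depth
        self.dropped = 0

    def enqueue(self, packet: Any, priority: int) -> bool:
        queue = self.queues[priority]
        if len(queue) >= self.depth:
            self.dropped += 1  # tail drop when this class's queue is full
            return False
        queue.append(packet)
        return True

    def dequeue(self) -> Optional[Any]:
        for queue in self.queues:  # scan from highest to lowest priority
            if queue:
                return queue.popleft()
        return None


sched = StrictPriorityScheduler(levels=3, depth=2)
sched.enqueue("user-data-1", priority=2)
sched.enqueue("signaling-1", priority=0)
print(sched.dequeue())  # signaling-1
```

Strict priority can starve lower classes under sustained load; a weighted scheme is the usual alternative when that matters.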
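Storage system
• Synchronization of disk storage
  • DRBD (Distributed Replicated Block Device)
  • The DRBD partition is a resource of the SG and is always mounted on the node where the active SU runs.
• Database synchronization
  • For example, PostgreSQL synchronous replication: data is kept synchronized between the active and standby nodes.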
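Overload protection and cleanup of residual resources
• Causes of system overload
• System overload protection
  • Monitor CPU and memory on every node
  • Reject new requests when a limit is exceeded
  • Traffic ingress QoS
• Residual resources after overload
  • Periodically synchronize user-plane, control-plane and transport resources and clean up what is left over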
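Both ideas on this slide can be sketched in a few lines. The first is admission control that rejects only new requests while CPU or memory is above a limit. The second is a periodic reconciliation that frees resources whose owning session no longer exists. The thresholds, table layout and function names are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical limits, in percent of capacity.
CPU_LIMIT = 80.0
MEM_LIMIT = 85.0


def admit_new_request(cpu_pct: float, mem_pct: float) -> bool:
    """Applied to new requests only; traffic of existing sessions bypasses it."""
    return cpu_pct < CPU_LIMIT and mem_pct < MEM_LIMIT


def reconcile(layers: Dict[str, Dict[str, str]], live_sessions: Set[str]) -> Dict[str, List[str]]:
    """Free resources whose owning session no longer exists.

    layers maps a layer name ("user", "control", "transport") to a table of
    resource id -> owning session id. Returns the freed ids per layer.
    """
    freed: Dict[str, List[str]] = {}
    for layer, table in layers.items():
        orphans = sorted(rid for rid, session in table.items() if session not in live_sessions)
        for rid in orphans:
            del table[rid]  # a real system would release the resource here
        freed[layer] = orphans
    return freed


layers = {"user": {"tunnel-1": "call-A", "tunnel-2": "call-B"}, "transport": {"port-9": "call-B"}}
print(reconcile(layers, live_sessions={"call-A"}))  # {'user': ['tunnel-2'], 'transport': ['port-9']}
```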
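System upgrade
• Software upgrade
  • Minimize the size of the upgrade package
  • Convert configuration data during the upgrade
  • Require only a single restart
• Hardware upgrade
  • Hardware redundancy
  • Controlled switchover / safe shutdown
  • Replaceable hardware units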
Thank you!