Introduction to Internet Multicast
Xu Ke (徐恪)
Department of Computer Science, Tsinghua University
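Outline
Why is multicast needed?
Multicast addresses
Host-router interaction: IGMP
Multicast distribution trees
Multicast forwarding
Intra-domain multicast routing protocols
Inter-domain multicast routing protocols
IPv6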
Comparison of Unicast and Multicast
[Figure: two panels, "Unicast" and "Multicast", each showing a server sending through a router.]
Advantages of Multicast
Enhanced Efficiency: Controls network traffic and reduces server and CPU loads
Optimized Performance: Eliminates traffic redundancy
Distributed Applications: Makes multipoint applications possible
Example: Audio Streaming. All clients are listening to the same 8 Kbps audio stream.
[Figure: traffic (Mbps, 0 to 0.8) versus number of clients (1 to 100), with one curve for Multicast and one for Unicast.]
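The audio streaming example can be reproduced with a few lines of arithmetic. The sketch below assumes an idealized model in which unicast sends one 8 Kbps copy per client and multicast sends a single copy regardless of the number of clients; the function name `stream_load_mbps` is made up for this illustration.

```python
def stream_load_mbps(clients: int, rate_kbps: float = 8.0) -> tuple:
    """Return (unicast, multicast) load in Mbps on the server's link."""
    # Unicast: the server sends one copy of the stream per client.
    unicast = clients * rate_kbps / 1000.0
    # Multicast: the server sends a single copy, however many clients listen.
    multicast = rate_kbps / 1000.0 if clients > 0 else 0.0
    return unicast, multicast


for n in (1, 20, 40, 60, 80, 100):
    u, m = stream_load_mbps(n)
    print(f"{n:3d} clients: unicast {u:.3f} Mbps, multicast {m:.3f} Mbps")
```

Under this model, 100 clients cost 0.8 Mbps with unicast, which matches the top of the chart's traffic axis, while the multicast load stays at 0.008 Mbps.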
Problems Introduced by Multicast
Multicast is based on UDP!
Best Effort Delivery: Drops are to be expected. Multicast applications should not expect reliable delivery of data and should be designed accordingly. Reliable multicast is still an area of active research.
No Congestion Avoidance: The lack of TCP windowing and “slow-start” mechanisms can result in network congestion. If possible, multicast applications should attempt to detect and avoid congestion conditions.
Duplicates: Some multicast protocol mechanisms (e.g. Asserts, Registers and SPT Transitions) result in the occasional generation of duplicate packets.
Out of Order Delivery: Some protocol mechanisms may also result in out-of-order delivery of packets.
Applications of Multicast
Multimedia: streaming media, IPTV
Training, corporate communications
Conferencing: video/audio
Online games
Any one-to-many data push application
Multicast Addresses
IPv4 Multicast Group Addresses: 224.0.0.0–239.255.255.255
• Class “D” address space
• High-order bits of 1st octet = “1110”
Reserved Link-local Addresses: 224.0.0.0–224.0.0.255
• Transmitted with TTL = 1
• Examples:
  224.0.0.1 All systems on this subnet
  224.0.0.2 All routers on this subnet
  224.0.0.4 DVMRP routers
  224.0.0.5 OSPF routers
  224.0.0.13 PIMv2 routers
Multicast Addresses
Administratively Scoped Addresses: 239.0.0.0–239.255.255.255
Private address space
• Similar to RFC1918 unicast addresses
• Not used for global Internet traffic
• Used to limit the “scope” of multicast traffic
• The same addresses may be in use at different locations for different multicast sessions
Examples
• Site-local scope: 239.253.0.0/16
• Organization-local scope: 239.192.0.0/14
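The address ranges on these slides can be checked mechanically. The following is a minimal sketch using Python's standard `ipaddress` module; the `classify` function and its category strings are invented for this illustration and only cover the ranges named above.

```python
import ipaddress

# Ranges taken from the slides above.
LINK_LOCAL = ipaddress.ip_network("224.0.0.0/24")    # 224.0.0.0-224.0.0.255
ADMIN_SCOPED = ipaddress.ip_network("239.0.0.0/8")   # 239.0.0.0-239.255.255.255


def classify(addr: str) -> str:
    """Classify an IPv4 address against the multicast ranges above."""
    ip = ipaddress.ip_address(addr)
    if not ip.is_multicast:          # for IPv4: outside 224.0.0.0/4 (Class D)
        return "not multicast"
    if ip in LINK_LOCAL:
        return "reserved link-local"
    if ip in ADMIN_SCOPED:
        return "administratively scoped"
    return "other multicast"


for a in ("224.0.0.13", "239.192.1.1", "224.1.1.1", "10.0.0.1"):
    print(a, "->", classify(a))
```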
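Multicast Addresses
IP Multicast MAC Address Mapping (FDDI and Ethernet)
[Figure: the 32-bit IP address 239.255.0.1 consists of the fixed prefix 1110 followed by 28 group bits. The 48-bit MAC address 01-00-5e-7f-00-01 consists of a fixed 25-bit prefix followed by the low-order 23 bits of the IP address, so 5 bits of the group address are lost.]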
Steve Deering
Cisco Fellow
2010 IEEE Internet Award: “For foundational contributions to the development of IP Multicast and IP version 6”
PhD thesis: Multicast Routing in a Datagram Internetwork. Stanford University, 1991
Steve Deering worked on a distributed operating system project called “Vsystem”.
Computers in “Vsystem” could send messages to a group of other computers using Ethernet multicasting.
As the project progressed, a number of computers were added on the other side of the campus, connected via a production router.
The task of extending MAC-layer multicasting to Layer 3 fell to Steve Deering.
Multicast Addresses
Deering's advisor could apply for only 1 OUI from the IEEE, for $1000, instead of the 16 OUIs Deering wanted (16 OUIs × 2^24 addresses each would have covered all 2^28 group addresses).
Furthermore, his advisor gave him only half of that OUI's addresses to work with (only 23 bits).
The OUI for IP multicast addresses is 01-00-5e (hex).
The remaining 24 bits can range from 00-00-00 to 7F-FF-FF (the first of the 24 variable bits is always 0).
Hence 28 IP bits have to map onto 23 MAC bits.
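The mapping described above can be sketched in a few lines: keep the low-order 23 bits of the group address and place them behind the fixed prefix 01-00-5e with the next bit set to 0. The function name `multicast_mac` is an assumption made for this illustration.

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group address to its Ethernet MAC address."""
    ip = ipaddress.IPv4Address(group)
    if not ip.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(ip) & 0x7FFFFF          # keep 23 bits; the other 5 group bits are lost
    mac = (0x01005E << 24) | low23      # OUI 01-00-5e, then a 0 bit, then the 23 bits
    return "-".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


print(multicast_mac("239.255.0.1"))     # the address used in the mapping figure
print(multicast_mac("224.1.1.1"))
print(multicast_mac("224.129.1.1"))     # differs from 224.1.1.1 only in a lost bit
```

Because 5 of the 28 group bits are dropped, 2^5 = 32 different group addresses share each MAC address.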
Multicast Addresses
IP Multicast MAC Address Mapping (FDDI and Ethernet)
Be aware of the 32:1 address overlap: 32 IP multicast addresses map to 1 multicast MAC address.
[Figure: the 32 addresses 224.1.1.1, 224.129.1.1, 225.1.1.1, 225.129.1.1, ..., 238.1.1.1, 238.129.1.1, 239.1.1.1 and 239.129.1.1 all map to the MAC address 0x0100.5E01.0101.]
Multicast Addresses
Dynamic Group Address Assignment
• Historically accomplished using the SDR application
  Sessions/groups announced over well-known multicast groups
  Address collisions detected and resolved at session creation time
  Has problems scaling
• Future dynamic techniques under consideration
  Multicast Address Set-Claim (MASC): a hierarchical, dynamic address allocation scheme with an extremely complex garbage-collection problem; a long way off
  MADCAP: similar to DHCP; needs application and host stack support
Multicast Addresses
Static Group Address Assignment
• Temporary method to meet immediate needs
• Group range: 233.0.0.0–233.255.255.255
  Your AS number is inserted in the middle two octets
  The remaining low-order octet is used for group assignment
• Defined in IETF RFC3180, “GLOP Addressing in 233/8”
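As a rough illustration of the rule above, the sketch below places a 16-bit AS number in the middle two octets of 233/8 and returns the resulting /24 block. The function name `glop_block` and the AS numbers used in the example are assumptions chosen only for demonstration.

```python
import ipaddress


def glop_block(asn: int) -> ipaddress.IPv4Network:
    """Return the 233.x.y.0/24 block derived from a 16-bit AS number."""
    if not 0 < asn < 65536:
        raise ValueError("this scheme only fits 16-bit AS numbers")
    high, low = asn >> 8, asn & 0xFF    # AS number split into the middle two octets
    return ipaddress.ip_network(f"233.{high}.{low}.0/24")


print(glop_block(5662))     # 5662 = 22 * 256 + 30
print(glop_block(10888))    # 10888 = 42 * 256 + 136
```

The low-order octet is left free, so each AS gets 256 group addresses to assign.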
Host-Router Interaction: IGMP
How hosts tell routers about group membership
Routers solicit group membership from directly connected hosts
RFC 1112 specifies version 1 of IGMP
RFC 2236 specifies version 2 of IGMP
RFC 3376 specifies version 3 of IGMP
Supported on the latest service pack for Windows and on most UNIX systems
Host-Router Interaction: IGMP
Joining a Group
Host sends an IGMP Report to join the group
[Figure: hosts H1, H2 and H3 on one subnet; H3 sends a Report for 224.1.1.1.]
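From an application's point of view, the Report is triggered by a socket option rather than sent by hand. The sketch below shows one common way to do this with Python's standard `socket` module; `make_mreq` and `join_group` are names invented here, and the port number in the usage comment is arbitrary.

```python
import socket
import struct


def make_mreq(group: str, iface: str = "0.0.0.0") -> bytes:
    """Build the ip_mreq structure: group address, then local interface address."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface))


def join_group(group: str, port: int, iface: str = "0.0.0.0") -> socket.socket:
    """Open a UDP socket and ask the host stack to join the group.

    Setting IP_ADD_MEMBERSHIP is what causes the host's IP stack to send
    the IGMP membership Report on the chosen interface.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, make_mreq(group, iface))
    return sock


# Usage (requires a multicast-capable interface, so it is not run here):
#   sock = join_group("224.1.1.1", 5000)
#   data, sender = sock.recvfrom(1500)
print(make_mreq("224.1.1.1").hex())
```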
Host-Router Interaction: IGMP
Maintaining a Group
• Router sends periodic Queries to 224.0.0.1
• One member per group per subnet reports
• Other members suppress their reports
[Figure: the router sends a Query to hosts H1, H2 and H3; one host answers with a Report for 224.1.1.1 and the other two suppress theirs.]
Host-Router Interaction: IGMP
Leaving a Group (IGMPv1)
• Host quietly leaves the group
• Router sends 3 General Queries (60 secs apart)
• No IGMP Report for the group is received; the group times out (worst-case delay ≈ 3 minutes)
Leaving a Group (IGMPv2)
• Host sends a Leave message for 224.1.1.1 to 224.0.0.2
• Router sends a Group-Specific Query to 224.1.1.1
• No IGMP Report is received within ~3 seconds; group 224.1.1.1 times out
[Figures: hosts H1, H2 and H3 on one subnet; H3 is the host that leaves.]
IGMPv3
RFC3376
Enables hosts to listen only to a specified subset of the hosts sending to the group
[Figure: two sources, 1.1.1.1 and 2.2.2.2, both send to group 224.1.1.1 through routers R1, R2 and R3; host H1 is a member of 224.1.1.1.]
• H1 wants to receive from S = 1.1.1.1 but not from S = 2.2.2.2
• With IGMPv3, specific sources can be pruned back (S = 2.2.2.2 in this case)
• IGMPv3: Join (1.1.1.1, 224.1.1.1), Leave (2.2.2.2, 224.1.1.1)
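The source filtering idea can be illustrated with a toy model of a host's per-group state. IGMPv3 expresses interest as a filter mode (INCLUDE or EXCLUDE) plus a source list; the data layout and the `wanted` function below are simplifications invented for this sketch, not the protocol's message format.

```python
def wanted(state: dict, source: str, group: str) -> bool:
    """Decide whether traffic from `source` to `group` is wanted by the host."""
    # A group with no state is treated as INCLUDE with an empty list: nothing wanted.
    mode, sources = state.get(group, ("INCLUDE", set()))
    if mode == "INCLUDE":
        return source in sources        # only the listed sources are wanted
    return source not in sources        # EXCLUDE: everything except the listed sources


# H1 from the example: receive 224.1.1.1 only from 1.1.1.1.
h1_state = {"224.1.1.1": ("INCLUDE", {"1.1.1.1"})}
print(wanted(h1_state, "1.1.1.1", "224.1.1.1"))
print(wanted(h1_state, "2.2.2.2", "224.1.1.1"))
```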
Multicast Distribution Trees
Shortest Path or Source Distribution Tree
Notation: (S, G), where S = Source and G = Group
[Figures: routers A to F connect Source 1 and Source 2 to Receiver 1 and Receiver 2; one figure per source shows that source's shortest-path tree.]
Multicast Distribution Trees
Shared Distribution Tree
Notation: (*, G), where * = All Sources and G = Group
[Figures: router D is the PIM Rendezvous Point (RP). The first figure shows the Shared Tree from the RP to Receiver 1 and Receiver 2; the second adds Source 1 and Source 2, which reach the RP over Source Trees.]
Multicast Distribution Trees
Characteristics of Distribution Trees
Source or Shortest Path trees use more memory, O(S × G), but give optimal paths from the source to all receivers, which minimizes delay
Shared trees use less memory, O(G), but may give sub-optimal paths from the source to all receivers, which may introduce extra delay
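To make the memory comparison concrete, the sketch below counts worst-case forwarding entries in a router that lies on every tree. It assumes S is the number of sources per group; the function name and the numbers are made up for illustration.

```python
def mroute_entries(sources_per_group: int, groups: int) -> tuple:
    """Return worst-case (source-tree, shared-tree) entry counts for one router."""
    source_tree = sources_per_group * groups    # one (S, G) entry per source and group
    shared_tree = groups                        # one (*, G) entry per group
    return source_tree, shared_tree


print(mroute_entries(10, 200))
```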
Multicast Forwarding
Multicast forwarding is backwards from unicast forwarding
• Unicast forwarding is concerned with where the packet is going
• Multicast forwarding is concerned with where the packet came from
Multicast forwarding uses “Reverse Path Forwarding”
Reverse Path Forwarding (RPF)
What is RPF?
• A router forwards a multicast datagram only if it was received on the upstream interface to the source (i.e. it follows the distribution tree)
The RPF Check
• The routing table used for multicasting is checked against the “source” IP address in the packet
• If the datagram arrived on the interface specified in the routing table for the source address, the RPF check succeeds
• Otherwise, the RPF check fails
Multicast Forwarding
Example: RPF Checking
[Figure: multicast packets from source 151.10.3.21 reach a router with interfaces E0, S0, S1 and S2.]
Unicast Route Table
Network            Interface
151.10.0.0/16      S1
198.14.32.0/24     S0
204.1.16.0/24      E0
A closer look: RPF Check Fails
• A multicast packet from source 151.10.3.21 arrives on an interface other than S1
• Packet arrived on the wrong interface: discard the packet!
A closer look: RPF Check Succeeds
• A multicast packet from source 151.10.3.21 arrives on S1
• Packet arrived on the correct interface: forward out all outgoing interfaces (i.e. down the distribution tree)
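The RPF check in this example can be sketched as a longest-prefix lookup on the packet's source address followed by an interface comparison. The route table below is the one from the slides; the function names and the choice of S0 as the "wrong" arrival interface are assumptions made for illustration.

```python
import ipaddress

# Unicast route table from the example: (network, interface toward that network).
ROUTES = [
    ("151.10.0.0/16", "S1"),
    ("198.14.32.0/24", "S0"),
    ("204.1.16.0/24", "E0"),
]


def rpf_interface(source: str, routes=ROUTES):
    """Return the interface on the best (longest-prefix) route back to the source."""
    src = ipaddress.ip_address(source)
    best = None
    for prefix, iface in routes:
        net = ipaddress.ip_network(prefix)
        if src in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface)
    return best[1] if best else None


def rpf_check(source: str, arrival_iface: str, routes=ROUTES) -> bool:
    """The check succeeds only if the packet arrived on the RPF interface."""
    expected = rpf_interface(source, routes)
    return expected is not None and expected == arrival_iface


print(rpf_check("151.10.3.21", "S0"))   # wrong interface: discard
print(rpf_check("151.10.3.21", "S1"))   # correct interface: forward
```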
Multicast Routing versus Unicast Routing
Multicast Routing is not unicast routing
You have to think of it differently
It is not like OSPF
It is not like RIP
It is not like anything you may be familiar with
41
Types of Multicast Routing Protocols
Dense-mode
• Uses “Push” Model
• Traffic flooded throughout the network
• Pruned back where it is unwanted
• Flood & Prune behavior (typically every 3 minutes)
Sparse-mode
• Uses “Pull” Model
• Traffic sent only to where it is requested
• Explicit Join behavior
Overview of Intra-domain Multicast Routing Protocols
Currently, there are four multicast routing protocols
• DVMRPv3 (Internet-draft): DVMRPv1 (RFC 1075) is obsolete and unused; a variant is currently implemented
• MOSPF (RFC 1584)
• PIM-DM (Internet-draft)
• PIM-SM (RFC 2362, v2)
Others (CBT, OCBT, QOSMIC, SM, etc.)
DVMRP Overview
Dense Mode Protocol
Distance vector-based
• Similar to RIP
• Infinity = 32 hops
• Subnet masks in route advertisements
DVMRP routes are used
• For the RPF check
• To build Truncated Broadcast Trees (TBTs)
Uses a special “Poison-Reverse” mechanism
Uses Flood and Prune operation
• Traffic is initially flooded down the TBTs
• TBT branches are pruned where traffic is unwanted
• Prunes periodically time out, causing reflooding
DVMRP: Source Trees
• Truncated Broadcast Trees are built using the best DVMRP metrics back to the source network
• The lowest IP address is used in case of a tie (note: IP address of D < C < B < A)
• A poison reverse (metric + infinity) is sent to the upstream “parent” router: the router depends on its “parent” to receive traffic for this source
[Figure: mrouted routers A, B, C, D, E, X and Y exchange routes of metric “n” for the source network and send poison-reverse metrics such as 33, 34 and 35 upstream; the result is the Truncated Broadcast Tree for the source network.]
DVMRP: Source Trees
Forwarding onto Multi-access Networks
• Both B and C have routes to network X.
• To avoid duplicates, only one router can be the “Designated Forwarder” for network X.
• The router with the best metric is elected as the “Designated Forwarder”.
• The lowest IP address is used as the tie-breaker.
• Router C wins in this example (note: IP address of C < B).
[Figure: mrouted routers A, B and C; B and C send route advertisements of metric “n” for network X.]
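The election rule above (best metric first, lowest IP address as tie-breaker) fits in one comparison key. In the sketch below the router names follow the example, but the metrics and IP addresses are hypothetical values chosen so that C has the lower address.

```python
import ipaddress


def elect_designated_forwarder(candidates: dict) -> str:
    """Pick the Designated Forwarder among routers sharing a multi-access network.

    `candidates` maps a router name to (metric to the source network, IP address
    on the shared network). The lowest metric wins; ties go to the lowest IP address.
    """
    def rank(name):
        metric, ip = candidates[name]
        return (metric, int(ipaddress.ip_address(ip)))
    return min(candidates, key=rank)


# Hypothetical values: equal metrics, and C has the lower IP address.
routers = {"B": (1, "10.0.0.2"), "C": (1, "10.0.0.1")}
print(elect_designated_forwarder(routers))
```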
DVMRP: Source Trees
Each source network has its own Truncated Broadcast Tree
[Figures: the resulting Truncated Broadcast Tree for source network “S1”, and the separate tree for source “S2”, across mrouted routers A, B, C, D, E, X and Y (note: IP address of D < C < B < A).]
DVMRP: Flood & Prune
[Figures: Source “S” and Receiver 1 (Group “G”) are attached to a network of mrouted routers A, B, C, D, E, X and Y. Each figure shows the Truncated Broadcast Tree based on DVMRP route metrics and the (S, G) multicast packet flow.]
1. Initial flooding of (S, G) multicast packets down the Truncated Broadcast Tree.
2. Router C is a leaf node, so it sends an “(S, G) Prune” message. Router B prunes the interface.
3. Routers X and Y are also leaf nodes, so they send “Prune (S, G)” messages. Router E prunes the interface.
4. Router E is now a leaf node; it sends an (S, G) Prune message. Router D prunes the interface.
5. Final pruned state.
DVMRP: Grafting
[Figures: the same network in its pruned state, with Source “S”, Receiver 1 (Group “G”) and a new Receiver 2 (Group “G”).]
1. Receiver 2 joins Group “G”. Router Y sends a “Graft (S, G)” message.
2. Router E responds with a “Graft-Ack” and sends its own “Graft (S, G)” message.
3. Router D responds with a “Graft-Ack” and begins forwarding (S, G) packets.
DVMRP: Evaluation
Widely used on the MBONE (being phased out)
Significant scaling problems
• Slow convergence: RIP-like behavior
• Significant amount of multicast routing state information stored in routers: (S, G) everywhere
• No support for shared trees
• Maximum number of hops < 32
Not appropriate for large-scale production networks
• Due to flood and prune behavior
• Due to its poor scalability
MOSPF (RFC 1584)
Extension to the OSPF unicast routing protocol
• OSPF: Routers use link state advertisements to learn all available links in the network (and route messages along least-cost paths)
• MOSPF: Includes multicast information in OSPF link state advertisements to construct multicast distribution trees (each router maintains an up-to-date image of the topology of the entire network)
Group membership LSAs are flooded throughout the OSPF routing domain so MOSPF routers can compute outgoing interface lists
Uses the Dijkstra algorithm to compute the shortest-path tree
• A separate calculation is required for each (SNet, G) pair
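The per-source calculation can be illustrated with a plain Dijkstra run over a link-state view of the topology. This is only a generic sketch: the graph, its costs and the function name are invented, and a real MOSPF router would additionally prune the tree to branches that lead to group members.

```python
import heapq


def shortest_path_tree(graph: dict, source: str) -> dict:
    """Dijkstra: return {node: parent} describing the tree rooted at `source`.

    `graph` maps each node to {neighbor: link cost}.
    """
    dist = {source: 0}
    parent = {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                        # stale heap entry, a shorter path was found
        for v, cost in graph.get(u, {}).items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u               # v is reached through u on the best path
                heapq.heappush(heap, (nd, v))
    return parent


# Hypothetical four-node topology with S as the source network.
topology = {"S": {"A": 1, "B": 4}, "A": {"B": 1, "C": 5}, "B": {"C": 1}, "C": {}}
print(shortest_path_tree(topology, "S"))
```

Since one such run is needed per (SNet, G) pair, the cost grows with the number of active sources and groups, which is the scaling concern raised in the evaluation of MOSPF.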
MOSPF Membership LSAs
[Figure: Area 1 and Area 2 attach to Area 0 through MABR1 and MABR2. Members of group A (MA) and group B (MB) are located in the areas, and Membership LSAs are flooded within each area.]
MOSPF Intra-Area Traffic
[Figure: source (S1, B) in Area 1 and source (S2, A) in Area 2 deliver traffic to the members inside their own areas; a member of group A is marked as not receiving (S2, A) traffic.]
MOSPF Inter-Area Traffic
[Figures: the same two areas with sources (S1, B) and (S2, A).]
• MABR1 and MABR2 carry the Wildcard Receiver Flag (*, *). Wildcard Receivers “pull” traffic from all sources in the area.
• Membership LSAs for (GA, GB) in Area 1 and (GA) in Area 2 are summarized: MABR routers inject Summary Membership LSAs into Area 0.
• Unnecessary traffic still flows to the MABR routers!
MOSPF Inter-Domain Traffic
[Figures: an MASBR connects Area 0 to an external AS; sources (S1, A) and (S2, B) appear inside and outside the domain.]
• Summarized Membership LSAs from Area 1 and Area 2 are injected into Area 0, where the MASBR sees them.
• MABR1, MABR2 and the MASBR carry the Wildcard Receiver Flag (*, *).
• Unnecessary traffic may flow all the way to the MASBR router!
MOSPF: Evaluation
Appropriate for use within a single routing domain
• Does not flood multicast traffic everywhere to create state; uses LSAs and the link-state database
• Protocol dependent: works only in OSPF-based networks
Significant scaling problems
• Dijkstra algorithm is run for EVERY multicast (SNet, G) pair!
• Dijkstra algorithm is rerun when group membership changes or a line flaps
• Does not support shared trees
Not appropriate for general-purpose multicast networks where the number of senders may be quite large
• Example: IP/TV (every IP/TV client is a multicast source)
PIM-DM
Protocol Independent
• Supports all underlying unicast routing protocols, including static, RIP, IGRP, EIGRP, IS-IS, BGP, and OSPF
Uses reverse path forwarding
• Floods the network and prunes back based on multicast group membership
• Assert mechanism is used to prune off redundant flows
Appropriate for...
• Densely distributed receivers located in close proximity to the source
• Few senders to many receivers (due to frequent flooding)
PIM-DM Flood & Prune
[Figures: a Source, a Receiver and the multicast packets flowing between the routers of the network.]
1. Initial flooding: (S, G) state is created in every router in the network!
2. Pruning unwanted traffic: Prune messages are sent where the multicast packets are unwanted.
3. Results after pruning: (S, G) state still exists in every router in the network, and the flood & prune process repeats every 3 minutes!
PIM-DM Assert Mechanism
[Figure: two routers, each with interfaces S0 and E0; an incoming multicast packet passes the RPF check.]
1. Routers receive a packet on an interface in their “oilist”! Only one router should continue sending, to avoid duplicate packets.
2. Routers send “PIM Assert” messages carrying <distance, metric>.
• Compare distance and metric values
• Router with the best route to the source wins
• If metric and distance are equal, the highest IP address wins
• Losing router stops sending (prunes interface)
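The comparison performed on Assert messages can be sketched as a sort key: lower distance, then lower metric, then higher IP address. The router names, distances, metrics and addresses below are hypothetical, and treating "best route" as lowest distance followed by lowest metric is the assumption this sketch relies on.

```python
import ipaddress


def assert_winner(routers: dict) -> str:
    """Return the router that keeps forwarding after the Assert exchange.

    `routers` maps a name to (distance, metric, IP address on the shared LAN).
    """
    def rank(name):
        distance, metric, ip = routers[name]
        # Negate the address so that, on a tie, the HIGHEST IP address ranks first.
        return (distance, metric, -int(ipaddress.ip_address(ip)))
    return min(routers, key=rank)


# Equal distance and metric: the highest IP address wins.
print(assert_winner({"R1": (110, 20, "10.1.1.1"), "R2": (110, 20, "10.1.1.2")}))
# Better (lower) distance wins regardless of metric or address.
print(assert_winner({"R1": (90, 500, "10.1.1.1"), "R2": (110, 20, "10.1.1.2")}))
```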
PIM-DM: Evaluation
Most effective for a large number of densely distributed receivers located in close proximity to the source
Advantages:
• Easy to configure: two commands, including ip pim dense-mode
• Simple flood and prune mechanism
Potential issues...
• Inefficient flood and prune behavior
• Complex Assert mechanism
• Mixed control and data planes: results in (S, G) state in every router in the network
• No support for shared trees
PIM-SM (RFC 2362)
Supports both source and shared trees
• Assumes no hosts want multicast traffic unless they specifically ask for it
Uses a Rendezvous Point (RP)
• Senders and Receivers “rendezvous” at this point to learn of each other's existence
• Senders are “registered” with the RP by their first-hop router
• Receivers are “joined” to the Shared Tree (rooted at the RP) by their local Designated Router (DR)
Appropriate for…
• Wide-scale deployment for both densely and sparsely populated groups in the enterprise
• Optimal choice for all production networks regardless of size and membership density
PIM-SM Shared Tree Join
• A (*, G) Join is sent from the Receiver's side toward the RP.
• (*, G) state is created only along the Shared Tree.
PIM-SM Sender Registration
• The Source's first-hop router sends an (S, G) Register (unicast) to the RP.
• An (S, G) Join is sent from the RP toward the Source; (S, G) state is created only along the Source Tree.
• (S, G) traffic begins arriving at the RP via the Source Tree.
• The RP sends an (S, G) Register-Stop (unicast) back to the first-hop router to stop the Register process.
• Source traffic flows natively along the SPT to the RP. From the RP, traffic flows down the Shared Tree to Receivers.
PIM-SM SPT Switchover
• The last-hop router joins the Source Tree with an (S, G) Join. Additional (S, G) state is created along the new part of the Source Tree.
• Traffic begins flowing down the new branch of the Source Tree. An (S, G)RP-bit Prune is sent; additional (S, G) state is created along the Shared Tree to prune off (S, G) traffic.
• (S, G) traffic flow is now pruned off the Shared Tree and is flowing to the Receiver via the Source Tree.
• (S, G) traffic flow is no longer needed by the RP, so it prunes the flow of (S, G) traffic with an (S, G) Prune.
• (S, G) traffic is now flowing to the Receiver only via a single branch of the Source Tree.
PIM-SM Frequently Forgotten Fact
The default behavior of PIM-SM in Cisco IOS is that routers with directly connected members join the Shortest Path Tree as soon as they detect a new multicast source
PIM-SM Evaluation
Effective for sparse or dense distribution of multicast receivers
Advantages:
• Traffic is sent only down “joined” branches
• Can dynamically switch to optimal source trees for high-traffic sources
• Unicast routing protocol-independent
• Basis for inter-domain multicast routing when used with MBGP and MSDP
Inter-domain Multicast Routing Protocols
BGMP (Border Gateway Multicast Protocol) (future) (Internet-draft)
MSDP (Multicast Source Discovery Protocol) (RFC 3618)
MBGP (Multi-protocol BGP) (RFC 2283)
SSM (Source Specific Multicast) (RFC 3569)
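Inter-domain Multicast Routing Protocols: BGMP
BGMP (Border Gateway Multicast Protocol)
Basic idea
• For every multicast group active in the network there is a single bidirectional shared tree, which can span all domains that contain senders or receivers of the group
• The root domain of a group is a domain that owns a specific multicast address range
Properties
• Every multicast group has a bidirectional shared tree spanning domains
• Explicit join model
• Joins and prunes are sent toward the root domain
• A single root domain per group
• Requires BGP4+: group prefixes must be carried in the NLRI field
• Bidirectional trees must be built
• Requires strictly hierarchical address allocation: MASC is the recommended allocation method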
BGMP Example
[Figures: six domains A to F with border routers A1 to A4, B1, B2, C1, D1, E1, F1 and F2; one of the domains is the root domain of group G.]
1. A host in C joins group G: joins are sent from domain to domain toward the root domain.
2. The tree is constructed; data goes to C.
3. Domain E joins G.
4. The tree is constructed; data goes to E.
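Ideal versus Reality
BGMP and MASC are a long way off
• Both are very difficult to implement
• Both are still at the IETF draft proposal stage
ISPs want to deploy multicast now: what is the solution?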
Multicast Components: End-to-End Architecture
• End stations (hosts-to-routers): IGMP
• Switches (Layer 2 optimization): CGMP, IGMP Snooping or RGMP
• Routers (multicast forwarding protocol): PIM-SM or Bidirectional PIM
• Multicast routing across domains: MBGP
• Multicast source discovery: MSDP with PIM-SM
• Source Specific Multicast: PIM-SSM
[Figure: a “Campus Multicast” part and an “Interdomain Multicast” part. ISP A with multicast source X and ISP B with multicast source Y each have DRs and an RP; the labels are IGMP, IGMP Snooping/CGMP/RGMP, PIM-SM, Bidir PIM, PIM-SSM, MVPN, MBGP and MSDP.]
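Requirements for Deploying Multicast
An explicit-join intra-domain routing protocol is needed, for efficiency
• PIM-SM
Inter-domain routing uses the existing unicast model
• MBGP
Inter-domain multicast source discovery is needed: RPs in different PIM domains need to share multicast source information
• MSDP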
MBGP Overview
MBGP: Multiprotocol BGP, not Multicast BGP
• Defined in RFC 2283 (extensions to BGP)
• Can carry different types of routes: IPv4 unicast, IPv6 unicast, IPv4 multicast, IPv6 multicast
• Different route types may be carried in the same BGP session
• Does not propagate multicast state information; PIM is still needed to build distribution trees
• Same path selection and validation rules: AS-Path, LocalPref, MED, …
MBGP Overview
Separate BGP tables are maintained
• Unicast BGP Table (U-Table)
• Multicast BGP Table (M-Table)
• BGP NLRI specifies which BGP table
• Allows different unicast/multicast topologies or policies
Unicast BGP Table (U-Table)
• Contains unicast prefixes for unicast forwarding
• Populated with BGP unicast NLRI
Multicast BGP Table (M-Table)
• Contains unicast prefixes for RPF checking
• Populated with BGP multicast NLRI
MBGP Update Message
Address Family Identifier (AFI)
• Identifies the address type (see RFC1700)
• AFI = 1 (IPv4)
• AFI = 2 (IPv6)
Subsequent Address Family Identifier (Sub-AFI)
• Sub-category for the AFI field
• For AFI = 1 (IPv4):
  Sub-AFI = 1 (NLRI is used for unicast)
  Sub-AFI = 2 (NLRI is used for multicast RPF check)
  Sub-AFI = 3 (both unicast and multicast)
PIM RPF Calculation Details
105
MBGP: Capability Negotiation
[Figure: AS 123 (router .1) and AS 321 (router .2) peer over 192.168.100.0/24, with a Sender on one side and a Receiver on the other.]
Router in AS 123:
    router bgp 123
     neighbor 192.168.100.2 remote-as 321 nlri unicast multicast
     . . .
Router in AS 321:
    router bgp 321
     neighbor 192.168.100.1 remote-as 123 nlri unicast multicast
     . . .
MBGP: Capability Negotiation
MBGP session for unicast and multicast NLRI
[Figure: AS 123 (.1) and AS 321 (.2) peer over 192.168.100.0/24; the network 192.192.25.0/24 is also shown, together with the Sender and the Receiver.]
    BGP: 192.168.100.2 open active, local address 192.168.100.1
    BGP: 192.168.100.2 went from Active to OpenSent
    BGP: 192.168.100.2 sending OPEN, version 4
    BGP: 192.168.100.2 OPEN rcvd, version 4
    BGP: 192.168.100.2 rcv OPEN w/option parameter type: 2, len: 6
    BGP: 192.168.100.2 OPEN has CAPABILITY code: 1, length 4
    BGP: 192.168.100.2 OPEN has MP_EXT CAP for afi/safi: 1/1
    BGP: 192.168.100.2 rcv OPEN w/option parameter type: 2, len: 6
    BGP: 192.168.100.2 OPEN has CAPABILITY code: 1, length 4
    BGP: 192.168.100.2 OPEN has MP_EXT CAP for afi/safi: 1/2
    BGP: 192.168.100.2 went from OpenSent to OpenConfirm
    BGP: 192.168.100.2 went from OpenConfirm to Established
MBGP: NLRI Information
Storage of arriving NLRI information depends on the AFI/SAFI fields in the Update message
• Unicast BGP Table only (AFI=1/SAFI=1 or old style NLRI)
• Multicast BGP Table only (AFI=1/SAFI=2)
• Both BGP Tables (AFI=1/SAFI=3)
BGP Update from peer (shown three times, with Sub-AFI 1, 2 and 3):
    MP_REACH_NLRI: 192.192.2/24
    AFI: 1, Sub-AFI: 1 (unicast), 2 (multicast) or 3 (both)
    AS_PATH: 300 200
    MED:
    Next-Hop: 192.168.200.2
Before the update, the Unicast BGP Table and the Multicast BGP Table both contain:
    Network            Next-Hop       Path
    *>i160.10.1.0/24   192.20.2.2     i
    *>i160.10.3.0/24   192.20.2.2     i
After the update, the following entry is added to the Unicast BGP Table (Sub-AFI 1), to the Multicast BGP Table (Sub-AFI 2), or to both tables (Sub-AFI 3):
    *>i192.192.2.0/24  192.168.200.2  300 200 i
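The storage rule on these slides amounts to a small dispatch on the AFI/SAFI pair. The sketch below models only the IPv4 cases shown above; the dictionary-based update format and the names `tables_for` and `store` are invented for illustration and do not reflect any router's internal data structures.

```python
def tables_for(afi: int, safi: int) -> set:
    """Return which BGP tables an arriving NLRI should be stored in."""
    if afi != 1:
        return set()                        # this sketch only covers IPv4 (AFI = 1)
    return {
        1: {"unicast"},                     # Sub-AFI 1: Unicast BGP Table only
        2: {"multicast"},                   # Sub-AFI 2: Multicast BGP Table only
        3: {"unicast", "multicast"},        # Sub-AFI 3: both tables
    }.get(safi, set())


def store(update: dict, unicast_table: list, multicast_table: list) -> None:
    """Append the update's prefix to the table(s) selected by its AFI/SAFI."""
    targets = tables_for(update["afi"], update["safi"])
    if "unicast" in targets:
        unicast_table.append(update["prefix"])
    if "multicast" in targets:
        multicast_table.append(update["prefix"])


u_table = ["160.10.1.0/24", "160.10.3.0/24"]
m_table = ["160.10.1.0/24", "160.10.3.0/24"]
store({"prefix": "192.192.2.0/24", "afi": 1, "safi": 2}, u_table, m_table)
print(u_table)
print(m_table)
```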
MBGP: Summary
Solves part of the inter-domain problem
• Can exchange multicast routing information
• Uses standard BGP configuration knobs
• Permits separate unicast and multicast topologies if desired
PIM must still be used to
• Build distribution trees
• Actually forward multicast traffic
• PIM-SM is recommended
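The MSDP Protocol (RFC 3618)
Works only with PIM-SM
The RP knows about all multicast sources in its domain
• Sources trigger PIM Registers
• The RP can tell RPs in other domains about all sources in its own domain, using MSDP SA (Source Active) messages
The RP knows about all receivers in its domain
• Receivers trigger (*, G) Joins to the RP
• The RP can join the source tree in a peer domain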
MSDP Overview: MSDP Example
[Figures: Domains A to E each have an RP, and the RPs are MSDP peers. A source s (192.1.1.1) sends to group 224.2.2.2; a receiver r is in another domain.]
1. A Join (*, 224.2.2.2) reaches the RP in the receiver's domain, and a Register for (192.1.1.1, 224.2.2.2) reaches the RP in the source's domain.
2. Source Active (SA) messages for (192.1.1.1, 224.2.2.2) are passed from RP to RP between the MSDP peers.
3. Multicast traffic then flows from s across the domains to r.
MSDP SA Messages
MSDP Source Active (SA) messages
• Used to advertise active sources in a domain
• Carry the 1st multicast packet from the source
SA message contents:
• IP address of the originator (RP address)
• Number of (S, G) pairs being advertised
• List of active (S, G)s in the domain
• Encapsulated multicast packet
Receiving SA Messages
RPF check rules depend on peering
• Rule 1: Sending MSDP peer = i(m)BGP peer
• Rule 2: Sending MSDP peer = e(m)BGP peer
• Rule 3: Sending MSDP peer != (m)BGP peer
Exceptions: the RPF check is skipped when
• Sending MSDP peer = originating RP
• Sending MSDP peer = mesh-group peer
• Sending MSDP peer = only MSDP peer
RPF Check Rule 1
When MSDP peer = i(m)BGP peer
• Find the “Best Path” to the RP in the BGP tables: search the MRIB first, then the URIB. If no path to the originating RP is found, RPF fails.
• Note the “BGP peer” that advertised the path (i.e. the IP address of the BGP peer that sent us this path).
• Warning: this is not the same as the next hop of the path! i(m)BGP peers normally do not set Next-hop = Self. It is also not necessarily the same as the Router-ID!
Rule 1 test condition: MSDP peer address = BGP peer address?
• If yes, RPF succeeds
Rule 1: MSDP peer = i(m)BGP peer
[Figures: Router A in AS100 receives an SA message; the source and its RP are reached via AS7 and AS5 (172.16.5.1, 172.16.6.1). Routers D, E, F and G are also shown, among them the peers 172.16.3.1 and 172.16.4.1.]
    show ip mbgp 172.16.6.1
    BGP routing table entry for 172.16.6.0/24, version 8745118
    Paths: (1 available, best #1)
      7 5, (received & used)
        172.16.5.1 (metric 68096) from 172.16.3.1 (172.16.3.1)
The i(m)BGP peer advertising the best path to the RP is 172.16.3.1.
• SA RPF check succeeds: the MSDP peer address is 172.16.3.1 (MSDP peer address = i(m)BGP peer address)
• SA RPF check fails: the MSDP peer address is 172.16.4.1 (MSDP peer address != i(m)BGP peer address)
Slide footer: Tsinghua University graduate course
RPF Check Rule 2
When MSDP peer = e(m)BGP peer
• Find the (m)BGP “Best Path” to the RP: search the MRIB first, then the URIB. If no path to the originating RP is found, RPF fails.
Rule 2 test condition: first AS in the path to the RP = AS of the MSDP peer?
• If yes, RPF succeeds
Rule 2: MSDP peer = e(m)BGP peer
[Figures: Router A in AS100 has e(m)BGP peers 172.16.3.1 in AS3 and 172.16.4.1 in AS1. The source and its RP (172.16.6.1) are in AS5, reached through AS7 (172.16.5.1). Routers D, E, F and G are also shown.]
Router A's BGP Table
   Network          Next Hop      Path
*> 172.16.3.0/24    172.16.3.1    3 i
   172.16.3.0/24    172.16.4.1    1 3 i
*> 172.16.4.0/24    172.16.4.1    1 i
   172.16.4.0/24    172.16.3.1    3 1 i
*> 172.16.5.0/24    172.16.3.1    3 7 i
   172.16.5.0/24    172.16.4.1    1 3 7 i
*> 172.16.6.0/24    172.16.3.1    3 7 5 i
   172.16.6.0/24    172.16.4.1    1 3 7 5 i
• SA RPF check succeeds: first AS in best path to RP = 3, AS of MSDP peer = 3 (first AS in best path to RP = AS of e(m)BGP peer)
• SA RPF check fails: first AS in best path to RP = 3, AS of MSDP peer = 1 (first AS in best path to RP != AS of e(m)BGP peer)
RPF Check Rule 3
When MSDP peer != (m)BGP peer
• Find the (m)BGP “Best Path” to the RP: search the MRIB first, then the URIB. If no path to the originating RP is found, RPF fails.
• Find the (m)BGP “Best Path” to the MSDP peer: search the MRIB first, then the URIB. If no path to the sending MSDP peer is found, RPF fails.
• Note the AS of the sending MSDP peer: the origin AS (last AS) in the AS-PATH to the MSDP peer.
Rule 3 test condition: first AS in the path to the RP = AS of the sending MSDP peer?
• If yes, RPF succeeds
Rule 3: MSDP peer != BGP peer
[Figures: Router A in AS100, with AS1 (172.16.4.1) and AS3 (172.16.3.1) as neighbors. The source and its RP (172.16.6.1) are in AS5, reached through AS7 (172.16.5.1). Routers B, D, E, F and G are also shown.]
Router A's BGP Table
   Network          Next Hop      Path
*> 172.16.3.0/24    172.16.3.1    3 i
   172.16.3.0/24    172.16.4.1    1 3 i
*> 172.16.4.0/24    172.16.4.1    1 i
   172.16.4.0/24    172.16.3.1    3 1 i
*> 172.16.5.0/24    172.16.3.1    3 7 i
   172.16.5.0/24    172.16.4.1    1 3 7 i
*> 172.16.6.0/24    172.16.3.1    3 7 5 i
   172.16.6.0/24    172.16.4.1    1 3 7 5 i
• SA RPF check succeeds: first AS in best path to RP = 3, AS of MSDP peer = 3 (first AS in best path to RP = AS of MSDP peer)
• SA RPF check fails: first AS in best path to RP = 3, AS of MSDP peer = 1 (first AS in best path to RP != AS of MSDP peer)
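The three test conditions can be sketched against Router A's best paths. The table below lists only the best (*>) AS paths from the example; the function names, the prefix lookup and the assumption that the originating RP is 172.16.6.1 are simplifications made for this illustration, not an MSDP implementation.

```python
import ipaddress

# Best (*>) AS paths from Router A's BGP table in the examples.
BEST_PATHS = {
    "172.16.3.0/24": [3],
    "172.16.4.0/24": [1],
    "172.16.5.0/24": [3, 7],
    "172.16.6.0/24": [3, 7, 5],
}


def best_as_path(addr: str, table=BEST_PATHS):
    """Return the AS path of the best route covering `addr`, or None."""
    ip = ipaddress.ip_address(addr)
    for prefix, path in table.items():
        if ip in ipaddress.ip_network(prefix):
            return path
    return None


def rpf_rule1(msdp_peer: str, advertising_bgp_peer: str) -> bool:
    """Rule 1: the MSDP peer must be the i(m)BGP peer that advertised the best path."""
    return msdp_peer == advertising_bgp_peer


def rpf_rule2(rp: str, msdp_peer_as: int) -> bool:
    """Rule 2: the first AS in the best path to the RP must be the peer's AS."""
    path = best_as_path(rp)
    return bool(path) and path[0] == msdp_peer_as


def rpf_rule3(rp: str, msdp_peer: str) -> bool:
    """Rule 3: first AS toward the RP must equal the origin AS of the path to the peer."""
    rp_path, peer_path = best_as_path(rp), best_as_path(msdp_peer)
    if not rp_path or not peer_path:
        return False                        # no path to the RP or to the peer: RPF fails
    return rp_path[0] == peer_path[-1]


RP = "172.16.6.1"                           # assumed originating RP, in AS5
print(rpf_rule1("172.16.3.1", "172.16.3.1"), rpf_rule1("172.16.4.1", "172.16.3.1"))
print(rpf_rule2(RP, 3), rpf_rule2(RP, 1))
print(rpf_rule3(RP, "172.16.3.1"), rpf_rule3(RP, "172.16.4.1"))
```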
ISP Requirements
Peering solution: MBGP + PIM-SM + MSDP
[Figure: ISP A, ISP B and ISP C meet at a public interconnect; AS 10888 is labelled. Each ISP runs PIM-SM with its own RPs and iMBGP internally, while eMBGP and eMSDP run between the ISPs.]
IPv4 versus IPv6 Multicast
IP Service              IPv4 Solution                               IPv6 Solution
Address Range           32-bit, class D                             128-bit
Routing                 Protocol independent; all IGPs and BGP4+    Protocol independent; all IGPs and BGP4+ with v6 mcast SAFI
Forwarding              PIM-DM, PIM-SM, PIM-SSM, PIM-bidir          PIM-SM, PIM-SSM, PIM-bidir
Group Management        IGMPv1, v2, v3                              MLDv1, v2
Domain Control          Boundary/Border                             Scope Identifier
Interdomain Solutions   MSDP across independent PIM domains         Single RP within globally shared domains
Embedded RP in IPv6 Multicast
IP Routing for Multicast
RPF is based on reachability to the v6 source, the same as with v4 multicast
RPF is still protocol independent:
• Static routes, mroutes
• Unicast RIB: BGP, ISIS, OSPF, EIGRP, RIP, etc.
• Multi-protocol BGP (mBGP): supports the v6 mcast sub-address family and provides a translate function for non-supporting peers
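CNGI Large-Scale Controllable Multicast
Build a controllable multicast service on the backbone network
• Multicast control; multicast address design and management: control of multicast sources, groups and application bandwidth
• Multicast gateways that extend the multicast service into campus networks
• A multicast network management system: authentication of legitimate users, and monitoring and management of the multicast service
Build a controllable multicast service on campus networks
Multicast transition: interworking between the multicast service on the IPv4 backbone and the multicast service on the IPv6 backbone
Establish application demonstrations: support a network-wide live video demonstration, in which the CNGI-CERNET2 national network center provides 1 high-definition video source and each of 100 universities provides 1 standard video/audio source
Overall System Design: System Connection Diagram
Overall System Design: Data Path
Overall System Design: Control Path (Sending)
Overall System Design: Control Path (Receiving)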
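Open Research Problems in Multicast (1)
The basic problem of multicast routing is to find a minimum-cost multicast tree in the network. In graph theory this is the Steiner tree problem.
• It is an NP-complete problem
Many heuristic algorithms have been proposed to solve it
• Most are centralized algorithms that scale poorly and are hard to apply on the Internet
Open Research Problems in Multicast (2)
The dynamic Steiner tree problem
• How to improve the performance of the multicast tree after the group membership changes dynamically; this is also an NP-complete problem
There are currently two approaches
• Take the dynamics of group membership into account when the multicast tree is first built
• Rebuild the multicast tree when its performance degrades because of membership changes; this disrupts the ordering of the data in transit and causes problems such as packet loss
There is no truly effective solution yet
Open Research Problems in Multicast (3)
Deployment of inter-domain multicast routing
Inter-domain multicast routing is currently deployed as a combination of three protocols, MBGP/PIM-SM/MSDP
• This is only a short-term solution
The long-term solution is BGMP/MASC
• Is it suitable for large-scale multicast applications?
• Address allocation with MASC produces a large amount of fragmentation
Open Research Problems in Multicast (4)
Aggregation of multicast addresses
Multicast addresses are special: a multicast address corresponds to a logical multicast group and does not represent the actual location of each group member
Every node in a multicast tree has to keep state for every multicast group
• As multicast applications develop, the number of multicast groups keeps growing, and so do the storage requirements
• Lookups in large multicast routing tables also reduce the forwarding performance for multicast packets
Open Research Problems in Multicast (5)
Multicast in mobile environments
• Scalability
• Reliability
Reliable multicast
• The feedback implosion problem
Multicast congestion control
Multicast security
• Key management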