Friday, December 5, 2014

STP Concepts

This introduction to STP is brought to you by the band MOPP and their song Dream About You.  They aren't sponsoring it financially, only sonically, as it were.  STP is part of what lets you read everything written here as well as hear MOPP's song.  It is that important!  Without it, a basic LAN with redundant links wouldn't work, and without LANs we cannot have InterLANs, aka the Internet.  After all, the Internet is just a collection of other people's networks, whether they be CANs or larger MANs joined via several WAN links.  Since most of our networking accomplishments are still credited to men, it is worth pointing out that STP was invented by a woman, Radia Perlman.  So every time a gamer explores outer space, you know it's Radia to whom we owe some gratitude!

Wendell Odom wrote in his ICND2 book (200-101) that the Spanning Tree Protocol allows Ethernet LANs to utilize redundant links without the associated problems.  This is a good summary, though STP certainly does a lot of related work as well.  Clearly, using redundant links in a LAN is a good way of ensuring uptime when real life happens.  But we need some means of managing this redundancy, lest we fall prey to the dangers of loops.  Ethernet frames, unlike IP packets, have no hop count (TTL) limiting their forwarding.  Once they are copied (forwarded) out of the first switch, they can be forwarded ad nauseam by other switches.  This results in the first of three problems that STP solves: frame multiplication.  Since multiple copies of frames are now flying all over the place, we start to see the second problem: instability in each switch's MAC address table.  A switch sees the same source MAC address on one port, then on another, so it updates its MAC table over and over; the resulting instability makes the forwarding plane's task impossible.  All of that leads to the third problem that STP solves: broadcast storms (see broadcast radiation).  Even one looping broadcast frame can cause a storm.  And these storms can bring down PCs, not just switches.  When a NIC has to pass copious volumes of broadcasts up for processing, it takes a toll on the CPU, whether in the switch or in an end-user PC.  Broadcast storms are as dangerous as their real-world counterparts and thus must be prevented.
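To make the loop problem concrete, here is a small Python simulation that follows a single broadcast through switches that flood it with no STP and no hop count. It is only a sketch: the switch names and both topologies are hypothetical, hosts are ignored, and only the copies travelling on inter-switch links are counted.

```python
# Hypothetical topologies with no STP running. Each switch floods a broadcast
# out every inter-switch port except the one it arrived on.
TRIANGLE = {
    "SW1": ["SW2", "SW3"],
    "SW2": ["SW1", "SW3"],
    "SW3": ["SW1", "SW2"],
}
SWITCHES = ["SW1", "SW2", "SW3", "SW4"]
FULL_MESH = {sw: [other for other in SWITCHES if other != sw] for sw in SWITCHES}


def copies_in_flight(topology, origin, rounds):
    """Count copies of one broadcast on inter-switch links after each round.

    A copy is modelled as (sender, receiver). Unlike an IP packet there is
    no TTL, so nothing ever removes a copy from the loop.
    """
    # Round 0: the origin switch floods the frame to all of its neighbours.
    in_flight = [(origin, nbr) for nbr in topology[origin]]
    history = [len(in_flight)]
    for _ in range(rounds):
        next_round = []
        for sender, receiver in in_flight:
            # Flood out every port except the one the frame arrived on.
            for nbr in topology[receiver]:
                if nbr != sender:
                    next_round.append((receiver, nbr))
        in_flight = next_round
        history.append(len(in_flight))
    return history
```

In the triangle the broadcast never dies out: two copies circle forever (`[2, 2, 2, ...]`). In the four-switch full mesh each copy spawns two more every round (`[3, 6, 12, 24, ...]`), which is the storm.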

 

Determining VLANs and MAC Tables

Ethernet LANs may vary in size but always consist of three parts: NICs, cables, and switches.  When a frame arrives on a switch port, before it can be forwarded the switch must first determine what VLAN (virtual LAN) it belongs to.  If it came in on an access port (not from another switch), the port's config indicates the VLAN.  If it came in on a trunk (connected to another switch), the VLAN info is in the 802.1Q tag added to the frame.  If there is no tag, it belongs to the native VLAN (by default VLAN 1).  After the VLAN is determined, the source MAC address is added to the MAC address table (CAM), along with the interface it came in on and its VLAN.  As far as layer 2 is concerned, we now know what is located off of that port!  Next the switch looks up the frame's destination MAC address in the MAC address table for that VLAN.  If it finds a match, it forwards the frame out the matched port (interface), unless that is the port the frame arrived on, in which case it filters (drops) it.  Otherwise, including for broadcasts, it floods the frame out all ports in the frame's VLAN except the one it arrived on.  This includes trunk ports, too.  That's all there is to the switching logic.  As it's executed sometimes thousands of times a second, it's gotta be fast and simple!  On Cisco switches, to view the MAC address table use: show mac address-table dynamic.  The output of that command doesn't list anything specific to STP.  However, since STP affects which MAC addresses are learned on which ports (as it can block frames on certain ports), you may not see entries for certain ports in this list at all.  Then again, missing entries can also have causes unrelated to STP.
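The switching logic above can be sketched in Python as follows. This is a simplified model under a few assumptions: the class names, port names and MAC addresses are made up for illustration, and there is no MAC aging, no STP port state, and no re-tagging on egress.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    src_mac: str
    dst_mac: str
    vlan_tag: Optional[int] = None  # 802.1Q VLAN ID; None means untagged


@dataclass
class Port:
    name: str
    mode: str                       # "access" or "trunk"
    access_vlan: int = 1            # used only when mode == "access"
    native_vlan: int = 1            # untagged frames on a trunk land here
    allowed_vlans: frozenset = frozenset(range(1, 4095))  # trunks only


class Switch:
    """Toy layer 2 switch: VLAN lookup, MAC learning, forward/filter/flood."""

    def __init__(self, ports):
        self.ports = {p.name: p for p in ports}
        self.mac_table = {}  # (vlan, mac) -> port name, like the CAM table

    def frame_vlan(self, port, frame):
        # Step 1: work out which VLAN the frame belongs to.
        if port.mode == "access":
            return port.access_vlan
        return port.native_vlan if frame.vlan_tag is None else frame.vlan_tag

    def ports_in_vlan(self, vlan):
        # Access ports assigned to the VLAN plus trunks that allow it.
        return [
            p.name
            for p in self.ports.values()
            if (p.mode == "access" and p.access_vlan == vlan)
            or (p.mode == "trunk" and vlan in p.allowed_vlans)
        ]

    def receive(self, in_port, frame):
        """Return the list of ports the frame is sent out of."""
        vlan = self.frame_vlan(self.ports[in_port], frame)
        # Step 2: learn the source MAC together with its port and VLAN.
        self.mac_table[(vlan, frame.src_mac)] = in_port
        # Step 3: known destination -> forward, or filter if it is the same port.
        out_port = self.mac_table.get((vlan, frame.dst_mac))
        if out_port is not None:
            return [] if out_port == in_port else [out_port]
        # Step 4: unknown destination or broadcast -> flood within the VLAN.
        return [p for p in self.ports_in_vlan(vlan) if p != in_port]
```

For example, with Fa0/1 and Fa0/2 in VLAN 10, Fa0/3 in VLAN 20 and trunk Gi0/1, a first frame from aaaa.aaaa.aaaa on Fa0/1 to an unknown MAC floods to Fa0/2 and Gi0/1 only; a tagged VLAN 10 reply arriving on Gi0/1 then goes straight back out Fa0/1.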

A good way to determine what VLAN an interface is on is to use show interfaces status.  Its left-most column shows the port, while a column in the middle shows the associated VLAN.  It also marks which ports are trunks and thus, by default, carry traffic for all VLANs.  An alternative, with a differently organized display, is the command: show vlan brief.  On the left it lists the VLANs (numbers and names), and on the right, the access ports that belong to each; trunk ports aren't listed there.  So it depends on whether you want the output ordered by port or by VLAN.  The latter even shows VLAN names, whereas the former only shows numbers.  Alternatively, if you only care about trunks, issue show interfaces trunk.  Its output is kinda verbose (with headings that are long English sentences).  It shows which VLANs are allowed and active on each trunk, including which are in the STP forwarding state, in case someone has configured it specifically for load sharing.

STP is also known by the anti-human robot term 802.1D (just see how many such terms you can memorize)!  Its updated version, RSTP, is also known by a robotic term: 802.1w.  Some high priests believe naming things with seemingly random numbers and letters is wiser than using labels such as STP5 or similar.  STP concerns itself with two very specific and well-defined goals.  First, not to cut off any part of the LAN from any other part by blocking too many ports.  Second, to keep frames short-lived so they don't loop.  To accomplish its goals, STP adds an additional check to each port: whether it can forward frames or not.  If the port is in the STP forwarding state, it works as usual.  If it's in the blocking state, it doesn't forward user frames; it only processes BPDUs, which the switches use to exchange Hellos and learn the cost of links (distances) between them.  BPDUs (Bridge Protocol Data Units) are also used to elect a root bridge with the Spanning Tree Algorithm (STA).
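Here is a tiny Python sketch of that extra per-port check. It is an illustration under simplifying assumptions, not any vendor's implementation: the function names are made up, and it already includes the Listening and Learning states covered in the Delays section further down.

```python
from enum import Enum


class PortState(Enum):
    BLOCKING = "blocking"
    LISTENING = "listening"
    LEARNING = "learning"
    FORWARDING = "forwarding"


def stp_allows_user_frames(state: PortState) -> bool:
    """The extra STP check: only a Forwarding port passes user frames."""
    return state is PortState.FORWARDING


def stp_learns_macs(state: PortState) -> bool:
    """MAC learning starts in Learning and continues in Forwarding."""
    return state in (PortState.LEARNING, PortState.FORWARDING)


def handle_frame(state: PortState, is_bpdu: bool) -> str:
    """Decide what a port in the given state does with an incoming frame."""
    if is_bpdu:
        # BPDUs always go to the STP process, even on a blocking port.
        return "process BPDU"
    if stp_allows_user_frames(state):
        return "run normal switching logic"
    return "discard"
```

The normal switching logic only ever sees frames that pass this check, which is how STP can cut a loop without changing the switching logic itself.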

 

STA

The STA ensures there is only one active path between any two switches in the tree of Ethernet LAN switches and blocks all other redundant ports.  This means you don't get to use them for added bandwidth, since the redundant links will be idling.  Within a single spanning tree, the way around this is EtherChannel.  STA does its job by first electing a root bridge, or in today's lingo, a root switch.  Most of the info on STP still refers to bridges, so beware of that, as there aren't many bridges in the modern ecosystem.  When the STA elects a root bridge, all of its ports are placed into the forwarding state: no blocking happens at the root!  Imagine it as the base of a tree.  The water goes up from the root to all the other branches to reach the leaves.  All paths from the root must be open or some part of the tree will be cut off from the water supply.  Much the same way, all paths from the root switch must be open to connected switches.

On all non-root switches, the port closest (by cost) to the root switch is called the root port (RP) and is placed into a forwarding state.  Next, each segment between switches gets exactly one switch allowed to forward onto it.  The switch on that segment with the lowest root cost (closest to the root switch by cost, with ties going to the lower BID) is called the designated switch, and its port on that segment is called the designated port (DP).  It is placed into a forwarding state.  Since the root's own root cost is 0, every port on the root switch is a DP.  All other ports are placed into a blocking state.  That's right!  No traffic will flow down anything in the blocking state, except for BPDUs of course.  Because blocked ports still receive BPDUs, they can move to the forwarding state during topology changes to take over for a failed link, a failed switch, or after another configuration change.
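The role rules above can be sketched in Python for a hypothetical triangle of 1 Gbit links (cost 4, per the BPDUs and BIDs section). This is a centralized sketch, not how switches actually work: it computes root costs with Dijkstra's algorithm instead of learning them from BPDUs, and its tie-breakers (neighbour BID, then our own port name) are a simplification of the real ones. Switch names, ports and BIDs are made up.

```python
import heapq

# Hypothetical triangle of 1 Gbit links (cost 4 each); SW1 has the lowest BID.
BIDS = {
    "SW1": (32768, 0x0000_0000_0001),
    "SW2": (32768, 0x0000_0000_0002),
    "SW3": (32768, 0x0000_0000_0003),
}
LINKS = [  # (switch_a, port_a, switch_b, port_b, cost)
    ("SW1", "Gi0/1", "SW2", "Gi0/1", 4),
    ("SW1", "Gi0/2", "SW3", "Gi0/1", 4),
    ("SW2", "Gi0/2", "SW3", "Gi0/2", 4),
]


def root_costs(bids, links):
    """Root switch plus each switch's lowest cost to it (Dijkstra's algorithm)."""
    root = min(bids, key=bids.get)
    adjacency = {sw: [] for sw in bids}
    for a, _, b, _, cost in links:
        adjacency[a].append((b, cost))
        adjacency[b].append((a, cost))
    best = {root: 0}
    queue = [(0, root)]
    while queue:
        cost, sw = heapq.heappop(queue)
        if cost > best[sw]:
            continue  # stale queue entry
        for nbr, link_cost in adjacency[sw]:
            if cost + link_cost < best.get(nbr, float("inf")):
                best[nbr] = cost + link_cost
                heapq.heappush(queue, (best[nbr], nbr))
    return root, best


def port_roles(bids, links):
    """Map (switch, port) to "RP", "DP" or "blocking"."""
    root, cost = root_costs(bids, links)
    roles = {}
    # Root port: on every non-root switch, the port with the lowest cost to the
    # root (ties broken by the neighbour's BID, then our own port name).
    for sw in bids:
        if sw == root:
            continue
        options = [(cost[b] + c, bids[b], pa) for a, pa, b, _, c in links if a == sw]
        options += [(cost[a] + c, bids[a], pb) for a, _, b, pb, c in links if b == sw]
        roles[(sw, min(options)[2])] = "RP"
    # Designated port: on every link, the end whose switch has the lower root
    # cost (ties go to the lower BID). Every remaining port is blocking.
    for a, pa, b, pb, _ in links:
        if (cost[a], bids[a]) < (cost[b], bids[b]):
            designated, other = (a, pa), (b, pb)
        else:
            designated, other = (b, pb), (a, pa)
        roles[designated] = "DP"
        roles.setdefault(other, "blocking")
    return roles
```

In this triangle both of SW1's ports are DPs, SW2 and SW3 each use Gi0/1 as their RP, and on the SW2 to SW3 link the tie in root cost (4 vs 4) goes to SW2's lower BID, so SW3 Gi0/2 ends up blocking.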

BPDUs and BIDs

BPDUs originate at the root switch, and every other switch updates them as it relays them.  The root path cost field is set to 0 at the root, and each subsequent switch adds the cost of the port on which it received the BPDU.  The default port costs are 100 for 10 Mbit, 19 for 100 Mbit, 4 for 1 Gbit, and 2 for 10 Gbit.  These numbers decrease for higher speeds, which is counter-intuitive, and the scale leaves little room as speeds rise tremendously in the future.  It is useful to memorize these specific numbers, as they come in handy for troubleshooting!  So if three switches are chained together by two 1 Gbit links, the root cost at the third switch is 8 (4+4).  STA doesn't care about the number of hops; it cares only about the cost of the links, which by default reflects their speed.  A path through 3 switches may have a lower cost than a path through 2.
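The cost arithmetic above can be sketched in Python. The cost table is the one listed above; the function names and the example paths are hypothetical.

```python
# Default port costs from the text, keyed by link speed in Mbit/s.
DEFAULT_PORT_COST = {10: 100, 100: 19, 1_000: 4, 10_000: 2}


def root_costs_along_chain(link_speeds_mbps):
    """Root path cost seen at each switch down a chain, starting at the root.

    The root advertises 0; each switch adds the cost of the port it received
    the BPDU on before relaying it further.
    """
    costs = [0]
    for speed in link_speeds_mbps:
        costs.append(costs[-1] + DEFAULT_PORT_COST[speed])
    return costs


def path_cost(link_speeds_mbps):
    """Total cost of a path: only link costs count, not the number of hops."""
    return root_costs_along_chain(link_speeds_mbps)[-1]
```

The three-switch chain from the text gives `[0, 4, 8]`, and a path over three 1 Gbit links (cost 12) beats a shorter path over two 100 Mbit links (cost 38).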

Another value held in the BPDU is the BID (bridge ID).  It is 8 bytes long, consisting of a 2-byte priority field (which should really be called the distance) and the switch's 6-byte MAC address.  BIDs govern how the root switch is elected.  The switch with the lowest BID wins (the opposite of how auctions work!).  The reason I say that the priority (the first 2-byte value) should be called distance is that the lower the value, the better the chance of winning the election.  In everyday use a higher priority means more important, but the designers have created yet another counter-intuitive system in which the lower the priority, the better.  Should two switches have the same priority, the lowest MAC address wins.  This is also counter-intuitive, as older gear on a network is likely to have a lower MAC address.  This means when you add a brand new, more powerful switch, it likely won't win the election!  How sad of a design choice is that?  But fear not, the priority of the BID can be adjusted.  Its default value is 32,768, and it must be set in increments of 4096 up to a maximum of 61,440 (another seemingly silly design decision, which comes from the extended system ID borrowing the low 12 bits of the priority field to carry the VLAN ID).  Two other values of note are 24,576 and 28,672, which IOS sets automagically when you configure STP on Catalyst with the spanning-tree vlan <vlan-id> root primary and root secondary commands, respectively.
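A BID can be sketched in Python as an 8-byte value: 2 bytes of priority followed by 6 bytes of MAC. Because both parts are fixed-width and big-endian, comparing the packed bytes compares priority first and MAC second, which is exactly the "lowest BID wins" rule. The function name and MAC addresses are hypothetical, and the extended system ID (VLAN ID in the low bits of the priority) is not modeled.

```python
PRIORITY_STEP = 4096
MAX_PRIORITY = 61_440


def make_bid(priority: int, mac: str) -> bytes:
    """Pack a BID as 2-byte priority + 6-byte MAC (both big-endian)."""
    if priority % PRIORITY_STEP or not 0 <= priority <= MAX_PRIORITY:
        raise ValueError("priority must be a multiple of 4096 from 0 to 61440")
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace(".", ""))
    if len(mac_bytes) != 6:
        raise ValueError("a MAC address is 6 bytes")
    return priority.to_bytes(2, "big") + mac_bytes


old_switch = make_bid(32768, "00:0a:00:00:00:01")  # older, lower MAC
new_switch = make_bid(32768, "00:1f:00:00:00:01")  # newer, higher MAC
tuned_new = make_bid(24576, "00:1f:00:00:00:01")   # newer, priority lowered
```

With equal default priorities `min(old_switch, new_switch)` picks the old switch, but once the new switch's priority is lowered to 24,576, `min(old_switch, tuned_new)` picks the new one.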

 

Root Election Process

The election process in general is quite simple.  At the onset, all switches believe they are root, so they all send BPDUs claiming so (egotistical, eh?).  Then, when they hear of another switch with a better (lower) BID, they stop advertising their inferior selves (a technical term) and start relaying the superior BPDU.  The BPDU carries the root's BID, which includes its MAC address, so the switches know who the root is.  After convergence, all the switches agree on one switch to be the root, since they all run the same STA and are governed by the same rules.  No chance for opposition at all!  :)
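The election can be sketched in Python as rounds of BPDU exchange in which every switch starts by claiming to be root and adopts any better root BID a neighbour advertises. This is a simplification that ignores root path cost, timers and BPDU aging; the switch names and BIDs are hypothetical (priority, MAC) tuples.

```python
def elect_root(bids, neighbours):
    """Relay the best root BID heard so far until no switch changes its mind."""
    believed_root = dict(bids)  # at the onset, every switch claims to be root
    rounds = 0
    changed = True
    while changed:
        changed = False
        rounds += 1
        # Snapshot what every switch advertised during this Hello interval.
        advertised = dict(believed_root)
        for sw, nbrs in neighbours.items():
            best_heard = min(advertised[n] for n in nbrs)
            if best_heard < believed_root[sw]:
                # Stop claiming an inferior root; relay the superior one.
                believed_root[sw] = best_heard
                changed = True
    return believed_root, rounds


# Hypothetical chain A - B - C - D where D has a lowered priority.
BIDS = {"A": (32768, 0xA), "B": (32768, 0xB), "C": (32768, 0xC), "D": (24576, 0xD)}
NEIGHBOURS = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
```

In this chain every switch ends up agreeing on D as root; the news travels one hop per round, so it takes three rounds to reach A plus one final round in which nothing changes.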

 

Link Costs

Remember those link costs I said were counter-intuitive but you had to memorize?  Well, they are also configurable.  So you could, technically, tell a switch which path is redundant, which to prefer, and so forth.  Just be sure you configure the same cost on both ends of the link, as these settings aren't synchronized.  Each switch adds the cost of its own receiving port, so if one end had a cost of 4 and the other a cost of 19, the same link would look cheap in one direction and expensive in the other.  I won't do all the math here, but the resulting topology would likely not be what you intended.
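A minimal Python sketch of the mismatched-cost case, assuming (as in 802.1D) that each switch adds only the cost configured on its own receiving port. The switches, ports and costs are hypothetical: the SW1 to SW2 link has cost 4 on SW1's end but 19 on SW2's end, and SW2 has a second path to the root through SW3.

```python
# Hypothetical ports: cost is configured per port, on each end separately.
PORT_COST = {
    ("SW1", "Gi0/1"): 4,   # SW1's end of the SW1-SW2 link
    ("SW2", "Gi0/1"): 19,  # SW2's end of the same link, configured differently
    ("SW2", "Gi0/2"): 4,   # SW2's link towards SW3
}


def root_cost_via(neighbour_root_cost, receiving_port):
    """Root cost a switch computes for a BPDU heard on receiving_port.

    Only the cost of the switch's OWN receiving port is added; the cost set
    on the far end of the link plays no part in this calculation.
    """
    return neighbour_root_cost + PORT_COST[receiving_port]


# SW1 is the root (cost 0); SW3 is assumed to be 4 away from it.
sw2_options = {
    "Gi0/1": root_cost_via(0, ("SW2", "Gi0/1")),  # direct link: 0 + 19
    "Gi0/2": root_cost_via(4, ("SW2", "Gi0/2")),  # through SW3: 4 + 4
}
sw2_root_port = min(sw2_options, key=sw2_options.get)
```

So SW2 picks Gi0/2 (cost 8 via SW3 instead of 19 direct), and its end of the direct link blocks, even though SW1's end says that link costs only 4. SW1's value would only matter if the root were on SW2's side.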

 

Delays

During a topology change, such as one caused by a failed link, there is a danger of a temporary loop causing one of those lovely storms I mentioned earlier.  The solution comes in the form of timers.  Three, to be exact.  The Hello timer defaults to 2 seconds and is the interval between BPDU Hellos.  So every two seconds, the root switch sends a BPDU and the other switches propagate it, updating it in the process.  Should Hellos stop arriving, there is a MaxAge timer, which defaults to 20 seconds (ten Hello intervals).  Thus, if a switch goes 20 seconds without receiving a Hello, it will run the STA again and change which ports are blocking or forwarding.  However, going from Blocking to Forwarding isn't immediate.  There are two temporary states in between: Listening and Learning.  Each lasts for the Forward Delay timer of 15 seconds, and neither forwards any frames other than BPDUs.  In the Listening state the port doesn't learn MACs yet; the pause gives stale entries in the MAC address table (CAM) time to age out.  In Learning, the switch adds the MACs it hears on the port to the address table, but still doesn't forward frames at all.  All this means that it can take up to 50 seconds (20+15+15) for a port to reach the Forwarding state.  Pretty slow, eh?  That's why Rapid STP was invented; it solves that problem.
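The 50-second worst case can be sketched in Python with the default timers from the text. The function name is hypothetical; it simply lays out when a blocked port enters each state after the last Hello is heard.

```python
HELLO_TIME = 2       # seconds between BPDUs from the root
MAX_AGE = 20         # default MaxAge, i.e. ten Hello intervals
FORWARD_DELAY = 15   # time spent in each of Listening and Learning


def blocked_port_timeline(max_age=MAX_AGE, forward_delay=FORWARD_DELAY):
    """Seconds after the last Hello at which a blocked port enters each state.

    Worst case: the port waits out MaxAge, then spends one Forward Delay in
    Listening and another in Learning before it finally forwards.
    """
    timeline = [(0, "Blocking")]
    t = max_age
    timeline.append((t, "Listening"))
    t += forward_delay
    timeline.append((t, "Learning"))
    t += forward_delay
    timeline.append((t, "Forwarding"))
    return timeline
```

With the defaults this gives Listening at 20 s, Learning at 35 s and Forwarding at 50 s.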

EtherChannel, PortFast, and BPDU Guard

I mentioned EtherChannel earlier.  It essentially treats multiple links as one interface, so STP doesn't react when one of its member links changes state, only when the bundle as a whole does, that is, when the last link in the group fails.  On access ports, there is often no need to go through the laborious Listening and Learning states, so there is a PortFast option for those ports, which eliminates the 30-second delay (two 15-second Forward Delay periods) entirely.  This is of great benefit to DHCP clients such as workstations, which might otherwise time out waiting for an address before their port starts forwarding.  Another feature is called BPDU Guard.  It disables the port (err-disabled, in Cisco terms) if a BPDU arrives on it.  This is useful in corporate environments where it's undesirable to allow a random switch at someone's desk to join the network.  The danger is that this little consumer switch may become the root switch and cause the whole network a lot of problems.  Like PortFast, BPDU Guard is only to be used on access ports.
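These three features can be sketched in Python as follows. It is a hypothetical model, not Cisco's implementation: the class and function names are made up, and "err-disabled" is simply stored as a state string.

```python
from dataclasses import dataclass


@dataclass
class AccessPort:
    """Hypothetical model of an access port with PortFast and BPDU Guard."""
    portfast: bool = False
    bpdu_guard: bool = False
    state: str = "down"

    def link_up(self):
        # PortFast skips Listening and Learning (two Forward Delays, 30 s).
        self.state = "forwarding" if self.portfast else "listening"

    def receive_bpdu(self):
        # A BPDU on an access port means a switch was plugged in there.
        if self.bpdu_guard:
            self.state = "err-disabled"


def etherchannel_is_up(member_links_up):
    """STP sees the bundle as one link: it stays up while any member is up."""
    return any(member_links_up)
```

A workstation port with both options starts forwarding as soon as the link comes up and shuts itself down the moment a rogue switch sends a BPDU, while a bundle of four links only looks down to STP once all four have failed.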

Now let STP do its work and go check out MOPP's Dream About You!