Arista : VARP Configuration

Virtual-ARP or VARP is a routing technique that allows multiple switches or routers to simultaneously route packets from a common Virtual IP (VIP) address in an active/active switch/router configuration. Each switch or router is configured with the same VIP address on the corresponding VLAN interfaces (SVI) and a common virtual MAC address. In MLAG topologies, VARP is preferred over VRRP because VARP does not require traffic to traverse the peer-link to the master router as VRRP would.

A maximum of 500 VIP addresses can be assigned to a single VLAN interface, and all virtual addresses on all VLAN interfaces resolve to the same virtual MAC address. You cannot, however, have a secondary VIP on the same VLAN interface, although you can implement VRRP on the same VLAN interface as VARP.

VARP functions by having each switch respond to ARP and GARP requests for the configured router IP address with the virtual MAC address. The virtual MAC address is only for inbound packets and never used in the source field of outbound packets.

The following commands configure 10.10.10.1 as the virtual IP address for VLAN 10 and 1010.1010.1010 as the virtual MAC address on both switches. The virtual-router MAC address is entirely invented by you. I had a real issue finding clarification that it is just a made-up MAC address, so 1010.1010.1010 is simply the one I invented. I also ran into an issue where ip routing had to be enabled.

Here is what the Topology would look like:

Configuration that implements VARP on the first switch

TPW-SW1(config)#ip virtual-router mac-address 1010.1010.1010

TPW-SW1(config)#interface vlan 10

TPW-SW1(config-if-vl10)#ip address 10.10.10.2/24

TPW-SW1(config-if-vl10)#ip virtual-router address 10.10.10.1

Configuration that implements VARP on the second switch

TPW-SW2(config)#ip virtual-router mac-address 1010.1010.1010

TPW-SW2(config)#interface vlan 10

TPW-SW2(config-if-vl10)#ip address 10.10.10.3/24

TPW-SW2(config-if-vl10)#ip virtual-router address 10.10.10.1
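
The addressing in the two configurations above can be sanity-checked before it goes anywhere near a switch. This is only a minimal sketch using Python's ipaddress module. The helper name check_varp is my own (it is not an Arista tool), and the rule that the VIP should sit in the same subnet as each SVI address is an assumption based on the example above.

```python
import ipaddress

def check_varp(vip, svi_addresses):
    """Return a list of problems found in a VARP addressing plan.

    vip           -- the shared virtual IP, e.g. "10.10.10.1"
    svi_addresses -- mapping of switch name to its SVI address in CIDR form
    (Hypothetical helper: the checks reflect the example in this post.)
    """
    problems = []
    virtual_ip = ipaddress.ip_address(vip)
    seen = {}
    for switch, cidr in svi_addresses.items():
        iface = ipaddress.ip_interface(cidr)
        # Assumed rule: the VIP lives in the same subnet as the SVI.
        if virtual_ip not in iface.network:
            problems.append(f"{switch}: VIP {vip} is outside {iface.network}")
        # Each switch keeps its own real address next to the shared VIP.
        if iface.ip == virtual_ip:
            problems.append(f"{switch}: SVI address must differ from the VIP")
        if iface.ip in seen:
            problems.append(f"{switch}: duplicates the address of {seen[iface.ip]}")
        seen[iface.ip] = switch
    return problems

plan = {"TPW-SW1": "10.10.10.2/24", "TPW-SW2": "10.10.10.3/24"}
print(check_varp("10.10.10.1", plan))  # an empty list means no problems found
```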

 

The Packet Wizard : Spanning Tree Explained

Spanning Tree Protocol also known as STP

There are many different types of STP but here are a couple of the main ones

STP/802.1D – Original STP
PVST+ – Cisco Improved STP adding per VLAN feature
RSTP/802.1w – Improved STP with a much faster convergence time (Rapid Spanning Tree)
Rapid PVST+ – Cisco improved RSTP adding per VLAN feature

Why Per VLAN STP?
If you have a large network with lots of switches and VLANs, you can use Per VLAN STP to plan for a more efficient network, because each VLAN can have its own Root Bridge and its own blocked ports.

Even though there are many versions of STP, they all use a very similar set of rules.

What is STP?

STP is a feature used to prevent loops when you are using redundant switches. Without STP, a loop could form and cause a number of problems on the network.

When a switch receives a broadcast or unknown unicast frame (which happens all the time), it forwards the frame out of every port except the one it came in on. Therefore if SW1 sends a frame out and SW2 and SW3 receive it, SW2 and SW3 will each forward it out of all ports except the one it came in on. SW2 forwards to SW3, SW3 forwards to SW2, each then passes the copy it received on to SW1, and SW1 floods it again, so you can see how the loop begins to form. This is known as a broadcast storm, and it can exhaust a switch's CPU and memory very quickly.

The second problem is the MAC address table changing all the time as the switch receives frames. For example, SW1 sends a broadcast message, and SW2 and SW3 receive it and forward it out of all other ports as in the scenario above. Each switch learns the source MAC address against the port the frame arrived on. Because the looped frames eventually get back to SW1 on different ports, the table constantly changes from "I know about this MAC on this port" to "I now know about this MAC via SW3 on that port", which causes unstable MAC address tables.

Another issue is explained below.

HOST1 sends data to HOST2. Since SW2 does not yet know which port leads to HOST2, it sends the frame out of all ports, thus sending to both SW1 and SW3. HOST2 therefore receives the frame from HOST1 via SW3 and then again via SW1 > SW3. This is known as Duplicate Frames.

So how do we fix the issues mentioned above? That's right: Spanning Tree Protocol, which blocks one of the redundant paths.
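
To make the loop problem concrete, here is a small sketch of the flooding rule (forward out of every port except the one the frame came in on) on a three-switch triangle, with and without one redundant link blocked. It is a toy model under my own assumptions: only the inter-switch links are modelled, every hop takes one round, and the function name flood is hypothetical.

```python
def flood(topology, origin, rounds):
    """Count frame copies in flight after each forwarding round.

    topology -- mapping of switch name to the list of neighbouring switches
    origin   -- switch that sends the initial broadcast
    """
    # Each copy is tracked as (sender, receiver).
    in_flight = [(origin, neighbour) for neighbour in topology[origin]]
    history = [len(in_flight)]
    for _ in range(rounds):
        next_round = []
        for sender, receiver in in_flight:
            # Flood out of every port except the one the frame arrived on.
            for neighbour in topology[receiver]:
                if neighbour != sender:
                    next_round.append((receiver, neighbour))
        in_flight = next_round
        history.append(len(in_flight))
    return history

# Full triangle: every switch is connected to the other two.
looped = {"SW1": ["SW2", "SW3"], "SW2": ["SW1", "SW3"], "SW3": ["SW1", "SW2"]}
# Same switches with the SW2-SW3 link blocked.
blocked = {"SW1": ["SW2", "SW3"], "SW2": ["SW1"], "SW3": ["SW1"]}

print(flood(looped, "SW1", 5))   # [2, 2, 2, 2, 2, 2]: the copies never die out
print(flood(blocked, "SW1", 5))  # [2, 0, 0, 0, 0, 0]: flooding stops after one hop
```

Ethernet frames carry no TTL, so in the looped case nothing ever removes the two circulating copies. Blocking a single link is enough to stop them.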

The question now becomes how do the switches decide on which port to block? STP follows strict rules when deciding which ports to block.

1) Elect a Root Bridge (ROOT)
2) Place root interfaces into forwarding (FWD)
3) Select Root Port on non-Root Bridge Switches (RP) – this is the best route to the Root Bridge.
4) Non Root Switches decide on a Designated Port (DP)
5) All other ports put into Blocking State (BLK)

On per VLAN STP You could have this on VLAN 10

and this on VLAN 20

I will now cover the port roles and the port states so you know what each is:

ROLES
Root Ports : The best port to get to the Root Bridge

Designated Ports : The port on each link (segment) that offers the lowest cost path to the Root Bridge.
Non Designated Ports : All other ports, which are in blocking mode.

STATES
Disabled : A Port is shutdown
Blocking : A Port that is blocking traffic
Listening : A Port that is not forwarding and not learning MAC addresses
Learning: A Port that is learning MAC addresses but is not forwarding traffic
Forwarding : A Port that is sending and receiving traffic as normal

When a port changes from one Role to another it will go through the Port States. Note also that the Listening and Learning states are transitional and a port won't stay in either.

Root Bridge Election

Switches send each other messages called Bridge Protocol Data Units (BPDUs). These BPDUs contain specific information pertaining to each switch, such as Root Cost and the Bridge ID (BID) for itself and for the Root. A BID is made up of the STP Priority and the MAC address, so the default BID on SW1 would be 32769:1111.1111.1111, where 32769 is the default STP priority for VLAN 1 (32768 plus the VLAN ID) and 1111.1111.1111 is the MAC address. The switch with the lowest BID will become the Root Bridge. This is what it looks like before the Root Bridge Election and the exchange of the BPDUs:

This is what it looks like after, when the lowest BID wins.
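
The election itself boils down to comparing BIDs, priority first and MAC address second. The sketch below shows that comparison. Only the SW1 values come from this post. The MAC addresses for SW2 and SW3 and the function names are made up for illustration.

```python
def bridge_id(priority, mac):
    """Build a comparable Bridge ID: priority first, then the MAC as an integer."""
    mac_value = int(mac.replace(".", "").replace(":", ""), 16)
    return (priority, mac_value)

def elect_root(switches):
    """Return the name of the switch with the lowest Bridge ID."""
    return min(switches, key=lambda name: bridge_id(*switches[name]))

# SW2 and SW3 MAC addresses are hypothetical.
switches = {
    "SW1": (32769, "1111.1111.1111"),
    "SW2": (32769, "2222.2222.2222"),
    "SW3": (32769, "3333.3333.3333"),
}
print(elect_root(switches))  # SW1: priorities tie, so the lowest MAC wins

# Lowering the priority on SW3 overrides the MAC address tie-break.
switches["SW3"] = (4097, "3333.3333.3333")
print(elect_root(switches))  # SW3
```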

The ports on each switch now transition into their respective states following the STP Rules as mentioned above.

The port roles can change based on the cost of each link. The port costs are listed below. In this example we will just be using Gig ports, but for clarity a FastEthernet port is slower than a GigabitEthernet port, and the faster the port the lower the cost. The Root Port (RP) is the port with the lowest total cost to the Root Bridge.

Data rate          STP cost        RSTP cost
(link bandwidth)   (802.1D-1998)   (802.1W-2004, default value)
4 Mbit/s           250             5,000,000
10 Mbit/s          100             2,000,000
16 Mbit/s          62              1,250,000
100 Mbit/s         19              200,000
1 Gbit/s           4               20,000
2 Gbit/s           3               10,000
10 Gbit/s          2               2,000
100 Gbit/s         N/A             200
1 Tbit/s           N/A             20

This is a quick diagram of how the port costs are worked out to get back to the Root Bridge. SW2 to get to SW1 is 0+4=4 and SW2 via SW3 to SW1 is 4+4=8
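
The same arithmetic as a short sketch: add up the cost of every link on the way to the Root Bridge and pick the cheapest path. The costs are the 802.1D-1998 values from the table above. The function name and the path labels are my own.

```python
# Link speed in Mbit/s mapped to the 802.1D-1998 STP cost.
STP_COST = {10: 100, 100: 19, 1000: 4, 10000: 2}

def root_path_cost(link_speeds_mbps):
    """Sum the STP cost of every link on a path to the Root Bridge."""
    return sum(STP_COST[speed] for speed in link_speeds_mbps)

# SW2's two candidate paths to SW1 (the Root Bridge), all Gig links.
paths = {
    "direct to SW1": [1000],
    "via SW3": [1000, 1000],
}
costs = {name: root_path_cost(links) for name, links in paths.items()}
print(costs)                      # {'direct to SW1': 4, 'via SW3': 8}
print(min(costs, key=costs.get))  # direct to SW1, so that port becomes the Root Port
```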

Of course there can be ties between multiple connections and STP can be tuned.

Designated Ports are selected by Root Cost, then by lowest BID, and then by lowest numbered interface. Therefore in the diagram above the Designated Port would be GigEth1 on SW3, since it is a lower numbered interface than SW2 GigEth2.

All ports that are not Root Ports or Designated Ports are Blocking Ports.

STP Convergence Times

STP:
BPDU/Hello time = 2 secs – Hello messages to each switch to see it's still there
Max Age = 20 secs – How long a switch will wait for a response to the Hello message
Listening = 15 secs
Learning = 15 secs

= 52 secs to convergence

From the time a link goes down to convergence it takes a total of 52 seconds. When STP was designed that was fine, but now this is much too slow, which is where Rapid Spanning Tree comes in.

RSTP:
3 missed BPDU/Hello at 2 sec each = 6 secs
Learning (no listening) = 15 secs

= 21 secs to convergence.
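
The two totals above are just sums of the default timers, which the sketch below reproduces. The function names are my own, and the figures are the worst-case timer sums used in this post rather than measured values.

```python
def stp_convergence(hello=2, max_age=20, listening=15, learning=15):
    """Worst-case 802.1D convergence as summed in this post."""
    return hello + max_age + listening + learning

def rstp_convergence(hello=2, missed_hellos=3, learning=15):
    """RSTP figure used in this post: three missed hellos plus learning."""
    return missed_hellos * hello + learning

print(stp_convergence())   # 52
print(rstp_convergence())  # 21
```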

I hope this has given you a good explanation of STP.

 

The Packet Wizard : DHCP Troubleshooting

In today's scenario, I am going to walk through some changes I made, and the troubleshooting steps I used, when I recently moved an old SSID/subnet off a legacy wireless network onto a new network that requires RADIUS authentication, keeping the same IP space and SSID.

These steps can be applied to many different scenarios for troubleshooting DHCP. I just made these ones specific since it was something I recently had to troubleshoot.

Here is a basic diagram of the setup, showing all the moving parts would be overkill for the diagram. The steps on what to do and troubleshooting are below the diagram.

What you will need:

Authentication Server IP

Authentication Secret Key

DHCP Server IP

Subnet and Mask that is being moved

SSID/Subnet being moved

Work and/or troubleshooting that needs to be done:

  1. Add the VLAN to the switches required
  2. Add the virtual interface on the firewall (gateway)
  3. Trunk the new vlan to the switch and configure the ports
  4. Setup DHCP helper to point to the DHCP server
  5. Allow DHCP traffic from the new subnet to the DHCP server
  6. Configure Radius on new Network
  7. Configure new SSID and network settings on Wireless LAN Controller

The Packet Wizard : Link Aggregation Group

The image above shows a link aggregation group between two switches. We use Link Aggregation Groups (LAGs) because they allow you to combine multiple physical network connections into a single load-sharing path, increasing throughput beyond what a single connection could support, and they provide redundancy in case one of the links should fail.
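
Load sharing on a LAG is usually done per flow rather than per packet. The sketch below illustrates the idea by hashing a source and destination MAC pair onto one of the member links. It is only an illustration under my own assumptions: real switches choose their own hash inputs and algorithm, CRC32 is a stand-in, and the link names and MAC addresses are made up.

```python
import zlib

def pick_link(src_mac, dst_mac, links):
    """Pick a LAG member link for a flow by hashing its MAC pair.

    The same flow always hashes to the same link, so its frames stay in
    order, while different flows spread across the available links.
    """
    key = f"{src_mac}-{dst_mac}".encode()
    return links[zlib.crc32(key) % len(links)]

links = ["link-1", "link-2"]  # hypothetical member links
flow = ("aaaa.aaaa.0001", "bbbb.bbbb.0001")

print(pick_link(*flow, links))
# If a member fails, the flow is simply rehashed onto the links that are left.
print(pick_link(*flow, ["link-2"]))
```

One consequence of this design is that a single flow never gets more bandwidth than one member link. The extra throughput shows up when many flows share the LAG.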

You can read how to configure LAGs on Ruckus switches here:
Ruckus : Configure Link Aggregation Groups

Ruckus : Configure Link Aggregation Group

This is how to build a Link Aggregation Group on the Ruckus 7150. It is slightly different on the 7250’s.

 

tpwsw1# conf t

 

Configure the Link Aggregation Group. There are multiple LAG types and they must match on both sides of the LAG. Other vendors may use different names for the same thing; here are the common ones:

Ruckus LAG Type    Other Vendor Type
Static             On
Dynamic            Active

Configure a static LAG.

tpwsw1(config)# lag <name-of-the-lag> static id 1

 


Configure a dynamic LAG.

tpwsw1(config)# lag <name-of-the-lag> dynamic id 1

 

 

The LAG ID can be automatically generated and assigned to a LAG using the auto option.

tpwsw1(config)# lag <name-of-the-lag> dynamic id auto

 

The Link Aggregation Group IDs are unique for each LAG on the switch. The LAG ID can’t be assigned to more than one LAG. If a LAG ID is already used, the CLI will reject the new LAG configuration and display an error message that suggests the next available LAG ID that can be used.
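
The ID rules above can be modelled in a few lines. This is a toy model of the behaviour described, not Ruckus code: the function name is hypothetical and the upper limit of 256 IDs is an assumption you should check against your platform.

```python
def allocate_lag_id(used_ids, requested="auto", max_id=256):
    """Model LAG ID assignment: unique IDs, with an 'auto' option.

    used_ids  -- set of LAG IDs already configured on the switch
    requested -- an explicit ID, or "auto" to take the next free one
    max_id    -- assumed upper limit, not taken from vendor documentation
    """
    next_free = next((i for i in range(1, max_id + 1) if i not in used_ids), None)
    if next_free is None:
        raise ValueError("no LAG ids left")
    if requested == "auto":
        return next_free
    if requested in used_ids:
        # Mirrors the CLI behaviour: reject and suggest the next available ID.
        raise ValueError(
            f"LAG id {requested} is already in use; next available id is {next_free}"
        )
    return requested

print(allocate_lag_id({1, 2}))      # 3
print(allocate_lag_id({1, 2}, 10))  # 10
```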

Once the LAG is built you have to add ports to the LAG.

tpwsw1(config-lag-<name-of-the-lag>)# ports ethernet 1/2/7 ethernet 1/2/8

 

The Packet Wizard : Work Travel

I am home! I have been travelling for work for the best part of the past 5 weeks. I was in Boston doing a network refresh the week before Easter, which included replacing all the network cables, installing new Palo Alto firewalls and removing the Cisco ASAs. I also removed all the Cisco switches, installed a new stack of Ruckus 7250s and replaced the core switches with 2 new Aristas. I then came home for 2 days and left again for Singapore for 3 weeks. I was in Singapore integrating a new company we bought into our network. This was a team effort as we had other sites to bring online within 48 hours: Copenhagen and a small site in Kaohsiung, Taiwan. I have learned a lot over the past 2 months. I have some articles to write on what I have learned, but for now I just wanted to give a quick update. Here is some cable porn from the Boston Network Refresh.

Before:

After:

 

 

Arista : MLAG Setup

I have recently been setting up some Arista switches for a network refresh at our Boston site.

MLAG is short for Multi-Chassis Link Aggregation. It allows more than one switch, usually two, to act like one logical switch toward the devices connected to them, so a downstream device can build a single port channel across both switches. It also helps with redundancy and diverse paths. It's an awesome technology. Here is the basic MLAG topology:

1. Create Port Channel For Peer Links

I am using 2 Arista DCS-7150S-24-R switches with 2 10Gb Ethernet as our MLAG peer links. On each switch we will create a port channel 1000

 tpwsw1# config t
 tpwsw1(conf)#interface e23-24
 tpwsw1(config-if-Et23-24)# channel-group 1000 mode active
 tpwsw1(config-if-Et23-24)# interface port-channel 1000
 tpwsw1(config-if-Po1000)# switchport mode trunk

 

2. Create a VLAN for Peer MLAG Communication

You need to create a separate VLAN for MLAG communication and assign it the mlag-peer trunk group and disable spanning-tree on the VLAN. This step is done on both switches.

 tpwsw1(conf)#vlan 4094
 tpwsw1(config-vlan-4094)# trunk group mlag-peer
 tpwsw1(config-vlan-4094)# interface port-channel 1000
 tpwsw1(config-if-Po1000)# switchport trunk group mlag-peer
 tpwsw1(config-if-Po1000)# exit
 tpwsw1(conf)#no spanning-tree vlan 4094

 

 tpwsw2(conf)#vlan 4094
 tpwsw2(config-vlan-4094)# trunk group mlag-peer
 tpwsw2(config-vlan-4094)# interface port-channel 1000
 tpwsw2(config-if-Po1000)# switchport trunk group mlag-peer
 tpwsw2(config-if-Po1000)# exit
 tpwsw2(conf)#no spanning-tree vlan 4094

 

3. Set an IP on each Switch
On VLAN 4094 that was created above, we need to assign it an IP so each switch can communicate over layer 3 with each other.

 

tpwsw1(conf)#int vlan 4094
tpwsw1(config-if-Vl4094)# ip address 1.1.1.1/30

 

tpwsw2(conf)#int vlan 4094
tpwsw2(config-if-Vl4094)# ip address 1.1.1.2/30

***Send some pings to confirm basic connectivity
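
Before sending pings it is worth double-checking that the two peer addresses really are the two usable hosts of the same /30. A minimal sketch with Python's ipaddress module follows. The function name is my own.

```python
import ipaddress

def valid_peer(local_cidr, peer_ip):
    """Check that peer_ip is a usable host in the local subnet and not our own address."""
    iface = ipaddress.ip_interface(local_cidr)
    peer = ipaddress.ip_address(peer_ip)
    # hosts() excludes the network and broadcast addresses of the /30.
    return peer != iface.ip and peer in list(iface.network.hosts())

print(valid_peer("1.1.1.1/30", "1.1.1.2"))  # True: the other host in the /30
print(valid_peer("1.1.1.1/30", "1.1.1.3"))  # False: broadcast address
print(valid_peer("1.1.1.1/30", "1.1.1.5"))  # False: different subnet
```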

 

4. Configure MLAG peering for each switch

 tpwsw1(config)#mlag
 tpwsw1(config-mlag)#local-interface vlan 4094
 tpwsw1(config-mlag)#peer-address 1.1.1.2
 tpwsw1(config-mlag)#peer-link port-channel 1000
 tpwsw1(config-mlag)#domain-id mlagDOMAIN

 

 

 tpwsw2(config)#mlag
 tpwsw2(config-mlag)#local-interface vlan 4094
 tpwsw2(config-mlag)#peer-address 1.1.1.1
 tpwsw2(config-mlag)#peer-link port-channel 1000
 tpwsw2(config-mlag)#domain-id mlagDOMAIN

 

 

5. Verify MLAG Domain
On each switch, run show mlag to see if MLAG is up and running. You can confirm this by seeing state: Active, peer-link status: Up and local-int status: Up.

tpwsw1(config-mlag)#show mlag
MLAG Configuration:
domain-id : mlagDOMAIN
local-interface : Vlan4094
peer-address : 1.1.1.2
peer-link : Port-Channel1000
MLAG Status:
state : Active
negotiation status : Connected
peer-link status : Up
local-int status : Up
system-id : 02:1c:73:1e:97:dc
MLAG Ports:
Disabled : 0
Configured : 0
Inactive : 0
Active-partial : 0
Active-full : 0

 

 

tpwsw2(config-mlag)#show mlag
MLAG Configuration:
domain-id : mlagDOMAIN
local-interface : Vlan4094
peer-address : 1.1.1.1
peer-link : Port-Channel1000
MLAG Status:
state : Active
negotiation status : Connected
peer-link status : Up
local-int status : Up
system-id : 02:1c:73:1e:97:dc
MLAG Ports:
Disabled : 0
Configured : 0
Inactive : 0
Active-partial : 0
Active-full : 0
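
If you have more than a couple of MLAG pairs, eyeballing this output gets old. Here is a rough sketch that parses the text shown above and checks the fields that matter. It assumes the output has already been captured as a string (for example from an SSH session). The function names are my own, and the parser only understands the simple "key : value" layout shown in this post.

```python
def parse_show_mlag(output):
    """Turn 'key : value' lines from show mlag into a dictionary."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        # Section headers such as "MLAG Status:" have nothing after the colon.
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields

def mlag_healthy(fields):
    """True when the state and link fields match the healthy output above."""
    expected = {
        "state": "Active",
        "negotiation status": "Connected",
        "peer-link status": "Up",
        "local-int status": "Up",
    }
    return all(fields.get(key) == value for key, value in expected.items())

SAMPLE = """\
MLAG Configuration:
domain-id : mlagDOMAIN
local-interface : Vlan4094
peer-address : 1.1.1.2
peer-link : Port-Channel1000
MLAG Status:
state : Active
negotiation status : Connected
peer-link status : Up
local-int status : Up
system-id : 02:1c:73:1e:97:dc
"""

fields = parse_show_mlag(SAMPLE)
print(fields["peer-address"], mlag_healthy(fields))  # 1.1.1.2 True
```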

 

You can read more about MLAG here – https://www.arista.com/en/products/multi-chassis-link-aggregation-mlag

A great book to read about Arista is called Arista Warrior. I loved it.

Cisco : Enable SSH on Cisco Switch, Router and ASA

When you first configure a Cisco device, you need to use a console cable and connect directly to the system to access it. Following the SSH setup below will enable SSH access to your Cisco devices, since SSH is not enabled by default. Once you enable SSH, you can then access the device remotely using SecureCRT or any other SSH client.

Set hostname and domain-name

The device has to have a hostname and domain-name, because they are used to name the RSA keys.

switch# config t
switch(config)# hostname tpw-switch
tpw-switch(config)# ip domain-name thepacketwizard.com

Setup Management IP

In the following example, the management ip address will be set to 10.100.101.2 in the 101 VLAN. The default gateway points to the firewall, which is 10.100.101.1

tpw-switch(config)# ip default-gateway 10.100.101.1
tpw-switch(config)# interface vlan 101
tpw-switch(config-if)# ip address 10.100.101.2 255.255.255.0

Generate the RSA Keys

The switch or router should have RSA keys that it will use during the SSH process. So, generate these using crypto command as shown below.

tpw-switch(config)# crypto key generate rsa
  The name for the keys will be: tpw-switch.thepacketwizard.com
  Choose the size of the key modulus in the range of 360 to 2048 for your
    General Purpose Keys. Choosing a key modulus greater than 512 may take
    a few minutes.

How many bits in the modulus [512]: 1024
  % Generating 1024 bit RSA keys, keys will be non-exportable...[OK]

Setup the Line VTY configurations

Setup the following line vty configuration, where input transport is set to SSH only. Set the login to local, and password to 7, and make sure Telnet is not enabled:

tpw-switch(config)# line vty 0 4
 tpw-switch(config-line)# transport input ssh
 tpw-switch(config-line)# login local
 tpw-switch(config-line)# password 7
 tpw-switch(config-line)# exit

If you have not set the console line yet, use the following:

tpw-switch(config)# line console 0
tpw-switch(config-line)# logging synchronous
tpw-switch(config-line)# login local

Create the username password

If you don't have a username created already, here is how:

tpw-switch# config t
Enter configuration commands, one per line.  End with CNTL/Z.
tpw-switch(config)# username thepacketwizard password tpwpassword123
tpw-switch(config)# enable secret tpwenablepassword

Make sure the password-encryption service is turned-on, which will encrypt the password, and when you do “show run”, you’ll see only the encrypted password and not clear-text password.

tpw-switch(config)# service password-encryption

Verify SSH access

From the switch, if you do ‘show ip ssh’, it will confirm that the SSH is enabled on this Cisco device.

tpw-switch# show ip ssh
 SSH Enabled - version 1.99
 Authentication timeout: 120 secs; Authentication retries: 3
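
The "version 1.99" in that output is not a typo. In the SSH protocol, a server that advertises 1.99 accepts both SSH version 1 and version 2 clients. The sketch below decodes the version string from the show ip ssh line. The function names are my own, and the parsing assumes the exact line format shown above.

```python
def parse_show_ip_ssh(line):
    """Pull the protocol version out of 'SSH Enabled - version X'."""
    return line.rsplit("version", 1)[1].strip()

def ssh_protocol_support(version):
    """Map the advertised protocol version to the client versions accepted."""
    if version == "1.99":
        return {1, 2}   # compatibility mode: both v1 and v2 clients
    if version == "2.0":
        return {2}
    if version.startswith("1."):
        return {1}
    raise ValueError(f"unexpected SSH protocol version: {version}")

version = parse_show_ip_ssh(" SSH Enabled - version 1.99")
print(version, ssh_protocol_support(version))  # 1.99 {1, 2}
```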

After the above configurations, login from a remote machine to verify that you can ssh to this cisco switch.

In the example, 10.100.101.2 is the management ip-address of the switch.

TPW-Remote-Computer# ssh 10.100.101.2
 login as: thepacketwizard
 Using keyboard-interactive authentication.
 Password:

tpw-switch>en
 Password:
 tpw-switch#

You are now setup and logged in on SSH!

To read more on SSH visit: https://en.wikipedia.org/wiki/Secure_Shell

Palo Alto : Upgrade High Availability (HA) Pair

Over the last 3 weeks since the Christmas and New Year holidays, I have been upgrading all of our firewalls globally. Many of them are a High Availability pair, which means they are redundant, and being redundant allows me to upgrade them individually while the site stays fully up and functional.

The instructions below for upgrading an HA pair are recommended because:

      • It verifies HA functionality before starting the upgrade.
      • It ensures the upgrade is successfully applied to the first device before starting the upgrade on the second.
      • At any point in the procedure, if any issue arises, the upgrade can be seamlessly reverted without any expected downtime (unless you are running dynamic routing protocols like OSPF/BGP).
      • When finished, the final active/passive device state will be the same as it was before the upgrade with the fewest number of fail overs possible (2).

Before you Begin :

Take a backup of the configuration as well as a Tech Support file from both HA peers. Give proper names to each file. Here is how:

Device > Setup > Operations > Save Named Configuration Snapshot

Device > Setup > Operations > Export Named Configuration Snapshot

Device > Setup > Operations > Export Device State (if the device is managed from Panorama)

Device > Support > Generate Tech Support File, and then download it. (Might be required if any issues)

(Optional but recommended) Disable preemption in the High Availability settings to avoid the possibility of unwanted failovers. The change that disables preemption must be committed on both peers. Likewise, once the upgrade is completed, re-enabling it must be committed on both peers.

To disable preempt, go to

Device > High Availability > Election Settings and uncheck Preemptive.
Then, perform a commit.

 

 

 

If the upgrade is between major versions (4.1 -> 5.0 or 5.0 -> 6.0), it is advisable to disable TCP-Reject-Non-SYN so that sessions can fail over even when they are not in sync. Do this on both firewalls from the CLI:

 # set deviceconfig setting session tcp-reject-non-syn no
 # commit

 

(Optional but recommended) Arrange for Out-of-Band access (console access) to the firewall if possible. This is again to help recover from any unexpected situation where we are unable to log in to the firewall. If you have a Terminal Server, awesome. If not, a simple cell phone tethered to a laptop with RDP is also fine.

 

The Upgrade Process

Suspend Backup Device 

From the CLI

 > request high-availability state suspend

From the GUI

Go to Device > High Availability > Operational Commands  > Suspend local device

Install the new PAN-OS on the suspended device

Device > Software > Install

Reboot the device to complete the install.

When the upgraded device has rebooted, check the dashboard to confirm the version and wait for all the interfaces to come back up green.

If the device is still in suspended state make it functional again

From the CLI

> request high-availability state functional

From the GUI

Go to Device > High Availability > Operational Commands  > Make Local Device Functional

 

Repeat steps on other firewall.

 

Suspend Primary Device

From CLI

> request high-availability state suspend

From the GUI

Go to Device > High Availability > Operational Commands  > Suspend local device.

 

*The Backup Firewall will become Active – it does take 30-45 seconds so don’t panic

 

Install the new PAN-OS on the suspended device:

Device > Software > Install

Reboot the device to complete the install.

When the upgraded device has rebooted, check the dashboard to confirm the version and wait for all the interfaces to come back up green.

If the device is still in suspended state make it functional again

From the CLI

> request high-availability state functional

From the GUI

Go to Device > High Availability > Operational Commands  > Make Local Device Functional

To get the Primary back to being active, suspend the current active firewall (the original Backup firewall).

From the GUI,

Go to Device > High Availability > Operational Commands  > Suspend local device.

Once the Primary becomes active again, make the suspended Backup firewall functional again.

Re-enable TCP-Reject-Non-SYN now that both firewalls are upgraded and their sessions are back in sync (do this on both firewalls):

# set deviceconfig setting session tcp-reject-non-syn yes
# commit

 

Re-enable preemption if you disabled it earlier. The change must be committed on both peers. To re-enable preempt, go to Device > High Availability > Election Settings and check Preemptive. Then, perform a commit.

 

How to Downgrade

If an issue occurs on the new version and a downgrade is necessary:

To revert to the previous PAN-OS version, run the following CLI command:

# debug swm revert

This causes the firewall to boot from the partition in use prior to the upgrade. Nothing will be uninstalled and no configuration change will be made.

 

However please be aware while running this command –

After rebooting from a SWM revert, the configuration that was active before the upgrade will be loaded when the previous partition is activated. Any configuration changes made after the upgrade will not be accounted for and will need to be manually recovered by loading the latest configuration version and committing the changes.

General Troubleshooting : How to determine the proper MTU size with ICMP pings


To find the proper MTU size, you have to run a special ping to the destination address. This is usually the gateway, a local server, or an IP address or domain name on the internet (e.g. thepacketwizard.com). You probably want to start around 1800 and move down 10 each time until you get a ping reply. Once you have a ping reply, start moving back up by 2-5 bytes to find the largest packet size that still gets a reply without being fragmented. Take that value and add 28 to account for the IP header (20 bytes) and the ICMP header (8 bytes). E.g. let's say that 1452 was the largest packet size that got an ICMP reply to your ping. The actual MTU size would be 1480, which is the optimum for the network we're working with. Header size varies depending on what the packet is traversing.

 

1500 Standard MTU

– 20 IP Header

– 24 GRE Encaps.

– 52 IPSec Encap.

– 8 PPPoE

– 20 TCP Header
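
The search and the "add 28" step can be sketched in a few lines. Note that this uses a binary search instead of the step-down-by-10 approach described above; it finds the same answer in fewer pings. The probe function is a stand-in I made up that pretends the path MTU is 1480. In real use it would run a ping with the Don't Fragment flag set and report whether a reply came back.

```python
IP_HEADER = 20    # bytes
ICMP_HEADER = 8   # bytes

def mtu_from_payload(payload):
    """Convert the largest working ping payload into the path MTU."""
    return payload + IP_HEADER + ICMP_HEADER

def largest_payload(probe, low=0, high=1800):
    """Binary-search the largest payload size for which probe(size) is True.

    Assumes probe is True for small sizes and False above some limit.
    """
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid        # mid fits, so try something larger
        else:
            high = mid - 1   # mid is too big
    return low

# Stand-in for a real DF-bit ping: pretend the path MTU is 1480.
def fake_probe(size, path_mtu=1480):
    return size + IP_HEADER + ICMP_HEADER <= path_mtu

payload = largest_payload(fake_probe)
print(payload, mtu_from_payload(payload))  # 1452 1480
```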

 

Windows

ping  (host) (-f) (-l (packet size))

An example would be:

ping  thepacketwizard.com -f -l 1800

(result = "Packet needs to be fragmented but DF set.")

ping thepacketwizard.com -f -l 1472 

(result = reply)

 

The options used are:

      • -f: set “Don’t Fragment” flag in packet
      • -l size: send buffer size

 

Linux

ping (-M do) (-s (packet size)) (host)

An example would be:

ping thepacketwizard.com -M do -s 1800

(result = "Frag needed and DF set" or "message too long")

ping thepacketwizard.com -M do -s 1472

(result = reply)

 

The options used are:

      • -M <hint>: Select Path MTU Discovery strategy. <hint> may be either “do” (prohibit fragmentation, even local one), “want” (do PMTU discovery, fragment locally when packet size is large), or “dont” (do not set DF flag).
      • -s packetsize: Specifies the number of data bytes to be sent. The default is 56, which translates into 64 ICMP data bytes when combined with the 8 bytes of ICMP header data.

 

Mac

ping (-D) (-s (packet size)) (host)

An example would be:

ping thepacketwizard.com -D -s 1800

(result = "sendto: Message too long")

ping thepacketwizard.com -D -s 1462

(result = reply)

 

The options used are:

      • -D: set the “Don’t Fragment” bit
      • -s packetsize: Specify the number of data bytes to be sent. The default is 56, which translates into 64 ICMP data bytes when combined with the 8 bytes of ICMP header data.
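
If you want to script the test, the three variants above differ only in their flags. Here is a small sketch that builds the right command for each operating system. The function name is my own, and the count flags (-n 1 on Windows, -c 1 elsewhere) are an addition so that a scripted ping stops after one packet.

```python
import platform

def df_ping_command(host, payload, system=None):
    """Build a one-packet ping command with the Don't Fragment flag set."""
    system = system or platform.system()
    size = str(payload)
    if system == "Windows":
        return ["ping", "-n", "1", "-f", "-l", size, host]
    if system == "Linux":
        return ["ping", "-c", "1", "-M", "do", "-s", size, host]
    if system == "Darwin":  # macOS
        return ["ping", "-c", "1", "-D", "-s", size, host]
    raise ValueError(f"unsupported system: {system}")

print(" ".join(df_ping_command("thepacketwizard.com", 1472, "Linux")))
# ping -c 1 -M do -s 1472 thepacketwizard.com
```

The returned list can be handed to subprocess.run, and a return code of 0 can be treated as "got a reply".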

There is a lot more to know about MTU; check it out on Wikipedia: Wikipedia – MTU