I'm just trying to share details about the network. I figure that's what people want to see here.
We recently migrated Seattle-ER1 from an RB2011 to a VM on an ESXi host. We did this for several reasons, but most importantly, crypto is much faster on the host's x86 CPU than on the RB2011, which will let us support more VPN terminations. (I believe the RB2011 maxed out at around 18 Mbps of VPN traffic.) Seattle-SRV1 is another VM on the same host.
Today's outage was for a reboot of the ESXi host. Bart is more familiar with the details, but I understand the reboot was required to get the VMs to show up properly in vCenter Server. The reboot briefly took down both VMs, Seattle-ER1 and Seattle-SRV1. While Seattle-ER1 is down, the network routes around it, but services provided only by Seattle-SRV1 remain inaccessible. After a few minutes, everything booted back up and service was restored.
The topology at the Westin looks something like this:
Seattle-ESXi
  Seattle-ER1 (VM)
  Seattle-SRV1 (VM)
Seattle-QueenAnne PtP
Each of these systems participates in OSPF, so if Seattle-ER1 fails, Seattle-SRV1 will begin routing all traffic through Queen Anne. My guess is that this traffic eventually reaches the Internet via our Tukwila datacenter. (The other possibility is that it exits through Corvallis via a VPN tunnel.)
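To illustrate the failover described above, here is a minimal Python sketch of the shortest-path (SPF) calculation that OSPF routers run, using Dijkstra's algorithm over a simplified model of the topology. The link costs, the single `Internet` node, and the Queen Anne to Tukwila path are hypothetical assumptions for illustration (the real path may instead exit via Corvallis). Real OSPF floods link-state advertisements and each router computes its own SPF tree; this sketch only models the path choice.

```python
"""Simplified model of OSPF-style rerouting around a failed router.

Hypothetical: link costs and the Queen Anne -> Tukwila -> Internet path are
made up for illustration, not read from the real network's configuration.
"""
import heapq

# (node, node, cost) for each bidirectional link; costs are hypothetical.
LINKS = [
    ("Seattle-SRV1", "Seattle-ER1", 10),
    ("Seattle-ER1", "Internet", 10),
    ("Seattle-SRV1", "Seattle-QueenAnne", 10),  # the Queen Anne PtP link
    ("Seattle-QueenAnne", "Tukwila", 20),
    ("Tukwila", "Internet", 10),
]


def build_graph(links):
    """Turn the link list into an adjacency map with symmetric costs."""
    graph = {}
    for a, b, cost in links:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def shortest_path(graph, src, dst, down=frozenset()):
    """Dijkstra's algorithm (the SPF calculation OSPF uses), ignoring any
    node in `down`. Returns (cost, path), or (inf, []) if dst is unreachable."""
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    visited = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == dst:
            # Walk the predecessor chain back to the source.
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return cost, path[::-1]
        for nbr, weight in graph.get(node, {}).items():
            if nbr in down or nbr in visited:
                continue  # failed routers contribute no usable links
            new_cost = cost + weight
            if new_cost < dist.get(nbr, float("inf")):
                dist[nbr] = new_cost
                prev[nbr] = node
                heapq.heappush(heap, (new_cost, nbr))
    return float("inf"), []


graph = build_graph(LINKS)
# Normal operation: exit via Seattle-ER1.
print(shortest_path(graph, "Seattle-SRV1", "Internet"))
# (20, ['Seattle-SRV1', 'Seattle-ER1', 'Internet'])

# Seattle-ER1 fails on its own: traffic shifts to the Queen Anne link.
print(shortest_path(graph, "Seattle-SRV1", "Internet", down={"Seattle-ER1"}))
# (40, ['Seattle-SRV1', 'Seattle-QueenAnne', 'Tukwila', 'Internet'])
```

Removing Seattle-ER1 from the graph roughly corresponds to its OSPF adjacencies going down: the next-cheapest path through Queen Anne wins. Adding a hypothetical Corvallis VPN link with its own cost would let you compare the two exit candidates mentioned above.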
For VM failover, we'd need a second ESXi host configured for automatic failover. My preference is to make services redundant rather than systems, so that if a VM or ESXi itself fails, another system at another site continues to provide the service. This is how we have DNS configured, for instance; I bet no one noticed any DNS downtime during this maintenance event. Unfortunately, DNS is a lot simpler to make redundant than a SQL-backed website like hawan.org.
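A minimal Python sketch of the service-level redundancy idea: the client tries replicas of a service at different sites in order and uses the first one that answers, so losing one host (or one whole site) does not take the service down. The server names and the `fake_query` function are hypothetical stand-ins; a real DNS client typically gets this behavior simply from having more than one nameserver configured.

```python
"""Sketch of service-level redundancy: try replicas at different sites.

Hypothetical: the server names and fake_query() stand in for real DNS
servers and a real DNS lookup.
"""
from typing import Callable, Iterable, Tuple


class ServiceUnavailable(Exception):
    """Raised when no replica of the service answers."""


def query_with_failover(
    servers: Iterable[str], query: Callable[[str], str]
) -> Tuple[str, str]:
    """Return (server, answer) from the first replica that responds."""
    errors = []
    for server in servers:
        try:
            return server, query(server)
        except OSError as exc:  # network failures surface as OSError subclasses
            errors.append(f"{server}: {exc}")
    raise ServiceUnavailable("; ".join(errors) or "no servers configured")


# Hypothetical replicas; pretend the Seattle one is down for maintenance.
DOWN = {"ns-seattle"}


def fake_query(server: str) -> str:
    """Simulated lookup: fails for servers in DOWN, answers otherwise."""
    if server in DOWN:
        raise ConnectionRefusedError("no response")
    return f"answer from {server}"


print(query_with_failover(["ns-seattle", "ns-tukwila", "ns-corvallis"], fake_query))
# ('ns-tukwila', 'answer from ns-tukwila')
```

This pattern is easy when every replica can answer on its own, as with DNS. A SQL-backed site also needs its database replicated across sites before a second copy can give correct answers, which is why it is harder to make redundant.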
Tom