After hours of debugging, I got this 20MHz single-polarity link moving up to 55Mbit. A high resolution spectrum analysis on both sides did indeed show a better frequency to use for the link. Spectrum analysis captures here: https://imgur.com/a/4H7GB

There was also an IP conflict between two modems @ Capitol Park. The conflicting IP has been removed from CapitolPark-S3.

The QueenAnne modem used for the link is not one used anywhere else on the network. It's a very low-end modem, and as a result it was having CPU/RAM starvation issues when running our regular diagnostic tools, which led to out-of-memory conditions and kernel crashes. A different test methodology had to be used to verify the link speed (testing through the modem, instead of to the modem).

The modem that links QueenAnne with the Westin building (on the QueenAnne side) had a mis-configured OSPF router-id. This was fixed.

I'm still seeing weird routing decisions being made by OSPF. These are triggered by our point-to-point route entries (44.24.242.0/24 space). More research needs to be done here, and perhaps a re-write of how we define point-to-point interface addresses. Any OSPF experts in the house?

I also discovered R1.QueenAnne was still vulnerable to hacking due to a mis-configuration of its control software. It missed the updates that were sent out to the whole network. This has been fixed now. R1.QueenAnne also didn't have the diagnostic bandwidth-server set up correctly. This was fixed.

With the CapitolPark-QueenAnne link performing well now:

    [eo@CapitolPark-QueenAnne] /system resource> /tool bandwidth-test 44.24.241.81 direction=both
    status: running
    duration: 57s
    tx-current: 28.0Mbps
    tx-10-second-average: 28.4Mbps
    tx-total-average: 27.5Mbps
    rx-current: 27.6Mbps
    rx-10-second-average: 28.0Mbps
    rx-total-average: 27.0Mbps
    lost-packets: 288
    random-data: no
    direction: both
    tx-size: 1500
    rx-size: 1500

its OSPF config has been reset to a normal preference level, so that packets no longer try to avoid that link as they are routed through the network. This link can be sped up by upgrading to a dual-polarity modem @ CapitolPark.

While testing if the OSPF hop cost was being calculated correctly in the Beacon-Haystack-QueenAnne RF link (they both connect to the same dish @ Haystack), I discovered a mis-config on the Haystack.Beacon modem (bad LAN IP binding) which was preventing it from bringing up OSPF on its LAN interface. This was fixed and that modem should act like an actual router now, moving traffic.

During the same Beacon testing, I was reminded that our Baldi-Beacon RF link sucks <https://monitoring.hamwan.net/cacti/graph.php?action=view&local_graph_id=919&rra_id=all>. It was optimized for speed to Tukwila, which is now gone, so a trip needs to be scheduled to Baldi to rotate the dish a few degrees north and get that link going strong.

I normally wouldn't send this kind of verbose email to psdr@, but I hope it's illuminating as to the type and extent of work required to keep this network running well.

--Bart
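As a small cross-check of the figures above, the bandwidth-test status can be parsed and the two directions summed. This is only a sketch: it assumes the "55Mbit" figure refers to transmit and receive combined, and the helper names `parse_rates` and `aggregate_mbps` are made up for illustration, not part of any HamWAN or RouterOS tooling.

```python
import re

# Status text as reported in the message, flattened onto one line.
STATUS = (
    "status: running duration: 57s "
    "tx-current: 28.0Mbps tx-10-second-average: 28.4Mbps tx-total-average: 27.5Mbps "
    "rx-current: 27.6Mbps rx-10-second-average: 28.0Mbps rx-total-average: 27.0Mbps "
    "lost-packets: 288 random-data: no direction: both tx-size: 1500 rx-size: 1500"
)


def parse_rates(text: str) -> dict[str, float]:
    """Return every 'name: <number>Mbps' field as {name: rate in Mbps}."""
    pairs = re.findall(r"([a-z0-9-]+): ([0-9.]+)Mbps", text)
    return {name: float(value) for name, value in pairs}


def aggregate_mbps(text: str) -> float:
    """Sum the total-average rates of both directions (assumed meaning of '55Mbit')."""
    rates = parse_rates(text)
    return rates["tx-total-average"] + rates["rx-total-average"]


if __name__ == "__main__":
    print(f"aggregate: {aggregate_mbps(STATUS):.1f} Mbps")
```

With the numbers in the message this gives 54.5 Mbps, which is consistent with the roughly 55Mbit quoted for the link when both directions are counted together.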
Bart,

I appreciate the summary/detail, almost as much as I appreciate your work fixing things! Coordinating the work of maintaining a larger network like HamWAN is not easy, but notes like this provide a log-book function.

-Randy

On Thu, Mar 29, 2018 at 11:17 PM, Bart Kus <me@bartk.us> wrote:
_______________________________________________ PSDR mailing list PSDR@hamwan.org http://mail.hamwan.net/mailman/listinfo/psdr
I also appreciate this information, and think it is worthwhile to send out to the entire list. It helps everyone understand the complexity and issues involved. It also helps those of us who are new to HamWAN understand how it works, as well as the many considerations that have gone into the design, implementation, and operation of the system long before us noobs showed up.

Carl, N7KUW

From: PSDR [mailto:psdr-bounces@hamwan.org] On Behalf Of Bart Kus
Sent: Thursday, March 29, 2018 11:18 PM
To: Puget Sound Data Ring
Subject: [HamWAN PSDR] CapitolPark <-> QueenAnne link

After hours of debugging, …

…I normally wouldn't send this kind of verbose email to psdr@, but I hope it's illuminating as to the type and extent of work required to keep this network running well.

--Bart
I should also mention that HamWAN Network Operations is not an exclusive club. If you (anybody, not just Carl) find yourself reading this email and thinking "boy, I sure would like to do all that stuff", feel free to reach out to netops@hamwan.org and we can talk about you helping to operate the network. Physical presence in the Puget Sound is not required.

--Bart

On 3/30/2018 9:48 AM, Carl wrote:
Wow, Bart.

Let me look back at my ospf ptp configs. I'm no expert, but I recall it was a process of elimination to find the right settings. Are the ptp neighbors linking ospf in just one direction?

On Thu, Mar 29, 2018, 23:17 Bart Kus <me@bartk.us> wrote:
Moving PSDR@ to BCC so as to not continue spamming people as we debug this.

Ugh, the problem is not presenting itself today in the same way. :( But here was the gist of it as of last night:

    21:17 <@EO_> [eo@QueenAnne.Haystack] /routing ospf lsa> /ip route check 44.24.242.36
    21:17 <@EO_> status: ok
    21:17 <@EO_> interface: wlan1
    21:17 <@EO_> nexthop: 44.24.242.39
    21:17 <@EO_> [eo@QueenAnne.Haystack] /routing ospf lsa> /ip route check 44.24.242.37
    21:17 <@EO_> status: ok
    21:17 <@EO_> interface: ether1
    21:17 <@EO_> nexthop: 44.24.241.37

1) The modem being asked about the route decision is "QueenAnne.Haystack", which is @ Haystack, RF-linked to QueenAnne site and running OSPF with it.
2) The destination IP of 44.24.242.36 is bound to QueenAnne.CapitolPark, which is @ CapitolPark, RF-linked to QueenAnne site and running OSPF with it.
3) The destination IP of 44.24.242.37 is bound to CapitolPark.QueenAnne, which is @ QueenAnne, RF-linked to CapitolPark site and running OSPF with it.

In other words, .36 and .37 are 2 sides of an RF link between QueenAnne and CapitolPark. .36 sits on the CapitolPark side and .37 sits on the QueenAnne side. The shortest path to both IPs ought to be via the Haystack->QueenAnne wlan1 RF hop, and then through the LAN @ QueenAnne for .37, and through a 2nd RF hop for .36. That means .36 ought to be MORE DISTANT than .37. Meanwhile, the routing table lookup shows wlan1 being taken for .36 (the further destination) and ether1 being taken for .37 (the closer destination). The ether1 path is longer than the RF path, as it has to traverse the Haystack LAN, then an RF link to K7NVH, then a VPN tunnel to Seattle, then another RF link to QueenAnne, then the QueenAnne LAN.

This weird routing seems to have gone away today. The strange thing is, the route costs in OSPF persist at being weird:

    65 44.24.242.36/32 intra-area 30 44.24.242.39 wlan1
    66 44.24.242.37/32 intra-area 40 44.24.242.39 wlan1

Why would .37 be more costly here than .36?

Here are the traces today:

    [eo@QueenAnne.Haystack] /routing ospf route> /tool traceroute 44.24.242.36
     #  ADDRESS        LOSS  SENT  LAST    AVG   BEST  WORST  STD-DEV  STATUS
     1  44.24.242.39   0%    2     31.7ms  33.7  31.7  35.7   2
     2  44.24.241.83   0%    2     4.8ms   11.2  4.8   17.5   6.4
     3  44.24.242.36   0%    2     11.5ms  9.7   7.8   11.5   1.9

    [eo@QueenAnne.Haystack] /routing ospf route> /tool traceroute 44.24.242.37
     #  ADDRESS        LOSS  SENT  LAST    AVG   BEST  WORST  STD-DEV  STATUS
     1  44.24.242.39   0%    2     22.6ms  13.5  4.3   22.6   9.2
     2  44.24.242.37   0%    2     4.1ms   4     3.9   4.1    0.1

All interface costs along the path are "10", so I don't get this OSPF calculation, even though the resulting route decision is finally correct today.

Here are the LSAs from CapitolPark.QueenAnne and QueenAnne.CapitolPark, respectively, as seen by QueenAnne.Haystack:

    [eo@QueenAnne.Haystack] > /routing ospf lsa print detail where originator=44.24.241.83
     instance=default area=backbone type=router id=44.24.241.83 originator=44.24.241.83 sequence-number=0x80000DB4 age=1225 checksum=0x6BD0 options="E" body=
       flags=
         link-type=Point-To-Point id=44.24.240.7 data=*44.24.242.37* metric=10
         link-type=Stub id=*44.24.242.36* data=255.255.255.255 metric=10
         link-type=Transit id=44.24.241.84 data=44.24.241.83 metric=10

    [eo@QueenAnne.Haystack] > /routing ospf lsa print detail where originator=44.24.240.7
     instance=default area=backbone type=router id=44.24.240.7 originator=44.24.240.7 sequence-number=0x800045DD age=1240 checksum=0x754E options="E" body=
       flags=
         link-type=Point-To-Point id=44.24.241.83 data=*44.24.242.36* metric=10
         link-type=Stub id=*44.24.242.37* data=255.255.255.255 metric=10
         link-type=Transit id=44.24.240.7 data=44.24.240.7 metric=10

     instance=default area=backbone type=network id=44.24.240.7 originator=44.24.240.7 sequence-number=0x800000C4 age=952 checksum=0xD1F1 options="E" body=
       netmask=255.255.255.240
       routerId=44.24.240.7
       routerId=44.24.240.3
       routerId=44.24.240.1
       routerId=44.24.240.4
       routerId=44.24.240.5
       routerId=44.24.240.6

I have boldified the relevant IPs in each LSA. Can you make heads or tails of why the weird costs are being computed?

--Bart

On 3/30/2018 9:59 AM, Dylan Ambauen wrote:
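One possible reading of the 30-versus-40 costs Bart asks about, offered as a sketch and not a confirmed diagnosis: in the LSAs quoted in his message, each end of the CapitolPark-QueenAnne link advertises the other end's address as its Stub link (44.24.241.83 advertises .36, 44.24.240.7 advertises .37). In standard OSPF a stub's cost is the distance to the advertising router plus the stub metric, so the address that physically sits closer can end up costed through the farther router. The topology below is an assumption pieced together from the traceroutes and the "all interface costs are 10" remark; the node name `wlan1-peer` is a made-up label for the 44.24.242.39 hop.

```python
import heapq

# Assumed directed graph as seen from QueenAnne.Haystack: node -> [(neighbor, cost)].
# Router -> transit network costs the interface metric; network -> router costs 0.
GRAPH = {
    "QueenAnne.Haystack": [("wlan1-peer", 10)],        # RF hop via wlan1 (assumed cost 10)
    "wlan1-peer": [("net 44.24.241.84", 10)],          # onto the QueenAnne LAN (assumed)
    "net 44.24.241.84": [("44.24.241.83", 0)],         # transit network to attached router
    "44.24.241.83": [("44.24.240.7", 10)],             # Point-To-Point link from the LSA
    "44.24.240.7": [],
}

# Stub links exactly as listed in the two router LSAs: prefix -> (advertiser, metric).
STUBS = {
    "44.24.242.36/32": ("44.24.241.83", 10),
    "44.24.242.37/32": ("44.24.240.7", 10),
}


def shortest_costs(graph, source):
    """Plain Dijkstra: lowest total cost from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist


def stub_costs(graph, stubs, source):
    """Cost of each stub prefix = distance to its advertising router + stub metric."""
    dist = shortest_costs(graph, source)
    return {prefix: dist[router] + metric for prefix, (router, metric) in stubs.items()}


if __name__ == "__main__":
    for prefix, cost in stub_costs(GRAPH, STUBS, "QueenAnne.Haystack").items():
        print(prefix, cost)
```

Under these assumptions the sketch yields 30 for 44.24.242.36/32 and 40 for 44.24.242.37/32, matching the costs in the OSPF route table from the message. If the real topology differs from the assumed graph, the numbers would need to be rechecked against the full LSA database.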
Popping Bart's old email to top of stack. Re QA-Westin link.

On Thu, Mar 29, 2018 at 11:17 PM Bart Kus <me@bartk.us> wrote:
Participants (4): Bart Kus, Carl, Dylan Ambauen, Randy Neals