WIFI Outage
UIT Alert: WIFI Outage
Current Status
WIFI Outage
Resolution Statement
Actions and Root Causes
Team,
Summary: Wi-Fi services were impacted for a segment of the campus (16 of 82 sensors showing errors) from at least 6:00 AM to 8:30 AM. Cluster 1 is stable with only 5 of its 8 controllers in operation and will be restored to 8 controllers tomorrow morning. The cause has not been determined, but a bug in the current version is suspected; it is fixed in the latest version.
Timeline and steps:
Our first report of Wi-Fi services being down came at 6:45 AM this morning. The outage affected only Cluster 1, and even then it “probably” affected only a portion of the APs. Only 16 of our 82 sensors were showing active alarms.
Layer3 and Aruba were extremely responsive. The Aruba engineer we worked with at the beginning of the semester saw UH pop up in the queue and took the ticket, which helped because he was already familiar with our environment.
Cluster 1 has been stable since 8:30 AM and is running on only 5 of its 8 controllers. Because of the NextGen upgrades, this cluster will not see any capacity issues with only 5 controllers. We have isolated the 3 bad controllers and will reboot them today to try to bring them into a ready state. We have a meeting with the same engineer tomorrow to rejoin these 3 controllers to the cluster. The rejoin will be invisible to students, but we did not want to attempt it during the day today, just in case.
The cause has not been determined. They will review the logs sent today, but the engineer's suspicion is that this is related to several known clustering bugs in the version we are currently running. Because of these bugs we had planned to upgrade next week, but the problem appeared before we could.
Once the 3 isolated controllers have rejoined Cluster 1, we will discuss the best time to upgrade the controllers to the latest version.
Our expectation is that Cluster 1 will remain stable for the rest of the day. The specific error we will watch to validate this is the “Wi-Fi association failed” alarm on the Cape sensors.
Corrective and Preventative Measures
n/a
Affected Services
- UHSecure Wireless network
- UH Wireless network
Event Updates
Wi-Fi outage sporadically affecting the main campus. SMEs have identified the issue and are working to resolve it as soon as possible.