Sunday, September 18, 2011

So this was my day on Friday

The Short Version of the Story:
Our DS3 in Houston went down at approximately 8:15 AM on 09/16/11 due to equipment failure on AT&T's end. After several hours on the phone with AT&T we tracked down the problem. It was fixed at approximately 4:45 PM on 09/16/11, resulting in an outage of approximately 8 hours 30 minutes.

The Long Version of the Story:
On Friday, 09/16/2011, we had an issue with our DS3 in the Houston LATA. At approximately 8:15 AM our internal monitoring system reported that all sites in the Houston LATA were down. Those schools include Chireno, Cushing, Diboll, Douglass, Excelsior, Lufkin, Martinsville, Nacogdoches, and Shelbyville. A few minutes later we confirmed an issue with the DS3 and called AT&T. After 15 minutes on hold we got a ticket put in, but were told it could take up to two hours under "ideal conditions" before we got to a tester. In fact it did: at 10:35 AM an AT&T tester called us back.

What followed was a three-hour phone call in which the AT&T tester and a technician tried, unsuccessfully, to trace the DS3 circuit from Houston back to our equipment in Longview. During those three hours they also replaced various pieces of equipment at the Houston site where our circuit is located. After several internal phone calls, the AT&T tester finally found out which part of AT&T owned the remaining portion of the circuit that they had been unable to trace. At this point, 1:13 PM, the tester told me he had to call them and would call me back later in the afternoon.

I called AT&T back at 2:45 PM, not having heard from my tester for over an hour and a half. After twenty-five minutes on hold I finally got back to the tester at 3:10 PM. He was able to trace the circuit back to Longview and saw an issue on some equipment at AT&T's Longview location. A ticket then had to be put in with the AT&T Longview office, as this tester was unable to do anything else from his location in Chicago. This was at 3:25 PM.

At 4:44 PM our internal monitoring system noticed that everything in the Houston LATA had come back up. Approximately 30 minutes later I got a phone call from my original AT&T tester. He had gotten word from the AT&T technicians in Longview that our circuit had been moved to a new port. This seems to have resolved the issue.