10 Mar 2012 17:53:29
There was a disturbance in the force around 17:25 where one of our LNSs (D) decided it had sent too much data to BT (4.2GB too much). Any programmers amongst you will spot the relevance there. So it immediately reduced throughput and it was not until 17:29 that full capacity was restored. It did not stop all traffic, and did not damp premium customers as much as non-premium. It was acting as if the link was over-full when it was not. It affected a third of customers - the other active LNSs were not affected.
This buggers up my perfect 100.0% no dropped packets stats, I expect.
We're moving lines off "D" tonight, and investigating how the code could get confused like this. I expect there will be rolling updates over the next week once we find the exact cause.
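For the non-programmers: the 4.2GB figure is suspiciously close to 2^32 bytes, which is where a 32-bit unsigned counter wraps round. The sketch below is only an illustration of that kind of arithmetic, not the actual LNS code. The function name and the byte values are made up, and the idea that a counter reading went backwards is an assumption, not a confirmed cause.

```python
# Illustration only: hypothetical names and values, not the real LNS code.
WRAP = 2 ** 32  # number of distinct values in a 32-bit unsigned counter


def bytes_sent_u32(previous: int, current: int) -> int:
    """Difference between two counter readings, as 32-bit unsigned arithmetic computes it."""
    return (current - previous) % WRAP


# Normal case: the counter moved forward by 1500 bytes.
normal = bytes_sent_u32(10_000, 11_500)

# Assumed glitch: the reading goes backwards by 1500 bytes. Unsigned
# subtraction cannot go negative, so the result wraps to just under 2**32.
glitch = bytes_sent_u32(11_500, 10_000)

print(normal)        # 1500
print(glitch)        # 4294965796
print(glitch / 1e9)  # a little under 4.3 (GB)
```

A shaper that trusted the second figure would think it had sent a few GB more than it should have and throttle accordingly, even though the link was nowhere near full.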
10 Mar 2012 17:58:21
Actually - there was more - the confused stats may be related to something else - some sort of big blip. So checking that too.
10 Mar 2012 18:04:19
Actually, this started 17:25:45 with a lot of lines dropping on our "D" LNS.
This looks much more like a BT blip now - triggering some minor bug in our traffic sharing code which we'll get fixed in due course.
We saw a lot of BT lines time out all at the same time. Oddly, it was only about half of the lines on that LNS.
Logs show we got lines up damn fast.
|Started||10 Mar 2012 17:25:41|
|Closed||10 Mar 2012 17:29:06|
10 Mar 2012 17:29:06
D.gormless slight blip - Closed