Just before 17:00 today we made a minor change to our routing announcements. Because it was so minor, or so we thought, it was not run as properly planned work. Even so, the change was not made in the middle of the day.
At 17:36 we were advised by MSO text that customers on Ethernet links were seeing problems. By 17:44 we had identified the cause of the problem and resolved it.
Technical details: the problem is in fact related to IPv4 address exhaustion. We are involved in arrangements to swap IP blocks between ISPs to free up some additional space, and this meant fragmenting one of our larger IP blocks. In principle this is simple: we announce the smaller blocks and drop the announcement of the larger block later. Unfortunately, one of the smaller blocks is routed to Ethernet-connected customers, and it was included in the list announced at Telehouse in London.
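To illustrate the fragmentation step, and the kind of pre-announcement check that would have caught this error, here is a minimal sketch using Python's standard `ipaddress` module. The prefixes are hypothetical documentation addresses, not the real blocks involved. `DO_NOT_ANNOUNCE` stands in for an assumed list of more-specifics that must not be announced at a given site. The helper names are made up for this example and are not part of any real tooling.

```python
import ipaddress

# Hypothetical prefixes for illustration only; not the real blocks involved.
LARGE_BLOCK = ipaddress.ip_network("198.51.100.0/24")
# Assumed list of more-specifics that must NOT be announced at this site
# (e.g. because that block is routed to Ethernet customers elsewhere).
DO_NOT_ANNOUNCE = [ipaddress.ip_network("198.51.100.128/26")]


def fragment(block, new_prefix):
    """Split a block into equal-sized more-specific prefixes."""
    return list(block.subnets(new_prefix=new_prefix))


def check_announcements(prefixes, forbidden):
    """Return the proposed prefixes that overlap any forbidden prefix."""
    return [p for p in prefixes if any(p.overlaps(f) for f in forbidden)]


if __name__ == "__main__":
    proposed = fragment(LARGE_BLOCK, 26)
    rejected = check_announcements(proposed, DO_NOT_ANNOUNCE)
    for p in proposed:
        print(p, "REJECT" if p in rejected else "ok")
```

Run as a script, this prints the four /26 more-specifics and flags 198.51.100.128/26 as one that should not be announced at this site.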
The impact was that, depending on routing, parts of the Internet could not reach Ethernet customers. This included all of our ADSL lines, and many services peering at LINX in London, none of which could see Ethernet lines. Unfortunately, because this was a partial failure, none of our automated monitoring picked it up: the monitoring could still see Ethernet customers.
This was human error. It was resolved within 8 minutes of being reported, out of hours on a bank holiday weekend.
We will consider ways to manage this better. It is a trade-off: putting more under automated control would have stopped this specific error, but it could also make far more drastic errors possible with much less effort. We are reviewing our procedures for checking changes like this beforehand in future.
Sorry for the inconvenience.