Posted in Service Status by Jon on 25/09/2013 @ 08:07
We're in the process of diagnosing a very odd connectivity issue that seems to be affecting a small number of international routes.
ICMP-based traffic seems unaffected, but other services and protocols seem to be seeing packet loss in the region of 40%. Because ping still succeeds, this may not be triggering people's monitoring systems.
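For anyone whose monitoring relies on ping alone, a TCP-level check is one way to catch loss that ICMP does not show. The following is only a minimal sketch, not something we run ourselves: the function names, the 10% threshold and the injectable `connect` parameter are all assumptions made for illustration, and a bare TCP connect may not expose every fault of this kind.

```python
import socket


def tcp_loss_rate(host, port, attempts=20, timeout=2.0,
                  connect=socket.create_connection):
    """Return the fraction (0.0 to 1.0) of TCP connection attempts that fail.

    `connect` is injectable (hypothetical design choice) so the probe can be
    exercised without touching the network.
    """
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    failures = 0
    for _ in range(attempts):
        try:
            conn = connect((host, port), timeout=timeout)
        except OSError:
            # Covers timeouts, refused connections and unreachable routes.
            failures += 1
        else:
            conn.close()
    return failures / attempts


def should_alert(loss_rate, threshold=0.1):
    """Illustrative alert rule: fire once loss reaches the threshold."""
    return loss_rate >= threshold
```

Calling `tcp_loss_rate("www.example.com", 80)` from a host on an affected route and passing the result to `should_alert` would flag the kind of partial loss that a ping check can miss.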
As soon as we have further information, we'll update this post.
We believe this issue was resolved a few minutes ago but are waiting for proper confirmation before we update the issue status accordingly.
Early indications suggest there was a problem with a transit carrier that has been temporarily taken out of service pending further investigation.
We'll update further (hopefully to conclude this open issue) shortly.
Whilst we've had confirmation that the symptoms some users were experiencing earlier have been resolved by downing sessions with carrier NTT at our upstream's POP in SOV, we're still awaiting feedback from NTT as to what exactly the issue was.
Once we've had this information we'll update this resolved issue again.
We've had feedback via our upstream that the affected carrier, NTT, suffered from a CAM corruption issue on at least one of its routers. The end result seems to be that smaller/simpler packets such as ICMP-based traffic were able to pass without issue, whereas other packets got lost within the interwebs!
We understand that NTT are working with their router vendor to diagnose the who/what/why and, in the interim, our upstream has dropped any sessions going over the affected hardware.
We've had confirmation back today that the affected carrier suffered from a software bug which caused programming errors on *some* of their line cards.
As the issue wasn't hardware-based as initially suspected, the symptoms have been avoided temporarily (until an OS patch can be provided by the hardware vendor) by implementing additional filters.