Status Blog
Welcome to the official offsite news and network announcements blog for RackSRV Communications. From this blog we will announce any planned maintenance, known service issues, promotions and general industry news so please consider bookmarking or utilising our RSS feed to keep yourself informed!

VPS Node #2 Service Disruption - Resolved

    Posted in Service Status by Jon on 29/12/2015 @ 03:35

We are aware of a service impacting issue on VPS Node #2 and are currently diagnosing the fault.

Further updates will be posted as more information becomes available.

Update @ 04:07 on 29/12/15 by Jon

Our investigations so far have discovered that the relevant Xen kernel isn't fully loading on boot, and we believe this issue to be related to a degraded RAID array.

We're currently attempting to get the LSI RAID controller to rebuild the array with one of the two available global hot spares but this has, so far, proven problematic.

We'll continue to update this task with our progress.

Update @ 04:47 on 29/12/15 by Jon

Unfortunately we've reached a point in our diagnosis where we can make no further progress until the RAID array issue has been resolved.

We've been able to get the VPS node to boot into its non-Xen DomU kernel, and from there we can launch the LSI MegaRAID tools without a problem, which give us much better diagnostics than the RAID BIOS does.

Currently we are rebuilding the RAID array using one of the hot spares but are refraining from re-attempting to load the Xen kernel until the RAID array health is optimal - this task is due to complete in approx 50 mins.

As such, we expect to post a further update in approx 50-60 mins.

Update @ 05:47 on 29/12/15 by Jon

The RAID array has just finished rebuilding, so we're about to retry the Xen kernel.

Update @ 05:58 on 29/12/15 by Jon

Despite the RAID array now being healthy, loading the Xen kernel still isn't working properly and as such, we're resuming our investigation.

Update @ 06:14 on 29/12/15 by Jon

After being sent down the wrong path for the last few hours by the discovery of the degraded RAID array, it now seems that the root cause was actually related to an automatically applied Ksplice kernel patch.

We utilise tools like Ksplice and KernelCare as they are designed to provide our customers with uninterrupted service whilst retaining the necessary benefits of automatic security updates, but clearly, in this instance, something went terribly wrong!

In due course, we'll be looking into exactly what went wrong and how we can better protect ourselves and our clients from similar issues in future, but to conclude this status announcement thread:

  • Service to VPS Node #2 should now be operational as of 06:15 GMT - if your VPS is not back online already then please contact support
  • The initially reported RAID degradation issue was not the cause and seems to have occurred due to the forced reboot of the VPS node after it initially crashed
  • The RAID array of VPS Node #2 is back to optimal, with global hot spares available

Please accept our sincere apologies for any inconvenience this extended period of service disruption may have caused.
