Yesterday Windows Azure experienced a worldwide disruption in many services due to an expired PKI certificate for Windows Azure storage. Mary Jo Foley’s article “Windows Azure storage issue: Expired HTTPS certificate possibly at fault” provides the best coverage of the event as it unfolded. You can also take a look at a few threads on the Windows Azure forum and Stack Overflow that provide a lot of commentary on the event. The effects of this disruption rippled through most of the other Windows Azure services. Even if you modified your application to use HTTP instead of HTTPS, it’s likely you still had issues, given that the rest of the platform was crippled by the expired certificate.

It’s disappointing that this happened, but it highlights a pretty common situation. This has nothing to do with the merits of the Windows Azure storage service or any other part of the platform – this is an operations management issue, plain and simple. The irony is that, as a number of folks including Lars Wilhelmsen have pointed out, there are tools like Microsoft SCOM that provide a Certificate Management Pack that can notify operations of expiring certificates. I can’t imagine that the operations team at Windows Azure doesn’t use some kind of tool to manage expiring certificates.

As a developer, I found myself curious to see just how hard it is to determine the expiration date of a certificate by checking the URI. It turns out to be pretty simple using System.Net.ServicePoint, which provides connection management for HTTP and HTTPS connections.

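// Requires: using System.Net;
private string GetSSLExpiryDate()
{
    string url = "https://www.aditicloud.com/";
    var request = (HttpWebRequest)WebRequest.Create(url);

    // The certificate is only populated once a connection has been made,
    // so issue the request and dispose of the response right away.
    using (request.GetResponse())
    {
    }

    // ServicePoint.Certificate is null if no certificate is available.
    var certificate = request.ServicePoint.Certificate;
    return certificate != null
        ? certificate.GetExpirationDateString()
        : string.Empty;
}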

Pretty simple. What’s hard is the practice of managing and tracking these sorts of things.
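To make the tracking part a little more concrete, here is a minimal sketch of how the expiry date could be turned into a warning an operations check might act on. It assumes the .NET Framework and a certificate obtained from request.ServicePoint.Certificate as in the snippet above. The class name CertificateExpiryCheck, its method names, and the warningDays parameter are hypothetical names made up for this illustration, not part of any Microsoft tool.

```csharp
using System;
using System.Security.Cryptography.X509Certificates;

// Hypothetical helper for illustration only.
public static class CertificateExpiryCheck
{
    // Reads the expiry date of a certificate (for example, the one exposed
    // by request.ServicePoint.Certificate) and converts it to UTC.
    public static DateTime GetExpiryUtc(X509Certificate certificate)
    {
        if (certificate == null)
        {
            throw new ArgumentNullException("certificate");
        }

        // X509Certificate2.NotAfter is expressed in local time.
        return new X509Certificate2(certificate).NotAfter.ToUniversalTime();
    }

    // Whole days left until expiry; negative once the certificate has expired.
    public static int DaysUntilExpiry(DateTime notAfterUtc, DateTime nowUtc)
    {
        return (int)Math.Floor((notAfterUtc - nowUtc).TotalDays);
    }

    // True when fewer than warningDays whole days remain (or none at all).
    public static bool IsExpiringSoon(DateTime notAfterUtc, DateTime nowUtc, int warningDays)
    {
        return DaysUntilExpiry(notAfterUtc, nowUtc) < warningDays;
    }
}
```

A scheduled job could call IsExpiringSoon(GetExpiryUtc(certificate), DateTime.UtcNow, 30) for each endpoint it cares about and raise an alert when the result is true. The 30-day window is only an example value; the right threshold depends on how long a renewal takes in your organization.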

I would expect Microsoft to make sure that this kind of problem never happens again. It’s embarrassing but solvable. It also exposes an issue that most of us will have to account for – expiring certificates. If it can happen to Microsoft, it can happen to us too.

  • jou su

    Thank you for this post. One concern, however: this is useful if you have a web page to reference, but what we saw in Azure on Friday was blob storage (and other services) not being accessible, and the closest thing to a URL these services have is https://mydevstorage.blob.core.windows.net, where there is no default web page to query against. If we query that URL we get a 400 error “page not found”. Is there a way that I can still query such a service without first having to upload a web page to a blob container? What about querying table storage or queues?

    • Wade Wegner

      I don’t know how to accomplish this task if there’s not an accessible resource. Take a look at http://pkiexpiration.azurewebsites.net/. You’ll likely have to create a public container – it could just be the root – and make some kind of resource available.