
Data Center Downtime at the Core and the Edge: A Survey of Frequency, Duration and Attitudes

Edge computing is expanding rapidly and re-shaping the data center ecosystem as organizations across industries move computing and storage closer to users to improve response times and reduce bandwidth requirements.

While forms of distributed computing have been common in some sectors for years, this current evolution is distinct in that it is enabling a broad range of new and emerging applications and has higher criticality requirements than traditional distributed computing sites.

At the same time, core data center managers are dealing with increased complexity and balancing multiple and sometimes conflicting priorities that can compromise availability.

As a result, today’s data center networks are more vulnerable to downtime than ever before. In an effort to quantify that vulnerability, the Ponemon Institute conducted a study of downtime frequency, duration and attitudes at the core and the edge, sponsored by Vertiv.

The study is based on responses from 425 participants representing 132 data centers and 1,667 edge locations. All core and edge data centers included in the study are located in the United States/Canada and Latin America (LATAM).

The study found data center networks vulnerable to downtime events across the network. Core data centers experienced an average of 2.4 total facility shutdowns per year with an average duration of more than two hours (138 minutes). This is in addition to almost 10 downtime events annually isolated to select racks or servers. At the edge, the frequency of total facility shutdowns was even higher, although the duration of those outages was less than half that of those in core data centers.

The study also looks at the attitudes that shape decisions regarding core and edge data centers to help identify factors that could be contributing to downtime events. More than half (54%) of all core data centers are not using best practices in system design and redundancy, and 69% say their risk of an unplanned outage is increased as a result of cost constraints.

Leading causes of unplanned downtime events at the core and the edge included cyberattacks, IT equipment failures, human error, UPS battery failure, and UPS equipment failure.

Finally, the study asked participants to identify the actions their organizations could take to prevent future downtime events. They identified activities ranging from investment in new equipment to infrastructure redundancy to improved training and documentation.

Key Findings

Facility Size
Edge data centers aren’t necessarily defined by size but by function. For the purpose of this research, edge data centers are defined as facilities that bring computation and data storage closer to the location where they are needed to improve response times and save bandwidth. Nevertheless, edge data centers were on average about one-third the size of the core data centers.

The extrapolated size for core data centers that participated in this study is 15,153 square feet/1,408 square meters. For edge computing facilities, the average size is 5,010 square feet/465 square meters.

Frequency of Core and Edge Downtime

Figure 3 shows the shutdown experience of participating data centers over the past 24 months. Total data center shutdowns have the lowest frequency (4.81). However, these events are also the most disruptive, and 4.81 unplanned total facility shutdowns over a 24-month period would be considered unacceptable by many organizations.

Partial outages of certain racks in the data center have the highest frequency at 9.93, followed by individual server outages at 9.43.

It can be difficult to directly compare the total number of downtime events in edge and core facilities due to the higher complexity generally found in core data centers and the increased presence of personnel in these facilities. However, it is possible to compare total facility shutdowns for core and edge data centers. Edge data centers experienced a slightly higher frequency of total facility shutdowns at an average of 5.39 over 24 months. As edge sites continue to proliferate, reducing the frequency of outages at the edge will become a high priority for many organizations.
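
For readers who want to tie the 24-month counts back to the annualized figures quoted earlier, the arithmetic is simply a halving. The short Python snippet below is illustrative only and uses the survey’s own numbers.

```python
# Illustrative arithmetic only; the counts are the survey figures quoted above.
core_total_shutdowns_24mo = 4.81   # core data centers: total facility shutdowns per 24 months
edge_total_shutdowns_24mo = 5.39   # edge data centers: total facility shutdowns per 24 months

core_per_year = core_total_shutdowns_24mo / 2   # ~2.4, matching the annual figure cited earlier
edge_per_year = edge_total_shutdowns_24mo / 2   # ~2.7

print(f"Core: {core_per_year:.1f} total shutdowns/year, Edge: {edge_per_year:.1f}/year")
```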

TO READ THE REST OF THIS REPORT, VISIT VERTIV’S WEBSITE

Facebook needs a corrections policy, viral circuit breakers, and much more

Bob Sullivan

“After hearing that Facebook is saying that posting the Lord’s Prayer goes against their policies, I’m asking all Christians please post it. Our Father, who art in Heaven…”

Someone posted this obvious falsehood on Facebook recently, and it ended up on my wall. Soon after, a commenter broke the news to the writer that it was fake. The original writer then did what many social media users do: responded with a simple, “Well, someone else said it and I was just repeating it.” No apology. No obvious shame. And most important, no correction.

I don’t know how many people saw the original post, but I am certain far fewer people saw the response and link to Snopes showing it was a common hoax.

If there’s one thing that people rightly hate about journalists, it’s our tendency to put mistakes on the front page and corrections on the back page. That’s unfair, of course. If you make a mistake, you should try to make sure just as many people see the correction as the mistake. Journalists might be bad at this, but Facebook is awful at it. This is one reason social media is such a dastardly tool for sharing misinformation.

Fixing this is one of the novel recommendations in a great report published late last year by the Forum on Information and Democracy, an international group formed to make recommendations about the future of social media. The report suggests that, when an item like this “Our Father” assertion spreads on social media and is determined to be misinformation by an independent group of fact-checkers, a correction should be shown to each and every user exposed to it. So, not just a lame “I dunno, a friend said it” buried in later comments. Rather, the platform would have to attempt to undo the actual spread of incorrect information.
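
To make that recommendation concrete, here is a minimal sketch of what “correct everyone who saw it” could look like, assuming the platform keeps a log of which users were shown each post. The function, parameters, and data structures are hypothetical stand-ins, not any real platform API.

```python
# Hypothetical sketch: push a fact-checkers' correction to every exposed user.
# "exposure_log" and "notify_user" stand in for platform internals that are not public.

def issue_correction(post_id: str,
                     correction_text: str,
                     exposure_log: dict[str, set[str]],
                     notify_user) -> int:
    """Notify every user recorded as having seen post_id; return how many were reached."""
    exposed_users = exposure_log.get(post_id, set())
    for user_id in exposed_users:
        notify_user(user_id, correction_text)
    return len(exposed_users)
```

The point is simply that the correction follows the original post’s actual audience rather than being buried in later comments.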

Anyone concerned about the future of democracy and discourse should take a look at this report. I’ve summarized a few of the highlights below.

Facebook recently said it would start downplaying political posts in an effort to deal with the disinformation problem. It must do much more than that. The Forum on Information and Democracy offers a good start.


Circuit breakers

The report calls for the creation of friction to prevent misinformation from spreading like wildfire. My favorite part of this section calls for “circuit breakers” that would slow down viral content after it reaches a certain threshold, giving fact-checkers a chance to examine it. The concept is borrowed from Wall Street. When trading in a certain stock falls dramatically out of pattern (say there is a sudden wave of selling), circuit breakers kick in to give traders a moment to breathe and digest whatever news might exist. Sometimes, that short break re-introduces rationality to the market. Note: this wouldn’t infringe on anyone’s ability to say what they want; it would merely slow down the automated amplification of that speech.
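
As a rough illustration of the idea (my own sketch, not anything the report specifies), a viral circuit breaker could be as simple as a share-rate threshold that pauses algorithmic amplification and queues the post for fact-checking. The names, hooks, and numbers below are hypothetical.

```python
# Hypothetical sketch of a viral "circuit breaker". The threshold, review window,
# and platform hooks are illustrative assumptions, not a real API.

from dataclasses import dataclass

@dataclass
class Post:
    post_id: str
    shares_last_hour: int

SHARE_RATE_THRESHOLD = 10_000   # shares/hour before the breaker trips
REVIEW_WINDOW_HOURS = 4         # how long amplification is paused while fact-checkers review

def check_circuit_breaker(post: Post, pause_amplification, queue_for_review) -> bool:
    """Trip the breaker when a post spreads faster than the threshold.

    The post stays visible and shareable; only automated amplification
    (recommendations, trending placement) is paused pending review.
    """
    if post.shares_last_hour < SHARE_RATE_THRESHOLD:
        return False
    pause_amplification(post.post_id, hours=REVIEW_WINDOW_HOURS)
    queue_for_review(post.post_id)
    return True
```

As with the stock-market version, the breaker only buys time; it does not remove content or decide who is right.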

A digital ‘building code’ and ‘agency by design’

You can’t buy a toaster that hasn’t been tested to make sure it’s safe, but you can use software that might, metaphorically, burn your house down. This metaphor was used a lot after the mortgage meltdown in 2008, and it applies here, too. “In the same way that fire safety tests are conducted prior to a building being opened to the public, such a ‘digital building code’ would also result in a shift towards prevention of harm through testing prior to release to the public,” the report says.

The report adds a few great nuggets. While being critical of click-to-consent agreements, it says: “In the case of civil engineering, there are no private ‘terms and conditions’ that can override the public’s presumption of safety.” The report also calls for a concept called “agency by design” that would require software engineers to design opt-ins and other moments of permission so users are most likely to understand what they are agreeing to. This is “proactively choice-enhancing,” the report argues.

Abusability Testing

I’ve long been enamored of this concept. Most software is now tested by security professionals hired to “hack” it, a process called penetration testing. This concept should be expanded to include any kind of misuse of software by bad guys. If you create a new messenger service, could it be abused by criminals committing sweetheart scams? Could it be turned into a weapon by a nation-state promoting disinformation campaigns? Could it aid and abet those who commit domestic violence? Abusability testing should be standard, and critically, it should be conducted early on in software development.

Algorithmic undue influence

In other areas of law, contracts can be voided if one party has so much influence over the other that true consent could not be given. The report suggests that algorithms create this kind of imbalance online, so the concept should be extended to social media.

“(Algorithmic undue influence) could result in individual choices that would not occur but for the duplicitous intervention of an algorithm to amplify or withhold select information on the basis of engagement metrics created by a platform’s design or engineering choice.”

“Already, the deleterious impact of algorithmic amplification of COVID-19 misinformation has been seen, and there are documented cases where individuals took serious risks to themselves and others as a result of deceptive conspiracy theories presented to them on social platforms. The law should view these individuals as victims who relied on a hazardously engineered platform that exposed them to manipulative information that led to serious harm.”

De-segregation

Social media has created content bubbles and echo chambers. In the real world, it can be illegal to engineer residential developments that encourage segregation. The report suggests similar limitations online.

“Platforms … tacitly assume a role akin to town planners. … Some of the most insidious harms documented from social platforms have resulted from the algorithmic herding of users into homogenous clusters. … What we’re seeing is a cognitive segregation, where people exist in their own informational ghettos,” the report says.

In order to promote a shared digital commons, the report makes these suggestions:

Consider applying equivalent anti-segregation legal principles to the digital commons, including a ban on ‘digital redlining’, where platforms allow groups or advertisers to prevent particular racial or religious groups from accessing content.

Create legal tests focused on the ultimate effects of platform design on racial inequities and substantive fairness, regardless of the original intent of design.

Create specific standards and testing requirements for algorithmic bias.

Disclose conflicts of interest

The report includes a lot more on algorithms and transparency, but one item I found noteworthy: if a platform promotes or demotes content in a way that is self-serving, it has to disclose that. If Facebook’s algorithm makes a decision about content involving a new social media platform, or about a government’s efforts to regulate social media, Facebook would have to disclose it.

These are nuggets I pulled out, but to grasp the larger concepts, I strongly recommend you read the entire report.