CCSP Domain 3 - Data Center Design

Transcript

Introduction

Hey, I’m Rob Witcher from Destination Certification, and I’m here to help you pass the CCSP exam. We are going to go through a review of the major topics related to data center design in Domain 3, to understand how they interrelate, and to guide your studies.

Image of Data Design Center - Destination Certification

This is the third of seven videos for Domain 3. I have included links to the other MindMap videos in the description below. These MindMaps are a small part of our complete CCSP MasterClass.

Data Center Design

If you want to design and build a cloud data center, there is a lot you need to think about. Cloud data centers need to achieve a lot of difficult goals. They need to:

  • Integrate a complex set of systems that need to be agile
  • Provide rapid and elastic response to changing client needs
  • Provide high levels of confidentiality, integrity and availability
  • Provide low latency between customers and other data centers

These are all hard things to do, but there’s one more critical requirement: the cloud data center needs to operate at significantly greater levels of efficiency than a traditional data center. 

So you need to achieve all of these difficult goals and do it at lower cost than a traditional data center–if you can’t make it cheaper for customers they probably aren’t going to want to move their systems and data to the cloud. 

Not an easy thing to do, so let’s walk through the critical things you need to consider. 

Security Survey (Risk Management)

Starting with physical security, a cloud data center definitely needs good physical security controls and they need to be cost effective. How do you figure this out? By conducting a security survey–which is similar to the risk management process that we discussed back in domain 1.

As part of the security survey, you are conducting the following steps:

  • Target identification: what valuable assets are associated with the data center?
  • Threat definition: what threats do those assets face?
  • Facility characteristics: what vulnerabilities exist that could expose those assets?

By going through the security survey process you can identify the valuable assets of the cloud data center, the risks associated with those assets, and what controls should be implemented to mitigate the risks. 

Physical security is critically important for a cloud data center, so we devote a complete MindMap to it here in Domain 3.

Logical Design

When designing something as complex as a data center, it helps to start at a fairly abstract high level, which is logical design. Logical design is the abstract, conceptual layout of a data center and it focuses on how data flows, how systems are organized, and how the infrastructure interacts on a functional level. Logical design includes the network architecture, application architecture, overall management strategies, etc. All the conceptual stuff, but not the physical components like servers or cables. 

Physical Design

Which brings us to physical design: the actual, tangible infrastructure of the data center. This includes the specific physical solutions to address the logical needs, such as the types of generators, racks, switches and routers. Physical design covers all of the actual hardware, the layout of equipment rooms, power distribution, cooling and HVAC systems, physical security measures, and fire suppression systems. Just to name a few of the major physical systems.

Design Considerations

There is a lot to take into consideration when logically and physically designing a datacenter. A few major examples include:

Location

Where will the data center be located? Will it be in a cold environment, meaning that you won’t need substantial cooling infrastructure? Or will it be somewhere hot, where cooling will be a significant design concern?

Services to be provided

What kind of cloud services do we want the data center to provide? A cheap storage provider will have very different requirements compared to a provider of high-end machine learning.

Tenant Partitioning

Most cloud is public cloud, so it’s really important that we isolate our tenants to prevent them from being able to access each other’s data. We can physically partition tenants by placing them on separate pieces of hardware, but this lacks the shared efficiencies that make public cloud so attractive. Instead, most isolation is done logically, so that customers can’t access each other's systems and data, but they can still share the same underlying hardware.

Build or buy

Another thing we need to think about when designing a data center is whether to build or buy. Are we planning to build a completely new data center, or can we buy an existing one? Building your own custom data center can have high upfront costs, and will require substantial time and expertise, but it can also result in an incredibly efficient data center that is well suited to the business’ needs, bringing the long term operating costs down. If you buy an existing data center, or re-purpose some existing location, you may be able to save time and money upfront, but your ongoing operating costs will be higher because the data center isn’t tailor-made to your needs.

HVAC

HVAC stands for heating, ventilation and air-conditioning, and it’s incredibly important in data centers. 

Temperature

If we want our hardware to run optimally and to last for a long time, then we need to make sure that the temperature and humidity are within the correct range.

64.4°F / 18°C, 80.6°F / 27°C

Data centers should maintain a minimum temperature of 64.4 degrees Fahrenheit or 18 degrees Celsius and a maximum temperature of 80.6 degrees Fahrenheit, or 27 degrees Celsius. So, the optimal range is 18-27 degrees Celsius. Or for you Americans and your wildly incorrect temperature measurement system: 64.4° to 80.6° Fahrenheit. 

Humidity

The right humidity range is also important.

40 – 60%

It should be kept at 40-60 percent relative humidity.

Image of Temperature & Humidity - Destination Certification

Here’s a helpful table summarizing the ideal temperature and humidity ranges you need to memorize. For the temperature ranges, you can remember the Celsius or Fahrenheit, whichever makes more sense to you–it’s an international exam, so both temperature systems will be given in the answers. 
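To make those ranges easier to remember, here is a minimal Python sketch that converts between the two temperature scales and checks a reading against the ranges quoted above. The function names and the sample reading are hypothetical, chosen only for illustration.

```python
# Recommended ranges as quoted in this transcript.
TEMP_RANGE_C = (18.0, 27.0)        # degrees Celsius
HUMIDITY_RANGE_PCT = (40.0, 60.0)  # percent relative humidity


def celsius_to_fahrenheit(temp_c: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return temp_c * 9 / 5 + 32


def check_environment(temp_c: float, humidity_pct: float) -> list[str]:
    """Return a list of problems; an empty list means the reading is in range."""
    problems = []
    if not TEMP_RANGE_C[0] <= temp_c <= TEMP_RANGE_C[1]:
        problems.append(f"temperature {temp_c} C is outside 18-27 C")
    if not HUMIDITY_RANGE_PCT[0] <= humidity_pct <= HUMIDITY_RANGE_PCT[1]:
        problems.append(f"humidity {humidity_pct}% is outside 40-60%")
    return problems


# Hypothetical sensor reading: too warm, humidity fine.
for problem in check_environment(temp_c=30.0, humidity_pct=50.0):
    print(problem)

# The Fahrenheit limits follow from the Celsius ones.
print(round(celsius_to_fahrenheit(18.0), 1), round(celsius_to_fahrenheit(27.0), 1))
```

Both ends of each range are treated as acceptable here, which is an assumption; the transcript only gives the minimum and maximum values.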

Air Quality

Filtering the air being pumped into the data center with the HVAC equipment is also important. You do not want dust or contaminants in the air because those contaminants will be sucked into the servers and other equipment, clogging up their heatsinks and causing them to overheat and fail. 

Properly managing the air within our data centers provides big benefits:

  • Reduces the rates of equipment failure.
  • Reduces upfront costs.
  • Allows for increased power density.
  • Reduces ongoing operational costs.

Positive Pressurization

As part of maintaining good air quality, it’s best for a data center to have slightly positive pressurization, so that air is pushed out of the building rather than sucked in. This can help to limit dust and debris from getting sucked into the building through cracks or whenever someone opens the door. Ultimately, this helps to keep the servers clean and helps them last longer.

Latent Cooling (remove moisture)

Here’s a couple of terms you’ve likely never heard of that are important to remember. Latent cooling is the ability of the HVAC system to remove moisture.

Sensible Cooling (remove heat)

Sensible cooling is the ability of the HVAC system to remove heat. 

Containment

The final piece we’ll cover related to HVAC here is containment. We use containment to ensure that the cool air entering into the servers is separated from the hot air exiting the servers.

Hot Aisle

Hot aisle containment involves a sealed hot aisle, separate from the rest of the data center. Servers suck in cool air from the open data center, and then expel the warmed up air into the hot aisle, so that it can be exhausted directly out without mixing with the cooler air in the rest of the data center.

Cold Aisle

Cold aisle containment is the opposite, with a sealed cold aisle bringing cool air into the server intakes, while the warmed up air sent out by the server is pushed out into the data center. It’s much more pleasant to work in a data center with hot aisle containment!

Image of Aisle Separation - Destination Certification

Here’s a diagram that should help you visualize the difference between hot aisle and cold aisle containment.

Cable Management

Well-planned cable management is important for preventing obstructions and congestion throughout the data center. The Building Industry Consulting Service International (BICSI) creates standards for cable management.

MVPC

Multi-vendor pathway connectivity involves having multiple power and network connections to a data center. This gives us redundancies in case one provider goes down, eliminating single points of failure. It’s not uncommon for a road crew working beside a data center to dig up and sever the fiber optic cable running into the building. So you want power and network connections coming into the data center from multiple providers on different sides of the data center. 
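As a rough illustration of that idea, the Python sketch below checks a list of incoming feeds for single points of failure. The `Feed` type, the provider names, and the rule of "at least two providers and at least two entry sides per service" are assumptions made for this example, not part of any standard.

```python
from collections import defaultdict
from typing import NamedTuple


class Feed(NamedTuple):
    service: str      # "power" or "network"
    provider: str     # hypothetical provider name
    entry_side: str   # side of the building where the feed enters


def find_single_points_of_failure(feeds, required=("power", "network")):
    """Flag any required service that lacks provider or pathway diversity."""
    providers = defaultdict(set)
    sides = defaultdict(set)
    for feed in feeds:
        providers[feed.service].add(feed.provider)
        sides[feed.service].add(feed.entry_side)

    problems = []
    for service in required:
        if len(providers[service]) < 2:
            problems.append(f"{service}: fewer than two providers")
        if len(sides[service]) < 2:
            problems.append(f"{service}: feeds do not enter from different sides")
    return problems


# Hypothetical site: network is diverse, power is not.
feeds = [
    Feed("network", "ISP-A", "north"),
    Feed("network", "ISP-B", "south"),
    Feed("power", "Utility-A", "east"),
]
for problem in find_single_points_of_failure(feeds):
    print(problem)
```

In this made-up site, the network feeds pass both checks while the single power feed is flagged twice, which is exactly the situation the road crew example warns about.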

Availability

Data centers are designed to have different levels of availability, depending on the type of service that the center offers. We measure availability in the number of nines. So for example, five nines offers 99.999% availability, and seven nines offers 99.99999% availability. You don’t need to memorize that.

Uptime Institute’s Standard

The Uptime Institute develops standards for data center availability and performance. It has created four separate tiers, with Tier 1 providing the least uptime, and Tier 4 providing the most.

Tier 4 (2N+1)

Tier 4 is Fault-Tolerant, offering 99.995% uptime, which means just 26.3 minutes of downtime annually. This is because it has 2N+1 fully redundant systems for power, network and cooling.

Tier 3 (2N)

Tier 3 is Concurrently Maintainable, with 99.982% uptime, and only 1.6 hours of downtime per year. It has 2N fault tolerant systems.

Tier 2 (N+1)

Tier 2 is Redundant Capacity Components, with 99.749% uptime and only 22 hours of downtime per year. It has N+1 partial redundancy in power, network and cooling systems.

Tier 1 (N)

Tier 1 is Basic Capacity, with 99.671% uptime and 28.8 hours of downtime each year. It has N capacity only, meaning no redundant systems.
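The downtime figures follow directly from the uptime percentages: downtime is the fraction of the year that is not uptime. Here is a minimal Python sketch of that arithmetic, assuming a 365-day year (8,760 hours). The function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years

# Uptime percentages as quoted for each tier in this transcript.
TIER_UPTIME_PCT = {
    "Tier 1": 99.671,
    "Tier 2": 99.749,
    "Tier 3": 99.982,
    "Tier 4": 99.995,
}


def annual_downtime_hours(uptime_pct: float) -> float:
    """Hours per year that a system is down for a given uptime percentage."""
    return (100.0 - uptime_pct) / 100.0 * HOURS_PER_YEAR


for tier, pct in TIER_UPTIME_PCT.items():
    hours = annual_downtime_hours(pct)
    print(f"{tier}: {pct}% uptime -> {hours:.1f} hours ({hours * 60:.0f} minutes) per year")
```

The same function works for the number of nines: pass in 99.999 for five nines, for example, to see the downtime that level of availability allows.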

Image of Uptime Institute’s Tier Levels - Destination Certification

Here’s a diagram that shows the Uptime Institute’s tier levels, with the key bits of information highlighted in yellow: the names of the four tiers and this N stuff.

Image of Fault Tolerance - Destination Certification

This diagram shows the N stuff in more detail: 2N+1, 2N, N+1 and N
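If the N notation feels abstract, this small Python sketch may help. It assumes the usual reading of N as the number of units (generators, cooling units, and so on) needed to carry the full load, and counts how many units each scheme installs. The function names and the example of four units are hypothetical.

```python
def units_installed(n: int, scheme: str) -> int:
    """Total units installed for a redundancy scheme, where n units carry the full load."""
    schemes = {
        "N": n,             # exactly what is needed, no spares
        "N+1": n + 1,       # one spare unit
        "2N": 2 * n,        # a complete second set
        "2N+1": 2 * n + 1,  # a complete second set plus one spare
    }
    return schemes[scheme]


def spare_units(n: int, scheme: str) -> int:
    """Units that can be lost before capacity drops below the full load."""
    return units_installed(n, scheme) - n


# Hypothetical example: a facility that needs 4 cooling units to carry its load.
for scheme in ("N", "N+1", "2N", "2N+1"):
    print(f"{scheme}: {units_installed(4, scheme)} installed, {spare_units(4, scheme)} spare")
```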

Uptime

The terms uptime and availability are often used interchangeably and that’s dangerous. As a customer you really don’t care about uptime. What you really care about is availability. Uptime means that a system is powered on, including when it is in maintenance mode–and not accessible by a customer! In contrast, availability not only means that a system is up, but that a customer can connect to the system and use it. As a customer what you truly care about is the number of nines of availability.

Maintenance Mode

Let me define this term maintenance mode a little more clearly–this is important to remember for the exam. Maintenance mode is where a system is up and running, but not accessible by the consumer. So for example when the cloud provider needs to patch a system, they can put it into maintenance mode. The system is up, local logging is happening, but the system is not accessible by the consumer. 

Maintenance

Digging more deeply into maintenance, there are a couple of terms you need to know about.

MTBF

The MTBF–mean time between failures–is essentially the average time that a piece of hardware lasts for. A longer lifespan means that you won’t have to replace components as frequently. So you want to maximize the MTBF.

MTTR

MTTR is the mean time to repair. It’s basically a measurement of how long it takes to replace a component in your data center, such as a hard drive. The lower the MTTR, the fewer technicians you will need, and the more efficient your data center will become. So you want to minimize the MTTR.
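These two numbers are often combined into a single availability estimate. The formula below, availability = MTBF / (MTBF + MTTR), is the standard steady-state approximation for a repairable component; it is not stated in the video and is added here only as a sketch. The function name and the sample figures are hypothetical.

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a repairable component is working: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must not be negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical component: fails on average every 9,999 hours.
slow_repair = steady_state_availability(mtbf_hours=9999, mttr_hours=8)
fast_repair = steady_state_availability(mtbf_hours=9999, mttr_hours=1)

print(f"8-hour repair: {slow_repair:.5%}")
print(f"1-hour repair: {fast_repair:.5%}")
```

With the MTBF held constant, the shorter repair time gives the higher availability, which is why you want to maximize MTBF and minimize MTTR.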

Design Resilient

Now, the final item! Resiliency is the ability of a system to gracefully handle and recover from failures. If we want to ensure that our data centers offer high availability, we must design resilient data centers that can handle failures and maintain services.

Image of Data Design Center - Destination Certification

That’s it for our overview of data center design in Domain 3, covering the most important topics you need to know for the exam.

Image of next mindmap - Destination Certification

If you found this video helpful you can hit the thumbs up button and if you want to be notified when we release additional videos in this MindMap series, then please subscribe and hit the bell icon to get notifications.

I will provide links to the other MindMap videos in the description below.

Thanks very much for watching! And all the best in your studies!
