CCSP Domain 3 Explained: Securing Cloud Platforms and Infrastructure

Rob Witcher

Last Updated On: September 3, 2024

Picture this: You're building a fortress in the sky. Sounds impossible, right? Well, welcome to CCSP Domain 3, where we do just that—but with clouds and data. We're not talking about fluffy white castles, but robust digital fortresses that keep our information safe in the vast expanse of cyberspace.

Securing cloud platforms isn't just about firewalls and encryption. It's about architecting resilient systems that can withstand cyber threats as well as physical and operational ones. From designing secure data centers to implementing chaos engineering, we're diving deep into the bedrock of cloud security architectures. These strategies form the foundation of a robust cloud infrastructure, ensuring data integrity and service continuity in an ever-evolving threat landscape.

Let's explore the critical aspects of Domain 3 of the CCSP exam and enhance your cloud security expertise.

3.1 Comprehend cloud infrastructure and platform components

There are two layers to cloud infrastructure:

  • The physical resources – This is the hardware that the cloud is built on top of. It includes the servers for compute, the storage clusters and the networking infrastructure.
  • The virtualized infrastructure – Cloud providers pool together these physical resources through virtualization. Cloud customers then access these virtualized resources.

Physical environment

In a general sense, physical environments include the actual data centers, server rooms or other locations that host infrastructure. If a company runs its own private cloud, it acts as the cloud provider and the physical environment would be wherever the hardware is located.

Compute nodes are one of the most important components. A compute node is essentially what provides the resources, which can include the processing, memory, network and storage that a virtual machine (VM) instance needs. However, in practice, storage is often provided by storage clusters.

Security of the physical environment

Now that we have described some of the major components that make up the physical environment of a cloud data center, it’s time to look at some of the ways we secure these environments. In order to maintain a robust security posture, we must follow a layered defense approach, which is also known as defense in depth.

In essence, we want to have multiple layers of security so that attackers can’t completely compromise an organization just by breaching one of our security controls, as shown in the image below.

Confidentiality

Keeping our data confidential basically means keeping it a secret from everyone except for those who we want to access it.

Integrity

If data maintains its integrity, it means that it hasn’t become corrupted, tampered with, or altered in an unauthorized manner.

Availability

Available data is readily accessible to authorized parties when they need it.

The CIA triad is a fairly renowned model, but confidentiality, integrity and availability aren’t the only properties that we may want for our data. Two other important properties are authenticity and non-repudiation.

Authenticity

Authenticity basically means that a person or system is who it says it is, and not some impostor. When data is authentic, it means that we have verified that it was actually created, sent, or otherwise processed by the entity who claims responsibility for the action.

Non-repudiation

Non-repudiation essentially means that someone can’t perform an action, then plausibly claim that it wasn’t actually them who did it.

Data roles

There are a number of different data security roles that you need to be familiar with.

Data owner/data controller

The individual within an organization who is accountable for protecting its data, holds the legal rights and defines policies. In the cloud model, the data owner will typically work at the cloud customer organization.

Data processor

An entity or individual responsible for processing data. It’s typically the cloud provider, and they process the data on behalf of the data owner.

Data custodian

Data custodians have a technical responsibility over data. This means that they are responsible for administering aspects like data security, availability, capacity, continuity, backup and restore, etc.

Data steward

Data stewards are responsible for the governance, quality and compliance of data. Their role involves ensuring that data is in the right form, has suitable metadata, and can be used appropriately for business purposes.

Data subject

The individual to whom personal data relates.

Cloud data life cycle phases

The CCSP exam covers the Cloud Security Alliance’s data security life cycle, which was originally developed by Rich Mogull. This model is tailored toward cloud security. There are six phases in the data life cycle.

Image of defense in depth - Destination Certification

Some important physical security considerations:

Guards

Guards can help to administer entry points, patrol the location, and act as deterrents.

CCTV

Closed circuit television cameras (CCTV) are primarily for detecting potentially malicious actions, but they also act as deterrents.

Motion detectors

There are a range of different sensors that can be deployed to detect activity in sensitive areas.

Lighting

Lights can act as safety precautions, deterrents, and give CCTV cameras a better view.

Fences

Fences are a great tool for keeping both people and vehicles away from the premises. Eight feet is a common fence height for deterrence.

Doors

Doors should be constructed securely to limit an attacker’s ability to breach them.

Locks

Locks are critical for restricting access to doors, windows, filing cabinets, etc. There are many types of lock, including:

  • Key
  • Combination
  • RFID
  • Biometric

Mantraps

Mantraps are small spaces in between two doors, where only one door can be opened at a time. 

Turnstiles

Turnstiles prevent people from tailgating or piggybacking behind an authorized person. Tailgating and piggybacking involve following a person who is authorized to enter a restricted area through a door and thus gaining unauthorized access. The difference is that in tailgating the attacker possesses a fake badge. In piggybacking, the attacker doesn’t have any badge at all.

Bollards

Bollards prevent vehicles from entering an area.

Networking and communications

Clouds typically have two or possibly three dedicated networks that are physically isolated from one another, for both security and operational purposes.

Service

The service network is the customer-facing network. It's what the cloud customers have access to.

Storage

The storage network connects virtual machines to storage clusters.

Management

Cloud providers use the management network to control the cloud. Providers use this network to do things like log into hypervisors to make changes or to access the compute node.

There are two major networking models, non-converged and converged networks. In a non-converged network, the management, storage and service networks are separate. The service network generally connects to the local area network across Ethernet switches, while the storage network generally connects to the storage clusters via a protocol like Fibre Channel.

In contrast, a converged network combines the storage and service networks, with storage traffic and service traffic traveling over the same network. However, the management network remains separate for security reasons.

Image of Non-Converged Networks vs. Converged Networks - Destination Certification

Zero trust architecture

Zero trust architectures involve continually evaluating the trust given to an entity. They contrast with earlier models that assumed that once an entity was on the internal network it should be automatically trusted. We all know that attackers can make their way into our network perimeters, so giving anyone free rein once they are inside the network is a recipe for disaster.

A simplified summary of the zero trust approach involves:

  • Not implicitly trusting entities on the internal network.
  • Shrinking implicit trust zones and enforcing access controls on the most granular level possible. Micro-segmentation is useful for dividing enterprise networks into smaller trust zones composed of resources with similar security requirements.
  • Granting access on a per-session basis. Access can be granted or denied based upon an entity’s identity, user location, and other data.
  • Restricting resource access according to the principle of least privilege.
  • Re-authentication and re-authorization when necessary.
  • Extensive monitoring of entities and assets.
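
The points above can be sketched as a per-session access decision. This is a minimal illustration rather than a real zero trust product: the policy table, the trusted locations and the `SessionRequest` fields are all hypothetical names made up for this example. The key idea it shows is that being on the internal network plays no part in the decision.

```python
from dataclasses import dataclass

# Hypothetical policy table: role -> resources that role may access (least privilege).
ALLOWED_RESOURCES = {
    "analyst": {"reports"},
    "admin": {"reports", "hypervisor-console"},
}

# Hypothetical set of locations from which sessions are accepted.
TRUSTED_LOCATIONS = {"office", "vpn"}


@dataclass(frozen=True)
class SessionRequest:
    user: str
    role: str
    location: str
    authenticated: bool        # has the identity been verified for this session?
    on_internal_network: bool  # deliberately ignored by the decision below


def authorize(request: SessionRequest, resource: str) -> bool:
    """Decide access for a single session; nothing is trusted implicitly."""
    if not request.authenticated:
        return False
    if request.location not in TRUSTED_LOCATIONS:
        return False
    # Least privilege: only resources explicitly granted to the role.
    return resource in ALLOWED_RESOURCES.get(request.role, set())
```

Under these assumptions, an unauthenticated request from inside the network is denied, while an authenticated analyst on the VPN can reach reports but not the hypervisor console.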

Virtual local area networks (VLANs)

A core aspect of cloud computing involves abstracting resources away from the physical hardware via virtualization in order to use and share the resources more efficiently. Networking resources are also abstracted away in this manner.

One way of doing this is through virtual local area networks (VLANs). You can take a physical network and logically segment it into many VLANs. Let’s say that an organization wants to operate two isolated networks. The first is for the company’s general use, while the second is a network for the security department.

The organization could do this by purchasing two separate switches. It could set up the general use network on the first switch, and the security department’s network on the second switch. As long as the two switches aren’t linked up, then the organization would have two physically isolated networks.

Another option would be for the organization to have two logically isolated VLANs on the same physical switch. The diagram below shows a 16-port switch. Four computers are plugged into the switch, the first two for general use, and the second two for the security department. If the switch were just set up by default, all four of these computers would be able to talk to each other, which is not what the company wants—they want the first two to be separate from the second two.

Image of a 16 port switch - Destination Certification

Instead, the image above shows how the first two computers for general use have been grouped into a VLAN—VLAN1—while the second two computers for the security department are grouped separately as VLAN2. This would mean that the first two computers could talk to each other, but not talk to the last two computers. Similarly, the last two computers can communicate with one another, but they cannot talk to the first two general-use computers.

Having two separate VLANs means that the general use network and the security department network are logically isolated and cannot access each other, even though they are still on the same physical switch.

This same concept can be extended beyond a single physical switch. The image below shows a second 16-port switch that has been connected to the first one. This second switch has an additional four computers connected to it, two more for general use, and an extra two for the security department.

Even though these computers are connected to a separate switch, they have still been set up as part of the preexisting VLANs. This means that the four general use computers can only communicate among themselves in VLAN1. Likewise, the security department computers can only communicate among themselves in VLAN2.

Two VLANs Across Two Separate Physical Switches - Destination Certification

VLANs are commonly used by enterprises to logically separate networks. One example involves providing an isolated guest network to customers. This helps to protect the main network against attackers who are trying to gain a foothold by logging in to the open Wi-Fi. Another use of VLANs is to form trust zones for zero trust architecture.
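
The two-switch example can be modeled as a simple lookup from port to VLAN. This is only a sketch of the isolation rule, assuming illustrative port numbers and no routing between the VLANs; it is not how a switch is actually configured.

```python
# Port-to-VLAN assignments for two linked switches, mirroring the example above.
# Keys are (switch, port); the port numbers are illustrative.
PORT_VLAN = {
    ("switch1", 1): 1, ("switch1", 2): 1,   # general use
    ("switch1", 3): 2, ("switch1", 4): 2,   # security department
    ("switch2", 1): 1, ("switch2", 2): 1,   # general use
    ("switch2", 3): 2, ("switch2", 4): 2,   # security department
}


def can_communicate(a, b):
    """Two ports can exchange frames only if they belong to the same VLAN."""
    return PORT_VLAN[a] == PORT_VLAN[b]
```

In this model, a general use computer on the first switch can reach a general use computer on the second switch, but not a security department computer plugged into the same switch.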

Software-defined networks (SDNs)

Software-defined networks (SDNs) allow a more thorough layer of abstraction over the physical networking components. These days, SDNs are used for virtualizing networks in most cloud services.

Key benefits of SDNs

  • They can create virtual, software-controlled networks on top of physical networks. Each of these virtual networks has no visibility into the other virtual networks.
  • They can decouple the control plane from the data plane.
  • They can provide more flexibility and make it easier to rapidly reconfigure a network for multiple clients. On a network that’s completely virtualized, you can make configuration changes just through software commands.
  • They are critical building blocks that enable resource pooling for cloud services. SDNs create a layer of abstraction on top of physical networks, and you can create virtual networks on top of this layer.
  • They centralize network intelligence into one place.
  • They allow programmatic network configuration. You can entirely reconfigure the network through API calls.
  • They allow multiple virtual networks to use overlapping IP ranges on the same hardware. Despite this, the networks are still logically isolated.

Before we can fully explain SDNs, we need to back up a little. Network devices like switches and routers have two major components, the control plane and the data plane. The control plane is the part of the architecture that is responsible for defining what to do with incoming packets and where they should be sent. The data plane does the work of processing data requests. The control plane is essentially the intelligence of the network device and it does the thinking, while the data plane is basically just the worker drone.

In traditional networks, control planes and data planes are built into both routers and switches. In the case of a switch, the control plane decides that an incoming packet is destined for MAC address XYZ, and the data plane makes it happen. In a traditional network, if you want to make configuration changes to switches or routers, you have to log in to each device individually, which can be time consuming.

Control planes and data planes in traditional networks vs. SDNs - Destination Certification

One of the major differences in software-defined networks (SDNs) is that the control plane is separated from the data plane and then centralized into one system, as shown in the image above. A big benefit of this is that you don’t have to log in to individual devices to make changes on your network. Instead, you can just log in to the central control plane and make the adjustments there. This makes management and configuration far easier. Another advantage is that if a switch fails, you can just route the traffic around it. In the cloud, the centralized control plane of an SDN is in turn controlled by the management plane.
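
To make the split concrete, here is a toy model of a centralized control plane pushing forwarding rules to switches that only forward. The `Switch` and `Controller` classes are hypothetical and do not correspond to any real SDN controller API; they just illustrate that one call on the controller replaces logging in to each device.

```python
class Switch:
    """Data plane only: forwards according to whatever table it was given."""

    def __init__(self, name):
        self.name = name
        self.flow_table = {}  # destination -> output port

    def forward(self, destination):
        # Returns the output port, or None if no rule matches.
        return self.flow_table.get(destination)


class Controller:
    """Centralized control plane: decides the rules and pushes them out."""

    def __init__(self):
        self.switches = {}

    def register(self, switch):
        self.switches[switch.name] = switch

    def install_rule(self, destination, ports_by_switch):
        # One call updates every affected switch; no per-device login.
        for name, port in ports_by_switch.items():
            self.switches[name].flow_table[destination] = port
```

Routing around a failed switch would, in this sketch, amount to another `install_rule` call with a different set of ports.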

The security advantages of software-defined networks (SDNs)

Most of the benefits of SDNs center around the fact that virtualized networks are easy and cheap to both deploy and reconfigure. SDNs allow you to easily segment your network to form numerous virtual networks. This approach, known as microsegmentation, allows you to isolate networks in a way that would be cost-prohibitive with physical hardware.

Let’s give you a more concrete example to demonstrate just how advantageous microsegmentation can be. First, let’s say your organization has a traditional network, as shown in the diagram below. You would have the insecure Internet, a physical firewall, and then the DMZ, where you would have things like your web server, your FTP server and your mail server. Under this setup, your firewall rules would need to be fairly loose to allow the web traffic, the FTP traffic and the SMTP traffic through to each of your servers. The downside of this configuration is that if the web server was compromised by an attacker, this would give them a foothold in your network that they could use to access your FTP server or your mail server. This is because all of these servers are on the same network segment.

Image of a traditional network - Destination Certification

In contrast to this traditional network configuration, SDNs allow you to deploy virtual firewalls easily and at low cost. You can easily put virtual firewalls in front of each server, creating three separate DMZs, as shown in the figure below. You could have much tighter rules on the firewalls for each of these network segments because the firewall in front of your web server would only need to let through web traffic, the firewall in front of your FTP server would only need to let through FTP traffic, etc.

Multiple isolated DMZs - Destination Certification

The benefit of having these virtualized segments with their own firewalls is that the much stricter rules limit the opportunities for malicious traffic to get through. In addition, if an attacker does manage to get a foothold on one of your servers, such as your web server, they would not be able to move laterally as easily. They would still need to get through the other firewalls if they wanted to reach your FTP or mail servers.
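
The difference between the two setups can be sketched with sets of allowed ports. This is a deliberate simplification: it assumes the shared firewall admits the union of all service ports without distinguishing between servers, and it uses the usual default ports for web, FTP and SMTP.

```python
# Ports each service legitimately needs (common defaults).
SERVICE_PORTS = {"web": {80, 443}, "ftp": {21}, "mail": {25}}

# Traditional DMZ: one firewall must admit the union of all service ports.
shared_firewall = set().union(*SERVICE_PORTS.values())

# Microsegmented: each server sits behind its own firewall with only its ports.
per_server_firewall = {name: set(ports) for name, ports in SERVICE_PORTS.items()}


def allowed_shared(server, port):
    # Simplification: the shared rule set applies equally to every server.
    return port in shared_firewall


def allowed_segmented(server, port):
    # Only the ports that this particular server needs are let through.
    return port in per_server_firewall[server]
```

Under the shared rules, FTP traffic aimed at the web server is let through. With a firewall per server, it is dropped.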

The security challenges of cloud networking

Cloud networking has a number of benefits that are essential to the functioning of the modern cloud environment. However, there’s no free lunch, and SDNs also come with a range of disadvantages, many of which are related to the fact that the cloud customer has no control of the underlying physical infrastructure. Since physical appliances can’t be installed by the customer, customers must use virtual appliances instead, which have some limitations.

Virtual appliances are pre-configured software solutions made up of at least one virtual machine. Virtual appliances are more scalable and compatible than hardware appliances, and they can be packaged, updated and maintained as a single unit.

Virtual appliances can form bottlenecks on the network, requiring significant resources and expense to deliver appropriate levels of performance. They can also cause scaling issues if the cloud provider doesn’t offer compatible autoscaling. Another complication is that autoscaling in the cloud often results in the creation of many instances that may only last for short periods. This means that different assets can end up using the same IP address at different times. Security tools must adapt to this highly dynamic environment by doing things like identifying assets by unique and static ID numbers, rather than IP addresses that may be constantly changing.
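
A minimal sketch of that last idea follows, using a hypothetical `Inventory` class and made-up instance IDs. Findings are stored against the instance ID, so when an IP address is reused by a new instance, the old instance's findings are not attributed to it.

```python
from dataclasses import dataclass, field


@dataclass
class Inventory:
    """Tracks findings per instance ID rather than per IP address."""
    ip_to_instance: dict = field(default_factory=dict)  # current IP -> instance ID
    findings: dict = field(default_factory=dict)        # instance ID -> list of findings

    def assign_ip(self, instance_id, ip):
        # An IP may be reassigned when an autoscaled instance is replaced.
        self.ip_to_instance[ip] = instance_id

    def record(self, ip, finding):
        # Resolve the IP to whichever instance holds it right now.
        instance_id = self.ip_to_instance[ip]
        self.findings.setdefault(instance_id, []).append(finding)
```

If two short-lived instances hold the same IP one after the other, each still ends up with its own separate list of findings.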

Another complication comes from the way that traffic moves across virtual networks. On a physical network, you can monitor the traffic between two physical servers. However, when two virtual machines are running on top of the same physical compute node, they can send traffic to one another without it having to travel via the physical network, as shown in the diagram below. This means that any tools monitoring the physical network won’t be able to see this communication.

Physical IDS sensor connected to the physical switch - Destination Certification

One option for monitoring the traffic between two VMs on the same hardware is to deploy a virtual network monitoring appliance on the hypervisor. Another is to route the traffic between the two VMs through a virtual appliance over the virtual network. However, these approaches create bottlenecks.
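
The visibility gap can be expressed in a few lines. The VM names and their placement on compute nodes are hypothetical; the only rule modeled is that a sensor on the physical network sees traffic between two VMs when they run on different nodes.

```python
# Hypothetical placement of VMs on physical compute nodes.
VM_HOST = {"vm-a": "node1", "vm-b": "node1", "vm-c": "node2"}


def visible_to_physical_sensor(src_vm, dst_vm):
    """Traffic crosses the physical network only when the hosts differ."""
    return VM_HOST[src_vm] != VM_HOST[dst_vm]
```

Here, traffic between `vm-a` and `vm-b` never leaves `node1`, so only monitoring placed at the hypervisor or on the virtual network could observe it.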

Compute

In the cloud, compute is derived from the physical compute nodes which are made up of CPUs, RAM and network interface cards (NICs). A bunch of these are stored in racks at a provider’s data center, and interconnected to the management network, the service network, and the storage network. These compute resources are then abstracted away through virtualization and provisioned to customers.

Securing compute nodes

Cloud providers control and are responsible for the compute nodes and the underlying infrastructure. They are responsible for patching and correctly configuring the hypervisor, as well as all of the technology beneath it. Cloud providers must strictly enforce logical isolation so that customers are not visible to one another. They also need to secure the processes surrounding the storage of a VM image through to running the VM. Adequate security and integrity protections help to ensure that tenants cannot access another customer’s VM image, even though they share the same underlying hardware. Another critical cloud provider responsibility is to ensure that volatile memory is secure.

Virtualization

Virtualization involves adding a layer of abstraction on top of the physical hardware. It’s one of the most important technologies that enable cloud computing. The most common example is a virtual machine, which runs on top of a host computer. The real, physical resources belong to the host computer, but the virtual machine acts similarly to an actual computer. Its operating system is essentially tricked by software running on the host computer. The OS acts the same way it would if it was running on top of its own physical hardware.

But virtualization is used beyond just compute. We also rely on it to abstract away storage and networking resources (such as the VLANs and SDNs we discussed earlier) from the underlying physical components.

Virtual machines (VMs)

To simplify things, a normal computer runs directly on the hardware. In contrast, a virtual machine runs at a higher layer of abstraction. It runs on top of a hypervisor, which in turn runs on top of physical hardware. The virtual machine is known as the guest or an instance, while the computer that it runs on top of is the host. The diagram below shows multiple virtual machines running on the same compute node. Each virtual machine includes its operating system, as well as any apps running on top of it.

Multiple virtual machines running on the same compute node - Destination Certification

One huge benefit of virtualization is that it frees up virtual environments from the underlying physical resources. You can also run multiple virtual machines simultaneously on the same underlying hardware. In the cloud context, this is incredibly useful because it allows providers to utilize their resources more efficiently.

Hypervisors

Type 1 vs. Type 2 hypervisors - Destination Certification

Hypervisors are pieces of software that make virtualization possible. There are two types of hypervisors, as shown in the image above and the table below.

Type 1 hypervisor

  • Sometimes known as bare metal or hardware hypervisors because they run directly on top of the host’s hardware.
  • Controls the underlying hardware and creates a layer of virtualization, with virtual CPU, RAM, NIC and other virtualized components that the guest machine needs to run.
  • Generally more efficient and more secure.
  • Commonly used in data centers for efficiency.
  • One example is Microsoft’s Hyper-V.

Type 2 hypervisor

  • Sometimes known as hosted or operating system hypervisors because an OS sits in between the hardware and the hypervisor.
  • The OS adds an extra layer, which is less efficient and introduces another place for vulnerabilities to arise.
  • Mostly used for development and testing, as well as small-scale virtualization, where efficiency is less of a concern.
  • Examples include VirtualBox and Parallels Desktop.

Hypervisor security

Due to the fact that hypervisors sit between the hardware (or the OS in a type 2 hypervisor) and virtual machines, they have total visibility into every virtual machine that runs on top of them. They can see every command processed by the CPU, observe the data stored in RAM, and look at all data sent by the virtual machine over the network.

An attacker that compromises a hypervisor may be able to access and control all of the VMs running on top of it, as well as their data. One threat is known as a VM escape, where a malicious tenant (or a tenant whose VM was compromised by an external attacker) manages to break down the isolation and escape from their VM. They may then be able to compromise the hypervisor and access the VMs of other tenants.

In type 2 hypervisors, the security of the OS that runs beneath the hypervisor is also critical. If an attacker can compromise the host OS, then they may be able to also compromise the hypervisor as well as the VMs running on top of it.

Containers

Containers are highly portable code execution environments that can be very efficient to run. Containers feature isolated user spaces but share the kernel and other aspects with the underlying OS. This contrasts with virtual machines, which require their own entire operating systems, including the kernel.

Multiple containers can run on top of each OS, with the containers splitting the available resources. This makes containerization useful for securely sharing hardware resources among cloud customers, because it allows them to use the same underlying hardware while remaining logically isolated. Each of these containers can in turn run multiple applications.

Another major advantage of containers is that they can help to make more efficient use of computational resources. The image below shows the contrast between virtual machines and containers. If we want to run three VMs on top of our hypervisor, we need three separate operating systems, three separate sets of libraries, and the apps on top of them. In contrast, on the container side, we just have one operating system, one containerization engine, libraries that can be shared between apps, and then our three apps on top.

Virtual Machines vs. Containers - Destination Certification

The image below shows the major components of containerization, as well as the key terms. A container is formed by taking configuration files, application code, libraries, necessary data, etc. and then building them into a binary file known as a container image.

Image of components of containerization - Destination Certification

These container images are then stored in repositories. Repositories are basically just collections of container images. In turn, these repositories are stored in a registry. When you want to run a container, you pull the container image out of its repository, and then run it on top of what is known as a container engine. Container engines essentially add a layer of abstraction above the operating system, which makes containers portable across hosts that run a compatible container engine.
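
The relationship between registries, repositories and images can be sketched as nested dictionaries. The `Registry` class below is a toy model with made-up method names, not the API of any real container registry.

```python
class Registry:
    """A registry holds repositories; each repository holds tagged images."""

    def __init__(self):
        self.repositories = {}  # repository name -> {tag: image bytes}

    def push(self, repository, tag, image):
        # Store an image under a tag, creating the repository if needed.
        self.repositories.setdefault(repository, {})[tag] = image

    def pull(self, repository, tag):
        # Raises KeyError if the repository or tag does not exist.
        return self.repositories[repository][tag]
```

A container engine would then take the bytes returned by `pull` and run them as a container, which is outside the scope of this sketch.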

Application virtualization

Application virtualization is similar to containerization in that there is a layer of virtualization between the app and the underlying OS. We often use application virtualization to isolate an app from the operating system for testing purposes. It is shown below:

Image of application virtualization - Destination Certification

Microservices

Monolithic vs. microservice architecture - Destination Certification

Traditionally, apps were monolithic. They were designed to perform every step needed to complete a particular task, without any modularity. This approach creates complications, because even relatively minor changes can require huge overhauls of the app code in order to retain functionality.

With a more modular approach, developers can easily swap out and replace code as needed, without having to redesign major parts of the app. These days, many apps are broken down into loosely coupled microservices that run independently and simultaneously. These are small, self-contained units with their own interfaces, as shown in the image above.

Serverless computing

Serverless computing can be hard to pin down. The term is often used to describe function-as-a-service (FaaS) products like AWS Lambda, but a number of other services are also offered under the serverless model. These include the relational database Amazon Aurora and Microsoft’s complex event processing engine, Azure Stream Analytics.

At its heart, serverless refers to a model of providing services where the customer only pays when the code is executed (or when the service is triggered by use, such as Amazon Aurora’s database), generally measured in very small increments.

Function as a service (FaaS)

Function as a service (FaaS) is a subset of serverless computing. In contrast with serverless’ broader set of service offerings, FaaS is used to run specific code functions. Entire applications can be built under the serverless model, while FaaS is limited to just running functions. Under FaaS you are only billed based on the duration and memory used for the code execution, and there aren’t any network or storage fees.
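
As a rough worked example of billing by duration and memory, the function below multiplies the number of invocations by the run time in seconds and the memory in gigabytes, then applies a price per GB-second. The price is a placeholder argument rather than any provider's real rate, and actual pricing may include other components such as per-request charges.

```python
def faas_cost(invocations, avg_duration_ms, memory_mb, price_per_gb_second):
    """Estimate cost from duration and memory only, as described above."""
    # GB-seconds = invocations x seconds per invocation x gigabytes allocated.
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return gb_seconds * price_per_gb_second
```

For instance, 1,000 invocations of 500 ms each at 512 MB come to 250 GB-seconds, which is then multiplied by whatever rate applies.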

Storage

We will start by discussing the storage types from 2.2 Design and implement cloud data storage architectures. This includes the Exam Outline’s subsections on storage types (e.g., long-term, ephemeral, raw storage) and threats to storage types. We will also discuss storage controllers and storage clusters.

Storage types

There are a number of different storage types you need to understand to truly grasp cloud computing. They are summarized below:

Long-term

Cheap and slow storage that’s mainly used for long-term record keeping.

Ephemeral

Temporary storage that only lasts until the virtual environment is shut down.

Raw-disk

A high-performance storage option. In the cloud, raw disk storage allows your virtual machine to directly access the storage device via a mapping file as a proxy.

Object

Object storage involves storing data as objects, which are basically just collections of bits with an identifier and metadata.

Volume

In the cloud, volume storage is basically like a virtualized version of a physical hard drive, with the same limitations you would expect from a physical hard drive.
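
To illustrate the object storage entry above, here is a toy in-memory object store in which every object is just bytes plus an identifier and metadata. The `ObjectStore` class and its methods are hypothetical and far simpler than any real cloud storage API.

```python
import uuid


class ObjectStore:
    """Flat namespace of objects, each with an identifier and metadata."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, metadata: dict) -> str:
        object_id = str(uuid.uuid4())  # identifier generated by the store
        self._objects[object_id] = (data, dict(metadata))
        return object_id

    def get(self, object_id: str):
        # Returns (data, metadata) for the given identifier.
        return self._objects[object_id]
```

There are no folders or drive letters in this model. Callers keep the returned identifier and use it to fetch the object later, which is the main contrast with volume storage.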

Cloud service models and storage types

Service model

Storage type

SaaS

  • Under the SaaS model, cloud customers have limited control, so they can’t directly access raw, ephemeral, volume or object storage.
  • Instead, customers access data through either a web-based user interface or an application.
  • Most of the control over data stored in SaaS is in the hands of the provider.

PaaS

  • In PaaS, customers build their own application, so they can choose how the app stores data.
  • This means that customers have some control over data stored in PaaS.
  • Many apps use databases to store data.
  • PaaS customers can use storage solutions like:
      • Database-as-a-service.
      • Open-source solutions based on Apache Hadoop.
      • Application storage which is accessed through APIs. Data can be kept in object storage and accessed via API calls.

IaaS

  • Raw storage – Virtualized access to the physical media where the data is stored.
  • Volume storage – Typically attached to IaaS instances as a virtualized hard drive.
  • Object storage – Objects are accessed via APIs or web interfaces.

Storage controllers

Storage controllers manage your hard drives. They can be involved in tasks like reconstructing fragmented data and access control. Storage controllers can use several different protocols to communicate with storage devices across the network.

Here are three of the most common protocols:

Internet Small Computer System Interface (iSCSI)

This is an old protocol that is cost-effective to use and highly compatible. However, it does have limitations in terms of performance and latency.

Fibre Channel (FC)

Fibre Channel offers reliability and high performance, but it can be expensive and difficult to deploy.

Fibre Channel over Ethernet (FCoE)

Fibre Channel over Ethernet relies on Ethernet infrastructure, which reduces the costs associated with FC. It offers high performance, low latency and a high degree of reliability. However, there can be some compatibility issues, depending on your existing infrastructure.

Storage clusters

Cloud providers typically have a bunch of hard drives connected to each other in what we call storage clusters. Storage clusters are generally stored in racks that are separate from the compute nodes. Connecting the drives together allows you to pool storage, which can increase capacity, performance and reliability.

Tightly coupled vs. loosely coupled clusters - Destination Certification

Storage clusters are typically either tightly coupled, or loosely coupled, as shown in the image above. The former is expensive, but it provides high levels of performance, while the latter is cheaper and performs at a lower level. The main difference is that in tightly coupled architectures the drives are better connected to each other and follow the same policies, which helps them work together. If you have a lot of data, and performance isn’t a major concern, a loosely coupled structure is often much cheaper.

Management plane

The management plane is the overarching system that controls everything in the cloud. It’s one of the major differences between traditional infrastructure and cloud computing. Cloud providers can use the management plane to control all of their physical infrastructure and other systems, including the hypervisors, the VMs, the containers, and the code.

The centralized management plane is the secret sauce of the cloud, and it helps to provide the critical components like on-demand self-service and rapid elasticity. Without the management plane, it would be impossible to get all of the separate components to work in unison and respond dynamically to the needs of cloud customers in real time. The diagram below shows the various parts of the cloud under the management plane’s control.

Image of a diagram of various parts of the cloud under the management plane’s control - Destination Certification

The diagram below shows the typical components of a cloud. The logical components are highlighted in yellow, while the physical components are shown in purple. Note that the management plane is actually both physical hardware and software.

Image of a simple diagram of the typical components of a cloud - Destination Certification

Management plane capabilities

Management plane capabilities include:

  • Scheduling
  • Orchestration
  • Maintenance
  • Service catalog
  • Self-provisioning
  • Identity and access management
  • Management APIs
  • Configuration management
  • Key management and encryption
  • Financial tracking and reporting
  • Service and helpdesk

Management plane security controls

The management plane is an immensely powerful aspect of cloud computing. Because it has such a high degree of control and access, an attacker who compromises it will have the keys to the castle. This makes securing the management plane one of the most important priorities. Defense in depth is critical: there need to be many layers of security controls keeping the management plane secure.

Orchestration

Orchestration is the centralized control of all data center resources, including things like servers, virtual machines, containers, storage and network components, security, and much more. Orchestration provides the automated configuration and coordination management. It allows the whole system to work together in an integrated fashion. Scheduling is the process of capturing tasks and prioritizing them, then allocating resources to ensure that the tasks can be conducted appropriately. Scheduling also involves working around failures to ensure tasks are completed.
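The scheduling behavior described above (capture tasks, prioritize them, allocate resources, and work around failures) can be sketched in a few lines of Python. This is only an illustration under simple assumptions: one node with a fixed CPU capacity, tasks given as (priority, name, cpu_needed) tuples, and a hypothetical flaky_run executor. Real cloud schedulers are far more elaborate.

```python
import heapq
from itertools import count


def run_schedule(tasks, capacity, run, max_attempts=3):
    """Run tasks in priority order; a lower priority number runs first."""
    order = count()  # tie-breaker so equal priorities keep submission order
    queue = [(prio, next(order), name, cpu, 1) for prio, name, cpu in tasks]
    heapq.heapify(queue)
    completed, failed = [], []
    while queue:
        prio, _, name, cpu, attempt = heapq.heappop(queue)
        if cpu > capacity:
            failed.append(name)  # the task can never fit on this node
            continue
        if run(name, attempt):
            completed.append(name)
        elif attempt < max_attempts:
            # Work around the failure by re-queuing the task for a retry.
            heapq.heappush(queue, (prio, next(order), name, cpu, attempt + 1))
        else:
            failed.append(name)
    return completed, failed


def flaky_run(name, attempt):
    # Hypothetical executor: "backup" fails on its first attempt only.
    return not (name == "backup" and attempt == 1)


done, failed = run_schedule(
    [(2, "backup", 2), (1, "patch-hypervisor", 4), (3, "huge-job", 64)],
    capacity=16,
    run=flaky_run,
)
print(done, failed)
```

Here the highest-priority task runs first, the backup succeeds on its retry, and the oversized job is reported as failed because no amount of retrying would make it fit.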


3.2 Design a secure data center

There are many factors that influence the design of a data center. They include:

The type of cloud services provided

Different purposes will require different designs. For a service that offers cheap cloud storage, the data center would need a lot of storage hardware. In contrast, a service that is designed for training large language models (LLMs) would need a lot of high-end chips.

The location of the data center

Factors that affect the location include:

  • How close the data center needs to be to users.
  • Jurisdiction and compliance requirements.
  • The price of electricity in various regions.
  • Susceptibility to disasters such as earthquakes and flooding.
  • Climate also has an impact, with warmer locations generally requiring more energy to cool the hardware.

Uptime requirements

If a data center aims to have extremely high availability, it will need to be designed with more redundancy built in.
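To see why higher availability targets demand more redundancy, it helps to convert an availability percentage into a yearly downtime budget. The short Python sketch below does that arithmetic. The targets in the loop are common illustrative values, not figures taken from the exam outline, and the function name is our own.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the downtime budget per year for a given availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    print(f"{target}% -> {allowed_downtime_minutes(target):.1f} minutes/year")
```

Each extra "nine" cuts the downtime budget by a factor of ten, from roughly 3.65 days per year at 99% to a little over five minutes at 99.999%.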

Potential threats

Threats will vary depending on what the cloud service is used for. As an example, if a cloud service is designed to host protected health information (PHI), it will need additional protective measures to mitigate against attackers targeting this highly sensitive data.

Efficiency requirements

Different cloud services will need varying levels of efficiency to ensure cost-effectiveness. The intended use impacts design choices. As an example, a data center that aims to provide cheap service will probably want to use a lot of relatively basic equipment. A data center for training AI models will need niche hardware that drives up costs.

Logical design

Tenant partitioning and access control are two important logical considerations highlighted by the CCSP exam outline that can both be implemented through software.

Tenant partitioning

If resources are shared without appropriate partitioning, a malicious tenant (or a tenant who has been compromised by an attacker) could harm all of the other tenants. Obviously, we do not want this to happen, so we want to isolate the tenants from one another. With appropriate isolation, a compromised or malicious tenant cannot worm their way into the other tenants’ systems.

Tenants can be isolated by providing each one with their own physical hardware. One example is to allocate dedicated servers to each tenant. However, public cloud services tend to partition their tenants logically. They share the same underlying physical resources between their tenants and provide each one with virtualized versions of the hardware.
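As a rough illustration of logical partitioning, the Python sketch below models one shared host whose storage is namespaced by tenant. Everything here is hypothetical (the SharedHost class, the tenant IDs and the volume names), and real providers enforce isolation at the hypervisor, network and storage layers rather than in a dictionary. The point is only that tenants share the same underlying resource while each one can reach only its own slice.

```python
class SharedHost:
    """Toy model of one physical host shared by several tenants."""

    def __init__(self):
        # One shared pool, keyed by (tenant_id, volume_name).
        self._volumes = {}

    def write(self, tenant_id: str, volume: str, data: bytes) -> None:
        self._volumes[(tenant_id, volume)] = data

    def read(self, tenant_id: str, volume: str) -> bytes:
        # Lookups are always scoped to the caller's tenant ID, so a tenant
        # cannot address another tenant's volume even if it knows the name.
        key = (tenant_id, volume)
        if key not in self._volumes:
            raise PermissionError(f"{tenant_id} has no volume named {volume!r}")
        return self._volumes[key]


host = SharedHost()
host.write("tenant-a", "db-disk", b"tenant A records")
host.write("tenant-b", "db-disk", b"tenant B records")

print(host.read("tenant-a", "db-disk"))
try:
    host.read("tenant-b", "secrets")  # tenant B never created this volume
except PermissionError as err:
    print("blocked:", err)
```

Both tenants use the same volume name on the same host, yet each read returns only the caller's own data.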


Access control

Access controls are an essential part of keeping tenants separate. We discuss them in Domain 4.7.

Physical design

The physical design of a data center goes far beyond the architecture. It includes things like the location, the HVAC, the infrastructure setup and much more. Each aspect needs to be carefully considered to produce an efficient and resilient data center.

Buy or build?

When a company needs a data center, it must decide whether to buy an existing one, lease, or build its own. Below are the key differences between buying, leasing and building:

Buy

  • High CapEx, low OpEx (but not as low as when building a custom data center).
  • Will not be customized to an organization’s needs.
  • The organization has a lower degree of control.

Lease

  • Low CapEx, high OpEx.
  • Will not be customized to an organization’s needs.
  • The organization has a lower degree of control.

Build

  • High CapEx, but low OpEx.
  • Can be tailor-made and incredibly efficient.
  • The organization has a high degree of control.

Location

There are many important factors to consider when choosing the location of a data center. Some of the main considerations are:

  • How close the data center needs to be to users.
  • Jurisdiction and compliance requirements. Some jurisdictions may require that any data about their residents be stored within the region.
  • The price of electricity in various regions.
  • Susceptibility to disasters such as earthquakes and flooding.
  • Climate also has an impact, with warmer locations generally requiring more energy to cool the hardware.

Utilities

When designing a data center, we have three primary utilities that we need to worry about. It’s easiest to remember them as the three Ps.

Ping (network)

Your data center will need to have a high-speed fiber optic connection that links it up to the internet backbone.

Power (electricity)

Your data center will need sufficient power to run its equipment. Given that data centers use large amounts of power, it is ideal to locate data centers in areas with affordable electricity.

Pipe (HVAC)

To efficiently run your hardware and limit equipment failures, your data center will need to maintain the right temperature and humidity. This is what we consider “pipe”. It includes your air conditioning, heating, ventilation, dehumidifiers, water, etc.

Given that each of these utilities is critical for keeping your service available, you will need to have redundancies for each in place. The more uptime you wish to guarantee your customers, the more elaborate your redundancy plans will need to be.
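The effect of redundancy on the three Ps can be approximated with basic probability. The Python sketch below assumes that failures are independent and that every utility has the same availability, which is a simplification. The 0.99 figure and the function names are hypothetical, chosen purely for illustration.

```python
from math import prod


def redundant(unit_availability: float, units: int) -> float:
    """Availability of N redundant units where any one is enough to stay up."""
    return 1 - (1 - unit_availability) ** units


def in_series(availabilities) -> float:
    """Availability of a service that needs every listed utility to be up."""
    return prod(availabilities)


# Ping, power and pipe must all be up for the data center to operate.
single = in_series([0.99, 0.99, 0.99])
doubled = in_series([redundant(0.99, 2)] * 3)

print(f"No redundancy:      {single:.4%}")
print(f"Two of each utility: {doubled:.4%}")
```

Because the service depends on all three utilities at once, the weakest one drags the total down. Duplicating each utility raises the per-utility availability under these assumptions, which is why stricter uptime guarantees call for more elaborate redundancy.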

Internal vs external redundancies

Redundancies can be categorized as internal or external, depending on whether they are inside the server room or outside of it. Things like power distribution units and chillers are viewed as internal redundancies, while a generator is seen as external. You wouldn’t want to run your generator inside and clog the server room with fumes.

BICSI data center standards

When designing data centers, various resources from the Building Industry Consulting Service International (BICSI) are incredibly useful. For taking care of ping, BICSI has a number of cabling standards, such as ANSI/BICSI N1-2019, Installation Practices for Telecommunications and ICT Cabling and Related Cabling Infrastructure and ANSI/BICSI N2-2017, Practices for the Installation of Telecommunications and ICT Cabling Intended to Support Remote Power Applications.

Standards that focus on overall data center design and operations include ANSI/BICSI 002-2019, Data Center Design and Implementation Best Practices, as well as BICSI 009-2019, Data Center Operations and Maintenance Best Practices.

HVAC

HVAC stands for heating, ventilation and air conditioning, each of which is critical for operating a data center smoothly. In cold climates, a data center may need heating. Ventilation is important for dehumidifying and filtering a data center’s air. Air conditioning and other types of cooling are critical for keeping the hardware from overheating, especially in hot places.

The American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) specifies that data centers should maintain the conditions listed in the table below:

Recommended air