Picture this: You're building a fortress in the sky. Sounds impossible, right? Well, welcome to CCSP Domain 3, where we do just that—but with clouds and data. We're not talking about fluffy white castles, but robust digital fortresses that keep our information safe in the vast expanse of cyberspace.
Securing cloud platforms isn't just about firewalls and encryption. It's about architecting resilient systems that can withstand cyber threats and other hazards. From designing secure data centers to implementing chaos engineering, we're diving deep into the bedrock of cloud security architectures. These strategies form the foundation of a robust cloud infrastructure, ensuring data integrity and service continuity in an ever-evolving threat landscape.
Let's explore the critical aspects of Domain 3 of the CCSP exam and enhance your cloud security expertise.
3.1 Comprehend cloud infrastructure and platform components
There are two layers to cloud infrastructure:
- The physical resources – This is the hardware that the cloud is built on top of. It includes the servers for compute, the storage clusters and the networking infrastructure.
- The virtualized infrastructure – Cloud providers pool together these physical resources through virtualization. Cloud customers then access these virtualized resources.
Physical environment
In a general sense, physical environments include the actual data centers, server rooms or other locations that host infrastructure. If a company runs its own private cloud, it acts as the cloud provider and the physical environment would be wherever the hardware is located.
Compute nodes are one of the most important components. A compute node is essentially what provides the resources, which can include the processing, memory, network and storage that a virtual machine (VM) instance needs. However, in practice, storage is often provided by storage clusters.
Security of the physical environment
Now that we have described some of the major components that make up the physical environment of a cloud data center, it’s time to look at some of the ways we secure these environments. In order to maintain a robust security posture, we must follow a layered defense approach, which is also known as defense in depth.
In essence, we want to have multiple layers of security so that attackers can’t completely compromise an organization just by breaching one of our security controls, as shown in the image below. Ultimately, these layered controls protect the core security properties known as the CIA triad: confidentiality, integrity and availability.
Confidentiality | Keeping our data confidential basically means keeping it a secret from everyone except for those who we want to access it. |
Integrity | If data maintains its integrity, it means that it hasn’t become corrupted, tampered with, or altered in an unauthorized manner. |
Availability | Available data is readily accessible to authorized parties when they need it. |
The CIA triad is a fairly renowned model, but confidentiality, integrity and availability aren’t the only properties that we may want for our data. Two other important properties are authenticity and non-repudiation.
Authenticity | Authenticity basically means that a person or system is who it says it is, and not some impostor. When data is authentic, it means that we have verified that it was actually created, sent, or otherwise processed by the entity who claims responsibility for the action. |
Non-repudiation | Non-repudiation essentially means that someone can’t perform an action, then plausibly claim that it wasn’t actually them who did it. |
Data roles
There are a number of different data security roles that you need to be familiar with.
Data owner/ data controller | The individual within an organization who is accountable for protecting its data, holds the legal rights and defines policies. In the cloud model, the data owner will typically work at the cloud customer organization. |
Data processor | An entity or individual responsible for processing data. It’s typically the cloud provider, and they process the data on behalf of the data owner. |
Data custodian | Data custodians have a technical responsibility over data. This means that they are responsible for administering aspects like data security, availability, capacity, continuity, backup and restore, etc. |
Data steward | Data stewards are responsible for the governance, quality and compliance of data. Their role involves ensuring that data is in the right form, has suitable metadata, and can be used appropriately for business purposes. |
Data subject | The individual to whom personal data relates. |
Cloud data life cycle phases
The CCSP exam covers the Cloud Security Alliance’s data security life cycle, which was originally developed by Rich Mogull. This model is tailored toward cloud security. There are six phases in the data life cycle: Create, Store, Use, Share, Archive and Destroy.

Some important physical security considerations:
Guards | Guards can help to administer entry points, patrol the location, and act as deterrents. |
CCTV | Closed circuit television cameras (CCTV) are primarily for detecting potentially malicious actions, but they also act as deterrents. |
Motion detectors | A range of different sensors can be deployed to detect activity in sensitive areas. |
Lighting | Lights can act as safety precautions, deterrents, and give CCTV cameras a better view. |
Fences | Fences are a great tool for keeping both people and vehicles away from the premises. Eight feet is a common fence height for deterrence. |
Doors | Doors should be constructed securely to limit an attacker’s ability to breach them. |
Locks | Locks are critical for restricting access to doors, windows, filing cabinets, etc. There are many types of lock, including traditional key locks, combination locks and electronic locks. |
Mantraps | Mantraps are small spaces in between two doors, where only one door can be opened at a time. |
Turnstiles | Turnstiles prevent people from tailgating or piggybacking behind an authorized person. Tailgating and piggybacking involve following a person who is authorized to enter a restricted area through a door and thus gaining unauthorized access. The difference is that in tailgating the attacker possesses a fake badge. In piggybacking, the attacker doesn’t have any badge at all. |
Bollards | Bollards prevent vehicles from entering an area. |
Networking and communications
Clouds typically have two or possibly three dedicated networks that are physically isolated from one another, for both security and operational purposes.
Service | The service network is the customer-facing network; it’s what the cloud customers have access to. |
Storage | The storage network connects virtual machines to storage clusters. |
Management | Cloud providers use the management network to control the cloud. Providers use this network to do things like log into hypervisors to make changes or to access the compute node. |
There are two major networking models, non-converged and converged networks. In a non-converged network, the management, storage and service networks are separate. The service network generally connects to the local area network across Ethernet switches, while the storage network generally connects to the storage clusters via a protocol like Fibre Channel.
In contrast, a converged network combines the storage and service networks, with storage traffic and service traffic traveling over the same network. However, the management network remains separate for security reasons.

Zero trust architecture
Zero trust architectures involve continually evaluating the trust given to an entity. They contrast with earlier models that assumed that once an entity was on the internal network it should be automatically trusted. We all know that attackers can make their way into our network perimeters, so giving anyone free rein once they are inside the network is a recipe for disaster.
A simplified summary of the zero trust approach involves the following (a short code sketch follows this list):
- Not implicitly trusting entities on the internal network.
- Shrinking implicit trust zones and enforcing access controls on the most granular level possible. Micro-segmentation is useful for dividing enterprise networks into smaller trust zones composed of resources with similar security requirements.
- Granting access on a per-session basis. Access can be granted or denied based upon an entity’s identity, user location, and other data.
- Restricting resource access according to the principle of least privilege.
- Re-authentication and re-authorization when necessary.
- Extensive monitoring of entities and assets.
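To make the per-session idea concrete, here is a minimal sketch in Python of an access decision that never trusts a request implicitly. The policy attributes, resource names and function are invented for illustration; real zero trust deployments rely on dedicated policy engines and identity providers.

```python
from dataclasses import dataclass

@dataclass
class AccessRequest:
    user_id: str
    device_compliant: bool      # e.g., patched, disk encrypted
    mfa_verified: bool          # re-authentication for this session
    location: str               # coarse location of the request
    resource: str               # the asset being requested
    requested_action: str       # e.g., "read", "write"

# Hypothetical per-resource policy: least privilege per trust zone
POLICY = {
    "hr-database": {"allowed_actions": {"read"}, "allowed_locations": {"office", "vpn"}},
    "public-wiki": {"allowed_actions": {"read", "write"}, "allowed_locations": {"office", "vpn", "remote"}},
}

def evaluate_session(req: AccessRequest) -> bool:
    """Grant or deny access for this session only - nothing is trusted implicitly."""
    policy = POLICY.get(req.resource)
    if policy is None:
        return False                                  # default deny
    if not (req.mfa_verified and req.device_compliant):
        return False                                  # verify identity and device posture
    if req.location not in policy["allowed_locations"]:
        return False                                  # location feeds into the decision
    return req.requested_action in policy["allowed_actions"]

# Example: a compliant, MFA-verified user reading the HR database from the office
print(evaluate_session(AccessRequest("alice", True, True, "office", "hr-database", "read")))  # True
```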
Virtual local area networks (VLANs)
A core aspect of cloud computing involves abstracting resources away from the physical hardware via virtualization in order to use and share the resources more efficiently. Networking resources are also abstracted away in this manner.
One way of doing this is through virtual local area networks (VLANs). You can take a physical network and logically segment it into many VLANs. Let’s say that an organization wants to operate two isolated networks. The first is for the company’s general use, while the second is a network for the security department.
The organization could do this by purchasing two separate switches. It could set up the general use network on the first switch, and the security department’s network on the second switch. As long as the two switches aren’t linked up, then the organization would have two physically isolated networks.
Another option would be for the organization to have two logically isolated VLANs on the same physical switch. The diagram below shows a 16-port switch. Four computers are plugged into the switch, the first two for general use, and the second two for the security department. If the switch were just set up by default, all four of these computers would be able to talk to each other, which is not what the company wants—they want the first two to be separate from the second two.

Instead, the image above shows how the first two computers for general use have been grouped into a VLAN—VLAN1—while the second two computers for the security department are grouped separately as VLAN2. This would mean that the first two computers could talk to each other, but not talk to the last two computers. Similarly, the last two computers can communicate with one another, but they cannot talk to the first two general-use computers.
Having two separate VLANs means that the general use network and the security department network are logically isolated and cannot access each other but they are still on the same physical switch.
This same concept can be extended beyond a single physical switch. The image below shows a second 16-port switch that has been connected to the first one. This second switch has an additional four computers connected to it, two more for general use, and an extra two for the security department.
Even though these computers are connected to a separate switch, they have still been set up as part of the preexisting VLANs. This means that the four general use computers can only communicate among themselves in VLAN1. Likewise, the security department computers can only communicate among themselves in VLAN2.

VLANs are commonly used by enterprises to logically separate networks. One example involves providing an isolated guest network to customers. This helps to protect the main network against attackers who are trying to gain a foothold by logging in to the open Wi-Fi. Another use of VLANs is to form trust zones for zero trust architecture.
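As a rough illustration of the logic involved, the sketch below (Python, with invented switch and port numbers) models the port-to-VLAN assignment described above and checks whether two ports can communicate. Real VLANs are configured on the switches themselves; this is only a conceptual model.

```python
# Hypothetical port-to-VLAN assignment for the two 16-port switches described above
VLAN_MEMBERSHIP = {
    # switch 1
    ("switch1", 1): 1, ("switch1", 2): 1,    # general use -> VLAN 1
    ("switch1", 3): 2, ("switch1", 4): 2,    # security department -> VLAN 2
    # switch 2 (linked to switch 1, carrying both VLANs)
    ("switch2", 1): 1, ("switch2", 2): 1,
    ("switch2", 3): 2, ("switch2", 4): 2,
}

def same_vlan(port_a, port_b) -> bool:
    """Two ports can exchange traffic only if they belong to the same VLAN."""
    return VLAN_MEMBERSHIP[port_a] == VLAN_MEMBERSHIP[port_b]

print(same_vlan(("switch1", 1), ("switch2", 2)))  # True  - both in VLAN 1
print(same_vlan(("switch1", 1), ("switch1", 3)))  # False - VLAN 1 vs VLAN 2
```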
Software-defined networks (SDNs)
Software-defined networks (SDNs) allow a more thorough layer of abstraction over the physical networking components. These days, SDNs are used for virtualizing networks in most cloud services.
Key benefits of SDNs include:
- They can create virtual, software-controlled networks on top of physical networks. Each of these virtual networks has no visibility into the other virtual networks.
- They can decouple the control plane from the data plane.
- They can provide more flexibility and make it easier to rapidly reconfigure a network for multiple clients. On a network that’s completely virtualized, you can make configuration changes just through software commands.
- They are critical building blocks that enable resource pooling for cloud services. SDNs create a layer of abstraction on top of physical networks, and you can create virtual networks on top of this layer.
- They centralize network intelligence into one place.
- They allow programmatic network configuration. You can entirely reconfigure the network through API calls.
- They allow multiple virtual networks to use overlapping IP ranges on the same hardware. Despite this, the networks are still logically isolated.
Before we can fully explain SDNs, we need to back up a little. Network devices like switches and routers have two major components, the control plane and the data plane. The control plane is the part of the architecture that is responsible for defining what to do with incoming packets and where they should be sent. The data plane does the work of processing data requests. The control plane is essentially the intelligence of the network device and it does the thinking, while the data plane is basically just the worker drone.
In traditional networks, control planes and data planes are built into both routers and switches. In the case of a switch, the control plane decides that an incoming packet is destined to MAC address XYZ, and the data plane makes it happen. If you want to make configuration changes to the switches or routers in such a network, you have to log in to each device individually, which can be time consuming.

One of the major differences in software-defined networks (SDNs) is that the control plane is separated from the data plane and then centralized into one system, as shown in the image above. A big benefit of this is that you don’t have to log in to individual devices to make changes on your network. Instead, you can just log in to the central control plane and make the adjustments there. This makes management and configuration far easier. Another advantage is that if a switch fails, you can just route the traffic around it. In the cloud, the centralized control plane of an SDN is in turn controlled by the management plane.
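Because the control plane is centralized and exposed through APIs, reconfiguring an SDN can be as simple as sending a request to the controller. The sketch below is purely illustrative: the controller URL, endpoint path and JSON fields are hypothetical and do not correspond to any specific SDN product.

```python
import requests

# Hypothetical SDN controller API - endpoint and payload are invented for illustration
CONTROLLER = "https://sdn-controller.example.internal/api/v1"

def add_flow_rule(switch_id: str, src_ip: str, dst_ip: str, action: str) -> None:
    """Push a forwarding rule through the centralized control plane instead of
    logging in to each switch individually."""
    rule = {
        "switch": switch_id,
        "match": {"src_ip": src_ip, "dst_ip": dst_ip},
        "action": action,          # e.g., "forward" or "drop"
    }
    resp = requests.post(f"{CONTROLLER}/flows", json=rule, timeout=5)
    resp.raise_for_status()

# Example: block traffic between two segments across the whole fabric with one call
add_flow_rule("edge-switch-07", "10.0.1.0/24", "10.0.2.0/24", "drop")
```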
The security advantages of software-defined networks (SDNs)
Most of the benefits of SDNs center around the fact that virtualized networks are easy and cheap to both deploy and reconfigure. SDNs allow you to easily segment your network to form numerous virtual networks. This approach, known as microsegmentation, allows you to isolate networks in a way that would be cost-prohibitive with physical hardware.
Let’s give you a more concrete example to demonstrate just how advantageous microsegmentation can be. First, let’s say your organization has a traditional network, as shown in the diagram below. You would have the insecure Internet, a physical firewall, and then the DMZ, where you would have things like your web server, your FTP server and your mail server. Under this setup, your firewall rules would need to be fairly loose to allow the web traffic, the FTP traffic and the SMTP traffic through to each of your servers. The downside of this configuration is that if the web server was compromised by an attacker, this would give them a foothold in your network that they could use to access your FTP server or your mail server. This is because all of these servers are on the same network segment.

In contrast to this traditional network configuration, SDNs allow you to deploy virtual firewalls easily and at low cost. You can easily put virtual firewalls in front of each server, creating three separate DMZs, as shown in the figure below. You could have much tighter rules on the firewalls for each of these network segments because the firewall in front of your web server would only need to let through web traffic, the firewall in front of your FTP server would only need to let through FTP traffic, etc.

The benefit of having these virtualized segments with their own firewalls is that the much stricter rules limit the opportunities for malicious traffic to get through. In addition, if an attacker does manage to get a foothold on one of your servers, such as your web server, they would not be able to move laterally as easily. They would still need to get through the other firewalls if they wanted to reach your FTP or mail servers.
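The sketch below (Python, with invented segment names and rules) captures the idea: each microsegment gets its own tight policy, so any traffic that isn’t explicitly required by that segment is dropped.

```python
# Hypothetical per-segment firewall policies - one virtual firewall per DMZ
SEGMENT_POLICIES = {
    "web-dmz":  {"allowed_ports": {443}},   # only HTTPS reaches the web server
    "ftp-dmz":  {"allowed_ports": {21}},    # only FTP control traffic
    "mail-dmz": {"allowed_ports": {25}},    # only SMTP
}

def is_allowed(segment: str, dst_port: int) -> bool:
    """Default deny: traffic passes only if the segment's policy explicitly allows the port."""
    policy = SEGMENT_POLICIES.get(segment)
    return policy is not None and dst_port in policy["allowed_ports"]

print(is_allowed("web-dmz", 443))   # True  - legitimate web traffic
print(is_allowed("web-dmz", 21))    # False - lateral movement toward FTP is blocked
```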
The security challenges of cloud networking
Cloud networking has a number of benefits that are essential to the functioning of the modern cloud environment. However, there’s no free lunch, and SDNs also come with a range of disadvantages, many of which stem from the fact that the cloud customer has no control over the underlying physical infrastructure. Since customers can’t install physical appliances, they must use virtual appliances instead, which have some limitations.
Virtual appliances are pre-configured software solutions made up of at least one virtual machine. Virtual appliances are more scalable and compatible than hardware appliances, and they can be packaged, updated and maintained as a single unit.
Virtual appliances can form bottlenecks on the network, requiring significant resources and expense to deliver appropriate levels of performance. They can also cause scaling issues if the cloud provider doesn’t offer compatible autoscaling. Another complication is that autoscaling in the cloud often results in the creation of many instances that may only last for short periods. This means that different assets can use the same IP addresses. Security tools must adapt to this highly dynamic environment by doing things like identifying assets by unique and static ID numbers, rather than IP addresses that may be constantly changing.
Another complication comes from the way that traffic moves across virtual networks. On a physical network, you can monitor the traffic between two physical servers. However, when two virtual machines are running on top of the same physical compute node, they can send traffic to one another without it having to travel via the physical network, as shown in the diagram below. This means that any tools monitoring the physical network won’t be able to see this communication.

One option for monitoring the traffic between two VMs on the same hardware is to deploy a virtual network monitoring appliance on the hypervisor. Another is to route the traffic between the two VMs through a virtual appliance over the virtual network. However, these approaches create bottlenecks.
Compute
In the cloud, compute is derived from the physical compute nodes which are made up of CPUs, RAM and network interface cards (NICs). A bunch of these are stored in racks at a provider’s data center, and interconnected to the management network, the service network, and the storage network. These compute resources are then abstracted away through virtualization and provisioned to customers.
Securing compute nodes
Cloud providers control and are responsible for the compute nodes and the underlying infrastructure. They are responsible for patching and correctly configuring the hypervisor, as well as all of the technology beneath it. Cloud providers must strictly enforce logical isolation so that customers are not visible to one another. They also need to secure the processes surrounding the storage of a VM image through to running the VM. Adequate security and integrity protections help to ensure that tenants cannot access another customer’s VM image, even though they share the same underlying hardware. Another critical cloud provider responsibility is to ensure that volatile memory is secure.
Virtualization
Virtualization involves adding a layer of abstraction on top of the physical hardware. It’s one of the most important technologies that enable cloud computing. The most common example is a virtual machine, which runs on top of a host computer. The real, physical resources belong to the host computer, but the virtual machine acts similarly to an actual computer. Its operating system is essentially tricked by software running on the host computer. The OS acts the same way it would if it was running on top of its own physical hardware.
But virtualization is used beyond just compute. We also rely on it to abstract away storage and networking resources (such as the VLANs and SDNs we discussed earlier) from the underlying physical components.
Virtual machines (VMs)
To simplify things, a normal computer runs directly on the hardware. In contrast, a virtual machine runs at a higher layer of abstraction. It runs on top of a hypervisor, which in turn runs on top of physical hardware. The virtual machine is known as the guest or an instance, while the computer that it runs on top of is the host. The diagram below shows multiple virtual machines running on the same compute node. Each virtual machine includes its operating system, as well as any apps running on top of it.

One huge benefit of virtualization is that it frees up virtual environments from the underlying physical resources. You can also run multiple virtual machines simultaneously on the same underlying hardware. In the cloud context, this is incredibly useful because it allows providers to utilize their resources more efficiently.
Hypervisors

Hypervisors are pieces of software that make virtualization possible. There are two types of hypervisors, as shown in the image above and the table below.
Type 1 hypervisor | Runs directly on the physical hardware (bare metal), with no host operating system beneath it. This is the type that cloud providers typically run on their compute nodes. |
Type 2 hypervisor | Runs as an application on top of a host operating system, which in turn runs on the physical hardware. This type is common for desktop virtualization and testing. |
Hypervisor security
Because hypervisors sit between the hardware (or the OS, in the case of a type 2 hypervisor) and the virtual machines, they have total visibility into every virtual machine that runs on top of them. They can see every command processed by the CPU, observe the data stored in RAM, and look at all data sent by the virtual machine over the network.
An attacker that compromises a hypervisor may be able to access and control all of the VMs running on top of it, as well as their data. One threat is known as a VM escape, where a malicious tenant (or a tenant whose VM was compromised by an external attacker) manages to break down the isolation and escape from their VM. They may then be able to compromise the hypervisor and access the VMs of other tenants.
In type 2 hypervisors, the security of the OS that runs beneath the hypervisor is also critical. If an attacker can compromise the host OS, then they may be able to also compromise the hypervisor as well as the VMs running on top of it.
Containers
Containers are highly portable code execution environments that can be very efficient to run. Containers feature isolated user spaces but share the kernel and other aspects with the underlying OS. This contrasts with virtual machines, which require their own entire operating systems, including the kernel.
Multiple containers can run on top of each OS, with the containers splitting the available resources. This makes containerization useful for securely sharing hardware resources among cloud customers, because it allows them to use the same underlying hardware while remaining logically isolated. Each of these containers can in turn run multiple applications.
Another major advantage of containers is that they can help to make more efficient use of computational resources. The image below shows the contrast between virtual machines and containers. If we want to run three VMs on top of our hypervisor, we need three separate operating systems, three separate sets of libraries, and the apps on top of them. In contrast, on the container side, we just have one operating system, one containerization engine, libraries that can be shared between apps, and then our three apps on top.

The image below shows the major components of containerization, as well as the key terms. A container is formed by taking configuration files, application code, libraries, necessary data, etc. and then building them into a binary file known as a container image.

These container images are then stored in repositories. Repositories are basically just collections of container images. In turn, these repositories are stored in a registry. When you want to run a container, you pull the container image out of its repository, and then run it on top of what is known as a container engine. Container engines essentially add a layer of abstraction above the operating system, which ultimately allows the containers to run on any operating system.
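If you have the Docker SDK for Python and a local container engine available, the workflow described above (pull a container image from a repository in a registry, then run it on a container engine) looks roughly like this. The image tag and port mapping are just examples.

```python
import docker  # Docker SDK for Python (pip install docker); assumes a local container engine is running

client = docker.from_env()

# Pull a container image from its repository in a registry (Docker Hub by default)
image = client.images.pull("nginx", tag="latest")

# Run the image as a container on top of the container engine
container = client.containers.run("nginx:latest", detach=True, ports={"80/tcp": 8080})
print(container.short_id, container.status)

# Clean up
container.stop()
container.remove()
```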
Application virtualization
Application virtualization is similar to containerization in that there is a layer of virtualization between the app and the underlying OS. We often use application virtualization to isolate an app from the operating system for testing purposes. It is shown below:

Microservices

Traditionally, apps were monolithic. They were designed to perform every step needed to complete a particular task, without any modularity. This approach creates complications, because even relatively minor changes can require huge overhauls of the app code in order to retain functionality.
With a more modular approach, developers can easily swap out and replace code as needed, without having to redesign major parts of the app. These days, many apps are broken down into loosely coupled microservices that run independently and simultaneously. These are small, self-contained units with their own interfaces, as shown in the image above.
Serverless computing
Serverless computing can be hard to pin down. The term is often used to describe function-as-a-service (FaaS) products like AWS Lambda, but a number of other services are also offered under the serverless model, including the relational database Amazon Aurora and Microsoft’s complex event processing engine, Azure Stream Analytics.
At its heart, serverless refers to a model of providing services where the customer only pays when the code is executed (or when the service is otherwise triggered by use, as with Amazon Aurora), generally measured in very small increments.
Function as a service (FaaS)
Function as a service (FaaS) is a subset of serverless computing. In contrast with serverless’ broader set of service offerings, FaaS is used to run specific code functions. Entire applications can be built under the serverless model, while FaaS is limited to just running functions. Under FaaS you are only billed based on the duration and memory used for the code execution, and there aren’t any network or storage fees.
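As a simple FaaS example, an AWS Lambda function written in Python is just a handler that the platform invokes on demand; you are billed for the execution time and memory it consumes. The event field below is a hypothetical input.

```python
import json

def lambda_handler(event, context):
    """Entry point that AWS Lambda's Python runtime invokes for each event.
    'event' carries the trigger's payload; 'context' carries runtime metadata."""
    name = event.get("name", "world")          # hypothetical input field
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```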
Storage
We will start by discussing the storage types from 2.2 Design and implement cloud data storage architectures. This includes the exam outline’s subsections on Storage types (e.g., long-term, ephemeral, raw storage) and Threats to storage types. We will also discuss storage controllers and storage clusters.
Storage types
There are a number of different storage types you need to understand to truly grasp cloud computing. They are summarized below:
Long-term | Cheap and slow storage that’s mainly used for long-term record keeping. |
Ephemeral | Temporary storage that only lasts until the virtual environment is shut down. |
Raw-disk | A high-performance storage option. In the cloud, raw disk storage allows your virtual machine to directly access the storage device via a mapping file as a proxy. |
Object | Object storage involves storing data as objects, which are basically just collections of bits with an identifier and metadata (see the sketch after this table). |
Volume | In the cloud, volume storage is basically like a virtualized version of a physical hard drive, with the same limitations you would expect from a physical hard drive. |
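For example, writing and reading an object with Amazon S3 via boto3 looks roughly like the sketch below (the bucket name and key are hypothetical). Note how the object is addressed by an identifier (its key) and carries metadata, rather than living on a block device or in a traditional file hierarchy.

```python
import boto3  # AWS SDK for Python; assumes credentials are already configured

s3 = boto3.client("s3")

# Store an object: a blob of bits plus an identifier (key) and metadata
s3.put_object(
    Bucket="example-backups",                 # hypothetical bucket
    Key="reports/2024/q1-summary.txt",        # the object's identifier
    Body=b"Quarterly summary...",
    Metadata={"department": "finance"},
)

# Retrieve it by key and read its metadata
obj = s3.get_object(Bucket="example-backups", Key="reports/2024/q1-summary.txt")
print(obj["Metadata"])          # {'department': 'finance'}
print(obj["Body"].read()[:20])
```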
Cloud service models and storage types
Service model | Storage type |
---|---|
SaaS |
|
PaaS |
|
IaaS |
|
Storage controllers
Storage controllers manage your hard drives. They can be involved in tasks like reconstructing fragmented data and access control. Storage controllers can use several different protocols to communicate with storage devices across the network.
Here are three of the most common protocols:
Internet Small Computer System Interface (iSCSI) | This is an old protocol that is cost-effective to use and highly compatible. However, it does have limitations in terms of performance and latency. |
Fibre Channel (FC) | Fibre Channel offers reliability and high performance, but it can be expensive and difficult to deploy. |
Fibre Channel over Ethernet (FCoE) | Fibre Channel over Ethernet relies on Ethernet infrastructure, which reduces the costs associated with FC. It offers high performance, low latency and a high degree of reliability. However, there can be some compatibility issues, depending on your existing infrastructure. |
Storage clusters
Cloud providers typically have a bunch of hard drives connected to each other in what we call storage clusters. Storage clusters are generally stored in racks that are separate from the compute nodes. Connecting the drives together allows you to pool storage, which can increase capacity, performance and reliability.

Storage clusters are typically either tightly coupled, or loosely coupled, as shown in the image above. The former is expensive, but it provides high levels of performance, while the latter is cheaper and performs at a lower level. The main difference is that in tightly coupled architectures the drives are better connected to each other and follow the same policies, which helps them work together. If you have a lot of data, and performance isn’t a major concern, a loosely coupled structure is often much cheaper.
Management plane
The management plane is the overarching system that controls everything in the cloud. It’s one of the major differences between traditional infrastructure and cloud computing. Cloud providers can use the management plane to control all of their physical infrastructure and other systems, including the hypervisors, the VMs, the containers, and the code.
The centralized management plane is the secret sauce of the cloud, and it helps to provide the critical components like on-demand self-service and rapid elasticity. Without the management plane, it would be impossible to get all of the separate components to work in unison and respond dynamically to the needs of cloud customers in real time. The diagram below shows the various parts of the cloud under the management plane’s control.

The diagram further down shows the typical components of a cloud in simplified form. The logical components are highlighted in yellow, while the physical components are shown in purple. Note that the management plane is actually both physical hardware and software.

Management plane capabilities
Management plane capabilities include:
- Scheduling
- Orchestration
- Maintenance
- Service catalog
- Self-provisioning
- Identity and access management
- Management APIs
- Configuration management
- Key management and encryption
- Financial tracking and reporting
- Service and helpdesk
Management plane security controls
The management plane is an immensely powerful aspect of cloud computing. Because of its degree of control and access, an attacker who compromises it will have the keys to the castle. This makes securing the management plane one of the most important priorities. Defense in depth is critical—there need to be many layers of security controls keeping the management plane secure.
Orchestration
Orchestration is the centralized control of all data center resources, including things like servers, virtual machines, containers, storage and network components, security, and much more. Orchestration provides the automated configuration and coordination management. It allows the whole system to work together in an integrated fashion. Scheduling is the process of capturing tasks and prioritizing them, then allocating resources to ensure that the tasks can be conducted appropriately. Scheduling also involves working around failures to ensure tasks are completed.
3.2 Design a secure data center
There are many factors that influence the design of a data center. They include:
The type of cloud services provided | Different purposes will require different designs. For a service that offers cheap cloud storage, the data center would need a lot of storage hardware. In contrast, a service that is designed for training large language models (LLMs) would need a lot of high-end chips. |
The location of the data center | Factors that affect the location include proximity to users, jurisdiction and compliance requirements, electricity prices, susceptibility to natural disasters, and climate. These are covered in more detail under the Location subheading below. |
Uptime requirements | If a data center aims to have extremely high availability, it will need to be designed with more redundancy built in. |
Potential threats | Threats will vary depending on what the cloud service is used for. As an example, if a cloud service is designed to host protected health information (PHI), it will need additional protective measures to mitigate against attackers targeting this highly sensitive data. |
Efficiency requirements | Different cloud services will need varying levels of efficiency to ensure cost-effectiveness. The intended use impacts design choices. As an example, a data center that aims to provide cheap service will probably want to use a lot of relatively basic equipment. A data center for training AI models will need niche hardware that drives up costs. |
Logical design
Tenant partitioning and access control are two important logical considerations highlighted by the CCSP exam outline that can both be implemented through software.
Tenant partitioning
If resources are shared without appropriate partitioning, a malicious tenant (or a tenant who has been compromised by an attacker) could harm all of the other tenants. Obviously, we do not want this to happen, so we want to isolate the tenants from one another. With appropriate isolation, a compromised or malicious tenant cannot worm their way into other tenants’ systems.
Tenants can be isolated by providing each one with their own physical hardware. One example is to allocate dedicated servers to each tenant. However, public cloud services tend to partition their tenants logically. They share the same underlying physical resources between their tenants and provide each one with virtualized versions of the hardware.
Access control
Access controls are an essential part of keeping tenants separate. We discuss them in Domain 4.7.
Physical design
The physical design of a data center goes far beyond the architecture. It includes things like the location, the HVAC, the infrastructure setup and much more. Each aspect needs to be carefully considered to produce an efficient and resilient data center.
Buy or build?
When a company needs a data center, it must decide whether to buy an existing one, lease, or build its own. Below are the key differences between buying, leasing and building:
Buy | Lease | Build |
---|---|---|
High CapEx, low OpEx (but not as low as when building a custom data center) | Low CapEx, high OpEx. | High CapEx, but low OpEx. |
Will not be customized to an organization’s needs. | Will not be customized to an organization’s needs. | Can be tailor-made and incredibly efficient. |
The organization has a lower degree of control. | The organization has a lower degree of control. | The organization has a high degree of control. |
Location
There are many important factors to consider when choosing the location of a data center. Some of the main considerations are:
- How close the data center needs to be to users.
- Jurisdiction and compliance requirements. Some jurisdictions may require that any data about their residents be stored within the region.
- The price of electricity in various regions.
- Susceptibility to disasters such as earthquakes and flooding.
- Climate also has an impact, with warmer locations generally requiring more energy to cool the hardware.
Utilities
When designing a data center, we have three primary utilities that we need to worry about. It’s easiest to remember them as the three Ps.
Ping (network) | Your data center will need to have a high-speed fiber optic connection that links it up to the internet backbone. |
Power (electricity) | Your data center will need sufficient power to run its equipment. Given that data centers use large amounts of power, it is ideal to locate data centers in areas with affordable electricity. |
Pipe (HVAC) | To efficiently run your hardware and limit equipment failures, your data center will need to maintain the right temperature and humidity. This is what we consider “pipe”. It includes your air conditioning, heating, ventilation, dehumidifiers, water, etc. |
Given that each of these utilities is critical for keeping your service available, you will need to have redundancies in place for each one. The more uptime you wish to guarantee your customers, the more elaborate your redundancy plans will need to be.
Internal vs external redundancies
Redundancies can be categorized as internal or external, depending on whether they are inside the server room or outside of it. Things like power distribution units and chillers are viewed as internal redundancies, while a generator is seen as external. You wouldn’t want to run your generator inside and clog the server room with fumes.
BICSI data center standards
When designing data centers, various resources from the Building Industry Consulting Service International (BICSI) are incredibly useful. For taking care of ping, BICSI has a number of cabling standards, such as ANSI/BICSI N1-2019, Installation Practices for Telecommunications and ICT Cabling and Related Cabling Infrastructure and ANSI/BICSI N2-2017, Practices for the Installation of Telecommunications and ICT Cabling Intended to Support Remote Power Applications.
Standards that focus on overall data center design and operations include ANSI/BICSI 002-2019, Data Center Design and Implementation Best Practices, as well as BICSI 009-2019, Data Center Operations and Maintenance Best Practices.
HVAC
HVAC stands for heating, ventilation and air conditioning, each of which are critical for operating a data center smoothly. In cold climates, a data center may need heating. Ventilation is important for dehumidifying and filtering a data center’s air. Air conditioning and other types of cooling are critical for keeping the hardware from overheating, especially in hot places.
The American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) specifies that data centers should maintain the conditions listed in the table below:
Recommended air temperature | Recommended humidity |
---|---|
18-27°C (64.4-80.6°F) | 40-60% |
Managing a data center’s air appropriately has a number of benefits, such as:
- Reduces equipment failures because hardware is running within optimal parameters.
- Increases availability due to fewer failures.
- More effective cooling means that you can increase power density, which in turn means that you can cram more compute into your data center.
- Managing air appropriately allows your data center to run at maximum efficiency, reducing overall costs.
Data centers are designed specifically to ensure good air management. The image below shows a typical aisle in a server room. If you take a look at the bottom of the figure, you will see that there’s a raised floor with blue arrows traveling horizontally underneath. Cold air gets pushed out of the cooling system through this subfloor, as indicated by the blue arrows. Above this is a perforated floor, through which the cold air gets pushed out.
Above the subfloor, we have two rows of four server racks. The row in the foreground has its intakes on the left, with the blue arrows coming up from the floor and into the racks to indicate the cool air coming in. The other row of server racks is partially hidden, but to the right of the diagram you can see blue arrows of cool air coming up through the floor and into the intake of the racks, which is at the back. This cool air lowers the temperature of the racks, but the air itself gets heated up in the process. It’s then pushed out the other side of the servers as hot air, which is indicated by the red arrows coming up in between the two rows of servers.
You can see that the center aisle where this hot air is pushed is sealed with glass and a ceiling that separates it from the rest of the data center. The purpose of this is to separate the hot air blasting out of the servers from the cold air coming in to the servers. We don’t want the hot air to be able to recirculate back down into the intakes, because this would hamper the efficiency of the cooling. The hot air is then taken out through this separate ceiling. This process is known as hot aisle containment because the hot aisle (the area where the hot air is pushed out from the servers) is enclosed from the rest of the server room. In this diagram, the hot air gets drawn out through the ceiling, while the rest of the server room is filled with cool air.
The image below compares hot aisle containment, as well as another type, known as cold aisle containment. In the latter, the cold air goes up through the floor into the enclosed cold aisles, where the intakes for the servers are. It comes out the other side, into the server room, as hot air.

Another important concept for data center air management is positive pressurization. This involves pumping the data center with air to keep it slightly above ambient air pressure. This positive pressurization means that if there are cracks in the walls, air flows out rather than in. Likewise, air flows out of the data center whenever someone opens a door. The big advantage of positive pressurization is that it helps to keep the air clean—pushing air out means that the data center isn’t sucking any air in. We really don’t want much external air coming in, because it brings dirt, dust and other debris, which can clog the hardware.
Multi-vendor pathway connectivity
The CCSP exam uses the term multi-vendor pathway connectivity to refer to the concept of having multiple internet service providers (ISPs) for redundancy. The Internet as a whole is incredibly resilient, but ISPs can go down for a variety of reasons, such as technical faults or natural disasters. Having multiple ISPs can help to give your organization more redundancy if one provider goes down.
Design resilient
It’s important for organizations to design their data centers with resiliency in mind.
The NFPA and fire risks
Fire is a major risk to data centers. A lot of electricity pulses through millions of dollars of hardware, and things can and do go wrong. This means that we need our data centers to have measures in place that prevent, detect and correct fires.
The National Fire Protection Association (NFPA) publishes standards that help data centers and other telecommunications organizations address their fire risk. These include NFPA 75, Standard for the Fire Protection of Information Technology Equipment, and NFPA 76, Standard for the Fire Protection of Telecommunication Facilities.
The purpose of NFPA 75 is “…to set forth the minimum requirements for the protection of ITE [information technology equipment] equipment and ITE areas from damage by fire or its associated effects—namely, smoke, corrosion, heat, and water.” NFPA 76 establishes “…a minimum level of fire protection in telecommunications facilities, provide[s] a minimum level of life safety for the occupants, and protect[s] the telecommunication equipment and service continuity.”
Fire detection
There are three major ways that we can detect fires: flame detectors, smoke detectors and heat detectors. Flame detectors are useful in situations where you anticipate almost instantaneous ignition with a limited smoldering stage at the beginning.
The two most common types of smoke detector are ionization detectors and photoelectric detectors. Each of these are suitable when you expect a fire to smolder in the early stages. Heat detectors detect thermal energy. They are useful in small spaces where a rapid change in temperature can be expected from a quickly growing fire.
Fire suppression
Fires require three things: fuel, oxygen and heat. Once a fire begins, it starts a chain reaction producing more heat, which can continue until the fuel, oxygen or heat is suppressed or consumed.
One of the most common ways to extinguish fires is to use water, which can absorb the heat of the fire and extinguish it. However, data centers are filled with huge amounts of electricity and lots of expensive hardware, neither of which plays well with water. Water conducts electricity, and if equipment gets wet, it generally corrodes and breaks.
Instead, the preferred method is to suppress a fire with a non-combustible gas. Nitrogen, carbon dioxide and argon are all useful gases. Common brands of fire suppressing gases include INERGEN, Argonite, FM-200 and Aero-K. These aren’t seen as overly toxic to humans, but if a gas like Argonite becomes concentrated in a room, employees can suffocate.
When sprinkler systems are in place, there are four common types:
- Wet – Wet pipes are filled with water at all times and they are triggered by heat causing either a fusible link or a glass bulb to break.
- Dry – Dry pipes aren’t filled with water all the time. Dry pipe sprinklers can be installed in rooms that get below freezing; however, the source of the water must be kept above freezing.
- Pre-action – Pre-action systems generally involve multiple triggers.
- Deluge – Deluge systems are like pre-action systems in that they can rely on multiple triggers. The difference is that when pre-action systems are triggered, only the individual sprinklers that have been triggered release water. In contrast, deluge systems release water through all sprinklers once a single sprinkler has been triggered.
The IDCA
The International Data Center Authority (IDCA) is an organization that aims to help the IT industry by developing an open framework for data centers, infrastructure, facilities, IT, IoT, cloud and big data. Its standard is the Infinity Paradigm AE360, which provides a comprehensive approach for streamlining technology strategies, plans, implementations and operations alongside business strategy.
Uptime Institute tier standards
The Uptime Institute is an industry body that’s responsible for developing a global standard in data center performance and availability. The standard is separated into four tiers, each of which specify the requirements and topologies for data centers that operate at different levels.
Note that N is the amount of power, network and cooling capacity required to run the data center at maximum load, so 2N means double that capacity.
Header | Tier I – Basic Capacity | Tier II – Redundant Capacity Components | Tier III – Concurrently Maintainable | Tier IV – Fault Tolerant |
---|---|---|---|---|
Description | Site-wide shutdowns for maintenance are still required. Capacity failures may impact the site. Distribution failures will impact the site. | Redundant capacity components reduce the impact of capacity failures, but site-wide shutdowns for maintenance are still required and distribution failures will impact the site. | Each and every capacity component and distribution path in a site can be removed on a planned basis for maintenance or replacement without impacting operations. The site is still exposed to equipment failure or operator error. | An individual equipment failure or distribution path interruption will not impact operations. A fault tolerant site is also concurrently maintainable. |
Uptime | 99.671% | 99.749% | 99.982% | 99.995% |
Downtime/year | 28.8 hours | 22 hours | 1.6 hours | 26.3 minutes |
Distribution paths | 1 | 1 | 1 active, 1 alternate | 2 active |
Concurrently maintainable | No | No | Yes | Yes |
Fault tolerant | No | No | No | Yes |
Cloud providers can use the tier levels to help them build and maintain data centers that operate at the desired level. Cloud customers can look for data centers with a tier rating that matches their requirements.
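The downtime figures in the table above follow directly from the uptime percentages. A quick calculation (Python) shows how an availability target translates into allowable downtime per year:

```python
# Convert an annual availability target into allowable downtime per year
HOURS_PER_YEAR = 365 * 24  # 8,760 hours

for tier, availability in [("Tier I", 0.99671), ("Tier II", 0.99749),
                           ("Tier III", 0.99982), ("Tier IV", 0.99995)]:
    downtime_hours = (1 - availability) * HOURS_PER_YEAR
    print(f"{tier}: {downtime_hours:.1f} hours (~{downtime_hours * 60:.0f} minutes) per year")

# Tier I:  ~28.8 hours, Tier II:  ~22.0 hours,
# Tier III: ~1.6 hours, Tier IV:  ~0.4 hours (about 26 minutes)
```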
The diagrams below show the basics for how a data center can be configured for the appropriate level of redundancy in each tier.
Note that a PDU is a power distribution unit, and a UPS is an uninterruptible power supply.




3.3 Analyze risks associated with cloud infrastructure and platforms
We discuss the risks associated with cloud infrastructure and platforms in Domain 6.4, which is about the implications of cloud to enterprise risk management. This means that we won’t be discussing Risk assessments, Cloud vulnerabilities, threats and attacks, or Risk mitigation strategies until later.
3.4 Plan and implementation of security controls
In order to form a robust security posture, an organization must begin by assessing its risks, and then forming a cohesive security policy based on those assessments. Beneath the overarching security policy, it will have many more specific policies for different aspects of the organization’s security, as well as standards, guidelines, baselines and procedures.
At a lower level, we have the individual security controls, such as encryption, role-based access control and security awareness training. However, if we don’t have a carefully planned security policy based on actual risks, it’s very difficult to get these individual controls to work together in a way that limits the risk of security incidents.
One of the most important security concepts is defense in depth, which the National Institute of Standards and Technology (NIST) defines in SP 800-53 as:
“An information security strategy that integrates people, technology, and operations capabilities to establish variable barriers across multiple layers and missions of the organization.”
Defense in depth involves controls that fall into a variety of categories. These categories are:
- Administrative
- Logical or technical
- Physical
We discuss these categories in more depth in Domain 6.4.
Physical and environmental protection
We discussed physical security measures in the Comprehend cloud infrastructure and platform components section, under the subheading Security of the physical environment. For environmental protections, we briefly mentioned some concerns in the Design a secure data center section under the Location subheading. It’s important to choose the location of a data center with environmental considerations in mind. This includes making judgements based on the risks of earthquakes, hurricanes, fires and other calamities.
System, storage and communication protection
We discuss system security in Domain 5.2. Storage protections were discussed in Domain 3.1. We discuss many network and communication security considerations in Domain 5.2.
Identification, authentication and authorization in cloud environments
We will discuss identification, authentication and authorization in cloud environments as part of Domain 4.7.
Audit mechanisms
We will be discussing audit mechanisms in two different sections. We will discuss how we use auditing and log collection as part of the accounting stage of IAM in Domain 4.7. We will also discuss auditing, correlation and packet capture in Domain 5.6 in the context of managing security operations.
3.5 Plan business continuity (BC) and disaster recovery (DR)
Business continuity (BC) and disaster recovery (DR) plans are critical for ensuring an organization’s resiliency. Things can and will go drastically wrong, and we need to plan for our biggest risks ahead of time.
Business continuity (BC)/disaster recovery (DR) strategy
A disaster is a sudden, unplanned event that brings about great damage or loss. In a business environment, it is any event that creates an inability on an organization’s part to support critical business functions for some predetermined period of time.
Business Continuity Management (BCM) |
---|
The business function and processes that provide the structure, policies, procedures, and systems to enable the creation and maintenance of BCP and DRP plans. |
Business Continuity Planning (BCP) | Disaster Recovery Planning (DRP) |
---|---|
Focuses on survival of the business and the continuation of critical business functions. It is strategic. | Focuses on the recovery of vital technology infrastructure and systems. It is tactical. |
A BCM creates the structure necessary for BCP and DRP. BCP is primarily concerned with the components of the business that are truly critical and essential, while DRP is primarily concerned with the technological components that support critical and essential business functions. BCP focuses on the processes, while DRP focuses on the systems.
Security personnel should be involved in the BCP process from the earliest stages, from defining the scope of the BCP onward. The key BCP/DRP steps are:
1. Develop a contingency planning policy | This is a formal policy that provides the authority and guidance necessary to develop an effective contingency plan. |
2. Conduct a business impact analysis (BIA) | Conduct the business impact analysis, which helps identify and prioritize the information systems and the components that are critical to supporting the organization’s mission and business processes. |
3. Identify controls | These are the preventative measures taken to reduce the effects of system disruptions. They can increase system availability and reduce contingency life-cycle costs. |
4. Create contingency strategies | Thorough recovery strategies ensure that the system may be recovered quickly and effectively following a disruption. |
5. Develop contingency plans | Develop an information system contingency plan. |
6. Ensure testing, training, and exercises | Thoroughly plan testing, training, and exercises. Testing validates recovery capabilities, whereas training prepares recovery personnel for plan activation, and exercising the plan identifies gaps. |
7. Maintenance | Ensure that plan maintenance takes place. The plan should be a living document that is updated regularly to remain current with system enhancements and organizational changes. |
Business requirements
RPO, RTO, WRT, and MTD
When dealing with BCP and DRP procedures, there are four key measurements of time to be aware of. These are:
- Maximum tolerable downtime (MTD) – Maximum tolerable downtime (MTD) (also known as maximum allowable downtime (MAD)) refers to the maximum amount of time that an organization’s critical processes can be impacted.
- Recovery time objective (RTO) – Recovery time objective (RTO) refers to the amount of time expected to restore services or operations to a defined service level.
- Recovery point objective (RPO) – Recovery point objective (RPO) refers to the maximum amount of data that can be lost in terms of time.
- Work recovery time (WRT) – The work recovery time (WRT) is the time needed to verify the integrity of systems and data as they’re being brought back online.
The diagram below helps to show how these measurements of time all fit together. The horizontal axis is time, starting on the left with business as usual. As we progress to the right, a disaster occurs. The first measurement that we see is the RPO, the maximum amount of data loss as a measurement of time. After the disaster has occurred, the next measurement of time is the RTO, the maximum amount of time to restore processes and systems to a defined service level. WRT is the time required to validate systems as they are brought back online and return to business as usual. Finally, the MTD is the maximum amount of time that processes and systems can be down before the business may be forced to cease operations. MTD is the most important measurement of time to consider when making the decision to declare a disaster.

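A common way to sanity-check these values is that recovery (RTO) plus verification (WRT) must fit within the maximum tolerable downtime (MTD). A small sketch, assuming that standard relationship and hypothetical figures:

```python
def plan_is_viable(rto_hours: float, wrt_hours: float, mtd_hours: float) -> bool:
    """The time to restore systems (RTO) plus the time to verify them (WRT)
    must not exceed the maximum tolerable downtime (MTD)."""
    return rto_hours + wrt_hours <= mtd_hours

# Hypothetical figures for a critical business process
print(plan_is_viable(rto_hours=4, wrt_hours=2, mtd_hours=8))   # True  - 6h of recovery fits in an 8h MTD
print(plan_is_viable(rto_hours=6, wrt_hours=4, mtd_hours=8))   # False - the plan must be reworked
```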
Creation, implementation and testing of plan
Business impact analysis (BIA)
A business impact analysis (BIA) is one of the most important steps in business continuity planning. Its purpose is to assess the potential consequences that a disaster or a disruption would have on business processes. A BIA should then gather information to develop recovery strategies for each critical function and process. The output of a BIA includes key measurements of time: RPO, RTO, WRT, and MTD.
The BIA Process
Identifying and assigning values to an organization’s most critical and essential functions and assets is the first step in determining what processes to prioritize in an organization’s recovery efforts. Employees from various company departments should be involved in the process so that they can give their insights surrounding critical systems and services.
Once asset values have been determined, and priorities have been established, an organization can set up processes to protect the most important assets. The steps of the BIA process are:
- Determine the business processes and recovery criticality.
- Identify resource requirements.
- Identify recovery priorities for system resources.
Disaster response process
The incident response process should be followed prior to a disaster being declared. Once an incident is identified, an assessment of its severity must be made. During the assessment of an incident, one specific variable should be carefully considered—MTD. If it’s clear that the MTD will be exceeded, a disaster should be declared and the disaster recovery plan immediately initiated.
Dependency Charts
When recovering from a disaster, dependency charts, like the one shown below, can map out exactly which components are required and even their initiation order.
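Dependency information like this can also be kept in machine-readable form. The sketch below uses Python's standard-library topological sort to derive a restoration order from a hypothetical dependency map; the component names and dependencies are invented for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: each component lists what it depends on.
dependencies = {
    "network": set(),
    "identity provider": {"network"},
    "database": {"network"},
    "application servers": {"database", "identity provider"},
    "public website": {"application servers"},
}

# static_order() yields components so that every dependency is restored
# before anything that relies on it.
restoration_order = list(TopologicalSorter(dependencies).static_order())
print(restoration_order)
# e.g. ['network', 'identity provider', 'database', 'application servers', 'public website']
```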

Options for cloud recovery
The cloud can be used for data recovery in multiple ways:
- The primary copy can be stored on premises, while the backup is in the cloud.
- The primary copy can be stored in the cloud, with the backup stored in the same cloud.
- The primary copy can be stored in the cloud, with the backup stored in a separate provider’s cloud.
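As a hedged sketch of the third option, the snippet below keeps the primary copy in one provider's object storage (AWS S3 via boto3) and writes the backup to a separate provider (Google Cloud Storage). The bucket names, object keys, and file path are placeholders, and credentials are assumed to be configured through each provider's standard mechanisms.

```python
import boto3                       # AWS SDK for Python
from google.cloud import storage   # Google Cloud Storage client

local_file = "daily-export.db"            # hypothetical local export
primary_bucket = "example-primary-data"   # hypothetical S3 bucket
backup_bucket = "example-dr-backups"      # hypothetical GCS bucket

# Primary copy stored in one provider's cloud (AWS S3).
s3 = boto3.client("s3")
s3.upload_file(local_file, primary_bucket, "exports/daily-export.db")

# Backup copy stored in a separate provider's cloud (Google Cloud Storage).
gcs = storage.Client()
gcs.bucket(backup_bucket).blob("exports/daily-export.db").upload_from_filename(local_file)
```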
Failover architecture
Most organizations aim for more than just keeping their data safe; they also want to keep their services online. To achieve this, they need a failover architecture in place so that a backup environment can take over automatically. When it is set up correctly, traffic can quickly switch over to the backup if the primary goes down, as shown in the following diagrams.
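Alongside the diagrams, here's a minimal, hypothetical sketch of the idea at the application level: health-check the primary endpoint and route to the standby when it's unhealthy. The URLs are placeholders, and in production this switching is usually handled by DNS, load balancers, or the provider's managed failover services rather than application code.

```python
import urllib.request

PRIMARY = "https://primary.example.com/health"   # placeholder endpoints
STANDBY = "https://standby.example.com/health"

def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers its health check with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # covers connection failures and timeouts
        return False

def active_endpoint() -> str:
    """Prefer the primary; fail over to the standby when the primary is down."""
    return PRIMARY if is_healthy(PRIMARY) else STANDBY

print("Routing traffic to:", active_endpoint())
```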


Chaos engineering
Through tools like Chaos Mesh, you can bring fault simulation to Kubernetes in a way that allows you to test what will happen in a range of strange situations. You can use these simulations of various fault scenarios to help you design your architecture to be more robust. If something fails in the simulation, you can then plan for that eventuality.
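Chaos Mesh itself injects faults at the Kubernetes level through its own resource definitions, but the underlying idea can be sketched in plain Python: wrap a dependency call with injected latency and random failures, then check that the caller degrades gracefully. Everything below is an illustrative toy, not Chaos Mesh's actual interface.

```python
import random
import time

def flaky(call, failure_rate=0.2, max_delay_s=0.5):
    """Wrap a dependency call with injected latency and random failures."""
    def wrapper(*args, **kwargs):
        time.sleep(random.uniform(0, max_delay_s))  # injected latency
        if random.random() < failure_rate:          # injected fault
            raise ConnectionError("injected fault")
        return call(*args, **kwargs)
    return wrapper

@flaky
def fetch_inventory():
    return {"widgets": 42}  # stand-in for a real downstream call

# The experiment: does the caller handle injected faults gracefully?
for attempt in range(5):
    try:
        print("OK:", fetch_inventory())
    except ConnectionError:
        print("Fault injected - falling back to cached inventory.")
```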
Test disaster recovery plans (DRP)
BCP and DRP Testing
After recovery plans have been created, it’s important to test them. Tests can range from simple to complex, with each type having its own value. Some of the most common tests are:
Type | Description | Affects backup/parallel systems | Affects production systems |
---|---|---|---|
Read-through/checklist | Involves reviewing the DR plan against a standard checklist for missing components and completeness. | | |
Walk-through | Relevant stakeholders walk through the plan and provide their input based on their expertise. | | |
Simulation | Involves following a plan based on a simulated disaster scenario. It stops short of affecting systems or data. | | |
Parallel | Involves testing the DR plan on parallel systems. | ✓ | |
Full-interruption/full-scale | Involves production systems, which makes these tests the most valuable, but also the most risky. | ✓ | ✓ |
Goals of business continuity management (BCM)
The three primary goals of business continuity management (BCM) are simple:
- Safety of people.
- Minimization of damage.
- Survival of business.
CCSP Domain 3 key takeaways
Note: Some concepts in Domain 3 are explored in greater detail in other CCSP domains. These topics are referenced in their relevant sections above but are not included in the Key Takeaways for Domain 3. For comprehensive coverage, refer to the indicated domains where these concepts are primary focuses.
3.1 Comprehend cloud infrastructure and platform components
Physical environment
Security of the physical environment
Networking and communications
Zero trust architecture
Virtual local area networks (VLANs)
Software-defined networks (SDNs)
The security advantages of software-defined networks (SDNs)
The security challenges of cloud networking
Additional network security considerations in the cloud
Compute
Virtualization
Virtual machines (VMs)
Hypervisors
Containers
Microservices
Serverless computing
Function as a service (FaaS)
Storage
Storage clusters
Orchestration
Management plane
3.2 Design a secure data center
Logical design
Physical design
Utilities
Environmental design
Multi-vendor pathway connectivity
Design resilient
Fire detection
Fire suppression
The IDCA
Uptime Institute tier standards
3.3 Analyze risks associated with cloud infrastructure and platforms
3.4 Plan and implementation of security controls
3.5 Plan business continuity (BC) and disaster recovery (DR)
Business continuity (BC)/disaster recovery (DR) strategy
Business requirements
Creation, implementation and testing of plan
Disaster response process
Restoration order
Failover architecture
Chaos engineering
BCP and DRP testing
Goals of business continuity management (BCM)
Preparing for CCSP Domain 3 exam
So, you've built your cloud fortress in theory—now it's time to prove you can do it in practice. The CCSP Domain 3 exam is your chance to showcase your cloud platform and infrastructure security expertise. But don't worry, we've got your back. We'll walk you through what to expect on the exam and arm you with the best resources to ace it.
CCSP Domain 3 exam expectations: What you need to know
Important reminder: The topics below are critical for CCSP Domain 3, but they're just the tip of the iceberg. Cloud platforms and infrastructure are the bedrock of cloud security—every concept matters. While we've highlighted likely exam topics, be prepared for questions that test your holistic understanding.
Remember, in real-world scenarios, a gap in your knowledge could compromise your entire cloud security strategy. Dive deep, connect the dots, and build a comprehensive understanding of how these components work together to create robust cloud security architecture.
3.1 Comprehend cloud infrastructure and platform components
3.2 Design a secure data center
3.3 Analyze risks associated with cloud infrastructure and platforms
3.4 Plan and implementation of security controls
3.5 Plan business continuity (BC) and disaster recovery (DR)
Resource recommendations
Mastering CCSP Domain 3 requires a multifaceted approach. Like building a robust cloud infrastructure, your study strategy should have multiple layers of defense against knowledge gaps.
Let's explore some key resources that can fortify your understanding of cloud platform and infrastructure security:
- Destination Certification CCSP MasterClass: This comprehensive program offers a structured approach to mastering Domain 3. Its adaptive learning system and expert-led Q&A sessions are particularly valuable for navigating the complex infrastructure and platform security concepts, ensuring you build a solid foundation in cloud security architecture.
- Destination CCSP: The Comprehensive Guide: Our guide excels at breaking down the intricate technical aspects of cloud platforms and infrastructure. Its innovative diagrams and real-world examples are especially helpful for visualizing complex Domain 3 concepts like secure data center design and cloud architecture.
- Destination Certification CCSP App: For studying on the go, this app is invaluable. Its flashcards and practice questions are particularly useful for reinforcing your knowledge of Domain 3's technical terminology and key principles of cloud infrastructure security.
FAQs
Domain 3 (Cloud Platform and Infrastructure Security) makes up 17% of the CCSP exam, while Domain 2 (Cloud Data Security) accounts for 20% of the exam questions.
While there are no formal prerequisites for Domain 3, a solid understanding of general IT and network security principles will be beneficial.
Although hands-on experience is not strictly required, practical knowledge of cloud platforms can significantly help in understanding and applying the concepts tested in Domain 3.
Elevate Your Cloud Security Expertise with Destination Certification
Domain 3 is the cornerstone of cloud security, demanding a robust understanding of cloud platforms and infrastructure. It challenges you to think like a cloud architect while maintaining the mindset of a security professional—a delicate balance that Destination Certification helps you master.
Our CCSP MasterClass is engineered to demystify the complexities of Domain 3 and beyond. We don't just provide information; we help you build a comprehensive mental model of cloud security. Our adaptive learning system identifies and strengthens your weak points, ensuring you're well-prepared for every aspect of cloud platform and infrastructure security.
With real-world scenarios and expert-led discussions, we equip you with practical knowledge applicable to your career. Our weekly live Q&A sessions give you direct access to CCSP experts, allowing you to dive deep into the nuances of secure cloud architecture and design.
Ready to transform your understanding of cloud platform and infrastructure security? Join our CCSP MasterClass today and experience the Destination Certification advantage. We're not just preparing you for an exam—we're equipping you to become a leader in cloud security.
Rob is the driving force behind the success of the Destination Certification CISSP program, leveraging over 15 years of security, privacy, and cloud assurance expertise. As a seasoned leader, he has guided numerous companies through high-profile security breaches and managed the development of multi-year security strategies. With a passion for education, Rob has delivered hundreds of globally acclaimed CCSP, CISSP, and ISACA classes, combining entertaining delivery with profound insights for exam success. You can reach out to Rob on LinkedIn.