CCSP Domain 3 Explained: Securing Cloud Platforms and Infrastructure


Rob Witcher

Last Updated On: September 3, 2024

Picture this: You're building a fortress in the sky. Sounds impossible, right? Well, welcome to CCSP Domain 3, where we do just that—but with clouds and data. We're not talking about fluffy white castles, but robust digital fortresses that keep our information safe in the vast expanse of cyberspace.

Securing cloud platforms isn't just about firewalls and encryption. It's about architecting resilient systems that can withstand the storms of cyber threats and other hazards. From designing secure data centers to implementing chaos engineering, we're diving deep into the bedrock of cloud security architectures. These strategies form the foundation of a robust cloud infrastructure, ensuring data integrity and service continuity in an ever-evolving threat landscape.

Let's explore the critical aspects of Domain 3 of the CCSP exam and enhance your cloud security expertise.

3.1 Comprehend cloud infrastructure and platform components

There are two layers to cloud infrastructure:

  • The physical resources – This is the hardware that the cloud is built on top of. It includes the servers for compute, the storage clusters and the networking infrastructure.
  • The virtualized infrastructure – Cloud providers pool together these physical resources through virtualization. Cloud customers then access these virtualized resources.

Physical environment

In a general sense, physical environments include the actual data centers, server rooms or other locations that host infrastructure. If a company runs its own private cloud, it acts as the cloud provider and the physical environment would be wherever the hardware is located.

Compute nodes are one of the most important components. A compute node is essentially what provides the resources, which can include the processing, memory, network and storage that a virtual machine (VM) instance needs. However, in practice, storage is often provided by storage clusters.

Security of the physical environment

Now that we have described some of the major components that make up the physical environment of a cloud data center, it’s time to look at some of the ways we secure these environments. In order to maintain a robust security posture, we must follow a layered defense approach, which is also known as defense in depth.

In essence, we want to have multiple layers of security so that attackers can’t completely compromise an organization just by breaching one of our security controls, as shown in the image below.

Confidentiality

Keeping our data confidential basically means keeping it a secret from everyone except for those who we want to access it.

Integrity

If data maintains its integrity, it means that it hasn’t become corrupted, tampered with, or altered in an unauthorized manner.

Availability

Available data is readily accessible to authorized parties when they need it.

The CIA triad is a fairly renowned model, but confidentiality, integrity and availability aren’t the only properties that we may want for our data. Two other important properties are authenticity and non-repudiation.

Authenticity

Authenticity basically means that a person or system is who it says it is, and not some impostor. When data is authentic, it means that we have verified that it was actually created, sent, or otherwise processed by the entity who claims responsibility for the action.

Non-repudiation

Non-repudiation essentially means that someone can’t perform an action, then plausibly claim that it wasn’t actually them who did it.

Data roles

There are a number of different data security roles that you need to be familiar with.

Data owner/ data controller

The individual within an organization who is accountable for protecting its data, holds the legal rights and defines policies. In the cloud model, the data owner will typically work at the cloud customer organization.

Data processor

An entity or individual responsible for processing data. It’s typically the cloud provider, and they process the data on behalf of the data owner.

Data custodian

Data custodians have a technical responsibility over data. This means that they are responsible for administering aspects like data security, availability, capacity, continuity, backup and restore, etc.

Data steward

Data stewards are responsible for the governance, quality and compliance of data. Their role involves ensuring that data is in the right form, has suitable metadata, and can be used appropriately for business purposes.

Data subject

The individual to whom personal data relates.

Cloud data life cycle phases

The CCSP exam covers the Cloud Security Alliance’s data security life cycle, which was originally developed by Rich Mogull. This model is tailored toward cloud security. There are six phases in the data life cycle: create, store, use, share, archive, and destroy.

Image of defense in depth - Destination Certification

Some important physical security considerations:

Guards

Guards can help to administer entry points, patrol the location, and act as deterrents.

CCTV

Closed-circuit television (CCTV) cameras are primarily for detecting potentially malicious actions, but they also act as deterrents.

Motion detectors

There are a range of different sensors that can be deployed to detect activity in sensitive areas.

Lighting

Lights act as safety precautions and deterrents, and they give CCTV cameras a better view.

Fences

Fences are a great tool for keeping both people and vehicles away from the premises. Eight feet is a common fence height for deterrence.

Doors

Doors should be constructed securely to limit an attacker’s ability to breach them.

Locks

Locks are critical for restricting access to doors, windows, filing cabinets, etc. There are many types of lock, including:

  • Key
  • Combination
  • RFID
  • Biometric

Mantraps

Mantraps are small spaces in between two doors, where only one door can be opened at a time. 

Turnstiles

Turnstiles prevent people from tailgating or piggybacking behind an authorized person. Tailgating and piggybacking involve following a person who is authorized to enter a restricted area through a door and thus gaining unauthorized access. The difference is that in tailgating the attacker possesses a fake badge. In piggybacking, the attacker doesn’t have any badge at all.

Bollards

Bollards prevent vehicles from entering an area.

Networking and communications

Clouds typically have two or possibly three dedicated networks that are physically isolated from one another, for both security and operational purposes.

Service

The service network is the customer-facing network; it’s what the cloud customers have access to.

Storage

The storage network connects virtual machines to storage clusters.

Management

Cloud providers use the management network to control the cloud. Providers use this network to do things like log into hypervisors to make changes or to access the compute node.

There are two major networking models, non-converged and converged networks. In a non-converged network, the management, storage and service networks are separate. The service network generally connects to the local area network across Ethernet switches, while the storage network generally connects to the storage clusters via a protocol like Fibre Channel.

In contrast, a converged network combines the storage and service networks, with storage traffic and service traffic traveling over the same network. However, the management network remains separate for security reasons.

Image of Non-Converged Networks vs. Converged Networks - Destination Certification

Zero trust architecture

Zero trust architectures involve continually evaluating the trust given to an entity. They contrast with earlier models that assumed that once an entity was on the internal network it should be automatically trusted. We all know that attackers can make their way into our network perimeters, so giving anyone free rein once they are inside the network is a recipe for disaster.

A simplified summary of the zero trust approach involves:

  • Not implicitly trusting entities on the internal network.
  • Shrinking implicit trust zones and enforcing access controls on the most granular level possible. Micro-segmentation is useful for dividing enterprise networks into smaller trust zones composed of resources with similar security requirements.
  • Granting access on a per-session basis. Access can be granted or denied based upon an entity’s identity, user location, and other data. A simplified sketch of such a per-session check appears after this list.
  • Restricting resource access according to the principle of least privilege.
  • Re-authentication and re-authorization when necessary.
  • Extensive monitoring of entities and assets.
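
To make the per-session idea a little more concrete, here is a minimal, purely illustrative sketch of a policy decision in Python. The roles, locations and resource names are invented for the example; a real zero trust deployment would rely on a dedicated policy engine and identity provider rather than hand-rolled code like this.

```python
# Purely illustrative policy decision sketch; the roles, locations and resource
# names below are invented. A real zero trust deployment would use a dedicated
# policy engine and identity provider rather than hand-rolled code.
from dataclasses import dataclass

@dataclass
class SessionRequest:
    user_id: str
    user_role: str
    mfa_passed: bool
    device_compliant: bool
    source_location: str
    resource: str

# Hypothetical least-privilege mapping: which roles may reach which resources.
ROLE_PERMISSIONS = {
    "security-analyst": {"siem-dashboard", "log-archive"},
    "developer": {"ci-pipeline"},
}

TRUSTED_LOCATIONS = {"corporate-office", "approved-vpn"}

def evaluate_session(req: SessionRequest) -> bool:
    """Deny by default; grant access only when every per-session check passes."""
    if not req.mfa_passed or not req.device_compliant:
        return False  # authentication or device posture check failed
    if req.source_location not in TRUSTED_LOCATIONS:
        return False  # location signal falls outside policy
    allowed = ROLE_PERMISSIONS.get(req.user_role, set())
    return req.resource in allowed  # least privilege: explicit allow only

print(evaluate_session(SessionRequest(
    "alice", "security-analyst", True, True, "approved-vpn", "siem-dashboard")))
# -> True; change any single attribute and the session is denied.
```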

Virtual local area networks (VLANs)

A core aspect of cloud computing involves abstracting resources away from the physical hardware via virtualization in order to use and share the resources more efficiently. Networking resources are also abstracted away in this manner.

One way of doing this is through virtual local area networks (VLANs). You can take a physical network and logically segment it into many VLANs. Let’s say that an organization wants to operate two isolated networks. The first is for the company’s general use, while the second is a network for the security department.

The organization could do this by purchasing two separate switches. It could set up the general use network on the first switch, and the security department’s network on the second switch. As long as the two switches aren’t linked up, then the organization would have two physically isolated networks.

Another option would be for the organization to have two logically isolated VLANs on the same physical switch. The diagram below shows a 16-port switch. Four computers are plugged into the switch, the first two for general use, and the second two for the security department. If the switch were just set up by default, all four of these computers would be able to talk to each other, which is not what the company wants—they want the first two to be separate from the second two.

Image of a 16 port switch - Destination Certification

Instead, the image above shows how the first two computers for general use have been grouped into a VLAN—VLAN1—while the second two computers for the security department are grouped separately as VLAN2. This would mean that the first two computers could talk to each other, but not talk to the last two computers. Similarly, the last two computers can communicate with one another, but they cannot talk to the first two general-use computers.

Having two separate VLANs means that the general use network and the security department network are logically isolated and cannot access each other but they are still on the same physical switch.

This same concept can be extended beyond a single physical switch. The image below shows a second 16-port switch that has been connected to the first one. This second switch has an additional four computers connected to it, two more for general use, and an extra two for the security department.

Even though these computers are connected to a separate switch, they have still been set up as part of the preexisting VLANs. This means that the four general use computers can only communicate among themselves in VLAN1. Likewise, the security department computers can only communicate among themselves in VLAN2.

Two VLANs Across Two Separate Physical Switches - Destination Certification

VLANs are commonly used by enterprises to logically separate networks. One example involves providing an isolated guest network to customers. This helps to protect the main network against attackers who are trying to gain a foothold by logging in to the open Wi-Fi. Another use of VLANs is to form trust zones for zero trust architecture.
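
If it helps to see the forwarding logic spelled out, here is a tiny toy model in Python of the switch behavior described above. It is only an illustration of the concept; real switches implement VLAN tagging (IEEE 802.1Q) in hardware and are configured through their own CLI or management interface.

```python
# Toy model of VLAN isolation on a single switch. Illustrative only.
# Hypothetical port-to-VLAN assignment mirroring the example above:
# ports 1-2 are general use (VLAN 1), ports 3-4 are the security department (VLAN 2).
port_vlan = {1: 1, 2: 1, 3: 2, 4: 2}

def forward_broadcast(ingress_port: int) -> list:
    """Return the ports a broadcast frame is forwarded to: only ports in the same
    VLAN as the ingress port, excluding the ingress port itself."""
    vlan = port_vlan[ingress_port]
    return [port for port, v in port_vlan.items() if v == vlan and port != ingress_port]

print(forward_broadcast(1))  # [2] -> general-use traffic never reaches ports 3 or 4
print(forward_broadcast(3))  # [4] -> security department traffic stays in VLAN 2
```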

Software-defined networks (SDNs)

Software-defined networks (SDNs) allow a more thorough layer of abstraction over the physical networking components. These days, SDNs are used for virtualizing networks in most cloud services.

Key benefits of SDNs

  • They can create virtual, software-controlled networks on top of physical networks. Each of these virtual networks has no visibility into the other virtual networks.
  • They can decouple the control plane from the data plane.
  • They can provide more flexibility and make it easier to rapidly reconfigure a network for multiple clients. On a network that’s completely virtualized, you can make configuration changes just through software commands.
  • They are critical building blocks that enable resource pooling for cloud services. SDNs create a layer of abstraction on top of physical networks, and you can create virtual networks on top of this layer.
  • They centralize network intelligence into one place.
  • They allow programmatic network configuration. You can entirely reconfigure the network through API calls.
  • They allow multiple virtual networks to use overlapping IP ranges on the same hardware. Despite this, the networks are still logically isolated.

Before we can fully explain SDNs, we need to back up a little. Network devices like switches and routers have two major components, the control plane and the data plane. The control plane is the part of the architecture that is responsible for defining what to do with incoming packets and where they should be sent. The data plane does the work of processing data requests. The control plane is essentially the intelligence of the network device and it does the thinking, while the data plane is basically just the worker drone.

In traditional networks, control planes and data planes are built into both routers and switches. In the case of a switch, the control plane decides that an incoming packet is destined for MAC address XYZ, and the data plane makes it happen. In a traditional network, if you want to make configuration changes to switches or routers, you have to log in to each device individually, which can be time consuming.

Control planes and data planes in traditional networks vs. SDNs - Destination Certification

One of the major differences in software-defined networks (SDNs) is that the control plane is separated from the data plane and then centralized into one system, as shown in the image above. A big benefit of this is that you don’t have to log in to individual devices to make changes on your network. Instead, you can just log in to the central control plane and make the adjustments there. This makes management and configuration far easier. Another advantage is that if a switch fails, you can just route the traffic around it. In the cloud, the centralized control plane of an SDN is in turn controlled by the management plane.
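
Because the control plane is centralized and exposed programmatically, reconfiguring the network can be as simple as an API call. The sketch below assumes a hypothetical SDN controller REST endpoint and JSON schema (the URL and field names are made up); real controllers expose their own documented REST APIs.

```python
# Sketch of programmatic network configuration against a hypothetical SDN
# controller REST API. The URL, path and JSON fields are made up for
# illustration; consult your controller's documentation for its real interface.
import requests

CONTROLLER = "https://sdn-controller.example.internal:8443"  # hypothetical endpoint

flow_rule = {
    "switch_id": "switch-01",
    "match": {"dst_mac": "aa:bb:cc:dd:ee:ff"},
    "action": {"forward_to_port": 7},
}

# A single API call to the centralized control plane replaces logging in to the
# individual switch to make the change.
response = requests.post(f"{CONTROLLER}/api/flows", json=flow_rule, timeout=10)
response.raise_for_status()
print("Flow rule installed:", response.json())
```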

The security advantages of software-defined networks (SDNs)

Most of the benefits of SDNs center around the fact that virtualized networks are easy and cheap to both deploy and reconfigure. SDNs allow you to easily segment your network to form numerous virtual networks. This approach, known as microsegmentation, allows you to isolate networks in a way that would be cost-prohibitive with physical hardware.

Let’s give you a more concrete example to demonstrate just how advantageous microsegmentation can be. First, let’s say your organization has a traditional network, as shown in the diagram below. You would have the insecure Internet, a physical firewall, and then the DMZ, where you would have things like your web server, your FTP server and your mail server. Under this setup, your firewall rules would need to be fairly loose to allow the web traffic, the FTP traffic and the SMTP traffic through to each of your servers. The downside of this configuration is that if the web server was compromised by an attacker, this would give them a foothold in your network that they could use to access your FTP server or your mail server. This is because all of these servers are on the same network segment.

Image of a traditional network - Destination Certification

In contrast to this traditional network configuration, SDNs allow you to deploy virtual firewalls easily and at low cost. You can easily put virtual firewalls in front of each server, creating three separate DMZs, as shown in the figure below. You could have much tighter rules on the firewalls for each of these network segments because the firewall in front of your web server would only need to let through web traffic, the firewall in front of your FTP server would only need to let through FTP traffic, etc.

Multiple isolated DMZs - Destination Certification

The benefit of having these virtualized segments with their own firewalls is that the much stricter rules limit the opportunities for malicious traffic to get through. In addition, if an attacker does manage to get a foothold on one of your servers, such as your web server, they would not be able to move laterally as easily. They would still need to get through the other firewalls if they wanted to reach your FTP or mail servers.
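
One way to picture these per-server virtual firewalls in practice is with cloud security groups, which are built on the provider's SDN. The sketch below uses AWS security groups via boto3 purely as an example; the VPC ID is a placeholder, credentials are assumed to be configured, and other providers offer comparable constructs.

```python
# Sketch of per-server "virtual firewall" rules using AWS security groups via
# boto3. Assumes AWS credentials are already configured; the VPC ID below is a
# placeholder.
import boto3

ec2 = boto3.client("ec2")

# A tightly scoped group for the web server segment: only HTTPS is allowed in.
web_sg = ec2.create_security_group(
    GroupName="web-dmz-sg",
    Description="Web server segment: HTTPS only",
    VpcId="vpc-0123456789abcdef0",  # placeholder VPC ID
)
ec2.authorize_security_group_ingress(
    GroupId=web_sg["GroupId"],
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
    }],
)
# Equivalent groups for the FTP and mail servers would allow only their own
# protocols, so a compromised web server cannot easily move laterally.
```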

The security challenges of cloud networking

Cloud networking has a number of benefits that are essential to the functioning of the modern cloud environment. However, there’s no free lunch, and SDNs also come with a range of disadvantages, many of which are related to the fact that the cloud customer has no control of the underlying physical infrastructure. Since physical appliances can’t be installed by the customer, customers must use virtual appliances instead, which have some limitations.

Virtual appliances are pre-configured software solutions made up of at least one virtual machine. Virtual appliances are more scalable and compatible than hardware appliances, and they can be packaged, updated and maintained as a single unit.

Virtual appliances can form bottlenecks on the network, requiring significant resources and expense to deliver appropriate levels of performance. They can also cause scaling issues if the cloud provider doesn’t offer compatible autoscaling. Another complication is that autoscaling in the cloud often results in the creation of many instances that may only last for short periods. This means that different assets can use the same IP addresses. Security tools must adapt to this highly dynamic environment by doing things like identifying assets by unique and static ID numbers, rather than IP addresses that may be constantly changing.

Another complication comes from the way that traffic moves across virtual networks. On a physical network, you can monitor the traffic between two physical servers. However, when two virtual machines are running on top of the same physical compute node, they can send traffic to one another without it having to travel via the physical network, as shown in the diagram below. This means that any tools monitoring the physical network won’t be able to see this communication.

Physical IDS sensor connected to the physical switch - Destination Certification

One option for monitoring the traffic between two VMs on the same hardware is to deploy a virtual network monitoring appliance on the hypervisor. Another is to route the traffic between the two VMs through a virtual appliance over the virtual network. However, these approaches create bottlenecks.

Compute

In the cloud, compute is derived from the physical compute nodes which are made up of CPUs, RAM and network interface cards (NICs). A bunch of these are stored in racks at a provider’s data center, and interconnected to the management network, the service network, and the storage network. These compute resources are then abstracted away through virtualization and provisioned to customers.

Securing compute nodes

Cloud providers control and are responsible for the compute nodes and the underlying infrastructure. They are responsible for patching and correctly configuring the hypervisor, as well as all of the technology beneath it. Cloud providers must strictly enforce logical isolation so that customers are not visible to one another. They also need to secure the processes surrounding the storage of a VM image through to running the VM. Adequate security and integrity protections help to ensure that tenants cannot access another customer’s VM image, even though they share the same underlying hardware. Another critical cloud provider responsibility is to ensure that volatile memory is secure.

Virtualization

Virtualization involves adding a layer of abstraction on top of the physical hardware. It’s one of the most important technologies that enable cloud computing. The most common example is a virtual machine, which runs on top of a host computer. The real, physical resources belong to the host computer, but the virtual machine acts similarly to an actual computer. Its operating system is essentially tricked by software running on the host computer. The OS acts the same way it would if it was running on top of its own physical hardware.

But virtualization is used beyond just compute. We also rely on it to abstract away storage and networking resources (such as the VLANs and SDNs we discussed earlier) from the underlying physical components.

Virtual machines (VMs)

To simplify things, a normal computer runs directly on the hardware. In contrast, a virtual machine runs at a higher layer of abstraction. It runs on top of a hypervisor, which in turn runs on top of physical hardware. The virtual machine is known as the guest or an instance, while the computer that it runs on top of is the host. The diagram below shows multiple virtual machines running on the same compute node. Each virtual machine includes its operating system, as well as any apps running on top of it.

Multiple virtual machines running on the same compute node - Destination Certification

One huge benefit of virtualization is that it frees up virtual environments from the underlying physical resources. You can also run multiple virtual machines simultaneously on the same underlying hardware. In the cloud context, this is incredibly useful because it allows providers to utilize their resources more efficiently.

Hypervisors

Type 1 vs. Type 2 hypervisors - Destination Certification

Hypervisors are pieces of software that make virtualization possible. There are two types of hypervisors, as shown in the image above and the table below.

Type 1 hypervisor

  • Sometimes known as bare metal or hardware hypervisors because they run directly on top of the host’s hardware.
  • Controls the underlying hardware and creates a layer of virtualization, with virtual CPU, RAM, NIC and other virtualized components that the guest machine needs to run.
  • Generally more efficient and more secure.
  • Commonly used in data centers for efficiency.
  • One example is Microsoft’s Hyper-V.

Type 2 hypervisor

  • Sometimes known as hosted or operating system hypervisors because an OS sits in between the hardware and the hypervisor.
  • The OS adds an extra layer, which is less efficient and introduces another place for vulnerabilities to arise.
  • Mostly used for development and testing, as well as small-scale virtualization, where efficiency is less of a concern.
  • Examples include VirtualBox or Apple’s Parallels.

Hypervisor security

Due to the fact that hypervisors sit between the hardware (or the OS in a type 2 hypervisor) and virtual machines, they have total visibility into every virtual machine that runs on top of them. They can see every command processed by the CPU, observe the data stored in RAM, and look at all data sent by the virtual machine over the network.

An attacker that compromises a hypervisor may be able to access and control all of the VMs running on top of it, as well as their data. One threat is known as a VM escape, where a malicious tenant (or a tenant whose VM was compromised by an external attacker) manages to break down the isolation and escape from their VM. They may then be able to compromise the hypervisor and access the VMs of other tenants.

In type 2 hypervisors, the security of the OS that runs beneath the hypervisor is also critical. If an attacker can compromise the host OS, then they may be able to also compromise the hypervisor as well as the VMs running on top of it.

Containers

Containers are highly portable code execution environments that can be very efficient to run. Containers feature isolated user spaces but share the kernel and other aspects with the underlying OS. This contrasts with virtual machines, which require their own entire operating systems, including the kernel.

Multiple containers can run on top of each OS, with the containers splitting the available resources. This makes containerization useful for securely sharing hardware resources among cloud customers, because it allows them to use the same underlying hardware while remaining logically isolated. Each of these containers can in turn run multiple applications.

Another major advantage of containers is that they can help to make more efficient use of computational resources. The image below shows the contrast between virtual machines and containers. If we want to run three VMs on top of our hypervisor, we need three separate operating systems, three separate sets of libraries, and the apps on top of them. In contrast, on the container side, we just have one operating system, one containerization engine, libraries that can be shared between apps, and then our three apps on top.

Virtual Machines vs. Containers - Destination Certification

The image below shows the major components of containerization, as well as the key terms. A container is formed by taking configuration files, application code, libraries, necessary data, etc. and then building them into a binary file known as a container image.

Image of components of containerization - Destination Certification

These container images are then stored in repositories. Repositories are basically just collections of container images. In turn, these repositories are stored in a registry. When you want to run a container, you pull the container image out of its repository, and then run it on top of what is known as a container engine. Container engines essentially add a layer of abstraction above the operating system, which ultimately allows the containers to run on any operating system.
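
As a rough illustration of the pull-and-run workflow, the sketch below uses the Docker SDK for Python against a local Docker engine. The image name is just an example, and the snippet assumes the docker package is installed and a Docker daemon is running.

```python
# Rough illustration of the pull-and-run workflow using the Docker SDK for
# Python against a local Docker engine. Requires the "docker" package and a
# running Docker daemon; the nginx image is used purely as an example.
import docker

client = docker.from_env()

# Pull a container image from its registry/repository.
client.images.pull("nginx", tag="latest")

# Run the image as an isolated container on top of the container engine,
# mapping container port 80 to port 8080 on the host.
container = client.containers.run("nginx:latest", detach=True, ports={"80/tcp": 8080})
print("Running container:", container.short_id)
```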

Application virtualization

Application virtualization is similar to containerization in that there is a layer of virtualization between the app and the underlying OS. We often use application virtualization to isolate an app from the operating system for testing purposes. It is shown below:

Image of application virtualization - Destination Certification

Microservices

Monolithic vs. microservice architecture - Destination Certification

Traditionally, apps were monolithic. They were designed to perform every step needed to complete a particular task, without any modularity. This approach creates complications, because even relatively minor changes can require huge overhauls of the app code in order to retain functionality.

With a more modular approach, developers can easily swap out and replace code as needed, without having to redesign major parts of the app. These days, many apps are broken down into loosely coupled microservices that run independently and simultaneously. These are small, self-contained units with their own interfaces, as shown in the image above.

Serverless computing

Serverless computing can be hard to pin down. The term is often used to describe function-as-a-service (FaaS) products like AWS Lambda, but a number of other services are also offered under the serverless model. These include the relational database Amazon Aurora and Microsoft’s complex event processing engine, Azure Stream Analytics.

At its heart, serverless refers to a model of providing services where the customer only pays when the code is executed (or when the service is triggered by use, such as Amazon Aurora’s database), generally measured in very small increments.

Function as a service (FaaS)

Function as a service (FaaS) is a subset of serverless computing. In contrast with serverless’ broader set of service offerings, FaaS is used to run specific code functions. Entire applications can be built under the serverless model, while FaaS is limited to just running functions. Under FaaS you are only billed based on the duration and memory used for the code execution, and there aren’t any network or storage fees.
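
To give a feel for how small a FaaS unit of deployment can be, here is a minimal AWS Lambda-style handler in Python. The event shape is invented for illustration; in production the platform supplies the event and context objects and bills only for the execution time and memory used.

```python
# Minimal AWS Lambda-style handler (FaaS). The event shape is invented for
# illustration; in AWS, the platform supplies the event and context objects.
import json

def lambda_handler(event, context):
    """Return a greeting for the caller named in the triggering event."""
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }

# Local smoke test; when deployed, the function runs (and is billed) only when
# an event triggers it.
if __name__ == "__main__":
    print(lambda_handler({"name": "CCSP"}, None))
```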

Storage

We will start by discussing the storage types from 2.2 Design and implement cloud data storage architectures. This includes the exam outline’s subsections on Storage types (e.g., long-term, ephemeral, raw storage) and Threats to storage types. We will also discuss storage controllers and storage clusters.

Storage types

There are a number of different storage types you need to understand to truly grasp cloud computing. They are summarized below:

Long-term

Cheap and slow storage that’s mainly used for long-term record keeping.

Ephemeral

Temporary storage that only lasts until the virtual environment is shut down.

Raw-disk

A high-performance storage option. In the cloud, raw disk storage allows your virtual machine to directly access the storage device via a mapping file as a proxy.

Object

Object storage involves storing data as objects, which are basically just collections of bits with an identifier and metadata.

Volume

In the cloud, volume storage is basically like a virtualized version of a physical hard drive, with the same limitations you would expect from a physical hard drive.

Cloud service models and storage types

SaaS

  • Under the SaaS model, cloud customers have limited control, so they don’t have direct access to raw, ephemeral, volume or object storage.
  • Instead, customers access data through either a web-based user interface or an application.
  • Most of the control over data stored in SaaS is in the hands of the provider.

PaaS

  • In PaaS, customers build their own application, so they can choose how the app stores data.
  • This means that customers have some control over data stored in PaaS.
  • Many apps use databases to store data.
  • PaaS customers can use storage solutions like:
  • Database-as-a-service.
  • Open-source solutions based on Apache Hadoop.
  • Application storage which is accessed through APIs. Data can be kept in object storage and accessed via API calls.

IaaS

  • Raw storage – Virtualized access to the physical media where the data is stored.
  • Volume storage – Typically attached to IaaS instances as a virtualized hard drive.
  • Object storage – Objects are accessed via APIs or web interfaces.
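
Since object storage is accessed through APIs rather than a file system, a quick sketch may help. The example below uses Amazon S3 via boto3 as one illustration; the bucket name and object key are placeholders, and credentials are assumed to be configured already.

```python
# Sketch of API-driven object storage access using Amazon S3 via boto3 as one
# example. The bucket name and key are placeholders, and AWS credentials are
# assumed to be configured already.
import boto3

s3 = boto3.client("s3")
bucket = "example-ccsp-demo-bucket"  # placeholder bucket name

# Store an object: a collection of bits plus an identifier (the key) and metadata.
s3.put_object(
    Bucket=bucket,
    Key="reports/2024/q3.txt",
    Body=b"Quarterly report contents",
    Metadata={"classification": "internal"},
)

# Retrieve it again through the same API rather than a file system path.
obj = s3.get_object(Bucket=bucket, Key="reports/2024/q3.txt")
print(obj["Body"].read().decode())
```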

Storage controllers

Storage controllers manage your hard drives. They can be involved in tasks like reconstructing fragmented data and access control. Storage controllers can use several different protocols to communicate with storage devices across the network.

Here are three of the most common protocols:

Internet Small Computer System Interface (iSCSI)

This is an old protocol that is cost-effective to use and highly compatible. However, it does have limitations in terms of performance and latency.

Fibre Channel (FC)

Fibre Channel offers reliability and high performance, but it can be expensive and difficult to deploy.

Fibre Channel over Ethernet (FCoE)

Fibre Channel over Ethernet relies on Ethernet infrastructure, which reduces the costs associated with FC. It offers high performance, low latency and a high degree of reliability. However, there can be some compatibility issues, depending on your existing infrastructure.

Storage clusters

Cloud providers typically have a bunch of hard drives connected to each other in what we call storage clusters. Storage clusters are generally stored in racks that are separate from the compute nodes. Connecting the drives together allows you to pool storage, which can increase capacity, performance and reliability.

Tightly coupled vs. loosely coupled clusters - Destination Certification

Storage clusters are typically either tightly coupled, or loosely coupled, as shown in the image above. The former is expensive, but it provides high levels of performance, while the latter is cheaper and performs at a lower level. The main difference is that in tightly coupled architectures the drives are better connected to each other and follow the same policies, which helps them work together. If you have a lot of data, and performance isn’t a major concern, a loosely coupled structure is often much cheaper.

Management plane

The management plane is the overarching system that controls everything in the cloud. It’s one of the major differences between traditional infrastructure and cloud computing. Cloud providers can use the management plane to control all of their physical infrastructure and other systems, including the hypervisors, the VMs, the containers, and the code.

The centralized management plane is the secret sauce of the cloud, and it helps to provide the critical components like on-demand self-service and rapid elasticity. Without the management plane, it would be impossible to get all of the separate components to work in unison and respond dynamically to the needs of cloud customers in real time. The diagram below shows the various parts of the cloud under the management plane’s control.

Image of a diagram of various parts of the cloud under the management plane’s control - Destination Certification

The diagram further down shows a simple diagram of the typical components of a cloud. The logical components are highlighted in yellow, while the physical components are shown in purple. Note that the management plane is actually both physical hardware and software.

Image of a simple diagram of the typical components of a cloud - Destination Certification

Management plane capabilities

Management plane capabilities include:

  • Scheduling
  • Orchestration
  • Maintenance
  • Service catalog
  • Self-provisioning
  • Identity and access management
  • Management APIs
  • Configuration management
  • Key management and encryption
  • Financial tracking and reporting
  • Service and helpdesk

Management plane security controls

The management plane is an immensely powerful aspect of cloud computing. Because it has such a high degree of control and access, an attacker who compromises it gains the keys to the castle. This makes securing the management plane one of the most important priorities. Defense in depth is critical—there need to be many layers of security controls keeping the management plane secure.

Orchestration

Orchestration is the centralized control of all data center resources, including things like servers, virtual machines, containers, storage and network components, security, and much more. Orchestration provides the automated configuration and coordination management. It allows the whole system to work together in an integrated fashion. Scheduling is the process of capturing tasks and prioritizing them, then allocating resources to ensure that the tasks can be conducted appropriately. Scheduling also involves working around failures to ensure tasks are completed.
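
To see what this orchestration and self-provisioning looks like from the customer's side, here is a rough sketch of requesting a new VM through a provider's management APIs, using AWS EC2 via boto3 as one example; the AMI ID is a placeholder and credentials are assumed to be configured.

```python
# Sketch of self-service provisioning through a provider's management APIs,
# using AWS EC2 via boto3 as one example. The AMI ID is a placeholder. Behind
# this single call, the management plane schedules and orchestrates compute,
# storage and networking to deliver the running instance.
import boto3

ec2 = boto3.client("ec2")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder machine image ID
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
)
print("Provisioned instance:", response["Instances"][0]["InstanceId"])
```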


3.2 Design a secure data center

There are many factors that influence the design of a data center. They include:

The type of cloud services provided

Different purposes will require different designs. For a service that offers cheap cloud storage, the data center would need a lot of storage hardware. In contrast, a service that is designed for training large language models (LLMs) would need a lot of high-end chips.

The location of the data center

Factors that affect the location include:

  • How close the data center needs to be to users.
  • Jurisdiction and compliance requirements.
  • The price of electricity in various regions.
  • Susceptibility to disasters such as earthquakes and flooding.
  • Climate also has an impact, with warmer locations generally requiring more energy to cool the hardware.

Uptime requirements

If a data center aims to have extremely high availability, it will need to be designed with more redundancy built in.

Potential threats

Threats will vary depending on what the cloud service is used for. As an example, if a cloud service is designed to host protected health information (PHI), it will need additional protective measures to mitigate against attackers targeting this highly sensitive data.

Efficiency requirements

Different cloud services will need varying levels of efficiency to ensure cost-effectiveness. The intended use impacts design choices. As an example, a data center that aims to provide cheap service will probably want to use a lot of relatively basic equipment. A data center for training AI models will need niche hardware that drives up costs.

Logical design

Tenant partitioning and access control are two important logical considerations highlighted by the CCSP exam outline that can both be implemented through software.

Tenant partitioning

If resources are shared without appropriate partitioning, a malicious tenant (or a tenant who has been compromised by an attacker) could harm all of the other tenants. Obviously, we do not want this to happen, so we want to isolate the tenants from one another. With appropriate isolation, a compromised or malicious tenant cannot worm their way into the other tenants’ systems.

Tenants can be isolated by providing each one with their own physical hardware. One example is to allocate dedicated servers to each tenant. However, public cloud services tend to partition their tenants logically. They share the same underlying physical resources between their tenants and provide each one with virtualized versions of the hardware.


Access control

Access controls are an essential part of keeping tenants separate. We discuss them in Domain 4.7.

Physical design

The physical design of a data center goes far beyond the architecture. It includes things like the location, the HVAC, the infrastructure setup and much more. Each aspect needs to be carefully considered to produce an efficient and resilient data center.

Buy or build?

When a company needs a data center, it must decide whether to buy an existing one, lease, or build its own. Below are the key differences between buying, leasing and building:

Buy

  • High CapEx, low OpEx (but not as low as when building a custom data center).
  • Will not be customized to an organization’s needs.
  • The organization has a lower degree of control.

Lease

  • Low CapEx, high OpEx.
  • Will not be customized to an organization’s needs.
  • The organization has a lower degree of control.

Build

  • High CapEx, but low OpEx.
  • Can be tailor-made and incredibly efficient.
  • The organization has a high degree of control.

Location

There are many important factors to consider when choosing the location of a data center. Some of the main considerations are:

  • How close the data center needs to be to users.
  • Jurisdiction and compliance requirements. Some jurisdictions may require that any data about their residents be stored within the region.
  • The price of electricity in various regions.
  • Susceptibility to disasters such as earthquakes and flooding.
  • Climate also has an impact, with warmer locations generally requiring more energy to cool the hardware.

Utilities

When designing a data center, we have three primary utilities that we need to worry about. It’s easiest to remember them as the three Ps.

Ping (network)

Your data center will need to have a high-speed fiber optic connection that links it up to the internet backbone.

Power (electricity)

Your data center will need sufficient power to run its equipment. Given that data centers use large amounts of power, it is ideal to locate data centers in areas with affordable electricity.

Pipe (HVAC)

To efficiently run your hardware and limit equipment failures, your data center will need to maintain the right temperature and humidity. This is what we consider “pipe”. It includes your air conditioning, heating, ventilation, dehumidifiers, water, etc.

Given that each of these utilities are critical for keeping your service available, you will need to have redundancies for each in place. The more uptime you wish to guarantee your customers, the more elaborate your redundancy plans will need to be.

Internal vs external redundancies

Redundancies can be categorized as internal or external, depending on whether they are inside the server room or outside of it. Things like power distribution units and chillers are viewed as internal redundancies, while a generator is seen as external. You wouldn’t want to run your generator inside and clog the server room with fumes.

BICSI data center standards

When designing data centers, various resources from the Building Industry Consulting Service International (BICSI) are incredibly useful. For taking care of ping, BICSI has a number of cabling standards, such as ANSI/BICSI N1-2019, Installation Practices for Telecommunications and ICT Cabling and Related Cabling Infrastructure and ANSI/BICSI N2-2017, Practices for the Installation of Telecommunications and ICT Cabling Intended to Support Remote Power Applications.

Standards that focus on overall data center design and operations include ANSI/BICSI 002-2019, Data Center Design and Implementation Best Practices, as well as BICSI 009-2019, Data Center Operations and Maintenance Best Practices.


HVAC

HVAC stands for heating, ventilation and air conditioning, each of which are critical for operating a data center smoothly. In cold climates, a data center may need heating. Ventilation is important for dehumidifying and filtering a data center’s air. Air conditioning and other types of cooling are critical for keeping the hardware from overheating, especially in hot places.

The American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) specifies that data centers should maintain the conditions listed in the table below:

  • Recommended air temperature: 18-27°C (64.4-80.6°F)
  • Recommended humidity: 40-60%

Managing a data center’s air appropriately has a number of benefits, such as:

  • Reduces equipment failures because hardware is running within optimal parameters.
  • Increases availability due to fewer failures.
  • More effective cooling means that you can increase power density, which in turn means that you can cram more compute into your data center.
  • Managing air appropriately allows your data center to run at maximum efficiency, reducing overall costs.

Data centers are designed specifically to ensure good air management. The image below shows a typical aisle in a server room. If you take a look at the bottom of the figure, you will see that there’s a raised floor with blue arrows traveling horizontally underneath. Cold air gets pushed out of the cooling system through this subfloor, as indicated by the blue arrows. Above this is a perforated floor, through which the cold air gets pushed out.

Above the subfloor, we have two rows of four server racks. The row in the foreground has its intakes on the left, with the blue arrows coming up from the floor and into the racks to indicate the cool air coming in. The other row of server racks is partially hidden, but to the right of the diagram you can see blue arrows of cool air coming up through the floor and into the intake of the racks, which is at the back. This cool air lowers the temperature of the racks, but the air itself gets heated up in the process. It’s then pushed out the other side of the servers as hot air, which is indicated by the red arrows coming up in between the two rows of servers.

You can see that the center aisle where this hot air is pushed is sealed with glass and a ceiling that separates it from the rest of the data center. The purpose of this is to separate the hot air blasting out of the servers from the cold air coming in to the servers. We don’t want the hot air to be able to recirculate back down into the intakes, because this would hamper the efficiency of the cooling. The hot air is then taken out through this separate ceiling. This process is known as hot aisle containment because the hot aisle (the area where the hot air is pushed out from the servers) is enclosed from the rest of the server room. In this diagram, the hot air gets drawn out through the ceiling, while the rest of the server room is filled with cool air.

The image below compares hot aisle containment, as well as another type, known as cold aisle containment. In the latter, the cold air goes up through the floor into the enclosed cold aisles, where the intakes for the servers are. It comes out the other side, into the server room, as hot air.

Hot aisle containment and cold aisle containment - Destination Certification

Another important concept for data center air management is positive pressurization. This involves pumping the data center with air to keep it slightly above ambient air pressure. This positive pressurization means that if there are cracks in the walls, air flows out rather than in. Likewise, air flows out of the data center whenever someone opens a door. The big advantage of positive pressurization is that it helps to keep the air clean—pushing air out means that the data center isn’t sucking any air in. We really don’t want much external air coming in, because it brings dirt, dust and other debris, which can clog the hardware.

Multi-vendor pathway connectivity

The CCSP exam uses the term multi-vendor pathway connectivity to refer to the concept of having multiple internet service providers (ISPs) for redundancy. The Internet as a whole is incredibly resilient, but ISPs can go down for a variety of reasons, such as technical faults or natural disasters. Having multiple ISPs can help to give your organization more redundancy if one provider goes down.

Design resilient

It’s important for organizations to design their data centers with resiliency in mind.

The NFPA and fire risks

Fire is a major risk to data centers. A lot of electricity pulses through millions of dollars of hardware, and things can and do go wrong. This means that we need our data centers to have measures in place that prevent, detect and correct fires.

The National Fire Protection Association (NFPA) publishes standards that help data centers and other telecommunications organizations address their fire risk. These include NFPA 75, Standard for the Fire Protection of Information Technology Equipment, and NFPA 76, Standard for the Fire Protection of Telecommunication Facilities.

The purpose of NFPA 75 is “…to set forth the minimum requirements for the protection of ITE [information technology equipment] equipment and ITE areas from damage by fire or its associated effects—namely, smoke, corrosion, heat, and water.” NFPA 76 establishes “…a minimum level of fire protection in telecommunications facilities, provide[s] a minimum level of life safety for the occupants, and protect[s] the telecommunication equipment and service continuity.”

Fire detection

There are three major ways that we can detect fires: flame detectors, smoke detectors and heat detectors. Flame detectors are useful in situations where you anticipate almost instantaneous ignition with a limited smoldering stage at the beginning.

The two most common types of smoke detector are ionization detectors and photoelectric detectors. Each of these are suitable when you expect a fire to smolder in the early stages. Heat detectors detect thermal energy. They are useful in small spaces where a rapid change in temperature can be expected from a quickly growing fire.

Fire suppression

Fires require three things: fuel, oxygen and heat. Once a fire begins, it starts a chain reaction producing more heat, which can continue until the fuel, oxygen or heat is suppressed or consumed.

One of the most common ways to extinguish fires is to use water, which can absorb the heat of the fire and extinguish it. However, data centers are coursing with huge amounts of electricity and packed with expensive hardware, neither of which plays well with water. Water conducts electricity, and if equipment gets wet, it generally corrodes and breaks.

Instead, the preferred method is to suppress a fire with a non-combustible gas. Nitrogen, carbon dioxide and argon are all useful gases. Common brands of fire suppressing gases include INERGEN, Argonite, FM-200 and Aero-K. These aren’t seen as overly toxic to humans, but if Argonite becomes concentrated in a room, employees can suffocate.

When sprinkler systems are in place, there are four common types:

  • Wet – Wet pipes are filled with water at all times and they are triggered by heat causing either a fusible link or a glass bulb to break.
  • Dry – Dry pipes aren’t filled with water all the time. Dry pipe sprinklers can be installed in rooms that get below freezing; however, the source of the water must be kept above freezing.
  • Pre-action – Pre-action systems generally involve multiple triggers.
  • Deluge – Deluge systems are like pre-action systems in that they can rely on multiple triggers. The difference is that when pre-action systems are triggered, only the individual sprinklers that have been triggered release water. In contrast, deluge systems release water through all sprinklers once a single sprinkler has been triggered.

The IDCA

The International Data Center Authority (IDCA) is an organization that aims to help the IT industry by developing an open framework for data centers, infrastructure, facilities, IT, IoT, cloud and big data. Its standard is the Infinity Paradigm AE360, which provides a comprehensive approach for streamlining technology strategies, plans, implementations and operations alongside business strategy.

Uptime Institute tier standards

The Uptime Institute is an industry body that’s responsible for developing a global standard in data center performance and availability. The standard is separated into four tiers, each of which specifies the requirements and topologies for data centers that operate at different levels.

Note that N is the amount of power, network and cooling capacity required to run the data center at maximum load, so 2N means double that required capacity.

Tier I – Basic Capacity

  • Description: Site-wide shutdowns for maintenance are still required. Capacity failures may impact the site. Distribution failures will impact the site.
  • Uptime: 99.671%
  • Downtime/year: 28.8 hours
  • Distribution paths: 1
  • Concurrently maintainable: No
  • Fault tolerant: No

Tier II – Redundant Capacity Components

  • Description: Site-wide shutdowns for maintenance are still required. Capacity failures may impact the site. Distribution failures will impact the site.
  • Uptime: 99.749%
  • Downtime/year: 22 hours
  • Distribution paths: 1
  • Concurrently maintainable: No
  • Fault tolerant: No

Tier III – Concurrently Maintainable

  • Description: Each and every capacity component and distribution path in a site can be removed on a planned basis for maintenance or replacement without impacting operations. The site is still exposed to equipment failure or operator error.
  • Uptime: 99.982%
  • Downtime/year: 1.6 hours
  • Distribution paths: 1 active, 1 alternate
  • Concurrently maintainable: Yes
  • Fault tolerant: No

Tier IV – Fault Tolerant

  • Description: An individual equipment failure or distribution path interruption will not impact operations. A fault tolerant site is also concurrently maintainable.
  • Uptime: 99.995%
  • Downtime/year: 26.3 minutes
  • Distribution paths: 2 active
  • Concurrently maintainable: Yes
  • Fault tolerant: Yes

Cloud providers can use the tier levels to help them build and maintain data centers that operate at the desired level. Cloud customers can look for data centers with a tier rating that matches their requirements.
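
The downtime allowances above follow directly from the uptime percentages: multiply the unavailability by the roughly 8,760 hours in a year. A quick sketch of the arithmetic:

```python
# Downtime per year is just the unavailability multiplied by the hours in a
# year (roughly 8,760).
tiers = {"Tier I": 99.671, "Tier II": 99.749, "Tier III": 99.982, "Tier IV": 99.995}

for tier, uptime_percent in tiers.items():
    downtime_hours = (1 - uptime_percent / 100) * 8760
    print(f"{tier}: about {downtime_hours:.1f} hours/year "
          f"(~{downtime_hours * 60:.0f} minutes)")
# Tier I works out to about 28.8 hours and Tier IV to about 26 minutes,
# matching the figures above.
```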

The diagrams below show the basics for how a data center can be configured for the appropriate level of redundancy in each tier.

Note that a PDU is a power distribution unit, and a UPS is an uninterruptible power supply.

Basic site infrastructure - Destination Certification
Redundant site infrastructure - Destination Certification
Concurrently maintainable site infrastructure - Destination Certification
Fault tolerant site infrastructure - Destination Certification

3.3 Analyze risks associated with cloud infrastructure and platforms

We discuss the risks associated with cloud infrastructure and platforms in Domain 6.4, which is about the implications of cloud to enterprise risk management. This means that we won’t be discussing Risk assessments, Cloud vulnerabilities, threats and attacks, or Risk mitigation strategies until later.


3.4 Plan and implementation of security controls

In order to form a robust security posture, an organization must begin by assessing its risks, and then forming a cohesive security policy based on those assessments. Beneath the overarching security policy, it will have many more specific policies for different aspects of the organization’s security, as well as standards, guidelines, baselines and procedures.

At a lower level, we have the individual security controls, such as encryption, role-based access control and security awareness training. However, without a carefully planned security policy based on actual risks, it’s very difficult to get these individual controls to work together in a way that limits the risk of security incidents.

One of the most important security concepts is defense in depth, which the National Institute of Standards and Technology (NIST) defines in SP 800-53 as:

“An information security strategy that integrates people, technology, and operations capabilities to establish variable barriers across multiple layers and missions of the organization.”

Defense in depth involves controls that fall into a variety of categories. These categories are:

  • Administrative
  • Logical or technical
  • Physical

We discuss these categories in more depth in Domain 6.4.

Physical and environmental protection

We discussed physical security measures in the Comprehend cloud infrastructure and platform components section, under the subheading Security of the physical environment. For environmental protections, we briefly mentioned some concerns in the Design a secure data center section under the Location subheading. It’s important to choose the location of a data center with environmental considerations in mind. This includes making judgements based on the risks of earthquakes, hurricanes, fires and other calamities.

System, storage and communication protection

We discuss system security in Domain 5.2. Storage protections were discussed in Domain 3.1. We discuss many network and communication security considerations in Domain 5.2.

Identification, authentication and authorization in cloud environments

We will discuss identification, authentication and authorization in cloud environments as part of Domain 4.7.

Audit mechanisms

We will be discussing audit mechanisms in two different sections. We will discuss how we use auditing and log collection as part of the accounting stage of IAM in Domain 4.7. We will also discuss auditing, correlation and packet capture in Domain 5.6 in the context of managing security operations.


3.5 Plan business continuity (BC) and disaster recovery (DR)

Business continuity (BC) and disaster recovery (DR) plans are critical for ensuring an organization’s resiliency. Things can and will go drastically wrong, and we need to plan for our biggest risks ahead of time.

Business continuity (BC)/disaster recovery (DR) strategy


A disaster is a sudden, unplanned event that brings about great damage or loss. In a business environment, it is any event that creates an inability on an organization’s part to support critical business functions for some predetermined period of time.

Business Continuity Management (BCM)

The business function and processes that provide the structure, policies, procedures, and systems to enable the creation and maintenance of BCP and DRP plans.

Business Continuity Planning (BCP)

Focuses on survival of the business and the capability for an effective response. It is strategic.

Disaster Recovery Planning (DRP)

Focuses on the recovery of vital technology infrastructure and systems. It is tactical.

BCM creates the structure necessary for BCP and DRP. BCP is primarily concerned with the components of the business that are truly critical and essential, while DRP is primarily concerned with the technological components that support critical and essential business functions. BCP focuses on the processes, while DRP focuses on the systems.

Security personnel should be involved in the BCP process from the earliest stages, from defining the scope of the BCP onward. The key BCP/DRP steps are:

1. Develop a contingency planning policy

This is a formal policy that provides the authority and guidance necessary to develop an effective contingency plan.

2. Conduct a business impact analysis (BIA)

Conduct the business impact analysis, which helps identify and prioritize the information systems and the components that are critical to supporting the organization’s mission and business processes.

3. Identify controls

These are the preventative measures taken to reduce the effects of system disruptions. They can increase system availability and reduce contingency life-cycle costs.

4. Create contingency strategies

Thorough recovery strategies ensure that the system may be recovered quickly and effectively following a disruption.

5. Develop contingency plans

Develop an information system contingency plan.

6. Ensure testing, training, and exercises

Thoroughly plan testing, training, and exercises. Testing validates recovery capabilities, whereas training prepares recovery personnel for plan activation, and exercising the plan identifies gaps.

7. Maintenance

Ensure that plan maintenance takes place. The plan should be a living document that is updated regularly to remain current with system enhancements and organizational changes.

Business requirements

RPO, RTO, WRT, and MTD

When dealing with BCP and DRP procedures, there are four key measurements of time to be aware of. These are:

  • Maximum tolerable downtime (MTD) – Maximum tolerable downtime (MTD) (also known as maximum allowable downtime (MAD)) refers to the maximum amount of time that an organization’s critical processes can be impacted.
  • Recovery time objective (RTO) – Recovery time objective (RTO) refers to the amount of time expected to restore services or operations to a defined service level.
  • Recovery point objective (RPO) – Recovery point objective (RPO) refers to the maximum amount of data that can be lost in terms of time.
  • Work recovery time (WRT) – The work recovery time (WRT) is the time needed to verify the integrity of systems and data as they’re being brought back online.

The diagram below helps to show how these measurements of time all fit together. The horizontal axis is time, starting on the left with business as usual. As we progress to the right, a disaster occurs. The first measurement that we see is the RPO, the maximum amount of data loss as a measurement of time. After the disaster has occurred, the next measurement of time is the RTO, the maximum amount of time to restore processes and systems to a defined service level. WRT is the time required to validate systems as they are brought back online and return to business as usual. Finally, the MTD is the maximum amount of time that processes and systems can be down before the business may be forced to cease operations. MTD is the most important measurement of time to consider when making the decision to declare a disaster.

Image of MTD, RPO, RTO and WRT diagram - Destination Certification
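To make the relationship between these time measurements concrete, here's a minimal sketch in Python using purely hypothetical targets (the values are illustrative, not recommendations). It simply checks that the recovery time plus the work recovery time fits within the MTD, reflecting the commonly used relationship MTD = RTO + WRT.

```python
from datetime import timedelta

# Hypothetical targets for a single critical process (illustrative values only)
mtd = timedelta(hours=24)   # maximum tolerable downtime agreed with the business
rto = timedelta(hours=16)   # time to restore systems to the defined service level
wrt = timedelta(hours=6)    # time to verify system and data integrity before resuming work
rpo = timedelta(hours=4)    # maximum tolerable data loss, measured in time

# The recovery strategy only works if restoration plus verification fits inside the MTD.
if rto + wrt <= mtd:
    print(f"OK: RTO ({rto}) + WRT ({wrt}) = {rto + wrt} fits within the MTD ({mtd})")
else:
    print(f"Gap: RTO + WRT = {rto + wrt} exceeds the MTD ({mtd}); revisit the strategy")

# RPO sits outside the downtime budget: it drives how often backups or replication must run.
print(f"Backups/replication must run at least every {rpo} to meet the RPO")
```

If the check fails, you either invest in faster recovery (lowering RTO and WRT) or convince the business to accept a longer MTD.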

Creation, implementation and testing of plan

Business impact analysis (BIA)

A business impact analysis (BIA) is one of the most important steps in business continuity planning. Its purpose is to assess the potential consequences that a disaster or a disruption would have on business processes. A BIA should then gather information to develop recovery strategies for each critical function and process. The output of a BIA includes key measurements of time: RPO, RTO, WRT, and MTD.

The BIA Process

Identifying and assigning values to an organization’s most critical and essential functions and assets is the first step in determining which processes to prioritize in recovery efforts. Employees from various departments should be involved in the process so they can provide insight into critical systems and services.

Once asset values have been determined, and priorities have been established, an organization can set up processes to protect the most important assets. The steps of the BIA process are:

  1. Determine the business processes and recovery criticality.
  2. Identify resource requirements.
  3. Identify recovery priorities for system resources.
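To illustrate how BIA outputs might feed recovery priorities, here's a small, hypothetical sketch in Python. The process names, impact scores and MTD values are made up for illustration only; a real BIA would gather these through interviews and analysis across departments.

```python
from dataclasses import dataclass

@dataclass
class BusinessProcess:
    name: str
    impact_score: int      # relative impact of an outage (1 = low, 5 = severe) -- illustrative scale
    mtd_hours: int         # maximum tolerable downtime gathered during the BIA
    resources: list[str]   # systems/resources the process depends on

# Hypothetical BIA results (illustrative only)
processes = [
    BusinessProcess("Order processing", impact_score=5, mtd_hours=8,   resources=["web tier", "orders DB"]),
    BusinessProcess("Payroll",          impact_score=4, mtd_hours=72,  resources=["HR app", "HR DB"]),
    BusinessProcess("Internal wiki",    impact_score=2, mtd_hours=168, resources=["wiki VM"]),
]

# Highest impact and shortest MTD recover first.
recovery_order = sorted(processes, key=lambda p: (-p.impact_score, p.mtd_hours))

for rank, proc in enumerate(recovery_order, start=1):
    print(f"{rank}. {proc.name} (impact {proc.impact_score}, MTD {proc.mtd_hours}h) "
          f"-> required resources: {', '.join(proc.resources)}")
```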

Disaster response process

The incident response process should be followed prior to a disaster being declared. Once an incident is identified, an assessment of its severity must be made. During the assessment of an incident, one specific variable should be carefully considered—MTD. If it’s clear that the MTD will be exceeded, a disaster should be declared and the disaster recovery plan immediately initiated.

Dependency Charts

When recovering from a disaster, dependency charts, like the one shown below, can map out exactly which components are required and even their initiation order.

example of dependency chart - Destination Certification
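Because a dependency chart is essentially a directed graph, a topological sort gives you a restoration order that respects those dependencies. The sketch below uses Python's standard-library graphlib with a hypothetical set of components; your own chart would obviously differ.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency chart: each component maps to the components it depends on.
dependencies = {
    "network": [],
    "storage": ["network"],
    "database": ["storage", "network"],
    "app servers": ["database"],
    "load balancer": ["app servers"],
    "public website": ["load balancer"],
}

# A topological order is a valid restoration/initiation order that respects the dependencies.
restoration_order = list(TopologicalSorter(dependencies).static_order())
print("Restore in this order:", " -> ".join(restoration_order))
```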

Options for cloud recovery

The cloud can be used for data recovery in multiple ways (see the sketch after this list):

  • The primary copy can be stored on premises, while the backup is in the cloud.
  • The primary copy can be stored in the cloud, with the backup stored in the same cloud.
  • The primary copy can be stored in the cloud, with the backup stored in a separate provider’s cloud.
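For the first option above (primary copy on premises, backup in the cloud), the backup job can be as simple as pushing an encrypted archive to object storage. Here's a minimal sketch assuming AWS S3 via boto3; the bucket name, object key and file path are hypothetical, and you'd need valid AWS credentials configured.

```python
import boto3

# Hypothetical values: replace with your own bucket, key and backup archive path.
BUCKET = "example-dr-backups"
ARCHIVE = "/backups/nightly-2024-09-03.tar.gz"
KEY = "on-prem/nightly-2024-09-03.tar.gz"

s3 = boto3.client("s3")

# Server-side encryption keeps the backup confidential at rest in the provider's cloud.
s3.upload_file(
    Filename=ARCHIVE,
    Bucket=BUCKET,
    Key=KEY,
    ExtraArgs={"ServerSideEncryption": "AES256"},
)
print(f"Backup uploaded to s3://{BUCKET}/{KEY}")
```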

Failover architecture

Most organizations aim for more than just keeping their data safe—they also want to keep their services online. To achieve this, they need failover architecture that can take over automatically. When it is set up correctly, the service can quickly switch over to the backup architecture if the primary one goes down, as shown in the following diagrams.

Failover architecture with the primary architecture running - Destination Certification
Failover architecture after the primary architecture has failed - Destination Certification
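Here's a minimal sketch of the failover idea shown in the diagrams above, assuming two hypothetical endpoints and a simple HTTP health check. In practice you'd normally rely on load balancers, DNS failover or your cloud provider's managed failover services rather than hand-rolled application code, so treat this purely as an illustration of the concept.

```python
import urllib.request

# Hypothetical endpoints: a primary site and a standby that can take over.
PRIMARY = "https://primary.example.com/health"
STANDBY = "https://standby.example.com/health"

def is_healthy(url: str, timeout: float = 3.0) -> bool:
    """Return True if the endpoint answers its health check with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers connection errors, timeouts and HTTP errors
        return False

# Route traffic to the primary while it is healthy; fail over to the standby otherwise.
active = PRIMARY if is_healthy(PRIMARY) else STANDBY
print(f"Routing traffic to: {active}")
```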

Chaos engineering

Tools like Chaos Mesh bring fault injection to Kubernetes, allowing you to test how your workloads behave across a range of failure scenarios. You can use these simulated faults to design a more robust architecture: if something fails in the simulation, you can plan for that eventuality before it happens in production.
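As a rough illustration, a Chaos Mesh experiment is defined as a Kubernetes custom resource. The sketch below assembles a simple pod-kill experiment as a Python dictionary and submits it with the official Kubernetes client; the namespace, labels and field values are assumptions based on Chaos Mesh's documented PodChaos resource, so verify them against the version you're running.

```python
from kubernetes import client, config

# PodChaos custom resource (field names assumed from Chaos Mesh's PodChaos documentation;
# the namespaces and label selector below are hypothetical).
pod_kill_experiment = {
    "apiVersion": "chaos-mesh.org/v1alpha1",
    "kind": "PodChaos",
    "metadata": {"name": "kill-one-checkout-pod", "namespace": "chaos-testing"},
    "spec": {
        "action": "pod-kill",
        "mode": "one",  # kill a single randomly selected pod matching the selector
        "selector": {
            "namespaces": ["shop"],
            "labelSelectors": {"app": "checkout"},
        },
    },
}

# Submit the experiment through the Kubernetes API and observe how the service copes.
config.load_kube_config()
client.CustomObjectsApi().create_namespaced_custom_object(
    group="chaos-mesh.org",
    version="v1alpha1",
    namespace="chaos-testing",
    plural="podchaos",
    body=pod_kill_experiment,
)
print("PodChaos experiment submitted; watch whether the service stays within its availability targets.")
```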

Test disaster recovery plans (DRP)

BCP and DRP Testing

After recovery plans have been created, it’s important to test them. Tests can range from simple to complex, with each type having its own value. Some of the most common tests are:

| Type | Description | Affects backup/parallel systems | Affects production systems |
|---|---|---|---|
| Read-through/checklist | Involves reviewing the DR plan against a standard checklist for missing components and completeness. | No | No |
| Walk-through | Relevant stakeholders walk through the plan and provide their input based on their expertise. | No | No |
| Simulation | Involves following a plan based on a simulated disaster scenario. It stops short of affecting systems or data. | No | No |
| Parallel | Involves testing the DR plan on parallel systems. | Yes | No |
| Full-interruption/full-scale | Involves production systems, which makes these tests the most valuable, but also the most risky. | Yes | Yes |

Goals of business continuity management (BCM)

The three primary goals of business continuity management (BCM) are simple:

  1. Safety of people.
  2. Minimization of damage.
  3. Survival of business.

CCSP Domain 3 key takeaways

Note: Some concepts in Domain 3 are explored in greater detail in other CCSP domains. These topics are referenced in their relevant sections above but are not included in the Key Takeaways for Domain 3. For comprehensive coverage, refer to the indicated domains where these concepts are primary focuses.

3.1 Comprehend cloud infrastructure and platform components

Physical environment

  • Physical environments are the data centers, server rooms and other locations that host hardware.

Security of the physical environment

  • Defense in depth is an approach to defense where there are multiple layers of security controls.
  • Common physical security controls include:
  • Guards
  • CCTV
  • Motion detectors
  • Lighting
  • Fences
  • Door locks
  • Mantraps
  • Turnstiles

Networking and communications

  • Service networks are customer facing. Virtual machines use the service network to send data across the network.
  • Storage networks connect virtual machines to storage clusters.
  • Cloud providers use management networks to administer their clouds.
  • In non-converged networks, the service, storage and management networks are separate.
  • A converged network combines the storage and service network, while the management network remains separate.

Zero trust architecture

  • Zero trust architecture involves continually evaluating the trust of entities on the network.
  • Shrinking trust zones, microsegmentation, granular access controls and re-authenticating users when appropriate are all key elements of zero trust.

Virtual local area networks (VLANs)

  • VLANs are more flexible than LANs.
  • Logically isolating networks through VLANs can enhance security at a relatively low cost.

Software-defined networks (SDNs)

  • SDNs allow us to create virtual, software-controlled networks on top of physical networks.
  • SDNs can decouple the control plane from the data plane.
  • SDNs can provide more flexibility and allow for rapid network reconfiguration.
  • SDNs are critical components of cloud computing’s resource-pooling characteristic.

The security advantages of software-defined networks (SDNs)

  • SDNs make microsegmentation easy and cheap.
  • They allow granular and dynamic firewall rules.
  • SDNs also allow extensive compartmentalization.

The security challenges of cloud networking

  • Virtual appliances can form bottlenecks on a network.
  • Monitoring can be more difficult on cloud networks.

Additional network security considerations in the cloud

  • Providers are responsible for enforcing isolation and segregation in addition to many other things.
  • Responsibilities for customers vary depending on the cloud model, but in IaaS they are responsible for deploying virtual firewalls.

Compute

  • Compute nodes provide the resources to VMs.
  • Cloud providers are responsible for patching and configuring hypervisors, enforcing logical isolation, and more.
  • Customer responsibilities vary according to the service model.

Virtualization

  • Virtualization involves a layer of abstraction on top of the hardware.
  • In the cloud, we commonly use virtualization to make our underlying compute, storage and network resources more flexible.

Virtual machines (VMs)

  • A virtual machine is also known as a guest or an instance. It runs on top of the hypervisor, which tricks the VM’s operating system into believing it is running directly on physical hardware.
  • The underlying physical computer is known as a host.

Hypervisors

  • Type 1 hypervisors run directly on top of hardware.
  • Type 2 hypervisors have an operating system sitting in between the hardware and the hypervisor.

Containers

  • Containers are highly portable code execution environments that can be very efficient to run.
  • Containers are lighter and more flexible than VMs.
  • Containers are deployed by taking a container image from the registry and running it with a container engine like Docker.

Microservices

  • Microservices are small self-contained units with their own interfaces.
  • Many apps are broken down into loosely coupled microservices.

Serverless computing

  • Serverless is a computing model that includes function-as-a-service products like AWS Lambda, as well as serverless relational databases like Amazon Aurora Serverless.
  • Under the serverless model, customers generally pay for services that have been triggered by use.

Function as a service (FaaS)

  • FaaS services like AWS Lambda are used to run specific code functions, with the customer only paying when a function is triggered.

Storage

  • Long-term storage – Cheap and slow.
  • Ephemeral storage – Temporary storage that only lasts until the virtual environment is shut down.
  • Raw storage – High performance storage that allows your virtual machine to directly access the storage device via a mapping file as a proxy.
  • Object storage – Data is stored as objects, which are basically just collections of bits with an identifier and metadata.
  • Volume storage – Volume storage is like a virtualized version of a physical hard drive.

Storage clusters

  • Storage clusters are groups of hard drives connected together.
  • The two major architectures are tightly coupled and loosely coupled.

Orchestration

  • Orchestration provides the configuration and coordination management that are necessary to keep the many cloud components running smoothly.

Management plane

  • The management plane is the system that controls all other systems in the cloud.
  • It has a huge amount of access and control, so it must be tightly secured.

3.2 Design a secure data center

Logical design

  • Tenant partitioning can be done logically by using software to prevent customers from being able to access each other’s systems.
  • Access controls play a critical role in ensuring that only authorized entities are granted access to resources.

Physical design

  • Buy or build?
  • Buying a data center is quick and easy, but it involves high CapEx and reasonably low OpEx. However, it won’t run as efficiently as a data center designed specifically for your organization’s needs.
  • Leasing is a low CapEx, high OpEx option, but again, it won’t be overly efficient.
  • Building a data center is time consuming and involves a lot of CapEx, but it ultimately results in a purpose-built data center that is efficient.

Utilities

  • Ping is your network connection. Data centers need high-speed network connections with adequate redundancy.
  • Power is your electricity. Data centers will ideally be located in a place with affordable electricity, with sufficient backup power for emergencies.
  • Pipe is your HVAC. Data centers need to have sufficient air conditioning, ventilation, dehumidifiers, heating, etc. They also need redundancies in place.

Environmental design

  • Keeping your servers at the right temperature and humidity is critical for smooth operation and minimizing failures.

Multi-vendor pathway connectivity

  • Multi-vendor pathway connectivity is important for redundancy.

Design resilient

  • The NFPA and fire risks
  • The NFPA publishes standards that help data centers address their fire risk.

Fire detection

  • Flame detectors optically detect flames: UV detectors sense the ultraviolet radiation a flame gives off, while IR flame detectors sense its infrared radiation.
  • Flame detectors are suitable for situations where you expect almost instantaneous ignition.
  • The most common smoke detectors are ionization detectors and photoelectric detectors.
  • Both are suitable for when you expect the fire to smolder for a while in the early stages.
  • Heat detectors detect thermal energy.

Fire suppression

  • Fires require fuel, oxygen and heat, and can be extinguished by removing any one of these elements.
  • Non-combustible gases are ideal for extinguishing fires in data centers because they don’t involve water, which would ruin the equipment.
  • Sprinkler systems can be:
  • Wet
  • Dry
  • Pre-action
  • Deluge

The IDCA

  • The IDCA’s Infinity Paradigm AE360 is an open framework for data centers and similar facilities.

Uptime Institute tier standards

  • The Uptime Institute’s tiers range from I: Basic Capacity to IV: Fault Tolerant.

3.3 Analyze risks associated with cloud infrastructure and platforms

  • These topics are covered in greater detail in other domains; see section 3.3 above for the relevant cross-references.

3.4 Plan and implementation of security controls

  • These topics are covered in greater detail in other domains; see section 3.4 above for the relevant cross-references.

3.5 Plan business continuity (BC) and disaster recovery (DR)

Business continuity (BC)/disaster recovery (DR) strategy

  • A disaster is an event that interrupts normal business operations.
  • BCM is the process and function by which an organization is responsible for creating, maintaining, and testing BCP and DRP plans.
  • BCP focuses on survival of the business processes when something unexpected impacts it.
  • DRP focuses on the recovery of vital technology infrastructure and systems.

Business requirements

  • RPO, RTO, WRT, and MTD/MAD are all measurements of time.
  • RPO – The maximum tolerable amount of data loss measured in time.
  • RTO – The maximum tolerable time to recover systems to a defined service level
  • WRT – The maximum tolerable time to verify system and data integrity as part of resumption of normal operations.
  • MTD/MAD – The maximum time that a critical system, function, or process can be disrupted before it leads to unacceptable or irrecoverable consequences for a business.

Creation, implementation and testing of plan

  • Business impact analysis (BIA) process identifies:
  • The most critical business functions, processes, and systems.
  • The potential impacts of an interruption that results from a disaster.
  • The key measurements of time (RPO, RTO, WRT, and MTD) for each critical function, process, and system.

Disaster response process

  • Disaster response should include all relevant personnel and resources so that the organization can quickly respond to the situation and restore normal operations.
  • Disaster response team personnel should include stakeholders from throughout the organization.

Restoration order

  • The BIA determines restoration order when recovering systems—the most important and critical systems should be recovered first.
  • Dependency charts and mapping can help inform system restoration order.

Failover architecture

  • Failover architecture allows you to switch over to a backup system if your primary architecture goes down.

Chaos engineering

  • Chaos engineering allows you to simulate a range of fault scenarios.
  • You can use the information you learn from these scenarios to make your systems more robust.

BCP and DRP testing

  • DRP testing is a critical component of plan creation and development.
  • DRP tests include: read-through/checklist, walk-through, simulation, parallel, full-interruption/full-scale.
  • A full-interruption test should only be performed after management approval has been obtained.

Goals of business continuity management (BCM)

  • BCP and DRP = Business Continuity Management (BCM).
  • BCM includes three primary goals: safety of people, minimization of damage, survival of the business.
  • The number one goal of BCM is safety of people.

Preparing for CCSP Domain 3 exam

So, you've built your cloud fortress in theory—now it's time to prove you can do it in practice. The CCSP Domain 3 exam is your chance to showcase your cloud platform and infrastructure security expertise. But don't worry, we've got your back. We'll walk you through what to expect on the exam and arm you with the best resources to ace it.

CCSP Domain 3 exam expectations: What you need to know

Important reminder: The topics below are critical for CCSP Domain 3, but they're just the tip of the iceberg. Cloud platforms and infrastructure are the bedrock of cloud security—every concept matters. While we've highlighted likely exam topics, be prepared for questions that test your holistic understanding.

Remember, in real-world scenarios, a gap in your knowledge could compromise your entire cloud security strategy. Dive deep, connect the dots, and build a comprehensive understanding of how these components work together to create robust cloud security architecture.

3.1 Comprehend cloud infrastructure and platform components

  • The common security controls for physical environments.
  • The difference between service, storage and management networks.
  • The difference between converged and non-converged networks.
  • The fundamental concepts of zero trust architectures.
  • The seven tenets of ZTA.
  • How VLANs can be used to form trust zones.
  • The limitations of VLANs.
  • The key benefits of VLANs.
  • The difference between the control plane and the data plane.
  • The benefits that come from separating and centralizing the control plane.
  • What is microsegmentation?
  • The benefits of microsegmentation.
  • The security responsibilities of cloud providers and cloud customers.
  • What is a virtual machine?
  • The difference between type 1 and type 2 hypervisors.
  • Which type of hypervisor is more efficient and secure?
  • The important security measures for hypervisors.
  • The core components that make up containers and how they work.
  • Important security measures for containers.
  • What is serverless computing?
  • The differences between the various types of storage.
  • The threats to storage types.
  • The different protocols used to connect storage.
  • The difference between tightly coupled and loosely coupled storage.
  • The management plane capabilities.
  • What is orchestration and scheduling?

3.2 Design a secure data center

  • The important data center design considerations.
  • What is tenant partitioning?
  • The pros and cons of buying, leasing and building.
  • The various location considerations.
  • The difference between internal and external redundancies.
  • What are the BICSI data center standards used for?
  • The recommended air temperature and humidity for data centers.
  • The difference between hot aisle containment and cold aisle containment.
  • What is multi-vendor pathway connectivity and why do organizations need it?
  • What are the NFPA standards?
  • Know the different types of fire detectors and their pros and cons.
  • The gases used for fire suppression.
  • The different types of sprinkler systems as well as their pros and cons.
  • What is the IDCA Infinity Paradigm AE360 standard?
  • The Uptime Institute’s different tier levels.

3.3 Analyze risks associated with cloud infrastructure and platforms

  • We discuss these topics in the other domains.

3.4 Plan and implementation of security controls

  • We discuss these topics in the other domains.

3.5 Plan business continuity (BC) and disaster recovery (DR)

  • BCP vs. DRP.
  • The steps in the BCP/DRP process.
  • Definitions of RPO, RTO, WRT, and MTD.
  • Cost implications of RPO and RTO.
  • MTD = RTO + WRT.
  • How to reduce the cost of BCP and DRP plans.
  • The purpose of the BIA process.
  • The steps of the BIA process.
  • Understand the role maximum tolerable downtime (MTD) plays in the declaration of a disaster.
  • Understand what constitutes a disaster.
  • Understand how a dependency chart informs the restoration order of systems.
  • Know the order of DRP testing and which test is least and most impactful.
  • Understand the three goals of business continuity management (BCM).

Resource recommendations

Mastering CCSP Domain 3 requires a multifaceted approach. Like building a robust cloud infrastructure, your study strategy should have multiple layers of defense against knowledge gaps.

Let's explore some key resources that can fortify your understanding of cloud platform and infrastructure security:

  • Destination Certification CCSP MasterClass: This comprehensive program offers a structured approach to mastering Domain 3. Its adaptive learning system and expert-led Q&A sessions are particularly valuable for navigating the complex infrastructure and platform security concepts, ensuring you build a solid foundation in cloud security architecture.
  • Destination CCSP: The Comprehensive Guide: Our guide excels at breaking down the intricate technical aspects of cloud platforms and infrastructure. Its innovative diagrams and real-world examples are especially helpful for visualizing complex Domain 3 concepts like secure data center design and cloud architecture.
  • Destination Certification CCSP App: For studying on the go, this app is invaluable. Its flashcards and practice questions are particularly useful for reinforcing your knowledge of Domain 3's technical terminology and key principles of cloud infrastructure security.

FAQs

How much of the overall CCSP exam does Domain 3 represent?

Domain 3 (Cloud Platform and Infrastructure Security) makes up 17% of the CCSP exam questions, while Domain 2 (Cloud Data Security) accounts for 20%.

Are there any prerequisites I should cover before tackling Domain 3 of the CCSP exam?

While there are no formal prerequisites for Domain 3, a solid understanding of general IT and network security principles will be beneficial.

Is hands-on experience with cloud platforms necessary to pass the Domain 3 section of the CCSP exam?

Although hands-on experience is not strictly required, practical knowledge of cloud platforms can significantly help in understanding and applying the concepts tested in Domain 3.

Elevate Your Cloud Security Expertise with Destination Certification

Domain 3 is the cornerstone of cloud security, demanding a robust understanding of cloud platforms and infrastructure. It challenges you to think like a cloud architect while maintaining the mindset of a security professional—a delicate balance that Destination Certification helps you master.

Our CCSP MasterClass is engineered to demystify the complexities of Domain 3 and beyond. We don't just provide information; we help you build a comprehensive mental model of cloud security. Our adaptive learning system identifies and strengthens your weak points, ensuring you're well-prepared for every aspect of cloud platform and infrastructure security.

With real-world scenarios and expert-led discussions, we equip you with practical knowledge applicable to your career. Our weekly live Q&A sessions give you direct access to CCSP experts, allowing you to dive deep into the nuances of secure cloud architecture and design.

Ready to transform your understanding of cloud platform and infrastructure security? Join our CCSP MasterClass today and experience the Destination Certification advantage. We're not just preparing you for an exam—we're equipping you to become a leader in cloud security.

Image of Rob Witcher - Destination Certification

Rob is the driving force behind the success of the Destination Certification CISSP program, leveraging over 15 years of security, privacy, and cloud assurance expertise. As a seasoned leader, he has guided numerous companies through high-profile security breaches and managed the development of multi-year security strategies. With a passion for education, Rob has delivered hundreds of globally acclaimed CCSP, CISSP, and ISACA classes, combining entertaining delivery with profound insights for exam success. You can reach out to Rob on LinkedIn.
