CCSP Domain 2 - Data Discovery, Asset Classification & Log Review & Analysis MindMap
Transcript
Introduction
Hey, I’m Rob Witcher from Destination Certification, and I’m here to help you pass the CCSP exam. We are going to go through a review of the major topics related to data discovery, classification, and log review in Domain 2, to understand how they interrelate, and to guide your studies.

This is the fourth of five videos for Domain 2. I have included links to the other MindMap videos in the description below. These MindMaps are a small part of our complete CCSP MasterClass.
Data Discovery
Data discovery is the process of identifying, locating, and cataloging data across an organization’s various systems, storage locations, and applications.
Data discovery helps organizations understand what data they have, where it resides, and how it is being used, all of which are essential for data security, regulatory compliance, and efficient data management.
Data discovery is exactly what it sounds like: organizations discovering what data they have.
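To make this concrete, here's a minimal sketch of what an automated discovery pass might look like: walking a directory tree and cataloging what files exist, where they reside, and how big they are. This is an illustration, not any particular vendor's tool, and the starting path is hypothetical.

```python
import os

# Minimal data discovery sketch: walk a directory tree and catalog
# what data exists, where it resides, and its basic attributes.
def discover(root):
    catalog = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            catalog.append({
                "location": path,
                "type": os.path.splitext(name)[1] or "unknown",
                "size_bytes": os.path.getsize(path),
            })
    return catalog

# Hypothetical starting point; point this at a real share or bucket mount.
for entry in discover("/data"):
    print(entry)
```

Real discovery tools do the same kind of cataloging across databases, object storage, and SaaS applications, not just file shares.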
Types of data
Let's now define two major types of data:
Structured
Structured data is highly organized and formatted so that it can be easily stored, accessed, and processed. It typically resides in relational databases, spreadsheets, or other structured formats that adhere to a specific schema–rows and columns of data.
Unstructured
Unstructured data is the opposite. It lacks a predefined structure, making it more difficult to store and analyze using traditional database tools. Unstructured data often consists of text, pictures, videos, or other content that does not fit into neat rows and columns.
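A quick illustration of the difference, using a made-up customer record: the same information represented as a structured row versus an unstructured free-text note.

```python
# The same customer information, structured vs. unstructured.

# Structured: fixed fields (a schema), trivially stored and queried.
structured = {"customer_id": 1042, "name": "A. Jones", "city": "Denver"}

# Unstructured: free text; extracting the city requires parsing or NLP.
unstructured = "Spoke with A. Jones today. She's moving to Denver next month."

print(structured["city"])  # direct lookup against the schema
```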
Metadata
Metadata is data about other data. Metadata helps describe the structure, origin, usage, and characteristics of a dataset, making it easier to manage and discover. Metadata is stuff like: when a file was created, last modified, and by whom.
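For example, the filesystem already tracks this kind of metadata for every file. A minimal Python sketch:

```python
import os
import time

# Metadata the filesystem keeps about a file: data about the data.
stat = os.stat(__file__)  # stat this script itself so the example runs anywhere
print("size (bytes):", stat.st_size)
print("modified:    ", time.ctime(stat.st_mtime))
# Note: st_ctime is creation time on Windows but inode-change time on Unix.
print("created/changed:", time.ctime(stat.st_ctime))
```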
Asset Classification
Moving on to the next major subject. Asset classification is fundamentally about ensuring that assets receive the appropriate level of protection.
What is an asset? Anything of value to the organization: people, buildings, equipment, software, data, and intellectual property are all assets, among many others.
In security, we often just speak of data classification. We should be talking about asset classification, which encompasses data classification and clearly implies that we should be classifying all the assets of the organization and protecting them appropriately.
Asset Inventory
The first step in the asset classification process is creating and maintaining an asset inventory–a catalog, a listing, of all the assets from across the organization.
Assign Ownership
For every single asset there should be a clearly defined owner. It is critical to determine who the asset owner is as the owner is accountable for the protection of an asset. The owner is best positioned to determine how valuable an asset is to the organization, and thus what classification the asset should be assigned.
Classify
As I already mentioned, and want to emphasize here, the reason we classify assets is so that we can identify how valuable they are to the organization and therefore the appropriate level of protection required.
Classification
Classification is a system of classes, ordered according to value. For example, public, proprietary and confidential could be the three classes that an organization defines with public being the least valuable, and confidential being the most. Different organizations will choose different classes based on whatever best suits their needs. So don’t memorize any particular classification scheme as they vary significantly from organization to organization.
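Tying the inventory, ownership, and classification steps together, here's a hypothetical sketch: an inventory where every asset has an accountable owner and a class drawn from an ordered scheme. All names and classes are made up.

```python
from dataclasses import dataclass
from enum import IntEnum

# Hypothetical classification scheme, ordered by value to the organization.
class Classification(IntEnum):
    PUBLIC = 1
    PROPRIETARY = 2
    CONFIDENTIAL = 3

@dataclass
class Asset:
    name: str
    owner: str          # the person accountable for protecting the asset
    classification: Classification

inventory = [
    Asset("marketing-site", "web team lead", Classification.PUBLIC),
    Asset("customer-db", "head of sales", Classification.CONFIDENTIAL),
]

# The ordering lets us ask "what needs the most protection?"
for asset in sorted(inventory, key=lambda a: a.classification, reverse=True):
    print(asset.name, asset.classification.name)
```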
Labeling
Security labels are the means used to associate a set of security attributes with a specific information object as part of the data structure for that object. In other words, labels are meant to be read by the system so it can understand the classification of data and therefore the protection required.
Marking
Security marking is very similar: the means used to associate a set of security attributes with objects in a human-readable form. In other words, markings are meant to be read by people to understand the classification of data and therefore the protection required.
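To illustrate the label/marking distinction with a hypothetical object: the label is structured data a system can parse, while the marking renders the same attributes for human eyes.

```python
import json

# Label: machine-readable security attributes attached to an object,
# e.g., stored as object tags or extended attributes.
label = {"classification": "CONFIDENTIAL", "owner": "head of sales"}
machine_readable = json.dumps(label)

# Marking: the same attributes rendered for people, e.g., a document header.
human_readable = f"*** {label['classification']} - handle per data policy ***"

print(machine_readable)
print(human_readable)
```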
Categorization
The final major piece here is categorization, which is the act of sorting assets into the defined classes.
Sensitive Data
I’ll give you an extra couple of definitions here: Sensitive data refers to any information that must be protected due to its potential for misuse or harm if exposed. This covers personal information, financial details, or any other confidential information.
Personal Data
Personal data is a subset of sensitive data. Personal data is any information relating to a person–a data subject–which can include names, addresses, email addresses, or even IP addresses.
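Discovery tools commonly flag sensitive and personal data by pattern matching. Here's a naive sketch for two of the examples above, email and IP addresses; real tools use far more robust detection, so treat these patterns as illustrative only.

```python
import re

# Deliberately naive patterns for two kinds of personal data.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

def find_personal_data(text):
    return {kind: rx.findall(text) for kind, rx in PATTERNS.items()}

sample = "Contact jane@example.com; last login from 203.0.113.7."
print(find_personal_data(sample))
# {'email': ['jane@example.com'], 'ipv4': ['203.0.113.7']}
```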
Log Review & Analysis
Moving on again to the final major subject of this MindMap: log review and analysis. This involves logging events from multiple systems, then aggregating and analyzing the data. Logging and monitoring is an important part of security.
Where can we collect logging data from across the cloud? The answer is essentially everywhere. Almost every system can generate log event data:
- Virtual and physical network devices like firewalls, routers, and switches
- IDS and IPS - Intrusion detection and intrusion prevention systems
- Virtual machines, containers, serverless functions
- Operating systems
- Applications
- Anti-malware
- Etc.
We must be selective though. Many systems are capable of generating an avalanche of event data, so we need to configure systems to only log what is relevant.
We also need the capability to review all the log event data that is being generated, ideally in as close to real time as possible. It's not great to review your logs and realize you've had a significant breach months after it occurred.
There are also significant cost considerations for logging and monitoring in the cloud. Remember, you pay for what you use in the cloud. Every log event that is generated, stored, and replicated to the SIEM system incurs costs. Generating mountains of log event data and consuming significant compute time to analyze all the events can get expensive very fast. It’s food for thought.
Monitor for
What specifically are we monitoring for in the log event data?
Errors
Errors–if we see, for example, that our web server is generating many 404 (file not found) errors, this is a clear indication that something is broken on the web server and needs to be fixed.
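As a tiny illustration, spotting that pattern in web server access logs might look like this (the log lines are made up):

```python
# Count 404 responses in web server access log lines (hypothetical data).
log_lines = [
    '203.0.113.9 - - [10/Oct/2025:13:55:36] "GET /index.html HTTP/1.1" 200 2326',
    '203.0.113.9 - - [10/Oct/2025:13:55:40] "GET /old-page HTTP/1.1" 404 153',
    '198.51.100.2 - - [10/Oct/2025:13:56:01] "GET /old-page HTTP/1.1" 404 153',
]

not_found = [line for line in log_lines if '" 404 ' in line]
print(f"{len(not_found)} of {len(log_lines)} requests returned 404")
```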
Modification
Modifications–but more specifically, unauthorized modifications. It is not uncommon for attackers to exploit a vulnerability to break into a system, and then patch that vulnerability behind themselves–after they have installed something like a backdoor. Therefore, unauthorized patching of a system may be an indication of a breach, and is worth looking for.
Breaches
Of course, from a security perspective, one of the main things we are monitoring for is whether any of our systems have been breached: Are systems being used for cryptocurrency mining? Is data exfiltration occurring? Are we about to have a really bad time with ransomware?
Logging & Monitoring Service (SIEM systems)
As I mentioned, one of the major challenges is the plethora of devices and systems that can generate log data across the cloud, and the volume of event data that they can produce. Finding significant events is very much like looking for a needle in a haystack. Accordingly, we need to use systems that automate many of the tasks and much of the analysis required for logging and monitoring.
These systems are commonly referred to as SIEMs–security information and event management systems.
I’ll now go through the major capabilities of any good SIEM system.
Aggregation
Aggregation is a SIEM system's ability to aggregate, or collect, event data from across the environment in one central system.
Normalization
Next, the SIEM system will normalize the data: clean up the event data from disparate devices so that all the data and the variables are comparable.
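For instance, a firewall and an application might report the same kind of event in completely different formats; normalization maps both into one common schema. The formats and field names below are hypothetical:

```python
# Normalize two disparate log formats into one common schema.
def normalize_firewall(raw):
    # e.g. "2025-10-10T13:55:36Z DENY src=203.0.113.9"
    ts, action, src = raw.split()
    return {"time": ts,
            "event": "denied" if action == "DENY" else "allowed",
            "source": src.split("=", 1)[1]}

def normalize_app(raw):
    # e.g. an application's JSON event with its own field names
    return {"time": raw["when"], "event": raw["outcome"],
            "source": raw["client_ip"]}

events = [
    normalize_firewall("2025-10-10T13:55:36Z DENY src=203.0.113.9"),
    normalize_app({"when": "2025-10-10T13:55:37Z", "outcome": "denied",
                   "client_ip": "203.0.113.9"}),
]
print(events)  # identical field names and values, so the data is comparable
```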
Correlation
Correlation is analyzing incoming logs from multiple systems for logical sequences, patterns and values to identify events that are invisible to individual systems.
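A classic correlation example: several failed logins followed by a success for the same account, a pattern that no single log entry reveals on its own. A minimal sketch over already-normalized events (field names are assumptions):

```python
# Flag accounts where several failed logins are followed by a success.
def correlate_brute_force(events, threshold=3):
    failures = {}
    alerts = []
    for e in events:  # events assumed sorted by time
        if e["event"] == "login_failure":
            failures[e["user"]] = failures.get(e["user"], 0) + 1
        elif e["event"] == "login_success":
            if failures.get(e["user"], 0) >= threshold:
                alerts.append(f"possible brute force against {e['user']}")
            failures[e["user"]] = 0
    return alerts

events = (
    [{"event": "login_failure", "user": "admin"}] * 4
    + [{"event": "login_success", "user": "admin"}]
)
print(correlate_brute_force(events))  # ['possible brute force against admin']
```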
Secure Storage
SIEM systems are often used to provide the long-term secure storage of log event data. Individual devices may have limited storage capabilities, so by replicating and aggregating the log event data in the SIEM system, it can provide long-term secure storage to meet regulatory requirements, company policies, etc.
Analysis
Analysis is the SIEM system’s ability to analyze all the event data that is pouring in to look for that proverbial needle in the haystack. The SIEM system will apply various analysis techniques such as event correlation, statistical models, rules, etc. to look for errors and anomalies.
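As one toy example of a statistical model, a SIEM might flag any interval whose event count sits far above the historical average. A simple z-score check over made-up hourly counts:

```python
import statistics

# Flag hours whose event counts are anomalously high (toy z-score check).
hourly_counts = [120, 115, 130, 118, 122, 125, 980]  # hypothetical data

mean = statistics.mean(hourly_counts)
stdev = statistics.stdev(hourly_counts)
for hour, count in enumerate(hourly_counts):
    if (count - mean) / stdev > 2:  # more than 2 standard deviations above
        print(f"hour {hour}: {count} events looks anomalous")
```

A real SIEM would use rolling baselines and far richer models; the point is simply that anomalies stand out against normal behavior.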
Reporting
If something suspicious is detected, the SIEM system will raise an alert that an analyst can look at and act upon as necessary.
Continuous updates & Continuous optimization (tuning)
Continuous monitoring, sometimes referred to as continuous security monitoring (CSM), is the process where an organization identifies all of its systems, identifies the risks associated with each system, applies the appropriate controls to mitigate those risks, and then continuously monitors the controls to assess their effectiveness against the ever-changing threat landscape.
Chain-of-custody
Chain-of-custody is the chronological documentation that records the sequence of custody, control, transfer, analysis, and disposition of evidence. Why are we talking about the chain-of-custody here in the section on logging and monitoring? Because our logs may contain important evidence for an investigation and it may be required to establish a chain of custody for log event data.
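One common way to make log evidence tamper-evident, which supports a chain of custody, is hash chaining: each record incorporates the hash of the previous record, so altering any entry breaks every later link. A minimal sketch:

```python
import hashlib

# Hash-chain log records so tampering with any entry is detectable.
def chain(records):
    prev = "0" * 64  # genesis value for the first record
    chained = []
    for rec in records:
        digest = hashlib.sha256((prev + rec).encode()).hexdigest()
        chained.append({"record": rec, "hash": digest})
        prev = digest
    return chained

for entry in chain(["user alice logged in", "file payroll.xlsx read"]):
    print(entry["hash"][:16], entry["record"])
```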
Non-repudiation
Finally, the last definition in this MindMap: non-repudiation is the concept that the sender of data or the signer of a document cannot deny having sent or signed it. Non-repudiation basically means that someone can't deny having sent or done something.
That’s the connection to logging and monitoring. Log event data could be used to show someone did something–so they can’t deny it later on.
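Technically, non-repudiation typically rests on digital signatures: only the holder of the private key could have produced a valid signature over a record. A sketch using the third-party cryptography package (an assumption; install with pip install cryptography):

```python
from cryptography.hazmat.primitives.asymmetric import ed25519
from cryptography.exceptions import InvalidSignature

log_event = b"2025-10-10T13:55:36Z alice deleted customer-db backup"

# Only the signer holds the private key...
private_key = ed25519.Ed25519PrivateKey.generate()
signature = private_key.sign(log_event)

# ...but anyone with the public key can verify, so the signer
# cannot plausibly deny having produced this event record.
public_key = private_key.public_key()
try:
    public_key.verify(signature, log_event)
    print("signature valid: the signer cannot repudiate this record")
except InvalidSignature:
    print("signature invalid")
```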

That’s it for our overview of data discovery, classification, and log review in Domain 2, covering the most important concepts you need to know for the exam.

If you found this video helpful you can hit the thumbs up button and if you want to be notified when we release additional videos in this MindMap series, then please subscribe and hit the bell icon to get notifications.
I will provide links to the other MindMap videos in the description below.
Thanks very much for watching! And all the best in your studies!