After reviewing the managed data identifiers provided by Macie and creating the custom data identifiers needed for your POC, it’s time to stage data sets that will help demonstrate the capabilities of these identifiers and better understand how Macie identifies sensitive data. If your testing requires that Macie identify additional sensitive data types that are offered as managed data identifiers but aren’t part of the recommended list, choose the Custom option and select the managed data identifiers that you need. Instead, you can create a few custom data identifiers to help confirm that you can use them for sensitive data detection and that Macie can support your data discovery goals. Macie offers a default collection of recommended managed data identifiers to use for detecting general categories and types of sensitive data while optimizing data discovery results and reducing noise. Amazon Macie is a data security service that discovers sensitive data using machine learning and pattern matching, provides visibility into data security risks, and enables automated protection against those risks. Macie offers an automated data discovery feature that can continually discover sensitive data within your S3 buckets. These data discovery results can help you analyze records over time or get a broader sense of what data Macie has scanned, which objects had sensitive data, and which did not. Now that you understand how to use Macie to discover sensitive data, the next step in the POC is to enable automated discovery and use Macie to discover sensitive data across a larger collection of your existing S3 data. The core capabilities of Macie are focused on the security of your S3 buckets and helping to identify sensitive data including financial data, personal data, and credentials as well as sensitive data that’s unique to your organization, such as intellectual property.
The heatmap provides point-in-time insights into the data that Macie has scanned, showing the buckets in which sensitive data has been identified and those in which none has been found. You can view the findings through the Findings tab in Macie or by choosing the sensitive data type when looking at the summary detections for a bucket. Because the goal is to use Macie to identify sensitive information in your S3 buckets, including examples of your own data that contains sensitive information can be helpful to test the capabilities of Macie. The POC steps demonstrate how you can use Macie to detect and alert you to sensitive data discovered in your AWS environment and help you determine the value of using Macie to enhance your current data protection strategies. As you identify the sensitive data discovery jobs to run as part of your production use of Macie, keep in mind that these jobs are immutable. Macie selects samples of the objects within S3 buckets and inspects them for the presence of sensitive data daily, providing insight into where sensitive data might reside in your overall Amazon S3 data estate. Additionally, confirm that you don’t have findings for the staged objects that were not supposed to contain sensitive data so that you can see how Macie handles these types of objects. In addition to reviewing the bucket-level statistics that are generated by automated discovery, you can view the individual findings that were generated for each S3 object that was identified as having sensitive data. Now that you’ve reviewed the managed data identifiers, defined custom data identifiers, and staged sample data, it’s time to run a sensitive data discovery job. Macie comes with over 150 managed data identifiers that are designed to identify sensitive data in your S3 objects. With the preceding POC, you should now have a more complete understanding of how Macie identifies sensitive data and how you can use the information that Macie provides about the data it identifies.
Amazon Macie helps customers identify, discover, monitor, and protect sensitive data stored in Amazon S3. Sensitive data discovery jobs provide a way to target a specific S3 bucket or group of buckets to do a deep analysis of the objects in those buckets and identify if sensitive data is present in the objects and if so, the type of data. This can help you understand how Macie handles data that you believe doesn’t contain sensitive information. Each Macie finding contains not only the details on the types of sensitive data identified, but also the location within the file where the sensitive data is located so that you can confirm the identified data is sensitive. After a POC with Macie, you can set the scope of how you will use Macie in production by deciding which buckets don’t need to be evaluated and so can be excluded, such as buckets used for AWS logs and buckets deemed not in scope for sensitive data identification. With the managed data identifiers that Macie offers, you should stage data files that you believe don’t contain information that aligns to the managed data identifiers. These records are an audit of every object that Macie attempted to scan, including objects that didn’t contain sensitive data. Macie covers a wide number of use cases with its managed data identifiers, but some use cases need custom data identifiers for data types that aren’t included in the managed data identifiers. We recommend that you stage data sets that contain sensitive data as well as data sets that do not to gain a full understanding of how Macie detects and reports on each of these situations. If your requirements for identifying sensitive data include detecting sensitive data that isn’t part of the current list of managed data identifiers, then you can create custom data identifiers for those data types. 
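As a rough illustration of how a targeted job for the POC might be described programmatically, the following Python sketch builds the request for a one-time job that uses the recommended managed data identifiers plus any custom data identifiers you created. The function name `build_job_request`, the default job name, and the sample account ID and bucket name are hypothetical; the field names are intended to follow the Macie `CreateClassificationJob` API, but verify them against the current API reference before relying on them.

```python
import uuid


def build_job_request(account_id, bucket_names, custom_identifier_ids=(),
                      job_name="macie-poc-job"):
    """Build keyword arguments for a one-time sensitive data discovery job.

    This helper is hypothetical. It only assembles a dictionary and
    does not call AWS.
    """
    if not bucket_names:
        raise ValueError("at least one bucket name is required")
    return {
        # Idempotency token so that a retried request does not create a second job.
        "clientToken": str(uuid.uuid4()),
        "jobType": "ONE_TIME",
        "name": job_name,
        # Use the recommended set of managed data identifiers.
        "managedDataIdentifierSelector": "RECOMMENDED",
        # IDs of the custom data identifiers created for the POC, if any.
        "customDataIdentifierIds": list(custom_identifier_ids),
        # Target the specific staged buckets in one account.
        "s3JobDefinition": {
            "bucketDefinitions": [
                {"accountId": account_id, "buckets": list(bucket_names)}
            ]
        },
    }


# Placeholder values for illustration only.
request = build_job_request("111122223333", ["example-poc-staging-bucket"])
```

If boto3 is installed and credentials are configured, a request like this could be passed to `boto3.client("macie2").create_classification_job(**request)`. Because jobs are immutable, it can help to review the assembled request before submitting it.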
This information includes the sensitivity score of the bucket, a summary of the types of sensitive data found in the bucket, which objects within the bucket have been sampled, statistics related to the data that has been scanned and data that is still to be scanned, and other information about the bucket. If a bucket is blue, that means only that automated data discovery hasn’t identified sensitive data up to the point in time of the last scan, not that there is no sensitive data in the bucket. By using automated data discovery, you can focus your resources on deeper investigations of the security of buckets identified to have sensitive data. Objects that Macie found with sensitive data will be presented as Findings in the Macie console. Investigating sensitive data with findings has detailed guidance around locating sensitive data from Macie findings, retrieving the sensitive data, and the schema for sensitive data locations. In this post, we show you how to define and run a proof of concept (POC) to validate using Macie and automated discovery to enhance your current data protection strategies. Many managed data identifiers require keywords to be in proximity of the data for Macie to be able to detect findings. Each object where Macie found sensitive data will be listed as a single finding. This post outlined how you can use a POC to better understand how Amazon Macie can help meet your data discovery and classification needs. Keywords are an important component for Macie to be able to detect sensitive data. When preparing data to stage, keep in mind the keyword requirements for many of the Macie managed data identifiers. After automated discovery starts producing results, you will start seeing data in the Automated Discovery section of the Macie summary page in the console. Macie creates an analysis record for each S3 object that’s in scope for a data discovery job or an automated discovery scan. 
Examples of Macie managed data identifiers include credit card numbers, AWS secret access keys, and national identification numbers. If you want to use a customer managed AWS KMS key to encrypt the S3 data at rest, follow the instructions in Allowing Macie to use a customer managed AWS KMS key to give Macie access to decrypt the data in the bucket. Note that, in the automated data discovery phase, it will take 24–48 hours for Macie to perform the first scan after the feature is enabled. The summary includes metrics for the total number of buckets eligible for discovery, counts for the number of buckets where sensitive data was or was not found, and how many of these buckets are public. The heatmap view provides information on each organizational member account and insight about sensitive data within each bucket in the account. This feature is intended to help customers who have large numbers of S3 buckets and large volumes of data better understand where sensitive data might be stored without having to scan all their data. This POC is intended to help you gain an understanding of what Macie is capable of and how you can use it to achieve your data discovery goals. Customers use Amazon S3 for a variety of use cases and store various types of data in S3 buckets, including sensitive data. However, it’s important that customers evaluate and test the capabilities of Macie to verify that they can meet their specific data identification and protection goals. Red indicates that some type of sensitive data has been found in the bucket, while blue indicates no sensitive data has been identified. You can also refine the managed data identifiers that are required for detecting sensitive data. After the POC is complete, evaluate the results to determine how much using Macie can strengthen your organization’s data protection program.
Choose each of the findings that was produced and review the details to confirm what sensitive data was identified and if the sensitive data was discovered as you expected. Prior to beginning your POC, review the list of managed data identifiers and determine which ones you feel will be necessary to use for your data discovery requirements. For example, customers might need to identify sensitive data that’s specific to their company, such as an employee ID or project number. Each square represents a bucket in that account and the color of the square indicates whether sensitive data was discovered in that bucket. As part of your POC, it’s recommended that you investigate buckets that are reported to contain sensitive data. Continuously monitoring these buckets for the presence of sensitive data is a vital part of a data protection strategy. After the job completes, it’s time to review what Macie found in the data. Stage data files that don’t contain sensitive information. These repositories are often comprised of publicly available data sets or were created to help with testing machine learning models or sensitive data detection. If there were multiple types of sensitive data found in the object, each type of sensitive data and a count will be included in the details. When you’re staging your data, reference the keywords that are supported for the managed data identifiers you are using to help ensure that the data can be identified in your POC tests. This will help ensure that the remediation steps for identified sensitive data are directed to the correct parties. To avoid incurring additional charges, disable Macie while you evaluate the value of the additional data protection provided. Stage data that contains information that’s representative of data that you would want to detect using custom data identifiers. Over time, this heatmap might change as automated data discovery continues sampling the data in each bucket. 
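To make the staging guidance concrete, here is a minimal Python sketch that produces one CSV file body that should trigger a custom data identifier and one that should not. Everything in it is an assumption for illustration: the `make_csv` helper, the `Employee ID` keyword, the `E-` plus six digits format, and the sample rows are all made up and should be replaced with values that match your own identifiers.

```python
import csv
import io


def make_csv(header, rows):
    """Return CSV text with the given header and rows (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Positive case: the column header carries the keyword that a custom data
# identifier would look for near each value (made-up ID format).
positive = make_csv(
    ["Name", "Employee ID"],
    [["Test User A", "E-100001"], ["Test User B", "E-100002"]],
)

# Negative case: data that is not expected to match any identifier.
negative = make_csv(
    ["Item", "Quantity"],
    [["Notebook", "12"], ["Pen", "40"]],
)
```

Writing each string to a `.csv` file and uploading both to the staging bucket gives you a pair of objects with known expected outcomes, which makes the later review of findings easier to interpret.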
Understanding the keywords that are used as part of sensitive data detection is important when it comes to building test data for a POC. If you created custom data identifiers, review findings for the objects that included the custom data that you want to detect to confirm that the data was detected. There are various repositories staged with information that could be used for sensitive data detection. Stage data files of your own data with sensitive information. A successful POC of Macie includes understanding what data Macie can detect. Staged data must be in file formats that Macie supports. It’s important to first understand the available managed data identifiers and which ones align with the use cases you want to address. Make sure that the recommended managed data identifiers are part of the custom list that you construct. For your investigation, validate whether the data identified is sensitive based on your organization’s data classification policy. Defining detection criteria for custom data identifiers provides details for the types of data that require keywords. Similar to managed data identifiers, custom data identifiers have keyword requirements. This example shows the row and column where the sensitive data was found. Most customers use automated data discovery to get sample scans instead of adjusting the sampling depth for individual jobs. If the findings are true positives, make sure that the bucket has the right level of security configurations and permissions based on the data stored in the bucket. You can create custom data identifiers to help meet your data detection needs if necessary. Digital data is growing exponentially, and organizations are grappling not only with managing it but also with determining where their sensitive data exists. Additionally, identify which of the managed data identifiers that are applicable to your POC fall outside of the default list of identifiers.
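The keyword requirement can be easier to reason about with a small local check. The following Python sketch is a simplified approximation, not Macie's actual matching logic: it reports a regular expression match only when one of the keywords appears within a fixed number of characters before the match. The `find_matches` function, the employee ID pattern, the keyword, and the distance of 50 characters are all assumptions chosen for illustration.

```python
import re


def find_matches(text, pattern, keywords, max_distance=50):
    """Return regex matches that have a keyword shortly before them.

    Simplified approximation for checking staged test data locally;
    it does not reproduce how Macie evaluates keywords.
    """
    lowered = text.lower()
    results = []
    for match in re.finditer(pattern, text):
        # Look only at the characters immediately preceding the match.
        window = lowered[max(0, match.start() - max_distance):match.start()]
        if any(keyword.lower() in window for keyword in keywords):
            results.append(match.group(0))
    return results


# The first ID follows its keyword; the second is far from any keyword.
sample = "Employee ID: E-123456\n" + "x" * 80 + " E-654321"
detected = find_matches(sample, r"E-\d{6}", ["employee id"])
```

Running a check like this against staged files before uploading them can show which values lack a nearby keyword and so might not produce a finding.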
There’s also a 30-day free trial for automated data discovery, which is covered later in this post. Data discovery results are written to an S3 bucket that you own and where you control the data retention. To determine which managed data identifiers have keyword requirements, see Managed data identifiers by type. A well thought-out and implemented POC can provide valuable early insights and help you develop a more thorough understanding of what your data discovery and classification strategy should be. The data discovery scan results are stored as JSON Lines files. Select the recommended managed data identifiers. Choose the custom data identifiers that you want to be used in the job. Building custom data identifiers has a thorough explanation of how to define a custom data identifier. Data security is a broad concept that revolves around protecting digital information from unauthorized access, corruption, theft, and other forms of malicious activity throughout its lifecycle. There is no free trial for running targeted data discovery jobs. Amazon Web Services (AWS) customers of various sizes across different industries are pursuing initiatives to better classify and protect the data they store in Amazon Simple Storage Service (Amazon S3).
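Because the data discovery results are stored as JSON Lines files, each line can be parsed as an independent JSON object. The following Python sketch shows one way to read such content and separate records for objects with sensitive data from records for objects without it. The field names `objectKey` and `hasSensitiveData` are hypothetical placeholders, not the actual Macie result schema, so substitute the fields from your own result files.

```python
import json


def read_json_lines(text):
    """Parse JSON Lines text into a list of records, skipping blank lines."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records


def split_by_flag(records, flag_key):
    """Split records into (flagged, unflagged) based on a boolean field."""
    flagged = [r for r in records if r.get(flag_key)]
    unflagged = [r for r in records if not r.get(flag_key)]
    return flagged, unflagged


# Hypothetical sample content; real result files use Macie's own schema.
sample = (
    '{"objectKey": "hr/employees.csv", "hasSensitiveData": true}\n'
    '{"objectKey": "public/catalog.csv", "hasSensitiveData": false}\n'
)
with_data, without_data = split_by_flag(read_json_lines(sample), "hasSensitiveData")
```

The same approach extends to counting records per bucket or per data type once you know which fields your result files contain.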
This Cyber News was published on aws.amazon.com. Publication date: Tue, 01 Oct 2024 20:43:05 +0000