Amazon Security Lake: A centralized purpose-built data lake for security data.

Introduction

Security best practices call for effective logging across all resources, together with processes for centralizing and analyzing security event data. Logs from firewalls, on-premises infrastructure, and cloud services such as Amazon VPC and AWS CloudTrail are typically collected in Amazon S3, with AWS Lake Formation used to simplify management of the data lake. Even so, it isn't easy to implement the security-specific aspects, such as data ownership, normalization, and enrichment. Amazon Security Lake lets you analyze security data to get a more complete view of your security posture across the entire organization. With Amazon Security Lake, you can create a purpose-built, customer-owned data lake that automatically centralizes security data from the cloud, on-premises, and custom sources. This in turn helps you protect your applications, workloads, and data.

In this blog, we discuss Amazon Security Lake, a service launched in November 2022 that centralizes, manages, and optimizes large volumes of log and event data. It supports threat detection, investigation, and incident response, so that you can analyze and address security issues using your preferred analytics tools.

Challenges for Security Team

Customers want to protect their entire organization from future security events by collecting logs and event data from different sources, identifying potential threats and vulnerabilities, assessing security alerts, and responding accordingly. To gather security insights from the data, it first needs to be aggregated and normalized into a consistent form. This is time-consuming and costly, because customers use different security solutions for specific use cases, each with its own data store and format. Security teams face four main challenges when analyzing organization-wide security data:

1. Large Volumes of Security Data: Logs and event data are collected from on-premises infrastructure, cloud services, and custom sources, so a huge amount of data accumulates in a short span of time. Getting effective security insights from the aggregated data sometimes requires keeping it for a long time, which pushes storage into the gigabyte or terabyte range.

2. Inconsistent and Incomplete Data: Because logs and event data come from different sources, they arrive in different formats, which makes them difficult to query together. You cannot gain visibility without log data, so it is important to configure logging properly in the security solutions that cover your applications and workloads. In addition, some security solutions retain logs only for a fixed period, such as 30 days, which is a problem when you need data from a longer period.

3. Lack of Data Ownership: Direct data ownership is another challenge. Customers ingest security data into analytics solutions to get insights, which leaves the data locked inside those solutions and isolated from the rest of the security industry. Taking advantage of many of the innovations happening in the security industry requires owning your data.

4. More Data Wrangling, Less Analysis: Tracking infrastructure changes, generating alarms, monitoring performance, and normalizing data on a regular basis all require significant manpower. Achieving this within a fixed budget is difficult, and teams end up wrangling data instead of analyzing it.
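To put the first challenge in numbers, here is a back-of-the-envelope sketch in Python. The event rate, average event size, and retention periods are hypothetical values chosen for illustration, not measurements from a real environment.

```python
def estimate_storage_gib(events_per_second: float,
                         avg_event_bytes: int,
                         retention_days: int) -> float:
    """Estimate raw (uncompressed) log volume kept for a retention period, in GiB."""
    seconds_per_day = 24 * 60 * 60
    total_bytes = events_per_second * avg_event_bytes * seconds_per_day * retention_days
    return total_bytes / 1024 ** 3


if __name__ == "__main__":
    # Hypothetical workload: 2,000 events per second, about 500 bytes each.
    for days in (30, 365):
        gib = estimate_storage_gib(2_000, 500, days)
        print(f"{days} days of retention: about {gib:,.0f} GiB")
```

Under these assumptions, 30 days of retention already lands in the terabyte range before compression, which is why retention settings and storage tiering matter.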

Amazon Security Lake addresses these challenges by automating the collection, normalization, and management of security data across your entire organization.

Amazon Security Lake

Amazon Security Lake automatically centralizes security data into a purpose-built data lake in your account. It aggregates, normalizes, and manages security data from across your entire organization, so that you can analyze it with your preferred analytics solutions.

Features of Amazon Security Lake

1. Data aggregation: Amazon Security Lake creates a purpose-built security data lake in your account, collects logs and event data from various sources like cloud, on-premises, and custom sources, and stores it in Amazon S3 buckets so that you have control and ownership of your security data.

2. Data Normalization and Support for OCSF: Security Lake has adopted an open standard, the Open Cybersecurity Schema Framework (OCSF), to normalize and combine security data from AWS and from various enterprise security data sources. It aggregates and normalizes into OCSF format data from Amazon VPC Flow Logs, AWS CloudTrail management events, Amazon Route 53 Resolver query logs, security findings from solutions integrated through AWS Security Hub, custom data, and third-party security solutions. With support for OCSF, Security Lake makes security data available to your preferred analytics tool.

3. Multi-account and multi-Region support: Amazon Security Lake can be enabled across multiple accounts and across the Regions where the service is available. Security data from multiple accounts can be aggregated per Region, or consolidated from multiple Regions into roll-up Regions to meet compliance requirements.

4. Data lifecycle management and optimization: Amazon Security Lake manages the lifecycle of security data through configurable retention periods and keeps storage costs down with automated storage tiering. It also automatically partitions security data and converts it to Apache Parquet, a storage-efficient and query-efficient format.
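To illustrate what normalization buys you, the following minimal sketch maps two differently formatted records, a default-format VPC flow log line and a CloudTrail management event, into one common shape. The output field names (`time_ms`, `account_id`, `src_ip`, `activity`) are simplified placeholders invented for this example; they are not the actual OCSF schema, and the sample records are made up.

```python
import json
from datetime import datetime, timezone


def normalize_vpc_flow(line: str) -> dict:
    """Map a default-format (version 2) VPC flow log line to the common shape."""
    fields = line.split()
    return {
        "source": "vpc_flow",
        "time_ms": int(fields[10]) * 1000,  # 'start' field is in epoch seconds
        "account_id": fields[1],
        "src_ip": fields[3],
        "activity": fields[12],             # ACCEPT or REJECT
    }


def normalize_cloudtrail(raw_event: str) -> dict:
    """Map a CloudTrail management event (JSON text) to the same common shape."""
    event = json.loads(raw_event)
    when = datetime.strptime(event["eventTime"], "%Y-%m-%dT%H:%M:%SZ")
    when = when.replace(tzinfo=timezone.utc)
    return {
        "source": "cloudtrail",
        "time_ms": int(when.timestamp() * 1000),
        "account_id": event["recipientAccountId"],
        "src_ip": event["sourceIPAddress"],
        "activity": event["eventName"],
    }


# Made-up sample records for illustration only.
FLOW_LINE = ("2 123456789012 eni-0abc123 10.0.0.5 10.0.1.9 44321 443 6 "
             "10 840 1672574400 1672574460 ACCEPT OK")
TRAIL_EVENT = json.dumps({
    "eventTime": "2023-01-01T12:00:00Z",
    "eventName": "ConsoleLogin",
    "sourceIPAddress": "203.0.113.7",
    "recipientAccountId": "123456789012",
})

if __name__ == "__main__":
    records = [normalize_vpc_flow(FLOW_LINE), normalize_cloudtrail(TRAIL_EVENT)]
    # After normalization, one filter works across both sources.
    for record in records:
        if record["account_id"] == "123456789012":
            print(record["source"], record["time_ms"], record["src_ip"], record["activity"])
```

Once every source shares the same field names and time representation, a single query can span all of them, which is the practical benefit of a common schema such as OCSF.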

Configure Amazon Security Lake for Security Data collection

Prerequisite to configure Amazon Security Lake

To get started with Amazon Security Lake, first designate a delegated administrator account for it from the management account of your organization in AWS Organizations. The delegated account enables Amazon Security Lake, which then aggregates security data across multiple accounts and Regions. You can also enable Amazon Security Lake for a standalone AWS account.

To allow Amazon Security Lake to perform ETL (Extract, Transform, and Load) jobs on logs and event data from various sources, create a role named AmazonSecurityLakeMetaStoreManager. This role is required before you can create a data lake or query data.

Once the delegated account is in place and the role is created, you can configure Security Lake in your account to aggregate, normalize, and manage security data from various data sources.

Step 1: Define Collection Objective: To enable Amazon Security Lake, select the data sources, Regions, and accounts, and specify the ARN of the role created in the prerequisite.

Figure 1: Define Collection Objective

Step 2: Define Target Objective: In this step, you define the roll-up Region and, if required, set storage classes, so that data ingested from multiple Regions and accounts in your organization is consolidated.

Figure 2: Define Target Objective and Enable Amazon Security Lake
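The choices made in Steps 1 and 2 can also be expressed as a request document. The sketch below only builds a Python dictionary. The field names follow the shape of the Security Lake CreateDataLake request as I understand it, which is an assumption you should verify against the current API reference. The ARNs, Regions, retention values, and the separate replication role are hypothetical placeholders that do not come from this walkthrough.

```python
from typing import Dict, List


def _lifecycle(retention_days: int, ia_after_days: int) -> Dict:
    """Retention and tiering settings for one Region (assumed field names)."""
    return {
        "expiration": {"days": retention_days},
        "transitions": [{"days": ia_after_days, "storageClass": "STANDARD_IA"}],
    }


def build_data_lake_request(meta_store_role_arn: str,
                            rollup_region: str,
                            contributing_regions: List[str],
                            replication_role_arn: str,
                            retention_days: int = 365,
                            ia_after_days: int = 30) -> Dict:
    """Build one configuration per Region; contributing Regions replicate to the roll-up Region."""
    configurations = [{
        "region": rollup_region,
        "lifecycleConfiguration": _lifecycle(retention_days, ia_after_days),
    }]
    for region in contributing_regions:
        configurations.append({
            "region": region,
            "lifecycleConfiguration": _lifecycle(retention_days, ia_after_days),
            "replicationConfiguration": {
                "regions": [rollup_region],
                "roleArn": replication_role_arn,
            },
        })
    return {
        "configurations": configurations,
        "metaStoreManagerRoleArn": meta_store_role_arn,
    }


if __name__ == "__main__":
    request = build_data_lake_request(
        meta_store_role_arn="arn:aws:iam::123456789012:role/AmazonSecurityLakeMetaStoreManager",
        rollup_region="us-east-1",
        contributing_regions=["eu-west-1", "ap-south-1"],
        replication_role_arn="arn:aws:iam::123456789012:role/ExampleReplicationRole",
    )
    # Assumption: with boto3 installed and credentials configured, a dictionary
    # like this could be passed as keyword arguments to the Security Lake
    # client's create_data_lake call.
    for config in request["configurations"]:
        print(config["region"], "replicationConfiguration" in config)
```

Keeping the request as plain data makes it easy to review the roll-up topology and retention settings before anything is sent to AWS.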

Step 3: Under the Sources option, you can enable different data sources, such as CloudTrail, VPC Flow Logs, Route 53, and Security Hub findings, in all Regions or in specific Regions.

Figure 3: Different Data Sources enabled across multiple regions.
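The source selection in Step 3 can be sketched the same way. The mapping below translates the source labels used in this post into the `sourceName` identifiers that, to my knowledge, the Security Lake API uses for AWS log sources. Treat both the identifiers and the request shape as assumptions to check against the API reference. The account ID and Regions are placeholders.

```python
from typing import Dict, List

# Label used in this post -> assumed Security Lake sourceName identifier.
SOURCE_NAMES = {
    "CloudTrail": "CLOUD_TRAIL_MGMT",
    "VPC Flow Logs": "VPC_FLOW",
    "Route 53": "ROUTE53",
    "Security Hub Findings": "SH_FINDINGS",
}


def build_log_source_request(labels: List[str],
                             regions: List[str],
                             accounts: List[str]) -> Dict:
    """Build one source entry per selected label, rejecting unknown labels."""
    unknown = [label for label in labels if label not in SOURCE_NAMES]
    if unknown:
        raise ValueError(f"Unknown source labels: {unknown}")
    return {
        "sources": [
            {
                "sourceName": SOURCE_NAMES[label],
                "regions": list(regions),
                "accounts": list(accounts),
            }
            for label in labels
        ]
    }


if __name__ == "__main__":
    request = build_log_source_request(
        labels=["CloudTrail", "VPC Flow Logs"],
        regions=["us-east-1", "eu-west-1"],
        accounts=["123456789012"],
    )
    for source in request["sources"]:
        print(source["sourceName"], source["regions"])
```

Rejecting unknown labels up front avoids sending a request that silently omits a source you meant to enable.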

Step 4: You can view the Regions in which buckets have been created, and open those buckets to see the logs stored in Apache Parquet format.

Figure 4: Region-wise buckets and Logs in Apache Parquet format
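Partitioned data in S3 is commonly laid out with Hive-style `key=value` path segments. As a small sketch, the helper below extracts those segments from an object key. The example key is a hypothetical illustration of how a partitioned Security Lake object might be named; the exact layout in your buckets may differ.

```python
from typing import Dict


def parse_partitions(object_key: str) -> Dict[str, str]:
    """Extract Hive-style key=value partition segments from an S3 object key."""
    partitions = {}
    for segment in object_key.split("/"):
        key, separator, value = segment.partition("=")
        if separator and key and value:
            partitions[key] = value
    return partitions


# Hypothetical object key, for illustration only.
EXAMPLE_KEY = ("aws/CLOUD_TRAIL_MGMT/1.0/region=us-east-1/"
               "accountId=123456789012/eventDay=20230428/part-0000.gz.parquet")

if __name__ == "__main__":
    for name, value in parse_partitions(EXAMPLE_KEY).items():
        print(f"{name} = {value}")
```

Query engines use partition values like these to skip objects that cannot match a filter, which is a large part of what makes partitioned Parquet data efficient to query.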

Conclusion

Amazon Security Lake is a fully managed security data lake service that automatically centralizes, normalizes, and manages security data from AWS and third-party sources in a data lake stored in your AWS account, ready for analysis with your preferred tools. It is easy to enable, and in a few clicks you can start aggregating logs and event data from cloud, on-premises, and custom sources.

WRITTEN BY Rashmi D

Rashmi Dhumal works as a Subject Matter Expert on the AWS team at CloudThat, India. A passionate trainer, she is aptly described as a “technofreak and a quick learner”. She has more than 20 years of experience as a technical trainer, academician, and mentor, with active involvement in curriculum development. She has trained many professionals and graduate students across India.