OpenSearch Service Zero-ETL integration with Amazon S3

Overview

During AWS re:Invent 2023, AWS released a preview of Amazon OpenSearch Service zero-ETL integration with Amazon S3, a new way to query operational logs in Amazon S3 and S3-based data lakes without switching between services. You can now analyze infrequently queried data in cloud object stores while using OpenSearch Service’s operational analytics and visualization capabilities.

Zero ETL

Zero-ETL is a collection of connectors that eliminates or reduces the need for ETL (extract, transform, load) data pipelines. ETL combines, cleans, and normalizes data from many sources to prepare it for analytics, artificial intelligence (AI), and machine learning (ML) applications. Traditional ETL processes are time-consuming and difficult to design, maintain, and scale. Zero-ETL connectors, by contrast, allow point-to-point data flow without ETL data pipelines. Zero-ETL can also enable querying across data silos without migrating the data.

Different use cases for Zero-ETL

Federated querying

Using federated querying technologies, you can query multiple data sources without worrying about data movement. Using well-known SQL commands, you can run queries and join data from multiple sources, including operational databases, data warehouses, and data lakes. With In-Memory Data Grids (IMDG), you can benefit from instantaneous analysis and query response times by storing data in memory for caching and processing. The join results can then be kept for later use and analysis in a data store.
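As a rough illustration of the idea, the sketch below joins tables that live in two separate databases with a single SQL statement. It uses Python’s built-in sqlite3 module with two in-memory databases as stand-ins for an operational database and a data lake; the table names, columns, and values are made up for the example, and a real federated query engine would work across far more varied sources.

```python
import sqlite3

# Two independent in-memory databases stand in for two separate data sources.
conn = sqlite3.connect(":memory:")                   # hypothetical operational source
conn.execute("ATTACH DATABASE ':memory:' AS lake")   # hypothetical data lake source

conn.execute("CREATE TABLE main.orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE lake.customers (customer_id INTEGER, region TEXT)")

conn.executemany(
    "INSERT INTO main.orders VALUES (?, ?, ?)",
    [(1, 10, 25.0), (2, 11, 40.0), (3, 10, 15.0)],
)
conn.executemany(
    "INSERT INTO lake.customers VALUES (?, ?)",
    [(10, "eu-west-1"), (11, "us-east-1")],
)

# One SQL statement joins both sources; no rows are copied between them first.
rows = conn.execute(
    """
    SELECT c.region, SUM(o.amount) AS total
    FROM main.orders AS o
    JOIN lake.customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.region
    ORDER BY c.region
    """
).fetchall()

print(rows)
```

The point of the sketch is only that the join happens at query time, where the data already lives, rather than after a separate copy step.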

Streaming ingestion

Platforms for message queuing and data streaming provide real-time data streaming from multiple sources. With a zero-ETL integration between these streams and a data warehouse, you can ingest data from several streams and make it available for analytics almost immediately. The streaming data does not need to be staged for transformation on any other storage service.

Instant replication

Traditionally, moving data from a transactional database to a central data warehouse required a complex ETL solution. Today, zero-ETL can act as a data replication tool, replicating data from the transactional database to the data warehouse almost instantly. The replication mechanism may be built into the data warehouse and may use change data capture (CDC) techniques. The replication is transparent to users: applications keep storing data in the transactional database, and analysts query it from the warehouse.
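To make the CDC idea concrete, here is a minimal sketch of applying an ordered stream of change events to a replica. The event shape (`op`, `id`, `row`) and the function name are assumptions made for this example; real CDC tools define their own formats and also handle ordering, retries, and schema changes.

```python
from typing import Any, Dict, List


def apply_changes(replica: Dict[int, Dict[str, Any]], changes: List[Dict[str, Any]]) -> None:
    """Apply an ordered list of change events to a replica keyed by primary key."""
    for change in changes:
        op, key = change["op"], change["id"]
        if op in ("insert", "update"):
            # Upsert: merge the changed columns into the existing row, if any.
            replica[key] = {**replica.get(key, {}), **change["row"]}
        elif op == "delete":
            replica.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")


# Hypothetical change log captured from the transactional database.
change_log = [
    {"op": "insert", "id": 1, "row": {"status": "new", "total": 20}},
    {"op": "insert", "id": 2, "row": {"status": "new", "total": 35}},
    {"op": "update", "id": 1, "row": {"status": "shipped"}},
    {"op": "delete", "id": 2},
]

warehouse: Dict[int, Dict[str, Any]] = {}
apply_changes(warehouse, change_log)
print(warehouse)
```

Only the changes travel to the warehouse, so the replica stays current without repeatedly reloading full tables.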

OpenSearch Service Zero-ETL integration with Amazon S3

By allowing users to query their operational data directly, Amazon OpenSearch Service direct queries with Amazon S3 offer a zero-ETL integration that reduces the operational complexity of duplicating data or managing multiple analytics tools, saving both cost and time to action. The integration is configured in OpenSearch Service. From there, you can use templates for different log types, including pre-built dashboards, and set up data accelerations specific to each type. Templates include VPC Flow Logs, Elastic Load Balancing logs, and NGINX logs; accelerations include skipping indexes, materialized views, and covering indexes.
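To give a feel for why an acceleration such as a skipping index helps, the sketch below builds a tiny min/max index over a few files and uses it to decide which files a query actually needs to read. This is a conceptual illustration in plain Python, not how OpenSearch Service implements its accelerations; the file paths, the `status` column, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FileStats:
    """Minimum and maximum of one column for a single data file."""
    path: str
    min_status: int
    max_status: int


def build_skipping_index(files: Dict[str, List[int]]) -> List[FileStats]:
    # Store only small per-file metadata, not the data itself.
    return [FileStats(path, min(values), max(values)) for path, values in files.items()]


def files_to_scan(index: List[FileStats], status: int) -> List[str]:
    # A file can contain the value only if it falls inside the file's [min, max] range.
    return [s.path for s in index if s.min_status <= status <= s.max_status]


# Hypothetical log files and the HTTP status codes each one contains.
files = {
    "logs/part-0.parquet": [200, 200, 204],
    "logs/part-1.parquet": [200, 404, 500],
    "logs/part-2.parquet": [301, 302],
}

index = build_skipping_index(files)
candidates = files_to_scan(index, 500)
print(candidates)
```

A query for status 500 only has to open one of the three files. Materialized views and covering indexes go further by ingesting query results or selected columns ahead of time.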

Direct queries with Amazon S3 enable you to run the complex queries that threat and security forensic analysis depends on. These queries correlate data from various sources, helping teams investigate security events and service outages. Once you’ve created an integration, you can begin querying your data directly from OpenSearch Dashboards or the OpenSearch API. Connections can be easily audited to make sure they are configured in a secure, scalable, and economical manner.
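For the API route, a request might be shaped roughly like the sketch below. It assumes the asynchronous query endpoint of the OpenSearch SQL plugin (`_plugins/_async_query`) and its `datasource`, `lang`, and `query` body fields; verify these against the documentation for your domain version. The domain endpoint, data source name, and table name are hypothetical, and the request is only built here, not sent, because sending it would also require authentication that is out of scope for this example.

```python
import json
import urllib.request


def build_direct_query_request(endpoint: str, datasource: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for an asynchronous SQL query."""
    body = json.dumps({"datasource": datasource, "lang": "sql", "query": query}).encode("utf-8")
    return urllib.request.Request(
        # Assumed endpoint path; check it against your OpenSearch version.
        url=f"{endpoint.rstrip('/')}/_plugins/_async_query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_direct_query_request(
    endpoint="https://search-example-domain.us-east-1.es.amazonaws.com",  # hypothetical
    datasource="my_s3_logs",  # hypothetical data source name
    query="SELECT status, COUNT(*) FROM my_s3_logs.default.http_logs GROUP BY status",
)

print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Because the query runs asynchronously, a client would normally submit it, keep the returned query identifier, and poll for the result.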

Limitations

Direct queries using Amazon S3 through OpenSearch Service are subject to the following restrictions.

To support OpenSearch Service direct queries, your OpenSearch domain must run version 2.11 or later.
OpenSearch Service direct queries with Amazon S3 support only Spark tables in the AWS Glue Data Catalog. Hive tables are not supported because index updates depend on Spark streaming.
Certain data types are not compatible. The only supported data formats are Parquet, CSV, and JSON.
The direct query preview release does not support AWS CloudFormation templates.
The AWS Glue Data Catalog and your OpenSearch domain must be in the same AWS account. Your Amazon S3 tables may be in a different account, but they must be in the same AWS Region as your domain.
There is no support for nested Spark structures. If they are present, you must explode any nested structures in your source data to rows.
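The last restriction is the one most likely to require changes to your source data. The sketch below shows, in plain Python rather than Spark, what exploding a nested structure to rows means: a list of nested objects becomes one flat row per element. The record layout and the helper names `flatten` and `explode` are assumptions made for this illustration.

```python
from typing import Any, Dict, Iterator


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted column names."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def explode(record: Dict[str, Any], column: str) -> Iterator[Dict[str, Any]]:
    """Yield one row per element of the list stored under `column`."""
    for item in record.get(column, []):
        row = {k: v for k, v in record.items() if k != column}
        if isinstance(item, dict):
            row.update({f"{column}.{k}": v for k, v in item.items()})
        else:
            row[column] = item
        yield row


# Hypothetical log event with a nested object and a nested list.
event = {
    "request_id": "r-1",
    "client": {"ip": "10.0.0.5", "port": 443},
    "rules": [{"id": "a", "action": "ALLOW"}, {"id": "b", "action": "BLOCK"}],
}

rows = [flatten(r) for r in explode(event, "rules")]
for row in rows:
    print(row)
```

Each output row contains only scalar columns, which is the shape the restriction above asks for.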

Conclusion

OpenSearch Service users also use Amazon S3 as an affordable way to store operational log data that is not frequently accessed. Previously, customers who wanted to analyze Amazon S3 data or correlate data from multiple sources had to copy it from Amazon S3 into OpenSearch Service to use its rich analytics and visualization features, which help in understanding data, detecting anomalies, and identifying potential threats. Continuously replicating and maintaining data between services, however, can be costly.

Customers can access operational log data stored in Amazon S3 using OpenSearch Service thanks to its zero-ETL integration with Amazon S3. This enables customers to perform sophisticated queries and visualizations on their data without requiring data movement.

Drop a query if you have any questions regarding Amazon S3 and we will get back to you quickly.

About CloudThat

CloudThat is an award-winning company and the first in India to offer cloud training and consulting services worldwide. As an AWS Premier Tier Services Partner, AWS Advanced Training Partner, Microsoft Solutions Partner, and Google Cloud Platform Partner, CloudThat has empowered over 1.1 million professionals through 1000+ cloud certifications, winning global recognition for its training excellence, including 20 MCT Trainers in Microsoft’s Global Top 100 and an impressive 14 awards in the last 9 years. CloudThat specializes in Cloud Migration, Data Platforms, DevOps, Security, IoT, and advanced technologies like Gen AI & AI/ML. It has delivered over 750 consulting projects for 850+ organizations in 30+ countries as it continues to empower professionals and enterprises to thrive in the digital-first world.

FAQs

1. What are the benefits of zero-ETL?

ANS: – Increased agility, cost efficiency, and real-time insights.

2. How many types of direct queries with Amazon S3 are available?

ANS: – Amazon S3 can be queried directly in two ways: with interactive queries or with index maintenance queries. Interactive queries run analytics on your data in Amazon S3. Whenever you run a new query, OpenSearch Service starts a fresh session that lasts at least ten minutes. Index maintenance queries use compute to maintain indexes in OpenSearch Service. These queries typically take longer because they ingest a configurable amount of data into OpenSearch Service to speed up interactive queries.