AWS, Cloud Computing, Data Analytics

5 Mins Read

Empowering Data Lake Apps with Mountpoint for Amazon S3

Overview

Amazon S3 provides durability, availability, scalability, and security, which should be used to build a Data Lake. To store various collections of unstructured data for use in analytics pipelines, machine learning training, business intelligence, and other applications, hundreds of thousands of data lakes have been constructed using the Amazon S3 platform. These jobs commonly build upon free and open-source analytics engines like Apache Spark, which connect to Amazon S3 by default as a data source and use Amazon S3’s adaptability to finish tasks quickly and economically.

A new open-source file client called Mountpoint for Amazon S3 is currently supported by Amazon S3, making it simple for Linux-based programs to connect directly to Amazon S3 buckets and access objects using file APIs. Mountpoint is made for massively parallel reading and writing of Amazon S3 data by large-scale analytics applications that don’t need to write to the center of already-existing objects. By mapping Amazon S3 buckets or prefixes into your instance’s file system namespace, accessing the contents of your buckets like they were local files, and doing so without having to worry about performance tuning or provisioning, you may achieve high throughput access to objects with the mount point. Mountpoint maps file activity to GET and PUT operations against Amazon S3, enabling scalable file-based applications to burst to aggregate throughput of terabits per second without requiring any code adjustments.

It’s important to remember that Mountpoint doesn’t function as a general-purpose networked file system and has several restrictions on file operations. For instance, it does not support writing in the initial release and only permits sequential writing to new objects moving forward. Suppose you wish to run file-based applications that need to collaborate or coordinate on shared data across instances or users (EFS). We suggest using AWS fully managed file services, such as Amazon FSx or Amazon Elastic File System. For instance, clients can use Amazon FSx for Lustre to link a fully POSIX-compliant file system to an Amazon S3 bucket.

Introduction

Mountpoint for Amazon S3, a simple, high-throughput file client, allows you to mount an Amazon S3 bucket as a local file system. Thanks to Mountpoint for Amazon S3, your applications can access objects stored in Amazon S3 using file operations like open and read. Mountpoint for Amazon S3 gives your apps file-based access to Amazon S3’s elastic storage and throughput by automatically turning these actions into Amazon S3 object API calls.

Mountpoint for Amazon S3 is designed for workloads that require high throughput but are read-intensive. Since we don’t want to offer features that can’t be used successfully with S3’s object APIs, it falls short of fully conforming to the POSIX file system specification. See SEMANTICS.md for a comprehensive discussion of Mountpoint’s behavior and POSIX support.

Mountpoint for Amazon S3 is made for high-performance access to the Amazon S3 service. We cannot support particular use cases, and they might unintentionally stop working when we make modifications to support Amazon S3 better. However, it might work with other storage services that use Amazon S3-like APIs. We would appreciate contributions of minor compatibility fixes or performance improvements for these services if the modifications could be evaluated against Amazon S3.

The alpha edition of Mountpoint for Amazon S3 can only be obtained by source-code compilation and runs on Linux. These instructions are for Amazon Linux 2, though they should work similarly for other Linux distributions.

There are some notable restrictions in this first release:

  • Since the mount point for Amazon S3 is now read-only, you cannot use the file system to write things back to S3. We plan to provide sequential writing to new objects in a future release.
  • Mountpoint caches no object data or information for Amazon S3.
  • The alpha release of the mount point for Amazon S3 may only be installed by building from the source. This will alter in a later version.
  • Manual endpoint configuration might be required for some Amazon S3 customers.

Only Linux is supported in the Alpha edition of Mountpoint for Amazon S3, which is only available by building from source. These directions are for Amazon Linux 2, but they should also work for other Linux distributions.

Pioneers in Cloud Consulting & Migration Services

  • Reduced infrastructural costs
  • Accelerated application deployment
Get Started

Step-by-Step Guide

  1. Launch an instance of Amazon Linux 2 AMI, then use rustup to run the install requirements, including the Rust compiler.

Run the following commands, then follow the instructions.

step1

step1b

2. Now clone this repository and its submodules:

step2

Finally, compile the client:

step2b

Create a Directory to mount in which your Amazon S3 bucket should be mounted

step2c

Go to the Mount point Directory and check the Files inside our bucket,

step2d

3. We can’t see any Files inside our S3 Bucket. So we can go and check our Bucket.

step3

Currently, we don’t have any files inside our bucket. So we can upload some files to it.

step3b

step3c

4. Now we can go and check it in our mountpoint.

step4

The current version won’t support writing to the Amazon S3 bucket. Let us try to create some files for our Mount Directory and see.

step4b

Conclusion

Mountpoint for Amazon S3 makes it simple to translate file actions into Amazon S3 API queries, and Data Lake apps can access your Amazon S3 data without requiring code modifications.

With the Alpha release of Mountpoint today, AWS is taking the first steps towards a dependable, high-performance file client for these apps.

Making IT Networks Enterprise-ready – Cloud Management Services

  • Accelerated cloud migration
  • End-to-end view of the cloud environment
Get Started

About CloudThat

CloudThat is an official AWS (Amazon Web Services) Advanced Consulting Partner and Training partner and Microsoft Gold Partner, helping people develop knowledge of the cloud and help their businesses aim for higher goals using best-in-industry cloud computing practices and expertise. We are on a mission to build a robust cloud computing ecosystem by disseminating knowledge on technological intricacies within the cloud space. Our blogs, webinars, case studies, and white papers enable all the stakeholders in the cloud computing sphere.

Drop a query if you have any questions regarding Mountpoint for Amazon S3, I will get back to you quickly.

To get started, go through our Consultancy page and Managed Services Package, CloudThat’s offerings.

FAQs

1. What version of Mountpoint is currently offered for Amazon S3?

ANS: – The alpha release of Mountpoint for Amazon S3 is presently available. Hence it should not be utilized in live operations.

2. Does Mountpoint for Amazon S3's enable write operations?

ANS: – Since Amazon S3’s mountpoint is now read-only, you won’t be able to use the file system to write objects back to S3.

3. Has Mountpoint for Amazon S3 got any caching system?

ANS: – Since Amazon S3’s mountpoint is now read-only, you won’t be able to use the file system to write objects back to S3.

WRITTEN BY Deepak Surendran

Share

Comments

    Click to Comment

Get The Most Out Of Us

Our support doesn't end here. We have monthly newsletters, study guides, practice questions, and more to assist you in upgrading your cloud career. Subscribe to get them all!