Empowering Data Lake Apps with Mountpoint for Amazon S3

Overview

Amazon S3 provides durability, availability, scalability, and security, which should be used to build a Data Lake. To store various collections of unstructured data for use in analytics pipelines, machine learning training, business intelligence, and other applications, hundreds of thousands of data lakes have been constructed using the Amazon S3 platform. These jobs commonly build upon free and open-source analytics engines like Apache Spark, which connect to Amazon S3 by default as a data source and use Amazon S3’s adaptability to finish tasks quickly and economically.

A new open-source file client called Mountpoint for Amazon S3 is currently supported by Amazon S3, making it simple for Linux-based programs to connect directly to Amazon S3 buckets and access objects using file APIs. Mountpoint is made for massively parallel reading and writing of Amazon S3 data by large-scale analytics applications that don’t need to write to the center of already-existing objects. By mapping Amazon S3 buckets or prefixes into your instance’s file system namespace, accessing the contents of your buckets like they were local files, and doing so without having to worry about performance tuning or provisioning, you may achieve high throughput access to objects with the mount point. Mountpoint maps file activity to GET and PUT operations against Amazon S3, enabling scalable file-based applications to burst to aggregate throughput of terabits per second without requiring any code adjustments.

It’s important to remember that Mountpoint doesn’t function as a general-purpose networked file system and has several restrictions on file operations. For instance, it does not support writing in the initial release and only permits sequential writing to new objects moving forward. Suppose you wish to run file-based applications that need to collaborate or coordinate on shared data across instances or users (EFS). We suggest using AWS fully managed file services, such as Amazon FSx or Amazon Elastic File System. For instance, clients can use Amazon FSx for Lustre to link a fully POSIX-compliant file system to an Amazon S3 bucket.

Pioneers in Cloud Consulting & Migration Services

Reduced infrastructural costs
Accelerated application deployment

Get Started

Introduction

Mountpoint for Amazon S3, a simple, high-throughput file client, allows you to mount an Amazon S3 bucket as a local file system. Thanks to Mountpoint for Amazon S3, your applications can access objects stored in Amazon S3 using file operations like open and read. Mountpoint for Amazon S3 gives your apps file-based access to Amazon S3’s elastic storage and throughput by automatically turning these actions into Amazon S3 object API calls.

Mountpoint for Amazon S3 is designed for workloads that require high throughput but are read-intensive. Since we don’t want to offer features that can’t be used successfully with S3’s object APIs, it falls short of fully conforming to the POSIX file system specification. See SEMANTICS.md for a comprehensive discussion of Mountpoint’s behavior and POSIX support.

Mountpoint for Amazon S3 is made for high-performance access to the Amazon S3 service. We cannot support particular use cases, and they might unintentionally stop working when we make modifications to support Amazon S3 better. However, it might work with other storage services that use Amazon S3-like APIs. We would appreciate contributions of minor compatibility fixes or performance improvements for these services if the modifications could be evaluated against Amazon S3.

The alpha edition of Mountpoint for Amazon S3 can only be obtained by source-code compilation and runs on Linux. These instructions are for Amazon Linux 2, though they should work similarly for other Linux distributions.

There are some notable restrictions in this first release:

Since the mount point for Amazon S3 is now read-only, you cannot use the file system to write things back to S3. We plan to provide sequential writing to new objects in a future release.
Mountpoint caches no object data or information for Amazon S3.
The alpha release of the mount point for Amazon S3 may only be installed by building from the source. This will alter in a later version.
Manual endpoint configuration might be required for some Amazon S3 customers.

Only Linux is supported in the Alpha edition of Mountpoint for Amazon S3, which is only available by building from source. These directions are for Amazon Linux 2, but they should also work for other Linux distributions.

Step-by-Step Guide

Launch an instance of Amazon Linux 2 AMI, then use rustup to run the install requirements, including the Rust compiler.

Run the following commands, then follow the instructions.

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

1	curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs \| sh

step1

sudo yum install fuse fuse-devel cmake3 clang-devel

1	sudo yum install fuse fuse-devel cmake3 clang-devel

step1b

2. Now clone this repository and its submodules:

git clone --recurse-submodules https://github.com/awslabs/mountpoint-s3.git

1	git clone --recurse-submodules https://github.com/awslabs/mountpoint-s3.git

step2

Finally, compile the client:

cd mountpoint-s3
cargo build --release

1 2	cd mountpoint-s3 cargo build --release

step2b

Create a Directory to mount in which your Amazon S3 bucket should be mounted

mkdir ~/mnt
mount-s3 my-s3-bucket-name ~/mnt

1 2	mkdir ~/mnt mount-s3 my-s3-bucket-name ~/mnt

Go to the Mount point Directory and check the Files inside our bucket,

cd ~/mnt/
ls

1 2	cd ~/mnt/ ls

step2d

3. We can’t see any Files inside our S3 Bucket. So we can go and check our Bucket.

step3

Currently, we don’t have any files inside our bucket. So we can upload some files to it.

step3b

step3c

4. Now we can go and check it in our mountpoint.

The current version won’t support writing to the Amazon S3 bucket. Let us try to create some files for our Mount Directory and see.

step4b

Conclusion

Mountpoint for Amazon S3 makes it simple to translate file actions into Amazon S3 API queries, and Data Lake apps can access your Amazon S3 data without requiring code modifications.

With the Alpha release of Mountpoint today, AWS is taking the first steps towards a dependable, high-performance file client for these apps.

Making IT Networks Enterprise-ready – Cloud Management Services

Accelerated cloud migration
End-to-end view of the cloud environment

Get Started

About CloudThat

CloudThat is an award-winning company and the first in India to offer cloud training and consulting services worldwide. As a Microsoft Solutions Partner, AWS Advanced Tier Training Partner, and Google Cloud Platform Partner, CloudThat has empowered over 850,000 professionals through 600+ cloud certifications winning global recognition for its training excellence including 20 MCT Trainers in Microsoft’s Global Top 100 and an impressive 12 awards in the last 8 years. CloudThat specializes in Cloud Migration, Data Platforms, DevOps, IoT, and cutting-edge technologies like Gen AI & AI/ML. It has delivered over 500 consulting projects for 250+ organizations in 30+ countries as it continues to empower professionals and enterprises to thrive in the digital-first world.