There is growing demand to back up files to a cloud environment for long-term storage and disaster recovery. Customers want reliable utilities that help them migrate their data securely. They also want to automate these transfers to cloud storage in AWS, such as S3, EFS, or a Windows file server.
AWS DataSync connects your on-premises storage to S3 and other AWS storage services with a reliable, automated architecture. File shares served over NFS (Network File System) and SMB (Server Message Block) can be used with AWS DataSync to transfer the files you need.
DataSync can transfer all the files or, on subsequent runs, only the data that has changed. It does this by using metadata captured from previous transfers, which reduces both the transfer size and the transfer time. AWS DataSync uses a DataSync agent, deployed either on an on-premises hypervisor such as VMware or Hyper-V, or on an EC2 instance launched from the AWS-provided AMI. The agent connects the source endpoint on the local server to the target endpoint on S3, EFS, and so on.
I will show a small but powerful setup in which files are transferred from an SMB (Samba) server into S3 using AWS DataSync.
Steps to Convert Your Linux Machine into a Samba Server
Here I am using an Amazon Linux AMI in another VPC to act as a remote Samba server.
Launch a Linux EC2 instance of your choice; I am using Amazon Linux to host the SMB server.
In the security group, open port 22 and port 445 to anywhere, i.e., the 0.0.0.0/0 range (later, we can restrict this to a dedicated IP).
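The same security group rules can be added from the AWS CLI. The following is a minimal sketch, not the exact method used in this walkthrough: the function name `allow_tcp_port`, the `AWS_CLI` override variable, and the security group ID and addresses in the comments are all hypothetical. It narrows access to a single CIDR, which is the "dedicated IP" tightening mentioned above.

```shell
# Assumption: the AWS CLI is installed and configured.
# Set AWS_CLI=echo to print each command instead of running it (dry run).
AWS_CLI="${AWS_CLI:-aws}"

# allow_tcp_port <security-group-id> <port> <cidr>
# Adds one inbound TCP rule to the given security group.
allow_tcp_port() {
    "$AWS_CLI" ec2 authorize-security-group-ingress \
        --group-id "$1" --protocol tcp --port "$2" --cidr "$3"
}

# Example calls (hypothetical ID and addresses):
#   allow_tcp_port sg-0123456789abcdef0 22  203.0.113.10/32   # SSH from your workstation
#   allow_tcp_port sg-0123456789abcdef0 445 10.0.1.25/32      # SMB from the DataSync agent
```

Restricting port 445 to the agent's address is usually preferable to leaving it open to 0.0.0.0/0 once the setup works.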
Since this is an Amazon Linux 2 AMI, follow the steps below to change the hostname to a user-friendly name and update the installed packages.
At the end of the file, add the below line and save
$ sudo hostnamectl set-hostname samba-server
$ sudo yum update -y
Install Samba, Samba-client, and cifs-utils
# yum install -y samba samba-client cifs-utils
Configuration file changes
# vim /etc/samba/smb.conf
hosts allow = <IP address of the DataSync agent VM>
*** Add the loopback IP and the starting IP of the VPC range, as shown above.
Make sure to edit the configuration as above and set hosts allow so that your DataSync agent can connect.
We will now create a Samba user to access the shared directory.
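To make the configuration change concrete, here is a sketch of a share definition using the names from this walkthrough (`/smbfolder`, `sambauser`). The agent IP `10.0.1.25`, the share name `smbfolder`, and the particular set of options are assumptions for illustration, not the exact configuration used here. The snippet is written to a scratch file so it can be reviewed before anything touches `/etc/samba/smb.conf`.

```shell
# Hypothetical values; replace them with your own.
AGENT_IP="10.0.1.25"       # private IP of the DataSync agent (assumed)
SHARE_NAME="smbfolder"     # share name (assumed to match the directory name)
SHARE_PATH="/smbfolder"    # directory to be shared
SMB_USER="sambauser"       # Samba user allowed to connect

# Write the share stanza to a scratch file for review.
SNIPPET="$(mktemp)"
cat > "$SNIPPET" <<EOF
[$SHARE_NAME]
    path = $SHARE_PATH
    valid users = $SMB_USER
    read only = no
    browseable = yes
    hosts allow = 127.0.0.1 $AGENT_IP
EOF

# Show the generated stanza.
cat "$SNIPPET"

# After reviewing it, append it to the real config and validate the result:
#   sudo tee -a /etc/samba/smb.conf < "$SNIPPET" && testparm -s
```

`testparm -s` parses the configuration and reports syntax problems, which helps catch a mistyped option before restarting the service.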
# useradd sambauser
# passwd sambauser
# smbpasswd -a sambauser
# service smb restart
Create a directory for file share and give permissions
# mkdir /smbfolder
# chmod 777 /smbfolder
Restart Samba and test with the DataSync agent
# service smb restart
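Before involving the DataSync agent, it can help to confirm that the share answers at all. The sketch below uses `smbclient`, which was installed earlier with the samba-client package. The function name `list_share`, the `SMBCLIENT` override variable, and the example IP address are hypothetical.

```shell
# Set SMBCLIENT=echo to print the command instead of running it (dry run).
SMBCLIENT="${SMBCLIENT:-smbclient}"

# list_share <server-ip> <share-name> <user>
# Connects to the share and lists its contents; smbclient prompts for the password.
list_share() {
    "$SMBCLIENT" "//$1/$2" -U "$3" -c 'ls'
}

# Example call (hypothetical address):
#   list_share 10.0.2.15 smbfolder sambauser
```

If the listing works locally but fails from another host, the `hosts allow` setting and the security group rule for port 445 are the first things to check.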
Create a DataSync agent on EC2 in a VPC
You can choose your DataSync AMI using the below command
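One documented way to look up the current DataSync agent AMI for a region is to read the public SSM parameter `/aws/service/datasync/ami`. The sketch below wraps that lookup. The function name `latest_datasync_ami` and the `AWS_CLI` override variable are hypothetical, and the region in the comment is only an example.

```shell
# Assumption: the AWS CLI is installed and configured.
# Set AWS_CLI=echo to print the command instead of running it (dry run).
AWS_CLI="${AWS_CLI:-aws}"

# latest_datasync_ami <region>
# Prints the AMI ID stored in the DataSync SSM parameter for that region.
latest_datasync_ami() {
    "$AWS_CLI" ssm get-parameter \
        --name /aws/service/datasync/ami \
        --region "$1" \
        --query 'Parameter.Value' --output text
}

# Example call (example region):
#   latest_datasync_ami us-east-1
```

The returned AMI ID can then be used to launch the agent instance from the EC2 console or CLI.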
Create the agent by navigating to DataSync in the AWS Console
Enter the IP address or DNS name of the DataSync agent instance in the Agent address field
Create a Task
Choose location type as SMB server
Choose the DataSync agent you configured
Enter the IP address of the SMB server launched earlier
Enter the share name of the folder
Enter the user name and password of the SMB server user
Choose the destination location type as S3, EFS, or NFS as per your requirement
Give the task a name and verify the options selected in the review screen
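The console steps above have AWS CLI equivalents. The sketch below is only an illustration: the function names, the `AWS_CLI` override variable, and every ARN and address in the comments are hypothetical. It assumes an IAM role that lets DataSync write to the bucket already exists.

```shell
# Assumption: the AWS CLI is installed and configured.
# Set AWS_CLI=echo to print each command instead of running it (dry run).
AWS_CLI="${AWS_CLI:-aws}"

# create_smb_location <server-ip> <share-path> <user> <password> <agent-arn>
# Registers the Samba share as a DataSync source location.
create_smb_location() {
    "$AWS_CLI" datasync create-location-smb \
        --server-hostname "$1" --subdirectory "$2" \
        --user "$3" --password "$4" --agent-arns "$5"
}

# create_s3_location <bucket-arn> <bucket-access-role-arn> [storage-class]
# Registers the bucket as a destination; the storage class defaults to STANDARD.
create_s3_location() {
    "$AWS_CLI" datasync create-location-s3 \
        --s3-bucket-arn "$1" \
        --s3-config "BucketAccessRoleArn=$2" \
        --s3-storage-class "${3:-STANDARD}"
}

# create_task <source-location-arn> <destination-location-arn> <task-name>
create_task() {
    "$AWS_CLI" datasync create-task \
        --source-location-arn "$1" --destination-location-arn "$2" --name "$3"
}

# Example calls (hypothetical values):
#   create_smb_location 10.0.2.15 /smbfolder sambauser 'your-password' "$AGENT_ARN"
#   create_s3_location arn:aws:s3:::my-backup-bucket "$ROLE_ARN" STANDARD
#   create_task "$SRC_LOCATION_ARN" "$DST_LOCATION_ARN" smb-to-s3
```

Passing the password as a command-line argument leaves it in shell history, so for anything beyond a test setup it is safer to read it from a protected file or a secrets store.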
Start the transfer
Once the task is created, you can start transferring files using the Start option above.
Navigate to the S3 bucket, where you can now find the transferred files.
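Starting the task and checking the result can also be done from the AWS CLI. This is a sketch under assumptions: the function names `start_task` and `list_bucket`, the `AWS_CLI` override variable, and the bucket name in the comments are hypothetical.

```shell
# Assumption: the AWS CLI is installed and configured.
# Set AWS_CLI=echo to print each command instead of running it (dry run).
AWS_CLI="${AWS_CLI:-aws}"

# start_task <task-arn>
# Starts one execution of the task and prints the task execution ARN.
start_task() {
    "$AWS_CLI" datasync start-task-execution --task-arn "$1"
}

# list_bucket <bucket-name>
# Lists every object in the destination bucket.
list_bucket() {
    "$AWS_CLI" s3 ls "s3://$1" --recursive
}

# Example calls (hypothetical values):
#   start_task "$TASK_ARN"
#   list_bucket my-backup-bucket
```

Running `start_task` again later transfers only what has changed since the previous execution, as described earlier.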
DataSync helps create a reliable connection between your on-premises storage and AWS S3, EFS, or NFS servers to transfer data. When S3 is the destination, you can also choose the storage class in which the data is stored. By clicking Start on a task, you can transfer all the files or only the files that have not yet been transferred. You can also delete files in AWS storage by choosing the delete option, so that the destination is completely synchronized with your on-premises drives.
We here at CloudThat are an official AWS (Amazon Web Services) Advanced Consulting Partner and Training Partner and a Microsoft Gold Partner, helping people develop knowledge of the cloud and helping their businesses aim for higher goals using best-in-industry cloud computing practices and expertise. We are on a mission to build a robust cloud computing ecosystem by disseminating knowledge on technological intricacies within the cloud space. Our blogs, webinars, case studies, and white papers enable all the stakeholders in the cloud computing sphere.
Feel free to drop a comment or any queries that you have regarding AWS services, cloud adoption, or consulting, and we will get back to you quickly. To get started, go through our Expert Advisory page and Managed Services Package, which are CloudThat’s offerings.
From which file systems can we replicate our data to AWS?
Ans: On-premises storage locations such as NFS, SMB, and HDFS file systems can be used as sources. Amazon EFS, Amazon FSx, and Amazon S3 can also be used as source locations.
Can we change the storage class when choosing the destination location to AWS S3?
Ans: Yes. Once the AWS DataSync agent is set up, you can choose the storage class, such as Standard, Glacier, or Deep Archive, whenever you create a new task whose destination is S3.
Can we use CloudWatch to monitor DataSync tasks?
Ans: Yes, you can monitor the files copied by AWS DataSync through Amazon CloudWatch metrics.
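As an illustration of the CloudWatch answer above, the sketch below queries the `BytesTransferred` metric in the `AWS/DataSync` namespace for one task. The function name `task_bytes_transferred`, the `AWS_CLI` override variable, the 5-minute period, and the task ID and timestamps in the comment are assumptions.

```shell
# Assumption: the AWS CLI is installed and configured.
# Set AWS_CLI=echo to print the command instead of running it (dry run).
AWS_CLI="${AWS_CLI:-aws}"

# task_bytes_transferred <task-id> <start-time> <end-time>
# Times are ISO 8601 timestamps; results are summed per 5-minute period.
task_bytes_transferred() {
    "$AWS_CLI" cloudwatch get-metric-statistics \
        --namespace AWS/DataSync --metric-name BytesTransferred \
        --dimensions "Name=TaskId,Value=$1" \
        --start-time "$2" --end-time "$3" \
        --period 300 --statistics Sum
}

# Example call (hypothetical task ID and time window):
#   task_bytes_transferred task-0123456789abcdef0 2024-01-01T00:00:00Z 2024-01-01T01:00:00Z
```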