{"id":4252,"date":"2016-04-17T13:39:11","date_gmt":"2016-04-17T13:39:11","guid":{"rendered":"http:\/\/blog.cloudthat.com\/?p=4252"},"modified":"2024-06-25T11:12:43","modified_gmt":"2024-06-25T11:12:43","slug":"streaming-nginx-logs-to-s3-using-td-agent-and-kinesis-firehose","status":"publish","type":"blog","link":"https:\/\/www.cloudthat.com\/resources\/blog\/log-streaming-on-aws","title":{"rendered":"Log Streaming on AWS"},"content":{"rendered":"<p>Data is central to every business. It comprises the facts and statistics collected during business operations, and it can be used to measure or record business activities, whether internal or external, so collecting and managing it reliably is a basic requirement. In this post, we stream logs to an S3 bucket so that no log data is lost.<\/p>\n<h4><strong><span style=\"color: #666699;\">Kinesis Firehose:<\/span><\/strong><\/h4>\n<p>Kinesis Firehose captures data from web applications, sensors, mobile applications, and various other sources and streams it into Amazon S3 or Redshift. 
Kinesis Firehose takes care of monitoring, scaling, and data management.<br \/>\n[showhide type=&#8221;diagram&#8221;\u00a0more_text=&#8221;Show Diagram&#8221; less_text=&#8221;Hide Diagram&#8221; hidden=&#8221;yes&#8221;]<\/p>\n<p><a href=\"https:\/\/content.cloudthat.com\/resources\/wp-content\/uploads\/2022\/11\/kinesis1.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-4630\" src=\"https:\/\/content.cloudthat.com\/resources\/wp-content\/uploads\/2022\/11\/kinesis1.jpg\" alt=\"kinesis\" width=\"634\" height=\"158\" \/><\/a>[\/showhide]<\/p>\n<h4><strong>Fluentd:<\/strong><\/h4>\n<p>Fluentd is an open source data collector that unifies data collection and consumption, making the data easier to use and understand.<\/p>\n<h5><strong>Features<\/strong><\/h5>\n<ul>\n<li>Unified logging with JSON (structured logs)<\/li>\n<li>Pluggable architecture<\/li>\n<li>Supports memory and file-based buffering to prevent data loss<\/li>\n<\/ul>\n<h5><strong>td-agent<\/strong><\/h5>\n<p>td-agent is the stable, lightweight distribution of Fluentd that runs on the server hosting the data-generating application. td-agent is a data collection daemon. 
It collects data from various data sources and uploads it to the Treasure Data datastore.<br \/>\n[showhide type=&#8221;diagram2&#8243;\u00a0more_text=&#8221;Graphical Representation&#8230;&#8221; less_text=&#8221;Hide Image&#8221; hidden=&#8221;yes&#8221;]<br \/>\n<a href=\"https:\/\/content.cloudthat.com\/resources\/wp-content\/uploads\/2022\/11\/Selection_0061.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-4629\" src=\"https:\/\/content.cloudthat.com\/resources\/wp-content\/uploads\/2022\/11\/Selection_0061.png\" alt=\"Selection_006\" width=\"733\" height=\"371\" \/><\/a><br \/>\n[\/showhide]<\/p>\n<p>Here, we stream Nginx logs to S3 using td-agent (a logging tool) through Kinesis Firehose, a managed service for streaming data to S3 or Redshift.<\/p>\n<h4><strong>Prerequisites<\/strong><\/h4>\n<ul>\n<li style=\"list-style-type: none;\">\n<ul>\n<li>An EC2 instance with the IAM role &#8220;fluentd&#8221; attached to it<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>[showhide type=&#8221;fluent_iam_role&#8221;\u00a0more_text=&#8221;Click here for the role&#8221; less_text=&#8221;Hide Details&#8221; hidden=&#8221;yes&#8221;]<\/p>\n<p style=\"padding-left: 30px;\"><span style=\"text-align: justify; color: #808080;\">Role &#8220;<\/span>fluentd<span style=\"text-align: justify; color: #808080;\">&#8221; is as follows:<\/span><\/p>\n<pre class=\"lang:default decode:true\">{ \"Effect\":\"Allow\",\r\n  \"Action\":[ \"s3:Get*\",\r\n             \"s3:List*\",\r\n             \"s3:Put*\",\r\n             \"s3:Post*\"\r\n           ],\r\n  \"Resource\":[ \"arn:aws:s3:::YOUR_BUCKET_NAME\/logs\/*\",\r\n               \"arn:aws:s3:::YOUR_BUCKET_NAME\" ]\r\n}<\/pre>\n<p>[\/showhide]<\/p>\n<ul>\n<li>Install the Nginx server on the instance and start the service<\/li>\n<li>Create an S3 bucket 
(destination) to which the logs will be streamed<\/li>\n<\/ul>\n<h4><strong>Tasks to stream logs to the S3 bucket<\/strong><\/h4>\n<ol>\n<li>Create Kinesis Firehose Delivery Stream<\/li>\n<li>Install Fluentd and Plugin<\/li>\n<li>Configure the agent file<\/li>\n<li>Start Service<\/li>\n<li>Check the Operation<\/li>\n<\/ol>\n<h3><span style=\"color: #666699;\"><strong>Task 1: Creating Kinesis Firehose Delivery stream<\/strong><\/span><\/h3>\n<p>First, create a Kinesis Firehose delivery stream using the AWS Management Console. For more information, see <a href=\"https:\/\/aws.amazon.com\/kinesis\/firehose\/\">https:\/\/aws.amazon.com\/kinesis\/firehose\/<\/a><\/p>\n<p>[showhide type=&#8221;firehose&#8221;\u00a0more_text=&#8221;For Detailed Instructions..&#8221; less_text=&#8221;Hide Details&#8221; hidden=&#8221;yes&#8221;]<\/p>\n<p>&nbsp;<\/p>\n<h4><span style=\"color: #666699;\"><strong>Step 1: Select the Destination where you want to stream the logs<\/strong><\/span><\/h4>\n<p>When creating the Kinesis Firehose delivery stream, select the destination for the streamed data. 
The destination can be either an S3 bucket or Redshift.<\/p>\n<ul>\n<li style=\"list-style-type: none;\">\n<ul>\n<li>Select Amazon S3 as the destination from the drop-down list<\/li>\n<li>Provide your own delivery stream name<\/li>\n<li>Select the S3 bucket you created from the list<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p style=\"padding-left: 30px;\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-4254\" src=\"https:\/\/content.cloudthat.com\/resources\/wp-content\/uploads\/2022\/11\/129.png\" alt=\"12\" width=\"904\" height=\"509\" \/><\/p>\n<h4><span style=\"color: #666699;\"><strong>Step 2: Configuring Firehose<\/strong><\/span><\/h4>\n<p>Here we configure the buffer size, buffer interval, and compression options for the stream.<\/p>\n<p style=\"padding-left: 30px;\">Enter the Buffer size as 5 (MB) and the Buffer interval as 300 (seconds).<\/p>\n<p style=\"padding-left: 30px;\">Kinesis Firehose buffers incoming data until it reaches 5 MB or 300 seconds have elapsed, whichever comes first. <img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-4255 size-full\" src=\"https:\/\/content.cloudthat.com\/resources\/wp-content\/uploads\/2022\/11\/1331.png\" alt=\"133\" width=\"881\" height=\"472\" \/><\/p>\n<h4><span style=\"color: #666699;\"><strong>Step 3: Select IAM role \u201cfirehose_delivery_role\u201d<\/strong><\/span><\/h4>\n<div><\/div>\n<div>Firehose needs access to your S3 bucket, so you must provide an IAM role that grants access to the bucket. 
Firehose assumes that IAM role and gains access to the bucket.<\/div>\n<p style=\"padding-left: 30px;\"><a href=\"https:\/\/content.cloudthat.com\/resources\/wp-content\/uploads\/2022\/11\/Selection_003.png\">\u00a0<\/a> <img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-4257 size-full\" src=\"https:\/\/content.cloudthat.com\/resources\/wp-content\/uploads\/2022\/11\/Selection_003.png\" alt=\"Selection_003\" width=\"758\" height=\"523\" \/><\/p>\n<p>[\/showhide]<\/p>\n<h3><span style=\"color: #666699;\"><strong>Task 2: Install Fluentd<\/strong><\/span><\/h3>\n<p>Fluentd is available as a Ruby gem (<code style=\"color: #b12704;\">gem install fluentd<\/code>). Alternatively, <a style=\"color: #005b86;\" href=\"https:\/\/docs.treasuredata.com\/articles\/td-agent\">Treasure Data<\/a> packages it with all the dependencies as <code style=\"color: #b12704;\">td-agent<\/code>.<\/p>\n<p>[showhide type=&#8221;td-agent&#8221;\u00a0more_text=&#8221;For Detailed Instructions..&#8221; less_text=&#8221;Hide Details&#8221; hidden=&#8221;yes&#8221;]<\/p>\n<p>Here, we proceed with td-agent.<\/p>\n<h4><strong><span style=\"color: #666699;\">Install td-agent using the following command<\/span><\/strong><\/h4>\n<table style=\"height: 57px;\" width=\"622\">\n<tbody>\n<tr>\n<td width=\"624\">\n<pre class=\"lang:default decode:true\">curl -L https:\/\/toolbelt.treasuredata.com\/sh\/install-ubuntu-trusty-td-agent2.sh | sh<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<h4><span style=\"color: #666699;\"><strong>Install Plugin \u201cfluent-plugin-kinesis-firehose\u201d using the following command<\/strong><\/span><\/h4>\n<p>This is a Fluentd output plugin for Kinesis Firehose. 
It pushes the logs collected by td-agent to Kinesis Firehose, which delivers them to the destination.<\/p>\n<pre class=\"lang:default decode:true\">sudo td-agent-gem install fluent-plugin-kinesis-firehose<\/pre>\n<h4 style=\"padding-left: 30px;\"><a href=\"https:\/\/content.cloudthat.com\/resources\/wp-content\/uploads\/2022\/11\/Selection_002.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-4616\" src=\"https:\/\/content.cloudthat.com\/resources\/wp-content\/uploads\/2022\/11\/Selection_002.png\" alt=\"Selection_002\" width=\"740\" height=\"383\" \/><\/a><\/h4>\n<p>[\/showhide]<\/p>\n<h3><span style=\"color: #666699;\"><strong>Task 3: Configuring\u00a0td-agent.conf file<\/strong><a href=\"https:\/\/content.cloudthat.com\/resources\/wp-content\/uploads\/2022\/11\/Selection_006.png\"><span style=\"color: #666699;\">\u00a0<\/span><\/a><\/span><\/h3>\n<p>The configuration file is located at<strong> &#8220;\/etc\/td-agent\/td-agent.conf&#8221;<\/strong>.<\/p>\n<p>[showhide type=&#8221;agent_conf&#8221;\u00a0more_text=&#8221;For Detailed Instructions..&#8221; less_text=&#8221;Hide Details&#8221; hidden=&#8221;yes&#8221;]<br \/>\nCopy and paste the following contents into the file, provide your access key and secret key, and set the delivery stream name and region to match your delivery stream.<\/p>\n<p style=\"padding-left: 30px;\"><strong><a href=\"https:\/\/content.cloudthat.com\/resources\/wp-content\/uploads\/2022\/11\/Selection_008.png\">\u00a0<\/a><\/strong><\/p>\n<pre class=\"lang:default decode:true \">&lt;source&gt;\r\n    # Tail the Nginx access log and parse each line with the built-in nginx format\r\n    @type tail\r\n    format nginx\r\n    path \/var\/log\/nginx\/access.log\r\n    tag nginx.access\r\n&lt;\/source&gt;\r\n\r\n&lt;match **&gt;\r\n    # Forward every event to the Kinesis Firehose delivery stream\r\n    @type kinesis_firehose\r\n    delivery_stream_name incoming-stream\r\n\r\n    aws_key_id XXXXXXXXXXXX\r\n    aws_sec_key XXXXXXXXXXXX\r\n\r\n    region us-west-2\r\n\r\n    flush_interval 1s\r\n&lt;\/match&gt;<\/pre>\n<p>[\/showhide]<\/p>\n<h3><span style=\"color: #666699;\"><strong>Task 4: Start service td-agent<\/strong><\/span><\/h3>\n<p>We now have to start the agent service.<br 
\/>\n[showhide type=&#8221;start_agent&#8221;\u00a0more_text=&#8221;For Detailed Instructions..&#8221; less_text=&#8221;Hide Details&#8221; hidden=&#8221;yes&#8221;]<br \/>\nStart the service using the following command:<\/p>\n<pre class=\"lang:default decode:true\">sudo \/etc\/init.d\/td-agent start<\/pre>\n<p>Use the command below to generate logs. It runs ApacheBench (ab), a simple load-testing tool, and sends 1000 requests with 10 requests running concurrently.<\/p>\n<pre class=\"lang:default decode:true\">ab -n 1000 -c 10 http:\/\/localhost\/\r\n<\/pre>\n<p style=\"padding-left: 30px;\"><a href=\"https:\/\/content.cloudthat.com\/resources\/wp-content\/uploads\/2022\/11\/Selection_0041.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-4618\" src=\"https:\/\/content.cloudthat.com\/resources\/wp-content\/uploads\/2022\/11\/Selection_0041.png\" alt=\"Selection_004\" width=\"652\" height=\"413\" \/><\/a><\/p>\n<p>[\/showhide]<\/p>\n<h3><span style=\"color: #666699;\"><strong>Task 5: Operation Check<\/strong><\/span><\/h3>\n<p>Make sure that the logs are being streamed to the S3 bucket.<\/p>\n<p>[showhide type=&#8221;diagram3&#8243;\u00a0more_text=&#8221;Show Image&#8221; less_text=&#8221;Hide Details&#8221; hidden=&#8221;yes&#8221;]<br \/>\n<a href=\"https:\/\/content.cloudthat.com\/resources\/wp-content\/uploads\/2022\/11\/Selection_007.png\">\u00a0<\/a> <img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-4262 size-full\" src=\"https:\/\/content.cloudthat.com\/resources\/wp-content\/uploads\/2022\/11\/Selection_007.png\" alt=\"Selection_007\" width=\"858\" height=\"169\" \/>[\/showhide]<\/p>\n<p><strong>NOTE:<\/strong> It might take around 10 minutes for data to appear in your bucket due to buffering. 
Make sure the role is attached to the instance so that Fluentd has access to write data to the bucket.<\/p>\n<p>We have configured the td-agent.conf file to collect access logs from the Nginx server at \/var\/log\/nginx\/access.log and send them to Kinesis Firehose, which in turn streams them to the S3 bucket, where they can be used for other purposes.<\/p>\n","protected":false},"author":219,"featured_media":0,"parent":0,"comment_status":"open","ping_status":"open","template":"","blog_category":[3607],"user_email":"prarthitm@cloudthat.com","published_by":"324","primary-authors":"","secondary-authors":"","acf":[],"_links":{"self":[{"href":"https:\/\/www.cloudthat.com\/resources\/wp-json\/wp\/v2\/blog\/4252"}],"collection":[{"href":"https:\/\/www.cloudthat.com\/resources\/wp-json\/wp\/v2\/blog"}],"about":[{"href":"https:\/\/www.cloudthat.com\/resources\/wp-json\/wp\/v2\/types\/blog"}],"author":[{"embeddable":true,"href":"https:\/\/www.cloudthat.com\/resources\/wp-json\/wp\/v2\/users\/219"}],"replies":[{"embeddable":true,"href":"https:\/\/www.cloudthat.com\/resources\/wp-json\/wp\/v2\/comments?post=4252"}],"version-history":[{"count":1,"href":"https:\/\/www.cloudthat.com\/resources\/wp-json\/wp\/v2\/blog\/4252\/revisions"}],"predecessor-version":[{"id":45773,"href":"https:\/\/www.cloudthat.com\/resources\/wp-json\/wp\/v2\/blog\/4252\/revisions\/45773"}],"wp:attachment":[{"href":"https:\/\/www.cloudthat.com\/resources\/wp-json\/wp\/v2\/media?parent=4252"}],"wp:term":[{"taxonomy":"blog_category","embeddable":true,"href":"https:\/\/www.cloudthat.com\/resources\/wp-json\/wp\/v2\/blog_category?post=4252"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}