{"id":4003,"date":"2016-06-01T12:48:21","date_gmt":"2016-06-01T12:48:21","guid":{"rendered":"http:\/\/blog.cloudthat.com\/?p=4003"},"modified":"2024-06-25T11:12:35","modified_gmt":"2024-06-25T11:12:35","slug":"kinesis-firehose-bridle-path-to-stream-data","status":"publish","type":"blog","link":"https:\/\/www.cloudthat.com\/resources\/blog\/getting-started-with-kinesis-firehose","title":{"rendered":"Getting Started with Kinesis Firehose"},"content":{"rendered":"<p>In today\u2019s fast-growing world, a huge amount of data is produced by all kinds of sources everywhere: machine logs, traffic signals, IoT devices, smart devices installed in homes and IT companies, and many others. Once this vast amount of data has been produced, the next problem is storing, configuring, managing, and streaming it.<\/p>\n<p><strong>How do we manage data that occupies storage, consumes compute power, and must be analyzed to support decision making?<\/strong><\/p>\n<p>AWS has a solution: Amazon Kinesis Streams is the service to use for streaming data.<\/p>\n<p>Kinesis Streams collects data from the source and streams it to applications for further analysis. The data is replicated across <a href=\"https:\/\/docs.aws.amazon.com\/AWSEC2\/latest\/UserGuide\/using-regions-availability-zones.html\">Availability Zones<\/a> for high availability and reliability. It can scale based on the incoming data. 
It can handle anywhere from megabytes to terabytes of streaming data. Producers put data into a stream over HTTPS through the Kinesis API, or with the Kinesis Producer Library (KPL) or the Kinesis Agent; consumer applications typically read it with the Kinesis Client Library (KCL). By default, Kinesis Streams retains data for 24 hours, and the retention period can be extended up to 7 days.<\/p>\n<p>Kinesis Streams solves the problems of analysis, compute power, and decision making. However, we still have the problem of storing the data, because Kinesis Streams keeps data for only 24 hours by default and for at most 7 days.<\/p>\n<p>What if we need to store the data for longer?<\/p>\n<p>What if we need to access the data later for another set of tasks?<\/p>\n<p>This is where Kinesis Firehose comes into the picture: AWS introduced a new service called <strong>Kinesis Firehose<\/strong>.<\/p>\n<h3><b>Kinesis Firehose<\/b><\/h3>\n<p>Firehose is an easier way to stream data than Kinesis Streams. It takes care of monitoring, scaling, and data management, and it provides data security. This blog will take you through Kinesis Firehose inside and out.<\/p>\n<p>Kinesis Firehose captures data from web apps, sensors, mobile applications, and various other sources, and streams it into Amazon S3, Amazon Redshift, and\/or Amazon Elasticsearch.<\/p>\n<p><a href=\"https:\/\/content.cloudthat.com\/resources\/wp-content\/uploads\/2022\/11\/kinesis2.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-4827\" src=\"https:\/\/content.cloudthat.com\/resources\/wp-content\/uploads\/2022\/11\/kinesis2.jpg\" alt=\"kinesis\" width=\"634\" height=\"158\" \/><\/a><\/p>\n<p>It loads massive volumes of streaming data into Amazon S3 and Amazon Redshift.<\/p>\n<p>It is a fully managed service that automatically scales with your data and requires no ongoing administration. 
It can also batch, compress, and encrypt data before loading it into S3 or Redshift, which minimizes the storage used at the destination and increases security.<\/p>\n<p>&nbsp;<\/p>\n<h3><b>Kinesis Firehose vs Kinesis Streams<\/b><\/h3>\n<h3 style=\"text-align: left;\"><b><a href=\"https:\/\/content.cloudthat.com\/resources\/wp-content\/uploads\/2022\/11\/149.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-4838\" src=\"https:\/\/content.cloudthat.com\/resources\/wp-content\/uploads\/2022\/11\/149.png\" alt=\"1\" width=\"644\" height=\"417\" \/><\/a><\/b><\/h3>\n<h3 style=\"text-align: left;\"><b>Key Concepts<\/b><\/h3>\n<h4><b>Delivery Stream<\/b><\/h4>\n<p>A delivery stream is the underlying entity of Firehose: a stream of data records. You create a delivery stream, your data producers send records to it, and Firehose delivers those records to the configured destination, such as S3 or Redshift.<\/p>\n<p>You can create a delivery stream using the Firehose console or the CreateDeliveryStream API.<\/p>\n<h4><b>Record<\/b><\/h4>\n<p>A record is a unit of data that a data producer sends to a delivery stream. Each record is a data blob (binary data) of up to 1,000 KB before base64 encoding.<\/p>\n<h4><b>Destination<\/b><\/h4>\n<p>The destination is the data store to which your data is delivered, such as Amazon S3, Amazon Redshift, or Amazon Elasticsearch.<\/p>\n<p>&nbsp;<\/p>\n<blockquote>\n<h3><b>Features of Kinesis Firehose<\/b><\/h3>\n<\/blockquote>\n<ul>\n<li>\n<h5><strong>Zero Administration<\/strong><\/h5>\n<\/li>\n<\/ul>\n<p>Kinesis Firehose takes care of the infrastructure, storage, networking, and configuration needed to load data into S3 and Redshift. 
There\u2019s no need to worry about provisioning, deploying, or maintaining hardware or software to manage the process.<\/p>\n<ul>\n<li>\n<h5><b style=\"line-height: 18px;\">Scales Elastically<\/b><\/h5>\n<\/li>\n<\/ul>\n<address>Firehose scales automatically to match the volume of incoming data. It can handle hundreds of gigabytes of data from thousands of sources simultaneously.<\/address>\n<ul>\n<li>\n<h5><b style=\"line-height: 18px;\">Supports Multiple Destinations<\/b><\/h5>\n<\/li>\n<\/ul>\n<address>Firehose can deliver data to Amazon S3, Amazon Redshift, and Amazon Elasticsearch. It also replicates data across multiple Availability Zones within an AWS Region to provide high availability and durability.<\/address>\n<ul>\n<li>\n<h5><b style=\"line-height: 18px;\">Pricing<\/b><\/h5>\n<\/li>\n<\/ul>\n<p>You pay only for the amount of data transmitted through the service. There are no minimum fees or upfront commitments.<\/p>\n<ul>\n<li>\n<h5><b style=\"line-height: 18px;\">Buffering<\/b><\/h5>\n<\/li>\n<\/ul>\n<p>Kinesis Firehose buffers incoming data up to a configured size or for a configured time interval. As soon as either condition is met, it delivers the buffered data to the destination.<\/p>\n<ul>\n<li>\n<h5><b style=\"line-height: 18px;\">Encryption<\/b><\/h5>\n<\/li>\n<\/ul>\n<p>Firehose provides a high level of data security: it can encrypt data automatically before delivering it to the destination.<\/p>\n<p>&nbsp;<\/p>\n<blockquote>\n<h3><b>Configuration Management<\/b><\/h3>\n<\/blockquote>\n<ul>\n<li>\n<h5><b>Buffer Size and Buffer Interval<\/b><\/h5>\n<\/li>\n<\/ul>\n<p>Firehose buffers incoming data before delivering it to the destination. Buffer size is specified in MB and buffer interval in seconds.<\/p>\n<p>For delivery to Amazon S3, choose a buffer size (1 &#8211; 128 MB) and a buffer interval (60 &#8211; 900 seconds).<\/p>\n<ul>\n<li>\n<h5><b>Data Compression<\/b><\/h5>\n<\/li>\n<\/ul>\n<p>Data compression reduces the number of bits needed to store the same amount of data. 
Firehose supports three compression formats (GZIP, ZIP, and Snappy), or you can choose no compression.<\/p>\n<ul>\n<li>\n<h5><b>Data Encryption<\/b><\/h5>\n<\/li>\n<\/ul>\n<p>Choose whether to encrypt data with a key from AWS Key Management Service (KMS) or to leave it unencrypted.<\/p>\n<h3><b>What Kinesis Firehose Makes Simpler<\/b><\/h3>\n<p>Firehose does not interpret the raw data; you simply create a delivery stream and write data records to it.<\/p>\n<ul>\n<li>Firehose compresses and encrypts the data as configured and then delivers it to the specified S3 bucket<\/li>\n<li>You control the buffer size and buffer interval of the stream<\/li>\n<li>You can add a delimiter (for example, a newline) to each record so that records can be separated at the destination<\/li>\n<\/ul>\n<h2><b>Steps to Create a Delivery Stream<\/b><\/h2>\n<ul>\n<li style=\"text-align: justify;\">Select Amazon S3 or Amazon Redshift as the destination, based on your requirements<\/li>\n<li style=\"text-align: justify;\">Enter a name for the delivery stream<\/li>\n<li style=\"text-align: justify;\">Select the S3 bucket from the list<\/li>\n<li style=\"text-align: justify;\">Create an IAM role that allows Kinesis Firehose to deliver data to the destination<\/li>\n<\/ul>\n<h3><a href=\"https:\/\/content.cloudthat.com\/resources\/wp-content\/uploads\/2022\/11\/destination.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-4830\" src=\"https:\/\/content.cloudthat.com\/resources\/wp-content\/uploads\/2022\/11\/destination.png\" alt=\"destination\" width=\"480\" height=\"291\" \/><\/a><\/h3>\n<h3><b>Configuring Firehose<\/b><\/h3>\n<p>Configure the buffer and compression options:<\/p>\n<ul>\n<li>Select a buffer size between 1 MB and 128 MB<\/li>\n<li>Select a buffer interval between 60 and 900 seconds<\/li>\n<li>Select a data compression format, or choose Uncompressed<\/li>\n<\/ul>\n<p><a href=\"https:\/\/content.cloudthat.com\/resources\/wp-content\/uploads\/2022\/11\/configuration.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full 
wp-image-4831\" src=\"https:\/\/content.cloudthat.com\/resources\/wp-content\/uploads\/2022\/11\/configuration.png\" alt=\"configuration\" width=\"507\" height=\"340\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>In this blog, we saw what Kinesis Firehose is, why it is needed, where to use it, and how to configure it. In my next blog, we will see how to use Firehose to analyze application logs.<\/p>\n","protected":false},"author":219,"featured_media":0,"parent":0,"comment_status":"open","ping_status":"open","template":"","blog_category":[3607],"user_email":"prarthitm@cloudthat.com","published_by":"324","primary-authors":"","secondary-authors":"","acf":[],"_links":{"self":[{"href":"https:\/\/www.cloudthat.com\/resources\/wp-json\/wp\/v2\/blog\/4003"}],"collection":[{"href":"https:\/\/www.cloudthat.com\/resources\/wp-json\/wp\/v2\/blog"}],"about":[{"href":"https:\/\/www.cloudthat.com\/resources\/wp-json\/wp\/v2\/types\/blog"}],"author":[{"embeddable":true,"href":"https:\/\/www.cloudthat.com\/resources\/wp-json\/wp\/v2\/users\/219"}],"replies":[{"embeddable":true,"href":"https:\/\/www.cloudthat.com\/resources\/wp-json\/wp\/v2\/comments?post=4003"}],"version-history":[{"count":2,"href":"https:\/\/www.cloudthat.com\/resources\/wp-json\/wp\/v2\/blog\/4003\/revisions"}],"predecessor-version":[{"id":43278,"href":"https:\/\/www.cloudthat.com\/resources\/wp-json\/wp\/v2\/blog\/4003\/revisions\/43278"}],"wp:attachment":[{"href":"https:\/\/www.cloudthat.com\/resources\/wp-json\/wp\/v2\/media?parent=4003"}],"wp:term":[{"taxonomy":"blog_category","embeddable":true,"href":"https:\/\/www.cloudthat.com\/resources\/wp-json\/wp\/v2\/blog_category?post=4003"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}