What is Big data?
It is definitely the most talked-about ‘new kid on the block’ in the analytics fraternity. Everyone seems to be talking about it. So, what exactly is Big data?
Big data consists of data sets that grow so large that they become difficult or awkward to work with using existing database management systems (Oracle, Sybase, MySQL, Teradata etc.). The difficulties include capturing, storing, searching and sharing the data, as well as mining it for analytics. With an explosion in sources of data (internet forms, cookies, sensors, mobile applications, satellite data etc.), the volume of data is growing and will continue to grow at an astronomical pace. At the same time, the cost of storing data is falling exponentially. The cost of a 4 GB pen drive is now 10% of what it was a couple of years ago. Together, these two trends will fuel the growth in the volume of data that we will have access to.
The world’s technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s (a little over every three years). Some people say it is now going to double every 1.5 years. Every day, 2.5 quintillion bytes of data are created.
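To put the doubling claim in numbers: if capacity doubles every d months, then after t months it has grown by a factor of 2^(t/d). The Python sketch below assumes a smooth exponential trend (real growth is lumpier) and also converts the daily 2.5 quintillion bytes into exabytes, assuming decimal SI units (1 EB = 10^18 bytes). The function names are hypothetical, chosen only for illustration.

```python
# Back-of-the-envelope sketch, assuming smooth exponential growth.
# Function names are hypothetical, for illustration only.


def growth_factor(months: float, doubling_period_months: float) -> float:
    """Growth factor after `months` if capacity doubles every
    `doubling_period_months` months: 2 ** (months / doubling_period_months)."""
    return 2 ** (months / doubling_period_months)


def bytes_to_exabytes(n_bytes: float) -> float:
    """Convert bytes to exabytes using decimal SI units (1 EB = 10**18 bytes)."""
    return n_bytes / 10**18


if __name__ == "__main__":
    # Compare a 40-month doubling period with an 18-month (1.5-year) one
    # over a decade (120 months).
    for period in (40, 18):
        factor = growth_factor(120, period)
        print(f"doubling every {period} months -> x{factor:.1f} in 10 years")

    daily_bytes = 2.5e18  # "2.5 quintillion bytes" created per day
    print(f"{bytes_to_exabytes(daily_bytes):.1f} EB per day")
```

Over ten years a 40-month doubling period gives an 8x increase, while an 18-month period gives roughly 100x, so the assumed doubling rate matters by more than an order of magnitude.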
As you can imagine, these new sources will mostly produce non-relational data, stored in non-relational DBMSs. Thus, transactional data within an organization will, by necessity, stay in the traditional relational database, while this ‘other’ data is where the maximum growth will be. This ‘other’ data will need to be mined and put into MIS and reports, analyzed for trends and used to create probability equations.
This ‘other’ data is generally called Big data.
Connecting Big Data
(Source: https://connollyshaun.blogspot.in/)
As systems and processes for capturing and storing Big Data stabilize and mature, the focus is shifting to ‘WHAT NEXT?’
Logically, the next step is to mine the data for information – business intelligence and Analytics.
I will specifically look at Apache Hadoop in this context. Hadoop MapReduce is a software framework for easily writing applications that process vast amounts of data (multi-terabyte datasets) in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.
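The idea behind MapReduce can be illustrated with the classic word-count example: a map step emits (key, value) pairs, the framework groups the pairs by key (the shuffle), and a reduce step combines each group. The Python sketch below runs all three phases in a single process, so it shows only the programming model, not Hadoop's API; real Hadoop jobs are typically written in Java against the `org.apache.hadoop.mapreduce` package and run across a cluster. The helper names are hypothetical.

```python
# Single-process simulation of the MapReduce programming model (word count).
# This is NOT Hadoop's API; helper names are hypothetical.
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all emitted values by key, which the framework
    does between the map and reduce phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # On a cluster, each input split would be mapped on a different node in
    # parallel; here the mappers simply run one after another.
    emitted = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(emitted)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())


if __name__ == "__main__":
    docs = ["Big data is big", "Hadoop processes big data"]
    print(word_count(docs))
```

For the two sample lines this prints `{'big': 3, 'data': 2, 'hadoop': 1, 'is': 1, 'processes': 1}`. Because the mapper and reducer only see one line or one key at a time, the framework is free to spread them across many machines, which is what makes the model scale.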
Many vendors have caught the Hadoop bug and released their own versions of the software, such as Cloudera, Hortonworks and Microsoft (with HDInsight).
Cloudera’s Hadoop
Sounds simple, but for a data analyst with no Java coding skills, it is all Latin and Greek. That is, until you take a look at Pig, a high-level platform for creating MapReduce programs used with Hadoop. The language for this platform is called Pig Latin. It abstracts the programming from the Java MapReduce idiom into a higher-level notation, much as SQL does for RDBMS systems. Because a high-level language is much closer to natural language and spoken English, it is far friendlier for the non-coder.
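Pig Latin scripts are built from dataflow operators such as FILTER, GROUP ... BY and FOREACH ... GENERATE, which Pig compiles into MapReduce jobs. The Python sketch below is only an analogue of that style, not Pig itself: the analyst declares what to filter, group and aggregate, and never writes a mapper or reducer by hand. The transaction records, field names and helper functions are all hypothetical.

```python
# Python analogue of a Pig Latin-style dataflow (not Pig itself):
# filter -> group -> aggregate, each step declared rather than hand-coded
# as map and reduce functions. Records, fields and helpers are hypothetical.
from collections import defaultdict
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def filter_by(records: List[Record], predicate: Callable[[Record], bool]) -> List[Record]:
    """Like Pig's FILTER: keep only the records that satisfy the predicate."""
    return [r for r in records if predicate(r)]


def group_by(records: List[Record], field: str) -> Dict[Any, List[Record]]:
    """Like Pig's GROUP ... BY: collect the records that share a field value."""
    groups: Dict[Any, List[Record]] = defaultdict(list)
    for r in records:
        groups[r[field]].append(r)
    return dict(groups)


def sum_per_group(groups: Dict[Any, List[Record]], field: str) -> Dict[Any, float]:
    """Like FOREACH ... GENERATE group, SUM(...): one total per group."""
    return {key: sum(float(r[field]) for r in rows) for key, rows in groups.items()}


if __name__ == "__main__":
    transactions = [
        {"city": "Mumbai", "amount": 1200.0, "status": "ok"},
        {"city": "Pune", "amount": 300.0, "status": "ok"},
        {"city": "Mumbai", "amount": 800.0, "status": "ok"},
        {"city": "Pune", "amount": 50.0, "status": "fraud"},
    ]
    valid = filter_by(transactions, lambda r: r["status"] == "ok")
    print(sum_per_group(group_by(valid, "city"), "amount"))
```

This prints `{'Mumbai': 2000.0, 'Pune': 300.0}`. The point of the design is that each step reads like a sentence about the data, which is the kind of abstraction Pig Latin gives analysts over raw MapReduce.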
Wow, this makes life so much better for data analysts! It also lets us analysts look forward to many more projects in which we effectively crunch ‘Big Data’.
One competitor to Hadoop is Google’s BigQuery, and a comparison of these two giants is a story for another day. But in the real world out there, Hadoop is currently the favorite Big Data management and analysis system.
Interesting aside: Apache Hadoop is an open-source software framework that supports data-intensive distributed applications. Hadoop is written in the Java programming language. It was created by Doug Cutting and Mike Cafarella, and Doug named it after his son’s toy elephant! Many parts of the Hadoop ecosystem have names commonly found in a zoo.
Since 2002, Subhashini has built a decade of experience across Analytics roles in Retail Finance and Banking. These roles have spanned Risk Management, Collections Strategy, Fraud Control and Marketing at GE Money, Standard Chartered Bank, Tata Motors Finance and Citi GDM. Her area of interest is integrating the results and outputs of Analytics with business decisions, both tactical and strategic.
She is currently active in the Analytics Training and Consulting arena.
(Link to LinkedIn profile: https://in.linkedin.com/pub/subhashini-s-tripathi/3/405/77b)

WRITTEN BY CloudThat
CloudThat is a leading provider of cloud training and consulting services, empowering individuals and organizations to leverage the full potential of cloud computing. With a commitment to delivering cutting-edge expertise, CloudThat equips professionals with the skills needed to thrive in the digital era.