What is Big Data? What does Big Data mean?
What is Big Data?
=================
✏️Some say that when a dataset is large (or its volume is huge), the data is termed "Big Data".
✏️Some say that when the data cannot be stored on a laptop, it is termed "Big Data".
✏️There are many such informal definitions of "Big Data".
✍️FORMAL DEFINITION GIVEN BY IBM
==============================
✏️Any data that is characterized by the 3 V's is termed "Big Data".
✍️They are:
1) Volume
2) Variety
3) Velocity
1) Volume
=========
✏️The volume of data should be large, on the order of terabytes or petabytes.
✏️A single system (machine) is incapable of handling it.
Example
=======
✏️Facebook users upload more than 900 million photos a day. That is a huge volume of data, and a traditional single-machine system is incapable of handling it.
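✏️To get a rough feel for that volume, here is a back-of-the-envelope sketch. The 900 million photos a day comes from the example above. The average photo size of 2 MB is purely an assumption for illustration, not a Facebook figure, and the function name is made up for this sketch.

```python
# Rough daily storage estimate for the Facebook photo example.
PHOTOS_PER_DAY = 900_000_000   # figure quoted in the example above
AVG_PHOTO_SIZE_MB = 2          # ASSUMPTION: hypothetical average size, not a real Facebook number


def daily_volume_tb(photos_per_day: int, avg_size_mb: float) -> float:
    """Return the daily upload volume in terabytes (decimal units: 1 TB = 1,000,000 MB)."""
    total_mb = photos_per_day * avg_size_mb
    return total_mb / 1_000_000


if __name__ == "__main__":
    tb = daily_volume_tb(PHOTOS_PER_DAY, AVG_PHOTO_SIZE_MB)
    print(f"{tb:,.0f} TB per day (about {tb / 1000:.1f} PB)")
```

✏️Under that assumed photo size, the daily uploads alone land in the petabyte range, which is why a single machine is not enough.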
2) Variety
==========
✏️Data can be of any type:
a) Structured (RDBMS databases such as Oracle and MySQL)
b) Semi-structured (CSV, XML, JSON)
c) Unstructured (audio, video, images, log files)
✏️Unlike a traditional DBMS, where data always arrives in a structured form, "Big Data" can be of any of the types listed above.
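✏️The difference between the three types is easier to see side by side. The sketch below is only an illustration: the table, the JSON document, and the log line format are all hypothetical, and an in-memory SQLite database stands in for an RDBMS such as Oracle or MySQL.

```python
import json
import re
import sqlite3

# Structured: a fixed schema is declared up front, and every row must fit it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("Asha", 31))
row = conn.execute("SELECT name, age FROM users").fetchone()
conn.close()

# Semi-structured: JSON carries its own field names and may nest or omit fields.
json_text = '{"name": "Asha", "age": 31, "tags": ["admin"]}'
json_record = json.loads(json_text)

# Unstructured: a free-form log line; we have to impose a pattern ourselves.
log_line = "2024-01-01 10:00:00 ERROR user=Asha login failed"
match = re.search(r"user=(\w+)", log_line)
log_user = match.group(1) if match else None

print(row, json_record["tags"], log_user)
```

✏️The same person shows up in all three, but the amount of work needed to get a usable field out of the data grows as the structure disappears.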
3) Velocity
===========
✏️The speed at which data arrives is termed "Velocity".
✏️In simple words, "Velocity" is the speed at which data is ingested, processed, and retrieved (the response).
Example
===========
✏️Remember our Facebook example? 250 billion images may seem like a lot. But if you want your mind blown, consider this: Facebook users upload more than 900 million photos a day.
✏️A day. So that 250 billion number from last year will seem like a drop in the bucket in a few months.
✏️Velocity is the measure of how fast the data is coming in.
✏️Facebook has to handle a tsunami of photographs every day.
✏️It has to ingest it all, process it, file it, and somehow, later, be able to retrieve it.
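✏️The two numbers in this example can be turned into a rate with simple arithmetic. This is a minimal sketch that uses only the figures quoted above and assumes uploads are spread evenly over the day, which real traffic would not be.

```python
# Convert the quoted Facebook figures into rates.
PHOTOS_PER_DAY = 900_000_000       # daily uploads quoted in the example above
STORED_IMAGES = 250_000_000_000    # the 250 billion figure quoted in the example above
SECONDS_PER_DAY = 24 * 60 * 60

# ASSUMPTION: uploads arrive at a constant rate throughout the day.
photos_per_second = PHOTOS_PER_DAY / SECONDS_PER_DAY

# How long it takes, at that pace, to upload as many photos as the stored total.
days_to_match_total = STORED_IMAGES / PHOTOS_PER_DAY

print(f"{photos_per_second:,.0f} photos per second")
print(f"{days_to_match_total:.0f} days to upload another 250 billion")
```

✏️That works out to roughly ten thousand photos arriving every second, each of which has to be ingested, processed, and stored.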
✍️According to IBM's formal definition, a dataset with the above 3 characteristics is termed "Big Data".
✏️Some definitions extend this to 4 V's, 5 V's, and so on.
✍️The 4 V's are the most relevant extension, so let's talk about the 4th V:
======================
✏️The first 3 V's are the same as discussed above, and the 4th V is "Veracity".
✍️Veracity:
==========
✏️The quality of the data being analyzed is termed "Veracity".
✍️Low veracity:
============
✏️Low-veracity data is meaningless or of poor quality, for example:
a) many null values
b) a negative value for age
✏️Such low-veracity data does not contribute any meaningful insight.
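✏️Both low-veracity symptoms listed above (null values and negative ages) are easy to check for. The sketch below is one possible way to do it, not a standard method. The sample records and the function name `veracity_report` are hypothetical, and `None` stands in for a null value.

```python
# Hypothetical sample records; None stands in for a null value.
records = [
    {"name": "Asha", "age": 31},
    {"name": None, "age": 45},
    {"name": "Ravi", "age": -7},
    {"name": "Meena", "age": None},
]


def veracity_report(rows):
    """Count rows with null values, rows with a negative age, and fully clean rows."""
    null_rows = sum(1 for r in rows if any(v is None for v in r.values()))
    negative_age_rows = sum(
        1 for r in rows if r["age"] is not None and r["age"] < 0
    )
    clean_rows = sum(
        1 for r in rows
        if all(v is not None for v in r.values()) and r["age"] >= 0
    )
    return {
        "null_rows": null_rows,
        "negative_age_rows": negative_age_rows,
        "clean_rows": clean_rows,
    }


print(veracity_report(records))
```

✏️In this made-up sample only one of the four records is usable as it stands, which is what low veracity looks like in practice.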
✍️High veracity:
=============
✏️On the other hand, high-veracity data contributes meaningful insights.
✍️Example:
=======
a) A high-veracity dataset would be data from a medical experiment or trial, which gives us meaningful insight into the experiment.