When I talk to NoSQL enthusiasts, I often hear that NoSQL
databases can handle unstructured data. Hadoop and Big Data devotees
echo similar arguments. Are these
people technically correct, or are they just using marketing hype to influence IT
decision makers who are business savvy but dependent on others for technical judgment?
In my view, there is no such thing as unstructured data.
NoSQL, Hadoop and Big Data enthusiasts label any dataset that does not
fit in a relational database as unstructured data. What do you think?
In the context of data, two attributes
define complexity. The first is the relationships among objects (objects are
equivalent to tables in a relational database). The second is whether the number
of elements in an object (elements are equivalent to columns in a table) varies. With
respect to these two parameters, there are four possible combinations:
* Both the number of elements in objects and the relationships
among objects are fixed; they do not change over time.
a. The number of elements in objects is fixed, and the
relationships among objects are simple and can be described using relational algebra.
This type of data is a prime candidate for a relational database.
b. The number of elements in objects is fixed, but the
relationships among objects are not simple and are difficult or nearly impossible to
describe using relational algebra. For example, if the relationships among objects
mimic a graph structure, then a graph database (e.g. Neo4j) is a better choice than a relational or any
other type of database.
* The number of elements in objects varies on an ad
hoc basis. Irrespective of the complexity of the relationships among objects, a
relational database is not the solution. You need a database that can
accommodate a varying number of elements in
objects, such as MongoDB.
* The number of elements in objects is fixed, but the
relationships among objects vary on an ad hoc basis. Again, a relational database
is not the solution. You should explore HBase
or MongoDB for this scenario.
* Both the number of elements in objects and the
relationships among objects vary on an ad hoc basis. Yep, you guessed
correctly: a relational database is not part of the solution. For this
scenario you can explore HBase or MongoDB.
In the above discussion, I have not considered the volume of data.
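The four combinations above can be written down as a small decision function. This is only a sketch of the reasoning in this post, not a selection tool: the function name `suggest_store` and its three boolean parameters are made up for illustration, and, as in the discussion above, data volume is ignored.

```python
def suggest_store(elements_fixed: bool,
                  relationships_fixed: bool,
                  relational_algebra_fits: bool = True) -> str:
    """Map the two complexity attributes to the kind of database suggested above.

    All three parameters are hypothetical names chosen for this sketch.
    """
    if elements_fixed and relationships_fixed:
        # Case 1a: simple relationships; case 1b: graph-like relationships.
        if relational_algebra_fits:
            return "relational database"
        return "graph database (e.g. Neo4j)"
    if not elements_fixed and relationships_fixed:
        # Case 2: objects gain or lose elements on an ad hoc basis.
        return "MongoDB"
    # Cases 3 and 4: relationships vary, with fixed or varying elements.
    return "HBase or MongoDB"


if __name__ == "__main__":
    for args in [(True, True, True), (True, True, False),
                 (False, True), (True, False), (False, False)]:
        print(args, "->", suggest_store(*args))
```

Note that none of the branches returns "unstructured": every combination still describes a structure, just not always a relational one.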
In truly unstructured data, the structure of the data is not
definable. If one cannot define a structure, then the structure does not exist from
a programming perspective.
There is no unstructured data. Data has structure; we may
just not have been able to discover or comprehend it yet.
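As a rough illustration of "discovering" structure, here is a minimal sketch that takes records with a varying number of elements and reports which fields occur and how often. The sample records and the function name `discover_structure` are invented for this example; real datasets would need far more than counting keys.

```python
from collections import Counter


def discover_structure(records):
    """Return each field name with the fraction of records that contain it."""
    if not records:
        return {}
    counts = Counter()
    for record in records:
        counts.update(record.keys())
    return {field: count / len(records) for field, count in counts.items()}


# Made-up records whose number of elements varies from one object to the next.
records = [
    {"name": "Ann", "email": "ann@example.com"},
    {"name": "Bob", "phone": "555-0100"},
    {"name": "Cy", "email": "cy@example.com", "phone": "555-0101"},
    {"name": "Di"},
]

structure = discover_structure(records)
print(structure)
```

Even though no two neighbouring records here share the same shape, the result is a perfectly definable structure: one field that is always present and two optional ones.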