Big Data, Small Data, Grid Computing, Cloud Computing, Distributed Computation vs Distributed Persistence

gridwizard

Hadoop (http://hadoop.apache.org) has gained a lot of popularity in recent years – and have claimed to throne in Grid Computing. There’s has been a lot of confusion what’s meant by Grid, Load Balancing, Big Data, Cloud…etc. There’re grids that’s geared towards persisting, parsing and analyzing non-relational data (Social media, web scrapping …etc) – Hadoop is one such example. There’re software vendors that cater for simple Enterprise workflow (i.e. Scheduling, Job Chaining) – BMC Control-M, Schedulix for instance. There’re also data platform that’s geared towards Numerical and Quantitative Analysis (Data in relational format)Applied Algo ETL Suite. How do we decide what’s suitable for what purpose?

What is Big Data?

We will start with this easy one – Big Data is just Big Data. Wikipedia says Big Data is simply a lot of data: Big data[1][2] is the term for a collection of 

View original post 1,451 more words

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s