Is Big Data Really Worth the Time?
Keval Baxi, November 04th, 2016, 5 minute read
Keval Baxi is the Chief Executive Officer at Codal. He oversees the corporate direction and strategy of the company, focusing on innovation and customer experience. Outside of Codal, his interests include running and exploring new Chicago restaurants.
The big data industry is rapidly coming to a close.
Well, at least according to Andrew Oliver of InfoWorld.
Apache Hadoop (the open-source platform that hosts data for Facebook and Yahoo) has, alongside Google Cloud Platform, all but mastered cloud-based big data analysis. Nonetheless, the big data fad seems to be heading in the same direction as flash-based storage.
One of the problems with big data analysis is that cognitive bias sometimes causes users to misread the data and make poor decisions. If you do decide to analyze big data, be sure to have patience and not make snap decisions.
Rather than creating a more accurate picture, use of big data (as opposed to small volumes of data) sometimes makes the picture less clear. As you increase the population of data, you generally see regression to the mean, or "normalization" of data.
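One way to read the "normalization" claim above is as an averaging effect: the more records you pool, the closer the average sits to the overall mean, and the less individual extremes show through. The sketch below is only an illustration under that assumption. The uniform 0 to 100 metric, the function name, and the trial counts are all made up for the example.

```python
import random
import statistics


def spread_of_sample_means(sample_size, trials=200, seed=0):
    """Return how much the sample average varies across repeated draws."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        # Hypothetical metric: values drawn uniformly between 0 and 100.
        sample = [rng.uniform(0, 100) for _ in range(sample_size)]
        means.append(statistics.mean(sample))
    # Standard deviation of the averages: smaller means "flatter" results.
    return statistics.stdev(means)


small = spread_of_sample_means(10)
large = spread_of_sample_means(1000)
print(f"spread of averages, 10 records each:   {small:.2f}")
print(f"spread of averages, 1000 records each: {large:.2f}")
```

The second figure should come out much smaller than the first, which is the flattening the paragraph describes. Whether that helps or hurts depends on whether the detail you care about lives in the average or in the outliers.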
Proponents of big data point to the way that it allows businesses to better understand their market demographics. The Phoenix Suns, a professional basketball team, use Precision Market Insights (owned by Verizon) to answer many questions.
Many of these questions are vital to business operations. How far does the average fan travel to attend each game? How many fans also purchased tickets to a baseball game recently? How many fans stop at restaurants on the way home from games?
Cloud Storage vs. Flash Storage: Where Do I Store All of My Data?
Cloud storage has opened up new lanes for big data platform developers. On the one hand, the cloud allows you to access your data from anywhere with internet access. On the other, you can keep your data tangibly close to you on flash storage, and farther from hackers.
The cloud can be affordable, too. Amazon Web Services offers cloud storage starting at $3.00 a month for the first terabyte of storage. The cheaper cost reflects the upload rate, which is unfortunately capped at 0.03 gigabytes per month.
It's not all bad, though. Amazon Web Services will track some app or web metrics for you, and will condense them into meaningful statistics. And in 2013, the CIA signed a contract with Amazon Web Services for 600 million dollars, which speaks to the security of the data platform.
Big Data Solutions (BDS) is a data platform that acts as an extension of Google Cloud. Because BDS is open source, users who have experience with other open-source data analysis tools such as Hadoop, MapReduce, and Spark will be able to import the Google Cloud Platform counterpart to match the software they are familiar with.
For comparison, BigQuery, one of the more useful data analysis tools, goes for $5 a terabyte.
MongoDB is a free, open-source platform that was released in 2009 and includes all the query support of its paid counterparts. Unlike the other two platforms, however, MongoDB stores your data in collections of documents rather than tables.
Because of this, MongoDB is better suited to storing and mining strings of data than arrays of numbers. Its query package is a good fit for users with string-based data, such as those who need to analyze a document to find its most commonly used phrase or keyword.
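To make the document idea concrete, here is a rough sketch of the keyword use case in plain Python. It does not use MongoDB or its driver; the dictionaries simply imitate the shape of documents in a collection, and the sample records, field names, and the most_common_words helper are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical records shaped like documents in a collection.
# Note that documents do not all need the same fields.
documents = [
    {"_id": 1, "author": "a", "body": "Big data is big business."},
    {"_id": 2, "author": "b", "body": "Data storage moves to the cloud."},
    {"_id": 3, "author": "c", "body": "Cloud data needs careful analysis.",
     "tags": ["cloud"]},
]


def most_common_words(docs, field="body", top=3):
    """Count words in one text field across all documents."""
    counts = Counter()
    for doc in docs:
        # Lowercase the text and split it into simple alphabetic words.
        words = re.findall(r"[a-z']+", doc.get(field, "").lower())
        counts.update(words)
    return counts.most_common(top)


top_words = most_common_words(documents, top=1)
print(top_words)  # [('data', 3)]
```

The third record carries a tags field the others lack, which is the kind of flexibility a table with fixed columns does not give you.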
Which Platform Is Right for Me?
Of the three platforms, Amazon Web Services is by far the most usable for those who don't have a background in statistical analysis. Not only does it record website and application traffic, but the app will condense your big data into applicable statistics before you even enter a query.
If you're familiar with basic statistics operations, such as finding correlations, covariances, linear regression models, and chi-squared models, Big Data Solutions on the Google Cloud Platform is the best route for you.
The third-party integration allows you to "recruit" different apps to help you mine your data, so your statistics toolbox can always grow to accommodate your needs.
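If you want to check whether those basic operations feel familiar, the sketch below works through covariance, correlation, and a simple linear regression by hand in plain Python. It is not tied to any platform mentioned here, and the ad_spend and sales figures are invented for the example.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sample covariance: how two series move together."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance rescaled to the range -1 to 1."""
    return covariance(xs, ys) / (covariance(xs, xs) * covariance(ys, ys)) ** 0.5


def linear_regression(xs, ys):
    """Least-squares line y = slope * x + intercept."""
    slope = covariance(xs, ys) / covariance(xs, xs)
    intercept = mean(ys) - slope * mean(xs)
    return slope, intercept


# Hypothetical data: sales follow ad spend exactly as 2 * x + 1.
ad_spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]

slope, intercept = linear_regression(ad_spend, sales)
print(covariance(ad_spend, sales))   # 5.0
print(correlation(ad_spend, sales))  # approximately 1.0
print(slope, intercept)              # 2.0 1.0
```

If these few functions read comfortably, you likely have the background the paragraph above has in mind.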
MongoDB is best suited for users who have their data stored in documents rather than spreadsheets. If you need to analyze writing, MongoDB has a lot of potential. If you're analyzing data stored in spreadsheets, it would be best to steer clear of MongoDB.