The advent of Hadoop for processing Big Data solved the problem of having to scale hardware to specialized and impractical dimensions, in terms of both specifications and cost. Hadoop distributes processing and storage across smaller units of hardware and allows new hardware to be added as required. These smaller units can be cheap, non-specialized commodity servers. This makes working with Big Data more attractive from the point of view of the hardware investment required over time.
The cost illusion. At the same time, even commodity hardware has drawbacks of its own that may not be apparent at first. Where demand for Hadoop is regular and usage is roughly constant or steadily rising, the usual approach is to keep adding new hardware as needed and to replace old hardware as it reaches end-of-life.
But not every organization uses Hadoop this way. Usage may need to increase for short periods, and new hardware is added in response. What happens when usage then drops for long periods, or when Hadoop is used only occasionally? Even if each server is inexpensive, idle hardware accumulates, tying up capital and incurring maintenance costs. It also has to be replaced when it becomes obsolete, even if it was rarely used.
Evaluating cloud solutions. This is where cloud providers come in with the business case for their offerings. By moving Hadoop processing and storage to the cloud, the problem of accumulating hardware in-house goes away. If only the decision were that simple. One obvious question is whether to build a private cloud or use a public one.
The answer lies in the usage pattern. If demand is predictable, a private cloud makes sense, but only if utilization is fairly high and constant; otherwise, money is spent on storage and processing capacity that sits idle. In many cases demand will be neither fully predictable nor constant, in which case the public cloud may make more sense, until one factors in network performance. If data has to move frequently between in-house storage and Hadoop running in a public cloud, there is always a risk of network delays, high latency, or service disruptions from a variety of causes. The choice between private and public cloud, and how much of either to use, has to weigh business criticality against the cost of latency (or an outage) and the cost of cloud usage. Because Big Data is involved, the time needed just to upload the data can itself be significant, a factor that rarely arises with smaller data sets.
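Two of the trade-offs above reduce to simple arithmetic: how long it takes to upload a data set, and how much of the month a pay-per-use cluster can run before it costs as much as fixed capacity. The sketch below is illustrative only; the function names, the 80% link-efficiency default, and the 730-hour month are assumptions, not figures from any provider, and all prices you pass in are hypothetical.

```python
def upload_hours(data_tb: float, bandwidth_gbps: float, efficiency: float = 0.8) -> float:
    """Hours to transfer data_tb (decimal terabytes) over a link of
    bandwidth_gbps gigabits per second, assuming only `efficiency`
    of the nominal bandwidth is actually usable (an assumption)."""
    bits = data_tb * 1e12 * 8                      # terabytes -> bits
    usable_bits_per_sec = bandwidth_gbps * 1e9 * efficiency
    return bits / usable_bits_per_sec / 3600       # seconds -> hours


def breakeven_utilization(fixed_monthly_cost: float,
                          on_demand_hourly_rate: float,
                          hours_per_month: float = 730.0) -> float:
    """Fraction of the month a pay-per-use cluster can run before its bill
    equals a fixed monthly cost (e.g. a private deployment). Below this
    fraction, paying on demand is cheaper under these assumptions."""
    return fixed_monthly_cost / (on_demand_hourly_rate * hours_per_month)


if __name__ == "__main__":
    # Hypothetical inputs: 10 TB over a 1 Gbps link.
    print(f"Upload time: {upload_hours(10, 1):.1f} hours")
    # Hypothetical prices: 18,250 per month fixed vs 50 per hour on demand.
    print(f"Break-even utilization: {breakeven_utilization(18250, 50):.0%}")
```

With these made-up numbers, 10 TB takes roughly 28 hours to upload, and the on-demand cluster stays cheaper only while it runs less than half the month. Real estimates should use measured throughput and actual price lists.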
Even the frequency of data movement between in-house storage and the cloud may not be predictable. In some scenarios, data that is processed once may not be needed again for the foreseeable future, in which case it can be backed up outside the cloud. But if it is likely to be recalled within a short period (that is, within the archive window), it makes sense to retain it in the cloud rather than move it back in-house. This, again, means greater use of cloud capacity, and hence greater expense.
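The keep-or-move-back decision can be framed as a rough expected-cost comparison. The sketch below is a hypothetical model, not a method from the original text: it assumes a flat per-terabyte monthly storage rate, a one-time cost to move data out of the cloud, and a single cost (in money, or delay converted to money) of re-uploading if the data is recalled. All names and figures are illustrative.

```python
def keep_in_cloud(data_tb: float,
                  storage_cost_per_tb_month: float,
                  months_until_archive: float,
                  recall_probability: float,
                  reupload_cost: float,
                  move_out_cost: float = 0.0) -> bool:
    """Return True if keeping data in the cloud until it would be archived
    is expected to cost less than moving it back in-house now.
    This is a simplified, hypothetical cost model."""
    cost_to_keep = data_tb * storage_cost_per_tb_month * months_until_archive
    # Moving out is always paid; re-uploading is paid only if the data is recalled.
    expected_cost_to_move_back = move_out_cost + recall_probability * reupload_cost
    return expected_cost_to_move_back > cost_to_keep


# Hypothetical scenario: 10 TB, 20 per TB-month, 3-month archive window.
print(keep_in_cloud(10, 20, 3, recall_probability=0.5, reupload_cost=2000))
print(keep_in_cloud(10, 20, 3, recall_probability=0.1, reupload_cost=2000))
```

In this toy scenario a 50% chance of recall favors keeping the data in the cloud, while a 10% chance favors moving it back, which mirrors the reasoning in the paragraph above.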
There’s more. Another factor to consider is that most cloud offerings keep the processing layer separate from the storage layer, whereas Hadoop, in principle, is all about keeping processing and storage together on the same hardware or cluster. Workarounds are being developed case by case, but they are unlikely to be part of a default cloud offering unless it is a dedicated analytics service. Last but not least, there is data security. Cloud providers do offer security features, but how they work with a particular Hadoop implementation needs to be examined closely.
Moving Hadoop to the cloud to take advantage of its benefits is clearly viable and can make sense. However, as with moving any other enterprise system to the cloud, there are considerations to take into account, and some of them may be unique to Hadoop.