Multi-tenant Hadoop Clusters: The Obstacles are Not Just Technical
Our recent research on “big data” use cases uncovered many large financial services firms using Hadoop and other technologies to build “multi-tenant” analytics platforms, that is, shared environments meant to support multiple independent applications. In some cases, these platforms are being built in advance of demand, to preempt individual businesses that might otherwise deploy their own clusters. The goals are simple and very appropriate for banks today: reduce costs and share data within the bank to satisfy regulatory and legal requirements and to support new business.
But some of our interviews with individuals on the business side also revealed deep concerns about the acceptability of centralized platforms, and those concerns prompt me to send a word of warning to the technologists responsible for deploying them.
This warning starts with the two issues that get the most attention: service levels and security. Internal users question your ability to maintain service levels in the presence of “noisy neighbors,” poorly behaved applications that consume so much resource that they degrade the performance of other apps in the cluster. And security groups wonder whether you have adequate controls to prevent data sharing of the evil sort. But these issues are solvable by technology that’s either here today or on the near horizon.
Another set of concerns focuses on things that are harder to measure but are just as likely to cause angst among application owners. This angst will be familiar to anyone who was involved in the early stages of centralization of new technology platforms such as market data systems or grid computing clusters. We found it lurking in some of the interviews we held, so I recently probed it further in follow-up conversations with business line and technology experts. Among their concerns are:
Flexibility: If your multi-tenant environment is one-size-fits-all, users have to “bend the [application] around the infrastructure.” You are constraining how your users engineer their applications; how they test; even the structure of their operations organizations. Owners of critical applications such as client reporting may view this as unacceptable. So you may find yourself only serving applications that are “blind to the infrastructure” – non-critical apps such as ad-hoc queries that are not impacted by your constraints.
Cost: Owners of business-critical apps are more focused on the business costs associated with error than on the operational cost of running the application. They end up with private resources because they need a high level of determinism; it may cost a lot but they know what they are getting. As one user put it, this is “a kind of ‘protectionism’ but it’s not a political statement.”
Monitoring: Since the costs of monitoring are high and can impact application performance, users only monitor what they need to. A batch process pricing a single derivative position may have fewer monitoring checkpoints than one that is processing fifty million credit card transactions. Jobs like the former are not prepared to look for problems introduced by the environment (i.e., the other tenants), and their owners are not going to want to bear the costs of additional monitoring.
Change management: The adoption of a shared platform means being subject to the change management and testing practices not only of the platform provider, but also of the other applications in the environment. Major upgrades can’t be done until all of the applications are ready to go. A user of a shared environment complains: “The apps with the most extensive testing requirement can hold everyone else up. Those apps that are inward facing can afford to make changes that may have an unexpected impact, but the outward facing ones can’t.”
Again, I’m not saying these objections to centralization are new. We see them with every move toward consolidation. I’m just suggesting that IT groups should be sure to pay attention to them. Instead of focusing entirely on the logistics of deployment – on getting CIO support lined up, selecting the right Hadoop distro, and training up your staff – take time to understand the deep-seated angst your users have about moving to your platform and address it head-on. Find the folks in your organization who have been there before – the IT managers who headed up grid computing deployments and server virtualization programs. They can provide valuable insight into what to do, and what not to do, as you try to convince app owners to come on board.
About the STAC Blog
STAC and members of the STAC community post blogs from time to time on issues related to technology selection, development, engineering, and operations in financial services.