The Dark Arts is a series of blog posts sharing perspectives and lessons learned on transforming what many may consider technical 'Dark Art' into proper Engineering and Science.
Growing Your Infrastructure: Part 1
Knowing when, and how, to scale your infrastructure at just the right time has been a bit of a dark art in the past. Plan too far ahead and you may sink much of your budget into hardware long before it produces any real return. Fail to plan ahead of customer growth and you may find yourself balancing customer needs against performance, and taking downtime while you add new hardware ad hoc.
Over the years, we've learned to turn this dark art into more of an engineering task that is predictable, repeatable, and cost-effective. As a rapidly growing SaaS provider guaranteeing 99.9% uptime to our customers, we did not find the move from mysterious dark art to science an easy one.
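To put the 99.9% figure in perspective, here is a small sketch of the arithmetic behind an uptime guarantee. The function name and the 30-day and 365-day periods are illustrative choices, not part of any actual service agreement.

```python
# Downtime allowed by an uptime guarantee: plain arithmetic, nothing vendor-specific.
def downtime_budget_minutes(uptime_pct: float, period_days: float) -> float:
    """Minutes of downtime permitted over a period at a given uptime percentage."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - uptime_pct / 100)


# Illustrative periods; a real agreement defines its own measurement window.
for label, days in [("30-day month", 30), ("365-day year", 365)]:
    budget = downtime_budget_minutes(99.9, days)
    print(f"{label}: {budget:.1f} minutes of allowed downtime")
```

At 99.9%, that works out to well under an hour per month, which leaves very little room for adding hardware ad hoc.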
Plans and Preparations
If you've ever hosted a platform or software as a service and you're onboarding a new customer, you might have had conversations like this:
You: "So how many users do you expect to use the system?"
Customer: "Maybe 1200. Actually it could be closer to 10,000 once we get several other divisions and departments signed up."
You: "That's great. We'll plan for that, then."
Customer: "On second thought, our distribution partners might want to use the platform. So it could be closer to 18,000 or so. Oh, I almost forgot, there could be folks from our EMEA region needing to place orders. I guess we'll just need to see how popular it gets."
You: "…."
So how do you plan for something like that? Of course, a big part of the answer depends on your software architecture and whether it supports horizontal scaling, vertical scaling, or both. What we've learned is to plan for the usage the customer expects, but prepare to scale for usage orders of magnitude beyond that.
The planning part is simple; the preparation is where this once dark art is transmuted into science and engineering.
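As a rough illustration of how far the moving estimates in the conversation above can shift a plan, the sketch below converts each user count into a server count. The per-server capacity of 2,500 users is a made-up number for the example; a real figure would come from your own load testing.

```python
import math

# Hypothetical capacity of one application server; measure your own.
USERS_PER_SERVER = 2500


def servers_needed(users: int, users_per_server: int = USERS_PER_SERVER) -> int:
    """Smallest number of servers that covers the given user count (at least one)."""
    return max(1, math.ceil(users / users_per_server))


# User estimates taken from the customer conversation above.
estimates = {
    "initial estimate": 1200,
    "with other divisions": 10_000,
    "with distribution partners": 18_000,
}

for label, users in estimates.items():
    print(f"{label}: {users} users -> {servers_needed(users)} server(s)")
```

The plan covers the first line; the preparation is knowing how you would get to the last line without buying it all up front.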
Data, Usage, and Elasticity
Our team knows, of course, how Pando is architected. We also know where we tend to see higher system resource consumption based on customer usage patterns. If several thousand users are all placing an order at nearly the same time for one of our customers, we know the areas of our system which will need more processing power, memory, storage, etc.
This is key in preparing for growth.
Metrics around how your customers use your system, not simply how many use it, are critical. Combining that data with performance data from across your application's stack can help you see how to prepare for unexpected growth.
For instance, if you know that a stress point is your database, simply adding another database server may be a costly, inefficient way to deal with unexpected growth. That growth might be temporary: seasonal, tied to quarter-end activity, or even a once-a-year spike. In that case the extra database server goes largely unused most of the time, and you've tied up budget in something that won't provide much of a return for quite a while.
So the idea is to grow smartly, efficiently, and elastically, so that a temporary spike in traffic doesn't permanently sink resources, both money and hardware, based on a sporadic usage pattern that may not recur for months. Your inputs are usage trends and behavioral patterns, combined with operational data points that indicate the key stress points needing expansion, temporary or not.
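One simple way to separate a temporary spike from sustained growth is to compare the most recent periods against an earlier baseline. The sketch below is only an illustration of that idea, not a description of our tooling; the function name, the four-period window, the 1.5x threshold, and the sample numbers are all assumptions.

```python
from statistics import median


def classify_load(samples, recent=4, threshold=1.5):
    """Label a usage series as 'steady', 'temporary spike', or 'sustained growth'.

    samples   -- peak load per period, oldest first (e.g. weekly peak orders per minute)
    recent    -- number of trailing periods treated as "now"
    threshold -- ratio over the baseline median that counts as elevated
    """
    if len(samples) <= recent:
        raise ValueError("need more samples than the recent window")
    baseline = median(samples[:-recent])
    window = samples[-recent:]
    elevated = [s for s in window if s > baseline * threshold]
    if not elevated:
        return "steady"
    # More than half of the recent window is elevated: the load looks persistent.
    if len(elevated) > recent / 2:
        return "sustained growth"
    return "temporary spike"


# Hypothetical weekly peaks: one quarter-end burst versus a steady climb.
quarter_end = [100, 110, 105, 95, 100, 108, 102, 310, 104, 99]
growing = [100, 110, 105, 95, 100, 108, 170, 185, 200, 220]

print(classify_load(quarter_end))
print(classify_load(growing))
```

A series labeled as a temporary spike is a candidate for elastic, short-lived capacity; one labeled as sustained growth is where permanent hardware starts to earn its cost.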
Collecting Data
Our most fundamental component in preparing and growing efficiently is data. How do you get that data? We use a number of tools internally for collecting data metrics.
We use tools such as:
Together, these collect all of the data needed for a thorough analysis of resource bottlenecks across our entire infrastructure and applications.
Each tool plays an important role in collecting data points in specific areas of our architecture across the stack. Once this data has been collected, and it can be a lot of data, the next step is to analyze and correlate it and find patterns that show where growth is needed, as opposed to where a certain area of the application simply needs fine-tuning.
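As a minimal sketch of what correlating this data can look like, the example below ranks resource metrics by how closely they track a usage metric, using the Pearson correlation coefficient. The metric names and sample values are invented for illustration; real input would come from whatever your monitoring tools export.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length, at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / (spread_x * spread_y)


# Hypothetical samples taken at the same six points in time.
orders_per_min = [20, 45, 80, 120, 160, 200]
resource_metrics = {
    "db_cpu_pct": [15, 28, 47, 66, 85, 97],
    "app_memory_pct": [41, 43, 40, 42, 41, 42],
}

# Rank resources by how strongly they move with order volume.
ranked = sorted(
    resource_metrics,
    key=lambda name: abs(pearson(orders_per_min, resource_metrics[name])),
    reverse=True,
)
for name in ranked:
    r = pearson(orders_per_min, resource_metrics[name])
    print(f"{name}: r = {r:.2f}")
```

A resource that rises in step with usage is a likely place where growth is needed; one that barely moves is more likely a tuning question than a capacity one. Correlation alone does not prove cause, so treat the ranking as a pointer for where to look first.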
In the next entry in this series, Growing Your Infrastructure: Part 2, we will discuss approaches to analyzing large amounts of operational data, choosing elastic hardware for short-term growth, and monitoring trends.