As we are now seeing, it is important to follow best practices in architecture, security, and governance to ensure that critical systems stay available no matter what.
I encourage teams to take advantage of architecture reviews and penetration testing wherever possible. Bringing in experts for an architectural review will clarify how your systems are designed, and it can give you a better sense of where there is room for improvement in availability and security posture. Systems should fail fast and improve consistently; that is the real goal of any proactive architectural or security review.
Leading-edge penetration testing services offer detailed scenarios from both blind and architecturally aware perspectives, probing every part of your systems to find security holes. Modern software systems are complex enough that it often helps to have an outside firm look for blind spots in a system's design. In many cases, what they find would not have been readily apparent to the engineers working hands-on with the system day in and day out. This is just the nature of focus and attention; engineers can only apply so much of it to the task at hand, so an extra set of eyes from an outside firm is almost always helpful.
Governance is the practice of managing the visibility, security, and cost of the resources in your network. Really, what we're talking about is making sure that any resources you have are well accounted for and accessible only to the people who need them. When it's easy to see at a glance what's running where and who has access to it, you're setting your engineers up for success.
This means ensuring that all resources are labeled with the department or system they belong to, that all credentials are regularly rotated and belong to known actors within the company, and that access policies are secure, sensible, and selective. Ideally, very few administrators should have broad access to your cloud, and any credentials in your system should be authorized on the principle of least privilege: engineers should have access only to the resources they manage, and any instances, servers, or workers should have access only to the resources they operate on. This limits the surface area of any exposed credentials, and it can even prevent programming mistakes caused by an application reaching the wrong resource or stage.
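To make the idea concrete, here is a minimal sketch of a least-privilege policy, loosely modeled on AWS IAM's JSON policy format. The role name, bucket name, and helper function are hypothetical; the point is that the policy is scoped to the one resource the role operates on, rather than using broad wildcards:

```python
import json


def least_privilege_policy(role_name: str, bucket: str) -> dict:
    """Build a read-only policy scoped to a single bucket
    (hypothetical names, illustrating the least-privilege shape)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": f"ReadOnly{role_name}",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                # Scope to one bucket -- no "Resource": "*" wildcards.
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            }
        ],
    }


policy = least_privilege_policy("ReportWorker", "example-reports")
print(json.dumps(policy, indent=2))
```

A worker holding this credential can read reports and nothing else; if the credential leaks, the blast radius is one bucket rather than the whole account.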
Real-time notifications can alert engineers when resources they manage are misconfigured, mislabeled, or exposed to parts of the network they shouldn't be. We're even seeing more advanced AI-driven monitoring now that can weigh a variety of heuristics and notify security engineers of unusual traffic patterns or unusual infrastructure configurations. These alerting systems are valuable because it's easy for engineers to accidentally launch resources with an incorrect configuration and not even know it. Setting up the appropriate guardrails ensures that mistakes like this are readily apparent and easy to fix.
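The core of such a guardrail is just an audit loop over your resource inventory. The record fields below are hypothetical, but the sketch shows the shape of a check that flags untagged or publicly exposed resources for alerting:

```python
def audit_resources(resources):
    """Flag resources that are missing an ownership tag or are
    publicly exposed (hypothetical record fields)."""
    findings = []
    for r in resources:
        if not r.get("tags", {}).get("department"):
            findings.append((r["id"], "missing department tag"))
        if r.get("public", False):
            findings.append((r["id"], "publicly exposed"))
    return findings


# A toy inventory: one well-governed VM, one untagged public database.
inventory = [
    {"id": "vm-001", "tags": {"department": "billing"}, "public": False},
    {"id": "db-002", "tags": {}, "public": True},
]

for resource_id, issue in audit_resources(inventory):
    print(f"ALERT {resource_id}: {issue}")
```

In practice the inventory would come from your cloud provider's API and the findings would feed a paging or ticketing system, but the guardrail itself stays this simple.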
In critical infrastructure, it is more important than ever to use hardware security appliances like firewalls, data diodes, and security SoCs that make silicon-level guarantees about the robustness and integrity of data. More and more, we're seeing devices that can perform real-time, physical-level analysis of memory, storage, and network traffic, ensuring that data only ever flows where it's meant to go.
Many critical systems benefit from being fully offline, but where connectivity is required, VPCs, firewalls, and data diodes can be leveraged. Here again the principle of least privilege suggests a restrictive approach to how systems are exposed to the network and to the internet; any given system should be exposed carefully, with minimal surface area, and only to the parts of the network that are necessary. Even a bastion host is additional surface area, and one that can often be avoided with on-site engineering staff and a robust network topology.
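One simple way to enforce minimal surface area is to audit firewall rules for overly broad source ranges. The rule fields and the /24 threshold below are hypothetical; the sketch uses Python's standard `ipaddress` module to flag any rule open to more of the network than a single subnet:

```python
import ipaddress


def overly_broad(rules, max_prefix=24):
    """Return rules whose source range is wider than /max_prefix,
    i.e., exposed to more of the network than a single subnet
    (hypothetical rule fields and threshold)."""
    flagged = []
    for rule in rules:
        net = ipaddress.ip_network(rule["source"])
        if net.prefixlen < max_prefix:
            flagged.append(rule)
    return flagged


rules = [
    {"port": 22, "source": "10.0.5.0/24"},  # scoped to one subnet
    {"port": 22, "source": "0.0.0.0/0"},    # open to the entire internet
]

for rule in overly_broad(rules):
    print(f"too broad: port {rule['port']} open to {rule['source']}")
```

A check like this can run in CI against your infrastructure-as-code, so a rule exposing SSH to the internet never reaches production in the first place.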
Many folks wonder whether it makes sense to opt for a templated, off-the-shelf solution or to build a bespoke system in-house. The answer is: it depends, and it may be different for different parts of your architecture.
Opting for a templated architecture means you get certain stability and performance guarantees from your cloud or IT vendor, with minimal configuration or tuning on your part. Things like high availability and multi-zone/multi-region design are handled for you, and your networking configuration is built to spec by certified experts who understand your business needs and service topology.
On the other hand, building a system in-house gives you more visibility into and control over exactly how your system works, which can translate to better results because your engineers know the task at hand better than anyone else. Security through obscurity is a real benefit too: it's harder for attackers to discover vulnerabilities in a system you designed yourself than in an architecture that looks like everyone else's vendored templates. Still, many would argue the templates are so widely used, with so many eyes on them, that they are sure to be robust.
Bespoke solutions also tend to be more flexible and more portable across different clouds and on-premises environments. There's more room to optimize cost and performance, and if your engineers think a multi-cloud or hybrid-cloud approach suits your systems, it may be best to build around generalized tooling that can easily be launched into different clouds or environments.
Built for the Task
Your leaders and engineers understand your business and technical needs better than anyone else. When you bring vendors into these discussions, it's important to help them understand what you're trying to accomplish. A great solutions architect will ask the right questions to get down to brass tacks, so that the technology itself can best serve your team. Great technology gets out of the way: it's highly performant, easy to configure, and flexible enough to adapt to your evolving applications. Always start with your business's needs, and work from there. I do recommend finding talented people who can help you evaluate the range of options for cloud technologies and network topology; the decisions you make now can significantly improve the performance and stability of your applications, making it easier for your business to accomplish its goals, stay secure, and succeed.