The AWS Well-Architected Framework is AWS’s guidance for helping builders design secure, high-performing, resilient, and efficient infrastructure for their applications. It helps you understand the benefits and drawbacks of the decisions you make while building systems on AWS, and it guides customers towards architectural best practices for creating and maintaining cloud-based systems that are reliable, secure, efficient, and cost-effective. The framework establishes a standardized method for assessing systems against the attributes customers expect from modern cloud-based systems, and for identifying the remediation needed to attain those attributes. It is mostly chief technology officers (CTOs), architects, developers, and members of the operations team who are concerned with following the framework. The framework is organized around the following pillars:
- operational excellence
- security
- reliability
- performance efficiency
- cost optimization
One of the important pillars of the AWS Well-Architected Framework is Operational Excellence. It is about running workloads, monitoring them, and responding efficiently to the events they generate. The design principles for operational excellence in the cloud are as follows:
- Operations as Code – Define infrastructure as code and automate its creation with tools like CloudFormation
- Automated Documentation from Annotations – Document how the different components of the system interact with each other. Whenever the system changes, the documentation should update automatically. This helps prevent integrations from breaking when a component changes
- Make frequent and reversible changes – Make small, reversible changes to the production environment rather than large ones. This makes it quick to roll back to a previous version if a change causes problems
- Anticipate Failure – Design your system to anticipate and tolerate failures, and test those failure scenarios to make the system more robust
- Learn from Operational Failures – Whenever there is a failure, record the root cause and apply the lessons learned
Services like CloudWatch, CloudTrail, X-Ray and VPC Flow Logs are used for implementing Operational Excellence.
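As an illustration of the "Operations as Code" principle, here is a minimal sketch that generates a CloudFormation template from Python instead of writing it by hand. The function name `build_bucket_template` and the logical ID `AppLogsBucket` are hypothetical, chosen only for this example; actually deploying the template (via the console, the CLI, or an SDK) is left out.

```python
import json


def build_bucket_template(logical_id: str, versioning: bool = True) -> dict:
    """Return a minimal CloudFormation template describing one S3 bucket.

    `logical_id` is a hypothetical resource name used for illustration.
    """
    properties = {}
    if versioning:
        # Versioning keeps earlier object versions, so changes stay reversible.
        properties["VersioningConfiguration"] = {"Status": "Enabled"}
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative template generated from code",
        "Resources": {
            logical_id: {
                "Type": "AWS::S3::Bucket",
                "Properties": properties,
            }
        },
    }


template = build_bucket_template("AppLogsBucket")
template_json = json.dumps(template, indent=2)
print(template_json)
```

Because the template is produced by code, it can live in version control and be reviewed like any other change, which also supports the "frequent and reversible changes" principle.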
The Security pillar focuses on protecting your data and systems from unauthorized access and threats by conducting continuous risk assessments and developing strategies to mitigate those risks.
Design Principles
- Strong Identity Foundation – Follow key principles like granting least privilege, separation of duties, appropriate authorization level, etc.
- Enable Traceability – Audit every change or action in every environment, including who made it. This maintains transparency within the organization. Monitor logs and take action when an anomaly is detected
- Security at all Layers – Apply security at multiple layers, like VPC, Load Balancers, Security Groups, EC2 instances, etc.
- Automate Security Best Practices – Implement security as code and version control all security measures for future use
- Protect Data in Transit and at Rest – Data should be protected using encryption, authorization tokens and Access Control Mechanisms
- Keep people away from data – As far as possible, reduce the need for people to handle data directly by implementing proper policies and access control
Services like Identity and Access Management (IAM), Multi-Factor Authentication (MFA) and AWS Organizations are used to secure the account. GuardDuty and CloudTrail are used to monitor for unwanted access so that appropriate action can be taken. VPC, Shield and WAF are used to define rules on who is authorized to access the applications and how.
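To make the least-privilege idea concrete, the sketch below builds an IAM policy document that allows only reading objects under a single prefix of a single S3 bucket. The bucket name `example-reports-bucket`, the prefix `monthly/`, and the helper `read_only_bucket_policy` are assumptions made up for this example; attaching the policy to a user or role is not shown.

```python
import json


def read_only_bucket_policy(bucket_name: str, prefix: str = "") -> dict:
    """Build an IAM policy document granting read-only access to objects.

    Only s3:GetObject is allowed, and only on objects under `prefix`,
    so the principal receives nothing beyond what it needs.
    """
    object_arn = f"arn:aws:s3:::{bucket_name}/{prefix}*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadObjectsOnly",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [object_arn],
            }
        ],
    }


# Hypothetical bucket and prefix, used purely for illustration.
policy = read_only_bucket_policy("example-reports-bucket", prefix="monthly/")
print(json.dumps(policy, indent=2))
```

Keeping policies like this in code also fits the "Automate Security Best Practices" principle, since the policy can be versioned and reviewed.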
For a system to be reliable, failures should be minimized, and when a failure does occur, the system should recover from it quickly and efficiently. It is also important that your applications scale dynamically based on the workload rather than depending on static inputs for scaling, which can lead to under-provisioning or over-provisioning of resources.
Design Principles
- Test Recovery Procedures – Inject or simulate failures in your system and test how it recovers from them
- Automatically Recover from Failure – Ensure that recovery from failures is automated: monitor metrics in CloudWatch and trigger the appropriate action whenever a threshold is reached. Automated notifications to humans should also be set up as a best practice
- Scale horizontally – Avoid monolithic architectures and use multiple smaller resources that keep systems isolated from one another, so that a single failure has limited impact
- Stop guessing capacity – Since the cloud allows dynamic capacity management, you should never guess your capacity beforehand. Let the system automatically scale up and down based on the demand
Detect and respond to failures, and prevent the same failures from recurring. Back up data and environment configurations to improve recovery time, and make sure a proper disaster recovery plan is in place and ready to be executed whenever necessary. Use the Personal Health Dashboard to understand the health of your resources.
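The "Stop guessing capacity" principle can be illustrated with a small calculation. The sketch below scales the number of instances in proportion to how far an observed metric (for example, average CPU utilization) is from a target value, then clamps the result to configured limits. It is a simplified model for illustration, not the exact algorithm any AWS scaling service uses; the function name and all parameter values are hypothetical.

```python
import math


def desired_capacity(current: int, metric: float, target: float,
                     min_size: int, max_size: int) -> int:
    """Return a proportional capacity estimate, clamped to [min_size, max_size].

    If the metric is 50% above the target, capacity grows by about 50%.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    if min_size > max_size:
        raise ValueError("min_size must not exceed max_size")
    # Round up so the fleet is never sized below what the metric suggests.
    proposed = math.ceil(current * metric / target)
    return max(min_size, min(max_size, proposed))


# 4 instances at 90% CPU with a 60% target: scale out.
print(desired_capacity(current=4, metric=90.0, target=60.0, min_size=2, max_size=10))
# 4 instances at 15% CPU with a 60% target: scale in, but not below the minimum.
print(desired_capacity(current=4, metric=15.0, target=60.0, min_size=2, max_size=10))
```

The minimum and maximum bounds matter as much as the formula: the minimum protects availability during quiet periods, and the maximum caps cost if a metric misbehaves.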
There are two parts to maintaining performance efficiency: first, choosing the right resources and services, and second, continuously evolving your resources as technology changes.
Design Principles
- Consume advanced technologies as a service – Prefer managed services, as they reduce the effort spent on provisioning, configuring, scaling, backing up, etc.
- Go global in minutes – Since AWS is globally deployed across multiple regions, you can leverage this and deploy your application to multiple regions to help lower the latency of your application
- Use serverless architectures – Serverless architectures let you run your code without managing servers. For example, use S3 to host a static website instead of running it on an EC2 instance
- Experiment more often – Testing your solution against several metrics helps you identify performance bottlenecks and take appropriate action
Deploy services to multiple regions and use serverless functions like AWS Lambda instead of running applications on an EC2 instance.
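As a sketch of what running code on AWS Lambda looks like, the function below follows the Python handler convention of taking an `event` and a `context` argument. It assumes the function is invoked through an API Gateway proxy integration, which is why it reads `queryStringParameters` and returns a `statusCode` and `body`; the greeting logic itself is purely illustrative.

```python
import json


def handler(event, context):
    """Return a JSON greeting for an API Gateway proxy-style event.

    Assumption: `event` may carry a `queryStringParameters` mapping,
    which is None or absent when the request has no query string.
    """
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


# Local invocation with a hand-written event; no AWS account is needed.
response = handler({"queryStringParameters": {"name": "builder"}}, None)
print(response["body"])
```

Since the handler is an ordinary function, it can be exercised locally with sample events before it is deployed.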
The Cost Optimization pillar is another important part of the AWS Well-Architected Framework. It helps AWS customers deliver business value at the lowest possible cost.
Design Principles
- Adopt a consumption model – Pay only for the resources that are actually in use, and scale your resources up or down with demand. There is no need to pay fixed charges based on forecasted demand
- Measure overall efficiency – Keep measuring your costs over a time period to understand and keep track of the trends. Optimize wherever and whenever possible
- Avoid spending money on operations – Leave operational work such as running the underlying infrastructure to AWS, and focus on your customers and business logic
- Analyze and attribute expenditure – Attribute resources to their owners, then analyze and monitor the expenditure of each individual department or team. Implement resource tagging and resource groups to analyze expenditure more efficiently
- Use managed and app-level services to reduce TCO – Using managed services helps reduce the overall cost of maintaining them yourself
Use AWS Cost Explorer to monitor current costs and forecast future ones. Visualize costs by resource and cut costs by removing unused resources. Use AWS Budgets to get alerts based on your budgeted value.
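To show how resource tagging supports the "Analyze and attribute expenditure" principle, here is a minimal sketch that totals cost per team from a list of cost line items. The record layout, the `team` tag key, and the sample figures are all invented for illustration; in practice the data would come from your billing reports.

```python
from collections import defaultdict


def cost_by_tag(line_items, tag_key, untagged_label="untagged"):
    """Sum the cost of each line item under the value of its `tag_key` tag.

    Items missing the tag are grouped under `untagged_label`, which makes
    unattributed spend visible instead of silently dropping it.
    """
    totals = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, untagged_label)
        totals[owner] += item["cost"]
    return dict(totals)


# Hypothetical line items: resource IDs, costs, and tags are made up.
sample_items = [
    {"resource": "i-0aaa", "cost": 12.5, "tags": {"team": "payments"}},
    {"resource": "i-0bbb", "cost": 7.5, "tags": {"team": "search"}},
    {"resource": "vol-0ccc", "cost": 3.0, "tags": {}},
    {"resource": "i-0ddd", "cost": 2.5, "tags": {"team": "payments"}},
]

totals = cost_by_tag(sample_items, "team")
for owner, cost in sorted(totals.items()):
    print(f"{owner}: {cost:.2f}")
```

The "untagged" bucket is often the most useful line in such a report, because it points at resources nobody has claimed and that may be candidates for removal.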
The following picture gives a brief summary of the AWS Well-Architected Framework.