Service Delivery Requirements for RDS:
1. Guidance to enable the customer to meet their Recovery Time Objective (RTO) and Recovery Point Objective (RPO).
- Point-in-time functionality
- Point-in-time restore is used to restore a DB instance to any specific time within the backup retention period, creating a new DB instance. The latest restorable time for a DB instance is typically within 5 minutes of the current time.
- We create snapshots of RDS at regular intervals using custom Lambda scripts. Whenever we need to recover the database to a point in time, we restore a new instance from the appropriate snapshot.
- We use pg_dump to export the database and pg_restore (or psql) to load the dump when restoring.
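A minimal sketch of how the scheduled snapshot Lambda and a point-in-time restore can be wired up with boto3. The instance and event field names here are illustrative assumptions, not the customer's actual identifiers:

```python
from datetime import datetime, timezone

def snapshot_id(instance_id, now=None):
    """Build a unique, timestamped identifier for a manual snapshot."""
    now = now or datetime.now(timezone.utc)
    return f"{instance_id}-{now.strftime('%Y-%m-%d-%H-%M')}"

def point_in_time_restore_params(source_id, target_id, restore_time=None):
    """Parameters for rds.restore_db_instance_to_point_in_time.

    PITR always creates a *new* instance from the automated backups; when
    no restore_time is given, RDS restores to the latest restorable time
    (typically within five minutes of now).
    """
    params = {
        "SourceDBInstanceIdentifier": source_id,
        "TargetDBInstanceIdentifier": target_id,
    }
    if restore_time is not None:
        params["RestoreTime"] = restore_time
    else:
        params["UseLatestRestorableTime"] = True
    return params

def lambda_handler(event, context):
    # Triggered on a schedule (e.g. an EventBridge rule) to snapshot the
    # instance named in the event. Requires AWS credentials to run.
    import boto3  # available by default in the AWS Lambda runtime
    rds = boto3.client("rds")
    rds.create_db_snapshot(
        DBInstanceIdentifier=event["instance"],
        DBSnapshotIdentifier=snapshot_id(event["instance"]),
    )
```

The snapshot identifier embeds a UTC timestamp so repeated runs of the schedule never collide.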
- Recovering in the same AWS Region and in a different AWS Region
- We restore snapshots with the default settings when recovering into the same Region.
- We use the copy snapshot option in the RDS console to copy snapshots into a different Region, then restore from the copy there.
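The same cross-Region copy can be scripted with boto3's `copy_db_snapshot`; the ARN, identifiers, and Region names below are placeholders:

```python
def copy_snapshot_params(snapshot_arn, target_id, source_region, kms_key_id=None):
    """Parameters for rds.copy_db_snapshot when copying into another Region.

    The RDS client must be created in the *destination* Region; for a
    cross-Region copy the source is referenced by its full ARN, and an
    encrypted snapshot additionally needs a KMS key that lives in the
    destination Region.
    """
    params = {
        "SourceDBSnapshotIdentifier": snapshot_arn,
        "TargetDBSnapshotIdentifier": target_id,
        "SourceRegion": source_region,  # lets boto3 pre-sign the copy request
    }
    if kms_key_id:
        params["KmsKeyId"] = kms_key_id
    return params

# Usage (requires AWS credentials; Regions are illustrative):
#   boto3.client("rds", region_name="us-west-2").copy_db_snapshot(
#       **copy_snapshot_params(snapshot_arn, "leapfrog-db-dr-copy", "us-east-1"))
```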
- Use cases where snapshots and point-in-time recovery can be used
- When instances fail due to bugs in the application.
- When data becomes inconsistent and needs to be reverted to a certain point in time.
- Periodic testing to make sure recovery works as expected
- Our tests show that traffic in the customer's system is lowest during the weekends.
- During that window we turn off the primary database and simulate a failure.
- We then create a secondary database from a snapshot of the primary database and verify that the system comes back up and runs correctly.
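The weekend drill above can be sketched as a restore from the latest snapshot; the identifiers are hypothetical:

```python
def drill_restore_params(snapshot_id, test_instance_id, multi_az=False):
    """Parameters for rds.restore_db_instance_from_db_snapshot, used to
    stand up the secondary database from the primary's snapshot during
    the weekend recovery drill."""
    return {
        "DBSnapshotIdentifier": snapshot_id,
        "DBInstanceIdentifier": test_instance_id,
        "MultiAZ": multi_az,
    }
```

Keeping the drill instance single-AZ by default keeps the test cheap; the flag can be flipped when rehearsing a full production-like failover.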
2. Customer enablement to allow the customer to use and evolve the solution over time.
- Training and reference materials provided to the customer to ensure that they understand the Amazon RDS service.
- We suggest the customer regularly follow AWS documentation, recent news, AWS blogs, podcasts, etc. to stay up to date with RDS.
- Training and reference materials were provided to the customer to ensure that they understand the Amazon RDS service, such as Amazon RDS for PostgreSQL demo videos, the Amazon RDS for PostgreSQL FAQ, and blog posts about Amazon RDS for PostgreSQL use cases.
- We provided them with a demo of PostgreSQL and shared documentation from Amazon on how to tune their Amazon RDS for PostgreSQL deployment for performance and cost.
- Proper guidelines for Amazon CloudWatch were also provided by our AWS experts during the engagement, so the client can properly monitor the performance of RDS.
- We have also suggested specific documentation and video links.
3. Properly sized Amazon RDS architecture based on the customer’s non-AWS architecture
- Existing Architecture
- Earlier they were using a local data center VPS service where the database ran. The database crashed frequently due to high traffic and improper management.
- Data from the production server was backed up to a recovery data center, so systems were protected from component failures at the production data center and could be recovered at the recovery data center during a disaster.
- Leapfrog's existing architecture was far less secure and less reliable.
- In the final architecture, Leapfrog was advised to create a VPC with public and private subnets. Inside the VPC, one Application Load Balancer is used, two EC2 instances were launched, and PostgreSQL with Multi-AZ and two read replicas was deployed.
- New Application
- Leapfrog needed 99.9% availability from Amazon RDS, so there was a strong case for Multi-AZ for high availability of the data. Leapfrog runs 1,000 transactions per second against the database, and the initial size of the database is 25 GB, expected to grow by 5 GB yearly.
- Here is the final Amazon architecture with RDS.
- In this architecture, user requests hit the Application Load Balancer first, and the requests are then passed to the EC2 instances used as application servers. The servers are set up across two different Availability Zones to ensure that even if one Availability Zone goes down, the application continues to work. The servers fetch and write data to the Multi-AZ RDS databases.
4. Implementing database security related to Amazon RDS.
- Implement password policies for their database (password strength, rotation policies, etc.).
- To implement password policies for their database we have used AWS Secrets Manager.
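A minimal sketch of how an application can pull rotated database credentials from AWS Secrets Manager instead of hard-coding them; the secret name and JSON field names follow the standard Secrets Manager RDS secret layout, but are assumptions here:

```python
import json

def parse_db_secret(secret_string):
    """Extract connection fields from a Secrets Manager RDS secret payload,
    which Secrets Manager stores as a JSON document."""
    s = json.loads(secret_string)
    return {
        "host": s["host"],
        "port": s["port"],
        "user": s["username"],
        "password": s["password"],
    }

def fetch_db_credentials(secret_id):
    # Requires AWS credentials; secret_id is an illustrative secret name.
    import boto3
    sm = boto3.client("secretsmanager")
    return parse_db_secret(sm.get_secret_value(SecretId=secret_id)["SecretString"])
```

Because the application fetches the secret at connection time, Secrets Manager's automatic rotation can change the database password without any application redeploy.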
- Implement secure password storage, retrieval, and rotation for human and application access to the database.
- We have suggested using the CloudWatch console to review the various logs, including the RDS logs, which will surface potential security events related to their database.
- Encryption options for data at rest or at the column level.
- For the encryption of data at rest in RDS, AWS Key Management Service (KMS) is used. AWS KMS combines secure, highly available hardware and software to provide a key management system scaled for the cloud. Using AWS KMS, we created encryption keys and defined the policies that control how these keys can be used.
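A sketch of the creation-time parameters involved (identifiers are placeholders). Note that RDS storage encryption must be enabled when the instance is created; for column-level encryption, PostgreSQL's pgcrypto extension is one common option:

```python
def encrypted_instance_params(instance_id, kms_key_id):
    """Storage-encryption parameters for rds.create_db_instance.

    Encryption at rest is chosen at creation time and uses the given
    customer-managed KMS key; it cannot be toggled on later without
    restoring from an encrypted snapshot copy."""
    return {
        "DBInstanceIdentifier": instance_id,
        "StorageEncrypted": True,
        "KmsKeyId": kms_key_id,
    }
```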
- Relevant AWS security features:
- Identity and Access Management configuration.
- For IAM, MFA has been enabled; we also asked them to set a password rotation policy for all users and recommended granting users only the permissions they require.
- The configuration of the VPC and overall network containing the database and applications interacting with the database.
- The configuration was done according to the client's requirements. For the security of the database, we hosted the database in the private subnet, and the application that interacts with the database is hosted in the public subnet.
- Access controls to the database and database subnets via security groups and Access Control Lists.
- Access to the database and its subnets is locked down: only the required ports are allowed in the security groups and in the NACLs.
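The security-group rule described above can be expressed with boto3; the security-group IDs are placeholders, and referencing the application tier's security group (rather than a CIDR) keeps the rule tight as instances scale:

```python
def db_ingress_params(db_sg_id, app_sg_id, port=5432):
    """Parameters for ec2.authorize_security_group_ingress: allow only the
    application tier's security group to reach the PostgreSQL port on the
    database's security group."""
    return {
        "GroupId": db_sg_id,
        "IpPermissions": [{
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "UserIdGroupPairs": [{"GroupId": app_sg_id}],
        }],
    }
```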
5. Assistance on application architecture to take advantage of functionality that exists within the Amazon PostgreSQL Engine.
- Changing the application to utilize concurrent connections to the Amazon RDS database engine.
- We sat with Leapfrog's DB engineer, found the connection bottlenecks, and optimized the code to properly utilize concurrent connections to the database and its read replicas.
- Changing their application to utilize the different read and write endpoints of the Amazon RDS service.
- We made a change to their code during the implementation of a read replica in their architecture.
- Different endpoints are used at the application code level to differentiate between read and write workloads and route requests accordingly.
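The routing logic at the application level can be sketched as a simple statement-based dispatcher (a simplified illustration, not the customer's actual code; real transactions and read-after-write cases would stay on the writer):

```python
import random

def pick_endpoint(sql, writer_endpoint, reader_endpoints):
    """Route plain SELECT statements to a read-replica endpoint and
    everything else (writes, DDL, transactions) to the writer endpoint."""
    if sql.lstrip().lower().startswith("select") and reader_endpoints:
        return random.choice(reader_endpoints)
    return writer_endpoint
```

Choosing a replica at random spreads read load roughly evenly; with no replicas configured, reads safely fall back to the writer.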
- Changing of other applications or processes to utilize the read endpoint of the Amazon RDS service to enable reporting or data warehouse operations.
- Before moving to Amazon RDS, the customer used a single endpoint for both their main application (where writes occur) and their BI tool (where reads occur). Since moving to Amazon RDS, the BI tool has used only the read endpoints.
6. Solutions Characteristics
- Cross-regional replication or another cross-regional DR setup.
- First, snapshots of the primary DB were taken.
- Then, those snapshots were used to create the secondary DB in another Region in order to provide cross-Region DR.
- Segregation of master instance and read replicas among AZ’s
- We implemented at least two read replicas for each of their database instances, in different AZs.
- We made sure the master instance and read replicas reside in different Availability Zones to prevent issues during failover.
- AWS KMS is used in the database for the encryption of data at rest.
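The replica placement above can be sketched with boto3's `create_db_instance_read_replica`; the instance identifiers and AZ name are illustrative:

```python
def replica_params(source_id, replica_id, availability_zone):
    """Parameters for rds.create_db_instance_read_replica, pinning the
    replica to a specific Availability Zone so it never shares an AZ
    with the master instance."""
    return {
        "SourceDBInstanceIdentifier": source_id,
        "DBInstanceIdentifier": replica_id,
        "AvailabilityZone": availability_zone,
    }
```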
- Migration to Amazon RDS from a non-AWS environment.
- Before AWS, they were using Datahub (on-premise), a local data center facility provider in Nepal.
- Performance and cost-effective architecture.
- To remove downtime, Multi-AZ is implemented in the architecture, and read replicas are used for fast read capacity, which reduces the load on the primary database.
- A proper notification system was created using CloudWatch and SNS to flag any performance issues so that we can take proper action to resolve them.
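One such notification path can be sketched as a CloudWatch alarm on RDS CPU that publishes to an SNS topic; the instance name, topic ARN, and 80% threshold are assumptions for illustration:

```python
def cpu_alarm_params(instance_id, sns_topic_arn, threshold=80.0):
    """Parameters for cloudwatch.put_metric_alarm: notify the SNS topic
    when the instance's average CPU stays above the threshold for two
    consecutive five-minute periods."""
    return {
        "AlarmName": f"{instance_id}-high-cpu",
        "Namespace": "AWS/RDS",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 2,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }
```

Requiring two consecutive breaching periods avoids paging on short CPU spikes.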
7. Solution Complexity:
- Initial size of the database
- The initial size of the database they are using is 40 GB.
- Expected yearly growth of the database
- It is expected to grow by 5 GB of data every year.
- Numbers of tables in the database.
- There are a total of 10 tables in the database.
- Anticipated number of concurrent requests during peak use of the database
- They anticipate a total of 1,000 user requests during peak use of the database.
- Anticipated percentage of read operations against the entire database during peak usage.
- They anticipate that almost 70% of operations against the entire database during peak usage are reads.