This post's content
- Step 1: Understand Your Database Usage
- Step 2: Optimize your schema and remove redundant indexes
- Step 3: Vacuum?
- Step 4: Optimize top slow queries using automatic tools
- Step 5: Understand if your current instance is fully utilized
- Step 6: Optimize Your Database Configuration
- Step 7: Amazon RDS Reserved Instances? Up to 73% discount on a 3-year commitment
- Step 8: Use Database Compression (In case you are looking to save on storage)
- Step 9: Ongoing Cost Monitoring
- Summary
PostgreSQL is one of the most popular databases among developers, with a growing number of companies choosing it as their main relational production database. In the 2021 Stack Overflow Developer Survey, PostgreSQL was the fourth most commonly used database among developers, after MySQL, SQL Server, and Oracle. And today, in 2023, it is the most popular database among professional developers, with around 46% reporting that they use it.
As more and more companies collect data and have more users, the cost of their PostgreSQL databases is getting higher. According to a study by ParkMyCloud, the average RDS cost as a percentage of total AWS spend was approximately 15%.
Step 1: Understand Your Database Usage
Before you can optimize your PostgreSQL costs, you need to understand how you're using the database. There are several ways to analyze your database usage, such as:
- Monitoring resource utilization and analyzing query patterns using tools like EverSQL, CloudWatch, AWS Performance Insights, Datadog, and more.
- Understanding storage size using the AWS console or by running the following command in PostgreSQL:
SELECT pg_size_pretty(pg_database_size('your_database_name'));
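To see which tables account for most of that space, you can also run a quick per-table breakdown (a sketch using the standard statistics views; pg_total_relation_size includes indexes and TOAST data):
-- Ten largest user tables, including their indexes and TOAST data
SELECT relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;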
- Understanding database CPU and query time using the AWS console or by running the following command in PostgreSQL:
SELECT pid, datname, usename, state, query,
       round(EXTRACT(EPOCH FROM (now() - query_start))::numeric, 2) AS duration_sec
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY duration_sec DESC NULLS LAST
LIMIT 10;
This query shows the ten longest-running statements that are currently active. pg_stat_activity does not report CPU time directly, but long-running statements are usually the first place to look for queries that need optimization.
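For cumulative time spent across all executions, rather than only the statements running right now, the pg_stat_statements extension is the usual tool, if it is installed and preloaded. A minimal sketch (on PostgreSQL 13 and later the timing columns are total_exec_time and mean_exec_time; older versions call them total_time and mean_time):
-- Top 10 statements by total execution time since the stats were last reset
SELECT query,
       calls,
       round(total_exec_time::numeric / 1000, 2) AS total_sec,
       round(mean_exec_time::numeric, 2) AS mean_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;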
Here are a few more PostgreSQL statistics views you can query to gather more information about database usage:
- pg_stat_database: This view displays statistics about database activity, including the number of connections, committed and rolled-back transactions, and blocks read from disk versus found in the cache.
SELECT datname, numbackends, xact_commit, xact_rollback, blks_read, blks_hit FROM pg_stat_database;
Sample output:
   datname   | numbackends | xact_commit | xact_rollback | blks_read | blks_hit
-------------+-------------+-------------+---------------+-----------+----------
 postgres    |           1 |           5 |             0 |        35 |    10749
 mydatabase  |           1 |           0 |             0 |         0 |        3
- pg_stat_activity: This view displays information about the currently executing SQL statements, including the username, database name, and the SQL text.
SELECT datname, usename, query FROM pg_stat_activity;
Sample output:
   datname   | usename  |                     query
-------------+----------+------------------------------------------------
 postgres    | postgres | SELECT datname, numbackends, xact_commit, xact
 mydatabase  | myuser   | SELECT COUNT(*) FROM mytable WHERE mycolumn = 1
- pg_stat_user_tables: This view displays statistics about table activity, including sequential and index scans and the number of rows inserted, updated, and deleted.
SELECT schemaname, relname, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch FROM pg_stat_user_tables;
Sample output:
 schemaname |   relname    | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch
------------+--------------+----------+--------------+----------+---------------
 public     | mytable      |       12 |          382 |      121 |           253
 public     | myothertable |       10 |          187 |       31 |            77
Step 2: Optimize your schema and remove redundant indexes
Optimizing your schema and indexes is critical to keeping your PostgreSQL costs under control. By creating a streamlined schema and removing redundant indexes, you can significantly reduce disk space usage and I/O operations, which can have a major impact on your PostgreSQL cost.
Here are some best practices for optimizing your schema:
- Normalize your database schema: Normalizing your database schema can reduce redundant data and minimize data duplication. This can help reduce the overall size of your database and improve query performance.
Here's an example of normalizing a database schema. Let's say we have a table called orders with the following columns: order_id (primary key), customer_name, customer_email, item_name, item_price, item_quantity, and order_date.
Now, let's say we have a requirement to store shipping addresses for each order. We could simply add the following columns to the orders table:
ship_to_name, ship_to_street_address, ship_to_city, ship_to_state, and ship_to_zipcode
However, this would violate the principle of database normalization, which states that each piece of data should only be stored in one place.
Instead, we could create a new table called order_shipments with the following columns:
order_id (foreign key referencing the orders table), ship_to_name, ship_to_street_address, ship_to_city, ship_to_state, and ship_to_zipcode
This would allow us to store shipping addresses for each order without duplicating data or violating normalization principles. By following normalization principles, we can avoid redundant data and make our schema more efficient and flexible.
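As a sketch, the resulting schema might look like this (the column types are illustrative assumptions, since the example above lists only column names):
-- Orders keep only order-level data (column types are assumptions)
CREATE TABLE orders (
    order_id       serial PRIMARY KEY,
    customer_name  text NOT NULL,
    customer_email text NOT NULL,
    item_name      text NOT NULL,
    item_price     numeric(10,2) NOT NULL,
    item_quantity  integer NOT NULL,
    order_date     date NOT NULL
);

-- Shipping details live in their own table, referencing orders
CREATE TABLE order_shipments (
    order_id               integer PRIMARY KEY REFERENCES orders(order_id),
    ship_to_name           text NOT NULL,
    ship_to_street_address text NOT NULL,
    ship_to_city           text NOT NULL,
    ship_to_state          text NOT NULL,
    ship_to_zipcode        text NOT NULL
);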
- Remove unused indexes: Unused indexes take up valuable disk space and add write overhead on every insert and update. Use the following query to list indexes that have never been used since statistics were last reset (on PostgreSQL 16 and later, pg_stat_user_indexes also exposes a last_idx_scan timestamp you can filter on):
SELECT schemaname, relname, indexrelname, idx_scan FROM pg_stat_user_indexes WHERE idx_scan = 0;
Here is an example output of this query:
 schemaname | relname | indexrelname | idx_scan
------------+---------+--------------+----------
 public     | mytable | idx1         |        0
 public     | mytable | idx2         |        0
(2 rows)
- Remove redundant indexes: Redundant indexes also take up valuable disk space and cause unnecessary I/O operations. You can use tools like the EverSQL Index Advisor, which analyzes your schema and recommends redundant indexes to drop, or check for duplicate index definitions yourself with a catalog query like the sketch below. Deleting these indexes can lower your storage cost and reduce CPU usage.
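As a rough sketch, the following catalog query groups indexes that share the same table, key columns, expressions, and predicate; any group with more than one entry is a set of likely duplicates (it deliberately ignores operator classes, so review each group before dropping anything):
-- Groups of indexes with identical definitions on the same table
SELECT pg_size_pretty(sum(pg_relation_size(idx))::bigint) AS total_size,
       array_agg(idx::regclass) AS duplicate_indexes
FROM (
    SELECT indexrelid AS idx,
           indrelid::text || ' ' || indkey::text || ' ' ||
           coalesce(indexprs::text, '') || ' ' || coalesce(indpred::text, '') AS signature
    FROM pg_index
) AS candidates
GROUP BY signature
HAVING count(*) > 1;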
- Use partial indexes: If you have large tables with lots of data, you can use partial indexes to create smaller, more efficient indexes. Partial indexes only index a subset of the data in a table, which can help reduce disk space usage and I/O operations. Here is an example of creating a partial index:
CREATE INDEX my_table_partial_idx ON my_table (column1) WHERE column2 = true;
By optimizing your schema and indexes, you can reduce your PostgreSQL costs and improve your database performance.
Step 3: Vacuum?
While VACUUMing your database won't directly reduce your PostgreSQL cost, it can help optimize the database and improve its performance, which can indirectly lower your costs. Here are a few ways you can use VACUUM to optimize your database:
- Free up space: When a row is deleted, its space isn't immediately freed up. Instead, it becomes available for future insertions or updates. However, as the number of deleted rows increases, it can lead to table bloat and negatively affect the database's performance. Running VACUUM can help reclaim the space used by deleted rows and free up space in the table.
- Reduce the need for autovacuum: PostgreSQL has an autovacuum process that runs automatically in the background and performs similar tasks as VACUUM. However, it can be resource-intensive and impact the performance of the database. By running VACUUM manually, you can reduce the need for autovacuum and potentially improve the overall performance of the database.
- Improve query performance: When a table is vacuumed, it can update statistics used by the PostgreSQL query planner, which can help improve query performance. By running VACUUM, you can ensure that your query planner has the most up-to-date statistics and can optimize queries more effectively.
To run VACUUM, you can use the following command:
VACUUM [FULL] [FREEZE] [VERBOSE] [ANALYZE] [table_name]
Here are some of the parameters you can use:
- FULL: Rewrites the entire table into a new file, reclaiming space and returning it to the operating system. It requires an exclusive lock on the table and can take much longer than a plain vacuum.
- FREEZE: Aggressively freezes old row versions, which helps prevent transaction ID wraparound; this matters most for very large, frequently updated tables.
- ANALYZE: Updates statistics used by the query planner.
- VERBOSE: Outputs detailed progress information while vacuuming.
- table_name: The name of the table to vacuum. If omitted, it will vacuum all tables in the database.
Here's an example of how you can use VACUUM:
VACUUM FULL VERBOSE ANALYZE;
This command will perform a full vacuum on all tables in the database and provide verbose output of the process. The VERBOSE option will display progress messages as the command progresses, giving the developer more insight into the status of the vacuuming process.
It's worth noting that running a VACUUM command on a large database can take a long time and put a heavy load on the system, and VACUUM FULL additionally takes an exclusive lock on each table while it is rewritten, blocking reads and writes. It's generally recommended to schedule vacuuming during periods of low activity to minimize impact on users and the database itself. Additionally, some hosting providers offer managed PostgreSQL services that handle vacuuming and other maintenance tasks automatically, which can help reduce costs associated with manual maintenance.
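To decide which tables are worth vacuuming first, you can check dead-row counts in the standard statistics views. A minimal sketch:
-- Tables with the most dead rows, i.e. the strongest VACUUM candidates
SELECT schemaname, relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;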
Step 4: Optimize top slow queries using automatic tools
Optimizing slow queries is crucial to improving the performance and reducing the cost of your PostgreSQL database. One way to do this is by using automatic tools that can help identify and optimize slow queries. One such tool is EverSQL, which provides an AI-powered database optimization platform that can analyze and optimize slow queries automatically.
By using EverSQL, developers can gain valuable insights into their slow queries, including query execution time, estimated cost, and recommendations for optimization. The tool offers a user-friendly interface that allows developers to easily analyze their queries and optimize them with just a few clicks.
Moreover, EverSQL also provides an API that can be integrated into your development workflow, making it easier to optimize queries automatically as part of your continuous integration and deployment (CI/CD) pipeline. With the API, you can automate the process of identifying and optimizing slow queries, which can save you time and effort in the long run.
One of the main advantages of using EverSQL is that it can help you optimize queries that would otherwise be difficult to optimize manually. For example, queries that involve multiple joins or complex subqueries can be challenging to optimize manually. However, with EverSQL, you can get automated recommendations for optimizing these queries, which can help you improve their performance significantly.
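Whatever tool you use, the underlying signal is the query's execution plan. If you want to inspect one manually, EXPLAIN (ANALYZE, BUFFERS) is the standard starting point; the sketch below reuses the hypothetical orders and order_shipments tables from Step 2:
-- Run the query and report the actual plan, timings, and buffer usage
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.order_id, s.ship_to_city
FROM orders o
JOIN order_shipments s ON s.order_id = o.order_id
WHERE o.order_date >= now() - INTERVAL '30 days';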
Step 5: Understand if your current instance is fully utilized
One important aspect of reducing your PostgreSQL costs is to ensure that your RDS instance is properly sized. If your instance is oversized, you could be paying for unused resources, while an undersized instance could result in poor performance.
To determine if your instance is properly sized, you can use several AWS tools. The first one is the RDS Console, where you can see the metrics of your instance's CPU, memory, and storage usage. By monitoring these metrics, you can identify if your instance is over or underutilized.
Another AWS tool that can help you determine if your instance is properly sized is Amazon CloudWatch. With CloudWatch, you can monitor your RDS instance's performance metrics, such as CPU utilization, network traffic, and disk usage. By analyzing these metrics, you can identify if your instance is operating at peak efficiency and adjust its size accordingly.
Lastly, AWS offers the Performance Insights feature, which is a dashboard that displays a comprehensive view of your instance's performance, such as top SQL queries, waits, and system metrics. By using Performance Insights, you can get a deeper understanding of how your instance is being used and identify areas for optimization.
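Alongside the AWS metrics, a quick in-database signal of whether memory is sized to your working set is the buffer cache hit ratio. A sketch (values well below about 0.99 on a read-heavy workload often suggest the instance has too little memory, though interpret it in the context of your workload):
-- Buffer cache hit ratio per database
SELECT datname,
       blks_read,
       blks_hit,
       round(blks_hit::numeric / NULLIF(blks_hit + blks_read, 0), 4) AS cache_hit_ratio
FROM pg_stat_database
WHERE blks_hit + blks_read > 0;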
Step 6: Optimize Your Database Configuration
Optimizing your database configuration can help you get better performance and reduce costs. In Amazon RDS, you can find your PostgreSQL configuration by selecting the relevant instance or cluster, opening the Configuration tab, and editing the associated parameter group.
Here are a few configuration parameters that you should pay attention to:
- shared_buffers - This parameter determines the amount of memory used by the database for caching data. It is recommended to set this value to at least 25% of the available memory on your server. For example, if you have a server with 16GB of RAM, you should set shared_buffers to 4GB.
To view the current value of shared_buffers, you can run the following command in your PostgreSQL console:
SHOW shared_buffers;
- effective_cache_size - This parameter tells the query planner how much memory is available for caching data, counting both shared_buffers and the operating system cache; it does not allocate memory itself. It is recommended to set this value to at least 50% of the available memory on your server. For example, if you have a server with 16GB of RAM, you should set effective_cache_size to 8GB.
To view the current value of effective_cache_size, you can run the following command in your PostgreSQL console:
SHOW effective_cache_size;
- work_mem - This parameter determines the amount of memory used for sorts, hashes, and similar operations. It is recommended to set this value to at least 64MB, and you may need to increase it if you have complex queries or large data sets. Keep in mind that work_mem is allocated per operation, not per session, so a single complex query can consume several multiples of it.
To view the current value of work_mem, you can run the following command in your PostgreSQL console:
SHOW work_mem;
- max_connections - This parameter determines the maximum number of concurrent connections to the database. It is recommended to set this value based on the maximum number of connections you expect to have at any given time. However, keep in mind that increasing the value of max_connections also increases the amount of memory used by PostgreSQL.
To view the current value of max_connections, you can run the following command in your PostgreSQL console:
SHOW max_connections;
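To review all of these settings in one place, you can also query pg_settings, a read-only view of the current configuration:
-- Current values and units for the parameters discussed above
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'effective_cache_size', 'work_mem', 'max_connections');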
Step 7: Amazon RDS Reserved Instances? Up to 73% discount on a 3-year commitment
Amazon RDS Reserved Instances (RIs) can help save on cloud costs by committing to a certain usage over a period of time. RIs offer significant discounts, up to 73% compared to on-demand instance pricing for a three-year commitment, making them a cost-effective option for long-term usage.
For example, let's say an organization is running a PostgreSQL RDS instance on-demand for a year at $1,000 per month, a total of $12,000 for the year. If the organization instead purchases a one-year RDS Reserved Instance, it can save up to 40% on the instance cost. In this case, the cost of the RI would be roughly $7,200, resulting in a savings of about $4,800 for the year.
In addition to one-year RIs, Amazon also offers three-year RIs for PostgreSQL RDS instances, which provide even greater discounts of up to 73% compared to on-demand pricing. The upfront cost for a three-year RI is higher, but the total savings over the three-year period can be significant.
It's important to note that RIs are best suited for long-term usage, as the cost savings are realized over time. Organizations should analyze their usage patterns and commit to an RI if they have consistent and predictable usage for the duration of the RI.
To purchase an RI, users can go to the Amazon RDS console and select "Reserved Instances" from the left navigation pane. They can then choose the region and the PostgreSQL RDS instance type for which they want to purchase the RI. Users can choose between one-year and three-year terms and select the payment option that works best for them.
You can also use AWS Cost Explorer to estimate how much Reserved Instances will save you.
Step 8: Use Database Compression (In case you are looking to save on storage)
Database compression is a technique that reduces the size of a database by compressing the data stored in it. This can result in significant cost savings, as smaller databases require less storage and can often be read from disk more quickly. In core PostgreSQL, compression happens at the column-value level: large values in variable-length columns (text, jsonb, bytea, and so on) are compressed transparently through the TOAST mechanism. Broader techniques such as page-level or columnar compression are available through extensions and PostgreSQL-derived products rather than in core PostgreSQL itself.
TOAST compression kicks in when a row grows beyond a threshold (roughly 2KB with the default page size): PostgreSQL compresses large values and, if necessary, moves them out of line into a separate TOAST table. The algorithm is pglz by default; on PostgreSQL 14 and later you can choose lz4, which typically compresses and decompresses faster, either per column or through the default_toast_compression setting.
Because compression applies per column value, it is most effective for tables with one or more columns that contain large, repetitive, or highly compressible data, such as JSON documents, logs, or long text fields. Compressing these columns can reduce the size of the database and the amount of I/O needed to read it.
To determine whether database compression can help reduce costs for your specific use case, you can use the AWS Cost Explorer to analyze the cost of storing your data in uncompressed vs. compressed formats. Additionally, you can use the pgstattuple module in PostgreSQL to analyze the size of your database and determine which tables and columns are taking up the most space.
To control compression in PostgreSQL, you can set a column's storage mode (EXTENDED and MAIN allow TOAST compression, while PLAIN and EXTERNAL disable it) and, on PostgreSQL 14 and later, pick the compression algorithm per column with ALTER TABLE ... ALTER COLUMN ... SET COMPRESSION. Additionally, PostgreSQL-derived products like Postgres-XL or Greenplum offer compression at a more granular level.
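As a minimal sketch (assuming PostgreSQL 14 or later, a server build with lz4 support, and a hypothetical events table with a large payload column), here is how you might switch a column to lz4 and then verify what is actually stored:
-- Switch the column to lz4 compression (PostgreSQL 14+);
-- only values written after this change use the new algorithm
ALTER TABLE events ALTER COLUMN payload SET COMPRESSION lz4;

-- Check which algorithm was used for the values currently stored
SELECT pg_column_compression(payload) AS algorithm, count(*)
FROM events
GROUP BY 1;
A NULL algorithm simply means the value was small enough that it was never compressed.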
It's important to note that while compression can help to reduce storage costs, it can also increase CPU usage and may have an impact on query performance. Therefore, it's important to carefully evaluate your specific use case and performance requirements before enabling compression.
Although not a lot of developers know about compression, it can be quite effective in reducing database storage and improving performance. The effectiveness of compression in reducing database size depends on the type of data being stored and the compression technique being used. In some cases, compression can reduce database size by over 50%. For example, a customer at Crunchy Data reported a reduction in database size from 45TB to 19TB after enabling compression.
Compression can also improve query performance, as smaller databases can be read from disk more quickly. However, there is a trade-off between compression and CPU usage, as compressed data requires more CPU cycles to compress and decompress.
To check which compression method is configured for a table's columns, you can query pg_attribute in PostgreSQL 14 and later (an empty attcompression value means the column falls back to the default_toast_compression setting, which is pglz unless changed):
SELECT attname, CASE attcompression WHEN 'p' THEN 'pglz' WHEN 'l' THEN 'lz4' ELSE 'default' END AS compression FROM pg_attribute WHERE attrelid = '<table_name>'::regclass AND attnum > 0 AND NOT attisdropped;
This will display the compression setting for each column of the specified table, for example:
 attname | compression
---------+-------------
 id      | default
 payload | pglz
Step 9: Ongoing Cost Monitoring
As a final step, it's important to establish ongoing checks and balances to make sure that your PostgreSQL database is optimized for cost efficiency. This involves monitoring the system regularly to identify any issues that may be causing costs to rise.
One effective tool for monitoring your PostgreSQL database is the EverSQL Sensor. This tool uses machine learning to identify high load and high-cost queries and provides developers with actionable insights on how to optimize their queries for better performance and cost efficiency. By using the EverSQL Sensor, you can identify issues before they become too expensive and take action to fix them.
In addition to the EverSQL Sensor, there are several other tools you can use to monitor your PostgreSQL database and keep your costs under control. For example, AWS provides a range of monitoring tools, such as CloudWatch, that can help you track database metrics and identify issues. You can also set up alerts to notify you when specific metrics exceed certain thresholds, allowing you to take action quickly.
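Alongside these tools, a few lightweight SQL checks run on a schedule can catch cost drivers early. For example, growing temporary file usage in pg_stat_database usually means queries are spilling to disk because work_mem is too small for them. A sketch:
-- Temporary file usage and deadlocks per database since the stats were last reset
SELECT datname,
       temp_files,
       pg_size_pretty(temp_bytes) AS temp_size,
       deadlocks
FROM pg_stat_database
WHERE datname IS NOT NULL;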
Another important aspect of ongoing checks and balances is to regularly review your database configuration and schema to ensure that they are still optimized for cost efficiency. This involves periodically revisiting the previous steps outlined in this guide, such as analyzing slow queries and optimizing database configuration, to make sure that your database is still running efficiently and cost-effectively.
By implementing these ongoing checks and balances, you can ensure that your PostgreSQL database remains optimized for cost efficiency over the long term, reducing the risk of unexpected cost spikes and keeping your AWS bills under control.
Summary
You will be surprised how much you can save on your PostgreSQL RDS cost. The most common way to cut PostgreSQL RDS cost is to optimize how the database is used and to apply cost-saving measures such as Reserved Instances and compression. This involves understanding the usage patterns of your database, optimizing your schema and indexes, tuning your configuration, optimizing slow queries, right-sizing your instance, and monitoring cost and performance on an ongoing basis. By implementing these measures, you can reduce the number of unnecessary resources used by your PostgreSQL RDS instance, resulting in significant cost savings.
Start by using AWS Cost Explorer and by deploying EverSQL Sensor to better understand your cost reduction opportunities.