- Use one virtual host per project/environment on a RabbitMQ instance. Using the default `/` virtual host is perfectly fine if there is only one project.
- Delete the `guest` user on PROD. Use a separate user per application.
- `vm_memory_high_watermark.relative = 0.66`: RabbitMQ will not accept any new messages when it detects that it is using more than 66% of the available memory (recommended). Nodes hosting RabbitMQ should have at least 256 MiB of memory available at all times.
- `disk_free_limit.relative = 2.0`: If available disk space drops below 2x RAM, all publishers are blocked and no new messages are accepted. This is the most conservative production value.
- Increase the limit on concurrently open files to at least 50k. As a rule of thumb, multiply the 95th percentile number of concurrent connections by 2 and add the total number of queues to calculate the recommended open file handle limit. Values as high as 500K are reasonable, will not consume a lot of hardware resources, and are therefore recommended for production setups.
- Collect and aggregate logs.
- Use a small number of long-lived connections. Minimize the number of connections and channels used. Don't open and close connections or channels repeatedly.
- Keep `RABBITMQ_ERLANG_COOKIE` secure and secret.
- Keep queues as short as possible.
- Optimally, you should have as many queues as cores.
- Use an odd number of nodes, typically 3.
- Partition handling strategy: when in doubt, use `autoheal`. On EC2, typically use `pause_minority`.
- Use NTP to keep clocks in sync.
- Restrict ports to the minimum necessary.
- (HA) Queue mirroring: Replicate queues on all mirrors.
- Datadog RabbitMQ monitoring.
- The recommended metric collection interval is 15 seconds. To collect at an interval closer to real time, use 5 seconds, but not lower.
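
The sizing rules in the checklist (memory watermark, disk free limit, open file handle limit) are simple arithmetic, so a small sketch may help when applying them to a concrete node. The function names below are hypothetical helpers, not part of RabbitMQ or any client library, and the 50,000 floor is an assumption taken from the "at least 50k" guidance above.

```python
def recommended_open_file_limit(p95_connections: int,
                                total_queues: int,
                                floor: int = 50_000) -> int:
    """Rule of thumb: 2 x (95th percentile concurrent connections) + queues.

    The result is never allowed to drop below `floor` (assumed 50k minimum).
    """
    estimate = 2 * p95_connections + total_queues
    return max(estimate, floor)


def memory_watermark_bytes(total_ram_bytes: int, relative: float = 0.66) -> int:
    """Memory usage (in bytes) at which the relative high watermark is reached."""
    return int(total_ram_bytes * relative)


def disk_free_limit_bytes(total_ram_bytes: int, relative: float = 2.0) -> int:
    """Free disk space (in bytes) below which publishers would be blocked."""
    return int(total_ram_bytes * relative)


if __name__ == "__main__":
    GIB = 1024 ** 3
    ram = 8 * GIB  # hypothetical node with 8 GiB of RAM

    # Hypothetical workload: 40,000 connections at the 95th percentile, 2,000 queues.
    print("open file limit:", recommended_open_file_limit(40_000, 2_000))
    print("memory watermark (bytes):", memory_watermark_bytes(ram))
    print("disk free limit (bytes):", disk_free_limit_bytes(ram))
```

For the hypothetical workload above, the rule of thumb gives 2 x 40,000 + 2,000 = 82,000 file handles, and a node with 8 GiB of RAM would need 16 GiB of free disk to stay above the `disk_free_limit.relative = 2.0` threshold.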