ActiveMQ is often a critical component in Enterprise systems, and therefore High Availability (HA) is a “must have” in order to meet production Service Level Agreements (SLA). This blog aims at providing a deployment architecture based on a set of Apache ActiveMQ brokers wired up to a SQL database.
This blog targets Apache ActiveMQ “classic” as opposed to Apache ActiveMQ Artemis.
Failover architecture for high availability
What are we trying to achieve?
A single instance of ActiveMQ can receive and deliver a very high volume of messages. It is very easy to increase performance and handle more messages by simply growing the server running ActiveMQ (vertical scalability). Of course, it’s also possible to use horizontal scalability based on the network of brokers deployment. We’ll look at this deployment methodology in another article.
Let’s assume that you can easily serve messages with a single instance, and you now want to make sure your messaging service is not a single point of failure for your production environment. You will need another (slave) instance to be ready to take the lead in order to accept and deliver your messages.
You may also have another production site to do disaster recovery and provide high availability to your system.
Failover fits very well in these cases: requests are served by a master, while one or multiple slave servers stand ready to take over when the master goes down for one reason or another.
On the other hand, failover isn’t the right approach to horizontally scale your system or to implement an active/active messaging service.
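On the client side, the failover transport complements this architecture: a connection automatically reconnects to whichever broker holds the lock. A minimal sketch as a Spring-style bean, assuming two brokers named brokerA and brokerB (the hostnames are placeholders):

```xml
<!-- Client connection factory; hostnames are placeholders -->
<bean id="connectionFactory" class="org.apache.activemq.ActiveMQConnectionFactory">
  <!-- randomize=false makes clients try the URLs in order instead of at random -->
  <property name="brokerURL"
            value="failover:(tcp://brokerA:61616,tcp://brokerB:61616)?randomize=false"/>
</bean>
```

When the master dies and a slave acquires the lock, clients reconnect transparently without application-level retry logic.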
Shared SQL store versus KahaDB store
Apache ActiveMQ relies on a data store to persist your messages. This avoids flooding the broker’s memory when you produce more messages than you can consume, and it ensures that unconsumed messages are not lost even if the server goes down.
ActiveMQ has some built-in support for various data stores. The default and most common one is KahaDB which is a file-based data store. It is optimized for high throughput. When messages aren’t consumed as fast as they are produced, or some of them stay unconsumed, or messages aren’t equally balanced across destinations, it may generate disk issues (see this article: https://www.tomitribe.com/blog/kahadb-logs-increasing-when-messages-are-purged/).
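For comparison, the default KahaDB store is declared in activemq.xml roughly like this (the directory path is illustrative; adjust it to your installation):

```xml
<persistenceAdapter>
  <!-- File-based store; the directory path is a placeholder -->
  <kahaDB directory="${activemq.data}/kahadb"/>
</persistenceAdapter>
```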
In the proposed failover architecture, we will use a SQL database instead. It is a bit slower at high throughput, but it avoids the file-related issues you may encounter with KahaDB, because the database carefully optimizes its disk operations.
In the context of the proposed failover high availability, it’s also easier to rely on SQL databases to share the data across multiple servers. Additionally, the database can be inspected with standard tools, and organizations are usually more confident with a SQL Store because they have expertise already.
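The #mariadb-ds datasource referenced by the persistence adapter later in this post can be declared as a bean in activemq.xml. A hedged sketch, assuming MariaDB Connector/J and placeholder connection details:

```xml
<!-- MariaDB datasource; URL and credentials are placeholders -->
<bean id="mariadb-ds" class="org.mariadb.jdbc.MariaDbDataSource">
  <property name="url" value="jdbc:mariadb://db-host:3306/activemq"/>
  <property name="user" value="activemq"/>
  <property name="password" value="activemq"/>
</bean>

<persistenceAdapter>
  <!-- The broker creates its tables on first run when createTablesOnStartup is true -->
  <jdbcPersistenceAdapter dataSource="#mariadb-ds" createTablesOnStartup="true"/>
</persistenceAdapter>
```

Because every broker in the failover pair points at the same datasource, they share the same message tables and the same lock.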
ActiveMQ essentially relies on three main tables:
- ACTIVEMQ_MSGS: this is where the data is stored. All destinations are in the same table, and ActiveMQ will perform insertion and deletion based on producer/consumer activities.
- ACTIVEMQ_ACKS: this table tracks durable subscribers and their acknowledgements; if you have no durable subscribers, it may remain empty.
- ACTIVEMQ_LOCK: this table stores the lock so that a slave can become the master at any time. It essentially holds a timestamp along with the name of the current master broker.
Using the previous database structure allows you to know at any time which server is the master. You can also monitor the broker name column to figure out when the master changes.
Key differences between default database locker and lease database locker
Apache ActiveMQ has two lock strategies to implement the master/slave mechanism.
The Database Locker is the default locker for a JDBC store. This locker opens a JDBC transaction against a database table (activemq_lock) that lasts as long as the broker remains alive. This locks the entire table and prevents another broker from accessing the store. In most cases, this will be a fairly long-running JDBC transaction that occupies resources on the database over time.
A problem with this locker arises when the master broker crashes or loses its connection to the database: the lock remains held until the database detects the half-closed socket via a TCP timeout, and until that happens, the slave cannot acquire the lock and start. In addition, if the database supports failover and the connection is dropped during a replica failover, the JDBC transaction is rolled back. The broker sees this as a failure, and both master and slave brokers will again compete for the lock.
The configuration looks like this:

<persistenceAdapter>
  <jdbcPersistenceAdapter dataSource="#mariadb-ds" lockKeepAlivePeriod="10000">
    <locker>
      <database-locker lockAcquireSleepInterval="5000"/>
    </locker>
  </jdbcPersistenceAdapter>
</persistenceAdapter>
The Lease Database Locker was created to solve the shortcomings of the Database Locker. The Lease Database Locker does not open a long-running JDBC transaction. Instead, it lets the master broker acquire a lock that’s valid for a fixed (usually short) duration, after which it expires. To retain the lock, the master broker must periodically extend the lock’s lease before it expires. Simultaneously, the slave broker checks periodically to see if the lease has expired. If, for whatever reason, the master broker fails to update its lease on the lock, the slave will take ownership of the lock, becoming the new master in the process. The leased lock can survive a DB replica failover.
Like many things in ActiveMQ, you can implement your own locking strategy by implementing the Locker interface.
The configuration now looks like this:

<persistenceAdapter>
  <jdbcPersistenceAdapter dataSource="#mariadb-ds" lockKeepAlivePeriod="10000">
    <locker>
      <lease-database-locker lockAcquireSleepInterval="5000"/>
    </locker>
  </jdbcPersistenceAdapter>
</persistenceAdapter>
A Docker-based example is available at https://github.com/tomitribe/activemq-failover-jdbc with all running instructions.
For bigger deployments or more demanding SLAs, a network of brokers is probably a better fit. The failover approach is simpler and may already solve your high availability problems.