Why not Envers?
Envers is an auditing extension of Hibernate, following the mantra that Java developers should code in Java and not think in terms of the database. Java developers are shielded from the relational features of the database; SQL is evil because it leads to vendor lock-in (when was the last time you switched the database of an existing system?), and the ORM pretends that what you load from the database are objects. Performance often suffers, the framework at times does not support features databases have had since the late 1980s, and you have to fight its leaky abstraction and bugs. But back to Envers.
Envers comes with limitations that emerge from its very nature. It tracks the Java entities changed within a transaction and then issues inserts into the auditing tables and the revinfo table. For us, the main problem with this approach is its performance:
- The number of resulting SQL statements is too high. Every insert/update/delete triggers another statement run against the database. These statements need to be passed to the database over the wire, increasing the duration of the transaction.
- Bulk updates and deletes are not supported. Operations such as 'DELETE FROM foo' are not auditable because Envers has no way of knowing which entities were affected. If you want auditing, you have to update every entity separately, leading to potentially many updates and auditing inserts instead of one simple statement.
Auditing with database triggers
Auditing
First, let's define what we mean by auditing. We audit for regulatory reasons, so we need to know who (which customer, not which database user) performed which operation, and when. For each table we need to audit, we defined an auditing table with the same columns as the original table.
Generic Trigger
First, we created a generic database trigger that inserts the row being modified into the auditing table; a sketch of the trigger function follows the notes below. There are a few things to note here:
- The auditing table for foo is foo_aud.
- We used the PostgreSQL trigger variables TG_TABLE_NAME, OLD, NEW, and TG_OP.
- The query is by design dynamic and must be run with 'EXECUTE ... USING ...'.
- The construct '$1.column_name' references a column in a record. The record is the first parameter to the dynamic query and is either the NEW or the OLD trigger variable. You don't want to know how long it took me to figure this one out :).
- The PostgreSQL transaction ID is used to find the corresponding row in the revinfo table - see the section "Who and when" below.
- Along with the columns of the original table, the auditing table has a varchar 'op' and an integer 'rev' column, containing the operation (insert, update, delete) and the rev from revinfo respectively.
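A minimal sketch of such a trigger function, assuming an AFTER row trigger and an auditing table whose columns are the original columns followed by op and rev (the real function enumerates the columns via '$1.column_name'; the sketch takes a shortcut with ($1).*):

```sql
-- Sketch only: assumes foo_aud has foo's columns first, then op and rev.
CREATE OR REPLACE FUNCTION audit_row() RETURNS trigger AS $$
DECLARE
    audited_row RECORD;
BEGIN
    -- For a DELETE the modified row is in OLD, otherwise in NEW.
    IF TG_OP = 'DELETE' THEN
        audited_row := OLD;
    ELSE
        audited_row := NEW;
    END IF;
    -- The query is dynamic by design, so it must be run with EXECUTE ... USING.
    -- The rev is looked up in revinfo via the current transaction ID.
    EXECUTE 'INSERT INTO ' || TG_TABLE_NAME || '_aud '
         || 'SELECT ($1).*, $2, '
         || '(SELECT rev FROM revinfo WHERE txid = txid_current())'
        USING audited_row, TG_OP;
    RETURN NULL; -- the return value of an AFTER row trigger is ignored
END;
$$ LANGUAGE plpgsql;
```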
A script then assigns the trigger to every audited table.
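A minimal sketch of such an assignment for a single table foo (the trigger name is an assumption):

```sql
-- Illustrative: attach the generic auditing trigger to table foo.
CREATE TRIGGER foo_audit
AFTER INSERT OR UPDATE OR DELETE ON foo
FOR EACH ROW EXECUTE PROCEDURE audit_row();
```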
Element collections
Auditing of updates to element collections works out of the box. The respective auditing tables need to have the two columns (op, rev) and the trigger must be assigned.
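For instance, for a hypothetical element collection table foo_tags(foo_id, tag), the setup could look like this (all names are illustrative):

```sql
-- Illustrative: auditing table and trigger for an element collection table.
CREATE TABLE foo_tags_aud (
    foo_id bigint,
    tag    varchar(255),
    op     varchar(10),                    -- INSERT / UPDATE / DELETE
    rev    bigint REFERENCES revinfo (rev) -- see the next section
);
CREATE TRIGGER foo_tags_audit
AFTER INSERT OR UPDATE OR DELETE ON foo_tags
FOR EACH ROW EXECUTE PROCEDURE audit_row();
```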
Who and when
The original goal was to know which user made a change and when. We ripped off Envers and introduced a table like its revinfo table. It contains one row for every transaction that modifies an audited table, together with the who and the when: the application user, the time, and the transaction ID. The transaction ID allows us to link auditing entries to a revinfo row, but we cannot use it as the key directly, because it is unique only within one PostgreSQL installation; if we migrated the database to a different server, old txid entries might clash with that server's txid sequence. With our solution, we only need to set txid to null and then migrate the database.
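A sketch of such a table, with assumed column names and types:

```sql
-- Illustrative sketch of the revinfo table; names and types are assumptions.
CREATE TABLE revinfo (
    rev       bigint PRIMARY KEY,     -- revision key referenced by the *_aud tables
    user_name varchar(255) NOT NULL,  -- the application user (customer), not the DB user
    ts        timestamp NOT NULL,     -- when the transaction performed its first write
    txid      bigint                  -- txid_current() of the transaction; nullable
);
```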
The problem that remains is inserting a row into this table for every write transaction. Ideally, we would insert it only when a transaction modifies an audited entity. So we added @AuditedEntity to mark the audited entities, and we registered a listener on the very same events Envers does. The listener then inserts a row into the revinfo table when a transaction performs its first write operation on an audited entity.
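The statement itself could be a simple insert along these lines (a sketch assuming the revinfo columns above and a hypothetical revinfo_seq sequence; the user name is bound as a JDBC parameter):

```sql
-- Illustrative: issued once per write transaction through session.doWork().
INSERT INTO revinfo (rev, user_name, ts, txid)
VALUES (nextval('revinfo_seq'), ?, now(), txid_current());
```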
A few points to note:
- The insert must happen before the entity change, due to the referential integrity from the auditing tables to the revinfo table.
- We need to perform the revinfo insert with session.doWork(). A simple session.createSQLQuery() followed by query.executeUpdate() would cause Hibernate to flush the dirty entities.
- HQL/SQL bulk updates do not trigger Hibernate events, hence we must register the transaction manually using the register(SessionImpl session) method.
- It is plain evil to use internal Hibernate classes like SessionImpl, but we need methods from both the SessionImplementor and Session interfaces. Those are combined in EventSource, but that is not available at the point of registering a transaction manually; there we only have the Session. It all boils down to SessionImpl anyway, so we bit the bullet and made the cast. We know we will suffer when we upgrade to Hibernate 5.
The listener is plugged into Hibernate through a ServiceContributingIntegrator.
Conclusion
Our approach performs the auditing at the database level. The application only needs to issue one insert per write transaction, to store the user and time information under the transaction ID. The approach also supports auditing of SQL/HQL bulk updates.
We haven't done any rigorous performance testing yet. But we do generate data for the application performance test; it results in inserting approximately 100k rows. The data generation took 50 seconds with Envers auditing and takes only 35 seconds with database triggers, so we are looking at an improvement of 30%.
Have we overlooked anything? Any suggestions are welcome.