Why not Envers?
Envers is an auditing extension of Hibernate, following the mantra that Java developers should code in Java and not think in terms of the database. Java developers are shielded from the relational features of the database; SQL is evil because it leads to vendor lock-in (when was the last time you switched the database of an existing system?), and the ORM pretends that what you load from the database is objects. Performance often suffers, the framework at times does not support features databases have had since the late 1980s, and you have to fight its leaky abstractions and bugs. But back to Envers.
Envers comes with limitations that emerge from its very nature. It tracks the Java entities changed within the transaction and then issues inserts into the auditing tables and the revinfo table. For us, the main problem with this approach is its performance:
- The number of resulting SQL statements is too high. Every insert/update/delete triggers another statement run against the database. These statements need to be passed to the database over the wire, increasing the duration of the transaction.
- Bulk updates and deletes are not supported. Operations such as 'DELETE FROM foo' are not auditable because Envers has no way to know which entities were affected. If you want auditing, you have to update every entity separately, leading to potentially many updates and auditing inserts instead of one simple statement.
Auditing with database triggers
Auditing
First, let's define what we mean by auditing. We need to audit for regulatory reasons, so we need to know who (which customer, not database user) and when performed what operation. For those tables we need to audit we defined auditing tables which have the same columns as original tables.
Generic Trigger
First, we created a generic database trigger that inserts the row being modified into the auditing table. There are a few things to note here:
- Auditing table for foo is foo_aud.
- We used the PostgreSQL trigger variables TG_TABLE_NAME, OLD, NEW, and TG_OP.
- The query is by design dynamic and must be run with 'EXECUTE ... USING ...'.
- The construct '$1.column_name' references a column in a record. The record is the first parameter to the dynamic query and is either the NEW or the OLD trigger variable. You don't want to know how long it took me to figure this one out :).
- The PostgreSQL transaction ID is used to find the corresponding row in the revinfo table - see the section Who and when below.
- Along with the columns of the original table, the audited table has a varchar 'op' column and an integer 'rev' column, containing the operation (insert, update, delete) and the rev from revinfo, respectively.
The following script assigns the trigger:
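For example, for a table 'foo' (a sketch; 'audit_row' stands for whatever the generic trigger function is called):

```sql
-- Assign the generic auditing trigger to table foo (audited into foo_aud)
CREATE TRIGGER foo_audit
    AFTER INSERT OR UPDATE OR DELETE ON foo
    FOR EACH ROW EXECUTE PROCEDURE audit_row();
```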
Element collections
Auditing of updates to element collections works out of the box. The respective auditing tables need to have the two columns (op, rev) and the trigger must be assigned.
Who and when
The original goal was to know which user made the change and when. We ripped off Envers and introduced a table like its revinfo table. It contains one row for every transaction that modifies an audited table, together with the who and the when: the application user, the time, and the transaction ID. The transaction ID allows us to link auditing entries to a revinfo row. We cannot use the transaction ID directly as the key, though; it is unique only within one PostgreSQL installation, so if we migrate the database to a different server, old txid entries might clash with its txid sequence. With our solution, we only need to set txid to null and then migrate the database.
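A table along these lines would do (a sketch; the actual column names and types are assumptions):

```sql
-- Sketch of a revinfo-like table
CREATE TABLE revinfo (
    rev      bigserial PRIMARY KEY,  -- referenced by the rev column of the _aud tables
    txid     bigint,                 -- txid_current() of the writing transaction;
                                     -- nullable so it can be cleared before a migration
    username varchar(255) NOT NULL,  -- the application user, not the database user
    modified timestamptz NOT NULL DEFAULT now()
);
```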
The problem that remains is inserting a row into this table for every write transaction. Ideally, we would insert it only when a transaction modifies an audited entity. So we added a marker annotation, @AuditedEntity, to mark the audited entities, and we registered a listener on the very same events Envers does. The listener then inserts a row into the revinfo table when a transaction performs the first write operation on an audited entity.
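The "first write operation only" bookkeeping can be sketched as follows. This is a simplified, Hibernate-free sketch: the class name RevInfoRegistry and the opaque transaction key are assumptions; the real listener works against Hibernate's Session, events, and Transaction objects.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch of the per-transaction bookkeeping: the revinfo row must be
 * inserted exactly once, at the first write to an audited entity.
 * The transaction key stands in for the Hibernate Transaction object.
 */
public class RevInfoRegistry {

    private final Map<Object, Boolean> seen = new ConcurrentHashMap<>();

    /** Returns true exactly once per transaction key, i.e. when the
     *  listener has to perform the revinfo insert. */
    public boolean register(Object transactionKey) {
        return seen.putIfAbsent(transactionKey, Boolean.TRUE) == null;
    }

    /** Forget the transaction on commit/rollback. */
    public void afterCompletion(Object transactionKey) {
        seen.remove(transactionKey);
    }
}
```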
A few points to note:
- The insert must happen before the entity change due to the referential integrity from the auditing tables to the revinfo table.
- We need to perform the revinfo insert with session.doWork(). A simple session.createSQLQuery() followed by query.executeUpdate() would cause Hibernate to flush the dirty entities.
- HQL/SQL bulk updates do not trigger Hibernate events, hence we must register the transaction manually using the register(SessionImpl session) method.
- It is plain evil to use internal Hibernate classes like SessionImpl, but we need methods from both the SessionImplementor and Session interfaces. Those are combined in EventSource, but that is not available at the point of registering a transaction manually; there we only have the Session. It all boils down to SessionImpl anyway, so we bit the bullet and made the cast. We know we will suffer when we upgrade to Hibernate 5.
The listener is wired into Hibernate through a 'ServiceContributingIntegrator'.
Conclusion
Our approach performs auditing on the database level. The application only needs to call one insert per write transaction to store user and time information under the transaction ID. The approach supports auditing of SQL/HQL bulk updates.
We haven't done any rigorous performance test yet. But we do generate data for the application performance test; it results in inserting approximately 100k rows. The data generation took 50 seconds with Envers auditing and takes only 35 seconds with database triggers, so we are looking at an improvement of roughly 30%.
Have we overlooked anything? Any suggestions are welcome.
The approach speaks for itself. I agree that Hibernate Envers may be good for prototypes, but audit trailing with triggers always performs better and takes the overhead out of the application itself.
Similarly, do you have a generic script for generating the create table scripts for the audit tables?
I don't. We used manually written DDL scripts.
Thank you for sharing this approach. I found the stored procedure implementation particularly helpful, and the use of the transaction ID is a good solution to an otherwise difficult problem.
The one thing I'm wondering about, that maybe you've already considered: we have an application that has used Envers for a while. We use the Envers query API for things like history reporting, and I'd like not to rewrite all that code. So instead of removing Envers completely, I wonder if it might be possible to implement a custom AuditStrategy and just change our configuration (the org.hibernate.envers.audit_strategy setting) to use the new strategy. The custom AuditStrategy might just need to extend DefaultAuditStrategy and do nothing in perform and performCollectionChange (letting the triggers do all the heavy lifting).
My big concern is making sure the triggers write out the data in the same way that the query API expects, although if it doesn't then perhaps some of the other AuditStrategy methods could be overridden to handle that.
Any thoughts on this approach?
Hi Allen,
We used it only for auditing, since we were obliged to audit changes. The requests to check who did what and when were so rare that we did not implement anything to view the history.
But the approach is heavily based on Envers and even features the 'revinfo' table so I imagine your extension may work just right. I wonder what would happen for 'delete from ...' or 'update ... set' queries but I guess the code for retrieval of history shouldn't care about that.
Since I was rather late to answer: have you succeeded in implementing that?
What does the 'AuditedEntity' annotation look like for this code sample? Any code snippets would be helpful, please.
Thank you
Hi Anna,
It is a simple marker annotation:
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
public @interface AuditedEntity {
}
Thank you so much. Have a good day!!
Hello,
My team is researching auditing to track data changes. We researched and tried out some samples using Hibernate Envers, and then I came across this article and wanted to evaluate an alternative to Envers. We use Postgres as our database, and Postgres has its own audit strategy with one giant table for all the actions that happen across all tables (we are currently using this in other projects and are not completely happy), so we are not sure yet if we want to go that route.
Under the "A few points to note" section, bullet point #4: what kind of problems do you foresee when the Hibernate upgrade happens?
Thank you.
Hi Anna,
At the time of writing, two years ago, Hibernate 5 was not out yet. Using internal implementation classes can cause trouble in the future, but now that Hibernate 5 is out, you can just go and see for yourself whether they changed anything relevant and possibly broke the solution; then you would need to find some other way. But chances are there was no breaking change and the solution still works as is.