Thursday, November 13, 2014

Envers: it's slower but at least it doesn't work in some cases

Why not Envers?

Envers is an auditing extension of Hibernate, following the mantra that Java developers should code in Java and not think in terms of the database. Java developers are shielded from the relational features of the database; SQL is evil because it leads to vendor lock-in (when was the last time you switched the database of an existing system?), and the ORM pretends that what you load from the database is objects. Performance often suffers, the framework at times does not support features databases have had since the late 1980s, and you have to fight its leaky abstraction and bugs. But back to Envers.

Envers comes with limitations that emerge from its very nature. It tracks the Java entities changed within a transaction and then issues inserts into the auditing tables and the revinfo table. For us, the main problem with this approach is performance:

  • The number of resulting SQL statements is too high. Every insert/update/delete triggers another statement run against the database. These statements need to be passed to the database over the wire, increasing the duration of the transaction.
  • Bulk updates and deletes are not supported. Operations such as 'DELETE FROM foo' are not auditable because Envers has no way of knowing which entities were affected. If you want auditing you have to update every entity separately, leading to potentially many updates and auditing inserts instead of one simple statement.
It should be said that when Envers knows about the entities it works correctly, so for most applications it can be good enough. We used Envers in the prototyping phase of our project, but it made it hard for us to handle high loads. Hence, we investigated those relational features we are told to avoid: we put the database to work and replaced Envers with database triggers. After some back and forth, it took us four days to get to a state we are happy with. In this article I want to share our solution for PostgreSQL.

Auditing with database triggers

Auditing

First, let's define what we mean by auditing. We need to audit for regulatory reasons, so we need to know who (which customer, not which database user) performed what operation and when. For each table we need to audit, we defined an auditing table with the same columns as the original table.

Generic Trigger

First, we created a generic database trigger that inserts the row being modified into the auditing table:
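A minimal sketch of such a trigger function follows. The function name audit_row and the information_schema lookup are our assumptions; it assumes each audited table foo has an auditing table foo_aud with the same columns (in the same order) plus op and rev at the end:

```sql
-- Sketch of a generic audit trigger function (names and column-list
-- lookup are assumptions; adapt to your schema).
CREATE OR REPLACE FUNCTION audit_row() RETURNS TRIGGER AS $$
DECLARE
  rec  RECORD;
  cols TEXT;
BEGIN
  -- For deletes we audit the old row, otherwise the new one.
  IF TG_OP = 'DELETE' THEN
    rec := OLD;
  ELSE
    rec := NEW;
  END IF;

  -- Build the '$1.column_name' list for the audited table's columns.
  SELECT string_agg('$1.' || quote_ident(column_name), ', '
                    ORDER BY ordinal_position)
    INTO cols
    FROM information_schema.columns
   WHERE table_schema = TG_TABLE_SCHEMA
     AND table_name   = TG_TABLE_NAME;

  -- The query is dynamic, so it must be run with EXECUTE ... USING;
  -- $1 is the record, $2 the operation, and rev comes from revinfo
  -- via the current transaction ID.
  EXECUTE 'INSERT INTO ' || quote_ident(TG_TABLE_NAME || '_aud')
       || ' SELECT ' || cols || ', $2,'
       || ' (SELECT r.rev FROM revinfo r WHERE r.txid = txid_current())'
    USING rec, TG_OP;

  RETURN rec;
END;
$$ LANGUAGE plpgsql;
```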
There are a few things to note here:

  • The auditing table for foo is foo_aud.
  • We used the PostgreSQL trigger variables: TG_TABLE_NAME, OLD, NEW, TG_OP.
  • The query is by design dynamic and must be run with 'EXECUTE ... USING ...'.
  • The construct '$1.column_name' references a column in a record. The record is the first parameter to the dynamic query and is either the NEW or the OLD trigger variable. You don't want to know how long it took me to figure this one out :).
  • The PostgreSQL transaction ID is used to find the corresponding row in the revinfo table - see the section Who and when below.
  • Along with the columns of the original table, the auditing table has a varchar 'op' column and an integer 'rev' column, containing the operation (insert, update, delete) and the rev from revinfo, respectively.

The following script assigns the trigger:
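A sketch of the assignment, for a single audited table foo (trigger and function names are our assumptions):

```sql
-- Assign the generic audit trigger to an audited table foo.
CREATE TRIGGER foo_audit_trigger
AFTER INSERT OR UPDATE OR DELETE ON foo
FOR EACH ROW EXECUTE PROCEDURE audit_row();
```

The same statement is repeated (or generated) for every audited table.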

Element collections

Auditing of updates to element collections works out of the box. The respective auditing tables need to have the two columns (op, rev) and the trigger must be assigned.

Who and when

The original goal was to know which user made the change and when. We ripped off Envers and introduced a table like the revinfo table. It contains one row for every transaction that modifies an audited table, together with the who and the when:
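The table might look roughly like this (column names and types are assumptions based on the surrounding description):

```sql
-- Assumed shape of the revinfo table.
CREATE TABLE revinfo (
  rev      BIGSERIAL PRIMARY KEY,            -- referenced by the auditing tables
  username VARCHAR(255) NOT NULL,            -- the application user (who)
  created  TIMESTAMP NOT NULL DEFAULT now(), -- when
  txid     BIGINT                            -- txid_current(); nulled on migration
);
```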
The table contains information about the application user, the time and the transaction ID. The transaction ID allows us to link auditing entries to a revinfo row. We cannot use the transaction ID directly, as it is unique only within one PostgreSQL installation: if we migrate the database to a different server, old txid entries might clash with its txid sequence. With our solution, we only need to set txid to null and then migrate the database.

The problem that remains is inserting a row into this table for every write transaction. Ideally, we would insert it only when a transaction modifies an audited entity. So we added a marker annotation, @AuditedEntity, to mark the audited entities, and we registered a listener on the very same events Envers uses. The listener then inserts a row into the revinfo table when the transaction performs its first write operation on an audited entity.
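A sketch of such a listener, using Hibernate 4's pre-insert/update/delete events for illustration (the class name RevInfoListener and the helpers alreadyRegistered() and currentUser() are our assumptions):

```java
// Sketch of the revinfo listener (Hibernate 4 APIs; names are ours).
public class RevInfoListener implements PreInsertEventListener,
        PreUpdateEventListener, PreDeleteEventListener {

    @Override
    public boolean onPreInsert(PreInsertEvent event) {
        registerIfAudited(event.getEntity(), event.getSession());
        return false; // never veto the operation
    }

    @Override
    public boolean onPreUpdate(PreUpdateEvent event) {
        registerIfAudited(event.getEntity(), event.getSession());
        return false;
    }

    @Override
    public boolean onPreDelete(PreDeleteEvent event) {
        registerIfAudited(event.getEntity(), event.getSession());
        return false;
    }

    private void registerIfAudited(Object entity, EventSource session) {
        if (entity.getClass().isAnnotationPresent(AuditedEntity.class)) {
            register((SessionImpl) session);
        }
    }

    /** Inserts the revinfo row once per transaction; also called manually before bulk updates. */
    public void register(SessionImpl session) {
        if (alreadyRegistered(session.getTransaction())) {
            return;
        }
        // doWork() runs on the transaction's JDBC connection without flushing dirty entities.
        session.doWork(connection -> {
            try (PreparedStatement ps = connection.prepareStatement(
                    "INSERT INTO revinfo (username, created, txid) "
                  + "VALUES (?, now(), txid_current())")) {
                ps.setString(1, currentUser());
                ps.executeUpdate();
            }
        });
    }

    private boolean alreadyRegistered(Transaction tx) {
        // hypothetical per-transaction bookkeeping, e.g. a transaction synchronization
        return false;
    }

    private String currentUser() {
        // hypothetical: resolve the application user, e.g. from Spring Security
        return "unknown";
    }
}
```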
A few points to note:

  • The insert must happen before the entity change because of the referential integrity from the auditing tables to the revinfo table.
  • We need to perform the revinfo insert with session.doWork(). A simple session.createSQLQuery() followed by query.executeUpdate() would cause Hibernate to flush the dirty entities.
  • HQL/SQL bulk updates do not trigger Hibernate events, hence we must register the transaction manually using the register(SessionImpl session) method.
  • It is plain evil to use internal Hibernate classes like SessionImpl, but we need methods from both the SessionImplementor and Session interfaces. Those are combined in EventSource, but that is not available at the point of registering a transaction manually; there we only have the Session. It all boils down to SessionImpl anyway, so we bit the bullet and made the cast. We know we will suffer when we upgrade to Hibernate 5.
We register the events in a ServiceContributingIntegrator:
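A sketch of the integrator (the class name AuditIntegrator is ours; the exact prepareServices() signature varies across Hibernate 4.x versions):

```java
// Sketch: register the listener for pre-insert/update/delete events (Hibernate 4 API).
public class AuditIntegrator implements ServiceContributingIntegrator {

    @Override
    public void prepareServices(ServiceRegistryBuilder builder) {
        // nothing to contribute
    }

    @Override
    public void integrate(Configuration configuration,
                          SessionFactoryImplementor sessionFactory,
                          SessionFactoryServiceRegistry serviceRegistry) {
        EventListenerRegistry registry =
                serviceRegistry.getService(EventListenerRegistry.class);
        RevInfoListener listener = new RevInfoListener(); // our revinfo listener (name assumed)
        registry.appendListeners(EventType.PRE_INSERT, listener);
        registry.appendListeners(EventType.PRE_UPDATE, listener);
        registry.appendListeners(EventType.PRE_DELETE, listener);
    }

    @Override
    public void integrate(MetadataImplementor metadata,
                          SessionFactoryImplementor sessionFactory,
                          SessionFactoryServiceRegistry serviceRegistry) {
        // not used in our setup
    }

    @Override
    public void disintegrate(SessionFactory sessionFactory,
                             SessionFactoryServiceRegistry serviceRegistry) {
        // nothing to clean up
    }
}
```

Hibernate discovers integrators via the standard service-loader mechanism, i.e. a META-INF/services/org.hibernate.integrator.spi.Integrator file listing the class.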

Conclusion

Our approach performs auditing at the database level. The application only needs to issue one insert per write transaction to store the user and time information under the transaction ID. The approach supports auditing of SQL/HQL bulk updates.

We haven't done any rigorous performance tests yet. But we do generate data for the application performance test; it results in inserting approximately 100k rows. The data generation took 50 seconds with Envers auditing; it takes only 35 seconds with database triggers, so we are looking at an improvement of 30%.

Have we overlooked anything? Any suggestions are welcome.

Monday, May 26, 2014

Selenium in Sonar code coverage metrics

Goal

  • Run Selenium GUI tests as part of the build on Jenkins.
  • Collect code usage data to be able to report code coverage on Sonar.

Project Layout

  • Cloudex-parent
    • Core module - services, daos
    • GUI module - for instance Wicket web GUI
    • Test module - Selenium integration tests

Configuration

There are two relevant parts of the Maven config - the parent POM, which defines the JaCoCo offline instrumentation, and the test POM, which covers deployment to an embedded Tomcat and running the tests.

Parent POM
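A sketch of the JaCoCo offline-instrumentation setup in the parent POM (the version is illustrative):

```xml
<!-- Sketch: JaCoCo offline instrumentation (instrument before tests,
     restore the original classes afterwards). -->
<plugin>
  <groupId>org.jacoco</groupId>
  <artifactId>jacoco-maven-plugin</artifactId>
  <version>0.7.1.201405082137</version>
  <executions>
    <execution>
      <id>instrument</id>
      <goals>
        <goal>instrument</goal>
      </goals>
    </execution>
    <execution>
      <id>restore</id>
      <goals>
        <goal>restore-instrumented-classes</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```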


Test POM
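The test POM essentially starts the application in an embedded Tomcat before the integration tests and runs the Selenium suite with Failsafe. A sketch (plugin choice and versions are assumptions):

```xml
<!-- Sketch: embedded Tomcat around the Selenium integration tests. -->
<plugin>
  <groupId>org.apache.tomcat.maven</groupId>
  <artifactId>tomcat7-maven-plugin</artifactId>
  <version>2.2</version>
  <executions>
    <execution>
      <id>start-tomcat</id>
      <phase>pre-integration-test</phase>
      <goals><goal>run-war</goal></goals>
      <configuration>
        <fork>true</fork>
      </configuration>
    </execution>
    <execution>
      <id>stop-tomcat</id>
      <phase>post-integration-test</phase>
      <goals><goal>shutdown</goal></goals>
    </execution>
  </executions>
</plugin>
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-failsafe-plugin</artifactId>
  <configuration>
    <suiteXmlFiles>
      <suiteXmlFile>testng.xml</suiteXmlFile>
    </suiteXmlFiles>
  </configuration>
  <executions>
    <execution>
      <goals>
        <goal>integration-test</goal>
        <goal>verify</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```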

Just a few more points:
  • The config module contains a build folder with a /lib folder holding the configuration which needs to be in Tomcat.
  • We need to include a jacoco-agent.properties file in the /lib folder from the previous point. It should contain at least this single line: destfile=${sonar.jacoco.itReportPath} (see offline instrumentation).
  • Selenium tests which should run are defined in the testng.xml.
  • We have a dedicated database on an external server; it is migrated during the build, and the deployed application runs against it.
  • The tests run in Firefox on a headless system (the Jenkins master) thanks to Xvfb (X virtual frame buffer), so the Jenkins server must have Firefox and Xvfb installed.

Tuesday, July 30, 2013

Non-transactional tests with Spring and H2

The general paradigm for testing a Spring-based application back-end is to use a sort of unit testing which loads the whole application (the application context) and performs certain tests on it. Spring provides all sorts of helper classes, among others AbstractTestNGSpringContextTests and AbstractTransactionalTestNGSpringContextTests. I don't know when that decision happened, but we always used the transactional one, hence every test method ran in a separate transaction which was rolled back after the method. It seems to me that such an approach is prevalent in current development, yet it is also flawed.

I bet you have encountered cases when the test cannot be transactional: you want to test that your fix for a race condition works, that the locking on the DB really does prevent another user from accessing something, etc. So you write a non-transactional test class, annotate it with the Spring annotation @DirtiesContext(classMode = ClassMode.AFTER_EACH_TEST_METHOD) and you're done. A few extra context reloads are not a big deal at first. But then you write tens of non-transactional tests and all of a sudden the test build takes a long time.

That happened to us and made us think: do we really need to recreate the application context so often? The only reason we do it is that the DB is polluted by the non-transactional tests' data. I started investigating. We use an in-memory H2 for our tests, and I found that H2 has a wonderful feature which lets you drop its content easily. All you need is the following method in your test parent class:
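A sketch, assuming a TestNG parent class with an autowired JdbcTemplate; H2's DROP ALL OBJECTS statement does the heavy lifting:

```java
// Sketch: wipe the in-memory H2 database between tests (TestNG).
@AfterMethod(alwaysRun = true)
public void cleanDatabase() {
    // H2-specific: drops all tables, sequences, views, ...
    jdbcTemplate.execute("DROP ALL OBJECTS");
    // then re-run the schema and test-data scripts, e.g.:
    // jdbcTemplate.execute("RUNSCRIPT FROM 'classpath:schema.sql'");
}
```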




So no more context recreation. It sped our test build up from 1:30 to 55 seconds, woohoo. But then a heretical thought began to creep into our brains. What if we didn't need to wrap each test method in a transaction? What if we don't gain much speed there? What if we actually overlook a bug here and there because our tests don't work the same way the client works with the application?

We gave it a try and made all tests non-transactional. The build takes the same time as before, as populating and dropping the in-memory DB is apparently not an issue; we would do the same in separate transactions anyway. The tests now mimic exactly the use cases we want to test, as often many transactions happen before the app reaches a certain state.

We found three bugs straight away. I hope you find some too.

Thursday, January 3, 2013

Non-blocking server push with Atmosphere, Wicket and Spring Security

The previous blog post described how to make Wicket, Spring Security and Atmosphere work together. However, it covered only one part of the story - the blocking transport techniques (e.g. long polling). We tried to enable a non-blocking protocol and instantly ran into several problems, emerging mainly from the necessity to use the Spring Security filter chain along with the Wicket filter and the Atmosphere servlet.

The problems were:
  • Spring Security authentication did not work with NIO (non-blocking) connector on Tomcat. My guess is that the filter was not applied.
  • AtmosphereServlet is configurable via atmosphere.xml, where you define AtmosphereHandlers. The problem is that you cannot define multiple filters as you would in web.xml.
  • AtmosphereFramework has a regexp which is used to map the request path to AtmosphereHandlers mapped to /*. It does not match underscores, which is a problem because the default form-login processing URL of Spring Security (j_spring_security_check) contains them.
Let's see how we dealt with these problems.

First of all, install the newest version of Tomcat 7, as the non-blocking support was developed rather recently and may not work properly in older versions. We used Tomcat 7.0.34.

Modify the HTTP connector in server.xml
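For example (the port and the other attributes are illustrative; the point is the NIO protocol class):

```xml
<!-- server.xml: switch the HTTP connector to the NIO protocol -->
<Connector port="8080"
           protocol="org.apache.coyote.http11.Http11NioProtocol"
           connectionTimeout="20000"
           redirectPort="8443" />
```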

Atmosphere will initiate the non-blocking transport only if you include a context.xml in META-INF or exclude atmosphere-compat-tomcat7 from your classpath. We encountered a problem with the Spring Security SessionFixationProtectionStrategy, as request.getSession(true) did not return a new session, but this was fixed by using a newer version of the Atmosphere runtime. The respective part of our pom.xml looks like this:
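Something along these lines (the version is illustrative; the point is a recent runtime with atmosphere-compat-tomcat7 excluded):

```xml
<dependency>
  <groupId>org.atmosphere</groupId>
  <artifactId>atmosphere-runtime</artifactId>
  <version>1.0.10</version>
  <exclusions>
    <!-- excluded so Atmosphere uses the native Tomcat 7 non-blocking support -->
    <exclusion>
      <groupId>org.atmosphere</groupId>
      <artifactId>atmosphere-compat-tomcat7</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```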
Then we need to define the two filters inside the AtmosphereHandler defined in atmosphere.xml. We decided to do it programmatically. Here is our atmosphere.xml:
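Roughly like this (the handler class and package are ours):

```xml
<atmosphere-handlers>
  <atmosphere-handler context-root="/*"
                      class-name="com.example.CustomAtmosphereHandler"/>
</atmosphere-handlers>
```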

And the CustomAtmosphereHandler looks like this:

CustomAtmosphereHandler was inspired by Atmosphere's ReflectorServletProcessor. It creates both the Spring Security and Wicket filters programmatically. In the case of Wicket, it loads the application in the same manner as org.apache.wicket.spring.SpringWebApplicationFactory.
A custom FilterConfig gives us control over the init parameters. Interestingly, the Wicket-Atmosphere integration uses the filter mapping path from the init params instead of the filter path set on the WicketFilter, thus we need to add the init param FILTER_MAPPING_PARAM. We decided to do it this way because otherwise we would have the filter configuration scattered over CustomAtmosphereHandler and web.xml.

Our web.xml is then fairly simple:
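"Fairly simple" here means essentially just the Spring context and the AtmosphereServlet; a sketch (listener and servlet config abbreviated, names of our context files omitted):

```xml
<web-app>
  <listener>
    <listener-class>org.springframework.web.context.ContextLoaderListener</listener-class>
  </listener>

  <servlet>
    <servlet-name>AtmosphereServlet</servlet-name>
    <servlet-class>org.atmosphere.cpr.AtmosphereServlet</servlet-class>
    <load-on-startup>0</load-on-startup>
  </servlet>
  <servlet-mapping>
    <servlet-name>AtmosphereServlet</servlet-name>
    <url-pattern>/*</url-pattern>
  </servlet-mapping>
</web-app>
```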

Now one last finishing touch :) We need to alter the default Spring Security path so that it is mapped to our CustomAtmosphereHandler, so modify the form-login line in your spring-security.xml:
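For instance, a processing URL without underscores (the exact paths are your choice):

```xml
<!-- spring-security.xml: avoid underscores so the /* AtmosphereHandler regexp matches -->
<form-login login-page="/login"
            login-processing-url="/springsecuritycheck" />
```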

Here is a link to a sample application: http://dl.dropbox.com/u/71731051/push-test.zip

Thursday, December 20, 2012

Apache Wicket + Spring Security + Atmosphere server push

We recently started using Apache Wicket for the front end of one of our new applications. The work on the prototype went fine, well, actually it was a pleasant surprise after two years with JSF.

We have decided to dive into more advanced topics like performance, clustering and server push, as we know they'll come up along the way and we want to be sure we made the right decision by picking Wicket.

I spent the last couple of days with my colleague figuring out the configuration of Apache Wicket together with Spring Security and the Atmosphere framework to allow for server push.

Server push is a pretty slick feature of modern web applications which allows the client to subscribe to certain events on the server; the server then notifies the client (the page, in Wicket's case) when such an event occurs. The page refreshes without user interaction, which comes in handy for feeds, messages, etc. There are several transfer technologies in play - starting with long polling and streaming and ending with WebSockets and SSE. As usual, support varies from browser to browser, so we can't, for instance, use only WebSockets: they are supported very well by Firefox and Chrome, but IE supports them only from version 10 (see http://caniuse.com/#search=websocket). Hence we decided to give a try to Atmosphere - a framework developed by Jeanfrancois Arcand which claims to mitigate these issues for you. Wicket provides some experimental integration for Atmosphere (wicket-atmosphere, version 0.5), so let's give it a try.

 (I assume you already have an application using Maven, Spring, Spring Security and Apache Wicket)
pom.xml




Add atmosphere.xml to webapp/META-INF.
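Ours defines the WicketFilter via Atmosphere's ReflectorServletProcessor; roughly like this (the property name may differ between Atmosphere versions):

```xml
<atmosphere-handlers>
  <atmosphere-handler context-root="/*"
                      class-name="org.atmosphere.handler.ReflectorServletProcessor">
    <property name="filterClassName"
              value="org.apache.wicket.protocol.http.WicketFilter"/>
  </atmosphere-handler>
</atmosphere-handlers>
```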

The web.xml should look like this:

A few points about the web.xml:

  • We need to provide a custom Broadcaster to Atmosphere, as the broadcast happens in a different thread; the SecurityContext is not set on that thread, so broadcasting results in an unauthenticated error and hence a redirect to the login page :)
  • The definition of the WicketFilter in atmosphere.xml does not allow us to add init-params. However, when the AtmosphereFramework sets up the filters, it provides them with the servlet context, so we can specify applicationFactoryClassName and applicationClassName and it will just work :)

The SpringSecurityAwareBroadcaster looks like this:

Now the application should start and we can use wicket-atmosphere EventBus and @Subscribe to get it working.

In your WicketApplication.java add to the init method:
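The wicket-atmosphere EventBus just needs to be created with the application; we keep it in a field so we can post to it later:

```java
// in WicketApplication.init() - create the wicket-atmosphere EventBus
// (org.apache.wicket.atmosphere.EventBus, kept in a field of the application)
eventBus = new EventBus(this);
```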


Now you can subscribe to events in a page by adding a method like:
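For example (the component and the Message event class are ours):

```java
// A page method subscribed to Message events posted to the EventBus.
@Subscribe
public void receiveMessage(AjaxRequestTarget target, Message message) {
    messageLabel.setDefaultModelObject(message.getText());
    target.add(messageLabel); // push the update to the browser
}
```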


Now let's trigger some events. We have a separate Maven module (let's call it the core module) for services and DB access which is not aware of Wicket's existence, thus we needed to get events from the services to the EventBus somehow. Here Google Guava's com.google.common.eventbus.EventBus comes in handy.

All you need is to add an EventBus bean definition to the spring-context.xml of the core module. It can then be autowired into services; in our example we would have a MessageServiceImpl like:
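Assuming a bean such as `<bean id="eventBus" class="com.google.common.eventbus.EventBus"/>` in spring-context.xml, the service could look like this (class names are ours; the persistence step is elided):

```java
// Sketch: the core-module service posts domain events to the Guava EventBus.
@Service
public class MessageServiceImpl implements MessageService {

    @Autowired
    private EventBus eventBus; // com.google.common.eventbus.EventBus

    @Override
    public void submitMessage(String text) {
        Message message = new Message(text); // hypothetical domain object
        // ... persist the message ...
        eventBus.post(message); // notify subscribers in other modules
    }
}
```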


Then we have to register a handler with the Guava EventBus which posts the events on to the Atmosphere EventBus. Go to WicketApplication.java and add these few lines:
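A sketch of those lines, where guavaEventBus is the core module's Guava bus (e.g. autowired) and eventBus is the wicket-atmosphere bus created in init():

```java
// Bridge: forward Message events from the Guava EventBus to wicket-atmosphere.
guavaEventBus.register(new Object() {
    @com.google.common.eventbus.Subscribe
    public void onMessage(Message message) {
        eventBus.post(message); // pushes to all subscribed pages
    }
});
```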


Now you can, for instance, schedule a cron job which submits a new message every now and then and observe it popping up in the application.

One would think that that is all. Well, not really. We found out that if you let the session expire, it blows up. Long story short: Wicket has its WicketSession, which has a session ID. Our login page is a Wicket page, so this session ID is set from the first HTTP session, created when the user enters the page. Spring Security has a SessionFixationProtectionStrategy which invalidates this HttpSession and creates a new one with the authentication in its onAuthentication method. Hence the WicketSession ID no longer matches the HttpSession ID. That was not a problem until we started with Atmosphere.

The Wicket-Atmosphere EventBus manages subscriptions to events. It also unsubscribes the client when the session expires (in the sessionUnbound method). A subscription is identified by a page and by a session ID. I guess you sense the disruption in the Source here ;). The problem is that the client subscribes under the WicketSession ID, which is the ID of the first HttpSession. When the EventBus unsubscribes in sessionUnbound() it uses the second HttpSession ID, hence it does not unsubscribe the client and keeps sending events to that subscription.

We handled this problem by adding a RequestCycleListener in the WicketApplication. This listener removes the old HttpSession from the WicketSession and binds the new HttpSession to it.


The Listener looks like this:



Well, we will be happy to hear your comments.