Hibernate has extensive caching support built in (well, provided by various plugins, actually). Caching lets you avoid potentially expensive database operations, especially in smaller apps (i.e., non-clustered ones, which can actually grow quite large).
Most people who use Hibernate caching are used to having object instances cached. Hibernate can also cache query results. However, there are some things you need to be aware of when you do this.
Some background (which may be familiar to you; if so, skip it): Hibernate uses two levels of cache. The first level, the “session cache”, belongs to the currently open session. Objects “loaded” from the database live here: load the same object again, and you get another reference to it, instead of a second copy. This object is yours until you give it up. The session cache is always used and cannot be disabled.
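To make the “another reference, not a second copy” behaviour concrete, here is a toy identity map in plain Java. It is only a sketch of the idea: @ToySession@ and @load@ are made-up names for illustration, not Hibernate classes, and a real session does far more than this.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy model only: a hypothetical stand-in for a session-scoped cache.
final class ToySession {
    // id -> the single instance this session has handed out for that id
    private final Map<Long, Object> loaded = new HashMap<>();

    // The loader stands in for a database read; it runs only on the
    // first request for a given id within this session.
    Object load(Long id, Function<Long, Object> readFromDatabase) {
        return loaded.computeIfAbsent(id, readFromDatabase);
    }
}
```

Loading the same id twice through one @ToySession@ returns the same reference; a second @ToySession@ gets its own instance.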
Optionally, you can use a second-level cache, often referred to as just “the cache”, since there are very few subtleties to the session cache. When I refer to a Hibernate cache, it’s the second-level cache I mean, unless I explicitly say “session cache”. Using the second-level cache is a Good Thing(tm) generally. Hibernate has a “number”:http://www.hibernate.org/hib_docs/v3/reference/en/html/performance.html#performance-cache-readonly “of”:http://www.hibernate.org/hib_docs/v3/reference/en/html/performance.html#performance-cache-readwrite “strategies”:http://www.hibernate.org/hib_docs/v3/reference/en/html/performance.html#performance-cache-nonstrict for using the second-level cache, which generally revolve around how likely it is that someone else has changed the data in the database.
If you are not using a second-level cache, then stop reading. From here on in I assume you are.
Most times, when you load an object from the database, Hibernate first checks whether an object of the right type with that id is already in the cache. If so, it gives you a copy of it (you don’t get a direct reference as that could cause threading problems), rather than trying to recreate the object graph (which can be quite nasty).
So, let’s say you do a query, like so: @from Order o where o.customer = :customer@. As you can guess, this gets all the orders for the customer mentioned. Hibernate translates this query to SQL, executes it, gets the results back, iterates over them, checks to see if an Order is already in the cache, creates an instance if needed, then gives them back to you.
So far so good. Now for the point of the article (yes, you knew there was one in here somewhere…): you can “cache queries”:http://www.hibernate.org/hib_docs/v3/reference/en/html/performance.html#performance-querycache. This can be handy if a query is expensive to run, or if you are just going to call it a lot. When you cache a query, what you actually end up caching is a list of the ids of the objects – i.e. what Hibernate uses to determine if the object is in the second-level cache or not. When you “execute” the query again (with the same parameters), Hibernate simply pulls the pre-determined list of ids out of the query cache, and builds the results from the second-level cache.
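In Hibernate itself you opt in per query with @setCacheable(true)@, after switching on @hibernate.cache.use_query_cache@. The mechanics described above can be sketched in plain Java. This is a toy model under stated assumptions, not Hibernate’s implementation: @ToyQueryCache@, @list@ and the string “entities” are all invented for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Toy model only: none of these names are Hibernate APIs.
final class ToyQueryCache {
    // Stand-in for the second-level cache: entity id -> entity state.
    final Map<Long, String> secondLevel = new HashMap<>();
    // Stand-in for the query cache: query text plus parameters -> matching ids.
    final Map<String, List<Long>> queryResults = new HashMap<>();
    // Counts how often the "database" was actually asked.
    int databaseHits = 0;

    List<String> list(String queryKey, Supplier<Map<Long, String>> runSql) {
        List<Long> ids = queryResults.get(queryKey);
        if (ids == null) {
            // Miss: run the real query, remember only the ids, and
            // store the loaded entities in the second-level cache.
            databaseHits++;
            Map<Long, String> rows = runSql.get();
            ids = new ArrayList<>(rows.keySet());
            secondLevel.putAll(rows);
            queryResults.put(queryKey, ids);
        }
        // Hit or miss, the result is assembled by id from the second-level cache.
        List<String> result = new ArrayList<>();
        for (Long id : ids) {
            result.add(secondLevel.get(id));
        }
        return result;
    }
}
```

Calling @list@ twice with the same key runs the supplier once; the second call is served entirely from the two maps.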
Here’s one implication: when you “execute” a cached query, there is no certainty that the objects returned still match the criteria – they are not re-evaluated against the criteria to see if they fit (because that would need a database call). So you shouldn’t use criteria that change a lot if they are important.
*UPDATE:* As Gavin King notes in the comments, I was mistaken on this point. Hibernate keeps track of the tables used in the query, and checks an internal cache of per-table update timestamps to see if those tables have been updated since the result was cached. If they have, the query is re-executed. This simultaneously makes the caches safer and less useful. Note that you can still get stale data if something other than this instance of Hibernate is updating the database.
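The timestamp check can be sketched the same way. Again, this is a toy model and not Hibernate’s code: @ToyTimestampedQueryCache@, @tableUpdated@ and the logical clock are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these names are Hibernate APIs.
final class ToyTimestampedQueryCache {
    // A cached result, when it was cached, and which tables the query read.
    private static final class Entry {
        final List<Long> ids;
        final long cachedAt;
        final Set<String> tables;

        Entry(List<Long> ids, long cachedAt, Set<String> tables) {
            this.ids = ids;
            this.cachedAt = cachedAt;
            this.tables = tables;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final Map<String, Long> lastUpdate = new HashMap<>();
    private long clock = 0; // logical clock keeps the example deterministic

    // Called for every write that goes through this cache's owner.
    void tableUpdated(String table) {
        lastUpdate.put(table, ++clock);
    }

    void put(String queryKey, Set<String> tables, List<Long> ids) {
        entries.put(queryKey, new Entry(ids, ++clock, tables));
    }

    // Returns the cached ids, or null when the caller must re-run the query.
    List<Long> get(String queryKey) {
        Entry entry = entries.get(queryKey);
        if (entry == null) {
            return null;
        }
        for (String table : entry.tables) {
            Long updated = lastUpdate.get(table);
            if (updated != null && updated > entry.cachedAt) {
                entries.remove(queryKey); // stale: a table changed after caching
                return null;
            }
        }
        return entry.ids;
    }
}
```

A write that never calls @tableUpdated@ (in other words, one made behind the cache’s back) leaves the entry in place, which is exactly the stale-data case.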
Paying attention to interaction paths may help out here. For example, in the customer self-service kiosk, the user probably has already executed a query to get back all of his or her orders. So doing another database query to get the shipped ones is a little silly – the orders are already in the cache, so just filter that. Remember: while databases are king of querying, it is faster to iterate over a collection already in memory than it is to go and ask the database.
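As a rough illustration of filtering what you already have, here is a small sketch. The @Order@ class and its @shipped@ flag are hypothetical stand-ins; a real mapped entity would look different.

```java
import java.util.List;
import java.util.stream.Collectors;

final class OrderFilterSketch {
    // Hypothetical minimal Order, just enough state for the example.
    static final class Order {
        final long id;
        final boolean shipped;

        Order(long id, boolean shipped) {
            this.id = id;
            this.shipped = shipped;
        }
    }

    // Filters orders that are already in memory; no database round trip.
    static List<Order> shippedOnly(List<Order> alreadyLoaded) {
        return alreadyLoaded.stream()
                .filter(order -> order.shipped)
                .collect(Collectors.toList());
    }
}
```

Whether this beats a query depends on how many orders are in memory; for one customer’s orders it is usually a handful of objects.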
Another thing to remember when creating Hibernate queries (cached or otherwise) is how the objects interact with other objects. Sometimes it’s faster to walk an object tree and inspect data than it is to run the query. This is particularly true if the columns behind your criteria aren’t indexed. And, of course, it isn’t worth query-caching objects that get heavily updated.
2 thoughts on “Hibernate Queries, caching, and mutable criteria”
>> when you “execute” a cached query, there is no certainty that the objects returned still match the criteria
This is not really true. The only way Hibernate can return a stale query result from the query cache is if you have modified the database outside of Hibernate. If you go via Hibernate to update the database, the query cache is aware of that.
Thank you, Gavin. I stand corrected. I missed the UpdateTimestampCache functionality.