Hibernate has long had a feature known as “query caches” – you can run a query, cache the result, and thus avoid running it repeatedly. The only problem is that it doesn’t do what you think it does.
Here’s an example of what I mean by a query cache:
List blogs = sess.createQuery("from Blog blog where blog.blogger = :blogger")
    .setEntity("blogger", blogger)
    .setMaxResults(15)
    .setCacheable(true)
    .setCacheRegion("frontpages")
    .list();
(The example code has been lifted from the Hibernate on-line manual.)
On the surface, this sounds like it would cache the results of the query until the (separately configured) cache expires them. And if you never change any data, that’s what happens. The catch, however, is that any change to any of the tables the query reads from invalidates the cached results – for every parameter combination, not just for the rows that changed.
Let’s go for a realistic example. Customers have Orders. You create a page where customers can browse their order history. This doesn’t have to be up-to-the-minute, so you put a query cache in, with a suitable expiry policy. The query would look similar to the above: from Order order where order.customer = :customer. So far so good. In testing, you verify that, yes, the query cache works and the database is not repeatedly called with queries.
However, when this goes live, the database starts getting swamped with these queries. It’s almost like the query cache isn’t getting used at all. Why? Because every time a customer (any customer) places an order, all the cached query results get removed – even if they were for a different customer.
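To make that concrete, here is a rough sketch of the scenario in code. Customer and Order are the entities from the example; the helper method, the region name, the Order constructor and the session handling are my own assumptions, not anything the scenario prescribes.

import java.util.List;
import org.hibernate.Session;

public class OrderHistoryExample {

    // The cached order-history query described above.
    @SuppressWarnings("unchecked")
    static List<Order> orderHistory(Session session, Customer customer) {
        return session.createQuery(
                "from Order order where order.customer = :customer")
            .setEntity("customer", customer)
            .setCacheable(true)
            .setCacheRegion("orderHistory")   // hypothetical region name
            .list();
    }

    static void demo(Session session, Customer customerA, Customer customerB) {
        // First call: hits the database, and the result set goes into the query cache.
        orderHistory(session, customerA);

        // A completely different customer places an order...
        session.save(new Order(customerB));
        session.flush();

        // ...which writes to the Order table, so every cached result for queries
        // against that table is now treated as stale. This call goes back to the
        // database, even though customer A's data never changed.
        orderHistory(session, customerA);
    }
}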
This explains a little throw-away line in the Hibernate manual:
“Most queries do not benefit from caching, so by default queries are not cached.”
Why do most queries not benefit from caching? Because, in order to get cache hits, you need to essentially be querying for static or nearly-static data.
BTW, the same goes for collections. If your Customer class has a collection of Orders, you can define that collection to be cacheable; it’s a separate cache from the Customer object, but it’s cacheable. However, as soon as any order is made – or even merely modified – the cache will be invalidated.
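For illustration, a cacheable collection mapping might look something like the sketch below, using Hibernate’s annotations. The entity layout and the read-write strategy are my own assumptions for the sake of the example, not a recommendation.

import java.util.HashSet;
import java.util.Set;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.OneToMany;
import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

@Entity
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)   // cache for the Customer entity itself
public class Customer {

    @Id
    private Long id;

    // The collection gets its own cache region, separate from the Customer cache.
    // As described above, changes to orders invalidate it, so a non-lazy mapping
    // would go back to the database on every load once that happens.
    @OneToMany(mappedBy = "customer")
    @Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
    private Set<Order> orders = new HashSet<Order>();

    public Long getId() {
        return id;
    }
}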
This can make one-to-many relationships insanely expensive, particularly non-lazy ones; every time the entity goes into a session, the collection will be populated – even if the entity was already in the second level cache. If that collection cache is invalidated, as it probably is, then a database query will be run to populate the collection. If these objects also have a collection cache (think Customer -> Order -> OrderLineItem), you will get N queries run. Go four levels (SalesPerson.getRecentCustomers()?), and you get N*M queries. Ouch.
Hibernate has very flexible caching policies for entities, but no flexibility around the relationships between the entities. This is a shame, particularly for the query caches – it means that for a number of types of applications, you need to build a separate caching layer above Hibernate, simply because the query caches get wiped far too often. At the very least, the Hibernate documentation could explain this better, elaborating on exactly why “most queries do not benefit from caching”.
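For what it’s worth, that separate layer doesn’t have to be elaborate. A minimal sketch, assuming the Customer and Order entities above and a simple time-to-live policy chosen by the application (all names here are mine):

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.hibernate.Session;

// A deliberately simple caching layer above Hibernate: results are keyed per
// customer and kept for a fixed time-to-live, so one customer's new order never
// evicts another customer's cached history.
public class OrderHistoryCache {

    private static class Entry {
        final List<Order> orders;
        final long loadedAt;

        Entry(List<Order> orders, long loadedAt) {
            this.orders = orders;
            this.loadedAt = loadedAt;
        }
    }

    private final Map<Long, Entry> cache = new ConcurrentHashMap<Long, Entry>();
    private final long ttlMillis;

    public OrderHistoryCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    @SuppressWarnings("unchecked")
    public List<Order> orderHistory(Session session, Customer customer) {
        Entry entry = cache.get(customer.getId());
        if (entry != null && System.currentTimeMillis() - entry.loadedAt < ttlMillis) {
            return entry.orders;   // possibly stale, but that is the trade-off we chose
        }
        List<Order> orders = session.createQuery(
                "from Order order where order.customer = :customer")
            .setEntity("customer", customer)
            .list();
        cache.put(customer.getId(), new Entry(orders, System.currentTimeMillis()));
        return orders;
    }

    // Lets the application drop one customer's entry without touching the rest.
    public void evict(Customer customer) {
        cache.remove(customer.getId());
    }
}

The cached objects become detached, of course, so something this simple only suits read-mostly display pages like the order history one.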
I noticed that Hibernate’s cache is much less effective than I thought, but I hadn’t dug deep enough to come to these conclusions. Isn’t there a way to make Hibernate’s cache more effective instead of creating a separate cache layer above it?
I would suggest you remove the cache from your collections and relations, set everything to lazy, and test: if your cache is getting evicted anyway, the load/fetch process is much faster for a collection without a cache defined.
Or (though I haven’t tested this yet) make the cache read-only and implement the eviction policy yourself. I have never tried a read-only cache on a modifiable entity.
Try implementing a StaleTolerantQueryCache as described here: http://www.hibernate.org/213.html
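Hibernate 3 does expose an extension point for this: the hibernate.cache.query_cache_factory property lets you plug in your own query cache implementation. A minimal sketch of the wiring, where the factory class name is only a placeholder (the factory itself, implementing org.hibernate.cache.QueryCacheFactory along the lines of that article, is the part you would have to write):

import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class CustomQueryCacheSetup {

    // Swap in a custom query cache implementation. The class named here is a
    // placeholder for something like the StaleTolerantQueryCache described in
    // the article linked above.
    public static SessionFactory build() {
        Configuration cfg = new Configuration().configure();
        cfg.setProperty("hibernate.cache.use_query_cache", "true");
        cfg.setProperty("hibernate.cache.query_cache_factory",
                "com.example.StaleTolerantQueryCacheFactory");
        return cfg.buildSessionFactory();
    }
}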
This kind of behavior from Hibernate makes perfect sense to me. How would Hibernate know, when an update occurs on one entity, whether it affects the cached query results? The implementation in Hibernate is generic and is designed to work correctly in all types of applications. Like you wrote in the conclusion of your article, if you need different behavior you need to implement your own cache mechanism on top of Hibernate.
Bernard, how does Hibernate know if an update in the database occurs? It can’t. That’s why, for entity caching, Hibernate provides numerous cache modes, so that as developers we can choose how to trade off performance against staleness. Query caches could have the same. They don’t.
So one option would be to configure stale-tolerant vs. non-stale tolerant directly into the query cache.
Another alternative would be to monitor changes to entities. Hibernate knows which entities were returned in a query; it could monitor those to see if they still match. Updates to non-matching entities could be evaluated against cached queries to see if they should be inserted. (Both as configurable options; the former is more efficient than the latter)
So there are a number of options.
The Hibernate documentation also provides detailed breakdowns of how the entity caches work. There’s just a brief description of how to configure query and collection caching, with no explanation of the implications (as mentioned above).
Hibernate already monitors changes to entities. Your suggestions (“it could monitor those to see if they still match” and “Updates to non-matching entities could be evaluated against cached queries to see if they should be inserted”) would be VERY VERY difficult to implement and would drive Hibernate to take over the responsibility of the database itself. If you know Hibernate, you should be aware that it never does that; it’s not in its philosophy.
It’s true that the Hibernate documentation, although good, is sometimes shallow on certain aspects. In order to do efficient Hibernate development you really need a good Hibernate book at your side, like ‘Hibernate in Action’ or ‘Java Persistence with Hibernate’.
Granted, Bernard. I merely threw those out as options. Probably the only suggestion I made above that was serious was the option of making the queries themselves more tolerant of staleness – that is, to take something like the StaleTolerantQueryCache suggested above, and build it into Hibernate directly (including an option to defer to the caching system itself on how stale something can be).
For an alternative that is in line with the Hibernate philosophy, the event mechanism could be leveraged to allow vetoing of query cache evictions, even if the default Hibernate configuration never vetoed.
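As things stand, the nearest equivalent is an ordinary event listener driving your own cache rather than Hibernate’s. A rough sketch, reusing the hypothetical OrderHistoryCache from earlier in the post and assuming Order has a getCustomer() accessor:

import org.hibernate.event.PostInsertEvent;
import org.hibernate.event.PostInsertEventListener;

// Hibernate 3 event API: when an Order is inserted, evict only that customer's
// entry from the application-level cache, instead of invalidating everything.
public class OrderInsertListener implements PostInsertEventListener {

    private final OrderHistoryCache orderHistoryCache;

    public OrderInsertListener(OrderHistoryCache orderHistoryCache) {
        this.orderHistoryCache = orderHistoryCache;
    }

    public void onPostInsert(PostInsertEvent event) {
        if (event.getEntity() instanceof Order) {
            orderHistoryCache.evict(((Order) event.getEntity()).getCustomer());
        }
    }
}

The listener would be registered with Configuration.setListener("post-insert", ...) or the equivalent event element in hibernate.cfg.xml; updates and deletes would need the corresponding post-update and post-delete listeners.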
I own “Hibernate In Action”; I’ve read “Java Persistence with Hibernate”. Neither really covers query caching properly, from memory (my books are at work; I’m at home). According to the online table-of-contents, HIA gives 1 page, and JPwH gives 3-4.
In any case, this article isn’t an attack on Hibernate, or even a set of suggestions of what Hibernate could do differently. I stopped trying to do that (shortly after I started trying) when a discussion with Gavin King resulted in him saying “well, I’ll get the (EJB 3 persistence) spec changed then”. I use Hibernate, I recommend Hibernate, I think it’s a good tool, and I even enjoy pushing its limits and trying to work more in line with how it wants to work. But it’s not perfect, and when I come across limitations, I make a note here as an aide-mémoire. Nor do I pick on just Hibernate – lots of tools that I use get this treatment.
No worries. Interesting discussion anyway. As I have a heavily loaded application that relies somewhat on caching to improve performance, your article raised a big fear in me. It made me reread the part in Java Persistence with Hibernate on the query cache (chapter 15). This confirmed that my implementation was OK.
I am a regular reader of your blog and enjoy it. I usually don’t write comments, but this time I thought your initial post needed clarification.
Cheers,
Bernard.