Correct use of ConcurrentHashMap
Monday January 7, 2008 by Derek Young
ConcurrentHashMap has been pitched as a simple alternative to HashMap, eliminating the need for synchronized blocks. I had some simple event-counting code that created count records on the fly. Although I could have used synchronized blocks for safety, I used ConcurrentHashMap for this situation, partly for efficiency but mostly for the exercise. Going through this made me realize how carefully ConcurrentHashMap must be used for your code to work correctly and efficiently.
When using a HashMap, the standard idiom to add a value if it doesn’t exist is to use code that looks something like this:
synchronized (this) {
    Record rec = records.get(id);
    if (rec == null) {
        rec = new Record(id);
        records.put(id, rec);
    }
    return rec;
}
If you were to simply replace HashMap with ConcurrentHashMap and remove the synchronized keyword, your code would be exposed to a race condition. If another thread put a new Record into the map just after your call to get returned null, your put operation would overwrite that value. You could add synchronized back in, but this defeats the purpose of using ConcurrentHashMap.
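To make the race concrete, here is a minimal sketch that replays the unlucky interleaving by hand in a single thread, so the outcome is deterministic. The class name, the "login" key, and the Record fields are all assumptions for illustration; the article does not show what Record contains.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: replays the bad interleaving step by step.
class CheckThenPutRace {
    // Stand-in for the article's Record: just an id and a counter (assumed fields).
    static final class Record {
        final String id;
        final AtomicInteger count = new AtomicInteger();
        Record(String id) { this.id = id; }
    }

    // Returns the count left in the map after two "threads" each record one event.
    static int replayLostUpdate() {
        ConcurrentMap<String, Record> records = new ConcurrentHashMap<String, Record>();

        // Thread A: get returns null, so A decides it must create a record.
        Record seenByA = records.get("login");

        // Thread B runs in the gap: it creates a record, puts it, and counts an event.
        Record fromB = new Record("login");
        records.put("login", fromB);
        fromB.count.incrementAndGet();

        // Thread A resumes with its stale null and overwrites B's record.
        if (seenByA == null) {
            Record fromA = new Record("login");
            records.put("login", fromA);
            fromA.count.incrementAndGet();
        }

        // Two events were recorded, but the surviving record only knows about one.
        return records.get("login").count.get();
    }
}
```

Each individual map call here is thread-safe; what is lost is the event that thread B counted on the record that got replaced.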
To safely create values on demand you must use putIfAbsent (and avoid making extra calls to get in the process).
First check to see if a value with the key already exists in the map and use this value if it does. Otherwise, create a new value for the map and add it with putIfAbsent. putIfAbsent returns any existing value if there is one, otherwise null (this is why ConcurrentHashMap can’t contain null values).
private ConcurrentMap<String, Record> records = new ConcurrentHashMap<String, Record>();

private Record getOrCreate(String id) {
    Record rec = records.get(id);
    if (rec == null) {
        // record does not yet exist
        Record newRec = new Record(id);
        rec = records.putIfAbsent(id, newRec);
        if (rec == null) {
            // put succeeded, use new value
            rec = newRec;
        }
    }
    return rec;
}
If putIfAbsent does return a value, it’s the one that must be used, since other threads may already be using it at this point. The new value you created must be abandoned. Although this sounds wasteful, it should happen very infrequently.
I’ve seen other code on the net that ignores the return value of putIfAbsent and makes another call to get at the end to figure out which value made it into the map (the newly created value or a value from another thread). Although this will work, it introduces an unnecessary lookup.
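As a rough check of the pattern above, the sketch below wraps the same getOrCreate logic in a small counter class and hammers one key from several threads. The EventCounter class, its count and total methods, the hammer helper, and the AtomicLong field on Record are assumptions made for the example, not the original event-counting code. It uses lambdas, so it needs Java 8 or later.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical wrapper around the getOrCreate pattern from the article.
class EventCounter {
    // Assumed Record layout: the value itself must be safe for concurrent updates.
    static final class Record {
        final String id;
        final AtomicLong count = new AtomicLong();
        Record(String id) { this.id = id; }
    }

    private final ConcurrentMap<String, Record> records =
            new ConcurrentHashMap<String, Record>();

    // Same logic as getOrCreate above: use whichever Record won the putIfAbsent.
    Record getOrCreate(String id) {
        Record rec = records.get(id);
        if (rec == null) {
            Record newRec = new Record(id);
            rec = records.putIfAbsent(id, newRec);
            if (rec == null) {
                rec = newRec;
            }
        }
        return rec;
    }

    void count(String id) {
        getOrCreate(id).count.incrementAndGet();
    }

    long total(String id) {
        Record rec = records.get(id);
        return rec == null ? 0 : rec.count.get();
    }

    // Starts `threads` threads that each count `perThread` events for one id,
    // releases them together, waits for them, and returns the final total.
    static long hammer(int threads, int perThread) {
        EventCounter counter = new EventCounter();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int n = 0; n < perThread; n++) {
                    counter.count("login");
                }
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return counter.total("login");
    }
}
```

Because every thread ends up with the one Record that made it into the map, no increments land on an abandoned instance, and the total should equal threads times perThread.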

Sounds clever, but don’t do this… It will only work in a reliable way with atomic objects (Integer, Long, ...).
Without synchronization you will get problems (on multicore processor computers). It’s possible that another thread accesses a partly initialized object (some fields might not yet be initialized). You have to see it yourself before you believe this.
It’s quite easy to make a test case that shows the problems on a multicore computer; I just can’t point directly to an example. I once had to try it myself before I understood how the JMM works.
I’d recommend reading articles about the Java Memory Model, and there’s also a nice book called Java Concurrency in Practice.
Usually it’s better to reduce the information that is shared between threads. You must always synchronize when you pass information across threads. Both threads must also synchronize on the same object monitor; synchronizing on two different monitors isn’t enough (you can’t rely on some logical order that makes it seem like a situation isn’t possible in execution order).
This presentation might help to understand the point:
http://www.javapolis.com/JP05Content/talks/day3/brian2/index.html
— Lari H Jan 18, 05:21 PM #
@Lari – ConcurrentHashMap provides the necessary synchronization barriers internally so puts from one thread are seen from another (hence the name, ConcurrentHashMap). As stated in the documentation, “all operations are thread-safe”. There is no need for an additional synchronized block as long as you use the collection carefully as I describe.
Now if you’re talking about my example Record object, it is correct that Record must take care of synchronizing its methods, if necessary. Usually there is a lot less contention on the value in a map than the map itself, so this is acceptable.
— Derek Young Jan 18, 05:39 PM #
One more addition: you must always synchronize when you pass new objects or modified object fields across threads. Both threads must synchronize on the same object monitor: the thread creating or modifying a field and the thread reading or modifying the field.
Sometimes you might like to take some risk when the objects are fully immutable and they have been created early before another thread accesses it. I think that’s the only case where you can safely use same information across multiple threads without synchronization. You might hit the “partly initialized object”- problem if the immutable object gets hit by another thread exactly at the same time while it’s still “under construction”. If the code can execute at the same time, remember to synchronize.
— Lari H Jan 18, 05:40 PM #
Yes, If the Record class takes care of synchronization of all possible field access, then it’s safe. You must also synchronize code in immutable child objects (and their fields) if you want to make sure that you don’t hit a “partly initialized object”. I think it’s easier to just synchronize at a higher level.
I think most will just forget that any child object field access (also immutable) must be synchronized if you don’t synchronize earlier.
I’ve had problems in this area in a very high-volume transactional system running on several multicore processors. The problems will get even worse in the future when computers have even more cores.
Understanding this rule was a paradigm change for me.
I know it’s a bad moment when one realizes that there’s a bug in my multithread code and that’s been causing the strange problems (“that must be JVM bugs”). :)
This has happened to me and many others…
You won’t see the problems if your application isn’t highly concurrent, but if it is, then it’s a disaster to not understand the JMM and the rules it sets.
— Lari H Jan 18, 05:54 PM #
Quoting JMM FAQ
http://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html#synchronization
“Important Note: Note that it is important for both threads to synchronize on the same monitor in order to set up the happens-before relationship properly. It is not the case that everything visible to thread A when it synchronizes on object X becomes visible to thread B after it synchronizes on object Y. The release and acquire have to “match” (i.e., be performed on the same monitor) to have the right semantics. Otherwise, the code has a data race.”
ConcurrentHashMap won’t shield you from “data races”. It just keeps the HashMap data structure thread-safe. This is very important to understand. Your code example “Correct use of ConcurrentHashMap” is basically another form of double-checked locking (DCL).
DCL doesn’t work.
http://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html#dcl
It would be cool if ConcurrentHashMap had a callback interface for creating an absent object (object factory provided in the call). The object factory would be called inside the correct object monitor (synchronization). This is on my JVM wishlist…
I think some cache frameworks have this kind of feature.
— Lari H Jan 18, 06:36 PM #
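For readers on Java 8 or later: a callback of this kind exists as computeIfAbsent, which takes the factory as a function. A minimal sketch follows, assuming the same Record stand-in as the article; the class name and the created counter are hypothetical and are only there to show how often the factory runs. ConcurrentHashMap's own implementation documents the invocation as atomic, with the function applied at most once per key; the default method on the ConcurrentMap interface does not make that promise.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: get-or-create via computeIfAbsent (Java 8+).
class ComputeIfAbsentSketch {
    // Assumed stand-in for the article's Record.
    static final class Record {
        final String id;
        Record(String id) { this.id = id; }
    }

    // Counts factory invocations, for illustration only.
    final AtomicInteger created = new AtomicInteger();

    private final ConcurrentHashMap<String, Record> records = new ConcurrentHashMap<>();

    Record getOrCreate(String id) {
        // The lambda is the "object factory"; it runs only if the key is absent.
        return records.computeIfAbsent(id, key -> {
            created.incrementAndGet();
            return new Record(key);
        });
    }
}
```

The factory should be short and must not touch other mappings of the same map, since other updates to the map may block while it runs.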
I am well aware of the Java memory model and the dangers of double-checked locking. ConcurrentHashMap internally uses volatile values to safely allow one thread to read what another thread has written. The article you pointed to also makes this clear: “Under the new memory model, making the instance field volatile will “fix” the problems with double-checked locking because then there will be a happens-before relationship between the initialization of the Something by the constructing thread and the return of its value by the thread that reads it”. This is not exactly the pattern ConcurrentHashMap follows, but it does make guarantees about the safety of threads reading and writing values at the same time. The example I give is not operating without synchronization primitives; it’s just that the synchronization is not visible, since it’s handled inside the map.
If you think of it another way, if an additional synchronized block was necessary, ConcurrentHashMap would be of no value. You could use any simple Map implementation.
— Derek Young Jan 18, 07:54 PM #
You are right about using ConcurrentHashMap.
Things have changed in Java 1.5. Good thing that you pointed this out! Thanks!
I mentioned the concurrency problems in a system I was building.
The system ran on IBM Java 1.4.2 plus the backported concurrent utilities. The backported ConcurrentHashMap behaves differently on pre-1.5 JVMs because of the old memory model. I bet there are still some people stuck on pre-1.5 JVMs who are using the ConcurrentHashMap backport. ConcurrentHashMap is not really usable on pre-1.5 JVMs. (I thought that the same applied to Java 1.5.)
quoting http://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html
“Nothing in the old memory model treated final fields differently from any other field—meaning synchronization was the only way to ensure that all threads see the value of a final field that was written by the constructor. As a result, it was possible for a thread to see the default value of the field, and then at some later time see its constructed value. This means, for example, that immutable objects like String can appear to change their value—a disturbing prospect indeed.”
— Lari H Jan 18, 09:15 PM #
It’s worth pointing out that, had creation been expensive, you could instead have used a memoizer idiom and leveraged futures to create a blocking call. This makes it very easy to roll your own self-populating map.
— Ben Jan 19, 05:29 AM #
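One way to read Ben's suggestion is sketched below: the map holds a Future for each key, so only the thread that wins putIfAbsent runs the expensive creation, and everyone else blocks on the same Future. The Memoizer class name, its constructor, and the choice to wrap failures in IllegalStateException are assumptions for illustration, not something from the comment. It uses lambdas, so it needs Java 8 or later.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.function.Function;

// Hypothetical self-populating map built on ConcurrentHashMap and futures.
class Memoizer<K, V> {
    private final ConcurrentMap<K, Future<V>> cache = new ConcurrentHashMap<>();
    private final Function<K, V> factory;

    Memoizer(Function<K, V> factory) {
        this.factory = factory;
    }

    V get(K key) {
        Future<V> future = cache.get(key);
        if (future == null) {
            // Creating the task is cheap; the expensive work happens in run().
            FutureTask<V> task = new FutureTask<>(() -> factory.apply(key));
            future = cache.putIfAbsent(key, task);
            if (future == null) {
                // This thread won the race, so it alone performs the computation.
                future = task;
                task.run();
            }
        }
        try {
            // Losing threads block here until the winner's computation finishes.
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            // Drop the failed entry so a later call can retry (a design choice).
            cache.remove(key, future);
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

Compared with the plain putIfAbsent pattern, the value that gets abandoned when a thread loses the race is only an unstarted FutureTask, not a fully constructed expensive object.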
Phew, for a moment I was getting very nervous. My production code uses ConcurrentHashMap (1.5) and I’ve always assumed reads and writes to be atomic. The app runs on a 2-CPU box and is highly concurrent, and I haven’t noticed any partial reads in any of my load tests, where the CPUs are exercised quite a bit.
— Kumar Pandey Feb 8, 12:55 PM #
If you need a map that will just be populated once initially and then only read thereafter, you might consider using a CopyOnWriteMap. Reads on it use no synchronization at all. ConcurrentHashMap still uses a lock for readValueUnderLock.
— Robert DiFalco Apr 13, 08:58 PM #