Differences Between the Implementations of Atomic-XX and volatile

Posted by CodingCat on May 9, 2015

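Today I asked Lian Cheng (连城) a rather far-fetched question: does the happens-before rule for volatile variables in the JVM still apply to off-heap memory? I think I managed to stump even him for a moment. His first guess was, "If the call goes through an intrinsic, it should still apply; if it goes through JNI or something else, it probably doesn't." A second later, though, he asked me back: are you sure you can even create a volatile variable in off-heap memory? (He did spot the odd leap in my thinking.)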

What I actually had in mind was to declare a collection containing volatile elements and then move that collection off-heap. Would those volatile variables still keep their original properties?

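After some searching, I found that syntactically there is basically no way to express an "off-heap collection of volatile elements". In practice this need is usually met with AtomicIntegerArray instead. But is that really a perfectly matching substitute?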
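To make the substitution concrete, here is a minimal on-heap sketch. The class name, field names, and the `writeThenRead` helper are made up for illustration. It contrasts a `volatile int[]` field, where only the array reference is volatile, with an AtomicIntegerArray, whose `get` and `set` give each element volatile read/write semantics.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

public class VolatileElementsSketch {
    // Only the reference is volatile here; plain[i] is an ordinary, non-volatile access.
    public static volatile int[] plain = new int[4];

    // get(i) and set(i, v) have volatile read/write semantics for each element.
    public static final AtomicIntegerArray atomic = new AtomicIntegerArray(4);

    // Hypothetical helper: writes element i from another thread, then reads it back.
    public static int writeThenRead(int i, int value) {
        Thread writer = new Thread(() -> atomic.set(i, value)); // volatile write
        writer.start();
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return atomic.get(i); // volatile read
    }

    public static void main(String[] args) {
        System.out.println(writeThenRead(1, 7));
        // Compound updates on a single element are atomic as well.
        System.out.println(atomic.incrementAndGet(1));
    }
}
```

Note that this only shows the per-element semantics on the heap; it says nothing yet about whether the same guarantees can be had off-heap.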

###Implementation of AtomicIntegerArray###

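In Java, the atomic classes exist to provide APIs that combine a multi-step operation into a single atomic operation, for example the well-known getAndIncrement. Such multi-step combinations are usually built on top of compareAndSet. Take AtomicInteger's getAndIncrement as an example: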

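getAndIncrement()

    public final int getAndIncrement() {
        for (;;) {
            int current = get();       // read the current value
            int next = current + 1;    // compute the new value
            // publish next only if no other thread changed the value in between;
            // otherwise loop and retry
            if (compareAndSet(current, next))
                return current;
        }
    }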

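compareAndSet can be understood as: update the variable to next if and only if its current value is current. This approach avoids locks to improve performance while still achieving an atomic update. compareAndSet itself simply calls unsafe.compareAndSwapInt(Object o, long offset, int expected, int x):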

compareAndSet(int expect, int update)

    public final boolean compareAndSet(int expect, int update) {
        // A brief explanation: valueOffset is the offset, within this object,
        // of the field that holds the actual int value this class represents.
        return unsafe.compareAndSwapInt(this, valueOffset, expect, update);
    }
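The retry pattern above can also be written against the public API. The following is a minimal sketch, not the JDK's own code: `CasLoopSketch`, its static `getAndIncrement`, and `countWithThreads` are hypothetical names, and the loop uses only `AtomicInteger.get` and `AtomicInteger.compareAndSet`. It runs the loop from several threads to show that no increment is lost.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasLoopSketch {
    // Same read / compute / compareAndSet retry loop, built on the public API.
    public static int getAndIncrement(AtomicInteger value) {
        for (;;) {
            int current = value.get();
            int next = current + 1;
            if (value.compareAndSet(current, next)) {
                return current;
            }
            // Another thread changed the value in between; retry with a fresh read.
        }
    }

    // Hypothetical driver: each of `threads` workers increments the counter `perThread` times.
    public static int countWithThreads(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    getAndIncrement(counter);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 10_000)); // prints 40000
    }
}
```

A failed compareAndSet never blocks; the thread simply re-reads and tries again, which is why the total is exact without any lock.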

###Implementation of volatile###

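Compared with Atomic-xxx, volatile targets quite different use cases. Most fundamentally, volatile by itself does not make code thread-safe: a compound operation such as i++ on a volatile variable is still not atomic. What volatile does guarantee is that any read of a volatile variable, by any thread, returns the value produced by the most recent successful write to it within the process. In addition, every operation that happened before that write (whether or not it involves a volatile variable) is visible to any thread that subsequently reads the written value.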
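The visibility half of this guarantee can be illustrated with a small sketch. All names here (`VolatilePublishSketch`, `payload`, `ready`, `publishAndRead`) are invented for the example. The writer stores into an ordinary field and then sets a volatile flag; a reader that observes the flag is guaranteed to also observe the earlier ordinary write.

```java
public class VolatilePublishSketch {
    static int payload;            // deliberately NOT volatile
    static volatile boolean ready; // the volatile flag that publishes payload

    // Hypothetical helper: publishes `value` through the flag and returns what the reader saw.
    public static int publishAndRead(int value) {
        ready = false; // reset before the reader starts
        int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            while (!ready) {       // volatile read; spin until the writer's flag is visible
                Thread.yield();
            }
            seen[0] = payload;     // guaranteed to see the write made before ready = true
        });
        reader.start();

        payload = value; // ordinary write, ordered before the volatile write below
        ready = true;    // volatile write

        try {
            reader.join(); // join() makes seen[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(publishAndRead(42)); // prints 42
    }
}
```

If `ready` were not volatile, the reader would have no guarantee of ever seeing the flag change, nor of seeing `payload` once it did.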

How is this implemented?

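The basic principle is that when the (JIT) compiler generates code for a write to a volatile variable, it adds one extra instruction (shown here for x86):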

0x01010101: lock addl $0x0,(%esp);

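It is precisely this lock-prefixed instruction that provides the guarantees described above. To understand the subtleties, we need to look inside the x86 processor at how the LOCK prefix works.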

See the latest Intel® 64 and IA-32 Architectures Software Developer’s Manual:

For the Intel486 and Pentium processors, the LOCK# signal is always asserted on the bus during a LOCK operation, even if the area of memory being locked is cached in the processor.

For the P6 and more recent processor families, if the area of memory being locked during a LOCK operation is cached in the processor that is performing the LOCK operation as write-back memory and is completely contained in a cache line, the processor may not assert the LOCK# signal on the bus. Instead, it will modify the memory location internally and allow its cache coherency mechanism to ensure that the operation is carried out atomically. This operation is called “cache locking.” The cache coherency mechanism automatically prevents two or more processors that have cached the same area of memory from simultaneously modifying data in that area.

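So on recent processors, when a lock-prefixed instruction is executed, the cache of the processor running the thread is modified and the data is written back toward main memory, and this is guaranteed to happen as one atomic operation. How, then, are the copies held in other processors' caches invalidated?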

When operating in an MP system, IA-32 processors (beginning with the Intel486 processor) and Intel 64 processors have the ability to snoop other processor’s accesses to system memory and to their internal caches. They use this snooping ability to keep their internal caches consistent both with system memory and with the caches in other processors on the bus. For example, in the Pentium and P6 family processors, if through snooping one processor detects that another processor intends to write to a memory location that it currently has cached in shared state, the snooping processor will invalidate its cache line forcing it to perform a cache line fill the next time it accesses the same memory location.

Beginning with the P6 family processors, if a processor detects (through snooping) that another processor is trying to access a memory location that it has modified in its cache, but has not yet written back to system memory, the snooping processor will signal the other processor (by means of the HITM# signal) that the cache line is held in modified state and will perform an implicit write-back of the modified data. The implicit write-back is transferred directly to the initial requesting processor and snooped by the memory controller to assure that system memory has been updated. Here, the processor with the valid data may pass the data to the other processors without actually writing it to system memory; however, it is the responsibility of the memory controller to snoop this operation and update memory.

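One point worth noting here: on recent processors, main memory is no longer updated through a simple CPU write-back policy. Instead, the valid data is passed directly between processors, and the memory controller is responsible for updating main memory.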

References

[1] http://www.infoq.com/cn/articles/ftf-java-volatile

[2] Intel® 64 and IA-32 Architectures Software Developer’s Manual