注册

ART虚拟机 | 锁

本文基于Android 11(R)


Java中对临界区的锁定通常用synchronize代码块完成,因此标题中的“锁”实际上是对synchronize关键字的剖析。Synchronize代码块使用时必须传入一个对象,这个对象可以是this对象,可以是类对象(e.g. Foo.class),也可以是任何其他对象。因此我们可以说,锁的状态和对象关联。亦或者,每个对象天生都是一把锁。


Synchronize生成的字节码会对应两条指令,分别是monitor-entermonitor-exit。下面我们针对monitor_enter,分别从解释执行和机器码执行两个方向去寻找这个指令的最终实现。


解释执行


[art/runtime/interpreter/interpreter_switch_impl-inl.h]


HANDLER_ATTRIBUTES bool MONITOR_ENTER() {
...
ObjPtr<mirror::Object> obj = GetVRegReference(A());
if (UNLIKELY(obj == nullptr)) {
...
} else {
DoMonitorEnter<do_assignability_check>(self, &shadow_frame, obj); <===调用
...
}
}
复制代码

[art/runtime/interpreter/interpreter_common.h]


static inline void DoMonitorEnter(Thread* self, ShadowFrame* frame, ObjPtr<mirror::Object> ref)
NO_THREAD_SAFETY_ANALYSIS
REQUIRES(!Roles::uninterruptible_) {
...
StackHandleScope<1> hs(self);
Handle<mirror::Object> h_ref(hs.NewHandle(ref)); <===调用
h_ref->MonitorEnter(self);
...
}
复制代码

[art/runtime/mirror/object-inl.h]


inline ObjPtr<mirror::Object> Object::MonitorEnter(Thread* self) {
return Monitor::MonitorEnter(self, this, /*trylock=*/false); <===调用
}
复制代码

解释执行会使用switch-case方式分别解析每一条指令,由上述代码可知,monitor-enter指令最终会调用Monitor::MonitorEnter静态函数。


机器码执行


[art/runtime/arch/arm64/quick_entrypoints_arm64.S]


ENTRY art_quick_lock_object_no_inline
// This is also the slow path for art_quick_lock_object.
SETUP_SAVE_REFS_ONLY_FRAME // save callee saves in case we block
mov x1, xSELF // pass Thread::Current
bl artLockObjectFromCode // (Object* obj, Thread*) <===调用
...
END art_quick_lock_object_no_inline
复制代码

[art/runtime/entrypoints/quick/quick_lock_entrypoints.cc]


extern "C" int artLockObjectFromCode(mirror::Object* obj, Thread* self){
...
if (UNLIKELY(obj == nullptr)) {
...
} else {
ObjPtr<mirror::Object> object = obj->MonitorEnter(self); // May block <===调用
...
}
}
复制代码

[art/runtime/mirror/object-inl.h]


inline ObjPtr<mirror::Object> Object::MonitorEnter(Thread* self) {
return Monitor::MonitorEnter(self, this, /*trylock=*/false); <===调用
}
复制代码

殊途同归,机器码执行时最终也会调用Monitor::MonitorEnter


锁的两种形态


虚拟机中将锁实现为两种形态,一种称为Thin Lock,另一种称为Fat Lock。


Thin Lock用于竞争较弱的场景。在竞争发生时,采用自旋(spin)和让渡CPU(yield)的方式等待锁,而不是进行系统调用和上下文切换。当持有锁的线程很快完成操作时,短暂的自旋会比上下文切换开销更小。


可是如果自旋一段时间发现还无法获取到锁时,Thin Lock就会膨胀为Fat Lock,一方面增加数据结构存储与锁相关的具体信息,另一方面通过系统调用挂起线程。


总结一下,Fat Lock功能健全,但开销较大。而Thin Lock开销虽小,但无法用于长时间等待的情况。所以实际的做法是先使用Thin Lock,当功能无法满足时再膨胀为Fat Lock。


文章开头提到,每个对象天生都是一把锁。那么这个锁的信息到底存在对象的什么位置呢?


答案是存在art::mirror::Object的对象头中(详见ART虚拟机 | Java对象和类的内存结构)。对象头中有一个4字节长的字段monitor_,其中便存储了锁相关的信息。


monitor字段.png


4字节共32bits,高位的两个bits用于标记状态。不同的状态,存储的信息含义也不同。两个bits共4种状态,分别为ThinOrUnlock(Thin/Unlock共用一个状态),Fat,Hash和ForwardingAddress。ThinOrUnlock和Fat表示锁的状态,Hash是为对象生成HashMap中所用的哈希值,ForwardingAddress是GC时使用的状态。


上图中的m表示mark bit state,r表示read barrier state,都是配合GC使用的标志,在讨论锁的时候可以不关心。


当我们对一个空闲对象进行monitor-enter操作时,锁的状态由Unlock切换到Thin。代码如下。


switch (lock_word.GetState()) {
case LockWord::kUnlocked: {
// No ordering required for preceding lockword read, since we retest.
LockWord thin_locked(LockWord::FromThinLockId(thread_id, 0, lock_word.GCState()));
if (h_obj->CasLockWord(lock_word, thin_locked, CASMode::kWeak, std::memory_order_acquire)) {
...
return h_obj.Get(); // Success!
}
continue; // Go again.
}
复制代码

LockWord对象的大小就是4字节,所以可以将它等同于art::mirror::Objectmonitor_字段,只不过它内部实现了很多方法可以灵活操作4字节中的信息。锁状态切换时,将当前线程的thread id(thread id并非tid,对每个进程而言它都从1开始)存入monitor_字段,与GC相关的mr标志保持不变。


当对象被线程锁定后,假设我们在同线程内对该它再次进行monitor-enter操作,那么就会发生Thin Lock的重入。如果在不同线程对该对象进行monitor-enter操作,那么就会发生Thin Lock的竞争。代码和流程图如下。


case LockWord::kThinLocked: {
uint32_t owner_thread_id = lock_word.ThinLockOwner();
if (owner_thread_id == thread_id) {
uint32_t new_count = lock_word.ThinLockCount() + 1;
if (LIKELY(new_count <= LockWord::kThinLockMaxCount)) {
LockWord thin_locked(LockWord::FromThinLockId(thread_id,
new_count,
lock_word.GCState()));
if (h_obj->CasLockWord(lock_word,
thin_locked,
CASMode::kWeak,
std::memory_order_relaxed)) {
AtraceMonitorLock(self, h_obj.Get(), /* is_wait= */ false);
return h_obj.Get(); // Success!
}
continue; // Go again.
} else {
// We'd overflow the recursion count, so inflate the monitor.
InflateThinLocked(self, h_obj, lock_word, 0);
}
} else {
// Contention.
contention_count++;
Runtime* runtime = Runtime::Current();
if (contention_count <= runtime->GetMaxSpinsBeforeThinLockInflation()) {
sched_yield();
} else {
contention_count = 0;
// No ordering required for initial lockword read. Install rereads it anyway.
InflateThinLocked(self, h_obj, lock_word, 0);
}
}
continue; // Start from the beginning.
}
复制代码

ThinLock.png


在ThinLock膨胀为FatLock前,需要执行50次sched_yieldsched_yield会将当前线程放到CPU调度队列的末尾,这样既不用挂起线程,也不用一直占着CPU。不过android master分支已经将这个流程再度优化了,在50次sched_yield之前,再执行100次自旋操作。和sched_yield相比,自旋不会释放CPU。由于单次sched_yield耗时也有微秒,对于锁持有时间极短的情况,用自旋更省时间。


接下来介绍锁的膨胀过程。


void Monitor::InflateThinLocked(Thread* self, Handle<mirror::Object> obj, LockWord lock_word,
uint32_t hash_code) {
DCHECK_EQ(lock_word.GetState(), LockWord::kThinLocked);
uint32_t owner_thread_id = lock_word.ThinLockOwner();
if (owner_thread_id == self->GetThreadId()) {
// We own the monitor, we can easily inflate it.
Inflate(self, self, obj.Get(), hash_code);
} else {
ThreadList* thread_list = Runtime::Current()->GetThreadList();
// Suspend the owner, inflate. First change to blocked and give up mutator_lock_.
self->SetMonitorEnterObject(obj.Get());
bool timed_out;
Thread* owner;
{
ScopedThreadSuspension sts(self, kWaitingForLockInflation);
owner = thread_list->SuspendThreadByThreadId(owner_thread_id,
SuspendReason::kInternal,
&timed_out);
}
if (owner != nullptr) {
// We succeeded in suspending the thread, check the lock's status didn't change.
lock_word = obj->GetLockWord(true);
if (lock_word.GetState() == LockWord::kThinLocked &&
lock_word.ThinLockOwner() == owner_thread_id) {
// Go ahead and inflate the lock.
Inflate(self, owner, obj.Get(), hash_code);
}
bool resumed = thread_list->Resume(owner, SuspendReason::kInternal);
DCHECK(resumed);
}
self->SetMonitorEnterObject(nullptr);
}
}
复制代码

void Monitor::Inflate(Thread* self, Thread* owner, ObjPtr<mirror::Object> obj, int32_t hash_code) {
DCHECK(self != nullptr);
DCHECK(obj != nullptr);
// Allocate and acquire a new monitor.
Monitor* m = MonitorPool::CreateMonitor(self, owner, obj, hash_code);
DCHECK(m != nullptr);
if (m->Install(self)) {
if (owner != nullptr) {
VLOG(monitor) << "monitor: thread" << owner->GetThreadId()
<< " created monitor " << m << " for object " << obj;
} else {
VLOG(monitor) << "monitor: Inflate with hashcode " << hash_code
<< " created monitor " << m << " for object " << obj;
}
Runtime::Current()->GetMonitorList()->Add(m);
CHECK_EQ(obj->GetLockWord(true).GetState(), LockWord::kFatLocked);
} else {
MonitorPool::ReleaseMonitor(self, m);
}
}
复制代码

膨胀(Inflate)的具体操作比较简单,简言之就是创建一个Monitor对象,存储更多的信息,然后将Monitor Id放入原先的monitor_字段中。


关键的地方在于膨胀的充分条件。如果Thin Lock本来就由本线程持有,那么膨胀不需要经过任何人同意,可以直接进行。但如果该Thin Lock由其他线程持有,那么膨胀之前必须先暂停(这里的暂停并不是指将线程从CPU上调度出去,而是不允许它进入Java世界改变锁状态)持有线程,防止膨胀过程中对锁信息的更新存在竞争。膨胀之后,持有线程恢复运行,此时它看到的Lock已经变成了Fat Lock。


当锁膨胀为Fat Lock后,由于持有锁的动作并未完成,所以该线程会再次尝试。只不过这次走的是Fat Lock分支,执行如下代码。


case LockWord::kFatLocked: {
// We should have done an acquire read of the lockword initially, to ensure
// visibility of the monitor data structure. Use an explicit fence instead.
std::atomic_thread_fence(std::memory_order_acquire);
Monitor* mon = lock_word.FatLockMonitor();
if (trylock) {
return mon->TryLock(self) ? h_obj.Get() : nullptr;
} else {
mon->Lock(self);
DCHECK(mon->monitor_lock_.IsExclusiveHeld(self));
return h_obj.Get(); // Success!
}
}
复制代码

{
ScopedThreadSuspension tsc(self, kBlocked); // Change to blocked and give up mutator_lock_.

// Acquire monitor_lock_ without mutator_lock_, expecting to block this time.
// We already tried spinning above. The shutdown procedure currently assumes we stop
// touching monitors shortly after we suspend, so don't spin again here.
monitor_lock_.ExclusiveLock(self);
}
复制代码

上述代码的ScopedThreadSuspension对象用于完成线程状态的切换,之所以叫scoped,是因为它是通过构造和析构函数完成状态切换和恢复的。在作用域内的局部变量会随着作用域的结束而自动析构,因此花括号结束,线程状态也就由Blocked切换回Runnable了。


最终调用monitor_lock_(Mutex对象)的ExclusiveLock方法。


void Mutex::ExclusiveLock(Thread* self) {
if (!recursive_ || !IsExclusiveHeld(self)) {
#if ART_USE_FUTEXES
bool done = false;
do {
int32_t cur_state = state_and_contenders_.load(std::memory_order_relaxed);
if (LIKELY((cur_state & kHeldMask) == 0) /* lock not held */) {
done = state_and_contenders_.CompareAndSetWeakAcquire(cur_state, cur_state | kHeldMask);
} else {
...
if (!WaitBrieflyFor(&state_and_contenders_, self,
[](int32_t v) { return (v & kHeldMask) == 0; })) {
// Increment contender count. We can't create enough threads for this to overflow.
increment_contenders();
// Make cur_state again reflect the expected value of state_and_contenders.
cur_state += kContenderIncrement;
if (UNLIKELY(should_respond_to_empty_checkpoint_request_)) {
self->CheckEmptyCheckpointFromMutex();
}
do {
if (futex(state_and_contenders_.Address(), FUTEX_WAIT_PRIVATE, cur_state,
nullptr, nullptr, 0) != 0) {
...
cur_state = state_and_contenders_.load(std::memory_order_relaxed);
} while ((cur_state & kHeldMask) != 0);
decrement_contenders();
}
}
} while (!done);
...
exclusive_owner_.store(SafeGetTid(self), std::memory_order_relaxed);
RegisterAsLocked(self);
}
recursion_count_++;
...
}
复制代码

Mutex::ExclusiveLock最终通过futex系统调用陷入内核态,在内核态中将当前线程从CPU中调度出去,实现挂起。值得注意的是,FatLock中依然有spin和yield的操作(WaitBrieflyFor函数),这是因为Thin Lock一旦膨胀为Fat Lock就很难deflate回去,而后续对Fat Lock的使用依然会碰到短时持有锁的情况,这也意味先前的优化此处依然可用。


上面这一块代码算是锁的核心实现,被调用的次数也非常多,因此任何一点微小的优化都很重要。我之前写过一篇文章调试经验 | C++ memory order和一个相关的稳定性问题详细分析了一个由memory order使用错误导致的线程卡死的问题,其中还介绍了C++的memory order,它也正是Java volatile关键字的(ART)底层实现。


此外我还给谷歌提过ExclusiveLock的bug,这个bug既会消耗battery,也会在某些情况下导致系统整体卡死。下面是谷歌的具体回复,感兴趣的可以查看修复


Hans Reply.png


作者:芦航
链接:https://juejin.cn/post/6956213033806872606
来源:掘金
著作权归作者所有。商业转载请联系作者获得授权,非商业转载请注明出处。

0 个评论

要回复文章请先登录注册