Issaac's Blog: When to use which spin_lock?

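To be clear, the choice of mutual-exclusion primitive does not depend on the size of the critical section. It depends on the nature of the critical section and on which pieces of code, that is, which kernel execution paths, compete for it.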

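Strictly speaking, semaphore and spinlock_XXX are mutual-exclusion mechanisms at different levels: the former is implemented on top of the latter. This is a bit like the relationship between HTTP and TCP: both are protocols, but they sit at different layers.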

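Take the semaphore first. It is a process-level mechanism, used for mutual exclusion over a resource among multiple processes. The code still runs inside the kernel, but the kernel execution path acts with the identity of a process and competes for the resource on that process's behalf. If it loses the competition, a context switch follows: the process can go to sleep, but the CPU does not stop and goes on to run other execution paths. Conceptually this has no direct connection to whether there is one CPU or several. Only in the implementation of the semaphore itself, to keep accesses to the semaphore structure atomic, is a spinlock needed for mutual exclusion on multiple CPUs.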

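Inside the kernel, the more common need is mutual exclusion on data access among the kernel's own execution paths. This is the most basic mutual-exclusion problem: keeping data modifications atomic. The implementation of the semaphore depends on it too. On a single CPU the problem comes mainly from interrupts and bottom_half, so disabling and enabling interrupts is enough. On multiple CPUs there is additional interference from the other CPUs, so a spinlock is needed as well. Combining these two parts gives spinlock_XXX. Its defining property is that once a CPU enters spinlock_XXX it does nothing else: it spins until it acquires the lock. This means that a critical section protected by spinlock_XXX must not stall, let alone context switch. It should access the data and get out quickly so that the other spinning execution paths can obtain the spinlock. That is the whole principle of the spinlock. If the current execution path really must context switch, it has to release the spinlock before calling schedule(), otherwise deadlock is likely: interrupt and bh code has no context, cannot context switch, and can only spin waiting for the spinlock, while you have switched away and nobody knows when you will be back.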

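Since the original intent and purpose of the spinlock is to keep data modifications atomic, there is no reason to linger in a critical section protected by a spinlock.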
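
To make the "spin until the lock is acquired" behaviour concrete, here is a minimal user-space sketch built on C11 atomic_flag. It is not the kernel's implementation: the names toy_spinlock, toy_spin_lock and friends are hypothetical, and it only models the other-CPU half of spinlock_XXX, not the interrupt half.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space toy; not the kernel's spinlock_t. */
struct toy_spinlock {
    atomic_flag held;
};

#define TOY_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* One acquisition attempt. Returns true if the lock was free and is now ours. */
static bool toy_spin_trylock(struct toy_spinlock *l)
{
    /* test_and_set returns the previous value; false means it was free. */
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Busy-wait until the lock is acquired; the caller never sleeps. */
static void toy_spin_lock(struct toy_spinlock *l)
{
    while (!toy_spin_trylock(l))
        ;   /* spin */
}

static void toy_spin_unlock(struct toy_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static struct toy_spinlock toy_counter_lock = TOY_SPINLOCK_INIT;
static int toy_counter;

/* The critical section is one increment: get in, touch the data, get out. */
int toy_counter_inc(void)
{
    int v;

    toy_spin_lock(&toy_counter_lock);
    v = ++toy_counter;
    toy_spin_unlock(&toy_counter_lock);
    return v;
}

/* Single-threaded walk-through of the lock states. */
void toy_spinlock_demo(void)
{
    struct toy_spinlock l = TOY_SPINLOCK_INIT;

    assert(toy_spin_trylock(&l));    /* free, so we get it */
    assert(!toy_spin_trylock(&l));   /* held: a second taker would spin here */
    toy_spin_unlock(&l);
    assert(toy_spin_trylock(&l));    /* free again after unlock */
    toy_spin_unlock(&l);
}
```

The second toy_spin_trylock() call is the interesting one: a real contender would sit in the while loop of toy_spin_lock() for as long as the holder stays inside its critical section, which is why that section has to be short.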

spinlock_XXX comes in many forms:

spin_lock()/spin_unlock()
spin_lock_irq()/spin_unlock_irq()
spin_lock_irqsave()/spin_unlock_irqrestore()
spin_lock_bh()/spin_unlock_bh()

local_irq_disable()/local_irq_enable()
local_bh_disable()/local_bh_enable()

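So which one should be used in which situation? That depends on which kernel execution path you are in and which kernel execution paths you need to exclude. The main execution paths in the kernel are:

1 The kernel mode of a user process. There is a process context here, and the path is mostly executing system calls and the like on behalf of the process.
2 Interrupts, exceptions, traps and so on. Conceptually there is no process context here, and no context switch is possible.
3 bottom_half. Conceptually there is no process context here either.
4 In addition, the same execution path may be running on another CPU at the same time.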

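Considering these four factors, and working out which of them can access the data we need to protect, we can decide which form of spinlock to use. If the data only needs protection against other CPUs, use spin_lock/spin_unlock. If it needs protection against irq and other CPUs, use spin_lock_irq/spin_unlock_irq. If it needs protection against irq and other CPUs and the previous interrupt state (the EFLAGS register on x86) must also be preserved, use spin_lock_irqsave/spin_unlock_irqrestore. If it needs protection against bh and other CPUs, use spin_lock_bh/spin_unlock_bh. If no other CPU is involved and only irq must be excluded, use local_irq_disable/local_irq_enable. If no other CPU is involved and only bh must be excluded, use local_bh_disable/local_bh_enable. And so on. It is worth pointing out that for one and the same piece of data, different kernel execution paths may use different forms (see the examples below).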
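
The rules in this paragraph can be written down as a small lookup. The sketch below is only a restatement of the text in plain C: the flag names and toy_pick_lock() are hypothetical, and two choices are assumptions of this sketch rather than statements from the article: irq takes precedence when both irq and bh are listed, and keep_irq_state is ignored when no other CPU is involved, because the text lists only local_irq_disable for that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Who else can touch the data (hypothetical flag names). */
enum toy_contender {
    TOY_OTHER_CPU = 1 << 0,
    TOY_IRQ       = 1 << 1,
    TOY_BH        = 1 << 2
};

/*
 * Returns the lock/unlock pair named in the text for the given contenders.
 * keep_irq_state: the previous interrupt state must be restored on unlock.
 */
static const char *toy_pick_lock(unsigned contenders, bool keep_irq_state)
{
    bool smp = (contenders & TOY_OTHER_CPU) != 0;

    if (contenders & TOY_IRQ) {         /* assumption: irq outranks bh */
        if (!smp)
            return "local_irq_disable/local_irq_enable";
        return keep_irq_state ? "spin_lock_irqsave/spin_unlock_irqrestore"
                              : "spin_lock_irq/spin_unlock_irq";
    }
    if (contenders & TOY_BH)
        return smp ? "spin_lock_bh/spin_unlock_bh"
                   : "local_bh_disable/local_bh_enable";
    if (smp)
        return "spin_lock/spin_unlock";
    return "none";                      /* nobody else touches the data */
}

/* The same data seen from two different execution paths. */
void toy_pick_lock_demo(void)
{
    /* A path that competes with other CPUs and with local irq. */
    assert(strcmp(toy_pick_lock(TOY_OTHER_CPU | TOY_IRQ, true),
                  "spin_lock_irqsave/spin_unlock_irqrestore") == 0);
    /* A path where local interrupts are already off: only other CPUs remain. */
    assert(strcmp(toy_pick_lock(TOY_OTHER_CPU, false),
                  "spin_lock/spin_unlock") == 0);
}
```

The two calls in toy_pick_lock_demo() show why the form can differ for the same data: the answer depends on which contenders are still possible from where the caller stands.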

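Here is an example. The interrupt code has an array variable irq_desc[] of type irq_desc_t. Each element of the array is the descriptor for one irq and holds, among other things, that irq's handler functions. The irq_desc_t structure contains a spinlock that guarantees mutual exclusion on access (modification).

For a particular irq element, irq_desc[irq], two kernel execution paths access it. One is when the irq's handler is installed (setup_irq), which usually happens during module initialization or system initialization. The other is the interrupt handling function (do_IRQ). The code is as follows: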

int setup_irq(unsigned int irq, struct irqaction * new)
{
    int shared = 0;
    unsigned long flags;
    struct irqaction *old, **p;
    irq_desc_t *desc = irq_desc + irq;

    /*
     * Some drivers like serial.c use request_irq() heavily,
     * so we have to be careful not to interfere with a
     * running system.
     */
    if (new->flags & SA_SAMPLE_RANDOM) {
        /*
         * This function might sleep, we want to call it first,
         * outside of the atomic block.
         * Yes, this might clear the entropy pool if the wrong
         * driver is attempted to be loaded, without actually
         * installing a new handler, but is this really a problem,
         * only the sysadmin is able to do this.
         */
        rand_initialize_irq(irq);
    }

    /*
     * The following block of code has to be executed atomically
     */
[1] spin_lock_irqsave(&desc->lock,flags);
    p = &desc->action;
    if ((old = *p) != NULL) {
        /* Can't share interrupts unless both agree to */
        if (!(old->flags & new->flags & SA_SHIRQ)) {
[2]         spin_unlock_irqrestore(&desc->lock,flags);
            return -EBUSY;
        }

        /* add new interrupt at end of irq queue */
        do {
            p = &old->next;
            old = *p;
        } while (old);
        shared = 1;
    }

    *p = new;

    if (!shared) {
        desc->depth = 0;
        desc->status &= ~(IRQ_DISABLED | IRQ_AUTODETECT | IRQ_WAITING);
        desc->handler->startup(irq);
    }
[3] spin_unlock_irqrestore(&desc->lock,flags);

    register_irq_proc(irq);
    return 0;
}

asmlinkage unsigned int do_IRQ(struct pt_regs regs)
{
    /*
     * We ack quickly, we don't want the irq controller
     * thinking we're snobs just because some other CPU has
     * disabled global interrupts (we have already done the
     * INT_ACK cycles, it's too late to try to pretend to the
     * controller that we aren't taking the interrupt).
     *
     * 0 return value means that this irq is already being
     * handled by some other CPU. (or is disabled)
     */
    int irq = regs.orig_eax & 0xff; /* high bits used in ret_from_ code */
    int cpu = smp_processor_id();
    irq_desc_t *desc = irq_desc + irq;
    struct irqaction * action;
    unsigned int status;

    kstat.irqs[cpu][irq]++;
[4] spin_lock(&desc->lock);
    desc->handler->ack(irq);
    /*
     * REPLAY is when Linux resends an IRQ that was dropped earlier
     * WAITING is used by probe to mark irqs that are being tested
     */
    status = desc->status & ~(IRQ_REPLAY | IRQ_WAITING);
    status |= IRQ_PENDING; /* we _want_ to handle it */

    /*
     * If the IRQ is disabled for whatever reason, we cannot
     * use the action we have.
     */
    action = NULL;
    if (!(status & (IRQ_DISABLED | IRQ_INPROGRESS))) {
        action = desc->action;
        status &= ~IRQ_PENDING; /* we commit to handling */
        status |= IRQ_INPROGRESS; /* we are handling it */
    }
    desc->status = status;

    /*
     * If there is no IRQ handler or it was disabled, exit early.
     * Since we set PENDING, if another processor is handling
     * a different instance of this same irq, the other processor
     * will take care of it.
     */
    if (!action)
        goto out;

    /*
     * Edge triggered interrupts need to remember
     * pending events.
     * This applies to any hw interrupts that allow a second
     * instance of the same irq to arrive while we are in do_IRQ
     * or in the handler. But the code here only handles the _second_
     * instance of the irq, not the third or fourth. So it is mostly
     * useful for irq hardware that does not mask cleanly in an
     * SMP environment.
     */
    for (;;) {
[5]     spin_unlock(&desc->lock);
        handle_IRQ_event(irq, &regs, action);
[6]     spin_lock(&desc->lock);

        if (!(desc->status & IRQ_PENDING))
            break;
        desc->status &= ~IRQ_PENDING;
    }
    desc->status &= ~IRQ_INPROGRESS;
out:
    /*
     * The ->end() handler has to deal with interrupts which got
     * disabled while the handler was running.
     */
    desc->handler->end(irq);
[7] spin_unlock(&desc->lock);

    if (softirq_pending(cpu))
        do_softirq();
    return 1;
}

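In setup_irq(), another CPU may be running setup_irq() at the same time, or a local irq may arrive while setup_irq() is running, causing do_IRQ() to run and modify desc->status. To guard against interference from both other CPUs and local irq interrupts, spin_lock_irqsave/spin_unlock_irqrestore() is used, as shown at [1][2][3].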
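
The article does not spell out why the irqsave form is used here rather than spin_lock_irq. One common reason, offered as an assumption of this note and not as the author's claim, is that a function callable from many places cannot know whether its caller already has interrupts disabled. The toy below models the local CPU's interrupt-enable state as a single bool (all names are hypothetical, nothing here touches real interrupts) and shows what an unconditional enable does to such a caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the local CPU's interrupt-enable flag. */
static bool toy_irqs_on = true;

/* Model of the disable/enable pair: enable is unconditional. */
static void toy_irq_disable(void) { toy_irqs_on = false; }
static void toy_irq_enable(void)  { toy_irqs_on = true; }

/* Model of the save/restore pair: remember the old state, put it back. */
static bool toy_irq_save(void)
{
    bool flags = toy_irqs_on;

    toy_irqs_on = false;
    return flags;
}

static void toy_irq_restore(bool flags) { toy_irqs_on = flags; }

/* Helper written in the spin_lock_irq/spin_unlock_irq style. */
static void toy_inner_plain(void)
{
    toy_irq_disable();
    /* ... short critical section ... */
    toy_irq_enable();           /* turns interrupts on no matter what */
}

/* Helper written in the spin_lock_irqsave/spin_unlock_irqrestore style. */
static void toy_inner_save(void)
{
    bool flags = toy_irq_save();

    /* ... short critical section ... */
    toy_irq_restore(flags);     /* back to whatever the caller had */
}

/*
 * A caller that has already disabled interrupts invokes a helper.
 * Returns the interrupt state the caller sees once the helper returns.
 */
static bool toy_state_after(void (*inner)(void))
{
    bool seen;

    toy_irq_disable();          /* caller's own critical section begins */
    inner();
    seen = toy_irqs_on;         /* the caller expects this to be false */
    toy_irq_enable();           /* caller leaves its critical section */
    return seen;
}

void toy_irqsave_demo(void)
{
    assert(toy_state_after(toy_inner_plain) == true);   /* state leaked back on */
    assert(toy_state_after(toy_inner_save) == false);   /* caller's state kept */
}
```

With the plain pair, the caller's critical section silently runs with interrupts on after the helper returns; the save/restore pair avoids that at the cost of carrying the flags value around.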

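do_IRQ(), by contrast, is itself running in an interrupt, and interrupts have not yet been re-enabled at this point, so nothing on the local CPU can interrupt it. Another CPU may be running setup_irq(), or may itself be in an interrupt, but from the point of view of the local do_IRQ() these two cases are the same: both are interference from another CPU. Hence spin_lock/spin_unlock is sufficient, as shown at [4][5][6][7]. Note [5] in particular: the spinlock is released first, and only then is the actual handler called.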
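
Because the lock is dropped at [5], the status bits are what keep two CPUs from running the handler for the same irq at once. The following single-threaded toy traces that protocol. It is a simplified model, not the kernel code: the TOY_ names and struct toy_desc are hypothetical, the "other CPU" is simulated by a nested call made from inside the handler, the lock calls appear only as comments, and unlike the listing the function returns whether it ran the handler.

```c
#include <assert.h>
#include <stdbool.h>

enum {
    TOY_INPROGRESS = 1 << 0,    /* some CPU is running the handler */
    TOY_DISABLED   = 1 << 1,    /* the irq is disabled */
    TOY_PENDING    = 1 << 2     /* an instance arrived and still needs handling */
};

struct toy_desc {
    unsigned status;
    int handled;    /* how many times the handler ran */
    int arrivals;   /* extra arrivals to inject while the handler runs */
};

static bool toy_do_irq(struct toy_desc *d);

/* Stand-in for handle_IRQ_event(); it runs with the lock dropped. */
static void toy_handler(struct toy_desc *d)
{
    d->handled++;
    if (d->arrivals > 0) {
        d->arrivals--;
        /* Simulates another CPU taking the same irq right now. */
        bool ran = toy_do_irq(d);
        assert(!ran);   /* it may only mark PENDING and leave */
    }
}

/* Returns true if this call ran the handler itself. */
static bool toy_do_irq(struct toy_desc *d)
{
    bool run = false;
    unsigned status;

    /* [4] spin_lock would be taken here */
    status = d->status | TOY_PENDING;           /* we want it handled */
    if (!(status & (TOY_DISABLED | TOY_INPROGRESS))) {
        run = true;
        status &= ~TOY_PENDING;                 /* we commit to handling */
        status |= TOY_INPROGRESS;               /* we are handling it */
    }
    d->status = status;

    if (run) {
        for (;;) {
            /* [5] spin_unlock: the handler runs without the lock */
            toy_handler(d);
            /* [6] spin_lock again before looking at status */
            if (!(d->status & TOY_PENDING))
                break;
            d->status &= ~TOY_PENDING;          /* handle the remembered one */
        }
        d->status &= ~TOY_INPROGRESS;
    }
    /* [7] spin_unlock */
    return run;
}
```

Starting from a zeroed descriptor with arrivals set to 1, one call to toy_do_irq() runs the handler twice: once for the original interrupt and once for the instance that arrived while the lock was released and could only leave a PENDING mark.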

Another example:

static void tasklet_hi_action(struct softirq_action *a)
{
    int cpu = smp_processor_id();
    struct tasklet_struct *list;

[8] local_irq_disable();
    list = tasklet_hi_vec[cpu].list;
    tasklet_hi_vec[cpu].list = NULL;
[9] local_irq_enable();

    while (list) {
        struct tasklet_struct *t = list;

        list = list->next;

        if (tasklet_trylock(t)) {
            if (!atomic_read(&t->count)) {
                if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
                    BUG();
                t->func(t->data);
                tasklet_unlock(t);
                continue;
            }
            tasklet_unlock(t);
        }

[10]    local_irq_disable();
        t->next = tasklet_hi_vec[cpu].list;
        tasklet_hi_vec[cpu].list = t;
        __cpu_raise_softirq(cpu, HI_SOFTIRQ);
[11]    local_irq_enable();
    }
}

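Here there is no competition between CPUs over modifications to tasklet_hi_vec[cpu], because each CPU has its own independent data. Only interference from irq has to be prevented, so local_irq_disable/local_irq_enable is enough, as shown at [8][9][10][11].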
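
A rough user-space analogue of this pattern is a single-threaded POSIX process that shares a variable with its own signal handler: no other thread is involved, so blocking the signal around the access is all the protection needed. The sketch below is only an analogy under that assumption. SIGUSR1 plays the irq, a counter stands in for the per-CPU list, and every toy_ name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Stand-in for tasklet_hi_vec[cpu].list: work queued by the "interrupt". */
static volatile sig_atomic_t toy_queued;

/* Plays the role of the irq path that queues work. */
static void toy_on_signal(int signo)
{
    (void)signo;
    toy_queued++;
}

void toy_install_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = toy_on_signal;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);
}

/* Analogue of local_irq_disable(): hold back the one signal we care about. */
void toy_block_irq(void)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    sigprocmask(SIG_BLOCK, &set, NULL);
}

/* Analogue of local_irq_enable(): a held-back signal is delivered here. */
void toy_unblock_irq(void)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    sigprocmask(SIG_UNBLOCK, &set, NULL);
}

/* Detach everything queued so far, in the manner of [8]..[9]. */
int toy_take_queue(void)
{
    int n;

    toy_block_irq();
    n = toy_queued;     /* the handler cannot run between these two lines */
    toy_queued = 0;
    toy_unblock_irq();
    return n;
}
```

A signal raised while the block is in effect stays pending and is delivered when toy_unblock_irq() runs, so the read and the reset of toy_queued cannot be split by the handler. That matches what [8] to [11] rely on, with the limits of the analogy in mind.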

Friday, November 2, 2007
