You've come to this page because you've made an erroneous claim about IRQ levels. These are the Frequently Given Answers to such claims.
This myth actually originates with Microsoft, and is an instance of an explanation so dumbed down that it is actually wrong. It has been copied and parroted by many people without correct attribution over the years, so it may appear to originate with other people. But it originates in Microsoft's sales pitch for everyone to switch to local APICs. The other people are simply parroting Microsoft without actual understanding of the hardware involved.
8259 Hardware Is Slow
When the operating system raises or lowers IRQL, a new mask is written into the 8259 that enables only the interrupts allowed at this IRQL. Therefore, raising or lowering IRQL causes either two "out" instructions or software simulation of one sort or another. Each of these I/O instructions causes bus cycles that must make it all the way to the South Bridge and back.
Yes, 8259 hardware is located on the PCI-to-ISA/PCI-to-LPC bridge. But that only makes it as "slow" as an I/O space bus cycle to any other PCI device. The dual 8259s aren't actual ISA devices located on the ISA, or LPC, bus. They are implemented within the PCI-to-ISA bridge on pretty much all Intel and VIA chipsets, and are effectively ordinary PCI I/O space devices. They aren't even distinct devices. They are simply several more I/O space registers of the PCI-to-ISA/PCI-to-LPC bridge device itself.
The problem with 8259s isn't that they are slow. It's that they don't actually work in the way that everyone naïvely thinks them to work. This is another myth.
Even Mark Russinovich believes this myth. Here's what Windows Internals says on the matter:
HALs that use a PIC implement a performance optimization, called lazy IRQL, that avoids PIC accesses. When the IRQL is raised, the HAL notes the new IRQL internally instead of changing the interrupt mask. If a lower-priority interrupt subsequently occurs, the HAL sets the interrupt mask to the settings appropriate for the first interrupt and postpones the lower-priority interrupt until the IRQL is lowered. Thus, if no lower-priority interrupts occur while the IRQL is raised, the HAL doesn't need to modify the PIC.
This "Lazy IRQL" myth is predicated on a model of an 8259's operation that simply isn't true. One can find this model repeated all over the place, and it's how most people think an 8259 works. Here's a web diarist (who, you'll note, elsewhere on the same diary entry plagiarized Microsoft without attribution) propounding the myth:
The IMR. This register lets the programmer disable or "mask" individual interrupts so that the PIC doesn't interrupt the processor when the corresponding interrupt is signaled. For an interrupt to be disabled, its corresponding bit in the IMR must be 1. To be enabled, its bit must be 0. Interrupts can be enabled or disabled by the programmer by reading the IMR, setting or clearing the appropriate bits, then writing the new value back to the IMR.
Here's Open Systems Resources, Incorporated parroting the same idea:
Each one of the IRQs is individually maskable, meaning it can be programmatically disabled via the 8259's Interrupt Mask Register (IMR). If an IRQ is masked, any device that is connected to that IRQ's requests for interrupts are ignored.
This is not how 8259s work, and there is explicit wording in Intel's original datasheet in several places warning the reader that 8259s don't work this way. Designers are warned not to depend on the INT signal from the 8259 to go inactive for any specific period of time; and systems programmers are warned that various things are not affected by the IMR.
The simple truth is that the IMR operates upon the interrupt request lines feeding into the IRR. It doesn't operate upon the generation of the INT output caused by the IRR. If an interrupt line signals an interrupt, and an IRR bit is set to 1 as a result, then setting the associated IMR bit to 1 has no effect. The INT signal from the chip remains active, and is not deactivated by the interrupt request being "masked out". The IMR prevents new interrupts from reaching the IRR, but it does not mask out interrupt requests that have already been signalled and recorded in the IRR. (One can read the source code for the Bochs virtual machine emulation of a dual-8259 system and notice this. Writing to the IMR doesn't turn off interrupts that have already been set on. This is not in fact an error in Bochs, as it may seem. The real 8259 hardware actually works this way.)
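The model just described can be put into a few lines of C. This is only a toy sketch of the behaviour that this page describes, not a register-accurate emulation of the chip; the names `pic_model`, `pic_request`, `pic_write_imr`, and `pic_int_asserted` are invented here for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one 8259, following the description above.
   All names here are hypothetical. */
struct pic_model {
    uint8_t imr;  /* Interrupt Mask Register: 1 = request line gated off */
    uint8_t irr;  /* Interrupt Request Register: 1 = request latched */
};

/* A device asserts request line `irq` (0..7). The IMR gates the input,
   so a masked line never reaches the IRR. */
void pic_request(struct pic_model *pic, unsigned irq)
{
    uint8_t bit = (uint8_t)(1u << irq);
    if (!(pic->imr & bit))
        pic->irr |= bit;
}

/* Writing the IMR changes only which future requests are latched.
   Bits already set in the IRR are left alone. */
void pic_write_imr(struct pic_model *pic, uint8_t mask)
{
    pic->imr = mask;
}

/* In this model the INT output is driven from the IRR alone. */
bool pic_int_asserted(const struct pic_model *pic)
{
    return pic->irr != 0;
}
```

Requesting IRQ 3 and then writing 0xFF to the IMR leaves `pic_int_asserted` returning true in this model, which is exactly the point being made: the mask arrives too late to silence a request that has already been latched.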
So the IMR cannot be used as an interrupt priority register as most people naïvely think it can be used. Once an interrupt has been raised, one cannot "raise the priority", by setting bits to 1 in the mask register, to stop the CPU from receiving it. Once the IRR has a 1 bit, the INT signal to the CPU goes active, and the only route for it to go inactive is for the CPU to perform an Interrupt Acknowledge bus cycle, thereby receiving the interrupt and executing its handler.
So the myth about "lazy IRQL" is predicated upon the idea, that is wrongly incorporated into many people's mental model of an 8259, that the IMR reflects the current IRQ level. In fact it does not. The IMR does not reflect the IRQL and never has. There is no new "Lazy IRQL" mechanism. The IMR reflects which interrupt requests the operating system wants to be temporarily silenced. One might think that that's saying the same thing. But it is not. There is a subtle but crucial difference.
The difference lies in a race condition. Suppose that one were using the IMR as an IRQ level register. One would "raise IRQL" by masking out more interrupt requests and one would "lower IRQL" by unmasking them again. But this does not work. If two interrupts are requested simultaneously (for the sake of exposition let us suppose they are interrupt requests #0 and #4) then IRR bits 0 and 4 are set. (We are also supposing, for the sake of exposition, a simple IRQ number to priority mapping. 8259s can map priorities in several ways. They don't change the nature of the race condition, though, so we use a simple mapping to avoid complicating the explanation.) The 8259 signals INT to the processor, which issues an acknowledge bus cycle and executes the IRQ #4 handler, whatever that is. The IRQ #4 handler is of course using the IMR as its IRQ level register, so the first thing that it does is "raise the IRQ level" to mask out IRQs 4, 3, 2, 1, and 0. Unfortunately, bit 0 is already set in the IRR, and no manipulation of the IMR changes that. As soon as the IRQ #4 handler issues an END-OF-INTERRUPT to the 8259, it will assert the INT signal to the CPU again, for IRQ #0, even though the "IRQ level" in the IMR is supposedly at level #4, masking out IRQ #0.
Worse, most operating systems issue EOIs immediately, because they don't want the interrupt priority semantics that 8259s and their In-Service Register priority mechanism enforce. Thus what results, from the operating system's perspective, when it uses the IMR in this naïve and incorrect manner, is that the IRQ #4 handler is triggered, it raises IRQL to mask out IRQ #0, it issues an EOI to turn off the 8259's ISR priority semantics, and immediately the supposedly masked IRQ #0 occurs.
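The race can be replayed in a small simulation. The following is a sketch under the same simplifying assumptions as the exposition above (a higher IRQ number means a higher priority, the IMR gates only the inputs to the IRR, and a non-specific EOI clears the highest in-service bit). The type and function names are hypothetical and are not taken from any real driver or emulator.

```c
#include <stdint.h>

/* Toy 8259 with IMR, IRR and ISR; all names are hypothetical. */
struct pic_model { uint8_t imr, irr, isr; };

/* Latch a request unless its input line is masked. */
void pic_request(struct pic_model *pic, unsigned irq)
{
    uint8_t bit = (uint8_t)(1u << irq);
    if (!(pic->imr & bit))
        pic->irr |= bit;
}

/* Interrupt acknowledge: deliver the highest-numbered latched request,
   unless an equal or higher level is still in service.
   Returns the IRQ number delivered, or -1 if INT is not asserted. */
int pic_acknowledge(struct pic_model *pic)
{
    for (int irq = 7; irq >= 0; --irq) {
        uint8_t bit = (uint8_t)(1u << irq);
        if (pic->isr & bit)
            return -1;
        if (pic->irr & bit) {
            pic->irr &= (uint8_t)~bit;
            pic->isr |= bit;
            return irq;
        }
    }
    return -1;
}

/* Non-specific EOI: clear the highest in-service bit. */
void pic_eoi(struct pic_model *pic)
{
    for (int irq = 7; irq >= 0; --irq) {
        uint8_t bit = (uint8_t)(1u << irq);
        if (pic->isr & bit) {
            pic->isr &= (uint8_t)~bit;
            return;
        }
    }
}

/* Replays the scenario: IRQs 0 and 4 arrive together, the IRQ #4 handler
   "raises IRQL" through the IMR and issues an early EOI. Returns the IRQ
   that is delivered next. */
int naive_imr_as_irql(void)
{
    struct pic_model pic = {0, 0, 0};
    pic_request(&pic, 0);
    pic_request(&pic, 4);
    (void)pic_acknowledge(&pic);  /* CPU enters the IRQ #4 handler */
    pic.imr = 0x1F;               /* "raise IRQL": mask IRQs 4..0 */
    pic_eoi(&pic);                /* early EOI, as most systems do */
    return pic_acknowledge(&pic); /* what does the CPU receive now? */
}
```

In this model `naive_imr_as_irql()` returns 0: the supposedly masked IRQ #0 is delivered as soon as the EOI has been issued.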
The "lazy IRQL" idea comes from a misunderstanding of how (the 8259 HAL in) x86 Windows NT uses the Interrupt Mask Register(s) of the 8259(s). The writers of x86 Windows NT are some of the few people in the world who have read the Intel 8259 datasheets and understood how they really operate. What x86 Windows NT actually does is implement the "IRQ level register", of the abstract CPU that the rest of the kernel talks to, entirely in software. It's just a location in memory, a field in a per-CPU data structure, that stores the current IRQ level of that CPU. Any IRQ can occur at any IRQ level, because IRQ levels have no hardware existence. What happens instead is fourfold:
Raising the IRQ level is simplicity itself. One simply checks that one is, indeed, actually raising the IRQ level (it being a fatal error for a device driver to say that it is raising the IRQ level but actually end up lowering it) and stores the new level in the per-CPU data structure field.
If an interrupt occurs which is of greater priority than the current IRQ level, stored in the data structure, it is handled as normal. The IRQ level is raised, the 8259s are sent an EOI to clear the In-Service Register bit (and prevent an unwanted hardware priority mechanism from kicking in), and the handlers in the KINTERRUPT object chain are invoked.
If an interrupt occurs which is of lesser or equal priority than the current IRQ level, its interrupt handler is still executed but it is deferred, early on in the handler. The relevant bit in the IMR is set in order to prevent this from happening again, and a software bit for the CPU records that an interrupt at this level is pending. It has been acknowledged in hardware (the EOI being sent as normal to stop the ISR priority mechanism from kicking in), and masked from further occurrence. The operating system has to re-raise the interrupt itself at a later point. Fortunately, the x86 architecture provides an easy way to do this: the INT instruction. A deferred hardware interrupt becomes a software interrupt.
Lowering the IRQ level is the mirror image of raising the IRQ level, with an addition. It checks that it is really lowering the value, and updates the per-CPU data structure field. It also checks whether it is lowering the level below the point where deferred, pending, interrupts would (had this been a real hardware mechanism) have become unmasked in hardware. For any that have, it simply executes an appropriate INT instruction to re-raise the interrupt, which triggers the CPU's interrupt handler as if an actual interrupt cycle had occurred on the system bus. (It turns APCs and DPCs into real interrupts with this mechanism too, principally because it is simple to do so. This deferral mechanism is turning hardware interrupts into software interrupts anyway, so it's a minor matter to add in to the mix some more interrupts that never existed in hardware in the first place, because the processor architecture had no equivalent for them.) The IMR bits for the deferred interrupts are set back to 0, to allow further interrupt requests with this level to hit the Interrupt Request Register once more, now that the kernel is not deferring such interrupt requests any more.
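The four steps above can be sketched in C as follows. This is an illustration of the scheme as described here, not the actual Windows NT HAL code. The structure `cpu_state`, the function names, the mapping of IRQ n to level n + 1, and the `handled` counters (which stand in for running the KINTERRUPT handler chain) are all assumptions made for the example. The `imr` field is a plain variable standing in for the real register, and a direct function call stands in for the INT instruction.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_IRQS 8

/* Hypothetical per-CPU state; IRQ n is assumed to run at level n + 1. */
struct cpu_state {
    unsigned irql;              /* current IRQ level: just memory */
    uint8_t  pending;           /* IRQs deferred, awaiting re-raise */
    uint8_t  imr;               /* stand-in for the 8259 mask register */
    unsigned handled[NUM_IRQS]; /* stand-in for running the handlers */
};

void lower_irql(struct cpu_state *cpu, unsigned new_irql);

/* Step 1: raising only checks the direction and stores the level. */
unsigned raise_irql(struct cpu_state *cpu, unsigned new_irql)
{
    unsigned old = cpu->irql;
    assert(new_irql >= old);    /* "raising" downwards is a fatal error */
    cpu->irql = new_irql;
    return old;
}

/* Steps 2 and 3: entry point for an interrupt taken by the CPU.
   An EOI would be sent to the 8259 in both branches. */
void on_interrupt(struct cpu_state *cpu, unsigned irq)
{
    uint8_t bit = (uint8_t)(1u << irq);
    unsigned level = irq + 1;
    if (level > cpu->irql) {
        unsigned old = raise_irql(cpu, level);
        cpu->handled[irq]++;    /* run the handler chain */
        lower_irql(cpu, old);
    } else {
        cpu->imr |= bit;        /* stop it from recurring for now */
        cpu->pending |= bit;    /* remember to re-raise it later */
    }
}

/* Step 4: lowering re-raises whatever has become deliverable. */
void lower_irql(struct cpu_state *cpu, unsigned new_irql)
{
    assert(new_irql <= cpu->irql);
    cpu->irql = new_irql;
    for (int irq = NUM_IRQS - 1; irq >= 0; --irq) {
        uint8_t bit = (uint8_t)(1u << irq);
        if ((cpu->pending & bit) && (unsigned)irq + 1 > cpu->irql) {
            cpu->pending &= (uint8_t)~bit;
            cpu->imr &= (uint8_t)~bit;        /* unmask the line again */
            on_interrupt(cpu, (unsigned)irq); /* stands in for INT n */
        }
    }
}
```

If one raises to level 5, delivers IRQ 0, and then lowers back to level 0, the handler count for IRQ 0 stays at zero until the lowering step, and the `imr` field is touched only because a lesser-priority interrupt actually arrived in the meantime.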
The important distinction to bear in mind here is that the IMR is not an "IRQ level register". It instead masks out the interrupts that the 8259 hardware has triggered in the CPU at too high an IRQ level for them to be processed. It's not an "IRQ level" register. It's a "pending lesser/equal-priority interrupts already deferred once" register. In other words: if no lesser/equal-priority interrupt happens to be raised during a higher-priority IRQ level period, the IMR isn't touched at all. It only needs to mask further (superfluous) assertions of those IRQs that have had to be deferred, and that will be raised as software interrupts once the IRQ level is once again low enough.
Because x86 PCs with 8259 interrupt controllers are widespread, and because such processor/system architectures have no notion of DPC or APC hardware interrupts, a myth has evolved that IRQ levels are somehow in two classes:
DISPATCH_LEVEL and APC_LEVEL are software IRQLs and the higher IRQ levels are hardware IRQLs.
As noted, the idea that so-called "hardware" IRQLs map to hardware is a myth, and in fact the so-called "hardware" interrupt requests are often raised via software interrupts, namely the INT instruction, because they've had to be acknowledged in hardware but deferred in software because the current IRQ level prohibits them.
The simple truth is that there is no real distinction between "hardware" and "software" IRQ levels. This is because such a mental model is missing one important fact: The whole IRQ level model is that of an abstract CPU in the first place. Saying that some interrupts are "hardware" and some are "software" misses the fact that in the CPU abstraction that is presented, all interrupts are hardware, and it is simply an accident of implementation which ones have to be implemented as software mechanisms under the covers. As far as the abstraction to which one, as a kernel or device driver programmer, programs is concerned, the abstract CPU has an IRQ level register and all IRQs are hardware interrupt requests, some of which are triggered by the act of queueing up APCs and DPCs and some of which are triggered asynchronously by devices.
As noted, on x86 systems with 8259s, all of the abstract hardware interrupts are potentially software interrupts under the covers, because such systems have no workable mechanisms for implementing everything in hardware. Conversely, and even less well known, on x86 systems with local APICs all of the abstract hardware interrupts, even the ones mistakenly labelled "software" interrupts, are implemented using real hardware interrupt mechanisms.
This is because x86 local APICs have two things:
They have a Task Priority Register in the Local APIC, which, unlike the Interrupt Mask Register in an 8259, can work like an "IRQ level register". The TPR in a Local APIC doesn't suffer from the problems of an 8259 IMR. Whereas an 8259 IMR masks the inputs to the IRR, and doesn't affect the output of the IRR to the interrupt priority and INT signal assertion logic, the TPR in a Local APIC does operate upon the interrupt priority and INT signal assertion logic, and can mask out already-raised interrupts.
They have a way to generate hardware interrupt requests, as if they had come over the interrupt bus from an I/O APIC, directed at the current CPU. These are called "self-interrupts", and are issued by programming the interrupt vector number and a "destination shorthand" of "self" into the local APIC's Interrupt Command Register.
So on a Windows NT system with a non-8259 HAL (e.g. HALAPIC.DLL, HALAACPI.DLL, and so forth) not only is the IRQL register not implemented in software, but APC and DPC interrupts are implemented in hardware, too, and are not software interrupts.
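For the curious, the two local APIC facilities can be illustrated with a little arithmetic. The sketch below assumes the register layouts given in Intel's architecture manuals as the editor understands them: the priority class is bits 7:4 of a vector or of the TPR, the interrupt vector occupies bits 7:0 of the Interrupt Command Register, the level-assert flag is bit 14, and the "self" destination shorthand is the value 01 in bits 19:18. The function names are hypothetical, the in-service contribution to processor priority is ignored, and nothing here touches a real APIC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Priority class of a vector, or of the TPR value: bits 7:4. */
unsigned priority_class(uint8_t value)
{
    return value >> 4;
}

/* Simplified delivery test: a pending vector is delivered only if its
   class is strictly above the TPR's class. Interrupts in service are
   ignored in this sketch. */
bool tpr_allows(uint8_t tpr, uint8_t vector)
{
    return priority_class(vector) > priority_class(tpr);
}

/* Low doubleword of the ICR for a fixed-delivery self-interrupt. */
uint32_t icr_self_interrupt(uint8_t vector)
{
    const uint32_t shorthand_self = 1u << 18; /* bits 19:18 = 01 */
    const uint32_t level_assert   = 1u << 14; /* set for non-INIT modes */
    return shorthand_self | level_assert | vector;
}
```

With the TPR at 0x40, vector 0x41 is held back and vector 0x51 is allowed through in this model, even if 0x41 was requested before the TPR was raised. That is the property which the 8259 IMR, as described on this page, lacks.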