From Surf Wiki (app.surf) — the open knowledge base

Hardware performance counter

Registers that count hardware-related activities

Summary

In computers, hardware performance counters (HPCs), or hardware counters, are a set of special-purpose registers built into modern microprocessors to store the counts of hardware-related activities. Advanced users often rely on these counters for low-level performance analysis and tuning.

Implementations

The number of hardware counters available in a processor is limited, while each CPU model may expose many different events that a developer might want to measure. Each counter can be programmed with the index of an event type to be monitored, such as an L1 cache miss or a branch misprediction.
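The idea of programming a counter with an event index can be illustrated with a toy model. The sketch below is not a real hardware interface: the `ToyPMU` class, its methods, and the event index values are all hypothetical, and real event encodings are specific to each CPU model.

```python
from dataclasses import dataclass, field

# Hypothetical event indices; real encodings differ per CPU model.
EVENT_L1_MISS = 0x01
EVENT_BRANCH_MISPREDICT = 0x02
EVENT_INSTR_RETIRED = 0x03


@dataclass
class ToyPMU:
    """Toy model of a unit with a fixed number of programmable counters."""
    num_counters: int
    event_select: list = field(init=False)
    counts: list = field(init=False)

    def __post_init__(self):
        self.event_select = [None] * self.num_counters
        self.counts = [0] * self.num_counters

    def program(self, counter, event):
        """Select the event a counter monitors and reset its count."""
        if not 0 <= counter < self.num_counters:
            raise IndexError("no such counter")
        self.event_select[counter] = event
        self.counts[counter] = 0

    def signal(self, event):
        """Called once per event; only counters selecting it are incremented."""
        for i, selected in enumerate(self.event_select):
            if selected == event:
                self.counts[i] += 1

    def read(self, counter):
        return self.counts[counter]


pmu = ToyPMU(num_counters=2)
pmu.program(0, EVENT_L1_MISS)
pmu.program(1, EVENT_BRANCH_MISPREDICT)

# A made-up stream of events; retired instructions are not selected
# by either counter, so they go uncounted.
trace = [EVENT_INSTR_RETIRED, EVENT_L1_MISS, EVENT_INSTR_RETIRED,
         EVENT_BRANCH_MISPREDICT, EVENT_L1_MISS]
for ev in trace:
    pmu.signal(ev)

print(pmu.read(0), pmu.read(1))
```

With only two counters, the third event type in the trace cannot be observed in the same run, which is the constraint the paragraph above describes.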

One of the first processors to implement a hardware counter and an associated instruction to access it (the RDPMC instruction) was the Intel Pentium, but the counters were not documented until Terje Mathisen wrote an article about reverse engineering them, published in Byte in July 1994.

The following table shows some examples of CPUs and the number of available hardware counters:

Processor               Available HW counters
UltraSparc II           2
Pentium III             2
ARM11                   2
AMD Athlon              4
IA-64                   4
ARM Cortex-A5           see Arm Developer documentation (https://developer.arm.com/documentation/ddi0433/b/CIHJGICA)
ARM Cortex-A8           4
ARM Cortex-A9 MPCore    6
POWER4                  8
Pentium 4               18

Versus software techniques

Compared to software profilers, hardware counters provide low-overhead access to a wealth of detailed performance information about the CPU's functional units, caches, and main memory. Another benefit is that they generally require no source code modifications. However, the types and meanings of hardware counters vary from one architecture to another because of differences in hardware organization.

It can be difficult to correlate low-level performance metrics back to source code. The limited number of counter registers often forces users to conduct multiple measurement runs to collect all desired performance metrics.
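As a rough illustration of why several runs are needed, the following sketch splits a list of desired events into batches that each fit the available counters. The `plan_runs` helper and the event names are hypothetical, and the sketch assumes the workload behaves the same way on every run.

```python
def plan_runs(events, num_counters):
    """Split the desired events into batches that fit the available counters."""
    if num_counters < 1:
        raise ValueError("need at least one counter")
    return [events[i:i + num_counters]
            for i in range(0, len(events), num_counters)]


# Hypothetical event names, not taken from any particular CPU manual.
wanted = ["l1_miss", "l2_miss", "branch_mispredict", "tlb_miss", "instr_retired"]

runs = plan_runs(wanted, num_counters=2)
for n, batch in enumerate(runs, start=1):
    print(f"run {n}: {batch}")
```

Five events on a two-counter processor need three runs here, and the last run leaves one counter idle.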

Instruction based sampling

Figure: Output of an IBS profile from CodeAnalyst.

Modern superscalar processors schedule and execute multiple instructions out of order at the same time. These "in-flight" instructions can retire at any time, depending on memory access, hits in cache, stalls in the pipeline and many other factors. This can cause performance counter events to be attributed to the wrong instructions, making precise performance analysis difficult or impossible.
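The misattribution effect can be sketched with a deliberately simplified model. It assumes that a sample is recorded a fixed number of instructions after the one that caused the event. Real delays vary from sample to sample, and the `attribute` function and its parameters are hypothetical.

```python
from collections import Counter


def attribute(event_sites, skid, program_length):
    """Return how many events are charged to each instruction index.

    event_sites: indices of the instructions that actually caused an event.
    skid: assumed fixed delay, in instructions, before the sample is recorded.
    program_length: number of instructions; charges are clamped to the last one.
    """
    return Counter(min(site + skid, program_length - 1) for site in event_sites)


program_length = 8
true_sites = [2, 2, 5]  # instruction 2 causes two events, instruction 5 one

precise = attribute(true_sites, skid=0, program_length=program_length)
skewed = attribute(true_sites, skid=3, program_length=program_length)

print("no delay:  ", dict(precise))
print("with delay:", dict(skewed))
```

In the delayed case the two events caused by instruction 2 are charged to instruction 5, so a profile built from such samples would point at the wrong line of code.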

AMD introduced methods to mitigate some of these drawbacks. For example, in 2007 the Opteron processors implemented a technique known as Instruction Based Sampling (IBS). AMD's implementation of IBS provides hardware counters for both fetch sampling (the front of the superscalar pipeline) and op sampling (the back of the pipeline). This results in discrete performance data that associates retired instructions with the "parent" AMD64 instruction.

References

  1. (2011). "Proceedings of the sixth ACM workshop on Scalable trusted computing".
  2. "Pentium Secrets". Gamedev.net.
  3. "Documentation – Arm Developer".
  4. "Instruction-Based Sampling: A New Performance Analysis Technique for AMD Family 10h Processors". AMD.
Wikipedia Source

This article was imported from Wikipedia and is available under the Creative Commons Attribution-ShareAlike 4.0 License. Content has been adapted to SurfDoc format. Original contributors can be found on the article history page.
