About

Many RISC-V processors support "Instruction Fusion" or "Macro-op Fusion" to improve performance. The basic idea is that certain instructions often appear together in a particular order to implement common idioms, for example lui+addi for constant synthesis. Under the right conditions the processor can "fuse" the two instructions, which can reduce the latency of the second instruction, reduce the use of internal processor resources, etc.
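As a rough illustration of the lui+addi idiom, the sketch below checks whether two adjacent instructions form a candidate pair. The `Insn` type and `is_lui_addi_pair` helper are hypothetical names invented for this example, and the condition used (the addi reads and overwrites the register written by the lui) is an assumption; the exact conditions depend on the micro-architecture.

```python
from typing import NamedTuple, Optional


class Insn(NamedTuple):
    """Hypothetical, heavily simplified decoded instruction."""
    op: str              # mnemonic, e.g. "lui" or "addi"
    rd: str              # destination register
    rs1: Optional[str]   # first source register (None for lui)


def is_lui_addi_pair(first: Insn, second: Insn) -> bool:
    """Return True if the two adjacent instructions look like a fusible
    lui+addi constant synthesis (assumed condition, see lead-in)."""
    return (
        first.op == "lui"
        and second.op == "addi"
        and second.rs1 == first.rd   # addi consumes the lui result
        and second.rd == first.rd    # and overwrites the same register
    )


# lui a0, ... ; addi a0, a0, ...  -> candidate pair
fusible = is_lui_addi_pair(Insn("lui", "a0", None), Insn("addi", "a0", "a0"))
# lui a0, ... ; addi a1, a0, ...  -> different destination, not a candidate here
not_fusible = is_lui_addi_pair(Insn("lui", "a0", None), Insn("addi", "a1", "a0"))
```

Some cores may accept a wider set of register patterns than this check does; the sketch only shows the shape of the test.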

Fusion typically requires the instructions to be consecutive in the instruction stream. The goal of this project is to define, in a relatively generic way, a method to describe which fusions a particular micro-architecture supports and to provide mechanisms to keep those instructions consecutive in the instruction stream.
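The following is a minimal sketch of the idea, not the GCC implementation: a table describes which pairs a core can fuse, and a scan over a stream of mnemonics counts pairs that are adjacent (fusible) and pairs split by one intervening instruction (missed cases). The table contents and the names `FUSIBLE_PAIRS` and `scan_stream` are assumptions made for illustration, and the scan ignores register dependencies for brevity.

```python
# Illustrative table only; it does not describe any particular core.
FUSIBLE_PAIRS = {
    ("lui", "addi"),
    ("auipc", "addi"),
}


def scan_stream(ops):
    """Count adjacent fusible pairs and pairs separated by exactly one
    other instruction. Returns (fused, missed)."""
    fused = 0
    missed = 0
    i = 0
    n = len(ops)
    while i < n:
        if i + 1 < n and (ops[i], ops[i + 1]) in FUSIBLE_PAIRS:
            fused += 1
            i += 2          # both halves of the pair are consumed
        elif i + 2 < n and (ops[i], ops[i + 2]) in FUSIBLE_PAIRS:
            missed += 1     # pair exists but something sits in between
            i += 1
        else:
            i += 1
    return fused, missed


stream = ["lui", "addi", "lui", "add", "addi", "ld"]
fused, missed = scan_stream(stream)
```

In this sample stream the first lui/addi are adjacent, while the second lui is separated from its addi by an add, which is the kind of case a compiler scheduler would try to avoid.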

It is expected that a typical set of supported fusions can reduce the operation count within the processor's execution units by 1-3%.

Stakeholders/Partners

RISE:

Ventana: 1 FTE (VRULL Contract).  Initial implementation (Philipp Tomsich)

Ventana: 1 FTE ~2 weeks.  Raphael Zinsly: Improve implementation to cover missed cases

Ventana: 1 FTE ~2 weeks.  Jivan Hakobyan: Improve tooling to analyze instruction trace data for missed cases

External:


Dependencies


Status

Development

COMPLETED


Development Timeline

NA

Upstreaming

COMPLETED


Upstream Version





Contacts

Jeff Law (Ventana)


Dependencies

None



Updates

 

  • Infrastructure for fusion upstreamed to GCC. It currently handles 10 fusion cases supported by Veyron V1
  • Other ports can reuse those cases trivially, and the framework is generic enough to add further cases over time

 

  • Development of store-store fusion support is effectively complete

 

  • Some data on how to evaluate store-pair fusion is available, but it's very noisy
  • Perhaps just focus on squashing out the obvious cases from the instruction stream data and call it done

 

  • Working through implementation details on store-store case
  • Thinking is to start upstreaming once store-store case is handled reasonably well

 

  • Raphael has a prototype to implement the missing fusion case
    • Under evaluation using tools from Jivan (dynamic instruction stream)
    • Unclear how large the end benefit will be; not sure whether our emulator gives us good insight into when this happens

 

  • WIP from Raphael to implement the missing "fusion" case for Veyron V1

 

  • Note Stakeholders/Partners in a consistent way

– Dates on or before June 1 are approximate

  • Project reported as priority for 2H23