About

Basic enablement of auto-vectorization in GCC for RISC-V, targeting version 1.0 of the V extension.  The initial focus is to implement the RISC-V target-specific code that wires up the existing intrinsics to the basic vectorizer primitives, and to improve the generic vectorizer as needed to support basic vectorization for the RISC-V architecture.

This is the first stage in the overall auto-vectorization effort and should be sufficient to "light up" vector code generation in common cases.  In general, if code vectorizes on other architectures, then it should vectorize on RISC-V as well.  Cases where that generalization does not hold are expected to be primarily cost modeling issues, which are under investigation right now.  Similarly, improving VLA (vector length agnostic) vectorization, rather than falling back to VLS (vector length specific) style vectorization, is a major focus area for 1H2024 and is tracked on a separate page.
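
As an illustration of the "common cases" mentioned above, here is a minimal sketch of a loop that vectorizers on most architectures handle. The function name `vec_add` and the use of `restrict` are assumptions made for this example, not something taken from the project itself.

```c
#include <stddef.h>

/* Element-wise add over independent iterations.  `restrict` is an
   assumption added for this sketch: it tells the compiler the arrays do
   not overlap, so no runtime alias check is needed before vectorizing. */
void vec_add(size_t n, const int *restrict a, const int *restrict b,
             int *restrict c)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```

With a GCC that targets RISC-V, something like `gcc -O3 -march=rv64gcv -fopt-info-vec-optimized -c vec_add.c` (the file name is hypothetical) should report whether the loop was vectorized. The exact result depends on the GCC version and the cost model.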


Stakeholders/Partners

RISE:

Ventana:  Robin Dapp (full time) + Jeff Law for oversight/review

Rivos: Palmer Dabbelt for oversight/review

           Joern Rennecke (Embecosm contractor for Rivos): some initial vector math library implementations, compiler expansion of memcpy

SiFive: Kito Cheng for oversight/review


External:

RiVAI: Much of the initial work was done by Juzhe.  This has included the basic design/implementation, ABI work, etc.  Juzhe continues to play a major role in design/implementation going forward.

SUSE/ARM/Linaro: Some of the work in this space has touched on generic parts of GCC.  Various engineers from SUSE, IBM, ARM and Linaro, including Richard Sandiford, Richard Biener and others, have been involved on an as-needed basis.


Status

Development

COMPLETE


Development Timeline

2H2023

Improvements tracked on 1H2024 project page


Upstreaming

COMPLETE


Upstream Version

gcc-14 (Spring 2024)

gcc-13 RISC-V coordination branch


Available for anyone to use, but not an official release from the GCC project

Contacts

Robin Dapp (Ventana)

Kito Cheng (SiFive)

Palmer Dabbelt (Rivos)

Jeff Law (Ventana)


Dependencies

None



Updates

  • RiVAI's team has submitted the final gcc-14 feature, strided loads/stores, which will likely go in this week
  • Still seeing too many testsuite failures, though they may be more hf16/bf16 related.  Embecosm and Rivos engineers are syncing on the exact configurations being tested
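
For readers unfamiliar with the strided loads/stores feature named above, the following sketch shows the kind of source loop it applies to. The function name `copy_strided` and its parameters are hypothetical. Whether GCC emits a strided vector load (the `vlse*.v` family in RVV 1.0) for it depends on the compiler version and cost model.

```c
#include <stddef.h>

/* Read every `stride`-th element of src into consecutive slots of dst.
   The step between loads is constant but not 1, which is the access
   shape a strided vector load is designed for. */
void copy_strided(size_t n, size_t stride, const int *restrict src,
                  int *restrict dst)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i * stride];
}
```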

 

  • Remainder of the tsvc testsuite has been enabled and various minor bugs have been fixed.  Most of the remaining issues are testsuite problems (properly marking tests as needing particular features so that their pass/fail optimization state is determined correctly)
  • Seeing ongoing convergence of test results with V enabled by default.  Not there yet, but improving.

 

  • Embecosm reports many failures with SPEC2017 + vector
  • More generally, focus is on bringing down failures across the board and ensuring everyone's results are roughly in line
  • Push harder on upstream patch policy in general: every patch must be tested with no regressions across a standard set of options

 

  • Generic vector testsuite is enabled in upstream GCC for vector integer operations.
  • Expecting to turn on vector single/double precision testing shortly
    • Drive down those failures first, then circle back to potential issues in SPEC2017
  • Seeing some convergence on testing results across organizations, but not complete convergence yet.

 

  • Hoping to enable tsvc (a vector testsuite) this week or next.  Only a few real failures
  • Need to figure out why test results are different across different organizations doing testing of V support
  • Overall sense is to drive down failures rather than focus on further improvements right now

 

  • Functionally complete and upstreamed
  • Improvements being tracked as a distinct project

 

  • Additional conditional vector operations via masking landing
  • Optimized rounding mode switching progressing well for vector code which wants control over rounding modes
  • Generic scheduling model submitted, not yet approved

 

  • Support for "load/store lanes" with length and mask support integrated
  • More rounding mode intrinsics API support landing
  • Vectorized cpymem approved, will integrate once some testsuite infrastructure issues are resolved
  • Remaining chunks of work:
    • VEC_EXTRACT/EXTRACT_LAST, FOLD_EXTRACT_LAST
    • fmac with length control
    • Strided memory access
    • Scheduler models
    • libmvec
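
To illustrate the "load/store lanes" item in the list above: the term refers to accesses over interleaved groups of elements. The sketch below is a hypothetical example (the name `deinterleave` is not from the project) of a loop with that shape. RVV 1.0 provides segment loads such as `vlseg2e32.v` for it, though this sketch does not claim GCC always selects them.

```c
#include <stddef.h>

/* Split interleaved (re, im) pairs into two separate arrays.  Each
   iteration reads a group of two adjacent elements and routes them to
   different destinations, i.e. one "lane" per field. */
void deinterleave(size_t n, const float *restrict in,
                  float *restrict re, float *restrict im)
{
    for (size_t i = 0; i < n; i++) {
        re[i] = in[2 * i];
        im[i] = in[2 * i + 1];
    }
}
```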

 

  • Vectorization of loops with control flow via masking
  • More VLS bits falling into place
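
A minimal sketch of a loop with control flow that can be vectorized via masking, as referred to above. The function name `double_positive` is hypothetical. The idea is that the comparison produces a per-element mask and the store is performed only for elements where the mask is set.

```c
#include <stddef.h>

/* Store 2*a[i] into b[i] only where a[i] is positive; other elements of
   b are left untouched.  A masking vectorizer turns the `if` into a
   vector compare plus a masked multiply/store. */
void double_positive(size_t n, const int *restrict a, int *restrict b)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i] > 0)
            b[i] = a[i] * 2;
    }
}
```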

 

  • Rounding mode intrinsics API and RVV floating point dynamic rounding support 
  • VLS for static vector length fallback path when VLA vectorization fails or when loop iterations are known
  • Averaging synthesis
  • General agreement on annotation of functions with vector ABI

  • In and out of order FP reductions
  • Refactoring done – shaves maybe 10% off the bootstrap times
  • Generic work on the vectorizer significantly helped a key loop in ImageMagick: 11% and 17% on Altra and Zen3 respectively
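
The difference between in-order and out-of-order FP reductions mentioned above can be sketched in plain C. Both function names are hypothetical, and the two-accumulator version only stands in for what a vectorizer may do when reassociation is allowed (for example under `-ffast-math`). RVV 1.0 has separate ordered and unordered sum reductions (`vfredosum.vs` and `vfredusum.vs`).

```c
#include <stddef.h>

/* In-order reduction: elements are added strictly left to right, which
   is what default floating-point semantics require. */
float sum_in_order(size_t n, const float *x)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Out-of-order reduction, modelled with two independent accumulators
   ("lanes") that are combined at the end.  The additions are
   reassociated, so the rounding can differ from the in-order sum. */
float sum_two_lanes(size_t n, const float *x)
{
    float lane0 = 0.0f, lane1 = 0.0f;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        lane0 += x[i];
        lane1 += x[i + 1];
    }
    if (i < n)          /* odd element count: fold in the last element */
        lane0 += x[i];
    return lane0 + lane1;
}
```

For an input such as `{1e8f, 1.0f, -1e8f, 1.0f}` the two functions return different values, because `1e8f + 1.0f` rounds back to `1e8f` in single precision. This is why the compiler needs permission to use the out-of-order form.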

 

  • Juzhe and Robin appointed as reviewers for the RISC-V port
    • Recognize their contributions to date
    • Speed up cycle time for patch review & integration
  • Reimplementation of one low level concept (not user visible)
    • Less confusion for developers
    • Easier to extend for certain cases
    • Hoping it will help scaling issues we've recently seen with builds (untested)
  • Seeing some movement on functions that should likely land in libmvec

 

  • Scatter/gather support, cond_len_* landing
  • Strided loads/stores temporarily deferred to take a different approach
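
The scatter/gather support mentioned above concerns loops whose memory accesses go through an index array. The sketch below uses hypothetical function names. RVV 1.0 offers indexed loads and stores (for example `vluxei32.v` and `vsuxei32.v`) for this access pattern.

```c
#include <stddef.h>

/* Gather: each output element is read from a data-dependent position. */
void gather(size_t n, const int *restrict idx, const float *restrict table,
            float *restrict out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = table[idx[i]];
}

/* Scatter: each input element is written to a data-dependent position.
   The caller is assumed to pass in-range indices. */
void scatter(size_t n, const int *restrict idx, const float *restrict in,
             float *restrict out)
{
    for (size_t i = 0; i < n; i++)
        out[idx[i]] = in[i];
}
```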

 

  • Narrowing and widening vector operations in place, int↔fp conversions
  • LTO issues are supposed to be fixed now
  • Generic improvements for VLA scatter/gather with masking
  • float16 tuple types
  • Coordination branch not updated yet due to US holidays (perhaps 7/6 or 7/7)
  • Expecting to have automated testing of the coordination branch in place this week
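
As a sketch of what the widening operations and int↔fp conversions listed above look like at the source level, consider the two hypothetical loops below. In RVV 1.0 terms they correspond to instructions such as `vwmul.vv` and `vfcvt.f.x.v`. Which instructions GCC actually picks is not something this example asserts.

```c
#include <stddef.h>
#include <stdint.h>

/* Widening multiply: 16-bit inputs, 32-bit product.  The cast makes the
   multiplication happen at the wider type, so no overflow occurs. */
void widen_mul(size_t n, const int16_t *restrict a,
               const int16_t *restrict b, int32_t *restrict out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (int32_t)a[i] * b[i];
}

/* Integer to floating-point conversion, element by element. */
void int_to_float(size_t n, const int32_t *restrict in, float *restrict out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (float)in[i];
}
```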

 

  • Integer and FP ternary (multiply accumulate) are approved and partially integrated
  • Optimization of widening ternary operations in progress/under review upstream
  • Reductions under development
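
The ternary (multiply accumulate) operations referred to above combine a multiply and an add on three operands. A minimal sketch with hypothetical function names follows. RVV 1.0 provides `vmacc.vv` for integers and `vfmacc.vv` for floating point. For the FP case, fusing depends on the compiler's floating-point contraction setting.

```c
#include <stddef.h>

/* Integer multiply-accumulate: acc[i] = acc[i] + a[i] * b[i]. */
void mac_int(size_t n, const int *restrict a, const int *restrict b,
             int *restrict acc)
{
    for (size_t i = 0; i < n; i++)
        acc[i] += a[i] * b[i];
}

/* Floating-point multiply-accumulate with the same shape.  A compiler
   may contract the multiply and add into a single fused operation. */
void mac_float(size_t n, const float *restrict a, const float *restrict b,
               float *restrict acc)
{
    for (size_t i = 0; i < n; i++)
        acc[i] += a[i] * b[i];
}
```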

 

  • Basic FP (unary/binary) supported on trunk and coordination branch
  • Ternary (fmac) in progress, but not yet integrated.

 – Note: dates on or before June 15 are only approximate

  • Basic integer, data movement, select, insert and extract supported on trunk and coordination branch

 

  • Project reported as priority for 2H23
  • Coordination branch created in the upstream GCC repository's vendor namespace: riscv/gcc-13-with-riscv-opts