About

Basic enablement of auto-vectorization in GCC for RISC-V, targeting version 1.0 of the V extension. The initial focus is to implement the RISC-V target-specific code that wires up the existing intrinsics to the basic vectorizer primitives, and to improve the generic vectorizer as needed to support basic vectorization for the RISC-V architecture.

This is the first stage in the overall autovectorization effort and should be sufficient to "light up" vector code generation in common cases. In general, if code vectorizes on other architectures, then it should vectorize on RISC-V as well. Cases where that generalization does not hold are expected to be primarily cost modeling issues, which are under investigation right now. Similarly, improving VLA (vector length agnostic) approaches to vectorizing, rather than falling back to VLS (vector length specific) style vectorization, is a major focus area for 1H2024 and is tracked on a separate page.
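As an illustration of a "common case", the loop below is the kind of code that vectorizes on most architectures. This is a minimal sketch: the function name `saxpy` is hypothetical, and whether a given GCC build actually vectorizes it depends on the compiler version, the options used, and the cost model. A plausible way to check, assuming a RISC-V GCC 14 toolchain, is to compile with `-O3 -march=rv64gcv -fopt-info-vec` and look for a "loop vectorized" note.

```c
#include <stddef.h>

/* y[i] += a * x[i] for every element.
   `restrict` promises that x and y do not overlap, which removes the
   need for a runtime alias check before the vector loop. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

With a VLA approach the same binary would process however many elements fit in the hardware's vector registers per iteration; with VLS the compiler assumes a fixed vector size.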

Stakeholders/Partners

RISE:

Ventana: Robin Dapp (full time) + Jeff Law for oversight/review

Rivos: Palmer Dabbelt for oversight/review

Joern Rennecke (Embecosm contractor for Rivos): Some initial vector math library implementations, compiler expansion of memcpy

SiFive: Kito Cheng for oversight/review

External:

RiVAI: Much of the initial work was done by Juzhe. This has included the basic design/implementation, ABI work, etc. Juzhe continues to play a major role in design/implementation going forward.

SUSE/IBM/ARM/Linaro: Some of the work in this space has touched on generic parts of GCC. Various engineers from SUSE, IBM, ARM & Linaro, including Richard Sandiford, Richard Biener and others, have been involved on an as-needed basis.

Dependencies

Status

Development	COMPLETE
Development Timeline	2H2023	Improvements tracked on 1H2024 project page
Upstreaming	COMPLETE
Upstream Version	gcc-14 (Spring 2024); gcc-13 RISC-V coordination branch	Available to all to use, but not an official release from the GCC project
Contacts	Robin Dapp (Ventana), Kito Cheng (SiFive), Palmer Dabbelt (Rivos), Jeff Law (Ventana)
Dependencies	None

Updates

31 Oct 2023

RiVAI's team has submitted the final gcc-14 feature (strided loads/stores); it will likely go in this week
Still seeing too many testsuite failures, though they may be more hf16/bf16 related. Embecosm & Rivos engineers are syncing on the exact configurations being tested

11 Oct 2023

Remainder of tsvc testsuite has been enabled, various minor bugs have been fixed. Most of the remaining issues are testsuite problems (properly marking tests as needing particular features so that their pass/fail optimization state is determined correctly)
Seeing ongoing convergence of test results with V enabled by default. Not there yet, but improving.

27 Sept 2023

Embecosm reports many failures with spec2017 + vector
More generally, focus is on bringing down failures across the board and ensuring everyone's results are roughly in line
Push harder on upstream patch policy in general, i.e., every patch must be tested with no regressions across a standard set of options

13 Sept 2023

Generic vector testsuite is enabled in upstream GCC for vector integer operations.
Expecting to turn on vector single/double precision testing shortly
- Drive down those failures first, then circle back to potential issues in SPEC2017
Seeing some convergence on testing results across organizations, but not complete convergence yet.

06 Sept 2023

Hoping to enable tsvc this week or next (vector testsuite). Only a few real failures
Need to figure out why test results are different across different organizations doing testing of V support
Overall sense is to drive down failures rather than focus on further improvements right now

30 Aug 2023

Functionally complete and upstreamed
Improvements being tracked as a distinct project

23 Aug 2023

Additional conditional vector operations via masking landing
Optimized rounding mode switching progressing well for vector code which wants control over rounding modes
Generic scheduling model submitted, not yet approved

16 Aug 2023

Support for "load/store lanes" with length and mask support integrated
More rounding mode intrinsics API support landing
Vectorized cpymem approved, will integrate once some testsuite infrastructure issues are resolved
Remaining chunks of work:
- VEC_EXTRACT/EXTRACT_LAST, FOLD_EXTRACT_LAST
- fmac with length control
- Strided memory access
- Scheduler models
- libmvec
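For readers unfamiliar with the "strided memory access" item, the sketch below shows the kind of source loop it concerns: consecutive iterations touch elements a fixed, runtime-known distance apart. The function name `extract_column` and its parameters are hypothetical, and which instructions GCC emits for it is not stated here.

```c
#include <stddef.h>

/* Copy one column out of a row-major matrix.
   Element i of the column lives `stride` doubles after element i - 1,
   so the loads are strided while the stores are contiguous. */
void extract_column(size_t rows, size_t stride,
                    const double *restrict m, double *restrict col)
{
    for (size_t i = 0; i < rows; i++)
        col[i] = m[i * stride];
}
```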

09 Aug 2023

Vectorization of loops with control flow via masking
More VLS bits falling into place
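A loop with control flow, in the sense used above, looks roughly like the sketch below. The function name `cond_add` is hypothetical. The idea behind masking is that the vector code evaluates the condition for a whole vector of elements at once and uses the result as a mask, so the addition and store only take effect in the lanes where the condition holds.

```c
#include <stddef.h>

/* Write a[i] + b[i] to out[i] only where cond[i] is positive.
   Elements of out whose condition is false must stay untouched,
   which is what a masked store guarantees in vector code. */
void cond_add(size_t n, const int *restrict a, const int *restrict b,
              const int *restrict cond, int *restrict out)
{
    for (size_t i = 0; i < n; i++) {
        if (cond[i] > 0)
            out[i] = a[i] + b[i];
    }
}
```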

02 Aug 2023

Rounding mode intrinsics API and RVV floating point dynamic rounding support
VLS for static vector length fallback path when VLA vectorization fails or when loop iterations are known
Averaging synthesis
General agreement on annotation of functions with vector ABI
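The "averaging synthesis" item presumably refers to recognizing averaging idioms in source code; that reading is an assumption, as the entry gives no detail. A minimal sketch of such an idiom, with the hypothetical name `avg_round`, is a rounding average of unsigned bytes. Written in C, the sum is computed in `int` so it cannot overflow; a vectorizer that recognizes the pattern can avoid widening every element.

```c
#include <stddef.h>
#include <stdint.h>

/* Rounding average of two byte arrays: (a + b + 1) / 2.
   The operands are promoted to int, so a[i] + b[i] + 1 never overflows. */
void avg_round(size_t n, const uint8_t *restrict a,
               const uint8_t *restrict b, uint8_t *restrict out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
}
```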

26 Jul 2023

In and out of order FP reductions
Refactoring done – shaves maybe 10% off the bootstrap times
Generic work on the vectorizer significantly helped a key loop in ImageMagick: 11% on Altra and 17% on Zen3
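The difference between in-order and out-of-order FP reductions matters because floating-point addition is not associative. The sketch below uses hypothetical function names: `sum_in_order` adds strictly left to right, while `sum_four_lanes` models by hand what an unordered reduction over four lanes might compute. By default the compiler must preserve the first behaviour; options such as `-ffast-math` or `-fassociative-math` permit reassociation.

```c
#include <stddef.h>

/* In-order reduction: additions happen strictly left to right. */
float sum_in_order(size_t n, const float *x)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Reassociated reduction: four independent partial sums that are
   combined at the end, followed by a scalar tail for leftovers. */
float sum_four_lanes(size_t n, const float *x)
{
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        for (size_t k = 0; k < 4; k++)
            lane[k] += x[i + k];
    float s = (lane[0] + lane[1]) + (lane[2] + lane[3]);
    for (; i < n; i++)
        s += x[i];
    return s;
}
```

For inputs such as `{1e8f, 1.0f, -1e8f, 1.0f}` the two functions can return different results, because the small terms are absorbed at different points in the summation.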

19 Jul 2023

Juzhe and Robin appointed as reviewers for RISC-V port
- Recognize their contributions to date
- Speed up cycle time for patch review & integration
Reimplementation of one low level concept (not user visible)
- Less confusion for developers
- Easier to extend for certain cases
- Hoping it will help scaling issues we've recently seen with builds (untested)
Seeing some movement on functions that should likely land in libmvec

13 Jul 2023

Scatter/gather support, cond_len_* landing
Strided loads/stores temporarily deferred to take a different approach
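Scatter/gather refers to loops whose memory accesses go through an index array. A minimal sketch, with hypothetical function names, of the two source patterns:

```c
#include <stddef.h>

/* Gather: read from arbitrary positions, write contiguously. */
void gather(size_t n, const int *restrict idx,
            const float *restrict src, float *restrict dst)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[idx[i]];
}

/* Scatter: read contiguously, write to arbitrary positions.
   The caller is assumed to pass indices that are in range. */
void scatter(size_t n, const int *restrict idx,
             const float *restrict src, float *restrict dst)
{
    for (size_t i = 0; i < n; i++)
        dst[idx[i]] = src[i];
}
```

A strided access is the special case where the indices form an arithmetic sequence, which is why it can be handled by a separate, cheaper mechanism.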

05 Jul 2023

Narrowing and widening vector operations in place, int↔fp conversions
LTO issues are supposed to be fixed now
Generic improvements for VLA scatter/gather with masking
float16 tuple types
Coordination branch not updated yet due to US holidays (perhaps 7/6 or 7/7)
Expecting to have automated testing of the coordination branch in place this week
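As a sketch of what widening operations and int↔fp conversions look like at the source level, the loops below (hypothetical names) produce results in a wider or different type than their inputs. In vector code this means the input and output vectors hold different numbers of elements per register, which is what makes these cases need dedicated support.

```c
#include <stddef.h>
#include <stdint.h>

/* Widening multiply: 16-bit inputs, full 32-bit products. */
void widen_mul(size_t n, const int16_t *restrict a,
               const int16_t *restrict b, int32_t *restrict out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (int32_t)a[i] * (int32_t)b[i];
}

/* Integer to floating-point conversion, element by element. */
void int_to_float(size_t n, const int32_t *restrict in, float *restrict out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (float)in[i];
}
```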

28 Jun 2023

Integer and FP ternary (multiply accumulate) are approved and partially integrated
Optimization of widening ternary operations in progress/under review upstream
Reductions under development

21 Jun 2023

Basic FP (unary/binary) supported on trunk and coordination branch
Ternary (fmac) in progress, but not yet integrated.

15 Jun 2023 – Note dates on or before June 15 are only approximate

Basic integer, data movement, select, insert and extract supported on trunk and coordination branch

01 Jun 2023

Project reported as priority for 2H23
Coordination branch created in the upstream GCC repository vendor namespace: riscv/gcc-13-with-riscv-opts
