OpenBLAS

Original author(s)	Kazushige Goto
Developer(s)	Zhang Xianyi, Wang Qian, Werner Saar
Initial release	22 March 2011
Stable release	0.3.27 / 4 April 2024
Repository	github.com/OpenMathLib/OpenBLAS
Written in	C, modern Fortran
Operating system	Linux; Microsoft Windows; macOS; FreeBSD
Platform	x86, x86-64; MIPS; ARM, AArch64; POWER, PPC64; IBM Z; SPARC; RISC-V
Type	Linear algebra library; implementation of BLAS
License	BSD License
Website	www.openblas.net

OpenBLAS is an open-source implementation of the BLAS (Basic Linear Algebra Subprograms) and LAPACK APIs with many hand-crafted optimizations for specific processor types. It is developed at the Lab of Parallel Software and Computational Science, ISCAS.
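To make concrete what a BLAS routine computes, the following is a minimal pure-Python sketch of the general matrix multiply (GEMM) operation, which updates a matrix as C ← αAB + βC. It only illustrates the arithmetic that the BLAS interface specifies; it is not OpenBLAS code and does not reflect its optimized kernels. The function name `gemm` and the list-of-rows storage are assumptions made for this example.

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for matrices stored as lists of rows.

    Reference semantics only: a hypothetical helper, not an OpenBLAS API.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("c must have shape m x n")
    result = []
    for i in range(m):
        row = []
        for j in range(n):
            # Dot product of row i of a with column j of b.
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            row.append(alpha * acc + beta * c[i][j])
        result.append(row)
    return result


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]
updated = gemm(2.0, a, b, 0.5, c)
print(updated)
```

An optimized library computes the same result but reorganizes the loops around the processor's caches and vector instructions, which is where the processor-specific tuning described above comes in.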

OpenBLAS adds optimized implementations of linear algebra kernels for several processor architectures, including Intel Sandy Bridge^[3] and Loongson.^[4] The project claims performance comparable to Intel MKL; this mostly holds for the BLAS routines, while the LAPACK routines fall behind.^[citation needed] On machines that support the AVX2 instruction set, OpenBLAS can achieve performance similar to MKL, but on CPUs with the AVX-512 instruction set almost no open-source library currently matches MKL.

OpenBLAS is a fork of GotoBLAS2, which was created by Kazushige Goto at the Texas Advanced Computing Center.

History and present

OpenBLAS was developed by the parallel software group led by Professor Yunquan Zhang from the Chinese Academy of Sciences.

OpenBLAS initially targeted only the Loongson CPU platform, with much of the work contributed by Dr. Xianyi Zhang. After GotoBLAS was abandoned, its successor OpenBLAS has been developed as an open-source BLAS library for multiple platforms, including x86, ARMv8, MIPS, and RISC-V, and is valued for its portability.

The parallel software group is modernizing OpenBLAS to meet current computing needs. For example, OpenBLAS's level-3 computations were originally optimized primarily for large, square matrices (often regarded as regular-shaped matrices). Irregular-shaped matrix multiplication is now also supported, such as tall-and-skinny matrix multiplication (TSMM),^[5] one of the core computations in deep learning, which speeds up deep learning workloads on the CPU. Support for compact functions and small GEMM is also planned.
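As a rough illustration of what a tall-and-skinny product looks like, the sketch below multiplies a matrix with many rows and few columns by a small matrix, processing the tall operand one row panel at a time. This is only a shape illustration under assumed names (`matmul`, `tall_skinny_matmul`, `panel_rows`); it is not the AutoTSMM method or OpenBLAS's actual implementation.

```python
def matmul(a, b):
    """Plain reference product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def tall_skinny_matmul(a, b, panel_rows=4):
    """Multiply a tall-and-skinny a (m x k, m much larger than k) by a small b.

    The tall operand is split into row panels (hypothetical panel size);
    every panel is multiplied by the same small matrix b.
    """
    if panel_rows < 1:
        raise ValueError("panel_rows must be positive")
    out = []
    for start in range(0, len(a), panel_rows):
        panel = a[start:start + panel_rows]
        out.extend(matmul(panel, b))
    return out


# A 10 x 2 "tall and skinny" matrix times a 2 x 3 matrix.
m, k, n = 10, 2, 3
a = [[float(i + j) for j in range(k)] for i in range(m)]
b = [[float(i * n + j) for j in range(n)] for i in range(k)]
product = tall_skinny_matmul(a, b, panel_rows=4)
```

Because each row panel is independent of the others and reuses the same small matrix, the panels could in principle be handled by different threads, which is one reason such shapes are usually treated separately from large square products.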

References

[1] "OpenBLAS 0.3.27 version". 4 April 2024. Retrieved 4 April 2024.
[2] "OpenBLAS". 25 October 2021.
[3] Wang Qian; Zhang Xianyi; Zhang Yunquan; Qing Yi (2013). AUGEM: Automatically Generate High Performance Dense Linear Algebra Kernels on x86 CPUs (PDF). Int'l Conf. on High Performance Computing, Networking, Storage and Analysis.
[4] Zhang Xianyi; Wang Qian; Zhang Yunquan (2012). Model-driven Level 3 BLAS Performance Optimization on Loongson 3A Processor. IEEE 18th Int'l Conf. on Parallel and Distributed Systems (ICPADS).
[5] Chendi Li; Haipeng Jia; Hang Cao; Jianyu Yao; Boqian Shi; Chunyang Xiang; Jinbo Sun; Pengqi Lu; Yunquan Zhang (2021). AutoTSMM: An Auto-tuning Framework for Building High-Performance Tall-and-Skinny Matrix-Matrix Multiplication on CPUs (PDF). IEEE International Symposium on Parallel and Distributed Processing with Applications.

External links

Official website (www.openblas.net)
