// Copyright 2016 - 2018 Ulrik Sverdrup "bluss"
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.

//!
//! General matrix multiplication for f32, f64 matrices. Operates on matrices
//! with general layout (they can use arbitrary row and column stride).
//!
//! This crate uses the same macro/microkernel approach to matrix multiplication as
//! the [BLIS][bl] project.
//!
//! We presently provide a few good microkernels, portable and for x86-64, and
//! only one operation: the general matrix-matrix multiplication (“gemm”).
//!
//! [bl]: https://github.com/flame/blis
//!
//! ## Matrix Representation
//!
//! **matrixmultiply** supports matrices with general stride, so a matrix
//! is passed using a pointer and four integers:
//!
//! - `a: *const f32`, pointer to the first element in the matrix
//! - `m: usize`, number of rows
//! - `k: usize`, number of columns
//! - `rsa: isize`, row stride
//! - `csa: isize`, column stride
//!
//! In this example, A is an m by k matrix. `a` is a pointer to the element at
//! index *0, 0*.
//!
//! The *row stride* is the pointer offset (in number of elements) to the
//! element on the next row. It’s the distance from element *i, j* to *i + 1,
//! j*.
//!
//! The *column stride* is the pointer offset (in number of elements) to the
//! element in the next column. It’s the distance from element *i, j* to *i,
//! j + 1*.
//!
//! For example, for a contiguous matrix, row major strides are *rsa=k,
//! csa=1* and column major strides are *rsa=1, csa=m*.
//!
//! Strides can be negative or even zero, but for a mutable matrix elements
//! may not alias each other.
//!
//! ## Portability and Performance
//!
//! - The default kernels are written in portable Rust and available
//!   on all targets. These may depend on autovectorization to perform well.
//!
//! - *x86* and *x86-64* features can be detected at runtime (the default) or
//!   at compile time (if enabled), and the following kernel variants are
//!   implemented:
//!
//!   - `fma`
//!   - `avx`
//!   - `sse2`
//!
//! ## Features
//!
//! This crate can be used without the standard library (`#![no_std]`) by
//! disabling the default `std` feature. To do so, use this in your
//! `Cargo.toml`:
//!
//! ```toml
//! matrixmultiply = { version = "0.2", default-features = false }
//! ```
//!
//! Runtime CPU feature detection is available only when `std` is enabled.
//! Without the `std` feature, the crate uses special CPU features only if they
//! are enabled at compile time. (To enable CPU features at compile time, pass
//! the relevant
//! [`target-cpu`](https://doc.rust-lang.org/rustc/codegen-options/index.html#target-cpu)
//! or
//! [`target-feature`](https://doc.rust-lang.org/rustc/codegen-options/index.html#target-feature)
//! option to `rustc`.)
//!
//! ## Other Notes
//!
//! The functions in this crate are thread safe, as long as the destination
//! matrix is distinct.

#![doc(html_root_url = "https://docs.rs/matrixmultiply/0.2/")]
#![cfg_attr(not(feature = "std"), no_std)]

#[cfg(not(feature = "std"))]
extern crate alloc;
#[cfg(feature = "std")]
extern crate core;

extern crate rawpointer;

#[macro_use]
mod debugmacros;
#[macro_use]
mod loopmacros;
mod archparam;
mod gemm;
mod kernel;
mod aligned_alloc;
mod util;

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[macro_use]
mod x86;
mod dgemm_kernel;
mod sgemm_kernel;

pub use gemm::dgemm;
pub use gemm::sgemm;