Parallel factorization algorithm for symmetric finite element matrices with kernel detection is developed.
The code only assumes the matrix has a factorization with a symmetric partial pivoting and can determine the dimension of the kernel of the matrix without setting a threshold to describe numerical zero value.
During factorization a given parameter selects suspicious null pivots and Schur complement matrices are constructed recursively following a decomposition of the connectivity graph by a nested bisection.
For indefinite matrices, combination of 1x1 and 2x2 pivoting will be used to determine the kernel dimension of the last block of Schur complement associated with suspicious null pivots.
The kernel detection algorithm is based on measurement of residuals with orthogonal projections onto supposed image spaces.
A static data structure from the nested bisection and a block sub-structure for Schur complements at all bisection levels can use level 3 BLAS routines, DTRSM and DGEMM, efficiently.
The code is targeted on shared memory architecture and is implemented by using POSIX threads to realize asynchronous task execution.
A critical path of dependency tree of tasks is analyzed heuristically and mixture of static and dynamic task scheduled is used to reduce both process of mutual exclusion lock and idle time.
The solver has good efficiency, about 75% of peak performance of 16 cores for finite element matrices whose DOF is about 1M.
The code is written by C++ with template to handle both double and quadruple arithmetic
and is extended to deal with unsymmetric matrices.
The source will be available alongside FreeFem++ soon.