Peter Bruin on Thu, 09 Mar 2017 11:01:26 +0100 |
[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]
Linear algebra via CUP decomposition and reduction to matrix multiplication |
Hello, Here is a set of patches that speeds up many linear algebra computations over fields of small characteristic (Flm, FlxqM) by using the CUP matrix decomposition [1]. This decomposes any m × n matrix of rank r over a field as C*U*P with C in column echelon form of size m × r, U upper triangular of size r × n and P a permutation matrix of size n × n. Computing such a decomposition, and applying it to solving linear systems and other linear algebra operations, can be done using a divide-and-conquer approach that finally reduces everything to matrix multiplication. If ω is the exponent of matrix multiplication, then computing a CUP decomposition and doing other linear algebra operations can be done in O(mnr^{ω-2}) field operations. So far I have only implemented the Flm and FlxqM variants; adding other finite fields should be straightforward. In principle it should be possible to use the bb_field framework. However, this is not the most natural approach in my opinion, since the building blocks in this new code are not field operations, but matrix operations (matrix addition, matrix multiplication, scalar-matrix multiplication and solving triangular systems). Since matrix multiplication over Flxq fields is already reduced to matrix multiplication over Z via Kronecker substition, and since we have Strassen multiplication over Z, linear algebra over Flxq fields now becomes asymptotically faster than O(n^3), namely O(n^{log_2(7)}). Strassen multiplication over Fl fields is not implemented yet but would be easy to add, although the matrix sizes for which it will have a substantial effect will probably be larger than over Flxq fields. The code also performs well in practice: almost all linear algebra routines become several times faster for large matrices, and in the benchmarks I used (see timings.gp, timings-old.txt, timings-new.txt), the total time went down by a factor around 4.76. Matrix operations over Z that are implemented using a multimodular algorithm (determinant, kernel, pivots) also profit from the speedup of the corresponding Fl algorithms. The examples that I tested (determinants and kernels of 120 x 120 matrices, computing a space of modular symbols and its new subspace) are now about 30% faster. The first two patches add the following functions that are used by the new code: perm_sign (sign of a permutation), FlxC_neg, FlxC_sub, FlxC_to_ZXC, FlxqC_Flxq_mul, FlxqM_Flxq_mul, zero_FlxC, zero_FlxM. Most of the new code is contained in the third patch. It is quite big (1234 insertions/deletions in alglin1.c), but I have tried to make it fit as cleanly as possible into the existing code. Some linear algebra functions only need an echelon form, not a full CUP decomposition. The echelon form could simply be computed via the CUP decomposition, which would not be much slower but uses much more memory, so I wrote separate functions for computing echelon forms. In the fourth patch, I added a number of tests to make sure the new code is covered by the testsuite as much as possible (up to some gerepile code and a few lines that are hard to reach from GP). Besides that, throughout the writing of this code I have already used it extensively in my computations with curves over finite fields. There is one small user-visible change: the various gauss_pivot and indexrank functions used to return the list of pivot columns in the row echelon form, and for each pivot column the first possible pivot row. The new algorithm instead returns the list of pivot rows in the column echelon from, and for each pivot row some choice of pivot column. This explains the changed output in the "nf" and "rnfkummer" tests in the last patch. I checked that the new output is mathematically equivalent to the old output. Thanks, Peter [1] C.-P. Jeannerod, C. Pernet and A. Storjohann, Rank-profile revealing Gaussian elimination and the CUP matrix decomposition. J. Symbolic Comput. 56 (2013), 46–68.
>From 5e46827e76565c843406ca548bc68e3185299585 Mon Sep 17 00:00:00 2001 From: Peter Bruin <P.J.Bruin@math.leidenuniv.nl> Date: Thu, 16 Feb 2017 07:45:17 +0100 Subject: [PATCH 1/5] add function perm_sign --- doc/usersch5.tex | 2 ++ src/basemath/perm.c | 12 ++++++++++++ src/headers/paridecl.h | 1 + 3 files changed, 15 insertions(+) diff --git a/doc/usersch5.tex b/doc/usersch5.tex index 029e1f3..8dfd02f 100644 --- a/doc/usersch5.tex +++ b/doc/usersch5.tex @@ -11629,6 +11629,8 @@ a product of disjoint cycles (\kbd{cyc}); return $p^n$ (as a \kbd{cyc}). \fun{long}{perm_order}{GEN p} returns the order of the permutation $p$ (as the lcm of its cycle lengths). +\fun{long}{perm_sign}{GEN p} returns the sign of the permutation $p$. + \fun{GEN}{vecperm_orbits}{GEN p, long n} the permutation $p\in S_n$ being given as a product of disjoint cycles, return the orbits of the subgroup generated by $p$ on $\{1,2,\ldots,n\}$. diff --git a/src/basemath/perm.c b/src/basemath/perm.c index 7931397..291e1ca 100644 --- a/src/basemath/perm.c +++ b/src/basemath/perm.c @@ -316,6 +316,18 @@ perm_order(GEN v) avma = ltop; return d; } +/* sign of a permutation */ +long +perm_sign(GEN v) +{ + pari_sp av = avma; + GEN c = vecperm_orbits_i(mkvec(v), lg(v)-1); + long i, l = lg(c), s = 1; + for (i = 1; i < l; i++) + if (odd(lg(gel(c, i)))) s = -s; + avma = av; return s; +} + static long isperm(GEN v) { diff --git a/src/headers/paridecl.h b/src/headers/paridecl.h index e97d275..d048f22 100644 --- a/src/headers/paridecl.h +++ b/src/headers/paridecl.h @@ -3916,6 +3916,7 @@ int perm_commute(GEN p, GEN q); GEN perm_cycles(GEN v); long perm_order(GEN perm); GEN perm_pow(GEN perm, long exp); +long perm_sign(GEN perm); long permorder(GEN perm); GEN quotient_group(GEN C, GEN G); GEN quotient_perm(GEN C, GEN p); -- 2.7.3
>From af6892eb0708a6c1c1b216562e1dd3d44c9dee76 Mon Sep 17 00:00:00 2001 From: Peter Bruin <P.J.Bruin@math.leidenuniv.nl> Date: Tue, 10 Jan 2017 16:09:10 +0100 Subject: [PATCH 2/5] add various functions for FlxV, FlxM, FlxqV and FlxqM --- doc/usersch5.tex | 19 +++++++++++- src/basemath/Flx.c | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++ src/headers/paridecl.h | 10 +++++- 3 files changed, 109 insertions(+), 2 deletions(-) diff --git a/doc/usersch5.tex b/doc/usersch5.tex index 8dfd02f..4d24ea9 100644 --- a/doc/usersch5.tex +++ b/doc/usersch5.tex @@ -4256,7 +4256,8 @@ and \kbd{y} \kbd{deplin=0}) or \kbd{F2m\_deplin} (if \kbd{deplin=1}), in place (destroys~\kbd{x}). -\subsec{\kbd{FlxqV}, \kbd{FlxqM}} See \kbd{FqV}, \kbd{FqM} operations. +\subsec{\kbd{FlxqV}, \kbd{FlxqC}, \kbd{FlxqM}} +See \kbd{FqV}, \kbd{FqC}, \kbd{FqM} operations. \fun{GEN}{FlxqV_dotproduct}{GEN x, GEN y, GEN T, ulong p} as \kbd{FpV\_dotproduct}. @@ -4264,6 +4265,10 @@ and \kbd{y} \fun{GEN}{FlxM_Flx_add_shallow}{GEN x, GEN y, ulong p} as \kbd{RgM\_Rg\_add\_shallow}. +\fun{GEN}{FlxqC_Flxq_mul}{GEN x, GEN y, GEN T, ulong p} + +\fun{GEN}{FlxqM_Flxq_mul}{GEN x, GEN y, GEN T, ulong p} + \fun{GEN}{FlxqM_FlxqC_gauss}{GEN a, GEN b, GEN T, ulong p} \fun{GEN}{FlxqM_FlxqC_invimage}{GEN a, GEN b, GEN T, ulong p} @@ -5506,11 +5511,23 @@ returns their product. \fun{ulong}{FlxC_eval_powers_pre}{GEN x, GEN y, ulong p, ulong pi} apply \kbd{Flx\_eval\_powers\_pre} to all elements of \kbd{x}. +\fun{GEN}{FlxC_neg}{GEN x, ulong p} + +\fun{GEN}{FlxC_sub}{GEN x, GEN y, ulong p} + +\fun{GEN}{zero_FlxC}{long n, long sv} + \subsec{\kbd{FlxM}} See \kbd{FpXM} operations. \fun{ulong}{FlxM_eval_powers_pre}{GEN M, GEN y, ulong p, ulong pi} this function applies \kbd{FlxC\_eval\_powers\_pre} to all entries of \kbd{M}. +\fun{GEN}{FlxM_neg}{GEN x, ulong p} + +\fun{GEN}{FlxM_sub}{GEN x, GEN y, ulong p} + +\fun{GEN}{zero_FlxM}{long r, long c, long sv} + \subsec{\kbd{FlxT}} See \kbd{FpXT} operations. \fun{GEN}{FlxT_red}{GEN V, ulong p} reduces each leaf with \kbd{Flx\_red}. diff --git a/src/basemath/Flx.c b/src/basemath/Flx.c index a9f2c1b..ec9eb7b 100644 --- a/src/basemath/Flx.c +++ b/src/basemath/Flx.c @@ -3567,6 +3567,68 @@ FlxM_eval_powers_pre(GEN z, GEN x, ulong p, ulong pi) return y; } +GEN +zero_FlxC(long n, long sv) +{ + long i; + GEN x = cgetg(n + 1, t_COL); + GEN z = zero_Flx(sv); + for (i = 1; i <= n; i++) + gel(x, i) = z; + return x; +} + +GEN +FlxC_neg(GEN x, ulong p) +{ + long i, l = lg(x); + GEN z = cgetg(l, t_COL); + for (i = 1; i < l; i++) + gel(z, i) = Flx_neg(gel(x, i), p); + return z; +} + +GEN +FlxC_sub(GEN x, GEN y, ulong p) +{ + long i, l = lg(x); + GEN z = cgetg(l, t_COL); + for (i = 1; i < l; i++) + gel(z, i) = Flx_sub(gel(x, i), gel(y, i), p); + return z; +} + +GEN +zero_FlxM(long r, long c, long sv) +{ + long j; + GEN x = cgetg(c + 1, t_MAT); + GEN z = zero_FlxC(r, sv); + for (j = 1; j <= c; j++) + gel(x, j) = z; + return x; +} + +GEN +FlxM_neg(GEN x, ulong p) +{ + long j, l = lg(x); + GEN z = cgetg(l, t_MAT); + for (j = 1; j < l; j++) + gel(z, j) = FlxC_neg(gel(x, j), p); + return z; +} + +GEN +FlxM_sub(GEN x, GEN y, ulong p) +{ + long j, l = lg(x); + GEN z = cgetg(l, t_MAT); + for (j = 1; j < l; j++) + gel(z, j) = FlxC_sub(gel(x, j), gel(y, j), p); + return z; +} + /***********************************************************************/ /** **/ /** FlxX **/ @@ -4831,6 +4893,26 @@ FlxqV_roots_to_pol(GEN V, GEN T, ulong p, long v) /*** FlxqM ***/ +GEN +FlxqC_Flxq_mul(GEN x, GEN y, GEN T, ulong p) +{ + long i, l = lg(x); + GEN z = cgetg(l, t_COL); + for (i = 1; i < l; i++) + gel(z, i) = Flxq_mul(gel(x, i), y, T, p); + return z; +} + +GEN +FlxqM_Flxq_mul(GEN x, GEN y, GEN T, ulong p) +{ + long j, l = lg(x); + GEN z = cgetg(l, t_MAT); + for (j = 1; j < l; j++) + gel(z, j) = FlxqC_Flxq_mul(gel(x, j), y, T, p); + return z; +} + static GEN kron_pack_Flx_spec_half(GEN x, long l) { if (l == 0) diff --git a/src/headers/paridecl.h b/src/headers/paridecl.h index d048f22..d53b8e0 100644 --- a/src/headers/paridecl.h +++ b/src/headers/paridecl.h @@ -320,10 +320,14 @@ GEN Flx_to_ZX_inplace(GEN z); GEN Flx_triple(GEN y, ulong p); long Flx_val(GEN x); long Flx_valrem(GEN x, GEN *Z); -GEN FlxC_to_ZXC(GEN x); GEN FlxC_eval_powers_pre(GEN z, GEN x, ulong p, ulong pi); +GEN FlxC_neg(GEN x, ulong p); +GEN FlxC_sub(GEN x, GEN y, ulong p); +GEN FlxC_to_ZXC(GEN x); GEN FlxM_Flx_add_shallow(GEN x, GEN y, ulong p); GEN FlxM_eval_powers_pre(GEN z, GEN x, ulong p, ulong pi); +GEN FlxM_neg(GEN x, ulong p); +GEN FlxM_sub(GEN x, GEN y, ulong p); GEN FlxM_to_ZXM(GEN z); GEN FlxT_red(GEN z, ulong p); GEN FlxV_Flc_mul(GEN V, GEN W, ulong p); @@ -385,6 +389,8 @@ GEN Flxq_sqr(GEN y,GEN T,ulong p); GEN Flxq_sqrt(GEN a, GEN T, ulong p); GEN Flxq_sqrtn(GEN a, GEN n, GEN T, ulong p, GEN *zetan); ulong Flxq_trace(GEN x, GEN T, ulong p); +GEN FlxqC_Flxq_mul(GEN x, GEN y, GEN T, ulong p); +GEN FlxqM_Flxq_mul(GEN x, GEN y, GEN T, ulong p); GEN FlxqV_dotproduct(GEN x, GEN y, GEN T, ulong p); GEN FlxqV_roots_to_pol(GEN V, GEN T, ulong p, long v); GEN FlxqX_FlxqXQ_eval(GEN Q, GEN x, GEN S, GEN T, ulong p); @@ -443,6 +449,8 @@ GEN pol1_FlxX(long v, long sv); GEN polx_FlxX(long v, long sv); GEN random_Flx(long d1, long v, ulong p); GEN random_FlxqX(long d1, long v, GEN T, ulong p); +GEN zero_FlxC(long n, long sv); +GEN zero_FlxM(long r, long c, long sv); GEN zx_to_Flx(GEN x, ulong p); GEN zxX_to_FlxX(GEN B, ulong p); GEN zxX_to_Kronecker(GEN P, GEN Q); -- 2.7.3
>From 7eac9551987a0bf59c25bef1de14dcafd480f42c Mon Sep 17 00:00:00 2001 From: Peter Bruin <P.J.Bruin@math.leidenuniv.nl> Date: Thu, 16 Feb 2017 07:11:34 +0100 Subject: [PATCH 3/5] implement Flm_echelon, Flm_CUP, FlxqM_echelon, FlxqM_CUP use echelon form/CUP factorisation for deplin, det, gauss, image, indexrank, inv, invimage, ker, rank, suppl --- src/basemath/alglin1.c | 1234 ++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 1191 insertions(+), 43 deletions(-) diff --git a/src/basemath/alglin1.c b/src/basemath/alglin1.c index 83ecc62..a9ee388 100644 --- a/src/basemath/alglin1.c +++ b/src/basemath/alglin1.c @@ -527,6 +527,779 @@ image_from_pivot(GEN x, GEN d, long r) /*******************************************************************/ /* */ +/* Echelon form and CUP decomposition */ +/* */ +/*******************************************************************/ + +/* + Decompose an m x n-matrix A of rank r as C*U*P, with + - C: m x r-matrix in column echelon form (not necessarily reduced) + with all pivots equal to 1 + - U: upper-triangular r x n-matrix + - P: permutation matrix + The pivots of C and the known zeroes in C and U are not necessarily + filled in; instead, we also return the vector R of pivot rows. + Instead of the matrix P, we return the permutation p of [1..n] + (t_VECSMALL) such that P[i,j] = 1 if and only if j = p[i]. +*/ + +/* complement of a strictly increasing subsequence of (1, 2, ..., n) */ +static GEN +indexcompl(GEN v, long n) { + long i, j, k, m = lg(v) - 1; + GEN w = cgetg(n - m + 1, t_VECSMALL); + for (i = j = k = 1; i <= n; i++) { + if (j <= m && v[j] == i) + j++; + else + w[k++] = i; + } + return w; +} + +static GEN +Flm_rsolve_upper_1(GEN U, GEN B, ulong p) { + return Flm_Fl_mul(B, Fl_inv(ucoeff(U, 1, 1), p), p); +} + +static GEN +Flm_rsolve_upper_2(GEN U, GEN B, ulong p) { + ulong a = ucoeff(U, 1, 1), b = ucoeff(U, 1, 2), d = ucoeff(U, 2, 2); + ulong D = Fl_mul(a, d, p), Dinv = Fl_inv(D, p); + ulong ainv = Fl_mul(d, Dinv, p), dinv = Fl_mul(a, Dinv, p); + GEN B1 = rowslice(B, 1, 1); + GEN B2 = rowslice(B, 2, 2); + GEN X2 = Flm_Fl_mul(B2, dinv, p); + GEN X1 = Flm_Fl_mul(Flm_sub(B1, Flm_Fl_mul(X2, b, p), p), + ainv, p); + return vconcat(X1, X2); +} + +/* solve U*X = B, U upper triangular and invertible */ +static GEN +Flm_rsolve_upper(GEN U, GEN B, ulong p) { + long n = lg(U) - 1, n1; + GEN U1, U2, U11, U12, U22; + GEN B1, B2, X1, X2, X; + pari_sp av = avma; + + if (n == 0) + return B; + if (n == 1) + return Flm_rsolve_upper_1(U, B, p); + if (n == 2) + return Flm_rsolve_upper_2(U, B, p); + n1 = (n + 1)/2; + U1 = vecslice(U, 1, n1); + U2 = vecslice(U, n1 + 1, n); + U11 = rowslice(U1, 1, n1); + U12 = rowslice(U2, 1, n1); + U22 = rowslice(U2, n1 + 1, n); + B1 = rowslice(B, 1, n1); + B2 = rowslice(B, n1 + 1, n); + X2 = Flm_rsolve_upper(U22, B2, p); + if (gc_needed(av, 1)) + gerepileall(av, 4, &B1, &U11, &U12, &X2); + B1 = Flm_sub(B1, Flm_mul(U12, X2, p), p); + if (gc_needed(av, 1)) + gerepileall(av, 3, &B1, &U11, &X2); + X1 = Flm_rsolve_upper(U11, B1, p); + X = vconcat(X1, X2); + if (gc_needed(av, 1)) + X = gerepilecopy(av, X); + return X; +} + +static GEN +Flm_lsolve_upper_1(GEN U, GEN B, ulong p) { + return Flm_Fl_mul(B, Fl_inv(ucoeff(U, 1, 1), p), p); +} + +static GEN +Flm_lsolve_upper_2(GEN U, GEN B, ulong p) { + ulong a = ucoeff(U, 1, 1), b = ucoeff(U, 1, 2), d = ucoeff(U, 2, 2); + ulong D = Fl_mul(a, d, p), Dinv = Fl_inv(D, p); + ulong ainv = Fl_mul(d, Dinv, p), dinv = Fl_mul(a, Dinv, p); + GEN B1 = vecslice(B, 1, 1); + GEN B2 = vecslice(B, 2, 2); + GEN X1 = Flm_Fl_mul(B1, ainv, p); + GEN X2 = Flm_Fl_mul(Flm_sub(B2, Flm_Fl_mul(X1, b, p), p), + dinv, p); + return shallowconcat(X1, X2); +} + +/* solve X*U = B, U upper triangular and invertible */ +static GEN +Flm_lsolve_upper(GEN U, GEN B, ulong p) { + long n = lg(U) - 1, n1; + GEN U1, U2, U11, U12, U22; + GEN B1, B2, X1, X2, X; + pari_sp av = avma; + + if (n == 0) + return B; + if (n == 1) + return Flm_lsolve_upper_1(U, B, p); + if (n == 2) + return Flm_lsolve_upper_2(U, B, p); + n1 = (n + 1)/2; + U1 = vecslice(U, 1, n1); + U2 = vecslice(U, n1 + 1, n); + U11 = rowslice(U1, 1, n1); + U12 = rowslice(U2, 1, n1); + U22 = rowslice(U2, n1 + 1, n); + B1 = vecslice(B, 1, n1); + B2 = vecslice(B, n1 + 1, n); + X1 = Flm_lsolve_upper(U11, B1, p); + if (gc_needed(av, 1)) + gerepileall(av, 4, &B2, &U12, &U22, &X1); + B2 = Flm_sub(B2, Flm_mul(X1, U12, p), p); + if (gc_needed(av, 1)) + gerepileall(av, 3, &B2, &U22, &X1); + X2 = Flm_lsolve_upper(U22, B2, p); + X = shallowconcat(X1, X2); + if (gc_needed(av, 1)) + X = gerepilecopy(av, X); + return X; +} + +static GEN +Flm_rsolve_lower_unit_1(GEN L, GEN A, ulong p) { + return rowslice(A, 1, 1); +} + +static GEN +Flm_rsolve_lower_unit_2(GEN L, GEN A, ulong p) { + GEN X1 = rowslice(A, 1, 1); + GEN X2 = Flm_sub(rowslice(A, 2, 2), + Flm_Fl_mul(X1, ucoeff(L, 2, 1), p), p); + return vconcat(X1, X2); +} + +/* + solve L*X = A, L lower triangular with ones on the diagonal + (at least as many rows as columns) +*/ +static GEN +Flm_rsolve_lower_unit(GEN L, GEN A, ulong p) { + long m = lg(L) - 1, m1, n; + GEN L1, L2, L11, L21, L22, A1, A2, X1, X2, X; + pari_sp av = avma; + + if (m == 0) + return zero_Flm(0, lg(A) - 1); + if (m == 1) + return Flm_rsolve_lower_unit_1(L, A, p); + if (m == 2) + return Flm_rsolve_lower_unit_2(L, A, p); + m1 = (m + 1)/2; + n = nbrows(L); + L1 = vecslice(L, 1, m1); + L2 = vecslice(L, m1 + 1, m); + L11 = rowslice(L1, 1, m1); + L21 = rowslice(L1, m1 + 1, n); + L22 = rowslice(L2, m1 + 1, n); + A1 = rowslice(A, 1, m1); + A2 = rowslice(A, m1 + 1, n); + X1 = Flm_rsolve_lower_unit(L11, A1, p); + if (gc_needed(av, 1)) + gerepileall(av, 4, &A2, &L21, &L22, &X1); + A2 = Flm_sub(A2, Flm_mul(L21, X1, p), p); + if (gc_needed(av, 1)) + gerepileall(av, 3, &A2, &L22, &X1); + X2 = Flm_rsolve_lower_unit(L22, A2, p); + X = vconcat(X1, X2); + if (gc_needed(av, 1)) + X = gerepilecopy(av, X); + return X; +} + +static GEN +Flm_lsolve_lower_unit_2(GEN L, GEN A, ulong p) { + GEN X2 = vecslice(A, 2, 2); + GEN X1 = Flm_sub(vecslice(A, 1, 1), + Flm_Fl_mul(X2, ucoeff(L, 2, 1), p), p); + return shallowconcat(X1, X2); +} + +/* solve L*X = A, L square lower triangular with ones on the diagonal */ +static GEN +Flm_lsolve_lower_unit(GEN L, GEN A, ulong p) { + long m = lg(L) - 1, m1; + GEN L1, L2, L11, L21, L22, A1, A2, X1, X2, X; + pari_sp av = avma; + + if (m <= 1) + return A; + if (m == 2) + return Flm_lsolve_lower_unit_2(L, A, p); + m1 = (m + 1)/2; + L1 = vecslice(L, 1, m1); + L2 = vecslice(L, m1 + 1, m); + L11 = rowslice(L1, 1, m1); + L21 = rowslice(L1, m1 + 1, m); + L22 = rowslice(L2, m1 + 1, m); + A1 = vecslice(A, 1, m1); + A2 = vecslice(A, m1 + 1, m); + X2 = Flm_lsolve_lower_unit(L22, A2, p); + if (gc_needed(av, 1)) + gerepileall(av, 4, &A1, &L11, &L21, &X2); + A1 = Flm_sub(A1, Flm_mul(X2, L21, p), p); + if (gc_needed(av, 1)) + gerepileall(av, 3, &A1, &L11, &X2); + X1 = Flm_lsolve_lower_unit(L11, A1, p); + X = shallowconcat(X1, X2); + if (gc_needed(av, 1)) + X = gerepilecopy(av, X); + return X; +} + +/* destroy A */ +static long +Flm_CUP_gauss(GEN A, GEN *R, GEN *C, GEN *U, GEN *P, ulong p) { + long i, j, k, m = nbrows(A), n = lg(A) - 1, pr, pc, u, v; + pari_sp av; + + *P = identity_perm(n); + *R = cgetg(m + 1, t_VECSMALL); + av = avma; + for (j = 1, pr = 0; j <= n; j++) { + for (pr++, pc = 0; pr <= m; pr++) { + for (k = j; k <= n; k++) { + v = ucoeff(A, pr, k); + if (!pc && v) + pc = k; + } + if (pc) + break; + } + if (!pc) + break; + (*R)[j] = pr; + if (pc != j) { + swap(gel(A, j), gel(A, pc)); + lswap((*P)[j], (*P)[pc]); + } + if (pr != j) { + for (k = j; k <= n; k++) + ucoeff(A, j, k) = ucoeff(A, pr, k); + } + u = Fl_inv(ucoeff(A, j, j), p); + for (i = pr + 1; i <= m; i++) { + v = Fl_mul(ucoeff(A, i, j), u, p); + ucoeff(A, i, j) = v; + for (k = j + 1; k <= n; k++) { + ucoeff(A, i, k) = Fl_sub(ucoeff(A, i, k), + Fl_mul(ucoeff(A, j, k), v, p), p); + } + } + } + setlg(*R, j); + *C = vecslice(A, 1, j - 1); + *U = rowslice(A, 1, j - 1); + if (gc_needed(av, 1)) + gerepileall(av, 2, C, U); + return j - 1; +} + +static const long Flm_CUP_LIMIT = 8; + +static ulong +Flm_CUP(GEN A, GEN *R, GEN *C, GEN *U, GEN *P, ulong p) { + long m = nbrows(A), m1, n = lg(A) - 1, i, r1, r2, r; + GEN R1, C1, U1, P1, R2, C2, U2, P2; + GEN A1, A2, B2, C21, U11, U12, T21, T22; + pari_sp av = avma; + + if (m < Flm_CUP_LIMIT || n < Flm_CUP_LIMIT) + /* destroy A; not called at the outermost recursion level */ + return Flm_CUP_gauss(A, R, C, U, P, p); + m1 = (minss(m, n) + 1)/2; + A1 = rowslice(A, 1, m1); + A2 = rowslice(A, m1 + 1, m); + r1 = Flm_CUP(A1, &R1, &C1, &U1, &P1, p); + if (r1 == 0) { + r2 = Flm_CUP(A2, &R2, &C2, &U2, &P2, p); + *R = cgetg(r2 + 1, t_VECSMALL); + for (i = 1; i <= r2; i++) + (*R)[i] = R2[i] + m1; + *C = vconcat(zero_Flm(m1, r2), C2); + *U = U2; + *P = P2; + r = r2; + } else { + U11 = vecslice(U1, 1, r1); + U12 = vecslice(U1, r1 + 1, n); + T21 = vecslicepermute(A2, P1, 1, r1); + T22 = vecslicepermute(A2, P1, r1 + 1, n); + C21 = Flm_lsolve_upper(U11, T21, p); + if (gc_needed(av, 1)) + gerepileall(av, 7, &R1, &C1, &P1, &U11, &U12, &T22, &C21); + B2 = Flm_sub(T22, Flm_mul(C21, U12, p), p); + r2 = Flm_CUP(B2, &R2, &C2, &U2, &P2, p); + r = r1 + r2; + *R = cgetg(r + 1, t_VECSMALL); + for (i = 1; i <= r1; i++) + (*R)[i] = R1[i]; + for (; i <= r; i++) + (*R)[i] = R2[i - r1] + m1; + *C = shallowconcat(vconcat(C1, C21), + vconcat(zero_Flm(m1, r2), C2)); + *U = shallowconcat(vconcat(U11, zero_Flm(r2, r1)), + vconcat(vecpermute(U12, P2), U2)); + *P = cgetg(n + 1, t_VECSMALL); + for (i = 1; i <= r1; i++) + (*P)[i] = P1[i]; + for (; i <= n; i++) + (*P)[i] = P1[P2[i - r1] + r1]; + } + if (gc_needed(av, 1)) + gerepileall(av, 4, R, C, U, P); + return r; +} + +static ulong +Flm_echelon_gauss(GEN A, GEN *R, GEN *C, ulong p) { + GEN U, P; + return Flm_CUP_gauss(A, R, C, &U, &P, p); +} + +/* column echelon form */ +static ulong +Flm_echelon(GEN A, GEN *R, GEN *C, ulong p) { + long i, j, j1, j2, m = nbrows(A), n = lg(A) - 1, n1, r, r1, r2; + GEN A1, A2, R1, R1c, C1, R2, C2; + GEN A12, A22, B2, C11, C21, M12; + pari_sp av = avma; + + if (m < Flm_CUP_LIMIT || n < Flm_CUP_LIMIT) + return Flm_echelon_gauss(Flm_copy(A), R, C, p); + + n1 = (n + 1)/2; + A1 = vecslice(A, 1, n1); + A2 = vecslice(A, n1 + 1, n); + r1 = Flm_echelon(A1, &R1, &C1, p); + if (!r1) + return Flm_echelon(A2, R, C, p); + else if (r1 == m) { + *R = R1; + *C = C1; + return r1; + } + R1c = indexcompl(R1, m); + C11 = rowpermute(C1, R1); + C21 = rowpermute(C1, R1c); + if (R1[r1] != r1) { + for (j = 1; j <= r1; j++) + for (i = 1; i <= R1[j] - j; i++) + ucoeff(C21, i, j) = 0; + } + A12 = rowpermute(A2, R1); + A22 = rowpermute(A2, R1c); + M12 = Flm_rsolve_lower_unit(C11, A12, p); + B2 = Flm_sub(A22, Flm_mul(C21, M12, p), p); + r2 = Flm_echelon(B2, &R2, &C2, p); + if (!r2) { + *R = R1; + *C = C1; + r = r1; + } else { + R2 = perm_mul(R1c, R2); + C2 = rowpermute(vconcat(zero_Flm(r1, r2), C2), + perm_inv(vecsmall_concat(R1, R1c))); + r = r1 + r2; + *R = cgetg(r + 1, t_VECSMALL); + *C = cgetg(r + 1, t_MAT); + for (j = j1 = j2 = 1; j <= r; j++) { + if (j2 > r2 || (j1 <= r1 && R1[j1] < R2[j2])) { + gel(*C, j) = gel(C1, j1); + (*R)[j] = R1[j1++]; + } else { + gel(*C, j) = gel(C2, j2); + (*R)[j] = R2[j2++]; + } + } + } + if (gc_needed(av, 1)) + gerepileall(av, 2, R, C); + return r; +} + +static GEN +FlxqM_rsolve_upper_1(GEN U, GEN B, GEN T, ulong p) { + return FlxqM_Flxq_mul(B, Flxq_inv(gcoeff(U, 1, 1), T, p), T, p); +} + +static GEN +FlxqM_rsolve_upper_2(GEN U, GEN B, GEN T, ulong p) { + GEN a = gcoeff(U, 1, 1), b = gcoeff(U, 1, 2), d = gcoeff(U, 2, 2); + GEN D = Flxq_mul(a, d, T, p), Dinv = Flxq_inv(D, T, p); + GEN ainv = Flxq_mul(d, Dinv, T, p), dinv = Flxq_mul(a, Dinv, T, p); + GEN B1 = rowslice(B, 1, 1); + GEN B2 = rowslice(B, 2, 2); + GEN X2 = FlxqM_Flxq_mul(B2, dinv, T, p); + GEN X1 = FlxqM_Flxq_mul(FlxM_sub(B1, FlxqM_Flxq_mul(X2, b, T, p), p), + ainv, T, p); + return vconcat(X1, X2); +} + +/* solve U*X = B, U upper triangular and invertible */ +static GEN +FlxqM_rsolve_upper(GEN U, GEN B, GEN T, ulong p) { + long n = lg(U) - 1, n1; + GEN U1, U2, U11, U12, U22; + GEN B1, B2, X1, X2, X; + pari_sp av = avma; + + if (n == 0) + return B; + if (n == 1) + return FlxqM_rsolve_upper_1(U, B, T, p); + if (n == 2) + return FlxqM_rsolve_upper_2(U, B, T, p); + n1 = (n + 1)/2; + U1 = vecslice(U, 1, n1); + U2 = vecslice(U, n1 + 1, n); + U11 = rowslice(U1, 1, n1); + U12 = rowslice(U2, 1, n1); + U22 = rowslice(U2, n1 + 1, n); + B1 = rowslice(B, 1, n1); + B2 = rowslice(B, n1 + 1, n); + X2 = FlxqM_rsolve_upper(U22, B2, T, p); + if (gc_needed(av, 1)) + gerepileall(av, 4, &B1, &U11, &U12, &X2); + B1 = FlxM_sub(B1, FlxqM_mul(U12, X2, T, p), p); + if (gc_needed(av, 1)) + gerepileall(av, 3, &B1, &U11, &X2); + X1 = FlxqM_rsolve_upper(U11, B1, T, p); + X = vconcat(X1, X2); + if (gc_needed(av, 1)) + X = gerepilecopy(av, X); + return X; +} + +static GEN +FlxqM_lsolve_upper_1(GEN U, GEN B, GEN T, ulong p) { + return FlxqM_Flxq_mul(B, Flxq_inv(gcoeff(U, 1, 1), T, p), T, p); +} + +static GEN +FlxqM_lsolve_upper_2(GEN U, GEN B, GEN T, ulong p) { + GEN a = gcoeff(U, 1, 1), b = gcoeff(U, 1, 2), d = gcoeff(U, 2, 2); + GEN D = Flxq_mul(a, d, T, p), Dinv = Flxq_inv(D, T, p); + GEN ainv = Flxq_mul(d, Dinv, T, p), dinv = Flxq_mul(a, Dinv, T, p); + GEN B1 = vecslice(B, 1, 1); + GEN B2 = vecslice(B, 2, 2); + GEN X1 = FlxqM_Flxq_mul(B1, ainv, T, p); + GEN X2 = FlxqM_Flxq_mul(FlxM_sub(B2, FlxqM_Flxq_mul(X1, b, T, p), p), + dinv, T, p); + return shallowconcat(X1, X2); +} + +/* solve X*U = B, U upper triangular and invertible */ +static GEN +FlxqM_lsolve_upper(GEN U, GEN B, GEN T, ulong p) { + long n = lg(U) - 1, n1; + GEN U1, U2, U11, U12, U22; + GEN B1, B2, X1, X2, X; + pari_sp av = avma; + + if (n == 0) + return B; + if (n == 1) + return FlxqM_lsolve_upper_1(U, B, T, p); + if (n == 2) + return FlxqM_lsolve_upper_2(U, B, T, p); + n1 = (n + 1)/2; + U1 = vecslice(U, 1, n1); + U2 = vecslice(U, n1 + 1, n); + U11 = rowslice(U1, 1, n1); + U12 = rowslice(U2, 1, n1); + U22 = rowslice(U2, n1 + 1, n); + B1 = vecslice(B, 1, n1); + B2 = vecslice(B, n1 + 1, n); + X1 = FlxqM_lsolve_upper(U11, B1, T, p); + if (gc_needed(av, 1)) + gerepileall(av, 4, &B2, &U12, &U22, &X1); + B2 = FlxM_sub(B2, FlxqM_mul(X1, U12, T, p), p); + if (gc_needed(av, 1)) + gerepileall(av, 3, &B2, &U22, &X1); + X2 = FlxqM_lsolve_upper(U22, B2, T, p); + X = shallowconcat(X1, X2); + if (gc_needed(av, 1)) + X = gerepilecopy(av, X); + return X; +} + +static GEN +FlxqM_rsolve_lower_unit_1(GEN L, GEN A, GEN T, ulong p) { + return rowslice(A, 1, 1); +} + +static GEN +FlxqM_rsolve_lower_unit_2(GEN L, GEN A, GEN T, ulong p) { + GEN X1 = rowslice(A, 1, 1); + GEN X2 = FlxM_sub(rowslice(A, 2, 2), + FlxqM_Flxq_mul(X1, gcoeff(L, 2, 1), T, p), p); + return vconcat(X1, X2); +} + +/* + solve L*X = A, L lower triangular with ones on the diagonal + (at least as many rows as columns) +*/ +static GEN +FlxqM_rsolve_lower_unit(GEN L, GEN A, GEN T, ulong p) { + long m = lg(L) - 1, m1, n; + GEN L1, L2, L11, L21, L22, A1, A2, X1, X2, X; + pari_sp av = avma; + + if (m == 0) + return zeromat(0, lg(A) - 1); + if (m == 1) + return FlxqM_rsolve_lower_unit_1(L, A, T, p); + if (m == 2) + return FlxqM_rsolve_lower_unit_2(L, A, T, p); + m1 = (m + 1)/2; + n = nbrows(L); + L1 = vecslice(L, 1, m1); + L2 = vecslice(L, m1 + 1, m); + L11 = rowslice(L1, 1, m1); + L21 = rowslice(L1, m1 + 1, n); + L22 = rowslice(L2, m1 + 1, n); + A1 = rowslice(A, 1, m1); + A2 = rowslice(A, m1 + 1, n); + X1 = FlxqM_rsolve_lower_unit(L11, A1, T, p); + if (gc_needed(av, 1)) + gerepileall(av, 4, &A2, &L21, &L22, &X1); + A2 = FlxM_sub(A2, FlxqM_mul(L21, X1, T, p), p); + if (gc_needed(av, 1)) + gerepileall(av, 3, &A2, &L22, &X1); + X2 = FlxqM_rsolve_lower_unit(L22, A2, T, p); + X = vconcat(X1, X2); + if (gc_needed(av, 1)) + X = gerepilecopy(av, X); + return X; +} + +static GEN +FlxqM_lsolve_lower_unit_2(GEN L, GEN A, GEN T, ulong p) { + GEN X2 = vecslice(A, 2, 2); + GEN X1 = FlxM_sub(vecslice(A, 1, 1), + FlxqM_Flxq_mul(X2, gcoeff(L, 2, 1), T, p), p); + return shallowconcat(X1, X2); +} + +/* solve L*X = A, L square lower triangular with ones on the diagonal */ +static GEN +FlxqM_lsolve_lower_unit(GEN L, GEN A, GEN T, ulong p) { + long m = lg(L) - 1, m1; + GEN L1, L2, L11, L21, L22, A1, A2, X1, X2, X; + pari_sp av = avma; + + if (m <= 1) + return A; + if (m == 2) + return FlxqM_lsolve_lower_unit_2(L, A, T, p); + m1 = (m + 1)/2; + L1 = vecslice(L, 1, m1); + L2 = vecslice(L, m1 + 1, m); + L11 = rowslice(L1, 1, m1); + L21 = rowslice(L1, m1 + 1, m); + L22 = rowslice(L2, m1 + 1, m); + A1 = vecslice(A, 1, m1); + A2 = vecslice(A, m1 + 1, m); + X2 = FlxqM_lsolve_lower_unit(L22, A2, T, p); + if (gc_needed(av, 1)) + gerepileall(av, 4, &A1, &L11, &L21, &X2); + A1 = FlxM_sub(A1, FlxqM_mul(X2, L21, T, p), p); + if (gc_needed(av, 1)) + gerepileall(av, 3, &A1, &L11, &X2); + X1 = FlxqM_lsolve_lower_unit(L11, A1, T, p); + X = shallowconcat(X1, X2); + if (gc_needed(av, 1)) + X = gerepilecopy(av, X); + return X; +} + +/* destroy A */ +static long +FlxqM_CUP_gauss(GEN A, GEN *R, GEN *C, GEN *U, GEN *P, GEN T, ulong p) { + long i, j, k, m = nbrows(A), n = lg(A) - 1, pr, pc; + pari_sp av; + GEN u, v; + + *P = identity_perm(n); + *R = cgetg(m + 1, t_VECSMALL); + av = avma; + for (j = 1, pr = 0; j <= n; j++) { + for (pr++, pc = 0; pr <= m; pr++) { + for (k = j; k <= n; k++) { + v = Flx_rem(gcoeff(A, pr, k), T, p); + gcoeff(A, pr, k) = v; + if (!pc && lgpol(v) > 0) + pc = k; + } + if (pc) + break; + } + if (!pc) + break; + (*R)[j] = pr; + if (pc != j) { + swap(gel(A, j), gel(A, pc)); + lswap((*P)[j], (*P)[pc]); + } + if (pr != j) { + for (k = j; k <= n; k++) + gcoeff(A, j, k) = gcoeff(A, pr, k); + } + u = Flxq_inv(gcoeff(A, j, j), T, p); + for (i = pr + 1; i <= m; i++) { + v = Flxq_mul(gcoeff(A, i, j), u, T, p); + gcoeff(A, i, j) = v; + for (k = j + 1; k <= n; k++) { + gcoeff(A, i, k) = Flx_sub(gcoeff(A, i, k), + Flx_mul(gcoeff(A, j, k), v, p), p); + } + } + } + setlg(*R, j); + *C = vecslice(A, 1, j - 1); + *U = rowslice(A, 1, j - 1); + if (gc_needed(av, 1)) + gerepileall(av, 2, C, U); + return j - 1; +} + +static const long FlxqM_CUP_LIMIT = 5; + +static ulong +FlxqM_CUP(GEN A, GEN *R, GEN *C, GEN *U, GEN *P, GEN T, ulong p) { + long m = nbrows(A), m1, n = lg(A) - 1, i, r1, r2, r, sv; + GEN R1, C1, U1, P1, R2, C2, U2, P2; + GEN A1, A2, B2, C21, U11, U12, T21, T22; + pari_sp av = avma; + + if (m < FlxqM_CUP_LIMIT || n < FlxqM_CUP_LIMIT) + /* destroy A; not called at the outermost recursion level */ + return FlxqM_CUP_gauss(A, R, C, U, P, T, p); + sv = get_Flx_var(T); + m1 = (minss(m, n) + 1)/2; + A1 = rowslice(A, 1, m1); + A2 = rowslice(A, m1 + 1, m); + r1 = FlxqM_CUP(A1, &R1, &C1, &U1, &P1, T, p); + if (r1 == 0) { + r2 = FlxqM_CUP(A2, &R2, &C2, &U2, &P2, T, p); + *R = cgetg(r2 + 1, t_VECSMALL); + for (i = 1; i <= r2; i++) + (*R)[i] = R2[i] + m1; + *C = vconcat(zero_FlxM(m1, r2, sv), C2); + *U = U2; + *P = P2; + r = r2; + } else { + U11 = vecslice(U1, 1, r1); + U12 = vecslice(U1, r1 + 1, n); + T21 = vecslicepermute(A2, P1, 1, r1); + T22 = vecslicepermute(A2, P1, r1 + 1, n); + C21 = FlxqM_lsolve_upper(U11, T21, T, p); + if (gc_needed(av, 1)) + gerepileall(av, 7, &R1, &C1, &P1, &U11, &U12, &T22, &C21); + B2 = FlxM_sub(T22, FlxqM_mul(C21, U12, T, p), p); + r2 = FlxqM_CUP(B2, &R2, &C2, &U2, &P2, T, p); + r = r1 + r2; + *R = cgetg(r + 1, t_VECSMALL); + for (i = 1; i <= r1; i++) + (*R)[i] = R1[i]; + for (; i <= r; i++) + (*R)[i] = R2[i - r1] + m1; + *C = shallowmatconcat(mkmat2(mkcol2(C1, C21), + mkcol2(zero_FlxM(m1, r2, sv), C2))); + *U = shallowmatconcat(mkmat2(mkcol2(U11, zero_FlxM(r2, r1, sv)), + mkcol2(vecpermute(U12, P2), U2))); + *P = cgetg(n + 1, t_VECSMALL); + for (i = 1; i <= r1; i++) + (*P)[i] = P1[i]; + for (; i <= n; i++) + (*P)[i] = P1[P2[i - r1] + r1]; + } + if (gc_needed(av, 1)) + gerepileall(av, 4, R, C, U, P); + return r; +} + +static ulong +FlxqM_echelon_gauss(GEN A, GEN *R, GEN *C, GEN T, ulong p) { + GEN U, P; + return FlxqM_CUP_gauss(A, R, C, &U, &P, T, p); +} + +/* column echelon form */ +static ulong +FlxqM_echelon(GEN A, GEN *R, GEN *C, GEN T, ulong p) { + long i, j, j1, j2, m = nbrows(A), n = lg(A) - 1, n1, r, r1, r2; + GEN A1, A2, R1, R1c, C1, R2, C2; + GEN A12, A22, B2, C11, C21, M12; + pari_sp av = avma; + + if (m < FlxqM_CUP_LIMIT || n < FlxqM_CUP_LIMIT) + return FlxqM_echelon_gauss(shallowcopy(A), R, C, T, p); + + n1 = (n + 1)/2; + A1 = vecslice(A, 1, n1); + A2 = vecslice(A, n1 + 1, n); + r1 = FlxqM_echelon(A1, &R1, &C1, T, p); + if (!r1) + return FlxqM_echelon(A2, R, C, T, p); + else if (r1 == m) { + *R = R1; + *C = C1; + return r1; + } + R1c = indexcompl(R1, m); + C11 = rowpermute(C1, R1); + C21 = rowpermute(C1, R1c); + if (R1[r1] != r1) { + GEN zero = zero_Flx(get_Flx_var(T)); + for (j = 1; j <= r1; j++) + for (i = 1; i <= R1[j] - j; i++) + gcoeff(C21, i, j) = zero; + } + A12 = rowpermute(A2, R1); + A22 = rowpermute(A2, R1c); + M12 = FlxqM_rsolve_lower_unit(C11, A12, T, p); + B2 = FlxM_sub(A22, FlxqM_mul(C21, M12, T, p), p); + r2 = FlxqM_echelon(B2, &R2, &C2, T, p); + if (!r2) { + *R = R1; + *C = C1; + r = r1; + } else { + R2 = perm_mul(R1c, R2); + C2 = rowpermute(vconcat(zero_FlxM(r1, r2, get_Flx_var(T)), C2), + perm_inv(vecsmall_concat(R1, R1c))); + r = r1 + r2; + *R = cgetg(r + 1, t_VECSMALL); + *C = cgetg(r + 1, t_MAT); + for (j = j1 = j2 = 1; j <= r; j++) { + if (j2 > r2 || (j1 <= r1 && R1[j1] < R2[j2])) { + gel(*C, j) = gel(C1, j1); + (*R)[j] = R1[j1++]; + } else { + gel(*C, j) = gel(C2, j2); + (*R)[j] = R2[j2++]; + } + } + } + if (gc_needed(av, 1)) + gerepileall(av, 2, R, C); + return r; +} + + +/*******************************************************************/ +/* */ /* LINEAR ALGEBRA MODULO P */ /* */ /*******************************************************************/ @@ -623,7 +1396,7 @@ _Fl_add_OK(GEN b, long k, long i, ulong p) } static GEN -Flm_ker_sp_OK(GEN x, ulong p, long deplin) +Flm_ker_gauss_OK(GEN x, ulong p, long deplin) { GEN y, c, d; long i, j, k, r, t, m, n; @@ -698,15 +1471,15 @@ Flm_ker_sp_OK(GEN x, ulong p, long deplin) } /* in place, destroy x */ -GEN -Flm_ker_sp(GEN x, ulong p, long deplin) +static GEN +Flm_ker_gauss(GEN x, ulong p, long deplin) { GEN y, c, d; long i, j, k, r, t, m, n; ulong a, pi; n = lg(x)-1; if (!n) return cgetg(1,t_MAT); - if (SMALL_ULONG(p)) return Flm_ker_sp_OK(x, p, deplin); + if (SMALL_ULONG(p)) return Flm_ker_gauss_OK(x, p, deplin); pi = get_Fl_red(p); m=nbrows(x); r=0; @@ -807,10 +1580,75 @@ GEN F2m_ker(GEN x) { return F2m_ker_sp(F2m_copy(x), 0); } GEN F2m_deplin(GEN x) { return F2m_ker_sp(F2m_copy(x), 1); } + +static GEN +Flm_ker_echelon(GEN x, ulong p, long pivots) { + pari_sp av = avma; + GEN R, Rc, C, C1, C2, S, K; + long n = lg(x) - 1, r; + r = Flm_echelon(Flm_transpose(x), &R, &C, p); + Rc = indexcompl(R, n); + C1 = rowpermute(C, R); + C2 = rowpermute(C, Rc); + if (R[r] != r) { + long i, j; + for (j = 1; j <= r; j++) + for (i = 1; i <= R[j] - j; i++) + ucoeff(C2, i, j) = 0; + } + S = Flm_lsolve_lower_unit(C1, C2, p); + K = vecpermute(shallowconcat(Flm_neg(S, p), matid_Flm(n - r)), + perm_inv(vecsmall_concat(R, Rc))); + K = Flm_transpose(K); + if (pivots) + return gerepilecopy(av, mkvec2(K, R)); + return gerepileupto(av, K); +} + +static GEN +Flm_deplin_echelon(GEN x, ulong p) { + pari_sp av = avma; + GEN R, Rc, C, C1, C2, s, v; + long i, j, n = lg(x) - 1, r; + r = Flm_echelon(Flm_transpose(x), &R, &C, p); + if (r == n) { avma = av; return NULL; } + Rc = indexcompl(R, n); + i = Rc[1]; + C1 = rowpermute(C, R); + C2 = rowslice(C, i, i); + for (j = i; j <= r; j++) + ucoeff(C2, 1, j) = 0UL; + s = Flm_row(Flm_lsolve_lower_unit(C1, C2, p), 1); + v = vecpermute(vecsmall_concat(Flv_neg(s, p), vecsmall_ei(n - r, 1)), + perm_inv(vecsmall_concat(R, Rc))); + return gerepileuptoleaf(av, v); +} + +static GEN +Flm_ker_i(GEN x, ulong p, long deplin, long inplace) { + if (lg(x) - 1 >= Flm_CUP_LIMIT && nbrows(x) >= Flm_CUP_LIMIT) + switch(deplin) { + case 0: return Flm_ker_echelon(x, p, 0); + case 1: return Flm_deplin_echelon(x, p); + case 2: return Flm_ker_echelon(x, p, 1); + } + return Flm_ker_gauss(inplace? x: Flm_copy(x), p, deplin); +} + GEN -Flm_ker(GEN x, ulong p) { return Flm_ker_sp(Flm_copy(x), p, 0); } +Flm_ker_sp(GEN x, ulong p, long deplin) { + return Flm_ker_i(x, p, deplin, 1); +} + GEN -Flm_deplin(GEN x, ulong p) { return Flm_ker_sp(Flm_copy(x), p, 1); } +Flm_ker(GEN x, ulong p) { + return Flm_ker_i(x, p, 0, 0); +} + +GEN +Flm_deplin(GEN x, ulong p) { + return Flm_ker_i(x, p, 1, 0); +} ulong F2m_det_sp(GEN x) { return !F2m_ker_sp(x, 1); } @@ -825,7 +1663,7 @@ F2m_det(GEN x) /* in place, destroy a, SMALL_ULONG(p) is TRUE */ static ulong -Flm_det_sp_OK(GEN a, long nbco, ulong p) +Flm_det_gauss_OK(GEN a, long nbco, ulong p) { long i,j,k, s = 1; ulong q, x = 1; @@ -871,14 +1709,15 @@ Flm_det_sp_OK(GEN a, long nbco, ulong p) if (s < 0 && x) x = p - x; return x; } + /* in place, destroy a */ -ulong -Flm_det_sp(GEN a, ulong p) +static ulong +Flm_det_gauss(GEN a, ulong p) { long i,j,k, s = 1, nbco = lg(a)-1; ulong pi, q, x = 1; - if (SMALL_ULONG(p)) return Flm_det_sp_OK(a, nbco, p); + if (SMALL_ULONG(p)) return Flm_det_gauss_OK(a, nbco, p); pi = get_Fl_red(p); for (i=1; i<nbco; i++) { @@ -908,14 +1747,42 @@ Flm_det_sp(GEN a, ulong p) return Fl_mul(x, ucoeff(a,nbco,nbco), p); } -ulong -Flm_det(GEN x, ulong p) -{ +static ulong +Flm_det_CUP(GEN a, ulong p) { + GEN R, C, U, P; + long d, i, n = lg(a) - 1, r; + r = Flm_CUP(a, &R, &C, &U, &P, p); + if (r < n) + d = 0; + else { + d = perm_sign(P) == 1? 1: p-1; + for (i = 1; i <= n; i++) + d = Fl_mul(d, ucoeff(U, i, i), p); + } + return d; +} + +static ulong +Flm_det_i(GEN x, ulong p, long inplace) { pari_sp av = avma; - ulong d = Flm_det_sp(Flm_copy(x), p); + ulong d; + if (lg(x) - 1 >= Flm_CUP_LIMIT) + d = Flm_det_CUP(x, p); + else + d = Flm_det_gauss(inplace? x: Flm_copy(x), p); avma = av; return d; } +ulong +Flm_det_sp(GEN x, ulong p) { + return Flm_det_i(x, p, 1); +} + +ulong +Flm_det(GEN x, ulong p) { + return Flm_det_i(x, p, 0); +} + static GEN FpM_init(GEN a, GEN p, ulong *pp) { @@ -1036,6 +1903,27 @@ Flm_gauss_pivot(GEN x, ulong p, long *rr) } static GEN +Flm_pivots_CUP(GEN x, ulong p, long *rr) { + pari_sp av; + long i, n = lg(x) - 1, r; + GEN R, C, U, P, d = zero_zv(n); + av = avma; + r = Flm_CUP(x, &R, &C, &U, &P, p); + for(i = 1; i <= r; i++) + d[P[i]] = R[i]; + avma = av; + *rr = n - r; + return d; +} + +static GEN +Flm_pivots(GEN x, ulong p, long *rr, long inplace) { + if (lg(x) - 1 >= Flm_CUP_LIMIT && nbrows(x) >= Flm_CUP_LIMIT) + return Flm_pivots_CUP(x, p, rr); + return Flm_gauss_pivot(inplace? x: Flm_copy(x), p, rr); +} + +static GEN FpM_gauss_pivot_gen(GEN x, GEN p, long *rr) { void *E; @@ -1052,7 +1940,7 @@ FpM_gauss_pivot(GEN x, GEN p, long *rr) { case 0: return FpM_gauss_pivot_gen(x, p, rr); case 2: return F2m_gauss_pivot(x, rr); - default:return Flm_gauss_pivot(x, pp, rr); + default:return Flm_pivots(x, pp, rr, 1); } } @@ -1063,13 +1951,15 @@ FpM_image(GEN x, GEN p) GEN d = FpM_gauss_pivot(x,p,&r); /* d left on stack for efficiency */ return image_from_pivot(x,d,r); } + GEN Flm_image(GEN x, ulong p) { long r; - GEN d = Flm_gauss_pivot(Flm_copy(x),p,&r); /* d left on stack for efficiency */ + GEN d = Flm_pivots(x, p, &r, 0); /* d left on stack for efficiency */ return image_from_pivot(x,d,r); } + GEN F2m_image(GEN x) { @@ -1086,14 +1976,21 @@ FpM_rank(GEN x, GEN p) (void)FpM_gauss_pivot(x,p,&r); avma = av; return lg(x)-1 - r; } + long Flm_rank(GEN x, ulong p) { pari_sp av = avma; long r; - (void)Flm_gauss_pivot(Flm_copy(x),p,&r); + if (lg(x) - 1 >= Flm_CUP_LIMIT && nbrows(x) >= Flm_CUP_LIMIT) { + GEN R, C; + r = Flm_echelon(x, &R, &C, p); + avma = av; return r; + } + (void) Flm_pivots(x, p, &r, 0); avma = av; return lg(x)-1 - r; } + long F2m_rank(GEN x) { @@ -1103,7 +2000,6 @@ F2m_rank(GEN x) avma = av; return lg(x)-1 - r; } - static GEN FlxqM_gauss_pivot(GEN x, GEN T, ulong p, long *rr) { @@ -1112,11 +2008,32 @@ FlxqM_gauss_pivot(GEN x, GEN T, ulong p, long *rr) return gen_Gauss_pivot(x, rr, E, S); } +static GEN +FlxqM_pivots_CUP(GEN x, GEN T, ulong p, long *rr) { + pari_sp av; + long i, n = lg(x) - 1, r; + GEN R, C, U, P, d = zero_zv(n); + av = avma; + r = FlxqM_CUP(x, &R, &C, &U, &P, T, p); + for(i = 1; i <= r; i++) + d[P[i]] = R[i]; + avma = av; + *rr = n - r; + return d; +} + +static GEN +FlxqM_pivots(GEN x, GEN T, ulong p, long *rr) { + if (lg(x) - 1 >= FlxqM_CUP_LIMIT && nbrows(x) >= FlxqM_CUP_LIMIT) + return FlxqM_pivots_CUP(x, T, p, rr); + return FlxqM_gauss_pivot(x, T, p, rr); +} + GEN FlxqM_image(GEN x, GEN T, ulong p) { long r; - GEN d = FlxqM_gauss_pivot(x,T,p,&r); /* d left on stack for efficiency */ + GEN d = FlxqM_pivots(x, T, p, &r); /* d left on stack for efficiency */ return image_from_pivot(x,d,r); } @@ -1125,17 +2042,47 @@ FlxqM_rank(GEN x, GEN T, ulong p) { pari_sp av = avma; long r; - (void)FlxqM_gauss_pivot(x,T,p,&r); + if (lg(x) - 1 >= FlxqM_CUP_LIMIT && nbrows(x) >= FlxqM_CUP_LIMIT) { + GEN R, C; + r = FlxqM_echelon(x, &R, &C, T, p); + avma = av; return r; + } + (void) FlxqM_pivots(x, T, p, &r); avma = av; return lg(x)-1 - r; } -GEN -FlxqM_det(GEN a, GEN T, ulong p) + +static GEN +FlxqM_det_gen(GEN a, GEN T, ulong p) { void *E; const struct bb_field *S = get_Flxq_field(&E, T, p); return gen_det(a, E, S); } +static GEN +FlxqM_det_CUP(GEN a, GEN T, ulong p) { + pari_sp av = avma; + GEN R, C, U, P, d; + long i, n = lg(a) - 1, r, sv = get_Flx_var(T); + r = FlxqM_CUP(a, &R, &C, &U, &P, T, p); + if (r < n) + d = pol0_Flx(sv); + else { + d = mkvecsmall2(sv, perm_sign(P) == 1? 1: p - 1); + for (i = 1; i <= n; i++) + d = Flxq_mul(d, gcoeff(U, i, i), T, p); + } + return gerepileuptoleaf(av, d); +} + +GEN +FlxqM_det(GEN a, GEN T, ulong p) { + if (lg(a) - 1 >= FlxqM_CUP_LIMIT) + return FlxqM_det_CUP(a, T, p); + else + return FlxqM_det_gen(a, T, p); +} + GEN FlxqM_FlxqC_invimage(GEN A, GEN B, GEN T, ulong p) { void *E; @@ -1164,13 +2111,50 @@ FlxqM_mul(GEN A, GEN B, GEN T, ulong p) { return gen_matmul(A, B, E, ff); } -GEN -FlxqM_invimage(GEN A, GEN B, GEN T, ulong p) { +static GEN +FlxqM_invimage_gen(GEN A, GEN B, GEN T, ulong p) { void *E; const struct bb_field *ff = get_Flxq_field(&E, T, p); return gen_matinvimage(A, B, E, ff); } +static GEN +FlxqM_invimage_CUP(GEN A, GEN B, GEN T, ulong p) { + pari_sp av = avma; + GEN R, Rc, C, U, P, B1, B2, C1, C2, X, Y, Z; + long r, sv = get_Flx_var(T); + r = FlxqM_CUP(A, &R, &C, &U, &P, T, p); + Rc = indexcompl(R, nbrows(B)); + C1 = rowpermute(C, R); + C2 = rowpermute(C, Rc); + if (R[r] != r) { + long i, j; + GEN zero = zero_Flx(sv); + for (j = 1; j <= r; j++) + for (i = 1; i <= R[j] - j; i++) + gcoeff(C2, i, j) = zero; + } + B1 = rowpermute(B, R); + B2 = rowpermute(B, Rc); + Z = FlxqM_rsolve_lower_unit(C1, B1, T, p); + if (!gequal(FlxqM_mul(C2, Z, T, p), B2)) + return NULL; + Y = vconcat(FlxqM_rsolve_upper(vecslice(U, 1, r), Z, T, p), + zero_FlxM(lg(A) - 1 - r, lg(B) - 1, sv)); + X = rowpermute(Y, perm_inv(P)); + return gerepilecopy(av, X); +} + +GEN +FlxqM_invimage(GEN A, GEN B, GEN T, ulong p) { + long nA = lg(A)-1, nB = lg(B)-1; + + if (!nB) return cgetg(1, t_MAT); + if (nA + nB >= FlxqM_CUP_LIMIT && nbrows(B) >= FlxqM_CUP_LIMIT) + return FlxqM_invimage_CUP(A, B, T, p); + return FlxqM_invimage_gen(A, B, T, p); +} + GEN F2xqM_det(GEN a, GEN T) { @@ -1395,7 +2379,7 @@ GEN FqM_deplin(GEN x, GEN T, GEN p) { return FqM_ker_i(x,T,p,1); } static GEN -FlxqM_ker_i(GEN x, GEN T, ulong p, long deplin) +FlxqM_ker_gen(GEN x, GEN T, ulong p, long deplin) { const struct bb_field *ff; void *E; @@ -1405,6 +2389,66 @@ FlxqM_ker_i(GEN x, GEN T, ulong p, long deplin) return gen_ker(x,deplin, E, ff); } +static GEN +FlxqM_ker_echelon(GEN x, GEN T, ulong p) { + pari_sp av = avma; + GEN R, Rc, C, C1, C2, S, K; + long n = lg(x) - 1, r; + r = FlxqM_echelon(shallowtrans(x), &R, &C, T, p); + Rc = indexcompl(R, n); + C1 = rowpermute(C, R); + C2 = rowpermute(C, Rc); + if (R[r] != r) { + long i, j; + GEN zero = zero_Flx(get_Flx_var(T)); + for (j = 1; j <= r; j++) + for (i = 1; i <= R[j] - j; i++) + gcoeff(C2, i, j) = zero; + } + S = FlxqM_lsolve_lower_unit(C1, C2, T, p); + K = vecpermute(shallowconcat(FlxM_neg(S, p), matid_FlxqM(n - r, T, p)), + perm_inv(vecsmall_concat(R, Rc))); + K = shallowtrans(K); + return gerepilecopy(av, K); +} + +static GEN +col_ei_FlxC(long n, long i, long sv) { + GEN v = zero_FlxC(n, sv); + gel(v, i) = pol1_Flx(sv); + return v; +} + +static GEN +FlxqM_deplin_echelon(GEN x, GEN T, ulong p) { + pari_sp av = avma; + GEN R, Rc, C, C1, C2, s, v; + long i, j, n = lg(x) - 1, r, sv = get_Flx_var(T); + r = FlxqM_echelon(shallowtrans(x), &R, &C, T, p); + if (r == n) { avma = av; return NULL; } + Rc = indexcompl(R, n); + i = Rc[1]; + C1 = rowpermute(C, R); + C2 = rowslice(C, i, i); + if (i <= r) { + GEN zero = zero_Flx(sv); + for (j = i; j <= r; j++) + gcoeff(C2, 1, j) = zero; + } + s = row(FlxqM_lsolve_lower_unit(C1, C2, T, p), 1); + settyp(s, t_COL); + v = vecpermute(shallowconcat(FlxC_neg(s, p), col_ei_FlxC(n - r, 1, sv)), + perm_inv(vecsmall_concat(R, Rc))); + return gerepilecopy(av, v); +} + +static GEN +FlxqM_ker_i(GEN x, GEN T, ulong p, long deplin) { + if (lg(x) - 1 >= FlxqM_CUP_LIMIT && nbrows(x) >= FlxqM_CUP_LIMIT) + return deplin? FlxqM_deplin_echelon(x, T, p): FlxqM_ker_echelon(x, T, p); + return FlxqM_ker_gen(x, T, p, deplin); +} + GEN FlxqM_ker(GEN x, GEN T, ulong p) { @@ -1694,6 +2738,8 @@ init_gauss(GEN a, GEN *b, long *aco, long *li, int *iscol) return 1; } +static GEN Flm_inv_sp(GEN a, ulong *detp, ulong p); + static int is_modular_solve(GEN a, GEN b, GEN *u) { @@ -1714,7 +2760,7 @@ is_modular_solve(GEN a, GEN b, GEN *u) if (a) a = F2m_to_mod(a); break; default: - a = Flm_inv(a,pp); + a = Flm_inv_sp(a, NULL, pp); if (a) a = Flm_to_mod(a, pp); } } @@ -2231,19 +3277,68 @@ Flm_gauss_sp(GEN a, GEN b, ulong *detp, ulong p) return u; } +static GEN +Flm_gauss_CUP(GEN a, GEN b, ulong *detp, ulong p) { + GEN R, C, U, P, X, Y; + long i, n = lg(a) - 1, r; + if (nbrows(a) < n || (r = Flm_CUP(a, &R, &C, &U, &P, p)) < n) + return NULL; + Y = Flm_rsolve_lower_unit(rowpermute(C, R), + rowpermute(b, R), p); + X = rowpermute(Flm_rsolve_upper(U, Y, p), + perm_inv(P)); + if (detp) { + ulong d = perm_sign(P) == 1? 1: p-1; + for (i = 1; i <= r; i++) + d = Fl_mul(d, ucoeff(U, i, i), p); + *detp = d; + } + return X; +} + GEN Flm_gauss(GEN a, GEN b, ulong p) { - return Flm_gauss_sp(RgM_shallowcopy(a), RgM_shallowcopy(b), NULL, p); + pari_sp av = avma; + GEN x; + if (lg(a) - 1 >= Flm_CUP_LIMIT) + x = Flm_gauss_CUP(a, b, NULL, p); + else { + a = RgM_shallowcopy(a); + b = RgM_shallowcopy(b); + x = Flm_gauss_sp(a, b, NULL, p); + } + if (!x) { avma = av; return NULL; } + return gerepileupto(av, x); } + +static GEN +Flm_inv_i(GEN a, ulong *detp, ulong p, long inplace) { + pari_sp av = avma; + long n = lg(a) - 1; + GEN b, x; + if (n == 0) return cgetg(1, t_MAT); + b = matid_Flm(nbrows(a)); + if (n >= Flm_CUP_LIMIT) + x = Flm_gauss_CUP(a, b, detp, p); + else { + if (!inplace) + a = RgM_shallowcopy(a); + x = Flm_gauss_sp(a, b, detp, p); + } + if (!x) { avma = av; return NULL; } + return gerepileupto(av, x); +} + static GEN Flm_inv_sp(GEN a, ulong *detp, ulong p) { - if (lg(a) == 1) return cgetg(1,t_MAT); - return Flm_gauss_sp(a, matid_Flm(nbrows(a)), detp, p); + return Flm_inv_i(a, detp, p, 1); } + GEN Flm_inv(GEN a, ulong p) { - return Flm_inv_sp(RgM_shallowcopy(a), NULL, p); + return Flm_inv_i(a, NULL, p, 0); } + GEN Flm_Flc_gauss(GEN a, GEN b, ulong p) { pari_sp av = avma; @@ -2339,6 +3434,26 @@ FlxqM_gauss_gen(GEN a, GEN b, GEN T, ulong p) const struct bb_field *S = get_Flxq_field(&E, T, p); return gen_Gauss(a, b, E, S); } + +static GEN +FlxqM_gauss_CUP(GEN a, GEN b, GEN T, ulong p) { + GEN R, C, U, P, Y; + long n = lg(a) - 1, r; + if (nbrows(a) < n || (r = FlxqM_CUP(a, &R, &C, &U, &P, T, p)) < n) + return NULL; + Y = FlxqM_rsolve_lower_unit(rowpermute(C, R), + rowpermute(b, R), T, p); + return rowpermute(FlxqM_rsolve_upper(U, Y, T, p), + perm_inv(P)); +} + +static GEN +FlxqM_gauss_i(GEN a, GEN b, GEN T, ulong p) { + if (lg(a) - 1 >= FlxqM_CUP_LIMIT) + return FlxqM_gauss_CUP(a, b, T, p); + return FlxqM_gauss_gen(a, b, T, p); +} + GEN FlxqM_gauss(GEN a, GEN b, GEN T, ulong p) { @@ -2346,7 +3461,7 @@ FlxqM_gauss(GEN a, GEN b, GEN T, ulong p) long n = lg(a)-1; GEN u; if (!n || lg(b)==1) { avma = av; return cgetg(1, t_MAT); } - u = FlxqM_gauss_gen(a, b, T, p); + u = FlxqM_gauss_i(a, b, T, p); if (!u) { avma = av; return NULL; } return gerepilecopy(av, u); } @@ -2356,7 +3471,7 @@ FlxqM_inv(GEN a, GEN T, ulong p) pari_sp av = avma; GEN u; if (lg(a) == 1) { avma = av; return cgetg(1, t_MAT); } - u = FlxqM_gauss_gen(a, matid_FlxqM(nbrows(a),T,p), T,p); + u = FlxqM_gauss_i(a, matid_FlxqM(nbrows(a),T,p), T,p); if (!u) { avma = av; return NULL; } return gerepilecopy(av, u); } @@ -2366,7 +3481,7 @@ FlxqM_FlxqC_gauss(GEN a, GEN b, GEN T, ulong p) pari_sp av = avma; GEN u; if (lg(a) == 1) return cgetg(1, t_COL); - u = FlxqM_gauss_gen(a, mkmat(b), T, p); + u = FlxqM_gauss_i(a, mkmat(b), T, p); if (!u) { avma = av; return NULL; } return gerepilecopy(av, gel(u,1)); } @@ -2451,7 +3566,7 @@ ZM_gauss(GEN a, GEN b0) for(;;) { p = u_forprime_next(&S); - C = Flm_inv(ZM_to_Flm(a, p), p); + C = Flm_inv_sp(ZM_to_Flm(a, p), NULL, p); if (C) break; elim -= expu(p); if (elim < 0) return NULL; @@ -2919,7 +4034,6 @@ gauss_pivot_ker(GEN x, GEN x0, GEN *dd, long *rr) static GEN FpM_gauss_pivot(GEN x, GEN p, long *rr); static GEN FqM_gauss_pivot(GEN x, GEN T, GEN p, long *rr); static GEN F2m_gauss_pivot(GEN x, long *rr); -static GEN Flm_gauss_pivot(GEN x, ulong p, long *rry); /* r = dim ker(x). * Returns d: @@ -3002,7 +4116,7 @@ ZM_pivots(GEN M0, long *rr) ulong p = u_forprime_next(&S); long rp; if (!p) pari_err_OVERFLOW("ZM_pivots [ran out of primes]"); - d = Flm_gauss_pivot(ZM_to_Flm(M0, p), p, &rp); + d = Flm_pivots(ZM_to_Flm(M0, p), p, &rp, 1); if (rp == rmin) { rbest = rp; goto END; } /* maximal rank, return */ if (rp < rbest) { /* save best r so far */ rbest = rp; @@ -3287,10 +4401,41 @@ inverseimage(GEN m, GEN v) } static GEN +Flm_invimage_CUP(GEN A, GEN B, ulong p) { + pari_sp av = avma; + GEN R, Rc, C, U, P, B1, B2, C1, C2, X, Y, Z; + long r; + r = Flm_CUP(A, &R, &C, &U, &P, p); + Rc = indexcompl(R, nbrows(B)); + C1 = rowpermute(C, R); + C2 = rowpermute(C, Rc); + if (R[r] != r) { + long i, j; + for (j = 1; j <= r; j++) + for (i = 1; i <= R[j] - j; i++) + ucoeff(C2, i, j) = 0UL; + } + B1 = rowpermute(B, R); + B2 = rowpermute(B, Rc); + Z = Flm_rsolve_lower_unit(C1, B1, p); + if (!gequal(Flm_mul(C2, Z, p), B2)) + return NULL; + Y = vconcat(Flm_rsolve_upper(vecslice(U, 1, r), Z, p), + zero_Flm(lg(A) - 1 - r, lg(B) - 1)); + X = rowpermute(Y, perm_inv(P)); + return gerepileupto(av, X); +} + +static GEN Flm_invimage_i(GEN A, GEN B, ulong p) { GEN d, x, X, Y; long i, j, nY, nA = lg(A)-1, nB = lg(B)-1; + + if (!nB) return cgetg(1, t_MAT); + if (nA + nB >= Flm_CUP_LIMIT && nbrows(B) >= Flm_CUP_LIMIT) + return Flm_invimage_CUP(A, B, p); + x = Flm_ker_sp(shallowconcat(Flm_neg(A,p), B), p, 0); /* AX = BY, Y in strict upper echelon form with pivots = 1. * We must find T such that Y T = Id_nB then X T = Z. This exists iff @@ -3524,16 +4669,17 @@ FpM_suppl(GEN x, GEN p) init_suppl(x); d = FpM_gauss_pivot(x,p, &r); avma = av; return get_suppl(x,d,nbrows(x),r,&col_ei); } + GEN Flm_suppl(GEN x, ulong p) { pari_sp av = avma; GEN d; long r; - init_suppl(x); d = Flm_gauss_pivot(Flm_copy(x),p, &r); + init_suppl(x); d = Flm_pivots(x, p, &r, 0); avma = av; return get_suppl(x,d,nbrows(x),r,&vecsmall_ei); - } + GEN F2m_suppl(GEN x) { @@ -3756,15 +4902,17 @@ FpM_indexrank(GEN x, GEN p) { d = FpM_gauss_pivot(x,p,&r); avma = av; return indexrank0(lg(x)-1, r, d); } + GEN Flm_indexrank(GEN x, ulong p) { pari_sp av = avma; long r; GEN d; init_indexrank(x); - d = Flm_gauss_pivot(Flm_copy(x),p,&r); + d = Flm_pivots(x, p, &r, 0); avma = av; return indexrank0(lg(x)-1, r, d); } + GEN F2m_indexrank(GEN x) { pari_sp av = avma; @@ -3892,7 +5040,7 @@ FlkM_inv(GEN M, GEN P, ulong p) GEN V = cgetg(l, t_VEC); for(i=1; i<l; i++) { - gel(V, i) = Flm_inv(FlxM_eval_powers_pre(M, Fl_powers_pre(uel(R,i), degpol(P), p, pi), p, pi), p); + gel(V, i) = Flm_inv_sp(FlxM_eval_powers_pre(M, Fl_powers_pre(uel(R,i), degpol(P), p, pi), p, pi), NULL, p); if (!gel(V, i)) return NULL; } return FlmV_recover(V,W,p); @@ -4632,7 +5780,7 @@ ZM_det_i(GEN M, long n) p = 0; /* -Wall */ while( cmpii(q, h) <= 0 && (p = u_forprime_next(&S)) ) { - av2 = avma; Dp = Flm_det(ZM_to_Flm(M, p), p); + av2 = avma; Dp = Flm_det_sp(ZM_to_Flm(M, p), p); avma = av2; if (Dp) break; q = muliu(q, p); @@ -4669,7 +5817,7 @@ ZM_det_i(GEN M, long n) Dp = umodiu(D, p); if (!Dp) continue; av2 = avma; - compp = Fl_div(Flm_det(ZM_to_Flm(M, p), p), Dp, p); + compp = Fl_div(Flm_det_sp(ZM_to_Flm(M, p), p), Dp, p); avma = av2; (void) Z_incremental_CRT(&comp, compp, &q, p); } @@ -4709,7 +5857,7 @@ det(GEN a) { case 0: return gerepileupto(av, Fp_to_mod(FpM_det(a,p),p)); break; case 2: d = F2m_det(a); break; - default:d = Flm_det(a,pp); break; + default:d = Flm_det_sp(a, pp); break; } avma = av; return mkintmodu(d, pp); } -- 2.7.3
>From 2404f455d8ad02acf946a71bda6ac8386c83d7f1 Mon Sep 17 00:00:00 2001 From: Peter Bruin <P.J.Bruin@math.leidenuniv.nl> Date: Thu, 16 Feb 2017 07:14:59 +0100 Subject: [PATCH 4/5] add tests --- src/test/32/mat | 32 ++++++++++++++++++++++++++++++++ src/test/in/mat | 33 +++++++++++++++++++++++++++++++++ 2 files changed, 65 insertions(+) diff --git a/src/test/32/mat b/src/test/32/mat index 9a3854e..2eaaff7 100644 --- a/src/test/32/mat +++ b/src/test/32/mat @@ -463,4 +463,36 @@ Vecsmall([2, 2, 0, -4, -10]) [;] [;] [;] +1 +1 +1 +1 +1 +1 +1 +1 +1 +1 +1 +[]~ +[;] +[;] +[;] +[;] +1 +1 +1 +1 +1 +1 +1 +1 +1 +1 +1 +[]~ +[;] +[;] +[;] +[;] Total time spent: 104 diff --git a/src/test/in/mat b/src/test/in/mat index f4c124a..22da0ea 100644 --- a/src/test/in/mat +++ b/src/test/in/mat @@ -154,3 +154,36 @@ q = nextprime(p + 1); matker([p^v[1]*q^v[2], p^v[3]*q^v[4]; p^v[5]*q^v[6], p^v[7]*q^v[8]], 1)); } + +test(x) = +{ + n = 22; r = 12; + M = matrix(n, n, i, j, random(x)); + P = matrix(n, r, i, j, if(2*i >= 3*j + 8, random(x), 0*x)); + N = P*matrix(r, n, i, j, random(x)); + Q = N[,1..r]; + R = mattranspose(P); + X = matsolve(M, N); + K = matker(N); + J = matimage(N); + print(M*X == N); + print(N*K == 0); + print(N*lindep(N) == 0); + print(R*lindep(R) == 0); + print(matimage(J) == J); + print(matrank(J) == r); + print(matrank(K) == n - r); + print(matrank(N) == r); + print(J^-1 * J == matid(r)); + print(matdet(M^-1) == matdet(M)^-1); + print(P*matinverseimage(P, Q) == Q); + print(lindep(J)); + print(matker(K)); + print(matker(M)); + print(matinverseimage(N, M)); + print(matinverseimage(P, M)); + iferr(N^-1, e,, errname(e) == "e_INV"); + iferr(matsolve(N, M), e,, errname(e) == "e_INV"); +} +test(Mod(1, 8161)); +test(ffgen(2017^3)); -- 2.7.3
>From fb5eedb09e5021c80acca2bf59d2c1840549017c Mon Sep 17 00:00:00 2001 From: Peter Bruin <P.J.Bruin@math.leidenuniv.nl> Date: Mon, 6 Mar 2017 12:01:15 +0100 Subject: [PATCH 5/5] update test output --- src/test/32/nf | 2 +- src/test/32/rnfkummer | 49 ++++++++++++++++++++++++------------------------- 2 files changed, 25 insertions(+), 26 deletions(-) diff --git a/src/test/32/nf b/src/test/32/nf index 408f4e8..ae08b2c 100644 --- a/src/test/32/nf +++ b/src/test/32/nf @@ -7,7 +7,7 @@ [1, [], []] 20915648110955829231381594293324156411897455346679838307589120000 571459344155975480004612560667633185714077696 -[54898, [54898], [[7, 0; 0, 1]]] +[54898, [54898], [[17, 15; 0, 1]]] [26, [26], [[19, 15, 18, 5; 0, 1, 0, 0; 0, 0, 1, 0; 0, 0, 0, 1]]] [1, [], []] [3, 3] diff --git a/src/test/32/rnfkummer b/src/test/32/rnfkummer index a179c1e..6923dae 100644 --- a/src/test/32/rnfkummer +++ b/src/test/32/rnfkummer @@ -9,25 +9,24 @@ x^3 + (-455492756088*y^3 + 72746635392*y^2 + 30103312785*y - 532729124469)*x + (-4059137465271111971*y^3 + 652777677080133659*y^2 + 262952422699146889*y - 4742293189040084766) 3 -x^5 + (-2672499745438128324166624371403175/229919*y^3 - 10809720543044940554 -5848004057516450/229919*y^2 + 1946314387168194463838027397756897075/229919*y - + 74382434311247931914387696921552317765/229919)*x^3 + (5129305801893146709 -05707673158636170489515232134558120277188077675/229919*y^3 + 207207789641531 -13686108930191955787281793237860995272607395332803700/229919*y^2 - 375311020 -623795738763355889825099665721475609297058319117285472881825/229919*y - 1430 -3665596945640348838133533759481776279298319916718872305953671726070/229919)* -x + (-2689189441675545313240964630807739710646668174590848349012607345577703 -904627145140/229919*y^3 - 10863446392599003895107039111912213295761589155826 -4911346511142382718596917639198030/229919*y^2 + 1967699497774698568707239554 -459334515216099647613836963121675081938147097045947197610/229919*y + 7499150 -4294653703573704123086807127325197706131462485408742266590128280995815043988 -790/229919) +x^5 + (338906430826501923385346607707925/229919*y^3 - 1457798214872402183464 +6220260998400/229919*y^2 - 202240490266682646928234413452546325/229919*y + 9 +311818671232806283499820722904784405/229919)*x^3 + (-97462976500434619194588 +15391166956054900191851407121368976375225/229919*y^3 + 419234752507246797754 +440638037503934193147867526951273244896871800/229919*y^2 + 58160478163139363 +18046410689887796592739326364262547764388484603275/229919*y - 26779001130318 +2612612928927273255972900605400487205732555912057079030/229919)*x + (1383785 +9122129794978630768313810944358518067010816386533939954568750960963976700/22 +9919*y^3 - 59523232848017545627423707739853544444403314870247075668056585441 +8672632321333000/229919*y^2 - 8257663906803008790492508500548575219481689071 +133261565693002578393264223628510850/229919*y + 3802100637374297497910983850 +11734464444507295841035234635984867875452558433080616962/229919) 4 x^5 + 110*x^3 - 385*x^2 + (-6875*y + 13310)*x + (20625*y + 32197) 5 -x^5 + 1497313810*x^3 - 25448437930974984193005709510660440*x^2 + 14522162707 -1192240665553733936628852906585*x + 4539521583971768757621570094824005976310 -535075196404282324328 +x^5 + 3004517690*x^3 - 28917505076965983010807657580*x^2 - 80540351781678421 +251251538235548863606045415*x + 31011472834856538584168163496443071927439210 +5424319898991797804 6 x^3 + (6*y^4 - 6*y^3 + 6*y^2 - 6*y - 6)*x + (18*y^5 + 22*y^3 + 16*y^2 + 6*y + 2) @@ -51,15 +50,15 @@ x^3 + (22049832980412*y - 801716887536675)*x + (321027751930293073102*y - 11 10 x^5 - 10*x^3 + 20*x + 10 11 -x^5 + (51814919566041140/27*y^3 - 512130951433266020/27*y^2 - 12296256927237 -71840/9*y + 367446491842204360)*x^3 + (3414154904347712686931950/81*y^3 - 33 -744998817726993590949550/81*y^2 - 81021694609866878539643860/27*y + 80705149 -06462415911392980)*x^2 + (-15185325298365785928052588484315210/27*y^3 + 1500 -89494647008327107727889840385730/27*y^2 + 3603646651500540639118318823844391 -40/9*y - 107687024414631416745446738401726620)*x + (-57823304198737588116852 -2135931847246530148/81*y^3 + 5715169306872036759851381101453859510319964/81* -y^2 + 13722113452314566641868897163626950825646644/27*y - 136685022077514790 -3609605218580985801810226) +x^5 + (29680475428049860/9*y^3 + 34921354500260180/9*y^2 - 15477290427050761 +40/3*y - 2577261173805358400)*x^3 + (-12081003902993919799182130/81*y^3 - 14 +214245862026090743389440/81*y^2 + 629980400564212569840119770/27*y + 1165595 +99110534944226492390)*x^2 + (-41713649969867415576551797389166070/27*y^3 - 4 +9079370978551972052667611917864830/27*y^2 + 21752150855081820680445678911540 +81150/9*y + 1207381363290003193446783411707755890)*x + (-1624409998713015666 +0037949992862545643869538/81*y^3 - 19112453838427485951263179674887910269786 +026/81*y^2 + 847070715893511796620187790773814796016991384/27*y + 1567258647 +31306991463206177661754686074438714) 12 x^5 + (-81813555181352222150051800*y - 758707698512046332750174210)*x^3 + (- 694598776821165938962621533366567137400*y - 64414440636774569791913727277961 -- 2.7.3
Attachment:
timings.gp
Description: Binary data
matdet([480, 480]) 48 matrank([480, 480]) 606 matrank([480, 320]) 303 matindexrank([480, 480]) 606 matindexrank([480, 320]) 303 matker([480, 480]) 130 matker([480, 320]) 47 lindep([480, 480]) 130 lindep([480, 320]) 45 matimage([480, 480]) 613 matimage([480, 320]) 304 matsupplement([480, 480]) 613 matsupplement([480, 320]) 306 (t)->my(m=480,n=320,r=240,x=Mod(1, 2017));t^-1([480, 480]) 475 (t)->my(m=480,n=320,r=240,x=Mod(1, 2017));matsolve(t,id)([480, 480]) 472 matsolve([480, 480], [480, 320]) 426 matinverseimage([480, 480], [480, 320]) 434 matinverseimage([480, 320], [480, 240]) 168 (t)->my(m=480,n=320,r=240,x=Mod(1, 2017));matinverseimage(t,id)([480, 480]) 692 matdet([480, 480]) 295 matrank([480, 480]) 1347 matrank([480, 320]) 670 matindexrank([480, 480]) 1347 matindexrank([480, 320]) 670 matker([480, 480]) 555 matker([480, 320]) 228 lindep([480, 480]) 554 lindep([480, 320]) 228 matimage([480, 480]) 1356 matimage([480, 320]) 674 matsupplement([480, 480]) 1356 matsupplement([480, 320]) 677 (t)->my(m=480,n=320,r=240,x=Mod(1, 9223372036854775837));t^-1([480, 480]) 2851 (t)->my(m=480,n=320,r=240,x=Mod(1, 9223372036854775837));matsolve(t,id)([480, 480]) 2840 matsolve([480, 480], [480, 320]) 2006 matinverseimage([480, 480], [480, 320]) 1574 matinverseimage([480, 320], [480, 240]) 618 (t)->my(m=480,n=320,r=240,x=Mod(1, 9223372036854775837));matinverseimage(t,id)([480, 480]) 2489 matdet([120, 120]) 2097 matrank([120, 120]) 121 matrank([120, 80]) 59 matindexrank([120, 120]) 120 matindexrank([120, 80]) 59 matker([120, 120]) 178 matker([120, 80]) 74 lindep([120, 120]) 178 lindep([120, 80]) 74 matimage([120, 120]) 122 matimage([120, 80]) 61 matsupplement([120, 120]) 124 matsupplement([120, 80]) 61 (t)->my(m=120,n=80,r=60,x=x);t^-1([120, 120]) 407 (t)->my(m=120,n=80,r=60,x=x);matsolve(t,id)([120, 120]) 410 matsolve([120, 120], [120, 80]) 395 matinverseimage([120, 120], [120, 80]) 447 matinverseimage([120, 80], [120, 60]) 176 (t)->my(m=120,n=80,r=60,x=x);matinverseimage(t,id)([120, 120]) 444 matdet([120, 120]) 119 (x)->matker(x,1)([120, 120]) 936 msnew(msinit(800, 2, 1)) 464 total time: 36182
matdet([480, 480]) 52 matrank([480, 480]) 54 matrank([480, 320]) 30 matindexrank([480, 480]) 54 matindexrank([480, 320]) 29 matker([480, 480]) 57 matker([480, 320]) 32 lindep([480, 480]) 54 lindep([480, 320]) 28 matimage([480, 480]) 63 matimage([480, 320]) 33 matsupplement([480, 480]) 63 matsupplement([480, 320]) 35 (t)->my(m=480,n=320,r=240,x=Mod(1, 2017));t^-1([480, 480]) 209 (t)->my(m=480,n=320,r=240,x=Mod(1, 2017));matsolve(t,id)([480, 480]) 219 matsolve([480, 480], [480, 320]) 163 matinverseimage([480, 480], [480, 320]) 165 matinverseimage([480, 320], [480, 240]) 70 (t)->my(m=480,n=320,r=240,x=Mod(1, 2017));matinverseimage(t,id)([480, 480]) 221 matdet([480, 480]) 114 matrank([480, 480]) 120 matrank([480, 320]) 62 matindexrank([480, 480]) 114 matindexrank([480, 320]) 59 matker([480, 480]) 122 matker([480, 320]) 62 lindep([480, 480]) 120 lindep([480, 320]) 56 matimage([480, 480]) 123 matimage([480, 320]) 64 matsupplement([480, 480]) 123 matsupplement([480, 320]) 66 (t)->my(m=480,n=320,r=240,x=Mod(1, 9223372036854775837));t^-1([480, 480]) 440 (t)->my(m=480,n=320,r=240,x=Mod(1, 9223372036854775837));matsolve(t,id)([480, 480]) 440 matsolve([480, 480], [480, 320]) 334 matinverseimage([480, 480], [480, 320]) 338 matinverseimage([480, 320], [480, 240]) 143 (t)->my(m=480,n=320,r=240,x=Mod(1, 9223372036854775837));matinverseimage(t,id)([480, 480]) 442 matdet([120, 120]) 49 matrank([120, 120]) 49 matrank([120, 80]) 26 matindexrank([120, 120]) 120 matindexrank([120, 80]) 61 matker([120, 120]) 50 matker([120, 80]) 30 lindep([120, 120]) 50 lindep([120, 80]) 26 matimage([120, 120]) 51 matimage([120, 80]) 27 matsupplement([120, 120]) 123 matsupplement([120, 80]) 62 (t)->my(m=120,n=80,r=60,x=x);t^-1([120, 120]) 151 (t)->my(m=120,n=80,r=60,x=x);matsolve(t,id)([120, 120]) 152 matsolve([120, 120], [120, 80]) 141 matinverseimage([120, 120], [120, 80]) 142 matinverseimage([120, 80], [120, 60]) 60 (t)->my(m=120,n=80,r=60,x=x);matinverseimage(t,id)([120, 120]) 152 matdet([120, 120]) 85 (x)->matker(x,1)([120, 120]) 697 msnew(msinit(800, 2, 1)) 314 total time: 7591