
Saturday, 13 June 2020

R Packette - Fraction Matrix Operations


Open up any linear algebra textbook and have a look at the matrix entries. Are they all integers? Are they all written as decimals? There are probably at least some fractions. Matrices in computer programs, by contrast, are almost always in decimal form, because computers store non-whole numbers as floating-point values. The exception is symbolic mathematics programs like Maple and Mathematica.

Floating-point values are fine most of the time, but they're often not exact. What happens if we work with the fractions directly?



The fractions 1/2, 1/4, and 3/4 are represented exactly in binary as 0.1, 0.01, and 0.11 respectively, but 1/3 and 1/5 are repeating values that can't be represented exactly. In binary, 1/3 is 0.01010101… and 1/5 is 0.00110011…, so precision is lost just by storing these numbers; it can only get worse with calculations.
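You can see this in R by printing a stored value to more digits than the default display shows:

# 1/5 has no exact binary representation; printing extra digits
# exposes the rounding that happened at storage time.
print(1/5, digits = 20)
[1] 0.2000000000000000111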

From there, any calculation using that number is going to propagate and accumulate the rounding error. With a fraction of integers, you can maintain exactness longer.
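The classic demonstration, in R:

# The stored values of 0.1 and 0.2 don't sum to the stored value of 0.3.
# As fractions of integers, 1/10 + 2/10 = 3/10 exactly.
0.1 + 0.2 == 0.3
[1] FALSE
(0.1 + 0.2) - 0.3
[1] 5.551115e-17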

The other possible advantage is speed (although not in my implementation, which is written in R rather than C or assembly). Computers can add, subtract, and multiply numbers in binary very efficiently. However, binary division has to be done with long division*, which is roughly N times slower, where N is the number of bits of accuracy.

But with fractions, division is just cross-multiplication, which is two quick multiplications instead of one slow division.
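As a quick illustration (a made-up numeric example, not the packette's code):

# (3/4) / (2/5) becomes (3*5) / (4*2) = 15/8.
# No division happens until the fraction is converted back to a decimal,
# and none at all if the result stays in fraction form.
top    = 3 * 5
bottom = 4 * 2
top / bottom
[1] 1.875
(3/4) / (2/5)
[1] 1.875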

I wrote a set of R functions that handles some common matrix operations for matrices of fractions instead of floating-point numbers. Only in edge cases are these preferable to the existing functions; they are meant as a proof of concept.

The functions, found in this source file in this Google Docs folder, perform some scalar and matrix operations by taking fractions as input, returning fractions as output, and at no internal point relying on floating-point arithmetic.

This isn't trivial. Consider the inner product of two vectors of length n:

$$\sum_{i=1}^{n} A_i C_i$$
If we write A and C instead as fractions A/B and C/D respectively, things get messier:

$$\sum_{i=1}^{n} \frac{A_i}{B_i} \cdot \frac{C_i}{D_i} = \frac{\sum_{i=1}^{n} A_i C_i \prod_{j \neq i} B_j D_j}{\prod_{i=1}^{n} B_i D_i}$$
This formula is used in the function innerprod.fracvec(), which takes two vectors of fractions and returns a single fraction. 
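Here's a minimal sketch of that formula in plain R, assuming the fractions are stored as parallel integer vectors of tops and bottoms (innerprod_sketch is a hypothetical stand-in, not the packette's actual innerprod.fracvec):

# Inner product of fraction vectors A/B and C/D as one fraction,
# using only integer addition and multiplication.
innerprod_sketch = function(A, B, C, D) {
  BD = B * D                            # denominator of each term
  top = 0
  for (i in seq_along(A)) {
    # Scale each numerator A_i * C_i by the product of the other denominators
    top = top + A[i] * C[i] * prod(BD[-i])
  }
  list(top = top, bottom = prod(BD))
}

innerprod_sketch(c(1, 1), c(2, 3), c(1, 1), c(1, 1))  # (1/2) + (1/3) = 5/6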

Alternatively, one could rewrite matrix operations in terms of scalar fraction operations instead of trying to do everything in one big formula, but there's a loss of speed in doing basic arithmetic as function calls. This is what I ended up doing for fracmat.upper.tri(), the function that converts a square matrix into upper triangular form.

Here are a few test cases, selected from the test case file in the Google Docs folder.

 
## Case 1
# Matrix multiplication, floating point vs fraction matrix
# Make some random matrices of fractions, and get their product
A = matrix(sample(1:9),nrow=3,ncol=3)
B = matrix(sample(1:9),nrow=3,ncol=3)
F = fracmat(A,B)
C = matrix(sample(1:9),nrow=3,ncol=3)
D = matrix(sample(1:9),nrow=3,ncol=3)
G = fracmat(C,D)

F_float = A/B
G_float = C/D
prod_float = F_float %*% G_float

FG = multiply.fracmat(F,G)
FG
$`top`
     [,1] [,2] [,3]
[1,]  143   97  577
[2,] 2837   23   59
[3,]  172   29   61

$bottom
     [,1] [,2] [,3]
[1,]   36   42  126
[2,]  420   15   15
[3,]    7    4    4

prod_frac = FG$top / FG$bottom

prod_float
          [,1]     [,2]      [,3]
[1,]  3.972222 2.309524  4.579365
[2,]  6.754762 1.533333  3.933333
[3,] 24.571429 7.250000 15.250000

prod_frac
          [,1]     [,2]      [,3]
[1,]  3.972222 2.309524  4.579365
[2,]  6.754762 1.533333  3.933333
[3,] 24.571429 7.250000 15.250000

sum(abs(prod_float - prod_frac))
[1] 1.554312e-15


## Case 2
# Get the inverse and determinant of a random matrix
# Compare to the answers from base::det and base::solve

A = matrix(sample(1:20)[1:9],nrow=3,ncol=3)
B = matrix(sample(1:20)[1:9],nrow=3,ncol=3)
A
     [,1] [,2] [,3]
[1,]   20   18    3
[2,]    2   16    8
[3,]    4   10    7
B
     [,1] [,2] [,3]
[1,]    8    9   17
[2,]   10    1   18
[3,]   12   14    5

F = fracmat(A,B) #Fraction matrix object
F_float = A/B #Floating point matrix

# Get the determinants, compare.
fracmat.det(F)
$`top`
[1] 8592592140

$bottom
[1] 159043500

fracmat.det(F)$top / fracmat.det(F)$bottom
[1] 54.02668
det(F_float)
[1] 54.02668



# Get the inverse of each, display, and compare
Finv1 = fracmat.inverse(F)$top / fracmat.inverse(F)$bottom
Finv2 = solve(F_float)

Finv1
             [,1]        [,2]        [,3]
[1,]  0.408733982 -0.04949313 -0.03580898
[2,] -0.002440495  0.06369402 -0.01991270
[3,] -0.096072464 -0.02071287  0.73297120

Finv2
             [,1]        [,2]        [,3]
[1,]  0.408733982 -0.04949313 -0.03580898
[2,] -0.002440495  0.06369402 -0.01991270
[3,] -0.096072464 -0.02071287  0.73297120

sum(abs(Finv1 - Finv2))
[1] 1.065684e-12

Here's a catalogue of the functions included in this packette. The source file has more detailed comments explaining each one.


These functions establish a new fraction matrix, vector, or scalar value from a pair of matrices, vectors, or scalars of equal dimensions (the top and the bottom, respectively). When enforce_integer = TRUE, the function rounds each entry of the top and bottom parts. The function fraction() is much simpler and doesn't include rounding or quality checks because it's only intended for internal use.
fracmat(top=NA, bottom=NA, enforce_integer=TRUE)
fracvec(top=NA, bottom=NA, enforce_integer=TRUE)
fraction(A,B)

These functions convert a fraction matrix or vector back into a matrix or vector of decimal (floating-point) numbers.
fracvec.to.vec(fv)
fracmat.to.mat(fm)


There are shortcuts to produce commonly used fraction matrices:
allsame.fracmat(top, bottom=1, nrow, ncol)
allzero.fracmat(nrow, ncol)
allone.fracmat(nrow, ncol)
identity.fracmat(size)


Many of the following functions have an argument called 'simplify', which defaults to TRUE. These functions call simplify.fracmat, which divides the numerator and the denominator of each fraction by their greatest common divisor.
simplify.fracmat(fm, nPrimes=10, report_overflow = FALSE, fuzzy_reduce = TRUE, maxint = 10^15)
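For reference, the classical reduction step looks like this (a sketch using the Euclidean algorithm with hypothetical names; judging from the nPrimes argument, simplify.fracmat itself appears to work by trial division with small primes, so treat this as an analogy rather than the actual internals):

# Euclidean GCD, then divide both parts of the fraction by it.
gcd = function(a, b) {
  while (b != 0) {
    r = a %% b
    a = b
    b = r
  }
  a
}
reduce_fraction = function(top, bottom) {
  g = gcd(abs(top), abs(bottom))
  list(top = top / g, bottom = bottom / g)
}

reduce_fraction(84, 126)  # 84/126 reduces to 2/3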

If any part of the fraction matrix is larger than maxint, which is close to the limit at which the modulo function is still exact, fuzzy_reduce will scale the fraction in question down so that the larger of the numerator and denominator equals maxint.
fuzzy_reduce(top, bottom, limit = 10^15)
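A sketch of that scaling idea (assumed from the description above, with a hypothetical name; not the actual fuzzy_reduce source):

# Shrink an oversized fraction so its larger part equals the limit.
# This deliberately trades exactness for staying in safe integer range.
fuzzy_reduce_sketch = function(top, bottom, limit = 10^15) {
  scale = limit / max(abs(top), abs(bottom))
  list(top = round(top * scale), bottom = round(bottom * scale))
}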

These do elementwise addition and subtraction of fractions in two fraction matrices or vectors.
add.fracmat(fracmatAB, fracmatCD, subtraction = FALSE)
subtract.fracmat(fracmatAB, fracmatCD)

This does matrix multiplication of fraction matrices. It does this by iteratively calling the inner product function on rows of the left matrix (fracmatAB) and columns of the right matrix (fracmatCD).
multiply.fracmat(fracmatAB, fracmatCD, simplify=TRUE)
innerprod.fracvec(fracvecAB, fracvecCD, simplify=TRUE)

Just printing a fracmat object will give you its two matrices, but print.fracmat() will output each fraction as text with the numerator and denominator together.
print.fracmat(fm)
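The display presumably amounts to something like this one-liner (a hypothetical approximation, for illustration):

# One "numerator/denominator" string per entry; fm is any fracmat object,
# such as F from Case 2 above.
matrix(paste0(fm$top, "/", fm$bottom), nrow = nrow(fm$top))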


This function converts a single matrix of floats (floatmat) to a fraction matrix. It sets the denominators to 10^max_digits and the numerators to floatmat*10^max_digits (it will use fewer digits if exactness is reached sooner).
mat.to.fracmat(floatmat, max_digits = 6, simplify = TRUE)
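In other words, something like this sketch of the described behaviour (ignoring the fewer-digits shortcut; the name is hypothetical):

# Fixed denominator 10^max_digits; numerator is the rounded, scaled float.
mat_to_fracmat_sketch = function(floatmat, max_digits = 6) {
  bottom = matrix(10^max_digits, nrow = nrow(floatmat), ncol = ncol(floatmat))
  list(top = round(floatmat * 10^max_digits), bottom = bottom)
}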

               
Performs the row operations on a fraction matrix fm that turn it into upper triangular form. Returns the upper triangular matrix U and 'inverse', which records the same row operations applied starting from the identity matrix.
fracmat.upper.tri(fm, track = FALSE, output_inverse = TRUE)
fracmat.upper.tri.to.ident(fmU, inv_fm, track = FALSE)


These get the determinant and the inverse of a fraction matrix. They do so with calls to fracmat.upper.tri and fracmat.upper.tri.to.ident.
fracmat.det(m, track=FALSE, output_float = FALSE)
fracmat.inverse(m, track=FALSE)
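The determinant step follows the standard route: once the matrix is upper triangular, the determinant is the product of the diagonal entries, and for fractions the tops and bottoms multiply separately. A conceptual sketch, assuming the $top/$bottom layout shown in the test cases rather than the function's actual code:

# U is an upper-triangular fraction matrix from fracmat.upper.tri;
# sign changes from any row swaps would also need to be tracked.
det_top    = prod(diag(U$top))
det_bottom = prod(diag(U$bottom))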


There are also decimal (floating-point) matrix versions of these high-level functions, which are useful for developing and testing the fraction matrix counterparts. float.inverse and float.det should behave identically to base::solve() and base::det() on single square matrices.
float.det(m, track=FALSE)
float.inverse(m, track=FALSE)
float.upper.tri(m, track=FALSE, output_inverse =TRUE)
float.upper.tri.to.ident(U, inv_m, track=FALSE)


These perform elementary fraction arithmetic operations. They are used internally by row-operation functions like fracmat.upper.tri.
negate.fraction(fm, negate_denom = FALSE)
add.fraction(fracAB, fracCD)
subtract.fraction(fracAB, fracCD)
multiply.fraction(fracAB, fracCD)
divide.fraction(fracAB, fracCD)
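The identities behind these are the familiar ones. For example, addition and multiplication look like this (hypothetical standalone versions taking four integers, where the packette's functions take fraction objects):

# a/b + c/d = (a*d + c*b) / (b*d)
add_fraction_sketch = function(a, b, c, d) {
  list(top = a * d + c * b, bottom = b * d)
}

# (a/b) * (c/d) = (a*c) / (b*d)
multiply_fraction_sketch = function(a, b, c, d) {
  list(top = a * c, bottom = b * d)
}

add_fraction_sketch(1, 2, 1, 3)  # 1/2 + 1/3 = 5/6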




If I come back to this, this is my wish list:

General development:
- More unit testing, stress testing, and speed testing.
- Further development into a bona fide package.
- Is any of this paper-worthy?

Additional matrix functions:
- Recording of row operations.
- LU decomposition.
               
Additional presentation:
- Print out records of matrices after row operations in a way that linear algebra students can use.
- Print fraction matrices as LaTeX code.

Avenues for improved fraction simplification:
- Integration with the big64 package, so 64-bit integers can be handled exactly.
- Possible integration with the VeryLargeIntegers package, so arbitrarily large integers can be handled; however, since VeryLargeIntegers operates by string manipulation, it is much slower than floating-point arithmetic.
- Possibly passing the whole thing over to RcppBigIntAlgos, which is designed to factor large integers.

But what do you think? I am open to suggestions at jackd@sfu.ca, jack.davis@sportlogiq.com, or @jack_davis_sfu on Twitter.

* (or multiplication by an inverse, or subtraction of logarithms, which are all either slower or much more memory intensive)

References:

Linear Algebra: An Interactive Approach by Jain and Gunawardena, especially the worked example on page 118, which was used for the detailed test case.

The End of Error: Unum Computing by John L. Gustafson, for its alternative approaches to rounding error.

Shout out to Ashley Lindalia the astrochemist, because 'astrochemist' is an amazing title and because planetary science matters more than anything I do.
