...
When precise computation is necessary, carefully and methodically estimate the maximum cumulative error of the computations, regardless of whether decimal or binary floating point is used, to ensure that the resulting error is within tolerance. Consider using numerical analysis to understand the numerical properties of the problem. A useful introduction can be found in [Goldberg 91].
Non-Compliant Code Example
This non-compliant code example computes the mean of 10 numbers and then checks whether the mean equals the first number. It should, because all 10 numbers are 10.1. However, 10.1 has no exact binary floating-point representation, and each addition introduces a further rounding error, so the computed mean does not equal the stored value.
```c
#include <stdio.h>

/* Returns the mean value of the array */
float mean(float array[], int size) {
  float total = 0.0;
  int i;
  for (i = 0; i < size; i++) {
    total += array[i];
    printf("array[%d] = %f and total is %f\n", i, array[i], total);
  }
  return total / size;
}

enum {array_size = 10};
float array_value = 10.1;

int main(void) {
  float array[array_size];
  float avg;
  int i;
  for (i = 0; i < array_size; i++) {
    array[i] = array_value;
  }
  avg = mean(array, array_size);
  printf("mean is %f\n", avg);
  /* Exact equality comparison of floating-point values */
  if (avg == array[0]) {
    printf("array[0] is the mean\n");
  } else {
    printf("array[0] is not the mean\n");
  }
  return 0;
}
```
On a 64-bit Linux machine using gcc 4.1, this program yields the following output:
```
array[0] = 10.100000 and total is 10.100000
array[1] = 10.100000 and total is 20.200001
array[2] = 10.100000 and total is 30.300001
array[3] = 10.100000 and total is 40.400002
array[4] = 10.100000 and total is 50.500000
array[5] = 10.100000 and total is 60.599998
array[6] = 10.100000 and total is 70.699997
array[7] = 10.100000 and total is 80.799995
array[8] = 10.100000 and total is 90.899994
array[9] = 10.100000 and total is 100.999992
mean is 10.099999
array[0] is not the mean
```
Compliant Solution
This compliant solution replaces the floating-point values with integers for the internal computation. Each value is stored as an integer number of hundredths (10.1 becomes 1010), so the additions and the final division are exact for this data. Floating point is used only when printing results.
```c
#include <stdio.h>

/* Returns the mean value of the array; values are in hundredths */
int mean(int array[], int size) {
  int total = 0;
  int i;
  for (i = 0; i < size; i++) {
    total += array[i];
    /* Convert to floating point only for display */
    printf("array[%d] = %f and total is %f\n", i, array[i] / 100.0, total / 100.0);
  }
  return total / size;
}

enum {array_size = 10};
int array_value = 1010; /* 10.1 expressed in hundredths */

int main(void) {
  int array[array_size];
  int avg;
  int i;
  for (i = 0; i < array_size; i++) {
    array[i] = array_value;
  }
  avg = mean(array, array_size);
  printf("mean is %f\n", avg / 100.0);
  /* Exact equality comparison of integers */
  if (avg == array[0]) {
    printf("array[0] is the mean\n");
  } else {
    printf("array[0] is not the mean\n");
  }
  return 0;
}
```
On a 64-bit Linux machine using gcc 4.1, this program yields the following output, which is what we expect:
```
array[0] = 10.100000 and total is 10.100000
array[1] = 10.100000 and total is 20.200000
array[2] = 10.100000 and total is 30.300000
array[3] = 10.100000 and total is 40.400000
array[4] = 10.100000 and total is 50.500000
array[5] = 10.100000 and total is 60.600000
array[6] = 10.100000 and total is 70.700000
array[7] = 10.100000 and total is 80.800000
array[8] = 10.100000 and total is 90.900000
array[9] = 10.100000 and total is 101.000000
mean is 10.100000
array[0] is the mean
```
Risk Analysis
Using a representation other than floating point may allow for more precision and accuracy for critical arithmetic.
...