Most implementations of C use the IEEE 754 standard for floating point representation. In this representation, floats are encoded using 1 sign bit, 8 exponent bits, and 23 mantissa bits. Doubles are encoded and used exactly the same way, except they use 1 sign bit, 11 exponent bits, and 52 mantissa bits. These bits encode the values of s, the sign; M, the significand; and E, the exponent. Floating point numbers are then calculated as $(-1)^s \times M \times 2^E$.

Ordinarily, all of the mantissa bits are used to express significant figures, in addition to a leading 1, which is implied and, therefore, left out. Thus, floats ordinarily have 24 significant bits of precision, and doubles ordinarily have 53 significant bits of precision. Such numbers are called normalized numbers. All floating point numbers are limited in the sense that they have fixed precision. See guideline FLP00-C. Understand the limitations of floating point numbers.

However, mantissa bits are also used to express extremely small numbers that are too small to encode normally because of the lack of available exponent bits. Using mantissa bits in this way extends the possible range of exponents. Because these bits no longer function as significant bits of precision, the total precision of extremely small numbers is less than usual. Such numbers are called denormalized, and they are more limited than normalized numbers. However, even using normalized numbers where precision is required can pose a risk. See guideline FLP02-C. Avoid using floating point numbers when precise computation is needed for more information.

Denormalized numbers can severely impair the precision of floating point computations and should not be used.

...

This code produces the following output on implementations that use IEEE 754 floats:

Code Block
Original      : 3.333333e-01
Denormalized? : 2.802597e-45
Restored      : 4.003710e-01

...

Don't produce code that could use denormalized numbers. If floats are producing denormalized numbers, use doubles instead.

Code Block
#include <stdio.h>

int main(void) {
  /* double has enough exponent range to keep these values normalized */
  double x = 1/3.0;
  printf("Original      : %e\n", x);
  x = x * 7e-45;
  printf("Denormalized? : %e\n", x);
  x = x / 7e-45;
  printf("Restored      : %e\n", x);
  return 0;
}

...

If using doubles also produces denormalized numbers, using long doubles may or may not help. (On some implementations, long double has the same exponent range as double.) If using long doubles produces denormalized numbers, some other solution must be found.

...

According to ISO/IEC 9899:TC3 §7.19.6.1:

A double argument representing a floating-point number is converted in the style [-]0xh.hhhhp±d, where there is one hexadecimal digit (which is nonzero if the argument is a normalized floating-point number and is otherwise unspecified) before the decimal-point character

...

On a 32-bit Linux machine using gcc version 4.3.2, this code produces the following output:

Code Block
normalized float with %e    : 2.350989e-38
normalized float with %a    : 0x1p-125
denormalized float with %e  : 7.174648e-43
denormalized float with %a  : 0x1p-140
normalized double with %e   : 8.900295e-308
normalized double with %a   : 0x1p-1020
denormalized double with %e : 8.289046e-317
denormalized double with %a : 0x0.0000001p-1022

...

[IEEE 754]
[Bryant 03] Computer Systems: A Programmer's Perspective. Section 2.4 Floating Point
[ISO/IEC 9899:1999]

...