Learning · Step 2
Q-format basics
Fixed point drops the per-number exponent. A Q-format
number is just a plain integer — you decide, once, where the binary
point sits. Store the integer i; it stands for the real
value i / 2^Q, where Q is how
many bits you put after the point.
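Here is a minimal Python sketch of the decode rule. Python's unbounded `int` stands in for the stored word, and the helper name `q_to_float` is hypothetical. It reads one bit pattern, 0x4000, under several choices of Q:

```python
def q_to_float(i: int, q: int) -> float:
    """Decode a Q-format integer: the real value is i / 2**q."""
    return i / (1 << q)

# The stored integer never changes; only the agreed-upon Q does.
raw = 0x4000  # 16384
for q in (0, 8, 14, 15):
    print(f"Q{q}: {raw} -> {q_to_float(raw, q)}")
```

The bits are identical on every line. Only the Q you decided on changes what they mean: 16384 at Q0, 64 at Q8, 1.0 at Q14, 0.5 at Q15.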
A signed N-bit word splits into 1 sign bit,
N − 1 − Q integer bits, and
Q fractional bits. Q15 in an
int16_t is the classic: 1 sign bit, no integer bits, and 15 fractional bits,
covering [−1, 1) in steps of 2^−15. Move the
point right (smaller Q) for more range, or left (bigger Q) for finer
resolution. With a fixed word width you can't have both.
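The range and step size follow directly from the bit split. A signed N-bit two's-complement word holds integers from −2^(N−1) to 2^(N−1) − 1, so dividing by 2^Q gives the real range, and one LSB is worth 2^−Q. The sketch below assumes two's-complement storage; `q_range` is a hypothetical helper:

```python
def q_range(n_bits: int, q: int):
    """Return (min, max, lsb) of a signed two's-complement n_bits word with q fractional bits."""
    if not 0 <= q <= n_bits - 1:
        raise ValueError("q must be between 0 and n_bits - 1")
    scale = 1 << q                              # 2**q
    lo = -(1 << (n_bits - 1)) / scale           # most negative stored integer, decoded
    hi = ((1 << (n_bits - 1)) - 1) / scale      # most positive stored integer, decoded
    lsb = 1 / scale                             # one step: 2**-q
    return lo, hi, lsb

# Same 16-bit word, three placements of the binary point.
for q in (0, 8, 15):
    lo, hi, lsb = q_range(16, q)
    int_bits = 16 - 1 - q
    print(f"int16 Q{q}: {int_bits} integer bits, range [{lo}, {hi}], LSB {lsb}")
```

Going from Q0 to Q15 shrinks the range from [−32768, 32767] to [−1, 1 − 2^−15], and the LSB falls from 1 to 2^−15. That is the range-versus-resolution trade-off in numbers.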
Converting in is just rounding: i = round(x × 2^Q),
clamped to the word. Converting out is i / 2^Q. There's
no exponent to rescale per sample — that's what makes fixed point
fast, and what makes choosing Q the whole game.
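Both conversions fit in a few lines. This is a sketch, not a reference implementation. The names `float_to_q` and `q_to_float` are hypothetical, the word is assumed to be a signed two's-complement integer, and the rounding mode (half away from zero) is an assumed choice, because the text only says "rounding":

```python
import math

def float_to_q(x: float, q: int, n_bits: int = 16) -> int:
    """Encode x as a signed n_bits Q-format integer: round(x * 2**q), clamped to the word."""
    scaled = x * (1 << q)
    # Assumed rounding mode: half away from zero (Python's round() would round half to even).
    i = math.floor(scaled + 0.5) if scaled >= 0 else math.ceil(scaled - 0.5)
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    return max(lo, min(hi, i))  # saturate instead of wrapping

def q_to_float(i: int, q: int) -> float:
    """Decode: the real value is i / 2**q."""
    return i / (1 << q)

for x in (0.5, 0.1, -0.25, 1.0, -1.0):
    i = float_to_q(x, 15)
    print(x, "->", i, "->", q_to_float(i, 15))
```

At Q15, 0.5 encodes to 16384 and round-trips exactly. 1.0 would need 32768, which does not fit in an int16_t, so it is clamped to 32767. Clamping makes an out-of-range input land on the nearest representable value instead of wrapping around to the opposite sign.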
Try it
Slide the binary point with Q, switch the word width,
click bits, or type a value — the decode, range, and resolution
update live.
What to notice
- Slide Q up: the binary point moves left, the step size (LSB) shrinks, but the range collapses toward [−1, 1). That trade-off is the whole reason to think about Q.
- Type 0.1: just as in a float, it can't land exactly, and the quantization error is shown.
- Type 1.0 at Q15: it saturates, because the largest Q15 value is 1 − 2^−15, just short of 1.
- Set Q = 0: the point is at the far right and the format is a plain signed integer again.
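These observations can also be reproduced without the interactive widget. The following Python sketch assumes a 16-bit signed word and round-half-away-from-zero. `quantize` is a hypothetical helper that returns the stored integer, the decoded value, and the quantization error:

```python
import math

def quantize(x: float, q: int, n_bits: int = 16):
    """Encode x into a signed n_bits Q-format word and decode it back.
    Returns (stored integer, decoded value, quantization error)."""
    scaled = x * (1 << q)
    # Assumed rounding mode: half away from zero; then saturate to the word.
    i = math.floor(scaled + 0.5) if scaled >= 0 else math.ceil(scaled - 0.5)
    i = max(-(1 << (n_bits - 1)), min((1 << (n_bits - 1)) - 1, i))
    decoded = i / (1 << q)
    return i, decoded, decoded - x

# 0.1 never lands exactly; the error shrinks as Q grows.
for q in (4, 8, 15):
    print(f"0.1 at Q{q}:", quantize(0.1, q))

# 1.0 at Q15 saturates to the largest stored integer, 32767.
print("1.0 at Q15:", quantize(1.0, 15))

# At Q = 0 the word is a plain integer: 1234.0 is stored as 1234 exactly.
print("1234 at Q0:", quantize(1234.0, 0))
```

For 0.1 at Q15 the stored integer is 3277 (0.1 × 32768 = 3276.8 rounds up). That decodes to about 0.1000061, an error of about 6.1 × 10^−6, which is within half an LSB.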
Step exam
Answer all 3 questions correctly to complete this step.
- What does the "15" in a Q15 format count?
- For a 16-bit Q15 value, what is the resolution (one LSB)?
- What is the approximate range of a signed Q15 value?