Learning · Step 4

Fixed-point arithmetic

A value sitting in a Q-format is the easy part. The moment you compute with it the format moves: every add and every multiply produces a result whose Q and word width differ from the inputs. Miss that and the result silently overflows, or you quietly throw away the low bits. Two rules cover almost all of it.

Add. Two fixed-point numbers can only be added once their binary points line up — they need the same Q. Aligned, the sum keeps exactly those fractional bits, but the integer side can carry: Qm.n + Qm.n → Q(m+1).n. Add two 16-bit values and the safe result is 17 bits. Sum K of them and you need ⌈log₂K⌉ extra integer guard bits up top.

Multiply. Multiplying multiplies the ranges and adds the resolutions: Qa × Qb → Q(a+b) — the fractional bits add. An N-bit by N-bit product needs 2N bits, so 16×16 → 32. One catch for signed values: the top two bits of the product are both sign. Q15 × Q15 lands in Q30 of a 32-bit word — one bit short of Q31 — which is exactly why DSP hardware offers a fractional multiply that shifts the result left by one.

That wide result rarely gets to stay wide. Sooner or later you store it back into a native word and the low bits have to go. How they go — plain truncation or rounding — decides whether the quantization error averages to zero or builds up as a steady DC offset. The explorer below shows the format arithmetic and that final squeeze together.

Block diagrams

An adder and a multiplier as datapath blocks — inputs on the left, result on the right. Watch the format labels: the output never carries the same format as the inputs.

Adder
INPUTS OUTPUT A Qm.n · N-bit B Qm.n · N-bit + A + B Q(m+1).n Same Q in and out — the word grows one bit for the carry.
Multiplier
INPUTS OUTPUT A Qa · N-bit B Qb · N-bit × A × B Q(a+b) Fractional bits add — the word width doubles to 2N bits.

Try it

Pick an operation and the operand formats — the result's bit field updates live. Then choose where to store it and how to round, and watch the error and its bias.

sign redundant sign integer fraction dropped

Store the result back into a word

Examples:

What to notice

Output truncation — best practice

Step exam

Answer all 3 questions correctly to complete this step.

  1. Multiplying a Qa value by a Qb value gives a result in which format?

  2. Adding two N-bit fixed-point numbers needs how many bits to stay safe from overflow?

  3. An N-bit by N-bit signed multiply produces a result of what width?