English for Computer Vision Engineers

Master the vocabulary for discussing bounding boxes, IoU, augmentation, and model evaluation in computer vision engineering work.

Computer vision work has its own precise vocabulary for describing detection, segmentation, and evaluation, and small imprecisions can derail a discussion. Saying “the model missed it” instead of specifying whether it was a false negative in detection or a poor mask boundary in segmentation sends colleagues investigating the wrong stage of the pipeline. This guide covers the essential English vocabulary computer vision engineers use daily.

Key Vocabulary

Bounding box: A rectangle, typically defined by its corner coordinates or by a center point plus width and height, that tightly encloses an object of interest in an image. It is the standard output format for object detection models. Example: “The bounding box for the pedestrian is shifted about 10 pixels too far right, which is throwing off our distance estimate.”

IoU (Intersection over Union): A metric measuring the overlap between a predicted bounding box (or mask) and the ground truth, calculated as the area of overlap divided by the area of union, used to judge how well a prediction is localized. Example: “We’re using an IoU threshold of 0.5 to count a detection as correct; anything below that is treated as a miss.”
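The overlap-over-union calculation can be sketched in a few lines. This is a minimal illustration, assuming boxes are given as (x1, y1, x2, y2) corner coordinates in pixels; the function name `iou` and the example boxes are made up for this guide.

```python
def iou(box_a, box_b):
    """Intersection over Union for two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlap rectangle (it may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp at zero so non-overlapping boxes give zero intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


ground_truth = (10, 10, 50, 50)
prediction = (20, 10, 60, 50)  # same size, shifted 10 px to the right
print(iou(prediction, ground_truth))  # 0.6
```

A 10-pixel shift on a 40-pixel box still passes a 0.5 threshold here, which is why “the box is off” and “the detection was counted as a miss” are separate statements.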

Non-maximum suppression (NMS): A post-processing step that removes duplicate, overlapping bounding box predictions for the same object, keeping only the highest-confidence box. Example: “Without non-maximum suppression, the model was outputting five overlapping boxes for a single car.”
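A greedy version of this step might look like the sketch below. It assumes a single object class, boxes as (x1, y1, x2, y2), and detections as (box, score) pairs; the names `nms` and `iou` and the sample detections are illustrative. Production pipelines usually run NMS per class and rely on a library implementation.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy NMS over (box, score) pairs; returns the detections that survive."""
    remaining = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)  # highest-confidence box still in play
        kept.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        remaining = [d for d in remaining if iou(best[0], d[0]) < iou_threshold]
    return kept


detections = [
    ((100, 100, 200, 200), 0.90),
    ((105, 102, 205, 198), 0.75),  # near-duplicate of the first box
    ((98, 97, 195, 203), 0.60),    # another near-duplicate
    ((400, 120, 480, 220), 0.80),  # a different object
]
kept = nms(detections)
print([score for _, score in kept])  # [0.9, 0.8]
```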

Segmentation (semantic vs. instance): Semantic segmentation labels every pixel with a class (like “road” or “sky”) without distinguishing individual objects; instance segmentation additionally separates distinct objects of the same class from one another. Example: “Semantic segmentation would label all cars as ‘car,’ but instance segmentation gives us a separate mask for each individual vehicle.”

Augmentation: Applying transformations (rotation, flipping, color jitter, cropping) to training images to artificially increase dataset diversity and improve model generalization. Example: “We added random occlusion augmentation so the model learns to detect partially blocked objects, not just fully visible ones.”
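For detection data, a geometric augmentation has to transform the labels along with the pixels. The sketch below shows a horizontal flip under some assumptions made for illustration: the image is a plain list of pixel rows, boxes are (x1, y1, x2, y2) pixel-edge coordinates with x2 exclusive, and the helper names `hflip` and `random_hflip` are hypothetical.

```python
import random


def hflip(image, boxes):
    """Flip an image (list of pixel rows) left-to-right and mirror its boxes."""
    width = len(image[0])
    flipped_image = [row[::-1] for row in image]
    # Mirroring swaps the roles of the left and right edges.
    flipped_boxes = [(width - x2, y1, width - x1, y2) for (x1, y1, x2, y2) in boxes]
    return flipped_image, flipped_boxes


def random_hflip(image, boxes, rng, p=0.5):
    """Apply the flip with probability p, using the caller's random generator."""
    if rng.random() < p:
        return hflip(image, boxes)
    return image, boxes


image = [[0, 0, 0, 9],
         [0, 0, 0, 9]]
boxes = [(3, 0, 4, 2)]  # encloses the column of 9s

flipped_image, flipped_boxes = hflip(image, boxes)
print(flipped_image)  # [[9, 0, 0, 0], [9, 0, 0, 0]]
print(flipped_boxes)  # [(0, 0, 1, 2)]
```

Forgetting the box update is a common source of the “ground truth looks wrong after augmentation” conversation.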

False positive / false negative: A false positive occurs when the model detects something that isn’t actually there; a false negative occurs when the model fails to detect something that is genuinely present. Example: “The rate of false positives from shadows being detected as objects went up after we lowered the confidence threshold.”
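How these counts arise in detection can be sketched with a simple matching routine. This is one possible scheme, not a standard benchmark implementation: it assumes a single class, greedy matching in descending score order, and an IoU threshold of 0.5. The function names and the sample boxes are invented for the example.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(predictions, ground_truth, iou_threshold=0.5):
    """Return (true_positives, false_positives, false_negatives)."""
    unmatched = list(ground_truth)
    tp = fp = 0
    for box, _score in sorted(predictions, key=lambda p: p[1], reverse=True):
        # Best still-unmatched ground-truth box for this prediction, if any.
        best = max(unmatched, key=lambda gt: iou(box, gt), default=None)
        if best is not None and iou(box, best) >= iou_threshold:
            unmatched.remove(best)
            tp += 1
        else:
            fp += 1
    return tp, fp, len(unmatched)  # leftover ground truth = false negatives


ground_truth = [(10, 10, 50, 50), (100, 100, 150, 150), (200, 200, 240, 240)]
predictions = [
    ((12, 11, 52, 49), 0.95),      # tight match for the first object
    ((100, 100, 130, 130), 0.80),  # on the second object, but IoU is only 0.36
    ((300, 300, 340, 340), 0.70),  # nothing is here
]
tp, fp, fn = match_detections(predictions, ground_truth)
print(tp, fp, fn)  # 1 2 2
print(tp / (tp + fp), tp / (tp + fn))  # precision and recall
```

Note that the second object produces both a false positive and a false negative under this scheme: the model found it, but the box was too loose to count. The third object was never detected at all.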

Ground truth: The verified, correct labels for a dataset, used as the reference against which model predictions are measured. Example: “We found labeling errors in the ground truth for this batch; several boxes labeled ‘stop sign’ actually contain yield signs.”

Anchor box: One of a set of predefined reference boxes of various sizes and aspect ratios that some detection models use as starting points for predicting final bounding boxes. Example: “The default anchor box sizes don’t match our use case well, since most of our target objects are much smaller and wider than what they were tuned for.”
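To make “sizes and aspect ratios” concrete, here is a minimal sketch of generating anchors at one location. It assumes one common convention, in which each anchor has area scale squared and a width-to-height ratio equal to the aspect ratio; specific detectors define their anchors differently, and `make_anchors` is a hypothetical helper.

```python
import math


def make_anchors(cx, cy, scales, aspect_ratios):
    """Return (x1, y1, x2, y2) anchors centered at (cx, cy).

    Each anchor has area scale**2 and width/height equal to the aspect ratio.
    """
    anchors = []
    for scale in scales:
        for ratio in aspect_ratios:
            w = scale * math.sqrt(ratio)
            h = scale / math.sqrt(ratio)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchors(cx=64, cy=64, scales=[32, 64], aspect_ratios=[0.5, 1.0, 2.0])
for x1, y1, x2, y2 in anchors:
    print(f"w={x2 - x1:6.1f}  h={y2 - y1:6.1f}")
```

If the target objects are small and wide, the conversation is usually about adding smaller scales and ratios above 1.0 to a list like this.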

Common Phrases

In code reviews:

  • “This augmentation pipeline reuses the same random seed for every sample in the batch, so every image gets the identical transformation and we’re not actually getting diverse augmented samples.”
  • “We’re computing IoU against the wrong ground truth class in this evaluation script — it’s comparing predictions to the merged ‘vehicle’ class instead of the specific subclass.”
  • “This model is outputting overlapping boxes for the same object; check whether NMS is actually being applied in the deployed inference path, not just in the evaluation script.”
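The seed problem in the first code-review phrase above is easy to reproduce. The sketch below is a toy illustration using only the standard library: the function names are hypothetical, and the “augmentation” is reduced to a single flip-or-not decision per sample.

```python
import random


def buggy_flip_flags(batch_size, seed=0):
    """Bug: the generator is re-seeded for every sample."""
    flags = []
    for _ in range(batch_size):
        rng = random.Random(seed)  # same seed each time -> same draw each time
        flags.append(rng.random() < 0.5)
    return flags


def fixed_flip_flags(batch_size, seed=0):
    """Fix: seed once, then draw a fresh value for each sample."""
    rng = random.Random(seed)
    return [rng.random() < 0.5 for _ in range(batch_size)]


print(buggy_flip_flags(8))  # eight identical decisions
print(fixed_flip_flags(8))  # decisions vary from sample to sample
```

The run stays reproducible in both versions; only the second one gives each sample its own random decision.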

In standups:

  • “Yesterday I retrained with heavier occlusion augmentation; today I’ll check whether it improved recall on the partially blocked test cases.”
  • “I’m seeing a spike in false positives on reflective surfaces — I think the model is confusing reflections with actual objects.”
  • “I finished re-labeling the ground truth for the mislabeled batch; precision should improve once we retrain against the corrected labels.”

In model evaluation reviews:

  • “Precision looks good at an IoU threshold of 0.5, but it drops sharply at 0.75 — the boxes are roughly right but not tightly localized.”
  • “Recall on small objects is much lower than on large ones — we should check whether the anchor box sizes cover that range well.”
  • “The confusion matrix shows the model consistently mixes up these two visually similar classes — we may need more distinguishing training examples.”
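The first phrase above, where precision holds at 0.5 but drops at 0.75, describes a pattern like the one in this sketch. The IoU values are made up for illustration, and the sketch assumes each prediction has already been paired with its best-overlapping ground-truth box, so a prediction below the threshold counts as a false positive.

```python
def precision_at(ious, threshold):
    """Fraction of predictions whose IoU with ground truth meets the threshold."""
    return sum(v >= threshold for v in ious) / len(ious)


# Hypothetical best-match IoU for five predictions: roughly right, not tight.
ious = [0.82, 0.71, 0.64, 0.58, 0.31]

print(precision_at(ious, 0.5))   # 0.8
print(precision_at(ious, 0.75))  # 0.2
```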

Phrases to Avoid

Saying “the model missed it” without specifying the failure type. Say instead: “that’s a false negative — the model didn’t detect the object at all” or “it detected the object, but the box’s IoU with ground truth is too low to count as a correct match.” These are different problems with different fixes.

Saying “the accuracy is bad” for detection or segmentation tasks. Plain “accuracy” is a classification metric and rarely the right one here. Use task-appropriate metrics: “precision and recall,” “mean average precision (mAP),” or “mean IoU,” depending on the task.
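A small segmentation example shows why plain accuracy can mislead. This is a sketch under stated assumptions: masks are lists of rows of integer class IDs (0 for road, 1 for car, chosen for this example), and mean IoU is averaged over classes that appear in either mask. The function names are illustrative.

```python
def mean_iou(pred_mask, true_mask, num_classes):
    """Mean of per-class IoU over classes present in either mask."""
    per_class = []
    for cls in range(num_classes):
        inter = union = 0
        for pred_row, true_row in zip(pred_mask, true_mask):
            for p, t in zip(pred_row, true_row):
                if p == cls and t == cls:
                    inter += 1
                if p == cls or t == cls:
                    union += 1
        if union > 0:  # skip classes absent from both masks
            per_class.append(inter / union)
    return sum(per_class) / len(per_class)


def pixel_accuracy(pred_mask, true_mask):
    """Fraction of pixels whose predicted class matches the ground truth."""
    pairs = [(p, t) for pr, tr in zip(pred_mask, true_mask) for p, t in zip(pr, tr)]
    return sum(p == t for p, t in pairs) / len(pairs)


true_mask = [[0, 0, 0, 0, 0, 0, 1, 1],
             [0, 0, 0, 0, 0, 0, 0, 0]]
pred_mask = [[0] * 8, [0] * 8]  # the model predicts "road" everywhere

print(pixel_accuracy(pred_mask, true_mask))         # 0.875
print(mean_iou(pred_mask, true_mask, num_classes=2))  # 0.4375
```

The model never finds the car, yet pixel accuracy is 87.5 percent because road dominates the image. Mean IoU exposes the failure.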

Saying “the labels are wrong” when the issue might be ambiguous ground truth. Distinguish between genuinely incorrect labels (“this box is on the wrong object entirely”) and ambiguous edge cases (“reasonable annotators could disagree on this boundary”). The first needs a labeling fix; the second may need a labeling guideline update.

Quick Reference

Term | How to use it
bounding box | “The bounding box is a few pixels off on the left edge.”
IoU | “We require an IoU above 0.5 to count a detection as a match.”
NMS | “Non-maximum suppression removes the duplicate overlapping boxes.”
segmentation | “Instance segmentation separates each individual car’s mask.”
augmentation | “We added occlusion augmentation to improve robustness.”
false positive/negative | “The false positive rate rose after lowering the confidence threshold.”

Key Takeaways

  • Use precise failure-mode language (false positive, false negative, low-IoU match) instead of vague terms like “missed it” or “wrong.”
  • Distinguish semantic segmentation from instance segmentation clearly, since they solve different problems and require different evaluation metrics.
  • Use task-appropriate metrics (precision, recall, mAP, mean IoU) rather than generic “accuracy” when discussing detection or segmentation performance.
  • When labels look wrong, clarify whether it’s a genuine labeling error or an ambiguous edge case — the fix differs for each.
  • Describe augmentation strategies specifically (occlusion, color jitter, rotation) rather than saying “we augmented the data,” since the type of augmentation targets a specific generalization gap.