How Visual Language Models are Revolutionizing Quality Inspections

Oct 2, 2024 | AI and IT

Imagine taking a picture of a machine part and simply asking an AI, “Is there anything wrong here?” The AI quickly analyzes the image and points out defects or irregularities. This ability to instantly identify issues from visual data is transforming how we perform quality inspections. In fact, a recent study by McKinsey found that AI-powered visual inspection can reduce quality control costs by up to 50% while improving defect detection rates by 90%. In this post, we’ll explore how Visual Language Models (VLMs) are being applied in quality inspection settings and introduce state-of-the-art tools like OWen, FastSAM, and Florence.

The Problem: Manual Labor and Complexity in Quality Inspections

Quality inspections in manufacturing are highly labor-intensive and tie up a significant share of staff time. Companies are increasingly struggling with:

  • Labor shortages
  • Finding qualified workers to ensure required efficiency rates are met
  • Complexity of inspecting various parts (which differ in size, inspection criteria, and number of components)

VLMs can automate certain inspection tasks, which improves efficiency, keeps manufacturing processes running smoothly, and helps prevent costly stoppages.

Tools and Technologies: OWen, FastSAM, and Florence

OWen: Object Detection

OWen is a high-performing model designed for real-time object detection based on textual descriptions. Its standout feature is its zero-shot detection capability, allowing it to identify objects based on descriptions even if it hasn’t been specifically trained on those items. For instance, if you describe “a headrest,” OWen can detect and highlight any headrest in the image that fits that description. This flexibility eliminates the need for extensive retraining, making OWen particularly valuable in dynamic manufacturing environments where new products or components are frequently introduced.
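To make this concrete, here is a minimal sketch of how the output of a text-prompted detector could feed a simple presence check on an assembly. The detection format (label, score, box), the names `Detection` and `missing_components`, and the 0.3 score threshold are assumptions chosen for illustration; they are not taken from any specific model's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One hypothetical detector result: text label, confidence, pixel box."""
    label: str
    score: float
    box: tuple  # (x_min, y_min, x_max, y_max)


def missing_components(detections, required, min_score=0.3):
    """Return the required labels that were not detected with enough confidence."""
    found = {d.label for d in detections if d.score >= min_score}
    return [label for label in required if label not in found]


# Made-up results a zero-shot detector might return for the two text prompts.
detections = [
    Detection("headrest", 0.82, (120, 40, 260, 180)),
    Detection("seat belt buckle", 0.12, (300, 310, 340, 360)),
]
required = ["headrest", "seat belt buckle"]

# The buckle scored below the threshold, so it is reported as missing.
print(missing_components(detections, required))
```

Because the component list is just text, adding a new part to the check means adding a new description to `required` rather than retraining a model. In practice the threshold would need tuning per part and per camera setup.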

FastSAM: Image Segmentation

FastSAM offers a fast, lightweight solution for image segmentation. It works in two stages. First, in the All-Instance Segmentation stage, FastSAM scans the entire image, such as a warehouse filled with boxes, tools, and machinery, and assigns every object it finds its own segment. Then, in the Prompt-Guided Selection stage, it narrows the result down based on user input, which could be a text description or a point in the image. For example, if prompted with “select machinery,” FastSAM keeps only the regions corresponding to machinery. Thanks to its smaller size and computational efficiency, FastSAM can run effectively on devices with limited memory, which makes it well suited to on-site, real-time quality inspections.
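The second stage can be sketched in a few lines for the point-prompt case. This is a simplified illustration, not FastSAM's actual implementation: it assumes the first stage has already produced one binary mask per segment (here tiny 4x4 grids keyed by a hypothetical segment ID), and it resolves overlaps by preferring the smallest segment under the clicked point.

```python
def mask_area(mask):
    """Count the pixels that belong to a binary mask (rows of 0/1 values)."""
    return sum(sum(row) for row in mask)


def select_segment(segments, point):
    """Return the ID of the smallest segment covering point (x, y), or None."""
    x, y = point
    hits = [seg_id for seg_id, mask in segments.items() if mask[y][x]]
    if not hits:
        return None
    return min(hits, key=lambda seg_id: mask_area(segments[seg_id]))


# Assumed stage-1 output: a pallet (0), a box standing on it (1), a small tool (2).
segments = {
    0: [[0, 0, 0, 0],
        [1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],
    1: [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]],
    2: [[0, 0, 0, 1],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]],
}

print(select_segment(segments, (1, 1)))  # point lies on both pallet and box
print(select_segment(segments, (0, 0)))  # point lies on no segment
```

Splitting the work this way means the expensive segmentation runs once per image, while each prompt is only a cheap lookup over the stored masks.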

Florence: Multimodal AI

Florence, specifically its Florence-2 iteration, is a large multimodal AI model developed by Microsoft. Like OWen, Florence supports zero-shot learning, but it goes beyond object detection to handle multiple vision and language tasks. These include image captioning, question answering, and segmentation, making it a versatile tool for complex quality inspection scenarios. Florence’s ability to work across mobile devices enhances its utility in real-world applications, allowing for flexible deployment in various manufacturing settings. This combination of multimodal capabilities and device compatibility makes Florence particularly suited for comprehensive quality control systems that require both visual analysis and natural language processing.
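Florence-2 switches between tasks through a short task token placed at the start of the text prompt. The sketch below shows how an inspection pipeline might map its own steps to those tokens. The step names and the `build_prompt` helper are hypothetical, and the token strings are given as they appear on the Florence-2 model card to the best of our knowledge, so check them against the model card before relying on them.

```python
# Task tokens as listed on the Florence-2 model card (verify before use).
TASK_TOKENS = {
    "describe": "<CAPTION>",
    "detect": "<OD>",
    "locate_phrase": "<CAPTION_TO_PHRASE_GROUNDING>",
    "segment_phrase": "<REFERRING_EXPRESSION_SEGMENTATION>",
}

# Steps whose token must be followed by a free-text description.
NEEDS_TEXT = {"locate_phrase", "segment_phrase"}


def build_prompt(step, text=""):
    """Build the text prompt for one inspection step (hypothetical helper)."""
    if step not in TASK_TOKENS:
        raise ValueError(f"unknown inspection step: {step!r}")
    if step in NEEDS_TEXT:
        if not text:
            raise ValueError(f"step {step!r} needs a text description")
        return TASK_TOKENS[step] + text
    return TASK_TOKENS[step]


print(build_prompt("detect"))
print(build_prompt("locate_phrase", "a scratch on the housing"))
```

The resulting string would be passed to the model together with the image, so one model instance can serve several inspection steps by changing only the prompt.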

Benefits and Limitations

While VLMs hold great potential for automating quality inspections, they also come with limitations:

  • Training Effort: VLMs rely on extensive training, and they may struggle with tasks that fall outside their training data.
  • Generalization Issues: They may fail to generalize well in complex scenarios, especially with ambiguous or nuanced visual tasks.
  • Context Understanding: VLMs sometimes miss complex relationships between objects, such as the relative importance of certain objects, or fail to follow multi-step instructions.
  • Environmental Factors: Lighting conditions, occlusions, and noise can negatively impact their performance.

Conclusion

VLMs are opening new doors in automated quality inspections. Zero-shot object detection, image segmentation, and multimodal understanding all offer promising solutions. If you’re curious about implementing VLMs in your industry, feel free to reach out to us.