Extract document data using Vision

It is amazing how much you can do on your mobile device without relying on internet connection when using Machine Learning, and I think this is particularly true for image analysis and visional computing.

Yesterday, Frank Doepke, Vision Engineer at Apple, stepped once again on the (virtual) WWDC stage to show the improvements he and his colleagues were up to over the last year. Extract document data using Vision explores the latest updates to barcode detection, text recognition, and document detection.

Here are some highlights from the session:

Barcode detection

  • VNDetectBarcodesRequestRevision2 adds new symbologies and updates the relation of the bounding box which is now relative to the Region of Interest (ROI) and no longer to the whole image (as with Revision 1).
    Results in respects to the Region of Interest
    If not specified and compiled against the latest SDK, you will always get the latest revision.
  • Support for 1D and 2D codes, and the detection of multiple codes and multiple symbologies at once (you can combine both detections) — no need to scan multiple times. However, keep in mind that the detection takes longer if you want to detect multiple codes and/or symbologies. Set up the request only with the codes which are relevant for the respective use case.
  • Capabilities to scan codes in low light scenarios

Text recognition

  • As a reminder, VNRecognizedTextRequest can be used in .fast mode (Latin character recognizer) and .accurate mode (ML-based recognizer) and got a Revision 2 in 2020
  • Language selection influences language correction as it picks the correct dictionary
  • If multiple languages should be recognized with recognitionLanguages, keep in mind that the order matters as the recognization gets resolved in the provided order

Document detection

  • New ML-based detector VNDetectDocumentSegmentationRequest which was trained on various types of documents (sheets of papers, signs, notes, receipts, labels, etc.) to get a low-resolution segmentation mask — however, this request runs only in realtime on devices with a Neural Engine
  • Document Segmentation vs Rectangle Detector
  • Document Segmentation vs Rectangle Detector

Site Footer

Sliding Sidebar