Trying out the OCR feature of Azure's Computer Vision API

I tried out the Computer Vision API, which is part of Azure Cognitive Services.

Computer Vision: Image Processing and Image Analysis | Microsoft Azure

The Computer Vision API actually covers several features. This time I tried one of them, OCR, from JavaScript (or rather TypeScript).

Trying OCR

The API specification is here.

<Microsoft Cognitive Services> developer portal

Usage is simple: POST to the API endpoint URL, sending either the image's URL as JSON or the image's binary data.
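For the JSON variant, a minimal sketch might look like the following. The helper name `buildUrlRequest` and the `OcrRequest` type are hypothetical and only for illustration. The sketch assumes the request body is a JSON object with a single `url` property, which is my understanding of the API specification.

```typescript
// Hypothetical shape of the fetch options used here (not part of any SDK).
interface OcrRequest {
    method: string;
    headers: Record<string, string>;
    body: string;
}

// Hypothetical helper: builds the fetch options for sending an image URL
// as JSON instead of binary data.
function buildUrlRequest(imageUrl: string, apiKey: string): OcrRequest {
    return {
        method: "POST",
        headers: {
            "Content-Type": "application/json", // the body is JSON this time
            "Ocp-Apim-Subscription-Key": apiKey // API key
        },
        // Assumption: the API expects an object of the form { "url": "..." }
        body: JSON.stringify({ url: imageUrl })
    };
}
```

The result can be passed straight to `fetch` as its second argument, along with the endpoint URL, in the same way as the binary version.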

POSTing the binary data from TypeScript would look something like this.

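// Assume the image is obtained as a Blob (getBlob is defined elsewhere)
const blob: Blob = await getBlob();

// API URL (East Asia region this time)
const apiUrl = "https://eastasia.api.cognitive.microsoft.com/vision/v1.0/ocr";

// API key
const apiKey = "your-secret-key";

// Call the API
const response = await fetch(apiUrl, {
    method: "POST",
    // Specify the media type and the API key in the request headers
    headers: {
        "Content-Type": "application/octet-stream", // media type of the body
        "Ocp-Apim-Subscription-Key": apiKey  // API key
    },
    // Set the Blob as the request body
    body: blob
});

// Stop here if the call failed (for example, because of an invalid key)
if (!response.ok) {
    throw new Error(`OCR request failed: ${response.status}`);
}

// The response is JSON
// and contains the OCR result
const result: IResult = await response.json();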

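/*
// A rough way to extract the lines (as strings) would be something like this
result.regions
   .map(region => region.lines)
   // The initial value keeps reduce from throwing when there are no regions
   .reduce((previous, current) => previous.concat(current), [] as ILine[])
   .map(line => line.words.map(word => word.text).join(" "))
   .forEach(line => console.log(line));
*/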
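The commented-out extraction can also be written as a small standalone function, which makes it easier to try against sample data. This is only a sketch: `extractLines` and the trimmed-down types below are hypothetical names introduced for illustration, and the `separator` parameter is my own addition (joining with a space may not suit text such as Japanese).

```typescript
// Minimal types covering only the fields this sketch needs.
interface Word { text: string; }
interface Line { words: Word[]; }
interface Region { lines: Line[]; }
interface OcrResult { regions: Region[]; }

// Hypothetical helper: flattens regions into lines and joins each line's words.
function extractLines(result: OcrResult, separator: string = " "): string[] {
    return result.regions
        .map(region => region.lines)
        // Start from an empty array so an empty result yields [] instead of throwing
        .reduce((previous, current) => previous.concat(current), [] as Line[])
        .map(line => line.words.map(word => word.text).join(separator));
}
```

For Japanese text, calling `extractLines(result, "")` would concatenate the words of each line without spaces.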

The response contains the OCR result. It is formatted as JSON, as an object with the interface shown below. It seems you can get not only the text but also its coordinates.

// OCR result
interface IResult {
    language: string;
    textAngle: number;
    orientation: string;
    // Multiple regions
    regions: IRegion[];
}

// Region
interface IRegion {
    boundingBox: string;
    // Contains multiple lines
    lines: ILine[];
}

// Line
interface ILine {
    boundingBox: string;
    words: IWord[];
}

// Word
interface IWord {
    boundingBox: string;
    text: string;
}
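Since `boundingBox` arrives as a string, it needs parsing before the coordinates can be used. The sketch below assumes the string holds four comma-separated numbers in the order left, top, width, height, which is how I read the API documentation; please check this against an actual response. `parseBoundingBox` and the `BoundingBox` type are hypothetical names, not part of the API.

```typescript
// Hypothetical parsed form of a boundingBox string.
interface BoundingBox {
    left: number;
    top: number;
    width: number;
    height: number;
}

// Hypothetical helper. Assumption: the input looks like "left,top,width,height".
function parseBoundingBox(value: string): BoundingBox {
    const parts = value.split(",").map(part => part.trim());
    // Reject anything that is not exactly four numeric fields
    if (parts.length !== 4 || parts.some(part => part === "" || isNaN(Number(part)))) {
        throw new Error(`Unexpected boundingBox format: ${value}`);
    }
    const [left, top, width, height] = parts.map(Number);
    return { left, top, width, height };
}
```

With this in place, each word's position can be read as, for example, `parseBoundingBox(word.boundingBox).left`.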

References

JavaScript tutorials and the like can be found around here.