# Simple OCR actions with Gondola

When implementing an automation script, sometimes we cannot interact with a control directly. If a control is rendered as an image in a canvas or an OpenGL viewport and contains text, OCR actions may help. However, OCR is not a silver bullet. Most OCR algorithms work best on clearly segmented text: the input should have as high a resolution (DPI) as possible, and the characters must not appear "pixelated" after segmentation. Different fonts, languages, complex backgrounds, and poorly trained data can all produce poor results.

In this guide we'll add Tesseract, a well-known open-source OCR framework, to a Gondola project and use it to check some Japanese text in the Car Rental app.

# Prerequisites

# Building your OCR project

# Create sample project and install the necessary dependencies

Start with Creating an ABT Project. Once the project has been created, run the following commands in the terminal:

```sh
# A library for creating unique names when capturing images
npm i uuid @types/uuid

# A library for pre-processing the captured images
npm i pngjs @types/pngjs

# The JS/TS version of Tesseract
npm i tesseract.js @types/tesseract.js
```

# Creating a Tesseract worker

Create the file src/utilities/ocr.ts to store the source code for interacting with Tesseract.

```ts
import Tesseract = require("tesseract.js");
import pngjs = require("pngjs");
import fs = require("fs");
import path = require("path");

export interface TextCoordinate {
    x0: number,
    y0: number,
    x1: number,
    y1: number
}

export interface Color {
    red: number,
    green: number,
    blue: number
}

let worker: Tesseract.TesseractStatic;

/**
 * When running, Tesseract.js creates a worker process to handle OCR tasks.
 * For optimal performance, we create the worker once and keep it running for reuse.
 */
export async function StartOcrWorker() {
    worker = Tesseract.create({ } as any);
}

/**
 * Terminate the worker when finished.
 */
export function StopOcrWorker() {
    (worker as any).terminate();
}

/**
 * GetTextCoordinates performs 4 main steps:
 *  1. Pre-process the image to increase the success rate of OCR
 *  2. Call Tesseract to perform OCR
 *  3. Find all occurrences of the given text
 *  4. Return all coordinates where the text has been found
 * @param imagePath path to the captured screenshot
 * @param text the text to search for
 * @param options OCR options (language, DPI, pre-processing)
 */
export async function GetTextCoordinates(imagePath: string, text: string,
    options?: {
        lang?: string,
        dpi?: number,
        invert?: boolean,
        textColor?: Color
    }): Promise<TextCoordinate[]> {

    const result: TextCoordinate[] = [];
    process.execArgv.length = 0; // Workaround for debugging

    // Step 1: Pre-process the image
    if (options && options.invert) {
        const invertedImage = path.join(path.dirname(imagePath),
            `inverted_${path.basename(imagePath)}`);
        await preProcessImage(imagePath, options, invertedImage);
        imagePath = invertedImage;
    }

    // Step 2: Perform OCR
    const recognizedResult = await worker.recognize(imagePath, options);
    if (recognizedResult && recognizedResult.text.indexOf(text) >= 0) {
        // Step 3: Find the lines that contain the provided text
        recognizedResult.lines.forEach((line) => {
            const index = line.text.indexOf(text);
            if (index >= 0) {
                // Step 4: Merge the bounding boxes of the matched symbols
                // to build the text coordinates
                let textPosition: TextCoordinate = {
                    x0: Number.MAX_VALUE,
                    y0: Number.MAX_VALUE,
                    x1: -1,
                    y1: -1
                }
                // Note: iterate over every character of the text, including the last one
                for (let i = index; i < index + text.length; i++) {
                    textPosition = increaseBox(textPosition, line.symbols[i].bbox);
                }
                result.push(textPosition);
            }
        });
    }
    return result;
}

/**
 * preProcessImage improves the image quality to get the best OCR result.
 * This function implements two algorithms:
 *      1. Simple color inversion
 *      2. Change all non-text colors to white and the text's color to black
 * @param imagePath path to the source image
 * @param options OCR options; only textColor is used here
 * @param invertedImage path where the processed image is written
 */
async function preProcessImage(imagePath: string,
    options: { lang?: string, dpi?: number, invert?: boolean, textColor?: Color },
    invertedImage: string) {
    await new Promise((resolve, reject) => {
        fs.createReadStream(imagePath)
            .pipe(new pngjs.PNG({
                filterType: 4
            }))
            .on('parsed', function (this: pngjs.PNG) {
                for (let y = 0; y < this.height; y++) {
                    for (let x = 0; x < this.width; x++) {
                        const idx = (this.width * y + x) << 2;
                        if (options.textColor) {
                            // If the pixel matches the text color => convert it to black
                            if (this.data[idx] == options.textColor.red
                                && this.data[idx + 1] == options.textColor.green
                                && this.data[idx + 2] == options.textColor.blue) {
                                this.data[idx] = 0;
                                this.data[idx + 1] = 0;
                                this.data[idx + 2] = 0;
                            }
                            else { // If it doesn't match => white
                                this.data[idx] = 255;
                                this.data[idx + 1] = 255;
                                this.data[idx + 2] = 255;
                            }
                        }
                        else {
                            // Invert the color
                            this.data[idx] = 255 - this.data[idx];
                            this.data[idx + 1] = 255 - this.data[idx + 1];
                            this.data[idx + 2] = 255 - this.data[idx + 2];
                        }
                    }
                }
                this.pack().pipe(fs.createWriteStream(invertedImage))
                    .on("close", () => resolve());
            });
    });
}

/**
 * Combine the given bounding boxes to create a bigger one
 * @param tobeIncreased the box accumulated so far
 * @param bbox the symbol's bounding box to merge in
 */
function increaseBox(tobeIncreased: TextCoordinate, bbox: Tesseract.Bbox): TextCoordinate {
    return {
        x0: Math.min(tobeIncreased.x0, bbox.x0),
        y0: Math.min(tobeIncreased.y0, bbox.y0),
        x1: Math.max(tobeIncreased.x1, bbox.x1),
        y1: Math.max(tobeIncreased.y1, bbox.y1),
    }
}
```
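Steps 3 and 4 of `GetTextCoordinates` index into `line.symbols` by the character offset of the match and merge each symbol's bounding box into one rectangle. The following self-contained sketch demonstrates that merge on mock OCR output (the line text and box values are invented for illustration; the `bbox` shape mirrors what tesseract.js returns):

```typescript
interface Bbox { x0: number; y0: number; x1: number; y1: number; }

// Mock of one recognized line: text plus one bounding box per symbol.
const line = {
    text: "ログイン",
    symbols: [
        { bbox: { x0: 10, y0: 5, x1: 30, y1: 25 } },
        { bbox: { x0: 32, y0: 5, x1: 52, y1: 25 } },
        { bbox: { x0: 54, y0: 5, x1: 74, y1: 25 } },
        { bbox: { x0: 76, y0: 5, x1: 96, y1: 25 } },
    ],
};

// Merge the bounding boxes of every symbol the given text spans.
function boxForText(lineText: string, symbols: { bbox: Bbox }[], text: string): Bbox | undefined {
    const index = lineText.indexOf(text);
    if (index < 0) return undefined;
    let box: Bbox = { x0: Number.MAX_VALUE, y0: Number.MAX_VALUE, x1: -1, y1: -1 };
    for (let i = index; i < index + text.length; i++) {
        const b = symbols[i].bbox;
        box = {
            x0: Math.min(box.x0, b.x0),
            y0: Math.min(box.y0, b.y0),
            x1: Math.max(box.x1, b.x1),
            y1: Math.max(box.y1, b.y1),
        };
    }
    return box;
}
```

Searching for "グイ" here merges the second and third symbol boxes into `{ x0: 32, y0: 5, x1: 74, y1: 25 }`. Note this assumes one recognized symbol per character of the line text, which generally holds for Japanese but can break when the line contains whitespace.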

# Create a Gondola Page Object

After creating the OCR utility, we'll create a Page Object. Right click on the src/pages folder and create a file named ocrPage.ts. Copy and paste the following code into ocrPage.ts and save it.

```ts
import { TextCoordinate, GetTextCoordinates, StartOcrWorker, StopOcrWorker } from "../utilities/ocr";
import { action, gondola, page } from "@logigear/gondola";
import fs = require("fs");
import path = require("path");
import uuid = require("uuid");

@page
export class ocrPage {

    @action("tap ocr text", "Taps on given text")
    public async tapText(text: string, language = "eng", index = 1, touchDuration = 100, invert = true) {
        const coordinates = await this.findTheText(text, language, invert);
        if (!coordinates || coordinates.length === 0) {
            gondola.checkEqual("Not found", `'${text}' is found`,
                "OCR function cannot find the given text");
        }
        let toClick: TextCoordinate;
        if (coordinates.length >= index) {
            toClick = coordinates[index - 1];
        } else {
            gondola.checkEqual(`Not found ${index}`,
                `'${text}' appears at least ${index} time(s)`,
                "OCR function cannot find the given text");
            return;
        }
        await this.tapAtCoordinate(toClick, touchDuration);
    }

    @action("check ocr text", "Checks given text appears on screen")
    public async checkText(text: string, language = "eng", index = 1, invert = true) {
        const coordinates = await this.findTheText(text, language, invert);
        if (!coordinates || coordinates.length === 0) {
            gondola.checkEqual("Not found", `'${text}' is found`,
                "OCR function cannot find the given text");
        }
        if (index && index > 0) {
            if (coordinates.length < index) {
                gondola.checkEqual(`Not found ${index}`,
                    `'${text}' appears at least ${index} time(s)`,
                    "OCR function cannot find the given text");
            }
        }
    }

    @action("start ocr worker", "Start the OCR worker")
    public async startOcrWorker() {
        await StartOcrWorker();
    }

    @action("stop ocr worker", "Stop the OCR worker")
    public async stopOcrWorker() {
        StopOcrWorker();
    }

    private async tapAtCoordinate(toClick: TextCoordinate, touchDuration: number) {
        await (await gondola.getCurrentBrowser()).touchPerform([
            {
                action: "press",
                options: {
                    // Tap the center of the text's bounding box
                    x: (toClick.x0 + toClick.x1) / 2,
                    y: (toClick.y0 + toClick.y1) / 2,
                },
            }, {
                action: "wait",
                options: {
                    ms: touchDuration,
                },
            }, {
                action: "release",
                options: {},
            },
        ]);
    }

    private async findTheText(text: string, language: string, invert = false) {
        // Temp folder to store captured images
        if (!fs.existsSync("./temp-images")) {
            fs.mkdirSync("./temp-images");
        }
        const imagePath = path.join("./temp-images", `ocr_image_${uuid.v4()}.png`);
        await gondola.saveScreenshot(imagePath);
        const caps = await gondola.getCapabilities();
        return GetTextCoordinates(imagePath, text, {
            lang: language,
            /** TODO Get the DPI of iOS device */
            dpi: (caps.platformName === "android" ? caps.deviceScreenDensity : 300),
            invert,
            textColor: { red: 255, green: 255, blue: 255 }
        });
    }
}
export default new ocrPage();
```
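Note that `findTheText` saves a new screenshot into `./temp-images` on every call and never deletes it, so the folder grows across runs. A small cleanup helper could be called from your suite's teardown; the sketch below (the name `cleanTempImages` is our own, not part of Gondola) removes only the captured PNG files:

```typescript
import fs = require("fs");
import path = require("path");

// Delete every captured .png from the temp folder and return how many were removed.
// Sketch only: adjust the folder name if you changed it in findTheText.
export function cleanTempImages(dir = "./temp-images"): number {
    if (!fs.existsSync(dir)) return 0;
    let removed = 0;
    for (const name of fs.readdirSync(dir)) {
        if (name.endsWith(".png")) {
            fs.unlinkSync(path.join(dir, name));
            removed++;
        }
    }
    return removed;
}
```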

# Create an OCR test using Gondola Test Designer

Once you've created the Page Object ocrPage, we can use its methods in Gondola Test Designer.

Right click on the src/tests folder, then click New Gondola Test. When you press CTRL + space in an action cell, you should be able to see the OCR actions.

TIP

OCR won't always be correct. In this example, the login button with characters ログイン is recognized as 口グイン. There is still work to do if you'd like to use the sample code in your Test Suite (see the TODO tag in the code comments).
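One pragmatic mitigation is to normalize characters that Tesseract frequently confuses before comparing its output against the expected text. The helper below is a hypothetical sketch (the confusable pairs shown are examples, not an exhaustive list); you could apply it to both the recognized line text and the search text inside GetTextCoordinates:

```typescript
// Characters Tesseract commonly confuses in Japanese UI text,
// mapped to the form we want to compare against.
// Example pairs only; extend the table for your own app's text.
const CONFUSABLES: ReadonlyMap<string, string> = new Map([
    ["口", "ロ"], // kanji "mouth" vs. katakana "ro"
    ["工", "エ"], // kanji "craft" vs. katakana "e"
    ["力", "カ"], // kanji "power" vs. katakana "ka"
    ["一", "ー"], // kanji "one" vs. long-vowel mark
]);

// Replace each confusable character with its canonical counterpart.
export function normalizeConfusables(text: string): string {
    return Array.from(text)
        .map((ch) => CONFUSABLES.get(ch) ?? ch)
        .join("");
}
```

With this, the misrecognized 口グイン normalizes to ログイン, so a comparison against the expected button text succeeds.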

TIP

The first test run may be slow since Tesseract will need to download the language data.

Compile, then run the project to see the result. Make sure your emulator/device is running and connected, and that the Appium server is running.

```sh
# Compile the scripts
npm run compile

# Run the OCR test on Android
npm run test:android -- --grep OCR

# View the result
npm run show-report
```
Last Updated: 12/28/2020, 4:12:58 AM