From Pixels to Insights: How I Built an OCR-Powered UI Test in TypeScript
The Challenge: Making Screens Speak
Modern UI testing often relies on traditional DOM element checks, but what happens when text is rendered dynamically or embedded in images? I faced this challenge while automating an application that had crucial textual information displayed in non-selectable areas — error messages in images, CAPTCHA-like UI elements, and dynamically loaded text.
I needed a way to verify the presence of text on the screen, not just within the HTML structure. That’s when I turned to Optical Character Recognition (OCR) and decided to integrate Tesseract.js into my testing workflow.
A Real-World Use Case: Handling Unchanging Attributes
During a take-home task, I encountered a scenario where an element initially displayed the text ‘Order incorrect’ and, after the order was modified, changed to ‘Order correct’. However, the following attributes remained unchanged, making conventional test automation methods ineffective:
- value: orderStatusMessage
- name: orderStatusMessage
- label: orderStatusMessage
- enabled: true
- visible: true
- accessible: true
- x, y, width, height, index: remained the same
In most cases, I would suggest asking the development team to add a unique attribute for automation purposes. But in situations where this isn’t feasible, OCR provides an alternative way to validate the text on the screen.
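To make the gap concrete, here is a minimal sketch of the conventional attribute-based check and why it cannot work here. The selector, test structure, and assertion are illustrative assumptions based on the attribute values listed above, not the project’s actual code:

// Hypothetical conventional check (selector assumed from the attributes above):
it('cannot distinguish the order status via attributes', async () => {
  const statusElement = await $('~orderStatusMessage'); // accessibility id never changes
  // Every readable attribute still returns 'orderStatusMessage' before and after
  // the order is fixed, so this assertion cannot tell the two states apart.
  const label = await statusElement.getAttribute('label');
  expect(label).toContain('Order correct'); // fails in both states
});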
The Solution: Capturing and Analyzing Screenshots with OCR
To address this, I built a TypeScript method using WebdriverIO and Tesseract.js that lets automated tests visually verify text. Here’s how it works:
// Imports needed at the top of the file ('driver' is the global WebdriverIO instance).
import fs from 'fs';
import path from 'path';
import Tesseract from 'tesseract.js';

/**
 * Visually checks the screen for a specified text using OCR.
 * @param text The text to search for in the screenshot.
 * @returns True if the text is found, false otherwise.
 * @throws Error if an error occurs during the OCR process or screenshot capture.
 */
public async visuallyCheckForText(text: string): Promise<boolean> {
  const screenshotDir = path.resolve(__dirname, '../../screenshot/');
  const screenshotPath = path.join(screenshotDir, 'screenshot.png');

  try {
    // Ensure the screenshot directory exists
    if (!fs.existsSync(screenshotDir)) {
      fs.mkdirSync(screenshotDir);
    }

    // Capture a screenshot of the current screen
    await driver.saveScreenshot(screenshotPath);

    // Run OCR on the screenshot with Tesseract.js (English language data)
    const result = await Tesseract.recognize(screenshotPath, 'eng', {
      logger: (m) => console.log(m),
    });

    // Extract the recognized text and check for the expected string
    const recognizedText = result.data.text;
    return recognizedText.includes(text);
  } catch (error) {
    console.error('Error during OCR process:', error);
    throw error;
  } finally {
    // Cleanup: delete the screenshot so repeated runs stay tidy
    if (fs.existsSync(screenshotPath)) {
      fs.unlinkSync(screenshotPath);
      console.log('Deleted screenshot file.');
    }
  }
}
Implementing the Method in a Page Object Model
Since the Android element exposes a text attribute that does update dynamically, I limited the OCR-based check to iOS devices in my test automation framework. Here’s how I integrated it into my page object file:
public async isExpectedMessageDisplayed(message: string): Promise<boolean> {
  if (PLATFORM === 'ios') {
    return this.wdCommands.visuallyCheckForText(message);
  } else {
    const actualMessage: string = await this.wdCommands.getText(
      await this.orderStatusMessage,
    );
    return message === actualMessage;
  }
}
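A spec file can then call this page-object method directly. Below is a minimal usage sketch; the it block follows standard WebdriverIO test-runner conventions, while the page object instance (orderPage) and the order-fixing step are illustrative assumptions:

// Hypothetical spec (orderPage and fixOrder are illustrative names):
it('shows the corrected order status', async () => {
  await orderPage.fixOrder(); // assumed step that corrects the order
  const isDisplayed = await orderPage.isExpectedMessageDisplayed('Order correct');
  expect(isDisplayed).toBe(true);
});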
Breaking It Down: How This Works
- Capturing the UI: The method first ensures the screenshot directory exists and then captures the current screen.
- Processing with OCR: The captured image is analyzed using Tesseract.js to extract any text present.
- Matching Text: The recognized text is checked for the expected string with a simple substring match (see the optional normalization sketch after this list).
- Cleanup: To keep the environment clean, the screenshot file is deleted at the end of each run.
- Platform-Specific Handling: OCR-based validation is only used for iOS, while traditional text attribute checks are used for Android.
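One practical note on the matching step: OCR output can contain stray line breaks or inconsistent spacing, so a strict substring match is occasionally brittle. A small optional helper (a sketch, not part of the original method) makes the comparison more forgiving:

// Optional hardening sketch: collapse whitespace and ignore case before comparing.
function containsNormalized(haystack: string, needle: string): boolean {
  const normalize = (s: string) => s.replace(/\s+/g, ' ').trim().toLowerCase();
  return normalize(haystack).includes(normalize(needle));
}

// Inside visuallyCheckForText, the final check could then become:
// return containsNormalized(recognizedText, text);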
The Outcome: Enhanced UI Testing with Visual Validation
By integrating OCR-based validation, my automated tests became more resilient and capable of verifying non-traditional UI elements. This method successfully detected error messages embedded in graphical elements, validated dynamically rendered text, and enhanced accessibility testing.
Explore the Code
You can find more details in this GitHub repository: webdriver_io_appium
Final Thoughts
Visual validation using OCR isn’t just useful for testing — it’s a powerful tool for accessibility, data extraction, and automated insights. If you’re working with applications where text might not be directly accessible via the DOM, consider integrating OCR-powered tests into your workflow!
Have you ever faced similar challenges? I’d love to hear how you tackled them!