Viewpoint Documentation

Introduction

Viewpoint is an AI-powered accessibility tool for tasks that traditional screen readers struggle with. It lets you navigate inaccessible software, read on-screen text, and read scanned or otherwise inaccessible PDF files.

Powered by Google's latest Gemini model, Viewpoint can identify UI elements and text on screen and present them in an accessible format. It is designed to run alongside your screen reader and offers four distinct modes:

UI Mode: Interact with inaccessible software.
OCR Mode: Identify all on-screen text.
Query Mode: Ask questions about anything on your screen and find specific UI elements.
PDF Reader: Read scanned or inaccessible PDF files.

Setup

Since Viewpoint is powered by Gemini, you will need a Gemini API key, which can be obtained from the link below. Google provides free access to the API with generous usage limits that should be more than sufficient for daily use of Viewpoint.

Get An API Key

After obtaining an API key, download the installer and follow the installation prompts.

When you run Viewpoint for the first time, it will prompt you for your API key. Paste it into the dialog that appears. Your API key is stored locally and never shared with anyone.

Global Hotkeys

Below is a list of common hotkeys used throughout Viewpoint.

Hotkey	Action
`Ctrl` + `Shift` + `\`	Cycle Between Modes
`Ctrl` + `Shift` + `/`	Activate Selected Mode
`Ctrl` + `Alt` + `Shift` + `V`	Open Viewpoint Settings
`Ctrl` + `Shift` + `F4`	Close Viewpoint
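
To illustrate what "Cycle Between Modes" and "Activate Selected Mode" imply, here is a minimal Python sketch of a mode selector. It is not Viewpoint's actual code: the `ModeSelector` class, the starting mode, the cycling order (taken from the list in the introduction), and the wrap-around from the last mode back to the first are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Mode names as listed in the introduction. The order used for cycling
# is an assumption, not documented behavior.
MODES = ("UI Mode", "OCR Mode", "Query Mode", "PDF Reader")


@dataclass
class ModeSelector:
    """Hypothetical model of the 'cycle, then activate' hotkey pair."""

    index: int = 0  # assumed to start on the first mode

    @property
    def selected(self) -> str:
        return MODES[self.index]

    def cycle(self) -> str:
        """Select the next mode, wrapping to the first after the last."""
        self.index = (self.index + 1) % len(MODES)
        return self.selected


selector = ModeSelector()
for _ in range(len(MODES)):
    # Each simulated press of the cycle hotkey selects the next mode.
    print(selector.cycle())
```

In this model, the activate hotkey would simply run whichever mode `selector.selected` currently names.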

UI Mode

UI mode allows you to navigate inaccessible software. When activated, Viewpoint takes a screenshot and identifies any on-screen UI elements.

Once the UI has been generated, you can navigate it with Tab and Shift + Tab. Pressing Enter or Space on an element moves the mouse to it and clicks it. Hold Shift while activating an element to double-click it instead.

After you click an element, Viewpoint rebuilds the UI after a short delay, so you can keep using the mode without reactivating it. To exit UI mode without clicking an item, press Ctrl + Alt + Shift + /.
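
The navigation described above can be modeled with a short Python sketch. This is an illustration under stated assumptions, not Viewpoint's implementation: the `Element` and `FocusRing` names are hypothetical, elements are assumed to carry a bounding box in screen pixels, the click is assumed to land on the center of that box, and focus is assumed to wrap around at either end of the list.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Element:
    """Hypothetical detected UI element with a pixel bounding box."""

    label: str
    left: int
    top: int
    width: int
    height: int

    def center(self) -> Tuple[int, int]:
        # Assumed click target: the middle of the bounding box.
        return (self.left + self.width // 2, self.top + self.height // 2)


class FocusRing:
    """Models Tab / Shift + Tab navigation over detected elements."""

    def __init__(self, elements: List[Element]) -> None:
        if not elements:
            raise ValueError("at least one element is required")
        self.elements = list(elements)
        self.index = 0

    @property
    def focused(self) -> Element:
        return self.elements[self.index]

    def move(self, backwards: bool = False) -> Element:
        """Tab moves forward, Shift + Tab backward (wrap-around assumed)."""
        step = -1 if backwards else 1
        self.index = (self.index + step) % len(self.elements)
        return self.focused

    def activate(self, shift_held: bool = False) -> dict:
        """Enter or Space: describe the click; Shift makes it a double-click."""
        x, y = self.focused.center()
        return {"x": x, "y": y, "clicks": 2 if shift_held else 1}


ring = FocusRing([
    Element("File", 0, 0, 100, 20),
    Element("OK", 200, 300, 80, 40),
])
ring.move()                            # Tab to the "OK" button
print(ring.activate())                 # single click at its center
print(ring.activate(shift_held=True))  # double-click at the same point
```

The sketch only returns a description of the click. A real tool would hand those coordinates to an input-automation layer and then rebuild the element list, as described above.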

Note: For the best experience, make sure to maximize the window before activating UI Mode.

OCR Mode

OCR mode identifies all on-screen text and displays it in an accessible format. When activated, Viewpoint takes a screenshot and displays all detected text in a simple dialog.

You can navigate the dialog with standard text navigation commands. There is also a button to copy all of the detected text to the clipboard, enabling you to paste it somewhere else or save it to a file.

Note: Occasionally, the results dialog may not immediately gain focus. In such cases, you may need to use Alt + Tab to focus it before reading the detected text.

Query Mode

Query mode allows you to ask questions about what's on screen, as well as find specific UI elements.

When activated, Viewpoint takes a screenshot and presents a dialog where you enter your query. If you asked it to find one or more UI elements, Viewpoint speaks the response and loads those elements into UI mode, where you can move between them with Tab and Shift + Tab. You can interact with the elements in the same way as you would in UI mode.

If the response does not involve any UI elements, the answer will open in a dialog that can be navigated with standard text navigation commands. You can also copy the text to your clipboard to save it somewhere else.
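
The two outcomes of a query can be summarized as a small routing step. The Python sketch below is only a model of the behavior described in this section: `QueryResponse`, `route_response`, and the action names are hypothetical and do not reflect Viewpoint's internal API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QueryResponse:
    """Hypothetical shape of a query result."""

    answer: str
    # Labels of any UI elements the query asked Viewpoint to find.
    element_labels: List[str] = field(default_factory=list)


def route_response(response: QueryResponse) -> List[str]:
    """Return the (hypothetical) actions taken for a query result."""
    if response.element_labels:
        # Elements were found: speak the answer, then hand them to UI mode.
        return ["speak_answer", "open_ui_mode"]
    # No elements involved: show the answer in a readable text dialog.
    return ["open_text_dialog"]


print(route_response(QueryResponse("Found the Save button.", ["Save"])))
print(route_response(QueryResponse("The window shows a login form.")))
```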

Note: Occasionally, the results and query dialogs may not immediately gain focus. In such cases, you may need to use Alt + Tab to focus them before attempting to interact with them.

PDF Reader

The PDF reader extracts text from scanned or otherwise inaccessible PDF files and displays it in a simple dialog. When activated, Viewpoint prompts you to select the PDF file you wish to scan. After choosing a file, Viewpoint will extract and display all text in it.

Note: Occasionally, the results and file selection dialogs may not immediately gain focus. In such cases, you may need to use Alt + Tab to focus them before attempting to interact with them.