

What Is Viewpoint?

Viewpoint is an AI-powered screen reader assistant designed to perform tasks that traditional screen readers struggle with. Powered by Google's latest Gemini models, it can analyze and understand elements on your screen and let you interact with them. Viewpoint contains four modes: UI Mode, OCR Mode, Query Mode, and the PDF Reader.


Setup

To start using Viewpoint, you can download it from the link in the navigation bar (Windows only). This will download an installer that you can run to install the program. When you launch the program for the first time, it will prompt you for a Gemini API key. You can obtain a key for free from Google AI Studio, or by clicking the link below:

Get An API Key

Google provides a free tier that gives you 20 requests per day with each model. Viewpoint supports multiple models with varying degrees of accuracy, and you can switch between them in settings to get the most out of your API key. After pasting your key into the dialog and pressing Enter, Viewpoint will run in the background and be ready to use.


Using Viewpoint

Global Hotkeys

Below is a list of common hotkeys that apply to any mode in Viewpoint.

Key                Action
Ctrl+Shift+\       Switch Modes
Ctrl+Shift+/       Activate Current Mode
Ctrl+Alt+Shift+V   Open Settings
Ctrl+Shift+F4      Exit Viewpoint

You can change any of these key bindings in the settings menu.

UI Mode

UI Mode allows you to navigate any application that works with the mouse. When you activate it, Viewpoint will take a screenshot of the screen and identify the UI elements such as buttons, links, and menus. When it is finished, it will tell you how many elements it found, and your Tab and Shift+Tab keys will cycle through the UI it created as if it were a normal application. To activate an element, you can press Enter or Space to move the mouse and click on it. By default, Viewpoint will left-click an item, but you can add modifiers in order to change the behavior of the mouse click.

Modifier   Behavior
Shift      Double-Click
Ctrl       Right-Click
Alt        Drag (see below)

By default, after you activate an item, Viewpoint waits briefly and then scans the screen again. This behavior can be changed in settings. You can close UI Mode at any time by pressing Ctrl+Alt+Shift+/, which restores normal navigation.
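The navigation model described in this section can be sketched in a few lines of Python. This is only an illustration, not Viewpoint's actual implementation: the names `FocusRing` and `CLICK_BY_MODIFIER` are hypothetical, and wrapping from the last element back to the first is an assumption based on the word "cycle".

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Click behaviors from the modifier table; the string labels are illustrative.
CLICK_BY_MODIFIER = {
    None: "left-click",
    "shift": "double-click",
    "ctrl": "right-click",
    "alt": "drag",
}


@dataclass
class FocusRing:
    """Hypothetical model of the element list that Tab and Shift+Tab move through."""
    elements: List[str]
    index: int = 0

    def __post_init__(self) -> None:
        if not self.elements:
            raise ValueError("no UI elements were found")

    def next(self) -> str:
        # Tab: move forward; wrapping past the last element is an assumption.
        self.index = (self.index + 1) % len(self.elements)
        return self.elements[self.index]

    def previous(self) -> str:
        # Shift+Tab: move backward, wrapping past the first element.
        self.index = (self.index - 1) % len(self.elements)
        return self.elements[self.index]

    def activate(self, modifier: Optional[str] = None) -> Tuple[str, str]:
        # Enter or Space: return the click type and the focused element.
        return CLICK_BY_MODIFIER[modifier], self.elements[self.index]
```

For example, with three detected elements, pressing Tab three times returns focus to the first one, and `activate("ctrl")` reports a right-click on whichever element is focused.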

Dragging Elements

Viewpoint allows you to drag one UI element to another. This can be used for apps with drag-and-drop functionality. To drag an element, press Alt with Enter or Space as mentioned above, and Viewpoint will announce that it is dragging the item. Next, focus the element you want to drag it to and activate it normally with Enter or Space. Viewpoint will automatically perform the drag.
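The two-step drag described above behaves like a small state machine: the first activation with Alt remembers the source, and the next activation supplies the target. The sketch below is a hypothetical model of that flow, not Viewpoint's code. The class name `DragState` and the returned strings are assumptions made for illustration.

```python
from typing import Optional


class DragState:
    """Hypothetical two-step drag: Alt+Enter picks the source, Enter drops it."""

    def __init__(self) -> None:
        self.source: Optional[str] = None  # element currently being dragged

    def activate(self, element: str, alt: bool = False) -> str:
        if alt and self.source is None:
            # Step 1: remember the element to drag and announce it.
            self.source = element
            return f"dragging {element}"
        if self.source is not None:
            # Step 2: any later activation completes the pending drag.
            source, self.source = self.source, None
            return f"drag {source} to {element}"
        # No drag pending: an ordinary click.
        return f"click {element}"
```

After the drag completes, the state resets, so the next Enter or Space is an ordinary click again.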

Tips For UI Mode

Here are some tips to help you get the most out of UI Mode:

OCR Mode

OCR Mode will read all text visible on screen. When activated, Viewpoint will take a screenshot and identify all of the text with AI. The result will be shown in a dialog with the option to copy it to the clipboard for future use. Note: The result dialog may not immediately gain focus. If this happens, you will need to focus it with Alt+Tab.

Query Mode

Query Mode allows you to ask the AI questions about what is on screen, as well as locate specific UI elements. When activated, Viewpoint will first take a screenshot, then a dialog will appear where you can type your query.

After you submit your question, the answer will be presented in one of two ways. If the AI determines that you were asking it to find one or more UI elements, the answer will be spoken by your screen reader, and you will be placed into UI Mode with only the requested elements. In this case, all of the UI Mode keys apply. If there were no UI elements to find, the answer will appear in a dialog similar to the OCR results dialog. You can copy the text, and use the close button to dismiss it. Note: The query and result dialogs may not immediately gain focus, and you may need to use Alt+Tab to focus them.

Prompt Groups

If you have certain prompts that you send to Query Mode frequently, you may wish to save them so that you don't have to type them every time. Viewpoint allows you to do this with prompt groups. You can create as many groups as you need, and each group has ten prompt presets available. You can configure the groups in settings. Below are the keys used to work with prompt groups.

Key                 Action
Ctrl+Alt+PageUp     Previous Prompt Group
Ctrl+Alt+PageDown   Next Prompt Group
Ctrl+Alt+1-0        Activate prompts 1 through 10
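One way to picture prompt groups is as a list of named groups, each holding ten prompt slots, with the number row selecting a slot in the current group. The Python sketch below is a hypothetical model only. The names `PromptGroup`, `PromptGroups`, and `slot_for_digit` are made up for illustration, and two behaviors are assumptions: that the 0 key selects prompt 10, and that moving past the last group wraps around to the first.

```python
from dataclasses import dataclass, field
from typing import List

PROMPTS_PER_GROUP = 10  # each group holds ten presets


@dataclass
class PromptGroup:
    """Hypothetical container for one named group of ten prompts."""
    name: str
    prompts: List[str] = field(default_factory=lambda: [""] * PROMPTS_PER_GROUP)


def slot_for_digit(digit: str) -> int:
    """Map the digit in Ctrl+Alt+<digit> to a zero-based slot; "0" means prompt 10."""
    if digit not in list("1234567890"):
        raise ValueError(f"expected a single digit, got {digit!r}")
    return 9 if digit == "0" else int(digit) - 1


class PromptGroups:
    def __init__(self, groups: List[PromptGroup]) -> None:
        if not groups:
            raise ValueError("at least one group is required")
        self.groups = groups
        self.current = 0

    def next_group(self) -> PromptGroup:
        # Ctrl+Alt+PageDown; wrapping past the last group is an assumption.
        self.current = (self.current + 1) % len(self.groups)
        return self.groups[self.current]

    def previous_group(self) -> PromptGroup:
        # Ctrl+Alt+PageUp; wrapping past the first group is an assumption.
        self.current = (self.current - 1) % len(self.groups)
        return self.groups[self.current]

    def prompt_for(self, digit: str) -> str:
        # Look up the preset that Ctrl+Alt+<digit> would send to Query Mode.
        return self.groups[self.current].prompts[slot_for_digit(digit)]
```

Keeping the digit-to-slot mapping in its own function makes the special case for the 0 key easy to see.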

Tips For Query Mode

Query Mode is Viewpoint's most powerful feature, so here are some tips to get the most out of it.

PDF Reader

The PDF Reader will extract all text from any PDF, even if it is scanned or otherwise inaccessible. When activated, a file selection dialog will appear. Select the PDF you wish to read, and Viewpoint will begin using AI to extract all of the text. The progress will be displayed in a dialog, and when it is complete, a dialog will appear with the detected text, which you can copy to the clipboard for future use or read directly in the dialog. Notes:


Configuring Viewpoint

By pressing Ctrl+Alt+Shift+V, you can open Viewpoint's settings dialog, which features various options to customize your experience. The settings are divided into categories, and you can find more information on each one below. Note: The settings dialog may not immediately gain focus after activating it, so you may need to use Alt+Tab to focus it.

General

The general category contains various settings to control how Viewpoint behaves. Below is a list of settings with descriptions.

Setting                      Description
API Key                      Change the Gemini API key used by Viewpoint
Model                        Change the Gemini model
Rescan UI After Activation   Determines whether Viewpoint will automatically refresh UI Mode after selecting an element
Delay Before UI Rescan       Determines how long (in milliseconds) Viewpoint waits before refreshing the UI. This setting only matters when Rescan UI After Activation is enabled.
Close UI After Selection     Determines whether UI Mode automatically exits after selecting an element
Play Sounds                  Determines whether Viewpoint will play sounds such as the screenshot sound and processing sound
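The three UI Mode settings interact, so it may help to see them as a single decision. The following Python sketch is a hypothetical model, not Viewpoint's real configuration code. The field names, the 1000 ms placeholder delay, and the choice to let Close UI After Selection take precedence when both options are enabled are all assumptions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class GeneralSettings:
    """Hypothetical mirror of the UI Mode options in the General category."""
    rescan_ui_after_activation: bool = True   # rescanning is the documented default
    delay_before_ui_rescan_ms: int = 1000     # placeholder; the real default is not documented here
    close_ui_after_selection: bool = False    # assumed default


def after_activation(settings: GeneralSettings) -> Tuple[str, float]:
    """Return what happens after an element is activated, plus a delay in seconds."""
    if settings.close_ui_after_selection:
        # Assumption: closing UI Mode takes precedence over rescanning.
        return ("close", 0.0)
    if settings.rescan_ui_after_activation:
        # The delay only matters on this branch, as the table notes.
        return ("rescan", settings.delay_before_ui_rescan_ms / 1000)
    return ("stay", 0.0)
```

With rescanning disabled, the delay value is never consulted, which matches the note on Delay Before UI Rescan.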

Prompt Presets

The prompt presets section allows you to create and edit prompt groups for Query Mode. You can select, create, or delete groups, and when a group is selected, you can edit the ten preset prompts for that group.

Key Configuration

The key configuration section allows you to change most of Viewpoint's keyboard shortcuts. Select an action in the list, then type the key combination you wish to use in the edit field. Key combinations follow the format key1+key2+key3, where key1, key2, key3, etc. are key codes. For characters on the keyboard, the key code is simply the character in question. For special keys like modifiers and function keys, you write the name of the key between less than and greater than signs. For example, the shift key would be <shift>. Below are several commonly used modifiers.