What Is Viewpoint?
Viewpoint is an AI-powered screen reader assistant designed to handle tasks that traditional screen readers struggle with. Powered by Google's latest Gemini models, it can analyze and understand the elements on your screen and let you interact with them. Viewpoint contains four modes:
- UI Mode: Navigate inaccessible interfaces.
- OCR Mode: Read all text on screen.
- Query Mode: Ask questions and find items on screen.
- PDF Reader: Read scanned or otherwise inaccessible PDF files.
Setup
To start using Viewpoint, you can download it from the link in the navigation bar (Windows only). This will download an installer that you can run to install the program.
When you launch the program for the first time, it will prompt you for a Gemini API key. You can obtain a key for free from Google AI Studio, or by clicking the link below:
Get An API Key
Google provides a free tier that gives you 20 requests per day with each model. Viewpoint supports multiple models with varying degrees of accuracy, and you can switch between them in settings to get the most out of your API key.
After pasting your key into the dialog and pressing Enter, Viewpoint will run in the background and be ready to use.
Using Viewpoint
Global Hotkeys
Below is a list of common hotkeys that apply to any mode in Viewpoint.
| Key | Action |
|---|---|
| Ctrl+Shift+\ | Switch Modes |
| Ctrl+Shift+/ | Activate Current Mode |
| Ctrl+Alt+Shift+V | Open Settings |
| Ctrl+Shift+F4 | Exit Viewpoint |
You can change any of these key bindings in the settings menu.
UI Mode
UI Mode allows you to navigate any application that works with the mouse. When you activate it, Viewpoint takes a screenshot of the screen and identifies UI elements such as buttons, links, and menus. When it is finished, it tells you how many elements it found, and the Tab and Shift+Tab keys cycle through the UI it created as if it were a normal application. To activate an element, press Enter or Space, and Viewpoint will move the mouse to the element and click it. By default, Viewpoint left-clicks the item, but you can hold a modifier key while pressing Enter or Space to change the type of click.
| Modifier | Behavior |
|---|---|
| Shift | Double-Click |
| Ctrl | Right-Click |
| Alt | Drag (see below) |
By default, after you activate an item, Viewpoint waits briefly and then scans the screen again. This behavior can be changed in settings. You can close UI Mode at any time by pressing Ctrl+Alt+Shift+/, which restores normal navigation.
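The modifier table above can be read as a simple lookup from the held modifier to a click type. The sketch below is only an illustration of that mapping, not Viewpoint's actual code: the names `ClickAction`, `MODIFIER_ACTIONS`, and `resolve_click_action` are hypothetical, and rejecting combinations of several modifiers is an assumption, since the documentation does not say how they are handled.

```python
from enum import Enum


class ClickAction(Enum):
    LEFT_CLICK = "left-click"
    DOUBLE_CLICK = "double-click"
    RIGHT_CLICK = "right-click"
    DRAG = "drag"


# Mirrors the modifier table in this section (names are illustrative).
MODIFIER_ACTIONS = {
    "shift": ClickAction.DOUBLE_CLICK,
    "ctrl": ClickAction.RIGHT_CLICK,
    "alt": ClickAction.DRAG,
}


def resolve_click_action(held_modifiers):
    """Return the click type for Enter/Space given the held modifier names."""
    held = {name.lower() for name in held_modifiers}
    if not held:
        # No modifier held: the documented default is a left-click.
        return ClickAction.LEFT_CLICK
    if len(held) == 1:
        (modifier,) = held
        if modifier in MODIFIER_ACTIONS:
            return MODIFIER_ACTIONS[modifier]
    # Assumption: anything else (unknown or multiple modifiers) is unsupported.
    raise ValueError(f"unsupported modifier combination: {sorted(held)}")
```

For example, `resolve_click_action(["Ctrl"])` yields `ClickAction.RIGHT_CLICK`, matching the Ctrl row of the table.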
Dragging Elements
Viewpoint allows you to drag one UI element onto another, which is useful in apps with drag-and-drop functionality. To start a drag, focus the element you want to move and hold Alt while pressing Enter or Space, as described above. Viewpoint will announce that it is dragging the item. Next, focus the element you want to drop it on and activate it normally with Enter or Space. Viewpoint will then perform the drag automatically.
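The two-step drag described above behaves like a small state machine: the first activation with Alt remembers the source, and the next normal activation completes the drag. The following is a minimal sketch under that reading. `DragTracker` and the returned messages are hypothetical and do not reflect Viewpoint's real announcements or internals.

```python
class DragTracker:
    """Tracks a pending drag between two activations (illustrative only)."""

    def __init__(self):
        self.source = None  # element picked up with Alt, if any

    def activate(self, element, alt_held=False):
        """Handle Enter/Space on an element and describe what would happen."""
        if alt_held:
            # Step 1: Alt+Enter or Alt+Space marks the element as the drag source.
            self.source = element
            return f"dragging {element}"
        if self.source is not None:
            # Step 2: a normal activation drops the pending source here.
            source, self.source = self.source, None
            return f"dragged {source} to {element}"
        # No drag pending: an ordinary click.
        return f"clicked {element}"
```

Calling `activate("File", alt_held=True)` and then `activate("Trash")` on the same tracker walks through both steps, after which the tracker is back in its idle state.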
Tips For UI Mode
Here are some tips to help you get the most out of UI Mode:
- After activating the mode, do not switch windows, even if you are going to switch back. If the window appears in a different location than it did when you first scanned, Viewpoint will not click on the correct item, as it relies on item positions not changing.
- The Gemini 3 models have significantly higher accuracy than the 2.5 models, so they are recommended for the best experience. Older models will miss more of their clicks and could end up clicking on the wrong item.
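To see why the first tip matters, it helps to picture how a detected element might become a click position. The sketch below assumes the model reports each element as a `[ymin, xmin, ymax, xmax]` box normalized to a 0-1000 range, which is the format Gemini's object detection documentation describes; whether Viewpoint uses exactly this format is an assumption, and all function names here are hypothetical.

```python
def box_center_in_pixels(box, screen_width, screen_height):
    """Convert a normalized [ymin, xmin, ymax, xmax] box to a pixel click point."""
    ymin, xmin, ymax, xmax = box
    x = (xmin + xmax) * screen_width / 2000   # midpoint, scaled from 0-1000
    y = (ymin + ymax) * screen_height / 2000
    return round(x), round(y)


def shift_box(box, dx, dy):
    """Return the box after its window moves by (dx, dy) in normalized units."""
    ymin, xmin, ymax, xmax = box
    return [ymin + dy, xmin + dx, ymax + dy, xmax + dx]


def point_in_box(point, box, screen_width, screen_height):
    """Check whether a pixel point falls inside a normalized box."""
    x, y = point
    ymin, xmin, ymax, xmax = box
    return (
        xmin * screen_width / 1000 <= x <= xmax * screen_width / 1000
        and ymin * screen_height / 1000 <= y <= ymax * screen_height / 1000
    )


# A button found during the scan, and the click point stored for it.
button = [100, 200, 200, 400]
click = box_center_in_pixels(button, 1920, 1080)

# If the window later reappears further right, the stored point misses the button.
moved_button = shift_box(button, dx=300, dy=0)
still_hits = point_in_box(click, moved_button, 1920, 1080)
```

Here `click` is computed once from the screenshot, so after the window moves, `still_hits` is `False` even though the button is still visible on screen.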
OCR Mode
OCR Mode will read all text visible on screen. When activated, Viewpoint will take a screenshot and identify all of the text with AI. The result will be shown in a dialog with the option to copy it to the clipboard for future use. Note: The result dialog may not immediately gain focus. If this happens, you will need to focus it with Alt+Tab.
Query Mode
Query Mode allows you to ask the AI questions about what is on screen, as well as locate specific UI elements. When activated, Viewpoint first takes a screenshot, and then a dialog appears where you can type your query. After you submit your question, the answer is presented in one of two ways. If the AI determines that you asked it to find one or more UI elements, your screen reader speaks the answer and you are placed into UI Mode with only the requested elements. In this case, all of the UI Mode keys apply. If there were no UI elements to find, the answer appears in a dialog similar to the OCR results dialog. You can copy the text and use the close button to dismiss it. Note: The query and result dialogs may not immediately gain focus, and you may need to use Alt+Tab to focus them.
Prompt Groups
If you have certain prompts that you send to Query Mode frequently, you may wish to save them so that you don't have to type them every time. Viewpoint allows you to do this with prompt groups. You can create as many groups as you need, and each group has ten prompt presets available. You can configure the groups in settings. Below are the keys used to work with prompt groups.
| Key | Action |
|---|---|
| Ctrl+Alt+PageUp | Previous Prompt Group |
| Ctrl+Alt+PageDown | Next Prompt Group |
| Ctrl+Alt+1-0 | Activate prompts 1 through 10 |
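One way to picture prompt groups is as a list of groups, each holding ten prompt slots, with the digit keys indexing into the current group. The sketch below is an illustration only. `PromptGroup` and `PromptGroups` are hypothetical names, and wrapping around at the ends of the group list is an assumption that the documentation does not confirm.

```python
from dataclasses import dataclass, field

PRESETS_PER_GROUP = 10  # each group has ten prompt presets


@dataclass
class PromptGroup:
    name: str
    prompts: list = field(default_factory=lambda: [""] * PRESETS_PER_GROUP)


class PromptGroups:
    """Holds the groups and tracks which one is current (illustrative only)."""

    def __init__(self, groups):
        if not groups:
            raise ValueError("at least one prompt group is required")
        self.groups = list(groups)
        self.index = 0

    def next_group(self):
        # Ctrl+Alt+PageDown; wrapping past the last group is an assumption.
        self.index = (self.index + 1) % len(self.groups)
        return self.groups[self.index]

    def previous_group(self):
        # Ctrl+Alt+PageUp; wrapping before the first group is an assumption.
        self.index = (self.index - 1) % len(self.groups)
        return self.groups[self.index]

    def prompt_for_digit(self, digit):
        """Map Ctrl+Alt+<digit> to a preset: 1-9 pick slots 1-9, 0 picks slot 10."""
        if digit not in range(10):
            raise ValueError("digit must be between 0 and 9")
        slot = 10 if digit == 0 else digit
        return self.groups[self.index].prompts[slot - 1]
```

With this layout, the 0 key sits after 9 on the number row and therefore selects the tenth preset, which matches the "1-0" ordering in the table.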
Tips For Query Mode
Query Mode is Viewpoint's most powerful feature, so here are some tips to get the most out of it.
- You can use Query Mode to click on things that are not detected in UI Mode. To do this, ask Viewpoint to locate the item you are looking for and include the words "as a UI element" in your prompt. For example: "Locate the picture of the building as a UI element."
- If you use Query Mode for several apps, it is recommended to create a prompt group for each app to keep all of your presets organized.
PDF Reader
The PDF Reader will extract all text from any PDF, even if it is scanned or otherwise inaccessible. When activated, a file selection dialog will appear. Select the PDF you wish to read, and Viewpoint will begin using AI to extract all of the text. The progress will be displayed in a dialog, and when it is complete, a dialog will appear with the detected text, which you can copy to the clipboard for future use or read directly in the dialog. Notes:
- The dialogs may not immediately gain focus. If this happens, you may need to use Alt+Tab to focus them.
- When the scan completes, you may need to press the close button on the progress dialog to get the result dialog to appear.
Configuring Viewpoint
By pressing Ctrl+Alt+Shift+V, you can open Viewpoint's settings dialog, which features various options to customize your experience. The settings are divided into categories, and you can find more information on each one below. Note: The settings dialog may not immediately gain focus after activating it, so you may need to use Alt+Tab to focus it.
General
The general category contains various settings to control how Viewpoint behaves. Below is a list of settings with descriptions.
| Setting | Description |
|---|---|
| API Key | Change the Gemini API key used by Viewpoint |
| Model | Change the Gemini model |
| Rescan UI After Activation | Determines whether Viewpoint will automatically refresh UI Mode after selecting an element |
| Delay Before UI Rescan | Determines how long (in milliseconds) Viewpoint waits before refreshing the UI. This setting only matters when Rescan UI After Activation is enabled. |
| Close UI After Selection | Determines whether UI Mode automatically exits after selecting an element |
| Play Sounds | Determines whether Viewpoint will play sounds such as the screenshot sound and processing sound |
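The three UI-related settings in the table interact after each activation, and a small sketch can make that interaction concrete. Everything below is hypothetical: the field names, the 1000 ms default, and the choice to let Close UI After Selection take precedence over rescanning are assumptions for illustration, not documented behavior. Only the default of rescanning after activation comes from the UI Mode section.

```python
from dataclasses import dataclass


@dataclass
class GeneralSettings:
    rescan_ui_after_activation: bool = True   # documented default: rescan
    delay_before_ui_rescan_ms: int = 1000     # hypothetical default value
    close_ui_after_selection: bool = False    # hypothetical default value


def after_activation(settings):
    """Return (next_step, delay_ms) to apply once an element has been activated."""
    if settings.close_ui_after_selection:
        # Assumption: closing UI Mode wins over rescanning.
        return ("close", 0)
    if settings.rescan_ui_after_activation:
        # The delay only matters when rescanning is enabled.
        return ("rescan", settings.delay_before_ui_rescan_ms)
    return ("keep", 0)
```

For instance, `after_activation(GeneralSettings(delay_before_ui_rescan_ms=250))` returns `("rescan", 250)`, while disabling both options leaves the current element list in place.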
Prompt Presets
The prompt presets section allows you to create and edit prompt groups for Query Mode. You can select, create, or delete groups, and when a group is selected, you can edit the ten preset prompts for that group.
Key Configuration
The key configuration section allows you to change most of Viewpoint's keyboard shortcuts. Select an action in the list, then type the key combination you wish to use in the edit field. Key combinations follow the format key1+key2+key3, where key1, key2, key3, and so on are key codes. For a character key, the key code is simply the character itself. For special keys such as modifiers and function keys, write the name of the key between less-than and greater-than signs (< and >). For example, the shift key would be
through
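The key1+key2+key3 format described in this section can be split into parts with a few lines of code. The sketch below is a hedged illustration of the stated rules only: `parse_key_combination` is a hypothetical helper, the exact key names Viewpoint accepts (such as `ctrl` or `shift` used in the usage note) are assumptions, and the sketch makes no attempt to handle a literal `+` key.

```python
def parse_key_combination(combo):
    """Split a combination like key1+key2+key3 into (kind, value) pairs.

    Special keys are written between < and >; any other key code must be
    a single character.
    """
    keys = []
    for token in combo.split("+"):
        token = token.strip()
        if not token:
            raise ValueError(f"empty key code in {combo!r}")
        if len(token) > 2 and token.startswith("<") and token.endswith(">"):
            # Special key: keep the name found between the angle brackets.
            keys.append(("special", token[1:-1].lower()))
        elif len(token) == 1:
            # Character key: the key code is the character itself.
            keys.append(("char", token.lower()))
        else:
            raise ValueError(f"unrecognized key code {token!r}")
    return keys
```

Under these assumptions, a binding equivalent to Ctrl+Shift+/ would be written with two bracketed modifier names followed by the `/` character, and the function would return two `"special"` entries and one `"char"` entry.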