VoxPilot: Private Voice-Augmented AI Assistant for Hands-Free Web Browsing

TL;DR

VoxPilot is a privacy-first, on-device Chrome extension that lets you browse the web hands-free using natural voice or text prompts. It supports navigation, content summarization, and Q&A without sending data to the cloud. Designed for accessibility and productivity, VoxPilot interprets flexible commands in real time using Gemini Nano and browser APIs. Form filling is coming soon!


Introduction

VoxPilot is a privacy-first Chrome extension that lets users navigate, summarize, and interact with web content hands-free using natural voice or text prompts.


Key Goals


Team


Motivation

Many assistive tools still require users with disabilities to memorize rigid command scripts, which makes web browsing frustrating and limiting. VoxPilot removes that barrier by accepting natural, flexible voice or text prompts for navigation, summarization, and Q&A, while keeping the experience privacy-first.


How It Works

VoxPilot runs entirely on-device, combining built-in browser APIs for speech-to-text (STT) and text-to-speech (TTS), Gemini Nano for natural language understanding, and DOM-aware action execution. Instead of relying on server calls or rigid scripts, it interprets flexible prompts and maps them directly to browser interactions.
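To make the "prompt to browser interaction" step concrete, here is a minimal sketch of how a model's reply might be validated before anything touches the page. It assumes the language model is asked to answer with a small JSON object. The `Action` type, its four kinds, and the `parseAction` helper are hypothetical names chosen for illustration, since the post does not describe VoxPilot's actual schema or code.

```typescript
// Hypothetical action schema (an assumption for illustration, not VoxPilot's real one).
type Action =
  | { kind: "navigate"; url: string }
  | { kind: "scroll"; direction: "up" | "down" }
  | { kind: "summarize" }
  | { kind: "answer"; question: string };

// Turn the raw text returned by the language model into a validated Action.
// Returns null when the text is not valid JSON or does not match the schema,
// so the caller can ask the user to rephrase instead of acting on bad output.
function parseAction(raw: string): Action | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const obj = data as Record<string, unknown>;

  switch (obj.kind) {
    case "navigate": {
      const url = obj.url;
      // Only allow http(s) targets; reject javascript: and other schemes.
      if (typeof url === "string" && /^https?:\/\//.test(url)) {
        return { kind: "navigate", url };
      }
      return null;
    }
    case "scroll": {
      const direction = obj.direction;
      if (direction === "up" || direction === "down") {
        return { kind: "scroll", direction };
      }
      return null;
    }
    case "summarize":
      return { kind: "summarize" };
    case "answer": {
      const question = obj.question;
      if (typeof question === "string" && question.trim() !== "") {
        return { kind: "answer", question };
      }
      return null;
    }
    default:
      return null;
  }
}
```

For example, `parseAction('{"kind":"scroll","direction":"down"}')` yields a scroll action, while free-form text or an unknown `kind` yields `null`. Validating model output against a closed set of actions is one way to keep a flexible prompt interface from triggering unintended page interactions.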


Flow:


Designs

VoxPilot Prompt Engineering Design


Video


Outputs

Below are some sample outputs demonstrating VoxPilot’s capabilities in action.

Q&A Prompting

Q&A Prompting Output

Web Navigation

Web Navigation Output


Current Limitations


Resources

VoxPilot PPT, GitHub Link (Coming Soon)

