Context AI

Published on 2024-12-10 by Jonathan Maiorana

A minimal proof-of-concept for an application that tries to identify concepts in live audio and provide information about those concepts.

This is a minimal proof-of-concept for an application that prompts the user to share their system audio and then returns a live stream of key concepts and definitions. It uses the Whisper speech-to-text and GPT-4o text models from the OpenAI API and is deployed on Fly.io. It is written in JavaScript, with React on the front end and Express on the back end.

The audio is sent to the back end in five-second chunks, and each chunk is forwarded to the Whisper speech-to-text model from OpenAI. Sending five-second chunks is very easy to implement but not ideal for transcription accuracy: the fixed-length chunks often lose context or break up related concepts. One potential improvement would be local audio analysis to find better moments to cut the audio. Another would be to switch to a streaming approach. In lieu of streaming, interleaving chunks, reconstructing context, and then retroactively updating the analysis could also improve accuracy.
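To make the "local audio analysis" idea a bit more concrete, here is a rough sketch of one way it could work: instead of always cutting at exactly five seconds, look a little either side of that mark and cut in the quietest short frame. This is not what the app currently does. The function name `findCutIndex`, the one-second search window, and the 20 ms frame size are all assumptions picked for illustration, and it assumes the audio is already available as mono float samples.

```javascript
// Hypothetical helper (not part of the current app): choose a cut point near
// the nominal chunk length by finding the quietest short frame.
// `samples` is mono PCM audio as an array of floats in [-1, 1].
function findCutIndex(samples, sampleRate, opts = {}) {
  const {
    targetSeconds = 5,   // nominal chunk length used in the post
    searchSeconds = 1,   // assumed: how far either side of the target to look
    frameSeconds = 0.02, // assumed: 20 ms analysis frames
  } = opts;

  const frameSize = Math.max(1, Math.round(frameSeconds * sampleRate));
  const target = Math.round(targetSeconds * sampleRate);
  const radius = Math.round(searchSeconds * sampleRate);
  const start = Math.max(0, target - radius);
  const end = Math.min(samples.length, target + radius);

  // Not enough audio buffered to reach the search window: cut at the end.
  if (end - start < frameSize) return samples.length;

  let bestIndex = Math.min(target, samples.length);
  let bestEnergy = Infinity;
  for (let i = start; i + frameSize <= end; i += frameSize) {
    let sum = 0;
    for (let j = i; j < i + frameSize; j++) sum += samples[j] * samples[j];
    const energy = sum / frameSize; // mean squared amplitude of this frame
    if (energy < bestEnergy) {
      bestEnergy = energy;
      bestIndex = i + Math.floor(frameSize / 2); // cut in the middle of the frame
    }
  }
  return bestIndex;
}
```

The caller would send `samples.slice(0, cut)` for transcription and keep the remainder as the start of the next chunk. Energy alone is a crude stand-in for "a pause in speech", so background music or noise would likely need something smarter.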

The current implementation takes the text extracted from each audio sample, passes it to GPT-4o, and requests a structured JSON response. The prompt could definitely use some more tweaking. Currently the concepts it identifies are a bit too generic, though this is likely also due to the rigid five-second audio chunking. The JSON response is returned to the front end and displayed as a stream of concept+description boxes.
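For anyone curious what a structured JSON request might look like, here is a minimal sketch of a request body for the OpenAI Chat Completions API using its `json_schema` response format, plus a defensive parser for the reply. The schema, the field names `concept` and `description`, the system prompt text, and the helper names are assumptions for illustration, not the app's actual prompt or code.

```javascript
// Hypothetical schema: the shape and field names are assumed for illustration.
const conceptSchema = {
  type: "object",
  properties: {
    concepts: {
      type: "array",
      items: {
        type: "object",
        properties: {
          concept: { type: "string" },
          description: { type: "string" },
        },
        required: ["concept", "description"],
        additionalProperties: false,
      },
    },
  },
  required: ["concepts"],
  additionalProperties: false,
};

// Build the request body for one transcribed chunk. Nothing is sent here;
// the body would be passed to the Chat Completions endpoint by the caller.
function buildConceptRequest(transcript) {
  return {
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        // Placeholder prompt, not the one the app uses.
        content:
          "Identify the specific concepts mentioned in the transcript and define each in one sentence.",
      },
      { role: "user", content: transcript },
    ],
    response_format: {
      type: "json_schema",
      json_schema: { name: "concepts", strict: true, schema: conceptSchema },
    },
  };
}

// Parse the model's message content, dropping anything that does not match
// the expected shape rather than throwing.
function parseConcepts(content) {
  let data;
  try {
    data = JSON.parse(content);
  } catch {
    return [];
  }
  if (!data || !Array.isArray(data.concepts)) return [];
  return data.concepts.filter(
    (c) => c && typeof c.concept === "string" && typeof c.description === "string"
  );
}
```

Keeping the parser tolerant means one malformed response just produces no boxes for that chunk instead of breaking the stream on the front end.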

A bit of effort was put into making the UI/UX not totally lame. The Motion JS animation library is used on the front end to make it easier to track new concepts as they stream in. Implementing some basic UI animations with Motion was surprisingly straightforward. A bit of faux delay is added to each new concept before it is displayed so that they appear one after another instead of all at once as responses come back.
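The faux delay can be thought of as a tiny scheduling problem: concepts arrive in bursts, but each one should be revealed at least some minimum gap after the previous one. Here is a small sketch of that idea as a pure function. The name `scheduleReveals` and the 400 ms default gap are assumptions, and the app's actual timing logic may differ.

```javascript
// Hypothetical helper: given the times (in ms) at which concepts arrived,
// return the times at which to reveal them so that consecutive reveals are
// at least `minGapMs` apart and never earlier than their arrival.
function scheduleReveals(arrivalTimes, minGapMs = 400) {
  const revealTimes = [];
  let last = -Infinity; // no previous reveal yet
  for (const arrived of arrivalTimes) {
    const reveal = Math.max(arrived, last + minGapMs);
    revealTimes.push(reveal);
    last = reveal;
  }
  return revealTimes;
}

// Three concepts arriving together, then one much later:
console.log(scheduleReveals([0, 0, 0, 2000]));
// → [ 0, 400, 800, 2000 ]
```

On the front end, the difference between a concept's reveal time and the current time could be passed to `setTimeout`, or used as an animation delay, before adding the concept to the visible list.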

There are no tests, nothing is refined, and it currently burns through OpenAI credits like there's no tomorrow, but it works and definitely counts as a proof-of-concept.

Give Context AI a whirl and let me know what you think.