LLM Agents


Agents remain a poorly explained concept in natural language machine learning. In this post I outline what they are and why they matter to you, my reader.

Completion and Chat

Large Language Models (LLMs) predict completions for sequences of text.

The best food is

A completion model might suggest "pizza" or "ice cream".

Systems like ChatGPT are built on top of completions by framing a chat between a user and an assistant.

User and Assistant are chatting. Assistant is kind, honest, and as politically inert as possible.
====
User: Hi AI. I am Alice.
Assistant: Hello Alice, I am an artificial intelligence. How is your day?
User: Mine is great. How is yours?

The completion model picks something along the lines of "Assistant: Mine is wonderful. How is the weather where you are?"
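In code, this chat framing is just string assembly around a completion call. A minimal sketch (the prompt format here mirrors the example above; any real system would also pass "User:" as a stop sequence so the model doesn't invent the user's next message too):

```python
def build_prompt(history):
    """Frame a chat transcript as a plain text-completion prompt."""
    preamble = ("User and Assistant are chatting. Assistant is kind, honest, "
                "and as politically inert as possible.\n====\n")
    lines = [f"{speaker}: {text}" for speaker, text in history]
    # End on the Assistant label so the model completes the assistant's reply.
    return preamble + "\n".join(lines) + "\nAssistant:"

history = [
    ("User", "Hi AI. I am Alice."),
    ("Assistant", "Hello Alice, I am an artificial intelligence. How is your day?"),
    ("User", "Mine is great. How is yours?"),
]
prompt = build_prompt(history)
```

Feed `prompt` to any completion model and the text it predicts next is the assistant's reply.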

Agents

While LLMs often display impressive feats of intelligence, they are extremely limited in a handful of key areas important to those of us building tools with them. Models like OpenAI's GPT-4 were trained on data up until September 2021. GPT-4 has no idea about the James Webb Space Telescope entering orbit in January 2022 or the FBI searching Mar-a-Lago in August 2022. They also struggle with queries that are not well-represented in their training data, such as "what day of the week was June 1, 1776?" or "find the square root of 923784". To get around these limitations and make LLMs more powerful, we engineers are doing something quite crafty.

A solution

We instruct the assistant that it is capable of doing something that it, in fact, cannot.

What?

For example, we tell the Assistant that it is capable of searching Google simply by writing its intention in the format google(...), with whatever it wants to search inside the parentheses.

Assistant can search google:
    google(weather los angeles)=72 degrees sunny
 or google(euro to dollar exchange rate)=1.07 US Dollar per Euro
 or google(Day of week june 1, 1776)=Saturday

Where is this going??

Stop Sequences

Now, the trick (or a minor simplification of it). We simply halt the text completion whenever we encounter a )=, the end of our sequence of magic words. Upon halting, we perform the search ourselves.

User: Can you get me surf conditions at First Point Malibu?
Assistant: Yes, of course! google(surf report first point malibu)=

At this point: stop! We, the programmers, intercept the use of the term google followed by parentheses, sneak off to Google with that exact query, and replace the call with the results.

User: Can you get me surf conditions at First Point Malibu?
Assistant: Yes, of course! Malibu's first point will be cleanest in the morning -- bring small wave gear. Tide push mid to later morning will help overall conditions into a choppy afternoon.
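The whole loop fits in a few lines. Here is a sketch of that interception, where `complete` and `google_search` are placeholders for whatever completion API and search backend you actually use (the prompt preamble reuses the tool instruction from earlier):

```python
import re

def run_agent(user_message, complete, google_search):
    """One agent turn: complete until the ')=' stop sequence, run the real
    search, splice the results back in, and resume until a final answer."""
    prompt = (
        "Assistant can search google:\n"
        "    google(weather los angeles)=72 degrees sunny\n"
        "====\n"
        f"User: {user_message}\n"
        "Assistant:"
    )
    while True:
        # The completion API halts just before emitting ')='.
        text = complete(prompt, stop=")=")
        prompt += text
        # Did the completion end mid tool call, e.g. '... google(some query'?
        call = re.search(r"google\(([^)]*)$", text)
        if call is None:
            return text.strip()  # no tool call left: this is the final answer
        # Perform the search ourselves and hand the model its "result".
        prompt += ")=" + google_search(call.group(1)) + "\n"
```

The model never knows the difference: from its perspective, it "said" google(...)= and the answer simply appeared in the transcript.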

Et voilà. To the user, the LLM functioned (transparently) as an agent on the user's behalf. We're currently seeing agents proliferate into spaces such as math, news, weather, program execution, and much more. An agent is just a technical bridge over the limitations of LLMs.


Copyright © Kevin Katz 2023
