Mistral Agents (mistral.ai)
161 points by eitanturok 10 months ago | 42 comments



Wait, this 'Agents' thing seems to be just a way to couple a system prompt and temperature to a model. That's it?

What's the difference from sending the system prompt in the API call, as usual?

Edit: Oh, missed that: "We’re working on connecting Agents to tools and data sources."


There's this massive gap between those who can call an API and those who can't. If you can't, then you get the same aspirational-AGI chat UI as everyone else.

I agree with the implied statement that 'Agents' doesn't feel right. Reminds me more of the projects that put the model in a loop.

It does feel like a really tough thing to name and market. I'm about to release an app for this across all providers; I call it "Scripts" with "Steps" like chat, search, retrieval, art...


I implemented a number of enterprise Conversational AI tools for customer service back before the GenAI craze started, and we used to just call it service orchestration and data/application integration. The chatbot was used to figure out what the customer wanted to do, and from there it was just about automating some business workflow. Customer wants to pay their bill: the bot needs to pull their current balance, get their payment information, and process the payment. Customer wants to return a product: the bot needs to retrieve the order info, initiate an RMA, process a refund, etc. These were all well-established business processes that the bot would execute by making API calls or kicking off an RPA routine.

The "agent" talk sounds to me like "let the LLM figure out what it needs to do and then do it," which I'm not even sure is the right approach for most enterprise use cases; it's how you get people tricking chatbots into selling them a new car for $1.
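
Roughly, the pattern was something like this sketch - classify_intent() stands in for a real NLU model, and all the names here are hypothetical, not anything we actually shipped:

    # Sketch of the pre-GenAI pattern: the bot only classifies intent;
    # a fixed, deterministic workflow does the actual work.
    # classify_intent() stands in for a real NLU model; names are hypothetical.

    def classify_intent(utterance: str) -> str:
        # A real system would call an intent classifier here.
        text = utterance.lower()
        if "pay" in text:
            return "pay_bill"
        if "return" in text:
            return "return_product"
        return "unknown"

    def pay_bill(customer_id: str) -> str:
        # Deterministic workflow: pull balance, get payment info, process payment.
        return f"Payment processed for {customer_id}"

    def return_product(customer_id: str) -> str:
        # Deterministic workflow: retrieve order, initiate RMA, process refund.
        return f"RMA created for {customer_id}"

    WORKFLOWS = {"pay_bill": pay_bill, "return_product": return_product}

    def handle(utterance: str, customer_id: str) -> str:
        workflow = WORKFLOWS.get(classify_intent(utterance))
        return workflow(customer_id) if workflow else "Sorry, I didn't get that."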


Why is tool picking such a hard functionality for these vendors to implement?

Seems like a lot of the heavy lifting will come from third parties making their APIs compatible with LLMs.

There should be some sort of extension-type app where people can build extensions or "tools" for LLMs and share them (I guess OpenAI sort of attempts to do this). Say I want to build one for Toast to order food. I could collect the info needed to run that tool (Toast account info or whatever) and an API key for an appropriate LLM, then use that configuration to build middleware that turns natural language into an order and sends the request to Toast via some function call.
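
To make that concrete, here's a rough sketch of what such a tool definition might look like in the JSON-schema format OpenAI-style chat APIs accept for function calling - the Toast field names and middleware behavior are entirely made up:

    # Hypothetical "order food via Toast" tool in OpenAI-style
    # function-calling format. Field names are invented for illustration.
    order_food_tool = {
        "type": "function",
        "function": {
            "name": "place_toast_order",
            "description": "Place a food order with a restaurant via Toast.",
            "parameters": {
                "type": "object",
                "properties": {
                    "restaurant_id": {"type": "string"},
                    "items": {
                        "type": "array",
                        "items": {"type": "string"},
                        "description": "Menu item names to order.",
                    },
                },
                "required": ["restaurant_id", "items"],
            },
        },
    }

    def place_toast_order(restaurant_id: str, items: list) -> dict:
        # Middleware: translate the model's structured call into a real
        # Toast API request (an authenticated HTTP POST would go here).
        return {"status": "submitted", "restaurant": restaurant_id, "items": items}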

This seems very doable, and I don't understand why there aren't a million of these "tools" already built into some LLM-centric tool aggregator/web store. What is the holdup? Is it just third parties not wanting to hand out API access for things that require payment to applications controlled by LLMs? Would these third parties rather run their own assistant tools? I'd imagine that a central LLM-extension aggregator could have a central mechanism for payment methods that the LLM has access to, which could be used to implement safeguards.

Or is it simply that any assistant-type task that could be easily generalized, like ordering food, booking a flight, or inputting calendar events, is easier to do yourself than to ask an LLM to do for you?


A lot of models are hit-and-miss when it comes to invoking tools. I have Llama 3 8B set up with a weather tool, but half the time it just hallucinates made-up info instead of running the tool.

I imagine the big sites have similar issues and it undermines customer trust when they're given false information.


Genuine question, are there any examples of agents in production?


Depends on your definition. I created a mapping application that lets one navigate and style the map with natural language (more or less), as well as some prototype database interaction. When the user inputs a prompt, it gets sent to an "agent" whose sole purpose is to send a request to the API with a custom system prompt containing few-shot examples, stating something along the lines of "determine which agent should handle this request...only respond with one of [NavigationAgent, StyleAgent, ...]". When the response comes back, the prompt is then sent to the proper agent to handle the request. Each agent has function definitions for returning the parameters used to manipulate the map.

I don't use any special libraries like LangChain or anything; it's just regular API calls organized into classes, each with a specific system prompt, function definitions, and some user-prompt context ingestion when required (e.g. the current extent of the map).
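
In case it helps anyone, the routing step looks roughly like this - chat() stands in for a plain chat-completions API call, and the prompts and class bodies are simplified, not my actual code:

    # Simplified sketch of the two-step routing pattern described above.
    # chat() stands in for a plain chat-completions API call.

    ROUTER_SYSTEM = ("Determine which agent should handle this request. "
                     "Only respond with one of [NavigationAgent, StyleAgent].")

    def chat(system: str, user: str) -> str:
        # Replace with a real API call; stubbed so the sketch runs.
        return "NavigationAgent"

    class NavigationAgent:
        SYSTEM = "You navigate the map by calling the provided functions."
        def handle(self, prompt: str, extent: dict) -> str:
            # Ingest the current map extent as extra user-prompt context.
            return chat(self.SYSTEM, f"Current extent: {extent}\n{prompt}")

    class StyleAgent:
        SYSTEM = "You restyle the map by calling the provided functions."
        def handle(self, prompt: str, extent: dict) -> str:
            return chat(self.SYSTEM, prompt)

    AGENTS = {"NavigationAgent": NavigationAgent(), "StyleAgent": StyleAgent()}

    def route(prompt: str, extent: dict) -> str:
        choice = chat(ROUTER_SYSTEM, prompt).strip()
        return AGENTS.get(choice, AGENTS["NavigationAgent"]).handle(prompt, extent)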


Why can’t this be done with functions? I don’t see why you need the complexity and unpredictability of using “agents” to do that.

I might be missing something?


I call them agents just because. I'm just doing function calling with custom system prompts for the most part. It's hardly complex; I wrote most of it in a few days.

EDIT: I should add that the first step is used to cut down on the number of function definitions I need to send to the model on each user prompt. Navigating a map can be done with as few as four function definitions but styling a map gets out of control fast (google "Mapbox Style Specification" if you want to see why).


Not quite sure if this is what you're looking for, but Amazon has hidden their description/review search box behind an "Ask Rufus about this product" box. For example, go here and Ctrl+F for "Looking for specific info?":

https://www.amazon.com/PolyScience-Temperature-Controlled-Co...


Doctors are already using agents for scheduling. These agents can access the calendar and talk to patients to arrange scheduling, changes, conflicts, etc.


This use-case actually makes a ton of sense. How many other low-hanging fruit applications for agents are out there like that?

Although, now that I think about it, a lot of doctors' practices have a MyChart-style portal where you can schedule an appointment yourself. Why does an LLM need to be involved in that process? I guess for people who still want to schedule over the phone, the LLM agent makes sense. Kind of, assuming you don't have any special-case problems. Which patients most likely do, if they're calling in. Is an LLM actually a good solution here?


The LLM works great for this use case for doctors in Brazil, where a lot of the scheduling happens on WhatsApp. Hooking up a WhatsApp bot to ChatGPT and Google Calendar is incredibly powerful.


Yes, here's a directory of many of them https://staf.ai


Depends what you mean by "agents".


The article is about LLM agents, so I’m asking in context.

To clarify more: I see frameworks like CrewAI and similar, with tools even from Microsoft to define these “agents” quickly. But when I tried them, I noticed they are no more than chain-of-thought (CoT) functions to ask/extract/generate based on user input and function output.

As such, they can be quite unpredictable, hence my question of examples of LLM agents being used in production. I just don’t see their value, but I might be missing something so wanted to see examples to understand more.


That’s the problem: the term “LLM agents” doesn’t have a clear, unambiguous meaning either.


The SOTA models have excellent instruction-following capability and can output in any format you want, including JSON.

That's all you need from the model to be able to use it in an agent. Tell it to output commands in a given JSON format.
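
Roughly like this - the command vocabulary here is invented, but the parse-and-validate step is what matters:

    import json

    # Constrain the model to a fixed JSON command shape, then validate
    # before acting. The command set here is invented for illustration.
    SYSTEM = ('You are an agent. Respond ONLY with JSON of the form '
              '{"command": "<search|open_url|finish>", "argument": "<string>"}')

    ALLOWED = {"search", "open_url", "finish"}

    def parse_command(model_output: str) -> dict:
        cmd = json.loads(model_output)  # raises ValueError on malformed JSON
        if cmd.get("command") not in ALLOWED:
            raise ValueError(f"unknown command: {cmd}")
        return cmd

    print(parse_command('{"command": "search", "argument": "weather in Paris"}'))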

I assume that Mistral's API already allowed you to define the system prompt, right?


This is basically Mistral's attempt at custom GPTs?


I've been complaining like a broken record for months about how vague and loosely defined the term "agents" is. This is not going to help.


It is quite simple to explain: it is a while(1) and some if.
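
Taken literally (llm() and the tool table here are stand-ins, not any vendor's API):

    # An "agent", reduced to its skeleton: a while(1) and some if.
    # llm() is a stand-in for a real chat-completions call.

    def llm(history: list) -> dict:
        return {"type": "final", "content": "done"}  # stub so this runs

    TOOLS = {"search": lambda q: f"results for {q!r}"}

    def agent(task: str) -> str:
        history = [task]
        while True:                       # the while(1)
            step = llm(history)
            if step["type"] == "final":   # ...and some if
                return step["content"]
            if step["type"] == "tool":
                history.append(TOOLS[step["name"]](step["args"]))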


The problem is that if you ask two people you're likely to get two different answers. And those people probably incorrectly think that their version of a definition for "agents" is the same as everybody else's.


Yeah, but there is still no while(1) here.


One sad part of the GenAI wave happening right now is that we're past the golden age of open APIs.

It's hard to read data with widespread anti-abuse checks (CAPTCHAs), lack of open-format data (RSS support being spotty), and restricted APIs (ex: Twitter API). Companies have all the incentives to prevent bot use, and select for human eyeballs.

If we had a Yahoo Pipes sort of golden age, GenAI agents would have a much vaster playground and would be more useful to us.

Consider building an agent for choosing what to do on weekends for a group of friends. The agent would need to keep state for past activities (X, Y, and Z went upstate to Storm King last week) and users' preferences (ex: liking dosas or Calder, dietary restrictions). This part is easy enough - you could just keep a notebook that's passed as context. Older context simply gets deleted or condensed into high-level points.
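
A sketch of that notebook idea, with summarize() standing in for another LLM call and the facts purely illustrative:

    # Keep recent notebook entries verbatim; condense older ones.
    # summarize() stands in for an LLM call; the data is illustrative.

    def summarize(entries: list) -> str:
        # A real system would ask an LLM to condense these into bullet points.
        return f"{len(entries)} older facts (condensed)"

    class Notebook:
        def __init__(self, keep_verbatim: int = 3):
            self.entries = []
            self.keep_verbatim = keep_verbatim

        def add(self, fact: str):
            self.entries.append(fact)

        def as_context(self) -> str:
            old = self.entries[:-self.keep_verbatim]
            recent = self.entries[-self.keep_verbatim:]
            parts = ([summarize(old)] if old else []) + recent
            return "\n".join(parts)

    nb = Notebook()
    for fact in ["X, Y, Z went to Storm King", "A likes dosas",
                 "B is vegetarian", "C likes Calder", "last week: museum"]:
        nb.add(fact)
    print(nb.as_context())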

But would it be easy for the agent to:

1) Look up nearby restaurants and events? (Perhaps Resy/OpenTable allow listing restaurants, but it's likely they have tons of anti-abuse tech. Is there even a place where you could see a list of public events? Google pays a third party for this feed.)

2) Actuate on behalf of the user? (Do Resy and OpenTable allow authority delegation so the agent could book restaurants for users? There's no standard way to do this across venue types - concerts, museums, cooking classes. Is it realistic for agents to click through these sites on their own?)


Seems to be a monetisation problem: everybody wants their cut. So any super-agent needs to figure out how to pay them.

We could imagine a data/API marketplace where such an agent could pay for the data and subscriptions.


You are correct: if a workflow/agent company starts bundling data and API access, they'll multiplicatively increase the capability of their agents.

LLMs themselves are becoming a commodity, plus or minus prompt-following/format-following growing pains. In a year or two, we'll have pretty decent general LLMs that can make use of databases and tools/APIs.

It's a race to see who can integrate all these things in a good way. It really is an execution problem, not an idea problem - it's so obvious.


This could/should be a direction for companies like Zapier...

Edit: or Stripe.


Since we've apparently moved from calling everything a Copilot to calling everything an Agent, this seems much closer to OAI's GPT Store than anything that is truly agentic.


Finally, finally we have a true and worthy successor to “AI” as buzzword. It’s “Agents” ladies and gentlemen.

Make sure to put it into your pitch as often as possible.


Computers, desktops, and now we have these things called "Virtual Machines".

Meaningless buzzword central.

<<eats gallery peanuts>>


Agent Intelligence or "AI"


Doesn't "AI" already stand for Apple Intelligence, so that might be a bit confusing...


Anything Intelligence


Because Copilot implies a human working in tandem with AI. An agent is an autonomous process. The goal of agents is to remove any human agency from the process. Need software done? It won't be a human's job. It will be the agent's in an iterative loop.


I understand why. Most products claiming to be agents today are simply prompts.


But worse! There are no tools or RAG/data yet...


No memory, no reasoning, no planning.


We just have to wait for LLaMA to do it; then suddenly they will have it.

Mistral is like FitGirl Repacks for LLaMA.


Don't forget about agentic workflows.


>Agents help you create custom behaviour and workflows with a simple set of instructions and examples.

So, it's just custom instructions baked in? I hope at least it's harder for them to get overwritten by the user?

>We’re working on connecting Agents to tools and data sources...

So tools and RAG for data sources aren't available yet.

Way behind GPTs/Assistants. What's the point of this right now?


To me, this looks like a direct competitor to AI21 Labs: https://www.ai21.com/


OpenAI is supposedly working on agents as well: https://news.ycombinator.com/item?id=41125900 (subthread)



