I'm not sure I understand your objection. You seem to be implying that knowing the context is the same as knowing the solution to the problem the context is provided for?
Let me think of an example here. The context needed to determine if there is cancer in a radiology scan would be the contents of the scan. So there are two modes here. In one, I say "LLM, please tell me if there is cancer in this patient's scan" and the LLM makes an MCP call to load the patient's scan. In the second, I say "LLM, here is the patient's radiology scan, can you tell me if it has signs of cancer".
The first example is what I was calling a "pull" model and the second example is what I am calling a "push" model.
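To make the two flows concrete, here's roughly what each request looks like at the API level (schematic payloads only, not any particular vendor's format; the "load_scan" tool name is made up):

    # Pull: the prompt only names the patient; the model may call an
    # MCP-exposed tool (hypothetical "load_scan") to fetch the image itself.
    pull_request = {
        "messages": [{"role": "user",
                      "content": "Is there cancer in patient 123's scan?"}],
        "tools": ["load_scan"],
    }

    # Push: the caller fetched the scan themselves and inlines it
    # alongside the question; no tool call is needed.
    push_request = {
        "messages": [{"role": "user",
                      "content": "Here is the scan; any signs of cancer?",
                      "attachments": ["scan_123.dcm"]}],
    }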
The point above about enterprise glue is why this is a pull model.
In your push model, the onus is on you to go find the scan in one of five backends, jump through whatever access hoops are needed, and actually handle the files manually.
In the pull model, each backend implements the server once, the LLM gets connected to each one once, and you have one single flow to interact with all of them.
It is interesting that the model I am proposing inverts many people's expectations of how LLMs will benefit us. In one vision, we hand a data lake of information to LLMs; they tease out the relevant context and then make deductions.
In my view, we hand craft the context and then the LLM makes the deductions.
I guess it will come down to how important crafting the relevant context is for making useful deductions. In my experience writing code with LLMs, effectiveness increases when I very carefully select the context and goes down when I let the agent framework (e.g. Cursor) figure out the context. Obviously the ideal case is when the entire project fits in the context window, but that won't always be possible.
What I've found is that LLMs struggle to ask the right questions. I will often ask the LLM "what other information can I provide you to help solve this problem" and I rarely get a good answer. However, if I know the information that will help it solve the problem and I provide it to the agent then it often does a good job.
> In my view, we hand craft the context and then the LLM makes the deductions.
We (as in users) provide the source material and our questions, the LLM provides the answers. The entire concept of a context is incidental complexity resulting from technical constraints, it's not anything that users should need to care about, and certainly not something they should need to craft themselves.
But it makes a radical difference to the quality of the answer. How is the LLM (or collaboration of LLMs) going to get all the useful context when it’s not told what it is?
(Maybe it’s obvious in how MCP works? I’m only at the stage of occasionally using LLMs to write half a function for me)
In short, that's the job of the software/tooling/LLM to figure out, not the job of the user to specify. The user doesn't know what the context needs to be; if they did, and could specify it, then they probably wouldn't need an LLM in the first place.
MCP servers are a step in the direction of allowing the LLM to essentially build up its own context, based on the user prompt, by querying third-party services/APIs/etc. for information that's not part of their e.g. training data.
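As a rough sketch of what that looks like, assuming the FastMCP helper from the official Python MCP SDK (the tool name and the in-memory "wiki" here are stand-ins for Confluence/Drive/whatever):

    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("docs-search")

    # Stand-in for a real backend (Confluence, Drive, Airtable, ...).
    DOCS = {
        "Q3 billing migration kickoff": "We agreed to move invoicing to ...",
        "Oncall runbook": "Page the DB team if replication lag exceeds ...",
    }

    @mcp.tool()
    def search_documents(query: str) -> list[str]:
        """Naive full-text search the LLM can call, with its own choice
        of query, to pull in context it decides it needs."""
        q = query.lower()
        return [f"{title}: {body}" for title, body in DOCS.items()
                if q in title.lower() or q in body.lower()]

    if __name__ == "__main__":
        mcp.run()  # stdio transport by default; the client wires the LLM to it

The point is that the user never sees any of this: the model decides when to call search_documents and what to ask it, which is exactly the "build up its own context" step.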
I'd be interested in examples of this. I've worked in offices for all of my adult life and I don't have any examples that come to mind.
I think of logic puzzles I used to do as a kid. The whole idea of the puzzle is that all of the information you need is provided, the fun is in solving using deduction. Sudoku scratches the same itch.
At the least, I would argue there are many problems that don't fit the mold you are suggesting and MCP is not the correct method for addressing them.
> I'd be interested in examples of this. I've worked in offices for all of my adult life and I don't have any examples that come to mind.
Wow, you must have worked in some really mature shops then if you knew instantly which of [Google Drive, Confluence, Airtable, GitHub wiki, ${that one deprecated thing that Alice was using}, ...] contained the reference to Project Frazlebaz mentioned in Slack last.. day? week? Maybe it was today but time is a blur?
I don't see how that's solved by MCP. How would the LLM possibly know where to search? Just making an API (or series of APIs) to slack/jira/airtable available doesn't magically surface the context, or the right search incantation to reveal it. The LLM still has to figure out which tool to search with, which search terms to use, etc. If there are a million documents in your set of data providers and only 1000 fit into the LLM's context, that filtering happens somewhere.
This idea that if you don't know where the data is, magically the LLM will, is very confusing to me.
In my experience, this kind of thing is exactly what LLMs are good at, and fast at.
Here's a real example from my job as a BI dev. I needed to figure out how to get counts of incoming products from an ERP with a 1000+ table database schema in a data lake with zero actual foreign keys. I sorta knew the data would need to come from the large set of "stock movements" tables, which I didn't know how to join, and I had no idea which rows from those tables would be relevant to incoming product, or even which fields would begin to determine that. I simultaneously asked a consultant for the ERP how to do it and asked Cursor a very basic "add the count of incoming units to this query" request.
Cursor gave me a plausible answer instantly, but I wasn't sure it was correct. When the consultant got back to me a few days later, the answer he gave was identical to Cursor's code. Cursor even thought of an edge case that the consultant hadn't.
It blew my mind! I don't know if Cursor just knows this ERP's schema, or if it ran enough research queries to figure it out. But it got it right. The only context I provided was the query I wanted to add the count to and the name of the ERP.
So, I 100% believe that, especially with something like MCP, the pull model is the right way. Let the LLM do the hard work of finding all the context.
MCP is just function calls with parameters. Whether it's push or pull can be decided by the author: a push model takes the scan as an input to the MCP call; a pull model does the pulling within the MCP call. Neither is right or wrong, it's situational.
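A minimal sketch of the two shapes, again assuming the FastMCP decorator API from the Python MCP SDK (the tool names, the classifier stub, and the backend lookup are all made up for illustration):

    import base64
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("radiology")

    def run_cancer_model(scan: bytes) -> str:
        """Stand-in for whatever actually analyzes the image."""
        return "model verdict goes here"

    def fetch_scan(patient_id: str) -> bytes:
        """Stand-in for querying an imaging backend (PACS, S3, ...)."""
        return b""

    @mcp.tool()
    def analyze_scan_push(scan_b64: str) -> str:
        """Push: the caller already found and fetched the scan and
        hands it over (base64 here); the tool only analyzes it."""
        return run_cancer_model(base64.b64decode(scan_b64))

    @mcp.tool()
    def analyze_scan_pull(patient_id: str) -> str:
        """Pull: the tool gets only an identifier and does the
        fetching itself before analyzing."""
        return run_cancer_model(fetch_scan(patient_id))

    if __name__ == "__main__":
        mcp.run()  # stdio transport by default

Same analysis either way; the only difference is who carries the burden of locating and moving the scan.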