Show HN: MonkeyPatch – Cheap, fast and predictable LLM functions in Python (github.com/monkeypatch)
95 points by JackHopkins on Nov 15, 2023 | 71 comments
Hi HN, Jack here! I'm one of the creators of MonkeyPatch, an easy tool that helps you build LLM-powered functions and apps that get cheaper and faster the more you use them.

For example, if you need to classify PDFs, extract product feedback from tweets, or auto-generate synthetic data, you can spin up an LLM-powered Python function in <5 minutes to power your application. Unlike existing LLM clients, these functions generate well-typed outputs with guardrails to mitigate unexpected behavior.
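To make that concrete, here is a rough sketch of what such a function can look like, based on the '@monkey.patch' decorator mentioned later in this thread; the import path and exact decorator name are assumptions, so check the README for the canonical form:

    from typing import Literal, Optional

    # Assumed import path -- the README has the canonical spelling.
    from monkey_patch.monkey import Monkey as monkey


    @monkey.patch
    def classify_feedback(tweet: str) -> Optional[Literal["positive", "negative", "neutral"]]:
        """Classify the product feedback expressed in the tweet,
        or return None if the tweet contains no product feedback."""


    # The body is just a docstring; at runtime the call is routed to an LLM and
    # the output is coerced to the annotated return type (or None).
    print(classify_feedback("The new export button saves me hours every week!"))

The type annotation doubles as the output contract, which is what the guardrails enforce.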

After about 200-300 calls, these functions will begin to get cheaper and faster. We've seen 8-10x reduction in cost and latency in some use-cases! This happens via progressive knowledge distillation - MonkeyPatch incrementally fine-tunes smaller, cheaper models in the background, tests them against the constraints defined by the developer, and retains the smallest model that meets accuracy requirements, which typically has significantly lower costs and latency.

As an LLM researcher, I kept getting asked by startups and friends to build specific LLM features that they could embed into their applications. I realized that most developers have to either 1) use existing low-level LLM clients (GPT4/Claude), which can be unreliable, untyped, and pricey, or 2) pore through LangChain documentation for days to build something.

We built MonkeyPatch to make it easy for developers to inject LLM-powered functions into their code and create tests to ensure they behave as intended. Our goal is to help developers easily build apps and functions without worrying about reliability, cost, and latency, while following best software engineering practices.

We're currently only available in Python, but we're actively working on a TypeScript version. The repo has all the instructions you need to get up and running in a few minutes.

The world of LLMs is changing by the day and so we're not 100% sure how MonkeyPatch will evolve. For now, I'm just excited to share what we've been working on with the HN community. Would love to know what you guys think!

Open-source repo: https://github.com/monkeypatch/monkeypatch.py

Sample use-cases: https://github.com/monkeypatch/monkeypatch.py/tree/master/ex...

Benchmarks: https://github.com/monkeypatch/monkeypatch.py#scaling-and-fi...



Nice! If I were to write a test for invariant aspects of the function (e.g., it produces valid JSON), will the system guarantee that those invariants are fulfilled? I suppose naively you could just do this by calling over and over and 'telling off' the model if it didn't get it right.


The type constraints are indeed enforced, but by the type hints you give to the patched functions rather than by the tests. The declared constraints and structure are followed, and there is also a repair feedback loop in place if the original LLM output is invalid for the types you've declared. Tests are more about aligning how the model should act for different inputs. Hope this makes it clearer!
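To illustrate the repair loop conceptually (this is not the library's actual internals; the helper names are hypothetical):

    import json
    from pydantic import TypeAdapter, ValidationError

    def coerce_with_repair(raw_output, target_type, llm_call, max_repairs=2):
        # Validate the raw LLM output against the declared type; if it is
        # invalid, feed the validation error back to the model and retry.
        adapter = TypeAdapter(target_type)
        attempt = raw_output
        for _ in range(max_repairs + 1):
            try:
                return adapter.validate_json(attempt)  # well-typed result
            except ValidationError as err:
                attempt = llm_call(
                    f"Your previous output was invalid: {err}. "
                    f"Return JSON matching this schema: {json.dumps(adapter.json_schema())}"
                )
        raise ValueError("LLM output could not be repaired to the declared type")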


Can I use open source LLMs with this? Would be great if everything was available self hosted with open source models.


Native support for open-source LLMs is on the roadmap - the main challenge is figuring out how to manage the knowledge distillation for local models. It's a top priority (along with TypeScript support), so check back in a few weeks?

Right now any ‘plug-in’ model needs to conform to the OpenAI API.

(P.S Carthago delenda est)


This is like calling a python package "ListComprehension", that loops through a list and calls OpenAI's API on each item. Confusing and unproductive.


The python package (and repo) is called ‘monkeypatch.py’ for the avoidance of confusion.


Calling my library "listcomprehension.py" doesn't really avoid confusion. In fact, `pip install monkey-patch.py` looks downright odd.


Yeah I definitely agree on the latter point, it does look odd. PyMonkeyPatch?


I don't think you grasp the point.


I understand the point. I would ideally like an association with monkey-patching something as that is relevant to the behaviour of the package. However, not so similar that it shadows the technique of monkey-patching!


LlamaPatch? Once open source model support is added of course :)


There seems to be a lot of (justified) concern about the name. Maybe call it LLMonkeyPatch?


LLMonkeyPatch is one of the best suggestions here: it only adds two letters, which nicely fence the scope of the monkey patching, and it looks playful.

Other suggestions, like PyMonkeyPatch, leave the reader to guess what is being monkey patched.


PyMonkeyPatch? MonkeyPatch.py?

I would quite like a short and distinctive name!


Hey Jack! Thanks for sharing this. The incremental fine-tuning of smaller and cheaper models for cost reduction is definitely a really interesting differentiator. I had a few questions regarding the reliability of the LLM-powered functions MonkeyPatch facilitates and the testing process. How does MonkeyPatch ensure the reliability of LLM-powered functions it helps developers create, and do the tests employed provide sufficient confidence in maintaining consistent output? If tests fall short of 100% guarantee, how does MonkeyPatch address concerns similar to historical challenges faced with testing traditional LLMs? Thanks.


Heya, no worries - I’m glad to share it.

MonkeyPatch ensures reliability through what we call 'test-driven alignment', in which the tests that reference the patched functions are guaranteed to pass. The more align 'tests' you create, the more rigorous the contract the functions have to fulfil.

The other way to increase consistency is to use more constrained type annotations (e.g. Pydantic field annotations), which is a similar concept to Marvin AI and Magentic.
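A sketch of what that looks like in practice (the decorator and import names follow the '@monkey.patch' pattern mentioned in this thread and are assumptions; the README has the canonical spelling):

    from pydantic import BaseModel, Field

    from monkey_patch.monkey import Monkey as monkey  # assumed import


    class Rating(BaseModel):
        score: int = Field(ge=1, le=5)        # constrained field annotation
        summary: str = Field(max_length=120)


    @monkey.patch
    def rate_review(review: str) -> Rating:
        """Rate the product review from 1 (worst) to 5 (best) and summarise it."""


    @monkey.align
    def test_rate_review():
        # Align statements act as the contract the patched function must fulfil.
        assert rate_review("Absolutely love it, flawless.").score == 5
        assert rate_review("Broke after one day.").score == 1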


Why 'Monkeypatch', when it's for Python, where that has an established and as far as I can tell (?) completely irrelevant meaning?


(As I understand it) monkey-patching means modifying code at runtime. I thought the naming was relevant because by adding '@monkey.patch' to an unimplemented function, this library gives it an implementation at runtime.
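For anyone skimming the naming debate, the established sense of the term looks like this:

    import math

    original_sqrt = math.sqrt
    math.sqrt = lambda x: round(original_sqrt(x), 2)  # replace an existing function at runtime

    print(math.sqrt(2))        # 1.41 -- behaviour changed without touching math's source
    math.sqrt = original_sqrt  # restore the original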


I feel like you're hijacking a term at least 15 to maybe 20 years old though. It's kind of a brilliant marketing idea, but you're just confounding the vocabulary.

It's not to say this isn't a fucking great idea (because it is!). But you know, just don't piss in the community well.

Further, PostgreSQL means one and only one thing. But "monkey patch" means two things now, apparently!

For reference: Here's the google ngram viewer link for "monkey patch" from 2000 to 2019.

https://books.google.com/ngrams/graph?content=monkey+patch&y...


The specific package name on GitHub and PyPi is ‘monkeypatch.py’ for the avoidance of doubt!


The problem isn't accidentally downloading the wrong package. How am I supposed to talk about this package? I regularly use monkeypatching (the existing meaning) to prototype out concepts. "I monkey patch dot pied this function"? Cause that doesn't seem to be the name used everywhere.

Why not name it something like Gorillapatch? Gorillas are stronger than monkeys as a slogan or whatever.

The core issue is how am I supposed to talk about regular monkeypatching and your library in the same sentence.


This is a solid point. At the end of the day, creating confusion with the name wasn't the goal in any way, as there are already hundreds of (often overlapping) terms floating around in this scene. I appreciate the critique and we'll have a think over this. If you have any other naming ideas, we'd love to hear them!


Maybe keep the “Monkey” and work around that?

MonkeySeeDo —> it sees what you’re doing and does it better

CutMonkey —> it’s a monkey patch that’s cut weight to lean fighting trim

TypeMonkey —> it uses types to intelligently monkey patch your code

MonkeyZipper —> monkey patches that compress your code

MonkeyModels —> monkey patches your models

Learning Monkey —> learns how to improve your Large Models

Branching out from there…

SimianStudy —> it’s a monkey patch that learns

PyStill —> it distills your Python functions

AutoSqueeze —> automatically squeezes your AI code into efficient implementations


That's a distinction without a difference...


I guess. I think of it more like overriding an existing thing with another for a given scope/time/test, whereas this is providing an implementation for a thing which exists only as a stub.

Maybe it 'counts', I'm not meaning to be picky about the term, my point is more like even if this works by monkeypatching, I wouldn't personally use the term for the product which does something else by those means, if that makes sense? MonkeyLLM or MonkAIpatch or something, sure.

The other comment with a 'list comprehension' example puts it well I think.

(Seeing as you asked for name suggestions elsewhere, I think I prefer the 'stub' theme: Stubby, StubAI, Stubmonkey if you like monkeys/wanted an easier logo, something like that.)


I do get the point and the difference from the classical monkey-patching. I like the stub ideas though!


MonkeyPatch is a specific programming term that people have been using for decades. What would possess someone to name a programming tool "MonkeyPatch" when the tool doesn't even have anything to do with patching?


Monkeypatch (as I understand it) means to modify code at runtime. This library modifies functions at runtime to use an LLM as an execution target - I thought it was an apt (if admittedly cheeky) name! Appreciate the critique regardless.


The package is called ‘monkeypatch.py’ on GitHub and PyPi.


Many people here have said something about the naming, and you keep repeating that it's "monkeypatch.py" in response as if it fixes it. Maybe take the advice and just rename it to something while it's still early. You'll have a tough-enough time convincing people to use this novel/odd/unique concept without having the name confusion and bad-will from the community stemming from you appropriating a common term.


Don't get me wrong, I do appreciate the criticism of the current naming! It does seem to create some unwanted friction in using or talking about the library. I was just trying to explain the thought process and ideate on top of it, but we will take a second look at the name and how to make using and talking about the library as unconfusing as possible.


Thoughts on something like PyMonkeyPatch? GorillaPatch?


Honestly dude, just drop the simian theme and pick a different, possibly-endangered animal instead.

Monkeys have nothing to do with your project and were a meme over a decade ago, which makes your brand-new project look dated.

Apes are associated with being heavyweight and freakishly strong and have been long-associated with racial slurs in America. You're only ever one degree removed from coming out with "chimpout.py" or "statutoryape.py" or something else that'll get you cancelled for unintended racism.

Your tool seems like it's meant to be reliable, used for work, and possibly elegant in its code. Consider the name of a work animal for their efficiency or birds for lightweight, graceful maneuverability.


Yeah this is fair. I’m not attached to a simian theme if we’re ditching specific association to monkey-patching something. Or indeed, a ‘patching’ theme for that matter.

A new name is definitely in order. I will think about it over the weekend.

Thanks for the feedback, I appreciate it.


Monkey patching isn't what your library is supposed to do, it's just a mechanic that it uses to get there. This would be like me making a scripting language and calling it "virtual machine". Then when people ask "why is a scripting language called virtual machine" I would say "it uses a virtual machine and the file is called virtualmachine.py".


Slightly tangential: is it unfair/unreasonable to judge a project by its name? It's hard not to interpret this project's name as the result of poor judgement. Is that sufficient cause to write off the project entirely? That may seem a tad dramatic but I feel that it's a fairly strong signal for how little effort I need to put into evaluating it.


Do any other names jump out at you as preferable?


Not including "pass" in a function definition in Python makes the code not compilable, and if we're using VSCode, PyCharm, etc. our IDEs will complain about this whenever the code is viewed. Is this an intentional design decision?


The IDEs shouldn't complain if the function has a docstring (which all the MonkeyPatch functions should have, as that's the instruction that gets executed) and the @patch decorator - at least the ones we have tried have been happy with the syntax so far. But adding a "pass" is also permissible if the IDE does complain.
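A quick plain-Python illustration (no library needed) of why the parser is happy without 'pass':

    def summarise(text: str) -> str:
        """Summarise the text in one sentence."""

    # The docstring alone is a valid function body, so the module parses and runs.
    # Called as-is it just returns None; under the patch decorator the call would
    # instead be routed to an LLM.
    print(summarise("some text"))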


Would love to try a TypeScript implementation. Any plans to do that?


Great to know! We're working on extending MonkeyPatch to TypeScript; the work-in-progress repository can be found here: https://github.com/monkeypatch/monkey-patch.ts

We will keep you posted on when it'll be ready for trying out!


I built a similar library for Typescript: https://github.com/jumploops/magic

Please note: it requires the use of ttypescript or ts-patch, as TypeScript transformers aren't supported by default!


Cool! Thanks for sharing. What do you mean that TS transformers aren't supported by default? Is this like a runtime modification of types?


tl;dr - TypeScript transformers are used to modify the AST before JavaScript is emitted.

The magic functions library uses a transformer to take the TypeScript types and port them to JSON schema, such that they're available during runtime. This JSON schema is then used to validate that the response from the LLM matches the expected type signature of the function (and err if it doesn't).

Because TypeScript doesn't support 3rd party "transformers" by default, you're forced to hack around it (via ttypescript or ts-patch). This is especially problematic when TypeScript has a major version change, as the workarounds need to be modified accordingly; this often takes significant time.

Here's the long-lived Github issue: https://github.com/microsoft/TypeScript/issues/14419

And here's the newest proposal to add official support: https://github.com/microsoft/TypeScript/issues/54276


Using tests to align your model seems neat. How reliable is it? Won't models still hallucinate from time to time? How do you think about performance monitoring/management?


Great questions! The tests act as few-shot examples for the LLM, which has been shown to guide the style and accuracy of model outputs and improve performance quite well - for instance, we've seen accuracy go from <70% to 93%+ compared to not including the tests. Hallucinations are still an inherent risk with LLMs, especially with long-form context, but adding more diverse and well-aligned examples as tests does reduce the hallucination risk and align the outputs with user intent. In terms of performance management and monitoring, QA for LLMs is a difficult process to get right, and we're looking into ways to a) make it easy for users to test out different function descriptions and tests on their own datasets to gauge performance, and b) seamlessly carry out continuous monitoring of function outputs with low effort. Still WIP, but we'll keep you posted!


Makes sense. Looking forward to testing it out.


This is really interesting! What would be a good example of when I would want to use monkeypatch vs langchain or OpenAI functions?


Thanks! A big part of MonkeyPatch that LangChain and OpenAI functions lack is the model distillation aspect, which has reduced costs by up to 10x and latency by up to 6x in some of the tests we've been running. This means the more you use MonkeyPatch, the cheaper the function calls get, which is beneficial for high-usage applications with lots of calls.


How does that work?


Currently we distill general GPT-4 down to a function-specific GPT-3.5 Turbo model using pseudo-labelling. The input-output pairs from the aligned few-shot GPT-4 calls are saved, and this dataset is used to fine-tune a function-specific GPT-3.5 model. That fine-tuned GPT-3.5 model is then switched in as the primary model used to carry out the function, which results in several-times-lower cost (as the need for few-shot examples is removed) and lower latency as well. If the fine-tuned model's output does not follow the enforced constraints, we employ GPT-4 to "repair" the output and include that datapoint in the dataset used for future fine-tuning, resulting in continuous improvement.
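A conceptual sketch of that flow (not the library's actual code; the teacher/student/validate/dataset names are illustrative):

    def run_function(inputs, teacher, student, dataset, validate):
        # Pseudo-labelling phase: no student yet, so the teacher (e.g. GPT-4)
        # handles the call and its input-output pair is logged for fine-tuning.
        if student is None:
            output = teacher(inputs)
            dataset.append((inputs, output))
            return output

        # Cheap, fast path: the fine-tuned student (e.g. GPT-3.5 Turbo) answers.
        output = student(inputs)
        if validate(output):
            return output

        # Repair: the teacher fixes the output, and the corrected datapoint is
        # fed back into the dataset for the next fine-tuning round.
        repaired = teacher(inputs)
        dataset.append((inputs, repaired))
        return repaired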


How much control do I have over this process? I might not want this to be abstracted.


Currently the distillation happens automatically in the background for all functions, but we're aiming to implement ways for the user to turn it off if they wish to keep using the teacher models. Good to know that this would be a wanted feature!


Does it ever just use the code that works and no longer makes calls to any LLM?


Great question! That is one of the ideas we have on the roadmap, and it seems quite exciting to us. The general feasibility of switching the function execution over from an LLM to synthesised code depends on the specific use-case and whether a deterministic program can solve it well enough (or at least as well as the SOTA LLMs can). But for all the cases where this could be done, the cost and latency of executing the program would become essentially zero.


The guardrails are cool!

I think more details of where the data goes and when it goes from few-shot to fine-tune will be helpful.


Good to know, we'll make it clearer in the docs! To answer regarding these two areas:

1) The data for fine-tuning is currently saved on disk for low-latency reading and writing. Both test statements and datapoints from function executions are saved to the dataset. We're aware that saving to disk is not the best option and limits many use-cases, so we're currently working on persistence layers that allow S3 / Redis / Cloudflare to be used as external data storage.

2) Currently the fine-tuning job starts after the dataset has at least 200 datapoints from GPT-4 executions and align statements. Once the fine-tuning is completed, the execution model for the function is automatically switched to the fine-tuned GPT-3.5 Turbo model. Whenever the fine-tuned model breaks the constraints, the teacher (GPT-4) is called upon to fix the datapoint, and that datapoint is saved back to the dataset for future iterative fine-tuning and improvement. We are also working on adding ways for the user to include a "test set" which could be used to evaluate whether the fine-tuned model achieves the required performance before switching it in as the primary executor of the function.
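Roughly, the trigger in (2) works like this (the threshold constant and helper names are illustrative, not the library's API):

    FINETUNE_THRESHOLD = 200  # datapoints from GPT-4 executions and align statements

    def maybe_finetune(dataset, start_finetune_job, switch_execution_model):
        # Once enough teacher datapoints exist, kick off a fine-tune and make
        # the resulting student model the primary executor for this function.
        if len(dataset) >= FINETUNE_THRESHOLD:
            student_model_id = start_finetune_job(dataset)
            switch_execution_model(student_model_id)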

Hope this makes it more clear, if you have any additional questions, let me know!


dope yea that's awesome!


Gave it a shot, quite impressed. I am implementing the Bedrock interface (OpenAI access is limited from my location). Looks promising. Will check out fine-tuning with Bedrock, though I'm not sure whether we can do that or not. Appreciate your work.


Could you explain the differences to Marvin AI? I see a large overlap.


Hey! There are two main similarities to Marvin, namely: (a) functions that act as APIs to the LLM backend, and (b) type coercion to ensure that the responses fit into the data model of your application.

However, there are a couple of big additions over Marvin as well:

Test-driven alignment - by using 'assert' statements that declare the behaviour of a patched function, we create a contract that makes invocations much more predictable, which makes it possible to use these functions in production settings.

Automatic distillation - a combination of the function contract defined in the type signature and the alignment tests means we can automatically swap out bigger models for smaller ones. This saves up to 80% of the latency and 90% of the cost of running these functions (check the benchmarks).

Check out the readme, as there is more detail on these points there!


Awesome stuff! What other potential integrations are on the roadmap?


The big one is a TypeScript implementation. Other than that, the plan is to support other models (e.g. Llama) that can be fine-tuned.

Finally, other persistence layers like S3 and Redis, to support running on execution targets (like AWS Lambda and CloudFlare workers) that don’t have persistent storage.

I think it could be really interesting to support Vercel more tightly too. We currently support Vercel with Python, but I think Typescript + Redis would really enable serverless AI functions - which is where I think this project should go!


Where in the codebase are you performing the distillation process?


Check out the ‘function_modeler’. Currently it’s OpenAI only, but local models are on the immediate roadmap.

https://github.com/monkeypatch/monkeypatch.py/blob/master/sr...


This is incredibly cool, I’m excited to try it out


Thanks a lot. I’d really appreciate any feedback you have on the design!


this is super cool! what's the use case you're most excited about?


Thanks! I find the enforced typed outputs and structured object creation from unstructured inputs very useful. For instance, we built a use-case around creating structured support-ticket objects that can be processed in downstream applications without worrying about anything breaking.
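As a sketch of that use-case (the field names and imports are illustrative; the decorator follows the '@monkey.patch' pattern from this thread):

    from typing import Literal

    from pydantic import BaseModel

    from monkey_patch.monkey import Monkey as monkey  # assumed import


    class SupportTicket(BaseModel):
        product_area: str
        severity: Literal["low", "medium", "high"]
        summary: str


    @monkey.patch
    def to_ticket(email_body: str) -> SupportTicket:
        """Extract a structured support ticket from the customer email."""


    ticket = to_ticket("The dashboard won't load since this morning and we have a demo at 3pm!")
    # Downstream code can rely on ticket.severity being one of the declared literals.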


Super cool Jack


Cheers!



