Launch HN: MutableAI (YC W22) – Automatically clean Jupyter notebooks using AI (mutable.ai)
82 points by oshams on Feb 24, 2022 | 49 comments
Hi HN, I’m Omar, the founder and CEO of MutableAI (YC W22) (https://mutable.ai). We transform Jupyter notebook code into production-quality Python code using a combination of AI (OpenAI Codex) and PL metaprogramming techniques.

I'm obsessed with clean code because I've written so much terrible code in the past. I went from being a theoretical physics PhD dropout -> data scientist -> software engineer at Google -> research engineer at DeepMind -> ML engineer at Apple. In that time I've grown to tremendously value code quality. Clean code is not only more maintainable but also more extensible as you can more readily add new features. It even enables you to think thoughts that you may have never considered before.

I want to reduce the cost of clean, production-quality code using AI, and am starting with a niche I'm intimately familiar with (Jupyter), because it's particularly prone to bad code. Jupyter notebooks are beloved by data scientists, but notorious for having spaghetti code that is low on readability, hard to maintain, and hard to move into a production codebase or even share with a colleague. That’s why a Kaggle Grandmaster shocked his audience and recommended that they do not use Jupyter notebooks [1].

MutableAI allows developers to get the best of both worlds: Jupyter’s easy prototyping and visualization, plus greatly improved quality with our AI product. We also offer a full-featured AI autocomplete to help prototyping go faster. I think the quadrant of "easy to develop in" and "easy to create high-quality code" has been almost empty, and AI can help fill this gap.

Right now there are two ways of manipulating programs: PL techniques for program analysis and transformation, and large scale transformers from OpenAI/DeepMind, which are trained on code treated as text (tokens) and don't look at the tree structure of code (ASTs). MutableAI combines OpenAI Codex / Copilot with traditional PL analysis (variable lifetimes, scopes, etc.) and statistical filters to identify AST transformations that, when successively applied, produce cleaner code.

We use OpenAI's Codex to document and type the code, and for AI autocomplete. We use PL techniques to refactor the code (e.g. extract methods), remove zombie code, and normalize formatting (e.g. remove weird spacing). We use statistical filters to detect opportunities for refactoring, for example when a large grouping of variable lifetimes are suddenly created and destroyed, which can be an opportunity to extract a function.
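To make the variable-lifetime idea a bit more concrete, here is a toy sketch of the kind of signal involved (heavily simplified and purely illustrative; the function names and threshold are made up and this is not our actual pipeline):

    import ast


    def live_counts(source: str) -> list[int]:
        """For each source line, count how many names are 'live', i.e. between
        their first and last textual occurrence."""
        spans: dict[str, tuple[int, int]] = {}
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Name):
                lo, hi = spans.get(node.id, (node.lineno, node.lineno))
                spans[node.id] = (min(lo, node.lineno), max(hi, node.lineno))
        counts = [0] * (source.count("\n") + 1)
        for lo, hi in spans.values():
            for line in range(lo, hi + 1):
                counts[line - 1] += 1
        return counts


    def extract_candidates(counts: list[int], threshold: int = 3) -> list[tuple[int, int]]:
        """Contiguous 1-indexed line ranges where liveness jumps above `threshold`
        and falls back again, a crude signal that the block might be worth
        pulling out into its own function."""
        regions, start = [], None
        for i, c in enumerate(counts, start=1):
            if c >= threshold and start is None:
                start = i
            elif c < threshold and start is not None:
                regions.append((start, i - 1))
                start = None
        if start is not None:
            regions.append((start, len(counts)))
        return regions


    if __name__ == "__main__":
        cell = "\n".join([
            "raw = load()",       # few names live here
            "a = raw[0]",
            "b = raw[1] * 2",
            "c = a + b",
            "d = c - a + b",      # several short-lived names live at once
            "result = d",
            "print(result)",      # back to few live names
        ])
        print(extract_candidates(live_counts(cell), threshold=3))

On the toy cell at the bottom of the sketch, this flags lines 3-5 as a candidate block to pull out into a function.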

Some of the PL techniques are similar to traditional refactoring tools, but those tools don’t help you decide when and how to refactor. We use AI and stats to do that, as well as to generate names when the new code needs them.

A tool that reduces the time to productionize code can be compared to having an extra engineer on staff. If you take this seriously, that’s a pretty big market. Stripe Research claims that developer inefficiency is a $300B problem [2]. Just about every tech company would become more efficient through increased velocity, fewer errors, and the ability to tackle more complex problems. It may even become unthinkable to write software without this sort of tool, the same way most people don't write assembly and use a compiler.

You can try the product by visiting our website, https://mutable.ai, and creating an account on the setup page, https://mutable.ai/setup.html. License keys appear on the setup page once you’ve signed up (check your mailbox for an email verification link). I’ve bumped up the budget for free accounts temporarily for the day. I hope you enjoy the product!

In addition to inviting the HN community to try out the product, I’d love it if you would share any tips for reducing code complexity you’ve come across and of course to hear your ideas about this problem and tools to address it.

[1] https://youtu.be/tsGGpe-onZI?t=1067

[2] https://stripe.com/files/reports/the-developer-coefficient.p...



Impressive product, but honestly, how is this the solution we’ve come to?

“Should we get our data scientists to make the merest amount of effort to stop writing bad code?”

“No, we’ll use an AI to fix their code”

Just what? Notebooks are nice for exploratory code, but that’s it. Got problems with data scientists writing terrible code in notebooks? Make them stop then. Stop accepting PRs, stop letting them do it.

I’m not saying this as an outsider, I’m a data scientist by trade too, I don’t think we should be enabling bad practices and habits that we wouldn’t tolerate anywhere else in software development.


I understand your frustration. I think people at times feel like they've graduated out of notebooks. But I think there is something truly special about the ability to visualize and inspect code all in one environment. I know that a majority of researchers at DeepMind (at least when I was there) use notebooks and many of them are excellent programmers!

We should certainly all strive to learn how to write better code, and I think regular use of an AI tool like ours _can_ be pedagogical. Many people who did not have the time or opportunity to learn best practices can now join us in building technology, because the AI is always there for them and never tires. AI-assisted programming will help level the playing field for talent around the world, especially those without access to instructors or good job opportunities.


_can_ be or _will_ be?

Will it be used for learning better practices, or will it be used to avoid ever learning them, since the AI takes care of it?


Maybe a bit of both :) I think it's more common these days to find programmers who've never used pointers before. Arguably that's a loss for them, especially when their mental model of how memory works starts to get leaky.

But I think people will learn tons too. At the end of the day, most people learn by example. If you can see what the AI does to your code or suggests as a completion, you will learn best practices faster.


I've learned a lot from linters. Maybe this would be the next level up in linting, if it offers an assessment as it rearranges your code. I'd definitely use something like that to improve my coding skills.


This looks cool*

I can't tell from the website, but what I would absolutely love is if this could turn the notebook into a "train.py" with argparse to parse the arguments and the model / program output saved on completion. This is a definite pain point for me: there are things I'd like to test interactively, but I default to trying to get out of a Jupyter notebook as quickly as possible because all the code ends up having to be rewritten "properly".

For ML in particular, I bet there are cool integrations you could do with PyTorch Lightning (and other frameworks) to take me from trying a forward pass in the Jupyter notebook to having the DataModule with dataloaders declared and then set up in a main(). There is lots that could be automated.
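For concreteness, something like this hypothetical skeleton is roughly what I'd want coming out the other end (the names and defaults here are made up, and the real thing would presumably be fancier):

    # train.py, hypothetical output of a notebook-to-script conversion
    import argparse
    import json
    from pathlib import Path


    def build_model(input_shape: int, num_classes: int) -> dict:
        # Stand-in for the model definition pulled out of the notebook.
        return {"input_shape": input_shape, "num_classes": num_classes}


    def train(model: dict, epochs: int, lr: float) -> dict:
        # Stand-in for the training loop; the real thing would iterate over a DataLoader.
        return {"epochs": epochs, "lr": lr, "final_loss": 0.0}


    def main() -> None:
        parser = argparse.ArgumentParser(description="Train the model from the notebook")
        parser.add_argument("--input-shape", type=int, default=784)
        parser.add_argument("--num-classes", type=int, default=10)
        parser.add_argument("--epochs", type=int, default=10)
        parser.add_argument("--lr", type=float, default=1e-3)
        parser.add_argument("--out-dir", type=Path, default=Path("runs/latest"))
        args = parser.parse_args()

        model = build_model(args.input_shape, args.num_classes)
        metrics = train(model, args.epochs, args.lr)

        # Save config and metrics on completion instead of leaving them in a dead notebook cell.
        args.out_dir.mkdir(parents=True, exist_ok=True)
        (args.out_dir / "metrics.json").write_text(json.dumps({**model, **metrics}, indent=2))


    if __name__ == "__main__":
        main()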

*(except the SaaS model, I have a really hard time supporting "api call" limited services that seem to just be charging me a subscription fee for the privilege of running code, but I don't want to get into that.)


Thank you. Can you please email me at [email protected]? I think having a number of well-defined "flows", _especially_ for the use case you highlighted, can save people a lot of frustration!

(Totally understand you on the SaaS objections -- I don't think you should ever feel like you can't run your code! In the individual plan you should not hit the limit unless many people are using the same key.)


I think bad code in Jupyter is a symptom of both the tool and the user. Jupyter's flexibility for running cells out of order should never have been allowed. Code developed that way can't reliably be transformed into a sequential program. One example of this is illustrated in your demo: how did the algorithm decide input_shape was an input but num_classes should remain a global? For me, it is obvious that the user's intent was that these are the variables that define the model.
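(For concreteness, what I mean has roughly the following shape. This is a made-up reconstruction, not the actual demo code:)

    from tensorflow.keras.layers import Dense
    from tensorflow.keras.models import Sequential

    num_classes = 10  # stayed a module-level global after cleaning


    def build_model(input_shape):
        # input_shape became a parameter, but num_classes did not, even though
        # both are "the variables that define the model".
        return Sequential([
            Dense(128, activation="relu", input_shape=input_shape),
            Dense(num_classes, activation="softmax"),
        ])


    model = build_model((784,))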

Personally, I wouldn't pay those prices to hide the data scientist's problems. However, if you developed extensions to prevent and encourage the user to overcome the tool's problems, I would buy that for myself.

Also, you probably don't want a demo that fails to run at the second cell (in both the unclean and clean code).


> Personally, I wouldn't pay those prices to hide the data scientist's problems

I have data scientists that work for me that have this problem. I'd rather that everyone I hire wrote the kind of code I want out of the box, but for lots of reasons it's not always possible. I have tried enforcing frameworks and ended up wasting more money trying to get people to change their workflow. A developer or data scientist (even an inexperienced one) is expensive enough that $30/mo is nothing if it makes them more productive. I'd guess this is what the company is banking on.

(You'll see I made another comment where I say I don't like their pricing model; that's a matter of principle, because I don't believe in just wrapping a program in SaaS to get more money without offering anything beyond "you can run my code if you pay me". But the cost / benefit is still easy to justify.)


Exactly, developer time is extremely valuable! Plus, even on an individual level, wouldn't you rather spend less time on things a machine can make a reasonably good guess at for you? My internal heuristic has been: anything that can be easily guessed is something AI + PL techniques should do for you.

I think developers in the future will have way more mental free space because of this trend towards accelerating developer tooling with AI.


Thanks, founder here. I personally try not to run cells out of order. But I feel like almost anyone at some point will want to run cells out of order.

The extraction is based partly on statistical edge detection. We are working on training transformers on actual diffs from GitHub, which would be more natural.


Keep up the work, I hope you get traction. This space is definitely in need of something. In particular, I will be keeping tabs to see your upcoming tests feature.


Thank you! I completely agree. I certainly don't think people should give up using Jupyter because of the frustration of keeping code quality high enough to port to other environments. Test feature is coming very soon. Please feel free to email me at [email protected] if you have more thoughts on this.


I own a paid license for the MutableAI tool.

A few notes:

1. The progress that was made by Omar and the team (from alpha to a working version that is useful) is astounding. I am eagerly waiting for more improvements that will make my regular daily workflow even more fun.

2. The tool is useful for both junior and senior developers and data scientists. They get different things from it though: junior developers get cleaner code, fixes for simple mistakes, structure and demonstration of how things “should be done”. Senior developers get to skip mundane tasks, go from prototype to production code quicker and can concentrate on more complex things/details.

3. I especially like that there is an option for “on prem” installation. My current employer is very strict about using “online” tools that can leak our code outside. A “cloud only” tool would be a deal breaker for us.

4. Considering the amount of time this tool can potentially save, it will pay for itself in no time. I mean, really – if it saves me just a few minutes of mindless reformatting/editing/adding comments – it has already paid for itself.


Congrats on the launch! As a former data scientist, it pumps me up to see more notebook-centric tooling, as I believe it is the best environment for data exploration and rapid iterations.

We're working on notebook tooling as well (https://github.com/ploomber/ploomber), but our focus is at the macro level so to speak (how to develop projects that are made up of several notebooks). In recent conversations with data teams, the question "how do you ensure the code quality of each notebook?" has come up a lot, and it's great to see you are tackling that problem. It'll be exciting to see people using both MutableAI and Ploomber! Very cool stuff! I'll give it a try!


Thank you! Congrats on your launch as well. Ploomber + MutableAI. :)


I get some of the sentiment that people should write clean code and I have been very adamant with my teams to ideally use best practices and make their code (re)usable and production ready. But the truth is: nobody likes to clean up. Including me. To use a real life example: I don’t use a broom at home unless my roomba is unable to get to the spot. This is not going to solve all problems and I will probably have to fix some edge cases, but I love the solution you have come up with and will definitely give it a shot next time I come across an ugly notebook!


I love the roomba analogy! Definitely going to steal this. :)


Not to gripe on your work, but I am extremely skeptical about tools like these. How do you ensure the equivalence of the computation? What are the SLAs on equivalence, and how do you even verify that the programs are equivalent?

The solution to bad code written by data scientists is not more AI tools that write non-verifiable code. It's ergonomic API frameworks which can relieve the pain points / bad practices via thoroughly testable / verifiable / deterministic code.

The site even promises test-case generation in the future. This seems very flaky at best.


I'm glad you asked. You are right to suggest that proving the equivalence of two programs is in general impossible (equivalent to the halting problem). However, if you start with one program with a known special structure and apply a known manipulation, it becomes possible in special cases to guarantee equivalence with some reasonable assumptions about how the original program is used (e.g. to give a trivial example, if one of your tests is to hash the source code of the new program against a specific value it will fail that test after refactoring -- macros are not extremely far from this contrived example).
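To spell out that contrived example, a "test" like the sketch below is trivially broken by any refactoring, even one that provably preserves behavior (the function and pinned hash are made up for illustration):

    import hashlib
    import inspect


    def add_one(x):
        return x + 1


    # Recorded once against the original source and then effectively hard-coded.
    PINNED_HASH = hashlib.sha256(inspect.getsource(add_one).encode()).hexdigest()


    def test_source_unchanged():
        # This "test" pins the source text, not the behavior: any refactoring,
        # even a behavior-preserving one, changes the hash and fails.
        current = hashlib.sha256(inspect.getsource(add_one).encode()).hexdigest()
        assert current == PINNED_HASH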

There is no replacing the developer (anytime soon); that's why we don't overwrite your code. You still have to review the suggested changes yourself. I personally don't want to spend time making changes to improve my code that an AI can guess for me, and even if I have to review every change, that still saves a lot of time.

I also agree that test cases are hard: while it's easy to generate some tests that verify trivial behavior, most of the meaningful behavior of programs requires knowing more than just the types of the inputs & outputs. But I believe AI can help with even this problem.


I saw a private demo of Mutable.ai last month. It was extremely impressive, and even more impressive was the pace of development. This is a clear must for anyone doing Jupyter.


this seems astroturfed


Nice project.

But the fact that you need AI to convert a 'notebook' into 'proper code' shows just how bad Jupyter notebooks are.

(I don't mean 'notebooks' in general. I mean 'jupyter notebooks' specifically.)

It used to be that programmers were taught that "your main function should read like a table of contents, and delegate details appropriately". Now it's "Y U NO FIT EVRYTHIN IN ONE PAEG?"


I don't use Jupyter. Can you take my sometimes ugly, research-grade PyTorch code and turn it into clean, production-quality code?


Yes, most of our tech is not Jupyter specific. I just think it's an extremely good beachhead because of how prone to problems Jupyter notebooks are. Would you mind dropping me a line at [email protected]?


How do you think about running on-prem versus in the cloud? I can see some users preferring the ease and simplicity of using MutableAI in the cloud, but I can see larger corporations preferring to keep all code in-house. Have you thought about that?


Yes, in many cases companies do not feel comfortable with their code leaving their network. This presents an interesting technical challenge as most of the large scale transformer models require large amounts of resources not only to train but even to serve.

We DO offer an on-prem version of the product that, while requiring a decent GPU, does not require people to go out and buy a fleet of TPUs.

I personally think there will be a lot of developments in the efficiency of these models (e.g. DeepMind's RETRO), as well as the usual steady improvements in compute efficiency.

However, I also believe there will always be an incentive to make the models as big as resources allow, because "more is different", as P.W. Anderson said in his famous essay, and I give kudos to the OpenAI team for pushing the limit on these models.


I'd be interested in on-prem product, otherwise I can't use it with any of the code I develop at work (which is the vast majority of my code). And I work at a small startup, so the price should be reasonable.


We're in touch. For others interested in on-prem you can email me at [email protected]. Thanks.


This is awesome! Notebook centric static analysis is definitely going to help data scientists level up their game. Tools like this help you improve your coding style simply by observing the changes that are proposed by it.

We have customers running batch data pipelines in production that contain notebooks. It’s because of the first class integration of notebooks in the orchestration tool we’ve built (https://github.com/orchest/orchest). It would be great to help our users catch errors before they bite them in prod. Let’s have a chat sometime :-)!


Thanks! Looks cool. Definitely feel free to reach me ([email protected]).


I’ve been following this project closely (oshams is a friend and I’m an investor in the company), and to echo Darmani’s comment both the product and the pace of development have been extremely impressive. Congrats on the launch!


The documentation in the screenshot seems pretty meaningless; the functions are already described by their names and parameters.

Are there more examples that we can see?


Good point. That one example was more about illustrating function extraction. I've seen some very sophisticated doc strings, especially for functions with more parameters. In other cases it can struggle; for fun I tried asking it to document a Y Combinator function and it got stuck in a recursive description. I thought that was very funny. :)

I will definitely post more examples over the coming days. In the meantime, why don't you play with the system yourself and feel free to ping me at [email protected] with any more feedback.


Can you explain, in your pricing plan, what is an "API call"? It isn't very clear from the landing page what that entails.


Thanks, I think we should clarify this as well. An API call is any call to our API, either via "Fast Forward to Production" (right-click on the notebook to see it) or an autocomplete request.


I’m thinking about this in terms of something like black reformatting, where I might run it all the time as I work. 10 calls per day for the free tier seems like not enough to really explore the capabilities.


Good point. I've already bumped up the free usage plan on our backend and will probably keep the higher limits up for a while for the HN community. Will also update the language on our website. Enjoy :)


Is this just prettier (https://prettier.io/) for Jupyter combined with a bit of hoisting?


Thanks for your comment. First, I want to say I'm a big fan of Prettier, and no, we're very different.

At MutableAI we're laser focused on actually _transforming_ Jupyter notebooks (beyond formatting). Meaning we will actually remove dead code and in some cases refactor your code for you. We also use large scale transformers / neural networks to document your code, which IIRC prettier does not.


Excellent, much needed product!


Thank you!


I think the real trick is to never use Jupyter / notebooks to begin with.


Not really. Data scientists love notebooks, and it's pretty inefficient to move out of them.


Then maybe as data scientists we should stop writing production code in notebooks and instead write and deploy it properly, like the rest of the development community does, and stop papering over laziness and bad habits.

It’s only inefficient to move out of notebooks if you’ve written poorly structured, messy code in the first place. We wouldn’t accept that out of other devs, so I’m not sure why data scientists should magically get a pass.


I think seeing this as data scientists “getting a pass” is looking at it through the wrong lens. Data scientists and quants tend to be good at statistics, data modeling, and solving business problems, not software engineering except as a means to an end. You might be able to find a few people who have a combination of those skills, but you’re picking from a very small pool.


Wasn’t the point of the data scientist/ML engineer that they could combine software engineering skills with statistical and domain know-how? Which is why they’re paid so much?

If they can’t do the former, you may as well just hire a software developer and a statistician for about the same cost or less and have them work together - you save money and get better code.


Awesome demo, congrats on the launch!


Good luck!



