Launch HN: MutableAI (YC W22) – Automatically clean Jupyter notebooks using AI

FridgeSeal · on Feb 24, 2022

Impressive product, but honestly, how is this the solution we’ve come to?

“Should we get our data scientists to make the merest amount of effort to stop writing bad code?”

“No, we’ll use an AI to fix their code”

Just what? Notebooks are nice for exploratory code, but that’s it. Got problems with data scientists writing terrible code in notebooks? Make them stop then. Stop accepting PR’s, stop letting them do it.

I’m not saying this as an outsider, I’m a data scientist by trade too, I don’t think we should be enabling bad practices and habits that we wouldn’t tolerate anywhere else in software development.

oshams · on Feb 25, 2022

I understand your frustration. I think people at times feel like they've graduated out of notebooks. But I think there is something truly special about the ability to visualize and inspect code all in one environment. I know that a majority of researchers at DeepMind (at least when I was there) use notebooks and many of them are excellent programmers!

We should certainly all strive to learn how to write better code and I think regular use of an AI tool like ours _can_ be pedagogical. Many people who did not have the time or opportunity to learn best practices can now join us in building technology because the AI is always there for them and never tires. AI assisted programming will help level out the playing field for talent around the world especially those without access to instructors or good job opportunities.

_aavaa_ · on Feb 25, 2022

_can_ be or _will_ be?

Will the be used for learning better practices, or will it be used to avoid ever learning them since the AI takes care of it?

oshams · on Feb 25, 2022

Maybe a bit of both :) I think it's more common these days to find programmers who've never used pointers before. Arguably that's a loss for them, especially when their mental model of how memory works starts to get leaky.

But I think people will also learn tons too. At the end of the day, most people learn by example. If you can see what the AI does to your code or suggests as a completion, you will learn best practices faster.

analog31 · on Feb 25, 2022

I've learned a lot from linters. Maybe this would be the next level up in linting, if it offers an assessment as it rearranges your code. I'd definitely use something like that to improve my coding skills.

version_five · on Feb 24, 2022

This looks cool*

I can't tell from the website, but what I would absolutely love is if this could turn the notebook into a "train.py" with argparse to parse the arguments and model / program output saved on completion. This is a definite pain point for me, there are things I'd like to test interactively, but I default to trying to get out of a jupyter notebook as quickly as possible because all the code ends up having to be rewritten "properly".

For ML in particular, I bet there are cool integrations you could do with pytorch lightning (and other frameworks) to take me from trying a forward pass in the jupyter notebook to having the datamodel with dataloaders declared and then set up in a main() . There is lots that could be automated.

*(except the SaaS model, I have a really hard time supporting "api call" limited services that seem to just be charging me a subscription fee for the priviledge of running code, but I don't want to get into that.)

oshams · on Feb 24, 2022

Thank you. Can you please email me at [email protected]? I think having a number of well defined "flows" _especially_ for the use case you highlighted can save people a lot of frustration!

(Totally understand you on the SaaS objections -- I don't think you should ever feel like you can't run your code! In the individual plan you should not hit the limit unless many people are using the same key.)

hervature · on Feb 24, 2022

I think bad code in Jupyter is a symptom of both the tool and the user. Jupyter's flexibility for running cells out of order should never have been allowed. Code developed in this way will not be able to be transformed into a sequential program. One example of this is illustrated in your demo, how did the algorithm decide input_shape was an input but num_classes should remain a global? For me, it is obvious that the user's intent was that these are the variables that define the model.

Personally, I wouldn't pay those prices to hide the data scientist's problems. However, if you developed extensions to prevent and encourage the user to overcome the tool's problems, I would buy that for myself.

Also, you probably don't a demo that would fail to run in the second cell (in both unclean and clean code) .

version_five · on Feb 24, 2022

> Personally, I wouldn't pay those prices to hide the data scientist's problems

I have data scientists that work for me that have this problem. I'd rather that everyone I hire wrote the kind of code I want out of the box, but for lots of reasons it's not always possible. I have tried enforcing frameworks and ended up wasting more money trying to get people to change their workflow. A developer or data scientist (even an inexperienced one) is expensive enough that $30/mo is nothing if it makes them more productive. I'd guess this is what the company is banking on.

(You'll see I made another comment where i say i don't like their pricing model, this is a matter of principle because I don't believe in just trying to wrap a program in SaaS in order to get more money without offering something beyond "you can run my code if you pay me". But the cost / benefit is still easy to justify)

oshams · on Feb 24, 2022

Exactly, developer time is extremely valuable! Plus, even on an individual level wouldn't you rather spend less time doing something a machine can make a reasonably good guess for you? My internal heuristic has been anything that can be easily guessed is something AI + PL techniques should do for you.

I think developers in the future will have way more mental free space because of this trend towards accelerating developer tooling with AI.

oshams · on Feb 24, 2022

Thanks, founder here. I personally try not to run cells out of order. But I feel like almost anyone at some point will want to run cells out of order.

The extraction is based partly on statistical edge detection. We are working on training transformers on actual diffs on GitHub that would be more natural.

hervature · on Feb 24, 2022

Keep up the work, I hope you get traction. This space is definitely in need of something. In particular, I will be keeping tabs to see your upcoming tests feature.

oshams · on Feb 24, 2022

Thank you! I completely agree. I certainly don't think people should give up using Jupyter because of the frustration of keeping code quality high enough to port to other environments. Test feature is coming very soon. Please feel free to email me at [email protected] if you have more thoughts on this.

writablemkv · on Feb 24, 2022

I own a paid license for MutableAI tool.

A few notes:

1. The progress that was made by Omar and the team (from alpha to working version that is useful) is astounding. I am eagerly waiting for more improvements that will make my regular daily workflow even more fun.

2. The tool is useful for both junior and senior developers and data scientists. They get different things from it though: junior developers get cleaner code, fixes for simple mistakes, structure and demonstration of how things “should be done”. Senior developers get to skip mundane tasks, go from prototype to production code quicker and can concentrate on more complex things/details.

3. I especially like that there is an option for “on prem” installation. My current employer is very strict about using “online” tools that can leak our code outside. “cloud only” tool would be a deal breaker for us.

4. Considering the amount of time this tool can potentially safe, it will pay for itself in no time. I am mean really – if it saves me just a few minutes of mindless reformatting/editing/adding comments – it already paid for itself.

edublancas · on Feb 24, 2022

Congrats on the launch! As a former data scientist, it pumps me up to see more notebook-centric tooling, as I believe it is the best environment for data exploration and rapid iterations.

We're working on notebook tooling as well (https://github.com/ploomber/ploomber), but our focus is at the macro level so to speak (how to develop projects that are made up of several notebooks). In recent conversations with data teams, the question "how do you ensure the code quality of each notebook?" has come up a lot, and it's great to see you are tackling that problem. It'll be exciting to see people using both MutableAI and Ploomber! Very cool stuff! I'll give it a try!

oshams · on Feb 24, 2022

Thank you ! Congrats on your launch as well. Ploomber + MutableAI. :)

elmalto · on Feb 25, 2022

I get some of the sentiment that people should write clean code and I have been very adamant with my teams to ideally use best practices and make their code (re)usable and production ready. But the truth is: nobody likes to clean up. Including me. To use a real life example: I don’t use a broom at home unless my roomba is unable to get to the spot. This is not going to solve all problems and I will probably have to fix some edge cases, but I love the solution you have come up with and will definitely give it a shot next time I come across an ugly notebook!

oshams · on Feb 25, 2022

I love the roomba analogy! Definitely going to steal this. :)

chse_cake · on Feb 24, 2022

Not to gripe on your work, But I am extremely skeptical about tools like these. How do you ensure the equivalence of the computation ? What are the SLA's on equivalence and how do you even verify if the programs are equivalent?

The solution to bad code written by data scientists is not more AI tools that write non-verifiable code. Its ergonomic API frameworks which can relieve the pain-points / bad practices via throughly testable / verifiable / deterministic code.

The site even promises test-case generation in the future. This seems very flaky at best.

oshams · on Feb 24, 2022

I'm glad you asked. You are right to suggest that proving the equivalence of two programs is in general impossible (equivalent to the halting problem). However, if you start with one program with a known special structure and apply a known manipulation, it becomes possible in special cases to guarantee equivalence with some reasonable assumptions about how the original program is used (e.g. to give a trivial example, if one of your tests is to hash the source code of the new program against a specific value it will fail that test after refactoring -- macros are not extremely far from this contrived example).

There is no replacing the developer (anytime soon) that's why we don't overwrite your code. You still have to review the suggested changes yourself. I personally don't want to spend more time making changes an AI can guess for me to improve my code and even if I have to review every change that still saves a lot of time.

Also I agree, test cases are also hard because while it's easy to generate some tests that verify some trivial behavior, most of the meaningful behavior of programs requires knowing more than just the types of the inputs & outputs. But I believe AI can help with even this problem.

Darmani · on Feb 24, 2022

I saw a private demo of Mutable.ai last month. It was extremely impressive, and even more impressive was the pace of development. This is a clear must for anyone doing Jupyter.

hackernewds · on Feb 25, 2022

this seems astroturfed

tpoacher · on Feb 25, 2022

Nice project.

But the fact you need AI to convert a 'notebook' into 'proper code' shows you just how bad jupyter notebooks are.

(I don't mean 'notebooks' in general. I mean 'jupyter notebooks' specifically.)

It used to be that programmers were taught that "your main function should read like a table of contents, and delegate details appropriately". Now it's "Y U NO FIT EVRYTHIN IN ONE PAEG?"

p1esk · on Feb 24, 2022

I don't use Jupyter, can you take my sometimes ugly, research-grade Pytorch code, and turn it into a clean, production quality code?

oshams · on Feb 24, 2022

Yes, most of our tech is not Jupyter specific. I just think it's an extremely good beachhead because of how prone to problems Jupyter notebooks are. Would you mind dropping me a line at [email protected]?

rbmattis · on Feb 24, 2022

How do you think about running on-prem versus in the cloud? I can see some users preferring the ease and simplicity of using MutableAI in the cloud, but I can see larger corporations preferring to keep all code in-house. Have you thought about that?

oshams · on Feb 24, 2022

Yes, in many cases companies do not feel comfortable with their code leaving their network. This presents an interesting technical challenge as most of the large scale transformer models require large amounts of resources not only to train but even to serve.

We DO offer an on-prem version of the product that while requiring a decent GPU, does not exactly require people to go out and buy a fleet of TPUs.

I personally think there will be a lot of developments in the efficiency of these models (e.g. DeepMind's RETRO) as well as the usual compute efficiencies always improving.

However, I also believe there will always be an incentive to make the models as big as resources allow, because "more is different" as PW Anderson said in his famous essay and I give kudos to the OpenAI team for pushing the limit on these models.

p1esk · on Feb 24, 2022

I'd be interested in on-prem product, otherwise I can't use it with any of the code I develop at work (which is the vast majority of my code). And I work at a small startup, so the price should be reasonable.

oshams · on Feb 25, 2022

We're in touch. For others interested in on-prem you can email me at [email protected]. Thanks.

ricklamers · on Feb 25, 2022

This is awesome! Notebook centric static analysis is definitely going to help data scientists level up their game. Tools like this help you improve your coding style simply by observing the changes that are proposed by it.

We have customers running batch data pipelines in production that contain notebooks. It’s because of the first class integration of notebooks in the orchestration tool we’ve built (https://github.com/orchest/orchest). It would be great to help our users catch errors before they bite them in prod. Let’s have a chat sometime :-)!

oshams · on Feb 25, 2022

Thanks! Looks cool. Definitely feel free to reach me ([email protected]).

mhartl · on Feb 24, 2022

I’ve been following this project closely (oshams is a friend and I’m an investor in the company), and to echo Darmani’s comment both the product and the pace of development have been extremely impressive. Congrats on the launch!

forgotmyoldacc · on Feb 25, 2022

The documentation in the screenshot seems pretty meaningless, they're already being described by the function name and parameters.

Are there more examples that we can see?

oshams · on Feb 25, 2022

Good point. That one example was more about illustrating function extraction. I've seen some very sophisticated doc strings especially for functions with more parameters. In other cases it can struggle, for fun I tried asking it to document a Y Combinator function and it got stuck in a recursive description. I thought that was very funny. :)

I will definitely post more examples over the coming days. In the meantime, why don't you play with the system yourself and feel free to ping me at [email protected] with any more feedback.

bravura · on Feb 24, 2022

Can you explain, in your pricing plan, what is an "API call"? It isn't very clear from the landing page what that entails.

oshams · on Feb 24, 2022

Thanks, I think we should clarify this as well. Any call to our API via "Fast Forward to Production" (right click on notebook to see) or an autocomplete call.

6gvONxR4sf7o · on Feb 25, 2022

I’m thinking about this in terms of something like black reformatting, where I might run it all the time as I work. 10 calls per day for the free tier seems like not enough to really explore the capabilities.

oshams · on Feb 25, 2022

Good point. I've already bumped up the free usage plan on our backend and will probably keep the higher limits up for a while for the HN community. Will also update the language on our website. Enjoy :)

morganslaw · on Feb 24, 2022

Is this just prettier (https://prettier.io/) for Jupyter combined with a bit of hoisting?

oshams · on Feb 24, 2022

Thanks for your comment. First I want to say I'm a big fan of prettier and no we're very different.

At MutableAI we're laser focused on actually _transforming_ Jupyter notebooks (beyond formatting). Meaning we will actually remove dead code and in some cases refactor your code for you. We also use large scale transformers / neural networks to document your code, which IIRC prettier does not.

hallqv · on Feb 24, 2022

Excellent, much neeeded product!

oshams · on Feb 24, 2022

Thank you !

dmicah · on Feb 24, 2022

I think the real trick is to never use Jupyter / notebooks to begin with.

idomi · on Feb 24, 2022

Not really, data scientists love notebooks, it's pretty inefficient to move out of the notebooks.

FridgeSeal · on Feb 24, 2022

Then maybe as data scientists we should stop writing production code in notebooks and instead write and deploy it properly, like the rest of the development community does, and stop papering over laziness and bad habits.

It’s only inefficient to move out of notebooks if you’ve written poorly structured, messy code in the first place. We wouldn’t accept that out of other devs, so I’m not sure why data scientists should magically get a pass.

paulgb · on Feb 25, 2022

I think seeing this as data scientists “getting a pass” is looking at it through the wrong lens. Data scientists and quants tend to be good at statistics, data modeling, and solving business problems, not software engineering except as a means to an end. You might be able to find a few people who have a combination of those skills, but you’re picking from a very small pool.

FridgeSeal · on Feb 25, 2022

Wasn’t the point of the data scientist/ML engineer that they could combine the software engineering skills with the statistical and knowledge know-how? Which is why they’re paid so much?

If they can’t do the former, may as well just hire a software developer and a statistician for near-cheaper and have them work together - you save money and get better code.

fudged71 · on Feb 25, 2022

Awesome demo, congrats on the launch!

Ozzie-D · on Feb 25, 2022

Good luck!