I've found this mostly to be the case when using lightweight open source models or mini models.
Rarely is this an issue with SOTA models like Sonnet-4.5, Opus-4.1, GPT-5-Thinking or better, etc. But those are expensive, so all the companies use cut-rate models or little to no test-time compute (TTC) to save on cost and to go faster.
I just run the agent directly on separate testing/dev servers via remote-ssh in VS Code to have an IDE to sanity check stuff. Just far simpler than local dev and other nonsense.
The VOID VR was the greatest fusion of VR + reality. CAD-mapped laser tag warehouses while wearing a VR headset and backpack computer. Near-perfect immersion. COVID sadly killed it. Was a beautiful thing - so happy I got to try it out. Even ended up buying a lot of the defunct equipment off eBay.
Got it. Those are the setup commands for Simon Willison's LLM CLI tool, which lets you interact with language models from the command line.
The -e flag installs it in editable mode, and [test] pulls in the test dependencies.
Are you running into any issues with the installation or tests? What are you working on with the LLM library?
USER: Correct. You run them using your code tooling.
SYSTEM:
Realized inability to execute requested Python commands.
The user is telling me to run these commands using my code tooling. They want me to actually execute this. Let me think about what tools I have available:
I have the repl tool which runs JavaScript code in a browser environment
I have artifacts for creating code/documents
I have web_search and web_fetch
I don't have tools to:
Clone git repositories
Run shell commands like pip or pytest
Execute Python code in a command-line environment
The repl tool is JavaScript-only and runs in a browser environment, not a server environment where I could run shell commands.
I should clarify that I don't have the capability to run shell commands, install Python packages, or execute pytest. I can only run JavaScript code in a browser environment via the analysis tool.
I need to be direct and clear about this limitation, following the user's preferences to eliminate filler and be straightforward.
I don't have tooling that can run shell commands, clone git repos, install Python packages, or execute pytest. My only code execution environment is a JavaScript browser REPL.
I can't run pip, pytest, or any command-line tools. I can only execute JavaScript code in a sandboxed browser environment.
What were you hoping to accomplish? Do you need help understanding the test setup, or were you expecting different capabilities?
My execution environment is a browser-based JavaScript sandbox. I don't have Python, Node.js shell access, pip, git, or pytest capabilities.
Simon's post may have been referring to a different deployment or feature set that I don't currently have access to. My only code execution is the JavaScript REPL I just demonstrated.
What specifically are you trying to accomplish with the llm library tests? I can help with that in other ways.
EDIT: Ah, in Settings you have to disable the Analysis tool and enable Code Interpreter. You cannot have both enabled.
Interesting: enable those settings and Claude on claude.ai becomes Claude Code, and it tries to run everything in its container like it owns the machine. I don't want that.
There are no files anymore. I mean, there technically are, but the copyright industry doesn't want you to look at them without authorization, security people don't want you to look at them at all, and UX experts think it's a bad idea for you to even know such a thing as "files" exists.
Share and enjoy. Like and subscribe. The world is just apps all the way down.
They are getting steadily eroded by Temu and Aliexpress/Alibaba. Also, in-person retail is surging for specific items. Places like Best Buy have had a nice resurgence since the 2010s (the stock is down compared to the pandemic, but that's a retail thing, not a BBY problem).
Best Buy has been a big one for me when I need things like USB sticks or SD cards. (Bad enough with the occasional duds from the reputable sources without mixing in counterfeits on top of that.)
They price-match most of their stuff, and their budget line of cabling is actually quite competitive. When I have to do buildouts, we sometimes need to buy a lot of Ethernet cable, and their pricing is not terrible in a pinch.
Once bought an entire store out of patch cables, ha.
Happy to say I walked into a Best Buy last week because I needed a replacement mouse right now. I am really glad they survived.
I cancelled Prime because I wasn't getting any value anymore. Non-Prime customers are treated like second-class citizens. Amazon has really gone downhill lately. Customer service is terrible. It's not just the counterfeiting; the website UX has become steadily worse. The ability to archive orders was recently removed without warning, as was the ability to view itemized invoices. Yes, really. Before anyone says otherwise, "View Invoice" now redirects to your Order Details page, without any additional detail.
I switched most of my shopping to Walmart. I get free next-day or two-day shipping on orders of $35 or more, whereas Amazon ships the same items in 5-6 days now that I am non-Prime scum.
Not disagreeing, but the Amazon web UX has been famously terrible since like 1998. They basically invented the whole trend of building via A/B test results instead of via user-centric design. Nothing on the site has ever made any sense. Every item title is a paragraph-long description. The categories are basically useless. The filters are such a mess of bad and incomplete data that they are useless too. Many items have 2-3 duplicate listings that somehow have different shipping dates and descriptions, and you never know if you have found the "real" listing. But they sure sell a lot of stuff.
I've always assumed the issues you described were meant to disarm users but also maybe to set expectations. When you see 3 similar items, you don't know if you're getting the "real thing". So you spend that money and expect the worst. Terrible listings make it hard to find what you need exactly but also make it seem like they have more selection when they might not.
This is a recurring theme with Google. Their models are phenomenal, but the systems around them are so bad that they degrade the whole experience. Veo3: great model, horrible website, and so on.