This measure of LLM capability could be extended by taking it into the 3D domain.
That is, have the model write Python code for Blender, then run Blender in headless mode behind an API.
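A minimal sketch of that headless step, assuming the model-generated script has already been produced and the `blender` binary is on PATH (the `--background`/`--python`/`--factory-startup` flags are standard Blender CLI; the wrapper function itself is just illustrative):

```python
import subprocess
import tempfile
from pathlib import Path

def run_generated_script(script_text: str, work_dir: Path) -> subprocess.CompletedProcess:
    """Write the model-generated Blender Python to a temp file and run it headless."""
    work_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(script_text)
        script_path = f.name
    # --background runs Blender without a UI; --python executes the generated script;
    # --factory-startup avoids picking up local user preferences/add-ons.
    return subprocess.run(
        ["blender", "--background", "--factory-startup", "--python", script_path],
        capture_output=True,
        text=True,
        timeout=300,
        cwd=work_dir,
    )
```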
The talk hints at this, but one-shot prompting likely won't be a broad enough measure of capability by this time next year (or perhaps even now).
So the test could also include an agentic portion: consulting the latest Blender documentation, or even using a search engine to find blog entries detailing syntax and technique.
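One way that agentic layer might look, sketched with hypothetical `llm_step` and `search_docs` callables standing in for whatever model API and search backend the harness actually uses:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AgentState:
    task: str                                  # e.g. "model a pelican riding a bicycle"
    notes: list[str] = field(default_factory=list)
    script: str = ""

def run_agentic_attempt(
    state: AgentState,
    llm_step: Callable[[AgentState], dict],    # hypothetical: wraps the model API in use
    search_docs: Callable[[str], str],         # hypothetical: Blender manual / bpy docs / blog search
    max_turns: int = 5,
) -> str:
    """Draft a Blender script, pausing to look things up whenever the model asks for documentation."""
    for _ in range(max_turns):
        action = llm_step(state)
        if action["type"] == "lookup":
            state.notes.append(search_docs(action["query"]))
        elif action["type"] == "script":
            state.script = action["code"]
            break
    return state.script
```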
For multimodal input processing, the test could take a particular photo of a pelican as the subject.
For usability, the objects can be converted to USDZ, iOS's native 3D format, which can be viewed in Mobile Safari.
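For that conversion, one possible path (assuming the Blender run exported a glTF and that Apple's `usdzconvert` from the USDZ Tools is installed; any other glTF-to-USDZ converter would slot in the same way):

```python
import subprocess
from pathlib import Path

def gltf_to_usdz(gltf_path: Path) -> Path:
    """Convert an exported .glb/.gltf to USDZ so it opens via AR Quick Look in Mobile Safari."""
    usdz_path = gltf_path.with_suffix(".usdz")
    # usdzconvert ships with Apple's USDZ Tools; swap in another converter if preferred.
    subprocess.run(["usdzconvert", str(gltf_path), str(usdz_path)], check=True)
    return usdz_path
```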
I built this workflow, including a service for Blender, as an initial test of what was possible in October 2022. It required post-processing for common syntax errors back then, but I'd imagine newer LLMs make those mistakes less often now.