An LLM ability to do a task is roughly correlated to the number of times that task has been done on the internet before. If you want to see the hype version, you need to write a todo web app in typescript or similar. So it's probably not something you can fix with prompts, but having a model with more focus on relevant training data might help.