I tried this yesterday, asking it to create a simple daily reminder task, which it happily did. Then when the time came and went, I simply got a chat message saying the task had failed, with no explanation of why or how. When I asked it why, it hallucinated that I had too many tasks (I only had the one). So now I don't know why it failed or how to fix it. Which leads to two related observations:
1) I find it interesting that the LLM rarely seems trained to understand its own features, your account, or how the LLM itself works. It seems strange that it has no idea about its own support.
2) Which leads me to the OpenAI support docs[0]. It seems pretty telling that they use old-school search rather than an LLM for their own help docs, right?
Same experience except mine insisted I had no tasks.
It does say it's a beta on the label, but the thing inside doesn't seem to know that, nor what it's supposed to know. Your point 1, for sure.
Point 2 is a SaaS from before LLMs+RAG beat the conventional approach. The status page: a SaaS. API membership, metrics, and billing: a SaaS. These are all undifferentiated, and while they were arguably good selections when they were made, unless better help docs are going to sell more subscriptions, they shouldn't spend time on undifferentiated heavy lifting.
How do you know it hallucinated? Maybe your task was one too many and it is only able to handle zero tasks (which would appear to be true in your case).
Re: 2 — for the same reason that you shouldn't host your site's status page on the same infrastructure that hosts your site (if people want to see your status page, that probably means your infra is broken), I would guess that OpenAI think that if you're looking at the support docs, it might be because the AI service is currently broken.
I've thought about this a lot too. My guess is that because foundation models take so much to train, they aren't retrained very often, and from my experience you can't easily train in new data, so you'd need some up-to-date side system. I suspect they're very deliberate about which "side systems" they bolt on; from trying to build agent orchestration myself, nothing ends up as simple as I expect with side systems, and things easily go off the rails. So my thought is that, given the scale they're dealing with, this is probably a low-priority feature that isn't actually particularly easy.
> So my thought is that, given the scale they're dealing with, this is probably a low-priority feature that isn't actually particularly easy.
"working like OpenAI said it should" is a weird thing to put low priority. Why do they continuously put out features that break and bug? I'm tired of stochastic outputs and being told that we should accept sub-90% success rates.
At their scale, being less than 99.99% right results in thousands of problems. So their scale, and the outsized impact of their statistical bugs, are part of the issue.
Why are you setting your bar this way? Is it because of how they do their feature releases (no warning that something is an alpha or beta feature)? Their product, ChatGPT, was released two years ago and is a fairly complicated product. My understanding is that the whole thing is still a pretty early product generally. It doesn't seem unusual for a startup doing something as big as they are to release features that don't have all the kinks ironed out. I've released some kinda janky features to 100,000s of users before, not totally knowing how they were going to perform with all of them at that scale; I don't think that's very controversial in product development.
Also, in my earlier comment I was specifically talking about it being able to understand the features it has; I don't think that's the same problem as the remind-me feature not working consistently.
> I've released some kinda janky features to 100,000s of users before, not totally knowing how they were going to perform with all of them at that scale; I don't think that's very controversial in product development.
Oh, that's because modern-day "ship fast, break things" product development is its own problem. The whole tech industry is built on principles that are antithetical to the profession of engineering. It's not controversial in product development because the people doing the development all decided to loosen their morals and think it's fine to release broken things and fix them later.
That my bar is high and OpenAI's is so low is its own issue. But then again, I haven't released a product that could randomly tell people to poison themselves by combining noxious chemicals, or whatever other dangerous hallucination ChatGPT spews. If I had engineered something like that, with the opportunity to harm people and no way to guarantee it wouldn't, if I had engineered a system where misinformation could be created at scale, I would have trouble sleeping...
I regularly use Perplexity and Cursor, which can search the internet and documentation to answer questions that aren't in their training data. It doesn't seem that hard for ChatGPT to search and summarize OpenAI's own docs when people ask about them.
You would want a feature like "self-awareness" to be pretty canonical, not based on a web search. And even if it had a discrete internal side system it could query that you controlled, if the training data was a year old, how would you keep the two matched from a systems point of view over time? It's also unclear how the model would interpret that data each time it ran on the new context. It seems like a pretty complicated system to build, tbh, especially when human-maintained help docs and FAQs are a lot simpler and a more reliable source of truth. That said, my understanding is that behind the scenes they are working toward the product we experience being built around the foundation model, rather than being THE foundation model, as it pretty much is today. Once they have a bunch of smaller LLMs doing discrete standard tasks, I would guess it will become considerably more "aware".
> 2) Which leads me to the OpenAI support docs[0]. It seems pretty telling that they use old-school search rather than an LLM for their own help docs, right?
I agree, but then again, if you're a dev in this space, presumably you know what keywords to use to refine your search. RAG'd search implies that the user (the dev) is not "in the know".
Very, very, very buggy, and it really looks extremely low-effort, as with many OpenAI feature rollouts. Nothing wrong with an MVP feature, but make it at least do what it's supposed to do, and maybe give it 10% more extensibility than the bare bones.
I question the same things frequently. I routinely ask ChatGPT to help me understand the OpenAI API documentation and how to use it, and it's rarely helpful, frequently telling me things that are just blatantly untrue. At least nowadays I can link it directly to the documentation for it to read.
But I don't understand why their own documentation, products, and lots of examples using them wouldn't be the number one thing they would want to train the models on (or fine-tune on, or at least make available through a tool).
It's all a matter of degree. Even in deterministic systems, bit flipping happens. Rarely, but it does. You don't throw out computers as a whole because of this phenomenon, do you? You just assess the risk and determine whether the scenario you care about sits above or below the threshold.
My point is that your confidence level depends on your task. There are many tasks for which I'll require ECC. There are other tasks where an LLM is sufficient. Just like there are some tasks where dropped packets aren't a big deal and others where they are absolutely unacceptable.
If you don't understand the tolerance of your scenario, then all this talk about LLM unreliability is wasted. You need to spend time understanding your requirements first.
You generally can't know, because we don't measure for it, especially not on personal computers. Maybe ECC RAM reports this information in some way?
In practice I think it happens often enough. I remember a Black Hat conference talk from around a decade ago where the hacker squatted bit-flipped variants of the domain of a popular Facebook game and caught requests from real end users, basing his attack on the random chance of bit flips during DNS lookups.
I'm trying to figure out how this would be useful with the existing feature set.
It seems like it would be good for summarizing daily updates against a search query, but all it would do is display them. I would probably want to connect it with some tools, at minimum, for it to be useful.
> ChatGPT has a limit on 10 active tasks at any time. If you reach this limit, ChatGPT will not be able to create a new task unless you pause or delete an existing active task or it completes per its scheduled time.
So this is pretty much useless for most real-world use cases.
I'm surprised it took OpenAI this long to launch scheduled tasks, but as we've seen from our users[0], pure LLM-based responses are quite limited in utility.
For context: ~50% of our users use a time-triggered Loop, often with an LLM component.
Simple stuff I've used it for: baby name idea generator, reminder to pay housekeeper, pre-natal notifications, etc.
We're moving away from cron-esque automations as one of our core value props (most new users use us for spinning up APIs really quickly), but the base functionality of LLM+code+cron will still be available in (and migrated to!) the next version of our product.
> None of these require an LLM. It seems like you own this service yet can't find any valuable use for it.
Sorry? My point was that these are the only overlapping features I've personally found useful that could be replaced with the new scheduled tasks from ChatGPT.
Even these shouldn't require an LLM. A simple cron+email would suffice.
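To be concrete, a minimal sketch of the cron+email version, assuming a Unix box with a local MTA; the crontab entry, script path, and addresses are all made up:

    # Hypothetical crontab entry: remind me at 9am on the 1st of every month.
    #   0 9 1 * * /usr/bin/python3 /home/me/remind.py
    import smtplib
    from email.message import EmailMessage

    msg = EmailMessage()
    msg["Subject"] = "Reminder: pay the housekeeper"
    msg["From"] = "reminders@example.com"  # made-up addresses
    msg["To"] = "me@example.com"
    msg.set_content("Monthly reminder: pay the housekeeper today.")

    # Assumes a local MTA listening on localhost:25.
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)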
The web-scraping component is neat, but for my personal use case (tide tracking) I've had to use LLM-generated code to get proper results. Pure LLM responses were bad at following the rules I wanted (tide less than 1 ft, between sunrise and sunset): sometimes the LLM would get it right, sometimes it would not.
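For what it's worth, the rule is trivial once it's code instead of a prompt. A sketch with hypothetical, hardcoded tide and sun data (a real version would pull these from an API):

    from datetime import datetime

    # Hypothetical, hardcoded data; a real version would fetch tide predictions
    # and sunrise/sunset times from an API.
    predictions = [
        {"time": datetime(2025, 1, 15, 7, 30), "height_ft": 0.4},
        {"time": datetime(2025, 1, 15, 15, 45), "height_ft": 1.8},
    ]
    sunrise = datetime(2025, 1, 15, 7, 15)
    sunset = datetime(2025, 1, 15, 17, 5)

    # The rule: low tides under 1 ft that fall between sunrise and sunset.
    good_tides = [
        p for p in predictions
        if p["height_ft"] < 1.0 and sunrise <= p["time"] <= sunset
    ]
    print(good_tides)  # only the 7:30 prediction qualifies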
For our customers, purely scheduling an LLM call isn't that useful. They require pairing multiple LLM and code execution steps to get repeatable and reliable results.
> ChatGPT tasks will become a powerful tool once incorporated into GPTs.
> Baby name generator: why would this be a scheduled task? Surely you aren't having that many children... :)
So far it's helped name two children :) -- my wife and I like to see the same 10 ideas each day (via text) so that we can discuss what we like and don't like daily. We tried the sift-through-1000-names thing and it didn't fit well with us.
> Reminder to pay, notifications: what value does OpenAI bring to the table here over other apps which provide calendar / reminder functionality?
That's exactly my point. Without further utility (i.e. custom code execution), I don't think this provides a ton of value at present.
This feature is really bad (unreliable), and they don't even make a good case for _why_ you would want to use it over literally any other reminder system. I guess it can run an LLM to decide what to send you at the scheduled time, but its unreliability would never have me relying on it.
Some use cases that might be interesting:
* Let me know the closing stock price for XXXXX
* Compile a list of highlights from the XXXX game after it finishes
But everything I can think of is just a toy: cool if it works, but not groundbreaking, and possible with much more reliable methods.
OpenAI really seems to just be throwing stuff at the wall to see if it sticks, then moving on and never iterating on previous features. DALL-E is kind of a joke compared to the alternatives (one-shot only), I trust Claude more for programming, o1 was ho-hum for my needs, the desktop app still feels like a solution in search of a problem, etc.
I tried it, and it failed to send me a desktop notification. I did receive emails (at the wrong time). I think it was too early to launch; a 5-minute test could have found these bugs. It really hurts their brand.
This will be a lot more useful when it's able to combine with more tools, such as in custom GPT actions, APIs, "computer use", the Python interpreter, etc.
Yeah, it's pretty bad, embarrassingly so, quite honestly. A single developer could probably significantly improve it in a day. I'm sure that's coming, but why don't they launch these MVP features at least a quarter baked? It's essentially unusable as is. If it could ping me on my phone, and Advanced Voice could open so I could do a basic task, great, I'm back to using it. But as rolled out, it's hilariously minimal and borderline unusable.
If it works correctly, wouldn't those still be peak times? Except with this they have to process the initial scheduling request in addition to the at-execution task.
Everyone else's crons are synced to wall clocks, vs. your centralized cron (a task scheduler, really) that is aware of the scheduled work and the current load on the systems consuming those tasks.
Controlling the ability to nudge wakeup times by small amounts can make a huge difference to your ability to manage spiky workloads like this.
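A minimal sketch of that nudging, assuming the scheduler stores a per-task tolerance window (all names made up):

    import random
    from datetime import datetime, timedelta

    def jittered_run_time(scheduled: datetime, tolerance: timedelta) -> datetime:
        # Pull the wakeup anywhere inside [scheduled - tolerance, scheduled],
        # so every "9am" task doesn't fire in the same second.
        offset = random.uniform(0, tolerance.total_seconds())
        return scheduled - timedelta(seconds=offset)

    # A 9:00 task with a 10-minute tolerance may run any time from 8:50 to 9:00.
    run_at = jittered_run_time(datetime(2025, 1, 15, 9, 0), timedelta(minutes=10))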
A lot of answers don't go stale for hours or days. They'll do the task early at an off-peak time, hidden from the user, double-check that it really wasn't time-sensitive, then surface the saved answer at the desired time.
Start with a regex (or a fast tiny model) to flag obvious time-sensitive tasks. Otherwise, do the task early by prompting it "if this requires up-to-the-minute information, output cancel, else [prompt]". At best, it's 1 regex + 1 full inference. At worst, it's 1 regex + 1 output token + 1 full inference.
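A sketch of that flow in Python; call_llm is a hypothetical stand-in for the real model call, and the regex is only illustrative:

    import re

    def call_llm(prompt: str) -> str:
        # Hypothetical stand-in; wire up the real model call here.
        raise NotImplementedError

    # Cheap first pass: obviously time-sensitive tasks wait for their slot.
    TIME_SENSITIVE = re.compile(
        r"\b(closing price|score|news|today|latest|current|live)\b", re.IGNORECASE
    )

    def maybe_run_early(task_prompt: str) -> str | None:
        # Returns a precomputed answer, or None if the task must run on schedule.
        if TIME_SENSITIVE.search(task_prompt):
            return None  # 1 regex so far; full inference happens at the scheduled time
        guarded = (
            "If answering this requires up-to-the-minute information, "
            "output only the word CANCEL. Otherwise, answer it:\n" + task_prompt
        )
        answer = call_llm(guarded)  # runs off-peak
        if answer.strip().upper().startswith("CANCEL"):
            return None  # worst case: regex + a few tokens now + 1 inference later
        return answer  # best case: the whole task was done off-peak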
OpenAI resembles the old Apple: ship the best experience. The ChatGPT app on every platform is the best in the business, and they are shipping polished features relatively quickly. It's quite the contrast with today's Apple, the world's largest company, so inept that it is releasing Apple Intelligence, which is quite literally using ChatGPT 3.5 tech in 2025. It just shows how valuable CEOs like Altman, Musk, and Jobs are to a corporation.
The ChatGPT UI/UX is pretty middling. They still don't have a proper answer to Claude Projects, and they are focusing on shipping stuff like this instead of fixing the numerous papercuts in the chat experience. How is it that I can access the most powerful AI on the planet with o1 pro, but if I paste more than a few pages of text there's no solution for that; it just overflows the input box and makes it impossible to navigate?
The "old" Apple certainly didn't ship anything quick or on the bleeding edge, nor did they ship the "best" experience. They did, however, have somewhat different priorities than their competitors. They still do to some extent.
Agreed. The vast majority of their audience doesn't understand the difference. And among the subset that do, I imagine there's a fair number of us that don't care about the distinction. I just want it to work well.
OpenAI creating an AI phone with Microsoft ... releasing Her (the movie) in your pocket.
Your AI assistant/agent is seen on the lock screen (like a FaceTime-call UI/UX), waiting at your beck and call to do everything for you / be there for you via text, voice, gestures, expressions, etc.
It interfaces with the AI agents of businesses, companies, your doctor, and friends & family to schedule things, and can be used as a knowledge base (ask for a friend's birthday, if they allow that info).
Apple is indeed stale & boring to me (heavy GPT user) in 2025.
[0] https://help.openai.com/