A high-quality desktop GUI library is close to a game engine in terms of difficulty. You have to support a vast array of languages, accessibility, input paradigms, flexible layout, different screen resolutions and DPIs, styling, and so on. It's a massive undertaking.
Considering that some game engines have their own GUI toolkits (e.g. UE4's Slate), you could say that it is more difficult to make a game engine than a GUI toolkit :-P.
I'd say making a comprehensive 2D GUI toolkit is actually more complex and difficult than making a 3D game engine - even one that includes a rudimentary 2D GUI toolkit. There's an enormous long tail to 2D.
My current work is exactly at the place where those tech universes overlap and integrate.
Is the long tail you're talking about more in 2D rendering itself, or in accommodating the great diversity of the humans using the UI, e.g. accessibility and internationalization?
Rendering does have a lot of interesting 2D-specific problems that 3D engines don't optimize for (Raph Levien's blog has great stuff on this), but what 3D engines lack is more often related to UI layout, input (e.g. sophisticated focus handling), and advanced text (editing, layout, shaping).
Lately, Unity has probably been investing the most in a more complete 2D toolkit (their third or fourth generation of 2D UI toolkit bundled with the engine). Among the FOSS game engine projects, Bevy has made strong 2D suitable for UI an explicit goal. But that one true converged contender is still not even on the horizon, IMHO.
Game engine toolkits usually don't have to support accessibility, right-to-left languages, full Unicode, and many other things that mature application UIs need.
I'm not sure about RTL, but AFAIK UE4's Slate does have Unicode support and also seems to have accessibility support. Whether the tools built on Slate actually use it, I don't know, but that's true of any toolkit.