Hacker News
Warpd: A modal keyboard-driven virtual pointer (github.com/rvaiya)
260 points by philonoist on Oct 16, 2022 | 118 comments


I've built an app that has the same goal (not operating a mouse) but approaches it completely differently.

Rather than trying to simulate moving the mouse itself, Shortcat [https://shortcat.app/] indexes the user interface (buttons, text fields, links, menus, etc.) and enables fast fuzzy search of the interface. Type a word, an abbreviation, or a hint and hit Enter to click or action the element. Works almost everywhere on macOS, including browsers, Electron apps, and even iOS apps!
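To make "fast fuzzy search of the interface" concrete, here is a minimal Python sketch of one common technique, in-order subsequence matching, run over a list of element labels. It is not Shortcat's actual algorithm, which isn't described here; the `Element` fields and the scoring rule (a tighter matched span ranks higher) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Element:
    role: str   # hypothetical field, e.g. "button" or "link"
    label: str  # the text the user searches against
    x: int      # where a click would land (assumed already known)
    y: int


def subsequence_score(query: str, label: str) -> Optional[int]:
    """Greedily match query characters, in order, against label.

    Returns None if some character is missing; otherwise the length of the
    label span the match covered (smaller means a tighter match).
    """
    text = label.lower()
    pos = 0
    start = None
    for ch in query.lower():
        idx = text.find(ch, pos)
        if idx == -1:
            return None
        if start is None:
            start = idx
        pos = idx + 1
    return 0 if start is None else pos - start


def search(query: str, elements: List[Element]) -> List[Element]:
    """Return matching elements, tightest match first, ties broken by label."""
    scored = []
    for el in elements:
        score = subsequence_score(query, el.label)
        if score is not None:
            scored.append((score, el.label, el))
    scored.sort(key=lambda t: (t[0], t[1]))
    return [el for _, _, el in scored]


elements = [
    Element("button", "Save", 10, 10),
    Element("button", "Save As...", 60, 10),
    Element("link", "Share", 110, 10),
    Element("menu", "Settings", 160, 10),
]
print([el.label for el in search("sa", elements)])
```

With this scoring, typing `sa` keeps "Save", "Save As..." and "Share" but drops "Settings", which contains no `a`. A production matcher would probably also weight matches at word boundaries.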

The goal is to minimise cognitive overhead to achieve a particular intent, so being able to type a word to hit a button, or activate a deep menu item when you don't know the shortcut, is quick and easy.

I'm currently working on a modal option which enables staying within Shortcat to navigate an interface, as well as chords for simulating scrolling and arrow keys.

Shortcat relies on the Accessibility API to index UI elements, however, so it depends on how well an app or website has implemented it. One of the goals is to help improve accessibility implementations by exposing more people to them and pushing developers to fix broken or incorrectly implemented accessibility tagging.

Shortcat is macOS only for now, as I haven't been able to investigate how viable this would be on Windows or Linux, especially on Linux considering all the different toolkits that exist.


I love this new wave of tools coming out for mouseless computer use. Chronic mouse use has destroyed my wrist so I have to avoid using it as much as possible.

I love Shortcat's approach in general, indexing the UI. However, the reliance on the Accessibility API is actually a significant downside in the real world in my experience, since so many apps don't implement it properly. I feel like Warpd is a good complement to this; you could use Hint or Grid mode as a fallback when the indexing approach fails.

I wish I could use Shortcat or Warpd, but unfortunately I'm on Windows. Curious if anyone has any good tool recommendations for Windows? Currently, I'm using:

1. Vimium for Chrome (so good, wish I could just use it across the OS).

2. Hunt and Peck: https://github.com/zsims/hunt-and-peck has been my favorite for OS-level use, a simple version of Shortcat for Windows. But it's not maintained and not as slick as some of these newer tools.


If you're already using Vimium, I suggest trying qutebrowser, which takes keyboard accessibility to a whole new level, by making it a first-class feature for the entire browser.

It does basically cut out the mouse, and had a several-days learning curve for me, but after that it's pretty great. Here are some cool features, off the top of my head:

* Python-scriptable, though I haven't figured out how to use this yet.

* Bind JavaScript bookmarklets to a keyboard shortcut (use :bind with the jseval command)

* Toggle not only JavaScript, but image loading and a whole slew of other features, with a keyboard macro.

* Vertical tabs.

* All config is adjustable via commands.

* Keyboard macros like "pop tab into a new window", "clone tab", "close all other tabs", etc.

* Text selection using the keyboard.

* Quite similar keyboard dynamics to vim.

It has a built-in ad blocker, and you should run :adblock-update when you first use it.

Another browser which is similar, but which I haven't gotten into as much, is Luakit.


Oooooh, Hunt and Peck indicates that it's possible to make a Shortcat for Windows!

I would probably need to pay someone to build that particular version though, 'cause the last time I built anything for Windows was like 15+ years ago.


I have used a trackball for many years since my wrist started bothering me, and I love it. I am right-handed, and I use Logitech's 575 and MX Ergo. I prefer the Ergo, even though it is more expensive. I keep it beside me on the couch where I sit. That way my elbow makes a 90 degree angle. Very comfortable. My keyboard is on my lap and my monitor at eye level.


Nice, I've actually tried a trackball myself, but with the way I use my desk (sit/stand) it caused more problems than it solved (shoulder issues). Ergonomics is an art I suppose.


A bit late to the party, but I’ve just released v1.0 of TPMouse, for Windows: https://github.com/EsportToys/TPMouse


Please post this on Show HN. This looks cool!


I never comment on HN, but I just want to say I've downloaded your app and it's very impressive. I'm going to try and incorporate this into my workflow the best I can. Thanks!


This app is quality. I can tell you have been working on it for years. Why not charge for it?


Haha, thanks :)

I did charge for it a couple of years ago; however, I rebuilt the whole thing from scratch after a long hiatus and hadn't bothered to reimplement licensing because the existing options all kinda suck, and figured I'd focus the time on features and usability first. I think the modal mode in the next release will bring it much closer to a 1.0 release.


If you bundle it and release a paid for application on the App Store, I would totally buy it and even roll it out to my staff. The magic of the App Store allows you to do company wide roll-outs quite easily.


I'm not sure if an app like Shortcat can be released on the App Store given it uses the Accessibility APIs (sandboxing etc), also the 15-30% cut they take is a bit ooooof, but I do have plans to support company/teams licensing!


+1 this is awesome! I'd like to donate if I can :)

edit: nvm me, found the option in settings (on activation show shortcuts immediately).

Quick question: I've been playing around with Shortcat for a while. When I press the activation hotkey it takes about 4 seconds for the yellow two-letter highlights to show up, despite the app's text stating "found n elements in ~0.20s". Is there a config option to show the yellow highlights instantly?


Thanks! I don't have a way to take tips yet, but you can support by pushing for developers to improve their accessibility implementations when you run into issues!

I see you found the setting for that. It was a deliberate default initially as the intended way to use Shortcat is to activate Shortcat and type what you want without waiting to see hints, as this is generally faster and less mental overhead IMO, especially for fast typists and well-structured interfaces.

However, some people prefer minimal keystrokes and I get that. I'm trying to figure out the right set of defaults to make it friendly to new users while nudging people to how Shortcat is designed to be used and will be tweaking it as I go.


Oh my! I came to the comment section to ask about a Mac app that I've seen a long time ago that did this. Lo and behold, you, the author, have written the first comment. :-)

Thank you for Shortcat, I used it a long time ago and loved it. Excited to give it another go!


No worries! Glad you love Shortcat :D


Shortcat is utterly amazing. I really hope I can work this into my entire MacOS usage. You should be really proud of what you've made because this is fantastic!!


Any plans to add scrolling functionality to shortcat? I'd be able to move over completely from vimac if that gets added.


Working on that right now :)


Excellent! I'd love to have that on Linux.


Very interesting thing. I wonder if the gap of apps not supporting a11y could be reduced by using Tesseract to OCR the text.


That's what https://superkey.app/ does.


Excited to try this out! Is it planned to open source? I would love to try integrating this into Raycast


Is Raycast open source? Could only find that the plugins are on GitHub.


It won't be open source, but I will be adding an API so it can be integrated with other apps and scripted


I have been meaning to build something like this for myself, albeit for Linux. Does anybody know if there is any already existing efforts there?

Given that Linux doesn't have anything like an accessibility API, I think the only option is training ML models.


This is what CMD-Shift-? should be.

I think this paradigm along with more app developers putting all the important functions in menus is a strong contender for Maximum Intuitive Productivity


Have you considered using ML/OCR to figure out the position of the text relative to the screen? Seems much simpler than relying on accessibility APIs

Thank you for your hard work!


I have plans to use ML/OCR to augment results down the road but the AX APIs and ecosystem on most apps (that I encounter, at least) are generally decent. Also, OCR means it won’t understand buttons with just icons, whereas AX APIs can grab em just fine.

Thanks! It’s easily my longest running project at a decade


Should be doable on Linux for most mainstream apps due to the toolkits having a11y support, but obviously not all apps use mainstream toolkits.


This is excellent. I will be trying this out.


I've been playing around with ShortCat recently - really cool app! Keep up the good work.


Thanks :D


Looks very cool, have you considered adding voice input or is it already possible?


There’s already a very sophisticated system on MacOS for voice input so I feel like it’d probably be superfluous.


Looks really cool. Is it able to select text for copying?


I'm working on a version that allows sending arrow keys with modifiers to the targeted application, so soon!


Looks awesome! Wish I had this for windows!



I just tried that, this is excellent.


The whole manual says alt+meta+x.

Alt is Meta on modern keyboards. You can use ESC to emulate Meta in some applications. But this doesn't work here.

It should say A-W-x (alt+super+x). Super is the Windows key these days.

https://en.wikipedia.org/wiki/Space-cadet_keyboard#/media/Fi...


Looking at the limitations, I hate how fragmented Linux is becoming for apps like this. Completely separate implementations for:

- X

- Wayland + Gnome Shell

- Wayland + wlroots-based compositor

- Wayland + sway or any other non-wlroots based compositor

Is Wayland really that much better that this is worth it? Why can't Wayland be aware of the compositor like X?


Wait, is there a separate implementation for sway and other wlroots compositors? Sway is wlroots-based; isn't that where wlroots originates?


I might be wrong about Sway not being wlroots-based. But there are a lot of compositors that are not based on wlroots.


Yeah, sway is wlroots. But GNOME and KDE are both doing their own thing, so the point is generally perfectly valid.


These are growing pains.

Once the need is understood well enough, a common extension will be written, and Mutter, KWin and sway/wlroots will implement it.

This is better for security than the X11 free-for-all.


I agree that Wayland by design can be more secure - but judging from my personal threat model, if malicious code gets to attack X or Wayland, it's already all over.


> a common extension will be written, and Mutter, KWin and sway/wlroots will implement it.

Like how they standardized nice simple things like taking a screenshot?



Isn't the use case of apps injecting mouse and cursor events the "security free for all" that Wayland is trying to prevent?

Full disclosure, I am a Wayland skeptic. I don't think your focus on X input security is as justified as you probably think.


I think Wayland's security model was more worried about reading inputs than writing them (i.e. preventing keyloggers). Of course https://www.x.org/archive/X11R7.5/doc/security/XACE-Spec.htm... also exists, so...


Writing events is certainly a potential security problem.

I know in the Windows world, one of the UAC features was that a less privileged process can't send events to an elevated window.

In X11, I think last I checked most distros disable the XTEST extension by default out of security concerns. Skimming the warpd code, they are using XTEST for the X backend.

As I think of the keylogger problem, it's not really privilege escalation, is it? If you're running as the same user as all the other clients, you could ptrace(2) them and intercept their event loops. I guess there are some container-based app deployment solutions now where you could run stuff at different security levels, so maybe it's more of a legit issue now...


If it is implemented it will most likely go through the xdg-desktop-portal, with a policy controlled by compositor.

And most likely it won't be injection of mouse and cursor events, but something higher-level, like focus switching requests.


Are you aware that the use case here is simulating a mouse? Focus switching is not enough.


Ah, I mixed it up with another tool, sorry.

This one looks like a small feature in a compositor and not an external tool, really.

I guess it would take 100-200 LoC to implement in GNOME Mutter.


Then you'd need to implement it in every compositor.

Excuse me for being blunt. I don't know if you understand how shitty of a design you advocate. Solid designs do not require modifying core components to write application level features the original authors did not envision.


Excuse me for being extra blunt. I don't know if you understand how shitty of a design you advocate. Solid designs do not open users to being attacked and their credentials stolen by malicious applications, including sandboxed ones.

Moving cursor around is a compositor's domain, not some arbitrary application's that decided to fiddle with the user's input.


To you it's an arbitrary program. To the user it's a program they want to work.

An API should not be so preachy about which programs can theoretically be written. It should provide broad mechanisms.

It is very frustrating to work with people who think like you do, that 3 or 4 unrelated projects have to carve up narrow exceptions to how the platform works for every single use case, nominally because of theoretical harm of this exploit no one will write, but actually more based on your ego perception that you know better than every other developer on the planet.

So Wayland has this long list of impossible applications which are doable everywhere else. It's a prima donna.


Where were you seeing that? I only glanced through a handful of files, but it appears to be one implementation to me:

https://github.com/rvaiya/warpd/tree/master/src/platform/way...

If there is a distinction, and there might be, "completely separate" is exaggerating.

Edit: I see that only wlroots-based compositors are supported so far, and of those, some things are broken in Wayfire. Perhaps this is what you're referring to?


The Wayland implementation does not support GNOME Shell on Wayland (arguably the most mainstream combo, since it's the default in Ubuntu). To support GNOME Shell on Wayland, you need to include a GNOME Shell extension, which has some headaches. X is a third implementation. And then there is one additional implementation per Wayland compositor that is not based on wlroots.


Not a full and completely separate implementation though. Just wrappers to mask differences in compositor extensions that haven't been stabilized yet. If there are none the implementation is simply impossible.


Personally I don't think Wayland will get to a good spot until one compositor "wins" and provides interfaces for modification/extension such as e.g. X-like ability to write window managers without replacing the whole compositor. At this point I see no compelling reason to switch to Wayland for my own part.


There's XWayland for interop. Besides that, people have been moving away from X11 for years now.


I still ask myself why.


Have you actually used Wayland? Besides the obvious current issues (compatibility and/or missing features), it is way better. Even something as simple as moving and resizing windows feels noticeably faster and more responsive on Wayland compared to X.

I don't use it at the moment because of compatibility issues with some software that I use, but that's not Wayland's fault.


> Besides the obvious current issues (compatibility and/or missing features)

I mean, that's the biggest criticism of Wayland: 15 years in and it still doesn't work (well?) with the biggest GPU brand. The criticism has always been about whether it is worth it to fragment Linux land and throw away a few decades' worth of work on X.


My criticism of Wayland has always been that they dropped a spec, forgot to make an actual server, and now every window manager needs to implement support for various extensions separately. It's a fine design if you assume everyone only uses Gnome, but...

It's not just throwing away a decade's worth of work on X, it's making everyone redo that work per display manager (thankfully wlroots exists as a kind of Wayland shared library).


You can reduce that list from 4 to 3 (sway and others use wlroots; GNOME is the odd one out on Wayland).


Eh. Sway uses wlroots, but KDE is a completely different option that they didn't list so it comes out in the wash.


Looks very useful! I especially like the ‘grid mode’. I would never have thought of that idea myself. It’s just a pity it isn’t available on Windows, though I’ve previously had good experiences with Mouseable [https://github.com/wirekang/mouseable].


(followup on previous comment) finished in three hours [2022-10-16-13:00-Z]:

https://github.com/EsportToys/AutoWarpd


How do I activate it? I can't seem to figure it out. The script is running, but how do I activate grid mode?


Ctrl+Win+G to activate, then uijk to move quadrants and m,. for left/middle/right clicks
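The grid mode described here can be sketched as repeated halving of a rectangle. In the Python sketch below, the assignment of `u`/`i`/`j`/`k` to top-left/top-right/bottom-left/bottom-right is an assumption based on where those keys sit on a QWERTY keyboard, and `narrow` and `pointer_for` are hypothetical helpers, not AutoWarpd's code.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    w: float
    h: float

    def center(self) -> Tuple[float, float]:
        return (self.x + self.w / 2, self.y + self.h / 2)


# Assumed mapping: each key's position on the keyboard mirrors its quadrant.
QUADRANT_KEYS: Dict[str, Tuple[int, int]] = {
    "u": (0, 0),  # top-left
    "i": (1, 0),  # top-right
    "j": (0, 1),  # bottom-left
    "k": (1, 1),  # bottom-right
}


def narrow(rect: Rect, key: str) -> Rect:
    """Shrink rect to the quadrant selected by key."""
    col, row = QUADRANT_KEYS[key]
    half_w, half_h = rect.w / 2, rect.h / 2
    return Rect(rect.x + col * half_w, rect.y + row * half_h, half_w, half_h)


def pointer_for(keys: str, screen: Rect) -> Tuple[float, float]:
    """Apply each keypress in turn and return the center of the final region."""
    rect = screen
    for key in keys:
        rect = narrow(rect, key)
    return rect.center()


screen = Rect(0, 0, 1920, 1080)
print(pointer_for("ku", screen))
```

Each keypress halves both dimensions, so n presses shrink the region by a factor of 2^n; on a 1920-pixel-wide screen, 11 presses get the width below one pixel.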


Please upload full documentation. I am testing this right now.


Added rudimentary README to the repo


It's been in https://kaleidoscope.readthedocs.io/en/stable/plugins/Kaleid... for a while (as in, it's a plugin you can include in your custom open source keyboard firmware, with both 2x2 and 3x3 modes.)


When I first tried warp on my new keyboard, I really did not understand what was happening and thought it was bugged. The bindings came by default. I removed them and went on.

A few days later I wanted to add custom macros and had to read the docs. Skimming the warp section made me realize it's basically just recursive space positioning. I tried the mode again, and soon realized how incredibly useful this is.

It's a bit difficult for me to keep the state in my mind when going down the tree because there are no visual indicators. It's a keyboard firmware after all.


Saw this thread just now, I'm gonna try to implement this in AutoIt, let's see how long it takes me! [2022-10-16-09:58-Z]


What is AutoIt? A fork of AutoHotKey?


It's like AutoHotKey but designed to be more programming- rather than scripting-oriented. It is astonishingly easy to create GUIs in AutoIt in comparison, I use it to rapidly prototype UX ideas.

In fact, historically AHK is actually a fork of AutoIt.


Long time warpd user here: You should ditch grid mode for the much more efficient hint mode. Also, check out keyd by the same author. The combination of warpd/keyd easily saves me an hour of work every day.
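Hint mode, by contrast, labels candidate targets directly so one short key sequence jumps straight there. Below is a minimal Python sketch of the label-assignment step, assuming a home-row alphabet (a hypothetical choice, not necessarily warpd's default) and that the candidate points are already known; `make_labels`, `assign_hints` and `resolve` are illustrative names.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[int, int]


def make_labels(n: int, alphabet: str = "asdfghjkl") -> List[str]:
    """Return n distinct labels, all of the shortest length that fits n."""
    if n <= 0:
        return []
    length = 1
    while len(alphabet) ** length < n:
        length += 1
    labels: List[str] = []
    for combo in product(alphabet, repeat=length):
        labels.append("".join(combo))
        if len(labels) == n:
            break
    return labels


def assign_hints(points: Sequence[Point], alphabet: str = "asdfghjkl") -> Dict[str, Point]:
    """Pair each candidate point with a label."""
    return dict(zip(make_labels(len(points), alphabet), points))


def resolve(typed: str, hints: Dict[str, Point]) -> Optional[Point]:
    """Return the target once typed equals a label, else None (keep typing)."""
    return hints.get(typed)


hints = assign_hints([(100, 200), (400, 200), (700, 500)])
print(hints)
print(resolve("s", hints))
```

Giving every label the same length means no label is a prefix of another, so the target is unambiguous as soon as the last key is typed.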


Here is the link to keyd: https://github.com/rvaiya/keyd


This is great.

I have been using something called keynav [1] to get a similar grid mode. I would never have guessed how intuitive it is.

It doesn't replace the mouse, but it's helpful for that occasional click in the middle of heavy keyboarding.

[1] https://github.com/jordansissel/keynav


vim-easymotion for the entire screen. I love it. Useful in numerous ways from accessibility to constrained devices to keyboard-centric navigation.

In the 1980s, there was a thread of animosity directed at GUIs and mice as productivity-killers: they made computers accessible to novices but, as features shifted from text-mode applications to GUIs, robbed power users of expressivity and automation. I think we can agree that with necessary and sufficient software engineering and UX, CLI-UI-API parity is achievable, offering an easier learning curve and accommodating varying levels of user astuteness, different mental models, and the expressivity to accomplish a task, by providing different MVC "views" or "presentations" for interacting with software or systems of any sort.


For windows users that would love something like this I recommend https://github.com/GavinPen/AhkCoordGrid


I’ve long thought eye tracking would be awesome in this style, this is the next best (and currently only technically viable) solution. Well done!


In Windows Speech Recognition or Dragon NaturallySpeaking, there is the mouse grid functionality. It divides the screen into a grid of nine tiles; you type a number to select one of the tiles, then that tile gets divided into nine tiles, and so on recursively until you have a single coordinate selected.

I just wish I had an easy way to do that from the numpad. That way, to move the mouse where I need it, I could type 19432, Enter, and know that corresponds to the spot I click to refresh the page I am reading. I could use the mouse less and less as I started to memorize the 80% case of where I need the mouse to go and just bang it out on the keyboard.
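The digit-sequence idea maps directly onto recursive 3x3 subdivision. This Python sketch assumes tiles are numbered 1 to 9 row by row from the top-left, like a phone keypad; a physical numpad puts 7, 8, 9 on the top row, so a second layout is included. Neither table is taken from Windows Speech Recognition or Dragon, and `grid_point` is a hypothetical helper.

```python
from typing import Dict, Tuple

# Assumed numbering: 1 2 3 / 4 5 6 / 7 8 9 from the top-left (phone keypad).
PHONE_LAYOUT: Dict[str, Tuple[int, int]] = {
    str(d): ((d - 1) % 3, (d - 1) // 3) for d in range(1, 10)
}
# Physical numpad: 7 8 9 on the top row, 1 2 3 on the bottom row.
NUMPAD_LAYOUT: Dict[str, Tuple[int, int]] = {
    str(d): ((d - 1) % 3, 2 - (d - 1) // 3) for d in range(1, 10)
}


def grid_point(digits: str, width: float, height: float,
               layout: Dict[str, Tuple[int, int]] = PHONE_LAYOUT) -> Tuple[float, float]:
    """Pick one of nine tiles per digit and return the final tile's center."""
    x, y, w, h = 0.0, 0.0, float(width), float(height)
    for d in digits:
        col, row = layout[d]  # raises KeyError for anything other than 1-9
        w, h = w / 3, h / 3
        x, y = x + col * w, y + row * h
    return (x + w / 2, y + h / 2)


print(grid_point("19432", 1920, 1080))
```

Each digit divides both dimensions by 3, so the five-digit `19432` narrows a 1920-pixel-wide screen to a tile about 8 pixels wide (1920 / 3^5 ≈ 7.9).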


A very similar tool for macOS, inherently more native, called Scoot [1]

[1] https://github.com/mjrusso/scoot


For Android users, that's a feature of the Voice Access app.


Oh, neat; I'm very attached to keynav for this use case, but this is more portable. I'll have to dig into the Wayland limitations and caveats, since I thought that this was literally impossible to implement usefully there. Maybe this is one less blocker for me being able to switch now.


I use a thinkpad-style keyboard and my mouse is on the homerow. It feels like that is much more efficient than this, as you get the precision of the mouse without having to move your hand.

I don't understand why more people don't adopt it. Is it because it's so different from a normal mouse?


From what I remember, isn't that a tiny thumbstick? That's much, much slower than something like this.


No it's not. It's pressure sensitive and about as fast or faster than a trackpad with movement. I've never had a problem with speed, even across multiple, large, monitors.


Is this not what you mean: https://www.youtube.com/watch?v=7H8o_-7bKIU? I really doubt it's as fast/accurate as a trackpad even if you master it. This tool looks to be as fast as a mouse if you master it in many situations. But you would need a direct comparison by skilled users for each to be sure. I just don't see how a mini stick will ever beat pressing two buttons.


It's definitely not as accurate as a trackpad or a mouse, but it's not too hard to get very close. The benefit of it is that it's right on the home row. You don't need to move your hands to use the mouse.

On my thinkpad, I use the trackpoint and trackpad equally.


My experience is that a mouse is most accurate, followed by a trackpoint, followed by a trackpad, but then again I rarely use trackpads. I inevitably move the mouse when I take my finger off or try to press buttons. Also if I leave it enabled, I inevitably teleport the mouse around the screen with my palms, "palm detection" or no.

Sadly every non-IBM/Lenovo trackpoint I've ever used is awful (although significantly improved by putting a Thinkpad cover bit on the joystick, if you're stuck with one).

That said, having played through the SC1 campaign with a trackpoint... even the best ones are not as good as a real mouse.


It’s faster, but less accurate than a trackpad, and certainly faster than keynav (and probably the thing in the post, which is a re-implementation of keynav).


Those nipples tend to get stuck in one place for me, so I have to disable them.


Windows has parts of this in KeyNavish, Fluent Search, Win-vind, Voice Finger, Windows accessibility's Voice Access, window managers, etc., and they still fall short.


What's the point of this?


Some people prefer a keyboard-focussed workflow and try to avoid using the mouse as much as possible.


Weird.


The only thing weird here is your lack of context and understanding, all while still commenting.


Some people need tools like this as an assistive technology: think RSI, Parkinson's and other issues that affect dexterity or elbow movement. Not so weird.


Thank you for explaining - as a consequence of illness and disability, I can understand the need.

But why someone would intentionally make things more difficult for themselves as a preference, I don't get. It would be like walking around on crutches when you have two perfectly healthy legs.


I like to keep both hands on the keyboard. Every mouse movement incurs the cost of reaching the right hand for the mouse, then moving the right hand back and re-finding my place on the keyboard. I don't like that constant back-and-forth movement. It breaks my flow and it can make my arm ache.


If implemented well (as [0] is), it can actually be much faster than using the mouse for certain tasks. For example, when browsing Google results, it’s a lot quicker to navigate to a result by pressing the first letter or two of its link text than dragging the mouse to click the link.

As a more common example, I only launch applications by opening a prompt (e.g. Spotlight on Mac) and typing the first couple letters of the program I’m starting. This is much faster than navigating using the mouse to the applications folder/menu/dock/taskbar etc. and clicking an icon.

I agree keyboard-based navigation is not faster for everything. Luckily, tools like this don’t prevent you from also using a mouse!

[0] https://news.ycombinator.com/item?id=33222384


Reminds me of the times when every clickable element had an unde&rlined key and could be activated by alt-r. Then some designheads decided that it wasn't beautiful and killed it.
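The `&` in `unde&rlined` mirrors the classic Win32 label convention, where `&` marks the next character as the access key and `&&` stands for a literal ampersand. A small Python sketch of parsing such a label; the function name and the "first marker wins" rule are assumptions for illustration:

```python
from typing import List, Optional, Tuple


def parse_mnemonic(label: str) -> Tuple[str, Optional[str]]:
    """Split a Win32-style label into display text and its access key.

    "&x" marks x as the access key (this sketch keeps the first one),
    "&&" is a literal "&", and a trailing lone "&" is kept as-is.
    """
    text: List[str] = []
    key: Optional[str] = None
    i = 0
    while i < len(label):
        ch = label[i]
        if ch == "&" and i + 1 < len(label):
            nxt = label[i + 1]
            if nxt == "&":
                text.append("&")
            else:
                if key is None:
                    key = nxt.lower()
                text.append(nxt)
            i += 2
        else:
            text.append(ch)
            i += 1
    return "".join(text), key


print(parse_mnemonic("unde&rlined"))
```

A UI would draw the returned text with the key underlined and activate the element on Alt plus that key, which is why `alt-r` pairs with `unde&rlined`.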


If taken to the logical conclusion, your question extends to "why do we have keyboard shortcuts when you can just mouse there?" Taken to the illogical conclusion: "Why even have a keyboard when you can just use a mouse?"

There are times when a mouse is good, and there are times when I don't want to take my hands off the keyboard to reach for the mouse.


> intentionally make things more difficult for themselves as a preference

No-one would do that, that would be crazy. People intentionally make things easier for themselves as a preference, and different people find different things easy or hard.


Because once you learn how to use it, the keyboard is much faster and more capable than a pointing device.

So it's less like crutches and more like rollerblades.


Seems unlikely you are a serious software developer, software engineer, or sysadmin. It’s well known mouse use slows you down and causes ergonomic issues.


Why would you use a mouse when you have a perfectly good keyboard with 68+ keys and God knows how many viable input combinations?


Thank you for the examples of necessity over examples of preference.


Is this a vimium for everything?!?!? IT SEEMS LIKE IT IS!


Has anyone used this in video games? FPS? MMORPGs?


Very useful, thank you!



