Would love to read about experiences actually using this (I mean Mycroft in general) — good, bad, or otherwise.
Also, though: why don't we have "text assistants"? Seems to me the process of deciphering spoken text is (or should be) entirely orthogonal to performing the actual task — changing the lighting, cranking up the AC/heat, arming the security perimeter, or whatever.
I think the reason is that voice recognition is hard and so far only the "BIGASS TECH!!!" corporations have been able to make it "mom or granny ready" — and they have no incentive to do that for free and let us make our own mash ups. They want to wall us into their ecosystems.
So from that standpoint, this looks pretty cool to me — even if the voice recognition isn't as good as the big three.
OTOH, to rebut my own point: I got the new Apple Watch Ultra and I noticed that I can map the side button to a "shortcut" (the Apple term for a script you create yourself to automate something) that just transcribes whatever I say, and sends it as text over SSH to any host I want. On my local LAN, the delivery time is well under 1000ms.
So that's getting pretty close to being able to use Siri as a generic voice recognizer, and then piping the input into whatever arbitrary/homebrew system I want.
To do it purely with voice, though, you have to say "Hey Siri, do the funky chicken" (after naming the shortcut "do the funky chicken"), and then say the actual command phrase you want your home automation to act on.
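The receiving end can be dead simple, too: the shortcut just runs a command over SSH with the transcribed phrase, so a tiny dispatcher script on the host is enough. A rough sketch (the phrases and the commands they trigger are placeholders, not a real setup):

    #!/usr/bin/env python3
    # Rough sketch of a dispatcher the shortcut calls over SSH, e.g.
    #   ssh pi@homeserver 'voice_dispatch.py "turn on the lights"'
    # The phrases and the commands they map to are placeholders.
    import subprocess
    import sys

    ACTIONS = {
        'turn on the lights': ['curl', '-X', 'POST', 'http://hub.local/lights/on'],
        'turn off the lights': ['curl', '-X', 'POST', 'http://hub.local/lights/off'],
        'arm the alarm': ['curl', '-X', 'POST', 'http://hub.local/alarm/arm'],
    }

    phrase = ' '.join(sys.argv[1:]).strip().lower()
    cmd = ACTIONS.get(phrase)
    if cmd:
        subprocess.run(cmd, check=False)
    else:
        print('No action mapped for:', phrase)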
I played with Mycroft about two years ago. I had been using a couple of Google Home Minis for a while for the usual things (play Spotify, set timers, ask the weather, control lights around the house). They worked perfectly for that. At the time I decided to de-Google my life and take back my privacy, so I went looking for something open source that would give me more control over my data. I found Mycroft and played with it for a few months.
I was pretty excited about it. I bought a ReSpeaker 2.0, an embedded device that runs Linux and has a six-microphone array. I designed a custom 3D-printed case to hold the ReSpeaker and a small speaker to make my own little "Jarvis" box (Iron Man reference).
My favorite part about the whole thing was the customization. I wrote a couple of skills to do some other things for me. For example, I could say "Where can I watch X?" and it would use an API to search for a TV show or movie and tell me where it was available (Netflix, Amazon Prime, Disney+, etc.). It has always been annoying to go to Google and try to figure out where I can stream something, limited to only the services I currently subscribe to. I wrote another skill that tied into my CouchPotato instance so I could say "Download the movie X" and it would go find it and download it. If it found multiple matches, it would read off the top few and let me choose the correct one. I even tied those skills together, so if the first skill couldn't find a movie on one of my streaming services it would ask if I wanted to download it and I could simply say "yes". I also modified the code to use a custom text-to-speech API so I could configure Mycroft to use a custom voice.
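For anyone curious, a skill like that is just a small Python class. Here's a rough sketch in the style of a classic Mycroft skill (the intent file name, the availability API, and the settings key here are just placeholders):

    # Rough sketch of a "Where can I watch X?" Mycroft skill.
    # The intent file, the availability endpoint, and the 'my_services'
    # settings key are hypothetical placeholders.
    import requests
    from mycroft import MycroftSkill, intent_file_handler

    class WhereToWatchSkill(MycroftSkill):

        @intent_file_handler('where.can.i.watch.intent')
        def handle_where_to_watch(self, message):
            title = message.data.get('title')
            # Ask a (hypothetical) streaming-availability API where the title is.
            resp = requests.get('https://example.com/api/availability',
                                params={'q': title}).json()
            mine = self.settings.get('my_services', [])
            services = [s for s in resp.get('services', []) if s in mine]
            if services:
                self.speak('{} is available on {}'.format(title, ', '.join(services)))
            else:
                self.speak('I could not find {} on any of your services'.format(title))

    def create_skill():
        return WhereToWatchSkill()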
It was all really cool and I had a lot of fun playing with it. The biggest problem I ran into was wake-word recognition. It worked mostly OK for me on the ReSpeaker from close range, but I found it went downhill as I moved away. It was especially bad if the device was playing music, which is possibly the most common thing I was using my Google Home Mini for. I had hoped that the ReSpeaker would help with this, because it had the six-microphone array and some built-in loopback hardware to try to cancel out any noise the ReSpeaker itself was generating: any sound output to the speakers would be looped back into the ReSpeaker and could be subtracted from the microphones' input. I found that I just couldn't get it to work well, though. I think the music was causing vibrations that were overloading the microphone array, so it couldn't hear me over the music. It's possible it could be improved with a better hardware design that reduces vibration from the device's own speaker. Maybe it works better now, two years later. I think I had configured Mycroft to use Snowboy for wake-word recognition so I could name my Mycroft something else (Jarvis).
One day the Mycroft installation just stopped working on my device after I hadn't touched it in a week or more and I never went back to figure out what was wrong. It's still sitting on the corner of my desk unplugged. If I could have got the wake-word recognition working reliably with music playing I think I would have used it a lot, but I wasn't able to at the time.
I just recently bought a smartwatch with a built-in "Alexa" app that lets you send voice commands to your phone to be processed through the watch's official app. I'm instead using Gadgetbridge on Android to interface with the watch. Some kind hacker updated Gadgetbridge to add very basic support for my watch's microphone, allowing you to send the raw voice data to an external application. I'm hoping I'll be able to use this to revive my Mycroft instance: I'll just send voice commands to Mycroft from my watch/phone via a custom Android app/service. In theory, I'll be wearing the watch all the time anyway, and having the microphone on my person and right next to my face should help with the speech-to-text, and I won't have to worry about a wake word at all. I've only just barely started working on this, though.
I gave up on Mycroft after a long wait and built my own with a ReSpeaker and Picovoice. I have two of them with different wake words. IMO it's way better and easier than Snowboy. I don't understand why people give their data to Amazon to set a timer :)
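If anyone wants to try it, the wake-word part is only a few lines with Picovoice's pvporcupine and pvrecorder Python packages. A rough sketch, assuming a recent SDK (which needs a free access key) and a custom keyword file; the key and .ppn path are placeholders:

    # Rough sketch: listen for a custom wake word with Picovoice Porcupine.
    # The access key and the .ppn keyword file path are placeholders you supply.
    import pvporcupine
    from pvrecorder import PvRecorder

    porcupine = pvporcupine.create(
        access_key='YOUR_PICOVOICE_ACCESS_KEY',
        keyword_paths=['jarvis_custom.ppn'],  # custom wake word trained in their console
    )

    recorder = PvRecorder(device_index=-1, frame_length=porcupine.frame_length)
    recorder.start()
    try:
        while True:
            pcm = recorder.read()            # one frame of 16 kHz, 16-bit audio
            if porcupine.process(pcm) >= 0:  # keyword index, or -1 if nothing heard
                print('Wake word detected, hand off to speech-to-text here')
    finally:
        recorder.stop()
        porcupine.delete()
        recorder.delete()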
Are you using Picovoice as the assistant? Is it an entire solution for that? Or are you running a DIY Mycroft device with Picovoice as the wake-word detector? I'll have to check this out, but I've been trying to stick with open source technologies where I can. I don't trust that a free tier will remain free forever, but it may be worth testing out.
Google Assistant on your phone can accept text input. If you're on a relatively recent version of Android, you should be able to long-press the home button, then tap the keyboard icon in the popup. It works the same as a voice prompt.
A lot of assistant functionality is just getting data from the internet, which search engines already know how to present and format in a useful way.
If you need to go to a specific spot in the house to write some text that turns on a light, it seems easier to just walk to an actual light switch. For more general automation, there are visual block-based configurators for setting up triggers for smart appliances.
This is actually how Mycroft handles it, more or less.
The wakeword ("hey Mycroft") is done on-device, but everything you say after that is sent to a speech-to-text API. That text is then routed to the appropriate skill to handle. So when you're writing the skill you only worry about the content of that text
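Because the routing layer only ever sees text, you can also skip the microphone entirely and inject an utterance straight onto Mycroft's message bus (by default a WebSocket on port 8181 of the device). A rough sketch with the websocket-client package, assuming default bus settings and a made-up hostname:

    # Rough sketch: send a text "utterance" to Mycroft, no microphone involved.
    # Assumes the default message bus (WebSocket on port 8181, path /core);
    # 'mycroft.local' is a placeholder for your device's hostname.
    import json
    import websocket  # pip install websocket-client

    ws = websocket.create_connection('ws://mycroft.local:8181/core')
    ws.send(json.dumps({
        'type': 'recognizer_loop:utterance',  # same message real STT results produce
        'data': {'utterances': ['turn on the kitchen lights']},
        'context': {},
    }))
    ws.close()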