If you’re using the Home Assistant voice assistant mechanism (not Alexa/Google/etc.) how’s it working for you?
Given there’s a number of knobs that you can use, what do you use and what works well?
- Wake word model. There are the default models, plus custom ones
- Conversation agent and model
- Speech-to-text model (e.g. speech-to-phrase or Whisper)
- Text-to-speech model
The biggest challenge, in my experience, is finding hardware that hears you well, and that you can hear well.
With my PC's USB mic, when I'm sitting directly in front of it, everything works quite well. Once I start stepping away, things start to get funky.
> The biggest challenge, in my experience, is finding hardware that heats you well, and you can hear well.
Maybe a hairdryer?
Okay now answer: What gets wetter as it dries?
I don’t get it! What is it?
A towel!
Oh nice!
HA Voice Preview is cool, but it's a toy compared to Alexa. It's not loud enough and doesn't pick out the wake word well enough. Incredibly cool pipeline and ability to tweak it, though; once the hardware improves I'd love to replace all my Echos.
I agree that it's not production ready, and they know that too, hence the name. But in relation to your points: I plugged in an external speaker, since the built-in one really isn't that great at all.
For the wake word, at some point they did an update to add a sensitivity setting, so you can make it more sensitive. You could also try donating your voice to the training: https://ohf-voice.github.io/wake-word-collective/
But all in all you’re spot on with the challenges. I’d add a couple more.
With OpenAI I find it can outperform other voice assistants in certain areas. Without it, you run into weird issues: my wife always says "set timer 2 minutes" and it runs off to OpenAI to work out what that means. If you say "set a timer for 2 minutes" it understands immediately.
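That matches how rigid template matching behaves: a phrase either hits a local pattern exactly and gets handled instantly, or it falls through to the slow LLM path. A toy sketch (these patterns are made up, not HA's actual intent sentences):

```python
import re

# Toy local intent matcher with rigid, hypothetical templates.
TEMPLATES = [
    (re.compile(r"^set a timer for (\d+) minutes?$"), "timer.start"),
    (re.compile(r"^turn (on|off) the (.+)$"), "light.toggle"),
]

def handle(utterance: str) -> str:
    for pattern, intent in TEMPLATES:
        if pattern.match(utterance):
            return f"local: {intent}"   # fast path, no network round-trip
    return "fallback: sent to LLM"      # slow path, e.g. OpenAI

print(handle("set a timer for 2 minutes"))  # local: timer.start
print(handle("set timer 2 minutes"))        # fallback: sent to LLM
```

Dropping the filler words ("a", "for") is enough to miss the template, which is exactly the two-minute-timer behaviour described above.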
What I wish for is the ability to rewrite requests. Local voice recognition can't understand my accent, so I use the proxied Azure speech-to-text via Home Assistant Cloud, and it regularly thinks I'm saying "Cortana" (I'm NEVER saying Cortana!)
Oh, and I wish it could do streaming voice recognition instead of waiting for you to finish talking and then waiting for a pause before trying anything. My in-laws have a Google Home, and if you say something like "set a timer for 2 minutes" it responds immediately, because it was converting to text as it went and knew that nothing more was coming after a command like that. HAVP has perhaps a one-second delay between you finishing speaking and it replying, assuming it doesn't need another five seconds to go to OpenAI. And you have to be quiet in that one second, otherwise it thinks you're still talking (a problem in a busy room).
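The early-response behaviour described there can be sketched as early endpointing: decode partial audio as it arrives and commit as soon as the running transcript already forms a complete command, rather than waiting for silence. (Purely illustrative; this is not how any of these engines are actually wired.)

```python
# Toy early-endpointing: respond as soon as the running transcript
# matches a known complete command, instead of waiting for a pause.
COMPLETE_COMMANDS = {"set a timer for 2 minutes"}

def stream_transcribe(chunks):
    """chunks: iterable of recognised words arriving one at a time."""
    transcript = []
    for word in chunks:
        transcript.append(word)
        partial = " ".join(transcript)
        if partial in COMPLETE_COMMANDS:
            return partial, "committed early"   # respond immediately
    # end of audio (silence detected): fall back to the full transcript
    return " ".join(transcript), "waited for pause"

print(stream_transcribe("set a timer for 2 minutes".split()))
# ('set a timer for 2 minutes', 'committed early')
```

The batch approach pays the full pause-detection delay on every utterance; the streaming approach only pays it for open-ended sentences, which is why the Google Home feels instant on timer commands.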


