So I was finally able to get back to work on the sound recorder. The general gist is that this is meant to be an audio recorder with transcription and email capabilities, which is also ruthlessly simple to use. There’s one button. You push it, a recording says “recording” and it starts recording. You push the button again and it stops recording and says “stop”. It now runs the audio file through a speech recognition program, and then creates and sends an email with the transcription in the body and the wav file as an attachment. Relatives can figure out what to do with the content from there.
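For anyone building something similar, the email step described above can be sketched with Python's standard library. This is a minimal sketch, not the code from this project: the function names, subject line, fallback text, and SMTP details are all assumptions made for illustration.

```python
from email.message import EmailMessage
from pathlib import Path
import smtplib


def build_email(transcript, wav_path, sender, recipient):
    """Build a message with the transcription as the body and the WAV attached."""
    wav_path = Path(wav_path)
    msg = EmailMessage()
    msg["Subject"] = f"New recording: {wav_path.name}"  # hypothetical subject line
    msg["From"] = sender
    msg["To"] = recipient
    # Fall back to a placeholder so an empty transcription still sends the audio.
    msg.set_content(transcript or "(no transcription available)")
    msg.add_attachment(
        wav_path.read_bytes(),
        maintype="audio",
        subtype="wav",
        filename=wav_path.name,
    )
    return msg


def send_email(msg, host, port, username, password):
    """Send over SMTP with SSL; host, port, and credentials depend on your provider."""
    with smtplib.SMTP_SSL(host, port) as smtp:
        smtp.login(username, password)
        smtp.send_message(msg)
```

Keeping the message building separate from the sending makes it possible to check the email contents without touching the network.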
Progress so far:
I’d planned to follow step 3 of the Pi Spy tutorial but found that DeepSpeech was no longer supported(?) and hadn’t really been made with anything less than a Pi4 in mind (I’m using a 3b). Luckily, a bunch of other speech recognition options are available, and I settled on spchcat mostly because it was the first one I found that fit my use case.
If you’re going to install it on a Raspberry Pi, I very much recommend their issues page for getting through dependency hell, especially if you put a 64-bit OS on your Pi. (Remember to get the :armhf version of whatever library it needs.) PulseAudio also seems to help.
This is a pretty short post, I mostly just wanted to make my updated code available. It’s… not great. I’m not a programmer by trade, and I’m a strong believer in ‘finished not perfect’ even when I know what I’m doing. It seems to be functional, that’s about all I can promise. Maybe don’t let anyone shout bash commands around it. There’s also still no error catching around the length of the recording, or the transcription, though that at least doesn’t seem to cause any issues when it fails.
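If you want to tighten up those two weak spots, here is a rough sketch of one way to do it. It is not the code in the download. It assumes the recognizer is launched as a subprocess and that it accepts a WAV path as its only argument (which is how I understand spchcat to work, but check its README). The function names and the timeout value are made up for illustration.

```python
import subprocess
import wave


def wav_duration_seconds(path):
    """Return the length of a WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def transcribe(path, command=("spchcat",), timeout=300):
    """Run the recognizer on a WAV file and return its text, or "" on failure.

    The arguments are passed as a list and no shell is involved, so nothing
    in a file name or a transcription gets interpreted as a shell command.
    The default command is an assumption; swap in whatever tool you use.
    """
    try:
        result = subprocess.run(
            [*command, str(path)],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,
        )
    except (OSError, subprocess.SubprocessError):
        # Covers a missing program, a non-zero exit code, and a timeout.
        return ""
    return result.stdout.strip()
```

A recording that comes back with a duration near zero can be skipped before transcription, and an empty string from `transcribe` can be treated as "send the audio anyway, without text".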
This is definitely more of a jumping off point than a proper finished product, but hopefully it’ll be useful to someone who’s trying to make the same thing or something similar. Even if it’s not perfect, maybe it’ll save you from repeating some of the work I’ve done so far.
We’re going to do another trial run, see what her feedback is, and update from there.
The updated code is here: https://mega.nz/file/LQlz1BjQ#3R6E9_k1jfmjzFUcBXq_Qi3IGf46iuYtZ95fQlAO-HI
The transcription isn’t great - unfortunately, improving on one of the current big open source speech-to-text programs is a bit beyond my capabilities. To be fair, it’s not much worse than a handful of commercial products I’ve seen.
Oobabooga Textgen WebUI has Silero TTS built in. Messing with it, I wound up playing with their CLI from github.
They have STT too. It’s a simple Python script that seems lightweight to me (though I don’t have much experience). It’s not super accurate, but it may be an option. I saw someone mention somewhere that background and environmental noise is the majority of the issue, so just filtering the audio can be as effective as training in some cases. I never got that far into playing with it; I only got the example running and moved on. By the way, the license for Silero is noncommercial too. Nice project. Thanks for sharing.
Very cool! I’ll admit I did much less research than usual when I picked spchcat, and I wouldn’t be against trying a different STT tool. spchcat works quite easily, but it seems to be cutting off early. I’m not sure if that’s a product of the software, the limitations of the Pi 3B, or some configuration I missed.