These types of projects are driven by metrics, and teams have some kind of quota/goal that they need to reach by a certain date to keep the project on schedule. Bonuses or job security may be on the line here, and so you may see some desperate employees “going the extra mile” to reach their goals.
Relatedly, Alexa’s voice activation sensitivity is essentially a tunable number. It can be changed to be more sensitive, so that it activates more easily (e.g. maybe you say “Alex” instead of “Alexa”). The people who control this are likely on the team with that deadline, so the incentives are there to lower the threshold and collect more data by recording personal conversations “accidentally”. Maybe a bad update goes out that causes Alexa to activate randomly, and they quickly fix it after a few days, once they’ve collected all the non-Alexa personal conversations they need for their AI.
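To make the “tunable number” point concrete, here’s a toy sketch of wake detection as a single confidence threshold. This is purely illustrative, based on the comment’s framing; the function name, scores, and threshold values are all made up, not Amazon’s actual implementation.

```python
# Illustrative only: treating wake sensitivity as one tunable threshold
# on the detector's confidence score. All numbers here are invented.
def should_wake(confidence: float, threshold: float) -> bool:
    """Wake the device when the detector's confidence exceeds the threshold."""
    return confidence >= threshold

# Suppose a mumbled "Alex" scores around 0.6:
print(should_wake(0.6, threshold=0.75))  # False at a stricter setting
print(should_wake(0.6, threshold=0.5))   # True once the threshold is lowered
```

Lowering one number like this is all it would take to make the device trigger on near-misses far more often.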
That’s maybe a bit too deep into the paranoia/tinfoil hat spectrum for some, but history has shown that you can’t give big tech the benefit of the doubt. Especially when you see some of the documents from the Google trial, where executives discuss rolling back new features to improve arbitrary metrics in the short term so that they can get their bonuses for the quarter, even if it hurts consumers.
I think most people, me included, underestimate the scale of the operation. When you hear “company will use private data to do X”, you imagine what a reasonable person would do, like randomly sampling a few conversations here and there. In reality, they record everything permanently over months and years, far beyond what would be necessary to run the service.
It’s kind of crazy how we get this level of surveillance while still having software that will lose your data if you don’t hit Save often enough.
What’s fucked up is that if you try to regulate it and make these companies adopt data retention policies, it creates a giant moat around them where no newcomer has a chance to compete.
That’s because you are not enforcing data portability at the same time. Having studied and discussed the GDPR at length in tech circles, I became convinced that data portability is the ultimate right and the key to ensuring continued innovation.
The new Amazon AI is going to be remarkably foul-mouthed. Every time it screws up (and it screws up a lot) I have to curse at it to make it shut up so it can hear the command again.
Yeah, I realized these things are terrible about a year ago. So, I hacked them into computer speakers using some cheap amps and a 12 volt power supply.
I love being able to dictate a grocery list but god damn is she stupid.
Good luck asking for cream cheese and chive crackers without ending up with “cream cheese” as one item and “chive crackers” as another. Or worse, “peanut butter and honey crackers” as “peanut butter” and then “honey crackers”.
The problem is that Alexa isn’t actually parsing the meaning of the total phrase, she’s taking each individual word as it comes. With that context, she would just as easily interpret your phrasing as “thing with thing on the side”. You’d still get chive crackers, honey crackers, peanut butter, and cream cheese.
Edit: I thought about this a bit more, and it seems to me the only way Alexa could actually understand what you wanted is if you said “chive cream cheese crackers” or “peanut butter honey crackers”. You have to implicitly make it one item and not a potential combination of multiple items.
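The greedy, word-by-word splitting described above can be sketched in a few lines. This is a hypothetical illustration of the failure mode, not Amazon’s actual parser: it splits on the word “and” and treats each chunk as one item.

```python
# Hypothetical sketch of greedy item splitting: break the spoken phrase
# on "and", treating each chunk as a separate list item. This is why
# "cream cheese and chive crackers" becomes two items.
def split_items(phrase: str) -> list[str]:
    """Split a spoken grocery phrase into items at every 'and'."""
    items = []
    current = []
    for word in phrase.lower().split():
        if word == "and":
            if current:
                items.append(" ".join(current))
                current = []
        else:
            current.append(word)
    if current:
        items.append(" ".join(current))
    return items

print(split_items("cream cheese and chive crackers"))
# ['cream cheese', 'chive crackers']
print(split_items("chive cream cheese crackers"))
# ['chive cream cheese crackers']
```

Phrasing it as “chive cream cheese crackers” avoids the split entirely, which matches the edit above: you have to make it implicitly one item rather than a possible combination of several.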
So who thinks this conversation here on lemmy isn’t being used to train an AI? Maybe not right now but later?
Sure, the relatively small size of lemmy means it might not be scooped up and trained on, but the point still stands. Everything that’s publicly online is food for the big-corp AI builders. And while Alexa invading your home privacy is obviously a shitty thing, I’m not sure we’ve all thought through the new relationship between us, the internet, and the big AIs.
Well, I know I have no expectation of privacy here, but I’d rather open source LLMs train on my words along with the proprietary ones than have one company hoarding the information and selling it to others.
Alexa devices use an onboard DSP to detect the wakeword and maintain a rolling audio buffer. On a positive match, the DSP wakes the main CPU which combines the saved buffer and any following speech and uploads it to the cloud where Alexa lives so she can try to figure out what you meant.
No audio is uploaded without being triggered by a wakeword. Also, the “mute” button physically cuts power to the mic, and the indicator LED is hardwired to the power rail as a failsafe indicator.
Not going to get much out of me then; most of what Alexa hears is what’s on the TV or the music I listen to. If they want to train Alexa on that, they’re fucked.