Alexa, get me the articles (voice interfaces in academia)
Thinking about interfaces has led me down a path of all sorts of exciting/mildly terrifying ways of interacting with our devices — from voice-user interfaces going mainstream to another go-round at those smart glasses (much lighter, and using retinal projection this time) to drawing in the air (why not, Apple?).
New forms of interaction and augmented-reality storytelling are exciting, and I can certainly see applications for the academic realm. But I’ve been thinking a lot about voice-user interaction, as this seems to be approaching saturation in many aspects of our lives, and I’m wondering how it might operate in a scholarly environment.
Credit to Jill O’Neill, who has written an engaging consideration of applications, discussions, and potentials for voice-user interfaces in the scholarly realm. She details a few use-case scenarios: finding recent, authoritative biographies of Jane Austen; finding out whether your closest library has an item on the shelf right now (and whether it’s worth the drive, based on traffic).
Coming from an undergraduate-focused perspective, I can think of a few more:
- asking if there are any group study rooms available at 7 pm and making a booking
- finding out if [X] is open now (Archives, the Cafe, the Library, etc.)
- finding three books on the Red Brigades, seeing if they are available, and saving the locations
- grabbing five research articles on stereotype threat, to read later
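As a thought experiment, the four use cases above map naturally onto the intent-and-slot structure that voice platforms typically use. The sketch below is purely illustrative — the intent names, slot fields, and keyword matching are my own invention, not any real assistant’s API; a production system would use a trained natural-language-understanding model rather than keyword rules.

```python
import re
from dataclasses import dataclass

@dataclass
class Intent:
    name: str    # invented intent label, e.g. "BookStudyRoom"
    slots: dict  # extracted details, e.g. {"time": "7 pm"}

def parse_request(utterance: str) -> Intent:
    """Toy keyword router for the use cases above; illustration only."""
    text = utterance.lower()
    if "study room" in text:
        m = re.search(r"\b\d{1,2}\s*(?:am|pm)\b", text)
        return Intent("BookStudyRoom", {"time": m.group() if m else None})
    if "open" in text:
        return Intent("CheckHours", {})
    if "books on" in text or "articles on" in text:
        m = re.search(r"(?:books|articles) on (.+?)(?:,|$)", text)
        return Intent("FindSources", {"topic": m.group(1) if m else None})
    return Intent("Unknown", {})
```

Even a toy router like this makes the core problem visible: everything downstream depends on getting the intent and its slots right from a single spoken sentence.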
Now, I know there are a lot of barriers and areas of development needed for all of these to work (as O’Neill points out, we’re at a functional baseline, not contextual intelligence). But I’m not so much interested in the question of “how will it work?” as “will people want to use it?”
I think the answer is a resounding yes.
Consider a recent report on voice technology by J. Walter Thompson Innovation Group London and Mindshare Futures, which states:
“People want voice assistants to show greater understanding, be able to initiate conversations, and preemptively solve problems” (16).
The report found that 60% of smartphone users agreed with the statement, “if voice assistants could understand me properly and speak back to me as a human can, I’d use them all the time” (16).
Other reports agree, showing that a majority of people are comfortable with AI replacing human interaction for answering queries.
Why all this appetite for voice?
We are biased to seek cognitive ease (aka, the path of least resistance), and there’s evidence to suggest that voice interactions may be less cognitively taxing than text-based ones, especially when receiving information.
Also keep in mind our propensity for multitasking (or attempting to) — having our hands and eyes free at least provides the illusion that we’re available to do other things. In the busy life of a student, just-in-time information that feels easier to process is a godsend.
Finally, think about student perceptions towards the “getting” of information (especially in those last two cases). Searching for material is functional, not reflective, and efficiency and productivity are key (get the thing(s) in the least amount of time). Students grab things intending to read them now, or later — the act of getting isn’t necessarily connected to the act of thinking through the topic or question.
AI voice assistants are also particularly attractive because of their always-on availability, and the lack of anxiety that might be associated with approaching another human (even as users of these devices tend to anthropomorphize them, and the devices are being taught to recognize emotions).
Now, of course, the system has to work — it has to deliver appropriate, relevant results in a variety of contexts, and do this well enough that students trust the system’s choices (rather than falling victim to FOMO). This gets tricky because the system will need to understand, at the very least:
- what the student is actually asking for
- what they consider “relevant,” and “appropriate” (embedded assumptions about scope, complexity, the actual relationships between concepts mentioned, intended use, etc. all tangled up in here)
- when and how they want it (format)
Anyone who’s done reference interviews can tell you how tricky that is for a fellow human to tease out, let alone a program.
To be most satisfying to undergraduate users, the system will probably have to prioritize results that contain the query keywords in the title, and favor overview-level works over highly specialized ones. Some back-and-forth will also be necessary to clarify ambiguous requests. The interaction will probably end up more human-like than not.
(note: I’m imagining a combination voice-screen system here — the system grabs results and displays them for evaluation and/or later reading, rather than reading entire articles aloud)
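To make the prioritization idea concrete, here is one way such a system *might* score candidate results — a minimal sketch, assuming a simple additive heuristic of my own devising (the weights and the `is_overview` flag are invented for illustration, not drawn from any real discovery system):

```python
def score_result(title: str, query_terms: list[str], is_overview: bool) -> float:
    """Hypothetical heuristic: reward query terms appearing in the title,
    and give overview-level works a flat boost."""
    title_words = title.lower().split()
    term_hits = sum(1 for t in query_terms if t.lower() in title_words)
    score = term_hits / max(len(query_terms), 1)  # fraction of terms in title
    if is_overview:
        score += 0.5  # arbitrary boost for introductory/overview material
    return score

# Toy candidates for the "Red Brigades" request: (title, is_overview)
results = [
    ("The Red Brigades and the Discourse of Violence", False),
    ("Italy: A Modern History", True),
    ("Terrorism in Italy: An Overview of the Red Brigades", True),
]
ranked = sorted(results,
                key=lambda r: score_result(r[0], ["red", "brigades"], r[1]),
                reverse=True)
```

The hard part, of course, is everything this sketch assumes away: knowing which works count as “overview-level,” and whether title keywords actually track what the student meant.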
The frustrations that users have with using formulaic commands for Alexa’s Skills sound an awful lot like the frustrations students have with databases that require formulaic keyword combinations.
I just want you to understand me.
But machine learning continues apace, and contextual cues (information fed into the system about where you are and what you’re doing) can provide a nice boost to understanding.
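One way to picture what a contextual cue buys you: the same utterance can resolve differently depending on where the student happens to be standing. This is a hypothetical sketch — the function, the location labels, and the branch mapping are all invented for illustration:

```python
def resolve_facility(utterance: str, context: dict) -> str:
    """Resolve an ambiguous facility reference using location context.
    The location-to-branch mapping below is entirely made up."""
    if "the library" in utterance.lower():
        nearest = {"science quad": "Science Library",
                   "main quad": "Main Library"}
        # Fall back to the main branch when no location context is available.
        return nearest.get(context.get("location"), "Main Library")
    return "Unknown"

# "Is the library open?" means different buildings in different places:
resolve_facility("Is the library open?", {"location": "science quad"})
```

The point is simply that context turns an unanswerable question into an answerable one — the system never had to ask “which library?”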
So, as to whether people will use voice-user interfaces in the academic realm, I think yes. They’ll be using them in every other aspect of their lives, and those expectations will cross over. With software eating the world, we’re turning increasingly to machines to do our thinking for us. We’re already seeing applications of AI teaching assistants and voice-enabled dorms.
In terms of how we can keep up, things get trickier. If we want voice interfaces to work well with our systems, we’ll also have to invest in machine-learning approaches ourselves.
I completely agree with Chris Bourg’s statement:
“I think it will be crucial that we avoid the temptation to continue to serve primarily individual human readers and let the computer scientists worry about how to apply machine learning and AI to vast libraries of resources.”
If we can’t make the case for in-house development, we run the risk of outsourcing this work to companies that do not necessarily share our commitment to privacy, equity, and intellectual freedom — a situation we’re already in.