Imagine shopping for groceries, cooking and watering your garden all at the same time. With voice search and voice command technology, this is entirely possible today.
Voice search is simply searching the web by speaking to a device instead of typing. You may have already used this technology on your Android phone while searching Google, or in the Google keyboard to convert speech to text in WhatsApp. In fact, Location World estimates that 2 in 5 adults use voice search once a day.
Here are some statistics on voice search to get you up to date on how people are talking (or will be talking) to devices:
It is clear from the statistics that voice search is gaining momentum and will be at its peak by 2020. Sales of smart speakers like Amazon Echo and Google Home are skyrocketing. The Echo Dot was the best-selling product on Amazon for the 2018 holiday season.
Even without the statistics, it is easy to anticipate that voice search will grow. We humans can speak three times faster than we can write. Voice also lets people search while they are driving, cooking or doing household chores.
This is great for millennials, who are hard-pressed for time and like to multitask. It is also extremely convenient for older generations, who may prefer speaking to typing. For example, it would be much easier for senior citizens to tell a virtual assistant to order medicine than to go out and get it.
Although only 10% of baby boomers use voice search compared to 35% of millennials, the habit will soon catch on with every generation because voice is a more natural communication channel.
This does not mean, however, that text search will fade away any time soon. Voice search has some limitations, because of which visual results will always have their own share.
Voice search always gives only one best answer. But users are sometimes looking for options rather than a single result. Consider shopping for shoes on Amazon: you would definitely like to see a few options before telling Alexa to place an order.
As of now, voice search results seem best suited to day-to-day transactional and informational searches. Until there is a way to overcome this limitation, users will prefer a screen when shopping for products and services.
What about ‘How to’ searches? Some information is best absorbed visually rather than auditorily. You would like to see how to bake a cake rather than hear it. This is why Google has started ranking videos and images over text for ‘How to’ searches wherever possible.
Surfing the internet is an experience in itself. Some people do this to learn about interesting things or pass their time. It is an interesting place where you can stumble upon something unexpected and new. This again cannot be replaced by voice results.
For the purpose of research and in-depth learning too, text is better. You can re-read the article at your own pace while making mental models of what is being read.
Nobody will be comfortable with Google announcing their search results to the whole house every time. This limitation can be overcome by smart earphones, but until then people will prefer visual results on their private screens.
The groundbreaking part of voice search is the speech recognition technology. The ability to search is nothing new; we have been searching the internet for many years. Voice search is powered by the same search engine, with one extra step: the voice is first converted into text.
The interesting part is that machines can now talk and understand like humans. Human communication is unique on this planet and there are many layers to our communication.
There are so many dialects, nuances and intricacies to our language that we are surprised when a machine answers us correctly just like a human.
Just the tone of ‘everything is fine’ said by a woman can give you an idea that nothing is fine, even though she is saying completely the opposite.
For machines to learn language, the fields of computing, linguistics, mathematics and engineering have had to come together. Machines must pick up the right sound, convert it into bits and bytes and recognize it as meaningful data. This is done through several processes, such as:
Language modelling – the likelihood of certain words following others is used to improve accuracy
Artificial neural networks – network models that can recognize sound patterns after extensive training
Sound matching – simple matching of words based on a memory of sounds
Pattern and feature analysis – each word is broken down into bits and recognized from its key features, such as vowels or syllables
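To make the language modelling idea concrete, here is a minimal Python sketch. It assumes a tiny made-up corpus and a simple bigram model with add-one smoothing, which is far simpler than what real speech recognizers use. The names train_bigrams, sentence_score and pick_transcript are hypothetical helpers invented for this example. Given two similar-sounding transcripts, it prefers the one whose word order is more likely.

```python
from collections import Counter, defaultdict

# Tiny made-up training corpus (an assumption, for illustration only).
CORPUS = [
    "recognize speech with a phone",
    "we recognize speech every day",
    "machines recognize speech patterns",
    "a nice beach in summer",
]


def train_bigrams(sentences):
    """Count how often each word follows another ("<s>" marks sentence start)."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        for prev, word in zip(words, words[1:]):
            follows[prev][word] += 1
    return follows


def sentence_score(follows, sentence, vocab_size):
    """Probability of a sentence under the bigram model, with add-one smoothing."""
    words = ["<s>"] + sentence.split()
    score = 1.0
    for prev, word in zip(words, words[1:]):
        count = follows[prev][word]
        total = sum(follows[prev].values())
        score *= (count + 1) / (total + vocab_size)
    return score


def pick_transcript(follows, candidates, vocab_size):
    """Return the candidate transcript the language model finds most likely."""
    return max(candidates, key=lambda s: sentence_score(follows, s, vocab_size))


follows = train_bigrams(CORPUS)
vocab = {word for sentence in CORPUS for word in sentence.split()}

# Two transcripts that sound alike when spoken aloud.
candidates = ["recognize speech", "wreck a nice beach"]
best = pick_transcript(follows, candidates, len(vocab))
print(best)
```

On this toy corpus the sketch prefers "recognize speech", because those words follow each other in the training sentences while "wreck" never appears at all. Scores are not normalized for length here, so longer candidates are penalized; a real system would have to handle that.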
IBM was probably one of the first players to jump into voice recognition. It showcased Shoebox, a machine that understood up to 16 words, at the 1962 Seattle World’s Fair.
Voice search technology had a breakthrough in 1970 when Lenny Baum invented the Hidden Markov Modelling (HMM) approach to speech recognition. IBM went on to use this model to recognize a larger number of words.
Microsoft and IBM continued to launch software with voice recognition technology but most of us probably remember Google’s voice search app for iPhone and then Apple’s virtual assistant Siri in 2011.
Today Amazon, Google, IBM, Microsoft and Apple are major players in voice search. Each has its own advantages to play upon. Google has undoubtedly the best search results for Google Assistant while Amazon is a master of customer behaviour and data.
All are competing fiercely to be the leader in the speech and voice recognition market, which is estimated to be worth $18.03 billion by 2023. Microsoft is collecting huge amounts of data to teach its AI.
It has rented apartments in New York to pick up on the sounds and dialects of languages. The idea is to be able to distinguish train noise, traffic and sirens from the human voice.
Currently China’s search engine, Baidu, is at the top with 96% word accuracy. Google Now, Apple’s Siri and Microsoft Cortana are close at 95% accuracy.
Amazon Alexa is almost on the same mark too, and it has over 30,000 apps or ‘skills’, with 5,000 new skills added every 100 days.
Industry experts are eyeing a 99% accuracy rate and say that use of voice search will explode at this level. With the current level of research and money going into voice recognition technology, that time is not far off.
Voice search has little to no scope for advertising right now. Even in the future, customers will likely be annoyed if they hear an ad before getting the result. A visual screen allows marketers to place ads in various spaces (banner ads, sidebars, affiliate links, etc.), but the voice medium will not have any such option.
Brands will have to step up their content marketing to be the most helpful answer users are looking for.
Mobile voice searches are three times more likely to be local than text searches. Hound, a voice query app, noted that more than 22% of the information searched for fell in the local search category (local restaurants, services, pet shops, ATMs, etc.). This is an opportunity for brick-and-mortar stores to grab.
Those that optimise for voice search and integrate their delivery channel with online ordering will be able to profit from this technology. We will share more on local voice search optimization in a section below.
Human conversations are very contextual, and our voice searches are going to be the same. For example, the questions ‘Which is the best speaker?’ and ‘How much is it for?’ are meaningful only when considered together. Without the context of the first question, the second is meaningless.
Virtual assistants also consider data on which apps the user is using and what their browsing history is. For this reason, businesses will have to shift their focus from keywords to customer context. Reaching the customer at the right time with the right message on the right device will be of the essence.
To be found in voice searches, companies will have to understand voice search optimization. Let us look at what it is and how brands can best optimize for voice.
Voice search optimization is the set of practices that make your content appear in voice search results. You must be familiar with keyword targeting for SEO, but how do you optimize for voice search?
Honestly, it is too early to answer this question. We do not know how virtual assistants are picking results or how they will decide which result to voice in the future. Google, Apple and IBM are probably still figuring it out themselves.
Google’s featured snippet and answer box are the search engine’s way of suggesting one best answer to users in text search. Google Assistant probably uses a similar algorithm to pick a result for voice searches.
For now all we can do is study how people are searching by voice and what results Alexa and Siri are voicing.
A survey by MindMeld found that 40% of users had started using voice in the past six months. But what exactly are people searching for?
Voice searches can be broadly categorised into four buckets which are –
Transactional Searches – “Alexa, buy 2 pizza bases”
Informational Searches – “Google, who was the actress in Happy Days?”
Navigational Searches – “Show me a comparison of the best phones on Amazon”. These searches direct you to other apps.
Voice searches also tend to be phrased as questions. We are more likely to ask “How to bake a cake” than say “cake recipe”. Accordingly, most voice searches start with ‘How to’, ‘When was’, ‘which’, ‘what’, ‘who’ and ‘why’.
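A site owner could use this observation to sort a search query log. The Python sketch below is only an illustration: the list of starters comes from the paragraph above, while the function names and sample queries are assumptions made up for the example.

```python
# Question starters taken from the paragraph above.
QUESTION_STARTERS = ("how to", "when was", "which", "what", "who", "why")


def is_question_phrase(query: str) -> bool:
    """Return True if the query begins with one of the question starters."""
    # Lowercase and collapse whitespace so "  WHAT  is" matches "what".
    normalized = " ".join(query.lower().split())
    return any(
        normalized == starter or normalized.startswith(starter + " ")
        for starter in QUESTION_STARTERS
    )


def split_queries(queries):
    """Separate question-style queries from keyword-style ones."""
    questions = [q for q in queries if is_question_phrase(q)]
    keywords = [q for q in queries if not is_question_phrase(q)]
    return questions, keywords


# Hypothetical sample of logged queries.
sample = ["How to bake a cake", "cake recipe", "Who was the actress in Happy Days?"]
questions, keywords = split_queries(sample)
print(questions)
print(keywords)
```

The check matches whole words only, so a query such as "whoever wins" is not counted as a "who" question. The question-style list is a reasonable starting point for the topics your content should answer directly.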
Voice searches are also longer, with more words and long-tail keywords. This will help marketers go niche in their content and target specific user needs.
Meditative spotlight found that 89% of people search for a local business on their smartphone once a week. This means that local business optimization will work when it comes to voice search. After years of being dominated by big corporations, local businesses will finally have a level playing field to reach their customers.
Since users are looking for quick solutions around them while multitasking, mobile voice search is three times more likely to be local than text search.
So far, the study of voice searches shows that they are long-tailed, specific, local and dominated by question phrases. Let us now look at what you can do as a business to optimize your site and content for voice search.
Keep in mind that virtual assistants give only one best answer in voice search. So it is all about being the relevant answer in different contexts instead of ranking for keywords. Search engines are expected to consider customer data before suggesting a result. So the result will depend not just on helpful content but also on the user’s location, history and the apps they are currently using.
Until now, everybody has told you that long-form blogs do well. That is about to change with voice search. For transactional and informational queries, users will want to listen to a one-line answer. For everything they want to learn about, they will prefer visual information in the form of videos or pictures.
Ranking in voice search will be about giving one-line answers specific to the demographics of the users. It will not be surprising if Google takes into account your business location, customer demographics and site visitor profiles before selecting your answer.
Your optimization for voice search will depend on how well you can understand the user’s intent and answer it. Use tools like AnswerThePublic to come up with common questions that users may ask about your topic.
Think of all the possible questions about your industry, product or services and answer them directly. This will also make it more likely for you to reach position 0, the featured snippet, in Google.
Even if you do write a long-form blog, make sure that the answer is at the beginning of the post. You can also mark the main answer in your post so that Google finds it easier to pick. This brings us to the next point.
Along with the content on your page, Google also reads your structured data, or schema markup, to understand the relevancy of the content. These are tags that go into your site’s HTML source code.
Big search engines like Google, Yahoo, Bing and Yandex have come together to accept a common language of mark-up data that tells them the relevancy of content.
Because of schema mark-up, Google is able to display search result snippets with product images, reviews and descriptions all neatly arranged.
Though schema markup does not affect your search ranking directly (according to Google), having your information displayed conveniently does have a huge impact on your click-through rate.
The information on opening hours, directions and reviews that appears when you search for a local business is also there because of schema markup. This is important for voice search because when users find you, they want to know the directions to the store, its opening hours and so on.
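As a rough illustration of what such markup can look like, the Python sketch below builds a JSON-LD snippet using the schema.org LocalBusiness type with its name, telephone, address and openingHours properties. The business details and the helper name local_business_jsonld are made up for the example, and whether a given search engine or assistant actually uses these fields for voice results is an assumption, not something this sketch can confirm.

```python
import json


def local_business_jsonld(name, street, city, postal_code, phone, opening_hours):
    """Build a schema.org LocalBusiness description wrapped in a script tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        "name": name,
        "telephone": phone,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": city,
            "postalCode": postal_code,
        },
        # schema.org format: two-letter day codes followed by a time range.
        "openingHours": opening_hours,
    }
    body = json.dumps(data, indent=2)
    # Escape "</" so a value can never close the script tag early.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Hypothetical business details, for illustration only.
snippet = local_business_jsonld(
    name="Example Bakery",
    street="1 Main Street",
    city="Springfield",
    postal_code="00000",
    phone="+1-555-0100",
    opening_hours=["Mo-Fr 08:00-18:00", "Sa 09:00-14:00"],
)
print(snippet)
```

The returned string is a script tag that can be placed in the page’s HTML, where search engines can read the opening hours and address without parsing the visible text.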
Listing your business on Google My Business is also a great way to be found in local searches. Additionally, optimize your content for your area, for ‘near me’ searches and for famous landmarks near your store.
As mentioned above, the majority of voice searches are questions that start with how, who, what, when and why. Optimizing for voice search therefore means answering these questions in the most helpful way.
Start with studying all the questions your customers have. You may have already done this for your FAQ section. The next step is to turn these questions into blog posts so that search engines can easily find answers. You could also repurpose your existing content to answer voice search queries better.
Google has more than a 90% share of text search. So optimizing for search engines essentially means optimising for Google. However, this is not the case with voice search.
Apple, Amazon and IBM have started on level ground with Google this time, and some are even ahead of Google in certain types of searches.
Therefore, it is important to understand how Alexa, Siri and Cortana handle voice searches. For example, Alexa only buys products that are marked with the ‘Amazon’s Choice’ badge.
Not much information is available on how other search engines work, but as voice search on these platforms rises, experts will analyse what gets a result heard in voice search.
Before we close the topic, there are some fears regarding voice search technology that thinkers are raising. Although equally dangerous, these fears are not receiving as much attention as issues in Artificial Intelligence or Data Privacy.
The ethical question is this: are we putting our decision-making power in the hands of big corporations? Are we being robbed of options in the name of convenience?
The idea of Alexa buying groceries for you is not comforting to everyone. It is possible for big corporations to push their own agenda (of profit) by recommending a sugar-loaded cereal that compromises your health. This technology is perfect for masking the knowledge we need to hear and feeding us only the information that conditions our minds.
Then there is also the issue of data privacy. To optimize voice search for you, voice assistants will have complete access to what you say, do and buy on all devices. The assistant will be an ever-present character in your life that learns more about you every day.
Voice search is going to disrupt how brands have been attracting leads and telling stories. This is not just about ranking in voice search but about changing your customer experience altogether. Users are going to expect 100% accuracy from voice recognition software and ready answers to all their queries.
This does not mean, however, that visual content is going to go away. Visuals will always be important, considering that they are the most important medium for us to learn new things and connect with others.
Visual media will probably be dominated by videos and images though. Voice search fits in perfectly where the need is informational or the command is transactional.
The voice search market is still at a nascent stage, with big players such as Google, Apple, Microsoft, IBM and Amazon trying to dominate. Only players who perfect their voice recognition and learn from huge amounts of data will become leaders.
As far as the impact on businesses is concerned, the change is going to be significant. From ad placement to content creation, brands will have to rethink everything they are doing now.
This is not necessarily a bad thing. In fact, localization of search and contextual content will level the playing field between large corporations and local businesses.
To stay relevant and be found by customers, brands must give out the right message at the right time on the right device. This involves answering relevant questions specifically using long tail keywords. Schema markup and local search optimization are also important for voice search optimization.