Voice AI as a Sustainability enabler for QSR / Drive-Thru / Drive-Ins & Restaurants

The QSR Drive-Thru has become the de facto option for urban dwellers looking to grab a quick bite or a drink. The recent pandemic made that need more prominent, as customers sought less exposure to others and a fully contactless experience.

When you think of a QSR drive-thru, the first things that come to mind are:

  • Ease of use and the convenience of grabbing something on the go
  • Hassle-free access to specific choices on the Drive-Thru menu, ordered by voice
  • In-Out in a few moments. As per QSR Magazine, the average “Speed of Service” at QSR Drive-Thru at the top Restaurant chains are 

Do customers driving into a QSR Drive-Thru ever think about:

  • The environmental impact, or carbon footprint, of using a drive-thru?
  • The GHGs their idling vehicles emit the whole time they wait? Removing greenhouse gases has a cost, and someone has to bear it: the individuals, the QSR chains or someone else.
  • The constant back-and-forth between the voice input at the digital menu and the staff member at the other end? More often than not, external interference and noise mean extra time is spent confirming the customer’s choices and preferences. That extra time means more electricity to run the facility and more exhaust emissions from the waiting vehicles.

As per the Natural Resources Defense Council (NRDC):

Passenger cars consume between 0.03 and 0.07 gallons of gasoline for every 10 minutes of idling. A driver who sits in traffic for 42 hours per year, therefore, wastes approximately 12.6 gallons of gas and emits 247 pounds of carbon dioxide equivalents. That doesn’t sound like a lot, but there are more than 200 million licensed drivers in the United States. Do the math, and traffic could be responsible for up to 25 million tons of carbon dioxide emissions.
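The arithmetic behind the NRDC figures can be checked with a short sketch. The input numbers are the ones quoted above; the use of the midpoint idling rate (0.05 gallons per 10 minutes) and of 2,000 lb US short tons are assumptions made here to reproduce the quoted totals, not something the passage states.

```python
# Figures quoted in the NRDC passage above.
GALLONS_PER_10_MIN_LOW = 0.03
GALLONS_PER_10_MIN_HIGH = 0.07
IDLE_HOURS_PER_YEAR = 42
CO2E_LB_PER_DRIVER = 247
LICENSED_DRIVERS = 200_000_000

# Assumptions (not stated in the passage): the midpoint idling rate is used,
# and "tons" means US short tons of 2,000 lb.
LB_PER_SHORT_TON = 2_000


def gallons_wasted(idle_hours: float, gallons_per_10_min: float) -> float:
    """Fuel burned while idling for idle_hours at the given rate."""
    ten_minute_blocks = idle_hours * 60 / 10
    return ten_minute_blocks * gallons_per_10_min


midpoint_rate = (GALLONS_PER_10_MIN_LOW + GALLONS_PER_10_MIN_HIGH) / 2
gallons_per_driver = gallons_wasted(IDLE_HOURS_PER_YEAR, midpoint_rate)
implied_lb_per_gallon = CO2E_LB_PER_DRIVER / gallons_per_driver
national_tons = LICENSED_DRIVERS * CO2E_LB_PER_DRIVER / LB_PER_SHORT_TON

print(f"Gallons wasted per driver: {gallons_per_driver:.1f}")
print(f"Implied lb CO2e per gallon: {implied_lb_per_gallon:.1f}")
print(f"National total: {national_tons / 1e6:.1f} million tons")
```

Under these assumptions the sketch gives about 12.6 gallons per driver and about 24.7 million tons nationally, in line with the "up to 25 million tons" in the passage.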

QSR Drive-Thru – Voice AI Automation to the rescue

Thanks to advances in algorithms, Speech Recognition and Natural Language Understanding have matured significantly in the last few years. These algorithms can now:

  • Understand voice input clearly in spite of external noise, using noise cancellation algorithms
  • Interpret the request and respond with follow-up information, so the conversation with the customer feels more natural
  • Generate synthetic, lifelike voices that make users comfortable interacting with the system
  • Assess when they cannot understand a request or act on it in time, and escalate to a human

These algorithms have allowed enterprises to address labor shortages and provide round-the-clock availability while still supporting both peak and non-peak business hours. However, the fundamental question remains.

Does Voice AI provide business results while still addressing the Sustainability needs of the QSR Chain?

As per the World Economic Forum and a recent article in The Register, the carbon emissions from training OpenAI’s GPT-3 are equivalent to driving a car to the moon and back.

All AI applications require lots of data to train the model. The environmental impact of AI applications starts long before they reach their end-state production environments: effort goes into collecting data, pre-processing it, labeling and annotating it, training the model, continuously monitoring model performance and building a feedback loop that uses what is learned to further improve the accuracy of the models.

In addition, computing, storage, network and other infrastructure all add to the overall carbon footprint.

The same applies to voice-based AI applications. They require huge amounts of data to be labelled, annotated and pre-processed so that the application can recognize voice input. Their data needs also differ somewhat, because voice-based applications have to cover additional dimensions.

A variety of other challenges add to the mix:

  • Accents
  • Dialects
  • External noise
  • Multiple people speaking at the same time
  • People talking to their partners or family members while deciding what to order at the QSR drive-thru
  • A cough, a sneeze, or an ambulance or fire truck passing by

In short, a plethora of external variables can affect a Voice AI’s efficacy and accuracy.

Needless to say, whenever computing infrastructure is used to train models, it consumes significant electricity, and still more energy goes into keeping that infrastructure at its recommended operating conditions. This raises several questions:

  • Where does the energy that runs the data center come from? Is it from renewable sources?
  • What is used for cooling? Is it electricity alone, or also approaches such as natural air cooling or placing the data center underground?
  • Is the model trained on GPU or CPU servers or PCs?

How can a QSR Drive-Thru chain adopt a more accessible, inexpensive Voice AI solution that addresses Sustainability and environmental impact and still delivers business results?

As discussed earlier, Voice AI requires good training data, and a typical QSR menu, with its variety of items and variants such as toppings and add-ons, is quite challenging to learn from limited data.

However, recent advancements in edge-based machine learning models, with the right optimization, could offer the best of both worlds: AI capabilities that are also environmentally sustainable.

There are several examples of edge-based machine learning models with a lower footprint (in compute, training and scalability) in consumer electronics and appliances. Yet few technology providers, whether startups, vendors or IT services firms, have found a middle ground that reduces the model’s training needs, reduces its compute needs, and yields a smaller model that still lets a QSR chain apply Voice AI to manage Drive-Thru traffic effectively, possibly across multiple lanes.

A typical drive-thru has several vehicles in line during peak hours, and other external variables can affect the Voice AI’s ability to detect the precise order or intent from the customer’s conversation.

Zero-shot and low-shot learning, which work with limited data, together with more advanced algorithms that support domain adaptation, make it possible to have a smaller environmental footprint and still achieve everything a QSR Drive-Thru demands.


The next time you drive to your favorite QSR Drive-Thru chain to pick up your favorite sandwich or smoothie, the experience could look like this:

  • You stop at the QSR digital menu board
  • You take a moment to add all the extras
  • You check what the family needs and voice the order
  • A Voice AI listens continuously for voice orders
  • The streamed audio is sent to an edge device
  • The edge device applies a Speech-to-Text algorithm
  • It validates the text and runs it against the model
  • It deduces the items and the supplemental information
  • It generates a voice message of the order and asks for the customer’s confirmation
  • The order is entered in the order management system
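The flow in the steps above can be sketched roughly as follows. Everything here is illustrative: the menu items, the add-ons and the function names are hypothetical, the speech-to-text step is a placeholder that simply receives a transcript, and a plain keyword match stands in for the trained model. Quantities are ignored to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical menu and add-ons, for illustration only.
MENU_ITEMS = {"sandwich", "smoothie", "fries"}
EXTRAS = {"cheese", "bacon", "avocado"}


@dataclass
class Order:
    items: List[str] = field(default_factory=list)
    extras: List[str] = field(default_factory=list)


def speech_to_text(audio: str) -> str:
    """Placeholder for the on-device speech-to-text step.

    In this sketch the 'audio' is already a transcript string."""
    return audio.lower().strip()


def parse_order(transcript: str) -> Order:
    """Keyword match standing in for the model that deduces the order."""
    order = Order()
    for raw in transcript.split():
        word = raw.strip(".,!?")
        singular = word[:-1] if word.endswith("s") else word
        if word in MENU_ITEMS:
            order.items.append(word)
        elif singular in MENU_ITEMS:
            order.items.append(singular)
        elif word in EXTRAS:
            order.extras.append(word)
    return order


def confirmation_prompt(order: Order) -> str:
    """Text that a text-to-speech step would read back to the customer."""
    text = "You ordered: " + ", ".join(order.items)
    if order.extras:
        text += " with " + ", ".join(order.extras)
    return text + ". Is that correct?"


def handle_utterance(audio: str) -> dict:
    """Run one utterance through the pipeline, escalating if nothing matched."""
    transcript = speech_to_text(audio)
    order = parse_order(transcript)
    if not order.items:
        return {"action": "escalate_to_human", "transcript": transcript}
    return {"action": "confirm", "prompt": confirmation_prompt(order), "order": order}


result = handle_utterance("One sandwich with bacon and two smoothies, please.")
print(result["action"], "|", result.get("prompt"))
```

An utterance with no recognizable menu item, such as a side conversation inside the car, returns the `escalate_to_human` action instead of a confirmation prompt, mirroring the human fallback described earlier.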

The whole ordering flow happens on-site at the QSR, on a very small-footprint device that uses less electricity, less computing and less network back-and-forth, running a nimble model that needs few resources to complete its end-to-end processing.

Even if only 10% of the 200 million drivers idle this way at QSR Drive-Thrus, that adds up to about 2.5 million tons of CO2 emissions, taking a tenth of the NRDC estimate of up to 25 million tons. Depending on state-specific regulatory requirements, the QSR chain may bear a cost to offset this.

The process becomes even more complex when it involves identifying each vehicle’s make and model, measuring precisely how long that car took to enter and exit the QSR drive-thru, calculating its emissions and then computing the offset requirement per vehicle.
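A minimal sketch of such a per-vehicle calculation, assuming entry and exit timestamps are already available for each car. The `Visit` record and function names are hypothetical. The idling range is the NRDC figure quoted earlier, and the CO2e-per-gallon factor is the one implied by that same passage (247 lb over 12.6 gallons). Make and model are not used here; a real system would need a per-model idling rate, which this sketch does not attempt to supply.

```python
from dataclasses import dataclass
from typing import Tuple

# Idling fuel use per 10 minutes (low, high), from the NRDC passage quoted earlier.
GALLONS_PER_10_MIN = (0.03, 0.07)
# Assumption: CO2e per gallon implied by the same passage (247 lb / 12.6 gal).
LB_CO2E_PER_GALLON = 247 / 12.6


@dataclass
class Visit:
    """Hypothetical record of one car's pass through the drive-thru lane."""
    vehicle_id: str
    enter_s: float  # entry time in seconds
    exit_s: float   # exit time in seconds


def idle_emissions_lb(visit: Visit, gallons_per_10_min: float) -> float:
    """Estimated CO2e in pounds, treating the whole visit as idling time."""
    if visit.exit_s < visit.enter_s:
        raise ValueError("exit time precedes entry time")
    idle_minutes = (visit.exit_s - visit.enter_s) / 60
    gallons = idle_minutes / 10 * gallons_per_10_min
    return gallons * LB_CO2E_PER_GALLON


def emission_range_lb(visit: Visit) -> Tuple[float, float]:
    """Low and high estimates using the two ends of the idling range."""
    low_rate, high_rate = GALLONS_PER_10_MIN
    return idle_emissions_lb(visit, low_rate), idle_emissions_lb(visit, high_rate)


visit = Visit("car-001", enter_s=0.0, exit_s=300.0)  # a five-minute visit
low, high = emission_range_lb(visit)
print(f"{visit.vehicle_id}: {low:.2f} to {high:.2f} lb CO2e")
```

Because the estimate scales linearly with time in the lane, any seconds saved per order reduce the per-vehicle figure proportionally.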

A lean, on-site Voice AI that shortens ordering time, and with it idling time, is therefore a WIN-WIN for both the QSR drive-thru chain and its customers.