
Data that Knows Humans: Google Cloud APIs for Face, Voice, and Video Detection

Digital transformation is so 2015! Companies are now thinking about how artificial intelligence (AI) can transform the way they run their businesses, and many are already including it in their wider data analytics strategy. As a certified Google Analytics Cloud Platform partner, our team was present at the 2017 Google NEXT conference, where there were many productive discussions around developments in the AI field. In this blog post, I am excited to introduce you to three of my favorite AI-powered APIs, which can provide us as digital analysts with new opportunities for insight and improved customer experience.

As digital analysts, we are stewards of:

  • clean data collection mechanisms
  • efficient data governance practices
  • demonstrating the impact of data on the business bottom line

We have come a long way through the power of storytelling, change management, and compassionate leadership. Yet I believe that there is still one group that we need to speak with more often: the Innovation Team.

While your innovation team may not be interested to hear about data warehouses, data lakes, data models, or data management platforms, they will still be interested to hear how they can create an innovative competitive edge for the organization with the combined power of data and artificial intelligence. It’s on you, the digital analyst, to show them the what, why, and how, and today I am going to help you with that.

Alors, on danse! (So, let’s dance!)

First things first: let’s review four foundational terms for our discussion.

API

API stands for Application Programming Interface and allows computer programs to talk to each other. Marketers, data scientists or researchers can leverage APIs to collect live data from multiple sources. Among the many purposes that different APIs serve, they can provide an efficient way of automating the data collection processes.

REST API

REST in REST API stands for Representational State Transfer. REST APIs are among the most popular types of APIs. The beauty of a REST API is that it lets us access data over HTTP, typically in JSON format. REST APIs also don’t care which programming language or platform you use (e.g., Python, R, C#, iOS), which makes working with them even easier.
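
To make this concrete, here is a minimal Python sketch of what a REST call is made of: a URL, an HTTP method, and a JSON body. The endpoint `https://api.example.com/v1/reports` and the payload fields are hypothetical placeholders rather than a real service, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_json_request(url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST request carrying a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and payload, for illustration only.
request = build_json_request(
    "https://api.example.com/v1/reports",
    {"metric": "sessions", "device": "mobile"},
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Any language that can send HTTP and read JSON could do the same thing, which is why the choice of language matters so little.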

In the past, we used to leverage the power of APIs to access structured web data for analysis purposes. But today, thanks to the power of artificial intelligence, we can run even more sophisticated analysis on (you guessed it right) unstructured data as well.

Structured Data

If the data is highly organized and can easily fit into a table, we refer to it as structured data.

Device Category   Sessions      Avg. Session Duration   Revenue
Desktop           140,127,345   4:04                    $23,423,765
Mobile            200,123,654   1:10                    $22,123,897
Tablet            90,120,216    2:45                    $8,000,043

Structured Data fits easily into a table
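
Because every row has the same fields, structured data maps directly onto records in code. The sketch below loads the three rows from the table above and derives revenue per session; the `DeviceRow` class and `to_seconds` helper are names I made up for the illustration.

```python
from dataclasses import dataclass


@dataclass
class DeviceRow:
    device: str
    sessions: int
    avg_session_seconds: int
    revenue: int  # whole dollars


def to_seconds(duration: str) -> int:
    """Convert an 'm:ss' duration such as '4:04' into seconds."""
    minutes, seconds = duration.split(":")
    return int(minutes) * 60 + int(seconds)


# The three rows from the table above.
rows = [
    DeviceRow("Desktop", 140_127_345, to_seconds("4:04"), 23_423_765),
    DeviceRow("Mobile", 200_123_654, to_seconds("1:10"), 22_123_897),
    DeviceRow("Tablet", 90_120_216, to_seconds("2:45"), 8_000_043),
]

for row in rows:
    revenue_per_session = row.revenue / row.sessions
    print(f"{row.device}: ${revenue_per_session:.3f} per session")
```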

Unstructured Data

In simple terms, if you can’t fit it into a table, it is unstructured data. It comes in two forms:

  1. Text Format: Social Media comments, Text Messages, or Email
  2. Non-Text Format: Video, Audio, Image
[Image by Laserfiche] – As analysts, we tend to focus on structured data, but unstructured data also holds great opportunity.

Now let’s look at three APIs that the Google Cloud Platform has to offer.

1. Cloud Video Intelligence API

This is a REST API which makes video content searchable. Thanks to this API, you can find the right entity in the video, learn what a video is about, find the best moment in the video and recognize inappropriate content in a video.

Features

  • Label detection
  • Regionalization
  • Integrated with Google Cloud Platform in 7 languages
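
For a feel of what a label detection request might look like, here is a small Python sketch that builds the JSON body. It assumes the API's public v1 REST surface (`videos:annotate` with `inputUri` and `features`); the private beta described in this post may have used a different path, and the `gs://` location is a hypothetical bucket. Nothing is sent over the network.

```python
import json

# Assumed REST endpoint (v1 surface); the private beta may have used a different path.
ANNOTATE_URL = "https://videointelligence.googleapis.com/v1/videos:annotate"


def build_video_request(input_uri: str, features=("LABEL_DETECTION",)) -> str:
    """Return the JSON body asking the API to annotate one video."""
    return json.dumps({"inputUri": input_uri, "features": list(features)}, indent=2)


# Hypothetical Cloud Storage location of one video.
video_body = build_video_request("gs://my-bucket/contest/entry-001.mp4")
print(ANNOTATE_URL)
print(video_body)
```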

Pricing

Not available yet. You can apply to access the private beta.

Use Case

With this API, you can take the analysis of your content marketing to the next level. Picture yourself as the Social Media Manager of Banana Republic. You have just finished a successful campaign in which you asked your customers to send one-minute videos of themselves wearing Banana Republic and pitching why they should win your most recent contest. By the end of the campaign, you have compiled a lot of videos from your most valuable asset: your customers! As a next step, you’d like to start analyzing the content by gender. The Video Intelligence API allows you to search the videos by gender, so you can segment your video library by gender and uncover insights for merchandising and marketing.

To learn more, you can watch the following overview from the Google NEXT 2017:
Video: https://www.youtube.com/watch?v=mDAoLO4G4CQ

2. Cloud Vision API

This powerful REST API extracts information from images, which allows your developers to build applications that can see and understand image content. Talk about Content Marketing 3.0!

The API enables you to classify images and analyze emotional facial attributes, and it is accessible to the public.

Features

  • Label detection
  • Face detection
  • Explicit content detection
  • Optical character recognition
  • Logo detection
  • Image attributes
  • Landmark detection
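
Several of these features can be requested for the same image in a single call. The sketch below builds the JSON body for the Vision API's `images:annotate` REST method, with the image sent as base64 text. The placeholder bytes stand in for a real photo, `build_vision_request` is a helper name I made up, and the request is only assembled, not sent.

```python
import base64
import json

VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_vision_request(image_bytes: bytes, feature_types, max_results: int = 10) -> dict:
    """Build an images:annotate body requesting several features for one image."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [
                    {"type": feature, "maxResults": max_results}
                    for feature in feature_types
                ],
            }
        ]
    }


# Placeholder bytes; in practice you would read the bytes of a real image file.
fake_image = b"not-a-real-image"
vision_body = build_vision_request(fake_image, ["FACE_DETECTION", "LABEL_DETECTION"])
print(json.dumps(vision_body, indent=2))
```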

Pricing

Google's pricing is based on blocks of 1,000 units (that is, instances of API features applied to images) and on your monthly usage. The first 1,000 units per month are completely free, and a minimal fee applies as your usage increases to 1 million, 5 million, and 20 million units per month.

Learn more about Cloud Vision API pricing.
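
Since a unit is one feature applied to one image, usage adds up faster than the image count suggests. Here is a rough sketch of the arithmetic. The `price_per_1000` value is a placeholder and not Google's actual rate, and prorating partial blocks is my assumption, so check the pricing page for real numbers.

```python
FREE_UNITS_PER_MONTH = 1000  # free tier described above


def total_units(images: int, features_per_image: int) -> int:
    """One unit is one feature applied to one image."""
    return images * features_per_image


def billable_units(units: int, free_units: int = FREE_UNITS_PER_MONTH) -> int:
    """Units left to pay for after the monthly free tier."""
    return max(0, units - free_units)


def estimate_cost(units: int, price_per_1000: float) -> float:
    """price_per_1000 is a placeholder; prorating partial blocks is an assumption."""
    return billable_units(units) / 1000 * price_per_1000


# 5,000 images with face detection and label detection applied to each.
monthly_units = total_units(images=5000, features_per_image=2)
print(monthly_units, billable_units(monthly_units))
print(estimate_cost(monthly_units, price_per_1000=1.0))
```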

Demo

I tested some of the features by analyzing the following photo which I took with 2 of my E-Nor colleagues recently.

From Left to Right, Sri (Reporting Consultant), Nesreen (Project Manager), Zara (Yours truly!)
E-Nor Summit 2017 – Santa Cruz, California

I found the results quite impressive. The API identified three faces in the photo, face 1, face 2, and face 3. The result of the analysis for face 1 (Zara) has a confidence level of 77% for expressing joy.

Cloud Vision API analysis of face 1

For the other two faces, the reported confidence levels for expressing joy were higher: 91% and 100%. Each face is scored on its own, so the differences reflect the individual faces rather than the API learning from one face to the next.

Additionally, the API identified interesting labels for our photo, each with a confidence level. We were taking a photo on our way to an early supper, in time to catch the sunset at the beach, which explains the confidence level for the photo having been taken during the day. I was using a selfie stick, which explains why the Cloud Vision API’s confidence level for this photo being a selfie is only 56%. The car in the background is not ours; we were passing by the cars in a parking lot on our way to the restaurant.
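
If you want to work with results like these in code rather than in the demo page, the response is plain JSON. The sketch below reads a hand-written sample shaped like a Vision API response (`faceAnnotations` with a `joyLikelihood`, `labelAnnotations` with a `score`). The values are illustrative and are not the actual output for my photo, apart from the 56% selfie score mentioned above.

```python
# Hand-written sample shaped like a Vision API response; values are illustrative.
sample_response = {
    "faceAnnotations": [
        {"joyLikelihood": "VERY_LIKELY", "detectionConfidence": 0.95},
        {"joyLikelihood": "LIKELY", "detectionConfidence": 0.90},
        {"joyLikelihood": "VERY_LIKELY", "detectionConfidence": 0.99},
    ],
    "labelAnnotations": [
        {"description": "selfie", "score": 0.56},
        {"description": "car", "score": 0.80},
        {"description": "daytime", "score": 0.70},
    ],
}


def face_summary(response: dict) -> list:
    """Return (face number, joy likelihood) pairs, numbering faces from 1."""
    faces = response.get("faceAnnotations", [])
    return [(i, face["joyLikelihood"]) for i, face in enumerate(faces, start=1)]


def labels_above(response: dict, threshold: float) -> list:
    """Return label descriptions scoring at or above threshold, best first."""
    labels = response.get("labelAnnotations", [])
    kept = [label for label in labels if label["score"] >= threshold]
    kept.sort(key=lambda label: label["score"], reverse=True)
    return [label["description"] for label in kept]


print(face_summary(sample_response))
print(labels_above(sample_response, 0.6))
```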

Use Case

Let’s say that as a manager for the TSA (the US airport security agency) at LAX (Los Angeles International Airport), you have a mandate to improve travelers’ satisfaction with the security process. As part of this initiative, you ask travelers to take a photo before or after going through security and share it with the TSA or other airport staff. The Vision API can help you search through the content and separate the happy passengers from the unhappy ones. The next step is on you: design a creative solution to decrease stress levels while people go through the TSA line.
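
The sorting step in this scenario could be sketched as follows. It assumes each photo has already been run through face detection and that you kept the `joyLikelihood`, `sorrowLikelihood`, and `angerLikelihood` fields for each face. The bucketing rule and the photo IDs are my own simplification, not something the API provides.

```python
LIKELY = {"LIKELY", "VERY_LIKELY"}


def classify_face(face: dict) -> str:
    """Label one face as 'happy', 'unhappy', or 'unclear' from its likelihood fields."""
    if face.get("joyLikelihood") in LIKELY:
        return "happy"
    if face.get("sorrowLikelihood") in LIKELY or face.get("angerLikelihood") in LIKELY:
        return "unhappy"
    return "unclear"


def bucket_photos(photos: dict) -> dict:
    """Group photo IDs: any unhappy face wins, all-happy is happy, the rest are unclear."""
    buckets = {"happy": [], "unhappy": [], "unclear": []}
    for photo_id, faces in photos.items():
        moods = {classify_face(face) for face in faces}
        if "unhappy" in moods:
            buckets["unhappy"].append(photo_id)
        elif moods == {"happy"}:
            buckets["happy"].append(photo_id)
        else:
            buckets["unclear"].append(photo_id)
    return buckets


# Hypothetical photos, each with the face annotations returned for it.
photos = {
    "photo-001": [{"joyLikelihood": "VERY_LIKELY"}],
    "photo-002": [{"joyLikelihood": "UNLIKELY", "angerLikelihood": "LIKELY"}],
    "photo-003": [{"joyLikelihood": "POSSIBLE"}],
}
print(bucket_photos(photos))
```

A low joy score alone does not mean a passenger is unhappy, which is why the sketch keeps an "unclear" bucket instead of forcing every photo into one of two groups.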

You can watch the following overview video to learn more:
Video: https://www.youtube.com/watch?v=eve8DkkVdhI

3. Speech API

Thanks to the Speech API, your developers can build products to support your global customers by converting audio to text in over 80 languages and variants. The API can handle noisy environments and can support any device that sends either REST or gRPC requests.

Features

  • Automatic speech recognition
  • Real-time or pre-recorded audio support
  • Global vocabulary (over 80 languages)
  • Noise robustness
  • Streaming recognition
  • Inappropriate content filtering
  • Word hints
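
Several of these features map onto fields of the recognition request. The sketch below builds a JSON body for the `speech:recognize` REST method, assuming the v1 field names (`languageCode`, `profanityFilter`, `speechContexts`). The audio bytes and hint phrases are placeholders, `build_speech_request` is a helper name I made up, and nothing is sent.

```python
import base64
import json

SPEECH_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_speech_request(audio_bytes: bytes, language_code: str,
                         hints=(), profanity_filter: bool = False) -> dict:
    """Build a recognize body for 16 kHz LINEAR16 audio in one language."""
    config = {
        "encoding": "LINEAR16",
        "sampleRateHertz": 16000,
        "languageCode": language_code,
        "profanityFilter": profanity_filter,  # inappropriate content filtering
    }
    if hints:
        # Word hints: phrases the recognizer should favor.
        config["speechContexts"] = [{"phrases": list(hints)}]
    return {
        "config": config,
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


# Placeholder audio; in practice you would read the bytes of a real recording.
speech_body = build_speech_request(
    b"fake-audio", "fr-FR", hints=["E-Nor"], profanity_filter=True
)
print(json.dumps(speech_body, indent=2))
```

Switching languages only means changing `language_code`, for example to `en-US` or `fa-IR`.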

Pricing

This API is priced per 15 seconds of audio processed, after a 60-minute free tier. Please note that usage is capped at one million minutes per month. The pricing is for applications on personal systems such as phones, tablets, and desktops. You will have to contact Google for pricing and approval for embedded devices such as cars, TVs, appliances, or speakers.
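
To see how 15-second pricing plays out, here is a small sketch of the arithmetic. It assumes each request is rounded up to the next 15-second increment and that the 60 free minutes are subtracted from the monthly total; both are my reading of the pricing, so confirm the details with Google before budgeting.

```python
import math

INCREMENT_SECONDS = 15
FREE_SECONDS_PER_MONTH = 60 * 60  # the 60-minute free tier


def rounded_seconds(duration_seconds: float) -> int:
    """Round one request up to the next 15-second increment (assumed rule)."""
    return math.ceil(duration_seconds / INCREMENT_SECONDS) * INCREMENT_SECONDS


def billable_seconds(durations, free_seconds: int = FREE_SECONDS_PER_MONTH) -> int:
    """Total rounded seconds for the month, minus the free tier."""
    total = sum(rounded_seconds(d) for d in durations)
    return max(0, total - free_seconds)


# Three short clips of 7, 16, and 30 seconds.
clips = [7, 16, 30]
print([rounded_seconds(d) for d in clips])
print(billable_seconds(clips))
print(billable_seconds(clips, free_seconds=0))
```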

Demo

I did a quick test of the Speech API. You can see the results in the following video. I say the same sentence in English, French, and Persian. The API picked up each sentence correctly despite my accent. As described below, we as analysts can mine the voice information for business insights. Here’s my demo:

Video: https://youtu.be/TmQBa2fTWhw

My only recommendation is to add Le Français (Canadian) in addition to Le Français (France) 🙂

Use Case

This API could allow you to expand your reach. More customers, better bottom line 🙂 As we approach summertime (in the northern hemisphere), the number of visitors in most cities worldwide typically increases, especially in Canada, where many young immigrants invite family members from overseas. Not every visitor to Canada speaks English or French, which means guests must wait for their host to get home after work to take them out. Imagine how much independence the city's retailers could offer a new visitor if their app allowed them to take voice orders in the visitor's native language and translate them into English or French. By the end of summer, a full analysis of all the languages used could open doors to new markets or cultures.
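
The end-of-summer language analysis can start very simply. This sketch assumes your app logged the language code used for each voice order; the order records below are invented for the illustration.

```python
from collections import Counter

# Hypothetical voice orders, each tagged with the language code used for transcription.
orders = [
    {"order_id": 1, "language": "fa-IR"},
    {"order_id": 2, "language": "fr-FR"},
    {"order_id": 3, "language": "fa-IR"},
    {"order_id": 4, "language": "en-US"},
]


def language_share(orders: list) -> dict:
    """Return each language's share of orders, most common language first."""
    counts = Counter(order["language"] for order in orders)
    total = sum(counts.values())
    return {language: count / total for language, count in counts.most_common()}


for language, share in language_share(orders).items():
    print(f"{language}: {share:.0%}")
```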

You can watch the following overview video to learn more:
Video: https://www.youtube.com/watch?v=wzp9dfVpeeg

Broader Datasets, New Opportunities

I hope that this blog post has provided you with enough knowledge to be dangerous about the cloud APIs. You now have better opportunities to combine structured and unstructured data for even more interesting analysis. Should you start the discussions with your innovation team and implement these new products in your larger data analytics strategy, you will open up new and fresh sets of data streaming into your analytics tools. Have fun and drop me a line if you have any questions! I would love to hear your stories of using any of these APIs in your business.

Happy innovating!

Resources

Cloud Vision API
https://cloud.google.com/vision/

Cloud Video Intelligence
https://cloud.google.com/video-intelligence/

Cloud Speech API
https://cloud.google.com/speech/

Zara Palevani

Zara is the Director of the Center of Excellence at Merkle | Cardinal Path. With a people-first mentality, an entrepreneurial attitude, and an unending thirst to learn and share, Zara always dreams big, thinks outside the box, and works smart. Zara has a proven record of taking the initiative in making strategic decisions to create success for her team and clients. Zara’s experience includes revamping the marketing assets of B2B business, initiating a digital marketing analytics practice for a Fortune 500 company, and providing professional development training in digital marketing, data analytics, and project management.

