Using Machine Learning Technology for Accurate Content Classification - DoubleVerify

DoubleVerify (DV) pioneered the verification space over a decade ago and has been making waves ever since. We continue to develop cutting-edge technologies in order to handle an ever-evolving digital ecosystem. In this first edition of our “Ask the Experts” series, we chat with DV’s Anna Zapesochini, VP of Product Management, about one of our key differentiators – the ability to classify content using artificial intelligence (AI).

Earlier this year, Anna was named as one of Business Insider’s Rising Stars of Ad Tech. At DV, she has built multiple solutions for the digital measurement and analytics platform, as well as brand safety and suitability. She’s currently spearheading the development of an AI-powered solution to classify video and audio content for ad buyers. Read our exchange below to learn more about Anna and her team’s work on content classification through the use of machine learning technology.

 

How did you first begin working in this field?

Like all the best things in life, it actually wasn’t something that I planned in advance. I graduated with a bachelor’s degree in Economics and began my career in academia as a research assistant. I then transitioned into the private sector where I developed my experience with various data analysis roles. I learned a lot about the power of machine learning and artificial intelligence during my years at Google, and I’m now lucky to leverage these learnings and skills as a VP of Product Management at DoubleVerify, where I’m responsible for brand safety and classification.

 

What you do requires amazing technical expertise. How would you describe what you do in layman’s terms?

Machine learning brings together statistics and computer science to enable computers to learn how to do a given task without being programmed to do so. My team focuses on machine learning for content classification. You can think of us as being almost like teachers that help machines learn how to identify the topic and context of the content, whether that content comes in the form of text, video, audio or images.

To help a machine learn, we train it with thousands of examples for each topic and content type. Much like human learning, practice or ‘training’ is really important for machine learning. If we want the machine to recognize ‘news’ then we need to provide it with thousands of varied examples of news content. To make sure the machine learning model is robust, we also have to make sure we help it learn what isn’t news.

This can be achieved by feeding it examples of other types of content like movies or video games. It’s really important that we get this part right. For example, if we want the machine to identify hate speech then we need to make sure it has examples of truly discriminatory or hateful content. We also need to train it with examples of content about legal actions or activism aimed at fighting discrimination. This way the machine can make a critical distinction and can learn that this type of content shouldn’t be identified as hateful.
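The idea of training on both examples and counter-examples can be illustrated with a toy text classifier. This is only a minimal sketch under simplifying assumptions, not DV's actual method: it uses a basic naive Bayes word-count model, six made-up sentences instead of thousands of real examples, and hypothetical names (`train`, `predict`, the `news` and `not_news` labels) chosen purely for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; real systems use far richer features."""
    return text.lower().split()


def train(examples):
    """Count word frequencies per label from (text, label) pairs."""
    word_counts = {}        # label -> Counter of words seen under that label
    doc_counts = Counter()  # label -> number of training documents
    for text, label in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            # Add-one smoothing so an unseen word does not rule out a label.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training set: positive examples of news AND examples of
# what is not news (movies, video games), as described above.
TRAINING = [
    ("election results announced by officials tonight", "news"),
    ("government announces new budget for schools", "news"),
    ("officials report storm damage across the region", "news"),
    ("new trailer released for the superhero movie", "not_news"),
    ("best strategy to beat the final boss level", "not_news"),
    ("review of the latest racing video game", "not_news"),
]

model = train(TRAINING)
print(predict(model, "officials announce election budget"))  # news
print(predict(model, "video game boss strategy"))            # not_news
```

Without the `not_news` rows the model would have only one label to choose from, which is why the counter-examples matter as much as the positive ones. The same reasoning applies to the hate speech case: content about fighting discrimination would be added under the non-hateful label so the model learns the distinction.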

 

How involved are you and your team in the development of machine learning models?

As a Product team, we help define which machine learning models are needed. Should we train a machine learning model to understand only the meaning of the images and the motion in a video, or the meaning of the audio as well? Should the machine learning model analyze only the speech in the video, or the background music as well? The Product team also defines concrete goals for the machine learning models, and we scope the systems that enable machine learning at scale.

 

What do you love most about working on machine learning?

There is a very long list, but I think the greatest magic happens when I recognize that the machine is able to do something that most humans can’t. It’s also incredibly cool to see how machines learn simpler tasks (e.g. recognizing the composer of a given music track) and to think about what that can reveal to us about the mechanics of human learning.

 

What would surprise people about what you do? What’s one misconception people often have about content classification?

From my experience, ‘content classification’ can sound like a very gray or technical term to many people. The truth is that we need to ask a lot of interesting questions about what our clients actually need in order to plan the most relevant machine learning models that answer those needs, and there are a lot of moral and policy questions in this field. It’s far from trivial to define misinformation and hate speech, and even the definition of news or fictional content is not straightforward in our day and age.

There are also fascinating considerations in how to avoid bias when training machine learning models. While the technology can be complex at times, eventually, a lot of the questions that the Product and Product Policy teams face when building machine learning products are much more humanistic than technical.

 

In every field there are challenges. What are the biggest challenges in your field? How do you work in your role at DV to take on these challenges head on?

One of the biggest challenges in the Brand Safety Classification field is that there are a lot of cultural nuances when it comes to defining certain topics. For example, what might be considered insensitive or offensive varies across different areas of the world and different cultures. We’re always thinking about how to strike a balance between localizing our policies and finding generalized approaches for our key policies and products. DV is also well positioned because of our ‘semantic science’ capabilities. We have a whole team of linguistic experts who are able to tailor our models for specialization in different linguistic and cultural contexts.

 

What makes you proud of working at DV? What are the things you work on that you are most proud of?

I’m proud of working at DV as we help create a safer online ecosystem and we are fighting issues such as disinformation, hate speech and cyberbullying online in a very practical and tangible way. What makes me even more proud is that classifying these types of topics in user-generated environments is among the most challenging problems for social networks today. The fact that our teams can make an impact on the ecosystem and on society by addressing these challenging problems is something that I’m immensely proud of. The solutions we provide within our products allow brands to ensure that their values are reflected in their campaign spending.

 

What do you think are the most exciting developments in content classification? What can we expect to see in the future?

In addition to products that help identify harmful and sensitive content, we’re also thinking about more solutions that would help brands ensure that their unique set of brand values (as they relate to issues such as sustainability, equality, etc.) are well represented in the composition of their marketing dollars.

 

Are there any resources you would recommend for people who want to get to know more about machine learning technology?

Andrew Ng’s Stanford “Machine Learning” course on Coursera is a great introductory resource about machine learning. As a friendly introduction to computer science for non-technical people, I highly recommend Algorithms to Live By, by Brian Christian and Tom Griffiths.

 

If you’d like to learn more about the tools available to ensure brand safety and suitability in an evolving news cycle, download our guide.