TECHNOLOGY

Understand AI Tagging: What is it and how does it work?

March 27, 2024

AI tagging is a machine learning process in which algorithms recognize the content of unstructured data and assign relevant metadata tags, markers, or labels, making it searchable by key terms. AI tagging scans media such as images, videos, and documents, and quickly identifies and tags valuable moments that content owners may not even know they have.

AI-generated tags describe the recognized content and its location within the media, making it possible to search, organize, and retrieve key content later for a variety of use cases. AI tagging, also known as AI auto-tagging, is made up of algorithms that are “trained” on very large datasets, allowing them to learn patterns and associations between different objects, sounds, or text content. This training allows AI tagging solutions to quickly locate, identify, and tag specific items, and even predict where they will appear.

How does AI tagging create metadata?

It’s important to deliver a fast ROI on AI investments. To do that, AI algorithms must accurately analyze files to recognize patterns, objects, motion, and sounds. Text documents and speech are analyzed based on language, keywords, and semantic meaning.

For video, once the analysis is complete, the system generates timecode-accurate markers or pointers, along with a set of tags or metadata that describe the content of the file. Tags can include metadata such as objects, activities, emotions, and more, depending on the capabilities of the algorithm and the context of the content.
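The result of this analysis can be pictured as a list of tag records, each carrying a label and a timecode range. The sketch below is illustrative only: the `Tag` fields, the `to_timecode` and `find_tags` helpers, the 25 fps frame rate, and the sample values are all assumptions, not the schema of any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """One AI-generated tag (hypothetical schema for illustration)."""
    label: str         # what was recognized, e.g. "goal"
    category: str      # e.g. "object", "activity", "emotion"
    start_sec: float   # where the content begins in the video
    end_sec: float     # where the content ends in the video
    confidence: float  # model score between 0 and 1


def to_timecode(seconds: float, fps: int = 25) -> str:
    """Format seconds as an HH:MM:SS:FF timecode (non-drop-frame)."""
    total_frames = round(seconds * fps)
    frames = total_frames % fps
    total_seconds = total_frames // fps
    hours, rem = divmod(total_seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}:{frames:02d}"


def find_tags(tags: list[Tag], label: str) -> list[Tag]:
    """Return tags whose label matches the search term, earliest first."""
    hits = [t for t in tags if t.label.lower() == label.lower()]
    return sorted(hits, key=lambda t: t.start_sec)


# Made-up sample data standing in for the output of a tagging run.
tags = [
    Tag("goal", "activity", 3725.4, 3731.0, 0.91),
    Tag("crowd", "object", 12.0, 40.0, 0.88),
    Tag("goal", "activity", 610.0, 615.2, 0.84),
]

for hit in find_tags(tags, "goal"):
    print(hit.label, to_timecode(hit.start_sec), "to", to_timecode(hit.end_sec))
```

Storing start and end positions alongside each label is what lets a search for "goal" jump straight to the right moment rather than just returning the file.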

AI tagging uses neural network algorithms

How is AI technology able to outperform humans in some cases? Machine learning (ML) plays a crucial role in AI, providing image recognition, natural language processing, and speech recognition capabilities. Popular machine learning architectures are all forms of artificial neural networks (ANNs), including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and transformers.

AI tagging relies on the neural network’s ability to learn spatial hierarchies (how features relate to one another in position and distance) directly from data, loosely mimicking the human brain. Neural networks can also distribute and scale the ML computational workloads of pixel, bit, and vector analysis across processors (NPU, GPU, or CPU), much like nerve ganglia in the brain.

Neural networks can be adapted and combined to enable varying degrees of object and image recognition and natural language processing (NLP) capabilities. They are built from multiple layers or filters, which is where the term “deep learning” comes from. The layers of a neural network fall into three basic types: Input, Hidden, and Output.

Input – the data being fed into the neurons
Hidden – the layers where the network cooperatively analyzes and transforms the data (the transformer layers in a model like ChatGPT are one example)
Output – the neural network’s answers or predictions
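The three layer types can be sketched as a tiny forward pass. This is a minimal illustration, assuming hand-picked weights and a sigmoid activation; a real network learns its weights from training data and has far more neurons and layers.

```python
import math


def sigmoid(x: float) -> float:
    """Squash any number into the range 0..1."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """One fully connected layer: weighted sum per neuron, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_w, inputs)) + b)
        for neuron_w, b in zip(weights, biases)
    ]


# Hand-picked weights, illustrative only (a real network learns these).
hidden_w = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]  # 2 hidden neurons, 3 inputs each
hidden_b = [0.0, 0.1]
output_w = [[1.2, -0.7]]                          # 1 output neuron, 2 inputs
output_b = [0.05]

inputs = [0.9, 0.1, 0.4]                     # Input: numbers describing the data
hidden = dense(inputs, hidden_w, hidden_b)   # Hidden: intermediate features
output = dense(hidden, output_w, output_b)   # Output: a prediction score

print("hidden activations:", hidden)
print("prediction score:", output[0])
```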

AI solutions use mathematical algorithms running within neural networks to break down different types of content, such as images or individual video frames, into pixels, which are represented by numbers. These mathematical representations can be analyzed, sorted, and searched in many ways. AI capabilities range from simple object detection to recognizing movement and actions to NLP. NLP enables advanced fuzzy matching or semantic search capabilities that can navigate the nuances of word meaning and context in human language.
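To make the idea of pixels as numbers concrete, here is a small sketch that treats two 3x3 grayscale frames as lists of brightness values and compares them. The frames and the helper names `flatten` and `mean_abs_diff` are made up for illustration; real systems work on far larger images and learned features rather than raw pixel differences.

```python
# A 3x3 grayscale "image": each pixel is a brightness value from 0 to 255.
frame_a = [
    [0, 128, 255],
    [0, 128, 255],
    [0, 128, 255],
]
frame_b = [
    [0, 120, 250],
    [10, 128, 255],
    [0, 130, 255],
]


def flatten(image):
    """Turn rows of pixels into one flat list scaled to the 0..1 range."""
    return [pixel / 255 for row in image for pixel in row]


def mean_abs_diff(a, b):
    """Average per-pixel difference: 0.0 means the frames are identical."""
    va, vb = flatten(a), flatten(b)
    return sum(abs(x - y) for x, y in zip(va, vb)) / len(va)


print(flatten(frame_a)[:3])
print(mean_abs_diff(frame_a, frame_b))
```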

AI tagging and vector databases

So how does a computer recognize a single face out of millions? Media content must first be transformed into “embeddings” that can be read and analyzed. Embeddings are numeric representations of unstructured data that capture semantic or contextual information, allowing computers to “see”. Embeddings are created by AI models, commonly transformers, that are built or served with tools such as Hugging Face, Cohere, TensorFlow, or PyTorch. The embeddings are then stored in a vector database, where further algorithms or database search engines are used to sort, aggregate, and query the data.
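As a rough picture of what "turning content into an embedding and storing it" means, the sketch below uses a word-count stand-in for a trained model. This is an assumption made purely for illustration: `toy_embed`, `VOCAB`, and the dictionary used as a "vector database" are hypothetical, and real embeddings come from trained models with hundreds or thousands of dimensions.

```python
import re

# Tiny fixed vocabulary (hypothetical); each word is one vector dimension.
VOCAB = ["goal", "crowd", "glass", "breaking", "beer", "logo"]


def toy_embed(text: str) -> list[float]:
    """Count vocabulary words in the text and scale the result to unit length."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = [float(words.count(term)) for term in VOCAB]
    norm = sum(c * c for c in counts) ** 0.5
    return [c / norm for c in counts] if norm else counts


# A stand-in "vector database": item id -> embedding.
vector_store = {
    "clip_001": toy_embed("Crowd cheers after the goal"),
    "clip_002": toy_embed("Sound of glass breaking in a hallway"),
}

for item_id, vector in vector_store.items():
    print(item_id, [round(v, 2) for v in vector])
```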

Common vector database query algorithms include k-nearest neighbors (KNN) and top-k search. Vector similarity search (VSS) uses these algorithms to allow AI and other query-based solutions to find objects within a vector database that are similar but not easily recognized. Examples of VSS include finding a face in a crowd or searching through thousands of SEC 10-K filings stored as Adobe documents for ones that “sound scary” but never use the word scary. The combination of transformers and vector databases is how generative AI applications built on large language models (LLMs), like ChatGPT, capture the meaning behind the words in your questions and output somewhat meaningful mashup answers or predictions from source input data.
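A brute-force version of a top-k nearest-neighbor query can be sketched in a few lines. The three-dimensional vectors and item names below are made up, and cosine similarity is assumed as the similarity measure; production vector databases typically use approximate indexes rather than scanning every vector.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, store, k=2):
    """Brute-force k nearest neighbors: score every vector, keep the best k."""
    scored = ((cosine_similarity(query, vec), item_id) for item_id, vec in store.items())
    return heapq.nlargest(k, scored)


# Made-up embeddings standing in for stored face vectors.
store = {
    "face_A": [0.9, 0.1, 0.0],
    "face_B": [0.1, 0.9, 0.1],
    "face_C": [0.8, 0.2, 0.1],
    "face_D": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]

for score, item_id in top_k(query, store, k=2):
    print(item_id, round(score, 3))
```

The query never has to match a stored vector exactly; it only has to point in nearly the same direction, which is what lets similar but not identical items surface.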

AI tagging use cases

There is a wide range of AI tagging applications across various industries:

Content Publishing: Content publishers and marketers use AI-enabled content management systems to find key moments by adding relevant tags and metadata to documents, images, and videos. Media and entertainment (M&E) use cases include faster, lower-cost content organization and post-production workstreams that create highlight packages, showreels, transcriptions, content moderation, and news curation.

Digital Evidence Management: AI tagging simplifies the process of organizing and categorizing media, enabling efficient retrieval and reuse of digital assets in law enforcement and the legal industry, such as finding the sound of glass breaking in hundreds of hours of security footage.

E-Commerce: Online shopping platforms can create product classification and user behavior recommendation systems, allowing customers to have a Google-like search experience.

Archiving and Historical Preservation: Historical archives can digitize and categorize documents, photographs, and artworks, preserving culture for future generations and creating new revenue streams.

Content Moderation and Localization: AI tagging can automatically locate a piece of content, such as beer logos, and remove it for viewers in countries that disapprove of alcohol use, accelerating content understanding and moderation and improving the user experience.

Benefits of AI tagging

With the growing volume of images and videos being produced and shared online every day, manual tagging has become increasingly impractical and time-consuming. This is where the speed and relevance of artificial intelligence solutions become so attractive to the media and entertainment industry, law enforcement, legal, and many other practical applications.

ROI: Manual tagging can be impossible when dealing with large volumes of media files, even when following best practices. Efficient, well-trained AI systems automate the process, allowing creators to focus on more strategic post-production tasks, saving money and creating revenue opportunities.

Accuracy: Artificial intelligence systems are engineered to become more accurate over time. In theory, humans are more accurate, but humans are inconsistent, expensive, and do make errors. AI takes on the majority of the work and leaves humans just a fraction of it to perfect the result.

Scale: As the volume of content continues to grow steadily, scalability becomes a crucial factor in managing and organizing it. AI-powered systems can scale to analyze and store large volumes of media without compromising performance.

Challenges of AI tagging

While AI solutions offer significant advantages, they are not without challenges and limitations. Some of the key issues include:

Accuracy: While tagging algorithms are continually improving, they are not perfect. Hallucinated, inaccurate, or irrelevant tags can be generated, particularly for complex or ambiguous content. Ensuring the accuracy of AI auto-tagging systems requires ongoing training and validation against known ground-truth data.

Domain-Specific Intelligence: AI models trained on generic datasets may not perform well in domain-specific contexts. For example, an AI model trained on general images may struggle to accurately tag celebrity images or sports plays. Fine-tuning and customization of models are necessary to produce results that deliver a positive return on investment (ROI).

Ethical Considerations: The growth of internet hyperscalers like Amazon, Microsoft, and Google in recent years raises the important topics of technical standards, AI governance, and responsible use. Organizations must take steps to properly protect their customers’ privacy, content, and auto-generated metadata, which is the content creators’ intellectual property, without copying it or putting their own watermark on it.

Affordability: AI is often offered only via APIs or SaaS services that must be bundled with storage and compute services. Otherwise, AI adopters are left to build and run their own solution from complex components, which is a tremendous burden to bear. While an ML platform can be built with open-source software (OSS) and standards from the Cloud Native Computing Foundation (CNCF), it is challenging to build and run such platforms over the long term. AI can also be cost prohibitive when it is priced by the minute or hour of analysis. AI platforms are best delivered as a completely managed service to take the burden of running AI away from organizations seeking to adopt it.

AI tagging trends and developments

Some of the key trends to watch out for in the coming months include:

Multimodal AI: Integrating recognition across multiple modalities, such as text, images, and audio, will enable a more comprehensive and contextual understanding of media assets and a more immediate grasp of relevant information.

Semi-Supervised and Self-Supervised Learning: AI governance is important, as is quality control. Advancements in semi-supervised and self-supervised learning techniques, which let humans guide machine learning training, will reduce the reliance on large labeled datasets, making AI more accessible and cost-effective.

Personalization: AI systems are becoming increasingly personalized and customizable, allowing users to tailor tagging models to their specific preferences, biases, and requirements. This should lead to more accurate tagging results and an improved customer experience, accelerating adoption.

But why do we need to use computers? Why not continue to have humans manually scan media? Because global competition is fierce, especially within the M&E industry, cost-saving and revenue-accelerating solutions are in high demand. AI solutions have the potential to deliver both. But image recognition is not 100% accurate; it normally demonstrates a precision rate of 80% to 90%. So the work must be supervised by humans, yet the speed and capacity of AI tagging completely justify the investment if it can be built and managed affordably.
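Precision and human supervision can both be made concrete with a short sketch. Precision here is the share of AI-generated tags that a human reviewer confirms as correct, and low-confidence tags are routed to a person instead of being accepted automatically. The 0.85 threshold, the function names, and the sample tags are all hypothetical.

```python
REVIEW_THRESHOLD = 0.85  # hypothetical cutoff; tune it against validated data


def precision(predicted: set[str], verified: set[str]) -> float:
    """Share of predicted tags that a human reviewer confirmed as correct."""
    if not predicted:
        return 0.0
    return len(predicted & verified) / len(predicted)


def route_for_review(scored_tags, threshold=REVIEW_THRESHOLD):
    """Split (label, confidence) pairs into auto-accepted and needs-review."""
    accepted = [label for label, score in scored_tags if score >= threshold]
    review = [label for label, score in scored_tags if score < threshold]
    return accepted, review


# Made-up model output: (label, confidence score).
scored_tags = [
    ("goal", 0.97),
    ("referee", 0.91),
    ("trophy", 0.88),
    ("stadium", 0.86),
    ("beer logo", 0.62),
]
# Made-up set of tags a human confirmed are really in the footage.
verified = {"goal", "referee", "stadium", "beer logo"}

predicted = {label for label, _ in scored_tags}
accepted, review = route_for_review(scored_tags)

print(f"precision: {precision(predicted, verified):.0%}")
print("auto-accepted:", accepted)
print("for human review:", review)
```

In this made-up run the reviewer only has to look at one of five tags, which is the kind of split between machine and human effort the paragraph above describes.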

Implementing AI tagging with Wasabi

By harnessing the power of machine learning and artificial intelligence, organizations can streamline their content management workflows, improve productivity, and enhance user experiences. Always running, automatically analyzing content on discovery, Wasabi AiR is the world’s first intelligent cloud media storage adding structure to unstructured data. Engineered specifically for the M&E industry, Wasabi AiR is a managed service that simplifies AI tagging with affordable AI capabilities accelerating content creation and lowering post-production costs.