Exploring Cinematic Treasures with Amazon Titan Multimodal Embeddings

Introduction

In the vast realm of cinema, failing to recall the name of a movie you only vaguely remember is a common predicament. Perhaps you remember specific scenes, such as a man with a hat amidst flames, a woman driving a pink car, or a raccoon on an alien planet, but the title remains elusive. Amazon Titan Multimodal Embeddings, a model available through Amazon Bedrock, addresses this problem: it represents text and images as vectors in a shared space, so a textual description can be matched against visual content such as movie posters.

Understanding Multimodal Embeddings

Amazon Titan Multimodal Embeddings uses both textual and visual data to generate embeddings: numeric vectors that capture semantic meaning and relationships across different types of data. By encoding movie titles and posters into embeddings, it enables search that goes beyond traditional keyword-based queries.
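
To make the idea of "similar embeddings" concrete, here is a minimal sketch of cosine similarity, one common way to compare two embedding vectors. The 4-dimensional vectors are made-up stand-ins for real embeddings, and the function name `cosine_similarity` is our own; the article's pipeline delegates the actual comparison to the search index.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical 4-dimensional stand-ins for real embeddings.
query_vec = [0.9, 0.1, 0.0, 0.2]
poster_a = [0.8, 0.2, 0.1, 0.3]   # points in roughly the same direction as the query
poster_b = [0.0, 0.9, 0.7, 0.0]   # points in a very different direction

# The poster whose vector is closer in direction to the query scores higher.
print(cosine_similarity(query_vec, poster_a) > cosine_similarity(query_vec, poster_b))
```

A score near 1 means the two vectors point in almost the same direction, which is the sense in which a text query and a poster are treated as "similar".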

The Problem: Lost in Movie Limbo

Imagine you watched a captivating movie but failed to jot down its title. All you remember is a distinct scene—a man wearing a hat against a backdrop of flames. Without the title, finding the movie amidst thousands of options seems daunting. This is where Amazon Titan comes to the rescue.

The Solution: Multimodal Search

Using Amazon Titan Multimodal Embeddings, we can search for movies by textual description or visual cue. Suppose you vaguely recall a movie featuring a man with a hat and fire in the background. When you enter this description into the search engine, it is converted into an embedding and compared against the embeddings of the indexed movies, retrieving relevant matches such as “Oppenheimer” or “V for Vendetta.”

Similarly, if you remember a scene with a woman driving a pink car, you can input this description to find movies like “Barbie” or “Legally Blonde.” Even abstract descriptions like “a raccoon and a tree with a face on an alien planet” can lead to relevant movie suggestions, such as “Guardians of the Galaxy.”

Implementation with MovieLens Data

To demonstrate the power of Amazon Titan, we utilized data from MovieLens, a platform that provides movie recommendations based on user preferences. We collected information on well-known movies released in 2024, including titles, posters, genres, and plot summaries.

Generating Embeddings

We created embeddings for movie posters and titles using the Amazon Bedrock API. The API converts images and text into embeddings that capture their semantic meaning and visual characteristics. Sending a poster and its title together in one request yields a single embedding that represents both the textual and the visual content of the movie.

import base64
import json

import boto3
import requests
from botocore.config import Config
from requests.auth import HTTPBasicAuth
 
# Configure AWS region and other settings
my_config = Config(
    region_name='us-east-1',  # Update with your desired region
    signature_version='v4',
    retries={
        'max_attempts': 10,
        'mode': 'standard'
    }
)
 
# Create a Boto3 client for the Bedrock Runtime service
bedrock_runtime = boto3.client(service_name="bedrock-runtime", config=my_config)
 
def get_embedding_for_poster_and_title(image_path, title):
    # Read the image file and encode it to base64
    with open(image_path, "rb") as image_file:
        input_image = base64.b64encode(image_file.read()).decode('utf8')
 
    # Prepare the request body containing the image and title
    body = json.dumps({
        "inputImage": input_image,
        "inputText": title
    })
 
    # Invoke the Titan Embedding model
    response = bedrock_runtime.invoke_model(
        body=body,
        modelId="amazon.titan-embed-image-v1",
        accept="application/json",
        contentType="application/json"
    )
 
    # Decode the response and extract the embeddings
    vector_json = json.loads(response['body'].read().decode('utf8'))
    image_name = image_path.split("/")[-1].split(".")[0]
 
    return vector_json, image_name, title
 
def get_embedding_for_text(text):
    body = json.dumps({
        "inputText": text
    })
 
    response = bedrock_runtime.invoke_model(
        body=body, 
        modelId="amazon.titan-embed-image-v1", 
        accept="application/json", 
        contentType="application/json"       
    )
 
    vector_json = json.loads(response['body'].read().decode('utf8'))
 
    return vector_json, text
 
def query(text, n=5):
    text_embedding = get_embedding_for_text(text)
 
    query = {
        "size": n,
        "query": {
            "knn": {
                "titan_multimodal_embedding": {
                    "vector": text_embedding[0]['embedding'],
                    "k": n
                }
            }
        },
        "_source": ["movieId", "title", "imdbMovieId", "posterPath", "plotSummary"]
    }
 
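    # base_url, username, and password are assumed to be defined elsewhere.
    # verify=False disables TLS certificate checks; avoid it outside local testing.
    response = requests.get(
        base_url + "/multi-modal-embedding-index/_search",
        auth=HTTPBasicAuth(username, password),
        verify=False,
        json=query
    )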
 
    results = response.json()
 
    return results
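
The functions above send `inputText` and `inputImage` to the model. As a minimal sketch, the helper below builds the same request body and also sets the output vector length. It assumes the `embeddingConfig.outputEmbeddingLength` request field and the lengths 256, 384, and 1024 described in the Bedrock documentation for this model, so check the current model parameters before relying on them. The helper name `build_titan_request_body` is hypothetical and not part of the original code.

```python
import base64
import json

# Output lengths assumed from the Bedrock documentation for this model.
ALLOWED_LENGTHS = (256, 384, 1024)


def build_titan_request_body(text=None, image_bytes=None, output_length=1024):
    """Build the JSON request body for a text, image, or text-plus-image embedding."""
    if text is None and image_bytes is None:
        raise ValueError("provide text, image_bytes, or both")
    if output_length not in ALLOWED_LENGTHS:
        raise ValueError(f"output_length must be one of {ALLOWED_LENGTHS}")

    payload = {"embeddingConfig": {"outputEmbeddingLength": output_length}}
    if text is not None:
        payload["inputText"] = text
    if image_bytes is not None:
        # The API expects the image as a base64-encoded string.
        payload["inputImage"] = base64.b64encode(image_bytes).decode("utf8")
    return json.dumps(payload)


# Text-only body, as used for search queries.
print(build_titan_request_body(text="a woman driving a pink car", output_length=384))
```

The returned string can be passed as the `body` argument of `invoke_model`. Whichever length is chosen must be used for both the indexed movies and the search queries, since vectors of different lengths cannot be compared.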

Building the Search Index

We used Amazon OpenSearch Service, a managed search and analytics service, to index the embeddings. The search index stores each embedding alongside metadata such as the movie title, plot summary, and genres. With k-nearest neighbor (k-NN) search enabled on the index, we can retrieve the movies whose embeddings are most similar to a given query vector.
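
The article does not show how the index was created, so the following is only a sketch of what a k-NN index definition could look like. The index setting `index.knn` and the `knn_vector` field type come from the OpenSearch k-NN plugin; the dimension of 1024 assumes the model's default output length, and the metadata field types are our own guesses based on the `_source` fields used in the query code.

```python
import json


def build_knn_index_body(vector_field="titan_multimodal_embedding", dimension=1024):
    """Return an index definition with one k-NN vector field plus movie metadata."""
    return {
        "settings": {"index": {"knn": True}},  # enable k-NN search on this index
        "mappings": {
            "properties": {
                # The vector dimension must match the length of the embeddings.
                vector_field: {"type": "knn_vector", "dimension": dimension},
                # Metadata field types below are assumptions for illustration.
                "movieId": {"type": "keyword"},
                "title": {"type": "text"},
                "imdbMovieId": {"type": "keyword"},
                "posterPath": {"type": "keyword"},
                "plotSummary": {"type": "text"},
            }
        },
    }


print(json.dumps(build_knn_index_body(), indent=2))
```

This body would be sent once in a `PUT` request to the index URL (for example `base_url + "/multi-modal-embedding-index"`) before any documents are indexed.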

Retrieving Results

To retrieve movie recommendations, we first convert the search query into an embedding vector using the same API that generated the movie embeddings. We then run a k-NN query against the search index to find the movies with the most similar embeddings. The results include movie titles, posters, IMDb IDs, and plot summaries, giving users enough information to make an informed choice.

def get_embedding_for_text(text):
    body = json.dumps({
        "inputText": text
    })
 
    response = bedrock_runtime.invoke_model(
        body=body, 
        modelId="amazon.titan-embed-image-v1", 
        accept="application/json", 
        contentType="application/json"       
    )
 
    vector_json = json.loads(response['body'].read().decode('utf8'))
 
    return vector_json, text
 
def query(text, n=5):
    text_embedding = get_embedding_for_text(text)
 
    query = {
        "size": n,
        "query": {
            "knn": {
                "titan_multimodal_embedding": {
                    "vector": text_embedding[0]['embedding'],
                    "k": n
                }
            }
        },
        "_source": ["movieId", "title", "imdbMovieId", "posterPath", "plotSummary"]
    }
 
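    # base_url, username, and password are assumed to be defined elsewhere.
    # verify=False disables TLS certificate checks; avoid it outside local testing.
    response = requests.get(
        base_url + "/multi-modal-embedding-index/_search",
        auth=HTTPBasicAuth(username, password),
        verify=False,
        json=query
    )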
 
    results = response.json()
 
    return results

Now, with the query function in place, we can execute searches based on textual descriptions and retrieve relevant movie recommendations. This enhanced search functionality further improves the movie discovery experience for users.
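
The `query` function returns the raw search response. As a sketch of how that response might be turned into a readable list, the helper below walks the standard `hits.hits` structure of an OpenSearch response and keeps the title, IMDb ID, and similarity score. The helper name `summarize_hits` is hypothetical, and the sample response is invented purely to show the shape; its scores are not real results.

```python
def summarize_hits(results):
    """Extract title, IMDb ID, and score from an OpenSearch search response."""
    hits = results.get("hits", {}).get("hits", [])
    summary = []
    for hit in hits:
        source = hit.get("_source", {})
        summary.append({
            "title": source.get("title"),
            "imdbMovieId": source.get("imdbMovieId"),
            "score": hit.get("_score"),
        })
    return summary


# Invented sample with the same structure as a real response (scores are made up).
sample_response = {
    "hits": {
        "hits": [
            {"_score": 0.71, "_source": {"title": "Barbie", "imdbMovieId": None}},
            {"_score": 0.64, "_source": {"title": "Legally Blonde", "imdbMovieId": None}},
        ]
    }
}

for row in summarize_hits(sample_response):
    print(row["title"], row["score"])
```

With a live index, the same helper could be applied directly to the return value of `query("a woman driving a pink car")`.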

Conclusion

Amazon Titan Multimodal Embeddings fuses textual and visual data into rich embeddings, which makes movie search far more flexible than keyword matching. Users can locate films from hazy memories of scenes, language, or imagery, and the search experience stays smooth and user-friendly. Rediscovering a half-remembered film becomes accurate and straightforward.

Drop a query if you have any questions regarding Amazon Titan and we will get back to you quickly.

About CloudThat

CloudThat is an award-winning company and the first in India to offer cloud training and consulting services worldwide. As a Microsoft Solutions Partner, AWS Advanced Tier Training Partner, and Google Cloud Platform Partner, CloudThat has empowered over 850,000 professionals through 600+ cloud certifications, winning global recognition for its training excellence, including 20 MCT Trainers in Microsoft’s Global Top 100 and 12 awards in the last 8 years. CloudThat specializes in Cloud Migration, Data Platforms, DevOps, IoT, and cutting-edge technologies like Gen AI & AI/ML. It has delivered over 500 consulting projects for 250+ organizations in 30+ countries as it continues to empower professionals and enterprises to thrive in the digital-first world.

FAQs

1. What are Amazon Titan Multimodal Embeddings?

ANS: – Amazon Titan Multimodal Embeddings is a technology developed by Amazon Web Services (AWS) that combines textual and visual data to create rich representations of multimedia content. These embeddings capture semantic meaning and relationships between different data types, enabling advanced search capabilities.

2. What kind of data does Amazon Titan utilize for movie discovery?

ANS: – Amazon Titan utilizes both textual and visual data for movie discovery. Textual data includes movie titles, plot summaries, and descriptions, while visual data comprises movie posters and images. By combining these modalities, Amazon Titan creates comprehensive representations of movies, enabling accurate and relevant search results.

WRITTEN BY Aayushi Khandelwal

Aayushi is a data and AIoT professional at CloudThat, specializing in generative AI technologies. She is passionate about building intelligent, data-driven solutions powered by advanced AI models. With a strong foundation in machine learning, natural language processing, and cloud services, Aayushi focuses on developing scalable systems that deliver meaningful insights and automation. Her expertise includes working with tools like Amazon Bedrock, AWS Lambda, and various open-source AI frameworks.