Introduction
In the vast realm of cinematic experiences, finding that one movie you vaguely remember but can’t quite recall the name of is a common predicament. Perhaps you remember specific scenes, such as a man with a hat amidst flames, a woman driving a pink car, or a raccoon on an alien planet, but the movie title remains elusive. AWS has introduced a groundbreaking solution: Amazon Titan Multimodal Embeddings. This advanced technology combines textual and visual data to create rich representations of multimedia content, revolutionizing how we search for movies.
Understanding Multimodal Embeddings
The Problem: Lost in Movie Limbo
Imagine you watched a captivating movie but failed to jot down its title. All you remember is a distinct scene—a man wearing a hat against a backdrop of flames. Without the title, finding the movie amidst thousands of options seems daunting. This is where Amazon Titan comes to the rescue.
The Solution: Multimodal Search
Using Amazon Titan Multimodal Embeddings, we can now search for movies based on textual descriptions or visual cues. Suppose you vaguely recall a movie featuring a man with a hat and fire in the background. When you input this description into the search engine, Amazon Titan analyzes its semantic meaning and visual attributes and retrieves relevant matches like “Oppenheimer” or “V for Vendetta.”
Similarly, if you remember a scene with a woman driving a pink car, you can input this description to find movies like “Barbie” or “Legally Blonde.” Even abstract descriptions like “a raccoon and a tree with a face on an alien planet” can lead to relevant movie suggestions, such as “Guardians of the Galaxy.”
Implementation with MovieLens Data
To demonstrate the power of Amazon Titan, we utilized data from MovieLens, a platform that provides movie recommendations based on user preferences. We collected information on well-known movies released in 2024, including titles, posters, genres, and plot summaries.
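For a sense of the shape of this data, each collected record might look like the following sketch. The field names mirror those stored in the search index later in this post, but the values themselves are illustrative, not actual dataset entries:

# Illustrative movie record assembled from MovieLens and related metadata;
# the values below are examples, not real dataset entries
movie = {
    "movieId": 1,
    "title": "Example Movie (2024)",
    "imdbMovieId": "tt0000000",
    "genres": ["Action", "Sci-Fi"],
    "posterPath": "posters/example_movie.jpg",
    "plotSummary": "A short synopsis of the film's plot."
}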
Generating Embeddings
We created embeddings for movie posters and titles using the Amazon Bedrock API. The API converts images and text into embeddings that capture their semantic meaning and visual characteristics. By passing each movie's poster image and title together in a single request, we obtained a comprehensive joint representation of each movie.
import boto3
import json
import base64
from botocore.config import Config

# Configure the AWS region and retry behavior
my_config = Config(
    region_name='us-east-1',  # Update with your desired region
    signature_version='v4',
    retries={
        'max_attempts': 10,
        'mode': 'standard'
    }
)

# Create a Boto3 client for the Bedrock Runtime service
bedrock_runtime = boto3.client(service_name="bedrock-runtime", config=my_config)

def get_embedding_for_poster_and_title(image_path, title):
    # Read the image file and encode it to base64
    with open(image_path, "rb") as image_file:
        input_image = base64.b64encode(image_file.read()).decode('utf8')

    # Prepare the request body containing the poster image and the title
    body = json.dumps({
        "inputImage": input_image,
        "inputText": title
    })

    # Invoke the Titan Multimodal Embeddings model
    response = bedrock_runtime.invoke_model(
        body=body,
        modelId="amazon.titan-embed-image-v1",
        accept="application/json",
        contentType="application/json"
    )

    # Decode the response and extract the embedding
    vector_json = json.loads(response['body'].read().decode('utf8'))
    image_name = image_path.split("/")[-1].split(".")[0]
    return vector_json, image_name, title
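For illustration, a single movie can be embedded like this. The poster path and title below are hypothetical stand-ins, and the 1,024 dimension reflects Titan's default output size:

# Hypothetical usage: embed one movie's poster together with its title
vector_json, image_name, title = get_embedding_for_poster_and_title(
    "posters/oppenheimer.jpg",  # hypothetical local path to the poster file
    "Oppenheimer"
)
embedding = vector_json["embedding"]  # 1,024-dimensional vector by default
print(image_name, len(embedding))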
Building the Search Index
We used Amazon OpenSearch Service, a managed search and analytics service, to index the embeddings. The search index stores each embedding alongside metadata such as the movie title, plot summary, and genres. With k-NN (k-nearest neighbors) search enabled on the index, we can retrieve the movies whose embeddings are most similar to a given query vector.
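Below is a minimal sketch of how such an index might be created. It assumes an OpenSearch domain reachable at a placeholder base_url with basic-auth credentials; the endpoint, credentials, and the 1,024 dimension (Titan's default output size) are assumptions, while the index and field names match the query code in the next section:

import requests
from requests.auth import HTTPBasicAuth

# Placeholder connection settings; replace with your OpenSearch domain details
base_url = "https://<your-opensearch-endpoint>"
username = "<username>"
password = "<password>"

# Enable k-NN on the index and map a knn_vector field sized to Titan's
# default 1,024-dimensional embeddings, alongside the metadata fields
index_body = {
    "settings": {"index.knn": True},
    "mappings": {
        "properties": {
            "titan_multimodal_embedding": {"type": "knn_vector", "dimension": 1024},
            "movieId": {"type": "keyword"},
            "title": {"type": "text"},
            "imdbMovieId": {"type": "keyword"},
            "posterPath": {"type": "keyword"},
            "plotSummary": {"type": "text"}
        }
    }
}

response = requests.put(
    base_url + "/multi-modal-embedding-index",
    auth=HTTPBasicAuth(username, password),
    verify=False,  # disables TLS verification; avoid in production
    json=index_body
)
print(response.json())

# Index one document per movie: the embedding plus its metadata
# (vector_json and title come from the embedding call shown earlier;
# the IMDb ID and paths here are hypothetical)
doc = {
    "titan_multimodal_embedding": vector_json["embedding"],
    "movieId": 1,
    "title": title,
    "imdbMovieId": "tt0000000",
    "posterPath": "posters/oppenheimer.jpg",
    "plotSummary": "A short synopsis of the film."
}
requests.put(
    base_url + "/multi-modal-embedding-index/_doc/1",
    auth=HTTPBasicAuth(username, password),
    verify=False,
    json=doc
)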
Retrieving Results
To retrieve movie recommendations, we first convert the search query into an embedding vector using the same API that generated the movie embeddings. We then query the search index with the k-NN algorithm to find the movies whose embeddings are closest to the query. The results include movie titles, posters, IMDb IDs, and plot summaries, giving users comprehensive information to make informed choices.
import requests
from requests.auth import HTTPBasicAuth

def get_embedding_for_text(text):
    # The Titan Multimodal Embeddings model also accepts text-only input,
    # so query text lands in the same embedding space as the posters
    body = json.dumps({
        "inputText": text
    })
    response = bedrock_runtime.invoke_model(
        body=body,
        modelId="amazon.titan-embed-image-v1",
        accept="application/json",
        contentType="application/json"
    )
    vector_json = json.loads(response['body'].read().decode('utf8'))
    return vector_json, text

def query(text, n=5):
    # Embed the query text, then run a k-NN search against the index;
    # base_url, username, and password are the placeholder connection
    # settings defined when the index was created above
    text_embedding = get_embedding_for_text(text)
    query_body = {
        "size": n,
        "query": {
            "knn": {
                "titan_multimodal_embedding": {
                    "vector": text_embedding[0]['embedding'],
                    "k": n
                }
            }
        },
        "_source": ["movieId", "title", "imdbMovieId", "posterPath", "plotSummary"]
    }
    response = requests.get(
        base_url + "/multi-modal-embedding-index/_search",
        auth=HTTPBasicAuth(username, password),
        verify=False,  # disables TLS verification; avoid in production
        json=query_body
    )
    return response.json()
Now, with the query function in place, we can execute searches based on textual descriptions and retrieve relevant movie recommendations. This enhanced search functionality further improves the movie discovery experience for users.
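As a quick illustration, here is an example call using one of the scene descriptions from earlier; the response follows OpenSearch's standard hits structure:

# Search with a vague scene description and print the top matches
results = query("a man with a hat and fire in the background", n=5)
for hit in results["hits"]["hits"]:
    source = hit["_source"]
    print(source["title"], "-", source["plotSummary"][:80])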
Conclusion
Amazon Titan Multimodal Embeddings fuse textual and visual data into rich, precise embeddings, transforming the movie-finding process. Users can locate films based on hazy memories of scenes, descriptions, or imagery, which makes the search experience smooth and user-friendly. Titan’s cutting-edge capabilities make rediscovering films accurate and straightforward, changing how we find and enjoy cinematic content.
Drop a query if you have any questions regarding Amazon Titan and we will get back to you quickly.
About CloudThat
CloudThat is an award-winning company and the first in India to offer cloud training and consulting services worldwide. As a Microsoft Solutions Partner, AWS Advanced Tier Training Partner, and Google Cloud Platform Partner, CloudThat has empowered over 850,000 professionals through 600+ cloud certifications, winning global recognition for its training excellence, including 20 MCT Trainers in Microsoft’s Global Top 100 and an impressive 12 awards in the last 8 years. CloudThat specializes in Cloud Migration, Data Platforms, DevOps, IoT, and cutting-edge technologies like Gen AI & AI/ML. It has delivered over 500 consulting projects for 250+ organizations in 30+ countries as it continues to empower professionals and enterprises to thrive in the digital-first world.
FAQs
1. What are Amazon Titan Multimodal Embeddings?
ANS: – Amazon Titan Multimodal Embeddings is an innovative technology developed by Amazon Web Services (AWS) that combines textual and visual data to create rich representations of multimedia content. These embeddings capture semantic meaning and relationships across different data types, enabling advanced search capabilities.
2. What kind of data does Amazon Titan utilize for movie discovery?
ANS: – Amazon Titan utilizes both textual and visual data for movie discovery. Textual data includes movie titles, plot summaries, and descriptions, while visual data comprises movie posters and images. By combining these modalities, Amazon Titan creates comprehensive representations of movies, enabling accurate and personalized search results.

WRITTEN BY Aayushi Khandelwal
Aayushi is a data and AIoT professional at CloudThat, specializing in generative AI technologies. She is passionate about building intelligent, data-driven solutions powered by advanced AI models. With a strong foundation in machine learning, natural language processing, and cloud services, Aayushi focuses on developing scalable systems that deliver meaningful insights and automation. Her expertise includes working with tools like Amazon Bedrock, AWS Lambda, and various open-source AI frameworks.