Introduction
Trying to find a movie you vaguely remember but can’t name is a common predicament. Perhaps you remember specific scenes, such as a man with a hat amidst flames, a woman driving a pink car, or a raccoon on an alien planet, but the title remains elusive. Amazon Titan Multimodal Embeddings, a model available through Amazon Bedrock, offers a way out: it represents text and images as vectors in a shared space, so a written description of a scene can be matched against movie posters and titles.
Understanding Multimodal Embeddings
The Problem: Lost in Movie Limbo
Imagine you watched a captivating movie but failed to jot down its title. All you remember is a distinct scene—a man wearing a hat against a backdrop of flames. Without the title, finding the movie amidst thousands of options seems daunting. This is where Amazon Titan comes to the rescue.
The Solution: Multimodal Search
With Amazon Titan Multimodal Embeddings, we can search for movies using textual descriptions or visual cues. Suppose you vaguely recall a movie featuring a man with a hat and fire in the background. When you enter this description into the search engine, the Titan model converts it into an embedding that captures its semantic meaning, and the search returns the movies whose embeddings are closest to it, such as “Oppenheimer” or “V for Vendetta.”
Similarly, if you remember a scene with a woman driving a pink car, you can input this description to find movies like “Barbie” or “Legally Blonde.” Even abstract descriptions like “a raccoon and a tree with a face on an alien planet” can lead to relevant movie suggestions, such as “Guardians of the Galaxy.”
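The matching step described above comes down to comparing vectors. The following is a minimal sketch of that idea using cosine similarity on toy three-dimensional vectors. The vectors, the placeholder titles, and the choice of cosine similarity are assumptions for illustration only; real Titan embeddings are much longer, and the article does not state which distance measure its index used.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vector, catalog, top_n=2):
    """Return the top_n (title, score) pairs, most similar first."""
    scored = [
        (title, cosine_similarity(query_vector, vector))
        for title, vector in catalog.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy vectors invented for illustration; they are not real embeddings.
toy_catalog = {
    "movie_a": [0.9, 0.1, 0.0],
    "movie_b": [0.1, 0.9, 0.1],
    "movie_c": [0.0, 0.2, 0.9],
}
toy_query = [0.8, 0.2, 0.1]

if __name__ == "__main__":
    for title, score in rank_by_similarity(toy_query, toy_catalog):
        print(f"{title}: {score:.3f}")
```

In a real deployment the search index performs this ranking, so the application never loops over the catalog itself.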
Implementation with MovieLens Data
To demonstrate the power of Amazon Titan, we utilized data from MovieLens, a platform that provides movie recommendations based on user preferences. We collected information on well-known movies released in 2024, including titles, posters, genres, and plot summaries.
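One way to hold the collected information is a small record type per movie. The sketch below is hypothetical: the class name, the validation rules, and the sample values are assumptions, and only the key names `movieId`, `title`, `imdbMovieId`, `posterPath`, and `plotSummary` are taken from the search query shown later in this article.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MovieRecord:
    """Hypothetical container for one movie's collected metadata."""
    movie_id: str
    title: str
    imdb_movie_id: str
    poster_path: str
    plot_summary: str
    genres: List[str] = field(default_factory=list)

    def to_source(self) -> Dict[str, object]:
        """Return the record using the key names of the search index."""
        return {
            "movieId": self.movie_id,
            "title": self.title,
            "imdbMovieId": self.imdb_movie_id,
            "posterPath": self.poster_path,
            "plotSummary": self.plot_summary,
            "genres": list(self.genres),
        }


def validate_record(record: MovieRecord) -> List[str]:
    """List the problems that would prevent embedding this record."""
    problems = []
    if not record.title.strip():
        problems.append("missing title")
    if not record.poster_path.strip():
        problems.append("missing poster path")
    return problems


# Placeholder values invented for illustration.
example = MovieRecord(
    movie_id="m-001",
    title="Example Movie",
    imdb_movie_id="tt0000000",
    poster_path="posters/m-001.jpg",
    plot_summary="A placeholder plot summary.",
    genres=["Drama"],
)
```

Checking for a title and a poster path up front is useful because both are needed to generate the combined embedding.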
Generating Embeddings
We created embeddings for movie posters and titles using the Amazon Bedrock API. The API converts images and text into embeddings that capture their semantic meaning and visual characteristics. By sending each poster together with its title in a single request, we obtained one embedding per movie that represents both the visual and the textual input.
import base64
import json

import boto3
from botocore.config import Config

# Configure AWS region and other settings
my_config = Config(
    region_name="us-east-1",  # Update with your desired region
    signature_version="v4",
    retries={"max_attempts": 10, "mode": "standard"},
)

# Create a Boto3 client for the Bedrock Runtime service
bedrock_runtime = boto3.client(service_name="bedrock-runtime", config=my_config)


def get_embedding_for_poster_and_title(image_path, title):
    # Read the image file and encode it to base64
    with open(image_path, "rb") as image_file:
        input_image = base64.b64encode(image_file.read()).decode("utf8")

    # Prepare the request body containing the image and title
    body = json.dumps({"inputImage": input_image, "inputText": title})

    # Invoke the Titan Multimodal Embeddings model
    response = bedrock_runtime.invoke_model(
        body=body,
        modelId="amazon.titan-embed-image-v1",
        accept="application/json",
        contentType="application/json",
    )

    # The payload is a streaming object under the lowercase "body" key
    vector_json = json.loads(response["body"].read().decode("utf8"))

    # File name without directory or extension, e.g. "posters/123.jpg" -> "123"
    image_name = image_path.split("/")[-1].split(".")[0]
    return vector_json, image_name, title
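The request body can also be built by a small helper that validates its inputs before any call is made. The sketch below is an illustration, not part of the original listing: the helper name is hypothetical, and the `embeddingConfig.outputEmbeddingLength` field with the values 256, 384, and 1024 reflects our reading of the Titan Multimodal Embeddings request format, so confirm it against the current Bedrock documentation before relying on it.

```python
import base64
import json

# Output lengths we understand the model to accept (assumption; verify in the docs).
ALLOWED_EMBEDDING_LENGTHS = (256, 384, 1024)


def build_titan_request_body(text=None, image_bytes=None, output_length=1024):
    """Build the JSON body for a text, image, or combined embedding request."""
    if text is None and image_bytes is None:
        raise ValueError("provide text, image_bytes, or both")
    if output_length not in ALLOWED_EMBEDDING_LENGTHS:
        raise ValueError(f"output_length must be one of {ALLOWED_EMBEDDING_LENGTHS}")

    payload = {"embeddingConfig": {"outputEmbeddingLength": output_length}}
    if text is not None:
        payload["inputText"] = text
    if image_bytes is not None:
        # The model expects the image as a base64-encoded string.
        payload["inputImage"] = base64.b64encode(image_bytes).decode("utf8")
    return json.dumps(payload)


if __name__ == "__main__":
    print(build_titan_request_body(text="a man with a hat and fire in the background"))
```

The returned string can be passed as the `body` argument of `invoke_model`. A shorter output length reduces index size at some cost in accuracy, which may be a reasonable trade for a small catalog.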
Building the Search Index
We used Amazon OpenSearch Service, a managed search and analytics service, to index the embeddings. The search index stores each embedding alongside metadata such as movie titles, plot summaries, and genres. With k-nearest neighbor (k-NN) search enabled on the index, we can retrieve the movies whose embeddings are most similar to a given query vector.
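The article does not show how the index was created, so the following is only a sketch of what a k-NN index definition and an indexed document could look like. The vector field name `titan_multimodal_embedding` and the metadata keys come from the search query in this article. The field types, the default dimension of 1024, and both helper names are assumptions.

```python
VECTOR_FIELD = "titan_multimodal_embedding"


def build_knn_index_body(dimension=1024):
    """Return index settings and mappings with k-NN search enabled."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                # The dimension must match the length of the stored embeddings.
                VECTOR_FIELD: {"type": "knn_vector", "dimension": dimension},
                "movieId": {"type": "keyword"},
                "title": {"type": "text"},
                "imdbMovieId": {"type": "keyword"},
                "posterPath": {"type": "keyword"},
                "plotSummary": {"type": "text"},
            }
        },
    }


def build_movie_document(embedding, metadata, dimension=1024):
    """Combine an embedding and its metadata into one indexable document."""
    if len(embedding) != dimension:
        raise ValueError(
            f"expected {dimension} values in the embedding, got {len(embedding)}"
        )
    document = dict(metadata)
    document[VECTOR_FIELD] = list(embedding)
    return document


if __name__ == "__main__":
    import json
    print(json.dumps(build_knn_index_body(), indent=2))
```

The index body would be sent when creating the index, and each document would then be added to it. Validating the embedding length before indexing catches a mismatch early, for example when the model is asked for a shorter output length than the mapping declares.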
Retrieving Results
To retrieve movie recommendations, we first convert the search query into an embedding vector using the same Bedrock API and model that produced the poster embeddings. We then run a k-NN query against the search index to find the movies with the most similar embeddings. The results include movie titles, posters, IMDb IDs, and plot summaries, giving users enough information to recognize the film they were looking for.
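import requests
from requests.auth import HTTPBasicAuth

# Placeholders (not in the original listing): set these to your own
# OpenSearch endpoint and credentials
base_url = "https://your-opensearch-endpoint"
username = "your-username"
password = "your-password"


def get_embedding_for_text(text):
    # Embed a text-only input with the same model used for the posters
    # (json and bedrock_runtime come from the earlier listing)
    body = json.dumps({"inputText": text})
    response = bedrock_runtime.invoke_model(
        body=body,
        modelId="amazon.titan-embed-image-v1",
        accept="application/json",
        contentType="application/json",
    )
    # The payload is a streaming object under the lowercase "body" key
    vector_json = json.loads(response["body"].read().decode("utf8"))
    return vector_json, text


def query(text, n=5):
    # get_embedding_for_text returns (vector_json, text); keep the vector part
    vector_json, _ = get_embedding_for_text(text)

    # k-NN query: return the n documents closest to the query embedding
    search_body = {
        "size": n,
        "query": {
            "knn": {
                "titan_multimodal_embedding": {
                    "vector": vector_json["embedding"],
                    "k": n,
                }
            }
        },
        "_source": ["movieId", "title", "imdbMovieId", "posterPath", "plotSummary"],
    }

    # verify=False disables TLS certificate checks; use it only for local testing
    response = requests.get(
        base_url + "/multi-modal-embedding-index/_search",
        auth=HTTPBasicAuth(username, password),
        verify=False,
        json=search_body,
    )
    return response.json()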
Now, with the query function in place, we can execute searches based on textual descriptions and retrieve relevant movie recommendations. This enhanced search functionality further improves the movie discovery experience for users.
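The value returned by `query` is the raw search response, so a small helper can turn it into a flat list for display. The sketch below assumes the usual OpenSearch response shape, in which matches sit under `hits.hits` with `_score` and `_source` entries. The helper name and the sample response are invented for illustration.

```python
def extract_hits(results):
    """Flatten a search response into a list of dicts with a score."""
    hits = results.get("hits", {}).get("hits", [])
    return [
        {"score": hit.get("_score"), **hit.get("_source", {})}
        for hit in hits
    ]


# Hand-written sample in the assumed response shape; not real search output.
sample_response = {
    "hits": {
        "hits": [
            {"_score": 0.71, "_source": {"movieId": "m-001", "title": "Example Movie"}},
            {"_score": 0.64, "_source": {"movieId": "m-002", "title": "Another Example"}},
        ]
    }
}

if __name__ == "__main__":
    for movie in extract_hits(sample_response):
        print(f'{movie["title"]} (score {movie["score"]})')
```

With the real function, the call would look like `extract_hits(query("a woman driving a pink car"))`. Returning an empty list when no `hits` key is present keeps the caller from failing on an error response.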
Conclusion
Amazon Titan Multimodal Embeddings represent text and images in a single vector space, which lets a search engine match a written description against movie posters and titles. Combined with a k-NN index in OpenSearch, this allows users to locate films from hazy memories of scenes, language, or imagery, without knowing the title.
Drop a query if you have any questions regarding Amazon Titan and we will get back to you quickly.
FAQs
1. What are Amazon Titan Multimodal Embeddings?
ANS: – Amazon Titan Multimodal Embeddings is a model developed by Amazon Web Services (AWS) that converts text, images, or a combination of both into embeddings. These embeddings capture semantic meaning and relationships across both data types, which enables searching images with text and vice versa.
2. What kind of data does Amazon Titan utilize for movie discovery?
ANS: – Amazon Titan utilizes both textual and visual data for movie discovery. In this demo, the textual data is the movie title and the user's search description, while the visual data is the movie poster. Plot summaries and genres are stored in the search index as metadata alongside the embeddings. By combining the two modalities, Amazon Titan creates a single representation of each movie that supports accurate search results.
WRITTEN BY Aayushi Khandelwal
Aayushi, a dedicated Research Associate pursuing a Bachelor's degree in Computer Science, is passionate about technology and cloud computing. Her fascination with cloud technology led her to a career in AWS Consulting, where she finds satisfaction in helping clients overcome challenges and optimize their cloud infrastructure. Committed to continuous learning, Aayushi stays updated with evolving AWS technologies, aiming to impact the field significantly and contribute to the success of businesses leveraging AWS services.