
Optimizing Generative AI with Amazon Bedrock Model Distillation


Introduction

As enterprises rapidly adopt generative AI, the trade-off between performance, latency, and cost becomes a central concern. Large language models (LLMs) deliver impressive capabilities, but their computational requirements make them expensive to operate at scale. Amazon Bedrock Model Distillation addresses this challenge head-on, offering a streamlined approach to create smaller, faster models that retain the power of their larger counterparts.

In this blog, we explore how Amazon Bedrock Model Distillation works, compare it to traditional fine-tuning, and examine the AWS configurations needed to implement it.


Amazon Bedrock Model Distillation

Using Amazon Bedrock Model Distillation, you can transfer knowledge from a powerful teacher model (such as Claude 3, Titan, or Llama 2/3) to a smaller student model from the same provider. This process significantly lowers inference cost and latency while optimizing the student model to perform specific tasks at close to teacher-level quality.

There are two main data sources for distillation:

  1. Model Invocation Logs: These are automatically recorded from earlier use of Bedrock models. (Activate model invocation logging in the Amazon Bedrock console, or programmatically, as shown in the sketch after this list.)
  2. JSONL Custom Dataset: Structured prompt-response pairs uploaded to Amazon S3.
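If you prefer to turn on invocation logging programmatically rather than in the console, the sketch below shows one way to do it with boto3; the bucket name, key prefix, and region are placeholder assumptions, and your bucket must already allow Amazon Bedrock to write to it.

import boto3

# Placeholder region and bucket details; adjust to your environment.
bedrock = boto3.client("bedrock", region_name="us-east-1")

bedrock.put_model_invocation_logging_configuration(
    loggingConfig={
        "s3Config": {
            "bucketName": "my-invocation-logs-bucket",  # hypothetical bucket
            "keyPrefix": "bedrock/invocation-logs/",
        },
        "textDataDeliveryEnabled": True,   # capture prompt and response text
        "imageDataDeliveryEnabled": False,
        "embeddingDataDeliveryEnabled": False,
    }
)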

Amazon Bedrock generates responses for every prompt using the teacher model, then fine-tunes the student model on these examples to produce a condensed model that is best suited to your task.


How the Distillation Workflow Operates

Step 1: Select Models

  • Teacher model: A sizable, powerful model, such as the Titan Nova 34B, Meta Llama 3 70B, or Claude 3 Sonnet.
  • Student model: A more affordable, smaller model from the same provider family (Titan Express, Llama 3 8B, etc.).

The same provider must supply both models. For instance, a Titan Express student model must be distilled from a Titan Nova teacher model.
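To see which models in your Region can participate in distillation, you can query the model catalog; a minimal sketch, assuming your boto3 version supports the DISTILLATION customization-type filter:

import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# List foundation models that can be customized through distillation.
response = bedrock.list_foundation_models(byCustomizationType="DISTILLATION")
for model in response["modelSummaries"]:
    print(model["providerName"], "-", model["modelId"])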

 

Step 2: Input Preparation

Amazon Bedrock allows two data input formats:

  • Model Invocation Logs: Captured during production usage and stored in Amazon S3. You can filter logs using custom metadata such as project, priority, or intent to include only relevant records.
  • JSONL Format: A custom dataset stored in Amazon S3 where each line represents a complete prompt record. Some entries may also include an assistant response to serve as a golden example.

Example of JSONL record:

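A representative record is shown below, expanded across multiple lines for readability; in the actual JSONL file each record occupies a single line, and the prompt and response text here are purely illustrative.

{
  "schemaVersion": "bedrock-conversation-2024",
  "system": [
    { "text": "You are a concise assistant for cloud cost questions." }
  ],
  "messages": [
    {
      "role": "user",
      "content": [{ "text": "How can I reduce my monthly Amazon S3 storage costs?" }]
    },
    {
      "role": "assistant",
      "content": [{ "text": "Use lifecycle policies to transition infrequently accessed objects to lower-cost storage classes." }]
    }
  ]
}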

At this launch, every record must include the mandatory schemaVersion field with the value bedrock-conversation-2024. A system prompt may optionally be included in the record to indicate the role assigned to the model. In the messages field, the user role, which contains the input prompt given to the model, is required, while the assistant role, which contains the intended response, is optional.

At preview, each record can contain only one user prompt, since the Anthropic and Meta models support only single-turn conversation prompts.

  • If only prompts are given, Amazon Bedrock asks the teacher model to produce the answers.
  • A maximum of 15,000 prompt-response pairs may be included.
  • Bedrock increases training diversity using augmentation and variation techniques.

Step 3: Run the model distillation

Training data can come from historical model invocation logs or from a JSONL file uploaded to Amazon S3.

Let’s examine how to start a distillation job from the console:

Go to the Amazon Bedrock console and select the “Create Distillation job” option under “Customization methods.”


a. Enter a name in the “Distilled model name” field. Optionally, select Model encryption to add an AWS KMS key, and add tags if needed.


b. Choose Select model to pick the teacher model of your choice, and select Llama as the provider.


c. Select Llama 3.1 70B Instruct as the teacher model.


d. Select a student model from the drop-down menu. For this instance, choose Llama 3.1 8B. The teacher model will use the inference parameters configured here when generating synthetic data.


e. As discussed in the prior section, there are two approaches to providing a distillation input dataset.


f. If you intend to upload a JSONL file, place the training dataset in the Amazon S3 bucket you configured, then choose that Amazon S3 location under the distillation input dataset.

g. Optionally, expand the Amazon VPC settings section and specify an Amazon VPC.

h. Under Distillation output metrics data, provide the Amazon S3 path for the bucket where you wish to store the distilled model’s training output metrics.


i. Under Service access, choose how to grant Amazon Bedrock the AWS IAM permissions needed to perform the distillation by assigning a service role. You can use an existing service role if you have already defined one, or choose to create and use a new service role and enter a name for it.


j. Select Create Distillation job after adding all the configurations for the Amazon Bedrock Model Distillation job. You can view the distillation job’s status under Jobs when it begins.

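If you prefer the API over the console, the same job can be started with the CreateModelCustomizationJob operation using the DISTILLATION customization type. The sketch below is illustrative only: the job name, role ARN, model identifiers, S3 paths, and maximum response length are placeholder assumptions.

import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# All names, ARNs, and S3 paths below are placeholders.
response = bedrock.create_model_customization_job(
    jobName="llama-distillation-job",
    customModelName="distilled-llama-3-1-8b",
    customizationType="DISTILLATION",
    roleArn="arn:aws:iam::111122223333:role/BedrockDistillationRole",
    baseModelIdentifier="meta.llama3-1-8b-instruct-v1:0",  # student model
    customizationConfig={
        "distillationConfig": {
            "teacherModelConfig": {
                "teacherModelIdentifier": "meta.llama3-1-70b-instruct-v1:0",
                "maxResponseLengthForInference": 1000,
            }
        }
    },
    trainingDataConfig={"s3Uri": "s3://my-distillation-bucket/train/data.jsonl"},
    outputDataConfig={"s3Uri": "s3://my-distillation-bucket/output/"},
)
print("Started distillation job:", response["jobArn"])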

Step 4: Deploy and evaluate the model distillation

Following the distillation process, you can examine the training metrics stored in your assigned Amazon S3 bucket. Metrics such as step_number, epoch_number, and training_loss help you determine how well the model fine-tuning worked.
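One quick way to find those metric files after the job completes is to list the output prefix you configured; the bucket and prefix in this sketch are placeholders.

import boto3

s3 = boto3.client("s3")

# Placeholder bucket and prefix; use the output location configured for the job.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="my-distillation-bucket", Prefix="output/"):
    for obj in page.get("Contents", []):
        print(obj["Key"], obj["Size"])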

Once you are happy with the model’s performance based on these metrics, the next step is to deploy your distilled model by purchasing Provisioned Throughput.

Provisioned Throughput defines the model’s capacity to process inputs and return outputs, and its cost depends on these factors (a sketch of purchasing it programmatically follows this list):

  • The selected student model: Pricing and throughput capabilities vary from model to model.
  • Number of Model Units (MUs): Each MU determines the number of input and output tokens the model can process per minute across all requests.
  • Duration of commitment: There are three options: no commitment, one month, or six months. Longer commitments receive greater hourly rate discounts.
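Provisioned Throughput can also be purchased programmatically; a minimal sketch in which the name, custom model ARN, model units, and commitment are placeholder assumptions (omit commitmentDuration for no commitment):

import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Placeholder values; point modelId at your distilled custom model's ARN.
response = bedrock.create_provisioned_model_throughput(
    provisionedModelName="distilled-llama-8b-pt",
    modelId="arn:aws:bedrock:us-east-1:111122223333:custom-model/distilled-llama-3-1-8b",
    modelUnits=1,
    # commitmentDuration="OneMonth",  # or "SixMonths"; omit for no commitment
)
print("Provisioned model ARN:", response["provisionedModelArn"])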

Follow these steps to use the Amazon Bedrock console to deploy the distilled model:

  1. Select Custom models from the navigation pane of the Amazon Bedrock console, then select Provisioned throughput.


2. For the Provisioned throughput name, enter a name and choose the model that needs to be deployed.

3. After the distilled model has been deployed using a Provisioned Throughput, you can see the model status as In Service when you go to the Provisioned throughput page on the Amazon Bedrock console.


4. You can interact with this distilled model in the Amazon Bedrock playground: select Chat/text, and then select the distilled model under Custom & Managed endpoints.
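Outside the playground, you can also call the distilled model through the Bedrock Runtime Converse API by pointing modelId at the Provisioned Throughput ARN; the ARN and prompt in this sketch are placeholders.

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Placeholder ARN; use the ARN of your Provisioned Throughput.
provisioned_model_arn = (
    "arn:aws:bedrock:us-east-1:111122223333:provisioned-model/abc123example"
)

response = bedrock_runtime.converse(
    modelId=provisioned_model_arn,
    messages=[{"role": "user", "content": [{"text": "Summarize our refund policy in two sentences."}]}],
)
print(response["output"]["message"]["content"][0]["text"])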

Conclusion

Amazon Bedrock Model Distillation is a game-changing tool for businesses developing AI at scale. It blends the runtime efficiency of compact models with the accuracy of sophisticated models. The process requires little engineering effort, is fully managed, and is tightly integrated with other AWS services. Whether you’re developing AI solutions for voice assistants, search, support, or analytics, Amazon Bedrock Model Distillation ensures that your models are fast, affordable, and secure.

Drop a query if you have any questions regarding Amazon Bedrock Model Distillation and we will get back to you quickly.



FAQs

1. Can I use different providers for teacher and student models?

ANS: – No, both must be from the same provider.

2. How many prompt-response pairs can be used in a distillation job?

ANS: – Amazon Bedrock supports up to 15,000 prompt-response examples.

WRITTEN BY Venkata Kiran
