Confidence Score Management & Confirmation Strategies

How to configure confidence thresholds in Amazon Lex — and why getting this right is fundamental to bot performance.

When Amazon Lex processes a customer input, it returns a confidence score indicating how certain the recogniser is about the matched intent or slot value. How you respond to that confidence score determines the customer experience.

Effective confidence management is a foundational element of bot optimisation — it directly impacts recognition accuracy, customer experience, and containment rates. Yet it's one of the most commonly misconfigured aspects of a Lex deployment.

Core Concepts

There are a few items to consider:

  1. Some customer inputs should not require confirmation. The most obvious example is a yes or no question. Where the recogniser is highly performant, confirming the response adds unnecessary friction.
  2. Some inputs must always be confirmed, for example an amount of money or an account number. The cost of getting these wrong is too high to skip confirmation.
  3. There is a range in between where the confidence score must be evaluated: confirm if the score falls below the high confidence threshold but above the low (minimum) confidence threshold.

The Three Confirmation Strategies

This means we need to understand the concept of NEVER, NORMAL, and ALWAYS confirmation strategies:

Strategy | Behaviour                                    | Use Case
NEVER    | Never confirm the recognised value           | Yes/no responses, high-confidence simple inputs
ALWAYS   | Always confirm, regardless of confidence     | Financial amounts, account numbers, critical data
NORMAL   | Confirm based on confidence score thresholds | Most other inputs (the nuanced middle ground)

It's obvious how to treat NEVER and ALWAYS. But NORMAL is where it gets more involved.
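
One way to make the strategy explicit in a codehook is to record it per input and default everything else to NORMAL. The following is a minimal sketch only: the enum, the lookup table, and the input names (YesNo, PaymentAmount, AccountNumber) are hypothetical and are not part of the Amazon Lex API.

```python
from enum import Enum


class ConfirmationStrategy(Enum):
    NEVER = "NEVER"    # accept the recognised value without confirming
    NORMAL = "NORMAL"  # confirm only when the confidence score calls for it
    ALWAYS = "ALWAYS"  # confirm regardless of the confidence score


# Hypothetical input names, chosen to mirror the use cases listed above.
STRATEGY_BY_INPUT = {
    "YesNo": ConfirmationStrategy.NEVER,
    "PaymentAmount": ConfirmationStrategy.ALWAYS,
    "AccountNumber": ConfirmationStrategy.ALWAYS,
}


def strategy_for(input_name: str) -> ConfirmationStrategy:
    """Look up the strategy for an input, defaulting to NORMAL."""
    return STRATEGY_BY_INPUT.get(input_name, ConfirmationStrategy.NORMAL)
```

Defaulting to NORMAL means a newly added input is evaluated against the thresholds unless someone deliberately opts it into NEVER or ALWAYS.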

Configuring Confidence Thresholds

To implement the NORMAL confirmation strategy effectively, two thresholds must be set:

Low Confidence Threshold

This is the minimum acceptable confidence score, and it is the first value to set. It is configured at bot level; however, you can exercise more control in your codehook Lambda with an if statement: if the confidence is less than your custom low confidence threshold, go down the no-match route.
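
The if statement described above might look something like the sketch below. It assumes the confidence score has already been read from the incoming event as a float; the function names and the routing labels are hypothetical, and 0.4 is only an illustrative starting value.

```python
CUSTOM_LOW_THRESHOLD = 0.4  # illustrative value; tune for your own bot


def is_no_match(confidence: float,
                low_threshold: float = CUSTOM_LOW_THRESHOLD) -> bool:
    """Return True when the score is too low to act on the recognised value."""
    return confidence < low_threshold


def handle_turn(confidence: float) -> str:
    # Hypothetical routing labels; a real codehook would build its
    # response to the bot at this point instead of returning a string.
    if is_no_match(confidence):
        return "no_match"
    return "continue"
```

Keeping the comparison in its own function makes it easy to pass a different low threshold for a specific intent later on.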

High Confidence Threshold

Next, set the high confidence threshold. I typically start with a high confidence threshold of about 0.8. With a confirmation strategy of NORMAL, any score above it will NOT be confirmed. For any score below the high confidence threshold but above the low (or custom low) confidence threshold, we should confirm what the recogniser understood.

Recommended starting point: Configure a low confidence threshold of 0.4 and a high confidence threshold of 0.8 per intent (or globally). Refine as part of your optimisation strategy.

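The behaviour at each level:

  1. Above the high confidence threshold: accept the recognised value without confirming.
  2. Between the low and high confidence thresholds: confirm what the recogniser understood.
  3. Below the low confidence threshold: go down the no-match route.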
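
As a rough sketch, the two thresholds and the three strategies can be combined in a single function. The function name and return labels are hypothetical. Two choices here are assumptions rather than anything the text prescribes: a score exactly equal to the high threshold is treated as high confidence, and the low threshold is applied whatever the strategy.

```python
def decide(confidence: float, strategy: str = "NORMAL",
           low: float = 0.4, high: float = 0.8) -> str:
    """Return 'no_match', 'confirm', or 'accept' for one recognised input."""
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError("expected 0 <= low <= high <= 1")
    # Assumption: the low threshold applies whatever the strategy.
    if confidence < low:
        return "no_match"
    if strategy == "ALWAYS":
        return "confirm"
    if strategy == "NEVER":
        return "accept"
    if strategy == "NORMAL":
        # Assumption: a score exactly equal to `high` counts as high confidence.
        return "accept" if confidence >= high else "confirm"
    raise ValueError(f"unknown strategy: {strategy!r}")
```

With the default thresholds, `decide(0.9)` accepts, `decide(0.6)` confirms, and `decide(0.3)` takes the no-match route, while `decide(0.95, "ALWAYS")` still confirms.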

Important Considerations

Thresholds Vary by Intent

These values will change for each intent as the recogniser will perform differently depending on how complex or simple your interaction model is. An intent with 5 clearly distinct sample utterances will behave differently to one with 50 overlapping variations. Tune per intent based on observed performance.
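
Per-intent tuning can be represented as a table of overrides on top of a global default. In this sketch the intent names and the override values are invented for illustration; only the 0.4 and 0.8 defaults come from the recommended starting point.

```python
from typing import NamedTuple


class Thresholds(NamedTuple):
    low: float
    high: float


GLOBAL_DEFAULT = Thresholds(low=0.4, high=0.8)

# Hypothetical intent names and values, for illustration only.
PER_INTENT = {
    "CheckBalance": Thresholds(low=0.4, high=0.75),
    "ReportLostCard": Thresholds(low=0.5, high=0.9),
}


def thresholds_for(intent_name: str) -> Thresholds:
    """Return the intent's own thresholds, or the global default."""
    return PER_INTENT.get(intent_name, GLOBAL_DEFAULT)
```

An intent only needs an entry once observed performance shows that the global default does not suit it.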

Dynamic Configuration

The next thing to ensure is that you set the confidence thresholds somewhere that is accessible and can be changed dynamically — without requiring a code deployment. Suitable options include:

This allows the optimisation team to adjust thresholds in response to production data without waiting for a release cycle.
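
Whatever store is chosen, the codehook can re-read the thresholds periodically instead of baking them into the deployed code. The sketch below is an assumption-heavy illustration: the class name, the JSON layout (a "default" entry plus optional per-intent entries), and the 60-second refresh interval are all hypothetical. The `fetch` callable stands in for the real lookup against an external store.

```python
import json
import time
from typing import Callable, Dict, Optional


class ThresholdConfig:
    """Caches threshold settings and re-reads them after `ttl_seconds`."""

    def __init__(self, fetch: Callable[[], str], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # returns the settings as a JSON string
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the refresh can be tested
        self._cached: Optional[Dict[str, Dict[str, float]]] = None
        self._loaded_at = 0.0

    def get(self, intent_name: str) -> Dict[str, float]:
        now = self._clock()
        if self._cached is None or now - self._loaded_at >= self._ttl:
            self._cached = json.loads(self._fetch())
            self._loaded_at = now
        # Fall back to the "default" entry when the intent has no override.
        return self._cached.get(intent_name, self._cached["default"])
```

A short refresh interval keeps the number of reads low while still letting a threshold change take effect within about a minute, with no release required.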

Ongoing Refinement

Initial threshold values are a starting point. They must be refined iteratively as part of the post-launch optimisation process. Review confidence score distributions regularly, identify where the bot is over-confirming (frustrating customers) or under-confirming (making errors), and adjust accordingly.
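
A review of this kind can start from a simple summary of logged turns. The sketch below assumes each record is a (confidence, was_correct) pair, where was_correct comes from some form of transcript review or labelling. That labelling step, the function name, and the metric names are assumptions, not something the bot provides out of the box.

```python
from typing import Dict, Iterable, Tuple


def band_report(records: Iterable[Tuple[float, bool]],
                low: float, high: float) -> Dict[str, float]:
    """Summarise logged (confidence, was_correct) pairs against two thresholds."""
    total = confirmed = confirmed_right = accepted = accepted_wrong = 0
    for score, was_correct in records:
        total += 1
        if score >= high:
            accepted += 1
            accepted_wrong += int(not was_correct)
        elif score >= low:
            confirmed += 1
            confirmed_right += int(was_correct)
        # Scores below `low` went down the no-match route and only count in total.
    if total == 0:
        raise ValueError("no records to summarise")
    return {
        "confirm_rate": confirmed / total,
        # Confirming values that were already right hints at over-confirming.
        "needless_confirm_rate": confirmed_right / confirmed if confirmed else 0.0,
        # Wrong values accepted without confirmation hint at under-confirming.
        "accepted_error_rate": accepted_wrong / accepted if accepted else 0.0,
    }
```

A high needless_confirm_rate suggests the high threshold could come down for that intent. A high accepted_error_rate suggests it should go up.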

Summary

Confidence score management is not a set-and-forget configuration. It is a continuous tuning activity that directly impacts customer experience and bot performance. Teams that invest in dynamic, per-intent threshold management and pair it with a regular optimisation cadence will see measurable improvements in recognition accuracy and containment rates over time.