RASA

Prakash Dale · August 9, 2020

ChatBOT Python Rasa

Rasa Open Source is a machine learning framework for automated text and voice-based conversations. NLU understands the user’s message based on the previous training data been provided:

Intent classification: Interpreting purpose/intent of the user message
Entity Extraction: Recognizing structured data.

Core decides what happens next in this conversation. Its machine learning-based dialogue management predicts the next best action based on the input from NLU, the conversation history, and your training data.

Best Practices for designing NLU data

Use real data

Avoid using tools to auto-generate training data. It may lead model to loose ability to generalize and is only able to recognize phrases, it’s seen before. i.e. overfitting of the data.
Keep training examples distinct across intents

When training examples are two similar, intent confusion results.

Merge on intents, split on entities

  ## returning user
  * greet
      - utter_ask_if_new
  * inform
      - slot{"status": "returning"}
      - utter_welcome_back
    
  ## new user onboarding
  * greet
      - utter_ask_if_new
  * inform
      - slot{"status": "new"}
      - utter_create_account

Use synonyms wisely

A common misconception is that synonyms are a method of improving entity extraction. In fact, synonyms are more closely related to data normalization, or entity mapping. Synonyms convert the entity value provided by the user to another value—usually a format needed by backend code.
Understand lookup tables and regex

Lookup tables and regex get featurized. That means they are used to train the NLU model itself. This is why you can include an entity value in a lookup table and it might not get extracted - while it’s not common, it is possible.

For best results, you should make sure to include a few of the entities used in lookup table and regexes in your training examples.
Leverage pre-training entity extractor

Names, dates, places, email addresses, etc entity types can be extracted without lot of training data using pre-trained entity extractors available in Rasa - SpacyEntityExtractor, DucklingEntityExtractor
Always include an out-of-scope intent

An out-of-scope intent is a catch all intent for anything that user might say that’s outside of that assistant’s domain.
Handle misspelled words

It’s a given that the messages users send to your assistant will contain spelling errors

Before turning to a custom spellchecker component, try including common misspellings in your training data, along with the NLU pipeline configuration below. This pipeline uses character n-grams in addition to word n-grams, which allows the model to take parts of words into account, rather than just looking at the whole word. By doing so, it can better recover from misspellings.
```
  language: "en"

  pipeline:
    - name: ConveRTTokenizer
    - name: ConveRTFeaturizer
    - name: RegexFeaturizer
    - name: LexicalSyntacticFeaturizer
    - name: CountVectorsFeaturizer
    - name: CountVectorsFeaturizer
      analyzer: "char_wb"
      min_ngram: 1
      max_ngram: 4
    - name: DIETClassifier
      epochs: 100
    - name: EntitySynonymMapper
    - name: ResponseSelector
      epochs: 100
```
Treat your data like code
Test your updates

For more details click here

Share: Twitter, Facebook