GPT-3 (Generative Pre-trained Transformer 3) has demonstrated outstanding performance in very diverse natural language understanding and generation tasks. OpenAI, its creators, have released a limited open API that allows developers to test it.
Here we test its ability to generate useful summaries of conversations between a customer and one or more bank agents, a business case that could save time for our customer-facing teams and improve customer experience. The tests were run on ten unedited Spanish conversations about banking extracted from publicly available online forums.
Without any previous fine-tuning, transfer-learning or retraining on the domain’s data, we show that in most cases GPT-3 Davinci manages to generate the desired summaries, identifying the core issue or question, the product or service involved, the recommendation or answer given and its state of completion.
Despite the limited scope of the tests, the results are encouraging and suggest further exploration of the model’s value for our business.
Business context
Companies are rapidly shifting towards an increasingly digital business-consumer relationship model. A large number of the customer questions and complaints handled by our bank agents arrive through online text channels, and that number is only expected to grow over time, along with voice calls.
Keeping track of these conversations, often involving more than one bank agent across different channels, can prove to be difficult and time-consuming, which suggests turning to automation for assistance. Now, while very high hopes are set on AI to assist humans in charge, the issues involved in these conversations are of the highest importance and sensitivity for our customers, which leaves little room for mistakes.
So, just how ready is the strongest natural language processing (NLP) model to date [January 2021] for these tasks? To what extent could it help us during ongoing conversations with our customers, full of nuances and complexities? And would an out-of-the-box GPT-3 work on Spanish text without any model customization, retraining or transfer learning whatsoever?
The experiment
Task: given the contents of a conversation (an exchange of text messages between a person and one or more bank agents), generate a summary that is useful to whoever joins and takes over as an agent.
The following criteria were defined to measure the utility of each summary produced:
- Does it get the core issue or question from the customer? (e.g. bank commissions).
- Does it identify the product or service involved? (e.g. a credit card).
- Does it include the recommendation or answer given? (e.g. renegotiate conditions).
- Is the issue fully solved? (i.e. clearly stated by the customer).
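As a rough illustration, the four checks above could be recorded per summary in a small record type. This is only a sketch: the class name, the field names and the example values are hypothetical and are not part of the experiment described here.

```python
from dataclasses import dataclass, fields


@dataclass
class SummaryEvaluation:
    """Manual yes/no checks for one generated summary (hypothetical names)."""
    core_issue_identified: bool      # core issue or question from the customer
    product_identified: bool         # product or service involved
    answer_included: bool            # recommendation or answer given
    completion_state_captured: bool  # whether the issue was fully solved

    def criteria_met(self) -> int:
        """Count how many of the four criteria the summary satisfies."""
        return sum(bool(getattr(self, f.name)) for f in fields(self))


# Made-up values for illustration, not results from the tests
example = SummaryEvaluation(True, True, False, True)
print(example.criteria_met())
```

Keeping the checks as separate boolean fields makes it easy to see which criterion a given summary missed, rather than only how many.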
Text corpus chosen (in Spanish!)
After evaluating a wide range of public datasets suitable for NLP research, most of them in English, we decided to try our luck on original Spanish text from publicly available online forums about the banking sector. The topics discussed by users in them, along with the flow and dynamics of the questions and answers published, bore the closest resemblance to our main business case.
There were certain differences to note though:
- No actual bank agents participated in them; it was just users helping each other. A real bank agent would always ultimately try to retain the customer and would seldom or never publicly disagree with peer agents.
- Often several solutions were discussed, which differs from the more focused approach of an experienced agent.
- The tone was more informal and the grammar often irregular: lack of capitalization or punctuation, spelling mistakes and typos plagued most of the source Spanish text.
All these aspects would make the summarization task harder than it would be with actual text from a customer-agent conversation. But on the other hand, the results would be more encouraging if it worked out well against all odds…
Test cases used
- Ten Spanish conversations of different lengths and numbers of participants were chosen from public forums.
- The text was kept unedited (retaining all typos, lack of capitalization, etc.).
- The conversation subject, which could hint at some of the answers, was removed from the data. So was all metadata (user id, timestamps, etc.).
- The users replying to the original question were tagged as “Agente 1”, “Agente 2”, etc., and the initial user as “Cliente” [Customer].
Some of the conversations selected can be seen below, in “Sample conversations and summaries”.
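The speaker-tagging step listed above could be sketched roughly as follows. This is a minimal illustration under assumptions: the `(user_id, text)` input shape, the function name, the "Label: text" output format and the sample thread are all hypothetical, since the article does not describe how the tagging was actually done.

```python
from typing import Dict, List, Tuple


def tag_speakers(messages: List[Tuple[str, str]]) -> List[str]:
    """Replace user ids with 'Cliente' / 'Agente N' labels.

    `messages` holds (user_id, text) pairs in posting order. The first
    poster is assumed to be the customer. Message text is left unedited.
    """
    if not messages:
        return []
    customer_id = messages[0][0]
    agent_labels: Dict[str, str] = {}
    tagged = []
    for user_id, text in messages:
        if user_id == customer_id:
            label = "Cliente"
        else:
            # Number the repliers in order of first appearance
            if user_id not in agent_labels:
                agent_labels[user_id] = f"Agente {len(agent_labels) + 1}"
            label = agent_labels[user_id]
        tagged.append(f"{label}: {text}")
    return tagged


# Made-up thread for illustration (not taken from the test corpus)
thread = [
    ("user_a", "hola, me cobran comision por la tarjeta"),
    ("user_b", "llama al banco y renegocia las condiciones"),
    ("user_c", "a mi me la quitaron sin problema"),
    ("user_a", "gracias, lo intentare"),
]
for line in tag_speakers(thread):
    print(line)
```

Because only the labels are produced and the original ids are discarded, the output carries no user metadata, in line with the preparation steps above.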
GPT-3 engines tested
OpenAI offers four versions of GPT-3, with different trade-offs between strength, speed and cost. The engines chosen were:
- Davinci, which, with the original 175 [English] billion parameters (175 x 10^9), trained on one of the largest multilingual datasets ever, including Common Crawl and the full Wikipedia, “is generally the most capable engine”.
- Curie, a faster engine which, though not as strong as Davinci, “is quite capable for many nuanced tasks like sentiment classification and summarization”.
The two engines were tested directly on OpenAI’s playground, given that the number of cases to test was not high.
Babbage and Ada, the two more narrow-task-oriented engine variations, were not tested.
Engine settings
All four engines allow setting the following parameters, which have a high impact on the generated text. We adapted the values from one of the summarization examples provided by OpenAI:
- Response Length (256): Maximum number of tokens to generate.
- Temperature (0.1): We used a low temperature since we were requesting straightforward answers to questions, not creative text generation.
- Top P (1): Controls generation diversity (e.g. 0.5 => consider only the most likely options that together account for half of the probability mass).
- Frequency Penalty (0.37): To discourage verbatim repetitions.
- Presence Penalty (0): Higher values increase the likelihood of talking about new topics; we left it at 0, so no such push was applied.
- Best Of (1): To generate a single completion at each call.
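For readers who would rather script the tests than use the playground, the settings above could be gathered into a single parameter set. The sketch below is an assumption-laden illustration: the tests described here were run in the playground, the key names are the ones we understand OpenAI's completions endpoint to use for these settings, and the function name and prompt placeholder are hypothetical. No request is sent.

```python
from typing import Any, Dict


def build_completion_params(prompt: str) -> Dict[str, Any]:
    """Collect the playground settings under API-style parameter names.

    The key names are assumed to match the completions endpoint; check
    them against the current API reference before use.
    """
    return {
        "prompt": prompt,
        "max_tokens": 256,          # Response Length
        "temperature": 0.1,         # low: straightforward answers
        "top_p": 1,                 # Top P
        "frequency_penalty": 0.37,  # discourage verbatim repetitions
        "presence_penalty": 0,      # Presence Penalty
        "best_of": 1,               # a single completion per call
    }


# Placeholder prompt; the real prompt would hold the conversation text
params = build_completion_params("<conversation text and instructions>")
print(params)
```

Keeping the values in one place makes it straightforward to run the same conversation against both engines with identical settings.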
March 01, 2021 at 01:28PM
Can GPT-3 help during conversations with our Spanish-speaking customers? - BBVA