Analysis of a GPT-3 Chatbot with Respect to its Input in a Sales Dialogue

The aim of this study is to configure and evaluate a GPT-3 chatbot that is resistant to faulty input prompts and sensitive to the emotional setting of a sales dialogue. The Design Science Research Methodology by Peffers et al. [46] was applied, and the artifact was evaluated with qualitative interviews in two conditions, namely short and long language input. Results show that the chatbot was overall able to mimic human-like sales conversations. Some deviant behavior was observed, especially in the short input condition, revealing more verbiage and insistent requests by the chatbot to make a purchase.


I. INTRODUCTION
Text-based conversational agents, namely chatbots, have become increasingly popular in customer service, healthcare, and business [1], [70]. A chatbot is a program based on artificial intelligence and natural language processing (NLP) designed to communicate with humans [18]. However, it is important not only how efficient and accurate the output of a chatbot is, but also that its input is interpreted correctly [36]. One quality measure of chatbots is robustness towards faulty input [48], [64], [44], [38]. This study looks further into the business domain by using a chatbot in the context of a sales dialogue. A sales dialogue is a dynamic communication process between a buyer and a seller that relies on identifying the buyer's needs so that a sale can be carried out successfully [52]. The chatbot employed here is based on GPT-3 from OpenAI [41], [30].
In contrast to traditional chatbots, which operate on predefined states and rules or match an input to a predefined answer [18], generative models transform a given input into an output word by word, so that the dialogue appears more human-like and does not rely on predefined answers. However, grammatical errors can occur depending on the amount of available training data; large amounts of training data are the main requirement for generative models [67], [2], [50]. As the name (Generative Pre-trained Transformer) suggests, GPT-3 is based on a pre-trained model, which allows it to be used in a variety of contexts [41].
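The contrast between matching an input to a predefined answer and producing an output word by word can be illustrated with a deliberately tiny sketch. The keyword rules and the bigram table below are hypothetical toy data, not part of the study. GPT-3 itself uses a large neural network that predicts a probability distribution over subword tokens, not a lookup table.

```python
from typing import Dict, List

# Rule-based chatbot: the input is matched against predefined keywords and a
# predefined answer is returned. The rules are hypothetical examples.
RULES: Dict[str, str] = {
    "price": "Which price range do you have in mind?",
    "camera": "All current models have a dual camera.",
}


def rule_based_reply(user_input: str) -> str:
    """Return the first predefined answer whose keyword occurs in the input."""
    text = user_input.lower()
    for keyword, answer in RULES.items():
        if keyword in text:
            return answer
    return "Sorry, I did not understand that."


# Toy generative model: a hypothetical bigram table listing possible next
# words for each word. Real models such as GPT-3 instead use a neural network
# that predicts a probability distribution over subword tokens.
BIGRAMS: Dict[str, List[str]] = {
    "phone": ["has"],
    "has": ["a"],
    "a": ["great"],
    "great": ["camera"],
    "camera": ["and"],
    "and": ["long"],
    "long": ["battery"],
    "battery": ["life"],
}


def generate(prompt: str, max_words: int = 10) -> str:
    """Greedily extend the prompt word by word, starting from its last word."""
    words = prompt.lower().split()
    current = words[-1] if words else ""
    output: List[str] = []
    for _ in range(max_words):
        candidates = BIGRAMS.get(current)
        if not candidates:  # no known continuation: stop generating
            break
        current = candidates[0]  # greedy choice of the first candidate
        output.append(current)
    return " ".join(output)
```

For example, `generate("Tell me about this phone")` continues from "phone" one word at a time. `rule_based_reply` can only ever return one of its fixed strings.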
Cases of chatbot misbehavior have been reported in the media that were a consequence of faulty input in the training data [69]. This indicates that the input to a chatbot can influence the conversation and possibly change the chatbot's behavior altogether.
This study aims to further examine the linguistic input of a chatbot in the context of a sales dialogue. If the chatbot reacts poorly to faulty input, it could substantially decrease the quality of the ongoing dialogue by upsetting potential customers. Depending on how the chatbot interprets the input, the emotional setting of the dialogue is likely to change. As a working hypothesis for this study, we raise the following question: How can a GPT-3 chatbot be designed and developed such that it is resistant to faulty input prompts and sensitive to the emotional setting in the context of a sales dialogue?
We decided to restrict the sales dialogue to the purchase of smartphones. Statistics show that approximately 68.25 million people in Germany were smartphone users in 2022, equivalent to a smartphone penetration of 81.9% [40]. Therefore, the participants of this study were likely to have selected a smartphone for themselves in the past.
Furthermore, we wanted to include the aspect of negotiation in our dialogue setting, because negotiation requires advanced communicative skills and the recognition of abstract patterns [25], which should result in a more complex and human-like dialogue.
A literature review revealed that chatbots in the sales domain have already been investigated. Lee [24] discusses four use cases of e-commerce chatbots in the purchasing process and concludes that these chatbots have improved the convenience of customers' shopping, ordering, and payment experiences.
Balakrishnan and Dwivedi [3] discuss AI-powered digital assistants in conversational commerce in general, a term introduced by Messina [35] and emphasized by Mayer and Harrison [33]. Conversational commerce refers to the buying activity of a customer who interacts with a digital assistant. Balakrishnan and Dwivedi [3] conclude that anthropomorphism in digital assistants is crucial for creating a positive attitude and purchase intention. It is therefore beneficial if the chatbot mimics a human-like dialogue [3].
To make the chatbot more human-like, we named it "Melissa". The following chapter explains the theoretical background of a sales dialogue. The chapter on Methods and Design Science Research Cycle gives a concise description of how Design Science [46] was applied to the chatbot, also considering affordance theory [13], [39]. An affordance is defined as a possibility for goal-oriented action afforded to specified user groups by technical objects [39], [32]. We performed the evaluation by conducting interviews based on a questionnaire. In the fourth chapter, we present the results of the interviews, structured by the previously defined design principles [46]. Finally, in the last chapter, we discuss our results and outline possible limitations as well as directions for further research.

II. THEORETICAL BACKGROUND
Based on Lewis' AIDA model [58], an acronym for attention, interest, desire, and action, a sales dialogue can be perceived as a specific domain characterized by a more or less rigid sequence of customized events and vocabularies and by a clear understanding of the objectives of the sales situation, i.e., the satisfaction of a concrete consumer need, usually in exchange for some form of monetary payment. In our study, we assume a buyer's market, in which the salesperson has an inherent interest in customer orientation and satisfaction. Staff will try to create a pleasant atmosphere built on positive emotional states of the respective client, based on the assumption that successful sales agents create trust and sympathy [51], [63], [22], [59], [31], [17]. Besides how a situation is perceived, a necessary condition is the availability of the desired product and a profound knowledge of all its aspects (design, handling, prices, pros and cons, and user benefit). We assume that the psychological principles of a sales dialogue also apply to the virtual world.
Since the planned scenario of human-CA interaction was a sales dialogue about mobile phones, the first step of our theoretical work was to review what is known about sales dialogues among humans in general, how humans behave in a sales process, and which psychological variables play a role in their behavior. First, sales dialogues follow a fairly strict pattern [65], [10] that may be broken down into general phases such as the opening, the analysis of needs, the product presentation, and the closing. Each of these phases requires a different set of communicative skills and strategies [22]. There are also sensible guidelines and tactics that were developed in practice, gained substantial relevance there, and finally found their way into models and theories [47]. These practical guides elaborate on similar stages and define more granular subcategories. In addition, they can be presented as flow diagrams, which makes them particularly suitable for software implementation. At the same time, the theory behind the four stages of sales dialogues is well founded [17]. Whereas the opening consists of codified communication (e.g., greetings, salutations) that sets the tone for the rest of the dialogue, the analysis of needs is more analytical: it is partly based on a variety of indicators and general logic, but it also comprises the evaluation and processing of idiosyncratic information specific to a client. This stage is claimed to be the most challenging for sales in the analog world [22], and it is plausible to assume the same for chatbots [28]. Price estimates and price expectations are part of this stage. This information is often sensitive and situation-dependent and should not be asked for directly (anchoring). The product presentation is a more or less skillful derivation from the second phase. If knowledge of a wide selection of products is available (in structured and machine-readable form), effective algorithmic solutions exist for mapping needs to specific products. A positive closing is again important for the overall impression, but because the situation is largely codified, it poses few new challenges for chatbots. Simply put, the vast majority of the scientific literature and of practical marketers base their work on models that comprise at least four main stages: opening, need analysis, product presentation, and closing. Need analysis is the most crucial part of a successful sales dialogue.
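Because the four stages can be drawn as a flow diagram, they map naturally onto a small state machine. The following is a minimal sketch, not the study's implementation. The stage names follow the text, and the loop back from product presentation to need analysis is an assumption made for illustration. A check like `is_valid_dialogue` could be applied to stage labels assigned to a chat log.

```python
from enum import Enum, auto
from typing import Dict, List, Set


class Stage(Enum):
    OPENING = auto()
    NEED_ANALYSIS = auto()
    PRODUCT_PRESENTATION = auto()
    CLOSING = auto()


# Allowed transitions between the four stages. The loop back from product
# presentation to need analysis is an assumption for illustration.
TRANSITIONS: Dict[Stage, Set[Stage]] = {
    Stage.OPENING: {Stage.NEED_ANALYSIS},
    Stage.NEED_ANALYSIS: {Stage.PRODUCT_PRESENTATION},
    Stage.PRODUCT_PRESENTATION: {Stage.NEED_ANALYSIS, Stage.CLOSING},
    Stage.CLOSING: set(),
}


def move(current: Stage, target: Stage) -> Stage:
    """Return the target stage if the transition is allowed, else raise."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    return target


def is_valid_dialogue(stages: List[Stage]) -> bool:
    """Check that a stage sequence starts with the opening, ends with the
    closing, and only uses allowed transitions."""
    if not stages or stages[0] is not Stage.OPENING or stages[-1] is not Stage.CLOSING:
        return False
    return all(nxt in TRANSITIONS[cur] for cur, nxt in zip(stages, stages[1:]))
```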
Research has also shown that chatbots that display empathic behavior while communicating with users are perceived positively and are considered more trustworthy [20]. Agents that showed human-like behavior had a higher acceptance rate [6]. Emotional states such as sympathy, joy, and allegiance, but also anger or shame, are the decisive variables for creating trust and a positive connection to the situation [51], [63]. It is also established that the kind of product is an important variable and has to be considered as such [22]. It is argued that walk-in customers have to be approached differently, i.e., with positive emotions, to foster ad hoc decisions. Likewise, the higher the involvement in the product (be it for status, prestige, price, or practicality), the more rational arguments need to be taken into account. However, this line of research should be embedded in the overall decision-making process of humans. Hardly any decisions are free of emotions; rather, they are justified by ex-post rational arguments long after the decision has been made subconsciously [66], [19]. These seemingly conflicting claims from business studies and psychology can be reconciled under the common denominator of resolving cognitive dissonance [11], [16]. What remains from this theoretical convergence is that the role of rationality is largely overestimated; emotions take center stage in action [56], [57], [31]. Following this logic, it is important to integrate the respective variables into any scientific study on consumer decisions and behavior. In dialogues, emotions come to the fore as linguistic input. It should therefore be possible to use language as a carrier of emotions to manipulate the reaction of a chatbot and, vice versa, to analyze how the chatbot uses phrases that appeal to the relationship level [54], [26], [53].
Based on these findings, it is highly plausible to use chatbots in sales processes that build trust in the user in order to increase the likelihood of a successful sale. So far, research has mainly addressed what chatbots say, with little attention to how they say it, although research on the role of emotions is gaining ground [14], [21], [28], [1]. Still, these overviews clearly show that more research is needed on the interplay of emotional settings and how and what is actually said [23]. In particular, the interaction at the interface to machine communication with a new generation of chatbots and the integration of the relationship level remain largely unexplored.

A. Design science research methodology
We chose the design science research methodology (DSRM) by Peffers et al. [46] because it is a commonly accepted framework for research in the field of design science. The framework consists of six activities, as shown in Figure 1. The first step is the identification of the problem and the formulation of the study's motivation. Based on the identified problem, a solution and the artifact should be developed. The identified problem leads to the second step, the definition of the objectives, which serve as a foundation for the solution. The authors state that "the objectives can be quantitative [...] or qualitative" [46, p. 55]. We decided to conduct exploratory research with the intent of gathering qualitative data; therefore, we defined design requirements, design principles, and design features, which served as objectives and were analyzed in a later step of our study. The third activity is design and development, which focuses on the creation of an artifact; here, this means designing the architecture of a text-based conversational agent and deploying it in a purchase context. In the fourth step, the demonstration, the artifact is used to solve parts of the problem, for instance in case studies or experiments. In our study, we performed usability tests followed by interviews with the users to gather qualitative data for the next step. The evaluation of these data takes place in the fifth step of the framework. Our aim in this activity was to evaluate the interview results from the usability tests and compare them with the design principles we defined at the beginning of the study. The final step of the DSRM is communication, which involves the presentation of the study [46].
The described steps are normally performed sequentially, but in principle the process can start at any of the first four steps and move outward from there. Nevertheless, we decided to follow the standard procedure, starting with step one. Technical problems during the demonstration phase made it necessary to iterate back to the design and development step and make technical adjustments to the chatbot before continuing with the demonstration. The flexibility of the DSRM allowed this procedure, which is one reason we chose this process model as a foundation.

B. Design Requirements, design principles, design features
1) Design Requirements: Design requirements play a crucial role in the development of information systems. They are essential for identifying the actions or processes that the system should support [15]. At the outset, we were concerned with the natural limitations of human beings.
Making mistakes is normal. However, mistakes can lead to inaccurate or incomplete information, especially in situations such as consultations. We address this problem with our first requirement, defined as follows:
DR1: The CA should be robust to input errors.
Minimizing human error and maximizing domain expertise is one of the great potentials of chatbots. Especially in critical areas such as healthcare, this competence could lead to greater trust. To achieve this, the chatbot needs access to a comprehensive and verified body of knowledge. In addition, it should be able to understand the input correctly even if it contains grammatical or spelling errors [4]. Another important aspect we recognized was the emotional connection between the chatbot and the user. Such a connection can lead to a higher level of well-being. In addition, people can feel valued if the chatbot is both competent and friendly [37]. The second requirement is therefore:
DR2: The CA should communicate with consideration of the emotional context.
As human agents are increasingly being replaced by chatbots, it is important that chatbot communication mimics human-to-human interaction. Anthropomorphism therefore plays an important role in chatbot research, as human-likeness can help increase the acceptance of a system [29]. People enjoy communicating with chatbots that understand natural language. Human-like chatbots can also act as a substitute for friendship and affection, helping to prevent loneliness in today's connected world [68]. These points lead to our third requirement:
DR3: The CA should communicate in natural language.
The requirements that follow are based on the phases of a sales call as defined, for example, by the SPIN Selling sales method [47]. It is essential for a chatbot to understand whom it is communicating with. In today's business world, this classification of users is of particular importance for the marketing strategies of large companies. Through analysis of the input and the use of targeted questions, users can be grouped into segments that can be targeted effectively [49]. An appropriate greeting from the chatbot should start the conversation. Therefore, our fourth requirement is:
DR4: The CA should be able to greet the user and classify the user based on personal criteria.
Another crucial point is that a chatbot should be capable of understanding the wishes of the conversation partner and responding to their needs. The communicated information should be of high quality and in line with the needs of the other person [68]. For this reason, we formulated the following requirement:
DR5: The CA should identify and respect the wishes of the customer.
In line with the third phase of the SPIN model [47], a chatbot should be able to demonstrate how it can help the user. This, however, requires an optimal fact-based solution that fully aligns with the input [68], [61], [34]. Thus, we formulated the following requirement:
DR6: The CA should provide optimal solutions to fact-based questions and requested information.
It is important for chatbots to have a high level of human-likeness in order to enhance the user experience. This is particularly important when users negotiate with the chatbot, as they should feel positive and the chatbot should be willing to compromise [37]. Interaction in computer-assisted meetings can be positively influenced by facilitators, who, for instance, aim to create a positive environment and a good relationship and manage negative emotions [8]. The seventh requirement is therefore:
DR7: The CA should evoke a good feeling while negotiating with the user.
Similar to identifying the user's wishes in DR5, the chatbot should be able to recognize when the user is convinced to buy the product [37]. Like the classification in DR4, this has a high marketing value, as appropriate accessories can be offered before the purchase is completed [68], [27]. Furthermore, research has shown that text-based conversational agents have limitations in identifying users' intents [12]. Therefore, we formulated our final requirement as follows:
DR8: The CA should know when the user is convinced to buy the suggested product.
2) Principles and Features: According to Gregor et al. [15], design principles in the field of information systems are generally structured into three categories. Principles in the first category refer to the user's activities and focus on the user's behavior, while principles in the second category emphasize the role of the artifact. The third category combines the first two and therefore consists of principles addressing both user activity and the artifact. As suggested by the authors, we phrased our design principles as follows: "In order to allow users to do A, the system should have feature X" [15]. By using this third type of principle, we combine design principles and design features into one statement that addresses the user as well as the artifact, a GPT-3 chatbot. While the first part of the statement refers to the principle itself, based on research, the second part names the desired feature of the chatbot. We defined six design principles and four related features based on the design requirements.
The first principle refers to the possibility of using the chatbot as an information retrieval tool [55]. According to Shawar and Atwell [55], the potential of chatbots for retrieving information has been shown in the field of education, where research found that the outputs given by a chatbot are similar to replies generated by Google and can therefore serve as a source of information. Nevertheless, students preferred the chatbot's answers because they were more detailed and specific, while Google results mainly consisted of a large number of links. The first principle was phrased as follows: DP1: To allow the customer to retrieve information about a product, the chatbot should use GPT-3 to process and create natural language text and should process a conversation in a purchase context.
The second principle refers to the human-like interaction between the customer and the chatbot. Research has shown that users prefer a human-like interaction, including a perceivable personality, a relationship with the user, asking and answering questions, good conversational habits, and appropriate grammar and spelling. These traits have been shown to be important to users and are therefore of high importance in the design and development of a chatbot [38]. Especially in a purchase context, we consider a human-like interaction to be essential, since the chatbot is intended to replace a human salesperson and should therefore have similar character traits. The second principle has therefore been defined as follows: DP2: To allow a human-like interaction between the customer and the chatbot in a purchasing process, the chatbot should use GPT-3 to process and create natural language text related to the purchase context and should use conversational cues to provide a convincing and satisfying interaction.
The third principle is based on the assumption that computer-assisted conversations need a facilitator who "creates and reinforces an open, positive and participative environment" [8]. In a purchase process involving a human salesperson, the human takes on the role of a facilitator, with the intention of creating a positive environment in which the purchase can be facilitated. These attributes should be transferred to the chatbot. The following principle therefore aims at a positive emotional atmosphere during the purchase dialogue: DP3: To allow a positive emotional atmosphere for the users, the chatbot should use conversational cues, provide a convincing and satisfying interaction, and use words and phrases connoting positive emotions.
The fourth principle refers to the finding that one of the most prevalent emotions in customer service is anger [9]. Assuming that negative emotions such as anger could also occur in our purchase context, it is important to consider how the chatbot should react in these situations. Such emotions should either be ignored or transformed into positive emotions. Hence, our fourth design principle is the following: DP4: To allow inputs connoting negative emotions to be transformed or ignored, the chatbot should use words and phrases connoting positive emotions.
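DP3 and DP4 are phrased in terms of "words and phrases connoting positive emotions". One simple way to inspect chat logs for this property is to count words from an emotion lexicon. The sketch below is a swapped-in technique, not the authors' method, since the study evaluated the principles through interviews. The tiny English word lists are hypothetical placeholders, and a real analysis of the German dialogues would need a validated German sentiment lexicon.

```python
import re
from typing import Dict

# Hypothetical mini-lexicons for illustration only.
POSITIVE = {"great", "happy", "glad", "perfect", "excellent", "understand"}
NEGATIVE = {"annoying", "angry", "expensive", "bad", "disappointed"}


def emotion_counts(message: str) -> Dict[str, int]:
    """Count positive and negative lexicon words in one chat message."""
    words = re.findall(r"[a-zäöüß']+", message.lower())
    return {
        "positive": sum(w in POSITIVE for w in words),
        "negative": sum(w in NEGATIVE for w in words),
    }


def reply_turns_positive(user_msg: str, bot_reply: str) -> bool:
    """DP4-style check: the user's message contains negative words, and the
    bot's reply contains more positive than negative words."""
    user = emotion_counts(user_msg)
    bot = emotion_counts(bot_reply)
    return user["negative"] > 0 and bot["positive"] > bot["negative"]
```

Word counting ignores negation and context (for example, "not great"). It could at most serve as a rough screening aid alongside qualitative analysis.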
The fifth principle considers the fact that users prefer a chatbot that respects the flow of a conversation [7]. In a human sales dialogue, fast answers and a good communication flow are traits that are important to potential buyers and should therefore be considered in the design and development of a chatbot:

DP5: To allow a communication flow, the chatbot should use GPT-3 to process and create purchase-context related text in natural language.
Another important element is the extent to which the chatbot interprets the user's wishes and needs. For a full-service experience, users appreciate a chatbot that understands their requests and interprets them correctly in order to achieve the desired action [12].

DP6: To analyze what the user truly wants, the chatbot should use GPT-3 to process and create purchase-context related text in natural language, use conversational cues, and provide a convincing and satisfying interaction.
3) Implementation: As described above, the third step of the DSRM process model is the design and development phase. Using Python and Tkinter, a simple Python GUI framework, we created a standalone chat window for the final implementation. This setup allowed for unlimited response time, as the user's input, including the chat history, was sent to the OpenAI API for processing with every turn. We used the text-davinci-003 model from OpenAI for this implementation [41].
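Because text-davinci-003 is a completion model, each request must carry the whole conversation as a single prompt string. The following minimal sketch shows how the chat history could be assembled into such a prompt before every API call. The preamble wording, the speaker labels, and the `max_turns` cut-off are assumptions for illustration, not the prompt used in the study. The network call and the Tkinter window are left out so that the sketch runs offline.

```python
from typing import List, Tuple

# Hypothetical instruction text; the prompt actually used in the study is not
# reproduced here.
PREAMBLE = (
    "The following is a sales dialogue between Melissa, a friendly smartphone "
    "salesperson, and a customer. Melissa grants a discount of at most 5 %."
)


def build_prompt(history: List[Tuple[str, str]], user_input: str, max_turns: int = 20) -> str:
    """Assemble one completion prompt from the preamble, the most recent
    (speaker, text) turns, and the new customer message."""
    lines = [PREAMBLE, ""]
    for speaker, text in history[-max_turns:]:
        lines.append(f"{speaker}: {text}")
    lines.append(f"Customer: {user_input}")
    lines.append("Melissa:")  # the model is expected to continue from here
    return "\n".join(lines)


def record_turn(history: List[Tuple[str, str]], user_input: str, reply: str) -> None:
    """Append the customer message and the model's reply to the history in place."""
    history.append(("Customer", user_input))
    history.append(("Melissa", reply.strip()))
```

In a Tkinter version, a button callback could read the entry field, call `build_prompt`, send the string to the API, and pass the reply to `record_turn`. With the pre-1.0 `openai` Python package, the request would look roughly like `openai.Completion.create(model="text-davinci-003", prompt=prompt, max_tokens=256, stop=["Customer:"])`. OpenAI has since deprecated this model. Truncating to the most recent turns is one way to stay within the model's limited context window.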

C. Participants and Study Design
During the fourth phase, the demonstration phase, the usability tests and interviews took place. All study participants were potential smartphone users and had little to no experience with chatbots. They were female and male, aged between 20 and 65, with different occupational and educational backgrounds. All had very good proficiency in German, as strong communication skills were essential for the study. The first three interviews served as pre-tests, which revealed that technical adjustments to the chatbot were necessary. Afterwards, nine persons participated in the usability tests and interviews. All participants were informed about the content of the study, the privacy guidelines, and the terms of their participation. They all participated voluntarily, and data collection was anonymous. Each usability test lasted between 10 and 20 minutes and was followed by an interview of approximately 10 to 20 minutes. The participants' task was to buy a smartphone via chat. They were asked to imagine having a budget of 500 € and were told that a discount of 15 % was possible. In reality, the chatbot was set up to grant a discount of only up to 5 %. This approach was chosen to frustrate the users and provoke negative input in order to test the chatbot's reaction. Furthermore, one group was asked to enter long input, while the other group was asked to enter short input. No time limit was set, and a researcher was available for questions. The interview was conducted after the participant had purchased a smartphone or canceled the purchase process. Due to the exploratory nature of the study, we chose semi-structured interviews to allow new questions and insights to emerge during the interviews. For each design principle, three to four questions were prepared in advance. For DP1, the retrieval of information, one question was, for instance: "How did the salesperson help you to answer your questions about the product?" For DP2, the human-like interaction, questions were phrased like: "Was there a situation in which the salesperson approached you on a relationship level?" An example question for DP3, the positive atmosphere, was: "Did you trust the salesperson? Why/why not?" DP4 referred to negative emotions; therefore, we asked, for instance, about the discount the participants did or did not get: "If you got less than 15 % discount: how did you feel?" DP5 aimed to gather information about the communication flow; one question was: "How did you perceive the communication flow? Did the salesperson answer quickly or slowly?" Regarding DP6, the user's needs, one question was, for instance: "What could the salesperson have done or said to show that they understand your needs and wishes?" Similar questions were asked for all principles, and all were phrased open-ended to elicit meaningful answers. The chat logs were saved after each conversation, and the interviews were recorded, transcribed, and served as the foundation for the next phase of the process.
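Participants were told that a 15 % discount was possible, while the chatbot was meant to stay at or below 5 %. The saved chat logs could therefore be screened for replies that mention higher percentages. The following minimal sketch shows such a hypothetical post-hoc check; it is not part of the study's procedure. It also computes the price arithmetic for the 500 € budget scenario.

```python
import re
from typing import List

MAX_DISCOUNT = 5.0  # percent; the cap described in the study design

# Matches numbers followed by "%", "percent", or German "Prozent".
PERCENT = re.compile(r"(\d+(?:[.,]\d+)?)\s*(?:%|percent|prozent)", re.IGNORECASE)


def mentioned_percentages(reply: str) -> List[float]:
    """Return every percentage value mentioned in a chatbot reply."""
    return [float(value.replace(",", ".")) for value in PERCENT.findall(reply)]


def exceeds_cap(reply: str, cap: float = MAX_DISCOUNT) -> bool:
    """Flag a reply that mentions any percentage above the cap."""
    return any(p > cap for p in mentioned_percentages(reply))


def discounted_price(price: float, percent: float) -> float:
    """Price after a percentage discount, rounded to cents."""
    return round(price * (1 - percent / 100), 2)
```

For a 500 € phone, the promised 15 % would mean 425 €, and the intended 5 % would mean 475 €. Because the check flags any percentage, a reply such as "battery health 100 %" would be a false positive, so flagged replies still need manual review.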

D. Analysis
The next step, in accordance with the DSRM process model, was the evaluation and analysis of the collected data. The interviews were conducted in German, but the analysis was done in English. The interview quotes in the following text have been translated from German into English.

IV. RESULTS
This chapter reveals the results of a qualitative interview for each of the identified design principles.These are presented from DP1 through DP6 without implying any importance of order.As described in the chapter on methods, the design principles were used as a foil to generate questions, whose evaluations would provide us with knowledge in how far the design principles are met or what is still missing.A standard way to operationalize the mapping of the interview answers to the questions is by using codes.Codes in this understanding are the realizations or parameters of the set of questions (variables) representing the design principles.It is important to note that the interview answers were analyzed using these codes and with respect to the condition ("'long"' versus "'short"').DP1 aims at the retrieval of information about a product by the customer from a chatbot.It implies that a chatbot should reveal the following qualities (codes): appropriate length of reply, fit of reply, give correct product features, and make reasonable price suggestions (price sensitivity).Referring to the length of the agent's reply, the interviewees confirmed a generally appropriate to excellent ability.In the short condition, several interviewees indicated that keeping on asking for purchase the presented product made it unnecessarily lengthy and annoying.In the long input condition, the agent seems to be more dissipated.Heavy use of verbiage and more set phrases are reported there.This coincides with the main result from the actual fit of the agent's reply, which turned out to be independent of the condition.In both conditions, the fit of the answers concerning the technical details of a product are consistently high.Yet, the same applies to the peculiarity of jargon usage that is perceived as marketing talk.Interestingly according to the respective subjects, they confirmed not to be dissatisfied by the overuse of verbiage.Rather they had expected it and thus accepted the agent's 
behavior. The questions on product features addressed two dimensions. First, the bot retrieved relevant product features of a requested brand. Second, the retrieval task was reversed: from a set of features, the bot suggested a product. The interview answers (and the chat scripts) show that the chatbot also suggested a new feature that was likely to be relevant to the subject given what had been mentioned previously, i.e., the agent applied logic correctly. More specifically, the respective chat revealed that the subject wanted a superior camera. The agent might have learned that pictures need a lot of storage and therefore suggested a mobile with more memory. The conversational agent also communicated this interrelation explicitly. However, in another instance, concerning battery performance, the chatbot could not give any reasoning although the subject insisted, and its logic was subjectively perceived as contradictory. To sum up, no effect of the input condition could be observed here. Moreover, feature retrieval was restricted to a rather limited set of popular features that were suggested without first acquiring knowledge of the customer's needs. Even though the conversational agent was set up to grant discounts of only up to five percent, it violated this allowance, which could not be foreseen in the development phase. Still, this gave us the opportunity during the evaluation to learn how a potential client would perceive the chatbot's price sensitivity. Here, there was a clear effect of the input condition. In the short condition, the chatbot followed the specification and stuck to the five percent limit. Low discounts were not explicitly reported as a reason for dissatisfaction, but criticism of the chatbot's performance was harsh in these cases. There were two exceptions to this, each documented in one answer. One subject reported being happy because a "free" mobile cover was promised. The other subject felt acknowledged because the chatbot did an exceptionally good job of considering the subject's particular needs. The second design principle (DP2) is intended to allow a human-like interaction between the customer and the chatbot in a purchasing process. Human interaction takes place in two spheres: how something is communicated (relationship aspects) and what is said (factual level). These two spheres were captured in the codes interpersonal cooperation and rational misunderstanding. There is no clearly documented example of the latter except for two short passages that could also occur in a typical human conversation. In the short condition, owing to a typo, a subject asked "Has the display got 122 Hertz?" and the agent responded "Yes, the display has got 120 Hertz." In the long condition, one interviewee expressed some discomfort about a discount for a used device, which turned out to be a misunderstanding. Interpersonal cooperation occurred more clearly in the long condition. Here the chatbot used phrases implying emotional understanding (e.g., "I understand", "Ah, I didn't know that"; chatbot: "No problem"). In addition, the investigator could observe some correct logic. When a subject stated being a student or directly claimed to have little money available, the bot offered a larger student discount or correctly recommended cheaper brands, even second-hand offers. The third design principle (DP3) addresses a positive emotional atmosphere for the users of a conversational agent. For reasons of plausibility, the chatbot should be polite and trustworthy. It has also been shown that competence is positively correlated with a positive emotional atmosphere in sales contexts. Consequently, as a fourth code, we initially wanted to know about the emotional state of the customer and how it changed. While coding the interviews, we realized that the answers to these questions were unsatisfactory. The emotional state was reported as neutral by all subjects, and there was no indicator of any emotional shift before and after the chat. Again, we decided to leave this item out of the analysis.
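Because the five percent limit was only specified in the prompt and was nevertheless exceeded, a limit of this kind may be better enforced outside the model. The following is a minimal sketch, assuming that every generated reply passes through the application before it is shown to the customer; the regular expression, the fallback sentence and the function names are hypothetical and not part of the study's set-up. The check is deliberately crude: it treats every percentage in a reply as a discount offer.

```python
import re

# Cap taken from the study's specification (five percent).
MAX_DISCOUNT_PERCENT = 5.0

# Hypothetical pattern: matches "5%", "7.5 %", "7,5 %", "20 percent".
# Crude assumption: every percentage in a reply is treated as a discount offer.
PERCENT_PATTERN = re.compile(r"(\d+(?:[.,]\d+)?)\s*(?:%|percent\b)", re.IGNORECASE)


def offered_discounts(reply: str) -> list[float]:
    """Extract all percentage values mentioned in a generated reply."""
    return [float(m.group(1).replace(",", ".")) for m in PERCENT_PATTERN.finditer(reply)]


def enforce_discount_cap(reply: str, cap: float = MAX_DISCOUNT_PERCENT) -> tuple[str, bool]:
    """Return (text to show, ok). A reply exceeding the cap is replaced, not forwarded."""
    if any(value > cap for value in offered_discounts(reply)):
        fallback = f"I'm sorry, the best discount I can offer on this model is {cap:g} percent."
        return fallback, False
    return reply, True
```

Replacing the whole reply rather than editing the number keeps the sketch simple; a real system might instead ask the model to regenerate its answer.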
642 PROCEEDINGS OF THE FEDCSIS. WARSAW, POLAND, 2023
The agent's politeness was perceived as positive independent of the input condition. When subjects had the impression of a particularly engaged answer, or that their particular needs were considered rather than addressed with general claims, the interviewees gave the chatbot extra praise. Answers from DP1 could also be considered here and construed as indicating impoliteness, i.e., asking right away whether the customer wants to purchase the recommended product is often perceived as impolite. This is even clearer when only a small discount was provided. The perceived competence on technical details was rated as high. There seemed to be a correlation: lower discounts coincided with lower perceived competence, even though the retrieval of technical information, as shown for DP1, was rated as high throughout. Again, strong positive feedback was given when a subject felt acknowledged and their needs considered. The questions on trustworthiness confirmed an established phenomenon: trust is subconsciously connected to competence. Factual competence as defined here means giving the appropriate information on a product (see DP1). That is, a chatbot that was evaluated as competent was also considered trustworthy. Interestingly, two subjects admitted that they could not verify whether the given information was correct, but the way it was presented evidently resulted in a transfer of competence. Trustworthiness was also reported when the chatbot was perceived as a neutral informant not trying to sell a particular product. DP4 aims at handling input connoted with negative emotions. Since the chatbot had proven unexpectedly robust during our first tests with a variety of input prompts, we already suspected that DP4 was satisfied. In order to evaluate DP4 while maintaining a human-like communication flow, we decided to measure it indirectly by telling subjects that they would be able to negotiate a discount of up to 15% in total for any desired product. However, the chatbot was programmed to grant only a 5% discount. We hoped that this would provoke subjects into entering input connoted with negative emotions. Unexpectedly, several discount bugs occurred during our pre-tests, resulting in the chatbot granting a discount much larger than 15%. During our evaluation, we did not observe negative input prompts in particular, only negative emotions that the subjects expressed to the interviewer. DP5 concerns how the communication flow with the chatbot is perceived and whether language deviations of users are tolerated. Most subjects described positive feelings towards the communication flow independent of the input condition, for example: "Generally, I would say that I am very satisfied because of the details of the answers." Many subjects entered informal language, to which the chatbot was able to respond properly. Also, some subjects explicitly preferred the chatbot over a real human seller, independent of the input condition, because of the fast and detailed answers it provided, in contrast to the delayed and potentially inaccurate information a human seller in a store would give, as indicated by the subjects' personal experience. Furthermore, some subjects stated that they felt less emotional inhibition when negotiating for better discounts because they assumed during the sales dialogue that they were likely chatting with a chatbot rather than a real human. On the other hand, as mentioned before, subjects in the short input condition noticed rigid behavior of the chatbot, in particular that it would ask multiple times, and at an early stage of the sales dialogue, whether the subject wanted to buy the product. As a result, negative emotions were provoked, e.g.: "Regarding communication flow there was always such a question at the end. So, the conversation was actually always going towards if I want to buy something. And I have asked, if I what is your recommendation, it continued this way, so it was then always answered this way, just to sell something again". Some subjects in the long input condition noticed delays and indicated that these may make the communication feel unnatural, e.g.: "There were some delays, but that was not really bothersome." Another subject criticized generic questions from the chatbot and that input was forgotten: "Rather less, because I thought that she did not sickly ask further inquiry but always just e.g. 'Do you like this or that?'" DP6 aims at the user's needs. Some subjects explicitly said that the chatbot understood what they wanted and gave good advice on the product, independent of the input condition: "I have had the feeling, that he wanted to know what I want and wanted to offer me the suiting smartphone and I did not have the feeling that he does not understand what I want or what is important to me" or "Nope, so she has always looked what wishes I have and has chosen the product then based on that and let me chose. The camera, the battery time and the design was important to me". On the other hand, some subjects, independent of the input condition, noted a lack of empathy; in particular, one subject reported that his wishes for the product were not considered. Other subjects mentioned that the conversation was obviously steering towards buying the product, as mentioned above. A few subjects, independent of the input condition, explicitly indicated that they trusted the chatbot, which, as mentioned above, is connected to the chatbot's perceived competence and could have contributed to the perception that the subjects' needs were satisfied. One subject reported a misunderstanding during the sales dialogue that resulted in the chatbot shifting attention away from the topic: "At the very beginning I have said, that I do not want Samsung any more, there he offered me exactly these Samsung devices, that confused me for a short time". Notably, this subject was in the short input condition, which could have promoted the misunderstanding due to a lack of information.

V. DISCUSSION
There are two basic lessons learned from the interviews. First, monotonously asking whether the user wants to buy a product without considering the progress of the sales dialogue, i.e., at the very beginning of the chat, induces resentment rather than satisfaction. It is perceived as not very human. The longer a dialogue lasts, the more this effect disappears. This is especially apparent in the short condition, i.e., when the user makes very concise requests, where the conversational agent tends to use more verbiage and set phrases. Second, considering needs seems to be the key quality in the overall perception of the conversational agent. If needs are considered, even perceived unfair treatment (a smaller discount than others receive) is forgiven. Otherwise, a low discount coincides with low perceived competence. If the client has good reason to assume that her or his needs are taken into consideration and receives a comprehensible suggestion including the price, the discomfort effect described above is not reported either.
As discussed in the section on the theoretical background, customers find it inappropriate to be asked to buy right away and, even more strikingly, feel strong discomfort if the same question is asked repeatedly. During extensive pretesting, the chatbot did not show this behavior. However, the behavior was undeniable when the interviews were carried out. This leads to the assumption that the configuration options of GPT-3 have some influence on the bot's selling behavior. If this logic holds, the typical stages of a sales dialogue, as put forward in the theoretical background above, could be added to the algorithmic set-up of GPT-3 so that the chatbot follows the phases of a sales dialogue. This would add immense value to the authenticity of a virtual salesperson and, above all, would avoid the risk of impoliteness, which often leads customers to close the dialogue or even switch to another web store altogether.
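One way to add sales phases to the set-up, sketched here under our own assumptions, is to track the current phase outside the language model and append a phase-specific instruction to the prompt before each completion request. The phase names, the transition heuristics (number of identified needs, whether the price was mentioned, an explicit readiness signal) and the instruction texts below are hypothetical placeholders, not the phases from the theoretical background or the configuration used in the study; the GPT-3 API call itself is deliberately omitted.

```python
from enum import Enum


class SalesPhase(Enum):
    # Hypothetical phase labels; replace them with the phases from the sales literature.
    GREETING = 1
    NEEDS_ANALYSIS = 2
    PRESENTATION = 3
    NEGOTIATION = 4
    CLOSING = 5


# Hypothetical instruction appended to the prompt in each phase.
PHASE_INSTRUCTIONS = {
    SalesPhase.GREETING: "Greet the customer and ask how you can help.",
    SalesPhase.NEEDS_ANALYSIS: "Ask about the customer's needs. Do not ask whether they want to buy.",
    SalesPhase.PRESENTATION: "Recommend products that match the needs collected so far.",
    SalesPhase.NEGOTIATION: "Discuss price and discounts within the allowed limit.",
    SalesPhase.CLOSING: "You may now ask whether the customer wants to buy.",
}


def next_phase(phase: SalesPhase, needs_known: int, price_mentioned: bool,
               customer_ready: bool) -> SalesPhase:
    """Advance at most one phase per turn, using simple hypothetical heuristics."""
    if phase is SalesPhase.GREETING:
        return SalesPhase.NEEDS_ANALYSIS
    if phase is SalesPhase.NEEDS_ANALYSIS and needs_known >= 2:
        return SalesPhase.PRESENTATION
    if phase is SalesPhase.PRESENTATION and price_mentioned:
        return SalesPhase.NEGOTIATION
    if phase is SalesPhase.NEGOTIATION and customer_ready:
        return SalesPhase.CLOSING
    return phase  # otherwise stay in the current phase


def build_prompt(base_prompt: str, phase: SalesPhase) -> str:
    """Combine the static sales prompt with the instruction for the current phase."""
    return f"{base_prompt}\nCurrent sales phase: {phase.name.lower()}. {PHASE_INSTRUCTIONS[phase]}"
```

Since the purchase question is only permitted in the closing phase, early and repeated asking would be discouraged at the prompt level; whether GPT-3 reliably follows such instructions would itself have to be tested.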
The experiment and interviews on the short condition hint at another capability of the GPT-3 conversational agent that might not have been explicitly designed. The chatbot seems to converge on a similar answer length or, put differently, the agent has learned an appropriate average answer length. If this length is not reached, the bot may find it more adequate to fill its response with questions or marketing verbiage.
Reformulated as a rule: if the mean answer length is not reached, use the remaining length to pursue your purchasing goal, i.e., in its simplest form, ask a purchase question. If this behavior was learned by the deep network, its training material certainly did not include a significant amount of sales conversations. Unfortunately, little is known about the algorithmic specification and to what extent rule-based adjustments can be made. As configuration variables such as the temperature slider suggest, adjustment is indeed feasible beyond what is configured in the prompt. So, one possibility is to postpone direct purchase questions until the conversation is established. This is not meant to say that the bot should not ask questions to figure out the client's needs, which is highly appreciated. In addition, more variation in the purchase question would give it a much more human and familiar appearance.
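A prompt instruction alone may not reliably stop the model from asking. As a complementary sketch, assuming the purchase question is appended by the application rather than generated by the model, a small gate could implement the suggestions above: no purchase question before a minimum number of customer turns and before needs are known, a minimum spacing between questions, a cap per conversation, and rotating phrasings. The class, its thresholds and the question variants are hypothetical, not part of the study's chatbot.

```python
import itertools
from typing import Optional

# Hypothetical phrasings; rotating them avoids repeating the same question verbatim.
PURCHASE_QUESTIONS = [
    "Would you like to go ahead with this model?",
    "Shall I reserve this phone for you?",
    "Does this sound like the right choice for you?",
]


class PurchaseQuestionGate:
    """Decides whether a purchase question may be appended to the bot's reply."""

    def __init__(self, min_turns: int = 4, max_asks: int = 2, min_gap: int = 3):
        self.min_turns = min_turns  # hypothetical: customer turns before the first ask
        self.max_asks = max_asks    # hypothetical: maximum asks per conversation
        self.min_gap = min_gap      # hypothetical: customer turns between two asks
        self.asks = 0
        self.last_ask_turn: Optional[int] = None
        self._variants = itertools.cycle(PURCHASE_QUESTIONS)

    def maybe_ask(self, customer_turns: int, needs_known: bool) -> Optional[str]:
        too_early = customer_turns < self.min_turns or not needs_known
        too_soon_again = (self.last_ask_turn is not None
                          and customer_turns - self.last_ask_turn < self.min_gap)
        if too_early or too_soon_again or self.asks >= self.max_asks:
            return None
        self.asks += 1
        self.last_ask_turn = customer_turns
        return next(self._variants)
```

Keeping this logic outside the model makes the behavior deterministic and testable, independent of sampling settings such as temperature.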
The interview results showed that some subjects appreciated the chatbot's ability to make correct inferences, such as that a good camera calls for more memory. Rather than hastily accepting this presumption, we would like to offer an alternative explanation more in line with how current neural networks trained on large datasets operate. Characteristic of sales are up-selling and cross-selling strategies, which can almost always be encountered in real-life conversations between sales agents and customers. There is also evidence in the chat logs that both strategies are actively used: covers for mobile phones are offered (cross-selling), and more performance, i.e., more recent versions of mobile phones, is predominantly mentioned (up-selling). Instead of presuming logical inference, which it may indeed appear to be, this could just as well be learned behavior, since such patterns should be abundantly available in the training data. So it may really be a side effect of learning that turns out to have a very positive impact on customer satisfaction. The same would apply to the assumed logic reported for DP2, when the subject requested a larger discount for being a student. Here the strategy changed to down-selling, yet it still parallels sales dialogues in the real world.
There is one answer categorized under DP1 that the subject perceived as a "wrong" claim. Although this answer could also be assigned to DP2 (rational misunderstandings) or even DP3 (trustworthiness), it illustrates the problem of context and relation, which also occurs in the analog world, with the difference that there it is likely to be interpreted in favor of the agent. The answer goes as follows: "This waaas.. how is it called... this happened once for the cheaper price of a mobile for 699€. I asked, 'Is this really low-priced?' and the answer went 'Yes! It is very low-priced!' (laughs) Ehm... or also regarding the conditions in... in the production of another mobile, there it was wrong, too, then." What we can observe from the answer is that it revolves around the question of what is expensive, and thus it concerns a relative truth: relative to the subject's expectations or relative to other brands. Without context, whether an answer is "wrong" depends on interpretation. Indeed, to circumvent this misunderstanding, the conversational agent could make the reference point explicit by adding something like "compared to the other brands, it is low-priced". Alternatively, the bot could first determine the subject's expectation and then suggest a cheaper model. Still, one must admit, with reference to the answer script, that the subject's claim is decontextualized. The agent's logic of setting the price in relation to other products is comprehensible and would probably be experienced with human sales agents alike.
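The suggested remedy, stating the reference point of a price judgement, could be sketched as a helper that builds the comparison sentence from product data instead of leaving the judgement to the model. The function name, the use of the median of competitor prices as the reference, the euro currency and the optional budget check (corresponding to the alternative of first determining the subject's expectation) are our assumptions.

```python
from statistics import median
from typing import Optional, Sequence


def price_statement(model: str, price: float, reference_prices: Sequence[float],
                    budget: Optional[float] = None) -> str:
    """Phrase a price judgement together with the comparison it is based on."""
    if budget is not None and price > budget:
        # Hypothetical: the customer's stated budget takes precedence over brand comparison.
        return (f"At {price:.0f} EUR, the {model} is above your budget of {budget:.0f} EUR. "
                f"Shall I suggest a cheaper model?")
    typical = median(reference_prices)  # assumed reference: median price of comparable models
    if price < typical:
        relation = "comparatively low-priced"
    elif price > typical:
        relation = "comparatively high-priced"
    else:
        relation = "about average in price"
    return (f"Compared to similar models from other brands (median {typical:.0f} EUR), "
            f"the {model} at {price:.0f} EUR is {relation}.")
```

Making the basis of comparison explicit leaves the judgement intact but turns an apparently absolute claim into a verifiable relative one.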
DP5 considers the perception of communication flow during the entire sales dialogue. Most subjects perceived the communication flow positively and expressed satisfaction in response to our questions. Our results indicate that some subjects would prefer the chatbot over a human seller. The subjects justified this by pointing to the detailed and fast answers of the chatbot, which contributed towards a positive communication flow.
This insight was unexpected, since we had assumed that humans would prefer to chat with a real human rather than an artificial intelligence. Our results indicate that the presence of a human might not be necessary to maintain a positive emotional atmosphere, a key aspect of a successful sales dialogue [62], [63], [22], [45], [60].
DP5 also covered the handling of users' language deviations. Misunderstandings were reported only by a minority of subjects, in particular those in the short input condition. In most cases, the chatbot was able to respond properly to informal language, which also contained a few spelling and grammatical errors. Hence, DP5 is likely to be satisfied, although deviant behavior was observed that could be reduced by refining the implementation settings and restrictions of the chatbot; e.g., the chatbot could be prevented from asking a customer multiple times whether they want to buy the product. A good communication flow serves as a basis for a successful dialogue, making it less likely that a positive emotional atmosphere is endangered.
The results of DP6 are ambiguous. Some subjects had the impression that the chatbot understood their wishes regarding the discussed products, while other subjects stated the opposite, in particular because of generic answers from the chatbot. Since this was observed independent of the input condition, further attempts towards satisfying DP6 should focus on altering the implementation settings of the chatbot.
Notably, the short input condition could have resulted in misunderstandings due to a lack of information in the prompts. Identifying the buyer's needs is a core concept of a sales dialogue [52]. As mentioned before, it is connected to perceived competence. Therefore, DP6 is crucial for a successful sales dialogue, and our ambiguous results indicate that further research on DP6 is needed.

VI. CONCLUSION
The results show that the GPT-3 chatbot has the potential to conduct a human-like sales dialogue, although we observed relevant deviant behavior. In the short input condition, the chatbot generated more verbiage and asked for a purchase early, which was perceived as annoying and not human-like. An important aspect for the subjects was feeling understood in their wishes when buying a product. If the chatbot met their expectations, it did not matter that the discount was lower than announced. In contrast, if subjects did not feel understood and received a discount lower than promised, they concluded that the chatbot lacked competence. Most subjects were satisfied with the communication flow. Surprisingly, some subjects explicitly said that they would prefer the chatbot over a human seller because of its ability to respond quickly while simultaneously giving detailed, fact-based answers to their questions about smartphones.
Because of GPT-3's generative nature, the output can be unexpected and variable, which favors a human-like conversation but can also cause misunderstandings or false product information. As discussed above, a scenario in which subjects are explicitly told to enter insulting prompts could facilitate evaluating and thereby confirming DP4. Since our evaluation was qualitative and based on interviews, general conclusions on how the chatbot influences the outcome of a sales dialogue cannot be drawn. Further research is needed to confirm our assumptions. Our research question did not aim at implementing the chatbot in a real company. However, we want to point out that an actual implementation of a GPT-3 chatbot could be challenging, especially because GPT-3 is generative and may provide false payment information or misleading company information during a dialogue, which is hard to detect. We recommend using GPT-3 chatbots only to provide information about a desired product, while the actual purchase and payment transaction should be handled separately.
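The recommended separation could, for example, take the form of a thin routing layer in front of the model: messages signalling purchase intent go to a conventional checkout, and generated replies that mention payment details are never forwarded. This is a minimal sketch; the keyword patterns, the handoff texts and the `generate_reply` callback (a stand-in for whatever GPT-3 call an application uses) are hypothetical, and a real system would need a more robust intent classifier than keyword matching.

```python
import re
from typing import Callable

# Hypothetical keyword patterns; keyword matching is only a crude stand-in
# for a proper intent classifier.
PURCHASE_INTENT = re.compile(r"\b(buy|order|checkout|pay|payment)\b", re.IGNORECASE)
PAYMENT_DETAILS = re.compile(r"\b(IBAN|BIC|credit card|bank account)\b", re.IGNORECASE)

CHECKOUT_HANDOFF = "Great! I'll forward you to our secure checkout to complete the purchase."
PAYMENT_FALLBACK = "For payment questions, please use our checkout page."


def route(customer_message: str, generate_reply: Callable[[str], str]) -> str:
    """Send purchase intents to checkout; use the model only for product information."""
    if PURCHASE_INTENT.search(customer_message):
        # Deterministic path: the language model never handles the transaction.
        return CHECKOUT_HANDOFF
    reply = generate_reply(customer_message)
    if PAYMENT_DETAILS.search(reply):
        # Never forward generated text that talks about payment details.
        return PAYMENT_FALLBACK
    return reply
```

For instance, `route("I want to buy it", generate_reply)` returns the checkout handoff without calling the model at all.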
In order to improve the emotional atmosphere of the sales dialogue, the phases of a sales dialogue as mentioned above should be considered when implementing the chatbot [65], [22], [47]. The observed preference for a chatbot over a real human seller could pave the way for further research, in particular to determine whether a chatbot could even outperform a human in specific sales scenarios. Furthermore, OpenAI recently released ChatGPT, a chatbot specifically designed for dialogue, which currently uses the newest engine, GPT-4 [42], [41], [43], [30].
Further research could investigate our research question with ChatGPT instead of GPT-3, although ChatGPT is currently at an early stage and developer support is not yet fully implemented, making it susceptible to a variety of unexpected problems during implementation [42]. Another possibility would be to investigate other sales dialogue scenarios with ChatGPT [42]. As mentioned in the introduction, many chatbots are already being used in customer service, healthcare, or businesses [1], [70]. Since anthropomorphism of digital assistants involved in a purchasing process is crucial [3], the chatbot's potential to mimic a human-like conversation in the context of a sales dialogue is promising. The deviant behavior described above could likely be fixed in further research iterations. Overall, our findings support the use of GPT-3-based chatbots in the domain of sales.

VII. ACKNOWLEDGMENTS
This paper originated from our work in the project module "Project Information Systems, Socio-Technical Systems Design", which was a joint offer by the Chair of Information Systems, Socio-Technical Systems Design at Universität Hamburg, the Chair of Digital Transformation and Information Systems at Universität des Saarlandes and the Chair of Business Information Systems, esp. Intelligent Systems and Services at Technische Universität Dresden. The topic and approach of this paper were suggested by the supervisors of the project module.