Classifying Speech Acts in Political Communication: A Transformer-based Approach with Weak Supervision and Active Learning

We present a study on the automatic classification of speech acts in the domain of political communication, based on J. R. Searle's classification of illocutionary acts. Our research involves creating a dataset using the US State of the Union corpus and the UN General Debate corpus (UNGD) as data sources. To overcome limited labelled data, we employ a combination of weak supervision and active learning techniques for dataset creation and model training. Through various experiments, we investigate the influence of external and internal factors on speech act classification. In addition, we discuss the potential for further analysis of speech act usage, using the trained model on the UNGD corpus. The findings demonstrate the effectiveness of Transformer-based models for automatic speech act classification, highlight the benefits of weak supervision and active learning for dataset creation and model training, and underscore the potential for large-scale statistical analysis of speech act usage in the domain of political communication.


I. INTRODUCTION
WHEN we use language, we often want to go beyond merely conveying information: we want to accomplish a communicative goal and perform a social function, such as making a promise or issuing a threat. In his seminal work "How to do things with words", J. L. Austin called these kinds of utterances illocutionary acts, which are not acts of just saying something but acts of doing something by saying something [1]. According to Austin, there are two types of utterances: constatives, which make statements about the world, and performatives, which are acts performed by the speaker with the intention of fulfilling a social function. This concept was later developed further by Searle, and the general term speech act has since come into use for such performative utterances [2]. Unlike interrogative and imperative sentences, which are marked with either a "?" or "!" at the end of a sentence, performatives are not orthographically highlighted as such. Indeed, a central insight of Austin is that there is apparently no trivial indicator differentiating constatives from performatives. As humans, we thus have to rely on linguistic and contextual information to properly recognize performatives and to act upon them accordingly.
While speech acts have an important socio-linguistic function in everyday communication, they also play a crucial role in political communication, as politicians use them to perform specific actions and influence their audience [3,4].
Examples include promises made for the purpose of getting elected, or requests for legislative action. By analysing the use of speech acts in political discourse, researchers can gain a better understanding of the intentions and motivations of political actors and how they seek to influence public opinion.
In this study, we present a robust computational framework for detecting and correctly identifying speech acts in political communication, based on large Transformer-based language models, for automated and scalable analysis of speech acts in large datasets. The contributions of this paper are as follows:
• Our approach is based on Transformer-based language model classifiers [5], and we employ a combination of weak supervision and active learning techniques for dataset creation and model training.
• Through various experiments, we investigate the influence of external and internal factors on speech act classification.
• We show a concrete use case that demonstrates the use and potential of automatic speech act annotation in large text corpora containing political communication.
The use of speech acts in political communication has so far received little attention despite their pervasiveness and utility in this mode of communication (see [6,7]).
Underlying this concept is the idea that the interconnected network of deontic moral forces within society is established and sustained through specific types of speech acts. J. R. Searle postulates that declarative speech acts, in particular, play a pivotal role in creating institutions and institutional facts. Nevertheless, this idea has not yet resulted in significant changes, possibly due to the substantial challenges involved in operationalizing his ideas for empirical research. The current literature on the use of speech acts in political communication has mainly focused on individual sentences or small corpora, lacking a larger, macroscopic perspective. However, with the advent of digital resources that are freely accessible and machine-readable, there is an opportunity to analyse larger datasets of political speeches, for instance parliamentary debates [8] or speeches of individual politicians [9]. This availability of material opens up new perspectives for more empirical research questions in political science, international relations, computational social science and corpus linguistics.
An issue surrounding this topic is the vast amount of data that needs to be processed in order to gain valuable insights into macroscopic patterns of speech act use among political actors. Especially since certain types of utterances are quite infrequent and sporadic, manual sampling would not be able to produce representative distributions for quantitative evaluations in any reasonable amount of time. With the advances in computational approaches to language phenomena made in Natural Language Processing (NLP) and Artificial Intelligence (AI) in recent years, it is now possible to find this language phenomenon automatically and with good quality in large amounts of data.
Future applications will include the quantitative analysis of political communication in international institutions [10]. This will provide a computational method for studying how to do international politics with words [11]. With the methodology shown here, research questions such as the following become conceivable: How can speech act theory be applied to understand declarations of war, inaugurations, pardons, government statements, etc.? How do international obligations and the self-binding of sovereign states to international norms and rules emerge under conditions of anarchy?

II. RELATED WORK
Automatic speech act classification has been a subject of research for some time, focusing on dialogue acts [12,13,14]. Earlier studies on Korean employed various methods, including Hidden Markov Models [15], maximum entropy models [16], and supervised machine learning algorithms [17]. Unlike Austin's and Searle's speech acts, dialogue acts specifically target synchronous language used in direct communication, and many of the classification schemes used in research do not align with Austin's or Searle's speech act classification.
In recent years, researchers have increasingly utilized Deep Neural Networks (DNNs) in Natural Language Processing (NLP) due to advancements in computing capabilities. This has led to the adoption of more sophisticated approaches in addressing the issue at hand. Notably, recurrent neural networks (RNNs) [18] and convolutional neural networks (CNNs) [19] have been explored as viable options for tackling the challenge of speech act detection. Existing research on speech and dialogue act detection primarily focuses on synchronous language in online communication. However, little attention has been given to speech act detection in political communication, with notable exceptions such as [20]. The emergence of the Transformer architecture [5] and pretrained Transformer-based language models like BERT [21] has significantly changed the way NLP is practised, and automated processing of even difficult language problems is becoming increasingly possible. To the best of our knowledge, no previous work exists on classifying speech acts according to Searle's classification of illocutionary acts.

III. EXPERIMENT DESIGN
In our experiments, we developed a Transformer-based model specifically designed to classify sentences according to Searle's classification of illocutionary acts, referred to as "speech acts" in this study. We constructed the dataset using weak supervision and further refined it through active learning techniques. Additionally, we explored possible avenues for future research, which encompass improving the model's performance and assessing its applicability and relevance in computational political science studies.

A. Corpora
To construct a suitable dataset for model training and initial analysis, we selected two corpora from similar yet distinct sub-domains of political communication: the US State of the Union (SOTU) addresses and the speeches delivered at the UN General Debate. The SOTU corpus comprises the State of the Union presidential addresses given annually from 1790 to 2017. In our experiments, we limited the SOTU corpus to the years 1990 to 2017. The UN General Debate corpus (UNGD) comprises speeches delivered by national representatives, including presidents and foreign ministers, from each UN member country during the annual UN General Debate [8]. Spanning the years 1970 to 2020, this corpus provides a rich collection of political speeches from a diverse range of international stakeholders. Aside from the transcriptions of speeches in English, the UNGD corpus also includes a variety of metadata for each speech, such as the year and session of the UN General Debate in which the speech was held, the name of the speaker, their position, the original language of delivery, and the represented country. We reference individual speeches in the UNGD corpus in a standardized format, using the year of the speech and the three-letter ISO 3166-1 alpha-3 code of the country the speaker represents, such as "1979 IRQ" for the 1979 speech of the representative of Iraq.
We formulated the task of identifying speech acts as a multi-class classification problem. To achieve this, we employed the theoretical framework of illocutionary acts introduced by Searle [2] and identified five classes of speech acts and one open class for sentences where none of the labels can be assigned:
• ASSERTIVE, the speaker commits to the truth of a stated expression.
• EXPRESSIVE, the speaker expresses their personal thoughts and feelings.
• COMMISSIVE, the speaker commits themselves to a future action.
• DIRECTIVE, the speaker issues orders or instructions to the recipients.
• DECLARATIVE, the speaker, using granted institutional powers, alters or defines (social) realities.
• NONES, an open class for sentences that contain none of the above speech act types.

B. Heuristic Labelling using Weak Supervision
When creating a dataset to train our model, we need to define how labels are assigned on the basis of the linguistic properties of each class. To assign speech act annotations to linguistic expressions exhibiting these properties, we use the weak supervision library skweak [22]. This library enables the generation of so-called weak labels through annotator functions. Although these labelling functions rely on simple heuristics and may not achieve particularly high precision, they enable us to search efficiently for speech acts employed in political communication during the dataset creation process.
A crucial aspect of Austin's seminal work on speech acts is the notion that certain verbs are closely linked to a specific class of speech act; he called these performative verbs. We follow this basic concept closely when defining the relevant linguistic features for speech acts, in order to define heuristics for the labelling functions. We chose the following verb associations to supervise the labelling process:
• Assertives: think, know, believe, convince, presume, assume, admit
• Expressives: apologize, condole, lament, deplore, forgive, welcome, thank, boast
• Commissives: promise, vow, guarantee, offer, will, refuse, volunteer
• Directives: ask, order, command, request, beg, plead, pray, invite, permit, advise, must, should
• Declaratives: announce, declare, dismiss, nominate, pronounce, pass, adopt, support, oppose, advocate, condemn
To capture general speech act features, we developed several labelling functions that assign a general "speech act" label [SA] to matching sentences. This approach allows us to assign weak labels that cover general features not confined to a specific speech act. Sentences containing performative verbs are assigned the speech act label considered characteristic of them, as shown in the list above. We derived the labelling rules from sample sentences in Austin's work "How to do things with words" [1]. Note that multiple labelling functions can apply to a single sentence if their criteria are met. In a final step, these weak labels, which represent a supervision signal, are aggregated to form final labels. The aggregation model takes into account the varying degrees of confidence associated with each weak label. By leveraging the sequential dependencies of the weak labels between sentences, the aggregation aims at a more accurate labelling outcome. The labelling functions can be briefly described as follows:
• Subject is 1st person: Assigns the label [SA] if the subject of a sentence is either "I" or "we".
• Main verb is present tense: Assigns the label [SA] if the main verb of a sentence is in present tense.
• Object is 2nd person: Assigns the label [SA] if the object of the sentence is the pronoun "you".
• Sentence contains imperative: Assigns the label [DIRECTIVE] if the sentence is an imperative without an overt subject.
• Sentence contains interrogative: Assigns the label [DIRECTIVE] if the main verb precedes the subject and the sentence ends with a question mark "?".
• Sentence contains performative verb: Assigns the label associated with the performative verb [ASSERTIVE, EXPRESSIVE, COMMISSIVE, DIRECTIVE, DECLARATIVE] if the lemma of a performative verb is present in the sentence.
Processing the SOTU as well as the UNGD corpus using this weak supervision approach yields a list of sentences that likely contain speech acts.
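To make the heuristics above more concrete, the following is a minimal plain-Python sketch of two of the labelling functions. It deliberately does not use skweak or a syntactic parser: the function names are hypothetical, surface-form matching stands in for lemmatization, and "first token is I or we" is only a crude approximation of a parsed subject. The verb lists are those given in this section.

```python
import re

# Performative verbs per class, as listed in this section.
PERFORMATIVE_VERBS = {
    "ASSERTIVE": {"think", "know", "believe", "convince", "presume", "assume", "admit"},
    "EXPRESSIVE": {"apologize", "condole", "lament", "deplore", "forgive", "welcome",
                   "thank", "boast"},
    "COMMISSIVE": {"promise", "vow", "guarantee", "offer", "will", "refuse", "volunteer"},
    "DIRECTIVE": {"ask", "order", "command", "request", "beg", "plead", "pray", "invite",
                  "permit", "advise", "must", "should"},
    "DECLARATIVE": {"announce", "declare", "dismiss", "nominate", "pronounce", "pass",
                    "adopt", "support", "oppose", "advocate", "condemn"},
}

# Reverse lookup: verb -> class label.
VERB_TO_LABEL = {
    verb: label for label, verbs in PERFORMATIVE_VERBS.items() for verb in verbs
}


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def performative_verb_labels(sentence):
    """Return the class labels whose performative verbs occur in the sentence.

    Assumption: exact surface-form matching replaces lemma matching.
    """
    return sorted({VERB_TO_LABEL[t] for t in tokenize(sentence) if t in VERB_TO_LABEL})


def first_person_subject(sentence):
    """Return ["SA"] if the sentence starts with "I" or "we" (crude subject proxy)."""
    tokens = tokenize(sentence)
    return ["SA"] if tokens and tokens[0] in {"i", "we"} else []


def weak_labels(sentence):
    """Collect all weak labels that fire for one sentence."""
    return first_person_subject(sentence) + performative_verb_labels(sentence)
```

Several labels can fire for one sentence, which mirrors the remark that multiple labelling functions may apply; resolving them into a single label is left to the aggregation step.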

C. Active Learning
Active learning is an approach that utilizes query strategies to select the most informative samples from an unlabelled pool of data, guided by a classifier trained on a set of existing labelled data. The goal is to intelligently choose data points for labelling, in order to improve the performance and efficiency of the learning process. Over the years, many query strategies have been proposed by various researchers, most of them prediction-based [23]. Active learning follows a process that involves two pools: a labelled pool and an unlabelled pool. The goal is to transfer data from the unlabelled pool to the labelled pool. In prediction-based query strategies, a primary model is trained on the labelled pool and then used to identify the most informative samples to query from the unlabelled pool. These queried samples are presented to a domain expert (also known as the oracle in active learning jargon), who manually confirms or rejects the labels of the primary model. The labelled samples are subsequently added to the labelled pool. This process continues iteratively until a predefined stopping criterion is met. With this approach, active learning allows the progressive enhancement of the model's performance by actively selecting the most informative samples for annotation.
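The pool-based process described above can be sketched as a short loop. This is an illustrative skeleton only, not the interface of any particular library: `train`, `query` and `oracle` are hypothetical callables supplied by the caller, and the default iteration count and batch size are placeholders.

```python
def active_learning_loop(labelled, unlabelled, train, query, oracle,
                         n_iterations=10, batch_size=20):
    """Move samples from the unlabelled pool to the labelled pool.

    labelled   -- list of (sample, label) pairs, extended in place
    unlabelled -- list of samples, shrunk in place
    train      -- callable: labelled pool -> model
    query      -- callable: (model, unlabelled pool, k) -> list of k samples
    oracle     -- callable: (model, sample) -> confirmed or corrected label
    """
    model = train(labelled)
    for _ in range(n_iterations):  # stopping criterion: fixed iteration count
        if not unlabelled:
            break
        for sample in query(model, unlabelled, batch_size):
            unlabelled.remove(sample)
            labelled.append((sample, oracle(model, sample)))
        model = train(labelled)  # retrain on the augmented labelled pool
    return model, labelled
```

Passing the components as callables keeps the loop independent of the classifier and the query strategy, so either can be swapped without touching the control flow.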
We used the active learning library Small-Text [24] to facilitate experiments with active learning. This library provides a user-friendly and consistent interface, which makes it very simple to set up the experiments. After some initial testing, we selected the prediction-based query strategy Prediction Entropy and set a fixed number of 10 iterations as the stopping criterion. In each iteration, we queried 20 samples.
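As an illustration of an entropy-based query strategy, the sketch below ranks unlabelled samples by the Shannon entropy of their predicted class distribution and returns the indices of the most uncertain ones. It is a standalone approximation of the idea, not the Small-Text implementation; the function names are hypothetical.

```python
import math


def prediction_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def query_most_uncertain(pool_probs, n_queries=20):
    """Return indices of the n_queries samples with the highest entropy.

    pool_probs -- one list of class probabilities per unlabelled sample
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: prediction_entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:n_queries]
```

A near-uniform prediction has high entropy and is queried first, whereas a confident prediction contributes little new information and is ranked last.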
To effectively utilize the library, it is essential to choose a Transformer-based language model as the underlying framework for the classification task. Among the various models we tested, we ultimately opted for the light-weight DistilBERT [25] with the default configuration provided through Hugging Face. This decision was primarily driven by DistilBERT's significantly reduced computational overhead, with minimal performance drawbacks compared to the resource-intensive BERT model. Since the iterative approach involves many training runs, the lighter model accelerates them significantly.
The results from the weak supervision process were subsequently filtered by a domain expert, resulting in a valid semi-manual training dataset for initializing the active learning process. Although the linguistic features serve as a basic codebook for annotating queried samples, a considerable number of samples proved to be non-trivial. Given the fuzzy nature of speech act use and interpretation, disagreements are unavoidable. For training, we stratified the dataset by class and performed a split, allocating 60% of the data for training and 40% for evaluation. This 60:40 split was chosen to ensure that both sets capture a representative sample of each class. By evaluating each iteration's performance using F1 scores for individual classes as well as the macro average, we could effectively track the model's progress over iterations and visualize it through a learning curve.
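For reference, the per-class and macro-averaged F1 scores used to track each iteration can be computed as in the following sketch. The function names are hypothetical, and a class with no true or predicted instances is assigned an F1 of 0 here, which is one common convention rather than something stated in the text.

```python
def f1_per_class(y_true, y_pred, labels):
    """Return a dict mapping each label to its F1 score."""
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    scores = f1_per_class(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)
```

Because the macro average weights every class equally, rare classes influence the score as much as frequent ones.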

A. Active Learning and Classifier Performance
After applying the weak supervision approach and correcting its output, our initial dataset consisted of 175 samples in the labelled pool and 118 samples in the evaluation set. Once the initial model was trained on the samples in the labelled pool, we further refined it through the active learning process. In each iteration, we queried, annotated, and then added 20 newly labelled samples to the labelled pool. After each addition, the model was retrained using the augmented labelled pool. This iterative process continued until the stopping criterion of 10 iterations was met.
As Fig. 1 shows, over ten active learning iterations the macro-averaged F1 score rose from 0.69 for the initial model to 0.78 for the best model in the ninth iteration, before dropping somewhat in the tenth iteration. In Table I

B. Adding context sentences in the training data
The current model employed in our research uses single sentences as the units of classification. While this approach may simplify the classification task, it is not evident that speech acts occur independently of each other. Therefore, it is important to acknowledge the potential limitations of this local single-sentence approach.
To better understand the relationships between single sentences, we visualized the occurrence of all speech acts in five randomly selected speeches. Figure 2 reveals a strong clustering behaviour in the use of speech acts, even in this small sample.
We conducted an additional analysis by calculating the pointwise mutual information (PMI) between each speech act and its neighbouring co-occurring speech acts in a sample of 100 randomly selected speeches. The PMI measures the statistical association between two events, in this case speech acts and their neighbouring counterparts. Positive PMI scores indicate a higher than random likelihood of co-occurrence, whereas negative scores indicate a lower than expected likelihood of co-occurrence. Our analysis revealed a considerably higher co-occurrence between speech acts of the same class than would be expected by random chance. Furthermore, we observed moderately positive associations between directives and commissives, as well as between expressives and commissives. These findings suggest that considering the context of neighbouring sentences might be crucial for accurate classification of speech acts.
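A PMI computation over neighbouring speech acts could look like the sketch below. The exact definition of "neighbouring" is not spelled out in the text, so this sketch assumes directly adjacent sentence pairs within a speech and uses $\mathrm{PMI}(a,b) = \log_2 \frac{P(a,b)}{P(a)\,P(b)}$; the function name is hypothetical.

```python
import math
from collections import Counter


def neighbour_pmi(speeches):
    """PMI between the labels of directly adjacent sentences.

    speeches -- list of speeches, each a list of speech act labels in order
    Returns a dict mapping (left_label, right_label) to its PMI in bits.
    """
    pair_counts, left_counts, right_counts = Counter(), Counter(), Counter()
    for labels in speeches:
        for a, b in zip(labels, labels[1:]):
            pair_counts[(a, b)] += 1
            left_counts[a] += 1
            right_counts[b] += 1
    total = sum(pair_counts.values())
    # P(a,b) / (P(a) P(b)) simplifies to count(a,b) * total / (count(a) * count(b)).
    return {
        (a, b): math.log2(count * total / (left_counts[a] * right_counts[b]))
        for (a, b), count in pair_counts.items()
    }
```

Pairs that never occur are simply absent from the result, since their PMI would be negative infinity.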
To test this hypothesis, we investigated the effectiveness of extending the classification unit by incorporating a window of neighbouring context sentences around a sentence from the training set. This approach enables the classifier to capture the contextual information surrounding a sentence. However, as depicted in Figure 3, our findings indicate that this approach did not yield a substantial improvement over using a single sentence window.
PROCEEDINGS OF THE FEDCSIS. WARSAW, POLAND, 2023

C. Feature Importance of Metadata
In the preceding section, we demonstrated the significance of linguistic features in the classification of speech acts, as evidenced by the rather promising results obtained from the Transformer model. In addition to the raw speeches, the UNGD corpus includes various metadata, such as the name and position of the speaker, the country represented, and the original language of the speech. To explore the potential significance of metadata in relation to the occurrence of speech acts within the speeches, we trained a separate model in the form of a Random Forest (RF) regressor. This model aims to predict the distribution of each speech act class based on the available metadata alone. The objective of this approach is to examine the potential influence of extra-linguistic features, such as metadata, on the distribution of speech acts by examining the importance of each feature. The hyper-parameters for the RF model were left at the default values provided by scikit-learn, which consist of 100 trees and the default impurity criterion.
We compared the performance of the trained model to a dummy baseline model and observed a substantial improvement in goodness of fit for commissives (reducing the mean squared error from 10.75 to 2.4) and directives (reducing the mean squared error from 42.8 to 9.1). These results suggest a strong association between the metadata and the distribution of commissives and directives.
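The comparison against a dummy baseline can be illustrated as follows. The text does not specify the baseline strategy; this sketch assumes a baseline that always predicts the mean of the training targets, and the function names and toy numbers are purely illustrative.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def dummy_mean_baseline(y_train, n_predictions):
    """Predict the training mean for every test item (assumed baseline strategy)."""
    mean = sum(y_train) / len(y_train)
    return [mean] * n_predictions


# Toy illustration with invented values, not data from the study.
y_train, y_test = [2.0, 4.0], [3.0, 5.0]
baseline_mse = mean_squared_error(y_test, dummy_mean_baseline(y_train, len(y_test)))
model_mse = mean_squared_error(y_test, [3.0, 4.5])  # hypothetical model predictions
```

A model is only informative to the extent that its error falls below that of such a metadata-blind baseline.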
To identify the most influential features, we conducted an analysis of feature importances in the model. In RFs, feature importance quantifies the average decrease in impurity across all trees for each feature [26]. Figure 4 showcases the results of this analysis for commissives and directives, providing interesting insights. The feature importance analysis indicates that certain variables have a notable impact on the distribution of speech acts. For commissives, the role of the speaker is a significant variable, with the category of president standing out prominently. Additionally, the country represented by the speaker also plays a crucial role, with USA and Japan being particularly influential factors.
Similarly, for both commissives and directives, the original language of the speech exhibits a strong effect on the distribution of speech acts. Notably, speeches delivered in English demonstrate a substantial influence on the observed distributions.

D. Linguistic Features for Qualitative Assessment
A long-standing shortcoming of deep neural network architectures such as Transformers is the black-box nature of these models. Different approaches under the banner of Explainable AI, such as CAPTUM [27] and Shapley values [28], have been investigated in order to elucidate which features are important in a model's prediction. In our investigations, we used CAPTUM as implemented in the Transformers Interpret library. This approach assigns attribution scores to each of the (sub-)token features, which quantify the importance of each feature for the classification of a particular class. These scores are scaled between 1, indicating maximum attribution for classifying the input as a particular class, and -1, indicating that the feature is strongly associated with a class other than the predicted one.
First, we look at the examples with the highest confidence in order to understand whether the words shown in Section III-B, each typical of the respective speech act types, are also important features for classification. Second, we investigated which kinds of samples receive the lowest prediction probability, in the hope of identifying features that cause uncertainty in the model's predictions. We see a strong connection between Austin's performative verbs and the words that are identified as important features through the training process. As the examples shown in Table 5 demonstrate, the most important features are performative verbs that fit almost prototypically with the examples shown earlier in Section III-B. For example, the phrases we will for commissives, we support for declaratives or let us for directives contain typical performative verbs that are also found in our dataset and are also important features in the classification (see Figure 5). Of interest, however, are the words that do not belong to the group of performative verbs but are nevertheless learned as positive features for each speech act type, as shown in Figure 5.
Here, assertives have strong thematic connections to international relations (e.g. international trade). Other examples of assertives from the dataset also contain words like economy or cooperation. Commissives, on the other hand, are strongly connected to a vocabulary of promises regarding problems to be solved (e.g. corruption squad, road to peace). Declaratives often include country names, as some utterances emphasize support for different states (Horn of Africa, region). Directives and expressives function somewhat differently here. Used as rhetorical or linguistic means of communication, expressives typically address specific persons or express appreciation (ambassador, president, gratitude). In the case of directives, we often find statements in which speakers allow themselves a comment or make a motivating request (reason before bloodshed).
For the low-confidence example shown in Figure 5, it turns out that other, rather unspecific words also receive negative attributions and appear in greater numbers. This accumulation of negatively attributed words is very characteristic of low-confidence classifications in the dataset, although no specific thematic or communicative connections or patterns are recognizable.
This observation opens up perspectives for further use of speech acts. First, certain speech acts can be expected in certain concrete thematic or regional contexts, and one can match this expectation against real political communication (assertives, declaratives, commissives). Discrepancies or patterns can then be described for different discourses and thus allow for a qualitative assessment. Second, rhetorical devices can be derived as a component of strategic communication (expressives, directives). This is especially interesting, since analyses are conceivable here that refer to power relations between the actors involved.

E. Showcase: speech acts as time series data
This section emphasizes the practical application of the model through a demonstrative analysis of the UNGD corpus. We use sentence-based annotations of speech acts to quantitatively measure the shift in communicative patterns within the UN General Debate. Specifically, we want to demonstrate a method for exploring potential influences of significant political events on speech act usage patterns. It is important to note that this demonstration serves as an illustrative example rather than a rigorous scientific investigation of speech act usage. By highlighting the viability of our approach, we aim to emphasize its potential as a valuable research tool in the political and social sciences. For this purpose, we decided to focus on the political events that unfolded in Ukraine from 2013 to 2014. A series of protests and civil unrest, later known as the Maidan Uprising, began on November 21, 2013, primarily centred around Kyiv's Maidan Nezalezhnosti (Independence Square) [29]. These protests ultimately led to the Revolution of Dignity in February 2014, albeit with a heavy toll of 108 protesters and 13 police officers losing their lives in the clashes [30]. Consequently, the Ukrainian parliament, with a significant majority of 328 out of 450 votes (approximately 73% of the votes), voted to remove President Viktor Yanukovych from power. Yanukovych disputed the legality of the vote and sought assistance from Russia. Russia criticized the events, labelling them a "coup". Subsequently, pro-Russian protests erupted in southern and eastern Ukraine. Russia occupied and eventually gained control of Crimea, while armed pro-Russian groups seized government buildings and declared Donetsk and Luhansk independent states. This resulted in the onset of the Donbas war, prompting diplomatic efforts within the international community to address and resolve the unfolding events.
To investigate the change in strategic political communication following those events, we analysed how Ukrainian representatives in the UN General Debate used directives, which primarily attempt to motivate the addressees to take an action. By choosing this scenario, we aim to clearly demonstrate the ways in which events can affect the language used in international political discourse by identifying and quantifying speech acts as an indicator variable.
To identify a statistically significant change in usage patterns of directives between the period starting in 2014 and the most recent available speech in 2019, we utilized the R library CausalImpact [31]. Specifically, we compared the directive usage frequency of Ukrainian representatives after the Maidan events (2014-2019) with that during the period before the events, which spans from the dissolution of the Soviet Union in 1991 up until 2013. The software creates a Bayesian structural time-series model, which is used to predict how the response would have evolved after an event, called the intervention, if the event had never occurred. The result is the cumulative effect of the intervention on the time series.
In the model, we see a positive effect on the use of directive speech acts in the period between 2014 and 2019. The first panel in Figure 6 showcases the observed data and a calculated prediction derived from the data of the pre-intervention period. The pre-intervention period is enclosed by two dotted vertical lines and serves as the baseline for comparison with the post-intervention period. The second panel shows the pointwise causal effect of the model, which is the calculated difference between the observed data and the prediction. The third panel presents the cumulative effect of the model, which is the summation of the pointwise causal effect over time. The positive effect on the use of directive speech acts observed during the period after the Maidan protests in Ukraine in 2014 is statistically significant and unlikely to be due to random fluctuations. It is important to note that during this period, not only did the frequency of directive speech acts increase significantly, but the speeches also showed a general increase in length. This suggests that the involvement and activation of the international community by Ukrainian representatives has increased and become a larger part of Ukraine's political communication.
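The relationship between the second and third panels can be made explicit with a small sketch. It does not reproduce the Bayesian structural time-series model of CausalImpact: the counterfactual prediction is simply taken as a given input, the function name is hypothetical, and the numbers in the test are invented for illustration.

```python
from itertools import accumulate


def causal_effects(observed, predicted):
    """Pointwise and cumulative effect over the post-intervention period.

    observed  -- observed values after the intervention
    predicted -- counterfactual predictions for the same time points
    """
    pointwise = [o - p for o, p in zip(observed, predicted)]
    cumulative = list(accumulate(pointwise))  # running sum of pointwise effects
    return pointwise, cumulative
```

The last element of the cumulative series corresponds to the total effect of the intervention over the analysed period.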

V. DISCUSSION
In this paper, we investigate for the first time how the linguistic phenomenon of speech acts described by Searle can be automatically identified in text corpora using a transformerbased approach and be made accessible for quantitative and qualitative analyses.In doing so, the study focuses on the identification of speech acts within very large datasets of political communication, which makes the results interesting for political science research.Furthermore, the study presents very transparently which linguistic properties are included for the definition of speech acts.Our experiments demonstrate the effectiveness of using a combination of weak supervision and active learning for dataset creation in the classification of speech acts.Weak supervision enables the identification of samples that satisfy defined linguistic features, while active learning can provide new examples from an unlabelled dataset.The results show a 0.09 increase in F1 score within 9 iterations of the model, indicating the usefulness of these techniques in expanding small datasets and improving model performance.The problems we encountered in identifying expressives and declaratives in terms of recall indicate that there may be variance between the examples in the training data and the test data, or the linguistic characteristics of these speech acts may be ambiguous, making it difficult for the classifier to properly delineate between these and other classes.However, evaluating the entire process still poses challenges, especially with the limitations of our evaluation set, which has a rather small size.The properties of this set might introduce a high degree of variance and possible biases in the evaluation.To address these challenges, future research will focus on developing a robust evaluation framework and larger evaluation sets for the methods employed.
Aside from linguistic features, our observations indicate that extra-linguistic factors play a role in influencing the distribution of speech acts. Specifically, variables such as the speaker's role, the country represented, and the original language of the speech have shown significant influence on the occurrence and distribution of speech acts. These findings emphasize the importance of considering contextual factors when studying speech acts in political discourse. Understanding the impact of these variables can contribute to the development of more accurate speech act classification frameworks and provide valuable insights into the factors shaping the use of speech acts.
Additionally, our findings provide tentative evidence of relationships between speech acts, as utterances of the same class tend to occur in proximity to each other. This observation not only contributes to our understanding of how speech acts are used in actual communication, but also provides valuable insights for the classification task. While our initial attempts at utilizing neighbouring utterances did not yield satisfactory results, further exploration in this area holds promise for enhancing the accuracy of speech act classification. One possible approach is to employ techniques such as linear chain Conditional Random Fields (CRFs) [32] or similar sequential models. Such models can capture the sequential dependencies among speech acts and exploit the contextual information provided by neighbouring utterances. Incorporating them could improve classification performance by taking the sequential nature of speech acts into account.
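To illustrate how such a sequential model exploits neighbouring utterances, the following minimal sketch shows only the decoding step of a linear chain model (Viterbi decoding). The per-utterance scores and the transition scores are hypothetical; in a CRF they would be learned from data, and the training step is omitted here.

```python
def viterbi(emission_scores, transition_scores):
    """Return the highest-scoring class sequence for a chain of utterances.

    emission_scores[t][j]: score of class j for utterance t.
    transition_scores[i][j]: score of class i being followed by class j.
    """
    if not emission_scores:
        return []
    n_classes = len(transition_scores)
    best = list(emission_scores[0])
    backpointers = []
    for scores in emission_scores[1:]:
        step_best, step_back = [], []
        for j in range(n_classes):
            # Best previous class when the current utterance has class j.
            prev = max(range(n_classes),
                       key=lambda i: best[i] + transition_scores[i][j])
            step_back.append(prev)
            step_best.append(best[prev] + transition_scores[prev][j] + scores[j])
        best = step_best
        backpointers.append(step_back)
    # Follow the backpointers from the best final class.
    last = max(range(n_classes), key=lambda j: best[j])
    path = [last]
    for step_back in reversed(backpointers):
        last = step_back[last]
        path.append(last)
    path.reverse()
    return path


# Hypothetical scores for three utterances and two classes
# (0 = assertive, 1 = directive).
emissions = [[2.0, 0.0], [0.4, 0.6], [2.0, 0.0]]
# Transitions that reward staying in the same class.
transitions = [[1.0, -1.0], [-1.0, 1.0]]

independent = [max(range(2), key=lambda j: row[j]) for row in emissions]
print(independent)                      # [0, 1, 0]
print(viterbi(emissions, transitions))  # [0, 0, 0]
```

In this toy example, the weakly supported directive label of the middle utterance is overruled by its neighbours, which mirrors the observation that utterances of the same class tend to occur consecutively.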
The application of our model to the use of directives by the representatives of Ukraine demonstrates the potential of utilizing the trained model for quantitative research. We presented a statistical analysis technique to explore the impact of political events on speech act usage patterns. By examining specific periods and comparing them to relevant baselines, we can gain insights into how political events shape language use in international discourse. The approach presented here can be extended to investigate other geopolitical events and their influence on speech act patterns. This opens the potential for a deeper understanding of the dynamics of political communication, which is crucial for researchers and practitioners in fields such as political science, international relations, and diplomacy. In addition, our analysis of the relevant features with CAPTUM demonstrated that the qualitative evaluation of speech acts has the potential to enhance our understanding of the strategic use of speech acts. Automatic speech act detection can thus contribute to various fields by facilitating empirical studies on persuasive strategies employed by political actors. It not only enables the monitoring of shifts in rhetoric and discourse, but also provides insights into the motivations and intentions behind political speeches. By employing and further developing this technology, researchers can delve deeper into the intricate dynamics of political communication and gain a comprehensive understanding of the strategies and objectives employed by politicians.

Figure 1. Learning curve of macro F1 performance on the test set over all classes.

Figure 2. Speech act usage pattern over the course of a speech. The bars represent a speech in full length and the colours mark the respective speech act. The same speech acts often occur consecutively in the sequence.

Figure 5. Example sentences visualized with the library CAPTUM [27], where the positively contributing features are shown in green and the negatively contributing features are shown in red. The brightness indicates the importance of the features. The tokenization is taken from the transformer model.

In Table I, we show detailed information about the classification performance of the model trained in the ninth iteration. Examining the individual F1-scores, we noticed large discrepancies between the highest score of the commissive class and the lowest score of the expressive class. In terms of precision, assertives show medium performance, indicating that the training examples provide very ambiguous features to the classifier. In terms of recall, expressives and declaratives show comparatively medium performance. In the subsequent experiments presented in sections IV-B, IV-D as well as the showcase demonstrated in section IV-E, we utilized the model trained during the 9th iteration, which incorporated an additional 180 training samples obtained through the active learning process.

Table I. Performance of the classifier from the 9th active learning iteration over all speech act classes.

Figure 4. Importance of metadata in commissives and directives.