Efficient Deep Learning Approach for Olive Disease Classification

Since ancient times, olive tree cultivation has been one of the most important agricultural activities in Mediterranean countries. In recent years, the role of Artificial Intelligence in agriculture has been growing: its uses range from soil monitoring, irrigation management and yield prediction to autonomous agricultural robots and weed and pest classification and management, for example from pictures taken with a standard smartphone or an unmanned aerial vehicle. All of this eases human work and makes it more accessible. In this work, a method is proposed for olive disease classification, based on an adaptive ensemble of two EfficientNet-b0 models, that improves the state-of-the-art accuracy on a publicly available dataset by 1.6-2.6%. Our method reduces complexity by roughly 50% in terms of the number of parameters and by roughly 80% in terms of the number of operations, bringing it to a level comparable to that of models from a decade ago. Thanks to its efficiency, the method can also be embedded into a smartphone application for real-time processing.


I. INTRODUCTION
Olive tree cultivation represents one of the most important agricultural activities for the civilizations of the Mediterranean area. Indeed, the countries of this area have produced roughly 65% of the world's olive oil in recent years [1]. Olive-derived products have shown health benefits due to their compounds [2]. In addition, olive trees are known to adapt to environmental stresses such as salinity, drought, heat and high levels of ultraviolet B rays [3], [4], [5], generating, over the millennia, 600 species within 25 genera [6]. However, even olive trees are affected by diseases: some are visible on the fruits and occur only during specific periods of the year, while others show visible signs on the leaves [7]. The signs of a disease can differ between hosts and can evolve over time.
Although olive cultivation techniques have been perfected over the centuries, artificial intelligence has only recently entered the olive industry, bringing significant innovations and improving the management of many tasks, such as crop yield prediction, plant health monitoring, disease prevention, identification and classification, irrigation management, and the monitoring and management of agricultural activities [8], [9], [10] (e.g. sowing, harvesting, pruning), including olive diseases [11], [12]. We propose here a highly efficient solution that classifies olive diseases affecting leaves directly from images taken with standard smartphone cameras. This paper is organized as follows: in Sec. II, the dataset used for the experiments is described; in Sec. III, the solutions and the experimental setup are described, while results are shown in Sec. IV. The paper ends with a discussion in Sec. V and conclusions in Sec. VI.

II. DATASET DESCRIPTION
To test our solution, the largest publicly available dataset [13] has been used: it is composed of 3400 images of olive leaves that are either healthy or affected by Aculus olearius or Olive peacock spot. Tab. I shows the distribution of the classes, while a sample of images for each class is shown in Fig. 1.

A. EfficientNet
We selected EfficientNet-b0 [14] as the core model because, given its structure and its reported results, it offers the best accuracy/complexity trade-off. Two main factors make this architecture efficient. The first is compound scaling (Fig. 2), by which input scaling (i.e. the input size), width scaling (i.e. the number of channels) and depth scaling (i.e. the number of layers) are performed jointly since, empirically, they are interdependent. The second is the use of the inverted bottleneck MBConv (first introduced in MobileNetV2, an efficient model designed to run on smartphones) as the main module, which reduces the complexity of convolution by expanding and then compressing the channels.
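As a rough illustration of compound scaling, the sketch below applies the coefficients reported in the original EfficientNet paper [14] (1.2 for depth, 1.1 for width, 1.15 for resolution) to a compound coefficient phi, where phi = 0 corresponds to the b0 baseline. The function and constant names are ours, and the released model variants round these values, so the numbers should be read as indicative only.

```python
# Compound scaling coefficients as reported in the EfficientNet paper [14].
ALPHA = 1.2   # depth: multiplies the number of layers
BETA = 1.1    # width: multiplies the number of channels
GAMMA = 1.15  # resolution: multiplies the input side length


def compound_scaling(phi: float) -> dict:
    """Return the depth/width/resolution multipliers for coefficient phi."""
    return {
        "depth": ALPHA ** phi,
        "width": BETA ** phi,
        "resolution": GAMMA ** phi,
    }


def flops_multiplier(phi: float) -> float:
    """Approximate FLOPs growth: linear in depth, quadratic in width and resolution."""
    return (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi


if __name__ == "__main__":
    for phi in range(4):
        multipliers = {k: round(v, 3) for k, v in compound_scaling(phi).items()}
        print(phi, multipliers, round(flops_multiplier(phi), 2))
```

Because the three multipliers are tied to a single coefficient, each unit increase of phi roughly doubles the FLOPs, which is why the smallest variant (b0) is the natural choice when efficiency is the priority.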

B. Ensembling
The most significant contribution of this work is given by ensembling: a technique that combines several models, called weak models, in order to produce a model with better results than any single one [15]. Ensembling is also known to reduce errors and improve the model's generalization capabilities. However, due to its resource-consuming nature and the exponential growth of model complexity, ensembling is scarcely used in computer vision. By contrast, our method performs ensembling in an adaptive and efficient way (Fig. 3):
• we use only two weak models (achieving minimality and efficiency);
• the ensemble is not a typical aggregation function, but is performed by a linear combination layer, trainable by gradient descent (obtaining adaptivity);
• the ensemble is performed on the deep features instead of the outputs, excluding redundant operations (for efficiency).
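The three points above can be sketched without any framework as follows. This is only a minimal illustration under our own assumptions: the weak models are replaced by fixed stand-in feature vectors, the combination layer is a plain linear layer followed by softmax, and only that layer is updated by gradient descent on a cross-entropy loss. All function names are hypothetical and the feature size is a toy value.

```python
import math
import random


def linear(weights, bias, features):
    """Compute logits = W @ features + b for one sample."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_forward(weights, bias, feats_a, feats_b):
    """Concatenate the deep features of the two weak models and combine them."""
    combined = feats_a + feats_b  # list concatenation, length 2 * D
    return softmax(linear(weights, bias, combined)), combined


def sgd_step(weights, bias, feats_a, feats_b, label, lr=0.1):
    """One gradient-descent step on the combination layer only.

    The weak models are treated as fixed feature extractors (an assumption).
    Returns the cross-entropy loss measured before the update.
    """
    probs, combined = ensemble_forward(weights, bias, feats_a, feats_b)
    loss = -math.log(probs[label])
    for c, row in enumerate(weights):
        # Gradient of softmax cross-entropy w.r.t. logit c is p_c - onehot_c.
        grad_logit = probs[c] - (1.0 if c == label else 0.0)
        for j, f in enumerate(combined):
            row[j] -= lr * grad_logit * f
        bias[c] -= lr * grad_logit
    return loss


if __name__ == "__main__":
    random.seed(0)
    D, NUM_CLASSES = 4, 3  # toy feature size; three classes as in the dataset
    W = [[random.uniform(-0.1, 0.1) for _ in range(2 * D)] for _ in range(NUM_CLASSES)]
    b = [0.0] * NUM_CLASSES
    fa = [random.random() for _ in range(D)]  # stand-in features of weak model A
    fb = [random.random() for _ in range(D)]  # stand-in features of weak model B
    first = sgd_step(W, b, fa, fb, label=1)
    second = sgd_step(W, b, fa, fb, label=1)
    print(f"loss before update: {first:.4f}, after one step: {second:.4f}")
```

Since the combination acts on the concatenated features rather than on the class outputs, the two classification heads of the weak models are not needed at inference time, which is one way to read the "excluding redundant operations" point.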

C. Validation pipeline
The validation pipeline can be split into two main phases: 1) 5-fold cross-validation with end-to-end EfficientNet-b0 training, using transfer learning [16] from ImageNet pretrained models [17], because transfer learning provides faster convergence; 2) 5-fold cross-validation with fine-tuning of the ensemble, using the two best models from the previous phase. The design choices used during the validation are:
Input size: set to 512×512 because, after a preliminary investigation, it gives the best trade-off between image quality and computational cost.
Batch size: set to the maximum allowed by our GPU (32 GB RAM), which is 50 for the end-to-end training and 200 for the fine-tuning.
Regularization: early stopping with a patience of 10 epochs is used, helping to prevent overfitting.
Optimizer: AdaBelief [18] with learning rate 5 × 10^-4, betas (0.9, 0.999), eps 10^-16, using weight decoupling without rectification, in order to obtain both fast convergence and good generalization.

Fig. 3: Graphical scheme of the models used in this work: on the left, an end-to-end trainable EfficientNet-b0; on the right, the fine-tunable adaptive ensemble.
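The early-stopping rule listed among the design choices can be sketched as below. The text does not state which quantity is monitored, so this example assumes a validation score where higher is better (such as the F1-score); the class and function names are hypothetical.

```python
class EarlyStopping:
    """Stop training when the validation score has not improved for `patience` epochs."""

    def __init__(self, patience: int = 10):
        self.patience = patience
        self.best_score = float("-inf")
        self.best_epoch = -1
        self.epochs_without_improvement = 0

    def update(self, epoch: int, score: float) -> bool:
        """Record the score of one epoch; return True if training should stop."""
        if score > self.best_score:
            self.best_score = score
            self.best_epoch = epoch
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


def train_with_early_stopping(validation_scores, patience=10):
    """Replay per-epoch validation scores; return (best_epoch, last_epoch_run)."""
    stopper = EarlyStopping(patience)
    last_epoch = -1
    for epoch, score in enumerate(validation_scores):
        last_epoch = epoch
        if stopper.update(epoch, score):
            break
    return stopper.best_epoch, last_epoch
```

In a real training loop, the weights saved at `best_epoch` would typically be restored once the loop stops, so that the extra epochs spent waiting do not affect the final model.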

IV. EXPERIMENTAL RESULTS
According to Tab. II, the EfficientNet-b0 with the selected design choices already provides a good starting point, with an average F1-score of 0.969, 0.983 and 0.999 on the test, valid and train sets, respectively, and with high robustness (i.e. low variance). The ensemble further reduces the variance and improves the generalization power (i.e. performance on valid and test) by an average of +1.5% and +1.4% on test and valid, respectively. The final errors are 12 in total (test: 9; valid: 2; train: 1); the confusion matrix for the test set is shown in Fig. 4.
The strength of the proposed solution is even more evident when compared with the State of the Art (SOTA), see Tabs. III-IV. A single EfficientNet-b0 matches the metrics of the best performing SOTA model while using only 52% of its parameters and 21% of its FLOPs. With the ensemble, the complexity (both parameters and FLOPs) is roughly doubled, but it is still lower than the SOTA, for a gain of +1.6% on all the metrics.

V. DISCUSSION
In order to stress our method, we tested an ensemble of five weak models. On other datasets, this generally improves the results slightly at the expense of complexity; in this case, as Tab. V shows, the results do not improve: the errors remain on exactly the same 12 images, even if they are distributed differently among the splits.

TABLE II: Metrics (F1-score) on the subset of 5-fold cross-validation runs of both the end-to-end weak models (left) and the fine-tuned ensemble models (right). The ensemble has a twofold contribution: improving generalization performance (+1.5% on test, +1.4% on valid, on average) and robustness (halving the deviation). Data are organized best-to-worst fold (top-to-bottom), and the models corresponding to the first two rows of the left table are used as weak models for the ensemble.

This approach to ensembling has been recently introduced and discussed in [21], [22]; it has already proved highly applicable to AI-based methods for agriculture [23].

Specifically, in [21] we tested our method on seven benchmark datasets, namely CIFAR-10 [24], CIFAR-100 [24], Stanford Cars [25], Food-101 [26], Oxford 102 Flower [27], CINIC-10 [28] and Oxford-IIIT Pet [26]. The results demonstrated that our novelties improve the SOTA on each dataset by an average of 0.5%, across different kinds of images, while reducing complexity by up to sixty times in the number of parameters and up to one hundred times in FLOPs. This results in considerable savings of time and cost compared to the most recent models (e.g. Vision Transformers [29]).
In [30], our method was also tested on images of plants taken in the field, in different environments, backgrounds and light conditions and at different stages of weed growth. This defined the baseline for a work in progress in which, with the help of farmers taking pictures directly in the field with a mobile app [31], a set of trained and continuously extended models is significantly improving the classification of about a hundred of the main stressors that can interfere with wheat cultivation, such as weeds, pests, diseases and damage.
Another real-world application of this solution, in a different domain, was presented and discussed in [22]: on a public database of lung ultrasound, the SOTA was reached with 100% accuracy in distinguishing healthy, Covid-19 and pneumonia cases.

VI. CONCLUSIONS
In this paper, we presented an efficient adaptive ensemble method to classify olive leaf diseases using two EfficientNet-b0 models as weak models. The ensemble is performed by a linear layer that combines the features of the weak models. Our method increased the generalization strength by about 1.5% and reduced the variance. Moreover, by parallelizing the independent weak models, the complexity is comparable to that of a single weak model, which has 52% of the parameters and 21% of the FLOPs of the best SOTA solution.
Thanks to its efficiency, with a significantly smaller architecture whose number of tunable parameters and floating point operations is comparable to that of models from a decade ago, this solution can also be embedded into a smartphone application for real-time classification.
Further studies will be performed to investigate the use of the efficient adaptive ensemble method with a greater number of weak models.

Fig. 1: Samples from the dataset for Aculus olearius (first row), Healthy (second row) and Olive peacock spot (third row).

Validation metric: weighted F1-score, which better takes into account both errors and data imbalance.
Dataset split: the training and test subsets are preset; in every run of the 5-fold cross-validation, the training set is split 80/20 into train/valid.
Standardization: data are standardized to zero mean and unit standard deviation, improving the stability and convergence of the training.
Each run of the cross-validation, in both phases, is associated with a different random initialization of the model parameters.

890 PROCEEDINGS OF THE FEDCSIS. WARSAW, POLAND, 2023
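To make the validation metric and the 80/20 split concrete, the following sketch computes a support-weighted F1-score and generates 5-fold train/valid index pairs using only the Python standard library. The function names are ours, and the folds are taken in order without shuffling, which is a simplification; whether and how the authors shuffle the data is not stated.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n_cls in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n_cls - tp
        f1 = 2 * tp / (2 * tp + fp + fn)  # denominator > 0 because n_cls > 0
        score += (n_cls / total) * f1
    return score


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, valid_idx) pairs; each fold holds out about 1/k of the data."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, valid
        start += size
```

With k = 5, every fold keeps 80% of the training set for training and holds out 20% for validation, which matches the split described above. Weighting each class's F1 by its support means that a frequent class contributes proportionally more to the final score than a rare one.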

Fig. 4: The confusion matrix on the test split of the best ensemble model.

TABLE I: Data distribution of the dataset used.
Fig. 2: Example of scaling types, from left to right: a baseline network example, conventional scaling methods that only increase one network dimension (width, depth, resolution) and, at the end, the EfficientNet compound scaling method. Image taken from the original paper [14].

TABLE III: Comparison of the metrics of the SOTA models. Since the authors did not state in their papers whether the values are mean or best values, or whether they refer to the test set only or to the whole dataset, we report the mean values (best in brackets) on both the test set only and the whole dataset.

TABLE V: Metrics (F1-score) related to the best ensembles of five weak models.