Deep Neural Networks Application for Cup-to-Disc Ratio Estimation in Eye Fundus Images

Glaucoma is the second leading cause of blindness worldwide. The optic cup-to-disc ratio (CDR) is a commonly applied measure in glaucoma detection. The CDR is calculated from the optic disc (OD) and optic cup (OC) in eye fundus image screening; therefore, accurate segmentation of these two structures is very important. Lately, deep neural networks have shown great promise in automated optic disc and optic cup segmentation, but the overlap between the OC and OD regions makes it challenging to obtain the CDR automatically with high accuracy. In this paper, we assess the performance of CDR evaluation on three modifications of the convolutional neural network (CNN) U-Net, namely Attention U-Net, Residual Attention U-Net (RAUNet), and U-Net++, applied to the publicly available datasets RIM-ONE, DRISHTI, and REFUGE. We calculated the ground truth CDR value of the testing eye fundus images of these datasets and compared it with the CDR value obtained by the trained CNNs. Our results show that Attention U-Net obtains the CDR closest to the ground truth value, but the identification of early-stage glaucoma needs improvement.


I. INTRODUCTION
GLAUCOMA is a progressive eye disease caused by damage to the optic nerve, which is critical to vision. Usually, there are no symptoms in its early stages, and without proper treatment, glaucoma can lead to blindness. The evaluation for glaucoma starts by evaluating the cup-to-disc ratio (CDR), which is the ratio of the vertical optic cup diameter (VCD) to the vertical optic disc diameter (VDD) of a fundus image [1]. Fig. 1 presents an example of an eye fundus image. Depending on this ratio, several stages of glaucoma are distinguished. Cup-to-disc ratios of 0.4, 0.5-0.7, and above 0.7 indicate early-stage, moderate-stage, and severe-stage glaucoma respectively. A healthy eye has a CDR of 0.3 [2]. The CDR calculation is based on the segmented optic disc and optic cup regions. The OD appears as a bright oval region, and the OC appears as an even brighter oval region in the center of the optic disc (Fig. 1).
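The staging rule above can be sketched as a small helper function (a hypothetical illustration; the thresholds follow the ranges stated in this section and in the experiment, i.e. CDR of at most 0.3 healthy, (0.3-0.4] early, (0.4-0.7] moderate, and above 0.7 severe):

```python
def glaucoma_stage(cdr: float) -> str:
    """Map a cup-to-disc ratio to a glaucoma stage.

    Thresholds follow the CDR ranges used in this paper:
    <= 0.3 healthy, (0.3, 0.4] early, (0.4, 0.7] moderate, > 0.7 severe.
    """
    if cdr <= 0.3:
        return "healthy"
    if cdr <= 0.4:
        return "early-stage glaucoma"
    if cdr <= 0.7:
        return "moderate-stage glaucoma"
    return "severe-stage glaucoma"
```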
Addressing the limitation of medical resources in many areas worldwide [3], deep learning methods have become successful in medical image segmentation [22]. In particular, convolutional neural networks (CNNs) have demonstrated powerful representation and generalization abilities [4], [5], [6]. In automated glaucoma identification, the CNNs are trained on eye fundus images with ground truth labels of the OD and OC prepared by ophthalmologists. Therefore, precise segmentation of the optic disc and optic cup is essential. However, accurate calculation of the cup-to-disc ratio value is still in the development stage and faces challenges [7]. Here, the complexity arises from the overlap between the optic disc and optic cup areas. Automated deep learning-based algorithms fail to distinguish the boundaries of the OC in the eye fundus image. The publicly available fundus image datasets contain too few images and segmentation masks to train CNNs for OD and OC segmentation and CDR calculation. (This work was not supported by any organization.)
In this research, we aim to evaluate the ability of CNNs to accurately segment the OD and OC for further CDR calculation. As the stage of glaucoma is identified by the CDR value, our experiment seeks to verify the agreement between the CDR calculated with the help of CNNs and the CDR calculated by ophthalmologists. Therefore, datasets consisting of eye fundus images with the ground truth of the OD and OC will be used.

II. RELATED WORK
These challenges in CDR estimation prompt researchers to seek improvements through deep learning-based methods and new proposals.
Zhao et al. in [7] introduced a direct CDR estimation method based on a semi-supervised learning scheme, which regresses the CDR value directly without intermediate segmentation. The authors of [8] proposed an approach utilizing simple linear iterative clustering (SLIC) and a feed-forward neural network classifier. The classifier was used to classify the superpixels to detect the boundaries of the OD and OC. The final detection and segmentation of the OD and OC were completed by applying morphological operations and an elliptical estimation. In [9], a two-stage deep learning-based approach for CDR estimation was proposed. In the initial stage of optic disc segmentation, the U-Net was adopted. At a later stage, image-processing algorithms are used to estimate the CDR. In [10], a modified U-Net model was presented to locate the OD. The CDR was calculated from the segmented OD and OC incorporating adaptive thresholding.
The discussion above shows the variety of methods applied in CDR estimation, and it is difficult to compare their effectiveness. A summary is provided in Table I. The metrics used in the performance evaluation of the proposed approaches are the Dice coefficient (Dice), Jaccard index (IoU), sensitivity (Sen), specificity (Spec), and area under the curve (AUC). In this paper, the OD and OC segmentation task will be solved using CNNs to calculate the CDR and to compare the glaucoma stage obtained from the CNN-based CDR value with the glaucoma stage provided by the experts. Therefore, the Dice measure was chosen to evaluate the obtained results.

III. METHODOLOGY
A detailed description of the applied methods is presented in four sub-sections. Sub-section A presents the datasets. Sub-section B describes the applied image preprocessing techniques. Sub-section C presents the convolutional neural networks used in our experiments. Sub-section D provides the details of the metrics used for the performance evaluation of the convolutional neural networks and for the cup-to-disc ratio calculation.

A. Dataset description
The public dataset DRISHTI-GS [11] contains 101 images with ground truth, divided into 50 training and 51 testing images. All the images have been marked by four eye experts to create the ground truth. All images were taken centered on the optic disc with a field of view (FOV) of 30 degrees and saved in the uncompressed PNG image format with a resolution of 2045 x 1752 pixels.
RIM-ONE v.3 [12] is a public dataset consisting of 159 annotated stereo eye fundus images. The images were taken by a Nidek AFC-210 camera and saved in the JPEG image format with a resolution of 2144 x 1424 pixels. The OD of each image has been segmented by two experts in ophthalmology to create the ground truth.
REFUGE [13] is a public dataset containing 1200 fundus images with ground truth and clinical glaucoma labels. The dataset is split equally (1:1:1) into three subsets for training, validation, and testing. The training set, with a total of 400 color fundus images of size 2124 x 2056 pixels taken by a Zeiss Visucam 500 fundus camera, is provided with the corresponding glaucoma status and unified manual pixel-wise ground truths. The remaining 800 color fundus images, of size 1634 x 1634 pixels and taken by a Canon CR-2 camera, are split into 400 validation and 400 testing images. Only the images of the validation and testing subsets were used in this paper.

B. Preprocessing
To increase image diversity, various image augmentation techniques were applied, namely image zooming by 20%, rotation by an angle from 0° to 45°, and horizontal and vertical flipping. With this approach, the number of images in each dataset was extended to 1000. The region of interest (ROI), covering double the size of the OD area, was extracted automatically by cropping the area around the centroid of the optic disc and optic cup. The ROI images were resized to 512 x 512 pixels by bicubic interpolation [14].
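The ROI extraction step can be sketched as follows. This is a simplified illustration assuming a binary NumPy mask for the optic disc; the crop is centered on the disc centroid only, and the final 512 x 512 bicubic resize is omitted:

```python
import numpy as np

def extract_roi(image: np.ndarray, disc_mask: np.ndarray) -> np.ndarray:
    """Crop a square ROI of roughly twice the optic-disc extent,
    centered on the centroid of the disc mask."""
    ys, xs = np.nonzero(disc_mask)
    cy, cx = int(ys.mean()), int(xs.mean())            # centroid of the OD mask
    extent = max(ys.max() - ys.min(), xs.max() - xs.min()) + 1
    half = extent                                       # double the OD size => half-width = extent
    y0, y1 = max(cy - half, 0), min(cy + half, image.shape[0])
    x0, x1 = max(cx - half, 0), min(cx + half, image.shape[1])
    return image[y0:y1, x0:x1]
```

In practice the crop would be followed by the bicubic resize to 512 x 512 pixels, e.g. with OpenCV or Pillow.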

C. Convolutional neural networks
Three CNNs, namely U-Net++ [15], Attention U-Net [16], and Residual Attention U-Net (RAUNet) [17], which offer significant improvements in image segmentation, were chosen to be trained for optic disc and cup segmentation. During the training of these CNNs, the binary cross-entropy loss function [1] and the Adam optimizer [5] were used. Parameters such as the batch size, learning rate, and dropout rate for each CNN were searched by applying the KerasTuner framework.
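The tuning step can be illustrated with a plain random search over the three tuned parameters. This is a hypothetical stand-in for what KerasTuner automates; the search space values and the score function (a placeholder for the validation loss of a trained model) are illustrative assumptions, not the paper's actual settings:

```python
import random

# Hypothetical search space covering the parameters tuned in the paper.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "batch_size": [4, 8, 16],
    "dropout_rate": [0.1, 0.3, 0.5],
}

def random_search(score_fn, n_trials: int = 20, seed: int = 0) -> dict:
    """Sample configurations at random and return the one with the
    lowest score (the score stands in for validation loss)."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_trials):
        cfg = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
        s = score_fn(cfg)
        if s < best_score:
            best_cfg, best_score = cfg, s
    return best_cfg
```

KerasTuner additionally handles trial logging and early stopping of poor trials; the loop above only conveys the basic search idea.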
U-Net++ [15] is a method based on nested and dense skip connections. In the encoder part, the feature maps undergo a dense convolution block whose number of convolution layers depends on the pyramid level. Due to the nested skip pathways, the network generates full-resolution feature maps at multiple semantic levels.
Attention U-Net [16] contains an encoder, a decoder, and an attention gate at the skip connection of each level. The pre-trained ResNet50 network serves as the encoder; it consists of residual blocks with skip connections that overcome the vanishing gradient problem. The decoder contains upsampling and concatenation. Each convolution layer is followed by a rectified linear unit (ReLU) activation function and batch normalization.
Residual Attention U-Net (RAUNet) [17] is an encoder-decoder-based network, where the encoder is constructed from a pre-trained ResNet34 for semantic feature extraction. The decoder contains a new augmented attention module (AAM) for multi-level feature fusion and global context capturing.

D. Metrics
Two evaluation metrics, the Dice coefficient (Dice) [19], [20] and the cup-to-disc ratio (CDR) [21], are used in this paper.
The cup-to-disc ratio is calculated by dividing the vertical OC diameter by the vertical OD diameter [10].
Dice, which describes the similarity between two images, is applied to evaluate the performance of the trained CNNs in OD and OC segmentation:

Dice(S, L) = 2|S ∩ L| / (|S| + |L|),

where S is the predicted output map of the segmentation and L is the ground truth binary map.
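Both metrics can be sketched for binary NumPy masks as follows (a minimal illustration; the vertical diameters are measured here as the row extent of each mask):

```python
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice = 2|S ∩ L| / (|S| + |L|) for two binary masks."""
    inter = np.logical_and(pred, truth).sum()
    return 2.0 * inter / (pred.sum() + truth.sum())

def vertical_cdr(cup_mask: np.ndarray, disc_mask: np.ndarray) -> float:
    """Vertical cup diameter divided by vertical disc diameter,
    each measured as the row extent of the binary mask."""
    def v_diameter(mask: np.ndarray) -> int:
        rows = np.nonzero(mask.any(axis=1))[0]
        return int(rows.max() - rows.min() + 1)
    return v_diameter(cup_mask) / v_diameter(disc_mask)
```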

IV. EXPERIMENT AND RESULTS
The experiment was run by training the three different CNNs, namely U-Net++, Residual Attention U-Net, and Attention U-Net, on a combined dataset consisting of the eye fundus images and their binary labels from DRISHTI-GS, REFUGE, and RIM-ONE. The training of the convolutional neural networks was performed on a single-GPU machine [18] with 1 TB of RAM using Keras included in TensorFlow version 2.9.1. An early-stopping technique seeking a minimum of the validation loss was applied to reduce unnecessary training time. The Adam optimizer and the binary cross-entropy loss function were used during the training. The KerasTuner framework was applied to search for the parameters of each network, namely the learning rate, batch size, and dropout rate. The trained CNNs were tested on 50 testing images of each dataset, REFUGE, RIM-ONE, and DRISHTI-GS, separately to evaluate the Dice and calculate the CDR for the OD and OC predicted by each CNN. The CDR values were grouped into the ranges (0.3-0.4], (0.4-0.7], and above 0.7 according to the glaucoma stages.

Table II provides the Dice of the optic disc and cup segmentation evaluated by testing the trained convolutional neural networks on the REFUGE, RIM-ONE, and DRISHTI-GS test datasets. Comparing the performance of the CNNs in OD and OC segmentation, Attention U-Net demonstrates the highest Dice values of 0.9789, 0.9732, and 0.9770 for OD segmentation and 0.8769, 0.8742, and 0.8549 for OC segmentation on the DRISHTI-GS, REFUGE, and RIM-ONE test datasets respectively.

SANDRA VIRBUKAITĖ, JOLITA BERNATAVIČIENĖ: DEEP NEURAL NETWORKS APPLICATION FOR CUP-TO-DISC RATIO ESTIMATION

This leads to the results in Table III, presenting the mean and variance of the ground truth CDR obtained on the images of each test dataset and the mean and variance of the CDR calculated by U-Net++, Attention U-Net, and Residual Attention U-Net on the testing images of each test dataset. Assessing the mean and variance of the ground truth CDR and of the CDR calculated from the OD and OC segmented by the CNNs, the best result was obtained by Attention U-Net. Using the same approach, the percentage of eye fundus images in each test dataset for which each CNN calculates the correct CDR in comparison with the ground truth was evaluated. The results are shown in Table IV. The obtained percentages indicate that the convolutional neural networks identify moderate-stage and severe-stage glaucoma cases well, while the identification of early-stage glaucoma is quite poor. For example, using Attention U-Net on REFUGE dataset images, the CDR, compared to the ground truth CDR, was calculated correctly for 91% of moderate-stage glaucoma images and 100% of severe-stage glaucoma images. Meanwhile, the correct CDR was calculated for only 21% of early-stage glaucoma images. Showcase examples of the optic disc and cup segmentation by Attention U-Net in images of early-stage glaucoma, together with the ground truth, are provided in Fig. 2. Fig. 2 (a) shows cases where the CDR predicted by the CNN is near the ground truth value. Fig. 2 (b) shows cases where the CNN fails in the optic disc and cup segmentation and predicts a CDR value of moderate-stage glaucoma for early-stage glaucoma images. This can be caused by a noticeable difference in image quality. The images for which the CDR values were predicted correctly are brighter and show clearer boundaries of the OD and OC. Meanwhile, the boundaries of the OD and OC in the images with wrongly predicted CDR values are blurry. As only the RIM-ONE dataset contains images of non-glaucoma cases, these were tested separately, and the resulting mean and variance of the CDR are shown in Table V. The obtained CDR results indicate the CNNs' ability to segment non-glaucoma cases quite accurately. This can be attributed to the clear boundaries of the optic disc and cup in images of healthy eyes.

V. CONCLUSION
The three convolutional neural networks, namely U-Net++, Attention U-Net, and Residual Attention U-Net, were applied in this paper for the evaluation of the cup-to-disc ratio. The experiments show that the non-glaucoma cases were identified quite accurately by all three CNNs. However, evaluating the ability of the CNNs to identify the different glaucoma stages, it is noticed that the CNNs perform better in identifying moderate-stage and severe-stage glaucoma, while the early-stage glaucomatous cases are poorly identified. Attention U-Net was able to identify 50% of the early-stage glaucoma cases in RIM-ONE, while 13% and 20% of the early-stage glaucoma cases were identified by Residual Attention U-Net and U-Net++ respectively. In the REFUGE dataset, only 21%, 14%, and 13% of the early-stage glaucoma cases were identified by Attention U-Net, U-Net++, and Residual Attention U-Net respectively. The CNNs misidentify cases of early-stage glaucoma by classifying them as moderate-stage glaucoma, which is less harmful, since such cases will still be noticed by the doctors. However, further research and refinement of the CNNs are needed.

Fig. 2. Early-stage glaucoma. (a) The correct CDR value. (b) The wrong CDR value. The ground truth of the OD and OC is indicated by green and red boundaries respectively. The segmented OD and OC are indicated by blue and white boundaries respectively.

TABLE II. DICE OF OD AND OC SEGMENTATION BY DIFFERENT CNNS

TABLE III. GROUND TRUTH CDR OF EYE FUNDUS IMAGES AND CALCULATED CDR USING CNNS ON EACH TEST DATASET SEPARATELY