The study conducted in this work aims to thoroughly examine the practical obstacles and technical deficiencies in the application of deep learning to citrus disease identification. Given the economic importance of citrus crops and the rising incidence of diseases such as HLB, citrus canker, and black spot, it is essential to establish reliable and precise detection systems. Based on a survey of existing state-of-the-art techniques for detecting diseases in citrus fruits, the research questions listed in Table 3 were designed.
The primary finding of this study is that automated citrus disease detection systems can achieve significantly greater accuracy and reliability in real-time scenarios when they employ effective ensemble approaches and enhanced image pre-processing techniques. Owing to its efficacy, NL-FuRBE establishes a new benchmark for citrus disease detection and provides precision farmers with an economical, scalable method for early disease identification. In future, the method can be extended to mixed datasets and to real-time field operation on low-power devices, supporting integrated agricultural management and helping to secure adequate food supplies for the global population. The overall framework of the proposed methodology is illustrated in Fig. 2, and the step-by-step process is given in Algorithm 4.
Disease in fruit crops can cause widespread yield losses. To control the spread of a disease, it is important to detect it at its onset.
Designing an automated disease detection system requires an image dataset on which to train the deep learning model. The proposed model is trained on a publicly available lemon leaf disease dataset released online in February 2025, consisting of 1354 raw images in nine classes: eight disease or disorder categories (anthracnose, bacterial blight, citrus canker, curl virus, deficiency leaf, dry leaf, sooty mould, and spider mites) and a healthy-leaf class. This recent dataset was chosen because it reflects current trends, covers a broader range of disease types, and was captured under a wider range of conditions. Sample images from each category are shown in Fig. 3.
Pre-processing and data balancing are essential elements of deep learning, ensuring effective model training and applicability in diverse situations. In many datasets, some classes are heavily outnumbered by others. Data balancing strategies such as oversampling minority classes, undersampling majority classes, or generating synthetic samples with SMOTE can address this issue. Pre-processing operations such as resizing, normalization, flipping or rotating images, and noise reduction improve the quality of the input. Together, these operations improve model performance, reduce bias, and enable accurate identification of all classes. Their principal benefits are better adaptability to new data, less overfitting, more consistent performance across classes, and higher model accuracy.
This research uses oversampling of minority classes. Figure 4 shows a strong class imbalance across the citrus disease categories, with the number of images varying widely between classes. Models trained on this imbalanced dataset would perform suboptimally in detecting citrus diseases, with reduced accuracy and predictions skewed towards the majority classes. Figure 5 shows the balanced count for each class, which allows all classes to be learned equitably and improves prediction accuracy. Because the original images have varying resolutions, all images are resized to 128 × 128 × 3. Random transformations, namely rotation, shifting, scaling, and flipping, are applied to the training dataset: images are randomly rotated by up to ±40 degrees, shifted horizontally and vertically by up to 20%, and zoomed by up to ±20%. Images are also flipped horizontally, which preserves their diagnostic content.
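The text does not specify how the minority classes were oversampled. As a minimal sketch, assuming plain random oversampling with replacement (not SMOTE), duplicating images until every class matches the largest class could look like the following; the function name `random_oversample` and the file names are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Randomly duplicate minority-class samples (with replacement) until
    every class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    # group the samples by class label
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    out_x, out_y = list(samples), list(labels)
    for label, members in by_class.items():
        extra = target - counts[label]  # copies this class still needs
        out_x.extend(rng.choices(members, k=extra))
        out_y.extend([label] * extra)
    return out_x, out_y


# Usage with hypothetical file names
x = ["a1.jpg", "a2.jpg", "a3.jpg", "b1.jpg"]
y = ["citrus_canker", "citrus_canker", "citrus_canker", "sooty_mould"]
bx, by = random_oversample(x, y)
print(dict(Counter(by)))  # {'citrus_canker': 3, 'sooty_mould': 3}
```

Oversampling of this kind is normally applied only to the training split, so that duplicated images cannot appear in both the training and test sets.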
When geometric transformations such as rotation, shifting, and scaling are applied, some output pixels have no corresponding source pixel and would otherwise be left empty. These gaps are filled with the value of the nearest pixel.
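The augmentation settings above (rotation up to ±40°, shifts up to 20%, zoom up to ±20%, horizontal flip, nearest-pixel gap filling) can be sketched in NumPy as follows. This is an illustrative reimplementation, not the authors' code; the function name `random_affine_nearest` is hypothetical, and nearest filling is modelled by clipping out-of-range source coordinates to the image border.

```python
import numpy as np


def random_affine_nearest(img, rng, max_rot_deg=40.0, max_shift=0.2,
                          max_zoom=0.2, allow_flip=True):
    """Randomly rotate, shift, zoom and flip an H x W x C image.
    Each output pixel is mapped back to a source pixel (inverse mapping,
    nearest-neighbour sampling). Source coordinates outside the image are
    clipped to the border, so gaps take the value of the nearest pixel."""
    h, w = img.shape[:2]
    theta = np.deg2rad(rng.uniform(-max_rot_deg, max_rot_deg))
    zoom = rng.uniform(1.0 - max_zoom, 1.0 + max_zoom)
    ty = rng.uniform(-max_shift, max_shift) * h
    tx = rng.uniform(-max_shift, max_shift) * w
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    # output pixel grid, relative to the image centre and undoing the shift
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    y0, x0 = yy - cy - ty, xx - cx - tx
    # undo rotation and scaling to find each output pixel's source location
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    src_y = (cos_t * y0 - sin_t * x0) / zoom + cy
    src_x = (sin_t * y0 + cos_t * x0) / zoom + cx
    # round to the nearest pixel and clip to the border ("nearest" filling)
    src_y = np.clip(np.rint(src_y), 0, h - 1).astype(int)
    src_x = np.clip(np.rint(src_x), 0, w - 1).astype(int)
    out = img[src_y, src_x]
    if allow_flip and rng.random() < 0.5:
        out = out[:, ::-1]  # horizontal flip
    return out


# Usage on a stand-in for a resized 128 x 128 x 3 leaf image
rng = np.random.default_rng(42)
leaf = np.random.default_rng(1).random((128, 128, 3))
augmented = random_affine_nearest(leaf, rng)
```

If Keras is used instead, the same settings map onto the `rotation_range`, `width_shift_range`, `height_shift_range`, `zoom_range`, `horizontal_flip`, and `fill_mode="nearest"` arguments of `ImageDataGenerator`.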
In deep learning-based diagnosis of citrus diseases, image quality is crucial for precise identification and classification of disease symptoms. Images captured in real agricultural settings frequently contain many kinds of noise and degradation, including motion blur, variable lighting, shadows, and background clutter, which can mask essential disease indicators such as lesions, spots, or discolouration. Denoising is an essential pre-processing step that improves image quality by removing unwanted distortions. By enhancing the visual quality of the input data, denoising allows deep learning models to extract more meaningful and consistent features, thus improving classification accuracy and robustness. It also improves the generalizability of models across field conditions and imaging devices such as smartphones and drones. Classical filters (e.g., Gaussian or median filters) can be incorporated into the pre-processing stage and can substantially improve model performance, particularly in resource-constrained or noisy settings. However, these conventional filters smooth the image globally and therefore blur edges.
In addition, each colour channel is filtered separately, which can introduce colour artefacts. When applying filters for denoising, the major consideration is the preservation of edges. In this context, the Vector-Valued Anisotropic Diffusion (VAD) method reduces noise while preserving the edges in the image. Each pixel is treated as a single vector composed of its R, G, and B components, and the smoothing process is guided by the combined gradient magnitude. Diffusion here denotes a process that smooths an image by reducing the differences in intensity between adjacent pixels. Isotropic diffusion smooths uniformly in all directions, which blurs edges. This can be avoided by varying the amount of diffusion with direction, a process known as anisotropic diffusion. Figure 6 shows the output of VAD for a sample image from the dataset.
The operation of this filter can be summarised as follows.
The amount of diffusion varies from pixel to pixel across the image and is reduced wherever an edge is encountered, so that sharp edges undergo little diffusion and edge information is preserved. The gradient of each individual channel is calculated using Eq. (1), and the resultant image is obtained with Eq. (3) by combining the R, G, and B channels.
where the channel-wise term represents the intensity value of an individual colour channel (R, G, or B).
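Because Eqs. (1) to (3) are not reproduced here, the following is only a minimal sketch of vector-valued anisotropic diffusion under stated assumptions: a Perona-Malik style explicit update over North, South, East, and West differences, an exponential conductivity driven by the Euclidean norm of the differences across R, G, and B (so all channels share one edge map), and 3 iterations as reported for the MATLAB experiments. The values `kappa=0.1` and `lam=0.2` are hypothetical, and the image is assumed to be a float array in [0, 1].

```python
import numpy as np


def _conductance(d, kappa):
    # exponential conductivity driven by the combined R, G, B difference
    # magnitude, so every channel sees the same edge strength
    mag = np.sqrt(np.sum(d ** 2, axis=2, keepdims=True))
    return np.exp(-(mag / kappa) ** 2)


def vector_anisotropic_diffusion(img, n_iter=3, kappa=0.1, lam=0.2):
    """Vector-valued anisotropic diffusion of an H x W x 3 float image in [0, 1]."""
    u = img.astype(np.float64).copy()
    for _ in range(n_iter):
        p = np.pad(u, ((1, 1), (1, 1), (0, 0)), mode="edge")
        # directional differences (North, South, East, West) for every channel
        d_n = p[:-2, 1:-1] - u
        d_s = p[2:, 1:-1] - u
        d_e = p[1:-1, 2:] - u
        d_w = p[1:-1, :-2] - u
        # explicit update; lam <= 0.25 keeps the 4-neighbour scheme stable
        u = u + lam * sum(_conductance(d, kappa) * d for d in (d_n, d_s, d_e, d_w))
    return u
```

Small colour differences (noise) give a conductivity close to 1 and are smoothed, while a large joint R, G, B jump drives the conductivity towards 0, so the edge is left almost untouched.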
Top-hat and bottom-hat (THBH) filters are morphological methods that enhance an image by highlighting small features and details such as bright spots, dark spots, and textures. For image quality enhancement, a morphological filter formed by combining two filters built from elementary morphological operations, namely the top-hat filter and the bottom-hat filter, is a suitable option. The formation of the morphological filter module is illustrated in Fig. 8. The top-hat image is computed by subtracting the opening of the original image by the kernel from the original image. Figure 7 shows the output of THBH for a sample image from the dataset.
This corrects uneven background illumination and enhances bright objects against a dark background. The bottom-hat filter is the dual of top-hat filtering: the filtered image is obtained by subtracting the original image from the closing of the original image by the kernel. This enhances image contrast for effective lesion segmentation and also removes sharp noise peaks. The structuring element K plays an important role in the operation of this filter. The top-hat image retains bright features smaller than K and suppresses larger structures such as the slowly varying background, whereas the bottom-hat image retains dark features smaller than K. Combining the two yields the enhanced image. Erosion removes pixels from object boundaries according to the shape of the kernel K and is implemented as given in Eq. (4).
$$(I \ominus K)(r, c) = \min_{(i, j) \in K} I(r + i, c + j) \tag{4}$$
Where I(r, c) is the original input image and K is the kernel or structuring element.
Dilation adds pixels to object boundaries according to the shape of the kernel K and is implemented as given in Eq. (5).
$$(I \oplus K)(r, c) = \max_{(i, j) \in K} I(r - i, c - j) \tag{5}$$
Opening is the sequential application of erosion followed by dilation using the same structuring element. It smooths contours and removes narrow peaks. It is implemented as:
$$I \circ K = (I \ominus K) \oplus K$$
Closing is the sequential application of dilation followed by erosion using the same structuring element. It smooths contours and fills small depressions. It is implemented as:
$$I \bullet K = (I \oplus K) \ominus K$$
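The erosion, dilation, opening, closing, top-hat, and bottom-hat steps described above can be sketched for a 2-D grayscale image as follows, using the disk-shaped structuring element of radius 3 reported as optimal. Several details are assumptions: the structuring element is flat, borders are handled by edge padding, RGB images would be processed channel by channel, and the combined output is taken as image + top-hat − bottom-hat, a common choice that the text does not state explicitly. In MATLAB, which the authors used, the closest built-ins are `imtophat`, `imbothat`, and `strel("disk", 3)`, although `strel` approximates the disk by default, so results may differ slightly.

```python
import numpy as np


def disk(radius):
    """Flat disk-shaped structuring element as a boolean mask."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x * x + y * y <= radius * radius


def _morph(img, se, reduce_fn):
    # slide the flat structuring element over an edge-padded image and
    # combine the covered pixels with min (erosion) or max (dilation)
    r = se.shape[0] // 2
    padded = np.pad(img, r, mode="edge")
    h, w = img.shape
    out = None
    for di, dj in zip(*np.nonzero(se)):
        window = padded[di:di + h, dj:dj + w]
        out = window.copy() if out is None else reduce_fn(out, window)
    return out


def erode(img, se):
    return _morph(img, se, np.minimum)


def dilate(img, se):
    # max over the same neighbourhood; equals Eq. (5) because the disk is symmetric
    return _morph(img, se, np.maximum)


def thbh_enhance(img, radius=3):
    """Top-hat / bottom-hat enhancement of a 2-D grayscale image.
    Returns (enhanced, top_hat, bottom_hat)."""
    img = np.asarray(img, dtype=np.float64)
    se = disk(radius)
    opened = dilate(erode(img, se), se)   # opening
    closed = erode(dilate(img, se), se)   # closing
    top_hat = img - opened                # bright details smaller than the disk
    bottom_hat = closed - img             # dark details smaller than the disk
    # assumed combination rule: boost bright details, deepen dark details
    return img + top_hat - bottom_hat, top_hat, bottom_hat
```

A single bright pixel, for example, is removed by the opening and therefore survives intact in the top-hat image, which is exactly the "small bright feature" behaviour the filter relies on. The enhanced output can exceed the input range and may need clipping before further use.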
In the previous section, we discussed two categories of filters for removing noise from images while preserving or enhancing image quality. Subjective (visual) evaluation alone cannot guarantee the effectiveness of an image enhancement filter, so an objective evaluation is required to validate its performance. The objective evaluation was conducted using the standard image quality metrics listed in Table 4.
The choice of an effective image enhancement filter is vital for improving visual quality while maintaining key image attributes. To objectively measure and compare the efficacy of the enhancement techniques, three widely recognized image quality metrics are used: Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM), and Natural Image Quality Evaluator (NIQE). PSNR measures the ratio of the maximum possible signal power to the power of the noise affecting the image; higher values indicate better noise reduction. SSIM evaluates the perceptual similarity of images by comparing luminance, contrast, and structure, and thus assesses structural fidelity. NIQE, a no-reference metric, evaluates image quality by measuring statistical deviations from natural scene statistics; lower values indicate better perceptual quality. Analysing these measures together makes the evaluation of filters more reliable and facilitates the selection of the enhancement technique that minimizes distortion while maximizing perceptual clarity and naturalness.
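PSNR and SSIM can be computed as in the sketch below for 2-D grayscale images scaled to [0, 1]. This is an illustration under assumptions: SSIM here averages the standard SSIM formula over uniform 7 × 7 windows, whereas the original formulation and MATLAB's `ssim` use an 11 × 11 Gaussian-weighted window, so values will differ slightly. NIQE is omitted because it requires a natural-scene-statistics model fitted on a corpus of pristine images; MATLAB provides it as `niqe`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def psnr(ref, test, data_range=1.0):
    """Peak signal-to-noise ratio in dB; higher means less distortion."""
    mse = np.mean((np.asarray(ref, float) - np.asarray(test, float)) ** 2)
    return float("inf") if mse == 0 else float(10 * np.log10(data_range ** 2 / mse))


def ssim(ref, test, data_range=1.0, win=7):
    """Mean SSIM of two 2-D grayscale images over uniform win x win windows."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    x = sliding_window_view(np.asarray(ref, float), (win, win))
    y = sliding_window_view(np.asarray(test, float), (win, win))
    axes = (-2, -1)
    mx, my = x.mean(axis=axes), y.mean(axis=axes)
    vx, vy = x.var(axis=axes), y.var(axis=axes)
    cov = ((x - mx[..., None, None]) * (y - my[..., None, None])).mean(axis=axes)
    # luminance, contrast and structure terms combined in the standard formula
    s = ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
    return float(s.mean())
```

For example, comparing an all-zero image with one whose pixels are all 0.1 gives a mean squared error of 0.01 and therefore a PSNR of 20 dB.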
To obtain the enhanced images, we implemented the filters in MATLAB. Variants of the VAD and THBH filters were tested rigorously to obtain optimal values of the quality metrics PSNR, SSIM, and NIQE. For VAD, testing was carried out with varying numbers of iterations, and the optimal metric values were obtained with 3 iterations. At each iteration, directional gradients (North, South, East, West) were computed for each colour channel, followed by calculation of the gradient magnitude and application of the diffusion process to preserve edges while smoothing noise. The conductivity function was computed using an exponential function. Similarly, the THBH filter was tested with structuring elements of different shapes (disk, diamond, cube, and rectangle) and different sizes. The optimal performance was obtained with a disk-shaped structuring element of radius 3. Table 5 compares VAD and THBH metric by metric; the values shown were obtained for a sample image from each image class.
From the metric values obtained, the VAD filter satisfies the selection criteria designed in Algorithm 3. Testing was carried out on sample images from each disease class and from the healthy class. In many disease categories, the VAD filter produces higher or equivalent PSNR and SSIM while consistently attaining lower NIQE values, indicating better perceptual quality. For example, for Anthracnose00005, VAD attained a PSNR of 29.7 and a NIQE of 5.0717, compared with 28.2277 and 8.2514 for the top-hat/bottom-hat technique, signifying better signal fidelity and naturalness. Comparable trends are observed for bacterial_blight00003, curl_virus00007, and sooty_mould00002, where VAD markedly lowers (improves) NIQE while keeping PSNR and SSIM competitive. In some instances, such as Citrus_Canker00009 and Dry_leaf00007, the top-hat/bottom-hat technique achieved marginally higher PSNR or SSIM but remained inferior in NIQE. This suggests that although both techniques can improve image features, VAD consistently provides a better balance of objective quality and perceptual realism, making it the more reliable option for enhancing images of plant diseases.
Fusion in ensemble learning is the technique of aggregating predictions from several base learners into a single output. This stage is essential because the usefulness of an ensemble depends not only on the diversity and accuracy of its base models but also on how well their outputs are merged. A key problem when fusing the predictions of multiple base learners is that giving all of them uniform priority can lower classification accuracy. This section gives a brief overview of the base learners used and their technical aspects, which are later used to develop the proposed model based on fuzzy ranking. Each base learner generates a confidence score indicating the probability that a disease is present. Using multiple customized base learners improves robustness and accuracy. The fuzzy ensemble method then integrates these confidence scores using fuzzy set theory, accounting for the uncertainty and variability inherent in the scores. The resulting fuzzy ensemble score serves as the deciding criterion for disease diagnosis. This method seeks to improve the accuracy and reliability of citrus disease detection by leveraging the strengths of various classifiers and incorporating uncertainty into the final decision. The weights of the early layers of the three base learners are frozen, while the subsequent layers are fine-tuned on the lemon leaf disease dataset.
To leverage the strengths of multiple base models and improve the robustness and accuracy of our predictions, we introduce a novel nonlinear Fuzzy Rank-Based Ensemble (NL-FuRBE). A fuzzy rank-based ensemble is a fusion methodology that integrates the advantages of multiple classifiers by allocating fuzzy ranks according to their efficacy or confidence, instead of relying solely on hard voting or averaging. It is particularly advantageous in tasks such as disease detection, where model uncertainty and overlapping class boundaries are common. The approach considers the prediction scores of each base classifier for every test case independently.
Three diverse base learners are employed in the proposed methodology: VGG-19, AlexNet, and Xception. Each base model is trained independently on the training set of the citrus leaf disease dataset to learn the underlying patterns and relationships within the data. After training, each base model generates prediction scores for each input sample in the test set. The complete steps are shown in Algorithm 4.
Each base model produces a prediction score for every class c (a disease class or the healthy class), where c = 1, 2, ..., C and C is the total number of classes; I denotes the number of base models. Our dataset has nine classes (C = 9), so with three base models each test sample yields a prediction set of I × C = 27 scores.
For each test sample, the prediction scores from each base model are converted into fuzzy ranks by three different nonlinear functions, giving three fuzzy ranks for each base model and each class in the dataset. The three nonlinear functions are defined in Eqs. (8), (9), and (10).
Equation (8) is a hyperbolic tangent function that is symmetric about a prediction score of 1. The rank is calculated from the prediction score: the lower the prediction score, the lower the rank allocated.
Equation (9) is an exponential function that assigns a rank based on the prediction score of each base classifier. As the predicted score approaches 1, the minimum rank value is assigned to that classifier for that class label.
Equation (10) is a sigmoid function, which is suitable for smoother transitions from higher to lower weights. It also generates a rank for each individual classifier based on its prediction score. Together, these ranks form the basis for generating the aggregate rank.
To obtain the final ensemble prediction for a test sample, the fuzzy ranks generated by all base models are aggregated: for each class c, an aggregated fuzzy rank is calculated by combining the ranks from all base models.
The fused rank score is given by:
The final predicted class for the test sample is the class with the smallest aggregated fuzzy rank:
Selecting the class with the minimum aggregated fuzzy rank as the final prediction prioritizes the class that consistently receives high confidence and low uncertainty across the ensemble of base models, thereby improving classification accuracy.
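The fuzzy-rank fusion steps can be sketched as follows. Because the exact Eqs. (8) to (10) and the fused-rank equation are not reproduced here, every functional form below is an assumption chosen only to match the qualitative behaviour described: the tanh rank is symmetric about a score of 1 and falls for lower scores, the exponential rank reaches its minimum as the score approaches 1, and the sigmoid rank decreases smoothly (its midpoint 0.5 and steepness 10 are hypothetical). Multiplying the three ranks and summing over base models is also an assumed aggregation rule; only the final argmin follows the decision rule stated in the text.

```python
import numpy as np


# Hypothetical stand-ins for Eqs. (8)-(10); the exact forms are assumptions.
def rank_tanh(p):
    # symmetric about p = 1; lower scores give lower values (cf. Eq. (8))
    return 1.0 - np.tanh((p - 1.0) ** 2 / 2.0)


def rank_exp(p):
    # reaches its minimum (0) as p approaches 1 (cf. Eq. (9))
    return 1.0 - np.exp(-((p - 1.0) ** 2) / 2.0)


def rank_sigmoid(p, steepness=10.0):
    # smooth, decreasing transition from high to low rank (cf. Eq. (10))
    return 1.0 / (1.0 + np.exp(steepness * (p - 0.5)))


def nl_fuzzy_rank_fusion(scores):
    """scores: (I, C) array of softmax outputs from I base models for one
    test sample. Returns (predicted class index, fused rank per class)."""
    p = np.asarray(scores, dtype=float)
    # combine the three fuzzy ranks per model and class (assumed: product)
    ranks = rank_tanh(p) * rank_exp(p) * rank_sigmoid(p)
    fused = ranks.sum(axis=0)  # aggregate over the I base models
    return int(np.argmin(fused)), fused  # smallest fused rank wins


# Usage: three base models (e.g. VGG-19, AlexNet, Xception), three classes
probs = [[0.7, 0.2, 0.1],
         [0.6, 0.3, 0.1],
         [0.2, 0.5, 0.3]]
pred, fused = nl_fuzzy_rank_fusion(probs)
```

With these assumed forms, the combined rank decreases monotonically as a score rises towards 1, so a class scored confidently by most base models receives the smallest fused rank even when one model disagrees, as for class 0 in the usage example.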