Texture feature extraction for the lung lesion density classification on computed tomography scan image

Radiological examination by computed tomography (CT) scan is used for the early detection of lung cancer in order to reduce the mortality rate. However, assessment and diagnosis by an expert are subjective and depend on the competence and experience of the radiologist. Hence, digital image processing of CT scans is needed as a tool to support lung cancer diagnosis. This research proposes a method for detecting the morphological characteristic of lung cancer lesion density using texture features based on the histogram and GLCM (Gray Level Co-occurrence Matrices). The multilayer perceptron (MLP), one of the best-known artificial neural network (ANN) architectures, is used to classify lesion density as heterogeneous or homogeneous. Fifty lung CT scan images obtained from the Department of Radiology of RSUP Dr. Sardjito Hospital, Yogyakarta, are used as the database. The results show that the proposed method achieved an accuracy of 98%, a sensitivity of 96%, and a specificity of 96%.


Introduction
Cancer is one of the leading causes of death worldwide, according to statistical data from the WHO (World Health Organization). In 2012, cancer caused 8.2 million deaths, of which lung cancer accounted for 1.59 million, the highest mortality compared with liver, stomach, colorectal, breast, and oesophageal cancer [1]. As reported by the WHO in 2014, lung cancer caused 30,866 deaths in Indonesia, 8,390 among females and 22,476 among males, which made it the most common cause of cancer death among males in Indonesia [2]. Lung cancer is an abnormal growth of cells in the lung tissue that develop into cancer cells [2] in one or both lungs, and it is most commonly caused by smoking [3]. It is divided into Small Cell Lung Cancer (SCLC) and Non-Small Cell Lung Cancer (NSCLC) [3].
Radiological examination by CT scan is one of the early detection methods for lung cancer, allowing diagnosis in the initial phase in order to reduce the mortality rate. The interpretation of the image and the assessment of lesion characteristics are subjective and vary with the experience of the radiologist. Hence, digital image processing of lung CT scan images is needed, with the expectation that it can provide a second opinion in the diagnosis process. Based on the morphological description of the CT scan image, there are a number of criteria for the diagnosis of primary lung cancer, including ground glass opacity, irregular spiculated margin, density, size of tumour, air bronchogram, lobulation, and enhancement [4]. This research focuses on detecting the morphological characteristic of lung cancer lesion density. Lesion density describes the density of the tissue and can be further divided into heterogeneous and homogeneous density. Digital image processing is required to detect this characteristic.
A number of studies have been conducted on texture feature extraction. Uyun [5] detected density patterns on mammogram images using GLCM (Gray Level Co-occurrence Matrices) feature extraction, and the results showed strong significance for the determination of breast cancer. Devan, et al [6] used texture features from GLCM, GLRLM (Gray Level Run Length Matrices), and entropy to identify the characteristics of three lung tissue types: normal, fibrosis, and carcinoma. Their results showed that the features used can differentiate the three types of lung tissue properly. Texture feature extraction using the histogram and GLCM was also conducted by Patil, et al [7] to distinguish benign from malignant cancer on X-ray images. Tun, et al [8] applied Otsu segmentation and GLCM texture feature extraction to identify lung cancer stages I, II, III, and IV. A number of other researchers have used the MLP (Multilayer Perceptron) for classification. One example is the research conducted by Anand [9], which classified lung tumours as cancer or normal. Ahmad, et al [10] and Mitrea, et al [11] also implemented MLP to classify images of colorectal cancer, while Valarmathi, et al [12] used it to classify mammogram images.
In this research, the morphological characteristic of lung cancer lesion density is identified using texture feature extraction followed by classification of the lesion density. The image processing phases are pre-processing by cropping the RoI (Region of Interest), segmentation using the Otsu method, a morphological operation, and texture feature extraction based on the histogram and GLCM. The extracted texture features are then used in the classification phase, which uses the MLP method.

Materials and Methods
This research uses data from the Department of Radiology of RSUP Dr. Sardjito Hospital, Yogyakarta, consisting of 50 CT scan images of lung cancer. The aim of this research is to identify the morphological characteristic of lesion density from CT scan images in cases of primary lung cancer. Fig. 1 illustrates the block diagram of the research, which includes pre-processing, segmentation, morphological operation, feature extraction, and classification.

Pre-processing
At the pre-processing stage, the image is cropped to the RoI as an initial step so that the analysis focuses on the lesion. This process is conducted manually to facilitate the identification of the morphology of lesion density. The result of cropping the RoI is shown in Fig. 2.

Otsu Segmentation
Otsu segmentation classifies pixels into two parts, object and background [13][14], by automatically calculating a threshold value from the input image [15]. First, the probability of intensity value i in the histogram is calculated by (1), where N is the total number of pixels in the image and $n_i$ is the number of pixels with intensity i.
$$p(i) = \frac{n_i}{N} \qquad (1)$$
The weights of the two classes, object and background, for a candidate threshold t are calculated with (2) and (3), in which L is the number of grey levels.
$$w_1(t) = \sum_{i=0}^{t} p(i) \qquad (2)$$
$$w_2(t) = \sum_{i=t+1}^{L-1} p(i) = 1 - w_1(t) \qquad (3)$$
The means of the object and the background are calculated with (4) and (5).
$$m_1(t) = \sum_{i=0}^{t} \frac{i\, p(i)}{w_1(t)} \qquad (4)$$
$$m_2(t) = \sum_{i=t+1}^{L-1} \frac{i\, p(i)}{w_2(t)} \qquad (5)$$
Equation (6) gives $\sigma_B^2$, called the between-class variance (BCV), and the total mean $m_T$ is calculated with (7).
$$\sigma_B^2(t) = w_1(t)\,[m_1(t) - m_T]^2 + w_2(t)\,[m_2(t) - m_T]^2 \qquad (6)$$
$$m_T = \sum_{i=0}^{L-1} i\, p(i) \qquad (7)$$
The optimum threshold is the value of t that maximizes the BCV, and finding it requires relatively little computation time [13].
Fig. 3 shows the segmentation process using the Otsu method. The result of the Otsu segmentation shows the shape of the lesion object separated from the background.
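The thresholding procedure described in this section can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `otsu_threshold`, the 0-based grey-level indexing, and the toy pixel list `roi` are all hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t that maximizes the between-class variance."""
    n = len(pixels)
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    p = [h / n for h in hist]                         # p(i) = n_i / N
    m_total = sum(i * p[i] for i in range(levels))    # total mean

    best_t, best_var = 0, -1.0
    w1 = 0.0    # weight of the first class (levels 0..t)
    cum = 0.0   # running sum of i * p(i) over levels 0..t
    for t in range(levels - 1):
        w1 += p[t]
        cum += t * p[t]
        w2 = 1.0 - w1
        if w1 <= 0.0 or w2 <= 0.0:
            continue                                  # one class is empty
        m1 = cum / w1
        m2 = (m_total - cum) / w2
        var_b = w1 * (m1 - m_total) ** 2 + w2 * (m2 - m_total) ** 2
        if var_b > best_var:
            best_t, best_var = t, var_b
    return best_t


# Hypothetical toy data: four dark pixels and four bright pixels.
roi = [10, 12, 11, 13, 200, 210, 205, 198]
t = otsu_threshold(roi)
binary = [1 if v > t else 0 for v in roi]             # 1 = brighter class
```

Every threshold between the two clusters gives the same partition here, so the sketch simply keeps the first maximizer it meets.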

Morphological Operation
The output of Otsu segmentation is a binary image that is used as a template (mask) to obtain the lesion area by applying a simple logical operation such as AND, OR, or NOT. The AND operation is used in this research. Fig. 4 shows the process of the morphological operation.
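As a rough illustration of the AND step, the sketch below masks an 8-bit greyscale image with a binary template whose foreground value is 255, so that a bitwise AND keeps the original grey level inside the lesion and yields 0 elsewhere. The function name `apply_mask`, the 0/255 mask encoding, and the small arrays are assumptions made for the example.

```python
def apply_mask(gray, mask):
    """Bitwise AND of an 8-bit image with a 0/255 binary mask."""
    return [[g & m for g, m in zip(g_row, m_row)]
            for g_row, m_row in zip(gray, mask)]


# Hypothetical 3x3 RoI and the binary template from segmentation.
gray = [[50, 120, 130],
        [40, 125, 135],
        [45,  60,  55]]
mask = [[0, 255, 255],
        [0, 255, 255],
        [0,   0,   0]]
lesion = apply_mask(gray, mask)   # background pixels become 0
```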

Feature Extraction
Texture extraction methods fall into three groups: statistical, structural, and spectral methods. This research uses the statistical method, with first-order features based on the histogram and second-order features based on the GLCM, because these features can identify the density of the constituent tissues from grey-level intensities and gave the highest performance in a number of previous studies.

Histogram-Based Texture
The simplest way to extract statistical texture properties is the first-order approach based on the histogram. The histogram-based statistical properties of an image texture are calculated using the features in (8)-(13) [13].
1) Mean
$$m = \sum_{i=0}^{L-1} i\, p(i) \qquad (8)$$
This formula gives the mean brightness of the object. Here, m is the mean value, i is a grey level in the image, p(i) is the probability of occurrence of i, and L is the number of grey levels.
Skewness measures the degree of asymmetry about the mean. It is negative (-) if the histogram is skewed toward the left of the mean, and positive (+) otherwise.

5) Entropy
$$\text{Entropy} = -\sum_{i=0}^{L-1} p(i) \log_2 p(i) \qquad (12)$$
Entropy represents the level of complexity of an image. The higher the value, the higher the complexity of the image. Entropy also indicates the quantity of information contained in the data distribution.

6) Smoothness
The smoothness value measures how smooth the intensity of an image is. A low smoothness value indicates that the image has rough intensity.
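The six first-order features can be computed directly from the normalized histogram, as sketched below. The mean, skewness, and entropy follow the expressions given in the text. The energy and smoothness expressions are not recoverable from the text, so the forms used here (sum of squared probabilities, and 1 - 1/(1 + normalized variance)) are common textbook definitions adopted as assumptions. The function name and the toy data are hypothetical.

```python
import math


def histogram_features(pixels, levels=256):
    """First-order texture features from the grey-level histogram."""
    n = len(pixels)
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    p = [h / n for h in hist]                          # p(i)

    mean = sum(i * p[i] for i in range(levels))
    var = sum((i - mean) ** 2 * p[i] for i in range(levels))
    std = math.sqrt(var)
    skewness = sum((i - mean) ** 3 * p[i] for i in range(levels))
    # Assumption: energy taken as the sum of squared probabilities.
    energy = sum(pi * pi for pi in p)
    entropy = -sum(pi * math.log2(pi) for pi in p if pi > 0)
    # Assumption: smoothness as 1 - 1/(1 + variance), with the variance
    # normalized by (levels - 1)^2 so the result lies in [0, 1).
    var_n = var / (levels - 1) ** 2
    smoothness = 1 - 1 / (1 + var_n)
    return {"mean": mean, "std": std, "skewness": skewness,
            "energy": energy, "entropy": entropy, "smoothness": smoothness}


# Hypothetical 4-level example: half the pixels at level 0, half at level 2.
feats = histogram_features([0, 0, 2, 2], levels=4)
```

For this symmetric two-level example the skewness is zero and the entropy is exactly one bit, which is a convenient sanity check.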

Gray Level Co-occurrence Matrices (GLCM)
The GLCM method was first published by Haralick in 1973 with 28 feature values. GLCM measures texture in the second order by considering the relationship between pairs of pixels in the original image [13].
Let f(x, y) be an image of size $N_x \times N_y$ whose pixels can take $L$ grey levels, and let $\vec{r}$ be the spatial direction vector. $GLCM_{\vec{r}}(i, j)$ is defined as the number of pixels with value $j \in 1, \ldots, L$ that occur at offset $\vec{r}$ from a pixel with value $i \in 1, \ldots, L$, which can be stated as [16]:
$$GLCM_{\vec{r}}(i, j) = \#\{((x_1, y_1), (x_2, y_2)) \in (N_x, N_y) \times (N_x, N_y) \mid f(x_1, y_1) = i \wedge f(x_2, y_2) = j \wedge \vec{r} = \overrightarrow{(x_2 - x_1,\, y_2 - y_1)}\}$$
In this case, the offset $\vec{r}$ encodes the angle and distance between the two pixels.
For example, Fig. 5 shows the four directions used for the GLCM.
Entropy measures the complexity of the grey levels of an image. Its value is low if the elements of the GLCM are close to 0 or 1, and high if the elements of the GLCM are relatively equal.

5) Correlation
Correlation measures the linear dependency between pixel pairs and provides information regarding linear structures in the image.
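The second-order features can be illustrated with a small pure-Python sketch that builds a normalized co-occurrence matrix for one offset and derives ASM, contrast, IDM, entropy, and correlation from it. The feature formulas are the usual Haralick-style definitions and are assumptions here, since the corresponding equations are not available in the text. The four offsets (0, 45, 90, and 135 degrees at distance 1), the 0-based grey levels, the function names, and the 4x4 test image are likewise hypothetical.

```python
import math


def glcm(image, dx, dy, levels):
    """Normalized co-occurrence matrix for the pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for y in range(rows):
        for x in range(cols):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < cols and 0 <= y2 < rows:
                counts[image[y][x]][image[y2][x2]] += 1
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]


def glcm_features(P):
    """ASM, contrast, IDM, entropy and correlation of a normalized GLCM."""
    n = len(P)
    cells = [(i, j, P[i][j]) for i in range(n) for j in range(n)]
    asm = sum(p * p for _, _, p in cells)
    contrast = sum((i - j) ** 2 * p for i, j, p in cells)
    idm = sum(p / (1 + (i - j) ** 2) for i, j, p in cells)
    entropy = -sum(p * math.log2(p) for _, _, p in cells if p > 0)
    mu_i = sum(i * p for i, _, p in cells)
    mu_j = sum(j * p for _, j, p in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * p for i, _, p in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * p for _, j, p in cells))
    cov = sum((i - mu_i) * (j - mu_j) * p for i, j, p in cells)
    # Correlation is undefined for a constant image; report 0.0 in that case.
    correlation = cov / (sd_i * sd_j) if sd_i > 0 and sd_j > 0 else 0.0
    return {"asm": asm, "contrast": contrast, "idm": idm,
            "entropy": entropy, "correlation": correlation}


# Assumed offsets (dx, dy) for the four directions at distance 1.
OFFSETS = {0: (1, 0), 45: (1, -1), 90: (0, -1), 135: (-1, -1)}

image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
features = {angle: glcm_features(glcm(image, dx, dy, levels=4))
            for angle, (dx, dy) in OFFSETS.items()}
```

Five features over four directions give 20 values per image. Together with six histogram features this would account for the 26 features reported, although that breakdown is an inference rather than something the text states.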

Classification
Artificial neural networks can be used for classification and for recognizing object patterns [18]. The MLP is the artificial neural network architecture most widely used in teaching and in applications [19]. Its ability to learn and to deliver good classification performance has been demonstrated in a number of studies [10] [20]. At the classification stage, this research uses an MLP with three layers: an input layer, a hidden layer, and an output layer. The classification was conducted using the Weka machine learning software [21]. K-fold cross validation was chosen to evaluate performance on the training and testing features of the dataset [22]. The architecture of the MLP used in this research is illustrated in Fig. 6.
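To make the evaluation protocol concrete, the sketch below partitions 50 labelled samples into k folds while keeping the two classes roughly balanced in each fold. It only illustrates the idea of k-fold cross validation. The value k = 10 is an assumption (the text does not state k), the function name `stratified_folds` is hypothetical, and the study itself used Weka rather than this code.

```python
def stratified_folds(labels, k):
    """Assign sample indices to k folds with roughly balanced classes."""
    folds = [[] for _ in range(k)]
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    pos = 0
    for lab in sorted(by_class):
        for idx in by_class[lab]:
            folds[pos % k].append(idx)    # deal indices out round-robin
            pos += 1
    return folds


# 25 heterogeneous and 25 homogeneous images, as in the dataset description.
labels = ["heterogeneous"] * 25 + ["homogeneous"] * 25
folds = stratified_folds(labels, k=10)    # k = 10 is an assumed value

splits = []
for test_idx in folds:
    train_idx = [j for f in folds if f is not test_idx for j in f]
    splits.append((train_idx, test_idx))  # train on 45 images, test on 5
```

Each sample is used for testing exactly once, and the per-fold predictions can then be pooled into a single confusion matrix.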

Results and Discussion
In the experiment, histogram- and GLCM-based texture features were extracted from 50 images. Table 1 shows the average value of each feature for heterogeneous and homogeneous lesions, which is used to differentiate the two classes. MLP-based classification separates heterogeneous and homogeneous lesions with high accuracy. Fig. 7 shows the confusion matrix of the proposed method: TP = 24, the number of images with heterogeneous lesions recognized as heterogeneous out of 25 cases; TN = 25, all images with homogeneous lesions recognized as homogeneous; FN = 1, a single image with heterogeneous lesion density recognized as homogeneous; and FP = 0, no image with a homogeneous lesion recognized as heterogeneous.
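The confusion-matrix counts can be turned into performance measures with the textbook definitions of accuracy, sensitivity, and specificity, as in the short sketch below. The function name is hypothetical, and the definitions are the standard ones rather than formulas taken from the text.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)      # true positive rate
    specificity = tn / (tn + fp)      # true negative rate
    return accuracy, sensitivity, specificity


# Counts reported for the confusion matrix in Fig. 7.
acc, sens, spec = classification_metrics(tp=24, tn=25, fp=0, fn=1)
```

With these counts the accuracy is 49/50 and the sensitivity is 24/25. Because FP = 0, the textbook definition gives a specificity of 25/25, which differs from the 96% stated in the text; the text does not say how that figure was derived.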

Conclusion
This research proposes a method to identify and classify the lesion density of primary lung cancer using histogram- and GLCM-based texture feature extraction. The combination of histogram- and GLCM-based texture features achieved an accuracy of 98%, a sensitivity of 96%, and a specificity of 96%. The results show quantitatively that the two methods are able to distinguish heterogeneous from homogeneous lesion density in lung cancer. Thus, the method can help radiologists in interpreting the image. Furthermore, the proposed method can be recommended as one part of CAD development for lung cancer diagnosis. For future work, other segmentation and feature extraction techniques for lesion density detection are suggested.

Fig. 1. Block diagram of the identification of lesion density morphology

Fig. 3. Process of Otsu segmentation: (a) Image of RoI; (b) Segmentation image
Standard deviation ($\sigma$) refers to the level of statistical spread, measuring the pattern of the data spread, and provides the level of contrast.
3) Skewness
$$\text{Skewness} = \sum_{i=0}^{L-1} (i - m)^3\, p(i)$$

Fig. 6. Architecture of MLP for the classification of lesion density
The performance of classification is measured in terms of accuracy, sensitivity, and specificity, as expressed in (24)-(26), where TP, TN, FP, and FN are true positive, true negative, false positive, and false negative, respectively.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (24)$$
$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (25)$$
$$\text{Specificity} = \frac{TN}{TN + FP} \qquad (26)$$

Fig. 7. The confusion matrix of the proposed method
From the experimental results, the features ranked by their influence on distinguishing the two classes are contrast, skewness, mean, entropy, ASM, energy, IDM, standard deviation, smoothness, and correlation. The values of all histogram- and GLCM-based texture features are required in the classification process. The input data consist of 50 images: 25 heterogeneous and 25 homogeneous. In total, 26 features are combined for each image. Table 2 shows the accuracy, sensitivity, and specificity classification rates.