# Determination of the co-deletion status of 1p/19q chromosomal arms in low-grade glioma by cross-correlation-periodogram model analysis

In our study, the 1p/19q co-deletion status was determined by analyzing S-MRI images of the glioma subjects described in Section 2.1, following the steps described in the subsections below.

### Glioma Segmentation and Data Normalization

The glioma region was extracted from the MRI using the ground truth provided in the TCIA database: the ground-truth images served as masks to segment the entire glioma. After segmentation, each glioma image was normalized to z-scores using Eq. 1 in order to balance the intensity.

$$ z = \frac{X(i, j) - \mu}{\sigma} \tag{1} $$

where $\mu$ and $\sigma$ denote the mean and standard deviation of the image $X(i, j)$, respectively.
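The masking and z-score steps can be illustrated with a minimal sketch. The function name `segment_and_normalize`, the toy arrays, and the choice to compute the mean and standard deviation over the masked tumor pixels only are assumptions made for illustration; the text does not state over which pixels the statistics are taken.

```python
import numpy as np

def segment_and_normalize(slice_img, mask):
    """Keep only the masked (tumor) pixels and z-score them as in Eq. 1.

    Assumption: mu and sigma are computed over the tumor pixels only;
    pixels outside the mask are set to zero.
    """
    img = np.asarray(slice_img, dtype=float)
    roi = np.asarray(mask, dtype=bool)
    out = np.zeros_like(img)
    if not roi.any():
        return out                      # empty mask: nothing to normalize
    mu = img[roi].mean()
    sigma = img[roi].std()
    if sigma == 0:
        return out                      # constant ROI: z-score undefined
    out[roi] = (img[roi] - mu) / sigma
    return out

# Toy example: random 8x8 "slice" with a square ground-truth mask.
rng = np.random.default_rng(0)
image = rng.uniform(0, 255, size=(8, 8))
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True
z = segment_and_normalize(image, mask)
```

After normalization, the tumor pixels have zero mean and unit standard deviation, which makes intensities comparable across slices and subjects.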

### Detection of Tumor Tissue Heterogeneity Across Slices by VoCC

The change in tumor heterogeneity between slices was studied using cross-correlation (CC), which assesses whether two successive slices of the MRI volume (here, the glioma ROIs) share common characteristics and are therefore correlated. CC analysis can thus reveal whether differences in the molecular characteristics of the glioma result in differences in the structural arrangement across slices.

Given two successive glioma slices X and Y, the 2D cross-correlation function is defined as

$$ C(i, j) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} X(m, n)\, Y(m - i, n - j) \tag{2} $$

where $-(M-1) \le i \le (M-1)$ and $-(N-1) \le j \le (N-1)$.
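As a rough illustration of the 2D cross-correlation, the sketch below evaluates the double sum directly for every lag pair. It assumes both slices have the same shape $M \times N$ and that terms falling outside the image are treated as zero; the function name `cross_correlation_2d` is hypothetical.

```python
import numpy as np

def cross_correlation_2d(x, y):
    """Direct 2D cross-correlation of two equally sized slices.

    Returns a (2M-1) x (2N-1) array; lag (i, j) is stored at
    index (i + M - 1, j + N - 1). Out-of-range terms count as zero.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    M, N = x.shape
    if y.shape != (M, N):
        raise ValueError("both slices must have the same shape")
    c = np.zeros((2 * M - 1, 2 * N - 1))
    for i in range(-(M - 1), M):
        for j in range(-(N - 1), N):
            # Rows/columns of x for which y(m - i, n - j) is inside the image.
            m0, m1 = max(0, i), min(M, M + i)
            n0, n1 = max(0, j), min(N, N + j)
            c[i + M - 1, j + N - 1] = np.sum(
                x[m0:m1, n0:n1] * y[m0 - i:m1 - i, n0 - j:n1 - j]
            )
    return c

a = np.array([[1.0, 2.0], [3.0, 4.0]])
c = cross_correlation_2d(a, a)   # autocorrelation of a 2x2 toy slice
```

The direct loop is slow for full-size slices; it is written this way only to mirror the summation term by term.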

To evaluate how the heterogeneity of the tumor volume changes between two successive slices, we proposed a new feature, the variance of CC (VoCC), which examines the change in CC over different lag values. VoCC was derived as:

$$ \sigma_{CC}^{2} = \frac{\sum_{i=1}^{2M-1} \left( C(i, j) - \overline{C}(\bullet, j) \right)^{2}}{2M-2} \tag{3} $$

where

$$ \overline{C}(\bullet, j) = \frac{1}{2M-1} \sum_{i=1}^{2M-1} C(i, j) $$
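A short sketch of the VoCC computation follows. It assumes the cross-correlation is stored as a $(2M-1) \times (2N-1)$ array with the lag $i$ along the first axis, so the result is one variance value per lag $j$; the function name `vocc` is hypothetical.

```python
import numpy as np

def vocc(c):
    """Variance of the cross-correlation over lag i, for each lag j (Eq. 3)."""
    c = np.asarray(c, dtype=float)
    c_bar = c.mean(axis=0)                    # mean over i for each j
    n_lags = c.shape[0]                       # 2M - 1 lags along i
    return np.sum((c - c_bar) ** 2, axis=0) / (n_lags - 1)   # divide by 2M - 2

# Toy cross-correlation with 3 lags along i and 2 lags along j.
c_demo = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
v = vocc(c_demo)
```

This is the unbiased sample variance, so it should agree with `np.var(c, axis=0, ddof=1)`.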

Since VoCC quantifies the change in uniformity of intensity values over successive slices, this measure is relevant for comparing the volumetric (between-slice) behavior of mutant and wild-type gliomas. An application of the proposed VoCC was reported in our previous work [16], where the source distribution of VoCC was calculated to verify whether significant differences exist between the two classes of gliomas.

### Feature extraction

#### Examination of the presence of 3D periodicity in 1p/19q co-deleted and non-co-deleted gliomas

The VoCC obtained for gliomas with and without a 1p/19q co-deletion exhibited marked visible differences. The essence of these differences was captured by extracting features suitable for classifying mutant versus wild-type gliomas. As noted above, our previous work [16] showed that the source distribution of VoCC differs significantly between the two glioma subtypes. In the present study, the applicability of VoCC for assessing glioma heterogeneity is investigated further by determining its volumetric periodicity. The power spectral density (PSD) of the VoCC for each class was estimated with the Lomb-Scargle power spectral density estimate (LSPSD) to illustrate the differences in the spectral signatures of the two classes. The Lomb-Scargle (LS) periodogram, proposed by Lomb (1976) and Scargle (1982), is an algorithm for detecting and characterizing periodicity [17, 18]. Only a few studies in the literature have used the LS periodogram to find periodic patterns in genetics and biological rhythmic processes [19, 20, 21, 22].

The LS periodogram was formulated as follows:

$$ P_{LS}(f) = 0.5\, \frac{\left( \sum_{n} \sigma_{CC}^{2} \cos(2\pi f[t_{n}-\tau]) \right)^{2}}{\sum_{n} \sigma_{CC}^{2} \cos^{2}(2\pi f[t_{n}-\tau])} + 0.5\, \frac{\left( \sum_{n} \sigma_{CC}^{2} \sin(2\pi f[t_{n}-\tau]) \right)^{2}}{\sum_{n} \sigma_{CC}^{2} \sin^{2}(2\pi f[t_{n}-\tau])} \tag{4} $$

where $\sigma_{CC}^{2}$ is the VoCC given by Eq. 3, treated as a function of $t$, and $\tau$ is the time offset, specified for each frequency $f$ to ensure time-shift invariance:

$$ \tau = \frac{1}{4\pi f} \tan^{-1} \frac{\sum_{n} \sin(4\pi f t_{n})}{\sum_{n} \cos(4\pi f t_{n})} $$
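For orientation, the sketch below implements the textbook Lomb-Scargle periodogram, in which the denominators contain only the trigonometric sums and the signal mean is removed first. It is an assumption that this matches the estimator used in the study closely enough to illustrate the idea; the function name `lomb_scargle`, the frequency grid, and the synthetic trace standing in for a VoCC trace are all hypothetical.

```python
import numpy as np

def lomb_scargle(t, x, freqs):
    """Textbook Lomb-Scargle periodogram of samples x taken at positions t."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                       # assumption: mean removed first
    power = np.empty(len(freqs))
    for k, f in enumerate(freqs):
        w = 2.0 * np.pi * f
        # Offset tau makes the cosine and sine terms orthogonal, which
        # gives invariance to a shift of the t axis.
        tau = np.arctan2(np.sum(np.sin(2 * w * t)),
                         np.sum(np.cos(2 * w * t))) / (2 * w)
        c = np.cos(w * (t - tau))
        s = np.sin(w * (t - tau))
        power[k] = 0.5 * ((x @ c) ** 2 / (c @ c) + (x @ s) ** 2 / (s @ s))
    return power

# Synthetic stand-in for a VoCC trace: a sinusoid at 0.1 cycles per sample.
t = np.arange(50)
trace = np.sin(2 * np.pi * 0.1 * t)
freqs = np.linspace(0.01, 0.45, 45)        # avoids f = 0 and the Nyquist bin
power = lomb_scargle(t, trace, freqs)
```

On this synthetic trace the periodogram peaks at the injected frequency. The grid deliberately avoids frequencies at which the sine or cosine sum vanishes, since the corresponding ratio would be undefined.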

The 3D volumetric periodicity of each LGG subject was measured by taking the inner product of the VoCC periodograms of two successive slices. If the VoCC traces of successive slices show different patterns, the inner product of the respective periodograms will have the following characteristics:

1. (a) The inner product of the two spectra will differ significantly from the input spectra, with an unequal number of peaks.
2. (b) The corresponding peaks of the two periodograms may not occur at the same location.
3. (c) The corresponding peak amplitudes of the two spectra will differ greatly.

As a result, the corresponding peaks of the two periodograms may not coincide, and their amplitudes will differ greatly. The inner product will therefore be an almost flat spectrum with reduced oscillation.

Conversely, when the VoCC shows a similar pattern across slices, the dominant peak will be visible at the same location in both spectra. The two periodograms will have an equal number of peaks, the corresponding peaks will occur at similar locations, and their amplitudes will be almost equal. As a result, the inner product of the respective periodograms will display a profile similar to that of the input spectra, with comparatively more oscillations.

The above procedure for determining the change in 3D periodicity across MR slices was applied to each LGG subject to predict the presence of the 1p/19q co-deletion. We hypothesize that the change in periodic pattern across slices is negligible for cases with a 1p/19q co-deletion. The change in volumetric periodicity is quantified by extracting spectral features suitable for classification. The extracted spectral features include:

1. (i) Differential energy between two periodograms.
2. (ii) Total volumetric energy: defined as the total energy of the inner product of two periodograms.
3. (iii) 3D LSPSD cutoff frequency: defined as the frequency at which the amplitude of the LSPSD estimate is almost zero.
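The three spectral features can be sketched as follows. The text does not give exact formulas, so several choices here are assumptions: the product of two periodograms is taken point-wise (so that the result is itself a spectrum), energy is the sum of squared amplitudes, and "almost zero" means below a small fraction `rel_tol` of the peak of the product spectrum. The function name `spectral_features` is hypothetical.

```python
import numpy as np

def spectral_features(p1, p2, freqs, rel_tol=1e-3):
    """Differential energy, total volumetric energy, and cutoff frequency.

    p1, p2: periodograms of two successive slices on the same grid `freqs`.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    prod = p1 * p2                                   # assumed point-wise product
    diff_energy = abs(np.sum(p1 ** 2) - np.sum(p2 ** 2))
    total_energy = np.sum(prod ** 2)
    # Cutoff: first frequency at or after the dominant peak where the
    # product spectrum falls below rel_tol times its maximum.
    peak = int(np.argmax(prod))
    below = np.flatnonzero(prod[peak:] < rel_tol * prod.max())
    cutoff = freqs[peak + below[0]] if below.size else freqs[-1]
    return diff_energy, total_energy, cutoff

# Toy periodograms with identical shape on a four-point frequency grid.
f_grid = np.array([0.1, 0.2, 0.3, 0.4])
pa = np.array([1.0, 2.0, 0.0, 0.0])
pb = np.array([1.0, 2.0, 0.0, 0.0])
d_e, t_e, f_c = spectral_features(pa, pb, f_grid)
```

Under these definitions, two identical periodograms give zero differential energy, consistent with the hypothesis that the periodic pattern changes little across slices in co-deleted cases.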

### RUSBoost classification

The dataset considered in our study is imbalanced, with a ratio of 2:1 (mutated : wild type). Building an efficient classification model in such cases is difficult. When one class has many more examples than another (data imbalance), the performance of traditional machine learning classifiers drops dramatically. These algorithms tend to predict only the majority (negative) class, while the minority (positive) class is treated as noise and is often ignored. Thus, there is a high probability of misclassifying the positive class by assigning all instances to the negative class. The two most common techniques used to alleviate this class imbalance problem are data sampling and boosting [9]. Data sampling balances the class distribution either by removing samples from the majority class (undersampling) or by adding samples to the minority class (oversampling). Boosting, by contrast, can improve the performance of any weak classifier by iteratively building an ensemble model. At each iteration, the weights of the samples that were misclassified during the current iteration are modified. This is very effective for the class imbalance problem, because the highest weights are assigned to the minority class examples, which are the most likely to be misclassified in subsequent iterations. RUSBoost is a hybrid sampling/boosting algorithm that incorporates random undersampling (RUS), a technique that randomly removes data samples from the majority class [9].

Let the dataset $V$ contain $n$ examples, each represented by a tuple $(x_{k}, y_{k})$, where $x_{k}$ is a point in the feature space $X$ and $y_{k}$ is the class label in the set of class labels $Y$. The algorithm begins by initializing the weight of each example to $1/n$. If the total number of iterations is denoted $P$ (the number of classifiers in the ensemble model), then $P$ weak hypotheses $H_{t}$ are trained iteratively ($t = 1$ to $P$) using a classification algorithm, WeakLearn, as follows. First, RUS removes majority class examples until the minority and majority classes are balanced (1:1). This yields a new training set $V_{t}^{'}$ with a new weight distribution $W_{t}^{'}$. Next, $V_{t}^{'}$ and $W_{t}^{'}$ are passed to WeakLearn (the base learner) to create the weak hypothesis $H_{t}$. The pseudo-loss $\delta_{t}$ is then calculated on the original training set $V$ and weight distribution $W_{t}$. After that, the weight distribution for the next iteration, $W_{t+1}$, is updated using the weight update parameter $\alpha_{t}$ and then normalized. Finally, after $P$ iterations, the final hypothesis $\Theta(x)$ is returned as a weighted vote of the weak hypotheses.
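The loop described above can be sketched for the binary case as follows. This is a simplified illustration, not the implementation used in the study: the weak learner is a hypothetical weighted decision stump, label 1 is assumed to be the minority class, and the pseudo-loss is the weighted misclassification rate, which is what the AdaBoost.M2 pseudo-loss (with its factor of 1/2) reduces to for hard binary predictions.

```python
import numpy as np

def fit_stump(X, y, w):
    """Hypothetical weak learner: best weighted one-feature threshold rule."""
    best = (np.inf, 0, 0.0, 1)
    for feat in range(X.shape[1]):
        u = np.unique(X[:, feat])
        thresholds = (u[:-1] + u[1:]) / 2 if len(u) > 1 else u
        for thr in thresholds:
            for sign in (1, -1):
                pred = (sign * (X[:, feat] - thr) > 0).astype(int)
                err = np.sum(w[pred != y])
                if err < best[0]:
                    best = (err, feat, thr, sign)
    return best[1:]

def stump_predict(stump, X):
    feat, thr, sign = stump
    return (sign * (X[:, feat] - thr) > 0).astype(int)

def rusboost_fit(X, y, n_rounds=10, seed=0):
    rng = np.random.default_rng(seed)
    w = np.full(len(y), 1.0 / len(y))            # initial weights 1/n
    minority = np.flatnonzero(y == 1)            # assumption: 1 = minority
    majority = np.flatnonzero(y == 0)
    ensemble = []
    for _ in range(n_rounds):
        # RUS: drop majority examples at random until the classes are 1:1.
        keep = rng.choice(majority, size=len(minority), replace=False)
        idx = np.concatenate([minority, keep])
        w_sub = w[idx] / w[idx].sum()            # weights of the reduced set
        stump = fit_stump(X[idx], y[idx], w_sub)
        # Pseudo-loss on the ORIGINAL data and weights.
        miss = stump_predict(stump, X) != y
        delta = np.clip(np.sum(w[miss]), 1e-10, 1 - 1e-10)
        alpha = delta / (1 - delta)              # weight update parameter
        w = w * np.where(miss, 1.0, alpha)       # shrink correct examples
        w = w / w.sum()                          # normalize
        ensemble.append((stump, np.log(1 / alpha)))
    return ensemble

def rusboost_predict(ensemble, X):
    votes = np.zeros((len(X), 2))
    for stump, weight in ensemble:               # weighted vote
        votes[np.arange(len(X)), stump_predict(stump, X)] += weight
    return votes.argmax(axis=1)

# Toy 2:1 imbalanced, linearly separable data with one feature.
X_demo = np.concatenate([np.linspace(-3, -2, 40), np.linspace(2, 3, 20)])[:, None]
y_demo = np.concatenate([np.zeros(40, dtype=int), np.ones(20, dtype=int)])
model = rusboost_fit(X_demo, y_demo, n_rounds=5)
pred = rusboost_predict(model, X_demo)
```

The sketch requires at least as many majority as minority examples, which holds for a 2:1 class ratio. A library implementation with tree-based weak learners would normally be preferred in practice.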