Based on Figure 5, PCA was applied to the data set to compare the compositional patterns among the analyzed water samples and to identify the factors that relate the parameters to each other (Singh et al., ). PCA was performed on the raw data set comprising all 13 water quality parameters (DO, BOD, COD, SS, pH, NH3-NL, DS, TS, NO3, Cl-, PO4, E. coli, coliform) with 275 observations to identify the pollution sources. PCA describes the relationships among analytical variables better than any single analytical variable alone. VF1 (eigenvalue 4.213) represents 21.40% of the total variability on one axis and comprises DO, AN and PO4. VF1 also shows moderate loadings for coliform and E. coli, while DO was negatively correlated with AN and PO4, owing to the decrease in DO values with increasing AN and PO4 inputs to the water body of the Kuantan River. VF2 combines DS, TS and Cl into a new variable with strong factor loadings.
According to Table 1 and Figure 6a, DO, AN and PO4 were strongly correlated with VF1 (32.409% of variance), a new variable termed fertilizer waste, which suggests that NH4 is likely to come from nearby animal farms and agricultural nonpoint sources (Crowther et al., ; Singh et al., ; Song et al., ). The moderate loadings of coliform and E. coli suggest a minimal contribution of fecal pollution to the agricultural waste in the Kuantan River Basin.
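The link between an eigenvalue and its share of variance can be checked with a short calculation. The sketch below assumes the PCA was run on standardized variables (a correlation matrix), so that the total variance equals the number of parameters; the function name is illustrative and not taken from the study.

```python
def explained_variance_pct(eigenvalue: float, n_variables: int) -> float:
    """Percentage of total variance carried by one component.

    Assumes PCA on standardized variables, where each variable contributes
    unit variance and the total variance equals n_variables.
    """
    if n_variables <= 0:
        raise ValueError("n_variables must be positive")
    return 100.0 * eigenvalue / n_variables


# Eigenvalue quoted for VF1 and the 13 water quality parameters in the text.
vf1_share = explained_variance_pct(4.213, 13)
print(f"VF1 share of variance: {vf1_share:.2f}%")
```

Under this assumption the eigenvalue of 4.213 corresponds to roughly 32.4% of the variance, in line with the value given for VF1 in Table 1.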
As shown in Table 1 and Figure 6b, VF2 (20.430% of variance) was named surface runoff, with high factor loadings for DS, TS and Cl-. VF3 (9.605% of variance) (Figure 6c) was strongly correlated with BOD and COD, representing the influence of anthropogenic input, typically organic pollution such as runoff from solids or waste disposal activities (Song et al., ). VF4 (9.110% of variance) and VF5 (7.858% of variance) differed from the other VFs in that only one parameter was significantly related to each corresponding axis (Figure 6d and e). Thus, VF4 and VF5 were named chemical and mineral changes (pH) and erosion (SS), respectively. The new variables were then introduced to two different numerical modeling networks for WQI prediction and for apportioning the sources that contribute to pollution in the Kuantan River Basin.
In this study, factor scores from PCA after varimax rotation were used to develop receptor models using MLR and ANN. The two models were then compared to evaluate their performance on the data set. PC-based models were considered more dynamic because they eliminate collinearity problems and improve prediction (Sousa et al., ). Moreover, using APCS, which provide a minimal set of inputs for both models compared with the raw data set, was beneficial because it increases computational efficiency and interpretability and reduces noise and redundancy in the model.
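For orientation, the usual APCS-MLR construction can be written as follows. This is a sketch of the commonly used formulation, not necessarily the exact procedure of this study; the symbols (x̄_j and s_j for the mean and standard deviation of parameter j, f_ik for the rotated factor score of sample i on VF k, f_0k for the score of an artificial sample with all parameters set to zero, b_k for the regression coefficients) are assumed notation.

```latex
% Assumed notation; a common APCS-MLR formulation, not confirmed by the source.
(z_0)_j = \frac{0 - \bar{x}_j}{s_j}, \qquad
\mathrm{APCS}_{ik} = f_{ik} - f_{0k}, \qquad
\mathrm{WQI}_i = b_0 + \sum_{k=1}^{5} b_k \, \mathrm{APCS}_{ik} + \varepsilon_i
```

Here the five terms in the sum correspond to the five varimax factors named above, and the relative size of each fitted term is what allows the sources to be apportioned.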
Referring to Table 2, the R² value for the APCS-MLR model in this study is 0.87, which indicates that 87% of the variability in WQI is explained by the five independent variables used in the model. The adjusted R² never exceeds R² and increases only if a new term improves the model (Aertsen et al., ). Mean Square Error (MSE) and Root Mean Square Error (RMSE) measure residual errors, which give an estimate of the mean difference between observed and modeled values of WQI. The minimum value of MSE for the APCS-MLR result (Table 2) corresponds to the best network topology (Sousa et al., ).
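The fit statistics discussed here can be computed from observed and modeled values as in the minimal sketch below. It assumes the standard definitions, with MSE taken as the sum of squared errors divided by the number of observations (some packages divide by the residual degrees of freedom instead); the function name and the sample values are hypothetical.

```python
import math
from typing import Dict, Sequence


def fit_metrics(observed: Sequence[float],
                predicted: Sequence[float],
                n_predictors: int) -> Dict[str, float]:
    """Return R2, adjusted R2, MSE and RMSE for a fitted model."""
    n = len(observed)
    if n != len(predicted):
        raise ValueError("observed and predicted must have equal length")
    if n - n_predictors - 1 <= 0:
        raise ValueError("too few observations for the number of predictors")
    mean_obs = sum(observed) / n
    # Sum of squared errors and total sum of squares.
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - sse / sst
    # Adjusted R2 penalizes each additional predictor.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    mse = sse / n  # assumption: divide by n, not by residual degrees of freedom
    return {"r2": r2, "adj_r2": adj_r2, "mse": mse, "rmse": math.sqrt(mse)}


# Hypothetical WQI values, for illustration only.
metrics = fit_metrics([1.0, 2.0, 3.0, 4.0, 5.0], [1.1, 1.9, 3.2, 3.8, 5.0], 1)
print(metrics)
```

Because the penalty factor (n - 1)/(n - p - 1) is greater than one whenever at least one predictor is present, the adjusted value stays below R² unless the fit is perfect.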
The best model performance is indicated by low Akaike’s Information Criteria (AIC) and Schwarz Bayesian Criteria (SBC) values and by R² and adjusted R² values closest to unity (Aertsen et al., ). In general, AIC, Bayesian Information Criteria (BIC) and SBC estimate the loss of accuracy caused by the number of parameters in the model, taking into account the number of data points used in its calibration. A small difference between AIC and SBC values signifies that MLR is a fit method for WQI prediction. In this study, however, the large difference between the AIC and SBC values of the APCS-MLR model (Table 2) indicates that the model is inadequate in terms of fitness and robustness.
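As an illustration of how the two criteria penalize model size, the sketch below uses the least-squares forms AIC = n ln(SSE/n) + 2k and SBC = n ln(SSE/n) + k ln(n). These forms are an assumption: the source does not state which variant was used, and statistical packages differ in the constants they include. The function name and input values are hypothetical.

```python
import math
from typing import Tuple


def aic_sbc(sse: float, n_obs: int, n_params: int) -> Tuple[float, float]:
    """Return (AIC, SBC) for a least-squares fit.

    sse      : sum of squared errors of the fitted model
    n_obs    : number of observations used for calibration
    n_params : number of estimated parameters, including the intercept
    """
    if sse <= 0 or n_obs <= 0:
        raise ValueError("sse and n_obs must be positive")
    goodness = n_obs * math.log(sse / n_obs)      # shared goodness-of-fit term
    aic = goodness + 2 * n_params                 # fixed penalty per parameter
    sbc = goodness + n_params * math.log(n_obs)   # penalty grows with sample size
    return aic, sbc


# Hypothetical inputs, for illustration only.
aic, sbc = aic_sbc(sse=100.0, n_obs=100, n_params=3)
print(f"AIC = {aic:.3f}, SBC = {sbc:.3f}")
```

Lower values of either criterion indicate a better trade-off between fit and the number of parameters.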
Based on Figure 7, fertilizer waste is the largest pollution contributor to the Kuantan River Basin, while the next main contributor is anthropogenic input, which may come from the vicinity of the basin. The negative standardized coefficients of the independent variables (fertilizer waste, surface runoff, anthropogenic input and erosion) reflect their negative correlation with WQI values (as all four independent variables decrease, the WQI value increases). As shown in Figure 8, the model is able to predict WQI values from the varimax factors of PCA with acceptable precision. In Figure 9, the verification and applicability of the model were influenced by the outlier observations, which also appear in Figure 8.
APCS-ANN (WQI) was developed to investigate which pollution patterns contribute most to the Kuantan River Basin. The five VFs previously generated from PCA after varimax rotation were used as input parameters for the ANN model: fertilizer waste, surface runoff, anthropogenic input, chemical and mineral changes, and erosion, with WQI as the output. Based on Figure 10, the APCS-ANN model produced good accuracy, with an R² value of 0.9680 (Table 3, all inputs) for both the training and testing sets, which comprised 66.76% and 33.33% of the overall data set, respectively. The correlation coefficients for both sets approach 1, which indicates that the network output is almost equal to the target output (Garcia and Shigidi, ) and that the cross-validation is highly accurate, with a minimum RMSE value (Rossel and Behrens, ). As shown in Figure 9, the predicted WQI values from the training set are able to follow the pattern recognized by the training set and show high reliability and goodness-of-fit. RMSE was chosen as the main criterion to determine model performance. The APCS-ANN model has a lower RMSE (2.6409) than the APCS-MLR model (5.6200).
As shown in Figure 11, although the range is quite broad, the residuals were evenly distributed around zero. Only a few outliers and extreme values were identified, and these contribute only minimal error to model robustness. As shown in Table 3, the APCS-ANN model with all input parameters was selected as the most appropriate model for WQI forecasting, with a high R² (0.9680) and a low RMSE (2.6409) compared with the other models. The sensitivity analysis identified the pollutant source that contributed most to WQI variation in the Kuantan River Basin. Fertilizer waste (L-FW) was the main pollution contributor (percentage contribution, 59.82%), because excluding this parameter reduced R² to 0.8115 and raised RMSE to 6.9275, which signifies its importance to the model. Anthropogenic input (L-AI) was identified as the second pollution contributor (percentage contribution, 22.48%; R², 0.9092; RMSE, 4.549), followed by surface runoff (L-SR), erosion (L-E) and lastly chemical and mineral changes (L-CMC), which contributed least among the inputs influencing the APCS-ANN model performance.
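One common way to turn a leave-one-out sensitivity analysis into percentage contributions is to normalize the drop in R² caused by removing each input. The sketch below assumes that formula, which the source does not state explicitly; the function name and the example values are hypothetical and are not the Table 3 results.

```python
from typing import Dict


def percentage_contribution(r2_all: float,
                            r2_leave_one_out: Dict[str, float]) -> Dict[str, float]:
    """Share of the total R2 loss attributable to each excluded input.

    r2_all           : R2 of the model trained with all inputs
    r2_leave_one_out : R2 of each model retrained without the named input
    """
    drops = {name: r2_all - r2 for name, r2 in r2_leave_one_out.items()}
    total_drop = sum(drops.values())
    if total_drop <= 0:
        raise ValueError("excluding inputs did not reduce R2; shares are undefined")
    return {name: 100.0 * drop / total_drop for name, drop in drops.items()}


# Hypothetical R2 values for three inputs, for illustration only.
shares = percentage_contribution(0.90, {"A": 0.60, "B": 0.80, "C": 0.90})
for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
```

The values reported above are consistent with this normalization: the R² drops for L-FW and L-AI (0.9680 - 0.8115 and 0.9680 - 0.9092) stand in a ratio of about 2.66, which matches the ratio of their percentage contributions (59.82% to 22.48%).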
APCS-MLR in apportionment of sources affecting water quality has revealed that industrial discharge contributed the highest ammonia pollution observed (Dalal et al., ). However, the application of APCS-ANN to the Kuantan River Basin, which gave better accuracy than APCS-MLR, shows that this is not an industrialized region but one governed by agriculture (palm oil plantation); thus fertilizer seems to be the major contributor. Therefore, this study is expected to establish a baseline for comparison in identifying pollution contributions for future water resources management.