Forecasting the Growth of Wheat Shoots based on Neural Networks
Svetlana Mustafina1*, Natalya Uspenskaya1, Denis Smirnov2, Denis Yashin2, Sofia Mustafina1, Oleg Larin3
1Dept. Mathematics and Information Technology, Bashkir State University, Ufa, Russia.
2Dept. Information Technology and Management Systems, K.G. Razumovsky Moscow State University of Technologies and Management (the First Cossack University), Moscow, Russia.
3Electric Power Department, South-West State University, Kursk, Russia.
ABSTRACT
This article uses neural networks to search for the optimal growth parameters of wheat shoots. The main parameters affecting the growth of wheat shoots are identified, a training sample is compiled, and a neural network is built. Using this network as a forecasting tool, the optimal growth parameters of wheat shoots were determined as a function of the cadmium concentration in the soil, soil moisture, ambient temperature, and the concentration of added Bacillus subtilis bacteria.
Keywords: wheat shoots, Cd concentration, Bacillus subtilis 26D, ambient temperature, soil moisture, neural network.
INTRODUCTION
Soils currently contain many heavy metals that interfere with plant growth. Heavy metals are considered the most common environmental pollutants [1-3] because of their extensive use in agriculture and industry [4]. Various methods are therefore being developed to improve the growth of cultivated plants such as corn, wheat, and rye. The most popular approach is to add pesticides and microorganisms, and of these, the most productive and safe option is the use of Bacillus subtilis.
Biological statistics (biostatistics) applies statistical methods to biological data in order to analyze traits with continuous and discrete distributions, identify existing patterns, and support decision-making.
Finding the combination of factors that gives the best result is difficult to do analytically, because many parameters may affect the final outcome. Artificial neural networks are increasingly used for such problems: they model the dependence of the output parameter on the input parameters and help find an optimal solution.
COMPILING A TRAINING SAMPLE FOR THE NEURAL NETWORK
The neural network was trained using the Statistica software package developed by StatSoft. Statistica provides data analysis, data management, data mining, and data visualization based on statistical methods [5, 6].
Experiments on the effect of Bacillus subtilis 26D dosage on plant growth in soil contaminated with various metals produced a data set [7–9]. From this data set, we selected the main parameters that significantly affect shoot growth, and then built a data table for the neural network model in Statistica.
Table 1 presents the input and output data used to create the neural network model. The inputs are the cadmium concentration in the soil (mg/kg), soil moisture (%), ambient temperature (°C), and the concentration of Bacillus subtilis 26D (cells/ml). The output is the length of wheat shoots (cm). Table 1 shows only part of the sample [7–9].
Table 1. Data sample for the neural network

| № | Cd concentration, mg/kg (input) | Soil moisture, % (input) | Ambient temperature, °C (input) | Bacillus subtilis 26D, cells/ml (input) | Wheat shoots, cm (output) |
|---|---|---|---|---|---|
| 1 | 0 | 60 | 28 | 0 | 38 |
| 2 | 0 | 65 | 25 | 0 | 37 |
| 3 | 0 | 61 | 24 | 0 | 36 |
| 4 | 0 | 61 | 26 | 0 | 37 |
| 5 | 0 | 62 | 27 | 0 | 38 |
| 6 | 0 | 60 | 27 | 0 | 37 |
| 7 | 0 | 69 | 23 | 0 | 38 |
| … | … | … | … | … | … |
| 153 | 500 | 70 | 26 | 1000000 | 44 |
| 154 | 500 | 65 | 25 | 1000000 | 45 |
| 155 | 500 | 62 | 23 | 1000000 | 43 |
| 156 | 500 | 61 | 28 | 1000000 | 43 |
| 157 | 500 | 65 | 23 | 1000000 | 44 |
| 158 | 500 | 69 | 27 | 1000000 | 44 |
| 159 | 500 | 68 | 28 | 1000000 | 43 |
| 160 | 500 | 69 | 27 | 1000000 | 44 |
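Outside Statistica, the excerpt of Table 1 can be held as plain Python records. The following is a minimal sketch, not the authors' workflow: the name `TABLE_1_EXCERPT` and the helper `split_inputs_output` are hypothetical, and only the 15 rows printed above are included. It separates the four inputs (columns 1–4) from the output (column 5), mirroring the variable assignment made later in Statistica, and compares mean shoot length between the untreated rows and the rows with 500 mg/kg Cd and 1×10^6 cells/ml of Bacillus subtilis 26D.

```python
from statistics import mean

# Rows of Table 1 exactly as shown in the excerpt (cases 1-7 and 153-160 only).
# Column order: Cd (mg/kg), soil moisture (%), ambient temperature (°C),
# Bacillus subtilis 26D (cells/ml), wheat shoots (cm).
TABLE_1_EXCERPT = {
    1: (0, 60, 28, 0, 38),
    2: (0, 65, 25, 0, 37),
    3: (0, 61, 24, 0, 36),
    4: (0, 61, 26, 0, 37),
    5: (0, 62, 27, 0, 38),
    6: (0, 60, 27, 0, 37),
    7: (0, 69, 23, 0, 38),
    153: (500, 70, 26, 1_000_000, 44),
    154: (500, 65, 25, 1_000_000, 45),
    155: (500, 62, 23, 1_000_000, 43),
    156: (500, 61, 28, 1_000_000, 43),
    157: (500, 65, 23, 1_000_000, 44),
    158: (500, 69, 27, 1_000_000, 44),
    159: (500, 68, 28, 1_000_000, 43),
    160: (500, 69, 27, 1_000_000, 44),
}


def split_inputs_output(rows):
    """Return (X, y): the four inputs (columns 1-4) and the output (column 5)."""
    X = [list(r[:4]) for r in rows.values()]
    y = [r[4] for r in rows.values()]
    return X, y


X, y = split_inputs_output(TABLE_1_EXCERPT)

# Mean shoot length: rows without bacteria vs. rows with 1e6 cells/ml.
# Both Cd and bacterial concentration differ between these groups,
# so the comparison does not isolate the effect of either factor.
untreated = mean(r[4] for r in TABLE_1_EXCERPT.values() if r[3] == 0)
treated = mean(r[4] for r in TABLE_1_EXCERPT.values() if r[3] == 1_000_000)
print(f"{untreated:.2f} cm vs {treated:.2f} cm")  # prints: 37.29 cm vs 43.75 cm
```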
Once the table of input and output parameters for training, testing, and validating the neural network has been created, the network can be trained in Statistica.
CREATION, TRAINING, AND TESTING OF A NEURAL NETWORK IN THE STATISTICA SOFTWARE PACKAGE
The table of source data was created as follows. After Statistica was launched, the New tab was selected on the main panel, and Spreadsheet was chosen in the window that appeared. In the Create New Document tab, 5 was entered in the Number of Variables field, matching the number of variables, and 160 in the Number of Cases field. Double was selected as the Default Data Type, the location was set to a separate window, and the image format was left as Common. After OK was pressed, the program created an empty table with five columns (Var1, Var2, Var3, Var4, Var5) and 160 rows. The completed table is shown in Fig. 1.
Fig. 1. Filled Source Data Table.
After the table had been created and filled with data, the neural networks were created, trained, and tested. In the Learning section of the toolbar, Neural Networks was selected. In the New Analysis field, Regression was chosen from the list of analysis types, and OK was pressed (Fig. 2).
Fig. 2. Neural Network creation.
A window for configuring the neural network then opened with three tabs: Quick, Sampling (CNN and ANS), and Subsampling. On the Quick tab, Automated Network Search (ANS) was selected as the Strategy for Creating Predictive Models. Clicking the Variables button opened the variable selection window. The output parameter (wheat shoots, cm) was selected in the first list and the input parameters (Cd concentration, mg/kg; soil moisture, %; ambient temperature, °C; Bacillus subtilis 26D, cells/ml) in the second. The Continuous Targets line then automatically showed 5, the column number of the output variable, and the Continuous Inputs line showed 1-4. The Show Appropriate Variables Only checkbox was cleared, and OK was pressed. The selected variables then appeared in the Variables field of the neural network settings window (Fig. 3).
Fig. 3. The neural network architecture settings window.
On the Sampling (CNN and ANS) tab, in the Sampling Method group, we entered 70 in the Train (%) field, 15 in Test (%), 15 in Validation (%), and 1000 in Seed for Sampling (Fig. 4). After these settings were complete, OK was pressed.
Fig. 4. Sampling (CNN and ANS) Tab.
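The 70/15/15 split with seed 1000 can be sketched in Python as follows. This is an illustration under stated assumptions: the function `split_cases` is hypothetical, and it uses Python's `random.Random` generator, so although the subset sizes (112, 24, and 24 of the 160 cases) match the settings, the individual cases assigned to each subset will differ from those chosen by Statistica.

```python
import random


def split_cases(n_cases, train_pct=70, test_pct=15, seed=1000):
    """Randomly assign case numbers 1..n_cases to train/test/validation subsets.

    The percentages and seed mirror the Statistica settings; the shuffling
    algorithm is Python's, so individual assignments will not match Statistica.
    """
    cases = list(range(1, n_cases + 1))
    random.Random(seed).shuffle(cases)          # reproducible for a fixed seed
    n_train = round(n_cases * train_pct / 100)
    n_test = round(n_cases * test_pct / 100)
    train = sorted(cases[:n_train])
    test = sorted(cases[n_train:n_train + n_test])
    validation = sorted(cases[n_train + n_test:])  # the remaining cases
    return train, test, validation


train, test, validation = split_cases(160)
print(len(train), len(test), len(validation))  # prints: 112 24 24
```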
On the Automated Network Search (ANS) tab (Fig. 5), in the Network Types group, the MLP (Multilayer Perceptron) checkbox was selected.
We determined the required number of neurons in the hidden layers of the perceptron using the following estimate, which is a consequence of the Arnold-Kolmogorov-Hecht-Nielsen theorems [10, 11]:
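$$\frac{N_y Q}{1 + \log_2 Q} \le N_w \le N_y \left( \frac{Q}{N_x} + 1 \right) \left( N_x + N_y + 1 \right) + N_y, \qquad (1)$$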
where $N_y$ is the output signal dimension ($N_y = 1$);
$N_w$ is the required number of synaptic connections;
$N_x$ is the input signal dimension ($N_x = 4$);
$Q$ is the number of elements in the set of training examples.
Once the required number of synaptic connections $N_w$ has been estimated with this formula, the required number of neurons in the hidden layers can be calculated. For example, for a two-layer perceptron the number of hidden neurons is:
$$N = \frac{N_w}{N_x + N_y}. \qquad (2)$$
Using formula (2), we calculated the minimum and maximum numbers of hidden neurons from the lower and upper bounds of $N_w$.
Based on these results, 3 was entered in the Min. Hidden Units field and 35 in the Max. Hidden Units field. In the Train/Retain Networks group, we entered 20 in the Networks to Train field and 5 in the Networks to Retain field, and then selected Train.
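Formulas (1) and (2) can be evaluated directly. In the sketch below, the helper names are hypothetical, and the value Q = 112 is an assumption, taken as the 70% training share of the 160 cases; with it, the lower bound on hidden neurons is about 2.87 (rounded up to 3) and the upper bound is exactly 35, which matches the values entered above.

```python
import math


def synapse_bounds(q, n_x, n_y):
    """Lower and upper bounds on the number of synaptic connections N_w, formula (1)."""
    lower = n_y * q / (1 + math.log2(q))
    upper = n_y * (q / n_x + 1) * (n_x + n_y + 1) + n_y
    return lower, upper


def hidden_neurons(n_w, n_x, n_y):
    """Hidden neurons of a two-layer perceptron, formula (2): N = N_w / (N_x + N_y)."""
    return n_w / (n_x + n_y)


N_X, N_Y = 4, 1  # four inputs, one output (shoot length)
Q = 112          # ASSUMPTION: training share (70 %) of the 160 cases

w_min, w_max = synapse_bounds(Q, N_X, N_Y)
n_min = hidden_neurons(w_min, N_X, N_Y)
n_max = hidden_neurons(w_max, N_X, N_Y)
print(f"N_w in [{w_min:.1f}, {w_max:.1f}], hidden neurons in [{n_min:.2f}, {n_max:.2f}]")
# prints: N_w in [14.3, 175.0], hidden neurons in [2.87, 35.00]
```

If Q were instead the full 160 cases, the bounds would be about 3.8 and 49.4 neurons, which do not match the entered values; this is why the training subset size was assumed here.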
Fig. 4. Automated Network Search (ANS) Tab
After the network was configured, training began (Fig. 5).
Fig. 5. Neural network training
The trained neural networks are shown in Fig. 6. The window lists the 5 networks with the smallest mean squared errors on the control and test samples.
Fig. 6. Trained neural networks
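The train-and-retain step of the automated search can be outlined as follows. This is a hedged sketch, not Statistica's implementation: drawing hidden-layer sizes uniformly from [3, 35] and ranking networks by test error and then validation error are both assumptions, `TrainedNetwork`, `candidate_hidden_sizes`, and `retain_best` are hypothetical names, and the training of each candidate is left out.

```python
import random
from dataclasses import dataclass


@dataclass
class TrainedNetwork:
    """Hypothetical summary of one trained candidate network."""
    hidden_units: int
    train_mse: float
    test_mse: float
    validation_mse: float


def candidate_hidden_sizes(n_networks=20, lo=3, hi=35, seed=0):
    """Draw one hidden-layer size per candidate, uniformly in [lo, hi] (inclusive).

    Random sampling of sizes is an assumption, not Statistica's documented rule.
    """
    rng = random.Random(seed)
    return [rng.randint(lo, hi) for _ in range(n_networks)]


def retain_best(networks, k=5):
    """Keep the k networks with the smallest test MSE, ties broken by validation MSE.

    The ranking key is an assumption; Statistica's own criterion may differ.
    """
    return sorted(networks, key=lambda n: (n.test_mse, n.validation_mse))[:k]
```

In use, one would train a network for each of the 20 sizes, record its errors in a `TrainedNetwork`, and pass the list to `retain_best` to obtain the 5 networks to keep.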
RESULTS
The 3D histograms of the dependencies presented in Fig. 7 show that the greatest growth of wheat shoots occurs at the highest Bacillus subtilis 26D content, 1×10^6 cells/ml, and at ambient temperatures from 25 to 29 °C.
Fig. 7. 3D histograms of the dependencies: x (Input), y (Output), z (Target)
The dependence between soil moisture and Cd concentration, however, took two forms: at Cd concentrations of 0–100 mg/kg, soil moisture should be 70–71%, and at Cd concentrations of 150–300 mg/kg, it should be 63–66%. There is also a relationship between Cd concentration and ambient temperature [12]: the greatest shoot growth at ambient temperatures of 25–29 °C was observed when the Cd content in the soil was 100–300 mg/kg.
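From this we can conclude that the best input parameters lie in the following ranges: Cd concentration of 0–300 mg/kg, soil moisture of 63–71%, ambient temperature of 25–29 °C, and a Bacillus subtilis 26D concentration of 1×10^6 cells/ml.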
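The piecewise relationship between Cd concentration and soil moisture described above, together with the temperature and Bacillus subtilis 26D conditions, can be written as a small rule-checking function. This is only an encoding of the ranges stated in the text, not of the neural network itself; the function names are hypothetical, and Cd values between 100 and 150 mg/kg or above 300 mg/kg, which the text does not cover, are treated as outside the favourable region by assumption.

```python
from typing import Optional, Tuple


def recommended_moisture(cd_mg_per_kg: float) -> Optional[Tuple[float, float]]:
    """Soil-moisture range (%) stated in the text for a given Cd concentration.

    Returns None where the text gives no range (Cd between 100 and 150 mg/kg,
    or above 300 mg/kg); treating these as unfavourable is an assumption.
    """
    if 0 <= cd_mg_per_kg <= 100:
        return (70.0, 71.0)
    if 150 <= cd_mg_per_kg <= 300:
        return (63.0, 66.0)
    return None


def in_favourable_region(cd: float, moisture: float,
                         temperature: float, bacillus: float) -> bool:
    """True if all four inputs fall inside the ranges described in the text."""
    moisture_range = recommended_moisture(cd)
    if moisture_range is None:
        return False
    low, high = moisture_range
    return (low <= moisture <= high
            and 25 <= temperature <= 29
            and bacillus == 1e6)  # 1×10^6 cells/ml of Bacillus subtilis 26D
```

Applied to the three parameter sets forecast in the text (minimum, maximum, and average), this rule accepts only the average set (Cd 150 mg/kg, soil moisture 66%, 27 °C), which is also the set with the longest predicted shoots.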
We used the trained neural network to predict the length of wheat shoots by entering the selected parameters in the Custom Inputs window and examining the results. According to the table in Fig. 8, network no. 5 had the best training metrics, so we relied on its output for forecasting [13, 14].
Fig. 8. Summary window
In the first forecasting experiment, we used the minimum values of the selected ranges: Bacillus subtilis 26D at 1×10^6 cells/ml, ambient temperature of 25 °C, soil moisture of 63%, and Cd concentration of 0 mg/kg. To predict the output parameter, we selected Custom Inputs [15, 16], entered these minimum parameters in the window that opened, and pressed OK. With these parameters, the predicted length of wheat shoots was 40.49 cm (Fig. 9).
Fig. 9. Predicted length of wheat shoots for the minimum selected parameters
Similarly, we ran a forecast for the maximum values: Bacillus subtilis 26D at 1×10^6 cells/ml, ambient temperature of 29 °C, soil moisture of 71%, and Cd concentration of 300 mg/kg. With these parameters, the predicted length of wheat shoots was 43.22 cm (Fig. 10).
Fig. 10. Predicted length of wheat shoots for the maximum selected parameters
We then ran a forecast for the average values: Bacillus subtilis 26D at 1×10^6 cells/ml, ambient temperature of 27 °C, soil moisture of 66%, and Cd concentration of 150 mg/kg. With these parameters, the predicted length of wheat shoots was 44.67 cm (Fig. 11).
Fig. 11. Predicted length of wheat shoots for the average selected parameters
Finally, we predicted shoot length with all inputs held constant except Cd concentration: Bacillus subtilis 26D at 1×10^6 cells/ml, ambient temperature of 29 °C, and soil moisture of 71%, with Cd concentration varied from 0 to 500 mg/kg. The results are presented in Fig. 12.
The greatest shoot length is achieved at a low Cd concentration of 100 mg/kg. Comparing the first and seventh cases confirms the dependence between Cd concentration and soil moisture obtained above [17]. This parameter must therefore be chosen carefully to obtain the best result.
Fig. 12. Prediction of the length of wheat shoots at different concentrations of Cd
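The last experiment is a one-factor sweep: all inputs are held fixed except Cd concentration. The sketch below outlines that procedure under one assumption: a hypothetical `predict(cd, moisture, temperature, bacillus)` callable stands in for the trained network, since in the paper the forecasts are made interactively through Statistica's Custom Inputs window, which is not called here.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical interface for the trained network:
# (cd, moisture, temperature, bacillus) -> predicted shoot length, cm.
Predictor = Callable[[float, float, float, float], float]


def sweep_cd(predict: Predictor,
             cd_values: Iterable[float],
             moisture: float = 71.0,
             temperature: float = 29.0,
             bacillus: float = 1e6) -> Tuple[List[Tuple[float, float]], float]:
    """Vary only Cd concentration and record the predicted shoot length.

    Defaults are the fixed inputs used for Fig. 12 (71 % moisture, 29 °C,
    1e6 cells/ml). Returns the (cd, prediction) pairs and the Cd value
    with the largest prediction.
    """
    results = [(cd, predict(cd, moisture, temperature, bacillus)) for cd in cd_values]
    if not results:
        raise ValueError("cd_values must not be empty")
    best_cd = max(results, key=lambda pair: pair[1])[0]
    return results, best_cd
```

For example, `sweep_cd(predict, range(0, 501, 50))` would evaluate Cd from 0 to 500 mg/kg in 50 mg/kg steps with the other inputs fixed at the values used above; the step size is an arbitrary choice.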
CONCLUSIONS
When predicting the optimal parameters for the growth of wheat shoots, the following input parameter ranges were identified: Cd concentration of 0–300 mg/kg, soil moisture of 63–71%, ambient temperature of 25–29 °C, and a Bacillus subtilis 26D concentration of 1×10^6 cells/ml.
The results demonstrate that neural networks can be applied to predict plant growth and to identify the optimal set of input parameters [18, 19]. Neural networks can also be used to anticipate situations that require prompt corrective action. This forecasting method can be used in laboratories, in agriculture, and for educational purposes.
The reported study was funded by the Ministry of Science and Higher Education of the Russian Federation according to the research project FZWU-2020-0027.
REFERENCES