This study examines the application of data mining techniques to predicting the effect of climate on agriculture, and discusses the data mining methods that are useful for building a predictive model.
The Hybrid Knowledge Discovery Process for Data Mining is followed to build a predictive model that analyzes and predicts agricultural output. This methodology was developed by adapting the Cross-Industry Standard Process for Data Mining (CRISP-DM) model to the needs of the academic research community. The work is based on finding suitable data sets and the best predictive model for achieving high accuracy and generality in predicting agricultural output from the selected climatic indicators (rainfall, temperature and humidity). To this end, eight WEKA classifiers (Gaussian Processes, Linear Regression, Multilayer Perceptron, SMOreg, Decision Table, M5Rules, M5P and REPTree) were evaluated on different data sets.
The experimental results show that the Decision Table classifier achieves the best overall score, with the maximum Correlation Coefficient Percentage (CCP) of 97.8%, the minimum Root Mean Square Percentage Error (RMSPE) of 3.9%, and a time of 0.02 seconds to build the Agricultural Output Predictive model.
Finally, by extending the WEKA source code, an application (a predictive model prototype) termed the “Agricultural Output Predictive System”, with a user-friendly GUI, was developed and deployed for use by domain experts (end users). The results obtained from this research indicate that data mining classification models are very useful in predicting agricultural outcomes, enabling effective and efficient use of available climatic data to support experts and farmers in strategic planning and in making proactive, knowledge-driven decisions.
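As a rough illustration of the two accuracy measures reported above, the sketch below computes a correlation coefficient (as a percentage) and a root mean square percentage error. The text does not give the exact formulas used, so the standard Pearson correlation and the usual RMSPE definition are assumed here; the function names and the sample values are hypothetical and are not data from the study.

```python
import math


def correlation_coefficient(actual, predicted):
    """Pearson correlation between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def rmspe(actual, predicted):
    """Root mean square percentage error, in percent.

    Assumed definition: 100 * sqrt(mean(((actual - predicted) / actual)^2)).
    Every actual value must be nonzero.
    """
    n = len(actual)
    squared = sum(((a - p) / a) ** 2 for a, p in zip(actual, predicted))
    return 100.0 * math.sqrt(squared / n)


# Hypothetical values, for illustration only (not data from the study).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

ccp = 100.0 * correlation_coefficient(actual, predicted)
error = rmspe(actual, predicted)
print(f"CCP = {ccp:.1f}%, RMSPE = {error:.1f}%")
```

A higher CCP and a lower RMSPE both indicate predictions that track the observed values more closely, which is the sense in which the classifiers are ranked.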
1.1 BACKGROUND TO THE STUDY
Prediction of climate effect plays an essential role in agriculture and other industrial sectors. A large proportion of agricultural activities are strongly affected by climatic conditions. In the short term, temperature and precipitation are essential conditions for crop growth and yield. Every crop has its own minimum, optimal and maximum temperature for growing. The crop stops growing when the temperature falls below the minimum temperature. Crop growth increases as the temperature rises from the minimum to the optimal temperature, but decreases as the temperature rises beyond the optimal temperature towards the maximum temperature. Growth stops again when the temperature reaches the maximum temperature. Warmer temperatures may help some crops grow more quickly and increase their yields, but may reduce growth and yields for other kinds of crops. Accurate prediction of future climate effects and weather conditions could therefore help farmers select suitable crops in order to increase growth, yield and income. The fast growth of crops such as grains may reduce the amount of time that seeds need to grow and mature (Semenov and Porter, 2005). Crop growth depends not only on temperature but also on soil water and on nutrient elements such as nitrogen, phosphorus and potassium, all of which are connected to climatic conditions. Crops take up soil nutrients together with soil water, and adequate soil moisture is strongly related to precipitation.
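The temperature response described above (no growth below the minimum temperature, increasing growth up to the optimum, declining growth up to the maximum) can be sketched as a simple function. The text describes the relationship only qualitatively, so the straight-line shape between the three cardinal temperatures is an assumption made here for illustration, and the function name and the temperature values are hypothetical rather than figures for any real crop.

```python
def relative_growth(temp, t_min, t_opt, t_max):
    """Relative growth rate between 0.0 and 1.0 for a given temperature.

    Assumes a piecewise-linear response: zero at or below t_min,
    rising to 1.0 at t_opt, and falling back to zero at or above t_max.
    """
    if temp <= t_min or temp >= t_max:
        return 0.0  # too cold or too hot: growth stops
    if temp <= t_opt:
        return (temp - t_min) / (t_opt - t_min)  # rising branch
    return (t_max - temp) / (t_max - t_opt)  # falling branch


# Hypothetical cardinal temperatures in degrees Celsius, for illustration only.
T_MIN, T_OPT, T_MAX = 10.0, 30.0, 40.0

for temp in (5.0, 20.0, 30.0, 35.0, 45.0):
    print(temp, relative_growth(temp, T_MIN, T_OPT, T_MAX))
```

Under these assumed values, a crop at 20 degrees grows at half its best rate, the same as at 35 degrees, which shows why a given amount of warming can help one crop and harm another depending on where each sits relative to its optimum.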
Furthermore, the cyclic distribution of rainfall (precipitation) also affects crop growth and yields. As with temperature, some crops favor wet climatic conditions while others favor dry conditions. Accurate prediction of future climatic effects could therefore also help farmers select crops so as to maximize crop growth, yields and income. This study therefore examines the application of data mining techniques in the prediction of climatic effect on agriculture.
Data mining is a practical technique for finding useful patterns in very large data sets. It has secured an important place in agriculture because the field generates many kinds of data, such as soil data, crop data and climate data. Real-time climatic data is difficult to analyze and manage, so data mining algorithms such as K-Means clustering and the Apriori algorithm, together with other statistical methods, are used to analyze agricultural data and extract useful patterns. Climate has a great impact on agriculture, so crop growth and crop yield depend on it. Real-time climatic data can help farmers decide which variety of crop to plant for a high yield, and can also alert them in time to protect their fields from climatic disasters. Agro-climatic research centers and meteorological departments provide real-time data to farmers.
If a particular crop is planted during a suitable climate, it will give a good yield, which in turn can improve the economy of the country. There is therefore a need to predict the suitable climate for planting a crop, because climatic vulnerability and the vulnerability of agriculture to climate can affect the yield. Elicitation and analysis of the historical climate data and crop yields of a particular region can help to predict the future climatic conditions of that region. Data mining plays an important role in this analysis of historical climate data.
However, making an accurate prediction of climate is one of the major challenges facing meteorologists all over the world. Since ancient times, prediction of climate effect has been one of the most interesting and fascinating domains. Scientists have tried to forecast meteorological characteristics using a number of methods, some of which are more accurate than others (Elia, 2009).
Prediction of climate change entails forecasting how the present state of the atmosphere will change. Present climatic conditions are obtained by ground observations, observations from ships and aircraft, radiosondes, Doppler radar, and satellites. This information is sent to meteorological centers where the data are collected, analyzed, and made into a variety of charts, maps, and graphs. Modern high-speed computers transfer the many thousands of observations onto surface and upper-air maps. Computers draw the lines on the maps with help from meteorologists, who correct for any errors. A final map is called an analysis. Computers not only draw the maps but predict how the maps will look sometime in the future. In predicting the climate effect by numerical means, meteorologists have developed atmospheric models that approximate the atmosphere by using mathematical equations to describe how atmospheric temperature, pressure, and moisture will change over time. The equations are programmed into a computer and data on the present atmospheric conditions are fed into the computer. The computer solves the equations to determine how the different atmospheric variables will change over the next few minutes. The computer repeats this procedure again and again using the output from one cycle as the input for the next cycle. For some desired time in the future, the computer prints its calculated information. It then analyzes the data, drawing the lines for the projected position of the various pressure systems. The final computer-drawn forecast chart is called a prognostic chart, or prog. A forecaster uses the progs as a guide to predicting the climate effect. There are many atmospheric models that represent the atmosphere, with each one interpreting the atmosphere in a slightly different way. The effects, or impacts, of climate change may be physical, ecological, social or economic. 
It is predicted that future climate changes will include further global warming (that is, an upward trend in global mean temperature), sea level rise, and a probable increase in the frequency of some extreme weather events (Wikipedia, 2010).
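The numerical forecasting cycle described earlier in this section (solve the equations for a short time interval, then feed the output of one cycle back in as the input for the next) can be sketched as a loop. The example below is a toy, not an atmospheric model: it assumes a single temperature value that relaxes towards an equilibrium value, advanced with a forward Euler step. The equation, the function names and all parameter values are assumptions chosen only to show the structure of the cycle.

```python
def step(temperature, dt, equilibrium=15.0, rate=0.1):
    """Advance the temperature by one time step (forward Euler).

    Toy equation assumed here: dT/dt = -rate * (T - equilibrium).
    """
    return temperature + dt * (-rate * (temperature - equilibrium))


def forecast(initial_temperature, dt, n_steps):
    """Repeat the step, using each output as the next input."""
    state = initial_temperature
    history = [state]
    for _ in range(n_steps):
        state = step(state, dt)  # output of one cycle feeds the next cycle
        history.append(state)
    return history


# Hypothetical starting condition, for illustration only.
trajectory = forecast(25.0, dt=1.0, n_steps=3)
print(trajectory)
```

Real forecasting models apply the same repeat-and-feed-back pattern to many coupled variables (temperature, pressure, moisture) on a spatial grid, which is why they need high-speed computers.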
Data mining, also called Knowledge Discovery in Databases (KDD), is the field of discovering novel and potentially useful information from large amounts of data (Rushing, Ramachandran, Nair, Graves, Welch and Lin, 2005). In contrast to standard statistical methods, data mining techniques search for interesting information without demanding a priori hypotheses; the kinds of patterns that can be discovered depend upon the data mining tasks employed. By and large, there are two types of data mining tasks: descriptive tasks, which describe the general properties of the existing data, and predictive tasks, which attempt to make predictions based on inference from the available data. These techniques are often more powerful, flexible and efficient for exploratory analysis than statistical techniques (Bregman and Mackenthun, 2006). The most commonly used techniques in data mining are Artificial Neural Networks, Genetic Algorithms, Rule Induction, the Nearest Neighbor method, Memory-Based Reasoning, Logistic Regression, Discriminant Analysis and Decision Trees.
1.2 STATEMENT OF THE PROBLEM
Human activities have changed the earth system and its climate. The Third Assessment Report of the Intergovernmental Panel on Climate Change (IPCC) stated that there is new and stronger evidence that most of the warming observed over the last 50 years is attributable to human activities. Global warming refers to the increase in mean air temperature that results from increased atmospheric loading of greenhouse gases, such as carbon dioxide from fossil fuel combustion. All of this goes a long way towards affecting agricultural activities. Predicting the effect of climate on agriculture in terms of temperature and precipitation is therefore critical to maintaining a productive agricultural sector.
In this research, both Artificial Neural Networks (ANN) and Decision Trees (DT) will be used to analyze meteorological data gathered from the Nigerian meteorological agency station over a period of ten years (2006 - 2016), in order to develop classification rules for the climate parameters over the study period and to predict the effect of future climatic conditions on agriculture from the available historical data. The targets for prediction are the climatic factors that affect agricultural activities daily, such as changes in minimum and maximum temperature, rainfall, evaporation and wind speed.
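To make the idea of a classification rule derived from meteorological data concrete, the sketch below fits the simplest possible decision tree, a one-level tree (often called a decision stump) with a single threshold on one climate variable. It is an illustration only and is not the model built in this research: the function name, the rainfall figures and the "low"/"high" yield labels are all hypothetical.

```python
def fit_stump(values, labels):
    """Find the threshold for the rule: value > threshold -> "high", else "low".

    Returns (threshold, errors), where errors is the number of training
    examples the rule misclassifies. Needs at least two distinct values.
    """
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best = None
    for threshold in candidates:
        errors = sum(
            1
            for value, label in zip(values, labels)
            if ("high" if value > threshold else "low") != label
        )
        if best is None or errors < best[1]:
            best = (threshold, errors)
    return best


# Hypothetical monthly rainfall (mm) and yield classes, for illustration only.
rainfall_mm = [20.0, 35.0, 50.0, 80.0, 95.0, 120.0]
yield_class = ["low", "low", "low", "high", "high", "high"]

threshold, errors = fit_stump(rainfall_mm, yield_class)
print(f"Rule: rainfall > {threshold} mm -> high yield ({errors} training errors)")
```

A full decision tree repeats this kind of split recursively on several variables (for example temperature, rainfall and evaporation), and each path from the root to a leaf can be read off as one classification rule.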
1.3 OBJECTIVES OF THE STUDY
The general objective of this study is to examine the application of data mining techniques in the prediction of climate effect on agriculture. The specific objectives are:
- To examine the application of data mining techniques in the prediction of atmospheric temperature effect on agriculture.
- To examine the application of data mining techniques in the prediction of rainfall effect on agriculture.
- To examine the application of data mining techniques in the prediction of evaporation effect on agriculture.
1.4 RESEARCH QUESTIONS
- What is the predicted effect of atmospheric temperature on agriculture using data mining techniques?
- What is the predicted effect of rainfall on agriculture using data mining techniques?
- What is the predicted effect of evaporation on agriculture using data mining techniques?
1.5 SIGNIFICANCE OF THE STUDY
With the growth of economic globalization and the evolution of information technology, climate data are being generated and accumulated at an unprecedented pace. As a result, there is a critical need for automated approaches that make effective and efficient use of massive amounts of climate data to support farms and individuals in strategic planning and in the cultivation of agricultural products. Data mining techniques have been used to uncover hidden patterns and to predict future trends and climatic effects on agriculture. The competitive advantages achieved through data mining include increased yield and revenue, reduced cost, and much improved agricultural activities.
The significance and applicability of this research are therefore very high for a country like Nigeria, where agriculture remains a major source of income, and the research can have a substantial impact on individual farmers and on national economic growth. Using this model helps the agricultural sector to build strategic plans and approaches in order to achieve better agricultural output and optimal profit.
This research will also be a contribution to the body of literature in the area of the application of data mining techniques in the prediction of climate effect on agriculture, thereby constituting the empirical literature for future research in the subject area.
1.6 SCOPE/LIMITATIONS OF THE STUDY
This study will cover the sub-variables associated with climate effect on agriculture, which include temperature, rainfall, moisture and evaporation. These climate data will be used to predict future agricultural outcomes in Nigeria using data mining methods. The data used will cover the period of 10 years between 2006 and 2016.
LIMITATION OF STUDY
Financial constraint: Insufficient funds tend to impede the efficiency of the researcher in sourcing relevant materials, literature and information, and in the process of data collection (internet, questionnaire and interview).
Time constraint: The researcher will carry out this study alongside other academic work, which will reduce the time devoted to the research.
Semenov, M. A., Porter, J. R., 2005, "Climatic variability and the modelling of crop yields", Agricultural and Forest Meteorology, Vol. 73, No. 3, pp. 265-283
Elia, G. P., 2009, "A Decision Tree for Weather Prediction", Universitatea Petrol-Gaze din Ploiesti, Bd. Bucuresti 39, Ploiesti, Catedra de Informatică, Vol. LXI, No. 1
Wikipedia, 2010, "Effects of Global Warming", From Wikipedia - the free encyclopedia, retrieved from http://en.wikipedia.org/wiki/Effects_of_Global_Warming in March 2010
Bregman, J. I., Mackenthun, K. M., 2006, Environmental Impact Statements, Chelsea, MI: Lewis Publication