DATABASE MANAGEMENT SYSTEMS

COURSE CODE-CSE2004

FACULTY: Prof. Govinda K

TITLE-

ANALYSIS OF RAINFALL PATTERNS IN INDIA USING CORRELATION AND MULTIPLE CORRELATION ALGORITHMS

SUBMITTED BY

Saumya Gupta

(17BCE2157)

Nikhilesh Kumar

(17BCE2242)

ABSTRACT

Weather prediction is a complex and challenging task that involves observing and processing very large amounts of data. Weather systems range from small ones, only a few miles across and lasting a few hours, to large-scale rain and snow storms up to a thousand miles wide that last considerably longer. The study of the climate system is, to a large degree, the study of the statistics of weather, so it is not surprising that statistical reasoning, analysis and modelling are pervasive in the climatological sciences. Statistical analysis quantifies the effects of uncertainty, both in observation and measurement and in our understanding of the processes that govern climate variability. It also helps us identify which of the many pieces of information obtained from observations of the climate system are worth synthesis and interpretation. The role of correlation in weather forecasting, and particularly in predicting rainfall patterns, is well known and has been in use for decades. In this project we use linear and multiple correlation algorithms to analyse rainfall patterns across different states of India for different months.

INTRODUCTION

Weather prediction has been a challenging problem for the world over the past decade. Prediction is becoming more complex because of constantly changing weather conditions. Many models have been proposed for predicting weather data that treat the related attributes as independent variables. For effective analysis of the weather, it is important to understand the different factors that cause the weather to change. It is therefore necessary to identify the relationships between these attributes in order to understand the weather data better.

In our project we have implemented correlation and multiple correlation algorithms on a data set. Using our database we have analysed rainfall trends in 10 states (Rajasthan, Kerala, Punjab, Uttar Pradesh, Maharashtra, Tamil Nadu, Andhra Pradesh, Sikkim, Arunachal Pradesh and West Bengal) over the periods January-February, March-May, June-September and October-December by finding linear and multiple correlations among the attributes.

LITERATURE REVIEW

Correlation describes the relative position of cases along two variables. If an increase in one variable is associated with an increase in the other, the relationship is positive. If an increase in one is associated with a decrease in the other, the relationship is negative (an inverse correlation). A correlation coefficient gives a more precise indication of the strength of the relationship between two variables. Its value can range from +1 (a perfect positive correlation) to -1 (a perfect negative correlation). The null hypothesis is that there is no relationship between the two variables (correlation coefficient = 0).
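
As a minimal sketch of how such a coefficient can be computed, the following Python function implements the standard Pearson formula using only the standard library. The function name `pearson_r` and the sample values are illustrative assumptions and are not taken from the project's code or data set.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Illustrative values only (not taken from the project data set).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
r_up = pearson_r(x, [2.0, 4.0, 6.0, 8.0, 10.0])    # y rises with x
r_down = pearson_r(x, [10.0, 8.0, 6.0, 4.0, 2.0])  # y falls as x rises
print(r_up, r_down)
```

The first pair lies exactly on a rising straight line and the second on a falling one, so the two results sit at the two ends of the range described above.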

The coefficient of multiple correlation is a measure of how well a given variable can be predicted using a linear function of a set of other variables. It is the correlation between the variable’s values and the best predictions that can be computed linearly from the predictor variables.

The coefficient of multiple correlation takes values between 0 and 1. A higher value indicates better predictability of the dependent variable from the independent variables: a value of 1 means that the predictions are exactly correct, and a value of 0 means that no linear combination of the independent variables is a better predictor than the fixed mean of the dependent variable.

The coefficient of multiple correlation is computed as the square root of the coefficient of determination, but only under the assumptions that an intercept is included and that the best possible linear predictors are used. The coefficient of determination itself is defined for more general cases, including nonlinear prediction and cases in which the predicted values were not obtained from a model-fitting procedure.
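
The square-root relationship can be illustrated with a small least-squares fit. The sketch below is not the project's implementation: the helper names, the normal-equations approach and the sample numbers are all assumptions made for illustration. It fits a linear model with an intercept, takes R as the square root of the coefficient of determination, and lets that value be compared with the Pearson correlation between actual and fitted values.

```python
import math

def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def multiple_correlation(predictors, y):
    """Return (R, fitted values) for a least-squares fit with an intercept."""
    rows = [[1.0] + list(p) for p in predictors]  # leading 1.0 is the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    r_squared = 1.0 - ss_resid / ss_total  # coefficient of determination
    return math.sqrt(max(r_squared, 0.0)), fitted

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

# Illustrative values only (not the project's data): two predictors per observation.
predictors = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
y_exact = [1 + 2 * x1 + 3 * x2 for x1, x2 in predictors]  # exactly linear
y_noisy = [9.5, 7.0, 20.0, 18.5, 28.0, 29.0]              # same trend, perturbed

r_exact, _ = multiple_correlation(predictors, y_exact)
r_noisy, fitted_noisy = multiple_correlation(predictors, y_noisy)
print(r_exact, r_noisy, pearson_r(y_noisy, fitted_noisy))
```

For the exactly linear response R is 1 up to rounding error. For the perturbed response the last two printed values agree, which is the equivalence between the square root of the coefficient of determination and the correlation of actual with predicted values.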

While a significant correlation between variables enables us to make predictions from one to the other, it does not establish a causal relationship.

The correlation coefficient is bounded between -1 and 1 and describes the linear relationship between two variables. A coefficient near 1 means a strong positive association between the two variables (when one of them increases, the other does too, and when one decreases, so does the other).

A coefficient near -1 means a strong negative association between the two variables: observations with a large value in one variable tend to have a small value in the other, and vice versa. A coefficient near 0 means no linear relation between the two variables. The following points call for care:

1) Association does not necessarily mean a causal relation between the two variables. For instance, there may be a third variable that has not been considered and that explains the behaviour of the other two.

2) Even if there is a causal relationship between the variables, the correlation coefficient does not tell you which variable is the cause and which is the effect.

3) A coefficient near 0 does not necessarily mean that there is no relation between the two variables. It means there is no LINEAR relationship, but there may be another kind of functional relationship (for instance, quadratic or exponential).
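
The third point can be checked with a short Python sketch. The data here are made up for illustration (y is exactly the square of x), and the helper `pearson_r` is a plain implementation of the standard Pearson formula, repeated so that the block runs on its own.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Illustrative values: y depends on x exactly, but through a quadratic.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [v * v for v in x]

r_quadratic = pearson_r(x, y)
print(r_quadratic)
```

Because the x values are symmetric about zero, the positive and negative products of deviations cancel and the coefficient comes out as zero, even though y is completely determined by x.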

Proposed methods

Correlation is a mutual relationship or connection between two or more things. In statistical terms, it means the interdependence of variable quantities. In common usage it most often refers to how close two variables are to having a linear relationship with each other. Correlations are useful because they can indicate a predictive relationship that can be exploited in practice. When used in a technical sense, correlation refers to any of several specific types of relationship between mean values. There are several correlation coefficients, often denoted ρ or r, measuring the degree of correlation. The most common of these is the Pearson correlation coefficient, which is sensitive only to a linear relationship between two variables.

In statistics, the coefficient of multiple correlation is a measure of how well a given variable can be predicted using a linear function of a set of other variables. It is the correlation between the variable’s values and the best predictions that can be computed linearly from the predictive variables. The coefficient of multiple correlation, denoted R, is a scalar that is defined as the Pearson correlation coefficient between the predicted and the actual values of the dependent variable in a linear regression model that includes an intercept.
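
When there are exactly two predictors, R can also be obtained directly from the three pairwise Pearson coefficients through the standard identity R² = (r12² + r13² - 2·r12·r13·r23) / (1 - r23²), where variable 1 is the dependent one. The Python sketch below applies that identity. The function name and the input values are hypothetical, and the report does not state that this is the formula used for its own results.

```python
import math

def multiple_r_two_predictors(r12, r13, r23):
    """Multiple correlation of variable 1 on variables 2 and 3.

    r12, r13: Pearson correlations of the dependent variable with each predictor.
    r23: Pearson correlation between the two predictors.
    """
    if abs(r23) >= 1.0:
        raise ValueError("predictors are perfectly correlated; R is not identifiable")
    r_squared = (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)
    return math.sqrt(r_squared)

# Hypothetical coefficients, chosen only to illustrate the formula.
r_uncorrelated = multiple_r_two_predictors(0.6, 0.3, 0.0)  # predictors unrelated
r_redundant = multiple_r_two_predictors(0.6, 0.3, 0.5)     # here r13 = r12 * r23
print(r_uncorrelated, r_redundant)
```

In the first call the predictors are uncorrelated, so their squared contributions simply add. In the second call the third variable carries no information beyond the second, and R falls back to |r12|. For any consistent set of pairwise coefficients, R is never smaller than the larger of |r12| and |r13|, which makes a convenient sanity check on computed values.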

Result Analysis

Numbering the periods as 1 (January-February), 2 (March-May), 3 (June-September) and 4 (October-December), the coefficients of linear correlation are:

r12 = 0.676821401

r13 = 0.087790086

r14 = 0.109499623

r23 = 0.552589652

r24 = 0.134637126

r34 = -0.09273779
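
A table of pairwise coefficients like this one can be produced with a few lines of Python. The sketch below assumes one row of seasonal mean rainfall per state. The state labels and all numbers are placeholders invented for illustration and are NOT the project's data, so the printed values will not match the coefficients listed here.

```python
import math
from itertools import combinations

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Placeholder values (mm), NOT the project's data. Each tuple holds the mean
# rainfall for periods 1..4 (Jan-Feb, Mar-May, Jun-Sep, Oct-Dec).
seasonal_rainfall = {
    "State A": (10.0, 20.0, 300.0, 50.0),
    "State B": (25.0, 60.0, 900.0, 120.0),
    "State C": (5.0, 15.0, 450.0, 30.0),
    "State D": (40.0, 80.0, 600.0, 200.0),
    "State E": (15.0, 35.0, 1200.0, 90.0),
}

# Transpose the rows so that columns[k] holds period k+1 across all states.
columns = list(zip(*seasonal_rainfall.values()))

# One coefficient per unordered pair of periods, named as in the text (r12, r13, ...).
pairwise = {
    f"r{i}{j}": pearson_r(columns[i - 1], columns[j - 1])
    for i, j in combinations(range(1, 5), 2)
}

for name, value in pairwise.items():
    print(f"{name} = {value:.3f}")
```

Four periods give six unordered pairs, which is why the list contains six coefficients.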

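The coefficients of multiple correlation, where for example r1.23 denotes the multiple correlation of period 1 with periods 2 and 3 taken together, are:

r1.23 = 0.609311062

r1.24 = 0.673633578

r1.34 = 0.175537735

r2.34 = 0.580862634

r3.24 = 0.572721217

r4.23 = 0.215853173

r1.234 = 0.609311062

r2.134 = 0.884116556

r3.124 = 0.567235428

r4.123 = 0.197204323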

After analysing the coefficients of correlation for mean rainfall in the different periods, the following conclusions can be made:

Pairwise linear correlations. Rainfall in January-February and in March-May is moderately positively correlated (r12 ≈ 0.677), as is rainfall in March-May and in June-September (r23 ≈ 0.553). Rainfall in January-February is very weakly positively correlated with rainfall in June-September (r13 ≈ 0.088) and in October-December (r14 ≈ 0.109). Rainfall in March-May and in October-December is also very weakly positively correlated (r24 ≈ 0.135). Rainfall in June-September and in October-December is very weakly negatively correlated (r34 ≈ -0.093).

Multiple correlations with two other periods. Mean rainfall in January-February is moderately correlated with March-May and June-September taken together (r1.23 ≈ 0.609) and with March-May and October-December taken together (r1.24 ≈ 0.674), but only very weakly with June-September and October-December taken together (r1.34 ≈ 0.176). Mean rainfall in March-May is moderately correlated with June-September and October-December taken together (r2.34 ≈ 0.581). Mean rainfall in June-September is moderately correlated with March-May and October-December taken together (r3.24 ≈ 0.573). Mean rainfall in October-December is weakly correlated with March-May and June-September taken together (r4.23 ≈ 0.216).

Multiple correlations with all three other periods. Mean rainfall in January-February is moderately correlated with the other three periods taken together (r1.234 ≈ 0.609). Mean rainfall in March-May is strongly correlated with the other three periods (r2.134 ≈ 0.884). Mean rainfall in June-September is moderately correlated with the other three periods (r3.124 ≈ 0.567). Mean rainfall in October-December is very weakly correlated with the other three periods (r4.123 ≈ 0.197).

By analysing how rainfall patterns in different months and in different states of India are related, useful predictions can be made. Graphs showing the dependence between rainfall in different periods are shown below.

Graph of mean rainfall for January to February versus March to May

Graph of mean rainfall for January to February versus October to December

Graph of mean rainfall for March to May versus June to September

Graph of mean rainfall for March to May versus October to December

Graph of mean rainfall for June to September versus October to December

Graph of mean rainfall for January to February versus June to September

Graph of the coefficient of multiple correlation versus mean rainfall in different months

Conclusions

The project is based on correlation and multiple correlation algorithms. Using our database we analysed rainfall trends in 10 states of India (Rajasthan, Kerala, Punjab, Uttar Pradesh, Maharashtra, Tamil Nadu, Andhra Pradesh, Sikkim, Arunachal Pradesh and West Bengal) over the periods January-February, March-May, June-September and October-December by finding linear and multiple correlations among the attributes. The coefficients of correlation show how strongly or weakly the attributes are related, which helps in analysing the rainfall pattern in India. The climatic patterns of a region tend to remain stable over very long spans of time. By analysing the mean rainfall in different periods we obtained linear and multiple correlation coefficients that can be used as a reference for forecasting weather in the coming years. The coefficient of linear correlation lies in the range -1 to 1. In this report, a coefficient well above 0.5 in magnitude is called a strong correlation, one close to 0.5 a moderate correlation, and anything smaller a weak correlation. If the coefficient is positive, an increase in the value of one parameter is associated with an increase in the value of the other. If it is negative, an increase in the value of one parameter is associated with a decrease in the value of the other. From the correlation coefficients we can see that if the mean rainfall of one period is strongly correlated with the mean rainfall of another period or periods, then the mean rainfall in those periods is strongly related to the mean rainfall in that particular period. We find multiple correlation to be a better basis for weather forecasting than linear correlation, because it shows how mean rainfall in several different periods taken together relates to rainfall in one particular period.

References-

https://journals.ametsoc.org

https://www.kaggle.com/rajanand/rainfall-in-india

www.wikipedia.com

https://study.com/…/pearson-correlation-coefficient-formula-example-significance.html

Probability and Statistics for Engineers and Scientists, book by Walpole and Myers

https://docs.oracle.com