• Subject Name: Statistics

## Statistical Analysis Research Report

Introduction

Variable Selection

Descriptive Statistics

Missing Data Handling

Graphical Data Visualization

Nonparametric Testing for Nominal Data

Correlation Analysis

### Introduction

The data set given for this task has been drawn from the publicly available PISA data (the OECD's Programme for International Student Assessment), in this case the ‘school survey’, which is ordinarily completed by the principal of each sampled school.

#### Part 1: Variables Under Consideration

Let us take the following three variables into account:

1. Total No. of interactive whiteboards in the school altogether
2. Total No. of data projectors in the school altogether
3. Total No. of computers with internet connection available for teachers in the school.

All three variables are numeric, scale-type variables. We are interested in whether there is any significant relationship between them, and in their summary descriptive statistics and the distribution of the data.

Frequency count:

| Statistics | Interactive whiteboards | Data projectors | Computers with internet for teachers |
|---|---|---|---|
| N Valid | 677 | 675 | 678 |
| N Missing | 108 | 110 | 107 |

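Here the missing data consist of the blank responses given by respondents (user-missing) together with the cases recorded as system-missing; both together constitute the missing values. For example, for interactive whiteboards, 26 user-missing and 82 system-missing cases make up the 108 missing values.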

Issues with Missing Data:

Understanding missing values is vital for managing data effectively. If the analyst does not handle missing values appropriately, he or she may end up drawing incorrect inferences about the data. Because of improper handling, the results obtained by the analyst will differ from those that would have been obtained had the values been present.

Ways of Handling Missing data:

The analyst may either delete the affected cases or impute values to replace them. If the number of cases with missing values is very small, an experienced analyst may drop those cases from the analysis; as a rule of thumb, cases can be dropped if they make up less than 5% of the sample. In multivariate analysis, if there is a larger number of missing values, it can be better to drop those cases than to impute replacement values. In univariate analysis, on the other hand, imputation can reduce the amount of bias in the data if the values are missing at random.
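As a rough illustration, the 5% rule of thumb can be applied to the missing-value counts in the frequency table above. This is a minimal Python sketch; the dictionary labels and the helper name `missing_share` are our own, and the counts are copied from the SPSS output (785 cases in total).

```python
# Minimal sketch: apply the "less than 5% missing" rule of thumb to the
# missing-value counts reported by SPSS (785 schools in total).
TOTAL_CASES = 785
THRESHOLD = 0.05  # rule-of-thumb cut-off quoted in the text

missing_counts = {
    "interactive whiteboards": 108,
    "data projectors": 110,
    "computers with internet (teachers)": 107,
}


def missing_share(missing, total=TOTAL_CASES):
    """Return the proportion of cases that are missing for one variable."""
    return missing / total


for name, missing in missing_counts.items():
    share = missing_share(missing)
    verdict = "below" if share < THRESHOLD else "at or above"
    print(f"{name}: {share:.1%} missing ({verdict} the 5% rule of thumb)")
```

Each of the three variables has roughly 14% of its values missing, which is above the 5% threshold quoted in the text.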

### Overall Summary Descriptive Statistics

| Variable | N | Range | Minimum | Maximum | Mean | Std. Deviation | Variance | Skewness | Skewness Std. Error | Kurtosis | Kurtosis Std. Error |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Total No. of data projectors in the school altogether | 675 | 200 | 0 | 200 | 38.30 | 28.821 | 830.675 | 1.351 | .094 | 3.794 | .188 |
| Total No. of interactive whiteboards in the school altogether | 677 | 129 | 0 | 129 | 18.71 | 19.267 | 371.234 | 1.650 | .094 | 3.805 | .188 |
| Total No. of computers with internet connection available for teachers in the school | 678 | 3300 | 0 | 3300 | 128.93 | 223.660 | 50023.785 | 8.477 | .094 | 94.697 | .187 |
| Valid N (listwise) | 672 | | | | | | | | | | |

SPSS uses only the valid (non-missing) values of each variable when computing these statistics. We have chosen not to impute the missing values, so the descriptive statistics for each variable are based on its own valid observations; this is why N differs slightly between the variables (675 to 678), while only 672 cases are complete on all three. The average number of data projectors per school is about 38 (mean 38.30), and the average number of interactive whiteboards is about 19 (mean 18.71). The average number of computers with an internet connection available to teachers is approximately 129 (mean 128.93).
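The difference between analysing each variable's own valid cases (the per-variable N column) and keeping only cases that are complete on all variables (the "Valid N (listwise)" row) can be sketched in a few lines of Python. The rows below are made-up toy data, with `None` standing in for a missing response; they are not taken from the PISA file, and the function names are our own.

```python
# Toy data (hypothetical): each row is one school, with values for
# (whiteboards, projectors, computers); None marks a missing response.
rows = [
    (10, 30, 100),
    (None, 25, 80),
    (5, None, 60),
    (20, 40, None),
    (15, 35, 120),
]


def valid_values(rows, column):
    """Per-variable analysis: keep every non-missing value in one column."""
    return [row[column] for row in rows if row[column] is not None]


def listwise_rows(rows):
    """Listwise deletion: keep only rows with no missing value at all."""
    return [row for row in rows if all(value is not None for value in row)]


for column, name in enumerate(["whiteboards", "projectors", "computers"]):
    values = valid_values(rows, column)
    print(f"{name}: N = {len(values)}, mean = {sum(values) / len(values):.2f}")

print(f"Valid N (listwise) = {len(listwise_rows(rows))}")
```

Each variable keeps four of the five toy rows, but only two rows survive listwise deletion, which mirrors why the per-variable N values in the table are larger than the listwise N.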

For all three variables the skewness values are greater than 1 (1.351, 1.650 and 8.477), so all three variables are highly positively (right) skewed. SPSS reports excess kurtosis, for which a normal distribution has a value of 0; the values here (3.794, 3.805 and 94.697) are all well above 0, so the variables are leptokurtic. This means the distributions have heavier tails and a higher, sharper central peak than a normal distribution.
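To make the skewness and kurtosis figures concrete, the sketch below computes the bias-adjusted sample skewness and excess kurtosis, together with their standard errors, in plain Python. We assume these are the formulas behind the SPSS output; they do reproduce the standard errors in the table (.094, and .188 or .187 for n = 675, 677 and 678). Dividing each statistic by its standard error is a common informal check, with ratios well beyond about 2 in absolute value suggesting non-normality. Function names are our own.

```python
import math


def se_skewness(n):
    """Standard error of the sample skewness for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Standard error of the sample excess kurtosis for a sample of size n."""
    return 2 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


def skewness_and_excess_kurtosis(xs):
    """Bias-adjusted skewness (G1) and excess kurtosis (G2).

    A normal distribution has G1 = 0 and G2 = 0. Requires at least four
    values that are not all equal.
    """
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    big_g1 = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    big_g2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    return big_g1, big_g2


# Ratios of the reported statistics to their standard errors (data projectors).
n = 675
print(f"skewness / SE = {1.351 / se_skewness(n):.1f}")
print(f"kurtosis / SE = {3.794 / se_kurtosis(n):.1f}")
```

For the data projectors both ratios are far larger than 2, which is consistent with the conclusion that the variables are strongly right-skewed and heavy-tailed.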

### Relationships Between the Variables

Correlations

| Variable | Statistic | Interactive whiteboards | Data projectors | Computers with internet for teachers |
|---|---|---|---|---|
| Total No. of interactive whiteboards in the school altogether | Pearson Correlation | 1 | .222** | .055 |
| | Sig. (2-tailed) | | .000 | .153 |
| | N | 677 | 673 | 675 |
| Total No. of data projectors in the school altogether | Pearson Correlation | .222** | 1 | .328** |
| | Sig. (2-tailed) | .000 | | .000 |
| | N | 673 | 675 | 674 |
| Total No. of computers with internet connection available for teachers in the school | Pearson Correlation | .055 | .328** | 1 |
| | Sig. (2-tailed) | .153 | .000 | |
| | N | 675 | 674 | 678 |

**. Correlation is significant at the 0.01 level (2-tailed).

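The correlation table shows that the total number of interactive whiteboards and the total number of data projectors in a school are linearly related. The Pearson correlation coefficient is r = 0.222, a weak positive relationship (r² ≈ 0.05, so only about 5% of the variance is shared), and it is statistically significant because the p-value is below 0.05 (SPSS reports .000, i.e. p < 0.001).

Similarly, there is a linear relationship between the number of data projectors and the number of computers connected to the internet. Here r = 0.328 (r² ≈ 0.11), a weak-to-moderate positive relationship, and the p-value indicates that this relationship is also significant. By contrast, the correlation between interactive whiteboards and computers (r = 0.055, p = 0.153) is not statistically significant.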
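The sketch below shows how a Pearson correlation and its two-tailed p-value can be computed in plain Python. The test statistic is t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom; to stay within the standard library we approximate the t distribution by the standard normal, which is reasonable for samples of several hundred cases but is an assumption, not exactly what SPSS does. Function names are our own; the r and N values are copied from the correlation table.

```python
import math
from statistics import NormalDist


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_significance(r, n):
    """Two-tailed test of H0: rho = 0 (normal approximation to the t test)."""
    t = r * math.sqrt((n - 2) / (1 - r * r))
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


# Coefficients and pairwise N values taken from the SPSS correlation table.
pairs = {
    "whiteboards vs projectors": (0.222, 673),
    "projectors vs computers": (0.328, 674),
    "whiteboards vs computers": (0.055, 675),
}
for name, (r, n) in pairs.items():
    t, p = r_significance(r, n)
    print(f"{name}: r^2 = {r * r:.3f}, t = {t:.2f}, p = {p:.4f}")
```

With r = 0.055 and n = 675 the approximation gives p of about 0.15, in line with the reported .153, while the other two correlations give p far below 0.001.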

Q-Q plots were used to check the distribution of each variable:

Case Processing Summary

| | Interactive whiteboards | Data projectors | Computers with internet for teachers |
|---|---|---|---|
| Series or Sequence Length | 785 | 785 | 785 |
| Number of Missing Values in the Plot: User-Missing | 26 | 28 | 25 |
| Number of Missing Values in the Plot: System-Missing | 82 | 82 | 82 |

The cases are unweighted.

Estimated Distribution Parameters

| Normal Distribution | Interactive whiteboards | Data projectors | Computers with internet for teachers |
|---|---|---|---|
| Location | 18.71 | 38.30 | 128.93 |
| Scale | 19.267 | 28.821 | 223.660 |

The cases are unweighted.
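A normal Q-Q plot pairs each sorted observation with the quantile expected under a normal distribution whose location and scale are the estimated mean and standard deviation. The sketch below computes those pairs in plain Python using Blom's plotting positions, (i - 3/8) / (n + 1/4); treat that choice, the function name and the sample values as assumptions rather than a description of SPSS internals.

```python
from statistics import NormalDist


def normal_qq_points(data, location, scale):
    """Return (expected normal quantile, observed value) pairs for a Q-Q plot.

    Uses Blom's plotting positions p_i = (i - 3/8) / (n + 1/4).
    """
    observed = sorted(data)
    n = len(observed)
    reference = NormalDist(mu=location, sigma=scale)
    points = []
    for i, value in enumerate(observed, start=1):
        p = (i - 3 / 8) / (n + 1 / 4)
        points.append((reference.inv_cdf(p), value))
    return points


# Hypothetical projector counts; location and scale are the estimates
# reported above for the data-projector variable.
sample = [0, 12, 20, 25, 30, 38, 45, 60, 90, 200]
for expected, observed in normal_qq_points(sample, location=38.30, scale=28.821):
    print(f"expected {expected:7.2f}  observed {observed:4d}")
```

If the data were normal, the points would lie close to the line where expected equals observed; for right-skewed counts such as these, the largest observations sit well above that line.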

Let us now consider a variable with a nominal data type. In this dataset we look in particular at the variable addressing the following question:

Which of the following definitions best describes the community in which your school is located? Variable name: SC001C01TA_AU

### Frequency Distribution

Which of the following definitions best describes the community in which your school is located?

| | Response | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | A small rural community (with fewer than 1 000 people) | 15 | 1.9 | 2.2 | 2.2 |
| | A small country town (1 000 to about 3 000 people) | 35 | 4.5 | 5.1 | 7.2 |
| | A medium-sized country town (3 000 to about 15 000 people) | 76 | 9.7 | 11.0 | 18.2 |
| | A larger town (15 000 to about 50 000 people) | 71 | 9.0 | 10.2 | 28.4 |
| | A very large town (50 000 to about 100 000 people) | 57 | 7.3 | 8.2 | 36.7 |
| | A city (100 000 to about 1 000 000 people) | 178 | 22.7 | 25.7 | 62.3 |
| | Close to the centre of a very large city (with over 1 000 000 people) | 126 | 16.1 | 18.2 | 80.5 |
| | Elsewhere in a very large city (with over 1 000 000 people) | 135 | 17.2 | 19.5 | 100.0 |
| | Total | 693 | 88.3 | 100.0 | |
| Missing | No Response | 18 | 2.3 | | |
| | System | 74 | 9.4 | | |
| | Total | 92 | 11.7 | | |
| Total | | 785 | 100.0 | | |

There are 92 missing values (18 non-responses and 74 system-missing), which is 11.7% of the 785 cases in the data set. The distribution shows that most schools are located in or near cities: "A city" is the most frequent response (25.7% of valid responses), and the three city categories together account for 63.3% of valid responses.
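The Percent, Valid Percent and Cumulative Percent columns follow directly from the counts: Percent divides by all 785 cases, Valid Percent divides by the 693 valid responses, and Cumulative Percent accumulates the valid counts in category order. A minimal Python sketch, with shortened category labels and a function name of our own:

```python
def frequency_table(valid_counts, missing_counts):
    """Rebuild SPSS-style frequency columns from ordered category counts."""
    n_valid = sum(valid_counts.values())
    n_total = n_valid + sum(missing_counts.values())
    rows = []
    cumulative = 0
    for label, count in valid_counts.items():
        cumulative += count
        rows.append((label, count, 100 * count / n_total,
                     100 * count / n_valid, 100 * cumulative / n_valid))
    for label, count in missing_counts.items():
        # Missing categories get a Percent but no Valid/Cumulative Percent.
        rows.append((label, count, 100 * count / n_total, None, None))
    return rows


valid = {
    "small rural community": 15,
    "small country town": 35,
    "medium-sized country town": 76,
    "larger town": 71,
    "very large town": 57,
    "city": 178,
    "close to centre of very large city": 126,
    "elsewhere in very large city": 135,
}
missing = {"no response": 18, "system missing": 74}

for label, count, pct, valid_pct, cum_pct in frequency_table(valid, missing):
    extra = "" if valid_pct is None else f"{valid_pct:6.1f} {cum_pct:6.1f}"
    print(f"{label:36s} {count:4d} {pct:6.1f} {extra}")
```

Rounded to one decimal place, the computed columns match the SPSS frequency table.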

Graphical description of the Variable:

We would like to examine the relationship between the community in which the school is located and the geographic location of the school. To do this, we construct a cross-tabulation of the two variables and examine how the data are distributed across its cells.

Crosstabulation of "Which of the following definitions best describes the community in which your school is located?" by "Geographic location of school (major categories)":

| Community in which the school is located | Metropolitan | Provincial | Remote | Total |
|---|---|---|---|---|
| A small rural community (with fewer than 1 000 people) | 0 | 6 | 6 | 12 |
| A small country town (1 000 to about 3 000 people) | 2 | 22 | 8 | 32 |
| A medium-sized country town (3 000 to about 15 000 people) | 12 | 58 | 6 | 76 |
| A larger town (15 000 to about 50 000 people) | 18 | 46 | 6 | 70 |
| A very large town (50 000 to about 100 000 people) | 24 | 33 | 0 | 57 |
| A city (100 000 to about 1 000 000 people) | 162 | 14 | 1 | 177 |
| Close to the centre of a very large city (with over 1 000 000 people) | 123 | 2 | 0 | 125 |
| Elsewhere in a very large city (with over 1 000 000 people) | 135 | 0 | 0 | 135 |
| Total | 476 | 181 | 27 | 684 |
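A cross-tabulation is simply a count of how often each pair of categories occurs, plus row and column totals. A minimal Python sketch using `collections.Counter`, on a handful of made-up (community, location) records rather than the PISA data; the function name is our own:

```python
from collections import Counter


def crosstab(records, row_levels, col_levels):
    """Count (row, column) pairs and return the table with its margins."""
    counts = Counter(records)
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return table, row_totals, col_totals


# Hypothetical records: (community size, geographic location).
records = [
    ("city", "Metropolitan"),
    ("city", "Metropolitan"),
    ("city", "Provincial"),
    ("larger town", "Provincial"),
    ("small country town", "Remote"),
]
rows = ["small country town", "larger town", "city"]
cols = ["Metropolitan", "Provincial", "Remote"]
table, row_totals, col_totals = crosstab(records, rows, cols)
for label, counts, total in zip(rows, table, row_totals):
    print(label, counts, total)
print("Total", col_totals, sum(row_totals))
```

Category combinations that never occur, such as a small country town in a metropolitan area here, appear as zero cells, just as in the SPSS table.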

To check whether these two variables are significantly related, we conducted a chi-square test of independence:

Chi-Square Tests

| Test | Value | df | Asymp. Sig. (2-sided) |
|---|---|---|---|
| Pearson Chi-Square | 499.390a | 14 | .000 |
| Likelihood Ratio | 503.241 | 14 | .000 |
| Linear-by-Linear Association | 370.938 | 1 | .000 |
| N of Valid Cases | 684 | | |

a. 7 cells (29.2%) have expected count less than 5. The minimum expected count is .47.

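The chi-square test is significant (χ² = 499.39, df = 14, p < 0.05; SPSS reports .000), which indicates that the two variables are significantly related to each other. Note, however, that 7 cells (29.2%) have an expected count below 5 and the minimum expected count is 0.47, so the common guideline that no more than 20% of cells should have expected counts below 5 is not met, and the asymptotic p-value should be interpreted with some caution.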
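The Pearson chi-square statistic, its degrees of freedom and the expected-count warning in the footnote can all be recomputed from the cross-tabulation. The sketch below does this in plain Python, with the observed counts copied from the table (rows: Metropolitan, Provincial, Remote; columns: the eight community categories from smallest to largest). It does not compute the p-value; with 14 degrees of freedom, a statistic near 500 lies far beyond any conventional critical value. The function name is our own.

```python
# Observed counts from the crosstab (Metropolitan, Provincial, Remote rows;
# community categories from smallest to largest as columns).
observed = [
    [0, 2, 12, 18, 24, 162, 123, 135],
    [6, 22, 58, 46, 33, 14, 2, 0],
    [6, 8, 6, 6, 0, 1, 0, 0],
]


def chi_square_test(table):
    """Pearson chi-square statistic, df and expected counts for a 2-D table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected


statistic, df, expected = chi_square_test(observed)
cells = [e for row in expected for e in row]
small = sum(1 for e in cells if e < 5)
print(f"chi-square = {statistic:.3f}, df = {df}")
print(f"{small} of {len(cells)} cells ({100 * small / len(cells):.1f}%) "
      f"have expected count < 5; minimum = {min(cells):.2f}")
```

The recomputed statistic, degrees of freedom, number of small-expected cells and minimum expected count agree with the SPSS output.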

