Applied Nonparametric Statistical Methods Assignment Sample

Subject Name : Statistics

Statistical Analysis Research Report

Table of Contents

Introduction

Variable Selection

Descriptive Statistics

Missing Data Handling

Graphical Data Visualization

Nonparametric Testing for Nominal Data

Correlation Analysis

Introduction

The data set provided for this assignment is drawn from the publicly available PISA data (the OECD's Programme for International Student Assessment), in this case the ‘school survey’, which is ordinarily completed by the principal of each sampled school.

Part 1 Variables Under Consideration

We consider the following three variables:

Total No. of interactive whiteboards in the school altogether
Total No. of data projectors in the school altogether
Total No. of computers with internet connection available for teachers in the school.

All three variables are numeric (scale) variables. We are interested in their summary descriptive statistics and the distribution of the data, and in whether any significant relationship exists between the variables.

Frequency count:

Statistics
		Total No. of interactive whiteboards in the school altogether	Total No. of data projectors in the school altogether	Total No. of computers with internet connection available for teachers in the school.
N	Valid	677	675	678
N	Missing	108	110	107

Here the missing data consist of blank responses left by the respondent together with a number of values recorded as system-missing. These two groups together constitute the missing values.

Issues with Missing Data:

The concept of missing values is important to understand in order to manage data effectively. If the missing values are not handled properly, the researcher may end up drawing an inaccurate inference about the data. With improper handling, the results obtained will differ from those that would have been obtained had the missing values been present.

Ways of Handling Missing data:

The researcher may either leave out the missing data or use imputation to replace them. If the number of cases with missing values is very small, an experienced researcher may drop those cases from the analysis. As a rule of thumb, if such cases make up less than 5% of the sample, the researcher can drop them. In multivariate analysis, if there is a larger number of missing values, it can be better to drop those cases than to impute replacements. In univariate analysis, on the other hand, imputation can reduce the amount of bias in the data, provided the values are missing at random.
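The 5% rule of thumb can be checked directly against the counts in the frequency table. The sketch below is only an illustration: the short variable names are made up for readability, and the counts (785 cases in total, with 108, 110 and 107 missing) are taken from the SPSS output in this report.

```python
# Counts from the "Frequency count" table; the short names are illustrative only.
TOTAL_CASES = 785
missing_counts = {
    "interactive_whiteboards": 108,
    "data_projectors": 110,
    "teacher_computers": 107,
}
THRESHOLD = 0.05  # the 5% rule of thumb described in the text


def missing_share(missing: int, total: int) -> float:
    """Return the proportion of cases that are missing."""
    return missing / total


shares = {name: missing_share(n, TOTAL_CASES) for name, n in missing_counts.items()}

for name, share in shares.items():
    verdict = "below" if share < THRESHOLD else "above"
    print(f"{name}: {share:.1%} missing ({verdict} the 5% threshold)")
```

On these counts each variable has roughly 14% of its values missing, which is above the 5% rule of thumb.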

Overall summary Descriptive Statistics


	N	Range	Minimum	Maximum	Mean	Std. Deviation	Variance	Skewness		Kurtosis
	Statistic	Statistic	Statistic	Statistic	Statistic	Statistic	Statistic	Statistic	Std. Error	Statistic	Std. Error
Total No. of data projectors in the school altogether	675	200	0	200	38.30	28.821	830.675	1.351	.094	3.794	.188
Total No. of interactive whiteboards in the school altogether	677	129	0	129	18.71	19.267	371.234	1.650	.094	3.805	.188
Total No. of computers with internet connection available for teachers in the school.	678	3300	0	3300	128.93	223.660	50023.785	8.477	.094	94.697	.187
Valid N (listwise)	672

We have chosen not to impute the missing values, so SPSS takes only the valid observations for each variable into account when deriving the descriptive statistics. The average number of data projectors per school is about 38, while the average number of interactive whiteboards is about 19. The average number of computers with an internet connection available to teachers is approximately 129.

We observe that the skewness values for all three variables are greater than 1, so all three variables are highly positively (right) skewed. The kurtosis values for all three variables are greater than 3. SPSS reports excess kurtosis, for which a normal distribution has a value of 0, so we conclude that the variables are leptokurtic. This means the tails are heavier and the central peak is higher and sharper than in a normal distribution.
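The standard errors in the table depend only on the sample size, so they can be reproduced and used to gauge how far the skewness and kurtosis are from zero. The following is a minimal sketch assuming the usual large-sample formulas for these standard errors; the dictionary keys are illustrative names, and the values are copied from the descriptive statistics table.

```python
import math


def se_skewness(n: int) -> float:
    """Standard error of sample skewness for a sample of size n."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n: int) -> float:
    """Standard error of sample excess kurtosis for a sample of size n."""
    return 2.0 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


# (N, skewness, kurtosis) as reported in the descriptive statistics table.
reported = {
    "data_projectors": (675, 1.351, 3.794),
    "interactive_whiteboards": (677, 1.650, 3.805),
    "teacher_computers": (678, 8.477, 94.697),
}

z_scores = {}
for name, (n, skew, kurt) in reported.items():
    # Ratio of each statistic to its standard error.
    z_scores[name] = (skew / se_skewness(n), kurt / se_kurtosis(n))
    print(f"{name}: z_skew = {z_scores[name][0]:.1f}, z_kurt = {z_scores[name][1]:.1f}")
```

Ratios far beyond about 2 in absolute value suggest a clear departure from normality, which is consistent with the conclusion above.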

Relationships Between the Variables

Correlations
		Total No. of interactive whiteboards in the school altogether	Total No. of data projectors in the school altogether	Total No. of computers with internet connection available for teachers in the school.
Total No. of interactive whiteboards in the school altogether	Pearson Correlation	1	.222^**	.055
	Sig. (2-tailed)		.000	.153
	N	677	673	675
Total No. of data projectors in the school altogether	Pearson Correlation	.222^**	1	.328^**
	Sig. (2-tailed)	.000		.000
	N	673	675	674
Total No. of computers with internet connection available for teachers in the school.	Pearson Correlation	.055	.328^**	1
	Sig. (2-tailed)	.153	.000
	N	675	674	678
**. Correlation is significant at the 0.01 level (2-tailed).

The correlation table shows that the total number of interactive whiteboards and the total number of data projectors in the school are linearly related. The Pearson correlation is r = .222, a weak positive relationship, and it is statistically significant (reported p-value of .000, i.e. p < .001).

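Similarly, there is a linear relationship between the number of data projectors and the number of computers with an internet connection available for teachers. The correlation is r = .328, and the p-value indicates that this relationship is also significant. The correlation between interactive whiteboards and computers (r = .055, p = .153) is not significant.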
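To put the coefficients in perspective, the sketch below computes the shared variance (r squared) and the test statistic for each pair from the values in the correlation table. It is an illustration only: the pair names are made up, and the p-value uses a normal approximation to the t distribution, which is an assumption that is reasonable for samples of several hundred cases but is not how SPSS computes it.

```python
import math


def t_statistic(r: float, n: int) -> float:
    """t statistic for testing that a Pearson correlation is zero."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def approx_two_tailed_p(t: float) -> float:
    """Two-tailed p-value using a normal approximation to the t distribution."""
    return math.erfc(abs(t) / math.sqrt(2.0))


# (r, pairwise N) taken from the correlation table.
pairs = {
    "whiteboards_vs_projectors": (0.222, 673),
    "projectors_vs_computers": (0.328, 674),
    "whiteboards_vs_computers": (0.055, 675),
}

results = {}
for name, (r, n) in pairs.items():
    t = t_statistic(r, n)
    results[name] = {"r_squared": r * r, "t": t, "p_approx": approx_two_tailed_p(t)}
    print(f"{name}: r^2 = {r * r:.3f}, t = {t:.2f}, p ~ {results[name]['p_approx']:.3f}")
```

Even the strongest pair shares only about 11% of its variance, so the significant correlations here are weak to moderate in size.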

Q-Q plots to check the distribution of the variables:

Case Processing Summary
		Total No. of interactive whiteboards in the school altogether	Total No. of data projectors in the school altogether	Total No. of computers with internet connection available for teachers in the school.
Series or Sequence Length		785	785	785
Number of Missing Values in the Plot	User-Missing	26	28	25
Number of Missing Values in the Plot	System-Missing	82	82	82
The cases are unweighted.

Estimated Distribution Parameters
		Total No. of interactive whiteboards in the school altogether	Total No. of data projectors in the school altogether	Total No. of computers with internet connection available for teachers in the school.
Normal Distribution	Location	18.71	38.30	128.93
Normal Distribution	Scale	19.267	28.821	223.660
The cases are unweighted.
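A normal Q-Q plot pairs each sorted observation with the value a normal distribution would expect at the same rank. The sketch below shows how such points could be computed using the location and scale estimated for the whiteboard variable. The sample values are hypothetical, not taken from the PISA data, and the use of Blom's plotting positions is an assumption.

```python
from statistics import NormalDist


def normal_qq_points(values, location, scale):
    """Return (observed, expected normal value) pairs for a Q-Q plot."""
    dist = NormalDist(mu=location, sigma=scale)
    ordered = sorted(values)
    n = len(ordered)
    points = []
    for rank, observed in enumerate(ordered, start=1):
        # Blom's plotting position (assumed here).
        p = (rank - 0.375) / (n + 0.25)
        points.append((observed, dist.inv_cdf(p)))
    return points


# Hypothetical right-skewed counts, for illustration only.
sample = [0, 2, 5, 8, 10, 12, 15, 20, 30, 45, 90]

# Location and scale from the "Estimated Distribution Parameters" table.
points = normal_qq_points(sample, location=18.71, scale=19.267)

for observed, expected in points:
    print(f"observed = {observed:3d}   expected normal value = {expected:7.2f}")
```

If the data were normal, the pairs would fall close to a straight line. With these parameters the fitted normal distribution expects negative values in the lower tail, which a count variable cannot take.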

Let us now consider a variable with a nominal data type. In this dataset we look at the variable that records the answer to the following question:

Which of the following definitions best describes the community in which your school is located? Variable name: (SC001C01TA_AU)

Frequency Distribution

Which of the following definitions best describes the community in which your school is located?
		Frequency	Percent	Valid Percent	Cumulative Percent
Valid	A small rural community (with fewer than 1 000 people)	15	1.9	2.2	2.2
	A small country town (1 000 to about 3 000 people)	35	4.5	5.1	7.2
	A medium-sized country town (3 000 to about 15 000 people)	76	9.7	11.0	18.2
	A larger town (15 000 to about 50 000 people)	71	9.0	10.2	28.4
	A very large town (50 000 to about 100 000 people)	57	7.3	8.2	36.7
	A city (100 000 to about 1 000 000 people)	178	22.7	25.7	62.3
	Close to the centre of a very large city (with over 1 000 000 people)	126	16.1	18.2	80.5
	Elsewhere in a very large city (with over 1 000 000 people)	135	17.2	19.5	100.0
	Total	693	88.3	100.0
Missing	No Response	18	2.3
	System	74	9.4
	Total	92	11.7
Total		785	100.0

Here we observe that the total number of missing values (non-responses plus system-missing) is 92, which accounts for 11.7% of the entire data set. The frequencies show that most schools are located in cities: about 63% of the valid responses fall in the three categories for a city or a very large city.
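The percentage columns of the frequency table follow directly from the counts. The sketch below recomputes them as a check; the list order matches the rows of the table, and the variable names are illustrative.

```python
# Frequencies in table order, from the smallest community to the largest.
frequencies = [15, 35, 76, 71, 57, 178, 126, 135]
TOTAL_CASES = 785

valid_n = sum(frequencies)
missing_n = TOTAL_CASES - valid_n

# Valid percent uses the number of valid responses as the denominator.
valid_percent = [100.0 * f / valid_n for f in frequencies]

# Cumulative percent is the running total of valid responses.
cumulative_percent = []
running = 0
for f in frequencies:
    running += f
    cumulative_percent.append(100.0 * running / valid_n)

missing_percent = 100.0 * missing_n / TOTAL_CASES

# Share of valid responses in the three city categories (last three rows).
city_share = 100.0 * sum(frequencies[-3:]) / valid_n

print(f"valid N = {valid_n}, missing = {missing_n} ({missing_percent:.1f}%)")
print(f"schools in a city or very large city: {city_share:.1f}% of valid responses")
```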

Graphical description of the Variable:

We would like to examine the relationship between the community in which the school is located and the geographic location of the school. To do so, we construct a cross-tabulation of the two variables and look at how the data are distributed across its cells.

		Which of the following definitions best describes the community in which your school is located?								Total
		A small rural community (with fewer than 1 000 people)	A small country town (1 000 to about 3 000 people)	A medium-sized country town (3 000 to about 15 000 people)	A larger town (15 000 to about 50 000 people)	A very large town (50 000 to about 100 000 people)	A city (100 000 to about 1 000 000 people)	Close to the centre of a very large city (with over 1 000 000 people)	Elsewhere in a very large city (with over 1 000 000 people)	Total
Geographic location of school (major categories)	Metropolitan	0	2	12	18	24	162	123	135	476
	Provincial	6	22	58	46	33	14	2	0	181
	Remote	6	8	6	6	0	1	0	0	27
Total		12	32	76	70	57	177	125	135	684

We would like to see whether these two variables are significantly related. To do this, we conducted a chi-square test of independence.

Chi-Square Tests
	Value	df	Asymp. Sig. (2-sided)
Pearson Chi-Square	499.390^a	14	.000
Likelihood Ratio	503.241	14	.000
Linear-by-Linear Association	370.938	1	.000
N of Valid Cases	684
a. 7 cells (29.2%) have expected count less than 5. The minimum expected count is .47.

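The chi-square test is significant, as the p-value is < 0.05 (it is reported as .000, i.e. p < .001). This indicates that the two variables are significantly related to each other. Note, however, that 7 cells (29.2%) have an expected count of less than 5, so the asymptotic p-value should be interpreted with some caution.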
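The Pearson chi-square statistic can be recomputed from the cross-tabulation above. The following sketch uses only the observed counts from that table; the function name is illustrative, and no p-value is computed here.

```python
# Observed counts from the cross-tabulation (rows: geographic location).
observed = [
    [0, 2, 12, 18, 24, 162, 123, 135],  # Metropolitan
    [6, 22, 58, 46, 33, 14, 2, 0],      # Provincial
    [6, 8, 6, 6, 0, 1, 0, 0],           # Remote
]


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom, expected counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / N.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


chi2, df, expected = chi_square_independence(observed)
flat_expected = [e for row in expected for e in row]
small_cells = sum(1 for e in flat_expected if e < 5)
min_expected = min(flat_expected)

print(f"chi-square = {chi2:.3f}, df = {df}")
print(f"cells with expected count < 5: {small_cells} of {len(flat_expected)}")
print(f"minimum expected count: {min_expected:.2f}")
```

The statistic, degrees of freedom, number of sparse cells and minimum expected count agree with the SPSS output and its footnote.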

