Table of Contents
Introduction
Variable Selection
Descriptive Statistics
Missing Data Handling
Graphical Data Visualization
Nonparametric Testing for Nominal Data
Correlation Analysis
The data set provided for this task has been drawn from the publicly available PISA data (the OECD's Programme for International Student Assessment), in this case the 'school questionnaire', which is typically completed by the principal of each sampled school.
Let us take the following three variables into account: the total number of interactive whiteboards in the school, the total number of data projectors in the school, and the total number of computers with an internet connection available for teachers in the school.
All three variables are numeric and measured on a scale level. We are interested in the summary descriptive statistics of these variables and in the distribution of the data. We are also interested in whether any significant relationship exists between the variables.
Frequency count:
Statistics

| | Total No. of interactive whiteboards in the school altogether | Total No. of data projectors in the school altogether | Total No. of computers with internet connection available for teachers in the school |
|---|---|---|---|
| N Valid | 677 | 675 | 678 |
| N Missing | 108 | 110 | 107 |
Here, the missing data comprise the blank responses left by respondents together with the cases recorded as system-missing. These two groups together constitute the missing values.
Issues with Missing Data:
The concept of missing values is important to understand in order to manage data effectively. If the missing values are not handled properly, the researcher may end up drawing an inaccurate inference about the data. Because of improper handling, the results obtained will differ from those that would have been obtained had the missing values been present.
Ways of Handling Missing data:
The researcher may either leave the missing values out or use data imputation to replace them. If the number of cases with missing values is extremely small, an experienced researcher may drop those cases from the analysis. As a rule of thumb, if the cases with missing values make up less than 5% of the sample, the researcher can drop them. In multivariate analysis, if there is a larger number of missing values, it can be better to drop those cases than to impute them. In univariate analysis, on the other hand, imputation can reduce the amount of bias in the data, provided the values are missing at random.
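To make the 5% rule of thumb concrete, the sketch below computes the share of missing cases for each variable from the valid and missing counts reported in the frequency table. The dictionary keys and the helper name `missing_share` are illustrative choices, not part of the original analysis.

```python
# Valid and missing counts as reported in the SPSS frequency table.
counts = {
    "interactive whiteboards": {"valid": 677, "missing": 108},
    "data projectors": {"valid": 675, "missing": 110},
    "computers with internet for teachers": {"valid": 678, "missing": 107},
}

DROP_THRESHOLD = 0.05  # the 5% rule of thumb quoted in the text


def missing_share(valid, missing):
    """Return the fraction of cases that are missing."""
    return missing / (valid + missing)


for name, c in counts.items():
    share = missing_share(c["valid"], c["missing"])
    side = "below" if share < DROP_THRESHOLD else "above"
    print(f"{name}: {share:.1%} missing ({side} the 5% threshold)")
```

With these counts, roughly 14% of the cases are missing for each variable, so the 5% rule of thumb on its own would not justify simply dropping them.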
| Variable | N | Range | Minimum | Maximum | Mean | Std. Deviation | Variance | Skewness (Statistic) | Skewness (Std. Error) | Kurtosis (Statistic) | Kurtosis (Std. Error) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Total No. of data projectors in the school altogether | 675 | 200 | 0 | 200 | 38.30 | 28.821 | 830.675 | 1.351 | .094 | 3.794 | .188 |
| Total No. of interactive whiteboards in the school altogether | 677 | 129 | 0 | 129 | 18.71 | 19.267 | 371.234 | 1.650 | .094 | 3.805 | .188 |
| Total No. of computers with internet connection available for teachers in the school | 678 | 3300 | 0 | 3300 | 128.93 | 223.660 | 50023.785 | 8.477 | .094 | 94.697 | .187 |
| Valid N (listwise) | 672 | | | | | | | | | | |
In our case, SPSS includes only the valid data in the analysis. We have chosen not to impute the missing values, so SPSS uses only the valid observations when deriving the descriptive statistics for the variables under consideration. The average number of data projectors per school is around 38, while the average number of interactive whiteboards is around 19. The average number of computers with an internet connection available to teachers is approximately 129.
We observe that the skewness values for all three variables are greater than 1, so we can conclude that all three variables are highly positively skewed. The kurtosis values for all three variables are greater than 3. SPSS reports excess kurtosis, for which a normal distribution scores 0, so we conclude that the variables are leptokurtic. This means that the tails are longer and heavier, and the central peaks higher and sharper, than those of a normal distribution.
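The standard errors shown next to the skewness and kurtosis statistics depend only on the sample size. The sketch below uses the standard formulas for the standard error of the sample skewness and of the sample excess kurtosis. That SPSS uses exactly these formulas is an assumption here, although they do reproduce the values in the table. The function names are illustrative.

```python
import math


def se_skewness(n):
    """Standard error of the sample skewness for n observations."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Standard error of the sample excess kurtosis for n observations."""
    return 2.0 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


# (N, skewness, kurtosis) as reported in the descriptive statistics table.
reported = {
    "data projectors": (675, 1.351, 3.794),
    "interactive whiteboards": (677, 1.650, 3.805),
    "computers with internet for teachers": (678, 8.477, 94.697),
}

for name, (n, skew, kurt) in reported.items():
    ses, sek = se_skewness(n), se_kurtosis(n)
    # A statistic many standard errors away from 0 points to non-normality.
    print(f"{name}: SE skew = {ses:.3f} (ratio {skew / ses:.1f}), "
          f"SE kurt = {sek:.3f} (ratio {kurt / sek:.1f})")
```

In each case the statistic is many times larger than its standard error, which is consistent with the conclusion that the variables are far from normally distributed.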
Correlations

| | | Total No. of interactive whiteboards in the school altogether | Total No. of data projectors in the school altogether | Total No. of computers with internet connection available for teachers in the school |
|---|---|---|---|---|
| Total No. of interactive whiteboards in the school altogether | Pearson Correlation | 1 | .222** | .055 |
| | Sig. (2-tailed) | | .000 | .153 |
| | N | 677 | 673 | 675 |
| Total No. of data projectors in the school altogether | Pearson Correlation | .222** | 1 | .328** |
| | Sig. (2-tailed) | .000 | | .000 |
| | N | 673 | 675 | 674 |
| Total No. of computers with internet connection available for teachers in the school | Pearson Correlation | .055 | .328** | 1 |
| | Sig. (2-tailed) | .153 | .000 | |
| | N | 675 | 674 | 678 |

**. Correlation is significant at the 0.01 level (2-tailed).
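If we look at the correlation table, we observe that the total number of interactive whiteboards and the total number of data projectors in the school are positively and linearly related. The correlation is weak (r = .222, which corresponds to about 5% of shared variance), but it is statistically significant, as the p-value is below 0.05 (and indeed below 0.01).
Similarly, there is a positive linear relationship between the number of data projectors and the number of computers with an internet connection available for teachers. This correlation is somewhat stronger (r = .328, about 11% of shared variance), and the p-value indicates that it is also significant. The correlation between interactive whiteboards and computers with an internet connection (r = .055, p = .153) is not significant.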
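As a rough check on how a correlation coefficient translates into shared variance and significance, the sketch below takes the coefficients and pairwise N values from the correlation table and computes r squared, the t statistic, and an approximate two-sided p-value. The p-value uses a normal approximation to the t distribution, which is an assumption that is reasonable only because N is large here. It is not the exact computation SPSS performs. The function name and labels are illustrative.

```python
import math


def correlation_summary(r, n):
    """Return (r_squared, t_statistic, approximate two-sided p) for Pearson r on n pairs."""
    r_squared = r * r
    t_stat = r * math.sqrt((n - 2) / (1.0 - r_squared))
    # Large-sample shortcut (assumption): treat t as a standard normal deviate.
    p_approx = math.erfc(abs(t_stat) / math.sqrt(2.0))
    return r_squared, t_stat, p_approx


# (r, pairwise N) as reported in the correlation table.
pairs = {
    "whiteboards vs projectors": (0.222, 673),
    "projectors vs computers": (0.328, 674),
    "whiteboards vs computers": (0.055, 675),
}

for label, (r, n) in pairs.items():
    r2, t, p = correlation_summary(r, n)
    print(f"{label}: r^2 = {r2:.3f}, t = {t:.2f}, p ~ {p:.3g}")
```

The r squared values make clear that a coefficient of .222 or .328 is a correlation, not a percentage: the shared variance is r squared, which is much smaller than r itself.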
Q-Q plots to check the distribution of the variables:
Case Processing Summary

| | | Total No. of interactive whiteboards in the school altogether | Total No. of data projectors in the school altogether | Total No. of computers with internet connection available for teachers in the school |
|---|---|---|---|---|
| Series or Sequence Length | | 785 | 785 | 785 |
| Number of Missing Values in the Plot | User-Missing | 26 | 28 | 25 |
| | System-Missing | 82 | 82 | 82 |

The cases are unweighted.
Estimated Distribution Parameters

| | | Total No. of interactive whiteboards in the school altogether | Total No. of data projectors in the school altogether | Total No. of computers with internet connection available for teachers in the school |
|---|---|---|---|---|
| Normal Distribution | Location | 18.71 | 38.30 | 128.93 |
| | Scale | 19.267 | 28.821 | 223.660 |

The cases are unweighted.
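One simple sanity check that complements the Q-Q plots is to ask how much probability each fitted normal distribution places below zero, given that all three variables are counts with a minimum of 0. The sketch below uses the location and scale estimates from the table above. The helper name `share_below_zero` is illustrative, and this check is an addition for illustration rather than part of the original SPSS output.

```python
from statistics import NormalDist

# (location, scale) of the fitted normal distributions, from the SPSS output.
fitted = {
    "interactive whiteboards": (18.71, 19.267),
    "data projectors": (38.30, 28.821),
    "computers with internet for teachers": (128.93, 223.660),
}


def share_below_zero(location, scale):
    """Probability that a normal variable with these parameters is negative."""
    return NormalDist(mu=location, sigma=scale).cdf(0.0)


for name, (loc, scale) in fitted.items():
    print(f"{name}: fitted normal puts {share_below_zero(loc, scale):.1%} below zero")
```

A normal model that assigns a sizeable share of schools a negative number of devices cannot describe the data well, which agrees with the strong positive skew seen in the descriptive statistics.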
Let us now consider a variable with a nominal data type. In this dataset, we look at the variable that addresses the following question:
Which of the following definitions best describes the community in which your school is located? (Variable name: SC001C01TA_AU)
Which of the following definitions best describes the community in which your school is located?

| | | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | A small rural community (with fewer than 1 000 people) | 15 | 1.9 | 2.2 | 2.2 |
| | A small country town (1 000 to about 3 000 people) | 35 | 4.5 | 5.1 | 7.2 |
| | A medium-sized country town (3 000 to about 15 000 people) | 76 | 9.7 | 11.0 | 18.2 |
| | A larger town (15 000 to about 50 000 people) | 71 | 9.0 | 10.2 | 28.4 |
| | A very large town (50 000 to about 100 000 people) | 57 | 7.3 | 8.2 | 36.7 |
| | A city (100 000 to about 1 000 000 people) | 178 | 22.7 | 25.7 | 62.3 |
| | Close to the centre of a very large city (with over 1 000 000 people) | 126 | 16.1 | 18.2 | 80.5 |
| | Elsewhere in a very large city (with over 1 000 000 people) | 135 | 17.2 | 19.5 | 100.0 |
| | Total | 693 | 88.3 | 100.0 | |
| Missing | No Response | 18 | 2.3 | | |
| | System | 74 | 9.4 | | |
| | Total | 92 | 11.7 | | |
| Total | | 785 | 100.0 | | |
Here we observe that the total number of non-responses and missing values is 92, which accounts for 11.7% of the entire data set. The frequencies show that most schools are located in a city or a very large city: these three categories together account for about 63% of the valid responses.
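The Percent, Valid Percent and Cumulative Percent columns can be derived from the raw frequencies alone. The sketch below shows one way to do so, using the counts from the table. The shortened category labels and the function name `frequency_table` are illustrative.

```python
# Valid frequencies from the table (labels shortened for readability).
valid_counts = {
    "small rural community": 15,
    "small country town": 35,
    "medium-sized country town": 76,
    "larger town": 71,
    "very large town": 57,
    "city": 178,
    "close to centre of very large city": 126,
    "elsewhere in very large city": 135,
}
N_MISSING = 18 + 74  # "No Response" plus system-missing


def frequency_table(counts, n_missing):
    """Return rows of (label, frequency, percent, valid percent, cumulative percent)."""
    n_valid = sum(counts.values())
    n_total = n_valid + n_missing
    rows, running = [], 0
    for label, count in counts.items():
        running += count
        rows.append((
            label,
            count,
            round(100 * count / n_total, 1),    # share of all cases
            round(100 * count / n_valid, 1),    # share of valid cases only
            round(100 * running / n_valid, 1),  # running share of valid cases
        ))
    return rows


for row in frequency_table(valid_counts, N_MISSING):
    print(row)
```

Percent uses all 785 cases as the base, whereas Valid Percent and Cumulative Percent use only the 693 valid responses, which is why the two sets of figures differ.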
Graphical description of the Variable:
We would like to examine the relationship between the community in which the school is located and the geographic location of the school. To do so, we construct a cross-tabulation and look at how the data are distributed.
Geographic location of school (major categories) by "Which of the following definitions best describes the community in which your school is located?"

| Geographic location of school (major categories) | A small rural community (with fewer than 1 000 people) | A small country town (1 000 to about 3 000 people) | A medium-sized country town (3 000 to about 15 000 people) | A larger town (15 000 to about 50 000 people) | A very large town (50 000 to about 100 000 people) | A city (100 000 to about 1 000 000 people) | Close to the centre of a very large city (with over 1 000 000 people) | Elsewhere in a very large city (with over 1 000 000 people) | Total |
|---|---|---|---|---|---|---|---|---|---|
| Metropolitan | 0 | 2 | 12 | 18 | 24 | 162 | 123 | 135 | 476 |
| Provincial | 6 | 22 | 58 | 46 | 33 | 14 | 2 | 0 | 181 |
| Remote | 6 | 8 | 6 | 6 | 0 | 1 | 0 | 0 | 27 |
| Total | 12 | 32 | 76 | 70 | 57 | 177 | 125 | 135 | 684 |
We would like to see whether these two variables are significantly related. To test this, we conducted a chi-square test of independence.
Chi-Square Tests

| | Value | df | Asymp. Sig. (2-sided) |
|---|---|---|---|
| Pearson Chi-Square | 499.390a | 14 | .000 |
| Likelihood Ratio | 503.241 | 14 | .000 |
| Linear-by-Linear Association | 370.938 | 1 | .000 |
| N of Valid Cases | 684 | | |

a. 7 cells (29.2%) have expected count less than 5. The minimum expected count is .47.
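The chi-square test is significant, as the p-value is below 0.05. This indicates that the two variables are significantly related to each other. Note, however, that 29.2% of the cells have an expected count below 5, which exceeds the commonly used 20% guideline, so the result should be read with some caution.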
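The Pearson chi-square statistic can be reproduced directly from the cross-tabulation. The sketch below computes the expected counts under independence, the statistic, the degrees of freedom, and the number of cells with an expected count below 5. It uses only the observed counts from the table. The function name `pearson_chi_square` is illustrative, and no p-value is computed.

```python
# Observed counts from the cross-tabulation (rows: geographic location,
# columns: the eight community categories in table order).
observed = {
    "Metropolitan": [0, 2, 12, 18, 24, 162, 123, 135],
    "Provincial": [6, 22, 58, 46, 33, 14, 2, 0],
    "Remote": [6, 8, 6, 6, 0, 1, 0, 0],
}


def pearson_chi_square(table):
    """Return (chi-square statistic, degrees of freedom, list of expected counts)."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    chi2, expected_counts = 0.0, []
    for row, row_total in zip(rows, row_totals):
        for obs, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            expected_counts.append(expected)
            chi2 += (obs - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return chi2, df, expected_counts


chi2, df, expected = pearson_chi_square(observed)
low_cells = sum(1 for e in expected if e < 5)
print(f"chi-square = {chi2:.3f}, df = {df}")
print(f"{low_cells} of {len(expected)} cells have expected count < 5; "
      f"minimum expected count = {min(expected):.2f}")
```

The low expected counts all come from sparsely populated cells, mostly in the Remote row, which is why the footnote to the SPSS table flags them.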