## Gateway EVP Project Descriptive Statistics: A Brief Explanation

11/24/2014

By Theresa Byess

We have had some questions regarding the publication of our Descriptive Statistics, rather than Inferential Statistics, for the Gateway project. (A table of the descriptive statistics is available at the bottom of this page.) The primary question, of course, is: “What are they, and what do they mean?” I wanted to take this opportunity to explain a little about why we published this data and what it means for the project.

First, our intention was not to “infer,” as is the case with Inferential Statistics. Descriptive statistics summarize the basic features of the data in the study that produced them. We used descriptive statistics because the purpose of the study was to generate quantitative data: we were attempting to quantify the hypothesis by generating numerical data that could be transformed into usable statistics. We were attempting to generalize results from a large sample. The purpose of the study was to establish facts and identify any patterns or correlations.

Because we were testing pre-specified concepts and hypotheses surrounding the theories of EVPs, we used quantitative, rather than qualitative, methods. This is a deductive process: we reasoned from general statements and examined the possibilities to reach a specific, logical conclusion. We also chose this method because it is more objective than qualitative methods, since it uses analyzed statistical information as the basis for conclusions. It provides an “overall” point of view based on that statistical information, and descriptive statistics simplify that information.

So, now that you know why we used this particular method of research and analysis, I can begin to explain what the statistical information means.

The mean, median, and mode are often referred to as measures of central tendency, which attempt to describe what a typical data value looks like. They can all be thought of as different ways of expressing an average for the data. The mean is the most common way of expressing central tendency, which is why it appears first in the statistical representation of our data. It can be viewed as the true average: the sum of all of the usable frequencies (adding together all the usable samples we collected), divided by the total number of frequencies. Here’s an example:

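If our frequencies were 5, 10, 18, 26, 3, and 110 Hz, the sum of these numbers would be 172. There are 6 numbers in that set, so we divide the sum by the count to reach our average:

$$\bar{x} = \frac{5 + 10 + 18 + 26 + 3 + 110}{6} = \frac{172}{6} \approx 28.67\ \text{Hz}$$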

Therefore, our mean, or average, for this set of data would be 28.67 Hz.  The concept is not difficult.  The mean for our data was 596.04 Hz.
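For readers who want to check the arithmetic, here is a minimal Python sketch of the mean calculation using the example frequencies above. It is only an illustration; the project's own figures came from formulas in its database, not from this code.

```python
from statistics import mean

# Example frequencies (Hz) from the text above, not the project's data
frequencies = [5, 10, 18, 26, 3, 110]

total = sum(frequencies)   # add every value together
count = len(frequencies)   # how many values there are
average = total / count    # the mean: sum divided by count

print(f"Sum: {total}, count: {count}, mean: {average:.2f} Hz")

# The standard library's statistics.mean gives the same result
assert abs(average - mean(frequencies)) < 1e-9
```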

The median is simply the middle value of our data set once the values are sorted in order. The median for our data set was 250 Hz. The median can be tedious to calculate by hand, especially for a set as large as ours, and when the set contains an even number of values, the median is the average of the two middle values.
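A minimal Python sketch of the median, including the even-count case, using the same six example frequencies. The function name `median` is our own illustration; Python's built-in `statistics.median` gives the same result.

```python
def median(values):
    """Return the middle value; for an even count, average the two middle values."""
    ordered = sorted(values)          # the median requires sorted data
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd count: a single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle values


# Example frequencies (Hz); sorted they are 3, 5, 10, 18, 26, 110
frequencies = [5, 10, 18, 26, 3, 110]
print(median(frequencies))  # 14.0, the average of 10 and 18
```

Note how the median (14 Hz) sits well below the mean (about 28.67 Hz) here, because the single 110 Hz value pulls the mean upward but barely affects the median.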

All of these calculations were set up in advance as formulas, which automatically calculated this information when data points were entered into the database. This makes our results more accurate.

The mode is probably the most important of these statistics to us here at IRG for this study, simply because it represents the most commonly occurring value in the data set, which in our case was 1,000 Hz. In other words, 1,000 Hz occurred more frequently than any other frequency range within the data set. Hence, I can use this number to predict future behavior: assuming the results from our calculations remain constant, EVPs will most likely fall around 1,000 Hz, within the Ultra Low Frequency (ULF) range.
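A minimal sketch of finding the mode in Python. The sample list is hypothetical, made up for illustration, and is not the project's data. The function (our own, named `modes`) returns every tied value, because a data set can have more than one mode, and if no value repeats, every value ties and there is no meaningful mode.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often, sorted (a data set can have several modes)."""
    counts = Counter(values)          # value -> number of occurrences
    if not counts:
        return []
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


# Hypothetical frequencies (Hz), NOT the project's data
sample = [250, 1000, 500, 1000, 250, 1000, 7143]
print(modes(sample))  # [1000]

# When every value appears once, all of them tie, so the "mode" tells us little
print(modes([5, 10, 18, 26, 3, 110]))
```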

We also had to factor in the uncertainty of our measurements, because so many factors, such as interference, are potentially at play. There are two ways of statistically representing that uncertainty: standard error (AKA the standard deviation of the mean) and standard deviation (AKA the standard deviation of a single measurement). These statistics describe how “spread out” the data were.

Our mean frequency, for instance, was 597 Hz (rounded for the sake of this example) across the 1,000 usable samples. However, not all of the frequencies were 597 Hz; some were lower and others were higher. The standard deviation and standard error describe how spread out this information was. In this case, we used the sample standard deviation rather than the population standard deviation, because all we had was a sample, yet we wished to make a general statement.
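To show the difference between the sample and population standard deviation, and how the standard error follows from them, here is a short Python sketch using the six example frequencies from the mean discussion (not the project's data). It relies on the standard library's `statistics.stdev` (divides by n - 1) and `statistics.pstdev` (divides by n); the standard error is the sample standard deviation divided by the square root of the count.

```python
import math
import statistics

# Example frequencies (Hz) from the mean discussion, not the project's data
frequencies = [5, 10, 18, 26, 3, 110]
n = len(frequencies)

sample_sd = statistics.stdev(frequencies)       # sample SD: divides by n - 1
population_sd = statistics.pstdev(frequencies)  # population SD: divides by n
standard_error = sample_sd / math.sqrt(n)       # standard error of the mean

print(f"sample SD: {sample_sd:.2f}")
print(f"population SD: {population_sd:.2f}")
print(f"standard error: {standard_error:.2f}")
```

Dividing by n - 1 makes the sample standard deviation slightly larger than the population version, which compensates for estimating the spread of a whole population from only a sample.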

I should note here that while I have some background in statistical analysis, I am by no means an expert. The information I am providing is based on my limited knowledge, and we are attempting to send this information to statistical experts for their analysis of the data.

Having said that, I believe the standard error, for this particular study, is irrelevant.  However, I will know more about this statistic once we have received that information.

Here’s a great example of my lack of knowledge in this area. I understand that a lower standard deviation usually means that the values in the data are closer to the mean on average, and a larger standard deviation means the values are farther away from the mean on average. The fact that I believe we have a larger standard deviation simply means there is a larger amount of variation in the samples being studied. Because the frequency range is generalized, the variation is higher. However, if we were to focus on a smaller set, such as the ULF and SLF (the dominant ranges of data), our standard deviation would be a lot less, reflecting the narrower spread of values within those ranges. Our standard deviation of 945.51 (rounded) reflects the fact that values vary widely from the average. I also know that the smaller the standard deviation is relative to the mean, the more representative the mean can be considered. Here the standard deviation (945.51) is larger than the mean (596.04), so it would appear the mean is not reliable. However, this is something we are attempting to verify with statistical experts at Princeton, and we will update this information once it has been received.
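One common way to put “standard deviation relative to the mean” into a single number is the coefficient of variation (CV): the standard deviation divided by the mean. CV is our addition, not one of the statistics discussed in this post; the sketch below simply applies it to the rounded figures quoted here. The function name is our own illustration.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a fraction of the mean (sd / mean)."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return sd / mean


# Rounded figures quoted in this post: SD 945.51, mean 596.04 (both in Hz)
cv = coefficient_of_variation(945.51, 596.04)
print(f"CV = {cv:.2f}")
```

A CV well above 1, as here, means the typical deviation from the mean is larger than the mean itself, which is consistent with treating the mean with caution.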

Sample variance reflects the variability within our sample; it is the square of the sample standard deviation. We took a sample of the total population, 1,000 usable samples in this case, and used that sample to estimate the frequencies of the entire population, which in this case is EVPs. The sample variance helps us determine how spread out the frequencies are. Again, while I understand this concept, I am not sure how to apply this knowledge to our results, which we have submitted for more professional insight.
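A minimal sketch of the sample variance computed by hand (the sum of squared deviations from the mean, divided by n - 1), checked against Python's `statistics.variance` and against the square of the sample standard deviation. The data are again the six example frequencies, not the project's data, and the function name is our own.

```python
import statistics

# Example frequencies (Hz), not the project's data
frequencies = [5, 10, 18, 26, 3, 110]


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - 1)


var = sample_variance(frequencies)
print(round(var, 2))  # 1660.67

# Same as the standard library, and equal to the sample SD squared
assert abs(var - statistics.variance(frequencies)) < 1e-9
assert abs(var - statistics.stdev(frequencies) ** 2) < 1e-6
```

Because the deviations are squared, the variance is in Hz², which is why the standard deviation (its square root, back in Hz) is usually easier to interpret.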

I think perhaps the single most important statistic in the set is the confidence interval. We used a 95% confidence level, which we consider the most useful way to support the reliability and validity of the results we are showing. Reliability, of course, refers to repeatability. We believe our results are consistent and that we have a representative sample that is a true reflection of what we were researching: EVPs. Our conclusions are based on the idea that when the confidence level (the margin of error around the mean) is small compared to the mean, the mean is likely close to its true value, allowing us to have confidence in it.
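A sketch of a 95% confidence interval for the mean: mean ± critical value × standard error. By default it uses the normal (z) critical value from `statistics.NormalDist`, which we assume is a reasonable approximation for a sample as large as 1,000; for a tiny sample like the six example values, a t critical value (about 2.571 for 5 degrees of freedom) is more appropriate, so the example passes one in. Some spreadsheet tools (Excel's Analysis ToolPak, for instance) report a single “Confidence Level” figure that is just the margin, the ± part; we have not confirmed which tool produced our table.

```python
import math
from statistics import NormalDist, mean, stdev


def confidence_interval(values, level=0.95, critical=None):
    """Return (lower, upper, margin) for the mean of `values`.

    margin = critical value * standard error. If no critical value is given,
    the normal (z) value is used, an approximation suited to large samples.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    m = mean(values)
    se = stdev(values) / math.sqrt(n)  # standard error of the mean
    if critical is None:
        critical = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    margin = critical * se
    return m - margin, m + margin, margin


# Example frequencies (Hz), not the project's data
frequencies = [5, 10, 18, 26, 3, 110]

# With only six values, pass the t critical value for 5 degrees of freedom
low, high, margin = confidence_interval(frequencies, critical=2.571)
print(f"mean {mean(frequencies):.2f} Hz, 95% CI {low:.1f} to {high:.1f} Hz (margin {margin:.1f})")
```

With so few, widely spread values the margin is larger than the mean itself, which illustrates why a margin that is small relative to the mean is what lends confidence.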

However, it should be noted here that reliability does not necessarily mean our conclusions are valid; it merely means they are more likely to be. This is the reason for Phase 2 of the Gateway EVP Project: to test our conclusions based on this information.

Kurtosis and Skewness describe the shape of the data's distribution (skewness measures how asymmetric it is, and kurtosis how heavy its tails are), an area in which we have an extremely limited understanding. This information is currently being examined by multiple professional sources specifically trained in statistical analysis. Additional information will be posted once it becomes available.
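For the curious, a sketch of skewness and excess kurtosis in Python. We assume the bias-adjusted sample formulas (often labeled G1 and G2) that many spreadsheet packages use; other software may use slightly different formulas, and we have not confirmed which one produced our table. The function names are our own. Positive skewness means a longer tail toward higher values; positive excess kurtosis means heavier tails than a normal distribution.

```python
import math


def _mean_and_sample_sd(values):
    n = len(values)
    m = sum(values) / n
    s = math.sqrt(sum((x - m) ** 2 for x in values) / (n - 1))  # sample SD
    return n, m, s


def skewness(values):
    """Adjusted sample skewness (G1); positive means a longer right tail."""
    n, m, s = _mean_and_sample_sd(values)
    if n < 3 or s == 0:
        raise ValueError("need at least 3 values that are not all equal")
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)


def excess_kurtosis(values):
    """Adjusted sample excess kurtosis (G2); near 0 for normally distributed data."""
    n, m, s = _mean_and_sample_sd(values)
    if n < 4 or s == 0:
        raise ValueError("need at least 4 values that are not all equal")
    fourth = sum(((x - m) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction


# Example frequencies (Hz), not the project's data
frequencies = [5, 10, 18, 26, 3, 110]
print(f"skewness: {skewness(frequencies):.2f}")  # positive: 110 Hz stretches the right tail
print(f"excess kurtosis: {excess_kurtosis(frequencies):.2f}")
```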

The Minimum and Maximum are the lowest and highest values in our data set: 1.2 Hz (rounded) is the lowest, and 7,143 Hz (rounded) is the highest.

The Sum, of course, is pretty self-explanatory.  It is the sum of the entire data set.

The Count is the number of samples included in the calculations.
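These last four statistics are simple enough to show in a few lines of Python, again using the six example frequencies rather than the project's data.

```python
# Example frequencies (Hz) from the mean discussion, not the project's data
frequencies = [5, 10, 18, 26, 3, 110]

summary = {
    "Minimum": min(frequencies),  # lowest value
    "Maximum": max(frequencies),  # highest value
    "Sum": sum(frequencies),      # total of all values
    "Count": len(frequencies),    # number of values included
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Sum divided by Count reproduces the mean (172 / 6 ≈ 28.67 Hz), which makes a handy sanity check on any table of descriptive statistics.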

I hope this helps our readers understand a little more about the descriptive statistics and the purpose they serve. If you have any questions, comments, or concerns, please feel free to let us know. We will post additional information as it becomes available.