Talk:Simple random sample

Wiki Education Foundation-supported course assignment

This article is or was the subject of a Wiki Education Foundation-supported course assignment. Further details are available on the course page.

Above undated message substituted from Template:Dashboard.wikiedu.org assignment by PrimeBOT (talk) 09:22, 17 January 2022 (UTC)

Conditional Probability

The following appeared before I edited this article, and is nonsense:

For instance, if I have 10 names in the hat, the probability of any one name being drawn is 1 out of 10. After the first name is drawn, there are nine names left in the hat changing the probability of anyone being selected as the second name is 1 out of 9. Since different names have different probabilities depending on the sequence in which the drawing is done, the resulting sample will not be a simple random sample.

The conditional probability that a certain name will be chosen on the second draw given that it was not chosen on the first draw, is 1/9, not 1/10. But the unconditional probability that a certain name will be the second one chosen, calculated in ignorance of which name was chosen first, is 1/10. Moreover, prior to the experiment, the probability of any particular name's being included in a sample without replacement is no different from the probability that any other particular name will be chosen. Thus there is no discrimination against or in favor of any particular name in sampling without replacement. All of this is covered in elementary statistics courses. -- Mike Hardy
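The distinction drawn above can be checked by brute force. The following is only a minimal sketch, assuming ten names labelled 0 to 9 and a sample of two drawn without replacement; the labels and variable names are illustrative and do not come from the article.

```python
from fractions import Fraction
from itertools import permutations

names = list(range(10))  # ten hypothetical names, labelled 0..9
target = 0               # the one name whose chances we track

# Every equally likely ordered outcome of drawing two names without replacement.
draws = list(permutations(names, 2))  # 10 * 9 = 90 outcomes

# Unconditional probability that the target is the second name drawn.
p_second = Fraction(sum(1 for d in draws if d[1] == target), len(draws))

# Conditional probability that the target is drawn second,
# given that it was not drawn first.
not_first = [d for d in draws if d[0] != target]
p_second_given_not_first = Fraction(
    sum(1 for d in not_first if d[1] == target), len(not_first)
)

# Probability that the target ends up in the sample at all.
p_included = Fraction(sum(1 for d in draws if target in d), len(draws))

print(p_second, p_second_given_not_first, p_included)  # 1/10 1/9 1/5
```

Changing `target` to any other label leaves all three values unchanged, which is the sense in which no name is favored.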

<<You seem to be conflating two ideas, that of conditional probability and that of ignorance. The probability of a name being drawn WITHOUT REPLACEMENT drops from 1 in 10, to 1 in 9, to 1 in 8, and so on, whether we know the drawings are dependent (conditional) or not. Our ignorance does not change the probabilities. Only if the names are REPLACED do the probabilities remain at 1 in 10 for each subsequent drawing. This is the interpretation in elementary statistics texts. B. Moore>>

Merge suggested with Random sample

Not a good idea, in my opinion. The two articles cover quite different ideas. I have commented in a bit more detail on Talk:Statistics and Talk:Random sample. Avenue 13:48, 23 September 2005 (UTC)

I agree - they aren't the same thing. A simple random sample assigns an equal probability of selection to each unit in the sampling frame, whereas in a random sample the units need not have equal probabilities of selection.

- I removed the merge tag and added an expert tag. Can you explain in the article the difference between a random and simple random sample? Thatcher131 15:01, 17 February 2006 (UTC)

Random sampling versus simple random sampling

<<Random sampling refers to the fact that any member of a population has an equal likelihood of being selected. This is what you are using to define a SIMPLE random sample, something that actually refers to the likelihood of any sample of size n in the population having the same chance of being selected. The distinction is critical.

For example, if I have a classroom with 60 students arranged in six rows of 10 students each and I select a sample of 10 students by rolling a die, then select the row corresponding to the outcome, this is a random sample. It is a random sample because each student in the classroom had an equal chance (1 in 6) of being in the row selected. However, it is NOT a "simple random sample" because not all possible samples of size 10 in this classroom have the same chance of being selected. Thus, any stratified or cluster sampling may begin with a random sample but can never be a simple random sample. By the same token, if a sample does NOT begin with a random sample, it cannot be a simple random sample, either. B. Moore>>
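The classroom example can be put in numbers. This is a rough sketch under the stated setup (60 students, six rows of 10, one row chosen by a fair die); the student labels 0 to 59 and the variable names are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

# Six rows of ten students; students are labelled 0..59 for illustration.
rows = [list(range(r * 10, r * 10 + 10)) for r in range(6)]

# A fair die picks each row with probability 1/6, so a student's chance of
# being sampled is the probability that his or her row comes up.
inclusion = {s: Fraction(0) for row in rows for s in row}
for row in rows:
    for s in row:
        inclusion[s] += Fraction(1, 6)

equal_chance = set(inclusion.values()) == {Fraction(1, 6)}

reachable_samples = len(rows)  # only the six rows can ever be drawn
all_samples = comb(60, 10)     # every 10-student subset of the class

print(equal_chance, reachable_samples, all_samples)
```

Every student has the same 1/6 chance, yet only 6 of the `comb(60, 10)` possible samples of size 10 can occur, so the samples themselves are far from equally likely.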

Example distinguishing random and simple random

I added a simple example that should help illustrate the distinction between a random sample and a simple random sample. Steve Simon 04:09, 17 October 2006 (UTC)

yeah if u said sooooo!!!!!!!!!!!!!!!!!!!!!!!!!!! —Preceding unsigned comment added by 206.78.136.181 (talk) 20:53, 6 March 2009 (UTC)

This example is not very clear. It does not state which type of sampling the first example falls into. It also does not help to clarify this confusing statement in the first paragraph that "simple random sampling, and should not be confused with Random Sampling." Based on the general Random Sampling article, it appears that simple random sampling is a specific kind of random sampling, not a distinct process as that sentence implies. I suggest clarifying this example by explicitly stating that it is simple random sampling, and changing the sentence in the first paragraph to convey that simple random sampling is a specific kind of random sampling rather than a distinct process. Leopd (talk) 21:55, 24 March 2011 (UTC)

Change some emphasis in the whole page?

I don't know who is following this page, but I thought I would float an idea before I do a major edit.

Here's the issue:

This page puts way too much emphasis on the idea that all individuals in a population should have an equal probability of being chosen in the SRS. The definition of SRS is instead that all subsets of the given size should have an equal chance of being chosen. (So, e.g., an SRS of size $n$ in a population $P$ is an element of the set $S=\{A\subset P \mid \#(A)=n\}$ chosen according to the uniform distribution on $S$.)
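For a small population the definition can be carried out literally. The sketch below assumes a hypothetical five-element population and $n=2$; it lists the set $S$ explicitly and then draws one element of it uniformly. The names `P`, `S`, and `srs` simply mirror the notation above.

```python
import random
from itertools import combinations

P = ["a", "b", "c", "d", "e"]  # hypothetical population
n = 2                          # sample size

# S is the collection of all subsets of P with exactly n elements.
S = list(combinations(P, n))

rng = random.Random(0)  # seeded only so that repeated runs agree
srs = rng.choice(S)     # one element of S, chosen uniformly

print(len(S), srs)
```

In practice nobody enumerates $S$; Python's `random.sample(P, n)` draws without replacement and is the usual shortcut, but the enumeration makes the "uniform over subsets" definition visible.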

I realize that the article does mention this correct definition, but really only in passing. The correct definition should be given a central role, and then the fact that this means all individuals have an equal chance, as a consequence, could also be mentioned in passing.

The point is that all of the rest of the usual statistical machinery (the Central Limit Theorem, sampling distributions, confidence intervals, tests of significance ...) depends upon a SRS exactly because of the precise (correct) definition, which allows a particular SRS to be a fair representative of all the samples of that size. Whereas a sample chosen so that each individual has the same chance might be a very non-uniformly chosen set of $n$ individuals.

The example I use here in class is usually something like this: Suppose some group had four students: two men and two women. Suppose my method of trying to choose a SRS of size two is to flip a coin and choose one of the men, and then to flip again and choose one of the women. Then each individual has a probability of 1/2 of being chosen, but this method chooses the four heterosexual couples with equal probability and chooses the two same-sex couples with zero probability -- it is not a SRS!
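The four-student example is small enough to enumerate completely. This is a minimal sketch assuming the coin-flip method exactly as described; the labels `M1`, `M2`, `W1`, `W2` are made up for illustration.

```python
from fractions import Fraction
from itertools import combinations, product

men, women = ["M1", "M2"], ["W1", "W2"]
people = men + women

# Coin-flip method: one fair flip picks a man, a second fair flip picks a
# woman, so each man-woman pair arises with probability 1/2 * 1/2 = 1/4.
method = {frozenset((m, w)): Fraction(1, 4) for m, w in product(men, women)}

# Chance that each individual ends up in the sample under this method.
inclusion = {
    p: sum(prob for pair, prob in method.items() if p in pair) for p in people
}

# Chance of every possible pair; a true SRS of size two would give each 1/6.
pair_probs = {
    frozenset(c): method.get(frozenset(c), Fraction(0))
    for c in combinations(people, 2)
}

for pair, prob in pair_probs.items():
    print(sorted(pair), prob)
```

Each person has inclusion probability 1/2, but the six possible pairs get probabilities 1/4, 1/4, 1/4, 1/4, 0, 0 rather than 1/6 each.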

One other issue:

OK, one more thing I think confuses people and which needs to be mentioned here is that calling a sample a SRS is grammatically weird. That is, it looks like a property of the sample, but it is not! It is a property of the method used to create the sample. You cannot tell by looking only at the sample if it is an SRS, despite the grammar which seems to have "simple random" modify "sample".

This is a common source of confusion for my students, so I think it needs to be mentioned somewhere....

So my suggestions are:

The whole page gets a change of emphasis, as suggested above.
Some mention of the distinction I'm making here gets pointed out (maybe with the above example?).
Some mention is made of why this is important in (much of) the rest of statistics.
Maybe some mention of the connection with stratified random samples (particularly if the above example, which is of course a stratified random sample, is included).
Some mention is made of the fact that "SRS" is a grammatically weird definition in that it is really describing a sample chosen according to a process which has some property, and is not a property that can be checked by looking at the sample itself.

What do you think?

Should I make a trial edit with all of the above and people who are following this page can take a look?

(Note: I changed a version of this page to have at least a mention of the correct definition (in terms of equiprobability of sets of size $n$, not individuals) a few years ago. In the intervening years, the page has gone through quite a few changes, and I haven't been following it.)

Sampling students with replacement

The example problem says "Consider a school with 1000 students, divided equally into boys and girls, and suppose that a researcher wants to select 100 of them for further study."

The calculation of probability P = 1 - (1 - 1/1000)^100 of a student (ultimately) being selected when using selection with replacement appears to be incorrect because it fails to account for the fact that the selection process must continue until 100 distinct students have actually been obtained for the sample population. If during the process a previously drawn name is drawn again, the name is returned to the pool and another drawing is made as if the duplicate drawing never occurred. At the end of the process, which may take more than 100 drawings to complete, there must be 100 distinct students in the resulting sample population. The calculation given makes sense only if 100 names are to be selected (possibly with duplicates, due to replacement), and whatever resulting set of students is obtained (possibly fewer than 100, if any duplicate names were drawn) is taken as the sample population.

To take an extreme (unlikely) example: suppose Sally Smith's name is chosen 100 times in a row. The calculation given only makes sense if the resulting sample population in this case would be the single student Sally Smith. This procedure won't help the researcher and hence won't solve the example problem. Blargoner (talk) 05:38, 23 November 2015 (UTC)
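The two readings of "with replacement" discussed above give different numbers, which can be compared directly. This is only a sketch: the first value is the formula quoted from the article; the second assumes the redraw-until-100-distinct procedure described in this comment, where by symmetry every student is equally likely to be among the 100 distinct names.

```python
from fractions import Fraction

N, n = 1000, 100  # population size and sample size from the example problem

# Reading 1: exactly 100 draws with replacement, duplicates kept.
# Chance that a given student is drawn at least once.
p_fixed_draws = 1 - (1 - Fraction(1, N)) ** n

# Reading 2 (assumed): duplicates are ignored and drawing continues until
# 100 distinct students are obtained. By symmetry each student is included
# with probability n / N, the same as sampling without replacement.
p_until_distinct = Fraction(n, N)

print(float(p_fixed_draws), float(p_until_distinct))
```

The first value comes out slightly below 0.1 (about 0.095), while the second is exactly 0.1, so the formula in the article matches only the fixed-100-draws reading.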