dm-hse: Exercise (insignificance of significance)

Use Random Sample to prepare data with 1000 data instances. Prepare 100 variables with normal distribution, and a random variable whose value is either 0 or 1, with equal probabilities. (Which distribution? Hint: not binomial.) Can any of the 100 variables be used to predict the value of the binary variable? In other words: if we split our data according to the binary variable, can we find some variables whose means in these two groups are significantly different? Use one of the widgets you know from Tuesday to find and count such variables. You can repeat this experiment several times by generating new random data sets.
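If you want to check your widget results outside Orange, the same experiment can be sketched in a few lines of Python. This is only a rough sketch under assumptions that are not part of the exercise: it uses NumPy and SciPy, a two-sample t-test as the test of difference in means, and a threshold of 0.05. The names `X`, `y` and `n_significant` are made up for illustration.

```python
import numpy as np
from scipy import stats

# Assumption: a fixed seed, only so that the run is repeatable.
# Change it (or remove it) to generate a new random data set.
rng = np.random.default_rng(0)

n_rows, n_vars = 1000, 100

# 100 independent, normally distributed variables
X = rng.normal(size=(n_rows, n_vars))

# A binary variable that is 0 or 1 with equal probabilities,
# generated independently of X
y = rng.integers(0, 2, size=n_rows)

# Split the data by the binary variable and compare the group means
# of every variable with a two-sample t-test (one test per column)
t_stat, p_values = stats.ttest_ind(X[y == 0], X[y == 1], axis=0)

# Count the variables that look "significant" at the assumed 0.05 level
n_significant = int((p_values < 0.05).sum())
print(f"{n_significant} of {n_vars} variables have p < 0.05")
```

Since the binary variable is generated independently of all others, every variable counted here is a false positive. Rerun with different seeds and compare the counts with what you got in Orange.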
Try replicating what you've seen in the lecture. We had 45 binary variables (equal outcome probabilities) and the sample size was 100. We computed correlations and plotted the distribution of correlations (with a proper connection between the Correlations and Distributions widgets). We also used Sieve + Score Combinations to find statistically significantly correlated pairs, and observed their p-values.
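The lecture experiment can also be approximated in Python, for comparison with what the widgets show. The sketch below is not what the Orange widgets do internally; it assumes Pearson correlation between the 0/1 columns, the usual t-based p-value for a correlation coefficient, and a threshold of 0.05. All names are illustrative.

```python
import numpy as np
from scipy import stats

# Assumption: a fixed seed for repeatability; change it for a new data set.
rng = np.random.default_rng(1)

n_rows, n_vars = 100, 45

# 45 independent binary variables with equal outcome probabilities
B = rng.integers(0, 2, size=(n_rows, n_vars))

# Pearson correlations between all pairs of columns
R = np.corrcoef(B, rowvar=False)

# Keep each pair once (upper triangle without the diagonal)
rows, cols = np.triu_indices(n_vars, k=1)
r = R[rows, cols]

# Two-sided p-value for each correlation, from the t distribution
# with n - 2 degrees of freedom
t_stat = r * np.sqrt((n_rows - 2) / (1 - r**2))
p_values = 2 * stats.t.sf(np.abs(t_stat), df=n_rows - 2)

n_pairs = r.size
n_significant = int((p_values < 0.05).sum())
print(f"{n_significant} of {n_pairs} pairs have p < 0.05")
print(f"largest absolute correlation: {np.abs(r).max():.2f}")
```

With 45 variables there are 990 pairs to test, so a noticeable number of them fall below 0.05 even though all variables are independent by construction. A histogram of `r` corresponds roughly to the distribution of correlations plotted in the lecture.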

Connecting Correlations and Sieve Diagram as shown here, and then clicking rows in Correlations and observing what happens in Sieve may also be interesting.

Last modified: Thursday, 3 March 2022, 14:50