2% sampling should be fine for a UPI column. Sampling at low percentages is fairly accurate for any column that is unique or nearly unique.
Unfortunately, there is no way I know of to derive the perfect sampling percent to use. When the optimizer has control (when you code in USING SYSTEM SAMPLE, or USING SAMPLE), it attempts to set the optimal sampling percent itself at that point in time. It does this by looking at the detail within past full statistics collections and trying to detect patterns within the statistic that it might need to account for, like skew. I cannot tell you first-hand that I have studied the choices the optimizer makes in this regard. I have not. But in my experience the optimizer tends to be conservative in its decisions and is probably doing a better job than either you or I could do in picking a good sampling percent.
But as you have found out, the optimizer will often not use sampling at all when you specify USING SYSTEM SAMPLE, because it doesn't have enough background information to be confident in selecting a good sampling percent, or doesn't believe the statistic is suitable for sampling.
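For reference, here is a rough sketch of the two approaches side by side. The database, table, and column names (sales_db.orders, order_id) are made up for illustration, and the syntax assumes a release that supports the USING clause on COLLECT STATISTICS (14.0 or later), so please check it against the documentation for your release.

```sql
-- Hypothetical names: sales_db.orders with a UPI on order_id.

-- Explicit sampling: you choose the percent yourself.
COLLECT STATISTICS
  USING SAMPLE 2 PERCENT
  COLUMN (order_id)
ON sales_db.orders;

-- System sampling: the optimizer decides whether to sample and at what
-- percent. It may still choose a full collection.
COLLECT STATISTICS
  USING SYSTEM SAMPLE
  COLUMN (order_id)
ON sales_db.orders;

-- Inspect the stored statistic afterwards to see what was actually collected.
SHOW STATISTICS VALUES COLUMN (order_id) ON sales_db.orders;
```

Comparing the SHOW STATISTICS output after each collection is one way to check whether the optimizer actually sampled or fell back to a full scan.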
Thanks, -Carrie