Teradata Developer Exchange – Blog activity for carrie's blog

Statistics collection recommendations – Teradata 14.0 - comment by escueta


Hi,
I'm just new to Teradata and my background is Oracle. When collecting Statistics in Teradata, after setting up a table for stats collection, do I need to have another script to refresh the stats on a daily basis?
Thanks
Chris


New opportunities for statistics collection in Teradata 14.0 - comment by carrie


Hi Nazy,
 
On 14.10 (and 14.0 as well) you no longer need to collect statistics on PARTITION, unless your table is a partitioned table.  For partitioned tables you should definitely collect on PARTITION.
 
You may not need to consciously replace PARTITION stats on non-partitioned tables with summary stats, as summary stats will automatically be collected at the time any column/index on the table has its stats collected.  Collection on PARTITION will update summary stats as well.  But if you have not collected any stats on the table for a while, you can explicitly collect just summary stats for the table by issuing the statement below.  It's a very fast operation, and a good idea to use after loading activity on a table when you don't have time for full stats recollections.
 
COLLECT SUMMARY STATISTICS ON table-name;
 
The asterisk in HELP STATS tells you the number of rows in the table on which the statistics were taken.  It will change as summary stats are refreshed, even if other statistics on the table have not been refreshed.  So you could say that number is a result of summary stats.  That's the correct way to look at it.
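 
To see that row count, you can check the HELP STATISTICS output for the table.  A minimal sketch (the table name is hypothetical):
 
HELP STATISTICS Sales_Table;     -- the row flagged with an asterisk reports the table row count captured by summary stats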
 
Thanks, -Carrie

Statistics collection recommendations – Teradata 14.0 - comment by carrie


Hi Chris,
 
You don't really "set a table up" for stats collection; you simply decide which stats you wish to collect, build a script with COLLECT STATISTICS statements for those columns or indexes, and then run the script.   You can rerun the collect stats statements (using the same syntax as the initial collection if you wish) whenever you want to recollect stats for those columns or indexes.  Sometimes that is weekly, sometimes monthly, in some cases daily.  It depends on how much the table has grown since the previous collection.  
 
If you want to recollect statistics on all statistics that exist on a given table at the same time, you can simply issue a table-level statistics collection statement.
 
When performed individually, the statements to initially collect stats and to recollect stats are the same.  So you don't really need another script, but you could have one if you only wanted to recollect a subset of the stats.
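 
As a minimal sketch (table and column names are hypothetical), the same statement serves for the initial collection and for later recollections, and a table-level statement recollects everything already defined on the table:
 
COLLECT STATISTICS COLUMN (Order_Date) ON Sales_Table;   -- initial collection; rerun as-is to recollect
COLLECT STATISTICS ON Sales_Table;                        -- table-level recollection of all existing stats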
 
In 14.10 you can put your stats recollections under the control of the Automated Statistics Management feature.  I have another blog posting that describes that a little bit.  If you are using the Automated Stats Manager feature, you never have to issue scripts to recollect.
 
Thanks, -Carrie

More on ResUsageSAWT When Collection and Logging Rates Differ - comment by LUCAS


Carrie,
what about AWT analysis in V14.10?
The CollectIntervals column does not exist any longer, so I wonder whether another column supersedes the old one,
or whether the difference between Log Time and CollectTime has disappeared.
Is an updated document about AWTs for V14.10 available? I couldn't find one.
I'm now trying to produce graphical views of InUseMax across days, with a bar per node: does an average InUseMax make sense?
Thanks,
Pierre

More on ResUsageSAWT When Collection and Logging Rates Differ - comment by carrie


Hi Pierre,
 
 
 
Starting in 14.0, there is no longer a CollectIntervals column in any ResUsage table.  In the ResUsageSAWT table, MailBoxDepth and WorkTypeInUse00-15 are now track fields and no longer need to be divided by the CollectIntervals column.  The contents of those fields represent a snapshot taken at the end of the logging interval.  In prior releases the contents of those fields represented a sum of snapshots taken at the end of each collect interval, so you had to divide by CollectIntervals in order not to get an inflated number.  You are no longer required to do that.
 
The 14.0  AMP Worker Task and ResUsage Monitoring orange book (available at Teradata at your service) documents this in Chapter 8.
 
When it comes to AMP worker task monitoring, I usually prefer to look at the max of InuseMax rather than the average, so I can see what the worst-case AWT usage is on any one AMP (either on the node or system-wide).  The worst case is usually more important than the average, because if one AMP runs out of AWTs, it will impact all queries doing all-AMP operations.  In that regard, max of InuseMax is more actionable than the average.  It will also allow you to more easily identify skewed processing on a node (where one or more AMPs on the node are holding on to AWTs longer than AMPs on other nodes).  But it really depends on what you want to get out of the monitoring.   There is no one right way to use this information.
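 
If it helps, here is a rough sketch of that kind of query against DBC.ResUsageSAWT (assuming the standard TheDate, NodeID, and InuseMax columns; adjust to your local view names):
 
SELECT TheDate, NodeID, MAX(InuseMax) AS WorstCaseAWTs
FROM DBC.ResUsageSAWT
GROUP BY TheDate, NodeID
ORDER BY TheDate, NodeID;     -- worst-case AWT usage on any one AMP of the node, per day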
 
Thanks, -Carrie

Reserving AMP Worker Tasks? Don’t let the Parameters Confuse You! - comment by b2s


Hi Carrie,

 If you are using TASM, then the expedited status is automatically given to a workload that has a tactical enforcement priority

We have two versions of Teradata, i.e. Teradata 12 and Teradata 14.
1) In Teradata 14, we do have WDs that have a tactical enforcement priority. So far I have come across WorkTypeMax08 having a max of 6. So, theoretically, this means the actual AWTs reserved are 14 (6+6+2)... is this right?
 I'm aware of the limit of 20 AWTs for expedited AWTs; however, will this apply to the AWTs automatically reserved because of the tactical enforcement priority?
2) In Teradata 12, even though we have WDs that have a tactical enforcement priority, I'm unable to see WorkTypeMax08 go beyond 0. Am I missing something?
 

 Bottom line, setting the limit on MsgWorkEight AWTs anywhere between 30 and 60 is probably reasonable. If you're not sure how to set it, make it the same as the MsgWorkNew limit, at 50.

As you said earlier, the maximum expedited AWTs can be 20 (9+9+2), so what is the use of setting even 30?
If indeed there is a need for throttling, shouldn't Max Expedited AWTs be less than 9?
 
 

Reserving AMP Worker Tasks? Don’t let the Parameters Confuse You! - comment by carrie


For TASM in Teradata 12 it is not enough to set the enforcement priority to tactical.  You must explicitly expedite the allocation group.  This is done in the GUI screen that shows all the workload management setup, including RP, AG, AG relative weight, and WD.   If an AG/WD has a tactical enforcement priority, it will have a checkbox under a column named "expedite".  If you check that column, then requests within that AG/WD will use the work08/09 work types.
 
As you have discovered, that is not necessary in Teradata 14.  Once you mark the AG/WD as tactical, it is automatically expedited and will use the work08 work type. 
 
WorkTypeMax08 has nothing to do with the number of reserved AWTs that you select.  WorkTypeMax08 reports the maximum number of tasks that were using work08 within that logging interval, but it does not tell you how many of those tasks were using AWTs from the reserve pool for expedited queries, or how many were using AWTs from the unassigned pool. 
 
You have to explicitly select a reserve count in order for work08 tasks to use a reserved AWT.  When you select a reserve count, that is the number of AWTs that will go into a special reserve pool.  Work08 tasks can draw AWTs from that pool, but if that pool is empty they will try to draw AWTs out of the unassigned pool. 
 
You can set the reserve count at 2 and still have 10 or 20 or 30 tasks running in the work08 work type.  Those are two different things.  The fact that WorkTypeMax08 = 6 does not tell you what the tactical reserve count is.  It could be that there are zero tactical reserves, but if there are expedited workloads that are doing work, you will see usage in work08 even with no reserves defined.
 
The limit of 20 you mention is for the number of reserved AWTs you are able to define.  But you can have a greater number of tasks running that use the work08 work type.   If that happens, all of them will not be able to count on having a reserved AWT waiting for them and will attempt to use an AWT from the unassigned pool.
 
Max Expedited AWTs puts a limit on how many tasks will be allowed to start up using the work08 work type.  The default is 50, because there is a similar limit of 50 on how many tasks can be active using the worknew work type.   This number of 50 has no relationship to the number of reserved AWTs you can select.  It only controls how many tasks can use the work08 work type at one time, whether or not they are able to use a reserved AWT for their work.
 
Thanks, -Carrie

Don’t confuse SLES11 Virtual Partitions with SLES10 Resource Partitions - blog entry by carrie


Because they look like just another group of workloads, you might think that SLES11 virtual partitions are the same as SLES10 resource partitions.  I’m here to tell you that is not the case.  They have quite different capabilities and purposes.  So don’t fall victim to retro-conventions and old-school habits that might hold you back from the full value of new technology.  Start using SLES11 with fresh eyes and brand new attitudes.  Begin at the virtual partition level.

This content is relevant to EDW platforms only.

Background on SLES10 Resource Partitions  

Use of multiple resource partitions (RP) in SLES10 originated due to restrictions in the early days on how many different priorities each RP could support.  The original Teradata priority scheduler had four external performance groups and four internal performance groups contained in a single default RP.  Even today, the original RP (RP 0, the default RP) usually supports no more than four default priorities of $L, $M, $H, and $R.

Teradata V2R5 brought the ability to add resource partitions, but even then each new resource partition could only support 4 different external performance groups, similar to how RP 0 worked.  This forced users to branch out to more RPs if they had a greater number of priority differences.  So it was common to see 4 to 5 RPs in use, and some users complained that this wasn't enough to provide homes for the growing mix of priorities they were trying to support.

In V2R6, priority scheduler was enhanced to allow more than 4 priority groupings in any RP.  At that time we encouraged users to consolidate all their performance groups into three standard partitions for ease of management:  Default, Standard, and Tactical.  Generally, a Tactical RP was needed to give special protection to short tactical queries.  Some internal work still ran in RP 0 so it was recommended that you avoid assigning user work there, which necessitated that a “Standard” RP be set up to manage all of the non-tactical performance groups.   In SLES10 many users embraced this three-RP approach, while others went their own way with subject-area divisions or priority-based divisions among multiple RPs (creating a Batch RP and a User RP, for example).

Here are four rationales for the multiple resource partition usage patterns that are in heavy rotation with SLES10 today.  For the most part they came into being due to restrictions within the SLES10 priority scheduler which encouraged out-of-the-box use of multiple RPs on EDW platforms, whether you thought you needed them or not.

  1. Internal work: Some sensitive internal work ran in RP 0, so the recommendation was to avoid putting user work there.
  2. Protection for tactical work by isolating it into its own RP with a high RP weight.  A high RP weight contributed to a more stable relative weight (allocation of resources) for tactical workloads.
  3. Desire to more easily swap priorities between load and query work by different times of day (by making one change at the RP-level instead of multiple changes at the level of the allocation group).   These RP-level changes often included the desire to add RP-level CPU limits on RPs supporting resource-intensive work, in order to protect tactical queries at certain times of the day.
  4. Sharing unused resources with an RP.   Some sites liked putting all work from one application type in the same RP so that if one of the allocation groups was idle, the other allocation groups of that type would get their relative weight points.  The SLES10 relative weight calculation benefits groups within the same RP, such that they share unused resources among themselves first, before those resources are made available to allocation groups in other RPs.

Very few examples of using resource partitions for business-unit divisions have been in evidence among Teradata sites on SLES10, partly because only four usable RPs were available and partly because the SLES10 technology has not been all-encompassing enough to support the degree of separation required.

What Has Changed with SLES11?

A lot.

First, let’s address the four key motives (or rationales) users have had for spreading workloads and performance groups across multiple RPs in SLES10, but looking at them from the SLES11 perspective.

  1. Internal work: In SLES11 all internal work has been moved up in the priority hierarchy above the virtual partition level, where it can get all of the resource it needs off the top, without the user having to be aware or considerate of where that internal work is running.   There is no longer a need to set up additional partitions to avoid impacting internal work.    
  2. Protection for tactical work: The Tactical tier in SLES11 is intended to be (and is) a turbo-powered location in which to place tactical queries, where response time expectations can be consistently met without taking extraordinary steps.  The Tactical tier in SLES11 is first in line when it comes to resource allocation, right after operating system and internal database tasks.  This eliminates the need for a special partition solely for tactical work, or as a means of applying resource limits on the non-tactical work.
  3. Desire to more easily swap priorities: There is something to be said for grouping workloads that need priority changes at similar times into a single partition, because then you only have to make the change in one place.  But that is a fairly minor issue on either SLES10 or SLES11 with the advent of TASM planned environments.   You’re not saving that much during TASM setup to indicate a change in one place (a virtual partition) vs. making a change in several places (multiple workloads) when those changes are going to be happening automatically for you at run time each day.  There is no repetitive action that needs to be taken by the administrator once a new planned environment has been created.  New planned environments can automatically implement new definitions, with lower priorities for some of the workloads and higher for others, no matter how many workloads are involved.     

Applying higher-level (partition-level) resource limits on a group of workloads, as we have seen at some SLES10 sites, is much less likely to be needed in SLES11 (I personally believe it will not be needed at all).  That is because the accounting in the SLES11 priority scheduler is more accurate, giving SLES11 the ability to deliver exactly what is specified.  No more, no less.  There is no longer a performance-protection need for resource limits or an over-/under-allocation of weight at the partition level.  And because that need has gone away, the argument in favor of separate partitions for performance benefit is less compelling.

  4. Sharing unused resources: Sharing unused resources among a small set of selected workloads is available on each SLG Tier as it exists within a single virtual partition in SLES11.  If an SLG Tier 1 workload is idle, the other workloads placed on SLG Tier 1 will be able to share its allocation before those resources are made available to other workloads lower in the hierarchy.  The order of sharing of unused resources is guided by the priority hierarchy in SLES11 and does not require multiple partitions to implement. 

The Intent and Vision of SLES11 Virtual Partitions

A virtual partition in SLES11 is a self-contained microcosm.  It has a place for very high priority tactical work in the Tactical tier.  It has many places in the SLG Tiers for critical, time dependent work across all applications ranging from the very simple to the more complex.   And at the base of its structure in Timeshare it can accommodate large numbers of different workloads submitting resource-intensive or background work at different access levels, including load jobs, sandbox applications and long-running queries. Within its self-sufficient world, priorities at the workload level can be changed multiple times every day if you wish, using planned environments in the TASM state matrix.

If you’re on an EDW platform with SLES11, you are offered multiple virtual partitions, but their intent is different from SLES10 resource partitions.   Virtual partitions were implemented in order to provide a capability that SLES10 was not well suited to deliver:  Supporting differences in resource availability across multiple business units, or distinct geographic areas, or a collection of tenants.

Virtual partitions are there to provide a method of slicing up available resources among key business divisions of the company on the same hardware platform.  Once you get on SLES11, if you keep moving in a direction that only made sense in SLES10, you risk losing the ability to sustain distinct business units in the future.  And you’ll be less in harmony with TASM/SLES11 enhancements going forward.

New capabilities around virtual partitions, such as virtual partition throttles in 15.0, and other similar enhancements being planned, are all being put in place with the same consistent vision of what a virtual partition is.  Keep in step with these enhancements and position yourself to use them fully, by letting go of previous conventions and embracing the new world of SLES11 possibilities.  


Reserving AMP Worker Tasks? Don’t let the Parameters Confuse You! - comment by ThiagoBagietto


Good night!
 
How do I download Teradata Database Express to install it?
 
Thanks

Changes to the TASM AMP Worker Task Event in Teradata 13.10 - comment by srinivas486


Carrie,
As per your last comment, "Usually, each AMP will use from 1 to 2 tasks to support its work on behalf of one request."
Suppose we issue a multi-statement request involving selects from different tables, or a join query involving different tables. My assumption regarding query execution is that the operations can be performed in parallel: even though it is a single request, AMPs can make use of more AWTs (more than 1 to 2) to perform the parallel operations.
I think the number of AWTs required to run a user request depends on how many parallel operation steps the query/request can be broken into.
Please correct me if I am wrong and help me understand.
 

Changes to the TASM AMP Worker Task Event in Teradata 13.10 - comment by carrie


Lakshimi,
 
You are correct that parallel steps can result in more than 1 or 2 AMP worker tasks being used by a single request, particularly if it is a multistatement request.    But even with multiple parallel steps in the plan, most requests will be limited to 4 AWTs at the same time. 
 
This is because of limitations related to other internal structures (called channels) that manage cross-AMP communication within a request.   Things like step-completion coordination or abort coordination are performed by channels behind the scenes.  So you could have a plan with 8 parallel steps, but at run time they will not all execute at the same time.   If the first 4 steps only use a single AWT each, then all 4 could run at the same time.  But if the first two required 2 AWTs each, then only those two would actually run at the same time. 
 
Even if you had two parallel steps, each doing row redistribution and therefore requiring 2 AWTs, so that 4 AWTs were active at a time, usually one of the two parallel steps runs much longer than the other, so the actual time that 4 AWTs are held by the request might be shorter than you would expect.  However, if there are other parallel steps to be run within that high-level step, they will run, as long as the combination of sub-steps doesn't require more than 4 AWTs.
 
Thanks, -Carrie

Tips on using Sampled Stats - comment by Santanu84


Hi Carrie

Your blog is really helpful. I have a question: do sampled stats work with aggregate functions as well? If not, will a full stats collection be required? Also, if a table has almost 5 million rows and a NUPI that I know is slightly non-unique (in other words, mostly unique with a small amount of non-uniqueness), will increasing the percentage for sampled stats be helpful?
Thanks
Santanu

Changes to the TASM AMP Worker Task Event in Teradata 13.10 - comment by srinivas486

$
0
0

Thanks for the information provided, Carrie. It really helps me understand AWTs better.

Tips on using Sampled Stats - comment by carrie


Santanu,
 
If you are asking whether you can use sampling when you collect statistics on an expression (whether aggregate or not), the answer is yes; sampling for stats on expressions is supported.   Statistics collection on expressions is covered in the 14.10 orange book on Statistics Enhancements.
 
In terms of NUPIs that are slightly non-unique, I would refer you to the blog posting on Statistics  Collection Recommendations for 14.0, where it makes the point that if NUPIs are NOT USED FOR JOINING, and their distribution of values is fairly even, you could rely on random AMP samples.  You could also probably try relying on sampling.   I would guess that slightly non-unique NUPIs would be OK with sampling.
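 
As a minimal sketch of a sampled collection on such a NUPI (the table name, column name, and percentage are hypothetical, assuming the 14.0-style USING SAMPLE syntax):
 
COLLECT STATISTICS USING SAMPLE 10 PERCENT COLUMN (Cust_NUPI) ON Customer_Table;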
 
That blog posting does differentiate NUPIs that are USED FOR JOINING, and says this:
 
NUPIs that are used in join steps in the absence of collected statistics are assumed to be 75% unique, and the number of distinct values in the table is derived from that.  A NUPI that is far off from being 75% unique (for example, it’s 90% unique, or on the other side, it’s 60% unique or less) will benefit from having statistics collected, including a NUPI composed of multiple columns regardless of the length of the concatenated values.  However, if it is close to being 75% unique, then random AMP samples are adequate.  To determine what the uniqueness of a NUPI is before collecting statistics, you can issue this SQL statement:
 
EXPLAIN SELECT DISTINCT nupi-column FROM table;
 
Best to check that plans are OK if you go to very low sampling percentages.  
 
Thanks, -Carrie


How Resources are Shared in the SLES 11 Priority Scheduler - comment by carrie


Virtual partition hard limits and workload level hard limits in SLES11 will be available at the same time a WM COD is available.  All three levels of hard limits are bundled into a single feature in SLES11.
 
Thanks, -Carrie

Expediting Express Requests - comment by rasikatyagi


Hi All,
We are using Teradata Studio Express Version: 14.10.01.201310271204.
 
When we execute a single statement, it pops up the message "Result set contains at or over 2,000 rows. Cancel at 2,000 rows in accordance with the settings?" Is there some setting by which the message box itself can tell us the total number of rows being selected, i.e. that 2,000 rows will be kept out of how many?
Rasika

Expediting Express Requests - comment by carrie


Rasika,
 
I have never used Teradata Studio Express.  I think you may have posted this question on the wrong blog.
 
Sorry I cannot help you.    You could try posting your question on Teradata Forum.
 
Thanks, -Carrie

Expediting Express Requests - comment by vinodraj


Hi Rasika,
'Max Display row Count' is set to 2000 by default in the preferences.
You can increase the limit accordingly. Check the 'Max Display row Count' setting under the following location:
Preference -> Teradata Datatools Preference -> Result set Viewer Preference
Cheers,
Vinod
 
 

How Resources are Shared in the SLES 11 Priority Scheduler - comment by YL185019


Hi Carrie,
Thank you very much.
Best Regards,
Yanmei
