Teradata Developer Exchange - Blog activity for carrie's blog

Interpreting DBQL DelayTime in Teradata 13.10 - comment by sm4073


Hi Carrie,
As always I value your postings, articles, and views. Based on the conclusion above, I wrote a few simple scripts and found incorrect results. A deep dive into the query showed that inside DBC.DBQLogTbl the column data types don't match, and as such my report was a bit off from the expected results. Your thoughts on this, please. There is no major impact because of this, but I would like to know the reason behind it.
DelayTime FLOAT FORMAT '----,---,---,---,--9.999',   
WDDelayTime INTEGER FORMAT '--,---,---,--9',
 
Ref:
If your query is only delayed by a WD throttle: 

  • WDDelayTime = Time actually delayed
  • DelayTime = The same time as in WDDelayTime (when I compare them, the results do not match because of the decimal values)

 If your query is only delayed by a system throttle:

  • WDDelayTime will be NULL
  • DelayTime = Time actually in the delay queue

If your query is delayed by both:

  • WDDelayTime = WD throttle delay time
  • DelayTime = Wallclock time delayed by both, does not identify system throttle contribution

Interpreting DBQL DelayTime in Teradata 13.10 - comment by sm4073


My earlier posting (data type differences in DBC.DBQLogTbl) reported findings on TD 14.10... Thank you. 
-- Murali

Interpreting DBQL DelayTime in Teradata 13.10 - comment by carrie


Murali,
 
You are correct that DelayTime and WDDelayTime are formatted differently. Starting in 14.0, DelayTime was made a FLOAT so a user can see microseconds.
 
WDDelayTime is still an integer, but in 14.0 it really doesn’t matter anymore.
 
That is because starting in 14.0 there is a single delay queue for all throttles. The time a request is held in the delay queue due to different objects (system throttles vs. workload throttles, etc.) can no longer be differentiated. So all you see as a user is a single delay time that accounts for all types of throttles. If TASM workloads are being used, WDDelayTime will essentially be the same as DelayTime, except that the two are reported using different formats. So you can just look at DelayTime, and not be concerned with comparing the two fields.
 
In 15.0, WDDelayTime will be removed from the DBQLogTbl, so then it will be simpler as there will only be a single field to examine.
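If you want to line the two fields up while ignoring DelayTime's fractional seconds, a quick DBQL query along these lines works (the CAST is illustrative, simply reducing DelayTime to whole seconds):

SELECT QueryID,
       DelayTime,                          /* FLOAT in 14.0, carries fractional seconds */
       WDDelayTime,                        /* INTEGER, whole seconds; removed in 15.0 */
       CAST(DelayTime AS INTEGER) AS DelayWholeSecs
FROM DBC.DBQLogTbl
WHERE DelayTime IS NOT NULL;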
 
Thanks, -Carrie

Tips on using Sampled Stats - comment by Santanu84


Hi Carrie
Thanks for your reply. Sorry for my late response. Your explanation is really helpful.
However, I have another question. For example, suppose I have collected a sampled stat on a table and am now going to perform an aggregation on a column from that table. Will the sampled stat play any part in that operation, or does it not apply in such a situation?
Hope, I am able to explain it properly. Please let me know.
Thanks
Santanu

Tips on using Sampled Stats - comment by carrie


Santanu,
 
You cannot actually apply USING SAMPLE to summary stats, which are the table-level statistics new in 14.0.  I'm guessing you meant you were collecting sampled stats on a column or index of the table.
 
When you attach USING SAMPLE to a collect statistics statement and then collect stats on that column, the resulting histogram will be treated the same way as if full stats had been collected. The sampling option only has an impact during the collection of statistics, not on how those stats are used. The optimizer will use the number of distinct values in the histogram to estimate the row count that will come out of an aggregation step, whether sampled or full stats have been collected on the GROUP BY column(s).
 
Whether or not sampled stats are as beneficial as collecting full stats in your case will depend on the degree of skew in the GROUP BY column.  You could run an explain and look at the query plan row count estimate with sampled stats, then collect full stats and see if the estimate has changed very much.  
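For example, a rough version of that comparison (table and column names here are hypothetical):

COLLECT STATISTICS USING SAMPLE ON retail.sales COLUMN (store_id);
EXPLAIN SELECT store_id, SUM(sale_amt)
FROM retail.sales GROUP BY store_id;      /* note the estimated row count on the aggregation step */

COLLECT STATISTICS ON retail.sales COLUMN (store_id);      /* recollect as full stats */
EXPLAIN SELECT store_id, SUM(sale_amt)
FROM retail.sales GROUP BY store_id;      /* compare this estimate against the sampled-stats plan */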
 
Thanks, -Carrie

Tips on using Sampled Stats - comment by Santanu84


Hi Carrie
Thanks for your reply. I will follow your suggestions and try to explore more. I may get back to you, in case guidance is required. :-)
Thanks
Santanu

Workload Management with User Defined Functions and Table Operators - blog entry by carrie


How does TASM enforce rules such as classification and workload exceptions against User Defined Functions?  What about table functions and table operators, some of which do their work outside of Teradata?  How far can you rely on TASM classifications using estimated processing time in these cases?  Will there be accurate resource usage numbers reported to support workload exceptions on CPU or I/O?

These are some of the questions that need answering if you are extending the use of simple or more complex user-defined functions on your platform.

Simple user-defined functions (UDFs) are database objects that can extend the capability of normal SQL within the database, a row at a time. Table functions and table operators are special, more complex types of UDFs that manipulate multiple rows and can extend outside the Teradata Database. Once compiled, a UDF can be referenced in SQL statements for activities such as enforcing business rules, reading a queue file, aiding in the transformation of data, or accessing foreign servers such as Hadoop.

Let’s consider straight-forward (scalar) UDFs first, then table functions and table operators.

Scalar UDFs and Workload Management

A scalar UDF is the simplest form of a UDF.  It is referenced like a column and operates on the values of a single row. Scalar UDFs are ideal for managing events and standardizing operations inside of the Teradata Database. Here are a few characteristics of scalar UDFs as they relate to workload management.
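As a minimal sketch, assuming a hypothetical C function and table, the definition and the per-row invocation look like this:

CREATE FUNCTION taxed_price (price DECIMAL(8,2))
RETURNS DECIMAL(8,2)
LANGUAGE C
NO SQL
PARAMETER STYLE TD_GENERAL
EXTERNAL NAME 'CS!taxed_price!taxed_price.c';    /* compiled from a local C source file */

SELECT item_id, taxed_price(price) AS price_with_tax    /* evaluated once per row, like a column */
FROM retail.sales;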

Use of AMP Worker Task

AMP worker tasks (AWTs) are required to accomplish work that is sent to the AMPs. However, whether or not AWTs will be required by a UDF depends on which mode it is operating in (a mode-switch example follows this list):

  • Protected mode:  If the UDF is running in protected mode it uses a separate process that is set up for the purpose of executing the UDF.  When the step that contains the UDF is executed, the UDF running in the AMP worker task (AWT) will grab the UDF server process and execute the UDF within that context.  When the UDF completes its processing for that row, it releases the protected mode server process to be used for other transactions that have a UDF.  
  • Non-protected mode:  When not in protected mode, the UDF is running in the context of the AWT being used by the query step.  There is no additional AWT involved. 
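A newly created UDF runs in protected mode by default; once it has been tested, the mode can be switched with ALTER FUNCTION (continuing the hypothetical function from the sketch above):

ALTER FUNCTION taxed_price EXECUTE NOT PROTECTED;    /* run inside the query step's own AWT */
ALTER FUNCTION taxed_price EXECUTE PROTECTED;        /* revert to the separate UDF server process */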

Optimizer Estimates for Classification

For scalar UDFs there is a fixed default cost established by the optimizer: the cost to run the UDF for each row in the table. This cost is adjusted depending on the number of rows in the table and other predicates. So you will see an estimated processing time for steps that include non-table-function UDFs, based on these fixed values.

The optimizer will use a different default cost for UDFs running in protected mode vs. unprotected mode.  

Workload Exception Handling Based on Resource Usage

TASM looks at the transaction-in-process (TIP) table when it checks whether or not a workload exception based on resource usage has taken place.

A scalar UDF updates the TIP table on each AMP after every row it processes. Each time it finishes its work for a given row, the UDF makes an internal API call to keep a running tally of CPU and I/O. This means that if each row in an answer set causes a second of CPU time to be used executing a UDF, that second of time will be attributed to the query and will be seen by workload management when the UDF relinquishes control for that row. TASM will see an increase in CPU for the transaction over time, as the UDF processes one row, then another, then another.

If for some reason the UDF is spending a lot of CPU on one row, the CPU used against that one row will not be visible until the UDF relinquishes control of the row and updates the internal database structures.   So if a UDF were to spend a minute processing one row, or if it were caught in an internal loop, workload management would not be able to detect its CPU usage until the base SQL query gets control again.

======================== What is the TIP table? ========================

The transaction-in-process (TIP) table holds information about all the current transactions (or requests) that are in process on each AMP.  The TIP table keeps track of things like Host Number, spool usage, CPU and I/O usage, transient journal information, user ID, the start and end times of the transaction, and other detail.  Database Query Log, PMON and TASM get some of their information directly from the memory-resident TIP table, which can only be accessed using special internal APIs.   
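Since the TIP table can only be read through those internal APIs, the closest a user can get is the per-request CPU and I/O tallies that eventually surface in DBQL, assuming query logging is enabled. For example:

SELECT QueryID, UserName, AMPCPUTime, TotalIOCount, DelayTime
FROM DBC.DBQLogTbl
ORDER BY AMPCPUTime DESC;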

=========================================================================

Using Workload Management with Table Functions and Table Operators

While simple UDFs appear in the SELECT list and return a single value, table functions and table operators appear in the FROM clause and process a set of rows. The term "table function" in this discussion refers to UDFs that access multiple rows inside or outside the Teradata Database and must be called once for each row processed. The term "table operator" refers to UDFs that can access data inside Teradata or via a foreign server, but that are called only once for an entire data set. The table operator comes with a high level of flexibility when it comes to converting input into output and vice versa, which table functions lack.
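Syntactically, a table operator takes its entire input through an ON clause in the FROM position. A sketch with a hypothetical operator, parameter, and table:

SELECT *
FROM my_operator(
       ON (SELECT account_no, balance FROM retail.accounts)    /* whole input set, one invocation */
       USING threshold('100')                                  /* custom parameters passed via USING */
     ) AS d;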

Use of AMP Worker Tasks

No additional AMP worker tasks are required for the execution of either a table function or a table operator. The AWT already acquired to execute the query step invokes the function and performs any required database work. Any work performed outside of the database is outside the scope of what an AWT can do. However, AMP worker tasks that invoke the function will be held for the period of time that a table operator executes externally.

Optimizer Estimates for Classification

No cost or cardinality estimates are currently produced by the optimizer for table functions or for table operators.  Consequently, no estimated processing time is provided that TASM can use for classification purposes.  It is unknown to the optimizer how many rows are being processed by a table function/operator or the effort involved in such processing.

Here is an Explain of a query that includes a table operator that accesses a remote server.  The Explain was taken from a Teradata Database 15.0 system.  Note that each step in the Explain includes an estimated processing time, except for the part of the plan that is executed on the remote server.  That part of the Explain text is delimited by the text “BEGIN/END EXPLAIN FOR REMOTE QUERY”.

EXPLAIN SELECT
   CAST(Price AS DECIMAL (8,2))
   , mileage
   , CAST(make AS VARCHAR(20))
   , CAST(model AS VARCHAR(20))
FROM vim.cardata@sdll7940 WHERE brand='Buick';

Explanation
-------------------------------------------------------------------------------
1) First, we do an all-AMPs RETRIEVE step executing table operator
   SYSLIB.load_from_hcatalog with a condition of ("(1=1)"). The size
   of Spool 2 is estimated with low confidence to be 4 rows (85,656
   bytes). The estimated time for this step is 0.07 seconds.
2) Next, we do an all-AMPs RETRIEVE step from Spool 2 (Last Use) by
   way of an all-rows scan into Spool 3 (used to materialize view,
   derived table, table function or table operator TblOpInputSpool)
   (all_amps), which is redistributed by hash code to all AMPs. The
   size of Spool 3 is estimated with low confidence to be 4 rows
   (85,656 bytes). The estimated time for this step is 0.08 seconds.
3) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
   an all-rows scan executing table operator
   SYSLIB.load_from_hcatalog with a condition of ("cardata.BRAND =
   'Buick'") into Spool 4 (used to materialize view, derived table,
   table function or table operator cardata) (all_amps), which is
   built locally on the AMPs.
   <-- BEGIN EXPLAIN FOR REMOTE QUERY -->
   We use tdsqlh_td 15.00.00.00dev to connect to tdsqlh_hdp
   01.03.02.01dev Hive Metastore server (sdll7940.labs.teradata.com)
   on port 9083, then we retrieve and process 0 hadoop splits for
   partitions brand = "Buick" that is about 0 bytes worth of rowdata
   from remote table vim.cardata for the qualifying columns
   (price,mileage,make,model,brand) and map them to the following
   Teradata output columns.
   price DOUBLE => REAL_DT, mileage BIGINT => BIGINT_DT,
   make STRING => VARCHAR_DT, model STRING => VARCHAR_DT,
   brand STRING => VARCHAR_DT
   <-- END EXPLAIN FOR REMOTE QUERY -->
   The size of Spool 4 is estimated with low confidence to be 4 rows
   (964 bytes). The estimated time for this step is 0.08 seconds.
4) We do an all-AMPs RETRIEVE step from Spool 4 (Last Use) by way of
   an all-rows scan with a condition of ("cardata.BRAND = 'Buick'")
   into Spool 5 (group_amps), which is built locally on the AMPs.
   The size of Spool 5 is estimated with low confidence to be 4 rows
   (196 bytes). The estimated time for this step is 0.08 seconds.
5) Finally, we send out an END TRANSACTION step to all AMPs involved
   in processing the request.
-> The contents of Spool 5 are sent back to the user as the result of
   statement 1. The total estimated time is 0.31 seconds.

For more information about this Explain and to see other similar examples, see the orange book titled Teradata QueryGrid: Teradata Database-to-Hadoop.

If you base workload or throttle classification on estimated processing time, this query could classify to a workload inappropriately, because the time and effort taken by the table operator is not able to be represented within the query characteristics.

Workload Exception Handling Based on Resource Usage

Table functions do not update the TIP table with every row processed as scalar UDFs do. Rather, a buffer of 64K rows is constructed, and when the entire buffer is complete the combined resource usage is sent to the TIP table.

TASM workload exception handling looks at the TIP table to identify when a workload exception, such as CPU usage, has been met. This approach may cause a slight delay in the identification of resource usage, and the exception action may not be performed as quickly as it would be with a scalar UDF. The degree of the delay will depend on how much processing the table function performs on each row within the external data source.

In the case of table operators, which are invoked once per data source, the TIP table on the AMPs will not get any resource usage information until a buffer of 64K rows has been accessed and spooled on the AMPs.  Once the table operator has completed, all resource usage can be tracked and the TIP table on each AMP will be updated a final time.  The resource usage reported is only for the Teradata database activity.   Any activity as a result of a table operator which is external to Teradata is not able to be reported in the TIP table and will not be visible to TASM.

Use workload exceptions based on resource usage with care when the workload is supporting queries that contain table functions or table operators.

New TASM Features Related to UDFs

A new TASM feature around UDFs in Teradata Database 14.10 allows target-type classification to specify selected functions, similar to how other target classifications on things like tables or views work. This includes scalar UDFs as well as table functions and table operators.

Classification by UDF name allows you to control which workload (and thus which priority) requests that contain specific UDFs will map to. It also means you can now control concurrency for specific functions or groups of functions, by means of either a system throttle or a workload throttle. You could also create a filter rule with function classification that disallows certain functions from being executed at specific times of day.

A second related TASM enhancement, appearing in Teradata Database 15.0, adds classification by server object. A new database object type of "server" can be defined, which allows TASM to use the server name explicitly for classification purposes. A server object represents an external system that a table operator is accessing, and acts much like a view. This enhancement allows you to control the concurrency of requests that access a specific external server.
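As a hedged sketch (the USING clause names vary by QueryGrid connector release; the operator names are the ones that appear in the Explain above), a server object definition looks roughly like this:

CREATE FOREIGN SERVER sdll7940
USING
  hosttype('hadoop')
  server('sdll7940.labs.teradata.com')    /* the Hive Metastore host from the Explain */
  port('9083')
DO IMPORT WITH SYSLIB.load_from_hcatalog,
DO EXPORT WITH SYSLIB.load_to_hcatalog;

Once the server object exists, queries reference it as vim.cardata@sdll7940, and a TASM classification rule can name sdll7940 directly.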

Comparing these two TASM enhancements, classification by function (in 14.10) is the more granular, as it allows you to exercise control at the level of a single UDF if you wish. Classification by server object (in 15.0) will cover all UDFs that reference a specific foreign server.

For additional information on UDFs, table functions and table operators, see the following orange books:

  • User Defined Functions, Mike Watzke
  • Teradata QueryGrid: Teradata Database-to-Hadoop, Doug Fraser and Vimalraj Panneerselvam

 


How to Calculate Your Max Number of Usable AMP Worker Tasks - comment by Ana


Hi Carrie, could you please help me with the following question regarding AWT consumption?
- My customer has 50% COD on his TD platform (2+1 HSN 6700H).
- The max number of AWTs in use crosses 62 several times, which indicates high concurrency and is a cause for concern.
- CPU is about 90% from 5am to 7pm.
My question is: releasing COD will not help because it will not increase the number of AWTs, right?
To lower concurrency we have to work with TASM, right?


FastExport for Really Short Queries in Teradata 13.10 - comment by ambuj2k50


Hi Carrie,
Can we export the output of multiple SELECT statements specified within .BEGIN EXPORT and .END EXPORT to multiple outfiles?
If we create multiple .BEGIN EXPORT / .END EXPORT blocks within a single FastExport job, are they executed in parallel or sequentially?
Thanks
ambuj

How to Calculate Your Max Number of Usable AMP Worker Tasks - comment by carrie


Hi Ana,
 
Releasing COD will not increase the number of AMP worker tasks to your platform, but it will increase CPU and I/O resources, so requests that are holding AWTs should be able to complete sooner, release their AWT sooner, and allow AWTs to be available for other work sooner.  I would expect you would see lower AWT usage levels with more processing power available if the active work running on the platform is the same.
 
It is not necessarily problematic to have more than 62 AWTs active at the same time (assuming you have the default of 80 AWTs/AMP defined), as long as a delay in getting an AWT for some of the requests is not causing problems meeting response time goals. However, if you are reaching 62 in-use AWTs from time to time, it is an indication that you are beginning to run out of AWTs, so workload management throttles are usually a good idea to consider, if you haven't already. If you use throttle rules, you can decide which workloads you want to experience a wait at busy times (by means of a throttle delay queue), rather than having it be randomly determined which queries will have their work messages wait in the work message queue.
 
Thanks, -Carrie

FastExport for Really Short Queries in Teradata 13.10 - comment by carrie


Ambuj,
 
I am not able to help you with this question as I am not experienced with FastExport coding conventions.  I'd suggest you post your question on Teradata Forum at:  http://forums.teradata.com/forum  or see if there are explanations in the Teradata FastExport Reference Manual that help answer your questions.
 
Best Regards, -Carrie

Expediting Express Requests - comment by SarathyG


Hi Carrie,
In a scenario where we have set EnableExpediteExp = 1 and AWTs have been reserved, and we have many workloads defined under Tactical, my understanding is that tactical queries are likely to be highly impacted due to a high request rate for work09, as tasks from all types of workloads will compete for work09, resulting in flow control?
For EnableExpediteExp = 2, I am able to understand the clear benefit, as it is an extra performance booster for tactical queries (alone) in terms of parsing.
What would be the ideal scenario for using EnableExpediteExp = 1?

Expediting Express Requests - comment by carrie


Sarathy,
 
Although it is possible,  I would not expect tactical queries to be impacted much, if at all, from competition with express requests.    Express requests for the most part are extremely fast (that is why they are named express requests) unless they have to wait for an AMP worker task.  Most of them go to one AMP, not all-AMPs.     In addition, if your tactical applications are well-tuned, primarily single-AMP requests, then they will mostly use work08 anyway, not work09, and there will be no conflict at all.
 
The only way to know if there is an impact like you describe is to monitor the SAWT table for inuse counts and max inuse counts for work09, before and then after you change the EnableExpediteExp setting.  If max in use counts for work09 are higher than your reserve number in either case, increase your reserve number.  
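If SAWT ResUsage logging is enabled, a query along these lines shows the work09 in-use and max in-use counts over time (I am assuming the WorkTypeInuse09/WorkTypeMax09 column naming here; verify the column names against the Resource Usage manual for your release):

SELECT TheDate, TheTime, VprId,
       WorkTypeInuse09,    /* expedited (work09) AWTs in use when the row was logged */
       WorkTypeMax09       /* high-water mark for work09 during the logging interval */
FROM DBC.ResUsageSAWT
ORDER BY TheDate, TheTime;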
 
Flow control on work09 is pretty unusual.  Even if a work09 message cannot get an AWT, that work09 message will be placed in the message queue ahead of work08, work00, work01 and other lower work types, and will get off the queue sooner than those others, so flow control is very unlikely. Most commonly it is only workone messages that experience flow control.
 
The benefit of expediting express requests for all requests is that it adds a small boost in parsing times for all queries, which is often desirable. However, option 2, which only provides that benefit to already-expedited requests, is a fine choice as well. It's up to you.
 
Thanks, -Carrie

Teradata 13.10 Statistics Collection Recommendations - comment by cmedved


Hi Carrie. Do you have any recommendations for statistics on a multi-level PPI where the columns used in the PPI are not in the PI?
 
If Col1 is the PI and Col2 and Col3 are the MLPPI columns, would we want to collect statistics on: (PARTITION, Col1, Col2, Col3)? Are there any others that would be important?
 
Thanks in advance!


New opportunities for statistics collection in Teradata 14.0 - comment by pinaldba


Hi Carrie,
 
Thanks for the nice explanation.
I have a question related to summary statistics for staging tables.
The following steps are performed before loading data into the final model tables.

Step 1: Load data into the staging table.

CREATE SET TABLE staging.ABC ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO
     (
      RECORD_TYPE CHAR(2) CHARACTER SET LATIN NOT CASESPECIFIC,
      ACCOUNT_NO INTEGER NOT NULL,
      PARENT_ID INTEGER,
      CHILD_COUNT INTEGER,
      HIERARCHY_ID INTEGER)
PRIMARY INDEX ( ACCOUNT_NO );

Step 2: Collect statistics on the staging table staging.ABC.

Step 3: Rename staging.ABC to staging.ABC_XYZ; the renamed table staging.ABC_XYZ will then be used in batch processing and transformation.

Step 4: Re-create the staging table ipstaging.AR_A100 for the next run.

Questions:
1) Does Teradata recommend collecting stats after renaming the table to staging.ABC_XYZ? If the answer is yes, I would like to know the reason.
2) Can renaming the table drop the random-AMP sample from the summary statistics history, since the table-id changes during the staging process? Might the optimizer extrapolate wrongly here, as the stats have not been refreshed after renaming the staging table staging.ABC_XYZ?

Thanks for your help in advance; eagerly awaiting your answer.

Regards
Pinal

FastExport for Really Short Queries in Teradata 13.10 - comment by ambuj2k50


Carrie,
I have read the complete FastExport manual but could not find an answer to my query.
Anyway, I will post it in a separate thread in the forum.
Thanks,
Ambuj

Teradata 13.10 Statistics Collection Recommendations - comment by carrie


Whether the PPI table uses a single level of partitioning or multiple levels of partitioning, the guidelines are still the same. The fourth bullet under "Other Considerations" above provides a couple of recommendations that apply to MLPPI tables. Whether or not you want to implement those recommendations depends on how partitioning is being used by queries and the type of joins being performed against the PPI table. And of course, on whether or not they improve your query plans!
 
• For a partitioned primary index table, consider collecting these statistics if the partitioning column is not part of the table's primary index (PI):
 
  ◦ (PARTITION, PI). This statistic is most important when a given PI value may exist in multiple partitions, and can be skipped if a PI value only goes to one partition. It provides the optimizer with the distribution of primary index values across the partitions. It helps in costing the sliding-window and rowkey-based merge join, as well as dynamic partition elimination.
 
  ◦ (PARTITION, PI, partitioning column). This statistic provides the combined number of distinct values for the combination of PI and partitioning columns after partition elimination. It is used in rowkey-based merge join costing.
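In the example from the question above (Col1 as the PI, Col2 and Col3 as the partitioning columns), those two recommendations translate to something like this (table name hypothetical):

COLLECT STATISTICS ON MyDB.MyMLPPITable COLUMN (PARTITION, Col1);
COLLECT STATISTICS ON MyDB.MyMLPPITable COLUMN (PARTITION, Col1, Col2, Col3);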
 
Thanks, -Carrie

New opportunities for statistics collection in Teradata 14.0 - comment by carrie


Pinal,
 
Statistics rows in the new DBC.StatsTbl in 14.0 refer to the table id, not the table name. So there is no problem with renaming a table; the stats from the old name (collected and RAS) will still be usable.
 
See page 893 of the SQL Data Definition Language Detailed Topics manual for validation of this: 
 
Function of RENAME Table Requests
 
When you rename a table, Teradata Database only changes the table name. All statistics and
privileges belonging to the table remain with it under the new name.
 
It's easy to see for yourself that the stats are still there: create table, add some rows, collect some stats, rename the table and you will see the stats are associated with the renamed table.    If they are outdated, it's always a good idea to recollect them whether or not you have renamed the table.  But the act of renaming does not in and of itself require a recollection.
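A minimal version of that experiment (database and table names hypothetical; SHOW STATISTICS is the 14.0 syntax):

CREATE TABLE scratch.t1 (c1 INTEGER, c2 INTEGER) PRIMARY INDEX (c1);
INSERT INTO scratch.t1 VALUES (1, 10);
INSERT INTO scratch.t1 VALUES (2, 20);
COLLECT STATISTICS COLUMN (c1) ON scratch.t1;
RENAME TABLE scratch.t1 TO scratch.t1_renamed;
SHOW STATISTICS ON scratch.t1_renamed;    /* stats collected under the old name are still attached */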
 
Thanks, -Carrie

Workload Management with User Defined Functions and Table Operators - comment by geethareddy


Hi Carrie,

While reading this article, a few things prompted me to seek clarification on delay issues on the system. Sometimes we see a large number of delayed sessions in the delay queue from Viewpoint. From your previous articles, I understood that there are 'send delays' (in milliseconds) and 'receive delays' (in milliseconds) for a request. Are these milliseconds wall-clock time or CPU seconds? Another question: when the delay queue grows to a three- or four-digit number, does the system experience any disadvantage in terms of resource wastage (like CPU/IO)?
Just trying to understand the worst impact of the delay queue on the system, beyond the delay in processing the concerned application jobs.
