Thursday, February 14, 2013

SQL - the problem what should not be.

After spending about 5 hours trying to get a nested SQL query working I sit in frustration at how lame the language is and how even more lame the parsers are for the language.
They say the last 10% of any project is the hardest, and I agree.
We now have lame speech recognition but we don't have excellent SQL parsing.
What's worse, I know too much.  I know that this whole SQL thing was solved by the Sidewalk team at Microsoft back in 1997 by an ingenious developer who devised CQL (Conceptual Query Language) which was implemented using an ODBC driver on top of SQL Server.
We had a schema that was so big, it covered about 100 feet of wall-space around our offices when printed on huge 4x5 foot sheets of paper in 12pt font or so!  It became so difficult to generate efficient (even correct) SQL queries that CQL was devised.  It analyzed the schema and then, given a much more free-form language describing the problem in a table-less flat space would create the optimal SQL query for the CQL specification and cache it and return the results.  Parameters could be set up to substitute into the cached queries and, whalla, the whole SQL efficiency, complexity and parsing problem was gone.
That enterprising developer was hired by the SQL Server team.  Unfortunately, CQL never saw the light of day.  I suppose that it didn't make sense to solve a core problem like that.  There was just too much invested in SQL consulting and systems to bypass it.  I'm sure it came down to $.
Anyway, my specific problem is this:

Schema:
-- phpMyAdmin SQL Dump
-- version 3.4.5
-- http://www.phpmyadmin.net
SET SQL_MODE="NO_AUTO_VALUE_ON_ZERO";
SET time_zone = "+00:00";
CREATE TABLE IF NOT EXISTS `links` (
  `pageId` int(10) unsigned NOT NULL,
  `linkId` int(10) unsigned NOT NULL,
  `whenUsed` datetime NOT NULL,
  `whenRendered` datetime NOT NULL,
  KEY `pageId` (`pageId`),
  KEY `linkId` (`linkId`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `links` (`pageId`, `linkId`, `whenUsed`, `whenRendered`) VALUES
(1, 9, '2013-02-14 00:44:20', '2013-02-14 00:42:39'),
(1, 9, '2013-02-14 00:44:21', '2013-02-14 00:42:39'),
(1, 9, '2013-02-14 00:44:23', '2013-02-14 00:44:21'),
(1, 8, '2013-02-14 00:44:25', '2013-02-14 00:44:23'),
(1, 7, '2013-02-14 00:44:26', '2013-02-14 00:44:25'),
(1, 6, '2013-02-14 00:44:28', '2013-02-14 00:44:26'),
(1, 3, '2013-02-14 00:44:29', '2013-02-14 00:44:26'),
(1, 7, '2013-02-14 00:44:31', '2013-02-14 00:44:29'),
(1, 8, '2013-02-14 00:44:32', '2013-02-14 00:44:31'),
(1, 11, '2013-02-14 00:44:34', '2013-02-14 00:44:32');

Inner Query:
SELECT `linkId`, 
TIMEDIFF(`whenUsed`, `whenRendered`) AS 'URDiff', 
CONVERT(DATEDIFF(`whenUsed`, UTC_TIMESTAMP()), SIGNED) AS 'dateDiff',
CONVERT('dateDiff' = 0, UNSIGNED) AS 'usedToday',
CONVERT('dateDiff' < 8 AND 'dateDiff' > 0, UNSIGNED) AS 'usedThisWeek',
CONVERT('dateDiff' < 31 AND 'dateDiff' > 7, UNSIGNED) AS 'usedThisMonth', 
CONVERT('dateDiff' < 365 AND 'dateDiff' > 30, UNSIGNED) AS 'usedThisYear'
FROM `links` 
WHERE MINUTE('URDiff') < 15 AND HOUR('URDiff') = 0
AND `pageId` = '1'
) AS T

Results of inner query:
+--------+----------+----------+-----------+--------------+---------------+--------------+
| linkId | URDiff   | dateDiff | usedToday | usedThisWeek | usedThisMonth | usedThisYear |
+--------+----------+----------+-----------+--------------+---------------+--------------+
|      9 | 00:01:41 |        0 |         1 |            0 |             0 |            0 |
|      9 | 00:01:42 |        0 |         1 |            0 |             0 |            0 |
|      9 | 00:00:02 |        0 |         1 |            0 |             0 |            0 |
|      8 | 00:00:02 |        0 |         1 |            0 |             0 |            0 |
|      7 | 00:00:01 |        0 |         1 |            0 |             0 |            0 |
|      6 | 00:00:02 |        0 |         1 |            0 |             0 |            0 |
|      3 | 00:00:03 |        0 |         1 |            0 |             0 |            0 |
|      7 | 00:00:02 |        0 |         1 |            0 |             0 |            0 |
|      8 | 00:00:01 |        0 |         1 |            0 |             0 |            0 |
|     11 | 00:00:02 |        0 |         1 |            0 |             0 |            0 |

+--------+----------+----------+-----------+--------------+---------------+--------------+ Outer Query:


SELECT 
`linkId`,
SUM('T.usedToday') AS 'countToday',
SUM('T.usedThisWeek') AS 'countThisWeek',
SUM('T.usedThisMonth') AS 'countThisMonth',
SUM('T.usedThisYear') AS 'countThisYear'
FROM

(
SELECT `linkId`, 
TIMEDIFF(`whenUsed`, `whenRendered`) AS 'URDiff', 
CONVERT(DATEDIFF(`whenUsed`, UTC_TIMESTAMP()), SIGNED) AS 'dateDiff',
CONVERT('dateDiff' = 0, UNSIGNED) AS 'usedToday',
CONVERT('dateDiff' < 8 AND 'dateDiff' > 0, UNSIGNED) AS 'usedThisWeek',
CONVERT('dateDiff' < 31 AND 'dateDiff' > 7, UNSIGNED) AS 'usedThisMonth', 
CONVERT('dateDiff' < 365 AND 'dateDiff' > 30, UNSIGNED) AS 'usedThisYear'
FROM `links` 
WHERE MINUTE('URDiff') < 15 AND HOUR('URDiff') = 0
AND `pageId` = '1'
) AS T

GROUP BY `linkId`        
Outer query result:+--------+------------+---------------+----------------+---------------+
| linkId | countToday | countThisWeek | countThisMonth | countThisYear |
+--------+------------+---------------+----------------+---------------+
|      3 |          0 |             0 |              0 |             0 |
|      6 |          0 |             0 |              0 |             0 |
|      7 |          0 |             0 |              0 |             0 |
|      8 |          0 |             0 |              0 |             0 |
|      9 |          0 |             0 |              0 |             0 |
|     11 |          0 |             0 |              0 |             0 |
+--------+------------+---------------+----------------+---------------+


Go figure?  I can clearly see from the inner results that there are 3 rows in the usedToday column that are 1 (and I have converted them to unsigned so they should SUM ok).  Yet the countToday for linkId 9 is 0 and not 3.  Why?

MySQL gives me some interesting warnings:
Warning (Code 1292): Truncated incorrect time value: 'URDiff'
Warning (Code 1292): Truncated incorrect time value: 'URDiff'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'dateDiff'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'dateDiff'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'dateDiff'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'dateDiff'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'dateDiff'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'dateDiff'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'dateDiff'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedToday'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisWeek'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisMonth'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisYear'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedToday'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisWeek'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisMonth'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisYear'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedToday'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisWeek'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisMonth'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisYear'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedToday'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisWeek'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisMonth'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisYear'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedToday'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisWeek'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisMonth'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisYear'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedToday'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisWeek'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisMonth'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisYear'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedToday'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisWeek'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisMonth'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisYear'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedToday'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisWeek'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisMonth'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisYear'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedToday'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisWeek'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisMonth'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisYear'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedToday'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisWeek'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisMonth'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisYear'


From what I have been able to find online about this warning, it appears to happen for some people who didn't us quotes on their names (which I use everywhere).  Now why would a parser give such a warning for quote usage?

LAMEOTECH I say!

Answer:
My son correctly pointed out that I was quoting my column names (a no no I guess, with no help from the parser) and referencing aliases outside the GROUP BY, HAVING, and ORDER BY clauses (also no help from the parser).
The correct query becomes:
SELECT 
`linkId`,
SUM(T.usedToday)ascountToday,
SUM(T.usedThisWeek)as countThisWeek,
SUM(T.usedThisMonth)as countThisMonth,
SUM(T.usedThisYear)as countThisYear
FROM
(
    SELECT `linkId`, 
    TIMEDIFF(`whenUsed`, `whenRendered`) as URDiff, 
    DATEDIFF(`whenUsed`, UTC_TIMESTAMP()) as `dateDiff`,
    if(DATEDIFF(`whenUsed`, UTC_TIMESTAMP()) = 0, 1, 0)as usedToday,
    if(DATEDIFF(`whenUsed`, UTC_TIMESTAMP()) < 8 AND DATEDIFF(`whenUsed`, UTC_TIMESTAMP()) > 0, 1, 0)as usedThisWeek,
    if(DATEDIFF(`whenUsed`, UTC_TIMESTAMP()) < 31 AND DATEDIFF(`whenUsed`, UTC_TIMESTAMP()) > 7, 1, 0) as usedThisMonth, 
    if(DATEDIFF(`whenUsed`, UTC_TIMESTAMP()) < 365 AND DATEDIFF(`whenUsed`, UTC_TIMESTAMP()) > 30, 1, 0) as usedThisYear
    FROM `links` 
    WHERE MINUTE(TIMEDIFF(`whenUsed`, `whenRendered`)) < 15 AND HOUR(TIMEDIFF(`whenUsed`, `whenRendered`)) = 0
    AND `pageId` = 1
) as T
GROUP BY `linkId`    
 
With a cheery results of:

+--------+--------------+---------------+----------------+---------------+
| linkId | ascountToday | countThisWeek | countThisMonth | countThisYear |
+--------+--------------+---------------+----------------+---------------+
|      3 |            1 |             0 |              0 |             0 |
|      6 |            1 |             0 |              0 |             0 |
|      7 |            2 |             0 |              0 |             0 |
|      8 |            2 |             0 |              0 |             0 |
|      9 |            3 |             0 |              0 |             0 |
|     11 |            1 |             0 |              0 |             0 |
+--------+--------------+---------------+----------------+---------------+