Thursday, February 14, 2013

SQL - the problem what should not be.

After spending about 5 hours trying to get a nested SQL query working I sit in frustration at how lame the language is and how even more lame the parsers are for the language.
They say the last 10% of any project is the hardest, and I agree.
We now have lame speech recognition but we don't have excellent SQL parsing.
What's worse, I know too much.  I know that this whole SQL thing was solved by the Sidewalk team at Microsoft back in 1997 by an ingenious developer who devised CQL (Conceptual Query Language) which was implemented using an ODBC driver on top of SQL Server.
We had a schema that was so big, it covered about 100 feet of wall-space around our offices when printed on huge 4x5 foot sheets of paper in 12pt font or so!  It became so difficult to generate efficient (even correct) SQL queries that CQL was devised.  It analyzed the schema and then, given a much more free-form language describing the problem in a table-less flat space would create the optimal SQL query for the CQL specification and cache it and return the results.  Parameters could be set up to substitute into the cached queries and, whalla, the whole SQL efficiency, complexity and parsing problem was gone.
That enterprising developer was hired by the SQL Server team.  Unfortunately, CQL never saw the light of day.  I suppose that it didn't make sense to solve a core problem like that.  There was just too much invested in SQL consulting and systems to bypass it.  I'm sure it came down to $.
Anyway, my specific problem is this:

Schema:
-- phpMyAdmin SQL Dump
-- version 3.4.5
-- http://www.phpmyadmin.net
SET SQL_MODE="NO_AUTO_VALUE_ON_ZERO";
SET time_zone = "+00:00";
CREATE TABLE IF NOT EXISTS `links` (
  `pageId` int(10) unsigned NOT NULL,
  `linkId` int(10) unsigned NOT NULL,
  `whenUsed` datetime NOT NULL,
  `whenRendered` datetime NOT NULL,
  KEY `pageId` (`pageId`),
  KEY `linkId` (`linkId`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `links` (`pageId`, `linkId`, `whenUsed`, `whenRendered`) VALUES
(1, 9, '2013-02-14 00:44:20', '2013-02-14 00:42:39'),
(1, 9, '2013-02-14 00:44:21', '2013-02-14 00:42:39'),
(1, 9, '2013-02-14 00:44:23', '2013-02-14 00:44:21'),
(1, 8, '2013-02-14 00:44:25', '2013-02-14 00:44:23'),
(1, 7, '2013-02-14 00:44:26', '2013-02-14 00:44:25'),
(1, 6, '2013-02-14 00:44:28', '2013-02-14 00:44:26'),
(1, 3, '2013-02-14 00:44:29', '2013-02-14 00:44:26'),
(1, 7, '2013-02-14 00:44:31', '2013-02-14 00:44:29'),
(1, 8, '2013-02-14 00:44:32', '2013-02-14 00:44:31'),
(1, 11, '2013-02-14 00:44:34', '2013-02-14 00:44:32');

Inner Query:
SELECT `linkId`, 
TIMEDIFF(`whenUsed`, `whenRendered`) AS 'URDiff', 
CONVERT(DATEDIFF(`whenUsed`, UTC_TIMESTAMP()), SIGNED) AS 'dateDiff',
CONVERT('dateDiff' = 0, UNSIGNED) AS 'usedToday',
CONVERT('dateDiff' < 8 AND 'dateDiff' > 0, UNSIGNED) AS 'usedThisWeek',
CONVERT('dateDiff' < 31 AND 'dateDiff' > 7, UNSIGNED) AS 'usedThisMonth', 
CONVERT('dateDiff' < 365 AND 'dateDiff' > 30, UNSIGNED) AS 'usedThisYear'
FROM `links` 
WHERE MINUTE('URDiff') < 15 AND HOUR('URDiff') = 0
AND `pageId` = '1'
) AS T

Results of inner query:
+--------+----------+----------+-----------+--------------+---------------+--------------+
| linkId | URDiff   | dateDiff | usedToday | usedThisWeek | usedThisMonth | usedThisYear |
+--------+----------+----------+-----------+--------------+---------------+--------------+
|      9 | 00:01:41 |        0 |         1 |            0 |             0 |            0 |
|      9 | 00:01:42 |        0 |         1 |            0 |             0 |            0 |
|      9 | 00:00:02 |        0 |         1 |            0 |             0 |            0 |
|      8 | 00:00:02 |        0 |         1 |            0 |             0 |            0 |
|      7 | 00:00:01 |        0 |         1 |            0 |             0 |            0 |
|      6 | 00:00:02 |        0 |         1 |            0 |             0 |            0 |
|      3 | 00:00:03 |        0 |         1 |            0 |             0 |            0 |
|      7 | 00:00:02 |        0 |         1 |            0 |             0 |            0 |
|      8 | 00:00:01 |        0 |         1 |            0 |             0 |            0 |
|     11 | 00:00:02 |        0 |         1 |            0 |             0 |            0 |

+--------+----------+----------+-----------+--------------+---------------+--------------+ Outer Query:


SELECT 
`linkId`,
SUM('T.usedToday') AS 'countToday',
SUM('T.usedThisWeek') AS 'countThisWeek',
SUM('T.usedThisMonth') AS 'countThisMonth',
SUM('T.usedThisYear') AS 'countThisYear'
FROM

(
SELECT `linkId`, 
TIMEDIFF(`whenUsed`, `whenRendered`) AS 'URDiff', 
CONVERT(DATEDIFF(`whenUsed`, UTC_TIMESTAMP()), SIGNED) AS 'dateDiff',
CONVERT('dateDiff' = 0, UNSIGNED) AS 'usedToday',
CONVERT('dateDiff' < 8 AND 'dateDiff' > 0, UNSIGNED) AS 'usedThisWeek',
CONVERT('dateDiff' < 31 AND 'dateDiff' > 7, UNSIGNED) AS 'usedThisMonth', 
CONVERT('dateDiff' < 365 AND 'dateDiff' > 30, UNSIGNED) AS 'usedThisYear'
FROM `links` 
WHERE MINUTE('URDiff') < 15 AND HOUR('URDiff') = 0
AND `pageId` = '1'
) AS T

GROUP BY `linkId`        
Outer query result:+--------+------------+---------------+----------------+---------------+
| linkId | countToday | countThisWeek | countThisMonth | countThisYear |
+--------+------------+---------------+----------------+---------------+
|      3 |          0 |             0 |              0 |             0 |
|      6 |          0 |             0 |              0 |             0 |
|      7 |          0 |             0 |              0 |             0 |
|      8 |          0 |             0 |              0 |             0 |
|      9 |          0 |             0 |              0 |             0 |
|     11 |          0 |             0 |              0 |             0 |
+--------+------------+---------------+----------------+---------------+


Go figure?  I can clearly see from the inner results that there are 3 rows in the usedToday column that are 1 (and I have converted them to unsigned so they should SUM ok).  Yet the countToday for linkId 9 is 0 and not 3.  Why?

MySQL gives me some interesting warnings:
Warning (Code 1292): Truncated incorrect time value: 'URDiff'
Warning (Code 1292): Truncated incorrect time value: 'URDiff'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'dateDiff'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'dateDiff'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'dateDiff'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'dateDiff'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'dateDiff'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'dateDiff'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'dateDiff'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedToday'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisWeek'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisMonth'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisYear'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedToday'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisWeek'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisMonth'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisYear'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedToday'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisWeek'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisMonth'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisYear'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedToday'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisWeek'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisMonth'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisYear'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedToday'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisWeek'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisMonth'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisYear'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedToday'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisWeek'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisMonth'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisYear'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedToday'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisWeek'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisMonth'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisYear'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedToday'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisWeek'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisMonth'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisYear'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedToday'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisWeek'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisMonth'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisYear'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedToday'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisWeek'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisMonth'
Warning (Code 1292): Truncated incorrect DOUBLE value: 'T.usedThisYear'


From what I have been able to find online about this warning, it appears to happen for some people who didn't us quotes on their names (which I use everywhere).  Now why would a parser give such a warning for quote usage?

LAMEOTECH I say!

Answer:
My son correctly pointed out that I was quoting my column names (a no no I guess, with no help from the parser) and referencing aliases outside the GROUP BY, HAVING, and ORDER BY clauses (also no help from the parser).
The correct query becomes:
SELECT 
`linkId`,
SUM(T.usedToday)ascountToday,
SUM(T.usedThisWeek)as countThisWeek,
SUM(T.usedThisMonth)as countThisMonth,
SUM(T.usedThisYear)as countThisYear
FROM
(
    SELECT `linkId`, 
    TIMEDIFF(`whenUsed`, `whenRendered`) as URDiff, 
    DATEDIFF(`whenUsed`, UTC_TIMESTAMP()) as `dateDiff`,
    if(DATEDIFF(`whenUsed`, UTC_TIMESTAMP()) = 0, 1, 0)as usedToday,
    if(DATEDIFF(`whenUsed`, UTC_TIMESTAMP()) < 8 AND DATEDIFF(`whenUsed`, UTC_TIMESTAMP()) > 0, 1, 0)as usedThisWeek,
    if(DATEDIFF(`whenUsed`, UTC_TIMESTAMP()) < 31 AND DATEDIFF(`whenUsed`, UTC_TIMESTAMP()) > 7, 1, 0) as usedThisMonth, 
    if(DATEDIFF(`whenUsed`, UTC_TIMESTAMP()) < 365 AND DATEDIFF(`whenUsed`, UTC_TIMESTAMP()) > 30, 1, 0) as usedThisYear
    FROM `links` 
    WHERE MINUTE(TIMEDIFF(`whenUsed`, `whenRendered`)) < 15 AND HOUR(TIMEDIFF(`whenUsed`, `whenRendered`)) = 0
    AND `pageId` = 1
) as T
GROUP BY `linkId`    
 
With a cheery results of:

+--------+--------------+---------------+----------------+---------------+
| linkId | ascountToday | countThisWeek | countThisMonth | countThisYear |
+--------+--------------+---------------+----------------+---------------+
|      3 |            1 |             0 |              0 |             0 |
|      6 |            1 |             0 |              0 |             0 |
|      7 |            2 |             0 |              0 |             0 |
|      8 |            2 |             0 |              0 |             0 |
|      9 |            3 |             0 |              0 |             0 |
|     11 |            1 |             0 |              0 |             0 |
+--------+--------------+---------------+----------------+---------------+


4 comments:

  1. Sql is actually pretty cool in this way, cuz sometimes you want to have a string where a column would normally belong. For example, when doing an insert with a sub select, you can inject static values, basically as a sort of imaginary column. Example:

    INSERT INTO stuff
    SELECT Col1, Col2, 'VALID'
    FROM other_stuff
    WHERE other_stuff.value != 'infinity'

    This would give you something like:

    +--------+--------+--------+
    |Col1 |Col2 |Col3 |
    +--------+--------+--------+
    |41 |32 |VALID |
    +--------+--------+--------+
    |9 |7 |VALID |
    +--------+--------+--------+
    |12 |32 |VALID |
    +--------+--------+--------+
    |123 |84 |VALID |
    +--------+--------+--------+

    Sql is cool, because it uses a different sort of thinking than most procedural-type languages. It's super tricky and sorta dumb, but also magic.

    ReplyDelete
    Replies
    1. This comment has been removed by the author.

      Delete
    2. C++ is very cool too because you can overload an innocent operator like + and have it do anything you want with just a simple character added.
      Such "features" create the rope that newbe's hang themselves with.
      With such possibilities in a language like SQL, the parser clearly should note all the existing aliases and if a quited string is the same as one of them but in an inappropriate context, it should raise a warning like:
      "expression 'foo' looks like a constant but may have been intended as an alias reference. Alias references in an inner select cannot be used except in the context of a GROUP BY, ORDER BY or HAVING statement."
      Technically, such a usage is allowed by the systax but the odds of a constant being the same as an alias yet intended as a constant are near nill and the warning can be suppressed by using a different alias name.
      The fact is, to make a parser this smart takes some real work but the number of man-years this saves for all the 'nubes' out there is astronimical. The leverage of such a labor of love I believe is what makes the world a better place.

      Delete