SQL语句
前言
这篇文章中的大部分的SQL语句都是在clickhouse中编写的,所以有些函数,比如toHour、toMinute可能其他的数据库中并没有。
1.去掉时间的毫秒数
有时候查询出来的sql语句中的时间是这样的’2019-01-01 02:03:22.000’,如何去掉其中的’.000’呢?
1 | ---- RecordTime 为字段名 |
1.SqlServer数据库中datetime日期不要毫秒的几种方法
2.MySQL 用sql语句格式化时间和日期
3.sql server 日期时间 存储时怎么能不让它显示毫秒,后面那3个000
2.分时间段统计
数据表t中有一个字段是sum,这个sum记录的是,前一条记录的sum加上累积量得到的新的sum值。我需要获取到每一个整点时间的sum值,但是这个sum值如果在整点的时候恰好为空,那么就要取向前五分钟的值作为整点的值,如果向前五分钟还是空,则再次向前找,最终得到一个不为空的值作为整点的值。在另一篇文章中,我曾经查过如何分时间段统计(MySql使用及SQL语法学习),最后也没有实现。现在我有一个新的思路,那就是现将时间进行分组,然后取这一组中时间记录最大的且sum不为空的记录,这样就好操作多了。
(1) 将时间段转为分组
1 | select id,SiteNo,RecordTime,floor(((toHour(RecordTime)*60+toMinute(RecordTime)))/60) as groupid from lsmfhx.v_realtime as v where RecordTime between '2016-09-19 00:00:00' and '2026-09-20 23:59:00' ordery by RecordTime |
(2) 按站点编号进行分组,获取分组最新记录
1 | select SiteNo,RecordTime,groupid from (select SiteNo,RecordTime,MediumType,floor(((toHour(RecordTime)*60+toMinute(RecordTime)))/60-0.1) as groupid from lsmfhx.v_realtime as v where RecordTime between '2016-09-19 00:00:00' and '2026-09-20 23:59:00' order by RecordTime ASC) as t group by SiteNo,groupid,RecordTime order by groupid |
(3) 分组获取每小时最新一条记录并按站点编号和时间进行排序
1 | select SiteNo, max(RecordTime) from (select SiteNo,RecordTime,groupid from (select SiteNo,RecordTime,MediumType,floor(((toHour(RecordTime)*60+toMinute(RecordTime)))/60-0.001) as groupid from lsmfhx.v_realtime as v where RecordTime between '2016-09-19 00:00:00' and '2026-09-20 23:59:00' order by RecordTime ASC) as t group by SiteNo,groupid,RecordTime order by SiteNo,groupid,RecordTime) group by groupid,SiteNo order by SiteNo |
(4) 获取按小时分组获取最新的,SumFlux不为空且不为0的一条记录
1 | select SiteNo, max(RecordTime) from (select SiteNo,RecordTime,groupid from (select SiteNo,RecordTime,MediumType,floor(((toHour(RecordTime)*60+toMinute(RecordTime)))/60-0.001) as groupid from lsmfhx.v_realtime as v where RecordTime between '2016-09-19 00:00:00' and '2026-09-20 23:59:00' and SumFlux IS NOT NULL and SumFlux != 0 order by RecordTime ASC) as t group by SiteNo,groupid,RecordTime order by SiteNo,groupid,RecordTime) group by |
(5) 上面的内容可以说已经基本上实现了需求,不过加入inner join后的同一RecordTime和同一SiteNo的情况有多条记录怎么办呢?这个时候还需要通过id号,取主键id号最大的一条记录。
1 | Select * from lsmfhx.v_realtime as vr inner join (Select toUInt64(max(id)) as i from (select id,SiteNo,RecordTime from lsmfhx.v_realtime v inner join (select SiteNo,max(RecordTime) as recor,floor(((toHour(RecordTime)*60+toMinute(RecordTime)))/60) as groupid from lsmfhx.v_realtime as v where RecordTime between '2016-09-19 00:00:00' and '2026-09-20 23:59:00' and SumFlux IS NOT NULL and SumFlux != 0 group by groupid,SiteNo order by SiteNo,recor) as temp_table on v.SiteNo=temp_table.SiteNo and v.RecordTime=temp_table.recor) group by RecordTime,SiteNo order by i) as id_table on vr.id=id_table.i |
是不是还有其他方法呢?比如参考文章4中的内容。
1.sql查询技巧,按时间分段进行分组,每半小时一组统计组内记录数量
2.sql 查数据库中时间最新的一条记录
3.mysql查询最后一条记录
4.SQL中遇到多条相同内容只取一条的最简单实现
7.GROUP BY having MAX date
6.SQL中遇到多条相同内容只取一条的最简单实现
7.SQL分组取最大值的方法
(更新)
上面的一些sql可能不太正确,只是提供了一个思路。后来我又想了一个思路,因为我的数据是有多个约束字段,SiteNo(站点编号)、RecordTime(记录时间2020-01-02 09:08:22)、MediumType(介质类型),我需要每种介质,每个站点,取其每一个小时中的最大值(整点记录,比如2020-01-02 09:00:00算最大时间),所以我使用了将站点编号和时间以及介质和小时组成一个数字,然后以这个数字进行分组
1 | ((toYYYYMMDD(RecordTime)*100+floor(((toHour(RecordTime)*60+toMinute(RecordTime)))/60-0.001))*1000+SiteNo)*1000+MediumType as groupid |
形成的数字如下:20160920(日期) 22(小时数分组) 001(站点编号) 000(介质类型)
重写整理一下思路
(1) 将日期整理成分组,把日期转为数字,把小时数转为距零点的分钟数:toHour(RecordTime)*60+toMinute(RecordTime)))/60-0.001),整点要放到前一个分组中,所以就添加了一个0.001
1 | select RecordTime,((toYYYYMMDD(RecordTime)*100+floor(((toHour(RecordTime)*60+toMinute(RecordTime)))/60-0.001))*1000+SiteNo)*1000+MediumType as groupid from lsmfhx.v_realtime as v where RecordTime between '2016-09-19 00:00:00' and '2026-09-20 23:59:00' and SumFlux IS NOT NULL and SumFlux != 0 |
(2) 以groupid为分组,获取每组最大的记录时间
1 | select max(RecordTime),groupid from (select RecordTime,((toYYYYMMDD(RecordTime)*100+floor(((toHour(RecordTime)*60+toMinute(RecordTime)))/60-0.001))*1000+SiteNo)*1000+MediumType as groupid from lsmfhx.v_realtime as v where RecordTime between '2016-09-19 00:00:00' and '2026-09-20 23:59:00' and SumFlux IS NOT NULL and SumFlux != 0) as t group by groupid order by groupid |
(3) 获取其他信息。上面的只能获取到记录的最大时间,如果想要获取这条记录的其他的信息怎么办,比如这条记录的id号,SiteNo站点编号等。
1 | select id,RecordTime,SiteNo,MediumType,((toYYYYMMDD(RecordTime)*100+floor(((toHour(RecordTime)*60+toMinute(RecordTime)))/60-0.001))*1000+SiteNo)*1000+MediumType as gid from lsmfhx.v_realtime as v_t inner join (select max(RecordTime) as maxtime,groupid from (select RecordTime,((toYYYYMMDD(RecordTime)*100+floor(((toHour(RecordTime)*60+toMinute(RecordTime)))/60-0.001))*1000+SiteNo)*1000+MediumType as groupid from lsmfhx.v_realtime as v where RecordTime >= '2016-09-19 00:00:00' and RecordTime <='2026-09-20 23:59:00' and SumFlux IS NOT NULL and SumFlux != 0) group by groupid order by groupid) as tp on gid=tp.groupid and v_t.RecordTime=tp.maxtime order by SiteNo,RecordTime |
(4) 第三个基本上实现了我的需要,但是还是存在一种情况,就是时间相同的情况,就会出现重复的记录。
这个时候,我觉得还要做一步,就是取id号最大的那一条记录。现在我用的是在进行一次内连接,达到这个目的,但是总感觉有点不好。
1 | Select id,RecordTime,SiteNo from lsmfhx.v_realtime as vr inner join (Select toUInt64(max(id)) as i from (select id,RecordTime,SiteNo,MediumType,((toYYYYMMDD(RecordTime)*100+floor(((toHour(RecordTime)*60+toMinute(RecordTime)))/60-0.001))*1000+SiteNo)*1000+MediumType as gid from lsmfhx.v_realtime as v_t inner join (select max(RecordTime) as maxtime,groupid from (select RecordTime,((toYYYYMMDD(RecordTime)*100+floor(((toHour(RecordTime)*60+toMinute(RecordTime)))/60-0.001))*1000+SiteNo)*1000+MediumType as groupid from lsmfhx.v_realtime as v where RecordTime >= '2016-09-19 00:00:00' and RecordTime <='2026-09-20 23:59:00' and SumFlux IS NOT NULL and SumFlux != 0) group by groupid order by groupid) as tp on gid=tp.groupid and v_t.RecordTime=tp.maxtime order by SiteNo,RecordTime) group by RecordTime,SiteNo order by i) as id_table on vr.id=id_table.i order by SiteNo,RecordTime |
(5) 在4的基础上分组求和。
1 | Select MediumType,sum(SumFlux),(toYYYYMMDD(RecordTime)*100+floor(((toHour(RecordTime)*60+toMinute(RecordTime)))/60-0.001)) as timeid from lsmfhx.v_realtime as vr inner join (Select toUInt64(max(id)) as i from (select id,RecordTime,SiteNo,MediumType,((toYYYYMMDD(RecordTime)*100+floor(((toHour(RecordTime)*60+toMinute(RecordTime)))/60-0.001))*1000+SiteNo)*1000+MediumType as gid from lsmfhx.v_realtime as v_t inner join (select max(RecordTime) as maxtime,groupid from (select RecordTime,((toYYYYMMDD(RecordTime)*100+floor(((toHour(RecordTime)*60+toMinute(RecordTime)))/60-0.001))*1000+SiteNo)*1000+MediumType as groupid from lsmfhx.v_realtime as v where RecordTime >= '2016-09-19 00:00:00' and RecordTime <='2026-09-20 23:59:00' and SumFlux IS NOT NULL and SumFlux != 0) group by groupid order by groupid) as tp on gid=tp.groupid and v_t.RecordTime=tp.maxtime order by SiteNo,RecordTime) group by RecordTime,SiteNo order by i) as id_table on vr.id=id_table.i group by timeid,MediumType order by timeid |
SQL语句总的合起来,非常的复杂,但是从里到外一层层的剥离,就能知道整个的逻辑是什么样的了。
3.join语法
INNER JOIN:如果表中有至少一个匹配,则返回行
LEFT JOIN:即使右表中没有匹配,也从左表返回所有的行
RIGHT JOIN:即使左表中没有匹配,也从右表返回所有的行
FULL JOIN:只要其中一个表中存在匹配,则返回行
4.group by语句
group by 有一个原则,就是 select 后面的所有列中,没有使用聚合函数的列,必须出现在 group by 后面
5.尖峰平谷时间段
1.如何写一条SQL,分时段统计结果?