虎克的博客

Enthusiasm Biogeography-Biodiversity Informatics-Data Sciences

我们经常遇到需要从一张多个字段具有不唯一重复值的表中统计数据的情况,表结构如下图:

Lisong_2010-11-08_144758

表中只有一个字符型字段CanonicalString和ID具有唯一值,其它字段都有不同程度的重复。需要的结果是统计每一个不同分类等级(界、门、纲、目、科、属)的下级分类单元有多少。例如:统计出每一个科(TaxonFamily)有多少个属(TaxonGenus),多少个分类群(CanonicalString);统计每一个门有多纲、目、科、属等等,其它依次类推。

1、统计每一个科有多少分类群

 

SELECT  TaxonFamily AS Family ,

        COUNT(Id) AS TaxonNumber

FROM    dbo.Taxons

GROUP BY TaxonFamily

ORDER BY TaxonNumber DESC
运行结果:
image 

2、统计类群数目最多的10个科

 

SELECT TOP(10)  TaxonFamily AS Family ,



        COUNT(Id) AS TaxonNumber



FROM    dbo.Taxons



GROUP BY TaxonFamily



ORDER BY TaxonNumber DESC

 

3、统计每一个科包括有多少个属

 

SELECT  TaxonFamily AS Family ,

        COUNT(DISTINCT ( TaxonGenus )) AS GenusNumber

FROM    dbo.Taxons

GROUP BY TaxonFamily

ORDER BY GenusNumber DESC

 

运行结果:

image

4、统计每一个科的属和分类群的数目

 
SELECT  TaxonFamily AS Family ,
        COUNT(DISTINCT ( TaxonGenus )) AS GenusNumber ,
        COUNT(Id) AS TaxonNumber
FROM    dbo.Taxons
GROUP BY TaxonFamily
ORDER BY GenusNumber DESC

运行结果:

image

5、统计每一个属包括的分类群数目并显示这个属的科名

SELECT TaxonFamily AS Family,  TaxonGenus AS Genus ,
        COUNT(Id) AS TaxonNumber
FROM    dbo.Taxons
GROUP BY TaxonGenus, TaxonFamily
ORDER BY TaxonNumber DESC

运行结果:

image

6、统计某一个科的属和分类群数目

SELECT  TaxonFamily AS Family ,
        COUNT(DISTINCT ( TaxonGenus )) AS GenusNumber ,
        COUNT(Id) AS TaxonNumber
FROM    dbo.Taxons
WHERE   TaxonFamily = 'Pinaceae'
GROUP BY TaxonFamily

运行结果:

image

7、统计物种名字字段(CanonicalString)中空格出现的次数

SELECT  Id ,
        CanonicalString ,
        LEN(CanonicalString) - LEN(REPLACE(CanonicalString, ' ', '')) AS 空格出现次数
FROM    dbo.Taxons

运行结果如下:

image

上面这个语句的作用是当我们没有有效标记改类群的分类等级,种或者亚种、变种的时候,可以用来作为筛选记录的一个备选方法。

8、统计每一个科有多少属和种(不包括种下等级的情况)

SELECT  TaxonFamily AS Family ,
        COUNT(DISTINCT ( TaxonGenus )) AS GenusNumber ,
        COUNT(Id) AS SpeciesNumber
FROM    dbo.Taxons
WHERE   LEN(CanonicalString) - LEN(REPLACE(CanonicalString, ' ', '')) = 1
GROUP BY TaxonFamily

9、统计每一个大类群(蕨类、苔藓、双子叶等)有的科、属、种的数目

2010-12-12_210829

--统计大类群的科、属、类群数目
SELECT  TaxonGroup ,
        COUNT(DISTINCT ( TaxonFamily )) AS FamilyNumber ,
        COUNT(DISTINCT ( TaxonGenus )) AS GenusNumber ,
        COUNT(Id) AS TaxonNumber
FROM    dbo.Taxons
GROUP BY TaxonGroup
ORDER BY FamilyNumber DESC

--统计大类群的科、属、类群数目
SELECT  TaxonGroup ,
        COUNT(DISTINCT ( TaxonFamily )) AS FamilyNumber ,
        COUNT(DISTINCT ( TaxonGenus )) AS GenusNumber ,
        COUNT(Id) AS SpeciesNumber
FROM    dbo.Taxons
WHERE   [RANK] = 'species'
GROUP BY TaxonGroup
ORDER BY FamilyNumber DESC