1. Define business requirements first
  2. SELECT fields instead of using SELECT *
  3. Avoid SELECT DISTINCT
  4. Create joins with INNER JOIN (not WHERE) Some SQL developers prefer to make joins with WHERE clauses. This type of join creates a Cartesian Join, also called a Cartesian Product or CROSS JOIN. Cartesian Joins are especially problematic in large-scale databases, because a Cartesian Join of two large tables could create billions or trillions of results. To prevent creating a Cartesian Join, use INNER JOIN instead.
  5. Use WHERE instead of HAVING to define filters
  6. Use wildcards at the end of a phrase only When searching plaintext data, such as cities or names, wildcards create the widest search possible. However, the widest search is also the most inefficient search. When a leading wildcard is used, especially in combination with an ending wildcard, the database is tasked with searching all records for a match anywhere within the selected field.
  7. Use LIMIT to sample query results
  8. Run your query during off-peak hours

There are also many other advices on Google. By understanding the mechanism of SQL Query, the efficiency of specific cases can be analyzed and compared.

在做最后一道关于 SQL 的题目的时候,其实剩下的时间蛮多的。可是我却没有写完。 现在想想其实蛮简单的。不需要什么 PostgreSQL 的高级用法。 以前做这种题太少了,经验不足。所以,没能把握好思考的方向。 在之前备考的时候,我有略略浏览过 PostgreSQL 的文档。 由于时间紧迫,对于文档里面的各种高级用法的应用场景,我理解得很不充分。 机试的时候,我就在想文档里面的各种高级用法。想不到好的思路,人就慌了。 还有就是,做到后面的时候,我感觉自己体力有点不支了。 我感觉这次机试要死在 SQL 上。

In [ ]:
%load_ext google.cloud.bigquery
The google.cloud.bigquery extension is already loaded. To reload it, use:
  %reload_ext google.cloud.bigquery
In [ ]:
%%bigquery
CREATE TABLE
  dataset.buses ( id INT64,
    origin STRING,
    destination STRING,
    time STRING )
Out[ ]:
In [ ]:
%%bigquery
INSERT
  dataset.buses (id,
    origin,
    destination,
    time)
VALUES
  (10, 'Wuhan', 'Beijing', '10:00'),
  (20, 'Hangzhou', 'Shanghai', '12:00'),
  (21, 'Hangzhou', 'Shanghai', '12:30'),
  (22, 'Hangzhou', 'Shanghai', '13:00'),
  (30, 'Beijing', 'Tokyo', '23:59')
Out[ ]:
In [ ]:
%%bigquery
SELECT
  *
FROM
  dataset.buses
ORDER BY
  id
Out[ ]:
id origin destination time
0 10 Wuhan Beijing 10:00
1 20 Hangzhou Shanghai 12:00
2 21 Hangzhou Shanghai 12:30
3 22 Hangzhou Shanghai 13:00
4 30 Beijing Tokyo 23:59
In [ ]:
%%bigquery
CREATE TABLE
  dataset.passengers ( id INT64,
    origin STRING,
    destination STRING,
    time STRING )
Out[ ]:
In [ ]:
%%bigquery
INSERT
  dataset.passengers (id,
    origin,
    destination,
    time)
VALUES
  (1, 'Beijing', 'Tokyo', '23:59'),
  (2, 'Hangzhou', 'Tokyo', '09:00'),
  (10, 'Wuhan', 'Tokyo', '00:30'),
  (11, 'Hangzhou', 'Shanghai', '13:30'),
  (40, 'Hangzhou', 'Shanghai', '12:00'),
  (41, 'Hangzhou', 'Shanghai', '12:15'),
  (42, 'Hangzhou', 'Shanghai', '12:15'),
  (43, 'Hangzhou', 'Shanghai', '12:30'),
  (44, 'Hangzhou', 'Shanghai', '12:35')
Out[ ]:
In [ ]:
%%bigquery
SELECT
  *
FROM
  dataset.passengers
ORDER BY
  id
Out[ ]:
id origin destination time
0 1 Beijing Tokyo 23:59
1 2 Hangzhou Tokyo 09:00
2 10 Wuhan Tokyo 00:30
3 11 Hangzhou Shanghai 13:30
4 40 Hangzhou Shanghai 12:00
5 41 Hangzhou Shanghai 12:15
6 42 Hangzhou Shanghai 12:15
7 43 Hangzhou Shanghai 12:30
8 44 Hangzhou Shanghai 12:35
In [ ]:
%%bigquery
WITH
  buses_ranked AS (
  SELECT
    id,
    origin,
    destination,
    time,
    RANK() OVER (PARTITION BY origin, destination ORDER BY time) AS rank
  FROM
    dataset.buses)
SELECT
  bus_latter.*,
  bus_former.time AS time_former,
  passengers.id AS passengers_id,
  passengers.time AS passengers_time
FROM
  buses_ranked AS bus_latter
LEFT OUTER JOIN
  buses_ranked AS bus_former
ON
  bus_latter.origin = bus_former.origin
  AND bus_latter.destination = bus_former.destination
  AND bus_latter.rank = bus_former.rank + 1
LEFT OUTER JOIN
  dataset.passengers AS passengers
ON
  bus_latter.origin = passengers.origin
  AND bus_latter.destination = passengers.destination
  AND (bus_former.time IS NULL
    OR bus_former.time < passengers.time)
  AND passengers.time <= bus_latter.time
ORDER BY
  id,
  passengers_id
In [ ]:
%%bigquery
WITH
  buses_ranked AS (
  SELECT
    id,
    origin,
    destination,
    time,
    RANK() OVER (PARTITION BY origin, destination ORDER BY time) AS rank
  FROM
    dataset.buses)
SELECT
  bus_latter.id,
  COUNT(passengers.id) AS passengers_count
FROM
  buses_ranked AS bus_latter
LEFT OUTER JOIN
  buses_ranked AS bus_former
ON
  bus_latter.origin = bus_former.origin
  AND bus_latter.destination = bus_former.destination
  AND bus_latter.rank = bus_former.rank + 1
LEFT OUTER JOIN
  dataset.passengers AS passengers
ON
  bus_latter.origin = passengers.origin
  AND bus_latter.destination = passengers.destination
  AND (bus_former.time IS NULL
    OR bus_former.time < passengers.time)
  AND passengers.time <= bus_latter.time
GROUP BY
  bus_latter.id
ORDER BY
  id
Out[ ]:
id passengers_count
0 10 0
1 20 1
2 21 3
3 22 1
4 30 1
In [ ]:
 

Comments

2020-06-22