Define business requirements first
SELECT fields instead of using SELECT *
Avoid SELECT DISTINCT
Create joins with INNER JOIN (not WHERE) Some SQL developers prefer to make joins with WHERE clauses. This type of join creates a Cartesian Join, also called a Cartesian Product or CROSS JOIN. Cartesian Joins are especially problematic in large-scale databases, because a Cartesian Join of two large tables could create billions or trillions of results. To prevent creating a Cartesian Join, use INNER JOIN instead.
Use WHERE instead of HAVING to define filters
Use wildcards at the end of a phrase only When searching plaintext data, such as cities or names, wildcards create the widest search possible. However, the widest search is also the most inefficient search. When a leading wildcard is used, especially in combination with an ending wildcard, the database is tasked with searching all records for a match anywhere within the selected field.
Use LIMIT to sample query results
Run your query during off-peak hours

There are also many other advices on Google. By understanding the mechanism of SQL Query, the efficiency of specific cases can be analyzed and compared.

在做最后一道关于 SQL 的题目的时候，其实剩下的时间蛮多的。可是我却没有写完。现在想想其实蛮简单的。不需要什么 PostgreSQL 的高级用法。以前做这种题太少了，经验不足。所以，没能把握好思考的方向。在之前备考的时候，我有略略浏览过 PostgreSQL 的文档。由于时间紧迫，对于文档里面的各种高级用法的应用场景，我理解得很不充分。机试的时候，我就在想文档里面的各种高级用法。想不到好的思路，人就慌了。还有就是，做到后面的时候，我感觉自己体力有点不支了。我感觉这次机试要死在 SQL 上。

In [ ]:

%load_ext google.cloud.bigquery

The google.cloud.bigquery extension is already loaded. To reload it, use:
  %reload_ext google.cloud.bigquery

In [ ]:

%%bigquery
CREATE TABLE
  dataset.buses ( id INT64,
    origin STRING,
    destination STRING,
    time STRING )

Out[ ]:

In [ ]:

%%bigquery
INSERT
  dataset.buses (id,
    origin,
    destination,
    time)
VALUES
  (10, 'Wuhan', 'Beijing', '10:00'),
  (20, 'Hangzhou', 'Shanghai', '12:00'),
  (21, 'Hangzhou', 'Shanghai', '12:30'),
  (22, 'Hangzhou', 'Shanghai', '13:00'),
  (30, 'Beijing', 'Tokyo', '23:59')

Out[ ]:

In [ ]:

%%bigquery
SELECT
  *
FROM
  dataset.buses
ORDER BY
  id

Out[ ]:

	id	origin	destination	time
0	10	Wuhan	Beijing	10:00
1	20	Hangzhou	Shanghai	12:00
2	21	Hangzhou	Shanghai	12:30
3	22	Hangzhou	Shanghai	13:00
4	30	Beijing	Tokyo	23:59

In [ ]:

%%bigquery
CREATE TABLE
  dataset.passengers ( id INT64,
    origin STRING,
    destination STRING,
    time STRING )

Out[ ]:

In [ ]:

%%bigquery
INSERT
  dataset.passengers (id,
    origin,
    destination,
    time)
VALUES
  (1, 'Beijing', 'Tokyo', '23:59'),
  (2, 'Hangzhou', 'Tokyo', '09:00'),
  (10, 'Wuhan', 'Tokyo', '00:30'),
  (11, 'Hangzhou', 'Shanghai', '13:30'),
  (40, 'Hangzhou', 'Shanghai', '12:00'),
  (41, 'Hangzhou', 'Shanghai', '12:15'),
  (42, 'Hangzhou', 'Shanghai', '12:15'),
  (43, 'Hangzhou', 'Shanghai', '12:30'),
  (44, 'Hangzhou', 'Shanghai', '12:35')

Out[ ]:

In [ ]:

%%bigquery
SELECT
  *
FROM
  dataset.passengers
ORDER BY
  id

Out[ ]:

	id	origin	destination	time
0	1	Beijing	Tokyo	23:59
1	2	Hangzhou	Tokyo	09:00
2	10	Wuhan	Tokyo	00:30
3	11	Hangzhou	Shanghai	13:30
4	40	Hangzhou	Shanghai	12:00
5	41	Hangzhou	Shanghai	12:15
6	42	Hangzhou	Shanghai	12:15
7	43	Hangzhou	Shanghai	12:30
8	44	Hangzhou	Shanghai	12:35

In [ ]:

%%bigquery
WITH
  buses_ranked AS (
  SELECT
    id,
    origin,
    destination,
    time,
    RANK() OVER (PARTITION BY origin, destination ORDER BY time) AS rank
  FROM
    dataset.buses)
SELECT
  bus_latter.*,
  bus_former.time AS time_former,
  passengers.id AS passengers_id,
  passengers.time AS passengers_time
FROM
  buses_ranked AS bus_latter
LEFT OUTER JOIN
  buses_ranked AS bus_former
ON
  bus_latter.origin = bus_former.origin
  AND bus_latter.destination = bus_former.destination
  AND bus_latter.rank = bus_former.rank + 1
LEFT OUTER JOIN
  dataset.passengers AS passengers
ON
  bus_latter.origin = passengers.origin
  AND bus_latter.destination = passengers.destination
  AND (bus_former.time IS NULL
    OR bus_former.time < passengers.time)
  AND passengers.time <= bus_latter.time
ORDER BY
  id,
  passengers_id

In [ ]:

%%bigquery
WITH
  buses_ranked AS (
  SELECT
    id,
    origin,
    destination,
    time,
    RANK() OVER (PARTITION BY origin, destination ORDER BY time) AS rank
  FROM
    dataset.buses)
SELECT
  bus_latter.id,
  COUNT(passengers.id) AS passengers_count
FROM
  buses_ranked AS bus_latter
LEFT OUTER JOIN
  buses_ranked AS bus_former
ON
  bus_latter.origin = bus_former.origin
  AND bus_latter.destination = bus_former.destination
  AND bus_latter.rank = bus_former.rank + 1
LEFT OUTER JOIN
  dataset.passengers AS passengers
ON
  bus_latter.origin = passengers.origin
  AND bus_latter.destination = passengers.destination
  AND (bus_former.time IS NULL
    OR bus_former.time < passengers.time)
  AND passengers.time <= bus_latter.time
GROUP BY
  bus_latter.id
ORDER BY
  id

Out[ ]:

	id	passengers_count
0	10	0
1	20	1
2	21	3
3	22	1
4	30	1

https://medium.com/analytics-vidhya/postgresql-integration-with-jupyter-notebook-deb97579a38d

In [ ]:

Try PostgreSQL

PostgreSQL is Turing Complete.¶

http://www.postgres.cn/v2/document ¶

I found an answer from Google. And I quite agree with that.¶

Comments

Try PostgreSQL

PostgreSQL is Turing Complete.¶

http://www.postgres.cn/v2/document¶

I found an answer from Google. And I quite agree with that.¶

Comments

http://www.postgres.cn/v2/document ¶