2020. 4. 7. 02:28ㆍSQL
Reported Posts II
Table: Actions
Column Name | Type |
user_id | int |
post_id | int |
action_date | date |
action | enum |
extra | varchar |
There is no primary key for this table, it may have duplicate rows. The action column is an ENUM type of ('view', 'like', 'reaction', 'comment', 'report', 'share'). The extra column has optional information about the action such as a reason for report or a type of reaction.
Table: Removals
Column Name | Type |
post_id | int |
remove_date | date |
post_id is the primary key of this table. Each row in this table indicates that some post was removed as a result of being reported or as a result of an admin review.
Write an SQL query to find the average for daily percentage of posts that got removed after being reported as spam, rounded to 2 decimal places.
The query result format is in the following example:
Actions table:
user_id | post_id | action_date | action | extra |
1 | 1 | 2019-07-01 | view | None |
1 | 1 | 2019-07-01 | like | None |
1 | 1 | 2019-07-01 | share | None |
2 | 2 | 2019-07-04 | view | None |
2 | 2 | 2019-07-04 | report | spam |
3 | 4 | 2019-07-04 | view | None |
3 | 4 | 2019-07-04 | report | spam |
4 | 3 | 2019-07-02 | view | None |
4 | 3 | 2019-07-02 | report | spam |
5 | 2 | 2019-07-03 | view | None |
5 | 2 | 2019-07-03 | report | racism |
5 | 5 | 2019-07-03 | view | None |
5 | 5 | 2019-07-03 | report | racism |
Removals table:
post_id | remove_date |
2 | 2019-07-20 |
3 | 2019-07-18 |
Result table:
| average_daily_percent |
| 75.00 |
The percentage for 2019-07-04 is 50% because only one of the two posts reported as spam was removed. The percentage for 2019-07-02 is 100% because one post was reported as spam and it was removed. The other days had no spam reports, so the average is (50 + 100) / 2 = 75%. Note that the output is only one number and that we do not care about the remove dates.
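To make the arithmetic above concrete, here is a minimal Python sketch that recomputes the example by hand. The names `spam_reports`, `removed_posts`, and `daily_percentages` are made up for illustration and are not part of the problem statement.

```python
# Spam reports from the sample Actions table: (post_id, action_date).
spam_reports = [(2, "2019-07-04"), (4, "2019-07-04"), (3, "2019-07-02")]
# post_ids present in the sample Removals table.
removed_posts = {2, 3}


def daily_percentages(reports, removed):
    """Map each day to the percentage of its spam-reported posts that were removed."""
    posts_by_day = {}
    for post_id, day in reports:
        # A set per day ignores duplicate rows for the same post.
        posts_by_day.setdefault(day, set()).add(post_id)
    return {
        day: 100 * len(posts & removed) / len(posts)
        for day, posts in posts_by_day.items()
    }


per_day = daily_percentages(spam_reports, removed_posts)
average_daily_percent = round(sum(per_day.values()) / len(per_day), 2)
print(per_day)                # {'2019-07-04': 50.0, '2019-07-02': 100.0}
print(average_daily_percent)  # 75.0
```

Days without any spam report never appear in `per_day`, which is why they do not pull the average down.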
ANALYSIS:
The question asks for the average of the daily removal rate. For each day, the rate is the number of removed posts divided by the total number of posts reported as spam on that day (the spam condition goes in the WHERE clause).
That is, I need the total number of reported posts, COUNT(DISTINCT post_id) from Actions, and the number of removed posts, COUNT(DISTINCT post_id) from Removals.
One critical point is that I need to calculate them together rather than separately, because each removal has to be matched to a spam report by post_id. In other words, I need to join the tables first and start from there. Also, I need a LEFT JOIN instead of a plain (inner) JOIN. Otherwise, I lose the reported posts in Actions that were never removed, and those rows are needed for the total count.
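The effect of the join type can be checked with a small sketch using Python's built-in `sqlite3` module. This is only an illustration under assumptions: SQLite stands in for the database the problem was written for, the tables hold just the report rows from the sample, and `daily_counts` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Actions (user_id INT, post_id INT, action_date TEXT,
                          action TEXT, extra TEXT);
    CREATE TABLE Removals (post_id INT PRIMARY KEY, remove_date TEXT);
    INSERT INTO Actions VALUES
        (2, 2, '2019-07-04', 'report', 'spam'),
        (3, 4, '2019-07-04', 'report', 'spam'),
        (4, 3, '2019-07-02', 'report', 'spam'),
        (5, 2, '2019-07-03', 'report', 'racism');
    INSERT INTO Removals VALUES (2, '2019-07-20'), (3, '2019-07-18');
""")


def daily_counts(join_type):
    """Return (action_date, total, removed) per day for the given join type."""
    query = f"""
        SELECT action_date,
               COUNT(DISTINCT a.post_id) AS total,
               COUNT(DISTINCT r.post_id) AS removed
        FROM Actions a {join_type} Removals r USING (post_id)
        WHERE action = 'report' AND extra = 'spam'
        GROUP BY action_date
        ORDER BY action_date
    """
    return conn.execute(query).fetchall()


left_rows = daily_counts("LEFT JOIN")
inner_rows = daily_counts("JOIN")
print(left_rows)   # [('2019-07-02', 1, 1), ('2019-07-04', 2, 1)]
print(inner_rows)  # [('2019-07-02', 1, 1), ('2019-07-04', 1, 1)]
```

With the inner join, post 4 (reported but never removed) disappears, so 2019-07-04 looks like 1 of 1 instead of 1 of 2 and the average would come out as 100 rather than 75.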
The joined table looks like this:
["post_id", "user_id", "action_date", "action", "extra", "remove_date"]
[2, 2, "2019-07-04", "view", null, "2019-07-20"],
[2, 2, "2019-07-04", "report", "spam", "2019-07-20"],
[2, 5, "2019-07-03", "view", null, "2019-07-20"],
[2, 5, "2019-07-03", "report", "racism", "2019-07-20"],
[3, 4, "2019-07-02", "view", null, "2019-07-18"],
[3, 4, "2019-07-02", "report", "spam", "2019-07-18"],
[1, 1, "2019-07-01", "view", null, null],
[1, 1, "2019-07-01", "like", null, null],
[1, 1, "2019-07-01", "share", null, null],
[4, 3, "2019-07-04", "view", null, null],
[4, 3, "2019-07-04", "report", "spam", null],
[5, 5, "2019-07-03", "view", null, null],
[5, 5, "2019-07-03", "report", "racism", null]
Then the rest is easy! Simply count the post_ids that meet the condition action = 'report' AND extra = 'spam': the distinct post_ids from Actions give the total potential removals, and the distinct post_ids from Removals give the posts actually removed. Both counts are grouped by action_date.
Lastly, take the average of the daily removal rates and round it to 2 decimal places.
SELECT
    ROUND(AVG(del * 100 / total), 2) AS average_daily_percent
FROM
    -- one row per day: spam-reported posts and how many of them were removed
    (SELECT
        action_date,
        COUNT(DISTINCT a.post_id) AS total,
        COUNT(DISTINCT r.post_id) AS del  -- NULL (never removed) is not counted
    FROM
        Actions a LEFT JOIN Removals r USING (post_id)
    WHERE
        action = 'report' AND extra = 'spam'
    GROUP BY
        action_date) temp
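As a sanity check, the query can be run against the full sample data with Python's built-in `sqlite3` module. This is a sketch, not the judge's environment: SQLite is an assumption, and because SQLite truncates integer division, the sketch multiplies by `100.0` instead of `100` (in MySQL the `/` operator already returns a decimal).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Actions (user_id INT, post_id INT, "
             "action_date TEXT, action TEXT, extra TEXT)")
conn.execute("CREATE TABLE Removals (post_id INT PRIMARY KEY, remove_date TEXT)")

# Sample rows from the problem statement; None is stored as NULL.
actions = [
    (1, 1, "2019-07-01", "view", None), (1, 1, "2019-07-01", "like", None),
    (1, 1, "2019-07-01", "share", None), (2, 2, "2019-07-04", "view", None),
    (2, 2, "2019-07-04", "report", "spam"), (3, 4, "2019-07-04", "view", None),
    (3, 4, "2019-07-04", "report", "spam"), (4, 3, "2019-07-02", "view", None),
    (4, 3, "2019-07-02", "report", "spam"), (5, 2, "2019-07-03", "view", None),
    (5, 2, "2019-07-03", "report", "racism"), (5, 5, "2019-07-03", "view", None),
    (5, 5, "2019-07-03", "report", "racism"),
]
conn.executemany("INSERT INTO Actions VALUES (?, ?, ?, ?, ?)", actions)
conn.executemany("INSERT INTO Removals VALUES (?, ?)",
                 [(2, "2019-07-20"), (3, "2019-07-18")])

query = """
    SELECT ROUND(AVG(del * 100.0 / total), 2) AS average_daily_percent
    FROM (SELECT action_date,
                 COUNT(DISTINCT a.post_id) AS total,
                 COUNT(DISTINCT r.post_id) AS del
          FROM Actions a LEFT JOIN Removals r USING (post_id)
          WHERE action = 'report' AND extra = 'spam'
          GROUP BY action_date) temp
"""
average_daily_percent = conn.execute(query).fetchone()[0]
print(average_daily_percent)  # 75.0
```

SQLite prints the value as `75.0`; the `75.00` in the expected result is the same number with the two decimal places displayed.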