What emojis are most popular?
This week I added some emojis for College Confidential members to use when reacting to other people’s posts. To help me decide which emojis to add, I wrote a little query to see which emojis are most commonly used in our Discourse posts:
-- [params]
-- text :emoji_regex = :[a-z]+:|[^A-z]-\)
select emoji,
count(*)
from (SELECT substring(raw from :emoji_regex) emoji
from posts
TABLESAMPLE system(1)
where length(substring(raw from :emoji_regex)) < 50
and user_id > 0
) emoji_sample
group by emoji
order by count(*) desc;
I’m kinda proud of the query. Let’s break it down line-by-line:
-- [params]
This line is interpreted as a comment by PostgreSQL because it starts
with --
. Everything after the dashes is ignored by the SQL
interpreter. But this particular comment is interpreted by the
Discourse Data
Explorer
as the start of a list of parameters to the query. These are variables
that can be filled in at run time.
-- text :emoji_regex = :[a-z]+:|[^A-z]-\)
I’m only using one variable—:emoji_regex
. It’s just the regex
I’m using to detect emojis in posts. I’m making it a parameter because
I need to use it twice and I didn’t want to have to edit it in two
places. The regex itself can be divided into two parts separated by
|
, which works like an or
operator. The first part (:[a-z]+:
)
finds emojis like “:smiley:”. The second ([^A-z]-\)
) finds a subset
of emoticons such
as “:-)”. It’s far from perfect (“{-)” is probably a false positive),
but it served my purpose.
select emoji,
count(*)
When I write SQL, the select clause is often one of the last things I
write. Often the first cut uses count(*)
to verify that the query is
producing some sort of result. Later I frequently select *
so that I
can get an idea of what the columns look like. In this case, I’m
selecting an emoji and the count of rows that contain that emoji. It’s
possible to infer that there will be a group by emoji
later on.
from ( ...
Normally I’d select from a table, but in this case, I’m selecting from a subquery. It’s a useful technique because you can pretend that the subquery is just like a regular table, but you can set up the columns any way you need.
SELECT substring(raw from :emoji_regex) emoji
The capital SELECT
is pre-populated when creating a new query in
Discourse Data Explorer. I typically use lowercase in my queries, so
this is an indicator of where I started working. I’m selecting the
first emoji from the raw
post column using the regex from
earlier. If a post has several emoji, this query will only select the
first one.
from posts
posts
is the only real table I use in this query. Typically a query
will join
two or more tables. It just happens that this time I only
need the raw
column from posts
.
TABLESAMPLE system(1)
Scanning all posts
will cause the query to time out, so I’m just
sampling 1% of the data. Initially I used limit 1000
in the
subquery. But I wanted a random sample, so I used TABLESAMPLE
in
the from
clause.
where length(substring(raw from :emoji_regex)) < 50
I only want to get posts that have at least one emoji. I added < 50
to eliminate cases where two colons were used as regular
punctuation. I could have built this into the regex instead, I
suppose.
and user_id > 0
Before I added this restriction, I got a lot of “:gift:” emojis coming
from an automated response. Automation frequently screws up data. By
convention automated accounts have negative user_id
s, so it’s easy
to filter them out.
) emoji_sample
This ends the subquery and gives it a name. The name doesn’t matter this time, but it’s required and it can be handy if you are joining another table.
group by emoji
Here’s the foreshadowed group by
clause.
order by count(*) desc;
Finally, it’s always a good idea to order the results. Often when
doing a count(*)
, that will be the order your care about. Often
you’ll be most interested in the most frequent results, so adding
desc
is really common.
Since you’ve read this far, here are the most common emojis I found in my sample:
emoji | count |
---|---|
:-) | 602 |
:rolleyes: | 188 |
;-) | 177 |
:eek: | 137 |
:cool: | 100 |
:smile: | 83 |
:confused: | 62 |
:smiley: | 33 |
:wink: | 31 |
:mad: | 26 |
:frowning: | 18 |
:joy: | 17 |
:grinning: | 17 |
@-) | 15 |
:rofl: | 13 |
:grin: | 12 |
:blush: | 10 |
8-) | 6 |
:sob: | 6 |
/-) | 5 |
:disappointed: | 5 |
:cry: | 5 |
:shrug: | 5 |
–) | 4 |
.-) | 4 |
:grimacing: | 4 |
:sigh: | 4 |
:laughing: | 4 |
:relaxed: | 3 |
:thinking: | 3 |
:skull: | 3 |
=-) | 3 |
:lol: | 3 |
:heart: | 2 |
:https: | 2 |
:v: | 2 |
-) | 2 |
:embarrass: | 2 |
:herb: | 2 |
:triumph: | 2 |
:sunglasses: | 2 |
:pensive: | 2 |
:tada: | 2 |
:pray: | 2 |
:p: | 1 |
:grade: | 1 |
:rollseyes: | 1 |
:popcorn: | 1 |
:gift: | 1 |
:mosquito: | 1 |
:women: | 1 |
:expressionless: | 1 |
:confounded: | 1 |
:golf: | 1 |
:wave: | 1 |
>-) | 1 |
:hugs: | 1 |
:weary: | 1 |
:eyes: | 1 |
:sparkles: | 1 |
’-) | 1 |
1-) | 1 |
:refresh: | 1 |
!-) | 1 |
3-) | 1 |
:blossom: | 1 |
:crown: | 1 |
:shrugs: | 1 |
:o: | 1 |
6-) | 1 |
{-) | 1 |
:shruggs: | 1 |
:heartpulse: | 1 |
:worried: | 1 |
4-) | 1 |
:anguished: | 1 |
:fireworks: | 1 |
:smirk: | 1 |
:nonexistence: | 1 |
:neutral: | 1 |
9-) | 1 |
:grumble: | 1 |
:con: | 1 |
:sweat: | 1 |