LeetCode Solution, Medium, 192. Word Frequency


Problem Description

Write a bash script to calculate the frequency of each word in a text file words.txt.

For simplicity's sake, you may assume:

  • words.txt contains only lowercase characters and space ' ' characters.
  • Each word must consist of lowercase characters only.
  • Words are separated by one or more whitespace characters.

Example:

Assume that words.txt has the following content:

the day is sunny the the
the sunny is is

Your script should output the following, sorted by descending frequency:

the 4
is 3
sunny 2
day 1

Note:

  • Don't worry about handling ties, it is guaranteed that each word's frequency count is unique.
  • Could you write it in one-line using Unix pipes?

Problem Translation

The task is exactly what the first line of the problem says: compute how often each word appears in the file words.txt. Every word in words.txt consists only of lowercase letters, and words are separated by one or more space characters ' '.

Solution Analysis

Examples

Bash

Runtime

# Read from the file words.txt and output the word frequency list to stdout.
cat words.txt | tr -s ' ' '\n' | sort | uniq -c | sort -nr | awk '{ print $2, $1 }'
  1. cat reads out the entire file.
  2. tr replaces each space character with a \n newline; the -s flag squeezes runs of repeated characters into one (the problem says words may be separated by more than one space, and without -s those runs would turn into \n\n, producing empty lines).
  3. sort orders the words alphabetically, so duplicates end up on adjacent lines.
  4. uniq collapses the repeated words; with -c it also counts how many times each one occurred, producing
    1 day
    2 is
    2 sunny
    4 the
    
  5. sort again, this time with -n to compare the counts numerically and -r to reverse the order, so the most frequent word comes out on top.
  6. Finally awk swaps the positions of the word and its count.
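The whole pipeline can be tried end to end. A minimal sketch, assuming the example text from the problem is written to a temporary copy of words.txt (the temp directory is purely for the demo; the actual script just reads words.txt from the current directory):

```shell
# Recreate the problem's example words.txt in a temp directory.
tmpdir=$(mktemp -d)
printf 'the day is sunny the the\nthe sunny is is\n' > "$tmpdir/words.txt"

# Split on spaces, count duplicates, sort counts descending, swap fields.
cat "$tmpdir/words.txt" | tr -s ' ' '\n' | sort | uniq -c | sort -nr \
    | awk '{ print $2, $1 }'

rm -rf "$tmpdir"
```

This prints the four lines from the problem's expected output: the 4, is 3, sunny 2, day 1.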

Memory

# Read from the file words.txt and output the word frequency list to stdout.
grep -o -E '\w+' words.txt | sort | uniq -c | sort -nr | sed -r 's/\s+([0-9]+) ([a-z]+)/\2 \1/'
  1. grep with -o (print only the matched parts of each line, one match per output line, instead of whole lines) and -E (use extended regular expressions); the pattern \w+ matches runs of word characters, so whitespace is skipped entirely and each word lands on its own line.
  2. The middle of the pipeline, sort | uniq -c followed by the descending sort, works the same way as in the Runtime solution.
  3. Finally, sed swaps the two fields. The expression can be read in two parts:

    1. \s+([0-9]+) ([a-z]+)
    2. \2 \1

    Whatever matches the first capture group ([0-9]+) moves to the second position, and vice versa; the leading \s+ also absorbs the blanks that uniq -c pads its counts with.
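The field swap can be watched in isolation by feeding sed a single line shaped like uniq -c output (the sample line below is made up for illustration; note that -r and \s are GNU sed features, just as in the solution above):

```shell
# `uniq -c` emits lines like "      4 the": leading blanks, count, word.
# The sed expression strips the blanks and swaps count and word.
echo '      4 the' | sed -r 's/\s+([0-9]+) ([a-z]+)/\2 \1/'
# Prints: the 4
```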
