LeetCode Solution, Medium, 192. Word Frequency
192. Word Frequency
Problem Description
Write a bash script to calculate the frequency of each word in a text file words.txt.
For simplicity's sake, you may assume:
- words.txt contains only lowercase characters and space ' ' characters.
- Each word must consist of lowercase characters only.
- Words are separated by one or more whitespace characters.
Example:
Assume that words.txt has the following content:
the day is sunny the the
the sunny is is
Your script should output the following, sorted by descending frequency:
the 4
is 3
sunny 2
day 1
Note:
- Don't worry about handling ties; it is guaranteed that each word's frequency count is unique.
- Could you write it in one line using Unix pipes?
Problem Summary
As the first line of the problem says, the task is simply to count how often each word appears in the file words.txt. Every word in words.txt consists only of lowercase letters, and words are separated by one or more whitespace characters.
Solution Analysis
Example
Bash
Runtime
# Read from the file words.txt and output the word frequency list to stdout.
cat words.txt | tr -s ' ' '\n' | sort | uniq -c | sort -r | awk '{ print $2, $1 }'
- Use cat to read out the entire contents of the file.
- Use tr to replace space characters with \n newline characters. The -s flag squeezes runs of consecutive spaces into a single one; since the problem says words may be separated by more than one whitespace character, omitting -s would leave empty lines (from \n\n).
- Use sort to order the words alphabetically, so that duplicates end up adjacent.
- Use uniq to collapse the duplicate words; with -c it also counts the occurrences, yielding 1 day, 3 is, 2 sunny, 4 the.
- Use sort again, this time with -r to reverse the order, so that the most frequent word comes first. (uniq -c right-aligns its counts, so the reverse lexicographic sort happens to order them numerically as well; sort -nr would state the numeric intent explicitly.)
- Finally, use awk to swap the positions of the word and its count.
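The whole Runtime pipeline can be exercised end-to-end on the sample input. A minimal sketch, which first writes the example content into a temporary words.txt:

```shell
# Write the sample input to a temporary words.txt.
tmpdir=$(mktemp -d)
printf 'the day is sunny the the\nthe sunny is is\n' > "$tmpdir/words.txt"

# Split on whitespace, count duplicates, sort by descending count, swap columns.
result=$(cat "$tmpdir/words.txt" | tr -s ' ' '\n' | sort | uniq -c | sort -r | awk '{ print $2, $1 }')
echo "$result"
# the 4
# is 3
# sunny 2
# day 1

rm -rf "$tmpdir"
```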
Memory
# Read from the file words.txt and output the word frequency list to stdout.
grep -o -E '\w+' words.txt | sort | uniq -c | sort -r | sed -r 's/\s+([0-9]+) ([a-z]+)/\2 \1/'
- Use grep with -o (print only the matched parts, each on its own line, rather than whole matching lines) and -E (interpret the pattern as an extended regular expression). The pattern \w+ matches runs of word characters, so the whitespace between words is skipped entirely.
- The middle of the pipeline, sort | uniq -c | sort -r, works exactly as in the Runtime solution.
- Finally, sed swaps the two columns. In the pattern, \s+ consumes the leading padding, ([0-9]+) captures the count, and ([a-z]+) captures the word; the replacement \2 \1 moves the word (group 2) into the first position and the count (group 1) into the second.
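The Memory variant can be checked the same way on the sample input. Note that sed's -r flag and the \s class are GNU extensions; on BSD/macOS sed the equivalents would be -E and [[:space:]]. A minimal sketch:

```shell
# Write the sample input to a temporary words.txt.
tmpdir=$(mktemp -d)
printf 'the day is sunny the the\nthe sunny is is\n' > "$tmpdir/words.txt"

# grep -o emits each matched word on its own line; sed swaps count and word.
result=$(grep -o -E '\w+' "$tmpdir/words.txt" | sort | uniq -c | sort -r | sed -r 's/\s+([0-9]+) ([a-z]+)/\2 \1/')
echo "$result"
# the 4
# is 3
# sunny 2
# day 1

rm -rf "$tmpdir"
```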