# LeetCode Solution, Medium, 192. Word Frequency

# [192. Word Frequency](https://leetcode.com/problems/word-frequency/)

### 題目敘述

Write a bash script to calculate the frequency of each word in a text file `words.txt`.

For simplicity sake, you may assume:

- `words.txt` contains only lowercase characters and space `' '` characters.
- Each word must consist of lowercase characters only.
- Words are separated by one or more whitespace characters.

**Example:**

Assume that `words.txt` has the following content:

    the day is sunny the the
    the sunny is is

Your script should output the following, sorted by descending frequency:

    the 4
    is 3
    sunny 2
    day 1

**Note:**

- Don't worry about handling ties, it is guaranteed that each word's frequency count is unique.
- Could you write it in one-line using [Unix pipes](http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO-4.html)?

#### 題目翻譯

題目簡單的需求就像是第一行說的，從 `words.txt` 這檔案中計算出每個單字出現的頻率。其中在 `words.txt` 的每個單字都只會是小寫字母，並用 `' '` 一到兩個空白字元來區隔。

### 解法解析

#### 範例

##### Bash

**Runtime**

```bash
# Read from the file words.txt and output the word frequency list to stdout.
cat words.txt | tr -s ' ' '\n' | sort | uniq -c | sort -r | awk '{ print $2, $1 }'
```

1. 用 `cat` 讀出所有的內容
2. 用 `tr` 來替換空白字元為`\n` 換行字元，另外搭配 `-s` 將多個連續的空白字元換成一個(因為題目有說到會是用一到兩個空白字元來區隔，如果沒有搭配 `-s` 會變成 `\n\n`)
3. 用 `sort` 按照開頭字母排序
4. 用 `uniq` 來處理重複的單字，搭配 `-c` 可以計算重複出現的字數。會形成
   ```
   1 day
   2 is
   2 sunny
   4 the
   ```
5. 再次使用 `sort`，這次是多使用了 `-r` 是為了讓他排序反轉，變成出現最多的在最上面
6. 最後用 `awk` 將單字和出現次數的位置互換

**Memory**

```bash
# Read from the file words.txt and output the word frequency list to stdout.
grep -o -E '\w+' words.txt | sort | uniq -c | sort -r | sed -r 's/\s+([0-9]+) ([a-z]+)/\2 \1/'
```

1. 使用 `grep` 搭配 `-o`(只輸出符合條件的行數) 和 `-E`(使用 Regex 做篩選)，可以看到用的 Regex 是 `\w+` 判斷是不是字母，所以會略過空白字元
2. 後面的操作 `sort | uniq -c | sort -r` 跟 Runtime 是一樣的操作
3. 最後則是使用了 `sed` 來做位置的互換。可以拆兩個部分來看
  1. `+([0-9]+)` 和 `([a-z]+)`
  2. `\2` 和 `\1`  

  符合第一個條件 `+([0-9]+)` 換到第二個位置，反之亦然