[인프런 워밍업 클럽 4기 - CS] - 3주차 미션 (자료구조와 알고리즘 심화) - 강동훈님의 블로그 - 인프런

문제

여러분은 문서 압축 프로그램을 개발해야 합니다.

'허프만 코딩'이라는 압축 알고리즘을 사용하여,

아래와 같은 결과가 나오도록 구현해보세요.

(필요하다면 사용된 자료구조를 수정해도 좋습니다.)

const huffmanCoding = new HuffmanCoding();
const str = "Lorem ipsum dolor sit amet consectetur adipiscing elit. Consectetur adipiscing elit quisque faucibus ex sapien vitae. Ex sapien vitae pellentesque sem placerat in id. Placerat in id cursus mi pretium tellus duis. Pretium tellus duis convallis tempus leo eu aenean."
let result  = huffmanCoding.compress(str)
console.log(result);

// 결과
[
  [ 'i', '000' ],      [ 'v', '001000' ],
  [ 'q', '001001' ],   [ 'd', '00101' ],
  [ 'l', '0011' ],     [ 'o', '01000' ],
  [ 'b', '0100100' ],  [ 'P', '0100101' ],
  [ 'L', '01001100' ], [ 'E', '01001101' ],
  [ 'C', '01001110' ], [ 'f', '01001111' ],
  [ 'a', '0101' ],     [ 'e', '011' ],
  [ 'm', '10000' ],    [ 'r', '10001' ],
  [ 't', '1001' ],     [ 'u', '1010' ],
  [ 'x', '1011000' ],  [ 'g', '1011001' ],
  [ '.', '101101' ],   [ 'p', '10111' ],
  [ ' ', '110' ],      [ 's', '1110' ],
  [ 'c', '11110' ],    [ 'n', '11111' ]
]

풀이

허프만 알고리즘이란 압축 알고리즘으로 문자의 출현 빈도를 통해 0과 1로 코드로 변환하여 데이터를 압축하는 알고리즘이다.

문자의 출현 빈도와 그에 따른 데이터 저장 자료구조가 필요하기 때문에 최소 힙을 통해 자료를 저장하고 우선순위 큐를 통해 출현 빈도에 따른 코드를 생성해낼 수 있다.

import { Heap } from "./heap.mjs";

class CharacterNode {
    constructor(char, frequency, left = null, right = null) {
        this.char = char;
        this.frequency = frequency;
        this.priority = frequency;
        this.left = left;
        this.right = right;
    }
}

class HuffmanCoding {
    constructor() {
        this.root = null;
        this.frequency = new Map();
    }

    mergingNodes() {
        const heap = new Heap();

        for (let char of this.frequency.keys()) {
            const node = new CharacterNode(char, this.frequency.get(char));
            heap.insert(node);
        }
        while (heap.getSize() > 1) {
            const left = heap.remove().getData();
            const right = heap.remove().getData();

            const merged = new CharacterNode(null, left.frequency + right.frequency, left, right);

            heap.insert(merged);
        }

        this.root = heap.remove().getData();

        return this.root;
    }

    generateCodes(node = this.root, code = "", result = new Map()) {
        if (node === null) return result;

        if (node.left === null && node.right === null) {
            result.set(node.char, code || "0");

            return result;
        }

        this.generateCodes(node.left, code + "0", result);
        this.generateCodes(node.right, code + "1", result);

        return result;
    }

    formatResult(result) {
        const formattedResult = [];
        for (let [char, code] of result.entries()) {
            formattedResult.push([char, code]);
        }

        return formattedResult;
    }

    compress(str) {
        for (let char of str) {
            this.frequency.set(char, (this.frequency.get(char) || 0) + 1);
        }

        this.mergingNodes();
        const result = this.generateCodes();

        return this.formatResult(result);
    }
}

const huffmanCoding = new HuffmanCoding();
const str =
    "Lorem ipsum dolor sit amet consectetur adipiscing elit. Consectetur adipiscing elit quisque faucibus ex sapien vitae. Ex sapien vitae pellentesque sem placerat in id. Placerat in id cursus mi pretium tellus duis. Pretium tellus duis convallis tempus leo eu aenean.";
let result = huffmanCoding.compress(str);
console.log(result);

HuffmanCoding의 가장 중심인 compress 메서드를 통해 매개변수로 받은 문자열을 압축하도록 코드를 작성하였다.

compress는 4가지 단계를 통해 문자열을 압축한다.

frequency 저장
1. 우선순위 큐를 활용하기 위해선 어떤 기준으로 우선순위를 할 것인지 정해야하며 허프만 알고리즘은 문자 빈도에 따라 압축 코드가 생성된다.
2. 매개변수로 받은 문자열 값을 반복문을 통해 각 문자를 순회하며 Map 의 key로 문자, value로 빈도 수를 저장해준다.
힙 노드 병합
1. mergingNodes 메서드가 힙 노드를 병합해주는 기능을 수행한다.
2. 문자 빈도가 저장된 frequency Map을 순회하며 우선순위에 따라 Heap에 저장해준다.
3. 허프만 알고리즘은 가장 빈도가 적은 두 문자를 병합한 노드를 생성하고 해당 노드의 자식으로 병합된 노드들을 이어준다.
  1. A {frequency: 1}, B {frequency: 2} 일 경우에는 Null { frequency: 3, left: A, right: B } 이런 노드를 만들어서 Heap에 삽입해준다.
  2. 해당 작업은 while문을 통해 구현되기에 Heap 클래스에서 전체 힙의 사이즈를 알 수 있는 Heap.getSize() 메서드를 만들어주었다.
  3. return 1 + this.getSize(node.getLeftSubTree()) + this.getSize(node.getRightSubTree());
4. 모든 노드를 병합하여 허프만 알고리즘을 위한 트리 구조를 만들고 해당 트리의 root를 반환시켜준다.
코드 생성
1. 반환된 트리 구조의 root부터 다시 순회를 해가면서 코드를 생성해주는 기능이다.
2. 해당 트리 구조의 노드는 '문자 노드'와 '병합 노드'로 분류되며 병합 노드는 우선순위가 문자 노드보다 높기 때문에 모든 문자 노드는 리프노드가 된다.
3. 코드와 문자를 저장할 result Map을 선언하고, 허프만 알고리즘에 따라 왼쪽 간선으로 이동하면 0을 오른쪽 간선으로 이동하면 1을 코드에 추가하며 재귀적으로 각 노드를 순회한다.
4. 마지막 리프노드(문자노드)에 다다르게 되면 해당 코드를 result에 저장하게 된다.
출력 포매팅
1. 그렇게 출력된 result는 현재 Map 자료구조에 저장되어 있기 때문에 원하는 출력값으로 포매팅을 진행해주었다.
2. 출력값을 빈 배열로 선언해주고 Map을 순회하면서 key와 value를 하나의 배열에 저장하여 출력 배열에 다시 push해주었다.

후기

기존에 배웠던 자료구조를 통해 직접 새로운 알고리즘을 공부하고 실제 구현까지 해볼 수 있는 뜻깊은 경험이었다. 분명 알고리즘을 구현하는 과정까지는 쉽지 않았지만 기존에 배웠던 자료구조를 한 번 더 복습할 수 있었고 세부적인 동작과정까지 한 번 천천히 돌아볼 수 있었던 계기가 되었다.

역시 이론을 배우고 연습하는 것보다 뭔가 만들기 위해서 활용하고 그 과정에서 여러 시행착오들을 겪는 경험들이 해당 이론과 개념을 더 깊이 있게 이해하고 나아가 활용할 수 있는 기회를 다시 만들어낸다는 것을 다시 한 번 깨달을 수 있었다.