선형 데이터 구조와 탐색

서론

선형 데이터 구조

자료의 순서를 유일하게 결정할 수 이쓴 구조를 말 한다.
수학적으로 n 번째 자료를 탐색한 다음 n + 1 번째로 탐색할 자료가 유일한 구조다.

시각적으로는 위와 같다.
왼쪽이 n 다음 n + 1 이 유일하므로 선형구조이고,
오른쪽이 n 다음 n + 1 이 복수이므로 비선형구조이다.

형태에서 느껴지듯 주로 리스트와 배열이 이에 해당하고,

조금 더 확장하면 위와 같이 다차원의 배열과 리스트도 선형구조에 포함된다.

본론

선형 데이터의 탐색

이러한 선형구조의 데이터를 탐색하는 방법에는 크게 두 가지 방식이 있다.

순차탐색 (Linear Search)

첫번째 자료부터 마지막 자료까지 순차적으로 탐색하는 방식으로,
최종적으로 자료의 수 만큼 탐색하게 된다.

따라서 계산량은 O(n) 이 된다.

Python 코드로는 다음과 같다.

for index, value in enumerate(list):
    if value == target:
        return index #or value

이분탐색 (Binary Search)

만약 리스트가 오름차순이건 내림차순이건 정렬이 돼 있다면,
이분탐색을 사용해 조금 더 효율적으로 탐색을 진행할 수 있다.
동작 순서는 다음과 같다.

정렬 시킨다.
데이터의 중간 값을 비교한다.
찾으려는 값과 동일하다면 종료한다.
찾으려는 값보다 작다면 이하의 범위를 버린다.
찾으려는 값보다 크다면 이상의 범위를 버린다.
수정된 범위를 대상으로 반복한다.

이해를 위한 동작 예시는 다음과 같다.

위와 같은 배열에서 5를 찾는다고 가정하자

우선은 오름차순이건 내림차순이건 정렬을 진행한다.

리스트의 중앙을 찾아 해당 값을 먼저 비교한다.
해당 과정은 수학적으로 (head + tail) // 2 로 표현할 수 있다.

중앙의 값이 찾으려는 값이 아니며 작은 값이므로 그 이전의 범위는 탐색 할 필요가 없다.
따라서 탐색의 범위를 중앙의 값 이후로 변경한다.
해당 과정은 수학적으로 list[m] < target 으로 표현할 수 있다.
반대의 경우 list[m] > target 이 된다.

변경한 범위의 중앙을 찾아 해당 값을 비교한다.

해당 자료는 우리가 찾으려는 5와 같으므로 탐색을 종료한다.
해당 과정은 수학적으로 list[m] == target 으로 표현할 수 있다.

이분탐색의 계산량은 O(log₂ n) 이 된다.

이를 위한 Python 코드는 다음과 같다.

while head < tail:
    mid = (head + tail) // 2
    if target == list[mid]:
        return mid #or list[mid]
    elif target < list[mid]:
        tail = mid - 1
    else target > list[mid]:
        head = mid + 1

head는 데이터의 처음, tail은 데이터의 끝 index를 의미한다.
중간의 값이 작다면 tail을 중간의 index를 기준으로 1 줄여 범위를 수정한다.
중간의 값이 크다면 head를 중간의 index를 기준으로 1 키워 범위를 수정한다.

Upper bound & Lower bound (상계, 하계)

순차탐색도 이분탐색도 충분히 좋지만
이 둘은 target이 데이터에 존재하지 않는다면 반환하지 못한다는 한계를 갖고 있다.
하지만 Lower bound와 Upper bound는
각각 target 보다 큰 가장 작은 값, target 보다 작은 가장 큰 값을 반환할 수 있다는 장점이 있다.

Upper bound

기본적으로 이분탐색과 같은 방식을 사용한다.
target이 5라고 가정했을 때 수정된 범위의 중간 값과 비교를 진행한다.
이분탐색이라면 target과 일치하므로 탐색을 종료하지만 Upper bound는 조금 더 나아간다.

일치하는 중간 값 까지의 범위를 제외하고 탐색 범위를 수정한다.
해당 과정은 수학적으로 list[mid] <= target 으로 표현할 수 있다.

수정된 범위의 중간값을 비교한다.

중간 값이 target 보다 크므로 해당 값 까지의 범위를 제외하고 탐색 범위를 수정한다.

수정된 범위의 중간 값을 비교한다.
해당 값이 target과 동일하다.
또한 범위에 해당하는 head와 tail이 동일하다.
따라서 이번에 일치한 값이 target 보다 작거나 같은 값이 등장하는 최대 index 라고 판단할 수 있다.
즉 Upper bound는 5보다 큰 값이 처음 등장할 index인 10이 된다.

이를 위한 Python 코드는 아래와 같다.

while head < tail:
    mid = (head + tail) // 2
    if target < list[mid]:
        tail = mid
    else:
        head = mid + 1
return head

Lower bound

동작 방식은 Upper bound와 동일하며,
조건만 살짝 수정된다.

이분탐색의 방법으로 동일하게 진행한다.
수정된 범위의 중간 값을 비교한다.

해당 자료가 target과 같지만 target 이상의 값이 처음 등장하는 index라는 보장이 존재하지 않는다.
따라서 그 이상의 자료를 제외하고 범위를 수정한다.
해당 과정은 list[mid] <= target 으로 표현할 수 있다.

수정된 범위의 중간 값을 비교한다.

중간 값이 target보다 작으므로 이하의 자료를 제외하도록 범위를 수정한다.
수정된 범위 내에 탐색할 자료가 유일하고, 해당 자료는 target과 같거나 크다.
해당 과정은
따라서 해당 index가 target 이상의 값이 존재하는 최소 index인 Lower bound 이다.

이를 위한 Python 코드는 아래와 같다.

while head < tail:
    mid = (tail + head) // 2
    if target <= list[mid]:
        tail = mid
    else:
        head = mid + 1
return head

Lower bound와 Upper bound는 선형 구조의 데이터 탐색시
대상 값이 존재하지 않더라도 탐색이 실패하지 않는다는 장점과,
데이터 내의 값이 여러개가 존재하더라도 유일한 답을 반환한다는 장점이 존재한다.

이는 이분 탐색을 진행할 시
A에서 5를 탐색한 후 나온 결과가 B에서 탐색한 결과가 각각 9와 8로 다른 것과 다르게
Lower bound나 Upper bound로 탐색을 진행할 시
항상 8과 10을 반환 한다는 것을 쉽게 확인 할 수 있다.

또한 Upper bound에서 Lower bound를 빼는 간단한 연산으로,
해당 선형 데이터에서 찾으려고자 하는 데이터가 몇 개나 존재하는지 간단하게 알아낼 수 있다.
실제 A에서 탐색한 5의 Upper bound와 Lower bound를 빼게 되면
'10 - 8 = 2'가 되므로 존재하는 5의 수와 동일하다.

bisect 함수

이렇게 유용한 Lower bound와 Upper bound 이지만 항상 머리에 집어 넣고 같은 코딩을 하는 것은 수고 스러운 일이다.
따라서 이런 기본적인 알고리즘은 기본 라이브러리로 제공되는 경우가 많고, 이는 Python도 마찬가지이다.

bisect — Array bisection algorithm — Python 3.10.4 documentation

bisect — Array bisection algorithm Source code: Lib/bisect.py This module provides support for maintaining a list in sorted order without having to sort the list after each insertion. For long lists of items with expensive comparison operations, this can

docs.python.org

Lower bound와 Upper bound는 bisect.bisect_left와 bisect.bisect_right 라는 이름으로.
알고리즘의 구현 없이 바로 사용할 수 있다.

결론

이 날 이를 사용한 문제 세 문제를 다뤘다.

Q3

Q3_순차탐색

import datetime as time

#Q3

#generate list
num_list = []
amount = int(input("input amount of values bwtween 2 to 1,000,000: "))

#input values
inputVals = input("input {} values must under 100,000,000 with space: ".format(amount))
inputVals_slice = inputVals.split()

#convert input string to int
for value in inputVals_slice:
  num_list.append(int(value))

#list sorting
num_list.sort()

#input target value
target_num = int(input("input number for search: "))

def Q3():
  result = None
  #process
  for index, target in enumerate(num_list):
    if target == target_num:
      result = index
      print(result + 1)
      break

  if result is None:
    print(-1)

#verification answer
start = time.datetime.now()
Q3()
end = time.datetime.now()

print(end - start)

결과

input amount of values bwtween 2 to 1,000,000: 3
input 3 values must under 100,000,000 with space: 2 5 7
input number for search: 3
-1
0:00:00.000913

Q3_이분탐색

import datetime as time
import bisect

#Q3_re
num_list = []

def binarySearch(x, list):
  head, tail = 0, len(list)
  result = -1

  while head < tail:
    mid = (tail + head)//2
    if x == list[mid]:
      result = mid
      break
    elif x < list[mid]:
      tail = mid
    else:
      head = mid + 1
  
  return result

#generate list
amount = int(input("input value for maximum amount: "))

#input values
inputVals = input("input {} values must under 100,000,000 with space: ".format(amount))
inputVals_slice = inputVals.split(" ")

#convert input string to int
for value in inputVals_slice:
  num_list.append(int(value))

#list sorting
num_list.sort()

#input target value
target_num = int(input("input number for search: "))

start = time.datetime.now()
print(binarySearch(target_num, num_list))
end = time.datetime.now()
print(end - start)

결과

input value for maximum amount: 3
input 3 values must under 100,000,000 with space: 2 5 7
input number for search: 3
-1
0:00:00.000527

기기 상태에 따라 속도가 달라지긴 하지만
최대 탐색을 요구하는 존재하지 않는 값의 탐색을 진행할 경우
이분탐색이 순차탐색보다 소폭 빠른 것을 확인할 수 있었다.

Q4

Q4_선형탐색

import datetime as time

#Q4
#generate list
num_list = []
amount = int(input("input amount of values bwtween 2 to 1,000,000: "))

#input values
inputVals = input("input {} values must under 100,000,000 with space: ".format(amount))
inputVals_slice = inputVals.split(" ")

#convert input string to int
for value in inputVals_slice:
  num_list.append(int(value))

#list sorting
num_list.sort()

#input target value
target_num = int(input("input number for search: "))

def Q4():
  result = None
  #process
  for index, target in enumerate(num_list):
    if target >= target_num:
      result = index
      print(result + 1)
      break

  if result is None:
    print(amount + 1)

#verification answer
start = time.datetime.now()
Q4()
end = time.datetime.now()
print(end - start)

결과

input amount of values bwtween 2 to 1,000,000: 8
input 8 values must under 100,000,000 with space: 1 2 3 5 7 9 11 15
input number for search: 19
9
0:00:00.000648

Q4_이분탐색

import datetime as time
import bisect

#Q4_re

#generate list
num_list = []
amount = int(input("input amount of values bwtween 2 to 1,000,000: "))

#input values
inputVals = input("input {} values must under 100,000,000 with space: ".format(amount))
inputVals_slice = inputVals.split(" ")

#convert input string to int
for value in inputVals_slice:
  num_list.append(int(value))

#list sorting
num_list.sort()

#input target value
target_num = int(input("input number for search: "))

#process
def lowerBound(x, list):
  head, tail = 0, len(list)

  while head < tail:
    mid = (tail + head)//2
    if x <= list[mid]:
      tail = mid
    else:
      head = mid + 1

  if head == 0:
    return len(list) + 1
  else:
    return head + 1

#verification answer

start = time.datetime.now()
print(lowerBound(target_num, num_list))
end = time.datetime.now()
print(end - start)

결과

input amount of values bwtween 2 to 1,000,000: 8
input 8 values must under 100,000,000 with space: 1 2 3 5 7 9 11 15
input number for search: 19
9
0:00:00.000138

Q5

Q5_순차탐색

import datetime as time

#Q5
#generate list
num_list = []
amount = int(input("input amount of values bwtween 2 to 1,000,000: "))

#input values
inputVals = input("input {} values must under 100,000,000 with space: ".format(amount))
inputVals_slice = inputVals.split(" ")

#convert input string to int
for value in inputVals_slice:
  num_list.append(int(value))

#list sorting
num_list.sort()

#input target value
target_num = int(input("input number for search: "))

#process
def ans():
  result = None
  for index, target in enumerate(num_list):
    if target > target_num:
      result = index
      break

  if result is None:
    print(amount + 1)
  else:
    if result > 0:
      print(result)
    else:
      print(result + 1)

start = time.datetime.now()
ans()
end = time.datetime.now()
print(end - start)

결과

input amount of values bwtween 2 to 1,000,000: 8
input 8 values must under 100,000,000 with space: 1 2 7 7 7 7 11 15
input number for search: 7
7
0:00:00.000113

Q5_이분탐색

import datetime as time
#Q5_re

#generate list
num_list = []
amount = int(input("input amount of values bwtween 2 to 1,000,000: "))

#input values
inputVals = input("input {} values must under 100,000,000 with space: ".format(amount))
inputVals_slice = inputVals.split(" ")

#convert input string to int
for value in inputVals_slice:
  num_list.append(int(value))

#list sorting
num_list.sort()

#input target value
target_num = int(input("input number for search: "))

#process

def upperBound(x, list):
  head, tail = 0, len(list)

  while head < tail:
    mid = (tail + head)//2
    if x < list[mid]:
      tail = mid
    else:
      head = mid + 1

  return head + 1

start = time.datetime.now()
print(upperBound(target_num, num_list))
end = time.datetime.now()
print(end - start)

결과

input amount of values bwtween 2 to 1,000,000: 8
input 8 values must under 100,000,000 with space: 1 2 7 7 7 7 11 15
input number for search: 7
7
0:00:00.000151

자료의 수가 적은 상태에선 대체로 이분탐색이 순차탐색 보다 느린 모습을 보여준다.
하지만 자료가 늘어나거나 탐색이 최대로 진행되는 상황에서는
이분탐색이 눈에 띄게 빨라지거나 동등해 지는 것을 확인 할 수 있다.

저작자표시 비영리 변경금지 (새창열림)

'학습 노트 > 알고리즘 (Python)' 카테고리의 다른 글

Stack #2 (0)	2022.06.23
Stack #1 (0)	2022.06.04
Q1. 방 배정 하기 (0)	2022.04.22
Python #3 (0)	2022.04.18
Python #2 (0)	2022.04.15

걔랑계란's Chillog

선형 데이터 구조와 탐색

서론