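Chapter 6. Discrete Probability Distributions

Goal of inferential statistics: estimate the mean and variance of a population from a limited sample.

Parametric methods: assume that, given the nature of the population, it follows a probability distribution of a particular form, then estimate the parameters that determine that distribution's expected value and variance.
Nonparametric methods: make no assumption about the population's probability distribution.

With a parametric method only the parameters have to be estimated, so estimation is simple and the resulting model is easy to analyze!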
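As a rough illustration of the parametric approach, the sketch below assumes the population follows a Bernoulli distribution (introduced in 6.1) and estimates only its parameter $p$ from a sample. The data, the seed, and the names `sample` and `p_hat` are made up for this example.

```python
import numpy as np

# Hypothetical data: 1,000 zero/one observations generated with a "true" p of 0.3.
rng = np.random.default_rng(0)
sample = rng.binomial(1, 0.3, size=1000)

# Parametric step: assume a Bernoulli(p) population and estimate only p.
p_hat = sample.mean()

# Under that assumption, the mean and variance follow from p alone.
est_mean = p_hat
est_var = p_hat * (1 - p_hat)

print(f"p_hat = {p_hat:.3f}, mean = {est_mean:.3f}, variance = {est_var:.3f}")
```

Once the form of the distribution is fixed, a single estimated number is enough to describe both the mean and the variance.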

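This chapter introduces several probability distributions, with a focus on discrete ones.
It also explains the situations in which each distribution is used.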

6.1. Bernoulli distribution

A distribution in which the random variable can take only the values 0 and 1
The probability of 1 is $p$, and the probability of 0 is $1-p$
Parameter: $p$

$f(x)= \begin{cases} p^x (1-p)^{(1-x)} & (x \in \{0, 1\}) \\ 0 & (\text{otherwise}) \end{cases}$

Expected value and variance

$E(X)=p$

$V(X)=p(1-p)$
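Both results follow from the definitions of expected value and variance applied to $f(x)$, using $f(0)=1-p$ and $f(1)=p$:

$E(X) = \sum_{x \in \{0, 1\}} x f(x) = 0 \cdot (1-p) + 1 \cdot p = p$

$V(X) = \sum_{x \in \{0, 1\}} (x-p)^2 f(x) = p^2 (1-p) + (1-p)^2 p = p(1-p)\{p + (1-p)\} = p(1-p)$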

import numpy as np


def Bern(p):
    x_set = np.array([0, 1])
    
    def f(x):
        if x in x_set:
            return p ** x * (1 - p) ** (1 - x)
        else:
            return 0
    
    return x_set, f

p = 0.3
X = Bern(p)

print(X)

(array([0, 1]), <function Bern.<locals>.f at 0x000001CE2420B598>)

def E(X):
    x_set, f = X
    
    return np.sum([x_k * f(x_k) for x_k in x_set])


def V(X):
    x_set, f = X
    
    mean = E(X)
    
    return np.sum([(x_k - mean) ** 2 * f(x_k) for x_k in x_set])


def check_prob(X):
    x_set, f = X
    prob = np.array([f(x_k) for x_k in x_set])
    
    assert np.all(prob >= 0), "minus probability" 
    # assert raises AssertionError when the condition that follows is not true
    # numpy.all() is True only if every element satisfies the condition
    
    prob_sum = np.round(np.sum(prob), 2)
    # Round to absorb floating-point error in the sum
    assert prob_sum == 1, f"sum of probability {prob_sum}"
    # An f-string keeps the message readable
    
    print(f"expected value {E(X): .4}")
    print(f"variance {V(X): .4}")

check_prob(X)

expected value  0.3
variance  0.21
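The rounding inside check_prob is there because sums of floating-point numbers are rarely exact. The sketch below shows the effect with ten probabilities of 0.1 and compares rounding with `np.isclose`, which is one possible alternative and not something the original code uses.

```python
import numpy as np

# Ten probabilities that should sum to exactly 1.
prob = [0.1] * 10
total = sum(prob)

print(total)                    # slightly below 1 because 0.1 has no exact binary form
print(total == 1.0)             # exact comparison fails
print(np.round(total, 2) == 1)  # the rounding approach used in check_prob
print(np.isclose(total, 1.0))   # tolerance-based comparison
```

Rounding to two decimals is the looser of the two checks: it accepts any sum from 0.995 upward, which also lets a distribution with a truncated support pass.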

import matplotlib.pyplot as plt


def plot_prob(X):
    x_set, f = X
    prob = np.array([f(x_k) for x_k in x_set])
    
    fig = plt.figure(figsize = (15, 5))
    ax  = fig.add_subplot(111)
    
    # Bars show the probability function; the vertical line marks E(X)
    ax.bar(x_set, prob, color = "0.5", label = "probability")
    ax.vlines(E(X), 0, 1, label = "mean")
    
    ax.set_xticks(np.append(x_set, E(X)))
    ax.set_ylim(0, 1)
    ax.set_xlabel("x")
    ax.set_ylabel("p(x)")
    ax.legend()
    ax.grid(True)
    
    plt.show()

plot_prob(X)

p = 0.5
X = Bern(p)

check_prob(X)
plot_prob(X)

expected value  0.5
variance  0.25

p = 0.7
X = Bern(p)

check_prob(X)
plot_prob(X)

expected value  0.7
variance  0.21

6.2. Binomial distribution

The distribution of the number of successes in $n$ Bernoulli trials, each with success probability $p$
Parameters: success probability $p$, number of trials $n$

$f(x)= \begin{cases} {}_nC_x p^x (1-p)^{(n-x)} & (x \in \{0, 1, \ldots, n\}) \\ 0 & (\text{otherwise}) \end{cases}$

${}_nC_x = {n! \over x!(n-x)!}$

Expected value and variance

$E(X) = np$

$V(X) = np(1-p)$
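The definition can be checked by simulation: run $n$ Bernoulli trials many times and count the successes in each run. This is only a sketch; the number of repetitions, the seed, and the variable names are arbitrary choices for illustration.

```python
import numpy as np

n, p = 10, 0.3
rng = np.random.default_rng(0)

# Each row is one experiment of n Bernoulli(p) trials (True = success).
trials = rng.random((100_000, n)) < p

# The number of successes per experiment should follow Bin(n, p).
successes = trials.sum(axis=1)

print(f"sample mean     {successes.mean():.3f}  (np      = {n * p:.2f})")
print(f"sample variance {successes.var():.3f}  (np(1-p) = {n * p * (1 - p):.2f})")
```

The sample mean and variance should land close to $np$ and $np(1-p)$, with small differences due to sampling noise.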

from scipy.special import comb


def Bin(n, p):
    x_set = np.arange(n + 1)
    
    def f(x):
        if x in x_set:
            return comb(n, x) * p ** x * (1 - p) ** (n - x)
        
        else:
            return 0
    
    return x_set, f

n = 10
p = 0.3
X = Bin(n, p)

print(X)

(array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10]), <function Bin.<locals>.f at 0x000001CE25DCF400>)

check_prob(X)

expected value  3.0
variance  2.1

plot_prob(X)

n = 10
p = 0.5
X = Bin(n, p)

check_prob(X)

plot_prob(X)

expected value  5.0
variance  2.5

n = 10
p = 0.7
X = Bin(n, p)

check_prob(X)

plot_prob(X)

expected value  7.0
variance  2.1

6.3. Geometric distribution

The distribution of the number of Bernoulli trials repeated until the first success
Parameter: success probability $p$

$f(x)= \begin{cases} (1-p)^{(x-1)}p & (x \in \{1, 2, 3, \ldots\}) \\ 0 & (\text{otherwise}) \end{cases}$

Expected value and variance

$E(X)={1 \over p}$

$V(X)={(1-p) \over p^2}$

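def Ge(p):
    # The true support {1, 2, 3, ...} is infinite, so it is truncated at 24 here.
    # Sums over x_set therefore fall slightly short of 1, E(X), and V(X),
    # and the gap becomes noticeable when p is small.
    x_set = np.arange(1, 25)
    
    def f(x):
        if x in x_set:
            return (1 - p) ** (x - 1) * p
        
        else:
            return 0
    
    return x_set, f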

p = 0.5
X = Ge(p)

print(X)

(array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23, 24]), <function Ge.<locals>.f at 0x000001CE25F747B8>)

check_prob(X)

expected value  2.0
variance  2.0

plot_prob(X)

p = 0.2
X = Ge(p)

check_prob(X)
plot_prob(X)

expected value  4.863
variance  17.17
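For $p = 0.2$ the formulas give $E(X) = 1/p = 5$ and $V(X) = (1-p)/p^2 = 20$, yet the output above shows 4.863 and 17.17. The gap comes from cutting the support off at 24. The sketch below recomputes the sums for wider supports; `truncated_geom_moments` is a hypothetical helper written for this comparison, not part of the original code.

```python
import numpy as np


def truncated_geom_moments(p, x_max):
    """Mean, variance, and total probability summed over x = 1..x_max only."""
    x = np.arange(1, x_max + 1)
    f = (1 - p) ** (x - 1) * p
    mean = np.sum(x * f)
    var = np.sum((x - mean) ** 2 * f)
    return mean, var, np.sum(f)


p = 0.2
for x_max in (24, 100, 500):
    mean, var, total = truncated_geom_moments(p, x_max)
    print(f"x_max={x_max:3d}  sum={total:.4f}  E={mean:.4f}  V={var:.4f}")

print(f"theory: E={1 / p:.4f}  V={(1 - p) / p ** 2:.4f}")
```

As the cutoff grows, the total probability approaches 1 and the computed values approach the theoretical ones.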

6.4. Poisson distribution

The distribution of the number of times an event occurs per unit time
Parameter: average number of occurrences $\lambda$

$f(x)= \begin{cases} {\lambda^x \over x!} \cdot e^{-\lambda} & (x \in \{0, 1, 2, \ldots\}) \\ 0 & (\text{otherwise}) \end{cases}$

Expected value and variance

$E(X)=\lambda$

$V(X)=\lambda$
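The fact that the mean and the variance are both equal to $\lambda$ can be seen in simulated counts. The sketch below uses NumPy's `Generator.poisson`; the sample size, the seed, and the name `counts` are arbitrary choices for illustration.

```python
import numpy as np

lam = 3
rng = np.random.default_rng(0)

# Hypothetical data: event counts observed in 100,000 unit-time intervals.
counts = rng.poisson(lam, size=100_000)

print(f"sample mean     {counts.mean():.3f}")
print(f"sample variance {counts.var():.3f}")
```

Both printed values should be close to 3, apart from sampling noise.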

from scipy.special import factorial


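def Poi(lam):
    # The true support {0, 1, 2, ...} is infinite; it is truncated at 19 here,
    # which leaves out only a negligible tail when lam is small.
    x_set = np.arange(20)
    
    def f(x):
        if x in x_set:
            return np.power(lam, x) / factorial(x) * np.exp(-lam)
        
        else:
            return 0
    
    return x_set, f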

lam = 3
X = Poi(lam)

check_prob(X)

expected value  3.0
variance  3.0

plot_prob(X)

from scipy import stats


lams       = [3  , 5   , 8  ]
linestyles = ["-", "--", ":"]

fig = plt.figure(figsize = (10, 6))
ax = fig.add_subplot(111)

x_set = np.arange(20)

for lam, ls in zip(lams, linestyles):
    rv = stats.poisson(lam)
    
    ax.plot(x_set, rv.pmf(x_set), label = f"lam: {lam}", ls = ls, color = "0.25")

ax.set_xticks(x_set)
ax.set_xlabel("x")
ax.set_ylabel("p(x)")
ax.legend()

plt.show()
