📚 Python Machine Learning Pandas Data Analysis, Part 5. Data Preprocessing: 1. Handling Missing Data
📄 part5 1.누락 데이터 처리.ipynb (notebook: Handling Missing Data)
🍒 Loading the data
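import seaborn as sns

# Load seaborn's Titanic sample dataset into a DataFrame
df = sns.load_dataset('titanic')
df  # notebook cell: display the DataFrame
df.info()  # row count, column dtypes, and non-null count per column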
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 15 columns):
#   Column       Non-Null Count  Dtype
--- ------       --------------  -----
0   survived     891 non-null    int64
1   pclass       891 non-null    int64
2   sex          891 non-null    object
3   age          714 non-null    float64
4   sibsp        891 non-null    int64
5   parch        891 non-null    int64
6   fare         891 non-null    float64
7   embarked     889 non-null    object
8   class        891 non-null    category
9   who          891 non-null    object
10  adult_male   891 non-null    bool
11  deck         203 non-null    category
12  embark_town  889 non-null    object
13  alive        891 non-null    object
14  alone        891 non-null    bool
dtypes: bool(2), category(2), float64(2), int64(4), object(5)
memory usage: 80.7+ KB
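The Non-Null Count column from `df.info()` gives the number of missing values only indirectly: missing = total rows − non-null. From the output above:

- age is missing in 891 − 714 = 177 rows.
- deck is missing in 891 − 203 = 688 rows.
- embarked and embark_town are each missing in 891 − 889 = 2 rows.

The sketch below checks this relationship with `isnull().sum()` and `notnull().sum()`. It assumes a small made-up DataFrame named `sample` (hypothetical, not the Titanic data) so that it runs offline.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Titanic data: the values are made up for illustration
sample = pd.DataFrame({
    "age": [22.0, np.nan, 26.0, np.nan, 35.0],
    "deck": pd.Categorical([np.nan, "C", np.nan, np.nan, "E"]),
    "embarked": ["S", "C", "S", None, "S"],
})

non_null = sample.notnull().sum()  # per-column counts, like df.info()'s "Non-Null Count"
missing = sample.isnull().sum()    # per-column counts of NaN/None

# Every cell is either missing or non-null, so the two always add up to the row count
assert ((missing + non_null) == len(sample)).all()
print(missing)
```

On the real data, `df.isnull().sum()` would list these missing counts directly, instead of requiring the subtraction from the `info()` output.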
🍒 Counting the NaN values in the deck column
# value_counts() leaves out NaN by default; dropna=False counts NaN as its own entry
nan_deck = df['deck'].value_counts(dropna=False)
nan_deck
NaN 688
C 59
B 47
D 33
E 32
A 15
F 13
G 4
Name: deck, dtype: int64
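By default `value_counts()` uses `dropna=True`, so the NaN row above appears only because of `dropna=False`. The counts also agree with `df.info()`:

- The non-NaN decks sum to 59 + 47 + 33 + 32 + 15 + 13 + 4 = 203, matching deck's 203 non-null entries.
- 688 + 203 = 891, the total number of rows.

The minimal sketch below shows that `dropna=False` and `isnull().sum()` give the same missing count. It uses a small made-up categorical Series (hypothetical) in place of the real `df['deck']`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df['deck']: a categorical column with some missing values
deck = pd.Series(pd.Categorical(["C", np.nan, "B", np.nan, "C", np.nan]), name="deck")

counts_with_nan = deck.value_counts(dropna=False)  # NaN gets its own row
counts_without_nan = deck.value_counts()            # default dropna=True leaves NaN out
n_missing = int(deck.isnull().sum())                # direct count of missing values

# The extra total from dropna=False is exactly the number of missing values
assert counts_with_nan.sum() - counts_without_nan.sum() == n_missing
# Missing plus non-missing entries cover every row
assert counts_with_nan.sum() == len(deck)
print(counts_with_nan)
```

Use `value_counts(dropna=False)` to see the missing count alongside the other values. `isnull().sum()` is the more direct choice when only the number of missing values matters.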