📚 Python Machine Learning Pandas Data Analysis, Part 5. Data Preprocessing 1. Handling Missing Data

📄 part5 1.누락 데이터 처리.ipynb (Part 5, 1. Handling Missing Data)

🍫 Checking for Missing Data

🍒 Loading the data

import seaborn as sns

# Load the titanic dataset that ships with seaborn as a DataFrame
df = sns.load_dataset('titanic')
df


df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 15 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   survived     891 non-null    int64
 1   pclass       891 non-null    int64
 2   sex          891 non-null    object
 3   age          714 non-null    float64
 4   sibsp        891 non-null    int64
 5   parch        891 non-null    int64
 6   fare         891 non-null    float64
 7   embarked     889 non-null    object
 8   class        891 non-null    category
 9   who          891 non-null    object
 10  adult_male   891 non-null    bool
 11  deck         203 non-null    category
 12  embark_town  889 non-null    object
 13  alive        891 non-null    object
 14  alone        891 non-null    bool
dtypes: bool(2), category(2), float64(2), int64(4), object(5)
memory usage: 80.7+ KB
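
Reading the Non-Null Count column above against the 891 rows gives the missing count per column: age has 891 - 714 = 177 missing values, deck has 891 - 203 = 688, and embarked and embark_town have 2 each. The same subtraction can be sketched in code. The example below is only an illustration on a hypothetical toy DataFrame (the name `toy` and its values are made up so the snippet runs without downloading titanic); it assumes the standard pandas methods `count()` and `isnull()`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a few titanic columns
toy = pd.DataFrame({
    'age': [22.0, np.nan, 26.0, np.nan],
    'deck': ['C', None, None, None],
    'fare': [7.25, 71.28, 7.92, 53.1],
})

# Total rows minus the non-null count gives the missing count per column
missing_from_count = len(toy) - toy.count()

# isnull() marks each missing cell as True; summing the Trues counts them
missing_from_isnull = toy.isnull().sum()

print(missing_from_isnull)
```

Both approaches should agree column by column. Applied to the real data, `df.isnull().sum()` would be the equivalent call.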

🍒 Counting NaN values in the deck column

# dropna=False keeps NaN as its own entry in the counts
nan_deck = df['deck'].value_counts(dropna=False)
nan_deck
NaN    688
C       59
B       47
D       33
E       32
A       15
F       13
G        4
Name: deck, dtype: int64
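
The output shows 688 NaN entries, and the remaining categories add up to 203, which matches the Non-Null Count that df.info() reports for deck. If only the number of missing values is needed, a shorter route may be `isnull().sum()` on the column. The sketch below uses a hypothetical toy Series (its name `deck` and its values are invented for illustration) rather than the real titanic column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column standing in for df['deck']
deck = pd.Series(['C', np.nan, 'B', np.nan, 'C', np.nan], name='deck')

# value_counts(dropna=False) lists NaN as its own row next to the categories
counts = deck.value_counts(dropna=False)

# isnull().sum() returns only the number of missing entries
n_missing = deck.isnull().sum()

# notnull().sum() returns the number of entries that are present
n_present = deck.notnull().sum()

print(n_missing, n_present)
```

On the real data the corresponding call would be `df['deck'].isnull().sum()`.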