Importing libraries

  • Import the pandas library and load the supermarket_sales.csv file
import pandas as pd 
from google.colab import drive
drive.mount("/content/drive")
Mounted at /content/drive
DATA_PATH = "/content/drive/MyDrive/Colab Notebooks/data/supermarket_sales.csv"
sales = pd.read_csv(DATA_PATH)
sales

Invoice ID Branch City Customer type Gender Product line Unit price Quantity Date Time Payment
0 750-67-8428 A Yangon Member Female Health and beauty 74.69 7 1/5/2019 13:08 Ewallet
1 226-31-3081 C Naypyitaw Normal Female Electronic accessories 15.28 5 3/8/2019 10:29 Cash
2 631-41-3108 A Yangon Normal Male Home and lifestyle 46.33 7 3/3/2019 13:23 Credit card
3 123-19-1176 A Yangon Member Male Health and beauty 58.22 8 1/27/2019 20:33 Ewallet
4 373-73-7910 A Yangon Normal Male Sports and travel 86.31 7 2/8/2019 10:37 Ewallet
... ... ... ... ... ... ... ... ... ... ... ...
995 233-67-5758 C Naypyitaw Normal Male Health and beauty 40.35 1 1/29/2019 13:46 Ewallet
996 303-96-2227 B Mandalay Normal Female Home and lifestyle 97.38 10 3/2/2019 17:16 Ewallet
997 727-02-1313 A Yangon Member Male Food and beverages 31.84 1 2/9/2019 13:22 Cash
998 347-56-2442 A Yangon Normal Male Home and lifestyle 65.82 1 2/22/2019 15:33 Cash
999 849-09-3807 A Yangon Member Female Fashion accessories 88.34 7 2/18/2019 13:28 Cash

1000 rows × 11 columns

sales.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 11 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Invoice ID     1000 non-null   object 
 1   Branch         1000 non-null   object 
 2   City           1000 non-null   object 
 3   Customer type  1000 non-null   object 
 4   Gender         1000 non-null   object 
 5   Product line   1000 non-null   object 
 6   Unit price     1000 non-null   float64
 7   Quantity       1000 non-null   int64  
 8   Date           1000 non-null   object 
 9   Time           1000 non-null   object 
 10  Payment        1000 non-null   object 
dtypes: float64(1), int64(1), object(9)
memory usage: 86.1+ KB

Group by

  • Learn about aggregate functions (used here as a synonym for group by).
sales['Invoice ID'].value_counts()
750-67-8428    1
642-61-4706    1
816-72-8853    1
491-38-3499    1
322-02-2271    1
              ..
633-09-3463    1
374-17-3652    1
378-07-7001    1
433-75-6987    1
849-09-3807    1
Name: Invoice ID, Length: 1000, dtype: int64
sales.groupby('Customer type')['Quantity'].sum()
Customer type
Member    2785
Normal    2725
Name: Quantity, dtype: int64
sales.groupby(['Customer type', 'Branch', 'Payment'])['Quantity'].sum()
Customer type  Branch  Payment    
Member         A       Cash           308
                       Credit card    282
                       Ewallet        374
               B       Cash           284
                       Credit card    371
                       Ewallet        269
               C       Cash           293
                       Credit card    349
                       Ewallet        255
Normal         A       Cash           264
                       Credit card    298
                       Ewallet        333
               B       Cash           344
                       Credit card    228
                       Ewallet        324
               C       Cash           403
                       Credit card    194
                       Ewallet        337
Name: Quantity, dtype: int64

print(type(sales.groupby(['Customer type', 'Branch', 'Payment'])['Quantity'].sum()))
<class 'pandas.core.series.Series'>
sales.groupby(['Customer type', 'Branch', 'Payment'])['Quantity'].agg(['sum', 'mean']).reset_index()

Customer type Branch Payment sum mean
0 Member A Cash 308 5.500000
1 Member A Credit card 282 5.755102
2 Member A Ewallet 374 6.032258
3 Member B Cash 284 5.358491
4 Member B Credit card 371 5.888889
5 Member B Ewallet 269 5.489796
6 Member C Cash 293 4.966102
7 Member C Credit card 349 5.816667
8 Member C Ewallet 255 5.100000
9 Normal A Cash 264 4.888889
10 Normal A Credit card 298 5.418182
11 Normal A Ewallet 333 5.203125
12 Normal B Cash 344 6.035088
13 Normal B Credit card 228 4.956522
14 Normal B Ewallet 324 5.062500
15 Normal C Cash 403 6.200000
16 Normal C Credit card 194 5.105263
17 Normal C Ewallet 337 6.017857

print(type(sales.groupby(['Customer type', 'Branch', 'Payment'])['Quantity'].agg(['sum', 'mean'])))
<class 'pandas.core.frame.DataFrame'>
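
The two type() checks above show that a single aggregate function returns a Series while agg() with a list of functions returns a DataFrame. The following is a minimal sketch of that difference on a tiny hand-made frame. The `toy` data and its values are made up for illustration and are not taken from supermarket_sales.csv.

```python
import pandas as pd

# Hypothetical toy data, not from supermarket_sales.csv
toy = pd.DataFrame({
    "Customer type": ["Member", "Member", "Normal", "Normal", "Normal"],
    "Branch": ["A", "B", "A", "A", "B"],
    "Quantity": [7, 5, 7, 8, 1],
})

# One aggregate function on one column -> Series indexed by the group key
total = toy.groupby("Customer type")["Quantity"].sum()
print(type(total).__name__)
print(total)

# Several aggregate functions at once -> DataFrame, one column per function
stats = toy.groupby(["Customer type", "Branch"])["Quantity"].agg(["sum", "mean"])
print(type(stats).__name__)

# reset_index() turns the group keys back into ordinary columns
flat = stats.reset_index()
print(list(flat.columns))
```

Calling reset_index() on the aggregated result is one way to get a flat table with the group keys as regular columns, which is usually easier to plot or merge.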

Handling missing values

Creating data with missing values

import pandas as pd 
import numpy as np

dict_01 = {
'Score_A' : [80, 90, np.nan, 80],
'Score_B' : [30, 45, np.nan, np.nan],
'Score_C' : [np.nan, 50, 80, 90]
}

df = pd.DataFrame(dict_01)
df

   Score_A  Score_B  Score_C
0     80.0     30.0      NaN
1     90.0     45.0     50.0
2      NaN      NaN     80.0
3     80.0      NaN     90.0

df.isnull().sum()
Score_A    1
Score_B    2
Score_C    1
dtype: int64
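# Note: filling with the string "0" turns the affected columns into object dtype;
# use df.fillna(0) to keep them numeric.
df.fillna("0")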

  Score_A Score_B Score_C
0    80.0    30.0       0
1    90.0    45.0    50.0
2       0       0    80.0
3    80.0       0    90.0

df.ffill()  # forward fill; same result as fillna(method="pad"), which newer pandas versions deprecate

   Score_A  Score_B  Score_C
0     80.0     30.0      NaN
1     90.0     45.0     50.0
2     90.0     45.0     80.0
3     80.0     45.0     90.0

dict_01 = {
"성별" : ["남자", "여자", np.nan, "남자"],
"Salary" : [30, 45, 90, 70]
}

df = pd.DataFrame(dict_01)
df

성별 Salary
0 남자 30
1 여자 45
2 NaN 90
3 남자 70

df['성별'].fillna("성별 없음")
0       남자
1       여자
2    성별 없음
3       남자
Name: 성별, dtype: object
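  • Missing values

–> The approach differs for string columns and numeric columns.
–> String columns: fill with the most frequent value (the mode).
–> Numeric columns: fill with the mean, maximum, minimum, median, and so on.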
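
The notes above can be sketched as follows. This is only an illustration under assumptions: the `people` frame, its English column names, and its values are hypothetical stand-ins for the 성별 / Salary example, and the choice between mean and median depends on the data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (English stand-ins for the columns used above)
people = pd.DataFrame({
    "gender": ["male", "female", np.nan, "male"],
    "salary": [30.0, np.nan, 90.0, 70.0],
})

# String column: fill with the most frequent value (the mode)
most_common = people["gender"].mode()[0]
people["gender_filled"] = people["gender"].fillna(most_common)

# Numeric column: fill with a summary statistic such as the mean or the median
people["salary_mean"] = people["salary"].fillna(people["salary"].mean())
people["salary_median"] = people["salary"].fillna(people["salary"].median())

print(people)
```

mode() returns a Series because several values can tie for most frequent; taking element [0] simply picks the first of them.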

import pandas as pd 
import numpy as np

dict_01 = {
'Score_A' : [80, 90, np.nan, 80],
'Score_B' : [30, 45, np.nan, 60],
'Score_C' : [np.nan, 50, 80, 90],
'Score_D' : [50, 30, 80, 60]
}

df = pd.DataFrame(dict_01)
df

   Score_A  Score_B  Score_C  Score_D
0     80.0     30.0      NaN       50
1     90.0     45.0     50.0       30
2      NaN      NaN     80.0       80
3     80.0     60.0     90.0       60

df.dropna(axis = 1)

   Score_D
0       50
1       30
2       80
3       60

df.dropna(axis = 0)

   Score_A  Score_B  Score_C  Score_D
1     90.0     45.0     50.0       30
3     80.0     60.0     90.0       60
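
dropna(axis=0) and dropna(axis=1) drop a row or column as soon as it contains a single missing value, which can discard a lot of data. As a hedged sketch, the subset, thresh, and how parameters of DataFrame.dropna allow finer control. The frame below reuses the same scores as above under the assumed name `scores`.

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame({
    "Score_A": [80, 90, np.nan, 80],
    "Score_B": [30, 45, np.nan, 60],
    "Score_C": [np.nan, 50, 80, 90],
    "Score_D": [50, 30, 80, 60],
})

# Drop a row only if a specific column is missing
by_subset = scores.dropna(subset=["Score_C"])
print(by_subset.index.tolist())   # [1, 2, 3]

# Keep rows that have at least 3 non-missing values
by_thresh = scores.dropna(thresh=3)
print(by_thresh.index.tolist())   # [0, 1, 3]

# Drop a row only when every value in it is missing (no such row here)
by_all = scores.dropna(how="all")
print(len(by_all))                # 4
```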


Outliers

sales

Invoice ID Branch City Customer type Gender Product line Unit price Quantity Date Time Payment
0 750-67-8428 A Yangon Member Female Health and beauty 74.69 7 1/5/2019 13:08 Ewallet
1 226-31-3081 C Naypyitaw Normal Female Electronic accessories 15.28 5 3/8/2019 10:29 Cash
2 631-41-3108 A Yangon Normal Male Home and lifestyle 46.33 7 3/3/2019 13:23 Credit card
3 123-19-1176 A Yangon Member Male Health and beauty 58.22 8 1/27/2019 20:33 Ewallet
4 373-73-7910 A Yangon Normal Male Sports and travel 86.31 7 2/8/2019 10:37 Ewallet
... ... ... ... ... ... ... ... ... ... ... ...
995 233-67-5758 C Naypyitaw Normal Male Health and beauty 40.35 1 1/29/2019 13:46 Ewallet
996 303-96-2227 B Mandalay Normal Female Home and lifestyle 97.38 10 3/2/2019 17:16 Ewallet
997 727-02-1313 A Yangon Member Male Food and beverages 31.84 1 2/9/2019 13:22 Cash
998 347-56-2442 A Yangon Normal Male Home and lifestyle 65.82 1 2/22/2019 15:33 Cash
999 849-09-3807 A Yangon Member Female Fashion accessories 88.34 7 2/18/2019 13:28 Cash

1000 rows × 11 columns

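  • A common statistical rule of thumb

  • IQR, box plots, and quartiles

  • Q0 (0%), Q1 (25%), Q2 (50%), Q3 (75%), Q4 (100%)

  • Lower outlier bound: Q1 - (1.5 * (Q3 - Q1))

  • Upper outlier bound: Q3 + (1.5 * (Q3 - Q1))

  • Each domain (your business area or future job) also has its own conventional criteria for outliers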

sales[['Unit price']].describe()

        Unit price
count  1000.000000
mean     55.672130
std      26.494628
min      10.080000
25%      32.875000
50%      55.230000
75%      77.935000
max      99.960000

Q1 = sales['Unit price'].quantile(0.25)
Q3 = sales['Unit price'].quantile(0.75)

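# Treat values below Q1 as outliers
# (note: this is stricter than the Q1 - 1.5 * (Q3 - Q1) bound listed above)
outliers_q1 = (sales['Unit price'] < Q1)

# Treat values above Q3 as outliers
# (note: this is stricter than the Q3 + 1.5 * (Q3 - Q1) bound listed above)
outliers_q3 = (sales['Unit price'] > Q3)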
print(sales['Unit price'][~(outliers_q1 | outliers_q3)])
0      74.69
2      46.33
3      58.22
6      68.84
7      73.56
       ...  
991    76.60
992    58.03
994    60.95
995    40.35
998    65.82
Name: Unit price, Length: 500, dtype: float64
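
The cell above keeps only the values between Q1 and Q3, which is why exactly half of the rows (500) remain. For comparison, here is a minimal sketch of the 1.5 * IQR bounds from the formulas listed earlier. The `prices` Series is hypothetical toy data with one deliberately extreme value, not the supermarket data.

```python
import pandas as pd

# Hypothetical prices with one obviously extreme value
prices = pd.Series([10, 12, 13, 14, 15, 16, 18, 100], name="Unit price")

q1 = prices.quantile(0.25)
q3 = prices.quantile(0.75)
iqr = q3 - q1

# Bounds from the rule: Q1 - 1.5 * IQR and Q3 + 1.5 * IQR
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

is_outlier = (prices < lower) | (prices > upper)
print("bounds:", lower, upper)
print("outliers:", prices[is_outlier].tolist())
print("kept:", prices[~is_outlier].tolist())
```

Applied to the Unit price figures printed by describe() above (Q1 = 32.875, Q3 = 77.935), the same arithmetic gives bounds of about -34.715 and 145.525, so the observed range of 10.08 to 99.96 would contain no outliers under this rule.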

Importing libraries

import matplotlib
import seaborn as sns
print(matplotlib.__version__)
print(sns.__version__)
3.2.2
0.11.2

Drawing a plot

import matplotlib.pyplot as plt

dates = [
'2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05',
'2021-01-06', '2021-01-07', '2021-01-08', '2021-01-09', '2021-01-10'
]
min_temperature = [20.7, 17.9, 18.8, 14.6, 15.8, 15.8, 15.8, 17.4, 21.8, 20.0]
max_temperature = [34.7, 28.9, 31.8, 25.6, 28.8, 21.8, 22.8, 28.4, 30.8, 32.0]

# From now on, write your plotting code in the following form.
fig, ax = plt.subplots(nrows = 1, ncols = 1, figsize = (10, 6)) # the core of matplotlib basics

ax.plot(dates, min_temperature, label = "Min Temp.")
ax.plot(dates, max_temperature, label = "Max Temp.")
ax.legend()
plt.show()

!pip install yfinance --upgrade --no-cache-dir
Collecting yfinance
  Downloading yfinance-0.1.70-py2.py3-none-any.whl (26 kB)
Requirement already satisfied: numpy>=1.15 in /usr/local/lib/python3.7/dist-packages (from yfinance) (1.21.5)
Collecting lxml>=4.5.1
  Downloading lxml-4.8.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (6.4 MB)
Collecting requests>=2.26
  Downloading requests-2.27.1-py2.py3-none-any.whl (63 kB)
Requirement already satisfied: pandas>=0.24.0 in /usr/local/lib/python3.7/dist-packages (from yfinance) (1.3.5)
Requirement already satisfied: multitasking>=0.0.7 in /usr/local/lib/python3.7/dist-packages (from yfinance) (0.0.10)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.7/dist-packages (from pandas>=0.24.0->yfinance) (2018.9)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.7/dist-packages (from pandas>=0.24.0->yfinance) (2.8.2)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.7.3->pandas>=0.24.0->yfinance) (1.15.0)
Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.7/dist-packages (from requests>=2.26->yfinance) (2.0.12)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests>=2.26->yfinance) (1.24.3)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests>=2.26->yfinance) (2021.10.8)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests>=2.26->yfinance) (2.10)
Installing collected packages: requests, lxml, yfinance
  Attempting uninstall: requests
    Found existing installation: requests 2.23.0
    Uninstalling requests-2.23.0:
      Successfully uninstalled requests-2.23.0
  Attempting uninstall: lxml
    Found existing installation: lxml 4.2.6
    Uninstalling lxml-4.2.6:
      Successfully uninstalled lxml-4.2.6
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires requests~=2.23.0, but you have requests 2.27.1 which is incompatible.
datascience 0.10.6 requires folium==0.2.1, but you have folium 0.8.3 which is incompatible.
Successfully installed lxml-4.8.0 requests-2.27.1 yfinance-0.1.70
import yfinance as yf
data = yf.download("AAPL", start = "2019-08-01", end = "2022-03-23")
ts = data['Open']
print(ts.head())
print(type(ts))
[*********************100%***********************]  1 of 1 completed
Date
2019-08-01    53.474998
2019-08-02    51.382500
2019-08-05    49.497501
2019-08-06    49.077499
2019-08-07    48.852501
Name: Open, dtype: float64
<class 'pandas.core.series.Series'>

pyplot style

import matplotlib.pyplot as plt
plt.plot(ts)
plt.title("Stock Market of AAPL") # Colab's default font cannot render Korean titles; a Korean font has to be configured separately
plt.xlabel("Date")
plt.ylabel("Open Price")
plt.show()

import matplotlib.pyplot as plt

fig, ax = plt.subplots() # fig is the outer frame (the Figure that contains the axes)
ax.plot(ts)
ax.set_title("Stock Market of AAPL")
ax.set_xlabel("Date")
ax.set_ylabel("Open Price")
plt.show()

Bar chart

import calendar
calendar.month_name[1:13]
['January',
 'February',
 'March',
 'April',
 'May',
 'June',
 'July',
 'August',
 'September',
 'October',
 'November',
 'December']
import matplotlib.pyplot as plt
import numpy as np
import calendar # standard-library module for calendar and date utilities

month_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
sold_list = [300, 400, 550, 900, 600, 960, 900, 910, 800, 700, 550, 450]

fig, ax = plt.subplots(figsize = (10, 6))
barplots = ax.bar(month_list, sold_list)

print("barplots :", barplots)

for plot in barplots:
    print(plot)
    # print(plot.get_height())
    # print(plot.get_x())
    # print(plot.get_y())
    # print(plot.get_width())
    height = plot.get_height()
    # Write the bar's height just above its top, centered horizontally
    ax.text(plot.get_x() + plot.get_width()/2., height, height, ha = 'center', va = 'bottom')

plt.xticks(month_list, calendar.month_name[1:13], rotation = 90)
plt.show()
barplots : <BarContainer object of 12 artists>
Rectangle(xy=(0.6, 0), width=0.8, height=300, angle=0)
Rectangle(xy=(1.6, 0), width=0.8, height=400, angle=0)
Rectangle(xy=(2.6, 0), width=0.8, height=550, angle=0)
Rectangle(xy=(3.6, 0), width=0.8, height=900, angle=0)
Rectangle(xy=(4.6, 0), width=0.8, height=600, angle=0)
Rectangle(xy=(5.6, 0), width=0.8, height=960, angle=0)
Rectangle(xy=(6.6, 0), width=0.8, height=900, angle=0)
Rectangle(xy=(7.6, 0), width=0.8, height=910, angle=0)
Rectangle(xy=(8.6, 0), width=0.8, height=800, angle=0)
Rectangle(xy=(9.6, 0), width=0.8, height=700, angle=0)
Rectangle(xy=(10.6, 0), width=0.8, height=550, angle=0)
Rectangle(xy=(11.6, 0), width=0.8, height=450, angle=0)

import seaborn as sns

tips = sns.load_dataset("tips")
print(tips.info())
x = tips['total_bill']
y = tips['tip']

# Scatter plot
fig, ax = plt.subplots(figsize = (10,6))
ax.scatter(x, y)
ax.set_xlabel('Total Bill')
ax.set_ylabel('Tip')
plt.show()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   total_bill  244 non-null    float64 
 1   tip         244 non-null    float64 
 2   sex         244 non-null    category
 3   smoker      244 non-null    category
 4   day         244 non-null    category
 5   time        244 non-null    category
 6   size        244 non-null    int64   
dtypes: category(4), float64(2), int64(1)
memory usage: 7.4 KB
None

# Iterating over a groupby yields (label, sub-DataFrame) pairs, one per group.
# for label, data in tips.groupby('sex'):
#     print(label)
#     print(data.head())

tips['sex_color'] = tips['sex'].map({'Male': '#2521F6', 'Female': '#EB4036'})
# print(tips.head())

fig, ax = plt.subplots(figsize = (10, 6))
for label, data in tips.groupby('sex'):
    ax.scatter(data['total_bill'], data['tip'], label = label, color = data['sex_color'], alpha = 0.5)
ax.set_xlabel('Total Bill')
ax.set_ylabel('Tip')

ax.legend() # legend
plt.show()

Seaborn

import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")
# print(tips.info())

fig, ax = plt.subplots(figsize=(10, 6))
sns.scatterplot(x = 'total_bill', y = 'tip', hue = 'sex', data = tips)
plt.show()

# Draw two plots side by side
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(15, 5))
sns.regplot(x = "total_bill", y = "tip", data = tips , ax = ax[1], fit_reg = True)
ax[1].set_title("with linear regression line")

sns.regplot(x = "total_bill", y = "tip", data = tips , ax = ax[0], fit_reg = False)
ax[0].set_title("without linear regression line")
plt.show()

  • Drawing a bar chart the seaborn way
sns.countplot(x = "day", data = tips)
plt.show()

print(tips['day'].value_counts().index)
print(tips['day'].value_counts().values)
print(tips['day'].value_counts(ascending=True)) # ascending order
CategoricalIndex(['Sat', 'Sun', 'Thur', 'Fri'], categories=['Thur', 'Fri', 'Sat', 'Sun'], ordered=False, dtype='category')
[87 76 62 19]
Fri     19
Thur    62
Sun     76
Sat     87
Name: day, dtype: int64
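
The output above shows that value_counts() sorts by count rather than by label, which is what makes its index usable as a bar order. A small sketch of the three orderings follows. The `days` Series is hypothetical toy data, not the tips dataset.

```python
import pandas as pd

# Hypothetical day labels, not the tips dataset
days = pd.Series(["Sun", "Fri", "Sun", "Sat", "Fri", "Sun"])

# Default: sorted by count, largest first
counts = days.value_counts()
print(counts.index.tolist())     # ['Sun', 'Fri', 'Sat']
print(counts.values.tolist())    # [3, 2, 1]

# ascending=True: smallest count first
ascending = days.value_counts(ascending=True)
print(ascending.index.tolist())  # ['Sat', 'Fri', 'Sun']

# sort_index(): ordered by label instead of by count
by_label = days.value_counts().sort_index()
print(by_label.index.tolist())   # ['Fri', 'Sat', 'Sun']
```

Passing one of these index lists as the order argument of a plotting call is one way to control the left-to-right order of the bars.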
fig, ax = plt.subplots()
ax = sns.countplot(x = "day", data = tips, order = tips['day'].value_counts().index)

for plot in ax.patches:
    # print(plot)
    height = plot.get_height()
    ax.text(plot.get_x() + plot.get_width()/2., height, height, ha = 'center', va = 'bottom')

ax.set_ylim(-5, 100) # change the y-axis range
plt.show()


Importing libraries

import pandas as pd
print(pd.__version__)
1.3.5

Test

temp_dic = {"col1" : [1, 2, 3],
            "col2" : [3, 4, 5]}
df = pd.DataFrame(temp_dic)
print(type(df))
print(df)
<class 'pandas.core.frame.DataFrame'>
   col1  col2
0     1     3
1     2     4
2     3     5
temp_dic = {'a' : 1 , "b" : 2, "c" : 3}
ser = pd.Series(temp_dic)
print(type(ser))
print(ser)
<class 'pandas.core.series.Series'>
a    1
b    2
c    3
dtype: int64
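
The two cells above build a DataFrame and a Series separately. As a rough sketch of how the two types relate, the example below uses hypothetical names (`frame`, `x`, `y`, `ser_a`, `ser_b`) that do not appear in the original notebook.

```python
import pandas as pd

# Hypothetical frame for illustration
frame = pd.DataFrame({"x": [1, 2, 3], "y": [3, 4, 5]})

col = frame["x"]      # single brackets select one column as a Series
sub = frame[["x"]]    # double brackets select a one-column DataFrame
print(type(col).__name__)
print(type(sub).__name__)

# A dict of Series that share an index builds a DataFrame
ser_a = pd.Series({"a": 1, "b": 2, "c": 3})
ser_b = ser_a * 10
combined = pd.DataFrame({"small": ser_a, "big": ser_b})
print(combined)
```

This is also why sales['Unit price'] and sales[['Unit price']] behave differently in the earlier cells: the first is a Series, the second a DataFrame.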

Connecting Google Drive

from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
DATA_PATH = '/content/drive/MyDrive/Colab Notebooks/data/Lemonade2016.csv'
juice = pd.read_csv(DATA_PATH)
juice

Date Location Lemon Orange Temperature Leaflets Price
0 7/1/2016 Park 97 67 70 90.0 0.25
1 7/2/2016 Park 98 67 72 90.0 0.25
2 7/3/2016 Park 110 77 71 104.0 0.25
3 7/4/2016 Beach 134 99 76 98.0 0.25
4 7/5/2016 Beach 159 118 78 135.0 0.25
5 7/6/2016 Beach 103 69 82 90.0 0.25
6 7/6/2016 Beach 103 69 82 90.0 0.25
7 7/7/2016 Beach 143 101 81 135.0 0.25
8 NaN Beach 123 86 82 113.0 0.25
9 7/9/2016 Beach 134 95 80 126.0 0.25
10 7/10/2016 Beach 140 98 82 131.0 0.25
11 7/11/2016 Beach 162 120 83 135.0 0.25
12 7/12/2016 Beach 130 95 84 99.0 0.25
13 7/13/2016 Beach 109 75 77 99.0 0.25
14 7/14/2016 Beach 122 85 78 113.0 0.25
15 7/15/2016 Beach 98 62 75 108.0 0.50
16 7/16/2016 Beach 81 50 74 90.0 0.50
17 7/17/2016 Beach 115 76 77 126.0 0.50
18 7/18/2016 Park 131 92 81 122.0 0.50
19 7/19/2016 Park 122 85 78 113.0 0.50
20 7/20/2016 Park 71 42 70 NaN 0.50
21 7/21/2016 Park 83 50 77 90.0 0.50
22 7/22/2016 Park 112 75 80 108.0 0.50
23 7/23/2016 Park 120 82 81 117.0 0.50
24 7/24/2016 Park 121 82 82 117.0 0.50
25 7/25/2016 Park 156 113 84 135.0 0.50
26 7/26/2016 Park 176 129 83 158.0 0.35
27 7/27/2016 Park 104 68 80 99.0 0.35
28 7/28/2016 Park 96 63 82 90.0 0.35
29 7/29/2016 Park 100 66 81 95.0 0.35
30 7/30/2016 Beach 88 57 82 81.0 0.35
31 7/31/2016 Beach 76 47 82 68.0 0.35

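  • The data has been loaded.
  • The first thing to do is understand the structure of the data.
juice.info() # info is a DataFrame method
# If a column has missing values, its Non-Null Count is lower than the number of entries.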
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32 entries, 0 to 31
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Date         31 non-null     object 
 1   Location     32 non-null     object 
 2   Lemon        32 non-null     int64  
 3   Orange       32 non-null     int64  
 4   Temperature  32 non-null     int64  
 5   Leaflets     31 non-null     float64
 6   Price        32 non-null     float64
dtypes: float64(2), int64(3), object(2)
memory usage: 1.9+ KB
juice.head(10) # head() returns the first 5 rows by default; passing n returns the first n rows

Date Location Lemon Orange Temperature Leaflets Price
0 7/1/2016 Park 97 67 70 90.0 0.25
1 7/2/2016 Park 98 67 72 90.0 0.25
2 7/3/2016 Park 110 77 71 104.0 0.25
3 7/4/2016 Beach 134 99 76 98.0 0.25
4 7/5/2016 Beach 159 118 78 135.0 0.25
5 7/6/2016 Beach 103 69 82 90.0 0.25
6 7/6/2016 Beach 103 69 82 90.0 0.25
7 7/7/2016 Beach 143 101 81 135.0 0.25
8 NaN Beach 123 86 82 113.0 0.25
9 7/9/2016 Beach 134 95 80 126.0 0.25

</div>
juice.tail()  # the last 5 rows by default

Date Location Lemon Orange Temperature Leaflets Price
27 7/27/2016 Park 104 68 80 99.0 0.35
28 7/28/2016 Park 96 63 82 90.0 0.35
29 7/29/2016 Park 100 66 81 95.0 0.35
30 7/30/2016 Beach 88 57 82 81.0 0.35
31 7/31/2016 Beach 76 47 82 68.0 0.35

</div>
  • describe() function
  • Returns descriptive statistics for the numeric columns.
juice.describe()  # always check the return type, e.g. type(juice.describe())

Lemon Orange Temperature Leaflets Price
count 32.000000 32.000000 32.000000 31.000000 32.000000
mean 116.156250 80.000000 78.968750 108.548387 0.354687
std 25.823357 21.863211 4.067847 20.117718 0.113137
min 71.000000 42.000000 70.000000 68.000000 0.250000
25% 98.000000 66.750000 77.000000 90.000000 0.250000
50% 113.500000 76.500000 80.500000 108.000000 0.350000
75% 131.750000 95.000000 82.000000 124.000000 0.500000
max 176.000000 129.000000 84.000000 158.000000 0.500000

</div>
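
By default describe() skips text columns such as Location. A small sketch of how they could be included, using a hypothetical `demo` frame rather than the original data:

```python
import pandas as pd

# Hypothetical stand-in for a few rows of `juice`
demo = pd.DataFrame({
    "Location": ["Park", "Park", "Beach", "Beach", "Beach"],
    "Lemon": [97, 98, 134, 159, 103],
})

# Default: only numeric columns are summarized
numeric_summary = demo.describe()
print(type(numeric_summary))

# include="all" adds count / unique / top / freq for non-numeric columns
all_summary = demo.describe(include="all")
print(all_summary)
```

Statistics that do not apply to a column (for example the mean of Location) appear as NaN in the combined table.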
  • value_counts()
print(juice['Location'].value_counts())  # for a categorical column the useful basic statistic is frequency, so use value_counts()
print(type(juice['Location'].value_counts()))
Beach    17
Park     15
Name: Location, dtype: int64
<class 'pandas.core.series.Series'>
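
Counts are often easier to compare as proportions. A brief sketch, assuming a Series rebuilt from the counts shown above (17 Beach, 15 Park) instead of the original file:

```python
import pandas as pd

# Rebuild a Location column with the same frequencies as the output above
location = pd.Series(["Beach"] * 17 + ["Park"] * 15, name="Location")

counts = location.value_counts()                # absolute frequencies, largest first
shares = location.value_counts(normalize=True)  # relative frequencies that sum to 1

print(counts)
print(shares)
```

Because the result is a Series, a single value can be read with a label, for example `counts["Beach"]`.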

Working with the data

  • Let's handle rows and columns.
juice['Sold'] = 0  # add a new column
print(juice.head(3))
       Date Location  Lemon  Orange  Temperature  Leaflets  Price  Sold
0  7/1/2016     Park     97      67           70      90.0   0.25     0
1  7/2/2016     Park     98      67           72      90.0   0.25     0
2  7/3/2016     Park    110      77           71     104.0   0.25     0
juice['Sold'] = juice['Lemon'] + juice['Orange']
print(juice.head(3))
       Date Location  Lemon  Orange  Temperature  Leaflets  Price  Sold
0  7/1/2016     Park     97      67           70      90.0   0.25   164
1  7/2/2016     Park     98      67           72      90.0   0.25   165
2  7/3/2016     Park    110      77           71     104.0   0.25   187
  • Revenue = price * units sold
# juice['Revenue'] = 0  (initializing the column first is optional)
juice['Revenue'] = juice['Price'] * juice['Sold']
print(juice.head(3))
       Date Location  Lemon  Orange  Temperature  Leaflets  Price  Sold  \
0  7/1/2016     Park     97      67           70      90.0   0.25   164   
1  7/2/2016     Park     98      67           72      90.0   0.25   165   
2  7/3/2016     Park    110      77           71     104.0   0.25   187   

   Revenue  
0    41.00  
1    41.25  
2    46.75  
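
The two derived columns can also be built in one step without modifying the original frame. This is an alternative sketch using `assign()`, not the approach used above; `demo` is a hypothetical copy of the first three rows.

```python
import pandas as pd

# Hypothetical copy of the first three rows of `juice`
demo = pd.DataFrame({
    "Lemon": [97, 98, 110],
    "Orange": [67, 67, 77],
    "Price": [0.25, 0.25, 0.25],
})

# assign() returns a new DataFrame; a later keyword may use a column
# created by an earlier keyword in the same call
with_revenue = demo.assign(
    Sold=lambda df: df["Lemon"] + df["Orange"],
    Revenue=lambda df: df["Price"] * df["Sold"],
)
print(with_revenue)
```

The original `demo` keeps its three columns, which can help when the same cell is re-run several times in a notebook.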
  • drop(axis = 0 | 1)
    • With axis=0, drop() removes rows (matched by index label).
    • With axis=1, drop() removes columns.
juice_column_drop = juice.drop('Sold', axis = 1)
print(juice_column_drop.head(3))
       Date Location  Lemon  Orange  Temperature  Leaflets  Price  Revenue
0  7/1/2016     Park     97      67           70      90.0   0.25    41.00
1  7/2/2016     Park     98      67           72      90.0   0.25    41.25
2  7/3/2016     Park    110      77           71     104.0   0.25    46.75
juice_row_drop = juice.drop(0, axis = 0)
print(juice_row_drop.head(3))
       Date Location  Lemon  Orange  Temperature  Leaflets  Price  Sold  \
1  7/2/2016     Park     98      67           72      90.0   0.25   165   
2  7/3/2016     Park    110      77           71     104.0   0.25   187   
3  7/4/2016    Beach    134      99           76      98.0   0.25   233   

   Revenue  
1    41.25  
2    46.75  
3    58.25  
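
drop() also accepts the `columns=` and `index=` keywords, which some readers find easier to remember than axis numbers. A minimal sketch on a hypothetical `demo` frame:

```python
import pandas as pd

# Hypothetical stand-in for `juice`
demo = pd.DataFrame({
    "Lemon": [97, 98, 110],
    "Sold": [164, 165, 187],
    "Revenue": [41.0, 41.25, 46.75],
})

no_sold = demo.drop(columns="Sold")      # same effect as demo.drop("Sold", axis=1)
no_first_rows = demo.drop(index=[0, 1])  # same effect as demo.drop([0, 1], axis=0)

print(no_sold)
print(no_first_rows)
print(demo.shape)  # drop() returned new objects, so demo itself is unchanged
```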

Data indexing

juice[4:8]  # rows at positions 4 through 7 (the end of the slice is excluded)

Date Location Lemon Orange Temperature Leaflets Price Sold Revenue
4 7/5/2016 Beach 159 118 78 135.0 0.25 277 69.25
5 7/6/2016 Beach 103 69 82 90.0 0.25 172 43.00
6 7/6/2016 Beach 103 69 82 90.0 0.25 172 43.00
7 7/7/2016 Beach 143 101 81 135.0 0.25 244 61.00

</div>
  • Extracting data with boolean values
# The condition here keeps rows with Leaflets >= 100; a Location == 'Beach' condition is written the same way
# juice['Location'].value_counts()
juice[juice['Leaflets'] >= 100]

Date Location Lemon Orange Temperature Leaflets Price Sold Revenue location
2 7/3/2016 Park 110 77 71 104.0 0.25 187 46.75 Beach
4 7/5/2016 Beach 159 118 78 135.0 0.25 277 69.25 Beach
7 7/7/2016 Beach 143 101 81 135.0 0.25 244 61.00 Beach
8 NaN Beach 123 86 82 113.0 0.25 209 52.25 Beach
9 7/9/2016 Beach 134 95 80 126.0 0.25 229 57.25 Beach
10 7/10/2016 Beach 140 98 82 131.0 0.25 238 59.50 Beach
11 7/11/2016 Beach 162 120 83 135.0 0.25 282 70.50 Beach
14 7/14/2016 Beach 122 85 78 113.0 0.25 207 51.75 Beach
15 7/15/2016 Beach 98 62 75 108.0 0.50 160 80.00 Beach
17 7/17/2016 Beach 115 76 77 126.0 0.50 191 95.50 Beach
18 7/18/2016 Park 131 92 81 122.0 0.50 223 111.50 Beach
19 7/19/2016 Park 122 85 78 113.0 0.50 207 103.50 Beach
22 7/22/2016 Park 112 75 80 108.0 0.50 187 93.50 Beach
23 7/23/2016 Park 120 82 81 117.0 0.50 202 101.00 Beach
24 7/24/2016 Park 121 82 82 117.0 0.50 203 101.50 Beach
25 7/25/2016 Park 156 113 84 135.0 0.50 269 134.50 Beach
26 7/26/2016 Park 176 129 83 158.0 0.35 305 106.75 Beach

</div>
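
Several boolean conditions can be combined. A short sketch, assuming a hypothetical four-row `demo` frame; note that pandas uses `&`, `|` and `~` here rather than `and`, `or` and `not`.

```python
import pandas as pd

# Hypothetical stand-in for `juice`
demo = pd.DataFrame({
    "Location": ["Park", "Beach", "Beach", "Park"],
    "Leaflets": [104.0, 135.0, 90.0, 90.0],
})

# Each comparison produces a boolean Series with one True/False per row
is_beach = demo["Location"] == "Beach"
many_leaflets = demo["Leaflets"] >= 100

beach_and_many = demo[is_beach & many_leaflets]  # both conditions hold
beach_or_many = demo[is_beach | many_leaflets]   # at least one holds
not_beach = demo[~is_beach]                      # negation

print(beach_and_many)
```

When the comparisons are written inline, each one needs its own parentheses, e.g. `demo[(demo["Location"] == "Beach") & (demo["Leaflets"] >= 100)]`.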

iloc vs loc

  • Let's check the difference.
juice.head(3)

Date Location Lemon Orange Temperature Leaflets Price Sold Revenue location
0 7/1/2016 Park 97 67 70 90.0 0.25 164 41.00 Beach
1 7/2/2016 Park 98 67 72 90.0 0.25 165 41.25 Beach
2 7/3/2016 Park 110 77 71 104.0 0.25 187 46.75 Beach

</div>
  • iloc
  • Position-based (integer positions).
%%time

# juice.iloc[:, 0:2]
juice.iloc[0:3, 0:2]
CPU times: user 652 µs, sys: 0 ns, total: 652 µs
Wall time: 653 µs

Date Location
0 7/1/2016 Park
1 7/2/2016 Park
2 7/3/2016 Park

</div>
  • loc
  • Label-based.
%%time

juice.loc[0:2, ['Date','Location']]
CPU times: user 1.56 ms, sys: 0 ns, total: 1.56 ms
Wall time: 1.5 ms

Date Location
0 7/1/2016 Park
1 7/2/2016 Park
2 7/3/2016 Park

</div>
  • Select rows by condition and columns by name in a single call (only possible with loc)
juice.loc[juice['Leaflets'] >= 100, ['Date', 'Location']]

Date Location
2 7/3/2016 Park
4 7/5/2016 Beach
7 7/7/2016 Beach
8 NaN Beach
9 7/9/2016 Beach
10 7/10/2016 Beach
11 7/11/2016 Beach
14 7/14/2016 Beach
15 7/15/2016 Beach
17 7/17/2016 Beach
18 7/18/2016 Park
19 7/19/2016 Park
22 7/22/2016 Park
23 7/23/2016 Park
24 7/24/2016 Park
25 7/25/2016 Park
26 7/26/2016 Park

</div>
# loc is label-based: the column labels are strings, so the integer slice 0:2 raises a TypeError
juice.loc[juice['Leaflets'] >= 100, 0:2]
---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-70-46f78a7ec2bf> in <module>()
----> 1 juice.loc[juice['Leaflets'] >= 100, 0:2]


/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in __getitem__(self, key)
    923                 with suppress(KeyError, IndexError):
    924                     return self.obj._get_value(*key, takeable=self._takeable)
--> 925             return self._getitem_tuple(key)
    926         else:
    927             # we by definition only have the 0th axis


/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
   1107             return self._multi_take(tup)
   1108 
-> 1109         return self._getitem_tuple_same_dim(tup)
   1110 
   1111     def _get_label(self, label, axis: int):


/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in _getitem_tuple_same_dim(self, tup)
    804                 continue
    805 
--> 806             retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
    807             # We should never have retval.ndim < self.ndim, as that should
    808             #  be handled by the _getitem_lowerdim call above.


/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1140         if isinstance(key, slice):
   1141             self._validate_key(key, axis)
-> 1142             return self._get_slice_axis(key, axis=axis)
   1143         elif com.is_bool_indexer(key):
   1144             return self._getbool_axis(key, axis=axis)


/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in _get_slice_axis(self, slice_obj, axis)
   1174 
   1175         labels = obj._get_axis(axis)
-> 1176         indexer = labels.slice_indexer(slice_obj.start, slice_obj.stop, slice_obj.step)
   1177 
   1178         if isinstance(indexer, slice):


/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in slice_indexer(self, start, end, step, kind)
   5683         slice(1, 3, None)
   5684         """
-> 5685         start_slice, end_slice = self.slice_locs(start, end, step=step)
   5686 
   5687         # return a slice


/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in slice_locs(self, start, end, step, kind)
   5885         start_slice = None
   5886         if start is not None:
-> 5887             start_slice = self.get_slice_bound(start, "left")
   5888         if start_slice is None:
   5889             start_slice = 0


/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_slice_bound(self, label, side, kind)
   5795         # For datetime indices label may be a string that has to be converted
   5796         # to datetime boundary according to its resolution.
-> 5797         label = self._maybe_cast_slice_bound(label, side)
   5798 
   5799         # we need to look up the label


/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in _maybe_cast_slice_bound(self, label, side, kind)
   5747         # reject them, if index does not contain label
   5748         if (is_float(label) or is_integer(label)) and label not in self._values:
-> 5749             raise self._invalid_indexer("slice", label)
   5750 
   5751         return label


TypeError: cannot do slice indexing on Index with these indexers [0] of type int

Sorting

  • sort_values()
juice.sort_values(by = ['Revenue']).head(3)  # ascending order (the default)

Date Location Lemon Orange Temperature Leaflets Price Sold Revenue location
0 7/1/2016 Park 97 67 70 90.0 0.25 164 41.00 Beach
1 7/2/2016 Park 98 67 72 90.0 0.25 165 41.25 Beach
6 7/6/2016 Beach 103 69 82 90.0 0.25 172 43.00 Beach

</div>
juice.sort_values(by = ['Revenue'], ascending=False).head(3)  # descending order

Date Location Lemon Orange Temperature Leaflets Price Sold Revenue location
25 7/25/2016 Park 156 113 84 135.0 0.50 269 134.50 Beach
18 7/18/2016 Park 131 92 81 122.0 0.50 223 111.50 Beach
26 7/26/2016 Park 176 129 83 158.0 0.35 305 106.75 Beach

</div>
juice.sort_values(by = ['Price', 'Temperature'], ascending=False)  # rows are grouped by Price (0.50 first, then 0.35, then 0.25) and sorted by Temperature within each price, both descending

Date Location Lemon Orange Temperature Leaflets Price Sold Revenue location
25 7/25/2016 Park 156 113 84 135.0 0.50 269 134.50 Beach
24 7/24/2016 Park 121 82 82 117.0 0.50 203 101.50 Beach
18 7/18/2016 Park 131 92 81 122.0 0.50 223 111.50 Beach
23 7/23/2016 Park 120 82 81 117.0 0.50 202 101.00 Beach
22 7/22/2016 Park 112 75 80 108.0 0.50 187 93.50 Beach
19 7/19/2016 Park 122 85 78 113.0 0.50 207 103.50 Beach
17 7/17/2016 Beach 115 76 77 126.0 0.50 191 95.50 Beach
21 7/21/2016 Park 83 50 77 90.0 0.50 133 66.50 Beach
15 7/15/2016 Beach 98 62 75 108.0 0.50 160 80.00 Beach
16 7/16/2016 Beach 81 50 74 90.0 0.50 131 65.50 Beach
20 7/20/2016 Park 71 42 70 NaN 0.50 113 56.50 Beach
26 7/26/2016 Park 176 129 83 158.0 0.35 305 106.75 Beach
28 7/28/2016 Park 96 63 82 90.0 0.35 159 55.65 Beach
30 7/30/2016 Beach 88 57 82 81.0 0.35 145 50.75 Beach
31 7/31/2016 Beach 76 47 82 68.0 0.35 123 43.05 Beach
29 7/29/2016 Park 100 66 81 95.0 0.35 166 58.10 Beach
27 7/27/2016 Park 104 68 80 99.0 0.35 172 60.20 Beach
12 7/12/2016 Beach 130 95 84 99.0 0.25 225 56.25 Beach
11 7/11/2016 Beach 162 120 83 135.0 0.25 282 70.50 Beach
5 7/6/2016 Beach 103 69 82 90.0 0.25 172 43.00 Beach
6 7/6/2016 Beach 103 69 82 90.0 0.25 172 43.00 Beach
8 NaN Beach 123 86 82 113.0 0.25 209 52.25 Beach
10 7/10/2016 Beach 140 98 82 131.0 0.25 238 59.50 Beach
7 7/7/2016 Beach 143 101 81 135.0 0.25 244 61.00 Beach
9 7/9/2016 Beach 134 95 80 126.0 0.25 229 57.25 Beach
4 7/5/2016 Beach 159 118 78 135.0 0.25 277 69.25 Beach
14 7/14/2016 Beach 122 85 78 113.0 0.25 207 51.75 Beach
13 7/13/2016 Beach 109 75 77 99.0 0.25 184 46.00 Beach
3 7/4/2016 Beach 134 99 76 98.0 0.25 233 58.25 Beach
1 7/2/2016 Park 98 67 72 90.0 0.25 165 41.25 Beach
2 7/3/2016 Park 110 77 71 104.0 0.25 187 46.75 Beach
0 7/1/2016 Park 97 67 70 90.0 0.25 164 41.00 Beach

</div>
# Price in descending order, Temperature in ascending order
juice.sort_values(by = ['Price', 'Temperature'], ascending=[False, True])

Date Location Lemon Orange Temperature Leaflets Price Sold Revenue location
20 7/20/2016 Park 71 42 70 NaN 0.50 113 56.50 Beach
16 7/16/2016 Beach 81 50 74 90.0 0.50 131 65.50 Beach
15 7/15/2016 Beach 98 62 75 108.0 0.50 160 80.00 Beach
17 7/17/2016 Beach 115 76 77 126.0 0.50 191 95.50 Beach
21 7/21/2016 Park 83 50 77 90.0 0.50 133 66.50 Beach
19 7/19/2016 Park 122 85 78 113.0 0.50 207 103.50 Beach
22 7/22/2016 Park 112 75 80 108.0 0.50 187 93.50 Beach
18 7/18/2016 Park 131 92 81 122.0 0.50 223 111.50 Beach
23 7/23/2016 Park 120 82 81 117.0 0.50 202 101.00 Beach
24 7/24/2016 Park 121 82 82 117.0 0.50 203 101.50 Beach
25 7/25/2016 Park 156 113 84 135.0 0.50 269 134.50 Beach
27 7/27/2016 Park 104 68 80 99.0 0.35 172 60.20 Beach
29 7/29/2016 Park 100 66 81 95.0 0.35 166 58.10 Beach
28 7/28/2016 Park 96 63 82 90.0 0.35 159 55.65 Beach
30 7/30/2016 Beach 88 57 82 81.0 0.35 145 50.75 Beach
31 7/31/2016 Beach 76 47 82 68.0 0.35 123 43.05 Beach
26 7/26/2016 Park 176 129 83 158.0 0.35 305 106.75 Beach
0 7/1/2016 Park 97 67 70 90.0 0.25 164 41.00 Beach
2 7/3/2016 Park 110 77 71 104.0 0.25 187 46.75 Beach
1 7/2/2016 Park 98 67 72 90.0 0.25 165 41.25 Beach
3 7/4/2016 Beach 134 99 76 98.0 0.25 233 58.25 Beach
13 7/13/2016 Beach 109 75 77 99.0 0.25 184 46.00 Beach
4 7/5/2016 Beach 159 118 78 135.0 0.25 277 69.25 Beach
14 7/14/2016 Beach 122 85 78 113.0 0.25 207 51.75 Beach
9 7/9/2016 Beach 134 95 80 126.0 0.25 229 57.25 Beach
7 7/7/2016 Beach 143 101 81 135.0 0.25 244 61.00 Beach
5 7/6/2016 Beach 103 69 82 90.0 0.25 172 43.00 Beach
6 7/6/2016 Beach 103 69 82 90.0 0.25 172 43.00 Beach
8 NaN Beach 123 86 82 113.0 0.25 209 52.25 Beach
10 7/10/2016 Beach 140 98 82 131.0 0.25 238 59.50 Beach
11 7/11/2016 Beach 162 120 83 135.0 0.25 282 70.50 Beach
12 7/12/2016 Beach 130 95 84 99.0 0.25 225 56.25 Beach

</div>
# After sorting (or otherwise changing the row order), use reset_index to renumber the rows; drop=True discards the old index
juice2 = juice.sort_values(by = ['Price', 'Temperature'], ascending=[False, True]).reset_index(drop=True)
juice2

Date Location Lemon Orange Temperature Leaflets Price Sold Revenue location
0 7/20/2016 Park 71 42 70 NaN 0.50 113 56.50 Beach
1 7/16/2016 Beach 81 50 74 90.0 0.50 131 65.50 Beach
2 7/15/2016 Beach 98 62 75 108.0 0.50 160 80.00 Beach
3 7/17/2016 Beach 115 76 77 126.0 0.50 191 95.50 Beach
4 7/21/2016 Park 83 50 77 90.0 0.50 133 66.50 Beach
5 7/19/2016 Park 122 85 78 113.0 0.50 207 103.50 Beach
6 7/22/2016 Park 112 75 80 108.0 0.50 187 93.50 Beach
7 7/18/2016 Park 131 92 81 122.0 0.50 223 111.50 Beach
8 7/23/2016 Park 120 82 81 117.0 0.50 202 101.00 Beach
9 7/24/2016 Park 121 82 82 117.0 0.50 203 101.50 Beach
10 7/25/2016 Park 156 113 84 135.0 0.50 269 134.50 Beach
11 7/27/2016 Park 104 68 80 99.0 0.35 172 60.20 Beach
12 7/29/2016 Park 100 66 81 95.0 0.35 166 58.10 Beach
13 7/28/2016 Park 96 63 82 90.0 0.35 159 55.65 Beach
14 7/30/2016 Beach 88 57 82 81.0 0.35 145 50.75 Beach
15 7/31/2016 Beach 76 47 82 68.0 0.35 123 43.05 Beach
16 7/26/2016 Park 176 129 83 158.0 0.35 305 106.75 Beach
17 7/1/2016 Park 97 67 70 90.0 0.25 164 41.00 Beach
18 7/3/2016 Park 110 77 71 104.0 0.25 187 46.75 Beach
19 7/2/2016 Park 98 67 72 90.0 0.25 165 41.25 Beach
20 7/4/2016 Beach 134 99 76 98.0 0.25 233 58.25 Beach
21 7/13/2016 Beach 109 75 77 99.0 0.25 184 46.00 Beach
22 7/5/2016 Beach 159 118 78 135.0 0.25 277 69.25 Beach
23 7/14/2016 Beach 122 85 78 113.0 0.25 207 51.75 Beach
24 7/9/2016 Beach 134 95 80 126.0 0.25 229 57.25 Beach
25 7/7/2016 Beach 143 101 81 135.0 0.25 244 61.00 Beach
26 7/6/2016 Beach 103 69 82 90.0 0.25 172 43.00 Beach
27 7/6/2016 Beach 103 69 82 90.0 0.25 172 43.00 Beach
28 NaN Beach 123 86 82 113.0 0.25 209 52.25 Beach
29 7/10/2016 Beach 140 98 82 131.0 0.25 238 59.50 Beach
30 7/11/2016 Beach 162 120 83 135.0 0.25 282 70.50 Beach
31 7/12/2016 Beach 130 95 84 99.0 0.25 225 56.25 Beach

</div>
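
To see what `drop=True` changes, the following sketch sorts a hypothetical four-row `demo` frame the same way and resets the index both with and without it:

```python
import pandas as pd

# Hypothetical stand-in for `juice`
demo = pd.DataFrame({
    "Price": [0.25, 0.5, 0.35, 0.5],
    "Temperature": [70, 84, 80, 74],
})

# Price descending, Temperature ascending within each price
sorted_demo = demo.sort_values(by=["Price", "Temperature"], ascending=[False, True])

renumbered = sorted_demo.reset_index(drop=True)  # old index is discarded
kept = sorted_demo.reset_index()                 # old index is kept as an "index" column

print(sorted_demo.index.tolist())
print(renumbered.index.tolist())
print(kept.columns.tolist())
```

Keeping the old index can be useful when the sorted rows need to be traced back to their original positions.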

groupby()

  • Summarizes data (similar to a pivot table)
  • Comparable to group_by() %>% summarize() in R's dplyr
juice.groupby(by = 'Location').count()

Date Lemon Orange Temperature Leaflets Price Sold Revenue location
Location
Beach 16 17 17 17 17 17 17 17 17
Park 15 15 15 15 14 15 15 15 15

</div>
import numpy as np

# string names avoid the deprecation warning newer pandas emits for max, min, sum, np.mean
juice.groupby(['Location'])['Revenue'].agg(['max', 'min', 'sum', 'mean'])

max min sum mean
Location
Beach 95.5 43.0 1002.8 58.988235
Park 134.5 41.0 1178.2 78.546667

</div>


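  • How to install Python libraries (vs. R)
# R: install.packages("package_name")
# Python libraries are not installed by running code inside the script (x)
# Install them from the terminal
# Option 1. conda install
# --> install Anaconda first, then use conda (data science)
# Option 2. pip install (development + data science + everything else)
# --> no Anaconda / plain Python installation only

# Open Git Bash and run: pip install numpy
# pip install numpy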

Importing the NumPy library

import numpy
print(numpy.__version__)
1.21.5
import numpy as np
print(np.__version__)
1.21.5

Converting to an array

  • Create a list of the numbers 1 through 10.
  • Convert it to a NumPy array and store it.
temp = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
arr = np.array(temp)
print(arr)
print(temp)
[ 1  2  3  4  5  6  7  8  9 10]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(type(temp))
print(type(arr))
<class 'list'>
<class 'numpy.ndarray'>
  • Slice arr from the number 5 onward (indices 4 through 7)
arr[4:8]
array([5, 6, 7, 8])
  • Use NumPy's basic statistics functions.
np.mean(arr)
np.sum(arr)
np.median(arr)
np.std(arr)  # a notebook cell displays only the last expression, so only this result is shown
2.8722813232690143
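The value shown for np.std is the population standard deviation (NumPy divides by N unless ddof is set). As a cross-check that needs only the standard library, the sketch below recomputes the same statistics with Python's statistics module. The variable names are illustrative and not from the original post.

```python
import statistics

values = list(range(1, 11))  # same data as arr: 1 through 10

total = sum(values)
mean = statistics.mean(values)
median = statistics.median(values)
pop_std = statistics.pstdev(values)    # population std: divides by N, like np.std's default
sample_std = statistics.stdev(values)  # sample std: divides by N - 1

print(total, mean, median)
print(pop_std)
print(sample_std)
```

The population value matches the NumPy output above, while the sample value is slightly larger because it divides by N - 1.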

Arithmetic operations

math_scores = [90, 80, 88]
english_scores = [80, 70, 90]

# + on plain lists concatenates them; it does not add element by element
total_scores = math_scores + english_scores
total_scores
[90, 80, 88, 80, 70, 90]
math_scores = [90, 80, 88]
english_scores = [80, 70, 90]

math_arr = np.array(math_scores)
english_arr = np.array(english_scores)

# + on NumPy arrays adds element by element
total_scores = math_arr + english_arr
total_scores
array([170, 150, 178])
np.min(total_scores)
150
np.max(total_scores)
178
math_scores = [2, 3, 4]
english_scores = [1, 2, 3]

math_arr = np.array(math_scores)
english_arr = np.array(english_scores)

# arithmetic operations
print("add:", np.add(math_arr, english_arr))
print("subtract:", np.subtract(math_arr, english_arr))
print("multiply:", np.multiply(math_arr, english_arr))
print("divide:", np.divide(math_arr, english_arr))
print("power:", np.power(math_arr, english_arr))
add: [3 5 7]
subtract: [1 1 1]
multiply: [ 2  6 12]
divide: [2.         1.5        1.33333333]
power: [ 2  9 64]
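For comparison, element-wise addition can also be written without NumPy by pairing the two lists with zip. This is a minimal plain-Python sketch of the same score example. The names concatenated and elementwise are illustrative.

```python
math_scores = [90, 80, 88]
english_scores = [80, 70, 90]

# list + list joins the two lists end to end
concatenated = math_scores + english_scores

# zip pairs items by position, so each pair can be summed
elementwise = [m + e for m, e in zip(math_scores, english_scores)]

print(concatenated)
print(elementwise)
```

NumPy arrays give the element-wise behavior directly with +, which is why the array version above is shorter.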

Creating arrays

  • How to create arrays with 0 through 3 dimensions
temp_arr = np.array(20)
print(temp_arr)
print(type(temp_arr))
print(temp_arr.shape)
20
<class 'numpy.ndarray'>
()
# 1D array
temp_arr = np.array([1, 2, 3])
print(temp_arr)
print(type(temp_arr))
print(temp_arr.shape)
print(temp_arr.ndim) # how to check the number of dimensions
[1 2 3]
<class 'numpy.ndarray'>
(3,)
1
# 2D array
temp_arr = np.array([[1, 2, 3], [4, 5, 6]])
print(temp_arr)
print(type(temp_arr))
print(temp_arr.shape) # a 2 x 3 array
print(temp_arr.ndim)
[[1 2 3]
 [4 5 6]]
<class 'numpy.ndarray'>
(2, 3)
2
# 3D array
temp_arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(temp_arr)
print(type(temp_arr))
print(temp_arr.shape)
print(temp_arr.ndim)
[[[1 2 3]
  [4 5 6]]

 [[1 2 3]
  [4 5 6]]]
<class 'numpy.ndarray'>
(2, 2, 3)
3
temp_arr = np.array([1, 2, 3, 4], ndmin = 2) # ndmin = 2 forces the result to have at least 2 dimensions
print(temp_arr)
print(type(temp_arr))
print(temp_arr.shape)
print(temp_arr.ndim)
[[1 2 3 4]]
<class 'numpy.ndarray'>
(1, 4)
2

Rounding decimals

temp_arr = np.trunc([-1.23, 1.23])
temp_arr # the fractional part is truncated
array([-1.,  1.])
temp_arr = np.fix([-1.23, 1.23]) # rounds toward zero, same result as trunc here
temp_arr
array([-1.,  1.])
temp_arr = np.around([-1.23789, 1.23789], 4)
temp_arr
array([-1.2379,  1.2379])
temp_arr = np.round([-1.23789, 1.23789], 4)
temp_arr
array([-1.2379,  1.2379])
temp_arr = np.floor([-1.23789, 1.23789]) # round down
temp_arr
array([-2.,  1.])
temp_arr = np.ceil([-1.23789, 1.23789]) # round up
temp_arr
array([-1.,  2.])
  • shape gives the length of the array along each axis
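The difference between truncating, flooring, and taking the ceiling only shows up for negative numbers. The sketch below repeats the comparison with the standard-library math module, which works on single values and returns int rather than float. The list names are illustrative.

```python
import math

samples = (-1.23, 1.23)

truncated = [math.trunc(v) for v in samples]  # drop the fractional part (toward zero)
floored = [math.floor(v) for v in samples]    # largest integer <= v
ceiled = [math.ceil(v) for v in samples]      # smallest integer >= v
rounded = [round(v, 4) for v in (-1.23789, 1.23789)]  # keep 4 decimal places

print(truncated)
print(floored)
print(ceiled)
print(rounded)
```

For -1.23, truncation gives -1 while floor gives -2, which mirrors the np.trunc and np.floor results above.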

Various ways to create arrays

temp_arr = np.arange(5)
temp_arr
array([0, 1, 2, 3, 4])
temp_arr = np.arange(1, 11, 3)
temp_arr
array([ 1,  4,  7, 10])
zero_arr = np.zeros((2, 3))
print(zero_arr)
print(type(zero_arr))
print(zero_arr.shape)
print(zero_arr.ndim)
print(zero_arr.dtype) # float64: the 64 is the number of bits
[[0. 0. 0.]
 [0. 0. 0.]]
<class 'numpy.ndarray'>
(2, 3)
2
float64
temp_arr = np.ones((4, 5), dtype = "int32") # the data type can also be set explicitly
print(temp_arr)
print(type(temp_arr))
print(temp_arr.shape)
print(temp_arr.ndim)
print(temp_arr.dtype)
[[1 1 1 1 1]
 [1 1 1 1 1]
 [1 1 1 1 1]
 [1 1 1 1 1]]
<class 'numpy.ndarray'>
(4, 5)
2
int32
temp_arr = np.ones((2, 6), dtype = "int32")
temp_res_arr = temp_arr.reshape(2, 2, 3) # reshape(5, 3) would fail: cannot reshape array of size 12 into shape (5,3)
print(temp_res_arr) # the new shape must also hold 12 elements, e.g. (4, 3) or (3, 4)
print(type(temp_res_arr))
print(temp_res_arr.shape)
print(temp_res_arr.ndim)
print(temp_res_arr.dtype)
[[[1 1 1]
  [1 1 1]]

 [[1 1 1]
  [1 1 1]]]
<class 'numpy.ndarray'>
(2, 2, 3)
3
int32
temp_arr = np.ones((12, 12), dtype = "int32")
temp_res_arr = temp_arr.reshape(5, -1) # 12 * 12 = 144 elements; 5 does not divide 144, so this raises ValueError
print(temp_res_arr)
print(type(temp_res_arr))
print(temp_res_arr.shape)
print(temp_res_arr.ndim)
print(temp_res_arr.dtype)
---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-60-dfc75cfbf69a> in <module>()
      1 temp_arr = np.ones((12, 12), dtype = "int32")
----> 2 temp_res_arr = temp_arr.reshape(5, -1)
      3 print(temp_res_arr)
      4 print(type(temp_res_arr))
      5 print(temp_res_arr.shape)


ValueError: cannot reshape array of size 144 into shape (5,newaxis)
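The -1 in reshape asks NumPy to work out the missing dimension, which is only possible when the known dimensions divide the total size evenly. The plain-Python sketch below illustrates that rule. infer_missing_dim is a hypothetical helper written for this note, not a NumPy function, and its error message is made up.

```python
def infer_missing_dim(total_size, known_dims):
    """Hypothetical helper: the length a -1 placeholder would have to take."""
    known_product = 1
    for dim in known_dims:
        known_product *= dim
    if total_size % known_product != 0:
        # same situation as the ValueError in the traceback above
        raise ValueError(
            f"size {total_size} is not divisible by {known_product}"
        )
    return total_size // known_product


print(infer_missing_dim(144, [6]))    # a (12, 12) array reshaped with (6, -1)
print(infer_missing_dim(12, [2, 2]))  # a 12-element array reshaped with (2, 2, -1)

try:
    infer_missing_dim(144, [5])       # the failing case from the post
except ValueError as err:
    print("error:", err)
```

Under this rule, reshape(6, -1), reshape(8, -1), or reshape(9, -1) would work for the 144-element array, because 6, 8, and 9 all divide 144.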

NumPy conditional expressions

  • np.where()
temp_arr = np.arange(10)
temp_arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# values below 5 are returned unchanged
# values of 5 or more are multiplied by 10
np.where(temp_arr < 5, temp_arr, temp_arr * 10)
array([ 0,  1,  2,  3,  4, 50, 60, 70, 80, 90])
# build an array from 0 to 100; multiply values below 50 by 10 and return the rest unchanged
# np.where fits cases that need only a single condition
temp_arr = np.arange(101)
# temp_arr
np.where(temp_arr < 50, temp_arr * 10, temp_arr)
array([  0,  10,  20,  30,  40,  50,  60,  70,  80,  90, 100, 110, 120,
       130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250,
       260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380,
       390, 400, 410, 420, 430, 440, 450, 460, 470, 480, 490,  50,  51,
        52,  53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,
        65,  66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,
        78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,
        91,  92,  93,  94,  95,  96,  97,  98,  99, 100])
  • np.select()
temp_arr = np.arange(10)
temp_arr

# values greater than 5 are multiplied by 2; values less than 2 get 100 added
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
condlist = [temp_arr > 5, temp_arr < 2]
choicelist = [temp_arr * 2, temp_arr + 100]
np.select(condlist, choicelist, default = temp_arr)
array([100, 101,   2,   3,   4,   5,  12,  14,  16,  18])
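np.select checks the conditions in order and uses the choice that belongs to the first condition that is true, falling back to default when none match. The plain-Python sketch below mimics that behavior for a flat list. select_first_match is a hypothetical helper for illustration and is not how NumPy implements it.

```python
def select_first_match(values, conditions, choices, default):
    """Hypothetical plain-Python analogue of np.select for a flat list."""
    result = []
    for value in values:
        for condition, choice in zip(conditions, choices):
            if condition(value):
                result.append(choice(value))  # first matching condition wins
                break
        else:
            result.append(default(value))     # no condition matched
    return result


values = list(range(10))
selected = select_first_match(
    values,
    conditions=[lambda v: v > 5, lambda v: v < 2],
    choices=[lambda v: v * 2, lambda v: v + 100],
    default=lambda v: v,
)
print(selected)
```

The printed list has the same values as the np.select output above.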


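Why create classes?

  • To keep code concise
    • To reuse code
  • Many libraries are implemented as classes
    • For example, the list class and the str class
    • They are used as objects
    • Objects are bound to variable names
  • Many classes together make up a library.
    • Django / web development / machine learning / visualization / data preprocessing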

Creating instance methods

  • Built-in examples: list.append(), list.extend()
class Person:      # class names are written in CapWords style

    # class attribute
    country = "korean"

    # instance attribute
    def __init__(self, name, age):  # __init__(self, ...) is the standard initializer
        self.name = name
        self.age = age

    # instance method
    def singing(self, songtitle, sales):
        return "{} sings {}, which sold {} copies.".format(self.name, songtitle, sales)

if __name__ == "__main__":
    kim = Person("Kim", 100)
    lee = Person("Lee", 100)

    # access class attribute
    print("kim is {}".format(kim.__class__.country))
    print("lee is {}".format(lee.__class__.country))

    # call instance method
    print(kim.singing("A", 10))
    print(lee.singing("B", 200))
kim is korean
lee is korean
Kim sings A, which sold 10 copies.
Lee sings B, which sold 200 copies.
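A class attribute is shared by every instance until an instance gets its own attribute with the same name. The sketch below shows that difference with a trimmed-down Person class. The values "japanese" and "american" are made up for the demonstration.

```python
class Person:
    country = "korean"  # class attribute, shared by all instances

    def __init__(self, name):
        self.name = name  # instance attribute, one per object


kim = Person("Kim")
lee = Person("Lee")

kim.country = "japanese"     # creates an instance attribute on kim only
Person.country = "american"  # changes the shared class attribute

print(kim.country)     # kim's own value shadows the class attribute
print(lee.country)     # lee still reads the class attribute
print(Person.country)
```

Assigning through an instance never modifies the class attribute; it only adds an entry to that instance's own namespace.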

Class inheritance

class Parent:
    # instance attribute
    def __init__(self, name, age):
        self.name = name
        self.age = age

    # instance methods
    def whoAmI(self):
        print(" I am Parent!!")

    def singing(self, songtitle):
        return "{} sings {}.".format(self.name, songtitle)

    def dancing(self):
        return "{} is dancing now.".format(self.name)

class Child(Parent):
    def __init__(self, name, age):
        # super() function
        super().__init__(name, age)
        print("Child Class is ON")

    def whoAmI(self):
        print("I am Child")

    def studying(self):
        print("I am Fast Runner")


if __name__ == "__main__":
    child_kim = Child("kim", 15)
    parent_kim = Parent("kim", 45)
    print(child_kim.dancing())
    print(child_kim.singing("Romance"))
    # print(parent_kim.studying())  # AttributeError: 'Parent' object has no attribute 'studying' (it is defined only on Child)
    child_kim.whoAmI()
    parent_kim.whoAmI()
Child Class is ON
kim is dancing now.
kim sings Romance.
I am Child
 I am Parent!!
class TV:

    # init constructor
    def __init__(self):
        self.__maxprice = 500

    def sell(self):
        print("Selling Price: {}".format(self.__maxprice))

    def setMaxPrice(self, price):
        self.__maxprice = price

if __name__ == "__main__":
    tv = TV()
    tv.sell()

    # change price
    # example of code that does NOT change the price:
    # the double-underscore attribute is name-mangled inside the class
    tv.__maxprice = 1000
    tv.sell()

    # setMaxPrice
    # the setter method can update the value with input from outside the class
    tv.setMaxPrice(1000)
    tv.sell()
Selling Price: 500
Selling Price: 500
Selling Price: 1000
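The second call still prints 500 because of name mangling: inside the class body, self.__maxprice is stored under the name _TV__maxprice, so the outside assignment tv.__maxprice = 1000 creates a separate attribute. The sketch below makes both attributes visible. The price() method is a hypothetical stand-in for sell() that returns the value instead of printing it.

```python
class TV:
    def __init__(self):
        self.__maxprice = 500  # stored on the instance as _TV__maxprice

    def price(self):  # hypothetical accessor used only for this demonstration
        return self.__maxprice


tv = TV()
tv.__maxprice = 1000  # outside the class: a new attribute literally named __maxprice

print(sorted(vars(tv)))  # both attribute names exist side by side
print(tv.price())        # the class still reads _TV__maxprice
```

Name mangling is a naming convention rather than real access control: tv._TV__maxprice is still reachable from outside the class.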

Conditionals inside a class

  • Let's put a conditional in the __init__ constructor!
class Employee:

    # init constructor
    # name, salary
    def __init__(self, name, salary = 0):
        self.name = name

        # added conditional
        if salary > 0:
            self.salary = salary
        else:
            self.salary = 0
            print("Salary must be greater than 0. Please enter it again!!")

    def update_salary(self, amount):
        self.salary += amount

    def weekly_salary(self):
        return self.salary / 7

if __name__ == "__main__":
    emp01 = Employee("Winters", -50000)
    print(emp01.name)
    print(emp01.salary)
    emp01.salary += 1500
    print(emp01.salary)
    emp01.update_salary(3000)
    print(emp01.salary)
    week_salary = emp01.weekly_salary()
    print(week_salary)
Salary must be greater than 0. Please enter it again!!
Winters
0
1500
4500
642.8571428571429
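The version above prints a warning and silently stores 0. One possible alternative, which is not what the post does, is to reject the invalid value by raising an exception so the caller has to handle it. The sketch below assumes salary is a required argument. The error message text is made up.

```python
class Employee:
    def __init__(self, name, salary):
        # assumption for this sketch: an invalid salary is an error, not a warning
        if salary <= 0:
            raise ValueError("salary must be greater than 0")
        self.name = name
        self.salary = salary


try:
    Employee("Winters", -50000)
except ValueError as err:
    message = str(err)
    print("rejected:", message)

emp = Employee("Winters", 50000)
print(emp.salary)
```

Which behavior is better depends on the program: printing keeps a notebook running, while raising keeps an object with a wrong salary from being created at all.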

Class docstrings

class Person:
    """
    A class that represents a person



    Attributes
    ------------
    name : str
      name of the person

    age : int
      age of the person

    Methods
    -------------

    info(additional=None):
      prints the person's name and age
    """
    def __init__(self, name, age):
        """
        Constructs all the necessary attributes for the person object

        Parameters
        -------------------------
          name : str
            name of the person

          age : int
            age of the person
        """

        self.name = name
        self.age = age

    def info(self, additional = None):
        """
        Prints the person's name and age, followed by any additional text

        Parameters
        --------------
          additional : str, optional
            more info to be displayed (Default is None) / A, B, C


        Returns
        -----------
          None

        """

        # fall back to an empty string so the default None does not break the concatenation
        print(f'My name is {self.name}. I am {self.age} years old. ' + (additional or ""))

if __name__ == "__main__":
    person = Person("Evan", age = 20)
    person.info("I work at 00.")
    help(Person)
My name is Evan. I am 20 years old. I work at 00.
Help on class Person in module __main__:

class Person(builtins.object)
 |  Person(name, age)
 |  
 |  A class that represents a person
 |  
 |  
 |  
 |  Attributes
 |  ------------
 |  name : str
 |    name of the person
 |  
 |  age : int
 |    age of the person
 |  
 |  Methods
 |  -------------
 |  
 |  info(additional=None):
 |    prints the person's name and age
 |  
 |  Methods defined here:
 |  
 |  __init__(self, name, age)
 |      Constructs all the necessary attributes for the person object
 |      
 |      Parameters
 |      -------------------------
 |        name : str
 |          name of the person
 |      
 |        age : int
 |          age of the person
 |  
 |  info(self, additional=None)
 |      Prints the person's name and age, followed by any additional text
 |      
 |      Parameters
 |      --------------
 |        additional : str, optional
 |          more info to be displayed (Default is None) / A, B, C
 |      
 |      
 |      Returns
 |      -----------
 |        None
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)

Basic syntax review

# list
book_list = ["A","B","C"]
# append, extend, insert, remove, pop, etc.

# tuple
book_tuple = ("A", "B", "c")
# items cannot be modified or deleted

# dictionary
book_dictionary = {"title" : ["A", "B"], "year_published" :[2011, 2002]}
# keys(), values(), items(), get()

Conditionals & loops

if True:
    print("run code")
elif True:
    print("run code")
else:
    print("run code")
for i in range(3):
    print(i+1, "Hello")
1 Hello
2 Hello
3 Hello
book_list = ["프로그래밍 R", "혼자 공부하는 머신러닝"]
for book in book_list:
    print(book)
프로그래밍 R
혼자 공부하는 머신러닝
strings01 = "Hello World"
for char in strings01:
    print(char)
H
e
l
l
o
 
W
o
r
l
d
num_tuple = (1, 2, 3, 4)
for num in num_tuple:
    print(num)
1
2
3
4
num_dict = {"A" : 1, "B" : 2}
for num in num_dict:
    # print(num)  # iterating over a dict yields its keys, not its values
    print(num_dict[num])
1
2
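Looking up num_dict[num] inside the loop works, but a dictionary can also hand over keys and values together. The sketch below uses the built-in items() and values() methods on the same dictionary. The names pairs and values_only are illustrative.

```python
num_dict = {"A": 1, "B": 2}

pairs = []
for key, value in num_dict.items():  # each item is a (key, value) tuple
    print(key, value)
    pairs.append((key, value))

values_only = list(num_dict.values())  # just the values, in insertion order
print(values_only)
```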

Why loops are needed

product_name = ["yogurt", "milk", "snack"]
prices = [1000, 1500, 2000]
quantities = [5, 3, 4]
a = [1, 2, 3]
# name = product_name[0]
# sales = prices[0] * quantities[0]
# print(name + " sales: " + str(sales) + " won")

# name = product_name[1]
# sales = prices[1] * quantities[1]
# print(name + " sales: " + str(sales) + " won")

# a loop removes the need to repeat the code above for every product
for i in range(len(product_name)):
    name = product_name[i]
    sales = prices[i] * quantities[i]
    print(name + " sales: " + str(sales) + " won")
yogurt sales: 5000 won
milk sales: 4500 won
snack sales: 8000 won

while

  • A loop controlled by a condition
count = 1
while count <= 5:
    print("Hello..")
    count += 1
    print(count)

print("count is now over 5..")
Hello..
2
Hello..
3
Hello..
4
Hello..
5
Hello..
6
count is now over 5..
count = 3
while count > 0:
    print("Hello..")
    count -= 1
    print(count)
Hello..
2
Hello..
1
Hello..
0

List comprehensions

  • Write a for loop on a single line
my_list = [[10], [20, 30]]
# print(my_list)

flattened_list = []
for value_list in my_list:
    # print(value_list)
    for value in value_list:
        # print(value)
        flattened_list.append(value)

print(flattened_list)
# result: [10, 20, 30]
[10, 20, 30]
my_list = [[10], [20, 30]]
flattened_list = [value for value_list in my_list for value in value_list]
print(flattened_list)
[10, 20, 30]
letters = []
for char in "helloworld":
    letters.append(char)
print(letters)
['h', 'e', 'l', 'l', 'o', 'w', 'o', 'r', 'l', 'd']
letters2 = [char for char in "helloworld"]
print(letters2)
['h', 'e', 'l', 'l', 'o', 'w', 'o', 'r', 'l', 'd']

User-Defined Functions

def function_name():
    # code to run
    return

function_name()
  • Example of saving the code as basic.py
#!/usr/local/bin/python
# -*- coding: utf-8 -*-
def temp(content, letter):
    """Counts how many times a character appears in content.

    Args:
      content(str) : the string to search
      letter(str) : the character to look for

    Returns:
      int
    """

    print("function test")

    cnt = len([char for char in content if char == letter])
    return cnt

if __name__ == "__main__":
    help(temp)
    docstring = temp.__doc__ # the docstring documents the function
    print(docstring)
Help on function temp in module __main__:

temp(content, letter)
    Counts how many times a character appears in content.
    
    Args:
      content(str) : the string to search
      letter(str) : the character to look for
    
    Returns:
      int

Counts how many times a character appears in content.

    Args:
      content(str) : the string to search
      letter(str) : the character to look for

    Returns:
      int
    
value_list = [1, 2, 3, 4, 5, 6]
print("avg:", sum(value_list) / len(value_list))

# median
midpoint = int(len(value_list) / 2)
# len(value_list) % 2 == 0:
print((value_list[midpoint - 1] + value_list[midpoint]) / 2)
print(value_list[midpoint])

def mean_and_median(value_list):
    """Compute the mean and median of a list of numbers.
    Args:
      value_list (iterable of int / float): A list of int numbers

    Return:
      tuple(float, float)
    """
    # sort a copy first: the median is only correct on ordered data
    value_list = sorted(value_list)
    # mean
    mean = sum(value_list) / len(value_list)
    # median
    midpoint = int(len(value_list) / 2)
    if len(value_list) % 2 == 0:
        median = (value_list[midpoint - 1] + value_list[midpoint]) / 2
    else:
        median = value_list[midpoint]

    return mean, median

if __name__ == "__main__":
    value_lists = [1, 1, 2, 2, 3, 4, 5]
    avg, median = mean_and_median(value_lists)
    print("avg:", avg)
    print("median:", median)
avg: 3.5
3.5
4
avg: 2.5714285714285716
median: 2
  • Related topics: decorators, mutable vs. immutable variables,
    context managers

Using function closures

  • The global keyword (changing a global variable)
x = 10
def foo():
    x = 20  # a new local x; the global x is not changed
    print(x)

print(x)
foo()
10
20
x = 10
def foo():
    global x # x now refers to the global variable
    x = 20
    print(x)

print(x)
foo()
10
20
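The examples above cover global variables. A closure itself is an inner function that remembers variables from the function that created it. The sketch below is a minimal example using the nonlocal keyword. make_counter and its variable names are hypothetical and not from the original post.

```python
def make_counter():
    count = 0  # lives in the enclosing scope of increment()

    def increment():
        nonlocal count  # rebind the enclosing variable instead of creating a local one
        count += 1
        return count

    return increment  # the returned function keeps access to count


counter_a = make_counter()
counter_b = make_counter()

first = counter_a()
second = counter_a()
other = counter_b()  # each closure has its own independent count

print(first, second, other)
```

Where global reaches the module-level variable, nonlocal reaches the nearest enclosing function's variable.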


print('hello world!')
hello world!

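Comments

  • Explain specific pieces of code while working on them
  • Also used when writing user-defined functions and classes (to write help text)

Variables (Scalar)

  • Implemented as objects
    • Each has a single type.
    • Each is defined by a class.
      • A variety of functions (methods) are available.

int

  • int is used to represent integers.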
num_int = 1
print(num_int)
print(type(num_int))
1
<class 'int'>

float

  • Used to represent real (floating-point) numbers.
num_float = 0.2
print(num_float)
print(type(num_float))
0.2
<class 'float'>

bool

  • Used to represent Boolean values, which are either True or False.
bool_true = True
print(bool_true)
print(type(bool_true))
True
<class 'bool'>

None

  • The type that represents null; it has only one value, None.
none_x = None
print(none_x)
print(type(none_x))
None
<class 'NoneType'>

Arithmetic operations

  • Arithmetic on integers
a = 2
b = 4
print('a + b = ', a + b)
print('a % b = ', a % b)
print('a / b = ', a / b) # division with / always returns a float
a + b =  6
a % b =  2
a / b =  0.5
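Since / always produces a float, Python has a separate operator for integer division. The sketch below contrasts /, //, and % on the same values and adds the built-in divmod. The variable names are illustrative.

```python
a = 2
b = 4

true_div = a / b    # true division, always a float
floor_div = a // b  # floor division, an int when both operands are ints
remainder = a % b   # remainder of the division

even_div = 4 / 2    # still a float, even though it divides evenly
quotient_and_remainder = divmod(7, 2)  # (7 // 2, 7 % 2) in one call

print(true_div, floor_div, remainder)
print(even_div, quotient_and_remainder)
print(type(true_div).__name__, type(floor_div).__name__)
```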

Logical operators

  • The bool type is defined by the values True and False
  • AND / OR
x = 5 > 4
y = 3 > 4
print(x and x)
print(x and y)
print(y and x)
print(y and y)
print("-----")
print(x or x)
print(x or y)
print(y or x)
print(y or y)
True
False
False
False
-----
True
True
True
False

Comparison operators

  • These are the inequality signs (such as > and <).
  • A comparison operator evaluates to True or False.

Applying logical & comparison operators

var = input("Please enter a value....")
print(type(var))
Please enter a value....5
<class 'str'>
  • Convert the type.
  • String, integer, float, and so on
var = int("1")
print(type(var))
<class 'int'>
var = int(input("Please enter a number: "))
print(type(var))
Please enter a number: 3
<class 'int'>
num1 = int(input("Please enter a number: "))
num2 = int(input("Please enter a number: "))

print(num1 > num2)
Please enter a number: 10
Please enter a number: 5
True
num1 = int(input("Please enter a number: "))
num2 = int(input("Please enter a number: "))
num3 = int(input("Please enter a number: "))
num4 = int(input("Please enter a number: "))

var1 = num1 >= num2 # True
var2 = num3 < num4 # True
print(var1 and var2)
print(var1 or var2)
Please enter a number: 20
Please enter a number: 15
Please enter a number: 3
Please enter a number: 5
True
True

Variables (Non-Scalar)

  • Entering strings
print("'Hello, World'")
print('"Hello, World"')
'Hello, World'
"Hello, World"

String operators

  • Let's try the addition operator.
str1 = "Hello "
str2 = "World!"
print(str1 + str2)
Hello World!

Indexing

  • String indexing picks out a specific character by its position within the string (positions start at 0)
greeting = "Hello Kaggle!"
print(greeting[6])
K

Lists

  • A sequence data type
  • The data is ordered and can be sliced
  • Written with square brackets ('[value]')
a = [] # empty list
a_func = list() # create an empty list
b = [1] # numbers can be elements
c = ['apple'] # strings can be elements too
d = [1, 2, ['apple']] # a list can hold another list as an element

print(a)
print(a_func)
print(b)
print(c)
print(d)
print(type(d))
[]
[]
[1]
['apple']
[1, 2, ['apple']]
<class 'list'>

List slicing

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(a[0])
1
a =[ ["apple", "banana", "cherry"], 1]
print(a[0][2][2])
e
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(a[::-1]) # reversed order
print(a[::2]) # every second element
[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
[1, 3, 5, 7, 9]

List operators

a = ["john", "evan"]
b = ["alice", "eva"]

c = a + b
print(c)
['john', 'evan', 'alice', 'eva']
c = a * 3
d = b * 0
print("a * 3 = ", c)
print("b * 0 = ", d)
a * 3 =  ['john', 'evan', 'john', 'evan', 'john', 'evan']
b * 0 =  []

Modifying and deleting list items

a = [0, 1, 2]
a[1] = "b"
print(a)
[0, 'b', 2]

Adding values to a list

a = [100, 200, 300]
a.append(400)
print(a)

# a.append([500, 600])
# print(a)

a.extend([500, 600])
print(a)
[100, 200, 300, 400]
[100, 200, 300, 400, 500, 600]
a = [0, 1, 2]
# a.insert(index, value_to_insert)
a.insert(1, 100)
print(a)
[0, 100, 1, 2]
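The commented-out a.append([500, 600]) in the earlier block hints at the difference between the two methods: append adds its argument as one element, while extend adds each item of the argument. The sketch below runs both on separate copies of the same list. The names appended and extended are illustrative.

```python
appended = [100, 200, 300]
appended.append([500, 600])  # the whole list becomes a single nested element

extended = [100, 200, 300]
extended.extend([500, 600])  # each item is added individually

print(appended, len(appended))
print(extended, len(extended))
```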

Removing values from a list

a = [1, 2, 3, 4, "A"]
a.remove(1)
print(a)
a.remove("A")
print(a)
[2, 3, 4, 'A']
[2, 3, 4]
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

del a[1] # delete by index
print(a)

del a[1:5]
print(a)
[1, 3, 4, 5, 6, 7, 8, 9, 10]
[1, 7, 8, 9, 10]
b = ["a", "b", "c", "d"]
x = b.pop()
print(x)
print(b)
d
['a', 'b', 'c']

Other methods

a = [0, 1, 2, 3]
print(a)

a.clear()
print(a)
[0, 1, 2, 3]
[]
a = ["a", "a", "b", "b"]
print(a.index("b")) # with duplicate values, index() returns the position of the first match
2
a = [1, 4, 5, 2, 3]
b = [1, 4, 5, 2, 3]

a.sort()
print("a.sort():", a)

# descending order with sort()
b.sort(reverse = True)
print("sort(reverse = True): ", b)
a.sort(): [1, 2, 3, 4, 5]
sort(reverse = True):  [5, 4, 3, 2, 1]
c = [4, 3, 2, 'a']
# c.sort() raises TypeError: numbers and strings cannot be sorted together
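Two details of sorting are easy to miss: list.sort() changes the list in place and returns None, while the built-in sorted() returns a new list. The sketch below shows both, and also one possible way to order a mixed list by comparing the items as text with key=str. That ordering is an assumption made for illustration, not something the post prescribes.

```python
a = [1, 4, 5, 2, 3]
result_of_sort = a.sort()  # sorts a in place; the return value is None

b = [1, 4, 5, 2, 3]
new_list = sorted(b, reverse=True)  # returns a new list; b is left unchanged

mixed = [4, 3, 2, 'a']
try:
    mixed.sort()  # int and str cannot be compared with <
except TypeError as err:
    error_name = type(err).__name__

by_text = sorted(mixed, key=str)  # compare everything as strings instead

print(result_of_sort, a)
print(new_list, b)
print(error_name, by_text)
```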

Tuples

  • Similar to a list.
  • Supports slicing, indexing, and so on
  • (vs. list): a tuple's items cannot be modified or deleted.
tuple1 = (0) # without a trailing comma (,)
tuple2 = (0,) # with a trailing comma (,)
tuple3 = 0, 1, 2
print(tuple1) # int
print(tuple2) # tuple
print(tuple3) # tuple
0
(0,)
(0, 1, 2)
a = (0, 1, 2, 3, 'a')
print(type(a))

# del a[4] TypeError: 'tuple' object doesn't support item deletion
# a[1] = "b"
<class 'tuple'>

Tuple indexing and slicing

a = (0, 1, 2, 3, "a")
print(a[1])
print(a[3])
print(a[4])
1
3
a

Using the addition and multiplication operators

t1 = (0, 1, 2)
t2 = ('a','b')
print(t1 + t2)
print(t1 * 3)
(0, 1, 2, 'a', 'b')
(0, 1, 2, 0, 1, 2, 0, 1, 2)

Dictionaries

  • Made up of key-value pairs.
dict_01 = {'teacher' : 'evan',
           'class' : 601,
           'student' : 24,
           'student_names' : ['A','Z']}
# print(dict_01)
print(dict_01['teacher'])
print(dict_01['class'])
print(dict_01['student_names'])
evan
601
['A', 'Z']
print(type(dict_01.keys()))
print(dict_01.keys())
print(list(dict_01.keys()))
<class 'dict_keys'>
dict_keys(['teacher', 'class', 'student', 'student_names'])
['teacher', 'class', 'student', 'student_names']
print(type(dict_01.values()))
print(dict_01.values())
print(list(dict_01.values()))
<class 'dict_values'>
dict_values(['evan', 601, 24, ['A', 'Z']])
['evan', 601, 24, ['A', 'Z']]
dict_01.items()
dict_items([('teacher', 'evan'), ('class', 601), ('student', 24), ('student_names', ['A', 'Z'])])
print(dict_01.get("teacher", "no value"))
print(dict_01.get("instructor", "no value"))  # missing key, so the default is returned
print(dict_01.get("class"))
evan
no value
601

Conditionals & loops

weather = "sunny"
if weather == "rain":
    print("Take an umbrella.")
else:
    print("Do not take an umbrella.")
Do not take an umbrella.
  • Build a grading table
  • 60 points or more passes; anything lower fails
  • Any number can be entered
score = int(input("Please enter a score: "))

if score >= 60:
    print("Pass")
else:
    print("Fail")
Please enter a score: 60
Pass
# 90 or above is grade A
# 80 or above is grade B
# everything else is grade F

score = int(input("Please enter a score: "))

if score >= 90:
    print("Grade A")
elif score >= 80:
    print("Grade B")
else:
    print("Grade F")
Please enter a score: 56
Grade F

Loops

  • for statement
for i in range(3):
    print(i + 1, "Hello!")
1 Hello!
2 Hello!
3 Hello!
count = range(50)
print(count)

for n in count:
    print("Attempt " + str(n + 1))
    if (n + 1) == 5:
        print("Let's stop!!")
        break
    print("Soccer shot")
range(0, 50)
Attempt 1
Soccer shot
Attempt 2
Soccer shot
Attempt 3
Soccer shot
Attempt 4
Soccer shot
Attempt 5
Let's stop!!
a = "hello"

for x in a:
    if x == "l":
        break
    print(x)
h
e
alphabets = ['A', 'B', 'C']
for index, value in enumerate(alphabets):
    print(index, value)
0 A
1 B
2 C
  • while statement
n = 0
while n <10:
    n += 1
    print("Greeting number %d." % n)
Greeting number 1.
Greeting number 2.
Greeting number 3.
Greeting number 4.
Greeting number 5.
Greeting number 6.
Greeting number 7.
Greeting number 8.
Greeting number 9.
Greeting number 10.

Winters

A grown-up who dreams of becoming a developer


Developer (aspiring)


Seoul, South Korea