Pandas Tutorial 2

Mar 25, 2022 in Education

Loading libraries

Import the pandas library and load the supermarket_sales.csv file.

import pandas as pd 
from google.colab import drive
drive.mount("/content/drive")

Mounted at /content/drive

DATA_PATH = "/content/drive/MyDrive/Colab Notebooks/data/supermarket_sales.csv"
sales = pd.read_csv(DATA_PATH)
sales

	Invoice ID	Branch	City	Customer type	Gender	Product line	Unit price	Quantity	Date	Time	Payment
0	750-67-8428	A	Yangon	Member	Female	Health and beauty	74.69	7	1/5/2019	13:08	Ewallet
1	226-31-3081	C	Naypyitaw	Normal	Female	Electronic accessories	15.28	5	3/8/2019	10:29	Cash
2	631-41-3108	A	Yangon	Normal	Male	Home and lifestyle	46.33	7	3/3/2019	13:23	Credit card
3	123-19-1176	A	Yangon	Member	Male	Health and beauty	58.22	8	1/27/2019	20:33	Ewallet
4	373-73-7910	A	Yangon	Normal	Male	Sports and travel	86.31	7	2/8/2019	10:37	Ewallet
...	...	...	...	...	...	...	...	...	...	...	...
995	233-67-5758	C	Naypyitaw	Normal	Male	Health and beauty	40.35	1	1/29/2019	13:46	Ewallet
996	303-96-2227	B	Mandalay	Normal	Female	Home and lifestyle	97.38	10	3/2/2019	17:16	Ewallet
997	727-02-1313	A	Yangon	Member	Male	Food and beverages	31.84	1	2/9/2019	13:22	Cash
998	347-56-2442	A	Yangon	Normal	Male	Home and lifestyle	65.82	1	2/22/2019	15:33	Cash
999	849-09-3807	A	Yangon	Member	Female	Fashion accessories	88.34	7	2/18/2019	13:28	Cash

1000 rows × 11 columns

sales.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 11 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Invoice ID     1000 non-null   object 
 1   Branch         1000 non-null   object 
 2   City           1000 non-null   object 
 3   Customer type  1000 non-null   object 
 4   Gender         1000 non-null   object 
 5   Product line   1000 non-null   object 
 6   Unit price     1000 non-null   float64
 7   Quantity       1000 non-null   int64  
 8   Date           1000 non-null   object 
 9   Time           1000 non-null   object 
 10  Payment        1000 non-null   object 
dtypes: float64(1), int64(1), object(9)
memory usage: 86.1+ KB

Group by

Group by is another name for aggregation; this section covers aggregate functions.

sales['Invoice ID'].value_counts()

750-67-8428    1
642-61-4706    1
816-72-8853    1
491-38-3499    1
322-02-2271    1
              ..
633-09-3463    1
374-17-3652    1
378-07-7001    1
433-75-6987    1
849-09-3807    1
Name: Invoice ID, Length: 1000, dtype: int64

sales.groupby('Customer type')['Quantity'].sum()

Customer type
Member    2785
Normal    2725
Name: Quantity, dtype: int64

sales.groupby(['Customer type', 'Branch', 'Payment'])['Quantity'].sum()

Customer type  Branch  Payment    
Member         A       Cash           308
                       Credit card    282
                       Ewallet        374
               B       Cash           284
                       Credit card    371
                       Ewallet        269
               C       Cash           293
                       Credit card    349
                       Ewallet        255
Normal         A       Cash           264
                       Credit card    298
                       Ewallet        333
               B       Cash           344
                       Credit card    228
                       Ewallet        324
               C       Cash           403
                       Credit card    194
                       Ewallet        337
Name: Quantity, dtype: int64

print(type(sales.groupby(['Customer type', 'Branch', 'Payment'])['Quantity'].sum()))

<class 'pandas.core.series.Series'>

# reset_index() already turns the group labels into columns, so as_index=False is not needed
sales.groupby(['Customer type', 'Branch', 'Payment'])['Quantity'].agg(['sum', 'mean']).reset_index()

	Customer type	Branch	Payment	sum	mean
0	Member	A	Cash	308	5.500000
1	Member	A	Credit card	282	5.755102
2	Member	A	Ewallet	374	6.032258
3	Member	B	Cash	284	5.358491
4	Member	B	Credit card	371	5.888889
5	Member	B	Ewallet	269	5.489796
6	Member	C	Cash	293	4.966102
7	Member	C	Credit card	349	5.816667
8	Member	C	Ewallet	255	5.100000
9	Normal	A	Cash	264	4.888889
10	Normal	A	Credit card	298	5.418182
11	Normal	A	Ewallet	333	5.203125
12	Normal	B	Cash	344	6.035088
13	Normal	B	Credit card	228	4.956522
14	Normal	B	Ewallet	324	5.062500
15	Normal	C	Cash	403	6.200000
16	Normal	C	Credit card	194	5.105263
17	Normal	C	Ewallet	337	6.017857

print(type(sales.groupby(['Customer type', 'Branch', 'Payment'])['Quantity'].agg(['sum', 'mean'])))

<class 'pandas.core.frame.DataFrame'>
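
The Series versus DataFrame difference seen above can be reproduced on a tiny table. This is a minimal sketch: the `toy` frame and its values are made up for illustration and are not part of supermarket_sales.csv.

```python
import pandas as pd

# Toy data standing in for the sales table (values are invented)
toy = pd.DataFrame({
    "Customer type": ["Member", "Member", "Normal", "Normal"],
    "Payment": ["Cash", "Ewallet", "Cash", "Cash"],
    "Quantity": [7, 5, 8, 2],
})

# A single aggregate on one column returns a Series
total = toy.groupby("Customer type")["Quantity"].sum()

# Several aggregates at once return a DataFrame, one column per function
summary = toy.groupby("Customer type")["Quantity"].agg(["sum", "mean"])

# reset_index() turns the group labels back into ordinary columns
flat = summary.reset_index()

print(type(total).__name__)
print(type(summary).__name__)
print(flat)
```

Flattening with `reset_index()` is handy when the grouped result is passed on to a plot or written to a file.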

Handling missing values

Creating data with missing values

import pandas as pd 
import numpy as np 

dict_01 = {
    'Score_A' : [80, 90, np.nan, 80], 
    'Score_B' : [30, 45, np.nan, np.nan], 
    'Score_C' : [np.nan, 50, 80, 90]
}

df = pd.DataFrame(dict_01)
df

	Score_A	Score_B	Score_C
0	80.0	30.0	NaN
1	90.0	45.0	50.0
2	NaN	NaN	80.0
3	80.0	NaN	90.0

df.isnull().sum()

Score_A    1
Score_B    2
Score_C    1
dtype: int64

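df.fillna("0")  # fills with the string "0", so the columns become object dtype; use df.fillna(0) to keep them numeric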

	Score_A	Score_B	Score_C
0	80.0	30.0	0
1	90.0	45.0	50.0
2	0	0	80.0
3	80.0	0	90.0

df.ffill()  # forward fill: each NaN takes the value from the row above; same result as fillna(method="pad")

	Score_A	Score_B	Score_C
0	80.0	30.0	NaN
1	90.0	45.0	50.0
2	90.0	45.0	80.0
3	80.0	45.0	90.0

dict_01 = {
    "성별" : ["남자", "여자", np.nan, "남자"],  # 성별 = gender, 남자 = male, 여자 = female
    "Salary" : [30, 45, 90, 70]
}

df = pd.DataFrame(dict_01)
df

	성별	Salary
0	남자	30
1	여자	45
2	NaN	90
3	남자	70

df['성별'].fillna("성별 없음")  # fill the missing gender with the label "성별 없음" ("no gender")

0       남자
1       여자
2    성별 없음
3       남자
Name: 성별, dtype: object

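Missing values

-> The approach differs between string columns and numeric columns.
-> String columns: use frequency, that is, fill with the most frequent string (the mode).
-> Numeric columns: use the mean, maximum, minimum, median, and so on.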
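
The two strategies can be sketched as follows. The `people` frame and its column names are made up for illustration, and the median is only one of several reasonable choices for the numeric column.

```python
import numpy as np
import pandas as pd

# Invented example with one string column and one numeric column
people = pd.DataFrame({
    "Gender": ["Male", "Female", np.nan, "Male"],
    "Salary": [30.0, np.nan, 90.0, 70.0],
})

# String column: fill with the most frequent value (the mode)
most_common = people["Gender"].mode()[0]
people["Gender"] = people["Gender"].fillna(most_common)

# Numeric column: fill with a summary statistic, here the median
median_salary = people["Salary"].median()
people["Salary"] = people["Salary"].fillna(median_salary)

print(people)
```

`mode()` returns a Series because ties are possible, which is why the first entry is taken with `[0]`.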

import pandas as pd 
import numpy as np 

dict_01 = {
    'Score_A' : [80, 90, np.nan, 80], 
    'Score_B' : [30, 45, np.nan, 60], 
    'Score_C' : [np.nan, 50, 80, 90], 
    'Score_D' : [50, 30, 80, 60]
}

df = pd.DataFrame(dict_01)
df

	Score_A	Score_B	Score_C	Score_D
0	80.0	30.0	NaN	50
1	90.0	45.0	50.0	30
2	NaN	NaN	80.0	80
3	80.0	60.0	90.0	60

df.dropna(axis = 1)  # drop every column that contains a missing value

	Score_D
0	50
1	30
2	80
3	60

df.dropna(axis = 0)  # drop every row that contains a missing value

	Score_A	Score_B	Score_C	Score_D
1	90.0	45.0	50.0	30
3	80.0	60.0	90.0	60
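
Dropping every row or column that has any NaN can discard a lot of data. As a rough sketch of gentler options, `dropna` also accepts `subset`, `thresh`, and `how`. The frame below repeats the scores from this section under the hypothetical name `scores`.

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame({
    "Score_A": [80, 90, np.nan, 80],
    "Score_B": [30, 45, np.nan, 60],
    "Score_C": [np.nan, 50, 80, 90],
    "Score_D": [50, 30, 80, 60],
})

# Drop a row only if Score_A is missing; other columns may still hold NaN
by_subset = scores.dropna(subset=["Score_A"])

# Keep rows that have at least 3 non-missing values
by_thresh = scores.dropna(thresh=3)

# Drop a row only when every value in it is missing (no such row here)
all_missing = scores.dropna(how="all")

print(by_subset.index.tolist())
print(by_thresh.index.tolist())
print(len(all_missing))
```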

Outliers

sales

	Invoice ID	Branch	City	Customer type	Gender	Product line	Unit price	Quantity	Date	Time	Payment
0	750-67-8428	A	Yangon	Member	Female	Health and beauty	74.69	7	1/5/2019	13:08	Ewallet
1	226-31-3081	C	Naypyitaw	Normal	Female	Electronic accessories	15.28	5	3/8/2019	10:29	Cash
2	631-41-3108	A	Yangon	Normal	Male	Home and lifestyle	46.33	7	3/3/2019	13:23	Credit card
3	123-19-1176	A	Yangon	Member	Male	Health and beauty	58.22	8	1/27/2019	20:33	Ewallet
4	373-73-7910	A	Yangon	Normal	Male	Sports and travel	86.31	7	2/8/2019	10:37	Ewallet
...	...	...	...	...	...	...	...	...	...	...	...
995	233-67-5758	C	Naypyitaw	Normal	Male	Health and beauty	40.35	1	1/29/2019	13:46	Ewallet
996	303-96-2227	B	Mandalay	Normal	Female	Home and lifestyle	97.38	10	3/2/2019	17:16	Ewallet
997	727-02-1313	A	Yangon	Member	Male	Food and beverages	31.84	1	2/9/2019	13:22	Cash
998	347-56-2442	A	Yangon	Normal	Male	Home and lifestyle	65.82	1	2/22/2019	15:33	Cash
999	849-09-3807	A	Yangon	Member	Female	Fashion accessories	88.34	7	2/18/2019	13:28	Cash

1000 rows × 11 columns

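The standard statistical rule
IQR - box plot - quartiles
Q0 (0%), Q1 (25%), Q2 (50%), Q3 (75%), Q4 (100%)
Lower bound for outliers: Q1 - (1.5 * (Q3 - Q1))
Upper bound for outliers: Q3 + (1.5 * (Q3 - Q1))
Domain-specific outlier criteria (the conventions of each business area or your future job)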
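
The two bounds above can be sketched on a small series. The `prices` values are made up, with 250 placed deliberately far from the rest; `quantile` uses pandas' default linear interpolation.

```python
import pandas as pd

# Invented prices; 250 is deliberately far from the other values
prices = pd.Series([10, 12, 14, 15, 16, 18, 20, 250], name="price")

q1 = prices.quantile(0.25)
q3 = prices.quantile(0.75)
iqr = q3 - q1

# Lower and upper bounds from the 1.5 * IQR rule
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

is_outlier = (prices < lower) | (prices > upper)
print(lower, upper)
print(prices[is_outlier])
```

Values between the two bounds are kept, so only the points far outside the middle 50% are flagged.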

sales[['Unit price']].describe()

	Unit price
count	1000.000000
mean	55.672130
std	26.494628
min	10.080000
25%	32.875000
50%	55.230000
75%	77.935000
max	99.960000

Q1 = sales['Unit price'].quantile(0.25)
Q3 = sales['Unit price'].quantile(0.75)

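# Simplified rule for this demo: flag values below Q1 as outliers
# (much stricter than the Q1 - 1.5 * IQR lower bound)
outliers_q1 = (sales['Unit price'] < Q1)

# Likewise, flag values above Q3 as outliers
outliers_q3 = (sales['Unit price'] > Q3)

print(sales['Unit price'][~(outliers_q1 | outliers_q3)])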

0      74.69
2      46.33
3      58.22
6      68.84
7      73.56
       ...  
991    76.60
992    58.03
994    60.95
995    40.35
998    65.82
Name: Unit price, Length: 500, dtype: float64

Visualization Tutorial 1

Mar 25, 2022 in Education

Loading libraries

import matplotlib
import seaborn as sns
print(matplotlib.__version__)
print(sns.__version__)

3.2.2
0.11.2

Drawing a plot

import matplotlib.pyplot as plt

dates = [
    '2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05',
    '2021-01-06', '2021-01-07', '2021-01-08', '2021-01-09', '2021-01-10'
]
min_temperature = [20.7, 17.9, 18.8, 14.6, 15.8, 15.8, 15.8, 17.4, 21.8, 20.0]
max_temperature = [34.7, 28.9, 31.8, 25.6, 28.8, 21.8, 22.8, 28.4, 30.8, 32.0]

# From now on, write your plotting code in the form below.
fig, ax = plt.subplots(nrows = 1, ncols = 1, figsize = (10, 6)) # the core pattern for basic plotting

ax.plot(dates, min_temperature, label = "Min Temp.")
ax.plot(dates, max_temperature, label = "Max Temp.")
ax.legend()
plt.show()

!pip install yfinance --upgrade --no-cache-dir

Collecting yfinance
  Downloading yfinance-0.1.70-py2.py3-none-any.whl (26 kB)
Requirement already satisfied: numpy>=1.15 in /usr/local/lib/python3.7/dist-packages (from yfinance) (1.21.5)
Collecting lxml>=4.5.1
  Downloading lxml-4.8.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (6.4 MB)
Collecting requests>=2.26
  Downloading requests-2.27.1-py2.py3-none-any.whl (63 kB)
Requirement already satisfied: pandas>=0.24.0 in /usr/local/lib/python3.7/dist-packages (from yfinance) (1.3.5)
Requirement already satisfied: multitasking>=0.0.7 in /usr/local/lib/python3.7/dist-packages (from yfinance) (0.0.10)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.7/dist-packages (from pandas>=0.24.0->yfinance) (2018.9)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.7/dist-packages (from pandas>=0.24.0->yfinance) (2.8.2)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.7.3->pandas>=0.24.0->yfinance) (1.15.0)
Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.7/dist-packages (from requests>=2.26->yfinance) (2.0.12)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests>=2.26->yfinance) (1.24.3)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests>=2.26->yfinance) (2021.10.8)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests>=2.26->yfinance) (2.10)
Installing collected packages: requests, lxml, yfinance
  Attempting uninstall: requests
    Found existing installation: requests 2.23.0
    Uninstalling requests-2.23.0:
      Successfully uninstalled requests-2.23.0
  Attempting uninstall: lxml
    Found existing installation: lxml 4.2.6
    Uninstalling lxml-4.2.6:
      Successfully uninstalled lxml-4.2.6
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires requests~=2.23.0, but you have requests 2.27.1 which is incompatible.
datascience 0.10.6 requires folium==0.2.1, but you have folium 0.8.3 which is incompatible.
Successfully installed lxml-4.8.0 requests-2.27.1 yfinance-0.1.70

import yfinance as yf
data = yf.download("AAPL", start = "2019-08-01", end = "2022-03-23")
ts = data['Open']
print(ts.head())
print(type(ts))

[*********************100%***********************]  1 of 1 completed
Date
2019-08-01    53.474998
2019-08-02    51.382500
2019-08-05    49.497501
2019-08-06    49.077499
2019-08-07    48.852501
Name: Open, dtype: float64
<class 'pandas.core.series.Series'>

pyplot style

import matplotlib.pyplot as plt
plt.plot(ts)
plt.title("Stock Market of AAPL") # Google Colab cannot render Korean titles by default; font setup is needed later
plt.xlabel("Date")
plt.ylabel("Open Price")
plt.show()

import matplotlib.pyplot as plt

fig, ax = plt.subplots() # fig is the outer frame (the figure); ax is the plotting area inside it
ax.plot(ts)
ax.set_title("Stock Market of AAPL")
ax.set_xlabel("Date")
ax.set_ylabel("Open Price")
plt.show()

Bar chart

import calendar
calendar.month_name[1:13]

['January',
 'February',
 'March',
 'April',
 'May',
 'June',
 'July',
 'August',
 'September',
 'October',
 'November',
 'December']

import matplotlib.pyplot as plt
import numpy as np
import calendar # standard library module for calendar and date utilities

month_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
sold_list = [300, 400, 550, 900, 600, 960, 900, 910, 800, 700, 550, 450]

fig, ax = plt.subplots(figsize = (10, 6))
barplots = ax.bar(month_list, sold_list)

print("barplots :", barplots)

for plot in barplots:
  print(plot)
  # print(plot.get_height())
  # print(plot.get_x())
  # print(plot.get_y())
  # print(plot.get_width())
  height = plot.get_height()
  ax.text(plot.get_x() + plot.get_width()/2., height, height, ha = 'center', va = 'bottom')

plt.xticks(month_list, calendar.month_name[1:13], rotation = 90)
plt.show()

barplots : <BarContainer object of 12 artists>
Rectangle(xy=(0.6, 0), width=0.8, height=300, angle=0)
Rectangle(xy=(1.6, 0), width=0.8, height=400, angle=0)
Rectangle(xy=(2.6, 0), width=0.8, height=550, angle=0)
Rectangle(xy=(3.6, 0), width=0.8, height=900, angle=0)
Rectangle(xy=(4.6, 0), width=0.8, height=600, angle=0)
Rectangle(xy=(5.6, 0), width=0.8, height=960, angle=0)
Rectangle(xy=(6.6, 0), width=0.8, height=900, angle=0)
Rectangle(xy=(7.6, 0), width=0.8, height=910, angle=0)
Rectangle(xy=(8.6, 0), width=0.8, height=800, angle=0)
Rectangle(xy=(9.6, 0), width=0.8, height=700, angle=0)
Rectangle(xy=(10.6, 0), width=0.8, height=550, angle=0)
Rectangle(xy=(11.6, 0), width=0.8, height=450, angle=0)
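
As an alternative to looping over the rectangles and calling `ax.text`, newer matplotlib releases can label a whole bar container in one call. This is a sketch under a loud assumption: it needs matplotlib 3.4 or later, where `Axes.bar_label` exists, so it would not run on the 3.2.2 version printed earlier. The `Agg` backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt

month_list = list(range(1, 13))
sold_list = [300, 400, 550, 900, 600, 960, 900, 910, 800, 700, 550, 450]

fig, ax = plt.subplots(figsize=(10, 6))
barplots = ax.bar(month_list, sold_list)

# Assumes matplotlib >= 3.4: puts each bar's height above the bar
labels = ax.bar_label(barplots)

print(len(labels))
print(labels[0].get_text())
```

The manual loop is still useful when each label needs custom placement or formatting.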

import seaborn as sns

tips = sns.load_dataset("tips")
print(tips.info())
x = tips['total_bill']
y = tips['tip']

# scatter plot
fig, ax = plt.subplots(figsize = (10,6))
ax.scatter(x, y)
ax.set_xlabel('Total Bill')
ax.set_ylabel('Tip')
plt.show()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   total_bill  244 non-null    float64 
 1   tip         244 non-null    float64 
 2   sex         244 non-null    category
 3   smoker      244 non-null    category
 4   day         244 non-null    category
 5   time        244 non-null    category
 6   size        244 non-null    int64   
dtypes: category(4), float64(2), int64(1)
memory usage: 7.4 KB
None

# Iterating over tips.groupby('sex') yields one (label, data) pair per group
# for label, data in tips.groupby('sex'):
#   print(label)
#   print(data)

tips['sex_color'] = tips['sex'].map({'Male': '#2521F6', 'Female': '#EB4036'})
# print(tips.head())

fig, ax = plt.subplots(figsize = (10, 6))
for label, data in tips.groupby('sex'):
  ax.scatter(data['total_bill'], data['tip'], label = label, color = data['sex_color'], alpha = 0.5)
  ax.set_xlabel('Total Bill')
  ax.set_ylabel('Tip')

ax.legend() # legend
plt.show()

Seaborn

import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")
# print(tips.info())

fig, ax = plt.subplots(figsize=(10, 6))
sns.scatterplot(x = 'total_bill', y = 'tip', hue = 'sex', data = tips)
plt.show()

# Draw two plots side by side
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(15, 5))
sns.regplot(x = "total_bill", y = "tip", data = tips , ax = ax[1], fit_reg = True)
ax[1].set_title("with linear regression line")

sns.regplot(x = "total_bill", y = "tip", data = tips , ax = ax[0], fit_reg = False)
ax[0].set_title("without linear regression line")
plt.show()

Drawing a bar chart the seaborn way

sns.countplot(x = "day", data = tips)
plt.show()

print(tips['day'].value_counts().index)
print(tips['day'].value_counts().values)
print(tips['day'].value_counts(ascending=True)) # ascending order

CategoricalIndex(['Sat', 'Sun', 'Thur', 'Fri'], categories=['Thur', 'Fri', 'Sat', 'Sun'], ordered=False, dtype='category')
[87 76 62 19]
Fri     19
Thur    62
Sun     76
Sat     87
Name: day, dtype: int64
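
The ordering behaviour printed above is what makes `value_counts().index` usable as a plot order. A small sketch, using a made-up `days` series rather than the tips dataset:

```python
import pandas as pd

# Invented day labels standing in for tips['day']
days = pd.Series(["Sat", "Sun", "Sat", "Thur", "Sat", "Sun"], name="day")

counts = days.value_counts()                    # sorted by count, largest first
order = counts.index.tolist()                   # labels in that same order
ascending = days.value_counts(ascending=True)   # smallest count first

print(order)
print(counts.values.tolist())
print(ascending.index.tolist())
```

Passing a list like `order` to a plotting function's `order` argument draws the bars from most to least frequent.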

fig, ax = plt.subplots()
ax = sns.countplot(x = "day", data = tips, order = tips['day'].value_counts().index)

for plot in ax.patches:
  # print(plot)
  height = plot.get_height()
  ax.text(plot.get_x() + plot.get_width()/2., height, height, ha = 'center', va = 'bottom')

ax.set_ylim(-5, 100) # set the y-axis range
plt.show()

Pandas Tutorial 1

Mar 23, 2022 in Education

Loading libraries

import pandas as pd
print(pd.__version__)

1.3.5

Test

temp_dic = {"col1" : [1, 2, 3],
            "col2" : [3, 4, 5]}
df = pd.DataFrame(temp_dic)
print(type(df))
print(df)

<class 'pandas.core.frame.DataFrame'>
   col1  col2
0     1     3
1     2     4
2     3     5

temp_dic = {'a' : 1 , "b" : 2, "c" : 3}
ser = pd.Series(temp_dic)
print(type(ser))
print(ser)

<class 'pandas.core.series.Series'>
a    1
b    2
c    3
dtype: int64
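
The two types tested above are closely related: one column of a DataFrame is a Series. A minimal sketch, with a hypothetical frame named `demo` and made-up column names `x` and `y`:

```python
import pandas as pd

demo = pd.DataFrame({"x": [1, 2, 3], "y": [3, 4, 5]})

single = demo["x"]     # single brackets: one column as a Series
frame = demo[["x"]]    # double brackets: a one-column DataFrame

print(type(single).__name__)
print(type(frame).__name__)
print(single.to_dict())
```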

Connecting Google Drive

from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive

DATA_PATH = '/content/drive/MyDrive/Colab Notebooks/data/Lemonade2016.csv'
juice = pd.read_csv(DATA_PATH)
juice

	Date	Location	Lemon	Orange	Temperature	Leaflets	Price
0	7/1/2016	Park	97	67	70	90.0	0.25
1	7/2/2016	Park	98	67	72	90.0	0.25
2	7/3/2016	Park	110	77	71	104.0	0.25
3	7/4/2016	Beach	134	99	76	98.0	0.25
4	7/5/2016	Beach	159	118	78	135.0	0.25
5	7/6/2016	Beach	103	69	82	90.0	0.25
6	7/6/2016	Beach	103	69	82	90.0	0.25
7	7/7/2016	Beach	143	101	81	135.0	0.25
8	NaN	Beach	123	86	82	113.0	0.25
9	7/9/2016	Beach	134	95	80	126.0	0.25
10	7/10/2016	Beach	140	98	82	131.0	0.25
11	7/11/2016	Beach	162	120	83	135.0	0.25
12	7/12/2016	Beach	130	95	84	99.0	0.25
13	7/13/2016	Beach	109	75	77	99.0	0.25
14	7/14/2016	Beach	122	85	78	113.0	0.25
15	7/15/2016	Beach	98	62	75	108.0	0.50
16	7/16/2016	Beach	81	50	74	90.0	0.50
17	7/17/2016	Beach	115	76	77	126.0	0.50
18	7/18/2016	Park	131	92	81	122.0	0.50
19	7/19/2016	Park	122	85	78	113.0	0.50
20	7/20/2016	Park	71	42	70	NaN	0.50
21	7/21/2016	Park	83	50	77	90.0	0.50
22	7/22/2016	Park	112	75	80	108.0	0.50
23	7/23/2016	Park	120	82	81	117.0	0.50
24	7/24/2016	Park	121	82	82	117.0	0.50
25	7/25/2016	Park	156	113	84	135.0	0.50
26	7/26/2016	Park	176	129	83	158.0	0.35
27	7/27/2016	Park	104	68	80	99.0	0.35
28	7/28/2016	Park	96	63	82	90.0	0.35
29	7/29/2016	Park	100	66	81	95.0	0.35
30	7/30/2016	Beach	88	57	82	81.0	0.35
31	7/31/2016	Beach	76	47	82	68.0	0.35

The data has been loaded.
The first thing to check is the structure of the data.

juice.info() # info is a DataFrame method
# When a column has missing values, its Non-Null Count is lower than the number of rows.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32 entries, 0 to 31
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Date         31 non-null     object 
 1   Location     32 non-null     object 
 2   Lemon        32 non-null     int64  
 3   Orange       32 non-null     int64  
 4   Temperature  32 non-null     int64  
 5   Leaflets     31 non-null     float64
 6   Price        32 non-null     float64
dtypes: float64(2), int64(3), object(2)
memory usage: 1.9+ KB
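
The Non-Null Count column can also be checked numerically. A sketch on a made-up three-row frame (`toy`, shaped like the lemonade data but not taken from it):

```python
import numpy as np
import pandas as pd

# Invented rows in the shape of the lemonade data
toy = pd.DataFrame({
    "Date": ["7/7/2016", np.nan, "7/9/2016"],
    "Leaflets": [135.0, 113.0, np.nan],
    "Price": [0.25, 0.25, 0.25],
})

missing_per_column = toy.isna().sum()   # number of NaN values per column
non_null = toy.notna().sum()            # the figure info() reports as Non-Null Count

print(missing_per_column.to_dict())
print(non_null.to_dict())
```

For every column the two counts add up to the number of rows, so a Non-Null Count below the row count signals missing values.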

juice.head(10) # head() returns the first 5 rows by default; passing a number returns that many rows

	Date	Location	Lemon	Orange	Temperature	Leaflets	Price
0	7/1/2016	Park	97	67	70	90.0	0.25
1	7/2/2016	Park	98	67	72	90.0	0.25
2	7/3/2016	Park	110	77	71	104.0	0.25
3	7/4/2016	Beach	134	99	76	98.0	0.25
4	7/5/2016	Beach	159	118	78	135.0	0.25
5	7/6/2016	Beach	103	69	82	90.0	0.25
6	7/6/2016	Beach	103	69	82	90.0	0.25
7	7/7/2016	Beach	143	101	81	135.0	0.25
8	NaN	Beach	123	86	82	113.0	0.25
9	7/9/2016	Beach	134	95	80	126.0	0.25

</div>

juice.tail()  # the last 5 rows

	Date	Location	Lemon	Orange	Temperature	Leaflets	Price
27	7/27/2016	Park	104	68	80	99.0	0.35
28	7/28/2016	Park	96	63	82	90.0	0.35
29	7/29/2016	Park	100	66	81	95.0	0.35
30	7/30/2016	Beach	88	57	82	81.0	0.35
31	7/31/2016	Beach	76	47	82	68.0	0.35

</div>

describe() function
Returns descriptive statistics.

juice.describe()  # always check the return type as well: type(juice.describe())

	Lemon	Orange	Temperature	Leaflets	Price
count	32.000000	32.000000	32.000000	31.000000	32.000000
mean	116.156250	80.000000	78.968750	108.548387	0.354687
std	25.823357	21.863211	4.067847	20.117718	0.113137
min	71.000000	42.000000	70.000000	68.000000	0.250000
25%	98.000000	66.750000	77.000000	90.000000	0.250000
50%	113.500000	76.500000	80.500000	108.000000	0.350000
75%	131.750000	95.000000	82.000000	124.000000	0.500000
max	176.000000	129.000000	84.000000	158.000000	0.500000

</div>
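
By default describe() summarizes numeric columns only, which is why Date and Location are absent from the table above. A minimal sketch of including text columns follows; `demo` is a hypothetical frame with made-up values, not the tutorial's data.

```python
import pandas as pd

# Hypothetical mini frame (illustrative values only)
demo = pd.DataFrame({
    "Location": ["Park", "Beach", "Beach"],
    "Lemon": [97, 134, 159],
})

numeric_stats = demo.describe()             # numeric columns only by default
all_stats = demo.describe(include="all")    # adds unique, top, freq for text columns

print(type(numeric_stats))
print(all_stats)
```

For text columns, `top` is the most frequent value and `freq` is how often it occurs; the numeric rows are NaN for those columns.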

value_counts()

print(juice['Location'].value_counts())  # for a categorical column the basic summary is the frequency of each value, so use value_counts()
print(type(juice['Location'].value_counts()))

Beach    17
Park     15
Name: Location, dtype: int64
<class 'pandas.core.series.Series'>
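
Because value_counts() returns a Series, the counts can be indexed by label, and relative frequencies are available through the `normalize` parameter. A small sketch, where `location` is a hypothetical stand-in for `juice['Location']`:

```python
import pandas as pd

# Hypothetical stand-in for juice['Location'] (illustrative values only)
location = pd.Series(["Park", "Beach", "Beach", "Park", "Beach"], name="Location")

counts = location.value_counts()                 # absolute frequencies, most frequent first
shares = location.value_counts(normalize=True)   # relative frequencies that sum to 1

print(counts)
print(shares)
```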

Working with the data

Let's handle rows and columns.

juice['Sold'] = 0  # add a new column
print(juice.head(3))

       Date Location  Lemon  Orange  Temperature  Leaflets  Price  Sold
0  7/1/2016     Park     97      67           70      90.0   0.25     0
1  7/2/2016     Park     98      67           72      90.0   0.25     0
2  7/3/2016     Park    110      77           71     104.0   0.25     0

juice['Sold'] = juice['Lemon'] + juice['Orange']
print(juice.head(3))

       Date Location  Lemon  Orange  Temperature  Leaflets  Price  Sold
0  7/1/2016     Park     97      67           70      90.0   0.25   164
1  7/2/2016     Park     98      67           72      90.0   0.25   165
2  7/3/2016     Park    110      77           71     104.0   0.25   187

Revenue = price * quantity sold

# juice['Revenue'] = 0 can be omitted
juice['Revenue'] = juice['Price'] * juice['Sold']
print(juice.head(3))

       Date Location  Lemon  Orange  Temperature  Leaflets  Price  Sold  \
0  7/1/2016     Park     97      67           70      90.0   0.25   164   
1  7/2/2016     Park     98      67           72      90.0   0.25   165   
2  7/3/2016     Park    110      77           71     104.0   0.25   187   

   Revenue  
0    41.00  
1    41.25  
2    46.75

drop(axis = 0 | 1)
- With axis set to 0, drop() removes rows (by index label).
- With axis set to 1, drop() removes columns.

juice_column_drop = juice.drop('Sold', axis = 1)
print(juice_column_drop.head(3))

       Date Location  Lemon  Orange  Temperature  Leaflets  Price  Revenue
0  7/1/2016     Park     97      67           70      90.0   0.25    41.00
1  7/2/2016     Park     98      67           72      90.0   0.25    41.25
2  7/3/2016     Park    110      77           71     104.0   0.25    46.75

juice_row_drop = juice.drop(0, axis = 0)
print(juice_row_drop.head(3))

       Date Location  Lemon  Orange  Temperature  Leaflets  Price  Sold  \
1  7/2/2016     Park     98      67           72      90.0   0.25   165   
2  7/3/2016     Park    110      77           71     104.0   0.25   187   
3  7/4/2016    Beach    134      99           76      98.0   0.25   233   

   Revenue  
1    41.25  
2    46.75  
3    58.25
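
The two drop() calls above can also be written with the `columns=` and `index=` keywords, which some readers find easier to read than `axis`. A minimal sketch on a hypothetical `demo` frame with made-up values:

```python
import pandas as pd

# Hypothetical mini frame (illustrative values only)
demo = pd.DataFrame({"Lemon": [97, 98, 110], "Orange": [67, 67, 77], "Sold": [164, 165, 187]})

no_sold = demo.drop(columns="Sold")   # same as demo.drop("Sold", axis=1)
no_first = demo.drop(index=0)         # same as demo.drop(0, axis=0)

# drop() returns a new DataFrame by default; `demo` itself is unchanged
print(no_sold.columns.tolist())
print(no_first.index.tolist())
print(demo.shape)
```

This is also why the tutorial assigns the result to a new variable: without the assignment the original frame keeps all its rows and columns.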

Data indexing

juice[4:8]

	Date	Location	Lemon	Orange	Temperature	Leaflets	Price	Sold	Revenue
4	7/5/2016	Beach	159	118	78	135.0	0.25	277	69.25
5	7/6/2016	Beach	103	69	82	90.0	0.25	172	43.00
6	7/6/2016	Beach	103	69	82	90.0	0.25	172	43.00
7	7/7/2016	Beach	143	101	81	135.0	0.25	244	61.00

</div>

Extracting data with boolean values

# rows where Leaflets is 100 or more
# juice['Location'].value_counts()
juice[juice['Leaflets'] >= 100]

	Date	Location	Lemon	Orange	Temperature	Leaflets	Price	Sold	Revenue	location
2	7/3/2016	Park	110	77	71	104.0	0.25	187	46.75	Beach
4	7/5/2016	Beach	159	118	78	135.0	0.25	277	69.25	Beach
7	7/7/2016	Beach	143	101	81	135.0	0.25	244	61.00	Beach
8	NaN	Beach	123	86	82	113.0	0.25	209	52.25	Beach
9	7/9/2016	Beach	134	95	80	126.0	0.25	229	57.25	Beach
10	7/10/2016	Beach	140	98	82	131.0	0.25	238	59.50	Beach
11	7/11/2016	Beach	162	120	83	135.0	0.25	282	70.50	Beach
14	7/14/2016	Beach	122	85	78	113.0	0.25	207	51.75	Beach
15	7/15/2016	Beach	98	62	75	108.0	0.50	160	80.00	Beach
17	7/17/2016	Beach	115	76	77	126.0	0.50	191	95.50	Beach
18	7/18/2016	Park	131	92	81	122.0	0.50	223	111.50	Beach
19	7/19/2016	Park	122	85	78	113.0	0.50	207	103.50	Beach
22	7/22/2016	Park	112	75	80	108.0	0.50	187	93.50	Beach
23	7/23/2016	Park	120	82	81	117.0	0.50	202	101.00	Beach
24	7/24/2016	Park	121	82	82	117.0	0.50	203	101.50	Beach
25	7/25/2016	Park	156	113	84	135.0	0.50	269	134.50	Beach
26	7/26/2016	Park	176	129	83	158.0	0.35	305	106.75	Beach

</div>
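
The expression inside the brackets is itself a boolean Series, and several conditions can be combined. A minimal sketch, assuming a hypothetical `demo` frame with made-up values:

```python
import pandas as pd

# Hypothetical mini frame (illustrative values only)
demo = pd.DataFrame({
    "Location": ["Park", "Beach", "Beach", "Park"],
    "Leaflets": [90.0, 135.0, 98.0, 122.0],
})

mask = demo["Leaflets"] >= 100   # boolean Series, one True/False per row
print(mask.tolist())

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
beach_and_many = demo[(demo["Location"] == "Beach") & (demo["Leaflets"] >= 100)]
print(beach_and_many)
```

Python's `and` / `or` keywords do not work here because they try to reduce a whole Series to a single True or False.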

iloc vs loc

Let's check the difference!

juice.head(3)

	Date	Location	Lemon	Orange	Temperature	Leaflets	Price	Sold	Revenue	location
0	7/1/2016	Park	97	67	70	90.0	0.25	164	41.00	Beach
1	7/2/2016	Park	98	67	72	90.0	0.25	165	41.25	Beach
2	7/3/2016	Park	110	77	71	104.0	0.25	187	46.75	Beach

</div>

%%time

# juice.iloc[:, 0:2]
juice.iloc[0:3, 0:2]

CPU times: user 652 µs, sys: 0 ns, total: 652 µs
Wall time: 653 µs

	Date	Location
0	7/1/2016	Park
1	7/2/2016	Park
2	7/3/2016	Park

</div>

loc
Label based!

%%time

juice.loc[0:2, ['Date','Location']]

CPU times: user 1.56 ms, sys: 0 ns, total: 1.56 ms
Wall time: 1.5 ms

	Date	Location
0	7/1/2016	Park
1	7/2/2016	Park
2	7/3/2016	Park

</div>
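
With the default 0, 1, 2, ... index, positions and labels coincide, so the difference between the two calls is easy to miss. The sketch below uses a hypothetical `demo` frame whose index starts at 10 to make it visible: iloc slices by position and excludes the end, while loc slices by label and includes it.

```python
import pandas as pd

# Hypothetical frame with a non-default index so labels and positions differ
demo = pd.DataFrame(
    {"Date": ["7/1/2016", "7/2/2016", "7/3/2016", "7/4/2016"],
     "Location": ["Park", "Park", "Park", "Beach"]},
    index=[10, 11, 12, 13],
)

by_position = demo.iloc[0:3, 0:2]                  # positions 0, 1, 2 (end excluded)
by_label = demo.loc[10:12, ["Date", "Location"]]   # labels 10, 11, 12 (end included)

print(by_position.equals(by_label))
print(demo.loc[10:11].shape[0], demo.iloc[0:1].shape[0])
```

This explains why `iloc[0:3, ...]` and `loc[0:2, ...]` return the same three rows in the tutorial's outputs.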

Selecting rows by condition and columns by name at the same time (loc only, not iloc)

juice.loc[juice['Leaflets'] >= 100, ['Date', 'Location']]

	Date	Location
2	7/3/2016	Park
4	7/5/2016	Beach
7	7/7/2016	Beach
8	NaN	Beach
9	7/9/2016	Beach
10	7/10/2016	Beach
11	7/11/2016	Beach
14	7/14/2016	Beach
15	7/15/2016	Beach
17	7/17/2016	Beach
18	7/18/2016	Park
19	7/19/2016	Park
22	7/22/2016	Park
23	7/23/2016	Park
24	7/24/2016	Park
25	7/25/2016	Park
26	7/26/2016	Park

</div>

juice.loc[juice['Leaflets'] >= 100, 0:2]  # raises TypeError: loc expects column labels, not integer positions

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-70-46f78a7ec2bf> in <module>()
----> 1 juice.loc[juice['Leaflets'] >= 100, 0:2]


/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in __getitem__(self, key)
    923                 with suppress(KeyError, IndexError):
    924                     return self.obj._get_value(*key, takeable=self._takeable)
--> 925             return self._getitem_tuple(key)
    926         else:
    927             # we by definition only have the 0th axis


/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
   1107             return self._multi_take(tup)
   1108 
-> 1109         return self._getitem_tuple_same_dim(tup)
   1110 
   1111     def _get_label(self, label, axis: int):


/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in _getitem_tuple_same_dim(self, tup)
    804                 continue
    805 
--> 806             retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
    807             # We should never have retval.ndim < self.ndim, as that should
    808             #  be handled by the _getitem_lowerdim call above.


/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1140         if isinstance(key, slice):
   1141             self._validate_key(key, axis)
-> 1142             return self._get_slice_axis(key, axis=axis)
   1143         elif com.is_bool_indexer(key):
   1144             return self._getbool_axis(key, axis=axis)


/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in _get_slice_axis(self, slice_obj, axis)
   1174 
   1175         labels = obj._get_axis(axis)
-> 1176         indexer = labels.slice_indexer(slice_obj.start, slice_obj.stop, slice_obj.step)
   1177 
   1178         if isinstance(indexer, slice):


/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in slice_indexer(self, start, end, step, kind)
   5683         slice(1, 3, None)
   5684         """
-> 5685         start_slice, end_slice = self.slice_locs(start, end, step=step)
   5686 
   5687         # return a slice


/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in slice_locs(self, start, end, step, kind)
   5885         start_slice = None
   5886         if start is not None:
-> 5887             start_slice = self.get_slice_bound(start, "left")
   5888         if start_slice is None:
   5889             start_slice = 0


/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_slice_bound(self, label, side, kind)
   5795         # For datetime indices label may be a string that has to be converted
   5796         # to datetime boundary according to its resolution.
-> 5797         label = self._maybe_cast_slice_bound(label, side)
   5798 
   5799         # we need to look up the label


/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in _maybe_cast_slice_bound(self, label, side, kind)
   5747         # reject them, if index does not contain label
   5748         if (is_float(label) or is_integer(label)) and label not in self._values:
-> 5749             raise self._invalid_indexer("slice", label)
   5750 
   5751         return label


TypeError: cannot do slice indexing on Index with these indexers [0] of type int
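
The error arises because loc looks up the integers 0 and 2 as column labels, and the columns here are named with strings. Two possible ways around it are sketched below on a hypothetical `demo` frame with made-up values; both are illustrations, not the tutorial's own fix.

```python
import pandas as pd

# Hypothetical mini frame (illustrative values only)
demo = pd.DataFrame({
    "Date": ["7/1/2016", "7/3/2016", "7/5/2016"],
    "Location": ["Park", "Park", "Beach"],
    "Leaflets": [90.0, 104.0, 135.0],
})
mask = demo["Leaflets"] >= 100

# Option 1: stay with loc and slice columns by label (the end label is included)
by_label = demo.loc[mask, "Date":"Location"]

# Option 2: use iloc with positions; iloc needs a plain boolean array, not a Series
by_position = demo.iloc[mask.to_numpy(), 0:2]

print(by_label.equals(by_position))
```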

Sorting

sort_values()

juice.sort_values(by = ['Revenue']).head(3)  # ascending order

	Date	Location	Lemon	Orange	Temperature	Leaflets	Price	Sold	Revenue	location
0	7/1/2016	Park	97	67	70	90.0	0.25	164	41.00	Beach
1	7/2/2016	Park	98	67	72	90.0	0.25	165	41.25	Beach
6	7/6/2016	Beach	103	69	82	90.0	0.25	172	43.00	Beach

</div>

juice.sort_values(by = ['Revenue'], ascending=False).head(3)  # descending order

	Date	Location	Lemon	Orange	Temperature	Leaflets	Price	Sold	Revenue	location
25	7/25/2016	Park	156	113	84	135.0	0.50	269	134.50	Beach
18	7/18/2016	Park	131	92	81	122.0	0.50	223	111.50	Beach
26	7/26/2016	Park	176	129	83	158.0	0.35	305	106.75	Beach

</div>

juice.sort_values(by = ['Price', 'Temperature'], ascending=False)  # sorted by Price first, then by Temperature within each Price (the 0.5 rows, then the 0.35 rows, ...), both descending

	Date	Location	Lemon	Orange	Temperature	Leaflets	Price	Sold	Revenue	location
25	7/25/2016	Park	156	113	84	135.0	0.50	269	134.50	Beach
24	7/24/2016	Park	121	82	82	117.0	0.50	203	101.50	Beach
18	7/18/2016	Park	131	92	81	122.0	0.50	223	111.50	Beach
23	7/23/2016	Park	120	82	81	117.0	0.50	202	101.00	Beach
22	7/22/2016	Park	112	75	80	108.0	0.50	187	93.50	Beach
19	7/19/2016	Park	122	85	78	113.0	0.50	207	103.50	Beach
17	7/17/2016	Beach	115	76	77	126.0	0.50	191	95.50	Beach
21	7/21/2016	Park	83	50	77	90.0	0.50	133	66.50	Beach
15	7/15/2016	Beach	98	62	75	108.0	0.50	160	80.00	Beach
16	7/16/2016	Beach	81	50	74	90.0	0.50	131	65.50	Beach
20	7/20/2016	Park	71	42	70	NaN	0.50	113	56.50	Beach
26	7/26/2016	Park	176	129	83	158.0	0.35	305	106.75	Beach
28	7/28/2016	Park	96	63	82	90.0	0.35	159	55.65	Beach
30	7/30/2016	Beach	88	57	82	81.0	0.35	145	50.75	Beach
31	7/31/2016	Beach	76	47	82	68.0	0.35	123	43.05	Beach
29	7/29/2016	Park	100	66	81	95.0	0.35	166	58.10	Beach
27	7/27/2016	Park	104	68	80	99.0	0.35	172	60.20	Beach
12	7/12/2016	Beach	130	95	84	99.0	0.25	225	56.25	Beach
11	7/11/2016	Beach	162	120	83	135.0	0.25	282	70.50	Beach
5	7/6/2016	Beach	103	69	82	90.0	0.25	172	43.00	Beach
6	7/6/2016	Beach	103	69	82	90.0	0.25	172	43.00	Beach
8	NaN	Beach	123	86	82	113.0	0.25	209	52.25	Beach
10	7/10/2016	Beach	140	98	82	131.0	0.25	238	59.50	Beach
7	7/7/2016	Beach	143	101	81	135.0	0.25	244	61.00	Beach
9	7/9/2016	Beach	134	95	80	126.0	0.25	229	57.25	Beach
4	7/5/2016	Beach	159	118	78	135.0	0.25	277	69.25	Beach
14	7/14/2016	Beach	122	85	78	113.0	0.25	207	51.75	Beach
13	7/13/2016	Beach	109	75	77	99.0	0.25	184	46.00	Beach
3	7/4/2016	Beach	134	99	76	98.0	0.25	233	58.25	Beach
1	7/2/2016	Park	98	67	72	90.0	0.25	165	41.25	Beach
2	7/3/2016	Park	110	77	71	104.0	0.25	187	46.75	Beach
0	7/1/2016	Park	97	67	70	90.0	0.25	164	41.00	Beach

</div>

# Price in descending order, Temperature in ascending order
juice.sort_values(by = ['Price', 'Temperature'], ascending=[False, True])

	Date	Location	Lemon	Orange	Temperature	Leaflets	Price	Sold	Revenue	location
20	7/20/2016	Park	71	42	70	NaN	0.50	113	56.50	Beach
16	7/16/2016	Beach	81	50	74	90.0	0.50	131	65.50	Beach
15	7/15/2016	Beach	98	62	75	108.0	0.50	160	80.00	Beach
17	7/17/2016	Beach	115	76	77	126.0	0.50	191	95.50	Beach
21	7/21/2016	Park	83	50	77	90.0	0.50	133	66.50	Beach
19	7/19/2016	Park	122	85	78	113.0	0.50	207	103.50	Beach
22	7/22/2016	Park	112	75	80	108.0	0.50	187	93.50	Beach
18	7/18/2016	Park	131	92	81	122.0	0.50	223	111.50	Beach
23	7/23/2016	Park	120	82	81	117.0	0.50	202	101.00	Beach
24	7/24/2016	Park	121	82	82	117.0	0.50	203	101.50	Beach
25	7/25/2016	Park	156	113	84	135.0	0.50	269	134.50	Beach
27	7/27/2016	Park	104	68	80	99.0	0.35	172	60.20	Beach
29	7/29/2016	Park	100	66	81	95.0	0.35	166	58.10	Beach
28	7/28/2016	Park	96	63	82	90.0	0.35	159	55.65	Beach
30	7/30/2016	Beach	88	57	82	81.0	0.35	145	50.75	Beach
31	7/31/2016	Beach	76	47	82	68.0	0.35	123	43.05	Beach
26	7/26/2016	Park	176	129	83	158.0	0.35	305	106.75	Beach
0	7/1/2016	Park	97	67	70	90.0	0.25	164	41.00	Beach
2	7/3/2016	Park	110	77	71	104.0	0.25	187	46.75	Beach
1	7/2/2016	Park	98	67	72	90.0	0.25	165	41.25	Beach
3	7/4/2016	Beach	134	99	76	98.0	0.25	233	58.25	Beach
13	7/13/2016	Beach	109	75	77	99.0	0.25	184	46.00	Beach
4	7/5/2016	Beach	159	118	78	135.0	0.25	277	69.25	Beach
14	7/14/2016	Beach	122	85	78	113.0	0.25	207	51.75	Beach
9	7/9/2016	Beach	134	95	80	126.0	0.25	229	57.25	Beach
7	7/7/2016	Beach	143	101	81	135.0	0.25	244	61.00	Beach
5	7/6/2016	Beach	103	69	82	90.0	0.25	172	43.00	Beach
6	7/6/2016	Beach	103	69	82	90.0	0.25	172	43.00	Beach
8	NaN	Beach	123	86	82	113.0	0.25	209	52.25	Beach
10	7/10/2016	Beach	140	98	82	131.0	0.25	238	59.50	Beach
11	7/11/2016	Beach	162	120	83	135.0	0.25	282	70.50	Beach
12	7/12/2016	Beach	130	95	84	99.0	0.25	225	56.25	Beach

</div>

# after updating or sorting the data, use reset_index to renumber the index (drop=True discards the old index)
juice2 = juice.sort_values(by = ['Price', 'Temperature'], ascending=[False, True]).reset_index(drop=True)
juice2

	Date	Location	Lemon	Orange	Temperature	Leaflets	Price	Sold	Revenue	location
0	7/20/2016	Park	71	42	70	NaN	0.50	113	56.50	Beach
1	7/16/2016	Beach	81	50	74	90.0	0.50	131	65.50	Beach
2	7/15/2016	Beach	98	62	75	108.0	0.50	160	80.00	Beach
3	7/17/2016	Beach	115	76	77	126.0	0.50	191	95.50	Beach
4	7/21/2016	Park	83	50	77	90.0	0.50	133	66.50	Beach
5	7/19/2016	Park	122	85	78	113.0	0.50	207	103.50	Beach
6	7/22/2016	Park	112	75	80	108.0	0.50	187	93.50	Beach
7	7/18/2016	Park	131	92	81	122.0	0.50	223	111.50	Beach
8	7/23/2016	Park	120	82	81	117.0	0.50	202	101.00	Beach
9	7/24/2016	Park	121	82	82	117.0	0.50	203	101.50	Beach
10	7/25/2016	Park	156	113	84	135.0	0.50	269	134.50	Beach
11	7/27/2016	Park	104	68	80	99.0	0.35	172	60.20	Beach
12	7/29/2016	Park	100	66	81	95.0	0.35	166	58.10	Beach
13	7/28/2016	Park	96	63	82	90.0	0.35	159	55.65	Beach
14	7/30/2016	Beach	88	57	82	81.0	0.35	145	50.75	Beach
15	7/31/2016	Beach	76	47	82	68.0	0.35	123	43.05	Beach
16	7/26/2016	Park	176	129	83	158.0	0.35	305	106.75	Beach
17	7/1/2016	Park	97	67	70	90.0	0.25	164	41.00	Beach
18	7/3/2016	Park	110	77	71	104.0	0.25	187	46.75	Beach
19	7/2/2016	Park	98	67	72	90.0	0.25	165	41.25	Beach
20	7/4/2016	Beach	134	99	76	98.0	0.25	233	58.25	Beach
21	7/13/2016	Beach	109	75	77	99.0	0.25	184	46.00	Beach
22	7/5/2016	Beach	159	118	78	135.0	0.25	277	69.25	Beach
23	7/14/2016	Beach	122	85	78	113.0	0.25	207	51.75	Beach
24	7/9/2016	Beach	134	95	80	126.0	0.25	229	57.25	Beach
25	7/7/2016	Beach	143	101	81	135.0	0.25	244	61.00	Beach
26	7/6/2016	Beach	103	69	82	90.0	0.25	172	43.00	Beach
27	7/6/2016	Beach	103	69	82	90.0	0.25	172	43.00	Beach
28	NaN	Beach	123	86	82	113.0	0.25	209	52.25	Beach
29	7/10/2016	Beach	140	98	82	131.0	0.25	238	59.50	Beach
30	7/11/2016	Beach	162	120	83	135.0	0.25	282	70.50	Beach
31	7/12/2016	Beach	130	95	84	99.0	0.25	225	56.25	Beach

</div>
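
Sorting moves the rows but keeps their original index labels, which is why the earlier sorted outputs show labels such as 20, 16, 15. What `drop=True` changes can be sketched on a hypothetical `demo` frame with made-up values:

```python
import pandas as pd

# Hypothetical mini frame (illustrative values only)
demo = pd.DataFrame({"Price": [0.25, 0.50, 0.35], "Temperature": [70, 84, 80]})

sorted_demo = demo.sort_values(by="Price", ascending=False)
print(sorted_demo.index.tolist())      # original labels travel with the rows

renumbered = sorted_demo.reset_index(drop=True)   # old index discarded
kept = sorted_demo.reset_index()                  # old index kept as an 'index' column

print(renumbered.index.tolist())
print(kept.columns.tolist())
```

Keeping the old index as a column can be useful when the original row order needs to be restored later.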

groupby()

Data summary (pivot table)
The R dplyr equivalent is group_by() %>% summarize()

juice.groupby(by = 'Location').count()

	Date	Lemon	Orange	Temperature	Leaflets	Price	Sold	Revenue	location
Location
Beach	16	17	17	17	17	17	17	17	17
Park	15	15	15	15	14	15	15	15	15

</div>

# Aggregation names are passed as strings; newer pandas versions warn when given the builtins max/min/sum or np.mean
juice.groupby(['Location'])['Revenue'].agg(['max', 'min', 'sum', 'mean'])

	max	min	sum	mean
Location
Beach	95.5	43.0	1002.8	58.988235
Park	134.5	41.0	1178.2	78.546667
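
The same kind of summary can also be written with named aggregation, which gives each output column an explicit label. This is a minimal sketch: demo_juice is a made-up stand-in for the tutorial's juice DataFrame, and the column labels are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the tutorial's juice DataFrame (values are made up).
demo_juice = pd.DataFrame({
    "Location": ["Beach", "Beach", "Park", "Park"],
    "Revenue": [40.0, 60.0, 50.0, 90.0],
})

# Named aggregation: output_column=(source_column, aggregation_name).
summary = demo_juice.groupby("Location").agg(
    revenue_max=("Revenue", "max"),
    revenue_min=("Revenue", "min"),
    revenue_sum=("Revenue", "sum"),
    revenue_mean=("Revenue", "mean"),
)
print(summary)
```

Named aggregation is mostly a readability choice; the list form shown earlier produces the same numbers with default column names.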

</div>


Python NumPy 1

Mar 23, 2022 in Education

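How to install Python libraries (vs. R)

# In R: install.packages("package_name")
# In Python, libraries are not installed by running code in the script (x)
# Install them from the terminal instead
# Option 1. conda install
# --> install Anaconda first, then install with conda (data science)
# Option 2. pip install (development + data science + everything else)
# --> no Anaconda / only Python installed

# Open Git Bash and run:
# pip install numpy

Importing the NumPy library

import numpy
print(numpy.__version__)

1.21.5

import numpy as np
print(np.__version__)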

1.21.5

Converting to an array

Create a list of the numbers 1 through 10.
Convert it to a NumPy array and store it.

temp = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
arr = np.array(temp)
print(arr)
print(temp)

[ 1  2  3  4  5  6  7  8  9 10]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(type(temp))
print(type(arr))

<class 'list'>
<class 'numpy.ndarray'>

Print the values 5 through 8 from arr (indices 4 to 7)

arr[4:8]

array([5, 6, 7, 8])

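Use NumPy's basic statistics functions.

np.mean(arr)
np.sum(arr)
np.median(arr)
np.std(arr)  # a notebook cell displays only its last expression, so the output below is the standard deviation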

2.8722813232690143
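
The value above is the population standard deviation, because np.std divides by n by default (ddof=0). As a small sketch of the difference, the example below compares it with the sample standard deviation (ddof=1); the variable names are illustrative only.

```python
import numpy as np

arr = np.arange(1, 11)  # 1 through 10, matching the array used above

population_std = np.std(arr)       # ddof=0 (default): divide by n
sample_std = np.std(arr, ddof=1)   # ddof=1: divide by n - 1

print(population_std)
print(sample_std)
```

Which one is appropriate depends on whether the data is the whole population or a sample from it.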

Arithmetic operations

math_scores = [90, 80, 88]
english_scores = [80, 70, 90]

total_scores = math_scores + english_scores
total_scores

[90, 80, 88, 80, 70, 90]

math_scores = [90, 80, 88]
english_scores = [80, 70, 90]

math_arr = np.array(math_scores)
english_arr = np.array(english_scores)

total_scores = math_arr + english_arr
total_scores

array([170, 150, 178])

np.min(total_scores)

np.max(total_scores)

math_scores = [2, 3, 4]
english_scores = [1, 2, 3]

math_arr = np.array(math_scores)
english_arr = np.array(english_scores)

# Arithmetic operations
print("Addition:", np.add(math_arr, english_arr))
print("Subtraction:", np.subtract(math_arr, english_arr))
print("Multiplication:", np.multiply(math_arr, english_arr))
print("Division:", np.divide(math_arr, english_arr))
print("Power:", np.power(math_arr, english_arr))

Addition: [3 5 7]
Subtraction: [1 1 1]
Multiplication: [ 2  6 12]
Division: [2.         1.5        1.33333333]
Power: [ 2  9 64]

Creating arrays

How to create arrays from 0 to 3 dimensions

temp_arr = np.array(20)
print(temp_arr)
print(type(temp_arr))
print(temp_arr.shape)

20
<class 'numpy.ndarray'>
()

# 1-dimensional array
temp_arr = np.array([1, 2, 3])
print(temp_arr)
print(type(temp_arr))
print(temp_arr.shape)
print(temp_arr.ndim) # ndim gives the number of dimensions

[1 2 3]
<class 'numpy.ndarray'>
(3,)
1

# 2-dimensional array
temp_arr = np.array([[1, 2, 3], [4, 5, 6]])
print(temp_arr)
print(type(temp_arr))
print(temp_arr.shape) # a 2 x 3 array
print(temp_arr.ndim)

[[1 2 3]
 [4 5 6]]
<class 'numpy.ndarray'>
(2, 3)
2

# 3-dimensional array
temp_arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(temp_arr)
print(type(temp_arr))
print(temp_arr.shape)
print(temp_arr.ndim)

[[[1 2 3]
  [4 5 6]]

 [[1 2 3]
  [4 5 6]]]
<class 'numpy.ndarray'>
(2, 2, 3)
3

temp_arr = np.array([1, 2, 3, 4], ndmin = 2) # ndmin=2 makes the result a 2-dimensional array
print(temp_arr)
print(type(temp_arr))
print(temp_arr.shape)
print(temp_arr.ndim)

[[1 2 3 4]]
<class 'numpy.ndarray'>
(1, 4)
2

Rounding and truncating decimals

temp_arr = np.trunc([-1.23, 1.23])
temp_arr # the fractional part is truncated

array([-1.,  1.])

temp_arr = np.fix([-1.23, 1.23]) # rounds toward zero
temp_arr

array([-1.,  1.])

temp_arr = np.around([-1.23789, 1.23789], 4)
temp_arr

array([-1.2379,  1.2379])

temp_arr = np.round([-1.23789, 1.23789], 4)
temp_arr

array([-1.2379,  1.2379])

temp_arr = np.floor([-1.23789, 1.23789]) # round down
temp_arr

array([-2.,  1.])

temp_arr = np.ceil([-1.23789, 1.23789]) # round up
temp_arr

array([-1.,  2.])
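
One detail worth knowing about np.round (and np.around): values exactly halfway between two integers are rounded to the nearest even integer, not always up. The sketch below illustrates this with made-up inputs, and shows np.floor(x + 0.5) as one possible way to get "round half up" instead.

```python
import numpy as np

halves = np.array([0.5, 1.5, 2.5, 3.5])

rounded_half_even = np.round(halves)        # halfway cases go to the nearest even integer
rounded_half_up = np.floor(halves + 0.5)    # one way to always round halves upward

print(rounded_half_even)  # [0. 2. 2. 4.]
print(rounded_half_up)    # [1. 2. 3. 4.]
```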

shape gives the size of the array along each axis.

Various ways to create arrays

temp_arr = np.arange(5)
temp_arr

array([0, 1, 2, 3, 4])

temp_arr = np.arange(1, 11, 3)
temp_arr

array([ 1,  4,  7, 10])

zero_arr = np.zeros((2, 3))
print(zero_arr)
print(type(zero_arr))
print(zero_arr.shape)
print(zero_arr.ndim)
print(zero_arr.dtype) # float64 -> 64 is the number of bits

[[0. 0. 0.]
 [0. 0. 0.]]
<class 'numpy.ndarray'>
(2, 3)
2
float64

temp_arr = np.ones((4, 5), dtype = "int32") # the data type can also be set explicitly
print(temp_arr)
print(type(temp_arr))
print(temp_arr.shape)
print(temp_arr.ndim)
print(temp_arr.dtype)

[[1 1 1 1 1]
 [1 1 1 1 1]
 [1 1 1 1 1]
 [1 1 1 1 1]]
<class 'numpy.ndarray'>
(4, 5)
2
int32

temp_arr = np.ones((2, 6), dtype = "int32")
temp_res_arr = temp_arr.reshape(2, 2, 3) # reshape(5, 3) fails: cannot reshape array of size 12 into shape (5,3)
print(temp_res_arr)                   # the total size must stay 12, so shapes such as (4, 3) or (3, 4) also work
print(type(temp_res_arr))
print(temp_res_arr.shape)
print(temp_res_arr.ndim)
print(temp_res_arr.dtype)

[[[1 1 1]
  [1 1 1]]

 [[1 1 1]
  [1 1 1]]]
<class 'numpy.ndarray'>
(2, 2, 3)
3
int32

temp_arr = np.ones((12, 12), dtype = "int32")
temp_res_arr = temp_arr.reshape(5, -1) # np.ones((12, 12)) has 12 * 12 = 144 elements; 5 is not a divisor of 144, so this raises an error
print(temp_res_arr) 
print(type(temp_res_arr))
print(temp_res_arr.shape)
print(temp_res_arr.ndim)
print(temp_res_arr.dtype)

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-60-dfc75cfbf69a> in <module>()
      1 temp_arr = np.ones((12, 12), dtype = "int32")
----> 2 temp_res_arr = temp_arr.reshape(5, -1)
      3 print(temp_res_arr)
      4 print(type(temp_res_arr))
      5 print(temp_res_arr.shape)


ValueError: cannot reshape array of size 144 into shape (5,newaxis)
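
The error above comes from the rule that reshape must keep the total number of elements, so with -1 the known dimensions have to divide the array size evenly. A minimal sketch of that check follows; can_reshape_rows is a hypothetical helper written for this illustration, not a NumPy function.

```python
import numpy as np

temp_arr = np.ones((12, 12), dtype="int32")  # 144 elements


def can_reshape_rows(arr, rows):
    """Return True when arr.size is evenly divisible by rows (hypothetical helper)."""
    return arr.size % rows == 0


print(can_reshape_rows(temp_arr, 5))  # False: 144 is not divisible by 5
print(can_reshape_rows(temp_arr, 6))  # True: 144 / 6 = 24

reshaped = temp_arr.reshape(6, -1)    # -1 lets NumPy infer the 24 columns
print(reshaped.shape)                 # (6, 24)
```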

NumPy conditional expressions

np.where()

temp_arr = np.arange(10)
temp_arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# Values less than 5 are returned unchanged
# Values of 5 or greater are multiplied by 10
np.where(temp_arr < 5, temp_arr, temp_arr * 10)

array([ 0,  1,  2,  3,  4, 50, 60, 70, 80, 90])

# Create an array from 0 to 100; multiply values less than 50 by 10 and return the rest unchanged
# np.where fits when only a single condition is needed
temp_arr = np.arange(101)
# temp_arr
np.where(temp_arr < 50, temp_arr * 10, temp_arr)

array([  0,  10,  20,  30,  40,  50,  60,  70,  80,  90, 100, 110, 120,
       130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250,
       260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380,
       390, 400, 410, 420, 430, 440, 450, 460, 470, 480, 490,  50,  51,
        52,  53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,
        65,  66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,
        78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,
        91,  92,  93,  94,  95,  96,  97,  98,  99, 100])

np.select()

temp_arr = np.arange(10)
temp_arr

# Values greater than 5 are multiplied by 2; values less than 2 have 100 added

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

condlist = [temp_arr > 5, temp_arr < 2]
choicelist = [temp_arr * 2, temp_arr + 100]
np.select(condlist, choicelist, default = temp_arr)

array([100, 101,   2,   3,   4,   5,  12,  14,  16,  18])
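
When more than one condition in condlist is True for the same element, np.select uses the first matching condition. The sketch below uses deliberately overlapping conditions (made up for this illustration) to show the effect of ordering.

```python
import numpy as np

temp_arr = np.arange(10)

# Both conditions are True for the values 8 and 9.
condlist = [temp_arr > 5, temp_arr > 7]
choicelist = [temp_arr * 2, temp_arr * 100]

result = np.select(condlist, choicelist, default=temp_arr)
print(result)  # 8 and 9 become 16 and 18, because the first condition matches first
```

Putting the more specific condition first would change the result, so the order of condlist matters whenever conditions overlap.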


Python Basic Syntax 3

Mar 22, 2022 in Education

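Why create classes?

To keep code concise
- Code can be reused
Many libraries are implemented as classes
- the list class, the str class,
- used as objects
- bound to variable names
Multiple classes together make up a library.
- Django / web development / machine learning / visualization / data preprocessing

Creating instance methods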

list.append(), list.extend()

class Person:      # class names start with a capital letter
  
  # class attribute
  country = "korean"

  # instance attribute
  def __init__(self, name, age):   # def __init__(self, ...) is the standard constructor form
    self.name = name
    self.age = age

  # define an instance method
  def singing(self, songtitle, sales):
    return "{} sings {}, which sold {} copies.".format(self.name, songtitle, sales)

if __name__ == "__main__":
  kim = Person("Kim", 100)
  lee = Person("Lee", 100)

  # access class attribute
  print("kim is {}".format(kim.__class__.country))
  print("lee is {}".format(lee.__class__.country))

  # call instance method
  print(kim.singing("A", 10))
  print(lee.singing("B", 200))

kim is korean
lee is korean
Kim sings A, which sold 10 copies.
Lee sings B, which sold 200 copies.

Class inheritance

class Parent:
  # instance attribute
  def __init__(self, name, age):
    self.name = name
    self.age = age

  # define instance methods
  def whoAmI(self):
    print(" I am Parent!!")

  def singing(self, songtitle):
    return "{} sings {}.".format(self.name, songtitle)

  def dancing(self):
    return "{} is dancing now.".format(self.name)

class Child(Parent):
  def __init__(self, name, age):
    # super() function
    super().__init__(name, age)
    print("Child Class is ON")

  def whoAmI(self):
    print("I am Child")

  def studying(self):
    print("I am Fast Runner")


if __name__ == "__main__":
  child_kim = Child("kim", 15)
  parent_kim = Parent("kim", 45)
  print(child_kim.dancing())
  print(child_kim.singing("Love"))
  # print(parent_kim.studying()) # AttributeError: 'Parent' object has no attribute 'studying' (studying is defined only on Child, not on Parent)
  child_kim.whoAmI()
  parent_kim.whoAmI()

Child Class is ON
kim is dancing now.
kim sings Love.
I am Child
 I am Parent!!

class TV:

  # init constructor
  def __init__(self):
    self.__maxprice = 500

  def sell(self):
    print("Selling Price: {}".format(self.__maxprice))

  def setMaxPrice(self, price):
    self.__maxprice = price

if __name__ == "__main__":
  tv = TV()
  tv.sell()

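  # change price
  # Example of code that does NOT change the price:
  # because of name mangling, this creates a new attribute instead of updating self.__maxprice
  tv.__maxprice = 1000
  tv.sell()

  # setMaxPrice
  # The setter method is how a value from outside the class updates the price
  tv.setMaxPrice(1000)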
  tv.sell()

Selling Price: 500
Selling Price: 500
Selling Price: 1000
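
The second "Selling Price: 500" above is a result of name mangling: inside the class body, self.__maxprice is stored under the name _TV__maxprice, while tv.__maxprice = 1000 outside the class creates a separate attribute. The sketch below is a reduced version of the TV class (sell returns the string here so it is easy to inspect) that lists the instance attributes to make this visible.

```python
class TV:
    def __init__(self):
        self.__maxprice = 500  # stored on the instance as _TV__maxprice

    def sell(self):
        return "Selling Price: {}".format(self.__maxprice)


tv = TV()
tv.__maxprice = 1000            # outside the class: creates a new attribute named __maxprice
attribute_names = sorted(vars(tv))

print(attribute_names)          # ['_TV__maxprice', '__maxprice']
print(tv.sell())                # Selling Price: 500
```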

Conditionals inside a class

Let's put a conditional in the __init__ constructor!

class Employee:

  # init constructor
  # name, salary
  def __init__(self, name, salary = 0):
    self.name = name

    # added conditional
    if salary > 0:
      self.salary = salary
    else:
      self.salary = 0
      print("Salary must be greater than 0! Please enter it again!!")

  def update_salary(self, amount):
    self.salary += amount

  def weekly_salary(self):
    return self.salary / 7

if __name__ == "__main__":
  emp01 = Employee("Winters", -50000)
  print(emp01.name)
  print(emp01.salary)
  emp01.salary += 1500
  print(emp01.salary)
  emp01.update_salary(3000)
  print(emp01.salary)
  week_salary = emp01.weekly_salary()
  print(week_salary)

Salary must be greater than 0! Please enter it again!!
Winters
0
1500
4500
642.8571428571429

Class docstring

class Person:
  """
  A class that represents a person

  Attributes
  ------------
  name : str
    name of the person

  age : int
    age of the person

  Methods
  -------------

  info(additional=""):
    prints the person's name and age
  """
  def __init__(self, name, age):
    """
    Constructs all the necessary attributes for the person object
  
    Parameters
    -------------------------
      name : str
        name of the person

      age : int
        age of the person
    """

    self.name = name
    self.age = age

  def info(self, additional = ""):
    """
    Prints the person's name and age.
    
    Parameters
    --------------
      additional : str, optional
        more info to be displayed (Default is an empty string) / A, B, C


    Returns
    -----------
      None

    """

    # The default is "" rather than None so the concatenation works when no argument is passed
    print(f'My name is {self.name}. I am {self.age} years old. ' + additional)

if __name__ == "__main__":
    person = Person("Evan", age = 20)
    person.info("My workplace is 00")
    help(Person)

My name is Evan. I am 20 years old. My workplace is 00
Help on class Person in module __main__:

class Person(builtins.object)
 |  Person(name, age)
 |  
 |  A class that represents a person
 |  
 |  Attributes
 |  ------------
 |  name : str
 |    name of the person
 |  
 |  age : int
 |    age of the person
 |  
 |  Methods
 |  -------------
 |  
 |  info(additional=""):
 |    prints the person's name and age
 |  
 |  Methods defined here:
 |  
 |  __init__(self, name, age)
 |      Constructs all the necessary attributes for the person object
 |      
 |      Parameters
 |      -------------------------
 |        name : str
 |          name of the person
 |      
 |        age : int
 |          age of the person
 |  
 |  info(self, additional='')
 |      Prints the person's name and age.
 |      
 |      Parameters
 |      --------------
 |        additional : str, optional
 |          more info to be displayed (Default is an empty string) / A, B, C
 |      
 |      
 |      Returns
 |      -----------
 |        None
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)


Python Basic Syntax 2

Mar 22, 2022 in Education

Basic syntax review

# list
book_list = ["A","B","C"]
# append, extend, insert, remove, pop, etc

# tuple
book_tuple = ("A", "B", "c")
# items cannot be modified or deleted

# dictionary
book_dictionary = {"title" : ["A", "B"], "publication year" :[2011, 2002]}
# keys(), values(), items(), get()

Conditionals & loops

if True:
  print("run code")
elif True:
  print("run code")
else:
  print("run code")

for i in range(3):
  print(i+1, "Hello")

1 Hello
2 Hello
3 Hello

book_list = ["프로그래밍 R", "혼자 공부하는 머신러닝"]
for book in book_list:
  print(book)

프로그래밍 R
혼자 공부하는 머신러닝

strings01 = "Hello World"
for char in strings01:
  print(char)

H
e
l
l
o
 
W
o
r
l
d

num_tuple = (1, 2, 3, 4)
for num in num_tuple:
  print(num)

num_dict = {"A" : 1, "B" : 2}
for num in num_dict:
  # print(num) # iterating over a dict yields its keys, not its values
  print(num_dict[num])

1
2

Why loops are needed

product_name = ["yogurt", "milk", "snack"]
prices = [1000, 1500, 2000]
quantities = [5, 3, 4]
a = [1, 2, 3]
# name = product_name[0]
# sales = prices[0] * quantities[0]
# print(name + " sales: " + str(sales) + " won.")

# name = product_name[1]
# sales = prices[1] * quantities[1]
# print(name + " sales: " + str(sales) + " won.")

# A loop removes the need to repeat the code above for every product
for i in range(len(product_name)):
  name = product_name[i]
  sales = prices[i] * quantities[i]
  print(name + " sales: " + str(sales) + " won.")

yogurt sales: 5000 won.
milk sales: 4500 won.
snack sales: 8000 won.
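
The index-based loop above can also be written with zip, which walks the three lists in parallel without indexing. This is a sketch of that alternative, using English product names and a hypothetical sales_by_product dictionary to collect the results.

```python
product_names = ["yogurt", "milk", "snack"]
prices = [1000, 1500, 2000]
quantities = [5, 3, 4]

# sales_by_product is a hypothetical name used only in this sketch.
sales_by_product = {}
for name, price, quantity in zip(product_names, prices, quantities):
    sales = price * quantity
    sales_by_product[name] = sales
    print(f"{name} sales: {sales} won.")
```

zip stops at the shortest input, so it assumes the three lists have the same length.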

while

A loop that runs while a condition holds

count = 1
while count <= 5:
  print("Hello..")
  count += 1
  print(count)

print("count is now greater than 5..")

Hello..
2
Hello..
3
Hello..
4
Hello..
5
Hello..
6
count is now greater than 5..

count = 3
while count > 0:
  print("Hello..")
  count -= 1
  print(count)

Hello..
2
Hello..
1
Hello..
0

List comprehension

Write a for loop in a single line

my_list = [[10], [20, 30]]
# print(my_list)

flattened_list = []
for value_list in my_list:
  # print(value_list)
  for value in value_list:
    # print(value)
    flattened_list.append(value)

print(flattened_list)
# result: [10, 20, 30]

[10, 20, 30]

my_list = [[10], [20, 30]]
flattened_list = [value for value_list in my_list for value in value_list]
print(flattened_list)

[10, 20, 30]

letters = []
for char in "helloworld":
  letters.append(char)
print(letters)

['h', 'e', 'l', 'l', 'o', 'w', 'o', 'r', 'l', 'd']

letters2 = [char for char in "helloworld"]
print(letters2)

['h', 'e', 'l', 'l', 'o', 'w', 'o', 'r', 'l', 'd']
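
A comprehension can also filter items with a trailing if clause. As a small sketch, the example below keeps only the vowels from the same string and counts one letter; the variable names are illustrative.

```python
# Keep only the characters that satisfy the condition.
vowels = [char for char in "helloworld" if char in "aeiou"]
print(vowels)  # ['e', 'o', 'o']

# The same pattern can count occurrences of one letter.
l_count = len([char for char in "helloworld" if char == "l"])
print(l_count)  # 3
```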

User-defined functions

def function_name():
  # run code
  return value

function_name()

Example when saved as basic.py

#!/usr/local/bin/python
# -*- coding: utf-8 -*-
def temp(content, letter):
  """Count how many times a character appears in content.
  
  Args:
    content(str) : string to search
    letter(str) : character to look for

  Returns:
    int
  """

  print("function test")

  cnt = len([char for char in content if char == letter])
  return cnt

if __name__ == "__main__":
  help(temp)
  docstring = temp.__doc__ # the docstring serves as documentation
  print(docstring)

Help on function temp in module __main__:

temp(content, letter)
    Count how many times a character appears in content.
    
    Args:
      content(str) : string to search
      letter(str) : character to look for
    
    Returns:
      int

Count how many times a character appears in content.
  
  Args:
    content(str) : string to search
    letter(str) : character to look for

  Returns:
    int

value_list = [1, 2, 3, 4, 5, 6]
print("avg:", sum(value_list) / len(value_list))

# median
midpoint = len(value_list) // 2
# when len(value_list) % 2 == 0, average the two middle values:
print((value_list[midpoint - 1] + value_list[midpoint]) / 2)
# otherwise, take the single middle value:
print(value_list[midpoint])

def mean_and_median(value_list):
  """Return the mean and median of a list of numbers.

  Args:
    value_list (iterable of int / float): A list of numbers
  
  Returns:
    tuple(float, float)
  """
  # mean
  mean = sum(value_list) / len(value_list)
  # median (sort first so the middle position is meaningful)
  sorted_values = sorted(value_list)
  midpoint = len(sorted_values) // 2
  if len(sorted_values) % 2 == 0:
    median = (sorted_values[midpoint - 1] + sorted_values[midpoint]) / 2
  else:
    median = sorted_values[midpoint]

  return mean, median

if __name__ == "__main__":
  value_lists = [1, 1, 2, 2, 3, 4, 5]
  avg, median = mean_and_median(value_lists)
  print("avg:", avg)
  print("median:", median)

avg: 3.5
3.5
4
avg: 2.5714285714285716
median: 2
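
Picking the middle position only gives the median when the values are in sorted order. The sketch below uses a hypothetical helper, median_of, that sorts a copy first, and cross-checks it against statistics.median from the standard library.

```python
import statistics


def median_of(values):
    """Median of an iterable of numbers; sorts a copy first (hypothetical helper)."""
    ordered = sorted(values)
    midpoint = len(ordered) // 2
    if len(ordered) % 2 == 0:
        # Even count: average the two middle values.
        return (ordered[midpoint - 1] + ordered[midpoint]) / 2
    return ordered[midpoint]


unsorted_values = [5, 1, 4, 2, 3, 2, 1]
print(median_of(unsorted_values))          # 2
print(statistics.median(unsorted_values))  # 2
```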

Decorators, immutable vs. mutable names
context manager

Using function closures

The global keyword (modifying a global variable)

x = 10
def foo():
  x = 20
  print(x)

print(x)
foo()

10
20

x = 10
def foo():
  global x # declare that x refers to the global variable
  x = 20
  print(x)

print(x)
foo()

10
20
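
Both versions above print 10 and then 20, so the difference only shows up when the global x is read again after the call. The sketch below, run as a top-level script, records x after each call; shadow and rebind are hypothetical names for the two variants of foo.

```python
x = 10


def shadow():
    x = 20      # local variable; the module-level x is untouched
    return x


def rebind():
    global x    # assignments now target the module-level x
    x = 20
    return x


shadow()
after_shadow = x   # still 10
rebind()
after_rebind = x   # now 20
print(after_shadow, after_rebind)
```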


Python Basic Syntax 1

Mar 21, 2022 in Education

print('hello world!')

hello world!

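Comments

Used to explain specific code while working on it.
Also used when writing user-defined functions and classes (as help text).

Variables (Scalar)

Implemented as objects
- Each has a single type.
- Each is defined by a class.
  - Various functions are available.

int

int is used to represent integers.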

num_int = 1
print(num_int)
print(type(num_int))

1
<class 'int'>

float

float is used to represent real numbers.

num_float = 0.2
print(num_float)
print(type(num_float))

0.2
<class 'float'>

bool

bool is used to represent the Boolean values True and False.

bool_true = True
print(bool_true)
print(type(bool_true))

True
<class 'bool'>

None

The type that represents null; it has exactly one value, None.

none_x = None
print(none_x)
print(type(none_x))

None
<class 'NoneType'>

Arithmetic operations

Integer arithmetic

a = 2
b = 4
print('a + b = ', a + b)
print('a % b = ', a % b)
print('a / b = ', a / b) # division returns a float

a + b =  6
a % b =  2
a / b =  0.5

Logical operators

The bool type is defined by the values True and False.
AND / OR

x = 5 > 4
y = 3 > 4
print(x and x)
print(x and y)
print(y and x)
print(y and y)
print("-----")
print(x or x)
print(x or y)
print(y or x)
print(y or y)

True
False
False
False
-----
True
True
True
False

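Comparison operators

These are the inequality and equality signs.
A comparison evaluates to True or False.

Applying logical & comparison operators

var = input("Please enter a value....")
print(type(var))

Please enter a value....5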
<class 'str'>

Convert the type.
Strings, integers, floats, and so on.

var = int("1")
print(type(var))

<class 'int'>

var = int(input("Please enter a number"))
print(type(var))

Please enter a number3
<class 'int'>

num1 = int(input("Please enter a number"))
num2 = int(input("Please enter a number"))

print(num1 > num2)

Please enter a number10
Please enter a number5
True

num1 = int(input("Please enter a number"))
num2 = int(input("Please enter a number"))
num3 = int(input("Please enter a number"))
num4 = int(input("Please enter a number"))

var1 = num1 >= num2 # True
var2 = num3 < num4 # True
print(var1 and var2)
print(var1 or var2)

Please enter a number20
Please enter a number15
Please enter a number3
Please enter a number5
True
True

Variables (Non Scalar)

Entering strings

print("'Hello, World'")
print('"Hello, World"')

'Hello, World'
"Hello, World"

String operators

Try the addition operator.

str1 = "Hello "
str2 = "World!"
print(str1 + str2)

Hello World!

Indexing

String indexing selects a specific character by its position within the string.

greeting = "Hello Kaggle!"
print(greeting[6])

Lists

A sequence data type
The data is ordered and can be sliced
Written with square brackets ('[value]')

a = [] # empty list
a_func = list() # create an empty list
b = [1] # numbers can be elements
c = ['apple'] # strings can be elements too
d = [1, 2, ['apple']] # a list can contain another list as an element

print(a)
print(a_func)
print(b)
print(c)
print(d)
print(type(d))

[]
[]
[1]
['apple']
[1, 2, ['apple']]
<class 'list'>

List slicing

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(a[0])

a = [["apple", "banana", "cherry"], 1]
print(a[0][2][2])

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(a[::-1]) # reverse order
print(a[::2])

[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
[1, 3, 5, 7, 9]

List operators

a = ["john", "evan"]
b = ["alice", "eva"]

c = a + b
print(c)

['john', 'evan', 'alice', 'eva']

c = a * 3
d = b * 0
print("a * 3 = ", c)
print("b * 0 = ", d)

a * 3 =  ['john', 'evan', 'john', 'evan', 'john', 'evan']
b * 0 =  []

Modifying and deleting list items

a = [0, 1, 2]
a[1] = "b"
print(a)

[0, 'b', 2]

Adding values to a list

a = [100, 200, 300]
a.append(400)
print(a)

# a.append([500, 600])
# print(a)

a.extend([500, 600])
print(a)

[100, 200, 300, 400]
[100, 200, 300, 400, 500, 600]

a = [0, 1, 2]
# a.insert(index, value_to_insert)
a.insert(1, 100)
print(a)

[0, 100, 1, 2]

Removing values from a list

a = [1, 2, 3, 4, "A"]
a.remove(1)
print(a)
a.remove("A")
print(a)

[2, 3, 4, 'A']
[2, 3, 4]

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

del a[1] # index number
print(a)

del a[1:5]
print(a)

[1, 3, 4, 5, 6, 7, 8, 9, 10]
[1, 7, 8, 9, 10]

b = ["a", "b", "c", "d"]
x = b.pop()
print(x)
print(b)

d
['a', 'b', 'c']

Other methods

a = [0, 1, 2, 3]
print(a)

a.clear()
print(a)

[0, 1, 2, 3]
[]

a = ["a", "a", "b", "b"]
print(a.index("b")) # when a value is repeated, index() returns the position of its first occurrence

a = [1, 4, 5, 2, 3]
b = [1, 4, 5, 2, 3]

a.sort()
print("a.sort():", a)

# descending order with sort(reverse = True)
b.sort(reverse = True)
print("sort(reverse = True): ", b)

a.sort(): [1, 2, 3, 4, 5]
sort(reverse = True):  [5, 4, 3, 2, 1]
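
list.sort() sorts the list in place and returns None, so printing its return value shows None rather than the sorted list. When a new sorted list is wanted and the original should stay unchanged, the built-in sorted() is one option. A short sketch of both:

```python
a = [1, 4, 5, 2, 3]
result = a.sort()                      # sorts in place; the return value is None
print(result)                          # None
print(a)                               # [1, 2, 3, 4, 5]

b = [1, 4, 5, 2, 3]
descending = sorted(b, reverse=True)   # returns a new list
print(descending)                      # [5, 4, 3, 2, 1]
print(b)                               # [1, 4, 5, 2, 3] (unchanged)
```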

c = [4, 3, 2, 'a']
# c.sort() raises a TypeError: numbers and strings cannot be sorted together

Tuples

Similar to lists.
They support slicing, indexing, and so on.
(vs. lists): items in a tuple cannot be modified or deleted.

tuple1 = (0) # without a trailing comma (,)
tuple2 = (0,) # with a trailing comma (,)
tuple3 = 0, 1, 2
print(tuple1) # int
print(tuple2) # tuple
print(tuple3) # tuple

0
(0,)
(0, 1, 2)

a = (0, 1, 2, 3, 'a')
print(type(a))

# del a[4] TypeError: 'tuple' object doesn't support item deletion
# a[1] = "b"

<class 'tuple'>

Tuple indexing and slicing

a = (0, 1, 2, 3, "a")
print(a[1])
print(a[3])
print(a[4])

1
3
a

Using the addition and multiplication operators

t1 = (0, 1, 2)
t2 = ('a','b')
print(t1 + t2)
print(t1 * 3)

(0, 1, 2, 'a', 'b')
(0, 1, 2, 0, 1, 2, 0, 1, 2)

Dictionaries

Made up of key-value pairs.

dict_01 = {'teacher' : 'evan',
           'class' : 601,
           'student' : 24,
           'student names' : ['A','Z']}
# print(dict_01)
print(dict_01['teacher'])
print(dict_01['class'])
print(dict_01['student names'])

evan
601
['A', 'Z']

print(type(dict_01.keys()))
print(dict_01.keys())
print(list(dict_01.keys()))

<class 'dict_keys'>
dict_keys(['teacher', 'class', 'student', 'student names'])
['teacher', 'class', 'student', 'student names']

print(type(dict_01.values()))
print(dict_01.values())
print(list(dict_01.values()))

<class 'dict_values'>
dict_values(['evan', 601, 24, ['A', 'Z']])
['evan', 601, 24, ['A', 'Z']]

dict_01.items()

dict_items([('teacher', 'evan'), ('class', 601), ('student', 24), ('student names', ['A', 'Z'])])

print(dict_01.get("teacher", "no value"))
print(dict_01.get("instructor", "no value"))
print(dict_01.get("class"))

evan
no value
601
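
items() is most often used in a for loop, where each (key, value) pair can be unpacked directly. A minimal sketch with a small made-up dictionary (demo_dict is a hypothetical name):

```python
demo_dict = {"teacher": "evan", "class": 601, "student": 24}

pairs = []
for key, value in demo_dict.items():
    # Each item is a (key, value) tuple, unpacked into two names.
    pairs.append(f"{key} -> {value}")
    print(key, "->", value)

# get() returns the default instead of raising KeyError for a missing key.
missing = demo_dict.get("room", "no value")
print(missing)
```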

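Conditionals & loops

weather = "sunny"
if weather == "rain":
  print("Take an umbrella.")
else:
  print("Don't take an umbrella.")

Don't take an umbrella.

Building a grade table
A score of 60 or above passes; anything lower fails
Any number can be entered

score = int(input("Please enter a score."))

if score >= 60:
  print("Pass")
else:
  print("Fail")

Please enter a score.60
Pass

# 90 or above is grade A
# 80 or above is grade B
# everything else is grade F

score = int(input("Please enter a score"))

if score >= 90:
  print("Grade A")
elif score >= 80:
  print("Grade B")
else:
  print("Grade F")

Please enter a score56
Grade F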

Loops

for statement

for i in range(3):
  print(i + 1, "Hello!")

1 Hello!
2 Hello!
3 Hello!

count = range(50)
print(count)

for n in count:
  print("Round " + str(n + 1))
  if (n + 1) == 5:
    print("Let's stop!!")
    break
  print("Soccer shot")

range(0, 50)
Round 1
Soccer shot
Round 2
Soccer shot
Round 3
Soccer shot
Round 4
Soccer shot
Round 5
Let's stop!!

a = "hello"

for x in a:
  if x == "l":
    break
  print(x)

h
e

alphabets = ['A', 'B', 'C']
for index, value in enumerate(alphabets):
  print(index, value)

0 A
1 B
2 C
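
enumerate starts counting at 0 by default, but it accepts a start argument when 1-based numbering reads better. A short sketch (numbered is a hypothetical list used to collect the output):

```python
alphabets = ['A', 'B', 'C']

numbered = []
for position, value in enumerate(alphabets, start=1):
    numbered.append(f"{position}: {value}")
    print(position, value)
```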

while statement

n = 0
while n < 10:
  n += 1
  print("Greeting number %d." % n)

Greeting number 1.
Greeting number 2.
Greeting number 3.
Greeting number 4.
Greeting number 5.
Greeting number 6.
Greeting number 7.
Greeting number 8.
Greeting number 9.
Greeting number 10.
