라이브러리 불러오기

1
2
import pandas as pd
print(pd.__version__)
1.3.5

테스트

1
2
3
4
5
temp_dic = {"coll" : [1, 2, 3],
"col2" : [3, 4, 5]}
df = pd.DataFrame(temp_dic)
print(type(df))
print(df)
<class 'pandas.core.frame.DataFrame'>
   coll  col2
0     1     3
1     2     4
2     3     5
1
2
3
4
temp_dic = {'a' : 1 , "b" : 2, "c" : 3}
ser = pd.Series(temp_dic)
print(type(ser))
print(ser)
<class 'pandas.core.series.Series'>
a    1
b    2
c    3
dtype: int64

구글 드라이브 연동

1
2
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
1
2
3
DATA_PATH = '/content/drive/MyDrive/Colab Notebooks/data/Lemonade2016.csv'
juice = pd.read_csv(DATA_PATH)
juice

Date Location Lemon Orange Temperature Leaflets Price
0 7/1/2016 Park 97 67 70 90.0 0.25
1 7/2/2016 Park 98 67 72 90.0 0.25
2 7/3/2016 Park 110 77 71 104.0 0.25
3 7/4/2016 Beach 134 99 76 98.0 0.25
4 7/5/2016 Beach 159 118 78 135.0 0.25
5 7/6/2016 Beach 103 69 82 90.0 0.25
6 7/6/2016 Beach 103 69 82 90.0 0.25
7 7/7/2016 Beach 143 101 81 135.0 0.25
8 NaN Beach 123 86 82 113.0 0.25
9 7/9/2016 Beach 134 95 80 126.0 0.25
10 7/10/2016 Beach 140 98 82 131.0 0.25
11 7/11/2016 Beach 162 120 83 135.0 0.25
12 7/12/2016 Beach 130 95 84 99.0 0.25
13 7/13/2016 Beach 109 75 77 99.0 0.25
14 7/14/2016 Beach 122 85 78 113.0 0.25
15 7/15/2016 Beach 98 62 75 108.0 0.50
16 7/16/2016 Beach 81 50 74 90.0 0.50
17 7/17/2016 Beach 115 76 77 126.0 0.50
18 7/18/2016 Park 131 92 81 122.0 0.50
19 7/19/2016 Park 122 85 78 113.0 0.50
20 7/20/2016 Park 71 42 70 NaN 0.50
21 7/21/2016 Park 83 50 77 90.0 0.50
22 7/22/2016 Park 112 75 80 108.0 0.50
23 7/23/2016 Park 120 82 81 117.0 0.50
24 7/24/2016 Park 121 82 82 117.0 0.50
25 7/25/2016 Park 156 113 84 135.0 0.50
26 7/26/2016 Park 176 129 83 158.0 0.35
27 7/27/2016 Park 104 68 80 99.0 0.35
28 7/28/2016 Park 96 63 82 90.0 0.35
29 7/29/2016 Park 100 66 81 95.0 0.35
30 7/30/2016 Beach 88 57 82 81.0 0.35
31 7/31/2016 Beach 76 47 82 68.0 0.35

  <script>
    const buttonEl =
      document.querySelector('#df-27fb38f5-8b87-4f93-8b5a-9f5a79087b29 button.colab-df-convert');
    buttonEl.style.display =
      google.colab.kernel.accessAllowed ? 'block' : 'none';

    async function convertToInteractive(key) {
      const element = document.querySelector('#df-27fb38f5-8b87-4f93-8b5a-9f5a79087b29');
      const dataTable =
        await google.colab.kernel.invokeFunction('convertToInteractive',
                                                 [key], {});
      if (!dataTable) return;

      const docLinkHtml = 'Like what you see? Visit the ' +
        '<a target="_blank" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'
        + ' to learn more about interactive tables.';
      element.innerHTML = '';
      dataTable['output_type'] = 'display_data';
      await google.colab.output.renderOutput(dataTable, element);
      const docLink = document.createElement('div');
      docLink.innerHTML = docLinkHtml;
      element.appendChild(docLink);
    }
  </script>
</div>
  • 데이터를 불러왔다.
  • 첫번째 파악해야 하는 것 = 데이터 구조 파악
1
2
juice.info() # info = DataFrame 안에 있는 method
# 결측치가 있으면 Non-Null Count 개수가 다름.
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32 entries, 0 to 31
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Date         31 non-null     object 
 1   Location     32 non-null     object 
 2   Lemon        32 non-null     int64  
 3   Orange       32 non-null     int64  
 4   Temperature  32 non-null     int64  
 5   Leaflets     31 non-null     float64
 6   Price        32 non-null     float64
dtypes: float64(2), int64(3), object(2)
memory usage: 1.9+ KB
1
juice.head(10) # 위에서부터 5개까지, ()안에 숫자를 넣으면 그 숫자까지 데이터를 불러옴

Date Location Lemon Orange Temperature Leaflets Price
0 7/1/2016 Park 97 67 70 90.0 0.25
1 7/2/2016 Park 98 67 72 90.0 0.25
2 7/3/2016 Park 110 77 71 104.0 0.25
3 7/4/2016 Beach 134 99 76 98.0 0.25
4 7/5/2016 Beach 159 118 78 135.0 0.25
5 7/6/2016 Beach 103 69 82 90.0 0.25
6 7/6/2016 Beach 103 69 82 90.0 0.25
7 7/7/2016 Beach 143 101 81 135.0 0.25
8 NaN Beach 123 86 82 113.0 0.25
9 7/9/2016 Beach 134 95 80 126.0 0.25

  <script>
    const buttonEl =
      document.querySelector('#df-6f968857-ee36-4309-96a8-22630fa0efd6 button.colab-df-convert');
    buttonEl.style.display =
      google.colab.kernel.accessAllowed ? 'block' : 'none';

    async function convertToInteractive(key) {
      const element = document.querySelector('#df-6f968857-ee36-4309-96a8-22630fa0efd6');
      const dataTable =
        await google.colab.kernel.invokeFunction('convertToInteractive',
                                                 [key], {});
      if (!dataTable) return;

      const docLinkHtml = 'Like what you see? Visit the ' +
        '<a target="_blank" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'
        + ' to learn more about interactive tables.';
      element.innerHTML = '';
      dataTable['output_type'] = 'display_data';
      await google.colab.output.renderOutput(dataTable, element);
      const docLink = document.createElement('div');
      docLink.innerHTML = docLinkHtml;
      element.appendChild(docLink);
    }
  </script>
</div>
1
juice.tail() # 아래에서 부터 5개

Date Location Lemon Orange Temperature Leaflets Price
27 7/27/2016 Park 104 68 80 99.0 0.35
28 7/28/2016 Park 96 63 82 90.0 0.35
29 7/29/2016 Park 100 66 81 95.0 0.35
30 7/30/2016 Beach 88 57 82 81.0 0.35
31 7/31/2016 Beach 76 47 82 68.0 0.35

  <script>
    const buttonEl =
      document.querySelector('#df-6733b4da-1377-4661-8b0c-d798794900ed button.colab-df-convert');
    buttonEl.style.display =
      google.colab.kernel.accessAllowed ? 'block' : 'none';

    async function convertToInteractive(key) {
      const element = document.querySelector('#df-6733b4da-1377-4661-8b0c-d798794900ed');
      const dataTable =
        await google.colab.kernel.invokeFunction('convertToInteractive',
                                                 [key], {});
      if (!dataTable) return;

      const docLinkHtml = 'Like what you see? Visit the ' +
        '<a target="_blank" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'
        + ' to learn more about interactive tables.';
      element.innerHTML = '';
      dataTable['output_type'] = 'display_data';
      await google.colab.output.renderOutput(dataTable, element);
      const docLink = document.createElement('div');
      docLink.innerHTML = docLinkHtml;
      element.appendChild(docLink);
    }
  </script>
</div>
  • Describe() 함수
  • 기술통계량 확인 해주는 함수
1
juice.describe() # type(juice.describe()) 항상 데이터 타입 확인.

Lemon Orange Temperature Leaflets Price
count 32.000000 32.000000 32.000000 31.000000 32.000000
mean 116.156250 80.000000 78.968750 108.548387 0.354687
std 25.823357 21.863211 4.067847 20.117718 0.113137
min 71.000000 42.000000 70.000000 68.000000 0.250000
25% 98.000000 66.750000 77.000000 90.000000 0.250000
50% 113.500000 76.500000 80.500000 108.000000 0.350000
75% 131.750000 95.000000 82.000000 124.000000 0.500000
max 176.000000 129.000000 84.000000 158.000000 0.500000

  <script>
    const buttonEl =
      document.querySelector('#df-f9f0ce1e-fc97-44be-914b-2a73ca101631 button.colab-df-convert');
    buttonEl.style.display =
      google.colab.kernel.accessAllowed ? 'block' : 'none';

    async function convertToInteractive(key) {
      const element = document.querySelector('#df-f9f0ce1e-fc97-44be-914b-2a73ca101631');
      const dataTable =
        await google.colab.kernel.invokeFunction('convertToInteractive',
                                                 [key], {});
      if (!dataTable) return;

      const docLinkHtml = 'Like what you see? Visit the ' +
        '<a target="_blank" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'
        + ' to learn more about interactive tables.';
      element.innerHTML = '';
      dataTable['output_type'] = 'display_data';
      await google.colab.output.renderOutput(dataTable, element);
      const docLink = document.createElement('div');
      docLink.innerHTML = docLinkHtml;
      element.appendChild(docLink);
    }
  </script>
</div>
  • value_counts()
1
2
print(juice['Location'].value_counts()) # 기초 통계량으로는 빈도만 확인 할수 있으니 value counts()함수를 사용
print(type(juice['Location'].value_counts()))
Beach    17
Park     15
Name: Location, dtype: int64
<class 'pandas.core.series.Series'>

데이터 다뤄보기

  • 행과 열을 핸들링 해보자.
1
2
juice['Sold'] = 0 # 새로운 데이터 추가
print(juice.head(3))
       Date Location  Lemon  Orange  Temperature  Leaflets  Price  Sold
0  7/1/2016     Park     97      67           70      90.0   0.25     0
1  7/2/2016     Park     98      67           72      90.0   0.25     0
2  7/3/2016     Park    110      77           71     104.0   0.25     0
1
2
juice['Sold'] = juice['Lemon'] + juice['Orange']
print(juice.head(3))
       Date Location  Lemon  Orange  Temperature  Leaflets  Price  Sold
0  7/1/2016     Park     97      67           70      90.0   0.25   164
1  7/2/2016     Park     98      67           72      90.0   0.25   165
2  7/3/2016     Park    110      77           71     104.0   0.25   187
  • 매출액 = 가격 * 판매량
1
2
3
# juice['Revenue'] = 0 생략 가능
juice['Revenue'] = juice['Price'] * juice['Sold']
print(juice.head(3))
       Date Location  Lemon  Orange  Temperature  Leaflets  Price  Sold  \
0  7/1/2016     Park     97      67           70      90.0   0.25   164   
1  7/2/2016     Park     98      67           72      90.0   0.25   165   
2  7/3/2016     Park    110      77           71     104.0   0.25   187   

   Revenue  
0    41.00  
1    41.25  
2    46.75  
  • drop(axis = 0 | 1)
    • axis를 0으로 설정 시, 행(=index)방향으로 drop() 실행
    • axis를 1로 설정 시, 열방향으로 drop 수행함.
1
2
juice_column_drop = juice.drop('Sold', axis = 1)
print(juice_column_drop.head(3))
       Date Location  Lemon  Orange  Temperature  Leaflets  Price  Revenue
0  7/1/2016     Park     97      67           70      90.0   0.25    41.00
1  7/2/2016     Park     98      67           72      90.0   0.25    41.25
2  7/3/2016     Park    110      77           71     104.0   0.25    46.75
1
2
juice_row_drop = juice.drop(0, axis = 0)
print(juice_row_drop.head(3))
       Date Location  Lemon  Orange  Temperature  Leaflets  Price  Sold  \
1  7/2/2016     Park     98      67           72      90.0   0.25   165   
2  7/3/2016     Park    110      77           71     104.0   0.25   187   
3  7/4/2016    Beach    134      99           76      98.0   0.25   233   

   Revenue  
1    41.25  
2    46.75  
3    58.25  

데이터 인덱싱

1
juice[4:8]

Date Location Lemon Orange Temperature Leaflets Price Sold Revenue
4 7/5/2016 Beach 159 118 78 135.0 0.25 277 69.25
5 7/6/2016 Beach 103 69 82 90.0 0.25 172 43.00
6 7/6/2016 Beach 103 69 82 90.0 0.25 172 43.00
7 7/7/2016 Beach 143 101 81 135.0 0.25 244 61.00

  <script>
    const buttonEl =
      document.querySelector('#df-39a9c573-48e3-4700-b5bb-c24950ba4f3f button.colab-df-convert');
    buttonEl.style.display =
      google.colab.kernel.accessAllowed ? 'block' : 'none';

    async function convertToInteractive(key) {
      const element = document.querySelector('#df-39a9c573-48e3-4700-b5bb-c24950ba4f3f');
      const dataTable =
        await google.colab.kernel.invokeFunction('convertToInteractive',
                                                 [key], {});
      if (!dataTable) return;

      const docLinkHtml = 'Like what you see? Visit the ' +
        '<a target="_blank" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'
        + ' to learn more about interactive tables.';
      element.innerHTML = '';
      dataTable['output_type'] = 'display_data';
      await google.colab.output.renderOutput(dataTable, element);
      const docLink = document.createElement('div');
      docLink.innerHTML = docLinkHtml;
      element.appendChild(docLink);
    }
  </script>
</div>
  • boolean 값을 활용한 데이터 추출
1
2
3
# location이 Beach인 경우
# juice['Location'].value_counts()
juice[juice['Leaflets'] >= 100]

Date Location Lemon Orange Temperature Leaflets Price Sold Revenue location
2 7/3/2016 Park 110 77 71 104.0 0.25 187 46.75 Beach
4 7/5/2016 Beach 159 118 78 135.0 0.25 277 69.25 Beach
7 7/7/2016 Beach 143 101 81 135.0 0.25 244 61.00 Beach
8 NaN Beach 123 86 82 113.0 0.25 209 52.25 Beach
9 7/9/2016 Beach 134 95 80 126.0 0.25 229 57.25 Beach
10 7/10/2016 Beach 140 98 82 131.0 0.25 238 59.50 Beach
11 7/11/2016 Beach 162 120 83 135.0 0.25 282 70.50 Beach
14 7/14/2016 Beach 122 85 78 113.0 0.25 207 51.75 Beach
15 7/15/2016 Beach 98 62 75 108.0 0.50 160 80.00 Beach
17 7/17/2016 Beach 115 76 77 126.0 0.50 191 95.50 Beach
18 7/18/2016 Park 131 92 81 122.0 0.50 223 111.50 Beach
19 7/19/2016 Park 122 85 78 113.0 0.50 207 103.50 Beach
22 7/22/2016 Park 112 75 80 108.0 0.50 187 93.50 Beach
23 7/23/2016 Park 120 82 81 117.0 0.50 202 101.00 Beach
24 7/24/2016 Park 121 82 82 117.0 0.50 203 101.50 Beach
25 7/25/2016 Park 156 113 84 135.0 0.50 269 134.50 Beach
26 7/26/2016 Park 176 129 83 158.0 0.35 305 106.75 Beach

  <script>
    const buttonEl =
      document.querySelector('#df-39bd9f29-5daa-48f0-bd59-c71dc37c6437 button.colab-df-convert');
    buttonEl.style.display =
      google.colab.kernel.accessAllowed ? 'block' : 'none';

    async function convertToInteractive(key) {
      const element = document.querySelector('#df-39bd9f29-5daa-48f0-bd59-c71dc37c6437');
      const dataTable =
        await google.colab.kernel.invokeFunction('convertToInteractive',
                                                 [key], {});
      if (!dataTable) return;

      const docLinkHtml = 'Like what you see? Visit the ' +
        '<a target="_blank" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'
        + ' to learn more about interactive tables.';
      element.innerHTML = '';
      dataTable['output_type'] = 'display_data';
      await google.colab.output.renderOutput(dataTable, element);
      const docLink = document.createElement('div');
      docLink.innerHTML = docLinkHtml;
      element.appendChild(docLink);
    }
  </script>
</div>

iloc vs loc

  • 차이를 확인한다!
1
juice.head(3)

Date Location Lemon Orange Temperature Leaflets Price Sold Revenue location
0 7/1/2016 Park 97 67 70 90.0 0.25 164 41.00 Beach
1 7/2/2016 Park 98 67 72 90.0 0.25 165 41.25 Beach
2 7/3/2016 Park 110 77 71 104.0 0.25 187 46.75 Beach

  <script>
    const buttonEl =
      document.querySelector('#df-10e12aa4-8e41-4c4f-91a6-be55e247dea1 button.colab-df-convert');
    buttonEl.style.display =
      google.colab.kernel.accessAllowed ? 'block' : 'none';

    async function convertToInteractive(key) {
      const element = document.querySelector('#df-10e12aa4-8e41-4c4f-91a6-be55e247dea1');
      const dataTable =
        await google.colab.kernel.invokeFunction('convertToInteractive',
                                                 [key], {});
      if (!dataTable) return;

      const docLinkHtml = 'Like what you see? Visit the ' +
        '<a target="_blank" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'
        + ' to learn more about interactive tables.';
      element.innerHTML = '';
      dataTable['output_type'] = 'display_data';
      await google.colab.output.renderOutput(dataTable, element);
      const docLink = document.createElement('div');
      docLink.innerHTML = docLinkHtml;
      element.appendChild(docLink);
    }
  </script>
</div>
1
2
3
4
%%time

# juice.iloc[:, 0:2]
juice.iloc[0:3, 0:2]
CPU times: user 652 µs, sys: 0 ns, total: 652 µs
Wall time: 653 µs

Date Location
0 7/1/2016 Park
1 7/2/2016 Park
2 7/3/2016 Park

  <script>
    const buttonEl =
      document.querySelector('#df-3556673a-432a-46c9-b3b9-d55c0796a5c1 button.colab-df-convert');
    buttonEl.style.display =
      google.colab.kernel.accessAllowed ? 'block' : 'none';

    async function convertToInteractive(key) {
      const element = document.querySelector('#df-3556673a-432a-46c9-b3b9-d55c0796a5c1');
      const dataTable =
        await google.colab.kernel.invokeFunction('convertToInteractive',
                                                 [key], {});
      if (!dataTable) return;

      const docLinkHtml = 'Like what you see? Visit the ' +
        '<a target="_blank" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'
        + ' to learn more about interactive tables.';
      element.innerHTML = '';
      dataTable['output_type'] = 'display_data';
      await google.colab.output.renderOutput(dataTable, element);
      const docLink = document.createElement('div');
      docLink.innerHTML = docLinkHtml;
      element.appendChild(docLink);
    }
  </script>
</div>
  • loc
  • 라벨 기반!
1
2
3
%%time

juice.loc[0:2, ['Date','Location']]
CPU times: user 1.56 ms, sys: 0 ns, total: 1.56 ms
Wall time: 1.5 ms

Date Location
0 7/1/2016 Park
1 7/2/2016 Park
2 7/3/2016 Park

  <script>
    const buttonEl =
      document.querySelector('#df-7f178606-9630-45b5-bb8e-006efc9d78d3 button.colab-df-convert');
    buttonEl.style.display =
      google.colab.kernel.accessAllowed ? 'block' : 'none';

    async function convertToInteractive(key) {
      const element = document.querySelector('#df-7f178606-9630-45b5-bb8e-006efc9d78d3');
      const dataTable =
        await google.colab.kernel.invokeFunction('convertToInteractive',
                                                 [key], {});
      if (!dataTable) return;

      const docLinkHtml = 'Like what you see? Visit the ' +
        '<a target="_blank" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'
        + ' to learn more about interactive tables.';
      element.innerHTML = '';
      dataTable['output_type'] = 'display_data';
      await google.colab.output.renderOutput(dataTable, element);
      const docLink = document.createElement('div');
      docLink.innerHTML = docLinkHtml;
      element.appendChild(docLink);
    }
  </script>
</div>
  • 데이터, 컬럼명 동시에 별도 추출 (iloc만 가능)
1
juice.loc[juice['Leaflets'] >= 100, ['Date', 'Location']]

Date Location
2 7/3/2016 Park
4 7/5/2016 Beach
7 7/7/2016 Beach
8 NaN Beach
9 7/9/2016 Beach
10 7/10/2016 Beach
11 7/11/2016 Beach
14 7/14/2016 Beach
15 7/15/2016 Beach
17 7/17/2016 Beach
18 7/18/2016 Park
19 7/19/2016 Park
22 7/22/2016 Park
23 7/23/2016 Park
24 7/24/2016 Park
25 7/25/2016 Park
26 7/26/2016 Park

  <script>
    const buttonEl =
      document.querySelector('#df-f18c55cb-276f-414b-b983-0362c3463c87 button.colab-df-convert');
    buttonEl.style.display =
      google.colab.kernel.accessAllowed ? 'block' : 'none';

    async function convertToInteractive(key) {
      const element = document.querySelector('#df-f18c55cb-276f-414b-b983-0362c3463c87');
      const dataTable =
        await google.colab.kernel.invokeFunction('convertToInteractive',
                                                 [key], {});
      if (!dataTable) return;

      const docLinkHtml = 'Like what you see? Visit the ' +
        '<a target="_blank" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'
        + ' to learn more about interactive tables.';
      element.innerHTML = '';
      dataTable['output_type'] = 'display_data';
      await google.colab.output.renderOutput(dataTable, element);
      const docLink = document.createElement('div');
      docLink.innerHTML = docLinkHtml;
      element.appendChild(docLink);
    }
  </script>
</div>
1
juice.loc[juice['Leaflets'] >= 100, 0:2]
---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-70-46f78a7ec2bf> in <module>()
----> 1 juice.loc[juice['Leaflets'] >= 100, 0:2]


/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in __getitem__(self, key)
    923                 with suppress(KeyError, IndexError):
    924                     return self.obj._get_value(*key, takeable=self._takeable)
--> 925             return self._getitem_tuple(key)
    926         else:
    927             # we by definition only have the 0th axis


/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
   1107             return self._multi_take(tup)
   1108 
-> 1109         return self._getitem_tuple_same_dim(tup)
   1110 
   1111     def _get_label(self, label, axis: int):


/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in _getitem_tuple_same_dim(self, tup)
    804                 continue
    805 
--> 806             retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
    807             # We should never have retval.ndim < self.ndim, as that should
    808             #  be handled by the _getitem_lowerdim call above.


/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1140         if isinstance(key, slice):
   1141             self._validate_key(key, axis)
-> 1142             return self._get_slice_axis(key, axis=axis)
   1143         elif com.is_bool_indexer(key):
   1144             return self._getbool_axis(key, axis=axis)


/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in _get_slice_axis(self, slice_obj, axis)
   1174 
   1175         labels = obj._get_axis(axis)
-> 1176         indexer = labels.slice_indexer(slice_obj.start, slice_obj.stop, slice_obj.step)
   1177 
   1178         if isinstance(indexer, slice):


/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in slice_indexer(self, start, end, step, kind)
   5683         slice(1, 3, None)
   5684         """
-> 5685         start_slice, end_slice = self.slice_locs(start, end, step=step)
   5686 
   5687         # return a slice


/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in slice_locs(self, start, end, step, kind)
   5885         start_slice = None
   5886         if start is not None:
-> 5887             start_slice = self.get_slice_bound(start, "left")
   5888         if start_slice is None:
   5889             start_slice = 0


/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_slice_bound(self, label, side, kind)
   5795         # For datetime indices label may be a string that has to be converted
   5796         # to datetime boundary according to its resolution.
-> 5797         label = self._maybe_cast_slice_bound(label, side)
   5798 
   5799         # we need to look up the label


/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in _maybe_cast_slice_bound(self, label, side, kind)
   5747         # reject them, if index does not contain label
   5748         if (is_float(label) or is_integer(label)) and label not in self._values:
-> 5749             raise self._invalid_indexer("slice", label)
   5750 
   5751         return label


TypeError: cannot do slice indexing on Index with these indexers [0] of type int

정렬

  • sort.values()
1
juice.sort_values(by = ['Revenue']).head(3) # 오름차순

Date Location Lemon Orange Temperature Leaflets Price Sold Revenue location
0 7/1/2016 Park 97 67 70 90.0 0.25 164 41.00 Beach
1 7/2/2016 Park 98 67 72 90.0 0.25 165 41.25 Beach
6 7/6/2016 Beach 103 69 82 90.0 0.25 172 43.00 Beach

  <script>
    const buttonEl =
      document.querySelector('#df-8cc684a9-4a06-4feb-934c-53ece2672c41 button.colab-df-convert');
    buttonEl.style.display =
      google.colab.kernel.accessAllowed ? 'block' : 'none';

    async function convertToInteractive(key) {
      const element = document.querySelector('#df-8cc684a9-4a06-4feb-934c-53ece2672c41');
      const dataTable =
        await google.colab.kernel.invokeFunction('convertToInteractive',
                                                 [key], {});
      if (!dataTable) return;

      const docLinkHtml = 'Like what you see? Visit the ' +
        '<a target="_blank" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'
        + ' to learn more about interactive tables.';
      element.innerHTML = '';
      dataTable['output_type'] = 'display_data';
      await google.colab.output.renderOutput(dataTable, element);
      const docLink = document.createElement('div');
      docLink.innerHTML = docLinkHtml;
      element.appendChild(docLink);
    }
  </script>
</div>
1
juice.sort_values(by = ['Revenue'], ascending=False).head(3) # 내림차순

Date Location Lemon Orange Temperature Leaflets Price Sold Revenue location
25 7/25/2016 Park 156 113 84 135.0 0.50 269 134.50 Beach
18 7/18/2016 Park 131 92 81 122.0 0.50 223 111.50 Beach
26 7/26/2016 Park 176 129 83 158.0 0.35 305 106.75 Beach

  <script>
    const buttonEl =
      document.querySelector('#df-e6dc6ec1-85e1-4d62-a618-1841af7f4df7 button.colab-df-convert');
    buttonEl.style.display =
      google.colab.kernel.accessAllowed ? 'block' : 'none';

    async function convertToInteractive(key) {
      const element = document.querySelector('#df-e6dc6ec1-85e1-4d62-a618-1841af7f4df7');
      const dataTable =
        await google.colab.kernel.invokeFunction('convertToInteractive',
                                                 [key], {});
      if (!dataTable) return;

      const docLinkHtml = 'Like what you see? Visit the ' +
        '<a target="_blank" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'
        + ' to learn more about interactive tables.';
      element.innerHTML = '';
      dataTable['output_type'] = 'display_data';
      await google.colab.output.renderOutput(dataTable, element);
      const docLink = document.createElement('div');
      docLink.innerHTML = docLinkHtml;
      element.appendChild(docLink);
    }
  </script>
</div>
1
juice.sort_values(by = ['Price', 'Temperature'], ascending=False) # 그룹화(0.5일때 나열, 0.35일때 나열)

Date Location Lemon Orange Temperature Leaflets Price Sold Revenue location
25 7/25/2016 Park 156 113 84 135.0 0.50 269 134.50 Beach
24 7/24/2016 Park 121 82 82 117.0 0.50 203 101.50 Beach
18 7/18/2016 Park 131 92 81 122.0 0.50 223 111.50 Beach
23 7/23/2016 Park 120 82 81 117.0 0.50 202 101.00 Beach
22 7/22/2016 Park 112 75 80 108.0 0.50 187 93.50 Beach
19 7/19/2016 Park 122 85 78 113.0 0.50 207 103.50 Beach
17 7/17/2016 Beach 115 76 77 126.0 0.50 191 95.50 Beach
21 7/21/2016 Park 83 50 77 90.0 0.50 133 66.50 Beach
15 7/15/2016 Beach 98 62 75 108.0 0.50 160 80.00 Beach
16 7/16/2016 Beach 81 50 74 90.0 0.50 131 65.50 Beach
20 7/20/2016 Park 71 42 70 NaN 0.50 113 56.50 Beach
26 7/26/2016 Park 176 129 83 158.0 0.35 305 106.75 Beach
28 7/28/2016 Park 96 63 82 90.0 0.35 159 55.65 Beach
30 7/30/2016 Beach 88 57 82 81.0 0.35 145 50.75 Beach
31 7/31/2016 Beach 76 47 82 68.0 0.35 123 43.05 Beach
29 7/29/2016 Park 100 66 81 95.0 0.35 166 58.10 Beach
27 7/27/2016 Park 104 68 80 99.0 0.35 172 60.20 Beach
12 7/12/2016 Beach 130 95 84 99.0 0.25 225 56.25 Beach
11 7/11/2016 Beach 162 120 83 135.0 0.25 282 70.50 Beach
5 7/6/2016 Beach 103 69 82 90.0 0.25 172 43.00 Beach
6 7/6/2016 Beach 103 69 82 90.0 0.25 172 43.00 Beach
8 NaN Beach 123 86 82 113.0 0.25 209 52.25 Beach
10 7/10/2016 Beach 140 98 82 131.0 0.25 238 59.50 Beach
7 7/7/2016 Beach 143 101 81 135.0 0.25 244 61.00 Beach
9 7/9/2016 Beach 134 95 80 126.0 0.25 229 57.25 Beach
4 7/5/2016 Beach 159 118 78 135.0 0.25 277 69.25 Beach
14 7/14/2016 Beach 122 85 78 113.0 0.25 207 51.75 Beach
13 7/13/2016 Beach 109 75 77 99.0 0.25 184 46.00 Beach
3 7/4/2016 Beach 134 99 76 98.0 0.25 233 58.25 Beach
1 7/2/2016 Park 98 67 72 90.0 0.25 165 41.25 Beach
2 7/3/2016 Park 110 77 71 104.0 0.25 187 46.75 Beach
0 7/1/2016 Park 97 67 70 90.0 0.25 164 41.00 Beach

  <script>
    const buttonEl =
      document.querySelector('#df-1b835c37-2512-40ad-b58d-8bb74607a0ee button.colab-df-convert');
    buttonEl.style.display =
      google.colab.kernel.accessAllowed ? 'block' : 'none';

    async function convertToInteractive(key) {
      const element = document.querySelector('#df-1b835c37-2512-40ad-b58d-8bb74607a0ee');
      const dataTable =
        await google.colab.kernel.invokeFunction('convertToInteractive',
                                                 [key], {});
      if (!dataTable) return;

      const docLinkHtml = 'Like what you see? Visit the ' +
        '<a target="_blank" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'
        + ' to learn more about interactive tables.';
      element.innerHTML = '';
      dataTable['output_type'] = 'display_data';
      await google.colab.output.renderOutput(dataTable, element);
      const docLink = document.createElement('div');
      docLink.innerHTML = docLinkHtml;
      element.appendChild(docLink);
    }
  </script>
</div>
1
2
# Price는 내림차순 , Temperature은 오름차순 
juice.sort_values(by = ['Price', 'Temperature'], ascending=[False, True])

Date Location Lemon Orange Temperature Leaflets Price Sold Revenue location
20 7/20/2016 Park 71 42 70 NaN 0.50 113 56.50 Beach
16 7/16/2016 Beach 81 50 74 90.0 0.50 131 65.50 Beach
15 7/15/2016 Beach 98 62 75 108.0 0.50 160 80.00 Beach
17 7/17/2016 Beach 115 76 77 126.0 0.50 191 95.50 Beach
21 7/21/2016 Park 83 50 77 90.0 0.50 133 66.50 Beach
19 7/19/2016 Park 122 85 78 113.0 0.50 207 103.50 Beach
22 7/22/2016 Park 112 75 80 108.0 0.50 187 93.50 Beach
18 7/18/2016 Park 131 92 81 122.0 0.50 223 111.50 Beach
23 7/23/2016 Park 120 82 81 117.0 0.50 202 101.00 Beach
24 7/24/2016 Park 121 82 82 117.0 0.50 203 101.50 Beach
25 7/25/2016 Park 156 113 84 135.0 0.50 269 134.50 Beach
27 7/27/2016 Park 104 68 80 99.0 0.35 172 60.20 Beach
29 7/29/2016 Park 100 66 81 95.0 0.35 166 58.10 Beach
28 7/28/2016 Park 96 63 82 90.0 0.35 159 55.65 Beach
30 7/30/2016 Beach 88 57 82 81.0 0.35 145 50.75 Beach
31 7/31/2016 Beach 76 47 82 68.0 0.35 123 43.05 Beach
26 7/26/2016 Park 176 129 83 158.0 0.35 305 106.75 Beach
0 7/1/2016 Park 97 67 70 90.0 0.25 164 41.00 Beach
2 7/3/2016 Park 110 77 71 104.0 0.25 187 46.75 Beach
1 7/2/2016 Park 98 67 72 90.0 0.25 165 41.25 Beach
3 7/4/2016 Beach 134 99 76 98.0 0.25 233 58.25 Beach
13 7/13/2016 Beach 109 75 77 99.0 0.25 184 46.00 Beach
4 7/5/2016 Beach 159 118 78 135.0 0.25 277 69.25 Beach
14 7/14/2016 Beach 122 85 78 113.0 0.25 207 51.75 Beach
9 7/9/2016 Beach 134 95 80 126.0 0.25 229 57.25 Beach
7 7/7/2016 Beach 143 101 81 135.0 0.25 244 61.00 Beach
5 7/6/2016 Beach 103 69 82 90.0 0.25 172 43.00 Beach
6 7/6/2016 Beach 103 69 82 90.0 0.25 172 43.00 Beach
8 NaN Beach 123 86 82 113.0 0.25 209 52.25 Beach
10 7/10/2016 Beach 140 98 82 131.0 0.25 238 59.50 Beach
11 7/11/2016 Beach 162 120 83 135.0 0.25 282 70.50 Beach
12 7/12/2016 Beach 130 95 84 99.0 0.25 225 56.25 Beach

  <script>
    const buttonEl =
      document.querySelector('#df-50367d11-88b7-4011-923b-a3d85935e6a7 button.colab-df-convert');
    buttonEl.style.display =
      google.colab.kernel.accessAllowed ? 'block' : 'none';

    async function convertToInteractive(key) {
      const element = document.querySelector('#df-50367d11-88b7-4011-923b-a3d85935e6a7');
      const dataTable =
        await google.colab.kernel.invokeFunction('convertToInteractive',
                                                 [key], {});
      if (!dataTable) return;

      const docLinkHtml = 'Like what you see? Visit the ' +
        '<a target="_blank" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'
        + ' to learn more about interactive tables.';
      element.innerHTML = '';
      dataTable['output_type'] = 'display_data';
      await google.colab.output.renderOutput(dataTable, element);
      const docLink = document.createElement('div');
      docLink.innerHTML = docLinkHtml;
      element.appendChild(docLink);
    }
  </script>
</div>
1
2
3
# 정보를 업데이트 및 정렬을 할떄 reset_index 사용
juice2 = juice.sort_values(by = ['Price', 'Temperature'], ascending=[False, True]).reset_index(drop=True)
juice2

Date Location Lemon Orange Temperature Leaflets Price Sold Revenue location
0 7/20/2016 Park 71 42 70 NaN 0.50 113 56.50 Beach
1 7/16/2016 Beach 81 50 74 90.0 0.50 131 65.50 Beach
2 7/15/2016 Beach 98 62 75 108.0 0.50 160 80.00 Beach
3 7/17/2016 Beach 115 76 77 126.0 0.50 191 95.50 Beach
4 7/21/2016 Park 83 50 77 90.0 0.50 133 66.50 Beach
5 7/19/2016 Park 122 85 78 113.0 0.50 207 103.50 Beach
6 7/22/2016 Park 112 75 80 108.0 0.50 187 93.50 Beach
7 7/18/2016 Park 131 92 81 122.0 0.50 223 111.50 Beach
8 7/23/2016 Park 120 82 81 117.0 0.50 202 101.00 Beach
9 7/24/2016 Park 121 82 82 117.0 0.50 203 101.50 Beach
10 7/25/2016 Park 156 113 84 135.0 0.50 269 134.50 Beach
11 7/27/2016 Park 104 68 80 99.0 0.35 172 60.20 Beach
12 7/29/2016 Park 100 66 81 95.0 0.35 166 58.10 Beach
13 7/28/2016 Park 96 63 82 90.0 0.35 159 55.65 Beach
14 7/30/2016 Beach 88 57 82 81.0 0.35 145 50.75 Beach
15 7/31/2016 Beach 76 47 82 68.0 0.35 123 43.05 Beach
16 7/26/2016 Park 176 129 83 158.0 0.35 305 106.75 Beach
17 7/1/2016 Park 97 67 70 90.0 0.25 164 41.00 Beach
18 7/3/2016 Park 110 77 71 104.0 0.25 187 46.75 Beach
19 7/2/2016 Park 98 67 72 90.0 0.25 165 41.25 Beach
20 7/4/2016 Beach 134 99 76 98.0 0.25 233 58.25 Beach
21 7/13/2016 Beach 109 75 77 99.0 0.25 184 46.00 Beach
22 7/5/2016 Beach 159 118 78 135.0 0.25 277 69.25 Beach
23 7/14/2016 Beach 122 85 78 113.0 0.25 207 51.75 Beach
24 7/9/2016 Beach 134 95 80 126.0 0.25 229 57.25 Beach
25 7/7/2016 Beach 143 101 81 135.0 0.25 244 61.00 Beach
26 7/6/2016 Beach 103 69 82 90.0 0.25 172 43.00 Beach
27 7/6/2016 Beach 103 69 82 90.0 0.25 172 43.00 Beach
28 NaN Beach 123 86 82 113.0 0.25 209 52.25 Beach
29 7/10/2016 Beach 140 98 82 131.0 0.25 238 59.50 Beach
30 7/11/2016 Beach 162 120 83 135.0 0.25 282 70.50 Beach
31 7/12/2016 Beach 130 95 84 99.0 0.25 225 56.25 Beach

  <script>
    const buttonEl =
      document.querySelector('#df-5bba3bc3-d034-4d0b-894c-29ef5ffadfa0 button.colab-df-convert');
    buttonEl.style.display =
      google.colab.kernel.accessAllowed ? 'block' : 'none';

    async function convertToInteractive(key) {
      const element = document.querySelector('#df-5bba3bc3-d034-4d0b-894c-29ef5ffadfa0');
      const dataTable =
        await google.colab.kernel.invokeFunction('convertToInteractive',
                                                 [key], {});
      if (!dataTable) return;

      const docLinkHtml = 'Like what you see? Visit the ' +
        '<a target="_blank" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'
        + ' to learn more about interactive tables.';
      element.innerHTML = '';
      dataTable['output_type'] = 'display_data';
      await google.colab.output.renderOutput(dataTable, element);
      const docLink = document.createElement('div');
      docLink.innerHTML = docLinkHtml;
      element.appendChild(docLink);
    }
  </script>
</div>

Groupby ()

  • 데이터 요약(피벗테이블)
  • R dplyr groupby() %>% summarize()
1
juice.groupby(by = 'Location').count()

Date Lemon Orange Temperature Leaflets Price Sold Revenue location
Location
Beach 16 17 17 17 17 17 17 17 17
Park 15 15 15 15 14 15 15 15 15

  <script>
    const buttonEl =
      document.querySelector('#df-ff4cd12f-13fc-47fd-8d42-3adcf0dba627 button.colab-df-convert');
    buttonEl.style.display =
      google.colab.kernel.accessAllowed ? 'block' : 'none';

    async function convertToInteractive(key) {
      const element = document.querySelector('#df-ff4cd12f-13fc-47fd-8d42-3adcf0dba627');
      const dataTable =
        await google.colab.kernel.invokeFunction('convertToInteractive',
                                                 [key], {});
      if (!dataTable) return;

      const docLinkHtml = 'Like what you see? Visit the ' +
        '<a target="_blank" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'
        + ' to learn more about interactive tables.';
      element.innerHTML = '';
      dataTable['output_type'] = 'display_data';
      await google.colab.output.renderOutput(dataTable, element);
      const docLink = document.createElement('div');
      docLink.innerHTML = docLinkHtml;
      element.appendChild(docLink);
    }
  </script>
</div>
1
2
3
import numpy as np

juice.groupby(['Location'])['Revenue'].agg([max, min, sum, np.mean])

max min sum mean
Location
Beach 95.5 43.0 1002.8 58.988235
Park 134.5 41.0 1178.2 78.546667

  <script>
    const buttonEl =
      document.querySelector('#df-615cb7b2-8595-47fc-afd8-67653b7f8d14 button.colab-df-convert');
    buttonEl.style.display =
      google.colab.kernel.accessAllowed ? 'block' : 'none';

    async function convertToInteractive(key) {
      const element = document.querySelector('#df-615cb7b2-8595-47fc-afd8-67653b7f8d14');
      const dataTable =
        await google.colab.kernel.invokeFunction('convertToInteractive',
                                                 [key], {});
      if (!dataTable) return;

      const docLinkHtml = 'Like what you see? Visit the ' +
        '<a target="_blank" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'
        + ' to learn more about interactive tables.';
      element.innerHTML = '';
      dataTable['output_type'] = 'display_data';
      await google.colab.output.renderOutput(dataTable, element);
      const docLink = document.createElement('div');
      docLink.innerHTML = docLinkHtml;
      element.appendChild(docLink);
    }
  </script>
</div>