import pandas as pd
import numpy as np


# If index and columns are not specified when creating a DataFrame, both default to integers (0, 1, 2, ...).
df = pd.DataFrame(np.random.randn(10, 3))  # built from a 10x3 NumPy array of random values
df.columns = ["one", "two", "three"]
df


# A simple way to list the column names when there are very many of them
df_col_check = pd.DataFrame(df.columns)
df_col_check
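
A few other ways to scan column names may also be handy on a wide frame. This is a minimal sketch; the frame `wide` and its `col_N` names are hypothetical and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical wide frame with 50 columns, used only for illustration.
wide = pd.DataFrame(np.zeros((2, 50)), columns=[f"col_{i}" for i in range(50)])

# Plain Python list of the column names.
names = wide.columns.tolist()

# Print each name next to its integer position (useful later with iloc).
for pos, name in enumerate(names[:5]):
    print(pos, name)

# A Series of names indexed by position, displayed vertically in Jupyter.
col_check = pd.Series(names, name="column")
```

The Series form keeps the position next to each name, which helps when picking columns by position.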


# Pressing Shift + Tab + Tab in Jupyter shows the API documentation,
# but the popup sometimes fails to appear, so print the docstring when needed.

print(df.loc.__doc__)

Access a group of rows and columns by label(s) or a boolean array.

``.loc[]`` is primarily label based, but may also be used with a
boolean array.

Allowed inputs are:

- A single label, e.g. ``5`` or ``'a'``, (note that ``5`` is
  interpreted as a *label* of the index, and **never** as an
  integer position along the index).
- A list or array of labels, e.g. ``['a', 'b', 'c']``.
- A slice object with labels, e.g. ``'a':'f'``.

  .. warning:: Note that contrary to usual python slices, **both** the
      start and the stop are included

- A boolean array of the same length as the axis being sliced,
  e.g. ``[True, False, True]``.
- An alignable boolean Series. The index of the key will be aligned before
  masking.
- An alignable Index. The Index of the returned selection will be the input.
- A ``callable`` function with one argument (the calling Series or
  DataFrame) and that returns valid output for indexing (one of the above)

See more at :ref:`Selection by Label <indexing.label>`.

Raises
------
KeyError
    If any items are not found.
IndexingError
    If an indexed key is passed and its index is unalignable to the frame index.

See Also
--------
DataFrame.at : Access a single value for a row/column label pair.
DataFrame.iloc : Access group of rows and columns by integer position(s).
DataFrame.xs : Returns a cross-section (row(s) or column(s)) from the
    Series/DataFrame.
Series.loc : Access group of values using labels.

Examples
--------
**Getting values**

>>> df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
...      index=['cobra', 'viper', 'sidewinder'],
...      columns=['max_speed', 'shield'])
>>> df
            max_speed  shield
cobra               1       2
viper               4       5
sidewinder          7       8

Single label. Note this returns the row as a Series.

>>> df.loc['viper']
max_speed    4
shield       5
Name: viper, dtype: int64

List of labels. Note using ``[[]]`` returns a DataFrame.

>>> df.loc[['viper', 'sidewinder']]
            max_speed  shield
viper               4       5
sidewinder          7       8

Single label for row and column

>>> df.loc['cobra', 'shield']
2

Slice with labels for row and single label for column. As mentioned
above, note that both the start and stop of the slice are included.

>>> df.loc['cobra':'viper', 'max_speed']
cobra    1
viper    4
Name: max_speed, dtype: int64

Boolean list with the same length as the row axis

>>> df.loc[[False, False, True]]
            max_speed  shield
sidewinder          7       8

Alignable boolean Series:

>>> df.loc[pd.Series([False, True, False],
...        index=['viper', 'sidewinder', 'cobra'])]
            max_speed  shield
sidewinder          7       8

Index (same behavior as ``df.reindex``)

>>> df.loc[pd.Index(["cobra", "viper"], name="foo")]
       max_speed  shield
foo
cobra          1       2
viper          4       5

Conditional that returns a boolean Series

>>> df.loc[df['shield'] > 6]
            max_speed  shield
sidewinder          7       8

Conditional that returns a boolean Series with column labels specified

>>> df.loc[df['shield'] > 6, ['max_speed']]
            max_speed
sidewinder          7

Callable that returns a boolean Series

>>> df.loc[lambda df: df['shield'] == 8]
            max_speed  shield
sidewinder          7       8

**Setting values**

Set value for all items matching the list of labels

>>> df.loc[['viper', 'sidewinder'], ['shield']] = 50
>>> df
            max_speed  shield
cobra               1       2
viper               4      50
sidewinder          7      50

Set value for an entire row

>>> df.loc['cobra'] = 10
>>> df
            max_speed  shield
cobra              10      10
viper               4      50
sidewinder          7      50

Set value for an entire column

>>> df.loc[:, 'max_speed'] = 30
>>> df
            max_speed  shield
cobra              30      10
viper              30      50
sidewinder         30      50

Set value for rows matching callable condition

>>> df.loc[df['shield'] > 35] = 0
>>> df
            max_speed  shield
cobra              30      10
viper               0       0
sidewinder          0       0

**Getting values on a DataFrame with an index that has integer labels**

Another example using integers for the index

>>> df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
...      index=[7, 8, 9], columns=['max_speed', 'shield'])
>>> df
   max_speed  shield
7          1       2
8          4       5
9          7       8

Slice with integer labels for rows. As mentioned above, note that both
the start and stop of the slice are included.

>>> df.loc[7:9]
   max_speed  shield
7          1       2
8          4       5
9          7       8

**Getting values with a MultiIndex**

A number of examples using a DataFrame with a MultiIndex

>>> tuples = [
...    ('cobra', 'mark i'), ('cobra', 'mark ii'),
...    ('sidewinder', 'mark i'), ('sidewinder', 'mark ii'),
...    ('viper', 'mark ii'), ('viper', 'mark iii')
... ]
>>> index = pd.MultiIndex.from_tuples(tuples)
>>> values = [[12, 2], [0, 4], [10, 20],
...         [1, 4], [7, 1], [16, 36]]
>>> df = pd.DataFrame(values, columns=['max_speed', 'shield'], index=index)
>>> df
                     max_speed  shield
cobra      mark i           12       2
           mark ii           0       4
sidewinder mark i           10      20
           mark ii           1       4
viper      mark ii           7       1
           mark iii         16      36

Single label. Note this returns a DataFrame with a single index.

>>> df.loc['cobra']
         max_speed  shield
mark i          12       2
mark ii          0       4

Single index tuple. Note this returns a Series.

>>> df.loc[('cobra', 'mark ii')]
max_speed    0
shield       4
Name: (cobra, mark ii), dtype: int64

Single label for row and column. Similar to passing in a tuple, this
returns a Series.

>>> df.loc['cobra', 'mark i']
max_speed    12
shield        2
Name: (cobra, mark i), dtype: int64

Single tuple. Note using ``[[]]`` returns a DataFrame.

>>> df.loc[[('cobra', 'mark ii')]]
               max_speed  shield
cobra mark ii          0       4

Single tuple for the index with a single label for the column

>>> df.loc[('cobra', 'mark i'), 'shield']
2

Slice from index tuple to single label

>>> df.loc[('cobra', 'mark i'):'viper']
                     max_speed  shield
cobra      mark i           12       2
           mark ii           0       4
sidewinder mark i           10      20
           mark ii           1       4
viper      mark ii           7       1
           mark iii         16      36

Slice from index tuple to index tuple

>>> df.loc[('cobra', 'mark i'):('viper', 'mark ii')]
                    max_speed  shield
cobra      mark i          12       2
           mark ii          0       4
sidewinder mark i          10      20
           mark ii          1       4
viper      mark ii          7       1

Please see the :ref:`user guide<advanced.advanced_hierarchical>`
for more details and explanations of advanced indexing.
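
The docstring printed above can also be fetched with the standard-library `inspect` module, which strips the leading indentation. This is only a sketch of one alternative; the small frame `frame` is a hypothetical stand-in for `df`.

```python
import inspect

import pandas as pd

# Hypothetical frame standing in for the notebook's df.
frame = pd.DataFrame({"one": [1.0, 2.0]})

# inspect.getdoc returns the docstring with common indentation removed.
doc = inspect.getdoc(frame.loc)

# Show only the summary line instead of the whole text.
print(doc.splitlines()[0])
```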


df.loc[1]

one     -0.896467
two      0.323917
three   -0.015165
Name: 1, dtype: float64


#df.loc[ df.index == 9 , :] # all columns of the row labeled 9; `:` selects everything along that axis
df.loc[ 0:1, :]


df.loc[0:3, ['one', 'two']]


print(df.iloc.__doc__)

Purely integer-location based indexing for selection by position.

``.iloc[]`` is primarily integer position based (from ``0`` to
``length-1`` of the axis), but may also be used with a boolean
array.

Allowed inputs are:

- An integer, e.g. ``5``.
- A list or array of integers, e.g. ``[4, 3, 0]``.
- A slice object with ints, e.g. ``1:7``.
- A boolean array.
- A ``callable`` function with one argument (the calling Series or
  DataFrame) and that returns valid output for indexing (one of the above).
  This is useful in method chains, when you don't have a reference to the
  calling object, but would like to base your selection on some value.
- A tuple of row and column indexes. The tuple elements consist of one of the
  above inputs, e.g. ``(0, 1)``.

``.iloc`` will raise ``IndexError`` if a requested indexer is
out-of-bounds, except *slice* indexers which allow out-of-bounds
indexing (this conforms with python/numpy *slice* semantics).

See more at :ref:`Selection by Position <indexing.integer>`.

See Also
--------
DataFrame.iat : Fast integer location scalar accessor.
DataFrame.loc : Purely label-location based indexer for selection by label.
Series.iloc : Purely integer-location based indexing for
               selection by position.

Examples
--------
>>> mydict = [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
...           {'a': 100, 'b': 200, 'c': 300, 'd': 400},
...           {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000 }]
>>> df = pd.DataFrame(mydict)
>>> df
      a     b     c     d
0     1     2     3     4
1   100   200   300   400
2  1000  2000  3000  4000

**Indexing just the rows**

With a scalar integer.

>>> type(df.iloc[0])
<class 'pandas.core.series.Series'>
>>> df.iloc[0]
a    1
b    2
c    3
d    4
Name: 0, dtype: int64

With a list of integers.

>>> df.iloc[[0]]
   a  b  c  d
0  1  2  3  4
>>> type(df.iloc[[0]])
<class 'pandas.core.frame.DataFrame'>

>>> df.iloc[[0, 1]]
     a    b    c    d
0    1    2    3    4
1  100  200  300  400

With a `slice` object.

>>> df.iloc[:3]
      a     b     c     d
0     1     2     3     4
1   100   200   300   400
2  1000  2000  3000  4000

With a boolean mask the same length as the index.

>>> df.iloc[[True, False, True]]
      a     b     c     d
0     1     2     3     4
2  1000  2000  3000  4000

With a callable, useful in method chains. The `x` passed
to the ``lambda`` is the DataFrame being sliced. This selects
the rows whose index label even.

>>> df.iloc[lambda x: x.index % 2 == 0]
      a     b     c     d
0     1     2     3     4
2  1000  2000  3000  4000

**Indexing both axes**

You can mix the indexer types for the index and columns. Use ``:`` to
select the entire axis.

With scalar integers.

>>> df.iloc[0, 1]
2

With lists of integers.

>>> df.iloc[[0, 2], [1, 3]]
      b     d
0     2     4
2  2000  4000

With `slice` objects.

>>> df.iloc[1:3, 0:3]
      a     b     c
1   100   200   300
2  1000  2000  3000

With a boolean array whose length matches the columns.

>>> df.iloc[:, [True, False, True, False]]
      a     c
0     1     3
1   100   300
2  1000  3000

With a callable function that expects the Series or DataFrame.

>>> df.iloc[:, lambda df: [0, 2]]
      a     c
0     1     3
1   100   300
2  1000  3000


df.iloc[2]

one      1.093343
two     -2.020293
three    0.099026
Name: 2, dtype: float64


df.iloc[1, 2] # value at row position 1, column position 2

-0.015164940108393593


#df.iloc[:, 0:3]
df.iloc[:3, :2] # rows 0-2 and columns 0-1 (the stop is excluded); same as df.iloc[0:3, 0:2]
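
The main practical difference between the two indexers is how slices end: `loc` slices by label and includes the stop, while `iloc` slices by position and excludes it. The sketch below uses a hypothetical frame `demo` whose index labels differ from its positions so the two are easy to tell apart.

```python
import pandas as pd

# Hypothetical frame: index labels 10..40, positions 0..3.
demo = pd.DataFrame(
    {"one": [1.0, 2.0, 3.0, 4.0], "two": [5.0, 6.0, 7.0, 8.0]},
    index=[10, 20, 30, 40],
)

by_label = demo.loc[10:30, "one"]   # labels 10, 20 and 30 (stop included)
by_position = demo.iloc[0:3, 0]     # positions 0, 1, 2 (stop excluded)

# With the default integer index the same slice text gives different row counts.
default = demo.reset_index(drop=True)
n_loc = len(default.loc[0:1])       # rows labeled 0 and 1
n_iloc = len(default.iloc[0:1])     # only the row at position 0

print(by_label.tolist(), by_position.tolist(), n_loc, n_iloc)
```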


# Extract rows using a condition on a column
df["two"].values > 0.3
#df[df["two"].values > 0.3]
#df["two"].isin([-0.643932, 0.417865])

array([False,  True, False, False, False,  True,  True, False, False,
       False])
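
A boolean array like the one above is typically passed back into the frame to keep only the matching rows. This is a minimal sketch with a hypothetical frame `scores`; the column names follow the notebook, but the values are made up.

```python
import pandas as pd

# Hypothetical values, chosen so the results are easy to verify by eye.
scores = pd.DataFrame({"one": [0.5, -1.2, 0.9, 0.1],
                       "two": [0.4, 0.2, -0.7, 1.5]})

mask = scores["two"] > 0.3                  # boolean Series, one entry per row
picked = scores[mask]                       # rows where the mask is True
picked_cols = scores.loc[mask, ["one"]]     # same rows, but only column "one"

# Combine conditions with & / | and wrap each condition in parentheses.
combined = scores[(scores["two"] > 0.3) & (scores["one"] > 0)]

# isin keeps rows whose value matches any item in the list.
members = scores[scores["two"].isin([0.2, 1.5])]
```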


df["four"] = [np.nan, np.nan, np.nan, 4, 5, 6, 7, 8, 9, 10]
df


df.dropna() # drop every row that contains at least one missing value


df.fillna(value=-1) # replace NaN with the given value


df.isnull()


df.isnull()["four"]

0     True
1     True
2     True
3    False
4    False
5    False
6    False
7    False
8    False
9    False
Name: four, dtype: bool


df.loc[df.isnull()["four"],:]
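
`dropna` and `fillna` return new frames and leave the original untouched unless the result is assigned back. The sketch below, on a hypothetical frame `data`, also shows two common variations: restricting `dropna` to one column and filling with the column mean.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with missing values only in column "four".
data = pd.DataFrame({"three": [1.0, 2.0, 3.0, 4.0],
                     "four": [np.nan, np.nan, 4.0, 6.0]})

n_missing = data["four"].isna().sum()        # count of NaN in one column

# Drop rows only when "four" is missing; other columns are not checked.
kept = data.dropna(subset=["four"])

# Fill per column with a dict; mean() skips NaN, so this is (4 + 6) / 2.
filled = data.fillna({"four": data["four"].mean()})

# data itself still holds its NaN values because nothing was assigned back.
still_missing = data["four"].isna().sum()
```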


df["four"].unique() # array of the distinct values in a column, in order of first appearance

array([nan,  4.,  5.,  6.,  7.,  8.,  9., 10.])


np.unique(df["four"])

array([ 4.,  5.,  6.,  7.,  8.,  9., 10., nan])


len(np.unique(df["four"]))

8
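
Both results above count NaN as one of the values. When only the number of distinct values matters, `Series.nunique` may be more convenient because it can include or exclude NaN explicitly. A minimal sketch on a hypothetical Series `s`:

```python
import numpy as np
import pandas as pd

# Hypothetical Series with repeated values and two NaN entries.
s = pd.Series([np.nan, np.nan, 4.0, 5.0, 4.0])

n_without_nan = s.nunique()               # NaN is excluded by default
n_with_nan = s.nunique(dropna=False)      # NaN counted as one distinct value
values = s.dropna().unique()              # distinct non-missing values

# Frequency of each value, NaN included.
counts = s.value_counts(dropna=False)
```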


mydict = [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
          {'a': 100, 'b': 200, 'c': 300, 'd': 400},
          {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000 }]
df_dict = pd.DataFrame(mydict)
df_dict


df_dict = df_dict.drop(0, axis=0)
df_dict


df_dict = df_dict.drop("b", axis=1)
df_dict
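
The two `drop` calls above can also be written with the `index=` and `columns=` keywords, which avoids having to remember what `axis=0` and `axis=1` mean. This sketch rebuilds a smaller hypothetical frame `frame` rather than reusing `df_dict`.

```python
import pandas as pd

# Hypothetical records mirroring the shape of mydict above.
records = [{"a": 1, "b": 2, "c": 3},
           {"a": 100, "b": 200, "c": 300},
           {"a": 1000, "b": 2000, "c": 3000}]
frame = pd.DataFrame(records)

# Drop row label 0 and column "b" in a single call.
trimmed = frame.drop(index=[0], columns=["b"])

# errors="ignore" skips labels that do not exist instead of raising KeyError.
safe = frame.drop(columns=["missing"], errors="ignore")

# drop returns a new frame; frame keeps all three rows and columns.
print(trimmed)
```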

Output of `df` after creation (values are random):
	one	two	three
0	-1.141389	-1.597709	-0.593118
1	-0.896467	0.323917	-0.015165
2	1.093343	-2.020293	0.099026
3	1.971081	-0.379402	0.751828
4	-0.569713	-0.061500	-0.489205
5	-0.502000	0.545283	0.037430
6	1.407763	2.647069	0.217019
7	-1.627089	-0.195426	0.157814
8	-1.544826	-1.033046	-0.704031
9	-0.919720	-0.968175	-1.126654

Output of `df` after adding column "four":
	one	two	three	four
0	-1.141389	-1.597709	-0.593118	NaN
1	-0.896467	0.323917	-0.015165	NaN
2	1.093343	-2.020293	0.099026	NaN
3	1.971081	-0.379402	0.751828	4.0
4	-0.569713	-0.061500	-0.489205	5.0
5	-0.502000	0.545283	0.037430	6.0
6	1.407763	2.647069	0.217019	7.0
7	-1.627089	-0.195426	0.157814	8.0
8	-1.544826	-1.033046	-0.704031	9.0
9	-0.919720	-0.968175	-1.126654	10.0

Output of `df.dropna()`:
	one	two	three	four
3	1.971081	-0.379402	0.751828	4.0
4	-0.569713	-0.061500	-0.489205	5.0
5	-0.502000	0.545283	0.037430	6.0
6	1.407763	2.647069	0.217019	7.0
7	-1.627089	-0.195426	0.157814	8.0
8	-1.544826	-1.033046	-0.704031	9.0
9	-0.919720	-0.968175	-1.126654	10.0

Output of `df.fillna(value=-1)`:
	one	two	three	four
0	-1.141389	-1.597709	-0.593118	-1.0
1	-0.896467	0.323917	-0.015165	-1.0
2	1.093343	-2.020293	0.099026	-1.0
3	1.971081	-0.379402	0.751828	4.0
4	-0.569713	-0.061500	-0.489205	5.0
5	-0.502000	0.545283	0.037430	6.0
6	1.407763	2.647069	0.217019	7.0
7	-1.627089	-0.195426	0.157814	8.0
8	-1.544826	-1.033046	-0.704031	9.0
9	-0.919720	-0.968175	-1.126654	10.0


IT GOGO


Python Pandas / loc, iloc, NaN, unique, drop

Python Pandas DataFrame: Working with Data, Part 2

Comparing loc and iloc, NaN (missing values), unique, drop

BASIC_03.Pandas_DataFrame.ipynb

loc vs iloc

Handling NaN (missing values)

unique

drop


Output of `df.isnull()`:
	one	two	three	four
0	False	False	False	True
1	False	False	False	True
2	False	False	False	True
3	False	False	False	False
4	False	False	False	False
5	False	False	False	False
6	False	False	False	False
7	False	False	False	False
8	False	False	False	False
9	False	False	False	False