[numpy] Loading data with NumPy

HAN_PY 2021. 1. 8. 15:46

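There are many ways to load data. In this post we will load a data file with numpy.

Downloading the data

MovieLens | GroupLens

If you want to follow along, download the movie ratings data from the site above. (Beginners are better off with one of the smaller datasets.)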

numpy.loadtxt

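loadtxt has a long list of parameters, but most of them are rarely needed. To make this quick to follow, let's start with an example.

import numpy as np
data = np.loadtxt("C:/Users/hanpy/OneDrive/datasets/movielens/ml-1m/ratings.dat", delimiter="::", dtype=np.int64)

Of the downloaded files, this loads ratings.dat. The raw text is turned into an array as follows.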

# Raw data (ratings.dat)
# UserID::MovieID::Rating::Timestamp
1::1193::5::978300760
1::661::3::978302109
1::914::3::978301968
1::3408::4::978300275
1::2355::5::978824291
...


# Contents of data (first 5 rows only)
> data[:5,:]
array([[        1,      1193,         5, 978300760],
       [        1,       661,         3, 978302109],
       [        1,       914,         3, 978301968],
       [        1,      3408,         4, 978300275],
       [        1,      2355,         5, 978824291]], dtype=int64)

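As the example above shows, delimiter is the separator between values: each line is split on :: and the pieces become one row of the array.
dtype sets the data type of the resulting array.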
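One caveat worth knowing: the NumPy documentation states that since version 1.23 loadtxt supports only single-character delimiters, so delimiter="::" may raise an error on a recent install. A minimal workaround sketch, assuming it is acceptable to rewrite each line before parsing, is shown below. The helper name load_double_colon is hypothetical, and the sample rows stand in for the real file.

```python
import numpy as np

# Sample rows copied from ratings.dat; in practice, pass an opened file object instead.
sample_lines = [
    "1::1193::5::978300760",
    "1::661::3::978302109",
    "1::914::3::978301968",
]

def load_double_colon(lines, dtype=np.int64):
    """Hypothetical helper: turn '::' into ',' so loadtxt sees a one-character delimiter."""
    cleaned = (line.replace("::", ",") for line in lines)
    # loadtxt accepts a generator of strings as its first argument.
    return np.loadtxt(cleaned, delimiter=",", dtype=dtype)

data = load_double_colon(sample_lines)
print(data.shape)  # (3, 4)
```

For the real file, something like `with open(path) as f: data = load_double_colon(f)` should behave the same way, since iterating over a file yields its lines.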

As an extra, here is a summary of what each data file contains. It should be handy as a reference.

USERS FILE DESCRIPTION("users.dat")

# example (6,040 rows)
1::F::1::10::48067
2::M::56::16::70072
3::M::25::15::55117
4::M::45::7::02460
5::M::25::20::55455

UserID::Gender::Age::Occupation::Zip-code

Gender is denoted by a "M" for male and "F" for female
Age is chosen from the following ranges:
- 1: "Under 18"
- 18: "18-24"
- 25: "25-34"
- 35: "35-44"
- 45: "45-49"
- 50: "50-55"
- 56: "56+"
Occupation is chosen from the following choices:
- 0: "other" or not specified
- 1: "academic/educator"
- 2: "artist"
- 3: "clerical/admin"
- 4: "college/grad student"
- 5: "customer service"
- 6: "doctor/health care"
- 7: "executive/managerial"
- 8: "farmer"
- 9: "homemaker"
- 10: "K-12 student"
- 11: "lawyer"
- 12: "programmer"
- 13: "retired"
- 14: "sales/marketing"
- 15: "scientist"
- 16: "self-employed"
- 17: "technician/engineer"
- 18: "tradesman/craftsman"
- 19: "unemployed"
- 20: "writer"
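Because users.dat mixes numbers with text (Gender, Zip-code), dtype=np.int64 would not work for it. The following is a rough sketch of one way to read it with a structured dtype. The field names and the AGE_LABELS dictionary are illustrative choices, not part of the dataset, and the "::" to "," rewrite is an assumption that holds only while no field contains a comma.

```python
import numpy as np

# Sample rows copied from users.dat.
sample_lines = [
    "1::F::1::10::48067",
    "2::M::56::16::70072",
    "3::M::25::15::55117",
    "4::M::45::7::02460",
    "5::M::25::20::55455",
]

# Illustrative field names; zip codes stay text so "02460" keeps its leading zero.
user_dtype = np.dtype([
    ("user_id", np.int64),
    ("gender", "U1"),
    ("age", np.int64),
    ("occupation", np.int64),
    ("zip_code", "U10"),
])

# Age codes from the description above.
AGE_LABELS = {1: "Under 18", 18: "18-24", 25: "25-34", 35: "35-44",
              45: "45-49", 50: "50-55", 56: "56+"}

cleaned = (line.replace("::", ",") for line in sample_lines)
users = np.loadtxt(cleaned, delimiter=",", dtype=user_dtype)

print(users["zip_code"][3])              # 02460
print(AGE_LABELS[int(users["age"][1])])  # 56+
```

Columns are then available by name, for example `users[users["gender"] == "F"]` to select the female users.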

MOVIES FILE DESCRIPTION("movies.dat")

# example (3,884 rows)
1::Toy Story (1995)::Animation|Children's|Comedy
2::Jumanji (1995)::Adventure|Children's|Fantasy
3::Grumpier Old Men (1995)::Comedy|Romance
4::Waiting to Exhale (1995)::Comedy|Drama
5::Father of the Bride Part II (1995)::Comedy

MovieID::Title::Genres

Titles include the release year.
Genres are separated by |, so one movie can list several of the following:
- Action
- Adventure
- Animation
- Children's
- Comedy
- Crime
- Documentary
- Drama
- Fantasy
- Film-Noir
- Horror
- Musical
- Mystery
- Romance
- Sci-Fi
- Thriller
- War
- Western
Some entries are missing, so MovieIDs and titles do not line up exactly. Treat this file as only roughly accurate.
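Since movies.dat is mostly text, plain string handling may be simpler here than loadtxt. Here is a small sketch of splitting one line into ID, title, year, and genre list. The function name parse_movie and the regular expression are assumptions made for illustration, and titles that do not end in "(YYYY)" simply keep their raw text.

```python
import re

# Sample rows copied from movies.dat.
sample_lines = [
    "1::Toy Story (1995)::Animation|Children's|Comedy",
    "2::Jumanji (1995)::Adventure|Children's|Fantasy",
    "3::Grumpier Old Men (1995)::Comedy|Romance",
]

# Matches "Some Title (1995)" and captures the title and the 4-digit year.
TITLE_YEAR = re.compile(r"^(?P<title>.*) \((?P<year>\d{4})\)$")

def parse_movie(line):
    """Hypothetical helper: split one movies.dat line into its parts."""
    movie_id, raw_title, genres = line.rstrip("\n").split("::")
    match = TITLE_YEAR.match(raw_title)
    title = match.group("title") if match else raw_title
    year = int(match.group("year")) if match else None
    return int(movie_id), title, year, genres.split("|")

movies = [parse_movie(line) for line in sample_lines]
print(movies[0])  # (1, 'Toy Story', 1995, ['Animation', "Children's", 'Comedy'])
```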

RATINGS FILE DESCRIPTION("ratings.dat")

# example (1,000,209 rows)
1::1193::5::978300760
1::661::3::978302109
1::914::3::978301968
...
6040::1096::4::956715648
6040::1097::4::956715569

UserID::MovieID::Rating::Timestamp

UserIDs range between 1 and 6040
MovieIDs range between 1 and 3952
Ratings are made on a 5-star scale (whole-star ratings only)
Timestamp is represented in seconds since the epoch as returned by time(2)
Each user has at least 20 ratings.
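Once the ratings are in an integer array, the columns described above can be used directly. The sketch below, built on a few sample rows rather than the full file, converts the Timestamp column to dates and counts ratings per user. The variable names are illustrative.

```python
import numpy as np

# Sample rows copied from ratings.dat: UserID, MovieID, Rating, Timestamp.
ratings = np.array([
    [1, 1193, 5, 978300760],
    [1, 661, 3, 978302109],
    [1, 914, 3, 978301968],
    [6040, 1096, 4, 956715648],
    [6040, 1097, 4, 956715569],
], dtype=np.int64)

# Seconds since the epoch -> datetime64 values (UTC).
rated_at = ratings[:, 3].astype("datetime64[s]")
print(rated_at[0])  # 2000-12-31T22:12:40

# Number of ratings per user; on the full file, counts.min() >= 20
# would check the "at least 20 ratings per user" statement.
user_ids, counts = np.unique(ratings[:, 0], return_counts=True)
print(dict(zip(user_ids.tolist(), counts.tolist())))  # {1: 3, 6040: 2}
```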
