https://www.edwith.org/aipython/lecture/23091/
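This post was written with reference to the lecture linked above.
The collections module
Python provides its specialized container data structures in a standard-library module named collections.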
from collections import deque
from collections import Counter
from collections import OrderedDict
from collections import defaultdict
from collections import namedtuple
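namedtuple is imported above but not covered in the sections that follow. As a minimal sketch, assuming a hypothetical record type called Point (the name and its fields are made up for illustration), it creates a tuple subclass whose fields can also be read by name:

```python
from collections import namedtuple

# Point is a hypothetical record type used only for illustration.
Point = namedtuple("Point", ["x", "y"])

p = Point(x=3, y=4)
print(p.x, p.y)            # fields are readable by name
print(p[0], p[1])          # and by position, like a plain tuple
x, y = p                   # unpacking works as with any tuple

# Tuples are immutable, so _replace returns a new instance.
moved = p._replace(x=10)
print(moved)
print(p._asdict())         # mapping of field names to values
```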
deque
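- Supports both stack and queue operations.
- Appends and pops at either end are more efficient than with a list, which has to shift elements when items are inserted or removed at the front.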
>>> from collections import deque
>>>
>>> deque_list = deque()
>>> for i in range(5):
...     deque_list.append(i)
...
>>> print(deque_list)
deque([0, 1, 2, 3, 4])
>>>
>>> deque_list.appendleft(10)
>>> print(deque_list)
deque([10, 0, 1, 2, 3, 4])
>>>
>>> deque_list.rotate(2)
>>> print(deque_list)
deque([3, 4, 10, 0, 1, 2])
>>>
>>> deque_list.rotate(2)
>>> print(deque_list)
deque([1, 2, 3, 4, 10, 0])
>>>
>>> print(deque_list)
deque([1, 2, 3, 4, 10, 0])
>>> print(deque(reversed(deque_list)))
deque([0, 10, 4, 3, 2, 1])
>>>
>>> deque_list.extend([5, 6, 7])
>>> print(deque_list)
deque([1, 2, 3, 4, 10, 0, 5, 6, 7])
>>>
>>> deque_list.extendleft([5, 6, 7])
>>> print(deque_list)
deque([7, 6, 5, 1, 2, 3, 4, 10, 0, 5, 6, 7])
>>>
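The session above shows appending, rotating, and extending, but not removal, which is what makes a deque usable as a stack or a queue. The following is a small sketch of both patterns. The variable names are chosen only for illustration, and the maxlen example is an addition that does not come from the lecture.

```python
from collections import deque

# Stack (LIFO): push and pop at the right end.
stack = deque()
for item in (1, 2, 3):
    stack.append(item)
last_in = stack.pop()          # removes 3, the most recently added item

# Queue (FIFO): append on the right, remove from the left.
queue = deque()
for item in ("a", "b", "c"):
    queue.append(item)
first_in = queue.popleft()     # removes "a", the earliest added item

# A bounded deque drops items from the opposite end once it is full.
recent = deque(maxlen=3)
for item in range(5):
    recent.append(item)

print(last_in, list(stack))
print(first_in, list(queue))
print(list(recent))
```

With a list, the queue case would need list.pop(0), which takes time proportional to the length of the list. deque.popleft() runs in constant time.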
OrderedDict
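- Before Python 3.7, a plain dict did not guarantee that items are returned in the order they were inserted; from Python 3.7 on, it does.
- OrderedDict always keeps items in insertion order.
- Combined with sorted(), it can therefore hold items in key order or value order.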
>>> from collections import OrderedDict
>>>
>>> d = {}
>>> d['x'] = 100
>>> d['y'] = 200
>>> d['z'] = 300
>>> d['l'] = 500
>>>
>>> for k, v in d.items():
...     print(k, v)
...
x 100
y 200
z 300
l 500
>>>
>>> d = OrderedDict()
>>> d['x'] = 100
>>> d['y'] = 200
>>> d['z'] = 300
>>> d['l'] = 500
>>>
>>> for k, v in d.items():
...     print(k, v)
...
x 100
y 200
z 300
l 500
>>> for k, v in OrderedDict(sorted(d.items(), key=lambda t: t[0])).items():
...     print(k, v)
...
l 500
x 100
y 200
z 300
>>> for k, v in OrderedDict(sorted(d.items(), key=lambda t: t[1])).items():
...     print(k, v)
...
x 100
y 200
z 300
l 500
>>> for k, v in OrderedDict(sorted(d.items(),
...         reverse=True, key=lambda t: t[1])).items():
...     print(k, v)
...
l 500
z 300
y 200
x 100
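Because a plain dict and an OrderedDict print the same order in the session above, it may help to see what OrderedDict still offers on its own. This is a small sketch rather than part of the lecture: it shows reordering with move_to_end, popping from the front with popitem(last=False), and order-sensitive equality.

```python
from collections import OrderedDict

d = OrderedDict(x=100, y=200, z=300)

d.move_to_end("x")               # move an existing key to the end
print(list(d))                   # ['y', 'z', 'x']

d.move_to_end("z", last=False)   # or to the front
print(list(d))                   # ['z', 'y', 'x']

oldest = d.popitem(last=False)   # pop from the front (FIFO order)
print(oldest)                    # ('z', 300)

# Equality between two OrderedDicts is order-sensitive;
# plain dicts with the same items compare equal regardless of order.
a = OrderedDict([("x", 1), ("y", 2)])
b = OrderedDict([("y", 2), ("x", 1)])
print(a == b)                    # False
print(dict(a) == dict(b))        # True
```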
defaultdict
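- Often more convenient than a plain dict when keys are created on the fly.
- It takes a default factory; when a key that does not exist yet is accessed, the factory's return value becomes that key's initial value.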
>>> d = dict()
>>> print(d["first"])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'first'
Looking up the key first in a freshly created dict raises a KeyError, as expected, because the dict is empty.
>>> from collections import defaultdict
>>> d = defaultdict(lambda: 1)  # missing keys get the default value 1
>>> print(d["first"])
1
As shown above, a defaultdict is convenient when a key has no initial value yet.
The difference shows up clearly when counting words. Let's implement a word count with and without defaultdict.
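The default factory can be any callable that takes no arguments, so built-in types such as int and list work as well as a lambda. The sketch below uses made-up sample words and also shows one behavior worth knowing: reading a missing key with square brackets inserts it, while get() and the in operator do not.

```python
from collections import defaultdict

counts = defaultdict(int)        # int() returns 0
counts["apple"] += 1

groups = defaultdict(list)       # list() returns []
for word in ["apple", "avocado", "banana"]:
    groups[word[0]].append(word)  # group words by their first letter

print(dict(counts))
print(dict(groups))

# Merely reading a missing key inserts it with the default value.
probe = defaultdict(int)
_ = probe["missing"]
print("missing" in probe)        # True

# get() and `in` do not call the factory and insert nothing.
probe2 = defaultdict(int)
print(probe2.get("missing"))     # None
print(len(probe2))               # 0
```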
from collections import defaultdict
from collections import OrderedDict
text = """A press release is the quickest and easiest way to get free publicity. If well written, a press release can result in multiple published articles about your firm and its products. And that can mean new prospects contacting you asking you to sell to them. Talk about low-hanging fruit!
What's more, press releases are cost effective. If the release results in an article that (for instance) appears to recommend your firm or your product, that article is more likely to drive prospects to contact you than a comparable paid advertisement.
However, most press releases never accomplish that. Most press releases are just spray and pray. Nobody reads them, least of all the reporters and editors for whom they're intended. Worst case, a badly-written press release simply makes your firm look clueless and stupid.
For example, a while back I received a press release containing the following sentence: "Release 6.0 doubles the level of functionality available, providing organizations of all sizes with a fast-to-deploy, highly robust, and easy-to-use solution to better acquire, retain, and serve customers."
Translation: "The new release does more stuff." Why the extra verbiage? As I explained in the post "Why Marketers Speak Biz Blab", the BS words are simply a way to try to make something unimportant seem important. And, let's face it, a 6.0 release of a product probably isn't all that important.
As a reporter, my immediate response to that press release was that it's not important because it expended an entire sentence saying absolutely nothing. And I assumed (probably rightly) that the company's marketing team was a bunch of idiots.""".lower().split()
print(text)
# Using a plain dict
word_count = {}
for word in text:
    if word in word_count:
        word_count[word] += 1
    else:
        word_count[word] = 1
print(word_count)
# Using defaultdict
word_count = defaultdict(lambda: 0)  # missing keys get the default value 0
for word in text:
    word_count[word] += 1
# Print the words in descending order of count
for i, v in OrderedDict(sorted(
        word_count.items(), key=lambda t: t[1], reverse=True)).items():
    print(i, v)
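Splitting on whitespace leaves punctuation attached, so tokens such as important. and important are counted separately. One possible refinement, sketched here with a short made-up sample string instead of the full press-release text, is to strip leading and trailing punctuation from each token before counting. Since sorted() already returns the pairs in order, this sketch also iterates over its result directly.

```python
import string
from collections import defaultdict

# A short sample sentence, assumed here only to keep the sketch self-contained.
sample = "It is important. Is it important? It is, probably."

word_count = defaultdict(int)
for token in sample.lower().split():
    word = token.strip(string.punctuation)  # drop leading/trailing punctuation
    if word:                                # skip tokens that were only punctuation
        word_count[word] += 1

# sorted() returns (word, count) pairs in descending order of count.
for word, count in sorted(word_count.items(), key=lambda t: t[1], reverse=True):
    print(word, count)
```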
Counter
- Counts the elements of a sequence (or any iterable) and returns the counts as a dict-like object.
>>> from collections import Counter
>>>
>>> c = Counter() # a new, empty counter
>>> c = Counter('gallahad') # a new counter from an iterable
>>> print(c)
Counter({'a': 3, 'l': 2, 'g': 1, 'h': 1, 'd': 1})
>>>
>>> c = Counter({'red': 4, 'blue': 2}) # a new counter from a mapping
>>> print(c)
Counter({'red': 4, 'blue': 2})
>>> print(list(c.elements()))
['red', 'red', 'red', 'red', 'blue', 'blue']
>>>
>>> c = Counter(cats=4, dogs=8) # a new counter from keyword args
>>> print(c)
Counter({'dogs': 8, 'cats': 4})
>>> print(list(c.elements()))
['cats', 'cats', 'cats', 'cats', 'dogs', 'dogs', 'dogs', 'dogs', 'dogs', 'dogs', 'dogs', 'dogs']
>>>
>>> c = Counter(a=4, b=2, c=0, d=-2)
>>> d = Counter(a=1, b=2, c=3, d=4)
>>> c.subtract(d)  # c = c - d, in place; zero and negative counts are kept
>>> print(c)
Counter({'a': 3, 'b': 0, 'c': -3, 'd': -6})
>>>
>>> c = Counter(a=4, b=2, c=0, d=-2)
>>> d = Counter(a=1, b=2, c=3, d=4)
>>> print(c + d)
Counter({'a': 5, 'b': 4, 'c': 3, 'd': 2})
>>> print(c & d)
Counter({'b': 2, 'a': 1})
>>> print(c | d)
Counter({'a': 4, 'd': 4, 'c': 3, 'b': 2})
>>>
>>> text = """A press release is the quickest and easiest way to get free publicity. If well written, a press release can result in multiple published articles about your firm and its products. And that can mean new prospects contacting you asking you to sell to them. Talk about low-hanging fruit!
... What's more, press releases are cost effective. If the release results in an article that (for instance) appears to recommend your firm or your product, that article is more likely to drive prospects to contact you than a comparable paid advertisement.
... However, most press releases never accomplish that. Most press releases are just spray and pray. Nobody reads them, least of all the reporters and editors for whom they're intended. Worst case, a badly-written press release simply makes your firm look clueless and stupid.
... For example, a while back I received a press release containing the following sentence: "Release 6.0 doubles the level of functionality available, providing organizations of all sizes with a fast-to-deploy, highly robust, and easy-to-use solution to better acquire, retain, and serve customers."
... Translation: "The new release does more stuff." Why the extra verbiage? As I explained in the post "Why Marketers Speak Biz Blab", the BS words are simply a way to try to make something unimportant seem important. And, let's face it, a 6.0 release of a product probably isn't all that important.
... As a reporter, my immediate response to that press release was that it's not important because it expended an entire sentence saying absolutely nothing. And I assumed (probably rightly) that the company's marketing team was a bunch of idiots.""".lower().split()
>>> print(Counter(text))
Counter({'a': 12, 'to': 10, 'the': 9, 'and': 9, 'press': 8, 'release': 8, 'that': 7, 'of': 5, 'your': 4, 'in': 3, 'firm': 3, 'you': 3, 'releases': 3, 'are': 3, 'all': 3, 'i': 3, 'is': 2, 'way': 2, 'if': 2, 'can': 2, 'about': 2, 'new': 2, 'prospects': 2, 'an': 2, 'article': 2, 'more': 2, 'most': 2, 'for': 2, 'simply': 2, '6.0': 2, 'as': 2, 'important.': 2, 'was': 2, 'quickest': 1, 'easiest': 1, 'get': 1, 'free': 1, 'publicity.': 1, 'well': 1, 'written,': 1, 'result': 1, 'multiple': 1, 'published': 1, 'articles': 1, 'its': 1, 'products.': 1, 'mean': 1, 'contacting': 1, 'asking': 1, 'sell': 1, 'them.': 1, 'talk': 1, 'low-hanging': 1, 'fruit!': 1, "what's": 1, 'more,': 1, 'cost': 1, 'effective.': 1, 'results': 1, '(for': 1, 'instance)': 1, 'appears': 1, 'recommend': 1, 'or': 1, 'product,': 1, 'likely': 1, 'drive': 1, 'contact': 1, 'than': 1, 'comparable': 1, 'paid': 1, 'advertisement.': 1, 'however,': 1, 'never': 1, 'accomplish': 1, 'that.': 1, 'just': 1, 'spray': 1, 'pray.': 1, 'nobody': 1, 'reads': 1, 'them,': 1, 'least': 1, 'reporters': 1, 'editors': 1, 'whom': 1, "they're": 1, 'intended.': 1, 'worst': 1, 'case,': 1, 'badly-written': 1, 'makes': 1, 'look': 1, 'clueless': 1, 'stupid.': 1, 'example,': 1, 'while': 1, 'back': 1, 'received': 1, 'containing': 1, 'following': 1, 'sentence:': 1, '"release': 1, 'doubles': 1, 'level': 1, 'functionality': 1, 'available,': 1, 'providing': 1, 'organizations': 1, 'sizes': 1, 'with': 1, 'fast-to-deploy,': 1, 'highly': 1, 'robust,': 1, 'easy-to-use': 1, 'solution': 1, 'better': 1, 'acquire,': 1, 'retain,': 1, 'serve': 1, 'customers."': 1, 'translation:': 1, '"the': 1, 'does': 1, 'stuff."': 1, 'why': 1, 'extra': 1, 'verbiage?': 1, 'explained': 1, 'post': 1, '"why': 1, 'marketers': 1, 'speak': 1, 'biz': 1, 'blab",': 1, 'bs': 1, 'words': 1, 'try': 1, 'make': 1, 'something': 1, 'unimportant': 1, 'seem': 1, 'and,': 1, "let's": 1, 'face': 1, 'it,': 1, 'product': 1, 'probably': 1, "isn't": 1, 'reporter,': 1, 'my': 1, 'immediate': 1, 
'response': 1, "it's": 1, 'not': 1, 'important': 1, 'because': 1, 'it': 1, 'expended': 1, 'entire': 1, 'sentence': 1, 'saying': 1, 'absolutely': 1, 'nothing.': 1, 'assumed': 1, '(probably': 1, 'rightly)': 1, "company's": 1, 'marketing': 1, 'team': 1, 'bunch': 1, 'idiots.': 1})
>>> print(Counter(text)["a"])
12