#! /usr/bin/env python
from mrjob.job import MRJob


class LetterCount(MRJob):
    def mapper(self, key, value):
        # Each input line arrives as value; emit (letter, 1) for every character.
        for word in value.split():
            for letter in word:
                yield letter.lower(), 1

    def reducer(self, key, values):
        # values iterates over every count emitted for this letter.
        yield key, sum(values)


if __name__ == '__main__':
    LetterCount.run()

The input and output files can be passed in like so:
We pass dict.txt (just a text file with about 118,000 words, one per line) as the input, and we specify lettercounts.txt as the output. The '<' and '>' symbols are redirection operators: '<' feeds a file to the script's stdin, and '>' writes the script's stdout to a file. Below is some sample output. Observe that 'e' is the most frequent letter, a fact that makes code breaking slightly easier.
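
To see what the framework does between the mapper and the reducer, here is a minimal plain-Python sketch of the same map, group, and reduce steps. It does not use mrjob at all: run_local is a hypothetical helper invented for this illustration, and the three-word input is made up rather than taken from dict.txt.

```python
from collections import defaultdict


def mapper(line):
    """Emit (letter, 1) for every character of every word on the line."""
    for word in line.split():
        for letter in word:
            yield letter.lower(), 1


def reducer(key, values):
    """Sum all the counts collected for a single letter."""
    yield key, sum(values)


def run_local(lines):
    """Hypothetical stand-in for the framework: map, group by key, reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)

    results = {}
    for key in sorted(grouped):
        for out_key, total in reducer(key, grouped[key]):
            results[out_key] = total
    return results


if __name__ == "__main__":
    sample = ["Apple", "bee", "Cab"]  # made-up input, not dict.txt
    for letter, count in run_local(sample).items():
        print(letter, count)
```

The grouping step in the middle is the part the job class never has to write: the framework collects every value emitted under the same key and hands them to the reducer together.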

