Homework #5
base: master
@Override
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    final String[] line = value.toString().trim().split("\t", 2);
In my opinion, regex is the last thing to reach for in systems that are trying to be performant. Could you explain why you used regular expressions here?
Natasha sent us the regular expression and told us to use it so that our results would match hers.
OK, that is a fair argument 😄 And why is there no conversion to uppercase or lowercase?
In this word count we treat "word" and "Word" as different words; the later step that decides whether a word is a name builds on this :) Besides, it is simply more flexible: the results can be post-processed afterwards however we need.
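The "post-process later" point can be illustrated with a small sketch: case-sensitive counts are kept as produced, and folded to lowercase only when a case-insensitive view is needed. The class `CaseFolding` and the method `foldCase` are hypothetical names chosen for illustration; they are not part of this pull request, and the real second job does this folding inside a Hadoop mapper and reducer rather than in memory.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not code from this pull request.
final class CaseFolding {
    private CaseFolding() {}

    /** Merges case-sensitive counts into counts keyed by the lowercased word. */
    static Map<String, Integer> foldCase(Map<String, Integer> caseSensitiveCounts) {
        Map<String, Integer> folded = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : caseSensitiveCounts.entrySet()) {
            // Locale.ROOT keeps lowercasing independent of the machine's default locale.
            String key = entry.getKey().toLowerCase(Locale.ROOT);
            // Sum the counts of all spellings that collapse to the same key.
            folded.merge(key, entry.getValue(), Integer::sum);
        }
        return folded;
    }
}
```

Because the original map is left untouched, the case-sensitive counts remain available for the name-detection step mentioned above.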
int pos = line.indexOf(0x09);

String inputString = line.substring(0, pos);
A validity check on pos would be welcome here.
Well, yes, probably.
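One way the suggested check could look is sketched below. It assumes each input line has the form `word<TAB>count` and that malformed lines should be skipped rather than fail the task; that policy is an assumption, not something decided in this thread. `WordCountLineParser` and `parse` are hypothetical names, and the sketch is plain Java without Hadoop types so that it can run on its own.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper for illustration only; not code from this pull request.
final class WordCountLineParser {
    private WordCountLineParser() {}

    /** Parses "word<TAB>count"; returns an empty Optional for malformed lines. */
    static Optional<Map.Entry<String, Integer>> parse(String line) {
        int pos = line.indexOf('\t');
        // Reject lines with no tab, an empty word, or nothing after the tab.
        if (pos <= 0 || pos == line.length() - 1) {
            return Optional.empty();
        }
        try {
            int count = Integer.parseInt(line.substring(pos + 1).trim());
            return Optional.of(new SimpleImmutableEntry<>(line.substring(0, pos), count));
        } catch (NumberFormatException e) {
            // The part after the tab is not an integer.
            return Optional.empty();
        }
    }
}
```

In a mapper, an empty result could be counted and skipped instead of letting `substring` throw a `StringIndexOutOfBoundsException` when `pos` is -1.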
String inputString = line.substring(0, pos);
int inputCount = Integer.valueOf(line.substring(pos+1));

context.write(new Text(inputString.toLowerCase()), new IntWritable(inputCount));
👍
job.setOutputValueClass(IntWritable.class);

// read from the wordcount output directory
FileInputFormat.addInputPath(job, new Path(args[0]));
What happens if args is invalid?
An exception :) We did not bother with that. This is a course assignment, so we focused on other things.
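For reference, a fail-fast check before the job is configured could look like the sketch below. It assumes the driver takes exactly two arguments, an input directory and an output directory; only `args[0]` as the input path appears in the diff, so the second argument and the usage text are assumptions. `JobArgs` is a hypothetical name and the sketch deliberately avoids Hadoop classes.

```java
// Hypothetical helper for illustration only; not code from this pull request.
final class JobArgs {
    final String inputDir;
    final String outputDir;

    private JobArgs(String inputDir, String outputDir) {
        this.inputDir = inputDir;
        this.outputDir = outputDir;
    }

    /** Validates the command-line arguments and fails with a readable message. */
    static JobArgs parse(String[] args) {
        // Assumed contract: <input dir> <output dir>.
        if (args == null || args.length != 2) {
            throw new IllegalArgumentException("Usage: <wordcount output dir> <result dir>");
        }
        if (args[0].isEmpty() || args[1].isEmpty()) {
            throw new IllegalArgumentException("Input and output paths must not be empty");
        }
        return new JobArgs(args[0], args[1]);
    }
}
```

The driver's `main` could call `JobArgs.parse(args)` first and print the exception message, which replaces an `ArrayIndexOutOfBoundsException` with a usage hint.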
No description provided.