HADOOP-19311: [ABFS] Implement Backoff and Read Footer metrics using IOStatistics Class #7122
base: trunk
Thanks for the recent changes, design looks simpler now.
Added a few more minor comments. Rest looks good to me.
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AbfsCountersImpl.java
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/enums/RetryValue.java
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/enums/FileType.java
...ls/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsBackoffMetrics.java
private void getRetryMetrics(StringBuilder metricBuilder) {
  for (RetryValue retryCount : RETRY_LIST) {
    long totalRequests = getMetricValue(TOTAL_REQUESTS, retryCount);
    metricBuilder.append("$RCTSI$_").append(retryCount.getValue())
+1
All these constants are used specifically for metric work, so they qualify to be added to the MetricConstants class. It makes sense to retain that class and add all these constants there.
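A minimal sketch of the suggestion above, assuming a dedicated final constants holder. The class names MetricConstantsSketch and RetryMetricLabels, the constant name RETRY_COUNT_TOTAL_PREFIX, and the retry bucket labels are hypothetical; only the "$RCTSI$_" prefix comes from the getRetryMetrics diff.

```java
import java.util.List;

/**
 * Hypothetical holder for metric-specific constants, in the spirit of a
 * MetricConstants class: final with a private constructor, so it can be
 * neither instantiated nor subclassed.
 */
final class MetricConstantsSketch {
  /** Prefix taken from the getRetryMetrics diff; the constant name is hypothetical. */
  static final String RETRY_COUNT_TOTAL_PREFIX = "$RCTSI$_";

  private MetricConstantsSketch() {
    // utility class: no instances
  }
}

/** Hypothetical caller that reads the prefix from the constants holder. */
final class RetryMetricLabels {
  private RetryMetricLabels() { }

  /** Builds one space-separated label per retry bucket, e.g. "$RCTSI$_1". */
  static String labels(List<String> retryBuckets) {
    StringBuilder metricBuilder = new StringBuilder();
    for (String bucket : retryBuckets) {
      metricBuilder.append(MetricConstantsSketch.RETRY_COUNT_TOTAL_PREFIX)
          .append(bucket)
          .append(' ');
    }
    return metricBuilder.toString().trim();
  }
}
```

Keeping every metric prefix in one class means a format change touches a single file instead of each metrics class.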
...adoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/enums/AbfsReadFooterMetricsEnum.java
...ls/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsBackoffMetrics.java
...ls/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsBackoffMetrics.java
...ls/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsBackoffMetrics.java
...-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/statistics/package-info.java
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AbfsCountersImpl.java
...ls/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsBackoffMetrics.java
...ls/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsBackoffMetrics.java
...ls/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsBackoffMetrics.java
...hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsReadFooterMetrics.java
Added a minor comment...
Rest LGTM
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/enums/FileType.java
🎊 +1 overall
This message was automatically generated.
@steveloughran @mukund-thakur @saikatroy038 Could you please review the PR?
Thank you for the PR. I've suggested a few changes. Please take a look. Thanks!
 */
public class AbfsBackoffMetrics extends AbstractAbfsStatisticsSource {
  private static final List<RetryValue> RETRY_LIST = Arrays.asList(
      ONE, TWO, THREE, FOUR, FIVE_FIFTEEN, FIFTEEN_TWENTY_FIVE, TWENTY_FIVE_AND_ABOVE);
Arrays.asList(RetryValue.values())
Taken!
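A sketch of the suggested change, assuming a stand-in enum: RetryValueSketch is hypothetical, its constant names are copied from the RETRY_LIST declaration above, and its string values are made up. Deriving the list from values() keeps it in declaration order and picks up any new bucket automatically; wrapping it in Collections.unmodifiableList is an extra choice made here, not something the review asked for.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for RetryValue; the string values are assumptions. */
enum RetryValueSketch {
  ONE("1"), TWO("2"), THREE("3"), FOUR("4"),
  FIVE_FIFTEEN("5_15"), FIFTEEN_TWENTY_FIVE("15_25"),
  TWENTY_FIVE_AND_ABOVE("25AndAbove");

  private final String value;

  RetryValueSketch(String value) {
    this.value = value;
  }

  String getValue() {
    return value;
  }
}

final class RetryBuckets {
  /**
   * Derived from values() (declaration order), so a bucket added to the enum
   * is included without editing this list; wrapped to keep it read-only.
   */
  static final List<RetryValueSketch> RETRY_LIST =
      Collections.unmodifiableList(Arrays.asList(RetryValueSketch.values()));

  private RetryBuckets() { }
}
```

The same pattern applies to Arrays.asList(FileType.values()) for the read footer metrics.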
return Arrays.stream(AbfsBackoffMetricsEnum.values())
    .filter(backoffMetricsEnum -> backoffMetricsEnum.getStatisticType().equals(type))
    .flatMap(backoffMetricsEnum ->
        RETRY.equals(backoffMetricsEnum.getType())
Call getMetricName in flatMap instead of repeating the whole logic:
flatMap(backoffMetricsEnum -> RETRY_LIST.stream().map(retryCount -> getMetricName(backoffMetricsEnum, retryCount)))
Taken!
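A sketch of the refactor the reviewer describes, assuming hypothetical stand-ins: NameBackoffMetric plays the role of AbfsBackoffMetricsEnum, NameMetricType and NameStatType stand in for its type and statistic type, and RETRY_LIST holds made-up bucket labels. The ':' separator inside getMetricName is also an assumption. The point is that the flatMap delegates to getMetricName, so the name format lives in one place.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical metric categories: RETRY metrics are split per retry bucket. */
enum NameMetricType { RETRY, BASE }

/** Hypothetical statistic kinds used for filtering. */
enum NameStatType { COUNTER, GAUGE }

/** Hypothetical stand-in for AbfsBackoffMetricsEnum; names are made up. */
enum NameBackoffMetric {
  REQUESTS_SUCCEEDED("requestsSucceeded", NameMetricType.RETRY, NameStatType.COUNTER),
  TOTAL_REQUESTS("totalRequests", NameMetricType.BASE, NameStatType.COUNTER),
  MAX_BACKOFF("maxBackoff", NameMetricType.BASE, NameStatType.GAUGE);

  private final String metricName;
  private final NameMetricType type;
  private final NameStatType statType;

  NameBackoffMetric(String metricName, NameMetricType type, NameStatType statType) {
    this.metricName = metricName;
    this.type = type;
    this.statType = statType;
  }

  String getName() { return metricName; }
  NameMetricType getType() { return type; }
  NameStatType getStatisticType() { return statType; }
}

final class BackoffMetricNames {
  /** Hypothetical retry buckets. */
  static final List<String> RETRY_LIST = Arrays.asList("1", "2", "3");

  private BackoffMetricNames() { }

  /** Single source of truth for how a (metric, retry bucket) name is built. */
  static String getMetricName(NameBackoffMetric metric, String retryValue) {
    if (metric.getType() == NameMetricType.RETRY) {
      return retryValue + ":" + metric.getName(); // separator is an assumption
    }
    return metric.getName();
  }

  /** Expands RETRY metrics once per bucket by delegating to getMetricName. */
  static List<String> getMetricNames(NameStatType type) {
    return Arrays.stream(NameBackoffMetric.values())
        .filter(metric -> metric.getStatisticType() == type)
        .flatMap(metric -> metric.getType() == NameMetricType.RETRY
            ? RETRY_LIST.stream().map(retry -> getMetricName(metric, retry))
            : Stream.of(metric.getName()))
        .collect(Collectors.toList());
  }
}
```

Non-retry metrics still produce exactly one name; mapping them over RETRY_LIST as well would emit the same name once per bucket.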
 * @return the constructed metric name
 */
private String getMetricName(AbfsBackoffMetricsEnum metric, RetryValue retryValue) {
  if (RETRY.equals(metric.getType())) {
Null-check retryValue.
Taken!
 * @param retryValue the retry value
 * @return the constructed metric name
 */
private String getMetricName(AbfsBackoffMetricsEnum metric, RetryValue retryValue) {
Null-check metric.
Taken!
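A sketch of both null checks in getMetricName, assuming hypothetical stand-ins (CheckedBackoffMetric, CheckedMetricType) and a ':' separator. Objects.requireNonNull fails fast with a descriptive message. Here retryValue is checked only on the RETRY branch, on the assumption that callers building non-retry names may legitimately pass null.

```java
import java.util.Objects;

/** Hypothetical metric categories. */
enum CheckedMetricType { RETRY, BASE }

/** Hypothetical stand-in for AbfsBackoffMetricsEnum; names are made up. */
enum CheckedBackoffMetric {
  REQUESTS_THROTTLED("requestsThrottled", CheckedMetricType.RETRY),
  MAX_BACKOFF("maxBackoff", CheckedMetricType.BASE);

  private final String metricName;
  private final CheckedMetricType type;

  CheckedBackoffMetric(String metricName, CheckedMetricType type) {
    this.metricName = metricName;
    this.type = type;
  }

  String getName() { return metricName; }
  CheckedMetricType getType() { return type; }
}

final class CheckedMetricNames {
  private CheckedMetricNames() { }

  static String getMetricName(CheckedBackoffMetric metric, String retryValue) {
    // Fail fast with a clear message instead of an NPE deeper in the call.
    Objects.requireNonNull(metric, "metric must not be null");
    if (metric.getType() == CheckedMetricType.RETRY) {
      // Only retry-bucketed metrics need a retry value.
      Objects.requireNonNull(retryValue, "retryValue must not be null for RETRY metrics");
      return retryValue + ":" + metric.getName(); // separator is an assumption
    }
    return metric.getName();
  }
}
```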
 */
public class AbfsReadFooterMetrics extends AbstractAbfsStatisticsSource {
  private static final String FOOTER_LENGTH = "20";
  private static final List<FileType> FILE_TYPE_LIST = Arrays.asList(PARQUET, NON_PARQUET);
Arrays.asList(FileType.values())
Taken!
    .filter(readFooterMetricsEnum -> readFooterMetricsEnum.getStatisticType().equals(type))
    .flatMap(readFooterMetricsEnum ->
        FILE.equals(readFooterMetricsEnum.getType())
            ? FILE_TYPE_LIST.stream().map(fileType -> fileType + COLON + readFooterMetricsEnum.getName())
Consider creating a helper method to encapsulate the logic fileType + COLON + readFooterMetricsEnum.getName() and calling that method in getMetricNames, getCounterMetricValue, getMeanMetricValue, incrementMetricValue, etc., for better maintainability.
Taken!
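A sketch of the suggested helper, assuming COLON is ":" and using hypothetical stand-ins FooterFileType and FooterMetric in place of FileType and the read footer metrics enum. metricKey is the single place that builds a per-file-type key; getMetricNames, incrementMetricValue and getCounterMetricValue all call it, and a getMeanMetricValue would follow the same pattern.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

/** Hypothetical stand-in for FileType. */
enum FooterFileType { PARQUET, NON_PARQUET }

/** Hypothetical stand-in for the read footer metrics enum; names are made up. */
enum FooterMetric {
  READ_COUNT("readCount"),
  FIRST_READ_LEN("firstReadLen");

  private final String metricName;

  FooterMetric(String metricName) {
    this.metricName = metricName;
  }

  String getName() { return metricName; }
}

final class FooterMetricKeys {
  /** Assumed value of the COLON constant. */
  private static final String COLON = ":";
  private final ConcurrentHashMap<String, Long> counters = new ConcurrentHashMap<>();

  /** The one place that knows how a per-file-type key is built. */
  static String metricKey(FooterFileType fileType, FooterMetric metric) {
    return fileType + COLON + metric.getName();
  }

  static List<String> getMetricNames() {
    return Arrays.stream(FooterMetric.values())
        .flatMap(metric -> Arrays.stream(FooterFileType.values())
            .map(fileType -> metricKey(fileType, metric)))
        .collect(Collectors.toList());
  }

  void incrementMetricValue(FooterFileType fileType, FooterMetric metric) {
    // merge is atomic on ConcurrentHashMap.
    counters.merge(metricKey(fileType, metric), 1L, Long::sum);
  }

  long getCounterMetricValue(FooterFileType fileType, FooterMetric metric) {
    return counters.getOrDefault(metricKey(fileType, metric), 0L);
  }
}
```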
    int len,
    long contentLength,
    long nextReadPos) {
  synchronized (this) {
Do we need a synchronized block here? readCount is of type AtomicLong and hence thread-safe.
Makes sense; removed the synchronized block.
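A small illustration of why the synchronized block is unnecessary for the counter itself: AtomicLong.incrementAndGet is an atomic read-modify-write, so concurrent callers never lose updates. ReadCounterSketch and hammer are hypothetical names. This only covers a single counter; if the method also had to update several related fields as one consistent unit, a lock could still be needed.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the read counter discussed above. */
final class ReadCounterSketch {
  private final AtomicLong readCount = new AtomicLong();

  /** No synchronized block: incrementAndGet is an atomic read-modify-write. */
  long recordRead() {
    return readCount.incrementAndGet();
  }

  long get() {
    return readCount.get();
  }

  /** Starts `threads` workers that each call recordRead `perThread` times. */
  static long hammer(int threads, int perThread) {
    ReadCounterSketch counter = new ReadCounterSketch();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int t = 0; t < threads; t++) {
      pool.execute(() -> {
        for (int i = 0; i < perThread; i++) {
          counter.recordRead();
        }
      });
    }
    pool.shutdown();
    try {
      if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
        throw new IllegalStateException("workers did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return counter.get();
  }
}
```

With a plain long field and readCount++ instead, the same run would typically come out below the expected total, because the increment is not atomic.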
readFooterMetrics = new AbfsReadFooterMetrics();
metricsMap.put(filePathIdentifier, readFooterMetrics);

private final Map<String, FileTypeMetrics> fileTypeMetricsMap = new HashMap<>();
updateMap and updateReadMetrics call fileTypeMetricsMap.computeIfAbsent, which is not thread-safe on a HashMap. Change it to a ConcurrentHashMap.
Taken!
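A sketch of the fix, assuming a hypothetical FileTypeMetricsSketch value class and a registry keyed by a file path identifier. On ConcurrentHashMap, computeIfAbsent is performed atomically and applies the mapping function at most once per key, so racing threads all receive the same instance; on a plain HashMap the same call is a data race. The same call could also replace a check-then-put sequence like the metricsMap.put shown earlier, if that map is shared across threads.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical per-file metrics holder. */
final class FileTypeMetricsSketch { }

final class FileTypeMetricsRegistry {
  private final Map<String, FileTypeMetricsSketch> fileTypeMetricsMap = new ConcurrentHashMap<>();
  private final AtomicInteger created = new AtomicInteger();

  FileTypeMetricsSketch metricsFor(String filePathIdentifier) {
    // Atomic on ConcurrentHashMap: the factory runs at most once per absent key.
    return fileTypeMetricsMap.computeIfAbsent(filePathIdentifier, k -> {
      created.incrementAndGet();
      return new FileTypeMetricsSketch();
    });
  }

  int createdCount() { return created.get(); }

  int size() { return fileTypeMetricsMap.size(); }

  /** Has `threads` workers request the same `keys` keys at the same moment. */
  static FileTypeMetricsRegistry race(int threads, int keys) {
    FileTypeMetricsRegistry registry = new FileTypeMetricsRegistry();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    CountDownLatch start = new CountDownLatch(1);
    for (int t = 0; t < threads; t++) {
      pool.execute(() -> {
        try {
          start.await(); // release all workers together to maximise contention
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        for (int k = 0; k < keys; k++) {
          registry.metricsFor("file-" + k);
        }
      });
    }
    start.countDown();
    pool.shutdown();
    try {
      if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
        throw new IllegalStateException("workers did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return registry;
  }
}
```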
 *
 * @param fileTypeMetrics The {@link FileTypeMetrics} object containing metrics and read details.
 * @param len The length of the current read operation.
 */
Do we need synchronized here? The call to addMeanStatisticSample does not require synchronization, since IOStatisticsStoreImpl.meanStatisticMap is a ConcurrentHashMap and MeanStatistic.addSample is already synchronized. If synchronization is needed for any other reason, it should be synchronized on fileTypeMetrics instead of this.
Right: synchronization was needed only to update the mean statistic, and as you mentioned, that update is already thread-safe, so it isn't needed here. Reverted this.
 * @param fileTypeMetrics The {@link FileTypeMetrics} object containing metrics and read details.
 * @param len The length of the current read operation.
 * @param contentLength The total length of the file content.
 */
Synchronize on fileTypeMetrics instead of this.
As mentioned above, this isn't needed; reverted.
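A sketch of why no outer lock is required, using hypothetical stand-ins: MeanStatSketch mirrors the idea of a mean statistic whose addSample is synchronized, and MeanStoreSketch keeps its entries in a ConcurrentHashMap. Creating entries lazily with computeIfAbsent is a simplification for this example, not a claim about how IOStatisticsStoreImpl registers its keys. Because samples and sum change together inside one synchronized method, they stay consistent without synchronizing on this or on fileTypeMetrics.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical mean statistic: both fields are guarded by the object's monitor. */
final class MeanStatSketch {
  private long samples;
  private long sum;

  synchronized void addSample(long value) {
    samples++;
    sum += value;
  }

  synchronized double mean() {
    return samples == 0 ? 0.0 : (double) sum / samples;
  }

  synchronized long samples() {
    return samples;
  }
}

final class MeanStoreSketch {
  private final Map<String, MeanStatSketch> means = new ConcurrentHashMap<>();

  /** No outer lock: the map lookup and each addSample are individually thread-safe. */
  void addMeanStatisticSample(String key, long value) {
    means.computeIfAbsent(key, k -> new MeanStatSketch()).addSample(value);
  }

  MeanStatSketch get(String key) {
    return means.get(key);
  }

  /** `threads` workers each add the samples 1..perThread under key "readLen". */
  static MeanStoreSketch fill(int threads, int perThread) {
    MeanStoreSketch store = new MeanStoreSketch();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int t = 0; t < threads; t++) {
      pool.execute(() -> {
        for (long v = 1; v <= perThread; v++) {
          store.addMeanStatisticSample("readLen", v);
        }
      });
    }
    pool.shutdown();
    try {
      if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
        throw new IllegalStateException("workers did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return store;
  }
}
```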
🎊 +1 overall
This message was automatically generated.
Description of PR
JIRA: https://issues.apache.org/jira/browse/HADOOP-19311
Current Flow: Metrics collection is already implemented in the ABFS flow through two custom classes, AbfsBackoffMetrics and AbfsReadFooterMetrics, which store all metrics at the file system level. Our objective is to move away from these custom implementations and store the metrics in IOStatisticsStore, which is provided by hadoop-common.
Changes Made: This PR stores the metrics of the above-mentioned classes in IOStatisticsStore from hadoop-common. A new abstract class, AbstractAbfsStatisticsSource, implements the IOStatisticsSource interface and holds the IOStatistics of its child metrics classes.
Both AbfsBackoffMetrics and AbfsReadFooterMetrics extend AbstractAbfsStatisticsSource and store their respective metrics in IOStatisticsStore.
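A minimal sketch of the class layout described above. To stay self-contained it does not use hadoop-common: StatsSourceSketch, StatsStoreSketch, AbstractStatsSourceSketch and BackoffMetricsSketch are hypothetical stand-ins for IOStatisticsSource, IOStatisticsStore, AbstractAbfsStatisticsSource and AbfsBackoffMetrics, and the counter names are made up. The base class owns the store, so each child metrics class only declares its keys and records events.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical minimal counter store (not Hadoop's IOStatisticsStore). */
final class StatsStoreSketch {
  private final Map<String, Long> counters = new ConcurrentHashMap<>();

  StatsStoreSketch(String... counterKeys) {
    for (String key : counterKeys) {
      counters.put(key, 0L); // register keys up front
    }
  }

  long incrementCounter(String key, long value) {
    // merge is atomic on ConcurrentHashMap.
    return counters.merge(key, value, Long::sum);
  }

  /** Read-only live view of the counters. */
  Map<String, Long> counters() {
    return Collections.unmodifiableMap(counters);
  }
}

/** Hypothetical stand-in for the role IOStatisticsSource plays: expose statistics. */
interface StatsSourceSketch {
  StatsStoreSketch getStatistics();
}

/** Shared base: owns the store so child metrics classes only declare keys. */
abstract class AbstractStatsSourceSketch implements StatsSourceSketch {
  private final StatsStoreSketch store;

  protected AbstractStatsSourceSketch(StatsStoreSketch store) {
    this.store = store;
  }

  @Override
  public StatsStoreSketch getStatistics() {
    return store;
  }

  protected long increment(String key) {
    return store.incrementCounter(key, 1);
  }
}

/** Hypothetical child in the spirit of AbfsBackoffMetrics; key names are made up. */
final class BackoffMetricsSketch extends AbstractStatsSourceSketch {
  static final String TOTAL_REQUESTS = "totalRequests";
  static final String THROTTLED_REQUESTS = "throttledRequests";

  BackoffMetricsSketch() {
    super(new StatsStoreSketch(TOTAL_REQUESTS, THROTTLED_REQUESTS));
  }

  void recordRequest(boolean throttled) {
    increment(TOTAL_REQUESTS);
    if (throttled) {
      increment(THROTTLED_REQUESTS);
    }
  }
}
```

A read footer metrics class would extend the same base with its own keys, which is what lets both metric types share one storage and reporting path.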