
[core] Fix HashBucketAssigner load index too large error with refactor exception #3796

Merged (8 commits) Aug 7, 2024

Conversation

xuzifu666
Member

@xuzifu666 xuzifu666 commented Jul 22, 2024

Purpose

HashCommon, used by Int2ShortHashMap, throws an IllegalArgumentException when the expected size exceeds 1073741824. In that case we need to fall back to the default constructor so loading can continue.
Related to #3776


Tests

API and Format

Documentation

try {
    map = new Int2ShortHashMap(keyList.size());
} catch (IllegalArgumentException e) {
    map = new Int2ShortHashMap();
}
Contributor

Is there a way to solve it without downgrading?

@xuzifu666
Member Author

@izhangzhihao Can you try the PR first to confirm that it works? Looking forward to your feedback ~

@izhangzhihao

I will try this tomorrow

@xuzifu666
Member Author

I will try this tomorrow

thanks~

@izhangzhihao

Just created a simple unit test:

public class HashBucketAssignerTest {

    @Test
    public void testFunction() {
        IntStream stream = IntStream.rangeClosed(1, 1_100_000_000);

        final Int2ShortHashMap.Builder builder = Int2ShortHashMap.builder();


        stream.forEach(
                i -> builder.put(i, (short)0)
        );

        builder.build();
    }
}

Error:

java.lang.IllegalArgumentException: Too large (805306401 expected elements with load factor 0.75)

	at it.unimi.dsi.fastutil.HashCommon.arraySize(HashCommon.java:208)
	at it.unimi.dsi.fastutil.ints.Int2ShortOpenHashMap.insert(Int2ShortOpenHashMap.java:257)
	at it.unimi.dsi.fastutil.ints.Int2ShortOpenHashMap.put(Int2ShortOpenHashMap.java:265)
	at xx.Int2ShortHashMap.put(Int2ShortHashMap.java:20)
	at xx.Int2ShortHashMap$Builder.build(Int2ShortHashMap.java:62)
	at xx.HashBucketAssignerTest.testFunction(HashBucketAssignerTest.java:25)

@xuzifu666
Member Author

xuzifu666 commented Jul 24, 2024

Just created a simple unit test: (full test and error quoted above)

Hi, did you apply the patch? I tested the UT locally; it does not report the exception, only an OOM error. @izhangzhihao

@izhangzhihao

izhangzhihao commented Jul 24, 2024

For your OOM error, you need to add -Xmx32g. BTW, this is truly a huge memory consumer

@xuzifu666
Member Author

xuzifu666 commented Jul 24, 2024

For your oom error, you need add -Xmx32g

Yes, my computer's memory is not enough for something that large. So have you tested it in your Flink job from issue #3776? @izhangzhihao

@izhangzhihao

Can you find a 32G machine and run java -jar just for this test? Since it does not pass the unit test, there is no reason to believe it can pass a real flink job...

@xuzifu666
Member Author

Can you find a 32G machine and run java -jar just for this test? Since it does not pass the unit test, there is no reason to believe it can pass a real flink job...

OK, but from your error stack I cannot find the source position in HashCommon, which is a little strange.

@izhangzhihao

izhangzhihao commented Jul 24, 2024

Error happens here:

	public static int arraySize(final int expected, final float f) {
		final long s = Math.max(2, nextPowerOfTwo((long)Math.ceil(expected / f)));
		if (s > (1 << 30)) throw new IllegalArgumentException("Too large (" + expected + " expected elements with load factor " + f + ")");
		return (int)s;
	}
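As a quick standalone check of the guard above (my own sketch, not part of fastutil or the PR): with load factor 0.75, the largest expected size whose table still fits in 2^30 slots is 0.75 · 2^30 = 805306368, so 805306401 pushes the computed table size to 2^31 and trips the exception.

```java
public class ArraySizeCheck {
    // Mirrors fastutil's HashCommon.nextPowerOfTwo(long).
    static long nextPowerOfTwo(long x) {
        if (x == 0) return 1;
        x--;
        x |= x >> 1;  x |= x >> 2;  x |= x >> 4;
        x |= x >> 8;  x |= x >> 16; x |= x >> 32;
        return x + 1;
    }

    // Same arithmetic as HashCommon.arraySize, but returning the size
    // instead of throwing, so we can inspect the boundary.
    static long tableSize(int expected, float f) {
        return Math.max(2, nextPowerOfTwo((long) Math.ceil(expected / f)));
    }

    public static void main(String[] args) {
        System.out.println(tableSize(805_306_368, 0.75f)); // 1073741824 = 2^30, still accepted
        System.out.println(tableSize(805_306_401, 0.75f)); // 2147483648 = 2^31, "Too large"
    }
}
```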

BTW, I'm using:

<dependency>
    <groupId>it.unimi.dsi</groupId>
    <artifactId>fastutil</artifactId>
    <version>8.5.13</version>
</dependency>
<dependency>
    <groupId>it.unimi.dsi</groupId>
    <artifactId>fastutil-core</artifactId>
    <version>8.5.13</version>
</dependency>

@xuzifu666
Member Author

That does not seem consistent with the master Paimon dependencies; would you change it to match master? With the patch, the Int2ShortOpenHashMap build should not reach the logic shown in your error stack.

@izhangzhihao

the master branch is using 8.5.12, but I can give it a try (master will update to a newer version in the future anyway)

@xuzifu666
Member Author

the master branch is using 8.5.12, but I can give it a try(master will update to a newer version in the future anyway)

OK, please apply the patch to the code. Thanks ~ @izhangzhihao

@izhangzhihao

exactly the same error:

java.lang.IllegalArgumentException: Too large (805306401 expected elements with load factor 0.75)

	at it.unimi.dsi.fastutil.HashCommon.arraySize(HashCommon.java:208)
	at it.unimi.dsi.fastutil.ints.Int2ShortOpenHashMap.insert(Int2ShortOpenHashMap.java:257)
	at it.unimi.dsi.fastutil.ints.Int2ShortOpenHashMap.put(Int2ShortOpenHashMap.java:265)

And it took 10 mins:


@xuzifu666
Member Author

OK, the Int2ShortOpenHashMap insert operation needs to rehash the array, which is where the error occurs. Thanks for your support. @izhangzhihao

@xuzifu666 xuzifu666 changed the title [core] Fix HashBucketAssigner load index lead to Too large error [core] Fix HashBucketAssigner load index too large error with refactor exception Jul 24, 2024
@izhangzhihao izhangzhihao left a comment

So is this not a fix but an exception msg improvement?
If true, pls remove close #3776 from your PR desc

@xuzifu666
Member Author

So is this not a fix but an exception msg improvement?

Yes, currently we can give more advice to the user about it. @izhangzhihao

        map.put(keyList.getInt(i), valueList.getShort(i));
    }
} catch (IllegalArgumentException e) {
    throw new PaimonUtilsException(
Contributor

Just throwing a runtime exception is OK.

Member Author

OK, changed the exception to RuntimeException. @JingsongLi
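A minimal, self-contained sketch of the pattern settled on above (class name, threshold constant, and message text are my assumptions, not the merged Paimon code): catch the IllegalArgumentException that fastutil throws while the map fills up, and rethrow a RuntimeException with actionable advice for the user.

```java
public class RethrowSketch {
    // fastutil's effective capacity cap at load factor 0.75: 0.75 * 2^30 slots.
    static final long MAX_FILL = (long) (0.75 * (1 << 30));

    // Hypothetical stand-in for the index-loading put loop: translate
    // fastutil's raw "Too large" error into advice the user can act on.
    static void checkedPut(long sizeSoFar) {
        try {
            if (sizeSoFar > MAX_FILL) { // stand-in for Int2ShortOpenHashMap.put rehashing
                throw new IllegalArgumentException(
                        "Too large (" + sizeSoFar + " expected elements with load factor 0.75)");
            }
        } catch (IllegalArgumentException e) {
            throw new RuntimeException(
                    "Too many index entries for a single hash-bucket assigner; "
                            + "consider increasing the assigner parallelism or bucket number.",
                    e);
        }
    }

    public static void main(String[] args) {
        checkedPut(805_306_368L); // fits within the cap, no exception
        try {
            checkedPut(805_306_401L); // exceeds the cap -> rethrown with advice
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The original exception is kept as the cause, so the fastutil stack trace stays available for debugging while the top-level message tells the user what to change.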

@xuzifu666 xuzifu666 closed this Aug 6, 2024
@xuzifu666 xuzifu666 reopened this Aug 6, 2024
@JingsongLi
Contributor

+1

@JingsongLi JingsongLi merged commit 12c4684 into apache:master Aug 7, 2024
18 of 20 checks passed
@izhangzhihao izhangzhihao left a comment

The error message looks good to me; adding parallelism is a workaround. Ideally, the resource allocation for the task should match the data flow, rather than matching the total amount of data at rest. The current situation does not meet this ideal. Is there any plan for optimization?
