
Why does Duplicated(x, dx) assume x and dx have the same type? #1329

Closed
gdalle opened this issue Mar 6, 2024 · 8 comments · Fixed by #1343

Comments

gdalle (Contributor) commented Mar 6, 2024

A typical example where this is painful is writing a JVP/VJP into a row/column of a Jacobian. In that case, x or dx may be a reshaped array or a view rather than a plain array.
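A minimal sketch of the pain point (the in-place function `f!` here is hypothetical, not from this thread): to fill a Jacobian column-by-column, one would like the shadow of the output to be a view into the Jacobian, but a view has a different type than the primal output.

```julia
using Enzyme

# Hypothetical in-place function for illustration
f!(y, x) = (y .= x .^ 2; nothing)

x = [1.0, 2.0, 3.0]
y = zeros(3)
J = zeros(3, 3)

dx = [1.0, 0.0, 0.0]   # seed: first basis vector
dy = view(J, :, 1)     # we'd like the JVP written directly into column 1

# Under a strict same-type rule this pairing is rejected, because
# typeof(dy) is a SubArray while typeof(y) is a Vector{Float64}:
# autodiff(Forward, f!, Duplicated(y, dy), Duplicated(x, dx))
```

The workaround is to use a plain `Vector` shadow and copy it into the Jacobian afterwards, at the cost of an extra allocation and copy per column.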

cc @adrhill

MasonProtter commented

Basically, Enzyme treats dx as a Cartesian vector where the i-th field of dx is given by $\mathrm{d}x_{i} = {\partial f / \partial x_{i}}$, so it's useful for dx to be the exact same type. The problem is that the interpretation of this object is not necessarily the same as the interpretation of a derivative.

A very long-winded discussion here: #1334

vchuravy (Member) commented Mar 7, 2024

We also assume congruency (#636), i.e. that the shadow is structurally identical to the primal value. This is not a requirement on the Julia types but on the data layout of the objects; in practice, requiring the types to be equal is an easier guardrail than determining full congruency (#637).

My favorite example of this is a sparse array. What should its shadow be? It is not enough to require that the shadow is of the same type; it must additionally be congruent/"structurally identical", so it needs to have the same stored (non-zero) entries, even though the stored values themselves may be zero.
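A sketch of the distinction, using only the SparseArrays standard library (the `dx_bad` variable name is mine, for illustration):

```julia
using SparseArrays

x = sparse([1.0, 0.0, 2.0])   # stored entries at indices 1 and 3

# A congruent shadow: same sparsity structure as x, stored values zeroed.
dx = copy(x)
fill!(nonzeros(dx), 0.0)

# Same type as x, but NOT congruent: spzeros stores no entries at all,
# so there is no slot in which to accumulate a derivative for x[1] or x[3].
dx_bad = spzeros(3)
```

A same-type check alone would accept `dx_bad`, even though writing a derivative into it would require mutating its sparsity structure mid-differentiation.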

wsmoses (Member) commented Mar 8, 2024

So, as a technical detail, the reason for the equal-type requirement isn't that it is necessarily required internally, but that it is a conservative approximation that prevents a lot of user error.

Like @MasonProtter said, the way differentiation works is that for every variable at register i (for any i) or byte offset i (for any byte offset, or pointer indirection), Enzyme will use the shadow variable at the same register/byte offset as storage for the corresponding derivative.

The precise memory locations accessed will depend on the function being differentiated. For example, if a primal only reads from an array at index 47, the derivative will only read/write to index 47 (and no other indices).

Since we can assume that all memory accesses in the primal are valid for the primal input, using an equivalent data structure for the shadow will always be valid (since at most we will access the same memory locations as the primal). A shadow object of the same type as the primal is guaranteed to have the same data layout, so requiring the shadow to have the same Julia type as the primal is a sufficient (but not necessary) constraint.

However, one can construct inputs with differing data layouts, but these come with safety issues that must be taken more seriously, as well as different semantic meanings.

function f(ptr)
	x = unsafe_load(ptr, 47)
	x * x
end

ptr = Base.reinterpret(Ptr{Float64}, Libc.malloc(100*sizeof(Float64)))
unsafe_store!(ptr, 3.14, 47)

@show f(ptr)

using Enzyme


dptr = Base.reinterpret(Ptr{Float64}, Libc.calloc(100*sizeof(Float64), 1))

autodiff(Reverse, f, Duplicated(ptr, dptr))

@show unsafe_load(dptr, 47)
# 6.28


dptr = Base.reinterpret(Ptr{Float64}, Libc.calloc(sizeof(Float64), 1))

# offset the pointer to have unsafe_load(dptr, 47) access the 0th byte of dptr
# since julia one indexes we subtract 46 * sizeof(Float64) here
autodiff(Reverse, f, Duplicated(ptr, dptr - 46 * sizeof(Float64)))

# represents the derivative of the 47th element of ptr
@show unsafe_load(dptr, 1)

# 6.28

wsmoses (Member) commented Mar 8, 2024

Relatedly, a long discussion of this would make excellent docs. @gdalle, do you want to test your understanding (and our explanation) and open a docs PR on the subject? Obviously we'll help you make sure it is complete/accurate, but having a voice who isn't already knowledgeable about the weeds is useful for making sure it is accessible.

wsmoses (Member) commented Mar 8, 2024

This is similarly why we presently enforce that views have the same offsets/indices in shadow and primal:

@inline function Duplicated(x::T1, dx::T1, check::Bool=true) where {T1 <: SubArray}

We'll otherwise store derivative data into the corresponding byte offset of the shadow (as computed from the primal's offset/indices). If the offsets differ, a user may get derivatives at an unexpected offset. Of course, someone who knows what they're doing and really wants that behavior may dislike the check; in that case they should probably just set the check flag to false.
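A sketch of what the check guards against, using plain Julia views (variable names are mine, for illustration):

```julia
A  = rand(5)
dA = zeros(5)

# Matching offsets: the shadow view covers the same indices of its parent
# as the primal view does, so derivative writes land where expected.
x  = view(A, 2:4)
dx = view(dA, 2:4)

# Mismatched offsets: derivatives for x[i] would land at dA[i], i.e. one
# slot earlier in the parent array than the corresponding primal element.
# Duplicated(view(A, 2:4), view(dA, 1:3))  # rejected unless check=false
```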

wsmoses (Member) commented Mar 13, 2024

Bump: @gdalle, would you be interested in summarizing this into docs?

gdalle (Contributor, Author) commented Mar 13, 2024

yeah I'll give it a shot

gdalle (Contributor, Author) commented Mar 15, 2024

Done
