Using a different edge type as a negative edge sample #330
Unanswered
bryceForrest asked this question in Q&A
Replies: 1 comment · 2 replies
-
Here's what I'm jamming on right now. It feels like it's on the verge of functioning.

```python
# get the two edge types
pos_edge_index = batch['user', 'bought', 'item'].edge_index
neg_edge_index = batch['user', 'viewed', 'item'].edge_index

# mask out edges in neg_edge_index that are also in pos_edge_index
mask = ~(neg_edge_index.T[:, None] == pos_edge_index.T).all(dim=2).any(dim=1)
neg_edge_index = neg_edge_index[:, mask]

# line them up by src nodes, replicating "triplet" mode
neg_src, pos_src = (pos_edge_index[0] == neg_edge_index[0].unsqueeze(1)).nonzero().T
pos_edge_index, neg_edge_index = pos_edge_index[:, pos_src], neg_edge_index[:, neg_src]

# concatenate pos and neg edges, and replace the edges in the batch
edge_index = torch.cat([pos_edge_index, neg_edge_index], dim=-1)
batch['user', 'bought', 'item'].edge_index = edge_index
```

It's weird, though: I'm getting this error:

```
IndexError: Found indices in 'edge_index' that are larger than 5877 (got 11066). Please ensure that all indices in 'edge_index' point to valid indices in the interval [0, 5878) in your node feature matrix and try again.
```

But the max index of the original …
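A quick sanity check right after the concatenation (just a sketch; it assumes `batch` is the mini-batch coming out of `LinkNeighborLoader`):

```python
# the concatenated edges should only reference nodes that exist in this
# mini-batch's 'user' and 'item' node stores
assert edge_index[0].max() < batch['user'].num_nodes
assert edge_index[1].max() < batch['item'].num_nodes
```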
-
Shit, I posted this and didn't realize there's a Q&A section... my bad.
I have a heterogeneous graph, with user nodes, item nodes, and various edge types reflecting different types of interactions between them.
I've been trying to formulate a more clever means of negative sampling the edges during training rather than just randomly grabbing two nodes and plopping an edge between them.
Today I had an idea: what if I took a real edge that exists in the graph, say a user VIEWING an item, and used it as a negative edge for that user BUYING the item (assuming, of course, the user doesn't also have an edge representing them buying it)? This might be a slightly "harder" negative sampling approach.
It's been kind of a pain in the ass to implement, though. I've been training the model using LinkNeighborLoader in "triplet" mode with a BPR loss function. I'm trying to generate these negative samples on the fly for each batch, but I lose out on the handy "src_index" and "dst_pos_index" attributes when I don't use the built-in negative sampling.
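To be concrete about the loss part, something like this is what I'm aiming for (just a sketch, not my actual code; `z_user`, `z_item`, and the assumption that the positive and negative edge indices are column-aligned by source user are all placeholders, not PyG attributes):

```python
import torch.nn.functional as F

def bpr_loss(z_user, z_item, pos_edge_index, neg_edge_index):
    # pos_edge_index and neg_edge_index are column-aligned: column k of both
    # shares the same source user, so each k is one (user, pos item, neg item) triplet
    pos_score = (z_user[pos_edge_index[0]] * z_item[pos_edge_index[1]]).sum(dim=-1)
    neg_score = (z_user[neg_edge_index[0]] * z_item[neg_edge_index[1]]).sum(dim=-1)
    return -F.logsigmoid(pos_score - neg_score).mean()
```

Here `z_user` / `z_item` would just be whatever the encoder produces for the batch.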
Any thoughts? I'll share what I've got so far once I get home and strip it of any evidence of the work-related data I'm using it for. I just wanted to get this out into the internet ether while I was thinking about it.