-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(assetlibrary): change connection life cycle between Lambda and Neptune #168
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
{ | ||
"changes": [ | ||
{ | ||
"packageName": "@awssolutions/cdf-assetlibrary", | ||
"comment": "Change connection life cycle between Lambda and Neptune", | ||
"type": "none" | ||
} | ||
], | ||
"packageName": "@awssolutions/cdf-assetlibrary" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -18,23 +18,40 @@ import { TYPES } from '../di/types'; | |
@injectable() | ||
export class BaseDaoFull { | ||
private _graph: structure.Graph; | ||
private _conn: driver.DriverRemoteConnection | null; | ||
|
||
public constructor( | ||
@inject('neptuneUrl') private neptuneUrl: string, | ||
@inject(TYPES.GraphSourceFactory) graphSourceFactory: () => structure.Graph | ||
) { | ||
this._graph = graphSourceFactory(); | ||
this._conn = null; | ||
} | ||
|
||
protected getConnection(): NeptuneConnection { | ||
protected async getConnection(): Promise<NeptuneConnection> { | ||
logger.debug(`base.full.dao getConnection: in:`); | ||
const conn = new driver.DriverRemoteConnection(this.neptuneUrl, { | ||
mimeType: 'application/vnd.gremlin-v2.0+json', | ||
pingEnabled: false, | ||
}); | ||
|
||
if (this._conn == null) { | ||
logger.debug(`base.full.dao getConnection: create new connection:`); | ||
this._conn = new driver.DriverRemoteConnection(this.neptuneUrl, { | ||
mimeType: 'application/vnd.gremlin-v2.0+json', | ||
pingEnabled: false, | ||
connectOnStartup: false, | ||
}); | ||
this._conn.addListener('close', (code: number, message: string) => { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I notice there's only a "close" event here. As far as I can tell from the docs is that it's using WebSockets (though they don't link to it, so I'm not 100%). That lib also emits an "error" event that we should account for. Unless, of course, GremlinJS is handling that event somehow and doesn't emit it. Are you able to confirm/deny whether or not this is the case? |
||
logger.info(`base.full.dao connection close: code: ${code}, message: ${message}`); | ||
this._conn = null; | ||
if (code === 1006) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm curious why we call out this specific error and none of the others? https://www.rfc-editor.org/rfc/rfc6455#section-7.4.1 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (good reference btw) my reading of the error codes is that 1006 is the only error that seems unrecoverable (i.e. the Gremlin server shut down for abnormal reasons). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Other close codes indicate results like the user intentionally closing the connection, but 1006 is a code that occurs due to an unintended disconnection. Even when this happens, it's not handled by gremlin, and it ends up waiting until the Lambda times out. Therefore, an error has to be thrown in this case. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @aaronatbissell sorry that this is happening. Can you share more info about what you're seeing? Error 1006 is an underlying websocket issue that cannot be recovered from. As @yuma124 mentions, an error is thrown to fail this faster, but ideally, this should not be a common issue.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, it appears to be the same error as described in #178. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @aaronatbissell have you used the fix in #178 and does it solve this connection issue? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm just getting back to this. We haven't tried to fix this problem but would really like to see this fixed so we can upgrade asset library. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @aaronatbissell have you tried the update in #178 and validated whether or not it fixes the issues you are seeing? If not, are you able to test it? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
throw new Error('Connection closed prematurely'); | ||
} | ||
}); | ||
await this._conn.open(); | ||
} | ||
|
||
logger.debug(`base.full.dao getConnection: withRemote:`); | ||
const res = new NeptuneConnection(this._graph.traversal().withRemote(conn), conn); | ||
const res = new NeptuneConnection( | ||
this._graph.traversal().withRemote(this._conn), | ||
); | ||
|
||
logger.debug(`base.full.dao getConnection: exit:`); | ||
return res; | ||
|
@@ -44,14 +61,9 @@ export class BaseDaoFull { | |
export class NeptuneConnection { | ||
constructor( | ||
private _traversal: process.GraphTraversalSource, | ||
private _connection: driver.DriverRemoteConnection | ||
) {} | ||
|
||
public get traversal(): process.GraphTraversalSource { | ||
return this._traversal; | ||
} | ||
|
||
public async close(): Promise<void> { | ||
await this._connection.close(); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am trying to find further documentation on this. Though, we should enable ping/pong messages if this is possible. This is a common practice for validating that a connection is still active.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the AssetLibrary service is targeting Lambda and not a long-running instance, I'd argue that we want to maintain the existing
pingEnabled: false
behavior. Note that the theping
interval appears to be an hour (https://github.com/apache/tinkerpop/blob/master/gremlin-javascript/src/main/javascript/gremlin-javascript/lib/driver/connection.js#L46C27-L46C36)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I would usually err on the side of caution, since the timeouts are configurable and could conceivably be set to 15 minutes, but that's probably not necessary here.