Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix host scanning bug #153

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

chris124567
Copy link
Member

Fix scanning bug that has affected the mainnet explored node. We were not incrementing the offset passed to HostsForScanning in fetchHosts. On testnets and in tests this wasn't a problem because the number of hosts was fewer than the batch size (scanBatchSize), so incrementing the offset wasn't necessary cause it'd only take one loop iteration in fetchHosts. On mainnet this was a problem though so we'd keep scanning the same hosts over and over and never closing the hosts channel so the scans that were completed were never added to the DB.

Copy link
Member

@n8maninger n8maninger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can simplify the scanning logic and reduce the surface area for bugs: combine the fetchHosts and addHostScans goroutines and remove the offset parameter from HostsForScanning. Focus on the core logic and keep the complexity as low as possible to reduce the chance for bugs like this to sneak in and make it easier to test. Add a test to the store impl to ensure HostsForScanning is returning the expected results after scans are added.

It's incomplete, but this should be about as complex as necessary:

func (e *Explorer) scanHosts(ctx context.Context) error {
	locator, err := geoip.NewIP2LocationLocator("")
	if err != nil {
		return fmt.Errorf("failed to create geoip database: %w", err)
	}
	defer locator.Close()

	var wg sync.WaitGroup

	for {
		lastScanCutoff := time.Now().Add(-e.scanCfg.MaxLastScan)
		lastAnnouncementCutoff := time.Now().Add(-e.scanCfg.MinLastAnnouncement)

		// no need for offset parameter since results are added to the database in the same goroutine
		batch, err := e.s.HostsForScanning(lastScanCutoff, lastAnnouncementCutoff, scanBatchSize)
		if err != nil {
			return fmt.Errorf("failed to get hosts for scanning: %w", err)
		} else if len(batch) == 0 {
			select {
			case <-ctx.Done():
				return ctx.Err() // shutdown
			case <-time.After(15 * time.Second):
				continue // check again
			}
		}

		results := make([]HostScan, len(batch))
		for i, host := range batch {
			wg.Add(1)
			go func(i int, host Host) {
				defer wg.Done()

				scan, err := e.scanV1Host(locator, host)
				if err != nil {
					e.log.Debug("host scan failed", zap.Stringer("pk", host.PublicKey), zap.Error(err))
					results[i] = HostScan{
						PublicKey: host.PublicKey,
						Success:   false,
						Timestamp: types.CurrentTimestamp(),
					}
					return
				}
				results[i] = scan
			}(i, host)
		}
		wg.Wait()

		if err := e.s.AddHostScans(results); err != nil {
			return fmt.Errorf("failed to add host scans to DB: %w", err)
		}
	}
}

Out of scope for this PR, but it's not idiomatic to store a context on a struct: https://go.dev/blog/context-and-structs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

2 participants