
Added the test scripts for resumption #2117

Open: wants to merge 17 commits into main


@shubham-yb (Contributor) commented Dec 25, 2024

Describe the changes in this pull request

  • Added a test framework for resumption tests of `import data file` and offline `import data`
  • Added test cases for a large table and for a large number of tables for `import data file`
  • Added a test case for PostgreSQL offline `import data` resumption covering datatypes, indexes, partitions, case-sensitive names / reserved words, and multiple schemas

Describe if there are any user-facing changes

N/A

How was this pull request tested?

Also made the corresponding changes to the Jenkins pipeline.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@shubham-yb marked this pull request as ready for review December 27, 2024 11:45
@shubham-yb (Contributor Author)

Does your PR have changes that can cause upgrade issues?

| Component | Breaking changes? |
|---|---|
| MetaDB | No |
| Name registry json | No |
| Data File Descriptor Json | No |
| Export Snapshot Status Json | No |
| Import Data State | No |
| Export Status Json | No |
| Data .sql files of tables | No |
| Export and import data queue | No |
| Schema Dump | No |
| AssessmentDB | No |
| Sizing DB | No |
| Migration Assessment Report Json | No |
| Callhome Json | No |
| YugabyteD Tables | No |
| TargetDB Metadata Tables | No |

@shubham-yb (Contributor Author)

```python
"""
Runs the yb-voyager command with support for resumption testing.
"""
for attempt in range(1, resumption['max_restarts'] + 1):
```
Collaborator

Let's read/define all the configs at the beginning. That makes it easier to see which configuration options are involved.

```python
max_restarts = resumption['max_restarts']
min_interrupt_seconds = resumption['min_interrupt_seconds']
...
```
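Here is a minimal sketch of the "read all configs up front" suggestion. It assumes the `resumption` section is a plain dict. Only `max_restarts` and `min_interrupt_seconds` appear in the PR. The `max_interrupt_seconds` key and the `ResumptionConfig` class are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResumptionConfig:
    """Every resumption setting, read once before the restart loop starts."""

    max_restarts: int
    min_interrupt_seconds: int
    max_interrupt_seconds: int  # hypothetical key, not shown in the PR

    @classmethod
    def from_dict(cls, resumption: dict) -> "ResumptionConfig":
        # A missing key raises KeyError here, before any voyager process is started.
        config = cls(
            max_restarts=int(resumption["max_restarts"]),
            min_interrupt_seconds=int(resumption["min_interrupt_seconds"]),
            max_interrupt_seconds=int(resumption["max_interrupt_seconds"]),
        )
        if config.min_interrupt_seconds > config.max_interrupt_seconds:
            raise ValueError("min_interrupt_seconds must not exceed max_interrupt_seconds")
        return config
```

The restart loop would then read `config.max_restarts` instead of indexing into the dict. Every option the test depends on stays visible in one place.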

```python
if not output:  # Exit if output is empty (end of process output)
    break
full_output += output
if time.time() - start_time > 5:
```
Collaborator

Why break? What is 5: seconds? minutes?
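One way to answer "5 what?" is to move the number into a constant whose name carries the unit. This sketch assumes the loop's intent is "stream output for a fixed window, then move on". The constant `OUTPUT_STREAM_WINDOW_SECONDS` and the helper `stream_for` are hypothetical names. It uses `time.monotonic`, which wall-clock adjustments cannot move, and the `clock` parameter exists only so the logic can be tested.

```python
import time
from typing import Callable, Iterable

# Hypothetical name: the unit lives in the constant, so "5 what?" answers itself.
OUTPUT_STREAM_WINDOW_SECONDS = 5


def stream_for(
    lines: Iterable[str],
    window_seconds: float = OUTPUT_STREAM_WINDOW_SECONDS,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Collect output lines until end of output or until the window has elapsed."""
    deadline = clock() + window_seconds
    collected = []
    for line in lines:
        if not line:  # an empty string means the process closed its output
            break
        collected.append(line)
        if clock() >= deadline:  # time budget used up: stop reading and move on
            break
    return "".join(collected)
```

The deadline is only checked between lines. A `readline()` that blocks waiting for the next line can therefore still overrun the window.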

```python
# Final import retry logic
print("\n--- Final attempt to complete the import ---")

for _ in range(2):
```
Collaborator

Why 2 final attempts?
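Whatever the reason for 2 turns out to be, a named constant documents it. Driving both the loop and the failure message from the same value keeps "Final import failed after 2 attempts." accurate if the count changes. `FINAL_IMPORT_ATTEMPTS` and `retry_final_import` are hypothetical names. `run_once` stands in for whatever starts and waits on the voyager command and reports success.

```python
from typing import Callable

# Hypothetical constant: a named value documents the choice and keeps the
# failure message in sync with the loop.
FINAL_IMPORT_ATTEMPTS = 2


def retry_final_import(run_once: Callable[[], bool], attempts: int = FINAL_IMPORT_ATTEMPTS) -> bool:
    """Call run_once up to `attempts` times; return True on the first success."""
    for attempt in range(1, attempts + 1):
        if run_once():
            return True
        print(f"Final import attempt {attempt}/{attempts} failed")
    print(f"Final import failed after {attempts} attempts.")
    return False
```

The caller can then decide whether to `sys.exit(1)`. That keeps the retry logic itself free of process-exit side effects.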

```python
try:
    print("\nVoyager command output:")

    process = subprocess.Popen(
```
Collaborator

nit: move starting the command into a separate function (it can be called from the for-loop above as well).
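A sketch of such a helper follows. `start_command` is a hypothetical name, and the exact `Popen` arguments used in the PR are not visible in the diff. This version merges stderr into stdout so callers only ever read one pipe. That choice is an assumption, but it also addresses the stdout/stderr consistency question raised below.

```python
import subprocess


def start_command(cmd):
    """Start cmd with stdout and stderr merged into a single text pipe.

    Hypothetical helper: both the interrupt loop and the final retry could
    call this instead of repeating the Popen(...) arguments.
    """
    return subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # one stream, so output is never split across pipes
        text=True,
        bufsize=1,  # line-buffered (valid in text mode)
    )
```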

```python
)

# Capture and print output
for line in iter(process.stdout.readline, ''):
```
Collaborator

In the for-loop above, we read both stderr and stdout; here we only read stdout. Any particular reason? It would be good to be consistent here (call a common function that captures stdout/stderr).
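Here is a minimal sketch of a common capture function, assuming the process was started with `stderr=subprocess.STDOUT` and `text=True`. With that setup, one loop sees both streams in the order they were written. Reading stdout to the end and then stderr can stall if the child fills the stderr pipe first, which is a documented `subprocess` pitfall. `stream_output` is a hypothetical name.

```python
import sys


def stream_output(process):
    """Echo every line of a process's merged stdout/stderr and return it all.

    Assumes Popen(..., stdout=PIPE, stderr=STDOUT, text=True).
    """
    lines = []
    for line in iter(process.stdout.readline, ""):
        print(line.rstrip("\n"))
        sys.stdout.flush()  # keep CI logs in step with the child's output
        lines.append(line)
    return "".join(lines)
```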

Collaborator

Also, until when will you keep reading? How long will this loop run?
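`iter(process.stdout.readline, "")` stops only at EOF, when the child closes its output (normally when it exits). A hung process therefore keeps the loop waiting forever. One way to bound it, sketched here under that assumption, is a watchdog timer that kills the child. The kill closes the pipe and ends the loop. `read_until_exit` and `timeout_seconds` are hypothetical names.

```python
import subprocess
import threading


def read_until_exit(cmd, timeout_seconds):
    """Read merged output until EOF; kill the process if it outlives the timeout.

    Returns (returncode, output).
    """
    process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    watchdog = threading.Timer(timeout_seconds, process.kill)
    watchdog.start()
    try:
        output = "".join(iter(process.stdout.readline, ""))  # ends at EOF
    finally:
        watchdog.cancel()  # no-op if the timer already fired
    process.stdout.close()
    return process.wait(), output
```

If the command spawns grandchildren that inherit the pipe, killing only the direct child may not produce EOF. In that case a process group would need to be killed instead.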

```python
for line in iter(process.stderr.readline, ''):
    print(line.strip())
    sys.stdout.flush()
time.sleep(30)
```
Collaborator

Why sleep?
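The PR does not say what the 30-second sleep waits for. If it waits for some observable condition (that is an assumption), polling for that condition with an upper bound is both faster and self-documenting. `wait_until` is a hypothetical helper, and the predicate is whatever check the test actually needs.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout_seconds: float, poll_seconds: float = 1.0) -> bool:
    """Poll predicate until it returns True or timeout_seconds elapse."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        if predicate():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(poll_seconds, remaining))  # never sleep past the deadline
```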

```python
print("Final import failed after 2 attempts.")
sys.exit(1)

def validate_row_counts(row_count, export_dir):
```
Collaborator

Note for the future: you can create a common Python file that holds helper functions like this one.
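The body of `validate_row_counts` is not shown in the diff. The sketch below is therefore not a rewrite of it. It is a hypothetical example of the kind of pure helper a shared module (name left to the author) could hold: it compares expected and actual counts and reports every mismatch at once rather than stopping at the first one.

```python
from typing import Dict, List, Tuple


def compare_row_counts(expected: Dict[str, int], actual: Dict[str, int]) -> List[Tuple[str, int, int]]:
    """Return (table, expected, actual) for every table whose row count differs.

    A table missing from `actual` is reported with an actual count of -1.
    """
    mismatches = []
    for table, want in sorted(expected.items()):
        got = actual.get(table, -1)
        if got != want:
            mismatches.append((table, want, got))
    return mismatches
```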

```
@@ -0,0 +1,133 @@
#!/bin/bash
```
Collaborator

I'm assuming the ONLY change here is that you're specifying ROW_COUNT, essentially making generate_series dynamic.
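Making `generate_series` dynamic amounts to substituting a row count into the upper bound. This sketch shows that idea in Python. The `(id, val)` columns and the `md5` filler are hypothetical, because the actual bash script's SQL is not visible here. Identifiers are quoted because the test covers case-sensitive names and reserved words such as `case`.

```python
def quote_ident(name: str) -> str:
    # PostgreSQL identifier quoting: wrap in double quotes, double any embedded quote.
    return '"' + name.replace('"', '""') + '"'


def insert_rows_sql(schema: str, table: str, row_count: int) -> str:
    """Build an INSERT that generates row_count rows server-side.

    Hypothetical: assumes the target table has an integer `id` and a text `val` column.
    """
    if not isinstance(row_count, int) or row_count < 1:
        raise ValueError("row_count must be a positive integer")
    return (
        f"INSERT INTO {quote_ident(schema)}.{quote_ident(table)} (id, val) "
        f"SELECT i, md5(i::text) FROM generate_series(1, {row_count}) AS i;"
    )
```

Validating `row_count` as an integer before formatting it into SQL keeps the statement safe even though it is built with string formatting.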

```yaml
schema2.Case_Sensitive_Table: 5000000
schema2.case: 5000000
schema2.Table: 5000000
public.boston: 2500000
```
@makalaaneesh (Collaborator) commented Jan 7, 2025

Where is the code that generates data for all the other tables (boston/cust/emp/etc.)? I only see code for table/case/Case_Sensitive_Table.
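A small consistency check could catch this kind of gap automatically. It parses the expected-row-count listing (the `name: count` lines above) and reports every table that has a count but no data generator. `parse_row_counts`, `tables_without_generator`, and the set of generated table names are hypothetical. How the script would discover which tables it generates is left open.

```python
from typing import Dict, Iterable, List


def parse_row_counts(text: str) -> Dict[str, int]:
    """Parse 'schema.table: count' lines into a dict, skipping blanks and comments."""
    counts = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.rpartition(":")
        if not sep:
            raise ValueError(f"expected 'table: count', got {raw!r}")
        counts[name.strip()] = int(value)
    return counts


def tables_without_generator(expected: Dict[str, int], generated: Iterable[str]) -> List[str]:
    """Tables that have an expected row count but no code generating their data."""
    return sorted(set(expected) - set(generated))
```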


3 participants