
Migrate the loser and tunneldata repositories to Git from Mercurial #16

Open
goatchurchprime opened this issue Jun 17, 2018 · 11 comments
Labels
PostExpo2018 question Further information is requested

Comments

@goatchurchprime
Contributor

I think Mercurial is so unpopular now that we probably should convert over. I keep typing the wrong commands, because everything else I do is in Git.

I think it can be done preserving all the changes and using all the same ssh protocols.
https://stackoverflow.com/questions/16037787/convert-mercurial-project-to-git

For students, knowing Git is going to be a more useful life skill at this point than ever hearing of Mercurial, so there will be some payback for this pain.

goatchurchprime added the "question" label Jun 17, 2018
@PhilipSargent
Collaborator

Possibly, just possibly, could we leave this discussion until after the imminent expo? Please?

@goatchurchprime
Contributor Author

If this argument gets carried, it affects how much effort is put into education on how to run Mercurial, vs doing the minimum to get us through using current systems for the remainder of the summer.

@PhilipSargent
Collaborator

PhilipSargent commented Jun 18, 2018 via email

@ojwb

ojwb commented Jun 28, 2018

I'd be very happy to not have to deal with mercurial, but please can we try to restore the missing pre-hg history of the loser repository when we build the new git repo's history. It's very frustrating to try to track down the history of something only to see that the interesting stuff is prior to the start of the history that's easily available. This has happened to me twice so far this week for the loser hg repo, so it's not just a theoretical problem.

Because each git commit hash (a SHA-1) is calculated over data which includes the parent commit hash, we can't insert the older history later without changing all the commit hashes in the repo, which is disruptive to say the least (this is a deliberate feature - it prevents an attacker tampering with the history). So doing this at the time of the conversion to git would be the best option.
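To illustrate the point about parent hashes, here is a minimal Python sketch (not part of the original comment). It hashes commit objects the way git does, SHA-1 over a `commit <size>` header plus the commit text, and shows that giving the first hg-era commit an older parent changes every later hash. The author line, commit messages and the use of an empty tree are placeholders chosen only to keep the example deterministic.

```python
import hashlib
from typing import List, Optional


def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 over '<type> <size>' + NUL byte + body, as git does for objects."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


# Every commit here points at the empty tree; real commits would differ.
EMPTY_TREE = git_object_id("tree", b"")


def commit_id(tree: str, parent: Optional[str], message: str) -> str:
    """Hash a commit object; the parent's hash is part of the hashed text."""
    # Placeholder identity and timestamp (an assumption for the example).
    who = "Example Person <person@example.org> 1530000000 +0000"
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines += [f"author {who}", f"committer {who}", "", message, ""]
    return git_object_id("commit", "\n".join(lines).encode())


def build_chain(messages: List[str], root_parent: Optional[str] = None) -> List[str]:
    """Return the commit hashes of a linear history, oldest first."""
    ids = []
    parent = root_parent
    for message in messages:
        parent = commit_id(EMPTY_TREE, parent, message)
        ids.append(parent)
    return ids


hg_era = ["first hg commit", "second hg commit", "third hg commit"]

# History converted without the older commits.
without_old = build_chain(hg_era)

# Same commits, but the first one now has the restored old history as parent.
old_tip = build_chain(["restored pre-hg history"])[-1]
with_old = build_chain(hg_era, root_parent=old_tip)

# Every hash differs, even though trees and messages are identical.
changed = [a != b for a, b in zip(without_old, with_old)]
print(changed)
```

Only the first commit's text actually changes (it gains a `parent` line), but because each later commit embeds its parent's hash, the change propagates to the tip.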

We'd need to track down a backup of the SVN repo. I probably have one somewhere, but it might not be the very latest. Restoring the history with a gap would still be an improvement, but a backup which covers up to at least svn r5127 (in 2003) would be good. It looks like hg and svn ran in parallel from then until svn r8493 in 2009, though not every svn revision number in between has a commit in hg, so perhaps some of the hg commits from that time include changes from multiple svn revisions?

I'm happy to do the extra work to get us a decent conversion.

@BeckaLawson
Collaborator

I think I may have some SVN backups at home but I'm away until Monday (though if anyone else more IT-savvy has one that would be easier). Thanks for the offer to do it, Olly :-)

@mshinwell
Collaborator

I'm happy to try to do this as well. I have some recent experience with another hg to git conversion which may be of use. I will also check to see if I have a backup of the old svn.

@ojwb

ojwb commented Jun 28, 2018

So far I've found some backups of the CVS repo (so before SVN even) from the early 2000s.

> I think I may have some SVN backups at home

Great, though note that a backup of an SVN checkout is much less useful, as it only gives us a single version (unlike newer systems like hg and git, checking out the code from SVN doesn't give you the full history locally, only a single version at a time).

So ideally we're after a backup of the SVN repo (which you might well have - before we could push changes to the internet from expo easily we tended to burn a few copies onto CD-R at the end of expo and send them home in different cars - that's where the CVS repo backups I found so far are from).

But backups of SVN checkouts may still be useful if they fall into a gap we have in the history. The CVS repo backups I have mean we should at least have a complete history from when we started to use version control until the early 2000s, and then a complete history after hg became the master VCS (August 2009). So basically anything from 200x is potentially useful.

> I have some recent experience with another hg to git conversion which may be of use

That may well be useful - I've converted CVS to SVN and SVN to git, but not hg to git before.

If people find useful backups, please send me a copy and I can try to assemble as complete a coverage of history as I can, then we can look at actually doing the conversion between this expo and next.

@PhilipSargent
Collaborator

PhilipSargent commented Jun 29, 2018 via email

@goatchurchprime
Contributor Author

This could be the answer for handling the scans:
https://git-lfs.github.com/

@wobrotson
Collaborator

wobrotson commented Sep 8, 2018 via email

@wookey

wookey commented Jul 12, 2019

troggle has been migrated to git, and the old erebus and cvs branches (pre 2010) removed. Some decrufting was done to get rid of log files, old copies of embedded javascript (codemirror, jquery etc) and some fat images no longer used.

tunneldata has also been migrated to git, and renamed 'drawings' as it includes therion data too these days.

The loser repo and expoweb repo need more care in migration. Loser should have the old 1999-2004 CVS history restored, and maybe Tom's annual snapshots from before that, so that ancient history can be researched (sometimes useful). It's also a good idea to add the 2015, 2016 and 2017 ARGE data we got (in 2017) in the correct years, so that it's possible to go back to an 'end of this year' checkout and get an accurate view of what was found (for making plots and length stats). All of that requires some history rewriting, which is best done at the time of conversion.

Similarly, expoweb is full of bloat from fat images and surveys and one 82MB thesis that got checked in and then removed. Clearing that out is a good idea. I have a set of 'unused fat blob' lists which can be stripped out with git-filter. It's not hard to make a 'do the conversion' script, ready for sometime after expo 2019 has calmed down.
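As a rough illustration of how an 'unused fat blob' list could be put together (this is a sketch, not the lists or tooling mentioned above), the Python below filters lines of the form `<type> <sha> <size> <path>`. Output in that shape can be produced with `git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)'`. The function name, the sample lines, the shortened hashes, the paths and the 1 MB threshold are all hypothetical. It only selects candidates; it does not rewrite any history.

```python
from typing import Iterable, List, Set, Tuple


def fat_unused_blobs(
    lines: Iterable[str], in_use: Set[str], min_bytes: int
) -> List[Tuple[int, str, str]]:
    """Return (size, sha, path) for large blobs whose path is no longer used.

    Each input line is expected to look like '<type> <sha> <size> <path>'.
    Non-blob objects and malformed lines are skipped.
    """
    found = []
    for line in lines:
        # Split at most three times so that paths containing spaces survive.
        parts = line.rstrip("\n").split(" ", 3)
        if len(parts) < 4 or parts[0] != "blob":
            continue
        _, sha, size, path = parts
        if int(size) >= min_bytes and path not in in_use:
            found.append((int(size), sha, path))
    # Largest first, so the worst offenders are at the top of the list.
    return sorted(found, reverse=True)


# Made-up sample data; real hashes are 40 hex digits.
sample = [
    "commit 1111 250 ",
    "blob aaaa 86000000 theses/big-thesis.pdf",
    "blob bbbb 1200 index.htm",
    "blob cccc 5000000 images/old scan.jpg",
    "blob dddd 7000000 images/current-map.png",
]
still_in_use = {"index.htm", "images/current-map.png"}

candidates = fat_unused_blobs(sample, still_in_use, min_bytes=1_000_000)
for size, sha, path in candidates:
    print(size, sha, path)
```

A human should still review the resulting list before anything is stripped, since a path missing from the current checkout may still be wanted in the history.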

7 participants