ChanChiChoi/guessTxtFileSeparator


guessTxtFileSeparator

Guess the simple separator of each txt file. The steps are as follows:

1 - Read the first 10 lines.

2 - Filter out the empty lines.

3 - Replace every character matching '[0-9a-zA-Z_]' with '@'.

4 - Count the number of occurrences of each character.

5 - Keep the ASCII characters whose code is in [1,126], and discard the '@' character.

6 - Get the minimum subset of characters from the step 5 result.

7 - Use re.findall('[{}]+'.format(subsetChars)) to get the candidate separators.

8 - Verify whether the separator is valid.
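The steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code. The function name `guess_separator` and the parameter `max_lines` are hypothetical. Two steps are interpreted by assumption: step 6 is taken to mean "the characters that occur in every sampled line", and step 8 is taken to mean "every sampled line splits into the same number of fields, and more than one".

```python
import re
from collections import Counter


def guess_separator(lines, max_lines=10):
    """Return a guessed separator string, or None if no guess passes the check."""
    # Steps 1-2: take the first lines and drop the empty ones.
    sample = [ln.rstrip("\r\n") for ln in lines[:max_lines]]
    sample = [ln for ln in sample if ln.strip()]
    if not sample:
        return None

    # Step 3: mask word characters so only potential separators remain.
    masked = [re.sub(r"[0-9a-zA-Z_]", "@", ln) for ln in sample]

    # Steps 4-5: count characters per line, keeping ASCII codes 1..126 except '@'.
    counts = [
        Counter(ch for ch in ln if 1 <= ord(ch) <= 126 and ch != "@")
        for ln in masked
    ]

    # Step 6 (assumed meaning): keep only characters present in every line.
    common = set(counts[0])
    for c in counts[1:]:
        common &= set(c)
    if not common:
        return None

    # Step 7: runs of those characters are the candidate separators.
    # Each character is escaped so that '-', ']' or '^' stay literal in the class.
    char_class = "".join(re.escape(ch) for ch in sorted(common))
    pattern = re.compile("[{}]+".format(char_class))
    runs = Counter()
    for ln in masked:
        runs.update(pattern.findall(ln))
    candidate = runs.most_common(1)[0][0]

    # Step 8 (assumed check): every line must split into the same number
    # of fields, and into more than one field.
    field_counts = {len(ln.split(candidate)) for ln in sample}
    if len(field_counts) == 1 and field_counts.pop() > 1:
        return candidate
    return None
```

To try it on a file, pass the lines read from it, for example `guess_separator(open(path).readlines())` with `path` pointing to a text file of your own. Taking the most frequent run as the single candidate is a simplification; a fuller version might try several candidates in order.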

For example (input line => guessed separator):

aaa,bbb => [,]

aaa\tbbb => [\t]

aaa----bbb => [----]

aaa # bbb => [ # ]
