Monday, July 22, 2013

Tag Cloud Trick for Sorting DNA Matches

The Tag Cloud Trick is a little-known but useful trick that those of us in the DNA communities sometimes use to help sort our matches and look for common surnames, haplogroups or locations. I am going to list the step-by-step instructions here so you can make this trick work for you.

Step 1: Download your CSV file from either Countries of Ancestry or Relative Finder (if you are a 23andMe customer). You may want to rename the file to something you will easily remember so that you can readily locate it on your computer.

Step 2: Create a list of words and abbreviations that you wish to exclude. For example, if you wish to sort by the most common surnames, then you would want to exclude states, locations, state abbreviations, haplogroups, and other common terms such as sort, mother, father, etc. I have put together a list of the ones I used when I created the photo above. This is not an all-inclusive list; as you can see from the photo, I missed a couple of haplogroups, but you get the idea.

sharing, cousin, genomes,  Alabama,Alaska,American Samoa,Arizona,Arkansas,California,Colorado,Connecticut,Delaware,District of Columbia,Florida,Georgia,Guam,Hawaii,Idaho,Illinois,Indiana,Iowa,Kansas,Kentucky,Louisiana,Maine,Maryland,Massachusetts,Michigan,Minnesota,Mississippi,Missouri,Montana,Nebraska,Nevada,New Hampshire,New Jersey,New Mexico,New York,North Carolina,North Dakota,Northern Marianas Islands,Ohio,Oklahoma,Oregon,Pennsylvania,Puerto Rico,Rhode Island,South Carolina,South Dakota,Tennessee,Texas,Utah,Vermont,Virginia,Virgin Islands,Washington,West Virginia,Wisconsin,introduction,haplogroup,3rd,4t,5th,6th,accepted,alabama,arkansas, ca, california,canada,carolina,co, county, distant,england,europe female, male,match,ireland,northern, maternal, paternal, public, united, tx, texas, va, wva, west virginia, washington, tx,family, father, 4th,e1b1b1a2,eastern, florida,g2a,germany,h1,h1a,h1c1,h1ch3,h4a1,h5a1,h7,il,i2b1,africa, city, de, france,ga,h10,I16,h1a1,h1a3,h1c,h3,i1,j1c1,j1c1,j1c3,j1c,k1,k1a4a1,kingdom,ky,multiple,nc,ny,oh,pa,r1a1,r1b1b2a1a1,regions,scotland,sent,states,switzerland,unknown,usa,wales,wv,x2b,u5a5a1a,u5b2b,i2b2,j1c2,j2a1a1,mo,mother,name,ok,ontario,r1a1a,r1b1b2a1a1a,r1b1b2a1a1d1,r1b1b2a1a1d,r1b1b2a1a2,r1b1b2a1a2d3,r1b1b2a1a2d,r1b1b2a1a2f2,r1b1b2a1a2f,r1b1b2a1a,sc,surnames,t1,t1a1,t2b,tn,ar,australia,brooklyn,denmark,h16,h1g,h5,hv, italy,j2a1a1b,j2a1a,r1b1b2a,southern,u2e1,u2e,u5a1,u5a1b1,u5b1b1,van,von,western,
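If you build up a list like this over time, it tends to collect duplicates, mixed capitalization and the odd missing comma. Here is a small Python sketch for tidying it before you paste it anywhere. It is only an illustration: the function name `clean_exclusions` is my own, and it assumes the tool you paste into matches single lowercase words, so an entry like "New York" gets split into "new" and "york".

```python
import re

def clean_exclusions(raw):
    """Split an exclusion list on commas and spaces, lowercase every word,
    and drop blanks and duplicates while keeping the original order."""
    seen = set()
    cleaned = []
    for item in re.split(r"[,\s]+", raw):
        word = item.lower()
        if word and word not in seen:
            seen.add(word)
            cleaned.append(word)
    return cleaned

# A short made-up excerpt in the same style as the list above.
sample = "sharing, cousin,  Alabama,alabama, tx, tx,New York,"
print(", ".join(clean_exclusions(sample)))
```

You can then copy the printed line into your exclusion box, or keep the cleaned list in a text file to reuse the next time you download your matches.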

Step 3: Go to this website: tagcrowd.com



Step 4: Once you arrive at the website you will see a box like the one above. The first thing to do is click on the tab that says "Upload File". Then locate the CSV file you saved on your computer and click to upload it. Next, set the maximum number of words to show to 150 (some use 100, I like 150), set the minimum frequency to 3, and tick yes to show frequencies. Then paste the list of words to exclude that I posted above into the box that says "Don't show these words". You can't see that box in the screenshot I have posted, but it is directly underneath the Show Frequencies option. Finally, click Visualize. Your list will populate on the next page, and you can then save your image or download it as a PDF if you choose.
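For anyone who would rather run the same kind of count on their own computer, here is a rough Python sketch of the idea behind those settings: count every word in the CSV, skip the excluded ones, drop anything that appears fewer than 3 times, and keep at most 150. This is not how tagcrowd.com itself is written. The function name `top_words`, the way words are split, and the sample rows are all my own assumptions for illustration.

```python
import csv
import io
import re
from collections import Counter

def top_words(rows, exclude, max_words=150, min_freq=3):
    """Count words across all CSV cells, skipping excluded words.
    Returns up to max_words (word, count) pairs, most frequent first,
    keeping only words that appear at least min_freq times."""
    stop = {w.lower() for w in exclude}
    counts = Counter()
    for row in rows:
        for cell in row:
            # Assumption: a "word" is a run of letters, digits or apostrophes.
            for word in re.findall(r"[a-z0-9']+", cell.lower()):
                if word not in stop:
                    counts[word] += 1
    frequent = [(w, n) for w, n in counts.most_common() if n >= min_freq]
    return frequent[:max_words]

# Made-up rows standing in for a downloaded match file.
sample_csv = io.StringIO(
    'Match,Surnames,Locations\n'
    'Cousin 1,"Smith, Jones",Texas\n'
    'Cousin 2,"Smith, Brown",Ohio\n'
    'Cousin 3,"Smith, Jones",Texas\n'
    'Cousin 4,"Jones, Davis",Texas\n'
)
for word, n in top_words(csv.reader(sample_csv), ["cousin", "texas", "ohio"]):
    print(f"{word} ({n})")
```

To try it on a real download, open your file with `open("your_file.csv", newline="")` and pass `csv.reader(...)` of that file in place of the sample rows. The column layout of the real file does not matter here, because every cell is scanned for words.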

Note: If you wish to see how often a haplogroup appears, you may opt to leave the haplogroups in and exclude other words and phrases instead. You have quite a few possibilities with this tool. For example, you can use it to see how often a particular location appears within your DNA relatives list. I have chosen to use surnames in my list, as you can see in the photo above. I have gone back into the exclusion list and added the words that I originally missed. Hope this helps!

Below is my own image from my CSV relative file. As you can see, I have missed a few words that should have been excluded, even with the updated list I posted above. You will just have to add to and take from the above list to suit your own needs. I will do my best to add these to the above list soon as well.


This is not a complete photo of my list; it had too many names to capture in one screenshot.

