Sources for data files:

Unknown (files taken from various sources; these need to be replaced before shipping):

	british-english.txt (the standard British English word list that ships with Ubuntu systems)

Mendeley:

	hint-email.txt
	hint-keywords.txt
	hint-institution.txt
	months.txt
	person-titles-after.txt
	person-titles.txt
	van-names.txt

CiteSeerX (0.12):
(The license for the data files is not stated explicitly in the files,
 but the project itself is released under the Apache License v2.0.
 The papers that describe the header parsing service
 designed for CiteSeer provide some background on the data sources.

 See Han, Giles et al. 2003, 'Automatic Document Metadata Extraction using Support Vector Machines'.)

	first-names.txt
	surnames.txt
	country-names.txt
	chinese-surnames.txt
