SigParser Email Parsing Improvements (Chinese, Italian, Korean, Google Meetings...)
This week we made a number of improvements to how we parse emails, all aimed at reducing bad parses.
These changes improve both the SigParser core product and the SigParser Email Parsing API.
Chinese, Italian and Korean Email Splitting
When the dividers between messages in an email chain are written in Chinese, Italian or Korean, SigParser can now split the chain into its individual emails correctly. This reduces the chance of attributing contact data to the wrong person.
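To illustrate the idea, here is a minimal sketch of splitting a plain-text chain on localized reply headers. It is not SigParser's actual implementation: the function name `split_chain` and the divider patterns (typical Gmail-style "wrote:" endings in English, Italian, Chinese and Korean) are assumptions chosen for the example, and a real parser would need many more patterns.

```python
import re

# Hypothetical divider endings, assumed for illustration only:
# English "wrote:", Italian "ha scritto:", Chinese "写道：", Korean "님이 작성:".
DIVIDER = re.compile(r"(?:wrote:|ha scritto:|写道[:：]|님이 작성:)\s*$")

# Leading "> " quote markers that mail clients add to quoted replies.
QUOTE_PREFIX = re.compile(r"^(?:>\s?)+")


def split_chain(body: str) -> list[str]:
    """Split an email chain into individual messages, newest first.

    Divider lines start a new message and are not kept in the output.
    """
    messages = [[]]
    for raw in body.splitlines():
        line = QUOTE_PREFIX.sub("", raw)
        if DIVIDER.search(line):
            messages.append([])  # a reply header begins the next message
            continue
        messages[-1].append(line)
    joined = ("\n".join(lines).strip() for lines in messages)
    return [text for text in joined if text]


chain = (
    "Thanks!\n"
    "Anna\n"
    "\n"
    "Il giorno lun 1 gen 2024 alle ore 10:00 Mario Rossi "
    "<mario@example.com> ha scritto:\n"
    "> Ciao Anna\n"
    "> Mario Rossi\n"
    "> CEO, Esempio Srl"
)
parts = split_chain(chain)
```

Once the chain is split, each signature can be parsed against the sender of its own message, so Mario's title is not attached to Anna.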
Google Meeting Notifications
Google Meeting notifications will no longer be parsed for contact data in signatures. We may bring this back later as a special parser, but for now the notifications are too unstructured and can cause contact data to be assigned to the wrong contacts. You'll still get the email addresses of the people on the root email.
- Excluded more email address patterns for spammy contacts.
- Improved Gmail signature block detection when -- is present but no email address is in the signature.
- Fixed an edge case where line breaks were not replaced with whitespace, which caused a bad name to be generated in rare cases.
- Handled a new German Gmail-style reply header.
- Titles now get a trailing "at" stripped. For example, when the signature said "Founder at BigCo" but the title was parsed as "Founder at", we now remove the "at".
- Added more plain text spammy emails to the training set to improve plain text detection of spam and other non-human-sent emails.
- Improved the algorithm that detects false signature data.
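Two of the fixes above, the trailing "at" on titles and the missing whitespace at line breaks, can be sketched in a few lines. This is only an illustration under assumptions: `clean_title` and `join_lines` are hypothetical helper names, not part of SigParser or its API.

```python
import re

# Matches a dangling "at" at the end of a title, e.g. "Founder at".
# The leading \s+ keeps words that merely end in "at" (like "Combat") intact.
TRAILING_AT = re.compile(r"\s+at\s*$", re.IGNORECASE)


def clean_title(title: str) -> str:
    """Remove a trailing "at" left over when the company name was cut off."""
    return TRAILING_AT.sub("", title.strip())


def join_lines(text: str) -> str:
    """Join text broken across lines with single spaces.

    Without the space, "Jane" + "Doe" on separate lines would be
    glued together into the bad name "JaneDoe".
    """
    return " ".join(text.split())


title = clean_title("Founder at")
name = join_lines("Jane\nDoe")
```

A full title such as "Founder at BigCo" passes through unchanged, because only an "at" at the very end of the string is removed.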