12-04-2017, 03:47 PM
Several anti-spam techniques are used to prevent email spam (unsolicited bulk email). No technique is a complete solution to the spam problem, and each involves a trade-off between incorrectly rejecting legitimate email (false positives) and failing to reject all spam (false negatives), along with the associated costs in time and effort. Anti-spam techniques can be divided into four broad categories: those requiring action by individuals, those that can be automated by email administrators, those that can be automated by email senders, and those used by researchers and law enforcement officials.
People tend to be much less bothered by spam slipping through the filters into their mailbox (false negatives) than by having wanted email ("ham") blocked (false positives). Balancing false negatives (missed spam) against false positives (rejected good email) is critical to a successful anti-spam system. Some systems give individual users some control over this balance, for example by letting them set "spam score" limits. Most techniques produce both kinds of error, to varying degrees. For example, an anti-spam system may use techniques with a high false-negative rate (missing a lot of spam) in order to reduce the number of false positives (rejecting good email).
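The effect of a "spam score" limit on the two kinds of error can be sketched in a few lines of Python. This is only an illustration: the scores, the labels, and the function name count_errors are made up for the example and do not come from any particular filter.

```python
def count_errors(scored_messages, threshold):
    """Return (false_positives, false_negatives) for a given spam-score limit.

    scored_messages is a list of (score, is_spam) pairs; a message is
    flagged as spam when its score is at or above the threshold.
    """
    false_positives = 0
    false_negatives = 0
    for score, is_spam in scored_messages:
        flagged = score >= threshold
        if flagged and not is_spam:
            false_positives += 1   # good email rejected
        elif not flagged and is_spam:
            false_negatives += 1   # spam that slipped through
    return false_positives, false_negatives


# Hypothetical (score, is_spam) pairs, invented for this illustration.
SAMPLE = [
    (0.5, False), (2.0, False), (4.5, False),
    (3.0, True), (6.0, True), (9.0, True),
]

for limit in (3.0, 5.0):
    fp, fn = count_errors(SAMPLE, limit)
    print(f"limit={limit}: false positives={fp}, false negatives={fn}")
```

On this toy data the stricter limit (3.0) catches every spam message but blocks one wanted email, while the looser limit (5.0) blocks no wanted email but lets one spam message through.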
Detecting spam based on the content of the email, either by detecting keywords such as "viagra" or by statistical means (content-based or non-content-based), is very popular. Content-based statistical means and keyword detection can be very accurate when they are correctly tuned to the types of legitimate email an individual receives, but they can also make mistakes such as detecting the keyword "cialis" in the word "specialist" (see also Internet censorship: over- and under-blocking). Spam authors frequently try to defeat these measures with typographic tricks, such as replacing letters with accented variants or with alternate characters that look identical to the intended characters but are internally distinct (e.g., replacing a Roman A with a Cyrillic A), or inserting other characters such as whitespace, non-printing characters, or bullets into a term to block pattern matching. This creates an arms race that demands increasingly complex keyword-detection methods.
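Both problems can be shown with a small Python sketch. It is a toy, not a real filter: the keyword list, the function names, and the tiny look-alike table are assumptions made for this example, and a real table of confusable characters would be far larger.

```python
import re

# Hypothetical keyword list, taken from the examples in the text.
KEYWORDS = ("viagra", "cialis")

def naive_match(text):
    """Plain substring search: flags "specialist" because it contains "cialis"."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in KEYWORDS)

# Match keywords only as whole words.
WORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)

def word_match(text):
    """Whole-word search: avoids the "specialist" false positive."""
    return WORD_PATTERN.search(text) is not None

# Tiny illustrative table: three Cyrillic look-alikes mapped to Latin letters,
# and the zero-width space removed. This is an assumption for the example.
LOOKALIKES = str.maketrans({
    "\u0430": "a",   # Cyrillic small a
    "\u0435": "e",   # Cyrillic small ie
    "\u043e": "o",   # Cyrillic small o
    "\u200b": None,  # zero-width space (non-printing)
})

def normalize(text):
    """Undo the obfuscations listed in LOOKALIKES before matching."""
    return text.translate(LOOKALIKES)

print(naive_match("Ask a specialist"))        # substring search is fooled
print(word_match("Ask a specialist"))         # whole-word search is not
print(word_match("vi\u0430gra"))              # Cyrillic "a" hides the keyword
print(word_match(normalize("vi\u0430gra")))   # found again after normalizing
```

Every new obfuscation needs another entry in the table or another rule, which is the arms race described above.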
The content also does not determine whether the email was unsolicited or sent in bulk, the two main characteristics of spam. So, if a friend sends you a joke that mentions "viagra", content filters can easily mark it as spam, even though it is neither unsolicited nor sent in bulk. Non-content-based statistical means can help reduce false positives because they rely on statistical properties of the message rather than blocking on content or keywords. As a result, you would still be able to receive a joke that mentions "viagra" from a friend.