Bayes: No autolearn information in X-Spam-Status

XXLRay

01.09.09, 12:58

I administer a mail server running sendmail, Amavis and SpamAssassin. For months the detection rate of the Bayesian filter has been below 10%, even though it is trained manually with more than 1,000 (one thousand) mails every day and autolearn is enabled.
I have now noticed that the X-Spam-Status header of the emails contains no autolearn information at all. Normally it should contain something like autolearn=spam, autolearn=ham or autolearn=no.

My /etc/mail/spamassassin/local.cf:

# Score from which on mail is treated as spam
required_hits 5
# When using _REQD_ and _SCORE_ tags set report_safe to 0 -> only X-Spam-header is modified
report_safe 0
# Add Information to header (not done on report_safe 0)
rewrite_header Subject [SPAM]
# Activate bayesian filter
use_bayes 1
# Learn from own detected mails if 1 !!! May also learn false negatives !!!
bayes_auto_learn 1
# Set limit for auto learned Ham
bayes_auto_learn_threshhold_nospam 0.0
# Set limit for auto learned Spam
bayes_auto_learn_threshold_spam 9.0
# Set which networks or hosts are considered 'trusted' by your mail
# server (i.e. not spammers)
# trusted_networks 212.17.35.
# Turn off Bayesian filtering separately from learning
# use_bayes_rules 0

The X-Spam-Status header of a representative spam email looks like this:

Yes, score=15.684 required=5 tests=[BAYES_50=0.001, DATE_IN_FUTURE_12_24=2.189, DNS_FROM_RFC_BOGUSMX=1.482, RATWARE_RCVD_PF=3.847, RCVD_ILLEGAL_IP=1.908, RCVD_IN_PBL=0.905, RCVD_IN_XBL=3.033, RDNS_NONE=0.1, TVD_SPACE_RATIO=2.219]

Ham email:

No, score=-3.3 required=5 tests=[ALL_TRUSTED=-1.8, BAYES_00=-1.5]

Uncertain email:

Yes, score=6.785 required=5 tests=[BAYES_00=-1.5, DCC_CHECK=2.17, HTML_MESSAGE=0.001, MIME_HTML_ONLY=1.457, RCVD_IN_PBL=0.905, RCVD_IN_SORBS_WEB=0.619, RCVD_IN_XBL=3.033, RDNS_NONE=0.1]

1) For the spam mail I would expect the information autolearn=spam, since its score of 15.684 is above the bayes_auto_learn_threshold_spam of 9.0. For the ham mail I would expect autolearn=ham, because its score of -3.3 is below the bayes_auto_learn_threshhold_nospam of 0.0. For the uncertain mail I would expect autolearn=no, because its score of 6.785 lies right between the two thresholds. Am I wrong about that?

2) Is there an option that prevents the autolearn information from being written to the header? Is that perhaps why nothing shows up there?

3) I took over the server from my predecessor, who (as so often) can no longer be reached. Could it be that a completely different file is used for the configuration? How do I find out which SpamAssassin configuration is active?
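
The expectation in question 1 can be written down as a small Python model. This is a simplified sketch, not the AutoLearnThreshold plugin itself: it parses the tests=[...] list of an X-Spam-Status value, leaves out the BAYES_* rules (SpamAssassin does not count the Bayes rules when deciding whether to autolearn) and compares the remaining sum with the two thresholds. The real plugin computes its own score with a score set that may differ from the one shown in the header and, according to the SpamAssassin documentation, additionally requires at least 3 points from header rules and 3 points from body rules before a mail is learned as spam; neither detail is modeled here. The function names are hypothetical.

```python
import re

# Matches one "RULE_NAME=score" entry inside tests=[...]
TEST_RE = re.compile(r"([A-Z0-9_]+)=(-?\d+(?:\.\d+)?)")


def parse_tests(status):
    """Return {rule: score} from the tests=[...] part of an X-Spam-Status value."""
    m = re.search(r"tests=\[([^\]]*)\]", status)
    if m is None:
        return {}
    return {name: float(score) for name, score in TEST_RE.findall(m.group(1))}


def autolearn_decision(status, spam_threshold=9.0, nonspam_threshold=0.0):
    """Simplified model: return ("spam" | "ham" | "no", score without BAYES_* rules)."""
    tests = parse_tests(status)
    # Bayes rules are not counted towards the autolearn decision
    score = sum(s for name, s in tests.items() if not name.startswith("BAYES_"))
    if score >= spam_threshold:
        return "spam", score
    if score < nonspam_threshold:
        return "ham", score
    return "no", score


if __name__ == "__main__":
    ham = "No, score=-3.3 required=5 tests=[ALL_TRUSTED=-1.8, BAYES_00=-1.5]"
    print(autolearn_decision(ham))  # ('ham', -1.8)
```

Fed with the three headers above, the sketch returns spam (15.683), ham (-1.8) and no (8.285). Leaving out BAYES_00=-1.5 actually raises the uncertain mail from 6.785 to 8.285, which is still below 9.0.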

muell200

01.09.09, 13:13

1) At this point I would expect the information autolearn=spam, since the score of 15.684 is above the bayes_auto_learn_threshold_spam of 9.0. Am I wrong about that?

auto_learn 1

XXLRay

01.09.09, 13:18

auto_learn 1

Does that mean the header should say "auto_learn 1" instead of "autolearn=ham" (which is not the case either)? Or does local.cf need "auto_learn 1" instead of "bayes_auto_learn 1", or both?

XXLRay

01.09.09, 14:17

Could it also be a matter of file permissions? Amavis runs as vscan, but the SpamAssassin directories all belong to root (although others have read permission).
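
Autolearning, like manual training, writes to the Bayes database, so what matters is whether the user amavis runs as (vscan here) can write there; read access for others is not enough. Where the database lives is set by the bayes_path option (by default ~/.spamassassin/bayes of the user running the scan). The following Python sketch evaluates the classic owner/group/other write bits of a path for a given uid and set of group ids. It is a simplification that ignores ACLs and read-only mounts, and it does not check that creating files in a directory also needs execute permission on it; both function names are hypothetical.

```python
import os
import stat


def can_user_write(path, uid, gids):
    """Check the owner/group/other write bit of `path` for a given uid and group ids.

    Simplified POSIX check: ACLs, capabilities and read-only mounts are ignored.
    """
    if uid == 0:
        return True  # root bypasses the mode bits
    st = os.stat(path)
    if st.st_uid == uid:
        return bool(st.st_mode & stat.S_IWUSR)  # owner bits apply exclusively
    if st.st_gid in gids:
        return bool(st.st_mode & stat.S_IWGRP)  # group bits apply exclusively
    return bool(st.st_mode & stat.S_IWOTH)


def user_ids(name):
    """Return (uid, set of gids) for a local user name such as "vscan" (Unix only)."""
    import grp
    import pwd

    pw = pwd.getpwnam(name)
    gids = {pw.pw_gid} | {g.gr_gid for g in grp.getgrall() if name in g.gr_mem}
    return pw.pw_uid, gids
```

Usage would be uid, gids = user_ids("vscan") followed by can_user_write(path, uid, gids) for the Bayes directory and each bayes_* file in it; the path has to be taken from the actual bayes_path setting.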

XXLRay

04.09.09, 14:32

The _AUTOLEARN_ tag seems to be supported only from SpamAssassin version 3 onwards. I have found several versions of SpamAssassin on the server. How can I find out which version Amavis starts?

XXLRay

04.09.09, 15:20

At least the installed version seems to be recent enough:

spamassassin -V
SpamAssassin version 3.2.4
running on Perl version 5.8.5

But is it certain that amavis actually calls this one?
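
The version check can be scripted by parsing the output of spamassassin -V shown above. Whether amavis uses that same installation is a separate question: amavisd-new does not run the spamassassin script but loads the Mail::SpamAssassin Perl modules into its own process, so with several installations on the server the version amavis gets presumably depends on which modules its Perl finds first. The sketch only parses the text; the function names are hypothetical and the 3.0 cut-off is taken from the post above.

```python
import re

VERSION_RE = re.compile(r"SpamAssassin version (\d+)\.(\d+)\.(\d+)")


def parse_sa_version(output):
    """Extract (major, minor, patch) from `spamassassin -V` output."""
    m = VERSION_RE.search(output)
    if m is None:
        raise ValueError("no SpamAssassin version found in output")
    return tuple(int(part) for part in m.groups())


def supports_autolearn_tag(version):
    # Cut-off taken from the thread: _AUTOLEARN_ is assumed to need SpamAssassin 3.x
    return version >= (3, 0, 0)
```

The output can be captured with subprocess.run(["spamassassin", "-V"], capture_output=True, text=True).stdout and passed to parse_sa_version; for the output above the result is (3, 2, 4).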

XXLRay

08.09.09, 08:24

OK, let me put it differently: "Should I be able to see (e.g. via top or ps) that spamd or spamassassin is being called? Because that is not the case."
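
Probably not: amavisd-new does not start spamd or the spamassassin script but calls SpamAssassin as a Perl library inside its own child processes, so in top or ps the scanning shows up as amavisd processes. To see what is actually running, a Linux-only Python sketch can read /proc/<pid>/cmdline and filter on substrings such as "amavisd" or "spamd" (the function name is hypothetical):

```python
import os


def find_processes(substrings, proc_root="/proc"):
    """Return [(pid, cmdline)] for processes whose command line contains any substring.

    Linux-only: reads /proc/<pid>/cmdline. Processes that exit meanwhile or
    cannot be read are skipped.
    """
    hits = []
    if not os.path.isdir(proc_root):
        return hits
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue
        # Arguments are NUL-separated; kernel threads have an empty cmdline
        cmd = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        if cmd and any(s in cmd for s in substrings):
            hits.append((int(entry), cmd))
    return hits
```

On the mail server, find_processes(["amavisd", "spamd"]) would then be expected to list amavisd processes only.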

XXLRay

08.09.09, 11:31

"auto_learn 1" seems to have been the wrong tip, by the way:

spamassassin --lint
[18024] warn: config: failed to parse line, skipping, in "/etc/mail/spamassassin/local.cf": auto_learn 1
[18024] warn: config: failed to parse line, skipping, in "/etc/mail/spamassassin/local.cf": bayes_auto_learn_threshhold_nospam 0.0
[18024] warn: config: failed to parse line, skipping, in "/etc/mail/spamassassin/local.cf": bayes_auto_learn_threshold_spam 9.0
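
The three warnings mean these lines are ignored. auto_learn is not recognised by this SpamAssassin version (3.2.4) at all. The two threshold options are provided by the AutoLearnThreshold plugin, so they are only understood when that plugin is loaded; in addition, the SpamAssassin documentation spells the ham threshold bayes_auto_learn_threshold_nonspam, not bayes_auto_learn_threshhold_nospam. A small Python sketch that extracts the rejected lines from captured lint output; the function name and the regular expression (written against the three lines above) are assumptions.

```python
import re

# Format of the warnings shown above (SpamAssassin 3.2.4)
LINT_RE = re.compile(
    r'warn: config: failed to parse line, skipping, in "(?P<file>[^"]+)": (?P<line>.*)$'
)


def rejected_config_lines(lint_output):
    """Return [(config_file, option_name, full_line)] for every unparsable config line."""
    rejected = []
    for raw in lint_output.splitlines():
        m = LINT_RE.search(raw)
        if m:
            line = m.group("line").strip()
            option = line.split()[0] if line else ""
            rejected.append((m.group("file"), option, line))
    return rejected
```

Capturing both stdout and stderr of spamassassin --lint is the safer choice, since it is not obvious from the output above which stream the warnings go to.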

XXLRay

08.09.09, 12:51

Apparently this is because Amavis takes over all header editing itself:

Does SpamAssassin observe settings in its configuration file local.cf?
SA does observe all settings in its configuration file, but not all of them have effect on the mail being checked, as amavisd-new does its own decisions based on spam score (hits) (so for example required_hits has no effect - use tag/tag2/kill amavisd-new settings instead), and does its own header editing, and body is not modified. Read on for related information.
SpamAssassin has configuration options to modify mail body and header, but they seem to be ignored.
amavisd-new does not modify mail body or let SA do it (with the exception of defanging, introduced with amavisd-new-2.0). All mail (header) editing is done by amavisd-new and not by SA. Even though SA does observe options in its configuration file to rewrite mail body and modify mail header, the result is purposely not used by amavisd-new. There are two reasons for that: SA is only called once per message regardless of the number of recipients, and secondly, to be able to offer a guarantee the mail body will not be altered. This means the per-recipient handling of mail relaying and header editing needs to be done entirely in amavisd-new, as there are no provisions in SA to analyze mail once and then prepare different modifications for different recipients based on the same spam analysis. It is a tradeoff: speed for multi-recipient mail versus the full per-recipient flexibility. It would make no sense to fully duplicate the spamc/spamd functionality in amavisd-new. If you need such features, just disable calling SA from amavisd-new, and use the spamc/spamd or other back-end interface to SA.

Supposedly, though, you can have the SA headers added:

When calling SA via amavisd-new, header editing is handled strictly by
amavisd-new, not SA (see http://www.ijs.si/software/amavisd/#faq-spam). You
can get amavisd-new to add the SA report to the headers, which should
include the autolearn header, by adding (or uncommenting) the following in
your amavisd.conf:

$sa_spam_report_header = 1;  # insert X-Spam-Report header field? default false

Beware, this will add lots of additional header lines to your messages

So far, however, this has not worked for me.

XXLRay

08.09.09, 13:38

In /var/log/amavisd-info.log I find lots of entries like this one, in which the mail is marked as not autolearned:

Sep 8 14:32:18 censored amavis[30625]: (30625-08) SPAM, <censored@censored.org> -> <censored@censored.de>, Yes, score=25.625 tag=x tag2=5 kill=6.9 tests=[BAYES_50=0.001, DCC_CHECK=2.17, HELO_DYNAMIC_IPADDR2=4.395, HELO_DYNAMIC_SPLIT_IP=3.493, HTML_MESSAGE=0.001, MIME_HTML_ONLY=1.457, RCVD_IN_BL_SPAMCOP_NET=1.96, RCVD_IN_XBL=3.033, TVD_RCVD_IP=1.931, URIBL_AB_SURBL=1.86, URIBL_BLACK=1.955, URIBL_JP_SURBL=1.501, URIBL_WS_SURBL=1.5, URI_HEX=0.368], autolearn=no, quarantine censored (spam-quarantine)

The patterns "=ham" and "=spam" do not occur at all. Presumably that means autolearn is not working. Why is that? What further information do you need in order to help?
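
To check that "=ham" and "=spam" really never occur, the autolearn values in the amavis log can be counted. A Python sketch, assuming each log entry is a single line in the file (the wrapping in the forum display is not part of the log) and that the log is read from the path named in this post; the function name is hypothetical.

```python
import re
from collections import Counter

AUTOLEARN_RE = re.compile(r"\bautolearn=(\w+)")


def count_autolearn(lines):
    """Count autolearn=<value> occurrences (spam, ham, no, ...) in amavis log lines."""
    counts = Counter()
    for line in lines:
        m = AUTOLEARN_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

Passing an open file works directly, e.g. count_autolearn(open("/var/log/amavisd-info.log", errors="replace")); a result with only a "no" entry would match the observation above.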

XXLRay

08.09.09, 14:31

The following line has to be added to /etc/mail/spamassassin/v320.pre (or v310, ...):

loadplugin Mail::SpamAssassin::Plugin::AutoLearnThreshold

Now at least an "autolearn=spam" also shows up in /var/log/spamassassin/amavisd-info.log. I hope this improves my hit rate significantly.
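
Whether the plugin line is actually active, and not just present as a commented-out line, can be checked by scanning the .pre files in the site configuration directory. A Python sketch, assuming that directory is /etc/mail/spamassassin as in this thread and that the loadplugin line lives in a .pre file (SpamAssassin reads the .pre files before the .cf files; as far as I know loadplugin is also accepted in .cf files, which this check ignores). The function name is hypothetical.

```python
from pathlib import Path

PLUGIN = "Mail::SpamAssassin::Plugin::AutoLearnThreshold"


def find_loadplugin(config_dir, plugin=PLUGIN):
    """Return the .pre files (sorted) that contain an uncommented loadplugin line for plugin."""
    found = []
    for pre in sorted(Path(config_dir).glob("*.pre")):
        for raw in pre.read_text(encoding="utf-8", errors="replace").splitlines():
            words = raw.split("#", 1)[0].split()  # drop comments, split on whitespace
            if len(words) >= 2 and words[0] == "loadplugin" and words[1] == plugin:
                found.append(pre)
                break
    return found
```

An empty list from find_loadplugin("/etc/mail/spamassassin") would match the situation before the fix: the threshold options are then rejected by --lint and amavis logs autolearn=no.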

XXLRay

08.09.09, 16:21

Within 3 hours the detection rate of the Bayesian filter improved from under 5% to over 50%. I take that as a sign that the fix worked.