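Today I took a detailed look at Squid's hit rate and found that URLs carrying parameters are basically set to no_cache in Squid. Even if you do cache them, each parameterized URL is cached as its own object, which can cause frequent fetches back to the origin server. After some digging, I found that Squid has a redirector setting that can rewrite URLs. There are several programs of this kind, or you can write your own script, but a script is probably not very efficient, so a program written in C is the better choice.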
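To make the idea concrete, here is a minimal sketch of a redirector that drops the query string so that all variants of a parameterized URL map to one cache object. It is written in Python only for illustration (as noted above, a C program would be faster). It assumes the classic helper interface, where Squid sends one line per request in the form "URL client_ip/fqdn ident method" and an empty reply means "leave the URL unchanged", and it assumes url_rewrite_concurrency is not enabled. The names strip_query, rewrite_line and serve are made up for this example.

```python
import sys
from typing import TextIO
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Return url without its query string and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def rewrite_line(line: str) -> str:
    """Map one request line from Squid to one reply line.

    Assumed input format: "URL client_ip/fqdn ident method".
    An empty reply tells Squid to keep the original URL.
    """
    fields = line.split()
    if not fields:
        return ""
    url = fields[0]
    if "?" not in url:
        return ""  # nothing to strip, leave the URL alone
    return strip_query(url)


def serve(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Answer requests line by line until Squid closes the pipe."""
    for line in stdin:
        stdout.write(rewrite_line(line) + "\n")
        stdout.flush()  # Squid waits for every reply, so never buffer


# A helper script would end with a call to serve().
```

Dropping the query string is only safe when the parameters do not change the returned content (for example, cache-busting version numbers on static files); a real deployment would restrict the rewrite to such URLs.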
The page below describes which widely used redirectors exist besides squidGuard.
What’s New
- A patched version of squidguard
- New software: Hostess
- New software: soulcatcher
- A mailing list has been created
- A Web interface to add or remove urls from our databases
Introduction
This is not the official squidGuard homepage, only a happy user’s page.
The official homepage was http://www.squidguard.org. It seems to be out of service. Mirko Lorenz created a mirror here. SquidGuard is a redirector which uses Sleepycat’s version of Berkeley Database.
Its authors are
- Pål Marius Baltzersen
- Lars-Erik Håland
A new patched version of squidGuard 1.2.10, with this README, is available. It is a compilation of patches from many contributors. I didn’t even change a dot. Thanks to Franck Bourdonnec for suggesting this packaged version.
The last stable version is 1.2.0.
Here is the ChangeLog.
It needs a recent version of Berkeley Database (> 3.2 but < 4.x).
An ftp directory is available here, in France at : ftp://ftp.univ-tlse1.fr/pub/reseau/cache/squidGuard/ I began a contrib directory here : ftp://ftp.univ-tlse1.fr/pub/reseau/cache/squidguard_contrib/
It has new interesting features
- It can filter the surfing duration on a user basis.
You can find some explanations in other languages here :
- In German. You will find some scripts to update the database, and a German database
- In English.
- An unofficial Faq
Comparison
Competitors
SquidGuard and its competitors
- ufdbguard is a free and fast (8-9 times faster than squidGuard) redirector which can use a commercial URL database: http://www.urlfilterdb.com
- http://tatewake.com/projects.htm Tatewake works on Windows and is written in Java. It uses squidGuard’s database to create a hosts file for Windows.
- http://soulcatcher.sourceforge.net Soulcatcher is very similar to squidguard
- http://software.othello.ch/mod_dnsbl a DNS-based blocking module designed for Apache and Squid. It can use our database.
- An access module for filtering ISA server
- redirector from Ian Lea is written in Perl. It was the first redirector available. Slow and memory-hungry.
- Squirm 1.0b. The first redirector written in C. Faster and more careful with memory than redirector. No longer available.
- JunkBuster, which is not really a redirector (in the Squid sense) but rather a second proxy that connects to Squid. It strips banners from web pages.
- Jesred. “Son” of Squirm. Faster, but needs a little more memory.
- Custodian. It “hashed” its blocklist to hide URLs. The file is no longer available. I haven’t tested it.
- RedServer. Another one. No longer available.
- DansGuardian. “Son” of Active Guardian. It inspects page content and PICS level. It can filter faster than squidGuard (I haven’t tested it yet, but will soon) and is now a competitor to squidGuard rather than a complement. It’s a filtering proxy, not a redirector.
- Many commercial sites sell filtering databases and software.
Advantages
- It’s a lot faster. Filtering a list of 2,000 URLs against an 11,000-URL database on a Pentium 233 took:
- Squirm: 2 minutes 25 seconds
- Jesred: 1 minute 45 seconds
- SquidGuard: 9 seconds!
Its speed hardly depends on database size:
- a 100-URL database took 6 seconds
- an 11,000-URL database took 9 seconds (the adult database now contains 100,000 URLs)
- Faster, in our case, means fewer redirectors needed: 20 for squidGuard, 25 for Squirm. From a chart of redirector usage:
- with Squirm you need 3 redirectors 60% of the time; with squidGuard, only 5%
- with Squirm you need 7 redirectors 10% of the time; with squidGuard, less than 0.5%
- IP address of client
- User identity (RFC 1413) or login/password
- URL (of course)
- “Class” of redirection (e.g. we can define a class banner, adult, and so on…)
contrib
- For Squirm, some patterns
- To show web usage: a script which reports a VERY APPROXIMATE proportion of URL classes (erotic, hacking, mp3, warez) in your cache: taille_categorie_squid.pl
- Some very useful scripts to detect pornographic URLs in the Squid log (and more), made by Cedric Foll: http://savannah.nongnu.org/projects/pornfind/
- The usability of a database depends on your users: MIT students are not golden boys, who in turn are not children.
- Some virus-filtering add-ons exist. They are connected through a redirector (Squirm or squidGuard) and send files to a virus scanner. Some of them:
- http://viralator.loddington.com A specifically modified version (0.9b2) which works with squidGuard is available here: viralator-squidguard.pl.txt. It was patched by Ankit Jain. To use it, set up this redirection:
redirect http://127.0.0.1/viralator.cgi?url=%u
Be careful with Internet Explorer: scanning files transferred over FTP doesn’t work properly.
- http://www.hycomat.co.uk/viromat/
Some databases
For all information on the databases (contributors, size, download method), look at this page: http://cri.univ-tlse1.fr/blacklists
Related Projects
- http://www.surbl.org is a site for fighting spam. As we know, the porn business likes spam. SURBL will likely create a DNS zone for adult web sites…
FAQ
- Squid 2.6 isn’t working. It renames two directives:
- redirect_program is replaced by url_rewrite_program
- redirect_children is replaced by url_rewrite_children
- A new directive appears: url_rewrite_concurrency
- Nothing is blocked. There can be many reasons:
- Unix access rights are incorrect. The user that launches Squid is the same one that launches squidGuard, so this user must be able to read the text databases and to write the db files and the log files. The directory should look like this:
drwxr-xr-x 2 root root 1024 avr 2 2001 logs
-rw-r----- 1 squid squid 100000 oct 23 08:13 logs/squidGuard.log
-rw-r----- 1 squid squid 1000 oct 23 08:13 logs/squidGuard.error
drwxr-xr-x 2 root root 1024 avr 2 2001 db
drwxr-xr-x 2 root root 1024 avr 2 2001 db/dest
drwxr-xr-x 2 squid squid 1024 avr 2 2001 db/dest/adult
-rw-r--r-- 1 squid squid 1024 avr 2 2001 db/dest/adult/domains
-rw-r--r-- 1 squid squid 1024 avr 2 2001 db/dest/adult/domains.db
-rw-r--r-- 1 squid squid 1024 avr 2 2001 db/dest/adult/urls
-rw-r--r-- 1 squid squid 1024 avr 2 2001 db/dest/adult/urls.db
...
drwxr-xr-x 2 squid squid 1024 avr 2 2001 db/dest/warez
...
drwxr-xr-x 2 root root 1024 avr 2 2001 db/src
...
- You forgot none at the end of the rule. The default is “accept”.
- You made a syntax error. Time definitions are quite tricky.
- I can’t download the blacklist. There can be many reasons:
- You, or your enterprise, are protected by a “low level” firewall which is unable to understand “active ftp”. Check your ftp client and change the mode to “passive ftp”.
- You, or your enterprise, are protected by a “too sensitive” “high level” firewall which says that the file path is too long. Firewall1 (Checkpoint) is one of these. Look at ftp://ftp.univ-tlse1.fr/blacklist which is a link to ftp://ftp.univ-tlse1.fr/pub/cache/squidguard_contrib. It may help.
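For the access-rights problem listed under “Nothing is blocked”, a quick check can save some guessing. The sketch below is an assumption-laden helper, not part of squidGuard: it assumes the layout from the listing above (a db directory with text lists and .db files, and a logs directory), the function name check_access is made up, and it only reports what the current user can do, so it has to be run as the same user that launches Squid.

```python
import os
from typing import List


def check_access(base: str) -> List[str]:
    """Return a list of access problems for the current user under base.

    Assumed layout: base/db holds text lists and .db files,
    base/logs holds the squidGuard log files.
    """
    problems: List[str] = []
    db_dir = os.path.join(base, "db")
    log_dir = os.path.join(base, "logs")

    for directory in (db_dir, log_dir):
        if not os.path.isdir(directory):
            problems.append(f"missing directory {directory}")

    # Every database file must be readable; .db files must also be writable.
    for root, _dirs, files in os.walk(db_dir):
        for name in files:
            path = os.path.join(root, name)
            if not os.access(path, os.R_OK):
                problems.append(f"cannot read {path}")
            if name.endswith(".db") and not os.access(path, os.W_OK):
                problems.append(f"cannot write {path}")

    # Log files must be writable.
    if os.path.isdir(log_dir):
        for name in os.listdir(log_dir):
            path = os.path.join(log_dir, name)
            if os.path.isfile(path) and not os.access(path, os.W_OK):
                problems.append(f"cannot write {path}")

    return problems
```

An empty list means that no problem was found for the files that already exist; it does not prove that new .db files can be created in each directory.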
Definition
- Redirector :
Official Squid FAQ definition : http://squid.nlanr.net/Squid/FAQ/FAQ-15.html
A redirector is a program which connects to Squid and makes it possible to “translate” URLs before Squid processes them:
- for restricting access (erotic or financial URLs)
- for stripping banners to speed up web usage
- for redirecting URLs to a local mirror (e.g. Netscape Navigator downloads)
To put one in place, uncomment the redirector line in squid.conf and enter the number of child processes you need:
redirect_program /usr/local/squidGuard/bin/squidGuard
redirect_children 20
Common redirectors use from 800 KB to 1,600 KB of memory.
If you don’t mind a very rare “workaround” (less than 0.01%), you can also let requests bypass the redirectors when they are all busy, by adding this line:
redirector_bypass on
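The decision a redirector makes for each request can be sketched in a few lines. This is an illustration of the concept only, not how squidGuard is implemented. It assumes the classic request line "URL client_ip/fqdn ident method", and every name and value in it (BLOCKED_DOMAINS, UNFILTERED_CLIENTS, BLOCK_PAGE, parse_request, decide) is hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

# Hypothetical sample data, not taken from any real blacklist.
BLOCKED_DOMAINS = {"ads.example.com": "banner", "adult.example.net": "adult"}
UNFILTERED_CLIENTS = {"192.0.2.10"}
BLOCK_PAGE = "http://127.0.0.1/blocked.html"


@dataclass
class Request:
    url: str
    client_ip: str
    ident: str
    method: str


def parse_request(line: str) -> Request:
    """Parse "URL client_ip/fqdn ident method"; extra fields are ignored.

    Raises ValueError if the line has fewer than four fields.
    """
    url, client, ident, method = line.split()[:4]
    return Request(url, client.split("/")[0], ident, method)


def decide(req: Request) -> str:
    """Return the replacement URL, or "" to leave the request untouched."""
    if req.client_ip in UNFILTERED_CLIENTS:
        return ""
    host = urlsplit(req.url).hostname or ""
    labels = host.split(".")
    # Match the host itself and each parent domain against the list.
    for i in range(len(labels)):
        category = BLOCKED_DOMAINS.get(".".join(labels[i:]))
        if category is not None:
            return f"{BLOCK_PAGE}?class={category}"
    return ""
```

A real helper would call decide for every line read from standard input and write the result, followed by a newline, to standard output, flushing after each reply.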