Jan 21 2009

Squidguard

Category: Technology · ssmax @ 13:34:40

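Today I took a close look at Squid's hit rate and found that URLs carrying parameters are basically always set to no_cache. Even if you do cache them, each URL is cached together with its parameters as a separate object, which can lead to frequent trips back to the origin server. After some searching I found that Squid has a redirector setting that can rewrite URLs. Several programs of this kind exist, or you can write your own script, although a script is probably not very efficient; a program written in C is the better choice.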
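To make the idea concrete, here is a minimal sketch of such a script in Python. It assumes the classic line-based redirector format (one request per line, with the URL as the first field, and an empty reply meaning "leave the URL unchanged"). The suffix list and the function names are hypothetical, chosen only for illustration; stripping parameters is only safe for objects whose content does not depend on them.

```python
import sys

# Hypothetical list of static object types whose parameters can be ignored.
STATIC_SUFFIXES = (".jpg", ".png", ".gif", ".css", ".js")


def strip_query(url):
    """Drop the query string, but only for URLs that look like static files."""
    base, sep, _query = url.partition("?")
    if sep and base.lower().endswith(STATIC_SUFFIXES):
        return base
    return url


def rewrite_line(line):
    """Turn one request line from Squid into the reply line for Squid.

    Assumed input format: URL client_ip/fqdn ident method
    An empty reply asks Squid to keep the original URL.
    """
    fields = line.split()
    if not fields:
        return ""
    url = fields[0]
    rewritten = strip_query(url)
    return rewritten if rewritten != url else ""


def serve(stdin=sys.stdin, stdout=sys.stdout):
    """Answer requests until Squid closes the pipe."""
    for line in stdin:
        stdout.write(rewrite_line(line) + "\n")
        stdout.flush()  # Squid waits for every reply, so never buffer output
```

To use it as a helper, call `serve()` at the end of the script. As noted above, an interpreted script like this is likely to be slower than a redirector written in C.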

The page quoted below also covers which other redirectors, besides squidGuard, are widely used.

What’s New

Introduction

This is not the official squidGuard homepage, only a happy user’s page.
The official homepage was http://www.squidguard.org, which seems to be out of service. Mirko Lorenz created a mirror here. SquidGuard is a redirector that uses Sleepycat’s version of Berkeley Database.
Its authors are:

  • Pål Marius Baltzersen
  • Lars-Erik Håland

A new patched version, squidGuard 1.2.10, is available with this README. It is a compilation of patches from many contributors; I didn’t change a single character. Thanks to Franck Bourdonnec for suggesting this packaged version.
The last stable version is 1.2.0.
Here is the ChangeLog.
It needs a recent version of Berkeley Database (> 3.2 but < 4.x).
An FTP directory is available in France: ftp://ftp.univ-tlse1.fr/pub/reseau/cache/squidGuard/ I started a contrib directory here: ftp://ftp.univ-tlse1.fr/pub/reseau/cache/squidguard_contrib/
It has interesting new features:

  • It can limit surfing time on a per-user basis.

You can find some explanations in other languages here :

Comparison

Competitors

SquidGuard and its competitors

Advantages

  • It’s a lot faster: filtering a list of 2,000 URLs against an 11,000-URL database on a Pentium 233 takes:
    • Squirm: 2 minutes 25 seconds
    • Jesred: 1 minute 45 seconds
    • SquidGuard: 9 seconds!

    We can say that it barely depends on database size:

    • a 100-URL database took 6 seconds
    • an 11,000-URL database took 9 seconds (the adult database now contains 100,000 URLs)
  • Faster, in our case, means fewer redirectors are needed: 20 for squidGuard, 25 for Squirm. According to a chart of redirector usage:
    • with Squirm you need 3 redirectors 60% of the time; with squidGuard, only 5%
    • with Squirm you need 7 redirectors 10% of the time; with squidGuard, less than 0.5%
  • It can redirect URLs depending on:
    • IP address of the client
    • User identity (RFC 1413) or login/password
    • URL (of course)
    • “Class” of redirection (e.g. we can define classes such as banner, adult, and so on)
  • Since database size doesn’t matter, we can list MANY URLs. We therefore need fewer generic regular expressions, which cause many errors (e.g. the computer xxx in the nasa.gov domain)
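The difference between a generic regular expression and a large explicit list can be illustrated with a short Python sketch. Everything here is hypothetical: the pattern, the blocked domain, and the function names are made up, and a Python set merely stands in for squidGuard's Berkeley DB files. Matching parent domains is also an assumption of this sketch.

```python
import re
from urllib.parse import urlsplit

# Hypothetical generic pattern of the kind a regex-based redirector might use.
GENERIC_PATTERN = re.compile(r"xxx")

# Hypothetical stand-in for a large domain list; lookups stay fast as it grows.
BLOCKED_DOMAINS = {"adult-site.example"}


def blocked_by_regex(url):
    """Block any URL that contains the generic pattern anywhere."""
    return GENERIC_PATTERN.search(url) is not None


def blocked_by_domain_list(url, blocked=BLOCKED_DOMAINS):
    """Block only if the host or one of its parent domains is listed."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))
```

With these definitions, `http://xxx.nasa.gov/index.html` is blocked by the regular expression (a false positive) but not by the domain list, while `http://www.adult-site.example/page` is blocked by the list.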
    Contrib

    This part shows some personal contributions: scripts, databases, and some advice.
    • For Squirm, some patterns
    • To show web usage: a script, taille_categorie_squid.pl, which gives a VERY APPROXIMATE proportion of URL classes (erotic, hacking, mp3, warez) in your cache
    • Some very useful scripts to detect pornographic URLs in Squid logs (and more), written by Cedric Foll: http://savannah.nongnu.org/projects/pornfind/
    • The usefulness of a database depends on your users: MIT students are not golden boys, who are not children.
    • Some virus-filtering add-ons exist. They are connected through the redirector (Squirm or squidGuard) and send their files to a virus scanner. Some of them:

    Some databases

    For full information on the databases (contributors, size, download method), see this page: http://cri.univ-tlse1.fr/blacklists

    Related Projects

    • http://www.surbl.org is a site for preventing spam. As we know, the porn business likes spam. SURBL will likely create a DNS zone for adult web sites…

    FAQ

    The original FAQ can be found at http://www.squidguard.org/faq/. An additional FAQ is at http://www.maynidea.com/squidguard/faq-plus.html
    • Squid 2.6 isn’t working: it renames the redirector directives
      • redirect_program is replaced by url_rewrite_program
      • redirect_children is replaced by url_rewrite_children
      • a new directive, url_rewrite_concurrency, appears
    • Nothing is blocked. Many possible reasons:
      • Unix access rights are incorrect. The user who launches Squid is also the one who launches squidGuard, so this user must be able to read the text databases and to write the db files and log files. The directories should look like this:
        		drwxr-xr-x  2 root     root	     1024 avr  2  2001 logs
        		-rw-r-----  1 squid    squid	   100000 oct 23 08:13 logs/squidGuard.log
        		-rw-r-----  1 squid    squid	     1000 oct 23 08:13 logs/squidGuard.error
        		drwxr-xr-x  2 root     root	     1024 avr  2  2001 db
        		drwxr-xr-x  2 root     root	     1024 avr  2  2001 db/dest
        		drwxr-xr-x  2 squid    squid	     1024 avr  2  2001 db/dest/adult
        		-rw-r--r--  1 squid    squid	     1024 avr  2  2001 db/dest/adult/domains
        		-rw-r--r--  1 squid    squid	     1024 avr  2  2001 db/dest/adult/domains.db
        		-rw-r--r--  1 squid    squid	     1024 avr  2  2001 db/dest/adult/urls
        		-rw-r--r--  1 squid    squid	     1024 avr  2  2001 db/dest/adult/urls.db
        		...
        		drwxr-xr-x  2 squid    squid	     1024 avr  2  2001 db/dest/warez
        		...
        		drwxr-xr-x  2 root     root          1024 avr  2  2001 db/src
        		...
      • You forgot none at the end of the rule. The default is “accept”.
      • You made a syntax error. Time definitions are quite tricky.
    • I can’t download the blacklist. Many possible reasons:
      • You, or your company, are protected by a “low-level” firewall that cannot handle “active FTP”. Check your FTP client and switch it to “passive FTP”.
      • You, or your company, are protected by a “too sensitive” “high-level” firewall that complains that the file path is too long. FireWall-1 (Check Point) is one of these. Try ftp://ftp.univ-tlse1.fr/blacklist, which is a link to ftp://ftp.univ-tlse1.fr/pub/cache/squidguard_contrib; it may help.
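The access-rights problem from the "Nothing is blocked" entry can be checked mechanically. Below is a minimal Python sketch, assuming it is run as the same user that launches Squid (`os.access` tests the permissions of the calling user). The function name is hypothetical, and the rule it applies is the one stated in the FAQ: text lists must be readable, db files and log files must be writable.

```python
import os


def find_permission_problems(db_dir, log_dir):
    """List files the current user cannot use the way squidGuard needs.

    Text lists (domains, urls) must be readable; *.db files and log
    files must be writable as well.
    """
    problems = []
    for directory in (db_dir, log_dir):
        if not os.path.isdir(directory):
            problems.append("missing directory: " + directory)
    if problems:
        return problems

    # Log files are written by the redirector itself.
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if os.path.isfile(path) and not os.access(path, os.W_OK):
            problems.append("log file not writable: " + path)

    # Walk every category directory under the database root.
    for root, _dirs, files in os.walk(db_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            if not os.access(path, os.R_OK):
                problems.append("not readable: " + path)
            elif name.endswith(".db") and not os.access(path, os.W_OK):
                problems.append("db file not writable: " + path)
    return problems
```

An empty list means no problem was found. The two arguments would be the directories holding the `db` and `logs` trees shown in the listing above; their absolute location depends on the installation.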

    Definition

    • Redirector:
      Official Squid FAQ definition: http://squid.nlanr.net/Squid/FAQ/FAQ-15.html
      A redirector is a program that is connected to Squid and lets you “translate” URLs before Squid processes them:

      • to restrict access (erotic or financial URLs)
      • to strip banners and speed up web usage
      • to redirect URLs to a local mirror (e.g. Netscape Navigator downloads)

      To put one in place, uncomment the redirector lines in squid.conf and set the number of child processes you need:
      redirect_program /usr/local/squidGuard/bin/squidGuard
      redirect_children 20
      Common redirectors use from 800 KB to 1600 KB.
      If you don’t mind a very rare “workaround” (less than 0.01%), you can also let requests bypass the redirectors when they are all busy by adding this line:
      redirector_bypass on
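The three uses named in the redirector definition (restricting access, stripping banners, redirecting to a local mirror) can be sketched as a small rule table in Python. All hosts, target URLs, and names below are hypothetical placeholders, and returning an empty string for "no change" follows the classic redirector convention; this is an illustration of the idea, not how squidGuard itself is implemented.

```python
# Hypothetical targets served by the proxy administrator.
BLOCK_PAGE = "http://proxy.example/blocked.html"
EMPTY_IMAGE = "http://proxy.example/empty.gif"

# Each rule: (URL prefix, replacement, keep the rest of the path?)
RULES = [
    ("http://adult.example/", BLOCK_PAGE, False),           # restrict access
    ("http://ads.example/banners/", EMPTY_IMAGE, False),    # strip banners
    ("http://downloads.example/", "http://mirror.example/pub/", True),  # local mirror
]


def translate(url):
    """Return the URL Squid should use instead, or "" to leave it alone."""
    for prefix, target, keep_path in RULES:
        if url.startswith(prefix):
            # A mirror rule keeps the remainder of the path; others do not.
            return target + url[len(prefix):] if keep_path else target
    return ""
```

For example, `translate("http://downloads.example/navigator/setup.exe")` maps the request onto the mirror path, while a URL that matches no rule yields an empty reply.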