Jan 22 2009
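Flash's URLRequestHeader
After a full day of testing, I found that recent versions of Flash no longer support custom URLRequestHeader values on GET requests; they only work with POST. Damn, I'm speechless... For GET requests I still have to append a random string to get around the browser cache. Frustrating.
That means going back to squid and setting up a rewrite that ignores the query string, ugh...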
Due to browser limitations, custom HTTP request headers are only supported for POST requests, not for GET requests.
—
This restriction will be included in the next version of the Language Reference.
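As a rough illustration of the random-string workaround for GET requests, here is a minimal sketch in Python (not ActionScript) that appends a throwaway query parameter to a URL. The helper name `add_cache_buster` and the parameter name `_` are assumptions made for this example only.

```python
import uuid
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_cache_buster(url, param="_"):
    """Return url with an extra random query parameter appended.

    The parameter name is arbitrary; it only has to differ on every request
    so the browser treats each URL as a new resource.
    """
    parts = urlsplit(url)
    # Keep the existing parameters (including blank ones) in their order.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, uuid.uuid4().hex))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Every call produces a different URL, which is exactly why the cache in front of the origin then needs to ignore the query string.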
Jan 21 2009
Squidguard
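Today I took a close look at squid's hit rate. URLs that carry parameters are basically configured as no_cache in squid, and even if you do cache them, each URL together with its parameters becomes its own cache entry, which can cause frequent trips back to the origin... After some digging I found that squid has a redirector setting that can change the URL. There are several similar programs, or you can write your own script, but a script is probably not very efficient, so a program written in C is the better choice.
The page below describes which redirectors besides squidguard are widely used.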
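To make the "write your own script" option concrete, here is a minimal sketch of a redirector that drops the query string. It assumes the classic Squid helper protocol (one request per line, with the URL as the first field and the rewritten URL written back on one line) and no url_rewrite_concurrency. The function names and the `STATIC_SUFFIXES` list are assumptions for illustration, not part of squid or squidGuard.

```python
import sys
from urllib.parse import urlsplit, urlunsplit

# Assumption: only these file types are safe to cache without their query string.
STATIC_SUFFIXES = (".swf", ".jpg", ".png", ".gif", ".css", ".js")


def strip_query(url):
    """Drop the query string (and fragment) from URLs that look static."""
    parts = urlsplit(url)
    if not parts.path.lower().endswith(STATIC_SUFFIXES):
        return url  # leave dynamic pages untouched
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def rewrite_line(line):
    """Handle one helper request line: the URL is the first field."""
    fields = line.split()
    if not fields:
        return ""  # an empty reply means "no change"
    return strip_query(fields[0])


def run(infile=sys.stdin, outfile=sys.stdout):
    """Answer requests until the input is closed."""
    for line in infile:
        outfile.write(rewrite_line(line) + "\n")
        outfile.flush()  # the reply must not sit in a buffer
```

Calling `run()` at the end of the script would turn it into a helper process. As the post notes, an interpreted script like this is likely slower than a redirector written in C.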
What’s New
- A patched version of squidguard
- a new software : Hostess
- a new software : soulcatcher
- A mailing list is created
- A Web interface to add or remove urls from our databases
Introduction
This is not the official SquidGuard homepage, but only a happy user’s page :
the official homepage was http://www.squidguard.org. It seems to be out of service. Mirko Lorenz created a mirror here. SquidGuard is a redirector which uses Sleepycat’s version of Berkeley Database
Its authors are
- Pål Marius Baltzersen
- Lars-Erik Håland
A new patched version of squidGuard 1.2.10, with this README, is available. This is a compilation of patches from many contributors. I didn’t even change a dot. Thanks to Franck Bourdonnec for suggesting this packaged version.
The last stable version is 1.2.0.
Here is the ChangeLog
It needs a recent version of Berkeley Database (> 3.2 but < 4.x)
An ftp directory is available here, in France at : ftp://ftp.univ-tlse1.fr/pub/reseau/cache/squidGuard/ I began a contrib directory here : ftp://ftp.univ-tlse1.fr/pub/reseau/cache/squidguard_contrib/
It has new interesting features
- It can filter the surfing duration on a user basis.
You can find some explanations in other languages here :
- In german. You will find some scripts to update database and a german database
- In english.
- An unofficial Faq
Comparison
Competitors
SquidGuard and its competitors
- ufdbguard is a free and fast (8-9 times faster than squidguard) redirector which can use a commercial URL database : http://www.urlfilterdb.com
- http://tatewake.com/projects.htm Tatewake works on Windows and is written in Java. It uses squidguard’s database to create a hosts file for Windows.
- http://soulcatcher.sourceforge.net Soulcatcher is very similar to squidguard
- http://software.othello.ch/mod_dnsbl a blocking DNS-based module designed for Apache and Squid. It can use our database.
- An access module for filtering ISA server
- redirector from Ian Lea is written in Perl. It was the first redirector available. Slow, and memory-hungry.
- Squirm 1.0b First redirector written in C. Faster, and more careful with memory usage than redirector. No longer available.
- JunkBuster which is not really a redirector (in the Squid sense) but more a second proxy connected to squid. It strips banners from web pages.
- Jesred “Son” of squirm. Faster, but needs a little more memory.
- Custodian. It “hashed” its blocklist to hide urls. The file is no longer available. I haven’t tested it.
- RedServer. Another one. No longer available.
- DansGuardian Son of active guardian, it looks at content and PICS level. It can filter faster than squidGuard (I haven’t tested it yet, but will soon) and it’s now a competitor, no longer a complement to SquidGuard. It’s a filtering proxy and not a redirector.
- Many commercial sites sell filtering databases and software.
Advantages
- It’s a lot faster : for a 2.000 Urls list to filter and a 11.000 Urls database, on a pentium 233 :
- Squirm : 2 minutes 25 seconds
- Jesred : 1 minute 45 seconds
- SquidGuard : 9 seconds !!!
We can say it doesn’t care about database size :
- a 100 Urls database took 6 seconds
- a 11000 Urls database took 9 seconds (now the adult database contains 100 000 urls)
- Faster, in our case, means “fewer redirectors needed” : 20 for squidguard, 25 for squirm. Redirector usage figures :
- with squirm you need 3 redirectors 60% of time, with squidguard, only 5%
- with squirm you need 7 redirectors 10% of time, with squidguard, less than 0.5 %
- IP address of client
- User identity (RFC 1413) or login/password
- URL (of course)
- “Class” of redirection (e.g. we can define a class banner, adult, and so on…)
contrib
- For Squirm, some patterns
- To show web usage : a script which describes a VERY APPROXIMATE proportion of URL classes (erotic, hacking, mp3, warez) in your cache : taille_categorie_squid.pl
- Some very useful scripts to detect pornographic urls in the squid log (and more) made by Cedric Foll : http://savannah.nongnu.org/projects/pornfind/
- Usability of database depends of your users : MIT students are not golden boys who are not children.
- Some virus-filtering addons exist. They are connected through the redirector (squirm or squidguard), and send their files to a virus scanner. Some of them :
- http://viralator.loddington.com a specifically modified version (0.9b2) which works with SquidGuard is available here viralator-squidguard.pl.txt. It’s patched by Ankit Jain. To use it, prepare this redirection :
redirect http://127.0.0.1/viralator.cgi?url=%u
Be careful with Internet Explorer : scanning files transferred over ftp doesn’t work properly.
- http://www.hycomat.co.uk/viromat/
Some databases
For all information on the databases (contributors, size, download method), look at this page : http://cri.univ-tlse1.fr/blacklists
Related Projects
- http://www.surbl.org is a site to prevent spam. As we know, the porn business likes spam. Surbl will, likely, create a DNS zone for adult web sites…
FAQ
- Squid 2.6 isn’t working : it replaces redirect_program with url_rewrite_program, and redirect_children with url_rewrite_children
- A new command appears : url_rewrite_concurrency
- Nothing is blocked. Many reasons :
- Unix access rights are incorrect. The user who launches squid is the same one who launches squidguard. So this user must be able to read the text database and to write the db files and the log file. The directory should look like this :
drwxr-xr-x 2 root root 1024 avr 2 2001 logs
-rw-r----- 1 squid squid 100000 oct 23 08:13 logs/squidGuard.log
-rw-r----- 1 squid squid 1000 oct 23 08:13 logs/squidGuard.error
drwxr-xr-x 2 root root 1024 avr 2 2001 db
drwxr-xr-x 2 root root 1024 avr 2 2001 db/dest
drwxr-xr-x 2 squid squid 1024 avr 2 2001 db/dest/adult
-rw-r--r-- 1 squid squid 1024 avr 2 2001 db/dest/adult/domains
-rw-r--r-- 1 squid squid 1024 avr 2 2001 db/dest/adult/domains.db
-rw-r--r-- 1 squid squid 1024 avr 2 2001 db/dest/adult/urls
-rw-r--r-- 1 squid squid 1024 avr 2 2001 db/dest/adult/urls.db
...
drwxr-xr-x 2 squid squid 1024 avr 2 2001 db/dest/warez
...
drwxr-xr-x 2 root root 1024 avr 2 2001 db/src
...
- You forgot none at the end of a rule. The default is “accept”
- You made a syntax error. Temporal definitions are quite tricky.
- I can’t download blacklist. Many reasons :
- You, or your company, are protected by a “low level” firewall which is unable to understand “active ftp”. Check your ftp client, and change the mode to “passive ftp”.
- You, or your company, are protected by a “too sensitive” “high level” firewall which says that the file path is too long. Firewall1 (Checkpoint) is one of these. Look at ftp://ftp.univ-tlse1.fr/blacklist which is a link to ftp://ftp.univ-tlse1.fr/pub/cache/squidguard_contrib. It may help.
Definition
- Redirector :
Official Squid FAQ definition : http://squid.nlanr.net/Squid/FAQ/FAQ-15.html
A redirector is a program which connects to Squid and allows it to “translate” URLs before sending them to the Squid process :
- for restricting access (erotic or financial Urls)
- for stripping banners to accelerate web usage
- for redirecting Urls to a local mirror (Netscape Navigator downloads, e.g.)
To put them in place, uncomment the redirector line in squid.conf and enter the number of child processes you need :
redirect_program /usr/local/squidGuard/bin/squidGuard
redirect_children 20
Common redirectors use from 800 KB to 1600 KB.
You can also, if you don’t mind a very rare bypass (less than 0.01%), let Squid skip the redirectors when they are all busy by adding this line :
redirector_bypass on
Jan 18 2009
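New server
I have to do this every year... Last year I was on hostmonster; this year I moved to swvps, at 12 US dollars a month.
While paying this time I suddenly noticed that PayPal can buy foreign currency automatically, converting the US dollar amount to RMB, which seems really powerful... I'm not sure whether bank cards are supported; I'm verifying that now...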
Jan 13 2009
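Linux as a gateway
Today I found that an IP tunnel from Beijing to Hangzhou was down. I had no idea why; outgoing packets seemed to be swallowed by the local machine and never forwarded on. After checking for ages, I finally found that someone had changed the sysctl settings: forwarding used to default to 1 and now defaulted to 0. Really annoying. I turned it back on, restarted the IP tunnel, and everything was fine...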
net.ipv4.conf.default.forwarding=1
net.ipv4.conf.all.forwarding=1
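A check like the following could catch this kind of silent change earlier. It is only a sketch: the function name is hypothetical, and it assumes the input is text in `key = value` form, such as the contents of /etc/sysctl.conf or the output of `sysctl -a`.

```python
def disabled_forwarding(sysctl_text):
    """Return the net.ipv4 *.forwarding keys whose value is not 1."""
    disabled = []
    for raw in sysctl_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        # ".forwarding" deliberately excludes keys such as mc_forwarding.
        if key.startswith("net.ipv4.") and key.endswith(".forwarding") and value != "1":
            disabled.append(key)
    return disabled
```

An empty list means every forwarding key found in the text is set to 1.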
Jan 12 2009
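Querying and setting the BIOS time
After changing the time on Linux, it apparently is not synced to the BIOS right away; the sync seems to happen only at events such as a reboot. Many servers go years without a reboot, so when a server suddenly loses power, the time after it boots can be far off from what it was.
There is a way to update the hardware clock from Linux:
hwclock -w
If this error appears:
select() to /dev/rtc to wait for clock tick timed out
it is usually a device type issue; telling hwclock to use ISA access directly fixes it:
hwclock -w --directisa
We usually just use a single command:
hwclock -w; [ $? -ne 0 ] && hwclock -w --directisa;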
Jan 06 2009
Job Scheduling Algorithms in Linux Virtual Server
This page describes the job scheduling algorithms implemented in Linux Virtual Server.
keepalived configuration file:
lb_algo rr|wrr|lc|wlc|sh|dh|lblc
Round-Robin Scheduling
Weighted Round-Robin Scheduling
Least-Connection Scheduling
Weighted Least-Connection Scheduling
Locality-Based Least-Connection Scheduling
Locality-Based Least-Connection with Replication Scheduling
Destination Hashing Scheduling
Source Hashing Scheduling
Shortest Expected Delay Scheduling
Never Queue Scheduling
Round-Robin Scheduling
The round-robin scheduling algorithm sends each incoming request to the next server in its list. Thus in a three-server cluster (servers A, B and C) request 1 would go to server A, request 2 to server B, request 3 to server C, and request 4 to server A again, completing the cycle or 'round-robin' of servers. It treats all real servers as equals regardless of the number of incoming connections or the response time each server is experiencing. Virtual Server provides a few advantages over traditional round-robin DNS. Round-robin DNS resolves a single domain to different IP addresses, its scheduling granularity is host-based, and the caching of DNS queries hinders the basic algorithm; these factors lead to significant dynamic load imbalances among the real servers. The scheduling granularity of Virtual Server is network connection-based, and this finer granularity makes it much superior to round-robin DNS.
Weighted Round-Robin Scheduling
The weighted round-robin scheduling is designed to better handle servers with different processing capacities. Each server can be assigned a weight, an integer value that indicates its processing capacity. Servers with higher weights receive new connections before those with lower weights and get more connections than those with lower weights, while servers with equal weights get equal numbers of connections. For example, if the real servers A, B and C have the weights 4, 3 and 2 respectively, a good scheduling sequence is AABABCABC in a scheduling period (mod sum(Wi)). In the implementation of the weighted round-robin scheduling, a scheduling sequence is generated according to the server weights after the rules of Virtual Server are modified. The network connections are directed to the different real servers based on the scheduling sequence in a round-robin manner.
The weighted round-robin scheduling is better than the round-robin scheduling when the processing capacities of the real servers differ. However, it may lead to dynamic load imbalance among the real servers if the load of the requests varies highly. In short, there is the possibility that a majority of requests requiring large responses may be directed to the same real server.
In fact, round-robin scheduling is a special instance of weighted round-robin scheduling in which all the weights are equal.
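One way to generate such a sequence is to sweep a threshold down from the largest weight in steps of the weights' greatest common divisor, and on each pass pick every server whose weight reaches the threshold. The sketch below follows that idea; the function name and the (name, weight) tuple layout are assumptions for illustration, not the kernel's data structures.

```python
from functools import reduce
from itertools import islice
from math import gcd


def weighted_round_robin(servers):
    """Yield server names forever; servers is a list of (name, weight) pairs."""
    if not servers:
        return
    weights = [weight for _, weight in servers]
    max_weight = max(weights)
    if max_weight <= 0:
        return  # nothing can be scheduled
    step = reduce(gcd, weights)
    index = -1
    threshold = 0
    while True:
        index = (index + 1) % len(servers)
        if index == 0:
            # Start of a new pass over the list: lower the threshold,
            # wrapping back to the largest weight once it reaches zero.
            threshold -= step
            if threshold <= 0:
                threshold = max_weight
        name, weight = servers[index]
        if weight >= threshold:
            yield name


# One scheduling period for the weights 4, 3, 2 used in the text.
period = "".join(islice(weighted_round_robin([("A", 4), ("B", 3), ("C", 2)]), 9))
```

For these weights `period` comes out as AABABCABC, the sequence quoted above, and with all weights equal the generator degenerates to plain round-robin.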
Least-Connection Scheduling
The least-connection scheduling algorithm directs network connections to the server with the fewest established connections. It is one of the dynamic scheduling algorithms, because it needs to count the live connections of each server dynamically. For a Virtual Server that manages a collection of servers with similar performance, least-connection scheduling smooths the distribution well when the load of requests varies a lot. Virtual Server will direct requests to the real server with the fewest active connections.
At first glance it might seem that least-connection scheduling can also perform well when the servers have various processing capacities, because the faster server will get more network connections. In fact, it cannot perform very well because of TCP's TIME_WAIT state. TIME_WAIT usually lasts 2 minutes, and during those 2 minutes a busy web site often receives thousands of connections. For example, if server A is twice as powerful as server B, server A may have processed thousands of requests and be keeping them in the TIME_WAIT state while server B is still crawling to get its thousands of connections finished. So least-connection scheduling cannot balance load well among servers with various processing capacities.
Weighted Least-Connection Scheduling
The weighted least-connection scheduling is a superset of the least-connection scheduling, in which you can assign a performance weight to each real server. The servers with a higher weight value will receive a larger percentage of live connections at any one time. The Virtual Server administrator can assign a weight to each real server, and network connections are scheduled so that each server's share of the current live connections is proportional to its weight. The default weight is one.
The weighted least-connections scheduling works as follows:
Suppose there are n real servers, each server i has weight Wi (i=1,..,n) and alive connections Ci (i=1,..,n), and ALL_CONNECTIONS is the sum of Ci (i=1,..,n). The next network connection will be directed to the server j for which
(Cj/ALL_CONNECTIONS)/Wj = min { (Ci/ALL_CONNECTIONS)/Wi } (i=1,..,n)
Since ALL_CONNECTIONS is a constant in this lookup, there is no need to divide Ci by ALL_CONNECTIONS, so it can be optimized as
Cj/Wj = min { Ci/Wi } (i=1,..,n)
The weighted least-connection scheduling algorithm requires an additional division compared with least-connection scheduling. To minimize the scheduling overhead when servers have the same processing capacity, both the least-connection and the weighted least-connection scheduling algorithms are implemented.
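The selection rule Cj/Wj = min { Ci/Wi } can be sketched in a few lines. This version compares the ratios by cross-multiplication, so no division is needed at all; the function name and the (name, connections, weight) tuples are assumptions for the example.

```python
def pick_weighted_least_connection(servers):
    """Return the name of the server with the smallest connections/weight ratio.

    servers is a list of (name, active_connections, weight) tuples.
    Servers with a weight of zero or less are never selected.
    """
    best = None
    for candidate in servers:
        _, conns, weight = candidate
        if weight <= 0:
            continue
        # conns/weight < best_conns/best_weight, rewritten without division.
        if best is None or conns * best[2] < best[1] * weight:
            best = candidate
    return None if best is None else best[0]


# Ratios are 8/4 = 2, 3/3 = 1 and 5/2 = 2.5, so B is chosen.
choice = pick_weighted_least_connection([("A", 8, 4), ("B", 3, 3), ("C", 5, 2)])
```

With every weight set to 1 the same function behaves as plain least-connection scheduling.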
Locality-Based Least-Connection Scheduling
The locality-based least-connection scheduling algorithm is for destination IP load balancing. It is usually used in cache clusters. This algorithm usually directs packets destined for an IP address to the server assigned to that address, as long as the server is alive and not overloaded. If the server is overloaded (its number of active connections is larger than its weight) and there is a server at half load, the weighted least-connection server is allocated to this IP address instead.
Locality-Based Least-Connection with Replication Scheduling
The locality-based least-connection with replication scheduling algorithm is also for destination IP load balancing. It is usually used in cache clusters. It differs from LBLC scheduling as follows: the load balancer maintains mappings from a target to a set of server nodes that can serve the target. Requests for a target are assigned to the least-connection node in the target’s server set. If all the nodes in the server set are overloaded, it picks a least-connection node in the cluster and adds it to the server set for the target. If the server set has not been modified for the specified time, the most loaded node is removed from the server set, in order to avoid a high degree of replication.
Destination Hashing Scheduling
The destination hashing scheduling algorithm assigns network connections to the servers through looking up a statically assigned hash table by their destination IP addresses.
Source Hashing Scheduling
The source hashing scheduling algorithm assigns network connections to the servers through looking up a statically assigned hash table by their source IP addresses.
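Destination hashing and source hashing share the same lookup; they differ only in which address of the packet is used as the key. The sketch below shows that lookup under stated assumptions: the table size of 256, the multiplicative hash constant and both function names are choices made for this example and are not taken from the text above.

```python
import ipaddress

TABLE_SIZE = 256  # assumption: a small fixed-size table


def build_table(servers, size=TABLE_SIZE):
    """Fill the table statically by cycling through the server list."""
    return [servers[i % len(servers)] for i in range(size)]


def pick_by_address(table, ip):
    """Map an IP address (source or destination) to a table slot."""
    key = int(ipaddress.ip_address(ip))
    # Multiplicative hash truncated to 32 bits; the top 8 bits index 256 slots.
    bucket = ((key * 2654435761) & 0xFFFFFFFF) >> 24
    return table[bucket % len(table)]


table = build_table(["A", "B", "C"])
server = pick_by_address(table, "192.0.2.10")
```

Because the table is filled once and the hash is deterministic, the same address always lands on the same server until the table is rebuilt.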
Shortest Expected Delay Scheduling
The shortest expected delay scheduling algorithm assigns network connections to the server with the shortest expected delay. The expected delay that the job will experience is (Ci + 1) / Ui if sent to the ith server, in which Ci is the number of connections on the ith server and Ui is the fixed service rate (weight) of the ith server.
Never Queue Scheduling
The never queue scheduling algorithm adopts a two-speed model. When there is an idle server available, the job will be sent to the idle server instead of waiting for a fast one. When there is no idle server available, the job will be sent to the server that minimizes its expected delay (the Shortest Expected Delay scheduling algorithm).
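The difference between the last two algorithms shows up in a small sketch. The function names and the (name, connections, weight) tuples are assumptions for the example; exact fractions are used so that ties are not affected by floating-point rounding.

```python
from fractions import Fraction


def pick_sed(servers):
    """Shortest expected delay: minimize (Ci + 1) / Ui over usable servers."""
    usable = [s for s in servers if s[2] > 0]
    if not usable:
        return None
    return min(usable, key=lambda s: Fraction(s[1] + 1, s[2]))[0]


def pick_nq(servers):
    """Never queue: take an idle server if there is one, otherwise use SED."""
    for name, conns, weight in servers:
        if weight > 0 and conns == 0:
            return name
    return pick_sed(servers)


# "fast" has weight 3 and one connection; "slow" has weight 1 and is idle.
servers = [("fast", 1, 3), ("slow", 0, 1)]
sed_choice = pick_sed(servers)  # delays: (1+1)/3 versus (0+1)/1
nq_choice = pick_nq(servers)
```

Here SED still prefers the fast server because its expected delay is 2/3 against 1, while NQ sends the job to the idle slow server.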
Jan 05 2009
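A small crontab problem: Temporary crontab no longer owned by you
Over the past few days crontab suddenly stopped working on one of our servers: a user's temporary crontab file could not be read,
Temporary crontab no longer owned by you
Running crontab -e showed that the temporary file created under /tmp was wrong:
drwx------ 2 root crontab 4.0K 2009-01-05 14:27 crontab.TRVZy0
It was owned by root, so no wonder an ordinary user could not read it.
I searched for ages without finding anything, then happened to run ls -alh /usr/bin/crontab
and found:
-rwsr-sr-x 1 root crontab 26K Dec 20 2006 /usr/bin/crontab
There was an extra SUID bit in the owner column... So:
chmod u-s /usr/bin/crontab
ls -alh /usr/bin/crontab
-rwxr-sr-x 1 root crontab 26K Dec 20 2006 /usr/bin/crontab
With the SUID bit removed, crontab went back to normal...
drwx------ 2 ssmax crontab 4.0K 2009-01-05 14:29 crontab.fdzKZk
I had never paid much attention to SUID, SGID and the sticky bit. I think I learned about them in an introductory course, but I hardly ever used them afterwards, so they were easy to forget.
Definition of each bit: man chmod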
The letters ‘rwxXstugo’ select the new permissions for the affected users: read (r), write (w), execute (or access for directories) (x), execute only if the file is a directory or already has execute permission for some user (X), set user or group ID on execution (s), sticky (t), the permissions granted to the user who owns the file (u), the permissions granted to other users who are members of the file’s group (g), and the permissions granted to users that are in neither of the two preceding categories (o).
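The special bits can also be inspected programmatically. The sketch below uses Python's standard `stat` module; the helper name `special_bits` is an assumption, and the mode value is built by hand to mirror the `ls` output of /usr/bin/crontab rather than read from a real file.

```python
import stat


def special_bits(mode):
    """List which of setuid, setgid and sticky are set in a file mode."""
    names = []
    if mode & stat.S_ISUID:
        names.append("setuid")
    if mode & stat.S_ISGID:
        names.append("setgid")
    if mode & stat.S_ISVTX:
        names.append("sticky")
    return names


# A regular file with mode 6755, i.e. the "-rwsr-sr-x" seen before the fix.
before = stat.S_IFREG | 0o6755
# Clearing the setuid bit is what "chmod u-s" does.
after = before & ~stat.S_ISUID

before_text = stat.filemode(before)
after_text = stat.filemode(after)
```

For a real file, `os.stat(path).st_mode` supplies the mode value to pass in.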
Jan 04 2009
ssh remote port forwarding
-R [bind_address:]port:host:hostport
Specifies that the given port on the remote (server) host is to be forwarded to the given host and port on the local side. This works by allocating a socket to listen to port on the remote side, and whenever a connection is made to this port, the connection is forwarded over the secure channel, and a connection is made to host port hostport from the local machine.
Port forwardings can also be specified in the configuration file. Privileged ports can be forwarded only when logging in as root on the remote machine. IPv6 addresses can be specified by enclosing the address in square braces or using an alternative syntax: [bind_address/]host/port/hostport.
By default, the listening socket on the server will be bound to the loopback interface only. This may be overridden by specifying a bind_address. An empty bind_address, or the address ‘*’, indicates that the remote socket should listen on all interfaces. Specifying a remote bind_address will only succeed if the server’s GatewayPorts option is enabled (see sshd_config(5)).
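The above is the description from man ssh; it is really just a tunnel:
client> ssh -R 54321:localhost:54321 user@proxy.org
With this, requests sent to port 54321 on proxy.org travel through the tunnel to port 54321 on the client machine, which completes the port forwarding. A few things to watch out for, though:
The sshd on proxy.org must have GatewayPorts yes enabled; otherwise proxy.org only listens on port 54321 of 127.0.0.1, that is, on the lo device.
The other point: a Windows client connecting to proxy.org can also do remote port forwarding, but I could not get it to work in SecureCRT, old version or new; nothing was ever forwarded. In PuTTY it worked right away. I don't know whether it is a SecureCRT problem and couldn't be bothered to dig further, hehe.
The whole point of this is to get around the company's restrictions and see whether BT or eD downloads can be sped up a bit. More experiments tomorrow.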