Apr 20 2010

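The two modes of mod_wsgi

Category: Technology, ssmax @ 18:41:47

mod_wsgi can run in two modes.

The first is embedded mode. Like mod_python, the application runs directly inside the Apache processes. The advantage is that no extra processes are needed. The drawback is just as obvious: all memory is shared with Apache, so a memory leak of the kind mod_python was prone to can harm the whole Apache server. In addition, if Apache uses the worker MPM, mod_wsgi is forced to run multithreaded, which rules out applications that are not thread-safe.

In this mode a single directive in the Apache configuration is enough: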
WSGIScriptAlias /path /path-to-wsgi

That is all it takes. For small scripts this mode is sufficient.
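For reference, the file that WSGIScriptAlias points to could look like the minimal sketch below. mod_wsgi looks for a callable named application in that file; the response text is made up for illustration and is not from any real project.

```python
# Minimal WSGI script; mod_wsgi looks up the callable named "application".
def application(environ, start_response):
    # The body must be a byte string; the text itself is only an example.
    body = b"Hello from mod_wsgi\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

The same file works unchanged in embedded mode and in the daemon mode described next; only the Apache directives differ.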

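The second is daemon mode. Similar to a FastCGI backend, mod_wsgi uses Apache as a shell to start one or more separate processes, which then talk to the Apache processes over a socket.

The following configuration enables it:
# Start a WSGI daemon process group; site1 is the group name
WSGIDaemonProcess site1 processes=2 threads=15 display-name=%{GROUP}

# Select which WSGI daemon group the current context uses; can be set inside a Location block
WSGIProcessGroup site1

# Requests are dispatched to the daemon group selected by the context's WSGIProcessGroup
WSGIScriptAlias /path /path-to-wsgi

Because daemon mode is separated from the Apache processes, its memory is independent and it can be restarted on its own without affecting Apache. If you have several projects (Django), you can either create one daemon group per project or let them share one.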

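For example, within one VirtualHost, different paths mapped to different Django projects can share a single daemon:

WSGIDaemonProcess default processes=1 threads=1 display-name=%{GROUP}

WSGIProcessGroup default

WSGIScriptAlias /project1 "/home/website/project1.wsgi"

WSGIScriptAlias /project2 "/home/website/project2.wsgi"

Both Django projects now use the same WSGI daemon.

You can also separate the projects and give each its own daemon group. This costs more resources, but the projects are no longer coupled.

display-name sets the name of the daemon processes, which makes it easy to restart just the relevant processes instead of killing everything.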

WSGIDaemonProcess site1 processes=1 threads=1 display-name=%{GROUP}

WSGIDaemonProcess site2 processes=1 threads=1 display-name=%{GROUP}

<Location "/project1">
WSGIProcessGroup site1
</Location>
WSGIScriptAlias /project1 "/home/website/project1.wsgi"

<Location "/project2">
WSGIProcessGroup site2
</Location>
WSGIScriptAlias /project2 "/home/website/project2.wsgi"

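Django versions before 1.0 were not officially considered thread-safe, so a multi-process, single-threaded setup is recommended for them:

processes=n threads=1

That said, I have been running Django 0.96 in multithreaded mode in many projects with essentially no problems, including under mod_python with the worker MPM, which comes down to the same thing.

After upgrading to Django 1.0 you can safely use multiple processes with multiple threads:

processes=2 threads=64

This gives better performance.

The original English description of the two modes follows: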

When hosting WSGI applications using mod_wsgi, one of two primary modes of operation can be used. In ‘embedded’ mode, mod_wsgi works in a similar way to mod_python in that the Python application code will be executed within the context of the normal Apache child processes. WSGI applications when run in this mode will therefore share the same processes as other Apache hosted applications using Apache modules for PHP and Perl.

An alternate mode of operation available with Apache 2.X on UNIX is ‘daemon’ mode. This mode operates in similar ways to FASTCGI/SCGI solutions, whereby distinct processes can be dedicated to run a WSGI application. Unlike FASTCGI/SCGI solutions however, a separate infrastructure is not needed when implementing the WSGI application and everything is handled automatically by mod_wsgi.

Because the WSGI applications in daemon mode are being run in their own processes, the impact on the normal Apache child processes used to serve up static files and host applications using Apache modules for PHP, Perl or some other language is much reduced. Daemon processes may if required also be run as a distinct user ensuring that WSGI applications cannot interfere with each other or access information they shouldn’t be able to.


Apr 09 2010

Rules of the 18-digit resident ID number and a simple validation algorithm (Java version)

Category: Technology, ssmax @ 21:11:28

The rules:


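How the 18th character of the ID number is computed: according to the national standard of the People's Republic of China, GB 11643-1999, the citizen identity number is a feature combination code made up of a 17-digit body code and one check character. From left to right it consists of a six-digit address code, an eight-digit date-of-birth code, a three-digit sequence code and the check character.
The address code (first six digits) is the administrative division code of the county (city, banner, district) where the person's permanent residence is registered. (The latest division codes at county level and above can be looked up on the National Bureau of Statistics website, http://www.stats.gov.cn/tjbz/index.htm.)
The date-of-birth code (digits 7 to 14) gives the year, month and day of birth, with a four-digit year and no separators. For example, 11 May 1981 is written 19810511.
The sequence code (digits 15 to 17) is a serial number assigned to people born on the same date within the area identified by the same address code. The 17th digit is odd for males and even for females.
The check character (last position) is computed from the preceding 17 digits using the ISO 7064:1983 MOD 11-2 check character system.

The 18th character is computed as follows:
1. Multiply each of the first 17 digits by its weight. The weights for positions 1 to 17 are: 7 9 10 5 8 4 2 1 6 3 7 9 10 5 8 4 2
2. Add up the 17 products.
3. Divide the sum by 11 and take the remainder.
4. The remainder can only be one of 0 1 2 3 4 5 6 7 8 9 10, which map to the check characters 1 0 X 9 8 7 6 5 4 3 2 respectively.
5. So a remainder of 2 puts the Roman numeral X in the 18th position, and a remainder of 10 puts a 2 there.

Example: a man's ID number is 34052419800101001X. Let us check whether it is valid.
First, the weighted sum of the first 17 digits is 189.
Next, 189 divided by 11 is 17 + 2/11, so the remainder is 2.
Finally, the mapping gives X for a remainder of 2, so this is a valid ID number.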
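The steps above can be sketched in a few lines of Python. This is only an illustration of the check-character rule, not the code inside the jar mentioned below; the function names are made up, and region and date validity are not checked.

```python
WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
CHECK_CHARS = "10X98765432"  # index = weighted sum mod 11


def check_char(first17):
    """Return the check character for the first 17 digits."""
    if len(first17) != 17 or not (first17.isascii() and first17.isdigit()):
        raise ValueError("expected exactly 17 digits")
    total = sum(int(d) * w for d, w in zip(first17, WEIGHTS))
    return CHECK_CHARS[total % 11]


def is_valid(id_number):
    """Check only the length and the check character."""
    if len(id_number) != 18:
        return False
    try:
        return check_char(id_number[:17]) == id_number[17].upper()
    except ValueError:
        return False


print(check_char("34052419800101001"))  # X (weighted sum 189, remainder 2)
print(is_valid("34052419800101001X"))   # True
```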

A project happened to need this, so I wrote a small jar that can look up the region, birthday and sex, and uploaded it here.

cnnid.jar (source code included)

Usage:

  import net.ssmax.commons.cnnid.*;

  public class TestNid {
      public static void main(String[] args) {
          NidCard nid = new NidCard("xxxxxxxxxxxxxxx");
          System.out.println(nid.isValid());
          System.out.println(nid.getArea());
          System.out.println(nid.getBirthday());
          System.out.println(nid.getSex());
      }
  }


Mar 23 2010

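Ubuntu multilingual environment and locales

Category: Technology, ssmax @ 10:48:50

A couple of days ago, while preparing some material, I installed Ubuntu in a virtual machine and chose English during installation. Since the material was about character encodings, I tried:
shell> export LANG=zh_CN.GBK

Nothing happened; the locale fell back to the default, C.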
shell> locale -a
C
en_US.utf8
POSIX

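Sure enough, there is nothing there.

shell> apt-cache search language

That finds a whole pile of packages. Let's install the Chinese ones first:

shell> apt-get install language-pack-zh language-pack-zh-base

After installation, locale -a shows: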
C
en_US.utf8
POSIX
zh_CN.utf8
zh_HK.utf8
zh_SG.utf8
zh_TW.utf8

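Every one of them is UTF-8. Annoying, haha.

shell> ls /var/lib/locales/supported.d/
local zh

So this is where all the locales supported by the system are listed.

Edit /var/lib/locales/supported.d/zh
and add:
zh_CN.GBK GBK
zh_CN.GB2312 GB2312

Then run
shell> locale-gen
or shell> dpkg-reconfigure locales
to regenerate the locales.

Then check whether they were added: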
shell> locale -a
C
en_US.utf8
POSIX
zh_CN.gb2312
zh_CN.gbk
zh_CN.utf8
zh_HK.utf8
zh_SG.utf8
zh_TW.utf8
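What separates zh_CN.utf8 from zh_CN.gbk is the byte encoding of the same characters. The small Python sketch below is not part of the setup above, and the sample string is arbitrary; it only makes the difference visible.

```python
text = "中文"

utf8_bytes = text.encode("utf-8")  # three bytes per character here
gbk_bytes = text.encode("gbk")     # two bytes per character

print(utf8_bytes.hex())  # e4b8ade69687
print(gbk_bytes.hex())   # d6d0cec4

# Decoding with the wrong codec is what shows up as garbage in a terminal.
print(gbk_bytes.decode("utf-8", errors="replace"))
```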


Mar 18 2010

Deprecate tcp_tw_{reuse,recycle}

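Category: Technology, ssmax @ 15:48:22

Use the tcp_tw_{reuse,recycle} kernel parameters with caution
An earlier post on this blog listed enabling fast TCP recycling as one way to deal with too many TIME_WAIT connections.

However, bug reports that have surfaced recently show that with tcp_tw_recycle enabled, access from clients behind a NAT gateway is affected. With this feature on, the kernel assumes that a single IP address has a single valid timestamp clock; if the timestamps coming out of the gateway are inconsistent, the server drops those TCP packets.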
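Why a per-IP timestamp assumption hurts clients behind NAT can be seen in a deliberately simplified model. The sketch below is not how the kernel is implemented; the function name, addresses and timestamp values are all made up for illustration.

```python
def accept_segments(segments):
    """Toy model: remember the latest TCP timestamp seen per source IP
    and drop anything that carries an older timestamp."""
    last_seen = {}
    accepted = []
    for src_ip, tsval, label in segments:
        if tsval < last_seen.get(src_ip, 0):
            continue  # looks like an old duplicate, so it is dropped
        last_seen[src_ip] = tsval
        accepted.append(label)
    return accepted


# Two hosts behind one NAT gateway share the public address 203.0.113.1,
# but each host has its own timestamp clock.
segments = [
    ("203.0.113.1", 900000, "host A"),
    ("203.0.113.1", 1200, "host B"),
    ("203.0.113.1", 900050, "host A again"),
]
print(accept_segments(segments))  # ['host A', 'host A again']
```

Host B is silently dropped even though it did nothing wrong, which matches the symptom described above: some machines behind the same gateway cannot connect.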

I recommend using tcp_tw_recycle and tcp_tw_reuse with caution. The original text follows:

We’ve recently had a long discussion about the CVE-2005-0356 time stamp denial-of-service
attack. It turned out that Linux is only vunerable to this problem when tcp_tw_recycle
is enabled (which it is not by default).
In general these two options are not really usable in today’s internet because they
make the (often false) assumption that a single IP address has a single TCP time stamp /
PAWS clock. This assumption breaks both NAT/masquerading and also opens Linux to denial
of service attacks (see the CVE description)
Due to these numerous problems I propose to remove this code for 2.6.26
Signed-off-by: Andi Kleen
Index: linux/Documentation/feature-removal-schedule.txt
===================================================================
--- linux.orig/Documentation/feature-removal-schedule.txt
+++ linux/Documentation/feature-removal-schedule.txt
@@ -354,3 +354,15 @@ Why: The support code for the old firmwa
and slightly hurts runtime performance. Bugfixes for the old firmware
are not provided by Broadcom anymore.
Who: Michael Buesch
+
+---------------------------
+
+What: Support for /proc/sys/net/ipv4/tcp_tw_{reuse,recycle} = 1
+When: 2.6.26
+Why: Enabling either of those makes Linux TCP incompatible with masquerading and
+ also opens Linux to the CVE-2005-0356 denial of service attack. And these
+ optimizations are explicitely disallowed by some benchmarks. They also have
+ been disabled by default for more than ten years so they’re unlikely to be used
+ much. Due to these fatal flaws it doesn’t make sense to keep the code.
+Who: Andi Kleen
+


Mar 09 2010

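MySQL full-text indexes and Chinese (the MediaWiki Chinese search problem)

Category: Technology, ssmax @ 15:24:59

Today I went through the MediaWiki source code. Its Chinese search is not very accurate and I wanted to find out why, so I looked at how search is implemented.

The database is MySQL, and search uses a full-text index table: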

CREATE TABLE `searchindex` (
`si_page` int(10) unsigned NOT NULL,
`si_title` varchar(255) NOT NULL DEFAULT '',
`si_text` mediumtext NOT NULL,
UNIQUE KEY `si_page` (`si_page`),
FULLTEXT KEY `si_title` (`si_title`),
FULLTEXT KEY `si_text` (`si_text`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8

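MySQL's FULLTEXT has never supported Chinese well. Raw UTF-8 Chinese text has no word delimiters, so the index is ineffective. MediaWiki works around this with a trick: each UTF-8 character is converted to U8xxxx and stored separated by ASCII spaces, which makes it searchable.

The wiki's character conversion code, which is quite useful: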

cat wiki/languages/classes/LanguageZh_cn.php

/**
 * @addtogroup Language
 */
class LanguageZh_cn extends Language {
    function stripForSearch( $string ) {
        # MySQL fulltext index doesn't grok utf-8, so we
        # need to fold cases and convert to hex
        # we also separate characters as "words"
        if( function_exists( 'mb_strtolower' ) ) {
            return preg_replace(
                "/([\\xc0-\\xff][\\x80-\\xbf]*)/e",
                "' U8' . bin2hex( \"$1\" )",
                mb_strtolower( $string ) );
        } else {
            list( , $wikiLowerChars ) = Language::getCaseMaps();
            return preg_replace(
                "/([\\xc0-\\xff][\\x80-\\xbf]*)/e",
                "' U8' . bin2hex( strtr( \"\$1\", \$wikiLowerChars ) )",
                $string );
        }
    }
}

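The code above turns each Chinese character into a space-separated U8xxxx token, after which MySQL's full-text index can be used. MySQL 5.0 and later can in fact build full-text indexes on UTF-8 text, but because of the word segmentation problem each Chinese character still has to be separated by spaces, and the minimum indexed word length (ft_min_word_len) has to be lowered, so the wiki's approach is still the more convenient one.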
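To see what the conversion produces, here is a rough Python equivalent of stripForSearch. It is a sketch for experimenting, not MediaWiki code; the Python function name is my own, and Python's lower() may fold case slightly differently from mb_strtolower.

```python
import re

# Lead byte of a multi-byte UTF-8 sequence followed by its continuation bytes.
_MULTIBYTE = re.compile(rb"([\xc0-\xff][\x80-\xbf]*)")


def strip_for_search(text):
    """Lowercase text and turn each non-ASCII character into ' U8<hex>'."""
    data = text.lower().encode("utf-8")
    out = _MULTIBYTE.sub(
        lambda m: b" U8" + m.group(1).hex().encode("ascii"), data
    )
    return out.decode("ascii")


print(strip_for_search("Wiki中文"))  # wiki U8e4b8ad U8e69687
```

Each Chinese character becomes its own "word" of eight ASCII characters, which is long enough to pass MySQL's default minimum word length.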

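Because each Chinese character is treated as a separate word and word order is ignored in the search, the results do not match what Chinese speakers expect. A small change to the source code, wrapping the search term in double quotes so that it is matched as a phrase, gives much more precise results.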

vim wiki/includes/SearchMySQL4.php

Find the following code:

if( $this->strictMatching && ($terms[1] == '') ) {
$terms[1] = '+';
}
$searchon .= $terms[1] . $wgContLang->stripForSearch( $terms[2] );

and change it to:


if( $this->strictMatching && ($terms[1] == '') ) {
// $terms[1] = '+';
$terms[1] = '+"';
}
$searchon .= $terms[1] . $wgContLang->stripForSearch( $terms[2] ) . '"';

That gives exact-phrase search.


Feb 03 2010

The difference between tun and tap in OpenVPN

Category: Technology, ssmax @ 22:25:03

tun devices encapsulate IPv4 or IPv6 (OSI Layer 3) while tap devices encapsulate Ethernet 802.3 (OSI Layer 2).

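I spent this afternoon connecting the networks of several regions with OpenVPN. With tun you get a simulated point-to-point environment: other IPs in the same subnet are reachable, but broadcasts do not work, so I could not set up gateways that forward to certain subnets.

Only later did I notice the tap mode. I had never paid attention to what it was for; the manual says it simulates a LAN environment. Very nice: broadcasts work and gateways can be assigned any way you like.

PS: I finally managed to buy a train ticket today. They are really hard to get.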


Dec 24 2009

Linux Kernel Configuration

Category: Technology, ssmax @ 15:38:59
You can determine the amount of System V IPC resources available by looking at the contents of the following files:
  /proc/sys/kernel/shmmax - The maximum size of a shared memory segment.
  /proc/sys/kernel/shmmni - The maximum number of shared memory segments.
  /proc/sys/kernel/shmall - The maximum amount of shared memory
                              that can be allocated.
  /proc/sys/kernel/sem    - The maximum number and size of semaphore sets
                              that can be allocated.
For example, to view the maximum size of a shared memory segment that can be created enter:
  cat /proc/sys/kernel/shmmax

To change the maximum size of a shared memory segment to 256 MB enter:

  echo 268435456 > /proc/sys/kernel/shmmax

To view the maximum number of semaphores and semaphore sets which can be created enter:

cat /proc/sys/kernel/sem

This returns 4 numbers indicating:

 SEMMSL - The maximum number of semaphores in a semaphore set
 SEMMNS - The maximum number of semaphores in the system
 SEMOPM - The maximum number of operations in a single semop call
 SEMMNI - The maximum number of semaphore sets

 For WebSphere MQ:

  • the SEMMSL value must be 128 or greater
  • the SEMOPM value must be 5 or greater
  • the SEMMNS value must be 16384 or greater
  • the SEMMNI value must be 1024 or greater

 To increase the maximum number of semaphores available to WebSphere MQ, you should update the SEMMNS and SEMMNI values.
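The four numbers and the WebSphere MQ minimums listed above can be compared with a short script. This is a sketch: the helper names are made up, and the sample line is an assumed value for illustration, not output from a real machine.

```python
# Minimums for WebSphere MQ as listed above.
MQ_MINIMUMS = {"SEMMSL": 128, "SEMMNS": 16384, "SEMOPM": 5, "SEMMNI": 1024}
FIELDS = ("SEMMSL", "SEMMNS", "SEMOPM", "SEMMNI")  # order used in /proc


def parse_sem(text):
    """Parse the contents of /proc/sys/kernel/sem into a dict."""
    values = [int(v) for v in text.split()]
    if len(values) != len(FIELDS):
        raise ValueError("expected 4 numbers")
    return dict(zip(FIELDS, values))


def below_minimum(sem):
    """Return the names of settings that are below the MQ minimums."""
    return [name for name in FIELDS if sem[name] < MQ_MINIMUMS[name]]


# On a real system: text = open("/proc/sys/kernel/sem").read()
sample = "250 32000 32 128"  # assumed sample values
print(below_minimum(parse_sem(sample)))  # ['SEMMNI']
```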

 

Maximum open files

If the system is heavily loaded, you might need to increase the maximum possible number of open files. If your distribution supports the proc filesystem you can do this by issuing the following command:  echo 32768 > /proc/sys/fs/file-max

If you are using a pluggable security module such as PAM (Pluggable Authentication Module), ensure that this does not unduly restrict the number of open files for the ‘mqm’ user.

 

TCP Tuning Background

The following is a summary of techniques to maximize TCP WAN throughput.

TCP uses what is called the “congestion window”, or CWND, to determine how many packets can be sent at one time. The larger the congestion window size, the higher the throughput. The TCP “slow start” and “congestion avoidance” algorithms determine the size of the congestion window. The maximum congestion window is related to the amount of buffer space that the kernel allocates for each socket. For each socket, there is a default value for the buffer size, which can be changed by the program using a system library call just before opening the socket. There is also a kernel enforced maximum buffer size. The buffer size can be adjusted for both the send and receive ends of the socket.

To get maximal throughput it is critical to use optimal TCP send and receive socket buffer sizes for the link you are using. If the buffers are too small, the TCP congestion window will never fully open up. If the receiver buffers are too large, TCP flow control breaks and the sender can overrun the receiver, which will cause the TCP window to shut down. This is likely to happen if the sending host is faster than the receiving host. Overly large windows on the sending side are not a big problem as long as you have excess memory.

The optimal buffer size is twice the bandwidth*delay product of the link:

buffer size = 2 * bandwidth * delay

The ping program can be used to get the delay, and tools such as pathrate to get the end-to-end capacity (the bandwidth of the slowest hop in your path). Since ping gives the round trip time (RTT), this formula can be used instead of the previous one:

buffer size = bandwidth * RTT.

For example, if your ping time is 50 ms, and the end-to-end network consists of all 100 BT Ethernet and OC3 (155 Mbps), the TCP buffers should be .05 sec * (100 Mbits / 8 bits) = 625 KBytes. (When in doubt, 10 MB/s is a good first approximation for network bandwidth on high-speed R and E networks like ESnet).
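The arithmetic in this example is easy to script. The helper below is a minimal sketch of buffer size = bandwidth * RTT; the function name and the choice of units (bits per second, milliseconds) are my own.

```python
def tcp_buffer_bytes(bandwidth_bits_per_sec, rtt_ms):
    """Buffer size = bandwidth * RTT, returned in bytes."""
    return bandwidth_bits_per_sec * rtt_ms // (8 * 1000)


# The example from the text: 100 Mbit/s bottleneck, 50 ms ping time.
print(tcp_buffer_bytes(100_000_000, 50))  # 625000 bytes, i.e. 625 KBytes
```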

There are 2 TCP settings you need to know about: the default TCP send and receive buffer size, and the maximum TCP send and receive buffer size. Note that most UNIX OSes by default have a maximum TCP buffer size that is way too small for 1 Gbps pipes, and all have a maximum that is too small for 10 Gbps flows. For instructions on how to increase the maximum TCP buffer, see the OS-specific instructions for setting system defaults.

Linux, FreeBSD, Windows, and OSX all now support TCP autotuning, so you no longer need to worry about setting the default buffer sizes. But for Solaris or other older OSes you’ll need to use the UNIX setsockopt call in your sender and receiver to set the optimal buffer size for the link you are using.

/proc/sys/net/core/rmem_max - Maximum TCP Receive Window
/proc/sys/net/core/wmem_max – Maximum TCP Send Window
/proc/sys/net/ipv4/tcp_timestamps – timestamps (RFC 1323) add 12 bytes to the TCP header…
/proc/sys/net/ipv4/tcp_sack – tcp selective acknowledgements.
/proc/sys/net/ipv4/tcp_window_scaling – support for large TCP Windows (RFC 1323). Needs to be set to 1 if the Max TCP Window


Dec 21 2009

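Running 32-bit applications on 64-bit Linux

Category: Technology, ssmax @ 15:22:44

    Most Linux distributions have a version for x86_64 processors. Typical x86_64 processors include the AMD Athlon II and the Intel Xeon. These distributions have their own dedicated software repositories, which provide binary packages for all the applications they support. If you are content with installing software the Linux way, you may never need to run a 32-bit program.

    Some commercial Linux software, games in particular, ships only in a 32-bit version. For reasons like this you may need to configure your machine to run 32-bit software.

    When running such 32-bit programs on 64-bit Linux, you will often see the error:

    No such file or directory

    Installing the 32-bit support libraries solves this.

    Because x86_64 processors are designed as an extension of x86, they also support 32-bit programs. On Linux all you need to do is install the libraries this software requires. Fortunately most distributions already package them. On Ubuntu/Debian, for example, the package is called ia32-libs. To install it, open a terminal and enter: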

    sudo apt-get install ia32-libs
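Before installing anything it can help to confirm that the failing program really is a 32-bit binary. The sketch below reads the ELF header, where byte 4 (EI_CLASS) is 1 for 32-bit and 2 for 64-bit objects; the helper names are made up.

```python
def elf_class(header):
    """Return 32 or 64 for an ELF header, or None if it is not ELF.

    `header` is the first few bytes of the file."""
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return None
    return {1: 32, 2: 64}.get(header[4])


def file_elf_class(path):
    """Read just enough of the file to classify it."""
    with open(path, "rb") as f:
        return elf_class(f.read(5))


print(elf_class(b"\x7fELF\x01"))  # 32
```

If file_elf_class returns 32 for the program that fails with "No such file or directory", missing 32-bit support libraries are a likely cause.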


Dec 15 2009

Fixing Windows 2003 not automatically assigning a drive letter to USB hard drives

Category: Technology, ssmax @ 22:52:35
Start -> Run -> mountvol /e -> Enter -> reboot. Windows 2003 will then assign drive letters to USB hard drives automatically.


Nov 24 2009

Ubuntu VGA resolution in a virtual machine

Category: Technology, ssmax @ 14:27:52

#  FRAMEBUFFER RESOLUTION SETTINGS
#     +-------------------------------------------------+
#          | 640x480    800x600    1024x768   1280x1024
#      ----+--------------------------------------------
#      256 | 0x301=769  0x303=771  0x305=773   0x307=775
#      32K | 0x310=784  0x313=787  0x316=790   0x319=793
#      64K | 0x311=785  0x314=788  0x317=791   0x31A=794
#      16M | 0x312=786  0x315=789  0x318=792   0x31B=795
#     +-------------------------------------------------+
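The decimal value after each = sign is simply the hexadecimal VESA mode number converted to base 10, which is the form the vga= kernel parameter takes. A small sketch, with only a few rows of the table copied in and a made-up helper name:

```python
# VESA mode numbers from the table above, keyed by (resolution, colours).
VESA_MODES = {
    ("800x600", "64K"): 0x314,
    ("1024x768", "64K"): 0x317,
    ("1024x768", "16M"): 0x318,
}


def vga_param(resolution, colours):
    """Return the decimal form used on the kernel command line."""
    return "vga=%d" % VESA_MODES[(resolution, colours)]


print(vga_param("800x600", "64K"))  # vga=788
```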

 

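Ubuntu 9.10 uses the new GRUB 2, and quite a few boot parameters seem to have changed. To adjust the resolution in a virtual machine:

Method 1: the old vga=788

Edit /etc/default/grub and set GRUB_CMDLINE_LINUX="vga=788"

Save, then run update-grub.

However, this produces the message:

vga=788 is deprecated and asks me to use "set gfxpayload=800x600x16;800x600" before the linux line.

In other words, the vga parameter is deprecated and another method should be used:

Method 2:

Edit /boot/grub/grub.cfg

Find the lines that boot Linux

and add set gfxpayload=800x600x16 (note: without the semicolon), as follows: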

### BEGIN /etc/grub.d/10_linux ###
menuentry "Ubuntu, Linux 2.6.31-14-generic-pae" {
        recordfail=1
        if [ -n ${have_grubenv} ]; then save_env recordfail; fi
        set quiet=1
        insmod ext2
        set root=(hd0,1)
        search --no-floppy --fs-uuid --set 9a441a57-5a71-4800-b46d-2e4c1cec6dee
        set gfxpayload=800x600x16
        linux   /boot/vmlinuz-2.6.31-14-generic-pae root=UUID=9a441a57-5a71-4800-b46d-2e4c1cec6dee ro   quiet splash
        initrd  /boot/initrd.img-2.6.31-14-generic-pae
}

