Jax's Work Log: 2013-05

2013-05-30 22:21

[PHP] Running an SVN update from a web page

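I wanted to write a program that runs an SVN update from a web page. It turned out to be less simple than I expected, and there are a few gotchas to watch for.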

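First, run svn checkout as the Apache user account (as root, switch to www-data as shown here). That way the working copy belongs to Apache's account and Apache has the access rights it needs to run svn update from a web page.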

su -s /bin/bash www-data
cd /var/www
svn checkout http://www.xxx.com/svn/my_site

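Before running the shell command from PHP, set LANG=C.UTF-8 in the environment (the equivalent of export LANG=C.UTF-8). Otherwise svn update reports an error when it encounters Chinese characters, because Apache on Ubuntu defaults to LANG=C.
Next, add the --accept theirs-full option to svn, so that whenever a conflict occurs the version from the SVN server wins.
Finally, append 2>&1 so that PHP receives all output, including error messages.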

<?php
putenv('LANG=C.UTF-8');
$result = shell_exec('svn update --accept theirs-full /var/www/my_site 2>&1');
echo nl2br($result);
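
As a hedged variation on the script above, the sketch below quotes the working-copy path with escapeshellarg() and reads svn's exit code through exec(), so a failed update can be told apart from a successful one. The helper names buildSvnUpdateCommand() and runSvnUpdate() and the --non-interactive flag are assumptions added for illustration; they are not part of the original script.

```php
<?php
// Hypothetical helpers for illustration; not part of the original script.

/** Build the svn command line, quoting the path for the shell. */
function buildSvnUpdateCommand($workingCopy)
{
    return 'svn update --non-interactive --accept theirs-full '
        . escapeshellarg($workingCopy) . ' 2>&1';
}

/** Run svn update and return array(exit code, combined output). */
function runSvnUpdate($workingCopy)
{
    putenv('LANG=C.UTF-8');   // Apache on Ubuntu defaults to LANG=C
    exec(buildSvnUpdateCommand($workingCopy), $lines, $exitCode);
    return array($exitCode, implode("\n", $lines));
}

// Print the command that would be run for the working copy used above.
echo buildSvnUpdateCommand('/var/www/my_site'), "\n";
```

A page could then call list($code, $output) = runSvnUpdate('/var/www/my_site'); print nl2br(htmlspecialchars($output)), and treat a non-zero $code as a failed update.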

2013-05-29 23:38

[Repost] Nesting rules under (X)HTML Strict

Reposted from: (X)HTML Strict 下的嵌套规则 (Nesting rules under (X)HTML Strict)

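Below is a set of tag nesting rules that must be followed under HTML 4 Strict and XHTML 1.0 Strict, for example the rule that an <a> may not contain another <a>.

Notes:

For readability, tags in this article are written in uppercase (under XHTML rules, element names must be lowercase, e.g. <html> rather than <HTML>).
A lowercase word denotes a group or series of HTML tags.
Each entry (tag) is followed by a list of tags. If there is no list, the entry may not contain any tags, which means it can contain only plain text (#PCDATA, see below). If it is marked (empty), the entry may not contain content of any kind. For flow, inline, block, OBJECT and BODY, the permitted content is given separately in the article.
#PCDATA means "parsed character data": plain text with no HTML tags, although escaped content such as &auml; and &#228; may appear.
CDATA means "character data": plain text in which escaped content is not interpreted. See CDATA Confusion for details.
excluding ... means the listed elements must not be contained, either directly or indirectly.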

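Note 1. The above is based on the Strict DTD of the [HTML 4.01 Specification]. Translated by JunChen from Allowed nesting of elements in HTML 4 Strict (and XHTML 1.0 Strict).

Note 2. XHTML 1.0 is essentially the same, with the following differences:

The content of <script> and <style> is CDATA in HTML 4 but #PCDATA in XHTML.
In XHTML, a <tr> may directly follow the <table> tag. HTML 4.01 does not allow this, but the <tbody> tags themselves may be omitted. In other words, if a <table> in the markup is immediately followed by a <tr>, HTML 4.01 implicitly creates a <tbody> element, while XHTML does not. This affects style sheets that use tbody as a selector.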

2013-05-29 21:23

[Repost] Default style sheet for HTML 4

Reposted from: W3C Appendix D. Default style sheet for HTML 4

This appendix is informative, not normative.

This style sheet describes the typical formatting of all HTML 4 ([HTML4]) elements based on extensive research into current UA practice. Developers are encouraged to use it as a default style sheet in their implementations.

The full presentation of some HTML elements cannot be expressed in CSS 2.1, including replaced elements ("img", "object"), scripting elements ("script", "applet"), form control elements, and frame elements.

For other elements, the legacy presentation can be described in CSS but the solution removes the element. For example, the FONT element can be replaced by attaching CSS declarations to other elements (e.g., DIV). Likewise, legacy presentation of presentational attributes (e.g., the "border" attribute on TABLE) can be described in CSS, but the markup in the source document must be changed.

html, address,
blockquote,
body, dd, div,
dl, dt, fieldset, form,
frame, frameset,
h1, h2, h3, h4,
h5, h6, noframes,
ol, p, ul, center,
dir, hr, menu, pre   { display: block; unicode-bidi: embed; }

li              { display: list-item; }

head            { display: none; }

table           { display: table; }

tr              { display: table-row; }

thead           { display: table-header-group; }

tbody           { display: table-row-group; }

tfoot           { display: table-footer-group; }

col             { display: table-column; }

colgroup        { display: table-column-group; }

td, th          { display: table-cell; }

caption         { display: table-caption; }

th              { font-weight: bolder; text-align: center; }

caption         { text-align: center; }

body            { margin: 8px; }

h1              { font-size: 2em; margin: .67em 0; }

h2              { font-size: 1.5em; margin: .75em 0; }

h3              { font-size: 1.17em; margin: .83em 0; }

h4, p,
blockquote, ul,
fieldset, form,
ol, dl, dir,
menu            { margin: 1.12em 0; }

h5              { font-size: .83em; margin: 1.5em 0; }

h6              { font-size: .75em; margin: 1.67em 0; }

h1, h2, h3, h4,
h5, h6, b,
strong          { font-weight: bolder; }

blockquote      { margin-left: 40px; margin-right: 40px; }

i, cite, em,
var, address    { font-style: italic; }

pre, tt, code,
kbd, samp       { font-family: monospace; }

pre             { white-space: pre; }

button, textarea,
input, select   { display: inline-block; }

big             { font-size: 1.17em; }

small, sub, sup { font-size: .83em; }

sub             { vertical-align: sub; }

sup             { vertical-align: super; }

table           { border-spacing: 2px; }

thead, tbody,
tfoot           { vertical-align: middle; }

td, th, tr      { vertical-align: inherit; }

s, strike, del  { text-decoration: line-through; }

hr              { border: 1px inset; }

ol, ul, dir,
menu, dd        { margin-left: 40px; }

ol              { list-style-type: decimal; }

ol ul, ul ol,
ul ul, ol ol    { margin-top: 0; margin-bottom: 0; }

u, ins          { text-decoration: underline; }

br:before       { content: "\A"; white-space: pre-line; }

center          { text-align: center; }

:link, :visited { text-decoration: underline; }

:focus          { outline: thin dotted invert; }


/* Begin bidirectionality settings (do not change) */
BDO[DIR="ltr"]  { direction: ltr; unicode-bidi: bidi-override; }

BDO[DIR="rtl"]  { direction: rtl; unicode-bidi: bidi-override; }

*[DIR="ltr"]    { direction: ltr; unicode-bidi: embed; }

*[DIR="rtl"]    { direction: rtl; unicode-bidi: embed; }

@media print {
    h1            { page-break-before: always; }

    h1, h2, h3,
    h4, h5, h6    { page-break-after: avoid; }

    ul, ol, dl    { page-break-before: avoid; }
}

2013-05-29 20:47

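[Repost][PHP] Pattern Modifiers - regular expression modifiers

Below are the modifiers currently available in regular expressions. The names in parentheses are the internal PCRE names of those modifiers.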

i (PCRE_CASELESS)
If this modifier is set, letters in the pattern match both upper and lower case.

m (PCRE_MULTILINE)
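By default, PCRE treats the subject string as a single line of characters (even if it actually contains several lines). The start-of-line metacharacter (^) matches only at the start of the string, and the end-of-line metacharacter ($) matches only at the end of the string or before a terminating newline (unless the D modifier is set). This is the same as in Perl.

If this modifier is set, the start-of-line and end-of-line constructs also match immediately after and immediately before any newline in the subject string, as well as at the very start and end. This is equivalent to Perl's /m modifier. If the subject string contains no "\n" characters, or the pattern contains no ^ or $, the modifier has no effect.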
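
A minimal sketch of the difference, using a made-up three-line subject string; the variable names are illustrative only.

```php
<?php
// Illustrative subject: three "lines" separated by \n.
$subject = "alpha\nbeta\ngamma";

// Without m: ^ and $ anchor to the whole string, so nothing matches.
$single = preg_match_all('/^\w+$/', $subject, $m1);

// With m: ^ and $ also match around each embedded newline.
$multi = preg_match_all('/^\w+$/m', $subject, $m2);

var_dump($single);   // number of matches without m
var_dump($m2[0]);    // the lines matched with m
```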

s (PCRE_DOTALL)
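If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This is equivalent to Perl's /s modifier. A negated class such as [^a] always matches a newline character, regardless of this modifier.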
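
The following sketch, with an invented HTML fragment as the subject, shows the dot with and without s, and a negated class crossing a newline on its own.

```php
<?php
// Illustrative subject containing one embedded newline.
$subject = "<p>line one\nline two</p>";

$plain  = preg_match('/<p>(.*)<\/p>/', $subject);        // dot stops at \n
$dotall = preg_match('/<p>(.*)<\/p>/s', $subject, $m);   // dot crosses \n

// A negated class matches the newline even without s.
$negated = preg_match('/one[^a]line/', $subject);

var_dump($plain, $dotall, $negated);
```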

x (PCRE_EXTENDED)
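If this modifier is set, whitespace data characters in the pattern are ignored entirely, except when escaped or inside a character class. Characters from an unescaped # outside a character class up to and including the next newline are also ignored. This is equivalent to Perl's /x modifier and makes it possible to put comments inside complicated patterns. Note that this applies only to data characters: whitespace may never appear inside a special character sequence in the pattern.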
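
As a sketch of how this reads in practice, here is a commented date pattern; the date format and sample strings are assumptions for illustration.

```php
<?php
// Whitespace and # comments inside the pattern are ignored under x.
$pattern = '/
    ^ (\d{4})   # year
    - (\d{2})   # month
    - (\d{2})   # day
    $
/x';

preg_match($pattern, '2013-05-30', $m);
print_r(array_slice($m, 1));   // the three captured groups

// To match a literal space under x, escape it (or put it in a class).
$escaped   = preg_match('/a\ b/x', 'a b');
$unescaped = preg_match('/a b/x', 'a b');   // pattern is effectively "ab"
```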

e
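If this modifier is set, preg_replace() performs the normal substitution of backreferences in the replacement string, evaluates the result as PHP code, and uses the value it returns to replace the matched text.

Only preg_replace() uses this modifier; other PCRE functions ignore it.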
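
Note that the e modifier was deprecated in PHP 5.5.0 and removed in PHP 7.0.0. A minimal sketch of the usual substitute, preg_replace_callback(), assuming the goal is to upper-case the first letter of each word; the sample text is made up.

```php
<?php
// Old style, unavailable since PHP 7.0:
//   preg_replace('/\b([a-z])/e', 'strtoupper("$1")', $text);

$text = 'hello regex world';

// The callback receives the match array and returns the replacement.
$result = preg_replace_callback(
    '/\b[a-z]/',
    function ($match) {
        return strtoupper($match[0]);
    },
    $text
);

echo $result, "\n";
```

Because the replacement is ordinary PHP code rather than an evaluated string, matched text never gets executed as code.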

A (PCRE_ANCHORED)
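If this modifier is set, the pattern is forced to be "anchored", that is, it can match only at the start of the string being searched. The same effect can be achieved with appropriate constructs in the pattern itself, which is the only way to do it in Perl.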
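
A small sketch with invented subject strings, comparing an unanchored search, the A modifier, and a leading ^ in the pattern:

```php
<?php
$free     = preg_match('/\d+/',  'abc123');       // finds digits later in the string
$anchored = preg_match('/\d+/A', 'abc123');       // must match at the very start
$atStart  = preg_match('/\d+/A', '123abc', $m);   // digits are at the start

// A comparable in-pattern construct, which also works in Perl.
$caret    = preg_match('/^\d+/', 'abc123');

var_dump($free, $anchored, $atStart, $caret);
```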

D (PCRE_DOLLAR_ENDONLY)
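If this modifier is set, a dollar metacharacter in the pattern matches only at the very end of the subject string. Without it, a dollar also matches immediately before the final character if that character is a newline (but not before any other newline). This modifier is ignored if the m modifier is set. Perl has no equivalent.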
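
This matters mostly for input validation. A minimal sketch, assuming the input should consist of digits only:

```php
<?php
// Without D, a trailing newline slips past the $ anchor.
$loose  = preg_match('/^\d+$/',  "123\n");

// With D, $ matches only at the very end of the subject.
$strict = preg_match('/^\d+$/D', "123\n");
$exact  = preg_match('/^\d+$/D', "123");

var_dump($loose, $strict, $exact);
```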

S
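When a pattern is going to be used several times, it is worth spending more time analyzing it in order to speed up matching. If this modifier is set, that extra analysis is performed. At present, studying a pattern is useful only for non-anchored patterns that do not have a single fixed starting character.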

U (PCRE_UNGREEDY)
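This modifier inverts the "greediness" of the quantifiers, so that they are not greedy by default but become greedy if followed by a question mark (?). It is not compatible with Perl. It can also be set with a (?U) modifier inside the pattern.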
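
The sketch below runs the same pattern three ways against an invented subject to show how U and a trailing ? interact.

```php
<?php
$subject = '<b>one</b> and <b>two</b>';

preg_match('/<b>.*<\/b>/',   $subject, $greedy);    // default: greedy
preg_match('/<b>.*<\/b>/U',  $subject, $lazy);      // U: lazy by default
preg_match('/<b>.*?<\/b>/U', $subject, $flipped);   // under U, ? makes it greedy

var_dump($greedy[0], $lazy[0], $flipped[0]);
```

In most code it is probably clearer to leave U off and write .*? where a lazy match is wanted, since readers expect the Perl meaning.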

X (PCRE_EXTRA)
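This modifier turns on additional functionality that is incompatible with Perl. Any backslash in a pattern that is followed by a letter with no special meaning causes an error, which reserves these combinations for future expansion. By default, as in Perl, a backslash followed by a letter with no special meaning is treated as a literal. At present no other features are controlled by this modifier.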

2013-05-29 20:38

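[Repost][Ubuntu] A brief introduction to file system formats

Posted by: tdb.bbs@ptt.cc (tbd), Board: Linux
Title: [Share] A brief introduction to file system formats
Posted from: PTT BBS (Wed Jul 19 06:22:33 2006)
Relayed via: SayYa!ctu-reader!news.nctu!ptt
Origin: sally.csie.ntu.edu.tw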

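In the MS Windows world, a hard disk can be formatted as NTFS, FAT32, FAT16 and so on. Likewise, GNU/Linux offers many different file system formats to choose from. The ones in common use under GNU/Linux today are Ext3, ReiserFS, XFS and JFS. Each format has its strengths and weaknesses, so a brief introduction to each follows.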

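Apart from Ext2, all of these are journaling file systems. What is a journal? The system uses some extra space to record the state of the data on disk, so that after an unclean shutdown it can restore its state without rescanning the whole disk.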

Ext2
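A very old file system format with no journaling support. Early Linux users will remember it: after every unclean shutdown the error check on the next boot took a long time, and without a clean shutdown you could lose many files at once. Few people use this file system now!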

Ext3
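An improved version of Ext2, so Ext2 can be upgraded directly to Ext3 without reformatting, which also makes old Ext2 systems more stable. The main difference from Ext2 is the added journal (metadata), so the system can recover quickly after an unclean shutdown. Because it is compatible with the older system, many distributions use Ext3 by default. In practical tests, however, its disk utilization is poor: only about 93% of the real capacity can be used, and its results in other performance tests are average. Formatting and creating the file system also takes tens of times longer than with the other types.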

ReiserFS - http://www.namesys.com
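A journaling file system created by Hans Reiser and named after him. Technically it is based on B*-trees, and its distinguishing feature is that it handles everything from large files to large numbers of small files efficiently. In practice, ReiserFS can be about 10 times faster than Ext3 when handling files smaller than 1k, so its specialty is dealing with many small files. In general use its performance is also above average.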

XFS - http://oss.sgi.com/projects/xfs/
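A file format designed by the graphics workstation company SGI for IRIX, its high-end graphics system, and also a journaling file system. SGI later ported it to GNU/Linux. Because it was designed for high-performance graphics work on high-end workstations, its stability and efficiency are beyond doubt. In practice it is the most efficient when handling a mix of file sizes, and it performs well in general use.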

JFS - http://jfs.sourceforge.net
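A journaling file system designed for the AIX series by IBM, the world's largest computer vendor. Technically it is based on B+-trees, unlike ReiserFS, which uses B*-trees. As for stability, IBM AIX servers use it, and many of those machines run in the financial sector, so its stability goes without saying. Its most important feature is that, of these file systems, it uses the least CPU when handling file I/O, that is, it has the lowest CPU utilization. Even with such frugal CPU use, its performance is still above average.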

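Ext3 performs poorly (the worst among the journaling file systems here), so why do so many people use it? Because at the time Ext3 could be upgraded directly from Ext2, with no need to back up, reformat, and copy the files back, it gained a large number of users. It cannot really be blamed for its performance: staying compatible with Ext2 leaves it with a lot of historical baggage. I therefore suggest considering ReiserFS, XFS or JFS for new computers. If performance is the priority, choose ReiserFS or XFS. If system resources are limited and you want the lowest CPU utilization, choose JFS, which has the best ratio of performance to resources.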

There are also some file system benchmarks on the web, listed here for reference:

http://www.debian-administration.org/articles/388
http://fsbench.netnation.com/
http://linuxgazette.net/122/TWDT.html
http://linuxgazette.net/102/piszcz.html

2013-05-20 22:41

[PHP] Implementing aHash, pHash, and dHash

<?php
/**
 * Image feature hash calculation
 *
 * @version     $Id: ImageHash.php 4429 2012-04-17 13:20:31Z jax $
 * @author      jax.hu
 *
 * <code>
 *  //Sample_1
 *  $hashA = ImageHash::pHash('001.jpg');
 *  $hashB = ImageHash::pHash('002.jpg');
 *  if(ImageHash::isSimilar($hashA, $hashB)){
 *
 *  }
 *
 * </code>
 */

class ImageHash {

    /** Read an image and scale it to the given size */
    public static function readImageTo($imagePath, $width, $height){
        if(!$imagePath || !file_exists($imagePath)){ return null; }

        if(class_exists('Imagick')){
            $image = new Imagick($imagePath);
            $image->thumbnailImage($width, $height);
            $img = imagecreatefromstring($image->getImageBlob());
            $image->destroy(); $image = null;

        }else{
            $createFunc = array(
                IMAGETYPE_GIF   =>'imageCreateFromGIF',
                IMAGETYPE_JPEG  =>'imageCreateFromJPEG',
                IMAGETYPE_PNG   =>'imageCreateFromPNG',
                IMAGETYPE_BMP   =>'imageCreateFromBMP',
                IMAGETYPE_WBMP  =>'imageCreateFromWBMP',
                IMAGETYPE_XBM   =>'imageCreateFromXBM',
            );

            $type = exif_imagetype($imagePath);
            if(!array_key_exists($type, $createFunc)){ return null; }

            $func = $createFunc[$type];
            if(!function_exists($func)){ return null; }

            $src = $func($imagePath);
            if(!$src){ return null; } /* the file could not be decoded */
            $img = imageCreateTrueColor($width, $height);
            imageCopyResized(
                $img, $src, 
                0,0,0,0, 
                $width, $height, imagesX($src),imagesY($src)
            );
            imagedestroy($src);
        }

        return $img;
    }



    /** Get the grayscale value of the pixel at ($x, $y) */
    public static function getGray($img,$x,$y){
        $col = imagecolorsforindex($img, imagecolorat($img,$x,$y));
        return intval($col['red']*0.3 + $col['green']*0.59 + $col['blue']*0.11);
    }



    /** Get the DCT cosine constants (computed once, then cached) */
    private static $_dctConst = null;
    public static function getDctConst(){
        if(self::$_dctConst){ return self::$_dctConst;}

        self::$_dctConst = array();
        for ($dctP=0; $dctP<8; $dctP++) {
            for ($p=0;$p<32;$p++) {
                self::$_dctConst[$dctP][$p] = 
                    cos( ((2*$p + 1)/64) * $dctP * pi() );
            }
        }

        return self::$_dctConst;
    }



    /** aHash of an image file
     * @param string $imagePath path to the image file
     * @return string hash of the image, or null on failure
     * */
    public static function aHash($imagePath){
        $img = self::readImageTo($imagePath, 8, 8);
        if(!$img){ return null; }

        $graySum = 0;
        $grays = array();
        for ($y=0; $y<8; $y++){
            for ($x=0; $x<8; $x++){
                $gray = self::getGray($img,$x,$y);
                $grays[] = $gray;
                $graySum +=  $gray;
            }
        }
        imagedestroy($img);

        /* Average grayscale value of all pixels */
        $average = $graySum/64;

        /* Compute the hash: one bit per pixel */
        foreach ($grays as $i => $gray){
            $grays[$i] = ($gray>=$average) ? '1' : '0';
        }

        return join('',$grays);
    }




    /** pHash of an image file
     * @param string $imagePath path to the image file
     * @return string hash of the image, or null on failure
     * */
    public static function pHash($imagePath){
        $img = self::readImageTo($imagePath, 32, 32);
        if(!$img){ return null; }

        /* Read the 32*32 grayscale values */
        $grays = array();
        for ($y=0; $y<32; $y++){
            for ($x=0; $x<32; $x++){
                $grays[$y][$x] = self::getGray($img,$x,$y);
            }
        }
        imagedestroy($img);


        /* Compute the 8*8 low-frequency DCT coefficients */
        $dctConst = self::getDctConst();
        $dctSum = 0;
        $dcts = array();
        for ($dctY=0; $dctY<8; $dctY++) {
            for ($dctX=0; $dctX<8; $dctX++) {

                $sum = 0;
                for ($y=0;$y<32;$y++) {
                    for ($x=0;$x<32;$x++) {
                        $sum += 
                            $dctConst[$dctY][$y] * 
                            $dctConst[$dctX][$x] * 
                            $grays[$y][$x];
                    }
                }

                /*apply coefficients: 1/sqrt(2) for each index that is 0*/
                $sum *= .25;
                if ($dctY == 0) { $sum *= 1/sqrt(2); }
                if ($dctX == 0) { $sum *= 1/sqrt(2); }

                $dcts[] = $sum;
                $dctSum +=  $sum;
            }
        }

        /* Average of the 64 DCT coefficients */
        $average = $dctSum/64;

        /* Compute the hash: one bit per coefficient */
        foreach ($dcts as $i => $dct){
            $dcts[$i] = ($dct>=$average) ? '1' : '0';
        }

        return join('',$dcts);
    }



    /** dHash of an image file
     * @param string $imagePath path to the image file
     * @return string hash of the image, or null on failure
     * */
    public static function dHash($imagePath){
        $img = self::readImageTo($imagePath, 9, 8);
        if(!$img){ return null; }

        $grays = array();
        for ($y=0; $y<8; $y++){
            for ($x=0; $x<9; $x++){
                $grays[$y][$x] = self::getGray($img,$x,$y);
            }
        }
        imagedestroy($img);

        $bitStr = array();
        for ($y=0; $y<8; $y++){
            for ($x=0; $x<8; $x++){
                $bitStr[] = ($grays[$y][$x] < $grays[$y][$x+1]) ? '1' : '0';
            }
        }

        return join('',$bitStr);
    }



   /** Compare two hash values and decide whether they are similar
    * @param string $hashStrA hash of image A
    * @param string $hashStrB hash of image B
    * @return bool true if the images are similar, otherwise false
    * */
   public static function isSimilar($hashStrA, $hashStrB){
       $aL = strlen($hashStrA); $bL = strlen($hashStrB);
       if ($aL !== $bL){ return false; }

       /* Hamming distance between the two hash values */
       $distance = 0;
       for($i=0; $i<$aL; $i++){
           /* square brackets: the {} string offset syntax was removed in PHP 8 */
           if ($hashStrA[$i] !== $hashStrB[$i]){ $distance++; }
       }

       return $distance <= 10;
   }

}
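
To make the comparison step concrete, here is a minimal standalone sketch of the Hamming distance that isSimilar() computes. The function name hammingDistance() and the sample bit strings are assumptions for illustration and are not part of the class above.

```php
<?php
// Hypothetical standalone helper mirroring the distance loop in isSimilar().
function hammingDistance($a, $b)
{
    $n = strlen($a);
    if ($n !== strlen($b)) { return null; }   // lengths must match

    $distance = 0;
    for ($i = 0; $i < $n; $i++) {
        if ($a[$i] !== $b[$i]) { $distance++; }   // count differing bits
    }
    return $distance;
}

// Two made-up 64-bit hashes that differ only in their first two bits.
$hashA = str_repeat('10', 32);
$hashB = substr_replace($hashA, '01', 0, 2);

echo hammingDistance($hashA, $hashB), "\n";
```

With the threshold of 10 used above, two 64-bit hashes count as similar when at most 10 of their bits differ; that cutoff is a tuning choice rather than a fixed rule.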

2013-05-01 21:59

[C#] Parsing XML into POCOs with LINQ

menu_config.xml

<?xml version="1.0" encoding="utf-8"?>
  <menu_config>
    <menu title="Article Management" url="~/Article" target="" allow="">
        <submenu title="List" url="~/Article/list" allow="" />
        <submenu />
        <submenu title="Add" url="~/Article/add" allow="" />
    </menu>
    <menu />
    <menu title="Account Management" url="~/Admin" />
  </menu_config>

//using System.Collections.Generic;
//using System.IO;
//using System.Linq;
//using System.Xml.Linq;
//using System.Text;

public class MenuDataModel
{
    /* POCO data fields */
    public string Title { get; set; }
    public string Url { get; set; }
    public string Target { get; set; }
    public string Allow { get; set; }
    public string Icon { get; set; }
    public List<MenuDataModel> Submenu { get; set; }

    /* Get the list of menu items */
    public static List<MenuDataModel> GetList(string menuConfigPath)
    {
        /* Read the XML file */
        var xmlContent = File.ReadAllText(menuConfigPath, Encoding.UTF8);
        var menuDocument = XDocument.Parse(xmlContent);

        /* Convert to POCOs with LINQ */
        return menuDocument.Root.Elements("menu")
        .Select(menu => new MenuDataModel
        {
            /* Read the attributes on the element (null when an attribute is missing) */
            Title   = (string) menu.Attribute("title"),
            Url     = (string) menu.Attribute("url"),
            Target  = (string) menu.Attribute("target"),
            Icon    = (string) menu.Attribute("icon") ?? "Item",
            Allow   = (string) menu.Attribute("allow"),

            /* Read the attributes on the child elements */
            Submenu = menu.Elements("submenu")
            .Select(sub => new MenuDataModel
            {
                Title   = (string) sub.Attribute("title"),
                Url     = (string) sub.Attribute("url"),
                Target  = (string) sub.Attribute("target"),
                Icon    = (string) sub.Attribute("icon") ?? "Item",
                Allow   = (string) sub.Attribute("allow"),
            }).ToList(),
        }).ToList();
    }
}


void Main()
{
    var path = @"D:\menu_config.xml";

    List<MenuDataModel> list =  MenuDataModel.GetList(path);

    list.Dump();
}

2013-05-01 21:42

[C#] Getting the title of the page at a URL

//using System.Net;
//using System.IO;
//using System.Text;
//using System.Text.RegularExpressions;

string url = @"http://msdn.microsoft.com/en-us/library/az24scfc.aspx";
string title = String.Empty;

WebResponse response = null;
WebRequest request = WebRequest.Create(url);

/* Request timeout in milliseconds */
request.Timeout = 10000; 

try{
    /* Fetch the page at the URL */
    response = request.GetResponse();
    StreamReader stream = new StreamReader(
        response.GetResponseStream(), Encoding.UTF8
    );

    /* Read at most the first 4096 characters */
    char[] buf = new char[4096];
    int read = stream.ReadBlock(buf, 0, buf.Length);

    /* Search for the title text, using only the characters actually read */
    string pageText = new String(buf, 0, read);
    string pattern = @"(?<=<title[^>]*>)([^<]*)(?=</title>)";
    title = Regex.Match(pageText, pattern, RegexOptions.IgnoreCase)
            .Value.Trim();

}catch(WebException){
    /* Request failed: leave title empty */
}finally{
    if(response!=null){ response.Close(); }
}

title.Dump();
