GB2312 to UTF-8 conversion (VBScript + JScript)

2024-07-21 02:25:14

Source: reposted

Contributed by: a site user

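Yesterday I looked through the cocoon counter code and found that it does the conversion in VBScript. I spent a whole morning studying it and still came away confused.

Its VB conversion function looks like this: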

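function decodeansi(s)
    dim i, stmp, sresult, stmp1
    sresult = ""
    for i = 1 to len(s)
        if mid(s, i, 1) = "%" then
            ' the "&h" prefix makes the two characters after "%" parse as hex
            stmp = "&h" & mid(s, i + 1, 2)
            if isnumeric(stmp) then
                if cint(stmp) = 0 then
                    ' %00: skip it
                    i = i + 2
                elseif cint(stmp) > 0 and cint(stmp) < 128 then
                    ' single-byte ASCII character
                    sresult = sresult & chr(stmp)
                    i = i + 2
                else
                    ' high byte: try to pair it with a following %xx
                    if mid(s, i + 3, 1) = "%" then
                        stmp1 = "&h" & mid(s, i + 4, 2)
                        if isnumeric(stmp1) then
                            ' double-byte code = lead byte * 256 + trail byte
                            sresult = sresult & chr(cint(stmp) * 16 * 16 + cint(stmp1))
                            i = i + 5
                        end if
                    else
                        ' lone high byte
                        sresult = sresult & chr(stmp)
                        i = i + 2
                    end if
                end if
            else
                ' "%" not followed by a hex value: keep it literally
                sresult = sresult & mid(s, i, 1)
            end if
        else
            ' ordinary character: copy it through
            sresult = sresult & mid(s, i, 1)
        end if
    next
    decodeansi = sresult
end function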
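
To make the control flow easier to follow, here is a rough JScript-style sketch of the same scan. It is not part of cocoon counter: the helper name ansiCodesFromEscaped is hypothetical, and instead of calling chr() it only returns the numeric codes that would be handed to chr(). It also assumes a strict two-hex-digit check, and it treats a high byte followed by a malformed %xx as a lone byte, which is a simplification of the original.

```javascript
// Hypothetical helper: follows the branches of decodeansi, but collects the
// numeric codes that would be passed to chr() instead of decoding them.
function ansiCodesFromEscaped(s) {
  var out = [];
  var isHexPair = function (t) { return /^[0-9a-fA-F]{2}$/.test(t); };
  for (var i = 0; i < s.length; i++) {
    var pair = s.substr(i + 1, 2);
    if (s.charAt(i) !== "%" || !isHexPair(pair)) {
      out.push(s.charAt(i));              // literal character, copied through
      continue;
    }
    var lead = parseInt(pair, 16);
    var next = s.substr(i + 4, 2);
    if (lead === 0) {
      i += 2;                             // %00 is dropped
    } else if (lead < 128) {
      out.push(lead);                     // single-byte ASCII code
      i += 2;
    } else if (s.charAt(i + 3) === "%" && isHexPair(next)) {
      out.push(lead * 256 + parseInt(next, 16)); // double-byte code
      i += 5;
    } else {
      out.push(lead);                     // lone high byte
      i += 2;
    }
  }
  return out;
}
```

For the GB2312 bytes of "中", ansiCodesFromEscaped("%D6%D0") yields the single code 54992, which is the value used with chr() in the tests that follow.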

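In other words, it uses chr() to turn a decimal ANSI character code into a character. The resulting string itself should be Unicode, which means VBS has done the GB2312-to-Unicode conversion automatically. Here is some data from my tests.
Test code (the function above must be included before it):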
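
The labels in the results correspond to JScript expressions like the ones in this minimal sketch. It assumes a correctly decoded "中" as a stand-in for the VBScript chr(54992) result, and the results object is only an illustrative way to collect the values.

```javascript
// Stand-in for the VBScript side: assumes chr(54992) decoded correctly to "中".
var strx = "中";

var results = {
  charCode: strx.charCodeAt(0),          // UTF-16 code unit of the first character
  escaped: escape(strx),                 // legacy %uXXXX escaping
  uriEncoded: encodeURI(strx),           // percent-encoded UTF-8 bytes
  roundTrip: String.fromCharCode(20013)  // code unit back to a character
};
```

When the string is decoded correctly, these come out as 20013, %u4E2D, %E4%B8%AD and 中.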

Results for each combination of file encoding, codepage and charset:

File saved as ANSI:
codepage=936:
Response.Charset = "gb2312";
strx = chr(54992)
strx:中
strx.charCodeAt(0):20013
"中".charCodeAt(0):20013
escape(strx):%u4E2D
encodeURI(strx):%E4%B8%AD
escape("中"):%u4E2D
String.fromCharCode(20013):中

Response.Charset = "utf-8";
strx = chr(54992)
strx:֐
strx.charCodeAt(0):20013
"֐".charCodeAt(0):20013
escape(strx):%u4E2D
encodeURI(strx):%E4%B8%AD
escape("֐"):%u4E2D
String.fromCharCode(20013):֐

codepage=65001:
Response.Charset = "gb2312";
strx = chr(54992)
strx:涓
strx.charCodeAt(0):20013
"".charCodeAt(0):-1.#IND
escape(strx):%u4E2D
encodeURI(strx):%E4%B8%AD
escape(""):
String.fromCharCode(20013):涓

Response.Charset = "utf-8";
strx = chr(54992)
strx:㝤
strx.charCodeAt(0):14180
"".charCodeAt(0):-1.#IND
escape(strx):%u3764
encodeURI(strx):%E3%9D%A4
escape(""):
String.fromCharCode(20013):中

File saved as UTF-8:
codepage=65001:
Response.Charset = "gb2312";
strx = chr(54992)
strx:涓
strx.charCodeAt(0):20013
"涓?.charCodeAt(0):20013
escape(strx):%u4E2D
encodeURI(strx):%E4%B8%AD
escape("涓?):%u4E2D
String.fromCharCode(20013):涓

Response.Charset = "utf-8";
strx = chr(54992)
strx:中
strx.charCodeAt(0):20013
"中".charCodeAt(0):20013
escape(strx):%u4E2D
encodeURI(strx):%E4%B8%AD
escape("中"):%u4E2D
String.fromCharCode(20013):中

codepage=936:
Active Server Pages error 'ASP 0245'
Mixed usage of Code Page values
/referer_alapha/test2.asp, line 1
The @CODEPAGE value specified differs from that of the including file's CODEPAGE or the file's saved format.

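Ha, dizzy yet? So am I. I cannot work out why the file's saved encoding should have anything to do with chr(54992), while String.fromCharCode(20013) gives the correct result (see the fourth set of test data). Probably the logic inside VBS is just too tangled.
Either way, with this method, converting GB2312 to UTF-8 becomes much simpler.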
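
One thing that may help when reading the numbers above: VBScript's chr() takes a character code in the current ANSI code page, whereas String.fromCharCode takes a Unicode (UTF-16) code unit. The sketch below only illustrates the arithmetic linking 54992, 20013 and the percent-encoded forms. The helper names are hypothetical, and utf8Bytes assumes a non-surrogate code point below 0x10000.

```javascript
// Split an ANSI code, as passed to chr(), into its lead and trail bytes.
function ansiCodeToBytes(code) {
  return code > 255 ? [code >> 8, code & 0xff] : [code];
}

// Encode one BMP code point (non-surrogate, below 0x10000) as UTF-8 bytes.
function utf8Bytes(cp) {
  if (cp < 0x80) return [cp];
  if (cp < 0x800) return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)];
  return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
}

// Format bytes as %XX sequences, two uppercase hex digits each.
function toPercentHex(bytes) {
  var out = "";
  for (var i = 0; i < bytes.length; i++) {
    out += "%" + (bytes[i] < 16 ? "0" : "") + bytes[i].toString(16).toUpperCase();
  }
  return out;
}

var gbForm = toPercentHex(ansiCodeToBytes(54992)); // GB2312 bytes of "中"
var utfForm = toPercentHex(utf8Bytes(20013));      // UTF-8 bytes of U+4E2D
```

Here gbForm is "%D6%D0" and utfForm is "%E4%B8%AD", the same value encodeURI reports in the test output.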
