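Everything sent over the network is a binary byte stream. So how does the server decode it, and how does it know which character set to use? Let's dig into the Tomcat connector and look at this closely. First, consider the following code, a very simple form.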
<form action="demo01?name=中国" method="post">
    <input type="text" name="name1" value="张三"/>
    <input type="submit" value="提交"/>
</form>

In the controller we read the parameters directly from HttpServletRequest instead of letting Spring bind them.
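// The form submits with POST, so the handler is mapped to POST.
@RequestMapping(value = "/demo01", method = RequestMethod.POST)
public String dologin1(HttpServletRequest request) throws UnsupportedEncodingException {
    // Encoding declared by the request; null if the client sent none
    log.info(request.getCharacterEncoding());
    // "name" comes from the query string, "name1" from the request body
    log.info("name:中国" + request.getParameter("name"));
    log.info("name1:张三" + request.getParameter("name1"));
    return "login";
}

Run Tomcat and submit the form: the Chinese parameter values come out garbled. Inspecting the request with Fiddler shows the raw request details. Let's run a quick test: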
// Hex is org.apache.commons.codec.binary.Hex (Apache Commons Codec)
@Test
public void test() throws UnsupportedEncodingException {
    String str = "中国";
    byte[] bytes = str.getBytes("utf-8");
    // The UTF-8 bytes as hex
    System.out.println(Hex.encodeHex(bytes));
    // The same bytes misread as ISO-8859-1
    System.out.println(new String(bytes, "iso8859-1"));

    String str1 = "张三";
    byte[] bytes1 = str1.getBytes("utf-8");
    System.out.println(Hex.encodeHex(bytes1));
    System.out.println(new String(bytes1, "iso8859-1"));
}

It prints the following:
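e4b8ade59bbd
ä¸å›½
e5bca0e4b889
å¼ ä¸‰

The second and fourth lines are the UTF-8 bytes decoded as ISO-8859-1. From this we can see that Chrome, the browser I am using, encodes Chinese as UTF-8 by default, while Tomcat decodes with ISO-8859-1 by default (this is the default through Tomcat 7; from Tomcat 8 onward URIEncoding defaults to UTF-8). The same bytes map to different characters in the two encodings, and that is what produces the garbled text.

Since this is an encoding problem, it can certainly be solved. The Tomcat manual shows that the connector accepts a URIEncoding attribute that sets the URI encoding:

This specifies the character encoding used to decode the URI bytes, after %xx decoding the URL. If not specified, ISO-8859-1 will be used.

Configure it in server.xml as follows: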
<Connector connectionTimeout="20000"
           port="8080"
           protocol="HTTP/1.1"
           URIEncoding="utf-8"
           redirectPort="8443"/>

Run Tomcat again and the URI parameter is now decoded correctly.
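To make the effect of URIEncoding concrete, here is a minimal sketch that decodes the percent-encoded query value twice, once per charset. It uses the JDK's URLDecoder as a stand-in for Tomcat's own URI decoding, which is an assumption for illustration only; the class and method names are hypothetical. It also shows the common workaround of recovering a string that was already misread as ISO-8859-1.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; URLDecoder stands in for Tomcat's URI decoding.
class UriDecodingDemo {

    // Percent-decode the value, then interpret the resulting bytes in the
    // given charset. URLDecoder.decode(String, Charset) requires Java 10+.
    static String decodeQueryValue(String raw, Charset uriEncoding) {
        return URLDecoder.decode(raw, uriEncoding);
    }

    // ISO-8859-1 maps every byte to exactly one char, so a string misread
    // with it can be turned back into the original bytes and re-decoded.
    static String recover(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The two characters of the query value from the form, as UTF-8 bytes
        String raw = "%E4%B8%AD%E5%9B%BD";
        // Unicode escapes keep this file independent of the source encoding
        String expected = "\u4e2d\u56fd";

        String wrong = decodeQueryValue(raw, StandardCharsets.ISO_8859_1);
        String right = decodeQueryValue(raw, StandardCharsets.UTF_8);

        System.out.println(wrong.length());                  // 6: one char per byte
        System.out.println(right.equals(expected));          // true
        System.out.println(recover(wrong).equals(expected)); // true
    }
}
```

The recovery trick only works because no information is lost when bytes are read as ISO-8859-1. Setting the correct encoding on the connector is still the cleaner fix.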
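So how are the request body parameters decoded? Looking at the Servlet source, we find that the body encoding can be set before the parameters are read. From that we can guess that Tomcat parses the body parameters lazily, the first time they are used. This is easy to understand: parsing strings costs performance, so if the parameters are never used there is no need to parse them and no cost is paid.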
The Javadoc of ServletRequest.setCharacterEncoding:

/**
 * Overrides the name of the character encoding used in the body of this
 * request. This method must be called prior to reading request parameters
 * or reading input using getReader(). Otherwise, it has no effect.
 *
 * @param env <code>String</code> containing the name of
 *            the character encoding.
 * @throws UnsupportedEncodingException if this
 *            ServletRequest is still in a state where a
 *            character encoding may be set, but the specified
 *            encoding is invalid
 */
public void setCharacterEncoding(String env) throws UnsupportedEncodingException;

Change the controller code to set the encoding to UTF-8:
@RequestMapping(value = "/demo01", method = RequestMethod.POST)
public String dologin(HttpServletRequest request) throws UnsupportedEncodingException {
    // Must be called before the first getParameter() call to take effect
    request.setCharacterEncoding("utf-8");
    log.info(request.getCharacterEncoding());
    log.info("name:中国" + request.getParameter("name"));
    log.info("name1:张三" + request.getParameter("name1"));
    return "login";
}

Run Tomcat and the encoding problem is solved.

Do we have to set the encoding before every parameter read? There must be an easier way, and there is: a filter. We can use the ready-made one that Spring provides, org.springframework.web.filter.CharacterEncodingFilter. Here is its code:
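@Override
protected void doFilterInternal(
        HttpServletRequest request, HttpServletResponse response, FilterChain filterChain)
        throws ServletException, IOException {

    // Set the encoding if forced, or if the request did not declare one
    if (this.encoding != null && (this.forceEncoding || request.getCharacterEncoding() == null)) {
        request.setCharacterEncoding(this.encoding);
        if (this.forceEncoding) {
            response.setCharacterEncoding(this.encoding);
        }
    }
    filterChain.doFilter(request, response);
}

It turns out the filter merely sets the request encoding (and the response encoding when forceEncoding is set). There is nothing mysterious about it, but since a ready-made one exists, why reinvent the wheel?

It is a bit of a pity that I could not trace the problem through the Tomcat source. I read parts of it without getting the gist, which only shows that my skills are not there yet and I need to keep improving. Still, reasoned inference is a perfectly good way to solve a problem.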
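The lazy-parsing inference made earlier can be sketched in plain Java. This is a hypothetical model, not Tomcat's actual implementation: the class LazyBodyParams, its fields, and the ISO-8859-1 default are all assumptions chosen to mirror the behavior the Javadoc describes, where setting the encoding after the first parameter read has no effect.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a request object; NOT Tomcat's real code.
class LazyBodyParams {
    private final String rawBody;                        // still percent-encoded
    private Charset encoding = StandardCharsets.ISO_8859_1; // assumed default
    private Map<String, String> params;                  // null until first read

    LazyBodyParams(String rawBody) {
        this.rawBody = rawBody;
    }

    // Only takes effect while the body has not been parsed yet.
    void setCharacterEncoding(String name) {
        if (params == null) {
            encoding = Charset.forName(name);
        }
    }

    // Parses the body on first use, with whatever encoding is set by then.
    String getParameter(String name) {
        if (params == null) {
            params = new HashMap<>();
            for (String pair : rawBody.split("&")) {
                int eq = pair.indexOf('=');
                if (eq < 0) {
                    continue; // skip fragments without a value
                }
                // URLDecoder.decode(String, Charset) requires Java 10+
                params.put(URLDecoder.decode(pair.substring(0, eq), encoding),
                           URLDecoder.decode(pair.substring(eq + 1), encoding));
            }
        }
        return params.get(name);
    }

    public static void main(String[] args) {
        // The form field value as UTF-8 bytes, percent-encoded
        String body = "name1=%E5%BC%A0%E4%B8%89";
        String expected = "\u5f20\u4e09";

        LazyBodyParams early = new LazyBodyParams(body);
        early.setCharacterEncoding("UTF-8");   // before the first read
        System.out.println(expected.equals(early.getParameter("name1"))); // true

        LazyBodyParams late = new LazyBodyParams(body);
        late.getParameter("name1");            // triggers parsing
        late.setCharacterEncoding("UTF-8");    // too late, ignored
        System.out.println(expected.equals(late.getParameter("name1")));  // false
    }
}
```

Under this model, a filter that runs before any controller code is a natural place to set the encoding, because nothing has read the parameters yet.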