Merging other languages
This commit is contained in:
437
en/07.1.md
437
en/07.1.md
@@ -1,221 +1,216 @@
|
||||
# 7.1 XML处理
|
||||
XML作为一种数据交换和信息传递的格式已经十分普及。而随着Web服务日益广泛的应用,现在XML在日常的开发工作中也扮演了愈发重要的角色。这一小节, 我们将就Go语言标准包中的XML相关处理的包进行介绍。
|
||||
|
||||
这个小节不会涉及XML规范相关的内容(如需了解相关知识请参考其他文献),而是介绍如何用Go语言来编解码XML文件相关的知识。
|
||||
|
||||
假如你是一名运维人员,你为你所管理的所有服务器生成了如下内容的xml的配置文件:
|
||||
|
||||
<?xml version="1.0" encoding="utf-8"?>
|
||||
<servers version="1">
|
||||
<server>
|
||||
<serverName>Shanghai_VPN</serverName>
|
||||
<serverIP>127.0.0.1</serverIP>
|
||||
</server>
|
||||
<server>
|
||||
<serverName>Beijing_VPN</serverName>
|
||||
<serverIP>127.0.0.2</serverIP>
|
||||
</server>
|
||||
</servers>
|
||||
|
||||
上面的XML文档描述了两个服务器的信息,包含了服务器名和服务器的IP信息,接下来的Go例子以此XML描述的信息进行操作。
|
||||
|
||||
## 解析XML
|
||||
如何解析如上这个XML文件呢? 我们可以通过xml包的`Unmarshal`函数来达到我们的目的
|
||||
|
||||
func Unmarshal(data []byte, v interface{}) error
|
||||
|
||||
data接收的是XML数据流,v是需要输出的结构,定义为interface,也就是可以把XML转换为任意的格式。我们这里主要介绍struct的转换,因为struct和XML都有类似树结构的特征。
|
||||
|
||||
示例代码如下:
|
||||
|
||||
package main
|
||||
|
||||
import (
|
||||
"encoding/xml"
|
||||
"fmt"
|
||||
"io/ioutil"
|
||||
"os"
|
||||
)
|
||||
|
||||
type Recurlyservers struct {
|
||||
XMLName xml.Name `xml:"servers"`
|
||||
Version string `xml:"version,attr"`
|
||||
Svs []server `xml:"server"`
|
||||
Description string `xml:",innerxml"`
|
||||
}
|
||||
|
||||
type server struct {
|
||||
XMLName xml.Name `xml:"server"`
|
||||
ServerName string `xml:"serverName"`
|
||||
ServerIP string `xml:"serverIP"`
|
||||
}
|
||||
|
||||
func main() {
|
||||
file, err := os.Open("servers.xml") // For read access.
|
||||
if err != nil {
|
||||
fmt.Printf("error: %v", err)
|
||||
return
|
||||
}
|
||||
defer file.Close()
|
||||
data, err := ioutil.ReadAll(file)
|
||||
if err != nil {
|
||||
fmt.Printf("error: %v", err)
|
||||
return
|
||||
}
|
||||
v := Recurlyservers{}
|
||||
err = xml.Unmarshal(data, &v)
|
||||
if err != nil {
|
||||
fmt.Printf("error: %v", err)
|
||||
return
|
||||
}
|
||||
|
||||
fmt.Println(v)
|
||||
}
|
||||
|
||||
|
||||
XML本质上是一种树形的数据格式,而我们可以定义与之匹配的go 语言的 struct类型,然后通过xml.Unmarshal来将xml中的数据解析成对应的struct对象。如上例子输出如下数据
|
||||
|
||||
{{ servers} 1 [{{ server} Shanghai_VPN 127.0.0.1} {{ server} Beijing_VPN 127.0.0.2}]
|
||||
<server>
|
||||
<serverName>Shanghai_VPN</serverName>
|
||||
<serverIP>127.0.0.1</serverIP>
|
||||
</server>
|
||||
<server>
|
||||
<serverName>Beijing_VPN</serverName>
|
||||
<serverIP>127.0.0.2</serverIP>
|
||||
</server>
|
||||
}
|
||||
|
||||
|
||||
上面的例子中,将xml文件解析成对应的struct对象是通过`xml.Unmarshal`来完成的,这个过程是如何实现的?可以看到我们的struct定义后面多了一些类似于`xml:"serverName"`这样的内容,这个是struct的一个特性,它们被称为 struct tag,它们是用来辅助反射的。我们来看一下`Unmarshal`的定义:
|
||||
|
||||
func Unmarshal(data []byte, v interface{}) error
|
||||
|
||||
我们看到函数定义了两个参数,第一个是XML数据流,第二个是存储的对应类型,目前支持struct、slice和string,XML包内部采用了反射来进行数据的映射,所以v里面的字段必须是导出的。`Unmarshal`解析的时候XML元素和字段怎么对应起来的呢?这是有一个优先级读取流程的,首先会读取struct tag,如果没有,那么就会对应字段名。必须注意一点的是解析的时候tag、字段名、XML元素都是大小写敏感的,所以必须一一对应字段。
|
||||
|
||||
Go语言的反射机制,可以利用这些tag信息来将来自XML文件中的数据反射成对应的struct对象,关于反射如何利用struct tag的更多内容请参阅reflect中的相关内容。
|
||||
|
||||
解析XML到struct的时候遵循如下的规则:
|
||||
|
||||
- 如果struct的一个字段是string或者[]byte类型且它的tag含有`",innerxml"`,Unmarshal将会将此字段所对应的元素内所有内嵌的原始xml累加到此字段上,如上面例子Description定义。最后的输出是
|
||||
|
||||
<server>
|
||||
<serverName>Shanghai_VPN</serverName>
|
||||
<serverIP>127.0.0.1</serverIP>
|
||||
</server>
|
||||
<server>
|
||||
<serverName>Beijing_VPN</serverName>
|
||||
<serverIP>127.0.0.2</serverIP>
|
||||
</server>
|
||||
|
||||
- 如果struct中有一个叫做XMLName,且类型为xml.Name字段,那么在解析的时候就会保存这个element的名字到该字段,如上面例子中的servers。
|
||||
- 如果某个struct字段的tag定义中含有XML结构中element的名称,那么解析的时候就会把相应的element值赋值给该字段,如上servername和serverip定义。
|
||||
- 如果某个struct字段的tag定义了中含有`",attr"`,那么解析的时候就会将该结构所对应的element的与字段同名的属性的值赋值给该字段,如上version定义。
|
||||
- 如果某个struct字段的tag定义 型如`"a>b>c"`,则解析的时候,会将xml结构a下面的b下面的c元素的值赋值给该字段。
|
||||
- 如果某个struct字段的tag定义了`"-"`,那么不会为该字段解析匹配任何xml数据。
|
||||
- 如果struct字段后面的tag定义了`",any"`,如果他的子元素在不满足其他的规则的时候就会匹配到这个字段。
|
||||
- 如果某个XML元素包含一条或者多条注释,那么这些注释将被累加到第一个tag含有",comments"的字段上,这个字段的类型可能是[]byte或string,如果没有这样的字段存在,那么注释将会被抛弃。
|
||||
|
||||
上面详细讲述了如何定义struct的tag。 只要设置对了tag,那么XML解析就如上面示例般简单,tag和XML的element是一一对应的关系,如上所示,我们还可以通过slice来表示多个同级元素。
|
||||
|
||||
>注意: 为了正确解析,go语言的xml包要求struct定义中的所有字段必须是可导出的(即首字母大写)
|
||||
|
||||
## 输出XML
|
||||
假若我们不是要解析如上所示的XML文件,而是生成它,那么在go语言中又该如何实现呢? xml包中提供了`Marshal`和`MarshalIndent`两个函数,来满足我们的需求。这两个函数主要的区别是第二个函数会增加前缀和缩进,函数的定义如下所示:
|
||||
|
||||
func Marshal(v interface{}) ([]byte, error)
|
||||
func MarshalIndent(v interface{}, prefix, indent string) ([]byte, error)
|
||||
|
||||
两个函数第一个参数是用来生成XML的结构定义类型数据,都是返回生成的XML数据流。
|
||||
|
||||
下面我们来看一下如何输出如上的XML:
|
||||
|
||||
package main
|
||||
|
||||
import (
|
||||
"encoding/xml"
|
||||
"fmt"
|
||||
"os"
|
||||
)
|
||||
|
||||
type Servers struct {
|
||||
XMLName xml.Name `xml:"servers"`
|
||||
Version string `xml:"version,attr"`
|
||||
Svs []server `xml:"server"`
|
||||
}
|
||||
|
||||
type server struct {
|
||||
ServerName string `xml:"serverName"`
|
||||
ServerIP string `xml:"serverIP"`
|
||||
}
|
||||
|
||||
func main() {
|
||||
v := &Servers{Version: "1"}
|
||||
v.Svs = append(v.Svs, server{"Shanghai_VPN", "127.0.0.1"})
|
||||
v.Svs = append(v.Svs, server{"Beijing_VPN", "127.0.0.2"})
|
||||
output, err := xml.MarshalIndent(v, " ", " ")
|
||||
if err != nil {
|
||||
fmt.Printf("error: %v\n", err)
|
||||
}
|
||||
os.Stdout.Write([]byte(xml.Header))
|
||||
|
||||
os.Stdout.Write(output)
|
||||
}
|
||||
上面的代码输出如下信息:
|
||||
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<servers version="1">
|
||||
<server>
|
||||
<serverName>Shanghai_VPN</serverName>
|
||||
<serverIP>127.0.0.1</serverIP>
|
||||
</server>
|
||||
<server>
|
||||
<serverName>Beijing_VPN</serverName>
|
||||
<serverIP>127.0.0.2</serverIP>
|
||||
</server>
|
||||
</servers>
|
||||
|
||||
和我们之前定义的文件的格式一模一样,之所以会有`os.Stdout.Write([]byte(xml.Header))` 这句代码的出现,是因为`xml.MarshalIndent`或者`xml.Marshal`输出的信息都是不带XML头的,为了生成正确的xml文件,我们使用了xml包预定义的Header变量。
|
||||
|
||||
我们看到`Marshal`函数接收的参数v是interface{}类型的,即它可以接受任意类型的参数,那么xml包,根据什么规则来生成相应的XML文件呢?
|
||||
|
||||
- 如果v是 array或者slice,那么输出每一个元素,类似<type>value</type>
|
||||
- 如果v是指针,那么会Marshal指针指向的内容,如果指针为空,什么都不输出
|
||||
- 如果v是interface,那么就处理interface所包含的数据
|
||||
- 如果v是其他数据类型,就会输出这个数据类型所拥有的字段信息
|
||||
|
||||
生成的XML文件中的element的名字又是根据什么决定的呢?元素名按照如下优先级从struct中获取:
|
||||
|
||||
- 如果v是struct,XMLName的tag中定义的名称
|
||||
- 类型为xml.Name的名叫XMLName的字段的值
|
||||
- 通过struct中字段的tag来获取
|
||||
- 通过struct的字段名用来获取
|
||||
- marshall的类型名称
|
||||
|
||||
我们应如何设置struct 中字段的tag信息以控制最终xml文件的生成呢?
|
||||
|
||||
- XMLName不会被输出
|
||||
- tag中含有`"-"`的字段不会输出
|
||||
- tag中含有`"name,attr"`,会以name作为属性名,字段值作为值输出为这个XML元素的属性,如上version字段所描述
|
||||
- tag中含有`",attr"`,会以这个struct的字段名作为属性名输出为XML元素的属性,类似上一条,只是这个name默认是字段名了。
|
||||
- tag中含有`",chardata"`,输出为xml的 character data而非element。
|
||||
- tag中含有`",innerxml"`,将会被原样输出,而不会进行常规的编码过程
|
||||
- tag中含有`",comment"`,将被当作xml注释来输出,而不会进行常规的编码过程,字段值中不能含有"--"字符串
|
||||
- tag中含有`"omitempty"`,如果该字段的值为空值那么该字段就不会被输出到XML,空值包括:false、0、nil指针或nil接口,任何长度为0的array, slice, map或者string
|
||||
- tag中含有`"a>b>c"`,那么就会循环输出三个元素a包含b,b包含c,例如如下代码就会输出
|
||||
|
||||
FirstName string `xml:"name>first"`
|
||||
LastName string `xml:"name>last"`
|
||||
|
||||
<name>
|
||||
<first>Asta</first>
|
||||
<last>Xie</last>
|
||||
</name>
|
||||
|
||||
|
||||
上面我们介绍了如何使用Go语言的xml包来编/解码XML文件,重要的一点是对XML的所有操作都是通过struct tag来实现的,所以学会对struct tag的运用变得非常重要,在文章中我们简要的列举了如何定义tag。更多内容或tag定义请参看相应的官方资料。
|
||||
|
||||
## links
|
||||
* [目录](<preface.md>)
|
||||
* 上一节: [文本处理](<07.0.md>)
|
||||
* 下一节: [Json处理](<07.2.md>)
|
||||
# 7.1 XML
|
||||
|
||||
XML is a commonly used data communication format in web services. Today, it's assuming a more and more important role in web development. In this section, we're going to introduce how to work with XML through Go's standard library.
|
||||
|
||||
I will not make any attempts to teach XML's syntax or conventions. For that, please read more documentation about XML itself. We will only focus on how to encode and decode XML files in Go.
|
||||
|
||||
Suppose you work in IT, and you have to deal with the following XML configuration file:
|
||||
|
||||
<?xml version="1.0" encoding="utf-8"?>
|
||||
<servers version="1">
|
||||
<server>
|
||||
<serverName>Shanghai_VPN</serverName>
|
||||
<serverIP>127.0.0.1</serverIP>
|
||||
</server>
|
||||
<server>
|
||||
<serverName>Beijing_VPN</serverName>
|
||||
<serverIP>127.0.0.2</serverIP>
|
||||
</server>
|
||||
</servers>
|
||||
|
||||
The above XML document contains two kinds of information about your server: the server name and IP. We will use this document in our following examples.
|
||||
|
||||
## Parse XML
|
||||
|
||||
How do we parse this XML document? We can use the `Unmarshal` function in Go's `xml` package to do this.
|
||||
|
||||
func Unmarshal(data []byte, v interface{}) error
|
||||
|
||||
the `data` parameter receives a data stream from an XML source, and `v` is the structure you want to output the parsed XML to. It is an interface, which means you can convert XML to any structure you desire. Here, we'll only talk about how to convert from XML to the `struct` type since they share similar tree structures.
|
||||
|
||||
Sample code:
|
||||
|
||||
package main
|
||||
|
||||
import (
|
||||
"encoding/xml"
|
||||
"fmt"
|
||||
"io/ioutil"
|
||||
"os"
|
||||
)
|
||||
|
||||
type Recurlyservers struct {
|
||||
XMLName xml.Name `xml:"servers"`
|
||||
Version string `xml:"version,attr"`
|
||||
Svs []server `xml:"server"`
|
||||
Description string `xml:",innerxml"`
|
||||
}
|
||||
|
||||
type server struct {
|
||||
XMLName xml.Name `xml:"server"`
|
||||
ServerName string `xml:"serverName"`
|
||||
ServerIP string `xml:"serverIP"`
|
||||
}
|
||||
|
||||
func main() {
|
||||
file, err := os.Open("servers.xml") // For read access.
|
||||
if err != nil {
|
||||
fmt.Printf("error: %v", err)
|
||||
return
|
||||
}
|
||||
defer file.Close()
|
||||
data, err := ioutil.ReadAll(file)
|
||||
if err != nil {
|
||||
fmt.Printf("error: %v", err)
|
||||
return
|
||||
}
|
||||
v := Recurlyservers{}
|
||||
err = xml.Unmarshal(data, &v)
|
||||
if err != nil {
|
||||
fmt.Printf("error: %v", err)
|
||||
return
|
||||
}
|
||||
|
||||
fmt.Println(v)
|
||||
}
|
||||
|
||||
XML is actually a tree data structure, and we can define a very similar structure using structs in Go, then use `xml.Unmarshal` to convert from XML to our struct object. The sample code will print the following content:
|
||||
|
||||
{{ servers} 1 [{{ server} Shanghai_VPN 127.0.0.1} {{ server} Beijing_VPN 127.0.0.2}]
|
||||
<server>
|
||||
<serverName>Shanghai_VPN</serverName>
|
||||
<serverIP>127.0.0.1</serverIP>
|
||||
</server>
|
||||
<server>
|
||||
<serverName>Beijing_VPN</serverName>
|
||||
<serverIP>127.0.0.2</serverIP>
|
||||
</server>
|
||||
}
|
||||
|
||||
We use `xml.Unmarshal` to parse the XML document to the corresponding struct object. You should see that we have something like `xml:"serverName"` in our struct. This is a feature of structs called `struct tags` for helping with reflection. Let's see the definition of `Unmarshal` again:
|
||||
|
||||
func Unmarshal(data []byte, v interface{}) error
|
||||
|
||||
The first argument is an XML data stream. The second argument is storage type and supports the struct, slice and string types. Go's XML package uses reflection for data mapping, so all fields in v should be exported. However, this causes a problem: how can it know which XML field corresponds to the mapped struct field? The answer is that the XML parser parses data in a certain order. The library will try to find the matching struct tag first. If a match cannot be found then it searches through the struct field names. Be aware that all tags, field names and XML elements are case sensitive, so you have to make sure that there is a one to one correspondence for the mapping to succeed.
|
||||
|
||||
Go's reflection mechanism allows you to use this tag information to reflect XML data to a struct object. If you want to know more about reflection in Go, please read the package documentation on struct tags and reflection.
|
||||
|
||||
Here are some rules when using the `xml` package to parse XML documents to structs:
|
||||
|
||||
- If the field type is a string or []byte with the tag `",innerxml"`, `Unmarshal` will assign raw XML data to it, like `Description` in the above example:
|
||||
|
||||
Shanghai_VPN127.0.0.1Beijing_VPN127.0.0.2
|
||||
|
||||
- If a field is called `XMLName` and its type is `xml.Name`, then it gets the element name, like `servers` in above example.
|
||||
- If a field's tag contains the corresponding element name, then it gets the element name as well, like `servername` and `serverip` in the above example.
|
||||
- If a field's tag contains `",attr"`, then it gets the corresponding element's attribute, like `version` in above example.
|
||||
- If a field's tag contains something like `"a>b>c"`, it gets the value of the element c of node b of node a.
|
||||
- If a field's tag contains `"="`, then it gets nothing.
|
||||
- If a field's tag contains `",any"`, then it gets all child elements which do not fit the other rules.
|
||||
- If the XML elements have one or more comments, all of these comments will be added to the first field that has the tag that contains `",comments"`. This field type can be a string or []byte. If this kind of field does not exist, all comments are discard.
|
||||
|
||||
These rules tell you how to define tags in structs. Once you understand these rules, mapping XML to structs will be as easy as the sample code above. Because tags and XML elements have a one to one correspondence, we can also use slices to represent multiple elements on the same level.
|
||||
|
||||
Note that all fields in structs should be exported (capitalized) in order to parse data correctly.
|
||||
|
||||
## Produce XML
|
||||
|
||||
What if we want to produce an XML document instead of parsing one. How do we do this in Go? Unsurprisingly, the `xml` package provides two functions which are `Marshal` and `MarshalIndent`, where the second function automatically indents the marshalled XML document. Their definition as follows:
|
||||
|
||||
func Marshal(v interface{}) ([]byte, error)
|
||||
func MarshalIndent(v interface{}, prefix, indent string) ([]byte, error)
|
||||
|
||||
The first argument in both of these functions is for storing a marshalled XML data stream.
|
||||
|
||||
Let's look at an example to see how this works:
|
||||
|
||||
package main
|
||||
|
||||
import (
|
||||
"encoding/xml"
|
||||
"fmt"
|
||||
"os"
|
||||
)
|
||||
|
||||
type Servers struct {
|
||||
XMLName xml.Name `xml:"servers"`
|
||||
Version string `xml:"version,attr"`
|
||||
Svs []server `xml:"server"`
|
||||
}
|
||||
|
||||
type server struct {
|
||||
ServerName string `xml:"serverName"`
|
||||
ServerIP string `xml:"serverIP"`
|
||||
}
|
||||
|
||||
func main() {
|
||||
v := &Servers{Version: "1"}
|
||||
v.Svs = append(v.Svs, server{"Shanghai_VPN", "127.0.0.1"})
|
||||
v.Svs = append(v.Svs, server{"Beijing_VPN", "127.0.0.2"})
|
||||
output, err := xml.MarshalIndent(v, " ", " ")
|
||||
if err != nil {
|
||||
fmt.Printf("error: %v\n", err)
|
||||
}
|
||||
os.Stdout.Write([]byte(xml.Header))
|
||||
|
||||
os.Stdout.Write(output)
|
||||
}
|
||||
|
||||
The above example prints the following information:
|
||||
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<servers version="1">
|
||||
<server>
|
||||
<serverName>Shanghai_VPN</serverName>
|
||||
<serverIP>127.0.0.1</serverIP>
|
||||
</server>
|
||||
<server>
|
||||
<serverName>Beijing_VPN</serverName>
|
||||
<serverIP>127.0.0.2</serverIP>
|
||||
</server>
|
||||
</servers>
|
||||
|
||||
As we've previously defined, the reason we have `os.Stdout.Write([]byte(xml.Header))` is because both `xml.MarshalIndent` and `xml.Marshal` do not output XML headers on their own, so we have to explicitly print them in order to produce XML documents correctly.
|
||||
|
||||
Here we can see that `Marshal` also receives a v parameter of type `interface{}`. So what are the rules when marshalling to an XML document?
|
||||
|
||||
- If v is an array or slice, it prints all elements like a value.
|
||||
- If v is a pointer, it prints the content that v is pointing to, printing nothing when v is nil.
|
||||
- If v is a interface, it deal with the interface as well.
|
||||
- If v is one of the other types, it prints the value of that type.
|
||||
|
||||
So how does `xml.Marshal` decide the elements' name? It follows the proceeding rules:
|
||||
|
||||
- If v is a struct, it defines the name in the tag of XMLName.
|
||||
- The field name is XMLName and the type is xml.Name.
|
||||
- Field tag in struct.
|
||||
- Field name in struct.
|
||||
- Type name of marshal.
|
||||
|
||||
Then we need to figure out how to set tags in order to produce the final XML document.
|
||||
|
||||
- XMLName will not be printed.
|
||||
- Fields that have tags containing `"-"` will not be printed.
|
||||
- If a tag contains `"name,attr"`, it uses name as the attribute name and the field value as the value, like `version` in the above example.
|
||||
- If a tag contains `",attr"`, it uses the field's name as the attribute name and the field value as its value.
|
||||
- If a tag contains `",chardata"`, it prints character data instead of element.
|
||||
- If a tag contains `",innerxml"`, it prints the raw value.
|
||||
- If a tag contains `",comment"`, it prints it as a comment without escaping, so you cannot have "--" in its value.
|
||||
- If a tag contains `"omitempty"`, it omits this field if its value is zero-value, including false, 0, nil pointer or nil interface, zero length of array, slice, map and string.
|
||||
- If a tag contains `"a>b>c"`, it prints three elements where a contains b and b contains c, like in the following code:
|
||||
|
||||
FirstName string `xml:"name>first"`
|
||||
LastName string `xml:"name>last"`
|
||||
|
||||
<name>
|
||||
<first>Asta</first>
|
||||
<last>Xie</last>
|
||||
</name>
|
||||
|
||||
You may have noticed that struct tags are very useful for dealing with XML, and the same goes for the other data formats we'll be discussing in the following sections. If you still find that you have problems with working with struct tags, you should probably read more documentation about them before diving into the next section.
|
||||
|
||||
## Links
|
||||
|
||||
- [Directory](preface.md)
|
||||
- Previous section: [Text files](07.0.md)
|
||||
- Next section: [JSON](07.2.md)
|
||||
|
||||
Reference in New Issue
Block a user