25
en/07.3.md
@@ -1,12 +1,11 @@
# 7.3 Regexp

Regular expressions ("Regexp") are a complicated but powerful tool for pattern matching and text manipulation. Although they do not perform as well as pure text matching, they are more flexible. With the right pattern, you can filter almost any kind of text from your source content. If you need to collect data in web development, Regexp makes it easy to retrieve meaningful data.

Go has the `regexp` package, which provides official support for regexp. If you've already used regexp in other programming languages, it should feel familiar. Note that Go implements the RE2 syntax, except for `\C`. For more details, follow this link: [http://code.google.com/p/re2/wiki/Syntax](http://code.google.com/p/re2/wiki/Syntax).

Go's `strings` package can actually do many such jobs, like searching (`Contains`, `Index`), replacing (`Replace`) and parsing (`Split`, `Join`), and it's faster than Regexp. However, these are all simple operations on literal text. If you need to match a pattern rather than a fixed string, for example a case-insensitive or whole-word search, Regexp is the better choice. So, if the `strings` package is sufficient for your needs, just use it, since it's easy to use and read; if you need to perform more advanced operations, use Regexp.
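
To make the trade-off concrete, here is a small sketch contrasting a literal `strings.Contains` check with a pattern. The helper names `containsLiteral` and `containsGoWord` are hypothetical; `(?i)` turns on case-insensitive matching and `\b` is an ASCII word boundary in RE2 syntax.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// containsLiteral uses the strings package: a plain, fast substring test
// that compares bytes exactly.
func containsLiteral(s, sub string) bool {
	return strings.Contains(s, sub)
}

// goWord matches "go" in any letter case, but only as a whole word.
var goWord = regexp.MustCompile(`(?i)\bgo\b`)

// containsGoWord reports whether s contains the word "go" in any case.
func containsGoWord(s string) bool {
	return goWord.MatchString(s)
}

func main() {
	fmt.Println(containsLiteral("I love Go", "Go")) // true
	fmt.Println(containsLiteral("I love GO", "Go")) // false: exact bytes only
	fmt.Println(containsGoWord("I love GO"))        // true
	fmt.Println(containsGoWord("I love Gopher"))    // false: not a whole word
}
```

When only the literal check is needed, the `strings` version is the simpler and cheaper choice.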

If you recall form validation from previous sections, we used Regexp to verify the validity of user input. Be aware that the `regexp` package treats all input as UTF-8 encoded text. Let's learn more about Go's `regexp` package!

## Match

@@ -16,7 +15,7 @@ The `regexp` package has 3 functions to match: if it matches a pattern, then it
func MatchReader(pattern string, r io.RuneReader) (matched bool, err error)
func MatchString(pattern string, s string) (matched bool, err error)

All 3 functions check whether `pattern` matches the input source and return true if it does. However, if the pattern has a syntax error, they return an error. The 3 input sources of these functions are a byte slice (`[]byte`), an `io.RuneReader` and a `string`.

Here is an example of how to verify an IP address:

@@ -27,7 +26,7 @@ Here is an example of how to verify an IP address:
return true
}

As you can see, using patterns with the `regexp` package is not that different from other languages. Here's one more example that verifies whether user input is valid:

func main() {
if len(os.Args) == 1 {
@@ -40,13 +39,13 @@ As you can see, using pattern in the `regexp` package is not that different. Her
}
}

In the above examples, we use `Match(Reader|String)` to check whether content is valid; all of these functions are easy to use.
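
The package-level `Match` functions compile the pattern each time they run. When the same rule checks many inputs, a common alternative is to compile once with `regexp.MustCompile` and reuse the result. The username rule below (3 to 16 letters, digits or underscores) is a hypothetical example, not a rule from this book.

```go
package main

import (
	"fmt"
	"regexp"
)

// usernameRe is compiled once at package initialization and reused.
// Hypothetical rule: 3-16 ASCII letters, digits or underscores.
var usernameRe = regexp.MustCompile(`^[a-zA-Z0-9_]{3,16}$`)

// validUsername checks s against the precompiled pattern.
func validUsername(s string) bool {
	return usernameRe.MatchString(s)
}

func main() {
	for _, name := range []string{"gopher_42", "ab", "bad name!"} {
		fmt.Printf("%q valid: %v\n", name, validUsername(name))
	}
}
```

`MustCompile` panics on a bad pattern, which is acceptable here because the pattern is a constant written by the programmer.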

## Filter

The match functions can verify content, but they cannot cut, filter or collect data from it. If you want to do that, you have to use the complex mode of Regexp: compile the pattern into a `Regexp` object and call its methods.

Let's say we need to write a crawler. Here is an example of when you must use Regexp to filter and cut data:

package main

@@ -106,7 +105,7 @@ Here are some functions to parse your Regexp syntax:
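
The difference between `CompilePOSIX` and `Compile` is that the former restricts the syntax to POSIX ERE (egrep) and uses leftmost-longest matching, while the latter uses leftmost-first matching, the same semantics as Perl and Python. Both choose the leftmost starting position first; they differ only in which match they pick at that position. For instance, for Regexp `a|ab` and content `"ab"`, `CompilePOSIX` returns `ab` but `Compile` returns `a`. The `Must` prefix (`MustCompile`, `MustCompilePOSIX`) means the function panics when the Regexp syntax is not correct, while `Compile` and `CompilePOSIX` return an error instead.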

Now that we know how to create a new Regexp, let's see how the methods of the `Regexp` struct help us operate on content:

func (re *Regexp) Find(b []byte) []byte
func (re *Regexp) FindAll(b []byte, n int) [][]byte
@@ -181,7 +180,7 @@ Code sample:
fmt.Println(string(v))
}

// Same as FindIndex().
submatchindex := re2.FindSubmatchIndex([]byte(a))
fmt.Println(submatchindex)

@@ -194,7 +193,7 @@ Code sample:
fmt.Println(submatchallindex)
}

As mentioned earlier, Regexp also has 3 methods for matching. They do exactly the same thing as the exported functions; in fact, those exported functions compile the pattern and then call these methods under the hood:

func (re *Regexp) Match(b []byte) bool
func (re *Regexp) MatchReader(r io.RuneReader) bool
@@ -209,7 +208,7 @@ Next, let's see how to replace strings using Regexp:
func (re *Regexp) ReplaceAllString(src, repl string) string
func (re *Regexp) ReplaceAllStringFunc(src string, repl func(string) string) string

These are used in the crawling example, so we will not explain them any further here.
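
For quick reference outside the crawler, here is a minimal sketch of `ReplaceAllString` and `ReplaceAllStringFunc`. The helpers `stripTags` and `upperTags` and their patterns are hypothetical and simplified for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// tagRe matches anything that looks like an HTML tag.
	tagRe = regexp.MustCompile(`<[^>]+>`)
	// spaceRe matches runs of whitespace, including newlines.
	spaceRe = regexp.MustCompile(`\s+`)
)

// stripTags removes tags, then collapses whitespace into single spaces.
func stripTags(s string) string {
	s = tagRe.ReplaceAllString(s, "")
	return strings.TrimSpace(spaceRe.ReplaceAllString(s, " "))
}

// upperTags upper-cases every tag and leaves the text between tags alone.
// ReplaceAllStringFunc passes each match to the function and uses its result.
func upperTags(s string) string {
	return tagRe.ReplaceAllStringFunc(s, strings.ToUpper)
}

func main() {
	fmt.Println(stripTags("<p>Hello,\n  <b>Go</b>!</p>")) // Hello, Go!
	fmt.Println(upperTags("<b>go</b>"))                    // <B>go</B>
}
```

Keep in mind that `ReplaceAllString` expands `$` in the replacement string (for example `$1` refers to a capture group), so a literal `$` must be written as `$$`.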

Let's take a look at the definition of `Expand`:

@@ -232,7 +231,7 @@ So how do we use `Expand`?
fmt.Println(string(res))
}

At this point, you've learnt the whole `regexp` package in Go. I hope that you can understand more by studying examples of key methods, so that you can do something interesting on your own.

## Links