Replace godep with dep
parent 1e7489927c
commit bf5616c65b
14883 changed files with 3937406 additions and 361781 deletions
10
vendor/golang.org/x/text/.gitattributes
generated
vendored
Normal file
|
|
@@ -0,0 +1,10 @@
|
|||
# Treat all files in this repo as binary, with no git magic updating
|
||||
# line endings. Windows users contributing to Go will need to use a
|
||||
# modern version of git and editors capable of LF line endings.
|
||||
#
|
||||
# We'll prevent accidental CRLF line endings from entering the repo
|
||||
# via the git-review gofmt checks.
|
||||
#
|
||||
# See golang.org/issue/9281
|
||||
|
||||
* -text
|
||||
6
vendor/golang.org/x/text/.gitignore
generated
vendored
Normal file
|
|
@@ -0,0 +1,6 @@
|
|||
# Add no patterns to .gitignore except for files generated by the build.
|
||||
last-change
|
||||
/DATA
|
||||
# This file is rather large and the tests really only need to be run
|
||||
# after generation.
|
||||
/unicode/norm/data_test.go
|
||||
31
vendor/golang.org/x/text/CONTRIBUTING.md
generated
vendored
Normal file
|
|
@@ -0,0 +1,31 @@
|
|||
# Contributing to Go
|
||||
|
||||
Go is an open source project.
|
||||
|
||||
It is the work of hundreds of contributors. We appreciate your help!
|
||||
|
||||
|
||||
## Filing issues
|
||||
|
||||
When [filing an issue](https://golang.org/issue/new), make sure to answer these five questions:
|
||||
|
||||
1. What version of Go are you using (`go version`)?
|
||||
2. What operating system and processor architecture are you using?
|
||||
3. What did you do?
|
||||
4. What did you expect to see?
|
||||
5. What did you see instead?
|
||||
|
||||
General questions should go to the [golang-nuts mailing list](https://groups.google.com/group/golang-nuts) instead of the issue tracker.
|
||||
The gophers there will answer or ask you to file an issue if you've tripped over a bug.
|
||||
|
||||
## Contributing code
|
||||
|
||||
Please read the [Contribution Guidelines](https://golang.org/doc/contribute.html)
|
||||
before sending patches.
|
||||
|
||||
**We do not accept GitHub pull requests**
|
||||
(we use [Gerrit](https://code.google.com/p/gerrit/) instead for code review).
|
||||
|
||||
Unless otherwise noted, the Go source files are distributed under
|
||||
the BSD-style license found in the LICENSE file.
|
||||
|
||||
91
vendor/golang.org/x/text/README.md
generated
vendored
Normal file
|
|
@@ -0,0 +1,91 @@
|
|||
# Go Text
|
||||
|
||||
This repository holds supplementary Go libraries for text processing, many involving Unicode.
|
||||
|
||||
## Semantic Versioning
|
||||
This repo uses Semantic Versioning (http://semver.org/), so it increments the
|
||||
1. MAJOR version when you make incompatible API changes,
|
||||
1. MINOR version when you add functionality in a backwards-compatible manner,
|
||||
and
|
||||
1. PATCH version when you make backwards-compatible bug fixes.
|
||||
|
||||
A Unicode major and minor version bump is mapped to a major version bump in
|
||||
x/text.
|
||||
A patch version bump in Unicode is mapped to a minor version bump in x/text.
|
||||
Note that, consistent with the definitions in semver, until version 1.0.0 of
|
||||
x/text is reached, the minor version is considered a major version.
|
||||
So going from 0.1.0 to 0.2.0 is considered to be a major version bump.
|
||||
|
||||
A major new CLDR version is mapped to a minor version increase in x/text.
|
||||
Any other new CLDR version is mapped to a patch version increase in x/text.
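
The mapping rules above can be sketched in a few lines of Go. This is only an
illustration of the stated policy, not code from this repository: the helper
names `parse`, `unicodeBump` and `cldrBump` are hypothetical, and the sketch
ignores the pre-1.0.0 rule that treats a minor bump as a major one.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parse splits "MAJOR.MINOR.PATCH" into three integers.
// Missing or malformed components are treated as 0 to keep the sketch short.
func parse(v string) [3]int {
	var out [3]int
	for i, p := range strings.SplitN(v, ".", 3) {
		n, _ := strconv.Atoi(p)
		out[i] = n
	}
	return out
}

// unicodeBump reports which x/text version component a Unicode upgrade bumps:
// a Unicode major or minor change maps to "major", a patch change to "minor".
func unicodeBump(from, to string) string {
	a, b := parse(from), parse(to)
	switch {
	case a[0] != b[0] || a[1] != b[1]:
		return "major"
	case a[2] != b[2]:
		return "minor"
	}
	return "none"
}

// cldrBump reports which x/text version component a CLDR upgrade bumps:
// a new major CLDR version maps to "minor", any other new version to "patch".
func cldrBump(from, to string) string {
	a, b := parse(from), parse(to)
	switch {
	case a[0] != b[0]:
		return "minor"
	case a != b:
		return "patch"
	}
	return "none"
}

func main() {
	fmt.Println(unicodeBump("9.0.0", "10.0.0")) // major
	fmt.Println(unicodeBump("9.0.0", "9.0.1"))  // minor
	fmt.Println(cldrBump("30", "31"))           // minor
	fmt.Println(cldrBump("31", "31.0.1"))       // patch
}
```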
|
||||
|
||||
## Download/Install
|
||||
|
||||
The easiest way to install is to run `go get -u golang.org/x/text`. You can
|
||||
also manually git clone the repository to `$GOPATH/src/golang.org/x/text`.
|
||||
|
||||
## Contribute
|
||||
To submit changes to this repository, see http://golang.org/doc/contribute.html.
|
||||
|
||||
To generate the tables in this repository (except for the encoding tables),
|
||||
run go generate from this directory. By default tables are generated for the
|
||||
Unicode version in core and the CLDR version defined in
|
||||
golang.org/x/text/unicode/cldr.
|
||||
|
||||
Running go generate will as a side effect create a DATA subdirectory in this
|
||||
directory, which holds all files that are used as a source for generating the
|
||||
tables. This directory will also serve as a cache.
|
||||
|
||||
## Testing
|
||||
Run
|
||||
|
||||
go test ./...
|
||||
|
||||
from this directory to run all tests. Add the "-tags icu" flag to also run
|
||||
ICU conformance tests (if available). This requires that you have the correct
|
||||
ICU version installed on your system.
|
||||
|
||||
TODO:
|
||||
- updating unversioned source files.
|
||||
|
||||
## Generating Tables
|
||||
|
||||
To generate the tables in this repository (except for the encoding
|
||||
tables), run `go generate` from this directory. By default tables are
|
||||
generated for the Unicode version in core and the CLDR version defined in
|
||||
golang.org/x/text/unicode/cldr.
|
||||
|
||||
Running go generate will as a side effect create a DATA subdirectory in this
|
||||
directory which holds all files that are used as a source for generating the
|
||||
tables. This directory will also serve as a cache.
|
||||
|
||||
## Versions
|
||||
To update a Unicode version run
|
||||
|
||||
UNICODE_VERSION=x.x.x go generate
|
||||
|
||||
where `x.x.x` must correspond to a directory in http://www.unicode.org/Public/.
|
||||
If this version is newer than the version in core it will also update the
|
||||
relevant packages there. The idna package in x/net will always be updated.
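
As a rough sketch of how a generator could pick up this setting, the Go
program below reads `UNICODE_VERSION` from the environment and falls back to
the version of the core `unicode` package. The helpers `unicodeVersion` and
`dataURL` are hypothetical names used for illustration; the actual generator
code in this repository may resolve the version differently.

```go
package main

import (
	"fmt"
	"os"
	"unicode"
)

// unicodeVersion returns the version requested through the UNICODE_VERSION
// environment variable, falling back to the version in the core library.
func unicodeVersion() string {
	if v := os.Getenv("UNICODE_VERSION"); v != "" {
		return v
	}
	return unicode.Version
}

// dataURL builds the directory URL that the chosen version must correspond to.
func dataURL(version string) string {
	return "http://www.unicode.org/Public/" + version + "/"
}

func main() {
	fmt.Println(dataURL(unicodeVersion()))
}
```

Falling back to `unicode.Version` mirrors the default described above, where
tables are generated for the Unicode version in core.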
|
||||
|
||||
To update a CLDR version run
|
||||
|
||||
CLDR_VERSION=version go generate
|
||||
|
||||
where `version` must correspond to a directory in
|
||||
http://www.unicode.org/Public/cldr/.
|
||||
|
||||
Note that the code gets adapted over time to changes in the data and that
|
||||
backwards compatibility is not maintained.
|
||||
So updating to a different version may not work.
|
||||
|
||||
The files in DATA/{iana|icu|w3|whatwg} are currently not versioned.
|
||||
|
||||
## Report Issues / Send Patches
|
||||
|
||||
This repository uses Gerrit for code changes. To learn how to submit changes to
|
||||
this repository, see https://golang.org/doc/contribute.html.
|
||||
|
||||
The main issue tracker for the text repository is located at
|
||||
https://github.com/golang/go/issues. Prefix your issue with "x/text:" in the
|
||||
subject line, so it is easy to find.
|
||||
2
vendor/golang.org/x/text/cases/cases.go
generated
vendored
|
|
@@ -5,7 +5,7 @@
|
|||
//go:generate go run gen.go gen_trieval.go
|
||||
|
||||
// Package cases provides general and language-specific case mappers.
|
||||
package cases
|
||||
package cases // import "golang.org/x/text/cases"
|
||||
|
||||
import (
|
||||
"golang.org/x/text/language"
|
||||
|
|
|
|||
438
vendor/golang.org/x/text/cases/context_test.go
generated
vendored
Normal file
|
|
@@ -0,0 +1,438 @@
|
|||
// Copyright 2014 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package cases
|
||||
|
||||
import (
|
||||
"strings"
|
||||
"testing"
|
||||
"unicode"
|
||||
|
||||
"golang.org/x/text/internal/testtext"
|
||||
"golang.org/x/text/language"
|
||||
"golang.org/x/text/transform"
|
||||
"golang.org/x/text/unicode/norm"
|
||||
"golang.org/x/text/unicode/rangetable"
|
||||
)
|
||||
|
||||
// The following definitions are taken directly from Chapter 3 of The Unicode
|
||||
// Standard.
|
||||
|
||||
func propCased(r rune) bool {
|
||||
return propLower(r) || propUpper(r) || unicode.IsTitle(r)
|
||||
}
|
||||
|
||||
func propLower(r rune) bool {
|
||||
return unicode.IsLower(r) || unicode.Is(unicode.Other_Lowercase, r)
|
||||
}
|
||||
|
||||
func propUpper(r rune) bool {
|
||||
return unicode.IsUpper(r) || unicode.Is(unicode.Other_Uppercase, r)
|
||||
}
|
||||
|
||||
func propIgnore(r rune) bool {
|
||||
if unicode.In(r, unicode.Mn, unicode.Me, unicode.Cf, unicode.Lm, unicode.Sk) {
|
||||
return true
|
||||
}
|
||||
return caseIgnorable[r]
|
||||
}
|
||||
|
||||
func hasBreakProp(r rune) bool {
|
||||
// binary search over ranges
|
||||
lo := 0
|
||||
hi := len(breakProp)
|
||||
for lo < hi {
|
||||
m := lo + (hi-lo)/2
|
||||
bp := &breakProp[m]
|
||||
if bp.lo <= r && r <= bp.hi {
|
||||
return true
|
||||
}
|
||||
if r < bp.lo {
|
||||
hi = m
|
||||
} else {
|
||||
lo = m + 1
|
||||
}
|
||||
}
|
||||
return false
|
||||
}
|
||||
|
||||
func contextFromRune(r rune) *context {
|
||||
c := context{dst: make([]byte, 128), src: []byte(string(r)), atEOF: true}
|
||||
c.next()
|
||||
return &c
|
||||
}
|
||||
|
||||
func TestCaseProperties(t *testing.T) {
|
||||
if unicode.Version != UnicodeVersion {
|
||||
// Properties of existing code points may change by Unicode version, so
|
||||
// we need to skip.
|
||||
t.Skipf("Skipping as core Unicode version %s different than %s", unicode.Version, UnicodeVersion)
|
||||
}
|
||||
assigned := rangetable.Assigned(UnicodeVersion)
|
||||
coreVersion := rangetable.Assigned(unicode.Version)
|
||||
for r := rune(0); r <= lastRuneForTesting; r++ {
|
||||
if !unicode.In(r, assigned) || !unicode.In(r, coreVersion) {
|
||||
continue
|
||||
}
|
||||
c := contextFromRune(r)
|
||||
if got, want := c.info.isCaseIgnorable(), propIgnore(r); got != want {
|
||||
t.Errorf("caseIgnorable(%U): got %v; want %v (%x)", r, got, want, c.info)
|
||||
}
|
||||
// New letters may change case types, but existing case pairings should
|
||||
// not change. See Case Pair Stability in
|
||||
// http://unicode.org/policies/stability_policy.html.
|
||||
if rf := unicode.SimpleFold(r); rf != r && unicode.In(rf, assigned) {
|
||||
if got, want := c.info.isCased(), propCased(r); got != want {
|
||||
t.Errorf("cased(%U): got %v; want %v (%x)", r, got, want, c.info)
|
||||
}
|
||||
if got, want := c.caseType() == cUpper, propUpper(r); got != want {
|
||||
t.Errorf("upper(%U): got %v; want %v (%x)", r, got, want, c.info)
|
||||
}
|
||||
if got, want := c.caseType() == cLower, propLower(r); got != want {
|
||||
t.Errorf("lower(%U): got %v; want %v (%x)", r, got, want, c.info)
|
||||
}
|
||||
}
|
||||
if got, want := c.info.isBreak(), hasBreakProp(r); got != want {
|
||||
t.Errorf("isBreak(%U): got %v; want %v (%x)", r, got, want, c.info)
|
||||
}
|
||||
}
|
||||
// TODO: get title case from unicode file.
|
||||
}
|
||||
|
||||
func TestMapping(t *testing.T) {
|
||||
assigned := rangetable.Assigned(UnicodeVersion)
|
||||
coreVersion := rangetable.Assigned(unicode.Version)
|
||||
if coreVersion == nil {
|
||||
coreVersion = assigned
|
||||
}
|
||||
apply := func(r rune, f func(c *context) bool) string {
|
||||
c := contextFromRune(r)
|
||||
f(c)
|
||||
return string(c.dst[:c.pDst])
|
||||
}
|
||||
|
||||
for r, tt := range special {
|
||||
if got, want := apply(r, lower), tt.toLower; got != want {
|
||||
t.Errorf("lowerSpecial:(%U): got %+q; want %+q", r, got, want)
|
||||
}
|
||||
if got, want := apply(r, title), tt.toTitle; got != want {
|
||||
t.Errorf("titleSpecial:(%U): got %+q; want %+q", r, got, want)
|
||||
}
|
||||
if got, want := apply(r, upper), tt.toUpper; got != want {
|
||||
t.Errorf("upperSpecial:(%U): got %+q; want %+q", r, got, want)
|
||||
}
|
||||
}
|
||||
|
||||
for r := rune(0); r <= lastRuneForTesting; r++ {
|
||||
if !unicode.In(r, assigned) || !unicode.In(r, coreVersion) {
|
||||
continue
|
||||
}
|
||||
if rf := unicode.SimpleFold(r); rf == r || !unicode.In(rf, assigned) {
|
||||
continue
|
||||
}
|
||||
if _, ok := special[r]; ok {
|
||||
continue
|
||||
}
|
||||
want := string(unicode.ToLower(r))
|
||||
if got := apply(r, lower); got != want {
|
||||
t.Errorf("lower:%q (%U): got %q %U; want %q %U", r, r, got, []rune(got), want, []rune(want))
|
||||
}
|
||||
|
||||
want = string(unicode.ToUpper(r))
|
||||
if got := apply(r, upper); got != want {
|
||||
t.Errorf("upper:%q (%U): got %q %U; want %q %U", r, r, got, []rune(got), want, []rune(want))
|
||||
}
|
||||
|
||||
want = string(unicode.ToTitle(r))
|
||||
if got := apply(r, title); got != want {
|
||||
t.Errorf("title:%q (%U): got %q %U; want %q %U", r, r, got, []rune(got), want, []rune(want))
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func runeFoldData(r rune) (x struct{ simple, full, special string }) {
|
||||
x = foldMap[r]
|
||||
if x.simple == "" {
|
||||
x.simple = string(unicode.ToLower(r))
|
||||
}
|
||||
if x.full == "" {
|
||||
x.full = string(unicode.ToLower(r))
|
||||
}
|
||||
if x.special == "" {
|
||||
x.special = x.full
|
||||
}
|
||||
return
|
||||
}
|
||||
|
||||
func TestFoldData(t *testing.T) {
|
||||
assigned := rangetable.Assigned(UnicodeVersion)
|
||||
coreVersion := rangetable.Assigned(unicode.Version)
|
||||
if coreVersion == nil {
|
||||
coreVersion = assigned
|
||||
}
|
||||
apply := func(r rune, f func(c *context) bool) (string, info) {
|
||||
c := contextFromRune(r)
|
||||
f(c)
|
||||
return string(c.dst[:c.pDst]), c.info.cccType()
|
||||
}
|
||||
for r := rune(0); r <= lastRuneForTesting; r++ {
|
||||
if !unicode.In(r, assigned) || !unicode.In(r, coreVersion) {
|
||||
continue
|
||||
}
|
||||
x := runeFoldData(r)
|
||||
if got, info := apply(r, foldFull); got != x.full {
|
||||
t.Errorf("full:%q (%U): got %q %U; want %q %U (ccc=%x)", r, r, got, []rune(got), x.full, []rune(x.full), info)
|
||||
}
|
||||
// TODO: special and simple.
|
||||
}
|
||||
}
|
||||
|
||||
func TestCCC(t *testing.T) {
|
||||
assigned := rangetable.Assigned(UnicodeVersion)
|
||||
normVersion := rangetable.Assigned(norm.Version)
|
||||
for r := rune(0); r <= lastRuneForTesting; r++ {
|
||||
if !unicode.In(r, assigned) || !unicode.In(r, normVersion) {
|
||||
continue
|
||||
}
|
||||
c := contextFromRune(r)
|
||||
|
||||
p := norm.NFC.PropertiesString(string(r))
|
||||
want := cccOther
|
||||
switch p.CCC() {
|
||||
case 0:
|
||||
want = cccZero
|
||||
case above:
|
||||
want = cccAbove
|
||||
}
|
||||
if got := c.info.cccType(); got != want {
|
||||
t.Errorf("%U: got %x; want %x", r, got, want)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestWordBreaks(t *testing.T) {
|
||||
for _, tt := range breakTest {
|
||||
testtext.Run(t, tt, func(t *testing.T) {
|
||||
parts := strings.Split(tt, "|")
|
||||
want := ""
|
||||
for _, s := range parts {
|
||||
found := false
|
||||
// This algorithm implements title casing given word breaks
|
||||
// as defined in the Unicode standard 3.13 R3.
|
||||
for _, r := range s {
|
||||
title := unicode.ToTitle(r)
|
||||
lower := unicode.ToLower(r)
|
||||
if !found && title != lower {
|
||||
found = true
|
||||
want += string(title)
|
||||
} else {
|
||||
want += string(lower)
|
||||
}
|
||||
}
|
||||
}
|
||||
src := strings.Join(parts, "")
|
||||
got := Title(language.Und).String(src)
|
||||
if got != want {
|
||||
t.Errorf("got %q; want %q", got, want)
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
func TestContext(t *testing.T) {
|
||||
tests := []struct {
|
||||
desc string
|
||||
dstSize int
|
||||
atEOF bool
|
||||
src string
|
||||
out string
|
||||
nSrc int
|
||||
err error
|
||||
ops string
|
||||
prefixArg string
|
||||
prefixWant bool
|
||||
}{{
|
||||
desc: "next: past end, atEOF, no checkpoint",
|
||||
dstSize: 10,
|
||||
atEOF: true,
|
||||
src: "12",
|
||||
out: "",
|
||||
nSrc: 2,
|
||||
ops: "next;next;next",
|
||||
// Test that calling prefix with a non-empty argument when the buffer
|
||||
// is depleted returns false.
|
||||
prefixArg: "x",
|
||||
prefixWant: false,
|
||||
}, {
|
||||
desc: "next: not at end, !atEOF, no checkpoint",
|
||||
dstSize: 10,
|
||||
atEOF: false,
|
||||
src: "12",
|
||||
out: "",
|
||||
nSrc: 0,
|
||||
err: transform.ErrShortSrc,
|
||||
ops: "next;next",
|
||||
prefixArg: "",
|
||||
prefixWant: true,
|
||||
}, {
|
||||
desc: "next: past end, !atEOF, no checkpoint",
|
||||
dstSize: 10,
|
||||
atEOF: false,
|
||||
src: "12",
|
||||
out: "",
|
||||
nSrc: 0,
|
||||
err: transform.ErrShortSrc,
|
||||
ops: "next;next;next",
|
||||
prefixArg: "",
|
||||
prefixWant: true,
|
||||
}, {
|
||||
desc: "next: past end, !atEOF, checkpoint",
|
||||
dstSize: 10,
|
||||
atEOF: false,
|
||||
src: "12",
|
||||
out: "",
|
||||
nSrc: 2,
|
||||
ops: "next;next;checkpoint;next",
|
||||
prefixArg: "",
|
||||
prefixWant: true,
|
||||
}, {
|
||||
desc: "copy: exact count, atEOF, no checkpoint",
|
||||
dstSize: 2,
|
||||
atEOF: true,
|
||||
src: "12",
|
||||
out: "12",
|
||||
nSrc: 2,
|
||||
ops: "next;copy;next;copy;next",
|
||||
prefixArg: "",
|
||||
prefixWant: true,
|
||||
}, {
|
||||
desc: "copy: past end, !atEOF, no checkpoint",
|
||||
dstSize: 2,
|
||||
atEOF: false,
|
||||
src: "12",
|
||||
out: "",
|
||||
nSrc: 0,
|
||||
err: transform.ErrShortSrc,
|
||||
ops: "next;copy;next;copy;next",
|
||||
prefixArg: "",
|
||||
prefixWant: true,
|
||||
}, {
|
||||
desc: "copy: past end, !atEOF, checkpoint",
|
||||
dstSize: 2,
|
||||
atEOF: false,
|
||||
src: "12",
|
||||
out: "12",
|
||||
nSrc: 2,
|
||||
ops: "next;copy;next;copy;checkpoint;next",
|
||||
prefixArg: "",
|
||||
prefixWant: true,
|
||||
}, {
|
||||
desc: "copy: short dst",
|
||||
dstSize: 1,
|
||||
atEOF: false,
|
||||
src: "12",
|
||||
out: "",
|
||||
nSrc: 0,
|
||||
err: transform.ErrShortDst,
|
||||
ops: "next;copy;next;copy;checkpoint;next",
|
||||
prefixArg: "12",
|
||||
prefixWant: false,
|
||||
}, {
|
||||
desc: "copy: short dst, checkpointed",
|
||||
dstSize: 1,
|
||||
atEOF: false,
|
||||
src: "12",
|
||||
out: "1",
|
||||
nSrc: 1,
|
||||
err: transform.ErrShortDst,
|
||||
ops: "next;copy;checkpoint;next;copy;next",
|
||||
prefixArg: "",
|
||||
prefixWant: true,
|
||||
}, {
|
||||
desc: "writeString: simple",
|
||||
dstSize: 3,
|
||||
atEOF: true,
|
||||
src: "1",
|
||||
out: "1ab",
|
||||
nSrc: 1,
|
||||
ops: "next;copy;writeab;next",
|
||||
prefixArg: "",
|
||||
prefixWant: true,
|
||||
}, {
|
||||
desc: "writeString: short dst",
|
||||
dstSize: 2,
|
||||
atEOF: true,
|
||||
src: "12",
|
||||
out: "",
|
||||
nSrc: 0,
|
||||
err: transform.ErrShortDst,
|
||||
ops: "next;copy;writeab;next",
|
||||
prefixArg: "2",
|
||||
prefixWant: true,
|
||||
}, {
|
||||
desc: "writeString: simple",
|
||||
dstSize: 3,
|
||||
atEOF: true,
|
||||
src: "12",
|
||||
out: "1ab",
|
||||
nSrc: 2,
|
||||
ops: "next;copy;next;writeab;next",
|
||||
prefixArg: "",
|
||||
prefixWant: true,
|
||||
}, {
|
||||
desc: "writeString: short dst",
|
||||
dstSize: 2,
|
||||
atEOF: true,
|
||||
src: "12",
|
||||
out: "",
|
||||
nSrc: 0,
|
||||
err: transform.ErrShortDst,
|
||||
ops: "next;copy;next;writeab;next",
|
||||
prefixArg: "1",
|
||||
prefixWant: false,
|
||||
}, {
|
||||
desc: "prefix",
|
||||
dstSize: 2,
|
||||
atEOF: true,
|
||||
src: "12",
|
||||
out: "",
|
||||
nSrc: 0,
|
||||
// Context will assign an ErrShortSrc if the input wasn't exhausted.
|
||||
err: transform.ErrShortSrc,
|
||||
prefixArg: "12",
|
||||
prefixWant: true,
|
||||
}}
|
||||
for _, tt := range tests {
|
||||
c := context{dst: make([]byte, tt.dstSize), src: []byte(tt.src), atEOF: tt.atEOF}
|
||||
|
||||
for _, op := range strings.Split(tt.ops, ";") {
|
||||
switch op {
|
||||
case "next":
|
||||
c.next()
|
||||
case "checkpoint":
|
||||
c.checkpoint()
|
||||
case "writeab":
|
||||
c.writeString("ab")
|
||||
case "copy":
|
||||
c.copy()
|
||||
case "":
|
||||
default:
|
||||
t.Fatalf("unknown op %q", op)
|
||||
}
|
||||
}
|
||||
if got := c.hasPrefix(tt.prefixArg); got != tt.prefixWant {
|
||||
t.Errorf("%s:\nprefix was %v; want %v", tt.desc, got, tt.prefixWant)
|
||||
}
|
||||
nDst, nSrc, err := c.ret()
|
||||
if err != tt.err {
|
||||
t.Errorf("%s:\nerror was %v; want %v", tt.desc, err, tt.err)
|
||||
}
|
||||
if out := string(c.dst[:nDst]); out != tt.out {
|
||||
t.Errorf("%s:\nout was %q; want %q", tt.desc, out, tt.out)
|
||||
}
|
||||
if nSrc != tt.nSrc {
|
||||
t.Errorf("%s:\nnSrc was %d; want %d", tt.desc, nSrc, tt.nSrc)
|
||||
}
|
||||
}
|
||||
}
|
||||
53
vendor/golang.org/x/text/cases/example_test.go
generated
vendored
Normal file
|
|
@@ -0,0 +1,53 @@
|
|||
// Copyright 2014 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package cases_test
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
|
||||
"golang.org/x/text/cases"
|
||||
"golang.org/x/text/language"
|
||||
)
|
||||
|
||||
func Example() {
|
||||
src := []string{
|
||||
"hello world!",
|
||||
"i with dot",
|
||||
"'n ijsberg",
|
||||
"here comes O'Brian",
|
||||
}
|
||||
for _, c := range []cases.Caser{
|
||||
cases.Lower(language.Und),
|
||||
cases.Upper(language.Turkish),
|
||||
cases.Title(language.Dutch),
|
||||
cases.Title(language.Und, cases.NoLower),
|
||||
} {
|
||||
fmt.Println()
|
||||
for _, s := range src {
|
||||
fmt.Println(c.String(s))
|
||||
}
|
||||
}
|
||||
|
||||
// Output:
|
||||
// hello world!
|
||||
// i with dot
|
||||
// 'n ijsberg
|
||||
// here comes o'brian
|
||||
//
|
||||
// HELLO WORLD!
|
||||
// İ WİTH DOT
|
||||
// 'N İJSBERG
|
||||
// HERE COMES O'BRİAN
|
||||
//
|
||||
// Hello World!
|
||||
// I With Dot
|
||||
// 'n IJsberg
|
||||
// Here Comes O'brian
|
||||
//
|
||||
// Hello World!
|
||||
// I With Dot
|
||||
// 'N Ijsberg
|
||||
// Here Comes O'Brian
|
||||
}
|
||||
51
vendor/golang.org/x/text/cases/fold_test.go
generated
vendored
Normal file
|
|
@@ -0,0 +1,51 @@
|
|||
// Copyright 2016 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package cases
|
||||
|
||||
import (
|
||||
"testing"
|
||||
|
||||
"golang.org/x/text/internal/testtext"
|
||||
)
|
||||
|
||||
var foldTestCases = []string{
|
||||
"βß\u13f8", // "βssᏰ"
|
||||
"ab\u13fc\uab7aꭰ", // abᏴᎪᎠ
|
||||
"aﬃﬄaﬆ", // affifflast
|
||||
"Iİiı\u0345", // ii̇iıι
|
||||
"µµΜΜςσΣΣ", // μμμμσσσσ
|
||||
}
|
||||
|
||||
func TestFold(t *testing.T) {
|
||||
for _, tc := range foldTestCases {
|
||||
testEntry := func(name string, c Caser, m func(r rune) string) {
|
||||
want := ""
|
||||
for _, r := range tc {
|
||||
want += m(r)
|
||||
}
|
||||
if got := c.String(tc); got != want {
|
||||
t.Errorf("%s(%s) = %+q; want %+q", name, tc, got, want)
|
||||
}
|
||||
dst := make([]byte, 256) // big enough to hold any result
|
||||
src := []byte(tc)
|
||||
v := testtext.AllocsPerRun(20, func() {
|
||||
c.Transform(dst, src, true)
|
||||
})
|
||||
if v > 0 {
|
||||
t.Errorf("%s(%s): number of allocs was %f; want 0", name, tc, v)
|
||||
}
|
||||
}
|
||||
testEntry("FullFold", Fold(), func(r rune) string {
|
||||
return runeFoldData(r).full
|
||||
})
|
||||
// TODO:
|
||||
// testEntry("SimpleFold", Fold(Compact), func(r rune) string {
|
||||
// return runeFoldData(r).simple
|
||||
// })
|
||||
// testEntry("SpecialFold", Fold(Turkic), func(r rune) string {
|
||||
// return runeFoldData(r).special
|
||||
// })
|
||||
}
|
||||
}
|
||||
210
vendor/golang.org/x/text/cases/icu_test.go
generated
vendored
Normal file
|
|
@@ -0,0 +1,210 @@
|
|||
// Copyright 2016 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
// +build icu
|
||||
|
||||
package cases
|
||||
|
||||
import (
|
||||
"path"
|
||||
"strings"
|
||||
"testing"
|
||||
|
||||
"golang.org/x/text/internal/testtext"
|
||||
"golang.org/x/text/language"
|
||||
"golang.org/x/text/unicode/norm"
|
||||
)
|
||||
|
||||
func TestICUConformance(t *testing.T) {
|
||||
// Build test set.
|
||||
input := []string{
|
||||
"a.a a_a",
|
||||
"a\u05d0a",
|
||||
"\u05d0'a",
|
||||
"a\u03084a",
|
||||
"a\u0308a",
|
||||
"a3\u30a3a",
|
||||
"a\u303aa",
|
||||
"a_\u303a_a",
|
||||
"1_a..a",
|
||||
"1_a.a",
|
||||
"a..a.",
|
||||
"a--a-",
|
||||
"a-a-",
|
||||
"a\u200ba",
|
||||
"a\u200b\u200ba",
|
||||
"a\u00ad\u00ada", // Format
|
||||
"a\u00ada",
|
||||
"a''a", // SingleQuote
|
||||
"a'a",
|
||||
"a::a", // MidLetter
|
||||
"a:a",
|
||||
"a..a", // MidNumLet
|
||||
"a.a",
|
||||
"a;;a", // MidNum
|
||||
"a;a",
|
||||
"a__a", // ExtendNumlet
|
||||
"a_a",
|
||||
"ΟΣ''a",
|
||||
}
|
||||
add := func(x interface{}) {
|
||||
switch v := x.(type) {
|
||||
case string:
|
||||
input = append(input, v)
|
||||
case []string:
|
||||
for _, s := range v {
|
||||
input = append(input, s)
|
||||
}
|
||||
}
|
||||
}
|
||||
for _, tc := range testCases {
|
||||
add(tc.src)
|
||||
add(tc.lower)
|
||||
add(tc.upper)
|
||||
add(tc.title)
|
||||
}
|
||||
for _, tc := range bufferTests {
|
||||
add(tc.src)
|
||||
}
|
||||
for _, tc := range breakTest {
|
||||
add(strings.Replace(tc, "|", "", -1))
|
||||
}
|
||||
for _, tc := range foldTestCases {
|
||||
add(tc)
|
||||
}
|
||||
|
||||
// Compare ICU to Go.
|
||||
for _, c := range []string{"lower", "upper", "title", "fold"} {
|
||||
for _, tag := range []string{
|
||||
"und", "af", "az", "el", "lt", "nl", "tr",
|
||||
} {
|
||||
for _, s := range input {
|
||||
if exclude(c, tag, s) {
|
||||
continue
|
||||
}
|
||||
testtext.Run(t, path.Join(c, tag, s), func(t *testing.T) {
|
||||
want := doICU(tag, c, s)
|
||||
got := doGo(tag, c, s)
|
||||
if norm.NFC.String(got) != norm.NFC.String(want) {
|
||||
t.Errorf("\n in %[3]q (%+[3]q)\n got %[1]q (%+[1]q)\n want %[2]q (%+[2]q)", got, want, s)
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// exclude indicates if a string should be excluded from testing.
|
||||
func exclude(cm, tag, s string) bool {
|
||||
list := []struct{ cm, tags, pattern string }{
|
||||
// TODO: Go does not handle certain esoteric breaks correctly. This will be
|
||||
// fixed once we have a real word break iterator. Alternatively, it
|
||||
// seems like we're not too far off from making it work, so we could
|
||||
// fix these last steps. But first verify that using a separate word
|
||||
// breaker does not hurt performance.
|
||||
{"title", "af nl", "a''a"},
|
||||
{"", "", "א'a"},
|
||||
|
||||
// All the exclusions below seem to be issues with the ICU
|
||||
// implementation (at version 57) and thus are not marked as TODO.
|
||||
|
||||
// ICU does not handle leading apostrophe for Dutch and
|
||||
// Afrikaans correctly. See http://unicode.org/cldr/trac/ticket/7078.
|
||||
{"title", "af nl", "'n"},
|
||||
{"title", "af nl", "'N"},
|
||||
|
||||
// Go terminates the final sigma check after a fixed number of
|
||||
// ignorables have been found. This ensures that the algorithm can make
|
||||
// progress in a streaming scenario.
|
||||
{"lower title", "", "\u039f\u03a3...............................a"},
|
||||
// This also applies to upper in Greek.
|
||||
// NOTE: we could fix the following two cases by adding state to elUpper
|
||||
// and aztrLower. However, considering a modifier to not belong to the
|
||||
// preceding letter after the maximum modifiers count is reached is
|
||||
// consistent with the behavior of unicode/norm.
|
||||
{"upper", "el", "\u03bf" + strings.Repeat("\u0321", 29) + "\u0313"},
|
||||
{"lower", "az tr lt", "I" + strings.Repeat("\u0321", 30) + "\u0307\u0300"},
|
||||
{"upper", "lt", "i" + strings.Repeat("\u0321", 30) + "\u0307\u0300"},
|
||||
{"lower", "lt", "I" + strings.Repeat("\u0321", 30) + "\u0300"},
|
||||
|
||||
// ICU title case seems to erroneously remove \u0307 from an upper case
|
||||
// I unconditionally, instead of only when lowercasing. The ICU
|
||||
// transform algorithm transforms these cases consistently with our
|
||||
// implementation.
|
||||
{"title", "az tr", "\u0307"},
|
||||
|
||||
// The spec says to remove \u0307 after Soft-Dotted characters. ICU
|
||||
// transforms conform but ucasemap_utf8ToUpper does not.
|
||||
{"upper title", "lt", "i\u0307"},
|
||||
{"upper title", "lt", "i" + strings.Repeat("\u0321", 29) + "\u0307\u0300"},
|
||||
|
||||
// Both Unicode and CLDR prescribe an extra explicit dot above after a
|
||||
// Soft_Dotted character if there are other modifiers.
|
||||
// ucasemap_utf8ToUpper does not do this; ICU transforms do.
|
||||
// The issue with ucasemap_utf8ToUpper seems to be that it does not
|
||||
// consider the modifiers that are part of composition in the evaluation
|
||||
// of More_Above. For instance, according to the More_Above rule for lt,
|
||||
// a dotted capital I (U+0130) becomes i\u0307\u0307 (a small i with
|
||||
// two additional dots). This seems odd, but is correct. ICU is
|
||||
// definitely not correct as it produces different results for different
|
||||
// normal forms. For instance, for an İ:
|
||||
// \u0130 (NFC) -> i\u0307 (incorrect)
|
||||
// I\u0307 (NFD) -> i\u0307\u0307 (correct)
|
||||
// We could argue that we should not add a \u0307 if there already is
|
||||
// one, but this may be hard to get correct and does not conform to the
|
||||
// standard.
|
||||
{"lower title", "lt", "\u0130"},
|
||||
{"lower title", "lt", "\u00cf"},
|
||||
|
||||
// We conform to ICU ucasemap_utf8ToUpper if we remove support for
|
||||
// elUpper. However, this clearly does not conform to the spec. Moreover, the
|
||||
// ICU transforms _do_ implement this transform and produce results
|
||||
// consistent with our implementation. Note that we still prefer to use
|
||||
// ucasemap_utf8ToUpper instead of transforms as the latter have
|
||||
// inconsistencies in the word breaking algorithm.
|
||||
{"upper", "el", "\u0386"}, // GREEK CAPITAL LETTER ALPHA WITH TONOS
|
||||
{"upper", "el", "\u0389"}, // GREEK CAPITAL LETTER ETA WITH TONOS
|
||||
{"upper", "el", "\u038A"}, // GREEK CAPITAL LETTER IOTA WITH TONOS
|
||||
|
||||
{"upper", "el", "\u0391"}, // GREEK CAPITAL LETTER ALPHA
|
||||
{"upper", "el", "\u0397"}, // GREEK CAPITAL LETTER ETA
|
||||
{"upper", "el", "\u0399"}, // GREEK CAPITAL LETTER IOTA
|
||||
|
||||
{"upper", "el", "\u03AC"}, // GREEK SMALL LETTER ALPHA WITH TONOS
|
||||
{"upper", "el", "\u03AE"}, // GREEK SMALL LETTER ETA WITH TONOS
|
||||
{"upper", "el", "\u03AF"}, // GREEK SMALL LETTER IOTA WITH TONOS
|
||||
|
||||
{"upper", "el", "\u03B1"}, // GREEK SMALL LETTER ALPHA
|
||||
{"upper", "el", "\u03B7"}, // GREEK SMALL LETTER ETA
|
||||
{"upper", "el", "\u03B9"}, // GREEK SMALL LETTER IOTA
|
||||
}
|
||||
for _, x := range list {
|
||||
if x.cm != "" && strings.Index(x.cm, cm) == -1 {
|
||||
continue
|
||||
}
|
||||
if x.tags != "" && strings.Index(x.tags, tag) == -1 {
|
||||
continue
|
||||
}
|
||||
if strings.Index(s, x.pattern) != -1 {
|
||||
return true
|
||||
}
|
||||
}
|
||||
return false
|
||||
}
|
||||
|
||||
func doGo(tag, caser, input string) string {
|
||||
var c Caser
|
||||
t := language.MustParse(tag)
|
||||
switch caser {
|
||||
case "lower":
|
||||
c = Lower(t)
|
||||
case "upper":
|
||||
c = Upper(t)
|
||||
case "title":
|
||||
c = Title(t)
|
||||
case "fold":
|
||||
c = Fold()
|
||||
}
|
||||
return c.String(input)
|
||||
}
|
||||
950
vendor/golang.org/x/text/cases/map_test.go
generated
vendored
Normal file
|
|
@@ -0,0 +1,950 @@
|
|||
// Copyright 2014 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package cases
|
||||
|
||||
import (
|
||||
"bytes"
|
||||
"fmt"
|
||||
"path"
|
||||
"strings"
|
||||
"testing"
|
||||
"unicode/utf8"
|
||||
|
||||
"golang.org/x/text/internal/testtext"
|
||||
"golang.org/x/text/language"
|
||||
"golang.org/x/text/transform"
|
||||
"golang.org/x/text/unicode/norm"
|
||||
)
|
||||
|
||||
type testCase struct {
|
||||
lang string
|
||||
src interface{} // string, []string, or nil to skip test
|
||||
title interface{} // string, []string, or nil to skip test
|
||||
lower interface{} // string, []string, or nil to skip test
|
||||
upper interface{} // string, []string, or nil to skip test
|
||||
opts options
|
||||
}
|
||||
|
||||
var testCases = []testCase{
|
||||
0: {
|
||||
lang: "und",
|
||||
src: "abc aBc ABC abC İsıI ΕΣΆΣ",
|
||||
title: "Abc Abc Abc Abc İsıi Εσάσ",
|
||||
lower: "abc abc abc abc i\u0307sıi εσάσ",
|
||||
upper: "ABC ABC ABC ABC İSII ΕΣΆΣ",
|
||||
opts: getOpts(HandleFinalSigma(false)),
|
||||
},
|
||||
|
||||
1: {
|
||||
lang: "und",
|
||||
src: "abc aBc ABC abC İsıI ΕΣΆΣ Σ _Σ -Σ",
|
||||
title: "Abc Abc Abc Abc İsıi Εσάς Σ _Σ -Σ",
|
||||
lower: "abc abc abc abc i\u0307sıi εσάς σ _σ -σ",
|
||||
upper: "ABC ABC ABC ABC İSII ΕΣΆΣ Σ _Σ -Σ",
|
||||
opts: getOpts(HandleFinalSigma(true)),
|
||||
},
|
||||
|
||||
2: { // Title cased runes.
|
||||
lang: supported,
|
||||
src: "DžA",
|
||||
title: "Dža",
|
||||
lower: "dža",
|
||||
upper: "DŽA",
|
||||
},
|
||||
|
||||
3: {
|
||||
// Title breaking.
|
||||
lang: supported,
|
||||
src: []string{
|
||||
"FOO CASE TEST",
|
||||
"DON'T DO THiS",
|
||||
"χωΡΊΣ χωΡΊΣ^a χωΡΊΣ:a χωΡΊΣ:^a χωΡΊΣ^ όμΩΣ Σ",
|
||||
"with-hyphens",
|
||||
"49ers 49ers",
|
||||
`"capitalize a^a -hyphen 0X _u a_u:a`,
|
||||
"MidNumLet a.b\u2018c\u2019d\u2024e\ufe52f\uff07f\uff0eg",
|
||||
"MidNum a,b;c\u037ed\u0589e\u060cf\u2044g\ufe50h",
|
||||
"\u0345 x\u3031x x\u05d0x \u05d0x a'.a a.a a4,a",
|
||||
},
|
||||
title: []string{
|
||||
"Foo Case Test",
|
||||
"Don't Do This",
|
||||
"Χωρίς Χωρίσ^A Χωρίσ:a Χωρίσ:^A Χωρίς^ Όμως Σ",
|
||||
"With-Hyphens",
|
||||
// Note that 49Ers is correct according to the spec.
|
||||
// TODO: provide some option to the user to treat different
|
||||
// characters as cased.
|
||||
"49Ers 49Ers",
|
||||
`"Capitalize A^A -Hyphen 0X _U A_u:a`,
|
||||
"Midnumlet A.b\u2018c\u2019d\u2024e\ufe52f\uff07f\uff0eg",
|
||||
"Midnum A,B;C\u037eD\u0589E\u060cF\u2044G\ufe50H",
|
||||
"\u0399 X\u3031X X\u05d0x \u05d0X A'.A A.a A4,A",
|
||||
},
|
||||
},
|
||||
|
||||
// TODO: These are known deviations from the Unicode Word Breaking
// Algorithm.
|
||||
// {
|
||||
// "und",
|
||||
// "x_\u3031_x a4,4a",
|
||||
// "X_\u3031_x A4,4a", // Currently is "X_\U3031_X A4,4A".
|
||||
// "x_\u3031_x a4,4a",
|
||||
// "X_\u3031_X A4,4A",
|
||||
// options{},
|
||||
// },
|
||||
|
||||
4: {
|
||||
// Tests title options
|
||||
lang: "und",
|
||||
src: "abc aBc ABC abC İsıI o'Brien",
|
||||
title: "Abc ABc ABC AbC İsıI O'Brien",
|
||||
opts: getOpts(NoLower),
|
||||
},
|
||||
|
||||
5: {
|
||||
lang: "el",
|
||||
src: "aBc ΟΔΌΣ Οδός Σο ΣΟ Σ oΣ ΟΣ σ ἕξ \u03ac",
|
||||
title: "Abc Οδός Οδός Σο Σο Σ Oς Ος Σ Ἕξ \u0386",
|
||||
lower: "abc οδός οδός σο σο σ oς ος σ ἕξ \u03ac",
|
||||
upper: "ABC ΟΔΟΣ ΟΔΟΣ ΣΟ ΣΟ Σ OΣ ΟΣ Σ ΕΞ \u0391", // Uppercase removes accents
|
||||
},
|
||||
|
||||
6: {
|
||||
lang: "tr az",
|
||||
src: "Isiİ İsıI I\u0307sIiİ İsıI\u0307 I\u0300\u0307",
|
||||
title: "Isii İsıı I\u0307sıii İsıi I\u0300\u0307",
|
||||
lower: "ısii isıı isıii isıi \u0131\u0300\u0307",
|
||||
upper: "ISİİ İSII I\u0307SIİİ İSII\u0307 I\u0300\u0307",
|
||||
},
|
||||
|
||||
7: {
|
||||
lang: "lt",
|
||||
src: "I Ï J J̈ Į Į̈ Ì Í Ĩ xi̇̈ xj̇̈ xį̇̈ xi̇̀ xi̇́ xi̇̃ XI XÏ XJ XJ̈ XĮ XĮ̈ XI̟̤",
|
||||
title: "I Ï J J̈ Į Į̈ Ì Í Ĩ Xi̇̈ Xj̇̈ Xį̇̈ Xi̇̀ Xi̇́ Xi̇̃ Xi Xi̇̈ Xj Xj̇̈ Xį Xį̇̈ Xi̟̤",
|
||||
lower: "i i̇̈ j j̇̈ į į̇̈ i̇̀ i̇́ i̇̃ xi̇̈ xj̇̈ xį̇̈ xi̇̀ xi̇́ xi̇̃ xi xi̇̈ xj xj̇̈ xį xį̇̈ xi̟̤",
|
||||
upper: "I Ï J J̈ Į Į̈ Ì Í Ĩ XÏ XJ̈ XĮ̈ XÌ XÍ XĨ XI XÏ XJ XJ̈ XĮ XĮ̈ XI̟̤",
|
||||
},
|
||||
|
||||
8: {
|
||||
lang: "lt",
|
||||
src: "\u012e\u0300 \u00cc i\u0307\u0300 i\u0307\u0301 i\u0307\u0303 i\u0307\u0308 i\u0300\u0307",
|
||||
title: "\u012e\u0300 \u00cc \u00cc \u00cd \u0128 \u00cf I\u0300\u0307",
|
||||
lower: "\u012f\u0307\u0300 i\u0307\u0300 i\u0307\u0300 i\u0307\u0301 i\u0307\u0303 i\u0307\u0308 i\u0300\u0307",
|
||||
upper: "\u012e\u0300 \u00cc \u00cc \u00cd \u0128 \u00cf I\u0300\u0307",
|
||||
},
|
||||
|
||||
9: {
|
||||
lang: "nl",
|
||||
src: "ijs IJs Ij Ijs İJ İJs aa aA 'ns 'S",
|
||||
title: "IJs IJs IJ IJs İj İjs Aa Aa 'ns 's",
|
||||
},
|
||||
|
||||
// Note: this specification is not currently part of CLDR. The same holds
|
||||
// for the leading apostrophe handling for Dutch.
|
||||
// See http://unicode.org/cldr/trac/ticket/7078.
|
||||
10: {
|
||||
lang: "af",
|
||||
src: "wag 'n bietjie",
|
||||
title: "Wag 'n Bietjie",
|
||||
lower: "wag 'n bietjie",
|
||||
upper: "WAG 'N BIETJIE",
|
||||
},
|
||||
}
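The two `HandleFinalSigma` entries at the top of the table above differ only in how a word-final capital sigma is lowercased. The sketch below illustrates that rule using only the standard library. It is not the package's implementation: `lowerFinalSigma` is a hypothetical helper, and it deliberately ignores the case-ignorable characters (apostrophes, periods, combining marks) that the real caser skips when deciding whether a sigma is final.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// lowerFinalSigma is a hypothetical, simplified stand-in for
// Lower(language.Und). It lowercases s and maps a capital sigma to the
// final form 'ς' when the sigma directly follows a letter and is not
// directly followed by one. Unlike the real caser, it does not skip
// case-ignorable runes when looking at the neighbours.
func lowerFinalSigma(s string) string {
	runes := []rune(s)
	var b strings.Builder
	for i, r := range runes {
		if r == 'Σ' {
			prevLetter := i > 0 && unicode.IsLetter(runes[i-1])
			nextLetter := i+1 < len(runes) && unicode.IsLetter(runes[i+1])
			if prevLetter && !nextLetter {
				b.WriteRune('ς') // word-final sigma
				continue
			}
		}
		b.WriteRune(unicode.ToLower(r))
	}
	return b.String()
}

func main() {
	for _, in := range []string{"ΟΣ", "ΣΟ", "Σ", "_Σ"} {
		fmt.Printf("%q -> %q\n", in, lowerFinalSigma(in))
	}
}
```

A lone or leading sigma stays a regular `σ`, which matches the `Σ _Σ -Σ` expectations in the table. Inputs such as `ΟΣ''a` need the case-ignorable handling that this sketch omits.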
|
||||
|
||||
func TestCaseMappings(t *testing.T) {
|
||||
for i, tt := range testCases {
|
||||
src, ok := tt.src.([]string)
|
||||
if !ok {
|
||||
src = strings.Split(tt.src.(string), " ")
|
||||
}
|
||||
|
||||
for _, lang := range strings.Split(tt.lang, " ") {
|
||||
tag := language.MustParse(lang)
|
||||
testEntry := func(name string, mk func(language.Tag, options) transform.SpanningTransformer, gold interface{}) {
|
||||
c := Caser{mk(tag, tt.opts)}
|
||||
if gold != nil {
|
||||
wants, ok := gold.([]string)
|
||||
if !ok {
|
||||
wants = strings.Split(gold.(string), " ")
|
||||
}
|
||||
for j, want := range wants {
|
||||
if got := c.String(src[j]); got != want {
|
||||
t.Errorf("%d:%s:\n%s.String(%+q):\ngot %+q;\nwant %+q", i, lang, name, src[j], got, want)
|
||||
}
|
||||
}
|
||||
}
|
||||
dst := make([]byte, 256) // big enough to hold any result
|
||||
src := []byte(strings.Join(src, " "))
|
||||
v := testtext.AllocsPerRun(20, func() {
|
||||
c.Transform(dst, src, true)
|
||||
})
|
||||
if v > 1.1 {
|
||||
t.Errorf("%d:%s:\n%s: number of allocs was %f; want 0", i, lang, name, v)
|
||||
}
|
||||
}
|
||||
testEntry("Upper", makeUpper, tt.upper)
|
||||
testEntry("Lower", makeLower, tt.lower)
|
||||
testEntry("Title", makeTitle, tt.title)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// TestAlloc tests that some mapping methods should not cause any allocation.
|
||||
func TestAlloc(t *testing.T) {
|
||||
dst := make([]byte, 256) // big enough to hold any result
|
||||
src := []byte(txtNonASCII)
|
||||
|
||||
for i, f := range []func() Caser{
|
||||
func() Caser { return Upper(language.Und) },
|
||||
func() Caser { return Lower(language.Und) },
|
||||
func() Caser { return Lower(language.Und, HandleFinalSigma(false)) },
|
||||
// TODO: use a shared copy for these casers as well, in order of
|
||||
// importance, starting with the most important:
|
||||
// func() Caser { return Title(language.Und) },
|
||||
// func() Caser { return Title(language.Und, HandleFinalSigma(false)) },
|
||||
} {
|
||||
testtext.Run(t, "", func(t *testing.T) {
|
||||
var c Caser
|
||||
v := testtext.AllocsPerRun(10, func() {
|
||||
c = f()
|
||||
})
|
||||
if v > 0 {
|
||||
// TODO: Right now only Upper has 1 allocation. Special-case Lower
|
||||
// and Title as well to have fewer allocations for the root locale.
|
||||
t.Errorf("%d:init: number of allocs was %f; want 0", i, v)
|
||||
}
|
||||
v = testtext.AllocsPerRun(2, func() {
|
||||
c.Transform(dst, src, true)
|
||||
})
|
||||
if v > 0 {
|
||||
t.Errorf("%d:transform: number of allocs was %f; want 0", i, v)
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
func testHandover(t *testing.T, c Caser, src string) {
|
||||
want := c.String(src)
|
||||
// Find the common prefix.
|
||||
pSrc := 0
|
||||
for ; pSrc < len(src) && pSrc < len(want) && want[pSrc] == src[pSrc]; pSrc++ {
|
||||
}
|
||||
|
||||
// Test handover for each substring of the prefix.
|
||||
for i := 0; i < pSrc; i++ {
|
||||
testtext.Run(t, fmt.Sprint("interleave/", i), func(t *testing.T) {
|
||||
dst := make([]byte, 4*len(src))
|
||||
c.Reset()
|
||||
nSpan, _ := c.Span([]byte(src[:i]), false)
|
||||
copy(dst, src[:nSpan])
|
||||
nTransform, _, _ := c.Transform(dst[nSpan:], []byte(src[nSpan:]), true)
|
||||
got := string(dst[:nSpan+nTransform])
|
||||
if got != want {
|
||||
t.Errorf("full string: got %q; want %q", got, want)
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
func TestHandover(t *testing.T) {
|
||||
testCases := []struct {
|
||||
desc string
|
||||
t Caser
|
||||
first, second string
|
||||
}{{
|
||||
"title/nosigma/single midword",
|
||||
Title(language.Und, HandleFinalSigma(false)),
|
||||
"A.", "a",
|
||||
}, {
|
||||
"title/nosigma/single midword",
|
||||
Title(language.Und, HandleFinalSigma(false)),
|
||||
"A", ".a",
|
||||
}, {
|
||||
"title/nosigma/double midword",
|
||||
Title(language.Und, HandleFinalSigma(false)),
|
||||
"A..", "a",
|
||||
}, {
|
||||
"title/nosigma/double midword",
|
||||
Title(language.Und, HandleFinalSigma(false)),
|
||||
"A.", ".a",
|
||||
}, {
|
||||
"title/nosigma/double midword",
|
||||
Title(language.Und, HandleFinalSigma(false)),
|
||||
"A", "..a",
|
||||
}, {
|
||||
"title/sigma/single midword",
|
||||
Title(language.Und),
|
||||
"ΟΣ.", "a",
|
||||
}, {
|
||||
"title/sigma/single midword",
|
||||
Title(language.Und),
|
||||
"ΟΣ", ".a",
|
||||
}, {
|
||||
"title/sigma/double midword",
|
||||
Title(language.Und),
|
||||
"ΟΣ..", "a",
|
||||
}, {
|
||||
"title/sigma/double midword",
|
||||
Title(language.Und),
|
||||
"ΟΣ.", ".a",
|
||||
}, {
|
||||
"title/sigma/double midword",
|
||||
Title(language.Und),
|
||||
"ΟΣ", "..a",
|
||||
}, {
|
||||
"title/af/leading apostrophe",
|
||||
Title(language.Afrikaans),
|
||||
"'", "n bietje",
|
||||
}}
|
||||
for _, tc := range testCases {
|
||||
testtext.Run(t, tc.desc, func(t *testing.T) {
|
||||
src := tc.first + tc.second
|
||||
want := tc.t.String(src)
|
||||
tc.t.Reset()
|
||||
n, _ := tc.t.Span([]byte(tc.first), false)
|
||||
|
||||
dst := make([]byte, len(want))
|
||||
copy(dst, tc.first[:n])
|
||||
|
||||
nDst, _, _ := tc.t.Transform(dst[n:], []byte(src[n:]), true)
|
||||
got := string(dst[:n+nDst])
|
||||
if got != want {
|
||||
t.Errorf("got %q; want %q", got, want)
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
// minBufSize is the size of the buffer by which the casing operations in
|
||||
// this package are guaranteed to make progress.
|
||||
const minBufSize = norm.MaxSegmentSize
|
||||
|
||||
type bufferTest struct {
|
||||
desc, src, want string
|
||||
firstErr error
|
||||
dstSize, srcSize int
|
||||
t transform.SpanningTransformer
|
||||
}
|
||||
|
||||
var bufferTests []bufferTest
|
||||
|
||||
func init() {
|
||||
bufferTests = []bufferTest{{
|
||||
desc: "und/upper/short dst",
|
||||
src: "abcdefg",
|
||||
want: "ABCDEFG",
|
||||
firstErr: transform.ErrShortDst,
|
||||
dstSize: 3,
|
||||
srcSize: minBufSize,
|
||||
t: Upper(language.Und),
|
||||
}, {
|
||||
desc: "und/upper/short src",
|
||||
src: "123é56",
|
||||
want: "123É56",
|
||||
firstErr: transform.ErrShortSrc,
|
||||
dstSize: 4,
|
||||
srcSize: 4,
|
||||
t: Upper(language.Und),
|
||||
}, {
|
||||
desc: "und/upper/no error on short",
|
||||
src: "12",
|
||||
want: "12",
|
||||
firstErr: nil,
|
||||
dstSize: 1,
|
||||
srcSize: 1,
|
||||
t: Upper(language.Und),
|
||||
}, {
|
||||
desc: "und/lower/short dst",
|
||||
src: "ABCDEFG",
|
||||
want: "abcdefg",
|
||||
firstErr: transform.ErrShortDst,
|
||||
dstSize: 3,
|
||||
srcSize: minBufSize,
|
||||
t: Lower(language.Und),
|
||||
}, {
|
||||
desc: "und/lower/short src",
|
||||
src: "123É56",
|
||||
want: "123é56",
|
||||
firstErr: transform.ErrShortSrc,
|
||||
dstSize: 4,
|
||||
srcSize: 4,
|
||||
t: Lower(language.Und),
|
||||
}, {
|
||||
desc: "und/lower/no error on short",
|
||||
src: "12",
|
||||
want: "12",
|
||||
firstErr: nil,
|
||||
dstSize: 1,
|
||||
srcSize: 1,
|
||||
t: Lower(language.Und),
|
||||
}, {
|
||||
desc: "und/lower/simple (no final sigma)",
|
||||
src: "ΟΣ ΟΣΣ",
|
||||
want: "οσ οσσ",
|
||||
dstSize: minBufSize,
|
||||
srcSize: minBufSize,
|
||||
t: Lower(language.Und, HandleFinalSigma(false)),
|
||||
}, {
|
||||
desc: "und/title/simple (no final sigma)",
|
||||
src: "ΟΣ ΟΣΣ",
|
||||
want: "Οσ Οσσ",
|
||||
dstSize: minBufSize,
|
||||
srcSize: minBufSize,
|
||||
t: Title(language.Und, HandleFinalSigma(false)),
|
||||
}, {
|
||||
desc: "und/title/final sigma: no error",
|
||||
src: "ΟΣ",
|
||||
want: "Ος",
|
||||
dstSize: minBufSize,
|
||||
srcSize: minBufSize,
|
||||
t: Title(language.Und),
|
||||
}, {
|
||||
desc: "und/title/final sigma: short source",
|
||||
src: "ΟΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣ",
|
||||
want: "Οσσσσσσσσσσσσσσσσσσσσσσσσσσσσσσσσσς",
|
||||
firstErr: transform.ErrShortSrc,
|
||||
dstSize: minBufSize,
|
||||
srcSize: 10,
|
||||
t: Title(language.Und),
|
||||
}, {
|
||||
desc: "und/title/final sigma: short destination 1",
|
||||
src: "ΟΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣ",
|
||||
want: "Οσσσσσσσσσσσσσσσσσσσσσσσσσσσσσσσσσς",
|
||||
firstErr: transform.ErrShortDst,
|
||||
dstSize: 10,
|
||||
srcSize: minBufSize,
|
||||
t: Title(language.Und),
|
||||
}, {
|
||||
desc: "und/title/final sigma: short destination 2",
|
||||
src: "ΟΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣ",
|
||||
want: "Οσσσσσσσσσσσσσσσσσσσσσσσσσσσσσσσσσς",
|
||||
firstErr: transform.ErrShortDst,
|
||||
dstSize: 9,
|
||||
srcSize: minBufSize,
|
||||
t: Title(language.Und),
|
||||
}, {
|
||||
desc: "und/title/final sigma: short destination 3",
|
||||
src: "ΟΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣ",
|
||||
want: "Οσσσσσσσσσσσσσσσσσσσσσσσσσσσσσσσσσς",
|
||||
firstErr: transform.ErrShortDst,
|
||||
dstSize: 8,
|
||||
srcSize: minBufSize,
|
||||
t: Title(language.Und),
|
||||
}, {
|
||||
desc: "und/title/clipped UTF-8 rune",
|
||||
src: "σσσσσσσσσσσ",
|
||||
want: "Σσσσσσσσσσσ",
|
||||
firstErr: transform.ErrShortSrc,
|
||||
dstSize: minBufSize,
|
||||
srcSize: 5,
|
||||
t: Title(language.Und),
|
||||
}, {
|
||||
desc: "und/title/clipped UTF-8 rune atEOF",
|
||||
src: "σσσ" + string([]byte{0xCF}),
|
||||
want: "Σσσ" + string([]byte{0xCF}),
|
||||
dstSize: minBufSize,
|
||||
srcSize: minBufSize,
|
||||
t: Title(language.Und),
|
||||
}, {
|
||||
// Note: the choice to change the final sigma at the end in case of
|
||||
// too many case ignorables is arbitrary. The main reason for this
|
||||
// choice is that it results in simpler code.
|
||||
desc: "und/title/final sigma: max ignorables",
|
||||
src: "ΟΣ" + strings.Repeat(".", maxIgnorable) + "a",
|
||||
want: "Οσ" + strings.Repeat(".", maxIgnorable) + "A",
|
||||
dstSize: minBufSize,
|
||||
srcSize: minBufSize,
|
||||
t: Title(language.Und),
|
||||
}, {
|
||||
// Note: the choice to change the final sigma at the end in case of
|
||||
// too many case ignorables is arbitrary. The main reason for this
|
||||
// choice is that it results in simpler code.
|
||||
desc: "und/title/long string",
|
||||
src: "AA" + strings.Repeat(".", maxIgnorable+1) + "a",
|
||||
want: "Aa" + strings.Repeat(".", maxIgnorable+1) + "A",
|
||||
dstSize: minBufSize,
|
||||
srcSize: len("AA" + strings.Repeat(".", maxIgnorable+1)),
|
||||
t: Title(language.Und),
|
||||
}, {
|
||||
// Note: the choice to change the final sigma at the end in case of
|
||||
// too many case ignorables is arbitrary. The main reason for this
|
||||
// choice is that it results in simpler code.
|
||||
desc: "und/title/final sigma: too many ignorables",
|
||||
src: "ΟΣ" + strings.Repeat(".", maxIgnorable+1) + "a",
|
||||
want: "Ος" + strings.Repeat(".", maxIgnorable+1) + "A",
|
||||
dstSize: minBufSize,
|
||||
srcSize: len("ΟΣ" + strings.Repeat(".", maxIgnorable+1)),
|
||||
t: Title(language.Und),
|
||||
}, {
|
||||
desc: "und/title/final sigma: apostrophe",
|
||||
src: "ΟΣ''a",
|
||||
want: "Οσ''A",
|
||||
dstSize: minBufSize,
|
||||
srcSize: minBufSize,
|
||||
t: Title(language.Und),
|
||||
}, {
|
||||
desc: "el/upper/max ignorables",
|
||||
src: "ο" + strings.Repeat("\u0321", maxIgnorable-1) + "\u0313",
|
||||
want: "Ο" + strings.Repeat("\u0321", maxIgnorable-1),
|
||||
dstSize: minBufSize,
|
||||
srcSize: minBufSize,
|
||||
t: Upper(language.Greek),
|
||||
}, {
|
||||
desc: "el/upper/too many ignorables",
|
||||
src: "ο" + strings.Repeat("\u0321", maxIgnorable) + "\u0313",
|
||||
want: "Ο" + strings.Repeat("\u0321", maxIgnorable) + "\u0313",
|
||||
dstSize: minBufSize,
|
||||
srcSize: len("ο" + strings.Repeat("\u0321", maxIgnorable)),
|
||||
t: Upper(language.Greek),
|
||||
}, {
|
||||
desc: "el/upper/short dst",
|
||||
src: "123ο",
|
||||
want: "123Ο",
|
||||
firstErr: transform.ErrShortDst,
|
||||
dstSize: 3,
|
||||
srcSize: minBufSize,
|
||||
t: Upper(language.Greek),
|
||||
}, {
|
||||
desc: "lt/lower/max ignorables",
|
||||
src: "I" + strings.Repeat("\u0321", maxIgnorable-1) + "\u0300",
|
||||
want: "i" + strings.Repeat("\u0321", maxIgnorable-1) + "\u0307\u0300",
|
||||
dstSize: minBufSize,
|
||||
srcSize: minBufSize,
|
||||
t: Lower(language.Lithuanian),
|
||||
}, {
|
||||
desc: "lt/lower/too many ignorables",
|
||||
src: "I" + strings.Repeat("\u0321", maxIgnorable) + "\u0300",
|
||||
want: "i" + strings.Repeat("\u0321", maxIgnorable) + "\u0300",
|
||||
dstSize: minBufSize,
|
||||
srcSize: len("I" + strings.Repeat("\u0321", maxIgnorable)),
|
||||
t: Lower(language.Lithuanian),
|
||||
}, {
|
||||
desc: "lt/lower/decomposition with short dst buffer 1",
|
||||
src: "aaaaa\u00cc", // U+00CC LATIN CAPITAL LETTER I GRAVE
|
||||
firstErr: transform.ErrShortDst,
|
||||
want: "aaaaai\u0307\u0300",
|
||||
dstSize: 5,
|
||||
srcSize: minBufSize,
|
||||
t: Lower(language.Lithuanian),
|
||||
}, {
|
||||
desc: "lt/lower/decomposition with short dst buffer 2",
|
||||
src: "aaaa\u00cc", // U+00CC LATIN CAPITAL LETTER I GRAVE
|
||||
firstErr: transform.ErrShortDst,
|
||||
want: "aaaai\u0307\u0300",
|
||||
dstSize: 5,
|
||||
srcSize: minBufSize,
|
||||
t: Lower(language.Lithuanian),
|
||||
}, {
|
||||
desc: "lt/upper/max ignorables",
|
||||
src: "i" + strings.Repeat("\u0321", maxIgnorable-1) + "\u0307\u0300",
|
||||
want: "I" + strings.Repeat("\u0321", maxIgnorable-1) + "\u0300",
|
||||
dstSize: minBufSize,
|
||||
srcSize: minBufSize,
|
||||
t: Upper(language.Lithuanian),
|
||||
}, {
|
||||
desc: "lt/upper/too many ignorables",
|
||||
src: "i" + strings.Repeat("\u0321", maxIgnorable) + "\u0307\u0300",
|
||||
want: "I" + strings.Repeat("\u0321", maxIgnorable) + "\u0307\u0300",
|
||||
dstSize: minBufSize,
|
||||
srcSize: len("i" + strings.Repeat("\u0321", maxIgnorable)),
|
||||
t: Upper(language.Lithuanian),
|
||||
}, {
|
||||
desc: "lt/upper/short dst",
|
||||
src: "12i\u0307\u0300",
|
||||
want: "12\u00cc",
|
||||
firstErr: transform.ErrShortDst,
|
||||
dstSize: 3,
|
||||
srcSize: minBufSize,
|
||||
t: Upper(language.Lithuanian),
|
||||
}, {
|
||||
desc: "aztr/lower/max ignorables",
|
||||
src: "I" + strings.Repeat("\u0321", maxIgnorable-1) + "\u0307\u0300",
|
||||
want: "i" + strings.Repeat("\u0321", maxIgnorable-1) + "\u0300",
|
||||
dstSize: minBufSize,
|
||||
srcSize: minBufSize,
|
||||
t: Lower(language.Turkish),
|
||||
}, {
|
||||
desc: "aztr/lower/too many ignorables",
|
||||
src: "I" + strings.Repeat("\u0321", maxIgnorable) + "\u0307\u0300",
|
||||
want: "\u0131" + strings.Repeat("\u0321", maxIgnorable) + "\u0307\u0300",
|
||||
dstSize: minBufSize,
|
||||
srcSize: len("I" + strings.Repeat("\u0321", maxIgnorable)),
|
||||
t: Lower(language.Turkish),
|
||||
}, {
|
||||
desc: "nl/title/pre-IJ cutoff",
|
||||
src: " ij",
|
||||
want: " IJ",
|
||||
firstErr: transform.ErrShortDst,
|
||||
dstSize: 2,
|
||||
srcSize: minBufSize,
|
||||
t: Title(language.Dutch),
|
||||
}, {
|
||||
desc: "nl/title/mid-IJ cutoff",
|
||||
src: " ij",
|
||||
want: " IJ",
|
||||
firstErr: transform.ErrShortDst,
|
||||
dstSize: 3,
|
||||
srcSize: minBufSize,
|
||||
t: Title(language.Dutch),
|
||||
}, {
|
||||
desc: "af/title/apostrophe",
|
||||
src: "'n bietje",
|
||||
want: "'n Bietje",
|
||||
firstErr: transform.ErrShortDst,
|
||||
dstSize: 3,
|
||||
srcSize: minBufSize,
|
||||
t: Title(language.Afrikaans),
|
||||
}}
|
||||
}
|
||||
|
||||
func TestShortBuffersAndOverflow(t *testing.T) {
|
||||
for i, tt := range bufferTests {
|
||||
testtext.Run(t, tt.desc, func(t *testing.T) {
|
||||
buf := make([]byte, tt.dstSize)
|
||||
got := []byte{}
|
||||
var nSrc, nDst int
|
||||
var err error
|
||||
for p := 0; p < len(tt.src); p += nSrc {
|
||||
q := p + tt.srcSize
|
||||
if q > len(tt.src) {
|
||||
q = len(tt.src)
|
||||
}
|
||||
nDst, nSrc, err = tt.t.Transform(buf, []byte(tt.src[p:q]), q == len(tt.src))
|
||||
got = append(got, buf[:nDst]...)
|
||||
|
||||
if p == 0 && err != tt.firstErr {
|
||||
t.Errorf("%d:%s:\n error was %v; want %v", i, tt.desc, err, tt.firstErr)
|
||||
break
|
||||
}
|
||||
}
|
||||
if string(got) != tt.want {
|
||||
t.Errorf("%d:%s:\ngot %+q;\nwant %+q", i, tt.desc, got, tt.want)
|
||||
}
|
||||
testHandover(t, Caser{tt.t}, tt.src)
|
||||
})
|
||||
}
|
||||
}
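The loop in TestShortBuffersAndOverflow feeds the source to Transform in fixed-size windows and drains a small destination buffer after every call. The following is a minimal standard-library sketch of that driving pattern. `upperChunk`, `upperAll`, `errShortDst` and `errShortSrc` are hypothetical stand-ins for a transformer and the `transform` package's error values, not the package's own code.

```go
package main

import (
	"errors"
	"fmt"
	"unicode"
	"unicode/utf8"
)

// Hypothetical stand-ins for transform.ErrShortDst and transform.ErrShortSrc.
var (
	errShortDst = errors.New("short destination buffer")
	errShortSrc = errors.New("short source buffer")
)

// upperChunk uppercases as much of src as fits into dst and reports how many
// bytes it wrote and consumed, mirroring the shape of a Transform method.
func upperChunk(dst, src []byte, atEOF bool) (nDst, nSrc int, err error) {
	for nSrc < len(src) {
		r, size := utf8.DecodeRune(src[nSrc:])
		invalid := r == utf8.RuneError && size == 1
		if invalid && !atEOF && !utf8.FullRune(src[nSrc:]) {
			// The window cut a multi-byte rune in half: ask for more input.
			return nDst, nSrc, errShortSrc
		}
		var buf [utf8.UTFMax]byte
		out := src[nSrc : nSrc+1] // invalid bytes are copied through unchanged
		if !invalid {
			out = buf[:utf8.EncodeRune(buf[:], unicode.ToUpper(r))]
		}
		if nDst+len(out) > len(dst) {
			return nDst, nSrc, errShortDst
		}
		nDst += copy(dst[nDst:], out)
		nSrc += size
	}
	return nDst, nSrc, nil
}

// upperAll drives upperChunk with a dstSize-byte output buffer and
// srcSize-byte input windows, collecting the output after every call.
func upperAll(src string, dstSize, srcSize int) (string, error) {
	buf := make([]byte, dstSize)
	var got []byte
	for p := 0; p < len(src); {
		q := p + srcSize
		if q > len(src) {
			q = len(src)
		}
		nDst, nSrc, err := upperChunk(buf, []byte(src[p:q]), q == len(src))
		got = append(got, buf[:nDst]...)
		if nSrc == 0 {
			// Neither buffer is large enough to make progress.
			return string(got), fmt.Errorf("no progress at offset %d: %v", p, err)
		}
		p += nSrc
	}
	return string(got), nil
}

func main() {
	out, err := upperAll("123é56", 4, 4)
	fmt.Printf("%q %v\n", out, err)
}
```

Unlike the test loop, this sketch stops with an error when a call consumes no input. That guards against looping forever when the buffers are too small for a single rune.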
|
||||
|
||||
func TestSpan(t *testing.T) {
|
||||
for _, tt := range []struct {
|
||||
desc string
|
||||
src string
|
||||
want string
|
||||
atEOF bool
|
||||
err error
|
||||
t Caser
|
||||
}{{
|
||||
desc: "und/upper/basic",
|
||||
src: "abcdefg",
|
||||
want: "",
|
||||
atEOF: true,
|
||||
err: transform.ErrEndOfSpan,
|
||||
t: Upper(language.Und),
|
||||
}, {
|
||||
desc: "und/upper/short src",
|
||||
src: "123É"[:4],
|
||||
want: "123",
|
||||
atEOF: false,
|
||||
err: transform.ErrShortSrc,
|
||||
t: Upper(language.Und),
|
||||
}, {
|
||||
desc: "und/upper/no error on short",
|
||||
src: "12",
|
||||
want: "12",
|
||||
atEOF: false,
|
||||
t: Upper(language.Und),
|
||||
}, {
|
||||
desc: "und/lower/basic",
|
||||
src: "ABCDEFG",
|
||||
want: "",
|
||||
atEOF: true,
|
||||
err: transform.ErrEndOfSpan,
|
||||
t: Lower(language.Und),
|
||||
}, {
|
||||
desc: "und/lower/short src num",
|
||||
src: "123é"[:4],
|
||||
want: "123",
|
||||
atEOF: false,
|
||||
err: transform.ErrShortSrc,
|
||||
t: Lower(language.Und),
|
||||
}, {
|
||||
desc: "und/lower/short src greek",
|
||||
src: "αβγé"[:7],
|
||||
want: "αβγ",
|
||||
atEOF: false,
|
||||
err: transform.ErrShortSrc,
|
||||
t: Lower(language.Und),
|
||||
}, {
|
||||
desc: "und/lower/no error on short",
|
||||
src: "12",
|
||||
want: "12",
|
||||
atEOF: false,
|
||||
t: Lower(language.Und),
|
||||
}, {
|
||||
desc: "und/lower/simple (no final sigma)",
|
||||
src: "ος οσσ",
|
||||
want: "οσ οσσ",
|
||||
atEOF: true,
|
||||
t: Lower(language.Und, HandleFinalSigma(false)),
|
||||
}, {
|
||||
desc: "und/title/simple (no final sigma)",
|
||||
src: "Οσ Οσσ",
|
||||
want: "Οσ Οσσ",
|
||||
atEOF: true,
|
||||
t: Title(language.Und, HandleFinalSigma(false)),
|
||||
}, {
|
||||
desc: "und/lower/final sigma: no error",
|
||||
src: "οΣ", // Oς
|
||||
want: "ο", // Oς
|
||||
err: transform.ErrEndOfSpan,
|
||||
t: Lower(language.Und),
|
||||
}, {
|
||||
desc: "und/title/final sigma: no error",
|
||||
src: "ΟΣ", // Oς
|
||||
want: "Ο", // Oς
|
||||
err: transform.ErrEndOfSpan,
|
||||
t: Title(language.Und),
|
||||
}, {
|
||||
desc: "und/title/final sigma: no short source!",
|
||||
src: "ΟσσσσσσσσσσσσσσσσσσσσσσσσσσσσσσσσσΣ",
|
||||
want: "Οσσσσσσσσσσσσσσσσσσσσσσσσσσσσσσσσσ",
|
||||
err: transform.ErrEndOfSpan,
|
||||
t: Title(language.Und),
|
||||
}, {
|
||||
desc: "und/title/clipped UTF-8 rune",
|
||||
src: "Σσ" + string([]byte{0xCF}),
|
||||
want: "Σσ",
|
||||
atEOF: false,
|
||||
err: transform.ErrShortSrc,
|
||||
t: Title(language.Und),
|
||||
}, {
|
||||
desc: "und/title/clipped UTF-8 rune atEOF",
|
||||
src: "Σσσ" + string([]byte{0xCF}),
|
||||
want: "Σσσ" + string([]byte{0xCF}),
|
||||
atEOF: true,
|
||||
t: Title(language.Und),
|
||||
}, {
|
||||
// Note: the choice to change the final sigma at the end in case of
|
||||
// too many case ignorables is arbitrary. The main reason for this
|
||||
// choice is that it results in simpler code.
|
||||
desc: "und/title/long string",
|
||||
src: "A" + strings.Repeat("a", maxIgnorable+5),
|
||||
want: "A" + strings.Repeat("a", maxIgnorable+5),
|
||||
t: Title(language.Und),
|
||||
}, {
|
||||
// Note: the choice to change the final sigma at the end in case of
|
||||
// too many case ignorables is arbitrary. The main reason for this
|
||||
// choice is that it results in simpler code.
|
||||
desc: "und/title/cyrillic",
|
||||
src: "При",
|
||||
want: "При",
|
||||
atEOF: true,
|
||||
t: Title(language.Und, HandleFinalSigma(false)),
|
||||
}, {
|
||||
// Note: the choice to change the final sigma at the end in case of
|
||||
// too many case ignorables is arbitrary. The main reason for this
|
||||
// choice is that it results in simpler code.
|
||||
desc: "und/title/final sigma: max ignorables",
|
||||
src: "Οσ" + strings.Repeat(".", maxIgnorable) + "A",
|
||||
want: "Οσ" + strings.Repeat(".", maxIgnorable) + "A",
|
||||
t: Title(language.Und),
|
||||
}, {
|
||||
desc: "el/upper/max ignorables - not implemented",
|
||||
src: "Ο" + strings.Repeat("\u0321", maxIgnorable-1) + "\u0313",
|
||||
want: "",
|
||||
err: transform.ErrEndOfSpan,
|
||||
t: Upper(language.Greek),
|
||||
}, {
|
||||
desc: "el/upper/too many ignorables - not implemented",
|
||||
src: "Ο" + strings.Repeat("\u0321", maxIgnorable) + "\u0313",
|
||||
want: "",
|
||||
err: transform.ErrEndOfSpan,
|
||||
t: Upper(language.Greek),
|
||||
}, {
|
||||
desc: "el/upper/short dst",
|
||||
src: "123ο",
|
||||
want: "",
|
||||
err: transform.ErrEndOfSpan,
|
||||
t: Upper(language.Greek),
|
||||
}, {
|
||||
desc: "lt/lower/max ignorables",
|
||||
src: "i" + strings.Repeat("\u0321", maxIgnorable-1) + "\u0307\u0300",
|
||||
want: "i" + strings.Repeat("\u0321", maxIgnorable-1) + "\u0307\u0300",
|
||||
t: Lower(language.Lithuanian),
|
||||
}, {
|
||||
desc: "lt/lower/isLower",
|
||||
src: "I" + strings.Repeat("\u0321", maxIgnorable) + "\u0300",
|
||||
want: "",
|
||||
err: transform.ErrEndOfSpan,
|
||||
t: Lower(language.Lithuanian),
|
||||
}, {
|
||||
desc: "lt/lower/not identical",
|
||||
src: "aaaaa\u00cc", // U+00CC LATIN CAPITAL LETTER I GRAVE
|
||||
err: transform.ErrEndOfSpan,
|
||||
want: "aaaaa",
|
||||
t: Lower(language.Lithuanian),
|
||||
}, {
|
||||
desc: "lt/lower/identical",
|
||||
src: "aaaai\u0307\u0300", // i + U+0307 COMBINING DOT ABOVE + U+0300 COMBINING GRAVE ACCENT
|
||||
want: "aaaai\u0307\u0300",
|
||||
t: Lower(language.Lithuanian),
|
||||
}, {
|
||||
desc: "lt/upper/not implemented",
|
||||
src: "I" + strings.Repeat("\u0321", maxIgnorable-1) + "\u0300",
|
||||
want: "",
|
||||
err: transform.ErrEndOfSpan,
|
||||
t: Upper(language.Lithuanian),
|
||||
}, {
|
||||
desc: "lt/upper/not implemented, ascii",
|
||||
src: "AB",
|
||||
want: "",
|
||||
err: transform.ErrEndOfSpan,
|
||||
t: Upper(language.Lithuanian),
|
||||
}, {
|
||||
desc: "nl/title/pre-IJ cutoff",
|
||||
src: " IJ",
|
||||
want: " IJ",
|
||||
t: Title(language.Dutch),
|
||||
}, {
|
||||
desc: "nl/title/mid-IJ cutoff",
|
||||
src: " Ia",
|
||||
want: " Ia",
|
||||
t: Title(language.Dutch),
|
||||
}, {
|
||||
desc: "af/title/apostrophe",
|
||||
src: "'n Bietje",
|
||||
want: "'n Bietje",
|
||||
t: Title(language.Afrikaans),
|
||||
}, {
|
||||
desc: "af/title/apostrophe-incorrect",
|
||||
src: "'N Bietje",
|
||||
// The Single_Quote (a MidWord), needs to be retained as unspanned so
|
||||
// that a successive call to Transform can detect that N should not be
|
||||
// capitalized.
|
||||
want: "",
|
||||
err: transform.ErrEndOfSpan,
|
||||
t: Title(language.Afrikaans),
|
||||
}} {
|
||||
testtext.Run(t, tt.desc, func(t *testing.T) {
|
||||
for p := 0; p < len(tt.want); p += utf8.RuneLen([]rune(tt.src[p:])[0]) {
|
||||
tt.t.Reset()
|
||||
n, err := tt.t.Span([]byte(tt.src[:p]), false)
|
||||
if err != nil && err != transform.ErrShortSrc {
|
||||
t.Errorf("early failure:Span(%+q): %v (%d < %d)", tt.src[:p], err, n, len(tt.want))
|
||||
break
|
||||
}
|
||||
}
|
||||
tt.t.Reset()
|
||||
n, err := tt.t.Span([]byte(tt.src), tt.atEOF)
|
||||
if n != len(tt.want) || err != tt.err {
|
||||
t.Errorf("Span(%+q, %v): got %d, %v; want %d, %v", tt.src, tt.atEOF, n, err, len(tt.want), tt.err)
|
||||
}
|
||||
testHandover(t, tt.t, tt.src)
|
||||
})
|
||||
}
|
||||
}
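TestSpan checks how long a prefix of the input is already in the requested case, and testHandover then copies that prefix and transforms only the remainder. A small standard-library sketch of the same span-then-transform idea follows. `spanLower`, `lowerWithSpan` and `errEndOfSpan` are hypothetical names standing in for a Span method and transform.ErrEndOfSpan, and the sketch uses simple per-rune lowercasing rather than the package's locale-aware rules.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// errEndOfSpan is a hypothetical stand-in for transform.ErrEndOfSpan.
var errEndOfSpan = errors.New("end of span")

// spanLower returns the number of leading bytes of src that are already
// lowercase. It returns errEndOfSpan when it stops before the end of src.
func spanLower(src []byte) (n int, err error) {
	for n < len(src) {
		r, size := utf8.DecodeRune(src[n:])
		if unicode.ToLower(r) != r {
			return n, errEndOfSpan
		}
		n += size
	}
	return n, nil
}

// lowerWithSpan copies the already-lowercase prefix unchanged and only
// lowercases the remainder, the same handover that testHandover exercises.
func lowerWithSpan(s string) string {
	n, _ := spanLower([]byte(s))
	return s[:n] + strings.ToLower(s[n:])
}

func main() {
	n, err := spanLower([]byte("abcDEF"))
	fmt.Println(n, err)
	fmt.Println(lowerWithSpan("abcDEF"))
}
```

The point of the split is that text already in the target form can be skipped without writing to a destination buffer.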
|
||||
|
||||
var txtASCII = strings.Repeat("The quick brown fox jumps over the lazy dog. ", 50)
|
||||
|
||||
// Taken from http://creativecommons.org/licenses/by-sa/3.0/vn/
|
||||
const txt_vn = `Với các điều kiện sau: Ghi nhận công của tác giả. Nếu bạn sử
|
||||
dụng, chuyển đổi, hoặc xây dựng dự án từ nội dung được chia sẻ này, bạn phải áp
|
||||
dụng giấy phép này hoặc một giấy phép khác có các điều khoản tương tự như giấy
|
||||
phép này cho dự án của bạn. Hiểu rằng: Miễn — Bất kỳ các điều kiện nào trên đây
|
||||
cũng có thể được miễn bỏ nếu bạn được sự cho phép của người sở hữu bản quyền.
|
||||
Phạm vi công chúng — Khi tác phẩm hoặc bất kỳ chương nào của tác phẩm đã trong
|
||||
vùng dành cho công chúng theo quy định của pháp luật thì tình trạng của nó không
|
||||
bị ảnh hưởng bởi giấy phép trong bất kỳ trường hợp nào.`
|
||||
|
||||
// http://creativecommons.org/licenses/by-sa/2.5/cn/
|
||||
const txt_cn = `您可以自由: 复制、发行、展览、表演、放映、
|
||||
广播或通过信息网络传播本作品 创作演绎作品
|
||||
对本作品进行商业性使用 惟须遵守下列条件:
|
||||
署名 — 您必须按照作者或者许可人指定的方式对作品进行署名。
|
||||
相同方式共享 — 如果您改变、转换本作品或者以本作品为基础进行创作,
|
||||
您只能采用与本协议相同的许可协议发布基于本作品的演绎作品。`
|
||||
|
||||
// Taken from http://creativecommons.org/licenses/by-sa/1.0/deed.ru
|
||||
const txt_ru = `При обязательном соблюдении следующих условий: Attribution — Вы
|
||||
должны атрибутировать произведение (указывать автора и источник) в порядке,
|
||||
предусмотренном автором или лицензиаром (но только так, чтобы никоим образом не
|
||||
подразумевалось, что они поддерживают вас или использование вами данного
|
||||
произведения). Υπό τις ακόλουθες προϋποθέσεις:`
|
||||
|
||||
// Taken from http://creativecommons.org/licenses/by-sa/3.0/gr/
|
||||
const txt_gr = `Αναφορά Δημιουργού — Θα πρέπει να κάνετε την αναφορά στο έργο με
|
||||
τον τρόπο που έχει οριστεί από το δημιουργό ή το χορηγούντο την άδεια (χωρίς
|
||||
όμως να εννοείται με οποιονδήποτε τρόπο ότι εγκρίνουν εσάς ή τη χρήση του έργου
|
||||
από εσάς). Παρόμοια Διανομή — Εάν αλλοιώσετε, τροποποιήσετε ή δημιουργήσετε
|
||||
περαιτέρω βασισμένοι στο έργο θα μπορείτε να διανέμετε το έργο που θα προκύψει
|
||||
μόνο με την ίδια ή παρόμοια άδεια.`
|
||||
|
||||
const txtNonASCII = txt_vn + txt_cn + txt_ru + txt_gr
|
||||
|
||||
// TODO: Improve ASCII performance.
|
||||
|
||||
func BenchmarkCasers(b *testing.B) {
|
||||
for _, s := range []struct{ name, text string }{
|
||||
{"ascii", txtASCII},
|
||||
{"nonASCII", txtNonASCII},
|
||||
{"short", "При"},
|
||||
} {
|
||||
src := []byte(s.text)
|
||||
// Measure case mappings in bytes package for comparison.
|
||||
for _, f := range []struct {
|
||||
name string
|
||||
fn func(b []byte) []byte
|
||||
}{
|
||||
{"lower", bytes.ToLower},
|
||||
{"title", bytes.ToTitle},
|
||||
{"upper", bytes.ToUpper},
|
||||
} {
|
||||
testtext.Bench(b, path.Join(s.name, "bytes", f.name), func(b *testing.B) {
|
||||
b.SetBytes(int64(len(src)))
|
||||
for i := 0; i < b.N; i++ {
|
||||
f.fn(src)
|
||||
}
|
||||
})
|
||||
}
|
||||
for _, t := range []struct {
|
||||
name string
|
||||
caser transform.SpanningTransformer
|
||||
}{
|
||||
{"fold/default", Fold()},
|
||||
{"upper/default", Upper(language.Und)},
|
||||
{"lower/sigma", Lower(language.Und)},
|
||||
{"lower/simple", Lower(language.Und, HandleFinalSigma(false))},
|
||||
{"title/sigma", Title(language.Und)},
|
||||
{"title/simple", Title(language.Und, HandleFinalSigma(false))},
|
||||
} {
|
||||
c := Caser{t.caser}
|
||||
dst := make([]byte, len(src))
|
||||
testtext.Bench(b, path.Join(s.name, t.name, "transform"), func(b *testing.B) {
|
||||
b.SetBytes(int64(len(src)))
|
||||
for i := 0; i < b.N; i++ {
|
||||
c.Reset()
|
||||
c.Transform(dst, src, true)
|
||||
}
|
||||
})
|
||||
// No need to check span for simple cases, as they will be the same
|
||||
// as sigma.
|
||||
if strings.HasSuffix(t.name, "/simple") {
|
||||
continue
|
||||
}
|
||||
spanSrc := c.Bytes(src)
|
||||
testtext.Bench(b, path.Join(s.name, t.name, "span"), func(b *testing.B) {
|
||||
c.Reset()
|
||||
if n, _ := c.Span(spanSrc, true); n < len(spanSrc) {
|
||||
b.Fatalf("spanner is not recognizing text %q as done (at %d)", spanSrc, n)
|
||||
}
|
||||
b.SetBytes(int64(len(spanSrc)))
|
||||
for i := 0; i < b.N; i++ {
|
||||
c.Reset()
|
||||
c.Span(spanSrc, true)
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
}
|
||||
606
vendor/golang.org/x/text/cases/tables.go
generated
vendored
File diff suppressed because it is too large
Load diff
1158
vendor/golang.org/x/text/cases/tables_test.go
generated
vendored
Normal file
File diff suppressed because it is too large
Load diff
35
vendor/golang.org/x/text/cmd/gotext/doc.go
generated
vendored
Normal file
|
|
@ -0,0 +1,35 @@
|
|||
// Copyright 2016 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
// DO NOT EDIT THIS FILE. GENERATED BY go generate.
|
||||
// Edit the documentation in other files and rerun go generate to generate this one.
|
||||
|
||||
// gotext is a tool for managing text in Go source code.
|
||||
//
|
||||
// Usage:
|
||||
//
|
||||
// gotext command [arguments]
|
||||
//
|
||||
// The commands are:
|
||||
//
|
||||
// extract extract strings to be translated from code
|
||||
//
|
||||
// Use "gotext help [command]" for more information about a command.
|
||||
//
|
||||
// Additional help topics:
|
||||
//
|
||||
//
|
||||
// Use "gotext help [topic]" for more information about that topic.
|
||||
//
|
||||
//
|
||||
// Extract strings to be translated from code
|
||||
//
|
||||
// Usage:
|
||||
//
|
||||
// gotext extract <package>*
|
||||
//
|
||||
//
|
||||
//
|
||||
//
|
||||
package main
|
||||
195
vendor/golang.org/x/text/cmd/gotext/extract.go
generated
vendored
Normal file
|
|
@ -0,0 +1,195 @@
|
|||
// Copyright 2016 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package main
|
||||
|
||||
import (
|
||||
"bytes"
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"go/ast"
|
||||
"go/build"
|
||||
"go/constant"
|
||||
"go/format"
|
||||
"go/parser"
|
||||
"go/types"
|
||||
"io/ioutil"
|
||||
"os"
|
||||
"path"
|
||||
"path/filepath"
|
||||
"strings"
|
||||
|
||||
"golang.org/x/tools/go/loader"
|
||||
)
|
||||
|
||||
// TODO:
|
||||
// - merge information into existing files
|
||||
// - handle different file formats (PO, XLIFF)
|
||||
// - handle features (gender, plural)
|
||||
// - message rewriting
|
||||
|
||||
var cmdExtract = &Command{
|
||||
Run: runExtract,
|
||||
UsageLine: "extract <package>*",
|
||||
Short: "extract strings to be translated from code",
|
||||
}
|
||||
|
||||
func runExtract(cmd *Command, args []string) error {
|
||||
if len(args) == 0 {
|
||||
args = []string{"."}
|
||||
}
|
||||
|
||||
conf := loader.Config{
|
||||
Build: &build.Default,
|
||||
ParserMode: parser.ParseComments,
|
||||
}
|
||||
|
||||
// Use the initial packages from the command line.
|
||||
args, err := conf.FromArgs(args, false)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
// Load, parse and type-check the whole program.
|
||||
iprog, err := conf.Load()
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
// print returns Go syntax for the specified node.
|
||||
print := func(n ast.Node) string {
|
||||
var buf bytes.Buffer
|
||||
format.Node(&buf, conf.Fset, n)
|
||||
return buf.String()
|
||||
}
|
||||
|
||||
var translations []Translation
|
||||
|
||||
for _, info := range iprog.InitialPackages() {
|
||||
for _, f := range info.Files {
|
||||
// Associate comments with nodes.
|
||||
cmap := ast.NewCommentMap(iprog.Fset, f, f.Comments)
|
||||
getComment := func(n ast.Node) string {
|
||||
cs := cmap.Filter(n).Comments()
|
||||
if len(cs) > 0 {
|
||||
return strings.TrimSpace(cs[0].Text())
|
||||
}
|
||||
return ""
|
||||
}
|
||||
|
||||
// Find function calls.
|
||||
ast.Inspect(f, func(n ast.Node) bool {
|
||||
call, ok := n.(*ast.CallExpr)
|
||||
if !ok {
|
||||
return true
|
||||
}
|
||||
|
||||
// Skip calls of functions other than
|
||||
// (*message.Printer).{Sp,Fp,P}rintf.
|
||||
sel, ok := call.Fun.(*ast.SelectorExpr)
|
||||
if !ok {
|
||||
return true
|
||||
}
|
||||
meth := info.Selections[sel]
|
||||
if meth == nil || meth.Kind() != types.MethodVal {
|
||||
return true
|
||||
}
|
||||
// TODO: remove cheap hack and check if the type either
|
||||
// implements some interface or is specifically of type
|
||||
// "golang.org/x/text/message".Printer.
|
||||
m, ok := extractFuncs[path.Base(meth.Recv().String())]
|
||||
if !ok {
|
||||
return true
|
||||
}
|
||||
|
||||
// argn is the index of the format string.
|
||||
argn, ok := m[meth.Obj().Name()]
|
||||
if !ok || argn >= len(call.Args) {
|
||||
return true
|
||||
}
|
||||
|
||||
// Skip calls with non-constant format string.
|
||||
fmtstr := info.Types[call.Args[argn]].Value
|
||||
if fmtstr == nil || fmtstr.Kind() != constant.String {
|
||||
return true
|
||||
}
|
||||
|
||||
posn := conf.Fset.Position(call.Lparen)
|
||||
filepos := fmt.Sprintf("%s:%d:%d", filepath.Base(posn.Filename), posn.Line, posn.Column)
|
||||
|
||||
// TODO: identify the type of the format argument. If it is not
|
||||
// a string, multiple keys may be defined.
|
||||
var key []string
|
||||
|
||||
// TODO: replace substitutions (%v) with a translator friendly
|
||||
// notation. For instance:
|
||||
// "%d files remaining" -> "{numFiles} files remaining", or
|
||||
// "%d files remaining" -> "{arg1} files remaining"
|
||||
// Alternatively, this could be done at a later stage.
|
||||
msg := constant.StringVal(fmtstr)
|
||||
|
||||
// Construct a Translation unit.
|
||||
c := Translation{
|
||||
Key: key,
|
||||
Position: filepath.Join(info.Pkg.Path(), filepos),
|
||||
Original: Text{Msg: msg},
|
||||
ExtractedComment: getComment(call.Args[0]),
|
||||
// TODO(fix): this doesn't get the before comment.
|
||||
// Comment: getComment(call),
|
||||
}
|
||||
|
||||
for i, arg := range call.Args[argn+1:] {
|
||||
var val string
|
||||
if v := info.Types[arg].Value; v != nil {
|
||||
val = v.ExactString()
|
||||
}
|
||||
posn := conf.Fset.Position(arg.Pos())
|
||||
filepos := fmt.Sprintf("%s:%d:%d", filepath.Base(posn.Filename), posn.Line, posn.Column)
|
||||
c.Args = append(c.Args, Argument{
|
||||
ID: i + 1,
|
||||
Type: info.Types[arg].Type.String(),
|
||||
UnderlyingType: info.Types[arg].Type.Underlying().String(),
|
||||
Expr: print(arg),
|
||||
Value: val,
|
||||
Comment: getComment(arg),
|
||||
Position: filepath.Join(info.Pkg.Path(), filepos),
|
||||
// TODO report whether it implements
|
||||
// interfaces plural.Interface,
|
||||
// gender.Interface.
|
||||
})
|
||||
}
|
||||
|
||||
translations = append(translations, c)
|
||||
return true
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
data, err := json.MarshalIndent(translations, "", " ")
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
for _, tag := range getLangs() {
|
||||
// TODO: merge with existing files, don't overwrite.
|
||||
os.MkdirAll(*dir, 0744)
|
||||
file := filepath.Join(*dir, fmt.Sprintf("gotext_%v.out.json", tag))
|
||||
if err := ioutil.WriteFile(file, data, 0744); err != nil {
|
||||
return fmt.Errorf("could not create file: %v", err)
|
||||
}
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
// extractFuncs indicates the types and methods for which to extract strings,
|
||||
// and which argument to extract.
|
||||
// TODO: use the types in conf.Import("golang.org/x/text/message") to extract
|
||||
// the correct instances.
|
||||
var extractFuncs = map[string]map[string]int{
|
||||
// TODO: Printer -> *golang.org/x/text/message.Printer
|
||||
"message.Printer": {
|
||||
"Printf": 0,
|
||||
"Sprintf": 0,
|
||||
"Fprintf": 1,
|
||||
},
|
||||
}
|
||||
356
vendor/golang.org/x/text/cmd/gotext/main.go
generated
vendored
Normal file
|
|
@ -0,0 +1,356 @@
|
|||
// Copyright 2016 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
//go:generate go build -o gotext.latest
|
||||
//go:generate ./gotext.latest help gendocumentation
|
||||
//go:generate rm gotext.latest
|
||||
|
||||
package main
|
||||
|
||||
import (
|
||||
"bufio"
|
||||
"bytes"
|
||||
"flag"
|
||||
"fmt"
|
||||
"go/build"
|
||||
"go/format"
|
||||
"io"
|
||||
"io/ioutil"
|
||||
"log"
|
||||
"os"
|
||||
"strings"
|
||||
"sync"
|
||||
"text/template"
|
||||
"unicode"
|
||||
"unicode/utf8"
|
||||
|
||||
"golang.org/x/text/language"
|
||||
"golang.org/x/tools/go/buildutil"
|
||||
)
|
||||
|
||||
func init() {
|
||||
flag.Var((*buildutil.TagsFlag)(&build.Default.BuildTags), "tags", buildutil.TagsFlagDoc)
|
||||
}
|
||||
|
||||
var (
|
||||
dir = flag.String("dir", "textdata", "default subdirectory to store translation files")
|
||||
langs = flag.String("lang", "en", "comma-separated list of languages to process")
|
||||
)
|
||||
|
||||
// NOTE: the Command struct is copied from the go tool in core.
|
||||
|
||||
// A Command is an implementation of a go command
|
||||
// like go build or go fix.
|
||||
type Command struct {
|
||||
// Run runs the command.
|
||||
// The args are the arguments after the command name.
|
||||
Run func(cmd *Command, args []string) error
|
||||
|
||||
// UsageLine is the one-line usage message.
|
||||
// The first word in the line is taken to be the command name.
|
||||
UsageLine string
|
||||
|
||||
// Short is the short description shown in the 'go help' output.
|
||||
Short string
|
||||
|
||||
// Long is the long message shown in the 'go help <this-command>' output.
|
||||
Long string
|
||||
|
||||
// Flag is a set of flags specific to this command.
|
||||
Flag flag.FlagSet
|
||||
}
|
||||
|
||||
// Name returns the command's name: the first word in the usage line.
|
||||
func (c *Command) Name() string {
|
||||
name := c.UsageLine
|
||||
i := strings.Index(name, " ")
|
||||
if i >= 0 {
|
||||
name = name[:i]
|
||||
}
|
||||
return name
|
||||
}
|
||||
|
||||
func (c *Command) Usage() {
|
||||
fmt.Fprintf(os.Stderr, "usage: %s\n\n", c.UsageLine)
|
||||
fmt.Fprintf(os.Stderr, "%s\n", strings.TrimSpace(c.Long))
|
||||
os.Exit(2)
|
||||
}
|
||||
|
||||
// Runnable reports whether the command can be run; otherwise
|
||||
// it is a documentation pseudo-command such as importpath.
|
||||
func (c *Command) Runnable() bool {
|
||||
return c.Run != nil
|
||||
}
|
||||
|
||||
// Commands lists the available commands and help topics.
|
||||
// The order here is the order in which they are printed by 'go help'.
|
||||
var commands = []*Command{
|
||||
cmdExtract,
|
||||
// TODO:
|
||||
// - generate code from translations.
|
||||
// - update: full-cycle update of extraction, sending, and integration
|
||||
// - report: report of freshness of translations
|
||||
}
|
||||
|
||||
var exitStatus = 0
|
||||
var exitMu sync.Mutex
|
||||
|
||||
func setExitStatus(n int) {
|
||||
exitMu.Lock()
|
||||
if exitStatus < n {
|
||||
exitStatus = n
|
||||
}
|
||||
exitMu.Unlock()
|
||||
}
|
||||
|
||||
var origEnv []string
|
||||
|
||||
func main() {
|
||||
flag.Usage = usage
|
||||
flag.Parse()
|
||||
log.SetFlags(0)
|
||||
|
||||
args := flag.Args()
|
||||
if len(args) < 1 {
|
||||
usage()
|
||||
}
|
||||
|
||||
if args[0] == "help" {
|
||||
help(args[1:])
|
||||
return
|
||||
}
|
||||
|
||||
for _, cmd := range commands {
|
||||
if cmd.Name() == args[0] && cmd.Runnable() {
|
||||
cmd.Flag.Usage = func() { cmd.Usage() }
|
||||
cmd.Flag.Parse(args[1:])
|
||||
args = cmd.Flag.Args()
|
||||
if err := cmd.Run(cmd, args); err != nil {
|
||||
fatalf("gotext: %v", err)
|
||||
}
|
||||
exit()
|
||||
return
|
||||
}
|
||||
}
|
||||
|
||||
fmt.Fprintf(os.Stderr, "gotext: unknown subcommand %q\nRun 'gotext help' for usage.\n", args[0])
|
||||
setExitStatus(2)
|
||||
exit()
|
||||
}
|
||||
|
||||
var usageTemplate = `gotext is a tool for managing text in Go source code.
|
||||
|
||||
Usage:
|
||||
|
||||
gotext command [arguments]
|
||||
|
||||
The commands are:
|
||||
{{range .}}{{if .Runnable}}
|
||||
{{.Name | printf "%-11s"}} {{.Short}}{{end}}{{end}}
|
||||
|
||||
Use "gotext help [command]" for more information about a command.
|
||||
|
||||
Additional help topics:
|
||||
{{range .}}{{if not .Runnable}}
|
||||
{{.Name | printf "%-11s"}} {{.Short}}{{end}}{{end}}
|
||||
|
||||
Use "gotext help [topic]" for more information about that topic.
|
||||
|
||||
`
|
||||
|
||||
var helpTemplate = `{{if .Runnable}}usage: gotext {{.UsageLine}}
|
||||
|
||||
{{end}}{{.Long | trim}}
|
||||
`
|
||||
|
||||
var documentationTemplate = `{{range .}}{{if .Short}}{{.Short | capitalize}}
|
||||
|
||||
{{end}}{{if .Runnable}}Usage:
|
||||
|
||||
gotext {{.UsageLine}}
|
||||
|
||||
{{end}}{{.Long | trim}}
|
||||
|
||||
|
||||
{{end}}`
|
||||
|
||||
// commentWriter writes a Go comment to the underlying io.Writer,
|
||||
// using line comment form (//).
|
||||
type commentWriter struct {
|
||||
W io.Writer
|
||||
wroteSlashes bool // Wrote "//" at the beginning of the current line.
|
||||
}
|
||||
|
||||
func (c *commentWriter) Write(p []byte) (int, error) {
|
||||
var n int
|
||||
for i, b := range p {
|
||||
if !c.wroteSlashes {
|
||||
s := "//"
|
||||
if b != '\n' {
|
||||
s = "// "
|
||||
}
|
||||
if _, err := io.WriteString(c.W, s); err != nil {
|
||||
return n, err
|
||||
}
|
||||
c.wroteSlashes = true
|
||||
}
|
||||
n0, err := c.W.Write(p[i : i+1])
|
||||
n += n0
|
||||
if err != nil {
|
||||
return n, err
|
||||
}
|
||||
if b == '\n' {
|
||||
c.wroteSlashes = false
|
||||
}
|
||||
}
|
||||
return len(p), nil
|
||||
}
|
||||
|
||||
// An errWriter wraps a writer, recording whether a write error occurred.
|
||||
type errWriter struct {
|
||||
w io.Writer
|
||||
err error
|
||||
}
|
||||
|
||||
func (w *errWriter) Write(b []byte) (int, error) {
|
||||
n, err := w.w.Write(b)
|
||||
if err != nil {
|
||||
w.err = err
|
||||
}
|
||||
return n, err
|
||||
}
|
||||
|
||||
// tmpl executes the given template text on data, writing the result to w.
|
||||
func tmpl(w io.Writer, text string, data interface{}) {
|
||||
t := template.New("top")
|
||||
t.Funcs(template.FuncMap{"trim": strings.TrimSpace, "capitalize": capitalize})
|
||||
template.Must(t.Parse(text))
|
||||
ew := &errWriter{w: w}
|
||||
err := t.Execute(ew, data)
|
||||
if ew.err != nil {
|
||||
// I/O error writing. Ignore write on closed pipe.
|
||||
if strings.Contains(ew.err.Error(), "pipe") {
|
||||
os.Exit(1)
|
||||
}
|
||||
fatalf("writing output: %v", ew.err)
|
||||
}
|
||||
if err != nil {
|
||||
panic(err)
|
||||
}
|
||||
}
|
||||
|
||||
func capitalize(s string) string {
|
||||
if s == "" {
|
||||
return s
|
||||
}
|
||||
r, n := utf8.DecodeRuneInString(s)
|
||||
return string(unicode.ToTitle(r)) + s[n:]
|
||||
}
|
||||
|
||||
func printUsage(w io.Writer) {
|
||||
bw := bufio.NewWriter(w)
|
||||
tmpl(bw, usageTemplate, commands)
|
||||
bw.Flush()
|
||||
}
|
||||
|
||||
func usage() {
|
||||
printUsage(os.Stderr)
|
||||
os.Exit(2)
|
||||
}
|
||||
|
||||
// help implements the 'help' command.
|
||||
func help(args []string) {
|
||||
if len(args) == 0 {
|
||||
printUsage(os.Stdout)
|
||||
// not exit 2: succeeded at 'go help'.
|
||||
return
|
||||
}
|
||||
if len(args) != 1 {
|
||||
fmt.Fprintf(os.Stderr, "usage: gotext help command\n\nToo many arguments given.\n")
|
||||
os.Exit(2) // failed at 'go help'
|
||||
}
|
||||
|
||||
arg := args[0]
|
||||
|
||||
// 'go help documentation' generates doc.go.
|
||||
if strings.HasSuffix(arg, "documentation") {
|
||||
w := &bytes.Buffer{}
|
||||
|
||||
fmt.Fprintln(w, "// Copyright 2016 The Go Authors. All rights reserved.")
|
||||
fmt.Fprintln(w, "// Use of this source code is governed by a BSD-style")
|
||||
fmt.Fprintln(w, "// license that can be found in the LICENSE file.")
|
||||
fmt.Fprintln(w)
|
||||
fmt.Fprintln(w, "// DO NOT EDIT THIS FILE. GENERATED BY go generate.")
|
||||
fmt.Fprintln(w, "// Edit the documentation in other files and rerun go generate to generate this one.")
|
||||
fmt.Fprintln(w)
|
||||
buf := new(bytes.Buffer)
|
||||
printUsage(buf)
|
||||
usage := &Command{Long: buf.String()}
|
||||
tmpl(&commentWriter{W: w}, documentationTemplate, append([]*Command{usage}, commands...))
|
||||
fmt.Fprintln(w, "package main")
|
||||
if arg == "gendocumentation" {
|
||||
b, err := format.Source(w.Bytes())
|
||||
if err != nil {
|
||||
errorf("Could not format generated docs: %v\n", err)
|
||||
}
|
||||
if err := ioutil.WriteFile("doc.go", b, 0666); err != nil {
|
||||
errorf("Could not create file doc.go: %v\n", err)
|
||||
}
|
||||
} else {
|
||||
fmt.Println(w.String())
|
||||
}
|
||||
return
|
||||
}
|
||||
|
||||
for _, cmd := range commands {
|
||||
if cmd.Name() == arg {
|
||||
tmpl(os.Stdout, helpTemplate, cmd)
|
||||
// not exit 2: succeeded at 'go help cmd'.
|
||||
return
|
||||
}
|
||||
}
|
||||
|
||||
fmt.Fprintf(os.Stderr, "Unknown help topic %#q. Run 'gotext help'.\n", arg)
|
||||
os.Exit(2) // failed at 'go help cmd'
|
||||
}
|
||||
|
||||
func getLangs() (tags []language.Tag) {
|
||||
for _, t := range strings.Split(*langs, ",") {
|
||||
tag, err := language.Parse(t)
|
||||
if err != nil {
|
||||
fatalf("gotext: could not parse language %q: %v", t, err)
|
||||
}
|
||||
tags = append(tags, tag)
|
||||
}
|
||||
return tags
|
||||
}
|
||||
|
||||
var atexitFuncs []func()
|
||||
|
||||
func atexit(f func()) {
|
||||
atexitFuncs = append(atexitFuncs, f)
|
||||
}
|
||||
|
||||
func exit() {
|
||||
for _, f := range atexitFuncs {
|
||||
f()
|
||||
}
|
||||
os.Exit(exitStatus)
|
||||
}
|
||||
|
||||
func fatalf(format string, args ...interface{}) {
|
||||
errorf(format, args...)
|
||||
exit()
|
||||
}
|
||||
|
||||
func errorf(format string, args ...interface{}) {
|
||||
log.Printf(format, args...)
|
||||
setExitStatus(1)
|
||||
}
|
||||
|
||||
func exitIfErrors() {
|
||||
if exitStatus != 0 {
|
||||
exit()
|
||||
}
|
||||
}
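The way commentWriter turns help text into a doc comment is easiest to see on a small input. The sketch below is a simplified stand-in written for illustration, not the vendored type: `lineCommenter` and `commentOut` are hypothetical names, and the error handling is reduced to reporting how many input bytes were consumed.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// lineCommenter prefixes every line written through it with "//".
// It is a hypothetical stand-in for commentWriter.
type lineCommenter struct {
	w       io.Writer
	midLine bool // true once the prefix for the current line has been written
}

func (c *lineCommenter) Write(p []byte) (int, error) {
	for i, b := range p {
		if !c.midLine {
			prefix := "// "
			if b == '\n' {
				prefix = "//" // no trailing space on otherwise empty lines
			}
			if _, err := io.WriteString(c.w, prefix); err != nil {
				return i, err
			}
			c.midLine = true
		}
		if _, err := c.w.Write(p[i : i+1]); err != nil {
			return i, err
		}
		if b == '\n' {
			c.midLine = false
		}
	}
	return len(p), nil
}

// commentOut returns text with each line turned into a line comment.
func commentOut(text string) string {
	var sb strings.Builder
	io.WriteString(&lineCommenter{w: &sb}, text)
	return sb.String()
}

func main() {
	fmt.Print(commentOut("Usage:\n\n\tgotext command [arguments]\n"))
}
```

Keeping the "start of line" state in the writer rather than splitting the input on newlines lets the prefix logic work even when a line arrives across several Write calls, which is how template execution tends to deliver output.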
|
||||
127
vendor/golang.org/x/text/cmd/gotext/message.go
generated
vendored
Normal file
|
|
@ -0,0 +1,127 @@
|
|||
// Copyright 2016 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package main
|
||||
|
||||
// TODO: these definitions should be moved to a package so that they can be used
|
||||
// by other tools.
|
||||
|
||||
// The file contains the structures used to define translations of a certain
|
||||
// message.
|
||||
//
|
||||
// A translation may have multiple translation strings, or messages, depending
|
||||
// on the feature values of the various arguments. For instance, consider
|
||||
// a hypothetical translation from English to English, where the source defines
|
||||
// the format string "%d file(s) remaining". A completed translation, expressed
|
||||
// in JS, for this format string could look like:
|
||||
//
|
||||
// {
|
||||
// "Key": [
|
||||
// "\"%d file(s) remaining\""
|
||||
// ],
|
||||
// "Original": {
|
||||
// "Msg": "\"%d file(s) remaining\""
|
||||
// },
|
||||
// "Translation": {
|
||||
// "Select": {
|
||||
// "Feature": "plural",
|
||||
// "Arg": 1,
|
||||
// "Cases": {
|
||||
// "one": { "Msg": "1 file remaining" },
|
||||
// "other": { "Msg": "%d files remaining" }
|
||||
// },
|
||||
// },
|
||||
// },
|
||||
// "Args": [
|
||||
// {
|
||||
// "ID": 1,
|
||||
// "Type": "int",
|
||||
// "UnderlyingType": "int",
|
||||
// "Expr": "nFiles",
|
||||
// "Comment": "number of files remaining",
|
||||
// "Position": "golang.org/x/text/cmd/gotext/demo.go:34:3"
|
||||
// }
|
||||
// ],
|
||||
// "Position": "golang.org/x/text/cmd/gotext/demo.go:33:10",
|
||||
// }
|
||||
//
|
||||
// Alternatively, the Translation section could be written as:
|
||||
//
|
||||
// "Translation": {
|
||||
// "Msg": "%d %[files]s remaining",
|
||||
// "Var": {
|
||||
// "files" : {
|
||||
// "Select": {
|
||||
// "Feature": "plural",
|
||||
// "Arg": 1,
|
||||
// "Cases": {
|
||||
// "one": { "Msg": "file" },
|
||||
// "other": { "Msg": "files" }
|
||||
// }
|
||||
// }
|
||||
// }
|
||||
// }
|
||||
// }
|
||||
|
||||
// A Translation describes a translation for a single language for a single
|
||||
// message.
|
||||
type Translation struct {
|
||||
// Key contains a list of identifiers for the message. If this list is empty
|
||||
// Original is used as the key.
|
||||
Key []string `json:"key,omitempty"`
|
||||
Original Text `json:"original"`
|
||||
Translation Text `json:"translation"`
|
||||
ExtractedComment string `json:"extractedComment,omitempty"`
|
||||
TranslatorComment string `json:"translatorComment,omitempty"`
|
||||
|
||||
Args []Argument `json:"args,omitempty"`
|
||||
|
||||
// Extraction information.
|
||||
Position string `json:"position,omitempty"` // filePosition:line
|
||||
}
|
||||
|
||||
// An Argument contains information about the arguments passed to a message.
|
||||
type Argument struct {
|
||||
ID interface{} `json:"id"` // An int for printf-style calls, but could be a string.
|
||||
Type string `json:"type"`
|
||||
UnderlyingType string `json:"underlyingType"`
|
||||
Expr string `json:"expr"`
|
||||
Value string `json:"value,omitempty"`
|
||||
Comment string `json:"comment,omitempty"`
|
||||
Position string `json:"position,omitempty"`
|
||||
|
||||
// Features contains the features that are available for the implementation
|
||||
// of this argument.
|
||||
Features []Feature `json:"features,omitempty"`
|
||||
}
|
||||
|
||||
// Feature holds information about a feature that can be implemented by
|
||||
// an Argument.
|
||||
type Feature struct {
|
||||
Type string `json:"type"` // Right now this is only gender and plural.
|
||||
|
||||
// TODO: possible values and examples for the language under consideration.
|
||||
|
||||
}
|
||||
|
||||
// Text defines a message to be displayed.
|
||||
type Text struct {
|
||||
// Msg and Select contain the message to be displayed. Within a Text value
|
||||
// either Msg or Select is defined.
|
||||
Msg string `json:"msg,omitempty"`
|
||||
Select *Select `json:"select,omitempty"`
|
||||
// Var defines a map of variables that may be substituted in the selected
|
||||
// message.
|
||||
Var map[string]Text `json:"var,omitempty"`
|
||||
// Example contains an example message formatted with default values.
|
||||
Example string `json:"example,omitempty"`
|
||||
}
|
||||
|
||||
// A Select selects a Text based on the feature value associated with
|
||||
// a feature of a certain argument.
|
||||
type Select struct {
|
||||
Feature string `json:"feature"` // Name of variable or Feature type
|
||||
Arg interface{} `json:"arg"` // The argument ID.
|
||||
Cases map[string]Text `json:"cases"`
|
||||
}
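Since the struct tags use lower-case names, the file written by the extract command uses keys such as "select" and "cases" rather than the Go field names shown in the comment at the top of this file. The following sketch, under the assumption that only Msg and Select matter for the illustration, copies trimmed versions of Text and Select and shows the JSON they encode to; `pluralText` and `encode` are hypothetical helpers, not part of gotext.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Text and Select are trimmed copies of the types in message.go, keeping
// only the fields needed for this illustration.
type Text struct {
	Msg    string  `json:"msg,omitempty"`
	Select *Select `json:"select,omitempty"`
}

type Select struct {
	Feature string          `json:"feature"`
	Arg     interface{}     `json:"arg"`
	Cases   map[string]Text `json:"cases"`
}

// pluralText builds a Text that picks a message based on the plural
// category of the argument with the given ID.
func pluralText(arg int, one, other string) Text {
	return Text{Select: &Select{
		Feature: "plural",
		Arg:     arg,
		Cases: map[string]Text{
			"one":   {Msg: one},
			"other": {Msg: other},
		},
	}}
}

// encode returns the compact JSON encoding of t.
func encode(t Text) string {
	b, err := json.Marshal(t)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(encode(pluralText(1, "1 file remaining", "%d files remaining")))
	// Output:
	// {"select":{"feature":"plural","arg":1,"cases":{"one":{"msg":"1 file remaining"},"other":{"msg":"%d files remaining"}}}}
}
```

The omitempty tag on Msg is what keeps a Text that only carries a Select from emitting an empty "msg" key.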
|
||||
1
vendor/golang.org/x/text/codereview.cfg
generated
vendored
Normal file
|
|
@ -0,0 +1 @@
|
|||
issuerepo: golang/go
|
||||
702
vendor/golang.org/x/text/collate/build/builder.go
generated
vendored
Normal file
|
|
@ -0,0 +1,702 @@
|
|||
// Copyright 2012 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package build // import "golang.org/x/text/collate/build"
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"io"
|
||||
"log"
|
||||
"sort"
|
||||
"strings"
|
||||
"unicode/utf8"
|
||||
|
||||
"golang.org/x/text/internal/colltab"
|
||||
"golang.org/x/text/language"
|
||||
"golang.org/x/text/unicode/norm"
|
||||
)
|
||||
|
||||
// TODO: optimizations:
|
||||
// - expandElem is currently 20K. By putting unique colElems in a separate
|
||||
// table and having a byte array of indexes into this table, we can reduce
|
||||
// the total size to about 7K. By also factoring out the length bytes, we
|
||||
// can reduce this to about 6K.
|
||||
// - trie valueBlocks are currently 100K. There are a lot of sparse blocks
|
||||
// and many consecutive values with the same stride. This can be further
|
||||
// compacted.
|
||||
// - Compress secondary weights into 8 bits.
|
||||
// - Some LDML specs specify a context element. Currently we simply concatenate
|
||||
// those. Context can be implemented using the contraction trie. If Builder
|
||||
// could analyze and detect when using a context makes sense, there is no
|
||||
// need to expose this construct in the API.
|
||||
|
||||
// A Builder builds a root collation table. The user must specify the
|
||||
// collation elements for each entry. A common use will be to base the weights
|
||||
// on those specified in the allkeys* file as provided by the UCA or CLDR.
|
||||
type Builder struct {
|
||||
index *trieBuilder
|
||||
root ordering
|
||||
locale []*Tailoring
|
||||
t *table
|
||||
err error
|
||||
built bool
|
||||
|
||||
minNonVar int // lowest primary recorded for a non-variable
|
||||
varTop int // highest primary recorded for a variable
|
||||
|
||||
// indexes used for reusing expansions and contractions
|
||||
expIndex map[string]int // positions of expansions keyed by their string representation
|
||||
ctHandle map[string]ctHandle // contraction handles keyed by a concatenation of the suffixes
|
||||
ctElem map[string]int // contraction elements keyed by their string representation
|
||||
}
|
||||
|
||||
// A Tailoring builds a collation table based on another collation table.
|
||||
// The table is defined by specifying tailorings to the underlying table.
|
||||
// See http://unicode.org/reports/tr35/ for an overview of tailoring
|
||||
// collation tables. The CLDR contains pre-defined tailorings for a variety
|
||||
// of languages (See http://www.unicode.org/Public/cldr/<version>/core.zip.)
|
||||
type Tailoring struct {
|
||||
id string
|
||||
builder *Builder
|
||||
index *ordering
|
||||
|
||||
anchor *entry
|
||||
before bool
|
||||
}
|
||||
|
||||
// NewBuilder returns a new Builder.
|
||||
func NewBuilder() *Builder {
|
||||
return &Builder{
|
||||
index: newTrieBuilder(),
|
||||
root: makeRootOrdering(),
|
||||
expIndex: make(map[string]int),
|
||||
ctHandle: make(map[string]ctHandle),
|
||||
ctElem: make(map[string]int),
|
||||
}
|
||||
}
|
||||
|
||||
// Tailoring returns a Tailoring for the given locale. One should
|
||||
// have completed all calls to Add before calling Tailoring.
|
||||
func (b *Builder) Tailoring(loc language.Tag) *Tailoring {
|
||||
t := &Tailoring{
|
||||
id: loc.String(),
|
||||
builder: b,
|
||||
index: b.root.clone(),
|
||||
}
|
||||
t.index.id = t.id
|
||||
b.locale = append(b.locale, t)
|
||||
return t
|
||||
}
|
||||
|
||||
// Add adds an entry to the collation element table, mapping
|
||||
// a slice of runes to a sequence of collation elements.
|
||||
// A collation element is specified as a list of weights: []int{primary, secondary, ...}.
|
||||
// The entries are typically obtained from a collation element table
|
||||
// as defined in http://www.unicode.org/reports/tr10/#Data_Table_Format.
|
||||
// Note that the collation elements specified by colelems are only used
|
||||
// as a guide. The actual weights generated by Builder may differ.
|
||||
// The argument variables is a list of indices into colelems that should contain
|
||||
// a value for each colelem that is a variable. (See the reference above.)
|
||||
func (b *Builder) Add(runes []rune, colelems [][]int, variables []int) error {
|
||||
str := string(runes)
|
||||
elems := make([]rawCE, len(colelems))
|
||||
for i, ce := range colelems {
|
||||
if len(ce) == 0 {
|
||||
break
|
||||
}
|
||||
elems[i] = makeRawCE(ce, 0)
|
||||
if len(ce) == 1 {
|
||||
elems[i].w[1] = defaultSecondary
|
||||
}
|
||||
if len(ce) <= 2 {
|
||||
elems[i].w[2] = defaultTertiary
|
||||
}
|
||||
if len(ce) <= 3 {
|
||||
elems[i].w[3] = ce[0]
|
||||
}
|
||||
}
|
||||
for i, ce := range elems {
|
||||
p := ce.w[0]
|
||||
isvar := false
|
||||
for _, j := range variables {
|
||||
if i == j {
|
||||
isvar = true
|
||||
}
|
||||
}
|
||||
if isvar {
|
||||
if p >= b.minNonVar && b.minNonVar > 0 {
|
||||
return fmt.Errorf("primary value %X of variable is larger than the smallest non-variable %X", p, b.minNonVar)
|
||||
}
|
||||
if p > b.varTop {
|
||||
b.varTop = p
|
||||
}
|
||||
} else if p > 1 { // 1 is a special primary value reserved for FFFE
|
||||
if p <= b.varTop {
|
||||
return fmt.Errorf("primary value %X of non-variable is smaller than the highest variable %X", p, b.varTop)
|
||||
}
|
||||
if b.minNonVar == 0 || p < b.minNonVar {
|
||||
b.minNonVar = p
|
||||
}
|
||||
}
|
||||
}
|
||||
elems, err := convertLargeWeights(elems)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
cccs := []uint8{}
|
||||
nfd := norm.NFD.String(str)
|
||||
for i := range nfd {
|
||||
cccs = append(cccs, norm.NFD.PropertiesString(nfd[i:]).CCC())
|
||||
}
|
||||
if len(cccs) < len(elems) {
|
||||
if len(cccs) > 2 {
|
||||
return fmt.Errorf("number of decomposed characters should be greater or equal to the number of collation elements for len(colelems) > 3 (%d < %d)", len(cccs), len(elems))
|
||||
}
|
||||
p := len(elems) - 1
|
||||
for ; p > 0 && elems[p].w[0] == 0; p-- {
|
||||
elems[p].ccc = cccs[len(cccs)-1]
|
||||
}
|
||||
for ; p >= 0; p-- {
|
||||
elems[p].ccc = cccs[0]
|
||||
}
|
||||
} else {
|
||||
for i := range elems {
|
||||
elems[i].ccc = cccs[i]
|
||||
}
|
||||
}
|
||||
// doNorm in collate.go assumes that the following conditions hold.
|
||||
if len(elems) > 1 && len(cccs) > 1 && cccs[0] != 0 && cccs[0] != cccs[len(cccs)-1] {
|
||||
return fmt.Errorf("incompatible CCC values for expansion %X (%d)", runes, cccs)
|
||||
}
|
||||
b.root.newEntry(str, elems)
|
||||
return nil
|
||||
}
|
||||
|
||||
func (t *Tailoring) setAnchor(anchor string) error {
|
||||
anchor = norm.NFC.String(anchor)
|
||||
a := t.index.find(anchor)
|
||||
if a == nil {
|
||||
a = t.index.newEntry(anchor, nil)
|
||||
a.implicit = true
|
||||
a.modified = true
|
||||
for _, r := range []rune(anchor) {
|
||||
e := t.index.find(string(r))
|
||||
e.lock = true
|
||||
}
|
||||
}
|
||||
t.anchor = a
|
||||
return nil
|
||||
}
|
||||
|
||||
// SetAnchor sets the point after which elements passed in subsequent calls to
|
||||
// Insert will be inserted. It is equivalent to the reset directive in an LDML
|
||||
// specification. See Insert for an example.
|
||||
// SetAnchor supports the following logical reset positions:
|
||||
// <first_tertiary_ignorable/>, <last_tertiary_ignorable/>, <first_primary_ignorable/>,
|
||||
// and <last_non_ignorable/>.
|
||||
func (t *Tailoring) SetAnchor(anchor string) error {
|
||||
if err := t.setAnchor(anchor); err != nil {
|
||||
return err
|
||||
}
|
||||
t.before = false
|
||||
return nil
|
||||
}
|
||||
|
||||
// SetAnchorBefore is similar to SetAnchor, except that subsequent calls to
|
||||
// Insert will insert entries before the anchor.
|
||||
func (t *Tailoring) SetAnchorBefore(anchor string) error {
|
||||
if err := t.setAnchor(anchor); err != nil {
|
||||
return err
|
||||
}
|
||||
t.before = true
|
||||
return nil
|
||||
}
|
||||
|
||||
// Insert sets the ordering of str relative to the entry set by the previous
|
||||
// call to SetAnchor or Insert. The argument extend corresponds
|
||||
// to the extend elements as defined in LDML. A non-empty value for extend
|
||||
// will cause the collation elements corresponding to extend to be appended
|
||||
// to the collation elements generated for the entry added by Insert.
|
||||
// This has the same net effect as sorting str after the string anchor+extend.
|
||||
// See http://www.unicode.org/reports/tr10/#Tailoring_Example for details
|
||||
// on parametric tailoring and http://unicode.org/reports/tr35/#Collation_Elements
|
||||
// for full details on LDML.
|
||||
//
|
||||
// Examples: create a tailoring for Swedish, where "ä" is ordered after "z"
|
||||
// at the primary sorting level:
|
||||
// t := b.Tailoring("sv")
|
||||
// t.SetAnchor("z")
|
||||
// t.Insert(colltab.Primary, "ä", "")
|
||||
// Order "ü" after "ue" at the secondary sorting level:
|
||||
// t.SetAnchor("ue")
|
||||
// t.Insert(colltab.Secondary, "ü","")
|
||||
// or
|
||||
// t.SetAnchor("u")
|
||||
// t.Insert(colltab.Secondary, "ü", "e")
|
||||
// Order "q" after "ab" at the secondary level and "Q" after "q"
|
||||
// at the tertiary level:
|
||||
// t.SetAnchor("ab")
|
||||
// t.Insert(colltab.Secondary, "q", "")
|
||||
// t.Insert(colltab.Tertiary, "Q", "")
|
||||
// Order "b" before "a":
|
||||
// t.SetAnchorBefore("a")
|
||||
// t.Insert(colltab.Primary, "b", "")
|
||||
// Order "0" after the last primary ignorable:
|
||||
// t.SetAnchor("<last_primary_ignorable/>")
|
||||
// t.Insert(colltab.Primary, "0", "")
|
||||
func (t *Tailoring) Insert(level colltab.Level, str, extend string) error {
|
||||
if t.anchor == nil {
|
||||
return fmt.Errorf("%s:Insert: no anchor point set for tailoring of %s", t.id, str)
|
||||
}
|
||||
str = norm.NFC.String(str)
|
||||
e := t.index.find(str)
|
||||
if e == nil {
|
||||
e = t.index.newEntry(str, nil)
|
||||
} else if e.logical != noAnchor {
|
||||
return fmt.Errorf("%s:Insert: cannot reinsert logical reset position %q", t.id, e.str)
|
||||
}
|
||||
if e.lock {
|
||||
return fmt.Errorf("%s:Insert: cannot reinsert element %q", t.id, e.str)
|
||||
}
|
||||
a := t.anchor
|
||||
// Find the first element after the anchor which differs at a level smaller or
|
||||
// equal to the given level. Then insert at this position.
|
||||
// See http://unicode.org/reports/tr35/#Collation_Elements, Section 5.14.5 for details.
|
||||
e.before = t.before
|
||||
if t.before {
|
||||
t.before = false
|
||||
if a.prev == nil {
|
||||
a.insertBefore(e)
|
||||
} else {
|
||||
for a = a.prev; a.level > level; a = a.prev {
|
||||
}
|
||||
a.insertAfter(e)
|
||||
}
|
||||
e.level = level
|
||||
} else {
|
||||
for ; a.level > level; a = a.next {
|
||||
}
|
||||
e.level = a.level
|
||||
if a != e {
|
||||
a.insertAfter(e)
|
||||
a.level = level
|
||||
} else {
|
||||
// We don't set a to prev itself. This has the effect of the entry
|
||||
// getting new collation elements that are an increment of itself.
|
||||
// This is intentional.
|
||||
a.prev.level = level
|
||||
}
|
||||
}
|
||||
e.extend = norm.NFD.String(extend)
|
||||
e.exclude = false
|
||||
e.modified = true
|
||||
e.elems = nil
|
||||
t.anchor = e
|
||||
return nil
|
||||
}
|
||||
|
||||
func (o *ordering) getWeight(e *entry) []rawCE {
|
||||
if len(e.elems) == 0 && e.logical == noAnchor {
|
||||
if e.implicit {
|
||||
for _, r := range e.runes {
|
||||
e.elems = append(e.elems, o.getWeight(o.find(string(r)))...)
|
||||
}
|
||||
} else if e.before {
|
||||
count := [colltab.Identity + 1]int{}
|
||||
a := e
|
||||
for ; a.elems == nil && !a.implicit; a = a.next {
|
||||
count[a.level]++
|
||||
}
|
||||
e.elems = []rawCE{makeRawCE(a.elems[0].w, a.elems[0].ccc)}
|
||||
for i := colltab.Primary; i < colltab.Quaternary; i++ {
|
||||
if count[i] != 0 {
|
||||
e.elems[0].w[i] -= count[i]
|
||||
break
|
||||
}
|
||||
}
|
||||
if e.prev != nil {
|
||||
o.verifyWeights(e.prev, e, e.prev.level)
|
||||
}
|
||||
} else {
|
||||
prev := e.prev
|
||||
e.elems = nextWeight(prev.level, o.getWeight(prev))
|
||||
o.verifyWeights(e, e.next, e.level)
|
||||
}
|
||||
}
|
||||
return e.elems
|
||||
}
|
||||
|
||||
func (o *ordering) addExtension(e *entry) {
|
||||
if ex := o.find(e.extend); ex != nil {
|
||||
e.elems = append(e.elems, ex.elems...)
|
||||
} else {
|
||||
for _, r := range []rune(e.extend) {
|
||||
e.elems = append(e.elems, o.find(string(r)).elems...)
|
||||
}
|
||||
}
|
||||
e.extend = ""
|
||||
}
|
||||
|
||||
func (o *ordering) verifyWeights(a, b *entry, level colltab.Level) error {
|
||||
if level == colltab.Identity || b == nil || b.elems == nil || a.elems == nil {
|
||||
return nil
|
||||
}
|
||||
for i := colltab.Primary; i < level; i++ {
|
||||
if a.elems[0].w[i] < b.elems[0].w[i] {
|
||||
return nil
|
||||
}
|
||||
}
|
||||
if a.elems[0].w[level] >= b.elems[0].w[level] {
|
||||
err := fmt.Errorf("%s:overflow: collation elements of %q (%X) overflows those of %q (%X) at level %d (%X >= %X)", o.id, a.str, a.runes, b.str, b.runes, level, a.elems, b.elems)
|
||||
log.Println(err)
|
||||
// TODO: return the error instead, or better, fix the conflicting entry by making room.
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
func (b *Builder) error(e error) {
|
||||
if e != nil {
|
||||
b.err = e
|
||||
}
|
||||
}
|
||||
|
||||
func (b *Builder) errorID(locale string, e error) {
|
||||
if e != nil {
|
||||
b.err = fmt.Errorf("%s:%v", locale, e)
|
||||
}
|
||||
}
|
||||
|
||||
// patchNorm ensures that NFC and NFD counterparts are consistent.
|
||||
func (o *ordering) patchNorm() {
|
||||
// Insert the NFD counterparts, if necessary.
|
||||
for _, e := range o.ordered {
|
||||
nfd := norm.NFD.String(e.str)
|
||||
if nfd != e.str {
|
||||
if e0 := o.find(nfd); e0 != nil && !e0.modified {
|
||||
e0.elems = e.elems
|
||||
} else if e.modified && !equalCEArrays(o.genColElems(nfd), e.elems) {
|
||||
e := o.newEntry(nfd, e.elems)
|
||||
e.modified = true
|
||||
}
|
||||
}
|
||||
}
|
||||
// Update unchanged composed forms if one of their parts changed.
|
||||
for _, e := range o.ordered {
|
||||
nfd := norm.NFD.String(e.str)
|
||||
if e.modified || nfd == e.str {
|
||||
continue
|
||||
}
|
||||
if e0 := o.find(nfd); e0 != nil {
|
||||
e.elems = e0.elems
|
||||
} else {
|
||||
e.elems = o.genColElems(nfd)
|
||||
if norm.NFD.LastBoundary([]byte(nfd)) == 0 {
|
||||
r := []rune(nfd)
|
||||
head := string(r[0])
|
||||
tail := ""
|
||||
for i := 1; i < len(r); i++ {
|
||||
s := norm.NFC.String(head + string(r[i]))
|
||||
if e0 := o.find(s); e0 != nil && e0.modified {
|
||||
head = s
|
||||
} else {
|
||||
tail += string(r[i])
|
||||
}
|
||||
}
|
||||
e.elems = append(o.genColElems(head), o.genColElems(tail)...)
|
||||
}
|
||||
}
|
||||
}
|
||||
// Exclude entries for which the individual runes generate the same collation elements.
|
||||
for _, e := range o.ordered {
|
||||
if len(e.runes) > 1 && equalCEArrays(o.genColElems(e.str), e.elems) {
|
||||
e.exclude = true
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func (b *Builder) buildOrdering(o *ordering) {
|
||||
for _, e := range o.ordered {
|
||||
o.getWeight(e)
|
||||
}
|
||||
for _, e := range o.ordered {
|
||||
o.addExtension(e)
|
||||
}
|
||||
o.patchNorm()
|
||||
o.sort()
|
||||
simplify(o)
|
||||
b.processExpansions(o) // requires simplify
|
||||
b.processContractions(o) // requires simplify
|
||||
|
||||
t := newNode()
|
||||
for e := o.front(); e != nil; e, _ = e.nextIndexed() {
|
||||
if !e.skip() {
|
||||
ce, err := e.encode()
|
||||
b.errorID(o.id, err)
|
||||
t.insert(e.runes[0], ce)
|
||||
}
|
||||
}
|
||||
o.handle = b.index.addTrie(t)
|
||||
}
|
||||
|
||||
func (b *Builder) build() (*table, error) {
|
||||
if b.built {
|
||||
return b.t, b.err
|
||||
}
|
||||
b.built = true
|
||||
b.t = &table{
|
||||
Table: colltab.Table{
|
||||
MaxContractLen: utf8.UTFMax,
|
||||
VariableTop: uint32(b.varTop),
|
||||
},
|
||||
}
|
||||
|
||||
b.buildOrdering(&b.root)
|
||||
b.t.root = b.root.handle
|
||||
for _, t := range b.locale {
|
||||
b.buildOrdering(t.index)
|
||||
if b.err != nil {
|
||||
break
|
||||
}
|
||||
}
|
||||
i, err := b.index.generate()
|
||||
b.t.trie = *i
|
||||
b.t.Index = colltab.Trie{
|
||||
Index: i.index,
|
||||
Values: i.values,
|
||||
Index0: i.index[blockSize*b.t.root.lookupStart:],
|
||||
Values0: i.values[blockSize*b.t.root.valueStart:],
|
||||
}
|
||||
b.error(err)
|
||||
return b.t, b.err
|
||||
}
|
||||
|
||||
// Build builds the root Collator.
|
||||
func (b *Builder) Build() (colltab.Weighter, error) {
|
||||
table, err := b.build()
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
return table, nil
|
||||
}
|
||||
|
||||
// Build builds a Collator for Tailoring t.
|
||||
func (t *Tailoring) Build() (colltab.Weighter, error) {
|
||||
// TODO: implement.
|
||||
return nil, nil
|
||||
}
|
||||
|
||||
// Print prints the tables for b and all its Tailorings as a Go file
|
||||
// that can be included in the Collate package.
|
||||
func (b *Builder) Print(w io.Writer) (n int, err error) {
|
||||
p := func(nn int, e error) {
|
||||
n += nn
|
||||
if err == nil {
|
||||
err = e
|
||||
}
|
||||
}
|
||||
t, err := b.build()
|
||||
if err != nil {
|
||||
return 0, err
|
||||
}
|
||||
p(fmt.Fprintf(w, `var availableLocales = "und`))
|
||||
for _, loc := range b.locale {
|
||||
if loc.id != "und" {
|
||||
p(fmt.Fprintf(w, ",%s", loc.id))
|
||||
}
|
||||
}
|
||||
p(fmt.Fprint(w, "\"\n\n"))
|
||||
p(fmt.Fprintf(w, "const varTop = 0x%x\n\n", b.varTop))
|
||||
p(fmt.Fprintln(w, "var locales = [...]tableIndex{"))
|
||||
for _, loc := range b.locale {
|
||||
if loc.id == "und" {
|
||||
p(t.fprintIndex(w, loc.index.handle, loc.id))
|
||||
}
|
||||
}
|
||||
for _, loc := range b.locale {
|
||||
if loc.id != "und" {
|
||||
p(t.fprintIndex(w, loc.index.handle, loc.id))
|
||||
}
|
||||
}
|
||||
p(fmt.Fprint(w, "}\n\n"))
|
||||
n, _, err = t.fprint(w, "main")
|
||||
return
|
||||
}
|
||||
|
||||
// reproducibleFromNFKD checks whether the given expansion could be generated
|
||||
// from an NFKD expansion.
|
||||
func reproducibleFromNFKD(e *entry, exp, nfkd []rawCE) bool {
|
||||
// Length must be equal.
|
||||
if len(exp) != len(nfkd) {
|
||||
return false
|
||||
}
|
||||
for i, ce := range exp {
|
||||
// Primary and secondary values should be equal.
|
||||
if ce.w[0] != nfkd[i].w[0] || ce.w[1] != nfkd[i].w[1] {
|
||||
return false
|
||||
}
|
||||
// Tertiary values should be equal to maxTertiary for third element onwards.
|
||||
// TODO: there seem to be a lot of cases in CLDR (e.g. ㏭ in zh.xml) that can
|
||||
// simply be dropped. Try this out by dropping the following code.
|
||||
if i >= 2 && ce.w[2] != maxTertiary {
|
||||
return false
|
||||
}
|
||||
if _, err := makeCE(ce); err != nil {
|
||||
// Simply return false. The error will be caught elsewhere.
|
||||
return false
|
||||
}
|
||||
}
|
||||
return true
|
||||
}
|
||||
|
||||
func simplify(o *ordering) {
|
||||
// Runes that are a starter of a contraction should not be removed.
|
||||
// (To date, there is only Kannada character 0CCA.)
|
||||
keep := make(map[rune]bool)
|
||||
for e := o.front(); e != nil; e, _ = e.nextIndexed() {
|
||||
if len(e.runes) > 1 {
|
||||
keep[e.runes[0]] = true
|
||||
}
|
||||
}
|
||||
// Tag entries for which the runes NFKD decompose to identical values.
|
||||
for e := o.front(); e != nil; e, _ = e.nextIndexed() {
|
||||
s := e.str
|
||||
nfkd := norm.NFKD.String(s)
|
||||
nfd := norm.NFD.String(s)
|
||||
if e.decompose || len(e.runes) > 1 || len(e.elems) == 1 || keep[e.runes[0]] || nfkd == nfd {
|
||||
continue
|
||||
}
|
||||
if reproducibleFromNFKD(e, e.elems, o.genColElems(nfkd)) {
|
||||
e.decompose = true
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// appendExpansion converts the given collation sequence to
|
||||
// collation elements and adds them to the expansion table.
|
||||
// It returns an index to the expansion table.
|
||||
func (b *Builder) appendExpansion(e *entry) int {
|
||||
t := b.t
|
||||
i := len(t.ExpandElem)
|
||||
ce := uint32(len(e.elems))
|
||||
t.ExpandElem = append(t.ExpandElem, ce)
|
||||
for _, w := range e.elems {
|
||||
ce, err := makeCE(w)
|
||||
if err != nil {
|
||||
b.error(err)
|
||||
return -1
|
||||
}
|
||||
t.ExpandElem = append(t.ExpandElem, ce)
|
||||
}
|
||||
return i
|
||||
}
|
||||
|
||||
// processExpansions extracts data necessary to generate
|
||||
// the expansion tables.
|
||||
func (b *Builder) processExpansions(o *ordering) {
|
||||
for e := o.front(); e != nil; e, _ = e.nextIndexed() {
|
||||
if !e.expansion() {
|
||||
continue
|
||||
}
|
||||
key := fmt.Sprintf("%v", e.elems)
|
||||
i, ok := b.expIndex[key]
|
||||
if !ok {
|
||||
i = b.appendExpansion(e)
|
||||
b.expIndex[key] = i
|
||||
}
|
||||
e.expansionIndex = i
|
||||
}
|
||||
}
|
||||
|
||||
func (b *Builder) processContractions(o *ordering) {
|
||||
// Collate contractions per starter rune.
|
||||
starters := []rune{}
|
||||
cm := make(map[rune][]*entry)
|
||||
for e := o.front(); e != nil; e, _ = e.nextIndexed() {
|
||||
if e.contraction() {
|
||||
if len(e.str) > b.t.MaxContractLen {
|
||||
b.t.MaxContractLen = len(e.str)
|
||||
}
|
||||
r := e.runes[0]
|
||||
if _, ok := cm[r]; !ok {
|
||||
starters = append(starters, r)
|
||||
}
|
||||
cm[r] = append(cm[r], e)
|
||||
}
|
||||
}
|
||||
// Add entries of single runes that are at a start of a contraction.
|
||||
for e := o.front(); e != nil; e, _ = e.nextIndexed() {
|
||||
if !e.contraction() {
|
||||
r := e.runes[0]
|
||||
if _, ok := cm[r]; ok {
|
||||
cm[r] = append(cm[r], e)
|
||||
}
|
||||
}
|
||||
}
|
||||
// Build the tries for the contractions.
|
||||
t := b.t
|
||||
for _, r := range starters {
|
||||
l := cm[r]
|
||||
// Compute suffix strings. There are 31 different contraction suffix
|
||||
// sets for 715 contractions and 82 contraction starter runes as of
|
||||
// version 6.0.0.
|
||||
sufx := []string{}
|
||||
hasSingle := false
|
||||
for _, e := range l {
|
||||
if len(e.runes) > 1 {
|
||||
sufx = append(sufx, string(e.runes[1:]))
|
||||
} else {
|
||||
hasSingle = true
|
||||
}
|
||||
}
|
||||
if !hasSingle {
|
||||
b.error(fmt.Errorf("no single entry for starter rune %U found", r))
|
||||
continue
|
||||
}
|
||||
// Unique the suffix set.
|
||||
sort.Strings(sufx)
|
||||
key := strings.Join(sufx, "\n")
|
||||
handle, ok := b.ctHandle[key]
|
||||
if !ok {
|
||||
var err error
|
||||
handle, err = appendTrie(&t.ContractTries, sufx)
|
||||
if err != nil {
|
||||
b.error(err)
|
||||
}
|
||||
b.ctHandle[key] = handle
|
||||
}
|
||||
// Bucket sort entries in index order.
|
||||
es := make([]*entry, len(l))
|
||||
for _, e := range l {
|
||||
var p, sn int
|
||||
if len(e.runes) > 1 {
|
||||
str := []byte(string(e.runes[1:]))
|
||||
p, sn = lookup(&t.ContractTries, handle, str)
|
||||
if sn != len(str) {
|
||||
log.Fatalf("%s: processContractions: unexpected length for '%X'; len=%d; want %d", o.id, e.runes, sn, len(str))
|
||||
}
|
||||
}
|
||||
if es[p] != nil {
|
||||
log.Fatalf("%s: multiple contractions for position %d for rune %U", o.id, p, e.runes[0])
|
||||
}
|
||||
es[p] = e
|
||||
}
|
||||
// Create collation elements for contractions.
|
||||
elems := []uint32{}
|
||||
for _, e := range es {
|
||||
ce, err := e.encodeBase()
|
||||
b.errorID(o.id, err)
|
||||
elems = append(elems, ce)
|
||||
}
|
||||
key = fmt.Sprintf("%v", elems)
|
||||
i, ok := b.ctElem[key]
|
||||
if !ok {
|
||||
i = len(t.ContractElem)
|
||||
b.ctElem[key] = i
|
||||
t.ContractElem = append(t.ContractElem, elems...)
|
||||
}
|
||||
// Store info in entry for starter rune.
|
||||
es[0].contractionIndex = i
|
||||
es[0].contractionHandle = handle
|
||||
}
|
||||
}
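Contraction tries are shared between starter runes whose suffix sets are identical. The key for that sharing is the sorted suffix list joined by newlines. The following sketch reproduces only the key computation; `suffixKey` is a hypothetical helper that takes plain strings rather than entries.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// suffixKey returns a canonical key for the set of suffixes that follow
// starter in words. A word that consists of the starter alone contributes
// no suffix.
func suffixKey(starter rune, words []string) string {
	sufx := []string{}
	for _, w := range words {
		r := []rune(w)
		if len(r) > 1 && r[0] == starter {
			sufx = append(sufx, string(r[1:]))
		}
	}
	// Sorting makes the key independent of the order of the input.
	sort.Strings(sufx)
	return strings.Join(sufx, "\n")
}

func main() {
	words := []string{"abc", "abd", "a", "ab", "ac", "Ab", "A", "Ac", "Abc", "Abd"}
	// Both starters have the suffixes b, bc, bd and c, so they can share a trie.
	fmt.Println(suffixKey('a', words) == suffixKey('A', words)) // prints "true"
}
```

With the data in `contractTest`, the starters "a" and "A" map to one trie and "b" to another, which matches the test's expectation of two tries.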
|
||||
290
vendor/golang.org/x/text/collate/build/builder_test.go
generated
vendored
Normal file
|
|
@ -0,0 +1,290 @@
|
|||
// Copyright 2012 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package build
|
||||
|
||||
import "testing"
|
||||
|
||||
// cjk returns an implicit collation element for a CJK rune.
|
||||
func cjk(r rune) []rawCE {
|
||||
// A CJK character C is represented in the DUCET as
|
||||
// [.AAAA.0020.0002.C][.BBBB.0000.0000.C]
|
||||
// Where AAAA is the most significant 15 bits plus a base value.
|
||||
// Any base value will work for the test, so we pick the common value of FB40.
|
||||
const base = 0xFB40
|
||||
return []rawCE{
|
||||
{w: []int{base + int(r>>15), defaultSecondary, defaultTertiary, int(r)}},
|
||||
{w: []int{int(r&0x7FFF) | 0x8000, 0, 0, int(r)}},
|
||||
}
|
||||
}
|
||||
|
||||
func pCE(p int) []rawCE {
|
||||
return mkCE([]int{p, defaultSecondary, defaultTertiary, 0}, 0)
|
||||
}
|
||||
|
||||
func pqCE(p, q int) []rawCE {
|
||||
return mkCE([]int{p, defaultSecondary, defaultTertiary, q}, 0)
|
||||
}
|
||||
|
||||
func ptCE(p, t int) []rawCE {
|
||||
return mkCE([]int{p, defaultSecondary, t, 0}, 0)
|
||||
}
|
||||
|
||||
func ptcCE(p, t int, ccc uint8) []rawCE {
|
||||
return mkCE([]int{p, defaultSecondary, t, 0}, ccc)
|
||||
}
|
||||
|
||||
func sCE(s int) []rawCE {
|
||||
return mkCE([]int{0, s, defaultTertiary, 0}, 0)
|
||||
}
|
||||
|
||||
func stCE(s, t int) []rawCE {
|
||||
return mkCE([]int{0, s, t, 0}, 0)
|
||||
}
|
||||
|
||||
func scCE(s int, ccc uint8) []rawCE {
|
||||
return mkCE([]int{0, s, defaultTertiary, 0}, ccc)
|
||||
}
|
||||
|
||||
func mkCE(w []int, ccc uint8) []rawCE {
|
||||
return []rawCE{rawCE{w, ccc}}
|
||||
}
|
||||
|
||||
// ducetElem is used to define test data that is used to generate a table.
|
||||
type ducetElem struct {
|
||||
str string
|
||||
ces []rawCE
|
||||
}
|
||||
|
||||
func newBuilder(t *testing.T, ducet []ducetElem) *Builder {
|
||||
b := NewBuilder()
|
||||
for _, e := range ducet {
|
||||
ces := [][]int{}
|
||||
for _, ce := range e.ces {
|
||||
ces = append(ces, ce.w)
|
||||
}
|
||||
if err := b.Add([]rune(e.str), ces, nil); err != nil {
|
||||
t.Error(err)
|
||||
}
|
||||
}
|
||||
b.t = &table{}
|
||||
b.root.sort()
|
||||
return b
|
||||
}
|
||||
|
||||
type convertTest struct {
|
||||
in, out []rawCE
|
||||
err bool
|
||||
}
|
||||
|
||||
var convLargeTests = []convertTest{
|
||||
{pCE(0xFB39), pCE(0xFB39), false},
|
||||
{cjk(0x2F9B2), pqCE(0x3F9B2, 0x2F9B2), false},
|
||||
{pCE(0xFB40), pCE(0), true},
|
||||
{append(pCE(0xFB40), pCE(0)[0]), pCE(0), true},
|
||||
{pCE(0xFFFE), pCE(illegalOffset), false},
|
||||
{pCE(0xFFFF), pCE(illegalOffset + 1), false},
|
||||
}
|
||||
|
||||
func TestConvertLarge(t *testing.T) {
|
||||
for i, tt := range convLargeTests {
|
||||
e := new(entry)
|
||||
for _, ce := range tt.in {
|
||||
e.elems = append(e.elems, makeRawCE(ce.w, ce.ccc))
|
||||
}
|
||||
elems, err := convertLargeWeights(e.elems)
|
||||
if tt.err {
|
||||
if err == nil {
|
||||
t.Errorf("%d: expected error; none found", i)
|
||||
}
|
||||
continue
|
||||
} else if err != nil {
|
||||
t.Errorf("%d: unexpected error: %v", i, err)
|
||||
}
|
||||
if !equalCEArrays(elems, tt.out) {
|
||||
t.Errorf("%d: conversion was %x; want %x", i, elems, tt.out)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Collation element table for simplify tests.
|
||||
var simplifyTest = []ducetElem{
|
||||
{"\u0300", sCE(30)}, // grave
|
||||
{"\u030C", sCE(40)}, // caron
|
||||
{"A", ptCE(100, 8)},
|
||||
{"D", ptCE(104, 8)},
|
||||
{"E", ptCE(105, 8)},
|
||||
{"I", ptCE(110, 8)},
|
||||
{"z", ptCE(130, 8)},
|
||||
{"\u05F2", append(ptCE(200, 4), ptCE(200, 4)[0])},
|
||||
{"\u05B7", sCE(80)},
|
||||
{"\u00C0", append(ptCE(100, 8), sCE(30)...)}, // A with grave, can be removed
|
||||
{"\u00C8", append(ptCE(105, 8), sCE(30)...)}, // E with grave
|
||||
{"\uFB1F", append(ptCE(200, 4), ptCE(200, 4)[0], sCE(80)[0])}, // eliminated by NFD
|
||||
{"\u00C8\u0302", ptCE(106, 8)}, // block previous from simplifying
|
||||
{"\u01C5", append(ptCE(104, 9), ptCE(130, 4)[0], stCE(40, maxTertiary)[0])}, // eliminated by NFKD
|
||||
// no removal: tertiary value of third element is not maxTertiary
|
||||
{"\u2162", append(ptCE(110, 9), ptCE(110, 4)[0], ptCE(110, 8)[0])},
|
||||
}
|
||||
|
||||
var genColTests = []ducetElem{
|
||||
{"\uFA70", pqCE(0x1FA70, 0xFA70)},
|
||||
{"A\u0300", append(ptCE(100, 8), sCE(30)...)},
|
||||
{"A\u0300\uFA70", append(ptCE(100, 8), sCE(30)[0], pqCE(0x1FA70, 0xFA70)[0])},
|
||||
{"A\u0300A\u0300", append(ptCE(100, 8), sCE(30)[0], ptCE(100, 8)[0], sCE(30)[0])},
|
||||
}
|
||||
|
||||
func TestGenColElems(t *testing.T) {
|
||||
b := newBuilder(t, simplifyTest[:5])
|
||||
|
||||
for i, tt := range genColTests {
|
||||
res := b.root.genColElems(tt.str)
|
||||
if !equalCEArrays(tt.ces, res) {
|
||||
t.Errorf("%d: result %X; want %X", i, res, tt.ces)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
type strArray []string
|
||||
|
||||
func (sa strArray) contains(s string) bool {
|
||||
for _, e := range sa {
|
||||
if e == s {
|
||||
return true
|
||||
}
|
||||
}
|
||||
return false
|
||||
}
|
||||
|
||||
var simplifyRemoved = strArray{"\u00C0", "\uFB1F"}
|
||||
var simplifyMarked = strArray{"\u01C5"}
|
||||
|
||||
func TestSimplify(t *testing.T) {
|
||||
b := newBuilder(t, simplifyTest)
|
||||
o := &b.root
|
||||
simplify(o)
|
||||
|
||||
for i, tt := range simplifyTest {
|
||||
if simplifyRemoved.contains(tt.str) {
|
||||
continue
|
||||
}
|
||||
e := o.find(tt.str)
|
||||
if e.str != tt.str || !equalCEArrays(e.elems, tt.ces) {
|
||||
t.Errorf("%d: found element %s -> %X; want %s -> %X", i, e.str, e.elems, tt.str, tt.ces)
|
||||
break
|
||||
}
|
||||
}
|
||||
var i, k int
|
||||
for e := o.front(); e != nil; e, _ = e.nextIndexed() {
|
||||
gold := simplifyMarked.contains(e.str)
|
||||
if gold {
|
||||
k++
|
||||
}
|
||||
if gold != e.decompose {
|
||||
t.Errorf("%d: %s has decompose %v; want %v", i, e.str, e.decompose, gold)
|
||||
}
|
||||
i++
|
||||
}
|
||||
if k != len(simplifyMarked) {
|
||||
t.Errorf(" an entry that should be marked as decompose was deleted")
|
||||
}
|
||||
}
|
||||
|
||||
var expandTest = []ducetElem{
|
||||
{"\u0300", append(scCE(29, 230), scCE(30, 230)...)},
|
||||
{"\u00C0", append(ptCE(100, 8), scCE(30, 230)...)},
|
||||
{"\u00C8", append(ptCE(105, 8), scCE(30, 230)...)},
|
||||
{"\u00C9", append(ptCE(105, 8), scCE(30, 230)...)}, // identical expansion
|
||||
{"\u05F2", append(ptCE(200, 4), ptCE(200, 4)[0], ptCE(200, 4)[0])},
|
||||
{"\u01FF", append(ptCE(200, 4), ptcCE(201, 4, 0)[0], scCE(30, 230)[0])},
|
||||
}
|
||||
|
||||
func TestExpand(t *testing.T) {
|
||||
const (
|
||||
totalExpansions = 5
|
||||
totalElements = 2 + 2 + 2 + 3 + 3 + totalExpansions
|
||||
)
|
||||
b := newBuilder(t, expandTest)
|
||||
o := &b.root
|
||||
b.processExpansions(o)
|
||||
|
||||
e := o.front()
|
||||
for _, tt := range expandTest {
|
||||
exp := b.t.ExpandElem[e.expansionIndex:]
|
||||
if int(exp[0]) != len(tt.ces) {
|
||||
t.Errorf("%U: len(expansion)==%d; want %d", []rune(tt.str)[0], exp[0], len(tt.ces))
|
||||
}
|
||||
exp = exp[1:]
|
||||
for j, w := range tt.ces {
|
||||
if ce, _ := makeCE(w); exp[j] != ce {
|
||||
t.Errorf("%U: element %d is %X; want %X", []rune(tt.str)[0], j, exp[j], ce)
|
||||
}
|
||||
}
|
||||
e, _ = e.nextIndexed()
|
||||
}
|
||||
// Verify uniquing.
|
||||
if len(b.t.ExpandElem) != totalElements {
|
||||
t.Errorf("len(expandElem)==%d; want %d", len(b.t.ExpandElem), totalElements)
|
||||
}
|
||||
}
|
||||
|
||||
var contractTest = []ducetElem{
|
||||
{"abc", pCE(102)},
|
||||
{"abd", pCE(103)},
|
||||
{"a", pCE(100)},
|
||||
{"ab", pCE(101)},
|
||||
{"ac", pCE(104)},
|
||||
{"bcd", pCE(202)},
|
||||
{"b", pCE(200)},
|
||||
{"bc", pCE(201)},
|
||||
{"bd", pCE(203)},
|
||||
// shares suffixes with a*
|
||||
{"Ab", pCE(301)},
|
||||
{"A", pCE(300)},
|
||||
{"Ac", pCE(304)},
|
||||
{"Abc", pCE(302)},
|
||||
{"Abd", pCE(303)},
|
||||
// starter to be ignored
|
||||
{"z", pCE(1000)},
|
||||
}
|
||||
|
||||
func TestContract(t *testing.T) {
|
||||
const (
|
||||
totalElements = 5 + 5 + 4
|
||||
)
|
||||
b := newBuilder(t, contractTest)
|
||||
o := &b.root
|
||||
b.processContractions(o)
|
||||
|
||||
indexMap := make(map[int]bool)
|
||||
handleMap := make(map[rune]*entry)
|
||||
for e := o.front(); e != nil; e, _ = e.nextIndexed() {
|
||||
if e.contractionHandle.n > 0 {
|
||||
handleMap[e.runes[0]] = e
|
||||
indexMap[e.contractionHandle.index] = true
|
||||
}
|
||||
}
|
||||
// Verify uniquing.
|
||||
if len(indexMap) != 2 {
|
||||
t.Errorf("number of tries is %d; want %d", len(indexMap), 2)
|
||||
}
|
||||
for _, tt := range contractTest {
|
||||
e, ok := handleMap[[]rune(tt.str)[0]]
|
||||
if !ok {
|
||||
continue
|
||||
}
|
||||
str := tt.str[1:]
|
||||
offset, n := lookup(&b.t.ContractTries, e.contractionHandle, []byte(str))
|
||||
if len(str) != n {
|
||||
t.Errorf("%s: bytes consumed==%d; want %d", tt.str, n, len(str))
|
||||
}
|
||||
ce := b.t.ContractElem[offset+e.contractionIndex]
|
||||
if want, _ := makeCE(tt.ces[0]); want != ce {
|
||||
t.Errorf("%s: element %X; want %X", tt.str, ce, want)
|
||||
}
|
||||
}
|
||||
if len(b.t.ContractElem) != totalElements {
|
||||
t.Errorf("len(ContractElem)==%d; want %d", len(b.t.ContractElem), totalElements)
|
||||
}
|
||||
}
|
||||
294
vendor/golang.org/x/text/collate/build/colelem.go
generated
vendored
Normal file
|
|
@ -0,0 +1,294 @@
|
|||
// Copyright 2012 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package build
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"unicode"
|
||||
|
||||
"golang.org/x/text/internal/colltab"
|
||||
)
|
||||
|
||||
const (
|
||||
defaultSecondary = 0x20
|
||||
defaultTertiary = 0x2
|
||||
maxTertiary = 0x1F
|
||||
)
|
||||
|
||||
type rawCE struct {
|
||||
w []int
|
||||
ccc uint8
|
||||
}
|
||||
|
||||
func makeRawCE(w []int, ccc uint8) rawCE {
|
||||
ce := rawCE{w: make([]int, 4), ccc: ccc}
|
||||
copy(ce.w, w)
|
||||
return ce
|
||||
}
|
||||
|
||||
// A collation element is represented as a uint32.
|
||||
// In the typical case, a rune maps to a single collation element. If a rune
|
||||
// can be the start of a contraction or expands into multiple collation elements,
|
||||
// then the collation element that is associated with a rune will have a special
|
||||
// form to represent such m to n mappings. Such special collation elements
|
||||
// have a value >= 0x80000000.
|
||||
|
||||
const (
|
||||
maxPrimaryBits = 21
|
||||
maxSecondaryBits = 12
|
||||
maxTertiaryBits = 8
|
||||
)
|
||||
|
||||
func makeCE(ce rawCE) (uint32, error) {
|
||||
v, e := colltab.MakeElem(ce.w[0], ce.w[1], ce.w[2], ce.ccc)
|
||||
return uint32(v), e
|
||||
}
|
||||
|
||||
// For contractions, collation elements are of the form
|
||||
// 110bbbbb bbbbbbbb iiiiiiii iiiinnnn, where
|
||||
// - n* is the size of the first node in the contraction trie.
|
||||
// - i* is the index of the first node in the contraction trie.
|
||||
// - b* is the offset into the contraction collation element table.
|
||||
// See contract.go for details on the contraction trie.
|
||||
const (
|
||||
contractID = 0xC0000000
|
||||
maxNBits = 4
|
||||
maxTrieIndexBits = 12
|
||||
maxContractOffsetBits = 13
|
||||
)
|
||||
|
||||
func makeContractIndex(h ctHandle, offset int) (uint32, error) {
|
||||
if h.n >= 1<<maxNBits {
|
||||
return 0, fmt.Errorf("size of contraction trie node too large: %d >= %d", h.n, 1<<maxNBits)
|
||||
}
|
||||
if h.index >= 1<<maxTrieIndexBits {
|
||||
return 0, fmt.Errorf("size of contraction trie offset too large: %d >= %d", h.index, 1<<maxTrieIndexBits)
|
||||
}
|
||||
if offset >= 1<<maxContractOffsetBits {
|
||||
return 0, fmt.Errorf("contraction offset out of bounds: %x >= %x", offset, 1<<maxContractOffsetBits)
|
||||
}
|
||||
ce := uint32(contractID)
|
||||
ce += uint32(offset << (maxNBits + maxTrieIndexBits))
|
||||
ce += uint32(h.index << maxNBits)
|
||||
ce += uint32(h.n)
|
||||
return ce, nil
|
||||
}
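The contraction layout `110bbbbb bbbbbbbb iiiiiiii iiiinnnn` can be checked by packing and unpacking a value by hand. The sketch below restates the constants from above; `pack` and `unpack` are hypothetical names, and `pack` takes plain integers instead of a `ctHandle`.

```go
package main

import "fmt"

const (
	contractID            = 0xC0000000
	maxNBits              = 4
	maxTrieIndexBits      = 12
	maxContractOffsetBits = 13
)

// pack combines the node size n, the trie index and the element table offset
// into a single contraction collation element.
func pack(n, index, offset int) (uint32, error) {
	if n < 0 || n >= 1<<maxNBits {
		return 0, fmt.Errorf("n out of range: %d", n)
	}
	if index < 0 || index >= 1<<maxTrieIndexBits {
		return 0, fmt.Errorf("index out of range: %d", index)
	}
	if offset < 0 || offset >= 1<<maxContractOffsetBits {
		return 0, fmt.Errorf("offset out of range: %d", offset)
	}
	ce := contractID |
		uint32(offset)<<(maxNBits+maxTrieIndexBits) |
		uint32(index)<<maxNBits |
		uint32(n)
	return ce, nil
}

// unpack extracts the three fields again by shifting and masking.
func unpack(ce uint32) (n, index, offset int) {
	n = int(ce & (1<<maxNBits - 1))
	index = int(ce >> maxNBits & (1<<maxTrieIndexBits - 1))
	offset = int(ce >> (maxNBits + maxTrieIndexBits) & (1<<maxContractOffsetBits - 1))
	return n, index, offset
}

func main() {
	ce, err := pack(3, 5, 7)
	fmt.Printf("0x%08X %v\n", ce, err) // prints "0xC0070053 <nil>"
	fmt.Println(unpack(ce))            // prints "3 5 7"
}
```

The three fields use 4 + 12 + 13 = 29 bits, which leaves the top three bits for the `110` tag.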
|
||||
|
||||
// For expansions, collation elements are of the form
|
||||
// 11100000 00000000 bbbbbbbb bbbbbbbb,
|
||||
// where b* is the index into the expansion sequence table.
|
||||
const (
|
||||
expandID = 0xE0000000
|
||||
maxExpandIndexBits = 16
|
||||
)
|
||||
|
||||
func makeExpandIndex(index int) (uint32, error) {
|
||||
if index >= 1<<maxExpandIndexBits {
|
||||
return 0, fmt.Errorf("expansion index out of bounds: %x >= %x", index, 1<<maxExpandIndexBits)
|
||||
}
|
||||
return expandID + uint32(index), nil
|
||||
}
|
||||
|
||||
// Each list of collation elements corresponding to an expansion starts with
|
||||
// a header indicating the length of the sequence.
|
||||
func makeExpansionHeader(n int) (uint32, error) {
|
||||
return uint32(n), nil
|
||||
}
|
||||
|
||||
// Some runes can be expanded using NFKD decomposition. Instead of storing the full
|
||||
// sequence of collation elements, we decompose the rune and look up the collation
|
||||
// elements for each rune in the decomposition and modify the tertiary weights.
|
||||
// The collation element, in this case, is of the form
|
||||
// 11110000 00000000 wwwwwwww vvvvvvvv, where
|
||||
// - v* is the replacement tertiary weight for the first rune,
|
||||
// - w* is the replacement tertiary weight for the second rune,
|
||||
// Tertiary weights of subsequent runes should be replaced with maxTertiary.
|
||||
// See http://www.unicode.org/reports/tr10/#Compatibility_Decompositions for more details.
|
||||
const (
|
||||
decompID = 0xF0000000
|
||||
)
|
||||
|
||||
func makeDecompose(t1, t2 int) (uint32, error) {
|
||||
if t1 >= 256 || t1 < 0 {
|
||||
return 0, fmt.Errorf("first tertiary weight out of bounds: %d >= 256", t1)
|
||||
}
|
||||
if t2 >= 256 || t2 < 0 {
|
||||
return 0, fmt.Errorf("second tertiary weight out of bounds: %d >= 256", t2)
|
||||
}
|
||||
return uint32(t2<<8+t1) + decompID, nil
|
||||
}
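As a worked instance of the decomposition layout `11110000 00000000 wwwwwwww vvvvvvvv`, the sketch below packs two tertiary weights and reads them back. `packDecomp` and `splitDecomp` are hypothetical names; only the constant `decompID` is taken from the source.

```go
package main

import "fmt"

const decompID = 0xF0000000

// packDecomp stores t1 in the low byte and t2 in the byte above it.
func packDecomp(t1, t2 int) (uint32, error) {
	if t1 < 0 || t1 >= 256 {
		return 0, fmt.Errorf("first tertiary weight out of bounds: %d", t1)
	}
	if t2 < 0 || t2 >= 256 {
		return 0, fmt.Errorf("second tertiary weight out of bounds: %d", t2)
	}
	return decompID | uint32(t2)<<8 | uint32(t1), nil
}

// splitDecomp recovers the two replacement tertiary weights.
func splitDecomp(ce uint32) (t1, t2 int) {
	return int(ce & 0xFF), int(ce >> 8 & 0xFF)
}

func main() {
	ce, _ := packDecomp(0x04, 0x1F)
	fmt.Printf("0x%08X\n", ce)   // prints "0xF0001F04"
	fmt.Println(splitDecomp(ce)) // prints "4 31"
}
```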
|
||||
|
||||
const (
|
||||
// These constants were taken from http://www.unicode.org/versions/Unicode6.0.0/ch12.pdf.
|
||||
minUnified rune = 0x4E00
|
||||
maxUnified = 0x9FFF
|
||||
minCompatibility = 0xF900
|
||||
maxCompatibility = 0xFAFF
|
||||
minRare = 0x3400
|
||||
maxRare = 0x4DBF
|
||||
)
|
||||
const (
|
||||
commonUnifiedOffset = 0x10000
|
||||
rareUnifiedOffset = 0x20000 // largest rune in common is U+FAFF
|
||||
otherOffset = 0x50000 // largest rune in rare is U+2FA1D
|
||||
illegalOffset = otherOffset + int(unicode.MaxRune)
|
||||
maxPrimary = illegalOffset + 1
|
||||
)
|
||||
|
||||
// implicitPrimary returns the primary weight for a rune
|
||||
// that has no entry in the collation table.
|
||||
// We take a different approach from the one specified in
|
||||
// http://unicode.org/reports/tr10/#Implicit_Weights,
|
||||
// but preserve the resulting relative ordering of the runes.
|
||||
func implicitPrimary(r rune) int {
|
||||
if unicode.Is(unicode.Ideographic, r) {
|
||||
if r >= minUnified && r <= maxUnified {
|
||||
// The most common case for CJK.
|
||||
return int(r) + commonUnifiedOffset
|
||||
}
|
||||
if r >= minCompatibility && r <= maxCompatibility {
|
||||
// This will typically not hit. The DUCET explicitly specifies mappings
|
||||
// for all characters that do not decompose.
|
||||
return int(r) + commonUnifiedOffset
|
||||
}
|
||||
return int(r) + rareUnifiedOffset
|
||||
}
|
||||
return int(r) + otherOffset
|
||||
}
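The three offsets place implicit weights in three consecutive bands: common CJK first, then the remaining ideographs, then everything else. The sketch below restates the logic using only the standard `unicode` package so the ordering can be checked in isolation. The function name `implicit` is hypothetical, and whether a rune is classified as ideographic depends on the Unicode tables shipped with the Go toolchain.

```go
package main

import (
	"fmt"
	"unicode"
)

const (
	commonUnifiedOffset = 0x10000
	rareUnifiedOffset   = 0x20000
	otherOffset         = 0x50000
)

// implicit mirrors the band selection of implicitPrimary: common CJK and
// compatibility ideographs, then other ideographs, then all other runes.
func implicit(r rune) int {
	if unicode.Is(unicode.Ideographic, r) {
		if (r >= 0x4E00 && r <= 0x9FFF) || (r >= 0xF900 && r <= 0xFAFF) {
			return int(r) + commonUnifiedOffset
		}
		return int(r) + rareUnifiedOffset
	}
	return int(r) + otherOffset
}

func main() {
	for _, r := range []rune{0x4E00, 0x3400, 'A'} {
		fmt.Printf("%U -> %#x\n", r, implicit(r))
	}
}
```

U+3400 has a smaller code point than U+4E00 but receives a larger weight, so the common block sorts before the rare extension block.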
|
||||
|
||||
// convertLargeWeights converts collation elements with large
|
||||
// primaries (either double primaries or for illegal runes)
|
||||
// to our own representation.
|
||||
// A CJK character C is represented in the DUCET as
|
||||
// [.FBxx.0020.0002.C][.BBBB.0000.0000.C]
|
||||
// We will rewrite these characters to a single CE.
|
||||
// We assume the CJK values start at 0x8000.
|
||||
// See http://unicode.org/reports/tr10/#Implicit_Weights
|
||||
func convertLargeWeights(elems []rawCE) (res []rawCE, err error) {
|
||||
const (
|
||||
cjkPrimaryStart = 0xFB40
|
||||
rarePrimaryStart = 0xFB80
|
||||
otherPrimaryStart = 0xFBC0
|
||||
illegalPrimary = 0xFFFE
|
||||
highBitsMask = 0x3F
|
||||
lowBitsMask = 0x7FFF
|
||||
lowBitsFlag = 0x8000
|
||||
shiftBits = 15
|
||||
)
|
||||
for i := 0; i < len(elems); i++ {
|
||||
ce := elems[i].w
|
||||
p := ce[0]
|
||||
if p < cjkPrimaryStart {
|
||||
continue
|
||||
}
|
||||
if p > 0xFFFF {
|
||||
return elems, fmt.Errorf("found primary weight %X; should be <= 0xFFFF", p)
|
||||
}
|
||||
if p >= illegalPrimary {
|
||||
ce[0] = illegalOffset + p - illegalPrimary
|
||||
} else {
|
||||
if i+1 >= len(elems) {
|
||||
return elems, fmt.Errorf("second part of double primary weight missing: %v", elems)
|
||||
}
|
||||
if elems[i+1].w[0]&lowBitsFlag == 0 {
|
||||
return elems, fmt.Errorf("malformed second part of double primary weight: %v", elems)
|
||||
}
|
||||
np := ((p & highBitsMask) << shiftBits) + elems[i+1].w[0]&lowBitsMask
|
||||
switch {
|
||||
case p < rarePrimaryStart:
|
||||
np += commonUnifiedOffset
|
||||
case p < otherPrimaryStart:
|
||||
np += rareUnifiedOffset
|
||||
default:
|
||||
np += otherOffset
|
||||
}
|
||||
ce[0] = np
|
||||
for j := i + 1; j+1 < len(elems); j++ {
|
||||
elems[j] = elems[j+1]
|
||||
}
|
||||
elems = elems[:len(elems)-1]
|
||||
}
|
||||
}
|
||||
return elems, nil
|
||||
}
|
||||
|
||||
// nextWeight computes the first possible collation weights following elems
|
||||
// for the given level.
|
||||
func nextWeight(level colltab.Level, elems []rawCE) []rawCE {
|
||||
if level == colltab.Identity {
|
||||
next := make([]rawCE, len(elems))
|
||||
copy(next, elems)
|
||||
return next
|
||||
}
|
||||
next := []rawCE{makeRawCE(elems[0].w, elems[0].ccc)}
|
||||
next[0].w[level]++
|
||||
if level < colltab.Secondary {
|
||||
next[0].w[colltab.Secondary] = defaultSecondary
|
||||
}
|
||||
if level < colltab.Tertiary {
|
||||
next[0].w[colltab.Tertiary] = defaultTertiary
|
||||
}
|
||||
// Filter entries that cannot influence ordering.
|
||||
for _, ce := range elems[1:] {
|
||||
skip := true
|
||||
for i := colltab.Primary; i < level; i++ {
|
||||
skip = skip && ce.w[i] == 0
|
||||
}
|
||||
if !skip {
|
||||
next = append(next, ce)
|
||||
}
|
||||
}
|
||||
return next
|
||||
}
|
||||
|
||||
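// nextVal returns the index and value of the first non-zero weight in elems
|
||||
// at the given level, starting at position i. It returns (len(elems), 0)
|
||||
// if no such weight is found.
|
||||
func nextVal(elems []rawCE, i int, level colltab.Level) (index, value int) {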
|
||||
for ; i < len(elems) && elems[i].w[level] == 0; i++ {
|
||||
}
|
||||
if i < len(elems) {
|
||||
return i, elems[i].w[level]
|
||||
}
|
||||
return i, 0
|
||||
}
|
||||
|
||||
// compareWeights returns -1 if a < b, 1 if a > b, or 0 otherwise.
|
||||
// It also returns the collation level at which the difference is found.
|
||||
func compareWeights(a, b []rawCE) (result int, level colltab.Level) {
|
||||
for level := colltab.Primary; level < colltab.Identity; level++ {
|
||||
var va, vb int
|
||||
for ia, ib := 0, 0; ia < len(a) || ib < len(b); ia, ib = ia+1, ib+1 {
|
||||
ia, va = nextVal(a, ia, level)
|
||||
ib, vb = nextVal(b, ib, level)
|
||||
if va != vb {
|
||||
if va < vb {
|
||||
return -1, level
|
||||
} else {
|
||||
return 1, level
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
return 0, colltab.Identity
|
||||
}
|
||||
|
||||
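// equalCE reports whether a and b have the same primary, secondary,
|
||||
// and tertiary weights. Other fields are not compared.
|
||||
func equalCE(a, b rawCE) bool {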
|
||||
for i := 0; i < 3; i++ {
|
||||
if b.w[i] != a.w[i] {
|
||||
return false
|
||||
}
|
||||
}
|
||||
return true
|
||||
}
|
||||
|
||||
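// equalCEArrays reports whether a and b have the same length and
|
||||
// pairwise equal elements according to equalCE.
|
||||
func equalCEArrays(a, b []rawCE) bool {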
|
||||
if len(a) != len(b) {
|
||||
return false
|
||||
}
|
||||
for i := range a {
|
||||
if !equalCE(a[i], b[i]) {
|
||||
return false
|
||||
}
|
||||
}
|
||||
return true
|
||||
}
|
||||
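The double-primary merge performed by convertLargeWeights above can be sketched in isolation. This is a minimal illustration, not the package's code: mergeDoublePrimary is a hypothetical helper name, the mask and shift constants are copied from the function above, and the per-range offsets (commonUnifiedOffset and friends) are deliberately left out because their values are not shown in this part of the diff. The sample inputs assume the UCA implicit-weight layout, where the first primary is base + (cp >> 15) and the second is (cp & 0x7FFF) | 0x8000.

```go
package main

import "fmt"

// Constants copied from convertLargeWeights.
const (
	highBitsMask = 0x3F
	lowBitsMask  = 0x7FFF
	lowBitsFlag  = 0x8000
	shiftBits    = 15
)

// mergeDoublePrimary (hypothetical name) folds the two primaries of a
// DUCET double-primary collation element into a single value. The range
// offsets that the real code adds afterwards are omitted here.
func mergeDoublePrimary(first, second int) (int, error) {
	if second&lowBitsFlag == 0 {
		return 0, fmt.Errorf("malformed second part of double primary weight: %X", second)
	}
	return ((first & highBitsMask) << shiftBits) + (second & lowBitsMask), nil
}

func main() {
	// U+4E00 -> [.FB40][.CE00], U+20000 -> [.FB84][.8000]
	for _, pair := range [][2]int{{0xFB40, 0xCE00}, {0xFB84, 0x8000}} {
		np, err := mergeDoublePrimary(pair[0], pair[1])
		fmt.Printf("%X %X -> %X (err: %v)\n", pair[0], pair[1], np, err)
	}
}
```

Before any offset is added, the merged value is the original code point again, which is why the rewritten weights keep the relative order of the runes.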
215
vendor/golang.org/x/text/collate/build/colelem_test.go
generated
vendored
Normal file
|
|
@ -0,0 +1,215 @@
|
|||
// Copyright 2012 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package build
|
||||
|
||||
import (
|
||||
"testing"
|
||||
|
||||
"golang.org/x/text/internal/colltab"
|
||||
)
|
||||
|
||||
type ceTest struct {
|
||||
f func(in []int) (uint32, error)
|
||||
arg []int
|
||||
val uint32
|
||||
}
|
||||
|
||||
func normalCE(in []int) (ce uint32, err error) {
|
||||
return makeCE(rawCE{w: in[:3], ccc: uint8(in[3])})
|
||||
}
|
||||
|
||||
func expandCE(in []int) (ce uint32, err error) {
|
||||
return makeExpandIndex(in[0])
|
||||
}
|
||||
|
||||
func contractCE(in []int) (ce uint32, err error) {
|
||||
return makeContractIndex(ctHandle{in[0], in[1]}, in[2])
|
||||
}
|
||||
|
||||
func decompCE(in []int) (ce uint32, err error) {
|
||||
return makeDecompose(in[0], in[1])
|
||||
}
|
||||
|
||||
var ceTests = []ceTest{
|
||||
{normalCE, []int{0, 0, 0, 0}, 0xA0000000},
|
||||
{normalCE, []int{0, 0x28, 3, 0}, 0xA0002803},
|
||||
{normalCE, []int{0, 0x28, 3, 0xFF}, 0xAFF02803},
|
||||
{normalCE, []int{100, defaultSecondary, 3, 0}, 0x0000C883},
|
||||
// non-ignorable primary with non-default secondary
|
||||
{normalCE, []int{100, 0x28, defaultTertiary, 0}, 0x4000C828},
|
||||
{normalCE, []int{100, defaultSecondary + 8, 3, 0}, 0x0000C983},
|
||||
{normalCE, []int{100, 0, 3, 0}, 0xFFFF}, // non-ignorable primary with non-supported secondary
|
||||
{normalCE, []int{100, 1, 3, 0}, 0xFFFF},
|
||||
{normalCE, []int{1 << maxPrimaryBits, defaultSecondary, 0, 0}, 0xFFFF},
|
||||
{normalCE, []int{0, 1 << maxSecondaryBits, 0, 0}, 0xFFFF},
|
||||
{normalCE, []int{100, defaultSecondary, 1 << maxTertiaryBits, 0}, 0xFFFF},
|
||||
{normalCE, []int{0x123, defaultSecondary, 8, 0xFF}, 0x88FF0123},
|
||||
{normalCE, []int{0x123, defaultSecondary + 1, 8, 0xFF}, 0xFFFF},
|
||||
|
||||
{contractCE, []int{0, 0, 0}, 0xC0000000},
|
||||
{contractCE, []int{1, 1, 1}, 0xC0010011},
|
||||
{contractCE, []int{1, (1 << maxNBits) - 1, 1}, 0xC001001F},
|
||||
{contractCE, []int{(1 << maxTrieIndexBits) - 1, 1, 1}, 0xC001FFF1},
|
||||
{contractCE, []int{1, 1, (1 << maxContractOffsetBits) - 1}, 0xDFFF0011},
|
||||
{contractCE, []int{1, (1 << maxNBits), 1}, 0xFFFF},
|
||||
{contractCE, []int{(1 << maxTrieIndexBits), 1, 1}, 0xFFFF},
|
||||
{contractCE, []int{1, (1 << maxContractOffsetBits), 1}, 0xFFFF},
|
||||
|
||||
{expandCE, []int{0}, 0xE0000000},
|
||||
{expandCE, []int{5}, 0xE0000005},
|
||||
{expandCE, []int{(1 << maxExpandIndexBits) - 1}, 0xE000FFFF},
|
||||
{expandCE, []int{1 << maxExpandIndexBits}, 0xFFFF},
|
||||
|
||||
{decompCE, []int{0, 0}, 0xF0000000},
|
||||
{decompCE, []int{1, 1}, 0xF0000101},
|
||||
{decompCE, []int{0x1F, 0x1F}, 0xF0001F1F},
|
||||
{decompCE, []int{256, 0x1F}, 0xFFFF},
|
||||
{decompCE, []int{0x1F, 256}, 0xFFFF},
|
||||
}
|
||||
|
||||
func TestColElem(t *testing.T) {
|
||||
for i, tt := range ceTests {
|
||||
in := make([]int, len(tt.arg))
|
||||
copy(in, tt.arg)
|
||||
ce, err := tt.f(in)
|
||||
if tt.val == 0xFFFF {
|
||||
if err == nil {
|
||||
t.Errorf("%d: expected error for args %x", i, tt.arg)
|
||||
}
|
||||
continue
|
||||
}
|
||||
if err != nil {
|
||||
t.Errorf("%d: unexpected error: %v", i, err.Error())
|
||||
}
|
||||
if ce != tt.val {
|
||||
t.Errorf("%d: colElem=%X; want %X", i, ce, tt.val)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func mkRawCES(in [][]int) []rawCE {
|
||||
out := []rawCE{}
|
||||
for _, w := range in {
|
||||
out = append(out, rawCE{w: w})
|
||||
}
|
||||
return out
|
||||
}
|
||||
|
||||
type weightsTest struct {
|
||||
a, b [][]int
|
||||
level colltab.Level
|
||||
result int
|
||||
}
|
||||
|
||||
var nextWeightTests = []weightsTest{
|
||||
{
|
||||
a: [][]int{{100, 20, 5, 0}},
|
||||
b: [][]int{{101, defaultSecondary, defaultTertiary, 0}},
|
||||
level: colltab.Primary,
|
||||
},
|
||||
{
|
||||
a: [][]int{{100, 20, 5, 0}},
|
||||
b: [][]int{{100, 21, defaultTertiary, 0}},
|
||||
level: colltab.Secondary,
|
||||
},
|
||||
{
|
||||
a: [][]int{{100, 20, 5, 0}},
|
||||
b: [][]int{{100, 20, 6, 0}},
|
||||
level: colltab.Tertiary,
|
||||
},
|
||||
{
|
||||
a: [][]int{{100, 20, 5, 0}},
|
||||
b: [][]int{{100, 20, 5, 0}},
|
||||
level: colltab.Identity,
|
||||
},
|
||||
}
|
||||
|
||||
var extra = [][]int{{200, 32, 8, 0}, {0, 32, 8, 0}, {0, 0, 8, 0}, {0, 0, 0, 0}}
|
||||
|
||||
func TestNextWeight(t *testing.T) {
|
||||
for i, tt := range nextWeightTests {
|
||||
test := func(l colltab.Level, tt weightsTest, a, gold [][]int) {
|
||||
res := nextWeight(tt.level, mkRawCES(a))
|
||||
if !equalCEArrays(mkRawCES(gold), res) {
|
||||
t.Errorf("%d:%d: expected weights %d; found %d", i, l, gold, res)
|
||||
}
|
||||
}
|
||||
test(-1, tt, tt.a, tt.b)
|
||||
for l := colltab.Primary; l <= colltab.Tertiary; l++ {
|
||||
if tt.level <= l {
|
||||
test(l, tt, append(tt.a, extra[l]), tt.b)
|
||||
} else {
|
||||
test(l, tt, append(tt.a, extra[l]), append(tt.b, extra[l]))
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
var compareTests = []weightsTest{
|
||||
{
|
||||
[][]int{{100, 20, 5, 0}},
|
||||
[][]int{{100, 20, 5, 0}},
|
||||
colltab.Identity,
|
||||
0,
|
||||
},
|
||||
{
|
||||
[][]int{{100, 20, 5, 0}, extra[0]},
|
||||
[][]int{{100, 20, 5, 1}},
|
||||
colltab.Primary,
|
||||
1,
|
||||
},
|
||||
{
|
||||
[][]int{{100, 20, 5, 0}},
|
||||
[][]int{{101, 20, 5, 0}},
|
||||
colltab.Primary,
|
||||
-1,
|
||||
},
|
||||
{
|
||||
[][]int{{101, 20, 5, 0}},
|
||||
[][]int{{100, 20, 5, 0}},
|
||||
colltab.Primary,
|
||||
1,
|
||||
},
|
||||
{
|
||||
[][]int{{100, 0, 0, 0}, {0, 20, 5, 0}},
|
||||
[][]int{{0, 20, 5, 0}, {100, 0, 0, 0}},
|
||||
colltab.Identity,
|
||||
0,
|
||||
},
|
||||
{
|
||||
[][]int{{100, 20, 5, 0}},
|
||||
[][]int{{100, 21, 5, 0}},
|
||||
colltab.Secondary,
|
||||
-1,
|
||||
},
|
||||
{
|
||||
[][]int{{100, 20, 5, 0}},
|
||||
[][]int{{100, 20, 2, 0}},
|
||||
colltab.Tertiary,
|
||||
1,
|
||||
},
|
||||
{
|
||||
[][]int{{100, 20, 5, 1}},
|
||||
[][]int{{100, 20, 5, 2}},
|
||||
colltab.Quaternary,
|
||||
-1,
|
||||
},
|
||||
}
|
||||
|
||||
func TestCompareWeights(t *testing.T) {
|
||||
for i, tt := range compareTests {
|
||||
test := func(tt weightsTest, a, b [][]int) {
|
||||
res, level := compareWeights(mkRawCES(a), mkRawCES(b))
|
||||
if res != tt.result {
|
||||
t.Errorf("%d: expected comparison result %d; found %d", i, tt.result, res)
|
||||
}
|
||||
if level != tt.level {
|
||||
t.Errorf("%d: expected level %d; found %d", i, tt.level, level)
|
||||
}
|
||||
}
|
||||
test(tt, tt.a, tt.b)
|
||||
test(tt, append(tt.a, extra[0]), append(tt.b, extra[0]))
|
||||
}
|
||||
}
|
||||
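The behaviour exercised by TestCompareWeights can be reproduced with a small standalone sketch. It assumes plain [][]int weight slices instead of rawCE and a local level type standing in for colltab.Level; both are simplifications for illustration only. The point it shows is that zero (ignorable) weights are skipped independently at each level, so two sequences that interleave their weights differently can still compare as equal.

```go
package main

import "fmt"

// level is a local stand-in for colltab.Level (an assumption of this sketch).
type level int

const (
	primary level = iota
	secondary
	tertiary
	quaternary
	identity
)

// nextVal returns the index and value of the first non-zero weight
// at level l, starting at position i.
func nextVal(elems [][]int, i int, l level) (int, int) {
	for ; i < len(elems) && elems[i][l] == 0; i++ {
	}
	if i < len(elems) {
		return i, elems[i][l]
	}
	return i, 0
}

// compareWeights compares a and b level by level, ignoring zero weights.
func compareWeights(a, b [][]int) (int, level) {
	for l := primary; l < identity; l++ {
		var va, vb int
		for ia, ib := 0, 0; ia < len(a) || ib < len(b); ia, ib = ia+1, ib+1 {
			ia, va = nextVal(a, ia, l)
			ib, vb = nextVal(b, ib, l)
			if va != vb {
				if va < vb {
					return -1, l
				}
				return 1, l
			}
		}
	}
	return 0, identity
}

func main() {
	a := [][]int{{100, 0, 0, 0}, {0, 20, 5, 0}}
	b := [][]int{{0, 20, 5, 0}, {100, 0, 0, 0}}
	fmt.Println(compareWeights(a, b))
	fmt.Println(compareWeights([][]int{{100, 20, 5, 0}}, [][]int{{100, 21, 5, 0}}))
}
```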
309
vendor/golang.org/x/text/collate/build/contract.go
generated
vendored
Normal file
|
|
@ -0,0 +1,309 @@
|
|||
// Copyright 2012 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package build
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"io"
|
||||
"reflect"
|
||||
"sort"
|
||||
"strings"
|
||||
|
||||
"golang.org/x/text/internal/colltab"
|
||||
)
|
||||
|
||||
// This file contains code for detecting contractions and generating
|
||||
// the necessary tables.
|
||||
// Any Unicode Collation Algorithm (UCA) table entry that has more than
|
||||
// one rune on the left-hand side is called a contraction.
|
||||
// See http://www.unicode.org/reports/tr10/#Contractions for more details.
|
||||
//
|
||||
// We define the following terms:
|
||||
// initial: a rune that appears as the first rune in a contraction.
|
||||
// suffix: a sequence of runes succeeding the initial rune
|
||||
// in a given contraction.
|
||||
// non-initial: a rune that appears in a suffix.
|
||||
//
|
||||
// A rune may be both an initial and a non-initial and may be so in
|
||||
// many contractions. An initial may typically also appear by itself.
|
||||
// In case of ambiguities, the UCA requires we match the longest
|
||||
// contraction.
|
||||
//
|
||||
// Many contraction rules share the same set of possible suffixes.
|
||||
// We store sets of suffixes in a trie that associates an index with
|
||||
// each suffix in the set. This index can be used to look up a
|
||||
// collation element associated with the (starter rune, suffix) pair.
|
||||
//
|
||||
// The trie is defined on a UTF-8 byte sequence.
|
||||
// The overall trie is represented as an array of ctEntries. Each node of the trie
|
||||
// is represented as a subsequence of ctEntries, where each entry corresponds to
|
||||
// a possible match of a next character in the search string. An entry
|
||||
// also includes the length and offset to the next sequence of entries
|
||||
// to check in case of a match.
|
||||
|
||||
const (
|
||||
final = 0
|
||||
noIndex = 0xFF
|
||||
)
|
||||
|
||||
// ctEntry associates to a matching byte an offset and/or next sequence of
|
||||
// bytes to check. A ctEntry c is called final if a match means that the
|
||||
// longest suffix has been found. An entry c is final if c.N == 0.
|
||||
// A single final entry can match a range of characters to an offset.
|
||||
// A non-final entry always matches a single byte. Note that a non-final
|
||||
// entry might still resemble a completed suffix.
|
||||
// Examples:
|
||||
// The suffix strings "ab" and "ac" can be represented as:
|
||||
// []ctEntry{
|
||||
// {'a', 1, 1, noIndex}, // 'a' by itself does not match, so i is 0xFF.
|
||||
// {'b', 'c', 0, 1}, // "ab" -> 1, "ac" -> 2
|
||||
// }
|
||||
//
|
||||
// The suffix strings "ab", "abc", "abd", and "abcd" can be represented as:
|
||||
// []ctEntry{
|
||||
// {'a', 1, 1, noIndex}, // 'a' must be followed by 'b'.
|
||||
// {'b', 1, 2, 1}, // "ab" -> 1, may be followed by 'c' or 'd'.
|
||||
// {'d', 'd', final, 3}, // "abd" -> 3
|
||||
// {'c', 4, 1, 2}, // "abc" -> 2, may be followed by 'd'.
|
||||
// {'d', 'd', final, 4}, // "abcd" -> 4
|
||||
// }
|
||||
// See genStateTests in contract_test.go for more examples.
|
||||
type ctEntry struct {
|
||||
L uint8 // non-final: byte value to match; final: lowest match in range.
|
||||
H uint8 // non-final: relative index to next block; final: highest match in range.
|
||||
N uint8 // non-final: length of next block; final: final
|
||||
I uint8 // result offset. Will be noIndex if more bytes are needed to complete.
|
||||
}
|
||||
|
||||
// contractTrieSet holds a set of contraction tries. The tries are stored
|
||||
// consecutively in the entry field.
|
||||
type contractTrieSet []struct{ l, h, n, i uint8 }
|
||||
|
||||
// ctHandle is used to identify a trie in the trie set, consisting of an offset
|
||||
// in the array and the size of the first node.
|
||||
type ctHandle struct {
|
||||
index, n int
|
||||
}
|
||||
|
||||
// appendTrie adds a new trie for the given suffixes to the trie set and returns
|
||||
// a handle to it. The handle will be invalid on error.
|
||||
func appendTrie(ct *colltab.ContractTrieSet, suffixes []string) (ctHandle, error) {
|
||||
es := make([]stridx, len(suffixes))
|
||||
for i, s := range suffixes {
|
||||
es[i].str = s
|
||||
}
|
||||
sort.Sort(offsetSort(es))
|
||||
for i := range es {
|
||||
es[i].index = i + 1
|
||||
}
|
||||
sort.Sort(genidxSort(es))
|
||||
i := len(*ct)
|
||||
n, err := genStates(ct, es)
|
||||
if err != nil {
|
||||
*ct = (*ct)[:i]
|
||||
return ctHandle{}, err
|
||||
}
|
||||
return ctHandle{i, n}, nil
|
||||
}
|
||||
|
||||
// genStates generates ctEntries for a given suffix set and returns
|
||||
// the number of entries for the first node.
|
||||
func genStates(ct *colltab.ContractTrieSet, sis []stridx) (int, error) {
|
||||
if len(sis) == 0 {
|
||||
return 0, fmt.Errorf("genStates: list of suffixes must be non-empty")
|
||||
}
|
||||
start := len(*ct)
|
||||
// create entries for differing first bytes.
|
||||
for _, si := range sis {
|
||||
s := si.str
|
||||
if len(s) == 0 {
|
||||
continue
|
||||
}
|
||||
added := false
|
||||
c := s[0]
|
||||
if len(s) > 1 {
|
||||
for j := len(*ct) - 1; j >= start; j-- {
|
||||
if (*ct)[j].L == c {
|
||||
added = true
|
||||
break
|
||||
}
|
||||
}
|
||||
if !added {
|
||||
*ct = append(*ct, ctEntry{L: c, I: noIndex})
|
||||
}
|
||||
} else {
|
||||
for j := len(*ct) - 1; j >= start; j-- {
|
||||
// Update the offset for longer suffixes with the same byte.
|
||||
if (*ct)[j].L == c {
|
||||
(*ct)[j].I = uint8(si.index)
|
||||
added = true
|
||||
}
|
||||
// Extend range of final ctEntry, if possible.
|
||||
if (*ct)[j].H+1 == c {
|
||||
(*ct)[j].H = c
|
||||
added = true
|
||||
}
|
||||
}
|
||||
if !added {
|
||||
*ct = append(*ct, ctEntry{L: c, H: c, N: final, I: uint8(si.index)})
|
||||
}
|
||||
}
|
||||
}
|
||||
n := len(*ct) - start
|
||||
// Append nodes for the remainder of the suffixes for each ctEntry.
|
||||
sp := 0
|
||||
for i, end := start, len(*ct); i < end; i++ {
|
||||
fe := (*ct)[i]
|
||||
if fe.H == 0 { // uninitialized non-final
|
||||
ln := len(*ct) - start - n
|
||||
if ln > 0xFF {
|
||||
return 0, fmt.Errorf("genStates: relative block offset too large: %d > 255", ln)
|
||||
}
|
||||
fe.H = uint8(ln)
|
||||
// Find first non-final strings with same byte as current entry.
|
||||
for ; sis[sp].str[0] != fe.L; sp++ {
|
||||
}
|
||||
se := sp + 1
|
||||
for ; se < len(sis) && len(sis[se].str) > 1 && sis[se].str[0] == fe.L; se++ {
|
||||
}
|
||||
sl := sis[sp:se]
|
||||
sp = se
|
||||
for i, si := range sl {
|
||||
sl[i].str = si.str[1:]
|
||||
}
|
||||
nn, err := genStates(ct, sl)
|
||||
if err != nil {
|
||||
return 0, err
|
||||
}
|
||||
fe.N = uint8(nn)
|
||||
(*ct)[i] = fe
|
||||
}
|
||||
}
|
||||
sort.Sort(entrySort((*ct)[start : start+n]))
|
||||
return n, nil
|
||||
}
|
||||
|
||||
// There may be both a final and non-final entry for a byte if the byte
|
||||
// is implied in a range of matches in the final entry.
|
||||
// We need to ensure that the non-final entry comes first in that case.
|
||||
type entrySort colltab.ContractTrieSet
|
||||
|
||||
func (fe entrySort) Len() int { return len(fe) }
|
||||
func (fe entrySort) Swap(i, j int) { fe[i], fe[j] = fe[j], fe[i] }
|
||||
func (fe entrySort) Less(i, j int) bool {
|
||||
return fe[i].L > fe[j].L
|
||||
}
|
||||
|
||||
// stridx is used for sorting suffixes and their associated offsets.
|
||||
type stridx struct {
|
||||
str string
|
||||
index int
|
||||
}
|
||||
|
||||
// For computing the offsets, we first sort by size, and then by string.
|
||||
// This ensures that strings that only differ in the last byte by 1
|
||||
// are sorted consecutively in increasing order such that they can
|
||||
// be packed as a range in a final ctEntry.
|
||||
type offsetSort []stridx
|
||||
|
||||
func (si offsetSort) Len() int { return len(si) }
|
||||
func (si offsetSort) Swap(i, j int) { si[i], si[j] = si[j], si[i] }
|
||||
func (si offsetSort) Less(i, j int) bool {
|
||||
if len(si[i].str) != len(si[j].str) {
|
||||
return len(si[i].str) > len(si[j].str)
|
||||
}
|
||||
return si[i].str < si[j].str
|
||||
}
|
||||
|
||||
// For indexing, we want to ensure that strings are sorted in string order, where
|
||||
// for strings with the same prefix, we put longer strings before shorter ones.
|
||||
type genidxSort []stridx
|
||||
|
||||
func (si genidxSort) Len() int { return len(si) }
|
||||
func (si genidxSort) Swap(i, j int) { si[i], si[j] = si[j], si[i] }
|
||||
func (si genidxSort) Less(i, j int) bool {
|
||||
if strings.HasPrefix(si[j].str, si[i].str) {
|
||||
return false
|
||||
}
|
||||
if strings.HasPrefix(si[i].str, si[j].str) {
|
||||
return true
|
||||
}
|
||||
return si[i].str < si[j].str
|
||||
}
|
||||
|
||||
// lookup matches the longest suffix in str and returns the associated offset
|
||||
// and the number of bytes consumed.
|
||||
func lookup(ct *colltab.ContractTrieSet, h ctHandle, str []byte) (index, ns int) {
|
||||
states := (*ct)[h.index:]
|
||||
p := 0
|
||||
n := h.n
|
||||
for i := 0; i < n && p < len(str); {
|
||||
e := states[i]
|
||||
c := str[p]
|
||||
if c >= e.L {
|
||||
if e.L == c {
|
||||
p++
|
||||
if e.I != noIndex {
|
||||
index, ns = int(e.I), p
|
||||
}
|
||||
if e.N != final {
|
||||
// set to new state
|
||||
i, states, n = 0, states[int(e.H)+n:], int(e.N)
|
||||
} else {
|
||||
return
|
||||
}
|
||||
continue
|
||||
} else if e.N == final && c <= e.H {
|
||||
p++
|
||||
return int(c-e.L) + int(e.I), p
|
||||
}
|
||||
}
|
||||
i++
|
||||
}
|
||||
return
|
||||
}
|
||||
|
||||
// print writes the contractTrieSet t as compilable Go code to w. It returns
|
||||
// the total number of bytes written and the size of the resulting data structure in bytes.
|
||||
func print(t *colltab.ContractTrieSet, w io.Writer, name string) (n, size int, err error) {
|
||||
update3 := func(nn, sz int, e error) {
|
||||
n += nn
|
||||
if err == nil {
|
||||
err = e
|
||||
}
|
||||
size += sz
|
||||
}
|
||||
update2 := func(nn int, e error) { update3(nn, 0, e) }
|
||||
|
||||
update3(printArray(*t, w, name))
|
||||
update2(fmt.Fprintf(w, "var %sContractTrieSet = ", name))
|
||||
update3(printStruct(*t, w, name))
|
||||
update2(fmt.Fprintln(w))
|
||||
return
|
||||
}
|
||||
|
||||
func printArray(ct colltab.ContractTrieSet, w io.Writer, name string) (n, size int, err error) {
|
||||
p := func(f string, a ...interface{}) {
|
||||
nn, e := fmt.Fprintf(w, f, a...)
|
||||
n += nn
|
||||
if err == nil {
|
||||
err = e
|
||||
}
|
||||
}
|
||||
size = len(ct) * 4
|
||||
p("// %sCTEntries: %d entries, %d bytes\n", name, len(ct), size)
|
||||
p("var %sCTEntries = [%d]struct{L,H,N,I uint8}{\n", name, len(ct))
|
||||
for _, fe := range ct {
|
||||
p("\t{0x%X, 0x%X, %d, %d},\n", fe.L, fe.H, fe.N, fe.I)
|
||||
}
|
||||
p("}\n")
|
||||
return
|
||||
}
|
||||
|
||||
func printStruct(ct colltab.ContractTrieSet, w io.Writer, name string) (n, size int, err error) {
|
||||
n, err = fmt.Fprintf(w, "colltab.ContractTrieSet( %sCTEntries[:] )", name)
|
||||
size = int(reflect.TypeOf(ct).Size())
|
||||
return
|
||||
}
|
||||
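The longest-match walk done by lookup in contract.go can be sketched without the colltab dependency. This is a simplified restatement under assumptions: ctEntry here is a local copy of the four-byte entry layout, the handle is reduced to the size of the first block, and sampleTrie is the expected trie from the fourth genStateTests case in contract_test.go (suffixes "abc", "abd", "ab", "ac", "a", "b").

```go
package main

import "fmt"

const (
	final   = 0
	noIndex = 0xFF
)

// ctEntry is a local copy of the entry layout used by the trie.
type ctEntry struct{ L, H, N, I uint8 }

// sampleTrie encodes the suffixes abc->1, abd->2, ab->3, ac->4, a->5, b->6.
var sampleTrie = []ctEntry{
	{'b', 'b', final, 6},
	{'a', 0, 2, 5},
	{'c', 'c', final, 4},
	{'b', 0, 1, 3},
	{'c', 'd', final, 1},
}

// lookup returns the index of the longest matching suffix in str and the
// number of bytes consumed; n is the size of the first block.
func lookup(states []ctEntry, n int, str []byte) (index, ns int) {
	p := 0
	for i := 0; i < n && p < len(str); {
		e := states[i]
		c := str[p]
		switch {
		case c == e.L:
			p++
			if e.I != noIndex {
				index, ns = int(e.I), p // remember the longest match so far
			}
			if e.N == final {
				return index, ns
			}
			// Descend: H is relative to the end of the current block.
			i, states, n = 0, states[int(e.H)+n:], int(e.N)
		case c > e.L && e.N == final && c <= e.H:
			// A final entry matches a byte range; the offset within
			// the range is added to the base index.
			return int(c-e.L) + int(e.I), p + 1
		default:
			i++
		}
	}
	return index, ns
}

func main() {
	for _, s := range []string{"abd", "abX", "ac", "b", "x"} {
		idx, n := lookup(sampleTrie, 2, []byte(s))
		fmt.Printf("%q -> index %d, %d bytes\n", s, idx, n)
	}
}
```

An input such as "abX" falls back to the last recorded match ("ab"), which is how the longest-contraction rule from the UCA is honoured.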
266
vendor/golang.org/x/text/collate/build/contract_test.go
generated
vendored
Normal file
|
|
@ -0,0 +1,266 @@
|
|||
// Copyright 2012 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package build
|
||||
|
||||
import (
|
||||
"bytes"
|
||||
"sort"
|
||||
"testing"
|
||||
|
||||
"golang.org/x/text/internal/colltab"
|
||||
)
|
||||
|
||||
var largetosmall = []stridx{
|
||||
{"a", 5},
|
||||
{"ab", 4},
|
||||
{"abc", 3},
|
||||
{"abcd", 2},
|
||||
{"abcde", 1},
|
||||
{"abcdef", 0},
|
||||
}
|
||||
|
||||
var offsetSortTests = [][]stridx{
|
||||
{
|
||||
{"bcde", 1},
|
||||
{"bc", 5},
|
||||
{"ab", 4},
|
||||
{"bcd", 3},
|
||||
{"abcd", 0},
|
||||
{"abc", 2},
|
||||
},
|
||||
largetosmall,
|
||||
}
|
||||
|
||||
func TestOffsetSort(t *testing.T) {
|
||||
for i, st := range offsetSortTests {
|
||||
sort.Sort(offsetSort(st))
|
||||
for j, si := range st {
|
||||
if j != si.index {
|
||||
t.Errorf("%d: failed: %v", i, st)
|
||||
}
|
||||
}
|
||||
}
|
||||
for i, tt := range genStateTests {
|
||||
// ensure input is well-formed
|
||||
sort.Sort(offsetSort(tt.in))
|
||||
for j, si := range tt.in {
|
||||
if si.index != j+1 {
|
||||
t.Errorf("%dth sort failed: %v", i, tt.in)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
var genidxtest1 = []stridx{
|
||||
{"bcde", 3},
|
||||
{"bc", 6},
|
||||
{"ab", 2},
|
||||
{"bcd", 5},
|
||||
{"abcd", 0},
|
||||
{"abc", 1},
|
||||
{"bcdf", 4},
|
||||
}
|
||||
|
||||
var genidxSortTests = [][]stridx{
|
||||
genidxtest1,
|
||||
largetosmall,
|
||||
}
|
||||
|
||||
func TestGenIdxSort(t *testing.T) {
|
||||
for i, st := range genidxSortTests {
|
||||
sort.Sort(genidxSort(st))
|
||||
for j, si := range st {
|
||||
if j != si.index {
|
||||
t.Errorf("%dth sort failed %v", i, st)
|
||||
break
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
var entrySortTests = []colltab.ContractTrieSet{
|
||||
{
|
||||
{10, 0, 1, 3},
|
||||
{99, 0, 1, 0},
|
||||
{20, 50, 0, 2},
|
||||
{30, 0, 1, 1},
|
||||
},
|
||||
}
|
||||
|
||||
func TestEntrySort(t *testing.T) {
|
||||
for i, et := range entrySortTests {
|
||||
sort.Sort(entrySort(et))
|
||||
for j, fe := range et {
|
||||
if j != int(fe.I) {
|
||||
t.Errorf("%dth sort failed %v", i, et)
|
||||
break
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
type GenStateTest struct {
|
||||
in []stridx
|
||||
firstBlockLen int
|
||||
out colltab.ContractTrieSet
|
||||
}
|
||||
|
||||
var genStateTests = []GenStateTest{
|
||||
{[]stridx{
|
||||
{"abc", 1},
|
||||
},
|
||||
1,
|
||||
colltab.ContractTrieSet{
|
||||
{'a', 0, 1, noIndex},
|
||||
{'b', 0, 1, noIndex},
|
||||
{'c', 'c', final, 1},
|
||||
},
|
||||
},
|
||||
{[]stridx{
|
||||
{"abc", 1},
|
||||
{"abd", 2},
|
||||
{"abe", 3},
|
||||
},
|
||||
1,
|
||||
colltab.ContractTrieSet{
|
||||
{'a', 0, 1, noIndex},
|
||||
{'b', 0, 1, noIndex},
|
||||
{'c', 'e', final, 1},
|
||||
},
|
||||
},
|
||||
{[]stridx{
|
||||
{"abc", 1},
|
||||
{"ab", 2},
|
||||
{"a", 3},
|
||||
},
|
||||
1,
|
||||
colltab.ContractTrieSet{
|
||||
{'a', 0, 1, 3},
|
||||
{'b', 0, 1, 2},
|
||||
{'c', 'c', final, 1},
|
||||
},
|
||||
},
|
||||
{[]stridx{
|
||||
{"abc", 1},
|
||||
{"abd", 2},
|
||||
{"ab", 3},
|
||||
{"ac", 4},
|
||||
{"a", 5},
|
||||
{"b", 6},
|
||||
},
|
||||
2,
|
||||
colltab.ContractTrieSet{
|
||||
{'b', 'b', final, 6},
|
||||
{'a', 0, 2, 5},
|
||||
{'c', 'c', final, 4},
|
||||
{'b', 0, 1, 3},
|
||||
{'c', 'd', final, 1},
|
||||
},
|
||||
},
|
||||
{[]stridx{
|
||||
{"bcde", 2},
|
||||
{"bc", 7},
|
||||
{"ab", 6},
|
||||
{"bcd", 5},
|
||||
{"abcd", 1},
|
||||
{"abc", 4},
|
||||
{"bcdf", 3},
|
||||
},
|
||||
2,
|
||||
colltab.ContractTrieSet{
|
||||
{'b', 3, 1, noIndex},
|
||||
{'a', 0, 1, noIndex},
|
||||
{'b', 0, 1, 6},
|
||||
{'c', 0, 1, 4},
|
||||
{'d', 'd', final, 1},
|
||||
{'c', 0, 1, 7},
|
||||
{'d', 0, 1, 5},
|
||||
{'e', 'f', final, 2},
|
||||
},
|
||||
},
|
||||
}
|
||||
|
||||
func TestGenStates(t *testing.T) {
|
||||
for i, tt := range genStateTests {
|
||||
si := []stridx{}
|
||||
for _, e := range tt.in {
|
||||
si = append(si, e)
|
||||
}
|
||||
// ensure input is well-formed
|
||||
sort.Sort(genidxSort(si))
|
||||
ct := colltab.ContractTrieSet{}
|
||||
n, _ := genStates(&ct, si)
|
||||
if nn := tt.firstBlockLen; nn != n {
|
||||
t.Errorf("%d: block len %v; want %v", i, n, nn)
|
||||
}
|
||||
if lv, lw := len(ct), len(tt.out); lv != lw {
|
||||
t.Errorf("%d: len %v; want %v", i, lv, lw)
|
||||
continue
|
||||
}
|
||||
for j, fe := range tt.out {
|
||||
const msg = "%d:%d: value %s=%v; want %v"
|
||||
if fe.L != ct[j].L {
|
||||
t.Errorf(msg, i, j, "l", ct[j].L, fe.L)
|
||||
}
|
||||
if fe.H != ct[j].H {
|
||||
t.Errorf(msg, i, j, "h", ct[j].H, fe.H)
|
||||
}
|
||||
if fe.N != ct[j].N {
|
||||
t.Errorf(msg, i, j, "n", ct[j].N, fe.N)
|
||||
}
|
||||
if fe.I != ct[j].I {
|
||||
t.Errorf(msg, i, j, "i", ct[j].I, fe.I)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestLookupContraction(t *testing.T) {
|
||||
for i, tt := range genStateTests {
|
||||
input := []string{}
|
||||
for _, e := range tt.in {
|
||||
input = append(input, e.str)
|
||||
}
|
||||
cts := colltab.ContractTrieSet{}
|
||||
h, _ := appendTrie(&cts, input)
|
||||
for j, si := range tt.in {
|
||||
str := si.str
|
||||
for _, s := range []string{str, str + "X"} {
|
||||
msg := "%d:%d: %s(%s) %v; want %v"
|
||||
idx, sn := lookup(&cts, h, []byte(s))
|
||||
if idx != si.index {
|
||||
t.Errorf(msg, i, j, "index", s, idx, si.index)
|
||||
}
|
||||
if sn != len(str) {
|
||||
t.Errorf(msg, i, j, "sn", s, sn, len(str))
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestPrintContractionTrieSet(t *testing.T) {
|
||||
testdata := colltab.ContractTrieSet(genStateTests[4].out)
|
||||
buf := &bytes.Buffer{}
|
||||
print(&testdata, buf, "test")
|
||||
if contractTrieOutput != buf.String() {
|
||||
t.Errorf("output differs; found\n%s", buf.String())
|
||||
println(string(buf.Bytes()))
|
||||
}
|
||||
}
|
||||
|
||||
const contractTrieOutput = `// testCTEntries: 8 entries, 32 bytes
|
||||
var testCTEntries = [8]struct{L,H,N,I uint8}{
|
||||
{0x62, 0x3, 1, 255},
|
||||
{0x61, 0x0, 1, 255},
|
||||
{0x62, 0x0, 1, 6},
|
||||
{0x63, 0x0, 1, 4},
|
||||
{0x64, 0x64, 0, 1},
|
||||
{0x63, 0x0, 1, 7},
|
||||
{0x64, 0x0, 1, 5},
|
||||
{0x65, 0x66, 0, 2},
|
||||
}
|
||||
var testContractTrieSet = colltab.ContractTrieSet( testCTEntries[:] )
|
||||
`
|
||||
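The ordering checked by TestOffsetSort can be illustrated with a short sketch. It assumes sort.Slice in place of the offsetSort type and a hypothetical helper sortForOffsets that also assigns the 1-based indexes the way appendTrie does. Longer suffixes come first, and suffixes of equal length are in string order, so neighbours that differ only in their last byte receive consecutive indexes and can share one final range entry.

```go
package main

import (
	"fmt"
	"sort"
)

// stridx mirrors the suffix/index pair used in contract.go.
type stridx struct {
	str   string
	index int
}

// sortForOffsets (hypothetical helper) orders suffixes by descending
// length, then by string, and assigns 1-based indexes in that order.
func sortForOffsets(es []stridx) {
	sort.Slice(es, func(i, j int) bool {
		if len(es[i].str) != len(es[j].str) {
			return len(es[i].str) > len(es[j].str)
		}
		return es[i].str < es[j].str
	})
	for i := range es {
		es[i].index = i + 1
	}
}

func main() {
	es := []stridx{{str: "ab"}, {str: "abd"}, {str: "a"}, {str: "abc"}, {str: "ac"}, {str: "b"}}
	sortForOffsets(es)
	for _, e := range es {
		fmt.Printf("%q -> %d\n", e.str, e.index)
	}
}
```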
393
vendor/golang.org/x/text/collate/build/order.go
generated
vendored
Normal file
|
|
@ -0,0 +1,393 @@
|
|||
// Copyright 2012 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package build
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"log"
|
||||
"sort"
|
||||
"strings"
|
||||
"unicode"
|
||||
|
||||
"golang.org/x/text/internal/colltab"
|
||||
"golang.org/x/text/unicode/norm"
|
||||
)
|
||||
|
||||
type logicalAnchor int
|
||||
|
||||
const (
|
||||
firstAnchor logicalAnchor = -1
|
||||
noAnchor = 0
|
||||
lastAnchor = 1
|
||||
)
|
||||
|
||||
// entry is used to keep track of a single entry in the collation element table
|
||||
// during building. Examples of entries can be found in the Default Unicode
|
||||
// Collation Element Table.
|
||||
// See http://www.unicode.org/Public/UCA/6.0.0/allkeys.txt.
|
||||
type entry struct {
|
||||
str string // same as string(runes)
|
||||
runes []rune
|
||||
elems []rawCE // the collation elements
|
||||
extend string // weights of extend to be appended to elems
|
||||
before bool // weights relative to next instead of previous.
|
||||
lock bool // entry is used in extension and can no longer be moved.
|
||||
|
||||
// prev, next, and level are used to keep track of tailorings.
|
||||
prev, next *entry
|
||||
level colltab.Level // next differs at this level
|
||||
skipRemove bool // do not unlink when removed
|
||||
|
||||
decompose bool // can use NFKD decomposition to generate elems
|
||||
exclude bool // do not include in table
|
||||
implicit bool // derived, is not included in the list
|
||||
modified bool // entry was modified in tailoring
|
||||
logical logicalAnchor
|
||||
|
||||
expansionIndex int // used to store index into expansion table
|
||||
contractionHandle ctHandle
|
||||
contractionIndex int // index into contraction elements
|
||||
}
|
||||
|
||||
func (e *entry) String() string {
|
||||
return fmt.Sprintf("%X (%q) -> %X (ch:%x; ci:%d, ei:%d)",
|
||||
e.runes, e.str, e.elems, e.contractionHandle, e.contractionIndex, e.expansionIndex)
|
||||
}
|
||||
|
||||
func (e *entry) skip() bool {
|
||||
return e.contraction()
|
||||
}
|
||||
|
||||
func (e *entry) expansion() bool {
|
||||
return !e.decompose && len(e.elems) > 1
|
||||
}
|
||||
|
||||
func (e *entry) contraction() bool {
|
||||
return len(e.runes) > 1
|
||||
}
|
||||
|
||||
func (e *entry) contractionStarter() bool {
|
||||
return e.contractionHandle.n != 0
|
||||
}
|
||||
|
||||
// nextIndexed gets the next entry that needs to be stored in the table.
|
||||
// It returns the entry and the collation level at which the next entry differs
|
||||
// from the current entry.
|
||||
// Entries that can be explicitly derived and logical reset positions are
|
||||
// examples of entries that will not be indexed.
|
||||
func (e *entry) nextIndexed() (*entry, colltab.Level) {
|
||||
level := e.level
|
||||
for e = e.next; e != nil && (e.exclude || len(e.elems) == 0); e = e.next {
|
||||
if e.level < level {
|
||||
level = e.level
|
||||
}
|
||||
}
|
||||
return e, level
|
||||
}
|
||||
|
||||
// remove unlinks entry e from the sorted chain and clears the collation
|
||||
// elements. e may not be at the front or end of the list. This should always
|
||||
// be the case, as the front and end of the list are always logical anchors,
|
||||
// which may not be removed.
|
||||
func (e *entry) remove() {
|
||||
if e.logical != noAnchor {
|
||||
log.Fatalf("may not remove anchor %q", e.str)
|
||||
}
|
||||
// TODO: need to set e.prev.level to e.level if e.level is smaller?
|
||||
e.elems = nil
|
||||
if !e.skipRemove {
|
||||
if e.prev != nil {
|
||||
e.prev.next = e.next
|
||||
}
|
||||
if e.next != nil {
|
||||
e.next.prev = e.prev
|
||||
}
|
||||
}
|
||||
e.skipRemove = false
|
||||
}
|
||||
|
||||
// insertAfter inserts n after e.
|
||||
func (e *entry) insertAfter(n *entry) {
|
||||
if e == n {
|
||||
panic("e == anchor")
|
||||
}
|
||||
if e == nil {
|
||||
panic("unexpected nil anchor")
|
||||
}
|
||||
n.remove()
|
||||
n.decompose = false // redo decomposition test
|
||||
|
||||
n.next = e.next
|
||||
n.prev = e
|
||||
if e.next != nil {
|
||||
e.next.prev = n
|
||||
}
|
||||
e.next = n
|
||||
}
|
||||
|
||||
// insertBefore inserts n before e.
|
||||
func (e *entry) insertBefore(n *entry) {
|
||||
if e == n {
|
||||
panic("e == anchor")
|
||||
}
|
||||
if e == nil {
|
||||
panic("unexpected nil anchor")
|
||||
}
|
||||
n.remove()
|
||||
n.decompose = false // redo decomposition test
|
||||
|
||||
n.prev = e.prev
|
||||
n.next = e
|
||||
if e.prev != nil {
|
||||
e.prev.next = n
|
||||
}
|
||||
e.prev = n
|
||||
}
|
||||
|
||||
func (e *entry) encodeBase() (ce uint32, err error) {
|
||||
switch {
|
||||
case e.expansion():
|
||||
ce, err = makeExpandIndex(e.expansionIndex)
|
||||
default:
|
||||
if e.decompose {
|
||||
log.Fatal("decompose should be handled elsewhere")
|
||||
}
|
||||
ce, err = makeCE(e.elems[0])
|
||||
}
|
||||
return
|
||||
}
|
||||
|
||||
func (e *entry) encode() (ce uint32, err error) {
|
||||
if e.skip() {
|
||||
log.Fatal("cannot build colElem for entry that should be skipped")
|
||||
}
|
||||
switch {
|
||||
case e.decompose:
|
||||
t1 := e.elems[0].w[2]
|
||||
t2 := 0
|
||||
if len(e.elems) > 1 {
|
||||
t2 = e.elems[1].w[2]
|
||||
}
|
||||
ce, err = makeDecompose(t1, t2)
|
||||
case e.contractionStarter():
|
||||
ce, err = makeContractIndex(e.contractionHandle, e.contractionIndex)
|
||||
default:
|
||||
if len(e.runes) > 1 {
|
||||
log.Fatal("colElem: contractions are handled in contraction trie")
|
||||
}
|
||||
ce, err = e.encodeBase()
|
||||
}
|
||||
return
|
||||
}
|
||||
|
||||
// entryLess returns true if a sorts before b and false otherwise.
|
||||
func entryLess(a, b *entry) bool {
|
||||
if res, _ := compareWeights(a.elems, b.elems); res != 0 {
|
||||
return res == -1
|
||||
}
|
||||
if a.logical != noAnchor {
|
||||
return a.logical == firstAnchor
|
||||
}
|
||||
if b.logical != noAnchor {
|
||||
return b.logical == lastAnchor
|
||||
}
|
||||
return a.str < b.str
|
||||
}
|
||||
|
||||
type sortedEntries []*entry
|
||||
|
||||
func (s sortedEntries) Len() int {
|
||||
return len(s)
|
||||
}
|
||||
|
||||
func (s sortedEntries) Swap(i, j int) {
|
||||
s[i], s[j] = s[j], s[i]
|
||||
}
|
||||
|
||||
func (s sortedEntries) Less(i, j int) bool {
|
||||
return entryLess(s[i], s[j])
|
||||
}
|
||||
|
||||
type ordering struct {
|
||||
id string
|
||||
entryMap map[string]*entry
|
||||
ordered []*entry
|
||||
handle *trieHandle
|
||||
}
|
||||
|
||||
// insert inserts e into both entryMap and ordered.
|
||||
// Note that insert simply appends e to ordered. To reattain a sorted
|
||||
// order, o.sort() should be called.
|
||||
func (o *ordering) insert(e *entry) {
|
||||
if e.logical == noAnchor {
|
||||
o.entryMap[e.str] = e
|
||||
} else {
|
||||
// Use key format as used in UCA rules.
|
||||
o.entryMap[fmt.Sprintf("[%s]", e.str)] = e
|
||||
// Also add index entry for XML format.
|
||||
o.entryMap[fmt.Sprintf("<%s/>", strings.Replace(e.str, " ", "_", -1))] = e
|
||||
}
|
||||
o.ordered = append(o.ordered, e)
|
||||
}
|
||||
|
||||
// newEntry creates a new entry for the given info and inserts it into
|
||||
// the index.
|
||||
func (o *ordering) newEntry(s string, ces []rawCE) *entry {
|
||||
e := &entry{
|
||||
runes: []rune(s),
|
||||
elems: ces,
|
||||
str: s,
|
||||
}
|
||||
o.insert(e)
|
||||
return e
|
||||
}
|
||||
|
||||
// find looks up and returns the entry for the given string.
|
||||
// It returns nil if str is not in the index and if an implicit value
|
||||
// cannot be derived, that is, if str represents more than one rune.
|
||||
func (o *ordering) find(str string) *entry {
|
||||
e := o.entryMap[str]
|
||||
if e == nil {
|
||||
r := []rune(str)
|
||||
if len(r) == 1 {
|
||||
const (
|
||||
firstHangul = 0xAC00
|
||||
lastHangul = 0xD7A3
|
||||
)
|
||||
if r[0] >= firstHangul && r[0] <= lastHangul {
|
||||
ce := []rawCE{}
|
||||
nfd := norm.NFD.String(str)
|
||||
for _, r := range nfd {
|
||||
ce = append(ce, o.find(string(r)).elems...)
|
||||
}
|
||||
e = o.newEntry(nfd, ce)
|
||||
} else {
|
||||
e = o.newEntry(string(r[0]), []rawCE{
|
||||
{w: []int{
|
||||
implicitPrimary(r[0]),
|
||||
defaultSecondary,
|
||||
defaultTertiary,
|
||||
int(r[0]),
|
||||
},
|
||||
},
|
||||
})
|
||||
e.modified = true
|
||||
}
|
||||
e.exclude = true // do not index implicits
|
||||
}
|
||||
}
|
||||
return e
|
||||
}
|
||||
|
||||
// makeRootOrdering returns a newly initialized ordering value and populates
|
||||
// it with a set of logical reset points that can be used as anchors.
|
||||
// The anchors first_tertiary_ignorable and __END__ will always sort at
|
||||
// the beginning and end, respectively. This means that prev and next are non-nil
|
||||
// for any indexed entry.
|
||||
func makeRootOrdering() ordering {
|
||||
const max = unicode.MaxRune
|
||||
o := ordering{
|
||||
entryMap: make(map[string]*entry),
|
||||
}
|
||||
insert := func(typ logicalAnchor, s string, ce []int) {
|
||||
e := &entry{
|
||||
elems: []rawCE{{w: ce}},
|
||||
str: s,
|
||||
exclude: true,
|
||||
logical: typ,
|
||||
}
|
||||
o.insert(e)
|
||||
}
|
||||
insert(firstAnchor, "first tertiary ignorable", []int{0, 0, 0, 0})
|
||||
insert(lastAnchor, "last tertiary ignorable", []int{0, 0, 0, max})
|
||||
insert(lastAnchor, "last primary ignorable", []int{0, defaultSecondary, defaultTertiary, max})
|
||||
insert(lastAnchor, "last non ignorable", []int{maxPrimary, defaultSecondary, defaultTertiary, max})
|
||||
insert(lastAnchor, "__END__", []int{1 << maxPrimaryBits, defaultSecondary, defaultTertiary, max})
|
||||
return o
|
||||
}
|
||||
|
||||
// patchForInsert eliminates entries from the list with more than one collation element.
|
||||
// The next and prev fields of the eliminated entries still point to appropriate
|
||||
// values in the newly created list.
|
||||
// It requires that sort has been called.
|
||||
func (o *ordering) patchForInsert() {
|
||||
for i := 0; i < len(o.ordered)-1; {
|
||||
e := o.ordered[i]
|
||||
lev := e.level
|
||||
n := e.next
|
||||
for ; n != nil && len(n.elems) > 1; n = n.next {
|
||||
if n.level < lev {
|
||||
lev = n.level
|
||||
}
|
||||
n.skipRemove = true
|
||||
}
|
||||
for ; o.ordered[i] != n; i++ {
|
||||
o.ordered[i].level = lev
|
||||
o.ordered[i].next = n
|
||||
o.ordered[i+1].prev = e
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// clone copies all entries of o into a new ordering value.
|
||||
func (o *ordering) clone() *ordering {
|
||||
o.sort()
|
||||
oo := ordering{
|
||||
entryMap: make(map[string]*entry),
|
||||
}
|
||||
for _, e := range o.ordered {
|
||||
ne := &entry{
|
||||
runes: e.runes,
|
||||
elems: e.elems,
|
||||
str: e.str,
|
||||
decompose: e.decompose,
|
||||
exclude: e.exclude,
|
||||
logical: e.logical,
|
||||
}
|
||||
oo.insert(ne)
|
||||
}
|
||||
oo.sort() // link all entries.
|
||||
oo.patchForInsert()
|
||||
return &oo
|
||||
}
|
||||
|
||||
// front returns the first entry to be indexed.
|
||||
// It assumes that sort() has been called.
|
||||
func (o *ordering) front() *entry {
|
||||
e := o.ordered[0]
|
||||
if e.prev != nil {
|
||||
log.Panicf("unexpected first entry: %v", e)
|
||||
}
|
||||
// The first entry is always a logical position, which should not be indexed.
|
||||
e, _ = e.nextIndexed()
|
||||
return e
|
||||
}
|
||||
|
||||
// sort sorts all entries based on their collation elements and initializes
|
||||
// the prev, next, and level fields accordingly.
|
||||
func (o *ordering) sort() {
|
||||
sort.Sort(sortedEntries(o.ordered))
|
||||
l := o.ordered
|
||||
for i := 1; i < len(l); i++ {
|
||||
k := i - 1
|
||||
l[k].next = l[i]
|
||||
_, l[k].level = compareWeights(l[k].elems, l[i].elems)
|
||||
l[i].prev = l[k]
|
||||
}
|
||||
}
|
||||
|
||||
// genColElems generates a collation element array from the runes in str. This
|
||||
// assumes that all collation elements have already been added to the Builder.
|
||||
func (o *ordering) genColElems(str string) []rawCE {
|
||||
elems := []rawCE{}
|
||||
for _, r := range []rune(str) {
|
||||
for _, ce := range o.find(string(r)).elems {
|
||||
if ce.w[0] != 0 || ce.w[1] != 0 || ce.w[2] != 0 {
|
||||
elems = append(elems, ce)
|
||||
}
|
||||
}
|
||||
}
|
||||
return elems
|
||||
}
|
||||
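The list handling in order.go (remove, insertAfter, and the logical anchors at both ends) can be illustrated with a much smaller program. This is only a sketch: the `node` type, `link`, `walk`, and `demo` are hypothetical names made up for illustration, and the sketch ignores collation elements, levels, and the skipRemove flag that the real entry type carries.

```go
package main

import "fmt"

// node is a hypothetical stand-in for entry: it keeps only the list
// links, a label, and a flag marking logical anchors.
type node struct {
	name       string
	anchor     bool
	prev, next *node
}

// link chains the nodes in the given order, similar to what
// ordering.sort does after sorting the entries.
func link(ns ...*node) {
	for i := 1; i < len(ns); i++ {
		ns[i-1].next = ns[i]
		ns[i].prev = ns[i-1]
	}
}

// remove unlinks n from the chain. Anchors must never be removed, so
// the neighbours of a removable node are normally non-nil.
func (n *node) remove() {
	if n.anchor {
		panic("may not remove anchor " + n.name)
	}
	if n.prev != nil {
		n.prev.next = n.next
	}
	if n.next != nil {
		n.next.prev = n.prev
	}
}

// insertAfter moves m so that it directly follows n.
func (n *node) insertAfter(m *node) {
	if n == m {
		panic("cannot insert a node after itself")
	}
	m.remove()
	m.next = n.next
	m.prev = n
	if n.next != nil {
		n.next.prev = m
	}
	n.next = m
}

// walk returns the names from head to the end of the chain.
func walk(head *node) []string {
	var out []string
	for n := head; n != nil; n = n.next {
		out = append(out, n.name)
	}
	return out
}

// demo builds first, a, b, c, last and then moves a behind c.
func demo() []string {
	first := &node{name: "first", anchor: true}
	a, b, c := &node{name: "a"}, &node{name: "b"}, &node{name: "c"}
	last := &node{name: "last", anchor: true}
	link(first, a, b, c, last)
	c.insertAfter(a)
	return walk(first)
}

func main() {
	fmt.Println(demo()) // [first b c a last]
}
```

Because the anchors sit at both ends and cannot be removed, every movable node in this sketch always has two neighbours while it is linked, which is the same property the comments in order.go rely on.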
229
vendor/golang.org/x/text/collate/build/order_test.go
generated
vendored
Normal file
|
|
@ -0,0 +1,229 @@
|
|||
// Copyright 2012 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package build
|
||||
|
||||
import (
|
||||
"strconv"
|
||||
"testing"
|
||||
|
||||
"golang.org/x/text/internal/colltab"
|
||||
)
|
||||
|
||||
type entryTest struct {
|
||||
f func(in []int) (uint32, error)
|
||||
arg []int
|
||||
val uint32
|
||||
}
|
||||
|
||||
// makeList returns a list of entries of length n+2, with n normal
|
||||
// entries plus a leading and trailing anchor.
|
||||
func makeList(n int) []*entry {
|
||||
es := make([]*entry, n+2)
|
||||
weights := []rawCE{{w: []int{100, 20, 5, 0}}}
|
||||
for i := range es {
|
||||
runes := []rune{rune(i)}
|
||||
es[i] = &entry{
|
||||
runes: runes,
|
||||
elems: weights,
|
||||
}
|
||||
weights = nextWeight(colltab.Primary, weights)
|
||||
}
|
||||
for i := 1; i < len(es); i++ {
|
||||
es[i-1].next = es[i]
|
||||
es[i].prev = es[i-1]
|
||||
_, es[i-1].level = compareWeights(es[i-1].elems, es[i].elems)
|
||||
}
|
||||
es[0].exclude = true
|
||||
es[0].logical = firstAnchor
|
||||
es[len(es)-1].exclude = true
|
||||
es[len(es)-1].logical = lastAnchor
|
||||
return es
|
||||
}
|
||||
|
||||
func TestNextIndexed(t *testing.T) {
|
||||
const n = 5
|
||||
es := makeList(n)
|
||||
for i := int64(0); i < 1<<n; i++ {
|
||||
mask := strconv.FormatInt(i+(1<<n), 2)
|
||||
for i, c := range mask {
|
||||
es[i].exclude = c == '1'
|
||||
}
|
||||
e := es[0]
|
||||
for i, c := range mask {
|
||||
if c == '0' {
|
||||
e, _ = e.nextIndexed()
|
||||
if e != es[i] {
|
||||
t.Errorf("%d: expected entry %d; found %d", i, es[i].elems, e.elems)
|
||||
}
|
||||
}
|
||||
}
|
||||
if e, _ = e.nextIndexed(); e != nil {
|
||||
t.Errorf("%d: expected nil entry; found %d", i, e.elems)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestRemove(t *testing.T) {
|
||||
const n = 5
|
||||
for i := int64(0); i < 1<<n; i++ {
|
||||
es := makeList(n)
|
||||
mask := strconv.FormatInt(i+(1<<n), 2)
|
||||
for i, c := range mask {
|
||||
if c == '0' {
|
||||
es[i].remove()
|
||||
}
|
||||
}
|
||||
e := es[0]
|
||||
for i, c := range mask {
|
||||
if c == '1' {
|
||||
if e != es[i] {
|
||||
t.Errorf("%d: expected entry %d; found %d", i, es[i].elems, e.elems)
|
||||
}
|
||||
e, _ = e.nextIndexed()
|
||||
}
|
||||
}
|
||||
if e != nil {
|
||||
t.Errorf("%d: expected nil entry; found %d", i, e.elems)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// nextPerm generates the next permutation of the array. The starting
|
||||
// permutation is assumed to be a list of integers sorted in increasing order.
|
||||
// It returns false if there are no more permutations left.
|
||||
func nextPerm(a []int) bool {
|
||||
i := len(a) - 2
|
||||
for ; i >= 0; i-- {
|
||||
if a[i] < a[i+1] {
|
||||
break
|
||||
}
|
||||
}
|
||||
if i < 0 {
|
||||
return false
|
||||
}
|
||||
for j := len(a) - 1; j >= i; j-- {
|
||||
if a[j] > a[i] {
|
||||
a[i], a[j] = a[j], a[i]
|
||||
break
|
||||
}
|
||||
}
|
||||
for j := i + 1; j < (len(a)+i+1)/2; j++ {
|
||||
a[j], a[len(a)+i-j] = a[len(a)+i-j], a[j]
|
||||
}
|
||||
return true
|
||||
}
|
||||
|
||||
func TestInsertAfter(t *testing.T) {
|
||||
const n = 5
|
||||
orig := makeList(n)
|
||||
perm := make([]int, n)
|
||||
for i := range perm {
|
||||
perm[i] = i + 1
|
||||
}
|
||||
for ok := true; ok; ok = nextPerm(perm) {
|
||||
es := makeList(n)
|
||||
last := es[0]
|
||||
for _, i := range perm {
|
||||
last.insertAfter(es[i])
|
||||
last = es[i]
|
||||
}
|
||||
for _, e := range es {
|
||||
e.elems = es[0].elems
|
||||
}
|
||||
e := es[0]
|
||||
for _, i := range perm {
|
||||
e, _ = e.nextIndexed()
|
||||
if e.runes[0] != orig[i].runes[0] {
|
||||
t.Errorf("%d:%d: expected entry %X; found %X", perm, i, orig[i].runes, e.runes)
|
||||
break
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestInsertBefore(t *testing.T) {
|
||||
const n = 5
|
||||
orig := makeList(n)
|
||||
perm := make([]int, n)
|
||||
for i := range perm {
|
||||
perm[i] = i + 1
|
||||
}
|
||||
for ok := true; ok; ok = nextPerm(perm) {
|
||||
es := makeList(n)
|
||||
last := es[len(es)-1]
|
||||
for _, i := range perm {
|
||||
last.insertBefore(es[i])
|
||||
last = es[i]
|
||||
}
|
||||
for _, e := range es {
|
||||
e.elems = es[0].elems
|
||||
}
|
||||
e := es[0]
|
||||
for i := n - 1; i >= 0; i-- {
|
||||
e, _ = e.nextIndexed()
|
||||
if e.runes[0] != rune(perm[i]) {
|
||||
t.Errorf("%d:%d: expected entry %X; found %X", perm, i, orig[i].runes, e.runes)
|
||||
break
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
type entryLessTest struct {
|
||||
a, b *entry
|
||||
res bool
|
||||
}
|
||||
|
||||
var (
|
||||
w1 = []rawCE{{w: []int{100, 20, 5, 5}}}
|
||||
w2 = []rawCE{{w: []int{101, 20, 5, 5}}}
|
||||
)
|
||||
|
||||
var entryLessTests = []entryLessTest{
|
||||
{&entry{str: "a", elems: w1},
|
||||
&entry{str: "a", elems: w1},
|
||||
false,
|
||||
},
|
||||
{&entry{str: "a", elems: w1},
|
||||
&entry{str: "a", elems: w2},
|
||||
true,
|
||||
},
|
||||
{&entry{str: "a", elems: w1},
|
||||
&entry{str: "b", elems: w1},
|
||||
true,
|
||||
},
|
||||
{&entry{str: "a", elems: w2},
|
||||
&entry{str: "a", elems: w1},
|
||||
false,
|
||||
},
|
||||
{&entry{str: "c", elems: w1},
|
||||
&entry{str: "b", elems: w1},
|
||||
false,
|
||||
},
|
||||
{&entry{str: "a", elems: w1, logical: firstAnchor},
|
||||
&entry{str: "a", elems: w1},
|
||||
true,
|
||||
},
|
||||
{&entry{str: "a", elems: w1},
|
||||
&entry{str: "b", elems: w1, logical: firstAnchor},
|
||||
false,
|
||||
},
|
||||
{&entry{str: "b", elems: w1},
|
||||
&entry{str: "a", elems: w1, logical: lastAnchor},
|
||||
true,
|
||||
},
|
||||
{&entry{str: "a", elems: w1, logical: lastAnchor},
|
||||
&entry{str: "c", elems: w1},
|
||||
false,
|
||||
},
|
||||
}
|
||||
|
||||
func TestEntryLess(t *testing.T) {
|
||||
for i, tt := range entryLessTests {
|
||||
if res := entryLess(tt.a, tt.b); res != tt.res {
|
||||
t.Errorf("%d: was %v; want %v", i, res, tt.res)
|
||||
}
|
||||
}
|
||||
}
|
||||
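The insertion tests in order_test.go drive their cases with nextPerm, which steps through permutations in lexicographic order. The sketch below reuses the same algorithm in a standalone program to show the sequence it produces. The `allPerms` helper is a hypothetical addition for illustration and is not part of the test file.

```go
package main

import "fmt"

// nextPerm rearranges a into the next permutation in lexicographic
// order. It returns false if a is already the last permutation.
func nextPerm(a []int) bool {
	// Find the rightmost position i with a[i] < a[i+1].
	i := len(a) - 2
	for ; i >= 0; i-- {
		if a[i] < a[i+1] {
			break
		}
	}
	if i < 0 {
		return false
	}
	// Swap a[i] with the rightmost element that is larger than it.
	for j := len(a) - 1; j >= i; j-- {
		if a[j] > a[i] {
			a[i], a[j] = a[j], a[i]
			break
		}
	}
	// Reverse the suffix after i so that it is increasing again.
	for j := i + 1; j < (len(a)+i+1)/2; j++ {
		a[j], a[len(a)+i-j] = a[len(a)+i-j], a[j]
	}
	return true
}

// allPerms is a hypothetical helper that collects every permutation
// of 1..n, starting from the sorted order.
func allPerms(n int) [][]int {
	perm := make([]int, n)
	for i := range perm {
		perm[i] = i + 1
	}
	var out [][]int
	for ok := true; ok; ok = nextPerm(perm) {
		out = append(out, append([]int(nil), perm...))
	}
	return out
}

func main() {
	for _, p := range allPerms(3) {
		fmt.Println(p)
	}
}
```

With n = 5, as in the tests, the loop visits 120 orderings, so each insertion test exercises every possible order of moving the five non-anchor entries.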
81
vendor/golang.org/x/text/collate/build/table.go
generated
vendored
Normal file
|
|
@ -0,0 +1,81 @@
|
|||
// Copyright 2012 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package build
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"io"
|
||||
"reflect"
|
||||
|
||||
"golang.org/x/text/internal/colltab"
|
||||
)
|
||||
|
||||
// table is an intermediate structure that roughly resembles the table in collate.
|
||||
type table struct {
|
||||
colltab.Table
|
||||
trie trie
|
||||
root *trieHandle
|
||||
}
|
||||
|
||||
// fprint writes the table as Go compilable code to w. It prefixes the
|
||||
// variable names with name. It returns the number of bytes written
|
||||
// and the size of the resulting table.
|
||||
func (t *table) fprint(w io.Writer, name string) (n, size int, err error) {
|
||||
update := func(nn, sz int, e error) {
|
||||
n += nn
|
||||
if err == nil {
|
||||
err = e
|
||||
}
|
||||
size += sz
|
||||
}
|
||||
// Write arrays needed for the structure.
|
||||
update(printColElems(w, t.ExpandElem, name+"ExpandElem"))
|
||||
update(printColElems(w, t.ContractElem, name+"ContractElem"))
|
||||
update(t.trie.printArrays(w, name))
|
||||
update(printArray(t.ContractTries, w, name))
|
||||
|
||||
nn, e := fmt.Fprintf(w, "// Total size of %sTable is %d bytes\n", name, size)
|
||||
update(nn, 0, e)
|
||||
return
|
||||
}
|
||||
|
||||
func (t *table) fprintIndex(w io.Writer, h *trieHandle, id string) (n int, err error) {
|
||||
p := func(f string, a ...interface{}) {
|
||||
nn, e := fmt.Fprintf(w, f, a...)
|
||||
n += nn
|
||||
if err == nil {
|
||||
err = e
|
||||
}
|
||||
}
|
||||
p("\t{ // %s\n", id)
|
||||
p("\t\tlookupOffset: 0x%x,\n", h.lookupStart)
|
||||
p("\t\tvaluesOffset: 0x%x,\n", h.valueStart)
|
||||
p("\t},\n")
|
||||
return
|
||||
}
|
||||
|
||||
func printColElems(w io.Writer, a []uint32, name string) (n, sz int, err error) {
|
||||
p := func(f string, a ...interface{}) {
|
||||
nn, e := fmt.Fprintf(w, f, a...)
|
||||
n += nn
|
||||
if err == nil {
|
||||
err = e
|
||||
}
|
||||
}
|
||||
sz = len(a) * int(reflect.TypeOf(uint32(0)).Size())
|
||||
p("// %s: %d entries, %d bytes\n", name, len(a), sz)
|
||||
p("var %s = [%d]uint32 {", name, len(a))
|
||||
for i, c := range a {
|
||||
switch {
|
||||
case i%64 == 0:
|
||||
p("\n\t// Block %d, offset 0x%x\n", i/64, i)
|
||||
case (i%64)%6 == 0:
|
||||
p("\n\t")
|
||||
}
|
||||
p("0x%.8X, ", c)
|
||||
}
|
||||
p("\n}\n\n")
|
||||
return
|
||||
}
|
||||
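The printing functions in table.go share one pattern: a small closure wraps fmt.Fprintf, adds up the bytes written, and remembers only the first error. The following sketch isolates that pattern. `printWords`, `render`, and `failWriter` are hypothetical names invented for this example, and the output layout (four values per line) is an arbitrary choice rather than the layout the package uses.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

// printWords writes a as a Go array literal to w. It returns the number
// of bytes written and the first error reported by w, if any.
func printWords(w io.Writer, name string, a []uint32) (n int, err error) {
	// p accumulates the byte count and keeps only the first error.
	p := func(f string, args ...interface{}) {
		nn, e := fmt.Fprintf(w, f, args...)
		n += nn
		if err == nil {
			err = e
		}
	}
	p("var %s = [%d]uint32{", name, len(a))
	for i, c := range a {
		if i%4 == 0 {
			p("\n\t")
		}
		p("0x%.8X, ", c)
	}
	p("\n}\n")
	return
}

// failWriter is a hypothetical writer that rejects every write.
type failWriter struct{}

func (failWriter) Write(b []byte) (int, error) {
	return 0, errors.New("write refused")
}

// render runs printWords against an in-memory buffer.
func render(a []uint32) (string, int, error) {
	var buf bytes.Buffer
	n, err := printWords(&buf, "demo", a)
	return buf.String(), n, err
}

func main() {
	s, n, err := render([]uint32{1, 2})
	fmt.Print(s)
	fmt.Println(n, err)

	_, err = printWords(failWriter{}, "demo", []uint32{1})
	fmt.Println(err)
}
```

The closure keeps writing after a failure, which is simple but means a broken writer is called repeatedly. That is acceptable for a code generator that runs once, and it keeps each call site to a single line.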
290
vendor/golang.org/x/text/collate/build/trie.go
generated
vendored
Normal file
|
|
@ -0,0 +1,290 @@
|
|||
// Copyright 2012 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
// The trie in this file is used to associate the first full character
|
||||
// in a UTF-8 string to a collation element.
|
||||
// All but the last byte in a UTF-8 byte sequence are
|
||||
// used to look up offsets in the index table to be used for the next byte.
|
||||
// The last byte is used to index into a table of collation elements.
|
||||
// This file contains the code for the generation of the trie.
|
||||
|
||||
package build
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"hash/fnv"
|
||||
"io"
|
||||
"reflect"
|
||||
)
|
||||
|
||||
const (
|
||||
blockSize = 64
|
||||
blockOffset = 2 // Subtract 2 blocks to compensate for the 0x80 added to continuation bytes.
|
||||
)
|
||||
|
||||
type trieHandle struct {
|
||||
lookupStart uint16 // offset in table for first byte
|
||||
valueStart uint16 // offset in values table (in blocks) for first byte
|
||||
}
|
||||
|
||||
type trie struct {
|
||||
index []uint16
|
||||
values []uint32
|
||||
}
|
||||
|
||||
// trieNode is the intermediate trie structure used for generating a trie.
|
||||
type trieNode struct {
|
||||
index []*trieNode
|
||||
value []uint32
|
||||
b byte
|
||||
refValue uint16
|
||||
refIndex uint16
|
||||
}
|
||||
|
||||
func newNode() *trieNode {
|
||||
return &trieNode{
|
||||
index: make([]*trieNode, 64),
|
||||
value: make([]uint32, 128), // root node size is 128 instead of 64
|
||||
}
|
||||
}
|
||||
|
||||
func (n *trieNode) isInternal() bool {
|
||||
return n.value != nil
|
||||
}
|
||||
|
||||
func (n *trieNode) insert(r rune, value uint32) {
|
||||
const maskx = 0x3F // mask out two most-significant bits
|
||||
str := string(r)
|
||||
if len(str) == 1 {
|
||||
n.value[str[0]] = value
|
||||
return
|
||||
}
|
||||
for i := 0; i < len(str)-1; i++ {
|
||||
b := str[i] & maskx
|
||||
if n.index == nil {
|
||||
n.index = make([]*trieNode, blockSize)
|
||||
}
|
||||
nn := n.index[b]
|
||||
if nn == nil {
|
||||
nn = &trieNode{}
|
||||
nn.b = b
|
||||
n.index[b] = nn
|
||||
}
|
||||
n = nn
|
||||
}
|
||||
if n.value == nil {
|
||||
n.value = make([]uint32, blockSize)
|
||||
}
|
||||
b := str[len(str)-1] & maskx
|
||||
n.value[b] = value
|
||||
}
|
||||
|
||||
type trieBuilder struct {
|
||||
t *trie
|
||||
|
||||
roots []*trieHandle
|
||||
|
||||
lookupBlocks []*trieNode
|
||||
valueBlocks []*trieNode
|
||||
|
||||
lookupBlockIdx map[uint32]*trieNode
|
||||
valueBlockIdx map[uint32]*trieNode
|
||||
}
|
||||
|
||||
func newTrieBuilder() *trieBuilder {
|
||||
index := &trieBuilder{}
|
||||
index.lookupBlocks = make([]*trieNode, 0)
|
||||
index.valueBlocks = make([]*trieNode, 0)
|
||||
index.lookupBlockIdx = make(map[uint32]*trieNode)
|
||||
index.valueBlockIdx = make(map[uint32]*trieNode)
|
||||
// The third nil is the default null block. The other two blocks
|
||||
// are used to guarantee an offset of at least 3 for each block.
|
||||
index.lookupBlocks = append(index.lookupBlocks, nil, nil, nil)
|
||||
index.t = &trie{}
|
||||
return index
|
||||
}
|
||||
|
||||
func (b *trieBuilder) computeOffsets(n *trieNode) *trieNode {
|
||||
hasher := fnv.New32()
|
||||
if n.index != nil {
|
||||
for i, nn := range n.index {
|
||||
var vi, vv uint16
|
||||
if nn != nil {
|
||||
nn = b.computeOffsets(nn)
|
||||
n.index[i] = nn
|
||||
vi = nn.refIndex
|
||||
vv = nn.refValue
|
||||
}
|
||||
hasher.Write([]byte{byte(vi >> 8), byte(vi)})
|
||||
hasher.Write([]byte{byte(vv >> 8), byte(vv)})
|
||||
}
|
||||
h := hasher.Sum32()
|
||||
nn, ok := b.lookupBlockIdx[h]
|
||||
if !ok {
|
||||
n.refIndex = uint16(len(b.lookupBlocks)) - blockOffset
|
||||
b.lookupBlocks = append(b.lookupBlocks, n)
|
||||
b.lookupBlockIdx[h] = n
|
||||
} else {
|
||||
n = nn
|
||||
}
|
||||
} else {
|
||||
for _, v := range n.value {
|
||||
hasher.Write([]byte{byte(v >> 24), byte(v >> 16), byte(v >> 8), byte(v)})
|
||||
}
|
||||
h := hasher.Sum32()
|
||||
nn, ok := b.valueBlockIdx[h]
|
||||
if !ok {
|
||||
n.refValue = uint16(len(b.valueBlocks)) - blockOffset
|
||||
n.refIndex = n.refValue
|
||||
b.valueBlocks = append(b.valueBlocks, n)
|
||||
b.valueBlockIdx[h] = n
|
||||
} else {
|
||||
n = nn
|
||||
}
|
||||
}
|
||||
return n
|
||||
}
|
||||
|
||||
func (b *trieBuilder) addStartValueBlock(n *trieNode) uint16 {
|
||||
hasher := fnv.New32()
|
||||
for _, v := range n.value[:2*blockSize] {
|
||||
hasher.Write([]byte{byte(v >> 24), byte(v >> 16), byte(v >> 8), byte(v)})
|
||||
}
|
||||
h := hasher.Sum32()
|
||||
nn, ok := b.valueBlockIdx[h]
|
||||
if !ok {
|
||||
n.refValue = uint16(len(b.valueBlocks))
|
||||
n.refIndex = n.refValue
|
||||
b.valueBlocks = append(b.valueBlocks, n)
|
||||
// Add a dummy block to accommodate the double block size.
|
||||
b.valueBlocks = append(b.valueBlocks, nil)
|
||||
b.valueBlockIdx[h] = n
|
||||
} else {
|
||||
n = nn
|
||||
}
|
||||
return n.refValue
|
||||
}
|
||||
|
||||
func genValueBlock(t *trie, n *trieNode) {
|
||||
if n != nil {
|
||||
for _, v := range n.value {
|
||||
t.values = append(t.values, v)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func genLookupBlock(t *trie, n *trieNode) {
|
||||
for _, nn := range n.index {
|
||||
v := uint16(0)
|
||||
if nn != nil {
|
||||
if n.index != nil {
|
||||
v = nn.refIndex
|
||||
} else {
|
||||
v = nn.refValue
|
||||
}
|
||||
}
|
||||
t.index = append(t.index, v)
|
||||
}
|
||||
}
|
||||
|
||||
func (b *trieBuilder) addTrie(n *trieNode) *trieHandle {
|
||||
h := &trieHandle{}
|
||||
b.roots = append(b.roots, h)
|
||||
h.valueStart = b.addStartValueBlock(n)
|
||||
if len(b.roots) == 1 {
|
||||
// We insert a null block after the first start value block.
|
||||
// This ensures that continuation bytes of UTF-8 sequences of length
|
||||
// greater than 2 will automatically hit a null block if there
|
||||
// was an undefined entry.
|
||||
b.valueBlocks = append(b.valueBlocks, nil)
|
||||
}
|
||||
n = b.computeOffsets(n)
|
||||
// Offset by one extra block as the first byte starts at 0xC0 instead of 0x80.
|
||||
h.lookupStart = n.refIndex - 1
|
||||
return h
|
||||
}
|
||||
|
||||
// generate generates and returns the trie from the blocks collected in b.
|
||||
func (b *trieBuilder) generate() (t *trie, err error) {
|
||||
t = b.t
|
||||
if len(b.valueBlocks) >= 1<<16 {
|
||||
return nil, fmt.Errorf("maximum number of value blocks exceeded (%d > %d)", len(b.valueBlocks), 1<<16)
|
||||
}
|
||||
if len(b.lookupBlocks) >= 1<<16 {
|
||||
return nil, fmt.Errorf("maximum number of lookup blocks exceeded (%d > %d)", len(b.lookupBlocks), 1<<16)
|
||||
}
|
||||
genValueBlock(t, b.valueBlocks[0])
|
||||
genValueBlock(t, &trieNode{value: make([]uint32, 64)})
|
||||
for i := 2; i < len(b.valueBlocks); i++ {
|
||||
genValueBlock(t, b.valueBlocks[i])
|
||||
}
|
||||
n := &trieNode{index: make([]*trieNode, 64)}
|
||||
genLookupBlock(t, n)
|
||||
genLookupBlock(t, n)
|
||||
genLookupBlock(t, n)
|
||||
for i := 3; i < len(b.lookupBlocks); i++ {
|
||||
genLookupBlock(t, b.lookupBlocks[i])
|
||||
}
|
||||
return b.t, nil
|
||||
}
|
||||
|
||||
func (t *trie) printArrays(w io.Writer, name string) (n, size int, err error) {
|
||||
p := func(f string, a ...interface{}) {
|
||||
nn, e := fmt.Fprintf(w, f, a...)
|
||||
n += nn
|
||||
if err == nil {
|
||||
err = e
|
||||
}
|
||||
}
|
||||
nv := len(t.values)
|
||||
p("// %sValues: %d entries, %d bytes\n", name, nv, nv*4)
|
||||
p("// Block 2 is the null block.\n")
|
||||
p("var %sValues = [%d]uint32 {", name, nv)
|
||||
var printnewline bool
|
||||
for i, v := range t.values {
|
||||
if i%blockSize == 0 {
|
||||
p("\n\t// Block %#x, offset %#x", i/blockSize, i)
|
||||
}
|
||||
if i%4 == 0 {
|
||||
printnewline = true
|
||||
}
|
||||
if v != 0 {
|
||||
if printnewline {
|
||||
p("\n\t")
|
||||
printnewline = false
|
||||
}
|
||||
p("%#04x:%#08x, ", i, v)
|
||||
}
|
||||
}
|
||||
p("\n}\n\n")
|
||||
ni := len(t.index)
|
||||
p("// %sLookup: %d entries, %d bytes\n", name, ni, ni*2)
|
||||
p("// Block 0 is the null block.\n")
|
||||
p("var %sLookup = [%d]uint16 {", name, ni)
|
||||
printnewline = false
|
||||
for i, v := range t.index {
|
||||
if i%blockSize == 0 {
|
||||
p("\n\t// Block %#x, offset %#x", i/blockSize, i)
|
||||
}
|
||||
if i%8 == 0 {
|
||||
printnewline = true
|
||||
}
|
||||
if v != 0 {
|
||||
if printnewline {
|
||||
p("\n\t")
|
||||
printnewline = false
|
||||
}
|
||||
p("%#03x:%#02x, ", i, v)
|
||||
}
|
||||
}
|
||||
p("\n}\n\n")
|
||||
return n, nv*4 + ni*2, err
|
||||
}
|
||||
|
||||
func (t *trie) printStruct(w io.Writer, handle *trieHandle, name string) (n, sz int, err error) {
|
||||
const msg = "trie{ %sLookup[%d:], %sValues[%d:], %sLookup[:], %sValues[:]}"
|
||||
n, err = fmt.Fprintf(w, msg, name, handle.lookupStart*blockSize, name, handle.valueStart*blockSize, name, name)
|
||||
sz += int(reflect.TypeOf(trie{}).Size())
|
||||
return
|
||||
}
|
||||
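The header comment of trie.go says that all but the last byte of a UTF-8 sequence select lookup blocks and that the last byte selects a value. The sketch below shows which indices trieNode.insert would derive for a rune. `triePath` is a hypothetical helper written for this illustration. It mirrors the masking done in insert but does not build any blocks.

```go
package main

import "fmt"

// triePath returns the 6-bit indices used to descend through lookup
// blocks and the index used in the final value block for rune r.
// A single-byte (ASCII) rune has no lookup steps and indexes the
// 128-entry root value block with the full byte.
func triePath(r rune) (lookup []byte, last byte) {
	const maskx = 0x3F // mask out the two most-significant bits
	s := string(r)
	if len(s) == 1 {
		return nil, s[0]
	}
	for i := 0; i < len(s)-1; i++ {
		lookup = append(lookup, s[i]&maskx)
	}
	return lookup, s[len(s)-1] & maskx
}

func main() {
	for _, r := range []rune{'A', 0x100, 0x20AC, 0x10000} {
		lookup, last := triePath(r)
		fmt.Printf("U+%04X: lookup=%#x last=%#x\n", r, lookup, last)
	}
}
```

For example, U+20AC is encoded as E2 82 AC, so the lookup indices are 0x22 and 0x02 and the value index is 0x2C. Every index fits in a block of 64 entries because only the low six bits of each byte are kept.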
107
vendor/golang.org/x/text/collate/build/trie_test.go
generated
vendored
Normal file
|
|
@ -0,0 +1,107 @@
|
|||
// Copyright 2012 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package build
|
||||
|
||||
import (
|
||||
"bytes"
|
||||
"fmt"
|
||||
"testing"
|
||||
)
|
||||
|
||||
// We take the smallest, largest and an arbitrary value for each
|
||||
// of the UTF-8 sequence lengths.
|
||||
var testRunes = []rune{
|
||||
0x01, 0x0C, 0x7F, // 1-byte sequences
|
||||
0x80, 0x100, 0x7FF, // 2-byte sequences
|
||||
0x800, 0x999, 0xFFFF, // 3-byte sequences
|
||||
0x10000, 0x10101, 0x10FFFF, // 4-byte sequences
|
||||
0x200, 0x201, 0x202, 0x210, 0x215, // five entries in one sparse block
|
||||
}
|
||||
|
||||
func makeTestTrie(t *testing.T) trie {
|
||||
n := newNode()
|
||||
for i, r := range testRunes {
|
||||
n.insert(r, uint32(i))
|
||||
}
|
||||
idx := newTrieBuilder()
|
||||
idx.addTrie(n)
|
||||
tr, err := idx.generate()
|
||||
if err != nil {
|
||||
t.Error(err)
|
||||
}
|
||||
return *tr
|
||||
}
|
||||
|
||||
func TestGenerateTrie(t *testing.T) {
|
||||
testdata := makeTestTrie(t)
|
||||
buf := &bytes.Buffer{}
|
||||
testdata.printArrays(buf, "test")
|
||||
fmt.Fprintf(buf, "var testTrie = ")
|
||||
testdata.printStruct(buf, &trieHandle{19, 0}, "test")
|
||||
if output != buf.String() {
|
||||
t.Error("output differs")
|
||||
}
|
||||
}
|
||||
|
||||
var output = `// testValues: 832 entries, 3328 bytes
|
||||
// Block 2 is the null block.
|
||||
var testValues = [832]uint32 {
|
||||
// Block 0x0, offset 0x0
|
||||
0x000c:0x00000001,
|
||||
// Block 0x1, offset 0x40
|
||||
0x007f:0x00000002,
|
||||
// Block 0x2, offset 0x80
|
||||
// Block 0x3, offset 0xc0
|
||||
0x00c0:0x00000003,
|
||||
// Block 0x4, offset 0x100
|
||||
0x0100:0x00000004,
|
||||
// Block 0x5, offset 0x140
|
||||
0x0140:0x0000000c, 0x0141:0x0000000d, 0x0142:0x0000000e,
|
||||
0x0150:0x0000000f,
|
||||
0x0155:0x00000010,
|
||||
// Block 0x6, offset 0x180
|
||||
0x01bf:0x00000005,
|
||||
// Block 0x7, offset 0x1c0
|
||||
0x01c0:0x00000006,
|
||||
// Block 0x8, offset 0x200
|
||||
0x0219:0x00000007,
|
||||
// Block 0x9, offset 0x240
|
||||
0x027f:0x00000008,
|
||||
// Block 0xa, offset 0x280
|
||||
0x0280:0x00000009,
|
||||
// Block 0xb, offset 0x2c0
|
||||
0x02c1:0x0000000a,
|
||||
// Block 0xc, offset 0x300
|
||||
0x033f:0x0000000b,
|
||||
}
|
||||
|
||||
// testLookup: 640 entries, 1280 bytes
|
||||
// Block 0 is the null block.
|
||||
var testLookup = [640]uint16 {
|
||||
// Block 0x0, offset 0x0
|
||||
// Block 0x1, offset 0x40
|
||||
// Block 0x2, offset 0x80
|
||||
// Block 0x3, offset 0xc0
|
||||
0x0e0:0x05, 0x0e6:0x06,
|
||||
// Block 0x4, offset 0x100
|
||||
0x13f:0x07,
|
||||
// Block 0x5, offset 0x140
|
||||
0x140:0x08, 0x144:0x09,
|
||||
// Block 0x6, offset 0x180
|
||||
0x190:0x03,
|
||||
// Block 0x7, offset 0x1c0
|
||||
0x1ff:0x0a,
|
||||
// Block 0x8, offset 0x200
|
||||
0x20f:0x05,
|
||||
// Block 0x9, offset 0x240
|
||||
0x242:0x01, 0x244:0x02,
|
||||
0x248:0x03,
|
||||
0x25f:0x04,
|
||||
0x260:0x01,
|
||||
0x26f:0x02,
|
||||
0x270:0x04, 0x274:0x06,
|
||||
}
|
||||
|
||||
var testTrie = trie{ testLookup[1216:], testValues[0:], testLookup[:], testValues[:]}`
|
||||
403
vendor/golang.org/x/text/collate/collate.go
generated
vendored
Normal file
|
|
@ -0,0 +1,403 @@
|
|||
// Copyright 2012 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
// TODO: remove hard-coded versions when we have implemented fractional weights.
|
||||
// The current implementation is incompatible with later CLDR versions.
|
||||
//go:generate go run maketables.go -cldr=23 -unicode=6.2.0
|
||||
|
||||
// Package collate contains types for comparing and sorting Unicode strings
|
||||
// according to a given collation order.
|
||||
package collate // import "golang.org/x/text/collate"
|
||||
|
||||
import (
|
||||
"bytes"
|
||||
"strings"
|
||||
|
||||
"golang.org/x/text/internal/colltab"
|
||||
"golang.org/x/text/language"
|
||||
)
|
||||
|
||||
// Collator provides functionality for comparing strings for a given
|
||||
// collation order.
|
||||
type Collator struct {
|
||||
options
|
||||
|
||||
sorter sorter
|
||||
|
||||
_iter [2]iter
|
||||
}
|
||||
|
||||
func (c *Collator) iter(i int) *iter {
|
||||
// TODO: evaluate performance for making the second iterator optional.
|
||||
return &c._iter[i]
|
||||
}
|
||||
|
||||
// Supported returns the list of languages for which collating differs from its parent.
|
||||
func Supported() []language.Tag {
|
||||
// TODO: use language.Coverage instead.
|
||||
|
||||
t := make([]language.Tag, len(tags))
|
||||
copy(t, tags)
|
||||
return t
|
||||
}
|
||||
|
||||
func init() {
|
||||
ids := strings.Split(availableLocales, ",")
|
||||
tags = make([]language.Tag, len(ids))
|
||||
for i, s := range ids {
|
||||
tags[i] = language.Raw.MustParse(s)
|
||||
}
|
||||
}
|
||||
|
||||
var tags []language.Tag
|
||||
|
||||
// New returns a new Collator initialized for the given locale.
|
||||
func New(t language.Tag, o ...Option) *Collator {
|
||||
index := colltab.MatchLang(t, tags)
|
||||
c := newCollator(getTable(locales[index]))
|
||||
|
||||
// Set options from the user-supplied tag.
|
||||
c.setFromTag(t)
|
||||
|
||||
// Set the user-supplied options.
|
||||
c.setOptions(o)
|
||||
|
||||
c.init()
|
||||
return c
|
||||
}
|
||||
|
||||
// NewFromTable returns a new Collator for the given Weighter.
|
||||
func NewFromTable(w colltab.Weighter, o ...Option) *Collator {
|
||||
c := newCollator(w)
|
||||
c.setOptions(o)
|
||||
c.init()
|
||||
return c
|
||||
}
|
||||
|
||||
func (c *Collator) init() {
|
||||
if c.numeric {
|
||||
c.t = colltab.NewNumericWeighter(c.t)
|
||||
}
|
||||
c._iter[0].init(c)
|
||||
c._iter[1].init(c)
|
||||
}
|
||||
|
||||
// Buffer holds keys generated by Key and KeyFromString.
|
||||
type Buffer struct {
|
||||
buf [4096]byte
|
||||
key []byte
|
||||
}
|
||||
|
||||
func (b *Buffer) init() {
|
||||
if b.key == nil {
|
||||
b.key = b.buf[:0]
|
||||
}
|
||||
}
|
||||
|
||||
// Reset clears the buffer from previous results generated by Key and KeyFromString.
|
||||
func (b *Buffer) Reset() {
|
||||
b.key = b.key[:0]
|
||||
}
|
||||
|
||||
// Compare returns an integer comparing the two byte slices.
|
||||
// The result will be 0 if a==b, -1 if a < b, and +1 if a > b.
|
||||
func (c *Collator) Compare(a, b []byte) int {
|
||||
// TODO: skip identical prefixes once we have a fast way to detect if a rune is
|
||||
// part of a contraction. This would lead to roughly a 10% speedup for the colcmp regtest.
|
||||
c.iter(0).SetInput(a)
|
||||
c.iter(1).SetInput(b)
|
||||
if res := c.compare(); res != 0 {
|
||||
return res
|
||||
}
|
||||
if !c.ignore[colltab.Identity] {
|
||||
return bytes.Compare(a, b)
|
||||
}
|
||||
return 0
|
||||
}
|
||||
|
||||
// CompareString returns an integer comparing the two strings.
|
||||
// The result will be 0 if a==b, -1 if a < b, and +1 if a > b.
|
||||
func (c *Collator) CompareString(a, b string) int {
|
||||
// TODO: skip identical prefixes once we have a fast way to detect if a rune is
|
||||
// part of a contraction. This would lead to roughly a 10% speedup for the colcmp regtest.
|
||||
c.iter(0).SetInputString(a)
|
||||
c.iter(1).SetInputString(b)
|
||||
if res := c.compare(); res != 0 {
|
||||
return res
|
||||
}
|
||||
if !c.ignore[colltab.Identity] {
|
||||
if a < b {
|
||||
return -1
|
||||
} else if a > b {
|
||||
return 1
|
||||
}
|
||||
}
|
||||
return 0
|
||||
}
|
||||
|
||||
func compareLevel(f func(i *iter) int, a, b *iter) int {
|
||||
a.pce = 0
|
||||
b.pce = 0
|
||||
for {
|
||||
va := f(a)
|
||||
vb := f(b)
|
||||
if va != vb {
|
||||
if va < vb {
|
||||
return -1
|
||||
}
|
||||
return 1
|
||||
} else if va == 0 {
|
||||
break
|
||||
}
|
||||
}
|
||||
return 0
|
||||
}
|
||||
|
||||
func (c *Collator) compare() int {
|
||||
ia, ib := c.iter(0), c.iter(1)
|
||||
// Process primary level
|
||||
if c.alternate != altShifted {
|
||||
// TODO: implement script reordering
|
||||
if res := compareLevel((*iter).nextPrimary, ia, ib); res != 0 {
|
||||
return res
|
||||
}
|
||||
} else {
|
||||
// TODO: handle shifted
|
||||
}
|
||||
if !c.ignore[colltab.Secondary] {
|
||||
f := (*iter).nextSecondary
|
||||
if c.backwards {
|
||||
f = (*iter).prevSecondary
|
||||
}
|
||||
if res := compareLevel(f, ia, ib); res != 0 {
|
||||
return res
|
||||
}
|
||||
}
|
||||
// TODO: special case handling (Danish?)
|
||||
if !c.ignore[colltab.Tertiary] || c.caseLevel {
|
||||
if res := compareLevel((*iter).nextTertiary, ia, ib); res != 0 {
|
||||
return res
|
||||
}
|
||||
if !c.ignore[colltab.Quaternary] {
|
||||
if res := compareLevel((*iter).nextQuaternary, ia, ib); res != 0 {
|
||||
return res
|
||||
}
|
||||
}
|
||||
}
|
||||
return 0
|
||||
}
|
||||
|
||||
// Key returns the collation key for str.
|
||||
// Passing the buffer buf may avoid memory allocations.
|
||||
// The returned slice will point to an allocation in Buffer and will remain
|
||||
// valid until the next call to buf.Reset().
|
||||
func (c *Collator) Key(buf *Buffer, str []byte) []byte {
|
||||
// See http://www.unicode.org/reports/tr10/#Main_Algorithm for more details.
|
||||
buf.init()
|
||||
return c.key(buf, c.getColElems(str))
|
||||
}
|
||||
|
||||
// KeyFromString returns the collation key for str.
|
||||
// Passing the buffer buf may avoid memory allocations.
|
||||
// The returned slice will point to an allocation in Buffer and will remain
|
||||
// valid until the next call to buf.Reset().
|
||||
func (c *Collator) KeyFromString(buf *Buffer, str string) []byte {
|
||||
// See http://www.unicode.org/reports/tr10/#Main_Algorithm for more details.
|
||||
buf.init()
|
||||
return c.key(buf, c.getColElemsString(str))
|
||||
}
|
||||
|
||||
func (c *Collator) key(buf *Buffer, w []colltab.Elem) []byte {
|
||||
processWeights(c.alternate, c.t.Top(), w)
|
||||
kn := len(buf.key)
|
||||
c.keyFromElems(buf, w)
|
||||
return buf.key[kn:]
|
||||
}
|
||||
|
||||
func (c *Collator) getColElems(str []byte) []colltab.Elem {
|
||||
i := c.iter(0)
|
||||
i.SetInput(str)
|
||||
for i.Next() {
|
||||
}
|
||||
return i.Elems
|
||||
}
|
||||
|
||||
func (c *Collator) getColElemsString(str string) []colltab.Elem {
|
||||
i := c.iter(0)
|
||||
i.SetInputString(str)
|
||||
for i.Next() {
|
||||
}
|
||||
return i.Elems
|
||||
}
|
||||
|
||||
type iter struct {
|
||||
wa [512]colltab.Elem
|
||||
|
||||
colltab.Iter
|
||||
pce int
|
||||
}
|
||||
|
||||
func (i *iter) init(c *Collator) {
|
||||
i.Weighter = c.t
|
||||
i.Elems = i.wa[:0]
|
||||
}
|
||||
|
||||
func (i *iter) nextPrimary() int {
|
||||
for {
|
||||
for ; i.pce < i.N; i.pce++ {
|
||||
if v := i.Elems[i.pce].Primary(); v != 0 {
|
||||
i.pce++
|
||||
return v
|
||||
}
|
||||
}
|
||||
if !i.Next() {
|
||||
return 0
|
||||
}
|
||||
}
|
||||
panic("should not reach here")
|
||||
}
|
||||
|
||||
func (i *iter) nextSecondary() int {
|
||||
for ; i.pce < len(i.Elems); i.pce++ {
|
||||
if v := i.Elems[i.pce].Secondary(); v != 0 {
|
||||
i.pce++
|
||||
return v
|
||||
}
|
||||
}
|
||||
return 0
|
||||
}
|
||||
|
||||
func (i *iter) prevSecondary() int {
|
||||
for ; i.pce < len(i.Elems); i.pce++ {
|
||||
if v := i.Elems[len(i.Elems)-i.pce-1].Secondary(); v != 0 {
|
||||
i.pce++
|
||||
return v
|
||||
}
|
||||
}
|
||||
return 0
|
||||
}
|
||||
|
||||
func (i *iter) nextTertiary() int {
|
||||
for ; i.pce < len(i.Elems); i.pce++ {
|
||||
if v := i.Elems[i.pce].Tertiary(); v != 0 {
|
||||
i.pce++
|
||||
return int(v)
|
||||
}
|
||||
}
|
||||
return 0
|
||||
}
|
||||
|
||||
func (i *iter) nextQuaternary() int {
|
||||
for ; i.pce < len(i.Elems); i.pce++ {
|
||||
if v := i.Elems[i.pce].Quaternary(); v != 0 {
|
||||
i.pce++
|
||||
return v
|
||||
}
|
||||
}
|
||||
return 0
|
||||
}
|
||||
|
||||
func appendPrimary(key []byte, p int) []byte {
|
||||
// Convert to variable length encoding; supports up to 23 bits.
|
||||
if p <= 0x7FFF {
|
||||
key = append(key, uint8(p>>8), uint8(p))
|
||||
} else {
|
||||
key = append(key, uint8(p>>16)|0x80, uint8(p>>8), uint8(p))
|
||||
}
|
||||
return key
|
||||
}
|
||||
|
||||
// keyFromElems converts the weights ws to a compact sequence of bytes.
|
||||
// The result will be appended to the byte buffer in buf.
|
||||
func (c *Collator) keyFromElems(buf *Buffer, ws []colltab.Elem) {
|
||||
for _, v := range ws {
|
||||
if w := v.Primary(); w > 0 {
|
||||
buf.key = appendPrimary(buf.key, w)
|
||||
}
|
||||
}
|
||||
if !c.ignore[colltab.Secondary] {
|
||||
buf.key = append(buf.key, 0, 0)
|
||||
// TODO: we can use one 0 if we can guarantee that all non-zero weights are > 0xFF.
|
||||
if !c.backwards {
|
||||
for _, v := range ws {
|
||||
if w := v.Secondary(); w > 0 {
|
||||
buf.key = append(buf.key, uint8(w>>8), uint8(w))
|
||||
}
|
||||
}
|
||||
} else {
|
||||
for i := len(ws) - 1; i >= 0; i-- {
|
||||
if w := ws[i].Secondary(); w > 0 {
|
||||
buf.key = append(buf.key, uint8(w>>8), uint8(w))
|
||||
}
|
||||
}
|
||||
}
|
||||
} else if c.caseLevel {
|
||||
buf.key = append(buf.key, 0, 0)
|
||||
}
|
||||
if !c.ignore[colltab.Tertiary] || c.caseLevel {
|
||||
buf.key = append(buf.key, 0, 0)
|
||||
for _, v := range ws {
|
||||
if w := v.Tertiary(); w > 0 {
|
||||
buf.key = append(buf.key, uint8(w))
|
||||
}
|
||||
}
|
||||
// Derive the quaternary weights from the options and other levels.
|
||||
// Note that we represent MaxQuaternary as 0xFF. The first byte of the
|
||||
// representation of a primary weight is always smaller than 0xFF,
|
||||
// so using this single byte value will compare correctly.
|
||||
if !c.ignore[colltab.Quaternary] && c.alternate >= altShifted {
|
||||
if c.alternate == altShiftTrimmed {
|
||||
lastNonFFFF := len(buf.key)
|
||||
buf.key = append(buf.key, 0)
|
||||
for _, v := range ws {
|
||||
if w := v.Quaternary(); w == colltab.MaxQuaternary {
|
||||
buf.key = append(buf.key, 0xFF)
|
||||
} else if w > 0 {
|
||||
buf.key = appendPrimary(buf.key, w)
|
||||
lastNonFFFF = len(buf.key)
|
||||
}
|
||||
}
|
||||
buf.key = buf.key[:lastNonFFFF]
|
||||
} else {
|
||||
buf.key = append(buf.key, 0)
|
||||
for _, v := range ws {
|
||||
if w := v.Quaternary(); w == colltab.MaxQuaternary {
|
||||
buf.key = append(buf.key, 0xFF)
|
||||
} else if w > 0 {
|
||||
buf.key = appendPrimary(buf.key, w)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func processWeights(vw alternateHandling, top uint32, wa []colltab.Elem) {
|
||||
ignore := false
|
||||
vtop := int(top)
|
||||
switch vw {
|
||||
case altShifted, altShiftTrimmed:
|
||||
for i := range wa {
|
||||
if p := wa[i].Primary(); p <= vtop && p != 0 {
|
||||
wa[i] = colltab.MakeQuaternary(p)
|
||||
ignore = true
|
||||
} else if p == 0 {
|
||||
if ignore {
|
||||
wa[i] = colltab.Ignore
|
||||
}
|
||||
} else {
|
||||
ignore = false
|
||||
}
|
||||
}
|
||||
case altBlanked:
|
||||
for i := range wa {
|
||||
if p := wa[i].Primary(); p <= vtop && (ignore || p != 0) {
|
||||
wa[i] = colltab.Ignore
|
||||
ignore = true
|
||||
} else {
|
||||
ignore = false
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
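The variable-length encoding used by appendPrimary in collate.go can be tried in isolation. The following is a minimal standalone sketch, not part of the vendored package: the function body is copied from the diff above, while the main function and the sample weights are assumptions added for illustration.

```go
package main

import "fmt"

// appendPrimary appends the weight p to key. Weights up to 0x7FFF take two
// bytes; larger weights (up to 23 bits) take three bytes, with the high bit
// of the first byte set to mark the longer form.
func appendPrimary(key []byte, p int) []byte {
	if p <= 0x7FFF {
		return append(key, uint8(p>>8), uint8(p))
	}
	return append(key, uint8(p>>16)|0x80, uint8(p>>8), uint8(p))
}

func main() {
	// Sample weights taken from keyFromElemTests in collate_test.go.
	for _, p := range []int{0x200, 0x7FFF, 0x8000, 0x12345} {
		fmt.Printf("%#x -> % X\n", p, appendPrimary(nil, p))
	}
}
```

The two-byte form always starts with a byte below 0x80 and the three-byte form with a byte of 0x80 or above, so comparing two encoded weights byte by byte gives the same order as comparing the weights numerically.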
482
vendor/golang.org/x/text/collate/collate_test.go
generated
vendored
Normal file
|
|
@ -0,0 +1,482 @@
|
|||
// Copyright 2012 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package collate
|
||||
|
||||
import (
|
||||
"bytes"
|
||||
"testing"
|
||||
|
||||
"golang.org/x/text/internal/colltab"
|
||||
"golang.org/x/text/language"
|
||||
)
|
||||
|
||||
type weightsTest struct {
|
||||
opt opts
|
||||
in, out ColElems
|
||||
}
|
||||
|
||||
type opts struct {
|
||||
lev int
|
||||
alt alternateHandling
|
||||
top int
|
||||
|
||||
backwards bool
|
||||
caseLevel bool
|
||||
}
|
||||
|
||||
// ignore returns an initialized boolean array based on the given Level.
|
||||
// A negative value means using the default setting of quaternary.
|
||||
func ignore(level colltab.Level) (ignore [colltab.NumLevels]bool) {
|
||||
if level < 0 {
|
||||
level = colltab.Quaternary
|
||||
}
|
||||
for i := range ignore {
|
||||
ignore[i] = level < colltab.Level(i)
|
||||
}
|
||||
return ignore
|
||||
}
|
||||
|
||||
func makeCE(w []int) colltab.Elem {
|
||||
ce, err := colltab.MakeElem(w[0], w[1], w[2], uint8(w[3]))
|
||||
if err != nil {
|
||||
panic(err)
|
||||
}
|
||||
return ce
|
||||
}
|
||||
|
||||
func (o opts) collator() *Collator {
|
||||
c := &Collator{
|
||||
options: options{
|
||||
ignore: ignore(colltab.Level(o.lev - 1)),
|
||||
alternate: o.alt,
|
||||
backwards: o.backwards,
|
||||
caseLevel: o.caseLevel,
|
||||
variableTop: uint32(o.top),
|
||||
},
|
||||
}
|
||||
return c
|
||||
}
|
||||
|
||||
const (
|
||||
maxQ = 0x1FFFFF
|
||||
)
|
||||
|
||||
func wpq(p, q int) Weights {
|
||||
return W(p, defaults.Secondary, defaults.Tertiary, q)
|
||||
}
|
||||
|
||||
func wsq(s, q int) Weights {
|
||||
return W(0, s, defaults.Tertiary, q)
|
||||
}
|
||||
|
||||
func wq(q int) Weights {
|
||||
return W(0, 0, 0, q)
|
||||
}
|
||||
|
||||
var zero = W(0, 0, 0, 0)
|
||||
|
||||
var processTests = []weightsTest{
|
||||
// Shifted
|
||||
{ // simple sequence of non-variables
|
||||
opt: opts{alt: altShifted, top: 100},
|
||||
in: ColElems{W(200), W(300), W(400)},
|
||||
out: ColElems{wpq(200, maxQ), wpq(300, maxQ), wpq(400, maxQ)},
|
||||
},
|
||||
{ // first is a variable
|
||||
opt: opts{alt: altShifted, top: 250},
|
||||
in: ColElems{W(200), W(300), W(400)},
|
||||
out: ColElems{wq(200), wpq(300, maxQ), wpq(400, maxQ)},
|
||||
},
|
||||
{ // all but first are variable
|
||||
opt: opts{alt: altShifted, top: 999},
|
||||
in: ColElems{W(1000), W(200), W(300), W(400)},
|
||||
out: ColElems{wpq(1000, maxQ), wq(200), wq(300), wq(400)},
|
||||
},
|
||||
{ // first is a modifier
|
||||
opt: opts{alt: altShifted, top: 999},
|
||||
in: ColElems{W(0, 10), W(1000)},
|
||||
out: ColElems{wsq(10, maxQ), wpq(1000, maxQ)},
|
||||
},
|
||||
{ // primary ignorables
|
||||
opt: opts{alt: altShifted, top: 250},
|
||||
in: ColElems{W(200), W(0, 10), W(300), W(0, 15), W(400)},
|
||||
out: ColElems{wq(200), zero, wpq(300, maxQ), wsq(15, maxQ), wpq(400, maxQ)},
|
||||
},
|
||||
{ // secondary ignorables
|
||||
opt: opts{alt: altShifted, top: 250},
|
||||
in: ColElems{W(200), W(0, 0, 10), W(300), W(0, 0, 15), W(400)},
|
||||
out: ColElems{wq(200), zero, wpq(300, maxQ), W(0, 0, 15, maxQ), wpq(400, maxQ)},
|
||||
},
|
||||
{ // tertiary ignorables, no change
|
||||
opt: opts{alt: altShifted, top: 250},
|
||||
in: ColElems{W(200), zero, W(300), zero, W(400)},
|
||||
out: ColElems{wq(200), zero, wpq(300, maxQ), zero, wpq(400, maxQ)},
|
||||
},
|
||||
|
||||
// ShiftTrimmed (same as Shifted)
|
||||
{ // simple sequence of non-variables
|
||||
opt: opts{alt: altShiftTrimmed, top: 100},
|
||||
in: ColElems{W(200), W(300), W(400)},
|
||||
out: ColElems{wpq(200, maxQ), wpq(300, maxQ), wpq(400, maxQ)},
|
||||
},
|
||||
{ // first is a variable
|
||||
opt: opts{alt: altShiftTrimmed, top: 250},
|
||||
in: ColElems{W(200), W(300), W(400)},
|
||||
out: ColElems{wq(200), wpq(300, maxQ), wpq(400, maxQ)},
|
||||
},
|
||||
{ // all but first are variable
|
||||
opt: opts{alt: altShiftTrimmed, top: 999},
|
||||
in: ColElems{W(1000), W(200), W(300), W(400)},
|
||||
out: ColElems{wpq(1000, maxQ), wq(200), wq(300), wq(400)},
|
||||
},
|
||||
{ // first is a modifier
|
||||
opt: opts{alt: altShiftTrimmed, top: 999},
|
||||
in: ColElems{W(0, 10), W(1000)},
|
||||
out: ColElems{wsq(10, maxQ), wpq(1000, maxQ)},
|
||||
},
|
||||
{ // primary ignorables
|
||||
opt: opts{alt: altShiftTrimmed, top: 250},
|
||||
in: ColElems{W(200), W(0, 10), W(300), W(0, 15), W(400)},
|
||||
out: ColElems{wq(200), zero, wpq(300, maxQ), wsq(15, maxQ), wpq(400, maxQ)},
|
||||
},
|
||||
{ // secondary ignorables
|
||||
opt: opts{alt: altShiftTrimmed, top: 250},
|
||||
in: ColElems{W(200), W(0, 0, 10), W(300), W(0, 0, 15), W(400)},
|
||||
out: ColElems{wq(200), zero, wpq(300, maxQ), W(0, 0, 15, maxQ), wpq(400, maxQ)},
|
||||
},
|
||||
{ // tertiary ignorables, no change
|
||||
opt: opts{alt: altShiftTrimmed, top: 250},
|
||||
in: ColElems{W(200), zero, W(300), zero, W(400)},
|
||||
out: ColElems{wq(200), zero, wpq(300, maxQ), zero, wpq(400, maxQ)},
|
||||
},
|
||||
|
||||
// Blanked
|
||||
{ // simple sequence of non-variables
|
||||
opt: opts{alt: altBlanked, top: 100},
|
||||
in: ColElems{W(200), W(300), W(400)},
|
||||
out: ColElems{W(200), W(300), W(400)},
|
||||
},
|
||||
{ // first is a variable
|
||||
opt: opts{alt: altBlanked, top: 250},
|
||||
in: ColElems{W(200), W(300), W(400)},
|
||||
out: ColElems{zero, W(300), W(400)},
|
||||
},
|
||||
{ // all but first are variable
|
||||
opt: opts{alt: altBlanked, top: 999},
|
||||
in: ColElems{W(1000), W(200), W(300), W(400)},
|
||||
out: ColElems{W(1000), zero, zero, zero},
|
||||
},
|
||||
{ // first is a modifier
|
||||
opt: opts{alt: altBlanked, top: 999},
|
||||
in: ColElems{W(0, 10), W(1000)},
|
||||
out: ColElems{W(0, 10), W(1000)},
|
||||
},
|
||||
{ // primary ignorables
|
||||
opt: opts{alt: altBlanked, top: 250},
|
||||
in: ColElems{W(200), W(0, 10), W(300), W(0, 15), W(400)},
|
||||
out: ColElems{zero, zero, W(300), W(0, 15), W(400)},
|
||||
},
|
||||
{ // secondary ignorables
|
||||
opt: opts{alt: altBlanked, top: 250},
|
||||
in: ColElems{W(200), W(0, 0, 10), W(300), W(0, 0, 15), W(400)},
|
||||
out: ColElems{zero, zero, W(300), W(0, 0, 15), W(400)},
|
||||
},
|
||||
{ // tertiary ignorables, no change
|
||||
opt: opts{alt: altBlanked, top: 250},
|
||||
in: ColElems{W(200), zero, W(300), zero, W(400)},
|
||||
out: ColElems{zero, zero, W(300), zero, W(400)},
|
||||
},
|
||||
|
||||
// Non-ignorable: input is always equal to output.
|
||||
{ // all but first are variable
|
||||
opt: opts{alt: altNonIgnorable, top: 999},
|
||||
in: ColElems{W(1000), W(200), W(300), W(400)},
|
||||
out: ColElems{W(1000), W(200), W(300), W(400)},
|
||||
},
|
||||
{ // primary ignorables
|
||||
opt: opts{alt: altNonIgnorable, top: 250},
|
||||
in: ColElems{W(200), W(0, 10), W(300), W(0, 15), W(400)},
|
||||
out: ColElems{W(200), W(0, 10), W(300), W(0, 15), W(400)},
|
||||
},
|
||||
{ // secondary ignorables
|
||||
opt: opts{alt: altNonIgnorable, top: 250},
|
||||
in: ColElems{W(200), W(0, 0, 10), W(300), W(0, 0, 15), W(400)},
|
||||
out: ColElems{W(200), W(0, 0, 10), W(300), W(0, 0, 15), W(400)},
|
||||
},
|
||||
{ // tertiary ignorables, no change
|
||||
opt: opts{alt: altNonIgnorable, top: 250},
|
||||
in: ColElems{W(200), zero, W(300), zero, W(400)},
|
||||
out: ColElems{W(200), zero, W(300), zero, W(400)},
|
||||
},
|
||||
}
|
||||
|
||||
func TestProcessWeights(t *testing.T) {
|
||||
for i, tt := range processTests {
|
||||
in := convertFromWeights(tt.in)
|
||||
out := convertFromWeights(tt.out)
|
||||
processWeights(tt.opt.alt, uint32(tt.opt.top), in)
|
||||
for j, w := range in {
|
||||
if w != out[j] {
|
||||
t.Errorf("%d: Weights %d was %v; want %v", i, j, w, out[j])
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
type keyFromElemTest struct {
|
||||
opt opts
|
||||
in ColElems
|
||||
out []byte
|
||||
}
|
||||
|
||||
var defS = byte(defaults.Secondary)
|
||||
var defT = byte(defaults.Tertiary)
|
||||
|
||||
const sep = 0 // separator byte
|
||||
|
||||
var keyFromElemTests = []keyFromElemTest{
|
||||
{ // simple primary and secondary weights.
|
||||
opts{alt: altShifted},
|
||||
ColElems{W(0x200), W(0x7FFF), W(0, 0x30), W(0x100)},
|
||||
[]byte{0x2, 0, 0x7F, 0xFF, 0x1, 0x00, // primary
|
||||
sep, sep, 0, defS, 0, defS, 0, 0x30, 0, defS, // secondary
|
||||
sep, sep, defT, defT, defT, defT, // tertiary
|
||||
sep, 0xFF, 0xFF, 0xFF, 0xFF, // quaternary
|
||||
},
|
||||
},
|
||||
{ // same as first, but with zero elements that need to be removed
|
||||
opts{alt: altShifted},
|
||||
ColElems{W(0x200), zero, W(0x7FFF), W(0, 0x30), zero, W(0x100)},
|
||||
[]byte{0x2, 0, 0x7F, 0xFF, 0x1, 0x00, // primary
|
||||
sep, sep, 0, defS, 0, defS, 0, 0x30, 0, defS, // secondary
|
||||
sep, sep, defT, defT, defT, defT, // tertiary
|
||||
sep, 0xFF, 0xFF, 0xFF, 0xFF, // quaternary
|
||||
},
|
||||
},
|
||||
{ // same as first, with large primary values
|
||||
opts{alt: altShifted},
|
||||
ColElems{W(0x200), W(0x8000), W(0, 0x30), W(0x12345)},
|
||||
[]byte{0x2, 0, 0x80, 0x80, 0x00, 0x81, 0x23, 0x45, // primary
|
||||
sep, sep, 0, defS, 0, defS, 0, 0x30, 0, defS, // secondary
|
||||
sep, sep, defT, defT, defT, defT, // tertiary
|
||||
sep, 0xFF, 0xFF, 0xFF, 0xFF, // quaternary
|
||||
},
|
||||
},
|
||||
{ // same as first, but with the secondary level backwards
|
||||
opts{alt: altShifted, backwards: true},
|
||||
ColElems{W(0x200), W(0x7FFF), W(0, 0x30), W(0x100)},
|
||||
[]byte{0x2, 0, 0x7F, 0xFF, 0x1, 0x00, // primary
|
||||
sep, sep, 0, defS, 0, 0x30, 0, defS, 0, defS, // secondary
|
||||
sep, sep, defT, defT, defT, defT, // tertiary
|
||||
sep, 0xFF, 0xFF, 0xFF, 0xFF, // quaternary
|
||||
},
|
||||
},
|
||||
{ // same as first, ignoring quaternary level
|
||||
opts{alt: altShifted, lev: 3},
|
||||
ColElems{W(0x200), zero, W(0x7FFF), W(0, 0x30), zero, W(0x100)},
|
||||
[]byte{0x2, 0, 0x7F, 0xFF, 0x1, 0x00, // primary
|
||||
sep, sep, 0, defS, 0, defS, 0, 0x30, 0, defS, // secondary
|
||||
sep, sep, defT, defT, defT, defT, // tertiary
|
||||
},
|
||||
},
|
||||
{ // same as first, ignoring tertiary level
|
||||
opts{alt: altShifted, lev: 2},
|
||||
ColElems{W(0x200), zero, W(0x7FFF), W(0, 0x30), zero, W(0x100)},
|
||||
[]byte{0x2, 0, 0x7F, 0xFF, 0x1, 0x00, // primary
|
||||
sep, sep, 0, defS, 0, defS, 0, 0x30, 0, defS, // secondary
|
||||
},
|
||||
},
|
||||
{ // same as first, ignoring secondary level
|
||||
opts{alt: altShifted, lev: 1},
|
||||
ColElems{W(0x200), zero, W(0x7FFF), W(0, 0x30), zero, W(0x100)},
|
||||
[]byte{0x2, 0, 0x7F, 0xFF, 0x1, 0x00},
|
||||
},
|
||||
{ // simple primary and secondary weights.
|
||||
opts{alt: altShiftTrimmed, top: 0x250},
|
||||
ColElems{W(0x300), W(0x200), W(0x7FFF), W(0, 0x30), W(0x800)},
|
||||
[]byte{0x3, 0, 0x7F, 0xFF, 0x8, 0x00, // primary
|
||||
sep, sep, 0, defS, 0, defS, 0, 0x30, 0, defS, // secondary
|
||||
sep, sep, defT, defT, defT, defT, // tertiary
|
||||
sep, 0xFF, 0x2, 0, // quaternary
|
||||
},
|
||||
},
|
||||
{ // as first, primary with case level enabled
|
||||
opts{alt: altShifted, lev: 1, caseLevel: true},
|
||||
ColElems{W(0x200), W(0x7FFF), W(0, 0x30), W(0x100)},
|
||||
[]byte{0x2, 0, 0x7F, 0xFF, 0x1, 0x00, // primary
|
||||
sep, sep, // secondary
|
||||
sep, sep, defT, defT, defT, defT, // tertiary
|
||||
},
|
||||
},
|
||||
}
|
||||
|
||||
func TestKeyFromElems(t *testing.T) {
|
||||
buf := Buffer{}
|
||||
for i, tt := range keyFromElemTests {
|
||||
buf.Reset()
|
||||
in := convertFromWeights(tt.in)
|
||||
processWeights(tt.opt.alt, uint32(tt.opt.top), in)
|
||||
tt.opt.collator().keyFromElems(&buf, in)
|
||||
res := buf.key
|
||||
if len(res) != len(tt.out) {
|
||||
t.Errorf("%d: len(ws) was %d; want %d (%X should be %X)", i, len(res), len(tt.out), res, tt.out)
|
||||
}
|
||||
n := len(res)
|
||||
if len(tt.out) < n {
|
||||
n = len(tt.out)
|
||||
}
|
||||
for j, c := range res[:n] {
|
||||
if c != tt.out[j] {
|
||||
t.Errorf("%d: byte %d was %X; want %X", i, j, c, tt.out[j])
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestGetColElems(t *testing.T) {
|
||||
for i, tt := range appendNextTests {
|
||||
c, err := makeTable(tt.in)
|
||||
if err != nil {
|
||||
// error is reported in TestAppendNext
|
||||
continue
|
||||
}
|
||||
// Create one large test per table
|
||||
str := make([]byte, 0, 4000)
|
||||
out := ColElems{}
|
||||
for len(str) < 3000 {
|
||||
for _, chk := range tt.chk {
|
||||
str = append(str, chk.in[:chk.n]...)
|
||||
out = append(out, chk.out...)
|
||||
}
|
||||
}
|
||||
for j, chk := range append(tt.chk, check{string(str), len(str), out}) {
|
||||
out := convertFromWeights(chk.out)
|
||||
ce := c.getColElems([]byte(chk.in)[:chk.n])
|
||||
if len(ce) != len(out) {
|
||||
t.Errorf("%d:%d: len(ws) was %d; want %d", i, j, len(ce), len(out))
|
||||
continue
|
||||
}
|
||||
cnt := 0
|
||||
for k, w := range ce {
|
||||
w, _ = colltab.MakeElem(w.Primary(), w.Secondary(), int(w.Tertiary()), 0)
|
||||
if w != out[k] {
|
||||
t.Errorf("%d:%d: Weights %d was %X; want %X", i, j, k, w, out[k])
|
||||
cnt++
|
||||
}
|
||||
if cnt > 10 {
|
||||
break
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
type keyTest struct {
|
||||
in string
|
||||
out []byte
|
||||
}
|
||||
|
||||
var keyTests = []keyTest{
|
||||
{"abc",
|
||||
[]byte{0, 100, 0, 200, 1, 44, 0, 0, 0, 32, 0, 32, 0, 32, 0, 0, 2, 2, 2, 0, 255, 255, 255},
|
||||
},
|
||||
{"a\u0301",
|
||||
[]byte{0, 102, 0, 0, 0, 32, 0, 0, 2, 0, 255},
|
||||
},
|
||||
{"aaaaa",
|
||||
[]byte{0, 100, 0, 100, 0, 100, 0, 100, 0, 100, 0, 0,
|
||||
0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 0,
|
||||
2, 2, 2, 2, 2, 0,
|
||||
255, 255, 255, 255, 255,
|
||||
},
|
||||
},
|
||||
// Issue 16391: incomplete rune at end of UTF-8 sequence.
|
||||
{"\xc2", []byte{133, 255, 253, 0, 0, 0, 32, 0, 0, 2, 0, 255}},
|
||||
{"\xc2a", []byte{133, 255, 253, 0, 100, 0, 0, 0, 32, 0, 32, 0, 0, 2, 2, 0, 255, 255}},
|
||||
}
|
||||
|
||||
func TestKey(t *testing.T) {
|
||||
c, _ := makeTable(appendNextTests[4].in)
|
||||
c.alternate = altShifted
|
||||
c.ignore = ignore(colltab.Quaternary)
|
||||
buf := Buffer{}
|
||||
keys1 := [][]byte{}
|
||||
keys2 := [][]byte{}
|
||||
for _, tt := range keyTests {
|
||||
keys1 = append(keys1, c.Key(&buf, []byte(tt.in)))
|
||||
keys2 = append(keys2, c.KeyFromString(&buf, tt.in))
|
||||
}
|
||||
// Separate generation from testing to ensure buffers are not overwritten.
|
||||
for i, tt := range keyTests {
|
||||
if !bytes.Equal(keys1[i], tt.out) {
|
||||
t.Errorf("%d: Key(%q) = %d; want %d", i, tt.in, keys1[i], tt.out)
|
||||
}
|
||||
if !bytes.Equal(keys2[i], tt.out) {
|
||||
t.Errorf("%d: KeyFromString(%q) = %d; want %d", i, tt.in, keys2[i], tt.out)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
type compareTest struct {
|
||||
a, b string
|
||||
res int // comparison result
|
||||
}
|
||||
|
||||
var compareTests = []compareTest{
|
||||
{"a\u0301", "a", 1},
|
||||
{"a\u0301b", "ab", 1},
|
||||
{"a", "a\u0301", -1},
|
||||
{"ab", "a\u0301b", -1},
|
||||
{"bc", "a\u0301c", 1},
|
||||
{"ab", "aB", -1},
|
||||
{"a\u0301", "a\u0301", 0},
|
||||
{"a", "a", 0},
|
||||
// Only clip prefixes of whole runes.
|
||||
{"\u302E", "\u302F", 1},
|
||||
// Don't clip prefixes when last rune of prefix may be part of contraction.
|
||||
{"a\u035E", "a\u0301\u035F", -1},
|
||||
{"a\u0301\u035Fb", "a\u0301\u035F", -1},
|
||||
}
|
||||
|
||||
func TestCompare(t *testing.T) {
|
||||
c, _ := makeTable(appendNextTests[4].in)
|
||||
for i, tt := range compareTests {
|
||||
if res := c.Compare([]byte(tt.a), []byte(tt.b)); res != tt.res {
|
||||
t.Errorf("%d: Compare(%q, %q) == %d; want %d", i, tt.a, tt.b, res, tt.res)
|
||||
}
|
||||
if res := c.CompareString(tt.a, tt.b); res != tt.res {
|
||||
t.Errorf("%d: CompareString(%q, %q) == %d; want %d", i, tt.a, tt.b, res, tt.res)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestNumeric(t *testing.T) {
|
||||
c := New(language.English, Loose, Numeric)
|
||||
|
||||
for i, tt := range []struct {
|
||||
a, b string
|
||||
want int
|
||||
}{
|
||||
{"1", "2", -1},
|
||||
{"2", "12", -1},
|
||||
{"２", "１２", -1}, // Fullwidth is sorted as usual.
|
||||
{"₂", "₁₂", 1}, // Subscript is not sorted as numbers.
|
||||
{"②", "①②", 1}, // Circled is not sorted as numbers.
|
||||
{ // Imperial Aramaic is not sorted as numbers.
|
||||
"\U00010859",
|
||||
"\U00010858\U00010859",
|
||||
1,
|
||||
},
|
||||
{"12", "2", 1},
|
||||
{"A-1", "A-2", -1},
|
||||
{"A-2", "A-12", -1},
|
||||
{"A-12", "A-2", 1},
|
||||
{"A-0001", "A-1", 0},
|
||||
} {
|
||||
if got := c.CompareString(tt.a, tt.b); got != tt.want {
|
||||
t.Errorf("%d: CompareString(%s, %s) = %d; want %d", i, tt.a, tt.b, got, tt.want)
|
||||
}
|
||||
}
|
||||
}
|
||||
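The Shifted cases in processTests can be reproduced with a reduced model of processWeights. This is only a sketch under stated assumptions: elem is a hypothetical three-field stand-in for colltab.Elem, shift is an illustrative name, and the MaxQuaternary value that non-variable elements carry in the real package is not modeled.

```go
package main

import "fmt"

// elem is a simplified, hypothetical stand-in for colltab.Elem.
type elem struct {
	primary, secondary, quaternary int
}

// shift applies the "shifted" alternate handling to ws in place.
// An element whose primary weight is non-zero and at most variableTop is
// variable: its primary weight moves to the quaternary level. Primary
// ignorables that directly follow a variable element are zeroed.
func shift(ws []elem, variableTop int) {
	ignore := false
	for i := range ws {
		p := ws[i].primary
		switch {
		case p != 0 && p <= variableTop:
			ws[i] = elem{quaternary: p}
			ignore = true
		case p == 0:
			if ignore {
				ws[i] = elem{}
			}
		default:
			ignore = false
		}
	}
}

func main() {
	// Mirrors the "primary ignorables" case with top: 250.
	ws := []elem{{200, 32, 0}, {0, 10, 0}, {300, 32, 0}, {0, 15, 0}, {400, 32, 0}}
	shift(ws, 250)
	fmt.Println(ws)
}
```

In this run the first element is shifted, the ignorable after it is dropped, and the ignorable after the non-variable 300 is kept, which matches the shape of the expected output in the corresponding test case.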
51
vendor/golang.org/x/text/collate/export_test.go
generated
vendored
Normal file
|
|
@ -0,0 +1,51 @@
|
|||
// Copyright 2012 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package collate
|
||||
|
||||
// Export for testing.
|
||||
// TODO: no longer necessary. Remove at some point.
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
|
||||
"golang.org/x/text/internal/colltab"
|
||||
)
|
||||
|
||||
const (
|
||||
defaultSecondary = 0x20
|
||||
defaultTertiary = 0x2
|
||||
)
|
||||
|
||||
type Weights struct {
|
||||
Primary, Secondary, Tertiary, Quaternary int
|
||||
}
|
||||
|
||||
func W(ce ...int) Weights {
|
||||
w := Weights{ce[0], defaultSecondary, defaultTertiary, 0}
|
||||
if len(ce) > 1 {
|
||||
w.Secondary = ce[1]
|
||||
}
|
||||
if len(ce) > 2 {
|
||||
w.Tertiary = ce[2]
|
||||
}
|
||||
if len(ce) > 3 {
|
||||
w.Quaternary = ce[3]
|
||||
}
|
||||
return w
|
||||
}
|
||||
func (w Weights) String() string {
|
||||
return fmt.Sprintf("[%X.%X.%X.%X]", w.Primary, w.Secondary, w.Tertiary, w.Quaternary)
|
||||
}
|
||||
|
||||
func convertFromWeights(ws []Weights) []colltab.Elem {
|
||||
out := make([]colltab.Elem, len(ws))
|
||||
for i, w := range ws {
|
||||
out[i], _ = colltab.MakeElem(w.Primary, w.Secondary, w.Tertiary, 0)
|
||||
if out[i] == colltab.Ignore && w.Quaternary > 0 {
|
||||
out[i] = colltab.MakeQuaternary(w.Quaternary)
|
||||
}
|
||||
}
|
||||
return out
|
||||
}
|
||||
32
vendor/golang.org/x/text/collate/index.go
generated
vendored
Normal file
|
|
@ -0,0 +1,32 @@
|
|||
// Copyright 2013 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package collate
|
||||
|
||||
import "golang.org/x/text/internal/colltab"
|
||||
|
||||
const blockSize = 64
|
||||
|
||||
func getTable(t tableIndex) *colltab.Table {
|
||||
return &colltab.Table{
|
||||
Index: colltab.Trie{
|
||||
Index0: mainLookup[:][blockSize*t.lookupOffset:],
|
||||
Values0: mainValues[:][blockSize*t.valuesOffset:],
|
||||
Index: mainLookup[:],
|
||||
Values: mainValues[:],
|
||||
},
|
||||
ExpandElem: mainExpandElem[:],
|
||||
ContractTries: colltab.ContractTrieSet(mainCTEntries[:]),
|
||||
ContractElem: mainContractElem[:],
|
||||
MaxContractLen: 18,
|
||||
VariableTop: varTop,
|
||||
}
|
||||
}
|
||||
|
||||
// tableIndex holds information for constructing a table
|
||||
// for a certain locale based on the main table.
|
||||
type tableIndex struct {
|
||||
lookupOffset uint32
|
||||
valuesOffset uint32
|
||||
}
|
||||
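getTable in index.go builds a per-locale table by slicing shared arrays at a block offset rather than copying them. The sketch below illustrates that slicing pattern only. It assumes a made-up shared array, a smaller block size, and a hypothetical helper name (localeView); none of these come from the package.

```go
package main

import "fmt"

// blockSize plays the role of the constant in index.go; 4 is chosen here
// only to keep the example small.
const blockSize = 4

// sharedLookup stands in for a main lookup array shared by all locales.
// Its contents are invented for illustration.
var sharedLookup = [12]uint16{0, 1, 2, 3, 10, 11, 12, 13, 20, 21, 22, 23}

// localeView returns the tail of the shared array that begins at the given
// block offset. The result aliases sharedLookup; no data is copied.
func localeView(blockOffset uint32) []uint16 {
	return sharedLookup[:][blockSize*blockOffset:]
}

func main() {
	fmt.Println(localeView(0))
	fmt.Println(localeView(2))
}
```

Because each view aliases the same backing array, adding a locale in this scheme costs two offsets (as in tableIndex) instead of a second copy of the data.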
553
vendor/golang.org/x/text/collate/maketables.go
generated
vendored
Normal file
|
|
@ -0,0 +1,553 @@
|
|||
// Copyright 2012 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
// +build ignore
|
||||
|
||||
// Collation table generator.
|
||||
// Data read from the web.
|
||||
|
||||
package main
|
||||
|
||||
import (
|
||||
"archive/zip"
|
||||
"bufio"
|
||||
"bytes"
|
||||
"flag"
|
||||
"fmt"
|
||||
"io"
|
||||
"io/ioutil"
|
||||
"log"
|
||||
"os"
|
||||
"regexp"
|
||||
"sort"
|
||||
"strconv"
|
||||
"strings"
|
||||
"unicode/utf8"
|
||||
|
||||
"golang.org/x/text/collate"
|
||||
"golang.org/x/text/collate/build"
|
||||
"golang.org/x/text/internal/colltab"
|
||||
"golang.org/x/text/internal/gen"
|
||||
"golang.org/x/text/language"
|
||||
"golang.org/x/text/unicode/cldr"
|
||||
)
|
||||
|
||||
var (
|
||||
test = flag.Bool("test", false,
|
||||
"test existing tables; can be used to compare web data with package data.")
|
||||
short = flag.Bool("short", false, `Use "short" alternatives, when available.`)
|
||||
draft = flag.Bool("draft", false, `Use draft versions, when available.`)
|
||||
tags = flag.String("tags", "", "build tags to be included after +build directive")
|
||||
pkg = flag.String("package", "collate",
|
||||
"the name of the package in which the generated file is to be included")
|
||||
|
||||
tables = flagStringSetAllowAll("tables", "collate", "collate,chars",
|
||||
"comma-separated list of tables to generate.")
|
||||
exclude = flagStringSet("exclude", "zh2", "",
|
||||
"comma-separated list of languages to exclude.")
|
||||
include = flagStringSet("include", "", "",
|
||||
"comma-separated list of languages to include. Include trumps exclude.")
|
||||
// TODO: Not included: unihan gb2312han zhuyin big5han (for size reasons)
|
||||
// TODO: Not included: traditional (buggy for Bengali)
|
||||
types = flagStringSetAllowAll("types", "standard,phonebook,phonetic,reformed,pinyin,stroke", "",
|
||||
"comma-separated list of types that should be included.")
|
||||
)
|
||||
|
||||
// stringSet implements an ordered set based on a list. It implements flag.Value
|
||||
// to allow a set to be specified as a comma-separated list.
|
||||
type stringSet struct {
|
||||
s []string
|
||||
allowed *stringSet
|
||||
dirty bool // needs compaction if true
|
||||
all bool
|
||||
allowAll bool
|
||||
}
|
||||
|
||||
func flagStringSet(name, def, allowed, usage string) *stringSet {
|
||||
ss := &stringSet{}
|
||||
if allowed != "" {
|
||||
usage += fmt.Sprintf(" (allowed values: any of %s)", allowed)
|
||||
ss.allowed = &stringSet{}
|
||||
failOnError(ss.allowed.Set(allowed))
|
||||
}
|
||||
ss.Set(def)
|
||||
flag.Var(ss, name, usage)
|
||||
return ss
|
||||
}
|
||||
|
||||
func flagStringSetAllowAll(name, def, allowed, usage string) *stringSet {
|
||||
ss := &stringSet{allowAll: true}
|
||||
if allowed == "" {
|
||||
flag.Var(ss, name, usage+` Use "all" to select all.`)
|
||||
} else {
|
||||
ss.allowed = &stringSet{}
|
||||
failOnError(ss.allowed.Set(allowed))
|
||||
flag.Var(ss, name, usage+fmt.Sprintf(` (allowed values: "all" or any of %s)`, allowed))
|
||||
}
|
||||
ss.Set(def)
|
||||
return ss
|
||||
}
|
||||
|
||||
func (ss stringSet) Len() int {
|
||||
return len(ss.s)
|
||||
}
|
||||
|
||||
func (ss stringSet) String() string {
|
||||
return strings.Join(ss.s, ",")
|
||||
}
|
||||
|
||||
func (ss *stringSet) Set(s string) error {
|
||||
if ss.allowAll && s == "all" {
|
||||
ss.s = nil
|
||||
ss.all = true
|
||||
return nil
|
||||
}
|
||||
ss.s = ss.s[:0]
|
||||
for _, s := range strings.Split(s, ",") {
|
||||
if s := strings.TrimSpace(s); s != "" {
|
||||
if ss.allowed != nil && !ss.allowed.contains(s) {
|
||||
return fmt.Errorf("unsupported value %q; must be one of %s", s, ss.allowed)
|
||||
}
|
||||
ss.add(s)
|
||||
}
|
||||
}
|
||||
ss.compact()
|
||||
return nil
|
||||
}
|
||||
|
||||
func (ss *stringSet) add(s string) {
|
||||
ss.s = append(ss.s, s)
|
||||
ss.dirty = true
|
||||
}
|
||||
|
||||
func (ss *stringSet) values() []string {
|
||||
ss.compact()
|
||||
return ss.s
|
||||
}
|
||||
|
||||
func (ss *stringSet) contains(s string) bool {
|
||||
if ss.all {
|
||||
return true
|
||||
}
|
||||
for _, v := range ss.s {
|
||||
if v == s {
|
||||
return true
|
||||
}
|
||||
}
|
||||
return false
|
||||
}
|
||||
|
||||
func (ss *stringSet) compact() {
|
||||
if !ss.dirty {
|
||||
return
|
||||
}
|
||||
a := ss.s
|
||||
sort.Strings(a)
|
||||
k := 0
|
||||
for i := 1; i < len(a); i++ {
|
||||
if a[k] != a[i] {
|
||||
a[k+1] = a[i]
|
||||
k++
|
||||
}
|
||||
}
|
||||
ss.s = a[:k+1]
|
||||
ss.dirty = false
|
||||
}
|
||||
|
||||
func skipLang(l string) bool {
|
||||
if include.Len() > 0 {
|
||||
return !include.contains(l)
|
||||
}
|
||||
return exclude.contains(l)
|
||||
}
|
||||
|
||||
// altInclude returns a list of alternatives (for the LDML alt attribute)
|
||||
// in order of preference. An empty string in this list indicates the
|
||||
// default entry.
|
||||
func altInclude() []string {
|
||||
l := []string{}
|
||||
if *short {
|
||||
l = append(l, "short")
|
||||
}
|
||||
l = append(l, "")
|
||||
// TODO: handle draft using cldr.SetDraftLevel
|
||||
if *draft {
|
||||
l = append(l, "proposed")
|
||||
}
|
||||
return l
|
||||
}
|
||||
|
||||
func failOnError(e error) {
|
||||
if e != nil {
|
||||
log.Panic(e)
|
||||
}
|
||||
}
|
||||
|
||||
func openArchive() *zip.Reader {
|
||||
f := gen.OpenCLDRCoreZip()
|
||||
buffer, err := ioutil.ReadAll(f)
|
||||
f.Close()
|
||||
failOnError(err)
|
||||
archive, err := zip.NewReader(bytes.NewReader(buffer), int64(len(buffer)))
|
||||
failOnError(err)
|
||||
return archive
|
||||
}
|
||||
|
||||
// parseUCA parses a Default Unicode Collation Element Table of the format
|
||||
// specified in http://www.unicode.org/reports/tr10/#File_Format.
|
||||
// Each parsed entry is added to builder; nothing is returned.
|
||||
func parseUCA(builder *build.Builder) {
|
||||
var r io.ReadCloser
|
||||
var err error
|
||||
for _, f := range openArchive().File {
|
||||
if strings.HasSuffix(f.Name, "allkeys_CLDR.txt") {
|
||||
r, err = f.Open()
|
||||
}
|
||||
}
|
||||
if r == nil {
|
||||
log.Fatal("File allkeys_CLDR.txt not found in archive.")
|
||||
}
|
||||
failOnError(err)
|
||||
defer r.Close()
|
||||
scanner := bufio.NewScanner(r)
|
||||
colelem := regexp.MustCompile(`\[([.*])([0-9A-F.]+)\]`)
|
||||
for i := 1; scanner.Scan(); i++ {
|
||||
line := scanner.Text()
|
||||
if len(line) == 0 || line[0] == '#' {
|
||||
continue
|
||||
}
|
||||
if line[0] == '@' {
|
||||
// parse properties
|
||||
switch {
|
||||
case strings.HasPrefix(line[1:], "version "):
|
||||
a := strings.Split(line[1:], " ")
|
||||
if a[1] != gen.UnicodeVersion() {
|
||||
log.Fatalf("incompatible version %s; want %s", a[1], gen.UnicodeVersion())
|
||||
}
|
||||
case strings.HasPrefix(line[1:], "backwards "):
|
||||
log.Fatalf("%d: unsupported option backwards", i)
|
||||
default:
|
||||
log.Printf("%d: unknown option %s", i, line[1:])
|
||||
}
|
||||
} else {
|
||||
// parse entries
|
||||
part := strings.Split(line, " ; ")
|
||||
if len(part) != 2 {
|
||||
log.Fatalf("%d: production rule without ';': %v", i, line)
|
||||
}
|
||||
lhs := []rune{}
|
||||
for _, v := range strings.Split(part[0], " ") {
|
||||
if v == "" {
|
||||
continue
|
||||
}
|
||||
lhs = append(lhs, rune(convHex(i, v)))
|
||||
}
|
||||
var n int
|
||||
var vars []int
|
||||
rhs := [][]int{}
|
||||
for i, m := range colelem.FindAllStringSubmatch(part[1], -1) {
|
||||
n += len(m[0])
|
||||
elem := []int{}
|
||||
for _, h := range strings.Split(m[2], ".") {
|
||||
elem = append(elem, convHex(i, h))
|
||||
}
|
||||
if m[1] == "*" {
|
||||
vars = append(vars, i)
|
||||
}
|
||||
rhs = append(rhs, elem)
|
||||
}
|
||||
if len(part[1]) < n+3 || part[1][n+1] != '#' {
|
||||
log.Fatalf("%d: expected comment; found %s", i, part[1][n:])
|
||||
}
|
||||
if *test {
|
||||
testInput.add(string(lhs))
|
||||
}
|
||||
failOnError(builder.Add(lhs, rhs, vars))
|
||||
}
|
||||
}
|
||||
if scanner.Err() != nil {
|
||||
log.Fatal(scanner.Err())
|
||||
}
|
||||
}
|
||||
|
||||
func convHex(line int, s string) int {
|
||||
r, e := strconv.ParseInt(s, 16, 32)
|
||||
if e != nil {
|
||||
log.Fatalf("%d: %v", line, e)
|
||||
}
|
||||
return int(r)
|
||||
}
|
||||
|
||||
var testInput = stringSet{}
|
||||
|
||||
var charRe = regexp.MustCompile(`&#x([0-9A-F]*);`)
|
||||
var tagRe = regexp.MustCompile(`<([a-z_]*) */>`)
|
||||
|
||||
var mainLocales = []string{}
|
||||
|
||||
// charSets holds a list of exemplar characters per category.
|
||||
type charSets map[string][]string
|
||||
|
||||
func (p charSets) fprint(w io.Writer) {
|
||||
fmt.Fprintln(w, "[exN]string{")
|
||||
for i, k := range []string{"", "contractions", "punctuation", "auxiliary", "currencySymbol", "index"} {
|
||||
if set := p[k]; len(set) != 0 {
|
||||
fmt.Fprintf(w, "\t\t%d: %q,\n", i, strings.Join(set, " "))
|
||||
}
|
||||
}
|
||||
fmt.Fprintln(w, "\t},")
|
||||
}
|
||||
|
||||
var localeChars = make(map[string]charSets)
|
||||
|
||||
const exemplarHeader = `
|
||||
type exemplarType int
|
||||
const (
|
||||
exCharacters exemplarType = iota
|
||||
exContractions
|
||||
exPunctuation
|
||||
exAuxiliary
|
||||
exCurrency
|
||||
exIndex
|
||||
exN
|
||||
)
|
||||
`
|
||||
|
||||
func printExemplarCharacters(w io.Writer) {
|
||||
fmt.Fprintln(w, exemplarHeader)
|
||||
fmt.Fprintln(w, "var exemplarCharacters = map[string][exN]string{")
|
||||
for _, loc := range mainLocales {
|
||||
fmt.Fprintf(w, "\t%q: ", loc)
|
||||
localeChars[loc].fprint(w)
|
||||
}
|
||||
fmt.Fprintln(w, "}")
|
||||
}
|
||||
|
||||
func decodeCLDR(d *cldr.Decoder) *cldr.CLDR {
|
||||
r := gen.OpenCLDRCoreZip()
|
||||
data, err := d.DecodeZip(r)
|
||||
failOnError(err)
|
||||
return data
|
||||
}
|
||||
|
||||
// parseMain parses XML files in the main directory of the CLDR core.zip file.
|
||||
func parseMain() {
|
||||
d := &cldr.Decoder{}
|
||||
d.SetDirFilter("main")
|
||||
d.SetSectionFilter("characters")
|
||||
data := decodeCLDR(d)
|
||||
for _, loc := range data.Locales() {
|
||||
x := data.RawLDML(loc)
|
||||
if skipLang(x.Identity.Language.Type) {
|
||||
continue
|
||||
}
|
||||
if x.Characters != nil {
|
||||
x, _ = data.LDML(loc)
|
||||
loc = language.Make(loc).String()
|
||||
for _, ec := range x.Characters.ExemplarCharacters {
|
||||
if ec.Draft != "" {
|
||||
continue
|
||||
}
|
||||
if _, ok := localeChars[loc]; !ok {
|
||||
mainLocales = append(mainLocales, loc)
|
||||
localeChars[loc] = make(charSets)
|
||||
}
|
||||
localeChars[loc][ec.Type] = parseCharacters(ec.Data())
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func parseCharacters(chars string) []string {
|
||||
parseSingle := func(s string) (r rune, tail string, escaped bool) {
|
||||
if s[0] == '\\' {
|
||||
return rune(s[1]), s[2:], true
|
||||
}
|
||||
r, sz := utf8.DecodeRuneInString(s)
|
||||
return r, s[sz:], false
|
||||
}
|
||||
chars = strings.TrimSpace(chars)
|
||||
if n := len(chars) - 1; chars[n] == ']' && chars[0] == '[' {
|
||||
chars = chars[1:n]
|
||||
}
|
||||
list := []string{}
|
||||
var r, last, end rune
|
||||
for len(chars) > 0 {
|
||||
if chars[0] == '{' { // character sequence
|
||||
buf := []rune{}
|
||||
for chars = chars[1:]; len(chars) > 0; {
|
||||
r, chars, _ = parseSingle(chars)
|
||||
if r == '}' {
|
||||
break
|
||||
}
|
||||
if r == ' ' {
|
||||
log.Fatalf("space not supported in sequence %q", chars)
|
||||
}
|
||||
buf = append(buf, r)
|
||||
}
|
||||
list = append(list, string(buf))
|
||||
last = 0
|
||||
} else { // single character
|
||||
escaped := false
|
||||
r, chars, escaped = parseSingle(chars)
|
||||
if r != ' ' {
|
||||
if r == '-' && !escaped {
|
||||
if last == 0 {
|
||||
log.Fatal("'-' should be preceded by a character")
|
||||
}
|
||||
end, chars, _ = parseSingle(chars)
|
||||
for ; last <= end; last++ {
|
||||
list = append(list, string(last))
|
||||
}
|
||||
last = 0
|
||||
} else {
|
||||
list = append(list, string(r))
|
||||
last = r
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
return list
|
||||
}
|
||||
|
||||
var fileRe = regexp.MustCompile(`.*/collation/(.*)\.xml`)
|
||||
|
||||
// typeMap translates legacy type keys to their BCP47 equivalent.
|
||||
var typeMap = map[string]string{
|
||||
"phonebook": "phonebk",
|
||||
"traditional": "trad",
|
||||
}
|
||||
|
||||
// parseCollation parses XML files in the collation directory of the CLDR core.zip file.
|
||||
func parseCollation(b *build.Builder) {
|
||||
d := &cldr.Decoder{}
|
||||
d.SetDirFilter("collation")
|
||||
data := decodeCLDR(d)
|
||||
for _, loc := range data.Locales() {
|
||||
x, err := data.LDML(loc)
|
||||
failOnError(err)
|
||||
if skipLang(x.Identity.Language.Type) {
|
||||
continue
|
||||
}
|
||||
cs := x.Collations.Collation
|
||||
sl := cldr.MakeSlice(&cs)
|
||||
if len(types.s) == 0 {
|
||||
sl.SelectAnyOf("type", x.Collations.Default())
|
||||
} else if !types.all {
|
||||
sl.SelectAnyOf("type", types.s...)
|
||||
}
|
||||
sl.SelectOnePerGroup("alt", altInclude())
|
||||
|
||||
for _, c := range cs {
|
||||
id, err := language.Parse(loc)
|
||||
if err != nil {
|
||||
fmt.Fprintf(os.Stderr, "invalid locale: %q\n", err)
|
||||
continue
|
||||
}
|
||||
// Support both old- and new-style defaults.
|
||||
d := c.Type
|
||||
if x.Collations.DefaultCollation == nil {
|
||||
d = x.Collations.Default()
|
||||
} else {
|
||||
d = x.Collations.DefaultCollation.Data()
|
||||
}
|
||||
// We assume tables are being built either for search or collation,
|
||||
// but not both. For search the default is always "search".
|
||||
if d != c.Type && c.Type != "search" {
|
||||
typ := c.Type
|
||||
if len(c.Type) > 8 {
|
||||
typ = typeMap[c.Type]
|
||||
}
|
||||
id, err = id.SetTypeForKey("co", typ)
|
||||
failOnError(err)
|
||||
}
|
||||
t := b.Tailoring(id)
|
||||
c.Process(processor{t})
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
type processor struct {
|
||||
t *build.Tailoring
|
||||
}
|
||||
|
||||
func (p processor) Reset(anchor string, before int) (err error) {
|
||||
if before != 0 {
|
||||
err = p.t.SetAnchorBefore(anchor)
|
||||
} else {
|
||||
err = p.t.SetAnchor(anchor)
|
||||
}
|
||||
failOnError(err)
|
||||
return nil
|
||||
}
|
||||
|
||||
func (p processor) Insert(level int, str, context, extend string) error {
|
||||
str = context + str
|
||||
if *test {
|
||||
testInput.add(str)
|
||||
}
|
||||
// TODO: mimic bug in old maketables: remove.
|
||||
err := p.t.Insert(colltab.Level(level-1), str, context+extend)
|
||||
failOnError(err)
|
||||
return nil
|
||||
}
|
||||
|
||||
func (p processor) Index(id string) {
|
||||
}
|
||||
|
||||
func testCollator(c *collate.Collator) {
|
||||
c0 := collate.New(language.Und)
|
||||
|
||||
// Iterate over all characters for all locales and check
|
||||
// whether Key is equal.
|
||||
buf := collate.Buffer{}
|
||||
|
||||
// Add all common and not too uncommon runes to the test set.
|
||||
for i := rune(0); i < 0x30000; i++ {
|
||||
testInput.add(string(i))
|
||||
}
|
||||
for i := rune(0xE0000); i < 0xF0000; i++ {
|
||||
testInput.add(string(i))
|
||||
}
|
||||
for _, str := range testInput.values() {
|
||||
k0 := c0.KeyFromString(&buf, str)
|
||||
k := c.KeyFromString(&buf, str)
|
||||
if !bytes.Equal(k0, k) {
|
||||
failOnError(fmt.Errorf("test:%U: keys differ (%x vs %x)", []rune(str), k0, k))
|
||||
}
|
||||
buf.Reset()
|
||||
}
|
||||
fmt.Println("PASS")
|
||||
}
|
||||
|
||||
func main() {
|
||||
gen.Init()
|
||||
b := build.NewBuilder()
|
||||
parseUCA(b)
|
||||
if tables.contains("chars") {
|
||||
parseMain()
|
||||
}
|
||||
parseCollation(b)
|
||||
|
||||
c, err := b.Build()
|
||||
failOnError(err)
|
||||
|
||||
if *test {
|
||||
testCollator(collate.NewFromTable(c))
|
||||
} else {
|
||||
w := &bytes.Buffer{}
|
||||
|
||||
gen.WriteUnicodeVersion(w)
|
||||
gen.WriteCLDRVersion(w)
|
||||
|
||||
if tables.contains("collate") {
|
||||
_, err = b.Print(w)
|
||||
failOnError(err)
|
||||
}
|
||||
if tables.contains("chars") {
|
||||
printExemplarCharacters(w)
|
||||
}
|
||||
gen.WriteGoFile("tables.go", *pkg, w.Bytes())
|
||||
}
|
||||
}
|
||||
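The `stringSet.compact` method in the generator above removes duplicates by sorting the slice and then overwriting repeated entries in place. The following is a minimal standalone sketch of that idiom, not part of the vendored file. The helper name `compactStrings` is hypothetical, and the explicit empty-slice guard is an assumption added here (the original relies on its `dirty` flag instead).

```go
package main

import (
	"fmt"
	"sort"
)

// compactStrings is a hypothetical helper (not in the vendored code).
// It sorts a and removes duplicates in place, returning the shortened slice.
func compactStrings(a []string) []string {
	if len(a) == 0 {
		// Guard added for this sketch; a[:k+1] below needs at least one element.
		return a
	}
	sort.Strings(a)
	k := 0 // index of the last unique element kept so far
	for i := 1; i < len(a); i++ {
		if a[k] != a[i] {
			k++
			a[k] = a[i]
		}
	}
	return a[:k+1]
}

func main() {
	types := []string{"stroke", "pinyin", "stroke", "standard"}
	fmt.Println(compactStrings(types))
}
```

Because the slice is sorted first, equal values are adjacent and a single pass is enough; the cost is that the caller's original ordering is lost, which is acceptable for a flag holding a set of names.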
239
vendor/golang.org/x/text/collate/option.go
generated
vendored
Normal file
|
|
@ -0,0 +1,239 @@
|
|||
// Copyright 2014 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package collate
|
||||
|
||||
import (
|
||||
"sort"
|
||||
|
||||
"golang.org/x/text/internal/colltab"
|
||||
"golang.org/x/text/language"
|
||||
"golang.org/x/text/unicode/norm"
|
||||
)
|
||||
|
||||
// newCollator creates a new collator with default options configured.
|
||||
func newCollator(t colltab.Weighter) *Collator {
|
||||
// Initialize a collator with default options.
|
||||
c := &Collator{
|
||||
options: options{
|
||||
ignore: [colltab.NumLevels]bool{
|
||||
colltab.Quaternary: true,
|
||||
colltab.Identity: true,
|
||||
},
|
||||
f: norm.NFD,
|
||||
t: t,
|
||||
},
|
||||
}
|
||||
|
||||
// TODO: store vt in tags or remove.
|
||||
c.variableTop = t.Top()
|
||||
|
||||
return c
|
||||
}
|
||||
|
||||
// An Option is used to change the behavior of a Collator. Options override the
|
||||
// settings passed through the locale identifier.
|
||||
type Option struct {
|
||||
priority int
|
||||
f func(o *options)
|
||||
}
|
||||
|
||||
type prioritizedOptions []Option
|
||||
|
||||
func (p prioritizedOptions) Len() int {
|
||||
return len(p)
|
||||
}
|
||||
|
||||
func (p prioritizedOptions) Swap(i, j int) {
|
||||
p[i], p[j] = p[j], p[i]
|
||||
}
|
||||
|
||||
func (p prioritizedOptions) Less(i, j int) bool {
|
||||
return p[i].priority < p[j].priority
|
||||
}
|
||||
|
||||
type options struct {
|
||||
// ignore specifies which levels to ignore.
|
||||
ignore [colltab.NumLevels]bool
|
||||
|
||||
// caseLevel is true if there is an additional level of case matching
|
||||
// between the secondary and tertiary levels.
|
||||
caseLevel bool
|
||||
|
||||
// backwards specifies the order of sorting at the secondary level.
|
||||
// This option exists predominantly to support reverse sorting of accents in French.
|
||||
backwards bool
|
||||
|
||||
// numeric specifies whether any sequence of decimal digits (category is Nd)
|
||||
// is sorted at a primary level with its numeric value.
|
||||
// For example, "A-21" < "A-123".
|
||||
// This option is set by wrapping the main Weighter with NewNumericWeighter.
|
||||
numeric bool
|
||||
|
||||
// alternate specifies an alternative handling of variables.
|
||||
alternate alternateHandling
|
||||
|
||||
// variableTop is the largest primary value that is considered to be
|
||||
// variable.
|
||||
variableTop uint32
|
||||
|
||||
t colltab.Weighter
|
||||
|
||||
f norm.Form
|
||||
}
|
||||
|
||||
func (o *options) setOptions(opts []Option) {
|
||||
sort.Sort(prioritizedOptions(opts))
|
||||
for _, x := range opts {
|
||||
x.f(o)
|
||||
}
|
||||
}
|
||||
|
||||
// OptionsFromTag extracts the BCP47 collation options from the tag and
|
||||
// configures a collator accordingly. These options are set before any other
|
||||
// option.
|
||||
func OptionsFromTag(t language.Tag) Option {
|
||||
return Option{0, func(o *options) {
|
||||
o.setFromTag(t)
|
||||
}}
|
||||
}
|
||||
|
||||
func (o *options) setFromTag(t language.Tag) {
|
||||
o.caseLevel = ldmlBool(t, o.caseLevel, "kc")
|
||||
o.backwards = ldmlBool(t, o.backwards, "kb")
|
||||
o.numeric = ldmlBool(t, o.numeric, "kn")
|
||||
|
||||
// Extract settings from the BCP47 u extension.
|
||||
switch t.TypeForKey("ks") { // strength
|
||||
case "level1":
|
||||
o.ignore[colltab.Secondary] = true
|
||||
o.ignore[colltab.Tertiary] = true
|
||||
case "level2":
|
||||
o.ignore[colltab.Tertiary] = true
|
||||
case "level3", "":
|
||||
// The default.
|
||||
case "level4":
|
||||
o.ignore[colltab.Quaternary] = false
|
||||
case "identic":
|
||||
o.ignore[colltab.Quaternary] = false
|
||||
o.ignore[colltab.Identity] = false
|
||||
}
|
||||
|
||||
switch t.TypeForKey("ka") {
|
||||
case "shifted":
|
||||
o.alternate = altShifted
|
||||
// The following two types are not official BCP47, but we support them to
|
||||
// give access to this otherwise hidden functionality. The name blanked is
|
||||
// derived from the LDML name blanked and posix reflects the main use of
|
||||
// the shift-trimmed option.
|
||||
case "blanked":
|
||||
o.alternate = altBlanked
|
||||
case "posix":
|
||||
o.alternate = altShiftTrimmed
|
||||
}
|
||||
|
||||
// TODO: caseFirst ("kf"), reorder ("kr"), and maybe variableTop ("vt").
|
||||
|
||||
// Not used:
|
||||
// - normalization ("kk", not necessary for this implementation)
|
||||
// - hiraganaQuaternary ("kh", obsolete)
|
||||
}
|
||||
|
||||
func ldmlBool(t language.Tag, old bool, key string) bool {
|
||||
switch t.TypeForKey(key) {
|
||||
case "true":
|
||||
return true
|
||||
case "false":
|
||||
return false
|
||||
default:
|
||||
return old
|
||||
}
|
||||
}
|
||||
|
||||
var (
|
||||
// IgnoreCase sets case-insensitive comparison.
|
||||
IgnoreCase Option = ignoreCase
|
||||
ignoreCase = Option{3, ignoreCaseF}
|
||||
|
||||
// IgnoreDiacritics causes diacritical marks to be ignored. ("o" == "ö").
|
||||
IgnoreDiacritics Option = ignoreDiacritics
|
||||
ignoreDiacritics = Option{3, ignoreDiacriticsF}
|
||||
|
||||
// IgnoreWidth causes full-width characters to match their half-width
|
||||
// equivalents.
|
||||
IgnoreWidth Option = ignoreWidth
|
||||
ignoreWidth = Option{2, ignoreWidthF}
|
||||
|
||||
// Loose sets the collator to ignore diacritics, case and width.
|
||||
Loose Option = loose
|
||||
loose = Option{4, looseF}
|
||||
|
||||
// Force forces an ordering for strings that are equivalent but not equal.
|
||||
Force Option = force
|
||||
force = Option{5, forceF}
|
||||
|
||||
// Numeric specifies that numbers should sort numerically ("2" < "12").
|
||||
Numeric Option = numeric
|
||||
numeric = Option{5, numericF}
|
||||
)
|
||||
|
||||
func ignoreWidthF(o *options) {
|
||||
o.ignore[colltab.Tertiary] = true
|
||||
o.caseLevel = true
|
||||
}
|
||||
|
||||
func ignoreDiacriticsF(o *options) {
|
||||
o.ignore[colltab.Secondary] = true
|
||||
}
|
||||
|
||||
func ignoreCaseF(o *options) {
|
||||
o.ignore[colltab.Tertiary] = true
|
||||
o.caseLevel = false
|
||||
}
|
||||
|
||||
func looseF(o *options) {
|
||||
ignoreWidthF(o)
|
||||
ignoreDiacriticsF(o)
|
||||
ignoreCaseF(o)
|
||||
}
|
||||
|
||||
func forceF(o *options) {
|
||||
o.ignore[colltab.Identity] = false
|
||||
}
|
||||
|
||||
func numericF(o *options) { o.numeric = true }
|
||||
|
||||
// Reorder overrides the pre-defined ordering of scripts and character sets.
|
||||
func Reorder(s ...string) Option {
|
||||
// TODO: need fractional weights to implement this.
|
||||
panic("TODO: implement")
|
||||
}
|
||||
|
||||
// TODO: consider making these public again. These options cannot be fully
|
||||
// specified in BCP47, so an API interface seems warranted. Still a higher-level
|
||||
// interface would be nice (e.g. a POSIX option for enabling altShiftTrimmed)
|
||||
|
||||
// alternateHandling identifies the various ways in which variables are handled.
|
||||
// A rune with a primary weight lower than the variable top is considered a
|
||||
// variable.
|
||||
// See http://www.unicode.org/reports/tr10/#Variable_Weighting for details.
|
||||
type alternateHandling int
|
||||
|
||||
const (
|
||||
// altNonIgnorable turns off special handling of variables.
|
||||
altNonIgnorable alternateHandling = iota
|
||||
|
||||
// altBlanked sets variables and all subsequent primary ignorables to be
|
||||
// ignorable at all levels. This is identical to removing all variables
|
||||
// and subsequent primary ignorables from the input.
|
||||
altBlanked
|
||||
|
||||
// altShifted sets variables to be ignorable for levels one through three and
|
||||
// adds a fourth level based on the values of the ignored levels.
|
||||
altShifted
|
||||
|
||||
// altShiftTrimmed is a slight variant of altShifted that is used to
|
||||
// emulate POSIX.
|
||||
altShiftTrimmed
|
||||
)
|
||||
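In option.go above, `setOptions` sorts the supplied options by their `priority` field before applying them, so an option with a higher priority runs later and overrides anything set by a lower one (tag-derived options have priority 0 and therefore run first). The sketch below illustrates that ordering with standard-library code only. The types `settings` and `opt` and the function `apply` are hypothetical stand-ins, and the sketch uses `sort.SliceStable` on a copy where the original calls `sort.Sort` on the caller's slice.

```go
package main

import (
	"fmt"
	"sort"
)

// settings is a hypothetical stand-in for the collator's options struct.
type settings struct {
	numeric   bool
	caseLevel bool
}

// opt is a hypothetical stand-in for Option: a priority plus a mutator.
type opt struct {
	priority int
	f        func(*settings)
}

// apply runs the options in ascending priority order, so that
// higher-priority options are applied last and take precedence.
func apply(s *settings, opts []opt) {
	sorted := append([]opt(nil), opts...) // leave the caller's slice untouched
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].priority < sorted[j].priority
	})
	for _, o := range sorted {
		o.f(s)
	}
}

func main() {
	fromTag := opt{0, func(s *settings) { s.numeric = false; s.caseLevel = true }}
	ignoreCase := opt{3, func(s *settings) { s.caseLevel = false }}
	numeric := opt{5, func(s *settings) { s.numeric = true }}

	var s settings
	// The argument order does not matter; priority decides.
	apply(&s, []opt{numeric, ignoreCase, fromTag})
	fmt.Printf("%+v\n", s)
}
```

This mirrors the situation exercised by the test case commented "Normal options take precedence over tag options" in option_test.go: explicit options end up with `numeric` set and `caseLevel` cleared even though the tag asked for the opposite.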
209
vendor/golang.org/x/text/collate/option_test.go
generated
vendored
Normal file
|
|
@ -0,0 +1,209 @@
|
|||
// Copyright 2014 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
package collate
|
||||
|
||||
import (
|
||||
"reflect"
|
||||
"strings"
|
||||
"testing"
|
||||
|
||||
"golang.org/x/text/internal/colltab"
|
||||
"golang.org/x/text/language"
|
||||
)
|
||||
|
||||
var (
|
||||
defaultIgnore = ignore(colltab.Tertiary)
|
||||
defaultTable = getTable(locales[0])
|
||||
)
|
||||
|
||||
func TestOptions(t *testing.T) {
|
||||
for i, tt := range []struct {
|
||||
in []Option
|
||||
out options
|
||||
}{
|
||||
0: {
|
||||
out: options{
|
||||
ignore: defaultIgnore,
|
||||
},
|
||||
},
|
||||
1: {
|
||||
in: []Option{IgnoreDiacritics},
|
||||
out: options{
|
||||
ignore: [colltab.NumLevels]bool{false, true, false, true, true},
|
||||
},
|
||||
},
|
||||
2: {
|
||||
in: []Option{IgnoreCase, IgnoreDiacritics},
|
||||
out: options{
|
||||
ignore: ignore(colltab.Primary),
|
||||
},
|
||||
},
|
||||
3: {
|
||||
in: []Option{ignoreDiacritics, IgnoreWidth},
|
||||
out: options{
|
||||
ignore: ignore(colltab.Primary),
|
||||
caseLevel: true,
|
||||
},
|
||||
},
|
||||
4: {
|
||||
in: []Option{IgnoreWidth, ignoreDiacritics},
|
||||
out: options{
|
||||
ignore: ignore(colltab.Primary),
|
||||
caseLevel: true,
|
||||
},
|
||||
},
|
||||
5: {
|
||||
in: []Option{IgnoreCase, IgnoreWidth},
|
||||
out: options{
|
||||
ignore: ignore(colltab.Secondary),
|
||||
},
|
||||
},
|
||||
6: {
|
||||
in: []Option{IgnoreCase, IgnoreWidth, Loose},
|
||||
out: options{
|
||||
ignore: ignore(colltab.Primary),
|
||||
},
|
||||
},
|
||||
7: {
|
||||
in: []Option{Force, IgnoreCase, IgnoreWidth, Loose},
|
||||
out: options{
|
||||
ignore: [colltab.NumLevels]bool{false, true, true, true, false},
|
||||
},
|
||||
},
|
||||
8: {
|
||||
in: []Option{IgnoreDiacritics, IgnoreCase},
|
||||
out: options{
|
||||
ignore: ignore(colltab.Primary),
|
||||
},
|
||||
},
|
||||
9: {
|
||||
in: []Option{Numeric},
|
||||
out: options{
|
||||
ignore: defaultIgnore,
|
||||
numeric: true,
|
||||
},
|
||||
},
|
||||
10: {
|
||||
in: []Option{OptionsFromTag(language.MustParse("und-u-ks-level1"))},
|
||||
out: options{
|
||||
ignore: ignore(colltab.Primary),
|
||||
},
|
||||
},
|
||||
11: {
|
||||
in: []Option{OptionsFromTag(language.MustParse("und-u-ks-level4"))},
|
||||
out: options{
|
||||
ignore: ignore(colltab.Quaternary),
|
||||
},
|
||||
},
|
||||
12: {
|
||||
in: []Option{OptionsFromTag(language.MustParse("und-u-ks-identic"))},
|
||||
out: options{},
|
||||
},
|
||||
13: {
|
||||
in: []Option{
|
||||
OptionsFromTag(language.MustParse("und-u-kn-true-kb-true-kc-true")),
|
||||
},
|
||||
out: options{
|
||||
ignore: defaultIgnore,
|
||||
caseLevel: true,
|
||||
backwards: true,
|
||||
numeric: true,
|
||||
},
|
||||
},
|
||||
14: {
|
||||
in: []Option{
|
||||
OptionsFromTag(language.MustParse("und-u-kn-true-kb-true-kc-true")),
|
||||
OptionsFromTag(language.MustParse("und-u-kn-false-kb-false-kc-false")),
|
||||
},
|
||||
out: options{
|
||||
ignore: defaultIgnore,
|
||||
},
|
||||
},
|
||||
15: {
|
||||
in: []Option{
|
||||
OptionsFromTag(language.MustParse("und-u-kn-true-kb-true-kc-true")),
|
||||
OptionsFromTag(language.MustParse("und-u-kn-foo-kb-foo-kc-foo")),
|
||||
},
|
||||
out: options{
|
||||
ignore: defaultIgnore,
|
||||
caseLevel: true,
|
||||
backwards: true,
|
||||
numeric: true,
|
||||
},
|
||||
},
|
||||
16: { // Normal options take precedence over tag options.
|
||||
in: []Option{
|
||||
Numeric, IgnoreCase,
|
||||
OptionsFromTag(language.MustParse("und-u-kn-false-kc-true")),
|
||||
},
|
||||
out: options{
|
||||
ignore: ignore(colltab.Secondary),
|
||||
caseLevel: false,
|
||||
numeric: true,
|
||||
},
|
||||
},
|
||||
17: {
|
||||
in: []Option{
|
||||
OptionsFromTag(language.MustParse("und-u-ka-shifted")),
|
||||
},
|
||||
out: options{
|
||||
ignore: defaultIgnore,
|
||||
alternate: altShifted,
|
||||
},
|
||||
},
|
||||
18: {
|
||||
in: []Option{
|
||||
OptionsFromTag(language.MustParse("und-u-ka-blanked")),
|
||||
},
|
||||
out: options{
|
||||
ignore: defaultIgnore,
|
||||
alternate: altBlanked,
|
||||
},
|
||||
},
|
||||
19: {
|
||||
in: []Option{
|
||||
OptionsFromTag(language.MustParse("und-u-ka-posix")),
|
||||
},
|
||||
out: options{
|
||||
ignore: defaultIgnore,
|
||||
alternate: altShiftTrimmed,
|
||||
},
|
||||
},
|
||||
} {
|
||||
c := newCollator(defaultTable)
|
||||
c.t = nil
|
||||
c.variableTop = 0
|
||||
c.f = 0
|
||||
|
||||
c.setOptions(tt.in)
|
||||
if !reflect.DeepEqual(c.options, tt.out) {
|
||||
t.Errorf("%d: got %v; want %v", i, c.options, tt.out)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestAlternateSortTypes(t *testing.T) {
|
||||
testCases := []struct {
|
||||
lang string
|
||||
in []string
|
||||
want []string
|
||||
}{{
|
||||
lang: "zh,cmn,zh-Hant-u-co-pinyin,zh-HK-u-co-pinyin,zh-pinyin",
|
||||
in: []string{"爸爸", "妈妈", "儿子", "女儿"},
|
||||
want: []string{"爸爸", "儿子", "妈妈", "女儿"},
|
||||
}, {
|
||||
lang: "zh-Hant,zh-u-co-stroke,zh-Hant-u-co-stroke",
|
||||
in: []string{"爸爸", "妈妈", "儿子", "女儿"},
|
||||
want: []string{"儿子", "女儿", "妈妈", "爸爸"},
|
||||
}}
|
||||
for _, tc := range testCases {
|
||||
for _, tag := range strings.Split(tc.lang, ",") {
|
||||
got := append([]string{}, tc.in...)
|
||||
New(language.MustParse(tag)).SortStrings(got)
|
||||
if !reflect.DeepEqual(got, tc.want) {
|
||||
t.Errorf("New(%s).SortStrings(%v) = %v; want %v", tag, tc.in, got, tc.want)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
230
vendor/golang.org/x/text/collate/reg_test.go
generated
vendored
Normal file
|
|
@ -0,0 +1,230 @@
|
|||
// Copyright 2012 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package collate
|
||||
|
||||
import (
|
||||
"archive/zip"
|
||||
"bufio"
|
||||
"bytes"
|
||||
"flag"
|
||||
"io"
|
||||
"io/ioutil"
|
||||
"log"
|
||||
"path"
|
||||
"regexp"
|
||||
"strconv"
|
||||
"strings"
|
||||
"testing"
|
||||
"unicode/utf8"
|
||||
|
||||
"golang.org/x/text/collate/build"
|
||||
"golang.org/x/text/internal/gen"
|
||||
"golang.org/x/text/language"
|
||||
)
|
||||
|
||||
var long = flag.Bool("long", false,
|
||||
"run time-consuming tests, such as tests that fetch data online")
|
||||
|
||||
// This regression test runs tests for the test files in CollationTest.zip
|
||||
// (taken from http://www.unicode.org/Public/UCA/<gen.UnicodeVersion()>/).
|
||||
//
|
||||
// The test files have the following form:
|
||||
// # header
|
||||
// 0009 0021; # ('\u0009') <CHARACTER TABULATION> [| | | 0201 025E]
|
||||
// 0009 003F; # ('\u0009') <CHARACTER TABULATION> [| | | 0201 0263]
|
||||
// 000A 0021; # ('\u000A') <LINE FEED (LF)> [| | | 0202 025E]
|
||||
// 000A 003F; # ('\u000A') <LINE FEED (LF)> [| | | 0202 0263]
|
||||
//
|
||||
// The part before the semicolon is the hex representation of a sequence
|
||||
// of runes. After the hash mark is a comment. The strings
|
||||
// represented by rune sequence are in the file in sorted order, as
|
||||
// defined by the DUCET.
|
||||
|
||||
type Test struct {
|
||||
name string
|
||||
str [][]byte
|
||||
comment []string
|
||||
}
|
||||
|
||||
var versionRe = regexp.MustCompile(`# UCA Version: (.*)\n?$`)
|
||||
var testRe = regexp.MustCompile(`^([\dA-F ]+);.*# (.*)\n?$`)
|
||||
|
||||
func TestCollation(t *testing.T) {
|
||||
if !gen.IsLocal() && !*long {
|
||||
t.Skip("skipping test to prevent downloading; to run use -long or use -local to specify a local source")
|
||||
}
|
||||
t.Skip("must first update to new file format to support test")
|
||||
for _, test := range loadTestData() {
|
||||
doTest(t, test)
|
||||
}
|
||||
}
|
||||
|
||||
func Error(e error) {
|
||||
if e != nil {
|
||||
log.Fatal(e)
|
||||
}
|
||||
}
|
||||
|
||||
// parseUCA parses a Default Unicode Collation Element Table of the format
|
||||
// specified in http://www.unicode.org/reports/tr10/#File_Format.
|
||||
// Each parsed entry is added to builder; nothing is returned.
|
||||
func parseUCA(builder *build.Builder) {
|
||||
r := gen.OpenUnicodeFile("UCA", "", "allkeys.txt")
|
||||
defer r.Close()
|
||||
input := bufio.NewReader(r)
|
||||
colelem := regexp.MustCompile(`\[([.*])([0-9A-F.]+)\]`)
|
||||
for i := 1; true; i++ {
|
||||
l, prefix, err := input.ReadLine()
|
||||
if err == io.EOF {
|
||||
break
|
||||
}
|
||||
Error(err)
|
||||
line := string(l)
|
||||
if prefix {
|
||||
log.Fatalf("%d: buffer overflow", i)
|
||||
}
|
||||
if len(line) == 0 || line[0] == '#' {
|
||||
continue
|
||||
}
|
||||
if line[0] == '@' {
|
||||
if strings.HasPrefix(line[1:], "version ") {
|
||||
if v := strings.Split(line[1:], " ")[1]; v != gen.UnicodeVersion() {
|
||||
log.Fatalf("incompatible version %s; want %s", v, gen.UnicodeVersion())
|
||||
}
|
||||
}
|
||||
} else {
|
||||
// parse entries
|
||||
part := strings.Split(line, " ; ")
|
||||
if len(part) != 2 {
|
||||
log.Fatalf("%d: production rule without ';': %v", i, line)
|
||||
}
|
||||
lhs := []rune{}
|
||||
for _, v := range strings.Split(part[0], " ") {
|
||||
if v != "" {
|
||||
lhs = append(lhs, rune(convHex(i, v)))
|
||||
}
|
||||
}
|
||||
vars := []int{}
|
||||
rhs := [][]int{}
|
||||
for i, m := range colelem.FindAllStringSubmatch(part[1], -1) {
|
||||
if m[1] == "*" {
|
||||
vars = append(vars, i)
|
||||
}
|
||||
elem := []int{}
|
||||
for _, h := range strings.Split(m[2], ".") {
|
||||
elem = append(elem, convHex(i, h))
|
||||
}
|
||||
rhs = append(rhs, elem)
|
||||
}
|
||||
builder.Add(lhs, rhs, vars)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func convHex(line int, s string) int {
|
||||
r, e := strconv.ParseInt(s, 16, 32)
|
||||
if e != nil {
|
||||
log.Fatalf("%d: %v", line, e)
|
||||
}
|
||||
return int(r)
|
||||
}
|
||||
|
||||
func loadTestData() []Test {
|
||||
f := gen.OpenUnicodeFile("UCA", "", "CollationTest.zip")
|
||||
buffer, err := ioutil.ReadAll(f)
|
||||
f.Close()
|
||||
Error(err)
|
||||
archive, err := zip.NewReader(bytes.NewReader(buffer), int64(len(buffer)))
|
||||
Error(err)
|
||||
tests := []Test{}
|
||||
for _, f := range archive.File {
|
||||
// Skip the short versions, which are simply duplicates of the long versions.
|
||||
if strings.Contains(f.Name, "SHORT") || f.FileInfo().IsDir() {
|
||||
continue
|
||||
}
|
||||
ff, err := f.Open()
|
||||
Error(err)
|
||||
defer ff.Close()
|
||||
scanner := bufio.NewScanner(ff)
|
||||
test := Test{name: path.Base(f.Name)}
|
||||
for scanner.Scan() {
|
||||
line := scanner.Text()
|
||||
if len(line) <= 1 || line[0] == '#' {
|
||||
if m := versionRe.FindStringSubmatch(line); m != nil {
|
||||
if m[1] != gen.UnicodeVersion() {
|
||||
log.Printf("warning:%s: version is %s; want %s", f.Name, m[1], gen.UnicodeVersion())
|
||||
}
|
||||
}
|
||||
continue
|
||||
}
|
||||
m := testRe.FindStringSubmatch(line)
|
||||
if m == nil || len(m) < 3 {
|
||||
log.Fatalf(`Failed to parse: "%s" result: %#v`, line, m)
|
||||
}
|
||||
str := []byte{}
|
||||
// In the regression test data (unpaired) surrogates are assigned a weight
|
||||
// corresponding to their code point value. However, utf8.DecodeRune,
|
||||
// which is used to compute the implicit weight, assigns FFFD to surrogates.
|
||||
// We therefore skip tests with surrogates. This skips about 35 entries
|
||||
// per test.
|
||||
valid := true
|
||||
for _, split := range strings.Split(m[1], " ") {
|
||||
r, err := strconv.ParseUint(split, 16, 64)
|
||||
Error(err)
|
||||
valid = valid && utf8.ValidRune(rune(r))
|
||||
str = append(str, string(rune(r))...)
|
||||
}
|
||||
if valid {
|
||||
test.str = append(test.str, str)
|
||||
test.comment = append(test.comment, m[2])
|
||||
}
|
||||
}
|
||||
if scanner.Err() != nil {
|
||||
log.Fatal(scanner.Err())
|
||||
}
|
||||
tests = append(tests, test)
|
||||
}
|
||||
return tests
|
||||
}
|
||||
|
||||
var errorCount int
|
||||
|
||||
func runes(b []byte) []rune {
|
||||
return []rune(string(b))
|
||||
}
|
||||
|
||||
var shifted = language.MustParse("und-u-ka-shifted-ks-level4")
|
||||
|
||||
func doTest(t *testing.T, tc Test) {
|
||||
bld := build.NewBuilder()
|
||||
parseUCA(bld)
|
||||
w, err := bld.Build()
|
||||
Error(err)
|
||||
var tag language.Tag
|
||||
if !strings.Contains(tc.name, "NON_IGNOR") {
|
||||
tag = shifted
|
||||
}
|
||||
c := NewFromTable(w, OptionsFromTag(tag))
|
||||
b := &Buffer{}
|
||||
prev := tc.str[0]
|
||||
for i := 1; i < len(tc.str); i++ {
|
||||
b.Reset()
|
||||
s := tc.str[i]
|
||||
ka := c.Key(b, prev)
|
||||
kb := c.Key(b, s)
|
||||
if r := bytes.Compare(ka, kb); r == 1 {
|
||||
t.Errorf("%s:%d: Key(%.4X) < Key(%.4X) (%X < %X) == %d; want -1 or 0", tc.name, i, []rune(string(prev)), []rune(string(s)), ka, kb, r)
|
||||
prev = s
|
||||
continue
|
||||
}
|
||||
if r := c.Compare(prev, s); r == 1 {
|
||||
t.Errorf("%s:%d: Compare(%.4X, %.4X) == %d; want -1 or 0", tc.name, i, runes(prev), runes(s), r)
|
||||
}
|
||||
if r := c.Compare(s, prev); r == -1 {
|
||||
t.Errorf("%s:%d: Compare(%.4X, %.4X) == %d; want 1 or 0", tc.name, i, runes(s), runes(prev), r)
|
||||
}
|
||||
prev = s
|
||||
}
|
||||
}
|
||||
81
vendor/golang.org/x/text/collate/sort.go
generated
vendored
Normal file
|
|
@ -0,0 +1,81 @@
|
|||
// Copyright 2013 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package collate
|
||||
|
||||
import (
|
||||
"bytes"
|
||||
"sort"
|
||||
)
|
||||
|
||||
const (
|
||||
maxSortBuffer = 40960
|
||||
maxSortEntries = 4096
|
||||
)
|
||||
|
||||
type swapper interface {
|
||||
Swap(i, j int)
|
||||
}
|
||||
|
||||
type sorter struct {
|
||||
buf *Buffer
|
||||
keys [][]byte
|
||||
src swapper
|
||||
}
|
||||
|
||||
func (s *sorter) init(n int) {
|
||||
if s.buf == nil {
|
||||
s.buf = &Buffer{}
|
||||
s.buf.init()
|
||||
}
|
||||
if cap(s.keys) < n {
|
||||
s.keys = make([][]byte, n)
|
||||
}
|
||||
s.keys = s.keys[0:n]
|
||||
}
|
||||
|
||||
func (s *sorter) sort(src swapper) {
|
||||
s.src = src
|
||||
sort.Sort(s)
|
||||
}
|
||||
|
||||
func (s sorter) Len() int {
|
||||
return len(s.keys)
|
||||
}
|
||||
|
||||
func (s sorter) Less(i, j int) bool {
|
||||
return bytes.Compare(s.keys[i], s.keys[j]) == -1
|
||||
}
|
||||
|
||||
func (s sorter) Swap(i, j int) {
|
||||
s.keys[i], s.keys[j] = s.keys[j], s.keys[i]
|
||||
s.src.Swap(i, j)
|
||||
}
|
||||
|
||||
// A Lister can be sorted by Collator's Sort method.
|
||||
type Lister interface {
|
||||
Len() int
|
||||
Swap(i, j int)
|
||||
// Bytes returns the bytes of the text at index i.
|
||||
Bytes(i int) []byte
|
||||
}
|
||||
|
||||
// Sort uses sort.Sort to sort the strings represented by x using the rules of c.
|
||||
func (c *Collator) Sort(x Lister) {
|
||||
n := x.Len()
|
||||
c.sorter.init(n)
|
||||
for i := 0; i < n; i++ {
|
||||
c.sorter.keys[i] = c.Key(c.sorter.buf, x.Bytes(i))
|
||||
}
|
||||
c.sorter.sort(x)
|
||||
}
|
||||
|
||||
// SortStrings uses sort.Sort to sort the strings in x using the rules of c.
|
||||
func (c *Collator) SortStrings(x []string) {
|
||||
c.sorter.init(len(x))
|
||||
for i, s := range x {
|
||||
c.sorter.keys[i] = c.KeyFromString(c.sorter.buf, s)
|
||||
}
|
||||
c.sorter.sort(sort.StringSlice(x))
|
||||
}
|
||||
55
vendor/golang.org/x/text/collate/sort_test.go
generated
vendored
Normal file
|
|
@ -0,0 +1,55 @@
|
|||
// Copyright 2013 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package collate_test
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"testing"
|
||||
|
||||
"golang.org/x/text/collate"
|
||||
"golang.org/x/text/language"
|
||||
)
|
||||
|
||||
func ExampleCollator_SortStrings() {
|
||||
c := collate.New(language.Und)
|
||||
strings := []string{
|
||||
"ad",
|
||||
"ab",
|
||||
"äb",
|
||||
"ac",
|
||||
}
|
||||
c.SortStrings(strings)
|
||||
fmt.Println(strings)
|
||||
// Output: [ab äb ac ad]
|
||||
}
|
||||
|
||||
type sorter []string
|
||||
|
||||
func (s sorter) Len() int {
|
||||
return len(s)
|
||||
}
|
||||
|
||||
func (s sorter) Swap(i, j int) {
|
||||
s[j], s[i] = s[i], s[j]
|
||||
}
|
||||
|
||||
func (s sorter) Bytes(i int) []byte {
|
||||
return []byte(s[i])
|
||||
}
|
||||
|
||||
func TestSort(t *testing.T) {
|
||||
c := collate.New(language.English)
|
||||
strings := []string{
|
||||
"bcd",
|
||||
"abc",
|
||||
"ddd",
|
||||
}
|
||||
c.Sort(sorter(strings))
|
||||
res := fmt.Sprint(strings)
|
||||
want := "[abc bcd ddd]"
|
||||
if res != want {
|
||||
t.Errorf("found %s; want %s", res, want)
|
||||
}
|
||||
}
|
||||
291
vendor/golang.org/x/text/collate/table_test.go
generated
vendored
Normal file
|
|
@ -0,0 +1,291 @@
|
|||
// Copyright 2012 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package collate
|
||||
|
||||
import (
|
||||
"testing"
|
||||
|
||||
"golang.org/x/text/collate/build"
|
||||
"golang.org/x/text/internal/colltab"
|
||||
"golang.org/x/text/unicode/norm"
|
||||
)
|
||||
|
||||
type ColElems []Weights
|
||||
|
||||
type input struct {
|
||||
str string
|
||||
ces [][]int
|
||||
}
|
||||
|
||||
type check struct {
|
||||
in string
|
||||
n int
|
||||
out ColElems
|
||||
}
|
||||
|
||||
type tableTest struct {
|
||||
in []input
|
||||
chk []check
|
||||
}
|
||||
|
||||
func w(ce ...int) Weights {
|
||||
return W(ce...)
|
||||
}
|
||||
|
||||
var defaults = w(0)
|
||||
|
||||
func pt(p, t int) []int {
|
||||
return []int{p, defaults.Secondary, t}
|
||||
}
|
||||
|
||||
func makeTable(in []input) (*Collator, error) {
|
||||
b := build.NewBuilder()
|
||||
for _, r := range in {
|
||||
if e := b.Add([]rune(r.str), r.ces, nil); e != nil {
|
||||
panic(e)
|
||||
}
|
||||
}
|
||||
t, err := b.Build()
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
return NewFromTable(t), nil
|
||||
}
|
||||
|
||||
// modSeq holds a seqeunce of modifiers in increasing order of CCC long enough
|
||||
// to cause a segment overflow if not handled correctly. The last rune in this
|
||||
// list has a CCC of 214.
|
||||
var modSeq = []rune{
|
||||
0x05B1, 0x05B2, 0x05B3, 0x05B4, 0x05B5, 0x05B6, 0x05B7, 0x05B8, 0x05B9, 0x05BB,
|
||||
0x05BC, 0x05BD, 0x05BF, 0x05C1, 0x05C2, 0xFB1E, 0x064B, 0x064C, 0x064D, 0x064E,
|
||||
0x064F, 0x0650, 0x0651, 0x0652, 0x0670, 0x0711, 0x0C55, 0x0C56, 0x0E38, 0x0E48,
|
||||
0x0EB8, 0x0EC8, 0x0F71, 0x0F72, 0x0F74, 0x0321, 0x1DCE,
|
||||
}
|
||||
|
||||
var mods []input
|
||||
var modW = func() ColElems {
|
||||
ws := ColElems{}
|
||||
for _, r := range modSeq {
|
||||
rune := norm.NFC.PropertiesString(string(r))
|
||||
ws = append(ws, w(0, int(rune.CCC())))
|
||||
mods = append(mods, input{string(r), [][]int{{0, int(rune.CCC())}}})
|
||||
}
|
||||
return ws
|
||||
}()
|
||||
|
||||
var appendNextTests = []tableTest{
|
||||
{ // test getWeights
|
||||
[]input{
|
||||
{"a", [][]int{{100}}},
|
||||
{"b", [][]int{{105}}},
|
||||
{"c", [][]int{{110}}},
|
||||
{"ß", [][]int{{120}}},
|
||||
},
|
||||
[]check{
|
||||
{"a", 1, ColElems{w(100)}},
|
||||
{"b", 1, ColElems{w(105)}},
|
||||
{"c", 1, ColElems{w(110)}},
|
||||
{"d", 1, ColElems{w(0x50064)}},
|
||||
{"ab", 1, ColElems{w(100)}},
|
||||
{"bc", 1, ColElems{w(105)}},
|
||||
{"dd", 1, ColElems{w(0x50064)}},
|
||||
{"ß", 2, ColElems{w(120)}},
|
||||
},
|
||||
},
|
||||
{ // test expansion
|
||||
[]input{
|
||||
{"u", [][]int{{100}}},
|
||||
{"U", [][]int{{100}, {0, 25}}},
|
||||
{"w", [][]int{{100}, {100}}},
|
||||
{"W", [][]int{{100}, {0, 25}, {100}, {0, 25}}},
|
||||
},
|
||||
[]check{
|
||||
{"u", 1, ColElems{w(100)}},
|
||||
{"U", 1, ColElems{w(100), w(0, 25)}},
|
||||
{"w", 1, ColElems{w(100), w(100)}},
|
||||
{"W", 1, ColElems{w(100), w(0, 25), w(100), w(0, 25)}},
|
||||
},
|
||||
},
|
||||
{ // test decompose
|
||||
[]input{
|
||||
{"D", [][]int{pt(104, 8)}},
|
||||
{"z", [][]int{pt(130, 8)}},
|
||||
{"\u030C", [][]int{{0, 40}}}, // Caron
|
||||
{"\u01C5", [][]int{pt(104, 9), pt(130, 4), {0, 40, 0x1F}}}, // Dž = D+z+caron
|
||||
},
|
||||
[]check{
|
||||
{"\u01C5", 2, ColElems{w(pt(104, 9)...), w(pt(130, 4)...), w(0, 40, 0x1F)}},
|
||||
},
|
||||
},
|
||||
{ // test basic contraction
|
||||
[]input{
|
||||
{"a", [][]int{{100}}},
|
||||
{"ab", [][]int{{101}}},
|
||||
{"aab", [][]int{{101}, {101}}},
|
||||
{"abc", [][]int{{102}}},
|
||||
{"b", [][]int{{200}}},
|
||||
{"c", [][]int{{300}}},
|
||||
{"d", [][]int{{400}}},
|
||||
},
|
||||
[]check{
|
||||
{"a", 1, ColElems{w(100)}},
|
||||
{"aa", 1, ColElems{w(100)}},
|
||||
{"aac", 1, ColElems{w(100)}},
|
||||
{"d", 1, ColElems{w(400)}},
|
||||
{"ab", 2, ColElems{w(101)}},
|
||||
{"abb", 2, ColElems{w(101)}},
|
||||
{"aab", 3, ColElems{w(101), w(101)}},
|
||||
{"aaba", 3, ColElems{w(101), w(101)}},
|
||||
{"abc", 3, ColElems{w(102)}},
|
||||
{"abcd", 3, ColElems{w(102)}},
|
||||
},
|
||||
},
|
||||
{ // test discontinuous contraction
|
||||
append(mods, []input{
|
||||
// modifiers; secondary weight equals ccc
|
||||
{"\u0316", [][]int{{0, 220}}},
|
||||
{"\u0317", [][]int{{0, 220}, {0, 220}}},
|
||||
{"\u302D", [][]int{{0, 222}}},
|
||||
{"\u302E", [][]int{{0, 225}}}, // used as starter
|
||||
{"\u302F", [][]int{{0, 224}}}, // used as starter
|
||||
{"\u18A9", [][]int{{0, 228}}},
|
||||
{"\u0300", [][]int{{0, 230}}},
|
||||
{"\u0301", [][]int{{0, 230}}},
|
||||
{"\u0315", [][]int{{0, 232}}},
|
||||
{"\u031A", [][]int{{0, 232}}},
|
||||
{"\u035C", [][]int{{0, 233}}},
|
||||
{"\u035F", [][]int{{0, 233}}},
|
||||
{"\u035D", [][]int{{0, 234}}},
|
||||
{"\u035E", [][]int{{0, 234}}},
|
||||
{"\u0345", [][]int{{0, 240}}},
|
||||
|
||||
// starters
|
||||
{"a", [][]int{{100}}},
|
||||
{"b", [][]int{{200}}},
|
||||
{"c", [][]int{{300}}},
|
||||
{"\u03B1", [][]int{{900}}},
|
||||
{"\x01", [][]int{{0, 0, 0, 0}}},
|
||||
|
||||
// contractions
|
||||
{"a\u0300", [][]int{{101}}},
|
||||
{"a\u0301", [][]int{{102}}},
|
||||
{"a\u035E", [][]int{{110}}},
|
||||
{"a\u035Eb\u035E", [][]int{{115}}},
|
||||
{"ac\u035Eaca\u035E", [][]int{{116}}},
|
||||
{"a\u035Db\u035D", [][]int{{117}}},
|
||||
{"a\u0301\u035Db", [][]int{{120}}},
|
||||
{"a\u0301\u035F", [][]int{{121}}},
|
||||
{"a\u0301\u035Fb", [][]int{{119}}},
|
||||
{"\u03B1\u0345", [][]int{{901}, {902}}},
|
||||
{"\u302E\u302F", [][]int{{0, 131}, {0, 131}}},
|
||||
{"\u302F\u18A9", [][]int{{0, 130}}},
|
||||
}...),
|
||||
[]check{
|
||||
{"a\x01\u0300", 1, ColElems{w(100)}},
|
||||
{"ab", 1, ColElems{w(100)}}, // closing segment
|
||||
{"a\u0316\u0300b", 5, ColElems{w(101), w(0, 220)}}, // closing segment
|
||||
{"a\u0316\u0300", 5, ColElems{w(101), w(0, 220)}}, // no closing segment
|
||||
{"a\u0316\u0300\u035Cb", 5, ColElems{w(101), w(0, 220)}}, // completes before segment end
|
||||
{"a\u0316\u0300\u035C", 5, ColElems{w(101), w(0, 220)}}, // completes before segment end
|
||||
|
||||
{"a\u0316\u0301b", 5, ColElems{w(102), w(0, 220)}}, // closing segment
|
||||
{"a\u0316\u0301", 5, ColElems{w(102), w(0, 220)}}, // no closing segment
|
||||
{"a\u0316\u0301\u035Cb", 5, ColElems{w(102), w(0, 220)}}, // completes before segment end
|
||||
{"a\u0316\u0301\u035C", 5, ColElems{w(102), w(0, 220)}}, // completes before segment end
|
||||
|
||||
// match blocked by modifier with same ccc
|
||||
{"a\u0301\u0315\u031A\u035Fb", 3, ColElems{w(102)}},
|
||||
|
||||
// multiple gaps
|
||||
{"a\u0301\u035Db", 6, ColElems{w(120)}},
|
||||
{"a\u0301\u035F", 5, ColElems{w(121)}},
|
||||
{"a\u0301\u035Fb", 6, ColElems{w(119)}},
|
||||
{"a\u0316\u0301\u035F", 7, ColElems{w(121), w(0, 220)}},
|
||||
{"a\u0301\u0315\u035Fb", 7, ColElems{w(121), w(0, 232)}},
|
||||
{"a\u0316\u0301\u0315\u035Db", 5, ColElems{w(102), w(0, 220)}},
|
||||
{"a\u0316\u0301\u0315\u035F", 9, ColElems{w(121), w(0, 220), w(0, 232)}},
|
||||
{"a\u0316\u0301\u0315\u035Fb", 9, ColElems{w(121), w(0, 220), w(0, 232)}},
|
||||
{"a\u0316\u0301\u0315\u035F\u035D", 9, ColElems{w(121), w(0, 220), w(0, 232)}},
|
||||
{"a\u0316\u0301\u0315\u035F\u035Db", 9, ColElems{w(121), w(0, 220), w(0, 232)}},
|
||||
|
||||
// handling of segment overflow
|
||||
{ // just fits within segment
|
||||
"a" + string(modSeq[:30]) + "\u0301",
|
||||
3 + len(string(modSeq[:30])),
|
||||
append(ColElems{w(102)}, modW[:30]...),
|
||||
},
|
||||
{"a" + string(modSeq[:31]) + "\u0301", 1, ColElems{w(100)}}, // overflow
|
||||
{"a" + string(modSeq) + "\u0301", 1, ColElems{w(100)}},
|
||||
{ // just fits within segment with two interstitial runes
|
||||
"a" + string(modSeq[:28]) + "\u0301\u0315\u035F",
|
||||
7 + len(string(modSeq[:28])),
|
||||
append(append(ColElems{w(121)}, modW[:28]...), w(0, 232)),
|
||||
},
|
||||
{ // second half does not fit within segment
|
||||
"a" + string(modSeq[:29]) + "\u0301\u0315\u035F",
|
||||
3 + len(string(modSeq[:29])),
|
||||
append(ColElems{w(102)}, modW[:29]...),
|
||||
},
|
||||
|
||||
// discontinuity can only occur in last normalization segment
|
||||
{"a\u035Eb\u035E", 6, ColElems{w(115)}},
|
||||
{"a\u0316\u035Eb\u035E", 5, ColElems{w(110), w(0, 220)}},
|
||||
{"a\u035Db\u035D", 6, ColElems{w(117)}},
|
||||
{"a\u0316\u035Db\u035D", 1, ColElems{w(100)}},
|
||||
{"a\u035Eb\u0316\u035E", 8, ColElems{w(115), w(0, 220)}},
|
||||
{"a\u035Db\u0316\u035D", 8, ColElems{w(117), w(0, 220)}},
|
||||
{"ac\u035Eaca\u035E", 9, ColElems{w(116)}},
|
||||
{"a\u0316c\u035Eaca\u035E", 1, ColElems{w(100)}},
|
||||
{"ac\u035Eac\u0316a\u035E", 1, ColElems{w(100)}},
|
||||
|
||||
// expanding contraction
|
||||
{"\u03B1\u0345", 4, ColElems{w(901), w(902)}},
|
||||
|
||||
// Theoretical possibilities
|
||||
// contraction within a gap
|
||||
{"a\u302F\u18A9\u0301", 9, ColElems{w(102), w(0, 130)}},
|
||||
// expansion within a gap
|
||||
{"a\u0317\u0301", 5, ColElems{w(102), w(0, 220), w(0, 220)}},
|
||||
// repeating CCC blocks last modifier
|
||||
{"a\u302E\u302F\u0301", 1, ColElems{w(100)}},
|
||||
// The trailing combining characters (with lower CCC) should block the first one.
|
||||
// TODO: make the following pass.
|
||||
// {"a\u035E\u0316\u0316", 1, ColElems{w(100)}},
|
||||
{"a\u035F\u035Eb", 5, ColElems{w(110), w(0, 233)}},
|
||||
// Last combiner should match after normalization.
|
||||
// TODO: make the following pass.
|
||||
// {"a\u035D\u0301", 3, ColElems{w(102), w(0, 234)}},
|
||||
// The first combiner is blocking the second one as they have the same CCC.
|
||||
{"a\u035D\u035Eb", 1, ColElems{w(100)}},
|
||||
},
|
||||
},
|
||||
}
|
||||
|
||||
func TestAppendNext(t *testing.T) {
|
||||
for i, tt := range appendNextTests {
|
||||
c, err := makeTable(tt.in)
|
||||
if err != nil {
|
||||
t.Errorf("%d: error creating table: %v", i, err)
|
||||
continue
|
||||
}
|
||||
for j, chk := range tt.chk {
|
||||
ws, n := c.t.AppendNext(nil, []byte(chk.in))
|
||||
if n != chk.n {
|
||||
t.Errorf("%d:%d: bytes consumed was %d; want %d", i, j, n, chk.n)
|
||||
}
|
||||
out := convertFromWeights(chk.out)
|
||||
if len(ws) != len(out) {
|
||||
t.Errorf("%d:%d: len(ws) was %d; want %d (%X vs %X)\n%X", i, j, len(ws), len(out), ws, out, chk.in)
|
||||
continue
|
||||
}
|
||||
for k, w := range ws {
|
||||
w, _ = colltab.MakeElem(w.Primary(), w.Secondary(), int(w.Tertiary()), 0)
|
||||
if w != out[k] {
|
||||
t.Errorf("%d:%d: Weights %d was %X; want %X", i, j, k, w, out[k])
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
73789
vendor/golang.org/x/text/collate/tables.go
generated
vendored
Normal file
File diff suppressed because it is too large
7
vendor/golang.org/x/text/collate/tools/colcmp/Makefile
generated
vendored
Normal file
|
|
@ -0,0 +1,7 @@
|
|||
# Copyright 2012 The Go Authors. All rights reserved.
|
||||
# Use of this source code is governed by a BSD-style
|
||||
# license that can be found in the LICENSE file.
|
||||
|
||||
chars:
|
||||
go run ../../maketables.go -tables=chars -package=main > chars.go
|
||||
gofmt -w -s chars.go
|
||||
1156
vendor/golang.org/x/text/collate/tools/colcmp/chars.go
generated
vendored
Normal file
File diff suppressed because one or more lines are too long
97
vendor/golang.org/x/text/collate/tools/colcmp/col.go
generated
vendored
Normal file
|
|
@ -0,0 +1,97 @@
|
|||
// Copyright 2012 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package main
|
||||
|
||||
import (
|
||||
"log"
|
||||
"unicode/utf16"
|
||||
|
||||
"golang.org/x/text/collate"
|
||||
"golang.org/x/text/language"
|
||||
)
|
||||
|
||||
// Input holds an input string in both UTF-8 and UTF-16 format.
|
||||
type Input struct {
|
||||
index int // used for restoring to original random order
|
||||
UTF8 []byte
|
||||
UTF16 []uint16
|
||||
key []byte // used for sorting
|
||||
}
|
||||
|
||||
func (i Input) String() string {
|
||||
return string(i.UTF8)
|
||||
}
|
||||
|
||||
func makeInput(s8 []byte, s16 []uint16) Input {
|
||||
return Input{UTF8: s8, UTF16: s16}
|
||||
}
|
||||
|
||||
func makeInputString(s string) Input {
|
||||
return Input{
|
||||
UTF8: []byte(s),
|
||||
UTF16: utf16.Encode([]rune(s)),
|
||||
}
|
||||
}
|
||||
|
||||
// Collator is an interface for architecture-specific implementations of collation.
|
||||
type Collator interface {
|
||||
// Key generates a sort key for the given input. Implementations
|
||||
// may return nil if a collator does not support sort keys.
|
||||
Key(s Input) []byte
|
||||
|
||||
// Compare returns -1 if a < b, 1 if a > b and 0 if a == b.
|
||||
Compare(a, b Input) int
|
||||
}
|
||||
|
||||
// CollatorFactory creates a Collator for a given language tag.
|
||||
type CollatorFactory struct {
|
||||
name string
|
||||
makeFn func(tag string) (Collator, error)
|
||||
description string
|
||||
}
|
||||
|
||||
var collators = []CollatorFactory{}
|
||||
|
||||
// AddFactory registers f as a factory for an implementation of Collator.
|
||||
func AddFactory(f CollatorFactory) {
|
||||
collators = append(collators, f)
|
||||
}
|
||||
|
||||
func getCollator(name, locale string) Collator {
|
||||
for _, f := range collators {
|
||||
if f.name == name {
|
||||
col, err := f.makeFn(locale)
|
||||
if err != nil {
|
||||
log.Fatal(err)
|
||||
}
|
||||
return col
|
||||
}
|
||||
}
|
||||
log.Fatalf("collator of type %q not found", name)
|
||||
return nil
|
||||
}
|
||||
|
||||
// goCollator is an implementation of Collator using Go's own collator.
|
||||
type goCollator struct {
|
||||
c *collate.Collator
|
||||
buf collate.Buffer
|
||||
}
|
||||
|
||||
func init() {
|
||||
AddFactory(CollatorFactory{"go", newGoCollator, "Go's native collator implementation."})
|
||||
}
|
||||
|
||||
func newGoCollator(loc string) (Collator, error) {
|
||||
c := &goCollator{c: collate.New(language.Make(loc))}
|
||||
return c, nil
|
||||
}
|
||||
|
||||
func (c *goCollator) Key(b Input) []byte {
|
||||
return c.c.Key(&c.buf, b.UTF8)
|
||||
}
|
||||
|
||||
func (c *goCollator) Compare(a, b Input) int {
|
||||
return c.c.Compare(a.UTF8, b.UTF8)
|
||||
}
|
||||
529
vendor/golang.org/x/text/collate/tools/colcmp/colcmp.go
generated
vendored
Normal file
|
|
@ -0,0 +1,529 @@
|
|||
// Copyright 2012 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package main // import "golang.org/x/text/collate/tools/colcmp"
|
||||
|
||||
import (
|
||||
"bytes"
|
||||
"flag"
|
||||
"fmt"
|
||||
"io"
|
||||
"log"
|
||||
"os"
|
||||
"runtime/pprof"
|
||||
"sort"
|
||||
"strconv"
|
||||
"strings"
|
||||
"text/template"
|
||||
"time"
|
||||
|
||||
"golang.org/x/text/unicode/norm"
|
||||
)
|
||||
|
||||
var (
|
||||
doNorm = flag.Bool("norm", false, "normalize input strings")
|
||||
cases = flag.Bool("case", false, "generate case variants")
|
||||
verbose = flag.Bool("verbose", false, "print results")
|
||||
debug = flag.Bool("debug", false, "output debug information")
|
||||
locales = flag.String("locale", "en_US", "the locale to use. May be a comma-separated list for some commands.")
|
||||
col = flag.String("col", "go", "collator to test")
|
||||
gold = flag.String("gold", "go", "collator used as the gold standard")
|
||||
usecmp = flag.Bool("usecmp", false,
|
||||
`use comparison instead of sort keys when sorting`)
|
||||
cpuprofile = flag.String("cpuprofile", "", "write cpu profile to file")
|
||||
exclude = flag.String("exclude", "", "exclude errors that contain any of the characters")
|
||||
limit = flag.Int("limit", 5000000, "maximum number of samples to generate for one run")
|
||||
)
|
||||
|
||||
func failOnError(err error) {
|
||||
if err != nil {
|
||||
log.Panic(err)
|
||||
}
|
||||
}
|
||||
|
||||
// Test holds test data for testing a locale-collator pair.
|
||||
// Test also provides functionality that is commonly used by the various commands.
|
||||
type Test struct {
|
||||
ctxt *Context
|
||||
Name string
|
||||
Locale string
|
||||
ColName string
|
||||
|
||||
Col Collator
|
||||
UseCompare bool
|
||||
|
||||
Input []Input
|
||||
Duration time.Duration
|
||||
|
||||
start time.Time
|
||||
msg string
|
||||
count int
|
||||
}
|
||||
|
||||
func (t *Test) clear() {
|
||||
t.Col = nil
|
||||
t.Input = nil
|
||||
}
|
||||
|
||||
const (
|
||||
msgGeneratingInput = "generating input"
|
||||
msgGeneratingKeys = "generating keys"
|
||||
msgSorting = "sorting"
|
||||
)
|
||||
|
||||
var lastLen = 0
|
||||
|
||||
func (t *Test) SetStatus(msg string) {
|
||||
if *debug || *verbose {
|
||||
fmt.Printf("%s: %s...\n", t.Name, msg)
|
||||
} else if t.ctxt.out != nil {
|
||||
fmt.Fprint(t.ctxt.out, strings.Repeat(" ", lastLen))
|
||||
fmt.Fprint(t.ctxt.out, strings.Repeat("\b", lastLen))
|
||||
fmt.Fprint(t.ctxt.out, msg, "...")
|
||||
lastLen = len(msg) + 3
|
||||
fmt.Fprint(t.ctxt.out, strings.Repeat("\b", lastLen))
|
||||
}
|
||||
}
|
||||
|
||||
// Start is used by commands to signal the start of an operation.
|
||||
func (t *Test) Start(msg string) {
|
||||
t.SetStatus(msg)
|
||||
t.count = 0
|
||||
t.msg = msg
|
||||
t.start = time.Now()
|
||||
}
|
||||
|
||||
// Stop is used by commands to signal the end of an operation.
|
||||
func (t *Test) Stop() (time.Duration, int) {
|
||||
d := time.Since(t.start)
|
||||
t.Duration += d
|
||||
if *debug || *verbose {
|
||||
fmt.Printf("%s: %s done. (%.3fs /%dK ops)\n", t.Name, t.msg, d.Seconds(), t.count/1000)
|
||||
}
|
||||
return d, t.count
|
||||
}
|
||||
|
||||
// generateKeys generates sort keys for all the inputs.
|
||||
func (t *Test) generateKeys() {
|
||||
for i, s := range t.Input {
|
||||
b := t.Col.Key(s)
|
||||
t.Input[i].key = b
|
||||
if *debug {
|
||||
fmt.Printf("%s (%X): %X\n", string(s.UTF8), s.UTF16, b)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Sort sorts the inputs. It generates sort keys if this is required by the
|
||||
// chosen sort method.
|
||||
func (t *Test) Sort() (tkey, tsort time.Duration, nkey, nsort int) {
|
||||
if *cpuprofile != "" {
|
||||
f, err := os.Create(*cpuprofile)
|
||||
failOnError(err)
|
||||
failOnError(pprof.StartCPUProfile(f))
|
||||
defer pprof.StopCPUProfile()
|
||||
}
|
||||
if t.UseCompare || t.Col.Key(t.Input[0]) == nil {
|
||||
t.Start(msgSorting)
|
||||
sort.Sort(&testCompare{*t})
|
||||
tsort, nsort = t.Stop()
|
||||
} else {
|
||||
t.Start(msgGeneratingKeys)
|
||||
t.generateKeys()
|
||||
t.count = len(t.Input)
|
||||
tkey, nkey = t.Stop()
|
||||
t.Start(msgSorting)
|
||||
sort.Sort(t)
|
||||
tsort, nsort = t.Stop()
|
||||
}
|
||||
return
|
||||
}
|
||||
|
||||
func (t *Test) Swap(a, b int) {
|
||||
t.Input[a], t.Input[b] = t.Input[b], t.Input[a]
|
||||
}
|
||||
|
||||
func (t *Test) Less(a, b int) bool {
|
||||
t.count++
|
||||
return bytes.Compare(t.Input[a].key, t.Input[b].key) == -1
|
||||
}
|
||||
|
||||
func (t Test) Len() int {
|
||||
return len(t.Input)
|
||||
}
|
||||
|
||||
type testCompare struct {
|
||||
Test
|
||||
}
|
||||
|
||||
func (t *testCompare) Less(a, b int) bool {
|
||||
t.count++
|
||||
return t.Col.Compare(t.Input[a], t.Input[b]) == -1
|
||||
}
|
||||
|
||||
type testRestore struct {
|
||||
Test
|
||||
}
|
||||
|
||||
func (t *testRestore) Less(a, b int) bool {
|
||||
return t.Input[a].index < t.Input[b].index
|
||||
}
|
||||
|
||||
// GenerateInput generates input phrases for the locale tested by t.
|
||||
func (t *Test) GenerateInput() {
|
||||
t.Input = nil
|
||||
if t.ctxt.lastLocale != t.Locale {
|
||||
gen := phraseGenerator{}
|
||||
gen.init(t.Locale)
|
||||
t.SetStatus(msgGeneratingInput)
|
||||
t.ctxt.lastInput = nil // allow the previous value to be garbage collected.
|
||||
t.Input = gen.generate(*doNorm)
|
||||
t.ctxt.lastInput = t.Input
|
||||
t.ctxt.lastLocale = t.Locale
|
||||
} else {
|
||||
t.Input = t.ctxt.lastInput
|
||||
for i := range t.Input {
|
||||
t.Input[i].key = nil
|
||||
}
|
||||
sort.Sort(&testRestore{*t})
|
||||
}
|
||||
}
|
||||
|
||||
// Context holds all tests and settings translated from command line options.
|
||||
type Context struct {
|
||||
test []*Test
|
||||
last *Test
|
||||
|
||||
lastLocale string
|
||||
lastInput []Input
|
||||
|
||||
out io.Writer
|
||||
}
|
||||
|
||||
func (ts *Context) Printf(format string, a ...interface{}) {
|
||||
ts.assertBuf()
|
||||
fmt.Fprintf(ts.out, format, a...)
|
||||
}
|
||||
|
||||
func (ts *Context) Print(a ...interface{}) {
|
||||
ts.assertBuf()
|
||||
fmt.Fprint(ts.out, a...)
|
||||
}
|
||||
|
||||
// assertBuf sets up an io.Writer for output, if it doesn't already exist.
|
||||
// In debug and verbose mode, output is buffered so that the regular output
|
||||
// will not interfere with the additional output. Otherwise, output is
|
||||
// written directly to stdout for a more responsive feel.
|
||||
func (ts *Context) assertBuf() {
|
||||
if ts.out != nil {
|
||||
return
|
||||
}
|
||||
if *debug || *verbose {
|
||||
ts.out = &bytes.Buffer{}
|
||||
} else {
|
||||
ts.out = os.Stdout
|
||||
}
|
||||
}
|
||||
|
||||
// flush flushes the contents of ts.out to stdout, if it is not stdout already.
|
||||
func (ts *Context) flush() {
|
||||
if ts.out != nil {
|
||||
if _, ok := ts.out.(io.ReadCloser); !ok {
|
||||
io.Copy(os.Stdout, ts.out.(io.Reader))
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// parseTests creates all tests from the command-line flags and returns
|
||||
// a Context to hold them.
|
||||
func parseTests() *Context {
|
||||
ctxt := &Context{}
|
||||
colls := strings.Split(*col, ",")
|
||||
for _, loc := range strings.Split(*locales, ",") {
|
||||
loc = strings.TrimSpace(loc)
|
||||
for _, name := range colls {
|
||||
name = strings.TrimSpace(name)
|
||||
col := getCollator(name, loc)
|
||||
ctxt.test = append(ctxt.test, &Test{
|
||||
ctxt: ctxt,
|
||||
Locale: loc,
|
||||
ColName: name,
|
||||
UseCompare: *usecmp,
|
||||
Col: col,
|
||||
})
|
||||
}
|
||||
}
|
||||
return ctxt
|
||||
}
|
||||
|
||||
func (c *Context) Len() int {
|
||||
return len(c.test)
|
||||
}
|
||||
|
||||
func (c *Context) Test(i int) *Test {
|
||||
if c.last != nil {
|
||||
c.last.clear()
|
||||
}
|
||||
c.last = c.test[i]
|
||||
return c.last
|
||||
}
|
||||
|
||||
func parseInput(args []string) []Input {
|
||||
input := []Input{}
|
||||
for _, s := range args {
|
||||
rs := []rune{}
|
||||
for len(s) > 0 {
|
||||
var r rune
|
||||
r, _, s, _ = strconv.UnquoteChar(s, '\'')
|
||||
rs = append(rs, r)
|
||||
}
|
||||
s = string(rs)
|
||||
if *doNorm {
|
||||
s = norm.NFD.String(s)
|
||||
}
|
||||
input = append(input, makeInputString(s))
|
||||
}
|
||||
return input
|
||||
}
|
||||
|
||||
// A Command is an implementation of a colcmp command.
|
||||
type Command struct {
|
||||
Run func(cmd *Context, args []string)
|
||||
Usage string
|
||||
Short string
|
||||
Long string
|
||||
}
|
||||
|
||||
func (cmd Command) Name() string {
|
||||
return strings.SplitN(cmd.Usage, " ", 2)[0]
|
||||
}
|
||||
|
||||
var commands = []*Command{
|
||||
cmdSort,
|
||||
cmdBench,
|
||||
cmdRegress,
|
||||
}
|
||||
|
||||
const sortHelp = `
|
||||
Sort sorts a given list of strings. Strings are separated by whitespace.
|
||||
`
|
||||
|
||||
var cmdSort = &Command{
|
||||
Run: runSort,
|
||||
Usage: "sort <string>*",
|
||||
Short: "sort a given list of strings",
|
||||
Long: sortHelp,
|
||||
}
|
||||
|
||||
func runSort(ctxt *Context, args []string) {
|
||||
input := parseInput(args)
|
||||
if len(input) == 0 {
|
||||
log.Fatalf("Nothing to sort.")
|
||||
}
|
||||
if ctxt.Len() > 1 {
|
||||
ctxt.Print("COLL LOCALE RESULT\n")
|
||||
}
|
||||
for i := 0; i < ctxt.Len(); i++ {
|
||||
t := ctxt.Test(i)
|
||||
t.Input = append(t.Input, input...)
|
||||
t.Sort()
|
||||
if ctxt.Len() > 1 {
|
||||
ctxt.Printf("%-5s %-5s ", t.ColName, t.Locale)
|
||||
}
|
||||
for _, s := range t.Input {
|
||||
ctxt.Print(string(s.UTF8), " ")
|
||||
}
|
||||
ctxt.Print("\n")
|
||||
}
|
||||
}
|
||||
|
||||
const benchHelp = `
|
||||
Bench runs a benchmark for the given list of collator implementations.
|
||||
If no collator implementations are given, the go collator will be used.
|
||||
`
|
||||
|
||||
var cmdBench = &Command{
|
||||
Run: runBench,
|
||||
Usage: "bench",
|
||||
Short: "benchmark a given list of collator implementations",
|
||||
Long: benchHelp,
|
||||
}
|
||||
|
||||
func runBench(ctxt *Context, args []string) {
|
||||
ctxt.Printf("%-7s %-5s %-6s %-24s %-24s %-5s %s\n", "LOCALE", "COLL", "N", "KEYS", "SORT", "AVGLN", "TOTAL")
|
||||
for i := 0; i < ctxt.Len(); i++ {
|
||||
t := ctxt.Test(i)
|
||||
ctxt.Printf("%-7s %-5s ", t.Locale, t.ColName)
|
||||
t.GenerateInput()
|
||||
ctxt.Printf("%-6s ", fmt.Sprintf("%dK", t.Len()/1000))
|
||||
tkey, tsort, nkey, nsort := t.Sort()
|
||||
p := func(dur time.Duration, n int) {
|
||||
s := ""
|
||||
if dur > 0 {
|
||||
s = fmt.Sprintf("%6.3fs ", dur.Seconds())
|
||||
if n > 0 {
|
||||
s += fmt.Sprintf("%15s", fmt.Sprintf("(%4.2f ns/op)", float64(dur)/float64(n)))
|
||||
}
|
||||
}
|
||||
ctxt.Printf("%-24s ", s)
|
||||
}
|
||||
p(tkey, nkey)
|
||||
p(tsort, nsort)
|
||||
|
||||
total := 0
|
||||
for _, s := range t.Input {
|
||||
total += len(s.key)
|
||||
}
|
||||
ctxt.Printf("%-5d ", total/t.Len())
|
||||
ctxt.Printf("%6.3fs\n", t.Duration.Seconds())
|
||||
if *debug {
|
||||
for _, s := range t.Input {
|
||||
fmt.Print(string(s.UTF8), " ")
|
||||
}
|
||||
fmt.Println()
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
const regressHelp = `
|
||||
Regress runs a monkey test by comparing the results of randomly generated tests
|
||||
between two implementations of a collator. The user may optionally pass a list
|
||||
of strings to regress against instead of the default test set.
|
||||
`
|
||||
|
||||
var cmdRegress = &Command{
|
||||
Run: runRegress,
|
||||
Usage: "regress -gold=<col> -test=<col> [string]*",
|
||||
Short: "run a monkey test between two collators",
|
||||
Long: regressHelp,
|
||||
}
|
||||
|
||||
const failedKeyCompare = `
|
||||
%s:%d: incorrect comparison result for input:
|
||||
a: %q (%.4X)
|
||||
key: %s
|
||||
b: %q (%.4X)
|
||||
key: %s
|
||||
Compare(a, b) = %d; want %d.
|
||||
|
||||
gold keys:
|
||||
a: %s
|
||||
b: %s
|
||||
`
|
||||
|
||||
const failedCompare = `
|
||||
%s:%d: incorrect comparison result for input:
|
||||
a: %q (%.4X)
|
||||
b: %q (%.4X)
|
||||
Compare(a, b) = %d; want %d.
|
||||
`
|
||||
|
||||
func keyStr(b []byte) string {
|
||||
buf := &bytes.Buffer{}
|
||||
for _, v := range b {
|
||||
fmt.Fprintf(buf, "%.2X ", v)
|
||||
}
|
||||
return buf.String()
|
||||
}
|
||||
|
||||
func runRegress(ctxt *Context, args []string) {
|
||||
input := parseInput(args)
|
||||
for i := 0; i < ctxt.Len(); i++ {
|
||||
t := ctxt.Test(i)
|
||||
if len(input) > 0 {
|
||||
t.Input = append(t.Input, input...)
|
||||
} else {
|
||||
t.GenerateInput()
|
||||
}
|
||||
t.Sort()
|
||||
count := 0
|
||||
gold := getCollator(*gold, t.Locale)
|
||||
for i := 1; i < len(t.Input); i++ {
|
||||
ia := t.Input[i-1]
|
||||
ib := t.Input[i]
|
||||
if bytes.IndexAny(ib.UTF8, *exclude) != -1 {
|
||||
i++
|
||||
continue
|
||||
}
|
||||
if bytes.IndexAny(ia.UTF8, *exclude) != -1 {
|
||||
continue
|
||||
}
|
||||
goldCmp := gold.Compare(ia, ib)
|
||||
if cmp := bytes.Compare(ia.key, ib.key); cmp != goldCmp {
|
||||
count++
|
||||
a := string(ia.UTF8)
|
||||
b := string(ib.UTF8)
|
||||
fmt.Printf(failedKeyCompare, t.Locale, i-1, a, []rune(a), keyStr(ia.key), b, []rune(b), keyStr(ib.key), cmp, goldCmp, keyStr(gold.Key(ia)), keyStr(gold.Key(ib)))
|
||||
} else if cmp := t.Col.Compare(ia, ib); cmp != goldCmp {
|
||||
count++
|
||||
a := string(ia.UTF8)
|
||||
b := string(ib.UTF8)
|
||||
fmt.Printf(failedCompare, t.Locale, i-1, a, []rune(a), b, []rune(b), cmp, goldCmp)
|
||||
}
|
||||
}
|
||||
if count > 0 {
|
||||
ctxt.Printf("Found %d inconsistencies in %d entries.\n", count, t.Len()-1)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
const helpTemplate = `
|
||||
colcmp is a tool for testing and benchmarking collation
|
||||
|
||||
Usage: colcmp command [arguments]
|
||||
|
||||
The commands are:
|
||||
{{range .}}
|
||||
{{.Name | printf "%-11s"}} {{.Short}}{{end}}
|
||||
|
||||
Use "colcmp help [topic]" for more information about that topic.
|
||||
`
|
||||
|
||||
const detailedHelpTemplate = `
|
||||
Usage: colcmp {{.Usage}}
|
||||
|
||||
{{.Long | trim}}
|
||||
`
|
||||
|
||||
func runHelp(args []string) {
|
||||
t := template.New("help")
|
||||
t.Funcs(template.FuncMap{"trim": strings.TrimSpace})
|
||||
if len(args) < 1 {
|
||||
template.Must(t.Parse(helpTemplate))
|
||||
failOnError(t.Execute(os.Stderr, &commands))
|
||||
} else {
|
||||
for _, cmd := range commands {
|
||||
if cmd.Name() == args[0] {
|
||||
template.Must(t.Parse(detailedHelpTemplate))
|
||||
failOnError(t.Execute(os.Stderr, cmd))
|
||||
os.Exit(0)
|
||||
}
|
||||
}
|
||||
log.Fatalf("Unknown command %q. Run 'colcmp help'.", args[0])
|
||||
}
|
||||
os.Exit(0)
|
||||
}
|
||||
|
||||
func main() {
|
||||
flag.Parse()
|
||||
log.SetFlags(0)
|
||||
|
||||
ctxt := parseTests()
|
||||
|
||||
if flag.NArg() < 1 {
|
||||
runHelp(nil)
|
||||
}
|
||||
args := flag.Args()[1:]
|
||||
if flag.Arg(0) == "help" {
|
||||
runHelp(args)
|
||||
}
|
||||
for _, cmd := range commands {
|
||||
if cmd.Name() == flag.Arg(0) {
|
||||
cmd.Run(ctxt, args)
|
||||
ctxt.flush()
|
||||
return
|
||||
}
|
||||
}
|
||||
runHelp(flag.Args())
|
||||
}
|
||||
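The parseInput loop at the top of this section relies on strconv.UnquoteChar to expand escape sequences in command-line arguments, and it discards the returned error. As a minimal sketch of the same technique with the error surfaced, the standalone program below may help; the name decodeEscapes is hypothetical and is not part of colcmp.

```go
package main

import (
	"fmt"
	"strconv"
)

// decodeEscapes expands Go-style escape sequences (for example \u00e9)
// in s, one character at a time. It follows the loop in parseInput but
// returns an error for a malformed escape instead of ignoring it.
// The function name is an assumption made for this illustration.
func decodeEscapes(s string) (string, error) {
	rs := []rune{}
	for len(s) > 0 {
		// With quote set to '\'', the sequence \' is accepted and an
		// unescaped single quote is reported as a syntax error.
		r, _, tail, err := strconv.UnquoteChar(s, '\'')
		if err != nil {
			return "", fmt.Errorf("invalid escape in %q: %v", s, err)
		}
		rs = append(rs, r)
		s = tail
	}
	return string(rs), nil
}

func main() {
	out, err := decodeEscapes(`a\u00e9b`)
	fmt.Println(out, err)
}
```

Returning the error is a design choice for the sketch only; the vendored tool keeps going with whatever runes were decoded.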
111
vendor/golang.org/x/text/collate/tools/colcmp/darwin.go
generated
vendored
Normal file
|
|
@ -0,0 +1,111 @@
|
|||
// Copyright 2012 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
// +build darwin
|
||||
|
||||
package main
|
||||
|
||||
/*
|
||||
#cgo LDFLAGS: -framework CoreFoundation
|
||||
#include <CoreFoundation/CFBase.h>
|
||||
#include <CoreFoundation/CoreFoundation.h>
|
||||
*/
|
||||
import "C"
|
||||
import (
|
||||
"unsafe"
|
||||
)
|
||||
|
||||
func init() {
|
||||
AddFactory(CollatorFactory{"osx", newOSX16Collator,
|
||||
"OS X/Darwin collator, using native strings."})
|
||||
AddFactory(CollatorFactory{"osx8", newOSX8Collator,
|
||||
"OS X/Darwin collator for UTF-8."})
|
||||
}
|
||||
|
||||
func osxUInt8P(s []byte) *C.UInt8 {
|
||||
return (*C.UInt8)(unsafe.Pointer(&s[0]))
|
||||
}
|
||||
|
||||
func osxCharP(s []uint16) *C.UniChar {
|
||||
return (*C.UniChar)(unsafe.Pointer(&s[0]))
|
||||
}
|
||||
|
||||
// osxCollator implements a Collator based on OS X's CoreFoundation.
|
||||
type osxCollator struct {
|
||||
loc C.CFLocaleRef
|
||||
opt C.CFStringCompareFlags
|
||||
}
|
||||
|
||||
func (c *osxCollator) init(locale string) {
|
||||
l := C.CFStringCreateWithBytes(
|
||||
nil,
|
||||
osxUInt8P([]byte(locale)),
|
||||
C.CFIndex(len(locale)),
|
||||
C.kCFStringEncodingUTF8,
|
||||
C.Boolean(0),
|
||||
)
|
||||
c.loc = C.CFLocaleCreate(nil, l)
|
||||
}
|
||||
|
||||
func newOSX8Collator(locale string) (Collator, error) {
|
||||
c := &osx8Collator{}
|
||||
c.init(locale)
|
||||
return c, nil
|
||||
}
|
||||
|
||||
func newOSX16Collator(locale string) (Collator, error) {
|
||||
c := &osx16Collator{}
|
||||
c.init(locale)
|
||||
return c, nil
|
||||
}
|
||||
|
||||
func (c osxCollator) Key(s Input) []byte {
|
||||
return nil // sort keys not supported by OS X CoreFoundation
|
||||
}
|
||||
|
||||
type osx8Collator struct {
|
||||
osxCollator
|
||||
}
|
||||
|
||||
type osx16Collator struct {
|
||||
osxCollator
|
||||
}
|
||||
|
||||
func (c osx16Collator) Compare(a, b Input) int {
|
||||
sa := C.CFStringCreateWithCharactersNoCopy(
|
||||
nil,
|
||||
osxCharP(a.UTF16),
|
||||
C.CFIndex(len(a.UTF16)),
|
||||
nil,
|
||||
)
|
||||
sb := C.CFStringCreateWithCharactersNoCopy(
|
||||
nil,
|
||||
osxCharP(b.UTF16),
|
||||
C.CFIndex(len(b.UTF16)),
|
||||
nil,
|
||||
)
|
||||
_range := C.CFRangeMake(0, C.CFStringGetLength(sa))
|
||||
return int(C.CFStringCompareWithOptionsAndLocale(sa, sb, _range, c.opt, c.loc))
|
||||
}
|
||||
|
||||
func (c osx8Collator) Compare(a, b Input) int {
|
||||
sa := C.CFStringCreateWithBytesNoCopy(
|
||||
nil,
|
||||
osxUInt8P(a.UTF8),
|
||||
C.CFIndex(len(a.UTF8)),
|
||||
C.kCFStringEncodingUTF8,
|
||||
C.Boolean(0),
|
||||
nil,
|
||||
)
|
||||
sb := C.CFStringCreateWithBytesNoCopy(
|
||||
nil,
|
||||
osxUInt8P(b.UTF8),
|
||||
C.CFIndex(len(b.UTF8)),
|
||||
C.kCFStringEncodingUTF8,
|
||||
C.Boolean(0),
|
||||
nil,
|
||||
)
|
||||
_range := C.CFRangeMake(0, C.CFStringGetLength(sa))
|
||||
return int(C.CFStringCompareWithOptionsAndLocale(sa, sb, _range, c.opt, c.loc))
|
||||
}
|
||||
183
vendor/golang.org/x/text/collate/tools/colcmp/gen.go
generated
vendored
Normal file
|
|
@ -0,0 +1,183 @@
|
|||
// Copyright 2012 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package main
|
||||
|
||||
import (
|
||||
"math"
|
||||
"math/rand"
|
||||
"strings"
|
||||
"unicode"
|
||||
"unicode/utf16"
|
||||
"unicode/utf8"
|
||||
|
||||
"golang.org/x/text/language"
|
||||
"golang.org/x/text/unicode/norm"
|
||||
)
|
||||
|
||||
// TODO: replace with functionality in language package.
|
||||
// parent computes the parent language for the given language.
|
||||
// It returns false if the parent is already root.
|
||||
func parent(locale string) (parent string, ok bool) {
|
||||
if locale == "und" {
|
||||
return "", false
|
||||
}
|
||||
if i := strings.LastIndex(locale, "-"); i != -1 {
|
||||
return locale[:i], true
|
||||
}
|
||||
return "und", true
|
||||
}
|
||||
|
||||
// rewriter is used both to deduplicate strings and to create variants of strings
|
||||
// to add to the test set.
|
||||
type rewriter struct {
|
||||
seen map[string]bool
|
||||
addCases bool
|
||||
}
|
||||
|
||||
func newRewriter() *rewriter {
|
||||
return &rewriter{
|
||||
seen: make(map[string]bool),
|
||||
}
|
||||
}
|
||||
|
||||
func (r *rewriter) insert(a []string, s string) []string {
|
||||
if !r.seen[s] {
|
||||
r.seen[s] = true
|
||||
a = append(a, s)
|
||||
}
|
||||
return a
|
||||
}
|
||||
|
||||
// rewrite takes a sequence of strings in, adds variants of these strings
|
||||
// based on options and removes duplicates.
|
||||
func (r *rewriter) rewrite(ss []string) []string {
|
||||
ns := []string{}
|
||||
for _, s := range ss {
|
||||
ns = r.insert(ns, s)
|
||||
if r.addCases {
|
||||
rs := []rune(s)
|
||||
rn := rs[0]
|
||||
for c := unicode.SimpleFold(rn); c != rn; c = unicode.SimpleFold(c) {
|
||||
rs[0] = c
|
||||
ns = r.insert(ns, string(rs))
|
||||
}
|
||||
}
|
||||
}
|
||||
return ns
|
||||
}
|
||||
|
||||
// exemplarySet holds a parsed set of characters from the exemplarCharacters table.
|
||||
type exemplarySet struct {
|
||||
typ exemplarType
|
||||
set []string
|
||||
charIndex int // cumulative total of phrases, including this set
|
||||
}
|
||||
|
||||
type phraseGenerator struct {
|
||||
sets [exN]exemplarySet
|
||||
n int
|
||||
}
|
||||
|
||||
func (g *phraseGenerator) init(id string) {
|
||||
ec := exemplarCharacters
|
||||
loc := language.Make(id).String()
|
||||
// get sets for locale or parent locale if the set is not defined.
|
||||
for i := range g.sets {
|
||||
for p, ok := loc, true; ok; p, ok = parent(p) {
|
||||
if set, ok := ec[p]; ok && set[i] != "" {
|
||||
g.sets[i].set = strings.Split(set[i], " ")
|
||||
break
|
||||
}
|
||||
}
|
||||
}
|
||||
r := newRewriter()
|
||||
r.addCases = *cases
|
||||
for i := range g.sets {
|
||||
g.sets[i].set = r.rewrite(g.sets[i].set)
|
||||
}
|
||||
// compute indexes
|
||||
for i, set := range g.sets {
|
||||
g.n += len(set.set)
|
||||
g.sets[i].charIndex = g.n
|
||||
}
|
||||
}
|
||||
|
||||
// phrase returns the ith phrase, where i < g.n.
|
||||
func (g *phraseGenerator) phrase(i int) string {
|
||||
for _, set := range g.sets {
|
||||
if i < set.charIndex {
|
||||
return set.set[i-(set.charIndex-len(set.set))]
|
||||
}
|
||||
}
|
||||
panic("index out of range")
|
||||
}
|
||||
|
||||
// generate generates inputs by combining all pairs of exemplar strings.
|
||||
// If doNorm is true, all input strings are normalized to NFD.
|
||||
// TODO: allow other variations, statistical models, and random
|
||||
// trailing sequences.
|
||||
func (g *phraseGenerator) generate(doNorm bool) []Input {
|
||||
const (
|
||||
M = 1024 * 1024
|
||||
buf8Size = 30 * M
|
||||
buf16Size = 10 * M
|
||||
)
|
||||
// TODO: use a better way to limit the input size.
|
||||
if sq := int(math.Sqrt(float64(*limit))); g.n > sq {
|
||||
g.n = sq
|
||||
}
|
||||
size := g.n * g.n
|
||||
a := make([]Input, 0, size)
|
||||
buf8 := make([]byte, 0, buf8Size)
|
||||
buf16 := make([]uint16, 0, buf16Size)
|
||||
|
||||
addInput := func(str string) {
|
||||
buf8 = buf8[len(buf8):]
|
||||
buf16 = buf16[len(buf16):]
|
||||
if len(str) > cap(buf8) {
|
||||
buf8 = make([]byte, 0, buf8Size)
|
||||
}
|
||||
if len(str) > cap(buf16) {
|
||||
buf16 = make([]uint16, 0, buf16Size)
|
||||
}
|
||||
if doNorm {
|
||||
buf8 = norm.NFD.AppendString(buf8, str)
|
||||
} else {
|
||||
buf8 = append(buf8, str...)
|
||||
}
|
||||
buf16 = appendUTF16(buf16, buf8)
|
||||
a = append(a, makeInput(buf8, buf16))
|
||||
}
|
||||
for i := 0; i < g.n; i++ {
|
||||
p1 := g.phrase(i)
|
||||
addInput(p1)
|
||||
for j := 0; j < g.n; j++ {
|
||||
p2 := g.phrase(j)
|
||||
addInput(p1 + p2)
|
||||
}
|
||||
}
|
||||
// permutate
|
||||
rnd := rand.New(rand.NewSource(int64(rand.Int())))
|
||||
for i := range a {
|
||||
j := i + rnd.Intn(len(a)-i)
|
||||
a[i], a[j] = a[j], a[i]
|
||||
a[i].index = i // allow restoring this order if input is used multiple times.
|
||||
}
|
||||
return a
|
||||
}
|
||||
|
||||
func appendUTF16(buf []uint16, s []byte) []uint16 {
|
||||
for len(s) > 0 {
|
||||
r, sz := utf8.DecodeRune(s)
|
||||
s = s[sz:]
|
||||
r1, r2 := utf16.EncodeRune(r)
|
||||
if r1 != 0xFFFD {
|
||||
buf = append(buf, uint16(r1), uint16(r2))
|
||||
} else {
|
||||
buf = append(buf, uint16(r))
|
||||
}
|
||||
}
|
||||
return buf
|
||||
}
|
||||
209
vendor/golang.org/x/text/collate/tools/colcmp/icu.go
generated
vendored
Normal file
|
|
@ -0,0 +1,209 @@
|
|||
// Copyright 2012 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
// +build icu
|
||||
|
||||
package main
|
||||
|
||||
/*
|
||||
#cgo LDFLAGS: -licui18n -licuuc
|
||||
#include <stdlib.h>
|
||||
#include <unicode/ucol.h>
|
||||
#include <unicode/uiter.h>
|
||||
#include <unicode/utypes.h>
|
||||
*/
|
||||
import "C"
|
||||
import (
|
||||
"fmt"
|
||||
"log"
|
||||
"unicode/utf16"
|
||||
"unicode/utf8"
|
||||
"unsafe"
|
||||
)
|
||||
|
||||
func init() {
|
||||
AddFactory(CollatorFactory{"icu", newUTF16,
|
||||
"Main ICU collator, using native strings."})
|
||||
AddFactory(CollatorFactory{"icu8", newUTF8iter,
|
||||
"ICU collator using ICU iterators to process UTF8."})
|
||||
AddFactory(CollatorFactory{"icu16", newUTF8conv,
|
||||
"ICU collation by first converting UTF8 to UTF16."})
|
||||
}
|
||||
|
||||
func icuCharP(s []byte) *C.char {
|
||||
return (*C.char)(unsafe.Pointer(&s[0]))
|
||||
}
|
||||
|
||||
func icuUInt8P(s []byte) *C.uint8_t {
|
||||
return (*C.uint8_t)(unsafe.Pointer(&s[0]))
|
||||
}
|
||||
|
||||
func icuUCharP(s []uint16) *C.UChar {
|
||||
return (*C.UChar)(unsafe.Pointer(&s[0]))
|
||||
}
|
||||
func icuULen(s []uint16) C.int32_t {
|
||||
return C.int32_t(len(s))
|
||||
}
|
||||
func icuSLen(s []byte) C.int32_t {
|
||||
return C.int32_t(len(s))
|
||||
}
|
||||
|
||||
// icuCollator implements a Collator based on ICU.
|
||||
type icuCollator struct {
|
||||
loc *C.char
|
||||
col *C.UCollator
|
||||
keyBuf []byte
|
||||
}
|
||||
|
||||
const growBufSize = 10 * 1024 * 1024
|
||||
|
||||
func (c *icuCollator) init(locale string) error {
|
||||
err := C.UErrorCode(0)
|
||||
c.loc = C.CString(locale)
|
||||
c.col = C.ucol_open(c.loc, &err)
|
||||
if err > 0 {
|
||||
return fmt.Errorf("failed opening collator for %q", locale)
|
||||
} else if err < 0 {
|
||||
loc := C.ucol_getLocaleByType(c.col, 0, &err)
|
||||
fmt, ok := map[int]string{
|
||||
-127: "warning: using default collator: %s",
|
||||
-128: "warning: using fallback collator: %s",
|
||||
}[int(err)]
|
||||
if ok {
|
||||
log.Printf(fmt, C.GoString(loc))
|
||||
}
|
||||
}
|
||||
c.keyBuf = make([]byte, 0, growBufSize)
|
||||
return nil
|
||||
}
|
||||
|
||||
func (c *icuCollator) buf() (*C.uint8_t, C.int32_t) {
|
||||
if len(c.keyBuf) == cap(c.keyBuf) {
|
||||
c.keyBuf = make([]byte, 0, growBufSize)
|
||||
}
|
||||
b := c.keyBuf[len(c.keyBuf):cap(c.keyBuf)]
|
||||
return icuUInt8P(b), icuSLen(b)
|
||||
}
|
||||
|
||||
func (c *icuCollator) extendBuf(n C.int32_t) []byte {
|
||||
end := len(c.keyBuf) + int(n)
|
||||
if end > cap(c.keyBuf) {
|
||||
if len(c.keyBuf) == 0 {
|
||||
log.Fatalf("icuCollator: max string size exceeded: %v > %v", n, growBufSize)
|
||||
}
|
||||
c.keyBuf = make([]byte, 0, growBufSize)
|
||||
return nil
|
||||
}
|
||||
b := c.keyBuf[len(c.keyBuf):end]
|
||||
c.keyBuf = c.keyBuf[:end]
|
||||
return b
|
||||
}
|
||||
|
||||
func (c *icuCollator) Close() error {
|
||||
C.ucol_close(c.col)
|
||||
C.free(unsafe.Pointer(c.loc))
|
||||
return nil
|
||||
}
|
||||
|
||||
// icuUTF16 implements the Collator interface.
|
||||
type icuUTF16 struct {
|
||||
icuCollator
|
||||
}
|
||||
|
||||
func newUTF16(locale string) (Collator, error) {
|
||||
c := &icuUTF16{}
|
||||
return c, c.init(locale)
|
||||
}
|
||||
|
||||
func (c *icuUTF16) Compare(a, b Input) int {
|
||||
return int(C.ucol_strcoll(c.col, icuUCharP(a.UTF16), icuULen(a.UTF16), icuUCharP(b.UTF16), icuULen(b.UTF16)))
|
||||
}
|
||||
|
||||
func (c *icuUTF16) Key(s Input) []byte {
|
||||
bp, bn := c.buf()
|
||||
n := C.ucol_getSortKey(c.col, icuUCharP(s.UTF16), icuULen(s.UTF16), bp, bn)
|
||||
if b := c.extendBuf(n); b != nil {
|
||||
return b
|
||||
}
|
||||
return c.Key(s)
|
||||
}
|
||||
|
||||
// icuUTF8iter implements the Collator interface.
|
||||
// This implementation wraps the UTF8 string in an iterator
|
||||
// which is passed to the collator.
|
||||
type icuUTF8iter struct {
|
||||
icuCollator
|
||||
a, b C.UCharIterator
|
||||
}
|
||||
|
||||
func newUTF8iter(locale string) (Collator, error) {
|
||||
c := &icuUTF8iter{}
|
||||
return c, c.init(locale)
|
||||
}
|
||||
|
||||
func (c *icuUTF8iter) Compare(a, b Input) int {
|
||||
err := C.UErrorCode(0)
|
||||
C.uiter_setUTF8(&c.a, icuCharP(a.UTF8), icuSLen(a.UTF8))
|
||||
C.uiter_setUTF8(&c.b, icuCharP(b.UTF8), icuSLen(b.UTF8))
|
||||
return int(C.ucol_strcollIter(c.col, &c.a, &c.b, &err))
|
||||
}
|
||||
|
||||
func (c *icuUTF8iter) Key(s Input) []byte {
|
||||
err := C.UErrorCode(0)
|
||||
state := [2]C.uint32_t{}
|
||||
C.uiter_setUTF8(&c.a, icuCharP(s.UTF8), icuSLen(s.UTF8))
|
||||
bp, bn := c.buf()
|
||||
n := C.ucol_nextSortKeyPart(c.col, &c.a, &(state[0]), bp, bn, &err)
|
||||
if n >= bn {
|
||||
// Force failure.
|
||||
if c.extendBuf(n+1) != nil {
|
||||
log.Fatal("expected extension to fail")
|
||||
}
|
||||
return c.Key(s)
|
||||
}
|
||||
return c.extendBuf(n)
|
||||
}
|
||||
|
||||
// icuUTF8conv implements the Collator interface.
|
||||
// This implementation first converts the given UTF8 string
|
||||
// to UTF16 and then calls the main ICU collation function.
|
||||
type icuUTF8conv struct {
|
||||
icuCollator
|
||||
}
|
||||
|
||||
func newUTF8conv(locale string) (Collator, error) {
|
||||
c := &icuUTF8conv{}
|
||||
return c, c.init(locale)
|
||||
}
|
||||
|
||||
func (c *icuUTF8conv) Compare(sa, sb Input) int {
|
||||
a := encodeUTF16(sa.UTF8)
|
||||
b := encodeUTF16(sb.UTF8)
|
||||
return int(C.ucol_strcoll(c.col, icuUCharP(a), icuULen(a), icuUCharP(b), icuULen(b)))
|
||||
}
|
||||
|
||||
func (c *icuUTF8conv) Key(s Input) []byte {
|
||||
a := encodeUTF16(s.UTF8)
|
||||
bp, bn := c.buf()
|
||||
n := C.ucol_getSortKey(c.col, icuUCharP(a), icuULen(a), bp, bn)
|
||||
if b := c.extendBuf(n); b != nil {
|
||||
return b
|
||||
}
|
||||
return c.Key(s)
|
||||
}
|
||||
|
||||
func encodeUTF16(b []byte) []uint16 {
|
||||
a := []uint16{}
|
||||
for len(b) > 0 {
|
||||
r, sz := utf8.DecodeRune(b)
|
||||
b = b[sz:]
|
||||
r1, r2 := utf16.EncodeRune(r)
|
||||
if r1 != 0xFFFD {
|
||||
a = append(a, uint16(r1), uint16(r2))
|
||||
} else {
|
||||
a = append(a, uint16(r))
|
||||
}
|
||||
}
|
||||
return a
|
||||
}
|
||||
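encodeUTF16 in icu.go (like appendUTF16 in gen.go) converts UTF-8 bytes to UTF-16 code units by decoding one rune at a time. As a rough sketch, the same conversion can be written with utf16.Encode from the standard library, which should yield the same code units for valid UTF-8 input. The name toUTF16 is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// toUTF16 converts UTF-8 encoded bytes to UTF-16 code units.
// Runes outside the Basic Multilingual Plane become surrogate pairs.
// The function name is an assumption made for this illustration.
func toUTF16(b []byte) []uint16 {
	return utf16.Encode([]rune(string(b)))
}

func main() {
	// U+1F600 lies outside the BMP, so it takes two code units.
	fmt.Printf("%04X\n", toUTF16([]byte("a\U0001F600")))
}
```

Unlike appendUTF16, this version allocates a fresh slice on every call, so it would not suit the benchmark path, where the tool reuses large buffers.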
66
vendor/golang.org/x/text/currency/common.go
generated
vendored
Normal file
|
|
@ -0,0 +1,66 @@
|
|||
// Code generated by running "go generate" in golang.org/x/text. DO NOT EDIT.
|
||||
|
||||
package currency
|
||||
|
||||
import (
|
||||
"time"
|
||||
|
||||
"golang.org/x/text/language"
|
||||
)
|
||||
|
||||
// This file contains code common to gen.go and the package code.
|
||||
|
||||
const (
|
||||
cashShift = 3
|
||||
roundMask = 0x7
|
||||
|
||||
nonTenderBit = 0x8000
|
||||
)
|
||||
|
||||
// currencyInfo contains information about a currency.
|
||||
// bits 0..2: index into roundings for standard rounding
|
||||
// bits 3..5: index into roundings for cash rounding
|
||||
type currencyInfo byte
|
||||
|
||||
// roundingType defines the scale (number of fractional decimals) and increments
|
||||
// in terms of units of size 10^-scale. For example, for scale == 2 and
|
||||
// increment == 1, the currency is rounded to units of 0.01.
|
||||
type roundingType struct {
|
||||
scale, increment uint8
|
||||
}
|
||||
|
||||
// roundings contains rounding data for currencies. This struct is
|
||||
// created by hand as it is very unlikely to change much.
|
||||
var roundings = [...]roundingType{
|
||||
{2, 1}, // default
|
||||
{0, 1},
|
||||
{1, 1},
|
||||
{3, 1},
|
||||
{4, 1},
|
||||
{2, 5}, // cash rounding alternative
|
||||
}
|
||||
|
||||
// regionToCode returns a 16-bit region code. Only two-letter codes are
|
||||
// supported. (Three-letter codes are not needed.)
|
||||
func regionToCode(r language.Region) uint16 {
|
||||
if s := r.String(); len(s) == 2 {
|
||||
return uint16(s[0])<<8 | uint16(s[1])
|
||||
}
|
||||
return 0
|
||||
}
|
||||
|
||||
func toDate(t time.Time) uint32 {
|
||||
y := t.Year()
|
||||
if y == 1 {
|
||||
return 0
|
||||
}
|
||||
date := uint32(y) << 4
|
||||
date |= uint32(t.Month())
|
||||
date <<= 5
|
||||
date |= uint32(t.Day())
|
||||
return date
|
||||
}
|
||||
|
||||
func fromDate(date uint32) time.Time {
|
||||
return time.Date(int(date>>9), time.Month((date>>5)&0xf), int(date&0x1f), 0, 0, 0, 0, time.UTC)
|
||||
}
|
||||
185
vendor/golang.org/x/text/currency/currency.go
generated
vendored
Normal file
|
|
@ -0,0 +1,185 @@
|
|||
// Copyright 2015 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
//go:generate go run gen.go gen_common.go -output tables.go
|
||||
|
||||
// Package currency contains currency-related functionality.
|
||||
//
|
||||
// NOTE: the formatting functionality is currently under development and may
|
||||
// change without notice.
|
||||
package currency // import "golang.org/x/text/currency"
|
||||
|
||||
import (
|
||||
"errors"
|
||||
"sort"
|
||||
|
||||
"golang.org/x/text/internal/tag"
|
||||
"golang.org/x/text/language"
|
||||
)
|
||||
|
||||
// TODO:
|
||||
// - language-specific currency names.
|
||||
// - currency formatting.
|
||||
// - currency information per region
|
||||
// - register currency code (there is no private use area)
|
||||
|
||||
// TODO: remove Currency type from package language.
|
||||
|
||||
// Kind determines the rounding and rendering properties of a currency value.
|
||||
type Kind struct {
|
||||
rounding rounding
|
||||
// TODO: formatting type: standard, accounting. See CLDR.
|
||||
}
|
||||
|
||||
type rounding byte
|
||||
|
||||
const (
|
||||
standard rounding = iota
|
||||
cash
|
||||
)
|
||||
|
||||
var (
|
||||
// Standard defines standard rounding and formatting for currencies.
|
||||
Standard Kind = Kind{rounding: standard}
|
||||
|
||||
// Cash defines rounding and formatting standards for cash transactions.
|
||||
Cash Kind = Kind{rounding: cash}
|
||||
|
||||
// Accounting defines rounding and formatting standards for accounting.
|
||||
Accounting Kind = Kind{rounding: standard}
|
||||
)
|
||||
|
||||
// Rounding reports the rounding characteristics for the given currency, where
|
||||
// scale is the number of fractional decimals and increment is the number of
|
||||
// units in terms of 10^(-scale) to which to round.
|
||||
func (k Kind) Rounding(cur Unit) (scale, increment int) {
|
||||
info := currency.Elem(int(cur.index))[3]
|
||||
switch k.rounding {
|
||||
case standard:
|
||||
info &= roundMask
|
||||
case cash:
|
||||
info >>= cashShift
|
||||
}
|
||||
return int(roundings[info].scale), int(roundings[info].increment)
|
||||
}
|
||||
|
||||
// Unit is an ISO 4217 currency designator.
|
||||
type Unit struct {
|
||||
index uint16
|
||||
}
|
||||
|
||||
// String returns the ISO code of u.
|
||||
func (u Unit) String() string {
|
||||
if u.index == 0 {
|
||||
return "XXX"
|
||||
}
|
||||
return currency.Elem(int(u.index))[:3]
|
||||
}
|
||||
|
||||
// Amount creates an Amount for the given currency unit and amount.
|
||||
func (u Unit) Amount(amount interface{}) Amount {
|
||||
// TODO: verify amount is a supported number type
|
||||
return Amount{amount: amount, currency: u}
|
||||
}
|
||||
|
||||
var (
|
||||
errSyntax = errors.New("currency: tag is not well-formed")
|
||||
errValue = errors.New("currency: tag is not a recognized currency")
|
||||
)
|
||||
|
||||
// ParseISO parses a 3-letter ISO 4217 currency code. It returns an error if s
|
||||
// is not well-formed or not a recognized currency code.
|
||||
func ParseISO(s string) (Unit, error) {
|
||||
var buf [4]byte // Take one byte more to detect oversize keys.
|
||||
key := buf[:copy(buf[:], s)]
|
||||
if !tag.FixCase("XXX", key) {
|
||||
return Unit{}, errSyntax
|
||||
}
|
||||
if i := currency.Index(key); i >= 0 {
|
||||
if i == xxx {
|
||||
return Unit{}, nil
|
||||
}
|
||||
return Unit{uint16(i)}, nil
|
||||
}
|
||||
return Unit{}, errValue
|
||||
}
|
||||
|
||||
// MustParseISO is like ParseISO, but panics if the given currency unit
|
||||
// cannot be parsed. It simplifies safe initialization of Unit values.
|
||||
func MustParseISO(s string) Unit {
|
||||
c, err := ParseISO(s)
|
||||
if err != nil {
|
||||
panic(err)
|
||||
}
|
||||
return c
|
||||
}
|
||||
|
||||
// FromRegion reports the currency unit that is currently legal tender in the
|
||||
// given region according to CLDR. It will return false if region currently does
|
||||
// not have a legal tender.
|
||||
func FromRegion(r language.Region) (currency Unit, ok bool) {
|
||||
x := regionToCode(r)
|
||||
i := sort.Search(len(regionToCurrency), func(i int) bool {
|
||||
return regionToCurrency[i].region >= x
|
||||
})
|
||||
if i < len(regionToCurrency) && regionToCurrency[i].region == x {
|
||||
return Unit{regionToCurrency[i].code}, true
|
||||
}
|
||||
return Unit{}, false
|
||||
}
|
||||
|
||||
// FromTag reports the most likely currency for the given tag. It considers the
|
||||
// currency defined in the -u extension and infers the region if necessary.
|
||||
func FromTag(t language.Tag) (Unit, language.Confidence) {
|
||||
if cur := t.TypeForKey("cu"); len(cur) == 3 {
|
||||
c, _ := ParseISO(cur)
|
||||
return c, language.Exact
|
||||
}
|
||||
r, conf := t.Region()
|
||||
if cur, ok := FromRegion(r); ok {
|
||||
return cur, conf
|
||||
}
|
||||
return Unit{}, language.No
|
||||
}
|
||||
|
||||
var (
|
||||
// Undefined and testing.
|
||||
XXX Unit = Unit{}
|
||||
XTS Unit = Unit{xts}
|
||||
|
||||
// G10 currencies https://en.wikipedia.org/wiki/G10_currencies.
|
||||
USD Unit = Unit{usd}
|
||||
EUR Unit = Unit{eur}
|
||||
JPY Unit = Unit{jpy}
|
||||
GBP Unit = Unit{gbp}
|
||||
CHF Unit = Unit{chf}
|
||||
AUD Unit = Unit{aud}
|
||||
NZD Unit = Unit{nzd}
|
||||
CAD Unit = Unit{cad}
|
||||
SEK Unit = Unit{sek}
|
||||
NOK Unit = Unit{nok}
|
||||
|
||||
// Additional common currencies as defined by CLDR.
|
||||
BRL Unit = Unit{brl}
|
||||
CNY Unit = Unit{cny}
|
||||
DKK Unit = Unit{dkk}
|
||||
INR Unit = Unit{inr}
|
||||
RUB Unit = Unit{rub}
|
||||
HKD Unit = Unit{hkd}
|
||||
IDR Unit = Unit{idr}
|
||||
KRW Unit = Unit{krw}
|
||||
MXN Unit = Unit{mxn}
|
||||
PLN Unit = Unit{pln}
|
||||
SAR Unit = Unit{sar}
|
||||
THB Unit = Unit{thb}
|
||||
TRY Unit = Unit{try}
|
||||
TWD Unit = Unit{twd}
|
||||
ZAR Unit = Unit{zar}
|
||||
|
||||
// Precious metals.
|
||||
XAG Unit = Unit{xag}
|
||||
XAU Unit = Unit{xau}
|
||||
XPT Unit = Unit{xpt}
|
||||
XPD Unit = Unit{xpd}
|
||||
)
|
||||
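Kind.Rounding decodes a packed info byte into an index into the roundings table: bits 0..2 select the standard rounding and bits 3..5 the cash rounding. The sketch below copies the constants and table from common.go and shows one way a caller might apply the resulting increment. The info byte built in main is made up for illustration, lookup and roundUnits are hypothetical helpers, and the extra mask on the cash path is a defensive assumption that the original code does not apply.

```go
package main

import "fmt"

const (
	cashShift = 3
	roundMask = 0x7
)

type roundingType struct {
	scale, increment uint8
}

// roundings is copied from common.go.
var roundings = [...]roundingType{
	{2, 1}, // default
	{0, 1},
	{1, 1},
	{3, 1},
	{4, 1},
	{2, 5}, // cash rounding alternative
}

// lookup decodes the standard or cash rounding from a packed info byte.
func lookup(info byte, cash bool) roundingType {
	if cash {
		return roundings[(info>>cashShift)&roundMask]
	}
	return roundings[info&roundMask]
}

// roundUnits rounds an amount expressed in units of 10^-scale to the
// nearest multiple of increment, with halves rounded away from zero.
func roundUnits(units, increment int64) int64 {
	if increment <= 1 {
		return units
	}
	half := increment / 2
	if units < 0 {
		return (units - half) / increment * increment
	}
	return (units + half) / increment * increment
}

func main() {
	info := byte(5 << cashShift) // hypothetical: standard index 0, cash index 5
	r := lookup(info, true)
	fmt.Println(r.scale, r.increment, roundUnits(123, int64(r.increment)))
}
```

Working in integer minor units avoids floating-point error; a real caller would first scale the amount by 10^scale using whatever decimal type it already has.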
171
vendor/golang.org/x/text/currency/currency_test.go
generated
vendored
Normal file
|
|
@ -0,0 +1,171 @@
|
|||
// Copyright 2015 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package currency
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"testing"
|
||||
|
||||
"golang.org/x/text/internal/testtext"
|
||||
"golang.org/x/text/language"
|
||||
)
|
||||
|
||||
var (
|
||||
cup = MustParseISO("CUP")
|
||||
czk = MustParseISO("CZK")
|
||||
xcd = MustParseISO("XCD")
|
||||
zwr = MustParseISO("ZWR")
|
||||
)
|
||||
|
||||
func TestParseISO(t *testing.T) {
|
||||
testCases := []struct {
|
||||
in string
|
||||
out Unit
|
||||
ok bool
|
||||
}{
|
||||
{"USD", USD, true},
|
||||
{"xxx", XXX, true},
|
||||
{"xts", XTS, true},
|
||||
{"XX", XXX, false},
|
||||
{"XXXX", XXX, false},
|
||||
{"", XXX, false}, // not well-formed
|
||||
{"UUU", XXX, false}, // unknown
|
||||
{"\u22A9", XXX, false}, // non-ASCII, printable
|
||||
|
||||
{"aaa", XXX, false},
|
||||
{"zzz", XXX, false},
|
||||
{"000", XXX, false},
|
||||
{"999", XXX, false},
|
||||
{"---", XXX, false},
|
||||
{"\x00\x00\x00", XXX, false},
|
||||
{"\xff\xff\xff", XXX, false},
|
||||
}
|
||||
for i, tc := range testCases {
|
||||
if x, err := ParseISO(tc.in); x != tc.out || err == nil != tc.ok {
|
||||
t.Errorf("%d:%s: was %s, %v; want %s, %v", i, tc.in, x, err == nil, tc.out, tc.ok)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestFromRegion(t *testing.T) {
|
||||
testCases := []struct {
|
||||
region string
|
||||
currency Unit
|
||||
ok bool
|
||||
}{
|
||||
{"NL", EUR, true},
|
||||
{"BE", EUR, true},
|
||||
{"AG", xcd, true},
|
||||
{"CH", CHF, true},
|
||||
{"CU", cup, true}, // first of multiple
|
||||
{"DG", USD, true}, // does not have M49 code
|
||||
{"150", XXX, false}, // implicit false
|
||||
{"CP", XXX, false}, // explicit false in CLDR
|
||||
{"CS", XXX, false}, // all expired
|
||||
{"ZZ", XXX, false}, // none match
|
||||
}
|
||||
for _, tc := range testCases {
|
||||
cur, ok := FromRegion(language.MustParseRegion(tc.region))
|
||||
if cur != tc.currency || ok != tc.ok {
|
||||
t.Errorf("%s: got %v, %v; want %v, %v", tc.region, cur, ok, tc.currency, tc.ok)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestFromTag(t *testing.T) {
|
||||
testCases := []struct {
|
||||
tag string
|
||||
currency Unit
|
||||
conf language.Confidence
|
||||
}{
|
||||
{"nl", EUR, language.Low}, // nl also spoken outside Euro land.
|
||||
{"nl-BE", EUR, language.Exact}, // region is known
|
||||
{"pt", BRL, language.Low},
|
||||
{"en", USD, language.Low},
|
||||
{"en-u-cu-eur", EUR, language.Exact},
|
||||
{"tlh", XXX, language.No}, // Klingon has no country.
|
||||
{"es-419", XXX, language.No},
|
||||
{"und", USD, language.Low},
|
||||
}
|
||||
for _, tc := range testCases {
|
||||
cur, conf := FromTag(language.MustParse(tc.tag))
|
||||
if cur != tc.currency || conf != tc.conf {
|
||||
t.Errorf("%s: got %v, %v; want %v, %v", tc.tag, cur, conf, tc.currency, tc.conf)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestTable(t *testing.T) {
|
||||
for i := 4; i < len(currency); i += 4 {
|
||||
if a, b := currency[i-4:i-1], currency[i:i+3]; a >= b {
|
||||
t.Errorf("currency unordered at element %d: %s >= %s", i, a, b)
|
||||
}
|
||||
}
|
||||
// First currency has index 1, last is numCurrencies.
|
||||
if c := currency.Elem(1)[:3]; c != "ADP" {
|
||||
t.Errorf("first was %s; want ADP", c)
|
||||
}
|
||||
if c := currency.Elem(numCurrencies)[:3]; c != "ZWR" {
|
||||
t.Errorf("last was %s; want ZWR", c)
|
||||
}
|
||||
}
|
||||
|
||||
func TestKindRounding(t *testing.T) {
|
||||
testCases := []struct {
|
||||
kind Kind
|
||||
cur Unit
|
||||
scale int
|
||||
inc int
|
||||
}{
|
||||
{Standard, USD, 2, 1},
|
||||
{Standard, CHF, 2, 1},
|
||||
{Cash, CHF, 2, 5},
|
||||
{Standard, TWD, 2, 1},
|
||||
{Cash, TWD, 0, 1},
|
||||
{Standard, czk, 2, 1},
|
||||
{Cash, czk, 0, 1},
|
||||
{Standard, zwr, 2, 1},
|
||||
{Cash, zwr, 0, 1},
|
||||
{Standard, KRW, 0, 1},
|
||||
{Cash, KRW, 0, 1}, // Cash defaults to standard.
|
||||
}
|
||||
for i, tc := range testCases {
|
||||
if scale, inc := tc.kind.Rounding(tc.cur); scale != tc.scale || inc != tc.inc {
|
||||
t.Errorf("%d: got %d, %d; want %d, %d", i, scale, inc, tc.scale, tc.inc)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
const body = `package main
|
||||
import (
|
||||
"fmt"
|
||||
"golang.org/x/text/currency"
|
||||
)
|
||||
func main() {
|
||||
%s
|
||||
}
|
||||
`
|
||||
|
||||
func TestLinking(t *testing.T) {
|
||||
base := getSize(t, `fmt.Print(currency.CLDRVersion)`)
|
||||
symbols := getSize(t, `fmt.Print(currency.Symbol(currency.USD))`)
|
||||
if d := symbols - base; d < 2*1024 {
|
||||
t.Errorf("size(symbols)-size(base) was %d; want > 2K", d)
|
||||
}
|
||||
}
|
||||
|
||||
func getSize(t *testing.T, main string) int {
|
||||
size, err := testtext.CodeSize(fmt.Sprintf(body, main))
|
||||
if err != nil {
|
||||
t.Skipf("skipping link size test; binary size could not be determined: %v", err)
|
||||
}
|
||||
return size
|
||||
}
|
||||
|
||||
func BenchmarkString(b *testing.B) {
|
||||
for i := 0; i < b.N; i++ {
|
||||
USD.String()
|
||||
}
|
||||
}
|
||||
27
vendor/golang.org/x/text/currency/example_test.go
generated
vendored
Normal file
|
|
@ -0,0 +1,27 @@
|
|||
// Copyright 2016 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package currency_test
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"time"
|
||||
|
||||
"golang.org/x/text/currency"
|
||||
)
|
||||
|
||||
func ExampleQuery() {
|
||||
t1799, _ := time.Parse("2006-01-02", "1799-01-01")
|
||||
for it := currency.Query(currency.Date(t1799)); it.Next(); {
|
||||
from := ""
|
||||
if t, ok := it.From(); ok {
|
||||
from = t.Format("2006-01-02")
|
||||
}
|
||||
fmt.Printf("%v is used in %v since: %v\n", it.Unit(), it.Region(), from)
|
||||
}
|
||||
// Output:
|
||||
// GBP is used in GB since: 1694-07-27
|
||||
// GIP is used in GI since: 1713-01-01
|
||||
// USD is used in US since: 1792-01-01
|
||||
}
|
||||
215
vendor/golang.org/x/text/currency/format.go
generated
vendored
Normal file
|
|
@ -0,0 +1,215 @@
|
|||
// Copyright 2015 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package currency
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"io"
|
||||
"sort"
|
||||
|
||||
"golang.org/x/text/internal"
|
||||
"golang.org/x/text/internal/format"
|
||||
"golang.org/x/text/language"
|
||||
)
|
||||
|
||||
// Amount is an amount-currency unit pair.
|
||||
type Amount struct {
|
||||
amount interface{} // Change to decimal(64|128).
|
||||
currency Unit
|
||||
}
|
||||
|
||||
// Currency reports the currency unit of this amount.
|
||||
func (a Amount) Currency() Unit { return a.currency }
|
||||
|
||||
// TODO: based on decimal type, but may make sense to customize a bit.
|
||||
// func (a Amount) Decimal()
|
||||
// func (a Amount) Int() (int64, error)
|
||||
// func (a Amount) Fraction() (int64, error)
|
||||
// func (a Amount) Rat() *big.Rat
|
||||
// func (a Amount) Float() (float64, error)
|
||||
// func (a Amount) Scale() uint
|
||||
// func (a Amount) Precision() uint
|
||||
// func (a Amount) Sign() int
|
||||
//
|
||||
// Add/Sub/Div/Mul/Round.
|
||||
|
||||
var space = []byte(" ")
|
||||
|
||||
// Format implements fmt.Formatter. It accepts format.State for
|
||||
// language-specific rendering.
|
||||
func (a Amount) Format(s fmt.State, verb rune) {
|
||||
v := formattedValue{
|
||||
currency: a.currency,
|
||||
amount: a.amount,
|
||||
format: defaultFormat,
|
||||
}
|
||||
v.Format(s, verb)
|
||||
}
|
||||
|
||||
// formattedValue is a currency amount or unit that implements language-sensitive
|
||||
// formatting.
|
||||
type formattedValue struct {
|
||||
currency Unit
|
||||
amount interface{} // Amount, Unit, or number.
|
||||
format *options
|
||||
}
|
||||
|
||||
// Format implements fmt.Formatter. It accepts format.State for
|
||||
// language-specific rendering.
|
||||
func (v formattedValue) Format(s fmt.State, verb rune) {
|
||||
var lang int
|
||||
if state, ok := s.(format.State); ok {
|
||||
lang, _ = language.CompactIndex(state.Language())
|
||||
}
|
||||
|
||||
// Get the options. Use DefaultFormat if not present.
|
||||
opt := v.format
|
||||
if opt == nil {
|
||||
opt = defaultFormat
|
||||
}
|
||||
cur := v.currency
|
||||
if cur.index == 0 {
|
||||
cur = opt.currency
|
||||
}
|
||||
|
||||
// TODO: use pattern.
|
||||
io.WriteString(s, opt.symbol(lang, cur))
|
||||
if v.amount != nil {
|
||||
s.Write(space)
|
||||
|
||||
// TODO: apply currency-specific rounding
|
||||
scale, _ := opt.kind.Rounding(cur)
|
||||
if _, ok := s.Precision(); !ok {
|
||||
fmt.Fprintf(s, "%.*f", scale, v.amount)
|
||||
} else {
|
||||
fmt.Fprint(s, v.amount)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Formatter decorates a given number, Unit or Amount with formatting options.
|
||||
type Formatter func(amount interface{}) formattedValue
|
||||
|
||||
// func (f Formatter) Options(opts ...Option) Formatter
|
||||
|
||||
// TODO: call this a Formatter or FormatFunc?
|
||||
|
||||
var dummy = USD.Amount(0)
|
||||
|
||||
// adjust creates a new Formatter based on the adjustments of fn on f.
|
||||
func (f Formatter) adjust(fn func(*options)) Formatter {
|
||||
var o options = *(f(dummy).format)
|
||||
fn(&o)
|
||||
return o.format
|
||||
}
|
||||
|
||||
// Default creates a new Formatter that defaults to the given currency unit if a numeric
|
||||
// value is passed that is not associated with a currency.
|
||||
func (f Formatter) Default(currency Unit) Formatter {
|
||||
return f.adjust(func(o *options) { o.currency = currency })
|
||||
}
|
||||
|
||||
// Kind sets the kind of the underlying currency unit.
|
||||
func (f Formatter) Kind(k Kind) Formatter {
|
||||
return f.adjust(func(o *options) { o.kind = k })
|
||||
}
|
||||
|
||||
var defaultFormat *options = ISO(dummy).format
|
||||
|
||||
var (
|
||||
// Uses Narrow symbols. Overrides Symbol, if present.
|
||||
NarrowSymbol Formatter = Formatter(formNarrow)
|
||||
|
||||
// Use Symbols instead of ISO codes, when available.
|
||||
Symbol Formatter = Formatter(formSymbol)
|
||||
|
||||
// Use ISO code as symbol.
|
||||
ISO Formatter = Formatter(formISO)
|
||||
|
||||
// TODO:
|
||||
// // Use full name as symbol.
|
||||
// Name Formatter
|
||||
)
|
||||
|
||||
// options configures rendering and rounding options for an Amount.
|
||||
type options struct {
|
||||
currency Unit
|
||||
kind Kind
|
||||
|
||||
symbol func(compactIndex int, c Unit) string
|
||||
}
|
||||
|
||||
func (o *options) format(amount interface{}) formattedValue {
|
||||
v := formattedValue{format: o}
|
||||
switch x := amount.(type) {
|
||||
case Amount:
|
||||
v.amount = x.amount
|
||||
v.currency = x.currency
|
||||
case *Amount:
|
||||
v.amount = x.amount
|
||||
v.currency = x.currency
|
||||
case Unit:
|
||||
v.currency = x
|
||||
case *Unit:
|
||||
v.currency = *x
|
||||
default:
|
||||
if o.currency.index == 0 {
|
||||
panic("cannot format number without a currency being set")
|
||||
}
|
||||
// TODO: Must be a number.
|
||||
v.amount = x
|
||||
v.currency = o.currency
|
||||
}
|
||||
return v
|
||||
}
|
||||
|
||||
var (
|
||||
optISO = options{symbol: lookupISO}
|
||||
optSymbol = options{symbol: lookupSymbol}
|
||||
optNarrow = options{symbol: lookupNarrow}
|
||||
)
|
||||
|
||||
// These need to be functions, rather than curried methods, as curried methods
|
||||
// are evaluated at init time, causing tables to be included unconditionally.
|
||||
func formISO(x interface{}) formattedValue { return optISO.format(x) }
|
||||
func formSymbol(x interface{}) formattedValue { return optSymbol.format(x) }
|
||||
func formNarrow(x interface{}) formattedValue { return optNarrow.format(x) }
|
||||
|
||||
func lookupISO(x int, c Unit) string { return c.String() }
|
||||
func lookupSymbol(x int, c Unit) string { return normalSymbol.lookup(x, c) }
|
||||
func lookupNarrow(x int, c Unit) string { return narrowSymbol.lookup(x, c) }
|
||||
|
||||
type symbolIndex struct {
|
||||
index []uint16 // position corresponds with compact index of language.
|
||||
data []curToIndex
|
||||
}
|
||||
|
||||
var (
|
||||
normalSymbol = symbolIndex{normalLangIndex, normalSymIndex}
|
||||
narrowSymbol = symbolIndex{narrowLangIndex, narrowSymIndex}
|
||||
)
|
||||
|
||||
func (x *symbolIndex) lookup(lang int, c Unit) string {
|
||||
for {
|
||||
index := x.data[x.index[lang]:x.index[lang+1]]
|
||||
i := sort.Search(len(index), func(i int) bool {
|
||||
return index[i].cur >= c.index
|
||||
})
|
||||
if i < len(index) && index[i].cur == c.index {
|
||||
x := index[i].idx
|
||||
start := x + 1
|
||||
end := start + uint16(symbols[x])
|
||||
if start == end {
|
||||
return c.String()
|
||||
}
|
||||
return symbols[start:end]
|
||||
}
|
||||
if lang == 0 {
|
||||
break
|
||||
}
|
||||
lang = int(internal.Parent[lang])
|
||||
}
|
||||
return c.String()
|
||||
}
|
||||
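The lookup method above reads the generated symbols constant, which gen.go writes as a run of <n><str> records: a length byte followed by the symbol bytes. Offset 0 is reserved, so an empty symbol means "fall back to the ISO code". The following is a minimal sketch of that encoding under stated assumptions: appendSymbol and symbolAt are hypothetical helper names used only for illustration, and the table is a toy, not the generated data.

```go
package main

import "fmt"

// appendSymbol (hypothetical) appends sym to table in <n><str> form and
// returns the grown table plus the offset of the record's length byte.
func appendSymbol(table []byte, sym string) ([]byte, uint16) {
	idx := uint16(len(table))
	table = append(table, byte(len(sym)))
	table = append(table, sym...)
	return table, idx
}

// symbolAt (hypothetical) decodes the record whose length byte is at idx,
// using the same start/end arithmetic as symbolIndex.lookup.
func symbolAt(table string, idx uint16) string {
	start := idx + 1
	end := start + uint16(table[idx])
	return table[start:end]
}

func main() {
	// Offset 0 holds a zero length: an empty symbol stands for the ISO code.
	data := []byte{0}
	data, dollar := appendSymbol(data, "$")
	data, euro := appendSymbol(data, "€")
	table := string(data)

	fmt.Println(dollar, symbolAt(table, dollar))
	fmt.Println(euro, symbolAt(table, euro))
	fmt.Printf("%q\n", symbolAt(table, 0)) // empty: caller would use c.String()
}
```

Because the length is stored in a single byte, this layout only works for symbols shorter than 256 bytes, which is why the sketch makes no attempt to handle longer strings.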
70
vendor/golang.org/x/text/currency/format_test.go
generated
vendored
Normal file
|
|
@ -0,0 +1,70 @@
|
|||
// Copyright 2015 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package currency
|
||||
|
||||
import (
|
||||
"testing"
|
||||
|
||||
"golang.org/x/text/language"
|
||||
"golang.org/x/text/message"
|
||||
)
|
||||
|
||||
var (
|
||||
en = language.English
|
||||
fr = language.French
|
||||
en_US = language.AmericanEnglish
|
||||
en_GB = language.BritishEnglish
|
||||
en_AU = language.MustParse("en-AU")
|
||||
und = language.Und
|
||||
)
|
||||
|
||||
func TestFormatting(t *testing.T) {
|
||||
testCases := []struct {
|
||||
tag language.Tag
|
||||
value interface{}
|
||||
format Formatter
|
||||
want string
|
||||
}{
|
||||
0: {en, USD.Amount(0.1), nil, "USD 0.10"},
|
||||
1: {en, XPT.Amount(1.0), Symbol, "XPT 1.00"},
|
||||
|
||||
2: {en, USD.Amount(2.0), ISO, "USD 2.00"},
|
||||
3: {und, USD.Amount(3.0), Symbol, "US$ 3.00"},
|
||||
4: {en, USD.Amount(4.0), Symbol, "$ 4.00"},
|
||||
|
||||
5: {en, USD.Amount(5.20), NarrowSymbol, "$ 5.20"},
|
||||
6: {en, AUD.Amount(6.20), Symbol, "A$ 6.20"},
|
||||
|
||||
7: {en_AU, AUD.Amount(7.20), Symbol, "$ 7.20"},
|
||||
8: {en_GB, USD.Amount(8.20), Symbol, "US$ 8.20"},
|
||||
|
||||
9: {en, 9.0, Symbol.Default(EUR), "€ 9.00"},
|
||||
10: {en, 10.123, Symbol.Default(KRW), "₩ 10"},
|
||||
11: {fr, 11.52, Symbol.Default(TWD), "TWD 11.52"},
|
||||
12: {en, 12.123, Symbol.Default(czk), "CZK 12.12"},
|
||||
13: {en, 13.123, Symbol.Default(czk).Kind(Cash), "CZK 13"},
|
||||
14: {en, 14.12345, ISO.Default(MustParseISO("CLF")), "CLF 14.1235"},
|
||||
15: {en, USD.Amount(15.00), ISO.Default(TWD), "USD 15.00"},
|
||||
16: {en, KRW.Amount(16.00), ISO.Kind(Cash), "KRW 16"},
|
||||
|
||||
// TODO: support integers as well.
|
||||
|
||||
17: {en, USD, nil, "USD"},
|
||||
18: {en, USD, ISO, "USD"},
|
||||
19: {en, USD, Symbol, "$"},
|
||||
20: {en_GB, USD, Symbol, "US$"},
|
||||
21: {en_AU, USD, NarrowSymbol, "$"},
|
||||
}
|
||||
for i, tc := range testCases {
|
||||
p := message.NewPrinter(tc.tag)
|
||||
v := tc.value
|
||||
if tc.format != nil {
|
||||
v = tc.format(v)
|
||||
}
|
||||
if got := p.Sprint(v); got != tc.want {
|
||||
t.Errorf("%d: got %q; want %q", i, got, tc.want)
|
||||
}
|
||||
}
|
||||
}
|
||||
400
vendor/golang.org/x/text/currency/gen.go
generated
vendored
Normal file
|
|
@ -0,0 +1,400 @@
|
|||
// Copyright 2015 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
// +build ignore
|
||||
|
||||
// Generator for currency-related data.
|
||||
|
||||
package main
|
||||
|
||||
import (
|
||||
"flag"
|
||||
"fmt"
|
||||
"log"
|
||||
"os"
|
||||
"sort"
|
||||
"strconv"
|
||||
"strings"
|
||||
"time"
|
||||
|
||||
"golang.org/x/text/internal"
|
||||
"golang.org/x/text/internal/gen"
|
||||
"golang.org/x/text/internal/tag"
|
||||
"golang.org/x/text/language"
|
||||
"golang.org/x/text/unicode/cldr"
|
||||
)
|
||||
|
||||
var (
|
||||
test = flag.Bool("test", false,
|
||||
"test existing tables; can be used to compare web data with package data.")
|
||||
outputFile = flag.String("output", "tables.go", "output file")
|
||||
|
||||
draft = flag.String("draft",
|
||||
"contributed",
|
||||
`Minimal draft requirements (approved, contributed, provisional, unconfirmed).`)
|
||||
)
|
||||
|
||||
func main() {
|
||||
gen.Init()
|
||||
|
||||
gen.Repackage("gen_common.go", "common.go", "currency")
|
||||
|
||||
// Read the CLDR zip file.
|
||||
r := gen.OpenCLDRCoreZip()
|
||||
defer r.Close()
|
||||
|
||||
d := &cldr.Decoder{}
|
||||
d.SetDirFilter("supplemental", "main")
|
||||
d.SetSectionFilter("numbers")
|
||||
data, err := d.DecodeZip(r)
|
||||
if err != nil {
|
||||
log.Fatalf("DecodeZip: %v", err)
|
||||
}
|
||||
|
||||
w := gen.NewCodeWriter()
|
||||
defer w.WriteGoFile(*outputFile, "currency")
|
||||
|
||||
fmt.Fprintln(w, `import "golang.org/x/text/internal/tag"`)
|
||||
|
||||
gen.WriteCLDRVersion(w)
|
||||
b := &builder{}
|
||||
b.genCurrencies(w, data.Supplemental())
|
||||
b.genSymbols(w, data)
|
||||
}
|
||||
|
||||
var constants = []string{
|
||||
// Undefined and testing.
|
||||
"XXX", "XTS",
|
||||
// G11 currencies https://en.wikipedia.org/wiki/G10_currencies.
|
||||
"USD", "EUR", "JPY", "GBP", "CHF", "AUD", "NZD", "CAD", "SEK", "NOK", "DKK",
|
||||
// Precious metals.
|
||||
"XAG", "XAU", "XPT", "XPD",
|
||||
|
||||
// Additional common currencies as defined by CLDR.
|
||||
"BRL", "CNY", "INR", "RUB", "HKD", "IDR", "KRW", "MXN", "PLN", "SAR",
|
||||
"THB", "TRY", "TWD", "ZAR",
|
||||
}
|
||||
|
||||
type builder struct {
|
||||
currencies tag.Index
|
||||
numCurrencies int
|
||||
}
|
||||
|
||||
func (b *builder) genCurrencies(w *gen.CodeWriter, data *cldr.SupplementalData) {
|
||||
// 3-letter ISO currency codes
|
||||
// Start with dummy to let index start at 1.
|
||||
currencies := []string{"\x00\x00\x00\x00"}
|
||||
|
||||
// currency codes
|
||||
for _, reg := range data.CurrencyData.Region {
|
||||
for _, cur := range reg.Currency {
|
||||
currencies = append(currencies, cur.Iso4217)
|
||||
}
|
||||
}
|
||||
// Not included in the list for some reason:
|
||||
currencies = append(currencies, "MVP")
|
||||
|
||||
sort.Strings(currencies)
|
||||
// Unique the elements.
|
||||
k := 0
|
||||
for i := 1; i < len(currencies); i++ {
|
||||
if currencies[k] != currencies[i] {
|
||||
currencies[k+1] = currencies[i]
|
||||
k++
|
||||
}
|
||||
}
|
||||
currencies = currencies[:k+1]
|
||||
|
||||
// Close with dummy for simpler and faster searching.
|
||||
currencies = append(currencies, "\xff\xff\xff\xff")
|
||||
|
||||
// Write currency values.
|
||||
fmt.Fprintln(w, "const (")
|
||||
for _, c := range constants {
|
||||
index := sort.SearchStrings(currencies, c)
|
||||
fmt.Fprintf(w, "\t%s = %d\n", strings.ToLower(c), index)
|
||||
}
|
||||
fmt.Fprint(w, ")")
|
||||
|
||||
// Compute currency-related data that we merge into the table.
|
||||
for _, info := range data.CurrencyData.Fractions[0].Info {
|
||||
if info.Iso4217 == "DEFAULT" {
|
||||
continue
|
||||
}
|
||||
standard := getRoundingIndex(info.Digits, info.Rounding, 0)
|
||||
cash := getRoundingIndex(info.CashDigits, info.CashRounding, standard)
|
||||
|
||||
index := sort.SearchStrings(currencies, info.Iso4217)
|
||||
currencies[index] += mkCurrencyInfo(standard, cash)
|
||||
}
|
||||
|
||||
// Set default values for entries that weren't touched.
|
||||
for i, c := range currencies {
|
||||
if len(c) == 3 {
|
||||
currencies[i] += mkCurrencyInfo(0, 0)
|
||||
}
|
||||
}
|
||||
|
||||
b.currencies = tag.Index(strings.Join(currencies, ""))
|
||||
w.WriteComment(`
|
||||
currency holds an alphabetically sorted list of canonical 3-letter currency
|
||||
identifiers. Each identifier is followed by a byte of type currencyInfo,
|
||||
defined in gen_common.go.`)
|
||||
w.WriteConst("currency", b.currencies)
|
||||
|
||||
// Hack alert: gofmt indents a trailing comment after an indented string.
|
||||
// Ensure that the next thing written is not a comment.
|
||||
b.numCurrencies = (len(b.currencies) / 4) - 2
|
||||
w.WriteConst("numCurrencies", b.numCurrencies)
|
||||
|
||||
// Create a table that maps regions to currencies.
|
||||
regionToCurrency := []toCurrency{}
|
||||
|
||||
for _, reg := range data.CurrencyData.Region {
|
||||
if len(reg.Iso3166) != 2 {
|
||||
log.Fatalf("Unexpected group %q in region data", reg.Iso3166)
|
||||
}
|
||||
if len(reg.Currency) == 0 {
|
||||
continue
|
||||
}
|
||||
cur := reg.Currency[0]
|
||||
if cur.To != "" || cur.Tender == "false" {
|
||||
continue
|
||||
}
|
||||
regionToCurrency = append(regionToCurrency, toCurrency{
|
||||
region: regionToCode(language.MustParseRegion(reg.Iso3166)),
|
||||
code: uint16(b.currencies.Index([]byte(cur.Iso4217))),
|
||||
})
|
||||
}
|
||||
sort.Sort(byRegion(regionToCurrency))
|
||||
|
||||
w.WriteType(toCurrency{})
|
||||
w.WriteVar("regionToCurrency", regionToCurrency)
|
||||
|
||||
// Create a table that maps regions to all of their currencies, including
// validity dates and legal-tender status.
|
||||
regionData := []regionInfo{}
|
||||
|
||||
for _, reg := range data.CurrencyData.Region {
|
||||
if len(reg.Iso3166) != 2 {
|
||||
log.Fatalf("Unexpected group %q in region data", reg.Iso3166)
|
||||
}
|
||||
for _, cur := range reg.Currency {
|
||||
from, _ := time.Parse("2006-01-02", cur.From)
|
||||
to, _ := time.Parse("2006-01-02", cur.To)
|
||||
code := uint16(b.currencies.Index([]byte(cur.Iso4217)))
|
||||
if cur.Tender == "false" {
|
||||
code |= nonTenderBit
|
||||
}
|
||||
regionData = append(regionData, regionInfo{
|
||||
region: regionToCode(language.MustParseRegion(reg.Iso3166)),
|
||||
code: code,
|
||||
from: toDate(from),
|
||||
to: toDate(to),
|
||||
})
|
||||
}
|
||||
}
|
||||
sort.Stable(byRegionCode(regionData))
|
||||
|
||||
w.WriteType(regionInfo{})
|
||||
w.WriteVar("regionData", regionData)
|
||||
}
|
||||
|
||||
type regionInfo struct {
|
||||
region uint16
|
||||
code uint16 // 0x8000 not legal tender
|
||||
from uint32
|
||||
to uint32
|
||||
}
|
||||
|
||||
type byRegionCode []regionInfo
|
||||
|
||||
func (a byRegionCode) Len() int { return len(a) }
|
||||
func (a byRegionCode) Swap(i, j int) { a[i], a[j] = a[j], a[i] }
|
||||
func (a byRegionCode) Less(i, j int) bool { return a[i].region < a[j].region }
|
||||
|
||||
type toCurrency struct {
|
||||
region uint16
|
||||
code uint16
|
||||
}
|
||||
|
||||
type byRegion []toCurrency
|
||||
|
||||
func (a byRegion) Len() int { return len(a) }
|
||||
func (a byRegion) Swap(i, j int) { a[i], a[j] = a[j], a[i] }
|
||||
func (a byRegion) Less(i, j int) bool { return a[i].region < a[j].region }
|
||||
|
||||
func mkCurrencyInfo(standard, cash int) string {
|
||||
return string([]byte{byte(cash<<cashShift | standard)})
|
||||
}
|
||||
|
||||
func getRoundingIndex(digits, rounding string, defIndex int) int {
|
||||
round := roundings[defIndex] // default
|
||||
|
||||
if digits != "" {
|
||||
round.scale = parseUint8(digits)
|
||||
}
|
||||
if rounding != "" && rounding != "0" { // 0 means 1 here in CLDR
|
||||
round.increment = parseUint8(rounding)
|
||||
}
|
||||
|
||||
// Will panic if the entry doesn't exist:
|
||||
for i, r := range roundings {
|
||||
if r == round {
|
||||
return i
|
||||
}
|
||||
}
|
||||
log.Fatalf("Rounding entry %#v does not exist.", round)
|
||||
panic("unreachable")
|
||||
}
|
||||
|
||||
// genSymbols generates the symbols used for currencies. Most symbols are
|
||||
// defined in root and there is only very small variation per language.
|
||||
// The following rules apply:
|
||||
// - A symbol can be requested as normal or narrow.
|
||||
// - If a symbol is not defined for a currency, it defaults to its ISO code.
|
||||
func (b *builder) genSymbols(w *gen.CodeWriter, data *cldr.CLDR) {
|
||||
d, err := cldr.ParseDraft(*draft)
|
||||
if err != nil {
|
||||
log.Fatalf("filter: %v", err)
|
||||
}
|
||||
|
||||
const (
|
||||
normal = iota
|
||||
narrow
|
||||
numTypes
|
||||
)
|
||||
// language -> currency -> type -> symbol
|
||||
var symbols [language.NumCompactTags][][numTypes]*string
|
||||
|
||||
// Collect symbol information per language.
|
||||
for _, lang := range data.Locales() {
|
||||
ldml := data.RawLDML(lang)
|
||||
if ldml.Numbers == nil || ldml.Numbers.Currencies == nil {
|
||||
continue
|
||||
}
|
||||
|
||||
langIndex, ok := language.CompactIndex(language.MustParse(lang))
|
||||
if !ok {
|
||||
log.Fatalf("No compact index for language %s", lang)
|
||||
}
|
||||
|
||||
symbols[langIndex] = make([][numTypes]*string, b.numCurrencies+1)
|
||||
|
||||
for _, c := range ldml.Numbers.Currencies.Currency {
|
||||
syms := cldr.MakeSlice(&c.Symbol)
|
||||
syms.SelectDraft(d)
|
||||
|
||||
for _, sym := range c.Symbol {
|
||||
v := sym.Data()
|
||||
if v == c.Type {
|
||||
// We define "" to mean the ISO symbol.
|
||||
v = ""
|
||||
}
|
||||
cur := b.currencies.Index([]byte(c.Type))
|
||||
// XXX gets reassigned to 0 in the package's code.
|
||||
if c.Type == "XXX" {
|
||||
cur = 0
|
||||
}
|
||||
if cur == -1 {
|
||||
fmt.Println("Unsupported:", c.Type)
|
||||
continue
|
||||
}
|
||||
|
||||
switch sym.Alt {
|
||||
case "":
|
||||
symbols[langIndex][cur][normal] = &v
|
||||
case "narrow":
|
||||
symbols[langIndex][cur][narrow] = &v
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Remove values identical to the parent.
|
||||
for langIndex, data := range symbols {
|
||||
for curIndex, curs := range data {
|
||||
for typ, sym := range curs {
|
||||
if sym == nil {
|
||||
continue
|
||||
}
|
||||
for p := uint16(langIndex); p != 0; {
|
||||
p = internal.Parent[p]
|
||||
x := symbols[p]
|
||||
if x == nil {
|
||||
continue
|
||||
}
|
||||
if v := x[curIndex][typ]; v != nil || p == 0 {
|
||||
// Compare against the nearest parent that defines a value; at root, an
// undefined value means the default (the ISO code).
|
||||
parentSym := ""
|
||||
if v != nil {
|
||||
parentSym = *v
|
||||
}
|
||||
if parentSym == *sym {
|
||||
// Value is the same as parent.
|
||||
data[curIndex][typ] = nil
|
||||
}
|
||||
break
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Create symbol index.
|
||||
symbolData := []byte{0}
|
||||
symbolLookup := map[string]uint16{"": 0} // 0 means default, so block that value.
|
||||
for _, data := range symbols {
|
||||
for _, curs := range data {
|
||||
for _, sym := range curs {
|
||||
if sym == nil {
|
||||
continue
|
||||
}
|
||||
if _, ok := symbolLookup[*sym]; !ok {
|
||||
symbolLookup[*sym] = uint16(len(symbolData))
|
||||
symbolData = append(symbolData, byte(len(*sym)))
|
||||
symbolData = append(symbolData, *sym...)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
w.WriteComment(`
|
||||
symbols holds symbol data of the form <n> <str>, where n is the length of
|
||||
the symbol string str.`)
|
||||
w.WriteConst("symbols", string(symbolData))
|
||||
|
||||
// Create index from language to currency lookup to symbol.
|
||||
type curToIndex struct{ cur, idx uint16 }
|
||||
w.WriteType(curToIndex{})
|
||||
|
||||
prefix := []string{"normal", "narrow"}
|
||||
// Create data for regular and narrow symbol data.
|
||||
for typ := normal; typ <= narrow; typ++ {
|
||||
|
||||
indexes := []curToIndex{} // maps currency to symbol index
|
||||
languages := []uint16{}
|
||||
|
||||
for _, data := range symbols {
|
||||
languages = append(languages, uint16(len(indexes)))
|
||||
for curIndex, curs := range data {
|
||||
|
||||
if sym := curs[typ]; sym != nil {
|
||||
indexes = append(indexes, curToIndex{uint16(curIndex), symbolLookup[*sym]})
|
||||
}
|
||||
}
|
||||
}
|
||||
languages = append(languages, uint16(len(indexes)))
|
||||
|
||||
w.WriteVar(prefix[typ]+"LangIndex", languages)
|
||||
w.WriteVar(prefix[typ]+"SymIndex", indexes)
|
||||
}
|
||||
}
|
||||
func parseUint8(str string) uint8 {
|
||||
x, err := strconv.ParseUint(str, 10, 8)
|
||||
if err != nil {
|
||||
// Show line number of where this function was called.
|
||||
log.New(os.Stderr, "", log.Lshortfile).Output(2, err.Error())
|
||||
os.Exit(1)
|
||||
}
|
||||
return uint8(x)
|
||||
}
|
||||
70
vendor/golang.org/x/text/currency/gen_common.go
generated
vendored
Normal file
|
|
@ -0,0 +1,70 @@
|
|||
// Copyright 2015 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
// +build ignore
|
||||
|
||||
package main
|
||||
|
||||
import (
|
||||
"time"
|
||||
|
||||
"golang.org/x/text/language"
|
||||
)
|
||||
|
||||
// This file contains code common to gen.go and the package code.
|
||||
|
||||
const (
|
||||
cashShift = 3
|
||||
roundMask = 0x7
|
||||
|
||||
nonTenderBit = 0x8000
|
||||
)
|
||||
|
||||
// currencyInfo contains information about a currency.
|
||||
// bits 0..2: index into roundings for standard rounding
|
||||
// bits 3..5: index into roundings for cash rounding
|
||||
type currencyInfo byte
|
||||
|
||||
// roundingType defines the scale (number of fractional decimals) and increments
|
||||
// in terms of units of size 10^-scale. For example, for scale == 2 and
|
||||
// increment == 1, the currency is rounded to units of 0.01.
|
||||
type roundingType struct {
|
||||
scale, increment uint8
|
||||
}
|
||||
|
||||
// roundings contains rounding data for currencies. This struct is
|
||||
// created by hand as it is very unlikely to change much.
|
||||
var roundings = [...]roundingType{
|
||||
{2, 1}, // default
|
||||
{0, 1},
|
||||
{1, 1},
|
||||
{3, 1},
|
||||
{4, 1},
|
||||
{2, 5}, // cash rounding alternative
|
||||
}
|
||||
|
||||
// regionToCode returns a 16-bit region code. Only two-letter codes are
|
||||
// supported. (Three-letter codes are not needed.)
|
||||
func regionToCode(r language.Region) uint16 {
|
||||
if s := r.String(); len(s) == 2 {
|
||||
return uint16(s[0])<<8 | uint16(s[1])
|
||||
}
|
||||
return 0
|
||||
}
|
||||
|
||||
func toDate(t time.Time) uint32 {
|
||||
y := t.Year()
|
||||
if y == 1 {
|
||||
return 0
|
||||
}
|
||||
date := uint32(y) << 4
|
||||
date |= uint32(t.Month())
|
||||
date <<= 5
|
||||
date |= uint32(t.Day())
|
||||
return date
|
||||
}
|
||||
|
||||
func fromDate(date uint32) time.Time {
|
||||
return time.Date(int(date>>9), time.Month((date>>5)&0xf), int(date&0x1f), 0, 0, 0, 0, time.UTC)
|
||||
}
|
||||
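The currencyInfo comment above describes one byte holding two 3-bit indexes into roundings: bits 0..2 for standard rounding and bits 3..5 for cash rounding. The sketch below packs such a byte the way mkCurrencyInfo in gen.go does and decodes it again. The constants and table are copied from this file; pack and unpack are hypothetical names, and the decoding is an assumed inverse of the packing, not necessarily how the package itself reads the byte.

```go
package main

import "fmt"

const (
	cashShift = 3
	roundMask = 0x7
)

type roundingType struct {
	scale, increment uint8
}

var roundings = [...]roundingType{
	{2, 1}, // default
	{0, 1},
	{1, 1},
	{3, 1},
	{4, 1},
	{2, 5}, // cash rounding alternative
}

// pack mirrors mkCurrencyInfo: the cash index goes in bits 3..5 and the
// standard index in bits 0..2.
func pack(standard, cash int) byte {
	return byte(cash<<cashShift | standard)
}

// unpack (hypothetical) reverses pack and resolves both indexes.
func unpack(info byte) (standard, cash roundingType) {
	return roundings[info&roundMask], roundings[(info>>cashShift)&roundMask]
}

func main() {
	// A currency with standard rounding {2, 1} and cash rounding {2, 5},
	// matching the CHF rows of TestKindRounding.
	info := pack(0, 5)
	std, cash := unpack(info)
	fmt.Printf("info=%#x standard=%+v cash=%+v\n", info, std, cash)
}
```

Three bits per index leave room for eight rounding entries, of which the hand-written table currently uses six.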
152
vendor/golang.org/x/text/currency/query.go
generated
vendored
Normal file
|
|
@ -0,0 +1,152 @@
|
|||
// Copyright 2016 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package currency
|
||||
|
||||
import (
|
||||
"sort"
|
||||
"time"
|
||||
|
||||
"golang.org/x/text/language"
|
||||
)
|
||||
|
||||
// QueryIter represents a set of Units. The default set includes all Units that
|
||||
// are currently in use as legal tender in any Region.
|
||||
type QueryIter interface {
|
||||
// Next returns true if there is a next element available.
|
||||
// It must be called before any of the other methods are called.
|
||||
Next() bool
|
||||
|
||||
// Unit returns the unit of the current iteration.
|
||||
Unit() Unit
|
||||
|
||||
// Region returns the Region for the current iteration.
|
||||
Region() language.Region
|
||||
|
||||
// From returns the date from which the unit was used in the region.
|
||||
// It returns false if this date is unknown.
|
||||
From() (time.Time, bool)
|
||||
|
||||
// To returns the date up till which the unit was used in the region.
|
||||
// It returns false if this date is unknown or if the unit is still in use.
|
||||
To() (time.Time, bool)
|
||||
|
||||
// IsTender reports whether the unit is a legal tender in the region during
|
||||
// the specified date range.
|
||||
IsTender() bool
|
||||
}
|
||||
|
||||
// Query returns an iterator over a set of Units. The default set includes all Units that are
|
||||
// currently in use as legal tender in any Region.
|
||||
func Query(options ...QueryOption) QueryIter {
|
||||
it := &iter{
|
||||
end: len(regionData),
|
||||
date: 0xFFFFFFFF,
|
||||
}
|
||||
for _, fn := range options {
|
||||
fn(it)
|
||||
}
|
||||
return it
|
||||
}
|
||||
|
||||
// NonTender is a QueryOption that makes a query also include matching Units that are not
|
||||
// legal tender.
|
||||
var NonTender QueryOption = nonTender
|
||||
|
||||
func nonTender(i *iter) {
|
||||
i.nonTender = true
|
||||
}
|
||||
|
||||
// Historical selects the units for all dates.
|
||||
var Historical QueryOption = historical
|
||||
|
||||
func historical(i *iter) {
|
||||
i.date = hist
|
||||
}
|
||||
|
||||
// A QueryOption can be used to change the set of unit information returned by
|
||||
// a query.
|
||||
type QueryOption func(*iter)
|
||||
|
||||
// Date queries the units that were in use at the given point in history.
|
||||
func Date(t time.Time) QueryOption {
|
||||
d := toDate(t)
|
||||
return func(i *iter) {
|
||||
i.date = d
|
||||
}
|
||||
}
|
||||
|
||||
// Region limits the query to only return entries for the given region.
|
||||
func Region(r language.Region) QueryOption {
|
||||
p, end := len(regionData), len(regionData)
|
||||
x := regionToCode(r)
|
||||
i := sort.Search(len(regionData), func(i int) bool {
|
||||
return regionData[i].region >= x
|
||||
})
|
||||
if i < len(regionData) && regionData[i].region == x {
|
||||
p = i
|
||||
for i++; i < len(regionData) && regionData[i].region == x; i++ {
|
||||
}
|
||||
end = i
|
||||
}
|
||||
return func(i *iter) {
|
||||
i.p, i.end = p, end
|
||||
}
|
||||
}
|
||||
|
||||
const (
|
||||
hist = 0x00
|
||||
now = 0xFFFFFFFF
|
||||
)
|
||||
|
||||
type iter struct {
|
||||
*regionInfo
|
||||
p, end int
|
||||
date uint32
|
||||
nonTender bool
|
||||
}
|
||||
|
||||
func (i *iter) Next() bool {
|
||||
for ; i.p < i.end; i.p++ {
|
||||
i.regionInfo = &regionData[i.p]
|
||||
if !i.nonTender && !i.IsTender() {
|
||||
continue
|
||||
}
|
||||
if i.date == hist || (i.from <= i.date && (i.to == 0 || i.date <= i.to)) {
|
||||
i.p++
|
||||
return true
|
||||
}
|
||||
}
|
||||
return false
|
||||
}
|
||||
|
||||
func (r *regionInfo) Region() language.Region {
|
||||
// TODO: this could be much faster.
|
||||
var buf [2]byte
|
||||
buf[0] = uint8(r.region >> 8)
|
||||
buf[1] = uint8(r.region)
|
||||
return language.MustParseRegion(string(buf[:]))
|
||||
}
|
||||
|
||||
func (r *regionInfo) Unit() Unit {
|
||||
return Unit{r.code &^ nonTenderBit}
|
||||
}
|
||||
|
||||
func (r *regionInfo) IsTender() bool {
|
||||
return r.code&nonTenderBit == 0
|
||||
}
|
||||
|
||||
func (r *regionInfo) From() (time.Time, bool) {
|
||||
if r.from == 0 {
|
||||
return time.Time{}, false
|
||||
}
|
||||
return fromDate(r.from), true
|
||||
}
|
||||
|
||||
func (r *regionInfo) To() (time.Time, bool) {
|
||||
if r.to == 0 {
|
||||
return time.Time{}, false
|
||||
}
|
||||
return fromDate(r.to), true
|
||||
}
|
||||
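The Region option in query.go narrows the iterator to the contiguous run of table entries that share one region code: a binary search finds the first match, then a linear scan finds the end of the run. The following is a minimal, self-contained sketch of that technique. The `entry` type, the `equalRange` helper, and the sample data are hypothetical stand-ins and are not part of the currency package.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical stand-in for a table row keyed by a region code.
type entry struct {
	region uint16
	unit   string
}

// equalRange returns the half-open index range [p, end) of the entries whose
// region equals x. The table must be sorted by region. If x is absent, both
// indices equal len(table), so iterating over the range visits nothing.
func equalRange(table []entry, x uint16) (p, end int) {
	p, end = len(table), len(table)
	// Binary search for the first entry whose region is not less than x.
	i := sort.Search(len(table), func(i int) bool {
		return table[i].region >= x
	})
	if i < len(table) && table[i].region == x {
		p = i
		// Scan forward to the first entry past the run of equal regions.
		for i++; i < len(table) && table[i].region == x; i++ {
		}
		end = i
	}
	return p, end
}

func main() {
	table := []entry{
		{1, "AAA"}, {2, "BBB"}, {2, "CCC"}, {2, "DDD"}, {5, "EEE"},
	}
	p, end := equalRange(table, 2)
	fmt.Println(p, end) // 1 4
	p, end = equalRange(table, 3)
	fmt.Println(p, end) // 5 5
}
```

Returning an empty range at the end of the table for a missing key means the caller needs no special case: a loop over `[p, end)` simply runs zero times.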
107
vendor/golang.org/x/text/currency/query_test.go
generated
vendored
Normal file
|
|
@ -0,0 +1,107 @@
|
|||
// Copyright 2016 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package currency
|
||||
|
||||
import (
|
||||
"testing"
|
||||
"time"
|
||||
|
||||
"golang.org/x/text/language"
|
||||
)
|
||||
|
||||
func TestQuery(t *testing.T) {
|
||||
r := func(region string) language.Region {
|
||||
return language.MustParseRegion(region)
|
||||
}
|
||||
t1800, _ := time.Parse("2006-01-02", "1800-01-01")
|
||||
type result struct {
|
||||
region language.Region
|
||||
unit Unit
|
||||
isTender bool
|
||||
from, to string
|
||||
}
|
||||
testCases := []struct {
|
||||
name string
|
||||
opts []QueryOption
|
||||
results []result
|
||||
}{{
|
||||
name: "XA",
|
||||
opts: []QueryOption{Region(r("XA"))},
|
||||
results: []result{},
|
||||
}, {
|
||||
name: "AC",
|
||||
opts: []QueryOption{Region(r("AC"))},
|
||||
results: []result{
|
||||
{r("AC"), MustParseISO("SHP"), true, "1976-01-01", ""},
|
||||
},
|
||||
}, {
|
||||
name: "US",
|
||||
opts: []QueryOption{Region(r("US"))},
|
||||
results: []result{
|
||||
{r("US"), MustParseISO("USD"), true, "1792-01-01", ""},
|
||||
},
|
||||
}, {
|
||||
name: "US-hist",
|
||||
opts: []QueryOption{Region(r("US")), Historical},
|
||||
results: []result{
|
||||
{r("US"), MustParseISO("USD"), true, "1792-01-01", ""},
|
||||
},
|
||||
}, {
|
||||
name: "US-non-tender",
|
||||
opts: []QueryOption{Region(r("US")), NonTender},
|
||||
results: []result{
|
||||
{r("US"), MustParseISO("USD"), true, "1792-01-01", ""},
|
||||
{r("US"), MustParseISO("USN"), false, "", ""},
|
||||
},
|
||||
}, {
|
||||
name: "US-historical+non-tender",
|
||||
opts: []QueryOption{Region(r("US")), Historical, NonTender},
|
||||
results: []result{
|
||||
{r("US"), MustParseISO("USD"), true, "1792-01-01", ""},
|
||||
{r("US"), MustParseISO("USN"), false, "", ""},
|
||||
{r("US"), MustParseISO("USS"), false, "", "2014-03-01"},
|
||||
},
|
||||
}, {
|
||||
name: "1800",
|
||||
opts: []QueryOption{Date(t1800)},
|
||||
results: []result{
|
||||
{r("CH"), MustParseISO("CHF"), true, "1799-03-17", ""},
|
||||
{r("GB"), MustParseISO("GBP"), true, "1694-07-27", ""},
|
||||
{r("GI"), MustParseISO("GIP"), true, "1713-01-01", ""},
|
||||
// The dates for IE and PR seem wrong, so these may be updated at
|
||||
// some point causing the tests to fail.
|
||||
{r("IE"), MustParseISO("GBP"), true, "1800-01-01", "1922-01-01"},
|
||||
{r("PR"), MustParseISO("ESP"), true, "1800-01-01", "1898-12-10"},
|
||||
{r("US"), MustParseISO("USD"), true, "1792-01-01", ""},
|
||||
},
|
||||
}}
|
||||
for _, tc := range testCases {
|
||||
n := 0
|
||||
for it := Query(tc.opts...); it.Next(); n++ {
|
||||
if n < len(tc.results) {
|
||||
got := result{
|
||||
it.Region(),
|
||||
it.Unit(),
|
||||
it.IsTender(),
|
||||
getTime(it.From()),
|
||||
getTime(it.To()),
|
||||
}
|
||||
if got != tc.results[n] {
|
||||
t.Errorf("%s:%d: got %v; want %v", tc.name, n, got, tc.results[n])
|
||||
}
|
||||
}
|
||||
}
|
||||
if n != len(tc.results) {
|
||||
t.Errorf("%s: unexpected number of results: got %d; want %d", tc.name, n, len(tc.results))
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func getTime(t time.Time, ok bool) string {
|
||||
if !ok {
|
||||
return ""
|
||||
}
|
||||
return t.Format("2006-01-02")
|
||||
}
|
||||
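TestQuery drives Query with different combinations of QueryOption values. Each option is a function that mutates the iterator before iteration starts, so options compose by being applied in order. The sketch below illustrates that pattern together with the date filter used by Next. It assumes a hypothetical `record` table, hypothetical options `withAll` and `onDate`, and a default date of `latest`; none of these names are the package's API.

```go
package main

import "fmt"

// record is a hypothetical table row: a name plus a validity interval.
// A zero `to` means the record has no end date.
type record struct {
	name     string
	from, to uint32
}

var records = []record{
	{"old", 1800, 1900},
	{"mid", 1850, 1950},
	{"cur", 1900, 0},
}

const (
	allDates = 0          // sentinel: match records from every period
	latest   = 0xFFFFFFFF // not earlier than any stored date
)

// cursor walks records and yields those matching the configured date.
type cursor struct {
	p    int
	date uint32
	cur  *record
}

// option configures a cursor before iteration starts.
type option func(*cursor)

// withAll is a hypothetical option that disables date filtering.
func withAll(c *cursor) { c.date = allDates }

// onDate is a hypothetical option that selects records valid on date d.
func onDate(d uint32) option {
	return func(c *cursor) { c.date = d }
}

// query builds a cursor with defaults and applies the options in order.
func query(opts ...option) *cursor {
	c := &cursor{date: latest}
	for _, o := range opts {
		o(c)
	}
	return c
}

// next advances to the next matching record and reports whether one exists.
func (c *cursor) next() bool {
	for ; c.p < len(records); c.p++ {
		r := &records[c.p]
		if c.date == allDates || (r.from <= c.date && (r.to == 0 || c.date <= r.to)) {
			c.cur = r
			c.p++ // step past the match so the next call resumes after it
			return true
		}
	}
	return false
}

// names collects the names of all records matched by the options.
func names(opts ...option) []string {
	var out []string
	for c := query(opts...); c.next(); {
		out = append(out, c.cur.name)
	}
	return out
}

func main() {
	fmt.Println(names())             // [cur]
	fmt.Println(names(onDate(1875))) // [old mid]
	fmt.Println(names(withAll))      // [old mid cur]
}
```

Because later options overwrite earlier ones, passing `withAll` followed by `onDate(1800)` in this sketch filters by date again.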
2574
vendor/golang.org/x/text/currency/tables.go
generated
vendored
Normal file
File diff suppressed because it is too large
93
vendor/golang.org/x/text/currency/tables_test.go
generated
vendored
Normal file
|
|
@ -0,0 +1,93 @@
|
|||
package currency
|
||||
|
||||
import (
|
||||
"flag"
|
||||
"strings"
|
||||
"testing"
|
||||
"time"
|
||||
|
||||
"golang.org/x/text/internal/gen"
|
||||
"golang.org/x/text/internal/testtext"
|
||||
"golang.org/x/text/language"
|
||||
"golang.org/x/text/message"
|
||||
"golang.org/x/text/unicode/cldr"
|
||||
)
|
||||
|
||||
var draft = flag.String("draft",
|
||||
"contributed",
|
||||
`Minimal draft requirements (approved, contributed, provisional, unconfirmed).`)
|
||||
|
||||
func TestTables(t *testing.T) {
|
||||
testtext.SkipIfNotLong(t)
|
||||
|
||||
// Read the CLDR zip file.
|
||||
r := gen.OpenCLDRCoreZip()
|
||||
defer r.Close()
|
||||
|
||||
d := &cldr.Decoder{}
|
||||
d.SetDirFilter("supplemental", "main")
|
||||
d.SetSectionFilter("numbers")
|
||||
data, err := d.DecodeZip(r)
|
||||
if err != nil {
|
||||
t.Fatalf("DecodeZip: %v", err)
|
||||
}
|
||||
|
||||
dr, err := cldr.ParseDraft(*draft)
|
||||
if err != nil {
|
||||
t.Fatalf("filter: %v", err)
|
||||
}
|
||||
|
||||
for _, lang := range data.Locales() {
|
||||
p := message.NewPrinter(language.MustParse(lang))
|
||||
|
||||
ldml := data.RawLDML(lang)
|
||||
if ldml.Numbers == nil || ldml.Numbers.Currencies == nil {
|
||||
continue
|
||||
}
|
||||
for _, c := range ldml.Numbers.Currencies.Currency {
|
||||
syms := cldr.MakeSlice(&c.Symbol)
|
||||
syms.SelectDraft(dr)
|
||||
|
||||
for _, sym := range c.Symbol {
|
||||
cur, err := ParseISO(c.Type)
|
||||
if err != nil {
|
||||
continue
|
||||
}
|
||||
formatter := Symbol
|
||||
switch sym.Alt {
|
||||
case "":
|
||||
case "narrow":
|
||||
formatter = NarrowSymbol
|
||||
default:
|
||||
continue
|
||||
}
|
||||
want := sym.Data()
|
||||
if got := p.Sprint(formatter(cur)); got != want {
|
||||
t.Errorf("%s:%sSymbol(%s) = %s; want %s", lang, strings.Title(sym.Alt), c.Type, got, want)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
for _, reg := range data.Supplemental().CurrencyData.Region {
|
||||
i := 0
|
||||
for ; regionData[i].Region().String() != reg.Iso3166; i++ {
|
||||
}
|
||||
it := Query(Historical, NonTender, Region(language.MustParseRegion(reg.Iso3166)))
|
||||
for _, cur := range reg.Currency {
|
||||
from, _ := time.Parse("2006-01-02", cur.From)
|
||||
to, _ := time.Parse("2006-01-02", cur.To)
|
||||
|
||||
it.Next()
|
||||
for j, r := range []QueryIter{&iter{regionInfo: &regionData[i]}, it} {
|
||||
if got, _ := r.From(); from != got {
|
||||
t.Errorf("%d:%s:%s:from: got %v; want %v", j, reg.Iso3166, cur.Iso4217, got, from)
|
||||
}
|
||||
if got, _ := r.To(); to != got {
|
||||
t.Errorf("%d:%s:%s:to: got %v; want %v", j, reg.Iso3166, cur.Iso4217, got, to)
|
||||
}
|
||||
}
|
||||
i++
|
||||
}
|
||||
}
|
||||
}
|
||||
13
vendor/golang.org/x/text/doc.go
generated
vendored
Normal file
|
|
@ -0,0 +1,13 @@
|
|||
// Copyright 2015 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
//go:generate go run gen.go
|
||||
|
||||
// text is a repository of text-related packages for internationalization
|
||||
// (i18n) and localization (l10n), such as character encodings, text
|
||||
// transformations, and locale-specific text handling.
|
||||
package text
|
||||
|
||||
// TODO: more documentation on general concepts, such as Transformers, use
|
||||
// of normalization, etc.
|
||||
249
vendor/golang.org/x/text/encoding/charmap/charmap.go
generated
vendored
Normal file
|
|
@ -0,0 +1,249 @@
|
|||
// Copyright 2013 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
//go:generate go run maketables.go
|
||||
|
||||
// Package charmap provides simple character encodings such as IBM Code Page 437
|
||||
// and Windows 1252.
|
||||
package charmap // import "golang.org/x/text/encoding/charmap"
|
||||
|
||||
import (
|
||||
"unicode/utf8"
|
||||
|
||||
"golang.org/x/text/encoding"
|
||||
"golang.org/x/text/encoding/internal"
|
||||
"golang.org/x/text/encoding/internal/identifier"
|
||||
"golang.org/x/text/transform"
|
||||
)
|
||||
|
||||
// These encodings vary only in the way clients should interpret them. Their
|
||||
// coded character set is identical and a single implementation can be shared.
|
||||
var (
|
||||
// ISO8859_6E is the ISO 8859-6E encoding.
|
||||
ISO8859_6E encoding.Encoding = &iso8859_6E
|
||||
|
||||
// ISO8859_6I is the ISO 8859-6I encoding.
|
||||
ISO8859_6I encoding.Encoding = &iso8859_6I
|
||||
|
||||
// ISO8859_8E is the ISO 8859-8E encoding.
|
||||
ISO8859_8E encoding.Encoding = &iso8859_8E
|
||||
|
||||
// ISO8859_8I is the ISO 8859-8I encoding.
|
||||
ISO8859_8I encoding.Encoding = &iso8859_8I
|
||||
|
||||
iso8859_6E = internal.Encoding{
|
||||
Encoding: ISO8859_6,
|
||||
Name: "ISO-8859-6E",
|
||||
MIB: identifier.ISO88596E,
|
||||
}
|
||||
|
||||
iso8859_6I = internal.Encoding{
|
||||
Encoding: ISO8859_6,
|
||||
Name: "ISO-8859-6I",
|
||||
MIB: identifier.ISO88596I,
|
||||
}
|
||||
|
||||
iso8859_8E = internal.Encoding{
|
||||
Encoding: ISO8859_8,
|
||||
Name: "ISO-8859-8E",
|
||||
MIB: identifier.ISO88598E,
|
||||
}
|
||||
|
||||
iso8859_8I = internal.Encoding{
|
||||
Encoding: ISO8859_8,
|
||||
Name: "ISO-8859-8I",
|
||||
MIB: identifier.ISO88598I,
|
||||
}
|
||||
)
|
||||
|
||||
// All is a list of all defined encodings in this package.
|
||||
var All []encoding.Encoding = listAll
|
||||
|
||||
// TODO: implement these encodings, in order of importance.
|
||||
// ASCII, ISO8859_1: Rather common. Close to Windows 1252.
|
||||
// ISO8859_9: Close to Windows 1254.
|
||||
|
||||
// utf8Enc holds a rune's UTF-8 encoding in data[:len].
|
||||
type utf8Enc struct {
|
||||
len uint8
|
||||
data [3]byte
|
||||
}
|
||||
|
||||
// Charmap is an 8-bit character set encoding.
|
||||
type Charmap struct {
|
||||
// name is the encoding's name.
|
||||
name string
|
||||
// mib is the encoding type of this encoder.
|
||||
mib identifier.MIB
|
||||
// asciiSuperset states whether the encoding is a superset of ASCII.
|
||||
asciiSuperset bool
|
||||
// low is the lower bound of the encoded byte for a non-ASCII rune. If
|
||||
// Charmap.asciiSuperset is true then this will be 0x80, otherwise 0x00.
|
||||
low uint8
|
||||
// replacement is the encoded replacement character.
|
||||
replacement byte
|
||||
// decode is the map from encoded byte to UTF-8.
|
||||
decode [256]utf8Enc
|
||||
// encode is the map from runes to encoded bytes. Each entry is a
|
||||
// uint32: the high 8 bits are the encoded byte and the low 24 bits are
|
||||
// the rune. The table entries are sorted by ascending rune.
|
||||
encode [256]uint32
|
||||
}
|
||||
|
||||
// NewDecoder implements the encoding.Encoding interface.
|
||||
func (m *Charmap) NewDecoder() *encoding.Decoder {
|
||||
return &encoding.Decoder{Transformer: charmapDecoder{charmap: m}}
|
||||
}
|
||||
|
||||
// NewEncoder implements the encoding.Encoding interface.
|
||||
func (m *Charmap) NewEncoder() *encoding.Encoder {
|
||||
return &encoding.Encoder{Transformer: charmapEncoder{charmap: m}}
|
||||
}
|
||||
|
||||
// String returns the Charmap's name.
|
||||
func (m *Charmap) String() string {
|
||||
return m.name
|
||||
}
|
||||
|
||||
// ID implements an internal interface.
|
||||
func (m *Charmap) ID() (mib identifier.MIB, other string) {
|
||||
return m.mib, ""
|
||||
}
|
||||
|
||||
// charmapDecoder implements transform.Transformer by decoding to UTF-8.
|
||||
type charmapDecoder struct {
|
||||
transform.NopResetter
|
||||
charmap *Charmap
|
||||
}
|
||||
|
||||
func (m charmapDecoder) Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error) {
|
||||
for i, c := range src {
|
||||
if m.charmap.asciiSuperset && c < utf8.RuneSelf {
|
||||
if nDst >= len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
break
|
||||
}
|
||||
dst[nDst] = c
|
||||
nDst++
|
||||
nSrc = i + 1
|
||||
continue
|
||||
}
|
||||
|
||||
decode := &m.charmap.decode[c]
|
||||
n := int(decode.len)
|
||||
if nDst+n > len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
break
|
||||
}
|
||||
// It's 15% faster to avoid calling copy for these tiny slices.
|
||||
for j := 0; j < n; j++ {
|
||||
dst[nDst] = decode.data[j]
|
||||
nDst++
|
||||
}
|
||||
nSrc = i + 1
|
||||
}
|
||||
return nDst, nSrc, err
|
||||
}
|
||||
|
||||
// DecodeByte returns the Charmap's rune decoding of the byte b.
|
||||
func (m *Charmap) DecodeByte(b byte) rune {
|
||||
switch x := &m.decode[b]; x.len {
|
||||
case 1:
|
||||
return rune(x.data[0])
|
||||
case 2:
|
||||
return rune(x.data[0]&0x1f)<<6 | rune(x.data[1]&0x3f)
|
||||
default:
|
||||
return rune(x.data[0]&0x0f)<<12 | rune(x.data[1]&0x3f)<<6 | rune(x.data[2]&0x3f)
|
||||
}
|
||||
}
|
||||
|
||||
// charmapEncoder implements transform.Transformer by encoding from UTF-8.
|
||||
type charmapEncoder struct {
|
||||
transform.NopResetter
|
||||
charmap *Charmap
|
||||
}
|
||||
|
||||
func (m charmapEncoder) Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error) {
|
||||
r, size := rune(0), 0
|
||||
loop:
|
||||
for nSrc < len(src) {
|
||||
if nDst >= len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
break
|
||||
}
|
||||
r = rune(src[nSrc])
|
||||
|
||||
// Decode a 1-byte rune.
|
||||
if r < utf8.RuneSelf {
|
||||
if m.charmap.asciiSuperset {
|
||||
nSrc++
|
||||
dst[nDst] = uint8(r)
|
||||
nDst++
|
||||
continue
|
||||
}
|
||||
size = 1
|
||||
|
||||
} else {
|
||||
// Decode a multi-byte rune.
|
||||
r, size = utf8.DecodeRune(src[nSrc:])
|
||||
if size == 1 {
|
||||
// All valid runes of size 1 (those below utf8.RuneSelf) were
|
||||
// handled above. We have invalid UTF-8 or we haven't seen the
|
||||
// full character yet.
|
||||
if !atEOF && !utf8.FullRune(src[nSrc:]) {
|
||||
err = transform.ErrShortSrc
|
||||
} else {
|
||||
err = internal.RepertoireError(m.charmap.replacement)
|
||||
}
|
||||
break
|
||||
}
|
||||
}
|
||||
|
||||
// Binary search in [low, high) for that rune in the m.charmap.encode table.
|
||||
for low, high := int(m.charmap.low), 0x100; ; {
|
||||
if low >= high {
|
||||
err = internal.RepertoireError(m.charmap.replacement)
|
||||
break loop
|
||||
}
|
||||
mid := (low + high) / 2
|
||||
got := m.charmap.encode[mid]
|
||||
gotRune := rune(got & (1<<24 - 1))
|
||||
if gotRune < r {
|
||||
low = mid + 1
|
||||
} else if gotRune > r {
|
||||
high = mid
|
||||
} else {
|
||||
dst[nDst] = byte(got >> 24)
|
||||
nDst++
|
||||
break
|
||||
}
|
||||
}
|
||||
nSrc += size
|
||||
}
|
||||
return nDst, nSrc, err
|
||||
}
|
||||
|
||||
// EncodeRune returns the Charmap's byte encoding of the rune r. ok is whether
|
||||
// r is in the Charmap's repertoire. If not, b is set to the Charmap's
|
||||
// replacement byte. This is often the ASCII substitute character '\x1a'.
|
||||
func (m *Charmap) EncodeRune(r rune) (b byte, ok bool) {
|
||||
if r < utf8.RuneSelf && m.asciiSuperset {
|
||||
return byte(r), true
|
||||
}
|
||||
for low, high := int(m.low), 0x100; ; {
|
||||
if low >= high {
|
||||
return m.replacement, false
|
||||
}
|
||||
mid := (low + high) / 2
|
||||
got := m.encode[mid]
|
||||
gotRune := rune(got & (1<<24 - 1))
|
||||
if gotRune < r {
|
||||
low = mid + 1
|
||||
} else if gotRune > r {
|
||||
high = mid
|
||||
} else {
|
||||
return byte(got >> 24), true
|
||||
}
|
||||
}
|
||||
}
|
||||
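EncodeRune and the encoder's Transform both search a table of packed uint32 entries: the high 8 bits hold the encoded byte, the low 24 bits hold the rune, and entries are sorted by ascending rune. The following is a minimal sketch of packing and searching such a table. It uses a hypothetical four-entry table instead of the 256-entry generated one, and `pack`, `lookup`, and the sample mapping are illustrative names only.

```go
package main

import (
	"fmt"
	"sort"
)

// runeMask selects the low 24 bits of a packed entry, which hold the rune.
const runeMask = 1<<24 - 1

// pack stores an encoded byte in the high 8 bits and a rune in the low 24
// bits of a uint32. Every valid rune fits in 24 bits.
func pack(b byte, r rune) uint32 {
	return uint32(b)<<24 | uint32(r)
}

// lookup binary-searches a table sorted by ascending rune and returns the
// encoded byte for r. If r is absent, it returns the replacement byte and
// false.
func lookup(table []uint32, r rune, replacement byte) (byte, bool) {
	for low, high := 0, len(table); low < high; {
		mid := (low + high) / 2
		got := table[mid]
		gotRune := rune(got & runeMask)
		switch {
		case gotRune < r:
			low = mid + 1
		case gotRune > r:
			high = mid
		default:
			return byte(got >> 24), true
		}
	}
	return replacement, false
}

func main() {
	// A hypothetical sample mapping, deliberately listed out of order.
	table := []uint32{
		pack(0x80, '\u20ac'),
		pack(0xf4, '\u00f4'),
		pack(0xa3, '\u00a3'),
		pack(0xe9, '\u00e9'),
	}
	// The byte occupies the high bits, so sort on the masked rune rather
	// than on the raw packed value.
	sort.Slice(table, func(i, j int) bool {
		return table[i]&runeMask < table[j]&runeMask
	})

	b, ok := lookup(table, '\u20ac', 0x1a)
	fmt.Printf("%#02x %t\n", b, ok) // 0x80 true
	b, ok = lookup(table, '\u2603', 0x1a)
	fmt.Printf("%#02x %t\n", b, ok) // 0x1a false
}
```

Packing both values into one word keeps each table entry at four bytes and lets a single array serve as the reverse index, at the cost of a mask and a shift per probe.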
258
vendor/golang.org/x/text/encoding/charmap/charmap_test.go
generated
vendored
Normal file
|
|
@ -0,0 +1,258 @@
|
|||
// Copyright 2015 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package charmap
|
||||
|
||||
import (
|
||||
"testing"
|
||||
|
||||
"golang.org/x/text/encoding"
|
||||
"golang.org/x/text/encoding/internal"
|
||||
"golang.org/x/text/encoding/internal/enctest"
|
||||
"golang.org/x/text/transform"
|
||||
)
|
||||
|
||||
func dec(e encoding.Encoding) (dir string, t transform.Transformer, err error) {
|
||||
return "Decode", e.NewDecoder(), nil
|
||||
}
|
||||
|
||||
func encASCIISuperset(e encoding.Encoding) (dir string, t transform.Transformer, err error) {
|
||||
return "Encode", e.NewEncoder(), internal.ErrASCIIReplacement
|
||||
}
|
||||
|
||||
func encEBCDIC(e encoding.Encoding) (dir string, t transform.Transformer, err error) {
|
||||
return "Encode", e.NewEncoder(), internal.RepertoireError(0x3f)
|
||||
}
|
||||
|
||||
func TestNonRepertoire(t *testing.T) {
|
||||
testCases := []struct {
|
||||
init func(e encoding.Encoding) (string, transform.Transformer, error)
|
||||
e encoding.Encoding
|
||||
src, want string
|
||||
}{
|
||||
{dec, Windows1252, "\x81", "\ufffd"},
|
||||
|
||||
{encEBCDIC, CodePage037, "갂", ""},
|
||||
|
||||
{encEBCDIC, CodePage1047, "갂", ""},
|
||||
{encEBCDIC, CodePage1047, "a¤갂", "\x81\x9F"},
|
||||
|
||||
{encEBCDIC, CodePage1140, "갂", ""},
|
||||
{encEBCDIC, CodePage1140, "a€갂", "\x81\x9F"},
|
||||
|
||||
{encASCIISuperset, Windows1252, "갂", ""},
|
||||
{encASCIISuperset, Windows1252, "a갂", "a"},
|
||||
{encASCIISuperset, Windows1252, "\u00E9갂", "\xE9"},
|
||||
}
|
||||
for _, tc := range testCases {
|
||||
dir, tr, wantErr := tc.init(tc.e)
|
||||
|
||||
dst, _, err := transform.String(tr, tc.src)
|
||||
if err != wantErr {
|
||||
t.Errorf("%s %v(%q): got %v; want %v", dir, tc.e, tc.src, err, wantErr)
|
||||
}
|
||||
if got := string(dst); got != tc.want {
|
||||
t.Errorf("%s %v(%q):\ngot %q\nwant %q", dir, tc.e, tc.src, got, tc.want)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestBasics(t *testing.T) {
|
||||
testCases := []struct {
|
||||
e encoding.Encoding
|
||||
encoded string
|
||||
utf8 string
|
||||
}{{
|
||||
e: CodePage037,
|
||||
encoded: "\xc8\x51\xba\x93\xcf",
|
||||
utf8: "Hé[lõ",
|
||||
}, {
|
||||
e: CodePage437,
|
||||
encoded: "H\x82ll\x93 \x9d\xa7\xf4\x9c\xbe",
|
||||
utf8: "Héllô ¥º⌠£╛",
|
||||
}, {
|
||||
e: CodePage866,
|
||||
encoded: "H\xf3\xd3o \x98\xfd\x9f\xdd\xa1",
|
||||
utf8: "Hє╙o Ш¤Я▌б",
|
||||
}, {
|
||||
e: CodePage1047,
|
||||
encoded: "\xc8\x54\x93\x93\x9f",
|
||||
utf8: "Hèll¤",
|
||||
}, {
|
||||
e: CodePage1140,
|
||||
encoded: "\xc8\x9f\x93\x93\xcf",
|
||||
utf8: "H€llõ",
|
||||
}, {
|
||||
e: ISO8859_2,
|
||||
encoded: "Hel\xe5\xf5",
|
||||
utf8: "Helĺő",
|
||||
}, {
|
||||
e: ISO8859_3,
|
||||
encoded: "He\xbd\xd4",
|
||||
utf8: "He½Ô",
|
||||
}, {
|
||||
e: ISO8859_4,
|
||||
encoded: "Hel\xb6\xf8",
|
||||
utf8: "Helļø",
|
||||
}, {
|
||||
e: ISO8859_5,
|
||||
encoded: "H\xd7\xc6o",
|
||||
utf8: "HзЦo",
|
||||
}, {
|
||||
e: ISO8859_6,
|
||||
encoded: "Hel\xc2\xc9",
|
||||
utf8: "Helآة",
|
||||
}, {
|
||||
e: ISO8859_7,
|
||||
encoded: "H\xeel\xebo",
|
||||
utf8: "Hξlλo",
|
||||
}, {
|
||||
e: ISO8859_8,
|
||||
encoded: "Hel\xf5\xed",
|
||||
utf8: "Helץם",
|
||||
}, {
|
||||
e: ISO8859_9,
|
||||
encoded: "\xdeayet",
|
||||
utf8: "Şayet",
|
||||
}, {
|
||||
e: ISO8859_10,
|
||||
encoded: "H\xea\xbfo",
|
||||
utf8: "Hęŋo",
|
||||
}, {
|
||||
e: ISO8859_13,
|
||||
encoded: "H\xe6l\xf9o",
|
||||
utf8: "Hęlło",
|
||||
}, {
|
||||
e: ISO8859_14,
|
||||
encoded: "He\xfe\xd0o",
|
||||
utf8: "HeŷŴo",
|
||||
}, {
|
||||
e: ISO8859_15,
|
||||
encoded: "H\xa4ll\xd8",
|
||||
utf8: "H€llØ",
|
||||
}, {
|
||||
e: ISO8859_16,
|
||||
encoded: "H\xe6ll\xbd",
|
||||
utf8: "Hællœ",
|
||||
}, {
|
||||
e: KOI8R,
|
||||
encoded: "He\x93\xad\x9c",
|
||||
utf8: "He⌠╜°",
|
||||
}, {
|
||||
e: KOI8U,
|
||||
encoded: "He\x93\xad\x9c",
|
||||
utf8: "He⌠ґ°",
|
||||
}, {
|
||||
e: Macintosh,
|
||||
encoded: "He\xdf\xd7",
|
||||
utf8: "Hefl◊",
|
||||
}, {
|
||||
e: MacintoshCyrillic,
|
||||
encoded: "He\xbe\x94",
|
||||
utf8: "HeЊФ",
|
||||
}, {
|
||||
e: Windows874,
|
||||
encoded: "He\xb7\xf0",
|
||||
utf8: "Heท๐",
|
||||
}, {
|
||||
e: Windows1250,
|
||||
encoded: "He\xe5\xe5o",
|
||||
utf8: "Heĺĺo",
|
||||
}, {
|
||||
e: Windows1251,
|
||||
encoded: "H\xball\xfe",
|
||||
utf8: "Hєllю",
|
||||
}, {
|
||||
e: Windows1252,
|
||||
encoded: "H\xe9ll\xf4 \xa5\xbA\xae\xa3\xd0",
|
||||
utf8: "Héllô ¥º®£Ð",
|
||||
}, {
|
||||
e: Windows1253,
|
||||
encoded: "H\xe5ll\xd6",
|
||||
utf8: "HεllΦ",
|
||||
}, {
|
||||
e: Windows1254,
|
||||
encoded: "\xd0ello",
|
||||
utf8: "Ğello",
|
||||
}, {
|
||||
e: Windows1255,
|
||||
encoded: "He\xd4o",
|
||||
utf8: "Heװo",
|
||||
}, {
|
||||
e: Windows1256,
|
||||
encoded: "H\xdbllo",
|
||||
utf8: "Hغllo",
|
||||
}, {
|
||||
e: Windows1257,
|
||||
encoded: "He\xeflo",
|
||||
utf8: "Heļlo",
|
||||
}, {
|
||||
e: Windows1258,
|
||||
encoded: "Hell\xf5",
|
||||
utf8: "Hellơ",
|
||||
}, {
|
||||
e: XUserDefined,
|
||||
encoded: "\x00\x40\x7f\x80\xab\xff",
|
||||
utf8: "\u0000\u0040\u007f\uf780\uf7ab\uf7ff",
|
||||
}}
|
||||
|
||||
for _, tc := range testCases {
|
||||
enctest.TestEncoding(t, tc.e, tc.encoded, tc.utf8, "", "")
|
||||
}
|
||||
}
|
||||
|
||||
var windows1255TestCases = []struct {
|
||||
b byte
|
||||
ok bool
|
||||
r rune
|
||||
}{
|
||||
{'\x00', true, '\u0000'},
|
||||
{'\x1a', true, '\u001a'},
|
||||
{'\x61', true, '\u0061'},
|
||||
{'\x7f', true, '\u007f'},
|
||||
{'\x80', true, '\u20ac'},
|
||||
{'\x95', true, '\u2022'},
|
||||
{'\xa0', true, '\u00a0'},
|
||||
{'\xc0', true, '\u05b0'},
|
||||
{'\xfc', true, '\ufffd'},
|
||||
{'\xfd', true, '\u200e'},
|
||||
{'\xfe', true, '\u200f'},
|
||||
{'\xff', true, '\ufffd'},
|
||||
{encoding.ASCIISub, false, '\u0400'},
|
||||
{encoding.ASCIISub, false, '\u2603'},
|
||||
{encoding.ASCIISub, false, '\U0001f4a9'},
|
||||
}
|
||||
|
||||
func TestDecodeByte(t *testing.T) {
|
||||
for _, tc := range windows1255TestCases {
|
||||
if !tc.ok {
|
||||
continue
|
||||
}
|
||||
|
||||
got := Windows1255.DecodeByte(tc.b)
|
||||
want := tc.r
|
||||
if got != want {
|
||||
t.Errorf("DecodeByte(%#02x): got %#08x, want %#08x", tc.b, got, want)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestEncodeRune(t *testing.T) {
|
||||
for _, tc := range windows1255TestCases {
|
||||
// There can be multiple tc.b values that map to tc.r = '\ufffd'.
|
||||
if tc.r == '\ufffd' {
|
||||
continue
|
||||
}
|
||||
|
||||
gotB, gotOK := Windows1255.EncodeRune(tc.r)
|
||||
wantB, wantOK := tc.b, tc.ok
|
||||
if gotB != wantB || gotOK != wantOK {
|
||||
t.Errorf("EncodeRune(%#08x): got (%#02x, %t), want (%#02x, %t)", tc.r, gotB, gotOK, wantB, wantOK)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestFiles(t *testing.T) { enctest.TestFile(t, Windows1252) }
|
||||
|
||||
func BenchmarkEncoding(b *testing.B) { enctest.Benchmark(b, Windows1252) }
|
||||
556
vendor/golang.org/x/text/encoding/charmap/maketables.go
generated
vendored
Normal file
|
|
@ -0,0 +1,556 @@
|
|||
// Copyright 2013 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
// +build ignore
|
||||
|
||||
package main
|
||||
|
||||
import (
|
||||
"bufio"
|
||||
"fmt"
|
||||
"log"
|
||||
"net/http"
|
||||
"sort"
|
||||
"strings"
|
||||
"unicode/utf8"
|
||||
|
||||
"golang.org/x/text/encoding"
|
||||
"golang.org/x/text/internal/gen"
|
||||
)
|
||||
|
||||
const ascii = "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" +
|
||||
"\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" +
|
||||
` !"#$%&'()*+,-./0123456789:;<=>?` +
|
||||
`@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_` +
|
||||
"`abcdefghijklmnopqrstuvwxyz{|}~\u007f"
|
||||
|
||||
var encodings = []struct {
|
||||
name string
|
||||
mib string
|
||||
comment string
|
||||
varName string
|
||||
replacement byte
|
||||
mapping string
|
||||
}{
|
||||
{
|
||||
"IBM Code Page 037",
|
||||
"IBM037",
|
||||
"",
|
||||
"CodePage037",
|
||||
0x3f,
|
||||
"http://source.icu-project.org/repos/icu/data/trunk/charset/data/ucm/glibc-IBM037-2.1.2.ucm",
|
||||
},
|
||||
{
|
||||
"IBM Code Page 437",
|
||||
"PC8CodePage437",
|
||||
"",
|
||||
"CodePage437",
|
||||
encoding.ASCIISub,
|
||||
"http://source.icu-project.org/repos/icu/data/trunk/charset/data/ucm/glibc-IBM437-2.1.2.ucm",
|
||||
},
|
||||
{
|
||||
"IBM Code Page 850",
|
||||
"PC850Multilingual",
|
||||
"",
|
||||
"CodePage850",
|
||||
encoding.ASCIISub,
|
||||
"http://source.icu-project.org/repos/icu/data/trunk/charset/data/ucm/glibc-IBM850-2.1.2.ucm",
|
||||
},
|
||||
{
|
||||
"IBM Code Page 852",
|
||||
"PCp852",
|
||||
"",
|
||||
"CodePage852",
|
||||
encoding.ASCIISub,
|
||||
"http://source.icu-project.org/repos/icu/data/trunk/charset/data/ucm/glibc-IBM852-2.1.2.ucm",
|
||||
},
|
||||
{
|
||||
"IBM Code Page 855",
|
||||
"IBM855",
|
||||
"",
|
||||
"CodePage855",
|
||||
encoding.ASCIISub,
|
||||
"http://source.icu-project.org/repos/icu/data/trunk/charset/data/ucm/glibc-IBM855-2.1.2.ucm",
|
||||
},
|
||||
{
|
||||
"Windows Code Page 858", // PC latin1 with Euro
|
||||
"IBM00858",
|
||||
"",
|
||||
"CodePage858",
|
||||
encoding.ASCIISub,
|
||||
"http://source.icu-project.org/repos/icu/data/trunk/charset/data/ucm/windows-858-2000.ucm",
|
||||
},
|
||||
{
|
||||
"IBM Code Page 860",
|
||||
"IBM860",
|
||||
"",
|
||||
"CodePage860",
|
||||
encoding.ASCIISub,
|
||||
"http://source.icu-project.org/repos/icu/data/trunk/charset/data/ucm/glibc-IBM860-2.1.2.ucm",
|
||||
},
|
||||
{
|
||||
"IBM Code Page 862",
|
||||
"PC862LatinHebrew",
|
||||
"",
|
||||
"CodePage862",
|
||||
encoding.ASCIISub,
|
||||
"http://source.icu-project.org/repos/icu/data/trunk/charset/data/ucm/glibc-IBM862-2.1.2.ucm",
|
||||
},
|
||||
{
|
||||
"IBM Code Page 863",
|
||||
"IBM863",
|
||||
"",
|
||||
"CodePage863",
|
||||
encoding.ASCIISub,
|
||||
"http://source.icu-project.org/repos/icu/data/trunk/charset/data/ucm/glibc-IBM863-2.1.2.ucm",
|
||||
},
|
||||
{
|
||||
"IBM Code Page 865",
|
||||
"IBM865",
|
||||
"",
|
||||
"CodePage865",
|
||||
encoding.ASCIISub,
|
||||
"http://source.icu-project.org/repos/icu/data/trunk/charset/data/ucm/glibc-IBM865-2.1.2.ucm",
|
||||
},
|
||||
{
|
||||
"IBM Code Page 866",
|
||||
"IBM866",
|
||||
"",
|
||||
"CodePage866",
|
||||
encoding.ASCIISub,
|
||||
"http://encoding.spec.whatwg.org/index-ibm866.txt",
|
||||
},
|
||||
{
|
||||
"IBM Code Page 1047",
|
||||
"IBM1047",
|
||||
"",
|
||||
"CodePage1047",
|
||||
0x3f,
|
||||
"http://source.icu-project.org/repos/icu/data/trunk/charset/data/ucm/glibc-IBM1047-2.1.2.ucm",
|
||||
},
|
||||
{
|
||||
"IBM Code Page 1140",
|
||||
"IBM01140",
|
||||
"",
|
||||
"CodePage1140",
|
||||
0x3f,
|
||||
"http://source.icu-project.org/repos/icu/data/trunk/charset/data/ucm/ibm-1140_P100-1997.ucm",
|
||||
},
|
||||
{
|
||||
"ISO 8859-1",
|
||||
"ISOLatin1",
|
||||
"",
|
||||
"ISO8859_1",
|
||||
encoding.ASCIISub,
|
||||
"http://source.icu-project.org/repos/icu/data/trunk/charset/data/ucm/iso-8859_1-1998.ucm",
|
||||
},
|
||||
{
|
||||
"ISO 8859-2",
|
||||
"ISOLatin2",
|
||||
"",
|
||||
"ISO8859_2",
|
||||
encoding.ASCIISub,
|
||||
"http://encoding.spec.whatwg.org/index-iso-8859-2.txt",
|
||||
},
|
||||
{
|
||||
"ISO 8859-3",
|
||||
"ISOLatin3",
|
||||
"",
|
||||
"ISO8859_3",
|
||||
encoding.ASCIISub,
|
||||
"http://encoding.spec.whatwg.org/index-iso-8859-3.txt",
|
||||
},
|
||||
{
|
||||
"ISO 8859-4",
|
||||
"ISOLatin4",
|
||||
"",
|
||||
"ISO8859_4",
|
||||
encoding.ASCIISub,
|
||||
"http://encoding.spec.whatwg.org/index-iso-8859-4.txt",
|
||||
},
|
||||
{
|
||||
"ISO 8859-5",
|
||||
"ISOLatinCyrillic",
|
||||
"",
|
||||
"ISO8859_5",
|
||||
encoding.ASCIISub,
|
||||
"http://encoding.spec.whatwg.org/index-iso-8859-5.txt",
|
||||
},
|
||||
{
|
||||
"ISO 8859-6",
|
||||
"ISOLatinArabic",
|
||||
"",
|
||||
"ISO8859_6,ISO8859_6E,ISO8859_6I",
|
||||
encoding.ASCIISub,
|
||||
"http://encoding.spec.whatwg.org/index-iso-8859-6.txt",
|
||||
},
|
||||
{
|
||||
"ISO 8859-7",
|
||||
"ISOLatinGreek",
|
||||
"",
|
||||
"ISO8859_7",
|
||||
encoding.ASCIISub,
|
||||
"http://encoding.spec.whatwg.org/index-iso-8859-7.txt",
|
||||
},
|
||||
{
|
||||
"ISO 8859-8",
|
||||
"ISOLatinHebrew",
|
||||
"",
|
||||
"ISO8859_8,ISO8859_8E,ISO8859_8I",
|
||||
encoding.ASCIISub,
|
||||
"http://encoding.spec.whatwg.org/index-iso-8859-8.txt",
|
||||
},
|
||||
{
|
||||
"ISO 8859-9",
|
||||
"ISOLatin5",
|
||||
"",
|
||||
"ISO8859_9",
|
||||
encoding.ASCIISub,
|
||||
"http://source.icu-project.org/repos/icu/data/trunk/charset/data/ucm/iso-8859_9-1999.ucm",
|
||||
},
|
||||
{
|
||||
"ISO 8859-10",
|
||||
"ISOLatin6",
|
||||
"",
|
||||
"ISO8859_10",
|
||||
encoding.ASCIISub,
|
||||
"http://encoding.spec.whatwg.org/index-iso-8859-10.txt",
|
||||
},
|
||||
{
|
||||
"ISO 8859-13",
|
||||
"ISO885913",
|
||||
"",
|
||||
"ISO8859_13",
|
||||
encoding.ASCIISub,
|
||||
"http://encoding.spec.whatwg.org/index-iso-8859-13.txt",
|
||||
},
|
||||
{
|
||||
"ISO 8859-14",
|
||||
"ISO885914",
|
||||
"",
|
||||
"ISO8859_14",
|
||||
encoding.ASCIISub,
|
||||
"http://encoding.spec.whatwg.org/index-iso-8859-14.txt",
|
||||
},
|
||||
{
|
||||
"ISO 8859-15",
|
||||
"ISO885915",
|
||||
"",
|
||||
"ISO8859_15",
|
||||
encoding.ASCIISub,
|
||||
"http://encoding.spec.whatwg.org/index-iso-8859-15.txt",
|
||||
},
|
||||
{
|
||||
"ISO 8859-16",
|
||||
"ISO885916",
|
||||
"",
|
||||
"ISO8859_16",
|
||||
encoding.ASCIISub,
|
||||
"http://encoding.spec.whatwg.org/index-iso-8859-16.txt",
|
||||
},
|
||||
{
|
||||
"KOI8-R",
|
||||
"KOI8R",
|
||||
"",
|
||||
"KOI8R",
|
||||
encoding.ASCIISub,
|
||||
"http://encoding.spec.whatwg.org/index-koi8-r.txt",
|
||||
},
|
||||
{
|
||||
"KOI8-U",
|
||||
"KOI8U",
|
||||
"",
|
||||
"KOI8U",
|
||||
encoding.ASCIISub,
|
||||
"http://encoding.spec.whatwg.org/index-koi8-u.txt",
|
||||
},
|
||||
{
|
||||
"Macintosh",
|
||||
"Macintosh",
|
||||
"",
|
||||
"Macintosh",
|
||||
encoding.ASCIISub,
|
||||
"http://encoding.spec.whatwg.org/index-macintosh.txt",
|
||||
},
|
||||
{
|
||||
"Macintosh Cyrillic",
|
||||
"MacintoshCyrillic",
|
||||
"",
|
||||
"MacintoshCyrillic",
|
||||
encoding.ASCIISub,
|
||||
"http://encoding.spec.whatwg.org/index-x-mac-cyrillic.txt",
|
||||
},
|
||||
{
|
||||
"Windows 874",
|
||||
"Windows874",
|
||||
"",
|
||||
"Windows874",
|
||||
encoding.ASCIISub,
|
||||
"http://encoding.spec.whatwg.org/index-windows-874.txt",
|
||||
},
|
||||
{
|
||||
"Windows 1250",
|
||||
"Windows1250",
|
||||
"",
|
||||
"Windows1250",
|
||||
encoding.ASCIISub,
|
||||
"http://encoding.spec.whatwg.org/index-windows-1250.txt",
|
||||
},
|
||||
{
|
||||
"Windows 1251",
|
||||
"Windows1251",
|
||||
"",
|
||||
"Windows1251",
|
||||
encoding.ASCIISub,
|
||||
"http://encoding.spec.whatwg.org/index-windows-1251.txt",
|
||||
},
|
||||
{
|
||||
"Windows 1252",
|
||||
"Windows1252",
|
||||
"",
|
||||
"Windows1252",
|
||||
encoding.ASCIISub,
|
||||
"http://encoding.spec.whatwg.org/index-windows-1252.txt",
|
||||
},
|
||||
{
|
||||
"Windows 1253",
|
||||
"Windows1253",
|
||||
"",
|
||||
"Windows1253",
|
||||
encoding.ASCIISub,
|
||||
"http://encoding.spec.whatwg.org/index-windows-1253.txt",
|
||||
},
|
||||
{
|
||||
"Windows 1254",
|
||||
"Windows1254",
|
||||
"",
|
||||
"Windows1254",
|
||||
encoding.ASCIISub,
|
||||
"http://encoding.spec.whatwg.org/index-windows-1254.txt",
|
||||
},
|
||||
{
|
||||
"Windows 1255",
|
||||
"Windows1255",
|
||||
"",
|
||||
"Windows1255",
|
||||
encoding.ASCIISub,
|
||||
"http://encoding.spec.whatwg.org/index-windows-1255.txt",
|
||||
},
|
||||
{
|
||||
"Windows 1256",
|
||||
"Windows1256",
|
||||
"",
|
||||
"Windows1256",
|
||||
encoding.ASCIISub,
|
||||
"http://encoding.spec.whatwg.org/index-windows-1256.txt",
|
||||
},
|
||||
{
|
||||
"Windows 1257",
|
||||
"Windows1257",
|
||||
"",
|
||||
"Windows1257",
|
||||
encoding.ASCIISub,
|
||||
"http://encoding.spec.whatwg.org/index-windows-1257.txt",
|
||||
},
|
||||
{
|
||||
"Windows 1258",
|
||||
"Windows1258",
|
||||
"",
|
||||
"Windows1258",
|
||||
encoding.ASCIISub,
|
||||
"http://encoding.spec.whatwg.org/index-windows-1258.txt",
|
||||
},
|
||||
{
|
||||
"X-User-Defined",
|
||||
"XUserDefined",
|
||||
"It is defined at http://encoding.spec.whatwg.org/#x-user-defined",
|
||||
"XUserDefined",
|
||||
encoding.ASCIISub,
|
||||
ascii +
|
||||
"\uf780\uf781\uf782\uf783\uf784\uf785\uf786\uf787" +
|
||||
"\uf788\uf789\uf78a\uf78b\uf78c\uf78d\uf78e\uf78f" +
|
||||
"\uf790\uf791\uf792\uf793\uf794\uf795\uf796\uf797" +
|
||||
"\uf798\uf799\uf79a\uf79b\uf79c\uf79d\uf79e\uf79f" +
|
||||
"\uf7a0\uf7a1\uf7a2\uf7a3\uf7a4\uf7a5\uf7a6\uf7a7" +
|
||||
"\uf7a8\uf7a9\uf7aa\uf7ab\uf7ac\uf7ad\uf7ae\uf7af" +
|
||||
"\uf7b0\uf7b1\uf7b2\uf7b3\uf7b4\uf7b5\uf7b6\uf7b7" +
|
||||
"\uf7b8\uf7b9\uf7ba\uf7bb\uf7bc\uf7bd\uf7be\uf7bf" +
|
||||
"\uf7c0\uf7c1\uf7c2\uf7c3\uf7c4\uf7c5\uf7c6\uf7c7" +
|
||||
"\uf7c8\uf7c9\uf7ca\uf7cb\uf7cc\uf7cd\uf7ce\uf7cf" +
|
||||
"\uf7d0\uf7d1\uf7d2\uf7d3\uf7d4\uf7d5\uf7d6\uf7d7" +
|
||||
"\uf7d8\uf7d9\uf7da\uf7db\uf7dc\uf7dd\uf7de\uf7df" +
|
||||
"\uf7e0\uf7e1\uf7e2\uf7e3\uf7e4\uf7e5\uf7e6\uf7e7" +
|
||||
"\uf7e8\uf7e9\uf7ea\uf7eb\uf7ec\uf7ed\uf7ee\uf7ef" +
|
||||
"\uf7f0\uf7f1\uf7f2\uf7f3\uf7f4\uf7f5\uf7f6\uf7f7" +
|
||||
"\uf7f8\uf7f9\uf7fa\uf7fb\uf7fc\uf7fd\uf7fe\uf7ff",
|
||||
},
|
||||
}
|
||||
|
||||
func getWHATWG(url string) string {
|
||||
res, err := http.Get(url)
|
||||
if err != nil {
|
||||
log.Fatalf("%q: Get: %v", url, err)
|
||||
}
|
||||
defer res.Body.Close()
|
||||
|
||||
mapping := make([]rune, 128)
|
||||
for i := range mapping {
|
||||
mapping[i] = '\ufffd'
|
||||
}
|
||||
|
||||
scanner := bufio.NewScanner(res.Body)
|
||||
for scanner.Scan() {
|
||||
s := strings.TrimSpace(scanner.Text())
|
||||
if s == "" || s[0] == '#' {
|
||||
continue
|
||||
}
|
||||
x, y := 0, 0
|
||||
if _, err := fmt.Sscanf(s, "%d\t0x%x", &x, &y); err != nil {
|
||||
log.Fatalf("could not parse %q", s)
|
||||
}
|
||||
if x < 0 || 128 <= x {
|
||||
log.Fatalf("code %d is out of range", x)
|
||||
}
|
||||
if 0x80 <= y && y < 0xa0 {
|
||||
// We diverge from the WHATWG spec by mapping control characters
|
||||
// in the range [0x80, 0xa0) to U+FFFD.
|
||||
continue
|
||||
}
|
||||
mapping[x] = rune(y)
|
||||
}
|
||||
return ascii + string(mapping)
|
||||
}
|
||||
|
||||
func getUCM(url string) string {
|
||||
res, err := http.Get(url)
|
||||
if err != nil {
|
||||
log.Fatalf("%q: Get: %v", url, err)
|
||||
}
|
||||
defer res.Body.Close()
|
||||
|
||||
mapping := make([]rune, 256)
|
||||
for i := range mapping {
|
||||
mapping[i] = '\ufffd'
|
||||
}
|
||||
|
||||
charsFound := 0
|
||||
scanner := bufio.NewScanner(res.Body)
|
||||
for scanner.Scan() {
|
||||
s := strings.TrimSpace(scanner.Text())
|
||||
if s == "" || s[0] == '#' {
|
||||
continue
|
||||
}
|
||||
var c byte
|
||||
var r rune
|
||||
if _, err := fmt.Sscanf(s, `<U%x> \x%x |0`, &r, &c); err != nil {
|
||||
continue
|
||||
}
|
||||
mapping[c] = r
|
||||
charsFound++
|
||||
}
|
||||
|
||||
if charsFound < 200 {
|
||||
log.Fatalf("%q: only %d characters found (wrong page format?)", url, charsFound)
|
||||
}
|
||||
|
||||
return string(mapping)
|
||||
}
|
||||
|
||||
func main() {
|
||||
mibs := map[string]bool{}
|
||||
all := []string{}
|
||||
|
||||
w := gen.NewCodeWriter()
|
||||
defer w.WriteGoFile("tables.go", "charmap")
|
||||
|
||||
printf := func(s string, a ...interface{}) { fmt.Fprintf(w, s, a...) }
|
||||
|
||||
printf("import (\n")
|
||||
printf("\t\"golang.org/x/text/encoding\"\n")
|
||||
printf("\t\"golang.org/x/text/encoding/internal/identifier\"\n")
|
||||
printf(")\n\n")
|
||||
for _, e := range encodings {
|
||||
varNames := strings.Split(e.varName, ",")
|
||||
all = append(all, varNames...)
|
||||
varName := varNames[0]
|
||||
switch {
|
||||
case strings.HasPrefix(e.mapping, "http://encoding.spec.whatwg.org/"):
|
||||
e.mapping = getWHATWG(e.mapping)
|
||||
case strings.HasPrefix(e.mapping, "http://source.icu-project.org/repos/icu/data/trunk/charset/data/ucm/"):
|
||||
e.mapping = getUCM(e.mapping)
|
||||
}
|
||||
|
||||
asciiSuperset, low := strings.HasPrefix(e.mapping, ascii), 0x00
|
||||
if asciiSuperset {
|
||||
low = 0x80
|
||||
}
|
||||
lvn := 1
|
||||
if strings.HasPrefix(varName, "ISO") || strings.HasPrefix(varName, "KOI") {
|
||||
lvn = 3
|
||||
}
|
||||
lowerVarName := strings.ToLower(varName[:lvn]) + varName[lvn:]
|
||||
printf("// %s is the %s encoding.\n", varName, e.name)
|
||||
if e.comment != "" {
|
||||
printf("//\n// %s\n", e.comment)
|
||||
}
|
||||
printf("var %s *Charmap = &%s\n\nvar %s = Charmap{\nname: %q,\n",
|
||||
varName, lowerVarName, lowerVarName, e.name)
|
||||
if mibs[e.mib] {
|
||||
log.Fatalf("MIB type %q declared multiple times.", e.mib)
|
||||
}
|
||||
printf("mib: identifier.%s,\n", e.mib)
|
||||
printf("asciiSuperset: %t,\n", asciiSuperset)
|
||||
printf("low: 0x%02x,\n", low)
|
||||
printf("replacement: 0x%02x,\n", e.replacement)
|
||||
|
||||
printf("decode: [256]utf8Enc{\n")
|
||||
i, backMapping := 0, map[rune]byte{}
|
||||
for _, c := range e.mapping {
|
||||
if _, ok := backMapping[c]; !ok && c != utf8.RuneError {
|
||||
backMapping[c] = byte(i)
|
||||
}
|
||||
var buf [8]byte
|
||||
n := utf8.EncodeRune(buf[:], c)
|
||||
if n > 3 {
|
||||
panic(fmt.Sprintf("rune %q (%U) is too long", c, c))
|
||||
}
|
||||
printf("{%d,[3]byte{0x%02x,0x%02x,0x%02x}},", n, buf[0], buf[1], buf[2])
|
||||
if i%2 == 1 {
|
||||
printf("\n")
|
||||
}
|
||||
i++
|
||||
}
|
||||
printf("},\n")
|
||||
|
||||
printf("encode: [256]uint32{\n")
|
||||
encode := make([]uint32, 0, 256)
|
||||
for c, i := range backMapping {
|
||||
encode = append(encode, uint32(i)<<24|uint32(c))
|
||||
}
|
||||
sort.Sort(byRune(encode))
|
||||
for len(encode) < cap(encode) {
|
||||
encode = append(encode, encode[len(encode)-1])
|
||||
}
|
||||
for i, enc := range encode {
|
||||
printf("0x%08x,", enc)
|
||||
if i%8 == 7 {
|
||||
printf("\n")
|
||||
}
|
||||
}
|
||||
printf("},\n}\n")
|
||||
|
||||
// Add an estimate of the size of a single Charmap{} struct value, which
|
||||
// includes two 256 elem arrays of 4 bytes and some extra fields, which
|
||||
// align to 3 uint64s on 64-bit architectures.
|
||||
w.Size += 2*4*256 + 3*8
|
||||
}
|
||||
// TODO: add proper line breaking.
|
||||
printf("var listAll = []encoding.Encoding{\n%s,\n}\n\n", strings.Join(all, ",\n"))
|
||||
}
|
||||
|
||||
type byRune []uint32
|
||||
|
||||
func (b byRune) Len() int { return len(b) }
|
||||
func (b byRune) Less(i, j int) bool { return b[i]&0xffffff < b[j]&0xffffff }
|
||||
func (b byRune) Swap(i, j int) { b[i], b[j] = b[j], b[i] }
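
The generator above emits each `encode` entry as `uint32(byte)<<24 | uint32(rune)` and sorts the entries by their low 24 bits. The following is a minimal, stdlib-only sketch of that packing, plus one way such a sorted table could be searched. The names `pack`, `buildEncodeTable`, and `lookup` are hypothetical, and the binary-search lookup is an assumption about how a consumer might use the table, not a copy of the charmap package.

```go
package main

import (
	"fmt"
	"sort"
)

// pack stores the encoded byte in the top 8 bits and the rune in the
// low 24 bits, mirroring the expression used by the generator.
func pack(b byte, r rune) uint32 { return uint32(b)<<24 | uint32(r) }

// buildEncodeTable (hypothetical helper) packs a rune-to-byte mapping
// and sorts it by rune, i.e. by the low 24 bits of each entry.
func buildEncodeTable(m map[rune]byte) []uint32 {
	table := make([]uint32, 0, len(m))
	for r, b := range m {
		table = append(table, pack(b, r))
	}
	sort.Slice(table, func(i, j int) bool {
		return table[i]&0xffffff < table[j]&0xffffff
	})
	return table
}

// lookup (hypothetical helper) binary-searches the sorted table for r
// and returns the encoded byte if the rune is present.
func lookup(table []uint32, r rune) (byte, bool) {
	i := sort.Search(len(table), func(i int) bool {
		return rune(table[i]&0xffffff) >= r
	})
	if i < len(table) && rune(table[i]&0xffffff) == r {
		return byte(table[i] >> 24), true
	}
	return 0, false
}

func main() {
	table := buildEncodeTable(map[rune]byte{
		0x20AC: 0x80, // EURO SIGN, which Windows-1252 encodes as 0x80
		0x00E9: 0xE9, // LATIN SMALL LETTER E WITH ACUTE
		0x0041: 0x41, // LATIN CAPITAL LETTER A
	})
	b, ok := lookup(table, 0x20AC)
	fmt.Printf("0x%02x %t\n", b, ok)
	_, ok = lookup(table, 0xAC00) // not in the table
	fmt.Println(ok)
}
```

Sorting by rune is what makes a logarithmic lookup possible; padding the table to a fixed 256 entries, as the generator does, is omitted here for brevity.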
|
||||
7410
vendor/golang.org/x/text/encoding/charmap/tables.go
generated
vendored
Normal file
File diff suppressed because it is too large
335
vendor/golang.org/x/text/encoding/encoding.go
generated
vendored
Normal file
|
|
@ -0,0 +1,335 @@
|
|||
// Copyright 2013 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
// Package encoding defines an interface for character encodings, such as Shift
|
||||
// JIS and Windows 1252, that can convert to and from UTF-8.
|
||||
//
|
||||
// Encoding implementations are provided in other packages, such as
|
||||
// golang.org/x/text/encoding/charmap and
|
||||
// golang.org/x/text/encoding/japanese.
|
||||
package encoding // import "golang.org/x/text/encoding"
|
||||
|
||||
import (
|
||||
"errors"
|
||||
"io"
|
||||
"strconv"
|
||||
"unicode/utf8"
|
||||
|
||||
"golang.org/x/text/encoding/internal/identifier"
|
||||
"golang.org/x/text/transform"
|
||||
)
|
||||
|
||||
// TODO:
|
||||
// - There seems to be some inconsistency in when decoders return errors
|
||||
// and when not. Also documentation seems to suggest they shouldn't return
|
||||
// errors at all (except for UTF-16).
|
||||
// - Encoders seem to rely on or at least benefit from the input being in NFC
|
||||
// normal form. Perhaps add an example of how users could prepare their output.
|
||||
|
||||
// Encoding is a character set encoding that can be transformed to and from
|
||||
// UTF-8.
|
||||
type Encoding interface {
|
||||
// NewDecoder returns a Decoder.
|
||||
NewDecoder() *Decoder
|
||||
|
||||
// NewEncoder returns an Encoder.
|
||||
NewEncoder() *Encoder
|
||||
}
|
||||
|
||||
// A Decoder converts bytes to UTF-8. It implements transform.Transformer.
|
||||
//
|
||||
// Transforming source bytes that are not of that encoding will not result in an
|
||||
// error per se. Each byte that cannot be transcoded will be represented in the
|
||||
// output by the UTF-8 encoding of '\uFFFD', the replacement rune.
|
||||
type Decoder struct {
|
||||
transform.Transformer
|
||||
|
||||
// This forces external creators of Decoders to use names in struct
|
||||
// initializers, allowing for future extendibility without having to break
|
||||
// code.
|
||||
_ struct{}
|
||||
}
|
||||
|
||||
// Bytes converts the given encoded bytes to UTF-8. It returns the converted
|
||||
// bytes or nil, err if any error occurred.
|
||||
func (d *Decoder) Bytes(b []byte) ([]byte, error) {
|
||||
b, _, err := transform.Bytes(d, b)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
return b, nil
|
||||
}
|
||||
|
||||
// String converts the given encoded string to UTF-8. It returns the converted
|
||||
// string or "", err if any error occurred.
|
||||
func (d *Decoder) String(s string) (string, error) {
|
||||
s, _, err := transform.String(d, s)
|
||||
if err != nil {
|
||||
return "", err
|
||||
}
|
||||
return s, nil
|
||||
}
|
||||
|
||||
// Reader wraps another Reader to decode its bytes.
|
||||
//
|
||||
// The Decoder may not be used for any other operation as long as the returned
|
||||
// Reader is in use.
|
||||
func (d *Decoder) Reader(r io.Reader) io.Reader {
|
||||
return transform.NewReader(r, d)
|
||||
}
|
||||
|
||||
// An Encoder converts bytes from UTF-8. It implements transform.Transformer.
|
||||
//
|
||||
// Each rune that cannot be transcoded will result in an error. In this case,
|
||||
// the transform will consume all source bytes up to, but not including, the offending
|
||||
// rune. Source bytes that are not valid UTF-8 will be replaced by
|
||||
// `\uFFFD`. To return early with an error instead, use transform.Chain to
|
||||
// preprocess the data with a UTF8Validator.
|
||||
type Encoder struct {
|
||||
transform.Transformer
|
||||
|
||||
// This forces external creators of Encoders to use names in struct
|
||||
// initializers, allowing for future extendibility without having to break
|
||||
// code.
|
||||
_ struct{}
|
||||
}
|
||||
|
||||
// Bytes converts bytes from UTF-8. It returns the converted bytes or nil, err if
|
||||
// any error occurred.
|
||||
func (e *Encoder) Bytes(b []byte) ([]byte, error) {
|
||||
b, _, err := transform.Bytes(e, b)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
return b, nil
|
||||
}
|
||||
|
||||
// String converts a string from UTF-8. It returns the converted string or
|
||||
// "", err if any error occurred.
|
||||
func (e *Encoder) String(s string) (string, error) {
|
||||
s, _, err := transform.String(e, s)
|
||||
if err != nil {
|
||||
return "", err
|
||||
}
|
||||
return s, nil
|
||||
}
|
||||
|
||||
// Writer wraps another Writer to encode its UTF-8 output.
|
||||
//
|
||||
// The Encoder may not be used for any other operation as long as the returned
|
||||
// Writer is in use.
|
||||
func (e *Encoder) Writer(w io.Writer) io.Writer {
|
||||
return transform.NewWriter(w, e)
|
||||
}
|
||||
|
||||
// ASCIISub is the ASCII substitute character, as recommended by
|
||||
// http://unicode.org/reports/tr36/#Text_Comparison
|
||||
const ASCIISub = '\x1a'
|
||||
|
||||
// Nop is the nop encoding. Its transformed bytes are the same as the source
|
||||
// bytes; it does not replace invalid UTF-8 sequences.
|
||||
var Nop Encoding = nop{}
|
||||
|
||||
type nop struct{}
|
||||
|
||||
func (nop) NewDecoder() *Decoder {
|
||||
return &Decoder{Transformer: transform.Nop}
|
||||
}
|
||||
func (nop) NewEncoder() *Encoder {
|
||||
return &Encoder{Transformer: transform.Nop}
|
||||
}
|
||||
|
||||
// Replacement is the replacement encoding. Decoding from the replacement
|
||||
// encoding yields a single '\uFFFD' replacement rune. Encoding from UTF-8 to
|
||||
// the replacement encoding yields the same as the source bytes except that
|
||||
// invalid UTF-8 is converted to '\uFFFD'.
|
||||
//
|
||||
// It is defined at http://encoding.spec.whatwg.org/#replacement
|
||||
var Replacement Encoding = replacement{}
|
||||
|
||||
type replacement struct{}
|
||||
|
||||
func (replacement) NewDecoder() *Decoder {
|
||||
return &Decoder{Transformer: replacementDecoder{}}
|
||||
}
|
||||
|
||||
func (replacement) NewEncoder() *Encoder {
|
||||
return &Encoder{Transformer: replacementEncoder{}}
|
||||
}
|
||||
|
||||
func (replacement) ID() (mib identifier.MIB, other string) {
|
||||
return identifier.Replacement, ""
|
||||
}
|
||||
|
||||
type replacementDecoder struct{ transform.NopResetter }
|
||||
|
||||
func (replacementDecoder) Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error) {
|
||||
if len(dst) < 3 {
|
||||
return 0, 0, transform.ErrShortDst
|
||||
}
|
||||
if atEOF {
|
||||
const fffd = "\ufffd"
|
||||
dst[0] = fffd[0]
|
||||
dst[1] = fffd[1]
|
||||
dst[2] = fffd[2]
|
||||
nDst = 3
|
||||
}
|
||||
return nDst, len(src), nil
|
||||
}
|
||||
|
||||
type replacementEncoder struct{ transform.NopResetter }
|
||||
|
||||
func (replacementEncoder) Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error) {
|
||||
r, size := rune(0), 0
|
||||
|
||||
for ; nSrc < len(src); nSrc += size {
|
||||
r = rune(src[nSrc])
|
||||
|
||||
// Decode a 1-byte rune.
|
||||
if r < utf8.RuneSelf {
|
||||
size = 1
|
||||
|
||||
} else {
|
||||
// Decode a multi-byte rune.
|
||||
r, size = utf8.DecodeRune(src[nSrc:])
|
||||
if size == 1 {
|
||||
// All valid runes of size 1 (those below utf8.RuneSelf) were
|
||||
// handled above. We have invalid UTF-8 or we haven't seen the
|
||||
// full character yet.
|
||||
if !atEOF && !utf8.FullRune(src[nSrc:]) {
|
||||
err = transform.ErrShortSrc
|
||||
break
|
||||
}
|
||||
r = '\ufffd'
|
||||
}
|
||||
}
|
||||
|
||||
if nDst+utf8.RuneLen(r) > len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
break
|
||||
}
|
||||
nDst += utf8.EncodeRune(dst[nDst:], r)
|
||||
}
|
||||
return nDst, nSrc, err
|
||||
}
|
||||
|
||||
// HTMLEscapeUnsupported wraps encoders to replace source runes outside the
|
||||
// repertoire of the destination encoding with HTML escape sequences.
|
||||
//
|
||||
// This wrapper exists to comply with URL and HTML forms requiring a
|
||||
// non-terminating legacy encoder. The produced sequences may lead to data
|
||||
// loss as they are indistinguishable from legitimate input. To avoid this
|
||||
// issue, use UTF-8 encodings whenever possible.
|
||||
func HTMLEscapeUnsupported(e *Encoder) *Encoder {
|
||||
return &Encoder{Transformer: &errorHandler{e, errorToHTML}}
|
||||
}
|
||||
|
||||
// ReplaceUnsupported wraps encoders to replace source runes outside the
|
||||
// repertoire of the destination encoding with an encoding-specific
|
||||
// replacement.
|
||||
//
|
||||
// This wrapper is only provided for backwards compatibility and legacy
|
||||
// handling. Its use is strongly discouraged. Use UTF-8 whenever possible.
|
||||
func ReplaceUnsupported(e *Encoder) *Encoder {
|
||||
return &Encoder{Transformer: &errorHandler{e, errorToReplacement}}
|
||||
}
|
||||
|
||||
type errorHandler struct {
|
||||
*Encoder
|
||||
handler func(dst []byte, r rune, err repertoireError) (n int, ok bool)
|
||||
}
|
||||
|
||||
// TODO: consider making this error public in some form.
|
||||
type repertoireError interface {
|
||||
Replacement() byte
|
||||
}
|
||||
|
||||
func (h errorHandler) Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error) {
|
||||
nDst, nSrc, err = h.Transformer.Transform(dst, src, atEOF)
|
||||
for err != nil {
|
||||
rerr, ok := err.(repertoireError)
|
||||
if !ok {
|
||||
return nDst, nSrc, err
|
||||
}
|
||||
r, sz := utf8.DecodeRune(src[nSrc:])
|
||||
n, ok := h.handler(dst[nDst:], r, rerr)
|
||||
if !ok {
|
||||
return nDst, nSrc, transform.ErrShortDst
|
||||
}
|
||||
err = nil
|
||||
nDst += n
|
||||
if nSrc += sz; nSrc < len(src) {
|
||||
var dn, sn int
|
||||
dn, sn, err = h.Transformer.Transform(dst[nDst:], src[nSrc:], atEOF)
|
||||
nDst += dn
|
||||
nSrc += sn
|
||||
}
|
||||
}
|
||||
return nDst, nSrc, err
|
||||
}
|
||||
|
||||
func errorToHTML(dst []byte, r rune, err repertoireError) (n int, ok bool) {
|
||||
buf := [8]byte{}
|
||||
b := strconv.AppendUint(buf[:0], uint64(r), 10)
|
||||
if n = len(b) + len("&#;"); n >= len(dst) {
|
||||
return 0, false
|
||||
}
|
||||
dst[0] = '&'
|
||||
dst[1] = '#'
|
||||
dst[copy(dst[2:], b)+2] = ';'
|
||||
return n, true
|
||||
}
|
||||
|
||||
func errorToReplacement(dst []byte, r rune, err repertoireError) (n int, ok bool) {
|
||||
if len(dst) == 0 {
|
||||
return 0, false
|
||||
}
|
||||
dst[0] = err.Replacement()
|
||||
return 1, true
|
||||
}
|
||||
|
||||
// ErrInvalidUTF8 means that a transformer encountered invalid UTF-8.
|
||||
var ErrInvalidUTF8 = errors.New("encoding: invalid UTF-8")
|
||||
|
||||
// UTF8Validator is a transformer that returns ErrInvalidUTF8 on the first
|
||||
// input byte that is not valid UTF-8.
|
||||
var UTF8Validator transform.Transformer = utf8Validator{}
|
||||
|
||||
type utf8Validator struct{ transform.NopResetter }
|
||||
|
||||
func (utf8Validator) Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error) {
|
||||
n := len(src)
|
||||
if n > len(dst) {
|
||||
n = len(dst)
|
||||
}
|
||||
for i := 0; i < n; {
|
||||
if c := src[i]; c < utf8.RuneSelf {
|
||||
dst[i] = c
|
||||
i++
|
||||
continue
|
||||
}
|
||||
_, size := utf8.DecodeRune(src[i:])
|
||||
if size == 1 {
|
||||
// All valid runes of size 1 (those below utf8.RuneSelf) were
|
||||
// handled above. We have invalid UTF-8 or we haven't seen the
|
||||
// full character yet.
|
||||
err = ErrInvalidUTF8
|
||||
if !atEOF && !utf8.FullRune(src[i:]) {
|
||||
err = transform.ErrShortSrc
|
||||
}
|
||||
return i, i, err
|
||||
}
|
||||
if i+size > len(dst) {
|
||||
return i, i, transform.ErrShortDst
|
||||
}
|
||||
for ; size > 0; size-- {
|
||||
dst[i] = src[i]
|
||||
i++
|
||||
}
|
||||
}
|
||||
if len(src) > len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
}
|
||||
return n, n, err
|
||||
}
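
The validator above separates two failure modes: input that is definitely invalid, and input that merely ends in the middle of a multi-byte sequence when more data may follow. The sketch below isolates that decision using only `unicode/utf8`. The function `validPrefix` and the two error values are hypothetical names introduced for illustration; the destination-buffer handling of the real transformer is deliberately left out.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// Hypothetical error values standing in for transform.ErrShortSrc and
// encoding.ErrInvalidUTF8.
var (
	errShort   = errors.New("need more input")
	errInvalid = errors.New("invalid UTF-8")
)

// validPrefix returns the length of the longest valid UTF-8 prefix of
// src and the reason scanning stopped, if it stopped early.
func validPrefix(src []byte, atEOF bool) (n int, err error) {
	for n < len(src) {
		if src[n] < utf8.RuneSelf {
			n++ // ASCII fast path
			continue
		}
		_, size := utf8.DecodeRune(src[n:])
		if size == 1 {
			// A width-1 result for a non-ASCII byte means either an
			// invalid sequence or a sequence cut off by the chunk end.
			if !atEOF && !utf8.FullRune(src[n:]) {
				return n, errShort
			}
			return n, errInvalid
		}
		n += size
	}
	return n, nil
}

func main() {
	fmt.Println(validPrefix([]byte("a\u0100\xf1"), false)) // truncated, more may come
	fmt.Println(validPrefix([]byte("a\u0100\xf1"), true))  // truncated at end of input
	fmt.Println(validPrefix([]byte("a\u0100\x80"), false)) // stray continuation byte
}
```

Because `utf8.FullRune` reports true for an invalid encoding, a stray continuation byte is rejected immediately even when more input is expected, while a lone lead byte such as `\xf1` is only rejected once `atEOF` is set.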
|
||||
290
vendor/golang.org/x/text/encoding/encoding_test.go
generated
vendored
Normal file
|
|
@ -0,0 +1,290 @@
|
|||
// Copyright 2013 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package encoding_test
|
||||
|
||||
import (
|
||||
"io/ioutil"
|
||||
"strings"
|
||||
"testing"
|
||||
|
||||
"golang.org/x/text/encoding"
|
||||
"golang.org/x/text/encoding/charmap"
|
||||
"golang.org/x/text/transform"
|
||||
)
|
||||
|
||||
func TestEncodeInvalidUTF8(t *testing.T) {
|
||||
inputs := []string{
|
||||
"hello.",
|
||||
"wo\ufffdld.",
|
||||
"ABC\xff\x80\x80", // Invalid UTF-8.
|
||||
"\x80\x80\x80\x80\x80",
|
||||
"\x80\x80D\x80\x80", // Valid rune at "D".
|
||||
"E\xed\xa0\x80\xed\xbf\xbfF", // Two invalid UTF-8 runes (surrogates).
|
||||
"G",
|
||||
"H\xe2\x82", // U+20AC in UTF-8 is "\xe2\x82\xac", which we split over two
|
||||
"\xacI\xe2\x82", // input lines. It maps to 0x80 in the Windows-1252 encoding.
|
||||
}
|
||||
// Each invalid source byte becomes '\x1a'.
|
||||
want := strings.Replace("hello.wo?ld.ABC??????????D??E??????FGH\x80I??", "?", "\x1a", -1)
|
||||
|
||||
transformer := encoding.ReplaceUnsupported(charmap.Windows1252.NewEncoder())
|
||||
gotBuf := make([]byte, 0, 1024)
|
||||
src := make([]byte, 0, 1024)
|
||||
for i, input := range inputs {
|
||||
dst := make([]byte, 1024)
|
||||
src = append(src, input...)
|
||||
atEOF := i == len(inputs)-1
|
||||
nDst, nSrc, err := transformer.Transform(dst, src, atEOF)
|
||||
gotBuf = append(gotBuf, dst[:nDst]...)
|
||||
src = src[nSrc:]
|
||||
if err != nil && err != transform.ErrShortSrc {
|
||||
t.Fatalf("i=%d: %v", i, err)
|
||||
}
|
||||
if atEOF && err != nil {
|
||||
t.Fatalf("i=%d: atEOF: %v", i, err)
|
||||
}
|
||||
}
|
||||
if got := string(gotBuf); got != want {
|
||||
t.Fatalf("\ngot %+q\nwant %+q", got, want)
|
||||
}
|
||||
}
|
||||
|
||||
func TestReplacement(t *testing.T) {
|
||||
for _, direction := range []string{"Decode", "Encode"} {
|
||||
enc, want := (transform.Transformer)(nil), ""
|
||||
if direction == "Decode" {
|
||||
enc = encoding.Replacement.NewDecoder()
|
||||
want = "\ufffd"
|
||||
} else {
|
||||
enc = encoding.Replacement.NewEncoder()
|
||||
want = "AB\x00CD\ufffdYZ"
|
||||
}
|
||||
sr := strings.NewReader("AB\x00CD\x80YZ")
|
||||
g, err := ioutil.ReadAll(transform.NewReader(sr, enc))
|
||||
if err != nil {
|
||||
t.Errorf("%s: ReadAll: %v", direction, err)
|
||||
continue
|
||||
}
|
||||
if got := string(g); got != want {
|
||||
t.Errorf("%s:\ngot %q\nwant %q", direction, got, want)
|
||||
continue
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestUTF8Validator(t *testing.T) {
|
||||
testCases := []struct {
|
||||
desc string
|
||||
dstSize int
|
||||
src string
|
||||
atEOF bool
|
||||
want string
|
||||
wantErr error
|
||||
}{
|
||||
{
|
||||
"empty input",
|
||||
100,
|
||||
"",
|
||||
false,
|
||||
"",
|
||||
nil,
|
||||
},
|
||||
{
|
||||
"valid 1-byte 1-rune input",
|
||||
100,
|
||||
"a",
|
||||
false,
|
||||
"a",
|
||||
nil,
|
||||
},
|
||||
{
|
||||
"valid 3-byte 1-rune input",
|
||||
100,
|
||||
"\u1234",
|
||||
false,
|
||||
"\u1234",
|
||||
nil,
|
||||
},
|
||||
{
|
||||
"valid 5-byte 3-rune input",
|
||||
100,
|
||||
"a\u0100\u0101",
|
||||
false,
|
||||
"a\u0100\u0101",
|
||||
nil,
|
||||
},
|
||||
{
|
||||
"perfectly sized dst (non-ASCII)",
|
||||
5,
|
||||
"a\u0100\u0101",
|
||||
false,
|
||||
"a\u0100\u0101",
|
||||
nil,
|
||||
},
|
||||
{
|
||||
"short dst (non-ASCII)",
|
||||
4,
|
||||
"a\u0100\u0101",
|
||||
false,
|
||||
"a\u0100",
|
||||
transform.ErrShortDst,
|
||||
},
|
||||
{
|
||||
"perfectly sized dst (ASCII)",
|
||||
5,
|
||||
"abcde",
|
||||
false,
|
||||
"abcde",
|
||||
nil,
|
||||
},
|
||||
{
|
||||
"short dst (ASCII)",
|
||||
4,
|
||||
"abcde",
|
||||
false,
|
||||
"abcd",
|
||||
transform.ErrShortDst,
|
||||
},
|
||||
{
|
||||
"partial input (!EOF)",
|
||||
100,
|
||||
"a\u0100\xf1",
|
||||
false,
|
||||
"a\u0100",
|
||||
transform.ErrShortSrc,
|
||||
},
|
||||
{
|
||||
"invalid input (EOF)",
|
||||
100,
|
||||
"a\u0100\xf1",
|
||||
true,
|
||||
"a\u0100",
|
||||
encoding.ErrInvalidUTF8,
|
||||
},
|
||||
{
|
||||
"invalid input (!EOF)",
|
||||
100,
|
||||
"a\u0100\x80",
|
||||
false,
|
||||
"a\u0100",
|
||||
encoding.ErrInvalidUTF8,
|
||||
},
|
||||
{
|
||||
"invalid input (above U+10FFFF)",
|
||||
100,
|
||||
"a\u0100\xf7\xbf\xbf\xbf",
|
||||
false,
|
||||
"a\u0100",
|
||||
encoding.ErrInvalidUTF8,
|
||||
},
|
||||
{
|
||||
"invalid input (surrogate half)",
|
||||
100,
|
||||
"a\u0100\xed\xa0\x80",
|
||||
false,
|
||||
"a\u0100",
|
||||
encoding.ErrInvalidUTF8,
|
||||
},
|
||||
}
|
||||
for _, tc := range testCases {
|
||||
dst := make([]byte, tc.dstSize)
|
||||
nDst, nSrc, err := encoding.UTF8Validator.Transform(dst, []byte(tc.src), tc.atEOF)
|
||||
if nDst < 0 || len(dst) < nDst {
|
||||
t.Errorf("%s: nDst=%d out of range", tc.desc, nDst)
|
||||
continue
|
||||
}
|
||||
got := string(dst[:nDst])
|
||||
if got != tc.want || nSrc != len(tc.want) || err != tc.wantErr {
|
||||
t.Errorf("%s:\ngot %+q, %d, %v\nwant %+q, %d, %v",
|
||||
tc.desc, got, nSrc, err, tc.want, len(tc.want), tc.wantErr)
|
||||
continue
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestErrorHandler(t *testing.T) {
|
||||
testCases := []struct {
|
||||
desc string
|
||||
handler func(*encoding.Encoder) *encoding.Encoder
|
||||
sizeDst int
|
||||
src, want string
|
||||
nSrc int
|
||||
err error
|
||||
}{
|
||||
{
|
||||
desc: "one rune replacement",
|
||||
handler: encoding.ReplaceUnsupported,
|
||||
sizeDst: 100,
|
||||
src: "\uAC00",
|
||||
want: "\x1a",
|
||||
nSrc: 3,
|
||||
},
|
||||
{
|
||||
desc: "mid-stream rune replacement",
|
||||
handler: encoding.ReplaceUnsupported,
|
||||
sizeDst: 100,
|
||||
src: "a\uAC00bcd\u00e9",
|
||||
want: "a\x1abcd\xe9",
|
||||
nSrc: 9,
|
||||
},
|
||||
{
|
||||
desc: "at end rune replacement",
|
||||
handler: encoding.ReplaceUnsupported,
|
||||
sizeDst: 10,
|
||||
src: "\u00e9\uAC00",
|
||||
want: "\xe9\x1a",
|
||||
nSrc: 5,
|
||||
},
|
||||
{
|
||||
desc: "short buffer replacement",
|
||||
handler: encoding.ReplaceUnsupported,
|
||||
sizeDst: 1,
|
||||
src: "\u00e9\uAC00",
|
||||
want: "\xe9",
|
||||
nSrc: 2,
|
||||
err: transform.ErrShortDst,
|
||||
},
|
||||
{
|
||||
desc: "one rune html escape",
|
||||
handler: encoding.HTMLEscapeUnsupported,
|
||||
sizeDst: 100,
|
||||
src: "\uAC00",
|
||||
want: "&#44032;",
|
||||
nSrc: 3,
|
||||
},
|
||||
{
|
||||
desc: "mid-stream html escape",
|
||||
handler: encoding.HTMLEscapeUnsupported,
|
||||
sizeDst: 100,
|
||||
src: "\u00e9\uAC00dcba",
|
||||
want: "\xe9&#44032;dcba",
|
||||
nSrc: 9,
|
||||
},
|
||||
{
|
||||
desc: "short buffer html escape",
|
||||
handler: encoding.HTMLEscapeUnsupported,
|
||||
sizeDst: 9,
|
||||
src: "ab\uAC01",
|
||||
want: "ab",
|
||||
nSrc: 2,
|
||||
err: transform.ErrShortDst,
|
||||
},
|
||||
}
|
||||
for i, tc := range testCases {
|
||||
tr := tc.handler(charmap.Windows1250.NewEncoder())
|
||||
b := make([]byte, tc.sizeDst)
|
||||
nDst, nSrc, err := tr.Transform(b, []byte(tc.src), true)
|
||||
if err != tc.err {
|
||||
t.Errorf("%d:%s: error was %v; want %v", i, tc.desc, err, tc.err)
|
||||
}
|
||||
if got := string(b[:nDst]); got != tc.want {
|
||||
t.Errorf("%d:%s: result was %q: want %q", i, tc.desc, got, tc.want)
|
||||
}
|
||||
if nSrc != tc.nSrc {
|
||||
t.Errorf("%d:%s: nSrc was %d; want %d", i, tc.desc, nSrc, tc.nSrc)
|
||||
}
|
||||
|
||||
}
|
||||
}
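
The HTML-escape cases tested above expect an unsupported rune to be written as a decimal numeric character reference, so U+AC00 becomes `&#44032;`. Here is a small stdlib-only sketch of that substitution. `appendNCR` and `escapeUnsupported` are hypothetical helpers, and the `supported` predicate is an assumption standing in for a real encoder's repertoire check; supported runes are passed through as UTF-8 rather than transcoded.

```go
package main

import (
	"fmt"
	"strconv"
)

// appendNCR appends the decimal numeric character reference for r,
// for example "&#44032;" for U+AC00.
func appendNCR(dst []byte, r rune) []byte {
	dst = append(dst, "&#"...)
	dst = strconv.AppendUint(dst, uint64(r), 10)
	return append(dst, ';')
}

// escapeUnsupported copies s, replacing every rune rejected by the
// supported predicate with its numeric character reference.
func escapeUnsupported(s string, supported func(rune) bool) string {
	out := make([]byte, 0, len(s))
	for _, r := range s {
		if supported(r) {
			out = append(out, string(r)...)
			continue
		}
		out = appendNCR(out, r)
	}
	return string(out)
}

func main() {
	// Assumption for illustration: treat everything below U+0100 as
	// representable, roughly like a Latin-1 style repertoire.
	latin1 := func(r rune) bool { return r < 0x100 }
	fmt.Println(escapeUnsupported("\u00e9\uAC00dcba", latin1))
}
```

As the package comment warns, the escaped output cannot be told apart from input that already contained the text `&#44032;`, which is why this fallback is lossy.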
|
||||
42
vendor/golang.org/x/text/encoding/example_test.go
generated
vendored
Normal file
|
|
@ -0,0 +1,42 @@
|
|||
// Copyright 2013 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package encoding_test
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"io"
|
||||
"os"
|
||||
"strings"
|
||||
|
||||
"golang.org/x/text/encoding"
|
||||
"golang.org/x/text/encoding/charmap"
|
||||
"golang.org/x/text/encoding/unicode"
|
||||
"golang.org/x/text/transform"
|
||||
)
|
||||
|
||||
func ExampleDecodeWindows1252() {
|
||||
sr := strings.NewReader("Gar\xe7on !")
|
||||
tr := charmap.Windows1252.NewDecoder().Reader(sr)
|
||||
io.Copy(os.Stdout, tr)
|
||||
// Output: Garçon !
|
||||
}
|
||||
|
||||
func ExampleUTF8Validator() {
|
||||
for i := 0; i < 2; i++ {
|
||||
var transformer transform.Transformer
|
||||
transformer = unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM).NewEncoder()
|
||||
if i == 1 {
|
||||
transformer = transform.Chain(encoding.UTF8Validator, transformer)
|
||||
}
|
||||
dst := make([]byte, 256)
|
||||
src := []byte("abc\xffxyz") // src is invalid UTF-8.
|
||||
nDst, nSrc, err := transformer.Transform(dst, src, true)
|
||||
fmt.Printf("i=%d: produced %q, consumed %q, error %v\n",
|
||||
i, dst[:nDst], src[:nSrc], err)
|
||||
}
|
||||
// Output:
|
||||
// i=0: produced "\x00a\x00b\x00c\xff\xfd\x00x\x00y\x00z", consumed "abc\xffxyz", error <nil>
|
||||
// i=1: produced "\x00a\x00b\x00c", consumed "abc", error encoding: invalid UTF-8
|
||||
}
|
||||
173
vendor/golang.org/x/text/encoding/htmlindex/gen.go
generated
vendored
Normal file
|
|
@ -0,0 +1,173 @@
|
|||
// Copyright 2015 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
// +build ignore
|
||||
|
||||
package main
|
||||
|
||||
import (
|
||||
"bytes"
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"log"
|
||||
"strings"
|
||||
|
||||
"golang.org/x/text/internal/gen"
|
||||
)
|
||||
|
||||
type group struct {
|
||||
Encodings []struct {
|
||||
Labels []string
|
||||
Name string
|
||||
}
|
||||
}
|
||||
|
||||
func main() {
|
||||
gen.Init()
|
||||
|
||||
r := gen.Open("https://encoding.spec.whatwg.org", "whatwg", "encodings.json")
|
||||
var groups []group
|
||||
if err := json.NewDecoder(r).Decode(&groups); err != nil {
|
||||
log.Fatalf("Error reading encodings.json: %v", err)
|
||||
}
|
||||
|
||||
w := &bytes.Buffer{}
|
||||
fmt.Fprintln(w, "type htmlEncoding byte")
|
||||
fmt.Fprintln(w, "const (")
|
||||
for i, g := range groups {
|
||||
for _, e := range g.Encodings {
|
||||
key := strings.ToLower(e.Name)
|
||||
name := consts[key]
|
||||
if name == "" {
|
||||
log.Fatalf("No const defined for %s.", key)
|
||||
}
|
||||
if i == 0 {
|
||||
fmt.Fprintf(w, "%s htmlEncoding = iota\n", name)
|
||||
} else {
|
||||
fmt.Fprintf(w, "%s\n", name)
|
||||
}
|
||||
}
|
||||
}
|
||||
fmt.Fprintln(w, "numEncodings")
|
||||
fmt.Fprint(w, ")\n\n")
|
||||
|
||||
fmt.Fprintln(w, "var canonical = [numEncodings]string{")
|
||||
for _, g := range groups {
|
||||
for _, e := range g.Encodings {
|
||||
fmt.Fprintf(w, "%q,\n", strings.ToLower(e.Name))
|
||||
}
|
||||
}
|
||||
fmt.Fprint(w, "}\n\n")
|
||||
|
||||
fmt.Fprintln(w, "var nameMap = map[string]htmlEncoding{")
|
||||
for _, g := range groups {
|
||||
for _, e := range g.Encodings {
|
||||
for _, l := range e.Labels {
|
||||
key := strings.ToLower(e.Name)
|
||||
name := consts[key]
|
||||
fmt.Fprintf(w, "%q: %s,\n", l, name)
|
||||
}
|
||||
}
|
||||
}
|
||||
fmt.Fprint(w, "}\n\n")
|
||||
|
||||
var tags []string
|
||||
fmt.Fprintln(w, "var localeMap = []htmlEncoding{")
|
||||
for _, loc := range locales {
|
||||
tags = append(tags, loc.tag)
|
||||
fmt.Fprintf(w, "%s, // %s \n", consts[loc.name], loc.tag)
|
||||
}
|
||||
fmt.Fprint(w, "}\n\n")
|
||||
|
||||
fmt.Fprintf(w, "const locales = %q\n", strings.Join(tags, " "))
|
||||
|
||||
gen.WriteGoFile("tables.go", "htmlindex", w.Bytes())
|
||||
}
|
||||
|
||||
// consts maps canonical encoding name to internal constant.
|
||||
var consts = map[string]string{
|
||||
"utf-8": "utf8",
|
||||
"ibm866": "ibm866",
|
||||
"iso-8859-2": "iso8859_2",
|
||||
"iso-8859-3": "iso8859_3",
|
||||
"iso-8859-4": "iso8859_4",
|
||||
"iso-8859-5": "iso8859_5",
|
||||
"iso-8859-6": "iso8859_6",
|
||||
"iso-8859-7": "iso8859_7",
|
||||
"iso-8859-8": "iso8859_8",
|
||||
"iso-8859-8-i": "iso8859_8I",
|
||||
"iso-8859-10": "iso8859_10",
|
||||
"iso-8859-13": "iso8859_13",
|
||||
"iso-8859-14": "iso8859_14",
|
||||
"iso-8859-15": "iso8859_15",
|
||||
"iso-8859-16": "iso8859_16",
|
||||
"koi8-r": "koi8r",
|
||||
"koi8-u": "koi8u",
|
||||
"macintosh": "macintosh",
|
||||
"windows-874": "windows874",
|
||||
"windows-1250": "windows1250",
|
||||
"windows-1251": "windows1251",
|
||||
"windows-1252": "windows1252",
|
||||
"windows-1253": "windows1253",
|
||||
"windows-1254": "windows1254",
|
||||
"windows-1255": "windows1255",
|
||||
"windows-1256": "windows1256",
|
||||
"windows-1257": "windows1257",
|
||||
"windows-1258": "windows1258",
|
||||
"x-mac-cyrillic": "macintoshCyrillic",
|
||||
"gbk": "gbk",
|
||||
"gb18030": "gb18030",
|
||||
// "hz-gb-2312": "hzgb2312", // Was removed from WhatWG
|
||||
"big5": "big5",
|
||||
"euc-jp": "eucjp",
|
||||
"iso-2022-jp": "iso2022jp",
|
||||
"shift_jis": "shiftJIS",
|
||||
"euc-kr": "euckr",
|
||||
"replacement": "replacement",
|
||||
"utf-16be": "utf16be",
|
||||
"utf-16le": "utf16le",
|
||||
"x-user-defined": "xUserDefined",
|
||||
}
|
||||
|
||||
// locales is taken from
|
||||
// https://html.spec.whatwg.org/multipage/syntax.html#encoding-sniffing-algorithm.
|
||||
var locales = []struct{ tag, name string }{
|
||||
// The default value. Explicitly state latin to benefit from the exact
|
||||
// script option, while still making 1252 the default encoding for languages
|
||||
// written in Latin script.
|
||||
{"und_Latn", "windows-1252"},
|
||||
{"ar", "windows-1256"},
|
||||
{"ba", "windows-1251"},
|
||||
{"be", "windows-1251"},
|
||||
{"bg", "windows-1251"},
|
||||
{"cs", "windows-1250"},
|
||||
{"el", "iso-8859-7"},
|
||||
{"et", "windows-1257"},
|
||||
{"fa", "windows-1256"},
|
||||
{"he", "windows-1255"},
|
||||
{"hr", "windows-1250"},
|
||||
{"hu", "iso-8859-2"},
|
||||
{"ja", "shift_jis"},
|
||||
{"kk", "windows-1251"},
|
||||
{"ko", "euc-kr"},
|
||||
{"ku", "windows-1254"},
|
||||
{"ky", "windows-1251"},
|
||||
{"lt", "windows-1257"},
|
||||
{"lv", "windows-1257"},
|
||||
{"mk", "windows-1251"},
|
||||
{"pl", "iso-8859-2"},
|
||||
{"ru", "windows-1251"},
|
||||
{"sah", "windows-1251"},
|
||||
{"sk", "windows-1250"},
|
||||
{"sl", "iso-8859-2"},
|
||||
{"sr", "windows-1251"},
|
||||
{"tg", "windows-1251"},
|
||||
{"th", "windows-874"},
|
||||
{"tr", "windows-1254"},
|
||||
{"tt", "windows-1251"},
|
||||
{"uk", "windows-1251"},
|
||||
{"vi", "windows-1258"},
|
||||
{"zh-hans", "gb18030"},
|
||||
{"zh-hant", "big5"},
|
||||
}
|
||||
86
vendor/golang.org/x/text/encoding/htmlindex/htmlindex.go
generated
vendored
Normal file
|
|
@ -0,0 +1,86 @@
|
|||
// Copyright 2015 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
//go:generate go run gen.go
|
||||
|
||||
// Package htmlindex maps character set encoding names to Encodings as
|
||||
// recommended by the W3C for use in HTML 5. See http://www.w3.org/TR/encoding.
|
||||
package htmlindex
|
||||
|
||||
// TODO: perhaps have a "bare" version of the index (used by this package) that
|
||||
// is not pre-loaded with all encodings. Global variables in encodings prevent
|
||||
// the linker from being able to purge unneeded tables. This means that
|
||||
// referencing all encodings, as this package does for the default index, links
|
||||
// in all encodings unconditionally.
|
||||
//
|
||||
// This issue can be solved by either solving the linking issue (see
|
||||
// https://github.com/golang/go/issues/6330) or refactoring the encoding tables
|
||||
// (e.g. moving the tables to internal packages that do not use global
|
||||
// variables).
|
||||
|
||||
// TODO: allow canonicalizing names
|
||||
|
||||
import (
|
||||
"errors"
|
||||
"strings"
|
||||
"sync"
|
||||
|
||||
"golang.org/x/text/encoding"
|
||||
"golang.org/x/text/encoding/internal/identifier"
|
||||
"golang.org/x/text/language"
|
||||
)
|
||||
|
||||
var (
|
||||
errInvalidName = errors.New("htmlindex: invalid encoding name")
|
||||
errUnknown = errors.New("htmlindex: unknown Encoding")
|
||||
errUnsupported = errors.New("htmlindex: this encoding is not supported")
|
||||
)
|
||||
|
||||
var (
|
||||
matcherOnce sync.Once
|
||||
matcher language.Matcher
|
||||
)
|
||||
|
||||
// LanguageDefault returns the canonical name of the default encoding for a
|
||||
// given language.
|
||||
func LanguageDefault(tag language.Tag) string {
|
||||
matcherOnce.Do(func() {
|
||||
tags := []language.Tag{}
|
||||
for _, t := range strings.Split(locales, " ") {
|
||||
tags = append(tags, language.MustParse(t))
|
||||
}
|
||||
matcher = language.NewMatcher(tags, language.PreferSameScript(true))
|
||||
})
|
||||
_, i, _ := matcher.Match(tag)
|
||||
return canonical[localeMap[i]] // Default is Windows-1252.
|
||||
}
|
||||
|
||||
// Get returns an Encoding for one of the names listed in
|
||||
// http://www.w3.org/TR/encoding using the Default Index. Matching is case-
|
||||
// insensitive.
|
||||
func Get(name string) (encoding.Encoding, error) {
|
||||
x, ok := nameMap[strings.ToLower(strings.TrimSpace(name))]
|
||||
if !ok {
|
||||
return nil, errInvalidName
|
||||
}
|
||||
return encodings[x], nil
|
||||
}
|
||||
|
||||
// Name reports the canonical name of the given Encoding. It will return
|
||||
// an error if e is not associated with a supported encoding scheme.
|
||||
func Name(e encoding.Encoding) (string, error) {
|
||||
id, ok := e.(identifier.Interface)
|
||||
if !ok {
|
||||
return "", errUnknown
|
||||
}
|
||||
mib, _ := id.ID()
|
||||
if mib == 0 {
|
||||
return "", errUnknown
|
||||
}
|
||||
v, ok := mibMap[mib]
|
||||
if !ok {
|
||||
return "", errUnsupported
|
||||
}
|
||||
return canonical[v], nil
|
||||
}
|
||||
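Note on Get above: label matching amounts to trimming surrounding white space, lower-casing, and doing a single map lookup. The sketch below reproduces that behaviour with the standard library only. The `labels` map is a small hand-picked subset of the generated `nameMap`, and `canonicalName` is a hypothetical helper that is not part of the package.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errInvalidName plays the role of the package's sentinel error.
var errInvalidName = errors.New("invalid encoding name")

// labels is an illustrative subset of the generated nameMap. For brevity the
// values are canonical names instead of htmlEncoding constants.
var labels = map[string]string{
	"utf-8":        "utf-8",
	"utf8":         "utf-8",
	"l5":           "windows-1254",
	"latin5":       "windows-1254",
	"windows-1254": "windows-1254",
}

// canonicalName (hypothetical) normalizes a label the same way Get does:
// trim surrounding white space, lower-case, then look it up.
func canonicalName(label string) (string, error) {
	name, ok := labels[strings.ToLower(strings.TrimSpace(label))]
	if !ok {
		return "", errInvalidName
	}
	return name, nil
}

func main() {
	for _, l := range []string{"utf-8", " L5 ", "latin 5", "latin-5"} {
		name, err := canonicalName(l)
		fmt.Printf("%q -> %q, %v\n", l, name, err)
	}
}
```

Only surrounding white space is removed, so a label with an interior space or an extra hyphen ("latin 5", "latin-5") is rejected, which matches the cases in TestGet.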
144
vendor/golang.org/x/text/encoding/htmlindex/htmlindex_test.go
generated
vendored
Normal file
|
|
@ -0,0 +1,144 @@
|
|||
// Copyright 2015 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package htmlindex
|
||||
|
||||
import (
|
||||
"testing"
|
||||
|
||||
"golang.org/x/text/encoding"
|
||||
"golang.org/x/text/encoding/charmap"
|
||||
"golang.org/x/text/encoding/internal/identifier"
|
||||
"golang.org/x/text/encoding/unicode"
|
||||
"golang.org/x/text/language"
|
||||
)
|
||||
|
||||
func TestGet(t *testing.T) {
|
||||
for i, tc := range []struct {
|
||||
name string
|
||||
canonical string
|
||||
err error
|
||||
}{
|
||||
{"utf-8", "utf-8", nil},
|
||||
{" utf-8 ", "utf-8", nil},
|
||||
{" l5 ", "windows-1254", nil},
|
||||
{"latin5 ", "windows-1254", nil},
|
||||
{"latin 5", "", errInvalidName},
|
||||
{"latin-5", "", errInvalidName},
|
||||
} {
|
||||
enc, err := Get(tc.name)
|
||||
if err != tc.err {
|
||||
t.Errorf("%d: error was %v; want %v", i, err, tc.err)
|
||||
}
|
||||
if err != nil {
|
||||
continue
|
||||
}
|
||||
if got, err := Name(enc); got != tc.canonical {
|
||||
t.Errorf("%d: Name(Get(%q)) = %q; want %q (%v)", i, tc.name, got, tc.canonical, err)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestTables(t *testing.T) {
|
||||
for name, index := range nameMap {
|
||||
got, err := Get(name)
|
||||
if err != nil {
|
||||
t.Errorf("%s:err: unexpected error: %v", name, err)
|
||||
}
|
||||
if want := encodings[index]; got != want {
|
||||
t.Errorf("%s:encoding: got %v; want %v", name, got, want)
|
||||
}
|
||||
mib, _ := got.(identifier.Interface).ID()
|
||||
if mibMap[mib] != index {
|
||||
t.Errorf("%s:mibMap: got %d; want %d", name, mibMap[mib], index)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestName(t *testing.T) {
|
||||
for i, tc := range []struct {
|
||||
desc string
|
||||
enc encoding.Encoding
|
||||
name string
|
||||
err error
|
||||
}{{
|
||||
"defined encoding",
|
||||
charmap.ISO8859_2,
|
||||
"iso-8859-2",
|
||||
nil,
|
||||
}, {
|
||||
"defined Unicode encoding",
|
||||
unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM),
|
||||
"utf-16be",
|
||||
nil,
|
||||
}, {
|
||||
"undefined Unicode encoding in HTML standard",
|
||||
unicode.UTF16(unicode.BigEndian, unicode.UseBOM),
|
||||
"",
|
||||
errUnsupported,
|
||||
}, {
|
||||
"undefined other encoding in HTML standard",
|
||||
charmap.CodePage437,
|
||||
"",
|
||||
errUnsupported,
|
||||
}, {
|
||||
"unknown encoding",
|
||||
encoding.Nop,
|
||||
"",
|
||||
errUnknown,
|
||||
}} {
|
||||
name, err := Name(tc.enc)
|
||||
if name != tc.name || err != tc.err {
|
||||
t.Errorf("%d:%s: got %q, %v; want %q, %v", i, tc.desc, name, err, tc.name, tc.err)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestLanguageDefault(t *testing.T) {
|
||||
for _, tc := range []struct{ tag, want string }{
|
||||
{"und", "windows-1252"}, // The default value.
|
||||
{"ar", "windows-1256"},
|
||||
{"ba", "windows-1251"},
|
||||
{"be", "windows-1251"},
|
||||
{"bg", "windows-1251"},
|
||||
{"cs", "windows-1250"},
|
||||
{"el", "iso-8859-7"},
|
||||
{"et", "windows-1257"},
|
||||
{"fa", "windows-1256"},
|
||||
{"he", "windows-1255"},
|
||||
{"hr", "windows-1250"},
|
||||
{"hu", "iso-8859-2"},
|
||||
{"ja", "shift_jis"},
|
||||
{"kk", "windows-1251"},
|
||||
{"ko", "euc-kr"},
|
||||
{"ku", "windows-1254"},
|
||||
{"ky", "windows-1251"},
|
||||
{"lt", "windows-1257"},
|
||||
{"lv", "windows-1257"},
|
||||
{"mk", "windows-1251"},
|
||||
{"pl", "iso-8859-2"},
|
||||
{"ru", "windows-1251"},
|
||||
{"sah", "windows-1251"},
|
||||
{"sk", "windows-1250"},
|
||||
{"sl", "iso-8859-2"},
|
||||
{"sr", "windows-1251"},
|
||||
{"tg", "windows-1251"},
|
||||
{"th", "windows-874"},
|
||||
{"tr", "windows-1254"},
|
||||
{"tt", "windows-1251"},
|
||||
{"uk", "windows-1251"},
|
||||
{"vi", "windows-1258"},
|
||||
{"zh-hans", "gb18030"},
|
||||
{"zh-hant", "big5"},
|
||||
// Variants and close approximates of the above.
|
||||
{"ar_EG", "windows-1256"},
|
||||
{"bs", "windows-1250"}, // Bosnian Latin maps to Croatian.
|
||||
// Use default fallback in case of miss.
|
||||
{"nl", "windows-1252"},
|
||||
} {
|
||||
if got := LanguageDefault(language.MustParse(tc.tag)); got != tc.want {
|
||||
t.Errorf("LanguageDefault(%s) = %s; want %s", tc.tag, got, tc.want)
|
||||
}
|
||||
}
|
||||
}
|
||||
105
vendor/golang.org/x/text/encoding/htmlindex/map.go
generated
vendored
Normal file
|
|
@ -0,0 +1,105 @@
|
|||
// Copyright 2015 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package htmlindex
|
||||
|
||||
import (
|
||||
"golang.org/x/text/encoding"
|
||||
"golang.org/x/text/encoding/charmap"
|
||||
"golang.org/x/text/encoding/internal/identifier"
|
||||
"golang.org/x/text/encoding/japanese"
|
||||
"golang.org/x/text/encoding/korean"
|
||||
"golang.org/x/text/encoding/simplifiedchinese"
|
||||
"golang.org/x/text/encoding/traditionalchinese"
|
||||
"golang.org/x/text/encoding/unicode"
|
||||
)
|
||||
|
||||
// mibMap maps a MIB identifier to an htmlEncoding index.
|
||||
var mibMap = map[identifier.MIB]htmlEncoding{
|
||||
identifier.UTF8: utf8,
|
||||
identifier.UTF16BE: utf16be,
|
||||
identifier.UTF16LE: utf16le,
|
||||
identifier.IBM866: ibm866,
|
||||
identifier.ISOLatin2: iso8859_2,
|
||||
identifier.ISOLatin3: iso8859_3,
|
||||
identifier.ISOLatin4: iso8859_4,
|
||||
identifier.ISOLatinCyrillic: iso8859_5,
|
||||
identifier.ISOLatinArabic: iso8859_6,
|
||||
identifier.ISOLatinGreek: iso8859_7,
|
||||
identifier.ISOLatinHebrew: iso8859_8,
|
||||
identifier.ISO88598I: iso8859_8I,
|
||||
identifier.ISOLatin6: iso8859_10,
|
||||
identifier.ISO885913: iso8859_13,
|
||||
identifier.ISO885914: iso8859_14,
|
||||
identifier.ISO885915: iso8859_15,
|
||||
identifier.ISO885916: iso8859_16,
|
||||
identifier.KOI8R: koi8r,
|
||||
identifier.KOI8U: koi8u,
|
||||
identifier.Macintosh: macintosh,
|
||||
identifier.MacintoshCyrillic: macintoshCyrillic,
|
||||
identifier.Windows874: windows874,
|
||||
identifier.Windows1250: windows1250,
|
||||
identifier.Windows1251: windows1251,
|
||||
identifier.Windows1252: windows1252,
|
||||
identifier.Windows1253: windows1253,
|
||||
identifier.Windows1254: windows1254,
|
||||
identifier.Windows1255: windows1255,
|
||||
identifier.Windows1256: windows1256,
|
||||
identifier.Windows1257: windows1257,
|
||||
identifier.Windows1258: windows1258,
|
||||
identifier.XUserDefined: xUserDefined,
|
||||
identifier.GBK: gbk,
|
||||
identifier.GB18030: gb18030,
|
||||
identifier.Big5: big5,
|
||||
identifier.EUCPkdFmtJapanese: eucjp,
|
||||
identifier.ISO2022JP: iso2022jp,
|
||||
identifier.ShiftJIS: shiftJIS,
|
||||
identifier.EUCKR: euckr,
|
||||
identifier.Replacement: replacement,
|
||||
}
|
||||
|
||||
// encodings maps the internal htmlEncoding to an Encoding.
|
||||
// TODO: consider using a reusable index in encoding/internal.
|
||||
var encodings = [numEncodings]encoding.Encoding{
|
||||
utf8: unicode.UTF8,
|
||||
ibm866: charmap.CodePage866,
|
||||
iso8859_2: charmap.ISO8859_2,
|
||||
iso8859_3: charmap.ISO8859_3,
|
||||
iso8859_4: charmap.ISO8859_4,
|
||||
iso8859_5: charmap.ISO8859_5,
|
||||
iso8859_6: charmap.ISO8859_6,
|
||||
iso8859_7: charmap.ISO8859_7,
|
||||
iso8859_8: charmap.ISO8859_8,
|
||||
iso8859_8I: charmap.ISO8859_8I,
|
||||
iso8859_10: charmap.ISO8859_10,
|
||||
iso8859_13: charmap.ISO8859_13,
|
||||
iso8859_14: charmap.ISO8859_14,
|
||||
iso8859_15: charmap.ISO8859_15,
|
||||
iso8859_16: charmap.ISO8859_16,
|
||||
koi8r: charmap.KOI8R,
|
||||
koi8u: charmap.KOI8U,
|
||||
macintosh: charmap.Macintosh,
|
||||
windows874: charmap.Windows874,
|
||||
windows1250: charmap.Windows1250,
|
||||
windows1251: charmap.Windows1251,
|
||||
windows1252: charmap.Windows1252,
|
||||
windows1253: charmap.Windows1253,
|
||||
windows1254: charmap.Windows1254,
|
||||
windows1255: charmap.Windows1255,
|
||||
windows1256: charmap.Windows1256,
|
||||
windows1257: charmap.Windows1257,
|
||||
windows1258: charmap.Windows1258,
|
||||
macintoshCyrillic: charmap.MacintoshCyrillic,
|
||||
gbk: simplifiedchinese.GBK,
|
||||
gb18030: simplifiedchinese.GB18030,
|
||||
big5: traditionalchinese.Big5,
|
||||
eucjp: japanese.EUCJP,
|
||||
iso2022jp: japanese.ISO2022JP,
|
||||
shiftJIS: japanese.ShiftJIS,
|
||||
euckr: korean.EUCKR,
|
||||
replacement: encoding.Replacement,
|
||||
utf16be: unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM),
|
||||
utf16le: unicode.UTF16(unicode.LittleEndian, unicode.IgnoreBOM),
|
||||
xUserDefined: charmap.XUserDefined,
|
||||
}
|
||||
352
vendor/golang.org/x/text/encoding/htmlindex/tables.go
generated
vendored
Normal file
|
|
@ -0,0 +1,352 @@
|
|||
// Code generated by running "go generate" in golang.org/x/text. DO NOT EDIT.
|
||||
|
||||
package htmlindex
|
||||
|
||||
type htmlEncoding byte
|
||||
|
||||
const (
|
||||
utf8 htmlEncoding = iota
|
||||
ibm866
|
||||
iso8859_2
|
||||
iso8859_3
|
||||
iso8859_4
|
||||
iso8859_5
|
||||
iso8859_6
|
||||
iso8859_7
|
||||
iso8859_8
|
||||
iso8859_8I
|
||||
iso8859_10
|
||||
iso8859_13
|
||||
iso8859_14
|
||||
iso8859_15
|
||||
iso8859_16
|
||||
koi8r
|
||||
koi8u
|
||||
macintosh
|
||||
windows874
|
||||
windows1250
|
||||
windows1251
|
||||
windows1252
|
||||
windows1253
|
||||
windows1254
|
||||
windows1255
|
||||
windows1256
|
||||
windows1257
|
||||
windows1258
|
||||
macintoshCyrillic
|
||||
gbk
|
||||
gb18030
|
||||
big5
|
||||
eucjp
|
||||
iso2022jp
|
||||
shiftJIS
|
||||
euckr
|
||||
replacement
|
||||
utf16be
|
||||
utf16le
|
||||
xUserDefined
|
||||
numEncodings
|
||||
)
|
||||
|
||||
var canonical = [numEncodings]string{
|
||||
"utf-8",
|
||||
"ibm866",
|
||||
"iso-8859-2",
|
||||
"iso-8859-3",
|
||||
"iso-8859-4",
|
||||
"iso-8859-5",
|
||||
"iso-8859-6",
|
||||
"iso-8859-7",
|
||||
"iso-8859-8",
|
||||
"iso-8859-8-i",
|
||||
"iso-8859-10",
|
||||
"iso-8859-13",
|
||||
"iso-8859-14",
|
||||
"iso-8859-15",
|
||||
"iso-8859-16",
|
||||
"koi8-r",
|
||||
"koi8-u",
|
||||
"macintosh",
|
||||
"windows-874",
|
||||
"windows-1250",
|
||||
"windows-1251",
|
||||
"windows-1252",
|
||||
"windows-1253",
|
||||
"windows-1254",
|
||||
"windows-1255",
|
||||
"windows-1256",
|
||||
"windows-1257",
|
||||
"windows-1258",
|
||||
"x-mac-cyrillic",
|
||||
"gbk",
|
||||
"gb18030",
|
||||
"big5",
|
||||
"euc-jp",
|
||||
"iso-2022-jp",
|
||||
"shift_jis",
|
||||
"euc-kr",
|
||||
"replacement",
|
||||
"utf-16be",
|
||||
"utf-16le",
|
||||
"x-user-defined",
|
||||
}
|
||||
|
||||
var nameMap = map[string]htmlEncoding{
|
||||
"unicode-1-1-utf-8": utf8,
|
||||
"utf-8": utf8,
|
||||
"utf8": utf8,
|
||||
"866": ibm866,
|
||||
"cp866": ibm866,
|
||||
"csibm866": ibm866,
|
||||
"ibm866": ibm866,
|
||||
"csisolatin2": iso8859_2,
|
||||
"iso-8859-2": iso8859_2,
|
||||
"iso-ir-101": iso8859_2,
|
||||
"iso8859-2": iso8859_2,
|
||||
"iso88592": iso8859_2,
|
||||
"iso_8859-2": iso8859_2,
|
||||
"iso_8859-2:1987": iso8859_2,
|
||||
"l2": iso8859_2,
|
||||
"latin2": iso8859_2,
|
||||
"csisolatin3": iso8859_3,
|
||||
"iso-8859-3": iso8859_3,
|
||||
"iso-ir-109": iso8859_3,
|
||||
"iso8859-3": iso8859_3,
|
||||
"iso88593": iso8859_3,
|
||||
"iso_8859-3": iso8859_3,
|
||||
"iso_8859-3:1988": iso8859_3,
|
||||
"l3": iso8859_3,
|
||||
"latin3": iso8859_3,
|
||||
"csisolatin4": iso8859_4,
|
||||
"iso-8859-4": iso8859_4,
|
||||
"iso-ir-110": iso8859_4,
|
||||
"iso8859-4": iso8859_4,
|
||||
"iso88594": iso8859_4,
|
||||
"iso_8859-4": iso8859_4,
|
||||
"iso_8859-4:1988": iso8859_4,
|
||||
"l4": iso8859_4,
|
||||
"latin4": iso8859_4,
|
||||
"csisolatincyrillic": iso8859_5,
|
||||
"cyrillic": iso8859_5,
|
||||
"iso-8859-5": iso8859_5,
|
||||
"iso-ir-144": iso8859_5,
|
||||
"iso8859-5": iso8859_5,
|
||||
"iso88595": iso8859_5,
|
||||
"iso_8859-5": iso8859_5,
|
||||
"iso_8859-5:1988": iso8859_5,
|
||||
"arabic": iso8859_6,
|
||||
"asmo-708": iso8859_6,
|
||||
"csiso88596e": iso8859_6,
|
||||
"csiso88596i": iso8859_6,
|
||||
"csisolatinarabic": iso8859_6,
|
||||
"ecma-114": iso8859_6,
|
||||
"iso-8859-6": iso8859_6,
|
||||
"iso-8859-6-e": iso8859_6,
|
||||
"iso-8859-6-i": iso8859_6,
|
||||
"iso-ir-127": iso8859_6,
|
||||
"iso8859-6": iso8859_6,
|
||||
"iso88596": iso8859_6,
|
||||
"iso_8859-6": iso8859_6,
|
||||
"iso_8859-6:1987": iso8859_6,
|
||||
"csisolatingreek": iso8859_7,
|
||||
"ecma-118": iso8859_7,
|
||||
"elot_928": iso8859_7,
|
||||
"greek": iso8859_7,
|
||||
"greek8": iso8859_7,
|
||||
"iso-8859-7": iso8859_7,
|
||||
"iso-ir-126": iso8859_7,
|
||||
"iso8859-7": iso8859_7,
|
||||
"iso88597": iso8859_7,
|
||||
"iso_8859-7": iso8859_7,
|
||||
"iso_8859-7:1987": iso8859_7,
|
||||
"sun_eu_greek": iso8859_7,
|
||||
"csiso88598e": iso8859_8,
|
||||
"csisolatinhebrew": iso8859_8,
|
||||
"hebrew": iso8859_8,
|
||||
"iso-8859-8": iso8859_8,
|
||||
"iso-8859-8-e": iso8859_8,
|
||||
"iso-ir-138": iso8859_8,
|
||||
"iso8859-8": iso8859_8,
|
||||
"iso88598": iso8859_8,
|
||||
"iso_8859-8": iso8859_8,
|
||||
"iso_8859-8:1988": iso8859_8,
|
||||
"visual": iso8859_8,
|
||||
"csiso88598i": iso8859_8I,
|
||||
"iso-8859-8-i": iso8859_8I,
|
||||
"logical": iso8859_8I,
|
||||
"csisolatin6": iso8859_10,
|
||||
"iso-8859-10": iso8859_10,
|
||||
"iso-ir-157": iso8859_10,
|
||||
"iso8859-10": iso8859_10,
|
||||
"iso885910": iso8859_10,
|
||||
"l6": iso8859_10,
|
||||
"latin6": iso8859_10,
|
||||
"iso-8859-13": iso8859_13,
|
||||
"iso8859-13": iso8859_13,
|
||||
"iso885913": iso8859_13,
|
||||
"iso-8859-14": iso8859_14,
|
||||
"iso8859-14": iso8859_14,
|
||||
"iso885914": iso8859_14,
|
||||
"csisolatin9": iso8859_15,
|
||||
"iso-8859-15": iso8859_15,
|
||||
"iso8859-15": iso8859_15,
|
||||
"iso885915": iso8859_15,
|
||||
"iso_8859-15": iso8859_15,
|
||||
"l9": iso8859_15,
|
||||
"iso-8859-16": iso8859_16,
|
||||
"cskoi8r": koi8r,
|
||||
"koi": koi8r,
|
||||
"koi8": koi8r,
|
||||
"koi8-r": koi8r,
|
||||
"koi8_r": koi8r,
|
||||
"koi8-ru": koi8u,
|
||||
"koi8-u": koi8u,
|
||||
"csmacintosh": macintosh,
|
||||
"mac": macintosh,
|
||||
"macintosh": macintosh,
|
||||
"x-mac-roman": macintosh,
|
||||
"dos-874": windows874,
|
||||
"iso-8859-11": windows874,
|
||||
"iso8859-11": windows874,
|
||||
"iso885911": windows874,
|
||||
"tis-620": windows874,
|
||||
"windows-874": windows874,
|
||||
"cp1250": windows1250,
|
||||
"windows-1250": windows1250,
|
||||
"x-cp1250": windows1250,
|
||||
"cp1251": windows1251,
|
||||
"windows-1251": windows1251,
|
||||
"x-cp1251": windows1251,
|
||||
"ansi_x3.4-1968": windows1252,
|
||||
"ascii": windows1252,
|
||||
"cp1252": windows1252,
|
||||
"cp819": windows1252,
|
||||
"csisolatin1": windows1252,
|
||||
"ibm819": windows1252,
|
||||
"iso-8859-1": windows1252,
|
||||
"iso-ir-100": windows1252,
|
||||
"iso8859-1": windows1252,
|
||||
"iso88591": windows1252,
|
||||
"iso_8859-1": windows1252,
|
||||
"iso_8859-1:1987": windows1252,
|
||||
"l1": windows1252,
|
||||
"latin1": windows1252,
|
||||
"us-ascii": windows1252,
|
||||
"windows-1252": windows1252,
|
||||
"x-cp1252": windows1252,
|
||||
"cp1253": windows1253,
|
||||
"windows-1253": windows1253,
|
||||
"x-cp1253": windows1253,
|
||||
"cp1254": windows1254,
|
||||
"csisolatin5": windows1254,
|
||||
"iso-8859-9": windows1254,
|
||||
"iso-ir-148": windows1254,
|
||||
"iso8859-9": windows1254,
|
||||
"iso88599": windows1254,
|
||||
"iso_8859-9": windows1254,
|
||||
"iso_8859-9:1989": windows1254,
|
||||
"l5": windows1254,
|
||||
"latin5": windows1254,
|
||||
"windows-1254": windows1254,
|
||||
"x-cp1254": windows1254,
|
||||
"cp1255": windows1255,
|
||||
"windows-1255": windows1255,
|
||||
"x-cp1255": windows1255,
|
||||
"cp1256": windows1256,
|
||||
"windows-1256": windows1256,
|
||||
"x-cp1256": windows1256,
|
||||
"cp1257": windows1257,
|
||||
"windows-1257": windows1257,
|
||||
"x-cp1257": windows1257,
|
||||
"cp1258": windows1258,
|
||||
"windows-1258": windows1258,
|
||||
"x-cp1258": windows1258,
|
||||
"x-mac-cyrillic": macintoshCyrillic,
|
||||
"x-mac-ukrainian": macintoshCyrillic,
|
||||
"chinese": gbk,
|
||||
"csgb2312": gbk,
|
||||
"csiso58gb231280": gbk,
|
||||
"gb2312": gbk,
|
||||
"gb_2312": gbk,
|
||||
"gb_2312-80": gbk,
|
||||
"gbk": gbk,
|
||||
"iso-ir-58": gbk,
|
||||
"x-gbk": gbk,
|
||||
"gb18030": gb18030,
|
||||
"big5": big5,
|
||||
"big5-hkscs": big5,
|
||||
"cn-big5": big5,
|
||||
"csbig5": big5,
|
||||
"x-x-big5": big5,
|
||||
"cseucpkdfmtjapanese": eucjp,
|
||||
"euc-jp": eucjp,
|
||||
"x-euc-jp": eucjp,
|
||||
"csiso2022jp": iso2022jp,
|
||||
"iso-2022-jp": iso2022jp,
|
||||
"csshiftjis": shiftJIS,
|
||||
"ms932": shiftJIS,
|
||||
"ms_kanji": shiftJIS,
|
||||
"shift-jis": shiftJIS,
|
||||
"shift_jis": shiftJIS,
|
||||
"sjis": shiftJIS,
|
||||
"windows-31j": shiftJIS,
|
||||
"x-sjis": shiftJIS,
|
||||
"cseuckr": euckr,
|
||||
"csksc56011987": euckr,
|
||||
"euc-kr": euckr,
|
||||
"iso-ir-149": euckr,
|
||||
"korean": euckr,
|
||||
"ks_c_5601-1987": euckr,
|
||||
"ks_c_5601-1989": euckr,
|
||||
"ksc5601": euckr,
|
||||
"ksc_5601": euckr,
|
||||
"windows-949": euckr,
|
||||
"csiso2022kr": replacement,
|
||||
"hz-gb-2312": replacement,
|
||||
"iso-2022-cn": replacement,
|
||||
"iso-2022-cn-ext": replacement,
|
||||
"iso-2022-kr": replacement,
|
||||
"utf-16be": utf16be,
|
||||
"utf-16": utf16le,
|
||||
"utf-16le": utf16le,
|
||||
"x-user-defined": xUserDefined,
|
||||
}
|
||||
|
||||
var localeMap = []htmlEncoding{
|
||||
windows1252, // und_Latn
|
||||
windows1256, // ar
|
||||
windows1251, // ba
|
||||
windows1251, // be
|
||||
windows1251, // bg
|
||||
windows1250, // cs
|
||||
iso8859_7, // el
|
||||
windows1257, // et
|
||||
windows1256, // fa
|
||||
windows1255, // he
|
||||
windows1250, // hr
|
||||
iso8859_2, // hu
|
||||
shiftJIS, // ja
|
||||
windows1251, // kk
|
||||
euckr, // ko
|
||||
windows1254, // ku
|
||||
windows1251, // ky
|
||||
windows1257, // lt
|
||||
windows1257, // lv
|
||||
windows1251, // mk
|
||||
iso8859_2, // pl
|
||||
windows1251, // ru
|
||||
windows1251, // sah
|
||||
windows1250, // sk
|
||||
iso8859_2, // sl
|
||||
windows1251, // sr
|
||||
windows1251, // tg
|
||||
windows874, // th
|
||||
windows1254, // tr
|
||||
windows1251, // tt
|
||||
windows1251, // uk
|
||||
windows1258, // vi
|
||||
gb18030, // zh-hans
|
||||
big5, // zh-hant
|
||||
}
|
||||
|
||||
const locales = "und_Latn ar ba be bg cs el et fa he hr hu ja kk ko ku ky lt lv mk pl ru sah sk sl sr tg th tr tt uk vi zh-hans zh-hant"
|
||||
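Note on the generated tables above: `locales` and `localeMap` are parallel, so the i-th tag in the space-separated string corresponds to the i-th entry of the slice, and entry 0 is the fallback. The sketch below shows only that indexing scheme. It deliberately swaps the `language.Matcher` used by LanguageDefault for a plain case-insensitive exact match, so it will not resolve variants such as "ar_EG" or near matches such as "bs". The shortened tables and the `defaultFor` helper are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// locales is a shortened copy of the generated constant.
const locales = "und_Latn ar ja ru"

// localeDefaults is parallel to locales: entry i belongs to tag i.
// Canonical names stand in for the htmlEncoding constants.
var localeDefaults = []string{
	"windows-1252", // und_Latn
	"windows-1256", // ar
	"shift_jis",    // ja
	"windows-1251", // ru
}

// defaultFor (hypothetical) finds the position of tag in locales and uses it
// to index localeDefaults. A miss falls back to entry 0. This is an exact-match
// stand-in for the real language matcher, not a reimplementation of it.
func defaultFor(tag string) string {
	for i, t := range strings.Split(locales, " ") {
		if strings.EqualFold(t, tag) {
			return localeDefaults[i]
		}
	}
	return localeDefaults[0]
}

func main() {
	for _, tag := range []string{"ja", "RU", "nl"} {
		fmt.Println(tag, defaultFor(tag))
	}
}
```

Keeping the tags in a single string constant and the encodings in a slice of small integers keeps the generated file compact; the price is that both tables must stay in the same order, which the generator guarantees by emitting them from the same loop.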
27
vendor/golang.org/x/text/encoding/ianaindex/example_test.go
generated
vendored
Normal file
|
|
@ -0,0 +1,27 @@
|
|||
// Copyright 2015 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package ianaindex_test
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
|
||||
"golang.org/x/text/encoding/charmap"
|
||||
"golang.org/x/text/encoding/ianaindex"
|
||||
)
|
||||
|
||||
func ExampleIndex() {
|
||||
fmt.Println(ianaindex.MIME.Name(charmap.ISO8859_7))
|
||||
fmt.Println(ianaindex.IANA.Name(charmap.ISO8859_7))
|
||||
fmt.Println(ianaindex.MIB.Name(charmap.ISO8859_7))
|
||||
|
||||
e, _ := ianaindex.IANA.Encoding("cp437")
|
||||
fmt.Println(ianaindex.IANA.Name(e))
|
||||
|
||||
// Output:
|
||||
// ISO-8859-7 <nil>
|
||||
// ISO_8859-7:1987 <nil>
|
||||
// ISOLatinGreek <nil>
|
||||
// IBM437 <nil>
|
||||
}
|
||||
192
vendor/golang.org/x/text/encoding/ianaindex/gen.go
generated
vendored
Normal file
|
|
@ -0,0 +1,192 @@
|
|||
// Copyright 2017 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
// +build ignore
|
||||
|
||||
package main
|
||||
|
||||
import (
|
||||
"encoding/xml"
|
||||
"fmt"
|
||||
"io"
|
||||
"log"
|
||||
"sort"
|
||||
"strconv"
|
||||
"strings"
|
||||
|
||||
"golang.org/x/text/encoding/internal/identifier"
|
||||
"golang.org/x/text/internal/gen"
|
||||
)
|
||||
|
||||
type registry struct {
|
||||
XMLName xml.Name `xml:"registry"`
|
||||
Updated string `xml:"updated"`
|
||||
Registry []struct {
|
||||
ID string `xml:"id,attr"`
|
||||
Record []struct {
|
||||
Name string `xml:"name"`
|
||||
Xref []struct {
|
||||
Type string `xml:"type,attr"`
|
||||
Data string `xml:"data,attr"`
|
||||
} `xml:"xref"`
|
||||
Desc struct {
|
||||
Data string `xml:",innerxml"`
|
||||
} `xml:"description,"`
|
||||
MIB string `xml:"value"`
|
||||
Alias []string `xml:"alias"`
|
||||
MIME string `xml:"preferred_alias"`
|
||||
} `xml:"record"`
|
||||
} `xml:"registry"`
|
||||
}
|
||||
|
||||
func main() {
|
||||
r := gen.OpenIANAFile("assignments/character-sets/character-sets.xml")
|
||||
reg := &registry{}
|
||||
if err := xml.NewDecoder(r).Decode(&reg); err != nil && err != io.EOF {
|
||||
log.Fatalf("Error decoding charset registry: %v", err)
|
||||
}
|
||||
if len(reg.Registry) == 0 {
|
||||
log.Fatal("No registry found in character-sets.xml")
|
||||
}
|
||||
if reg.Registry[0].ID != "character-sets-1" {
|
||||
log.Fatalf("Unexpected ID %s", reg.Registry[0].ID)
|
||||
}
|
||||
|
||||
x := &indexInfo{}
|
||||
|
||||
for _, rec := range reg.Registry[0].Record {
|
||||
mib := identifier.MIB(parseInt(rec.MIB))
|
||||
x.addEntry(mib, rec.Name)
|
||||
for _, a := range rec.Alias {
|
||||
a = strings.Split(a, " ")[0] // strip comments.
|
||||
x.addAlias(a, mib)
|
||||
// MIB name aliases are prefixed with a "cs" (character set) in the
|
||||
// registry to identify them as display names and to ensure that
|
||||
// the name starts with a lowercase letter in case it is used as
|
||||
// an identifier. We remove it to be left with a nice clean name.
|
||||
if strings.HasPrefix(a, "cs") {
|
||||
x.setName(2, a[2:])
|
||||
}
|
||||
}
|
||||
if rec.MIME != "" {
|
||||
x.addAlias(rec.MIME, mib)
|
||||
x.setName(1, rec.MIME)
|
||||
}
|
||||
}
|
||||
|
||||
w := gen.NewCodeWriter()
|
||||
|
||||
fmt.Fprintln(w, `import "golang.org/x/text/encoding/internal/identifier"`)
|
||||
|
||||
writeIndex(w, x)
|
||||
|
||||
w.WriteGoFile("tables.go", "ianaindex")
|
||||
}
|
||||
|
||||
type alias struct {
|
||||
name string
|
||||
mib identifier.MIB
|
||||
}
|
||||
|
||||
type indexInfo struct {
|
||||
// compacted index from code to MIB
|
||||
codeToMIB []identifier.MIB
|
||||
alias []alias
|
||||
names [][3]string
|
||||
}
|
||||
|
||||
func (ii *indexInfo) Len() int {
|
||||
return len(ii.codeToMIB)
|
||||
}
|
||||
|
||||
func (ii *indexInfo) Less(a, b int) bool {
|
||||
return ii.codeToMIB[a] < ii.codeToMIB[b]
|
||||
}
|
||||
|
||||
func (ii *indexInfo) Swap(a, b int) {
|
||||
ii.codeToMIB[a], ii.codeToMIB[b] = ii.codeToMIB[b], ii.codeToMIB[a]
|
||||
// Co-sort the names.
|
||||
ii.names[a], ii.names[b] = ii.names[b], ii.names[a]
|
||||
}
|
||||
|
||||
func (ii *indexInfo) setName(i int, name string) {
|
||||
ii.names[len(ii.names)-1][i] = name
|
||||
}
|
||||
|
||||
func (ii *indexInfo) addEntry(mib identifier.MIB, name string) {
|
||||
ii.names = append(ii.names, [3]string{name, name, name})
|
||||
ii.addAlias(name, mib)
|
||||
ii.codeToMIB = append(ii.codeToMIB, mib)
|
||||
}
|
||||
|
||||
func (ii *indexInfo) addAlias(name string, mib identifier.MIB) {
|
||||
// Don't add duplicates for the same mib. Adding duplicate aliases for
|
||||
// different MIBs will cause the compiler to barf on an invalid map: great!.
|
||||
for i := len(ii.alias) - 1; i >= 0 && ii.alias[i].mib == mib; i-- {
|
||||
if ii.alias[i].name == name {
|
||||
return
|
||||
}
|
||||
}
|
||||
ii.alias = append(ii.alias, alias{name, mib})
|
||||
lower := strings.ToLower(name)
|
||||
if lower != name {
|
||||
ii.addAlias(lower, mib)
|
||||
}
|
||||
}
|
||||
|
||||
const maxMIMENameLen = '0' - 1 // officially 40, but we leave some buffer.
|
||||
|
||||
func writeIndex(w *gen.CodeWriter, x *indexInfo) {
|
||||
sort.Stable(x)
|
||||
|
||||
// Write constants.
|
||||
fmt.Fprintln(w, "const (")
|
||||
for i, m := range x.codeToMIB {
|
||||
if i == 0 {
|
||||
fmt.Fprintf(w, "enc%d = iota\n", m)
|
||||
} else {
|
||||
fmt.Fprintf(w, "enc%d\n", m)
|
||||
}
|
||||
}
|
||||
fmt.Fprintln(w, "numIANA")
|
||||
fmt.Fprintln(w, ")")
|
||||
|
||||
w.WriteVar("ianaToMIB", x.codeToMIB)
|
||||
|
||||
var ianaNames, mibNames []string
|
||||
for _, names := range x.names {
|
||||
n := names[0]
|
||||
if names[0] != names[1] {
|
||||
// MIME names are mostly identical to IANA names. We share the
|
||||
// tables by setting the first byte of the string to an index into
|
||||
// the string itself (< maxMIMENameLen) to the IANA name. The MIME
|
||||
// name immediately follows the index.
|
||||
x := len(names[1]) + 1
|
||||
if x > maxMIMENameLen {
|
||||
log.Fatalf("MIME name length (%d) > %d", x, maxMIMENameLen)
|
||||
}
|
||||
n = string(rune(x)) + names[1] + names[0]
|
||||
}
|
||||
ianaNames = append(ianaNames, n)
|
||||
mibNames = append(mibNames, names[2])
|
||||
}
|
||||
|
||||
w.WriteVar("ianaNames", ianaNames)
|
||||
w.WriteVar("mibNames", mibNames)
|
||||
|
||||
w.WriteComment(`
|
||||
TODO: Instead of using a map, we could binary-search a sorted list of
|
||||
strings, lower-casing each character on the fly. This would avoid
|
||||
allocation entirely and be considerably more compact.`)
|
||||
fmt.Fprintln(w, "var ianaAliases = map[string]int{")
|
||||
for _, a := range x.alias {
|
||||
fmt.Fprintf(w, "%q: enc%d,\n", a.name, a.mib)
|
||||
}
|
||||
fmt.Fprintln(w, "}")
|
||||
}
|
||||
|
||||
func parseInt(s string) int {
|
||||
x, err := strconv.ParseInt(s, 10, 64)
|
||||
if err != nil {
|
||||
log.Fatalf("Could not parse integer: %v", err)
|
||||
}
|
||||
return int(x)
|
||||
}
|
||||
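Note on writeIndex above: when the MIME name differs from the IANA name, both are packed into one string whose first byte is the offset at which the IANA name starts. Because that offset is kept below '0', it cannot be confused with the first character of an ordinary name. The sketch below illustrates the packing and one plausible way to unpack it. The `pack`, `mimeName`, and `ianaName` helpers are written for this note and are assumptions; they are not copied from the package.

```go
package main

import "fmt"

// maxMIMENameLen matches the generator's constant: one less than '0' (47).
const maxMIMENameLen = '0' - 1

// pack (hypothetical) mirrors the generator's encoding. If the names are
// equal the name is stored as is. Otherwise the result is
// <offset byte> + MIME name + IANA name, where offset = len(mime) + 1.
func pack(iana, mime string) string {
	if iana == mime {
		return iana
	}
	off := len(mime) + 1
	if off > maxMIMENameLen {
		panic("MIME name too long to pack")
	}
	return string(rune(off)) + mime + iana
}

// mimeName returns the MIME name from a packed, non-empty entry.
func mimeName(n string) string {
	if n[0] <= maxMIMENameLen {
		return n[1:n[0]]
	}
	return n
}

// ianaName returns the IANA name from a packed, non-empty entry.
func ianaName(n string) string {
	if n[0] <= maxMIMENameLen {
		return n[n[0]:]
	}
	return n
}

func main() {
	packed := pack("ISO_8859-7:1987", "ISO-8859-7")
	fmt.Println(mimeName(packed), ianaName(packed))

	same := pack("UTF-8", "UTF-8")
	fmt.Println(mimeName(same), ianaName(same))
}
```

The scheme relies on every registered name starting with a byte of at least '0' (a digit or a letter), so a leading byte of 47 or less can only be an offset.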
209
vendor/golang.org/x/text/encoding/ianaindex/ianaindex.go
generated
vendored
Normal file
|
|
@ -0,0 +1,209 @@
|
|||
// Copyright 2015 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
//go:generate go run gen.go
|
||||
|
||||
// Package ianaindex maps names to Encodings as specified by the IANA registry.
|
||||
// This includes both the MIME and IANA names.
|
||||
//
|
||||
// See http://www.iana.org/assignments/character-sets/character-sets.xhtml for
|
||||
// more details.
|
||||
package ianaindex
|
||||
|
||||
import (
|
||||
"errors"
|
||||
"sort"
|
||||
"strings"
|
||||
|
||||
"golang.org/x/text/encoding"
|
||||
"golang.org/x/text/encoding/charmap"
|
||||
"golang.org/x/text/encoding/internal/identifier"
|
||||
"golang.org/x/text/encoding/japanese"
|
||||
"golang.org/x/text/encoding/korean"
|
||||
"golang.org/x/text/encoding/simplifiedchinese"
|
||||
"golang.org/x/text/encoding/traditionalchinese"
|
||||
"golang.org/x/text/encoding/unicode"
|
||||
)
|
||||
|
||||
// TODO: remove the "Status... incomplete" in the package doc comment.
|
||||
// TODO: allow users to specify their own aliases?
|
||||
// TODO: allow users to specify their own indexes?
|
||||
// TODO: allow canonicalizing names
|
||||
|
||||
// NOTE: only use these top-level variables if we can get the linker to drop
|
||||
// the indexes when they are not used. Make them a function or perhaps only
|
||||
// support MIME otherwise.
|
||||
|
||||
var (
|
||||
// MIME is an index to map MIME names.
|
||||
MIME *Index = mime
|
||||
|
||||
// IANA is an index that supports all names and aliases using IANA names as
|
||||
// the canonical identifier.
|
||||
IANA *Index = iana
|
||||
|
||||
// MIB is an index that associates the MIB display name with an Encoding.
|
||||
MIB *Index = mib
|
||||
|
||||
mime = &Index{mimeName, ianaToMIB, ianaAliases, encodings[:]}
|
||||
iana = &Index{ianaName, ianaToMIB, ianaAliases, encodings[:]}
|
||||
mib = &Index{mibName, ianaToMIB, ianaAliases, encodings[:]}
|
||||
)
|
||||
|
||||
// Index maps names registered by IANA to Encodings.
|
||||
// Currently different Indexes only differ in the names they return for
|
||||
// encodings. In the future they may also differ in supported aliases.
|
||||
type Index struct {
|
||||
names func(i int) string
|
||||
toMIB []identifier.MIB // Sorted slice of supported MIBs
|
||||
alias map[string]int
|
||||
enc []encoding.Encoding
|
||||
}
|
||||
|
||||
var (
|
||||
errInvalidName = errors.New("ianaindex: invalid encoding name")
|
||||
errUnknown = errors.New("ianaindex: unknown Encoding")
|
||||
errUnsupported = errors.New("ianaindex: unsupported Encoding")
|
||||
)
|
||||
|
||||
// Encoding returns an Encoding for IANA-registered names. Matching is
|
||||
// case-insensitive.
|
||||
func (x *Index) Encoding(name string) (encoding.Encoding, error) {
|
||||
name = strings.TrimSpace(name)
|
||||
// First try the name as given; lowercasing may allocate, so it is only done on a miss.
|
||||
i, ok := x.alias[name]
|
||||
if !ok {
|
||||
i, ok = x.alias[strings.ToLower(name)]
|
||||
if !ok {
|
||||
return nil, errInvalidName
|
||||
}
|
||||
}
|
||||
return x.enc[i], nil
|
||||
}
|
||||
|
||||
// Name reports the canonical name of the given Encoding. It will return an
|
||||
// error if e is not associated with a known encoding scheme.
|
||||
func (x *Index) Name(e encoding.Encoding) (string, error) {
|
||||
id, ok := e.(identifier.Interface)
|
||||
if !ok {
|
||||
return "", errUnknown
|
||||
}
|
||||
mib, _ := id.ID()
|
||||
if mib == 0 {
|
||||
return "", errUnknown
|
||||
}
|
||||
v := findMIB(x.toMIB, mib)
|
||||
if v == -1 {
|
||||
return "", errUnsupported
|
||||
}
|
||||
return x.names(v), nil
|
||||
}
|
||||
|
||||
// TODO: the coverage of this index is rather spotty. Allowing users to set
|
||||
// encodings would allow:
|
||||
// - users to increase coverage
|
||||
// - allow a partially loaded set of encodings in case the user doesn't need
|
||||
// them all.
|
||||
// - write an OS-specific wrapper for supported encodings and set them.
|
||||
// The exact definition of Set depends a bit on if and how we want to let users
|
||||
// write their own Encoding implementations. Also, it is not possible yet to
|
||||
// only partially load the encodings without doing some refactoring. Until this
|
||||
// is solved, we might as well not support Set.
|
||||
// // Set sets the e to be used for the encoding scheme identified by name. Only
|
||||
// // canonical names may be used. An empty name assigns e to its internally
|
||||
// // associated encoding scheme.
|
||||
// func (x *Index) Set(name string, e encoding.Encoding) error {
|
||||
// panic("TODO: implement")
|
||||
// }
|
||||
|
||||
func findMIB(x []identifier.MIB, mib identifier.MIB) int {
|
||||
i := sort.Search(len(x), func(i int) bool { return x[i] >= mib })
|
||||
if i < len(x) && x[i] == mib {
|
||||
return i
|
||||
}
|
||||
return -1
|
||||
}
|
||||
|
||||
const maxMIMENameLen = '0' - 1 // officially 40, but we leave some buffer.
|
||||
|
||||
func mimeName(x int) string {
|
||||
n := ianaNames[x]
|
||||
// See gen.go for a description of the encoding.
|
||||
if n[0] <= maxMIMENameLen {
|
||||
return n[1:n[0]]
|
||||
}
|
||||
return n
|
||||
}
|
||||
|
||||
func ianaName(x int) string {
|
||||
n := ianaNames[x]
|
||||
// See gen.go for a description of the encoding.
|
||||
if n[0] <= maxMIMENameLen {
|
||||
return n[n[0]:]
|
||||
}
|
||||
return n
|
||||
}
|
||||
|
||||
func mibName(x int) string {
|
||||
return mibNames[x]
|
||||
}
|
||||
|
||||
var encodings = [numIANA]encoding.Encoding{
|
||||
enc106: unicode.UTF8,
|
||||
enc1015: unicode.UTF16(unicode.BigEndian, unicode.UseBOM),
|
||||
enc1013: unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM),
|
||||
enc1014: unicode.UTF16(unicode.LittleEndian, unicode.IgnoreBOM),
|
||||
enc2028: charmap.CodePage037,
|
||||
enc2011: charmap.CodePage437,
|
||||
enc2009: charmap.CodePage850,
|
||||
enc2010: charmap.CodePage852,
|
||||
enc2046: charmap.CodePage855,
|
||||
enc2089: charmap.CodePage858,
|
||||
enc2048: charmap.CodePage860,
|
||||
enc2013: charmap.CodePage862,
|
||||
enc2050: charmap.CodePage863,
|
||||
enc2052: charmap.CodePage865,
|
||||
enc2086: charmap.CodePage866,
|
||||
enc2102: charmap.CodePage1047,
|
||||
enc2091: charmap.CodePage1140,
|
||||
enc4: charmap.ISO8859_1,
|
||||
enc5: charmap.ISO8859_2,
|
||||
enc6: charmap.ISO8859_3,
|
||||
enc7: charmap.ISO8859_4,
|
||||
enc8: charmap.ISO8859_5,
|
||||
enc9: charmap.ISO8859_6,
|
||||
enc81: charmap.ISO8859_6E,
|
||||
enc82: charmap.ISO8859_6I,
|
||||
enc10: charmap.ISO8859_7,
|
||||
enc11: charmap.ISO8859_8,
|
||||
enc84: charmap.ISO8859_8E,
|
||||
enc85: charmap.ISO8859_8I,
|
||||
enc12: charmap.ISO8859_9,
|
||||
enc13: charmap.ISO8859_10,
|
||||
enc109: charmap.ISO8859_13,
|
||||
enc110: charmap.ISO8859_14,
|
||||
enc111: charmap.ISO8859_15,
|
||||
enc112: charmap.ISO8859_16,
|
||||
enc2084: charmap.KOI8R,
|
||||
enc2088: charmap.KOI8U,
|
||||
enc2027: charmap.Macintosh,
|
||||
enc2109: charmap.Windows874,
|
||||
enc2250: charmap.Windows1250,
|
||||
enc2251: charmap.Windows1251,
|
||||
enc2252: charmap.Windows1252,
|
||||
enc2253: charmap.Windows1253,
|
||||
enc2254: charmap.Windows1254,
|
||||
enc2255: charmap.Windows1255,
|
||||
enc2256: charmap.Windows1256,
|
||||
enc2257: charmap.Windows1257,
|
||||
enc2258: charmap.Windows1258,
|
||||
enc18: japanese.EUCJP,
|
||||
enc39: japanese.ISO2022JP,
|
||||
enc17: japanese.ShiftJIS,
|
||||
enc38: korean.EUCKR,
|
||||
enc114: simplifiedchinese.GB18030,
|
||||
enc113: simplifiedchinese.GBK,
|
||||
enc2085: simplifiedchinese.HZGB2312,
|
||||
enc2026: traditionalchinese.Big5,
|
||||
}
|
||||
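The packed name layout used by mimeName and ianaName above can be illustrated in isolation. The following is a minimal sketch, not the generator itself: pack is a hypothetical helper that mimics what gen.go appears to do when the MIME name differs from the IANA name (the exact condition in gen.go is an assumption), and the sample names are taken from the test cases in this commit.

```go
package main

import "fmt"

// maxMIMENameLen mirrors the constant in ianaindex.go: a first byte at or
// below this value is read as an offset rather than as part of a name.
const maxMIMENameLen = '0' - 1

// pack is a hypothetical helper: it prefixes the MIME name with a one-byte
// offset that points at the start of the IANA name.
func pack(mime, iana string) string {
	if mime == "" || mime == iana {
		return iana // a single shared name needs no offset byte
	}
	x := len(mime) + 1
	if x > maxMIMENameLen {
		panic("MIME name too long to pack")
	}
	return string(rune(x)) + mime + iana
}

// mimeName returns the MIME part of a packed entry.
func mimeName(n string) string {
	if n[0] <= maxMIMENameLen {
		return n[1:n[0]]
	}
	return n
}

// ianaName returns the IANA part of a packed entry.
func ianaName(n string) string {
	if n[0] <= maxMIMENameLen {
		return n[n[0]:]
	}
	return n
}

func main() {
	p := pack("ISO-8859-9", "ISO_8859-9:1989")
	fmt.Println(mimeName(p), ianaName(p))
}
```

The scheme relies on registered names never starting with a byte below '0', which is why the first byte can double as a marker.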
192
vendor/golang.org/x/text/encoding/ianaindex/ianaindex_test.go
generated
vendored
Normal file
|
|
@ -0,0 +1,192 @@
|
|||
// Copyright 2017 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package ianaindex
|
||||
|
||||
import (
|
||||
"testing"
|
||||
|
||||
"golang.org/x/text/encoding"
|
||||
"golang.org/x/text/encoding/charmap"
|
||||
"golang.org/x/text/encoding/internal/identifier"
|
||||
"golang.org/x/text/encoding/japanese"
|
||||
"golang.org/x/text/encoding/korean"
|
||||
"golang.org/x/text/encoding/simplifiedchinese"
|
||||
"golang.org/x/text/encoding/traditionalchinese"
|
||||
"golang.org/x/text/encoding/unicode"
|
||||
)
|
||||
|
||||
var All = [][]encoding.Encoding{
|
||||
unicode.All,
|
||||
charmap.All,
|
||||
japanese.All,
|
||||
korean.All,
|
||||
simplifiedchinese.All,
|
||||
traditionalchinese.All,
|
||||
}
|
||||
|
||||
// TestAllIANA tests whether an Encoding supported in x/text is defined by IANA but
|
||||
// not supported by this package.
|
||||
func TestAllIANA(t *testing.T) {
|
||||
for _, ea := range All {
|
||||
for _, e := range ea {
|
||||
mib, _ := e.(identifier.Interface).ID()
|
||||
if x := findMIB(ianaToMIB, mib); x != -1 && encodings[x] == nil {
|
||||
t.Errorf("supported MIB %v (%v) not in index", mib, e)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// TestNotSupported reports the encodings registered with IANA but not supported by x/text.
|
||||
func TestNotSupported(t *testing.T) {
|
||||
mibs := map[identifier.MIB]bool{}
|
||||
for _, ea := range All {
|
||||
for _, e := range ea {
|
||||
mib, _ := e.(identifier.Interface).ID()
|
||||
mibs[mib] = true
|
||||
}
|
||||
}
|
||||
|
||||
// Many encodings in the IANA index will likely not be supported by the
|
||||
// Go encodings. That is fine.
|
||||
// TODO: consider whether we should add this test.
|
||||
// for code, mib := range ianaToMIB {
|
||||
// t.Run(fmt.Sprint("IANA:", mib), func(t *testing.T) {
|
||||
// if !mibs[mib] {
|
||||
// t.Skipf("IANA encoding %s (MIB %v) not supported",
|
||||
// ianaNames[code], mib)
|
||||
// }
|
||||
// })
|
||||
// }
|
||||
}
|
||||
|
||||
func TestEncoding(t *testing.T) {
|
||||
testCases := []struct {
|
||||
index *Index
|
||||
name string
|
||||
canonical string
|
||||
err error
|
||||
}{
|
||||
{MIME, "utf-8", "UTF-8", nil},
|
||||
{MIME, " utf-8 ", "UTF-8", nil},
|
||||
{MIME, " l5 ", "ISO-8859-9", nil},
|
||||
{MIME, "latin5 ", "ISO-8859-9", nil},
|
||||
{MIME, "LATIN5 ", "ISO-8859-9", nil},
|
||||
{MIME, "latin 5", "", errInvalidName},
|
||||
{MIME, "latin-5", "", errInvalidName},
|
||||
|
||||
{IANA, "utf-8", "UTF-8", nil},
|
||||
{IANA, " utf-8 ", "UTF-8", nil},
|
||||
{IANA, " l5 ", "ISO_8859-9:1989", nil},
|
||||
{IANA, "latin5 ", "ISO_8859-9:1989", nil},
|
||||
{IANA, "LATIN5 ", "ISO_8859-9:1989", nil},
|
||||
{IANA, "latin 5", "", errInvalidName},
|
||||
{IANA, "latin-5", "", errInvalidName},
|
||||
|
||||
{MIB, "utf-8", "UTF8", nil},
|
||||
{MIB, " utf-8 ", "UTF8", nil},
|
||||
{MIB, " l5 ", "ISOLatin5", nil},
|
||||
{MIB, "latin5 ", "ISOLatin5", nil},
|
||||
{MIB, "LATIN5 ", "ISOLatin5", nil},
|
||||
{MIB, "latin 5", "", errInvalidName},
|
||||
{MIB, "latin-5", "", errInvalidName},
|
||||
}
|
||||
for i, tc := range testCases {
|
||||
enc, err := tc.index.Encoding(tc.name)
|
||||
if err != tc.err {
|
||||
t.Errorf("%d: error was %v; want %v", i, err, tc.err)
|
||||
}
|
||||
if err != nil {
|
||||
continue
|
||||
}
|
||||
if got, err := tc.index.Name(enc); got != tc.canonical {
|
||||
t.Errorf("%d: Name(Encoding(%q)) = %q; want %q (%v)", i, tc.name, got, tc.canonical, err)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestTables(t *testing.T) {
|
||||
for i, x := range []*Index{MIME, IANA} {
|
||||
for name, index := range x.alias {
|
||||
got, err := x.Encoding(name)
|
||||
if err != nil {
|
||||
t.Errorf("%d%s:err: unexpected error %v", i, name, err)
|
||||
}
|
||||
if want := x.enc[index]; got != want {
|
||||
t.Errorf("%d%s:encoding: got %v; want %v", i, name, got, want)
|
||||
}
|
||||
if got != nil {
|
||||
mib, _ := got.(identifier.Interface).ID()
|
||||
if i := findMIB(x.toMIB, mib); i != index {
|
||||
t.Errorf("%d%s:mib: got %d; want %d", i, name, i, index)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
type unsupported struct {
|
||||
encoding.Encoding
|
||||
}
|
||||
|
||||
func (unsupported) ID() (identifier.MIB, string) { return 9999, "" }
|
||||
|
||||
func TestName(t *testing.T) {
|
||||
testCases := []struct {
|
||||
desc string
|
||||
enc encoding.Encoding
|
||||
f func(e encoding.Encoding) (string, error)
|
||||
name string
|
||||
err error
|
||||
}{{
|
||||
"defined encoding",
|
||||
charmap.ISO8859_2,
|
||||
MIME.Name,
|
||||
"ISO-8859-2",
|
||||
nil,
|
||||
}, {
|
||||
"defined Unicode encoding",
|
||||
unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM),
|
||||
IANA.Name,
|
||||
"UTF-16BE",
|
||||
nil,
|
||||
}, {
|
||||
"another defined Unicode encoding",
|
||||
unicode.UTF16(unicode.BigEndian, unicode.UseBOM),
|
||||
MIME.Name,
|
||||
"UTF-16",
|
||||
nil,
|
||||
}, {
|
||||
"unknown Unicode encoding",
|
||||
unicode.UTF16(unicode.BigEndian, unicode.ExpectBOM),
|
||||
MIME.Name,
|
||||
"",
|
||||
errUnknown,
|
||||
}, {
|
||||
"undefined encoding",
|
||||
unsupported{},
|
||||
MIME.Name,
|
||||
"",
|
||||
errUnsupported,
|
||||
}, {
|
||||
"undefined other encoding in HTML standard",
|
||||
charmap.CodePage437,
|
||||
IANA.Name,
|
||||
"IBM437",
|
||||
nil,
|
||||
}, {
|
||||
"unknown encoding",
|
||||
encoding.Nop,
|
||||
IANA.Name,
|
||||
"",
|
||||
errUnknown,
|
||||
}}
|
||||
for i, tc := range testCases {
|
||||
name, err := tc.f(tc.enc)
|
||||
if name != tc.name || err != tc.err {
|
||||
t.Errorf("%d:%s: got %q, %v; want %q, %v", i, tc.desc, name, err, tc.name, tc.err)
|
||||
}
|
||||
}
|
||||
}
|
||||
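The TestEncoding cases above exercise the lookup rules of Index.Encoding: surrounding whitespace is trimmed, matching is case-insensitive, and inner spaces or extra hyphens are rejected. A minimal sketch of that two-step lookup follows. The aliases map is a tiny hypothetical stand-in for the generated ianaAliases table, and lookup returns a table index rather than an Encoding.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var errInvalidName = errors.New("invalid encoding name")

// aliases is a hypothetical, hand-written subset of an alias table.
// Keys are stored in lower case.
var aliases = map[string]int{"utf-8": 0, "l5": 1, "latin5": 1}

// lookup trims the name, tries it as given, and lowercases only on a miss.
func lookup(name string) (int, error) {
	name = strings.TrimSpace(name)
	if i, ok := aliases[name]; ok {
		return i, nil
	}
	if i, ok := aliases[strings.ToLower(name)]; ok {
		return i, nil
	}
	return 0, errInvalidName
}

func main() {
	for _, n := range []string{" utf-8 ", "LATIN5 ", "latin 5"} {
		i, err := lookup(n)
		fmt.Printf("%q: index %d, err %v\n", n, i, err)
	}
}
```

Trying the exact key first keeps the common case, an already lower-case name, free of the copy that strings.ToLower may make.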
2348
vendor/golang.org/x/text/encoding/ianaindex/tables.go
generated
vendored
Normal file
File diff suppressed because it is too large
180
vendor/golang.org/x/text/encoding/internal/enctest/enctest.go
generated
vendored
Normal file
|
|
@ -0,0 +1,180 @@
|
|||
// Copyright 2017 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package enctest
|
||||
|
||||
import (
|
||||
"bytes"
|
||||
"fmt"
|
||||
"io"
|
||||
"io/ioutil"
|
||||
"strings"
|
||||
"testing"
|
||||
|
||||
"golang.org/x/text/encoding"
|
||||
"golang.org/x/text/encoding/internal/identifier"
|
||||
"golang.org/x/text/transform"
|
||||
)
|
||||
|
||||
// Transcoder is satisfied by both *encoding.Encoder and *encoding.Decoder.
|
||||
type Transcoder interface {
|
||||
transform.Transformer
|
||||
Bytes([]byte) ([]byte, error)
|
||||
String(string) (string, error)
|
||||
}
|
||||
|
||||
func TestEncoding(t *testing.T, e encoding.Encoding, encoded, utf8, prefix, suffix string) {
|
||||
for _, direction := range []string{"Decode", "Encode"} {
|
||||
t.Run(fmt.Sprintf("%v/%s", e, direction), func(t *testing.T) {
|
||||
|
||||
var coder Transcoder
|
||||
var want, src, wPrefix, sPrefix, wSuffix, sSuffix string
|
||||
if direction == "Decode" {
|
||||
coder, want, src = e.NewDecoder(), utf8, encoded
|
||||
wPrefix, sPrefix, wSuffix, sSuffix = "", prefix, "", suffix
|
||||
} else {
|
||||
coder, want, src = e.NewEncoder(), encoded, utf8
|
||||
wPrefix, sPrefix, wSuffix, sSuffix = prefix, "", suffix, ""
|
||||
}
|
||||
|
||||
dst := make([]byte, len(wPrefix)+len(want)+len(wSuffix))
|
||||
nDst, nSrc, err := coder.Transform(dst, []byte(sPrefix+src+sSuffix), true)
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
if nDst != len(wPrefix)+len(want)+len(wSuffix) {
|
||||
t.Fatalf("nDst got %d, want %d",
|
||||
nDst, len(wPrefix)+len(want)+len(wSuffix))
|
||||
}
|
||||
if nSrc != len(sPrefix)+len(src)+len(sSuffix) {
|
||||
t.Fatalf("nSrc got %d, want %d",
|
||||
nSrc, len(sPrefix)+len(src)+len(sSuffix))
|
||||
}
|
||||
if got := string(dst); got != wPrefix+want+wSuffix {
|
||||
t.Fatalf("\ngot %q\nwant %q", got, wPrefix+want+wSuffix)
|
||||
}
|
||||
|
||||
for _, n := range []int{0, 1, 2, 10, 123, 4567} {
|
||||
input := sPrefix + strings.Repeat(src, n) + sSuffix
|
||||
g, err := coder.String(input)
|
||||
if err != nil {
|
||||
t.Fatalf("String: n=%d: %v", n, err)
|
||||
}
|
||||
if len(g) == 0 && len(input) == 0 {
|
||||
// If the input is empty then the output can be empty,
|
||||
// regardless of whatever wPrefix is.
|
||||
continue
|
||||
}
|
||||
got1, want1 := string(g), wPrefix+strings.Repeat(want, n)+wSuffix
|
||||
if got1 != want1 {
|
||||
t.Fatalf("String: n=%d\ngot %q\nwant %q",
|
||||
n, trim(got1), trim(want1))
|
||||
}
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
func TestFile(t *testing.T, e encoding.Encoding) {
|
||||
for _, dir := range []string{"Decode", "Encode"} {
|
||||
t.Run(fmt.Sprintf("%s/%s", e, dir), func(t *testing.T) {
|
||||
dst, src, transformer, err := load(dir, e)
|
||||
if err != nil {
|
||||
t.Fatalf("load: %v", err)
|
||||
}
|
||||
buf, err := transformer.Bytes(src)
|
||||
if err != nil {
|
||||
t.Fatalf("transform: %v", err)
|
||||
}
|
||||
if !bytes.Equal(buf, dst) {
|
||||
t.Error("transformed bytes did not match golden file")
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
func Benchmark(b *testing.B, enc encoding.Encoding) {
|
||||
for _, direction := range []string{"Decode", "Encode"} {
|
||||
b.Run(fmt.Sprintf("%s/%s", enc, direction), func(b *testing.B) {
|
||||
_, src, transformer, err := load(direction, enc)
|
||||
if err != nil {
|
||||
b.Fatal(err)
|
||||
}
|
||||
b.SetBytes(int64(len(src)))
|
||||
b.ResetTimer()
|
||||
for i := 0; i < b.N; i++ {
|
||||
r := transform.NewReader(bytes.NewReader(src), transformer)
|
||||
io.Copy(ioutil.Discard, r)
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
// testdataFiles are files in testdata/*.txt.
|
||||
var testdataFiles = []struct {
|
||||
mib identifier.MIB
|
||||
basename, ext string
|
||||
}{
|
||||
{identifier.Windows1252, "candide", "windows-1252"},
|
||||
{identifier.EUCPkdFmtJapanese, "rashomon", "euc-jp"},
|
||||
{identifier.ISO2022JP, "rashomon", "iso-2022-jp"},
|
||||
{identifier.ShiftJIS, "rashomon", "shift-jis"},
|
||||
{identifier.EUCKR, "unsu-joh-eun-nal", "euc-kr"},
|
||||
{identifier.GBK, "sunzi-bingfa-simplified", "gbk"},
|
||||
{identifier.HZGB2312, "sunzi-bingfa-gb-levels-1-and-2", "hz-gb2312"},
|
||||
{identifier.Big5, "sunzi-bingfa-traditional", "big5"},
|
||||
{identifier.UTF16LE, "candide", "utf-16le"},
|
||||
{identifier.UTF8, "candide", "utf-8"},
|
||||
{identifier.UTF32BE, "candide", "utf-32be"},
|
||||
|
||||
// GB18030 is a superset of GBK and is nominally a Simplified Chinese
|
||||
// encoding, but it can also represent the entire Basic Multilingual
|
||||
// Plane, including codepoints like 'â' that aren't encodable by GBK.
|
||||
// GB18030 on Simplified Chinese should perform similarly to GBK on
|
||||
// Simplified Chinese. GB18030 on "candide" is more interesting.
|
||||
{identifier.GB18030, "candide", "gb18030"},
|
||||
}
|
||||
|
||||
func load(direction string, enc encoding.Encoding) ([]byte, []byte, Transcoder, error) {
|
||||
basename, ext, count := "", "", 0
|
||||
for _, tf := range testdataFiles {
|
||||
if mib, _ := enc.(identifier.Interface).ID(); tf.mib == mib {
|
||||
basename, ext = tf.basename, tf.ext
|
||||
count++
|
||||
}
|
||||
}
|
||||
if count != 1 {
|
||||
if count == 0 {
|
||||
return nil, nil, nil, fmt.Errorf("no testdataFiles for %s", enc)
|
||||
}
|
||||
return nil, nil, nil, fmt.Errorf("too many testdataFiles for %s", enc)
|
||||
}
|
||||
dstFile := fmt.Sprintf("../testdata/%s-%s.txt", basename, ext)
|
||||
srcFile := fmt.Sprintf("../testdata/%s-utf-8.txt", basename)
|
||||
var coder Transcoder = encoding.ReplaceUnsupported(enc.NewEncoder())
|
||||
if direction == "Decode" {
|
||||
dstFile, srcFile = srcFile, dstFile
|
||||
coder = enc.NewDecoder()
|
||||
}
|
||||
dst, err := ioutil.ReadFile(dstFile)
|
||||
if err != nil {
|
||||
if dst, err = ioutil.ReadFile("../" + dstFile); err != nil {
|
||||
return nil, nil, nil, err
|
||||
}
|
||||
}
|
||||
src, err := ioutil.ReadFile(srcFile)
|
||||
if err != nil {
|
||||
if src, err = ioutil.ReadFile("../" + srcFile); err != nil {
|
||||
return nil, nil, nil, err
|
||||
}
|
||||
}
|
||||
return dst, src, coder, nil
|
||||
}
|
||||
|
||||
func trim(s string) string {
|
||||
if len(s) < 120 {
|
||||
return s
|
||||
}
|
||||
return s[:50] + "..." + s[len(s)-50:]
|
||||
}
|
||||
137
vendor/golang.org/x/text/encoding/internal/identifier/gen.go
generated
vendored
Normal file
|
|
@ -0,0 +1,137 @@
|
|||
// Copyright 2015 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
// +build ignore
|
||||
|
||||
package main
|
||||
|
||||
import (
|
||||
"bytes"
|
||||
"encoding/xml"
|
||||
"fmt"
|
||||
"io"
|
||||
"log"
|
||||
"strings"
|
||||
|
||||
"golang.org/x/text/internal/gen"
|
||||
)
|
||||
|
||||
type registry struct {
|
||||
XMLName xml.Name `xml:"registry"`
|
||||
Updated string `xml:"updated"`
|
||||
Registry []struct {
|
||||
ID string `xml:"id,attr"`
|
||||
Record []struct {
|
||||
Name string `xml:"name"`
|
||||
Xref []struct {
|
||||
Type string `xml:"type,attr"`
|
||||
Data string `xml:"data,attr"`
|
||||
} `xml:"xref"`
|
||||
Desc struct {
|
||||
Data string `xml:",innerxml"`
|
||||
// Any []struct {
|
||||
// Data string `xml:",chardata"`
|
||||
// } `xml:",any"`
|
||||
// Data string `xml:",chardata"`
|
||||
} `xml:"description,"`
|
||||
MIB string `xml:"value"`
|
||||
Alias []string `xml:"alias"`
|
||||
MIME string `xml:"preferred_alias"`
|
||||
} `xml:"record"`
|
||||
} `xml:"registry"`
|
||||
}
|
||||
|
||||
func main() {
|
||||
r := gen.OpenIANAFile("assignments/character-sets/character-sets.xml")
|
||||
reg := &registry{}
|
||||
if err := xml.NewDecoder(r).Decode(&reg); err != nil && err != io.EOF {
|
||||
log.Fatalf("Error decoding charset registry: %v", err)
|
||||
}
|
||||
if len(reg.Registry) == 0 {
log.Fatal("No registry found in charset data")
}
if reg.Registry[0].ID != "character-sets-1" {
log.Fatalf("Unexpected ID %s", reg.Registry[0].ID)
}
|
||||
|
||||
w := &bytes.Buffer{}
|
||||
fmt.Fprintf(w, "const (\n")
|
||||
for _, rec := range reg.Registry[0].Record {
|
||||
constName := ""
|
||||
for _, a := range rec.Alias {
|
||||
if strings.HasPrefix(a, "cs") && strings.IndexByte(a, '-') == -1 {
|
||||
// Some of the constant definitions have comments in them. Strip those.
|
||||
constName = strings.Title(strings.SplitN(a[2:], "\n", 2)[0])
|
||||
}
|
||||
}
|
||||
if constName == "" {
|
||||
switch rec.MIB {
|
||||
case "2085":
|
||||
constName = "HZGB2312" // Not listed as alias for some reason.
|
||||
default:
|
||||
log.Fatalf("No cs alias defined for %s.", rec.MIB)
|
||||
}
|
||||
}
|
||||
if rec.MIME != "" {
|
||||
rec.MIME = fmt.Sprintf(" (MIME: %s)", rec.MIME)
|
||||
}
|
||||
fmt.Fprintf(w, "// %s is the MIB identifier with IANA name %s%s.\n//\n", constName, rec.Name, rec.MIME)
|
||||
if len(rec.Desc.Data) > 0 {
|
||||
fmt.Fprint(w, "// ")
|
||||
d := xml.NewDecoder(strings.NewReader(rec.Desc.Data))
|
||||
inElem := true
|
||||
attr := ""
|
||||
for {
|
||||
t, err := d.Token()
|
||||
if err != nil {
|
||||
if err != io.EOF {
|
||||
log.Fatal(err)
|
||||
}
|
||||
break
|
||||
}
|
||||
switch x := t.(type) {
|
||||
case xml.CharData:
|
||||
attr = "" // Don't need attribute info.
|
||||
a := bytes.Split([]byte(x), []byte("\n"))
|
||||
for i, b := range a {
|
||||
if b = bytes.TrimSpace(b); len(b) != 0 {
|
||||
if !inElem && i > 0 {
|
||||
fmt.Fprint(w, "\n// ")
|
||||
}
|
||||
inElem = false
|
||||
fmt.Fprintf(w, "%s ", string(b))
|
||||
}
|
||||
}
|
||||
case xml.StartElement:
|
||||
if x.Name.Local == "xref" {
|
||||
inElem = true
|
||||
use := false
|
||||
for _, a := range x.Attr {
|
||||
if a.Name.Local == "type" {
|
||||
use = use || a.Value != "person"
|
||||
}
|
||||
if a.Name.Local == "data" && use {
|
||||
attr = a.Value + " "
|
||||
}
|
||||
}
|
||||
}
|
||||
case xml.EndElement:
|
||||
inElem = false
|
||||
fmt.Fprint(w, attr)
|
||||
}
|
||||
}
|
||||
fmt.Fprint(w, "\n")
|
||||
}
|
||||
for _, x := range rec.Xref {
|
||||
switch x.Type {
|
||||
case "rfc":
|
||||
fmt.Fprintf(w, "// Reference: %s\n", strings.ToUpper(x.Data))
|
||||
case "uri":
|
||||
fmt.Fprintf(w, "// Reference: %s\n", x.Data)
|
||||
}
|
||||
}
|
||||
fmt.Fprintf(w, "%s MIB = %s\n", constName, rec.MIB)
|
||||
fmt.Fprintln(w)
|
||||
}
|
||||
fmt.Fprintln(w, ")")
|
||||
|
||||
gen.WriteGoFile("mib.go", "identifier", w.Bytes())
|
||||
}
|
||||
81
vendor/golang.org/x/text/encoding/internal/identifier/identifier.go
generated
vendored
Normal file
|
|
@ -0,0 +1,81 @@
|
|||
// Copyright 2015 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
//go:generate go run gen.go
|
||||
|
||||
// Package identifier defines the contract between implementations of Encoding
|
||||
// and Index by defining identifiers that uniquely identify standardized coded
|
||||
// character sets (CCS) and character encoding schemes (CES), which we will
|
||||
// together refer to as encodings, for which Encoding implementations provide
|
||||
// converters to and from UTF-8. This package is typically only of concern to
|
||||
// implementers of Indexes and Encodings.
|
||||
//
|
||||
// One part of the identifier is the MIB code, which is defined by IANA and
|
||||
// uniquely identifies a CCS or CES. Each code is associated with data that
|
||||
// references authorities, official documentation as well as aliases and MIME
|
||||
// names.
|
||||
//
|
||||
// Not all CESs are covered by the IANA registry. The "other" string that is
|
||||
// returned by ID can be used to identify other character sets or versions of
|
||||
// existing ones.
|
||||
//
|
||||
// It is recommended that each package that provides a set of Encodings provide
|
||||
// the All and Common variables to reference all supported encodings and
|
||||
// a commonly used subset. This allows Index implementations to include all
|
||||
// available encodings without explicitly referencing or knowing about them.
|
||||
package identifier
|
||||
|
||||
// Note: this package is internal, but could be made public if there is a need
|
||||
// for writing third-party Indexes and Encodings.
|
||||
|
||||
// References:
|
||||
// - http://source.icu-project.org/repos/icu/icu/trunk/source/data/mappings/convrtrs.txt
|
||||
// - http://www.iana.org/assignments/character-sets/character-sets.xhtml
|
||||
// - http://www.iana.org/assignments/ianacharset-mib/ianacharset-mib
|
||||
// - http://www.ietf.org/rfc/rfc2978.txt
|
||||
// - http://www.unicode.org/reports/tr22/
|
||||
// - http://www.w3.org/TR/encoding/
|
||||
// - https://encoding.spec.whatwg.org/
|
||||
// - https://encoding.spec.whatwg.org/encodings.json
|
||||
// - https://tools.ietf.org/html/rfc6657#section-5
|
||||
|
||||
// Interface can be implemented by Encodings to define the CCS or CES for which
|
||||
// it implements conversions.
|
||||
type Interface interface {
|
||||
// ID returns an encoding identifier. Exactly one of the mib and other
|
||||
// values should be non-zero.
|
||||
//
|
||||
// In the usual case it is only necessary to indicate the MIB code. The
|
||||
// other string can be used to specify encodings for which there is no MIB,
|
||||
// such as "x-mac-dingbat".
|
||||
//
|
||||
// The other string may only contain the characters a-z, A-Z, 0-9, - and _.
|
||||
ID() (mib MIB, other string)
|
||||
|
||||
// NOTE: the restrictions on the encoding are to allow extending the syntax
|
||||
// with additional information such as versions, vendors and other variants.
|
||||
}
|
||||
|
||||
// A MIB identifies an encoding. It is derived from the IANA MIB codes and adds
|
||||
// some identifiers for some encodings that are not covered by the IANA
|
||||
// standard.
|
||||
//
|
||||
// See http://www.iana.org/assignments/ianacharset-mib.
|
||||
type MIB uint16
|
||||
|
||||
// These additional MIB types are not defined in IANA. They are added because
|
||||
// they are common and defined within the text repo.
|
||||
const (
|
||||
// Unofficial marks the start of encodings not registered by IANA.
|
||||
Unofficial MIB = 10000 + iota
|
||||
|
||||
// Replacement is the WhatWG replacement encoding.
|
||||
Replacement
|
||||
|
||||
// XUserDefined is the code for x-user-defined.
|
||||
XUserDefined
|
||||
|
||||
// MacintoshCyrillic is the code for x-mac-cyrillic.
|
||||
MacintoshCyrillic
|
||||
)
|
||||
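The contract described in identifier.go (exactly one of the MIB code and the "other" string is set) can be sketched without the real package. In the example below the MIB type and the unofficial constants are copied from the file above, while identified, cyrillic, dingbat, and describe are hypothetical names introduced only for illustration.

```go
package main

import "fmt"

// MIB mirrors identifier.MIB.
type MIB uint16

// The unofficial codes start at 10000 and increase by one via iota.
const (
	Unofficial MIB = 10000 + iota
	Replacement
	XUserDefined
	MacintoshCyrillic
)

// identified mirrors the shape of identifier.Interface.
type identified interface {
	ID() (mib MIB, other string)
}

// cyrillic is a hypothetical encoding identified by an unofficial MIB code.
type cyrillic struct{}

func (cyrillic) ID() (MIB, string) { return MacintoshCyrillic, "" }

// dingbat is a hypothetical encoding with no MIB code at all.
type dingbat struct{}

func (dingbat) ID() (MIB, string) { return 0, "x-mac-dingbat" }

// describe reports whichever half of the identifier is set.
func describe(e identified) string {
	mib, other := e.ID()
	if mib != 0 {
		return fmt.Sprintf("MIB %d", mib)
	}
	return "other " + other
}

func main() {
	fmt.Println(describe(cyrillic{}))
	fmt.Println(describe(dingbat{}))
}
```

An Index can then key on the MIB when it is non-zero and fall back to the string otherwise, which is the check Index.Name performs before calling findMIB.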
1621
vendor/golang.org/x/text/encoding/internal/identifier/mib.go
generated
vendored
Normal file
File diff suppressed because it is too large
75
vendor/golang.org/x/text/encoding/internal/internal.go
generated
vendored
Normal file
|
|
@ -0,0 +1,75 @@
|
|||
// Copyright 2015 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
// Package internal contains code that is shared among encoding implementations.
|
||||
package internal
|
||||
|
||||
import (
|
||||
"golang.org/x/text/encoding"
|
||||
"golang.org/x/text/encoding/internal/identifier"
|
||||
"golang.org/x/text/transform"
|
||||
)
|
||||
|
||||
// Encoding is an implementation of the Encoding interface that adds the String
|
||||
// and ID methods to an existing encoding.
|
||||
type Encoding struct {
|
||||
encoding.Encoding
|
||||
Name string
|
||||
MIB identifier.MIB
|
||||
}
|
||||
|
||||
// _ verifies that Encoding implements identifier.Interface.
|
||||
var _ identifier.Interface = (*Encoding)(nil)
|
||||
|
||||
func (e *Encoding) String() string {
|
||||
return e.Name
|
||||
}
|
||||
|
||||
func (e *Encoding) ID() (mib identifier.MIB, other string) {
|
||||
return e.MIB, ""
|
||||
}
|
||||
|
||||
// SimpleEncoding is an Encoding that combines two Transformers.
|
||||
type SimpleEncoding struct {
|
||||
Decoder transform.Transformer
|
||||
Encoder transform.Transformer
|
||||
}
|
||||
|
||||
func (e *SimpleEncoding) NewDecoder() *encoding.Decoder {
|
||||
return &encoding.Decoder{Transformer: e.Decoder}
|
||||
}
|
||||
|
||||
func (e *SimpleEncoding) NewEncoder() *encoding.Encoder {
|
||||
return &encoding.Encoder{Transformer: e.Encoder}
|
||||
}
|
||||
|
||||
// FuncEncoding is an Encoding that combines two functions returning a new
|
||||
// Transformer.
|
||||
type FuncEncoding struct {
|
||||
Decoder func() transform.Transformer
|
||||
Encoder func() transform.Transformer
|
||||
}
|
||||
|
||||
func (e FuncEncoding) NewDecoder() *encoding.Decoder {
|
||||
return &encoding.Decoder{Transformer: e.Decoder()}
|
||||
}
|
||||
|
||||
func (e FuncEncoding) NewEncoder() *encoding.Encoder {
|
||||
return &encoding.Encoder{Transformer: e.Encoder()}
|
||||
}
|
||||
|
||||
// A RepertoireError indicates a rune is not in the repertoire of a destination
|
||||
// encoding. It is associated with an encoding-specific suggested replacement
|
||||
// byte.
|
||||
type RepertoireError byte
|
||||
|
||||
// Error implements the error interface.
|
||||
func (r RepertoireError) Error() string {
|
||||
return "encoding: rune not supported by encoding."
|
||||
}
|
||||
|
||||
// Replacement returns the replacement byte associated with this error.
|
||||
func (r RepertoireError) Replacement() byte { return byte(r) }
|
||||
|
||||
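// ErrASCIIReplacement is a RepertoireError that suggests encoding.ASCIISub as the replacement byte.
var ErrASCIIReplacement = RepertoireError(encoding.ASCIISub)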
|
||||
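RepertoireError above is an error value that doubles as a one-byte payload. The sketch below shows how a caller might recover the suggested replacement from such an error. It is an illustration only: encodeASCII and replacementFor are hypothetical helpers, and asciiSub is assumed to match encoding.ASCIISub (the ASCII substitute character, 0x1a).

```go
package main

import (
	"errors"
	"fmt"
)

// RepertoireError mirrors the type in internal.go: the byte value is the
// suggested replacement for an unencodable rune.
type RepertoireError byte

func (r RepertoireError) Error() string     { return "encoding: rune not supported by encoding." }
func (r RepertoireError) Replacement() byte { return byte(r) }

// asciiSub is assumed to equal encoding.ASCIISub.
const asciiSub = '\x1a'

var errASCIIReplacement = RepertoireError(asciiSub)

// encodeASCII is a hypothetical encoder that stops at the first non-ASCII
// rune and returns the bytes written so far along with the error.
func encodeASCII(s string) ([]byte, error) {
	out := make([]byte, 0, len(s))
	for _, r := range s {
		if r >= 0x80 {
			return out, errASCIIReplacement
		}
		out = append(out, byte(r))
	}
	return out, nil
}

// replacementFor extracts the suggested byte if err is a RepertoireError.
func replacementFor(err error) (byte, bool) {
	var re RepertoireError
	if errors.As(err, &re) {
		return re.Replacement(), true
	}
	return 0, false
}

func main() {
	_, err := encodeASCII("a갂")
	if b, ok := replacementFor(err); ok {
		fmt.Printf("%v (suggested replacement %#x)\n", err, b)
	}
}
```

Carrying the replacement inside the error lets a wrapper such as encoding.ReplaceUnsupported substitute a byte without knowing which encoder produced the error.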
12
vendor/golang.org/x/text/encoding/japanese/all.go
generated
vendored
Normal file
|
|
@ -0,0 +1,12 @@
|
|||
// Copyright 2015 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package japanese
|
||||
|
||||
import (
|
||||
"golang.org/x/text/encoding"
|
||||
)
|
||||
|
||||
// All is a list of all defined encodings in this package.
|
||||
var All = []encoding.Encoding{EUCJP, ISO2022JP, ShiftJIS}
|
||||
248
vendor/golang.org/x/text/encoding/japanese/all_test.go
generated
vendored
Normal file
|
|
@ -0,0 +1,248 @@
|
|||
// Copyright 2015 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package japanese
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"strings"
|
||||
"testing"
|
||||
|
||||
"golang.org/x/text/encoding"
|
||||
"golang.org/x/text/encoding/internal"
|
||||
"golang.org/x/text/encoding/internal/enctest"
|
||||
"golang.org/x/text/transform"
|
||||
)
|
||||
|
||||
func dec(e encoding.Encoding) (dir string, t transform.Transformer, err error) {
|
||||
return "Decode", e.NewDecoder(), nil
|
||||
}
|
||||
func enc(e encoding.Encoding) (dir string, t transform.Transformer, err error) {
|
||||
return "Encode", e.NewEncoder(), internal.ErrASCIIReplacement
|
||||
}
|
||||
|
||||
func TestNonRepertoire(t *testing.T) {
|
||||
// Pick n to cause the destination buffer in transform.String to overflow.
|
||||
const n = 100
|
||||
long := strings.Repeat(".", n)
|
||||
testCases := []struct {
|
||||
init func(e encoding.Encoding) (string, transform.Transformer, error)
|
||||
e encoding.Encoding
|
||||
src, want string
|
||||
}{
|
||||
{enc, EUCJP, "갂", ""},
|
||||
{enc, EUCJP, "a갂", "a"},
|
||||
{enc, EUCJP, "丌갂", "\x8f\xb0\xa4"},
|
||||
|
||||
{enc, ISO2022JP, "갂", ""},
|
||||
{enc, ISO2022JP, "a갂", "a"},
|
||||
{enc, ISO2022JP, "朗갂", "\x1b$BzF\x1b(B"}, // switch back to ASCII mode at end
|
||||
|
||||
{enc, ShiftJIS, "갂", ""},
|
||||
{enc, ShiftJIS, "a갂", "a"},
|
||||
{enc, ShiftJIS, "\u2190갂", "\x81\xa9"},
|
||||
|
||||
// Continue correctly after errors
|
||||
{dec, EUCJP, "\x8e\xa0", "\ufffd\ufffd"},
|
||||
{dec, EUCJP, "\x8e\xe0", "\ufffd"},
|
||||
{dec, EUCJP, "\x8e\xff", "\ufffd\ufffd"},
|
||||
{dec, EUCJP, "\x8ea", "\ufffda"},
|
||||
{dec, EUCJP, "\x8f\xa0", "\ufffd\ufffd"},
|
||||
{dec, EUCJP, "\x8f\xa1\xa0", "\ufffd\ufffd"},
|
||||
{dec, EUCJP, "\x8f\xa1a", "\ufffda"},
|
||||
{dec, EUCJP, "\x8f\xa1a", "\ufffda"},
|
||||
{dec, EUCJP, "\x8f\xa1a", "\ufffda"},
|
||||
{dec, EUCJP, "\x8f\xa2\xa2", "\ufffd"},
|
||||
{dec, EUCJP, "\xfe", "\ufffd"},
|
||||
{dec, EUCJP, "\xfe\xfc", "\ufffd"},
|
||||
{dec, EUCJP, "\xfe\xff", "\ufffd\ufffd"},
|
||||
// Correct handling of end of source
|
||||
{dec, EUCJP, strings.Repeat("\x8e", n), strings.Repeat("\ufffd", n)},
|
||||
{dec, EUCJP, strings.Repeat("\x8f", n), strings.Repeat("\ufffd", n)},
|
||||
{dec, EUCJP, strings.Repeat("\x8f\xa0", n), strings.Repeat("\ufffd", 2*n)},
|
||||
{dec, EUCJP, "a" + strings.Repeat("\x8f\xa1", n), "a" + strings.Repeat("\ufffd", n)},
|
||||
{dec, EUCJP, "a" + strings.Repeat("\x8f\xa1\xff", n), "a" + strings.Repeat("\ufffd", 2*n)},
|
||||
|
||||
// Continue correctly after errors
|
||||
{dec, ShiftJIS, "\x80", "\u0080"}, // It's what the spec says.
|
||||
{dec, ShiftJIS, "\x81", "\ufffd"},
|
||||
{dec, ShiftJIS, "\x81\x7f", "\ufffd\u007f"},
|
||||
{dec, ShiftJIS, "\xe0", "\ufffd"},
|
||||
{dec, ShiftJIS, "\xe0\x39", "\ufffd\u0039"},
|
||||
{dec, ShiftJIS, "\xe0\x9f", "燹"},
|
||||
{dec, ShiftJIS, "\xe0\xfd", "\ufffd"},
|
||||
{dec, ShiftJIS, "\xef\xfc", "\ufffd"},
|
||||
{dec, ShiftJIS, "\xfc\xfc", "\ufffd"},
|
||||
{dec, ShiftJIS, "\xfc\xfd", "\ufffd"},
|
||||
{dec, ShiftJIS, "\xfdaa", "\ufffdaa"},
|
||||
|
||||
{dec, ShiftJIS, strings.Repeat("\x81\x81", n), strings.Repeat("=", n)},
|
||||
{dec, ShiftJIS, strings.Repeat("\xe0\xfd", n), strings.Repeat("\ufffd", n)},
|
||||
{dec, ShiftJIS, "a" + strings.Repeat("\xe0\xfd", n), "a" + strings.Repeat("\ufffd", n)},
|
||||
|
||||
{dec, ISO2022JP, "\x1b$", "\ufffd$"},
|
||||
{dec, ISO2022JP, "\x1b(", "\ufffd("},
|
||||
{dec, ISO2022JP, "\x1b@", "\ufffd@"},
|
||||
{dec, ISO2022JP, "\x1bZ", "\ufffdZ"},
|
||||
// incomplete escapes
|
||||
{dec, ISO2022JP, "\x1b$", "\ufffd$"},
|
||||
{dec, ISO2022JP, "\x1b$J.", "\ufffd$J."}, // illegal
|
||||
{dec, ISO2022JP, "\x1b$B.", "\ufffd"}, // JIS208
|
||||
{dec, ISO2022JP, "\x1b$(", "\ufffd$("}, // JIS212
|
||||
{dec, ISO2022JP, "\x1b$(..", "\ufffd$(.."}, // JIS212
|
||||
{dec, ISO2022JP, "\x1b$(" + long, "\ufffd$(" + long}, // JIS212
|
||||
{dec, ISO2022JP, "\x1b$(D.", "\ufffd"}, // JIS212
|
||||
{dec, ISO2022JP, "\x1b$(D..", "\ufffd"}, // JIS212
|
||||
{dec, ISO2022JP, "\x1b$(D...", "\ufffd\ufffd"}, // JIS212
|
||||
{dec, ISO2022JP, "\x1b(B.", "."}, // ascii
|
||||
{dec, ISO2022JP, "\x1b(B..", ".."}, // ascii
|
||||
{dec, ISO2022JP, "\x1b(J.", "."}, // roman
|
||||
{dec, ISO2022JP, "\x1b(J..", ".."}, // roman
|
||||
{dec, ISO2022JP, "\x1b(I\x20", "\ufffd"}, // katakana
|
||||
{dec, ISO2022JP, "\x1b(I\x20\x20", "\ufffd\ufffd"}, // katakana
|
||||
// recover to same state
|
||||
{dec, ISO2022JP, "\x1b(B\x1b.", "\ufffd."},
|
||||
{dec, ISO2022JP, "\x1b(I\x1b.", "\ufffdョ"},
|
||||
{dec, ISO2022JP, "\x1b(I\x1b$.", "\ufffd、ョ"},
|
||||
{dec, ISO2022JP, "\x1b(I\x1b(.", "\ufffdィョ"},
|
||||
{dec, ISO2022JP, "\x1b$B\x7e\x7e", "\ufffd"},
|
||||
{dec, ISO2022JP, "\x1b$@\x0a.", "\x0a."},
|
||||
{dec, ISO2022JP, "\x1b$B\x0a.", "\x0a."},
|
||||
{dec, ISO2022JP, "\x1b$(D\x0a.", "\x0a."},
|
||||
{dec, ISO2022JP, "\x1b$(D\x7e\x7e", "\ufffd"},
|
||||
{dec, ISO2022JP, "\x80", "\ufffd"},
|
||||
|
||||
// TODO: according to https://encoding.spec.whatwg.org/#iso-2022-jp,
|
||||
// these should all be correct.
|
||||
// {dec, ISO2022JP, "\x1b(B\x0E", "\ufffd"},
|
||||
// {dec, ISO2022JP, "\x1b(B\x0F", "\ufffd"},
|
||||
{dec, ISO2022JP, "\x1b(B\x5C", "\u005C"},
|
||||
{dec, ISO2022JP, "\x1b(B\x7E", "\u007E"},
|
||||
// {dec, ISO2022JP, "\x1b(J\x0E", "\ufffd"},
|
||||
// {dec, ISO2022JP, "\x1b(J\x0F", "\ufffd"},
|
||||
// {dec, ISO2022JP, "\x1b(J\x5C", "\u00A5"},
|
||||
// {dec, ISO2022JP, "\x1b(J\x7E", "\u203E"},
|
||||
}
|
||||
for _, tc := range testCases {
|
||||
dir, tr, wantErr := tc.init(tc.e)
|
||||
t.Run(fmt.Sprintf("%s/%v/%q", dir, tc.e, tc.src), func(t *testing.T) {
|
||||
dst := make([]byte, 100000)
|
||||
src := []byte(tc.src)
|
||||
for i := 0; i <= len(tc.src); i++ {
|
||||
nDst, nSrc, err := tr.Transform(dst, src[:i], false)
|
||||
if err != nil && err != transform.ErrShortSrc && err != wantErr {
|
||||
t.Fatalf("error on first call to Transform: %v", err)
|
||||
}
|
||||
n, _, err := tr.Transform(dst[nDst:], src[nSrc:], true)
|
||||
nDst += n
|
||||
if err != wantErr {
|
||||
t.Fatalf("(%q|%q): got %v; want %v", tc.src[:i], tc.src[i:], err, wantErr)
|
||||
}
|
||||
if got := string(dst[:nDst]); got != tc.want {
|
||||
t.Errorf("(%q|%q):\ngot %q\nwant %q", tc.src[:i], tc.src[i:], got, tc.want)
|
||||
}
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
func TestCorrect(t *testing.T) {
|
||||
testCases := []struct {
|
||||
init func(e encoding.Encoding) (string, transform.Transformer, error)
|
||||
e encoding.Encoding
|
||||
src, want string
|
||||
}{
|
||||
{dec, ShiftJIS, "\x9f\xfc", "滌"},
|
||||
{dec, ShiftJIS, "\xfb\xfc", "髙"},
|
||||
{dec, ShiftJIS, "\xfa\xb1", "﨑"},
|
||||
{enc, ShiftJIS, "滌", "\x9f\xfc"},
|
||||
{enc, ShiftJIS, "﨑", "\xed\x95"},
|
||||
}
|
||||
for _, tc := range testCases {
|
||||
dir, tr, _ := tc.init(tc.e)
|
||||
|
||||
dst, _, err := transform.String(tr, tc.src)
|
||||
if err != nil {
|
||||
t.Errorf("%s %v(%q): got %v; want %v", dir, tc.e, tc.src, err, nil)
|
||||
}
|
||||
if got := string(dst); got != tc.want {
|
||||
t.Errorf("%s %v(%q):\ngot %q\nwant %q", dir, tc.e, tc.src, got, tc.want)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestBasics(t *testing.T) {
|
||||
// The encoded forms can be verified by the iconv program:
|
||||
// $ echo 月日は百代 | iconv -f UTF-8 -t SHIFT-JIS | xxd
|
||||
testCases := []struct {
|
||||
e encoding.Encoding
|
||||
encPrefix string
|
||||
encSuffix string
|
||||
encoded string
|
||||
utf8 string
|
||||
}{{
|
||||
// "A。カ゚ 0208: etc 0212: etc" is a nonsense string that contains ASCII, half-width
|
||||
// kana, JIS X 0208 (including two near the kink in the Shift JIS second byte
|
||||
// encoding) and JIS X 0212 encodable codepoints.
|
||||
//
|
||||
// "月日は百代の過客にして、行かふ年も又旅人也。" is from the 17th century poem
|
||||
// "Oku no Hosomichi" and contains both hiragana and kanji.
|
||||
e: EUCJP,
|
||||
encoded: "A\x8e\xa1\x8e\xb6\x8e\xdf " +
|
||||
"0208: \xa1\xa1\xa1\xa2\xa1\xdf\xa1\xe0\xa1\xfd\xa1\xfe\xa2\xa1\xa2\xa2\xf4\xa6 " +
|
||||
"0212: \x8f\xa2\xaf\x8f\xed\xe3",
|
||||
utf8: "A。カ゚ " +
|
||||
"0208: \u3000\u3001\u00d7\u00f7\u25ce\u25c7\u25c6\u25a1\u7199 " +
|
||||
"0212: \u02d8\u9fa5",
|
||||
}, {
|
||||
e: EUCJP,
|
||||
encoded: "\xb7\xee\xc6\xfc\xa4\xcf\xc9\xb4\xc2\xe5\xa4\xce\xb2\xe1\xb5\xd2" +
|
||||
"\xa4\xcb\xa4\xb7\xa4\xc6\xa1\xa2\xb9\xd4\xa4\xab\xa4\xd5\xc7\xaf" +
|
||||
"\xa4\xe2\xcb\xf4\xce\xb9\xbf\xcd\xcc\xe9\xa1\xa3",
|
||||
utf8: "月日は百代の過客にして、行かふ年も又旅人也。",
|
||||
}, {
|
||||
e: ISO2022JP,
|
||||
encSuffix: "\x1b\x28\x42",
|
||||
encoded: "\x1b\x28\x49\x21\x36\x5f\x1b\x28\x42 " +
|
||||
"0208: \x1b\x24\x42\x21\x21\x21\x22\x21\x5f\x21\x60\x21\x7d\x21\x7e\x22\x21\x22\x22\x74\x26",
|
||||
utf8: "。カ゚ " +
|
||||
"0208: \u3000\u3001\u00d7\u00f7\u25ce\u25c7\u25c6\u25a1\u7199",
|
||||
}, {
|
||||
e: ISO2022JP,
|
||||
encPrefix: "\x1b\x24\x42",
|
||||
encSuffix: "\x1b\x28\x42",
|
||||
encoded: "\x37\x6e\x46\x7c\x24\x4f\x49\x34\x42\x65\x24\x4e\x32\x61\x35\x52" +
|
||||
"\x24\x4b\x24\x37\x24\x46\x21\x22\x39\x54\x24\x2b\x24\x55\x47\x2f" +
|
||||
"\x24\x62\x4b\x74\x4e\x39\x3f\x4d\x4c\x69\x21\x23",
|
||||
utf8: "月日は百代の過客にして、行かふ年も又旅人也。",
|
||||
}, {
|
||||
e: ShiftJIS,
|
||||
encoded: "A\xa1\xb6\xdf " +
|
||||
"0208: \x81\x40\x81\x41\x81\x7e\x81\x80\x81\x9d\x81\x9e\x81\x9f\x81\xa0\xea\xa4",
|
||||
utf8: "A。カ゚ " +
|
||||
"0208: \u3000\u3001\u00d7\u00f7\u25ce\u25c7\u25c6\u25a1\u7199",
|
||||
}, {
|
||||
e: ShiftJIS,
|
||||
encoded: "\x8c\x8e\x93\xfa\x82\xcd\x95\x53\x91\xe3\x82\xcc\x89\xdf\x8b\x71" +
|
||||
"\x82\xc9\x82\xb5\x82\xc4\x81\x41\x8d\x73\x82\xa9\x82\xd3\x94\x4e" +
|
||||
"\x82\xe0\x96\x94\x97\xb7\x90\x6c\x96\xe7\x81\x42",
|
||||
utf8: "月日は百代の過客にして、行かふ年も又旅人也。",
|
||||
}}
|
||||
|
||||
for _, tc := range testCases {
|
||||
enctest.TestEncoding(t, tc.e, tc.encoded, tc.utf8, tc.encPrefix, tc.encSuffix)
|
||||
}
|
||||
}
|
||||
|
||||
func TestFiles(t *testing.T) {
|
||||
enctest.TestFile(t, EUCJP)
|
||||
enctest.TestFile(t, ISO2022JP)
|
||||
enctest.TestFile(t, ShiftJIS)
|
||||
}
|
||||
|
||||
func BenchmarkEncoding(b *testing.B) {
|
||||
enctest.Benchmark(b, EUCJP)
|
||||
enctest.Benchmark(b, ISO2022JP)
|
||||
enctest.Benchmark(b, ShiftJIS)
|
||||
}
|
||||
225
vendor/golang.org/x/text/encoding/japanese/eucjp.go
generated
vendored
Normal file
|
|
@ -0,0 +1,225 @@
|
|||
// Copyright 2013 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package japanese
|
||||
|
||||
import (
|
||||
"unicode/utf8"
|
||||
|
||||
"golang.org/x/text/encoding"
|
||||
"golang.org/x/text/encoding/internal"
|
||||
"golang.org/x/text/encoding/internal/identifier"
|
||||
"golang.org/x/text/transform"
|
||||
)
|
||||
|
||||
// EUCJP is the EUC-JP encoding.
|
||||
var EUCJP encoding.Encoding = &eucJP
|
||||
|
||||
var eucJP = internal.Encoding{
|
||||
&internal.SimpleEncoding{eucJPDecoder{}, eucJPEncoder{}},
|
||||
"EUC-JP",
|
||||
identifier.EUCPkdFmtJapanese,
|
||||
}
|
||||
|
||||
type eucJPDecoder struct{ transform.NopResetter }
|
||||
|
||||
// See https://encoding.spec.whatwg.org/#euc-jp-decoder.
|
||||
func (eucJPDecoder) Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error) {
|
||||
r, size := rune(0), 0
|
||||
loop:
|
||||
for ; nSrc < len(src); nSrc += size {
|
||||
switch c0 := src[nSrc]; {
|
||||
case c0 < utf8.RuneSelf:
|
||||
r, size = rune(c0), 1
|
||||
|
||||
case c0 == 0x8e:
|
||||
if nSrc+1 >= len(src) {
|
||||
if !atEOF {
|
||||
err = transform.ErrShortSrc
|
||||
break loop
|
||||
}
|
||||
r, size = utf8.RuneError, 1
|
||||
break
|
||||
}
|
||||
c1 := src[nSrc+1]
|
||||
switch {
|
||||
case c1 < 0xa1:
|
||||
r, size = utf8.RuneError, 1
|
||||
case c1 > 0xdf:
|
||||
r, size = utf8.RuneError, 2
|
||||
if c1 == 0xff {
|
||||
size = 1
|
||||
}
|
||||
default:
|
||||
r, size = rune(c1)+(0xff61-0xa1), 2
|
||||
}
|
||||
case c0 == 0x8f:
|
||||
if nSrc+2 >= len(src) {
|
||||
if !atEOF {
|
||||
err = transform.ErrShortSrc
|
||||
break loop
|
||||
}
|
||||
r, size = utf8.RuneError, 1
|
||||
if p := nSrc + 1; p < len(src) && 0xa1 <= src[p] && src[p] < 0xfe {
|
||||
size = 2
|
||||
}
|
||||
break
|
||||
}
|
||||
c1 := src[nSrc+1]
|
||||
if c1 < 0xa1 || 0xfe < c1 {
|
||||
r, size = utf8.RuneError, 1
|
||||
break
|
||||
}
|
||||
c2 := src[nSrc+2]
|
||||
if c2 < 0xa1 || 0xfe < c2 {
|
||||
r, size = utf8.RuneError, 2
|
||||
break
|
||||
}
|
||||
r, size = utf8.RuneError, 3
|
||||
if i := int(c1-0xa1)*94 + int(c2-0xa1); i < len(jis0212Decode) {
|
||||
r = rune(jis0212Decode[i])
|
||||
if r == 0 {
|
||||
r = utf8.RuneError
|
||||
}
|
||||
}
|
||||
|
||||
case 0xa1 <= c0 && c0 <= 0xfe:
|
||||
if nSrc+1 >= len(src) {
|
||||
if !atEOF {
|
||||
err = transform.ErrShortSrc
|
||||
break loop
|
||||
}
|
||||
r, size = utf8.RuneError, 1
|
||||
break
|
||||
}
|
||||
c1 := src[nSrc+1]
|
||||
if c1 < 0xa1 || 0xfe < c1 {
|
||||
r, size = utf8.RuneError, 1
|
||||
break
|
||||
}
|
||||
r, size = utf8.RuneError, 2
|
||||
if i := int(c0-0xa1)*94 + int(c1-0xa1); i < len(jis0208Decode) {
|
||||
r = rune(jis0208Decode[i])
|
||||
if r == 0 {
|
||||
r = utf8.RuneError
|
||||
}
|
||||
}
|
||||
|
||||
default:
|
||||
r, size = utf8.RuneError, 1
|
||||
}
|
||||
|
||||
if nDst+utf8.RuneLen(r) > len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
break loop
|
||||
}
|
||||
nDst += utf8.EncodeRune(dst[nDst:], r)
|
||||
}
|
||||
return nDst, nSrc, err
|
||||
}
|
||||
|
||||
type eucJPEncoder struct{ transform.NopResetter }
|
||||
|
||||
func (eucJPEncoder) Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error) {
|
||||
r, size := rune(0), 0
|
||||
for ; nSrc < len(src); nSrc += size {
|
||||
r = rune(src[nSrc])
|
||||
|
||||
// Decode a 1-byte rune.
|
||||
if r < utf8.RuneSelf {
|
||||
size = 1
|
||||
|
||||
} else {
|
||||
// Decode a multi-byte rune.
|
||||
r, size = utf8.DecodeRune(src[nSrc:])
|
||||
if size == 1 {
|
||||
// All valid runes of size 1 (those below utf8.RuneSelf) were
|
||||
// handled above. We have invalid UTF-8 or we haven't seen the
|
||||
// full character yet.
|
||||
if !atEOF && !utf8.FullRune(src[nSrc:]) {
|
||||
err = transform.ErrShortSrc
|
||||
break
|
||||
}
|
||||
}
|
||||
|
||||
// func init checks that the switch covers all tables.
|
||||
switch {
|
||||
case encode0Low <= r && r < encode0High:
|
||||
if r = rune(encode0[r-encode0Low]); r != 0 {
|
||||
goto write2or3
|
||||
}
|
||||
case encode1Low <= r && r < encode1High:
|
||||
if r = rune(encode1[r-encode1Low]); r != 0 {
|
||||
goto write2or3
|
||||
}
|
||||
case encode2Low <= r && r < encode2High:
|
||||
if r = rune(encode2[r-encode2Low]); r != 0 {
|
||||
goto write2or3
|
||||
}
|
||||
case encode3Low <= r && r < encode3High:
|
||||
if r = rune(encode3[r-encode3Low]); r != 0 {
|
||||
goto write2or3
|
||||
}
|
||||
case encode4Low <= r && r < encode4High:
|
||||
if r = rune(encode4[r-encode4Low]); r != 0 {
|
||||
goto write2or3
|
||||
}
|
||||
case encode5Low <= r && r < encode5High:
|
||||
if 0xff61 <= r && r < 0xffa0 {
|
||||
goto write2
|
||||
}
|
||||
if r = rune(encode5[r-encode5Low]); r != 0 {
|
||||
goto write2or3
|
||||
}
|
||||
}
|
||||
err = internal.ErrASCIIReplacement
|
||||
break
|
||||
}
|
||||
|
||||
if nDst >= len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
break
|
||||
}
|
||||
dst[nDst] = uint8(r)
|
||||
nDst++
|
||||
continue
|
||||
|
||||
write2or3:
|
||||
if r>>tableShift == jis0208 {
|
||||
if nDst+2 > len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
break
|
||||
}
|
||||
} else {
|
||||
if nDst+3 > len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
break
|
||||
}
|
||||
dst[nDst] = 0x8f
|
||||
nDst++
|
||||
}
|
||||
dst[nDst+0] = 0xa1 + uint8(r>>codeShift)&codeMask
|
||||
dst[nDst+1] = 0xa1 + uint8(r)&codeMask
|
||||
nDst += 2
|
||||
continue
|
||||
|
||||
write2:
|
||||
if nDst+2 > len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
break
|
||||
}
|
||||
dst[nDst+0] = 0x8e
|
||||
dst[nDst+1] = uint8(r - (0xff61 - 0xa1))
|
||||
nDst += 2
|
||||
continue
|
||||
}
|
||||
return nDst, nSrc, err
|
||||
}
|
||||
|
||||
func init() {
|
||||
// Check that the hard-coded encode switch covers all tables.
|
||||
if numEncodeTables != 6 {
|
||||
panic("bad numEncodeTables")
|
||||
}
|
||||
}
|
||||
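The `write2or3` and `write2` branches of the EUC-JP encoder above turn a packed table value, or a half-width katakana rune, into output bytes. The sketch below restates that arithmetic in isolation, assuming the constant values that maketables.go prints (`codeMask = 0x7f`, `codeShift = 7`, `tableShift = 14`). The helper names `eucJPFromPacked` and `eucJPFromHalfKana` are hypothetical and are not part of the package.

```go
package main

import "fmt"

// Constants copied from the values the table generator prints.
const (
	jis0208    = 1
	jis0212    = 2
	codeMask   = 0x7f
	codeShift  = 7
	tableShift = 14
)

// eucJPFromPacked turns a packed table value (table<<14 | j1<<7 | j2) into
// EUC-JP bytes: two bytes for JIS X 0208, or 0x8f plus two bytes otherwise
// (JIS X 0212).
func eucJPFromPacked(v uint16) []byte {
	j1 := byte(v>>codeShift) & codeMask
	j2 := byte(v) & codeMask
	out := []byte{0xa1 + j1, 0xa1 + j2}
	if v>>tableShift != jis0208 {
		out = append([]byte{0x8f}, out...)
	}
	return out
}

// eucJPFromHalfKana encodes a half-width katakana rune (U+FF61..U+FF9F) as
// 0x8e followed by a single trail byte in 0xa1..0xdf.
func eucJPFromHalfKana(r rune) ([]byte, bool) {
	if r < 0xff61 || r >= 0xffa0 {
		return nil, false
	}
	return []byte{0x8e, byte(r - (0xff61 - 0xa1))}, true
}

func main() {
	// The test table in all_test.go lists "\xb7\xee" as the EUC-JP form of
	// the first kanji of the poem, i.e. row 0x16, cell 0x4d of JIS X 0208.
	fmt.Printf("% x\n", eucJPFromPacked(jis0208<<tableShift|0x16<<codeShift|0x4d))
	// U+02D8 appears in the same table as "\x8f\xa2\xaf" (JIS X 0212).
	fmt.Printf("% x\n", eucJPFromPacked(jis0212<<tableShift|0x01<<codeShift|0x0e))
	kana, _ := eucJPFromHalfKana(0xff76)
	fmt.Printf("% x\n", kana)
}
```

The sketch allocates a slice for clarity; the real encoder writes directly into `dst` and returns `transform.ErrShortDst` when fewer than two or three bytes remain.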
299
vendor/golang.org/x/text/encoding/japanese/iso2022jp.go
generated
vendored
Normal file
|
|
@ -0,0 +1,299 @@
|
|||
// Copyright 2013 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package japanese
|
||||
|
||||
import (
|
||||
"unicode/utf8"
|
||||
|
||||
"golang.org/x/text/encoding"
|
||||
"golang.org/x/text/encoding/internal"
|
||||
"golang.org/x/text/encoding/internal/identifier"
|
||||
"golang.org/x/text/transform"
|
||||
)
|
||||
|
||||
// ISO2022JP is the ISO-2022-JP encoding.
|
||||
var ISO2022JP encoding.Encoding = &iso2022JP
|
||||
|
||||
var iso2022JP = internal.Encoding{
|
||||
internal.FuncEncoding{iso2022JPNewDecoder, iso2022JPNewEncoder},
|
||||
"ISO-2022-JP",
|
||||
identifier.ISO2022JP,
|
||||
}
|
||||
|
||||
func iso2022JPNewDecoder() transform.Transformer {
|
||||
return new(iso2022JPDecoder)
|
||||
}
|
||||
|
||||
func iso2022JPNewEncoder() transform.Transformer {
|
||||
return new(iso2022JPEncoder)
|
||||
}
|
||||
|
||||
const (
|
||||
asciiState = iota
|
||||
katakanaState
|
||||
jis0208State
|
||||
jis0212State
|
||||
)
|
||||
|
||||
const asciiEsc = 0x1b
|
||||
|
||||
type iso2022JPDecoder int
|
||||
|
||||
func (d *iso2022JPDecoder) Reset() {
|
||||
*d = asciiState
|
||||
}
|
||||
|
||||
func (d *iso2022JPDecoder) Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error) {
|
||||
r, size := rune(0), 0
|
||||
for ; nSrc < len(src); nSrc += size {
|
||||
c0 := src[nSrc]
|
||||
if c0 >= utf8.RuneSelf {
|
||||
r, size = '\ufffd', 1
|
||||
goto write
|
||||
}
|
||||
|
||||
if c0 == asciiEsc {
|
||||
if nSrc+2 >= len(src) {
|
||||
if !atEOF {
|
||||
return nDst, nSrc, transform.ErrShortSrc
|
||||
}
|
||||
// TODO: is it correct to only skip 1??
|
||||
r, size = '\ufffd', 1
|
||||
goto write
|
||||
}
|
||||
size = 3
|
||||
c1 := src[nSrc+1]
|
||||
c2 := src[nSrc+2]
|
||||
switch {
|
||||
case c1 == '$' && (c2 == '@' || c2 == 'B'): // 0x24 {0x40, 0x42}
|
||||
*d = jis0208State
|
||||
continue
|
||||
case c1 == '$' && c2 == '(': // 0x24 0x28
|
||||
if nSrc+3 >= len(src) {
|
||||
if !atEOF {
|
||||
return nDst, nSrc, transform.ErrShortSrc
|
||||
}
|
||||
r, size = '\ufffd', 1
|
||||
goto write
|
||||
}
|
||||
size = 4
|
||||
if src[nSrc+3] == 'D' {
|
||||
*d = jis0212State
|
||||
continue
|
||||
}
|
||||
case c1 == '(' && (c2 == 'B' || c2 == 'J'): // 0x28 {0x42, 0x4A}
|
||||
*d = asciiState
|
||||
continue
|
||||
case c1 == '(' && c2 == 'I': // 0x28 0x49
|
||||
*d = katakanaState
|
||||
continue
|
||||
}
|
||||
r, size = '\ufffd', 1
|
||||
goto write
|
||||
}
|
||||
|
||||
switch *d {
|
||||
case asciiState:
|
||||
r, size = rune(c0), 1
|
||||
|
||||
case katakanaState:
|
||||
if c0 < 0x21 || 0x60 <= c0 {
|
||||
r, size = '\ufffd', 1
|
||||
goto write
|
||||
}
|
||||
r, size = rune(c0)+(0xff61-0x21), 1
|
||||
|
||||
default:
|
||||
if c0 == 0x0a {
|
||||
*d = asciiState
|
||||
r, size = rune(c0), 1
|
||||
goto write
|
||||
}
|
||||
if nSrc+1 >= len(src) {
|
||||
if !atEOF {
|
||||
return nDst, nSrc, transform.ErrShortSrc
|
||||
}
|
||||
r, size = '\ufffd', 1
|
||||
goto write
|
||||
}
|
||||
size = 2
|
||||
c1 := src[nSrc+1]
|
||||
i := int(c0-0x21)*94 + int(c1-0x21)
|
||||
if *d == jis0208State && i < len(jis0208Decode) {
|
||||
r = rune(jis0208Decode[i])
|
||||
} else if *d == jis0212State && i < len(jis0212Decode) {
|
||||
r = rune(jis0212Decode[i])
|
||||
} else {
|
||||
r = '\ufffd'
|
||||
goto write
|
||||
}
|
||||
if r == 0 {
|
||||
r = '\ufffd'
|
||||
}
|
||||
}
|
||||
|
||||
write:
|
||||
if nDst+utf8.RuneLen(r) > len(dst) {
|
||||
return nDst, nSrc, transform.ErrShortDst
|
||||
}
|
||||
nDst += utf8.EncodeRune(dst[nDst:], r)
|
||||
}
|
||||
return nDst, nSrc, err
|
||||
}
|
||||
|
||||
type iso2022JPEncoder int
|
||||
|
||||
func (e *iso2022JPEncoder) Reset() {
|
||||
*e = asciiState
|
||||
}
|
||||
|
||||
func (e *iso2022JPEncoder) Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error) {
|
||||
r, size := rune(0), 0
|
||||
for ; nSrc < len(src); nSrc += size {
|
||||
r = rune(src[nSrc])
|
||||
|
||||
// Decode a 1-byte rune.
|
||||
if r < utf8.RuneSelf {
|
||||
size = 1
|
||||
|
||||
} else {
|
||||
// Decode a multi-byte rune.
|
||||
r, size = utf8.DecodeRune(src[nSrc:])
|
||||
if size == 1 {
|
||||
// All valid runes of size 1 (those below utf8.RuneSelf) were
|
||||
// handled above. We have invalid UTF-8 or we haven't seen the
|
||||
// full character yet.
|
||||
if !atEOF && !utf8.FullRune(src[nSrc:]) {
|
||||
err = transform.ErrShortSrc
|
||||
break
|
||||
}
|
||||
}
|
||||
|
||||
// func init checks that the switch covers all tables.
|
||||
//
|
||||
// http://encoding.spec.whatwg.org/#iso-2022-jp says that "the index jis0212
|
||||
// is not used by the iso-2022-jp encoder due to lack of widespread support".
|
||||
//
|
||||
// TODO: do we have to special-case U+00A5 and U+203E, as per
|
||||
// http://encoding.spec.whatwg.org/#iso-2022-jp
|
||||
// Doing so would mean that "\u00a5" would not be preserved
|
||||
// after an encode-decode round trip.
|
||||
switch {
|
||||
case encode0Low <= r && r < encode0High:
|
||||
if r = rune(encode0[r-encode0Low]); r>>tableShift == jis0208 {
|
||||
goto writeJIS
|
||||
}
|
||||
case encode1Low <= r && r < encode1High:
|
||||
if r = rune(encode1[r-encode1Low]); r>>tableShift == jis0208 {
|
||||
goto writeJIS
|
||||
}
|
||||
case encode2Low <= r && r < encode2High:
|
||||
if r = rune(encode2[r-encode2Low]); r>>tableShift == jis0208 {
|
||||
goto writeJIS
|
||||
}
|
||||
case encode3Low <= r && r < encode3High:
|
||||
if r = rune(encode3[r-encode3Low]); r>>tableShift == jis0208 {
|
||||
goto writeJIS
|
||||
}
|
||||
case encode4Low <= r && r < encode4High:
|
||||
if r = rune(encode4[r-encode4Low]); r>>tableShift == jis0208 {
|
||||
goto writeJIS
|
||||
}
|
||||
case encode5Low <= r && r < encode5High:
|
||||
if 0xff61 <= r && r < 0xffa0 {
|
||||
goto writeKatakana
|
||||
}
|
||||
if r = rune(encode5[r-encode5Low]); r>>tableShift == jis0208 {
|
||||
goto writeJIS
|
||||
}
|
||||
}
|
||||
|
||||
// Switch back to ASCII state in case of error so that an ASCII
|
||||
// replacement character can be written in the correct state.
|
||||
if *e != asciiState {
|
||||
if nDst+3 > len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
break
|
||||
}
|
||||
*e = asciiState
|
||||
dst[nDst+0] = asciiEsc
|
||||
dst[nDst+1] = '('
|
||||
dst[nDst+2] = 'B'
|
||||
nDst += 3
|
||||
}
|
||||
err = internal.ErrASCIIReplacement
|
||||
break
|
||||
}
|
||||
|
||||
if *e != asciiState {
|
||||
if nDst+4 > len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
break
|
||||
}
|
||||
*e = asciiState
|
||||
dst[nDst+0] = asciiEsc
|
||||
dst[nDst+1] = '('
|
||||
dst[nDst+2] = 'B'
|
||||
nDst += 3
|
||||
} else if nDst >= len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
break
|
||||
}
|
||||
dst[nDst] = uint8(r)
|
||||
nDst++
|
||||
continue
|
||||
|
||||
writeJIS:
|
||||
if *e != jis0208State {
|
||||
if nDst+5 > len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
break
|
||||
}
|
||||
*e = jis0208State
|
||||
dst[nDst+0] = asciiEsc
|
||||
dst[nDst+1] = '$'
|
||||
dst[nDst+2] = 'B'
|
||||
nDst += 3
|
||||
} else if nDst+2 > len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
break
|
||||
}
|
||||
dst[nDst+0] = 0x21 + uint8(r>>codeShift)&codeMask
|
||||
dst[nDst+1] = 0x21 + uint8(r)&codeMask
|
||||
nDst += 2
|
||||
continue
|
||||
|
||||
writeKatakana:
|
||||
if *e != katakanaState {
|
||||
if nDst+4 > len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
break
|
||||
}
|
||||
*e = katakanaState
|
||||
dst[nDst+0] = asciiEsc
|
||||
dst[nDst+1] = '('
|
||||
dst[nDst+2] = 'I'
|
||||
nDst += 3
|
||||
} else if nDst >= len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
break
|
||||
}
|
||||
dst[nDst] = uint8(r - (0xff61 - 0x21))
|
||||
nDst++
|
||||
continue
|
||||
}
|
||||
if atEOF && err == nil && *e != asciiState {
|
||||
if nDst+3 > len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
} else {
|
||||
*e = asciiState
|
||||
dst[nDst+0] = asciiEsc
|
||||
dst[nDst+1] = '('
|
||||
dst[nDst+2] = 'B'
|
||||
nDst += 3
|
||||
}
|
||||
}
|
||||
return nDst, nSrc, err
|
||||
}
|
||||
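The ISO-2022-JP decoder above switches character sets when it sees an escape sequence. As a rough illustration, the escape-recognition step can be pulled out into a standalone function. This is a simplified sketch: `parseEscape` and the `state` type are hypothetical names, and truncated input is reported as "not recognized" here, whereas the real decoder returns `transform.ErrShortSrc` or emits U+FFFD depending on `atEOF`.

```go
package main

import "fmt"

type state int

const (
	asciiState state = iota
	katakanaState
	jis0208State
	jis0212State
)

// parseEscape inspects src, which is expected to start with ESC (0x1b), and
// reports the state the sequence selects and the sequence length. ok is
// false for truncated or unrecognized sequences.
func parseEscape(src []byte) (s state, size int, ok bool) {
	if len(src) < 3 || src[0] != 0x1b {
		return 0, 0, false
	}
	switch c1, c2 := src[1], src[2]; {
	case c1 == '$' && (c2 == '@' || c2 == 'B'): // ESC $ @, ESC $ B
		return jis0208State, 3, true
	case c1 == '$' && c2 == '(': // ESC $ ( D
		if len(src) >= 4 && src[3] == 'D' {
			return jis0212State, 4, true
		}
	case c1 == '(' && (c2 == 'B' || c2 == 'J'): // ESC ( B, ESC ( J
		return asciiState, 3, true
	case c1 == '(' && c2 == 'I': // ESC ( I
		return katakanaState, 3, true
	}
	return 0, 0, false
}

func main() {
	for _, in := range []string{"\x1b$B", "\x1b$(D", "\x1b(I", "\x1b(J", "\x1b$J"} {
		s, n, ok := parseEscape([]byte(in))
		fmt.Printf("%q -> state=%d size=%d ok=%v\n", in, s, n, ok)
	}
}
```

Note that, as in the decoder above, both `ESC ( B` (ASCII) and `ESC ( J` (JIS X 0201 Roman) select the same ASCII state.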
161
vendor/golang.org/x/text/encoding/japanese/maketables.go
generated
vendored
Normal file
|
|
@ -0,0 +1,161 @@
|
|||
// Copyright 2013 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
// +build ignore
|
||||
|
||||
package main
|
||||
|
||||
// This program generates tables.go:
|
||||
// go run maketables.go | gofmt > tables.go
|
||||
|
||||
// TODO: Emoji extensions?
|
||||
// http://www.unicode.org/faq/emoji_dingbats.html
|
||||
// http://www.unicode.org/Public/UNIDATA/EmojiSources.txt
|
||||
|
||||
import (
|
||||
"bufio"
|
||||
"fmt"
|
||||
"log"
|
||||
"net/http"
|
||||
"sort"
|
||||
"strings"
|
||||
)
|
||||
|
||||
type entry struct {
|
||||
jisCode, table int
|
||||
}
|
||||
|
||||
func main() {
|
||||
fmt.Printf("// generated by go run maketables.go; DO NOT EDIT\n\n")
|
||||
fmt.Printf("// Package japanese provides Japanese encodings such as EUC-JP and Shift JIS.\n")
|
||||
fmt.Printf(`package japanese // import "golang.org/x/text/encoding/japanese"` + "\n\n")
|
||||
|
||||
reverse := [65536]entry{}
|
||||
for i := range reverse {
|
||||
reverse[i].table = -1
|
||||
}
|
||||
|
||||
tables := []struct {
|
||||
url string
|
||||
name string
|
||||
}{
|
||||
{"http://encoding.spec.whatwg.org/index-jis0208.txt", "0208"},
|
||||
{"http://encoding.spec.whatwg.org/index-jis0212.txt", "0212"},
|
||||
}
|
||||
for i, table := range tables {
|
||||
res, err := http.Get(table.url)
|
||||
if err != nil {
|
||||
log.Fatalf("%q: Get: %v", table.url, err)
|
||||
}
|
||||
defer res.Body.Close()
|
||||
|
||||
mapping := [65536]uint16{}
|
||||
|
||||
scanner := bufio.NewScanner(res.Body)
|
||||
for scanner.Scan() {
|
||||
s := strings.TrimSpace(scanner.Text())
|
||||
if s == "" || s[0] == '#' {
|
||||
continue
|
||||
}
|
||||
x, y := 0, uint16(0)
|
||||
if _, err := fmt.Sscanf(s, "%d 0x%x", &x, &y); err != nil {
|
||||
log.Fatalf("%q: could not parse %q", table.url, s)
|
||||
}
|
||||
if x < 0 || 120*94 <= x {
|
||||
log.Fatalf("%q: JIS code %d is out of range", table.url, x)
|
||||
}
|
||||
mapping[x] = y
|
||||
if reverse[y].table == -1 {
|
||||
reverse[y] = entry{jisCode: x, table: i}
|
||||
}
|
||||
}
|
||||
if err := scanner.Err(); err != nil {
|
||||
log.Fatalf("%q: scanner error: %v", table.url, err)
|
||||
}
|
||||
|
||||
fmt.Printf("// jis%sDecode is the decoding table from JIS %s code to Unicode.\n// It is defined at %s\n",
|
||||
table.name, table.name, table.url)
|
||||
fmt.Printf("var jis%sDecode = [...]uint16{\n", table.name)
|
||||
for i, m := range mapping {
|
||||
if m != 0 {
|
||||
fmt.Printf("\t%d: 0x%04X,\n", i, m)
|
||||
}
|
||||
}
|
||||
fmt.Printf("}\n\n")
|
||||
}
|
||||
|
||||
// Any run of at least separation contiguous zero entries in the reverse map will
|
||||
// be a separate encode table.
|
||||
const separation = 1024
|
||||
|
||||
intervals := []interval(nil)
|
||||
low, high := -1, -1
|
||||
for i, v := range reverse {
|
||||
if v.table == -1 {
|
||||
continue
|
||||
}
|
||||
if low < 0 {
|
||||
low = i
|
||||
} else if i-high >= separation {
|
||||
if high >= 0 {
|
||||
intervals = append(intervals, interval{low, high})
|
||||
}
|
||||
low = i
|
||||
}
|
||||
high = i + 1
|
||||
}
|
||||
if high >= 0 {
|
||||
intervals = append(intervals, interval{low, high})
|
||||
}
|
||||
sort.Sort(byDecreasingLength(intervals))
|
||||
|
||||
fmt.Printf("const (\n")
|
||||
fmt.Printf("\tjis0208 = 1\n")
|
||||
fmt.Printf("\tjis0212 = 2\n")
|
||||
fmt.Printf("\tcodeMask = 0x7f\n")
|
||||
fmt.Printf("\tcodeShift = 7\n")
|
||||
fmt.Printf("\ttableShift = 14\n")
|
||||
fmt.Printf(")\n\n")
|
||||
|
||||
fmt.Printf("const numEncodeTables = %d\n\n", len(intervals))
|
||||
fmt.Printf("// encodeX are the encoding tables from Unicode to JIS code,\n")
|
||||
fmt.Printf("// sorted by decreasing length.\n")
|
||||
for i, v := range intervals {
|
||||
fmt.Printf("// encode%d: %5d entries for runes in [%5d, %5d).\n", i, v.len(), v.low, v.high)
|
||||
}
|
||||
fmt.Printf("//\n")
|
||||
fmt.Printf("// The high two bits of the value record whether the JIS code comes from the\n")
|
||||
fmt.Printf("// JIS0208 table (high bits == 1) or the JIS0212 table (high bits == 2).\n")
|
||||
fmt.Printf("// The low 14 bits are two 7-bit unsigned integers j1 and j2 that form the\n")
|
||||
fmt.Printf("// JIS code (94*j1 + j2) within that table.\n")
|
||||
fmt.Printf("\n")
|
||||
|
||||
for i, v := range intervals {
|
||||
fmt.Printf("const encode%dLow, encode%dHigh = %d, %d\n\n", i, i, v.low, v.high)
|
||||
fmt.Printf("var encode%d = [...]uint16{\n", i)
|
||||
for j := v.low; j < v.high; j++ {
|
||||
x := reverse[j]
|
||||
if x.table == -1 {
|
||||
continue
|
||||
}
|
||||
fmt.Printf("\t%d - %d: jis%s<<14 | 0x%02X<<7 | 0x%02X,\n",
|
||||
j, v.low, tables[x.table].name, x.jisCode/94, x.jisCode%94)
|
||||
}
|
||||
fmt.Printf("}\n\n")
|
||||
}
|
||||
}
|
||||
|
||||
// interval is a half-open interval [low, high).
|
||||
type interval struct {
|
||||
low, high int
|
||||
}
|
||||
|
||||
func (i interval) len() int { return i.high - i.low }
|
||||
|
||||
// byDecreasingLength sorts intervals by decreasing length.
|
||||
type byDecreasingLength []interval
|
||||
|
||||
func (b byDecreasingLength) Len() int { return len(b) }
|
||||
func (b byDecreasingLength) Less(i, j int) bool { return b[i].len() > b[j].len() }
|
||||
func (b byDecreasingLength) Swap(i, j int) { b[i], b[j] = b[j], b[i] }
|
||||
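The generator above splits the reverse map into several encode tables wherever it finds a long enough run of unmapped code points. The sketch below isolates that splitting loop on a small boolean slice. It assumes a simplified input (`present[i]` is true where the reverse map has an entry), the function name `splitIntervals` is hypothetical, and the final sort by decreasing length is left out.

```go
package main

import "fmt"

// interval is a half-open interval [low, high).
type interval struct{ low, high int }

// splitIntervals groups the indexes i with present[i] == true into
// intervals, starting a new interval whenever at least separation absent
// entries lie between two present ones.
func splitIntervals(present []bool, separation int) []interval {
	var out []interval
	low, high := -1, -1
	for i, p := range present {
		if !p {
			continue
		}
		if low < 0 {
			low = i // first present entry opens the first interval
		} else if i-high >= separation {
			out = append(out, interval{low, high})
			low = i
		}
		high = i + 1 // high is exclusive
	}
	if high >= 0 {
		out = append(out, interval{low, high})
	}
	return out
}

func main() {
	present := make([]bool, 16)
	for _, i := range []int{2, 3, 10, 11, 12} {
		present[i] = true
	}
	fmt.Println(splitIntervals(present, 5)) // gap of 6 is split
	fmt.Println(splitIntervals(present, 7)) // gap of 6 is kept in one interval
}
```

A larger separation gives fewer, sparser tables; a smaller one gives more tables and more cases in the hard-coded encoder switch, which is why the encoders check `numEncodeTables` in `init`.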
189
vendor/golang.org/x/text/encoding/japanese/shiftjis.go
generated
vendored
Normal file
|
|
@ -0,0 +1,189 @@
|
|||
// Copyright 2013 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package japanese
|
||||
|
||||
import (
|
||||
"unicode/utf8"
|
||||
|
||||
"golang.org/x/text/encoding"
|
||||
"golang.org/x/text/encoding/internal"
|
||||
"golang.org/x/text/encoding/internal/identifier"
|
||||
"golang.org/x/text/transform"
|
||||
)
|
||||
|
||||
// ShiftJIS is the Shift JIS encoding, also known as Code Page 932 and
|
||||
// Windows-31J.
|
||||
var ShiftJIS encoding.Encoding = &shiftJIS
|
||||
|
||||
var shiftJIS = internal.Encoding{
|
||||
&internal.SimpleEncoding{shiftJISDecoder{}, shiftJISEncoder{}},
|
||||
"Shift JIS",
|
||||
identifier.ShiftJIS,
|
||||
}
|
||||
|
||||
type shiftJISDecoder struct{ transform.NopResetter }
|
||||
|
||||
func (shiftJISDecoder) Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error) {
|
||||
r, size := rune(0), 0
|
||||
loop:
|
||||
for ; nSrc < len(src); nSrc += size {
|
||||
switch c0 := src[nSrc]; {
|
||||
case c0 < utf8.RuneSelf:
|
||||
r, size = rune(c0), 1
|
||||
|
||||
case 0xa1 <= c0 && c0 < 0xe0:
|
||||
r, size = rune(c0)+(0xff61-0xa1), 1
|
||||
|
||||
case (0x81 <= c0 && c0 < 0xa0) || (0xe0 <= c0 && c0 < 0xfd):
|
||||
if c0 <= 0x9f {
|
||||
c0 -= 0x70
|
||||
} else {
|
||||
c0 -= 0xb0
|
||||
}
|
||||
c0 = 2*c0 - 0x21
|
||||
|
||||
if nSrc+1 >= len(src) {
|
||||
if !atEOF {
|
||||
err = transform.ErrShortSrc
|
||||
break loop
|
||||
}
|
||||
r, size = '\ufffd', 1
|
||||
goto write
|
||||
}
|
||||
c1 := src[nSrc+1]
|
||||
switch {
|
||||
case c1 < 0x40:
|
||||
r, size = '\ufffd', 1 // c1 is ASCII so output on next round
|
||||
goto write
|
||||
case c1 < 0x7f:
|
||||
c0--
|
||||
c1 -= 0x40
|
||||
case c1 == 0x7f:
|
||||
r, size = '\ufffd', 1 // c1 is ASCII so output on next round
|
||||
goto write
|
||||
case c1 < 0x9f:
|
||||
c0--
|
||||
c1 -= 0x41
|
||||
case c1 < 0xfd:
|
||||
c1 -= 0x9f
|
||||
default:
|
||||
r, size = '\ufffd', 2
|
||||
goto write
|
||||
}
|
||||
r, size = '\ufffd', 2
|
||||
if i := int(c0)*94 + int(c1); i < len(jis0208Decode) {
|
||||
r = rune(jis0208Decode[i])
|
||||
if r == 0 {
|
||||
r = '\ufffd'
|
||||
}
|
||||
}
|
||||
|
||||
case c0 == 0x80:
|
||||
r, size = 0x80, 1
|
||||
|
||||
default:
|
||||
r, size = '\ufffd', 1
|
||||
}
|
||||
write:
|
||||
if nDst+utf8.RuneLen(r) > len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
break loop
|
||||
}
|
||||
nDst += utf8.EncodeRune(dst[nDst:], r)
|
||||
}
|
||||
return nDst, nSrc, err
|
||||
}
|
||||
|
||||
type shiftJISEncoder struct{ transform.NopResetter }
|
||||
|
||||
func (shiftJISEncoder) Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error) {
|
||||
r, size := rune(0), 0
|
||||
loop:
|
||||
for ; nSrc < len(src); nSrc += size {
|
||||
r = rune(src[nSrc])
|
||||
|
||||
// Decode a 1-byte rune.
|
||||
if r < utf8.RuneSelf {
|
||||
size = 1
|
||||
|
||||
} else {
|
||||
// Decode a multi-byte rune.
|
||||
r, size = utf8.DecodeRune(src[nSrc:])
|
||||
if size == 1 {
|
||||
// All valid runes of size 1 (those below utf8.RuneSelf) were
|
||||
// handled above. We have invalid UTF-8 or we haven't seen the
|
||||
// full character yet.
|
||||
if !atEOF && !utf8.FullRune(src[nSrc:]) {
|
||||
err = transform.ErrShortSrc
|
||||
break loop
|
||||
}
|
||||
}
|
||||
|
||||
// func init checks that the switch covers all tables.
|
||||
switch {
|
||||
case encode0Low <= r && r < encode0High:
|
||||
if r = rune(encode0[r-encode0Low]); r>>tableShift == jis0208 {
|
||||
goto write2
|
||||
}
|
||||
case encode1Low <= r && r < encode1High:
|
||||
if r = rune(encode1[r-encode1Low]); r>>tableShift == jis0208 {
|
||||
goto write2
|
||||
}
|
||||
case encode2Low <= r && r < encode2High:
|
||||
if r = rune(encode2[r-encode2Low]); r>>tableShift == jis0208 {
|
||||
goto write2
|
||||
}
|
||||
case encode3Low <= r && r < encode3High:
|
||||
if r = rune(encode3[r-encode3Low]); r>>tableShift == jis0208 {
|
||||
goto write2
|
||||
}
|
||||
case encode4Low <= r && r < encode4High:
|
||||
if r = rune(encode4[r-encode4Low]); r>>tableShift == jis0208 {
|
||||
goto write2
|
||||
}
|
||||
case encode5Low <= r && r < encode5High:
|
||||
if 0xff61 <= r && r < 0xffa0 {
|
||||
r -= 0xff61 - 0xa1
|
||||
goto write1
|
||||
}
|
||||
if r = rune(encode5[r-encode5Low]); r>>tableShift == jis0208 {
|
||||
goto write2
|
||||
}
|
||||
}
|
||||
err = internal.ErrASCIIReplacement
|
||||
break
|
||||
}
|
||||
|
||||
write1:
|
||||
if nDst >= len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
break
|
||||
}
|
||||
dst[nDst] = uint8(r)
|
||||
nDst++
|
||||
continue
|
||||
|
||||
write2:
|
||||
j1 := uint8(r>>codeShift) & codeMask
|
||||
j2 := uint8(r) & codeMask
|
||||
if nDst+2 > len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
break loop
|
||||
}
|
||||
if j1 <= 61 {
|
||||
dst[nDst+0] = 129 + j1/2
|
||||
} else {
|
||||
dst[nDst+0] = 193 + j1/2
|
||||
}
|
||||
if j1&1 == 0 {
|
||||
dst[nDst+1] = j2 + j2/63 + 64
|
||||
} else {
|
||||
dst[nDst+1] = j2 + 159
|
||||
}
|
||||
nDst += 2
|
||||
continue
|
||||
}
|
||||
return nDst, nSrc, err
|
||||
}
|
||||
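The byte arithmetic in shiftjis.go above is easier to follow in isolation. The following is a minimal sketch, not part of the vendored package: `sjisToIndex` and `indexToSJIS` are hypothetical helper names, and the sketch assumes that the decoder's table index is row*94+cell with zero-based row and cell, which is what the arithmetic in `Transform` implies. It reproduces only the lead/trail byte mapping, not the table lookups.

```go
package main

import "fmt"

// sjisToIndex mirrors the decoder's double-byte arithmetic: it maps a
// Shift JIS lead/trail byte pair to row*94+cell. ok is false for pairs
// that the decoder above would replace with U+FFFD.
func sjisToIndex(c0, c1 byte) (index int, ok bool) {
	switch {
	case 0x81 <= c0 && c0 < 0xa0:
		c0 -= 0x70
	case 0xe0 <= c0 && c0 < 0xfd:
		c0 -= 0xb0
	default:
		return 0, false
	}
	// Each lead byte covers two rows; start at the second one and step
	// back below when the trail byte falls in the lower half.
	c0 = 2*c0 - 0x21
	switch {
	case c1 < 0x40, c1 == 0x7f, c1 >= 0xfd:
		return 0, false
	case c1 < 0x7f:
		c0--
		c1 -= 0x40
	case c1 < 0x9f:
		c0--
		c1 -= 0x41
	default:
		c1 -= 0x9f
	}
	return int(c0)*94 + int(c1), true
}

// indexToSJIS mirrors the encoder's write2 step, with j1 and j2 taken
// to be the zero-based row and cell (an assumption of this sketch).
func indexToSJIS(index int) (b0, b1 byte) {
	j1, j2 := byte(index/94), byte(index%94)
	if j1 <= 61 {
		b0 = 129 + j1/2
	} else {
		b0 = 193 + j1/2
	}
	if j1&1 == 0 {
		b1 = j2 + j2/63 + 64 // j2/63 skips the unused trail byte 0x7f
	} else {
		b1 = j2 + 159
	}
	return b0, b1
}

func main() {
	for _, p := range [][2]byte{{0x81, 0x40}, {0x81, 0x9f}, {0x8c, 0x8e}, {0xe0, 0x40}} {
		i, ok := sjisToIndex(p[0], p[1])
		b0, b1 := indexToSJIS(i)
		fmt.Printf("%#02x %#02x -> index %d (ok=%v) -> %#02x %#02x\n", p[0], p[1], i, ok, b0, b1)
	}
}
```

Running every valid byte pair through both helpers is a quick way to check that the encoder and decoder arithmetic are inverses of each other.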
26971
vendor/golang.org/x/text/encoding/japanese/tables.go
generated
vendored
Normal file
File diff suppressed because it is too large
94
vendor/golang.org/x/text/encoding/korean/all_test.go
generated
vendored
Normal file
@@ -0,0 +1,94 @@
// Copyright 2015 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package korean
|
||||
|
||||
import (
|
||||
"strings"
|
||||
"testing"
|
||||
|
||||
"golang.org/x/text/encoding"
|
||||
"golang.org/x/text/encoding/internal"
|
||||
"golang.org/x/text/encoding/internal/enctest"
|
||||
"golang.org/x/text/transform"
|
||||
)
|
||||
|
||||
func dec(e encoding.Encoding) (dir string, t transform.Transformer, err error) {
|
||||
return "Decode", e.NewDecoder(), nil
|
||||
}
|
||||
func enc(e encoding.Encoding) (dir string, t transform.Transformer, err error) {
|
||||
return "Encode", e.NewEncoder(), internal.ErrASCIIReplacement
|
||||
}
|
||||
|
||||
func TestNonRepertoire(t *testing.T) {
|
||||
// Pick n large enough to cause an overflow in the destination buffer of
|
||||
// transform.String.
|
||||
const n = 10000
|
||||
testCases := []struct {
|
||||
init func(e encoding.Encoding) (string, transform.Transformer, error)
|
||||
e encoding.Encoding
|
||||
src, want string
|
||||
}{
|
||||
{dec, EUCKR, "\xfe\xfe", "\ufffd"},
|
||||
// {dec, EUCKR, "א", "\ufffd"}, // TODO: why is this different?
|
||||
|
||||
{enc, EUCKR, "א", ""},
|
||||
{enc, EUCKR, "aא", "a"},
|
||||
{enc, EUCKR, "\uac00א", "\xb0\xa1"},
|
||||
// TODO: should we also handle Jamo?
|
||||
|
||||
{dec, EUCKR, "\x80", "\ufffd"},
|
||||
{dec, EUCKR, "\xff", "\ufffd"},
|
||||
{dec, EUCKR, "\x81", "\ufffd"},
|
||||
{dec, EUCKR, "\xb0\x40", "\ufffd@"},
|
||||
{dec, EUCKR, "\xb0\xff", "\ufffd"},
|
||||
{dec, EUCKR, "\xd0\x20", "\ufffd "},
|
||||
{dec, EUCKR, "\xd0\xff", "\ufffd"},
|
||||
|
||||
{dec, EUCKR, strings.Repeat("\x81", n), strings.Repeat("걖", n/2)},
|
||||
}
|
||||
for _, tc := range testCases {
|
||||
dir, tr, wantErr := tc.init(tc.e)
|
||||
|
||||
dst, _, err := transform.String(tr, tc.src)
|
||||
if err != wantErr {
|
||||
t.Errorf("%s %v(%q): got %v; want %v", dir, tc.e, tc.src, err, wantErr)
|
||||
}
|
||||
if got := string(dst); got != tc.want {
|
||||
t.Errorf("%s %v(%q):\ngot %q\nwant %q", dir, tc.e, tc.src, got, tc.want)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestBasics(t *testing.T) {
|
||||
// The encoded forms can be verified by the iconv program:
// $ echo 세계야, 안녕 | iconv -f UTF-8 -t EUC-KR | xxd
|
||||
testCases := []struct {
|
||||
e encoding.Encoding
|
||||
encoded string
|
||||
utf8 string
|
||||
}{{
|
||||
// Korean tests.
|
||||
//
|
||||
// "A\uac02\uac35\uac56\ud401B\ud408\ud620\ud624C\u4f3d\u8a70D" is a
|
||||
// nonsense string that contains ASCII, Hangul and CJK ideographs.
|
||||
//
|
||||
// "세계야, 안녕" translates as "Hello, world".
|
||||
e: EUCKR,
|
||||
encoded: "A\x81\x41\x81\x61\x81\x81\xc6\xfeB\xc7\xa1\xc7\xfe\xc8\xa1C\xca\xa1\xfd\xfeD",
|
||||
utf8: "A\uac02\uac35\uac56\ud401B\ud408\ud620\ud624C\u4f3d\u8a70D",
|
||||
}, {
|
||||
e: EUCKR,
|
||||
encoded: "\xbc\xbc\xb0\xe8\xbe\xdf\x2c\x20\xbe\xc8\xb3\xe7",
|
||||
utf8: "세계야, 안녕",
|
||||
}}
|
||||
|
||||
for _, tc := range testCases {
|
||||
enctest.TestEncoding(t, tc.e, tc.encoded, tc.utf8, "", "")
|
||||
}
|
||||
}
|
||||
|
||||
func TestFiles(t *testing.T) { enctest.TestFile(t, EUCKR) }
|
||||
|
||||
func BenchmarkEncoding(b *testing.B) { enctest.Benchmark(b, EUCKR) }
|
||||
177
vendor/golang.org/x/text/encoding/korean/euckr.go
generated
vendored
Normal file
@@ -0,0 +1,177 @@
// Copyright 2013 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package korean
|
||||
|
||||
import (
|
||||
"unicode/utf8"
|
||||
|
||||
"golang.org/x/text/encoding"
|
||||
"golang.org/x/text/encoding/internal"
|
||||
"golang.org/x/text/encoding/internal/identifier"
|
||||
"golang.org/x/text/transform"
|
||||
)
|
||||
|
||||
// All is a list of all defined encodings in this package.
|
||||
var All = []encoding.Encoding{EUCKR}
|
||||
|
||||
// EUCKR is the EUC-KR encoding, also known as Code Page 949.
|
||||
var EUCKR encoding.Encoding = &eucKR
|
||||
|
||||
var eucKR = internal.Encoding{
|
||||
&internal.SimpleEncoding{eucKRDecoder{}, eucKREncoder{}},
|
||||
"EUC-KR",
|
||||
identifier.EUCKR,
|
||||
}
|
||||
|
||||
type eucKRDecoder struct{ transform.NopResetter }
|
||||
|
||||
func (eucKRDecoder) Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error) {
|
||||
r, size := rune(0), 0
|
||||
loop:
|
||||
for ; nSrc < len(src); nSrc += size {
|
||||
switch c0 := src[nSrc]; {
|
||||
case c0 < utf8.RuneSelf:
|
||||
r, size = rune(c0), 1
|
||||
|
||||
case 0x81 <= c0 && c0 < 0xff:
|
||||
if nSrc+1 >= len(src) {
|
||||
if !atEOF {
|
||||
err = transform.ErrShortSrc
|
||||
break loop
|
||||
}
|
||||
r, size = utf8.RuneError, 1
|
||||
break
|
||||
}
|
||||
c1 := src[nSrc+1]
|
||||
size = 2
|
||||
if c0 < 0xc7 {
|
||||
r = 178 * rune(c0-0x81)
|
||||
switch {
|
||||
case 0x41 <= c1 && c1 < 0x5b:
|
||||
r += rune(c1) - (0x41 - 0*26)
|
||||
case 0x61 <= c1 && c1 < 0x7b:
|
||||
r += rune(c1) - (0x61 - 1*26)
|
||||
case 0x81 <= c1 && c1 < 0xff:
|
||||
r += rune(c1) - (0x81 - 2*26)
|
||||
default:
|
||||
goto decError
|
||||
}
|
||||
} else if 0xa1 <= c1 && c1 < 0xff {
|
||||
r = 178*(0xc7-0x81) + rune(c0-0xc7)*94 + rune(c1-0xa1)
|
||||
} else {
|
||||
goto decError
|
||||
}
|
||||
if int(r) < len(decode) {
|
||||
r = rune(decode[r])
|
||||
if r != 0 {
|
||||
break
|
||||
}
|
||||
}
|
||||
decError:
|
||||
r = utf8.RuneError
|
||||
if c1 < utf8.RuneSelf {
|
||||
size = 1
|
||||
}
|
||||
|
||||
default:
|
||||
r, size = utf8.RuneError, 1
|
||||
break
|
||||
}
|
||||
|
||||
if nDst+utf8.RuneLen(r) > len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
break
|
||||
}
|
||||
nDst += utf8.EncodeRune(dst[nDst:], r)
|
||||
}
|
||||
return nDst, nSrc, err
|
||||
}
|
||||
|
||||
type eucKREncoder struct{ transform.NopResetter }
|
||||
|
||||
func (eucKREncoder) Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error) {
|
||||
r, size := rune(0), 0
|
||||
for ; nSrc < len(src); nSrc += size {
|
||||
r = rune(src[nSrc])
|
||||
|
||||
// Decode a 1-byte rune.
|
||||
if r < utf8.RuneSelf {
|
||||
size = 1
|
||||
|
||||
if nDst >= len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
break
|
||||
}
|
||||
dst[nDst] = uint8(r)
|
||||
nDst++
|
||||
continue
|
||||
|
||||
} else {
|
||||
// Decode a multi-byte rune.
|
||||
r, size = utf8.DecodeRune(src[nSrc:])
|
||||
if size == 1 {
|
||||
// All valid runes of size 1 (those below utf8.RuneSelf) were
|
||||
// handled above. We have invalid UTF-8 or we haven't seen the
|
||||
// full character yet.
|
||||
if !atEOF && !utf8.FullRune(src[nSrc:]) {
|
||||
err = transform.ErrShortSrc
|
||||
break
|
||||
}
|
||||
}
|
||||
|
||||
// func init checks that the switch covers all tables.
|
||||
switch {
|
||||
case encode0Low <= r && r < encode0High:
|
||||
if r = rune(encode0[r-encode0Low]); r != 0 {
|
||||
goto write2
|
||||
}
|
||||
case encode1Low <= r && r < encode1High:
|
||||
if r = rune(encode1[r-encode1Low]); r != 0 {
|
||||
goto write2
|
||||
}
|
||||
case encode2Low <= r && r < encode2High:
|
||||
if r = rune(encode2[r-encode2Low]); r != 0 {
|
||||
goto write2
|
||||
}
|
||||
case encode3Low <= r && r < encode3High:
|
||||
if r = rune(encode3[r-encode3Low]); r != 0 {
|
||||
goto write2
|
||||
}
|
||||
case encode4Low <= r && r < encode4High:
|
||||
if r = rune(encode4[r-encode4Low]); r != 0 {
|
||||
goto write2
|
||||
}
|
||||
case encode5Low <= r && r < encode5High:
|
||||
if r = rune(encode5[r-encode5Low]); r != 0 {
|
||||
goto write2
|
||||
}
|
||||
case encode6Low <= r && r < encode6High:
|
||||
if r = rune(encode6[r-encode6Low]); r != 0 {
|
||||
goto write2
|
||||
}
|
||||
}
|
||||
err = internal.ErrASCIIReplacement
|
||||
break
|
||||
}
|
||||
|
||||
write2:
|
||||
if nDst+2 > len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
break
|
||||
}
|
||||
dst[nDst+0] = uint8(r >> 8)
|
||||
dst[nDst+1] = uint8(r)
|
||||
nDst += 2
|
||||
continue
|
||||
}
|
||||
return nDst, nSrc, err
|
||||
}
|
||||
|
||||
func init() {
	// Check that the hard-coded encode switch covers all tables.
	if numEncodeTables != 7 {
		panic("bad numEncodeTables")
	}
}
|
||||
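The decoder in euckr.go above packs each two-byte sequence into a single index into the `decode` table: lead bytes below 0xc7 use 178 slots per row (three trail-byte ranges of 26, 26 and 126 values), and the remaining lead bytes use 94 slots per row. The following is a minimal sketch of that arithmetic and of its inverse as used by the table generator. `euckrToIndex` and `indexToEUCKR` are hypothetical names introduced for illustration, and no table lookup is performed.

```go
package main

import "fmt"

// split is the first index that belongs to the 94-column region.
const split = 178 * (0xc7 - 0x81)

// euckrToIndex mirrors the decoder's index computation. ok is false
// for byte pairs that the decoder would treat as an error.
func euckrToIndex(c0, c1 byte) (index int, ok bool) {
	if c0 < 0x81 || c0 == 0xff {
		return 0, false
	}
	if c0 < 0xc7 {
		base := 178 * int(c0-0x81)
		switch {
		case 0x41 <= c1 && c1 < 0x5b:
			return base + int(c1-0x41) + 0*26, true
		case 0x61 <= c1 && c1 < 0x7b:
			return base + int(c1-0x61) + 1*26, true
		case 0x81 <= c1 && c1 < 0xff:
			return base + int(c1-0x81) + 2*26, true
		}
		return 0, false
	}
	if 0xa1 <= c1 && c1 < 0xff {
		return split + int(c0-0xc7)*94 + int(c1-0xa1), true
	}
	return 0, false
}

// indexToEUCKR is the inverse mapping, following the generator's
// computation of c0 and c1 from a table index.
func indexToEUCKR(x int) (c0, c1 byte) {
	if x < split {
		c0 = byte(x/178) + 0x81
		c1 = byte(x % 178)
		switch {
		case c1 < 1*26:
			c1 += 0x41
		case c1 < 2*26:
			c1 += 0x47 // 0x61 - 26
		default:
			c1 += 0x4d // 0x81 - 52
		}
		return c0, c1
	}
	x -= split
	return byte(x/94) + 0xc7, byte(x%94) + 0xa1
}

func main() {
	for _, p := range [][2]byte{{0x81, 0x41}, {0x81, 0x61}, {0x81, 0x81}, {0xc7, 0xa1}, {0xb0, 0x40}} {
		i, ok := euckrToIndex(p[0], p[1])
		fmt.Printf("%#02x %#02x -> index %d (ok=%v)\n", p[0], p[1], i, ok)
	}
}
```

The pair 0xb0 0x40 is rejected here, which matches the `"\xb0\x40"` case in TestNonRepertoire decoding to U+FFFD followed by `@`.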
143
vendor/golang.org/x/text/encoding/korean/maketables.go
generated
vendored
Normal file
@@ -0,0 +1,143 @@
// Copyright 2013 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
// +build ignore
|
||||
|
||||
package main
|
||||
|
||||
// This program generates tables.go:
|
||||
// go run maketables.go | gofmt > tables.go
|
||||
|
||||
import (
|
||||
"bufio"
|
||||
"fmt"
|
||||
"log"
|
||||
"net/http"
|
||||
"sort"
|
||||
"strings"
|
||||
)
|
||||
|
||||
func main() {
|
||||
fmt.Printf("// generated by go run maketables.go; DO NOT EDIT\n\n")
|
||||
fmt.Printf("// Package korean provides Korean encodings such as EUC-KR.\n")
|
||||
fmt.Printf(`package korean // import "golang.org/x/text/encoding/korean"` + "\n\n")
|
||||
|
||||
res, err := http.Get("http://encoding.spec.whatwg.org/index-euc-kr.txt")
|
||||
if err != nil {
|
||||
log.Fatalf("Get: %v", err)
|
||||
}
|
||||
defer res.Body.Close()
|
||||
|
||||
mapping := [65536]uint16{}
|
||||
reverse := [65536]uint16{}
|
||||
|
||||
scanner := bufio.NewScanner(res.Body)
|
||||
for scanner.Scan() {
|
||||
s := strings.TrimSpace(scanner.Text())
|
||||
if s == "" || s[0] == '#' {
|
||||
continue
|
||||
}
|
||||
x, y := uint16(0), uint16(0)
|
||||
if _, err := fmt.Sscanf(s, "%d 0x%x", &x, &y); err != nil {
|
||||
log.Fatalf("could not parse %q", s)
|
||||
}
|
||||
if 178*(0xc7-0x81)+(0xfe-0xc7)*94+(0xff-0xa1) <= x { // x is unsigned, so only the upper bound needs checking
|
||||
log.Fatalf("EUC-KR code %d is out of range", x)
|
||||
}
|
||||
mapping[x] = y
|
||||
if reverse[y] == 0 {
|
||||
c0, c1 := uint16(0), uint16(0)
|
||||
if x < 178*(0xc7-0x81) {
|
||||
c0 = uint16(x/178) + 0x81
|
||||
c1 = uint16(x % 178)
|
||||
switch {
|
||||
case c1 < 1*26:
|
||||
c1 += 0x41
|
||||
case c1 < 2*26:
|
||||
c1 += 0x47
|
||||
default:
|
||||
c1 += 0x4d
|
||||
}
|
||||
} else {
|
||||
x -= 178 * (0xc7 - 0x81)
|
||||
c0 = uint16(x/94) + 0xc7
|
||||
c1 = uint16(x%94) + 0xa1
|
||||
}
|
||||
reverse[y] = c0<<8 | c1
|
||||
}
|
||||
}
|
||||
if err := scanner.Err(); err != nil {
|
||||
log.Fatalf("scanner error: %v", err)
|
||||
}
|
||||
|
||||
fmt.Printf("// decode is the decoding table from EUC-KR code to Unicode.\n")
|
||||
fmt.Printf("// It is defined at http://encoding.spec.whatwg.org/index-euc-kr.txt\n")
|
||||
fmt.Printf("var decode = [...]uint16{\n")
|
||||
for i, v := range mapping {
|
||||
if v != 0 {
|
||||
fmt.Printf("\t%d: 0x%04X,\n", i, v)
|
||||
}
|
||||
}
|
||||
fmt.Printf("}\n\n")
|
||||
|
||||
// Any run of at least separation contiguous zero entries in the reverse map
// splits the surrounding non-zero entries into separate encode tables.
|
||||
const separation = 1024
|
||||
|
||||
intervals := []interval(nil)
|
||||
low, high := -1, -1
|
||||
for i, v := range reverse {
|
||||
if v == 0 {
|
||||
continue
|
||||
}
|
||||
if low < 0 {
|
||||
low = i
|
||||
} else if i-high >= separation {
|
||||
if high >= 0 {
|
||||
intervals = append(intervals, interval{low, high})
|
||||
}
|
||||
low = i
|
||||
}
|
||||
high = i + 1
|
||||
}
|
||||
if high >= 0 {
|
||||
intervals = append(intervals, interval{low, high})
|
||||
}
|
||||
sort.Sort(byDecreasingLength(intervals))
|
||||
|
||||
fmt.Printf("const numEncodeTables = %d\n\n", len(intervals))
|
||||
fmt.Printf("// encodeX are the encoding tables from Unicode to EUC-KR code,\n")
|
||||
fmt.Printf("// sorted by decreasing length.\n")
|
||||
for i, v := range intervals {
|
||||
fmt.Printf("// encode%d: %5d entries for runes in [%5d, %5d).\n", i, v.len(), v.low, v.high)
|
||||
}
|
||||
fmt.Printf("\n")
|
||||
|
||||
for i, v := range intervals {
|
||||
fmt.Printf("const encode%dLow, encode%dHigh = %d, %d\n\n", i, i, v.low, v.high)
|
||||
fmt.Printf("var encode%d = [...]uint16{\n", i)
|
||||
for j := v.low; j < v.high; j++ {
|
||||
x := reverse[j]
|
||||
if x == 0 {
|
||||
continue
|
||||
}
|
||||
fmt.Printf("\t%d-%d: 0x%04X,\n", j, v.low, x)
|
||||
}
|
||||
fmt.Printf("}\n\n")
|
||||
}
|
||||
}
|
||||
|
||||
// interval is a half-open interval [low, high).
type interval struct {
	low, high int
}

func (i interval) len() int { return i.high - i.low }

// byDecreasingLength sorts intervals by decreasing length.
type byDecreasingLength []interval

func (b byDecreasingLength) Len() int           { return len(b) }
func (b byDecreasingLength) Less(i, j int) bool { return b[i].len() > b[j].len() }
func (b byDecreasingLength) Swap(i, j int)      { b[i], b[j] = b[j], b[i] }
|
||||
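The table-splitting loop in maketables.go above can be tried on a small input. The following is a minimal sketch under stated assumptions: `splitIntervals` is a hypothetical helper that wraps the generator's loop, and it uses `sort.Slice` instead of the `byDecreasingLength` type for brevity. A gap of at least `separation` zero entries starts a new half-open interval, and the intervals are then ordered longest first.

```go
package main

import (
	"fmt"
	"sort"
)

// interval is a half-open interval [low, high).
type interval struct{ low, high int }

func (i interval) len() int { return i.high - i.low }

// splitIntervals groups the non-zero entries of table into intervals,
// starting a new interval whenever at least separation zero entries
// lie between two non-zero entries.
func splitIntervals(table []uint16, separation int) []interval {
	var intervals []interval
	low, high := -1, -1
	for i, v := range table {
		if v == 0 {
			continue
		}
		if low < 0 {
			low = i // first non-zero entry
		} else if i-high >= separation {
			intervals = append(intervals, interval{low, high})
			low = i
		}
		high = i + 1
	}
	if high >= 0 {
		intervals = append(intervals, interval{low, high})
	}
	sort.Slice(intervals, func(a, b int) bool {
		return intervals[a].len() > intervals[b].len()
	})
	return intervals
}

func main() {
	table := make([]uint16, 5000)
	table[10], table[20], table[3000] = 1, 1, 1
	for i, v := range splitIntervals(table, 1024) {
		fmt.Printf("encode%d: %d entries for indexes in [%d, %d)\n", i, v.len(), v.low, v.high)
	}
}
```

Each resulting interval becomes one `encodeN` table, which is why the encoder's switch has one case per table and `init` checks `numEncodeTables`.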
34152
vendor/golang.org/x/text/encoding/korean/tables.go
generated
vendored
Normal file
File diff suppressed because it is too large
12
vendor/golang.org/x/text/encoding/simplifiedchinese/all.go
generated
vendored
Normal file
@@ -0,0 +1,12 @@
// Copyright 2015 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package simplifiedchinese

import (
	"golang.org/x/text/encoding"
)

// All is a list of all defined encodings in this package.
var All = []encoding.Encoding{GB18030, GBK, HZGB2312}
143
vendor/golang.org/x/text/encoding/simplifiedchinese/all_test.go
generated
vendored
Normal file
@@ -0,0 +1,143 @@
// Copyright 2015 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package simplifiedchinese
|
||||
|
||||
import (
|
||||
"strings"
|
||||
"testing"
|
||||
|
||||
"golang.org/x/text/encoding"
|
||||
"golang.org/x/text/encoding/internal"
|
||||
"golang.org/x/text/encoding/internal/enctest"
|
||||
"golang.org/x/text/transform"
|
||||
)
|
||||
|
||||
func dec(e encoding.Encoding) (dir string, t transform.Transformer, err error) {
|
||||
return "Decode", e.NewDecoder(), nil
|
||||
}
|
||||
func enc(e encoding.Encoding) (dir string, t transform.Transformer, err error) {
|
||||
return "Encode", e.NewEncoder(), internal.ErrASCIIReplacement
|
||||
}
|
||||
|
||||
func TestNonRepertoire(t *testing.T) {
|
||||
// Pick n large enough to overflow the destination buffer of transform.String.
|
||||
const n = 10000
|
||||
testCases := []struct {
|
||||
init func(e encoding.Encoding) (string, transform.Transformer, error)
|
||||
e encoding.Encoding
|
||||
src, want string
|
||||
}{
|
||||
{dec, GBK, "a\xfe\xfeb", "a\ufffdb"},
|
||||
{dec, HZGB2312, "~{z~", "\ufffd"},
|
||||
|
||||
{enc, GBK, "갂", ""},
|
||||
{enc, GBK, "a갂", "a"},
|
||||
{enc, GBK, "\u4e02갂", "\x81@"},
|
||||
|
||||
{enc, HZGB2312, "갂", ""},
|
||||
{enc, HZGB2312, "a갂", "a"},
|
||||
{enc, HZGB2312, "\u6cf5갂", "~{1C~}"},
|
||||
|
||||
{dec, GB18030, "\x80", "€"},
|
||||
{dec, GB18030, "\x81", "\ufffd"},
|
||||
{dec, GB18030, "\x81\x20", "\ufffd "},
|
||||
{dec, GB18030, "\xfe\xfe", "\ufffd"},
|
||||
{dec, GB18030, "\xfe\xff", "\ufffd\ufffd"},
|
||||
{dec, GB18030, "\xfe\x30", "\ufffd0"},
|
||||
{dec, GB18030, "\xfe\x30\x30 ", "\ufffd00 "},
|
||||
{dec, GB18030, "\xfe\x30\xff ", "\ufffd0\ufffd "},
|
||||
{dec, GB18030, "\xfe\x30\x81\x21", "\ufffd0\ufffd!"},
|
||||
|
||||
{dec, GB18030, strings.Repeat("\xfe\x30", n), strings.Repeat("\ufffd0", n)},
|
||||
|
||||
{dec, HZGB2312, "~/", "\ufffd"},
|
||||
{dec, HZGB2312, "~{a\x80", "\ufffd"},
|
||||
{dec, HZGB2312, "~{a\x80", "\ufffd"},
|
||||
{dec, HZGB2312, "~{" + strings.Repeat("z~", n), strings.Repeat("\ufffd", n)},
|
||||
{dec, HZGB2312, "~{" + strings.Repeat("\xfe\x30", n), strings.Repeat("\ufffd", n*2)},
|
||||
}
|
||||
for _, tc := range testCases {
|
||||
dir, tr, wantErr := tc.init(tc.e)
|
||||
|
||||
dst, _, err := transform.String(tr, tc.src)
|
||||
if err != wantErr {
|
||||
t.Errorf("%s %v(%q): got %v; want %v", dir, tc.e, tc.src, err, wantErr)
|
||||
}
|
||||
if got := string(dst); got != tc.want {
|
||||
t.Errorf("%s %v(%q):\ngot %q\nwant %q", dir, tc.e, tc.src, got, tc.want)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestBasics(t *testing.T) {
|
||||
// The encoded forms can be verified by the iconv program:
// $ echo 花间一壶酒 | iconv -f UTF-8 -t GBK | xxd
|
||||
testCases := []struct {
|
||||
e encoding.Encoding
|
||||
encPrefix string
|
||||
encoded string
|
||||
utf8 string
|
||||
}{{
|
||||
// "\u0081\u00de\u00df\u00e0\u00e1\u00e2\u00e3\uffff\U00010000" is a
|
||||
// nonsense string that contains GB18030 encodable codepoints of which
|
||||
// only U+00E0 and U+00E1 are GBK encodable.
|
||||
//
|
||||
// "A\u3000\u554a\u4e02\u4e90\u72dc\u7349\u02ca\u2588Z€" is a nonsense
|
||||
// string that contains ASCII and GBK encodable codepoints from Levels
|
||||
// 1-5 as well as the Euro sign.
|
||||
//
|
||||
// "A\u43f0\u4c32\U00027267\u3000\U0002910d\u79d4Z€" is a nonsense string
|
||||
// that contains ASCII and Big5 encodable codepoints from the Basic
|
||||
// Multilingual Plane and the Supplementary Ideographic Plane as well as
|
||||
// the Euro sign.
|
||||
//
|
||||
// "花间一壶酒,独酌无相亲。" (simplified) and
|
||||
// "花間一壺酒,獨酌無相親。" (traditional)
|
||||
// are from the 8th century poem "Yuè Xià Dú Zhuó".
|
||||
e: GB18030,
|
||||
encoded: "\x81\x30\x81\x31\x81\x30\x89\x37\x81\x30\x89\x38\xa8\xa4\xa8\xa2" +
|
||||
"\x81\x30\x89\x39\x81\x30\x8a\x30\x84\x31\xa4\x39\x90\x30\x81\x30",
|
||||
utf8: "\u0081\u00de\u00df\u00e0\u00e1\u00e2\u00e3\uffff\U00010000",
|
||||
}, {
|
||||
e: GB18030,
|
||||
encoded: "\xbb\xa8\xbc\xe4\xd2\xbb\xba\xf8\xbe\xc6\xa3\xac\xb6\xc0\xd7\xc3" +
|
||||
"\xce\xde\xcf\xe0\xc7\xd7\xa1\xa3",
|
||||
utf8: "花间一壶酒,独酌无相亲。",
|
||||
}, {
|
||||
e: GBK,
|
||||
encoded: "A\xa1\xa1\xb0\xa1\x81\x40\x81\x80\xaa\x40\xaa\x80\xa8\x40\xa8\x80Z\x80",
|
||||
utf8: "A\u3000\u554a\u4e02\u4e90\u72dc\u7349\u02ca\u2588Z€",
|
||||
}, {
|
||||
e: GBK,
|
||||
encoded: "\xbb\xa8\xbc\xe4\xd2\xbb\xba\xf8\xbe\xc6\xa3\xac\xb6\xc0\xd7\xc3" +
|
||||
"\xce\xde\xcf\xe0\xc7\xd7\xa1\xa3",
|
||||
utf8: "花间一壶酒,独酌无相亲。",
|
||||
}, {
|
||||
e: HZGB2312,
|
||||
encoded: "A~{\x21\x21~~\x30\x21~}Z~~",
|
||||
utf8: "A\u3000~\u554aZ~",
|
||||
}, {
|
||||
e: HZGB2312,
|
||||
encPrefix: "~{",
|
||||
encoded: ";(<dR;:x>F#,6@WCN^O`GW!#",
|
||||
utf8: "花间一壶酒,独酌无相亲。",
|
||||
}}
|
||||
|
||||
for _, tc := range testCases {
|
||||
enctest.TestEncoding(t, tc.e, tc.encoded, tc.utf8, tc.encPrefix, "")
|
||||
}
|
||||
}
|
||||
|
||||
func TestFiles(t *testing.T) {
|
||||
enctest.TestFile(t, GB18030)
|
||||
enctest.TestFile(t, GBK)
|
||||
enctest.TestFile(t, HZGB2312)
|
||||
}
|
||||
|
||||
func BenchmarkEncoding(b *testing.B) {
|
||||
enctest.Benchmark(b, GB18030)
|
||||
enctest.Benchmark(b, GBK)
|
||||
enctest.Benchmark(b, HZGB2312)
|
||||
}
|
||||
269
vendor/golang.org/x/text/encoding/simplifiedchinese/gbk.go
generated
vendored
Normal file
@@ -0,0 +1,269 @@
// Copyright 2013 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package simplifiedchinese
|
||||
|
||||
import (
|
||||
"unicode/utf8"
|
||||
|
||||
"golang.org/x/text/encoding"
|
||||
"golang.org/x/text/encoding/internal"
|
||||
"golang.org/x/text/encoding/internal/identifier"
|
||||
"golang.org/x/text/transform"
|
||||
)
|
||||
|
||||
var (
|
||||
// GB18030 is the GB18030 encoding.
|
||||
GB18030 encoding.Encoding = &gbk18030
|
||||
// GBK is the GBK encoding. It encodes an extension of the GB2312 character set
|
||||
// and is also known as Code Page 936.
|
||||
GBK encoding.Encoding = &gbk
|
||||
)
|
||||
|
||||
var gbk = internal.Encoding{
|
||||
&internal.SimpleEncoding{
|
||||
gbkDecoder{gb18030: false},
|
||||
gbkEncoder{gb18030: false},
|
||||
},
|
||||
"GBK",
|
||||
identifier.GBK,
|
||||
}
|
||||
|
||||
var gbk18030 = internal.Encoding{
|
||||
&internal.SimpleEncoding{
|
||||
gbkDecoder{gb18030: true},
|
||||
gbkEncoder{gb18030: true},
|
||||
},
|
||||
"GB18030",
|
||||
identifier.GB18030,
|
||||
}
|
||||
|
||||
type gbkDecoder struct {
|
||||
transform.NopResetter
|
||||
gb18030 bool
|
||||
}
|
||||
|
||||
func (d gbkDecoder) Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error) {
|
||||
r, size := rune(0), 0
|
||||
loop:
|
||||
for ; nSrc < len(src); nSrc += size {
|
||||
switch c0 := src[nSrc]; {
|
||||
case c0 < utf8.RuneSelf:
|
||||
r, size = rune(c0), 1
|
||||
|
||||
// Microsoft's Code Page 936 extends GBK 1.0 to encode the euro sign U+20AC
|
||||
// as 0x80. The HTML5 specification at http://encoding.spec.whatwg.org/#gbk
|
||||
// says to treat "gbk" as Code Page 936.
|
||||
case c0 == 0x80:
|
||||
r, size = '€', 1
|
||||
|
||||
case c0 < 0xff:
|
||||
if nSrc+1 >= len(src) {
|
||||
if !atEOF {
|
||||
err = transform.ErrShortSrc
|
||||
break loop
|
||||
}
|
||||
r, size = utf8.RuneError, 1
|
||||
goto write
|
||||
}
|
||||
c1 := src[nSrc+1]
|
||||
switch {
|
||||
case 0x40 <= c1 && c1 < 0x7f:
|
||||
c1 -= 0x40
|
||||
case 0x80 <= c1 && c1 < 0xff:
|
||||
c1 -= 0x41
|
||||
case d.gb18030 && 0x30 <= c1 && c1 < 0x40:
|
||||
if nSrc+3 >= len(src) {
|
||||
if !atEOF {
|
||||
err = transform.ErrShortSrc
|
||||
break loop
|
||||
}
|
||||
// The second byte here is always ASCII, so we can set size
|
||||
// to 1 in all cases.
|
||||
r, size = utf8.RuneError, 1
|
||||
goto write
|
||||
}
|
||||
c2 := src[nSrc+2]
|
||||
if c2 < 0x81 || 0xff <= c2 {
|
||||
r, size = utf8.RuneError, 1
|
||||
goto write
|
||||
}
|
||||
c3 := src[nSrc+3]
|
||||
if c3 < 0x30 || 0x3a <= c3 {
|
||||
r, size = utf8.RuneError, 1
|
||||
goto write
|
||||
}
|
||||
size = 4
|
||||
r = ((rune(c0-0x81)*10+rune(c1-0x30))*126+rune(c2-0x81))*10 + rune(c3-0x30)
|
||||
if r < 39420 {
|
||||
i, j := 0, len(gb18030)
|
||||
for i < j {
|
||||
h := i + (j-i)/2
|
||||
if r >= rune(gb18030[h][0]) {
|
||||
i = h + 1
|
||||
} else {
|
||||
j = h
|
||||
}
|
||||
}
|
||||
dec := &gb18030[i-1]
|
||||
r += rune(dec[1]) - rune(dec[0])
|
||||
goto write
|
||||
}
|
||||
r -= 189000
|
||||
if 0 <= r && r < 0x100000 {
|
||||
r += 0x10000
|
||||
} else {
|
||||
r, size = utf8.RuneError, 1
|
||||
}
|
||||
goto write
|
||||
default:
|
||||
r, size = utf8.RuneError, 1
|
||||
goto write
|
||||
}
|
||||
r, size = '\ufffd', 2
|
||||
if i := int(c0-0x81)*190 + int(c1); i < len(decode) {
|
||||
r = rune(decode[i])
|
||||
if r == 0 {
|
||||
r = '\ufffd'
|
||||
}
|
||||
}
|
||||
|
||||
default:
|
||||
r, size = utf8.RuneError, 1
|
||||
}
|
||||
|
||||
write:
|
||||
if nDst+utf8.RuneLen(r) > len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
break loop
|
||||
}
|
||||
nDst += utf8.EncodeRune(dst[nDst:], r)
|
||||
}
|
||||
return nDst, nSrc, err
|
||||
}
|
||||
|
||||
type gbkEncoder struct {
|
||||
transform.NopResetter
|
||||
gb18030 bool
|
||||
}
|
||||
|
||||
func (e gbkEncoder) Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error) {
|
||||
r, r2, size := rune(0), rune(0), 0
|
||||
for ; nSrc < len(src); nSrc += size {
|
||||
r = rune(src[nSrc])
|
||||
|
||||
// Decode a 1-byte rune.
|
||||
if r < utf8.RuneSelf {
|
||||
size = 1
|
||||
|
||||
} else {
|
||||
// Decode a multi-byte rune.
|
||||
r, size = utf8.DecodeRune(src[nSrc:])
|
||||
if size == 1 {
|
||||
// All valid runes of size 1 (those below utf8.RuneSelf) were
|
||||
// handled above. We have invalid UTF-8 or we haven't seen the
|
||||
// full character yet.
|
||||
if !atEOF && !utf8.FullRune(src[nSrc:]) {
|
||||
err = transform.ErrShortSrc
|
||||
break
|
||||
}
|
||||
}
|
||||
|
||||
// func init checks that the switch covers all tables.
|
||||
switch {
|
||||
case encode0Low <= r && r < encode0High:
|
||||
if r2 = rune(encode0[r-encode0Low]); r2 != 0 {
|
||||
goto write2
|
||||
}
|
||||
case encode1Low <= r && r < encode1High:
|
||||
// Microsoft's Code Page 936 extends GBK 1.0 to encode the euro sign U+20AC
|
||||
// as 0x80. The HTML5 specification at http://encoding.spec.whatwg.org/#gbk
|
||||
// says to treat "gbk" as Code Page 936.
|
||||
if r == '€' {
|
||||
r = 0x80
|
||||
goto write1
|
||||
}
|
||||
if r2 = rune(encode1[r-encode1Low]); r2 != 0 {
|
||||
goto write2
|
||||
}
|
||||
case encode2Low <= r && r < encode2High:
|
||||
if r2 = rune(encode2[r-encode2Low]); r2 != 0 {
|
||||
goto write2
|
||||
}
|
||||
case encode3Low <= r && r < encode3High:
|
||||
if r2 = rune(encode3[r-encode3Low]); r2 != 0 {
|
||||
goto write2
|
||||
}
|
||||
case encode4Low <= r && r < encode4High:
|
||||
if r2 = rune(encode4[r-encode4Low]); r2 != 0 {
|
||||
goto write2
|
||||
}
|
||||
}
|
||||
|
||||
if e.gb18030 {
|
||||
if r < 0x10000 {
|
||||
i, j := 0, len(gb18030)
|
||||
for i < j {
|
||||
h := i + (j-i)/2
|
||||
if r >= rune(gb18030[h][1]) {
|
||||
i = h + 1
|
||||
} else {
|
||||
j = h
|
||||
}
|
||||
}
|
||||
dec := &gb18030[i-1]
|
||||
r += rune(dec[0]) - rune(dec[1])
|
||||
goto write4
|
||||
} else if r < 0x110000 {
|
||||
r += 189000 - 0x10000
|
||||
goto write4
|
||||
}
|
||||
}
|
||||
err = internal.ErrASCIIReplacement
|
||||
break
|
||||
}
|
||||
|
||||
write1:
|
||||
if nDst >= len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
break
|
||||
}
|
||||
dst[nDst] = uint8(r)
|
||||
nDst++
|
||||
continue
|
||||
|
||||
write2:
|
||||
if nDst+2 > len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
break
|
||||
}
|
||||
dst[nDst+0] = uint8(r2 >> 8)
|
||||
dst[nDst+1] = uint8(r2)
|
||||
nDst += 2
|
||||
continue
|
||||
|
||||
write4:
|
||||
if nDst+4 > len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
break
|
||||
}
|
||||
dst[nDst+3] = uint8(r%10 + 0x30)
|
||||
r /= 10
|
||||
dst[nDst+2] = uint8(r%126 + 0x81)
|
||||
r /= 126
|
||||
dst[nDst+1] = uint8(r%10 + 0x30)
|
||||
r /= 10
|
||||
dst[nDst+0] = uint8(r + 0x81)
|
||||
nDst += 4
|
||||
continue
|
||||
}
|
||||
return nDst, nSrc, err
|
||||
}
|
||||
|
||||
func init() {
	// Check that the hard-coded encode switch covers all tables.
	if numEncodeTables != 5 {
		panic("bad numEncodeTables")
	}
}
|
||||
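The four-byte branch of gbk.go above treats a GB18030 sequence as a mixed-radix number with digit ranges 126, 10, 126 and 10, and the `write4` step reverses that. The following is a minimal sketch of just this arithmetic. `gb18030Linear`, `gb18030Bytes` and `supplementaryToBytes` are hypothetical names, the inputs are assumed to be already range-checked, and the BMP range lookup through the `gb18030` table is left out.

```go
package main

import "fmt"

// gb18030Linear mirrors the decoder: it folds four bytes into one
// linear value. The bytes are assumed to be in their valid ranges
// (0x81-0xfe, 0x30-0x39, 0x81-0xfe, 0x30-0x39).
func gb18030Linear(c0, c1, c2, c3 byte) int {
	return ((int(c0-0x81)*10+int(c1-0x30))*126+int(c2-0x81))*10 + int(c3-0x30)
}

// gb18030Bytes mirrors the encoder's write4 step: it peels the digits
// off again, least significant first.
func gb18030Bytes(n int) [4]byte {
	var b [4]byte
	b[3] = byte(n%10 + 0x30)
	n /= 10
	b[2] = byte(n%126 + 0x81)
	n /= 126
	b[1] = byte(n%10 + 0x30)
	n /= 10
	b[0] = byte(n + 0x81)
	return b
}

// supplementaryToBytes encodes a rune in [0x10000, 0x110000) using the
// same fixed offset of 189000 as the encoder.
func supplementaryToBytes(r rune) [4]byte {
	return gb18030Bytes(int(r) + 189000 - 0x10000)
}

func main() {
	fmt.Println(gb18030Linear(0x81, 0x30, 0x81, 0x30)) // first four-byte sequence
	fmt.Println(gb18030Linear(0x90, 0x30, 0x81, 0x30)) // start of the supplementary planes
	fmt.Printf("% x\n", supplementaryToBytes(0x10000))
}
```

Linear values below 39420 cover the BMP ranges that the decoder resolves by binary search, while values from 189000 upward map directly to U+10000 and above.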
245
vendor/golang.org/x/text/encoding/simplifiedchinese/hzgb2312.go
generated
vendored
Normal file
@@ -0,0 +1,245 @@
// Copyright 2013 The Go Authors. All rights reserved.
|
||||
// Use of this source code is governed by a BSD-style
|
||||
// license that can be found in the LICENSE file.
|
||||
|
||||
package simplifiedchinese
|
||||
|
||||
import (
|
||||
"unicode/utf8"
|
||||
|
||||
"golang.org/x/text/encoding"
|
||||
"golang.org/x/text/encoding/internal"
|
||||
"golang.org/x/text/encoding/internal/identifier"
|
||||
"golang.org/x/text/transform"
|
||||
)
|
||||
|
||||
// HZGB2312 is the HZ-GB2312 encoding.
|
||||
var HZGB2312 encoding.Encoding = &hzGB2312
|
||||
|
||||
var hzGB2312 = internal.Encoding{
|
||||
internal.FuncEncoding{hzGB2312NewDecoder, hzGB2312NewEncoder},
|
||||
"HZ-GB2312",
|
||||
identifier.HZGB2312,
|
||||
}
|
||||
|
||||
func hzGB2312NewDecoder() transform.Transformer {
|
||||
return new(hzGB2312Decoder)
|
||||
}
|
||||
|
||||
func hzGB2312NewEncoder() transform.Transformer {
|
||||
return new(hzGB2312Encoder)
|
||||
}
|
||||
|
||||
const (
|
||||
asciiState = iota
|
||||
gbState
|
||||
)
|
||||
|
||||
type hzGB2312Decoder int
|
||||
|
||||
func (d *hzGB2312Decoder) Reset() {
|
||||
*d = asciiState
|
||||
}
|
||||
|
||||
func (d *hzGB2312Decoder) Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error) {
|
||||
r, size := rune(0), 0
|
||||
loop:
|
||||
for ; nSrc < len(src); nSrc += size {
|
||||
c0 := src[nSrc]
|
||||
if c0 >= utf8.RuneSelf {
|
||||
r, size = utf8.RuneError, 1
|
||||
goto write
|
||||
}
|
||||
|
||||
if c0 == '~' {
|
||||
if nSrc+1 >= len(src) {
|
||||
if !atEOF {
|
||||
err = transform.ErrShortSrc
|
||||
break loop
|
||||
}
|
||||
r = utf8.RuneError
|
||||
goto write
|
||||
}
|
||||
size = 2
|
||||
switch src[nSrc+1] {
|
||||
case '{':
|
||||
*d = gbState
|
||||
continue
|
||||
case '}':
|
||||
*d = asciiState
|
||||
continue
|
||||
case '~':
|
||||
if nDst >= len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
break loop
|
||||
}
|
||||
dst[nDst] = '~'
|
||||
nDst++
|
||||
continue
|
||||
case '\n':
|
||||
continue
|
||||
default:
|
||||
r = utf8.RuneError
|
||||
goto write
|
||||
}
|
||||
}
|
||||
|
||||
if *d == asciiState {
|
||||
r, size = rune(c0), 1
|
||||
} else {
|
||||
if nSrc+1 >= len(src) {
|
||||
if !atEOF {
|
||||
err = transform.ErrShortSrc
|
||||
break loop
|
||||
}
|
||||
r, size = utf8.RuneError, 1
|
||||
goto write
|
||||
}
|
||||
size = 2
|
||||
c1 := src[nSrc+1]
|
||||
if c0 < 0x21 || 0x7e <= c0 || c1 < 0x21 || 0x7f <= c1 {
|
||||
// error
|
||||
} else if i := int(c0-0x01)*190 + int(c1+0x3f); i < len(decode) {
|
||||
r = rune(decode[i])
|
||||
if r != 0 {
|
||||
goto write
|
||||
}
|
||||
}
|
||||
if c1 > utf8.RuneSelf {
|
||||
// Be consistent and always treat non-ASCII as a single error.
|
||||
size = 1
|
||||
}
|
||||
r = utf8.RuneError
|
||||
}
|
||||
|
||||
write:
|
||||
if nDst+utf8.RuneLen(r) > len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
break loop
|
||||
}
|
||||
nDst += utf8.EncodeRune(dst[nDst:], r)
|
||||
}
|
||||
return nDst, nSrc, err
|
||||
}
|
||||
|
||||
type hzGB2312Encoder int
|
||||
|
||||
func (d *hzGB2312Encoder) Reset() {
|
||||
*d = asciiState
|
||||
}
|
||||
|
||||
func (e *hzGB2312Encoder) Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error) {
|
||||
r, size := rune(0), 0
|
||||
for ; nSrc < len(src); nSrc += size {
|
||||
r = rune(src[nSrc])
|
||||
|
||||
// Decode a 1-byte rune.
|
||||
if r < utf8.RuneSelf {
|
||||
size = 1
|
||||
if r == '~' {
|
||||
if nDst+2 > len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
break
|
||||
}
|
||||
dst[nDst+0] = '~'
|
||||
dst[nDst+1] = '~'
|
||||
nDst += 2
|
||||
continue
|
||||
} else if *e != asciiState {
|
||||
if nDst+3 > len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
break
|
||||
}
|
||||
*e = asciiState
|
||||
dst[nDst+0] = '~'
|
||||
dst[nDst+1] = '}'
|
||||
nDst += 2
|
||||
} else if nDst >= len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
break
|
||||
}
|
||||
dst[nDst] = uint8(r)
|
||||
nDst += 1
|
||||
continue
|
||||
|
||||
}
|
||||
|
||||
// Decode a multi-byte rune.
|
||||
r, size = utf8.DecodeRune(src[nSrc:])
|
||||
if size == 1 {
|
||||
// All valid runes of size 1 (those below utf8.RuneSelf) were
|
||||
// handled above. We have invalid UTF-8 or we haven't seen the
|
||||
// full character yet.
|
||||
if !atEOF && !utf8.FullRune(src[nSrc:]) {
|
||||
err = transform.ErrShortSrc
|
||||
break
|
||||
}
|
||||
}
|
||||
|
||||
// func init checks that the switch covers all tables.
|
||||
switch {
|
||||
case encode0Low <= r && r < encode0High:
|
||||
if r = rune(encode0[r-encode0Low]); r != 0 {
|
||||
goto writeGB
|
||||
}
|
||||
case encode1Low <= r && r < encode1High:
|
||||
if r = rune(encode1[r-encode1Low]); r != 0 {
|
||||
goto writeGB
|
||||
}
|
||||
case encode2Low <= r && r < encode2High:
|
||||
if r = rune(encode2[r-encode2Low]); r != 0 {
|
||||
goto writeGB
|
||||
}
|
||||
case encode3Low <= r && r < encode3High:
|
||||
if r = rune(encode3[r-encode3Low]); r != 0 {
|
||||
goto writeGB
|
||||
}
|
||||
case encode4Low <= r && r < encode4High:
|
||||
if r = rune(encode4[r-encode4Low]); r != 0 {
|
||||
goto writeGB
|
||||
}
|
||||
}
|
||||
|
||||
terminateInASCIIState:
|
||||
// Switch back to ASCII state in case of error so that an ASCII
|
||||
// replacement character can be written in the correct state.
|
||||
if *e != asciiState {
|
||||
if nDst+2 > len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
break
|
||||
}
|
||||
dst[nDst+0] = '~'
|
||||
dst[nDst+1] = '}'
|
||||
nDst += 2
|
||||
}
|
||||
err = internal.ErrASCIIReplacement
|
||||
break
|
||||
|
||||
writeGB:
|
||||
c0 := uint8(r>>8) - 0x80
|
||||
c1 := uint8(r) - 0x80
|
||||
if c0 < 0x21 || 0x7e <= c0 || c1 < 0x21 || 0x7f <= c1 {
|
||||
goto terminateInASCIIState
|
||||
}
|
||||
if *e == asciiState {
|
||||
if nDst+4 > len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
break
|
||||
}
|
||||
*e = gbState
|
||||
dst[nDst+0] = '~'
|
||||
dst[nDst+1] = '{'
|
||||
nDst += 2
|
||||
} else if nDst+2 > len(dst) {
|
||||
err = transform.ErrShortDst
|
||||
break
|
||||
}
|
||||
dst[nDst+0] = c0
|
||||
dst[nDst+1] = c1
|
||||
nDst += 2
|
||||
continue
|
||||
}
|
||||
// TODO: should one always terminate in ASCII state to make it safe to
|
||||
// concatenate two HZ-GB2312-encoded strings?
|
||||
return nDst, nSrc, err
|
||||
}
|
||||
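The decoder above is driven by four two-byte escapes: `~{` enters GB mode, `~}` returns to ASCII mode, `~~` is a literal tilde, and `~` followed by a newline is consumed as a line continuation. The sketch below illustrates only that framing, without the table lookup or the `transform.Transformer` buffer handling. It is a simplified illustration under those assumptions; `segment` and `splitHZ` are hypothetical names that do not exist in the package.

```go
package main

import "fmt"

// segment is a hypothetical type for this sketch: a run of raw bytes plus
// the mode it was read in.
type segment struct {
	gb   bool   // true when the bytes appeared between ~{ and ~}
	data string // raw bytes of the run, escapes removed
}

// splitHZ walks HZ-framed input and splits it into ASCII and GB runs.
// It recognizes the same four escapes as the decoder: ~{ ~} ~~ and ~\n.
func splitHZ(src string) []segment {
	var out []segment
	var cur []byte
	gb := false
	flush := func() {
		if len(cur) > 0 {
			out = append(out, segment{gb, string(cur)})
			cur = nil
		}
	}
	for i := 0; i < len(src); i++ {
		c := src[i]
		if c != '~' || i+1 >= len(src) {
			// Ordinary byte. A lone trailing '~' is kept as-is here;
			// the real decoder reports utf8.RuneError for it at EOF.
			cur = append(cur, c)
			continue
		}
		i++ // consume the byte after '~'
		switch src[i] {
		case '{':
			flush()
			gb = true
		case '}':
			flush()
			gb = false
		case '~':
			cur = append(cur, '~')
		case '\n':
			// Line continuation: consumed, nothing emitted.
		default:
			// The real decoder emits utf8.RuneError for an unknown
			// escape; this sketch keeps the raw bytes instead.
			cur = append(cur, '~', src[i])
		}
	}
	flush()
	return out
}

func main() {
	for _, s := range splitHZ("ab~{<:~}c~~d") {
		fmt.Printf("gb=%v data=%q\n", s.gb, s.data)
	}
}
```

In the real decoder each pair of bytes in a GB run is then range-checked and mapped through the `decode` table.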
161
vendor/golang.org/x/text/encoding/simplifiedchinese/maketables.go
generated
vendored
Normal file
|
|
@ -0,0 +1,161 @@
// Copyright 2013 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

// +build ignore

package main

// This program generates tables.go:
//	go run maketables.go | gofmt > tables.go

import (
	"bufio"
	"fmt"
	"log"
	"net/http"
	"sort"
	"strings"
)

func main() {
	fmt.Printf("// generated by go run maketables.go; DO NOT EDIT\n\n")
	fmt.Printf("// Package simplifiedchinese provides Simplified Chinese encodings such as GBK.\n")
	fmt.Printf(`package simplifiedchinese // import "golang.org/x/text/encoding/simplifiedchinese"` + "\n\n")

	printGB18030()
	printGBK()
}

func printGB18030() {
	res, err := http.Get("http://encoding.spec.whatwg.org/index-gb18030.txt")
	if err != nil {
		log.Fatalf("Get: %v", err)
	}
	defer res.Body.Close()

	fmt.Printf("// gb18030 is the table from http://encoding.spec.whatwg.org/index-gb18030.txt\n")
	fmt.Printf("var gb18030 = [...][2]uint16{\n")
	scanner := bufio.NewScanner(res.Body)
	for scanner.Scan() {
		s := strings.TrimSpace(scanner.Text())
		if s == "" || s[0] == '#' {
			continue
		}
		x, y := uint32(0), uint32(0)
		if _, err := fmt.Sscanf(s, "%d 0x%x", &x, &y); err != nil {
			log.Fatalf("could not parse %q", s)
		}
		if x < 0x10000 && y < 0x10000 {
			fmt.Printf("\t{0x%04x, 0x%04x},\n", x, y)
		}
	}
	fmt.Printf("}\n\n")
}

func printGBK() {
	res, err := http.Get("http://encoding.spec.whatwg.org/index-gbk.txt")
	if err != nil {
		log.Fatalf("Get: %v", err)
	}
	defer res.Body.Close()

	mapping := [65536]uint16{}
	reverse := [65536]uint16{}

	scanner := bufio.NewScanner(res.Body)
	for scanner.Scan() {
		s := strings.TrimSpace(scanner.Text())
		if s == "" || s[0] == '#' {
			continue
		}
		x, y := uint16(0), uint16(0)
		if _, err := fmt.Sscanf(s, "%d 0x%x", &x, &y); err != nil {
			log.Fatalf("could not parse %q", s)
		}
		if x < 0 || 126*190 <= x {
			log.Fatalf("GBK code %d is out of range", x)
		}
		mapping[x] = y
		if reverse[y] == 0 {
			c0, c1 := x/190, x%190
			if c1 >= 0x3f {
				c1++
			}
			reverse[y] = (0x81+c0)<<8 | (0x40 + c1)
		}
	}
	if err := scanner.Err(); err != nil {
		log.Fatalf("scanner error: %v", err)
	}

	fmt.Printf("// decode is the decoding table from GBK code to Unicode.\n")
	fmt.Printf("// It is defined at http://encoding.spec.whatwg.org/index-gbk.txt\n")
	fmt.Printf("var decode = [...]uint16{\n")
	for i, v := range mapping {
		if v != 0 {
			fmt.Printf("\t%d: 0x%04X,\n", i, v)
		}
	}
	fmt.Printf("}\n\n")

	// Any run of at least separation continuous zero entries in the reverse map will
	// be a separate encode table.
	const separation = 1024

	intervals := []interval(nil)
	low, high := -1, -1
	for i, v := range reverse {
		if v == 0 {
			continue
		}
		if low < 0 {
			low = i
		} else if i-high >= separation {
			if high >= 0 {
				intervals = append(intervals, interval{low, high})
			}
			low = i
		}
		high = i + 1
	}
	if high >= 0 {
		intervals = append(intervals, interval{low, high})
	}
	sort.Sort(byDecreasingLength(intervals))

	fmt.Printf("const numEncodeTables = %d\n\n", len(intervals))
	fmt.Printf("// encodeX are the encoding tables from Unicode to GBK code,\n")
	fmt.Printf("// sorted by decreasing length.\n")
	for i, v := range intervals {
		fmt.Printf("// encode%d: %5d entries for runes in [%5d, %5d).\n", i, v.len(), v.low, v.high)
	}
	fmt.Printf("\n")

	for i, v := range intervals {
		fmt.Printf("const encode%dLow, encode%dHigh = %d, %d\n\n", i, i, v.low, v.high)
		fmt.Printf("var encode%d = [...]uint16{\n", i)
		for j := v.low; j < v.high; j++ {
			x := reverse[j]
			if x == 0 {
				continue
			}
			fmt.Printf("\t%d-%d: 0x%04X,\n", j, v.low, x)
		}
		fmt.Printf("}\n\n")
	}
}

// interval is a half-open interval [low, high).
type interval struct {
	low, high int
}

func (i interval) len() int { return i.high - i.low }

// byDecreasingLength sorts intervals by decreasing length.
type byDecreasingLength []interval

func (b byDecreasingLength) Len() int           { return len(b) }
func (b byDecreasingLength) Less(i, j int) bool { return b[i].len() > b[j].len() }
func (b byDecreasingLength) Swap(i, j int)      { b[i], b[j] = b[j], b[i] }
|
||||
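The generator above splits the sparse reverse map into separate encode tables wherever it finds a long enough run of zero entries, then orders the tables longest first. The sketch below isolates that step on a small in-memory table. `splitIntervals` is a hypothetical helper introduced for illustration, and `sort.SliceStable` is swapped in for the `byDecreasingLength` type to keep the example short.

```go
package main

import (
	"fmt"
	"sort"
)

// interval is a half-open interval [low, high), as in maketables.go.
type interval struct {
	low, high int
}

func (i interval) len() int { return i.high - i.low }

// splitIntervals is a hypothetical helper for this sketch. It groups the
// non-zero entries of table into intervals, closing the current interval
// whenever at least separation zero entries precede the next non-zero
// entry, and returns the intervals longest first.
func splitIntervals(table []uint16, separation int) []interval {
	var out []interval
	low, high := -1, -1
	for i, v := range table {
		if v == 0 {
			continue
		}
		if low < 0 {
			low = i
		} else if i-high >= separation {
			out = append(out, interval{low, high})
			low = i
		}
		high = i + 1
	}
	if high >= 0 {
		out = append(out, interval{low, high})
	}
	// sort.SliceStable stands in for sort.Sort(byDecreasingLength(...)).
	sort.SliceStable(out, func(a, b int) bool { return out[a].len() > out[b].len() })
	return out
}

func main() {
	table := make([]uint16, 20)
	for _, i := range []int{2, 3, 10, 11, 12} {
		table[i] = 1
	}
	for _, iv := range splitIntervals(table, 4) {
		fmt.Printf("[%d, %d) len %d\n", iv.low, iv.high, iv.len())
	}
}
```

With `separation = 1024`, as in the generator, the hard-coded five-way switch in the encoders depends on this step producing exactly five intervals, which is what the `numEncodeTables` check in `init` guards.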
43999
vendor/golang.org/x/text/encoding/simplifiedchinese/tables.go
generated
vendored
Normal file
File diff suppressed because it is too large
510
vendor/golang.org/x/text/encoding/testdata/candide-gb18030.txt
generated
vendored
Normal file
|
|
@ -0,0 +1,510 @@
This file was derived from
http://www.gutenberg.org/cache/epub/4650/pg4650.txt
--------

CANDIDE,

ou

L'OPTIMISME,

TRADUIT DE L'ALLEMAND

DE M. LE DOCTEUR RALPH,

AVEC LES ADDITIONS

QU'ON A TROUVÉES DANS LA POCHE DU DOCTEUR, LORSQU'IL MOURUT

À MINDEN, L'AN DE GRÂCE 1759

1759



CHAPITRE I.

Comment Candide fut élevé dans un beau château, et comment il fut
chassé d'icelui.

Il y avait en Vestphalie, dans le château de M. le baron de
Thunder-ten-tronckh, un jeune garçon à qui la nature avait donné
les moeurs les plus douces. Sa physionomie annonçait son âme.
Il avait le jugement assez droit, avec l'esprit le plus simple;
c'est, je crois, pour cette raison qu'on le nommait Candide. Les
anciens domestiques de la maison soupçonnaient qu'il était fils
de la soeur de monsieur le baron et d'un bon et honnête
gentilhomme du voisinage, que cette demoiselle ne voulut jamais
épouser parce qu'il n'avait pu prouver que soixante et onze
quartiers, et que le reste de son arbre généalogique avait été
perdu par l'injure du temps.

Monsieur le baron était un des plus puissants seigneurs de la
Westphalie, car son château avait une porte et des fenêtres. Sa
grande salle même était ornée d'une tapisserie. Tous les chiens
de ses basses-cours composaient une meute dans le besoin; ses
palefreniers étaient ses piqueurs; le vicaire du village était
son grand-aumônier. Ils l'appelaient tous monseigneur, et ils
riaient quand il fesait des contes.

Madame la baronne, qui pesait environ trois cent cinquante
livres, s'attirait par là une très grande considération, et
fesait les honneurs de la maison avec une dignité qui la rendait
encore plus respectable. Sa fille Cunégonde, âgée de dix-sept
ans, était haute en couleur, fraîche, grasse, appétissante. Le
fils du baron paraissait en tout digne de son père. Le
précepteur Pangloss[1] était l'oracle de la maison, et le petit
Candide écoutait ses leçons avec toute la bonne foi de son âge et
de son caractère.

[1] De _pan_, tout, et _glossa_, langue. B.


Pangloss enseignait la métaphysico-théologo-cosmolonigologie. Il
prouvait admirablement qu'il n'y a point d'effet sans cause, et
que, dans ce meilleur des mondes possibles, le château de
monseigneur le baron était le plus beau des châteaux, et madame
la meilleure des baronnes possibles.

Il est démontré, disait-il, que les choses ne peuvent être
autrement; car tout étant fait pour une fin, tout est
nécessairement pour la meilleure fin. Remarquez bien que les nez
ont été faits pour porter des lunettes; aussi avons-nous des
lunettes[2]. Les jambes sont visiblement instituées pour être
chaussées, et nous avons des chausses. Les pierres ont été
formées pour être taillées et pour en faire des châteaux; aussi
monseigneur a un très beau château: le plus grand baron de la
province doit être le mieux logé; et les cochons étant faits pour
être mangés, nous mangeons du porc toute l'année: par conséquent,
ceux qui ont avancé que tout est bien ont dit une sottise; il
fallait dire que tout est au mieux.

[2] Voyez tome XXVII, page 528; et dans les _Mélanges_, année
1738, le chapitre XI de la troisième partie des _Éléments de la
philosophie de Newton_; et année 1768, le chapitre X des
_Singularités de la nature_. B.


Candide écoutait attentivement, et croyait innocemment; car il
trouvait mademoiselle Cunégonde extrêmement belle, quoiqu'il ne
prît jamais la hardiesse de le lui dire. Il concluait qu'après
le bonheur d'être né baron de Thunder-ten-tronckh, le second
degré de bonheur était d'être mademoiselle Cunégonde; le
troisième, de la voir tous les jours; et le quatrième, d'entendre
maître Pangloss, le plus grand philosophe de la province, et par
conséquent de toute la terre.

Un jour Cunégonde, en se promenant auprès du château, dans le
petit bois qu'on appelait parc, vit entre des broussailles le
docteur Pangloss qui donnait une leçon de physique expérimentale
à la femme de chambre de sa mère, petite brune très jolie et très
docile. Comme mademoiselle Cunégonde avait beaucoup de
disposition pour les sciences, elle observa, sans souffler, les
expériences réitérées dont elle fut témoin; elle vit clairement
la raison suffisante du docteur, les effets et les causes, et
s'en retourna tout agitée, toute pensive, toute remplie du désir
d'être savante, songeant qu'elle pourrait bien être la raison
suffisante du jeune Candide, qui pouvait aussi être la sienne.

Elle rencontra Candide en revenant au château, et rougit: Candide
rougit aussi . Elle lui dit bonjour d'une voix entrecoupée; et
Candide lui parla sans savoir ce qu'il disait. Le lendemain,
après le dîner, comme on sortait de table, Cunégonde et Candide
se trouvèrent derrière un paravent; Cunégonde laissa tomber son
mouchoir, Candide le ramassa; elle lui prit innocemment la main;
le jeune homme baisa innocemment la main de la jeune demoiselle
avec une vivacité, une sensibilité, une grâce toute particulière;
leurs bouches se rencontrèrent, leurs yeux s'enflammèrent, leurs
genoux tremblèrent, leurs mains s'égarèrent. M. le baron de
Thunder-ten-tronckh passa auprès du paravent, et voyant cette
cause et cet effet, chassa Candide du château à grands coups de
pied dans le derrière. Cunégonde s'évanouit: elle fut souffletée
par madame la baronne dès qu'elle fut revenue à elle-même; et
tout fut consterné dans le plus beau et le plus agréable des
châteaux possibles.



CHAPITRE II

Ce que devint Candide parmi les Bulgares.


Candide, chassé du paradis terrestre, marcha longtemps sans
savoir où, pleurant, levant les yeux au ciel, les tournant
souvent vers le plus beau des châteaux qui renfermait la plus
belle des baronnettes; il se coucha sans souper au milieu des
champs entre deux sillons; la neige tombait à gros flocons.
Candide, tout transi, se traîna le lendemain vers la ville
voisine, qui s'appelle _Valdberghoff-trarbk-dikdorff_, n'ayant
point d'argent, mourant de faim et de lassitude. Il s'arrêta
tristement à la porte d'un cabaret. Deux hommes habillés de bleu
le remarquèrent: Camarade, dit l'un, voilà un jeune homme très
bien fait, et qui a la taille requise; ils s'avancèrent vers
Candide et le prièrent à dîner très civilement.--Messieurs, leur
dit Candide avec une modestie charmante, vous me faites beaucoup
d'honneur, mais je n'ai pas de quoi payer mon écot.--Ah!
monsieur, lui dit un des bleus, les personnes de votre figure et
de votre mérite ne paient jamais rien: n'avez-vous pas cinq pieds
cinq pouces de haut?--Oui, messieurs, c'est ma taille, dit-il en
fesant la révérence.--Ah! monsieur, mettez-vous à table; non
seulement nous vous défraierons, mais nous ne souffrirons jamais
qu'un homme comme vous manque d'argent; les hommes ne sont faits
que pour se secourir les uns les autres.--Vous avez raison, dit
Candide; c'est ce que M. Pangloss m'a toujours dit, et je vois
bien que tout est au mieux. On le prie d'accepter quelques écus,
il les prend et veut faire son billet; on n'en veut point, on se
met à table. N'aimez-vous pas tendrement?....--Oh! oui,
répond-il, j'aime tendrement mademoiselle Cunégonde.--Non, dit
l'un de ces messieurs, nous vous demandons si vous n'aimez pas
tendrement le roi des Bulgares?--Point du tout, dit-il, car je ne
l'ai jamais vu.--Comment! c'est le plus charmant des rois, et il
faut boire à sa santé.--Oh! très volontiers, messieurs. Et il
boit. C'en est assez, lui dit-on, vous voilà l'appui, le
soutien, le défenseur, le héros des Bulgares; votre fortune est
faite, et votre gloire est assurée. On lui met sur-le-champ les
fers aux pieds, et on le mène au régiment. On le fait tourner à
droite, à gauche, hausser la baguette, remettre la baguette,
coucher en joue, tirer, doubler le pas, et on lui donne trente
coups de bâton; le lendemain, il fait l'exercice un peu moins
mal, et il ne reçoit que vingt coups; le surlendemain, on ne lui
en donne que dix, et il est regardé par ses camarades comme un
prodige.

Candide, tout stupéfait, ne démêlait pas encore trop bien comment
il était un héros. Il s'avisa un beau jour de printemps de
s'aller promener, marchant tout droit devant lui, croyant que
c'était un privilège de l'espèce humaine, comme de l'espèce
animale, de se servir de ses jambes à son plaisir. Il n'eut pas
fait deux lieues que voilà quatre autres héros de six pieds qui
l'atteignent, qui le lient, qui le mènent dans un cachot. On lui
demanda juridiquement ce qu'il aimait le mieux d'être fustigé
trente-six fois par tout le régiment, ou de recevoir à-la-fois
douze balles de plomb dans la cervelle. Il eut beau dire que les
volontés sont libres, et qu'il ne voulait ni l'un ni l'autre, il
fallut faire un choix; il se détermina, en vertu du don de Dieu
qu'on nomme _liberté_, à passer trente-six fois par les
baguettes; il essuya deux promenades. Le régiment était composé
de deux mille hommes; cela lui composa quatre mille coups de
baguette, qui, depuis la nuque du cou jusqu'au cul, lui
découvrirent les muscles et les nerfs. Comme on allait procéder
à la troisième course, Candide, n'en pouvant plus, demanda en
grâce qu'on voulût bien avoir la bonté de lui casser la tête; il
obtint cette faveur; on lui bande les yeux; on le fait mettre à
genoux. Le roi des Bulgares passe dans ce moment, s'informe du
crime du patient; et comme ce roi avait un grand génie, il
comprit, par tout ce qu'il apprit de Candide, que c'était un
jeune métaphysicien fort ignorant des choses de ce monde, et il
lui accorda sa grâce avec une clémence qui sera louée dans tous
les journaux et dans tous les siècles. Un brave chirurgien
guérit Candide en trois semaines avec les émollients enseignés
par Dioscoride. Il avait déjà un peu de peau et pouvait marcher,
quand le roi des Bulgares livra bataille au roi des Abares.



CHAPITRE III.

Comment Candide se sauva d'entre les Bulgares, et ce qu'il
devint.


Rien n'était si beau, si leste, si brillant, si bien ordonné que
les deux armées. Les trompettes, les fifres, les hautbois, les
tambours, les canons; formaient une harmonie telle qu'il n'y en
eut jamais en enfer. Les canons renversèrent d'abord à peu près
six mille hommes de chaque côté; ensuite la mousqueterie ôta du
meilleur des mondes environ neuf à dix mille coquins qui en
infectaient la surface. La baïonnette fut aussi la raison
suffisante de la mort de quelques milliers d'hommes. Le tout
pouvait bien se monter à une trentaine de mille âmes. Candide,
qui tremblait comme un philosophe, se cacha du mieux qu'il put
pendant cette boucherie héroïque.

Enfin, tandis que les deux rois fesaient chanter des _Te Deum_,
chacun dans son camp, il prit le parti d'aller raisonner ailleurs
des effets et des causes. Il passa par-dessus des tas de morts
et de mourants, et gagna d'abord un village voisin; il était en
cendres: c'était un village abare que les Bulgares avaient brûlé,
selon les lois du droit public. Ici des vieillards criblés de
coups regardaient mourir leurs femmes égorgées, qui tenaient
leurs enfants à leurs mamelles sanglantes; là des filles
éventrées après avoir assouvi les besoins naturels de quelques
héros, rendaient les derniers soupirs; d'autres à demi brûlées
criaient qu'on achevât de leur donner la mort. Des cervelles
étaient répandues sur la terre à côté de bras et de jambes
coupés.

Candide s'enfuit au plus vite dans un autre village: il
appartenait à des Bulgares, et les héros abares l'avaient traité
de même. Candide, toujours marchant sur des membres palpitants
ou à travers des ruines, arriva enfin hors du théâtre de la
guerre, portant quelques petites provisions dans son bissac, et
n'oubliant jamais mademoiselle Cunégonde. Ses provisions lui
manquèrent quand il fut en Hollande; mais ayant entendu dire que
tout le monde était riche dans ce pays-là, et qu'on y était
chrétien, il ne douta pas qu'on ne le traitât aussi bien qu'il
l'avait été dans le château de M. le baron, avant qu'il en eût
été chassé pour les beaux yeux de mademoiselle Cunégonde.

Il demanda l'aumône à plusieurs graves personnages, qui lui
répondirent tous que, s'il continuait à faire ce métier, on
l'enfermerait dans une maison de correction pour lui apprendre à
vivre.

Il s'adressa ensuite à un homme qui venait de parler tout seul
une heure de suite sur la charité dans une grande assemblée. Cet
orateur le regardant de travers lui dit: Que venez-vous faire
ici? y êtes-vous pour la bonne cause? Il n'y a point d'effet sans
cause, répondit modestement Candide; tout est enchaîné
nécessairement et arrangé pour le mieux. Il a fallu que je fusse
chassé d'auprès de mademoiselle Cunégonde, que j'aie passé par
les baguettes, et il faut que je demande mon pain, jusqu'à ce que
je puisse en gagner; tout cela ne pouvait être autrement. Mon
ami, lui dit l'orateur, croyez-vous que le pape soit
l'antechrist? Je ne l'avais pas encore entendu dire, répondit
Candide: mais qu'il le soit, ou qu'il ne le soit pas, je manque
de pain. Tu ne mérites pas d'en manger, dit l'autre: va, coquin,
va, misérable, ne m'approche de ta vie. La femme de l'orateur
ayant mis la tête à la fenêtre, et avisant un homme qui doutait
que le pape fût antechrist, lui répandit sur le chef un
plein..... O ciel! à quel excès se porte le zèle de la religion
dans les dames!

Un homme qui n'avait point été baptisé, un bon anabaptiste, nommé
Jacques, vit la manière cruelle et ignominieuse dont on traitait
ainsi un de ses frères, un être à deux pieds sans plumes, qui
avait une âme; il l'amena chez lui, le nettoya, lui donna du pain
et de la bière, lui fit présent de deux florins, et voulut même
lui apprendre à travailler dans ses manufactures aux étoffes de
Perse qu'on fabrique en Hollande. Candide se prosternant presque
devant lui, s'écriait: Maître Pangloss me l'avait bien dit que
tout est au mieux dans ce monde, car je suis infiniment plus
touché de votre extrême générosité que de la dureté de ce
monsieur à manteau noir, et de madame son épouse.

Le lendemain, en se promenant, il rencontra un gueux tout couvert
de pustules, les yeux morts, le bout du nez rongé, la bouche de
travers, les dents noires, et parlant de la gorge, tourmenté
d'une toux violente, et crachant une dent à chaque effort.



CHAPITRE IV.
|
||||
|
||||
Comment Candide rencontra son ancien ma<6D>0Š6tre de philosophie, le
|
||||
docteur Pangloss, et ce qui en advint.
|
||||
|
||||
|
||||
Candide, plus ¨¦mu encore de compassion que d'horreur, donna ¨¤ cet
|
||||
¨¦pouvantable gueux les deux florins qu'il avait re<72>0Š4us de son
|
||||
honn¨ºte anabaptiste Jacques. Le fant<6E>0‹0me le regarda fixement,
|
||||
versa des larmes, et sauta ¨¤ son cou. Candide effray¨¦ recule.
|
||||
H¨¦las! dit le mis¨¦rable ¨¤ l'autre mis¨¦rable, ne reconnaissez-vous
|
||||
plus votre cher Pangloss? Qu'entends-je? vous, mon cher ma<6D>0Š6tre!
|
||||
vous, dans cet ¨¦tat horrible! quel malheur vous est-il donc
|
||||
arriv¨¦? pourquoi n'¨ºtes-vous plus dans le plus beau des ch<63>0‰9teaux?
|
||||
qu'est devenue mademoiselle Cun¨¦gonde, la perle des filles, le
|
||||
chef-d'oeuvre de la nature? Je n'en peux plus, dit Pangloss.
|
||||
Aussit<EFBFBD>0‹0t Candide le mena dans l'¨¦table de l'anabaptiste, o¨´ il
|
||||
lui fit manger un peu de pain; et quand Pangloss fut refait: Eh
|
||||
bien! lui dit-il, Cun¨¦gonde? Elle est morte, reprit l'autre.
|
||||
Candide s'¨¦vanouit ¨¤ ce mot: son ami rappela ses sens avec un peu
|
||||
de mauvais vinaigre qui se trouva par hasard dans l'¨¦table.
|
||||
Candide rouvre les yeux. Cun¨¦gonde est morte! Ah! meilleur des
|
||||
mondes, o¨´ ¨ºtes-vous? Mais de quelle maladie est-elle morte? ne
|
||||
serait-ce point de m'avoir vu chasser du beau ch<63>0‰9teau de monsieur
|
||||
son p¨¨re ¨¤ grands coups de pied? Non, dit Pangloss, elle a ¨¦t¨¦
|
||||
¨¦ventr¨¦e par des soldats bulgares, apr¨¨s avoir ¨¦t¨¦ viol¨¦e autant
|
||||
qu'on peut l'¨ºtre; ils ont cass¨¦ la t¨ºte ¨¤ monsieur le baron qui
|
||||
voulait la d¨¦fendre; madame la baronne a ¨¦t¨¦ coup¨¦e en morceaux;
|
||||
mon pauvre pupille trait¨¦ pr¨¦cis¨¦ment comme sa soeur; et quant au
|
||||
ch<EFBFBD>0‰9teau, il n'est pas rest¨¦ pierre sur pierre, pas une grange,
|
||||
pas un mouton, pas un canard, pas un arbre; mais nous avons ¨¦t¨¦
|
||||
bien veng¨¦s, car les Abares en ont fait autant dans une baronnie
|
||||
voisine qui appartenait ¨¤ un seigneur bulgare.
|
||||
|
||||
A ce discours, Candide s'¨¦vanouit encore; mais revenu ¨¤ soi, et
|
||||
ayant dit tout ce qu'il devait dire, il s'enquit de la cause et
|
||||
de l'effet, et de la raison suffisante qui avait mis Pangloss
|
||||
dans un si piteux ¨¦tat. H¨¦las! dit l'autre, c'est l'amour:
|
||||
l'amour, le consolateur du genre humain, le conservateur de
|
||||
l'univers, l'<27>0‰9me de tous les ¨ºtres sensibles, le tendre amour.
|
||||
H¨¦las! dit Candide, je l'ai connu cet amour, ce souverain des
|
||||
coeurs, cette <20>0‰9me de notre <20>0‰9me; il ne m'a jamais valu qu'un
|
||||
baiser et vingt coups de pied au cul. Comment cette belle cause
|
||||
a-t-elle pu produire en vous un effet si abominable?
|
||||
|
||||
Pangloss r¨¦pondit en ces termes: O mon cher Candide! vous avez
|
||||
connu Paquette, cette jolie suivante de notre auguste baronne:
|
||||
j'ai go<67>0‹4t¨¦ dans ses bras les d¨¦lices du paradis, qui ont produit
|
||||
ces tourments d'enfer dont vous me voyez d¨¦vor¨¦; elle en ¨¦tait
|
||||
infect¨¦e, elle en est peut-¨ºtre morte. Paquette tenait ce
|
||||
pr¨¦sent d'un cordelier tr¨¨s savant qui avait remont¨¦ ¨¤ la source,
|
||||
car il l'avait eu d'une vieille comtesse, qui l'avait re<72>0Š4u d'un
|
||||
capitaine de cavalerie, qui le devait ¨¤ une marquise, qui le
|
||||
tenait d'un page, qui l'avait re<72>0Š4u d'un j¨¦suite, qui, ¨¦tant
|
||||
novice, l'avait eu en droite ligne d'un des compagnons de
|
||||
Christophe Colomb. Pour moi, je ne le donnerai ¨¤ personne, car
|
||||
je me meurs.
|
||||
|
||||
O Pangloss! s'écria Candide, voilà une étrange généalogie!
n'est-ce pas le diable qui en fut la souche? Point du tout,
répliqua ce grand homme; c'était une chose indispensable dans le
meilleur des mondes, un ingrédient nécessaire; car si Colomb
n'avait pas attrapé dans une île de l'Amérique cette maladie[1]
qui empoisonne la source de la génération, qui souvent même
empêche la génération, et qui est évidemment l'opposé du grand
but de la nature, nous n'aurions ni le chocolat ni la cochenille;
il faut encore observer que jusqu'aujourd'hui, dans notre
continent, cette maladie nous est particulière, comme la
controverse. Les Turcs, les Indiens, les Persans, les Chinois,
les Siamois, les Japonais, ne la connaissent pas encore; mais il
y a une raison suffisante pour qu'ils la connaissent à leur tour
dans quelques siècles. En attendant elle a fait un merveilleux
progrès parmi nous, et surtout dans ces grandes armées composées
d'honnêtes stipendiaires bien élevés, qui décident du destin des
états; on peut assurer que, quand trente mille hommes combattent
en bataille rangée contre des troupes égales en nombre, il y a
environ vingt mille vérolés de chaque côté.
|
||||
|
||||
[1] Voyez tome XXXI, page 7. B.
|
||||
|
||||
|
||||
Voilà qui est admirable, dit Candide; mais il faut vous faire
guérir. Et comment le puis-je? dit Pangloss; je n'ai pas le sou,
mon ami, et dans toute l'étendue de ce globe on ne peut ni se
faire saigner, ni prendre un lavement sans payer, ou sans qu'il y
ait quelqu'un qui paie pour nous.
|
||||
|
||||
Ce dernier discours détermina Candide; il alla se jeter aux pieds
de son charitable anabaptiste Jacques, et lui fit une peinture si
touchante de l'état où son ami était réduit, que le bon-homme
n'hésita pas à recueillir le docteur Pangloss; il le fit guérir à
ses dépens. Pangloss, dans la cure, ne perdit qu'un oeil et une
oreille. Il écrivait bien, et savait parfaitement
l'arithmétique. L'anabaptiste Jacques en fit son teneur de
livres. Au bout de deux mois, étant obligé d'aller à Lisbonne
pour les affaires de son commerce, il mena dans son vaisseau ses
deux philosophes. Pangloss lui expliqua comment tout était on ne
peut mieux. Jacques n'était pas de cet avis. Il faut bien,
disait-il, que les hommes aient un peu corrompu la nature, car
ils ne sont point nés loups, et ils sont devenus loups. Dieu ne
leur a donné ni canons de vingt-quatre, ni baïonnettes, et ils se
sont fait des baïonnettes et des canons pour se détruire. Je
pourrais mettre en ligne de compte les banqueroutes, et la
justice qui s'empare des biens des banqueroutiers pour en
frustrer les créanciers. Tout cela était indispensable,
répliquait le docteur borgne, et les malheurs particuliers font
le bien général; de sorte que plus il y a de malheurs
particuliers, et plus tout est bien. Tandis qu'il raisonnait,
l'air s'obscurcit, les vents soufflèrent des quatre coins du
monde, et le vaisseau fut assailli de la plus horrible tempête, à
la vue du port de Lisbonne.
|
||||
|
||||
|
||||
CHAPITRE V.
|
||||
|
||||
Tempête, naufrage, tremblement de terre, et ce qui advint du
docteur Pangloss, de Candide, et de l'anabaptiste Jacques.
|
||||
|
||||
La moitié des passagers affaiblis, expirants de ces angoisses
inconcevables que le roulis d'un vaisseau porte dans les nerfs et
dans toutes les humeurs du corps agitées en sens contraires,
n'avait pas même la force de s'inquiéter du danger. L'autre
moitié jetait des cris et fesait des prières; les voiles étaient
déchirées, les mâts brisés, le vaisseau entr'ouvert. Travaillait
qui pouvait, personne ne s'entendait, personne ne commandait.
L'anabaptiste aidait un peu à la manoeuvre; il était sur le
tillac; un matelot furieux le frappe rudement et l'étend sur les
planches; mais du coup qu'il lui donna, il eut lui-même une si
violente secousse, qu'il tomba hors du vaisseau, la tête la
première. Il restait suspendu et accroché à une partie de mât
rompu. Le bon Jacques court à son secours, l'aide à remonter, et
de l'effort qu'il fait, il est précipité dans la mer à la vue du
matelot, qui le laissa périr sans daigner seulement le regarder.
Candide approche, voit son bienfaiteur qui reparaît un moment, et
qui est englouti pour jamais. Il veut se jeter après lui dans la
mer: le philosophe Pangloss l'en empêche, en lui prouvant que la
rade de Lisbonne avait été formée exprès pour que cet anabaptiste
s'y noyât. Tandis qu'il le prouvait _à priori_, le vaisseau
s'entr'ouvre, tout périt à la réserve de Pangloss, de Candide, et
de ce brutal de matelot qui avait noyé le vertueux anabaptiste;
le coquin nagea heureusement jusqu'au rivage, où Pangloss et
Candide furent portés sur une planche.
|
||||
|
||||
Quand ils furent revenus un peu à eux, ils marchèrent vers
Lisbonne; il leur restait quelque argent, avec lequel ils
espéraient se sauver de la faim après avoir échappé à la tempête.
|
||||
|
||||
A peine ont-ils mis le pied dans la ville, en pleurant la mort de
leur bienfaiteur, qu'ils sentent la terre trembler sous leurs
pas[1]; la mer s'élève en bouillonnant dans le port, et brise les
vaisseaux qui sont à l'ancre. Des tourbillons de flammes et de
cendres couvrent les rues et les places publiques; les maisons
s'écroulent, les toits sont renversés sur les fondements, et les
fondements se dispersent; trente mille habitants de tout âge et
de tout sexe sont écrasés sous des ruines. Le matelot disait en
sifflant et en jurant: il y aura quelque chose à gagner ici.
Quelle peut être la raison suffisante de ce phénomène? disait
Pangloss. Voici le dernier jour du monde! s'écriait Candide.
Le matelot court incontinent au milieu des débris, affronte la
mort pour trouver de l'argent, en trouve, s'en empare, s'enivre,
et ayant cuvé son vin, achète les faveurs de la première fille de
bonne volonté qu'il rencontre sur les ruines des maisons
détruites, et au milieu des mourants et des morts. Pangloss le
tirait cependant par la manche: Mon ami, lui disait-il, cela
n'est pas bien, vous manquez à la raison universelle, vous prenez
mal votre temps. Tête et sang, répondit l'autre, je suis matelot
et né à Batavia; j'ai marché quatre fois sur le crucifix dans
quatre voyages au Japon[2]; tu as bien trouvé ton homme avec ta
raison universelle!
|
||||
|
||||
|
||||
[1] Le tremblement de terre de Lisbonne est du 1er novembre 1755.
B.
|
||||
|
||||
[2] Voyez tome XVIII, page 470. B.
|
||||
|
||||
|
||||
Quelques éclats de pierre avaient blessé Candide; il était étendu
dans la rue et couvert de débris. Il disait à Pangloss: Hélas!
procure-moi un peu de vin et d'huile; je me meurs. Ce
tremblement de terre n'est pas une chose nouvelle, répondit
Pangloss; la ville de Lima éprouva les mêmes secousses en
Amérique l'année passée; mêmes causes, mêmes effets; il y a
certainement une traînée de soufre sous terre depuis Lima jusqu'à
Lisbonne. Rien n'est plus probable, dit Candide; mais, pour
Dieu, un peu d'huile et de vin. Comment probable? répliqua le
philosophe, je soutiens que la chose est démontrée. Candide
perdit connaissance, et Pangloss lui apporta un peu d'eau d'une
fontaine voisine.
|
||||
|
||||
Le lendemain, ayant trouvé quelques provisions de bouche en se
glissant à travers des décombres, ils réparèrent un peu leurs
forces. Ensuite ils travaillèrent comme les autres à soulager
les habitants échappés à la mort. Quelques citoyens, secourus
par eux, leur donnèrent un aussi bon dîner qu'on le pouvait dans
un tel désastre: il est vrai que le repas était triste; les
convives arrosaient leur pain de leurs larmes; mais Pangloss les
consola, en les assurant que les choses ne pouvaient être
autrement: Car, dit-il, tout ceci est ce qu'il y a de mieux; car
s'il y a un volcan à Lisbonne, il ne pouvait être ailleurs; car
il est impossible que les choses ne soient pas où elles sont, car
tout est bien.
|
||||
|
||||
Un petit homme noir, familier de l'inquisition, lequel était à
côté de lui, prit poliment la parole et dit: Apparemment que
monsieur ne croit pas au péché originel; car si tout est au
mieux, il n'y a donc eu ni chute ni punition.
|
||||
|
||||
Je demande très humblement pardon à votre excellence, répondit
Pangloss encore plus poliment, car la chute de l'homme et la
malédiction entraient nécessairement dans le meilleur des mondes
possibles. Monsieur ne croit donc pas à la liberté? dit le
familier. Votre excellence m'excusera, dit Pangloss; la liberté
peut subsister avec la nécessité absolue; car il était nécessaire
que nous fussions libres; car enfin la volonté déterminée......
Pangloss était au milieu de sa phrase, quand le familier fit un
signe de tête à son estafier qui lui servait à boire du vin de
Porto ou d'Oporto.
|
||||