Home  All Posts  Tags  About

stone

soft-engineering

Linxia Yao

personal site

welcome to my home ~


  1. Converting Chinese Characters to Pinyin

    Reference: http://pypinyin.mozillazg.com/zh_CN/master/_modules/pypinyin/core.html Code: from pypinyin import pinyin, lazy_pinyin, Style def to_pinyin(var_str): if isinstance(var_str, str): if var_str == "None": return "" else: …

    2019-12-22
    Text correction
    Read more »

  2. Huawei Coding Test Helper

    …

    2019-12-21
    Read more »

  3. Hybrid Attention for Chinese Character-Level Neural Machine Translation

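    Introduction: Proposes a novel machine translation model that applies attention at both the character level and the word level, built on bidirectional GRUs. There are two different attentions: one at the character level, whose input is the raw characters, and one at the word level, which can automatically compose words. With both attentions, the model combines character-level and word-level information at the same time. Experiments on a Chinese-English translation task show improved BLEU scores. Model: the encoder contains two separate bidirectional GRUs. One transforms the character embedding sequence $x_1$, … into the hidden state sequence $hc_1$, …; the other takes the word sequence representations $w_1$, …

    2019-12-20
    Paper reading
    Read more »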

  4. seq2seq

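    Data preprocessing: <PAD>: padding token. <EOS>: end-of-sentence marker on the decoder side. <UNK>: low-frequency words or words never seen before. <GO>: start-of-sentence marker on the decoder side. Model construction, encoder layer: define the input tensors, then embed the letters with tf.contrib.layers.embed_sequence, which embeds the input. Let's look at an example: suppose we have batch=2, s…

    2019-12-19
    Machine learning
    Read more »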
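
The four special tokens listed in the seq2seq excerpt above can be illustrated with a small, framework-free sketch. The helper names (`build_vocab`, `encode_target`, `pad_batch`) and the toy strings are assumptions made for this illustration, not code from the post.

```python
# Hypothetical helpers illustrating the special tokens; not the post's own code.
SPECIAL_TOKENS = ["<PAD>", "<UNK>", "<GO>", "<EOS>"]


def build_vocab(texts):
    """Map each character to an integer id, reserving ids 0-3 for special tokens."""
    chars = sorted({ch for text in texts for ch in text})
    return {token: idx for idx, token in enumerate(SPECIAL_TOKENS + chars)}


def encode_target(text, vocab):
    """Encode a decoder-side sequence as <GO> + characters + <EOS>."""
    unk = vocab["<UNK>"]
    ids = [vocab.get(ch, unk) for ch in text]  # unseen characters become <UNK>
    return [vocab["<GO>"]] + ids + [vocab["<EOS>"]]


def pad_batch(batch, pad_id):
    """Right-pad every sequence in the batch to the length of the longest one."""
    max_len = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in batch]


vocab = build_vocab(["abc", "ab"])
batch = [encode_target("abc", vocab), encode_target("az", vocab)]
padded = pad_batch(batch, vocab["<PAD>"])
print(padded)
```

Here "z" is not in the vocabulary, so it maps to `<UNK>`, and the shorter sequence is filled with `<PAD>` so that both rows have the same length.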

  5. Query and Output: Generating Words by Querying Distributed Word Representations for Paraphrase Generation

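    Introduction: Proposes a novel encoder-decoder framework called WEAN. The model generates words by querying distributed word representations, so that it captures the meaning of the corresponding words. Experiments are run on two paraphrase generation tasks. Key ideas: given a source text, the encoder compresses it into a dense vector representation, and the decoder generates the paraphrased text. To predict a word, the decoder uses its hidden output to query the word embeddings; all candidate words are considered, and the embedding that best matches the query is returned. The selected word becomes the predicted token, and its embedding is then used as the LSTM input at the next time step. After…

    2019-12-19
    Paper reading
    Read more »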
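
The querying step described in the WEAN excerpt above can be sketched without any deep-learning framework. The dot-product score, the toy embedding table, and the function name `query_embeddings` are assumptions for illustration; the paper's actual scoring function may differ.

```python
# Toy sketch: pick the candidate word whose embedding best matches a query vector.
# The dot-product score and all values below are illustrative assumptions.

def query_embeddings(query, embeddings):
    """Return the word whose embedding has the largest dot product with `query`."""
    def score(vector):
        return sum(q * v for q, v in zip(query, vector))
    return max(embeddings, key=lambda word: score(embeddings[word]))


embeddings = {
    "cat": [1.0, 0.0],
    "dog": [0.0, 1.0],
    "car": [-1.0, 0.5],
}
decoder_output = [0.2, 0.9]          # stands in for the decoder's hidden output
predicted = query_embeddings(decoder_output, embeddings)
next_input = embeddings[predicted]   # embedding fed to the next time step
print(predicted, next_input)
```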

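  6. Data Preprocessing

    Format cleanup with regular expressions. For the data format handling, see the file /Users/stone/PycharmProjects/ocr_Correction/data_process/fanti2jianti.py. Traditional-to-Simplified Chinese conversion; format conversion for the training and test sets. # -*- coding: utf8 -*- def test_opencc(): import opencc cc = opencc.OpenCC('t2s') print(cc.convert(u'Open Chinese Convert(OpenC…

    2019-12-16
    Text correction
    Read more »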

  7. Talent Development Strategy

    Introduction, partial content…

    2019-12-14
    Read more »

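  8. Pointer Networks

    Introduction: Obtained by simplifying the attention mechanism. A traditional seq2seq model cannot handle problems in which the output vocabulary changes with the length of the input sequence, such as finding a convex hull. In these problems the output is a subset of the input set, so the idea is to find a structure similar to pointers in a programming language, where each pointer corresponds to one element of the input sequence. The model can then operate on the input sequence directly, with no need to define a fixed output vocabulary…

    2019-11-20
    Machine learning
    Read more »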
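
The "pointer" idea in the excerpt above amounts to a softmax over input positions rather than over a fixed vocabulary. The sketch below is a simplification: it scores each position with a plain dot product, whereas the original Pointer Network paper uses a learned additive attention score. The function name and the toy vectors are assumptions for illustration.

```python
import math


def pointer_distribution(decoder_state, encoder_states):
    """Softmax over input positions; scores are plain dot products (an assumption)."""
    scores = [sum(d * e for d, e in zip(decoder_state, enc)) for enc in encoder_states]
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one vector per input element
decoder_state = [0.5, 2.0]
probs = pointer_distribution(decoder_state, encoder_states)
pointed_index = max(range(len(probs)), key=probs.__getitem__)
print(probs, pointed_index)
```

The distribution always has exactly as many entries as the input has elements, which is why no separate output vocabulary is needed.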

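  9. Finding All Subsets of a Number

    Problem: given an array, find all subsets of its elements, for example: 123=1,2,3,12,23,123. Code: # coding=utf-8 # Find all subsets of an array, 123=1,2,3,12,23,123 def find_subSet(s=[1, 2, 3]): # l: holds all the subsets l = [] len_l = len(s) win = 1 # single-element subsets for i in range(len_l): l.append(s[i]) while …

    2019-11-15
    编程之法 practice notes
    Read more »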
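
Note that the example 123=1,2,3,12,23,123 in the excerpt above omits 13, so it lists contiguous runs rather than every subset; the `win` variable in the excerpt suggests a sliding window, though the code is cut off before that can be confirmed. The sketch below shows both readings side by side. The function names are assumptions for illustration, not the post's own code.

```python
from itertools import combinations


def contiguous_runs(items):
    """All contiguous runs, ordered by window size, as in the 123 example."""
    n = len(items)
    return [items[i:i + win] for win in range(1, n + 1) for i in range(n - win + 1)]


def all_subsets(items):
    """All non-empty subsets, ordered by size (includes non-adjacent picks)."""
    return [list(c) for size in range(1, len(items) + 1)
            for c in combinations(items, size)]


print(contiguous_runs([1, 2, 3]))
print(all_subsets([1, 2, 3]))
```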

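  10. NLP Theoretical Foundations

    Materials: video course, Stage 6, Natural Language Processing, 1, word2vec. Environment: NLTK. Classic applications: sentiment analysis. Representing text features by element frequency: use a matrix to count how often each word occurs…

    2019-11-15
    NLP
    Read more »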
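
The frequency-matrix idea in the excerpt above can be sketched with the standard library alone. The post mentions NLTK; this sketch instead assumes simple whitespace tokenization to stay self-contained, and the function name `count_matrix` and the sample sentences are made up for illustration.

```python
from collections import Counter


def count_matrix(documents):
    """Return (vocabulary, matrix) where matrix[i][j] counts word j in document i."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)  # missing words count as 0
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


vocabulary, matrix = count_matrix(["good movie good plot", "bad movie"])
print(vocabulary)
print(matrix)
```

Each row is one document and each column one vocabulary word, so the rows can be used directly as feature vectors for a task such as sentiment analysis.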

  11. MNIST Classification

    Reference: https://tensorflow.google.cn/tutorials/keras/classification?hl=zh-cn Code: from __future__ import absolute_import, division, print_function, unicode_literals import tensorflow as tf from tensorflow import keras import numpy as np import matplotlib.py…

    2019-11-12
    Machine learning
    Read more »

  12. Karabiner-Elements

    Download: Karabiner-Elements https://xclient.info/s/keyboard-maestro.html#versions Tutorial: https://juejin.im/post/5a8ad7df51882528b640046e …

    2019-11-11
    Environment and tool setup
    Read more »

  13. Confusionset-guided Pointer Networks for Chinese Spelling Check

    Abstract: This paper proposes Confusionset-guided Pointer Networks for Chinese Spell Check (CSC) task. More concretely, our approach utilizes the off-the-shelf confusionset for guiding the character generation. To this end, our novel Seq2Seq model jointly…

    2019-11-09
    Paper reading
    Read more »

  14. Automatic Spelling Correction for Resource-Scarce Languages using Deep Learning

    Abstract: Spelling correction is a well-known task in Natural Language Processing (NLP). Automatic spelling correction is important for many NLP applications like web search engines, text summarization, sentiment analysis etc. Most approaches use parallel…

    2019-11-09
    Paper reading
    Read more »

  15. Adapting sequence models for sentence correction

    Abstract: In a controlled experiment of sequence-to-sequence approaches for the task of sentence correction, we find that character-based models are generally more effective than word-based models and models that encode subword information via con…

    2019-11-09
    Paper reading
    Read more »

  16. A multilayer convolutional encoder-decoder neural network for grammatical error correction

    Abstract: We improve automatic correction of grammatical, orthographic, and collocation errors in text using a multilayer convolutional encoder-decoder neural network. The network is initialized with embeddings that make use of character Ngram information…

    2019-11-09
    Paper reading
    Read more »

  17. A New Benchmark and Evaluation Schema for Chinese Typo Detection and Correction

    Abstract: Despite the vast amount of research related to Chinese typo detection, we still lack a publicly available benchmark dataset for evaluation. Furthermore, no precise evaluation schema for Chinese typo detection has been defined. In response to these p…

    2019-11-09
    Paper reading
    Read more »

  18. A Hybrid Approach to Automatic Corpus Generation for Chinese Spelling Check

    Abstract: Chinese spelling check (CSC) is a challenging yet meaningful task, which not only serves as a preprocessing in many natural language processing (NLP) applications, but also facilitates reading and understanding of running texts in peoples’ daily…

    2019-11-09
    Paper reading
    Read more »

  19. A Cost Efficient Approach to Correct OCR Errors in Large Document Collections

    Abstract: Word error rate of an OCR is often higher than its character error rate. This is specially true when OCRs are designed by recogni…

    2019-11-09
    Paper reading
    Read more »

  20. Neural Language Correction with Character-Based Attention

    Abstract: Natural language correction has the potential to help language learners improve their writing skills. While approaches with separate classifiers for different…

    2019-11-08
    Paper reading
    Read more »


← Newer 3 / 15 Older →
  • Weibo
  • Github
  • Twitter
  • RSS
  • Email

Copyright © Linxia Yao 2020 Theme by leopardpan
