Class StanfordParser::StandoffNode
In: lib/stanfordparser.rb
Parent: Treebank::Node

Standoff syntactic tree annotation of text. Terminal nodes are labeled with the appropriate StandoffToken objects. Standoff parses can reproduce the original string from which they were generated verbatim, optionally with brackets around the yields of specified non-terminal nodes.

Methods

new  to_bracketed_string  to_original_string

Public Class methods

Create the standoff tree from a tree returned by the Stanford parser. For non-terminal nodes, the tokens argument is a StandoffSentence containing the StandoffToken objects for every token beneath this node and every token that follows it in the sentence; tokens are removed from this shared list, left to right, as terminal nodes are created. For terminal nodes, the tokens argument is a single StandoffToken.

# File lib/stanfordparser.rb, line 365
    def initialize(stanford_parser_node, tokens)
      # Annotate this node with a non-terminal label or a StandoffToken as
      # appropriate.
      super(tokens.instance_of?(StandoffSentence) ?
            stanford_parser_node.value : tokens)
      # Enumerate the children depth-first.  Tokens are removed from the list
      # left-to-right as terminal nodes are added to the tree.
      stanford_parser_node.children.each do |child|
        subtree = self.class.new(child, child.leaf? ? tokens.shift : tokens)
        attach_child!(subtree)
      end
    end
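The token hand-off in the constructor can be hard to follow from the source alone, so here is a minimal, self-contained sketch of the same idea. MockParseNode and SketchNode are hypothetical stand-ins invented for this illustration (they are not part of StanfordParser or Treebank), and a plain Array of strings stands in for the StandoffSentence of StandoffToken objects.

```ruby
# Hypothetical stand-in for a parser tree node; not the real parser API.
MockParseNode = Struct.new(:value, :children) do
  def leaf?
    children.empty?
  end
end

# Simplified tree node labeled with either a category name or a token.
class SketchNode
  attr_reader :label, :children

  def initialize(parse_node, tokens)
    # Non-terminals receive the shared Array; terminals receive one token.
    @label = tokens.is_a?(Array) ? parse_node.value : tokens
    @children = parse_node.children.map do |child|
      # shift removes the leftmost remaining token from the shared list.
      self.class.new(child, child.leaf? ? tokens.shift : tokens)
    end
  end

  # Terminal nodes in left-to-right order.
  def leaves
    children.empty? ? [self] : children.flat_map(&:leaves)
  end
end

leaf = ->(word) { MockParseNode.new(word, []) }
parse = MockParseNode.new("S", [
  MockParseNode.new("NP", [leaf.call("dogs")]),
  MockParseNode.new("VP", [leaf.call("bark")])
])

tokens = ["Dogs", "bark"] # strings stand in for StandoffToken objects
tree = SketchNode.new(parse, tokens)

p tree.leaves.map(&:label) # => ["Dogs", "bark"]
p tokens                   # => []
```

In this sketch the leaves end up labeled with the tokens rather than with the parser's own leaf values, and the shared list is empty once the whole tree has been built.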

Public Instance methods

Return the original string with brackets around the word spans dominated by the specified constituents.

The constituents to bracket are specified by passing a list of node coordinates, which are arrays of integers of the form returned by the tree enumerators of Treebank::Node objects.

coords: the coordinates of the nodes around which to place brackets
open: the open bracket symbol
close: the close bracket symbol

# File lib/stanfordparser.rb, line 395
    def to_bracketed_string(coords, open = "[", close = "]")
      # Get a list of all the leaf nodes and their coordinates.
      items = depth_first_enumerator(true).find_all {|n| n.first.leaf?}
      # Enumerate over all the matching constituents inserting open and close
      # brackets around their yields in the items list.
      coords.each do |matching|
        # Insert using a simple state machine with three states: :start,
        # :open, and :close.
        state = :start
        # Enumerate over the items list looking for nodes that are the
        # children of the matching constituent.
        items.each_with_index do |item, index|
          # Skip inserted bracket characters.
          next if item.is_a? String
          # Handle terminal node items with the state machine.
          node, terminal_coordinate = item
          if state == :start
            next if not in_yield?(matching, terminal_coordinate)
            items.insert(index, open)
            state = :open
          else # state == :open
            next if in_yield?(matching, terminal_coordinate)
            items.insert(index, close)
            state = :close
            break
          end
        end # items.each_with_index
        # Handle the case where a matching constituent is flush with the end
        # of the sentence.
        items << close if state == :open
      end # each
      # Replace terminal nodes with their string representations.  Insert
      # spacing characters in the list.
      items.each_with_index do |item, index|
        next if item.is_a? String
        text = item.first.label.current
        spacing = item.first.label.after
        # Replace the terminal node with its text.
        items[index] = text
        # Insert the spacing that comes after this text before the first
        # non-close bracket character.
        close_pos = find_index(items[index+1..-1]) {|item| not item == close}
        items.insert(index + close_pos + 1, spacing)
      end
      items.join
    end
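As a rough illustration of the bracketing behavior, the following sketch works on a flat list of leaves instead of a tree. It rests on assumptions not confirmed by the source: that a node coordinate is an array of child indexes read from the root, and that in_yield? amounts to a prefix test on those arrays. SketchToken, sketch_in_yield?, and sketch_bracket are hypothetical names used only here.

```ruby
# Hypothetical token type with the two fields the source reads from labels.
SketchToken = Struct.new(:current, :after)

# Assumed yield test: a terminal lies under a constituent when the
# constituent's coordinate is a prefix of the terminal's coordinate.
def sketch_in_yield?(constituent, terminal)
  terminal.first(constituent.length) == constituent
end

# leaves is a list of [token, coordinate] pairs in left-to-right order.
def sketch_bracket(leaves, coords, open = "[", close = "]")
  out = String.new
  leaves.each_with_index do |(token, coord), i|
    prev_coord = i.zero? ? nil : leaves[i - 1][1]
    next_coord = leaves[i + 1] && leaves[i + 1][1]
    # Open a bracket for each constituent whose yield starts at this leaf.
    coords.each do |c|
      starts = sketch_in_yield?(c, coord) &&
               !(prev_coord && sketch_in_yield?(c, prev_coord))
      out << open if starts
    end
    out << token.current
    # Close a bracket for each constituent whose yield ends at this leaf.
    coords.each do |c|
      ends = sketch_in_yield?(c, coord) &&
             !(next_coord && sketch_in_yield?(c, next_coord))
      out << close if ends
    end
    # Spacing follows any close brackets, as in the source above.
    out << token.after
  end
  out
end

leaves = [
  [SketchToken.new("The", " "), [0, 0]],
  [SketchToken.new("dog", " "), [0, 1]],
  [SketchToken.new("barks", ""), [1, 0]],
  [SketchToken.new(".", ""), [2]]
]

puts sketch_bracket(leaves, [[0]]) # prints "[The dog] barks."
```

Passing an empty coordinate list to the sketch returns the unbracketed text, which mirrors the relationship between to_bracketed_string and to_original_string.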

Return the original text string dominated by this node.

# File lib/stanfordparser.rb, line 379
    def to_original_string
      leaves.inject("") do |s, leaf|
        s += leaf.label.current + leaf.label.after
      end
    end
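The verbatim round trip depends on each token carrying both its text and the spacing that follows it. The sketch below shows that idea in isolation; SpanToken, sketch_tokenize, and sketch_original_string are hypothetical names, and the whitespace-based tokenizer is an assumption made for illustration, not the tokenization the Stanford parser performs.

```ruby
# Hypothetical token type: the token text plus the spacing that follows it.
SpanToken = Struct.new(:current, :after)

# Split text on whitespace, keeping the whitespace after each token.
# Assumes the text does not start with whitespace, which would be dropped.
def sketch_tokenize(text)
  text.scan(/(\S+)(\s*)/).map { |word, space| SpanToken.new(word, space) }
end

# Concatenate every token's text and trailing spacing, as the method
# above does for the leaves of a tree.
def sketch_original_string(tokens)
  tokens.inject("") { |s, token| s + token.current + token.after }
end

text = "Dogs  bark\tloudly.\n"
tokens = sketch_tokenize(text)

# The irregular spacing survives because it is stored on the tokens.
puts sketch_original_string(tokens) == text
```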
