Class ClickHouseDataProcessor

java.lang.Object
com.clickhouse.data.ClickHouseDataProcessor

public abstract class ClickHouseDataProcessor extends Object
Defines a data processor that handles serialization and deserialization for one or more ClickHouseFormat values. Unlike ClickHouseDeserializer and ClickHouseSerializer, which each target a specific column or data type, a data processor combines both roles and also covers format-level concerns such as the separators between columns and rows.
  • Method Details

    • hasMoreToRead

      protected boolean hasMoreToRead() throws UncheckedIOException
      Checks whether there's more to read from input stream.
      Returns:
      true if there's more; false otherwise
      Throws:
      UncheckedIOException - when failed to read data from input stream
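      The check described above can be illustrated with a minimal sketch that uses only java.io. PeekingReader is a hypothetical stand-in, not a class in the ClickHouse client, and peeking one byte through a PushbackInputStream is an assumption about one way to implement such a check, not the library's actual approach. It shows the two documented behaviours: a boolean answer that does not consume data, and an IOException rethrown as UncheckedIOException.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for illustration; not part of the ClickHouse client.
class PeekingReader {
    private final PushbackInputStream in;

    PeekingReader(InputStream source) {
        // A one-byte pushback buffer is enough to peek at the next byte.
        this.in = new PushbackInputStream(source, 1);
    }

    // Returns true if at least one more byte can be read, without consuming it.
    boolean hasMoreToRead() {
        try {
            int b = in.read();
            if (b == -1) {
                return false; // end of stream
            }
            in.unread(b); // put the byte back so the next read sees it
            return true;
        } catch (IOException e) {
            // Mirror the documented contract: surface I/O failures unchecked.
            throw new UncheckedIOException(e);
        }
    }

    // Reads the next byte, or -1 at end of stream.
    int readByte() {
        try {
            return in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```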
    • buildDeserializeSteps

      protected ClickHouseDeserializer[] buildDeserializeSteps(ClickHouseColumn column)
      Builds list of steps to deserialize value for the given column.
      Parameters:
      column - non-null column
      Returns:
      non-null list of steps for deserialization
    • buildSerializeSteps

      protected ClickHouseSerializer[] buildSerializeSteps(ClickHouseColumn column)
      Builds list of steps to serialize value for the given column.
      Parameters:
      column - non-null column
      Returns:
      non-null list of steps for serialization
    • getInitializedSerDe

      protected final ClickHouseDataProcessor.DefaultSerDe getInitializedSerDe() throws UncheckedIOException
      Throws:
      UncheckedIOException
    • initRecords

      protected Iterator<ClickHouseRecord> initRecords()
      Initializes an iterator of ClickHouseRecord for reading values record by record. Usually this should only be called once, during instantiation.
      Returns:
      non-null iterator of ClickHouseRecord
    • initValues

      protected Iterator<ClickHouseValue> initValues()
      Initializes an iterator of ClickHouseValue for reading values one by one. Usually this should only be called once, during instantiation.
      Returns:
      non-null iterator of ClickHouseValue
    • readAndFill

      protected void readAndFill(ClickHouseRecord r) throws IOException
      Reads columns (starting from readPosition) from the input stream and fills the deserialized data into the given record. This method is only used when iterating through records().
      Parameters:
      r - non-null record to fill
      Throws:
      IOException - when failed to read columns from input stream
    • readAndFill

      protected void readAndFill(ClickHouseValue value) throws IOException
      Reads the next column (at readPosition) from the input stream and fills the deserialized data into the given value object. This method is mainly used when iterating through values(). In the default implementation, it is also used by readAndFill(ClickHouseRecord) for simplicity.
      Parameters:
      value - non-null value object to fill
      Throws:
      IOException - when failed to read column from input stream
    • readColumns

      protected abstract List<ClickHouseColumn> readColumns() throws IOException
      Reads columns from the input stream. Usually this will only be called once, during instantiation.
      Returns:
      non-null list of columns
      Throws:
      IOException - when failed to read columns from input stream
    • hasExtraProperties

      public boolean hasExtraProperties()
      Checks whether the processor contains any extra properties.
      Returns:
      true if the processor has at least one extra property; false otherwise
    • getExtraProperty

      public <T extends Serializable> T getExtraProperty(String key, Class<T> valueClass)
      Gets a typed extra property.
      Type Parameters:
      T - type of the property value
      Parameters:
      key - key of the property
      valueClass - non-null Java class of the property value
      Returns:
      typed extra property, could be null
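      A typed lookup of this shape can be sketched in plain Java. ExtraProperties and its put method are hypothetical names used only for illustration, and backing the properties with a HashMap is an assumption; the source does not say how the processor stores them. In this sketch a missing key yields null, matching the documented "could be null", while a value of the wrong type raises ClassCastException, which is a property of Class.cast and not a documented guarantee of the library.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a processor's property storage.
class ExtraProperties {
    // Assumption: properties are kept in a simple key/value map.
    private final Map<String, Serializable> props = new HashMap<>();

    void put(String key, Serializable value) {
        props.put(key, value);
    }

    // True when at least one property has been stored.
    boolean hasExtraProperties() {
        return !props.isEmpty();
    }

    // Returns the property cast to the requested type, or null if absent.
    <T extends Serializable> T getExtraProperty(String key, Class<T> valueClass) {
        // Class.cast(null) returns null, so a missing key needs no special case.
        return valueClass.cast(props.get(key));
    }
}
```

      A caller would check hasExtraProperties() first and then request each property with the class it expects, for example getExtraProperty("rows", Long.class), where "rows" is an illustrative key.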
    • getDeserializer

      public abstract ClickHouseDeserializer getDeserializer(ClickHouseDataConfig config, ClickHouseColumn column)
    • getDeserializers

      public final ClickHouseDeserializer[] getDeserializers(ClickHouseDataConfig config, List<ClickHouseColumn> columns)
    • getSerializer

      public abstract ClickHouseSerializer getSerializer(ClickHouseDataConfig config, ClickHouseColumn column)
    • getSerializers

      public final ClickHouseSerializer[] getSerializers(ClickHouseDataConfig config, List<ClickHouseColumn> columns)
    • getColumns

      public final List<ClickHouseColumn> getColumns()
      Gets list of columns to process.
      Returns:
      list of columns to process
    • getInputStream

      public final ClickHouseInputStream getInputStream()
      Gets input stream.
      Returns:
      input stream, could be null
    • getOutputStream

      public final ClickHouseOutputStream getOutputStream()
      Gets output stream.
      Returns:
      output stream, could be null
    • records

      public final Iterable<ClickHouseRecord> records()
      Returns an iterable collection of records that can be walked through in a foreach loop. Note that: 1) UncheckedIOException might be thrown while iterating through the collection; and 2) this method should not be called more than once, because the input stream is closed at the end of reading.
      Returns:
      non-null iterable records
      Throws:
      UncheckedIOException - when failed to access the input stream
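      The two caveats above (failures surfacing during iteration, and single-use semantics) can be illustrated with a minimal stand-in built on java.io. OneShotLines is a hypothetical class that treats each text line as a "record"; it is not the library's implementation. In this sketch a second pass is simply empty once the reader has been closed; how the real processor behaves on a second call is not stated in the source.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-in: each line of text plays the role of one record.
class OneShotLines implements Iterable<String> {
    private final BufferedReader reader;
    private boolean closed;

    OneShotLines(BufferedReader reader) {
        this.reader = reader;
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private String next; // line read ahead by hasNext(), if any

            @Override
            public boolean hasNext() {
                if (next != null) {
                    return true;
                }
                if (closed) {
                    return false; // stream already consumed and closed
                }
                try {
                    next = reader.readLine();
                    if (next == null) {
                        // End of input: close the stream, as the docs describe.
                        reader.close();
                        closed = true;
                    }
                } catch (IOException e) {
                    // Iterator methods cannot throw checked exceptions.
                    throw new UncheckedIOException(e);
                }
                return next != null;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                String line = next;
                next = null;
                return line;
            }
        };
    }
}
```

      Because the failure can only appear inside hasNext() or next(), callers that need to handle it would wrap the whole foreach loop in a try block that catches UncheckedIOException.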
    • records

      public final <T> Iterable<T> records(Class<T> objClass)
      Returns an iterable collection of mapped objects which can be walked through in a foreach loop. Same as records(objClass, null).
      Type Parameters:
      T - type of the mapped object
      Parameters:
      objClass - non-null class of the mapped object
      Returns:
      non-null iterable collection
      Throws:
      UncheckedIOException - when failed to read data (e.g. during deserialization)
    • records

      public <T> Iterable<T> records(Class<T> objClass, T template)
      Returns an iterable collection of mapped objects which can be walked through in a foreach loop. When objClass is null or ClickHouseRecord, this is the same as calling records().
      Type Parameters:
      T - type of the mapped object
      Parameters:
      objClass - non-null class of the mapped object
      template - optional template object to reuse
      Returns:
      non-null iterable collection
      Throws:
      UncheckedIOException - when failed to read data (e.g. during deserialization)
    • values

      public final Iterable<ClickHouseValue> values()
      Returns an iterable collection of values that can be walked through in a foreach loop. In general, this is slower than records(), because the latter reads data in bulk. However, it is particularly useful when reading large values with limited memory, e.g. a binary field of a few GB. As with records(), the input stream will be closed at the end of reading.
      Returns:
      non-null iterable values
      Throws:
      UncheckedIOException - when failed to access the input stream
    • read

      public ClickHouseValue read(ClickHouseValue value) throws IOException
      Reads the deserialized value of the next column (at readPosition) directly from the input stream. Unlike records(), which reads multiple values at a time, this method reads only one value per call.
      Parameters:
      value - value to update, could be null
      Returns:
      the updated value, or a new ClickHouseValue when the given value is null
      Throws:
      IOException - when failed to read data from input stream
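      The one-value-per-call contract can be sketched with a small cursor over a list of column names. ColumnCursor and currentColumn are hypothetical names, StringBuilder stands in for a reusable ClickHouseValue, and wrapping readPosition back to the first column after the last one is an assumption made for the sketch, not something the source states.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in: reads one value per call and tracks the column position.
class ColumnCursor {
    private final List<String> columns;
    private final Iterator<String> source; // stands in for the input stream
    private int readPosition;

    ColumnCursor(List<String> columns, Iterator<String> source) {
        this.columns = columns;
        this.source = source;
    }

    // Name of the column the next read() call will consume.
    String currentColumn() {
        return columns.get(readPosition);
    }

    // Fills the given holder with the next value, or creates a holder if null.
    StringBuilder read(StringBuilder value) {
        StringBuilder target = (value != null) ? value : new StringBuilder();
        target.setLength(0);           // discard whatever the holder held before
        target.append(source.next());  // read exactly one value
        // Assumption: position wraps to the first column after the last one.
        readPosition = (readPosition + 1) % columns.size();
        return target;
    }
}
```

      Passing the same holder on every call avoids allocating a new object per value, which is the reason an API of this shape accepts a value to update.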
    • write

      public void write(ClickHouseValue value) throws IOException
      Writes the serialized value of the next column (at readPosition) to the output stream.
      Parameters:
      value - non-null value to be serialized
      Throws:
      IOException - when failed to write data to output stream