What to Extract?

We have a text file, which including texts, numbers and time. We want to extract some time information, for instance hour field from the whole data.

One row from original data:

1
2
Unnamed: 0,createdAT,createdATUnixTime,msgtext,WGS84Latitude,WGS84Longitude,class
2506097,2014-03-16 15:52:05,1394985125,连环撞车啊,五部车子 我在这里:,31.2415853271484,121.417396393555,1

For this record, what we want to extract is hour field 15 from column createdAT. We load the data set into database and use EXTRACT command to do this.

An example:

1
2
3
4
5
6
database=# SELECT EXTRACT(HOUR FROM TIMESTAMP '2014-03-16 15:52:05');

date_part
-----------
15
(1 row)


How to extract?

  1. Load data into Database by ‘Command’

    Input the bash in terminal.

    1
    pgcsv --db postgresql://localhost/yourdatabase table /Users/xxx/Documents/xxx.csv
  2. Convert char to timestamp

    1
    2
    3
    4
    ALTER TABLE table 
    ALTER COLUMN createdAT
    TYPE TIMESTAMP WITHOUT TIME ZONE
    USING to_timestamp(createdAT, 'YYYY-MM-DD HH24:MI:SS');
  3. Add a new column

    1
    2
    ALTER TABLE table
    ADD COLUMN hour <span class="built_in">smallint;
  4. Extract hour field and save it in a column

    1
    UPDATE ntt_cluster SET hour = EXTRACT(HOUR FROM createdat);
  5. Check extract result

    1
    2
    3
    4
    5
    6
    7
    database=# select * from table limit 1;

    unnamed_0 | createdat | createdatunixtime | msgtext | wgs84latitude | wgs84longitude | class | hour
    -----------+---------------------+-------------------+--------------------------------+------------------+------------------+-------+------
    2506097 | 2014-03-16 15:52:05 | 1394985125 | 连环撞车啊,五部车子 我在这里: | 31.2415853271484 | 121.417396393555 | 1 | 15

    (1 row)

References

[1] PostgreSQL 8.2.23 Documentation | Chapter 9. Functions and Operators
[2] StackOverFlow | Updating postgreSQL column with EXTRACT function (More than one row returned by a subquery used as an expression)