Pushkar Prasad
2013-03-14 07:52:02 UTC
Hi,
I have the following schema in Cassandra 1.2.1:
+ TimeStamp
+ MACAddress
+ Data Transfer
+ LocationID
+ MacAddressCopy // Copy of MAC Address
** PRIMARY KEY (TimeStamp, MACAddress) // Composite key,
partitioned on TimeStamp
There are close to 500K distinct MAC addresses and 10K timestamps, for a
total of about 5 billion records. Each record is 50 bytes, so the total size
of the data is about 250 GB. All of this is stored on a 4-node cluster with
no replication.
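For reference, the sizing above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the per-node figure assumes the data is spread evenly across the 4 nodes, which I have not verified, and the variable names are just for illustration.

```python
# Figures from the post; an even spread across nodes is an assumption.
MAC_ADDRESSES = 500_000
TIMESTAMPS = 10_000
BYTES_PER_RECORD = 50
NODES = 4

total_records = MAC_ADDRESSES * TIMESTAMPS        # one record per (timestamp, MAC)
total_bytes = total_records * BYTES_PER_RECORD    # raw payload only, no storage overhead
bytes_per_node = total_bytes / NODES              # assumes perfectly even distribution

print(f"records:  {total_records:,}")
print(f"total:    {total_bytes / 1e9:.0f} GB")
print(f"per node: {bytes_per_node / 1e9:.1f} GB")
```

The numbers count raw record payload only, so the on-disk footprint is likely to be larger.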
After creating a secondary index on MacAddressCopy, I search for a
particular MAC value and expect to get back 10K records (with different
timestamps) for that MAC address. Since the column is indexed, I expected a
quick response. Instead, I am getting RPC timeouts and the query never
returns.
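To make the layout concrete, here is a toy Python model of how I understand the rows to be arranged. It is a scaled-down sketch (5 MACs and 8 timestamps instead of 500K and 10K), the dictionary only mimics the partition key / clustering key structure, and it says nothing about how Cassandra actually implements its index. The helper name `partitions_holding` is made up for this example.

```python
from collections import defaultdict

# Toy scale: 5 MACs x 8 timestamps instead of 500K x 10K.
macs = [f"mac-{i}" for i in range(5)]
timestamps = list(range(8))

# partition key (TimeStamp) -> {clustering key (MACAddress) -> row}
table = defaultdict(dict)
for ts in timestamps:
    for mac in macs:
        table[ts][mac] = {"mac_copy": mac, "location_id": 0, "data_transfer": 0}

def partitions_holding(mac_value):
    """Return the partition keys that contain at least one row for mac_value."""
    return [
        ts for ts, rows in table.items()
        if any(row["mac_copy"] == mac_value for row in rows.values())
    ]

hits = partitions_holding("mac-3")
print(len(hits), "of", len(table), "partitions contain a matching row")
```

In this model every partition holds exactly one row for a given MAC, so at full scale the 10K records I am asking for would sit in 10K different partitions.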
Is there any reason why this should be so slow? Are excessive disk seeks
causing the timeouts? Is fetching 10K records asking for too much?
- Pushkar